Zindi UberCT Part 3: Uber Movement

Uber Movement has launched in Cape Town

Today, Uber Movement launched in Cape Town. This is good news, since it means more data we can use in the ongoing Zindi competition I’ve been writing about! In this post we’ll look at how to get the data from Uber, and then we’ll add it to the model from Part 2 and see if it has allowed us to make better predictions. Unlike the previous posts, I won’t be sharing a full notebook to accompany this post – you’ll have to do the work yourself. That said, if anyone is having difficulties with anything mentioned here, feel free to reach out and I’ll try to help. So, let’s get going!

Getting the data

My rough travel ‘zones’

Zindi provided some aggregated Uber Movement data at the start of the competition. It gives you the average travel time for a route, but only broken down by quarter, so you can't see daily travel times. On the Uber Movement site, however, you can specify a start and end location and get up to three months of daily average travel times. This is what we'll be using.

Using sophisticated mapping software (see above), I planned 7 routes that would cover most of the road segments. For each route, I chose a start and end zone in the Uber Movement interface (see table above) and then I downloaded the data. To do it manually would have taken ages, and I’m lazy, so I automated the process using pyautogui, but you could also just resign yourself to a few hours of clicking away and get everything you need. More routes here would have meant better data, but this seemed enough to give me a rough traffic proxy.

Some of the travel times data

Using QGIS, I manually tagged each road segment with the Uber Movement trip I would use to quantify traffic in that area. This gave each segment in the shapefile a 'zone id', which let me link the segments to my main training data and then merge in the Uber Movement travel times based on zone id and datetime.
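Since there's no notebook this time, here is a minimal sketch of what that merge could look like. It is not my actual code: the column names (`zone_id`, `date`, `mean_travel_time_s`), the segment ids and the values are all made up for illustration, and the real Uber Movement export will look different. It only uses the standard library so it runs anywhere.

```python
import csv
import io
from datetime import datetime

# Hypothetical Uber Movement export: one row per route (zone id) per day.
# The header and values are assumptions, not the real Uber Movement format.
UBER_CSV = """zone_id,date,mean_travel_time_s
1,2018-01-01,612
1,2018-01-02,745
2,2018-01-01,1310
"""

# Hypothetical training rows: one per road segment per hour, already tagged
# with the zone id assigned to that segment in QGIS.
TRAIN_CSV = """segment_id,zone_id,datetime
seg_a,1,2018-01-01 08:00:00
seg_a,1,2018-01-02 17:00:00
seg_b,2,2018-01-01 08:00:00
seg_b,2,2018-01-03 08:00:00
"""


def load_travel_times(csv_text):
    """Index daily travel times by (zone_id, ISO date string)."""
    lookup = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        lookup[(row["zone_id"], row["date"])] = float(row["mean_travel_time_s"])
    return lookup


def add_travel_times(train_csv_text, lookup):
    """Attach the day's travel time for the row's zone to each training row."""
    merged = []
    for row in csv.DictReader(io.StringIO(train_csv_text)):
        stamp = datetime.strptime(row["datetime"], "%Y-%m-%d %H:%M:%S")
        day = stamp.date().isoformat()
        # Rows without matching Uber data get None, as in a left join.
        row["travel_time_s"] = lookup.get((row["zone_id"], day))
        merged.append(row)
    return merged


merged_rows = add_travel_times(TRAIN_CSV, load_travel_times(UBER_CSV))
for row in merged_rows:
    print(row["segment_id"], row["datetime"], row["travel_time_s"])
```

If your data is already in pandas, a left `merge` on the zone id and a date column does the same job. Either way, decide what to do with the rows that have no Uber data for that day before feeding them to the model.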

Does it work?

Score (y axis) vs threshold for predicting a 1. In my case, a threshold of ~0.35 was good.

In the previous post, the F1 score on my test set was about 0.082. This time around, with nothing changed except the addition of the Uber data, the score rises above 0.09 (Zindi score: 0.0897). This is better than an equivalent model managed without the Uber Movement data, but it's still not quite at the top. For that, a little more tweaking will be needed 🙂
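For anyone wanting to reproduce the threshold plot, here is a rough sketch of the idea: take the model's predicted probabilities on a held-out set, try a range of cut-offs for calling a row a 1, and keep the one with the best F1. The labels and probabilities below are toy values I made up, not competition data, and the function names are just for illustration.

```python
def f1_score(y_true, y_pred):
    """F1 for binary labels, returning 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(y_true, probs, thresholds):
    """Return (threshold, f1) for the cut-off with the highest F1."""
    best_t, best_f1 = None, -1.0
    for t in thresholds:
        preds = [1 if p >= t else 0 for p in probs]
        score = f1_score(y_true, preds)
        if score > best_f1:  # on ties, keep the lowest threshold seen
            best_t, best_f1 = t, score
    return best_t, best_f1


# Toy validation data (made up): true labels and predicted probabilities.
y_true = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]
probs = [0.05, 0.10, 0.40, 0.20, 0.36, 0.30, 0.02, 0.90, 0.15, 0.38]

# Candidate cut-offs from 0.05 to 0.95 in steps of 0.05.
thresholds = [i / 20 for i in range(1, 20)]

best_t, best_f1 = best_threshold(y_true, probs, thresholds)
print(f"best threshold: {best_t:.2f}, F1: {best_f1:.3f}")
```

With rare positives like in this challenge, the default cut-off of 0.5 is usually too strict, which is why it's worth sweeping. Pick the threshold on a validation split rather than the test set, or the score will look better than it really is.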

I’m sorry that this post is shorter than the others – it was written entirely in the time I spent waiting for data to load or models to fit, and is more of a show-and-tell than a tutorial. That said, I hope that I have achieved my main goal: showing that the Uber Movement data is a VERY useful input for this challenge, and giving a hint or two about where to start playing with it.

(PS: This model STILL ignores all of the SANRAL data. Steal these ideas and add that in, and you're in for a treat. If you do, please let me know. Good luck!)

5 thoughts on “Zindi UberCT Part 3: Uber Movement”

  1. Nice series.
    There’s something I don’t understand here.
    I don’t get how you merged the Zone IDs of the Uber Movement data with the train set and road segments?

    1. That’s the most manual and arbitrary part of this process. I looked at the map and picked start and end zones for each major section/route of road, then used travel time from one zone to the other as a rough measure of traffic on that stretch. I think I then used QGIS to manually assign the smaller sections of road to one of the main routes.
      Because Uber doesn’t really have a nice API or anything for downloading the daily travel times, I couldn’t experiment much with looking at more zones, or properly estimating the traffic for each road segment individually. At least this manual approach for a few major routes gives a rough proxy for average traffic in the different parts of town.
