Bears, Beets, Bikes: My Google Data Analytics Capstone

Ok, so there aren’t any bears or beets. There are bikes, though, like a ton of them. This page is the result of my Google Data Analytics Professional Certificate. From November 3rd of 2023 until this very day I have worked to complete this project, and I’m proud to finally have it posted!

The Nitty Gritty


In the fictitious scenario for the capstone I work for a bike share company called Cyclistic. The director of marketing has asked three questions:

  • How do annual members and casual riders use Cyclistic bikes differently?
  • Why would casual riders buy Cyclistic annual memberships?
  • How can Cyclistic use digital media to influence casual riders to become members?

I’ve been assigned the first question. I won’t lie though. I attempted to answer all three. It was just too tempting.

*Note. If you’d like to view and/or download the capstone assignment PDF or any of my associated documents for the capstone you can do so here: The Boring Stuff.

The capstone also provided a link to a “real” bike share company named Divvy, out of Chicago, which is gracious enough to bless us with millions of rows of sweet sweet data. The capstone suggested taking the data, focusing on one quarter of a given year, and trying to answer the question. I decided a single quarter wasn’t enough for me, so I went with January 2023 to November 2023. Let’s see some stats!

Data . . . In The Raw

My raw data ended up as follows:

  • 11 CSV files titled and organized by month
  • 5,495,804 rows of good clean fun

See, I’m not lying.

Once I got a gander at the size of the dataset, I put on my big boy pants, grabbed some caffeinated bean juice, and got busy. My data preparation and processing went something like this:

  • Downloaded all the CSVs
  • Created a Google BigQuery project to hold them all
  • Used SQL to append them all together into one glorious dataset
  • Used SQL some more to do some tidying up. (end_station_name = “OH Charging Stx – Test”, I’m looking at you!) See the whole process here: More boring stuff.
  • Fired up Microsoft Power BI
  • Started verifying
  • “Wait, what’s that station doing in Lake Michigan?”
  • “What’s a docked bike?”
  • “Why do so many rows have no start_station_name?”
  • “Are people just leaving bikes lying around?”
  • Cleaned some more
  • Back to Power BI
  • Answers?
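The append-and-tidy steps above were done in SQL on BigQuery. Purely as an illustration, here is a minimal Python sketch of the same idea, not the actual queries. The column names follow Divvy's public trip files as far as I know, and the tiny inline CSVs are made-up stand-ins for the real monthly files.

```python
import csv
import io

# Assumption: the test station may be spelled with an en dash or a plain hyphen,
# so both variants are listed here.
TEST_STATIONS = {"OH Charging Stx – Test", "OH Charging Stx - Test"}

def append_monthly_files(csv_texts):
    """Stack the rows of several monthly CSVs that share the same header."""
    rows = []
    for text in csv_texts:
        rows.extend(csv.DictReader(io.StringIO(text)))
    return rows

def tidy(rows):
    """Drop trips that start or end at a known test station."""
    return [
        r for r in rows
        if r["start_station_name"] not in TEST_STATIONS
        and r["end_station_name"] not in TEST_STATIONS
    ]

# Made-up sample data standing in for two monthly files.
header = "ride_id,start_station_name,end_station_name,member_casual\n"
january = header + "A1,Clark St,State St,member\n"
february = header + "B1,State St,OH Charging Stx - Test,casual\nB2,,Clark St,member\n"

all_rows = append_monthly_files([january, february])
clean_rows = tidy(all_rows)
print(len(all_rows), len(clean_rows))
```

Note that the row with a blank start_station_name survives this pass. Whether to keep those rows is a separate decision from removing test stations.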

The first insight came when comparing total trips by members and casual riders by time of day. I noticed that member trips spiked around 8:00am and 5:00pm. So they’re using the bikes to…go to work? I compared the trend in warmer months and colder months to see just how committed these members are.

Although overall trips plummet in the colder months, the daily patterns of riders became even more distinct. The chart above shows just how committed members are to getting to work or school. While casual riders start to flatten out in the colder months, members keep on keeping on at 8:00am and 5:00pm.
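For anyone who wants to reproduce the hour-of-day comparison outside Power BI, here is a rough Python sketch. It assumes a started_at column formatted like "2023-01-10 08:05:00" and a member_casual column. The May to September "warm" window is an assumption for illustration, since the exact cutoff isn't stated here.

```python
from collections import Counter
from datetime import datetime

# Assumption: which months count as "warm" is a guess for illustration.
WARM_MONTHS = {5, 6, 7, 8, 9}

def trips_by_hour(rows):
    """Count trips per (rider type, season, hour of day)."""
    counts = Counter()
    for r in rows:
        started = datetime.strptime(r["started_at"], "%Y-%m-%d %H:%M:%S")
        season = "warm" if started.month in WARM_MONTHS else "cold"
        counts[(r["member_casual"], season, started.hour)] += 1
    return counts

# Made-up sample trips.
sample_trips = [
    {"started_at": "2023-01-10 08:05:00", "member_casual": "member"},
    {"started_at": "2023-02-14 08:40:00", "member_casual": "member"},
    {"started_at": "2023-07-04 14:15:00", "member_casual": "casual"},
]
hourly = trips_by_hour(sample_trips)
print(hourly[("member", "cold", 8)])
```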

To really bring home the work/school theory I looked at rides by day of the week, perhaps the most important visual of the entire analysis. Members take nearly three times as many trips as casual riders on weekdays. Casual riders rally towards the end of the week, when things are more… you know… casual.
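The weekday comparison boils down to a count per day and a member-to-casual ratio. A minimal sketch, again assuming the started_at and member_casual columns and using made-up trips:

```python
from collections import Counter
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def trips_by_weekday(rows):
    """Count trips per (day name, rider type)."""
    counts = Counter()
    for r in rows:
        started = datetime.strptime(r["started_at"], "%Y-%m-%d %H:%M:%S")
        counts[(DAY_NAMES[started.weekday()], r["member_casual"])] += 1
    return counts

def member_to_casual_ratio(counts, day):
    """Member trips divided by casual trips, or None if there are no casual trips."""
    casual = counts[(day, "casual")]
    return counts[(day, "member")] / casual if casual else None

# Made-up sample: 2023-01-02 is a Monday, 2023-01-07 is a Saturday.
weekday_sample = [
    {"started_at": "2023-01-02 08:00:00", "member_casual": "member"},
    {"started_at": "2023-01-02 08:10:00", "member_casual": "member"},
    {"started_at": "2023-01-02 17:05:00", "member_casual": "member"},
    {"started_at": "2023-01-02 12:30:00", "member_casual": "casual"},
    {"started_at": "2023-01-07 13:00:00", "member_casual": "member"},
    {"started_at": "2023-01-07 13:20:00", "member_casual": "casual"},
]
by_day = trips_by_weekday(weekday_sample)
print(member_to_casual_ratio(by_day, "Monday"))
```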

“Curiouser and curiouser!”

Alice In Wonderland

As I continued, I got curious about more things. Did members prefer a type of bike? What about the length of trips? The above chart, to me, illustrates that members don’t really have a preference for bike type.

I will admit that there could be factors beyond the scope of our current data that could affect the outcome. For example, if the inventory of classic bikes outnumbers electric bikes 10 to 1, these numbers would mean a LOT more for bike preferences. I would also feel sorry for the handful of electric bikes being ridden nearly 24/7.

This is also a good place to mention the limitations of the data. There’s a lot of data that a marketing team would certainly want to have to make a good decision, but alas, we can only speculate. These types of data would be:

  • The total number of members
  • The average number of trips per member
  • The number of classic and electric bikes
  • Which stations are frequently understocked
  • What areas members live in

With those types of data the team could take a much more targeted approach to marketing to casual riders.

Next up, we have average trip length by member status. At first, I hadn’t separated the trips by bike type. Silly me. The high number of classic bike trips by casual riders skewed the trip length high for them. I might have walked away thinking that casual riders just ride farther in general. I think the data is more nuanced than that. To see why we must look at our trusty friend, Pricing.
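Before getting to pricing, here is a small sketch of why the split matters. It averages trip minutes first by rider type alone, then by rider type and bike type. The numbers are toy data invented for the example, not results from the Divvy dataset, and the column names (started_at, ended_at, member_casual, rideable_type) are assumptions.

```python
from collections import defaultdict
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

def average_minutes(rows, keys):
    """Average trip length in minutes, grouped by the given column names."""
    totals = defaultdict(lambda: [0.0, 0])  # group -> [total minutes, trip count]
    for r in rows:
        start = datetime.strptime(r["started_at"], FMT)
        end = datetime.strptime(r["ended_at"], FMT)
        group = tuple(r[k] for k in keys)
        totals[group][0] += (end - start).total_seconds() / 60
        totals[group][1] += 1
    return {group: total / n for group, (total, n) in totals.items()}

def trip(rider, bike, minutes):
    """Build a toy trip that starts at noon and lasts `minutes` (under 60)."""
    return {
        "started_at": "2023-06-01 12:00:00",
        "ended_at": f"2023-06-01 12:{minutes:02d}:00",
        "member_casual": rider,
        "rideable_type": bike,
    }

toy_trips = [
    trip("casual", "classic_bike", 40),
    trip("casual", "classic_bike", 30),
    trip("casual", "electric_bike", 10),
    trip("member", "classic_bike", 10),
    trip("member", "electric_bike", 10),
]
by_rider = average_minutes(toy_trips, ["member_casual"])
by_rider_and_bike = average_minutes(toy_trips, ["member_casual", "rideable_type"])
print(by_rider)
print(by_rider_and_bike)
```

In the toy data, the casual average is pulled up entirely by classic bike trips. On electric bikes, casual riders and members look the same.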

Ah, capitalism, you’ve done it again. With classic bikes being nearly 2.5 times less expensive per minute than electric bikes, you’d be hard pressed to find me riding an electric bike for more than 10 minutes either. At 30 minutes that’s $13.60 once you pay to unlock it. With a classic bike I could get there with enough left for a tall pumpkin spice cold brew. Normally I’d have a grande, but inflation gets us all, I suppose.
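The arithmetic behind that $13.60 can be sketched as below. The rates are assumptions: a $1.00 unlock fee, $0.42 per minute for electric bikes, and $0.17 per minute for classic bikes. I picked them because they reproduce the $13.60 figure and the roughly 2.5x ratio, but the actual price list may differ. Amounts are kept in whole cents to avoid floating-point rounding.

```python
# Assumed casual-rider rates in cents; see the caveat above.
UNLOCK_CENTS = 100
PER_MINUTE_CENTS = {"classic_bike": 17, "electric_bike": 42}

def casual_trip_cost_cents(bike_type, minutes):
    """Unlock fee plus the per-minute rate for a casual rider, in cents."""
    return UNLOCK_CENTS + PER_MINUTE_CENTS[bike_type] * minutes

electric_30 = casual_trip_cost_cents("electric_bike", 30)
classic_30 = casual_trip_cost_cents("classic_bike", 30)
print(f"electric: ${electric_30 / 100:.2f}, classic: ${classic_30 / 100:.2f}")
```

Under these assumed rates, the gap at 30 minutes is $7.50, which is the cold brew money.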

There is still something to be said about members only traveling around 10 minutes for either bike type. They could ride a classic bike for up to 45 minutes for free, and get an electric bike for nearly half the price a casual rider pays. You’d think they would ride for longer. So why don’t they? Let’s take a trip back to two of the questions from earlier:

  • How do members and casual riders use the service differently?
  • Why would casual riders buy a membership?

I’ve got a theory my dear Watson

But…

We still haven’t answered why members take shorter trips. And why might they need to get to and from work using a bike, rather than a car, bus, or the bony extremities at the bottoms of their legs. To attempt to answer this we had to get GEOGRAPHICAL!

This is when we cranked the dial up a few notches. Cleaning data for over 1,500 stations that had null values, shared titles, and wonky latitude and longitude data was a real rollercoaster. To see how I cleaned up the station information to get down to a single Station/Station ID pairing you can click here: The boring stuff from earlier.
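The linked write-up has the real cleanup. For illustration only, here is one common way to settle on a single name per station ID: keep the most frequent non-empty name seen for each ID. This is a hypothetical Python sketch, not the SQL actually used, and the start_station_id and start_station_name columns and sample values are assumptions.

```python
from collections import Counter, defaultdict

def canonical_station_names(rows):
    """Map each station ID to its most frequently seen non-empty name."""
    names = defaultdict(Counter)
    for r in rows:
        station_id = r["start_station_id"]
        name = r["start_station_name"]
        if station_id and name:  # skip rows with a null ID or name
            names[station_id][name] += 1
    return {sid: seen.most_common(1)[0][0] for sid, seen in names.items()}

# Made-up sample with a variant spelling and a missing name.
station_rows = [
    {"start_station_id": "S1", "start_station_name": "Clark St"},
    {"start_station_id": "S1", "start_station_name": "Clark St"},
    {"start_station_id": "S1", "start_station_name": "Clark Street"},
    {"start_station_id": "S1", "start_station_name": ""},
    {"start_station_id": "S2", "start_station_name": "State St"},
]
stations = canonical_station_names(station_rows)
print(stations)
```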

Once it was done, however, the fruits were oh so sweet. The maps above display two things:

  • The first shows the ratio of members to casual riders. If the station favors members at least 2 to 1 the bubble is pink. Stations that favor casuals in any way are green, and neutral stations are blue.
  • The second map shows, in both color intensity and bubble size, the concentration of bike trips by station.
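The coloring rule for the first map can be written down as a small function. This is a sketch of the rule as described above. How to treat stations with no trips at all is my assumption (they fall through to neutral).

```python
def station_color(member_trips, casual_trips):
    """Color a station bubble by its mix of member and casual trips."""
    if casual_trips > member_trips:
        return "green"  # favors casual riders in any way
    if member_trips > 0 and member_trips >= 2 * casual_trips:
        return "pink"   # favors members at least 2 to 1
    return "blue"       # neutral, including stations with no trips

for members, casuals in [(20, 10), (15, 10), (9, 10)]:
    print(members, casuals, station_color(members, casuals))
```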

We already knew, from the weekly analysis earlier, that members take more trips than casual riders, like a lot more. Now we know that member-heavy stations tend to be concentrated in the downtown and university areas of the city. Let’s not get carried away, though, lest we fall into confusing correlation with causation. I pressed on.

More Maps!

There were several super interesting correlations between the graphs from earlier, the maps I generated, and the maps from zipatlas.com.

  • The number of trips and the stations that favor members are concentrated where there are the highest concentrations of students.
  • The number of trips and the stations that favor members are concentrated where the commute to work is the shortest.
  • Trips and stations that favor members are sparser where unemployment is the highest.

Phew… tired yet? At this point we might be reaching the limitations of the data available. I think that the most substantial thing we can say about the data we have is that members seem to be using the program to get to and from work and school.

So what about that third question?

How can Cyclistic use digital media to influence casual riders to become members?

I’m definitely not a marketing expert, but I couldn’t resist having a go at it. Divvy, uhm, I mean Cyclistic, could use an affiliate program to offer a free trial membership to casual riders who sign up through a university, adult education institution, downtown apartment complex, or business in a crowded area with limited parking. This would align with the idea that members are using bikes to get to and from school and work.

Once riders have experienced the convenience of a membership, they should be more likely to sign up and keep it, because they would be signing up out of genuine utility rather than novelty.

And with that, folks, I am off to the next data adventure.