Strava Metro commuting cycle data (blog part 3)

The Netherlands now has an estimated one million Strava users. For most users, cycling, mountain biking or running is their main activity. But once the app is installed, Strava is also used to record many 'commuting' or utilitarian bike rides.


Although utilitarian cycling is not the most frequently recorded activity within Strava, it is perhaps the Strava Metro dataset with the greatest potential, because 96% of Dutch bike rides are (primarily) utilitarian in nature (CBS). If Strava's utilitarian bike rides represent a distinct portion of them, they provide valuable insight into an important use of city and landscape.


Within Strava, activities are characterized as utilitarian if either the cyclist indicates this (by checking the 'commute' button), or if an activity has the same characteristics as the activities of that group of 'commuters'. For example, the start and end points of a route clearly show whether it is an 'A to B' cycling activity. This classification is done by an algorithm of Strava Metro itself. With Endomondo data, we have also investigated such distinctions ourselves in various cities. For more information, see for example this document (p. 15-20) from Utrecht; such a distinction can indeed be made well based on characteristics of cycling activities such as time of day, distance, detour factor and speed.
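To make one of these characteristics concrete: the detour factor can be taken as the ridden route length divided by the straight-line distance between start and end point. The sketch below is only an illustration of that idea, not Strava Metro's algorithm; the function names and the example coordinates are our own assumptions.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

def detour_factor(track):
    """Route length divided by straight-line distance from first to last point.

    `track` is a list of (lat, lon) tuples. A round trip that ends where it
    started has no meaningful straight-line distance, so it returns infinity.
    """
    route_km = sum(haversine_km(*a, *b) for a, b in zip(track, track[1:]))
    straight_km = haversine_km(*track[0], *track[-1])
    return math.inf if straight_km == 0 else route_km / straight_km

# Hypothetical tracks (coordinates are made up for illustration)
a_to_b = [(52.00, 5.00), (52.05, 5.00), (52.10, 5.00)]   # direct ride
with_detour = [(52.00, 5.00), (52.10, 5.00), (52.10, 5.10)]  # L-shaped ride
round_trip = [(52.00, 5.00), (52.10, 5.00), (52.00, 5.00)]   # returns to start

for name, track in [("A to B", a_to_b), ("detour", with_detour), ("round trip", round_trip)]:
    print(name, detour_factor(track))
```

A factor close to 1 points to a direct 'A to B' ride, while a leisure loop gives a very high or infinite factor. Any threshold for calling a ride utilitarian would have to be chosen and validated per dataset, together with the other characteristics.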



Strava usage among (Sports-)recreational cycling

In Strava, of the 287 million 'bike rides' worldwide in 2018, some 84 million were 'commutes' (29%). The rest are 'leisure rides'. In the Netherlands in 2021, this commute share seems to be somewhat lower. Based on Amersfoort (where we have access to Strava Metro) we can give an indication: in June 2021, about 15% of Strava cycling activity in Amersfoort was utilitarian.

What proportion of the total number of Dutch bicycle trips does this represent? In the app Endomondo, 2% of Dutch bicycle commutes went through Amersfoort. Extrapolating on that basis, approximately 1.5 to 2 million commutes per year would be recorded with Strava in the Netherlands. Out of 4.4 billion Dutch bike rides per year, that comes down to roughly 1 in 2,000 to 3,000 utilitarian bike rides being recorded with Strava. On the one hand, that sounds like little; on the other hand, it is already about 4 times as many bike rides per year as were recorded in the Fietstelweek (the national bike count week), for example (416,000 in 2016).
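The arithmetic behind these proportions can be checked in a few lines. The input figures are the ones quoted above; the variable names are our own.

```python
# Figures quoted in the text
total_nl_rides_per_year = 4.4e9   # all Dutch bike rides per year
strava_commutes_low = 1.5e6       # estimated Strava commutes per year, lower bound
strava_commutes_high = 2.0e6      # estimated Strava commutes per year, upper bound
bike_count_week_rides = 416_000   # rides recorded in the 2016 bike count week

# "1 in N" rides recorded with Strava
one_in_max = total_nl_rides_per_year / strava_commutes_low
one_in_min = total_nl_rides_per_year / strava_commutes_high

# How many times the bike count week total this is
ratio_low = strava_commutes_low / bike_count_week_rides
ratio_high = strava_commutes_high / bike_count_week_rides

print(f"1 in {one_in_min:.0f} to 1 in {one_in_max:.0f} rides")
print(f"{ratio_low:.1f} to {ratio_high:.1f} times the bike count week")
```

This gives about 1 in 2,200 to 1 in 2,900 rides, and between roughly 3.6 and 4.8 times the bike count week total, which is where the rounded "1 in 2,000 to 3,000" and "4 times" come from.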



User characteristics

Again, based on Strava Metro data in Amersfoort, we can give an indication of how utilitarian Strava bike rides are distributed across different age categories and men/women. It is particularly striking that, compared to 'leisure cycling', the proportion of women is higher (29% vs. 20%). Men still account for the largest number of 'commute counts' (67%). The age distribution is almost the same as the distribution within Strava 'leisure cycling'. Commuters are mostly middle-aged.



Representativeness of Strava Metro commute cycle data


It is precisely for the activity type 'bike_commute' that representativeness is most relevant and most complex, for two reasons: (1) Strava bike commutes constitute only a minimal proportion of total bike commutes (~1 in 3000), and (2) Strava bike commutes have distinctive characteristics that differ from 'the average' utilitarian bike ride.


The frequently asked question around 'representativeness' is to what extent do the intensities of use of cycleways, resulting from Strava data, match the 'actual' intensity of use of cycleways? In other words, do Strava-utility cyclists cycle proportionally the same routes as 'average' utility cyclists?


The answer to that question is simple: no. At least, not in the Netherlands. Strava cyclists certainly do not constitute a perfect cross-section of all Dutch cyclists. For that matter, no form of 'crowd-sourced' cycling data does.


But in our view, that is not the most relevant question. What is more relevant is for which specific types of utility cyclists the data is more representative, and for which it is less so. Because that is the beauty of data generated by real people, and not by traffic models: it is always representative, at least for the group of people who use Strava and the kind of activities they record with it.


And the follow-up question is then: are the route usages of the group(s) it 'well' represents interesting and relevant?


Distance

In the past five years at TRACK-landscapes we analyzed mainly data from the activity tracking app 'Endomondo', which like Strava was also used for utilitarian bike rides.

In our experience, the main factor that characterizes the utilitarian cycling activities of both Strava and Endomondo is 'distance'.

It is a logical characteristic, with major implications for route usage. Those who cycle only a few kilometers will rarely record it with an activity tracker; for most people it is not substantial enough as an accomplishment or trip. A longer ride is seen as a more substantial performance, so utilitarian cyclists are relatively more likely to record such a ride with an activity tracking app.


The average utilitarian bicycle ride in the Netherlands is 4.1 kilometers (CBS). Almost 70% of bicycle rides are shorter than 3.7 kilometers (long bicycle rides influence the average relatively strongly). Most bike rides are therefore made for groceries, shopping or going out.


This is very different for Strava utilitarian bike rides: worldwide, the average utilitarian Strava bike ride was 15.4 kilometers (Strava YIS 2019). In several European countries (unfortunately no figure is known for the Netherlands), the median bike ride distance was around 8 kilometers. In the utilitarian cycling data (2012-2017) from the app Endomondo, the distances in the province of Utrecht were fairly similar: the median was 10.4 kilometers and the average 15.5 kilometers.


And whether it is walking, running, recreational cycling or utilitarian cycling, we have seen in many different cities that the distance to be covered largely determines which roads are used a lot or a little. The reasons for this do differ between those types of activities.


Long utilitarian bicycle trips (5-30km) in the Netherlands are often inter-urban (also called regional): from one work/residential center to another. The destination is more likely to be a work location, for example, or a regional facility/store. In addition, the inner-city bicycle routes that are mainly used are those that logically connect to the routes between these destinations.

Short utilitarian bicycle trips (<5km) will usually be made to do some shopping, or to the town centre, or to the sports club. These destinations are often located in other places than the major work locations and regional facilities. With that, the use of bicycle routes also becomes substantially different.

The Endomondo data also showed this very strongly in the Province of Utrecht. The pattern of bicycle use in the city is very different for bicycle trips of 0-5, 5-10 and 10+ kilometers. Long bike trips show a much more widely spread pattern of route use.



Comparison bike-count points (Endomondo)

You can of course compare utilitarian bike data from activity trackers such as Strava or Endomondo, with local bike counts. In Utrecht, for example, we compared local bicycle counts and utilitarian bicycle data from the app Endomondo.


When comparing the total numbers of passages, only a weak relationship is visible at the count points (R^2=0.385). The busiest bike routes in the local counts are often not the busiest bike routes within Endomondo:




[Figure: Strava Metro comparison with bicycle counts]


But what if we take, within the Endomondo dataset, only the utilitarian cycling activities shorter than 4 kilometers? After all, the vast majority of inner-city cycling activities, where these measurement points are located, fall within that distance. Then suddenly the comparison is significantly better (R^2=0.588):






And this is also why in Flemish Brabant (see article 'Strava Leisure Cycling') the comparison between the counting points and the Endomondo cycling data showed a strong correlation: these counting points were all outside urban areas, inter-urban. You can't get there if you only cycle a few kilometers, only 'longer' cycling activities pass through there. And so it compares well with utilitarian bike rides from activity trackers: those also contain mostly longer bike rides.
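The kind of comparison described above can be sketched as follows: count the tracker passages per count point, once for all trips and once for short trips only, and compute R^2 against the local counts. The data below is entirely hypothetical toy data, deliberately constructed so that the short trips match the local counts; it is not the Utrecht dataset, and the function names are our own.

```python
from collections import Counter

def r_squared(xs, ys):
    """Squared Pearson correlation between two equally long lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)

def passages_per_point(trips, max_km=None):
    """Count trips passing each count point, optionally only trips shorter than max_km."""
    counts = Counter()
    for trip in trips:
        if max_km is None or trip["km"] < max_km:
            counts.update(trip["points"])
    return counts

# Hypothetical local counts at four inner-city count points
local_counts = {"A": 1000, "B": 800, "C": 400, "D": 200}

# Hypothetical tracker trips: length in km and the count points passed
trips = (
    [{"km": 2.0, "points": ["A", "B"]}] * 4
    + [{"km": 1.5, "points": ["A"]}] * 1
    + [{"km": 3.0, "points": ["C"]}] * 2
    + [{"km": 3.5, "points": ["D"]}] * 1
    + [{"km": 12.0, "points": ["A", "C"]}] * 2
    + [{"km": 15.0, "points": ["C", "D"]}] * 2
    + [{"km": 9.0, "points": ["C"]}] * 1
)

points = sorted(local_counts)
xs = [local_counts[p] for p in points]
all_trips = passages_per_point(trips)
short_trips = passages_per_point(trips, max_km=4)
r2_all = r_squared(xs, [all_trips[p] for p in points])
r2_short = r_squared(xs, [short_trips[p] for p in points])
print(f"all trips: R^2={r2_all:.2f}, trips < 4 km: R^2={r2_short:.2f}")
```

In this toy example the long trips concentrate on other count points than the short trips do, which pulls the overall R^2 down. Real data will never match as neatly as the short trips do here; the point is only the mechanism of filtering by distance before comparing.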


Comparison bike-count points (Strava)

So do these Endomondo insights apply to Strava? Of course, that conclusion cannot simply be extended on a one-to-one basis; it is conceivable that Strava utilitarian cyclists have different characteristics than Endomondo utilitarian cyclists. But one very important characteristic, undertaking 'long' bike rides, is similar. The age and gender distributions of Endomondo and Strava are also very similar.


At the moment, we are able to compare Strava data with local bike counts in IJsselstein (where we also have access to Strava Metro). The interesting thing about IJsselstein is that the bicycle counts were done around the residential core of IJsselstein, to measure regional bicycle movements. The local counts were done in May 2019; the graph shows the comparison with Strava commute passages throughout 2019. The R^2 comes out at 0.55: there is a correlation, but it is not very strong.


[Figure: Strava Metro comparison with bicycle counts, IJsselstein]

The tricky part of this comparison, however, is that most of the points are really outside an urban area, but not all. The counting point 'Randdijk' is on the main bicycle connection to Nieuwegein Centre, which is only 3-5 kilometers cycling from IJsselstein. It is on this bike path that Strava 'underperforms' the most. But that makes perfect sense if Strava mainly represents longer bike rides. Without this Randdijk point, the R^2 would come out at 0.72.


This remains the tricky part of these kinds of comparisons: to know even better whether Strava commutes properly represent longer bike trips, you should ideally compare them with other sources, research or counting points that actually show the long bike rides. But there aren't any. At least, as far as we know; if you do know of studies with this focus, please let us know!


But this absence of specific knowledge about long bike trips is exactly what makes the Strava Metro data potentially so valuable.


Scientific research/comparisons (Strava)

Several scientific institutions have also examined the Strava cycling data for representativeness and usability. The article "Strava Metro data for bicycle monitoring: a literature review" summarizes scientific research on this.


Several (mainly American) studies concluded that Strava data can be of added value to other methods that estimate cycling intensities. Some studies compared local bicycle counts with Strava counts, as we did. The studies were undertaken in the United States, Canada, Australia, the United Kingdom and Germany. The R-values (which express the strength of the correlation) were >0.75 in 5 of the 9 studies. That's a solid correlation. But the R-values also varied widely; values between 0.3 and 0.5 also occurred.

This also demonstrates the difficulty of generalizing globally about the representativeness of Strava data. Bicycle use and Strava use differ so much worldwide that you can hardly apply conclusions from one country to another. That certainly applies to the Netherlands, which is incomparable with other countries in terms of urbanization (small, dispersed cities), bicycle infrastructure, and bicycle use (highly developed, but also with specific demographic differences). Conclusions about Strava representativeness in the Netherlands really have to be based on comparisons and studies in the Netherlands.

Studies do conclude that the Strava data is offered in a technically usable way: the accuracy of GPS devices is sufficient, and translating tracks into passages on cycleways works well. The review also indicates that Strava Metro has proven useful for intervention evaluations. In several studies it turned out to be quite possible to see changes in usage after a bike route was added or improved. Of course, the general question about the representativeness of Strava data and Strava users remains valid here: whom does this visible change represent? Does the change seen in Strava appear equally in local bicycle counts? So far, this has been investigated only to a limited extent.

In a general sense, the review notes that a lack of demographic data and bicycle trip (nature) data such as origin-destination relationships, limits the view of representativeness. We agree, but also think that our research/comparisons add valuable insights in this matter.
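A note for comparing the review's figures with ours: we report R^2, while the review reports R-values. Assuming those are Pearson correlation coefficients from a simple linear comparison (an assumption on our part), R^2 is simply R squared, and the two can be converted as in the sketch below. The labels are our own shorthand for the comparisons described earlier.

```python
import math

# R^2 values from our own comparisons in this article
ours_r2 = {
    "Utrecht, all Endomondo trips": 0.385,
    "Utrecht, Endomondo trips < 4 km": 0.588,
    "IJsselstein, all count points": 0.55,
    "IJsselstein, without Randdijk": 0.72,
}

# Taking the positive square root assumes a positive correlation
ours_r = {name: math.sqrt(r2) for name, r2 in ours_r2.items()}

# The review's threshold of R > 0.75, expressed as R^2
review_threshold_r2 = 0.75 ** 2

for name, r in ours_r.items():
    print(f"{name}: R^2={ours_r2[name]:.3f} -> R={r:.2f}")
print(f"R > 0.75 corresponds to R^2 > {review_threshold_r2:.4f}")
```

Under that assumption, the IJsselstein comparison (R of about 0.74, or about 0.85 without Randdijk) sits around or above the level the review treats as a solid correlation.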


It is striking that what is, in our measurement experience, the most decisive characteristic of activity tracking data, namely the distance cycled, remains unmentioned in all these studies. The differences in origin-destination are mentioned, but it is not emphasized that these arise because activity trackers are usually only used for cycling distances longer than a few kilometers. The question of whether Strava-like data can be a good representation of 'regional' or 'long distance' bicycle use is therefore not raised. And that is, in our opinion, precisely the most important question to ask of these datasets: in the Dutch situation, but likely also in most other countries.


Conclusions on Strava commute cycle data


It is desirable to make comparisons between Strava Metro data and local counting points in more places (in the Netherlands). An even better picture can then emerge of who and what Strava Metro data does and does not show. You can already conclude that the total/average Strava utilitarian bicycle dataset does not give a good representation of total/average Dutch bicycle use. You have to make specific subdivisions in the datasets to be able to represent the route use of specific groups to a reasonable extent.


However, if you want to use bicycle data to better understand the needs, motivations and route use of cyclists, the 'total bicycle use' or the 'average bicycle use' is not that interesting at all. Precisely because, for example, short cycle trips give a totally different picture of route use than long cycle trips (mainly due to different destinations), and those cyclists also have different preferences and motivations.

'Totals of bicycle passages' may, for instance, be relevant for pure capacity issues of cycle paths/roads, for which bicycle data seems to be used a lot at the moment. But in our opinion that is not where the greatest potential of bicycle data lies. A good bicycle policy should in essence be about insight into and understanding of the different needs of different cyclists.


And based on the various comparisons we have made so far, it is clear that Strava Metro data can provide a good representation of a specific group of cyclists/cycling trips, namely 'long distance', regional cycling trips.


Is that group of cycling activities interesting and relevant? As far as we are concerned, yes! Various studies in the Netherlands (KiM) have shown that the bicycle has the potential to reduce car use precisely for those longer trips. With the rise of the electric bike, these longer distances are becoming easier and more frequent. For that reason, Dutch municipalities and provinces are putting increasing emphasis on improving regional cycling routes between different residential areas. And for that purpose it is very valuable to chart bicycle use on longer-distance trips separately, to get a better grip on the specific route preferences and use(s) of that group of cyclists.


And local bicycle counts, on the other hand, cannot do that. Each method of measurement, each dataset, has its own possibilities and limitations. It is almost never a question of 'representative or not'.

It would therefore be nice if Strava Metro datasets were subdivided into 'distance categories' (e.g. 1-5km, 5-10km, 10-20km, >20km). It would not surprise us, based on the insights from the Endomondo data, if Strava data could then even give a decent representation of short-distance cycling.
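Such a subdivision could look like the sketch below, which assigns each trip to a distance category and counts passages per route segment and category. This is a hypothetical illustration: Strava Metro does not offer this, the trip records and segment names are made up, and we have added a '<1 km' category and let boundary values (e.g. exactly 5 km) fall into the higher category.

```python
from bisect import bisect_right
from collections import Counter, defaultdict

# Category boundaries in km, following the categories suggested in the text
EDGES_KM = [1, 5, 10, 20]
LABELS = ["<1 km", "1-5 km", "5-10 km", "10-20 km", "20+ km"]

def distance_category(km):
    """Return the distance category label for a trip length in km."""
    return LABELS[bisect_right(EDGES_KM, km)]

def passages_by_category(trips):
    """Count passages per segment, split by the distance category of the trip."""
    table = defaultdict(Counter)
    for trip in trips:
        category = distance_category(trip["km"])
        for segment in trip["segments"]:
            table[segment][category] += 1
    return table

# Hypothetical trips: total length and the segments they pass
trips = [
    {"km": 3.2, "segments": ["centre_street", "station_path"]},
    {"km": 4.8, "segments": ["centre_street"]},
    {"km": 12.5, "segments": ["station_path", "canal_route"]},
    {"km": 27.0, "segments": ["canal_route"]},
]

table = passages_by_category(trips)
for segment, counts in sorted(table.items()):
    print(segment, dict(counts))
```

With passages split this way, each distance category can be compared separately with the counting points that match it: inner-city counts with the short categories, regional counts with the long ones.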


And with all this in mind, I'll end with an appeal: do you know of any data from local bicycle counts, both inner-city and outer-city? Or of research on differences in route use between long-distance and short-distance bicycle trips? If so, we would be happy to do (even) more research to paint an (even) better picture of the opportunities this dataset offers!


Curious about the ways we translate Strava Metro data, other activity tracking data and insights into sport-recreational cycling into spatial development opportunities?

In Utrecht we used data from Endomondo to map the shared interests of recreational and utility bicycle traffic; see for example the blog beautiful-fast cycleroutes. In the study 'Fiets+OV' we used data from the Fietstelweek (the national bike count week) to map out which routes to railway stations are cycled. We translated this into development opportunities for the station environment.