Elsevier

Cities

Volume 89, June 2019, Pages 268-280

Social media and urban mobility: Using twitter to calculate home-work travel matrices

https://doi.org/10.1016/j.cities.2019.03.006

Highlights

  • Twitter data are used to study home-work urban mobility in the Madrid Metropolitan Area.

  • The detection of home and work places has been improved using Land Registry (Cadastre) maps.

  • Origin-Destination matrices were designed and expanded using different official data sources.

  • Results were validated by comparison with an official travel mobility survey.

Abstract

The proliferation of Big Data is beneficial to the study of mobility patterns in cities. This work investigates the use of social media as an efficient tool for urban mobility studies. Here, the social network Twitter was used, owing to its wealth of spatial and temporal data and the possibility of accessing data free of charge. Using a database of geotagged tweets from the Madrid Metropolitan Area covering a two-year period, this article describes the steps followed in preparing and cleansing the initial data and in visualising the results in Geographic Information Systems in the form of home-work matrices. The resulting Origin-Destination matrices were then compared with the official data of the 2014 Synthetic Mobility Survey provided by the Madrid Transport Consortium. This comparison shows that the precision offered by Twitter as a source of geographic information is adequate and that the source is efficient, thereby permitting a more in-depth analysis of flows between different zones of interest in the study area.

Introduction

The monitoring company StatCounter recently reported that, for the first time since 1980, Android had overtaken Windows as the main operating system for accessing the internet (Simpson, 2017). The virtual world is entering a new era, in which mobile phones are displacing computers as the main means of interacting with society online. The enormous growth in smartphone use is increasing the quantity of information being generated. In an increasingly technological society, with ever-greater use of the internet, nearly all inhabitants of large cities generate a digital footprint of their activities and movements, a digital trail that can be followed (Blanford, Huang, Savelyev, & MacEachren, 2015). One consequence of this digitalization of society is the prominent role of so-called Spatial Big Data: large quantities of spatial information that can be captured, communicated, aggregated, stored, and analysed (Manyika et al., 2011). The digital shadows of this process are closely intertwined with the offline, material geographies of everyday life (Jin et al., 2017). These new geographic data are produced constantly and in real time, can be acquired easily at low cost, and can be incorporated and analysed in Geographic Information Systems (Osorio & García-Palomares, 2017).

Internet users are no longer mere passive recipients of information; they have become producers of vast amounts of data, particularly through social networks (García-Palomares, Salas-Olmedo, Moya-Gómez, Condeço-Melhorado, & Gutiérrez, 2018). These social networks enable information to be created, stored, shared, and exchanged with other users, and generate a huge volume of data every day (Cao et al., 2014). Social media data have been used to study the shape of urban agglomerations based on people's activity (Zhen, Cao, Qin, & Wang, 2017), and are useful for analysing urban structure and related socioeconomic performance (Martí, Serrano-Estrada, & Nolasco-Cirugeda, 2017; Shen & Karimi, 2016; Zhang, Zhou, & Zhang, 2017). Twitter, an application based on sending short texts or other multimedia content, stands out among the most widely used social networks. Geo-located Twitter is a free and easily available global data source that stores millions of digital records of human activity in space and time (Hawelka et al., 2014). According to the company's own data, approximately 500 million tweets were sent worldwide every day in 2017, and the network had 328 million monthly active users, 82% of whom generated content from their smartphones.

The new data sources provide information that is extremely useful in mobility studies. If we know how each person's location changes, based on their digital footprint, we can analyse their general mobility patterns. Gathering geospatial data provides a unique opportunity to gain valuable insight into information flow and social networking within a society (Stefanidis, Crooks, & Radzikowski, 2013). The evolution of transport-demand modelling techniques has created a need for high-resolution databases with aggregated economic and sociodemographic attributes for modelling daily travel behaviour (Rashidi, Abbasi, Maghrebi, Hasan, & Waller, 2017). In this respect, Twitter is a particularly valuable source of data for the study of mobility, because it is easy to use on any mobile device and because it offers the possibility of geotagging messages. Thus, the tweets of users who activate the geolocation service on their accounts contain, in addition to semantic and temporal information, the geographic location from which each tweet was sent. This makes it possible to study population distribution at any given time (see, for example, Ciuccarelli, Lupi, & Simeone, 2014; Longley & Adnan, 2016; or García-Palomares et al., 2018).

Data from social networks, and most data from Big Data sources in general, are not created to analyse urban mobility. This is a common issue whenever social network data are used in disciplines such as urban planning and mobility. Consequently, using social network data to study mobility has major weaknesses compared with data from sources created specifically for this purpose, which are normally household mobility surveys. One conditioning factor is the importance of data cleaning and pre-processing, so that the final database contains only data from which reliable mobility information can be obtained. Another factor to bear in mind is the limited amount of data per user. Social networks provide low temporal resolution, measured as the average time between consecutive tweets sent by a user. This makes it difficult, for example, to identify activity sites associated with trip purposes other than home and work. Low temporal resolution also means that data must be collected over much longer periods: with mobile telephone data, 2 or 3 months are normally enough, whereas social network data require samples covering longer periods (for example, 1–2 years). Moreover, while mobile telephone data cover very large user samples, social network samples are smaller and also more biased, since users of these networks are concentrated in certain sociodemographic groups. Finally, using social network data for a purpose such as studying mobility requires a results validation process. Here, we place particular emphasis on this phase and validate the results at different spatial aggregation scales, in order to see how far these data can go in producing travel matrices.
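Temporal resolution, as defined above, can be measured for each user as the mean gap between consecutive tweets. A minimal Python sketch, assuming tweets are available as hypothetical `(user_id, timestamp)` pairs; the function name and the toy data are illustrative only:

```python
from collections import defaultdict
from datetime import datetime


def mean_inter_tweet_hours(tweets):
    """Return {user_id: mean hours between consecutive tweets}.

    `tweets` is an iterable of (user_id, datetime) pairs. Users with a
    single tweet have no interval and are omitted from the result.
    """
    times_by_user = defaultdict(list)
    for user_id, ts in tweets:
        times_by_user[user_id].append(ts)

    result = {}
    for user_id, times in times_by_user.items():
        if len(times) < 2:
            continue
        times.sort()
        # Gaps between consecutive tweets, converted to hours.
        gaps = [(b - a).total_seconds() / 3600 for a, b in zip(times, times[1:])]
        result[user_id] = sum(gaps) / len(gaps)
    return result


sample = [
    ("u1", datetime(2016, 6, 1, 8, 0)),
    ("u1", datetime(2016, 6, 1, 20, 0)),
    ("u1", datetime(2016, 6, 3, 8, 0)),
    ("u2", datetime(2016, 6, 2, 9, 0)),
]
print(mean_inter_tweet_hours(sample))  # {'u1': 24.0}
```

A large mean gap signals low temporal resolution; in practice the median may be a more robust summary, since a few long silences can dominate the mean.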

In studying urban mobility, the ability to generate Origin-Destination (OD) travel matrices is fundamental for carrying out diagnostics, predicting travel demand, modelling transport, and optimising network usage (Gao et al., 2014). OD flow matrices provide useful information for ridership forecasting, service planning, and control strategies; for modelling purposes, the origins and destinations of trips and modal preferences are needed (Rashidi et al., 2017). Traditionally, household surveys or traffic counts were used to obtain travel matrices. However, these sources are expensive, time-consuming, and static, and they rely on small population samples (Iqbal, Choudhury, Wang, & González, 2014). Social networks in general, and Twitter in particular, provide new sources for estimating OD travel matrices. They enable us to work with data of high spatial-temporal resolution, to access larger population samples, and to obtain these data free of charge (when downloaded via streaming). The aim of this paper is to validate the use of Twitter data for assessing urban mobility patterns, using the Madrid Metropolitan Area as a case study. Home-to-work trips were chosen as the trip purpose to be studied. The economic value of workplaces is of practical as well as academic interest, since workplaces are key spaces for urban design and planning. Understanding how workers move throughout the day is important for designing and managing transport systems and public spaces (Pajević & Shearmur, 2017).
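At its simplest, an OD travel matrix is a count of trips for each origin-destination pair. The Python sketch below assumes each user has already been reduced to one hypothetical `(home_zone, work_zone)` pair; zone identifiers and flows are invented for illustration and do not reflect the Madrid results:

```python
from collections import Counter


def od_matrix(home_work_pairs, zones):
    """Build a square home-work OD matrix: rows = origin (home), columns = destination (work)."""
    index = {zone: i for i, zone in enumerate(zones)}
    counts = Counter(home_work_pairs)
    matrix = [[0] * len(zones) for _ in zones]
    for (home, work), n in counts.items():
        matrix[index[home]][index[work]] += n
    return matrix


zones = ["Z1", "Z2", "Z3"]
pairs = [("Z1", "Z1"), ("Z3", "Z1"), ("Z3", "Z1"), ("Z2", "Z3")]
for zone, row in zip(zones, od_matrix(pairs, zones)):
    print(zone, row)
# Z1 [1, 0, 0]
# Z2 [0, 0, 1]
# Z3 [2, 0, 0]
```

Diagonal cells hold users who live and work in the same zone; off-diagonal cells hold commuting flows between zones.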

Previous studies that obtained OD travel matrices from new data sources mainly used mobile phone data, the so-called call detail records (CDRs) (Chen, Ma, Susilo, Liu, & Wang, 2016). However, mobile phone companies are extremely reluctant to hand over their data, and when they do, the charges are substantial. Furthermore, CDR information is usually linked to the coverage area of the antennae, which means that analyses are frequently carried out on Voronoi polygons. As a result, the spatial resolution of CDRs is sometimes coarser, and their spatial units frequently do not match the distribution of land uses or the transport zones needed for accurate modelling of future scenarios (Järv, Tenkanen, & Toivonen, 2017). Bearing in mind these two main limitations of previous studies based on phone data, the main contribution of this paper is to validate Twitter as an alternative to phone data for estimating OD travel matrices and detecting home-work trips, benefiting from: a) free-of-charge data; b) the availability of geolocated tweets and the possibility of spatially joining them to transport zones (which are of great interest to transport managers).
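Spatially joining geotagged tweets to transport zones amounts to a point-in-polygon test. A minimal sketch using the classic ray-casting algorithm, with hypothetical rectangular zones near Madrid given as (longitude, latitude) vertices; a production workflow would more likely run a GIS spatial join (for example GeoPandas `sjoin`) against the official zone boundaries:

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside `polygon`, a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # The edge straddles the horizontal line through y: find where it crosses.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            # Each crossing to the right of the point toggles inside/outside.
            if x < x_cross:
                inside = not inside
    return inside


def assign_zone(lon, lat, zones):
    """Return the id of the first zone whose polygon contains the point, else None."""
    for zone_id, polygon in zones.items():
        if point_in_polygon(lon, lat, polygon):
            return zone_id
    return None


# Hypothetical zones, not real Madrid transport zones.
zones = {
    "zone_west": [(-3.8, 40.3), (-3.7, 40.3), (-3.7, 40.5), (-3.8, 40.5)],
    "zone_east": [(-3.7, 40.3), (-3.6, 40.3), (-3.6, 40.5), (-3.7, 40.5)],
}
print(assign_zone(-3.75, 40.4, zones))  # zone_west
print(assign_zone(-3.65, 40.4, zones))  # zone_east
print(assign_zone(-3.50, 40.4, zones))  # None
```

Points falling exactly on a shared boundary need an explicit tie-breaking rule; here they simply go to the first matching zone in dictionary order.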

When processing Twitter data to obtain the matrices, several methodological improvements were introduced relative to previous work. First, information on land uses from Land Registry (Cadastre) maps was used. These maps offer high spatial detail, providing greater precision when detecting home and work places. Second, various expansion factors were evaluated for the sample. On the one hand, the authors worked with expansions based on trip origins, using official population data by place of residence (Census). On the other hand, the matrices were expanded using data on trip destinations, drawing on workplace information (from National Institute of Social Security records). The estimations were carried out at two scales of analysis: a) municipality level (with finer spatial detail); b) metropolitan-zone level. The results were validated by comparison with data from the Synthetic Mobility Survey carried out by the Madrid Transport Consortium in 2014, which made it possible to evaluate the quality of the matrices obtained from Twitter data and their sensitivity both to the type of expansion factor used and to the degree of spatial disaggregation.
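Both expansion strategies can be read as ratio scalings of the sample matrix: origin-based expansion multiplies row i by the resident population of zone i divided by the number of sampled users living there, and destination-based expansion multiplies column j by the number of workers in zone j divided by the number of sampled users working there. The Python sketch below assumes exactly that form and adds the squared Pearson correlation as one possible validation metric against survey flows. The paper's precise expansion formulae and validation statistics are not given in this section, and all numbers are toy values:

```python
def expand_by_origin(matrix, residents):
    """Scale row i by residents[i] / (sampled users whose home is zone i)."""
    expanded = []
    for row, population in zip(matrix, residents):
        sample = sum(row)
        factor = population / sample if sample else 0.0
        expanded.append([t * factor for t in row])
    return expanded


def expand_by_destination(matrix, workers):
    """Scale column j by workers[j] / (sampled users whose workplace is zone j)."""
    col_totals = [sum(col) for col in zip(*matrix)]
    factors = [w / s if s else 0.0 for w, s in zip(workers, col_totals)]
    return [[t * f for t, f in zip(row, factors)] for row in matrix]


def r_squared(estimated, observed):
    """Squared Pearson correlation between two matrices, compared cell by cell."""
    x = [v for row in estimated for v in row]
    y = [v for row in observed for v in row]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov ** 2 / (var_x * var_y)


sample = [[8, 2], [1, 4]]           # Twitter users: rows = home zone, cols = work zone
residents = [1000, 500]             # hypothetical resident population per zone
workers = [900, 300]                # hypothetical registered workers per zone
survey = [[780, 220], [120, 380]]   # hypothetical survey flows

print(expand_by_origin(sample, residents))     # [[800.0, 200.0], [100.0, 400.0]]
print(expand_by_destination(sample, workers))  # [[800.0, 100.0], [100.0, 200.0]]
print(round(r_squared(expand_by_origin(sample, residents), survey), 3))  # 0.998
```

Origin-based expansion reproduces resident totals by row, while destination-based expansion reproduces worker totals by column; comparing each against the survey indicates how sensitive the results are to that choice.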

This article is divided into seven sections. After this introduction, Section 2 reviews previous research using Twitter and OD matrices. Section 3 defines the study area, and Section 4 introduces the data and sources used. Section 5 describes the methodology. The results are analysed in Section 6, and Section 7 sets out the conclusions.

Section snippets

Literature review

Urban mobility is increasingly complex and less sustainable, characterised by a growing number of trips, a greater diversity of trip purposes, more intensive use of motorized transport and longer trips (Gutiérrez & García-Palomares, 2007). Faced with this complexity, traditional sources need to be complemented with new data of higher spatial and temporal resolution. In this context, owing to their potential for tracking citizens' digital footprints, Big Data emerge as an

Study area

In evaluating the opportunities offered by Twitter data for estimating OD travel matrices, this study focuses on the Madrid Metropolitan Area. It has an estimated population of 6 million inhabitants (2017), of whom 3.2 million live in the city of Madrid. Madrid is the metropolitan area with the highest levels of population, activity, and services in Spain.

We worked with two levels of spatial disaggregation: an initial analysis with 71 spatial units, using the fifty municipalities of the metropolitan area

Twitter

The initial database for this work contains a total of 2,229,753 tweets, all geotagged and produced by 171,631 users located inside the Madrid Metropolitan Area. These tweets were compiled over a two-year period (from 1st June 2016 to 31st May 2018). Each tweet has information related to the user identification number (ID), username, latitude and longitude, date and time, language, and the hashtags it includes.
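With fields like those listed above, a common way to infer home and work locations is to take each user's most frequent zone during assumed home hours and assumed working hours. The Python sketch below is a hypothetical heuristic rather than the authors' procedure: the time windows, the `Tweet` record, and the `land_use` label (standing in for a lookup against Land Registry (Cadastre) parcels) are all assumptions:

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Tweet:
    user_id: str
    lat: float
    lon: float
    created_at: datetime
    zone: str      # zone id, assumed to come from a prior spatial join
    land_use: str  # assumed parcel label, e.g. "residential" or "office"


def is_home_time(ts):
    # Hypothetical window: weekends, or weekday evenings/nights (21:00-06:59).
    return ts.weekday() >= 5 or ts.hour >= 21 or ts.hour < 7


def is_work_time(ts):
    # Hypothetical window: weekdays 09:00-16:59.
    return ts.weekday() < 5 and 9 <= ts.hour < 17


def detect_home_work(tweets):
    """Return (home_zone, work_zone) for one user's tweets, or None if either is missing."""
    home = Counter(t.zone for t in tweets
                   if is_home_time(t.created_at) and t.land_use == "residential")
    work = Counter(t.zone for t in tweets
                   if is_work_time(t.created_at) and t.land_use != "residential")
    if not home or not work:
        return None
    return home.most_common(1)[0][0], work.most_common(1)[0][0]


tweets = [
    Tweet("u1", 40.40, -3.70, datetime(2017, 3, 6, 22, 15), "Z1", "residential"),  # Monday night
    Tweet("u1", 40.41, -3.70, datetime(2017, 3, 7, 23, 0), "Z1", "residential"),   # Tuesday night
    Tweet("u1", 40.45, -3.69, datetime(2017, 3, 8, 11, 30), "Z2", "office"),       # Wednesday morning
]
print(detect_home_work(tweets))  # ('Z1', 'Z2')
```

Users for whom no home or no work zone can be found are discarded, which is one reason the usable sample ends up much smaller than the initial number of users.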

Data on resident and employment population

The data on the resident population used to expand the matrices were obtained from

Methodology

This section describes the methodology, from obtaining the initial data to verifying the results. Fig. 3 summarises the methodological flow.

The obtained sample: Twitter users according to home and work places

The first results relate to the registered population in each municipality and district of the study area, based on the home locations identified in the Twitter database. These data make up the sample of residents that will be used to estimate the OD matrices. As already indicated, the calculations are based on a total of 20,744 users for whom the municipality or district of both home and workplace could be identified. These users represent almost 0.7% of the

Conclusions

The emergence of geotagged Big Data and the enormous possibilities it offers are revolutionising studies and professional practice in transport planning. One of the spheres in which the new data are having the greatest impact is mobility, especially for obtaining OD travel matrices. Most studies have relied on mobile phone data (so-called CDRs; see Chen et al., 2016), which make it possible to collect enormous population samples with high temporal resolution. However,

Acknowledgments

The authors gratefully acknowledge funding from the Spanish Ministerio de Educación, Cultura y Deporte (Program FPUAP2015-0147), the Spanish Ministerio de Economía, Industria y Competitividad (MINECO) and European Regional Development Fund (ERDF) (Project TRA2015-65283-R) and the Madrid Regional Government (SOCIALBIGDATA-CM, S2015/HUM-3427).

References (43)

  • P. Martí et al. (2017). Using locative social media and urban cartographies to identify and locate successful urban plazas. Cities.
  • T.H. Rashidi et al. (2017). Exploring the capacity of social media data for modelling travel behaviour: Opportunities and challenges. Transportation Research Part C: Emerging Technologies.
  • T. Shelton et al. (2015). Social media and the city: Rethinking urban socio-spatial inequality using user-generated geographic information. Landscape and Urban Planning.
  • Y. Shen et al. (2016). Urban function connectivity: Characterisation of functional urban streets with social media check-in data. Cities.
  • J.L. Toole et al. (2015). The path most traveled: Travel demand estimation using big data resources. Transportation Research Part C: Emerging Technologies.
  • P. Zhang et al. (2017). Quantifying and visualizing jobs-housing balance with big data: A case study of Shanghai. Cities.
  • F. Zhen et al. (2017). Delineation of an urban agglomeration boundary based on Sina Weibo microblog ‘check-in’ data: A case study of the Yangtze River Delta. Cities.
  • R. Ahas et al. (2010). Using mobile positioning data to model locations meaningful to users of mobile phones. Journal of Urban Technology.
  • J.I. Blanford et al. (2015). Geo-located tweets. Enhancing mobility maps and capturing cross-border movement. PLoS One.
  • N. Caceres et al. (2012). Traffic flow estimation models using cellular phone data. IEEE Transactions on Intelligent Transportation Systems.
  • N. Caceres et al. (2007). Deriving origin–destination data from a mobile phone network. IET Intelligent Transport Systems.