Our guest today is Danny Arribas-Bel, PhD. He is Senior Lecturer in Data Science at the University of Liverpool, and Deputy Programme Director for Urban Analytics at the Alan Turing Institute. Although his focus is now on urban planning and development, he began his academic career with a PhD in economics. This background in the economic forces that shape cities and communities gives Danny valuable insight into why we see certain patterns in these spaces. Through the Alan Turing Institute, Danny contributes to the Urban Grammar project, a spatial data science project that aims to classify the world’s cities based on their unique forms and functions.
What is Urban Grammar?
Looking across the world, and across time, we have seen many cities rise and fall. While none of these cities are close to identical, they share an intriguing number of similarities in how they are built, populated, and utilized by their residents. Those who study the urban sciences, from planning to architecture to economics, have great interest in being able to quantify, qualify, and compare these places across the dimensions of time and space. This is where the Alan Turing Institute’s Urban Grammar project comes in.
In the same way that we can break down the grammar and syntax of a language, we can begin to break down the characteristics of cities. Before this can happen, however, there needs to be a more or less consistent dataset to draw from for analysis. Because the characteristics of cities and their residents are constantly changing, finding data with the temporal granularity needed to track change is a huge challenge.
The best and most complete data for Urban Grammar’s purposes come from sources like the Census or the Ordnance Survey. Official sources like these are rich in attributes, and reliable and consistent in their collection methods. The issue, of course, is that Census data is only collected every 10 years. To fill in some of the temporal gaps, researchers turn to spatial data derived from satellite imagery. This may take the form of crowdsourced OpenStreetMap datasets, which often come with built-in attributes, or datasets generated through Earth observation and artificial intelligence workflows.
Artificial intelligence (AI) practices like deep learning and machine learning are invaluable to Urban Grammar’s mission to make data available for studying spatial changes in our urban landscapes. GIS workflows using AI generally begin by collecting and labeling a large amount of imagery to create training samples. These samples are then fed into an algorithm, training a model that can classify new, similar imagery on its own.
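The label-then-train loop described above can be sketched in miniature. The sketch below is purely illustrative: it uses synthetic 8x8 "patches" in place of real satellite imagery, and a nearest-centroid classifier as a stand-in for a deep network. None of these names or values come from the Urban Grammar project itself.

```python
import numpy as np

rng = np.random.default_rng(42)

# Step 1: "collect and label" imagery -- here, synthetic 8x8 pixel patches.
# Class 0 mimics darker built-up areas, class 1 brighter vegetated areas.
def make_patches(mean, n):
    return rng.normal(loc=mean, scale=0.1, size=(n, 8 * 8))

X_train = np.vstack([make_patches(0.2, 50), make_patches(0.8, 50)])
y_train = np.array([0] * 50 + [1] * 50)

# Step 2: "train" -- compute one centroid per class, a toy stand-in
# for fitting a deep model on the labeled samples.
centroids = np.array([X_train[y_train == c].mean(axis=0) for c in (0, 1)])

# Step 3: classify new, unlabeled imagery automatically.
def predict(patches):
    dists = np.linalg.norm(patches[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

X_new = np.vstack([make_patches(0.25, 5), make_patches(0.75, 5)])
print(predict(X_new))
```

A real workflow would swap the centroid step for a convolutional network and the synthetic patches for georeferenced imagery, but the shape of the pipeline (label, train, then predict on unseen data) is the same.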
Again, the biggest challenge here is the availability of satellite imagery at a temporal scale useful for tracking change, and of high enough quality for deep learning workflows. Generally, this is addressed by combining data from many sources, including government-provided satellite imagery, Census and Ordnance Survey data, and crowdsourced geospatial data.
Data Classification and Signatures
Once the necessary data has been aggregated, it is time to move into the equally large task of analysis. As mentioned, analysis at the scale necessary to understand entire cities requires deep learning and artificial intelligence workflows in order to keep pace with the rate at which the underlying data becomes available. These workflows require labeled data as input, and producing classified data is their ultimate goal.
Considering the diversity of study areas researchers deal with, how are these classes created?
Well, first and foremost, Urban Grammar approaches the categorization of cities through a framework they call signatures. Signatures are the classes that result from analyzing the form and function of different spaces using concepts from morphometrics. Form, in this context, is what a space “looks like”: the building footprints, street network, or significant natural features revealed in imagery. Essentially, if you were learning a language and viewing flashcards with pictures of different types of buildings and features, form would be what the structure “is”.
Function is the other side of the coin: what the structure “does”. For example, while the form of a house is a building, its function is a living space. The form of a highway may be a street, but the function of that street is transportation. Oftentimes form and function go hand in hand; one follows the other. Form and function can be distilled into signatures through a plug-and-play process using AI algorithms, fine-tuned for the best inputs and outputs.
Considering urban spaces are unique and different from each other, the inputs and outputs of these algorithms will be similarly unique, and likely will not result in uniform classifications across the world. This is where humans enter the system, providing checks and “ground truthing” on the resulting data by adding the context and knowledge of human and spatial sciences.
Applying the Sciences to Spatial Data
Understanding the function and form of the world’s cities is not a new concept. This has been a goal, direct or indirect, of many disciplines including architecture, urban planning and design, civil engineering, political science, archeology, GIS, history, etc.
If you have ever visited ruins on vacation, you have likely found yourself theorizing the day to day uses of different spaces on site through your modern lens. This can be a fun conversation starter with your traveling companions, and you may find yourself getting a bit creative with assigning applications to the space only to be surprised by an informational board that provides some correcting historical context.
Every dataset acts as a snapshot in time. Maps themselves become outdated as soon as they are put into print. By collecting, categorizing, and documenting datasets over time, researchers can build more nuanced pictures of the past, present, and future. A large part of this process for modern scientists is writing and utilizing code. Where once upon a time physicists used mathematical equations and the written word to document processes, now, researchers find code and the algorithms themselves to function as reusable and portable documentation for their research.
The building blocks of cities are pretty similar all around. Streets are streets and buildings are buildings. Every society has the same basic needs, food, lodging, work, recreation, etc. For this reason, AI practices like transfer learning, which promotes interoperability of algorithms by reusing the most base levels of code, then customizing it based on needs, are growing in popularity amongst data scientists.
Of course, these algorithms need to be adapted based on location to accommodate for differences in source data, and the technical and cultural variations inherently seen across our world. Adding regional context may add some steps to the process, but ultimately it results in richer, more relevant data for the area. Data may be beautiful, but it hardly exists just for the sake of looking pretty. We need data to make decisions. More locally relevant data results in better policy, and resource management decisions by those in charge. It allows for more nuanced predictions for the future, and a more acutely critical lens of the past.
A note from the author: If you are interested in learning more about how form and function have changed over time at the scale of day-to-day life, Bill Bryson’s book “At Home: A Short History of Private Life” does an excellent job of deconstructing the changes in use of our built world.