Air quality monitoring networks are critical for informing urban policy, especially where urban populations are exposed to unprecedented levels of air pollution. High costs, however, limit city governments’ ability to deploy reference-grade air quality monitors at scale; for instance, only 33 reference-grade monitors are available for the entire territory of Delhi, India, spanning 1500 sq km with 15 million residents. In this paper, we describe a high-precision spatio-temporal prediction model that can be used to derive fine-grained pollution maps. We utilize two years of data from a monitoring network of 28 custom-designed, low-cost, portable air quality sensors covering a dense region of Delhi. The model combines message-passing recurrent neural networks with conventional spatio-temporal geostatistics models to achieve high predictive accuracy despite the high data variability and intermittent data availability of low-cost sensors (due to sensor faults, network, and power issues).

In this paper, we describe a methodology to model and predict urban air quality at a fine-grained level using dense but noisy low-cost sensors. We seek to answer two main questions: (i) how can we use a network of low-cost, portable air quality monitors to build a fine-grained pollution heatmap of a city that provides accurate predictions, and (ii) does it help to augment local governments' existing monitoring networks with low-cost air quality sensors?

We deploy a network of 28 low-cost sensors, many of them concentrated in the south Delhi area, in collaboration with Kaiterra, a company that makes low-cost air quality monitors and air filters. We dramatically increase the density of the deployment by 28× in Delhi (area 573 mi², 28 sensors) compared to previous deployments (Xi’an: area 3898 mi², 8 low-cost sensors). Further, the longitudinal dataset we have captured spans 2 years, whereas prior work captured at most a few weeks of data. This allows us to model long-term seasonal changes and to train more complex neural network models that can adapt to seasonal and daily patterns. We build on prior work and model the pollution network in its entirety, with prediction models at each sensor location using data from nearby sensor locations.
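To make the idea of predicting at one location from nearby sensors concrete, here is a minimal sketch using inverse-distance weighting (IDW). This is not the model described in the paper, which combines message-passing recurrent neural networks with geostatistics; IDW is swapped in only as a simple, well-known spatial baseline. The function names, the `power` parameter, and the coordinates in the usage note are all illustrative assumptions.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def idw_estimate(target, neighbors, power=2.0):
    """Estimate PM2.5 at `target` (lat, lon) from nearby readings.

    `neighbors` is a list of (lat, lon, pm25) tuples. Closer sensors get
    larger weights (1 / distance**power). This is an illustrative baseline,
    not the authors' method.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for lat, lon, pm25 in neighbors:
        distance = haversine_km(target[0], target[1], lat, lon)
        if distance == 0.0:
            # A sensor exactly at the target location: use its reading.
            return pm25
        weight = 1.0 / distance ** power
        weighted_sum += weight * pm25
        weight_total += weight
    return weighted_sum / weight_total
```

For example, `idw_estimate((28.6, 77.2), [(28.6, 77.1, 100.0), (28.6, 77.3, 200.0)])` weights the two hypothetical readings almost equally because they are about the same distance from the target, so the estimate lands close to their average.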

Findings

Our data consist of hourly averaged PM2.5 concentrations from the 28 low-cost sensors and the 32 government monitors (60 monitors in total), collected over a period of 24 months, from May 1, 2018, to May 1, 2020. We use the data until Oct 30, 2019 for training (75%) and hold out the remainder (25%) for testing. We report two criteria: the root mean squared error (RMSE) and the mean absolute percentage error (MAPE). We evaluate our models on the data from the combined set of our 28 low-cost sensors and the 32 government monitors, as well as separately on each set. For each of these locations, we compare our model-based predictions with the ground-truth measurement from the pollution sensor.
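The evaluation setup above can be sketched in a few lines of Python: a chronological train/test split at a cutoff date, plus the two reported error measures. This is only an illustration under stated assumptions. The record layout (`(timestamp, value)` tuples), the function names, and the choice to skip zero-valued ground truth in MAPE are not taken from the paper.

```python
import math
from datetime import datetime


def chronological_split(records, cutoff):
    """Split (timestamp, value) records into train (<= cutoff) and test (> cutoff).

    Splitting by time rather than at random keeps future readings out of the
    training set. The record layout is an assumption made for this sketch.
    """
    train = [r for r in records if r[0] <= cutoff]
    test = [r for r in records if r[0] > cutoff]
    return train, test


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the measurements."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def mape(y_true, y_pred):
    """Mean absolute percentage error, as a percentage.

    Pairs with a zero ground-truth value are skipped to avoid division by
    zero; that handling is an assumption, not something the paper specifies.
    """
    ratios = [abs((t - p) / t) for t, p in zip(y_true, y_pred) if t != 0]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical hourly readings around the cutoff described in the text.
readings = [
    (datetime(2019, 10, 30, 22), 180.0),
    (datetime(2019, 10, 30, 23), 175.0),
    (datetime(2019, 10, 31, 0), 190.0),
]
train, test = chronological_split(readings, datetime(2019, 10, 30, 23))
```

RMSE penalizes large misses more heavily, while MAPE expresses error relative to the measured concentration, which makes results comparable between heavily and lightly polluted periods.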
