The 2025 EY Open Science AI and Data Challenge: Cooling Urban Heat Islands Experience


This challenge focuses on a phenomenon known as the urban heat island effect, which occurs due to the high density of buildings and the lack of green space and water bodies in urban areas. Temperature differences between rural and urban environments can exceed 10 degrees Celsius in some cases and cause significant health, social and energy-related issues.
The goal of the challenge is to develop a machine learning model to predict heat island hotspots in an urban location. Additionally, the model should be designed to discern and highlight the key factors that contribute significantly to the development of these hotspots within city environments.
Data Format
Participants were given near-surface air temperature data in an index format, which was collected on 24 July 2021 using a ground traverse in the Bronx and Manhattan region of New York City. This dataset constitutes traverse points (latitude and longitude) and their corresponding UHI (Urban Heat Island) index values.
It is important that the model does not use latitude and longitude as input parameters.
Problem Statement Analysis
After going through articles and books on remote sensing and the urban heat effect, we concluded that the following factors contribute to urban heat:
Lack of vegetation.
High building density in urban areas (which blocks wind, resulting in less dispersion of heat).
Lack of water bodies.
Concrete, which absorbs heat from the sun and releases it slowly at night (making nights warmer).
Data Collection
From our analysis, we determined that we needed data on land surface temperature, vegetation, and water bodies, so our team collected remote sensing band data from the Landsat 8 & 9 and Sentinel-2 datasets, which the EY team had also suggested. Our team also went through other datasets, such as the biodiversity and thermal emissivity datasets in NASA Earth Science and Microsoft Planetary Computer, but did not find enough diversity in them.
Graphs Analysis
From the graphs, we concluded that the data quality is good, since the values show enough diversity.
Features
NDVI (Normalized Difference Vegetation Index)
Red band
Blue band
Green band
Land Surface temperature band
NDWI (Normalized Difference Water Index)
trad - thermal radiance band
emis - emissivity band
swir - short wave infrared
These are the variables we used to develop the model. We extracted the bands using the Microsoft Planetary Computer API.
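NDVI and NDWI both follow the standard normalized-difference form, so they can be computed directly from the extracted bands. Here is a minimal sketch, assuming the bands are already loaded as NumPy arrays of reflectance values. The near-infrared (NIR) band is an assumption on my part: both indices need it, but it does not appear in the list above. The function names are hypothetical, and NDWI is shown in its green/NIR form (other variants exist).

```python
import numpy as np


def normalized_difference(a, b):
    """Return (a - b) / (a + b), with NaN wherever the denominator is zero."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = a + b
    out = np.full(denom.shape, np.nan)
    np.divide(a - b, denom, out=out, where=denom != 0)
    return out


def ndvi(nir, red):
    """Vegetation index: high where NIR reflectance dominates red."""
    return normalized_difference(nir, red)


def ndwi(green, nir):
    """Water index (green/NIR form): high where green dominates NIR."""
    return normalized_difference(green, nir)


# Toy reflectance values (made up for illustration, not challenge data).
nir = np.array([0.5, 0.3, 0.0])
red = np.array([0.1, 0.3, 0.0])
green = np.array([0.2, 0.4, 0.0])

print("NDVI:", ndvi(nir, red))
print("NDWI:", ndwi(green, nir))
```

Returning NaN for empty pixels (instead of raising a division warning) keeps masked or no-data pixels easy to filter out later.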
Our Model Methodology
Our initial model, built from these variables using Gradient Boosting, achieved an R² score of 0.65 on the test data. This indicated a need to generate new derived features.
Since the heat effect is cumulative, we decided to generate derived features that capture the cumulative effect of the surrounding points.
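For readers who want to reproduce this kind of baseline, here is a minimal sketch using scikit-learn's GradientBoostingRegressor. The data is synthetic and the hyperparameters are illustrative assumptions, not the values our team used. The feature importances at the end are one simple way to see which inputs the model relies on.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the band features and the UHI index (assumption).
rng = np.random.default_rng(0)
n_samples = 500
X = rng.normal(size=(n_samples, 4))
y = 1.0 + 0.05 * X[:, 0] - 0.03 * X[:, 1] + rng.normal(scale=0.02, size=n_samples)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Illustrative hyperparameters; tune them on real data.
model = GradientBoostingRegressor(
    n_estimators=300, learning_rate=0.05, max_depth=3, random_state=0
)
model.fit(X_train, y_train)

r2 = r2_score(y_test, model.predict(X_test))
importances = model.feature_importances_

print("R2 on held-out data:", r2)
print("Feature importances:", importances)
```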
One Interesting Observation
As the number of surrounding points considered increases, the correlation of bands such as SWIR with the UHI index also increases.
derived_variable = f(variable) = average of the variable over its surrounding points
f - average pooling
Using this effect, our model's R² score improved to 0.77.
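Average pooling over surrounding points can be sketched as a moving-window mean. The version below assumes the band is a 2D raster grid and that "surrounding points" means a square window around each pixel. The function name and the radius parameter are hypothetical, and edges are handled by repeating the border values.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def average_pool(band, radius):
    """Mean of each pixel's (2*radius+1) x (2*radius+1) neighbourhood.

    The output has the same shape as the input; borders are padded by
    repeating the edge values.
    """
    band = np.asarray(band, dtype=float)
    size = 2 * radius + 1
    padded = np.pad(band, radius, mode="edge")
    windows = sliding_window_view(padded, (size, size))
    return windows.mean(axis=(-2, -1))


# Tiny example grid (made-up values).
band = np.arange(9).reshape(3, 3)
pooled = average_pool(band, radius=1)
print(pooled)
```

A larger radius pulls in more of the neighbourhood, which is how the number of surrounding points considered can be increased step by step.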
Correlation Index
The correlation with the UHI index increased from about 30% to 45% for variables such as Landsat temperature. For the SWIR band, it increased from 18% to 42%. For other bands such as red, blue, and green, the correlation also improved when using average pooling.
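A comparison like this can be run by computing the Pearson correlation between the target and the feature pooled at several window sizes. The sketch below uses scipy.ndimage.uniform_filter for the pooling. The data is synthetic and deliberately built so that the target depends on the neighbourhood, so the numbers it prints say nothing about the challenge data. The helper names are hypothetical.

```python
import numpy as np
from scipy.ndimage import uniform_filter


def pearson_r(x, y):
    """Pearson correlation between two arrays, flattened."""
    return float(np.corrcoef(np.ravel(x), np.ravel(y))[0, 1])


def correlation_by_window(feature, target, sizes=(1, 3, 7)):
    """Correlation of the target with the feature pooled at each window size.

    A window size of 1 leaves the feature unchanged (no pooling).
    """
    feature = np.asarray(feature, dtype=float)
    return {
        size: pearson_r(uniform_filter(feature, size=size, mode="nearest"), target)
        for size in sizes
    }


# Synthetic raster: the target is a smoothed copy of the feature plus noise.
rng = np.random.default_rng(0)
feature = rng.normal(size=(60, 60))
target = uniform_filter(feature, size=7, mode="nearest") + rng.normal(
    scale=0.05, size=(60, 60)
)

scores = correlation_by_window(feature, target)
for size, r in scores.items():
    print(f"window {size}x{size}: r = {r:.2f}")
```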
Conclusion
During the hackathon, our team developed Gradient Boosting and XGBoost models, using a low learning rate to favor generalization. The main challenge was collecting cloud-free remote sensing data and preprocessing it accurately, which we achieved through analysis and experimentation. Achieving better scores was tough, but understanding the problem statement is key in data science: it offers insights into derived variables, which in turn lead to a better model.
Written by

Muppala Hindu
I'm currently pursuing Computer Science and Engineering in Chaitanya Bharathi Institute of Technology, Hyderabad.