Using Fabric OrgApps + Notebooks For Geospatial Data Exploration

Simon Willison is one of my favorite bloggers. In fact, what I blog about, and how I blog & test, is inspired by him. A couple of weeks ago he wrote a blog post about the Foursquare Places data that has been open-sourced. I was exploring this dataset and ended up creating a few maps. I love OrgApps in Fabric, and I truly believe that as it matures, it will be THE way for analysts & data scientists to provide rich insights + traditional reports to business users. Notebooks can augment Power BI reports to provide insights that are otherwise not possible. I have submitted a session on this topic to FabCon ‘25, so let’s see. If it is selected, I hope to show how transformational it is and how businesses can use it.
I won’t go into much detail about the code below, but a few things to note:
I used daft to scan 104M rows from an S3 bucket in a Fabric Python notebook without downloading the entire dataset. Why daft? Because it’s optimized for reading S3 data. If you run the notebook below, you will see there is minimal memory & CPU consumption. In his blog post above, Simon used DuckDB. I cleaned and transformed the data lazily using daft.
I also used Polars because it has a nice Altair integration.
I used Folium for creating interactive maps and Plotly for the time series.
The notebook is embedded in OrgApps for users to explore the data. You can also embed a Power BI report using QuickVisualize for users to explore the data (as long as it is a small dataset).
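To make the lazy-scan point above a bit more concrete, here is a minimal sketch and not the notebook's actual code. The helper names (`release_glob`, `scan_places`, `preview_us_places`) are hypothetical, anonymous access to the bucket is an assumption, and the column names (`name`, `latitude`, `longitude`, `country`) are assumed from the published Places schema. Nothing is read from S3 until a result is materialized.

```python
def release_glob(release_date: str, dataset: str = "places") -> str:
    """Build the S3 glob for one release of the dataset (hypothetical helper)."""
    return f"s3://fsq-os-places-us-east-1/release/dt={release_date}/{dataset}/*.parquet"


def scan_places(release_date: str = "2024-11-19"):
    """Return a lazy daft DataFrame over the parquet files; no data is downloaded here."""
    # Imported inside the function so the helper above works without daft installed
    import daft
    from daft.io import IOConfig, S3Config

    # Assumption: the bucket allows anonymous reads from us-east-1
    io_config = IOConfig(s3=S3Config(region_name="us-east-1", anonymous=True))
    return daft.read_parquet(release_glob(release_date), io_config=io_config)


def preview_us_places(n: int = 5):
    """Filter and project lazily, then materialize only the first n rows."""
    import daft

    df = scan_places()
    # Assumed column names; adjust to the actual schema if they differ
    df = df.where(daft.col("country") == "US").select("name", "latitude", "longitude")
    return df.limit(n).to_pandas()
```

Calling `preview_us_places()` in a notebook cell should only pull what the filter, projection and limit need, which is the kind of behavior that keeps memory & CPU usage low.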
Steps:
Just download this notebook, import it into your Fabric workspace and execute it.
To get a list of files at this S3 location:
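## list of files
from pyarrow import fs

# If no AWS credentials are configured, passing anonymous=True may be needed for this public bucket
s3 = fs.S3FileSystem(region='us-east-1')

# Glob for the places parquet files (not used by the listing below)
path = "s3://fsq-os-places-us-east-1/release/dt=2024-11-19/places/*.parquet"

# Recursively list everything under the release prefix
file_info = s3.get_file_info(fs.FileSelector(
    "fsq-os-places-us-east-1/release/dt=2024-11-19/",
    recursive=True
))
for info in file_info:
    print(info.path)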
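A recursive listing returns directories as well as files, so a quick summary of just the parquet files can be handy. This is a rough sketch: `parquet_summary` is a hypothetical helper, and it only relies on the `path`, `size` and `is_file` attributes that pyarrow's `FileInfo` objects expose. The demo uses stand-in objects with made-up sizes so it runs without touching S3.

```python
from types import SimpleNamespace


def parquet_summary(file_infos):
    """Return (number of parquet files, total size in bytes) for a listing."""
    parquet = [
        info for info in file_infos
        if info.is_file and info.path.endswith(".parquet")
    ]
    # Directories report size None, so fall back to 0 defensively
    total_bytes = sum(info.size or 0 for info in parquet)
    return len(parquet), total_bytes


# Stand-in objects for illustration only; paths and sizes are made up
demo = [
    SimpleNamespace(path="bucket/places", size=None, is_file=False),
    SimpleNamespace(path="bucket/places/a.parquet", size=100, is_file=True),
    SimpleNamespace(path="bucket/places/b.parquet", size=200, is_file=True),
    SimpleNamespace(path="bucket/places/_SUCCESS", size=0, is_file=True),
]
count, total_bytes = parquet_summary(demo)
print(count, total_bytes)
```

With the real listing, `parquet_summary(file_info)` would give the file count and total bytes for the release.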
Written by

Sandeep Pawar
Microsoft MVP with expertise in data analytics, data science and generative AI using Microsoft data platform.