Distributed Google Maps scraping
Introduction
In this post, I will show you how you can utilize the power of Kubernetes to scrape data from Google Maps without using an API key.
For this tutorial, I will use Digital Ocean as the example,
but the same approach works with any managed Kubernetes provider.
The whole procedure to get the scraper up and running won't take more than 20 minutes. So give it a try.
Prerequisites
- Create a Digital Ocean account. If you do not have one, I recommend creating it via the referral link: you get $200 of credit and I may also get $25 (depending on whether you continue using Digital Ocean), so you can try the tutorial for free. Note: To get the $200 credit you need to add a payment method.
- Install kubectl on your local machine by following the official instructions.
Create a K8s Cluster
Log in to your Digital Ocean account and click Create at the top right.
In the menu that pops up, select Kubernetes.
After clicking Kubernetes the Kubernetes page opens:
For the purposes of the tutorial, leave the defaults. In a real-life scenario you would pick the desired region and configure the nodes to suit your workload.
If you registered in Digital Ocean via the referral link, don't worry about costs for now. Keep in mind that the scrapers start a headless web browser, so the nodes need enough memory and CPU.
Please wait until the cluster initializes. This can take around 5 minutes.
Once the cluster is provisioned, download the Kubernetes configuration file and take note of its location.
For the purposes of the tutorial, we assume that it is located at `/home/giorgos/k8s.config.yaml` (that is, $HOME/k8s.config.yaml).
Let's check that we can connect:
kubectl --kubeconfig=$HOME/k8s.config.yaml get pods && echo $?
You should get output like:
No resources found in default namespace.
0
Create a PostgreSQL database
In your Digital Ocean dashboard, click Databases in the left panel, or follow the Digital Ocean guide on creating a database.
Select the PostgreSQL database and on the next page leave the defaults (the lowest tier).
Then click Create:
Again wait a bit until it is provisioned.
Once the database is ready, we need to:
- Create a user and a database
- Create tables
Create a User and a database
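For simplicity, this tutorial uses the default doadmin user and the defaultdb database that come with the managed database. Open a terminal (or your favorite GUI tool) and connect to your database: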
psql -p 25060 -h db-postgresql-sfo3-81615-do-user-14100026-0.b.db.ondigitalocean.com -U doadmin -d defaultdb
(Please replace host with yours)
If you managed to connect then we can move to the next step.
Create tables
CREATE TABLE gmaps_jobs(
    id UUID PRIMARY KEY,
    priority SMALLINT NOT NULL,
    payload_type TEXT NOT NULL,
    payload BYTEA NOT NULL,
    created_at TIMESTAMP WITH TIME ZONE NOT NULL,
    status TEXT NOT NULL
);

CREATE TABLE results(
    id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    title TEXT NOT NULL,
    category TEXT NOT NULL,
    address TEXT NOT NULL,
    openhours TEXT NOT NULL,
    website TEXT NOT NULL,
    phone TEXT NOT NULL,
    pluscode TEXT NOT NULL,
    review_count INT NOT NULL,
    rating NUMERIC NOT NULL
);
Execute the above queries in your database client.
Google maps scraper deployment
First, create a file with your queries, one per line. A sample is:
bars in Athens
bars in Berlin
restaurants in Rome
Save this file as queries.txt.
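If you have many categories and cities, writing the file by hand gets tedious. The following is a minimal sketch that builds the same "<category> in <city>" lines as the sample above; the function name build_queries and the script name used in the note below are my own, not part of the scraper.

```python
from itertools import product


def build_queries(categories, cities):
    """Return one '<category> in <city>' query per combination."""
    return [f"{category} in {city}" for category, city in product(categories, cities)]


if __name__ == "__main__":
    # Print one query per line, the same layout as the sample file.
    for query in build_queries(["bars", "restaurants"], ["Athens", "Berlin", "Rome"]):
        print(query)
```

Assuming you save the sketch as make_queries.py (a hypothetical name), you can redirect its output into queries.txt.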
Then run the following command, which inserts one job per query into the database:
docker run -v $PWD/queries.txt:/queries.txt gosom/google-maps-scraper:v0.9.3 -depth 5 -input /queries.txt -dsn "postgres://doadmin:{yourPassword}@{yourHost}:25060/defaultdb" -produce -lang en
(Replace with your password and your host)
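A generated password may contain characters such as @ or / that have a special meaning inside a URL. The sketch below percent-encodes the user and password before assembling the DSN. It assumes the scraper's PostgreSQL driver decodes URI-style DSNs in the standard way, which I have not verified; build_dsn is a helper name of my own, and the defaults are the values used in this tutorial.

```python
from urllib.parse import quote


def build_dsn(password, host, user="doadmin", port=25060, database="defaultdb"):
    """Build a postgres:// DSN, percent-encoding the user and password."""
    return (
        f"postgres://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )


if __name__ == "__main__":
    # Placeholder values; replace them with your own password and host.
    print(build_dsn("p@ss/word", "your-db-host.example.com"))
```

The printed string can then be passed to the -dsn flag in place of the hand-written one.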
Be patient: the image is around 1GB and needs to be downloaded first.
Once the command finishes, verify that the jobs were inserted into the database:
select count(1) from gmaps_jobs;
Run the above query in your database client. It should return 3 if you use my sample file.
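Since gmaps_jobs has a status column, grouping by it gives a slightly richer check than a plain count. Here is a small sketch that works with any DB-API 2.0 connection (for PostgreSQL you could pass a psycopg2 connection). To keep it runnable without a real database, the demo uses an in-memory SQLite table, and the status values 'new' and 'done' are hypothetical placeholders, not necessarily the values the scraper writes.

```python
import sqlite3


def jobs_by_status(conn):
    """Return {status: count} for the gmaps_jobs table."""
    cur = conn.cursor()
    cur.execute("SELECT status, COUNT(*) FROM gmaps_jobs GROUP BY status")
    rows = cur.fetchall()
    cur.close()
    return {status: count for status, count in rows}


def demo_connection():
    """In-memory SQLite stand-in with hypothetical status values."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE gmaps_jobs (id TEXT PRIMARY KEY, status TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO gmaps_jobs (id, status) VALUES (?, ?)",
        [("a", "new"), ("b", "new"), ("c", "done")],
    )
    return conn


if __name__ == "__main__":
    print(jobs_by_status(demo_connection()))
```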
We are now ready to start our scrapers.
Create a file named gmaps.deployment.yaml for the Kubernetes deployment configuration and paste the following:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: google-maps-scraper
spec:
  selector:
    matchLabels:
      app: google-maps-scraper
  replicas: 2
  template:
    metadata:
      labels:
        app: google-maps-scraper
    spec:
      containers:
      - name: google-maps-scraper
        image: gosom/google-maps-scraper:v0.9.3
        imagePullPolicy: IfNotPresent
        args: ["-c", "1", "-depth", "5", "-dsn", "postgres://doadmin:{YourPassword}@{YourHost}:25060/defaultdb"]
(Replace {YourPassword} and {YourHost} with your own values)
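If you prefer not to edit the file by hand, the same deployment can be generated as JSON, which kubectl apply -f also accepts. This is only a sketch: the function build_deployment and the GMAPS_DSN environment variable are names I made up for illustration, and the manifest fields simply mirror the configuration above.

```python
import json
import os


def build_deployment(dsn, replicas=2, image="gosom/google-maps-scraper:v0.9.3"):
    """Return the deployment from this tutorial as a plain dict."""
    labels = {"app": "google-maps-scraper"}
    container = {
        "name": "google-maps-scraper",
        "image": image,
        "imagePullPolicy": "IfNotPresent",
        "args": ["-c", "1", "-depth", "5", "-dsn", dsn],
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "google-maps-scraper"},
        "spec": {
            "selector": {"matchLabels": labels},
            "replicas": replicas,
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [container]},
            },
        },
    }


if __name__ == "__main__":
    # GMAPS_DSN is a hypothetical variable name; the fallback is the placeholder DSN.
    placeholder = "postgres://doadmin:{YourPassword}@{YourHost}:25060/defaultdb"
    dsn = os.environ.get("GMAPS_DSN", placeholder)
    print(json.dumps(build_deployment(dsn), indent=2))
```

Changing the replicas argument is then the only thing needed to scale the number of scraper pods up or down before applying the manifest.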
Then apply the configuration:
kubectl --kubeconfig=$HOME/k8s.config.yaml apply -f gmaps.deployment.yaml
Give it some time, since the cluster nodes also need to download the image.
Check the status of the pods:
giorgos@gtp:~$ kubectl --kubeconfig=$HOME/k8s.config.yaml get pods
NAME                                   READY   STATUS    RESTARTS   AGE
google-maps-scraper-6489d96b84-7nltl   1/1     Running   0          68s
google-maps-scraper-6489d96b84-vvx6c   1/1     Running   0          116s
giorgos@gtp:~$
Meanwhile, periodically check the results table:
select count(1) from results;
The scrapers will slowly start populating the results table:
defaultdb=> select * from results limit 5;
-[ RECORD 1 ]+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
id | 1
title | Athens Sports Bar
category | Sports bar
address | Veikou 3a, Athina 117 42, Greece
openhours | Sunday, 10 AM to 12 AM; Monday, 10 AM to 12 AM; Tuesday, 10 AM to 12 AM; Wednesday, 10 AM to 12 AM; Thursday, 10 AM to 12 AM; Friday, 10 AM to 12 AM; Saturday, 10 AM to 12 AM. Hide open hours for the week
website | http://www.athenssportsbar.gr/
phone | +302109235811
pluscode | XP8H+V9 Athens, Greece
review_count | 1
rating | 4.4
-[ RECORD 2 ]+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
id | 2
title | 360 Cocktail bar
category | Bar
address | Ifestou 2, Athina 105 55, Greece
openhours | Sunday, 9 AM to 4 AM; Monday, 9 AM to 3 AM; Tuesday, 9 AM to 3 AM; Wednesday, 9 AM to 3 AM; Thursday, 9 AM to 3 AM; Friday, 9 AM to 3 AM; Saturday, 9 AM to 4 AM. Hide open hours for the week
website | http://www.three-sixty.gr/
phone | +302103210006
pluscode | XPGG+H6 Athens, Greece
review_count | 8
rating | 4.4
-[ RECORD 3 ]+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
id | 3
title | Teddy Boy
category | Bar
address | Taki 18, Athina 105 54, Greece
openhours |
website | https://m.facebook.com/teddyboy.bar
phone | +306951116651
pluscode | XPHF+8F Athens, Greece
review_count | 489
rating | 4.5
-[ RECORD 4 ]+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
id | 4
title | Revolt street bar
category | Bar
address | Koletti 25-27, Athina 106 77, Greece
openhours | Sunday, 11 AM to 2 AM; Monday, 11 AM to 2 AM; Tuesday, 11 AM to 2 AM; Wednesday, 11 AM to 2 AM; Thursday, 11 AM to 2 AM; Friday, 11 AM to 3 AM; Saturday, 11 AM to 3 AM. Hide open hours for the week
website | https://www.facebook.com/Revoltstreetbar/
phone | +302103800016
pluscode | XPPM+85 Athens, Greece
review_count | 461
rating | 4.5
-[ RECORD 5 ]+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
id | 5
title | 42 Barstronomy Athens
category | Cocktail bar
address | Kolokotroni 3, Athina 105 62, Greece
openhours |
website | https://42barstronomy.gr/
phone | +302130052153
pluscode | XPGM+Q8 Athens, Greece
review_count | 1
rating | 4.5
defaultdb=>
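As the records above show, openhours is stored as a single text field. If you want per-day values, a small parser is enough. The sketch below is based only on the layout visible in these sample rows (entries separated by semicolons, followed by a trailing "Hide open hours for the week" label); other listings may be formatted differently, and parse_openhours is my own helper name.

```python
SUFFIX = ". Hide open hours for the week"


def parse_openhours(raw):
    """Split an openhours string into {day: hours}; empty input gives {}."""
    text = raw.strip()
    if text.endswith(SUFFIX):
        text = text[: -len(SUFFIX)]
    hours = {}
    for entry in text.split(";"):
        # Each entry looks like "Sunday, 10 AM to 12 AM".
        day, sep, span = entry.strip().partition(", ")
        if sep:
            hours[day] = span
    return hours


if __name__ == "__main__":
    sample = "Sunday, 9 AM to 4 AM; Monday, 9 AM to 3 AM. Hide open hours for the week"
    print(parse_openhours(sample))
```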
Conclusion
In this tutorial, I showed you how you can use the google-maps-scraper in Kubernetes to automate and scale scraping Google Maps results.
Note: Once you are done with this tutorial, please clean up the resources in your Digital Ocean account (the Kubernetes cluster and the database) to avoid undesired charges.
Written by
Georgios Komninos
I am a software engineer based in Cyprus with over 20 years of experience in the industry. My background in Computer Science has led me to work with PHP, Python, and more recently, with a focus on Golang. Originally from Greece, my career has taken me across Europe, and I now call Cyprus home. I've attended numerous conferences, continually expanding my knowledge and network. Recently, I started blogging to share my insights and experiences with the tech community. I'm passionate about engaging with fellow developers and contributing to the field through my writing and future projects. Thank you for visiting my blog.