Scraping Apple App Store Product Info And Reviews with Python
What will be scraped
Why use an API?
There are a few reasons to use an API, and ours in particular:
- No need to create a parser from scratch and maintain it.
- No need to bypass blocks yourself: no CAPTCHAs to solve and no IP blocks to work around.
- No need to pay for proxies and CAPTCHA solvers.
- Don't need to use browser automation.
SerpApi handles everything on the backend, with response times under ~2.5 seconds per request (~1.2 seconds with Ludicrous Speed) and without browser automation, which makes scraping much faster. Response times and status rates are shown on the SerpApi Status page:
- Apple Product:
- Apple Reviews:
Head to the Apple Product Page playground and Apple App Store Reviews playground for a live and interactive demo.
Full Code
If you don't need an explanation, have a look at the full code example in the online IDE.
from serpapi import GoogleSearch
import json

def get_product_info(product_id):
    params = {
        'api_key': '...',               # https://serpapi.com/manage-api-key
        'engine': 'apple_product',      # SerpApi search engine
        'product_id': product_id,       # ID of a product
        'type': 'app',                  # type of Apple Product
        'country': 'us',                # country for the search
    }

    search = GoogleSearch(params)       # data extraction on the SerpApi backend
    product_info = search.get_dict()    # JSON -> Python dict

    del product_info['search_metadata']
    del product_info['search_parameters']
    del product_info['search_information']

    return product_info


def get_product_reviews(product_id):
    params = {
        'api_key': '...',               # https://serpapi.com/manage-api-key
        'engine': 'apple_reviews',      # SerpApi search engine
        'product_id': product_id,       # ID of a product
        'country': 'us',                # country for the search
        'sort': 'mostrecent',           # sorting reviews
        'page': 1,                      # pagination
    }

    product_reviews = []

    while True:
        search = GoogleSearch(params)
        new_page_results = search.get_dict()

        product_reviews.extend(new_page_results['reviews'])

        if 'next' in new_page_results.get('serpapi_pagination', {}):
            params['page'] += 1
        else:
            break

    return product_reviews


def main():
    product_id = 1507782672

    app_store_results = {
        'product_info': get_product_info(product_id),
        'product_reviews': get_product_reviews(product_id)
    }

    print(json.dumps(app_store_results, indent=2, ensure_ascii=False))


if __name__ == '__main__':
    main()
Preparation
Install library:
pip install google-search-results
google-search-results is a SerpApi API package.
Code Explanation
Import libraries:
from serpapi import GoogleSearch
import json
Library | Purpose |
GoogleSearch | to scrape and parse search results using the SerpApi web scraping library (here, Apple App Store data). |
json | to convert the extracted data to a JSON string. |
Top-level code environment
At the beginning of the main function, the product_id variable is created to store the ID of the desired product:
product_id = 1507782672
Next, the app_store_results dictionary is created. It holds the data returned by the get_product_info(product_id) and get_product_reviews(product_id) functions, which are explained under the corresponding headings below.
app_store_results = {
    'product_info': get_product_info(product_id),
    'product_reviews': get_product_reviews(product_id)
}
After all the data is retrieved, it is printed in JSON format:
print(json.dumps(app_store_results, indent=2, ensure_ascii=False))
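The ensure_ascii=False argument matters because App Store data often contains non-ASCII characters, such as the © sign in the copyright field. Below is a minimal sketch of the difference. The one-key dictionary is a made-up sample, not a real API response:

```python
import json

# Hypothetical sample; a real response has many more keys.
sample = {'copyright': 'Copyright © 2020-2023 ImageTasks Inc.'}

escaped = json.dumps(sample)                       # default: ensure_ascii=True
readable = json.dumps(sample, ensure_ascii=False)  # keeps characters as they are

print(escaped)   # the © sign is written as a \u00a9 escape sequence
print(readable)  # the © sign is written literally
```

Both strings decode to the same dictionary, so the choice only affects how readable the printed output is.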
This code follows the generally accepted convention of using the __name__ == "__main__" construct:
def main():
    product_id = 1507782672

    app_store_results = {
        'product_info': get_product_info(product_id),
        'product_reviews': get_product_reviews(product_id)
    }

    print(json.dumps(app_store_results, indent=2, ensure_ascii=False))


if __name__ == '__main__':
    main()
The condition is true only when the file is run directly, so main() is called in that case. If the file is imported into another module, the condition is false and main() is not called.
You can watch the video Python Tutorial: if __name__ == '__main__' for more details.
Get product information
The function takes a specific product_id and returns a dictionary with all the information about that product.
At the beginning of the function, the params dictionary is defined for generating the URL:
params = {
    'api_key': '...',               # https://serpapi.com/manage-api-key
    'engine': 'apple_product',      # SerpApi search engine
    'product_id': product_id,       # ID of a product
    'type': 'app',                  # type of Apple Product
    'country': 'us',                # country for the search
}
Parameters | Explanation |
api_key | Parameter defines the SerpApi private key to use. You can find it under your account -> API key. |
engine | Set parameter to apple_product to use the Apple Product engine. |
product_id | Parameter defines the ID of the product whose product page you want to get. |
type | Parameter defines the type of Apple Product to get the product page of. It defaults to app. |
country | Parameter defines the country to use for the search. It's a two-letter country code. Head to the Apple Regions for a full list of supported Apple Regions. |
📌Note: You can also add other API Parameters.
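The '...' placeholder for api_key is fine for a demo, but a real key is better kept out of the source file. Here is a minimal sketch that reads it from an environment variable. The variable name SERPAPI_API_KEY and the helper build_product_params are assumptions for illustration, not names required by SerpApi:

```python
import os


def build_product_params(product_id, country='us'):
    """Build the apple_product parameters without hardcoding the key."""
    # SERPAPI_API_KEY is an assumed variable name, not one SerpApi mandates.
    api_key = os.environ.get('SERPAPI_API_KEY')
    if not api_key:
        raise RuntimeError('Set the SERPAPI_API_KEY environment variable first.')

    return {
        'api_key': api_key,
        'engine': 'apple_product',
        'product_id': product_id,
        'type': 'app',
        'country': country,
    }
```

The returned dictionary has the same keys as params above, so it can be passed to GoogleSearch in the same way.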
Then, we create a search object, through which the data is retrieved from the SerpApi backend. The get_dict() call converts the returned JSON into the product_info dictionary:
search = GoogleSearch(params) # data extraction on the SerpApi backend
product_info = search.get_dict() # JSON -> Python dict
The product_info dictionary contains information not only about the product, but also about the request. Request information is not needed, so we remove the corresponding keys using the del statement:
del product_info['search_metadata']
del product_info['search_parameters']
del product_info['search_information']
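Note that del raises a KeyError when a key is absent, for example if a response comes back without one of these sections (an assumption about possible responses, not something shown in this post). A more tolerant sketch uses dict.pop with a default. The helper name strip_request_metadata is hypothetical:

```python
REQUEST_KEYS = ('search_metadata', 'search_parameters', 'search_information')


def strip_request_metadata(response):
    """Return a copy of the response without the request-related keys."""
    cleaned = dict(response)    # shallow copy; the original dict stays untouched
    for key in REQUEST_KEYS:
        cleaned.pop(key, None)  # no KeyError when the key is missing
    return cleaned


# Offline demo with a made-up response that lacks two of the three keys.
demo = {'search_metadata': {'status': 'Success'}, 'title': 'Pixea'}
print(strip_request_metadata(demo))  # {'title': 'Pixea'}
```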
At the end of the function, the product_info dictionary with the extracted data is returned:
return product_info
The complete function to get product information would look like this:
def get_product_info(product_id):
    params = {
        'api_key': '...',               # https://serpapi.com/manage-api-key
        'engine': 'apple_product',      # SerpApi search engine
        'product_id': product_id,       # ID of a product
        'type': 'app',                  # type of Apple Product
        'country': 'us',                # country for the search
    }

    search = GoogleSearch(params)       # data extraction on the SerpApi backend
    product_info = search.get_dict()    # JSON -> Python dict

    del product_info['search_metadata']
    del product_info['search_parameters']
    del product_info['search_information']

    return product_info
Get product reviews
The function takes a specific product_id and returns a list with all the reviews of that product.
At the beginning of the function, the params dictionary is defined for generating the URL:
params = {
    'api_key': '...',               # https://serpapi.com/manage-api-key
    'engine': 'apple_reviews',      # SerpApi search engine
    'product_id': product_id,       # ID of a product
    'country': 'us',                # country for the search
    'sort': 'mostrecent',           # sorting reviews
    'page': 1,                      # pagination
}
Parameters | Explanation |
api_key | Parameter defines the SerpApi private key to use. You can find it under your account -> API key. |
engine | Set parameter to apple_reviews to use the Apple Reviews engine. |
product_id | Parameter defines the ID of a product you want to get the reviews for. |
country | Parameter defines the country to use for the search. It's a two-letter country code. Head to the Apple Regions for a full list of supported Apple Regions. |
sort | Parameter is used for sorting reviews. It can be set to mostrecent or mosthelpful. |
page | Parameter is used to get the items on a specific page. (e.g., 1 (default) is the first page of results, 2 is the 2nd page of results, 3 is the 3rd page of results, etc.). |
📌Note: You can also add other API Parameters.
Define the product_reviews list to which the retrieved reviews will be added:
product_reviews = []
A while loop is created to extract reviews from all pages:
while True:
    # data extraction will be here
Then, we create a search object, through which the data is retrieved from the SerpApi backend. The get_dict() call converts the returned JSON into the new_page_results dictionary:
search = GoogleSearch(params)
new_page_results = search.get_dict()
The new data from this page is added to the product_reviews list:
product_reviews.extend(new_page_results['reviews'])
# first_review = new_page_results['reviews'][0]
# title = first_review['title']
# text = first_review['text']
# rating = first_review['rating']
# review_date = first_review['review_date']
# author_name = first_review['author']['name']
# author_link = first_review['author']['link']
📌Note: The comments above show how to extract specific fields. You may have noticed new_page_results['reviews'][0]. The 0 is the index of a review, which means that we are extracting data from the first review. new_page_results['reviews'][1] is the second review, and so on.
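The same field access can be applied to every review rather than only the first. Below is a minimal sketch with a hypothetical helper, flatten_review, that turns one review dictionary into a flat row. It uses .get so that a missing field yields None instead of an error:

```python
def flatten_review(review):
    """Pick the fields shown above out of one review dictionary."""
    author = review.get('author', {})
    return {
        'title': review.get('title'),
        'text': review.get('text'),
        'rating': review.get('rating'),
        'review_date': review.get('review_date'),
        'author_name': author.get('name'),
        'author_link': author.get('link'),
    }


# Sample review copied from the Output section of this post.
sample_review = {
    'position': 1,
    'id': '9446406432',
    'title': 'Stop begging for reviews',
    'text': 'Stop begging for reviews',
    'rating': 1,
    'review_date': '2022-12-28 21:42:28 UTC',
    'author': {
        'name': 'stalfos_knight',
        'link': 'https://itunes.apple.com/us/reviews/id41752602',
    },
}

rows = [flatten_review(review) for review in [sample_review]]
print(rows[0]['author_name'])  # stalfos_knight
```

Flat rows like these are convenient if you later want to write the reviews to CSV or load them into a table.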
After data is retrieved from the current page, a check is made to see if the next page exists. If the serpapi_pagination dictionary contains a next key, the page parameter is incremented by 1. Otherwise, the loop stops:
if 'next' in new_page_results.get('serpapi_pagination', {}):
    params['page'] += 1
else:
    break
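Since every page is a separate request, an app with many reviews can trigger a long run of calls. Here is a minimal sketch of the same pagination logic with a page cap. Both collect_reviews and its fetch_page argument are hypothetical names. fetch_page stands in for the GoogleSearch(params).get_dict() call, so the loop can be tried offline with fake responses:

```python
def collect_reviews(fetch_page, max_pages=10):
    """Gather reviews page by page, stopping after max_pages at the latest.

    fetch_page is a hypothetical callable that takes a page number and
    returns the response dictionary for that page.
    """
    reviews = []
    for page in range(1, max_pages + 1):
        results = fetch_page(page)
        reviews.extend(results.get('reviews', []))  # tolerate a missing key
        if 'next' not in results.get('serpapi_pagination', {}):
            break                                   # no further pages
    return reviews


# Offline demo: fake responses instead of real API calls.
fake_pages = {
    1: {'reviews': [{'id': 'a'}, {'id': 'b'}],
        'serpapi_pagination': {'next': '...'}},
    2: {'reviews': [{'id': 'c'}]},
}

all_reviews = collect_reviews(lambda page: fake_pages[page])
print(len(all_reviews))  # 3
```

With the real API, fetch_page could be a small function that copies params, sets 'page', and returns GoogleSearch(...).get_dict().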
At the end of the function, the product_reviews list with the extracted data is returned:
return product_reviews
The complete function to get product reviews would look like this:
def get_product_reviews(product_id):
    params = {
        'api_key': '...',               # https://serpapi.com/manage-api-key
        'engine': 'apple_reviews',      # SerpApi search engine
        'product_id': product_id,       # ID of a product
        'country': 'us',                # country for the search
        'sort': 'mostrecent',           # sorting reviews
        'page': 1,                      # pagination
    }

    product_reviews = []

    while True:
        search = GoogleSearch(params)
        new_page_results = search.get_dict()

        product_reviews.extend(new_page_results['reviews'])

        if 'next' in new_page_results.get('serpapi_pagination', {}):
            params['page'] += 1
        else:
            break

    return product_reviews
Output
{
"product_info": {
"title": "Pixea",
"snippet": "The invisible image viewer",
"id": "1507782672",
"age_rating": "4+",
"developer": {
"name": "ImageTasks Inc",
"link": "https://apps.apple.com/us/developer/imagetasks-inc/id450316587"
},
"rating": 4.6,
"rating_count": "620 Ratings",
"price": "Free",
"in_app_purchases": "Offers In-App Purchases",
"logo": "https://is3-ssl.mzstatic.com/image/thumb/Purple118/v4/f6/93/b6/f693b68f-9b14-3689-7521-c19a83fb0d88/AppIcon-1x_U007emarketing-85-220-6.png/320x0w.webp",
"mac_screenshots": [
"https://is4-ssl.mzstatic.com/image/thumb/Purple113/v4/e0/21/86/e021868d-b43b-0a78-8d4a-e4e0097a1d01/0131f1c2-3227-46bf-8328-7b147d2b1ea2_Pixea-1.jpg/643x0w.webp",
"https://is4-ssl.mzstatic.com/image/thumb/Purple113/v4/55/3c/98/553c982d-de30-58b5-3b5a-d6b3b2b6c810/a0424c4d-4346-40e6-8cde-bc79ce690040_Pixea-2.jpg/643x0w.webp",
"https://is3-ssl.mzstatic.com/image/thumb/Purple123/v4/77/d7/d8/77d7d8c1-4b4c-ba4b-4dde-94bdc59dfb71/6e66509c-5886-45e9-9e96-25154a22fd53_Pixea-3.jpg/643x0w.webp",
"https://is3-ssl.mzstatic.com/image/thumb/PurpleSource113/v4/44/79/91/447991e0-518f-48b3-bb7e-c7121eb57ba4/79be2791-5b93-4c4d-b4d1-38a3599c2b2d_Pixea-4.jpg/643x0w.webp"
],
"description": "Pixea is an image viewer for macOS with a nice minimal modern user interface. Pixea works great with JPEG, HEIC, PSD, RAW, WEBP, PNG, GIF, and many other formats. Provides basic image processing, including flip and rotate, shows a color histogram, EXIF, and other information. Supports keyboard shortcuts and trackpad gestures. Shows images inside archives, without extracting them.Supported formats:JPEG, HEIC, GIF, PNG, TIFF, Photoshop (PSD), BMP, Fax images, macOS and Windows icons, Radiance images, Google's WebP. RAW formats: Leica DNG and RAW, Sony ARW, Olympus ORF, Minolta MRW, Nikon NEF, Fuji RAF, Canon CR2 and CRW, Hasselblad 3FR. Sketch files (preview only). ZIP-archives.Export formats:JPEG, JPEG-2000, PNG, TIFF, BMP.Found a bug? Have a suggestion? Please, send it to support@imagetasks.comFollow us on Twitter @imagetasks!",
"version_history": [
{
"release_version": "2.1",
"release_notes": "- New \"Fixed Size and Position\" zoom mode- Fixed a bug causing crash when browsing ZIP-files- Bug fixes and improvements",
"release_date": "2023-01-03"
},
... other versions
],
"ratings_and_reviews": {
"rating_percentage": {
"5_star": "76%",
"4_star": "13%",
"3_star": "4%",
"2_star": "2%",
"1_star": "4%"
},
"review_examples": [
{
"rating": "5 out of 5",
"username": "MyrtleBlink182",
"review_date": "01/18/2022",
"review_title": "Full-Screen Perfection",
"review_text": "This photo-viewer is by far the best in the biz. I thoroughly enjoy viewing photos with it. I tried a couple of others out, but this one is exactly what I was looking for. There is no dead space or any extra design baggage when viewing photos. Pixea knocks it out of the park keeping the design minimalistic while ensuring the functionality is through the roof"
},
... other reviews examples
]
},
"privacy": {
"description": "The developer, ImageTasks Inc, indicated that the app’s privacy practices may include handling of data as described below. For more information, see the developer’s privacy policy.",
"privacy_policy_link": "https://www.imagetasks.com/Pixea-policy.txt",
"cards": [
{
"title": "Data Not Collected",
"description": "The developer does not collect any data from this app."
}
],
"sidenote": "Privacy practices may vary, for example, based on the features you use or your age. Learn More",
"learn_more_link": "https://apps.apple.com/story/id1538632801"
},
"information": {
"seller": "ImageTasks Inc",
"price": "Free",
"size": "7.1 MB",
"categories": [
"Photo & Video"
],
"compatibility": [
{
"device": "Mac",
"requirement": "Requires macOS 10.12 or later."
}
],
"supported_languages": [
"English"
],
"age_rating": {
"rating": "4+"
},
"copyright": "Copyright © 2020-2023 ImageTasks Inc. All rights reserved.",
"in_app_purchases": [
{
"name": "Upgrade to Pixea Plus",
"price": "$3.99"
}
],
"developer_website": "https://www.imagetasks.com",
"app_support_link": "https://www.imagetasks.com/pixea",
"privacy_policy_link": "https://www.imagetasks.com/Pixea-policy.txt"
},
"more_by_this_developer": {
"apps": [
{
"logo": "https://is3-ssl.mzstatic.com/image/thumb/Purple118/v4/f6/93/b6/f693b68f-9b14-3689-7521-c19a83fb0d88/AppIcon-1x_U007emarketing-85-220-6.png/320x0w.webp",
"link": "https://apps.apple.com/us/app/istatistica/id1126874522",
"serpapi_link": "https://serpapi.com/search.json?country=us&engine=apple_product&product_id=1507782672&type=app",
"name": "iStatistica",
"category": "Utilities"
},
... other apps
],
"result_type": "Full",
"see_all_link": "https://apps.apple.com/us/app/id1507782672#see-all/developer-other-apps"
}
},
"product_reviews": [
{
"position": 1,
"id": "9446406432",
"title": "Stop begging for reviews",
"text": "Stop begging for reviews",
"rating": 1,
"review_date": "2022-12-28 21:42:28 UTC",
"author": {
"name": "stalfos_knight",
"link": "https://itunes.apple.com/us/reviews/id41752602"
}
},
... other reviews
]
}
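Several values in this output are strings, such as "620 Ratings" or "2022-12-28 21:42:28 UTC", so a little post-processing helps before analysis. Below is a minimal sketch with hypothetical helper names. It assumes the formats shown in the sample above and may need adjusting for other apps or countries:

```python
from datetime import datetime, timezone


def parse_rating_count(text):
    """'620 Ratings' -> 620. Assumes a plain number comes first, as in the sample."""
    return int(text.split()[0].replace(',', ''))


def parse_review_date(text):
    """'2022-12-28 21:42:28 UTC' -> timezone-aware datetime in UTC."""
    parsed = datetime.strptime(text, '%Y-%m-%d %H:%M:%S UTC')
    return parsed.replace(tzinfo=timezone.utc)


def average_rating(reviews):
    """Mean of the numeric 'rating' fields, or None for an empty list."""
    ratings = [review['rating'] for review in reviews if 'rating' in review]
    return sum(ratings) / len(ratings) if ratings else None


print(parse_rating_count('620 Ratings'))                   # 620
print(parse_review_date('2022-12-28 21:42:28 UTC').year)   # 2022
print(average_rating([{'rating': 1}, {'rating': 5}]))      # 3.0
```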
Links
- Code in the online IDE
- Apple Product Page Scraper API
- Apple Product Page Playground
- Apple App Store Reviews Scraper API
- Apple App Store Reviews Playground
Add a Feature Request💫 or a Bug🐞
Written by
Artur Chukhrai
I have been teaching programming for over 3 years and do web scraping as a hobby.