Using Walmart Search Engine Results API from SerpApi

Artur Chukhrai
5 min read

Intro

In this blog post, we'll go through the process of extracting filters, featured items, related queries and organic results plus pagination using the Walmart Search Engine Results API and the Python programming language.

You can look at the complete code in the online IDE (Replit).

What will be scraped

[Image: what will be scraped from the Walmart search page]

📌Note: By default, Walmart returns 40 results. In this case, 8 results are displayed to make the image more compact.

Why use the API?

There are a couple of reasons to use an API, ours in particular:

  • No need to create a parser from scratch and maintain it.
  • No need to bypass blocks from Walmart yourself, such as solving CAPTCHAs or getting around IP blocks.
  • No need to pay for proxies and CAPTCHA solvers.
  • No need to use browser automation.

SerpApi handles everything on the backend, with response times under ~2.5 seconds per request (~1.2 seconds with Ludicrous Speed). No browser automation is involved, which makes requests much faster. Response times and status rates are shown on the SerpApi Status page.

[Image: SerpApi Status page]

Full Code

This code retrieves all the data with pagination:

from serpapi import GoogleSearch
from urllib.parse import urlsplit, parse_qsl
import json

params = {
    'api_key': '...',           # https://serpapi.com/manage-api-key
    'engine': 'walmart',        # SerpApi search engine    
    'query': 'coffee marker',   # the search query
    'spelling': True,           # activate spelling fix
    'sort': 'best_match',       # sorted by different options
    'min_price': 100,           # minimum price
    'max_price': 150,           # maximum price
}

search = GoogleSearch(params)   # where data extraction happens on the SerpApi backend
results = search.get_dict()     # JSON -> Python dict

walmart_results = {
    'search_information': results.get('search_information'),
    'filters': results.get('filters'),
    'organic_results': [],
    'featured_item': results.get('featured_item'),
    'related_queries': results.get('related_queries'),
}

while 'next' in results.get('serpapi_pagination', {}):
    # add data from current page
    walmart_results['organic_results'].extend(results['organic_results'])

    # update search object
    search.params_dict.update(dict(parse_qsl(urlsplit(results.get('serpapi_pagination', {}).get('next')).query)))

    # get updated information from next page
    results = search.get_dict()

# add data from the last page, which has no 'next' link and is skipped by the loop
walmart_results['organic_results'].extend(results.get('organic_results', []))

print(json.dumps(walmart_results, indent=2, ensure_ascii=False))

Preparation

Install library:

pip install google-search-results

google-search-results is a SerpApi API package.

Code Explanation

Import libraries:

from serpapi import GoogleSearch
from urllib.parse import urlsplit, parse_qsl
import json
Library and purpose:

  • GoogleSearch: to scrape and parse search results using the SerpApi web scraping library. Despite its name, the same class is used here with the walmart engine.
  • urlsplit: to split a URL into its components. It should generally be used instead of urlparse() if the more recent URL syntax allowing parameters to be applied to each segment of the path portion of the URL (see RFC 2396) is wanted.
  • parse_qsl: to parse a query string given as a string argument into a list of name, value pairs.
  • json: to convert extracted data to a JSON string.
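
To make the role of urlsplit and parse_qsl more concrete, here is a small standalone sketch. The URL is a made-up example of what a next-page link might look like (an assumption, not copied from a real response); only the standard library is used.

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical next-page link, for illustration only
next_url = 'https://serpapi.com/search.json?device=desktop&engine=walmart&page=2&query=coffee+maker'

# Split the URL into scheme, host, path, query string and fragment
parts = urlsplit(next_url)

# Turn the query string into (name, value) pairs, then into a dict
query_params = dict(parse_qsl(parts.query))

print(parts.path)
print(query_params)
```

The resulting dictionary has the same shape as the params dictionary, which is why it can be merged into the search parameters during pagination.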

The parameters are defined for generating the URL. If you want to pass other parameters to the URL, you can do so using the params dictionary:

params = {
    'api_key': '...',           # https://serpapi.com/manage-api-key
    'engine': 'walmart',        # SerpApi search engine    
    'query': 'coffee marker',   # the search query
    'spelling': True,           # activate spelling fix
    'sort': 'best_match',       # sorted by different options
    'min_price': 100,           # minimum price
    'max_price': 150,           # maximum price
}
Parameters and explanation:

  • api_key: Parameter defines the SerpApi private key to use. You can find it under your account -> API key.
  • engine: Set parameter to walmart to use the Walmart API engine.
  • query: Parameter defines the search query. You can use anything that you would use in a regular Walmart search.
  • spelling: Activate spelling fix. True (default) includes spelling fix, False searches without spelling fix.
  • sort: Parameter defines sorting (e.g. price_low, price_high, best_seller, best_match, rating_high, new).
  • min_price: Lower bound of price range query.
  • max_price: Upper bound of price range query.

📌Note: You can also add other API Parameters.
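
If you build the params dictionary in several places, a small helper can catch obvious mistakes before a request is sent. The sketch below is only an illustration: build_walmart_params and KNOWN_SORTS are hypothetical names, and the list of sort values is taken from the examples above, so the real API may accept others.

```python
# Sort values copied from the examples in this post; treat the set as an assumption
KNOWN_SORTS = {'price_low', 'price_high', 'best_seller', 'best_match', 'rating_high', 'new'}


def build_walmart_params(api_key, query, sort='best_match',
                         min_price=None, max_price=None, spelling=True):
    """Hypothetical helper: return a params dict after basic sanity checks."""
    if sort not in KNOWN_SORTS:
        raise ValueError(f'unknown sort value: {sort}')
    if min_price is not None and max_price is not None and min_price > max_price:
        raise ValueError('min_price must not be greater than max_price')

    params = {
        'api_key': api_key,
        'engine': 'walmart',
        'query': query,
        'spelling': spelling,
        'sort': sort,
    }
    # Only include the price bounds that were actually given
    if min_price is not None:
        params['min_price'] = min_price
    if max_price is not None:
        params['max_price'] = max_price
    return params


params = build_walmart_params('...', 'coffee maker', min_price=100, max_price=150)
print(params)
```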

Then, we create a search object, and the data is retrieved from the SerpApi backend. The get_dict() method returns the JSON response as a Python dictionary, which is stored in results:

search = GoogleSearch(params)   # data extraction on the SerpApi backend
results = search.get_dict()     # JSON -> Python dict
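
Before using the results dictionary, it can be worth checking whether the request succeeded. The sketch below assumes that a failed request is reported under an 'error' key in the response dictionary; both that key and the ensure_ok helper are assumptions made for illustration, and the sample dictionaries are made up.

```python
def ensure_ok(results):
    """Hypothetical helper: raise if the response carries an 'error' message."""
    error = results.get('error')
    if error:
        raise RuntimeError(f'SerpApi request failed: {error}')
    return results


# A response without an 'error' key passes through unchanged
ok = ensure_ok({'search_information': {'total_results': 152051}})

# A response with an 'error' key (made-up message) raises an exception
try:
    ensure_ok({'error': 'Invalid API key.'})
    failed = False
except RuntimeError as exc:
    failed = True
    message = str(exc)
    print(message)
```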

You may have noticed that I made a mistake when passing the value to the query parameter. This was done on purpose to demonstrate that SerpApi's Walmart Spell Check API allows you to extract the corrected search term and search it:

print(results['search_information']['spelling_fix'])    # coffee maker
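
Indexing spelling_fix directly raises a KeyError if the key is missing. As a hedged sketch, the helper below falls back to the displayed query in that case. The sample dictionary reuses values from the output shown later in this post; effective_query is a hypothetical name, and the idea that spelling_fix may be absent is an assumption.

```python
# Values taken from the search_information block in the Output section
search_information = {
    'query_displayed': 'coffee marker',
    'organic_results_state': 'Results for exact spelling',
    'spelling_fix': 'coffee maker',
}


def effective_query(info):
    """Hypothetical helper: prefer the corrected query, else the displayed one."""
    return info.get('spelling_fix') or info.get('query_displayed')


corrected = effective_query(search_information)
unchanged = effective_query({'query_displayed': 'coffee maker'})
print(corrected, unchanged)
```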

At the moment, the results dictionary only stores data from 1 page. Before extracting data, the walmart_results dictionary is created where this data will be added later. Since the search_information, filters, featured_item and related_queries are repeated on each subsequent page, you can extract them immediately:

walmart_results = {
    'search_information': results.get('search_information'),
    'filters': results.get('filters'),
    'organic_results': [],
    'featured_item': results.get('featured_item'),
    'related_queries': results.get('related_queries'),
}

To get all organic results, you need to apply Walmart Pagination API. This is achieved by the following check: while the next page exists in the serpapi_pagination dictionary, we add the data from the current page, update the parameters in the search object from the next link, and fetch the next page. Because the loop stops as soon as a page has no next link, the results of that last page have to be added once more after the loop:

while 'next' in results.get('serpapi_pagination', {}):
    # add data from current page
    # ...

    # update search object
    search.params_dict.update(dict(parse_qsl(urlsplit(results.get('serpapi_pagination', {}).get('next')).query)))

    # get updated information from next page
    results = search.get_dict()
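
To see how the pagination logic behaves without calling the API, here is a self-contained simulation. FAKE_PAGES and fake_get_dict are hypothetical stand-ins for SerpApi responses and search.get_dict(), and the next link is made up. This variant adds the current page first and only then checks for a next link, which is one way to make sure the last page is not skipped.

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical canned responses, keyed by page number
FAKE_PAGES = {
    '1': {
        'organic_results': [{'title': 'A'}, {'title': 'B'}],
        'serpapi_pagination': {
            'next': 'https://serpapi.com/search.json?engine=walmart&page=2&query=coffee+maker'
        },
    },
    '2': {
        'organic_results': [{'title': 'C'}],
        'serpapi_pagination': {},   # no 'next' key on the last page
    },
}


def fake_get_dict(params):
    """Return the canned page for params['page'] (page 1 if not set)."""
    return FAKE_PAGES[params.get('page', '1')]


params = {'engine': 'walmart', 'query': 'coffee maker'}
results = fake_get_dict(params)
collected = []

while True:
    # add data from the current page, including the last one
    collected.extend(results.get('organic_results', []))

    # stop when there is no link to a next page
    next_url = results.get('serpapi_pagination', {}).get('next')
    if not next_url:
        break

    # copy the query parameters of the next link into the search parameters
    params.update(dict(parse_qsl(urlsplit(next_url).query)))
    results = fake_get_dict(params)

titles = [item['title'] for item in collected]
print(titles)
```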

The walmart_results['organic_results'] list is extended with new data from the current page:

# add data from current page
walmart_results['organic_results'].extend(results['organic_results'])

# title = results['organic_results'][0]['title']
# thumbnail = results['organic_results'][0]['thumbnail']
# rating = results['organic_results'][0]['rating']
# reviews = results['organic_results'][0]['reviews']
# price = results['organic_results'][0]['primary_offer']['offer_price']

📌Note: In the comments above, I showed how to extract specific fields. You may have noticed the results['organic_results'][0]. This is the index of a product, which means that we are extracting data from the first product. The results['organic_results'][1] is from the second product and so on.
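
Instead of indexing one product at a time, the same fields can be collected for every product in a loop. The sketch below uses .get() so that a missing field yields None instead of raising a KeyError. The first sample product reuses values from the output in this post, the second one is made up to show the missing-field case, and summarize is a hypothetical helper name.

```python
organic_results = [
    {
        'title': "Nespresso Vertuo Plus Coffee and Espresso Maker by De'Longhi, Black",
        'rating': 4.7,
        'reviews': 1603,
        'primary_offer': {'offer_price': 127},
    },
    # Made-up product without rating, reviews or price
    {'title': 'Hypothetical product', 'primary_offer': {}},
]


def summarize(product):
    """Pick a few fields from one product, tolerating missing keys."""
    return {
        'title': product.get('title'),
        'rating': product.get('rating'),
        'reviews': product.get('reviews'),
        'price': product.get('primary_offer', {}).get('offer_price'),
    }


rows = [summarize(product) for product in organic_results]
for row in rows:
    print(row)
```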

After all the data is retrieved, it is output in JSON format:

print(json.dumps(walmart_results, indent=2, ensure_ascii=False))
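
The ensure_ascii=False argument matters when product titles contain non-ASCII characters. The short sketch below compares both settings on a made-up product (the title is hypothetical): by default, json.dumps escapes such characters, while ensure_ascii=False keeps them readable.

```python
import json

# Made-up product with a non-ASCII character in the title
sample = {'title': 'Café Blend Coffee Maker', 'price': 127}

escaped = json.dumps(sample)                        # default: non-ASCII is escaped
readable = json.dumps(sample, ensure_ascii=False)   # keeps the original characters

print(escaped)
print(readable)
```

Both strings decode to the same data with json.loads, so the choice only affects how the output looks.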

Output

{
  "search_information": {
    "location": {
      "postal_code": "60602",
      "province_code": "IL",
      "city": "Chicago",
      "store_id": "5402"
    },
    "total_results": 152051,
    "query_displayed": "coffee marker",
    "organic_results_state": "Results for exact spelling",
    "spelling_fix": "coffee maker"
  },
  "filters": null,
  "organic_results": [
    {
      "us_item_id": "622343372",
      "product_id": "363IFK4JZENM",
      "title": "Nespresso Vertuo Plus Coffee and Espresso Maker by De'Longhi, Black",
      "thumbnail": "https://i5.walmartimages.com/asr/b80b2bf3-f47c-494d-be9c-bd5b548760f9.b4bcbb88b02aaef77b5df4c697c22ab4.jpeg?odnHeight=180&odnWidth=180&odnBg=FFFFFF",
      "rating": 4.7,
      "reviews": 1603,
      "seller_id": "F55CDC31AB754BB68FE0B39041159D63",
      "seller_name": "Walmart.com",
      "fulfillment_badges": [
        "3+ day shipping"
      ],
      "two_day_shipping": false,
      "out_of_stock": false,
      "sponsored": true,
      "muliple_options_available": false,
      "primary_offer": {
        "offer_id": "8952A2034C634B9C9166D9A720E1DC5B",
        "offer_price": 127,
        "min_price": 0
      },
      "price_per_unit": {
        "unit": "each",
        "amount": ""
      },
      "product_page_url": "https://www.walmart.com/ip/Nespresso-Vertuo-Plus-Coffee-and-Espresso-Maker-by-De-Longhi-Black/622343372?athbdg=L1800",
      "serpapi_product_page_url": "https://serpapi.com/search.json?device=desktop&engine=walmart_product&product_id=622343372"
    },
    ... other results
  ],
  "featured_item": null,
  "related_queries": null
}

📌Note: Head to the playground for a live and interactive demo.

Join us on Twitter | YouTube

Add a Feature Request💫 or a Bug🐞
