#11.0 What XPath: Case Study

Souvik Dey

In this section, we'll explore how XPath is used to solve complex web scraping and testing challenges in real-world scenarios. These case studies demonstrate the power and flexibility of XPath in handling diverse web structures and dynamic content.

Case Study 1: E-commerce Product Catalog Scraping

Scenario:

A large e-commerce platform needs to monitor competitor pricing across thousands of products. The competitor's website uses dynamic loading and has a complex, nested structure for product listings.

Challenge:

  • Product information is loaded dynamically as the user scrolls.

  • Product cards have inconsistent structures due to varying promotional badges and availability statuses.

  • Prices are sometimes displayed as a range or with discounts applied.
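A listing may show either a single price or a range, so the text that XPath extracts still has to be normalized before prices can be compared. Below is a minimal sketch. It assumes US-style formatting such as $1,299.99 or $10.00 - $20.50, and parse_price is a hypothetical helper that is not part of the original solution:

```python
import re
from typing import Optional, Tuple

# Hypothetical format: digits with optional comma thousands separators and decimals
PRICE_RE = re.compile(r"\d[\d,]*(?:\.\d+)?")


def parse_price(text: str) -> Optional[Tuple[float, float]]:
    """Return (low, high) for a displayed price, or None if no number is found.

    A single price yields low == high; a range such as "$10 - $20" yields
    its two endpoints. Comma thousands separators are stripped.
    """
    amounts = [float(m.replace(",", "")) for m in PRICE_RE.findall(text)]
    if not amounts:
        return None
    return min(amounts), max(amounts)


print(parse_price("$1,299.99"))        # (1299.99, 1299.99)
print(parse_price("$10.00 - $20.50"))  # (10.0, 20.5)
print(parse_price("Sold out"))         # None
```

Returning a (low, high) pair lets single prices and ranges share one comparison path.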

Solution:

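from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://competitor-site.com/products")

# Wait for product grid to load
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.XPATH, "//div[@class='product-grid']"))
)

# Keep scrolling until the page height stops growing, i.e. no more products load
# (a single scroll only triggers the next batch of lazy-loaded products)
last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    try:
        # Give the next batch up to 5 seconds to extend the page
        WebDriverWait(driver, 5).until(
            lambda d: d.execute_script("return document.body.scrollHeight") > last_height
        )
    except TimeoutException:
        break
    last_height = driver.execute_script("return document.body.scrollHeight")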

# Complex XPath to handle various product card structures
products = driver.find_elements(By.XPATH, """
    //div[contains(@class, 'product-card')]
    [.//div[contains(@class, 'product-title')] and .//div[contains(@class, 'product-price')]]
""")

for product in products:
    title = product.find_element(By.XPATH, ".//div[contains(@class, 'product-title')]").text

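    # Handle regular and discounted prices. A union returns nodes in document
    # order, so [last()] selects whichever price span comes last in the markup;
    # this assumes a discounted price, when present, follows the regular one.
    # find_elements() returns an empty list instead of raising if neither exists.
    price_elements = product.find_elements(By.XPATH, """
        (.//div[contains(@class, 'product-price')]//span[contains(@class, 'discounted')] |
         .//div[contains(@class, 'product-price')]//span[contains(@class, 'regular')])[last()]
    """)
    price = price_elements[0].text if price_elements else "N/A"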

    # An out-of-stock badge means the product is unavailable
    availability = "Out of Stock" if product.find_elements(By.XPATH, ".//span[contains(@class, 'out-of-stock')]") else "In Stock"

    print(f"Product: {title}, Price: {price}, Availability: {availability}")

driver.quit()

Key XPath Techniques Used:

  1. Complex predicates to handle varying card structures

  2. Use of contains() for class matching to handle dynamic classes

  3. XPath unions (|) to handle different price display scenarios

  4. Relative XPath (.//) for navigating within each product card
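The contains(@class, 'product-card') test in the solution is a substring match. It would therefore also select an element whose only class is, say, product-card-badge. A common XPath 1.0 idiom avoids this: pad the normalized class attribute with spaces, then search for ' product-card ' as a whole token. The sketch below builds that predicate with has_class(), a hypothetical helper. The two plain-Python functions only mirror what each predicate evaluates, so the difference can be checked without a browser:

```python
def has_class(name: str) -> str:
    """Return an XPath 1.0 predicate body matching `name` as a whole class token."""
    # Reject names that cannot be a single class token or would break the quoting
    if not name or any(ch.isspace() for ch in name) or "'" in name:
        raise ValueError(f"not a single class name: {name!r}")
    return f"contains(concat(' ', normalize-space(@class), ' '), ' {name} ')"


def substring_match(class_attr: str, name: str) -> bool:
    """Mirror of contains(@class, name): a plain substring test."""
    return name in class_attr


def token_match(class_attr: str, name: str) -> bool:
    """Mirror of the has_class() predicate (whitespace collapsing approximated by str.split)."""
    return f" {name} " in f" {' '.join(class_attr.split())} "


card_xpath = f"//div[{has_class('product-card')}]"
print(card_xpath)
print(substring_match("product-card-badge", "product-card"))  # True: a false positive
print(token_match("product-card-badge", "product-card"))      # False
print(token_match("sale  product-card", "product-card"))      # True
```

The same predicate could stand in for contains() in the card, title, and price paths. The cost is a longer expression; the benefit is that look-alike class names no longer match.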

Case Study 2: Social Media Dashboard Testing

Scenario:

A social media management tool needs to test its dashboard, which displays real-time analytics from various platforms. The dashboard uses Shadow DOM for encapsulation and has multiple nested components.

Challenge:

  • Elements live inside a Shadow DOM, so document-level XPath and CSS queries cannot reach them

  • Data is updated dynamically and may take varying times to load

  • The layout adjusts based on the user's subscribed features
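Both waits in the solution use WebDriverWait. It re-evaluates a condition repeatedly (every 0.5 seconds by default), treats NoSuchElementException as "not ready yet", and either returns the first truthy result or raises TimeoutException. The stdlib sketch below reproduces that polling pattern so its behavior can be observed without a browser. wait_until and its parameters are illustrative names, not part of Selenium, and LookupError stands in for NoSuchElementException:

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def wait_until(
    condition: Callable[[], T],
    timeout: float = 10.0,
    poll: float = 0.5,
    ignored: Tuple[Type[BaseException], ...] = (),
) -> T:
    """Call `condition` until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            value = condition()
            if value:
                return value
        except ignored:
            pass  # treat ignored exceptions as "not ready yet" and keep polling
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)


# Simulated dashboard widget whose data arrives on the third poll
attempts = {"count": 0}


def read_follower_text() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise LookupError("widget not rendered yet")  # stand-in for NoSuchElementException
    return "12,345"


print(wait_until(read_follower_text, timeout=2, poll=0.01, ignored=(LookupError,)))  # 12,345
```

A fixed sleep would need to cover the slowest widget every time. Polling returns as soon as the data is present, which suits a dashboard whose components load at different speeds.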

Solution:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://dashboard.socialmediatool.com")

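# Helper to step into an open shadow root. XPath (document.evaluate) cannot cross a
# shadow boundary, so CSS querySelector is run inside the shadow root via JavaScript.
# Returns None if the host has no open shadow root or nothing matches.
def query_shadow_root(host, selector):
    return driver.execute_script(
        "const root = arguments[0].shadowRoot;"
        " return root ? root.querySelector(arguments[1]) : null;",
        host, selector,
    )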

# Wait for the main dashboard container to load
dashboard = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.ID, "dashboard-root"))
)

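# Navigate through Shadow DOM to find analytics cards
analytics_host = query_shadow_root(dashboard, "#analytics-container")
if analytics_host is None:
    raise RuntimeError("#analytics-container not found in the dashboard's shadow root")
cards = analytics_host.find_elements(By.XPATH, ".//div[contains(@class, 'analytics-card')]")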

for card in cards:
    # Extract platform name
    platform = card.find_element(By.XPATH, ".//h3[contains(@class, 'platform-name')]").text

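    # Wait until the follower count is present AND has non-blank text;
    # the [normalize-space()] predicate skips spans that are still empty
    follower_element = WebDriverWait(card, 5).until(
        EC.presence_of_element_located(
            (By.XPATH, ".//span[contains(@class, 'follower-count')][normalize-space()]")
        )
    )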
    followers = follower_element.text

    # Check for growth indicator
    growth_elements = card.find_elements(By.XPATH, ".//span[contains(@class, 'growth-indicator')]")
    growth = growth_elements[0].text if growth_elements else "N/A"

    print(f"Platform: {platform}, Followers: {followers}, Growth: {growth}")

driver.quit()

Key XPath Techniques Used:

  1. Combining JavaScript execution with XPath to handle Shadow DOM

  2. Using contains() for class names to handle dynamic classes

  3. Relative XPath within each card for extracting specific data

  4. find_elements() with XPath to detect an optional element (the growth indicator) without raising an exception
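The loop prints follower counts as raw text. To assert on them in a test, for example that a count is non-zero, the display string has to become a number. The helper below is hypothetical. It assumes the dashboard renders counts either as plain digits with comma separators or with K/M/B abbreviations, so the pattern should be adjusted to the real format:

```python
import re

# Hypothetical display format: "12,345", "12.4K", "1.2M", "3B" (suffix case-insensitive)
_COUNT_RE = re.compile(r"^\s*(\d[\d,]*(?:\.\d+)?)\s*([KMB]?)\s*$", re.IGNORECASE)
_MULTIPLIER = {"": 1, "K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def parse_count(text: str) -> int:
    """Convert a displayed follower count into an integer."""
    match = _COUNT_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognized count: {text!r}")
    number, suffix = match.groups()
    return round(float(number.replace(",", "")) * _MULTIPLIER[suffix.upper()])


print(parse_count("12,345"))  # 12345
print(parse_count("12.4K"))   # 12400
print(parse_count("1.2m"))    # 1200000
```

Raising ValueError on unexpected text, such as a placeholder like "N/A", makes a layout change fail the test visibly instead of passing silently.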

Case Study 3: Dynamic Form Validation in a CMS

Scenario:

A Content Management System (CMS) needs to test its dynamic form builder feature, which allows users to create custom forms with various field types and validation rules.

Challenge:

  • Form structure is not fixed and can vary based on user configuration

  • Validation rules are applied dynamically based on user input

  • Error messages appear in different locations depending on the field type

Solution:

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://cms-example.com/form-builder")

# Wait for form to load
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.XPATH, "//form[@id='dynamic-form']"))
)

# Fill a field and check that the expected validation message appears
def test_field(field_xpath, test_value, error_message):
    field = driver.find_element(By.XPATH, field_xpath)
    field.clear()
    field.send_keys(test_value)

    # Tab out of the field to trigger blur-based validation
    # (clicking <body> can land on another element and fire unrelated handlers)
    field.send_keys(Keys.TAB)

    # Union of the likely error locations: a following sibling, any child of the
    # field's parent, or anywhere inside the enclosing field wrapper.
    # [last()] picks the last match in document order.
    error_xpath = f"""
        ({field_xpath}/following-sibling::*[contains(@class, 'error')] |
         {field_xpath}/../*[contains(@class, 'error')] |
         {field_xpath}/ancestor::div[contains(@class, 'field-wrapper')]//*[contains(@class, 'error')])[last()]
    """

    try:
        error_element = WebDriverWait(driver, 5).until(
            EC.presence_of_element_located((By.XPATH, error_xpath))
        )
        actual_error = error_element.text
        assert actual_error == error_message, f"Expected '{error_message}', but got '{actual_error}'"
        print(f"Validation passed for {field_xpath}")
    except TimeoutException:
        print(f"Validation failed for {field_xpath}: no error message appeared")
    except AssertionError as exc:
        print(f"Validation failed for {field_xpath}: {exc}")

# Test various field types
test_field("//input[@name='email']", "invalid-email", "Please enter a valid email address")
test_field("//input[@name='phone']", "123", "Phone number must be at least 10 digits")
test_field("//textarea[@name='description']", "a" * 501, "Description cannot exceed 500 characters")

# Test a dynamically added field (normalize-space() tolerates stray whitespace in the label)
add_field_button = driver.find_element(By.XPATH, "//button[normalize-space()='Add Custom Field']")
add_field_button.click()

WebDriverWait(driver, 5).until(
    EC.presence_of_element_located((By.XPATH, "//input[contains(@name, 'custom-field')]"))
)

test_field("//input[contains(@name, 'custom-field')]", "", "This field is required")

driver.quit()

Key XPath Techniques Used:

  1. Dynamic XPath construction for error message location

  2. Use of XPath axes (following-sibling, ancestor) to handle varying error message placements

  3. XPath unions to check multiple possible locations

  4. contains() on the name attribute to match dynamically named fields
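test_field builds its error XPath with an f-string. That works for the fixed field paths used here. Embedding free text between quotes is different: an expected message such as Can't be blank breaks the expression as soon as it contains the same quote character, and XPath 1.0 has no escape sequence for quotes. The usual workaround is to switch to the other quote type or, when both appear, to stitch the pieces together with concat(). xpath_literal() below is a hypothetical helper that sketches this approach:

```python
def xpath_literal(value: str) -> str:
    """Return an XPath 1.0 expression that evaluates to the string `value`."""
    if "'" not in value:
        return f"'{value}'"
    if '"' not in value:
        return f'"{value}"'
    # Both quote types present: quote each apostrophe-free chunk with single quotes
    # and insert each apostrophe as a double-quoted "'" between them
    parts = value.split("'")
    pieces = []
    for index, part in enumerate(parts):
        if part:
            pieces.append(f"'{part}'")
        if index < len(parts) - 1:
            pieces.append('"\'"')
    return "concat(" + ", ".join(pieces) + ")"


message = "Can't be blank"
print(f"//*[contains(@class, 'error')][normalize-space()={xpath_literal(message)}]")
# //*[contains(@class, 'error')][normalize-space()="Can't be blank"]
print(xpath_literal('Say "can\'t"'))
# concat('Say "can', "'", 't"')
```

The concat() branch only runs for text containing both quote types. Typical messages therefore stay as simple quoted literals.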

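These case studies show how XPath handles complex scraping and testing scenarios. In all three, a small set of features does most of the work: relative paths scoped to a container, contains() for unstable class and attribute values, axes for moving between related elements, and unions for alternative locations. Combined with explicit waits, and with a JavaScript step into the shadow root where Shadow DOM is involved, they cope with dynamic content, inconsistent structures, and varying page layouts.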