#8.0 What XPath: Hands-on

Souvik Dey

In this section, we'll work through some challenging XPath problems using a complex HTML structure. These exercises will help you apply the concepts we've discussed and improve your XPath skills.

8.1 The HTML Structure

First, let's look at the HTML we'll be working with. Save this as advanced-xpath-practice.html and open it in your browser:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Advanced XPath Practice</title>
    <style>
        body { font-family: Arial, sans-serif; line-height: 1.6; padding: 20px; }
        table { border-collapse: collapse; width: 100%; margin-bottom: 20px; }
        th, td { border: 1px solid #ddd; padding: 8px; text-align: left; }
        th { background-color: #f2f2f2; }
        .highlight { background-color: #ffffd0; }
        .level-1 { border: 1px solid #ddd; padding: 10px; margin-bottom: 10px; }
        .level-2 { border: 1px solid #bbb; padding: 8px; margin: 5px 0; }
        .level-3 { border: 1px solid #999; padding: 6px; margin: 3px 0; }
        .target { font-weight: bold; color: blue; }
        .exclude { opacity: 0.5; }
        .event { border: 1px solid #ddd; padding: 10px; margin-bottom: 10px; }
        .special { font-style: italic; }
    </style>
</head>
<body>
    <div id="current-date" data-value="2023-09-15">Current Date: September 15, 2023</div>

    <h2>Complex Table</h2>
    <table id="data">
        <thead>
            <tr>
                <th>Status</th>
                <th>Name</th>
                <th>Department</th>
                <th>Salary</th>
                <th>Start Date</th>
            </tr>
        </thead>
        <tbody>
            <tr>
                <td>Active</td>
                <td>John Doe</td>
                <td>IT</td>
                <td>$75,000</td>
                <td>2021-03-15</td>
            </tr>
            <tr class="highlight">
                <td>Inactive</td>
                <td>Jane Smith</td>
                <td>HR</td>
                <td>$65,000</td>
                <td>2019-07-22</td>
            </tr>
            <tr>
                <td>Active</td>
                <td>Bob Johnson</td>
                <td>Sales</td>
                <td>$80,000</td>
                <td>2022-11-30</td>
            </tr>
            <tr class="highlight">
                <td>Active</td>
                <td>Alice Brown</td>
                <td>Marketing</td>
                <td>$70,000</td>
                <td>2023-01-10</td>
            </tr>
            <tr>
                <td>Inactive</td>
                <td>Charlie Wilson</td>
                <td>Finance</td>
                <td>$90,000</td>
                <td>2018-05-03</td>
            </tr>
        </tbody>
    </table>

    <h2>Nested Structure</h2>
    <div class="level-1">
        <h3>Level 1 - A</h3>
        <div class="level-2">
            <h4>Level 2 - A1</h4>
            <div class="level-3">
                <h5>Level 3 - A1a</h5>
                <span class="target">Target Content 1</span>
            </div>
            <div class="level-3 exclude">
                <h5>Level 3 - A1b</h5>
                <span class="target">Excluded Target Content</span>
            </div>
            <div class="level-3">
                <h5>Level 3 - A1c</h5>
                <span class="target">Target Content 2</span>
            </div>
        </div>
    </div>
    <div class="level-1">
        <h3>Level 1 - B</h3>
        <div class="level-2">
            <h4>Level 2 - B1</h4>
            <div class="level-3">
                <h5>Level 3 - B1a</h5>
                <span class="target">Target Content 3</span>
            </div>
            <div class="level-3">
                <h5>Level 3 - B1b</h5>
                <span class="target">Target Content 4</span>
            </div>
        </div>
    </div>

    <h2>Dynamic Content</h2>
    <div id="event-list">
        <div class="event" data-date="2023-07-15">
            <h3>Conference A</h3>
            <p class="special">Special Event</p>
            <span>Date: July 15, 2023</span>
        </div>
        <div class="event" data-date="2023-08-22">
            <h3>Workshop B</h3>
            <span>Date: August 22, 2023</span>
        </div>
        <div class="event" data-date="2023-10-05">
            <h3>Seminar C</h3>
            <p class="special">Special Event</p>
            <span>Date: October 5, 2023</span>
        </div>
        <div class="event" data-date="2023-11-18">
            <h3>Webinar D</h3>
            <span>Date: November 18, 2023</span>
        </div>
        <div class="event" data-date="2024-01-20">
            <h3>Symposium E</h3>
            <p class="special">Special Event</p>
            <span>Date: January 20, 2024</span>
        </div>
    </div>
</body>
</html>

8.2 XPath Challenges

Now, let's tackle some challenging XPath problems using this HTML structure. Try to solve these on your own before looking at the solutions.

Challenge 1: Complex Table Navigation

Find all 'Active' employees who started after June 2022 and have a salary greater than $70,000.

Challenge 2: Nested Structure Navigation

Among the 'target' spans that are not inside a div with the 'exclude' class, select the second one. (It sits in the third 'level-3' div on the page.)

Challenge 3: Dynamic Content Filtering

Find all special events that occur after the current date (2023-09-15 in this example) and before the year 2024.

Challenge 4: Attribute Manipulation

Find all events where the month in the data-date attribute is an odd number.

Challenge 5: Complex Conditional Selection

Select all table rows where the employee is either:

  • Active with a salary above $75,000, or

  • Inactive with a start date before 2020

8.3 Solutions and Explanations

Solution 1: Complex Table Navigation

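//table[@id='data']//tr[td[1][text()='Active'] and 
                        number(translate(td[5], '-', '')) > 20220630 and 
                        number(translate(td[4], '$,', '')) > 70000]

Explanation:

  • Starts with the table having id 'data'

  • Selects rows where:

    • First column (status) is 'Active'

    • Start date (5th column), with the hyphens removed and read as a number in YYYYMMDD form, is greater than 20220630, i.e. after June 2022

    • Salary (4th column), with '$' and ',' removed, is greater than 70,000

  • The date is compared as a single number because checking the year and month separately (year >= 2022 and month > 6) would wrongly reject dates such as 2023-01-10

  • In the sample table, only Bob Johnson's row matches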
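As a cross-check on the expected result, the three conditions of this challenge can be mirrored outside the browser. This is a minimal Python sketch, not part of the original exercise: the rows are copied by hand from the practice table, and `strip_chars` is a hypothetical helper that imitates `translate(value, chars, '')`.

```python
# Rows copied from the practice table: (status, name, department, salary, start date).
ROWS = [
    ("Active", "John Doe", "IT", "$75,000", "2021-03-15"),
    ("Inactive", "Jane Smith", "HR", "$65,000", "2019-07-22"),
    ("Active", "Bob Johnson", "Sales", "$80,000", "2022-11-30"),
    ("Active", "Alice Brown", "Marketing", "$70,000", "2023-01-10"),
    ("Inactive", "Charlie Wilson", "Finance", "$90,000", "2018-05-03"),
]


def strip_chars(text, chars):
    """Hypothetical helper: imitate translate(text, chars, '') by dropping `chars`."""
    return "".join(ch for ch in text if ch not in chars)


def matches(row):
    status, _name, _dept, salary, start = row
    start_num = int(strip_chars(start, "-"))     # "2022-11-30" -> 20221130
    salary_num = int(strip_chars(salary, "$,"))  # "$80,000" -> 80000
    # Active, started after June 2022, salary above 70,000.
    return status == "Active" and start_num > 20220630 and salary_num > 70000


selected = [row[1] for row in ROWS if matches(row)]
print(selected)  # ['Bob Johnson']
```

Alice Brown started after June 2022 but earns exactly $70,000, so the strict salary comparison leaves her out.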

Solution 2: Nested Structure Navigation

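(//div[contains(@class, 'level-3')][not(contains(@class, 'exclude'))]//span[@class='target'])[2]

Explanation:

  • Collects every 'level-3' div that doesn't have the 'exclude' class

  • Gathers the 'target' spans inside those divs

  • The parentheses make [2] apply to that whole collected set, so it picks the second non-excluded target overall: 'Target Content 2', which sits in the third 'level-3' div (A1c)

  • Without the parentheses, span[@class='target'][2] would look for a second 'target' span inside a single div and match nothing, because each div holds only one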
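The difference between a position counted per parent and a position counted over the whole result set can be tried outside the browser. This is a rough Python sketch using the standard library's `xml.etree.ElementTree`, which supports only a small XPath subset. The snippet is a trimmed, hand-copied version of the nested structure (headings omitted), and the whole-set position is taken by indexing the Python list, since that subset has no parenthesized `(...)[n]` form.

```python
import xml.etree.ElementTree as ET

# Trimmed-down copy of the nested structure; headings are left out for brevity.
SNIPPET = """
<body>
  <div class="level-2">
    <div class="level-3"><span class="target">Target Content 1</span></div>
    <div class="level-3 exclude"><span class="target">Excluded Target Content</span></div>
    <div class="level-3"><span class="target">Target Content 2</span></div>
  </div>
  <div class="level-2">
    <div class="level-3"><span class="target">Target Content 3</span></div>
    <div class="level-3"><span class="target">Target Content 4</span></div>
  </div>
</body>
"""

root = ET.fromstring(SNIPPET)

# Per-parent position: span[2] asks for the second span inside each div.
# Every div holds a single span, so nothing matches.
per_parent = root.findall(".//div[@class='level-3']/span[2]")

# Whole-set position: collect all matching spans first, then index the list.
# The exact match on class='level-3' already skips the 'level-3 exclude' div.
all_targets = root.findall(".//div[@class='level-3']/span[@class='target']")
second_overall = all_targets[1]  # Python lists are 0-based, XPath positions 1-based

print(len(per_parent))      # 0
print(second_overall.text)  # Target Content 2
```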

Solution 3: Dynamic Content Filtering

//div[@class='event'][p[@class='special'] and 
                      translate(substring(@data-date, 1, 10), '-', '') > '20230915' and 
                      translate(substring(@data-date, 1, 4), '-', '') < '2024']

Explanation:

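  • Selects 'event' divs that:

    • Have a paragraph with class 'special'

    • Have a date after 2023-09-15 (the hyphens are stripped, and XPath 1.0 converts both sides of > and < to numbers, so the check becomes, for example, 20231005 > 20230915)

    • Have a year before 2024

  • In the sample page, only 'Seminar C' matches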
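To see which events these three conditions should return, the numeric comparison can be mirrored in a few lines. This is only a sketch in Python: the event data is copied by hand from the page, and `date_number` is a hypothetical helper standing in for the strip-hyphens-then-convert-to-number step.

```python
# (title, data-date, has a <p class="special"> child), copied from the event list.
EVENTS = [
    ("Conference A", "2023-07-15", True),
    ("Workshop B", "2023-08-22", False),
    ("Seminar C", "2023-10-05", True),
    ("Webinar D", "2023-11-18", False),
    ("Symposium E", "2024-01-20", True),
]

CURRENT_DATE = "2023-09-15"  # the data-value of the #current-date div


def date_number(data_date):
    """Hypothetical helper: '2023-10-05' -> 20231005, as in the numeric comparison."""
    return int(data_date.replace("-", ""))


upcoming_special = [
    title
    for title, data_date, special in EVENTS
    if special
    and date_number(data_date) > date_number(CURRENT_DATE)
    and int(data_date[:4]) < 2024  # first four characters are the year
]
print(upcoming_special)  # ['Seminar C']
```

Webinar D falls in the date window but is not a special event, and Symposium E is special but takes place in 2024.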

Solution 4: Attribute Manipulation

//div[@class='event'][number(substring(@data-date, 6, 2)) mod 2 = 1]

Explanation:

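  • Selects 'event' divs where:

    • The month (characters 6-7 in the date, counting from 1) is odd

  • In the sample page this matches Conference A (07), Webinar D (11), and Symposium E (01)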
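The substring arithmetic is easy to get wrong by one position, because XPath counts characters from 1 while most programming languages count from 0. A small Python sketch of the same check follows. The data is copied by hand from the page, and `month_of` is a hypothetical helper.

```python
# (title, data-date) pairs copied from the event list.
EVENTS = [
    ("Conference A", "2023-07-15"),
    ("Workshop B", "2023-08-22"),
    ("Seminar C", "2023-10-05"),
    ("Webinar D", "2023-11-18"),
    ("Symposium E", "2024-01-20"),
]


def month_of(data_date):
    """Hypothetical helper: mirror number(substring(@data-date, 6, 2)).

    XPath positions 6-7 (1-based) correspond to the Python slice [5:7] (0-based).
    """
    return int(data_date[5:7])


odd_month_events = [title for title, data_date in EVENTS if month_of(data_date) % 2 == 1]
print(odd_month_events)  # ['Conference A', 'Webinar D', 'Symposium E']
```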

Solution 5: Complex Conditional Selection

//table[@id='data']//tr[
    (td[1][text()='Active'] and number(translate(td[4], '$,', '')) > 75000) or 
    (td[1][text()='Inactive'] and number(translate(substring(td[5], 1, 4), '-', '')) < 2020)
]

Explanation:

  • Selects rows where either:

    • Status is 'Active' and salary is above $75,000, or

    • Status is 'Inactive' and start date is before 2020
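The two branches of the `or` can be checked against the table by hand or with a short script. Below is a minimal Python sketch of the same logic. The rows are copied by hand from the practice table, and `salary_number` is a hypothetical helper that imitates `number(translate(td[4], '$,', ''))`.

```python
# Rows copied from the practice table: (status, name, salary, start date).
ROWS = [
    ("Active", "John Doe", "$75,000", "2021-03-15"),
    ("Inactive", "Jane Smith", "$65,000", "2019-07-22"),
    ("Active", "Bob Johnson", "$80,000", "2022-11-30"),
    ("Active", "Alice Brown", "$70,000", "2023-01-10"),
    ("Inactive", "Charlie Wilson", "$90,000", "2018-05-03"),
]


def salary_number(salary):
    """Hypothetical helper: '$80,000' -> 80000."""
    return int(salary.replace("$", "").replace(",", ""))


def matches(row):
    status, _name, salary, start = row
    start_year = int(start[:4])  # first four characters of the start date
    active_high_paid = status == "Active" and salary_number(salary) > 75000
    inactive_pre_2020 = status == "Inactive" and start_year < 2020
    return active_high_paid or inactive_pre_2020


selected = [row[1] for row in ROWS if matches(row)]
print(selected)  # ['Jane Smith', 'Bob Johnson', 'Charlie Wilson']
```

John Doe is Active but earns exactly $75,000, so the strict comparison excludes him.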

8.4 Testing Your XPath Expressions

To test these XPath expressions, you can use browser developer tools or online XPath testers. Here's how to use Chrome DevTools:

  1. Open the HTML file in Chrome

  2. Right-click anywhere on the page and select "Inspect"

  3. In the Console tab, use $x("your_xpath_here") to test your XPath

  4. The matching elements will be returned in an array

Example:

$x("//table[@id='data']//tr[td[1][text()='Active']]")

This returns the three rows whose status is 'Active' (John Doe, Bob Johnson, and Alice Brown).

8.5 Conclusion

These challenges demonstrate the power and flexibility of XPath in handling complex document structures and conditions. Practice with these examples to improve your XPath skills, and don't hesitate to experiment with variations of these expressions to deepen your understanding.

Remember, while complex XPath expressions can be powerful, they can also be hard to maintain. In real-world scenarios, strive for a balance between specificity and readability in your locators.
