Understanding XPath: A Comprehensive Guide

XPath provides a powerful way to navigate and select nodes in an XML or HTML document. This section walks through a range of XPath selectors, starting with the most basic and progressing to more complex combinations.

3.1 Basic Selectors

3.1.1 Select by Element Name

The most basic XPath selector simply selects all elements of a given type.

//div

This selects all <div> elements in the document.

3.1.2 Select by ID

To select an element with a specific ID:

//*[@id='username']

This selects any element with id="username".
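As a minimal sketch of how these two selectors might be run, the following uses Python's standard-library xml.etree.ElementTree, which implements only a limited subset of XPath. The sample markup and variable names are assumptions made for illustration. ElementTree paths are evaluated relative to an element, so the expressions begin with .// rather than //.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for illustration.
doc = ET.fromstring(
    "<html><body>"
    "<div><input id='username' type='text'/></div>"
    "<div><p>Hello</p></div>"
    "</body></html>"
)

# Counterpart of //div: every <div> below the root element.
divs = doc.findall(".//div")

# Counterpart of //*[@id='username']: any element with that id.
user_field = doc.find(".//*[@id='username']")

print(len(divs))       # 2
print(user_field.tag)  # input
```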

3.1.3 Select by Class

To select elements with a specific class:

//*[@class='highlight']

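This selects all elements whose class attribute is exactly "highlight". An element with several classes, such as class="highlight large", does not match, because the comparison is made against the whole attribute string.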
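The sketch below illustrates the difference between comparing the whole class attribute and looking for one class name among several. It uses Python's standard-library xml.etree.ElementTree, and the sample markup is an assumption made for illustration. ElementTree has no contains() function, so the token test is written in plain Python; in XPath 1.0 the usual idiom for the same test is //*[contains(concat(' ', normalize-space(@class), ' '), ' highlight ')].

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup, invented for illustration.
doc = ET.fromstring(
    "<body>"
    "<p class='highlight'>one</p>"
    "<p class='highlight large'>two</p>"
    "<p class='highlighted'>three</p>"
    "</body>"
)

# Exact comparison: only the element whose class attribute is
# the single string 'highlight' matches.
exact = [p.text for p in doc.findall(".//*[@class='highlight']")]

# Token comparison done in Python: split the attribute on whitespace
# and look for 'highlight' among the individual class names.
by_token = [
    el.text
    for el in doc.iter()
    if "highlight" in el.get("class", "").split()
]

print(exact)     # ['one']
print(by_token)  # ['one', 'two']
```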

3.2 Attribute Selectors

3.2.1 Exact Attribute Match

//input[@type='text']

This selects all <input> elements where the type attribute is exactly 'text'.

3.2.2 Attribute Contains

//div[contains(@class, 'user')]

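This selects all <div> elements whose class attribute contains the substring 'user'. Because contains() is a substring test rather than a word test, it also matches class names such as 'superuser' or 'username'.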

3.2.3 Starts-with

//a[starts-with(@href, 'https://')]

This selects all <a> elements where the href attribute starts with 'https://'.

3.3 Text Content Selectors

3.3.1 Exact Text Match

//button[text()='Submit']

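This selects all <button> elements that have a text node equal to 'Submit'. The comparison is exact, so surrounding whitespace or a different letter case prevents a match.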

3.3.2 Contains Text

//p[contains(text(), 'Welcome')]

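This selects <p> elements whose text contains 'Welcome'. In XPath 1.0, contains(text(), ...) examines only the first text node of the element; use contains(., 'Welcome') to search the element's full text, including text inside child elements.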
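The difference between text() and the element's full string value can be illustrated in plain Python. This is a sketch under stated assumptions: the paragraph markup is invented, and because the standard-library xml.etree.ElementTree has no contains() function, the two substring tests are emulated with Python's in operator.

```python
import xml.etree.ElementTree as ET

# Hypothetical paragraph with mixed content, invented for illustration.
p = ET.fromstring("<p>Hello <b>and</b> Welcome back</p>")

first_text_node = p.text or ""        # text before the first child element
string_value = "".join(p.itertext())  # all descendant text, concatenated

# XPath 1.0 contains(text(), 'Welcome') converts the node-set to the
# string value of its first node, so only the first text node is searched.
print("Welcome" in first_text_node)  # False

# contains(., 'Welcome') searches the element's whole string value.
print("Welcome" in string_value)     # True
```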

3.4 Position and Index Selectors

3.4.1 Select by Index

(//tr)[3]

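This selects the third <tr> element in the document. The parentheses matter: without them, //tr[3] selects every <tr> that is the third <tr> child of its parent.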

3.4.2 Select Last Element

(//li)[last()]

This selects the last <li> element in the document. Without the parentheses, //li[last()] selects the last <li> of each parent.

3.4.3 Select by Position

//ul/li[position() < 3]

This selects the first two <li> children of each <ul> in the document.
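Position predicates are evaluated relative to each parent unless the path is wrapped in parentheses. The sketch below shows that behaviour with Python's standard-library xml.etree.ElementTree, which supports [last()] and integer indexes but not parenthesised paths or the position() function; those two cases are therefore emulated in Python. The sample lists are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with two lists, invented for illustration.
doc = ET.fromstring(
    "<body>"
    "<ul><li>a1</li><li>a2</li><li>a3</li></ul>"
    "<ul><li>b1</li><li>b2</li></ul>"
    "</body>"
)

# //li[last()]: the predicate is evaluated per parent, so each <ul>
# contributes its own last <li>.
last_per_list = [li.text for li in doc.findall(".//li[last()]")]

# (//li)[last()]: the parentheses build one document-wide node-set first.
# ElementTree has no parenthesised form, so index the full list in Python.
last_overall = doc.findall(".//li")[-1].text

# //ul/li[position() < 3]: the first two items of every list, emulated
# with a slice because ElementTree lacks the position() function.
first_two = [li.text for ul in doc.iter("ul") for li in ul.findall("li")[:2]]

print(last_per_list)  # ['a3', 'b2']
print(last_overall)   # b2
print(first_two)      # ['a1', 'a2', 'b1', 'b2']
```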

3.5 Logical Operators and Functions

3.5.1 AND Operator

//input[@type='checkbox' and @checked]

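This selects all checkbox inputs that have a checked attribute. XPath tests the attributes present in the document, not the live state of a form control in a browser.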

3.5.2 OR Operator

//button[@type='submit' or @type='button']

This selects all <button> elements whose type attribute is either 'submit' or 'button'.

3.5.3 Not Function

//div[not(@class)]

This selects all <div> elements that don't have a class attribute.
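The three logical forms above can be sketched with Python's standard-library xml.etree.ElementTree. That module has no and, or, or not() operators, so this example rests on some assumptions: chained predicates stand in for and (equivalent here because neither predicate is positional), while or and not() are emulated with ordinary Python conditions. The form markup is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical form markup, invented for illustration.
doc = ET.fromstring(
    "<form>"
    "<input type='checkbox' name='a' checked='checked'/>"
    "<input type='checkbox' name='b'/>"
    "<input type='text' name='c' checked='checked'/>"
    "<button type='submit'>Save</button>"
    "<button type='reset'>Clear</button>"
    "<div class='row'/><div/>"
    "</form>"
)

# AND: both chained predicates must hold, which gives the same result
# as //input[@type='checkbox' and @checked].
checked = [
    i.get("name")
    for i in doc.findall(".//input[@type='checkbox'][@checked]")
]

# OR: emulated by testing the attribute against a set of accepted values.
buttons = [
    b.text for b in doc.iter("button")
    if b.get("type") in {"submit", "button"}
]

# NOT: //div[not(@class)] keeps elements where the attribute is absent.
no_class = [d for d in doc.iter("div") if d.get("class") is None]

print(checked)        # ['a']
print(buttons)        # ['Save']
print(len(no_class))  # 1
```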

3.6 Axes

3.6.1 Parent

//input[@id='email']/parent::div

This selects the parent of the input with id="email", provided that parent is a <div>.

3.6.2 Ancestor

//span[@class='error']/ancestor::form

This selects the <form> ancestor of a span with class="error".

3.6.3 Following-sibling

//label[@for='username']/following-sibling::input

This selects every <input> sibling that comes after the label for 'username' under the same parent. To take only the nearest one, append [1]: following-sibling::input[1].
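To make the direction of each axis concrete, the sketch below walks the same relationships by hand in Python. It assumes the standard-library xml.etree.ElementTree, which does not implement the named axes, so parent, ancestor, and following-sibling are emulated with a parent lookup table. The form markup and the parent_of name are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical form markup, invented for illustration.
doc = ET.fromstring(
    "<form id='login'>"
    "<div id='row'>"
    "<label for='username'>User</label>"
    "<input id='username'/>"
    "<input id='remember'/>"
    "</div>"
    "</form>"
)

# ElementTree elements do not store a parent link, so build a lookup table.
parent_of = {child: parent for parent in doc.iter() for child in parent}

label = doc.find(".//label[@for='username']")

# parent:: axis: the single element that directly contains the label.
parent = parent_of[label]

# ancestor:: axis: walk upward until the root is reached.
ancestors = []
node = label
while node in parent_of:
    node = parent_of[node]
    ancestors.append(node.tag)

# following-sibling::input: every later <input> child of the same parent,
# not just the immediately adjacent one.
siblings = list(parent)
following = [
    s.get("id")
    for s in siblings[siblings.index(label) + 1:]
    if s.tag == "input"
]

print(parent.get("id"))  # row
print(ancestors)         # ['div', 'form']
print(following)         # ['username', 'remember']
```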

3.7 Advanced Functions

3.7.1 Count

//ul[count(li) > 5]

This selects all <ul> elements that have more than 5 <li> children.

3.7.2 Sum

//div[@class='order'][sum(./span[@class='item-price']) > 100]

This selects <div> elements with class 'order' where the sum of the prices (assumed to be in <span> elements with class 'item-price') is greater than 100.
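The following sketch emulates count() and sum() in Python to show how non-numeric text affects the comparison. It assumes the standard-library xml.etree.ElementTree, which lacks both functions; the order markup and the xpath_number helper are hypothetical. In XPath, a price such as '$120' converts to NaN, the sum becomes NaN, and any comparison with NaN is false.

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical order markup, invented for illustration.
doc = ET.fromstring(
    "<body>"
    "<div class='order' id='o1'>"
    "<span class='item-price'>60</span><span class='item-price'>45.5</span>"
    "</div>"
    "<div class='order' id='o2'>"
    "<span class='item-price'>$120</span>"
    "</div>"
    "</body>"
)

def xpath_number(text):
    """Rough stand-in for XPath number(): NaN when the text is not numeric."""
    try:
        return float(text.strip())
    except ValueError:
        return math.nan

big_orders = []
for order in doc.findall(".//div[@class='order']"):
    prices = order.findall("span[@class='item-price']")  # len() plays count()
    total = sum(xpath_number(p.text or "") for p in prices)
    # Any comparison with NaN is false, so unparsable prices drop the order.
    if total > 100:
        big_orders.append((order.get("id"), len(prices), total))

print(big_orders)  # [('o1', 2, 105.5)]
```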

3.7.3 Normalize-space

//p[normalize-space(text()) = 'Hello World']

This selects <p> elements whose text is "Hello World" after leading and trailing whitespace is removed and internal runs of whitespace are collapsed to a single space.
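What normalize-space() does to a string can be sketched in a few lines of Python. The normalize_space function below is a hypothetical stand-in written for illustration, not part of any XPath library; it treats space, tab, carriage return, and line feed as whitespace, as XPath does.

```python
import re

def normalize_space(value):
    """Mimic XPath normalize-space(): collapse whitespace runs, then trim."""
    # XPath treats space, tab, carriage return and line feed as whitespace.
    return re.sub(r"[ \t\r\n]+", " ", value).strip(" ")

print(repr(normalize_space("  Hello \n   World  ")))     # 'Hello World'
print(normalize_space("Hello\tWorld") == "Hello World")  # True
```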

3.8 Complex Combined Selectors

3.8.1 Multiple Conditions

//table[@id='data']//tr[td[1][text() = 'Active'] and number(td[last()]) > 1000]

This selects rows from a table with id="data" where the first cell's text is exactly 'Active' and the last cell, converted to a number, is greater than 1000.

3.8.2 Dynamic Attribute Handling

//div[contains(@class, concat('user-', //span[@id='current-user']/@data-id))]

This selects a <div> where the class contains 'user-' followed by the data-id of a span with id="current-user".

3.8.3 Complex Text Manipulation

//p[translate(normalize-space(.), 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz') = 'hello world']

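This selects <p> elements whose entire text is "hello world" in any combination of upper- and lower-case ASCII letters, after whitespace is normalized. It is an equality test, so paragraphs that merely contain the phrase do not match.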
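The translate() technique maps each character of one list to the character at the same position in another, which Python's str.translate mirrors closely. The sketch below is an emulation under stated assumptions: matches_hello_world is a hypothetical helper, and the whitespace handling only approximates normalize-space(). Only the 26 listed ASCII letters are folded, so accented capitals are left unchanged.

```python
import string

# Mirrors translate(s, 'ABC...Z', 'abc...z'): each character in the first
# list is replaced by the character at the same position in the second.
to_lower = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)

def matches_hello_world(text):
    """Emulate the predicate: fold ASCII case after normalising whitespace."""
    normalised = " ".join(text.split())  # approximates normalize-space(.)
    return normalised.translate(to_lower) == "hello world"

print(matches_hello_world("  HELLO   world "))  # True
print(matches_hello_world("Hello World!"))      # False
print("É".translate(to_lower) == "É")           # True
```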

3.9 XPath Selectors Reference Table

Type	Description	Example	Use Case
Basic Selection	Select nodes by element name	`//div`	Select all div elements
Attribute Selection	Select nodes with specific attribute	`//input[@type='text']`	Find all text input fields
Attribute Contains	Select nodes with attribute containing value	`//div[contains(@class, 'highlight')]`	Find divs with a certain class
Text Content	Select nodes with specific text	`//button[text()='Submit']`	Find a button with exact text
Partial Text Content	Select nodes containing text	`//p[contains(text(), 'Welcome')]`	Find paragraphs with partial text match
Index Selection	Select a specific node by index	`(//li)[3]`	Select the third li element
Last() Function	Select the last node	`(//tr)[last()]`	Select the last table row
Position Function	Select nodes by position	`//ul/li[position() < 3]`	Select first two list items
Attribute Comparison	Select nodes based on attribute value comparison	`//div[@data-value > 100]`	Find divs with data-value greater than 100
Multiple Attributes	Select nodes with multiple attributes	`//input[@type='checkbox' and @checked]`	Find checked checkboxes
Parent Selection	Select parent of a node	`//input[@id='email']/..`	Find parent of a specific input
Ancestor Selection	Select ancestors of a node	`//span[@class='error']/ancestor::form`	Find form containing an error span
Descendant Selection	Select descendants of a node	`//div[@id='content']//p`	Find all paragraphs within a content div
Following Sibling	Select following siblings of a node	`//h1/following-sibling::p`	Find paragraphs after an h1
Preceding Sibling	Select preceding siblings of a node	`//input[@type='submit']/preceding-sibling::input`	Find inputs before a submit button
Starts-with Function	Select nodes with attributes starting with value	`//div[starts-with(@id, 'section')]`	Find divs with ids starting with 'section'
Ends-with Function	Select nodes with attributes ending with value	`//a[ends-with(@href, '.pdf')]`	Find links to PDF files
Not Function	Select nodes that don't match a condition	`//div[not(@class)]`	Find divs without a class attribute
Count Function	Select based on count of child elements	`//ul[count(li) > 5]`	Find lists with more than 5 items
Sum Function	Select based on sum of values	`//div[@class='total'][sum(./span/@data-value) > 1000]`	Find totals exceeding 1000
Normalize-space	Select ignoring leading/trailing whitespace	`//p[normalize-space(text()) = 'Hello']`	Find paragraphs with exact text, ignoring whitespace
Translate Function	Case-insensitive selection	`//button[translate(text(), 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz') = 'submit']`	Find 'Submit' button case-insensitively
String-length Function	Select based on text length	`//p[string-length(text()) > 100]`	Find paragraphs with more than 100 characters
Math Operations	Perform calculations in XPath	`//product[number(@price) * number(@quantity) > 1000]`	Find high-value product entries
Union	Combine multiple XPath expressions	`//div[@id='content'] | //div[@class='main']`	Select elements matching either expression
Axes	Navigate document tree	`//table/child::tr/descendant::td`	Find all cells in a table

Note: Some XPath functions, such as ends-with(), are available only in XPath 2.0 and later. Check which version your XPath processor supports; many widely used processors, including web browsers, implement only XPath 1.0.
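Where only XPath 1.0 is available, the effect of ends-with() can be reproduced with substring() and string-length(). The sketch below builds such a predicate as a string. The ends_with_xpath1 helper is hypothetical and written for illustration; it only composes the expression and does not evaluate it.

```python
def ends_with_xpath1(attribute, suffix):
    """Build an XPath 1.0 predicate equivalent to ends-with(@attribute, suffix).

    Hypothetical helper for illustration; assumes suffix contains no
    single quote, since XPath 1.0 string literals have no escape syntax.
    """
    if "'" in suffix:
        raise ValueError("suffix must not contain a single quote")
    attr = f"@{attribute}"
    # substring() is 1-based, so start at length(attr) - length(suffix) + 1.
    return (
        f"substring({attr}, string-length({attr}) - {len(suffix)} + 1)"
        f" = '{suffix}'"
    )

print("//a[" + ends_with_xpath1("href", ".pdf") + "]")
# //a[substring(@href, string-length(@href) - 4 + 1) = '.pdf']
```

For an href of 'a.pdf' the start position is 5 - 4 + 1 = 2, so substring() returns '.pdf' and the predicate holds.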

Remember, while XPath provides powerful selection capabilities, overly complex selectors can be hard to maintain and may impact performance. Always strive for a balance between specificity and simplicity in your XPath expressions.
