3.0 XPath: The Spectrum
XPath provides a powerful way to navigate and select elements in an XML or HTML document. In this section, we'll explore a wide range of XPath selectors, starting from the most basic and progressing to more complex and sophisticated selectors.
3.1 Basic Selectors
3.1.1 Select by Element Name
The most basic XPath selector simply selects all elements of a given type.
//div
This selects all <div> elements in the document.
3.1.2 Select by ID
To select an element with a specific ID:
//*[@id='username']
This selects any element with id="username".
3.1.3 Select by Class
To select elements with a specific class:
//*[@class='highlight']
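This selects all elements whose class attribute is exactly "highlight". An element with several classes, such as class="highlight large", does not match; use contains() (section 3.2.2) for that case.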
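To try the three basic selectors without a browser, here is a minimal sketch using Python's built-in xml.etree.ElementTree, which implements a small subset of XPath. The sample markup and the variable names are made up for illustration, and ElementTree needs the relative form './/' where the article writes '//'.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup, invented for this illustration.
PAGE = """
<html>
  <body>
    <div id="username" class="highlight">alice</div>
    <div class="highlight large">bob</div>
    <p class="highlight">note</p>
  </body>
</html>
"""

root = ET.fromstring(PAGE)

# ElementTree paths are relative to an element, so '//div' is written './/div'.
all_divs = root.findall(".//div")
by_id = root.findall(".//*[@id='username']")
by_class = root.findall(".//*[@class='highlight']")

print(len(all_divs))                 # 2
print([e.text for e in by_id])       # ['alice']
print([e.text for e in by_class])    # ['alice', 'note']
```

Note that "bob" is missing from the class result: its class attribute is "highlight large", and @class='highlight' compares the whole attribute value.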
3.2 Attribute Selectors
3.2.1 Exact Attribute Match
//input[@type='text']
This selects all <input> elements where the type attribute is exactly 'text'.
3.2.2 Attribute Contains
//div[contains(@class, 'user')]
This selects all <div> elements whose class attribute contains the substring 'user'. The match is not word-based, so class="superuser" matches as well.
3.2.3 Starts-with
//a[starts-with(@href, 'https://')]
This selects all <a> elements where the href attribute starts with 'https://'.
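The substring behaviour of contains() is easy to overlook, so here is a rough sketch of it. Python's built-in ElementTree does not implement contains() or starts-with(), so the predicates are emulated with ordinary Python string tests rather than evaluated as XPath. The markup is hypothetical. In XPath 1.0 itself, a commonly used idiom for matching a whole class token is //div[contains(concat(' ', normalize-space(@class), ' '), ' user ')].

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup, invented for this illustration.
PAGE = """
<body>
  <div class="user card">a</div>
  <div class="superuser">b</div>
  <div class="card">c</div>
  <a href="https://example.com/">secure</a>
  <a href="http://example.com/">plain</a>
</body>
"""

root = ET.fromstring(PAGE)

# Emulates //div[contains(@class, 'user')]: a plain substring test.
substring_hits = [d.text for d in root.iter("div") if "user" in d.get("class", "")]

# Emulates matching 'user' as a whole class token.
token_hits = [d.text for d in root.iter("div") if "user" in d.get("class", "").split()]

# Emulates //a[starts-with(@href, 'https://')].
https_links = [a.text for a in root.iter("a") if a.get("href", "").startswith("https://")]

print(substring_hits)  # ['a', 'b']
print(token_hits)      # ['a']
print(https_links)     # ['secure']
```

The substring test picks up the "superuser" div, while the token test does not.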
3.3 Text Content Selectors
3.3.1 Exact Text Match
//button[text()='Submit']
This selects all <button> elements that have a text node exactly equal to 'Submit'. Any surrounding whitespace in the markup breaks the match.
3.3.2 Contains Text
//p[contains(text(), 'Welcome')]
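This selects all <p> elements whose text contains 'Welcome'. In XPath 1.0, contains(text(), ...) examines only the first text node child, so use contains(., 'Welcome') to search the element's full text, including text inside nested elements.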
3.4 Position and Index Selectors
3.4.1 Select by Index
(//tr)[3]
This selects the third <tr> element in the document.
3.4.2 Select Last Element
(//li)[last()]
This selects the last <li> element in the document.
3.4.3 Select by Position
//ul/li[position() < 3]
This selects the first two <li> children of each <ul>.
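The parentheses in (//li)[last()] matter: without them, the index is applied per parent rather than to the whole document. The sketch below illustrates the difference with Python's built-in ElementTree. ElementTree supports [n] and [last()] but not the parenthesised form or position(), so those two cases are emulated with list indexing and slicing. The markup is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup, invented for this illustration.
DOC = """
<body>
  <ul><li>a1</li><li>a2</li><li>a3</li></ul>
  <ul><li>b1</li><li>b2</li></ul>
</body>
"""

root = ET.fromstring(DOC)

# .//li[1] applies the index per parent: the first <li> of every <ul>.
first_per_list = [li.text for li in root.findall(".//li[1]")]

# .//li[last()] likewise yields the last <li> of every <ul>.
last_per_list = [li.text for li in root.findall(".//li[last()]")]

# (//li)[last()] is emulated by collecting every <li> in document order
# and taking the final one.
last_overall = root.findall(".//li")[-1].text

# //ul/li[position() < 3] is emulated with a slice per <ul>.
first_two = [li.text for ul in root.iter("ul") for li in ul.findall("li")[:2]]

print(first_per_list)  # ['a1', 'b1']
print(last_per_list)   # ['a3', 'b2']
print(last_overall)    # b2
print(first_two)       # ['a1', 'a2', 'b1', 'b2']
```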
3.5 Logical Operators and Functions
3.5.1 AND Operator
//input[@type='checkbox' and @checked]
This selects all checkbox inputs that have a checked attribute.
3.5.2 OR Operator
//button[@type='submit' or @type='button']
This selects all <button> elements whose type attribute is either 'submit' or 'button'.
3.5.3 Not Function
//div[not(@class)]
This selects all <div> elements that don't have a class attribute.
3.6 Axes
3.6.1 Parent
//input[@id='email']/parent::div
This selects the parent of the input with id="email", provided that parent is a <div>. If the parent is any other element, nothing is selected.
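As a small illustration of the parent step, here is a sketch using Python's built-in ElementTree. It supports the '..' shorthand but not the named parent:: axis, so the "only if it is a div" check is done in Python. The helper name parent_div and the markup are made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup, invented for this illustration.
FORM = """
<form>
  <div class="field"><input id="email" type="text"/></div>
  <p><input id="phone" type="text"/></p>
</form>
"""

root = ET.fromstring(FORM)


def parent_div(input_id):
    """Emulate //input[@id=...]/parent::div: return the parent only if it is a <div>."""
    parent = root.find(f".//input[@id='{input_id}']/..")
    if parent is not None and parent.tag == "div":
        return parent
    return None


email_parent = parent_div("email")
phone_parent = parent_div("phone")

print(email_parent.get("class"))  # field
print(phone_parent)               # None
```

The phone input sits inside a <p>, so the div-restricted parent step yields nothing for it.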
3.6.2 Ancestor
//span[@class='error']/ancestor::form
This selects the <form> ancestor of a span with class="error".
3.6.3 Following-sibling
//label[@for='username']/following-sibling::input
This selects every <input> sibling that follows the label for 'username'. Append [1] to keep only the nearest one.
3.7 Advanced Functions
3.7.1 Count
//ul[count(li) > 5]
This selects all <ul> elements that have more than 5 <li> children.
3.7.2 Sum
//div[@class='order'][sum(./span[@class='item-price']) > 100]
This selects <div> elements with class 'order' where the sum of the prices (assumed to be in <span> elements with class 'item-price') is greater than 100.
3.7.3 Normalize-space
//p[normalize-space(text()) = 'Hello World']
This selects <p> elements whose first text node is "Hello World" once leading and trailing whitespace is stripped and internal runs of whitespace are collapsed to a single space.
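To make the behaviour of normalize-space() concrete, here is a plain-Python sketch of the same string transformation. It is an emulation, not an XPath evaluation, and the function name normalize_space is our own. It assumes the XPath 1.0 definition of whitespace (space, tab, carriage return, line feed), which is why a non-breaking space is left alone.

```python
import re

# XPath 1.0 whitespace is limited to space, tab, carriage return and line feed.
_XPATH_WS = " \t\r\n"


def normalize_space(value: str) -> str:
    """Mirror XPath normalize-space(): trim the ends and collapse inner runs."""
    return re.sub(f"[{_XPATH_WS}]+", " ", value).strip(_XPATH_WS)


padded = normalize_space("  Hello World \n")
collapsed = normalize_space("Hello \t\n  World")
nbsp = normalize_space("Hello\u00a0World")

print(padded == "Hello World")     # True
print(collapsed == "Hello World")  # True
print(nbsp == "Hello World")       # False: the non-breaking space is kept
```

The last case is a common surprise when matching text that was authored with &nbsp; entities.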
3.8 Complex Combined Selectors
3.8.1 Multiple Conditions
//table[@id='data']//tr[td[1][text() = 'Active'] and number(td[last()]) > 1000]
This selects rows from a table with id="data" where the first cell's text is exactly 'Active' and the last cell (assumed to be numeric) is greater than 1000.
3.8.2 Dynamic Attribute Handling
//div[contains(@class, concat('user-', //span[@id='current-user']/@data-id))]
This selects a <div> where the class contains 'user-' followed by the data-id of a span with id="current-user".
3.8.3 Complex Text Manipulation
//p[translate(normalize-space(.), 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz') = 'hello world']
This selects <p> elements whose full text is "hello world" in any combination of upper and lower case, ignoring extra whitespace.
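The translate() trick only lowercases the characters listed in its arguments. Here is a plain-Python sketch of that mapping, again an emulation rather than an XPath evaluation, with a helper name (xpath_lower) chosen for illustration. Python's str.translate with a character-to-character table behaves like XPath translate() for equal-length argument strings.

```python
import string

# Same character lists as in the XPath expression above.
_TABLE = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def xpath_lower(value: str) -> str:
    """Mirror translate(value, 'ABC...Z', 'abc...z'): map only the listed letters."""
    return value.translate(_TABLE)


ascii_case = xpath_lower("HeLLo WoRLD")
accented = xpath_lower("\u00c9COLE")  # 'ÉCOLE'

print(ascii_case)  # hello world
print(accented)    # École: 'É' is not in the mapping, so it stays uppercase
```

If the text can contain accented or other non-ASCII letters, those characters would need to be added to both argument strings.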
3.9 XPath Selectors Reference Table
Type | Description | Example | Use Case |
Basic Selection | Select nodes by element name | //div | Select all div elements |
Attribute Selection | Select nodes with specific attribute | //input[@type='text'] | Find all text input fields |
Attribute Contains | Select nodes with attribute containing value | //div[contains(@class, 'highlight')] | Find divs whose class attribute contains 'highlight' |
Text Content | Select nodes with specific text | //button[text()='Submit'] | Find a button with exact text |
Partial Text Content | Select nodes containing text | //p[contains(text(), 'Welcome')] | Find paragraphs with partial text match |
Index Selection | Select a specific node by index | (//li)[3] | Select the third li element |
Last() Function | Select the last node | (//tr)[last()] | Select the last table row |
Position Function | Select nodes by position | //ul/li[position() < 3] | Select first two list items |
Attribute Comparison | Select nodes based on attribute value comparison | //div[@data-value > 100] | Find divs with data-value greater than 100 |
Multiple Attributes | Select nodes with multiple attributes | //input[@type='checkbox' and @checked] | Find checked checkboxes |
Parent Selection | Select parent of a node | //input[@id='email']/.. | Find parent of a specific input |
Ancestor Selection | Select ancestors of a node | //span[@class='error']/ancestor::form | Find form containing an error span |
Descendant Selection | Select descendants of a node | //div[@id='content']//p | Find all paragraphs within a content div |
Following Sibling | Select following siblings of a node | //h1/following-sibling::p | Find paragraphs after an h1 |
Preceding Sibling | Select preceding siblings of a node | //input[@type='submit']/preceding-sibling::input | Find inputs before a submit button |
Starts-with Function | Select nodes with attributes starting with value | //div[starts-with(@id, 'section')] | Find divs with ids starting with 'section' |
Ends-with Function | Select nodes with attributes ending with value | //a[ends-with(@href, '.pdf')] | Find links to PDF files |
Not Function | Select nodes that don't match a condition | //div[not(@class)] | Find divs without a class attribute |
Count Function | Select based on count of child elements | //ul[count(li) > 5] | Find lists with more than 5 items |
Sum Function | Select based on sum of values | //div[@class='total'][sum(./span/@data-value) > 1000] | Find totals exceeding 1000 |
Normalize-space | Select ignoring leading/trailing whitespace | //p[normalize-space(text()) = 'Hello'] | Find paragraphs with exact text, ignoring whitespace |
Translate Function | Case-insensitive selection | //button[translate(text(), 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz') = 'submit'] | Find 'Submit' button case-insensitively |
String-length Function | Select based on text length | //p[string-length(text()) > 100] | Find paragraphs with more than 100 characters |
Math Operations | Perform calculations in XPath | //product[number(@price) * number(@quantity) > 1000] | Find high-value product entries |
Union | Combine multiple XPath expressions | `//div[@id='content'] | //div[@class='main']` |
Axes | Navigate document tree | //table/child::tr/descendant::td | Find all cells in a table |
Note: Some XPath functions, such as ends-with(), are only available in XPath 2.0 and above. Web browsers natively implement only XPath 1.0, so ensure your XPath processor supports the version you're using.
Remember, while XPath provides powerful selection capabilities, overly complex selectors can be hard to maintain and may impact performance. Always strive for a balance between specificity and simplicity in your XPath expressions.
Written by
Souvik Dey
I design and develop programmatic solutions for Problem-Solving.