#3.0 What XPath: The Spectrum

Souvik Dey

XPath provides a powerful way to navigate and select elements in an XML or HTML document. This section walks through a range of XPath selectors, starting with the most basic and progressing to more complex combinations.

3.1 Basic Selectors

3.1.1 Select by Element Name

The most basic XPath selector simply selects all elements of a given type.

//div

This selects all <div> elements in the document.
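To try the expression outside a browser, here is a minimal sketch using Python's standard-library xml.etree.ElementTree, which supports only a limited subset of XPath. The sample markup and its id values are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup, invented for this sketch.
SAMPLE = """<html>
  <body>
    <div id="header">Header</div>
    <div id="main">
      <div id="inner">Nested</div>
    </div>
    <p>Not a div</p>
  </body>
</html>"""

root = ET.fromstring(SAMPLE)

# ElementTree evaluates paths relative to an element, so the
# document-wide //div is written as .//div when starting from the root.
divs = root.findall(".//div")
div_ids = [d.get("id") for d in divs]
print(div_ids)  # ['header', 'main', 'inner']
```

Nested <div> elements are returned too, in document order, because // descends through every level.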

3.1.2 Select by ID

To select an element with a specific ID:

//*[@id='username']

This selects any element with id="username".

3.1.3 Select by Class

To select elements with a specific class:

//*[@class='highlight']

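This selects all elements whose class attribute is exactly "highlight". An element with several classes, such as class="highlight large", is not matched, because the whole attribute value is compared.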
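The comparison is made against the whole attribute value, which matters for elements that carry more than one class. The sketch below, using Python's standard-library ElementTree and made-up sample markup, contrasts the XPath attribute test with a whitespace-token check done in plain Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup, invented for this sketch.
SAMPLE = """<body>
  <p id="a" class="highlight">one class</p>
  <p id="b" class="highlight large">two classes</p>
  <p id="c" class="note">other class</p>
</body>"""

root = ET.fromstring(SAMPLE)

# [@class='highlight'] compares the entire attribute value.
exact = [e.get("id") for e in root.findall(".//*[@class='highlight']")]

# Token-based check done in Python: split the attribute on whitespace.
by_token = [
    e.get("id")
    for e in root.iter()
    if "highlight" in (e.get("class") or "").split()
]

print(exact)     # ['a']
print(by_token)  # ['a', 'b']
```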

3.2 Attribute Selectors

3.2.1 Exact Attribute Match

//input[@type='text']

This selects all <input> elements where the type attribute is exactly 'text'.

3.2.2 Attribute Contains

//div[contains(@class, 'user')]

This selects all <div> elements whose class attribute contains the substring 'user'. The test is not word-based, so values such as 'superuser' or 'user-profile' also match.
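Because contains() is a plain substring test, a common XPath 1.0 idiom for matching a whole class name is contains(concat(' ', normalize-space(@class), ' '), ' user '). The sketch below mirrors both tests with ordinary Python string operations rather than evaluating XPath; the function names and sample values are hypothetical, and Python's split() treats a slightly wider set of characters as whitespace than XPath does.

```python
def contains_substring(class_attr: str, needle: str) -> bool:
    # Mirrors contains(@class, 'user'): plain substring test.
    return needle in class_attr


def contains_token(class_attr: str, token: str) -> bool:
    # Mirrors contains(concat(' ', normalize-space(@class), ' '), ' user '):
    # pad the normalized value with spaces, then look for the padded token.
    padded = " " + " ".join(class_attr.split()) + " "
    return f" {token} " in padded


# Hypothetical class attribute values.
classes = ["user", "card user active", "superuser", "user-profile"]

substring_hits = [c for c in classes if contains_substring(c, "user")]
token_hits = [c for c in classes if contains_token(c, "user")]

print(substring_hits)  # all four values
print(token_hits)      # ['user', 'card user active']
```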

3.2.3 Starts-with

//a[starts-with(@href, 'https://')]

This selects all <a> elements where the href attribute starts with 'https://'.

3.3 Text Content Selectors

3.3.1 Exact Text Match

//button[text()='Submit']

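This selects all <button> elements that have a text node exactly equal to 'Submit'. The comparison is case-sensitive and includes whitespace, so ' Submit ' does not match.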

3.3.2 Contains Text

//p[contains(text(), 'Welcome')]

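This selects <p> elements whose text contains the substring 'Welcome'. In XPath 1.0, only the first text node child of each <p> is tested; use contains(., 'Welcome') to test the element's full text, including text inside child elements.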

3.4 Position and Index Selectors

3.4.1 Select by Index

(//tr)[3]

This selects the third <tr> element in the document.
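The parentheses matter: (//tr)[3] indexes into the combined result, while //tr[3] tests each row's position among its own siblings. Python's standard-library ElementTree does not accept the parenthesized form, so this sketch emulates it by indexing the result list; the two-table markup is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup: two tables with 2 and 3 rows.
SAMPLE = """<body>
  <table>
    <tr id="a1"/><tr id="a2"/>
  </table>
  <table>
    <tr id="b1"/><tr id="b2"/><tr id="b3"/>
  </table>
</body>"""

root = ET.fromstring(SAMPLE)

# Equivalent of (//tr)[3]: collect every <tr> in document order,
# then take the third one (Python lists are 0-based, XPath is 1-based).
all_rows = root.findall(".//tr")
third_overall = all_rows[2].get("id")

# //tr[3] without parentheses: every <tr> that is the third <tr>
# child of its own parent.
third_per_parent = [tr.get("id") for tr in root.findall(".//tr[3]")]

print(third_overall)     # b1
print(third_per_parent)  # ['b3']
```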

3.4.2 Select Last Element

(//li)[last()]

This selects the last <li> element.

3.4.3 Select by Position

//ul/li[position() < 3]

This selects the first two <li> children of each <ul>.

3.5 Logical Operators and Functions

3.5.1 AND Operator

//input[@type='checkbox' and @checked]

This selects all checkbox inputs that have a checked attribute.
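For tests that do not depend on position, writing two predicates in a row gives the same result as joining them with and. Python's standard-library ElementTree has no and operator, so this sketch uses the chained form; the form markup is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup, invented for this sketch.
SAMPLE = """<form>
  <input id="news" type="checkbox" checked="checked"/>
  <input id="terms" type="checkbox"/>
  <input id="opt" type="radio" checked="checked"/>
</form>"""

root = ET.fromstring(SAMPLE)

# Two predicates in a row must both hold, which for these
# non-positional tests matches "@type='checkbox' and @checked".
checked_boxes = root.findall(".//input[@type='checkbox'][@checked]")
checked_ids = [e.get("id") for e in checked_boxes]

print(checked_ids)  # ['news']
```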

3.5.2 OR Operator

//button[@type='submit' or @type='button']

This selects all <button> elements whose type attribute is either 'submit' or 'button'.

3.5.3 Not Function

//div[not(@class)]

This selects all <div> elements that don't have a class attribute.

3.6 Axes

3.6.1 Parent

//input[@id='email']/parent::div

This selects the parent <div> of the input with id="email".
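Python's standard-library ElementTree does not implement named axes such as parent::, but it does support the abbreviated parent step (..). A minimal sketch with made-up markup, checking the parent's tag in Python to reproduce the ::div name test:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup, invented for this sketch.
SAMPLE = """<form>
  <div id="email-group">
    <label for="email">Email</label>
    <input id="email" type="text"/>
  </div>
  <div id="other-group">
    <input id="phone" type="text"/>
  </div>
</form>"""

root = ET.fromstring(SAMPLE)

# ".." is the abbreviated form of parent::node(); it does not check
# the parent's name, so the tag is verified separately here.
parents = root.findall(".//input[@id='email']/..")
parent_divs = [p.get("id") for p in parents if p.tag == "div"]

print(parent_divs)  # ['email-group']
```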

3.6.2 Ancestor

//span[@class='error']/ancestor::form

This selects the <form> ancestor of a span with class="error".

3.6.3 Following-sibling

//label[@for='username']/following-sibling::input

This selects every <input> element that is a later sibling of the label for 'username'. Append [1] to keep only the nearest one.

3.7 Advanced Functions

3.7.1 Count

//ul[count(li) > 5]

This selects all <ul> elements that have more than 5 <li> children.

3.7.2 Sum

//div[@class='order'][sum(./span[@class='item-price']) > 100]

This selects <div> elements with class 'order' where the sum of the prices (assumed to be in <span> elements with class 'item-price') is greater than 100.

3.7.3 Normalize-space

//p[normalize-space(text()) = 'Hello World']

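This selects <p> elements with the text "Hello World". normalize-space() strips leading and trailing whitespace and also collapses each internal run of whitespace into a single space. When a <p> has several text nodes, XPath 1.0 uses only the first one here.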
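To see which strings would pass the comparison, the sketch below approximates normalize-space() with plain Python string operations. It does not evaluate XPath, the function name and sample strings are hypothetical, and Python's split() treats a slightly wider set of characters as whitespace than XPath does.

```python
def normalize_space(value: str) -> str:
    # Approximation of XPath normalize-space(): trim both ends and
    # collapse each internal whitespace run to a single space.
    return " ".join(value.split())


# Hypothetical paragraph texts.
samples = ["Hello World", "  Hello World \n", "Hello   \t World", "HelloWorld"]

matches = [s for s in samples if normalize_space(s) == "Hello World"]

print(len(matches))  # 3
```

The first three samples match; the last does not, because there is no whitespace between the words to normalize.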

3.8 Complex Combined Selectors

3.8.1 Multiple Conditions

//table[@id='data']//tr[td[1][text() = 'Active'] and number(td[last()]) > 1000]

This selects rows from a table with id="data" where the first cell's text is exactly 'Active' and the last cell (assumed to be numeric) is greater than 1000.

3.8.2 Dynamic Attribute Handling

//div[contains(@class, concat('user-', //span[@id='current-user']/@data-id))]

This selects a <div> where the class contains 'user-' followed by the data-id of a span with id="current-user".

3.8.3 Complex Text Manipulation

//p[translate(normalize-space(.), 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz') = 'hello world']

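This selects <p> elements whose full text is "hello world" in any combination of upper and lower case, after trimming and collapsing whitespace. translate() maps only the characters listed, so this covers ASCII letters only.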
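The same two-step transformation can be mirrored in plain Python to check which strings the expression would accept. This is a sketch of the string logic only, not an XPath evaluation; the function name and candidate strings are hypothetical.

```python
import string

# Same two alphabets that the XPath translate() call spells out.
TO_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def matches_hello_world(text: str) -> bool:
    normalized = " ".join(text.split())  # stands in for normalize-space(.)
    return normalized.translate(TO_LOWER) == "hello world"


# Hypothetical paragraph texts.
candidates = ["Hello World", "  HELLO   world ", "Hello, World", "Say Hello World"]
results = [matches_hello_world(c) for c in candidates]

print(results)  # [True, True, False, False]
```

The last candidate fails because the expression tests equality of the whole text, not containment.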

3.9 XPath Selectors Reference Table

| Type | Description | Example | Use Case |
| --- | --- | --- | --- |
| Basic Selection | Select nodes by element name | //div | Select all div elements |
| Attribute Selection | Select nodes with specific attribute | //input[@type='text'] | Find all text input fields |
| Attribute Contains | Select nodes with attribute containing value | //div[contains(@class, 'highlight')] | Find divs with a certain class |
| Text Content | Select nodes with specific text | //button[text()='Submit'] | Find a button with exact text |
| Partial Text Content | Select nodes containing text | //p[contains(text(), 'Welcome')] | Find paragraphs with partial text match |
| Index Selection | Select a specific node by index | (//li)[3] | Select the third li element |
| Last() Function | Select the last node | (//tr)[last()] | Select the last table row |
| Position Function | Select nodes by position | //ul/li[position() < 3] | Select first two list items |
| Attribute Comparison | Select nodes based on attribute value comparison | //div[@data-value > 100] | Find divs with data-value greater than 100 |
| Multiple Attributes | Select nodes with multiple attributes | //input[@type='checkbox' and @checked] | Find checked checkboxes |
| Parent Selection | Select parent of a node | //input[@id='email']/.. | Find parent of a specific input |
| Ancestor Selection | Select ancestors of a node | //span[@class='error']/ancestor::form | Find form containing an error span |
| Descendant Selection | Select descendants of a node | //div[@id='content']//p | Find all paragraphs within a content div |
| Following Sibling | Select following siblings of a node | //h1/following-sibling::p | Find paragraphs after an h1 |
| Preceding Sibling | Select preceding siblings of a node | //input[@type='submit']/preceding-sibling::input | Find inputs before a submit button |
| Starts-with Function | Select nodes with attributes starting with value | //div[starts-with(@id, 'section')] | Find divs with ids starting with 'section' |
| Ends-with Function | Select nodes with attributes ending with value | //a[ends-with(@href, '.pdf')] | Find links to PDF files |
| Not Function | Select nodes that don't match a condition | //div[not(@class)] | Find divs without a class attribute |
| Count Function | Select based on count of child elements | //ul[count(li) > 5] | Find lists with more than 5 items |
| Sum Function | Select based on sum of values | //div[@class='total'][sum(./span/@data-value) > 1000] | Find totals exceeding 1000 |
| Normalize-space | Select ignoring leading/trailing whitespace | //p[normalize-space(text()) = 'Hello'] | Find paragraphs with exact text, ignoring whitespace |
| Translate Function | Case-insensitive selection | //button[translate(text(), 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz') = 'submit'] | Find 'Submit' button case-insensitively |
| String-length Function | Select based on text length | //p[string-length(text()) > 100] | Find paragraphs with more than 100 characters |
| Math Operations | Perform calculations in XPath | //product[number(@price) * number(@quantity) > 1000] | Find high-value product entries |
| Union | Combine multiple XPath expressions | `//div[@id='content'] \| //div[@class='main']` | |
| Axes | Navigate document tree | //table/child::tr/descendant::td | Find all cells in a table |

Note: Some XPath functions like ends-with() are only available in XPath 2.0 and above. Ensure your XPath processor supports the version you're using.
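On an XPath 1.0 processor, a common substitute for ends-with(@href, '.pdf') is substring(@href, string-length(@href) - string-length('.pdf') + 1) = '.pdf'. The sketch below mirrors that arithmetic in plain Python to show why it works; it does not evaluate XPath, and the function name and sample href values are hypothetical.

```python
def xpath1_ends_with(value: str, suffix: str) -> bool:
    # Mirrors:
    #   substring(value, string-length(value) - string-length(suffix) + 1) = suffix
    start = len(value) - len(suffix) + 1  # 1-based XPath position
    # XPath substring() with a start below 1 returns the string
    # from its beginning, so clamp before converting to a 0-based slice.
    tail = value[max(start, 1) - 1:]
    return tail == suffix


# Hypothetical href values.
hrefs = ["report.pdf", "/docs/a.pdf?dl=1", "pdf", "notes.PDF"]
pdf_links = [h for h in hrefs if xpath1_ends_with(h, ".pdf")]

print(pdf_links)  # ['report.pdf']
```

As with ends-with() itself, the comparison is case-sensitive, so 'notes.PDF' is not matched.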

Remember, while XPath provides powerful selection capabilities, overly complex selectors can be hard to maintain and may impact performance. Always strive for a balance between specificity and simplicity in your XPath expressions.
