Selenium: Mastering Actions and Window Handling

Selenium WebDriver is a powerful tool for automating web applications, and it offers advanced features to simulate user interactions like mouse movements, keyboard inputs, and handling complex web elements such as frames and child windows. In this blog, we’ll dive into the Actions class, frames, and child window handling in Selenium WebDriver with practical examples. Let’s get started!

Introduction to the Actions Class

The Actions class in Selenium WebDriver is designed to simulate complex user interactions, such as mouse movements, keyboard inputs, and gestures, that go beyond simple clicks or text inputs. Unlike basic Selenium methods (e.g., click() or sendKeys()), the Actions class allows you to perform advanced actions like:

Mouse hover: Moving the mouse over an element without clicking.
Right-click (context click): Simulating a right-click on an element.
Double-click: Performing a double-click action.
Drag-and-drop: Dragging an element and dropping it elsewhere.
Keyboard actions: Holding down keys (e.g., Shift) to type in capital letters.

These actions mimic how a real user interacts with a web application, making the Actions class essential for testing dynamic and interactive UI elements.

Why Use the Actions Class?

Imagine you’re testing a website like Amazon.com, where moving your mouse over a menu (e.g., “Hello, Sign in”) displays a dropdown without clicking. The Actions class helps you automate such scenarios, ensuring your tests validate real-world user behavior.

Using the Actions Class for Mouse and Keyboard Interactions

Let’s explore how to use the Actions class with practical examples, focusing on a test scenario on Amazon.com.

Setting Up the Actions Class

To use the Actions class, you need to:

Create an instance of the Actions class and pass the WebDriver object to it.
Use methods like moveToElement(), click(), or sendKeys() to define actions.
Build and perform the actions using build() and perform().

Here’s a basic setup in Java:

import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.interactions.Actions;

public class ActionsDemo {
    public static void main(String[] args) {
        System.setProperty("webdriver.chrome.driver", "path/to/chromedriver.exe");
        WebDriver driver = new ChromeDriver();
        driver.get("https://www.amazon.com");

        // Create Actions class instance
        Actions actions = new Actions(driver);
    }
}

Moving to an Element (Mouse Hover)

Let’s automate a scenario where hovering over the “Hello, Sign in” link on Amazon displays a popup.

Identify the element: Inspect the “Hello, Sign in” link to find its locator. For example, its ID is nav-link-accountList.
Use moveToElement(): Move the mouse to the element.
Build and perform the action: Combine the action and execute it.

WebElement signInLink = driver.findElement(By.id("nav-link-accountList"));
actions.moveToElement(signInLink).build().perform();

moveToElement(signInLink): Moves the mouse to the “Hello, Sign in” link.
build(): Prepares the action for execution.
perform(): Executes the action, simulating the hover effect.

When you run this script, the mouse hovers over the link, and the popup appears, just like a user interaction.

Performing Composite Actions

The Actions class allows you to chain multiple actions (called composite actions) in a single sequence. For example, let’s automate entering text in capital letters in Amazon’s search bar.

Scenario: Move to the search bar, click it, hold the Shift key, type “hello” (which appears as “HELLO”), and double-click to select the text.

Locate the search bar: Its ID is “twotabsearchtextbox”.
Chain actions:
- Move to the search bar.
- Click to focus.
- Hold the Shift key (keyDown(Keys.SHIFT)).
- Type “hello” (sendKeys("hello")).
- Release the Shift key (keyUp(Keys.SHIFT)).
- Double-click to select the text (doubleClick()).
Build and perform the composite action.

WebElement searchBox = driver.findElement(By.id("twotabsearchtextbox"));
actions.moveToElement(searchBox)
    .click()
    .keyDown(Keys.SHIFT)
    .sendKeys("hello")
    .keyUp(Keys.SHIFT)
    .doubleClick()
    .build()
    .perform();

click(): Focuses on the search bar.
keyDown(Keys.SHIFT): Holds the Shift key to type in capitals.
sendKeys("hello"): Types “hello”, which appears as “HELLO” due to Shift.
keyUp(Keys.SHIFT): Releases the Shift key.
doubleClick(): Selects the entered text.
build().perform(): Executes the entire sequence.

When you run this, the search bar will contain “HELLO”, and the text will be selected.

Right-Clicking (Context Click)

To simulate a right-click, use the contextClick() method. For example, right-clicking the “Hello, Sign in” link might open a context menu.

actions.moveToElement(signInLink)
    .contextClick()
    .build()
    .perform();

This moves to the link, performs a right-click, and displays the context menu.

Why `build()` and `perform()`?

Build: Combines all actions into a single executable unit.
Perform: Executes the built actions.

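Nothing executes until perform() is called; the chained methods only queue up steps. In current Selenium versions, perform() builds the sequence internally, so the explicit build() call is optional, but build().perform() is harmless and keeps the two stages visible.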
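To make the "queue first, run later" idea concrete, here is a toy model in plain Java. It is not Selenium's implementation; ActionChain, CompositeAction, and step() are hypothetical names invented for this sketch, and the "actions" are just log entries.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a deferred action chain. This is NOT Selenium's code;
// ActionChain and CompositeAction are hypothetical names for illustration.
final class ActionChain {
    private final List<Runnable> steps = new ArrayList<>();

    // Chain methods only record a step; nothing runs yet.
    ActionChain step(Runnable action) {
        steps.add(action);
        return this;
    }

    // build() snapshots the recorded steps into one executable unit.
    CompositeAction build() {
        return new CompositeAction(new ArrayList<>(steps));
    }

    static final class CompositeAction {
        private final List<Runnable> steps;

        CompositeAction(List<Runnable> steps) {
            this.steps = steps;
        }

        // perform() runs every recorded step in the order it was chained.
        void perform() {
            for (Runnable step : steps) {
                step.run();
            }
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        ActionChain chain = new ActionChain()
                .step(() -> log.add("moveToElement"))
                .step(() -> log.add("click"));
        System.out.println(log); // steps are only queued at this point
        chain.build().perform();
        System.out.println(log); // now both steps have run, in order
    }
}
```

The point of the sketch is only the timing: chaining records intent, and the browser (here, the log) is touched when the built sequence is performed.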

Understanding Frames in Selenium

What Are Frames?

A frame (or iframe) is a container within a webpage that displays content independent of the main page’s HTML. It’s like a separate webpage embedded inside the main page. For example, a webpage might host a frame to display an advertisement or a draggable element.

Why Are Frames Tricky?

Selenium cannot directly interact with elements inside a frame because they’re in a separate context. You must explicitly tell Selenium to switch to the frame before interacting with its elements.

Switching to Frames

To work with elements inside a frame, use the switchTo().frame() method. You can switch to a frame using:

Frame ID or name: If the frame has an id or name attribute.
Frame Index: Frames are indexed starting from 0 (first frame is 0).
WebElement: Locate the frame using its attributes (e.g., class or tag).

Example Scenario: On the demo website, there’s a draggable box inside a frame. Let’s click it and then drag-and-drop it.

Inspect the frame: Right-click and inspect the draggable box. You’ll notice it’s inside an <iframe> tag with a class “demo-frame”.

Switch to the frame using its class:

  driver.switchTo().frame(driver.findElement(By.className("demo-frame")));

Interact with the element:

  WebElement draggable = driver.findElement(By.id("draggable"));
  draggable.click();

switchTo().frame(): Moves Selenium’s focus to the frame.
After switching, Selenium can locate and interact with the draggable element.

Handling Drag-and-Drop in Frames

Let’s drag the draggable element to a droppable target inside the same frame using the Actions class.

Switch to the frame (as above).
Locate source and target elements:
- Source: draggable (ID: draggable).
- Target: droppable (ID: droppable).

Perform drag-and-drop:

  WebElement source = driver.findElement(By.id("draggable"));
  WebElement target = driver.findElement(By.id("droppable"));
  Actions actions = new Actions(driver);
  actions.dragAndDrop(source, target)
      .build()
      .perform();

dragAndDrop(source, target): Drags the source element to the target.
After switching to the frame, Selenium can interact with both elements.

Switching Back to the Main Page

After working inside a frame, you must switch back to the main page (default content) to interact with elements outside the frame:

driver.switchTo().defaultContent();

Example: After dragging and dropping, click a link outside the frame (e.g., “Accept” link):

driver.switchTo().defaultContent();
driver.findElement(By.linkText("Accept")).click();

Finding the Number of Frames

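To determine how many frames are on a page, search for <iframe> tags in the current context (iframes nested inside another frame are not counted until you switch into that frame):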

int frameCount = driver.findElements(By.tagName("iframe")).size();
System.out.println("Number of frames: " + frameCount);

If there’s only one frame, you can switch to it by index:

driver.switchTo().frame(0); // Switches to the first frame

Note: Using indexes is risky, as adding new frames can break your script. Prefer WebElement-based switching when possible.

Handling Child Windows in Selenium

What Are Child Windows?

A child window is a new browser tab or window opened by clicking a link or button on the parent page. For example, clicking a link on a login page might open a new tab with an email ID that you need to copy and paste back into the parent page.

In Selenium, both tabs and windows are treated as windows, and each has a unique window handle (ID).

Example Scenario: On the Practice Website, clicking a link opens a child window with an email ID. We’ll grab the email and enter it into the parent page’s username field.

Switching Between Parent and Child Windows

To switch between windows, use the switchTo().window() method and window handles.

Get all window handles:
```
  Set<String> windows = driver.getWindowHandles();
```
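This returns a set of window IDs (e.g., parent ID and child ID). The WebDriver specification does not guarantee their order, so treating the first handle as the parent is a convention that usually holds when only two windows are open.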

Iterate through handles to find the child window:

  Iterator<String> it = windows.iterator();
  String parentId = it.next(); // First handle is parent
  String childId = it.next(); // Second handle is child

Switch to the child window:
```
  driver.switchTo().window(childId);
```
Interact with the child window:
- Locate the paragraph containing the email ID (e.g., class: “im-para red”).
- Extract the text:
```
  String text = driver.findElement(By.cssSelector(".im-para.red")).getText();
```
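The steps above assume the first handle is the parent. A more defensive variation is to remember the parent handle before clicking the link and then pick whichever handle differs from it. The sketch below shows only that selection logic in plain Java; WindowHandles and findChild are hypothetical helper names, not part of Selenium, and the handle strings are placeholders.

```java
import java.util.LinkedHashSet;
import java.util.Optional;
import java.util.Set;

// Hypothetical helper (not part of Selenium): picks the handle that is not the parent.
final class WindowHandles {
    private WindowHandles() {}

    // Returns the first handle that differs from parentHandle, if there is one.
    static Optional<String> findChild(Set<String> allHandles, String parentHandle) {
        return allHandles.stream()
                .filter(handle -> !handle.equals(parentHandle))
                .findFirst();
    }

    public static void main(String[] args) {
        // Placeholder handle strings; real handles are opaque IDs issued by the driver.
        Set<String> handles = new LinkedHashSet<>();
        handles.add("child-handle");
        handles.add("parent-handle");
        System.out.println(findChild(handles, "parent-handle").orElse("no child window"));
    }
}
```

In a real test, parentHandle would come from driver.getWindowHandle() captured before the click, and allHandles from driver.getWindowHandles() afterwards; the returned value can then be passed to driver.switchTo().window().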

Extracting Data from a Child Window

The paragraph might contain extra text, so we need to parse it to extract only the email ID.

Steps:

Split by “at”: Split the text at “at” to separate the email part.
Trim spaces: Remove leading/trailing spaces.
Split by space: Further split to isolate the email.

String[] splitByAt = text.split("at");
String emailPart = splitByAt[1].trim(); // Get part after "at" and trim spaces
String[] splitBySpace = emailPart.split(" ");
String email = splitBySpace[0]; // Get email before space
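Splitting on “at” works for this particular sentence, but it depends on the exact wording, since every “at” substring becomes a split point. As an alternative sketch, a regular expression can pull the first email-like token out of the text directly. EmailExtractor and extractEmail are hypothetical names, the pattern is intentionally simple rather than a complete email validator, and the sample sentence is made up for illustration.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EmailExtractor {
    // Deliberately simple pattern: local part, "@", then a dotted domain.
    // It is an illustration, not a full RFC 5322 validator.
    private static final Pattern EMAIL =
            Pattern.compile("[\\w.+-]+@[\\w-]+(?:\\.[\\w-]+)+");

    private EmailExtractor() {}

    // Returns the first email-like token found in the text, if any.
    static Optional<String> extractEmail(String text) {
        Matcher matcher = EMAIL.matcher(text);
        return matcher.find() ? Optional.of(matcher.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical sample sentence; the real page text may differ.
        String text = "Please email us at mentor@example.com with the details";
        System.out.println(extractEmail(text).orElse("no email found"));
    }
}
```

Returning an Optional makes the "no email on the page" case explicit instead of failing with an ArrayIndexOutOfBoundsException, which is what indexing into a split result does when the expected separator is missing.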

Debugging Tip:

Use debugging to inspect the text at runtime.
Set a breakpoint, run in debug mode, and use the “Watch” feature to test string manipulations (e.g., split, trim).
This ensures you extract the correct email ID.

Switching Back to the Parent Window

After extracting the email, switch back to the parent window:

driver.switchTo().window(parentId);

Then, enter the email into the username field:

driver.findElement(By.id("username")).sendKeys(email);

Why Switch Back?

Selenium’s focus remains on the child window after switching. Attempting to locate the parent page’s elements without switching back typically fails with a NoSuchElementException, because Selenium is still searching the child window’s page.

Complete Code:

driver.get("https://rahulshettyacademy.com/loginpagepractice");
driver.findElement(By.cssSelector(".blinkingText")).click();
Set<String> windows = driver.getWindowHandles();
Iterator<String> it = windows.iterator();
String parentId = it.next();
String childId = it.next();
driver.switchTo().window(childId);
String text = driver.findElement(By.cssSelector(".im-para.red")).getText();
String[] splitByAt = text.split("at");
String emailPart = splitByAt[1].trim();
String[] splitBySpace = emailPart.split(" ");
String email = splitBySpace[0];

driver.switchTo().window(parentId);
driver.findElement(By.id("username")).sendKeys(email);

This script clicks the link, switches to the child window, extracts the email, switches back, and enters it into the username field.

Key Takeaways and Best Practices

Actions Class:
- Use for complex interactions like hover, right-click, double-click, and drag-and-drop.
- Always end the chain with perform() (optionally preceded by build()); nothing executes without it.
- Chain multiple actions for composite behaviors (e.g., typing in capitals).
Frames:
- Switch to frames using switchTo().frame() before interacting with elements inside.
- Use WebElement or frame ID for reliable switching; avoid indexes if possible.
- Return to the main page with switchTo().defaultContent() after frame operations.
- Count frames using findElements(By.tagName("iframe")).size().
Child Windows:
- Use getWindowHandles() to get window IDs and switchTo().window() to switch.
- Iterate through handles to access parent and child windows.
- Parse text carefully using string methods like split and trim.
- Always switch back to the parent window before interacting with its elements.
General Tips:
- Avoid relying on tools like SelectorsHub or ChroPath; practice writing locators manually using HTML and browser Developer Tools (like Chrome DevTools).
- Use debugging to validate string manipulations or locator accuracy.
- Maximize the browser window (driver.manage().window().maximize()) for better visibility during testing.

Conclusion

The Actions class, frames, and child window handling are advanced Selenium WebDriver features that enable you to automate complex user interactions and dynamic web elements. By mastering these concepts, you can create robust automation scripts that mimic real-world user behavior, such as hovering over menus, dragging elements, or switching between windows.

Check out the complete code repository below:

https://github.com/samikshakute/SeleniumLearning/tree/main/UserInteractions

Samiksha Kute
