Automate browsing on Debian 12 (KDE Plasma 5.27) using image recognition, keyboard, and mouse automation

user1272047

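To automate browsing on Debian 12 (KDE Plasma 5.27) with image recognition and simulated keyboard and mouse input, follow this structured approach. The input-simulation tools below (xdotool, AutoKey, PyAutoGUI) work through X11, so log in to an X11 session rather than Wayland.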


0. LLM API

You can use an LLM, such as GPT-4, to generate a script that uses the tools below: xdotool, AutoKey, wmctrl, scrot, PyAutoGUI, and OpenCV.
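
As a rough illustration of that step, the sketch below only assembles the prompt text. The wording of the prompt and the `build_prompt` helper are assumptions made for this example, and sending the prompt to a model is left out because it depends on the provider you use.

```python
from textwrap import dedent

# Tool names come from the article; the prompt wording is an illustrative assumption.
TOOLS = ("xdotool", "AutoKey", "wmctrl", "scrot", "PyAutoGUI", "OpenCV")


def build_prompt(task, tools=TOOLS):
    """Return a prompt asking an LLM for an automation script (hypothetical wording)."""
    tool_list = ", ".join(tools)
    return dedent(f"""\
        Write a script for Debian 12 with KDE Plasma on X11.
        Task: {task}
        Use only these tools: {tool_list}.
        Add a comment before every step and avoid hard-coded screen coordinates.
        """)


if __name__ == "__main__":
    print(build_prompt("Open Firefox and load https://www.debian.org"))
```

Review whatever script the model returns before running it, since it will control your real keyboard and mouse.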


1. Install Required Tools

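Run the following commands to install the dependencies. On Debian 12 the qdbus tool is packaged as qdbus-qt5, and pip refuses to install into the system Python, so the Python packages go into a virtual environment (the directory name ~/automation-venv is only an example):

sudo apt update && sudo apt install -y xdotool autokey-gtk autokey-qt wmctrl qdbus-qt5 scrot python3-pip python3-venv python3-tk
python3 -m venv ~/automation-venv && . ~/automation-venv/bin/activate
pip install pyautogui opencv-python numpy pillow
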
  • xdotool → Simulate keyboard & mouse input.

  • AutoKey → Macro automation.

  • wmctrl → Control windows.

  • scrot → Screenshot tool for image recognition.

  • PyAutoGUI → GUI automation with image recognition.

  • OpenCV → Advanced image processing.
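
Before writing any automation, it can help to confirm that the command-line tools are on PATH and that the session is X11. The following is a minimal sketch; the command list and the helper names `missing_commands` and `session_is_x11` are assumptions for this example, not part of any of the tools.

```python
import os
import shutil

# Command names assumed to correspond to the packages installed above.
REQUIRED_COMMANDS = ("xdotool", "wmctrl", "scrot")


def missing_commands(commands=REQUIRED_COMMANDS, which=shutil.which):
    """Return the commands that cannot be found on PATH."""
    return [cmd for cmd in commands if which(cmd) is None]


def session_is_x11(environ=os.environ):
    """xdotool and PyAutoGUI inject input through X11, so check the session type."""
    return environ.get("XDG_SESSION_TYPE", "").lower() == "x11"


if __name__ == "__main__":
    print("Missing commands:", missing_commands() or "none")
    print("X11 session:", session_is_x11())
```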


2. Automate Browser Actions Using Image Recognition

Using PyAutoGUI

(a) Open Browser & Navigate

import pyautogui
import time

# Open Firefox
pyautogui.hotkey('ctrl', 'alt', 't')  # Open terminal
time.sleep(1)
pyautogui.write('firefox\n')  # Launch Firefox
time.sleep(3)

# Focus the address bar (Ctrl+L) and load the page
pyautogui.hotkey('ctrl', 'l')  
pyautogui.write('https://www.debian.org\n')
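
The fixed `time.sleep(3)` above can be too short on a slow start and wasteful on a fast one. One alternative, sketched here under the assumption that the window class is the third column of `wmctrl -lx` output and contains no spaces, is to poll the window list until Firefox appears. The helper names are made up for this example.

```python
import subprocess
import time


def window_classes(wmctrl_output):
    """Extract the WM_CLASS column from `wmctrl -lx` output."""
    classes = []
    for line in wmctrl_output.splitlines():
        fields = line.split(None, 4)  # id, desktop, class, host, title
        if len(fields) >= 3:
            classes.append(fields[2])
    return classes


def list_windows():
    """Run `wmctrl -lx` and return its raw output."""
    result = subprocess.run(["wmctrl", "-lx"], capture_output=True, text=True, check=True)
    return result.stdout


def wait_for_window(name, timeout=15.0, interval=0.5, lister=list_windows):
    """Poll until a window whose class contains `name` appears, or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if any(name.lower() in cls.lower() for cls in window_classes(lister())):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

In the script above, `wait_for_window("firefox")` could replace the three-second sleep, with the script stopping if it returns False.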

(b) Locate & Click Elements via Screenshots

  1. Take a Screenshot of the Button

     scrot -s ~/button.png  # -s lets you drag a rectangle around just the button
    
  2. Find & Click the Button

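     import os
     template = os.path.expanduser('~/button.png')  # Python does not expand "~"
     try:
         button = pyautogui.locateCenterOnScreen(template)
     except pyautogui.ImageNotFoundException:  # newer PyAutoGUI raises instead of returning None
         button = None
     if button:
         pyautogui.click(button)
     else:
         print("Button not found!")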
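
OpenCV is installed in step 1 but not used above. The sketch below shows one way it could do the matching directly, using normalized template matching on a saved screenshot. The 0.9 threshold and the function names are assumptions for this example, and a match is only reliable when the button is displayed at the same scale as the template.

```python
def match_center(top_left, template_shape):
    """Center (x, y) of a match, given its top-left corner and the template's (height, width)."""
    x, y = top_left
    height, width = template_shape[:2]
    return x + width // 2, y + height // 2


def find_on_screenshot(screenshot_path, template_path, threshold=0.9):
    """Return the center of the best match, or None if its score is below `threshold`."""
    import cv2  # imported here so the helper above works without OpenCV

    screen = cv2.imread(screenshot_path, cv2.IMREAD_GRAYSCALE)
    template = cv2.imread(template_path, cv2.IMREAD_GRAYSCALE)
    if screen is None or template is None:
        raise FileNotFoundError("could not read screenshot or template image")
    # Slide the template over the screenshot and score every position.
    scores = cv2.matchTemplate(screen, template, cv2.TM_CCOEFF_NORMED)
    _, best_score, _, best_loc = cv2.minMaxLoc(scores)
    if best_score < threshold:
        return None
    return match_center(best_loc, template.shape)
```

A typical use would be to capture the screen with scrot to a file, call `find_on_screenshot` with that file and the button image, and pass the returned point to `pyautogui.click` if it is not None.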
    

3. Automate Browser with xdotool

(a) Open a Website

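xdotool search --onlyvisible --class "firefox" windowactivate --sync key ctrl+l
xdotool type "https://www.google.com" && xdotool key Return  # "type" consumes all remaining arguments, so "key Return" needs its own call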

(b) Click a Button via Coordinates

xdotool mousemove 500 300 click 1
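
Absolute coordinates such as 500 300 stop working as soon as the window moves or is resized. A possible workaround, sketched here with hypothetical helper names, is to read the active window's geometry from `xdotool getwindowgeometry --shell` and click at a position expressed as a fraction of the window size.

```python
import subprocess


def parse_geometry(shell_output):
    """Parse KEY=VALUE lines printed by `xdotool getwindowgeometry --shell`."""
    geometry = {}
    for line in shell_output.splitlines():
        key, sep, value = line.partition("=")
        if sep and value.strip().lstrip("-").isdigit():
            geometry[key.strip()] = int(value)
    return geometry


def to_screen(geometry, rel_x, rel_y):
    """Convert a position given as fractions of the window size to screen coordinates."""
    return (geometry["X"] + round(geometry["WIDTH"] * rel_x),
            geometry["Y"] + round(geometry["HEIGHT"] * rel_y))


def click_in_active_window(rel_x, rel_y):
    """Click at a relative position inside the currently active window."""
    out = subprocess.run(
        ["xdotool", "getactivewindow", "getwindowgeometry", "--shell"],
        capture_output=True, text=True, check=True,
    ).stdout
    x, y = to_screen(parse_geometry(out), rel_x, rel_y)
    subprocess.run(["xdotool", "mousemove", str(x), str(y), "click", "1"], check=True)
```

For example, `click_in_active_window(0.5, 0.5)` would click the middle of whichever window currently has focus.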

(c) Scroll & Navigate

xdotool key Down Down Down
xdotool key ctrl+Tab  # Switch tab

4. Automate Browser Using Selenium (Headless)

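If you do not need to drive the on-screen GUI, use Selenium WebDriver, which controls the browser through its automation interface instead of simulated mouse and keyboard input.

(a) Install Selenium & Chrome WebDriver

Selenium 4.6 and later locate a matching driver themselves, so webdriver-manager is only needed with older releases.

pip install selenium webdriver-manager

(b) Run a Headless Search

from selenium import webdriver
from selenium.webdriver.common.keys import Keys

options = webdriver.ChromeOptions()
options.add_argument("--headless=new")  # run without opening a window
driver = webdriver.Chrome(options=options)  # requires Chrome or Chromium to be installed
driver.get("https://www.google.com")

search_box = driver.find_element("name", "q")
search_box.send_keys("Debian 12 Automation")
search_box.send_keys(Keys.RETURN)
driver.quit()  # close the browser when done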
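
If the script fails partway through, the browser process can be left running in the background. The sketch below wraps the same search in try/finally so the driver is always closed. The function names and the `driver_factory` parameter are assumptions for this example; the parameter exists only so the logic can be exercised with a stand-in driver.

```python
def make_headless_chrome():
    """Create a headless Chrome driver (assumes Chrome or Chromium is installed)."""
    from selenium import webdriver  # imported lazily so run_search can be tested with a stub

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    return webdriver.Chrome(options=options)


def run_search(query, url="https://www.google.com", driver_factory=make_headless_chrome):
    """Submit `query` in the field named "q" and return the resulting page title."""
    driver = driver_factory()
    try:
        driver.get(url)
        box = driver.find_element("name", "q")
        box.send_keys(query)
        box.submit()  # submit the form that contains the search box
        return driver.title
    finally:
        driver.quit()  # always release the browser process, even after an error
```

Calling `run_search("Debian 12 Automation")` starts a headless browser, performs the search, and returns the page title as it is at that moment.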

5. Automate KDE Desktop & Browser With AutoKey

  1. Open AutoKey (autokey-qt on KDE, or autokey-gtk)

  2. Create a New Script:

     import time
     keyboard.send_keys("<ctrl>+t")  # Open a new tab
     time.sleep(1)  # Give the tab time to open
     keyboard.send_keys("https://www.debian.org<enter>")  # Type the URL and press Enter
    
  3. Assign a Hotkey (Example: Ctrl+Shift+X)


6. Automate Browser Using QDBus

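Open a Firefox tab. The service, path, and method names in the command below depend on the Firefox build and session and may not be registered on your system, so list the available services first by running qdbus with no arguments: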

qdbus org.mozilla.firefox /browser org.mozilla.firefox.LoadURI "https://www.debian.org"
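
Since the D-Bus names Firefox registers are not guaranteed, a more defensive approach is to inspect what `qdbus` actually reports and fall back to Firefox's own `--new-tab` command-line option. This is a minimal sketch; the helper names are assumptions for this example, and it assumes the `qdbus` and `firefox` commands are on PATH.

```python
import subprocess


def matching_services(qdbus_output, needle):
    """Return registered D-Bus service names that contain `needle` (case-insensitive)."""
    names = (line.strip() for line in qdbus_output.splitlines())
    return [name for name in names if name and needle.lower() in name.lower()]


def open_tab(url):
    """Report any Firefox D-Bus services, then open the URL through Firefox's own CLI."""
    listing = subprocess.run(["qdbus"], capture_output=True, text=True, check=True).stdout
    print("Firefox-related services:", matching_services(listing, "firefox") or "none found")
    # The command-line option works whether or not a D-Bus service is registered.
    subprocess.run(["firefox", "--new-tab", url], check=True)
```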

Summary

Method      Purpose
PyAutoGUI   Image-based mouse & keyboard automation
xdotool     Simulate keyboard & mouse
AutoKey     Automate text input & macros
Selenium    Web automation (headless)
QDBus       Control KDE applications

Example 1. Focus Firefox, type a URL, and press Return, with one xdotool call per step:

xdotool search --onlyvisible --class "firefox" windowactivate --sync key --delay 100 ctrl+l
xdotool type "https://www.google.com"
xdotool key Return