Learning Python: Week 1 - Core Concepts for Automation

I started learning Python a while ago, but I wanted to learn what I can do with it to automate my daily tasks and to better understand Python's potential. So I started writing scripts to solve everyday problems.
This article covers the concepts I learned and felt were important before starting my journey into “Automation with Python.”
I learned the basics from the “Chai aur Code” playlist, which gave me the foundation to build on (reference: Chai aur Python Playlist).
Internal Working of Python
Python needs an interpreter to compile and run your code
How does the Python interpreter work?
Compile to bytecode ⇒ intermediate step
- Bytecode ⇒ low-level and platform-independent
Starting from cached bytecode is faster than starting from the source script, because parsing and compilation have already been done; the execution speed itself stays the same
.pyc
→ These files hold compiled Python bytecode (sometimes loosely called frozen binaries, although that term more commonly refers to bundled executables)
What is pycache? (__pycache__)
A folder where Python caches the compiled bytecode of imported modules
The double underscores before and after pycache signal that the folder is for Python's internal use.
hello_chai.cpython-312.pyc
What does the name signify? The module name, the interpreter, and the Python version
Python does not diff the code; it stores the source file's last-modified time and size (or, optionally, a hash) in the .pyc file and recompiles when they no longer match the source
cpython is the interpreter implementation used by standard Python, and 312 is the installed version, 3.12
A .pyc file is written only for imported files, not for the top-level script or when you have only a single file
- Python Virtual Machine (PVM)
It is software that loops over the bytecode instructions and executes them one by one
If no cached bytecode exists, the script is compiled in memory first and then handed to the PVM
It is Python's run-time engine, which is why it is often simply called the Python interpreter
Bytecode is not machine code; it is a Python-specific instruction format
Implementations: CPython (the standard implementation), Jython (runs on the Java Virtual Machine), IronPython (.NET), Stackless Python, PyPy
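To make the bytecode and cache naming less abstract, here is a minimal sketch using only the standard library. The file name hello_chai.py is just the example name from these notes (the file does not need to exist), and the printed values depend on the Python version you run.

```python
import importlib.util
import sys

# Compile a tiny source string into a code object; co_code holds the raw bytecode.
source = "x = 1 + 2"
code_obj = compile(source, "<example>", "exec")
bytecode = code_obj.co_code

# Ask Python where it would cache the compiled form of a module file.
# "hello_chai.py" is only an illustrative name.
cache_path = importlib.util.cache_from_source("hello_chai.py")

print(type(bytecode))                # the bytecode is a bytes object
print(sys.implementation.cache_tag)  # e.g. cpython-312 on CPython 3.12
print(cache_path)                    # e.g. __pycache__/hello_chai.cpython-312.pyc
```

The cache tag is where the "cpython-312" part of the .pyc file name comes from.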
Mutable vs Immutable
Mutable | Immutable |
List | Integer |
Set | Floating-point numbers |
Dictionary | Boolean |
ByteArray | Strings |
Array | Tuples |
 | Frozen set |
 | Bytes |
Almost everything in Python is an object, e.g., a string object or a float object
For a string, it is the stored value that is immutable, not the variable referring to it; the variable can be rebound to a different string at any time
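A small sketch of the difference, assuming nothing beyond built-in types: a list changes in place and keeps its identity, while "changing" a string creates a new object and rebinds the name.

```python
# Lists are mutable: the same object is modified in place.
numbers = [1, 2, 3]
list_id_before = id(numbers)
numbers.append(4)
same_list = id(numbers) == list_id_before  # True: still the same object

# Strings are immutable: concatenation builds a new object and rebinds the name.
greeting = "chai"
original = greeting
greeting = greeting + " aur code"
original_unchanged = original == "chai"    # True: the old value was never touched
rebound = greeting is not original         # True: greeting now refers to a new object

# Modifying a string in place is not allowed at all.
try:
    original[0] = "C"
    in_place_error = False
except TypeError:
    in_place_error = True

print(same_list, original_unchanged, rebound, in_place_error)
```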
OOPS
a way of organizing code that uses objects and classes to represent real-world entities and their behavior
Basic Class and Object
class → a blueprint for creating objects
defines a set of attributes (properties) and methods that the created objects can have
Important points
Created via the class keyword
Attributes = variables that belong to a class
Attributes are public by default and can be accessed via the dot (.) operator
By convention, class names use CapWords (e.g., ElectricCar)
Object → instance of a class
represents a specific instance of a class and holds its own data
state → represented by the attributes; reflects the properties of an object
behavior → represented by the methods; reflects how the object responds
identity → uniquely identifies an object and lets it interact with other objects
The self parameter is a reference to the current instance of the class. It allows us to access the attributes and methods of the object.
The __init__ method is the initializer (commonly called the constructor) in Python, called automatically when a new object is created. It initializes the attributes of the object.
Class variable → variable shared across all instances of a class
- defined outside any method in the class
Instance variable → variable unique to each instance (object) of a class
class Car:
    # can't write like this otherwise we won't be able to change or set values
    # brand=None
    # model=None
    # this __init__ method is a constructor
    engine = "TOP"  # class variable

    def __init__(self, brand, model):  # via self we are giving context to class and variables to be accessed
        # self is nothing but 'this' in JS
        self.brand = brand  # instance variable
        self.model = model

my_car = Car("Toyota", "Corolla")  # Object created
print(my_car)  # Output: <__main__.Car object at 0x000001E8F7F4
print(my_car.brand)  # Output: Toyota
Inheritance
attributes and methods are inherited from the parent class
promotes code reuse
Types
Single Inheritance→ child class inherits from a single parent class
Multiple Inheritance→ child class inherits from more than one parent class
Multilevel Inheritance → child class inherits from parent class, which inherits from another class
Hierarchical Inheritance → multiple child classes inherit from a single parent class
Hybrid Inheritance → combination of two or more types
# Single Inheritance
class Dog:
    def __init__(self, name):
        self.name = name

    def display_name(self):
        print(f"Dog's Name: {self.name}")

class Labrador(Dog):  # Single Inheritance
    def sound(self):
        print("Labrador woofs")

# Multilevel Inheritance
class GuideDog(Labrador):  # Multilevel Inheritance
    def guide(self):
        print(f"{self.name} Guides the way!")

# Multiple Inheritance
class Friendly:
    def greet(self):
        print("Friendly!")

class GoldenRetriever(Dog, Friendly):  # Multiple Inheritance
    def sound(self):
        print("Golden Retriever Barks")

# Example Usage
lab = Labrador("Buddy")
lab.display_name()
lab.sound()

guide_dog = GuideDog("Max")
guide_dog.display_name()
guide_dog.guide()

retriever = GoldenRetriever("Charlie")
retriever.display_name()
retriever.greet()
retriever.sound()
Encapsulation
When you add two underscores before an attribute or variable name, it becomes private
private means it can be used inside the class but not accessed directly through an object (Python implements this through name mangling, not true access control)
Types of Encapsulation:
Public Members: Accessible from anywhere.
Protected Members: Accessible within the class and its subclasses, by convention (single underscore prefix, _)
Private Members: Accessible only within the class (double underscore prefix, __)
class Car:
    def __init__(self, brand, model):
        self.__brand = brand
        self.model = model

    def full_name(self):
        return self.__brand + " " + self.model

    def get_brand(self):
        return self.__brand + " !"

my_car = Car("Tesla", "Model S")
# print(my_car.__brand)  # raises AttributeError: brand is a private attribute and can't be accessed directly
print(my_car.get_brand())  # Output: Tesla !
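A short sketch of what "private" really means here. Python renames a double-underscore attribute to _ClassName__attribute (name mangling), so direct access fails, but the mangled name is still reachable. The trimmed-down Car class below is only for illustration.

```python
class Car:
    def __init__(self, brand):
        self.__brand = brand  # stored on the object as _Car__brand

    def get_brand(self):
        return self.__brand   # inside the class, the short name works


my_car = Car("Tesla")

# Direct access fails: no attribute is literally named __brand on the object.
try:
    my_car.__brand
    direct_access_worked = True
except AttributeError:
    direct_access_worked = False

# The mangled name is still accessible, so this is a convention, not a lock.
mangled_value = my_car._Car__brand

print(direct_access_worked)  # False
print(mangled_value)         # Tesla
print(my_car.get_brand())    # Tesla
```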
Polymorphism
overriding or overloading the method
allows methods to have the same name but behave differently based on the object's context
Types
Compile-Time → determined during compilation of the program
allows methods or operators with the same name to behave differently based on their input parameters
method overloading (Python does not support it natively; it is mimicked with default or variable arguments)
Run-Time → determined during execution of the program
occurs when a subclass provides its own implementation of a method already defined in its parent class
method overriding
# Parent Class
class Dog:
    def sound(self):
        print("dog sound")  # Default implementation

# Run-Time Polymorphism: Method Overriding
class Labrador(Dog):
    def sound(self):
        print("Labrador woofs")  # Overriding parent method

class Beagle(Dog):
    def sound(self):
        print("Beagle Barks")  # Overriding parent method

# Compile-Time Polymorphism: Method Overloading Mimic
class Calculator:
    def add(self, a, b=0, c=0):
        return a + b + c  # Supports multiple ways to call add()

# Run-Time Polymorphism
dogs = [Dog(), Labrador(), Beagle()]
for dog in dogs:
    dog.sound()  # Calls the appropriate method based on the object type

# Compile-Time Polymorphism (Mimicked using default arguments)
calc = Calculator()
print(calc.add(5, 10))  # Two arguments
print(calc.add(5, 10, 15))  # Three arguments
Abstraction
Hides internal implementation details while exposing only necessary functionality
Types of Abstraction:
Partial Abstraction: Abstract class contains both abstract and concrete methods.
Full Abstraction: Abstract class contains only abstract methods (like interfaces).
from abc import ABC, abstractmethod

class Dog(ABC):  # Abstract Class
    def __init__(self, name):
        self.name = name

    @abstractmethod
    def sound(self):  # Abstract Method
        pass

    def display_name(self):  # Concrete Method
        print(f"Dog's Name: {self.name}")

class Labrador(Dog):  # Partial Abstraction
    def sound(self):
        print("Labrador Woof!")

class Beagle(Dog):  # Partial Abstraction
    def sound(self):
        print("Beagle Bark!")

# Example Usage
dogs = [Labrador("Buddy"), Beagle("Charlie")]
for dog in dogs:
    dog.display_name()  # Calls concrete method
    dog.sound()  # Calls implemented abstract method
static keyword
Some functionality relates to the class but does not need any instance to do its work; static methods can be used in such cases.
A static method can be called on the class itself, without creating an object (calling it through an object also works)
A static method is bound to the class and not to an object of the class, so it receives neither self nor cls
uses the @staticmethod decorator
class Car:
    total_car = 0

    def __init__(self, brand, model):
        self.__brand = brand
        self.model = model
        Car.total_car += 1
        # self.total_car+=1

    def full_name(self):
        return self.__brand + " " + self.model

    def get_brand(self):
        return self.__brand + " !"

    def fuel_type(self):
        return "Petrol or Diesel"

    @staticmethod
    def general_description():
        return "This is a car"

my_car = Car("Toyota", "Corolla")  # Object created
print(my_car.general_description())  # Also works: a static method can be called through an instance
print(Car.general_description())  # Output: This is a car
Make an attribute read-only
Using the property decorator, we can make an attribute read-only and access it just like a regular attribute
a property without a setter cannot be assigned to, so the value cannot be overwritten from outside
class Car:
    total_car = 0

    def __init__(self, brand, model):
        self.__brand = brand
        self.__model = model
        Car.total_car += 1
        # self.total_car+=1

    def full_name(self):
        return self.__brand + " " + self.__model

    def get_brand(self):
        return self.__brand + " !"

    def fuel_type(self):
        return "Petrol or Diesel"

    @staticmethod
    def general_description():
        return "This is a car"

    @property
    def model(self):
        return self.__model

my_car = Car("Toyota", "Corolla")  # Object created
# my_car.model = "City"  # raises AttributeError: the property has no setter, so it can't be set
print(my_car.model)  # Output: Corolla
class inheritance and isinstance() function
isinstance() checks whether an object is an instance of a particular class (or of one of its subclasses)
class ElectricCar(Car):
    def __init__(self, brand, model, battery_size):
        super().__init__(brand, model)  # calling the parent class constructor (the class "above us")
        self.battery_size = battery_size

    def fuel_type(self):
        return "Electric Charge"

my_electric_car = ElectricCar("Tesla", "Model S", "85kWH")
print(f"{isinstance(my_electric_car, Car)} {isinstance(my_electric_car, ElectricCar)}")  # True True
Multiple Inheritance
- Multiple inheritance is possible in Python
class Battery:
    def battery_info(self):
        return "This is a battery"

class Engine:
    def engine_info(self):
        return "This is an engine"

class ElectricCar2(Battery, Engine, Car):
    pass

my_new_tesla = ElectricCar2("Tesla", "Model R")
print(my_new_tesla.battery_info())  # This is a battery
print(my_new_tesla.engine_info())  # This is an engine
Special methods
Special methods in Python (also known as dunder methods, for “double underscore”) are methods with names like __init__, __str__, etc. They allow custom classes to integrate naturally with Python syntax and built-in functions.
- Python calls them implicitly when the matching syntax or built-in function is used; you rarely call them by name
They’re the reason you can do:
len(obj)
obj + other
print(obj)
and have them work your way.
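As a minimal sketch of that idea, the hypothetical Vector class below (not from the playlist) implements a few special methods so that print(), len(), + and indexing work on it.

```python
class Vector:
    """Illustrative container showing a few special methods."""

    def __init__(self, *components):
        self.components = list(components)

    def __repr__(self):
        # Used by repr(), and by print() because no __str__ is defined.
        return f"Vector({', '.join(map(str, self.components))})"

    def __len__(self):
        # Used by len(v).
        return len(self.components)

    def __add__(self, other):
        # Used by v + w: add the components pairwise.
        return Vector(*(a + b for a, b in zip(self.components, other.components)))

    def __getitem__(self, index):
        # Used by v[index].
        return self.components[index]


v = Vector(1, 2) + Vector(3, 4)
print(v)       # Vector(4, 6)
print(len(v))  # 2
print(v[0])    # 4
```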
| Method | Signature | Explanation |
| --- | --- | --- |
| Returns a printable representation of the object | __repr__(self) | repr(x) invokes x.__repr__(); this is also invoked when an object is echoed by the console |
| Returns the string representation of an object | __str__(self) | str(x) invokes x.__str__() |

Mathematical Operator

| Method | Signature | Explanation |
| --- | --- | --- |
| Add | __add__(self, other) | x + y invokes x.__add__(y) |
| Subtract | __sub__(self, other) | x - y invokes x.__sub__(y) |
| Multiply | __mul__(self, other) | x * y invokes x.__mul__(y) |
| Divide | __truediv__(self, other) | x / y invokes x.__truediv__(y) |
| Power | __pow__(self, other) | x ** y invokes x.__pow__(y) |

Container-like class

| Method | Signature | Explanation |
| --- | --- | --- |
| Length | __len__(self) | len(x) invokes x.__len__() |
| Get Item | __getitem__(self, key) | x[key] invokes x.__getitem__(key) |
| Set Item | __setitem__(self, key, item) | x[key] = item invokes x.__setitem__(key, item) |
| Contains | __contains__(self, item) | item in x invokes x.__contains__(item) |
| Iterator | __iter__(self) | iter(x) invokes x.__iter__() |
| Next | __next__(self) | next(x) invokes x.__next__() |

__name__ == "__main__"
The idea behind it: when code lives in a module, you often want to know whether that module is being imported by another file or run directly as the original .py file
When the interpreter runs a module, the __name__ variable is set to __main__ if that module is the main program
If the module is imported from another module, its __name__ variable is set to the module's name
# Python module to import (file_two.py)
print("File two __name__ is set to: {}".format(__name__))  # prints __main__ only when file_two.py is run directly
-------------------------------------------------------
# Python module to execute
import file_two
print("File one __name__ is set to: {}".format(__name__))  # for this file it prints __main__; the imported file_two prints file_two
The variable __name__ for the file/module that is run will always be __main__. But the __name__ variable for all other modules that are being imported will be set to their module's name.
Usually, when you don't add any __name__ condition, the top-level code of a file is executed as-is, even on import
When you put code under an if __name__ == "__main__" condition, the functions, classes, etc. defined above it are still loaded on import, but the code inside the if block runs only when the file is executed directly
We can use an if __name__ == "__main__" block to allow or prevent parts of the code from running when the module is imported.
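A typical layout looks roughly like the sketch below. The greet and main functions are made-up names for illustration: importing this file gives other modules access to greet(), while main() runs only when the file is executed directly.

```python
def greet(name):
    """Reusable function: available to any module that imports this file."""
    return f"Hello, {name}!"


def main():
    # Code that should run only when this file is the main program.
    print(greet("chai"))


if __name__ == "__main__":
    main()
```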
Errors and Exception Handling
We can use error handling to plan for possible errors
Without it, the entire script stops when an error occurs, and the error is displayed
With error handling, the script can continue with other code even if there is an error
We use three keywords for this
try → the block of code to be attempted (may lead to an error)
except → the block of code executed if there is an error in the try block
finally → the block of code executed regardless of whether an error occurred
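The three keywords together can be sketched like this. The safe_divide function is a made-up example: a division by zero is caught, the finally block runs either way, and the script carries on.

```python
def safe_divide(a, b):
    try:
        # Code that may fail.
        result = a / b
    except ZeroDivisionError:
        # Runs only if the try block raised ZeroDivisionError.
        print("Cannot divide by zero")
        result = None
    finally:
        # Runs whether or not an error occurred.
        print("Division attempted")
    return result


print(safe_divide(10, 2))  # 5.0
print(safe_divide(10, 0))  # None
```

Catching a specific exception such as ZeroDivisionError, rather than a bare except, keeps unrelated bugs from being silently hidden.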
Pylint
Pylint is a static analysis tool that
Reports likely errors without executing the Python code
Enforces a coding standard and looks for code smells
Suggests how particular blocks can be refactored
Offers details about the code's complexity
Pylint is similar to tools such as pychecker, pyflakes, and flake8 (mypy is related, but focuses on type checking).
There are several such tools; these notes focus on pylint
pylint → a library that looks at your code and reports possible issues
pylint <file> → gives statistics and a report for the file
Check the documentation to see what the coding standard is
Decorators
A decorator is essentially a function that takes another function as an argument and returns a new function
often used for logging, authentication, and memoization, allowing us to add functionality to existing functions or methods in a clean, reusable way
Syntax
- The wrapper function accepts *args and **kwargs, so the decorator can handle functions with any number and type of arguments.
def decorator_name(func):
    def wrapper(*args, **kwargs):
        # Add functionality before the original function call
        result = func(*args, **kwargs)
        # Add functionality after the original function call
        return result
    return wrapper

@decorator_name
def function_to_decorate():
    # Original function code
    pass
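To make the template concrete, here is a small logging decorator as a sketch. The names log_calls and add are made up for illustration; functools.wraps is from the standard library and keeps the decorated function's original name and docstring.

```python
import functools


def log_calls(func):
    @functools.wraps(func)  # preserve func.__name__ and func.__doc__ on the wrapper
    def wrapper(*args, **kwargs):
        print(f"Calling {func.__name__} with {args} {kwargs}")
        result = func(*args, **kwargs)
        print(f"{func.__name__} returned {result}")
        return result
    return wrapper


@log_calls
def add(a, b):
    return a + b


total = add(2, 3)  # logs the call and the result
print(total)       # 5
```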
Higher-order functions
take one or more functions as arguments, return a function as a result, or both
Properties
Taking functions as arguments: a higher-order function can accept other functions as parameters
Returning functions: can return a new function that can be called later
# A higher-order function that takes another function as an argument
def fun(f, x):
    return f(x)

# A simple function to pass
def square(x):
    return x * x

# Using fun to apply the square function
res = fun(square, 5)
print(res)  # 25
Decorators are higher-order functions because they take a function as input, modify it, and return a new function
Functions as First-class Objects
Functions are first-class objects, meaning they can be treated like any other object, such as an integer, string, or list
This lets functions be passed around and manipulated in ways that languages without first-class functions do not allow.
Meaning
Functions can be assigned to variables
Functions can be passed as arguments
Functions can be returned from other functions
Functions can be stored in data structures (lists, dict, etc.)
Type of Decorators
Function Decorator
Most common type
takes a function as input and returns a new function
# Eg:
def simple_decorator(func):
    def wrapper():
        print("Before calling the function.")
        func()
        print("After calling the function.")
    return wrapper

@simple_decorator
def greet():
    print("Hello, World!")

greet()
Method Decorator
- often has to handle special cases, such as the self argument of instance methods
# Eg:
def method_decorator(func):
    def wrapper(self, *args, **kwargs):
        print("Before method execution")
        res = func(self, *args, **kwargs)
        print("After method execution")
        return res
    return wrapper

class MyClass:
    @method_decorator
    def say_hello(self):
        print("Hello!")

obj = MyClass()
obj.say_hello()
Class Decorator
used to modify or enhance the behavior of a class
Applied to the class definition
work by taking a class as an argument and returning a modified version of the class
def fun(cls):
    cls.class_name = cls.__name__
    return cls

@fun
class Person:
    pass

print(Person.class_name)  # Person
Built-in decorators
Python provides built-in decorators that are commonly used in class definitions
They modify the behavior of methods and attributes in the class
Most common
@staticmethod → used to define a method that doesn't use self (it doesn't operate on an instance of the class)
- can be called directly on the class, without creating an object
# Eg:
class MathOperations:
    @staticmethod
    def add(x, y):
        return x + y

# Using the static method
res = MathOperations.add(5, 3)
print(res)  # 8
@classmethod → used to define a method that operates on the class itself (uses cls)
- can access and modify class state that applies across all instances of the class
class Employee:
    raise_amount = 1.05

    def __init__(self, name, salary):
        self.name = name
        self.salary = salary

    @classmethod
    def set_raise_amount(cls, amount):
        cls.raise_amount = amount

# Using the class method
Employee.set_raise_amount(1.10)
print(Employee.raise_amount)
@property → used to define a method as a property, so you can access it like an attribute
- useful for encapsulating the implementation of a method while still providing a simple interface.
class Circle:
    def __init__(self, radius):
        self._radius = radius

    @property
    def radius(self):
        return self._radius

    @radius.setter
    def radius(self, value):
        if value >= 0:
            self._radius = value
        else:
            raise ValueError("Radius cannot be negative")

    @property
    def area(self):
        return 3.14159 * (self._radius ** 2)

# Using the property
c = Circle(5)
print(c.radius)
print(c.area)
c.radius = 10
print(c.area)
Chaining Decorators
- decorating function with multiple decorators
# code for testing decorator chaining
def decor1(func):
    def inner():
        x = func()
        return x * x
    return inner

def decor(func):
    def inner():
        x = func()
        return 2 * x
    return inner

# Decorators are applied bottom-up: decor first, then decor1
@decor1
@decor
def num():
    return 10

@decor
@decor1
def num2():
    return 10

print(num())   # 400 → (2 * 10) squared
print(num2())  # 200 → 2 * (10 squared)
Generator
Allows us to write a function that can send back a value and later resume where it left off
A special type of function that returns an iterator object
Instead of using return to send back a single value, it uses yield to produce a series of results over time
This allows the function to generate values and pause its execution after each yield, maintaining its state between iterations.
# Eg:
def fun(max):
    cnt = 1
    while cnt <= max:
        yield cnt
        cnt += 1

ctr = fun(5)
for n in ctr:
    print(n)
Why needed?
Handle large or infinite data without loading everything into memory
yield items one by one, avoiding full list creation
generate values only when needed → improves performance
Ideal for generating unbounded sequences like the Fibonacci series
Chain generators to process data in stages efficiently
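As a sketch of the unbounded case, the generator below yields Fibonacci numbers forever, and itertools.islice takes only the first ten, so no full list of the sequence is ever built. The function name fibonacci is my own choice.

```python
from itertools import islice


def fibonacci():
    """Yield Fibonacci numbers one at a time, without ever stopping."""
    a, b = 0, 1
    while True:
        yield a          # pause here and hand back the current value
        a, b = b, a + b  # state is kept until the next value is requested


# Take just the first ten values from the infinite stream.
first_ten = list(islice(fibonacci(), 10))
print(first_ten)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```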
Creating generators
def generator_function_name(parameters):
    # Your code here
    yield expression
    # Additional code can follow
yield | return |
used in a generator function to provide a sequence of values over time | used to exit a function and return a final value |
when yield executes, it pauses the function, returns the current value, and retains the function's state | once return executes, the function terminates immediately and no state is retained |
useful for generating large or complex sequences efficiently | useful when a single result is needed |
Generator Expression
Concise way to create generators
similar to a list comprehension, except it is written inside parentheses ( ) instead of square brackets [ ]
more memory efficient
# Syntax:
(expression for item in iterable)
# Eg:
sq = (x*x for x in range(1, 6))
for i in sq:
    print(i)
Use cases
processing large data files, like logs
producing an unbounded stream of values: a generator makes this easy, since you just call next() to get the next value without worrying about where the stream ends.
Collections Module
built-in module of Python
Implements specialized container data type → alternative to Python’s built-in containers that are general-purpose
Why needed?
provides specialized container data types beyond built-in types like dict, list, and tuple
include efficient alternative → deque, Counter, OrderedDict, defaultdict, and namedtuple
simplifies complex data structure → cleaner and faster implementation
ideal for improving performance and code readability in data-heavy applications
Counters
subclass of dictionary
It keeps the count of the elements in an iterable as a dictionary, where each key is an element of the iterable and each value is the count of that element.
*class collections.Counter([iterable-or-mapping])*
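A quick sketch with a made-up word list: Counter counts the elements, returns 0 for missing keys instead of raising an error, and most_common() gives the top entries.

```python
from collections import Counter

words = ["chai", "code", "chai", "python", "code", "chai"]
counts = Counter(words)

print(counts["chai"])         # 3
print(counts["tea"])          # 0 (missing elements count as zero)
print(counts.most_common(2))  # [('chai', 3), ('code', 2)]
```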
OrderedDict
Dictionary that preserves the order in which keys are inserted
While regular dictionaries also do this from Python 3.7+, OrderedDict offers extra features such as move_to_end() for reordering keys and order-sensitive equality, making it useful for order-sensitive operations.
Syntax:
*class collections.OrderedDict()*
from collections import OrderedDict

print("This is a Dict:\n")
d = {}
d['a'] = 1
d['b'] = 2
d['c'] = 3
d['d'] = 4
for key, value in d.items():
    print(key, value)

print("\nThis is an Ordered Dict:\n")
od = OrderedDict()
od['a'] = 1
od['b'] = 2
od['c'] = 3
od['d'] = 4
for key, value in od.items():
    print(key, value)
DefaultDict
subclass of dictionary
provides a default value for keys that don't exist, so looking up a missing key with d[key] does not raise a KeyError
Syntax:
*class collections.defaultdict(default_factory)*
from collections import defaultdict

# Creating a defaultdict with default value of 0 (int)
d = defaultdict(int)
L = [1, 2, 3, 4, 2, 4, 1, 2]

# Counting occurrences of each element in the list
for i in L:
    d[i] += 1  # No need to check key existence; default is 0
print(d)
ChainMap
groups many dictionaries into a single unit; lookups search the underlying dictionaries in order, and the dictionaries themselves remain available as a list in the maps attribute.
*class collections.ChainMap(dict1, dict2)*
from collections import ChainMap
d1 = {'a': 1, 'b': 2}
d2 = {'c': 3, 'd': 4}
d3 = {'e': 5, 'f': 6}
# Defining the chainmap
c = ChainMap(d1, d2, d3)
print(c)
# OUTPUT: ChainMap({'a': 1, 'b': 2}, {'c': 3, 'd': 4}, {'e': 5, 'f': 6})
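A common use is layered settings. In this sketch (the dictionary names are made up), lookups check the first mapping before falling back to the next, and writes go only to the first mapping.

```python
from collections import ChainMap

defaults = {"theme": "light", "lang": "en"}
user_settings = {"theme": "dark"}

# user_settings is searched first, then defaults.
settings = ChainMap(user_settings, defaults)

print(settings["theme"])  # dark (found in user_settings)
print(settings["lang"])   # en (falls back to defaults)

# Writes affect only the first mapping; defaults stays untouched.
settings["lang"] = "hi"
print(user_settings)      # {'theme': 'dark', 'lang': 'hi'}
print(defaults["lang"])   # en
```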
NamedTuple
a regular tuple but with named fields, making data more readable and accessible
Instead of using indexes, you can access elements by name
Syntax:
*class collections.namedtuple(typename, field_names)*
from collections import namedtuple
# Declaring namedtuple()
Student = namedtuple('Student',['name','age','DOB'])
# Adding values
S = Student('Nandini','19','2541997')
# Access using index
print ("The Student age using index is : ",end ="")
print (S[1])
# Access using name
print ("The Student name using keyname is : ",end ="")
print (S.name)
Conversion operations
_make() → returns a namedtuple() built from the iterable passed as argument
_asdict() → returns a dictionary built from the field names and values of the namedtuple() (a regular dict since Python 3.8, an OrderedDict before that)
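Both conversions can be sketched with the Student tuple from above. I also use _replace() here, which is not covered in these notes; it returns a new tuple with some fields changed, since namedtuples are immutable.

```python
from collections import namedtuple

Student = namedtuple("Student", ["name", "age", "DOB"])

# _make() builds a Student from any iterable with the right number of items.
row = ["Nandini", "19", "2541997"]
student = Student._make(row)

# _asdict() maps field names to values.
as_dict = student._asdict()

# _replace() returns a new tuple; the original is left unchanged.
older = student._replace(age="20")

print(student.name)   # Nandini
print(as_dict["age"]) # 19
print(older.age)      # 20
```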
Deque
Double-ended queue
for quicker append and pop operations from both ends of the container
Time complexity for append and pop at either end → O(1)
- a list needs O(n) for inserting or removing at the front
Syntax:
*class collections.deque([iterable[, maxlen]])*
from collections import deque
# Declaring deque
queue = deque(['name','age','DOB'])
print(queue)
Inserting an element at the left → appendleft(<element>)
Removing an element from the left → popleft()
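The operations at both ends can be sketched as follows; the letters are arbitrary sample data.

```python
from collections import deque

queue = deque(["b", "c"])

queue.append("d")      # add on the right
queue.appendleft("a")  # add on the left
print(queue)           # deque(['a', 'b', 'c', 'd'])

first = queue.popleft()  # remove from the left
last = queue.pop()       # remove from the right
print(first, last, queue)  # a d deque(['b', 'c'])
```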
UserDict
Dictionary-like container that acts as a wrapper around dictionary objects
Container used when someone wants to create their own dictionary with some modified or new functionality
*class collections.UserDict([initialdata])*
from collections import UserDict

# Creating a dictionary where deletion is not allowed
class MyDict(UserDict):
    # Prevents using 'del d[key]' on the dictionary
    # (__delitem__ handles item deletion; __del__ is the object finalizer)
    def __delitem__(self, key):
        raise RuntimeError("Deletion not allowed")

    # Prevents using pop() on dictionary
    def pop(self, s=None):
        raise RuntimeError("Deletion not allowed")

    # Prevents using popitem() on dictionary
    def popitem(self, s=None):
        raise RuntimeError("Deletion not allowed")

# Create an instance of MyDict
d = MyDict({'a': 1, 'b': 2, 'c': 3})
d.pop(1)  # raises RuntimeError: Deletion not allowed
UserList
list-like container that acts as a wrapper around list objects
Useful when someone wants to create their own list with some modified or additional functionality
UserString
string-like container, and just like UserDict and UserList it acts as a wrapper around string objects
used when someone wants to create their own string with some modified or additional functionality
Syntax:
*class collections.UserString(seq)*
from collections import UserString

# Creating a Mutable String
class Mystring(UserString):
    # Function to append to string
    def append(self, s):
        self.data += s

    # Function to remove from string
    def remove(self, s):
        self.data = self.data.replace(s, "")

# Driver's code
s1 = Mystring("Geeks")
print("Original String:", s1.data)

# Appending to string
s1.append("s")
print("String After Appending:", s1.data)

# Removing from string
s1.remove("e")
print("String after Removing:", s1.data)
Web scraping
A general term for automating the gathering of data from a website
When a browser loads a website, the user sees what is known as the “front-end” of the website
A scraper grabs the data it needs from the page's HTML and returns it
Rules of Web Scraping
Always try to get permission before scraping
If you make too many scraping attempts or requests, your IP address could be blocked
Some sites block scraping software
Limitations
Every website is unique, which means every web scraping script is unique
Python can view these HTML and CSS elements programmatically, and then extract info from them
Libraries used are BeautifulSoup, Scrapy, Selenium
Required libraries
requests → sends HTTP requests to get a webpage's content (used for static sites)
import requests

response = requests.get('https://www.geeksforgeeks.org/python/python-programming-language-tutorial/')
print(response.status_code)
print(response.content)
requests.get(url) → sends a GET request to the given URL
response.status_code → returns the HTTP status code
response.content → returns the raw HTML of the page in bytes
BeautifulSoup4 → parses and extracts HTML content
import requests
from bs4 import BeautifulSoup

response = requests.get('https://www.geeksforgeeks.org/python/python-programming-language-tutorial/')
soup = BeautifulSoup(response.content, 'html.parser')
print(soup.prettify())
helps convert raw HTML into a searchable tree of elements
BeautifulSoup(html, parser) → converts HTML into a searchable object; html.parser is a built-in parser
soup.prettify() → formats HTML nicely for easier reading
soup.find('div', class_='article--viewer_content') → finds the first element with the given tag and class
- selenium → automates browsers (needed for dynamic sites with JS)
WebDriver → software component Selenium uses to interact with the browser
bridge between Python and browser
Each browser has its own driver
Selenium uses this WebDriver to:
Open and control the browser
Load web pages
Extract elements
Simulate clicks, scrolls and inputs
*You can either manually download the WebDriver or use **webdriver-manager** which handles the download and setup automatically.*
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
import time

element_list = []

# Set up Chrome options (optional)
options = webdriver.ChromeOptions()
options.add_argument("--headless")  # Run in headless mode (optional)
options.add_argument("--no-sandbox")
options.add_argument("--disable-dev-shm-usage")

# Use a proper Service object
service = Service(ChromeDriverManager().install())

for page in range(1, 3):
    # Initialize driver properly
    driver = webdriver.Chrome(service=service, options=options)

    # Load the URL
    url = f"https://webscraper.io/test-sites/e-commerce/static/computers/laptops?page={page}"
    driver.get(url)
    time.sleep(2)  # Optional wait to ensure page loads

    # Extract product details
    titles = driver.find_elements(By.CLASS_NAME, "title")
    prices = driver.find_elements(By.CLASS_NAME, "price")
    descriptions = driver.find_elements(By.CLASS_NAME, "description")
    ratings = driver.find_elements(By.CLASS_NAME, "ratings")

    # Store results in a list
    for i in range(len(titles)):
        element_list.append([
            titles[i].text,
            prices[i].text,
            descriptions[i].text,
            ratings[i].text
        ])

    driver.quit()

# Display extracted data
for row in element_list:
    print(row)
ChromeOptions() + --headless: Runs the browser in the background without opening a visible window — ideal for automation and speed.
ChromeDriverManager().install(): Automatically downloads the correct version of ChromeDriver based on your Chrome browser.
Service(...): Wraps the ChromeDriver path for proper configuration with Selenium 4+.
webdriver.Chrome(service=..., options=...): Launches a Chrome browser instance with the given setup.
driver.get(url): Navigates to the specified page URL.
find_elements(By.CLASS_NAME, "class"): Extracts all elements matching the given class name like titles, prices, etc.
.text: Retrieves the visible text content from an HTML element.
element_list.append([...]): Stores each product's extracted data in a structured list.
driver.quit(): Closes the browser to free system resources.
- lxml → fast HTML/XML parser, useful for large or complex pages
from lxml import html
import requests

url = 'https://example.com/'
response = requests.get(url)
tree = html.fromstring(response.content)

# Extract all link texts
link_titles = tree.xpath('//a/text()')
for title in link_titles:
    print(title)
html.fromstring(): Parses HTML into an element tree.
tree.xpath(): Uses XPath to extract specific tags or data.
- urllib
built-in library providing functions for working with URLs
allows you to interact with web pages by fetching URLs , opening and reading data from them and performing other URL-related tasks like encoding and parsing
urllib.request for opening and reading.
urllib.parse for parsing URLs
urllib.error for the exceptions raised
urllib.robotparser for parsing robots.txt files
import urllib.request

# URL of the web page to fetch
url = 'https://www.example.com/'

try:
    response = urllib.request.urlopen(url)
    data = response.read()
    # Decode the data (if it's in bytes) to a string
    html_content = data.decode('utf-8')
    # Print the HTML content of the web page
    print(html_content)
except Exception as e:
    print("Error fetching URL:", e)
schedule → lets you run scraping tasks repeatedly at fixed intervals
- simple library that allows you to schedule Python functions to run at specified intervals
import schedule
import time

def func():
    print("Geeksforgeeks")

schedule.every(1).minutes.do(func)

while True:
    schedule.run_pending()
    time.sleep(1)
schedule.every().minutes.do(): Schedules your function.
run_pending(): Checks if any job is due.
time.sleep(): Prevents the loop from hogging CPU.
- pyautogui → automates the mouse and keyboard; useful when dealing with UI-based interaction
Simulates mouse and keyboard actions. It's useful if elements aren't reachable via Selenium, like special pop-ups or custom scrollbars.
import pyautogui
# moves to (519,1060) in 1 sec
pyautogui.moveTo(519, 1060, duration = 1)
# simulates a click at the present mouse position
pyautogui.click()
pyautogui.moveTo(1717, 352, duration = 1)
pyautogui.click()
Written by

MRIDUL TIWARI
Software Engineer | Freelancer | Content Creator | Open Source Enthusiast | I Build Websites and Web Applications for Remote Clients.