Extracting an Authentication Token Using Selenium
Table of contents
- 1. Importing Necessary Libraries
- 2. Setting Up Chrome Options
- 3. Setting Up the Download Directory
- 4. Initializing the WebDriver
- 5. Defining Functions to Process Logs and Extract Tokens
- 6. Navigating to the Target URL
- 7. Capturing and Processing Logs
- 8. Extracting the Authentication Token
- 9. Finalizing the Browser Session
- Important Notes
- Step-by-Step Flow
- Practical Applications
- Ethical and Legal Considerations
- Additional Tips
- Conclusion
1. Importing Necessary Libraries
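Before running the code, make sure Selenium is installed (for example, with pip install selenium). The other modules the script imports, os, time, and json, are part of the Python standard library.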
```python
import os
import time
import json
from selenium import webdriver
```
os: For interacting with the operating system (e.g., file paths).
time: For adding delays in execution.
json: For parsing JSON data.
selenium.webdriver: For browser automation.
2. Setting Up Chrome Options
```python
options = webdriver.ChromeOptions()
options.add_argument("--enable-logging")
options.add_argument("--v=1")
options.set_capability("goog:loggingPrefs", {"performance": "ALL", "browser": "ALL"})
```
webdriver.ChromeOptions(): Initializes a ChromeOptions object to customize browser settings.
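options.add_argument("--enable-logging"): Turns on Chrome's internal diagnostic logging. This is separate from the performance log that the rest of the script reads.
options.add_argument("--v=1"): Sets the verbosity level of that diagnostic logging.
options.set_capability("goog:loggingPrefs", {...}): Tells ChromeDriver to record the performance log (DevTools events, including the Network.* events for requests and responses) and the browser console log, both at level ALL.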
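Each performance log entry returned by ChromeDriver is a dictionary whose message field is itself a JSON string, and the DevTools event (method and params) sits one level further down under another message key. That is why the functions later in this article call json.loads(entry["message"])["message"]. The sketch below decodes one entry; the sample entry is hand-written to illustrate that shape (its URL and values are made up), not captured output.

```python
import json

# Hand-written sample in the shape ChromeDriver uses for performance log entries
# (all values are illustrative only).
sample_entry = {
    "level": "INFO",
    "timestamp": 1700000000000,
    "message": json.dumps({
        "webview": "ABC123",
        "message": {
            "method": "Network.requestWillBeSent",
            "params": {"request": {"url": "https://example.com/api", "headers": {}}},
        },
    }),
}


def decode_entry(entry):
    """Return the inner DevTools event (a dict with 'method' and 'params')."""
    outer = json.loads(entry["message"])  # the message field is a JSON string
    return outer["message"]               # the DevTools event is nested one level down


event = decode_entry(sample_entry)
print(event["method"])                    # Network.requestWillBeSent
print(event["params"]["request"]["url"])  # https://example.com/api
```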
3. Setting Up the Download Directory
```python
# output_etch (the download folder name) must be defined elsewhere in your code
request_path = os.path.join(os.path.abspath(os.getcwd()), output_etch)
if not os.path.exists(request_path):
    os.mkdir(request_path)  # create the download folder on first run
prefs = {"download.default_directory": request_path}
options.add_experimental_option("prefs", prefs)
```
os.path.exists(...): Checks if the specified directory exists.
os.mkdir(...): Creates a new directory if it doesn't exist.
request_path: Stores the absolute path to the download directory.
prefs: A dictionary to set Chrome preferences, such as the default download directory.
options.add_experimental_option("prefs", prefs): Applies the preferences to the Chrome session.
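Since Python 3.2, os.makedirs(path, exist_ok=True) performs the existence check and the creation in a single call, and also creates any missing parent folders. The sketch below wraps this step in a hypothetical helper, build_download_prefs, that returns the path together with the prefs dictionary; the folder name "downloads" in the demo is a placeholder standing in for output_etch.

```python
import os
import tempfile


def build_download_prefs(folder_name, base_dir=None):
    """Create <base_dir>/<folder_name> if needed and return (path, Chrome prefs dict)."""
    base_dir = base_dir or os.getcwd()
    request_path = os.path.abspath(os.path.join(base_dir, folder_name))
    # exist_ok=True replaces the explicit exists()/mkdir() check and creates parents too
    os.makedirs(request_path, exist_ok=True)
    prefs = {"download.default_directory": request_path}
    return request_path, prefs


# Demo in a throwaway directory; in the script, prefs would then be passed to
# options.add_experimental_option("prefs", prefs).
with tempfile.TemporaryDirectory() as tmp:
    path, prefs = build_download_prefs("downloads", base_dir=tmp)
    print(os.path.isdir(path), prefs["download.default_directory"] == path)  # True True
```

Calling the helper a second time with the same folder name is harmless, which is the behavior the original if/else expresses.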
4. Initializing the WebDriver
```python
context.driver = webdriver.Chrome(options=options)
```
webdriver.Chrome(options=options): Starts a new Chrome browser session with the specified options.
context.driver: Stores the WebDriver instance for later use.
5. Defining Functions to Process Logs and Extract Tokens
a. Processing Browser Logs
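```python
def process_browser(logs):
    """Yield DevTools network events (requests, responses, WebSocket traffic)."""
    for entry in logs:
        # entry["message"] is a JSON string; the DevTools event is nested under "message"
        log = json.loads(entry["message"])["message"]
        method = log["method"]
        if ("Network.response" in method
                or "Network.request" in method
                or "Network.webSocket" in method):
            yield log
```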
logs: The raw logs captured from the browser.
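json.loads(entry["message"])["message"]: Parses the entry's message field, which is a JSON string, and takes the DevTools event (method and params) nested inside it.
Filtering Logs: The function yields only events whose method name contains Network.response, Network.request, or Network.webSocket, that is, network responses, requests, and WebSocket communications.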
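A slightly more defensive variant of process_browser (the name iter_network_events is hypothetical) skips entries that are not valid JSON or lack a method, and matches on the event-name prefix with str.startswith, since DevTools network event names look like Network.requestWillBeSent or Network.webSocketFrameReceived. This is a sketch, and the entries in the demo are synthetic.

```python
import json

NETWORK_PREFIXES = ("Network.request", "Network.response", "Network.webSocket")


def iter_network_events(entries):
    """Yield DevTools network events from raw performance log entries."""
    for entry in entries:
        try:
            event = json.loads(entry["message"])["message"]
        except (KeyError, TypeError, json.JSONDecodeError):
            continue  # skip entries that do not have the expected shape
        method = event.get("method", "")
        if method.startswith(NETWORK_PREFIXES):  # startswith accepts a tuple of prefixes
            yield event


def make_entry(method):
    """Build a synthetic log entry for the demo below."""
    return {"message": json.dumps({"message": {"method": method, "params": {}}})}


entries = [
    make_entry("Network.requestWillBeSent"),
    make_entry("Page.loadEventFired"),
    {"message": "not json"},
    make_entry("Network.webSocketFrameSent"),
]
print([e["method"] for e in iter_network_events(entries)])
# ['Network.requestWillBeSent', 'Network.webSocketFrameSent']
```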
b. Extracting the Bearer Token
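```python
def extract_bearer_token(event):
    """Return the bearer token from a request event, or None if there is none."""
    try:
        headers = event["params"]["request"]["headers"]
        # HTTP/2 header names are lowercase, so check both spellings
        auth_header = headers.get("Authorization", "") or headers.get("authorization", "")
        if auth_header.startswith("Bearer "):
            # Remove the "Bearer " prefix; an empty remainder yields None
            return auth_header[len("Bearer "):].strip() or None
    except KeyError:
        # Events without params.request (e.g. responses) carry no request headers
        pass
    return None
```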
event: A single network event produced by process_browser.
headers = event["params"]["request"]["headers"]: Accesses the headers of the outgoing request.
auth_header: The value of the Authorization header, or an empty string if the request has none.
Extracting the Token: If the Authorization header starts with "Bearer ", the text after that prefix is returned as the token.
Error Handling: A try-except block returns None for events that lack the expected keys (for example, response events, which have no params.request).
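Besides Network.requestWillBeSent, Chrome can emit Network.requestWillBeSentExtraInfo events, whose raw request headers sit directly under params.headers instead of params.request.headers. Both pass the "Network.request" filter in process_browser. The sketch below (function names are hypothetical, events are synthetic) looks in both places and compares header names case-insensitively.

```python
def get_request_headers(event):
    """Return the request headers of a DevTools network event, or {} if it has none."""
    params = event.get("params", {})
    if "request" in params:  # Network.requestWillBeSent
        return params["request"].get("headers", {})
    if event.get("method") == "Network.requestWillBeSentExtraInfo":
        return params.get("headers", {})  # raw headers sit directly under params
    return {}


def bearer_from_event(event):
    """Case-insensitive Authorization lookup; returns the token or None."""
    for name, value in get_request_headers(event).items():
        if name.lower() == "authorization" and value.startswith("Bearer "):
            return value[len("Bearer "):].strip() or None
    return None


# Synthetic events for illustration
events = [
    {"method": "Network.requestWillBeSent",
     "params": {"request": {"headers": {"Authorization": "Bearer abc"}}}},
    {"method": "Network.requestWillBeSentExtraInfo",
     "params": {"headers": {"authorization": "Bearer xyz"}}},
    {"method": "Network.responseReceived", "params": {"response": {}}},
]
print([bearer_from_event(e) for e in events])  # ['abc', 'xyz', None]
```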
6. Navigating to the Target URL
```python
context.driver.get(chatgpt_url)
time.sleep(15)
```
context.driver.get(chatgpt_url): Navigates the browser to the specified URL (chatgpt_url must be defined elsewhere in your code).
time.sleep(15): Waits a fixed 15 seconds so the page can load and its network requests can complete.
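A fixed time.sleep(15) may wait longer than needed on a fast connection and not long enough on a slow one. A common alternative is to poll a condition until it holds or a timeout expires; Selenium ships this pattern as WebDriverWait(driver, timeout).until(...). The hypothetical helper below shows the same idea in plain Python, so it can be paired with any check, such as "a bearer token has appeared in the logs".

```python
import time


def wait_until(condition, timeout=15.0, interval=0.5):
    """Call condition() until it returns a truthy value or timeout seconds pass.

    Returns the truthy value, or None on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)


# Demo: a condition that becomes true on the third call
calls = {"n": 0}


def ready():
    calls["n"] += 1
    return calls["n"] >= 3


print(wait_until(ready, timeout=2, interval=0.01))  # True
```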
7. Capturing and Processing Logs
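```python
time.sleep(5)  # let late network requests be recorded before reading the log
logs = context.driver.get_log("performance")
```
time.sleep(5): Waits so that requests fired late in the page load are recorded. The wait has to come before get_log: if it comes after, as in the order get_log then sleep, the extra time cannot add anything to logs.
context.driver.get_log("performance"): Retrieves the performance log entries, which include network traffic. ChromeDriver returns the entries buffered since the previous call, so each call yields only new entries.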
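Because each get_log("performance") call returns only the entries buffered since the previous call, a script that reads the log more than once should keep what it has already read. The hypothetical LogBuffer class below accumulates entries from any zero-argument callable; in the script that callable would be lambda: context.driver.get_log("performance"). In the demo, a fake reader stands in for the driver.

```python
class LogBuffer:
    """Accumulate performance log entries across repeated reads."""

    def __init__(self, read_entries):
        self._read = read_entries  # e.g. lambda: driver.get_log("performance")
        self.entries = []

    def poll(self):
        """Read newly buffered entries, keep them, and return how many were added."""
        new = self._read()
        self.entries.extend(new)
        return len(new)


# Fake reader that returns a different batch on each call, like a draining buffer
batches = iter([[{"message": "a"}], [], [{"message": "b"}, {"message": "c"}]])
buf = LogBuffer(lambda: next(batches, []))
for _ in range(4):
    buf.poll()
print(len(buf.entries))  # 3
```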
8. Extracting the Authentication Token
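```python
events = process_browser(logs)
bearer_token = set()  # unique tokens seen in the captured requests
for event in events:
    token = extract_bearer_token(event)
    if token:
        bearer_token.add(token)
# Sets are unordered: if several distinct tokens were captured, which one is stored is arbitrary
if bearer_token:
    context.auth_token = next(iter(bearer_token))
```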
events: A generator that yields the filtered network events.
bearer_token: A set that stores the unique tokens found in the logs.
Iterating Over Events: For each event, extract_bearer_token attempts to extract a bearer token.
Adding Tokens to the Set: Any token found is added to the bearer_token set.
Storing the Token: A token from the set is stored in context.auth_token for later use. Because a set has no order, the choice is arbitrary if more than one distinct token was captured.
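If the most recently used token is wanted rather than an arbitrary one, one option is to work from the raw log entries, which carry a timestamp field, and keep the token from the newest matching entry. A sketch with a hypothetical function name and synthetic entries:

```python
import json


def latest_bearer_token(entries):
    """Return the bearer token from the newest matching log entry, or None."""
    best_time, best_token = None, None
    for entry in entries:
        try:
            ts = entry["timestamp"]
            event = json.loads(entry["message"])["message"]
            headers = event["params"]["request"]["headers"]
        except (KeyError, TypeError, ValueError):  # ValueError covers bad JSON
            continue
        auth = next((v for k, v in headers.items() if k.lower() == "authorization"), "")
        if auth.startswith("Bearer ") and (best_time is None or ts > best_time):
            best_time, best_token = ts, auth[len("Bearer "):].strip()
    return best_token


def fake_entry(ts, auth):
    """Synthetic performance log entry carrying one request header."""
    event = {"method": "Network.requestWillBeSent",
             "params": {"request": {"headers": {"Authorization": auth}}}}
    return {"timestamp": ts, "message": json.dumps({"message": event})}


entries = [fake_entry(200, "Bearer newer"), fake_entry(100, "Bearer older"),
           fake_entry(300, "Basic xyz")]
print(latest_bearer_token(entries))  # newer
```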
9. Finalizing the Browser Session
```python
time.sleep(2)
context.driver.maximize_window()
```
time.sleep(2): Waits for 2 seconds before proceeding.
context.driver.maximize_window(): Maximizes the browser window.
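The walkthrough ends with the browser still open. In a Behave project, teardown usually lives in the environment.py hooks, which Behave calls with the same context object, and driver.quit() ends the ChromeDriver session and closes its windows. A minimal sketch, assuming the driver is stored as context.driver as in this article:

```python
# environment.py (Behave hook module)


def after_scenario(context, scenario):
    """Close the browser after each scenario."""
    driver = getattr(context, "driver", None)  # tolerate scenarios that never started one
    if driver is not None:
        driver.quit()  # ends the WebDriver session and closes all windows
```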
Important Notes
Context Object: The code uses a context object, which suggests it's part of a larger framework or testing setup (e.g., Behave for BDD testing). Ensure that context is properly initialized in your environment.
Security Considerations:
Authorization Tokens: Bearer tokens are sensitive and should be handled securely.
Ethical Use: Ensure you have permission to access and extract authentication tokens from the target application.
Legal Compliance: Unauthorized access or extraction of tokens may violate terms of service or laws.
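One practical way to handle tokens securely is never to print or log a full token. The hypothetical helper below keeps only a short prefix and suffix so that log lines remain useful for debugging without exposing the secret.

```python
def mask_token(token, visible=4):
    """Return a redacted form of token that is safe to print or log."""
    if len(token) <= 2 * visible:
        return "*" * len(token)  # too short to reveal any part of it
    return f"{token[:visible]}...{token[-visible:]} ({len(token)} chars)"


print(mask_token("abcd1234efgh5678"))  # abcd...5678 (16 chars)
```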
Step-by-Step Flow
Setup:
- Initialize Chrome options to enable logging and set preferences.
- Create the download directory if it doesn't exist.
- Start a new Chrome WebDriver session with the specified options.
Define Helper Functions:
- process_browser(logs): Filters and yields relevant network events.
- extract_bearer_token(event): Attempts to extract the bearer token from a network event.
Navigate to the URL:
- Use the WebDriver to navigate to chatgpt_url.
- Wait for the page and network activity to complete.
Capture Logs:
- Retrieve performance logs from the browser.
- Filter the logs down to the relevant network events.
Extract Tokens:
- Iterate over the filtered events.
- Use extract_bearer_token to find and collect bearer tokens.
- Store the token in context.auth_token.
Finalize:
- Wait briefly before maximizing the browser window.
Practical Applications
Automated Testing: In testing environments, you might need to extract tokens to authenticate API requests.
Monitoring and Debugging: Capturing network logs can help in debugging authentication issues.
Performance Analysis: Analyzing network traffic for performance metrics.
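For the automated-testing case, a captured token is typically sent back in the same Authorization: Bearer form. A sketch using only the standard library; the endpoint URL and the helper name are placeholders, and the request is built but not sent.

```python
import urllib.request


def build_authorized_request(url, token):
    """Build (but do not send) a GET request carrying the bearer token."""
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
        method="GET",
    )


req = build_authorized_request("https://api.example.com/v1/me", "abc123")
print(req.get_header("Authorization"))  # Bearer abc123
# urllib.request.urlopen(req) would actually send it.
```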
Ethical and Legal Considerations
Permission Required: Always ensure you have explicit permission to access and extract authentication tokens from any application.
Data Privacy: Handle all sensitive data, including tokens, securely and responsibly.
Compliance: Be aware of and comply with all relevant laws, regulations, and terms of service.
Additional Tips
Exception Handling: Enhance the code with more robust error handling to manage unexpected scenarios.
Logging: Implement logging to record successes and failures during token extraction.
Modularity: Consider encapsulating the token extraction logic into a class or module for reusability.
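The logging and modularity tips can be combined in a small class that owns the collection step, takes the extraction function as a parameter, and reports through the standard logging module without ever writing a token value. A sketch with hypothetical names; in the script, the article's extract_bearer_token would be passed in, while the demo uses a stand-in extractor.

```python
import logging

logger = logging.getLogger("token_collector")


class TokenCollector:
    """Gather unique tokens from network events using a pluggable extractor."""

    def __init__(self, extract):
        # extract: function(event) -> token or None, e.g. extract_bearer_token
        self._extract = extract
        self.tokens = set()

    def feed(self, events):
        """Process events (any iterable, including a generator); return new-token count."""
        seen = new = 0
        for event in events:
            seen += 1
            token = self._extract(event)
            if token and token not in self.tokens:
                self.tokens.add(token)
                new += 1
        # Log counts only; token values are never written to the log
        logger.info("Processed %d event(s), found %d new token(s)", seen, new)
        if seen and not self.tokens:
            logger.warning("No bearer token found yet")
        return new


logging.basicConfig(level=logging.INFO)
collector = TokenCollector(lambda e: e.get("token"))  # stand-in extractor for the demo
print(collector.feed([{"token": "t1"}, {}, {"token": "t1"}, {"token": "t2"}]))  # 2
```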
Conclusion
This code automates a browser session using Selenium, captures network logs, and extracts an authentication token from the network requests. Understanding each part of the code helps in customizing it for specific use cases while adhering to ethical and legal standards.
Written by Sai Prasanna Maharana