Optimizing Automation Scripts: How We Reduced Execution Time from 1 Hour to 30 Seconds, a 140x Improvement


TL;DR
We optimized our Pub/Sub automation script, reducing execution time from over 1 hour to less than 30 seconds. Here’s a quick overview of the improvements:
- Initial Python Script: Sequential processing; took over 1 hour.
- Improved Python Script: Asynchronous processing; reduced time to 5 minutes.
- Node.js Script: Further optimized with non-blocking I/O; reduced time to less than 30 seconds.
From Over 1 Hour to Less Than 30 Seconds: Optimizing Pub/Sub Automation
In our quest to enhance performance, we transformed our Pub/Sub automation script from a slow, sequential process to a lightning-fast, asynchronous one. Here’s how we did it.
Initial Python Script
Our initial script was simple but slow, taking over 1 hour to create Pub/Sub topics and subscriptions sequentially.
from google.cloud import pubsub_v1
import mysql.connector

def create_topic(project_id, topic_name):
    publisher = pubsub_v1.PublisherClient()
    publisher.create_topic(request={"name": publisher.topic_path(project_id, topic_name)})

def create_subscription(project_id, topic_name, subscription_name, filter_expression):
    subscriber = pubsub_v1.SubscriberClient()
    subscriber.create_subscription(request={
        "name": subscriber.subscription_path(project_id, subscription_name),
        "topic": subscriber.topic_path(project_id, topic_name),
        "filter": filter_expression,
    })

def fetch_data():
    conn = mysql.connector.connect(host="", port="3306", user="", password="", database="ecms")
    cursor = conn.cursor()
    cursor.execute("SELECT name FROM your_table")
    results = cursor.fetchall()
    conn.close()
    return results

def create_pubsub_resources(project_id, name):
    topic_name = f"topic-{name}"
    subscription_name = f"subscription-{name}"
    filter_expression = f'attributes.name = "{name}"'
    create_topic(project_id, topic_name)
    create_subscription(project_id, topic_name, subscription_name, filter_expression)

if __name__ == "__main__":
    project_id = "your-project-id"
    data = fetch_data()
    for (name,) in data:
        create_pubsub_resources(project_id, name)
Key Details:
- Sequential Processing: Each Pub/Sub resource is created one after the other.
- Execution Time: Over 1 hour due to the sequential nature and blocking I/O operations.
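To see where the hour goes, a rough back-of-the-envelope calculation helps. The per-call latency and row count below are assumptions for illustration, not figures from our run:

```python
# Hypothetical numbers: neither the per-RPC latency nor the row count
# is stated in the post; these are chosen to show how the time adds up.
rpc_latency_s = 1.0   # assumed round-trip time per Pub/Sub admin call
rows = 1800           # assumed number of names fetched from MySQL
calls_per_row = 2     # one create_topic + one create_subscription

sequential_total_s = rows * calls_per_row * rpc_latency_s
print(sequential_total_s / 60)  # -> 60.0 (minutes)
```

The script spends almost all of its time waiting on network round trips, not computing, which is exactly the workload concurrency fixes.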
Improved Python Script
By introducing asynchronous processing, we reduced the execution time to 5 minutes.
import asyncio
from google.cloud import pubsub_v1
import mysql.connector

async def create_topic(publisher, project_id, topic_name):
    await asyncio.to_thread(publisher.create_topic, request={"name": publisher.topic_path(project_id, topic_name)})

async def create_subscription(subscriber, project_id, topic_name, subscription_name, filter_expression):
    await asyncio.to_thread(subscriber.create_subscription, request={
        "name": subscriber.subscription_path(project_id, subscription_name),
        "topic": subscriber.topic_path(project_id, topic_name),
        "filter": filter_expression,
    })

def fetch_data():
    conn = mysql.connector.connect(host="", port="3306", user="", password="", database="ecms")
    cursor = conn.cursor()
    cursor.execute("SELECT name FROM your_table")
    results = cursor.fetchall()
    conn.close()
    return results

async def create_pubsub_resources(project_id, name):
    publisher = pubsub_v1.PublisherClient()
    subscriber = pubsub_v1.SubscriberClient()
    topic_name = f"topic-{name}"
    subscription_name = f"subscription-{name}"
    filter_expression = f'attributes.name = "{name}"'
    await create_topic(publisher, project_id, topic_name)
    await create_subscription(subscriber, project_id, topic_name, subscription_name, filter_expression)

async def main():
    project_id = "your-project-id"
    data = fetch_data()
    tasks = [create_pubsub_resources(project_id, name) for (name,) in data]
    await asyncio.gather(*tasks)

if __name__ == "__main__":
    asyncio.run(main())
Key Details:
- Asynchronous Processing: Uses asyncio to run tasks concurrently.
- Execution Time: Reduced to 5 minutes by overlapping I/O operations.
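One caveat with this pattern: asyncio.gather launches every task at once, which can trip Pub/Sub admin-operation quotas on large inputs. A minimal sketch of bounding concurrency with asyncio.Semaphore follows; the create_one helper, the sleep stand-in for the real RPC, and the limit of 50 are all illustrative, not part of our actual script:

```python
import asyncio

async def create_one(sem, name):
    # The semaphore allows at most `limit` creates to be in flight at once.
    async with sem:
        await asyncio.sleep(0.01)  # stand-in for the real Pub/Sub admin call
        return f"topic-{name}"

async def run_all(names, limit=50):
    sem = asyncio.Semaphore(limit)  # illustrative cap on concurrent admin RPCs
    return await asyncio.gather(*(create_one(sem, n) for n in names))

results = asyncio.run(run_all([f"item-{i}" for i in range(200)]))
print(len(results))  # -> 200
```

asyncio.gather preserves input order, so results line up with the names you passed in even though the work completes out of order.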
Node.js Script
Finally, rewriting the script in Node.js reduced the execution time to less than 30 seconds.
const {PubSub} = require('@google-cloud/pubsub');
const mysql = require('mysql2');
const {promisify} = require('util');

async function createTopic(pubSubClient, topicName) {
  const topic = pubSubClient.topic(topicName);
  await topic.create();
}

async function createSubscription(pubSubClient, topicName, subscriptionName, filterExpression) {
  const topic = pubSubClient.topic(topicName);
  const subscription = topic.subscription(subscriptionName);
  await subscription.create({filter: filterExpression});
}

async function fetchData() {
  const connection = mysql.createConnection({host: "", port: 3306, user: "", password: "", database: "ecms"});
  const query = "SELECT name FROM your_table";
  const rows = await promisify(connection.query).bind(connection)(query);
  connection.end();
  return rows;
}

async function createPubSubResources(pubSubClient, name) {
  const topicName = `topic-${name}`;
  const subscriptionName = `subscription-${name}`;
  const filterExpression = `attributes.name = "${name}"`;
  await createTopic(pubSubClient, topicName);
  await createSubscription(pubSubClient, topicName, subscriptionName, filterExpression);
}

async function main() {
  const projectId = "your-project-id";
  const pubSubClient = new PubSub({projectId});
  const data = await fetchData();
  const tasks = data.map(({name}) => createPubSubResources(pubSubClient, name));
  await Promise.all(tasks);
}

main().catch(console.error);
Key Details:
- Non-Blocking I/O: Node.js handles I/O operations asynchronously, making it highly efficient for this task.
- Execution Time: Reduced to less than 30 seconds due to the efficient handling of concurrent operations.
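The speedup here comes from overlapping waits rather than from any single language feature. A small Python simulation (timings and task counts chosen purely for illustration) makes the effect measurable: twenty simulated 50 ms round trips take about a second sequentially but finish in roughly one round-trip time when overlapped:

```python
import asyncio
import time

async def fake_rpc():
    await asyncio.sleep(0.05)  # stand-in for one network round trip

async def sequential(n):
    # Each wait starts only after the previous one finishes.
    for _ in range(n):
        await fake_rpc()

async def concurrent(n):
    # All waits run overlapped; total time is roughly one round trip.
    await asyncio.gather(*(fake_rpc() for _ in range(n)))

start = time.perf_counter()
asyncio.run(sequential(20))
seq_s = time.perf_counter() - start

start = time.perf_counter()
asyncio.run(concurrent(20))
conc_s = time.perf_counter() - start

print(f"sequential: {seq_s:.2f}s, concurrent: {conc_s:.2f}s")
```

The same principle applies whether the overlapping is done with Python's asyncio or Node.js's event loop.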
Conclusion
By leveraging asynchronous programming and Node.js, we achieved a significant performance boost, reducing the execution time from over 1 hour to less than 30 seconds. This journey underscores the importance of optimizing code for better efficiency and scalability.
Written by Suhail Akhtar