Optimizing Automation Scripts: How We Reduced Execution Time from 1 Hour to 30 Seconds, a 120x Improvement

Suhail Akhtar

TL;DR

We optimized our Pub/Sub automation script, reducing execution time from over 1 hour to less than 30 seconds using Node.js. Here’s a quick overview of the improvements:

  1. Initial Python Script: Sequential processing, took over 1 hour.

  2. Improved Python Script: Asynchronous processing, reduced time to 5 minutes.

  3. Node.js Script: Further optimized, reduced time to less than 30 seconds.

From Over 1 Hour to Less Than 30 Seconds: Optimizing Pub/Sub Automation

We turned our Pub/Sub automation script from a sequential process that blocked on every API call into one that issues its requests concurrently. Here’s how we did it.

Initial Python Script

Our initial script was simple but slow, taking over 1 hour to create Pub/Sub topics and subscriptions sequentially.

from google.cloud import pubsub_v1
import mysql.connector

def create_topic(project_id, topic_name):
    publisher = pubsub_v1.PublisherClient()
    publisher.create_topic(request={"name": publisher.topic_path(project_id, topic_name)})

def create_subscription(project_id, topic_name, subscription_name, filter_expression):
    subscriber = pubsub_v1.SubscriberClient()
    subscriber.create_subscription(request={
        "name": subscriber.subscription_path(project_id, subscription_name),
        "topic": subscriber.topic_path(project_id, topic_name),
        "filter": filter_expression,
    })

def fetch_data():
    conn = mysql.connector.connect(host="", port=3306, user="", password="", database="ecms")
    cursor = conn.cursor()
    cursor.execute("SELECT name FROM your_table")
    results = cursor.fetchall()
    conn.close()
    return results

def create_pubsub_resources(project_id, name):
    topic_name = f"topic-{name}"
    subscription_name = f"subscription-{name}"
    filter_expression = f'attributes.name = "{name}"'
    create_topic(project_id, topic_name)
    create_subscription(project_id, topic_name, subscription_name, filter_expression)

if __name__ == "__main__":
    project_id = "your-project-id"
    data = fetch_data()
    # Each iteration blocks on two API round trips before the next name starts.
    for (name,) in data:
        create_pubsub_resources(project_id, name)

Key Details:

  • Sequential Processing: Each Pub/Sub resource is created one after the other.

  • Execution Time: Over 1 hour due to the sequential nature and blocking I/O operations.
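To see why sequential calls add up, a rough cost model helps. The sketch below is only an illustration: the resource count and per-call latency are hypothetical inputs chosen to land on one hour, not measurements from our run, and `sequential_seconds` is a made-up helper name.

```python
def sequential_seconds(resource_count: int,
                       seconds_per_call: float,
                       calls_per_resource: int = 2) -> float:
    """Wall-clock time when every API call waits for the previous one to finish."""
    return resource_count * calls_per_resource * seconds_per_call


# Hypothetical inputs (assumptions, not measured values):
# 1,800 names, one topic call plus one subscription call each, 1 second per call.
print(sequential_seconds(1800, 1.0))  # 3600.0 seconds, i.e. one hour
```

In this model the total grows linearly with the number of names, which is why overlapping the calls pays off far more than shaving time from any single call.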

Improved Python Script

By introducing asynchronous processing, we reduced the execution time to 5 minutes.

import asyncio
from google.cloud import pubsub_v1
import mysql.connector

async def create_topic(publisher, project_id, topic_name):
    # The client call is blocking, so run it in a worker thread to keep the event loop free.
    await asyncio.to_thread(publisher.create_topic, request={"name": publisher.topic_path(project_id, topic_name)})

async def create_subscription(subscriber, project_id, topic_name, subscription_name, filter_expression):
    await asyncio.to_thread(subscriber.create_subscription, request={
        "name": subscriber.subscription_path(project_id, subscription_name),
        "topic": subscriber.topic_path(project_id, topic_name),
        "filter": filter_expression,
    })

def fetch_data():
    conn = mysql.connector.connect(host="", port=3306, user="", password="", database="ecms")
    cursor = conn.cursor()
    cursor.execute("SELECT name FROM your_table")
    results = cursor.fetchall()
    conn.close()
    return results

async def create_pubsub_resources(project_id, name):
    publisher = pubsub_v1.PublisherClient()
    subscriber = pubsub_v1.SubscriberClient()
    topic_name = f"topic-{name}"
    subscription_name = f"subscription-{name}"
    filter_expression = f'attributes.name = "{name}"'
    await create_topic(publisher, project_id, topic_name)
    await create_subscription(subscriber, project_id, topic_name, subscription_name, filter_expression)

async def main():
    project_id = "your-project-id"
    data = fetch_data()
    # Build one coroutine per name and let gather run them concurrently.
    tasks = [create_pubsub_resources(project_id, name) for (name,) in data]
    await asyncio.gather(*tasks)

if __name__ == "__main__":
    asyncio.run(main())

Key Details:

  • Asynchronous Processing: Uses asyncio with asyncio.to_thread, which runs each blocking client call in a worker thread so that many calls are in flight at once.

  • Execution Time: Reduced to 5 minutes by overlapping I/O operations.
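The overlap can be tried out without touching Pub/Sub at all. The sketch below is a minimal illustration rather than the script above: `fake_create_resource` is a hypothetical stand-in that sleeps instead of calling the API, and the 0.05-second latency and the `limit` of 8 are arbitrary assumptions. It also shows two options the script above does not use: a semaphore that caps the number of calls in flight, and `return_exceptions=True` so that one failed name does not hide the results for the others.

```python
import asyncio
import time


def fake_create_resource(name: str, latency: float = 0.05) -> str:
    """Hypothetical stand-in for a blocking create call; sleeps instead of calling the API."""
    if not name:
        raise ValueError("empty resource name")
    time.sleep(latency)
    return f"topic-{name}"


async def create_all(names, limit: int = 8):
    """Run the blocking call for every name, with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def worker(name):
        async with semaphore:
            # to_thread moves the blocking call off the event loop (Python 3.9+).
            return await asyncio.to_thread(fake_create_resource, name)

    # Results come back in input order; failures are returned as exception objects.
    return await asyncio.gather(*(worker(n) for n in names), return_exceptions=True)


def run_demo():
    names = [f"svc{i}" for i in range(16)] + [""]  # the empty name fails on purpose
    start = time.perf_counter()
    results = asyncio.run(create_all(names))
    elapsed = time.perf_counter() - start
    created = [r for r in results if isinstance(r, str)]
    failed = [r for r in results if isinstance(r, Exception)]
    return created, failed, elapsed


if __name__ == "__main__":
    created, failed, elapsed = run_demo()
    print(f"created={len(created)} failed={len(failed)} elapsed={elapsed:.2f}s")
```

Run one after another, the 16 successful calls would take about 0.8 seconds of sleep time; with the calls overlapped, the elapsed time should be noticeably lower, though the exact figure depends on the machine and the size of the default thread pool.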

Node.js Script

Finally, rewriting the script in Node.js reduced the execution time to less than 30 seconds.

const {PubSub} = require('@google-cloud/pubsub');
const mysql = require('mysql2');
const {promisify} = require('util');

async function createTopic(pubSubClient, topicName) {
    const topic = pubSubClient.topic(topicName);
    await topic.create();
}

async function createSubscription(pubSubClient, topicName, subscriptionName, filterExpression) {
    const topic = pubSubClient.topic(topicName);
    const subscription = topic.subscription(subscriptionName);
    await subscription.create({filter: filterExpression});
}

async function fetchData() {
    const connection = mysql.createConnection({host: "", port: 3306, user: "", password: "", database: "ecms"});
    const query = "SELECT name FROM your_table";
    const rows = await promisify(connection.query).bind(connection)(query);
    connection.end();
    return rows;
}

async function createPubSubResources(pubSubClient, name) {
    const topicName = `topic-${name}`;
    const subscriptionName = `subscription-${name}`;
    const filterExpression = `attributes.name = "${name}"`;
    await createTopic(pubSubClient, topicName);
    await createSubscription(pubSubClient, topicName, subscriptionName, filterExpression);
}

async function main() {
    const projectId = "your-project-id";
    const pubSubClient = new PubSub({projectId});
    const data = await fetchData();
    // Start the work for every name right away, then wait for all of it to finish.
    const tasks = data.map(({name}) =>
        createPubSubResources(pubSubClient, name)
    );
    await Promise.all(tasks);
}

main().catch(console.error);

Key Details:

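  • Non-Blocking I/O: Node.js issues the API calls asynchronously from a single event loop, and the script shares one PubSub client across all tasks instead of creating clients per resource, making it highly efficient for this task.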

  • Execution Time: Reduced to less than 30 seconds due to the efficient handling of concurrent operations.

Conclusion

By moving from sequential calls to asynchronous processing and then to Node.js, we reduced the execution time from over 1 hour to less than 30 seconds. Both steps rest on the same idea: keep many I/O requests in flight instead of waiting on each one in turn.
