🔄 Mastering Multithreading, Multiprocessing & Data Science Essentials! 🚀

Manav Rastogi

Welcome, tech enthusiasts! Today we dive into multithreading, multiprocessing, the data science process, and NumPy, from the basics to more advanced concepts. Let's get started! 💪


✨ Multithreading in Python

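Multithreading runs multiple threads concurrently within a single process. In CPython, the global interpreter lock (GIL) lets only one thread execute Python bytecode at a time, but a thread releases it while it waits on I/O. That makes threads a good fit for I/O-bound tasks like:

  • Reading & writing files 📂

  • Network communication 🛡

  • Database queries 📂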

Example: Downloading Files Using Multithreading

import time
import threading
import urllib.request

start = time.perf_counter()

# Note: these are GitHub "blob" page URLs, so each saved file contains the
# HTML page for the file rather than its raw text.
url_list = [
    "https://github.com/itsfoss/text-script-files/blob/master/agatha.txt",
    "https://github.com/itsfoss/text-script-files/blob/master/agatha_complete.txt",
    "https://github.com/itsfoss/text-script-files/blob/master/sample_log_file.txt"
]

# Local filenames, one per URL
data_list = ["List1.txt", "List2.txt", "List3.txt"]

def file_download(url, filename):
    """Download url and save it to filename."""
    urllib.request.urlretrieve(url, filename)
    print(f"Downloaded {filename}")

# Start one thread per download so the network waits overlap
threads = []
for url, filename in zip(url_list, data_list):
    t = threading.Thread(target=file_download, args=(url, filename))
    t.start()
    threads.append(t)

# Wait for every download to finish before stopping the timer
for thread in threads:
    thread.join()

end = time.perf_counter()
print(f"The program finished in {end - start:.2f} seconds")

Why Use Multithreading?

✅ Efficient for tasks that spend time waiting (I/O operations).

✅ Improves speed by allowing other threads to execute during wait time.
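
To make the waiting argument concrete, here is a minimal sketch that uses `concurrent.futures.ThreadPoolExecutor` from the standard library instead of managing `Thread` objects by hand. The `fake_io` function and its 0.2-second sleep are stand-ins (assumptions, not part of the download example) for a real network or disk wait.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_io(task_id):
    """Hypothetical stand-in for an I/O call: wait, then return a result."""
    time.sleep(0.2)  # simulates waiting on a network or disk
    return task_id * 2

start = time.perf_counter()

# Five tasks share five worker threads, so their waits overlap
with ThreadPoolExecutor(max_workers=5) as executor:
    # map() returns results in the same order as the inputs
    results = list(executor.map(fake_io, range(5)))

elapsed = time.perf_counter() - start
print(results)
print(f"Finished in {elapsed:.2f} seconds")
```

Run one after another, the five calls would wait for about one second in total. With overlapping waits, the run usually takes close to the 0.2 seconds of a single call.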


💪 Multiprocessing in Python

Multiprocessing runs work in separate processes, each with its own Python interpreter, so it can use multiple CPU cores at the same time. That makes it a good fit for CPU-bound tasks like:

  • Data processing 📊

  • Image rendering 🖼

  • Parallel computing 🚀

Example: Running Tasks in Parallel

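import multiprocessing
import time

def test_func():
    print("Do something")
    print("Sleeping for 1 sec")
    time.sleep(1)
    print("Done with sleeping")

# The guard is required on platforms that spawn new processes (Windows, macOS):
# each child re-imports this module and must not start processes of its own.
if __name__ == "__main__":
    start = time.perf_counter()

    # Start 10 processes that each run test_func
    processes = []
    for _ in range(10):
        p = multiprocessing.Process(target=test_func)
        p.start()
        processes.append(p)

    # Wait for all processes to finish before stopping the timer
    for process in processes:
        process.join()

    end = time.perf_counter()
    print(f"The program finished in {end - start:.2f} seconds")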

Why Use Multiprocessing?

✅ Utilizes multiple CPU cores.

✅ Ideal for tasks requiring heavy computations.
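
For CPU-bound work, a pool of worker processes is often more convenient than creating `Process` objects one at a time. The sketch below uses `multiprocessing.Pool` to count primes for several limits in parallel. The names `count_primes` and `main`, and the chosen limits, are illustrative assumptions rather than part of the example above.

```python
import multiprocessing

def count_primes(limit):
    """Count the primes below limit using trial division (deliberately CPU-heavy)."""
    count = 0
    for n in range(2, limit):
        # n is prime if no d in 2..sqrt(n) divides it
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

def main():
    limits = [10_000, 20_000, 30_000]  # hypothetical workloads
    # The pool distributes the limits across three worker processes
    with multiprocessing.Pool(processes=3) as pool:
        counts = pool.map(count_primes, limits)
    for limit, count in zip(limits, counts):
        print(f"Primes below {limit}: {count}")

# Required so that child processes can import this module safely
if __name__ == "__main__":
    main()
```

Because the work here is pure computation, threads would not speed it up in CPython, while separate processes can run on separate cores.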


🎯 Data Science Toolkit

A powerful set of Python libraries used in data science:

  • NumPy – Numerical computations.

  • Pandas – Data manipulation and analysis.

  • Matplotlib – Data visualization.

  • Seaborn – Statistical data visualization.

  • Plotly – Interactive plotting.

  • Bokeh – Web-based interactive visualization.


🌟 Data Science Process: CRISP-DM Framework

CRISP-DM (Cross-Industry Standard Process for Data Mining) is a structured approach to solving data science problems:

1. Business Understanding

  • Define the problem. Example: Predicting house prices.

  • Key factors: Area, Number of rooms, Location.

2. Data Understanding

  • Understand how variables impact the target variable.

  • Example: More rooms & larger area = Higher price.

3. Data Preparation

  • Cleaning, transforming, and selecting relevant data.

4. Modeling

  • Training machine learning models.

5. Evaluation

  • Assess model performance.

6. Deployment

  • Deploying the model into production.
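
The six phases above can be sketched end to end for the house-price example. This is a minimal illustration using only NumPy: the data is synthetic, the price formula and the unit (square metres) are assumptions made up for the demo, and ordinary least squares stands in for whatever model a real project would choose.

```python
import numpy as np

# Business understanding: predict house price from area and number of rooms.
# All numbers below are synthetic and chosen only for illustration.
rng = np.random.default_rng(0)
n_houses = 100
area = rng.uniform(50, 200, size=n_houses)   # square metres (assumed unit)
rooms = rng.integers(1, 6, size=n_houses)    # 1 to 5 rooms
noise = rng.normal(0, 100, size=n_houses)
price = 50 * area + 10_000 * rooms + 20_000 + noise

# Data understanding: check how each feature relates to the target
print("corr(area, price): ", np.corrcoef(area, price)[0, 1])
print("corr(rooms, price):", np.corrcoef(rooms, price)[0, 1])

# Data preparation: build a feature matrix with an intercept column, then split
X = np.column_stack([area, rooms, np.ones(n_houses)])
X_train, X_test = X[:80], X[80:]
y_train, y_test = price[:80], price[80:]

# Modeling: ordinary least squares fit
coef, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

# Evaluation: root mean squared error on the held-out houses
predictions = X_test @ coef
rmse = np.sqrt(np.mean((predictions - y_test) ** 2))
print("coefficients:", coef)
print("test RMSE:", rmse)

# Deployment (sketch): wrap the fitted model in a function that callers can use
def predict_price(area_m2, n_rooms):
    return float(np.array([area_m2, n_rooms, 1.0]) @ coef)

print(predict_price(120, 3))
```

In practice each phase is far larger than a few lines, and the process is iterative: a poor evaluation result typically sends you back to data preparation or even to the problem definition.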

🤖 Deep Dive into NumPy (Basic to Advanced)

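✨ What is NumPy?

NumPy (Numerical Python) provides the ndarray, an N-dimensional array type, along with fast vectorized operations on large arrays that run in compiled code instead of Python loops.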
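
A quick way to see where the speed comes from is to compute the same result with a Python loop and with a single vectorized call. The sketch below is only an illustration: the array size is an arbitrary choice, and the timings depend on your machine, so no expected numbers are shown.

```python
import time
import numpy as np

values = np.arange(200_000, dtype=np.int64)

# Pure-Python loop: one interpreter step per element
start = time.perf_counter()
loop_total = 0
for v in values:
    loop_total += v * v
loop_time = time.perf_counter() - start

# Vectorized: the multiply-and-sum runs inside NumPy's compiled code
start = time.perf_counter()
vec_total = np.dot(values, values)
vec_time = time.perf_counter() - start

print(f"Loop:       {loop_time:.4f} s")
print(f"Vectorized: {vec_time:.4f} s")
print("Same result:", loop_total == vec_total)
```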

🌟 Basic NumPy Operations

import numpy as np

# Creating arrays
arr = np.array([1, 2, 3, 4, 5])
print(arr)

# Shape & Size
print(arr.shape)  # Output: (5,)
print(arr.size)   # Output: 5

🌟 NumPy Array Operations

# Arithmetic Operations
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

print(arr1 + arr2)  # Output: [5 7 9]
print(arr1 * arr2)  # Output: [ 4 10 18]

🌟 NumPy Indexing & Slicing

matrix = np.array([[1, 2, 3], [4, 5, 6]])
print(matrix[0, 1])  # Element at row 0, column 1 (Output: 2)
print(matrix[:, 1])  # Entire second column (Output: [2 5])
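
Two related behaviors are worth knowing alongside the indexing above: boolean masks select elements by condition, and a basic slice is a view onto the original data rather than a copy. A short sketch, with variable names chosen only for illustration:

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])

# Boolean mask: keep only the elements greater than 2
mask = matrix > 2
selected = matrix[mask]
print(selected)  # [3 4 5 6]

# A basic slice is a view: writing through it changes the original array
first_row = matrix[0, :]
first_row[0] = 100
print(matrix[0, 0])  # 100

# Use .copy() when an independent array is needed
safe_row = matrix[1, :].copy()
safe_row[0] = -1
print(matrix[1, 0])  # 4
```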

🌟 Advanced NumPy - Broadcasting

arr = np.array([1, 2, 3])
matrix = np.array([[1, 2, 3], [4, 5, 6]])
print(arr + matrix)  # Broadcasting: adds arr to each row
# Output:
# [[2 4 6]
#  [5 7 9]]
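
Broadcasting compares shapes from the trailing dimension backwards: two dimensions are compatible when they are equal or when one of them is 1. The sketch below shows a row vector, a column vector, and a shape that cannot be broadcast; the arrays are made up for illustration.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
row = np.array([10, 20, 30])                # shape (3,)
col = np.array([[100], [200]])              # shape (2, 1)

# (2, 3) + (3,): row is treated as shape (1, 3) and repeated down the rows
print(matrix + row)
# [[11 22 33]
#  [14 25 36]]

# (2, 3) + (2, 1): col is repeated across the columns
print(matrix + col)
# [[101 102 103]
#  [204 205 206]]

# Shapes (2, 3) and (2,) are incompatible: trailing dimensions 3 and 2 differ
try:
    matrix + np.array([1, 2])
except ValueError as err:
    print("Broadcast failed:", err)
```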

🌟 NumPy Random Module

# Generating Random Numbers
rand_arr = np.random.rand(3, 3)  # 3x3 array of uniform random floats in [0, 1)
print(rand_arr)
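
For reproducible results, NumPy also offers the `Generator` API, created with `np.random.default_rng`. The sketch below shows that two generators with the same seed produce identical samples; the seed value and the distribution parameters are arbitrary choices for the demo.

```python
import numpy as np

# Two generators created with the same seed produce the same numbers
rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)

sample_a = rng_a.random((3, 3))   # uniform floats in [0, 1)
sample_b = rng_b.random((3, 3))
print(np.array_equal(sample_a, sample_b))  # True

# Other common draws
dice = rng_a.integers(1, 7, size=5)                 # integers from 1 to 6
heights = rng_a.normal(loc=170, scale=10, size=4)   # normal distribution
print(dice)
print(heights)
```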

🌟 NumPy Aggregations

arr = np.array([1, 2, 3, 4, 5])
print(np.mean(arr))  # Average value (Output: 3.0)
print(np.sum(arr))   # Sum of elements (Output: 15)
print(np.max(arr))   # Maximum value (Output: 5)
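
On multi-dimensional arrays, the `axis` argument controls which dimension an aggregation collapses. A short sketch on a small example matrix:

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])

print(np.sum(matrix))             # 21, all elements
print(np.sum(matrix, axis=0))     # [5 7 9], one sum per column
print(np.sum(matrix, axis=1))     # [ 6 15], one sum per row
print(np.mean(matrix, axis=0))    # [2.5 3.5 4.5], column means
print(np.argmax(matrix, axis=1))  # [2 2], index of the largest value in each row
```

A handy way to remember it: `axis=0` collapses the rows (leaving one value per column), and `axis=1` collapses the columns (leaving one value per row).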

🚀 Conclusion

Mastering multithreading, multiprocessing, and NumPy will boost both the performance of your Python programs and your data science skills. Keep experimenting and leveling up your expertise! 🎉

Happy Coding! 🚀💻
