The Basics of Async and Parallel Programming


Introduction
Asynchronous programming is an essential part of modern software development, regardless of the programming language you use. It's crucial for optimizing hardware usage and improving application performance. However, in my day-to-day work, I've noticed that the terms async, parallel, and concurrency are often used interchangeably, leading to confusion. This article aims to address this issue for three main reasons:
To explore the history and establish clear distinctions between these various terms.
To incorporate relevant academic concepts from Computer Science, helping to connect the dots.
To create a foundational article that will serve as a basis for future discussions on multithreaded programming.
In this article, we'll examine the buzzwords async, parallel, and concurrency, clarifying their meanings and how they should be used. We'll also compare how these concepts are implemented across different programming languages, providing a comprehensive overview of these critical programming paradigms.
Moore’s Law and a Little Bit of History
Moore's Law, an empirical observation rather than a physical law, states that the number of transistors in an integrated circuit (IC) doubles approximately every two years. This principle has largely held true since its inception in 1965.
For many years, this increase in transistor count directly translated to higher clock frequencies in single processors, as illustrated in the figure below.
[Figure: microprocessor trend data showing transistor counts, clock frequencies, and core counts over time. Source: https://github.com/karlrupp/microprocessor-trend-data]
However, as highlighted in the figure, a significant shift occurred around 2008. Chip manufacturers stopped increasing clock speeds, despite the continued growth in transistor count. This change was driven by the realization that higher clock frequencies led to increased heat generation and power consumption, making it challenging to use these chips in smaller computers like laptops. The need for larger cooling systems and reduced battery life made it difficult to manufacture compact, portable devices.
To address these issues while still adhering to Moore's Law, chip makers began increasing the number of logical cores instead of clock speeds. This shift marked a turning point in processor design, focusing on multi-core architectures to boost computational power rather than relying solely on faster single-core processors.
This transition to multi-core processors has had profound implications for software development, necessitating new approaches to take full advantage of the available computational resources.
Concurrency
Concurrency is a broad term often used interchangeably with multitasking or delegation. Let's explore this concept using an everyday example.
Imagine a morning schedule with the following tasks:
Wash dishes (30 minutes)
Do laundry (30 minutes)
Cook lunch (60 minutes)
Shower and prayer (30 minutes)
Completing these tasks sequentially would take 30 + 30 + 60 + 30 = 150 minutes or 2 hours and 30 minutes.
Async / Multitasking
In our example, we can observe opportunities for multitasking:
Dishwashing: 15 minutes to load/unload, 15 minutes of independent machine operation
Laundry: 15 minutes to load/unload, 15 minutes of independent machine operation
Cooking lunch: Requires full engagement
Showering and prayer: Requires full engagement
A more efficient approach would be:
Load the dishwasher
While dishes are washing, load the laundry
Start cooking lunch
Respond to signals from the dishwasher and washing machine to unload when ready
Resume cooking
After cooking, shower and pray
This multitasking approach saves about 30 minutes from the original schedule.
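To make this concrete, here is a minimal sketch of the routine in Go (the durations are the ones from the example, scaled down to milliseconds, and the function and task names are made up for illustration): the dishwasher and washing machine run in the background and signal on channels when they finish, while the main goroutine keeps cooking and responds to whichever signal arrives.

```go
package main

import (
	"fmt"
	"time"
)

// runMachine simulates an appliance that works on its own and
// signals on the returned channel when it has finished.
func runMachine(name string, minutes int) <-chan string {
	done := make(chan string, 1)
	go func() {
		// One simulated minute is one millisecond here.
		time.Sleep(time.Duration(minutes) * time.Millisecond)
		done <- name
	}()
	return done
}

func main() {
	// Load both machines, then let them run independently.
	dishes := runMachine("dishwasher", 15)
	laundry := runMachine("washing machine", 15)

	// Cook in small steps so we can respond to the machines' signals in between.
	for minute := 0; minute < 60; minute++ {
		time.Sleep(1 * time.Millisecond) // one "minute" of active cooking
		select {
		case name := <-dishes:
			fmt.Println(name, "signalled; unloading, then back to cooking")
		case name := <-laundry:
			fmt.Println(name, "signalled; unloading, then back to cooking")
		default:
			// no signal yet, keep cooking
		}
	}
	fmt.Println("cooking done; time for shower and prayer")
}
```

The important part is the select: cooking proceeds step by step, and a pending signal is handled between steps rather than being waited on, which is what buys back the 30 minutes.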
Multiprocessing / Parallelism
To save even more time, we could outsource lunch preparation to a nearby eatery that delivers home-cooked meals. This parallel processing reduces the total time to 1 hour and 15 minutes—a 50% reduction from the original schedule.
Key Points:
The 50% time reduction was achieved through both multitasking (async) and multi-processing (parallelism).
The entire optimized morning routine is an example of concurrent operations.
Both async and parallelism are subsets of concurrency.
Correlating the Above Example with Programming
Asynchronous Operations:
- IO operations (e.g., reading from disk or socket) are analogous to doing dishes and laundry. When the processor initiates a disk read, it delegates to a disk controller and can perform other tasks while waiting. Once the operation completes, an interrupt (similar to a machine's signal) notifies the processor that the data is ready.
Parallel Processing:
- CPU-intensive calculations benefit from additional processors, much like outsourcing lunch preparation. For example, a desktop app might perform intensive calculations on a background thread using a separate processor, while the main thread keeps the app responsive on another processor.
This real-world analogy helps illustrate how async operations, parallel processing, and concurrency work together in modern computing to optimize resource usage and improve overall efficiency.
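As a rough sketch of both ideas in Go (the file name and the calculation are hypothetical stand-ins): the disk read is delegated and awaited via a channel, much like the dishwasher's signal, while a CPU-heavy calculation runs on another goroutine that the runtime can place on a separate core.

```go
package main

import (
	"fmt"
	"os"
)

// sumOfSquares stands in for a CPU-intensive calculation.
func sumOfSquares(n int64) int64 {
	var total int64
	for i := int64(1); i <= n; i++ {
		total += i * i
	}
	return total
}

func main() {
	// Asynchronous I/O: hand the disk read to another goroutine and carry on.
	// The OS (disk controller plus interrupt) does the actual waiting.
	fileDone := make(chan int, 1)
	go func() {
		data, err := os.ReadFile("report.csv") // hypothetical file name
		if err != nil {
			fileDone <- 0
			return
		}
		fileDone <- len(data)
	}()

	// Parallel processing: a CPU-bound job on another goroutine, which the
	// runtime can schedule on a different core.
	calcDone := make(chan int64, 1)
	go func() {
		calcDone <- sumOfSquares(2_000_000)
	}()

	// The main goroutine stays responsive, then collects both results.
	fmt.Println("doing other work while the read and the calculation are in flight...")
	fmt.Printf("read %d bytes, calculation result %d\n", <-fileDone, <-calcDone)
}
```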
Limitations of Concurrency
We've observed that concurrency encompasses both multitasking and parallelism, achieved through:
Multitasking on a single processor
Parallel computation on multiple processors
While this may seem straightforward at first glance, it's actually quite complex. Let's delve deeper into our morning chores example to illustrate some key challenges:
Synchronization Requirements:
Even in our simplified scenario, there are frequent occasions that require synchronization:
The dishwasher and washing machine need attention when they signal completion.
Lunch delivery requires a response when the doorbell rings.
These synchronization points highlight the need for careful coordination in concurrent systems.
Limits of Parallelization:
There's a limit to how much parallelization can optimize a process. For instance, if we added two more helpers to our morning routine:
Helper 1: Load dishes, wait, unload dishes
Helper 2: Load laundry, wait, unload laundry
Eatery: Cook and deliver lunch
Self: Shower and pray
In this scenario, we would save only about 15 more minutes. The entire process can't be shortened below 1 hour, because cooking and delivering lunch remains the longest task. Notably, Helper 1 and Helper 2 spend a significant amount of time waiting idly by their respective machines.
Adding more helpers beyond this point would be inefficient, as they would have no tasks to perform and would sit idle.
Amdahl’s Law
Amdahl's Law focuses on the potential speedup of a program when part of it is improved or parallelized.
Speedup:
$$\frac{1}{(1 - P) + \frac{P}{N}}$$
Where:
P = Portion of the program that can be parallelized
N = Number of processors
(1 - P) = Portion that remains sequential
This law shows that the overall speedup is limited by the sequential part of the program. As N increases, the speedup approaches a maximum limit of 1 / (1 - P).
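A quick worked example makes that ceiling visible (the 90% parallel fraction is just an assumed value):

```go
package main

import "fmt"

// amdahlSpeedup returns the overall speedup for a program whose
// parallelizable fraction is p, run on n processors.
func amdahlSpeedup(p, n float64) float64 {
	return 1 / ((1 - p) + p/n)
}

func main() {
	p := 0.9 // assume 90% of the program can be parallelized
	for _, n := range []float64{1, 2, 4, 8, 16, 1000} {
		fmt.Printf("N=%4.0f  speedup=%.2f\n", n, amdahlSpeedup(p, n))
	}
	// With P = 0.9 the speedup can never exceed 1 / (1 - 0.9) = 10,
	// no matter how many processors we add.
}
```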
Gustafson’s Law
Gustafson's Law considers that as we get more computing resources, we tend to take on larger problems rather than just solving the same problem faster.
Scaled speedup:
$$N + (1 - N) \cdot s$$
Where:
N = Number of processors
s = Serial fraction of the program
This law suggests that the speedup can scale roughly linearly with the number of processors for many real-world problems.
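For comparison, the same style of calculation for Gustafson's Law (again assuming a 10% serial fraction) shows the scaled speedup growing almost linearly with N instead of flattening:

```go
package main

import "fmt"

// gustafsonSpeedup returns the scaled speedup for n processors when a
// fraction s of the work remains serial.
func gustafsonSpeedup(n, s float64) float64 {
	return n + (1-n)*s
}

func main() {
	s := 0.1 // assume 10% of the work stays serial
	for _, n := range []float64{1, 2, 4, 8, 16, 1000} {
		fmt.Printf("N=%4.0f  scaled speedup=%.1f\n", n, gustafsonSpeedup(n, s))
	}
	// Unlike Amdahl's ceiling of 10x, the scaled speedup keeps growing
	// (roughly 0.9*N + 0.1) as more processors take on a bigger problem.
}
```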
Imagine a financial reporting system that needs to gather data from various financial data providers, process the data, and then combine the results into a comprehensive report.
Amdahl's Law applies when parallelizing the data fetch from the independent sources (e.g., stock prices, exchange rates, economic indicators): spreading the fetches across processors speeds up that portion, but the overall speedup remains capped by the sequential work of combining the results into the final report.
Gustafson's Law would come into play as the program is able to handle more data sources and generate more comprehensive reports as the computing resources (processors) are increased. The program can then take on larger and more complex financial reporting tasks.
By understanding both Amdahl's Law and Gustafson's Law, the developers of the financial reporting system can optimize the program's performance and scalability, ensuring that it can efficiently handle increasing amounts of data and reporting needs.
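To sketch how this might look in Go (the provider names, the fetch function, and the report format are all hypothetical), the fan-out of fetches is the portion that benefits from more processors, while combining the results into the report stays sequential, which is exactly the part Amdahl's Law says will cap the speedup:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// fetchQuote is a hypothetical stand-in for calling a financial data provider.
func fetchQuote(provider string) string {
	time.Sleep(50 * time.Millisecond) // simulate network latency
	return "data from " + provider
}

func main() {
	providers := []string{"stock-prices", "exchange-rates", "economic-indicators"}

	// Parallelizable portion: fetch from each provider concurrently.
	results := make([]string, len(providers))
	var wg sync.WaitGroup
	for i, p := range providers {
		wg.Add(1)
		go func(i int, p string) {
			defer wg.Done()
			results[i] = fetchQuote(p)
		}(i, p)
	}
	wg.Wait()

	// Sequential portion: combine everything into one report.
	report := "Report:\n"
	for _, r := range results {
		report += "  " + r + "\n"
	}
	fmt.Print(report)
}
```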
Green Thread, Managed Thread, User Level Thread
Q : Why are green threads called green threads?
A: Because they are green 😛
Q: Isn’t the processor color blind? 🤔
A thread is the basic unit of execution that a processor can run. A computer can execute as many threads at the same instant as it has processor cores.
A processor takes a thread from the ready queue, processes it for a short duration, and then switches to the next thread. This rapid switching between threads from the ready queue creates the illusion of parallel execution, even on a single-processor computer.
Several events cause the scheduler to switch context and pick a new thread, including time-slice expiration, preemption, and priority-based scheduling. One major trigger is I/O: when a thread blocks waiting for an I/O operation, the scheduler parks it and picks another thread from the ready queue rather than letting the processor sit idle.
This approach works well for a modest number of threads, but as the thread count grows very high, the time spent on context switching can increase drastically.
To address this issue, modern programming runtimes like Golang and C# have introduced the concept of green threads or managed threads. When you create a thread in these programming languages, it is not an actual OS-level thread (also known as a kernel thread) that is created. Instead, it is a green thread or managed thread, which is a user-level thread.
The runtime creates the actual OS-level threads and then assigns the green/managed threads to these OS threads. In this way, a single OS thread can run multiple green/managed threads within it.
When a green/managed thread blocks on an I/O operation, the runtime moves the remaining green/managed threads to another OS thread (creating one if needed) so they are not stuck behind the blocked call. This approach of running many green/managed threads on a small number of OS threads reduces the overhead of context switching, which becomes increasingly important as the number of threads grows.
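As a small illustration of the idea in Go, where goroutines play the role of green threads (the counts and sleep duration are arbitrary): we can launch thousands of goroutines, far more than the machine has cores, and the runtime multiplexes them onto a small pool of OS threads.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"time"
)

func main() {
	fmt.Println("CPU cores:", runtime.NumCPU())
	fmt.Println("OS worker threads the runtime will use:", runtime.GOMAXPROCS(0))

	// Launch 10,000 goroutines (green threads). Creating this many OS threads
	// would be far more expensive; the Go runtime instead schedules them onto
	// its small pool of OS threads.
	var wg sync.WaitGroup
	for i := 0; i < 10_000; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			time.Sleep(10 * time.Millisecond) // simulate waiting on I/O
		}()
	}
	wg.Wait()
	fmt.Println("all 10,000 goroutines finished on a handful of OS threads")
}
```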
Conclusion
In this article, we explored the evolution of computing systems and the importance of concurrency to improve computational performance beyond the limitations of increasing clock frequencies on single processors.
We learned that multitasking and multiprocessing are distinct but related concepts that fall under the broader umbrella of concurrency. While JavaScript, a single-threaded language, can support multitasking through techniques like event-driven programming, modern languages like C# and Golang offer robust support for multiprocessing by leveraging parallel operations across multiple CPUs.
A key concept we discussed was the difference between OS-level threads (also known as kernel threads) and user-level threads, often referred to as green threads or managed threads. This distinction is important, as it allows programming runtimes to optimize thread management and reduce the overhead of context switching, particularly as the number of threads grows.
Whether you are a seasoned developer or new to the world of concurrency, I hope this article has provided clarity and added valuable knowledge to your understanding of these fundamental computer science concepts. Even if you were already familiar with the topics covered, I trust that this article has helped solidify your grasp of the subject matter and perhaps even challenged your previous assumptions.
Thank you for taking the time to read this article. I appreciate your engagement and hope that the insights shared here will prove useful in your future endeavors.