These terms are often used interchangeably, but they have distinct, precise meanings:
Concurrency — the ability to manage multiple tasks at once. Tasks may be interleaved on a single CPU core; when one pauses (waiting for I/O), another runs. No two tasks necessarily execute at the same physical instant.
Parallelism — the ability to execute multiple tasks simultaneously on multiple CPU cores. True parallelism requires multiple processors.
A juggler managing three balls is concurrent (one ball in hand at a time). Three jugglers each with one ball are parallel. Python can do both — but through different mechanisms, and with an important constraint.
CPU-Bound vs I/O-Bound
This distinction determines which concurrency model to use:
I/O-bound tasks spend most of their time waiting: for a network response, a disk read, a database query. The CPU is idle during the wait. Examples: HTTP requests, file reads, database queries, subprocess calls.
CPU-bound tasks spend most of their time computing: number crunching, image processing, ML inference, encryption. The CPU is always busy.
```
Task timeline: I/O-bound
Thread A: [compute][WAITING FOR NETWORK ........][compute]
Thread B: [compute][WAITING FOR DB ....][compute]
Overlap: threads can share a core productively

Task timeline: CPU-bound
Thread A: [compute compute compute compute compute]
Thread B: [compute compute compute compute compute]
Overlap: they compete for CPU — no benefit on one core
```
Python's Global Interpreter Lock (GIL)
The GIL is a mutex inside CPython (the reference Python implementation) that ensures only one thread executes Python bytecode at a time, even on a multi-core machine. It exists because CPython's memory management (reference counting) is not thread-safe — without the GIL, two threads decrementing the same object's reference count could corrupt memory.
Consequences:
Multi-threaded Python programs cannot parallelize pure-Python CPU work
The GIL is released during I/O operations (network, disk), so threads can overlap I/O waits
C extensions like NumPy often release the GIL during computation, allowing thread-level parallelism for those operations
multiprocessing bypasses the GIL entirely — separate processes have separate interpreters
```python
# Illustration: GIL impact on CPU-bound work
import threading
import time

def count(n):
    total = 0
    for i in range(n):
        total += i
    return total

N = 50_000_000

# Sequential: runs in ~2s
start = time.time()
count(N)
count(N)
print(f"Sequential: {time.time() - start:.2f}s")

# Two threads: also ~2s (GIL prevents true parallelism)
start = time.time()
t1 = threading.Thread(target=count, args=(N,))
t2 = threading.Thread(target=count, args=(N,))
t1.start(); t2.start()
t1.join(); t2.join()
print(f"Two threads (GIL): {time.time() - start:.2f}s")

# For I/O-bound work, threads DO help because the GIL is released during I/O
```
The Three Models at a Glance
| Model | Best for | True parallelism? | Overhead | Complexity |
|---|---|---|---|---|
| threading | I/O-bound | No (GIL) | Low | Medium |
| multiprocessing | CPU-bound | Yes | High | High |
| asyncio | High-concurrency I/O | No | Very low | Medium |
Part 2: threading — Deep Dive
Thread Lifecycle and Core API
A thread transitions through states: New (created, not started) → Runnable (started, waiting for GIL) → Running (executing bytecode) → Blocked (waiting for I/O or a lock) → Terminated (done or exception).
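A minimal sketch of these transitions using the core `threading.Thread` API; the worker's short sleep stands in for real blocking work, and the thread name is arbitrary:

```python
import threading
import time

def worker(results):
    time.sleep(0.05)  # Blocked state: simulates waiting on I/O or a lock
    results.append(threading.current_thread().name)

results = []
t = threading.Thread(target=worker, args=(results,), name="worker-1")

print(t.is_alive())   # False: New (created but not started)
t.start()             # Runnable; the scheduler and GIL decide when it Runs
print(t.is_alive())   # True: Running or Blocked
t.join()              # Wait until Terminated
print(t.is_alive())   # False: Terminated
print(results)        # ['worker-1']
```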
Race Conditions — Why Shared State is Dangerous
A race condition occurs when the result depends on the timing of thread execution. The classic example: incrementing a shared counter.
The += operator looks atomic but is actually three bytecode operations: LOAD, ADD, STORE. The OS scheduler can interrupt a thread between any two of these operations.
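A hedged illustration of the race and its fix with a `threading.Lock`; the iteration counts are arbitrary, and the unsafe variant can occasionally land on the correct total by luck:

```python
import threading

counter = 0
lock = threading.Lock()

def unsafe_increment(n):
    global counter
    for _ in range(n):
        counter += 1   # LOAD / ADD / STORE: the scheduler can interleave these

def safe_increment(n):
    global counter
    for _ in range(n):
        with lock:     # Only one thread may execute the increment at a time
            counter += 1

def run(target, n=100_000, threads=4):
    global counter
    counter = 0
    ts = [threading.Thread(target=target, args=(n,)) for _ in range(threads)]
    for t in ts: t.start()
    for t in ts: t.join()
    return counter

print(run(unsafe_increment))  # often less than 400000 (lost updates)
print(run(safe_increment))    # always 400000
```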
All Synchronization Primitives
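A small sketch combining three of the primitives (`Semaphore`, `Event`, `Barrier`); the worker count and names are illustrative:

```python
import threading

sem = threading.Semaphore(2)      # At most 2 threads in the guarded section
start_event = threading.Event()   # One-shot signal from one thread to many
barrier = threading.Barrier(3)    # Rendezvous point for exactly 3 threads

log = []
log_lock = threading.Lock()

def worker(name):
    start_event.wait()            # Block until the main thread sets the event
    with sem:                     # Acquire one of the 2 semaphore slots
        with log_lock:
            log.append(f"{name} in critical section")
    barrier.wait()                # Wait until all 3 workers reach this point

threads = [threading.Thread(target=worker, args=(f"w{i}",)) for i in range(3)]
for t in threads: t.start()
start_event.set()                 # Release all waiting workers at once
for t in threads: t.join()
print(sorted(log))
```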
Thread-Local Storage and Timer
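A brief sketch of both features; the attribute name `user` and the 0.05 s delay are arbitrary choices:

```python
import threading

# threading.local(): each thread sees its own copy of the attributes
ctx = threading.local()
seen = {}

def handle(name):
    ctx.user = name        # Does not clash with other threads' ctx.user
    seen[name] = ctx.user

threads = [threading.Thread(target=handle, args=(n,)) for n in ("alice", "bob")]
for t in threads: t.start()
for t in threads: t.join()
print(sorted(seen.items()))   # [('alice', 'alice'), ('bob', 'bob')]

# threading.Timer: run a function after a delay (cancellable via .cancel())
fired = []
timer = threading.Timer(0.05, lambda: fired.append("fired"))
timer.start()
timer.join()           # Timer is itself a Thread, so join() works
print(fired)           # ['fired']
```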
threading.Condition — Wait/Notify Pattern
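A minimal bounded-buffer sketch built on `threading.Condition`; the capacity and item counts are arbitrary:

```python
import threading
from collections import deque

class BoundedBuffer:
    """Bounded buffer: producers wait when full, consumers wait when empty."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()
        self.cond = threading.Condition()

    def put(self, item):
        with self.cond:
            while len(self.items) >= self.capacity:
                self.cond.wait()        # Releases the lock while waiting
            self.items.append(item)
            self.cond.notify_all()      # Wake any waiting consumers

    def get(self):
        with self.cond:
            while not self.items:
                self.cond.wait()
            item = self.items.popleft()
            self.cond.notify_all()      # Wake any waiting producers
            return item

buf = BoundedBuffer(capacity=2)
out = []
consumer = threading.Thread(target=lambda: [out.append(buf.get()) for _ in range(5)])
consumer.start()
for i in range(5):
    buf.put(i)          # Blocks whenever the buffer already holds 2 items
consumer.join()
print(out)  # [0, 1, 2, 3, 4]
```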
Producer-Consumer with queue.Queue
queue.Queue is the idiomatic mechanism for inter-thread communication. It is fully thread-safe and blocks on put()/get() as needed; the queue module also provides PriorityQueue and LifoQueue variants.
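A compact producer-consumer sketch; the sentinel convention (`None`) and item counts are illustrative:

```python
import queue
import threading

q = queue.Queue(maxsize=4)   # put() blocks while the queue holds 4 items
results = []

def producer():
    for i in range(8):
        q.put(i)
    q.put(None)              # Sentinel: tells the consumer to stop

def consumer():
    while True:
        item = q.get()
        if item is None:
            q.task_done()
            break
        results.append(item * 10)
        q.task_done()

threading.Thread(target=producer).start()
c = threading.Thread(target=consumer)
c.start()
c.join()
print(results)  # [0, 10, 20, 30, 40, 50, 60, 70]
```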
ThreadPoolExecutor — The Right Way to Manage Threads
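A sketch of the two main usage patterns, `map()` versus `submit()` plus `as_completed()`; the `fetch` function and its URLs are placeholders that simulate blocking I/O with a sleep:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import time

def fetch(url):
    """Stand-in for a blocking I/O call such as an HTTP request."""
    time.sleep(0.05)
    return url, 200

urls = [f"https://example.com/{i}" for i in range(6)]

# map(): ordered results; the context manager shuts the pool down cleanly
with ThreadPoolExecutor(max_workers=3) as pool:
    ordered = list(pool.map(fetch, urls))

# submit() + as_completed(): handle each result as soon as it is ready
with ThreadPoolExecutor(max_workers=3) as pool:
    futures = {pool.submit(fetch, u): u for u in urls}
    done = [f.result() for f in as_completed(futures)]

print(len(ordered), len(done))  # 6 6
```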
Part 3: multiprocessing — CPU Parallelism
Each Process has its own Python interpreter and memory space. No GIL contention. True parallelism on multi-core machines.
The tradeoff: inter-process communication requires serialization (pickling), and process startup takes ~50–100ms.
```python
from multiprocessing import Process, Pool, Queue, Pipe, Value, Array, Manager
import os

def cpu_intensive(n):
    """Pure CPU work — each process uses a full core."""
    return sum(i * i for i in range(n))

if __name__ == "__main__":  # REQUIRED guard on Windows/macOS with spawn
    # Direct Process usage
    p = Process(target=cpu_intensive, args=(1_000_000,))
    p.start()
    p.join()  # Wait for completion
    print(f"Process exit code: {p.exitcode}")  # 0 = success
```
Spawn vs Fork — Start Methods
```python
import multiprocessing as mp

# Three start methods:
# 'fork'       — Copy parent process (fast, Unix only, can cause issues with threads)
# 'spawn'      — Start fresh Python interpreter (safe, Windows default, macOS default since 3.8)
# 'forkserver' — Dedicated server process handles forking (Unix, safer than fork)

# Set globally (call before creating any processes):
mp.set_start_method('spawn')  # Recommended for portability

# Or per-context:
ctx = mp.get_context('fork')
p = ctx.Process(target=cpu_intensive, args=(100_000,))
```
Sharing State Between Processes
Because processes have separate memory, sharing data requires special objects:
```python
from multiprocessing import Process, Value, Array, Lock

def increment_shared(counter, lock, n):
    for _ in range(n):
        with lock:
            counter.value += 1

def write_array(arr, idx, val):
    arr[idx] = val * val

if __name__ == "__main__":
    # Value: single shared primitive
    counter = Value('i', 0)  # 'i' = C int, 'd' = C double, 'b' = signed char
    lock = Lock()
    procs = [Process(target=increment_shared, args=(counter, lock, 10_000))
             for _ in range(4)]
    for p in procs: p.start()
    for p in procs: p.join()
    print(f"Counter: {counter.value}")  # 40000

    # Array: shared C array
    arr = Array('d', [0.0] * 8)  # 'd' = C double
    procs = [Process(target=write_array, args=(arr, i, i)) for i in range(8)]
    for p in procs: p.start()
    for p in procs: p.join()
    print(f"Array: {list(arr)}")
```
multiprocessing.Queue and Pipe
```python
from multiprocessing import Process, Queue, Pipe

# Queue: multi-producer, multi-consumer, process-safe
def producer_proc(q):
    for i in range(5):
        q.put(f"item-{i}")
    q.put(None)  # Sentinel

def consumer_proc(q):
    while True:
        item = q.get()
        if item is None:
            break
        print(f"Got: {item}")

# Pipe: bidirectional (or unidirectional) channel between exactly 2 processes
def pipe_worker(conn):
    msg = conn.recv()           # Receive from parent
    conn.send(f"Echo: {msg}")   # Send back
    conn.close()

if __name__ == "__main__":
    q = Queue(maxsize=10)
    p = Process(target=producer_proc, args=(q,))
    c = Process(target=consumer_proc, args=(q,))
    p.start(); c.start()
    p.join(); c.join()

    parent_conn, child_conn = Pipe(duplex=True)
    p = Process(target=pipe_worker, args=(child_conn,))
    p.start()
    parent_conn.send("hello from parent")
    response = parent_conn.recv()
    print(response)
    p.join()
```
ProcessPoolExecutor — Recommended for Most CPU Work
```python
from concurrent.futures import ProcessPoolExecutor, as_completed
import math

def factorize(n):
    """CPU-bound: prime factorization."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

def expensive_computation(n):
    """Simulate heavy CPU work."""
    result = sum(math.log(i + 1) * math.sqrt(i) for i in range(n))
    return n, round(result, 4)

if __name__ == "__main__":
    numbers = list(range(50_000, 50_020))

    # map(): simple, ordered, blocks until all done
    with ProcessPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(factorize, numbers, chunksize=5))
    print(f"Factorized {len(results)} numbers")

    # submit() + as_completed(): stream results as they finish
    workloads = [100_000 + i * 10_000 for i in range(8)]
    with ProcessPoolExecutor(max_workers=4) as pool:
        futures = {pool.submit(expensive_computation, n): n for n in workloads}
        for future in as_completed(futures):
            n, result = future.result()
            print(f"  n={n}: {result}")
```
Pickling Constraint
Multiprocessing requires arguments and return values to be picklable. This means:
Lambdas and nested functions — NOT picklable (define workers at module level; functools.partial helps only when it wraps a module-level function)
File objects, sockets, database connections — NOT picklable
```python
import pickle

# Test if something is picklable:
def is_picklable(obj):
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False

print(is_picklable(42))            # True
print(is_picklable([1, 2, 3]))     # True
print(is_picklable(lambda x: x))   # False — lambdas are not picklable
```
Part 4: asyncio — Event-Driven Async I/O
asyncio uses a single thread with a cooperative event loop. Coroutines voluntarily yield control with await. No OS context switching, no GIL issues — scales to thousands of concurrent connections.
Coroutines, Tasks, and the Event Loop
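A minimal sketch of `asyncio.gather()` and `asyncio.create_task()`; the names and delays are arbitrary:

```python
import asyncio

async def fetch(name, delay):
    await asyncio.sleep(delay)     # Yields control back to the event loop
    return f"{name}:done"

async def main():
    # gather(): run coroutines concurrently; results come back in argument order
    results = await asyncio.gather(fetch("a", 0.03), fetch("b", 0.01))

    # create_task(): schedule immediately, await the result later
    task = asyncio.create_task(fetch("c", 0.02))
    # ...other awaits could run here while the task makes progress...
    extra = await task
    return results + [extra]

out = asyncio.run(main())
print(out)  # ['a:done', 'b:done', 'c:done']
```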
asyncio.wait — Fine-Grained Control
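A sketch of `asyncio.wait()` with per-task error inspection; the task names, delays, and the deliberate failure are illustrative:

```python
import asyncio

async def job(name, delay, fail=False):
    await asyncio.sleep(delay)
    if fail:
        raise RuntimeError(f"{name} failed")
    return name

async def main():
    tasks = [
        asyncio.create_task(job("fast", 0.01)),
        asyncio.create_task(job("slow", 0.05)),
        asyncio.create_task(job("bad", 0.02, fail=True)),
    ]
    # return_when: FIRST_COMPLETED / FIRST_EXCEPTION / ALL_COMPLETED
    done, pending = await asyncio.wait(tasks, return_when=asyncio.ALL_COMPLETED)
    results = []
    for t in done:
        if t.exception() is not None:   # Inspect instead of letting it raise
            results.append(f"error: {t.exception()}")
        else:
            results.append(t.result())
    return sorted(results), len(pending)

results, n_pending = asyncio.run(main())
print(results, n_pending)  # ['error: bad failed', 'fast', 'slow'] 0
```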
asyncio Timeouts and Cancellation
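A sketch of both mechanisms, `wait_for()` timeouts and explicit `Task.cancel()`; the delays are arbitrary:

```python
import asyncio

async def slow():
    try:
        await asyncio.sleep(10)
    except asyncio.CancelledError:
        # Cleanup would run here; re-raise so the cancellation completes
        raise

async def main():
    outcomes = []
    # wait_for(): cancels the awaitable if it exceeds the timeout
    try:
        await asyncio.wait_for(slow(), timeout=0.05)
    except asyncio.TimeoutError:
        outcomes.append("timed out")

    # Manual cancellation of a running task
    task = asyncio.create_task(slow())
    await asyncio.sleep(0.01)
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        outcomes.append("cancelled")
    return outcomes

outcomes = asyncio.run(main())
print(outcomes)  # ['timed out', 'cancelled']
```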
asyncio Synchronization Primitives
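A small sketch combining `asyncio.Lock`, `asyncio.Semaphore`, and `asyncio.Event` in one event loop; the coroutine count is illustrative:

```python
import asyncio

async def main():
    lock = asyncio.Lock()
    sem = asyncio.Semaphore(2)
    event = asyncio.Event()
    order = []

    async def guarded(name):
        async with sem:              # At most 2 coroutines past this point
            await event.wait()       # All block until the event is set
            async with lock:         # Serialize access to the shared list
                order.append(name)

    tasks = [asyncio.create_task(guarded(f"c{i}")) for i in range(3)]
    await asyncio.sleep(0.01)        # Let the coroutines reach event.wait()
    event.set()                      # Release them
    await asyncio.gather(*tasks)
    return order

order = asyncio.run(main())
print(sorted(order))  # ['c0', 'c1', 'c2']
```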
async for, async with, and run_in_executor
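A compact sketch of all three; the `Ticker` class and `blocking_work` function are invented for illustration:

```python
import asyncio
import time

class Ticker:
    """Async iterator: yields n values with a pause between them."""
    def __init__(self, n):
        self.n, self.i = n, 0
    def __aiter__(self):
        return self
    async def __anext__(self):
        if self.i >= self.n:
            raise StopAsyncIteration
        await asyncio.sleep(0.01)
        self.i += 1
        return self.i

def blocking_work(x):
    time.sleep(0.02)   # A sync call that would otherwise stall the event loop
    return x * x

async def main():
    values = []
    async for v in Ticker(3):       # async for over an async iterator
        values.append(v)

    async with asyncio.Lock():      # async with: any async context manager
        pass

    loop = asyncio.get_running_loop()
    # run_in_executor: push blocking work onto a thread pool (None = default)
    squared = await loop.run_in_executor(None, blocking_work, 5)
    return values, squared

values, squared = asyncio.run(main())
print(values, squared)  # [1, 2, 3] 25
```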
Part 5: Choosing the Right Model
```
Is your task I/O-bound? (waiting for network, disk, database)
├─ YES:
│   ├─ Will you have 100+ concurrent operations?
│   │   ├─ YES → asyncio (lowest overhead, highest throughput)
│   │   └─ NO  → threading or ThreadPoolExecutor (simpler code)
│   └─ Is your I/O library async-aware (aiohttp, asyncpg, etc.)?
│       ├─ YES → asyncio
│       └─ NO  → threading (works with any sync library)
└─ NO (CPU-bound):
    └─ Is it NumPy/Pandas/C extension code?
        ├─ YES → threading may work (GIL released by C code)
        └─ NO  → multiprocessing / ProcessPoolExecutor
```
Threading over asyncio when:
Working with legacy sync libraries (requests, psycopg2, etc.)
Simpler code is more important than maximum throughput
You have moderate concurrency (< 100 threads)
asyncio over threading when:
Building high-concurrency servers (thousands of connections)
Your I/O libraries support async (aiohttp, asyncpg, motor)
You want predictable, cooperative scheduling
multiprocessing when:
Pure Python CPU work (no NumPy)
Work per task takes > 100ms (startup overhead worthwhile)
You need true parallelism and can tolerate pickling constraints
Part 6: Advanced Patterns
Rate Limiting Async Calls
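A minimal token-bucket sketch; the rate and burst capacity are chosen for the demo, and a production limiter would need more care around fairness and wake-up ordering:

```python
import asyncio
import time

class TokenBucket:
    """Token bucket: at most `rate` acquisitions per second, bursts up to `capacity`."""
    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.updated = time.monotonic()
        self.lock = asyncio.Lock()

    async def acquire(self):
        while True:
            async with self.lock:
                now = time.monotonic()
                # Refill tokens proportionally to elapsed time, up to capacity
                self.tokens = min(self.capacity,
                                  self.tokens + (now - self.updated) * self.rate)
                self.updated = now
                if self.tokens >= 1:
                    self.tokens -= 1
                    return
                wait = (1 - self.tokens) / self.rate
            await asyncio.sleep(wait)   # Sleep outside the lock

async def main():
    bucket = TokenBucket(rate=50, capacity=5)   # 50 calls/sec, burst of 5

    async def call(i):
        await bucket.acquire()
        return i

    start = time.monotonic()
    results = await asyncio.gather(*(call(i) for i in range(10)))
    elapsed = time.monotonic() - start
    return results, elapsed

results, elapsed = asyncio.run(main())
print(len(results), f"{elapsed:.3f}s")  # 10 calls; the 5 beyond the burst wait ~0.1s
```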
Graceful Shutdown
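A minimal sketch of the pattern: signal workers via an `Event`, wait with a bounded timeout, and only then consider cancelling stragglers. The worker loop and delays are illustrative:

```python
import asyncio

async def worker(name, stop, log):
    while not stop.is_set():
        await asyncio.sleep(0.01)   # One unit of work, then re-check the flag
    log.append(f"{name} stopped cleanly")

async def main():
    stop = asyncio.Event()
    log = []
    tasks = [asyncio.create_task(worker(f"w{i}", stop, log)) for i in range(3)]

    await asyncio.sleep(0.03)   # ...a shutdown signal arrives (e.g. SIGTERM)...
    stop.set()                  # 1. Ask workers to finish their current item
    # 2. Wait a bounded time for them to drain
    await asyncio.wait_for(asyncio.gather(*tasks), timeout=1.0)
    # 3. Anything still running past the timeout could be cancelled here
    return log

log = asyncio.run(main())
print(log)
```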
PROJECT: Concurrent File Downloader
A simulated file downloader demonstrating threading with progress tracking, timeout handling, and retry logic. Uses asyncio.sleep to simulate network I/O in the browser environment.
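A compact thread-based sketch of the same idea, with `time.sleep` and random failures standing in for the network; the URLs, failure rate, and retry limits are placeholders:

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def download(url, attempt=1, max_attempts=3, fail_rate=0.3):
    """Simulated download: time.sleep stands in for network I/O."""
    time.sleep(0.02)
    if random.random() < fail_rate:            # Simulated transient failure
        if attempt >= max_attempts:
            return url, "FAILED"
        time.sleep(0.01 * 2 ** attempt)        # Exponential backoff before retry
        return download(url, attempt + 1, max_attempts, fail_rate)
    return url, "OK"

urls = [f"https://example.com/file{i}.bin" for i in range(8)]
statuses = []

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = {pool.submit(download, u): u for u in urls}
    # as_completed(..., timeout=...) raises TimeoutError if the batch stalls
    for done, future in enumerate(as_completed(futures, timeout=10), start=1):
        url, status = future.result()
        statuses.append(status)
        print(f"[{done}/{len(urls)}] {status:6} {url}")   # Progress tracking
```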
Exercises
Exercise 1 — Thread-safe counter class
Exercise 2 — Async rate limiter: fetch N URLs with max M/sec
Exercise 3 — asyncio retry with exponential backoff
Exercise 4 — Thread pool for parallel file processing (simulated)