Python’s multiprocessing module lets a program spread work across multiple processors, which can significantly speed up CPU-bound tasks. This guide covers how to use the module, with examples, best practices, and a standard coding structure.
Why Use Multiprocessing?
Multiprocessing parallelizes work across multiple CPUs, which can substantially improve performance for CPU-bound operations. It sidesteps Python’s Global Interpreter Lock (GIL) by running each process in its own interpreter with its own memory space, so each process has its own GIL.
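As a rough illustration of the claim above, the sketch below runs the same CPU-bound function sequentially and through a process pool, then checks that both give identical results. The function count_primes and the workload sizes are hypothetical, chosen only for demonstration; the timings depend on your machine and core count, so treat the printed numbers as indicative rather than as a benchmark.

```python
import time
from multiprocessing import Pool


def count_primes(limit):
    """Count primes below `limit` by trial division (deliberately CPU-heavy)."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


if __name__ == "__main__":
    limits = [20_000] * 8  # hypothetical workload: eight equal chunks

    start = time.perf_counter()
    sequential = [count_primes(n) for n in limits]
    sequential_time = time.perf_counter() - start

    start = time.perf_counter()
    with Pool() as pool:  # with no argument, the pool sizes itself to the CPU count
        parallel = pool.map(count_primes, limits)
    parallel_time = time.perf_counter() - start

    # Both approaches must agree; only the elapsed time should differ
    assert sequential == parallel
    print(f"sequential: {sequential_time:.2f}s, parallel: {parallel_time:.2f}s")
```

For very small workloads the pool can be slower, because starting processes and passing data between them has a cost of its own.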
Key Concepts
- Process: An independent instance of a running program with its own memory space.
- Thread: Shares memory with the other threads in its process; in standard CPython builds, threads cannot execute Python bytecode in true parallel because of the GIL.
- GIL (Global Interpreter Lock): A mutex that protects access to Python objects, preventing multiple threads from executing Python bytecode at once.
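One way to see that a process really is a separate entity is to compare process IDs. The sketch below starts a child that reports its own PID back to the parent through a Pipe; the function name report_pid is made up for this illustration.

```python
import os
from multiprocessing import Pipe, Process


def report_pid(conn):
    # Runs in the child: send this process's ID back through the pipe
    conn.send(os.getpid())
    conn.close()


if __name__ == "__main__":
    parent_conn, child_conn = Pipe()
    child = Process(target=report_pid, args=(child_conn,))
    child.start()
    child_pid = parent_conn.recv()  # blocks until the child has sent its PID
    child.join()

    # The two IDs differ because the child is a separate OS process
    print(f"parent PID: {os.getpid()}, child PID: {child_pid}")
```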
Basic Example
Here’s a simple example that demonstrates how to use the multiprocessing module:
from multiprocessing import Process

def print_square(number):
    print(f"Square: {number * number}")

def print_cube(number):
    print(f"Cube: {number * number * number}")

if __name__ == "__main__":
    # Create Process objects
    p1 = Process(target=print_square, args=(10,))
    p2 = Process(target=print_cube, args=(10,))

    # Start the processes
    p1.start()
    p2.start()

    # Wait for the processes to complete
    p1.join()
    p2.join()

    print("Done!")
Detailed Explanation of the Example
- Importing the Module: We import the Process class from the multiprocessing module.
- Defining Functions: We define two functions, print_square and print_cube, that stand in for CPU-bound tasks.
- Creating Process Objects: We create a Process object for each function, passing the function as target and its arguments as a tuple in args.
- Starting Processes: We use the start method to begin execution of the processes.
- Joining Processes: We use the join method to ensure the main program waits for the processes to complete before moving on.
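The effect of start and join can be observed through a Process object's is_alive method and exitcode attribute. This is a minimal sketch; the nap function and the process name "napper" are invented for the illustration.

```python
import time
from multiprocessing import Process


def nap(seconds):
    time.sleep(seconds)


if __name__ == "__main__":
    p = Process(target=nap, args=(0.5,), name="napper")

    # Before start(): the process does not exist yet, so exitcode is None
    print(p.name, p.is_alive(), p.exitcode)

    p.start()
    # While the child is still sleeping: alive, exitcode still None
    print(p.name, p.is_alive(), p.exitcode)

    p.join()
    # After join(): no longer alive, and exitcode is 0 for a normal exit
    print(p.name, p.is_alive(), p.exitcode)
```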
Best Practices
1. Use Pools for Simplicity: For work that can be split into many smaller tasks, consider using multiprocessing.Pool for easier management.
from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == "__main__":
    with Pool(5) as p:
        print(p.map(square, [1, 2, 3, 4, 5]))
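Beyond map, a pool offers a few other ways to submit work. The sketch below shows two of them, starmap for functions that take several arguments and imap_unordered for consuming results as they become available. The power function and the input values are only illustrative.

```python
from multiprocessing import Pool


def power(base, exponent):
    return base ** exponent


def square(x):
    return x * x


if __name__ == "__main__":
    with Pool(processes=4) as pool:
        # starmap unpacks each tuple into positional arguments
        print(pool.starmap(power, [(2, 3), (3, 2), (10, 0)]))

        # imap_unordered yields results as workers finish them, so the
        # order may differ from the input order; chunksize controls how
        # many items are sent to a worker at a time
        for value in pool.imap_unordered(square, range(10), chunksize=3):
            print(value)
```

Use map or starmap when result order matters, and imap_unordered when you only care about getting each result as early as possible.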
2. Avoid Global State: Each process has its own memory space, so avoid relying on global state as it won’t be shared between processes.
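To make this concrete, the sketch below has a child process increment a module-level counter and shows that the parent's copy is untouched. For contrast, it also increments a multiprocessing.Value, which is one explicit way to share a number between processes. The names counter, increment, and increment_shared are invented for this example.

```python
from multiprocessing import Process, Value

counter = 0  # module-level state; each process works on its own copy


def increment():
    global counter
    counter += 1
    print(f"child sees counter = {counter}")


def increment_shared(shared):
    # Value lives in shared memory; its lock guards the read-modify-write
    with shared.get_lock():
        shared.value += 1


if __name__ == "__main__":
    p = Process(target=increment)
    p.start()
    p.join()
    print(f"parent sees counter = {counter}")  # still 0 in the parent

    shared = Value("i", 0)  # "i" is the type code for a signed int
    p = Process(target=increment_shared, args=(shared,))
    p.start()
    p.join()
    print(f"parent sees shared value = {shared.value}")  # 1
```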
3. Proper Synchronization: Use multiprocessing.Queue, Lock, Semaphore, etc., to handle inter-process communication and synchronization.
from multiprocessing import Process, Lock

def printer(item, lock):
    # The with statement acquires the lock and always releases it
    with lock:
        print(item)

if __name__ == "__main__":
    lock = Lock()
    items = ["apple", "banana", "cherry"]
    processes = [Process(target=printer, args=(item, lock)) for item in items]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
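The Lock example covers synchronization; for communication, a Queue is a common way to get results back from worker processes, since the return value of a Process target is discarded. The following is a small sketch with a hypothetical square_worker function, not the only way to structure it.

```python
from multiprocessing import Process, Queue


def square_worker(number, results):
    # Put an (input, output) pair on the queue for the parent to collect
    results.put((number, number * number))


if __name__ == "__main__":
    results = Queue()
    numbers = [1, 2, 3, 4]
    processes = [Process(target=square_worker, args=(n, results)) for n in numbers]
    for p in processes:
        p.start()

    # Read one result per process before joining, so no child is left
    # blocked trying to flush data into the queue
    collected = dict(results.get() for _ in processes)

    for p in processes:
        p.join()

    print(sorted(collected.items()))  # [(1, 1), (2, 4), (3, 9), (4, 16)]
```

Results arrive in whatever order the workers finish, which is why each result carries its input and the output is sorted before printing.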
4. Graceful Termination: Ensure that processes terminate properly, especially when handling exceptions.
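from multiprocessing import Process
import time

def long_task():
    try:
        time.sleep(5)
    except KeyboardInterrupt:
        # When run from a terminal, Ctrl+C typically reaches the child too
        print("Task interrupted")

if __name__ == "__main__":
    p = Process(target=long_task)
    p.start()
    try:
        p.join()
    except KeyboardInterrupt:
        # The child handles its own interrupt; wait for it to exit
        p.join()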
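Another situation that calls for controlled termination is a task that runs longer than you are willing to wait. The sketch below joins with a timeout and falls back to terminate if the child is still alive. The helper run_with_timeout and the stuck_task function are hypothetical names for this illustration.

```python
import time
from multiprocessing import Process


def stuck_task():
    time.sleep(60)  # stands in for work that does not finish in time


def run_with_timeout(target, timeout):
    """Run `target` in a process; terminate it if it outlives `timeout` seconds.

    Returns the process's exit code, which is non-zero if the process
    had to be terminated.
    """
    p = Process(target=target)
    p.start()
    p.join(timeout)      # wait at most `timeout` seconds
    if p.is_alive():
        p.terminate()    # the child gets no chance to clean up
        p.join()         # reap the terminated process
    return p.exitcode


if __name__ == "__main__":
    code = run_with_timeout(stuck_task, timeout=1)
    print(f"exit code: {code}")
```

Because terminate stops the child abruptly, it is best kept as a last resort for tasks that hold no locks or half-written files.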
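5. Use if __name__ == "__main__": This ensures that the multiprocessing code does not run unintentionally when the module is imported. The guard is essential with the spawn start method (the default on Windows and macOS), where each child process imports the main module and unguarded process creation would run again in every child.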
Standard Coding Structure
from multiprocessing import Process, Pool

def worker_function(arg1, arg2):
    # Your code here
    pass

def pool_worker(arg):
    # Your pool worker code here; return the result for one item
    result = arg  # placeholder
    return result

def main():
    # Placeholder inputs; replace with your own arguments and data
    arg1, arg2 = "a", "b"
    iterable = [1, 2, 3]

    # Using Process
    process1 = Process(target=worker_function, args=(arg1, arg2))
    process2 = Process(target=worker_function, args=(arg1, arg2))
    process1.start()
    process2.start()
    process1.join()
    process2.join()

    # Using Pool
    with Pool(processes=4) as pool:
        results = pool.map(pool_worker, iterable)

    print("Main process done.")

if __name__ == "__main__":
    main()
Example Explanation
- Function Definitions: Define the worker functions that will be executed in parallel.
- Main Function: The main function handles the creation, starting, and joining of processes or pool workers.
- Multiprocessing Pool: The Pool object manages a pool of worker processes to which tasks can be submitted.
Conclusion
Python’s multiprocessing module is a powerful tool for parallelizing CPU-bound tasks and can improve performance significantly. Following the best practices and coding structure above helps keep multiprocessing code robust, efficient, and easy to maintain.