My learnings from this article on parallel processing:

  • Parallel processing in Python can be achieved via two standard-library modules: multiprocessing and threading
  • A process is an instance of a computer program that is executed with its own memory and resources.
  • Threads are components of a process that can run concurrently. A process can contain multiple threads, all of which share the memory space of the parent process. They typically have lower overhead than processes. Processes do not share memory, so they have to rely on the inter-process communication (IPC) mechanisms provided by the OS to exchange data
  • Spawning a process is slower than spawning a thread
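  • Pitfalls of parallel programming
    • Race condition: Two threads try to modify a variable at the same time, so the result depends on the order in which they happen to run
    • Starvation: A thread never gets access to the resources it needs because other threads keep taking them
    • Deadlock: Threads wait for each other to release resources, so none of them can proceed
    • Livelock: Threads keep running and reacting to each other without making any progress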
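  • In standard CPython, the Global Interpreter Lock (GIL) ensures that only one thread executes Python bytecode at a time. This protects the interpreter's internal state from simultaneous writes, although race conditions in the program's own logic are still possible
  • A thread must acquire this global lock before it can run Python code, and only a single thread can hold the lock at a time. This means that the interpreter ultimately runs the threads' Python code serially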
  • Use cases of threading: GUI programs, web scraping
  • If the program is CPU intensive, then there is a case for multiprocessing, which outshines threading for such workloads.
  • Because of the GIL, threads cannot run Python code on multiple cores at the same time
  • CPU-bound tasks: multiprocessing is better than multithreading
  • IO-bound tasks: multithreading is better than multiprocessing
  • Example: analyzing the emails stored on a mail server
    • Downloading the mailboxes in parallel
    • Since it involves a lot of IO operations, this is best done via multithreading
  • Using a random forest (RF) classifier is a computationally heavy task, and hence it is better to use multiprocessing
  • Factors to consider are
    • Whether there is an IO operation
    • Whether IO is the bottleneck
    • Whether the task depends on a large amount of computation by the CPU
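
To make the IO-bound versus CPU-bound distinction concrete for myself, here is a minimal sketch. It is not from the article: I am assuming the standard-library concurrent.futures module (which builds on threading and multiprocessing), and fake_download, count_primes, run_io_bound and run_cpu_bound are hypothetical names I made up for illustration. The sleep call merely stands in for waiting on a mail server.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def fake_download(mailbox_id):
    """Hypothetical stand-in for an IO-bound task such as fetching a mailbox."""
    time.sleep(0.2)  # a sleeping thread releases the GIL, so other threads can run
    return f"mailbox-{mailbox_id}"


def count_primes(limit):
    """Hypothetical stand-in for a CPU-bound task: pure-Python arithmetic."""
    count = 0
    for n in range(2, limit):
        # n is prime if no d in 2..sqrt(n) divides it
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run_io_bound(mailbox_ids):
    # Threads suit IO-bound work: while one thread waits, another can proceed.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(fake_download, mailbox_ids))


def run_cpu_bound(limits):
    # Each worker process has its own interpreter and its own GIL,
    # so the work can be spread across several cores.
    with ProcessPoolExecutor() as pool:
        return list(pool.map(count_primes, limits))


if __name__ == "__main__":
    start = time.perf_counter()
    print(run_io_bound([1, 2, 3, 4]))
    print(f"IO-bound with threads: {time.perf_counter() - start:.2f}s")

    start = time.perf_counter()
    print(run_cpu_bound([20_000] * 4))
    print(f"CPU-bound with processes: {time.perf_counter() - start:.2f}s")
```

The `if __name__ == "__main__":` guard matters for the process pool, because on some platforms each worker process re-imports the script. Actual timings will depend on the machine, so it is worth measuring both variants before settling on one.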

Until now I had never considered this aspect at all. In fact, I have been totally ignorant about it. I am glad that I have at least learnt it now.