A few months ago, I had the thought of practicing Python every day for 20 minutes. If you use Python in your daily work, you should not rely on that work as a substitute for a deliberate practice session. This point is echoed by Josh Kaufman in his book, The First 20 Hours, where he found he could not rely on daily work that involved typing as a substitute for a deliberate practice session on touch typing. If you are trying to learn touch typing, you might assume that since you are typing emails, reports, etc. anyway, you are in essence doing deliberate practice. Not really. Once you are in a deliberate practice session, your focus becomes the craft itself rather than the outcome of the specific task. Unless you set aside some time for the task on a regular basis, it is difficult to improve at any skill, be it touch typing or coding in Python.

Setting aside a 20-minute slot for going through the book, “Effective Python”, helped me read it slowly and digest all the wonderful information in it. In any case, this book cannot be consumed in a few sittings. It takes quite some time to read, to think, and to understand the various ways in which one could improve the craft of coding.

This blogpost summarizes some of the main points from the book.

Pythonic Thinking

Python version

import sys
sys.version_info

# sys.version_info(major=3, minor=8, micro=5, releaselevel='final', serial=0)

Difference between str and bytes

There are two types that represent sequences of character data: bytes and str.

Instances of bytes contain raw, unsigned 8-bit values.

a = b'h\x65llo'
a, list(a)

# (b'hello', [104, 101, 108, 108, 111])

Instances of str contain Unicode code points that represent textual characters from human languages

a = 'a\u002A sdfdf'
a, list(a)

# ('a* sdfdf', ['a', '*', ' ', 's', 'd', 'f', 'd', 'f'])

str instances do not have an associated binary encoding, and bytes instances do not have an associated text encoding. To convert Unicode data to binary data, you must call the encode method of str. To convert binary data to Unicode data, you must call the decode method of bytes.

def to_str(bytes_or_str):
    if isinstance(bytes_or_str, bytes):
        value = bytes_or_str.decode("utf-8")
    else:
        value = bytes_or_str
    return value

to_str('hello'), to_str(b'hello')

# ('hello', 'hello')
def to_bytes(bytes_or_str):
    if isinstance(bytes_or_str, str):
        value = bytes_or_str.encode('utf-8')
    else:
        value = bytes_or_str
    return value

to_bytes(b'foo'), to_bytes('bar')

# (b'foo', b'bar')
  • You can add two str instances or two bytes instances, but you cannot add a bytes instance to a str instance
  • If a file is opened in 'r' or 'w' mode, it is opened in text mode. Write operations expect str instances, and read operations use the system’s default text encoding to interpret data
  • If you want to read or write Unicode data to/from a file, be careful about the system’s default text encoding. Explicitly pass the encoding parameter to open if you want to avoid surprises
  • If you want to read or write binary data to/from a file, always open the file in a binary mode (like ‘rb’ or ‘wb’)
  • bytes and str instances can’t be used together with operators like >, ==, + and %
  • Use helper functions to ensure that the inputs you operate on are the type of character sequence that you expect (8-bit values, UTF-8-encoded strings, Unicode code points)

Prefer interpolated F-strings Over C-style format strings

Python has four different ways of formatting strings that are built into the language and the standard library

  • Use formatting operator %. These come from C’s printf function
    • One can use the % operator with a dict
a = 0b10111011
b = 0xc5f
'Binary is %d, hex is %d' % (a, b)

# 'Binary is 187, hex is 3167'
  • Python 3 added support for advanced string formatting that is more expressive than the old C-style format strings that use the % operator. For individual Python values, this new functionality can be accessed through the format built-in function.
a = 1234.4
format(a, ',.2f')

# '1,234.40'
key = "rk"
value = "45"
'{}={}'.format(key, value)

# 'rk=45'
  • You can use the new functionality to format multiple values together by calling the new format method of the str type
  • Python 3.6 added interpolated format strings – f-strings for short – to solve most of the problems associated with displaying formatted strings
    • Python expression may also appear within the format specifier options
key = "rk"
value = "45"
f'{key}={value}'

# 'rk=45'
key = "rk"
value = 45.12
f'{key:<10}={value:.1f}'

# 'rk        =45.1'

Takeaways

  • C-style format strings that use the % operator suffer from a variety of gotchas and verbosity problems
  • The str.format method introduces some useful concepts in its format specifier mini-language, but it otherwise repeats the mistakes of C-style format strings and should be avoided
  • F-strings are a new syntax for formatting values into strings that solves the biggest problems with C-style format strings
  • F-strings are succinct yet powerful because they allow for arbitrary Python expressions to be directly embedded within the format specifiers

Write Helper functions instead of Complex expressions

from urllib.parse import parse_qs
my_values = parse_qs('red=5&blue=10')
my_values

# {'red': ['5'], 'blue': ['10']}

Python’s syntax makes it easy to write single-line expressions that are overly complicated and difficult to read. Hence, it is better to move such complicated expressions into helper functions.

Prefer Multiple Assignment Unpacking over Indexing

Unpacking has less visual noise than accessing the tuple’s indexes and it often requires fewer lines.

books_to_read = [
    ("R", "Resampling"),
    ("Python", "Effective Python"),
    ("Finance", "Factor Models in R"),
]

for i, (sub, book) in enumerate(books_to_read, 1):
    print(f"{i}: Subject {sub}, Book {book}")

# 1: Subject R, Book Resampling
# 2: Subject Python, Book Effective Python
# 3: Subject Finance, Book Factor Models in R

Unpacking is generalized in Python and can be applied to any iterable, including many levels of iterables within iterables.

Prefer enumerate over range

  • << operator is zero fill left shift operator
  • >> operator is zero fill right shift operator
  • | operator is OR operator
  • enumerate provides a concise syntax for looping over an iterator and getting the index of each item from the iterator as you go
  • Prefer enumerate instead of looping over a range and indexing into a sequence
  • You can supply a second parameter to enumerate to specify the number from which to begin counting
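
As a quick illustration of the last two points, here is a minimal sketch (the list of subjects is made up for the example):

subjects = ['R', 'Python', 'Finance']

# enumerate yields (index, item) pairs; the second argument sets the starting count
for rank, subject in enumerate(subjects, 1):
    print(f'{rank}: {subject}')

# 1: R
# 2: Python
# 3: Finance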

Use zip to process iterators in parallel

  • Beware of the situation where the iterators are not of equal length. zip yields tuples until any one of the wrapped iterators is exhausted
  • One can also use itertools.zip_longest for the case where the iterators are of varying lengths, as sketched below
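
A small sketch of the difference, assuming two lists of unequal length:

from itertools import zip_longest

names = ['R', 'Python', 'Finance']
counts = [1, 6]

# zip stops at the shorter input
print(list(zip(names, counts)))           # [('R', 1), ('Python', 6)]

# zip_longest pads the shorter input with a fillvalue (None by default)
print(list(zip_longest(names, counts)))   # [('R', 1), ('Python', 6), ('Finance', None)]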

Avoid else Blocks After for and while loops

  • else block runs immediately after the loop finishes
  • the else block runs only if the loop body did not encounter a break statement
  • Avoid using else blocks after loops because their behavior isn’t intuitive and can be confusing

Prevent Repetition with Assignment Expressions

  • An assignment expression - also known as the walrus operator - is a new syntax introduced in Python 3.8 to solve a long-standing problem with the language. It is written as follows
fresh_fruit = {
    'apple': 10, 'banana': 8, 'lemon': 5
}

if count := fresh_fruit.get('lemon', 0):
    print('Yes lemon')
else:
    print('No lemon')

if (count := fresh_fruit.get('lemon', 0)) > 4:
    print('Yes Cider')
else:
    print('No Cider')

# Yes lemon
# Yes Cider
  • The walrus operator can also be used as a substitute for deeply nested if/elif/else statements
  • The walrus operator can also be used to eliminate the loop-and-a-half idiom, as sketched after this list
  • Although switch-case statements and do-while loops are not available in Python, their functionality can be emulated much more clearly by using assignment expressions
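
Here is a small sketch of the loop-and-a-half elimination mentioned above; pick_fruit is a made-up stand-in for whatever produces fresh batches:

batches = [{'apple': 3}, {'lemon': 2}, {}]

def pick_fruit():
    # hypothetical helper: pops the next batch, returning {} when nothing is left
    return batches.pop(0) if batches else {}

# with an assignment expression, the call appears only once and drives the loop
while fresh_fruit := pick_fruit():
    print('processing', fresh_fruit)

# processing {'apple': 3}
# processing {'lemon': 2}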

Lists and Dictionaries

Know How to Slice Sequences

  • When slicing from the start of a list, you should leave out the zero index to reduce visual noise
  • When slicing to the end of a list, you should leave out the final index because it is redundant
  • The result of slicing a list is a whole new list.
  • Assigning to a list slice replaces that range in the original sequence with what’s referenced even if the lengths are different

Avoid Striding and Slicing in a Single Expression

  • Specifying start, end and stride in a slice can be extremely confusing
  • Prefer using positive stride values in slices without start or end indexes. Avoid negative stride values if possible
  • Avoid using start, end and stride together in a single slice. If you need all three parameters, consider doing two assignments

Prefer Catch-All Unpacking over Slicing

  • Use unpacking pattern
x = list(range(10))
a, b, *c = x
f"a:{a}, b:{b}, c:{c}"

# 'a:0, b:1, c:[2, 3, 4, 5, 6, 7, 8, 9]'
  • Starred expressions may appear in any position, and they will always become a list containing the zero or more values they receive
  • When dividing a list in to non-overlapping pieces, catch-all unpacking is much less error prone than slicing and indexing

Sort by Complex Criteria Using the key parameter

  • Sorting arbitrary Python objects in a list works by invoking the relevant comparison special methods on the objects. If the objects do not implement these comparison methods, sorted raises a TypeError
class Tool:
    def __init__(self, name, weight):
        self.name = name
        self.weight = weight

    def __repr__(self):
        return f"Tool({self.name!r}, {self.weight})"

tools = [Tool("level", 2), Tool("axe", 21)]
sorted(tools, key=lambda x: x.name), sorted(tools, key=lambda x: x.weight)

# ([Tool('axe', 21), Tool('level', 2)], [Tool('level', 2), Tool('axe', 21)])
  • Tuples are comparable by default and have a natural ordering
  • Returning a tuple from the key function allows you to combine multiple sorting criteria together. The unary minus operator can be used to reverse individual sort orders for types that allow it, as sketched after this list
  • For types that can’t be negated, you can combine many sorting criteria together by calling the sort method multiple times using different key functions and reverse values, in the order of lowest rank sort call to highest rank sort call
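
A minimal sketch of a tuple-returning key function, reusing the Tool class from above (the extra tools are made up for the example):

power_tools = [Tool("drill", 4), Tool("sander", 4), Tool("jackhammer", 40)]

# sort by weight descending first (negated), then by name ascending
power_tools.sort(key=lambda x: (-x.weight, x.name))
print(power_tools)

# [Tool('jackhammer', 40), Tool('drill', 4), Tool('sander', 4)]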

Be Cautious When Relying on dict insertion ordering

  • In Python 3.5 and before, iterating over a dict would return keys in an arbitrary order. This happened because the dictionary type previously implemented its hash table algorithm with a combination of the hash built-in function and a random seed that was assigned when the Python interpreter started
  • Starting with Python 3.6 and officially part of Python spec in version 3.7, dictionaries preserve the insertion order

Prefer get Over in and KeyError to Handle missing Dictionary Keys

  • There are four common ways to detect and handle missing keys in dictionaries: using in expressions, KeyError exceptions, the get method and setdefault method
  • The get method is best for dictionaries that contain basic types like counters, and it is preferable along with assignment expressions when creating dictionary values has a high cost or may raise exceptions
  • setdefault tries to fetch the value of a key in the dictionary. If the key isn’t present, the method assigns that key to the default value provided.
  • When the setdefault method of dict seems like the best fit for your problem, you should consider using defaultdict instead

Prefer defaultdict Over setdefault to handle missing items in internal state

  • If you are creating a dictionary to manage an arbitrary set of potential keys, then you should prefer using a defaultdict instance from the collections built-in module if it suits your problem
  • If a dictionary of arbitrary keys is passed to you, and you don’t control its creation, then you should prefer the get method to access its items. However, it’s worth considering using the setdefault method for a few situations in which it leads to shorter code

Know how to construct key-dependent default values using __missing__

  • The setdefault method of dict is a bad fit when creating the default value has high computational cost
  • The function passed to defaultdict must not require any arguments, which makes it impossible to have the default value depend on the key being accessed
  • You can define your own dict subclass with a __missing__ method in order to construct default values that must know which key was being accessed
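
A small sketch of a __missing__ subclass, assuming the default value should be derived from the key itself (the class and values are illustrative):

class Pictures(dict):
    def __missing__(self, key):
        # called only when the key is absent: build a key-dependent default,
        # store it, and return it
        value = f'open handle for {key}'
        self[key] = value
        return value

pictures = Pictures()
print(pictures['profile.png'])   # open handle for profile.png
print(pictures)                  # {'profile.png': 'open handle for profile.png'}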

Functions

Never Unpack More than three variables when functions return multiple values

  • You can have functions return multiple values by putting them in a tuple and having the caller take advantage of Python’s unpacking syntax
  • Multiple return values from a function can also be unpacked by catch-all starred expressions
  • Unpacking into four or more variables is error prone and should be avoided. One can use a namedtuple instance

Prefer Raising Exceptions to Returning None

  • Functions that return None to indicate special meaning are error prone because None and other values (e.g., zero, the empty string) all evaluate to False in conditional expressions
  • Raise exceptions to indicate special situations instead of returning None.
  • Type annotations can be used to make it clear that a function will never return the value None, even in special situations

Know How closures interact with Variable scope

  • Python supports closures - that is, functions that refer to variables from the scope in which they were defined
  • Python has specific rules for comparing sequences. It first compares items at index zero; then, if those are equal, it compares items at index one, and so on
  • When you reference a variable in an expression, the Python interpreter traverses the scope to resolve the reference in this order
    • the current function’s scope
    • any enclosing scopes
    • the scope of the module that contains the code
    • the built-in scope
  • Assigning a value to a variable works differently. If the variable is already defined in the current scope, it will just take on the new value. If the variable doesn’t exist in the current scope, Python treats the assignment as a variable definition. Critically, the scope of the newly defined variable is the function that contains the assignment
  • There is a special syntax for getting data out of a closure. The nonlocal statement is used to indicate that scope traversal should happen upon assignment for a specific variable name.
  • avoid using nonlocal statements for anything beyond simple functions
  • use the nonlocal statement to indicate when a closure can modify a variable in its enclosing scope
  • By default, closures can’t affect enclosing scopes by assigning variables
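
A minimal sketch of the nonlocal behaviour described above, along the lines of the book’s sort_priority example:

def sort_priority(numbers, group):
    found = False
    def helper(x):
        nonlocal found          # without this, 'found = True' would define a new local variable
        if x in group:
            found = True
            return (0, x)       # items in the group sort first
        return (1, x)
    numbers.sort(key=helper)
    return found

numbers = [8, 3, 1, 2, 5, 4, 7, 6]
print(sort_priority(numbers, {2, 3, 5, 7}))  # True
print(numbers)                               # [2, 3, 5, 7, 1, 4, 6, 8]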

Reduce Visual Noise with variable positional arguments

  • Optional positional arguments are always turned into a tuple before they are passed to a function
  • functions that accept *args are best for situations where you know the number of inputs in the argument list will be reasonably small
  • Using the * operator with a generator may cause a program to run out of memory and crash

Provide Optional Behavior with Keyword Arguments

  • Positional arguments must be specified before keyword arguments
  • Function arguments can be specified by position or by keyword
  • Keywords make it clear what the purpose of each argument is when it would be confusing with only positional arguments
  • Keyword arguments with default values make it easy to add new behaviors to a function without needing to migrate all existing callers
  • Optional keyword arguments should always be passed by keyword instead of by position

Use None and Docstrings to specify dynamic default arguments

  • A default argument value is evaluated only once per module load, which usually happens when a program starts up. After the module containing this code is loaded, the datetime.now() default argument will never be evaluated again
  • Use None as the default value for any keyword argument that has a dynamic value. Document the default behavior using the function’s docstring
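
A small sketch of the None-plus-docstring pattern (log_message is a hypothetical function name):

from datetime import datetime
from time import sleep

def log_message(message, when=None):
    """Log a message with a timestamp.

    Args:
        message: Message to print.
        when: datetime of when the message occurred.
            Defaults to the present time.
    """
    if when is None:
        when = datetime.now()      # evaluated on every call, unlike a default argument
    print(f'{when}: {message}')

log_message('Hi there!')
sleep(0.1)
log_message('Hello again!')        # prints a later timestamp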

Enforce clarity with Keyword only arguments and Positional arguments

  • Keyword-only arguments force callers to supply certain arguments by keyword, which makes the intention of the function call clearer. Keyword-only arguments are defined after a single * in the argument list
  • Positional-only arguments ensure that callers can’t supply certain parameters using keywords, which helps reduce coupling
  • Parameters between the / and * characters in the argument list may be supplied by position or keyword
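
A minimal sketch, assuming Python 3.8+ for the positional-only / marker (safe_division is an illustrative name):

def safe_division(numerator, denominator, /, *, ignore_zero_division=False):
    # numerator and denominator are positional-only (before /);
    # ignore_zero_division is keyword-only (after *)
    try:
        return numerator / denominator
    except ZeroDivisionError:
        if ignore_zero_division:
            return float('inf')
        raise

print(safe_division(1, 0, ignore_zero_division=True))   # inf
# safe_division(numerator=1, denominator=0)  -> TypeError: positional-only
# safe_division(1, 0, True)                  -> TypeError: keyword-only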

Define Function decorators with functools.wraps

  • Decorators in Python are syntax to allow one function to modify another function at runtime
  • Using decorators can cause strange behaviors in tools that do introspection
  • Use the wraps decorator from the functools built-in module when you define your decorators to avoid issues
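
A small sketch of the wraps pattern described above:

from functools import wraps

def trace(func):
    @wraps(func)                     # copies __name__, __doc__, etc. onto the wrapper
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        print(f'{func.__name__}({args!r}, {kwargs!r}) -> {result!r}')
        return result
    return wrapper

@trace
def fibonacci(n):
    """Return the n-th Fibonacci number."""
    if n in (0, 1):
        return n
    return fibonacci(n - 2) + fibonacci(n - 1)

fibonacci(3)                 # prints one trace line per (recursive) call
print(fibonacci.__name__)    # 'fibonacci' rather than 'wrapper', thanks to wraps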

Comprehensions and Generators

Use comprehensions instead of map and filter

  • List comprehensions are cleaner than the map and filter built-in functions because they don’t require lambda expressions
data = list(range(10))
x1 = [x * 2 for x in data if x % 2 == 0]
x2 = list(map(lambda x: x * 2, filter(lambda x: x % 2 == 0, data)))
x1 == x2

# True
  • List comprehensions allow you to easily skip items from the input list, a behavior that map doesn’t support without the help of filter
  • Dictionaries and sets can also be created using comprehensions

Avoid more than two control subexpressions in comprehensions

  • Comprehensions support multiple if conditions. Multiple conditions at the same loop level have an implicit and expression
  • comprehensions support multiple levels and multiple conditions per loop level

Avoid repeated work in comprehensions by using Assignment expressions

  • If a comprehension uses the walrus operator in the value part of the comprehension and doesn’t have a condition, it’ll leak the loop variable into the containing scope
  • Assignment expressions make it possible for comprehensions and generator expressions to reuse the value from one condition elsewhere in the same comprehension, which can improve readability and performance

Consider Generators Instead of Returning Lists

  • Using generators can be clearer than the alternative of having a function return a list of accumulated results
  • The iterator returned by a generator produces the set of values passed to yield expressions within the generator function’s body
  • Generators can produce a sequence of outputs for arbitrarily large inputs because their working memory doesn’t include all inputs and outputs

Be Defensive when iterating over arguments

  • The iterator protocol is how Python for loops and related expressions traverse the contents of a container type. When Python sees a statement like for x in foo, it actually calls iter(foo). The iter built-in function calls the foo.__iter__ special method in turn. The __iter__ method must return an iterator object. Then, the for loop repeatedly calls the next built-in function on the iterator object until it’s exhausted
  • when an iterator is passed to the iter built-in function, iter returns the iterator itself
  • when a container type is passed to iter, a new iterator object is returned each time
  • Beware of functions and methods that iterate over input arguments multiple times. If these arguments are iterators, you may see strange behavior and missing values
  • Python’s iterator protocol defines how containers and iterators interact with iter and next built-in functions, for loops and related expressions
  • You can easily define your own iterable container type by implementing the __iter__ method as generator
  • You can detect that a value is an iterator (rather than a container) if calling iter on it produces the same value as what you passed in
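
A minimal sketch of the defensive check described in the last bullet (normalize is an illustrative function that needs to iterate its input twice):

def normalize(numbers):
    # reject plain iterators, which would be silently exhausted after one pass
    if iter(numbers) is numbers:
        raise TypeError('Must supply a container, not an iterator')
    total = sum(numbers)
    return [100 * value / total for value in numbers]

print(normalize([15, 35, 80]))      # works: a list can be iterated twice
# normalize(iter([15, 35, 80]))     # raises TypeError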

Consider Generator Expressions for Large List Comprehensions

  • List comprehensions can cause problems for large inputs by using too much memory.
  • Generator expressions avoid memory issues by producing outputs one at a time as iterators
  • Generator expressions can be composed by passing the iterator from one generator expression into the for subexpression of another
  • Generator expressions execute very quickly when chained together and are memory efficient

Compose Multiple Generators with yield from

  • The yield from expression allows you to compose multiple nested generators together into a single combined generator
  • yield from provides better performance than manually iterating nested generators and yielding their outputs

Avoid Injecting Data into Generators with send

  • Python generators support the send method, which upgrades yield expressions into a two-way channel. The send method can be used to provide streaming inputs to a generator at the same time it’s yielding outputs.
  • The send method can be used to inject data into a generator by giving the yield expression a value that can be assigned to a variable
  • using send with yield from expressions may cause surprising behavior, such as None values appearing at unexpected times in the generator output
  • Providing an input iterator to a set of composed generators is a better approach than using the send method

Avoid Causing State Transitions in Generators with throw

  • The way throw works is simple: When the method is called, the next occurrence of a yield expression re-raises the provided Exception instance after its output is received instead of continuing normally
  • The throw method can be used to re-raise exceptions within generators at the position of the most recently executed yield expression

Consider itertools for working with iterators and generators

  • use chain to combine multiple iterators into a single sequential iterator
  • use repeat to output a single value forever
  • use cycle to repeat an iterator’s items forever
  • use tee to split a single iterator into a number of parallel iterators
  • use islice to slice an iterator by numerical indexes without copying
  • use takewhile and dropwhile to filter iterator values
  • accumulate folds an item from an iterator into a running value by applying a function that takes two parameters
  • product returns the cartesian product of items from one or more iterators
  • permutations returns the unique ordered permutations of length N with items from an iterator
  • The itertools functions fall into three main categories for working with iterators and generators - linking iterators together, filtering items they output, and producing combinations of items
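
A few of these helpers in action, as a quick sketch:

from itertools import chain, islice, accumulate, product

print(list(chain([1, 2], [3, 4])))            # [1, 2, 3, 4]
print(list(islice(range(10), 2, 8, 2)))       # [2, 4, 6]
print(list(accumulate([1, 2, 3, 4])))         # [1, 3, 6, 10]
print(list(product([1, 2], ['a', 'b'])))      # [(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')]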

Classes and Interfaces

Compose classes instead of nesting many levels of built-in types

  • Avoid making dictionaries with values that are dictionaries, long tuples or complex nestings of other built-in types
  • Use namedtuple for lightweight, immutable data containers before you need the flexibility of a full class
  • Move your bookkeeping code to using multiple classes when your internal state dictionaries get complicated
  • Although a namedtuple is useful in many circumstances, it’s important to understand when it can do more harm than good:
    • You can’t specify default argument values for the namedtuple classes. This makes them unwieldy when your data may have many optional properties
    • The attribute values of namedtuple instances are still accessible using numerical indexes and iteration

Accept Functions Instead of Classes for Simple Interfaces

  • Instead of defining and instantiating classes, you can often simply use functions for simple interfaces between components in Python
  • References to functions and methods in Python are first class, meaning they can be used in expressions
  • The __call__ special method enables instances of a class to be called like plain Python functions
  • When you need a function to maintain state, consider defining a class that provides the __call__ method instead of defining a stateful closure
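
A small sketch of a stateful callable along those lines (CountMissing is an illustrative name):

from collections import defaultdict

class CountMissing:
    def __init__(self):
        self.added = 0

    def __call__(self):
        # invoked whenever the instance itself is called like a function
        self.added += 1
        return 0

counter = CountMissing()
current = {'green': 12, 'blue': 3}
increments = [('red', 5), ('blue', 17), ('orange', 9)]

result = defaultdict(counter, current)   # the instance works as the default factory
for key, amount in increments:
    result[key] += amount

print(counter.added)                     # 2 missing keys were encountered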

Use @classmethod Polymorphism to Construct Objects Generically

  • Polymorphism enables multiple classes in a hierarchy to implement their own unique versions of a method. This means that many classes can fulfill the same interface or abstract base class while providing different functionality
  • Use @classmethod to define alternative constructor for your classes
  • Use class method polymorphism to provide generic ways to build and connect many concrete subclasses
  • https://realpython.com/courses/threading-python/

Initialize Parent Classes with super

  • Python’s standard method resolution order (MRO) solves the problems of superclass initialization order and diamond inheritance
  • Use the super built-in function with zero arguments to initialize parent classes

Metaclasses and Attributes

A metaclass lets you intercept Python’s class statement and provide special behavior each time a class is defined

Use Plain attributes instead of Setter and Getter methods

  • In Python, you never need to implement explicit setter or getter methods.
  • property() is a built-in function that creates and returns a property object. The syntax of this function is
property(fget=None, fset=None, fdel=None, doc=None)

where

  • fget is the function to get the value of the attribute
  • fset is the function to set the value of the attribute
  • fdel is the function to delete the attribute
  • doc is a string
  • Define new class interfaces using simple public attributes and avoid defining setter and getter methods
  • Use @property to define special behavior when attributes are accessed on your objects, if necessary
  • Follow the rule of least surprise and avoid odd side effects in your @property methods
  • Ensure that @property methods are fast; for slow or complex work - especially involving I/O or causing side effects - use normal methods instead
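
A minimal sketch of special behavior on attribute access, with validation in the setter (the class and attribute names are illustrative):

class Resistor:
    def __init__(self, ohms):
        self.ohms = ohms          # goes through the property setter below

    @property
    def ohms(self):
        return self._ohms

    @ohms.setter
    def ohms(self, value):
        if value <= 0:
            raise ValueError(f'ohms must be > 0; got {value}')
        self._ohms = value

r = Resistor(1e3)
r.ohms = 470                      # plain attribute syntax, validation still runs
# r.ohms = 0                      # would raise ValueError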

Consider @property instead of refactoring attributes

  • Use @property to give existing instance attributes new functionality
  • Make incremental progress towards better data models by using @property
  • Consider refactoring a class and all call sites when you find yourself using @property too heavily

Use Descriptors for Reusable @property methods

  • The big problem with the @property built-in is reuse. The methods it decorates can’t be reused for multiple attributes of the same class.
  • The descriptor protocol defines how attribute access is interpreted by the language. A descriptor class can provide __get__ and __set__ methods that let you reuse any validation logic without boilerplate
  • weakref module: This module provides a special class called WeakKeyDictionary that can take the place of a simple dictionary. The unique behavior of WeakKeyDictionary is that Python does the bookkeeping, so the dictionary will be empty when all of its keys are no longer in use
  • Reuse the behavior and validation of @property methods by defining your own descriptor classes
  • Use WeakKeyDictionary to ensure that the descriptor classes don’t cause memory leaks
  • Don’t get bogged down trying to understand exactly how __getattribute__ uses the descriptor protocol for getting and setting attributes

Use __getattr__, __getattribute__, and __setattr__ for Lazy Attributes

I learned that it is important to pay attention to whether your classes have an implementation of __getattribute__

  • Use __getattr__ and __setattr__ to lazily load and save attributes for an object
  • Understand that __getattr__ only gets called when accessing a missing attribute, whereas __getattribute__ gets called every time any attribute is accessed
  • Avoid infinite recursion in __getattribute__ and __setattr__ by using methods from super() to access instance attributes

Validate Subclasses with __init_subclass__

  • A metaclass is defined by inheriting from type
  • A metaclass receives the contents of the associated class statements in its __new__ method
  • The metaclass has access to the name of the class, the parent classes it inherits from and all the class attributes that are defined in the class body
  • Python 3.6 introduced the simplified syntax __init_subclass__, which can be used to validate a class hierarchy, as sketched below
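
A small sketch of subclass validation using __init_subclass__, with an illustrative polygon hierarchy:

class Polygon:
    sides = None    # must be overridden by subclasses

    def __init_subclass__(cls):
        super().__init_subclass__()
        # runs once for every subclass definition, before any instance exists
        if cls.sides is None or cls.sides < 3:
            raise ValueError('Polygons need 3+ sides')

class Triangle(Polygon):
    sides = 3       # fine

# class Line(Polygon):
#     sides = 2     # would raise ValueError at class-definition time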

Register Class Existence with __init_subclass__

  • Class registration is a helpful pattern for building modular Python programs
  • Metaclasses let you run registration code automatically each time a base class is subclassed in a program
  • Using metaclasses for class registration helps you avoid errors by ensuring that you never miss a registration call
  • Prefer __init_subclass__ over standard metaclass machinery because it’s clearer and easier for beginners to understand

Concurrency and Parallelism

Use subprocess to manage child processes

  • Python has many ways to run subprocesses, but the best choice for managing child processes is to use the subprocess built-in module
  • Child processes run in parallel with the Python interpreter, enabling you to maximize your usage of CPU cores
  • Use the run convenience function for simple usage, and the Popen class for advanced usage like UNIX-style pipelines
  • Use the timeout parameter of the communicate method to avoid dead-locks and hanging child processes
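
A minimal sketch of the run convenience function (the echo command assumes a UNIX-like system):

import subprocess

# run a child process, capture its output, and raise if it exits with an error
result = subprocess.run(
    ['echo', 'Hello from the child!'],
    capture_output=True,
    encoding='utf-8',
    check=True,
)
print(result.stdout)   # Hello from the child!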

Use threads for blocking I/O, Avoid for Parallelism

Because of the way CPython works, threading may not speed up all tasks. This is due to interactions with the GIL that essentially limit execution to one Python thread at a time.

  • The standard implementation of Python is called CPython. CPython runs a Python program in two steps. First it parses and compiles the source text into bytecode, which is a low-level representation of the program. Then, CPython runs the bytecode using a stack-based interpreter. The bytecode interpreter has state that must be maintained and coherent while the program executes. CPython enforces coherence with GIL
  • GIL is a mutex that prevents CPython from being affected by preemptive multithreading, where one thread takes control of a program by interrupting another thread.
  • Why does Python support threads at all?
    • Multiple threads make it easy for a program to seem like it’s doing multiple things at the same time. Managing the juggling act of simultaneous tasks is difficult to implement yourself. With threads, you can leave it to Python to run your function concurrently
    • Helps in dealing with blocking I/O which happens when Python does certain types of system calls
  • All system calls will run in parallel from multiple Python threads even though they are limited by the GIL. The GIL prevents my Python code from running in parallel but it doesn’t have an effect on system calls. This works because Python threads release the GIL just before they make system calls, and they reacquire the GIL as soon as the system calls are done
  • Use Python threads to make multiple system calls in parallel. This allows you to do blocking I/O at the same time as the computation

Use Lock to prevent data races in threads

  • Although only one Python thread runs at a time, a thread’s operations on data structures can be interrupted between any two bytecode instructions in the Python interpreter
from threading import Thread

class Counter:
    def __init__(self):
        self.count = 0

    def increment(self, offset):
        self.count += offset

def worker(sensor_index, how_many, counter):
    for _ in range(how_many):
        counter.increment(1)

how_many = 10 ** 5
counter = Counter()

threads = []
for i in range(5):
    thread = Thread(target=worker, args=(i, how_many, counter))
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join()

expected = how_many * 5
found = counter.count
print(f"Counter should be {expected}, got {found}")

# Counter should be 500000, got 374258

The python interpreter enforces fairness between all of the threads that are executing to ensure they get roughly equal processing time. To do this, Python suspends a thread as it’s running and resumes another thread in turn. The problem is that you don’t know exactly when Python will suspend your threads. A thread can even be paused seemingly halfway through what looks like an atomic operation

The above program can be easily modified with the help of Lock to get the desired output

from threading import Thread
from threading import Lock

class Counter:
    def __init__(self):
        self.count = 0
        self.lock = Lock()

    def increment(self, offset):
        with self.lock:
            self.count += offset

def worker(sensor_index, how_many, counter):
    for _ in range(how_many):
        counter.increment(1)

how_many = 10 ** 5
counter = Counter()

threads = []
for i in range(5):
    thread = Thread(target=worker, args=(i, how_many, counter))
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join()

expected = how_many * 5
found = counter.count
print(f"Counter should be {expected}, got {found}")

# Counter should be 500000, got 500000

Use Queue to Coordinate Work Between Threads

  • Pipelines are a great way to organize sequences of work - especially I/O bound programs - that run concurrently using multiple Python threads
  • Be aware of the many problems in building concurrent pipelines: busy waiting, how to tell workers to stop, and potential memory explosion
  • The Queue class has all the facilities you need to build robust pipelines: blocking operations, buffer sizes and joining

Know How to Recognize When Concurrency is Necessary

  • A program often grows to require multiple concurrent lines of execution as its scope and complexity increases
  • The most common types of concurrency coordination are fan-out(generating new units of concurrency) and fan-in (waiting for existing units of concurrency to complete)
  • Python has many different ways of achieving fan-out and fan-in

Avoid Creating New Thread Instances for On-demand Fan-out

  • Thread instances require special tools to coordinate with each other safely. This makes code that uses threads harder to reason about than the procedural, single-threaded code from before. This complexity makes threaded code more difficult to extend and maintain over time
  • Threads require a lot of memory - about 8 MB per executing thread. On many computers, that amount of memory doesn’t matter for, say, 100 threads. But if you spawn 10,000 threads, it becomes an issue, as you would need about 80 GB of memory
  • Starting a thread is costly, and threads have a negative performance impact when they run due to context switching between them. In this case, all of the threads are started and stopped each generation of the game, which has high overhead and will increase latency beyond the expected I/O time
  • The Thread class will independently catch any exceptions that are raised by the target function and then write their traceback to sys.stderr. Such exceptions are never re-raised to the caller that started the thread in the first place
  • Threads have many downsides. They’re costly to start and run if you need a lot of them, they each require a significant amount of memory, and they require special tools like Lock instances for coordination
  • Threads do not provide a built-in way to raise exceptions back in the code that started a thread or that is waiting for one to finish, which makes them difficult to debug

Understand How Using Queue for Concurrency Requires Refactoring

  • Using Queue instances with a fixed number of worker threads improves the scalability of fan-out and fan-in using threads.
  • It takes a significant amount of work to refactor existing code to use Queue, especially when multiple stages of a pipeline are required
  • Using Queue fundamentally limits the total amount of I/O parallelism a program can leverage compared to alternative approaches provided by other built-in Python features and modules

Consider ThreadPoolExecutor when threads are necessary for concurrency

  • Python includes the concurrent.futures built-in module, which provides the ThreadPoolExecutor class. It combines the best of Thread and Queue
  • The threads used for the executor can be allocated in advance, which means there is no startup cost for each execution
  • ThreadPoolExecutor automatically propagates exceptions back to the caller
  • The big problem with using ThreadPoolExecutor is that it won’t be able to scale
  • Although ThreadPoolExecutor eliminates the potential memory blow-up issues of using threads, it also limits I/O parallelism by requiring max_workers to be specified upfront
  • ThreadPoolExecutor enables simple I/O parallelism with limited refactoring, easily avoiding the cost of thread startup each time fan out concurrency is required
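
A small sketch of the fan-out / fan-in pattern with ThreadPoolExecutor (the squaring function just stands in for some blocking I/O):

from concurrent.futures import ThreadPoolExecutor

def slow_square(x):
    # stand-in for a blocking I/O call
    return x * x

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(slow_square, i) for i in range(8)]   # fan-out
    results = [f.result() for f in futures]                     # fan-in; re-raises exceptions

print(results)   # [0, 1, 4, 9, 16, 25, 36, 49]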

Achieve Highly Concurrent I/O with Coroutines

  • Python addresses the need for highly concurrent I/O with coroutines. Coroutines let you have a very large number of seemingly simultaneous functions in your Python programs.
  • The cost of starting a coroutine is a function call. Once a coroutine is active, it uses less than 1 KB of memory until it’s exhausted
  • Like threads, coroutines are independent functions that can consume inputs from their environment and produce resulting outputs. The difference is that coroutines pause at each await expression and resume executing an async function after the pending awaitable is resolved
  • The magic mechanism powering coroutines is the event loop, which can do highly concurrent I/O efficiently while rapidly interleaving execution between appropriately written functions. The beauty of coroutines is that they decouple your code’s instructions for the external environment from the implementation that carries out your wishes.
  • Coroutines can use fan-out and fan-in in order to parallelize I/O while also overcoming all the problems associated with doing I/O in threads

Know how to port threaded I/O to asyncio

  • Python’s support for asynchronous execution is well integrated into the language
  • Python provides asynchronous versions of for loops, with statements, generators, comprehensions and library helper functions that can be used as drop-in replacements in coroutines
  • The asyncio built-in module makes it straightforward to port existing code that uses threads and blocking I/O over to coroutines and asynchronous I/O

Consider concurrent.futures for True Parallelism

  • It enables Python to utilize multiple CPU cores in parallel by running additional interpreters as child processes. These child processes are separate from the main interpreter, so their global interpreter locks are also separate. Each child can fully utilize one CPU core. Each child has a link to the main process where it receives instructions to do computation and returns results
  • What does ProcessPoolExecutor do ?
    • It takes each item from the args list
    • It serializes the item into binary data using the pickle module
    • It copies the serialized data from the main interpreter process to a child interpreter process over a local socket
    • It deserializes the data back into Python objects, using pickle in the child process
    • It imports the Python module containing the relevant function
    • It runs the function on the input data in parallel with other child processes
    • It serializes the results back into binary data
    • It copies the binary data back through the socket
    • It deserializes the binary data back into Python objects in the parent process
    • It merges the results from multiple children
  • Moving CPU bottlenecks to C-extension modules can be an effective way to improve performance while maximizing your investment in Python code
  • The multiprocessing module provides powerful tools that can parallelize certain types of Python computation with minimal effort
  • The power of multiprocessing is best accessed through the concurrent.futures built-in module
  • Avoid the advanced parts of multiprocessing module until you have exhausted all other options

Robustness and Performance

Take Advantage of Each Block in try/except/else/finally

  • use try/finally when you want exceptions to propagate up but also want to run cleanup code even when exceptions occur
  • use try/except/else to make it clear which exceptions will be handled by your code and which exceptions will propagate up
  • Use try/except/else/finally when you want to do it all in one compound statement. For example, say that I want to read a description of work to do from a file, process it, and then update the file in-place. The try block is used to read the file and process it; the except block is used to handle exceptions from the try block that are expected; the else block is used to update the file in place and allow related exceptions to propagate up; and the finally block cleans up the file handle
  • The else block helps you minimize the amount of code in try blocks and visually distinguish the success case from the try/except blocks
  • An else block can be used to perform additional actions after a successful try block but before common cleanup in a finally block
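
A compact sketch of all four blocks together (the file-handling example is simplified relative to the book’s version):

import json

def load_json_key(path, key):
    handle = open(path)                   # may raise OSError
    try:
        data = handle.read()              # may raise UnicodeDecodeError
        value = json.loads(data)[key]     # may raise json.JSONDecodeError / KeyError
    except json.JSONDecodeError:
        return None                       # expected failure: invalid JSON
    else:
        print(f'found {key} = {value!r}') # success-only code, kept out of the try block
        return value
    finally:
        handle.close()                    # cleanup always runs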

Consider contextlib and with Statements for Reusable try/finally Behavior

  • the with statement in Python is used to indicate when code is running in a special context.
  • It is easy to make your objects and functions work in with statements using the contextlib built-in module. This module contains the contextmanager decorator which lets a simple function be used in with statements. This is much easier than defining a new class with special methods __enter__ and __exit__
  • The context manager passed to a with statement may also return an object. The object is assigned to a local variable in the as part of the compound statement
  • The value yielded by context managers is supplied to the as part of the with statement. It is useful for letting your code directly access the cause of a special context
  • The contextlib built-in module provides a contextmanager decorator that makes it easy to use your own functions in with statements

Use datetime instead of time for Local clocks

  • The time module fails to work properly for multiple local times. Thus, you should avoid using the time module for this purpose. If you must use time, use it only to convert between UTC and the host computer’s local time.
  • datetime only provides the machinery for time zone operations with its tzinfo class and related methods. The Python default installation is missing time zone definitions besides UTC
  • To use pytz effectively, you should always convert local times to UTC first. Perform any datetime operations you need on the UTC values. Then convert to local times as a final step
  • Always represent time in UTC and do conversions to local time as the very final step before presentation

Make pickle reliable with copyreg

  • The purpose of pickle is to let you pass Python objects between programs that you control over binary channels
  • If you serialize, deserialize and then serialize again making changes to the classes, there will be inconsistency between previous serialized objects and the most recently serialized objects
  • Deserializing previously pickled objects may break if the classes involved have changed over time
  • The copyreg module lets you register the functions responsible for serializing and deserializing python objects, allowing you to control the behavior of pickle and make it more reliable
  • Use the copyreg built-in module with pickle to ensure backward compatibility of serialized objects

Use decimal when precision is paramount

  • The Decimal class from the decimal built-in module provides fixed point math of 28 decimal places by default
  • The Decimal class is ideal for situations that require high precision and control over rounding behavior, such as computations of monetary values
  • Pass str instances to the Decimal constructor instead of float instances if it’s important to compute exact answers and not floating point approximations
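
A quick sketch of the float vs Decimal difference for a monetary-style computation:

from decimal import Decimal, ROUND_UP

rate = 1.45
seconds = 3 * 60 + 42
print(rate * seconds / 60)                # 5.364999999999999 - floating point error

rate = Decimal('1.45')                    # pass a str, not a float
cost = rate * Decimal(seconds) / Decimal(60)
print(cost)                               # 5.365
print(cost.quantize(Decimal('0.01'), rounding=ROUND_UP))   # 5.37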

Profile before optimizing

  • Python provides a built-in profiler for determining which parts of a program are responsible for its execution time. This means you can focus your optimization efforts on the biggest sources of trouble and ignore parts of the program that don’t impact speed
  • Python provides two built-in profilers: one that is pure Python and another that is a C-extension module. The cProfile built-in module is better because of its minimal impact on the performance of your program while it’s being profiled
  • The Profile object’s runcall method provides everything you need to profile a tree of function calls in isolation
  • The Stats object lets you select and print the subset of profiling information you need to see to understand your program’s performance.

Prefer deque for Producer-Consumer Queues

  • the list type can be used as a FIFO queue by having the producer call append to add items and the consumer call pop(0) to receive items. However, this may cause problems because the performance of pop(0) degrades superlinearly as the queue length increases.
  • The deque class from the collections built-in module takes constant time - regardless of length - for append and popleft, making it ideal for FIFO queues.

Consider Searching Sorted Sequences with bisect

  • Searching sorted data contained in a list takes linear time using the index method or a for loop with simple comparisons
  • The bisect built-in module’s bisect_left function takes logarithmic time to search for values in sorted lists, which can be orders of magnitude faster than other approaches
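
A tiny sketch of bisect_left on a sorted list:

from bisect import bisect_left

data = list(range(10 ** 5))
index = bisect_left(data, 91234)    # logarithmic-time search in sorted data
print(index)                        # 91234

# for an exact-match test, confirm the value at the returned index
assert data[index] == 91234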

Know How to Use heapq for Priority Queues

Testing and Debugging

Consider Interactive Debugging with pdb

  • In most other programming languages, you use a debugger by specifying what line of a source file you would like to stop on, and then execute the program. In contrast, with Python, the easiest way to use the debugger is by modifying your program to directly initiate the debugger just before you think you’ll have an issue worth investigating
  • Three very useful commands make inspecting the running program easier
    • where
    • up
    • down
  • When you are done inspecting the current state, you can use these five debugger commands to control the program’s execution
    • step
    • next
    • return
    • continue
    • quit
  • The Python debugger prompt is a full Python shell that lets you inspect and modify the state of a running program

Use tracemalloc to understand memory usage and leaks

  • Memory management in the default implementation of Python, CPython, uses reference counting. This ensures that as soon as all references to an object have expired, the referenced object is also cleared from memory, freeing up that space for other data. CPython also has a built-in cycle detector to ensure that self-referencing objects are eventually garbage collected. In theory, this means that most Python programmers don’t have to worry about allocating or deallocating memory in their programs
  • One of the first ways to debug memory usage is to ask the gc built-in module to list every object currently known by the garbage collector.
  • It can be difficult to understand how Python programs use and leak memory
  • The gc module can help you understand which objects exist, but it has no information about how they were allocated
  • The tracemalloc built-in module provides powerful tools for understanding the sources of memory usage

Collaboration

Know where to find community-built modules

  • The Python Package Index contains a wealth of common packages that are built and maintained by the Python community
  • pip is the command line tool you can use to install packages from PyPI
  • The majority of PyPI modules are free and open source software

Use Virtual environments for isolated and reproducible environments

  • Virtual environments allow you to use pip to install many different versions of the same package on the same machine without conflicts
  • Virtual environments are created with python -m venv, enabled with source bin/activate, and disabled with deactivate
  • You can dump all the requirements of an environment with python3 -m pip freeze
  • You can reproduce an environment by running python3 -m pip install -r requirements.txt

Write Docstrings for every function, class and Module

  • Documentation in Python is extremely important because of the dynamic nature of the language. Python provides built-in support for attaching documentation to blocks of code. Unlike with many other languages, the documentation from the program’s source code is directly accessible as the program runs
  • You can use the built-in pydoc module from the command line to run a local web server that hosts all the Python documentation that’s accessible to your interpreter
  • Each module should have a top-level docstring - a string literal that is the first statement in the source file. The goal of this doc string is to introduce the module and its contents
  • If you are using type annotations, omit the information that’s already present in type annotations from docstrings since it would be redundant to have it in both places
  • For functions and methods: Document every argument, returned value, raised exception and other behaviors in the docstring following the def statement
  • For classes: Document behavior, important attributes, and subclass behavior in the docstring following the class statement

Use Packages to Organize Modules and Provide Stable APIs

  • Packages in Python are modules that contain other modules. Packages allow you to organize your code into separate, non-conflicting namespaces with unique absolute module names
  • Simple packages are defined by adding an __init__.py to a directory that contains other source files. These files become the child modules of the directory’s package. Package directories may also contain other packages
  • You can provide an explicit API for a module by listing its publicly visible names in its __all__ special attribute
  • You can hide a package’s internal implementation by only importing public names in the package’s __init__.py file or by naming internal-only members with a leading underscore
  • When collaborating within a single team or a single codebase, using __all__ for explicit APIs is probably unnecessary

Consider Module scoped code to configure deployment environments

  • Programs often need to run in multiple deployment environments that each have unique assumptions and configurations
  • You can tailor a module’s contents to different deployment environments by using normal Python statements in module scope
  • Module contents can be the product of any external condition including host introspection through the sys and os modules

Define a Root Exception to Insulate Callers from APIs

  • Root exceptions let callers understand when there’s a problem with their usage of an API. If callers are using the API properly, they should catch the various exceptions that are deliberately raised
  • root exceptions also help in finding bugs
  • Intermediate root exceptions let you add more specific types of exceptions in the future without breaking your API consumers
  • Catching the Python Exception base class can help you find bugs in API implementations

Know how to break circular dependencies

  • When a module is imported, here’s what Python actually does
    • Searches for a module in locations from sys.path
    • Loads the code from the module and ensures that it compiles
    • Creates a corresponding empty module object
    • Inserts the module into sys.modules
    • Runs the code in the module object to define its contents
  • The attributes of a module aren’t defined until the code for those attributes has executed. But the module can be loaded with the import statement immediately after it’s inserted into sys.modules
  • Dynamic imports are the simplest solution for breaking a circular dependency between modules while minimizing refactoring and complexity

Consider warnings to Refactor and Migrate Usage

  • Using warnings is a programmatic way to inform other programmers that their code needs to be modified due to a change to an underlying library that they depend on. While exceptions are primarily for automated error handling by machines, warnings are all about communication between humans about what to expect in their collaboration with each other
  • warnings.warn also supports the stacklevel parameter, which makes it possible to report the correct place in the stack as the cause of the warning. stacklevel also makes it easy to write functions that can issue warnings on behalf of other code, reducing boilerplate.

Consider Static Analysis via typing to Obviate Bugs – WORK IN PROGRESS

  • the benefit of adding type information to a Python program is that you can run static analysis tools to ingest a program’s source code and identify where bugs are most likely to occur. The typing built-in module doesn’t actually implement any of the type checking functionality itself. It merely provides a common library for defining types, including generics, that can be applied to Python code and consumed by separate tools
  • Most popular implementations of typing tools are mypy , pytype, pyright, pyre
  • There are many new constructs in this chapter that I have never paid attention to. In fact, I have hardly written any code that uses the typing module for annotations. I should probably spend some time going over the typing module and incorporate it in my daily work
  • A wide variety of other options are available in the typing module. Notably, exceptions are not included. Exceptions are not considered part of an interface’s definition. Thus, if you want to verify that you are raising and catching exceptions properly, you need to write tests
  • It’s going to slow you down if you try to use type annotations from the start when writing a new piece of code. A general strategy is to write a first version without annotations, then write tests, and then add type information where it’s most valuable
  • Type hints are most important at the boundaries of a codebase, such as an API you provide that many callers depend on. Type hints complement integration tests and warnings to ensure that your API callers aren’t surprised or broken by your changes
  • It can be useful to apply type hints to the most complex and error-prone parts of your code that aren’t part of an API
  • If possible, you should include static analysis as part of your automated build and test system to ensure that every commit to your codebase is vetted for errors. In addition, the configuration used for type checking should be maintained in the repository to ensure that all of the people you collaborate with are using the same rules
  • As you add type information to your code, it’s important to run type checker as you go. Otherwise, you may nearly finish sprinkling type hints everywhere and then be hit by a huge wall of errors from the type checking tool, which can be disheartening and make you want to abandon type hints altogether
  • It’s important that in many situations, you may not need or want to use any type annotations at all. For small programs, adhoc code, legacy codebases, and prototypes, type hints may require far more effort than they are worth
  • Python has special syntax and the typing built-in module for annotating variables, fields, functions and methods with type information
  • Static type checkers can leverage type information to help you avoid many common bugs that would otherwise happen at runtime
  • There are a variety of best practices for adopting types in your programs, using them in APIs, and making sure they don’t get in the way of your productivity.
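
A very small sketch of what such annotations look like, with mypy as the assumed checker (find_index is an illustrative function):

from typing import List, Optional

def find_index(values: List[int], target: int) -> Optional[int]:
    """Return the index of target in values, or None if it is absent."""
    for i, value in enumerate(values):
        if value == target:
            return i
    return None

# a checker such as mypy would flag this call: str is not compatible with int
# find_index([1, 2, 3], 'x')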

Takeaway

This book is targeted towards intermediate-level Python developers and can be a useful reference for writing beautiful code. If you are writing throwaway code most of the time, then you can probably give this book a pass. However, if you are writing, or intend to write, code that will be reused by you or others, now or in the future, this book can be a valuable reference for writing effective code.