This blog post summarizes the book titled “Python Tricks - A Buffet of Awesome Python Features”

Covering your A** with Assertions

  • An assertion error should never be raised unless there is a bug in your program
  • In computer programming jargon, a heisenbug is a software bug that seems to disappear or alter its behavior when one attempts to study it. The term is a pun on the name of Werner Heisenberg, the physicist who first asserted the observer effect of quantum mechanics, which states that the act of observing a system inevitably alters its state. In electronics the traditional term is probe effect, where attaching a test probe to a device changes its behavior.
  • assert can be globally disabled with the -O and -OO command-line switches, as well as the PYTHONOPTIMIZE environment variable
  • Never use assert statements to validate data
  • It is surprisingly easy to write asserts that never fail
  • assert statement is a debugging aid that tests a condition as an internal self-check in your program
  • Asserts should only be used to help developers identify bugs. They are not a mechanism for handling run-time errors
  • pytest tells you to write assert and the test condition in a single line
  • assert(1 == 2, 'This should fail') will never fail because assert sees a non-empty tuple, which always evaluates to True
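A quick sketch of that last pitfall (the function names are illustrative):

```python
def tuple_assert_passes():
    """Buggy form: asserting a tuple. A non-empty tuple is always truthy,
    so the assert can never fail (newer Pythons emit a SyntaxWarning)."""
    try:
        assert (1 == 2, 'This should fail')  # a tuple, not a condition!
        return True   # reached: the assert silently passed
    except AssertionError:
        return False

def plain_assert_fails():
    """Correct form: condition and message, no parentheses."""
    try:
        assert 1 == 2, 'This should fail'
        return True
    except AssertionError:
        return False
```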

Complacent comma placement

  • Multiple adjacent string literals, possibly using different quoting conventions, are allowed, and their meaning is the same as their concatenation
  • In python, you can place a comma after every item in a list, dict or set, including the last item
  • smart formatting and comma placement can make your list easy to maintain
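Both points can be seen in a small sketch:

```python
# Missing comma: adjacent string literals are silently concatenated.
names_buggy = [
    'Alice',
    'Bob'       # <- the missing comma merges this with the next literal
    'Dilbert',
]
# names_buggy == ['Alice', 'BobDilbert']

# A comma after every item, including the last, avoids the bug and
# keeps diffs to a single line when the list changes.
names = [
    'Alice',
    'Bob',
    'Dilbert',
]
```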

Context managers and the with statement

  • with statement makes acquiring and releasing resources a breeze
  • The alternative to using context manager is to write your own try and finally block
  • A context manager is nothing but an object that happens to have dunder enter and dunder exit method implemented
  • one can use contextlib and use the contextmanager decorator function to define a generator based factory function for a resource that will then automatically support the with statement
from contextlib import contextmanager
@contextmanager
def managed_file(name):
    try:
        f = open(name, 'w')
        yield f
    finally:
        f.close()
>>> with managed_file('hello.txt') as f:
...     f.write('hello, world!')
...     f.write('bye now')
  • Found an interesting implementation of indentation using contextmanager decorator
@contextmanager
def rkindentor():
    level=0
    @contextmanager
    def _indenter():
        nonlocal level
        try:
            level += 1
            yield
        finally:
            level -= 1
    def _print(text):
        print('\t' * level + text)
    _indenter.print = _print
    yield _indenter

with rkindentor() as indent:
    print("\n")
    indent.print("radha")
    with indent():
        indent.print("krishna")
        with indent():
            indent.print("pendyala")
        indent.print("hey")

Underscores, Dunders and more

  • There are five underscore patterns that one must be aware of in Python
    • Single leading underscore
    • Single trailing underscore
    • Double leading underscore
    • Double leading and trailing underscore
    • Single underscore
  • A single leading underscore is an agreed-upon convention that the variable is intended for internal use
    • If you do a wildcard import, the leading underscore variable and function will not be imported
    • If you do a regular import, the leading underscore variable and function will be imported
  • Single trailing underscore
    • Sometimes the most fitting name is already taken by a Python keyword or built-in such as class or print. The convention is to append a trailing underscore (e.g. class_) to keep using these names
  • Double leading underscore
    • The Python interpreter rewrites these names (prefixing them with the class name) so that subclasses cannot accidentally override them. This rewriting is called name mangling
  • Double leading and trailing underscore
    • variables starting with double leading and trailing underscore are not touched by Python interpreter
    • reserved for special use
  • Single underscore is meant to convey that the variable is a temporary or a throw away variable
  • In the interactive interpreter, _ is a special variable that holds the result of the last expression
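Name mangling can be observed directly (the class names are illustrative):

```python
class Base:
    def __init__(self):
        self.__secret = 42          # stored as _Base__secret

class Derived(Base):
    def __init__(self):
        super().__init__()
        self.__secret = 'mine'      # stored as _Derived__secret instead

d = Derived()
# The double-leading-underscore attribute of Base was not overridden.
```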

A shocking truth about string formatting

  • There are four ways to format strings in Python
  • % operator
    • There is a % operator on the string that can be used to do positional formatting of strings
    • If there are multiple substitutions that you need to make, it is better to bunch up all the variables in to a dictionary and then use the % operator
    • Using the % operator is called old style string formatting
  • format function - Python 3 - The new style is using the format function
  • f-strings: Python 3.6 - formatted string literals
    • Behind the scenes the formatted string literals are a Python parser feature that converts f-strings in to a series of string constants and expressions
  • Template:
    • One needs to import from standard library
from string import Template
Template('Hey $name').substitute(name=name)
  • Template strings are better from a safety perspective as they reduce security vulnerabilities to your program
  • Rule of thumb
    • If strings are user supplied, use Template strings
    • If you are using Python 3.6+, use formatted string literals
    • If you are using older Python 3, use format function
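The four styles side by side, with example values chosen so they all produce the same string:

```python
from string import Template

name, errno = 'Bob', 50159747054

old_style = 'Hey %s, there is a 0x%x error!' % (name, errno)
new_style = 'Hey {}, there is a 0x{:x} error!'.format(name, errno)
f_string  = f'Hey {name}, there is a 0x{errno:x} error!'
templated = Template('Hey $name, there is a $errno error!').substitute(
    name=name, errno=hex(errno))
# All four produce: 'Hey Bob, there is a 0xbadc0ffee error!'
```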

Python Functions are first-class

  • Python attaches a string identifier to every function at creation time that can be accessed by dunder name
  • functions can be stored in data structures
  • The ability to pass functions around is powerful as it allows to pass around behaviors in your program
  • functions can also return functions, i.e. return behaviors
  • A closure remembers the values from its enclosing lexical scope even when the program flow is no longer in that scope
  • A function can also preconfigure behaviors
  • All functions are objects, but not the other way around. An object can be made callable by implementing the dunder call method
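A compact sketch covering these bullets (the names are illustrative):

```python
def make_adder(n):
    # closure: add() remembers n from the enclosing lexical scope
    def add(x):
        return x + n
    return add

class Adder:
    # an object made callable by implementing dunder call
    def __init__(self, n):
        self.n = n
    def __call__(self, x):
        return x + self.n

funcs = [make_adder(3), Adder(3)]   # behaviors stored in a data structure
results = [f(4) for f in funcs]     # both behave like plain functions
```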

Lambdas are Single-Expression functions

  • Lambda functions are restricted to a single expression. They can't use annotations or statements
  • Executing a lambda executes the single expression and then returns the result of evaluating the expression
  • It is better to use list comprehensions and generator expressions as compared to using map and filter operations

The Power of decorators

  • Python's decorators allow you to extend and modify the behavior of a callable without permanently modifying the callable itself
  • Some of the usecases of decorators
    • logging
    • enforcing access control and authentication
    • instrumentation and timing functions
    • rate-limiting
    • caching, and more
  • decorators are applied from bottom to top
  • decorating functions that take arguments: this is done by the following template
def trace(func):
    def wrapper(*args, **kwargs):
        original_result = func(*args, **kwargs)
        return original_result
    return wrapper
  • use functools.wraps to carry the docstrings and parameter names to the decorated function
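A sketch of the template above extended with functools.wraps (the decorated function is illustrative):

```python
import functools

def trace(func):
    @functools.wraps(func)              # carries over __name__, __doc__, ...
    def wrapper(*args, **kwargs):
        original_result = func(*args, **kwargs)
        return original_result
    return wrapper

@trace
def greet(name):
    """Return a friendly greeting."""
    return f'Hello, {name}!'
```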

Fun with *args and **kwargs

  • *args and **kwargs make a function flexible: it can accept any number of positional and keyword arguments
  • It also gives an opportunity for the functions to modify the keyword arguments or positional arguments before passing along to other functions
  • *args collects extra positional arguments as a tuple. **kwargs collects the extra keyword arguments as a dictionary

Function Argument Unpacking

  • put a * before an iterable - Python will unpack it and pass the elements to the function
  • put a ** before a dictionary - Python will unpack it as keyword arguments and pass it along
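For example (print_vector is an illustrative name):

```python
def print_vector(x, y, z):
    return f'<{x}, {y}, {z}>'

tuple_vec = (1, 0, 1)
dict_vec = {'x': 1, 'y': 0, 'z': 1}

from_tuple = print_vector(*tuple_vec)    # unpacked as positional args
from_dict = print_vector(**dict_vec)     # unpacked as keyword args
```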

Nothing to Return here

  • Every Python function returns None if you do not explicitly specify a return statement
  • It is better to communicate the intent of your code by explicitly stating a return statement than avoiding one
  • code is communication

Object comparisons

  • == operator is used to check equality whereas is operator is used to check identities
  • is expression evaluates to True if they are pointing to the same object
  • == evaluates to True if the objects referred to by the variables are equal
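A quick sketch of the difference:

```python
a = [1, 2, 3]
b = a          # b points to the same object
c = list(a)    # c is an equal but distinct copy

same_identity = a is b        # True: same object
equal_copy = a == c           # True: equal contents
same_copy_identity = a is c   # False: different objects
```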

String conversions - Every class needs dunder repr

  • The fact that str and such methods start and end with a double underscore is simply a naming convention to flag them as core Python features
  • Inspecting an object in Python interpreter simply prints the results of repr method
  • When to use what, str or repr? It is better to make your repr strings unambiguous and helpful to developers
  • If you don't add a str method, Python falls back to the repr method
  • In Python 3, there’s one data type to hold all kinds of text in the world - str
  • In Python 2.x, there are two data types: str, which holds ASCII text, and unicode, which is equivalent to Python 3's str type
    • In Python 2.x, str returns bytes whereas unicode returns characters
  • Always use Python 3’s str
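A minimal sketch of a developer-friendly repr (the Car class is illustrative):

```python
class Car:
    def __init__(self, color, mileage):
        self.color = color
        self.mileage = mileage

    def __repr__(self):
        # unambiguous: ideally valid code to recreate the object
        return f'{self.__class__.__name__}({self.color!r}, {self.mileage!r})'

car = Car('red', 37281)
# With no __str__ defined, str() falls back to __repr__ here.
```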

Defining your own Exception classes

  • Custom Exception classes help the downstream applications/ developers make sense of errors without having to go through the source code implementation
  • One should have custom exception base class for a project and then derive all sorts of exceptions from this base class
  • defining custom exception classes makes it easier for your users to adopt an "easier to ask for forgiveness than permission" (EAFP) coding style that's considered more Pythonic

Cloning objects for fun and profit

  • Assignment statements in Python don’t create copies of object. They only bind names to the object. For immutable objects, it does not matter
  • For mutable objects, the usual constructors do a shallow copy. This means that one constructs a new collection object and then populates it with references to the child objects
  • One can use copy.deepcopy to do deep cloning
  • copy module gives the power to do shallow copying and deep copying
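A sketch of shallow versus deep copies:

```python
import copy

xs = [[1, 2], [3, 4]]
shallow = list(xs)          # new outer list, shared inner lists
deep = copy.deepcopy(xs)    # fully independent clone

xs[0].append(99)            # mutate a child object
# shallow[0] sees the change ([1, 2, 99]); deep[0] is still [1, 2]
```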

Abstract base classes keep Inheritance in check

  • abc module is useful to respect inheritance structure
from abc import ABCMeta, abstractmethod

class Base(metaclass=ABCMeta):
    @abstractmethod
    def foo(self):
        pass

    @abstractmethod
    def bar(self):
        pass

class Concrete(Base):
    def foo(self):
        pass

x = Concrete()  # TypeError: bar is still abstract

Base()  # TypeError: can't instantiate an abstract class

  • Using ABCs can help avoid bugs and make class hierarchies easier to maintain

What are Named Tuples good for ?

  • One cannot give names to various elements in a regular built-in tuple
  • NamedTuples are useful to give names to various elements in a tuple
  • These were added in Python 2.6 as a part of the collections module
  • NamedTuples can be thought of as a memory efficient shortcut to defining an immutable class in Python
  • NamedTuples can help clean up your code by enforcing easier-to-understand structure on your data
  • The _fields attribute lists the field names of a named tuple
  • Namedtuples provide a few useful helper methods that all start with a single underscore, but are part of the public interface. It’s okay to use them.
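For example:

```python
from collections import namedtuple

Car = namedtuple('Car', ['color', 'mileage'])
car = Car('red', 3812.4)

fields = Car._fields                  # ('color', 'mileage')
blue = car._replace(color='blue')     # new instance; original unchanged
```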

Class vs Instance variable pitfalls

  • There are two kinds of data attributes on Python objects - class variables and instance variables
  • you can access the class variables using instance or class
  • modifying a class variable on the class namespace affects all instances of the class
  • class variable can be shadowed by instance variables
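The shadowing pitfall in a few lines (the Dog class is illustrative):

```python
class Dog:
    num_legs = 4            # class variable, shared by all instances

jack, jill = Dog(), Dog()
jack.num_legs = 6           # creates an instance variable that shadows it
# jack.num_legs == 6, but jill.num_legs and Dog.num_legs are still 4
```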

Instance, Class and Static Methods Demystified

  • a class method receives the class (cls) as its first argument and hence cannot modify object instance state
  • a static method receives neither self nor cls and hence cannot modify object or class state
  • Python allows only one init method per class; by using @classmethod, you can create as many alternative constructors as you want
  • Put differently, using static methods and class methods are ways to communicate developer intent while enforcing that intent enough to avoid most “slip of the mind” mistakes and bugs that would break the design.
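A sketch of a class method used as an extra constructor (the Pizza class is illustrative):

```python
class Pizza:
    def __init__(self, ingredients):
        self.ingredients = ingredients

    @classmethod
    def margherita(cls):
        # alternative constructor: receives the class, not an instance
        return cls(['mozzarella', 'tomatoes'])

    @staticmethod
    def circle_area(r):
        # neither self nor cls: a plain function namespaced to the class
        return 3.14159 * r ** 2
```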

Dictionary, Maps and Hash tables

  • A hashable object is one whose hash value never changes in its lifetime
  • OrderedDict preserves the order in which keys have been inserted
  • defaultdict is another class in the collections module that accepts a callable in its constructor whose return value will be used if a requested key cannot be found
  • ChainMap groups multiple dictionaries into a single mapping for lookups
  • types.MappingProxyType is a wrapper around a standard dictionary that gives a read only dictionary
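The specialized mappings in action:

```python
from collections import defaultdict, ChainMap
from types import MappingProxyType

dd = defaultdict(list)
dd['dogs'].append('Rufus')      # missing key -> list() is called

chain = ChainMap({'one': 1}, {'one': 99, 'two': 2})  # first hit wins

read_only = MappingProxyType({'one': 1})
try:
    read_only['one'] = 2
    mutated = True
except TypeError:               # writes are rejected
    mutated = False
```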

Array Data Structure

  • Arrays are contiguous data structures
  • A restricted parking lot (one that admits only a single vehicle type) corresponds to a typed array
  • Python lists are implemented as DynamicArrays
    • data is loosely packed
  • Python tuple sizes are decided at the time of initialization. They are immutable and hence the data is tightly packed
  • Python’s array module provides space-efficient storage of basic C style data types like bytes, 32-bit integers, floating point numbers
  • Arrays created with array module are TypedArrays
  • Python 3.x uses str objects to store textual data as immutable sequences of unicode characters
  • strings are recursive data structures
  • Byte objects are immutable sequences of single bytes
  • Bytearray objects are mutable sequences of integers in the range of 0 to 255. They are closely related to bytes objects

Records, Structs and Data Transfer Objects

  • dict is an associative array
    • they are mutable and offer no protection against wrong field names
  • tuple is immutable but no protection against missing fields and wrong order
  • custom class
  • collections.namedtuple
  • typing.NamedTuple - similar to collections.namedtuple but with support for type hints
  • struct.Struct class converts between Python values and C structs
  • types.SimpleNamespace - a glorified dictionary that allows attribute access
  • If you’re looking for a safe default choice, my general recommendation for implementing a plain record, struct, or data object in Python would be to use collections.namedtuple in Python 2.x and its younger sibling, typing.NamedTuple in Python 3.

Sets and Multiset

  • set, frozenset, collections.Counter are mentioned in this chapter
  • frozenset can act as dictionary keys
  • Counter implements a multiset bag type

Stacks

  • stack is LIFO
  • queue is FIFO
  • To get the amortized O(1) performance for inserts and deletes, new items must be added to the end of the list with the append() method and removed again from the end using pop(). For optimum performance, stacks based on Python lists should grow towards higher indexes and shrink towards lower ones.
  • list can be considered as simple built-in stack
  • collections.deque implements a double ended queue that supports adding and removing elements from either sides
  • queue.LifoQueue for implementing LIFO
  • list is backed by a dynamic array which makes it great for fast random access, but requires occasional resizing when elements are added or removed. The list over-allocates its backing storage so that not every push or pop requires resizing, and you get an amortized O(1) time complexity for these operations. But you do need to be careful to only insert and remove items “from the right side” using append() and pop(). Otherwise, performance slows down to O(n).
  • collections.deque is backed by a doubly-linked list which optimizes appends and deletes at both ends and provides consistent O(1) performance for these operations. Not only is its performance more stable, the deque class is also easier to use because you don’t have to worry about adding or removing items from “the wrong end.”

Queues

  • a list is a terribly slow queue because inserting or removing at the front requires shifting every other element, which is O(n)
  • collections.deque can act as queue as it gives O(1) performance for adding elements at the beginning or end. However for random access it is O(n)
  • queue.Queue locking semantics for Parallel computing
  • multiprocessing.Queue - shared job queues that let items be processed in parallel by multiple workers
  • If you’re not looking for parallel processing support, the implementation offered by collections.deque is an excellent default choice for implementing a FIFO queue data structure in Python. It provides the performance characteristics you’d expect from a good queue implementation and can also be used as a stack (LIFO Queue).

Priority Queues

  • One can use several alternatives in Python to get a Priority queue implementation.
  • list can be used to get a priority queue. One can add elements, sort it manually so that elements are in the order of priority
  • heapq module is another alternative; it provides a list-based binary heap implementation
  • queue.PriorityQueue is an another alternative if you are looking for synchronization and locking semantics
  • queue.PriorityQueue stands out from the pack with a nice object-oriented interface and a name that clearly states its intent. It should be your preferred choice
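A minimal heapq-based priority queue (the tasks are illustrative):

```python
import heapq

q = []
heapq.heappush(q, (2, 'code'))
heapq.heappush(q, (1, 'eat'))
heapq.heappush(q, (3, 'sleep'))

# heappop always returns the smallest item, i.e. the highest priority
order = [heapq.heappop(q)[1] for _ in range(3)]
```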

Writing Pythonic Loops

  • avoid the range(len) pattern if you are iterating over a list or set or some built in structure
  • if you want index, you can use enumerate
  • if you are iterating over a python data structure, check to see if the object itself has functions useful for iterating over it
  • Avoid managing loop indexes and stop conditions manually if possible.
  • Python’s for loops are actually “for-each” loops that can directly iterate over the items of a container
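The range(len) anti-pattern versus the Pythonic version:

```python
names = ['Alice', 'Bob', 'Dilbert']

# Anti-pattern: manual index management
lines_bad = [f'{i}: {names[i]}' for i in range(len(names))]

# Pythonic: enumerate yields (index, item) pairs directly
lines = [f'{i}: {name}' for i, name in enumerate(names)]
```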

Comprehending Comprehensions

  • They are a key feature in Python
  • They are just fancy syntactic sugar for a simple for loop
  • Don’t use list comprehensions, dict comprehensions, set comprehensions for more than one level

List Slicing tricks and the Sushi operator

  • In Python 3, you can use list.clear()
  • One can use slicing to replace all elements of a list without creating new objects
  • lst[::] creates a shallow copy
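A sketch of these slicing tricks:

```python
lst = [1, 2, 3, 4, 5]
alias = lst                 # second name for the same list object

copy_of = lst[::]           # sushi operator: shallow copy, new object
lst[:] = [7, 8, 9]          # replace all elements without a new object
```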

Beautiful Iterators

  • a for-in loop is syntactic sugar that first calls the container’s __iter__ method, which returns an iterator object
  • The loop repeatedly calls __next__ method of the iterator
  • If you’ve ever worked with database cursors, this mental model will seem familiar: We first initialize the cursor and prepare it for reading, and then we can fetch data from it into local variables as needed, one element at a time.
  • iter(x) invokes dunder iter method
  • If you invoke the __next__ method after you have exhausted the iterator, it raises a StopIteration exception
  • To support iteration, an object needs to implement the dunder iter and dunder next methods
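A minimal iterator implementing both dunder methods (the Repeater class is illustrative):

```python
class Repeater:
    def __init__(self, value, max_repeats):
        self.value = value
        self.max_repeats = max_repeats
        self.count = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.count >= self.max_repeats:
            raise StopIteration     # signals the for loop to stop
        self.count += 1
        return self.value
```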

Generator Expressions

  • Once a generator expression has been consumed it cannot be reused. Hence in that sense a class based or method based generators have the added flexibility
  • They look similar to list comprehensions but do not generate any objects. Instead, they generate values “just in time”
  • They are best to implement simple adhoc iterators

Iterator Chains

  • You can chain iterators so that each iterator can be fed in to another iterator
  • Data processing happens one element at a time
integers = range(8)
squared = (i*i for i in integers)
negated = (-i for i in squared)
list(negated)
  • One can keep extending the chain of generators to build out a processing pipeline with many steps. It would still perform efficiently and could easily be modified because each step in the chain is an individual generator function
  • It can impact readability though

Dictionary Default Values

  • Avoid explicit key in dict checks when testing for membership
  • collections.defaultdict could be a better alternative

Sorting Dictionaries for Fun and Profit

  • one can use operator.itemgetter and operator.attrgetter as the key argument
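For example, sorting a dictionary by its values:

```python
import operator

xs = {'a': 4, 'c': 2, 'b': 3}
by_value = sorted(xs.items(), key=operator.itemgetter(1))
```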

Emulating Switch-case with dicts

  • One can use dictionary keys as conditions and push all the logic for each case as a lambda function or generic function as value for these keys
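A sketch of dict-based dispatch (dispatch_dict is an illustrative name):

```python
def dispatch_dict(op, x, y):
    return {
        'add': lambda: x + y,
        'sub': lambda: x - y,
        'mul': lambda: x * y,
    }.get(op, lambda: None)()   # the default handles the missing case
```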

Craziest Dict expression in the West

{True: 'yes', 1: 'no', 1.0: 'maybe'}

evaluates to

{True: 'maybe'}
  • Python treats bool as a subclass of int; True, 1 and 1.0 compare equal and hash identically, so the later values overwrite the earlier ones while the first key object is kept

So many ways to merge dictionaries

  • In Python 3.5 and above, one can use ** operator to merge multiple dictionaries
>>> x = {'a':121}
>>> y = {'b':2121}
>>> z = {**x, **y}
>>> z
{'a': 121, 'b': 2121}
  • To stay compatible with older versions, you can use update method

Dictionary pretty printing

  • A disadvantage of using json.dumps is that it cannot stringify objects that are not JSON-serializable, such as sets or custom classes
  • Alternative to json.dumps is to use pprint.pprint function

Exploring Python Modules and Objects

  • Use dir and help to explore modules and objects

Isolated Project dependencies with virtual env

  • Virtual environments keep your project dependencies separated. They help you avoid version conflicts between packages and different versions of the Python runtime.
  • As a best practice, all of your Python projects should use virtual environments to store their dependencies. This will help avoid headaches.

Peeking beyond the Bytecode curtain

  • CPython executes programs by first translating them into intermediate byte code and then running the bytecode on a stack based virtual machine
  • You can use the built-in dis module to peek behind the scenes and inspect the byte code
  • CPython is a VM - Virtual Machine. VM’s are everywhere on the cloud. It pays to read up on them
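For instance, the dis module can list the instruction names of a small function:

```python
import dis

def greet(name):
    return 'Hello, ' + name + '!'

# dis.dis(greet) pretty-prints the bytecode; dis.Bytecode exposes the
# same instructions programmatically.
opnames = [instr.opname for instr in dis.Bytecode(greet)]
```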