This book is often cited as the classic reference for Python programmers. Instead of diving into Python as the title suggests, I did some groundwork before going through this book. I went over Learn Python the Hard Way, Think Python and the Python Visual QuickStart Guide, and got the Python 101 material down. Those three books gave me some confidence to go over this book, which is supposedly for experienced programmers.

The book is organized in a very interesting way. The author starts off every chapter with the source code of a rather challenging Python program. The reader is expected to go over the code before reading through the chapter. Most of the time I was clueless about what was coded, but I kept moving and found that the author explains almost every line of code written, the rationale behind choosing a specific Python object, the specific programming style, and many other things in the code. The author manages to bring in some humor too, an element that is not often seen in programming books.

I have realized the power of reproducible research, thanks to R. So, the first thing to work on was literate programming in Python. I stumbled on Pweave, a great module for literate programming. As the name suggests, it is the Python counterpart of Sweave. A Pnw document can be converted to whatever output format you want. The intermediate step is typically to convert the Pnw document to a reST document, which can subsequently be converted to HTML, PDF, .doc, etc. This summary in HTML format is itself the output of a Pnw document (Pnw => reST => HTML).

I have learnt a ton of stuff from this book, such as generators, development approaches like TDD, the power of regular expressions, and many Python hacks, like optimizing lookups, that make programs compact and elegant.

In this post, I will try to summarize the main points of the book.

Chapter 2 - Your First Python Program

  • Python has several implementations, such as IronPython, Jython, PyPy and Stackless Python. The default interpreter from python.org is the CPython implementation

  • Every Python function returns something, either a value or None

  • Variables are never explicitly typed in Python. This sort of thing is called dynamic typing

  • Came across a very interesting comparison between typing in Python and typing in other languages

    • Statically typed language - A language in which types are fixed at compile time. Most statically typed languages enforce this by requiring you to declare all variables with their datatypes before using them. Java and C are statically typed languages.

    • Dynamically typed language - A language in which types are discovered at execution time; the opposite of statically typed. VBScript and Python are dynamically typed, because they figure out what type a variable is when you first assign it a value.

    • Strongly typed language - A language in which types are always enforced. Java and Python are strongly typed. If you have an integer, you can’t treat it like a string without explicitly converting it.

    • Weakly typed language - A language in which types may be ignored; the opposite of strongly typed. VBScript is weakly typed. In VBScript, you can concatenate the string ‘12’ and the integer 3 to get the string ‘123’, then treat that as the integer 123, all without any explicit conversion.

  • Python is both a dynamically typed and a strongly typed language. Once a variable has a datatype, it actually matters

  • The sys module is written in C, as are all the built-in modules

  • Everything in Python is an object, and almost everything has attributes and methods

  • The sys module is an object that has path as an attribute

  • The definition of an object in Python is rather loose. Everything is an object in the sense that it can be assigned to a variable or passed as an argument to a function. Some objects have neither attributes nor methods. Not all objects are subclassable

  • I thought that 4 spaces as code indent is a MUST. This chapter says that it is not necessary. The indentation only needs to be consistent

  • Indentation is a requirement and not a matter of style. Hence all programs look similar, which makes it easier to read and understand other people’s code

  • if __name__ trick: Modules are objects, and all modules have a built-in attribute __name__. A module’s __name__ depends on how you’re using the module. If you import the module, then __name__ is the module’s file name, without a directory path or file extension. If you run the module as a standalone program, __name__ will be the special default value __main__
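
The dynamic versus strong typing distinction above can be sketched in a few lines. This is a minimal illustration assuming Python 3; the names value, concat_number and mixing_allowed are made up for the example and do not come from the book.

```python
# Dynamic typing: the same name can be bound to values of different types.
value = 12
type_when_int = type(value).__name__
value = "12"
type_when_str = type(value).__name__

# Strong typing: mixing str and int needs an explicit conversion.
def concat_number(text, number):
    """Join a string and an integer by converting the integer explicitly."""
    return text + str(number)

try:
    "12" + 3                      # no implicit conversion, unlike VBScript
    mixing_allowed = True
except TypeError:
    mixing_allowed = False

print(type_when_int, type_when_str)    # int str
print(mixing_allowed)                  # False
print(int(concat_number("12", 3)))     # 123
```

The last line mirrors the VBScript example from the comparison, except that both conversions (str and int) have to be spelled out.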

Chapter 3 - Native Datatypes

  • Dictionary keys are case sensitive

  • Dictionaries support mixed keys. Dictionary values can be strings, integers, lists, dictionaries, etc. Keys, however, are restricted: they can be strings, integers and a few other (immutable) data types

  • Dictionaries are an efficient means of storing sparse data

  • Sorting a dictionary in three different ways

  • Lists have two methods, extend and append, that look like they do the same thing, but are in fact completely different. extend takes a single argument, which is a list (more generally, any iterable), and adds each of the elements of that list to the original list. On the other hand, append takes one argument, which can be any data type, and adds it as a single element

  • Python accepts anything in Boolean context according to the following rules

    • 0 is false

    • An empty string is false

    • An empty list is false

    • An empty tuple is false

    • An empty dictionary is false

  • remove only removes the first occurrence in the list

  • pop is an interesting beast as it removes the last element in the list as well as returns the deleted element

  • extend is faster than concatenating lists, as concatenation creates a new list whereas extend merely grows the existing one

  • Tuples are immutable objects. The book says they have no methods; newer versions of Python give them count and index, but still nothing that modifies the tuple

  • Tuples are faster than lists

  • It makes your code safer if you write-protect data that should not change, and tuples come in handy for that

  • Dictionary keys should be immutable and hence tuples can be dictionary keys

  • Tuples can be converted to lists and vice-versa

  • Use tuples to assign multiple values at once

  • An easy way to assign values to the days of the week

  • Tuples are used in formatting. I did not observe this thing even though I worked through a ton of examples in LPTHW. I need to be alert about the kind of code that I work on

  • Tuples are used in string formatting to combine strings and numbers, since using the plus operator between a string and an integer raises an exception

  • One of the most powerful features of Python is the list comprehension, which provides a compact way of mapping a list into another list by applying a function to each of the elements of the list

  • Everything is an object. “,” is also an object, as one can invoke the join method on it
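
Several of the points above fit into one short session. The following is a sketch assuming Python 3, with made-up sample data, not code from the book.

```python
# append adds its argument as ONE element; extend adds each element of an iterable.
appended = ["a", "b"]
appended.append(["c", "d"])      # ['a', 'b', ['c', 'd']]

extended = ["a", "b"]
extended.extend(["c", "d"])      # ['a', 'b', 'c', 'd']

# pop removes and returns the last element; remove deletes only the first match.
letters = ["x", "y", "x"]
last = letters.pop()             # 'x'; letters is now ['x', 'y']
letters.remove("x")              # letters is now ['y']

# Tuples are immutable, so they can serve as dictionary keys.
grid = {(0, 0): "origin", (1, 0): "east"}

# Tuple unpacking assigns several names at once, e.g. the days of the week.
(MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, SATURDAY, SUNDAY) = range(7)

# A tuple supplies the values for %-style formatting; "cart has " + 3 would raise TypeError.
message = "%s has %d items" % ("cart", 3)

# A list comprehension maps one list to another; join is a method of the separator string.
doubled = [n * 2 for n in [1, 9, 8, 4]]
joined = ",".join(str(n) for n in doubled)

print(appended, extended)
print(message)                   # cart has 3 items
print(joined)                    # 2,18,16,8
```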

Chapter 4 - The Power of Introspection

This chapter starts off with a rather complex-looking function and explains its various components.

  • str can be used to convert anything into a string

  • dir lists the attributes and methods of any object

  • Callable objects include functions, class methods, and even classes themselves

  • One can use getattr to invoke a function that is not known until run time

  • getattr can be used as a dispatcher. Let’s say you want to do something different depending on the type of input: you can name one function per input type, code those functions, and use getattr to dispatch to the right one

  • You can pass a default function as the third argument of getattr, to be used when the requested name does not exist

  • Python has powerful capabilities for mapping lists into other lists, via list comprehensions

  • The list filtering syntax is [mapping-expression for element in source-list if filter-expression]

  • Boolean is handled in a peculiar way in Python. 0, '', [], (), {} and None are false in Boolean context; everything else is true

  • In the case of or, the values are evaluated from left to right and the first true value is returned. If all the values are false, or returns the last value

  • You can define one-line mini functions on the fly. These are called lambda functions. There is no return statement. The function has no name. They cannot contain commands (statements), only a single expression

  • If you want to encapsulate specific, non-reusable code without littering your code with one-line functions, use lambda functions

  • Assigning functions to variables and calling the function by referencing the variable is important to understand properly. This mode of thought is vital to advancing your understanding of Python
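
The getattr dispatcher, the behaviour of or, and list filtering can be sketched together. The Exporter class and its export_<format> naming convention are hypothetical, invented only to illustrate the idea; Python 3 is assumed.

```python
class Exporter:
    """Hypothetical class whose methods follow an 'export_<format>' naming convention."""

    def export_text(self, data):
        return "text:" + data

    def export_html(self, data):
        return "<p>" + data + "</p>"

    def export(self, data, fmt="text"):
        # Look the handler up by name at run time; fall back to export_text
        # when no method exists for the requested format.
        handler = getattr(self, "export_" + fmt, self.export_text)
        return handler(data)


exporter = Exporter()
as_html = exporter.export("hi", "html")      # dispatched to export_html
as_unknown = exporter.export("hi", "pdf")    # no export_pdf, so the default is used

# 'or' returns the first true value, or the last value if all are false.
first_true = "" or [] or "fallback"
all_false = 0 or "" or []

# A lambda is an unnamed one-expression function that can be assigned to a variable.
double = lambda n: n * 2

# List filtering: keep only the public, callable attributes of the class.
public_methods = [name for name in dir(Exporter)
                  if callable(getattr(Exporter, name)) and not name.startswith("_")]

print(as_html, as_unknown)     # <p>hi</p> text:hi
print(first_true, all_false)   # fallback []
print(public_methods)          # ['export', 'export_html', 'export_text']
```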

Chapter 5 - Objects and Object Orientation

Like other chapters in the book, this chapter starts off with a page-long program that captures all the important aspects of OOP.

  • Learnt about os.path.splitext(f), a function that splits the file name into 2 parts: the part before the last dot, and the extension (which includes the dot)

  • Another function useful in normalizing the path is os.path.normcase(f)

  • Deciding between from x import y and import x depends on how frequently you use the function y in the code. If there is a possibility of namespace clashes, it’s better to use import x, so that every name stays qualified by its module

  • Avoid doing a wild import

  • __init__ is like a constructor method but it is not. The object has already been constructed by the time __init__ is called

  • Subclassing is done easily by merely listing the parent classes in the parentheses

  • Python supports multiple inheritance

  • Using self in the class methods is only a convention, but a very strong convention

  • A class can be made to act like a dictionary, as the UserDict wrapper class does

  • __init__ methods are optional, but when you define one, you must remember to explicitly call the ancestor’s __init__ method

  • Every class instance has a built-in attribute __class__; the class itself has __name__ and __bases__

  • In Python, simply call a class as if it were a function to create a new instance. There is no explicit new operator like in other languages

  • Memory leaks are rare in Python as it implements reference counting. As soon as the last reference to an object goes away, the object is removed immediately

  • In Python, you can forget about memory management and concentrate on other things

  • There is no function overloading in Python

  • UserString, UserList and UserDict are wrapper classes that mimic built-in string, list and dict classes

  • You can write special methods like __getitem__ and retrieve from the class instance using a dict syntax.

  • There are a ton of special class methods that you can write, for comparison, length, etc.

  • The convention for defining special class methods is to prepend and append two underscores to the function name

  • Class attributes are different from data attributes. One can think of class attributes as static attributes that are associated with the class. They are present even before instantiating the class. Class attributes are defined soon after the class definition statement

  • Data attributes are defined in the __init__ method

  • In Python, a class method or attribute is either private or public, and this is determined entirely by its name. There is no protected scope as in C++

  • __class__ is a built-in attribute of every class instance. It is a reference to the class that self is an instance of
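
A small pair of classes shows several of these points at once: class versus data attributes, special methods, and calling the ancestor’s __init__. Settings and VerboseSettings are hypothetical classes made up for this sketch (Python 3 assumed); they are not the book’s fileinfo.py classes.

```python
class Settings:
    """Hypothetical dictionary-like class."""

    instances = 0                      # class attribute: exists before any instance does

    def __init__(self, **items):
        # The object already exists here; __init__ only initializes it.
        self.items = dict(items)       # data attribute, defined in __init__
        Settings.instances += 1

    def __getitem__(self, key):        # enables settings["color"]
        return self.items[key]

    def __len__(self):                 # enables len(settings)
        return len(self.items)


class VerboseSettings(Settings):       # parent class listed in the parentheses
    def __init__(self, **items):
        Settings.__init__(self, **items)   # explicitly call the ancestor's __init__
        self.verbose = True


# Instances are created by calling the class like a function; there is no 'new'.
base = Settings(color="red")
child = VerboseSettings(size=3, depth=2)

print(base["color"], len(child))       # red 2
print(child.__class__.__name__)        # VerboseSettings
print(Settings.instances)              # 2
```

Calling Settings.__init__ directly follows the style the notes describe; in current Python, super().__init__(**items) is the more common spelling.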

Chapter 6 - Exceptions and File Handling

  • try...except has the same function as the try...catch block in other languages

  • try...except...else: if no exception is raised in the try block, the else clause is executed afterwards

  • try...finally: the code in the finally block will always be executed, even if something in the try block raises an exception

  • Most other languages don’t have a powerful list datatype like Python’s, so in Python you rarely need a counter-driven for loop; a for loop simply iterates over a list

  • os.environ is a dictionary of the environment variables defined on your system.

  • The sys module contains system-level information, such as the version of Python you are using

  • sys.modules is a dictionary containing all the modules that have been imported

  • Given the name of any previously imported module, you can get a reference to the module itself through the sys.modules dictionary

  • The os.path.split function splits a full pathname and returns a tuple containing the path and filename

  • The os.path.splitext function splits a filename and returns a tuple containing the filename and the file extension

  • isfile and isdir are useful to check whether a path is a file or a directory

  • The glob module helps in listing and filtering the files in a folder

  • fileinfo.py has taught me a lot about Python syntax and OOP concepts. I think it will take a looooong time before I manage to write a program that is as elegant and succinct as fileinfo.py
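
The exception clauses and the path helpers above can be tried out as follows. This is a sketch assuming Python 3; read_first_line and the sample path are invented for illustration.

```python
import os


def read_first_line(path):
    """Return the first line of a file, or None if the file cannot be opened."""
    try:
        handle = open(path)
    except OSError:
        return None                  # runs only when open() fails
    else:
        # Runs only when the try block raised nothing.
        try:
            return handle.readline().rstrip("\n")
        finally:
            handle.close()           # always runs, even if readline() raises


# split separates the directory from the file name;
# splitext separates the name from its extension (the dot stays with the extension).
directory, filename = os.path.split("/music/album/track01.mp3")
stem, extension = os.path.splitext(filename)

print(directory, filename)   # /music/album track01.mp3
print(stem, extension)       # track01 .mp3
```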

Chapter 7 - Regular Expressions

This chapter introduces regular expressions in a superb manner by using three case studies. The first one involves parsing street addresses, the second parsing Roman numerals and the third parsing telephone numbers. All the regex 101 aspects are discussed, such as

  • ^ matches the beginning of the string

  • $ matches the end of the string

  • \b matches a word boundary

  • \d matches any numeric digit

  • \D matches any non-numeric character

  • x? matches an optional x character

  • x* matches x zero or more times

  • x+ matches x one or more times

  • x{n,m} matches an x character at least n times, but not more than m times

  • (a|b|c) matches either a or b or c

  • (x) in general is a remembered group. You can get the value of what was matched by using the groups method of the match object
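
The three case studies can be approximated with the re module as below. The patterns are my own reconstruction in the spirit of the chapter, not necessarily the exact ones from the book, and the helper names are made up; Python 3 is assumed.

```python
import re

# Street addresses: \b and $ make sure only a trailing whole word ROAD is replaced.
address = re.sub(r"\bROAD$", "RD.", "100 NORTH BROAD ROAD")

# Roman numerals: thousands, hundreds, tens and ones, each as an optional group.
ROMAN = re.compile(r"^M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$")


def is_roman(text):
    """True for a well-formed Roman numeral between 1 and 3999."""
    return bool(text) and ROMAN.search(text) is not None


# Phone numbers: remembered groups, separated by any run of non-digits.
PHONE = re.compile(r"(\d{3})\D*(\d{3})\D*(\d{4})\D*(\d*)$")


def parse_phone(text):
    """Return (area code, trunk, number, extension) or None."""
    match = PHONE.search(text)
    return match.groups() if match else None


print(address)                                # 100 NORTH BROAD RD.
print(is_roman("MCMXC"), is_roman("IIII"))    # True False
print(parse_phone("800-555-1212 ext. 1234"))  # ('800', '555', '1212', '1234')
```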

Chapter 8 - HTML Processing

The chapter starts off by showing a program that looked overwhelming to me. It’s a program that parses an external HTML page, converts its text into various dialects and renders the result as another, translated HTML page. At the outset, reading through the program, I did not understand most of it. That is basically the style maintained throughout the book: introduce a pretty involved program and then explain each of its steps.

The chapter begins by talking about SGMLParser, which takes in an HTML document and consumes it. Well, that’s all it does. So, what’s the use of such a class? One has to subclass it and provide methods that do interesting things. For example, one can define a start_a method and list all the URLs in a page. This means that instead of manually going through the data to find all the a href attributes, you can extend the class and get all the links in the page. If no method has been defined for a specific tag, the unknown_starttag method is invoked. The chapter then talks about locals and globals, functions that are useful in string formatting.

So, the basic structure of this chapter is: start with SGMLParser, subclass it to create BaseHTMLProcessor, subclass that to create Dialectizer, and then subclass that to create the various dialect-specific Dialectizers. Reading dialect.py carefully shows how to make a program extensible. This chapter makes one realize the power of sgmllib.py to manipulate HTML by turning its structure into an object model. This module can be used in many different ways, some of them being

  • parsing the HTML looking for something specific

  • aggregating the results, like the URL lister

  • altering the structure along the way

  • transforming the HTML in to something else by manipulating the text while leaving the tags alone

After going through this chapter, I learnt to write a basic link crawler.
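
A URL lister along these lines can be sketched as follows. Note the swap: sgmllib no longer exists in Python 3, so this sketch uses html.parser.HTMLParser from the standard library instead of the book’s SGMLParser, and handle_starttag instead of start_a. The function list_urls is a made-up helper.

```python
from html.parser import HTMLParser


class URLLister(HTMLParser):
    """Collects the href of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened.
        if tag == "a":
            self.urls.extend(value for name, value in attrs if name == "href")


def list_urls(html_text):
    """Return all link targets found in an HTML string."""
    parser = URLLister()
    parser.feed(html_text)
    parser.close()
    return parser.urls


page = '<p><a href="a.html">A</a> <a name="top">anchor</a> <a href="b.html">B</a></p>'
print(list_urls(page))   # ['a.html', 'b.html']
```

A crawler would fetch each listed URL and feed the result back into list_urls; that part is left out here.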

Chapter 9 - XML Processing

The chapter starts with a 250-line program that was overwhelming for me to go through. However, the author promises to take the reader carefully over all aspects of it. After this mega program, the chapter talks about packages and the need for organizing Python programs in packages. The xml package uses Unicode to store all parsed XML data, and hence the chapter then dwells on the history of Unicode. In the Python 2 that the book covers, Python uses the ASCII encoding scheme whenever it needs to auto-coerce a Unicode string into a regular string. The last two sections of the chapter talk about searching for elements and accessing element attributes in an XML document. Overall, this chapter shows that accessing and reading an XML document in Python is made easy by the xml package.
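
Searching for elements and reading their attributes looks roughly like this with xml.dom.minidom from the standard library. The SAMPLE document is made up for this sketch and only loosely modelled on a grammar file; Python 3 is assumed.

```python
from xml.dom import minidom

# Made-up sample document, small enough to read at a glance.
SAMPLE = """<grammar>
  <ref id="bit"><p>0</p><p>1</p></ref>
  <ref id="byte"><p>bits</p></ref>
</grammar>"""

document = minidom.parseString(SAMPLE)

# Searching for elements: every <ref> anywhere in the document.
refs = document.getElementsByTagName("ref")

# Accessing element attributes: attributes behaves like a dictionary of Attr objects.
ref_ids = [ref.attributes["id"].value for ref in refs]

# The text of each <p> inside the first <ref> is a child text node.
first_ref_values = [p.firstChild.data for p in refs[0].getElementsByTagName("p")]

print(ref_ids)            # ['bit', 'byte']
print(first_ref_values)   # ['0', '1']
```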

Chapter 10 - Scripts and Streams

  • One of the powerful uses of dynamic binding is the file-like object

  • A file-like object is any object with a read method with an optional size parameter

  • File-like objects are useful in the sense that the source could be anything: a local file, a remote XML document, a string

  • Standard output and standard error are pipes that are built into every UNIX system. When you print something, it goes to the stdout pipe; when your program crashes and prints out debugging information, it goes to the stderr pipe

  • stdout and stderr are both file-like objects. They are both write-only

  • In a Windows-based IDE, stdout and stderr default to the interactive window

  • To read command line arguments, you can either import sys and use the list sys.argv, or use the getopt module
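
The file-like idea is easy to demonstrate with io.StringIO, which wraps a string in an object that has read and write methods. The functions count_words and log_error are invented for this sketch; Python 3 is assumed.

```python
import io
import sys


def count_words(source):
    """Accept ANY object with a read() method: a file, a StringIO, a network response."""
    return len(source.read().split())


# A string dressed up as a file-like object.
in_memory = io.StringIO("file-like objects need only a read method")
word_count = count_words(in_memory)


def log_error(message, stream=sys.stderr):
    """Write to stderr by default, or to any other object with a write() method."""
    print(message, file=stream)


# Because stderr is just a file-like object, it can be swapped for a buffer.
buffer = io.StringIO()
log_error("something went wrong", stream=buffer)

print(word_count)                  # 7
print(repr(buffer.getvalue()))     # 'something went wrong\n'
```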

I have skipped Chapters 11 and 12, which are based on web services and SOAP. I will refer to them at a later date to get a general idea.

Chapter 13 - Unit Testing

This chapter’s basic message is “Write unit tests first, code later”. This programming approach is popularly known as Test Driven Development. Some of the points I learnt from this chapter are

  • You have to subclass the TestCase class of the unittest module so that you can use all its useful features in your own tests

  • Each individual test that you write takes in no arguments. It returns no value whatsoever. If the method exits normally without raising any exception, the test is considered passed

  • The TestCase class provides a method called assertEqual to check whether two values are equal

  • There is also a method called assertRaises to check whether the code fails for bad input. Instead of manually calling the function, passing in the argument and then checking whether it raises a specific exception, assertRaises is a good way to check all of this in one single function call

  • Each test case should handle only one question

  • Each test case must be able to work independently of the other test cases
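
A test case in this style might look as follows. The to_roman function, the OutOfRangeError exception and the run_tests helper are my own minimal stand-ins so that the tests have something to exercise; they are not the book’s roman.py. Python 3 is assumed.

```python
import unittest

ROMAN_NUMERALS = (("M", 1000), ("CM", 900), ("D", 500), ("CD", 400),
                  ("C", 100), ("XC", 90), ("L", 50), ("XL", 40),
                  ("X", 10), ("IX", 9), ("V", 5), ("IV", 4), ("I", 1))


class OutOfRangeError(ValueError):
    """Raised when the number cannot be written as a Roman numeral."""


def to_roman(number):
    """Convert an integer in 1..3999 to a Roman numeral."""
    if not 0 < number < 4000:
        raise OutOfRangeError("number out of range (must be 1..3999)")
    result = ""
    for numeral, value in ROMAN_NUMERALS:
        while number >= value:
            result += numeral
            number -= value
    return result


class ToRomanTest(unittest.TestCase):
    # Each test takes no arguments, returns nothing and answers one question.
    def test_known_value(self):
        self.assertEqual(to_roman(1990), "MCMXC")

    def test_too_large(self):
        # assertRaises calls to_roman(4000) itself and checks the exception type.
        self.assertRaises(OutOfRangeError, to_roman, 4000)


def run_tests():
    """Run the test case and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ToRomanTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

From a script, calling run_tests() (or unittest.main()) runs both tests; each one can also run on its own, since neither depends on the other.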

Chapter 14 - Test First Programming

The purpose of this chapter is to show code development via testing. In the previous chapter, a set of unit tests was written for testing the conversion from numbers to Roman numerals and from Roman numerals to numbers. Through a series of Python programs, the chapter manages to come up with code that passes all the tests. I particularly like this idea of writing tests even before you start coding. It forces one to think through all the possible cases, which leads to better code. I tried doing this without looking at the author’s working, and I found the solution in the book a million times more elegant than my code. It never occurred to me that you can use regular expressions for checking the correct input for the conversion code. Overall, one of the best chapters in the book.

Chapter 15 - Refactoring

By adding some more functions to the Roman numerals problem, the author shows a way to refactor the code.

Chapter 16 - Functional Programming

  • map and filter functions have been in Python forever. List comprehensions were introduced in Python 2.0. All three are very useful when you want to vectorize stuff. If you have already coded in MATLAB or R, vectorizing is the way you think and code, and list comprehensions let you keep thinking in that vector-centric way. Out of the three, I think I like list comprehensions the most

  • The author strongly recommends using map, filter and list comprehensions for writing better code

  • The book shows a nice way to import modules dynamically
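
The three tools side by side, plus a dynamic import, as a sketch. I use importlib.import_module for loading modules by name; that is my choice for Python 3 and an assumption about how one would do it today, not necessarily the call the book uses.

```python
import importlib

numbers = [1, 2, 3, 4, 5, 6]

# map applies a function to every element; filter keeps elements that pass a test.
# In Python 3 both return lazy iterators, hence the list() calls.
squares = list(map(lambda n: n * n, numbers))
evens = list(filter(lambda n: n % 2 == 0, numbers))

# A list comprehension does the mapping and the filtering in one expression.
even_squares = [n * n for n in numbers if n % 2 == 0]

# Dynamic import: the module names are plain strings known only at run time.
module_names = ["math", "string"]
modules = [importlib.import_module(name) for name in module_names]

print(squares)                  # [1, 4, 9, 16, 25, 36]
print(evens, even_squares)      # [2, 4, 6] [4, 16, 36]
print(modules[0].sqrt(16))      # 4.0
```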

Chapter 17 - Dynamic Functions

This is my favorite chapter of the book. I was amazed at how the author takes the simple example of pluralizing a noun and introduces lambda functions and generators to make the code look extremely beautiful. Starting from raw if/then code, the author takes the reader systematically, in 6 iterations, to code that is simply beautiful. I am really thrilled to see so much infrastructure built into this language. I don’t think I will ever move away from Python and R to do anything, at least for now. The one thing that I have to practice and implement in my own code soon is generators. For me, this chapter had so many ‘aha’ moments that I will revisit it at a later date. The last chapter offers a list of performance hacks that an experienced programmer might appreciate. Overall, a fantastic book. Loved every moment of it.
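
To give a flavour of where the Chapter 17 pluralization example ends up, here is a sketch that combines a rule table, lambda functions and a generator. The rules and names are my own reconstruction under the assumption of Python 3, not a copy of the book’s final version, and they cover only a handful of English nouns.

```python
import re

# Each rule: (pattern that must match, pattern to replace, replacement text).
RULES = (
    (r"[sxz]$", r"$", "es"),             # box -> boxes
    (r"[^aeioudgkprt]h$", r"$", "es"),   # coach -> coaches
    (r"[^aeiou]y$", r"y$", "ies"),       # vacancy -> vacancies
    (r"$", r"$", "s"),                   # everything else: day -> days
)


def build_rules(rule_table):
    """Generator that lazily turns each rule into a (matches, apply) pair of functions."""
    for match_pattern, search_pattern, replacement in rule_table:
        yield (
            lambda word, p=match_pattern: re.search(p, word),
            lambda word, s=search_pattern, r=replacement: re.sub(s, r, word),
        )


def plural(noun):
    """Apply the first rule whose pattern matches the noun."""
    for matches, apply in build_rules(RULES):
        if matches(noun):
            return apply(noun)


print(plural("box"), plural("coach"), plural("vacancy"), plural("day"))
# boxes coaches vacancies days
```

Because build_rules is a generator, the functions for later rules are only created if the earlier rules did not match.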

Takeaway:

Reading other people’s code is an activity that a programmer must make a part of daily life in order to become a better programmer. This book is organized in such a way that you are forced to read code first, and only then are you given the rationale behind it. I think this particular structure is the highlight of the book, and it is no wonder that this is one of the most popular books on Python.