Grokking Data Structures - Book Review
Contents
This blog post is about the book “Grokking Data Structures,” written by Marcello La Rocca.
Introducing data structures
- Python is a multi-paradigm language. Although it is fundamentally imperative, it supports several programming styles:
- Procedural/imperative: via statements, loops, and assignments.
- Object-oriented: via classes and objects.
- Functional: via map, filter, lambda, and higher-order functions.
- Declarative-like: through frameworks and libraries such as SQLAlchemy or Pandas expressions that emphasize what over how.
| Paradigm | Example Languages | Core Idea | Python Fit |
|---|---|---|---|
| Imperative | C, Python, Java | Specify how to perform tasks step by step | Strong |
| Declarative | SQL, HTML, Prolog | Specify what result you want | Partial |
| Functional | Haskell, Lisp | Avoid state, use pure functions | Partial |
| Object-Oriented | Java, C++, Python | Model data as interacting objects | Strong |

- Data structures are a way of organizing and storing information in a computer or a program. They help to efficiently manage and manipulate data
- Why are data structures and algorithms mentioned or studied together? Think of data structures as nouns and algorithms as verbs. A data structure implicitly defines algorithms for operations on it, such as adding, retrieving, and removing its elements
- Some data structures are specifically designed to allow the efficient execution of certain algorithms, such as hash tables for key-based search
- Why care about data structures?
- New trends take advantage of data structures that have emerged
- Makes you a better developer
- You will have more tools to play with and attack a problem
- Mental model of data structures
- Understand the problem you are solving
- Sketch out a possible solution
- Identify the data structures you need
- Implement a solution
- Check whether the solution works; if not, iterate
- Check whether the solution is good enough; if not, iterate
- Nice simple example of trying to register pets at a doctor’s clinic to illustrate stacks, queues and priority queues
- Algorithms and data structures complement each other the way nouns and verbs complement each other in a sentence
- Data structures are about how you store data. Algorithms are about how you use that data. Together, they determine how fast and how efficiently your program runs.
Static Arrays
- Think of memory as a modular shelf holding removable drawers
- Statically vs. dynamically sized arrays
- The array data structure has the following properties
- It stores a collection of data
- Its elements can be accessed by index
- Elements don’t have to be accessed sequentially
- Python offers lists that are closer to dynamic arrays
- The term array usually serves as a synonym for a statically sized array, a collection of elements accessed by an index, where the number of elements is fixed for the entire lifetime of the collection
- Many programming languages, such as C or Java, offer static arrays as a built-in feature
- Arrays can be initialized at compile time. If a language allows you to skip initialization, then the initial value of the array’s elements depends on the language
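Python has no built-in statically sized array, but the standard library's `array` module comes close: a compact, homogeneously typed collection. A small illustration (my own, not from the book):

```python
import array

# Five signed ints, preallocated; elements are machine-level ints,
# not full Python objects.
a = array.array('i', [0] * 5)

a[2] = 42          # O(1) random access by index
print(a[2])        # -> 42
print(len(a))      # -> 5
print(a.itemsize)  # bytes per element (typically 4 for 'i')
```

Note that `array.array` can still grow via `append`, so the fixed size here is only by convention.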
Sorted Arrays
- A sorted array is an array whose elements are kept in order as the array changes
- To maintain the elements of an array in order, we need a different approach when inserting and deleting elements. These methods must preserve the order and therefore require more effort than their counterparts for unsorted arrays
- On an already sorted array, we can run binary search, a search algorithm that can find a match by looking at fewer elements than linear search
- With sorted arrays, you have faster search, but you also have an extra cost to keep them sorted. Therefore, they should be preferred when there is a high read-to-write ratio
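A minimal binary search sketch (my own, not the book's listing), assuming the input list is already sorted:

```python
def binary_search(arr, target):
    """Return the index of target in sorted arr, or -1 if absent."""
    left, right = 0, len(arr) - 1
    while left <= right:
        mid = (left + right) // 2
        if arr[mid] == target:
            return mid
        if arr[mid] < target:
            left = mid + 1    # target can only be in the right half
        else:
            right = mid - 1   # target can only be in the left half
    return -1

print(binary_search([2, 3, 5, 8, 13, 21], 8))  # -> 3
print(binary_search([2, 3, 5, 8, 13, 21], 7))  # -> -1
```

Each iteration halves the range still under consideration, so at most O(logn) elements are examined, versus O(n) for linear search.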
Big-O notation
- Understanding this helps in choosing the right data structure for the problem
- One can use profiling to check the performance of an implementation, but there are issues with relying on it
- Since we profile a specific implementation of an algorithm, the results are heavily influenced by the programming language we choose
- We can’t assume that the profiling tests will hold good for all implementations
- All profiling results are based on finite input
- The advantage of asymptotic analysis is that it is independent of any implementation
- Profiling is useful in the implementation phase, whereas asymptotic analysis is useful in the design phase
- The RAM (random access machine) is a good simplified model for understanding asymptotic analysis
- There are three types of analysis that can be done on the performance of an algorithm
- worst-case analysis
- average-case analysis
- amortized analysis
- The RAM model is a simplified computational model of a generic computer that provides only a limited set of basic instructions
- Big-O notation is used to classify functions based on their asymptotic growth. We use these classes of functions to express how fast the running time or memory used by an algorithm grows as the input becomes larger
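- A worked example: a running time of f(n) = 3n^2 + 5n + 2 is O(n^2), because f(n) <= 4n^2 for every n >= 6; the constants and the lower-order terms are dropped, and only the dominant growth rate is kept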
Dynamic arrays
- The main issue with dynamic arrays is that the underlying array must occasionally be resized and its data moved. Both are expensive operations, and there is no way to avoid them
- The key to dynamic arrays is the strategy one adopts when shrinking or growing an array
- A good strategy for growing an array is to double its capacity when it becomes full
- A good strategy for shrinking an array is to halve its capacity when the number of elements falls below a certain threshold
- Dynamic arrays are a way to get the best of arrays and add some flexibility. They are not a different type of data structure. They use fixed-size arrays but add a strategy to grow and shrink them as needed, reallocating the underlying static array each time it needs to be resized
- The best strategy for dynamic arrays is to double the size of the underlying static array when we try to insert a new element into a full array, and to halve the size of the array when, after removing an element, three-quarters of the array is empty
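A toy sketch of this doubling/halving strategy (my own illustration, not the book's code), using a plain Python list as a stand-in for the underlying static array:

```python
class DynamicArray:
    """Doubles when full; halves when three-quarters empty."""

    def __init__(self):
        self._capacity = 1
        self._size = 0
        self._data = [None] * self._capacity  # simulated static array

    def _resize(self, new_capacity):
        new_data = [None] * new_capacity      # allocate a new static array
        for i in range(self._size):           # O(n) copy on every resize
            new_data[i] = self._data[i]
        self._data = new_data
        self._capacity = new_capacity

    def append(self, value):
        if self._size == self._capacity:      # full: double the capacity
            self._resize(2 * self._capacity)
        self._data[self._size] = value
        self._size += 1

    def pop(self):
        if self._size == 0:
            raise IndexError("pop from empty array")
        self._size -= 1
        value = self._data[self._size]
        self._data[self._size] = None
        if 0 < self._size <= self._capacity // 4:  # 3/4 empty: halve
            self._resize(self._capacity // 2)
        return value
```

The asymmetry between the grow threshold (full) and the shrink threshold (one-quarter full) is what prevents a pathological sequence of alternating appends and pops from triggering a resize on every operation.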
Shrink Operation:
Worst-case performance:
- The worst case for the shrink operation occurs when the number of elements in the array reaches the threshold for shrinking (e.g., 1/4 of the capacity) after a series of deletions.
- In this case, the shrink operation needs to create a new array with half the size and copy all the remaining elements to the new array.
- The time complexity of the shrink operation in the worst case is O(n), where n is the number of elements in the array before shrinking.
Average-case performance:
- The average case for the shrink operation depends on the distribution of the number of elements in the array and the frequency of deletions.
- On average, the shrink operation will be triggered less frequently than the worst case since deletions may not always reach the threshold for shrinking.
- The average-case time complexity of the shrink operation can be considered as O(n), but with a lower constant factor compared to the worst case.
Amortized-case performance:
- The amortized-case performance of the shrink operation takes into account the overall cost of multiple shrink operations over a sequence of deletions.
- Although each individual shrink operation has a worst-case time complexity of O(n), the amortized cost of shrinking is O(1).
- This is because the cost of copying elements during shrinking is offset by the larger number of deletions that can be performed before the next shrinking operation is triggered.
- The amortized analysis guarantees that the average cost per deletion, including the cost of occasional shrinking, is constant.
Delete Operation:
Worst-case performance:
- The worst case for the delete operation occurs when the element to be deleted is at the beginning of the array.
- In this case, all the elements after the deleted element need to be shifted one position to the left to fill the gap.
- The time complexity of the delete operation in the worst case is O(n), where n is the number of elements in the array.
Average-case performance:
- The average case for the delete operation depends on the position of the element to be deleted.
- On average, the delete operation requires shifting half of the elements in the array.
- The average-case time complexity of the delete operation is O(n/2), which simplifies to O(n).
Amortized-case performance:
- For deletions from the end of the array (a pop), the amortized cost is O(1): the only non-constant cost is the occasional shrink, whose O(n) copy is paid for by the many cheap deletions that must precede it.
- Deleting from an arbitrary position remains O(n) even in the amortized sense, because every such deletion pays the shifting cost, not just the occasional one.
It’s important to note that the amortized-case performance provides a more comprehensive analysis of the overall efficiency of the dynamic array operations, considering the long-term behavior and the cost of resizing operations.
In practice, the amortized-case performance is often more relevant than the worst-case performance because it gives a better indication of the average-case behavior and the efficiency of the data structure over a sequence of operations.
Linked Lists
- Dynamic arrays give us an illusion of flexibility, but they are not a different data structure. They are just a strategy to resize static arrays as efficiently as possible
- The main difference between an array and a linked list is that in the latter the data is not stored in contiguous blocks of memory. This means knowing the position of the first element of the linked list does not automatically determine the position of the rest of the elements of the list
- Since the location of each data element is not automatically known, each element needs to store the memory location of the next one
- The first element of a linked list is called its head, and the last element of the list is called its tail. The characteristic of a tail node is that its next pointer doesn’t point to another node
- Linked lists can be seen as two-tiered data structures: the internal nodes that store the data and the links, and an external shell or wrapper that implements the list itself and provides the API through which clients perform the usual operations
- Insertion at the end of a singly linked list takes linear time, while insertion at the front takes constant time. The asymmetry comes from how the data is stored: in a singly linked list, we keep a pointer to the head but not to the tail
- We cannot simply reuse the `_search` method when deleting a node, because deletion also needs a reference to the node's predecessor
- In a sorted linked list, it is difficult to improve the performance of searching beyond linear time
- Computer game example to highlight the importance of doubly linked list
- Circular linked lists can be built on top of either singly or doubly linked lists
- Circular linked lists are lists where the successor of the tail of the list is the head of the list. All nodes have a successor and all nodes have a predecessor
- Circular linked lists are used whenever we need to traverse a list repeatedly
- Operations such as insert, delete, and search are, for the most part, as fast in SLLs as in DLLs. Doubly linked lists are faster when we need to insert or delete an element at the end of the list
- DLLs require more memory than SLLs
- In a doubly linked list, each node also stores a pointer to its predecessor. Therefore, doubly linked lists can also be traversed from tail to head, making it easy to read the list in both directions.
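A minimal singly linked list sketch illustrating the two-tier design and the head/tail asymmetry (names are my own, not necessarily the book's):

```python
class Node:
    """Internal tier: holds one value and a link to the next node."""
    def __init__(self, data):
        self.data = data
        self.next = None

class SinglyLinkedList:
    """External tier: the wrapper that owns the head and exposes the API."""

    def __init__(self):
        self._head = None

    def insert_in_front(self, data):   # O(1): we always know the head
        node = Node(data)
        node.next = self._head
        self._head = node

    def insert_to_back(self, data):    # O(n): must walk to the tail first
        node = Node(data)
        if self._head is None:
            self._head = node
            return
        current = self._head
        while current.next is not None:
            current = current.next
        current.next = node

    def traverse(self):
        current = self._head
        while current is not None:
            yield current.data
            current = current.next
```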
Bags
Hierarchy
- An abstract data structure focuses on what operations you can perform on the data, without specifying how those operations are implemented
- A data structure specifies more concretely how data is represented, as well as algorithms for performing operations on the data
- There are three levels of hierarchy in data structures
- ADT is a theoretical concept that describes at a high level how data can be organized and the operations that can be performed on the data. It provides little or no detail about the internal representation of the data, how the data is stored, or how physical memory is used
- DS is the refinement of the spec provided by ADT where the computational complexity of its operations are normally discussed
- Implementation: we have to write code for the data structure, so we choose a programming language and translate the general instructions given by the data structure into code
- An ADT can be specified by many DSes. A DS can be implemented in many ways
- Arrays can fit into multiple levels of the hierarchy
- Linked lists and arrays are two refinements, two data structures, stemming from the same abstract data type
- Another example that can be used to distinguish between the three levels of the hierarchy is a light switch. It can be thought of as an ADT, as a data structure, and at the implementation level
Real-world analogies illustrating ADT, DS, and implementation
- Book Library System
- ADT: Library — defines operations such as add, remove, search, and borrow books. Abstract idea of managing a collection of books.
- DS: Could be implemented using a list, tree, or hash map to organize and retrieve books.
- Implementation: Actual database code or system (e.g., MySQL tables, JSON files, or Python dictionaries) that carries out these operations.
- Music Playlist
- ADT: Playlist — defines operations such as add song, remove song, play next, and shuffle.
- DS: Could use a linked list (for sequential play), array (for random access), or queue.
- Implementation: Spotify or Apple Music feature coded in Java/Kotlin/Python managing songs in memory or through an API.
- Restaurant Queue
- ADT: Queue — defines enqueue (customer joins line) and dequeue (customer is served).
- DS: Could be a linked list, array, or circular buffer.
- Implementation: POS system code that manages waiting customers written in Python, C#, or JavaScript.
- Calendar or To-Do List
- ADT: Scheduler — defines add event, delete event, and find next event operations.
- DS: Could use a priority queue or balanced binary search tree to manage events by time.
- Implementation: Calendar or task management app written in Swift or JavaScript using SQLite or heap structures.
- Light Switch
- ADT: Switch — has two states (on, off) and operations (turn on, turn off).
- DS: Could be modeled as a boolean variable or simple state machine.
- Implementation: Physical circuit or GUI button controlling an electrical or software process.
Containers
A container is an abstract concept, a definition of a large group of data structures with common characteristics.
- Containers are collections of elements. They hold multiple elements, which can be of the same or different types, and which can be stored in a particular order or without any order
- Containers typically provide the same set of basic operations to insert, delete, access, modify and search elements in the collection
- Containers can be traversed
- Containers can maintain the elements they store in a certain order
- Containers are designed to provide efficient access to their elements.
A bag can be considered an ADT, and it can be backed by any of the following data structures:
- Static arrays
- Dynamic arrays
- Linked lists
A bag can be implemented on top of basic data structure such as arrays and linked lists. The singly linked list implementation guarantees the best running time for both operations defined on a bag.
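A bag needs only two operations, insert and iterate, which a singly linked list serves in O(1) and O(n) respectively. A minimal sketch (my own illustration):

```python
class Bag:
    """Bag ADT on a singly linked list: insert and iterate, nothing else."""

    class _Node:
        def __init__(self, value, next_node=None):
            self.value = value
            self.next = next_node

    def __init__(self):
        self._head = None

    def insert(self, value):     # O(1): prepend at the head
        self._head = Bag._Node(value, self._head)

    def __iter__(self):          # O(n): visit every element once
        node = self._head
        while node is not None:
            yield node.value
            node = node.next
```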
Stacks
- Stacks work on the LIFO (last in, first out) principle
- If one uses a dynamic array as the data structure for implementing a stack, there are the following disadvantages
- Occasional costly resizing: while the amortized cost is O(1), occasional resizes can cause latency spikes
- If the stack fluctuates a lot in size, you will keep holding onto that large underlying array until explicitly trimmed
- If the stack grows large, finding a contiguous block big enough might be an issue in low-memory environments
- Most implementations don’t automatically reduce capacity when you pop elements
- When resizing occurs, all existing pointers/references/iterators to stack elements become invalid because the memory location changes
| Aspect | Dynamic Array Stack | Comment |
|---|---|---|
| Push (avg) | O(1) | Amortized constant time |
| Push (worst) | O(n) | When resizing occurs |
| Pop | O(1) | Always |
| Memory efficiency | Medium | May over-allocate |
| Reference stability | Poor | Resizing invalidates references |
| Memory layout | Contiguous | May cause fragmentation issues |
| Ease of implementation | Easy | Simple and cache-friendly |
- A linked list is one of the best ways to implement a stack
- However, the author shows that if one compares a dynamic array implementation of a stack with the linked list implementation, the former turns out to be much faster
- While the worst-case running time for push and pop is linear, their amortized running time is as good as linked lists.
- Python provides an optimized, extremely efficient implementation for `list`. This code is written in C and compiled, to make sure it's as efficient as possible
- Applications of Stack
- Call stack
- Evaluating expressions in postfix notation
- Undo/redo functionality in an IDE
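As a small illustration of the dynamic array approach and of the undo application (my own sketch, not the book's code):

```python
class Stack:
    """LIFO stack backed by a Python list (a dynamic array under the hood)."""

    def __init__(self):
        self._data = []

    def push(self, value):
        self._data.append(value)   # amortized O(1)

    def pop(self):
        if not self._data:
            raise ValueError("pop from an empty stack")
        return self._data.pop()    # O(1): removes from the end

    def peek(self):
        if not self._data:
            raise ValueError("peek on an empty stack")
        return self._data[-1]

# Undo functionality is just a stack of actions:
undo = Stack()
undo.push("type 'hello'")
undo.push("delete word")
print(undo.pop())  # -> 'delete word' (most recent action undone first)
```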
Quick Primer on cProfile and pstats in Python
What is cProfile
- A deterministic profiler built into Python’s standard library.
- It measures how much time your program spends in each function and how many times each function is called.
- Useful for identifying performance bottlenecks and slow functions.
Key Columns in Profiling Output
| Column | Meaning |
|---|---|
| ncalls | Number of calls made to the function |
| tottime | Total time spent in the function itself (excluding subcalls) |
| cumtime | Cumulative time spent in the function and all subcalls |
| percall | Average time per call (`tottime / ncalls` or derived from cumtime) |
How to Use
- Basic usage:
```python
import cProfile

prof = cProfile.Profile()
prof.enable()
# Run your code
prof.disable()
prof.print_stats()
```

- You can also wrap calls directly using:
```python
prof.runcall(function, *args, **kwargs)
```
What is pstats
- A companion module for formatting and sorting the results of cProfile.
- Lets you sort by various metrics (time, filename, number of calls, etc.).
- Example usage:
```python
import pstats
from pstats import SortKey

st = pstats.Stats(prof)
st.strip_dirs().sort_stats(SortKey.CUMULATIVE).print_stats("pattern")
```
Common Methods in pstats
| Method | Purpose |
|---|---|
| strip_dirs() | Removes long directory paths for readability |
| sort_stats() | Sorts results by criteria such as `SortKey.TIME`, `SortKey.CUMULATIVE`, `SortKey.FILENAME` |
| print_stats("pattern") | Prints only matching functions (e.g., containing "stack") |
Example Workflow
- Create a cProfile.Profile() object.
- Run code inside prof.runcall() or between prof.enable()/disable().
- Pass the profile object to pstats.Stats().
- Sort and print results with desired filters.
Typical Analysis Goal
- Identify which functions dominate execution time.
- Compare different implementations (e.g., linked list vs dynamic array stack).
- Optimize functions that have high cumulative time.
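Putting the workflow together in one runnable snippet (the profiled function is a made-up stand-in; profile your own stack or list code instead):

```python
import cProfile
import pstats
from pstats import SortKey

def build_squares(n):
    # stand-in workload for illustration
    return [i * i for i in range(n)]

prof = cProfile.Profile()
prof.runcall(build_squares, 1_000_000)

stats = pstats.Stats(prof)
stats.strip_dirs().sort_stats(SortKey.CUMULATIVE).print_stats(10)
```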
Queues
- Queues are based on the FIFO (first in, first out) principle
- The method to insert an element into the queue is called `enqueue`
- The method to remove an element from the queue is called `dequeue`
- Queues are widely used in computer science and programming, including messaging systems, networking, web servers and operating systems
- A queue can be implemented using either arrays or linked lists to store its elements
- Using linked lists, enqueue and dequeue take O(1)
- Using static arrays, we can implement a linear queue, which can only support a fixed number of enqueue operations. Alternatively, we can implement a circular queue, where the array is imagined as a circular container.
- Advantages of choosing array over linked list for queue
- memory efficiency: an array will require less memory than a linked list to store the same elements. It has only a constant space overhead regardless of the number of elements in the array
- memory locality - single chunk of memory where all elements are stored side by side
- performance: operations on arrays are faster than on linked lists
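A sketch of the circular queue idea (my own illustration, not the book's listing): the front index and the computed rear index wrap around the fixed-size array via the modulo operator:

```python
class CircularQueue:
    """FIFO queue on a fixed-size array; indices wrap around."""

    def __init__(self, capacity):
        self._data = [None] * capacity
        self._capacity = capacity
        self._front = 0
        self._size = 0

    def enqueue(self, value):
        if self._size == self._capacity:
            raise ValueError("queue is full")
        rear = (self._front + self._size) % self._capacity  # wrap around
        self._data[rear] = value
        self._size += 1

    def dequeue(self):
        if self._size == 0:
            raise ValueError("queue is empty")
        value = self._data[self._front]
        self._data[self._front] = None
        self._front = (self._front + 1) % self._capacity    # wrap around
        self._size -= 1
        return value
```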
Priority Queues and Heaps
- There are two important methods that we need to include in the interface of a priority queue - one to add a new element to the queue and one to get the element with the highest priority
- Priority queues can be implemented using various data structures, but the maximally efficient implementation is achieved using heaps
- One can use sorted or unsorted arrays, or sorted or unsorted linked lists, but with each of these you end up with O(n) for either insertion or top
| Data structure | insert | top |
|---|---|---|
| Sorted static array | O(n) | O(1) |
| Unsorted static array | O(1) | O(n) |
| Sorted linked list | O(n) | O(1) |
| Unsorted linked list | O(1) | O(n) |
| Heap | O(logn) | O(logn) |
- A heap is a data structure that strikes a balance between these extremes: it gives O(logn) for both insert and top
- A heap is a special kind of tree with three properties
- Binary heap is a heap where each node of the tree can have at most two children
- The heap tree is almost complete, which means that every level of the tree, except possibly the last level, is complete; furthermore, the nodes on the last level are as far left as possible
- Each node holds the highest priority element in the subtree rooted at that node.
- Heaps are best implemented on top of an array. This is possible because a heap is an almost complete tree
- With the array implementation of a heap, we can build a priority queue where insert and top take logarithmic time
- It is possible to transform an array of n elements into a heap in linear time, using the `heapify` method
- The explanation in this chapter is so elaborate and elegant that any newbie can follow it. It took me 30 minutes to read through the logic and understand the code behind heaps. I don't remember when I last looked at heaps and actually thought about their inner workings. The author uses a ton of visuals to illustrate the pseudocode behind the workings of a heap
- The example used in the book to illustrate the use case for heaps is finding the k largest entries in an array. The naive way would be to sort the array in descending order and take the first k elements, but that would be overkill for large k and large n: it would be O(nlogn). Using a heap instead makes it O(nlogk), which is much better (see the heapq sketch at the end of this section)
- Relearned the concepts of min-heap and max-heap. Maybe it was way back in 2012 or 2014 that I last had a chance to read through this stuff; I am revisiting it after almost a decade
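The promised k-largest sketch, using Python's standard `heapq` module (a min-heap; the book builds its own heap, so treat this as the stdlib shortcut rather than the book's code):

```python
import heapq

def k_largest(values, k):
    """Keep a min-heap of size k; its root is the smallest of the k largest."""
    heap = []
    for v in values:
        if len(heap) < k:
            heapq.heappush(heap, v)        # O(logk)
        elif v > heap[0]:
            heapq.heapreplace(heap, v)     # pop the smallest, push v: O(logk)
    return sorted(heap, reverse=True)

print(k_largest([7, 2, 9, 4, 11, 5, 3], 3))  # -> [11, 9, 7]
```

Every element is processed with at most one O(logk) heap operation, giving the O(nlogk) total mentioned above.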
Binary search trees
- Trees can be used to implement several abstract data types
- A generic tree is a composite data structure that consists of nodes connected by links. Each node contains a value and a variable number of links to other nodes.
- terminology: root, leaf, internal node, descendant, children, subtree, height of the tree
- A node N is an ancestor of a node M if N is in the path between the root and M.
- A descendant of a node N is either one of the children of N or the descendant of one of the children of N
- All children of the same node are siblings. There is no direct link between siblings, so the only way to get from one sibling to another is through their common parent
- A tree can represent a partial relation, where some elements are comparable and ordered while others are not comparable or have no relationship between them
- Binary trees are defined by restricting each node to a maximum of two children, conventionally distinguished as the left and the right child
- Trees are a very large class of extremely versatile data structures
- Trees can also be used as containers. BST is one container that is a tree, it’s binary and so each node has a left and right child, and it’s used for searching
- BSTs have one advantage over sorted arrays: insertion and deletion can be faster on a BST
- To go from a binary tree to a BST, one needs to add a constraint on the data stored in the nodes
- For any node N that stores a value v, all nodes in the left subtree of N will have values less than or equal to v, and all nodes in the right subtree of N will have values greater than v
- Out of the operations defined on a BST, delete and traversal are the most complex
- For a BST, there are three ways to traverse it
- Pre-order, where we visit each node before its subtrees
- Post-order, where we visit the subtrees of a node before visiting it
- In-Order, where, given a node N, we first visit its left subtree, then N, then its right subtree
- Finding a predecessor and successor are trivial operations in linked lists and arrays. However these operations are complicated in a BST
- Deletions can make a BST skewed, which affects the running time of its operations: it can get close to O(n)
- There is a need for tree balancing because if we perform a lot of deletions on the tree, we get a skewed tree
- A BST is said to be height-balanced if for any node, the difference in height between its left and right subtrees is at most 1, and both of its subtrees are also balanced
- The most used balanced search trees are red-black trees and 2-3 trees.
- With a balanced binary search tree (BBST) operations such as search, insertion and deletion can be performed on the tree in O(log(n)) time.
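A minimal BST sketch (my own, not the book's listing) showing the invariant described above; note how the in-order traversal yields the values in sorted order:

```python
class BSTNode:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

def insert(root, value):
    """Keep the invariant: left subtree <= node < right subtree."""
    if root is None:
        return BSTNode(value)
    if value <= root.value:
        root.left = insert(root.left, value)
    else:
        root.right = insert(root.right, value)
    return root

def search(root, value):
    """O(h), where h is the height: logn if balanced, up to n if skewed."""
    while root is not None and root.value != value:
        root = root.left if value < root.value else root.right
    return root

def in_order(root):
    if root is not None:
        yield from in_order(root.left)
        yield root.value
        yield from in_order(root.right)

root = None
for v in [8, 3, 10, 1, 6]:
    root = insert(root, v)
print(list(in_order(root)))  # -> [1, 3, 6, 8, 10]
```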
Dictionaries and Hash Tables
- Common use case for dictionaries is to remove duplicates from a collection.
- Sorting an array and then removing duplicates from it is O(nlog(n))
- The ADT for dictionaries defines three operations:
- To insert a new value
- To retrieve the value associated with a key
- To delete a value
- Whether we use an array or linked list, sorted array or sorted linked list, one usually comes across O(n) either in insertion or deletion or search
- If one uses a BST, the cost is O(log(n))
- Hash tables are a way to leverage the main advantage of array-based storage: constant-time access by index
- One can simply take a base 256 representation of the key and then store it in the index associated with the base 256 representation. This leads to storing data in sparse arrays and becomes terribly inefficient in many practical examples. What is needed is a way to create an index of a reasonably small sized array, as compared to the values keys can take
- Using direct access table for storing associative data structures, leads to massive waste of memory
- Array used to store elements indexed by hashing is called a hash table. Hashing can describe the entire process of getting from an object to its index.
- Hash function can take the entire object as input and return a valid index
- Requirements of a hash function
- The domain of a hash function must be the set of all possible keys.
- A hash function must return a valid index
- Division and Multiplication methods can be used to generate indices based on keys.
- There are two types of conflict resolution mechanisms that are popular
- Chaining: Add the elements as a linked list to a specific index
- Open addressing: probe the cells until we either find what we are looking for or have probed all cells
- If the hash function used is deterministic or easily guessed by an attacker, it is possible to design a sequence of keys that will cause the hash table to perform very poorly. This has been the origin of vulnerabilities in servers written in several programming languages
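A toy hash table with chaining (my own sketch; Python's built-in `hash()` stands in for the division/multiplication methods mentioned above):

```python
class ChainedHashTable:
    """Resolves collisions by chaining: each bucket is a list of pairs."""

    def __init__(self, n_buckets=8):
        self._buckets = [[] for _ in range(n_buckets)]

    def _index(self, key):
        return hash(key) % len(self._buckets)  # map any key to a valid index

    def insert(self, key, value):
        bucket = self._buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:                # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))

    def get(self, key):
        bucket = self._buckets[self._index(key)]
        for k, v in bucket:             # scan the chain for the key
            if k == key:
                return v
        raise KeyError(key)

table = ChainedHashTable()
table.insert("apple", 3)
table.insert("pear", 5)
print(table.get("apple"))  # -> 3
```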
Graphs
- A graph is a pair of sets:
- a set of vertices: entities that are independent of each other and unique
- a set of edges: an edge is identified by a pair of vertices. The first one is called the source vertex, and the second one is called the destination vertex
- An edge whose source and destination are the same is called a loop
- Simple graphs are graphs without loops, with at most one edge between any two vertices
- Multigraphs can have any number of edges between two given vertices
- An edge can have a numerical value associated with it. Such a value is called a weight and the edge is called a weighted edge
- A graph is sparse if the number of edges is relatively small (on the order of the number of vertices)
- A graph is dense if the number of edges is close to the maximum possible (on the order of the number of vertices squared)
- In a directed graph, an edge has a direction. In an undirected graph, an edge can be traversed in both directions
- A connected component is a maximal subgraph in which any two vertices are connected by a path
- A directed graph is weakly connected if the undirected graph obtained by replacing directed edges with undirected ones is connected
- Two vertices are strongly connected if there is at least one path in the graph that goes from source to destination and one that goes from destination to the source
- Graphs are implemented as an adjacency list or adjacency matrix
- The breadth-first search algorithm explores the graph starting from a start vertex `s`, expanding a frontier of vertices connected to `s` until it reaches the target vertex. It can be implemented with a queue, and it is about finding the shortest path between a source node and a destination node in a graph
- If the edges are weighted, one uses a refined version of the algorithm called Dijkstra's algorithm
- Depth-first search is another kind of algorithm popular in the graph algorithms space, used where there is a need to find connected and strongly connected components, to check whether a directed graph is acyclic, and to find a topological sort for DAGs
- DFS can traverse all edges and visit all vertices of a graph, so its running time is O(n+m)
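A BFS shortest-path sketch on an adjacency list (my own illustration, not the book's listing):

```python
from collections import deque

def bfs_shortest_path(adj, start, target):
    """Return the path with the fewest edges from start to target, or None."""
    queue = deque([start])
    parent = {start: None}             # doubles as the visited set
    while queue:
        vertex = queue.popleft()
        if vertex == target:           # walk the parent links back to start
            path = []
            while vertex is not None:
                path.append(vertex)
                vertex = parent[vertex]
            return path[::-1]
        for neighbor in adj.get(vertex, []):
            if neighbor not in parent:
                parent[neighbor] = vertex
                queue.append(neighbor)
    return None                        # target not reachable

adj = {'A': ['B', 'C'], 'B': ['D'], 'C': ['D'], 'D': ['E']}
print(bfs_shortest_path(adj, 'A', 'E'))  # -> ['A', 'B', 'D', 'E']
```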
Takeaway
The “Grokking” book series are fun books to read, as the authors make significant efforts to distill complex subjects into visuals and text that can be understood by all. For anyone looking for a high-level overview of the most popular data structures used in the computing world, this book doesn't disappoint. Most of the data structures are explained using Python, which makes the book accessible to a larger audience. The author has also written an advanced book on data structures for those who want to dig deeper.