This blog post is about the book "Grokking Algorithms" by Aditya Bhargava.

Introduction to algorithms

  • Binary search is a lot faster than simple search as your array gets bigger (a minimal sketch follows this list)
  • O(log n) is faster than O(n), and the gap widens as the list of items you are searching through grows
  • Algorithm speed isn’t measured in seconds
  • Algorithm times are measured in terms of growth of an algorithm
  • Algorithm times are written in big O notation
  • Common big O run times
    • log time: O(log n)
    • linear time: O(n)
    • log-linear time: O(n log n)
    • quadratic time: O(n²)
    • factorial time: O(n!)
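
To make the gap between O(n) simple search and O(log n) binary search concrete, here is a minimal binary search sketch in Python; the function and variable names are mine, not necessarily the book's:

```python
def binary_search(sorted_list, target):
    """Return the index of target in sorted_list, or None if absent."""
    low, high = 0, len(sorted_list) - 1
    while low <= high:
        mid = (low + high) // 2      # check the middle element
        guess = sorted_list[mid]
        if guess == target:
            return mid
        if guess < target:
            low = mid + 1            # target must be in the upper half
        else:
            high = mid - 1           # target must be in the lower half
    return None

print(binary_search([1, 3, 5, 7, 9], 7))   # 3
print(binary_search([1, 3, 5, 7, 9], 4))   # None
```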

Selection sort

  • Your computer’s memory is like a giant set of drawers
  • When you want to store multiple elements, use an array or a linked list
  • With an array, all your elements are stored right next to each other
  • Arrays allow fast reads
  • Linked lists allow fast inserts and deletes
  • With a linked list, elements are strewn all over, and one element stores the address of the next one.
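
The chapter's namesake algorithm is easy to sketch as well. This is a rough Python version of selection sort, repeatedly pulling the smallest remaining element into a new list; the book's exact code may differ:

```python
def find_smallest(arr):
    """Return the index of the smallest element."""
    smallest_index = 0
    for i in range(1, len(arr)):
        if arr[i] < arr[smallest_index]:
            smallest_index = i
    return smallest_index

def selection_sort(arr):
    """Build a sorted list by repeatedly removing the smallest element: O(n^2)."""
    arr = list(arr)   # work on a copy
    result = []
    while arr:
        result.append(arr.pop(find_smallest(arr)))
    return result

print(selection_sort([5, 3, 6, 2, 10]))   # [2, 3, 5, 6, 10]
```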

Recursion

  • Recursion is when a function calls itself
  • Every recursive function has two cases: the base case and the recursive case
  • A stack has two operations: push and pop
  • All function calls go onto the call stack
  • The call stack can get very large, which takes up a lot of memory
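
A tiny factorial example makes the base case / recursive case split concrete (my own toy example, not necessarily the one used in the book):

```python
def factorial(n):
    if n <= 1:                        # base case: stops the recursion
        return 1
    return n * factorial(n - 1)       # recursive case: the function calls itself

print(factorial(5))   # 120 — five frames go onto the call stack, then unwind
```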

Quicksort

  • Divide and Conquer is a programming technique that can be used for a large set of problems
  • Quicksort: pick a pivot, partition the array around it, and recurse on the partitions (sketch after this list)
  • Divide and Conquer works by breaking a problem down into smaller and smaller pieces
  • Given two algorithms with the same big O running time, one can be consistently faster than the other.
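
A minimal quicksort sketch in Python, assuming the simple take-the-first-element-as-pivot strategy; the book's version may differ in details:

```python
def quicksort(arr):
    """Divide and conquer: pick a pivot, partition, recurse on each side."""
    if len(arr) < 2:
        return arr                    # base case: 0 or 1 elements are already sorted
    pivot = arr[0]
    less = [x for x in arr[1:] if x <= pivot]
    greater = [x for x in arr[1:] if x > pivot]
    return quicksort(less) + [pivot] + quicksort(greater)

print(quicksort([10, 5, 2, 3]))   # [2, 3, 5, 10]
```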

Hash tables

  • A hash function needs to be consistent, and should map different inputs to different numbers
  • Use cases
    • hash tables for lookups
    • preventing duplicate entries
    • use hash tables as a cache
  • Any hash function you choose will have collisions, so you need a strategy to handle them
  • A hash function combined with an array is called a hash table. Keep the load factor low (resize when it grows too high) to maintain performance
  • Hash tables are good for modeling relationships from one item to another
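
Python dictionaries are hash tables, so the three use cases are easy to sketch. The names and data below are made up for illustration:

```python
# Lookups: map a name directly to a value
phone_book = {"jenny": "867-5309", "emergency": "911"}
print(phone_book["jenny"])

# Preventing duplicates: has this person already voted?
voted = {}
def check_voter(name):
    if name in voted:
        print(f"{name}: kick them out!")
    else:
        voted[name] = True
        print(f"{name}: let them vote!")

check_voter("tom")
check_voter("tom")   # second attempt is caught

# Caching: do the slow work only once per key
cache = {}
def get_page(url):
    if url not in cache:
        cache[url] = f"<html>data for {url}</html>"   # stand-in for a slow fetch
    return cache[url]

print(get_page("example.com"))
print(get_page("example.com"))   # second call hits the cache
```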

Breadth-first search

  • A graph models a set of connections. Each graph is made up of nodes and edges. Nodes are called in-neighbors or out-neighbors when the graph is directed
  • Operations on a queue are called enqueue and dequeue; in Python lingo, a deque exposes these as append and popleft
  • One can use the Python dictionary to store adjacent nodes
  • Topological sort is a linear ordering of vertices in a DAG such that for every directed edge (u, v), vertex u appears before v in the ordering
  • Queues are FIFO
  • Stacks are LIFO
  • Undirected graphs don’t have arrows and the relationship goes both ways
  • Directed graph has arrows, and the relationship follows the direction of the arrow
  • Breadth-first search tells you if there’s a path from A to B
  • If there’s a path, breadth-first search will find the shortest path

Trees

  • Trees are a type of graph
  • One can implement searching files in a directory using breadth-first or depth-first search (a depth-first sketch follows this list)
  • A tree is a connected, acyclic graph
  • Binary tree is a special type of tree where nodes can have at most two children.
  • Use cases of Binary trees: Huffman coding
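
For the directory example, a depth-first version is a short recursive function. The "pics" folder below is hypothetical; point it at any directory you have:

```python
from pathlib import Path

def print_names_dfs(directory):
    """Depth-first: recurse into each subdirectory as soon as we see it."""
    for path in sorted(Path(directory).iterdir()):
        if path.is_dir():
            print_names_dfs(path)
        else:
            print(path.name)

# print_names_dfs("pics")   # assumes a local "pics" folder exists
```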

Balanced Trees

  • Balanced trees achieve better performance than arrays for inserts and better performance than linked lists for searches
  • The height of a tree affects its performance
  • AVL trees are a popular type of self-balancing BST; like most balanced trees, they rebalance themselves through rotations
  • Splay trees are a different take on balanced BSTs. The cool thing about splay trees is if you have recently looked up an item, the next time you look it up, the lookup will be faster
  • B-trees are a generalized form of binary tree. They are often used for building databases
  • Seek time is like travel time to a grocery store. B-trees try to minimize seek time by reading more data in one go
  • B-trees have nodes that can have multiple keys and multiple child nodes

Dijkstra’s algorithm

  • Breadth-first search is used to calculate the shortest path for an unweighted graph
  • Dijkstra’s algorithm is used to calculate the shortest path for a weighted graph
  • Dijkstra’s algorithm works when all the weights are nonnegative
  • If you have negative weights, use the Bellman-Ford algorithm
  • I think it is better to implement it yourself in whatever language you are comfortable with before peeking at the implementation code given by the author
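
For what it's worth, here is a rough sketch of how one might write it in Python using a heap via heapq; this is my own take, not the author's implementation:

```python
import heapq

def dijkstra(graph, start):
    """Shortest distances from start in a graph with nonnegative edge weights.

    graph: dict mapping node -> dict of neighbor -> weight
    """
    dist = {start: 0}
    heap = [(0, start)]                      # (distance so far, node)
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue                         # stale entry; a shorter path was already found
        for neighbor, weight in graph[node].items():
            new_dist = d + weight
            if new_dist < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_dist
                heapq.heappush(heap, (new_dist, neighbor))
    return dist

graph = {
    "start": {"a": 6, "b": 2},
    "a": {"fin": 1},
    "b": {"a": 3, "fin": 5},
    "fin": {},
}
print(dijkstra(graph, "start"))   # {'start': 0, 'a': 5, 'b': 2, 'fin': 6}
```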

Greedy Algorithms

  • At each step, you pick the locally optimal choice; for some problems this leaves you with the globally optimal solution, for others only a good approximation
  • The knapsack problem, and an attempt to solve it via a greedy algorithm
  • Set covering problem - There is no known algorithm that solves it fast enough
  • Greedy algorithms are easy to write and fast to run, so they make good approximation algorithms
  • If you have an NP-hard problem, your best bet is to use an approximation algorithm (a set-cover sketch follows this list)
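
A sketch of the greedy approximation for set cover, along the lines of the book's radio-station example; the station names and state sets below are illustrative:

```python
# Which radio stations cover all the states we need?
states_needed = {"mt", "wa", "or", "id", "nv", "ut", "ca", "az"}

stations = {
    "kone": {"id", "nv", "ut"},
    "ktwo": {"wa", "id", "mt"},
    "kthree": {"or", "nv", "ca"},
    "kfour": {"nv", "ut"},
    "kfive": {"ca", "az"},
}

final_stations = set()
while states_needed:
    best_station = None
    states_covered = set()
    # Greedy step: pick the station that covers the most still-uncovered states.
    for station, states in stations.items():
        covered = states_needed & states
        if len(covered) > len(states_covered):
            best_station = station
            states_covered = covered
    states_needed -= states_covered
    final_stations.add(best_station)

print(final_stations)   # e.g. {'kone', 'ktwo', 'kthree', 'kfive'}
```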

Dynamic Programming

This chapter starts with the poster child of DP problems, the knapsack problem, and goes on to explain visually how it can be solved with DP. While solving the problem with the help of a grid, the author shows that

  • changing the order of rows in DP problem does not change the solution
  • filling in grid column-wise instead of row-wise does not change the solution
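
A rough Python version of that grid-filling, assuming integer weights; the items echo the book's guitar/stereo/laptop example, but the code itself is my own sketch:

```python
def knapsack(items, capacity):
    """Best achievable value for a knapsack of the given integer capacity.

    items: list of (name, weight, value) tuples.
    """
    n = len(items)
    # grid[i][c] = best value using the first i items with capacity c
    grid = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        name, weight, value = items[i - 1]
        for c in range(capacity + 1):
            best_without = grid[i - 1][c]
            if weight <= c:
                best_with = value + grid[i - 1][c - weight]
                grid[i][c] = max(best_without, best_with)
            else:
                grid[i][c] = best_without
    return grid[n][capacity]

items = [("guitar", 1, 1500), ("stereo", 4, 3000), ("laptop", 3, 2000)]
print(knapsack(items, 4))   # 3500 (guitar + laptop)
```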

There is also a section on the longest common substring. Frankly, I think I have forgotten most aspects of DP, and maybe I will revisit this chapter later. DP, at its heart, has recursion in it, and I think many problems can be solved once you identify certain patterns. It has been a while since I last did DP, and those skills feel very rusty.

k-nearest neighbors

  • KNN algorithm for regression and classification
  • Quick intro to ML
  • Feature extraction means converting an item into a list of numbers that can be compared
  • Picking good features is an important part of a successful KNN algorithm (toy sketch below)
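
A toy KNN classifier sketch using Euclidean distance and majority vote; the feature vectors and labels below are made up for illustration:

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(query, labeled_points, k=3):
    """Classify query by majority vote among its k nearest neighbors."""
    neighbors = sorted(labeled_points, key=lambda p: distance(query, p[0]))[:k]
    labels = [label for _, label in neighbors]
    return max(set(labels), key=labels.count)

# Toy data: (features, label) pairs; features are made-up measurements
points = [((5, 1), "orange"), ((4, 2), "orange"),
          ((1, 5), "grapefruit"), ((2, 4), "grapefruit")]
print(knn_classify((5, 2), points, k=3))   # orange
```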

Where to go next

A bunch of algorithms are covered at a high level, urging the reader to follow their curiosity:

  • Linear regression
  • Inverted indexes
  • Fourier Transform
  • Parallel algorithms
    • MapReduce
  • Bloom filters and HyperLogLog
  • Cryptography algo
  • Locality-sensitive hashing
  • Min heaps and priority queues
  • Linear Programming

Takeaway

The first edition of this book came out a decade ago, and I read it back then. The second edition expands on trees and other topics, but the spirit of the book remains the same: explaining beginner algorithms with visuals. One would not be an expert after reading this book, but one would certainly get an intuitive understanding of some of the common algorithms used in Computer Science. With this primer, one can then explore one's curiosities and go deep into any of the algorithm topics mentioned in the book. Having read it a second time after a decade, I still feel that I walked away with fresh eyes. Definitely worth a read for anyone who is curious about algorithms or who wants to understand how visuals can be used to explain them.