Advanced RAG

Andrew Ng and team offer a variety of short courses at DeepLearning.AI, one of which focuses on ways to improve the semantic search component of RAG. The default approach is to search a vector database for vectors that are semantically similar to the user query. The course discusses the limitations of this approach and explores several more sophisticated techniques for improving how a query is matched against the vector database.
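To make the default approach concrete, here is a minimal sketch of similarity search over a tiny in-memory "vector store". The embeddings, the document ids and the helper names (`cosine_similarity`, `top_k`) are all made up for illustration; a real setup would get its vectors from an embedding model and delegate the search to a vector database.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k=2):
    """Return the k (doc_id, score) pairs most similar to the query."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional embeddings (hypothetical values, not from a real model).
store = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
}
query = [1.0, 0.0, 0.0]

results = top_k(query, store, k=2)
print(results)
```

The brute-force scan is only meant to show the idea; it says nothing about how any particular vector database indexes or ranks its vectors.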

EU Regulation Feeds

A week ago, I had to work with green revenues data and ESG-related datasets. Realized that the terminology used in the feeds is something that I had never paid attention to. Managed to listen to a few webinars on EU regulation and data feeds to understand what is happening in this space. Here are some of my learnings:

EU Taxonomy - Clarity with Data (Webinar)

The EU Taxonomy has become quite important to many companies in the EU region. For someone like me who has never paid attention to this, this LSEG seminar was very useful for getting a basic understanding of the importance of the EU Taxonomy in the current financial environment.

Record Linkage Primer

In the last few weeks, I had to work on matching a large number of records relating to various companies with an internal feature-rich dataset for 10 million companies. Needless to say, there were no readily available standardized identifiers across the two databases that one could use to perform a join operation. The record matching had to be based on approximate matching and probabilistic matching. Until this piece of work came my way, I had never heard of “Record Matching” as a subject in itself, one that people do PhDs in.

Immersed myself in quite a few papers, books and blogposts to understand this field just enough so that I could get my work done. In the process I found several interesting talks, books, decks and papers. Hopefully in the days to come, I will try to summarize a few papers and books. This post summarizes a series of posts written by Robin Linacre, who works at the UK Ministry of Justice.
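As a flavour of what approximate matching looks like when there is no shared identifier to join on, here is a minimal sketch using plain string similarity from Python's standard library. This is not the probabilistic approach covered in the series of posts; it is a much simpler stand-in, and the company names, the suffix list, the 0.8 threshold and the helper names (`normalize`, `best_match`) are all assumptions made up for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical list of legal suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {"ltd", "limited", "inc", "plc", "corp", "corporation"}


def normalize(name):
    """Lowercase, replace punctuation with spaces, drop legal suffixes."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " "
                      for ch in name.lower())
    tokens = [t for t in cleaned.split() if t not in LEGAL_SUFFIXES]
    return " ".join(tokens)


def best_match(name, candidates, threshold=0.8):
    """Return (candidate, score) for the closest name, or (None, score)
    when even the best score falls below the threshold."""
    target = normalize(name)
    best_name, best_score = None, 0.0
    for cand in candidates:
        score = SequenceMatcher(None, target, normalize(cand)).ratio()
        if score > best_score:
            best_name, best_score = cand, score
    if best_score >= threshold:
        return best_name, best_score
    return None, best_score


# Made-up company names standing in for the two datasets.
candidates = ["Acme Holdings Ltd", "Globex Corporation", "Initech Inc"]
match, score = best_match("ACME Holdings Limited", candidates)
print(match, score)
```

Comparing every record against every candidate like this does not scale to 10 million companies; at that size some way of narrowing down the candidate pairs first becomes necessary.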

Mastering Shiny - Book Review

shiny is my go-to package for building interactive dashboards. I have built more than fifty shiny apps so far and have found the entire package infrastructure around it to be extremely useful in showcasing data, algos, metrics - you name it. This book by Hadley Wickham came out in 2021 but I never had a chance to go over it, until now. It was wonderful to see so many code patterns/hacks that I have learned over the years appearing in the book. Needless to say, I have learned so many fascinating aspects of shiny from this book. This blogpost summarizes some of my learnings/relearnings from the book:

Bubble Writing

Bubble writing is a way of writing where letters look bloated and puffy, like bubbles. This type of writing is popular in poster design, visual storytelling and related domains. Had never seen a full-fledged book written using bubble writing in the non-fiction genre, until I stumbled onto a book by Veronica Dearly.

Singapore FinTech Festival 2022 Notes

Singapore FinTech Festival is a great festival to attend to get an understanding of various aspects of the intersection of finance and technology. The festival is a great learning experience for anyone, as it brings together some of the best people and the best companies in the world, all in one place. This year, it was an in-person event with 324 talks, panel discussions, industry initiatives, product announcements, demos and workshops. The fact that there were 324 scheduled events over 3 days meant that it was physically impossible to digest everything. I managed to attend a few talks and events, and this blogpost summarizes some of the points from the talks. In these summaries, I have also added fantastic visual summaries created by thoth.art.

JavaScript - 10,000 ft. view

JavaScript ranks as one of the most popular languages for developers across the world. With the rise of the Internet and mobile devices, JavaScript has evolved too. For some reason, to date, I had never paid attention to or understood how various frameworks work at a 10,000 ft view. Since I never really worked in or learned about this space, my understanding was really vague. In my mind, JavaScript equated to some script that helps with interactivity on the browser side.

Became curious to at least understand the various types of frameworks popular in this space and here is what I have understood as a rookie:

Effective Python - Book Review

A few months ago, I had this thought of practicing Python every day for 20 minutes. If you use Python in your daily work, you should not rely on that work as a substitute for a deliberate practice session. This was also echoed by Josh Kaufman in his book, The First 20 Hours, where he could not rely on daily work that involved typing as a substitute for a deliberate practice session on touch typing. If you are trying to learn touch typing, you might assume that since you are anyway typing emails, reports etc., you are in essence doing deliberate practice. Not really. Once you are in a deliberate practice session, your focus becomes the craft itself rather than the outcome of a specific task. Unless you set aside some time for the task on a regular basis, it is difficult to improve in any skill, be it touch typing or coding in Python.

In any case, setting aside a 20 min time slot for going through the book, “Effective Python”, helped me read this book slowly and digest all the wonderful information present in it. This book cannot be consumed in a few sittings. It will take quite some time to read, to think and to understand the various ways in which one could improve the craft of coding.

This blogpost summarizes some of the main points from the book.

Python Concurrency with asyncio

A week ago, I was working on a project that involved calling a REST API endpoint 32 million times to retrieve a certain type of document. The input to the API was a presigned URL that had a validity of a few days. Hence I did not have the luxury of doing things sequentially. A rough calculation of the time taken to perform the task using a simple for loop made me realize that the task was a nice little use case for parallelizing. That’s when I started looking at asyncio. In the first go at my task, I went with a standard approach of using multithreading functions in Python. However, there was always an itch to see if I could get better performance using asyncio and multithreading. The book titled “Python Concurrency with asyncio”, written by Matthew Fowler, helped me understand the basics of concurrent and parallel computing with asyncio. Subsequently I went back and performed the task of pinging an API 32 million times to retrieve 32 million JSON documents using asyncio and multithreading. In this post, I will summarize a few chapters that I found useful for getting my work done.
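To give a sense of why asyncio fits this kind of I/O-bound job, here is a minimal sketch of firing many requests concurrently while capping how many are in flight at once. It is not the code I used for the actual task: `fetch_document` is a hypothetical stand-in that sleeps instead of making an HTTP call, the `max_concurrency` value is an arbitrary assumption, and the multithreading part is left out entirely.

```python
import asyncio


async def fetch_document(doc_id):
    """Hypothetical stand-in for one API call; sleeps to mimic network wait."""
    await asyncio.sleep(0.01)
    return {"id": doc_id}


async def fetch_all(doc_ids, max_concurrency=100):
    """Fetch all documents concurrently, with at most max_concurrency
    requests in flight at any moment."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded_fetch(doc_id):
        # The semaphore blocks here once max_concurrency fetches are running.
        async with semaphore:
            return await fetch_document(doc_id)

    # gather returns results in the same order as the input ids.
    return await asyncio.gather(*(bounded_fetch(d) for d in doc_ids))


documents = asyncio.run(fetch_all(range(50), max_concurrency=10))
print(len(documents))
```

With 50 ids and a cap of 10, the simulated calls run in roughly five waves rather than fifty sequential waits, which is the whole point of not using a simple for loop. For millions of calls, one would also feed the ids in batches rather than creating every coroutine up front.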