Microsoft Python Programming Interview Question

I want to discuss an interview question that I was asked at Microsoft while interviewing for the position of a Research Fellow (RF). The question primarily deals with Python programming knowledge and requires an understanding of generators. In this post I will discuss the programming topics needed to solve the question and finally give a solution to it. The question is pretty open-ended, and I was asked to write the code in a Google Doc....
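(As a quick refresher on the prerequisite, here is a minimal Python generator sketch of my own; the `countdown` example is illustrative, not the interview question itself.)

```python
def countdown(n):
    """Yield n, n-1, ..., 1 lazily, one value per request."""
    while n > 0:
        yield n  # execution pauses here until the next value is asked for
        n -= 1

gen = countdown(3)
print(next(gen))  # 3
print(next(gen))  # 2
print(list(gen))  # [1] -- the generator resumes where it left off
```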

April 10, 2024 · 4 min · Sathvik Joel

Forest in LaTeX

I am working on one of my research papers and came across this cool LaTeX package called Forest 🌳. The documentation can be found here. Here is a cool example from this paper: a Forest that summarizes a survey work, used as an example. You can produce a Forest like this from the code below. Note that the bib file is missing, so the code won't run, but you get the idea 💡...
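(Since the post's own snippet is cut off in this excerpt, here is a minimal Forest sketch of my own to give a flavor of the bracket syntax; the node labels are made up.)

```latex
\documentclass{standalone}
\usepackage{forest}
\begin{document}
\begin{forest}
  for tree={grow=east, draw, rounded corners}  % style applied to every node
  [Survey
    [Methods
      [Supervised]
      [Unsupervised]]
    [Applications
      [Vision]
      [NLP]]]
\end{forest}
\end{document}
```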

March 12, 2024 · 2 min · Sathvik Joel

Transformer Activation Functions and Their Details

Here are a few observations: GPT-2, developed by OpenAI, opts for the GELU (Gaussian Error Linear Unit) activation function. On the other hand, LLaMA, a creation of Facebook Research, embraces the SwiGLU activation function. Meanwhile, Gemma, a PyTorch implementation by Google, adopts the GeGLU activation function. So what are these new activation functions? How should one go about implementing them in PyTorch? In this blog post I try to understand the definitions of these activation functions and how they could be implemented in PyTorch....
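(A minimal PyTorch sketch of the textbook definitions, assuming standard `torch.nn` building blocks; this is illustrative, not the exact code from any of these models.)

```python
import torch.nn as nn
import torch.nn.functional as F

# GELU: x * Phi(x), where Phi is the standard normal CDF.
gelu = nn.GELU()  # PyTorch ships this directly

class SwiGLU(nn.Module):
    """Gated unit: SiLU(x W) * (x V), the pattern used in LLaMA-style FFNs."""
    def __init__(self, dim, hidden):
        super().__init__()
        self.w = nn.Linear(dim, hidden, bias=False)  # gate branch
        self.v = nn.Linear(dim, hidden, bias=False)  # value branch

    def forward(self, x):
        return F.silu(self.w(x)) * self.v(x)

class GeGLU(nn.Module):
    """Same gating pattern, but with GELU on the gate branch."""
    def __init__(self, dim, hidden):
        super().__init__()
        self.w = nn.Linear(dim, hidden, bias=False)
        self.v = nn.Linear(dim, hidden, bias=False)

    def forward(self, x):
        return F.gelu(self.w(x)) * self.v(x)
```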

March 5, 2024 · 6 min · Sathvik Joel

Understanding Andrej's Tokenizer Video

This blog is based on Andrej's video on the Tokenizer 📽️. In an interview with Lex Fridman, Andrej said he takes 10 hours to make 1 hour of content; this is me trying to decode the other 9 hours. In this post I won't be concentrating on the Python implementation of the BPE algorithm itself. The idea is to uncover the details and subtle points around the video and look closely at this very dense video lecture by Andrej....
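(For context only, a minimal sketch of one BPE merge step in the spirit of the video's code; the helper names are mine.)

```python
from collections import Counter

def most_common_pair(ids):
    """Count adjacent id pairs and return the most frequent one."""
    return Counter(zip(ids, ids[1:])).most_common(1)[0][0]

def merge(ids, pair, new_id):
    """Replace every occurrence of `pair` in `ids` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

ids = list(b"aaabdaaabac")    # raw bytes as integer ids
pair = most_common_pair(ids)  # (97, 97) -- "aa" is most frequent
ids = merge(ids, pair, 256)   # one BPE merge: mint token 256 for "aa"
```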

March 4, 2024 · 17 min · Sathvik Joel

FlashAttention: Before flash

In this blog I delve into the intricate mathematical subtleties that abound in FlashAttention, a paper I recently came across. The primary aim is to unravel these mathematical complexities, offering readers a key to a deeper comprehension of the complete paper. The focus of this blog remains exclusively on the mathematical nuances, tailored to resonate with those who possess a keen mathematical acumen. Transformers rely on a core operation called attention calculation....
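(For reference, a naive sketch of that attention calculation, the O(n²)-memory computation FlashAttention reorganizes into tiles; the function is illustrative, not the paper's algorithm.)

```python
import torch

def naive_attention(q, k, v):
    """Reference softmax attention; materializes the full n x n score matrix."""
    d = q.shape[-1]
    scores = q @ k.transpose(-2, -1) / d**0.5  # (n, n) similarity matrix
    # safe softmax: subtract the row max before exponentiating,
    # the numerical trick at the heart of the paper's derivation
    scores = scores - scores.max(dim=-1, keepdim=True).values
    weights = torch.softmax(scores, dim=-1)
    return weights @ v

q = k = v = torch.randn(8, 16)  # toy sizes: n=8 tokens, d=16 head dim
out = naive_attention(q, k, v)
```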

August 30, 2023 · 7 min · Sathvik Joel