Implementing Transformers step-by-step in PyTorch from scratch

14 mins read Doing away with clunky for-loops, the transformer instead finds a way to allow whole sentences to simultaneously enter the network […]

Delving into GPT-2 and GPT-3 Language Models

32 mins read This year, we saw a dazzling application of machine learning. The OpenAI GPT-2 exhibited an impressive ability to write coherent and passionate […]

A comprehensive tutorial on Transformers Architecture

43 mins read We’ve been hearing a lot about Transformers and with good reason. They have taken the world of NLP by storm […]

A complete guide to writing custom Datasets and DataLoader in PyTorch

19 mins read Table of Contents: An Introduction to PyTorch Dataset and DataLoader; Why Write Good Data Loaders and Datasets?; The Basic PyTorch Dataset Structure; Implementing […]

Review of intuitions behind the recent advances in NLP: From RNNs to Transformers and BERT

48 mins read Few areas of AI are more exciting than NLP right now. In recent years, language models (LMs), which can perform […]

Understanding 1D, 2D, and 3D convolutional layers in deep neural networks

21 mins read In deep learning, convolutional layers have been major building blocks in many deep neural networks. The design was inspired by […]

Understanding Attention Mechanism with example

14 mins read For decades, Statistical Machine Translation was the dominant translation model, until the birth of Neural Machine Translation (NMT). NMT is an […]

Hyperparameter optimization techniques in machine learning with Python code

10 mins read In every Machine Learning project, it is possible and recommended to search the hyperparameter space to get the best performance […]

A guide to different Cross-Validation methods in Machine Learning

19 mins read In machine learning (ML), generalization usually refers to the ability of an algorithm to be effective across various inputs. It […]

Example of Beam search in Sequence to Sequence models

7 mins read In this article, you will get a detailed explanation of how neural machine translation was developed using a sequence-to-sequence algorithm […]

An illustrated guide to Attention Mechanism in Sequence Models with PyTorch code

22 mins read In this article, I will be covering the main concepts behind Attention, including the implementation of a sequence-to-sequence Attention model, […]

Why does LASSO regression (L1 regularization) shrink coefficients to zero but Ridge does not?

11 mins read We read almost everywhere that Lasso regression encourages zero coefficients and hence also provides a great tool for variable selection, but it […]

Bahdanau and Luong Attention Mechanisms explained

11 mins read Conventional encoder-decoder architectures for machine translation encoded every source sentence into a fixed-length vector, irrespective of its length, from which […]

The BERT Model

17 mins read The year 2018 has been an inflection point for machine learning models handling text (or more accurately, Natural Language Processing […]

Using BERT for Sentence Sentiment Classification

11 mins read Progress has been rapidly accelerating in machine learning models that process language over the last couple of years. This progress […]

Seq2Seq models, Attention Mechanism, and Transformers Explained

29 mins read Sequence-to-sequence models are deep learning models that have achieved a lot of success in tasks like machine translation, text summarization, […]

A review of techniques for Time Series prediction

43 mins read Working with time series data? Here’s a guide for you. In this article, you will learn how to compare and […]

Deep Reinforcement Learning: Using policy-based methods to play Pong from pixels

34 mins read This is a long-overdue blog post on Reinforcement Learning (RL). RL is hot! You may have noticed that computers can […]

Understanding Attention Mechanism in Sequence-to-Sequence Machine Translation

39 mins read Recurrent Neural Networks (or more precisely LSTM/GRU) have been found to be very effective in solving complex sequence-related problems […]

Automatic Differentiation Explained

8 mins read There are several methods to calculate gradients in computer programs: (1) Manual differentiation; (2) Symbolic differentiation; (3) Finite differences […]