2023-01-11

Machine Learning for Big Data using PySpark with real-world projects

10 mins read Introduction: I have prepared a GitHub repository that provides a set of self-study tutorials on Machine Learning for big data […]
2022-07-14

Setup Apache Spark on a multi-node cluster

12 mins read This article covers the basic steps to install and configure Apache Spark 3.1.1 on a multi-node cluster, which includes installing Spark […]
2022-04-09

Delving into GPT-2 and GPT-3 Language Models

32 mins read This year, we saw a dazzling application of machine learning. OpenAI's GPT-2 exhibited an impressive ability to write coherent and passionate […]
2022-03-22

PySpark equivalent methods for Pandas dataframes

8 mins read Pandas is the go-to library for every data scientist. It is essential for anyone who wishes to manipulate data […]
2022-03-08

Review of intuitions behind the recent advances in NLP: From RNNs to Transformers and BERT

48 mins read Few areas of AI are more exciting than NLP right now. In recent years, language models (LMs), which can perform […]
2022-02-22

Hyperparameter optimization techniques in machine learning with Python code

10 mins read In every Machine Learning project, it is possible and recommended to search the hyperparameter space to get the best performance […]
2022-02-18

A guide on PySpark Window Functions with Partition By

11 mins read PySpark window functions are useful when you want to examine relationships within groups of data rather than between groups of […]
2022-02-17

Setting up a multi-node Apache Spark Cluster on a local Windows machine with VirtualBox

6 mins read Prerequisite: Understand how to install Ubuntu inside Windows using Oracle VM VirtualBox from this link. Apache Spark is a fast and […]
2021-07-08

What is Word2vec word embedding?

24 mins read I find the concept of embeddings to be one of the most fascinating ideas in machine learning. If you’ve ever […]
2021-06-21

Data Science and Machine Learning Cheat Sheets

5 mins read Click on the links to get the high-resolution cheat sheets: Algebra, Linear Algebra, Calculus, Probability, Statistics, Python, R, Machine Learning […]
2019-07-17

A quick review of Apache Kafka

27 mins read Introduction: Kafka is a word you hear a lot nowadays. Many leading digital companies seem to use it. […]