2022-08-01

Audio source separation (vocal remover) system based on Deep Learning

12 mins read Are you looking for that instrumental version of your favorite song? Or are you a DJ […]
2022-08-01

A simple tutorial on Sampling Importance and Monte Carlo with Python codes

16 mins read In this post, I’m going to explain importance sampling. Importance sampling is an approximation method instead of a […]
2022-03-28

Bulk Boto3 (bulkboto3): Python package for fast and parallel transferring a bulk of files to S3 based on boto3!

5 mins read “How to transfer a bulk of […]
2021-12-26

Walk-forward optimization for algorithmic trading strategies on cloud architecture

11 mins read Table of Contents: Introduction Terminology Walk-forward Optimization Design of walk-forwards The Architecture Configuring cloud machines using Ansible Docker Swarm Optimization […]
2023-01-11

Machine Learning for Big Data using PySpark with real-world projects

10 mins read I have prepared a GitHub repository that provides a set of self-study tutorials on Machine Learning for big data […]
2022-11-26

Coursera Deep Learning Specialization Notes

3 mins read A couple of years ago I completed the Deep Learning Specialization taught by AI pioneer Andrew Ng. I found this series […]
2022-11-16

Repository for implementation of statistics concepts for Data Science in Python

3 mins read The field of statistics is becoming increasingly important in the world of data science and machine learning. I have recently […]
2022-10-24

How to return pandas dataframes from Scikit-Learn transformations: New API simplifies data preprocessing

3 mins read Scikit-learn, a popular Python library for machine learning, is often one of the first tools introduced to data science beginners. […]
2022-10-15

Setting up Apache Airflow using Docker-Compose

11 mins read Despite being pretty late to the party (Airflow became an Apache Top-Level Project in 2019), I still had trouble finding […]
2022-09-26

Handling cyclical features, such as hours in a day, for machine learning pipelines with Python example

11 mins read What’s the difference between 23 and 1? If we’re talking about time, it’s 2. Hours of the day, days of […]
2022-09-23

Implementing Attention Mechanism in Python

7 mins read The attention mechanism was introduced to improve the performance of the encoder-decoder model for machine translation. The idea behind the […]
2022-09-20

Improvements in Deep Q-Learning with Python code: Dueling Double DQN, Prioritized Experience Replay, and Fixed Q-targets

28 mins read Deep Q-Learning was introduced in 2014. Since then, a lot of improvements have been made. So, today we’ll see four […]
2022-08-30

Setup collaborative MLflow with PostgreSQL as Tracking Server and MinIO as Artifact Store using docker containers

14 mins read In this post, I will show how to configure MLflow in a way that allows multiple data scientists using different […]
2022-08-29

Understanding different types of Scikit Learn Cross Validation methods

14 mins read Cross-validation is an important concept in machine learning that helps data scientists in two major ways: it can reduce the […]
2022-08-24

Performing A/B test in Python example – A case study from Udacity Data Scientist Nano Degree

11 mins read This is a simple walkthrough of an A/B test case study developed and used by Udacity. It is part of […]
2022-08-18

A guide on regression error metrics (MSE, RMSE, MAE, MAPE, sMAPE, MPE) with Python code

25 mins read Regressions are one of the most commonly used tools in a data scientist’s kit. The quality of a regression model is how […]
2022-08-14

The default Random Forest feature importance is not reliable: Understanding Permutation Feature Importance

47 mins read The scikit-learn Random Forest feature importance and R’s default Random Forest feature importance strategies are biased. To get reliable results […]
2022-08-12

A review on information theory concepts for machine learning: Entropy, Cross-Entropy, KL divergence, Information gain, and Mutual Information

58 mins read Information theory is a field of study concerned with quantifying information for communication. It is a subfield of mathematics […]
2022-08-11

A tutorial on Apache Cassandra data modeling – RowKeys, Columns, Keyspaces, Tables, and Keys

24 mins read In this post, I will discuss the basic concepts of data modeling in Apache Cassandra. It is important to understand […]
2022-08-05

Understanding Gradient Boost Regression by numerical examples and Python Code

13 mins read Gradient boost is a machine learning algorithm that works on the ensemble technique called ‘Boosting’. Like other boosting models, Gradient […]
2022-08-02

Measure the correlation between numerical and categorical variables and the correlation between two categorical variables in Python: Chi-Square and ANOVA

27 mins read Data analysis is an essential part of any research or business endeavor, and one of the most fundamental techniques is […]
2022-07-30

What is Reservoir Sampling in Stream Processing?

4 mins read Reservoir sampling is a fascinating algorithm that is especially useful when you have to deal with streaming data, which is […]