Aditya Gupta, Himanshu Sharma, Priyanshi Singh, Sanchita Paul.

Roughly 2,500 years ago, the philosopher-mystic Pythagoras claimed that everything can be expressed in numbers. At that time, no one understood him. Today, we are witnessing a digital breakthrough in which machines analyze large amounts of data on decisions made by people…

LightGBM is an implementation of gradient boosting. The "Light" refers to its lightweight design, which makes training faster while keeping accuracy competitive.

1. It is a fast, distributed, high-performance GBM framework based on decision trees.

2. It splits the tree leaf-wise (other frameworks, such as XGBoost, split level-wise by default).
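To make the leaf-wise versus level-wise distinction concrete, here is a toy sketch. It is not LightGBM's or XGBoost's actual implementation: the tree layout, the `GAINS` table, and both function names are made up for illustration. Each node has a hypothetical gain from being split, and the two strategies spend the same budget of four splits in a different order.

```python
import heapq

# Hypothetical split gains, keyed by node id in a binary tree
# (the children of node i are 2*i + 1 and 2*i + 2). The numbers are invented.
GAINS = {0: 10.0, 1: 8.0, 2: 1.0, 3: 6.0, 4: 5.0, 5: 0.5, 6: 0.4}


def leaf_wise(gains, max_splits):
    """Best-first growth: always split the current leaf with the largest gain."""
    heap = [(-gains[0], 0)]  # max-heap via negated gains
    order = []
    while heap and len(order) < max_splits:
        _, node = heapq.heappop(heap)
        order.append(node)
        for child in (2 * node + 1, 2 * node + 2):
            if child in gains:
                heapq.heappush(heap, (-gains[child], child))
    return order


def level_wise(gains, max_splits):
    """Depth-wise growth: split every leaf on one level before going deeper."""
    order = []
    level = [0]
    while level and len(order) < max_splits:
        next_level = []
        for node in level:
            if len(order) == max_splits:
                break
            order.append(node)
            next_level.extend(
                c for c in (2 * node + 1, 2 * node + 2) if c in gains
            )
        level = next_level
    return order


leaf_order = leaf_wise(GAINS, 4)    # [0, 1, 3, 4]
level_order = level_wise(GAINS, 4)  # [0, 1, 2, 3]
print(leaf_order, sum(GAINS[n] for n in leaf_order))    # total gain 29.0
print(level_order, sum(GAINS[n] for n in level_order))  # total gain 25.0
```

With the same number of splits, the leaf-wise order skips the low-gain node 2 and goes deeper on the promising branch. This is the intuition behind leaf-wise growth, and also the reason such trees can become deep and unbalanced.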

Natural Language Processing (NLP) is becoming one of the most widely used techniques. It is used to program computers to understand, process, and analyze large amounts of natural-language data, such as text and speech.

Let’s take an example…

What is hierarchical clustering? By definition, it is an unsupervised method that builds groups of similar items either from the top down or from the bottom up.

There are two major types of this clustering: Agglomerative (bottom-up) and Divisive (top-down).
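As a rough illustration of the bottom-up (agglomerative) idea, here is a minimal sketch. It assumes 1-D points and single linkage (the distance between two clusters is the distance between their closest members); the function name and the sample data are hypothetical.

```python
def agglomerative(points, n_clusters):
    """Bottom-up clustering of 1-D points using single linkage."""
    # Start with every point in its own cluster.
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        best = None  # (distance, i, j) of the closest pair of clusters
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # Merge the closest pair and drop the absorbed cluster.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


points = [1.0, 1.5, 5.0, 5.2, 9.0]
result = agglomerative(points, 3)
print(result)  # [[1.0, 1.5], [5.0, 5.2], [9.0]]
```

A divisive method would run the other way: start with one cluster holding every point and split it repeatedly.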

I will explain the first type of clustering first and then the…

Before we dive into what regularized regression is, let us understand what high bias and overfitting mean.

Let us take a simple example: What will happen if we ask Sachin Tendulkar (Indian cricketer) to solve a data science problem? He will be unable to do it, right? …

Before we move on to checking the assumptions, let us first understand why we need to check them before fitting a model.

Why do we do this? You don’t need the assumptions to obtain a best-fit line, but your parameter estimates may be biased or have high variance…

**What is Machine Learning?**

Machine Learning is the study of computer algorithms that improve automatically through experience and by the use of data. …

Before we begin with A/B testing, it is essential that we first understand hypothesis testing.

In simple words, a hypothesis is an idea that can be tested. In hypothesis testing, we try to reject the *status quo* (the null hypothesis) in favor of an alternative, such as a change or innovation.

…

We know that descriptive statistics provide information about our immediate group of data.

For example, we could calculate the mean and standard deviation of the exam marks of 100 students, which would give us valuable information about this group of 100 students.
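A minimal sketch of that calculation follows. The marks are randomly generated stand-ins (the original data is not available), and the variable names are hypothetical. Because the statistics describe only this group of 100 students, the sketch uses the population standard deviation, `statistics.pstdev`, rather than the sample version.

```python
import random
import statistics

# Hypothetical exam marks for 100 students, generated with a fixed seed
# so the run is repeatable. Real marks would replace this list.
rng = random.Random(0)
marks = [rng.randint(35, 100) for _ in range(100)]

mean_mark = statistics.mean(marks)
# pstdev treats the 100 marks as the whole group being described.
std_mark = statistics.pstdev(marks)

print(f"students: {len(marks)}")
print(f"mean: {mean_mark:.2f}")
print(f"standard deviation: {std_mark:.2f}")
```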

Any group of data like this, which…

**Random Variables**

A random variable is a numerical description of the outcome of a statistical experiment.
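As a small worked example (not from the original text), take the experiment of tossing a fair coin twice and let X be the number of heads. The sketch below lists the outcomes, maps each one to a number, and builds the resulting probability distribution. The names `X`, `pmf`, and `expected` are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All equally likely outcomes of tossing a fair coin twice.
outcomes = list(product("HT", repeat=2))  # ('H','H'), ('H','T'), ...


def X(outcome):
    """The random variable: number of heads in the outcome."""
    return outcome.count("H")


# Probability of each value of X, kept exact with fractions.
counts = Counter(X(o) for o in outcomes)
pmf = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}

# Expected value: sum of value times probability.
expected = sum(x * p for x, p in pmf.items())

print(pmf)       # {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
print(expected)  # 1
```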

**Types of Random Variable**

Random variables whose set of values can be written either as a finite sequence x1, x2, …, xn, or as an infinite sequence x1, x2, …, are said to be discrete. …