**Problem Statement**

A sports hosting company wants to decide whether to host a cricket match between India and South Africa based on weather data. The available weather data has attributes such as outlook, temperature, humidity and wind, along with a decision variable: how many hours were played. …

As of today, FastAPI is one of the most popular web frameworks for building microservices with Python 3.6+. By deploying machine learning models in a microservice-based architecture, we make code components reusable, easier to maintain and test, and, of course, quick to respond. FastAPI is built on ASGI (Asynchronous Server Gateway Interface)…

Named Entity Recognition is one of the most important steps, or I would say the starting step, in Information Retrieval. Information Retrieval is the technique of extracting important and useful information from unstructured raw text documents. Named Entity Recognition (NER) works by locating and identifying the named entities present in unstructured text into…

**spaCy** is designed specifically for **production use**. It helps you build applications that process and “understand” large volumes of text. It can be used to build **information extraction** or **natural language understanding** systems or to pre-process text for **deep learning**. …

Bayes’ theorem is an extension of conditional probability. Conditional probability gives us the probability of A given B, denoted by P(A|B). Bayes’ theorem says that if we know P(A|B), then we can determine P(B|A), provided that P(A) and P(B) are also known to us.
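A minimal sketch of that relationship, P(B|A) = P(A|B) · P(B) / P(A), is shown below. The function name `bayes_posterior` and the probabilities passed to it are made up for illustration and do not come from the original post.

```python
def bayes_posterior(p_a_given_b: float, p_a: float, p_b: float) -> float:
    """Return P(B|A) computed as P(A|B) * P(B) / P(A)."""
    if p_a <= 0:
        # P(B|A) is undefined when the conditioning event A cannot occur
        raise ValueError("P(A) must be positive")
    return p_a_given_b * p_b / p_a


# Hypothetical inputs, chosen only to illustrate the formula
p_b_given_a = bayes_posterior(p_a_given_b=0.8, p_a=0.2, p_b=0.1)
print(round(p_b_given_a, 4))
```

With these assumed inputs the arithmetic is 0.8 × 0.1 / 0.2, so knowing P(A|B) together with the two marginal probabilities is enough to recover P(B|A).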

As the name suggests, conditional probability is the probability of an event under some given condition. Based on that condition, our sample space reduces to the outcomes in which the condition holds.

For example, find the probability that a person subscribes to insurance given that they have taken a house loan. …
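The idea of a reduced sample space can be sketched with the insurance example. The customer counts below are invented for illustration, and the helper name `prob_insurance_given_loan` is hypothetical; the point is only that we first keep the customers who satisfy the condition and then count within that smaller group.

```python
from fractions import Fraction

# Hypothetical records: (has_house_loan, has_insurance)
customers = (
    [(True, True)] * 100      # loan and insurance
    + [(True, False)] * 300   # loan only
    + [(False, True)] * 50    # insurance only
    + [(False, False)] * 550  # neither
)


def prob_insurance_given_loan(records):
    """Return P(insurance | house loan) as an exact fraction."""
    # Reduce the sample space to customers who have a house loan
    with_loan = [r for r in records if r[0]]
    if not with_loan:
        raise ValueError("no customer satisfies the condition")
    # Count the insured customers inside the reduced sample space
    insured = sum(1 for r in with_loan if r[1])
    return Fraction(insured, len(with_loan))


print(prob_insurance_given_loan(customers))
```

In this made-up data set, 400 of the 1,000 customers have a house loan and 100 of those also hold insurance, so the conditional probability is 100/400 even though 150 customers are insured overall.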

Principal Component Analysis, or PCA, is a widely used technique for reducing the dimensionality of large data sets. Reducing the number of components or features costs some accuracy, but on the other hand it makes a large data set simpler and easier to explore and visualize. Also, it reduces the computational…

Logistic regression is one of the most widely used machine learning algorithms for classification problems. In its original form, it is used for binary classification problems, which have only two classes to predict. However, with a little extension and some ingenuity, logistic regression can easily be used for a multi-class classification problem…

In R, stepAIC is one of the most commonly used search methods for feature selection. We keep minimizing the AIC value reported by stepAIC to arrive at the final set of features. “stepAIC” does not necessarily improve model performance; however, it is used to simplify the…

There are several questions related to multicollinearity, as listed below:

- What is multicollinearity?
- How is multicollinearity related to correlation?
- What problems does multicollinearity cause?
- What is the best way to detect multicollinearity in the model?
- How do we handle or remove multicollinearity from the model?

We will try to understand each of the questions in this post one by…

Lead Data Scientist. AI Content Creator. Blog: www.ashutoshtripathi.com YouTube: https://www.youtube.com/c/AshutoshTripathi_AI