Microservice imlementation using FastAPI | Ashutosh Tripathi | Data Science Duniya

As of today, FastAPI is the most popular web framework for building microservices with python 3.6+ versions. By deploying machine learning models as microservice-based architecture, we make code components re-usable, highly maintained, ease of testing, and of-course the quick response time. FastAPI is built over ASGI (Asynchronous Server Gateway Interface) instead of flask’s WSGI (Web Server Gateway Interface). This is the reason it is faster as compared to flask-based APIs.

It has a data validation system that can detect any invalid data type at the runtime and returns the reason for bad inputs to the user in the JSON format…

Text Processing using spaCy | NLP Library

Named Entity Recognition is the most important, or I would say, the starting step in Information Retrieval. Information Retrieval is the technique to extract important and useful information from unstructured raw text documents. Named Entity Recognition NER works by locating and identifying the named entities present in unstructured text into the standard categories such as person names, locations, organizations, time expressions, quantities, monetary values, percentage, codes etc. Spacy comes with an extremely fast statistical entity recognition system that assigns labels to contiguous spans of tokens.

Spacy Installation and Basic Operations | NLP Text Processing Library | Part 1

Spacy provides…

Text Preprocessing steps using spaCy, the NLP library

spaCy” is designed specifically for production use. It helps you build applications that process and “understand” large volumes of text. It can be used to build information extraction or natural language understanding systems or to pre-process text for deep learning. In this article, you will learn about Tokenization, Lemmatization, Stop Words and Phrase Matching operations using spaCy.

This is article 2 in the spaCy Series. In my last article, I have explained about spaCy Installation and basic operations. If you are new to this, I would suggest starting from article 1 for a better understanding.

Article 1 — spaCy-installation-and-basic-operations-nlp-text-processing-library/


Bayes Theorem is the extension of Conditional probability. Conditional probability helps us to determine the probability of A given B, denoted by P(A|B). So Bayes’ theorem says if we know P(A|B) then we can determine P(B|A), given that P(A) and P(B) are known to us.

In this post, I am concentrating on Bayes’ theorem assuming you have a good understanding of Conditional probability. In case you want to revise your concepts, you may refer my previous post on Conditional probability with examples.

Formula derivation:

From conditional probability, we know that

  • P(A|B) = P(A and B)/P(B)
  • P(A and B) = P(B) * P(A|B)…

As the name suggests, Conditional Probability is the probability of an event under some given condition. And based on the condition our sample space reduces to the conditional element.

For example, find the probability of a person subscribing for the insurance given that he has taken the house loan. Here sample space is restricted to the persons who have taken house loan.

To understand Conditional probability, it is recommended to have an understanding of probability basics like Mutually Exclusive and Independent Events, Joint, Union and Marginal Probabilities and Probability vs Statistics etc. …

Step by Step Explanation of PCA using python with example

Principal Component Analysis or PCA is a widely used technique for dimensionality reduction of the large data set. Reducing the number of components or features costs some accuracy and on the other hand, it makes the large data set simpler, easy to explore and visualize. Also, it reduces the computational complexity of the model which makes machine learning algorithms run faster. It is always a question and debatable how much accuracy it is sacrificing to get less complex and reduced dimensions data set. …

Logistic regression is the most widely used machine learning algorithm for classification problems. In its original form, it is used for binary classification problem which has only two classes to predict. However, with little extension and some human brain, logistic regression can easily be used for a multi-class classification problem. In this post, I will be explaining about binary classification. I will also explain the reason behind maximizing the log-likelihood function.

To understand logistic regression, it is required to have a good understanding of linear regression concepts and it’s cost function that is nothing but the minimization of the sum…

In R, stepAIC is one of the most commonly used search method for feature selection. We try to keep on minimizing the stepAIC value to come up with the final set of features. “stepAIC” does not necessarily mean to improve the model performance, however, it is used to simplify the model without impacting much on the performance. So AIC quantifies the amount of information loss due to this simplification. AIC stands for Akaike Information Criteria.

If we are given two models then we will prefer the model with lower AIC value. Hence we can say that AIC provides a means…

There are different questions related to Multicollinearity as below:

  • What is Multicollinearity?
  • How Multicollinearity is related to correlation?
  • Problems with Multicollinearity.
  • Best way to detect multicollinearity in the model.
  • How to handle/remove Multicollinearity from the model?

We will try to understand each of the questions in this post one by one.


Multicollinearity occurs in a multilinear model where we have more than one predictor variables. So Multicollinearity exists when we can linearly predict one predictor variable (note not the target variable) from other predictor variables with a significant degree of accuracy. It means two or more predictor variables are highly…

Feature selection is a way to reduce the number of features and hence reduce the computational complexity of the model. Many times feature selection becomes very useful to overcome with overfitting problem. It helps us in determining the smallest set of features that are needed to predict the response variable with high accuracy. if we ask the model, does adding new features, necessarily increase the model performance significantly? if not then why to add those new features which are only going to increase model complexity.

So now let's understand how can we select the important set of features out of…

Ashutosh Tripathi

Lead Data Scientist. AI Content Creator. Blog: www.ashutoshtripathi.com YouTube: https://www.youtube.com/c/AshutoshTripathi_AI

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store