Finding outliers is technical, but understanding outliers is a mix of skill and art. As data scientists, we tend to “remove” outliers because they can distort a predictive model, and this practice has led to the mistaken conclusion that an outlier is something bad that must be removed.
Instead, the focus should be on understanding the outliers. In this story, I will explain some techniques for understanding outlier data points better.
We will use a dataset of cars with columns such as make, fuel-type, aspiration, num-of-doors, and price.
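As a minimal sketch of the idea of inspecting outliers rather than dropping them, the snippet below flags prices outside the usual 1.5 × IQR fences and keeps the full rows so their other attributes can be examined. Only the column names come from the text; the rows, the helper name `iqr_outliers`, and the choice of the IQR rule are assumptions made for illustration.

```python
from statistics import quantiles

# Hypothetical rows: only the column names come from the article,
# the makes and prices below are made up for illustration.
cars = [
    {"make": "make-a", "fuel-type": "gas", "price": 7000},
    {"make": "make-a", "fuel-type": "gas", "price": 7500},
    {"make": "make-b", "fuel-type": "gas", "price": 8000},
    {"make": "make-b", "fuel-type": "diesel", "price": 8200},
    {"make": "make-c", "fuel-type": "gas", "price": 9000},
    {"make": "make-c", "fuel-type": "gas", "price": 9500},
    {"make": "make-d", "fuel-type": "diesel", "price": 10000},
    {"make": "make-e", "fuel-type": "gas", "price": 41000},
]


def iqr_outliers(rows, column):
    """Return the rows whose value falls outside the 1.5 * IQR fences."""
    values = [row[column] for row in rows]
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [row for row in rows if row[column] < low or row[column] > high]


# Keep the whole row, so the outlier can be studied instead of discarded.
outliers = iqr_outliers(cars, "price")
for row in outliers:
    print(row["make"], row["fuel-type"], row["price"])
```

Returning whole rows instead of a filtered dataset is deliberate: the other columns of an unusual car are the starting point for explaining why it is unusual.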
This article is my New Year greeting to you. I had far simpler ways to wish you an excellent year, such as sending a text message with nice words or copy-pasting a New Year greeting image.
However, I want to start the new year with a lot of optimism and euphoria. So here is how I went about it.
First of all, I searched for keywords related to New Year's wishes. I came up with the list shown here.
It is organized as follows:
Sankey charts have recently become one of the important visualisation techniques for advanced analytics. They have both characteristics of any great visualisation: 1. they can look visually stunning, and 2. they give very useful insights.
However, a visualisation makes sense only if it is used in the right context and for a clear purpose. For example, a bar chart is less effective than a trend chart for showing a sales trend. Similarly, a scatter plot does not make sense if the data does not have enough variance.
So here I would like to state the use cases where Sankey charts make sense.
When the Sankey diagram originated in 1898, its main purpose was to show a flow. The chart is named after the Irish Captain Matthew Henry Phineas Riall Sankey, who used this type of diagram in a classic figure, shown below, of the energy efficiency of a steam…
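To make the notion of a flow concrete, here is a small sketch that turns (source, target, amount) records into the node labels and link lists that Sankey plotting tools commonly expect (Plotly's `Sankey` trace, for instance, takes node labels plus link `source`, `target`, and `value` lists). The stage names and amounts are invented and are not the figures from Sankey's original diagram; `to_sankey_links` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical flow records: (source stage, target stage, amount).
# The numbers are made up and do not reproduce Sankey's original figure.
flows = [
    ("Fuel", "Boiler", 100),
    ("Boiler", "Engine", 70),
    ("Boiler", "Heat loss", 30),
    ("Engine", "Useful work", 20),
    ("Engine", "Heat loss", 50),
]


def to_sankey_links(records):
    """Aggregate flow records into node labels and indexed links."""
    totals = Counter()
    for source, target, amount in records:
        totals[(source, target)] += amount

    # Collect node labels in order of first appearance.
    labels = []
    for source, target in totals:
        for name in (source, target):
            if name not in labels:
                labels.append(name)

    index = {name: position for position, name in enumerate(labels)}
    links = {
        "source": [index[source] for source, _ in totals],
        "target": [index[target] for _, target in totals],
        "value": list(totals.values()),
    }
    return labels, links


labels, links = to_sankey_links(flows)
print(labels)
print(links)
```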
The top domain in which advanced analytics and data science are used is understanding customers. It is far ahead of any other domain, such as supply chain, IoT, or finance. The reason is obvious: the success of your business depends on how well you understand your customers. Everything else will fall into place once you start to understand your customers effectively.
Customer analytics is a very wide area, and every industry analyses customers in a different way. …
Data driven means that your decisions are driven by data and not by emotions. This approach can be very useful in stock market investment. Here is a summary of a data-driven approach that I have been taking recently.
As a data scientist, the one thing I do not like about existing stock market tools is that the analysis is done in a very crude way. It still has to be done one stock symbol at a time, something like this:
Choose a stock, analyse it, decide on buy/sell
Go to the second stock, analyse it, decide on buy/sell
Go to the third stock, analyse it, decide on…
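By contrast with the one-symbol-at-a-time routine above, the sketch below computes the same metric for every symbol in a single pass and ranks the results. It is only an illustration of analysing all symbols together, not the approach the article goes on to describe; the symbols, the prices, and the choice of simple period return as the metric are all assumptions.

```python
# Hypothetical closing prices per symbol; symbols and numbers are made up.
prices = {
    "AAA": [10.0, 11.0, 12.0],
    "BBB": [20.0, 19.0, 18.0],
    "CCC": [5.0, 5.0, 5.5],
}


def period_return(series):
    """Simple return from the first to the last price in the series."""
    return (series[-1] - series[0]) / series[0]


def rank_symbols(price_table):
    """Score every symbol with the same metric and sort best first."""
    returns = {symbol: period_return(series) for symbol, series in price_table.items()}
    return sorted(returns.items(), key=lambda item: item[1], reverse=True)


ranking = rank_symbols(prices)
for symbol, value in ranking:
    print(f"{symbol}: {value:.1%}")
```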
Let us start with a small quiz.
Question 1. What makes you satisfied?
Option A. Complex data science code which can detect lung cancer with high accuracy
Option B. People's life expectancy increases due to machine learning prediction capability
Question 2. What do your blogs and articles look like?
Option A. There is a lot of code in your blogs. You focus on the technical explanation of how an algorithm or a data science technique works
Option B. Your blogs do not contain code. You focus on the usefulness and purpose of data science techniques
Question 3. How do you convince your audience in a…
The popular saying “A picture is worth a thousand words” may be wrong when it comes to data science. Take the example of the Uber Estimated Time of Arrival (ETA) algorithm, which informs the user when the ride is expected to arrive.
Behind the ETA, there are complex predictive algorithms and cutting-edge visualisation, with the map updating in real time. But all this is of no use without the single line of text that says “The closest driver is approximately 1 min away.”
A data scientist or data analyst produces many data visualisations during the data exploration phase. All the cool visualisations look great, but you can really enhance their value with short textual explanations. …
Here is a very abstract question: what does an AI or data science model look like? We all use data science models in our day-to-day lives. Most people who aren’t data scientists have experienced a data science model but have never seen one. So, let me reveal the secret. It may look scary. Here is what a data science model looks like:
Teachers and educational institutes deal with a lot of data related to student marks. In this story, I show how they can use data science and advanced analytics to get insights into student marks data.
For this tutorial, we assume that an English teacher, who has a class of 26 students would like to
The first step is to get the data. The teacher collects all the students' scores in Excel. …
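Once the scores are collected, a first look at them could be as simple as the sketch below, which summarises a list of marks with Python's standard `statistics` module. The student identifiers and marks are invented, the helper `summarise` is hypothetical, and the choice of summary statistics is an assumption, since the teacher's actual goals are not listed here.

```python
from statistics import mean, median

# Hypothetical marks for a handful of students; names and values are made up.
marks = {
    "student_01": 55,
    "student_02": 62,
    "student_03": 70,
    "student_04": 70,
    "student_05": 78,
    "student_06": 85,
    "student_07": 91,
}


def summarise(scores):
    """Return basic summary statistics for a mapping of student to mark."""
    values = list(scores.values())
    return {
        "count": len(values),
        "mean": mean(values),
        "median": median(values),
        "lowest": min(scores, key=scores.get),
        "highest": max(scores, key=scores.get),
    }


summary = summarise(marks)
for name, value in summary.items():
    print(name, value)
```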
We have all faced the anxiety of looking at raw data and wondering what to do next. Although data science algorithms are well established, proceeding from raw data to insights still remains a craft.
So how can one structure an art? One thing that can be done is to develop a list of building blocks. Take the English language, for example. Its building blocks are the letters of the alphabet: A, B, C, and so on. It is with these basic building blocks that we are able to build beautiful words.
So in this article I attempt to list the most effective data exploration techniques. This list is by no means exhaustive, but my attempt here is to bring some structure to the art of data…