Understanding an outlier is more important than finding it

Understanding Outliers — Image by author

Finding outliers is technical, but understanding them is a mix of skill and art. As data scientists, we tend to “remove” outliers because they can mess up a predictive model, and this practice has led to the wrong conclusion that an outlier is something bad that must be removed.

Instead, the focus should be on understanding the outliers. In this story, I will explain some techniques for better understanding outlier data points.

We will take a dataset of cars which has columns such as make, fuel-type, aspiration, num-of-doors, price, etc.
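
As a starting point, here is a minimal sketch of the "finding" step on such a dataset. It is not the method used in the story: it assumes the common 1.5 × IQR rule applied to the price column, and the rows, make names and the `iqr_outliers` helper are all invented for illustration. Only the column names come from the dataset description above.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # quantiles() sorts the data itself and returns the three quartile cut points.
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical rows: the makes and prices are made up for this example.
cars = [
    {"make": "make-a", "fuel-type": "gas", "price": 7000},
    {"make": "make-b", "fuel-type": "gas", "price": 7500},
    {"make": "make-c", "fuel-type": "gas", "price": 8000},
    {"make": "make-d", "fuel-type": "diesel", "price": 8500},
    {"make": "make-e", "fuel-type": "gas", "price": 9000},
    {"make": "make-f", "fuel-type": "gas", "price": 9500},
    {"make": "make-g", "fuel-type": "gas", "price": 10000},
    {"make": "make-h", "fuel-type": "diesel", "price": 45000},
]

prices = [car["price"] for car in cars]
flagged = set(iqr_outliers(prices))

# Print the whole row, not just the price, so the outlier can be read in context.
for car in cars:
    if car["price"] in flagged:
        print(car)
```

Printing the full row rather than the bare value is the point of the sketch: the other columns (make, fuel-type and so on) are what help explain why a price is unusual.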


Being euphoric right from 1st Jan

This article is my new year greeting to you. I had far simpler alternatives for wishing you an excellent year, such as sending a text message with nice words or copy-pasting a new year greeting image.

However, I want to start the new year with a lot of optimism and euphoria. So here is how I went about it.

Collecting data related to new year wishes

First of all, I searched for all keywords related to new year's wishes. I came up with the list shown here.

Greetings data — Image by author

It is organized as follows:

  • The first column is a category column such as Health, Family, Professional, Prosperity, etc.
  • The second column has the greeting in a related…


From understanding flow to a quick trick to replace machine learning

Photo by Solen Feyissa on Unsplash + Image by Author

Sankey charts have recently become one of the important visualisation techniques for advanced analytics. They have both characteristics of any awesome visualisation: 1. They can look visually stunning. 2. They give very useful insights.

However, a visualisation makes sense only if it is used in the right context and for the right purpose. For example, using a bar chart to show a sales trend is not as effective as using a trend chart. Similarly, a scatter plot does not make sense if the data does not have enough variance.

So here I would like to state the use-cases where Sankey charts make sense.

Analysing flow

When the Sankey diagram originated in 1898, its main purpose was to show a flow. The chart itself is named after Irish Captain Matthew Henry Phineas Riall Sankey, who used this type of diagram in 1898 in a classic figure shown below which shows the energy efficiency of a steam…


Top Use-cases for customer analytics and how to do them

The top domain in which advanced analytics and data science are used is understanding customers. They are used far more here than in any other domain, such as supply chain, IoT or finance. The reason is obvious: the success of your business depends on how well you understand your customers. Everything else will fall into place once you start to understand your customers effectively.

Customer analytics is a very wide area, and every industry analyses customers in a different way. …


Let data do the talking and leave bias and emotions out of the stock game

Data-driven means that your decisions are driven by data and not by emotions. This approach can be very useful in stock market investment. Here is a summary of a data-driven approach which I have been taking recently.

Stop analysing stock by stock

As a data scientist, the one thing I do not like about existing stock market tools is that the analysis is done in a very crude way. The analysis still has to be done for each stock symbol, one at a time. Something like this:

Choose a stock, analyse it, decide on buy/sell

Go to second stock, analyse it, decide on buy/sell

Go to third stock, analyse it, decide on…


How data scientists can go beyond coding and elevate themselves

Photo by Mario Azzi on Unsplash

Let us start with a small quiz

Question 1. What makes you satisfied?

Option A. Complex data science code which can detect lung cancer with high accuracy

Option B. People's life expectancy increasing thanks to machine learning prediction capability

Question 2. What do your blogs and articles look like?

Option A. There is a lot of code in your blogs. You focus on technical explanations of how an algorithm or a data science technique works

Option B. Your blogs do not contain code. You focus on the usefulness and purpose of data science techniques

Question 3. How do you convince your audience in a…


Enhance the power of your data exploration using textual explanations

The popular saying “A picture is worth a thousand words” may be wrong when it comes to data science. Take the example of the Uber Estimated Time of Arrival (ETA) algorithm, which informs the user when the ride is expected to arrive.

Behind the ETA, there are complex predictive algorithms and cutting-edge visualisation, with the map getting updated in real time. But all this is of no use without the single line of text which says “The closest driver is approximately 1 min away.”

Uber Estimated Time of Arrival (ETA) algorithm in action

A data scientist or data analyst produces lots of data visualisations during a data exploration phase. All the cool visualisations look great, but you can really enhance their value using short textual explanations. …


Why treating models like data is a very strategic approach

Photo by Alexander Sinn on Unsplash

Here is a very abstract question: what does an AI or data science model look like? We all use data science models in our day-to-day lives. Most people who aren’t data scientists have experienced a data science model but have never seen one. So, let me reveal the secret. It may look scary. Here is what a data science model looks like


Code-free way for teachers to better understand student marks

Photo by Science in HD on Unsplash

Teachers and educational institutes deal with a lot of data related to student marks. In this story, I show how they can use data science and advanced analytics to get insights into student marks data.

Context

For this tutorial, we assume that an English teacher who has a class of 26 students would like to:

  • Better understand student marks
  • See if performance in one subject can impact another
  • Compare marks by gender
  • Compare marks by student

Getting the Data

The first step is to get the data. The teacher collects all the students' scores in Excel. …


Structuring the art of data exploration

Photo by h heyerlein on Unsplash

We have all faced the anxiety of looking at raw data and wondering what to do next. Though data science algorithms are well established, how to proceed from raw data to developing insights remains a craft.

So how can one structure an art? One thing which can be done is to develop some kind of list of building blocks. Take, for example, the English language. Its building blocks are the letters A, B, C, etc. It is with these basic building blocks that we are able to build beautiful words.

So in this article I attempt to list the most effective data exploration techniques. This list is by no means exhaustive, but my attempt here is to bring some structure to the art of data…

About

Pranay Dave

Data to Insights
