Thursday 28 November 2013

Road accidents in India - A Visualization

As a kid I thought the best novelist was one who churned out nerve-wracking stories, with every other page replete with cliff-hanging moments and plot twists. But as maturity dawned, I realised the best ones were simply good storytellers: those who could paint a vivid picture in the reader’s mind, conveying precisely and only what they intended to convey.

Drawing a parallel to the analytics field, as data scientists mature they nurture their storytelling abilities and evolve into good storytellers. The most frequent mistake a newbie makes is to spend a lot of time and energy on the analysis, only to haphazardly put together a result without giving much thought to how receptive the audience will be. Today, as the tentacles of analytics spread into every single department of a business, more and more people who are novices in the art of data interpretation are suddenly handed the duty of making decisions based on data. The onus is now on the data scientist to bring out their storytelling skills and convey the story in an easily understandable manner to a wide range of audiences.
This case study shows how one can leverage data visualization tools to convey the right story to any audience.

Data collected from Government ministries and departments is made available at Data Portal India, a platform supporting the Open Data initiative of the Government of India. The dataset on road accidents in India was obtained from this portal, and Tableau Public was used to visualize it.

You can find here an interactive presentation of the data, which we will use going forward.

Now for an analyst, the foremost question while presenting any data is this: should it be shown at a granular level with every break-up of the data, the way a veteran end user would prefer, or at a macroscopic level for the benefit of a novice audience? This is where data visualization tools come to the rescue. Tableau, for instance, lets us show the data in a macroscopic view that conveys the story at a glance and, when needed, turns interactive to provide the data at the required granularity. Feel free to play with the report by clicking on states on the map and see if they tell a different story from the one at first glance.

The first map shows the distribution of the number of accidents across states. Now what story does it tell you? Ok, northern and eastern states are relatively safer, probably due to the unhurried pace of life there. Larger states have the worst traffic records, and you start wondering what the correlation could be. Then you want to dig deeper and start looking at state-wise data. ‘Oh oh, wait there’ you say when the realization hits that the numbers here are absolute and merely reflect the population distribution of these states. So it would be fair to assume that this view is of less importance and might even be misleading.

The second view shows the percentage increase in the number of accidents in each state over the years (2003 to 2011). Maharashtra, which was painted dark red in the previous view, now sports a blemish-less white canvas. The state had a mere 4% rise in accidents over these 9 years. But when you look at the granular state-level data, even that 4% translates to a mind-boggling 3K accidents. To give a perspective, this is half the number of accidents in the state of J&K by the year 2013. So let’s just blame the base effect (a higher denominator) for the flat rate and not read too much into those white states. But still, this view combined with the previous one conveys an important story - Uttar Pradesh, in spite of being a huge state, has a high rise in the number of accidents.
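To make the base effect concrete, here is a minimal Python sketch. The state labels and counts are made up for illustration and are not taken from the Data Portal India dataset; `pct_change` is just a helper name chosen here. It shows how the same absolute rise can look flat on a large base and dramatic on a small one.

```python
def pct_change(start, end):
    """Percentage change from the start value to the end value."""
    return (end - start) / start * 100.0

# Hypothetical (first year, last year) accident counts, NOT real data.
counts = {
    "Large state": (75_000, 78_000),
    "Small state": (3_000, 6_000),
}

for state, (first, last) in counts.items():
    rise = last - first
    print(f"{state}: {rise:+d} accidents, {pct_change(first, last):+.1f}%")
```

Both hypothetical states add the same 3,000 accidents, yet a percentage-based colour scale would paint one nearly white and the other dark, which is why the percentage view needs to be read together with the absolute one.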

Finally, since the absolute numbers did not make much sense, let’s look at the ratio of the number of accidents to the population in each state (4th tab in the Tableau report). It should hardly take 3 seconds for anyone to scream ‘Oh see there is GOA’. Voila!! We have a clear outlier and you can guess the reason :)
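The normalization behind this view is simple enough to sketch in a few lines of Python. The figures below are invented for illustration (they are not the actual state numbers), and the per-100,000 scale and the helper name `accidents_per_100k` are assumptions made here, not necessarily what the Tableau report uses.

```python
def accidents_per_100k(accidents, population):
    """Accidents per 100,000 residents."""
    return accidents / population * 100_000

# Hypothetical (accidents, population) pairs, NOT real data.
states = {
    "Populous state": (60_000, 100_000_000),
    "Tiny state": (4_000, 1_500_000),
}

rates = {name: accidents_per_100k(a, p) for name, (a, p) in states.items()}

# Rank states by rate, highest first; an outlier floats to the top.
ranked = sorted(rates, key=rates.get, reverse=True)
for name in ranked:
    print(f"{name}: {rates[name]:.1f} accidents per 100,000 people")
```

In this made-up example the tiny state has far fewer accidents in absolute terms but comes out well ahead once population is taken into account, which is the kind of reversal the per-capita map makes visible at a glance.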

Now, catching the same trends from multiple traditional data tables requires a lot of skill that comes only from years of practice. Looking at the data laid out on a map rather than in a spreadsheet gives a perspective that is simply unparalleled. The time taken to discern those patterns, validate assumptions and compare multiple data sets also reduces drastically. This is where data visualization can add great value and become an irreplaceable weapon in the armory of a data scientist.