Data Analysis

What is Data Analysis?

Module 1

Module Structure

History of data analysis

This field got its start with capturing and counting census and other demographic data. Later, people realized there were better ways to present information and started tabulating the data.

Then, using probabilities, early statisticians estimated the populations of cities to plan for the future.

As mathematics advanced, various theories of distribution were formed. One of the best-known, and still popular, theorems is Bayes' law.

Later, statistics really took off as a field. Sampling methods, the method of least squares, and hypothesis testing soon followed.

Types of analyses

The main types of analysis are descriptive, inferential, exploratory, and visual.

Descriptive: summarizes the data and gives you a quick look into it. Some common measures are range, mean, mode, and standard deviation.
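As a rough illustration, the measures listed above can be computed with Python's built-in statistics module. This is a minimal sketch; the order counts are made-up numbers, and the choice of the sample (n - 1) standard deviation is an assumption.

```python
import statistics

# Hypothetical sample: daily order counts for one week (made-up numbers)
orders = [12, 15, 15, 9, 22, 18, 15]

summary = {
    "range": max(orders) - min(orders),
    "mean": statistics.mean(orders),
    "mode": statistics.mode(orders),
    # Sample standard deviation (n - 1 in the denominator)
    "stdev": statistics.stdev(orders),
}

for name, value in summary.items():
    print(f"{name:>6}: {value:.2f}")
```

For a whole population rather than a sample, statistics.pstdev would be the matching function.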

Inferential: estimates properties of the underlying distribution of the data. Some common tools in this type of analysis are random sampling, hypothesis testing, and confidence intervals.
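To make random sampling and confidence intervals concrete, here is a small sketch that draws a simple random sample from a simulated population and builds a 95% confidence interval for the mean. The population is synthetic, and the normal (z) approximation is an assumption that is reasonable for a sample of 200; small samples would usually call for a t-based interval instead.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical "population": 10,000 simulated household incomes (not real data)
population = [random.gauss(50_000, 12_000) for _ in range(10_000)]

# Simple random sampling without replacement
sample = random.sample(population, k=200)

mean = statistics.mean(sample)
# Standard error of the mean, using the sample standard deviation
sem = statistics.stdev(sample) / len(sample) ** 0.5
# z-value for a two-sided 95% interval (normal approximation)
z = statistics.NormalDist().inv_cdf(0.975)

low, high = mean - z * sem, mean + z * sem
print(f"Sample mean: {mean:,.0f}")
print(f"95% CI: ({low:,.0f}, {high:,.0f})")
print(f"Population mean (normally unknown): {statistics.mean(population):,.0f}")
```

In a real project you only see the sample; the last line is there so you can compare the estimate against the simulated truth.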

Exploratory: helps us uncover insights and visualize statistical properties. Some common techniques are stem-and-leaf plots, box plots, and histograms.
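A stem-and-leaf plot is simple enough to build by hand, which shows what it captures: the shape of the distribution while keeping every original value. The sketch below assumes non-negative integers and uses tens as stems and units as leaves; the function name stem_and_leaf and the scores are illustrative only.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return a stem-and-leaf plot as text (stem = tens digit(s), leaf = units digit)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    lines = []
    # Include empty stems so gaps in the distribution stay visible
    for stem in range(min(stems), max(stems) + 1):
        leaves = " ".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem:>3} | {leaves}")
    return "\n".join(lines)

# Hypothetical exam scores
scores = [72, 85, 91, 68, 77, 85, 59, 94, 73, 88, 81]
print(stem_and_leaf(scores))
```

Turned on its side, the leaf counts per stem read like a histogram with bins of width 10.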

Visual: builds upon exploratory analysis. Some common visualizations are bar charts, part-to-whole charts, correlation charts, and geographic maps.
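In practice these charts are usually drawn with a plotting library such as matplotlib. To keep the sketch dependency-free, the version below renders a bar chart as plain text, scaling each bar to the largest value. The function name text_bar_chart and the regional sales figures are hypothetical.

```python
def text_bar_chart(data, width=30, char="#"):
    """Return a horizontal bar chart as text, scaling bars to the largest value."""
    longest_label = max(len(label) for label in data)
    peak = max(data.values())
    rows = []
    for label, value in data.items():
        bar = char * round(value / peak * width)
        rows.append(f"{label:<{longest_label}} | {bar} {value}")
    return "\n".join(rows)

# Hypothetical sales by region (made-up numbers)
sales = {"North": 120, "South": 45, "East": 90, "West": 60}
print(text_bar_chart(sales))
```

Scaling to the maximum value keeps the comparison honest; a bar chart with a truncated baseline would exaggerate the differences.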


Methods of analysis

Sometimes you start with a question in mind; other times you explore the data to ask questions nobody has asked before.

Here’s a typical process. Keep in mind, however, that it is not linear: in some projects, you may jump between, skip, or combine steps. Even if you don’t follow these steps explicitly, your work will end up very close to this workflow.

Here are the steps:

Sometimes you get the data you need yourself; other times you have to beg the data overlords, affectionately known as database administrators.
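Once you have access, getting the data often means writing a query. Here is a minimal sketch using Python's built-in sqlite3 module against an in-memory database. The orders table and its columns are hypothetical stand-ins for whatever schema your database administrators actually expose.

```python
import sqlite3

# In-memory database standing in for a real data source (hypothetical schema)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",  # parameterized insert
    [("alice", 30.0), ("bob", 12.5), ("alice", 7.5)],
)

# Pull only the data the question needs: total spend per customer
query = """
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    ORDER BY total DESC
"""
rows = conn.execute(query).fetchall()
conn.close()

for customer, total in rows:
    print(customer, total)
```

Aggregating in the query rather than exporting every row keeps the extract small, which is usually easier on both you and the database.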

Once you get your hands on the data, you have to clean it.
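Cleaning usually means fixing whitespace and inconsistent capitalization, converting types, and handling missing values and duplicates. The sketch below does all four on a few made-up records. Dropping rows with a missing age is an assumption made for simplicity; depending on the analysis, filling in (imputing) the value may be the better choice.

```python
# Hypothetical raw records, as they might arrive from a CSV export
raw = [
    {"name": " Alice ", "city": "boston", "age": "34"},
    {"name": "Bob", "city": "Chicago", "age": ""},      # missing age
    {"name": "alice", "city": "Boston ", "age": "34"},  # duplicate after normalizing
    {"name": "Carol", "city": "DENVER", "age": "29"},
]

def clean(records):
    """Trim whitespace, normalize case, convert types, drop incomplete rows and duplicates."""
    seen = set()
    cleaned = []
    for r in records:
        name = r["name"].strip().title()
        city = r["city"].strip().title()
        age = r["age"].strip()
        if not age:  # drop rows missing a required field
            continue
        key = (name, city, int(age))
        if key in seen:  # drop duplicates (detected after normalizing)
            continue
        seen.add(key)
        cleaned.append({"name": name, "city": city, "age": int(age)})
    return cleaned

print(clean(raw))
```

Note that the duplicate only becomes visible after normalizing, which is why normalization happens before the duplicate check.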

Next, you manipulate certain fields to get them into the desired format. This is where we spend most of our time in analysis. For example, you may create a “distance” field holding the distance between a distribution center and each customer’s home.
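One way to derive the “distance” field from the example is the haversine (great-circle) formula, assuming you have latitude and longitude for both locations. This is a sketch: straight-line distance ignores roads, so a routing service would give more realistic travel distances. The coordinates and field names below are illustrative.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points given in degrees."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Hypothetical distribution center and customer homes (illustrative coordinates)
center = (41.88, -87.63)
customers = [
    {"id": 1, "lat": 41.97, "lon": -87.66},
    {"id": 2, "lat": 42.05, "lon": -87.68},
]

# Derive a new "distance_km" field for each customer
for c in customers:
    c["distance_km"] = round(haversine_km(*center, c["lat"], c["lon"]), 1)

print(customers)
```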

Then you do some exploratory analysis to detect outliers or get an idea of how the data is distributed.
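A common first pass at outlier detection is the interquartile range (IQR) rule behind box plots: flag anything below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR. The sketch uses statistics.quantiles with its default method; the delivery times are made up, and 1.5 is the conventional multiplier rather than a universal threshold.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Flag values outside Tukey's fences: [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical delivery times in minutes; 95 looks suspicious
delivery_minutes = [31, 28, 35, 30, 33, 29, 95, 32, 27, 34]
print(iqr_outliers(delivery_minutes))
```

A flagged value is a prompt to investigate, not an automatic deletion: it may be a data-entry error or a genuinely late delivery.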

In the analysis step, you may create plots, find patterns, or build predictive models.
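As a small example of a predictive model, here is an ordinary least squares fit of a straight line, the same least squares idea mentioned in the history above. The ad-spend figures are hypothetical, and a straight line is an assumption about the relationship that should be checked against a plot of the data.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: ad spend (thousands) vs. units sold
ad_spend = [1, 2, 3, 4, 5]
units = [12, 19, 29, 37, 45]

slope, intercept = fit_line(ad_spend, units)
forecast = slope * 6 + intercept  # predict units for a spend of 6
print(f"units = {slope:.2f} * spend + {intercept:.2f}; forecast at 6: {forecast:.1f}")
```

On Python 3.10 and later, statistics.linear_regression computes the same slope and intercept; the hand-written version just makes the formula visible.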

The final steps are the most crucial. First, you draw insights from what you have learned about the data.

Any analyst worth her salt will first ask “why” and not “how”. After seeing the results, she will ask “so what?” rather than say “interesting”. We shouldn’t be satisfied with our first question, and even less with our first answer.

The last step is reporting your insights. Here you spend your time providing recommendations and answering key questions, not creating useless infographics.

Make sure your reader can understand the information easily. Provide recommendations based on your insights, and don’t spend too much time on theory and tools. Nobody really cares about the tools; at least your reader doesn’t.

In the end, your reader cares about “how any of this is going to improve my bottom line”, and that is what we have to focus on. Make your reports visually appealing, use simple language, and meet the reader where they are.