# Data Scientist Training

Have you spent hours searching for how to train as a data scientist, only to end up confused and tired? Relax. Your search ends here.

There are thousands of resources on the internet that show you how to train as a data scientist, yet it is still hard to find a clear path through them.

That is frustrating, and I can relate.

I have spent hundreds of hours learning and practicing these skills, trying not to pull all my hair out along the way, and I want to help you get started quickly.

In this free self-study guide, you will find a 30+ day path toward becoming a data scientist. It covers the key skills of data science: data manipulation, data analysis, and data visualization/storytelling. I've excluded "big data" tools and technologies for a simple reason: you can't run before you learn to walk. This curriculum will teach you to walk, and maybe to jog.

If you complete this entire curriculum, you will have shown a key skill required of data scientists: perseverance. Keep going no matter how hard the journey gets. Leave your thoughts here and let me know how I can help.

Let's get started then.

First, you need some tools and books.

## Tools you need

- R and RStudio
- Weka
- SQLite (optional: SQLite Manager for Firefox)
- Notepad++ or Sublime Text

## Books you need

### SQL

### R

### Machine Learning and Statistics

### Data Visualization

### Storytelling

# Self-study Training Guide

### Understand Relational Databases

- Read Appendix L from Krieger (2008) http://bit.ly/1urSPml
- Read Chapter 3 of Krieger (2011)

### Practice

- Repeat these exercises using your own data examples

### Understand Joins

- Read Introduction to joins http://bit.ly/g6LH8
- Visualize joins http://bit.ly/ceW6QZ
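
If you want to see the difference between the two most common joins before reading further, here is a minimal sketch using Python's built-in `sqlite3` module and an in-memory database. The `customers` and `orders` tables and their rows are hypothetical, made up only for illustration. The inner join keeps only matching rows; the left outer join keeps every customer and fills missing order columns with NULL (`None` in Python).

```python
import sqlite3

# In-memory database with two hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);
""")

# INNER JOIN: only customers with at least one matching order.
inner = conn.execute("""
    SELECT c.name, o.amount
    FROM customers AS c
    INNER JOIN orders AS o ON o.customer_id = c.id
    ORDER BY o.id
""").fetchall()

# LEFT OUTER JOIN: every customer; order columns are NULL when nothing matches.
left = conn.execute("""
    SELECT c.name, o.amount
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    ORDER BY c.id, o.id
""").fetchall()

print(inner)  # [('Ann', 25.0), ('Ann', 40.0), ('Bob', 15.0)]
print(left)   # [('Ann', 25.0), ('Ann', 40.0), ('Bob', 15.0), ('Cy', None)]
```

Notice that Cy appears only in the outer join result, and Ann appears twice because she has two orders.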

### Write SQL

- Complete Exercise 6: Select Across Many Tables from Shaw
- Complete Exercise 9: Updating Data from Shaw
- Complete Exercise 10: Updating Complex Data from Shaw
- Complete Exercise 15: Data Modeling from Shaw

### Practice

- Write a short paragraph explaining your understanding of joins
- Repeat these exercises using your own data examples

### Learn Advanced SQL

- Read Chapter 3: Calculations and Aliases from Rockoff
- Read Chapter 4: Using Functions from Rockoff
- Read Chapter 6: Column-Based Logic from Rockoff
- Read Chapter 7: Row-Based Logic from Rockoff
- Read Chapter 8: Boolean Logic from Rockoff
- Read Chapter 10: Summarizing Data from Rockoff
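
As a quick preview of how calculations, aliases, functions, column-based logic, Boolean logic, and summarizing fit together in one query, here is a small sketch. It runs SQL through Python's `sqlite3` module; the `sales` table, its columns, and the 50-unit revenue threshold are hypothetical choices for illustration, not examples from Rockoff.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, product TEXT, qty INTEGER, unit_price REAL);
INSERT INTO sales VALUES
  ('North', 'pen', 10, 1.5),
  ('North', 'book', 2, 12.0),
  ('South', 'pen', 4, 1.5),
  ('South', 'book', 5, 12.0);
""")

rows = conn.execute("""
    SELECT region,
           SUM(qty * unit_price) AS revenue,         -- calculation + alias
           COUNT(*) AS n_rows,                        -- aggregate function
           CASE WHEN SUM(qty * unit_price) >= 50
                THEN 'high' ELSE 'low' END AS tier    -- column-based logic
    FROM sales
    WHERE qty > 0 AND unit_price > 0                  -- Boolean logic
    GROUP BY region                                   -- summarizing data
    ORDER BY region
""").fetchall()

print(rows)  # [('North', 39.0, 2, 'low'), ('South', 66.0, 2, 'high')]
```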

### Practice

- Repeat these exercises using your own data examples

### Continue Advanced SQL

- Read Chapter 11: Combining Tables with an Inner Join from Rockoff
- Read Chapter 12: Combining Tables with an Outer Join from Rockoff
- Read Chapter 14: Subqueries from Rockoff
- Read Chapter 15: Set Logic from Rockoff
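
To connect subqueries and set logic to something concrete, here is a minimal sketch, again using Python's `sqlite3` module with hypothetical `newsletter` and `buyers` tables. The subquery computes an average that the outer query filters on; `UNION`, `INTERSECT`, and `EXCEPT` combine the email lists as sets.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE newsletter (email TEXT);
CREATE TABLE buyers (email TEXT, amount REAL);
INSERT INTO newsletter VALUES ('a@x.com'), ('b@x.com'), ('c@x.com');
INSERT INTO buyers VALUES ('b@x.com', 30.0), ('c@x.com', 5.0), ('d@x.com', 50.0);
""")


def emails(sql):
    """Run a query and return the first column as a plain list."""
    return [row[0] for row in conn.execute(sql)]


# Subquery: buyers whose purchase is above the average purchase.
above_avg = emails("""
    SELECT email FROM buyers
    WHERE amount > (SELECT AVG(amount) FROM buyers)
    ORDER BY email
""")

# Set logic: ORDER BY applies to the whole compound SELECT.
union = emails("SELECT email FROM newsletter UNION SELECT email FROM buyers ORDER BY email")
both = emails("SELECT email FROM newsletter INTERSECT SELECT email FROM buyers ORDER BY email")
only_news = emails("SELECT email FROM newsletter EXCEPT SELECT email FROM buyers ORDER BY email")

print(above_avg)  # ['b@x.com', 'd@x.com']
print(union)      # ['a@x.com', 'b@x.com', 'c@x.com', 'd@x.com']
print(both)       # ['b@x.com', 'c@x.com']
print(only_news)  # ['a@x.com']
```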

### Practice

- Repeat these exercises using your own data examples

### Learn Data Handling in R

- Read Getting data into R from Zuur
- Read Recipe 3.8 Accessing built-in datasets from Teetor
- Read Accessing variables and managing subsets of data from Zuur
- Read Chapter 1: Getting started from Matloff
- Read Chapter 5: Data Frames from Matloff

### Practice

- Write down your understanding of these concepts
- Repeat these exercises using your own data examples

### Continue Data Handling in R

- Read Chapter 6: Factors and Tables from Matloff
- Read Appendix B: Installing and Using Packages from Matloff
- Read Recipe 4.10 Writing to CSV files from Teetor
- Read Chapter 6: Data Transformations from Teetor
- Follow these slides on Accessing Databases from R

### Practice

- Write down your understanding of these concepts

### Learn Data Manipulation in R

### Practice

- Write down your understanding of these concepts
- Repeat these tutorials using your own data examples

### Learn ggplot2 Graphics

### Practice

- Repeat these tutorials using your own data examples

### R Miscellaneous

- Plot Polygons
- Learn best practices of R programming from Google's R Style Guide

### Practice

- Plot your data using various ggplot2 functions

### Understand Statistical Concepts

- Read the theory of linear regression in chapter 3 of James
- Run the linear regression examples in R from section 3.6 of James
- Read the theory of logistic regression
- Read the theory of logistic regression in section 4.3 of James
- Run the logistic regression examples in R from section 4.6 of James
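
Before running the R labs, it can help to see the simple-regression arithmetic spelled out. The sketch below is in Python rather than R and uses made-up data. It computes the closed-form least-squares slope and intercept for a single predictor, and shows the logistic function that turns a linear predictor into a probability. It does not fit a logistic regression: those coefficients are estimated by maximum likelihood, which is what R's `glm(..., family = binomial)` does for you.

```python
import math
from statistics import mean


def fit_simple_linear_regression(xs, ys):
    """Least-squares estimates for y = b0 + b1 * x (one predictor)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # intercept
    return b0, b1


def logistic(z):
    """Map a linear predictor such as b0 + b1 * x to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Made-up data lying exactly on y = 1 + 2x, so the fit should recover it.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
b0, b1 = fit_simple_linear_regression(xs, ys)
print(b0, b1)        # 1.0 2.0
print(logistic(0))   # 0.5
```

Compare the result with `lm(y ~ x)` in R on the same numbers; both should report an intercept of 1 and a slope of 2.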

### Practice

- Write down your understanding of these concepts

### Continue Statistical Concepts

- Read the theory of Naive Bayes in section 4.2 of Witten
- Understand Statistical Distributions: part I
- Understand Statistical Distributions: part II
- Understand Statistical Distributions: part III
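
The Naive Bayes idea is to multiply a class prior by one likelihood per attribute, as if the attributes were independent given the class. Below is a minimal Python sketch for categorical attributes only. The add-one (Laplace) smoothing is an assumption chosen so that unseen values do not zero out a class. The Titanic-style records are a hypothetical toy sample, not the real dataset.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count classes and, per class, how often each attribute value occurs."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (attr index, class) -> Counter of values
    values_per_attr = defaultdict(set)    # attr index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            values_per_attr[i].add(value)
    return class_counts, value_counts, values_per_attr


def predict(model, row):
    """Return normalized class probabilities P(class | row)."""
    class_counts, value_counts, values_per_attr = model
    total = sum(class_counts.values())
    scores = {}
    for cls, n_cls in class_counts.items():
        score = n_cls / total   # prior P(class)
        for i, value in enumerate(row):
            # Naive independence assumption: multiply per-attribute likelihoods,
            # each smoothed with add-one (Laplace) counts.
            k = len(values_per_attr[i])
            score *= (value_counts[(i, cls)][value] + 1) / (n_cls + k)
        scores[cls] = score
    norm = sum(scores.values())
    return {cls: s / norm for cls, s in scores.items()}


# Hypothetical Titanic-style toy records: (passenger class, sex) -> survived?
rows = [("1st", "female"), ("1st", "male"), ("3rd", "male"),
        ("3rd", "male"), ("3rd", "female"), ("2nd", "female")]
labels = ["yes", "yes", "no", "no", "no", "yes"]

model = train_naive_bayes(rows, labels)
probs = predict(model, ("3rd", "male"))
print(round(probs["no"], 3))   # 0.857
```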

### Practice

- Run a Naive Bayes classifier on the Titanic data in Weka
- Plot various distributions in R

### Learn Data Preparation

- Learn normalization of data (scaling to a fixed range, and standardizing to the number of standard deviations from the mean)
- Learn discretization (unsupervised/supervised) from section 7.2 of Witten
- Learn sampling (cross-validation, bootstrapping) from sections 5.2 and 5.3 of Witten
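
To make normalization and unsupervised discretization concrete, here is a small Python sketch on a made-up list of ages. It shows min-max scaling into [0, 1], z-score standardization, and equal-width binning. Supervised (entropy-based) discretization is not shown. The sketch assumes each column has at least two distinct values, so nothing divides by zero. When you implement normalization in R, note that `scale()` divides by the sample standard deviation (n - 1), while `pstdev` below divides by n, so the results differ slightly.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values linearly into [0, 1]; assumes max > min."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Standardize: subtract the mean, divide by the population std. dev."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def equal_width_bins(values, n_bins):
    """Unsupervised discretization: split [min, max] into equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    # min(...) puts the maximum value in the last bin instead of an extra bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


ages = [20, 30, 40, 50, 60]
print(min_max_scale(ages))        # [0.0, 0.25, 0.5, 0.75, 1.0]
print(equal_width_bins(ages, 2))  # [0, 0, 1, 1, 1]
```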

### Practice

- Implement normalization in R
- Test various discretization techniques in Weka
- Test various sampling techniques in Weka
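
Weka performs these splits for you, but it helps to see what cross-validation and bootstrapping actually do with row indices. This Python sketch (function names are hypothetical) shuffles the indices with a fixed seed and deals them into k folds, each used once as the test set. The bootstrap draws n indices with replacement; the rows never drawn form the out-of-bag set. For large n, about 36.8% of rows end up out of bag on average.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Return (train, test) index lists for k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]   # k near-equal folds
    all_idx = set(idx)
    # Each fold is the test set once; all other rows form the training set.
    return [(sorted(all_idx - set(fold)), sorted(fold)) for fold in folds]


def bootstrap_sample(n, seed=0):
    """Draw n indices with replacement; return (sample, out-of-bag indices)."""
    rng = random.Random(seed)
    sample = [rng.randrange(n) for _ in range(n)]
    out_of_bag = sorted(set(range(n)) - set(sample))
    return sample, out_of_bag


splits = k_fold_indices(10, 5)
for train, test in splits:
    print(len(train), len(test))   # 8 2 for each of the 5 folds

sample, oob = bootstrap_sample(10)
print(len(sample), oob)
```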

### Continue Data Preparation

- Learn feature subset selection (FSS) from section 7.1 of Witten
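
Feature subset selection methods fall broadly into filters, which score attributes independently of the learning algorithm, and wrappers, which evaluate candidate subsets by training a model. As one simple example of a filter, the Python sketch below ranks numeric features by the absolute Pearson correlation with a numeric target and keeps the top k. The feature names and values are made up. The sketch assumes no column is constant, since that would divide by zero.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def select_top_k(features, target, k):
    """Filter-style FSS: keep the k features most correlated with the target."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical features and target.
features = {
    "x1": [1, 2, 3, 4, 5],   # perfectly correlated with the target
    "x2": [5, 3, 4, 1, 2],   # strongly negatively correlated
    "x3": [2, 2, 3, 2, 2],   # essentially unrelated
}
target = [2, 4, 6, 8, 10]
print(select_top_k(features, target, 2))   # ['x1', 'x2']
```

Ranking by absolute value keeps strong negative relationships, which are just as informative as positive ones.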

### Practice

- Test various FSS techniques in Weka

### Understand Machine Learning

- Understand algorithms from chapter 4 of Witten
- Understand advanced methods from chapter 6 of Witten

### Practice

- Write down your understanding of these algorithms
- Test various algorithms in Weka

### Learn Effective Data Visualization

- Learn Best Practices
- Read Cleveland's Book
- Read Wong's Book
- Read Tufte's Book

### Become a Better Writer

- Learn Effective Writing from Strunk

### Become a Better Programmer

- Learn Good Programming Practices from McConnell

### Become a Better Storyteller

- Read Confessions of a Public Speaker by Berkun
- Read How to Deliver a TED Talk by Donovan