Data Scientist Training

Have you spent hours searching for how to train as a data scientist, only to end up confused and tired? Relax. Your search ends here.

There are thousands of online resources about training to become a data scientist, yet it is still hard to find a clear path through them.

That is frustrating, and I can relate.

I have spent hundreds of hours learning and practicing these skills, trying not to pull all my hair out along the way, and I want to help you get started quickly.

In this free self-study guide, you will find a 30+ day path toward becoming a data scientist. It covers the key skills of data science: data manipulation, data analysis, and data visualization/storytelling. I've excluded "big data" tools and technologies for a simple reason: you can't run before you learn to walk. This curriculum will teach you to walk, and maybe to jog.

If you complete this entire curriculum, you will have demonstrated a key skill every data scientist needs: perseverance. Keep going no matter how hard you find the journey. Leave your thoughts here and let me know how I can help.

Let's get started then.

First, you need tools and books

Self-study Training Guide

Day 1

Write SQL

• Complete Exercise 1: Creating Tables from Shaw
• Complete Exercise 3: Inserting Data from Shaw
• Complete Exercise 5: Selecting Data from Shaw

Practice

• Repeat these exercises using your own data examples
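
Python's built-in sqlite3 module lets you try the Day 1 ideas without installing a database server. This is a minimal sketch, not Shaw's code. The person table, its columns, and the rows are made-up assumptions. The code creates a table, inserts a few rows, and selects a filtered subset.

```python
import sqlite3

# An in-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Creating tables: declare each column and its type.
cur.execute("""
    CREATE TABLE person (
        id INTEGER PRIMARY KEY,
        first_name TEXT,
        age INTEGER
    )
""")

# Inserting data: ? placeholders let the driver quote values safely.
cur.executemany(
    "INSERT INTO person (id, first_name, age) VALUES (?, ?, ?)",
    [(1, "Alice", 36), (2, "Bob", 41), (3, "Carol", 85)],
)
conn.commit()

# Selecting data: filter with WHERE and sort with ORDER BY.
rows = cur.execute(
    "SELECT first_name, age FROM person WHERE age > ? ORDER BY age",
    (40,),
).fetchall()
print(rows)  # [('Bob', 41), ('Carol', 85)]
```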

Day 2

Write SQL

• Complete Exercise 6: Select Across Many Tables from Shaw
• Complete Exercise 9: Updating Data from Shaw
• Complete Exercise 10: Updating Complex Data from Shaw
• Complete Exercise 15: Data Modeling from Shaw

Practice

• Write a short paragraph explaining your understanding of joins
• Repeat these exercises using your own data examples
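
Here is a similar sketch of the Day 2 topics. It again assumes Python's sqlite3 module, and the person, pet, and person_pet tables are made up for illustration. The link table models a many-to-many relationship. The first query selects across all three tables with joins. The two UPDATE statements show a simple update and one driven by a subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Data model: person_pet links people to pets (many-to-many).
cur.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE pet (id INTEGER PRIMARY KEY, name TEXT, dead INTEGER DEFAULT 0);
    CREATE TABLE person_pet (person_id INTEGER, pet_id INTEGER);

    INSERT INTO person VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO pet (id, name) VALUES (1, 'Rex'), (2, 'Tom'), (3, 'Nemo');
    INSERT INTO person_pet VALUES (1, 1), (1, 2), (2, 3);
""")

# Select across many tables: join through the link table.
owned = cur.execute("""
    SELECT person.name, pet.name
    FROM person
    JOIN person_pet ON person_pet.person_id = person.id
    JOIN pet ON pet.id = person_pet.pet_id
    ORDER BY person.name, pet.name
""").fetchall()
print(owned)  # [('Alice', 'Rex'), ('Alice', 'Tom'), ('Bob', 'Nemo')]

# Simple update: change one row identified by its key.
cur.execute("UPDATE pet SET name = ? WHERE id = ?", ("Fido", 1))

# Complex update: the WHERE clause uses a subquery over other tables.
cur.execute("""
    UPDATE pet SET dead = 1
    WHERE id IN (
        SELECT pet_id FROM person_pet
        JOIN person ON person.id = person_pet.person_id
        WHERE person.name = 'Bob'
    )
""")
conn.commit()

pets = cur.execute("SELECT name, dead FROM pet ORDER BY id").fetchall()
print(pets)  # [('Fido', 0), ('Tom', 0), ('Nemo', 1)]
```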

Day 3

Write SQL

• Read Chapter 3: Calculations and Aliases from Rockoff
• Read Chapter 4: Using Functions from Rockoff
• Read Chapter 6: Column-Based Logic from Rockoff
• Read Chapter 7: Row-Based Logic from Rockoff
• Read Chapter 8: Boolean Logic from Rockoff
• Read Chapter 10: Summarizing Data from Rockoff

Practice

• Repeat these exercises using your own data examples
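
One way to see the Day 3 chapters working together is a single query over a small orders table. The table, columns, and values are made up for illustration, and the query runs through Python's sqlite3 module. The SQL comments mark which part exercises which chapter topic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT,
                         quantity INTEGER, unit_price REAL);
    INSERT INTO orders VALUES
        (1, 'alice', 2, 10.0),
        (2, 'bob',   1, 99.5),
        (3, 'alice', 5,  4.0),
        (4, 'carol', 3, 20.0),
        (5, 'dave',  0, 12.0);
""")

summary = cur.execute("""
    SELECT
        UPPER(customer) AS customer_name,           -- function + alias
        SUM(quantity * unit_price) AS total,        -- calculation + aggregate
        COUNT(*) AS n_orders,                       -- summarizing data
        CASE                                        -- column-based logic
            WHEN SUM(quantity * unit_price) >= 50 THEN 'large'
            ELSE 'small'
        END AS size
    FROM orders
    WHERE quantity > 0 AND NOT (customer = 'carol') -- row-based + Boolean logic
    GROUP BY customer
    ORDER BY total DESC
""").fetchall()
print(summary)  # [('BOB', 99.5, 1, 'large'), ('ALICE', 40.0, 2, 'small')]
```

The size test on the aggregate sits in the CASE expression, not in WHERE. WHERE filters individual rows before grouping, so aggregates such as SUM are not allowed there.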

Day 4

Write SQL

• Read Chapter 11: Combining Tables with an Inner Join from Rockoff
• Read Chapter 12: Combining Tables with an Outer Join from Rockoff
• Read Chapter 14: Subqueries from Rockoff
• Read Chapter 15: Set Logic from Rockoff

Practice

• Repeat these exercises using your own data examples
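
The Day 4 topics differ in subtle ways that are easiest to see side by side. This sketch uses made-up customer and orders tables and Python's sqlite3 module. It contrasts an inner join with a left outer join, uses a subquery inside HAVING, and applies the set operator EXCEPT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customer VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
    INSERT INTO orders VALUES (1, 1, 30.0), (2, 1, 70.0), (3, 2, 15.0);
""")

# Inner join: only customers with at least one matching order.
inner = cur.execute("""
    SELECT c.name, o.amount FROM customer AS c
    INNER JOIN orders AS o ON o.customer_id = c.id
    ORDER BY c.name, o.amount
""").fetchall()

# Left outer join: every customer; order columns are NULL (None) when unmatched.
outer = cur.execute("""
    SELECT c.name, o.amount FROM customer AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    ORDER BY c.name, o.amount
""").fetchall()

# Subquery: customers whose total spending exceeds the average order amount.
big_spenders = cur.execute("""
    SELECT name FROM customer
    WHERE id IN (
        SELECT customer_id FROM orders
        GROUP BY customer_id
        HAVING SUM(amount) > (SELECT AVG(amount) FROM orders)
    )
""").fetchall()

# Set logic: EXCEPT keeps rows of the first query that are absent from the second.
no_orders = cur.execute("""
    SELECT id FROM customer
    EXCEPT
    SELECT customer_id FROM orders
""").fetchall()

print(inner)         # [('Alice', 30.0), ('Alice', 70.0), ('Bob', 15.0)]
print(outer)         # [('Alice', 30.0), ('Alice', 70.0), ('Bob', 15.0), ('Carol', None)]
print(big_spenders)  # [('Alice',)]
print(no_orders)     # [(3,)]
```

Carol appears only in the outer join, paired with None (SQL NULL). She is also the only customer id returned by EXCEPT.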

Day 5

Learn Data Handling in R

• Read Getting data into R from Zuur
• Read Recipe 3.8: Accessing built-in datasets from Teetor
• Read Accessing variables and managing subsets of data from Zuur
• Read Chapter 1: Getting started from Matloff
• Read Chapter 5: Data Frames from Matloff

Practice

• Write down your understanding of these concepts
• Repeat these exercises using your own data examples

Day 6

Learn Data Handling in R

• Read Chapter 6: Factors and Tables from Matloff
• Read Appendix B: Installing and using packages from Matloff
• Read Recipe 4.10: Writing to CSV files from Teetor
• Read Chapter 6: Data Transformations from Teetor
• Follow these slides on Accessing Databases from R

Practice

• Write down your understanding of these concepts

Day 7

Practice

• Write down your understanding of these concepts
• Repeat these tutorials using your own data examples

Day 8

Practice

• Repeat these tutorials using your own data examples

Day 9

Practice

• Plot your data using various ggplot2 functions

Day 10

Understand Statistical Concepts

• Read the theory of linear regression in chapter 3 of James
• Run the linear regression examples in R from section 3.6 of James
• Read the theory of logistic regression in section 4.3 of James
• Run the logistic regression examples in R from section 4.6 of James

Practice

• Write down your understanding of these concepts
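
To connect the Day 10 theory to something you can run, here is a minimal sketch of both models in plain Python rather than R. The linear regression uses the usual closed-form least-squares estimates for a single predictor. The logistic regression is fitted by simple gradient ascent on the log-likelihood. That is one common approach, but it is not what R's glm does internally (glm uses iteratively reweighted least squares). The function names, learning rate, step count, and toy data are assumptions.

```python
import math

def simple_linear_regression(xs, ys):
    """Least-squares estimates for y ~ b0 + b1 * x with one predictor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logistic_regression(xs, ys, lr=0.1, steps=5000):
    """Fit P(y = 1 | x) = sigmoid(b0 + b1 * x) by gradient ascent on the
    average log-likelihood; ys must be 0/1."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        # The gradient of the log-likelihood is built from (y - p) residuals.
        residuals = [y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys)]
        b0 += lr * sum(residuals) / n
        b1 += lr * sum(r * x for r, x in zip(residuals, xs)) / n
    return b0, b1

# Points on the line y = 1 + 2x are recovered exactly.
print(simple_linear_regression([1, 2, 3, 4], [3, 5, 7, 9]))  # (1.0, 2.0)

# Toy 0/1 outcomes that tend to switch from 0 to 1 as x grows.
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 1, 0, 1, 1]
b0, b1 = logistic_regression(xs, ys)
print(sigmoid(b0 + b1 * 1), sigmoid(b0 + b1 * 6))  # low, then high probability
```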

Day 11

Practice

• Run a Bayes classifier on the Titanic data in Weka
• Plot various distributions in R
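
Before running a Bayes classifier in Weka, it helps to see what a naive Bayes classifier computes. This is a minimal Python sketch for categorical attributes with add-one (Laplace) smoothing. It is not Weka's implementation. The Titanic-style rows below are invented for illustration and are not taken from the real dataset.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count what naive Bayes needs: class frequencies and, per class,
    how often each attribute value occurs."""
    class_counts = Counter(labels)
    value_counts = Counter()        # (attribute index, value, class) -> count
    values_seen = defaultdict(set)  # attribute index -> distinct values
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, value, label)] += 1
            values_seen[i].add(value)
    return class_counts, value_counts, values_seen

def predict_proba(model, row):
    """Return P(class | row), assuming attributes are independent given the class."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    scores = {}
    for label, n_label in class_counts.items():
        score = n_label / total  # prior P(class)
        for i, value in enumerate(row):
            k = len(values_seen[i])
            # Add-one smoothing: an unseen value never forces the product to 0.
            score *= (value_counts[(i, value, label)] + 1) / (n_label + k)
        scores[label] = score
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}

# Invented Titanic-style rows: (passenger class, sex) -> survived?
rows = [("1st", "female"), ("1st", "male"), ("2nd", "female"),
        ("3rd", "male"), ("3rd", "male"), ("3rd", "female")]
labels = ["yes", "yes", "yes", "no", "no", "no"]

model = train_naive_bayes(rows, labels)
probs = predict_proba(model, ("3rd", "male"))
print(probs)
```

For ("3rd", "male"), the invented rows give 'no' a probability of 6/7, about 0.86.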

Day 12

Learn Data Preparation

• Learn normalization of data (min-max scaling and z-score standardization using the standard deviation)
• Learn discretization (unsupervised/supervised) from section 7.2 of Witten
• Learn sampling (cross-validation, bootstrapping) from sections 5.2 and 5.3 of Witten

Practice

• Implement normalization in R
• Test various discretization techniques in Weka
• Test various sampling techniques in Weka
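
The Day 12 techniques are short enough to write from scratch, which is a good way to check your understanding before reaching for Weka or R. This Python sketch covers five techniques: min-max scaling, z-score standardization (using the sample standard deviation), unsupervised equal-width discretization, k-fold cross-validation splits, and a bootstrap sample with its unpicked instances used as the test set. The function names and toy data are assumptions.

```python
import random
import statistics

def min_max(values):
    """Rescale to [0, 1]; assumes the values are not all equal."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    """Center on the mean and divide by the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def equal_width_bins(values, n_bins):
    """Unsupervised discretization: split the range into n_bins equal intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    # min(...) keeps the maximum value inside the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index lists; each index is tested exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]

def bootstrap_split(n, seed=0):
    """Draw n indices with replacement for training; unpicked ones form the test set."""
    rng = random.Random(seed)
    train = [rng.randrange(n) for _ in range(n)]
    picked = set(train)
    test = [i for i in range(n) if i not in picked]
    return train, test

data = [2, 4, 6, 8, 10]
print(min_max(data))              # [0.0, 0.25, 0.5, 0.75, 1.0]
print(z_score(data))
print(equal_width_bins(data, 2))  # [0, 0, 1, 1, 1]
for train, test in k_fold_indices(len(data), k=5):
    print(sorted(train), test)
print(bootstrap_split(len(data)))
```

Translating these functions into R is a natural extension of the "Implement normalization in R" practice item.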

Day 13

Learn Data Preparation

• Learn feature subset selection (FSS) from section 7.1 of Witten

Practice

• Test various FSS techniques in Weka
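
Feature subset selection can be framed as a search over attribute subsets, guided by an evaluation score. Here is a minimal Python sketch of one such search, greedy forward selection. The score function is a stand-in: in practice you would replace it with something like the cross-validated accuracy of a learner. The feature names and weights in the usage example are invented.

```python
def forward_selection(features, score):
    """Greedy forward feature subset selection.

    score(subset) -> float, higher is better. Starting from the empty set,
    repeatedly add the single feature that improves the score most, and stop
    when no remaining feature improves it.
    """
    selected = []
    best = score(selected)
    remaining = list(features)
    while remaining:
        candidates = [(score(selected + [f]), f) for f in remaining]
        cand_score, cand = max(candidates)
        if cand_score <= best:
            break  # no single addition helps any more
        selected.append(cand)
        remaining.remove(cand)
        best = cand_score
    return selected, best

# Invented stand-in for a real evaluator: each feature adds a fixed amount.
WEIGHTS = {"sex": 0.3, "pclass": 0.2, "age": 0.05, "ticket": -0.1}

def toy_score(subset):
    return sum(WEIGHTS[f] for f in subset)

selected, best = forward_selection(["sex", "pclass", "age", "ticket"], toy_score)
print(selected)  # ['sex', 'pclass', 'age']
```

Greedy search evaluates far fewer subsets than exhaustive search, at the cost of possibly missing combinations that only help together.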

Day 14 to 20

Understand Machine Learning

• Understand the basic algorithms in chapter 4 of Witten
• Understand the advanced methods in chapter 6 of Witten

Practice

• Write down your understanding of these algorithms
• Test various algorithms in Weka
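
A good first algorithm to implement yourself is the 1R ("one rule") learner, about as simple as classifiers get. For each attribute, it builds a rule that maps every value to its most frequent class. It then keeps the attribute whose rule makes the fewest errors on the training data. This Python sketch assumes nominal attributes only and no missing values, and the toy rows are invented.

```python
from collections import Counter, defaultdict

def one_r(rows, labels):
    """Return (attribute index, rule, training errors) for the best single-attribute rule."""
    best = None
    for i in range(len(rows[0])):
        # Count the class labels seen with each value of attribute i.
        counts = defaultdict(Counter)
        for row, label in zip(rows, labels):
            counts[row[i]][label] += 1
        # The rule predicts the majority class for each value.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(sum(c.values()) - c[rule[value]] for value, c in counts.items())
        if best is None or errors < best[2]:
            best = (i, rule, errors)
    return best

rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
        ("rainy", "hot"), ("overcast", "hot")]
labels = ["no", "no", "yes", "yes", "yes"]

attribute, rule, errors = one_r(rows, labels)
print(attribute, rule, errors)  # 0 {'sunny': 'no', 'rainy': 'yes', 'overcast': 'yes'} 0
```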

Day 21 to 30

Day 31

Become a Better Writer

• Learn Effective Writing from Strunk

Day 32

Become a Better Programmer

• Learn Good Programming Practices from McConnell

Day 33+

Become a Better Storyteller

• Read Confessions of a Public Speaker by Berkun
• Read How to Give a TED Talk by Donovan