Data Scientist Training

Have you spent hours searching for how to train as a data scientist, only to end up confused and tired? Relax. Your search ends here.

There are thousands of resources on the internet that show you how to train as a data scientist, yet a clear path is still hard to find.

That is frustrating, and I can relate.

I have spent hundreds of hours learning and practicing these skills (without pulling all my hair out), and I want to help you get started quickly.

In this free self-study guide, you will find a 30+ day path toward becoming a data scientist. It covers the key skills of data science: data manipulation, data analysis, and data visualization/storytelling. I've excluded the "big data" tools and technologies for a simple reason: you can't run before you learn to walk. This self-study curriculum will teach you to walk and maybe to jog.

If you complete this entire curriculum, you will have shown a key skill required of data scientists: perseverance. Keep going no matter how hard you find this journey. Leave your thoughts here and let me know how I can help.

Let's get started then.


First, you need tools and books

Tools you need

Books you need

SQL

R

Machine Learning and Statistics

Data Visualization

Storytelling


Self-study Training Guide

Day 1

Understand Relational Databases

Write SQL

  • Complete Exercise 1: Creating Tables from Shaw
  • Complete Exercise 3: Inserting Data from Shaw
  • Complete Exercise 5: Selecting Data from Shaw

Practice

  • Repeat these exercises using your own data examples
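
As a rough illustration of the create, insert, and select steps practiced on Day 1, here is a minimal sketch. It uses Python's built-in `sqlite3` module only so that it runs without installing a database server. The table and column names are hypothetical and are not taken from Shaw's exercises.

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Create a table (hypothetical schema, for illustration only).
conn.execute(
    "CREATE TABLE person (id INTEGER PRIMARY KEY, first_name TEXT, age INTEGER)"
)

# Insert rows; the ? placeholders let sqlite3 handle quoting safely.
people = [(1, "Ana", 34), (2, "Ben", 27), (3, "Chloe", 41)]
conn.executemany(
    "INSERT INTO person (id, first_name, age) VALUES (?, ?, ?)", people
)

# Select only the rows that match a condition, in a fixed order.
over_30 = conn.execute(
    "SELECT first_name, age FROM person WHERE age > 30 ORDER BY age"
).fetchall()
print(over_30)
```

Swapping in your own table and rows is a quick way to do the "own data examples" practice step.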

Day 2

Understand Joins

Write SQL

  • Complete Exercise 6: Select Across Many Tables from Shaw
  • Complete Exercise 9: Updating Data from Shaw
  • Complete Exercise 10: Updating Complex Data from Shaw
  • Complete Exercise 15: Data Modeling from Shaw

Practice

  • Write a small paragraph about your understanding of joins
  • Repeat these exercises using your own data examples
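
To make the idea of a join concrete, the sketch below contrasts an inner join with a left outer join on two tiny tables. It is only an illustration: it runs the SQL through Python's built-in `sqlite3` module, and the `person` and `pet` tables and their contents are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE pet (id INTEGER PRIMARY KEY, name TEXT, owner_id INTEGER);
    INSERT INTO person VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Chloe');
    INSERT INTO pet VALUES (1, 'Rex', 1), (2, 'Tom', 1), (3, 'Fido', 3);
""")

# INNER JOIN keeps only people who own at least one pet.
inner_rows = conn.execute("""
    SELECT person.name, pet.name
    FROM person
    INNER JOIN pet ON pet.owner_id = person.id
    ORDER BY person.id, pet.id
""").fetchall()

# LEFT JOIN keeps every person; the pet column is NULL (None in Python)
# for people with no matching pet row.
left_rows = conn.execute("""
    SELECT person.name, pet.name
    FROM person
    LEFT JOIN pet ON pet.owner_id = person.id
    ORDER BY person.id, pet.id
""").fetchall()

print(inner_rows)
print(left_rows)
```

Ben owns no pet, so he is missing from the inner join but appears in the left join with an empty pet name.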

Day 3

Learn Advanced SQL

  • Read Chapter 3: Calculations and Aliases from Rockoff
  • Read Chapter 4: Using Functions from Rockoff
  • Read Chapter 6: Column-Based Logic from Rockoff
  • Read Chapter 7: Row-Based Logic from Rockoff
  • Read Chapter 8: Boolean Logic from Rockoff
  • Read Chapter 10: Summarizing Data from Rockoff

Practice

  • Repeat these exercises using your own data examples

Day 4

Continue Advanced SQL

  • Read Chapter 11: Combining Tables with an Inner Join from Rockoff
  • Read Chapter 12: Combining Tables with an Outer Join from Rockoff
  • Read Chapter 14: Subqueries from Rockoff
  • Read Chapter 15: Set Logic from Rockoff

Practice

  • Repeat these exercises using your own data examples

Day 5

Learn Data Handling in R

  • Read Getting data into R from Zuur
  • Read Recipe 3.8 Accessing built-in datasets from Teetor
  • Read Accessing variables and managing subsets of data from Zuur
  • Read Chapter 1: Getting started from Matloff
  • Read Chapter 5: Data Frames from Matloff

Practice

  • Write your understanding of these concepts
  • Repeat these exercises using your own data examples

Day 6

Continue Data Handling in R

  • Read Chapter 6: Factors and Tables from Matloff
  • Read Appendix B: Installing and using packages from Matloff
  • Read Recipe 4.10 Writing to CSV files from Teetor
  • Read Chapter 6 Data Transformations from Teetor
  • Follow these slides on Accessing Databases from R

Practice

  • Write your understanding of these concepts

Day 7

Learn Data Manipulation in R

Practice

  • Write your understanding of these concepts
  • Repeat these tutorials using your own data examples

Day 8

Practice

  • Repeat these tutorials using your own data examples

Day 9

R Miscellaneous

Practice

  • Plot your data using various ggplot2 functions

Day 10

Understand Statistical Concepts

  • Read the theory of linear regression in chapter 3 of James
  • Run the linear regression examples in R from section 3.6 of James
  • Read the theory of logistic regression
  • Read the theory of logistic regression in section 4.3 of James
  • Run the logistic regression examples in R from section 4.6 of James

Practice

  • Write your understanding of these concepts
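
When writing up your understanding of linear regression, it can help to compute a fit by hand once. The sketch below is one possible illustration of simple (one-predictor) least squares, using the textbook closed-form estimates for slope and intercept. It is written in plain Python rather than R purely so it is self-contained, and the function name `fit_line` and the toy data are made up for this example.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Toy data generated exactly from y = 1 + 2x, so the fit should recover it.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
intercept, slope = fit_line(xs, ys)
print(intercept, slope)
```

With real, noisy data the estimates will not match any "true" line exactly, which is where the standard errors and diagnostics covered in the reading come in.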

Day 11

Practice

  • Run Bayes classifier on Titanic data in Weka
  • Plot various distributions in R

Day 12

Learn Data Preparation

  • Learn normalization of data (rescaling to a fixed range, or standardizing by the standard deviation)
  • Learn discretization (unsupervised/supervised) from section 7.2 of Witten
  • Learn sampling (cross-validation, bootstrapping) from sections 5.2 and 5.3 of Witten

Practice

  • Implement normalization in R
  • Test various discretization techniques in Weka
  • Test various sampling techniques in Weka
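
Before implementing normalization in R, it may help to see the two common variants side by side. The following is a minimal sketch in plain Python, not a reference implementation: the function names `min_max_scale` and `standardize` and the sample data are hypothetical, and neither function guards against a column whose values are all identical (which would cause a division by zero).

```python
import statistics


def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Subtract the mean and divide by the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # uses the n - 1 denominator
    return [(v - mean) / sd for v in values]


data = [2.0, 4.0, 6.0, 8.0]
scaled = min_max_scale(data)
z_scores = standardize(data)
print(scaled)
print(z_scores)
```

After standardizing, the values have mean 0 and sample standard deviation 1. After min-max scaling they lie between 0 and 1.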

Day 13

Continue Data Preparation

  • Learn feature subset selection (FSS) from section 7.1 of Witten

Practice

  • Test various FSS techniques in Weka

Day 14 to 20

Understand Machine Learning

  • Understand algorithms from chapter 4 of Witten
  • Understand advanced methods from chapter 6 of Witten

Practice

  • Write your understanding of these algorithms
  • Test various algorithms in Weka

Day 21 to 30

Learn Effective Data Visualization

Day 31

Become a Better Writer

  • Learn Effective Writing from Strunk

Day 32

Become a Better Programmer

  • Learn Good Programming Practices from McConnell

Day 33+

Become a Better Storyteller

  • Read Confessions of a Public Speaker by Berkun
  • Read How to Give a TED Talk by Donovan

About the Author

A co-author of Data Science for Fundraising and an award-winning keynote speaker, Ashutosh R. Nandeshwar is one of the few analytics professionals in the higher education industry who has developed analytical solutions for all stages of the student life cycle (from recruitment to giving). He enjoys speaking about the power of data, as well as ranting about data professionals who chase after “interesting” things. He earned his PhD/MS from West Virginia University and his BEng from Nagpur University, all in industrial engineering. He currently leads the data science, reporting, and prospect development efforts at the University of Southern California.
