UCLA professor Jason Frand defines data mining this way: “Data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information – information that can be used to increase revenue, cuts costs, or both.”
To become an expert and actually get a job in data analytics or data science, you must master the following components:
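To make "analyzing data from different perspectives and summarizing it" a little more concrete, here is a minimal Python sketch. The sales records, the field names, and the `summarize` helper are all invented for illustration and are not part of Frand's definition or of any course listed here; the sketch only shows one small dataset being summarized from two different angles.

```python
from collections import defaultdict

# Hypothetical sales records, invented purely for illustration.
sales = [
    {"region": "East", "product": "widget", "revenue": 120.0},
    {"region": "East", "product": "gadget", "revenue": 80.0},
    {"region": "West", "product": "widget", "revenue": 200.0},
    {"region": "West", "product": "gadget", "revenue": 50.0},
]


def summarize(records, key):
    """Return total revenue grouped by one field (one 'perspective')."""
    totals = defaultdict(float)
    for row in records:
        totals[row[key]] += row["revenue"]
    return dict(totals)


# The same data, summarized from two perspectives.
by_region = summarize(sales, "region")
by_product = summarize(sales, "product")

print(by_region)
print(by_product)
```

Real data mining goes well beyond grouped totals, but the same pattern (pick a perspective, then summarize) sits underneath many of the techniques the courses below teach.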
- data mining/machine learning/statistics
- data visualization
- database management
You can learn and train in these skills by taking free online courses that cover many of these areas. These courses are usually part of a degree or certificate program in data mining, so those who are new to or interested in this field can learn a great deal without paying a dime. Here is the list:
- Data Mining with Weka: If you want to learn data mining, you cannot miss this course. It is taught by Prof. Ian Witten, co-author of Data Mining: Practical Machine Learning Tools and Techniques (aka the Weka book) and creator of the Weka software.
- Machine Learning: the course that started it all, Andrew Ng’s first course on Coursera.
- Data Analysis: a Coursera course by Jeff Leek of Johns Hopkins
- Intro to Probability and Statistics (Carnegie Mellon)
- Machine Learning 101/102
- GovData (MIT/Harvard)
- STATS 120: Information Visualisation (The University of Auckland)
- R Programming (UCLA)
- CS 229: Machine Learning (Stanford) (videos)
- Linguistics 420: Statistical Natural Language Processing (Georgetown)
- SI 508: Networks: Theory and Application (University of Michigan)
- CS 591: Data Mining (West Virginia University)
- STATS 782: Computing for Statisticians (The University of Auckland)
- 6.867: Machine Learning (MIT)
- Andrew Moore’s Slides on Statistical Data Mining Tutorials
- Check the list of other courses on Coursera.
- Lots of tutorials (Data Mining Tools)
- Capstone project: Kaggle or KDD (for a bigger list, see KDnuggets)
Some free textbooks:
- The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman
- Mining of Massive Datasets by Rajaraman and Ullman
In addition, there is an excellent thread on Quora on how to become a data scientist. It covers a lot of ground and is a very good resource on the practice of analytics.