Bootstrapping Machine Learning
My thesis will be related to machine learning (ML), so I need to learn the necessary ML background for the project. In this post, I would like to revisit some concepts and materials that I used to start learning about ML. Feel free to comment and give suggestions!
Machine learning is neither statistics nor data mining; it sits between the two. ML is best seen as the automated application of statistics to data mining tasks, i.e. ML develops algorithms for making predictions from data. Note that prediction in this context means statistical prediction.
[Image: Not this kind of Machine Learning though :p]
Data in ML consists of data instances, which are represented as feature vectors. For example, people can be represented as feature vectors of height and weight:
- Arinto -> (158, 55)
- Lionel Messi -> (169, 70)
We can add more features to our data instances. Features are chosen for the specific task at hand, i.e. what we want to accomplish with the data. The cool term for this is feature engineering.
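As a minimal sketch of the idea, the two people above can be stored as plain Python tuples. The names `people` and `add_ratio_feature`, and the extra weight-to-height feature, are assumptions made up for illustration; they are just one hypothetical way of adding a feature.

```python
# Feature vectors from the example above: (height, weight).
people = {
    "Arinto": (158, 55),
    "Lionel Messi": (169, 70),
}


def add_ratio_feature(vector):
    """Return a new vector with a hypothetical third feature: weight / height."""
    height, weight = vector
    return (height, weight, weight / height)


# Every data instance now has three features instead of two.
extended = {name: add_ratio_feature(vec) for name, vec in people.items()}

for name, vec in extended.items():
    print(name, vec)
```

Whether such a ratio is a useful feature depends entirely on the task, which is the point of feature engineering.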
Machine learning consists of three groups of methodology:
- Classification. Given a new data instance, we want to infer/predict which group/class it belongs to.
- Clustering. Given a set of data instances, we want to group subsets of them that have similar characteristics.
- Regression. Given a set of data, we fit a line (or curve) to the existing dataset so that we can predict an output for given inputs.
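To make the three groups a bit more concrete, here is a rough sketch in plain Python. It is only one simple instance of each idea, not how a real ML library does it: classification is shown as 1-nearest-neighbour, clustering as a single "assign to the nearest centre" step (not a full clustering algorithm), and regression as a least-squares straight line. All data points, labels, and function names are invented for illustration.

```python
import math


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(new_point, labelled):
    """Classification: return the label of the closest labelled instance."""
    _, label = min(labelled, key=lambda item: distance(new_point, item[0]))
    return label


def assign_clusters(points, centres):
    """Clustering: give each point the index of its nearest centre."""
    return [
        min(range(len(centres)), key=lambda i: distance(p, centres[i]))
        for p in points
    ]


def fit_line(xs, ys):
    """Regression: least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data, invented for this sketch: (feature vector, label).
labelled = [
    ((158, 55), "A"),
    ((169, 70), "A"),
    ((190, 85), "B"),
    ((195, 95), "B"),
]

predicted = classify((188, 80), labelled)
clusters = assign_clusters([vec for vec, _ in labelled], [(160, 60), (192, 90)])
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])

print(predicted, clusters, slope, intercept)
```

Note the difference in inputs: classification needs labelled examples, clustering only needs the data instances themselves, and regression predicts a number rather than a class.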
I will not go deep into each of them for this post, but I will list some pointers about learning ML. Here they are:
- Machine Learning: The Basics, by Ron Bekkerman. A very intuitive and clear explanation of what machine learning is and what the three groups of methodology are.
- Coursera’s Machine Learning Course, by Andrew Ng.
- Data Mining: Practical Machine Learning Tools and Techniques (book). Don’t forget to get your hands dirty with the WEKA ML framework. FYI, this book is often called the WEKA-book.
To conclude this post, I would like to quote a tip from a fellow Yahoo! intern, Martin (@fris), who has been in machine learning for several years:
Machine Learning is huge. Start by studying it at a high level first, i.e. don’t go into details yet; read the introduction or first few paragraphs of all the chapters in the WEKA-book. Once you’re focusing on one part of machine learning, you can drill down into the required details :)