# Machine Learning and its overall territory

November 7, 2015

Machine learning means automating automation, or getting computers to program themselves.

Sometimes writing the software by hand is the bottleneck (face detection, handwriting-to-ASCII mapping, stock prediction, etc.), so we let the data do the work instead.

Every machine learning algorithm has 3 components.

- Representation
- Evaluation
- Optimization

**Representation**

Just as we programmers need a programming language such as Java or Scala to write a program, machine learning needs a language in which to express what is learned. The common choices are:

- Decision Trees: very widely used in ML.
- Sets of Rules: a simple set of rules (like a bunch of if-else conditions).
- Instances: one of the easiest, oldest and simplest forms of ML, also known as lazy learning (for example, nearest neighbor).
- Graphical Models: Bayes nets and Markov nets.
- Neural Networks
- Support Vector Machines: important in business applications, and built on fairly sophisticated mathematics.
- Model Ensembles: the newest of these (e.g. Gradient Boosting Machines).
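To make the idea of a "language for learning" concrete, here is a minimal sketch of the same spam classifier expressed in two of the representations above: a set of rules and instances (nearest neighbor). The features (number of links, number of exclamation marks), the thresholds, and the tiny training set are all made up for illustration.

```python
# Two hypothetical representations of a spam classifier.
# An email is a pair of assumed features: (links, exclamation_marks).

def rule_based(email):
    """Set of rules: a handful of hand-written if-else conditions."""
    links, exclamations = email
    if links > 3:
        return "spam"
    if exclamations > 5:
        return "spam"
    return "ham"


def nearest_neighbor(email, training_data):
    """Instance-based (lazy) learning: nothing is built up front;
    we return the label of the closest stored example."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    _, closest_label = min(
        training_data, key=lambda pair: squared_distance(pair[0], email)
    )
    return closest_label


# Made-up labelled examples: (features, label).
training_data = [
    ((0, 0), "ham"),
    ((1, 1), "ham"),
    ((5, 2), "spam"),
    ((4, 8), "spam"),
]

new_email = (4, 6)
print(rule_based(new_email))
print(nearest_neighbor(new_email, training_data))
```

In the rule-based version the "knowledge" lives in the conditions, while in the instance-based version it lives in the stored examples themselves.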

New representations appear much less often than new work in the next phases, 'Evaluation' and 'Optimization', so choosing one is largely a one-time effort. Once a representation is chosen, we have picked our language, and the actual work starts with 'Evaluation'.

**Evaluation**

Evaluation tells us how to measure how well an ML program performs. A few measures are commonly used (take spam detection as the example):

- Accuracy: the fraction of all emails classified correctly, i.e. spams marked as spam and non-spams marked as non-spam.
- Precision: the fraction of emails we predicted as spam that are actually spam (a value between 0 and 1).
- Recall: the fraction of actual spams that we predicted as spam.
- Squared Error: the square of the difference between the predicted value and the actual value.
- Likelihood: how likely the data we are seeing is according to our model. Likelihood works well when the learner is not very powerful.
- Posterior Probability: combines the likelihood with the prior probability, so along with the likelihood it gives weight to our prior beliefs.
- Cost/Utility: the costs of false positives and false negatives can be very different, and either can be expensive, so we should take them into account.
- Margin: if we draw a line separating spams from non-spams, the margin is the distance between that line and the nearest spams/non-spams.
- Entropy: measures the degree of uncertainty.
- KL Divergence: measures how much one probability distribution differs from another.
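Several of these measures are easy to compute by hand. The sketch below uses only the standard library and made-up labels (1 = spam, 0 = not spam) to show accuracy, precision, recall, mean squared error, entropy and KL divergence. The function names and the data are assumptions for illustration, not part of any particular library.

```python
import math

# Made-up ground truth and predictions: 1 = spam, 0 = not spam.
actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

pairs = list(zip(actual, predicted))
tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # spam caught
fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # ham flagged as spam
fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # spam missed
tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # ham left alone

accuracy = (tp + tn) / len(pairs)   # fraction classified correctly
precision = tp / (tp + fp)          # of predicted spam, how much is spam
recall = tp / (tp + fn)             # of actual spam, how much we caught


def mean_squared_error(y_true, y_pred):
    """Average of squared differences between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def entropy(probs):
    """Uncertainty of a discrete distribution, in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def kl_divergence(p, q):
    """How much distribution p differs from q, in bits (assumes q > 0 where p > 0)."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


print(accuracy, precision, recall)
print(mean_squared_error([3.0, -0.5, 2.0], [2.5, 0.0, 2.0]))
print(entropy([0.5, 0.5]))
print(kl_divergence([0.5, 0.5], [0.9, 0.1]))
```

Note that accuracy, precision and recall differ on the same predictions, which is why the choice of evaluation measure matters.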

**Optimization**

The type of optimization to use depends on what type of Representation and Evaluation we are using.

- Combinatorial Optimization: used when the representation is discrete (e.g. greedy search).
- Convex Optimization: used when the representation is continuous (e.g. gradient descent).
- Constrained Optimization: continuous optimization subject to constraints (in a way a combination of the two above, as in linear programming).
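As a small illustration of gradient descent, the example named above, here is a sketch that minimizes a squared error over a single continuous parameter. The function, the starting point, the learning rate and the number of steps are all arbitrary choices for the example.

```python
def gradient_descent(gradient, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient to move toward a minimum."""
    w = start
    for _ in range(steps):
        w = w - learning_rate * gradient(w)
    return w


# Convex objective (assumed for illustration): f(w) = (w - 3)^2,
# i.e. the squared error between a parameter w and a target of 3.
def f_gradient(w):
    return 2 * (w - 3)


best_w = gradient_descent(f_gradient, start=0.0)
print(best_w)  # very close to 3, the minimizer of f
```

Because the objective is convex, there is a single minimum, and with a small enough learning rate each step moves closer to it.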

This gives the overall picture of machine learning. We will dive into each ML algorithm going forward.
