Machine Learning and its overall territory

Machine Learning is automating automation OR getting computers to program themselves.
*sometimes writing software will be a bottleneck (like face detection, handwriting to ASCII mapping, stock prediction etc), so let the data do the work instead



Every machine learning algorithm has 3 components.

  • Representation
  • Evaluation
  • Optimization

Like we programmers need a programming language like java/scala.. etc to develop a program, machine learning needs languages to accomplish learning, they are

  • Decision Trees: these are much widely used in ML
  • Set of Rules: Its simple set of rules (like a bunch of if-else conditions)
  • Instances: This is one of the easiest, oldest and simplest lazy learnings of ML.
  • Graphical Models: Bayes and Markov nets.
  • Neural Networks
  • Support Vector Machines: These are very important in business based learning and use much sophisticated mathematics.
  • Model Ensembles: These are the newest ones (ex: Gradient Boosting Machine)

New representations come much less often than compared to next phases of ‘Evaluation’ and ‘Optimization’ and hence this is like a first time effort. Once a representation is chosen, it means that we have chosen a language and now the actual work starts, that is ‘Evaluation’

It explains how to measure accuracy of a ML program. There are few concepts that we use to measure accuracy (consider spam detection example)

  • Accuracy: a program which counts number of spams which are actually marked spam and same with non-spams.
  • Precision: What fraction of our predicted spams are actually spams (0-1 probability)
  • Recall
  • Squared Error: Square of the difference between the predicted value and the actual value.
  • Likelihood: How likely is what we are seeing according to our model. Likelihood is good when the learning is not very powerful.
  • Posterior Probability: It is a combination of Likelihood and ‘Prime Probability’. Along with likelihood it gives weightage to our beliefs.
  • Cost/Utility: We should consider the cost of ‘false positives’ and ‘false negatives’ as they can be very different and expensive.
  • Margin: If we draw a line between spams and non-spams then the distance between the line and spams/non-spams is the margin.
  • Entropy: It is used to measure the degree of uncertainty.
  • KL Divergence

The type of optimization to use depends on what type of Representation and Evaluation we are using.

  • Combinatorial Optimization: We use this if representation is discrete. (ex: greedy search)
  • Convex Optimization: We use this if representation is discrete. (ex: gradient descent)
  • Constrained Optimization: Its continuous optimization subjected to many constraints (in a way this is combination of above both, like Linear Programming)

This gives the complete picture of Machine Learning and we will dive into each ML algorithm going forward.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s


Mostly technology with occasional sprinkling of other random thoughts


Amir Amintabar's personal page

101 Books

Reading my way through Time Magazine's 100 Greatest Novels since 1923 (plus Ulysses)

Seek, Plunnge and more...

My words, my world...

ARRM Foundation

Do not wait for leaders; do it alone, person to person - Mother Teresa

Executive Management

An unexamined life is not worth living – Socrates


A topnotch site


Just another site

coding algorithms

"An approximate answer to the right problem is worth a good deal more than an exact answer to an approximate problem." -- John Tukey

%d bloggers like this: