Terminology and key issues with Machine Learning

These are some of the terms commonly used when describing machine learning algorithms.

  • Training Example: An example of the form [x, f(x)]. Statisticians call it a ‘Sample’. It is also called a ‘Training Instance’.
  • Target Function: This is the true function ‘f’ that we are trying to learn.
  • Target Concept: A boolean target function, where
    • instances with f(x) = 1 are called positive instances
    • instances with f(x) = 0 are called negative instances
  • Hypothesis: A candidate function that the algorithm proposes as an approximation of the target function ‘f’.
  • Hypothesis Space: The space of all hypotheses that a program can output. The Version Space is the subset of this space that is consistent with the training examples.
  • Classifier: A discrete-valued function.
    • A classifier is what a learner outputs. A learning program is a program whose output is also a program.
    • Once we have the classifier, we use it in place of the learning algorithm.
    • The relation between learner and classifier is the same as the relation between a program and its output.
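
The terms above can be tied together in a small sketch. It assumes a toy hypothesis space of integer threshold functions; the names `make_threshold`, `version_space` and `learn`, and the training data, are made up for illustration and do not come from any library.

```python
from typing import Callable, List, Tuple

Example = Tuple[int, int]            # a training example [x, f(x)]
Classifier = Callable[[int], int]    # a discrete-valued function

def make_threshold(t: int) -> Classifier:
    """One hypothesis: predict 1 (positive) when x >= t, else 0 (negative)."""
    return lambda x: 1 if x >= t else 0

def version_space(examples: List[Example], thresholds: List[int]) -> List[int]:
    """Thresholds whose hypothesis agrees with every training example."""
    return [t for t in thresholds
            if all(make_threshold(t)(x) == y for x, y in examples)]

def learn(examples: List[Example], thresholds: List[int]) -> Classifier:
    """Learner: a program whose output is another program (the classifier)."""
    consistent = version_space(examples, thresholds)
    if not consistent:
        raise ValueError("no hypothesis is consistent with the training data")
    return make_threshold(consistent[0])

# Hypothetical training examples: two negative and two positive instances.
training = [(1, 0), (2, 0), (6, 1), (8, 1)]
hypothesis_space = list(range(0, 11))   # thresholds 0..10

print(version_space(training, hypothesis_space))
classifier = learn(training, hypothesis_space)
print(classifier(1), classifier(7))
```

After `learn` returns, only `classifier` is needed to label new points; the learner itself is no longer involved.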

Some of the notations commonly used in Machine Learning related white papers


Some of the key issues with machine learning algorithms

  • What is a good hypothesis space? Is past data good enough?
  • Which algorithms fit which spaces? Different spaces need different algorithms.
  • How can we optimize accuracy on future data points? (This is also called the ‘Problem of Overfitting’.)
  • How do we select features from the training examples? (This is also called the ‘Curse of Dimensionality’.)
  • How can we have confidence in the results? How much training data is required to find an accurate hypothesis? (This is a statistics question.)
  • Are learning problems computationally intractable? (Is the solution scalable?)
  • Engineering problems: how do we formulate application problems as ML problems?

Note: The Problem of Overfitting and the Curse of Dimensionality show up in most real-world problems; we will look into each of them while studying individual algorithms.




Machine Learning and its overall territory

Machine Learning is automating automation, or getting computers to program themselves.
*Sometimes writing the software is the bottleneck (for example face detection, handwriting-to-ASCII mapping, stock prediction), so let the data do the work instead.



Every machine learning algorithm has 3 components.

  • Representation
  • Evaluation
  • Optimization

Just as programmers need a programming language such as Java or Scala to develop a program, machine learning needs a language in which to express what is learned. This is the ‘Representation’, and the common choices are:

  • Decision Trees: These are very widely used in ML.
  • Set of Rules: A simple set of rules (like a bunch of if-else conditions).
  • Instances: One of the easiest, oldest and simplest forms of ML, also known as lazy learning.
  • Graphical Models: Bayes and Markov nets.
  • Neural Networks
  • Support Vector Machines: These are very important in business applications and use fairly sophisticated mathematics.
  • Model Ensembles: These are the newest ones (ex: Gradient Boosting Machine).
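
To make two of these representations concrete, here is a minimal sketch of a rule set and an instance-based (nearest neighbor) classifier for the same toy spam task. The two features, the thresholds and the stored examples are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# Hypothetical features of a message: (number of links, number of exclamation marks)
Message = Tuple[float, float]

def rule_classifier(msg: Message) -> int:
    """Set of rules: a bunch of if-else conditions (thresholds are made up)."""
    links, bangs = msg
    if links > 3:
        return 1          # spam
    if bangs > 5:
        return 1          # spam
    return 0              # non-spam

def nearest_neighbor(train: List[Tuple[Message, int]], msg: Message) -> int:
    """Instances: store the examples and copy the label of the closest one."""
    def sq_dist(a: Message, b: Message) -> float:
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    _, label = min(train, key=lambda ex: sq_dist(ex[0], msg))
    return label

# Stored training instances: (message features, label), 1 = spam, 0 = non-spam
train = [((0, 1), 0), ((1, 0), 0), ((5, 7), 1), ((4, 9), 1)]
query = (4, 6)
print(rule_classifier(query), nearest_neighbor(train, query))
```

The rule set encodes knowledge directly in its conditions, while the nearest neighbor version does no work until a query arrives, which is why instance-based learning is called lazy.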

New representations come along much less often than new work in the next phases, ‘Evaluation’ and ‘Optimization’, so choosing one is largely a one-time effort. Once a representation is chosen, we have chosen a language, and the actual work starts with ‘Evaluation’.

Evaluation explains how to measure the accuracy of an ML program. A few concepts are used to measure it (consider a spam detection example):

  • Accuracy: The fraction of messages classified correctly, counting both spams marked as spam and non-spams marked as non-spam.
  • Precision: What fraction of our predicted spams are actually spams (a value between 0 and 1).
  • Recall: What fraction of the actual spams we predicted as spam (a value between 0 and 1).
  • Squared Error: Square of the difference between the predicted value and the actual value.
  • Likelihood: How likely what we are seeing is according to our model. Likelihood is good when the learning is not very powerful.
  • Posterior Probability: A combination of Likelihood and ‘Prior Probability’. Along with the likelihood, it gives weight to our prior beliefs.
  • Cost/Utility: We should consider the cost of ‘false positives’ and ‘false negatives’, as they can be very different and expensive.
  • Margin: If we draw a line separating spams from non-spams, the margin is the distance between that line and the nearest spams/non-spams.
  • Entropy: It is used to measure the degree of uncertainty.
  • KL Divergence: Measures how much one probability distribution differs from another.
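
Several of these measures can be computed in a few lines. The sketch below uses the standard textbook definitions; the function names and the label lists are made up for the spam example, with 1 standing for spam and 0 for non-spam.

```python
import math
from typing import List, Tuple

def confusion_counts(actual: List[int], predicted: List[int]) -> Tuple[int, int, int, int]:
    """Count true positives, false positives, false negatives, true negatives."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    return tp, fp, fn, tn

def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    return (tp + tn) / (tp + fp + fn + tn)

def precision(tp: int, fp: int) -> float:
    # Returning 0.0 when nothing was predicted as spam is a convention chosen here.
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn) if tp + fn else 0.0

def entropy(p: List[float]) -> float:
    """Uncertainty of a distribution, in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def kl_divergence(p: List[float], q: List[float]) -> float:
    """How much distribution p differs from q, in bits (assumes q > 0 where p > 0)."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

actual    = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0]
tp, fp, fn, tn = confusion_counts(actual, predicted)
print(accuracy(tp, fp, fn, tn), precision(tp, fp), recall(tp, fn))
print(entropy([0.5, 0.5]), kl_divergence([0.5, 0.5], [0.9, 0.1]))
```

Note that accuracy alone can look good even when precision or recall is poor, which is one reason the cost of false positives and false negatives is listed separately.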

The type of optimization to use depends on what type of Representation and Evaluation we are using.

  • Combinatorial Optimization: We use this if the representation is discrete (ex: greedy search).
  • Convex Optimization: We use this if the representation is continuous (ex: gradient descent).
  • Constrained Optimization: Continuous optimization subject to constraints (in a way a combination of the two above, as in Linear Programming).
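
Gradient descent, named in the list above, can be sketched on a one-dimensional convex function with a real-valued parameter. The function f(w) = (w - 3)^2, the learning rate and the step count are assumptions picked for illustration, not recommended settings.

```python
from typing import Callable

def gradient_descent(grad: Callable[[float], float],
                     w0: float,
                     lr: float = 0.1,
                     steps: int = 100) -> float:
    """Repeatedly step against the gradient, starting from w0."""
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)
    return w

# Evaluation function to minimize: f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w_star = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)
print(round(w_star, 4))
```

Because the function is convex, the iterates move toward its single minimum at w = 3 for a small enough learning rate; on a non-convex function the same loop could stop at a local minimum instead.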

This gives the complete picture of Machine Learning and we will dive into each ML algorithm going forward.

