Terminology and key issues with Machine Learning

These are some of the terms commonly used when discussing machine learning algorithms.

  • Training Example: An example of the form [x, f(x)], where x is the input and f(x) is its known output. Statisticians call it a ‘Sample’. It is also called a ‘Training Instance’.
  • Target Function: This is the true function ‘f’ that we are trying to learn.
  • Target Concept: A target function that is boolean, where
    • instances x with f(x) = 1 are called positive instances
    • instances x with f(x) = 0 are called negative instances
  • Hypothesis: A proposed function ‘h’ that we believe is close to the target function ‘f’. Every algorithm tries to come up with such a hypothesis.
  • Hypothesis Space: The space of all hypotheses that can be output by a program. Version Space is the subset of this space that is consistent with the training examples.
  • Classifier: A discrete-valued function.
    • A classifier is what a learner outputs. A learning program is a program whose output is also a program.
    • Once we have the classifier, we use it in place of the learning algorithm.
    • ‘Learner vs Classifier’ is the same distinction as ‘Program vs Output’.
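
To make the terms above concrete, here is a minimal sketch in Python. Everything in it is an assumption chosen for illustration: the target concept `f`, the four training instances, and the choice of conjunctions over three boolean attributes as the hypothesis space. It is not a general-purpose learning algorithm, only a toy showing how the pieces relate.

```python
from itertools import product

# Hypothetical target concept f (unknown to the learner in practice).
def f(x):
    return int(x[0] and not x[2])

# Training examples of the form [x, f(x)].
instances = [(1, 1, 0), (1, 0, 0), (0, 1, 0), (1, 1, 1)]
training_examples = [(x, f(x)) for x in instances]

# Hypothesis space: conjunctions over 3 boolean attributes.
# Each constraint is 1 (must be true), 0 (must be false) or None (don't care).
hypothesis_space = list(product([1, 0, None], repeat=3))

def predict(h, x):
    """Apply hypothesis h to instance x; returns 1 (positive) or 0 (negative)."""
    return int(all(c is None or c == xi for c, xi in zip(h, x)))

def version_space(examples):
    """Subset of the hypothesis space consistent with every training example."""
    return [h for h in hypothesis_space
            if all(predict(h, x) == y for x, y in examples)]

def learner(examples):
    """A learner is a program whose output is another program: the classifier."""
    h = version_space(examples)[0]  # pick any consistent hypothesis

    def classifier(x):
        return predict(h, x)

    return classifier

classifier = learner(training_examples)
print(len(hypothesis_space))             # size of the hypothesis space
print(version_space(training_examples))  # hypotheses still consistent with the data
print(classifier((1, 0, 0)))             # classify an instance with the learned classifier
```

Once `learner` has returned, only `classifier` is needed to label new instances, which is the ‘Learner vs Classifier’ distinction from the list above.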

Some of the notations commonly used in Machine Learning related white papers


Some of the key issues with machine learning algorithms

  • What is a good hypothesis space? Is past data good enough?
  • Which algorithms fit which spaces? Different spaces need different algorithms.
  • How can we optimize accuracy on future data points? (This is also called the ‘Problem of Overfitting’.)
  • How do we select features from the training examples? (This is tied to the ‘Curse of Dimensionality’.)
  • How can we have confidence in the results? How much training data is required to find an accurate hypothesis? (This is a statistics question.)
  • Are learning problems computationally intractable? (Is the solution scalable?)
  • Engineering problem: how do we formulate application problems as ML problems?

Note: The Problem of Overfitting and the Curse of Dimensionality show up in most real-world problems. We will look into each of them while studying individual algorithms.
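
As a rough illustration of the Problem of Overfitting, the sketch below compares two toy learners on hand-made data. The target function `true_f`, the two deliberately mislabeled training examples, and both learners are assumptions invented for this example. The memorizing learner fits the training data perfectly, noise included, while the simpler threshold learner tolerates some training errors.

```python
# Hypothetical target function: positive when x >= 5.
def true_f(x):
    return int(x >= 5)

# Training examples [x, label]; (2, 1) and (8, 0) are deliberately noisy labels.
train = [(1, 0), (2, 1), (3, 0), (4, 0), (6, 1), (7, 1), (8, 0), (9, 1)]

# "Future" data points with correct labels: 0.4, 1.4, ..., 9.4.
test = [(i + 0.4, true_f(i + 0.4)) for i in range(10)]

def accuracy(classifier, examples):
    """Fraction of examples the classifier labels correctly."""
    return sum(classifier(x) == y for x, y in examples) / len(examples)

def memorizing_learner(examples):
    """Outputs a classifier that copies the label of the nearest training example."""
    def classifier(x):
        _, nearest_y = min(examples, key=lambda e: abs(e[0] - x))
        return nearest_y
    return classifier

def threshold_learner(examples):
    """Outputs a classifier 'x >= t', with t chosen to minimize training errors."""
    def errors(t):
        return sum(int(x >= t) != y for x, y in examples)
    best_t = min((x for x, _ in examples), key=errors)
    return lambda x: int(x >= best_t)

memorizer = memorizing_learner(train)
simple = threshold_learner(train)

print(accuracy(memorizer, train), accuracy(memorizer, test))
print(accuracy(simple, train), accuracy(simple, test))
```

On this particular data the memorizing classifier scores higher on the training examples but lower on the unseen points than the threshold classifier. That gap between training accuracy and accuracy on future data is what overfitting refers to. The data is contrived to show the effect and says nothing about how large the gap is in practice.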




