Essence of Inductive/Supervised Learning

Inductive/Supervised Learning: given training examples [x, f(x)] of some unknown function ‘f’, the task of finding a good approximation of ‘f’ is called Supervised Learning.

Appropriate applications of Supervised Learning

  • Situations where humans can perform the task but cannot describe how they do it.
    • x: bitmap picture of a handwritten character; f(x): ASCII code of the character
  • Situations where the desired function changes very frequently.
    • Predicting stock prices.
  • Situations where each user needs a customized function.
    • Amazon user recommendations.

Let us take a simple example:

#  x1  x2  x3  x4 | y
1  0   0   1   0  | 0  
2  0   1   0   0  | 0  
3  0   0   1   1  | 1  
4  1   0   0   1  | 1  
5  0   1   1   0  | 0  
6  1   1   0   0  | 0  
7  0   1   0   1  | 0  
… and so on

We have to predict ‘y’ for a new combination of (x1, x2, x3, x4). Since there are 4 binary variables, there are 2^4 = 16 possible inputs, and since each input can independently map to 0 or 1, the number of candidate functions is 2^16.

Since the number of functions is double exponential in the number of variables (2^(2^n)), and real problems have thousands of variables, a brute-force search over all functions is impossible; therefore we need generalization.
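A quick sanity check of the counting argument above, as a minimal sketch:

```python
# The hypothesis space for n binary inputs: each of the 2**n possible
# input vectors can independently map to 0 or 1, giving 2**(2**n)
# distinct boolean functions (double exponential in n).
def num_inputs(n: int) -> int:
    return 2 ** n

def num_functions(n: int) -> int:
    return 2 ** num_inputs(n)

print(num_inputs(4))     # 16 possible (x1, x2, x3, x4) combinations
print(num_functions(4))  # 65536 candidate functions
```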

Two views of Machine Learning to generalize

  • Removal of remaining uncertainty: suppose we already know the unknown function is an ‘m-of-n’ rule; then we use the training examples to infer which one it is.
  • This requires guessing a good, small hypothesis class. We can start with a very small class and enlarge it until it contains a hypothesis that fits the data.
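To make the first view concrete: one common reading of an ‘m-of-n’ rule is “output 1 iff at least m of a chosen set of n features are 1”. Under that assumption, the sketch below brute-forces the training rows from the table above to find every consistent (m, feature-set) pair; `m_of_n` and `consistent_hypotheses` are illustrative names, not from the text.

```python
from itertools import combinations

# Training rows from the example table: (x1, x2, x3, x4) -> y
DATA = [
    ((0, 0, 1, 0), 0),
    ((0, 1, 0, 0), 0),
    ((0, 0, 1, 1), 1),
    ((1, 0, 0, 1), 1),
    ((0, 1, 1, 0), 0),
    ((1, 1, 0, 0), 0),
    ((0, 1, 0, 1), 0),
]

def m_of_n(m, features, x):
    """Predict 1 iff at least m of the chosen features are 1."""
    return int(sum(x[i] for i in features) >= m)

def consistent_hypotheses(data):
    """Enumerate every (m, feature subset) pair that fits all examples."""
    hits = []
    for n in range(1, 5):
        for features in combinations(range(4), n):
            for m in range(1, n + 1):
                if all(m_of_n(m, features, x) == y for x, y in data):
                    hits.append((m, features))
    return hits

print(consistent_hypotheses(DATA))  # [(2, (0, 2, 3))], i.e. "2 of {x1, x3, x4}"
```

Knowing the function is m-of-n shrinks the space from 2^16 functions to the handful enumerated here, so seven examples suffice to pin it down.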

A framework for hypothesis spaces and learning algorithms

Framework for hypothesis spaces

Key factors to understand the hypothesis space

  • Size: Is the hypothesis space fixed in size (like Naïve Bayes) or variable in size (like Decision Trees)?
    • Fixed-size spaces are easier to understand, but variable-size spaces are more useful.
    • Variable-size spaces introduce the problem of overfitting.
  • Randomness: Is each hypothesis ‘Deterministic’ or ‘Stochastic’? This affects how we evaluate hypotheses.
    • With a deterministic hypothesis, a training example is either consistent, i.e., correctly predicted, or inconsistent, i.e., incorrectly predicted (e.g., spam/non-spam detection).
    • With a stochastic hypothesis, a training example is more likely or less likely under that hypothesis.
  • Parameterization: Is the hypothesis described by a set of symbolic (discrete) choices or by continuous parameters?
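A small sketch of the evaluation difference, using toy hypotheses (not from the text): a deterministic hypothesis is checked for consistency, while a stochastic one is scored by the log-likelihood it assigns to the data.

```python
import math

# Toy deterministic hypothesis: predicts the label outright.
def det_h(x):
    return x[3]

# Toy stochastic hypothesis: predicts P(y = 1 | x).
def stoch_h(x):
    return 0.9 if x[3] == 1 else 0.1

data = [((0, 0, 1, 1), 1), ((0, 1, 0, 1), 0)]

# Deterministic view: each example is simply consistent or inconsistent.
consistent = all(det_h(x) == y for x, y in data)

# Stochastic view: each example is more or less likely; sum log-likelihoods.
log_lik = sum(math.log(stoch_h(x) if y == 1 else 1.0 - stoch_h(x))
              for x, y in data)

print(consistent)         # False: the second example is misclassified
print(round(log_lik, 3))  # the second example is merely unlikely, not "wrong"
```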


Framework for Learning Algorithms

Key factors to understand the learning algorithms

  • Search Procedure
    • Direct Computation: Solve for the hypothesis directly.
    • Local Search: Start with an initial hypothesis and make small improvements until a local optimum is reached.
    • Constructive Search: Start with an empty hypothesis and gradually add structure to it.
  • Timing
    • Eager: Analyze the training data and construct an explicit hypothesis up front (e.g., Decision Trees).
    • Lazy: Store the training data, wait until a test instance is presented, and then construct an ad hoc hypothesis to classify it (e.g., the KNN algorithm).
  • Online vs Batch Learning
    • If the data changes very often (like stock-market prediction), we need online learning.
    • If the data does not change much over time (like drug analytics), batch learning suffices, though retraining on the full dataset makes each update more expensive.
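The eager/lazy distinction above can be sketched with a minimal lazy learner: a 1-nearest-neighbour classifier over binary rows like those in the example table, using Hamming distance (a simplified stand-in for the full KNN algorithm).

```python
# Lazy learning sketch: no model is built up front; the stored examples
# are consulted only when a query arrives.
TRAIN = [
    ((0, 0, 1, 1), 1),
    ((1, 0, 0, 1), 1),
    ((0, 1, 1, 0), 0),
    ((1, 1, 0, 0), 0),
]

def hamming(a, b):
    """Number of positions where two binary tuples differ."""
    return sum(u != v for u, v in zip(a, b))

def predict_1nn(query):
    # All the work happens at query time: scan the stored data
    # and return the label of the closest training example.
    _, label = min(TRAIN, key=lambda ex: hamming(ex[0], query))
    return label

print(predict_1nn((1, 0, 1, 1)))  # 1 (closest stored rows have label 1)
```

An eager learner would instead spend its effort before any query arrives, compiling the same rows into an explicit structure such as a decision tree.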
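Online vs. batch in miniature: both sketches below estimate a running average (a stand-in for any incrementally updatable model), but the online version folds in one example at a time instead of reprocessing the whole dataset.

```python
# A hypothetical stream of observed values.
stream = [3.0, 5.0, 4.0, 6.0]

# Batch: process the entire dataset at once.
batch_mean = sum(stream) / len(stream)

# Online: incremental update; past data need not be stored or revisited.
online_mean, n = 0.0, 0
for x in stream:
    n += 1
    online_mean += (x - online_mean) / n

print(batch_mean, online_mean)  # both 4.5
```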


