# Data Science

Machine Learning and its overall territory: Machine Learning is automating automation, or getting computers to program themselves…

Terminology and key issues with Machine Learning: Some of the terms used in machine learning algorithms…

A framework for hypothesis spaces and learning algorithms: Key factors to understand the hypothesis space

Size: Is the hypothesis space fixed in size (like Naïve Bayes) or variable in size?

Essence of Inductive/Supervised Learning: Given training examples, [x, f(x)], for some unknown function ‘f’, finding a good approximation of ‘f’ is called Supervised Learning.
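To make the idea concrete, here is a minimal sketch of learning an approximation from [x, f(x)] pairs. The training data and the choice of a straight-line hypothesis are assumptions made for illustration; the helper name `fit_line` is hypothetical and not taken from the original post.

```python
def fit_line(examples):
    """Least-squares fit of the hypothesis h(x) = a*x + b to (x, f(x)) pairs."""
    n = len(examples)
    mean_x = sum(x for x, _ in examples) / n
    mean_y = sum(y for _, y in examples) / n
    # Slope is covariance(x, y) divided by variance(x).
    sxx = sum((x - mean_x) ** 2 for x, _ in examples)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in examples)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return lambda x: a * x + b


# Hypothetical training examples; the learner never sees f itself,
# only these (x, f(x)) pairs.
training = [(0, 1), (1, 3), (2, 5), (3, 7)]

h = fit_line(training)
print(h(4))  # prediction of the learned approximation on an unseen x
```

The learner only ever sees the pairs, never ‘f’; how well `h` generalizes depends on whether the chosen hypothesis space (here, straight lines) can represent ‘f’.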

Decision Trees Learning and Implementation #1: These are among the most widely used machine learning algorithms in data mining, for two reasons: they are fairly easy to understand and implement, and they scale well…

Entropy and Information Gain: Let ‘V’ be a random boolean variable with the following probability distribution: P(V=0) = 0.2 and P(V=1) = 0.8 (say, V=0 is heads on a coin and V=1 is tails). The surprise S(V=v) of each value ‘v’ is defined as…
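The excerpt stops before the definition, so the sketch below assumes the usual information-theoretic one: the surprise of a value is the negative base-2 logarithm of its probability, and entropy is the expected surprise. The function names are illustrative only.

```python
import math


def surprise(p):
    """Surprise (in bits) of an outcome with probability p; assumed to be -log2(p)."""
    return -math.log2(p)


def entropy(dist):
    """Expected surprise over a distribution given as {value: probability}."""
    return sum(p * surprise(p) for p in dist.values() if p > 0)


# The distribution from the text: P(V=0) = 0.2, P(V=1) = 0.8
coin = {0: 0.2, 1: 0.8}

for value, p in coin.items():
    print(f"S(V={value}) = {surprise(p):.4f} bits")
print(f"H(V) = {entropy(coin):.4f} bits")
```

Under this assumption the rarer outcome (V=0) carries more surprise than the common one, and the entropy of this biased coin comes out below the 1 bit of a fair coin.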

ID3 Implementation of Decision Trees: There are different implementations of decision trees. The major ones are

ID3: Iterative Dichotomiser 3 was the very first implementation of decision trees, given by Ross Quinlan.

Importance of data distribution in training machine learning models: A fundamental task in many statistical analyses is to characterize the location and variability of a data set. A further characterization of the data includes its distribution, skewness and kurtosis…
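As a rough illustration of these characterizations, the sketch below computes location (mean), variability (variance), skewness and kurtosis from central moments. It assumes the simple population (moment-based) definitions and reports kurtosis without subtracting 3; other conventions exist, and the sample data is made up.

```python
def central_moment(data, k):
    """k-th central moment of a data set (population form)."""
    mu = sum(data) / len(data)
    return sum((x - mu) ** k for x in data) / len(data)


def describe(data):
    """Return mean, variance, skewness and kurtosis of a data set."""
    m2 = central_moment(data, 2)
    return {
        "mean": sum(data) / len(data),
        "variance": m2,
        # Skewness: third central moment scaled by variance^(3/2).
        "skewness": central_moment(data, 3) / m2 ** 1.5,
        # Kurtosis: fourth central moment scaled by variance^2 (not excess).
        "kurtosis": central_moment(data, 4) / m2 ** 2,
    }


symmetric = [1, 2, 3, 4, 5]        # hypothetical symmetric sample
right_skewed = [1, 1, 2, 2, 10]    # hypothetical sample with a long right tail

print(describe(symmetric))
print(describe(right_skewed))
```

A skewness near zero suggests a symmetric sample, while a positive value points to a longer right tail.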

Reason for data normalization in Machine Learning Models: Standardization/normalization is a common requirement for the majority of algorithms (tree implementations such as ID3 are an exception); it rescales the features of the training data so that they are comparable. ML algorithms can behave badly if the training data is not brought onto the same scale…
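A minimal sketch of one common form of standardization, the z-score, which rescales each feature to zero mean and unit standard deviation. The feature names and values are invented for illustration, and the post itself may use a different scheme (for example min-max scaling).

```python
from statistics import mean, pstdev


def standardize(column):
    """Rescale a feature column to zero mean and unit (population) standard deviation."""
    mu = mean(column)
    sigma = pstdev(column)
    return [(v - mu) / sigma for v in column]


# Hypothetical features on very different scales.
age = [20, 30, 40]
income = [20000, 50000, 80000]

age_z = standardize(age)
income_z = standardize(income)

print(age_z)
print(income_z)
```

After rescaling, both columns span the same range, so a distance-based or gradient-based learner no longer lets the large-valued feature dominate simply because of its units.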

Different Similarity/Distance Measures in Machine Learning: What is the best similarity/distance measure to use in machine learning models? This depends on various factors, which we will look at one by one. Distance measures for numeric data points…
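The excerpt does not list the measures it covers, so the following sketch simply shows three distances commonly used for numeric points: Euclidean, Manhattan and Chebyshev. The function names and sample points are illustrative assumptions.

```python
import math


def euclidean(p, q):
    """Straight-line (L2) distance between two numeric points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p, q):
    """City-block (L1) distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


def chebyshev(p, q):
    """Largest absolute difference along any single coordinate."""
    return max(abs(a - b) for a, b in zip(p, q))


# Hypothetical pair of 2-D points.
p, q = (0, 0), (3, 4)

print(euclidean(p, q), manhattan(p, q), chebyshev(p, q))
```

All three assume features on comparable scales, which is one reason scaling and the choice of distance measure tend to be considered together.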

A good diagram of the skills related to a data scientist:

**REFERENCES**

https://class.coursera.org/machlearning-001/lecture/preview

https://archive.ics.uci.edu/ml/datasets/Hepatitis

http://www.cs.cmu.edu/afs/cs.cmu.edu/user/mitchell/ftp/mlbook.html

http://nirvacana.com/thoughts/becoming-a-data-scientist
