A few basic statistics for Machine Learning

Machine learning itself is big area, but to get started one needs to be familiar with a few basic concepts of Statistics and Probability.

Here are a few basic concepts given a set of numbers say 12, 5, 7, 17, 22, 5, 10, 2, 5, 17, 2, 11

Mean- Given a set of N numbers, mean is average of these numbers. 115/12 = 9.58

Median- Sort the list, for even items, median is sum of the middle 2 numbers divided by 2. For odd items median is the middle number.
2, 2, 5, 5, 5, 7, 10, 11, 12, 17, 17, 22
7+10= 17/2=8.5

Mode- Number that appears most often in the list. In this case it is 5.

Range- Max number minus minimum number in the list: 22-2= 20

Variance- It is the measure of how the set of number is varying from the mean. This is calculated by finding difference of each number from mean and than squaring.

Taking a simple example in this case, say we have 3 numbers, 1, 2, 3 mean in this case would be 6/3 =2. So variance would be
(1-2)^2+(2-2)^2+(3-2)^2 / 3= 2 /3 = 0.67

Standard Deviation- The variance is figured by squaring the difference of mean and numbers in list. Standard deviation takes the square root of variance to reset the unit of original list. So in case variance was 0.67, standard deviation would be sqtr(0.67) or 0.81.