In this tutorial we will go back to mathematics and study statistics, and how to calculate important numbers based on data sets.

We will also learn how to use various Python modules to get the answers we need.

And we will learn how to write functions that predict an outcome based on what we have learned from the data.

In the mind of a computer, a data set is any collection of data. It can be anything from an array to a complete database.

Example of an array:

[99,86,87,88,111,86,103,87,94,78,77,85,86]

Example of a database:

By looking at the array, we can guess that the average value is probably around 80 or 90, and we are also able to determine the highest value and the lowest value, but what else can we do?

And by looking at the database we can see that the most popular color is white, and the oldest car is 17 years old, but what if we could predict whether a car had an AutoPass, just by looking at the other values?

That is what Machine Learning is for! Analyzing data and predicting the outcome!
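Returning to the example array, the guesses about the average, highest, and lowest values can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the variable name `values` is chosen here for illustration and is not part of the tutorial.

```python
import statistics

# The example array from above (the name "values" is an assumption for this sketch).
values = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]

average = statistics.mean(values)  # arithmetic mean: sum divided by count
highest = max(values)              # largest value in the list
lowest = min(values)               # smallest value in the list

print(f"average: {average:.2f}")
print(f"highest: {highest}")
print(f"lowest: {lowest}")
```

The mean comes out close to 90, which agrees with the estimate made by eye.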

To analyze data, it is important to know what type of data we are dealing with. We can split the data types into three main categories:

- Numerical
- Categorical
- Ordinal

Numerical data are numbers, and can be split into two numerical categories:

- Discrete data: counted data that are limited to integers. Example: the number of cars passing by.
- Continuous data: measured data that can be any number. Example: the price of an item, or the size of an item.

Categorical data are values that cannot be ranked or measured against each other. Example: a color value, or any yes/no value.

Ordinal data are like categorical data, but the values have an order and can be ranked against each other. Example: school grades, where A is better than B, and so on.
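The three categories could be represented in Python roughly as follows. This is a sketch with made-up sample values; the names `cars_passed`, `price`, `colors`, `grades`, and `grade_rank` are illustrative assumptions, as is the exact grade scale.

```python
from collections import Counter

# Numerical data (illustrative values only).
cars_passed = 12   # discrete: a count, so an integer
price = 19.99      # continuous: a measurement, so a float

# Categorical data: labels with no order. Counting occurrences is meaningful,
# ranking them is not.
colors = ["white", "red", "white", "blue"]
most_common_color = Counter(colors).most_common(1)[0][0]

# Ordinal data: categories with an order. The order is stated explicitly here
# (assumption: A is best, then B, C, D, F).
grade_rank = {"A": 5, "B": 4, "C": 3, "D": 2, "F": 1}
grades = ["B", "A", "C", "A"]

best_grade = max(grades, key=grade_rank.get)
sorted_grades = sorted(grades, key=grade_rank.get, reverse=True)

print(most_common_color)
print(best_grade)
print(sorted_grades)
```

The explicit `grade_rank` mapping matters: comparing the grade strings directly would order them alphabetically, which treats "A" as the smallest value rather than the best grade.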
