Practical 1

Aim: Data preprocessing using the scikit-learn Python library

Theory: In any machine learning process, data preprocessing is the step in which the data is transformed, or encoded, into a state that the machine can easily parse, so that the algorithm can readily interpret the features of the data. Data comes in many different forms: structured tables, images, audio files, videos, and so on. Machines do not understand free text, images, or video as they are; they understand 1s and 0s. So it will not be enough to put on a slideshow of all our images and expect a machine learning model to get trained just by that.

Dataset Description:

This dataset describes football players through attributes such as jersey number, club, nationality, height, weight, potential, and wage.

Total rows: 18207

Total columns: 89 

Various data pre-processing techniques

Standardization: It is a technique where the values are centered around the mean with a unit standard deviation. This means that the mean of the attribute becomes zero and the resultant distribution has a unit standard deviation.

Normalization: It is a technique in which the values of a feature are rescaled relative to one another, typically to a common range such as [0, 1], so that values far larger than the rest of the feature do not dominate.

Encoding: It is a technique in which string values are converted to numeric class values, because most machine learning algorithms cannot work with string values directly.

Discretization: In this technique a continuous feature is converted into a set of bins, so that each range of values is assigned a specific class value.

Missing Value Imputation: In this technique the null values are filled with a substitute: 0 for numeric columns, and a fixed placeholder string or ' ' for string columns.


Task-1 Missing Value Imputation: Here the total number of null values in each column is displayed.

Null values
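A minimal sketch of this task, assuming a tiny made-up table in place of the real players data. The column names `Club` and `Wage` and all values are hypothetical; the null counts come from pandas, the numeric fill uses scikit-learn's `SimpleImputer`, and the string fill uses pandas `fillna`.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical miniature stand-in for the players table (not the real data).
players = pd.DataFrame({
    "Club": ["Club A", np.nan, "Club B", "Club C"],
    "Wage": [565.0, np.nan, 405.0, 290.0],
})

# Count the null values in each column before imputing.
null_counts = players.isnull().sum()
print(null_counts)

# Fill numeric nulls with the constant 0.
num_imputer = SimpleImputer(strategy="constant", fill_value=0)
players["Wage"] = num_imputer.fit_transform(players[["Wage"]]).ravel()

# Fill string nulls with a single-space placeholder.
players["Club"] = players["Club"].fillna(" ")

print(players)
```

Filling with a constant is the simplest choice; `SimpleImputer` also accepts `strategy="mean"`, `"median"`, or `"most_frequent"` if a constant would distort the column.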

Task-2 Standardization: Standardization rescales each feature so that its mean becomes 0 and its standard deviation becomes 1.
Standardization
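As an illustration, standardization could be done with scikit-learn's `StandardScaler`, roughly as follows. The column names and values are made up for the example and are not taken from the real dataset.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical numeric features (illustrative values only).
players = pd.DataFrame({
    "Height_cm": [170.0, 180.0, 185.0, 193.0],
    "Weight_kg": [72.0, 75.0, 83.0, 90.0],
})

# Subtract each column's mean and divide by its standard deviation.
scaler = StandardScaler()
standardized = scaler.fit_transform(players)

# Each column now has mean ~0 and standard deviation ~1.
print(standardized.mean(axis=0))
print(standardized.std(axis=0))
```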



Task-3 Normalization: Here the values are rescaled to a normalized form relative to the other values in the column.
Normalization
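The post does not say which scikit-learn tool was used for this task, so the sketch below assumes min-max scaling with `MinMaxScaler`, which maps each column to the range [0, 1]. scikit-learn also provides `Normalizer`, which instead scales each row to unit norm. The `Wage` values are invented for the example.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical wage column with one value far larger than the rest.
players = pd.DataFrame({"Wage": [1.0, 50.0, 200.0, 565.0]})

# Rescale so the column minimum maps to 0 and the maximum maps to 1.
scaler = MinMaxScaler()
normalized = scaler.fit_transform(players[["Wage"]])

print(normalized.ravel())
```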

Task-4 Encoding: String features are converted to numeric values so that the computer can process them.
Encoding
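One possible way to do this is with scikit-learn's `OrdinalEncoder`, which assigns each distinct string in a column an integer code in sorted order. This is a sketch under that assumption; the practical may have used a different encoder such as `LabelEncoder`, and the club and nationality values below are placeholders.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical string features (placeholder values).
players = pd.DataFrame({
    "Club": ["Club B", "Club A", "Club B", "Club C"],
    "Nationality": ["Portugal", "Argentina", "Italy", "Belgium"],
})

# Map every distinct string in each column to an integer code.
encoder = OrdinalEncoder()
encoded = encoder.fit_transform(players[["Club", "Nationality"]])

# categories_ lists the strings in the order of their codes (0, 1, 2, ...).
print(encoder.categories_)
print(encoded)
```

The integer codes imply an order between categories. When no order exists, `OneHotEncoder` is a common alternative.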


Task-5 Discretization

Discretization
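A small sketch of discretization, assuming scikit-learn's `KBinsDiscretizer` with three equal-width bins. The number of bins, the binning strategy, and the `Potential` values are all assumptions made for illustration.

```python
import pandas as pd
from sklearn.preprocessing import KBinsDiscretizer

# Hypothetical "Potential" scores (illustrative values only).
players = pd.DataFrame({"Potential": [60.0, 70.0, 80.0, 90.0, 94.0, 65.0]})

# Split the value range into 3 equal-width bins and label them 0, 1, 2.
discretizer = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="uniform")
bins = discretizer.fit_transform(players[["Potential"]])

# bin_edges_ holds the boundaries of the bins for each column.
print(discretizer.bin_edges_[0])
print(bins.ravel())
```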

Here is the practical link: link



