Practical 1
Aim: Data Preprocessing using the scikit-learn Python library
Theory: In any machine learning process, data preprocessing is the step in which the data is transformed, or encoded, into a state that the machine can easily parse. In other words, the features of the data can then be easily interpreted by the algorithm. Data comes in many different forms: structured tables, images, audio files, videos, and so on. Machines do not understand free text, image, or video data as it is; they understand 1s and 0s. So it will not be enough to put on a slideshow of all our images and expect a machine learning model to be trained by that alone.
Dataset Description:
Total columns: 89
Various data pre-processing techniques
Standardization: It is a technique where the values are centered around the mean with a unit standard deviation. This means that the mean of the attribute becomes zero and the resultant distribution has a unit standard deviation.
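As a minimal sketch of standardization, the snippet below applies scikit-learn's StandardScaler to a small made-up column. The values are illustrative assumptions, not taken from the dataset. Each value x is replaced by (x - mean) / standard deviation.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical single-feature data (one column, five rows)
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])

scaler = StandardScaler()
X_std = scaler.fit_transform(X)  # subtract the column mean, divide by the column std

print("mean after scaling:", X_std.mean(axis=0))
print("std after scaling: ", X_std.std(axis=0))
```

After the transform the column mean is approximately 0 and the standard deviation approximately 1.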
Normalization: It is a technique that rescales the values of a feature to a common range, typically 0 to 1, so that values far greater than the others in that feature no longer dominate it.
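One common form of normalization is min-max scaling, which maps each value x to (x - min) / (max - min). The sketch below assumes that form and uses scikit-learn's MinMaxScaler on made-up values containing one very large entry; the text above does not say which normalization variant was used in the practical.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical feature with one value far larger than the rest
X = np.array([[10.0], [20.0], [30.0], [1000.0]])

scaler = MinMaxScaler()          # default feature_range is (0, 1)
X_norm = scaler.fit_transform(X)

print(X_norm.ravel())
```

The smallest value maps to 0 and the largest to 1, with everything else in between.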
Encoding: It is a technique in which string (categorical) values are converted to numeric class labels, because most machine learning algorithms cannot work with string values directly.
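A minimal sketch of encoding, assuming scikit-learn's LabelEncoder and a made-up list of colour names. LabelEncoder is documented for target labels; for input feature columns, OrdinalEncoder or OneHotEncoder are the usual choices.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical string values
colors = ["red", "green", "blue", "green"]

encoder = LabelEncoder()
encoded = encoder.fit_transform(colors)  # classes are sorted alphabetically, then numbered

print(list(encoder.classes_))  # ['blue', 'green', 'red']
print(encoded)                 # [2 1 0 1]
```

The original strings can be recovered with encoder.inverse_transform(encoded).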
Discretization: In this technique we convert a continuous feature into a fixed number of bins, so that each range of values is assigned a specific class value.
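Discretization can be sketched with scikit-learn's KBinsDiscretizer. The ages, the choice of three bins, and the equal-width ("uniform") strategy are all assumptions made for illustration.

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

# Hypothetical continuous feature (ages)
ages = np.array([[5.0], [18.0], [35.0], [52.0], [70.0], [90.0]])

# Three equal-width bins; encode="ordinal" returns the bin index for each value
binner = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="uniform")
age_bins = binner.fit_transform(ages)

print(age_bins.ravel())       # bin indices: 0, 0, 1, 1, 2, 2
print(binner.bin_edges_[0])   # the edges that separate the bins
```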
Missing Value Imputation: In this technique you fill all the null values with a substitute value: 0 for numeric columns, or an arbitrary string or ' ' for string columns.
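A minimal sketch of constant-value imputation, assuming scikit-learn's SimpleImputer. The two small columns and the placeholder string "unknown" are hypothetical.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical numeric and string columns, each with one missing entry
ages = np.array([[25.0], [np.nan], [40.0]])
cities = np.array([["Pune"], [np.nan], ["Delhi"]], dtype=object)

# Fill missing numbers with 0
num_imputer = SimpleImputer(strategy="constant", fill_value=0)
ages_filled = num_imputer.fit_transform(ages)

# Fill missing strings with a placeholder string
str_imputer = SimpleImputer(strategy="constant", fill_value="unknown")
cities_filled = str_imputer.fit_transform(cities)

print(ages_filled.ravel())
print(cities_filled.ravel())
```

SimpleImputer also accepts strategy="mean", "median", or "most_frequent" when a constant fill is not appropriate.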
Task-1 Missing Value Imputation: Here the total number of null values in each column is displayed.
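Counting nulls per column could look like the sketch below, which assumes the data is loaded into a pandas DataFrame. The small frame here is a hypothetical stand-in, since the actual 89-column dataset is not shown.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real dataset
df = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, np.nan],
    "city": ["Pune", None, "Delhi", "Mumbai"],
})

null_counts = df.isnull().sum()       # number of nulls in each column
total_nulls = int(null_counts.sum())  # number of nulls in the whole frame

print(null_counts)
print("total nulls:", total_nulls)
```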