Minimum steps for implementing a Machine Learning algorithm
Machine learning is a way to make a computer discover the patterns inside a dataset, using various machine learning algorithms, without being explicitly programmed. For an introduction to machine learning, see here: https://sththapa999.medium.com/machine-learning-the-future-of-new-technology-d501fc4467f3
Implementing a machine learning algorithm is an important step in which great care must be taken to filter unnecessary data and outliers out of the raw data. The deeper we dig into the data, the more information we can gain from it. Here are the basic beginner steps for implementing almost any machine learning model:
- Collecting the raw data
- Exploratory Data Analysis (EDA)
- Data Cleaning
- Training and testing
- Accuracy score check
Let me describe each of them briefly:
1. Collecting the raw data: Data is the core part; in my view, it is the soul of any machine learning algorithm. Just as a human being is nothing without a soul, machine learning and data science are incomplete, even meaningless, without data. Relevant data can be collected from various business domains or from Kaggle and other data repositories. So the first step in machine learning and data science is to collect the data.
2. Exploratory Data Analysis (EDA): There is a common saying, “A picture speaks a thousand words.” This holds for machine learning too: visually analyzing the data is very important. By exploring the data, we can see how it is distributed, how the features are correlated with each other, how the data points are grouped relative to one another, and so on.
3. Data Cleaning: Raw data is almost always messy: it contains missing values, outliers, and so on. So, first we must impute the missing values and convert all values to numbers, because most machine learning algorithms can only work with numerical values. If we pass string values directly, most libraries will raise an error. All of this must be taken care of before moving on to the next step.
4. Training and testing: Machines learn from data much as we learn when we study. For instance, when we prepare for an examination, we collect past papers and question sets and practice with them; you can relate this to the training data. The examination itself is like the test data: we are evaluated on how well we learned during the training (or reading) phase. The same goes for machine learning: the machine is given a certain percentage of the dataset as the training set and the remaining percentage as the testing set. A common split is 80%-20%, a ratio of 8:2. This is not fixed, but the training set is generally given the larger share of the data. From the training set, the model learns the patterns in the data, and those patterns are then applied to the remaining test set, where we can measure how accurate its predictions are. One major point to remember is that we should never evaluate a model on the same data it was trained on. That would be like seeing all the board exam questions before the examination. The test must be performed on separate data that the model has not seen before, so that it makes its judgments based only on what it learned during the training phase.
5. Accuracy score check: After making predictions on the test data, we can measure how well our model performed. For regression problems, metrics such as the R² score (coefficient of determination), Root Mean Squared Error (RMSE), and Mean Absolute Error (MAE) can be used. Similarly, for classification problems, precision, recall, the confusion matrix, etc. can be used.
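For step 2 (EDA), plots are the main tool, but simple numeric summaries complement them. The following is a minimal sketch, using only Python's standard library so it runs without installing anything, that summarizes one column's distribution and computes the Pearson correlation between two columns. The helper names `summarize` and `pearson` and the toy house-size and price data are hypothetical, chosen only for illustration. In practice, pandas' `DataFrame.describe()` and `DataFrame.corr()` do the same job on whole tables.

```python
import statistics


def summarize(values):
    """Basic distribution statistics for one numeric column."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def pearson(xs, ys):
    """Pearson correlation coefficient (-1 to 1) between two equal-length columns."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical toy columns: house size (square metres) and price (thousands)
size = [50, 60, 80, 100, 120]
price = [150, 180, 240, 290, 360]

print(summarize(size))
print(round(pearson(size, price), 3))  # close to 1: strong positive correlation
```

A correlation near +1 or -1 suggests a strong linear relationship between two features, which is worth confirming with a scatter plot.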
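Step 3 (data cleaning) names two concrete tasks: imputing missing values and converting text to numbers. Here is a minimal standard-library sketch of one common way to do each: mean imputation for a numeric column (assuming missing entries are stored as `None`) and label encoding for a text column. The helper names `impute_mean` and `label_encode` and the sample data are hypothetical, not taken from any particular library.

```python
import statistics


def impute_mean(column):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = statistics.mean(observed)
    return [fill if v is None else v for v in column]


def label_encode(column):
    """Map each distinct string category to an integer code (alphabetical order)."""
    codes = {category: i for i, category in enumerate(sorted(set(column)))}
    return [codes[v] for v in column], codes


# Hypothetical raw columns: one missing age and a text-valued city column
ages = [25, None, 35, 40]
cities = ["Kathmandu", "Pokhara", "Kathmandu", "Lalitpur"]

print(impute_mean(ages))     # the None becomes the mean of 25, 35 and 40
print(label_encode(cities))  # codes: Kathmandu -> 0, Lalitpur -> 1, Pokhara -> 2
```

Label encoding implies an order between categories. For categories with no natural order, one-hot encoding (one 0/1 column per category) is often the safer choice.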
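The 80%-20% split from step 4 can be sketched as a shuffle followed by a cut. The helper below is named after scikit-learn's `train_test_split` for familiarity, but it is a hypothetical stand-in written with the standard library, not that library's API. Shuffling first matters because raw data is often stored in some order (by date, by class, and so on), and a fixed seed makes the split reproducible.

```python
import random


def train_test_split(rows, test_ratio=0.2, seed=42):
    """Shuffle a copy of the rows and split them into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> same split every run
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(10))  # hypothetical dataset of 10 samples
train, test = train_test_split(rows)
print(len(train), len(test))  # 8 2
```

No sample lands in both lists, which is exactly the rule from step 4: the model is never evaluated on data it was trained on.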
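For regression problems in step 5, the three metrics can be written directly from their definitions: MAE is the mean of the absolute errors, RMSE is the square root of the mean squared error, and R² is 1 - SS_res / SS_tot. The following is a minimal standard-library sketch on hypothetical predictions. scikit-learn's `sklearn.metrics` module provides tested versions (`mean_absolute_error`, `mean_squared_error`, `r2_score`).

```python
import math


def mae(y_true, y_pred):
    """Mean Absolute Error: the average size of the errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Squared Error: penalizes large errors more heavily than MAE."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


def r2_score(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot; 1.0 means a perfect fit."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical test-set targets and model predictions
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 7.5, 10.0]

print(mae(y_true, y_pred))                 # 0.5
print(round(rmse(y_true, y_pred), 4))      # 0.6124
print(round(r2_score(y_true, y_pred), 3))  # 0.925
```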
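For classification problems in step 5, precision and recall are both read off the confusion matrix. This minimal sketch assumes a binary problem with labels 0 and 1 (1 being the positive class) and uses hypothetical predictions. Returning 0.0 when a denominator is zero is a convention chosen here. scikit-learn's `confusion_matrix`, `precision_score`, and `recall_score` in `sklearn.metrics` are the usual tools in practice.

```python
def confusion_matrix(y_true, y_pred):
    """2x2 counts for a binary problem where 1 is the positive class.

    Rows are the actual class (0, 1); columns are the predicted class (0, 1).
    """
    pairs = list(zip(y_true, y_pred))
    tn = pairs.count((0, 0))
    fp = pairs.count((0, 1))
    fn = pairs.count((1, 0))
    tp = pairs.count((1, 1))
    return [[tn, fp], [fn, tp]]


def scores(y_true, y_pred):
    """Accuracy, precision and recall derived from the confusion matrix."""
    (tn, fp), (fn, tp) = confusion_matrix(y_true, y_pred)
    total = tn + fp + fn + tp
    return {
        "accuracy": (tp + tn) / total,
        # Of everything predicted positive, how much really was positive?
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Of everything actually positive, how much did the model find?
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical test-set labels and model predictions
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 1, 1, 0]

print(confusion_matrix(y_true, y_pred))  # [[2, 2], [1, 3]]
print(scores(y_true, y_pred))  # {'accuracy': 0.625, 'precision': 0.6, 'recall': 0.75}
```

Here precision is lower than recall because the model raised two false alarms but missed only one real positive.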
OK, these are the basic steps for implementing machine learning and data science models. There are many other techniques, such as hyperparameter tuning and powerful models like XGBoost, which I will discuss in a separate blog post. Thank you, and have a great day.