Accuracy is low

Hi,

In order to get the excellence certificate, we need more than 90% accuracy in our challenges. However, in this challenge the maximum accuracy I can achieve on the test set is around 85%, using an MLP. I know KNN will not give higher test accuracy, but I need to improve my results. How can I increase the accuracy to above 90% to get the excellence certificate? Please help out.

Thanks

Hey @A18ML0031,
No one can guarantee that you will be able to increase your score.
But as advice, you can try the following:

  1. Try feature engineering.
  2. Use ensemble models like random forest; they may be of big help.
  3. Use proper standardization or normalization on your data.
  4. Apply hyperparameter tuning to your models.

I think these points can help you increase your score (especially 1 and 2).
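Points 1 to 3 can be combined in a short scikit-learn sketch. This is a minimal example under assumptions, not the challenge code: `make_classification` generates a synthetic stand-in for the real data, and the column names and the ratio-style feature `f0_div_f1` are hypothetical. The scaler sits inside a pipeline so that, during cross-validation, it is fit only on each training fold.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the challenge data (assumption: numeric features, binary target).
X_arr, y = make_classification(n_samples=500, n_features=6, n_informative=4, random_state=0)
X = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(6)])

# Point 1, feature engineering: a hypothetical ratio-style feature built from f0 and f1.
# The small constant guards against division by (near) zero.
X["f0_div_f1"] = X["f0"] / (X["f1"].abs() + 1e-3)

# Points 2 and 3: standardize inside a pipeline (fit only on each training fold),
# then fit a random forest, which is an ensemble of decision trees.
model = make_pipeline(
    StandardScaler(),
    RandomForestClassifier(n_estimators=100, random_state=0),
)

# 5-fold cross-validated accuracy is a steadier estimate than a single split.
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
print(f"CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Tree ensembles such as random forest are insensitive to feature scaling, so the scaler matters most for scale-sensitive models like the MLP or KNN mentioned above; the same pipeline pattern applies to them.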

I hope this helps.
Thank you :slightly_smiling_face:.

Hi, I used feature selection and a random forest classifier, as you suggested. My accuracy is still only 77%. Should I share the notebook file?

Hey @A18ML0031,
Yes, please share the code file with me so that I can have a look, try some ideas, and get back to you.

Thank you and happy learning :slightly_smiling_face:.

Hi, here is the drive link to the file: drive. google. com/file/d/1vgJMCyD27Y4PZq-cc5mEPfZTeiCTf_SW/view?usp=sharing Remove the spaces

I am not able to access this code file; kindly share it with me at [email protected]
and I will have a look at it.

drive.google. com/file/d/1vgJMCyD27Y4PZq-cc5mEPfZTeiCTf_SW/view

Hey @A18ML0031,
There are several things you can try to improve.

  1. Feature Selection and Extraction
    You used SelectKBest, although in my opinion it isn't the best option here.
    You can look for other feature selection methods in sklearn, but the most useful check is model.feature_importances_.
    Since you used RandomForest, just read model.feature_importances_ after calling model.fit. It gives the importance of each column, in the same order as the columns, so you can easily see which features matter most to this model.
    Also, the given features are not always enough. There may be relationships in your data that tree models can't capture on their own but that you suspect are useful. In that case, try feature extraction techniques to generate new features from the existing ones, train on them, and see whether the results improve.
    For example, you can try concatenating multiple features, dividing one by another, etc.

  2. Which model to use.
    There are several other models you can use, like GradientBoosting, AdaBoost, ExtraTrees, LightGBM, etc.
    You can read more about them.

  3. Hyperparameter Tuning
    This is the most important step once you have fixed your features and model.
    It involves experimenting with the various parameters a model exposes for training. You can try grid search or random search, whichever is more convenient.
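For the hyperparameter tuning step, a random search with scikit-learn's `RandomizedSearchCV` might look like the sketch below. The synthetic data and the search ranges are illustrative assumptions, not values tuned for this challenge.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for the challenge data (assumption).
X, y = make_classification(n_samples=500, n_features=10, n_informative=5, random_state=0)

# Hypothetical search space; ranges are only illustrative.
param_distributions = {
    "n_estimators": randint(50, 200),       # integers sampled uniformly from [50, 200)
    "max_depth": [None, 5, 10, 20],
    "min_samples_leaf": randint(1, 10),
    "max_features": ["sqrt", "log2", None],
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=8,            # number of random parameter combinations to try
    cv=5,                # each combination is scored with 5-fold cross-validation
    scoring="accuracy",
    random_state=0,
    n_jobs=-1,
)
search.fit(X, y)
print(search.best_params_)
print(f"Best CV accuracy: {search.best_score_:.3f}")
```

`GridSearchCV` works the same way but takes a `param_grid` and tries every combination, which gets expensive quickly; random search samples a fixed number (`n_iter`) of combinations instead.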

Finally comes cross-validation, which gives an estimate of how your model will perform on unseen data.

I hope these help.

Hi, I used model.feature_importances_ to check which features are important and removed some of them. Does the removal have to cover all such columns? There were several columns below 0.1 importance and I removed them all. Should I also try removing only a few of them, to check whether a particular set of columns gives higher accuracy? Also, I have been using only RandomForestClassifier, as you suggested. I looked online at the methods other people have used on the same dataset, and some have tried almost every available classifier, but I didn't see any reach accuracy higher than 77%. My accuracy is currently 78%, yet there are people with 100% accuracy as well. So, for now, should I focus only on the columns and hyperparameter tuning, or is there another way to get higher accuracy in this challenge?

It's just a way of checking. We can't tell in advance: a feature whose feature_importances_ value is below 0.1 might still be useful. So you just need to try it and check how it works.
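One way to check whether the low-importance columns actually help is to rank features by `feature_importances_` and compare cross-validated accuracy for different numbers of top-ranked columns, instead of applying one fixed cut-off. A minimal sketch, assuming synthetic data and hypothetical column names `f0` to `f9`:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the challenge data (assumption).
X_arr, y = make_classification(n_samples=500, n_features=10, n_informative=4, random_state=0)
X = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(10)])

# Fit once to read feature_importances_ (one value per column, same order as X.columns).
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
importances = pd.Series(rf.feature_importances_, index=X.columns).sort_values(ascending=False)
print(importances)

# Instead of dropping every column below a threshold such as 0.1,
# keep the top-k columns for several values of k and compare CV accuracy.
subset_scores = {}
for k in range(2, len(importances) + 1, 2):
    cols = list(importances.index[:k])
    scores = cross_val_score(
        RandomForestClassifier(n_estimators=100, random_state=0), X[cols], y, cv=5
    )
    subset_scores[k] = scores.mean()
    print(f"top {k:2d} features: CV accuracy {scores.mean():.3f}")
```

Ranking the features on the full training set and then cross-validating leaks a little information into the estimate; for a stricter check, the selection step can be moved inside each fold, for example with `SelectFromModel` in a `Pipeline`.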

Try some more advanced models, like LightGBM, XGBoost or CatBoost; they are highly effective when properly tuned.
However, you can't expect a score of 100: those scores come from our mentors, who achieved them while testing this system.

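There is also a concept called OOF (out-of-fold) prediction, used while performing cross-validation with, for example, 5 folds.
In each fold, the model trained on the other folds predicts the held-out fold and also predicts the test data; the test predictions are then averaged over all folds.
In my experience, this has worked much better than other approaches.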
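The out-of-fold idea can be sketched as follows, assuming a binary target and synthetic data, with the last 100 rows standing in for the unlabeled challenge test set. Each fold's model predicts the rows held out from its own training (the out-of-fold predictions, which give an honest validation score) and also predicts the test set; the test probabilities are averaged over the folds.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold

# Synthetic data (assumption): first 500 rows are "train", last 100 stand in for the test set.
X_all, y_all = make_classification(n_samples=600, n_features=10, n_informative=5, random_state=0)
X_train, y_train = X_all[:500], y_all[:500]
X_test = X_all[500:]

n_folds = 5
skf = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=0)
oof_proba = np.zeros(len(X_train))   # out-of-fold probability for every training row
test_proba = np.zeros(len(X_test))   # test probabilities averaged over the fold models

for train_idx, valid_idx in skf.split(X_train, y_train):
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X_train[train_idx], y_train[train_idx])
    # Held-out rows were not seen by this fold's model.
    oof_proba[valid_idx] = model.predict_proba(X_train[valid_idx])[:, 1]
    # Each fold model contributes 1/n_folds of the final test probability.
    test_proba += model.predict_proba(X_test)[:, 1] / n_folds

oof_acc = accuracy_score(y_train, (oof_proba >= 0.5).astype(int))
print(f"OOF accuracy: {oof_acc:.3f}")
test_pred = (test_proba >= 0.5).astype(int)  # final labels for the test set
```

Averaging the fold models' probabilities tends to give more stable test predictions than any single fold model, and the OOF accuracy is a reasonable proxy for how the averaged model will score.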

You can give it a try. Hope it works.