Regarding accuracy level and scoring parameter

commonid369 · June 11, 2019, 8:24pm

Can we achieve an accuracy of more than 96% with the same algorithm (linear regression using k-fold)?
Also, please explain the importance of the ‘scoring’ parameter in the cross validation score.

Manu-Pillai-1566551720093198 · June 12, 2019, 2:47pm

Yes, you can increase the accuracy if your data is good enough. For the accuracy part, could you please share what you have done so far to get to 96%?

The scoring parameter in the cross validation score is basically asking you which formula (loosely speaking) to use when calculating the performance on each fold during k-fold validation, i.e. whether you want to calculate the F1-score, the mean squared error, or something of that sort.
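As a minimal sketch of how the scoring parameter changes what gets computed per fold, assuming scikit-learn and a synthetic dataset from make_regression standing in for your data (the dataset, the fold count, and the variable names are assumptions for illustration, not taken from your code):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic data stands in for the real dataset (an assumption for illustration).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

model = LinearRegression()
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# scoring="r2": coefficient of determination on each fold (higher is better).
r2_scores = cross_val_score(model, X, y, cv=cv, scoring="r2")

# scoring="neg_mean_squared_error": scikit-learn negates the MSE so that
# "higher is better" holds for every scorer; flip the sign to read it as MSE.
neg_mse_scores = cross_val_score(
    model, X, y, cv=cv, scoring="neg_mean_squared_error"
)
mse_scores = -neg_mse_scores

print("R^2 per fold:", r2_scores)
print("MSE per fold:", mse_scores)
```

The model and the folds are the same in both calls; only the formula used to score each fold differs. Classification metrics such as "f1" or "accuracy" only apply to classifiers, so for linear regression a regression metric like the two above is the usual choice.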

commonid369 · June 12, 2019, 7:08pm

Here is the code. Any improvements to increase the accuracy?

Manu-Pillai-1566551720093198 · June 13, 2019, 9:38am

As far as I can infer from your code, you haven't done any kind of analysis on your data yet. Directly feeding your raw data into an algorithm isn't good practice. It's very rare that you will get good results doing so on real-life datasets. The only preprocessing you have done is normalizing the dataset.
Here are some techniques you can try to increase model accuracy:

Calculate the training accuracy and check whether the model is overfitting your training dataset. An overfitted model just memorizes individual instances and does not perform any real learning. In such cases, you should regularize your model accordingly. (If your training accuracy is much higher than your testing accuracy, there is a good chance your model is overfitting.)
Look for variables that don't add much value to the learning process. For this, randomly drop some features, train your model on the remaining data, and see whether your algorithm's performance drops or not.
Look for sets of variables that correlate with each other the most, and try training your model on such variables.
If you are regularizing your model, try different values of the hyperparameters. You can use grid search for that (model_selection.GridSearchCV).
You can also try ensembling with random patches and/or random subspaces, but as you specifically asked about linear models, let's drop this one.
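A few of the points above can be sketched roughly as follows. This is only an illustration under stated assumptions: the data is synthetic (make_regression), Ridge is swapped in as a regularized form of linear regression, the alpha grid is arbitrary, and features are dropped one at a time rather than randomly so the output is easier to read.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split

# Synthetic data stands in for the real dataset (an assumption for illustration).
X, y = make_regression(
    n_samples=300, n_features=20, n_informative=5, noise=15.0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Ridge is linear regression with L2 regularization; alpha sets its strength.
# GridSearchCV tries every alpha in the grid using 5-fold cross validation.
search = GridSearchCV(
    Ridge(),
    param_grid={"alpha": [0.01, 0.1, 1.0, 10.0, 100.0]},
    cv=5,
    scoring="r2",
)
search.fit(X_train, y_train)
best_alpha = search.best_params_["alpha"]
best_model = search.best_estimator_

# Overfitting check: a training score far above the test score is a warning sign.
train_score = best_model.score(X_train, y_train)
test_score = best_model.score(X_test, y_test)
print("best alpha:", best_alpha)
print("train R^2:", round(train_score, 3), "test R^2:", round(test_score, 3))

# Feature check: drop one feature at a time and see how much the CV score falls.
baseline = cross_val_score(
    Ridge(alpha=best_alpha), X_train, y_train, cv=5, scoring="r2"
).mean()
score_drops = []
for i in range(X_train.shape[1]):
    X_reduced = np.delete(X_train, i, axis=1)  # all columns except column i
    reduced = cross_val_score(
        Ridge(alpha=best_alpha), X_reduced, y_train, cv=5, scoring="r2"
    ).mean()
    score_drops.append(baseline - reduced)

# Features whose removal barely changes the score add little to the model.
print("score drop per removed feature:", np.round(score_drops, 3))
```

On your own data you would replace the make_regression call with your features and target, and the grid of alpha values is something you would tune to your problem.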

These are some techniques you can use to boost your accuracy. But keep in mind that increasing the accuracy of your model depends entirely on the problem and the data, i.e. each problem has its own working solution that might increase the accuracy. The only thing we as data scientists have to do is find that working solution.
Also, there will be some point after which no method will be able to increase the accuracy. The errors that remain are known as irreducible errors and arise from noise in your dataset.

I hope this made things clear for you.
Happy Learning
Thanks.

commonid369 · June 13, 2019, 7:50pm

Thanks a lot for those valuable points.

Manu-Pillai-1566551720093198 · June 14, 2019, 8:20am

Kindly mark it resolved if it helped you with your question.
Thanks.