Accuracy related issue in lyrics generation

Jan19LPN0013 · July 1, 2020, 9:05am

I get 99% accuracy, while many others get 100%. How? Please check my code: https://github.com/agarwalyash02/machine-learning/blob/master/lyrics%20generation%20using%20markov%20chain%20text%20generation%20NLP.ipynb

Aayushkh_333 · July 1, 2020, 9:12am

Hey @Jan19LPN0013, can you please share the .txt file that you generated? Only then will I be able to debug the problem.

Jan19LPN0013 · July 1, 2020, 10:39am

Here is my text file: https://github.com/agarwalyash02/machine-learning/blob/master/text_prediction.txt

Aayushkh_333 · July 1, 2020, 11:59am

Hey @Jan19LPN0013, you need to specify the encoding as follows:

f = open('text_prediction.txt', 'w', encoding='utf8')
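
A slightly fuller sketch of the same idea, in case it helps. The names `generated_text`, `save_text`, `load_text` and the demo file name are hypothetical and are not taken from the notebook; the only point is that the encoding is passed explicitly for both writing and reading, and that the file is closed by the `with` block.

```python
import os
import tempfile

# Hypothetical stand-in for the generated lyrics (an assumption for this demo).
# The curly apostrophe is a non-ASCII character.
generated_text = "it\u2019s hard to say"


def save_text(path, text):
    # Explicit encoding, so the bytes on disk do not depend on the platform default.
    with open(path, "w", encoding="utf8") as f:
        f.write(text)


def load_text(path):
    # Read back with the same encoding that was used for writing.
    with open(path, "r", encoding="utf8") as f:
        return f.read()


# Demo path in the temp directory, so nothing in the project folder is overwritten.
out_path = os.path.join(tempfile.gettempdir(), "text_prediction_demo.txt")
save_text(out_path, generated_text)
print(load_text(out_path) == generated_text)
```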

I hope this helps you achieve 100% accuracy.
Happy Learning

Jan19LPN0013 · July 1, 2020, 3:32pm

So what is the problem if I do not use encoding="utf8"? Please explain.
As far as I can see, both files are exactly identical, yet I get 100% accuracy only with the new file generated with the encoding. Can you explain why?

Aayushkh_333 · July 1, 2020, 6:33pm

No, there is one character that is different. Let me show you:

When you open the file in a text editor that uses an encoding other than UTF-8, there is a character mismatch, as you can see in the screenshot above. It is in the word hard'ch. If you want to know why we need to specify the encoding, please go through this link. It explains that UTF-8 is Python's default encoding for most things. Files opened with open() are an exception on some platforms: without an encoding argument, open() uses the platform's locale encoding, which on Windows is often not UTF-8.
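
To make the mismatch concrete, here is a small sketch. It rests on two assumptions that the thread does not confirm: that the differing character is the curly apostrophe (U+2019), and that the file written without an explicit encoding ended up in cp1252, a common Windows default. Under those assumptions the same word produces different bytes on disk.

```python
# Assumption: the mismatched character is the right single quotation mark (U+2019).
text = "hard\u2019ch"

# The same text encoded two ways (cp1252 stands in for a Windows locale default).
utf8_bytes = text.encode("utf-8")
cp1252_bytes = text.encode("cp1252")

print(utf8_bytes)    # b'hard\xe2\x80\x99ch'
print(cp1252_bytes)  # b'hard\x92ch'

# Reading UTF-8 bytes with the wrong encoding shows the garbling an editor might display.
garbled = utf8_bytes.decode("cp1252")
print(garbled)       # hardâ€™ch

# Reading cp1252 bytes as UTF-8 fails outright unless errors are replaced.
replaced = cp1252_bytes.decode("utf-8", errors="replace")
print(replaced)
```

So two files can look the same in one editor and still differ byte for byte.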

I hope this resolves your doubt.
Happy Learning

Jan19LPN0013 · July 1, 2020, 6:39pm

But in both of my text files, including the one where I did not specify the encoding, I can only see hard’ch.

Aayushkh_333 · July 1, 2020, 6:41pm

This is because some text editors decode the file as UTF-8 by default. That’s why you are not able to see the difference.
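
Since an editor may hide the difference, one way to check is to compare the raw bytes instead of the rendered text. The helper below is a hypothetical sketch, not part of the original code, and the sample bytes assume the curly-apostrophe case for illustration. With real files you would read each one using `open(path, "rb").read()` and pass the two results in.

```python
def first_difference(a: bytes, b: bytes):
    """Return the offset of the first differing byte, or None if identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # One input is a prefix of the other: they differ where the shorter one ends.
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


# Illustrative bytes only (assumed cp1252 vs UTF-8 encodings of the same word).
without_encoding = b"hard\x92ch"
with_encoding = b"hard\xe2\x80\x99ch"

offset = first_difference(without_encoding, with_encoding)
print(offset)
```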

Jan19LPN0013 · July 1, 2020, 6:42pm

OK, thanks for your support, sir.