Error in reading u.item file

I tried reading the u.item file with pd.read_csv('./ml-100k/u.item', sep='\|', header=None)
and got a decoding error. I resolved the issue by passing the argument encoding="Latin-1".
However, I am curious to know what caused the decoding error. I also tried restarting the kernel, but that didn’t work.

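Hi @harshsharmajnv_9b70d236614796d5, a UnicodeDecodeError is raised when a sequence of bytes cannot be decoded with the chosen encoding. Each encoding maps only certain byte sequences to Unicode characters, so a byte sequence that is invalid in that encoding makes decode() fail. By default read_csv decodes the file as UTF-8. Since encoding="Latin-1" fixed the error, the u.item file most likely contains non-ASCII characters (such as accented letters in movie titles) stored as Latin-1 bytes that are not valid UTF-8. Restarting the kernel cannot help, because the bytes in the file stay the same.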
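To make the failure concrete, here is a minimal sketch using only plain Python. The title string is an illustrative value chosen for this example, not a line copied from u.item. It encodes an accented character as Latin-1 and then tries to decode the resulting bytes both ways.

```python
# Illustrative title containing "é" (U+00E9); assumed for this sketch.
title = "Mis\u00e9rables"

# In Latin-1, "é" is stored as the single byte 0xE9.
raw = title.encode("latin-1")

# Decoding the same bytes as UTF-8 fails: 0xE9 announces a multi-byte
# sequence, but the byte that follows is not a valid continuation byte.
try:
    raw.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = exc

# Decoding with the encoding the bytes were written in works.
decoded = raw.decode("latin-1")

print(utf8_error)
print(decoded)
```

The same thing happens inside read_csv when the file's bytes do not match the encoding it is told (or defaults) to use.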

To avoid this situation, specify the encoding parameter explicitly in read_csv. I mostly use read_csv('file', encoding="ISO-8859-1"), or alternatively encoding="utf-8", for reading, and generally utf-8 for to_csv. You can also use one of several aliases, such as 'latin' or 'latin-1', instead of 'ISO-8859-1'.
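As a rough sketch of the advice above, the snippet below reads pipe-separated data with an explicit encoding. It uses a small in-memory sample instead of the real file so that it runs anywhere. The two rows and the two-column layout are made up for illustration and do not reflect the full u.item format.

```python
import io

import pandas as pd

# Made-up sample rows, stored as Latin-1 bytes (0xE9 is "é").
sample = b"1|Toy Story (1995)\n2|Mis\xe9rables, Les (1995)\n"

# Pass the encoding explicitly so pandas does not assume UTF-8.
df = pd.read_csv(
    io.BytesIO(sample),
    sep="|",             # a plain "|" works as a single-character separator
    header=None,         # the data has no header row
    encoding="latin-1",  # alias of ISO-8859-1
)

print(df)
```

For the real file, the same call with the path './ml-100k/u.item' in place of the in-memory buffer should behave the same way, assuming the file is indeed Latin-1 encoded.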

Hope this might help :slight_smile:

Thanks for sharing info sir :).

Glad it helped :slight_smile: I am closing this doubt for now, but if you face any further issues you can reopen it at any time.

I hope I’ve cleared your doubt. Please take a moment to rate your experience here.
Your feedback is very important. It helps us improve our platform and provide you
the learning experience you deserve.

If you still have questions or do not find the answers satisfactory, you may reopen
the doubt.