Doubt regarding data trained and converted via LabelEncoder from sklearn.preprocessing

pasta · July 31, 2020, 10:06am

Hi
When we convert text to numbers using LabelEncoder, does it assign the same label to a given value in both the training and the test data?
For example, let’s say the “boat” feature has three unique values: “CD”, “A” and “XY”. For the training data, I fit LabelEncoder on the “boat” feature and get the following labels:
CD=0
A=1
XY=2

Now, when I fit LabelEncoder on the “boat” feature of the test data, will I get the same labels as I got for the training data, i.e. CD=0, A=1, XY=2? If not, it will lead to wrong predictions.

Thank You

prashant_ml · July 31, 2020, 12:21pm

Hey @pasta,
To handle this properly, take the same LabelEncoder that you fitted on the training data and use it on the test data, only to transform the text values into their respective labels.
Alternatively, you can first merge both datasets, apply LabelEncoder to the full dataset, and then split it back into training and testing sets.
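
The merge-then-split option could look roughly like the sketch below. The arrays `train_boat` and `test_boat` and their values are made up for illustration, including a value "B" that appears only in the test data. Note that LabelEncoder assigns codes in sorted order of the values it sees, so the exact numbers depend on the data.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical "boat" columns; the values are invented for this example.
train_boat = np.array(["CD", "A", "XY", "A"])
test_boat = np.array(["XY", "CD", "B"])  # "B" never appears in the training data

# Merge both columns so the encoder sees every value once.
combined = np.concatenate([train_boat, test_boat])

le = LabelEncoder()
combined_encoded = le.fit_transform(combined)

# Split the encoded values back into their training and test parts.
n_train = len(train_boat)
train_encoded = combined_encoded[:n_train]
test_encoded = combined_encoded[n_train:]

print(le.classes_)     # the learned values, in sorted order
print(train_encoded)   # codes for the training rows
print(test_encoded)    # codes for the test rows, using the same mapping
```

One thing to keep in mind with this option is that the encoder learns from the test data as well, so it only works when the test set is available at fitting time.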

I hope this helps you.
Thank you.

pasta · August 2, 2020, 6:09am

By “the same LabelEncoder”, do you mean that I should create one LabelEncoder object, call fit_transform on the training data, and then call only transform on the test data with that same object? For example, with le = LabelEncoder(), I would call le.fit_transform() on the training data and le.transform() on the test data. Is that what you mean?

prashant_ml · August 2, 2020, 6:19am

Yes, exactly like that.
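
A minimal sketch of that pattern, assuming small made-up lists for the “boat” column (`train_boat` and `test_boat` are illustrative names only). LabelEncoder stores the learned values in sorted order in `classes_`, so for these three values the mapping comes out as A=0, CD=1, XY=2, and `transform` reuses it unchanged on the test data.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical "boat" columns; the values are invented for this example.
train_boat = ["CD", "A", "XY", "A"]
test_boat = ["XY", "CD"]

le = LabelEncoder()
train_encoded = le.fit_transform(train_boat)  # learns the mapping from the training data
test_encoded = le.transform(test_boat)        # reuses that mapping, learns nothing new

# Inspect the mapping the encoder learned.
mapping = {str(label): code for code, label in enumerate(le.classes_)}
print(mapping)
print(train_encoded, test_encoded)

# A value that was not present during fit cannot be transformed.
try:
    le.transform(["ZZ"])
    unseen_accepted = True
except ValueError:
    unseen_accepted = False
print(unseen_accepted)
```

Calling fit_transform again on the test data would relearn the mapping from the test values alone, which is exactly what can make the codes disagree with the training data.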