INVERSE TRANSFORM

mananaroramail · May 22, 2020, 8:47pm

I am getting this error:

y contains previously unseen labels: [7 9]

What should I do?

Here’s the link for the code :

prashant_ml · May 23, 2020, 2:44am

It means that, for some feature X, the test data contains values that were not present when the encoder was fitted. The encoder doesn't know what to do with them, so it raises that error.

To tackle this, first merge the train and test datasets into a single dataframe and fit your LabelEncoder on the desired feature. Once you have transformed all the features you want, split the dataframe back into train and test with the same indices they had initially.

A good way to do that: when you merge the CSV files, you will notice that the target values are null for the test rows and non-null for the train rows. So you can easily split the data back: rows with a non-null target are train, and rows with a null target are test.
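A minimal sketch of the merge, fit, and split-back steps described above. The `animal` and `target` column names and the toy rows are assumptions made for illustration; they are not taken from the original code or CSV.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data: the test frame has no target column at all.
train = pd.DataFrame({"animal": ["dog", "dog", "cat", "cow"], "target": [1, 0, 1, 0]})
test = pd.DataFrame({"animal": ["dog", "cat", "cat", "bull"]})

# Stack both frames; `target` becomes NaN for every test row.
merged = pd.concat([train, test], ignore_index=True)

# Fit on the merged column so the encoder sees every category, including "bull".
encoder = LabelEncoder()
merged["animal"] = encoder.fit_transform(merged["animal"])

# Split back, using the null target as the marker for test rows.
train_enc = merged[~merged["target"].isna()]
test_enc = merged[merged["target"].isna()].drop(columns="target")

print(list(encoder.classes_))
print(test_enc)
```

Note that `ignore_index=True` renumbers the rows, so the split here relies on the null target rather than on the original indices.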

I hope this resolves your doubt.
Happy coding.

mananaroramail · May 23, 2020, 9:48am

from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
ndf = df.apply(le.fit_transform)

In this part of the code, ndf (the new data frame) is built from both the testing and training data.
So, after the above command, all the features should be encoded.
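One thing that may be worth checking here (a hedged side note, not something confirmed in this thread): `df.apply(le.fit_transform)` calls `fit_transform` once per column on the same encoder, so each call refits `le` and only the classes of the last column remain afterwards. The sketch below uses a hypothetical two-column frame, not mushrooms.csv, and keeps one encoder per column as one possible workaround.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy frame standing in for the real CSV.
df = pd.DataFrame({
    "cap": ["x", "b", "f", "x"],
    "habitat": ["u", "g", "u", "d"],
})

le = LabelEncoder()
ndf = df.apply(le.fit_transform)  # the same encoder is refitted for every column

# After apply(), `le` only remembers the last column it was fitted on.
last_fit_classes = list(le.classes_)
print(last_fit_classes)

# Keeping one encoder per column preserves every mapping for inverse_transform.
encoders = {col: LabelEncoder().fit(df[col]) for col in df.columns}
encoded = pd.DataFrame({col: encoders[col].transform(df[col]) for col in df.columns})
restored_cap = list(encoders["cap"].inverse_transform(encoded["cap"]))
print(restored_cap)
```

If `le.inverse_transform` is later called on a column with more distinct values than the last column had, the larger codes would be reported as previously unseen labels.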

Then I split the data using:

x_train, x_test, y_train, y_test = train_test_split(data_x, data_y, test_size=0.2)

Does this mean that some of the values in the mushrooms.csv file are null and were not encoded?
I still don't get it!

prashant_ml · May 23, 2020, 11:11am

That’s correct.

No, not like this. Use the commands below instead:
train = df[~df[Target_column].isna()]
test = df[df[Target_column].isna()]

After these, you can run train_test_split on the train dataframe.
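A small sketch of that order of operations: split on the null target first, then call train_test_split only on the labelled rows. The frame contents, the `"target"` name bound to `Target_column`, and the `test_size` and `random_state` values are all assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

Target_column = "target"  # hypothetical column name

# Hypothetical merged frame: the last two rows are test rows with no target.
df = pd.DataFrame({
    "feature": [3, 3, 1, 2, 1, 0, 3, 1, 1, 0],
    Target_column: [1, 0, 1, 0, 1, 0, 1, 0, np.nan, np.nan],
})

train = df[~df[Target_column].isna()]  # rows that have a label
test = df[df[Target_column].isna()]    # rows still waiting for a prediction

# Hold out part of the labelled rows for validation.
x_train, x_val, y_train, y_val = train_test_split(
    train.drop(columns=Target_column),
    train[Target_column],
    test_size=0.25,
    random_state=0,
)
print(len(x_train), len(x_val), len(test))
```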

No, it's not that these values are null.
Let me explain with an example.
Let X = ["dog", "dog", "cat", "cow"]
and let Y = ["dog", "cat", "cat", "bull"]

Values in Y but not in X (call them Z) = ["bull"]

encoder = LabelEncoder()

-> Case 1: Fitting the encoder only on X or only on Y
If we fit the encoder only on X and then try to transform the Y values, it raises an error, because the encoder hasn't seen the Z values and doesn't know what to do with them. The same thing was happening when you tried to inverse transform your test set.
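Case 1 can be reproduced with the X and Y lists from the example. This is only a sketch: `error_of` is a hypothetical helper added to capture the message, and the codes `[7, 9]` are borrowed from the error in the original question to show the same failure for inverse_transform.

```python
from sklearn.preprocessing import LabelEncoder

X = ["dog", "dog", "cat", "cow"]
Y = ["dog", "cat", "cat", "bull"]

encoder = LabelEncoder()
encoder.fit(X)  # the encoder learns only cat, cow, dog


def error_of(call):
    """Return the ValueError message raised by call(), or None if it succeeds."""
    try:
        call()
    except ValueError as exc:
        return str(exc)
    return None


# "bull" was never seen during fit, so transform fails.
transform_error = error_of(lambda: encoder.transform(Y))

# Codes 7 and 9 are outside the three fitted classes, so inverse_transform fails too.
inverse_error = error_of(lambda: encoder.inverse_transform([7, 9]))

print(transform_error)
print(inverse_error)
```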

-> Case 2: Fitting on both X and Y
We first merge X and Y to form a dataframe W, then fit the encoder on W. The difference now is that the encoder knows about all the values, including the Z values. So if you now inverse transform the test data, it works without any error.
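A sketch of Case 2 with the same lists, using a plain list concatenation for W instead of a dataframe to keep the example short:

```python
from sklearn.preprocessing import LabelEncoder

X = ["dog", "dog", "cat", "cow"]
Y = ["dog", "cat", "cat", "bull"]
W = X + Y  # every value from both X and Y

encoder = LabelEncoder().fit(W)  # learns bull, cat, cow, dog

# Both directions now work for Y, because "bull" was seen during fit.
y_codes = encoder.transform(Y)
y_back = list(encoder.inverse_transform(y_codes))

print(list(y_codes))
print(y_back)
```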

Case 2 is what you need to implement to make your model and code work.
I hope this clarifies your doubt.

If still there is some confusion or something you are not getting , then you can call or text me at 8630831390. I will be really happy to resolve your doubt.

Thank You.
Happy Learning .

mananaroramail · May 23, 2020, 12:58pm

Okay, that was an AMAZING EXPLANATION !!!
I got the reason why this error happens.

But let me just clarify:
You said that the X values are encoded and we tested on the Y values, as in Case 1.
The encoder then encountered a value ("bull") that it had never seen while being fitted on X.
So it gave an error. I get that.

But, in my case:

I read a CSV file.
It contains both the features and the labels, i.e. both the X and Y dataframes --> W
I applied le.fit_transform on W rather than on X or Y.
Even then, why is the encoder unaware of some values (Z)?

prashant_ml · May 23, 2020, 4:11pm

I am taking
X = train feature
Y = test feature
Note: Y here is not the target variable.

prashant_ml · May 23, 2020, 4:15pm

Hey @mananaroramail, can you do one thing?
Just upload your code file and the required CSV file to Google Drive and share the link with me.

I will then let you know what error you are actually facing, with an explanation,
and it will be much easier for you to understand.

I apologize for the delay in resolving your doubt.

Happy Learning .