About normalization of a dataset

Sahil_Garg · May 29, 2019, 5:06pm

We normalize the training set (x_train) before training the model. Do we also need to normalize x_test before prediction?

rachitbansal2500 · May 29, 2019, 6:49pm

Hey Sahil,
When feeding data to a model, only the input data is normalized, i.e. x_train in this case.
Thus, we won’t normalize x_test.

I hope this clarifies your doubt.

rachitbansal2500 · May 31, 2019, 12:03pm

I hope I’ve cleared your doubt. Please rate your experience here.
Your feedback is very important: it helps us improve our platform and provide you the learning experience you deserve.

If you still have questions or don't find the answers satisfactory, you may reopen the doubt.

mohituniyal2010 · June 8, 2019, 8:02am

Hi @rachitbansal2500
But I read somewhere that we have to normalize/scale X_test as well, since it is also an input to the model. The model was trained on normalized data (in a specific range), so if we don't normalize X_test, its range would be very different and the predictions would not be correct.
For example, say there are 2 features: house_size (in sq ft) and no_of_bedrooms. house_size is a much larger number than no_of_bedrooms, so we bring the data into one range, say [-1, 1] (normalization).
Now if we test on new data and provide the input without normalization, i.e. a size of 2500 sq ft, won't the model predict the wrong answer, since it was trained on a much smaller range?
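To make the house example concrete, here is a minimal NumPy sketch with made-up numbers (the four training houses and the 2500 sq ft query are hypothetical). It rescales each column to [-1, 1] using the training set's min and max, then applies the same mapping to the new input.

```python
import numpy as np

# Hypothetical training data: columns are house_size (sq ft) and no_of_bedrooms.
x_train = np.array([
    [1000.0, 2.0],
    [1500.0, 3.0],
    [2000.0, 3.0],
    [3000.0, 4.0],
])

# Min-max parameters are computed from the training set only.
col_min = x_train.min(axis=0)
col_max = x_train.max(axis=0)

def scale_to_pm1(x):
    """Map each column linearly so the training min -> -1 and training max -> +1."""
    return 2.0 * (x - col_min) / (col_max - col_min) - 1.0

x_train_scaled = scale_to_pm1(x_train)

# A new house at prediction time: 2500 sq ft, 3 bedrooms.
x_new = np.array([[2500.0, 3.0]])
x_new_scaled = scale_to_pm1(x_new)

print(x_train_scaled)
print(x_new_scaled)  # [[0.5 0. ]]: inside the range the model was trained on
```

Fed in raw, the size value 2500 would be thousands of times larger than anything the model saw during training (every scaled training value lies in [-1, 1]).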

Please correct me, if I’m wrong @Prateek-Narang-10209158320224419

rachitbansal2500 · June 8, 2019, 6:47pm

Hello Mohit,
There must have been some confusion on my part in understanding the initial question.

All input data to the model needs to be normalised: x_train as well as x_test (if it isn't normalised already). The y values (targets) don't need to be, though.
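A minimal sketch of what this looks like in code, assuming z-score standardization (the helper names `fit_standardizer` and `standardize` and all the data are hypothetical). The key point is that the mean and standard deviation are computed once, from x_train, and then reused unchanged for x_test; y_train is left as it is.

```python
import numpy as np

def fit_standardizer(x_train):
    """Compute per-feature mean and std from the training inputs only."""
    mean = x_train.mean(axis=0)
    std = x_train.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    return mean, std

def standardize(x, mean, std):
    """Apply the training-set statistics to any input (train, test, or new data)."""
    return (x - mean) / std

# Hypothetical data: two features on very different scales.
x_train = np.array([[1000.0, 2.0], [1500.0, 3.0], [2000.0, 3.0], [3000.0, 4.0]])
y_train = np.array([50.0, 70.0, 85.0, 120.0])  # targets are not normalised
x_test = np.array([[1200.0, 2.0], [2500.0, 3.0]])

mean, std = fit_standardizer(x_train)
x_train_norm = standardize(x_train, mean, std)
x_test_norm = standardize(x_test, mean, std)  # same mean/std, not refit on x_test
```

scikit-learn's `StandardScaler` follows the same pattern: call `fit` (or `fit_transform`) on the training inputs, then `transform` on the test inputs. A common recommendation is to split the data first and compute the scaling parameters from the training set only, so that no information from the test set leaks into training.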

I hope this clears the doubt.
Sorry for the confusion.

mohituniyal2010 · June 9, 2019, 5:00am

Yep, got it. Thanks.

Management718 · June 26, 2019, 5:13am

But Prateek Bhaiya only normalises the X_train data. I haven't seen him normalise the X_test data in any tutorial yet, whereas after reading the discussion above, it seems that normalising both the training and the testing data is necessary. Please clarify.

rachitbansal2500 · June 26, 2019, 6:18am

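Mostly we normalise the data as a whole and only after that split it into train and test. Otherwise, the training data is normalised first, and the test data is then normalised with the same parameters (e.g. the mean and standard deviation computed from the training data).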

Both of them need to be normalised, though. Consider a situation where you have normalised the training data but not the test data: the model has learned to make predictions from normalised inputs, so un-normalised inputs would be on a completely different scale and its predictions would be meaningless.
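A small sketch of this situation, using ordinary least squares from NumPy as a stand-in model (the data and the linear rule that generates the targets are made up for illustration). The same trained weights are applied once to a normalised test input and once to the raw one.

```python
import numpy as np

# Hypothetical data generated from a known rule: price = 0.04 * size + 10 * bedrooms.
x_train = np.array([[1000.0, 2.0], [1500.0, 3.0], [2000.0, 3.0], [3000.0, 4.0]])
y_train = 0.04 * x_train[:, 0] + 10.0 * x_train[:, 1]
x_test = np.array([[2500.0, 3.0]])
y_test = 0.04 * x_test[:, 0] + 10.0 * x_test[:, 1]  # true value: 130.0

# Normalisation parameters come from the training inputs only.
mean, std = x_train.mean(axis=0), x_train.std(axis=0)

def with_bias(x):
    """Append a column of ones so the linear model can learn an intercept."""
    return np.hstack([x, np.ones((x.shape[0], 1))])

# Fit least squares on the normalised training inputs.
w, *_ = np.linalg.lstsq(with_bias((x_train - mean) / std), y_train, rcond=None)

pred_normalised = with_bias((x_test - mean) / std) @ w  # close to 130
pred_raw = with_bias(x_test) @ w  # raw input on the wrong scale for w

print(y_test, pred_normalised, pred_raw)
```

The normalised input reproduces the true value, while the raw input produces a prediction in the tens of thousands, because the weights were learned for features with mean 0 and standard deviation 1.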

I hope this clears your doubt.

Management718 · June 26, 2019, 6:36am

Thank you, sir. Now it's clear.