Natural Language Processing

What is the difference between the fit, transform and fit_transform methods in scikit-learn?

And how do I decide which method to use in which situation?

Hey @gautam75, to standardize the data (make it have zero mean and unit standard deviation), you subtract the mean and then divide the result by the standard deviation:

x′ = (x − μ) / σ

You do that on the training set. But then you have to apply the same transformation to your testing set (e.g. the held-out fold in cross-validation), or to newly obtained examples before making predictions. Crucially, you have to reuse the exact same two parameter values μ and σ that you computed when standardizing the training set, rather than recomputing them on the new data.
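To make this concrete, here is a minimal sketch in plain NumPy. The arrays X_train and X_test are made-up toy data, not anything from your problem:

```python
import numpy as np

# Toy data, invented purely for illustration.
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
X_test = np.array([[2.5], [5.0]])

# Estimate the two parameters on the training set only.
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)  # population standard deviation (ddof=0)

# Apply the same mu and sigma to both sets.
X_train_scaled = (X_train - mu) / sigma
X_test_scaled = (X_test - mu) / sigma
```

Note that the scaled test set will generally not have zero mean and unit standard deviation, and that is fine: it only has to be on the same scale as the training data.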

Hence, the fit() method of every sklearn transformer just calculates the parameters (e.g. μ and σ in the case of StandardScaler) and saves them as the object's internal state. Afterwards, you can call its transform() method to apply the transformation to any set of examples.
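A small sketch of that two-step usage with StandardScaler, again on made-up toy data. The learned parameters are exposed as the attributes mean_ and scale_:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data, invented purely for illustration.
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
X_test = np.array([[2.5], [5.0]])

scaler = StandardScaler()
scaler.fit(X_train)  # only learns the mean and standard deviation, returns the scaler itself

# The fitted parameters are stored on the object.
print(scaler.mean_, scaler.scale_)

# transform() reuses the stored parameters on whatever data you pass in.
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)
```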

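fit_transform() joins these two steps and is used for the initial fitting of parameters on the training set x, while also returning the transformed x′. By default, the transformer object simply calls fit() and then transform() on the same data, although some transformers implement the combined step more efficiently. So as a rule of thumb: use fit_transform() (or fit() followed by transform()) on the training data, and only transform() on the test data or on new examples.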
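As a rough sketch of that rule of thumb (same made-up toy data as before), the following checks that fit_transform() gives the same result as fit() followed by transform() for StandardScaler, and then shows the usual train/test pattern:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data, invented purely for illustration.
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
X_test = np.array([[2.5], [5.0]])

# One combined call versus two separate calls on the training data.
combined = StandardScaler().fit_transform(X_train)
two_step = StandardScaler().fit(X_train).transform(X_train)
same_result = np.allclose(combined, two_step)

# Typical pattern: fit on train, then only transform the test set.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)  # not fit_transform(): that would re-estimate the parameters on the test data
```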

I hope this clears up your doubt! :+1:
Happy Learning! :slightly_smiling_face:
