Movie rating prediction

I am unable to clean the review data.
Please explain the steps and the code.

Hey @rajukumarbhui, there is a complete section, Project - Movie Review Classification. Go through that section completely, and then you will be able to proceed.

Hope this resolves your doubt.
Please mark the doubt as resolved in the My Doubts section. :blush:

I have gone through the whole section of videos. In the lectures, either 5 lines of reviews or a hardcoded sample text of 4-5 lines is used for cleaning the data. How can the same be done for a CSV file?

Hey @rajukumarbhui, first you need to load the CSV; the preprocessing is more or less the same as in the movie rating lectures. The .csv format does not mean a new kind of preprocessing has to be invented for it.

You will find all the cleaning code here.

You will also need to make a few small changes to suit your data; a rough sketch is given below.
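For instance, here is a minimal sketch of loading a CSV and applying the usual cleaning steps (lowercasing, tokenizing, removing stopwords, stemming). The file name train.csv and the column name review are assumptions; adjust them to your data.

import pandas as pd
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import RegexpTokenizer

tokenizer = RegexpTokenizer(r"\w+")
stemmer = PorterStemmer()
stop_words = set(stopwords.words("english"))  # needs nltk.download("stopwords")

def clean_review(text):
    # Lowercase, tokenize, drop stopwords, and stem a single review.
    tokens = tokenizer.tokenize(text.lower())
    tokens = [stemmer.stem(t) for t in tokens if t not in stop_words]
    return " ".join(tokens)

df = pd.read_csv("train.csv")                     # assumed file name
df["cleaned"] = df["review"].apply(clean_review)  # assumed column name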

Hope this resolves your doubt.
Please mark the doubt as resolved in the My Doubts section. :blush:

http://localhost:8888/notebooks/Desktop/ML_BOOK/Naive_Baeys/Movie_Rating_Prediction/RatingPrediction.ipynb

How do I remove this error?

Hey @rajukumarbhui, that link will not open; it points to your local system. Upload your .ipynb to Google Drive and then share the link.

https://drive.google.com/file/d/1yRuAzO82GBzeHBJHvW4PXEjJpGOjOlpw/view

It is giving a memory-related error.

Hey @rajukumarbhui, yes, that is because your RAM is completely utilized and there is no free memory left to allocate.

Remove .toarray() from this line:

x_vec = cv.fit_transform(x_data).toarray()

Then your x_vec will be a sparse matrix. Sklearn's MultinomialNB accepts sparse matrices as well, so you can pass it in directly.
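For example, a minimal sketch of the sparse workflow; the toy x_data and y_data below are illustrative stand-ins for your cleaned reviews and ratings:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy stand-ins for the cleaned reviews and their labels.
x_data = ["great movie loved it", "terrible plot bad acting", "loved the acting"]
y_data = [1, 0, 1]

cv = CountVectorizer()
x_vec = cv.fit_transform(x_data)   # scipy.sparse matrix; no .toarray() needed

model = MultinomialNB()
model.fit(x_vec, y_data)           # MultinomialNB handles sparse input
print(model.predict(cv.transform(["bad movie"])))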

Hope this resolves your doubt.
Please mark the doubt as resolved in the My Doubts section. :blush:


https://colab.research.google.com/drive/1yRuAzO82GBzeHBJHvW4PXEjJpGOjOlpw

How do I convert a column vector into a 1-D array? y_train is a column vector.

Hey @rajukumarbhui, use the np.reshape() function.
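For example, a minimal sketch assuming y_train is a NumPy column vector of shape (n, 1):

import numpy as np

y_train = np.array([[1], [0], [1]])   # column vector, shape (3, 1)
y_train = np.reshape(y_train, (-1,))  # 1-D array, shape (3,); y_train.ravel() also works
print(y_train.shape)                  # (3,)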

https://colab.research.google.com/drive/1yRuAzO82GBzeHBJHvW4PXEjJpGOjOlpw

I am still getting an error!

I fixed the previous error but am stuck again. Kindly see this: https://colab.research.google.com/drive/1yRuAzO82GBzeHBJHvW4PXEjJpGOjOlpw

Hey @rajukumarbhui,
remove this line:

test_vec = cv.fit_transform(test)

and use

test_vec = cv.transform(test)

The error occurs because fit_transform re-fits the vectorizer on the test data: if a word appears in the test set that was not in the training data, CountVectorizer assigns a new index to it, changing the size of the feature vectors, which should not be the case. transform alone reuses the vocabulary learned from the training data.
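A minimal sketch of the difference, with toy train/test lists:

from sklearn.feature_extraction.text import CountVectorizer

train = ["good movie", "bad movie"]
test = ["good unseen word"]          # 'unseen' and 'word' never appeared in train

cv = CountVectorizer()
train_vec = cv.fit_transform(train)  # learns the vocabulary from the training data
test_vec = cv.transform(test)        # reuses that vocabulary; unknown words are ignored

print(train_vec.shape[1], test_vec.shape[1])  # both 3: feature sizes match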

Hope this resolves your doubt.
Please mark the doubt as resolved in the My Doubts section. :blush:

Thanks a lot, buddy!