Movie rating prediction

I am unable to clean the review data.
Please explain the steps and the code.

Hey @rajukumarbhui, there is a complete section, Project - Movie Review Classification. Go through that section completely, and then you will be able to proceed.

Hope this resolves your doubt.
Please mark the doubt as resolved in the My Doubts section. :blush:

I have gone through the whole section of videos. In the lectures, either 5 lines of reviews or a hardcoded sample text of 4-5 lines is used for cleaning the data. How can the same be done for a CSV file?

Hey @rajukumarbhui, first you need to load the CSV; the preprocessing is more or less the same as in the movie rating lectures. The .csv format does not mean a new kind of preprocessing has to be invented for it.

You will find all the cleaning code here.

You will also need to make a few small changes to suit your data; a rough sketch is given below.
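For instance, here is a minimal sketch of loading a CSV and applying the usual cleaning steps (lowercasing, tokenizing, removing stopwords, stemming). The file name train.csv and the column name review are assumptions; adjust them to your data.

import pandas as pd
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import RegexpTokenizer

tokenizer = RegexpTokenizer(r"\w+")
stemmer = PorterStemmer()
stop_words = set(stopwords.words("english"))  # needs nltk.download("stopwords")

def clean_review(text):
    # Lowercase, tokenize, drop stopwords, and stem a single review.
    tokens = tokenizer.tokenize(text.lower())
    tokens = [stemmer.stem(t) for t in tokens if t not in stop_words]
    return " ".join(tokens)

df = pd.read_csv("train.csv")                     # assumed file name
df["cleaned"] = df["review"].apply(clean_review)  # assumed column name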

Hope this resolves your doubt.
Please mark the doubt as resolved in the My Doubts section. :blush:

http://localhost:8888/notebooks/Desktop/ML_BOOK/Naive_Baeys/Movie_Rating_Prediction/RatingPrediction.ipynb

How do I remove this error?

Hey @rajukumarbhui, that link will not open; it points to your local system. Upload your .ipynb to Google Drive and then share the link.

https://drive.google.com/file/d/1yRuAzO82GBzeHBJHvW4PXEjJpGOjOlpw/view

It is giving a memory-related error.

Hey @rajukumarbhui, yes, that is because your RAM is completely utilized and there is no free memory left to allocate.

Remove .toarray() from this line:

x_vec = cv.fit_transform(x_data).toarray()

Then your x_vec will be a sparse matrix. Sklearn's MultinomialNB accepts sparse matrices as well, so you can pass it in directly.
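For example, a minimal sketch of the sparse workflow; the toy x_data and y_data below are illustrative stand-ins for your cleaned reviews and ratings:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy stand-ins for the cleaned reviews and their labels.
x_data = ["great movie loved it", "terrible plot bad acting", "loved the acting"]
y_data = [1, 0, 1]

cv = CountVectorizer()
x_vec = cv.fit_transform(x_data)   # scipy.sparse matrix; no .toarray() needed

model = MultinomialNB()
model.fit(x_vec, y_data)           # MultinomialNB handles sparse input
print(model.predict(cv.transform(["bad movie"])))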

Hope this resolves your doubt.
Please mark the doubt as resolved in the My Doubts section. :blush:


https://colab.research.google.com/drive/1yRuAzO82GBzeHBJHvW4PXEjJpGOjOlpw

How do I convert a column vector into a 1-D array? y_train is a column vector.

Hey @rajukumarbhui, use the np.reshape() function.
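For example, a minimal sketch assuming y_train is a NumPy column vector of shape (n, 1):

import numpy as np

y_train = np.array([[1], [0], [1]])   # column vector, shape (3, 1)
y_train = np.reshape(y_train, (-1,))  # 1-D array, shape (3,); y_train.ravel() also works
print(y_train.shape)                  # (3,)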

https://colab.research.google.com/drive/1yRuAzO82GBzeHBJHvW4PXEjJpGOjOlpw

I am still getting an error!

I fixed the previous error but am stuck again. Kindly see this: https://colab.research.google.com/drive/1yRuAzO82GBzeHBJHvW4PXEjJpGOjOlpw

Hey @rajukumarbhui,
remove this line:

test_vec = cv.fit_transform(test)

and use

test_vec = cv.transform(test)

The error occurs because fit_transform re-fits the vectorizer on the test data: if a word appears in the test set that was not in the training data, CountVectorizer assigns a new index to it, changing the size of the feature vectors, which should not be the case. transform alone reuses the vocabulary learned from the training data.
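A minimal sketch of the difference, with toy train/test lists:

from sklearn.feature_extraction.text import CountVectorizer

train = ["good movie", "bad movie"]
test = ["good unseen word"]          # 'unseen' and 'word' never appeared in train

cv = CountVectorizer()
train_vec = cv.fit_transform(train)  # learns the vocabulary from the training data
test_vec = cv.transform(test)        # reuses that vocabulary; unknown words are ignored

print(train_vec.shape[1], test_vec.shape[1])  # both 3: feature sizes match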

Hope this resolves your doubt.
Please mark the doubt as resolved in the My Doubts section. :blush:

Thanks a lot, buddy!