Movie review project

Sir, it's a huge dataset, around 40K documents. How can we use CountVectorizer on it? It's taking a lot of time, and my laptop freezes for a while when I run the code.

Hey @Ashu1318,
for a dataset this large you need to use something else to understand the reviews.

But can you please let me know whether you are calling toarray() anywhere? If yes, remove that first and try again.

Yes, I'm using it, but after removing it how can we access a vector from the sparse matrix? What's the alternative?

Should I try it on Google Colab?

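No alternative is needed.
CountVectorizer on its own returns a sparse matrix, which stores only the nonzero counts, but toarray() expands it into a dense array with every entry held in memory.
That is the point where it crashes.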

So you can still access it.
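
A minimal sketch of the point above, assuming scikit-learn is installed. The four toy reviews, their labels, and the MultinomialNB classifier are illustrative assumptions, not taken from the actual project. It shows that one review's vector can be read from the sparse matrix row by row, and that the matrix can be passed to a classifier without ever calling toarray() on the whole thing:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy stand-in for the 40K reviews (hypothetical data).
reviews = [
    "a great movie with a great cast",
    "boring plot and terrible acting",
    "great acting and a lovely plot",
    "terrible movie, boring and slow",
]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(reviews)  # SciPy sparse matrix, no toarray()

# What the sparse matrix stores versus what a dense array would need.
n_docs, n_terms = X.shape
dense_bytes = n_docs * n_terms * X.dtype.itemsize
print("shape:", X.shape, "stored nonzeros:", X.nnz, "dense bytes:", dense_bytes)

# Access one review's vector: slice a single row and look only at
# the columns that are actually nonzero in that row.
index_to_term = {j: term for term, j in vectorizer.vocabulary_.items()}
row = X[0]
first_counts = {index_to_term[j]: int(row[0, j]) for j in row.indices}
print(first_counts)

# scikit-learn estimators accept the sparse matrix directly.
clf = MultinomialNB().fit(X, labels)
pred = clf.predict(vectorizer.transform(["great cast and great plot"]))
print(pred)
```

If a dense vector is really needed, densifying one row or a small batch at a time (for example X[0].toarray()) keeps memory use proportional to the batch rather than to all 40K documents.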

You will get the same error on Google Colab too.

It's only 50% accurate. Can you please check it?

I tried bigrams and trigrams as well.

Hey @Ashu1318,
there are many things you need to learn about cleaning the data.

You should not always remove the extra characters; sometimes, in some places, they can be useful.
How you deal with URLs, HTML tags, and so on all matters.

So you need to experiment with that a lot.
Using bigrams or trigrams alone will not make it work.
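
A rough sketch of what such cleaning could look like, assuming scikit-learn and the standard library only. The function name clean_review, the two regular expressions, and the sample strings are hypothetical; which characters to keep or drop depends on the data and has to be tried out:

```python
import html
import re

from sklearn.feature_extraction.text import CountVectorizer

URL_RE = re.compile(r"https?://\S+|www\.\S+")
TAG_RE = re.compile(r"<[^>]+>")


def clean_review(text):
    """Hypothetical cleaning step: drop HTML tags and URLs, keep the words."""
    text = html.unescape(text)    # turn entities such as "&amp;" into "&"
    text = TAG_RE.sub(" ", text)  # replace tags such as "<br />" with a space
    text = URL_RE.sub(" ", text)  # remove links
    text = text.lower()           # a custom preprocessor replaces the built-in lowercasing
    return re.sub(r"\s+", " ", text).strip()


sample = "Loved it!<br /><br />Full review at http://example.com/r?id=1 &amp; more"
cleaned = clean_review(sample)
print(cleaned)

# Cleaning and n-grams together: unigrams and bigrams over the cleaned text.
vectorizer = CountVectorizer(preprocessor=clean_review, ngram_range=(1, 2))
X = vectorizer.fit_transform([sample, "Not good.<br />Not worth it"])
print(sorted(vectorizer.vocabulary_))
```

Without the cleaning step, fragments such as "br" and "http" would end up in the vocabulary as if they were words. With it, a bigram like "not good" is built from the actual review text.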

Sir, please don't close this chat; I will ask more doubts later.

No problem, buddy.
You can raise them as new doubts.
There won't be any problem with that.

I hope I’ve cleared your doubt. I ask you to please rate your experience here.
Your feedback is very important. It helps us improve our platform and hence provide you
the learning experience you deserve.

On the off chance that you still have questions or do not find the answers satisfactory, you may reopen
the doubt.