The movie review classifier TLE

Even after using a generator to iterate over all the reviews and clean them, the program never finishes (it keeps running seemingly forever).
What's the way out?

hey @muditarya31,
The way you are doing this is wrong. It will indeed take time, because you are first collecting the generator's output into a tuple, and that materialization step is what is slow.
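
For reference, CountVectorizer will consume a generator of documents lazily, so the cleaned reviews never need to be collected into a tuple first. A rough sketch (with `reviews` and `clean_review` as made-up stand-ins for your raw data and cleaning function):

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical placeholders for your data and your cleaning step.
reviews = ["An AMAZING movie!!", "Worst film I have ever seen..."]

def clean_review(text):
    # whatever cleaning you already do (lowercasing, removing punctuation, ...)
    return text.lower()

# Generator expression: reviews are cleaned one at a time, never stored as a tuple.
cleaned = (clean_review(r) for r in reviews)

cv = CountVectorizer()
X = cv.fit_transform(cleaned)   # fit_transform happily consumes the generator
print(X.shape)
```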

Can you please let me know how you plan to use the cleaned reviews afterwards? Based on your approach going forward, I can tell you what you need to do.

Thank you :slightly_smiling_face:.

https://drive.google.com/file/d/1_qT3Kb4o9P3jj_0ecOWoxFB8Dxe7WApu/view?usp=sharing
This is my code. If I'm wrong, please tell me how to use generators correctly to solve this problem.

hey @muditarya31,
Can you let me know roughly how long, on average, it is taking to finish?
I just tried it, and for the 40,000 records it took me about 2 minutes to get everything done.


This is the problem I'm facing. Not sure if there will be more such errors in the program after this one gets resolved.

Try changing `review` to `[review]`.
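
The reason: `transform` expects an iterable of documents, not a single bare string, so one review has to be wrapped in a list. A minimal sketch with made-up data:

```python
from sklearn.feature_extraction.text import CountVectorizer

cv = CountVectorizer()
cv.fit(["this movie was great", "this movie was terrible"])  # made-up corpus

review = "this movie was great"
vec = cv.transform([review])   # wrap the single review in a list of one document
print(vec.shape)               # (1, vocabulary size)
```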

I hope it helps you.

Even after fixing this issue by writing `[review]` instead of `review`, I'm getting vectors of different sizes (lengths). How do I resolve this? Is there something wrong with the way I have used the generator? Kindly help.

This is happening because you are fitting your CountVectorizer on each sentence individually; instead, you need to fit the CountVectorizer on all sentences at once.
So just do:

```python
from sklearn.feature_extraction.text import CountVectorizer

cv = CountVectorizer(ngram_range=(1, 1))
genvec = (i for i in Xclean)  # generator for vectors
Xtrain = []
vec = cv.fit_transform(Xclean)
vec
```

I hope this will help you :slightly_smiling_face:.

Problem still not solved.

Don't use `toarray()`, as it uses a lot of memory to store the data.
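
For context: `fit_transform` returns a scipy sparse matrix, and calling `.toarray()` on it turns that into a huge dense array. A small illustration, using a made-up corpus:

```python
from sklearn.feature_extraction.text import CountVectorizer

corpus = ["good movie", "bad movie", "great film", "awful film"]  # made-up data
cv = CountVectorizer()

X = cv.fit_transform(corpus)   # compressed sparse row matrix
print(type(X), X.shape)

# X.toarray() would allocate rows * vocabulary_size numbers, almost all zeros.
# For ~40,000 reviews with a large vocabulary that can run into gigabytes,
# so keep X sparse; sklearn classifiers accept it as-is.
```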

Now there's a new error/warning that has occurred, preventing the program from running.


Does someone have a solution for this, a complete solution? I have been trying to solve this assignment for days. Please help.

I have already shared the complete code. Sharing it again…
https://drive.google.com/file/d/11_WLIasMP2ZYVVfMqVdF9t51s_lgFOIn/view?usp=sharing

This is the link to your modified code:
https://colab.research.google.com/drive/11DohoTJU76cT-0VO78EhzX5Yc5thRuLI

Although, you will need to try something else, as this approach takes a very long time to complete.

The access is denied for this link.
Also, I don't know any other way of doing this. I have done it according to what we were taught in the videos. Can you provide me the ideal solution that was expected for this assignment?

I have provided the access.
By another technique I mean you can try another algorithm, like Naive Bayes from sklearn, or you need to optimize your code,
because with your current approach it is taking a lot of time.
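
A minimal sketch of that Naive Bayes route, with `Xclean` and `y` as stand-ins for your cleaned reviews and their 0/1 labels; it keeps everything sparse and should run quickly even on the full 40,000 reviews:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Placeholders for your cleaned reviews and their sentiment labels.
Xclean = ["good movie", "bad movie", "great film", "awful film"] * 100
y = [1, 0, 1, 0] * 100

X_train, X_test, y_train, y_test = train_test_split(
    Xclean, y, test_size=0.2, random_state=0
)

cv = CountVectorizer(ngram_range=(1, 1))
X_train_vec = cv.fit_transform(X_train)   # fit vocabulary on training data only
X_test_vec = cv.transform(X_test)         # reuse the same vocabulary

clf = MultinomialNB()
clf.fit(X_train_vec, y_train)             # trains directly on the sparse matrix
print(accuracy_score(y_test, clf.predict(X_test_vec)))
```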