Even after using a generator to iterate over all the reviews and clean them, the program never finishes (it seems to keep running forever).
What’s the way out?
The movie review classifier TLE
Hey @muditarya31,
The way you are doing this is wrong.
It will indeed take time, because you are first collecting the whole generator into a tuple, which forces every review to be processed at once.
Can you please let me know how you are going to use it afterwards? Based on your approach going forward, I can tell you what you need to do.
Thank you.
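To illustrate the point about the tuple, here is a minimal sketch in plain Python. The clean() function and the sample reviews are hypothetical stand-ins, not the code from the assignment. It only shows that a generator does no work until it is consumed, and that wrapping it in tuple() runs the cleaning for every remaining review in one go.

```python
import re

def clean(review):
    # Hypothetical cleaning step: lowercase the text and keep only letters.
    return " ".join(re.findall(r"[a-z]+", review.lower()))

# Hypothetical sample data standing in for the real reviews.
raw_reviews = ["Great movie!!", "Bad plot...", "Loved it :)"]

# Creating the generator cleans nothing yet.
lazy = (clean(r) for r in raw_reviews)

# Pulling one item cleans only the first review.
first = next(lazy)

# tuple() consumes the generator, so all remaining reviews are cleaned here.
rest = tuple(lazy)

print(first)
print(rest)
```

So if every cleaned review is needed in memory anyway, the generator by itself will not make the cleaning faster; it only delays when the work happens.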
https://drive.google.com/file/d/1_qT3Kb4o9P3jj_0ecOWoxFB8Dxe7WApu/view?usp=sharing
This is my code. If I’m wrong, please tell me how to use generators correctly to solve this problem.
Hey @muditarya31,
Can you let me know roughly how long it takes, on average, to finish?
I just tried it, and with 40000 records it took me only 2 minutes to get everything done.
This is the problem I’m facing. I’m not sure whether there will be more such errors in the program after this one gets resolved.
Try changing review to [review]
I hope it helps you.
Even after solving this issue by writing [review] instead of review, I’m getting vectors of different sizes (lengths). How do I resolve this problem? Is there something wrong with the way I have used the generator? Kindly help.
This happens because you are fitting your count vectorizer on each sentence individually; instead, you need to fit the count vectorizer on all sentences at once.
So just do:
    from sklearn.feature_extraction.text import CountVectorizer
    cv = CountVectorizer(ngram_range=(1,1))
    genvec = (i for i in Xclean)  # generator for vectors
    Xtrain = []
    vec = cv.fit_transform(Xclean)
    vec
I hope this will help you.
Don’t use toarray(), as it converts the sparse matrix into a dense array, which uses a lot of memory.
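Here is a small sketch of why the vector lengths differ, assuming scikit-learn is installed. The three reviews are made-up stand-ins for Xclean. Fitting a new CountVectorizer on each review learns a separate vocabulary every time, while a single fit on all reviews gives one shared vocabulary and therefore one width.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-in for the cleaned reviews (Xclean in this thread).
reviews = [
    "great movie loved it",
    "terrible plot and bad acting",
    "loved the acting",
]

# Fitting per review: each fit sees only that review's words,
# so the number of columns changes from review to review.
per_review_widths = [
    CountVectorizer().fit_transform([review]).shape[1] for review in reviews
]

# Fitting once on all reviews: one shared vocabulary, one width.
cv = CountVectorizer(ngram_range=(1, 1))
X = cv.fit_transform(reviews)  # stays a SciPy sparse matrix, no toarray()

print(per_review_widths)
print(X.shape)
```

The result of fit_transform is already a sparse matrix, and scikit-learn classifiers such as MultinomialNB accept it directly, so there is usually no need to densify it.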
Now a new error/warning has occurred that prevents the program from running.
Does someone have the solution? I mean the complete solution. I have been trying to solve this assignment for days. Please help.
I have already shared the complete code. Sharing it again…
https://drive.google.com/file/d/11_WLIasMP2ZYVVfMqVdF9t51s_lgFOIn/view?usp=sharing
This is the link to your modified code:
https://colab.research.google.com/drive/11DohoTJU76cT-0VO78EhzX5Yc5thRuLI
However, you need to try something else, as it is taking a very long time to complete.
Access is denied for this link.
Also, I don’t know any other way of doing this. I have done it according to what we were taught in the videos. Can you provide the ideal solution that was expected for this assignment?
I have provided access.
By another technique I mean that you can try another algorithm, like Naive Bayes from sklearn, or you need to optimize your code.
With your current approach, it is taking a lot of time.
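As a rough sketch of the Naive Bayes suggestion, assuming scikit-learn is installed: the toy reviews and labels below are invented for illustration, and this is not the expected solution for the assignment. It only shows the general shape of training MultinomialNB on the sparse counts from a single CountVectorizer.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data; the real assignment would use the cleaned reviews.
train_reviews = [
    "great movie loved it",
    "wonderful acting great story",
    "terrible plot bad acting",
    "boring and bad movie",
]
train_labels = [1, 1, 0, 0]  # 1 = positive, 0 = negative (assumed encoding)

# Fit the vectorizer once on all training reviews.
cv = CountVectorizer()
X_train = cv.fit_transform(train_reviews)

# Train directly on the sparse matrix.
clf = MultinomialNB()
clf.fit(X_train, train_labels)

# Use transform (not fit_transform) on new text so the columns match.
X_new = cv.transform(["loved the great story"])
prediction = clf.predict(X_new)

print(prediction)
```

The key detail is that new reviews go through cv.transform, which reuses the vocabulary learned during training, so the test vectors have the same length as the training vectors.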