Challenge - Movie Ratings (sentiment analysis)

Sir, the data here is actually very messy: there are `https` links, `br` tags, `title:xyz` fragments, promotions, and a lot of similar junk in between the sentences. I'm trying to get rid of it using regex and some other filtering methods, but I'm still not sure about it :expressionless: Can anyone please help me?
And one more thing: the data is also quite big, so when I break sentences into bigrams and trigrams, the dimensionality of my data increases as I move toward higher n-grams; for trigrams the feature count is around 48,000 -_-
How do I deal with this?

hey @snehill090 ,
the first step is data cleaning, which you are already working on correctly, so keep going with that.
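A minimal sketch of such a cleaning pass is below. The exact patterns (URLs, `<br>` tags, `title:xyz` fragments) are assumptions based on the junk described in the question, not the actual dataset's format:

```python
import re

def clean_review(text):
    """Strip the kinds of artifacts mentioned above from one review.
    The patterns here are illustrative guesses, tune them to your data."""
    text = re.sub(r"https?://\S+", " ", text)   # drop URLs
    text = re.sub(r"<br\s*/?>", " ", text)      # drop HTML line breaks
    text = re.sub(r"title:\S+", " ", text)      # drop 'title:xyz' fragments
    text = re.sub(r"[^a-zA-Z\s]", " ", text)    # keep letters only
    return re.sub(r"\s+", " ", text).strip().lower()

cleaned = clean_review("Great movie!<br>see https://example.com title:xyz now")
print(cleaned)  # -> "great movie see now"
```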

what exactly is it that you don't know?

you might be converting the features to a dense array using the toarray function. Just don't do that; use the sparse matrix directly as it is and it will work.
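For example, scikit-learn estimators accept the sparse matrix returned by CountVectorizer as-is, so no `toarray()` call is needed (the tiny corpus below is made up purely for illustration):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

docs = ["good movie", "bad movie", "great film", "terrible film"]
y = [1, 0, 1, 0]  # toy sentiment labels

vec = CountVectorizer()
X = vec.fit_transform(docs)          # scipy sparse matrix, kept sparse
model = MultinomialNB().fit(X, y)    # estimator consumes sparse input directly
pred = model.predict(vec.transform(["good film"]))
print(pred)
```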

what exactly is it that you don't know?

How to clean all that big data. As I already mentioned, there are some words in every sentence which I think are completely irrelevant.
Am I really supposed to read every sentence -_- so that I can filter out those irrelevant words? Is it worth it? I don't think so…
So how do I do this? How do I filter those words out?

hey @snehill090 ,
see, you can't be perfect at this, so there will be some such words in the corpus which you can't deal with. But their frequency of occurrence will be so low that they won't affect your modelling much.

Hence, cleaning to whatever extent you can, and understanding such differences, is way better than using the raw data.
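One practical way to drop those rare, irrelevant words without reading every sentence is CountVectorizer's `min_df` parameter, which ignores any token appearing in fewer than `min_df` documents (toy corpus below, not the actual data):

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "the movie was good",
    "the movie was bad",
    "xqzt the movie rocked",   # 'xqzt' is junk that appears only once
]

# min_df=2 keeps only tokens that occur in at least 2 documents,
# so one-off junk like 'xqzt' never becomes a feature.
vec = CountVectorizer(min_df=2)
vec.fit(docs)
print(sorted(vec.vocabulary_))  # -> ['movie', 'the', 'was']
```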

I hope this helps.


data_pre_poc.py: look, this is how I'm doing it…

so what is the problem with this? It's a correct approach to go with.

See, there are many other things to try, but you will come to know about them only while practicing, so search for them and then try them; in one way or another they will surely help you.


Processing the whole dataset is going to take a lot of computational power :confused: -_-
but thank you, sir :slight_smile: for clearing my doubts…

One last question…
Is there any way I can reduce the dimensionality? I don't understand how to do it :expressionless:

Yes, it will. There is no other option for that.

Yeah, so once you have generated the features using CountVectorizer or TF-IDF, you can apply PCA or t-SNE to reduce the dimensions of your data, and then use the reduced features for prediction.
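A rough sketch of that pipeline with PCA is below. Note that PCA needs a dense array, which is expensive on a big corpus (TruncatedSVD avoids that, as discussed later in this thread); the corpus here is a made-up toy:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import PCA

docs = ["good movie", "bad movie", "great film", "awful film"]  # toy corpus
X = TfidfVectorizer().fit_transform(docs)   # sparse TF-IDF features

# PCA requires dense input; fine on a toy corpus, costly on 48,000 features.
X_2d = PCA(n_components=2).fit_transform(X.toarray())
print(X_2d.shape)  # -> (4, 2): four documents reduced to two components
```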


TSNE (I haven't tried it practically yet)

Today you have cleared many of my doubts…
so once again, thank you :slight_smile:

snehill signing off :confused: - till next time
:slight_smile:

I hope I've cleared your doubt. Please rate your experience here.
Your feedback is very important. It helps us improve our platform and provide you
the learning experience you deserve.

If you still have some questions or did not find the answers satisfactory, you may reopen
the doubt.


Because the CountVectorizer method returns a sparse matrix…
how do I apply PCA on a sparse matrix?
Actually I'm trying to apply PCA (otherwise it takes too much time to compute the accuracy -_-)

Okay, in that case you need TruncatedSVD.

It's similar to PCA, but it works on sparse matrices.
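A quick sketch of that, on a toy corpus, just to show that TruncatedSVD consumes the sparse matrix directly with no densification step:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

docs = ["good movie", "bad movie", "great film", "awful film",
        "fine story", "dull story"]                 # toy corpus
X = TfidfVectorizer().fit_transform(docs)           # stays sparse

# Unlike PCA, TruncatedSVD accepts the sparse matrix as-is.
svd = TruncatedSVD(n_components=2, random_state=0)
X_reduced = svd.fit_transform(X)
print(X_reduced.shape)  # -> (6, 2)
```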

Somebody suggested that I use pca_v = PCA().fit_transform(sparse_matrix.A) and yes, it works!!
But now the main problem: whether I use PCA or SVD, both make some values in my dataset negative… and when I finally apply mnd.fit(pca_v, Y) it raises ValueError: Negative values in data passed to MultinomialNB (input X)

What should I do?

Didn't you use TruncatedSVD?

What is this? Which model?

What is this? Which model?

Sorry, it's a typo -_- it means Multinomial Naive Bayes.

Didn't you use TruncatedSVD?
Nah.

Try that once.

It should work, I guess; else, try Gaussian Naive Bayes once as well.
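For instance, with toy numbers standing in for PCA/SVD output: GaussianNB models continuous features and has no non-negativity requirement, so it fits reduced (possibly negative) features without the error MultinomialNB raises:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Made-up 2-D features imitating PCA/TruncatedSVD output: note the
# negative values that MultinomialNB would reject.
X_reduced = np.array([[-0.5, 1.2], [0.3, -0.7], [-0.1, 0.9], [0.6, -1.1]])
y = [1, 0, 1, 0]

model = GaussianNB().fit(X_reduced, y)   # no ValueError on negative input
pred = model.predict([[-0.2, 1.0]])
print(pred)
```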