Error not resolving in this challenge

kushal1998 · July 11, 2019, 10:15am

Here the output is array(['neg']). What does this output mean?
According to the sample submission, the output should be array(['neg', 'neg', …]).
Is my output wrong? Have I missed any steps? If my answer is correct, please tell me how to bring it into the correct output format.

Also, I am not able to draw the confusion matrix. Error: Found input variables with inconsistent numbers of samples: [40000, 1]

Here is the link to the code: https://ide.codingblocks.com/s/102941

rachitbansal2500 · July 12, 2019, 2:19pm

Hello Kushal,
Before answering your query, I have a small request. Before asking new questions, please first acknowledge the answers to the older questions you've asked: either mark them resolved or reply if you have a further query. That will assure us that we are answering your queries well.

rachitbansal2500 · July 12, 2019, 2:25pm

Now back to the question.
Firstly, you need to convert the strings 'neg' and 'pos' in the y data to integers. For this, use the LabelEncoder class from scikit-learn. The model needs to have integers in the X and Y data. Also, make sure to convert y into a NumPy array before fitting.
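
For reference, a minimal sketch of the label encoding step described above. The labels here are made-up stand-ins for the challenge's y column, and the variable names are only illustrative:

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical labels standing in for the y column of the training data
y = np.array(["neg", "pos", "pos", "neg"])

le = LabelEncoder()
# fit_transform sorts the classes, so 'neg' becomes 0 and 'pos' becomes 1
Y = le.fit_transform(y)

print(le.classes_)  # ['neg' 'pos']
print(Y)            # [0 1 1 0]
```

Keep the fitted `le` object around, since it is needed later to map the integer predictions back to strings.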

Share your code for the confusion matrix which is returning that error. I’ll get back to you on that then.

kushal1998 · July 12, 2019, 6:51pm

Hello bhaia,
I have made the changes as per your guidelines, but there is no change in the error.
Link: https://ide.codingblocks.com/s/102941

kushal1998 · July 12, 2019, 6:52pm

Yes bhaia, I always click the resolve button as soon as I feel sure about that particular doubt.

rachitbansal2500 · July 13, 2019, 7:01am

Okay, I am looking into it

rachitbansal2500 · July 13, 2019, 5:06pm

Kushal, when you print xt_vec, is it displayed the way you'd expect? Also try printing the shape of xt_vec. What does it come out to be?

Plus, you need to fit your MultinomialNB object on (x_vec, Y), not (x_vec, y).

kushal1998 · July 15, 2019, 6:05am

Bhaia, I have used (x_vec, Y) only; by mistake it got written as (x_vec, y) in the IDE. Regarding the shape of xt_vec, it is giving something abnormal: (1, 2270363)

rachitbansal2500 · July 16, 2019, 9:47am

Yes, that’s where the mistake is. The shape of the testing vector should be (no. of examples in the testing set, no. of words in the cv object).

I have made the changes in your code. Here is the revised code which works well:

Also, about the confusion matrix: you were plotting one between the predictions for x_test and the actual values of y_train. They are not related. You can only plot one when you have the actual y_test values, which you do not have (of course). If you still want to make one, plot it between what your model predicts for x_train and the actual values of y_train, which you have. I hope you'll be able to do that.
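
A rough sketch of the confusion matrix on the training data, as suggested above. The four sentences and their labels are invented toy data, not the challenge data, and the variable names only mirror the ones used in this thread:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import confusion_matrix
from sklearn.naive_bayes import MultinomialNB

# Toy training data (hypothetical); Y holds the already encoded labels
x_train = ["good movie", "bad movie", "great film", "awful film"]
Y = np.array([1, 0, 1, 0])

cv = CountVectorizer()
x_vec = cv.fit_transform(x_train)

model = MultinomialNB()
model.fit(x_vec, Y)

# Predict on the SAME samples whose true labels are known,
# so both arguments have the same number of samples
train_pred = model.predict(x_vec)
cm = confusion_matrix(Y, train_pred)
print(cm)
```

Because `Y` and `train_pred` both have one entry per training sample, the "inconsistent numbers of samples" error does not occur here.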

kushal1998 · July 16, 2019, 5:49pm

Yes, thank you, it worked.
1. It's just like magic that reshaping my test data worked, but how did reshaping the test data work so accurately?

2. Now I am facing the problem that the answer is in integers, and I have to convert it back to the strings pos or neg. How do I do that, and moreover, how do I put it in the output format with two columns, id and label?

rachitbansal2500 · July 17, 2019, 2:51pm

CountVectorizer accepts a list-like object and vectorises it. The shape of the data it accepts is (no. of sentences,). Your x_test in this case had the shape (no. of sentences, 1), which gave abnormal vectorisation when transformed by the CV object.
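
A small sketch of the flattening step, assuming x_test is a NumPy array of shape (no. of sentences, 1). The sentences are made up, and the actual revised code may have done the reshape differently:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

# Toy data (hypothetical): training sentences are 1-D, test sentences are 2-D
x_train = np.array(["good movie", "bad movie", "great film"])
x_test = np.array([["good film"], ["bad film"]])  # shape (2, 1)

cv = CountVectorizer()
cv.fit(x_train)

# Flatten to shape (2,) so that each element is one sentence
x_test_flat = x_test.reshape(-1)
xt_vec = cv.transform(x_test_flat)

print(x_test_flat.shape)  # (2,)
print(xt_vec.shape)       # (2, 5): 2 test sentences, 5 words in the vocabulary
```
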
Use the LabelEncoder's inverse_transform method, le.inverse_transform(y), where y is the array of integers which the model has returned.
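
For example, a minimal sketch with a made-up prediction array (in practice, `le` would be the same encoder object that was fitted on the training labels):

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
le.fit(["neg", "pos"])  # 'neg' -> 0, 'pos' -> 1

# Hypothetical integer predictions returned by the model
pred = np.array([0, 1, 1, 0])

# Map the integers back to the original string labels
labels = le.inverse_transform(pred)
print(labels)  # ['neg' 'pos' 'pos' 'neg']
```
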

kushal1998 · July 18, 2019, 4:41am

Thank you bhaia, but now how do I convert it to CSV format, with two columns, one with the name and another with the label?

rachitbansal2500 · July 18, 2019, 7:15am

Do it just as you do for other challenges; refer to the pd.DataFrame function.
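
A possible sketch of that last step. The column names `id` and `label`, the running integer ids, and the file name `submission.csv` are assumptions here; check them against the sample submission file for the challenge:

```python
import pandas as pd

# Hypothetical decoded predictions, e.g. the output of le.inverse_transform
labels = ["neg", "pos", "neg"]

# Two columns: a running id and the predicted label (names are assumed)
submission = pd.DataFrame({"id": range(len(labels)), "label": labels})

# index=False keeps the DataFrame index out of the file
submission.to_csv("submission.csv", index=False)
```
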