Error not resolving in this challenge

Here the output is array(['neg']). What is the meaning of this output?
According to the sample submission, the output should be array(['neg','neg'…]).
Is my output wrong? Have I missed any steps? If my answer is correct, please tell me how I should bring it into the correct output format.

Also, I am not able to draw the confusion matrix. Error: Found input variables with inconsistent numbers of samples: [40000, 1]

Here is the link to the code: https://ide.codingblocks.com/s/102941

Hello Kushal,
Before answering your query, I have a small request. Before asking new questions, please first acknowledge the answers to your older questions: either mark them resolved or reply if you have any further queries. That assures us that we are answering your queries well.

Now, back to the question.
First, you need to convert the strings 'neg' and 'pos' in the y data to integers. For this, use the LabelEncoder class from sklearn.preprocessing. The model should get numeric data for both X and Y. Also, make sure to convert y into a NumPy array before fitting.
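A minimal sketch of that encoding step, using a few hypothetical labels in place of the challenge's real y data. Keep the same le object around, since it is also what turns the integer predictions back into 'neg'/'pos' later.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical labels standing in for the challenge's y column
y = ["neg", "pos", "pos", "neg"]

le = LabelEncoder()
# fit_transform learns the classes (sorted: 'neg', 'pos') and maps them to 0 and 1
Y = np.array(le.fit_transform(y))

print(le.classes_)  # ['neg' 'pos']
print(Y)            # [0 1 1 0]
```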

Share your code for the confusion matrix which is returning that error. I’ll get back to you on that then.

Hello bhaia,
I have made the changes as per your guidelines, but the error has not changed.
Link: https://ide.codingblocks.com/s/102941

Yes bhaia, I always click the resolve button as soon as I feel sure about that particular doubt.


Okay, I am looking into it

Kushal, when you print xt_vec, is it displayed the way you'd expect? Also try printing the shape of xt_vec: what does it come out to be?

Also, you need to fit your MultinomialNB object on (x_vec, Y), the label-encoded array, not on (x_vec, y).

Bhaia, I have used (x_vec, Y) only; the code in the IDE shows (x_vec, y) by mistake. Regarding the shape of xt_vec, it is giving something abnormal: (1, 2270363).

Yes, that's where the mistake is. The shape of the testing vector should be (number of examples in the test set, number of words in the cv object's vocabulary).
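A small sketch of the expected shape, with made-up reviews. It assumes, as a guess, that x_test was a one-column pandas DataFrame: iterating over a DataFrame yields its column labels, so CountVectorizer.transform sees a single "document" and returns a single row, much like the (1, 2270363) result above. The names test_df and "review" are hypothetical.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy data standing in for the challenge's reviews
train_reviews = ["good movie", "bad plot", "good acting bad plot"]
test_df = pd.DataFrame({"review": ["good plot", "bad movie"]})

cv = CountVectorizer()
x_vec = cv.fit_transform(train_reviews)  # vocabulary: 5 words

# Wrong: iterating over a DataFrame yields the column label "review",
# so the whole frame is vectorised as one document
xt_wrong = cv.transform(test_df)
print(xt_wrong.shape)  # (1, 5)

# Right: pass a 1-D sequence of strings, one per test example
xt_vec = cv.transform(test_df["review"])
print(xt_vec.shape)    # (2, 5)
```

The number of columns always equals the vocabulary size learned from the training data; only the number of rows depends on how the test input is passed in.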

I have made the changes in your code. Here is the revised code which works well:

Also, about the confusion matrix: you were plotting one between the predictions for x_test and the actual values of y_train. They are not related. You can only plot one for the test set when you have the actual y_test values, which you do not have (of course). If you still want to plot one, plot it between what your model predicts for x_train and the actual values of y_train, which you do have. I hope you'll be able to do that.
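A sketch of the training-set confusion matrix described above, on a handful of hypothetical reviews instead of the real x_train/y_train. Because the predictions come from the same examples as the labels, both arrays have the same length, so the "inconsistent numbers of samples" error cannot occur.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import confusion_matrix
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import LabelEncoder

# Hypothetical training data standing in for x_train / y_train
x_train = ["good movie", "bad plot", "great acting", "boring and bad", "good plot"]
y_train = ["pos", "neg", "pos", "neg", "pos"]

le = LabelEncoder()
Y = le.fit_transform(y_train)

cv = CountVectorizer()
x_vec = cv.fit_transform(x_train)

model = MultinomialNB()
model.fit(x_vec, Y)

# Predict on the same training examples so both arrays have equal length
train_pred = model.predict(x_vec)

# Rows are true classes, columns are predicted classes, in le.classes_ order
cm = confusion_matrix(Y, train_pred)
print(le.classes_)
print(cm)
```

To draw it, recent scikit-learn versions (0.22+) with matplotlib installed offer ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=le.classes_).plot().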


Yes, thank you, it worked!
1. It's like magic that reshaping my test data worked, but how did reshaping the test data fix it so accurately?

2. Now the problem is that the answer is in integers, and I have to convert it back to strings, that is, pos or neg. How do I do that? Moreover, how do I put it in the output format with two columns, id and label?

  1. CountVectorizer accepts a list-like object of strings and vectorises it. The shape of the data it expects is (no. of sentences,), i.e. a 1-D sequence with one string per sentence. Your x_test in this case had the shape (no. of sentences, 1), which produced the abnormal vectorisation when it was transformed with the CV object.

  2. Use the LabelEncoder's inverse_transform method, le.inverse_transform(y), where y is the array of integers returned by the model and le is the same encoder you fitted on the training labels.
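A minimal sketch of step 2, with hypothetical integer predictions in place of the real model output. Here the encoder is refitted on the two class names only to keep the example self-contained; in the actual notebook you would reuse the le fitted on the training labels.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Refit on the same class names as in training (normally: reuse the original le)
le = LabelEncoder()
le.fit(["neg", "pos"])

# Hypothetical integer predictions, e.g. what model.predict(xt_vec) returns
pred = np.array([1, 0, 0, 1])

# Map 0 -> 'neg' and 1 -> 'pos'
labels = le.inverse_transform(pred)
print(labels)  # ['pos' 'neg' 'neg' 'pos']
```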

Thank you bhaia! But now how do I convert it to CSV format, with two columns, one with the name and another with the label?

Just as you do it for other challenges: refer to pd.DataFrame, then write it out with to_csv.
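A sketch of building the two-column submission file. The column names ("id", "label"), the id values 0..n-1 and the file name are assumptions; copy the exact headers and ids from the challenge's sample submission.

```python
import pandas as pd

# Hypothetical decoded predictions, e.g. the output of le.inverse_transform(...)
labels = ["pos", "neg", "neg"]

# Column names and id values are assumptions; match the sample submission
submission = pd.DataFrame({"id": range(len(labels)), "label": labels})

# index=False stops pandas from writing its own extra index column
submission.to_csv("submission.csv", index=False)
print(submission)
```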
