Low Accuracy on GloVe dataset

Tanishq-Chaudhary-2660485557354404 · June 7, 2020, 9:32am

The embeddings give very low accuracy with the odd-one-out method suggested in the video, and I can’t think of any other way to find the odd word. I used glove.6B.50d.txt and loaded it into a dictionary. Please tell me how I can improve the accuracy.
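For context, a minimal sketch of what reading the file into a dictionary might look like, assuming the standard GloVe text layout (one word per line, followed by its space-separated vector components). The function name `parse_glove` and the sample lines are illustrative only, not the code from this post.

```python
import numpy as np

def parse_glove(lines):
    """Build a {word: vector} dictionary from GloVe-style text lines."""
    embeddings = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        # First token is the word, the rest are the vector components.
        embeddings[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return embeddings

# Two made-up 3-d lines stand in for the real 50-d file here.
sample = ["king 0.1 0.2 0.3", "queen 0.2 0.1 0.4"]
embeddings = parse_glove(sample)

# With the real file (assumed to be in the working directory):
# with open("glove.6B.50d.txt", encoding="utf-8") as f:
#     embeddings = parse_glove(f)
```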

CrazyRabbit · June 7, 2020, 9:45am

Hi @Tanishq-Chaudhary-2660485557354404,
Please share your code (or explain your algorithm if you have used anything different) through the Coding Blocks IDE or Google Drive (if it’s a .ipynb notebook).
Also, what is the accuracy score you are getting?

Tanishq-Chaudhary-2660485557354404 · June 7, 2020, 10:12am

Tanishq-Chaudhary-2660485557354404 · June 7, 2020, 10:14am

This is giving 40% accuracy.

CrazyRabbit · June 7, 2020, 12:11pm

Hi @Tanishq-Chaudhary-2660485557354404,

The accuracy you are getting is typical for the GloVe embeddings you mentioned. Please try these embeddings instead (also mentioned in the challenge):

After downloading, you can use this line of code to load the embeddings:

from gensim.models import word2vec  # gensim module that exposes KeyedVectors
embeddings_dict = word2vec.KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin", binary=True)

You should get about 85% accuracy with these, which is also roughly the maximum achievable. Let me know if you are able to reach that score.
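For reference, here is a minimal sketch of one common way to implement odd one out: score each word by its mean cosine similarity to the other words and pick the lowest. This is an assumption about the approach, not necessarily the exact method from the video, and the toy 3-d vectors below are made up so the example runs without the large download. The `embeddings` argument can be any mapping from word to vector, including the loaded KeyedVectors object.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two 1-d vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def odd_one_out(words, embeddings):
    """Return the word with the lowest mean cosine similarity to the rest."""
    vectors = [np.asarray(embeddings[w], dtype=float) for w in words]
    scores = []
    for i, v in enumerate(vectors):
        sims = [cosine(v, u) for j, u in enumerate(vectors) if j != i]
        scores.append(sum(sims) / len(sims))
    return words[int(np.argmin(scores))]

# Made-up vectors: three "fruit" directions and one unrelated direction.
toy = {
    "apple": [0.9, 0.1, 0.0],
    "mango": [0.8, 0.2, 0.1],
    "banana": [0.85, 0.15, 0.05],
    "car": [0.0, 0.1, 0.9],
}
print(odd_one_out(["apple", "mango", "car", "banana"], toy))
```

gensim’s KeyedVectors also has a built-in doesnt_match method that does a similar job, so it may be worth comparing its answers against your own function.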

Hope this helps!

Tanishq-Chaudhary-2660485557354404 · June 7, 2020, 12:43pm

I’m getting only 55% accuracy with the Google News embeddings too.

CrazyRabbit · June 7, 2020, 2:54pm

Can you share the updated code too?

Tanishq-Chaudhary-2660485557354404 · June 8, 2020, 10:45am

https://drive.google.com/file/d/1-w19vzCQAxGRpJukFq6koNHUl2Oqf_0l/view?usp=sharing

Tanishq-Chaudhary-2660485557354404 · June 8, 2020, 10:47am

Well, I got 85% accuracy with the Google News embeddings.

Tanishq-Chaudhary-2660485557354404 · June 8, 2020, 10:52am

https://drive.google.com/file/d/1cjiY5b40jOnINM2RZUFdZXu3eJhimwpJ/view?usp=sharing

CrazyRabbit · June 8, 2020, 11:56am

Hi @Tanishq-Chaudhary-2660485557354404,
Your code is perfect and 85% is the maximum achievable accuracy. If you’re wondering how others got 100%, don’t worry. Those submissions are either from CB testers or from students who took the easy way out and edited the submission .csv directly (which is possible because the dataset is so small).
Hope this helps!

CrazyRabbit · June 13, 2020, 6:28am

I hope I’ve cleared your doubt. Please rate your experience here. Your feedback is very important: it helps us improve our platform and provide you the learning experience you deserve.

If you still have questions or do not find the answers satisfactory, you may reopen the doubt.