Word Embedding Challenge

dipansha.chhabra19 · April 25, 2020, 4:41am

import nltk

def readFile(file):
    # Reads the entire file into memory in one go
    with open(file, 'r', encoding='utf-8') as f:
        text = f.read()
    sentences = nltk.sent_tokenize(text)

    data = []
    for sent in sentences:
        words = nltk.word_tokenize(sent)
        # stopw is a stop-word collection defined elsewhere (not shown here)
        words = [w.lower() for w in words if len(w) > 2 and w not in stopw]
        data.append(words)

    return data

text = readFile('glove.6B.50d.txt')

This gives a memory error. How can I load the file instead?

S18CRX0120 · April 25, 2020, 7:11am

Hey @dipansha.chhabra19, from the line text = readFile('glove.6B.50d.txt') I guess you are trying to read the GloVe embeddings file. If so, read it line by line like this:

import numpy as np

embeddings = {}

with open('./GloVE/glove.6B.50d.txt', 'r', encoding='utf-8') as f:
    for line in f:
        # Each line holds a word followed by its vector components
        values = line.split()
        word = values[0]
        coeffs = np.array(values[1:], dtype="float32")

        embeddings[word] = coeffs

Hope this works and your doubt gets resolved.
Don’t forget to mark the doubt as resolved as well.
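
If memory is still tight, one option is to keep only the words you actually need. The following is a minimal sketch of that idea, not the course's reference solution: load_glove, its vocab parameter, and cosine_similarity are hypothetical names introduced here for illustration, and it assumes the usual GloVe text layout of one word per line followed by space-separated floats.

```python
import numpy as np


def load_glove(path, vocab=None):
    """Stream a GloVe text file line by line into a {word: vector} dict.

    vocab is a hypothetical optional set of words to keep; words outside
    it are skipped so the resulting dict stays small.
    """
    embeddings = {}
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            # Assumed layout: word v1 v2 ... vN, separated by single spaces
            word, _, rest = line.rstrip().partition(" ")
            if not rest or (vocab is not None and word not in vocab):
                continue
            embeddings[word] = np.array(rest.split(), dtype="float32")
    return embeddings


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```

For example, load_glove('./GloVE/glove.6B.50d.txt', vocab={'king', 'queen'}) would return just those two vectors, which can then be compared with cosine_similarity. Because the file is read one line at a time, only the kept vectors are ever held in memory.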

S18CRX0120 · April 30, 2020, 4:18am

I hope I’ve cleared your doubt. I ask you to please rate your experience here.
Your feedback is very important. It helps us improve our platform and hence provide you
the learning experience you deserve.

On the off chance that you still have questions or do not find the answers satisfactory, you may reopen
the doubt.