Text embedding & dimension reduction.

Representation Learning for Text Embedding and Dimensionality Reduction

Word2Vec and GloVe Models

Question 1: Representation Learning for Text Embedding [CM1]:

1. Embed the text dataset with Word2Vec to convert every word of the corpus to an embedding vector. Split the data into train-val-test sets with portions of 80-10-10 percent. Set the random state to zero for all functions in this assignment. To train the Word2Vec model, follow the instructions in the GENSIM documentation linked in the notes below. You can use the gensim.models.word2vec.Word2Vec class described there to train it.

2. Analyze the Word2Vec embedding space using the cosine similarity measure. Discuss whether words that appear in similar contexts are actually similar, and describe your analysis completely.

3. Further analyze the quality of the embeddings by trying to find 5 arithmetic computations on the embedding vectors (i.e., relationships such as “King” - “Man” + “Woman” = “Queen” that we discussed in class). The words you choose will depend on the dataset you are using. Discuss the relationships you find and what they mean. Note that this dataset is small, and training on it from scratch may not arrive at the best possible arithmetic relationships.

4. Now load two pretrained models, one Word2Vec model and one GloVe model (glove-wiki-gigaword-50 is a good one to try), apply them to the same dataset, and compare the results for the arithmetic relationships on the same words, including the scores.

Question 2: Representation Learning and Dimensionality Reduction [CM2]:

Part 1: PCA

1. Apply PCA on the Word2Vec embeddings. Keep the train, val, and test sets in mind when embedding with dimensionality reduction methods.
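
The 80-10-10 split, the cosine similarity measure, and the vector arithmetic described in Question 1 can be illustrated with a minimal, standard-library-only sketch. Everything here is an assumption for illustration: the helper names (split_80_10_10, cosine, analogy) are hypothetical, and the 2-dimensional toy vectors are made up rather than learned from any dataset. In the actual assignment the vectors would come from a trained or pretrained model, where gensim's KeyedVectors.most_similar(positive=..., negative=...) performs a comparable ranking.

```python
import math
import random


def split_80_10_10(items, seed=0):
    """Shuffle with a fixed seed, then cut into 80/10/10 train/val/test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(0.8 * len(shuffled))
    n_val = int(0.1 * len(shuffled))
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(vectors, positive, negative, topn=1):
    """Rank words by cosine similarity to sum(positive) - sum(negative).

    The query words themselves are excluded from the candidates.
    """
    dim = len(next(iter(vectors.values())))
    target = [0.0] * dim
    for word in positive:
        target = [t + x for t, x in zip(target, vectors[word])]
    for word in negative:
        target = [t - x for t, x in zip(target, vectors[word])]
    exclude = set(positive) | set(negative)
    scored = [
        (word, cosine(target, vec))
        for word, vec in vectors.items()
        if word not in exclude
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


# Made-up toy vectors: axis 0 loosely "royalty", axis 1 loosely "gender".
toy = {
    "king": [1.0, 1.0],
    "queen": [1.0, -1.0],
    "man": [0.0, 1.0],
    "woman": [0.0, -1.0],
    "apple": [-1.0, 0.1],
}

train, val, test = split_80_10_10(range(100))
best = analogy(toy, positive=["king", "woman"], negative=["man"])
print(len(train), len(val), len(test))
print(best[0][0])
```

On the toy vectors, "king" - "man" + "woman" lands exactly on the vector for "queen", so that word ranks first. With embeddings trained on a small corpus, the top-ranked word will often be less clean, which is the effect the note in item 3 warns about.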
