The following are my takeaways from the talk "Embedding work in NLP".

  • GloVe vectors can be loaded directly through the gensim library
  • Run PCA on the word vectors to surface interesting relationships, e.g. country/capital or gendered pairs lining up along similar directions (a quick sketch of both points follows this bullet)
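
A minimal sketch of both points, assuming gensim's downloader API and scikit-learn are available; the word list is just an illustration:

```python
import gensim.downloader as api
from sklearn.decomposition import PCA

# Load pretrained GloVe vectors straight through gensim's downloader.
glove = api.load("glove-wiki-gigaword-100")

# Project a few words into 2-D with PCA; related pairs (country/capital,
# gendered pairs) tend to line up along similar directions.
words = ["king", "queen", "man", "woman", "paris", "france", "rome", "italy"]
coords = PCA(n_components=2).fit_transform([glove[w] for w in words])

for word, (x, y) in zip(words, coords):
    print(f"{word:>8}: ({x:+.2f}, {y:+.2f})")
```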
  • The two most powerful Word2vec models are (a gensim sketch follows this sub-list):
    • CBOW
    • Skipgram
      • Negative Sampling is a technique used in Skip-gram to reduce training time: instead of a full softmax over the vocabulary, each update contrasts the true context word against a few randomly sampled negatives
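
A minimal sketch of both architectures with gensim; the toy sentences are made up:

```python
from gensim.models import Word2Vec

sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "chased", "the", "cat"],
    ["dogs", "and", "cats", "are", "pets"],
]

# sg=0 selects CBOW: predict the centre word from its surrounding context.
cbow = Word2Vec(sentences, vector_size=50, window=2, sg=0, min_count=1)

# sg=1 selects skip-gram; negative=5 turns on negative sampling, so each
# update contrasts the true context word against 5 sampled words instead
# of computing a softmax over the entire vocabulary.
skipgram = Word2Vec(sentences, vector_size=50, window=2, sg=1,
                    negative=5, min_count=1)

print(skipgram.wv.most_similar("cat", topn=3))
```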
  • Use cases of word2vec
    • Airbnb looks at the sequence of listings its users click and builds an embedding space for listings by treating each click sequence as a sentence (see the first sketch after this list)
      • Sequences of listings after which the user clicked nothing serve as negative examples
      • Sequences of listings that led to a click serve as positive examples
    • Alibaba
      • Builds a huge graph tracing click-throughs between products and then generates random walks across it (see the second sketch after this list)
      • Each random walk acts as a proxy for a sentence, with products as its words
      • Training word2vec on these random walks, Alibaba produces recommendations for its users
    • ASOS
      • Predicting customer lifetime value using a word2vec-style model
    • ANGHAMI
      • Uses word2vec-style embeddings in its music recommendation engine
    • Spotify
      • Uses embeddings for music and artist recommendations
    • Fact-checking also makes use of word embeddings: https://www.youtube.com/watch?v=ddf0lgPCoSo
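
First, a minimal sketch of the Airbnb-style idea of treating click sessions as sentences; the listing IDs and sessions are made up for illustration:

```python
from gensim.models import Word2Vec

# Each user session is a "sentence" whose "words" are listing IDs.
sessions = [
    ["listing_12", "listing_7", "listing_93", "listing_7"],
    ["listing_7", "listing_41", "listing_12"],
    ["listing_93", "listing_41", "listing_7", "listing_12"],
]

# Skip-gram with negative sampling over click sessions: listings that
# co-occur in the same sessions land close together in embedding space.
model = Word2Vec(sessions, vector_size=32, window=3, sg=1,
                 negative=5, min_count=1)

# Nearest neighbours in the embedding space double as recommendations.
print(model.wv.most_similar("listing_7", topn=2))
```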
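Second, a minimal sketch of the Alibaba-style approach; the click-through graph and products are invented, and the walk generation is the simplest possible version:

```python
import random
from gensim.models import Word2Vec

# Toy item graph built from click-throughs: an edge A -> B means users
# often clicked product B right after product A.
graph = {
    "shoes": ["socks", "laces"],
    "socks": ["shoes", "slippers"],
    "laces": ["shoes"],
    "slippers": ["socks"],
}

def random_walk(start, length=5):
    """Generate one random walk; each walk plays the role of a sentence."""
    walk = [start]
    for _ in range(length - 1):
        walk.append(random.choice(graph[walk[-1]]))
    return walk

walks = [random_walk(node) for node in graph for _ in range(20)]

# word2vec over the walks: products that co-occur on walks embed near
# each other, and nearest neighbours become recommendation candidates.
model = Word2Vec(walks, vector_size=32, window=2, sg=1, min_count=1)
print(model.wv.most_similar("shoes", topn=2))
```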
  • Challenges
    • 1 billion hours of YouTube video are watched every day
    • 700 million of those hours come from algorithmic recommendations
    • That is scope for massive influence on us
    • YouTube has publicly stated that it will stop recommending certain kinds of content
    • Facebook is tweaking its algorithms so that content with the potential to become illegal is removed from recommendations