The following are some of the takeaways from this Medium article.

  • Standard word embedding models such as Word2vec (CBOW, Skip-gram) and GloVe share one limitation: they cannot disambiguate words based on context. Every word gets a single vector, regardless of where it appears. The word “bank” might mean a river bank or a financial institution, yet Word2vec and GloVe compress both senses into one vector (see the first code sketch after this list)
  • Embeddings from Language Models (ELMo) and Bidirectional Encoder Representations from Transformers (BERT) generate embeddings for a word based on the context in which the word appears
  • ELMo uses independently trained left-to-right and right-to-left LSTMs to generate embeddings
  • BERT embeddings are conditioned on both the left and the right context; BERT uses the Transformer, a neural network architecture based on the self-attention mechanism
  • Contextual embeddings yield consistent improvements across NLP tasks:
    • sentiment analysis
    • question answering
    • reading comprehension
    • textual entailment
    • semantic role labeling
    • coreference resolution
    • dependency parsing
  • Text classification data is formatted in Facebook’s fastText format, where each line carries a __label__<tag> prefix followed by the text
  • Sample data for spam classification: http://www.dt.fee.unicamp.br/~tiago/smsspamcollection/ (see the second sketch below for loading it with Flair)
  • Flair is a library for state-of-the-art NLP developed by Zalando Research. It’s built in Python on top of the PyTorch framework. Flair allows for the application of state-of-the-art NLP models to text, such as named entity recognition (NER), part-of-speech tagging (PoS), word sense disambiguation, and text classification (see the final sketch below)
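To make the static-versus-contextual contrast concrete, here is a minimal sketch using Flair (introduced in the last bullet) that embeds “bank” in two different sentences, once with static GloVe vectors and once with contextual BERT vectors. The model names ("glove", "bert-base-uncased") are standard Flair identifiers; the sentence pair is my own example.

```python
# Minimal sketch: static vs. contextual embeddings for the word "bank".
# Install with `pip install flair`.
import torch
from flair.data import Sentence
from flair.embeddings import WordEmbeddings, TransformerWordEmbeddings

static = WordEmbeddings("glove")                             # one vector per word type
contextual = TransformerWordEmbeddings("bert-base-uncased")  # vector depends on context

s1 = Sentence("I sat on the river bank")
s2 = Sentence("I deposited cash at the bank")

for emb in (static, contextual):
    emb.embed([s1, s2])
    v1 = s1[5].embedding  # "bank" in the river sentence
    v2 = s2[5].embedding  # "bank" in the finance sentence
    sim = torch.cosine_similarity(v1.unsqueeze(0), v2.unsqueeze(0)).item()
    print(f"{emb.__class__.__name__}: cosine similarity = {sim:.3f}")
    # clear stored embeddings so the two embedders don't get concatenated
    s1.clear_embeddings()
    s2.clear_embeddings()
```

With GloVe both occurrences of "bank" get the identical vector (similarity 1.0); the BERT vectors differ because each is computed from its surrounding sentence.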
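The SMS spam data from the link above ships as tab-separated `label<TAB>text` lines, while Flair's classification loader expects the fastText format described earlier. The sketch below converts the raw file and loads it as a Flair ClassificationCorpus; the file names and the 80/10/10 split are my assumptions, not the article's.

```python
# Minimal sketch: convert the SMS spam collection to fastText format
# (__label__<tag> <text>) and load it with Flair. Assumes the raw file
# was downloaded as SMSSpamCollection.
from pathlib import Path
from flair.datasets import ClassificationCorpus

lines = Path("SMSSpamCollection").read_text(encoding="utf-8").splitlines()
formatted = []
for line in lines:
    label, text = line.split("\t", 1)          # raw format: ham/spam <TAB> message
    formatted.append(f"__label__{label} {text}")

# assumed split: 80% train, 10% dev, 10% test
n = len(formatted)
data_dir = Path("data")
data_dir.mkdir(exist_ok=True)
(data_dir / "train.txt").write_text("\n".join(formatted[: int(0.8 * n)]))
(data_dir / "dev.txt").write_text("\n".join(formatted[int(0.8 * n): int(0.9 * n)]))
(data_dir / "test.txt").write_text("\n".join(formatted[int(0.9 * n):]))

corpus = ClassificationCorpus(data_dir,
                              label_type="class",
                              train_file="train.txt",
                              dev_file="dev.txt",
                              test_file="test.txt")
print(corpus)
```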
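Finally, the basic Flair tagging workflow from its quick start: load a pretrained sequence tagger and apply it to a sentence. "ner" is Flair's identifier for its pretrained English NER model.

```python
# Minimal sketch: named entity recognition with a pretrained Flair model.
from flair.data import Sentence
from flair.models import SequenceTagger

tagger = SequenceTagger.load("ner")
sentence = Sentence("George Washington went to Washington.")
tagger.predict(sentence)

# print the detected entity spans and their labels
for entity in sentence.get_spans("ner"):
    print(entity)
```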