The following are the learnings from the podcast:

  • Bruno Gonçalves, who now works at JPMorgan Chase, holds a PhD from Emory University
  • He has done some interesting work analyzing all Twitter data to look for geography-based patterns.
    • Can one draw a map based on language patterns?
  • 10 TB of Twitter data
  • Create a huge matrix over latitude and longitude
  • Pattern matching on the matrix of words and geolocations
  • PCA + k-means clustering on the patterns in the high-dimensional matrix that combines word embeddings and geolocation
  • Mobile phones have made marrying the two datasets possible
  • The evolution of language over time can also be studied
  • A ton of people are working on emojis in Twitter feeds
  • A ton of stuff can be done with Reuters news and NLP-based work
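
The latitude/longitude matrix and PCA + k-means steps noted above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the method described on the podcast: the four tweets are made-up toy data, the one-degree cell size is arbitrary, relative word frequencies stand in for word embeddings, and the helper names (`cell_of`, `build_cell_word_matrix`, `pca`, `kmeans`) are hypothetical. Only NumPy is assumed.

```python
import math
from collections import Counter

import numpy as np


def cell_of(lat, lon, cell_deg=1.0):
    """Map a coordinate to an integer (row, col) cell of a lat/lon grid."""
    return (math.floor((lat + 90.0) / cell_deg),
            math.floor((lon + 180.0) / cell_deg))


def build_cell_word_matrix(tweets, cell_deg=1.0):
    """Build a matrix with one row per grid cell and one column per word.

    `tweets` is an iterable of (lat, lon, text). Each row holds relative
    word frequencies, so busy cells do not dominate quiet ones.
    """
    counts = {}
    for lat, lon, text in tweets:
        tokens = text.lower().split()
        if not tokens:
            continue  # skip empty tweets so no row sums to zero
        counts.setdefault(cell_of(lat, lon, cell_deg), Counter()).update(tokens)
    cells = sorted(counts)
    vocab = sorted({w for c in counts.values() for w in c})
    col = {w: j for j, w in enumerate(vocab)}
    X = np.zeros((len(cells), len(vocab)))
    for i, cell in enumerate(cells):
        for word, n in counts[cell].items():
            X[i, col[word]] = n
    X /= X.sum(axis=1, keepdims=True)
    return cells, vocab, X


def pca(X, n_components):
    """Project rows of X onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


# Toy, made-up data: two cells use "carro"/"chamba", two use "coche"/"curro".
tweets = [
    (19.4, -99.1, "carro chamba hola"),
    (20.7, -103.3, "carro chamba"),
    (40.4, -3.7, "coche curro hola"),
    (41.4, 2.2, "coche curro"),
]

cells, vocab, X = build_cell_word_matrix(tweets)
labels = kmeans(pca(X, n_components=2), k=2)
for cell, label in zip(cells, labels):
    print(cell, "-> cluster", label)
```

Coloring each grid cell by its cluster label is what would turn the output into a map of language regions. At the scale mentioned in the notes, the matrix would need a sparse representation and a library implementation of PCA and k-means rather than these hand-rolled versions.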