
The following are the learnings from the podcast:

  • Transfer learning entails reusing existing models: take a model trained on one task and apply it to a different task
  • Value can be delivered without custom feature engineering. Most of the recent successes have been in computer vision
    • If you do not have a lot of training data, you can use a model that is already trained on a large image dataset such as ImageNet (see the first sketch after this list)
    • Once the pre-trained model is loaded, additional layers can be stacked on top of it
  • In language, the source and target data are the same text: the pre-training signal comes from the data itself (predicting the next word, neighboring words, etc.)
  • How do you determine whether a transfer target is reasonable? Domain adaptation: the task remains the same, but the source and target domains are different
    • For example, transferring a sentiment model across different review categories
    • Create a similarity metric and then check whether the tasks are similar
    • If the tasks are similar, then one can apply transfer learning
  • At a practitioner level, leverage the information from a different domain
  • A key choice is whether to update the pre-trained weights or keep them frozen. To adapt one model to many different tasks, freeze it and train several new layers on top of it
  • Is an "ImageNet moment" for NLP around the corner? It appears we have already reached it
    • Either fine-tune the model directly or use the pre-trained model's features (the second sketch after this list contrasts the two)
  • A plethora of pre-trained models is now available, e.g., XLNet
  • Domain expertise: word representations can be drawn from pre-trained models
  • Leverage the labels of the existing data
  • Leverage the data
  • Image recognition people vs NLP people
    • Language is more challenging
  • Deal with different languages
    • Models have to learn a lot more information
    • Societal context needs to be learned from the data
    • Particular parts of the image
  • How do different images relate to each other?
  • Unlabeled data: we can still obtain pre-trained representations from it
    • Hopefully this lets us rely on fewer labels
  • Training cross-lingual models
  • Universal embedding space (see the third sketch after this list)
    • One of the conceptually simpler approaches
    • Map all the words into a common embedding space
    • Train the model on the joint features
    • Mapping is easier if there is a common language
  • Scaling to distant languages is important
  • Powerful source dataset is needed
  • The difficulty of the target task matters: for reasonably simple binary or multi-class classification, roughly 50 to 200 labeled examples can be good enough
  • More complex tasks require more training examples
  • OpenAI used "TL;DR" as a prompt for summarization
  • Transfer learning is useful for many types of tasks
  • Applying an existing pre-trained model to your own task is pretty easy
  • Even for larger tasks, fine-tuning takes only a couple of hours
  • More methodological developments are needed
  • Can generate datasets
  • Improving models and Improving techniques
  • Long-term dependencies are still difficult to capture
  • BERT tries to address this by using a large window to capture context
  • Most models still capture mainly short-term contextual information
  • Exploring other architectures + Exploring challenging datasets
  • Near term: scaling up the training of large models yields more performance. Expect at least a couple of larger canonical models
  • Making models smaller also matters: keep most of the benefits of large models without having to deal with their size
  • Lots of NLP datasets are available
  • Developing new datasets is very useful for understanding the shortcomings of models
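
The computer-vision bullets above mention reusing a model pre-trained on ImageNet and stacking new layers on top of it. Below is a minimal sketch of that idea; the choice of PyTorch/torchvision, ResNet-18, and a 5-class head are my own assumptions for illustration, not details from the podcast.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load a ResNet-18 pre-trained on ImageNet (the exact weights argument
# differs slightly across torchvision versions).
model = models.resnet18(pretrained=True)

# Freeze every pre-trained weight so only the new head is updated.
for param in model.parameters():
    param.requires_grad = False

# Replace the final classification layer with a new head for the target
# task; the 5 output classes here are hypothetical.
model.fc = nn.Linear(model.fc.in_features, 5)

# Only the new layer's parameters are handed to the optimizer.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
```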
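
Several bullets contrast fine-tuning a pre-trained model directly with freezing it and training only new layers on top. The sketch below shows both options for a pre-trained language model; the Hugging Face transformers library, the bert-base-uncased checkpoint, and the toy sentiment batch are assumptions made for this example.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

use_features_only = True  # True: frozen encoder; False: full fine-tuning

if use_features_only:
    # Freeze the pre-trained encoder; only the classification head learns.
    for param in model.bert.parameters():
        param.requires_grad = False

optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=2e-5
)

# One toy training step on a two-example sentiment batch.
batch = tokenizer(["great movie", "terrible movie"],
                  return_tensors="pt", padding=True)
labels = torch.tensor([1, 0])
loss = model(**batch, labels=labels).loss
loss.backward()
optimizer.step()
```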
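
The "universal embedding space" bullets describe mapping words from different languages into one common space. One conceptually simple way to do this (an illustrative choice, not necessarily the method discussed in the podcast) is to learn an orthogonal linear map from a small seed dictionary of translation pairs; the random vectors and index pairs below are placeholders for real monolingual embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 300                              # embedding dimension
src = rng.normal(size=(1000, d))     # source-language word vectors
tgt = rng.normal(size=(1000, d))     # target-language word vectors

# Seed dictionary: index pairs of words known to be translations
# (hypothetical indices for illustration).
pairs = [(0, 3), (1, 7), (2, 1), (4, 4), (5, 9)]
X = src[[s for s, _ in pairs]]
Y = tgt[[t for _, t in pairs]]

# Orthogonal Procrustes: W = argmin ||XW - Y||_F with W orthogonal,
# solved in closed form from the SVD of X^T Y.
U, _, Vt = np.linalg.svd(X.T @ Y)
W = U @ Vt

# src @ W now lives (approximately) in the same space as tgt, so
# cross-lingual nearest-neighbour lookups become possible.
mapped_src = src @ W
```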

Need to work on the basics of neural networks and then move on to transfer learning.