
The following are the learnings from the podcast:

  • Transfer learning entails reusing existing models: take a model trained on one task and apply it to a different task
  • Value can be delivered without custom feature engineering. Most of the recent successes have been in computer vision
    • If you do not have a lot of training data, you can use a model that is already trained on a large image dataset such as ImageNet (see the first sketch after this list)
    • Once the pre-trained model is loaded, additional layers can be stacked on top of it
  • In language, the source and target data are the same text: the pre-training signal comes from the data itself (predicting the next word, neighboring words, etc.)
  • How do you determine whether a transfer target is reasonable? Domain adaptation: the task remains the same, but the source and target domains are different
    • For example, transferring a sentiment model across different review categories
    • Create a similarity metric and then check whether the tasks are similar
    • If the tasks are similar, then one can apply transfer learning
  • At a practitioner level, leverage the information from a different domain
  • A key choice is whether to update the pre-trained weights or keep them frozen. To adapt one model to many different tasks, freeze it and train several new layers on top of it
  • Is an "ImageNet moment" for NLP around the corner? It appears we have already reached it
    • Either fine-tune the model directly or use the pre-trained model's features (the second sketch after this list contrasts the two)
  • A plethora of pre-trained models is now available, e.g., XLNet
  • Domain expertise: word representations can be drawn from pre-trained models
  • Leverage the labels of the existing data
  • Leverage the data
  • Image recognition people vs NLP people
    • Language is more challenging
  • Deal with different languages
    • Models have to learn a lot more information
    • Societal context needs to be learned from the data
    • Particular parts of the image
  • How do different images relate to each other?
  • Unlabeled data: we can still obtain pre-trained representations from it
    • Hopefully this lets us rely on fewer labels
  • Training cross-lingual models
  • Universal embedding space (see the third sketch after this list)
    • One of the conceptually simpler approaches
    • Map all the words into a common embedding space
    • Train the model on the joint features
    • Mapping is easier if there is a common language
  • Scaling to distant languages is important
  • Powerful source dataset is needed
  • The difficulty of the target task matters: for reasonably simple binary or multi-class classification, roughly 50 to 200 labeled examples can be good enough
  • More complex tasks require more training examples
  • OpenAI used "TL;DR" as a prompt for summarization
  • Transfer learning is useful for many types of tasks
  • Applying an existing pre-trained model to your own task is pretty easy
  • Even for larger tasks, fine-tuning takes only a couple of hours
  • More methodological developments are needed
  • Can generate datasets
  • Improving models and Improving techniques
  • Long-term dependencies are still difficult to capture
  • BERT tries to address this by using a large window to capture context
  • Most models still capture mainly short-term contextual information
  • Exploring other architectures + Exploring challenging datasets
  • Near term: scaling up the training of large models yields more performance. Expect at least a couple of larger canonical models
  • Making models smaller also matters: keep most of the benefits of large models without having to deal with their size
  • Lots of NLP datasets are available
  • Developing new datasets is very useful for understanding the shortcomings of models
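
The computer-vision bullets above mention reusing a model pre-trained on ImageNet and stacking new layers on top of it. Below is a minimal sketch of that idea; the choice of PyTorch/torchvision, ResNet-18, and a 5-class head are my own assumptions for illustration, not details from the podcast.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load a ResNet-18 pre-trained on ImageNet (the exact weights argument
# differs slightly across torchvision versions).
model = models.resnet18(pretrained=True)

# Freeze every pre-trained weight so only the new head is updated.
for param in model.parameters():
    param.requires_grad = False

# Replace the final classification layer with a new head for the target
# task; the 5 output classes here are hypothetical.
model.fc = nn.Linear(model.fc.in_features, 5)

# Only the new layer's parameters are handed to the optimizer.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
```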
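
Several bullets contrast fine-tuning a pre-trained model directly with freezing it and training only new layers on top. The sketch below shows both options for a pre-trained language model; the Hugging Face transformers library, the bert-base-uncased checkpoint, and the toy sentiment batch are assumptions made for this example.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

use_features_only = True  # True: frozen encoder; False: full fine-tuning

if use_features_only:
    # Freeze the pre-trained encoder; only the classification head learns.
    for param in model.bert.parameters():
        param.requires_grad = False

optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=2e-5
)

# One toy training step on a two-example sentiment batch.
batch = tokenizer(["great movie", "terrible movie"],
                  return_tensors="pt", padding=True)
labels = torch.tensor([1, 0])
loss = model(**batch, labels=labels).loss
loss.backward()
optimizer.step()
```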
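
The "universal embedding space" bullets describe mapping words from different languages into one common space. One conceptually simple way to do this (an illustrative choice, not necessarily the method discussed in the podcast) is to learn an orthogonal linear map from a small seed dictionary of translation pairs; the random vectors and index pairs below are placeholders for real monolingual embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 300                              # embedding dimension
src = rng.normal(size=(1000, d))     # source-language word vectors
tgt = rng.normal(size=(1000, d))     # target-language word vectors

# Seed dictionary: index pairs of words known to be translations
# (hypothetical indices for illustration).
pairs = [(0, 3), (1, 7), (2, 1), (4, 4), (5, 9)]
X = src[[s for s, _ in pairs]]
Y = tgt[[t for _, t in pairs]]

# Orthogonal Procrustes: W = argmin ||XW - Y||_F with W orthogonal,
# solved in closed form from the SVD of X^T Y.
U, _, Vt = np.linalg.svd(X.T @ Y)
W = U @ Vt

# src @ W now lives (approximately) in the same space as tgt, so
# cross-lingual nearest-neighbour lookups become possible.
mapped_src = src @ W
```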

Need to work on the basics of neural networks and then move on to transfer learning.