Simultaneous Translation

Kyle Polich talks with Liang Huang about his work at Baidu on simultaneous translation. The podcast covers the following points:

Most advertised cross-language translation services, such as Skype, do not do simultaneous translation. They wait for the speaker to finish and only then translate: Skype does consecutive translation, not simultaneous translation
Simultaneous translation trades off accuracy against latency
The translator cannot wait too long before starting to translate
The prefix-to-prefix method of translation: each part of the output is produced from only the prefix of the source sentence received so far
What dataset is used?
What is the accuracy measure? BLEU
Unlike general translation, there is time pressure
A human interpreter can sustain only 10 minutes of simultaneous translation at a time
The main challenge is word order. In German and Japanese the order is Subject-Object-Verb (SOV), while English is Subject-Verb-Object (SVO); languages use a mix of SOV, SVO and other orders, and translating between them is the challenge
The source input arrives incrementally and the target output is generated incrementally
The prefix-to-prefix model is a variation of the seq2seq model and is very easy to code
Dataset: 2 million Chinese-English sentence pairs
The BLEU score is used for translation quality - an n-gram (string-level) similarity between the human translation and the machine translation
The higher the BLEU score, the better
For a given Chinese sentence there can be many valid English translations
This is unlike a classification task, where each input has a single correct label
An ideal scenario - using simultaneous translation to date a foreigner
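
A minimal sketch of prefix-to-prefix decoding, assuming a wait-k schedule (one simple policy in which the translator first waits for k source words, then alternates between reading one word and writing one word). The predictor argument stands in for a trained model and is hypothetical, as are the function names; the toy lexicon only translates word by word so the loop can run. Because the toy never reorders, it renders the verb-final German sentence as "I have the apple eaten", which illustrates why SOV-to-SVO word order is the main challenge.

```python
from typing import Callable, List, Tuple

# Hypothetical model interface: (source prefix, target prefix) -> next target word
Predictor = Callable[[List[str], List[str]], str]


def wait_k_read_counts(src_len: int, tgt_len: int, k: int) -> List[int]:
    """Source words read before writing each target word under wait-k:
    target word t (1-indexed) waits for min(k + t - 1, src_len) source words."""
    return [min(k + t - 1, src_len) for t in range(1, tgt_len + 1)]


def simultaneous_decode(
    source_stream: List[str],
    predict_next: Predictor,
    k: int,
    max_len: int = 50,
    eos: str = "</s>",
) -> Tuple[List[str], List[int]]:
    """Prefix-to-prefix decoding with a wait-k schedule (illustrative sketch).

    Each target word is predicted from the source prefix read so far and the
    target prefix already written; the rest of the source is never consulted.
    Returns the target words and, for each, how many source words had been read.
    """
    target: List[str] = []
    reads: List[int] = []
    while len(target) < max_len:
        # READ: stay k words behind the speaker until the source is exhausted
        read = min(k + len(target), len(source_stream))
        # WRITE: the model only sees source_stream[:read]
        word = predict_next(source_stream[:read], target)
        if word == eos:
            break
        target.append(word)
        reads.append(read)
    return target, reads


# Hypothetical toy "model": a monotonic word-for-word lexicon lookup.
LEXICON = {"ich": "I", "habe": "have", "den": "the", "Apfel": "apple", "gegessen": "eaten"}


def toy_predict(src_prefix: List[str], tgt_prefix: List[str]) -> str:
    i = len(tgt_prefix)
    if i < len(src_prefix):
        return LEXICON.get(src_prefix[i], src_prefix[i])
    return "</s>"


if __name__ == "__main__":
    source = "ich habe den Apfel gegessen".split()
    words, reads = simultaneous_decode(source, toy_predict, k=2)
    print(" ".join(words))  # I have the apple eaten
    print(reads)            # [2, 3, 4, 5, 5]
```

Setting k to the source length or more makes every read count equal the full sentence length, which is consecutive translation of the Skype kind; a small k lowers latency but forces the model to commit before the verb arrives, which is the accuracy/latency trade-off.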
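
The notes describe the prefix-to-prefix model as an easy-to-code variation of seq2seq. One way to express the restriction in an attention-based seq2seq model is a mask over source positions, so that target step t attends only to the source prefix read so far. This sketch assumes a wait-k schedule, the function names are hypothetical, and it is an illustration rather than the implementation discussed in the episode. It only shows the cross-attention mask; in practice the encoder must also avoid looking past the prefix. Mask conventions differ between frameworks (PyTorch's nn.MultiheadAttention, for example, treats True in a boolean mask as blocked), so the mask may need inverting.

```python
from typing import List


def wait_k_source_mask(src_len: int, tgt_len: int, k: int) -> List[List[bool]]:
    """mask[t][j] is True when target step t (0-indexed) may attend to source word j.

    Under a wait-k schedule, step t has read min(k + t, src_len) source words,
    so it may only attend to that source prefix.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    return [[j < min(k + t, src_len) for j in range(src_len)] for t in range(tgt_len)]


def render(mask: List[List[bool]]) -> str:
    """One row per target step: 'x' = visible source word, '.' = not yet read."""
    return "\n".join("".join("x" if ok else "." for ok in row) for row in mask)


if __name__ == "__main__":
    print(render(wait_k_source_mask(src_len=6, tgt_len=5, k=2)))
    # xx....
    # xxx...
    # xxxx..
    # xxxxx.
    # xxxxxx
```

The staircase shape is the whole difference from a full-sentence model, whose mask would be all x.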
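
To make the BLEU points concrete, here is a minimal sentence-level BLEU sketch in the standard form: clipped n-gram precisions for n = 1..4, combined by a geometric mean and multiplied by a brevity penalty. Clipping against several references lets the metric credit any of the valid English translations of one Chinese sentence. The function names are illustrative and the score is unsmoothed; published scores are corpus-level (counts pooled over the whole test set) and are usually computed with a standard tool such as sacreBLEU.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count the n-grams (as tuples) in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis: List[str], references: List[List[str]], max_n: int = 4) -> float:
    """Unsmoothed BLEU in [0, 1] for one hypothesis against one or more references."""
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hypothesis, n)
        # Clip each n-gram count by its maximum count in any single reference
        max_ref_counts: Counter = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref_counts[gram] = max(max_ref_counts[gram], count)
        matches = sum(min(c, max_ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        if matches == 0 or total == 0:
            return 0.0  # unsmoothed: any zero precision gives BLEU = 0
        log_precision_sum += math.log(matches / total)
    # Brevity penalty against the reference length closest to the hypothesis length
    hyp_len = len(hypothesis)
    ref_len = min((len(r) for r in references), key=lambda r: (abs(r - hyp_len), r))
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_precision_sum / max_n)


if __name__ == "__main__":
    hyp = "there is a cat on the mat".split()
    ref_a = "the cat is on the mat".split()
    ref_b = "there is a cat on the mat".split()
    print(sentence_bleu(hyp, [ref_a]))         # 0.0
    print(sentence_bleu(hyp, [ref_a, ref_b]))  # 1.0
```

The hypothesis scores 0.0 against the first reference alone (they share no 4-gram) but 1.0 once the second, equally valid, translation is added, which is why a single reference underrates translations that differ from it only in wording. Scores are often reported multiplied by 100.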
