Kyle Polich talks with Liang Huang about his work at Baidu on simultaneous translation. The following points are covered in the podcast:

  • Most of the advertised cross-language translation products, such as Skype, do not do simultaneous translation. They wait for the speaker to finish and only then translate, which is consecutive translation
  • Simultaneous translation trades off between accuracy and latency
  • You cannot wait too long before translating
  • Prefix-to-Prefix method of translating
  • What dataset is used?
  • What’s the accuracy measure - BLEU
  • Different from general translation - There is a time pressure
  • A human interpreter can sustain only 10 minutes of simultaneous translation
  • The main challenge is sentence structure. In German and Japanese, the order is Subject-Object-Verb. Languages use a weird mix of SOV, SVO, etc.
  • Both the input and the target side are produced incrementally
  • A variation of the seq2seq model - the prefix-to-prefix model is very easy to code
  • 2 million sentence pairs - Chinese-English pairs
  • We use the BLEU score for translation quality - string-level similarity between human translation and machine translation
  • The higher the BLEU score, the better
  • For one given Chinese sentence, there can be tons of valid English translations
  • This is unlike a classification task, which has a unique correct answer
  • Ideal situation - Using simultaneous translation to date a foreigner
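
The prefix-to-prefix idea from the notes above can be sketched in a few lines. This is only an illustration, not the system discussed in the podcast: it assumes a "wait-k" style schedule (start translating after k source words, then emit one target word per new source word), and the names `wait_k_schedule`, `prefix_to_prefix_decode` and `toy_predict_next` are hypothetical. The toy predictor stands in for a trained seq2seq model.

```python
from typing import Callable, List, Optional

# Hypothetical predictor type: given the visible source prefix and the
# target prefix produced so far, return the next target token, or None
# to signal the end of the sentence.
Predictor = Callable[[List[str], List[str]], Optional[str]]


def wait_k_schedule(source_len: int, k: int, target_len: int) -> List[int]:
    """Number of source tokens visible when each target token is produced.

    Assumes a wait-k style policy: target position t (0-based) sees the
    first k + t source tokens, capped at the full source length.
    """
    return [min(k + t, source_len) for t in range(target_len)]


def prefix_to_prefix_decode(
    source: List[str], k: int, predict_next: Predictor, max_len: int = 50
) -> List[str]:
    """Translate incrementally, using only a source prefix at each step."""
    target: List[str] = []
    while len(target) < max_len:
        # Only the first k + len(target) source tokens have "arrived" so far.
        visible = source[: min(k + len(target), len(source))]
        token = predict_next(visible, target)
        if token is None:
            break
        target.append(token)
    return target


def toy_predict_next(visible: List[str], target: List[str]) -> Optional[str]:
    """Toy stand-in for a model: 'translates' by upper-casing, word by word."""
    position = len(target)
    if position >= len(visible):
        return None  # nothing left to translate
    return visible[position].upper()


if __name__ == "__main__":
    words = ["ich", "habe", "einen", "apfel", "gegessen"]
    print(wait_k_schedule(len(words), k=2, target_len=len(words)))
    print(prefix_to_prefix_decode(words, k=2, predict_next=toy_predict_next))
```

A small k keeps latency low but forces the model to commit before it has seen, for example, a sentence-final verb; a larger k gives more context at the cost of delay. This is the accuracy versus latency trade-off mentioned above.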