Elizabeth Salesky

About Me

Hi! I’m Liz Salesky (/lɪz səˈlɛski/), a PhD student at the Center for Language and Speech Processing at Johns Hopkins University, advised by Matt Post and Philipp Koehn.
I am very lucky to be supported by the Apple Scholars in AI/ML PhD fellowship.

My research primarily focuses on language representations for machine translation and multilinguality, including alternatives to traditional tokenization, multimodal representation learning, and how to create more data-efficient and robust models. I am also interested in studying and modeling variation within and across languages.

Previously, I received my MSc from CMU in 2019, advised by Alex Waibel, collaborating often with the KIT ISL lab and Alan W Black. Before that, I worked at MIT Lincoln Laboratory from 2012 to 2017, where I focused on machine translation and language learning applications. I graduated from Dartmouth College in 2012, where I majored in Linguistics and Math.

When not at my computer, I like to learn languages, run, and bike to ice cream!

Publications

2023
Multilingual Pixel Representations for Translation and Effective Cross-lingual Transfer Elizabeth Salesky, Neha Verma, Philipp Koehn, Matt Post EMNLP 2023 · pdf · code · data · models · presentation
Text Rendering Strategies for Pixel Language Models Jonas F. Lotz, Elizabeth Salesky, Phillip Rust, Desmond Elliott EMNLP 2023 · pdf · code · models
Evaluating multilingual speech translation under realistic conditions with resegmentation and terminology Elizabeth Salesky, Kareem Darwish, Mohamed Al-Badrashiny, Mona Diab, Jan Niehues IWSLT 2023 · pdf · new ACL 60-60 data · presentation · demo
Findings of the IWSLT 2023 Evaluation Campaign Milind Agarwal, ..., Elizabeth Salesky, ..., + many more IWSLT 2023 · pdf · data · website
A Holistic Cascade System, Benchmark, and Human Evaluation Protocol for Expressive Speech-to-Speech Translation Wen-Chin Huang, Benjamin Peloquin, Justine Kao, Changhan Wang, Hongyu Gong, Elizabeth Salesky, Yossi Adi, Ann Lee, Peng-Jen Chen ICASSP 2023 · pdf · demo · website
Language Modelling with Pixels Phillip Rust, Jonas F. Lotz, Emanuele Bugliarello, Elizabeth Salesky, Miryam de Lhoneux, Desmond Elliott ICLR 2023 · notable top-5% · pdf · code · data · models · demo
2022
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model BigScience Workshop: Teven Le Scao, ..., Elizabeth Salesky, ..., + many more arXiv preprint · pdf · code · data · models
BibleTTS: a large, high-fidelity, multilingual, and uniquely African speech corpus Josh Meyer, David Ifeoluwa Adelani, Edresson Casanova, Alp Öktem, Daniel Whitenack, Julian Weber, Salomon Kabongo, Elizabeth Salesky, Iroro Orife, Colin Leong, Perez Ogayo, Chris Emezue, Jonathan Mukiibi, Salomey Osei, Apelete Agbolo, Victor Akinode, Bernard Opoku, Samuel Olanrewaju, Jesujoba Alabi, Shamsuddeen Muhammad INTERSPEECH 2022 · pdf · data · code · website
UniMorph 4.0: Universal Morphology Khuyagbaatar Batsuren, Omer Goldman, ..., Elizabeth Salesky, ..., + many more LREC 2022 · pdf · data · website
Findings of the IWSLT 2022 Evaluation Campaign Antonios Anastasopoulos, ..., Elizabeth Salesky, ..., + many more IWSLT 2022 · pdf · data · website
2021
Between words and characters: A Brief History of Open-Vocabulary Modeling and Tokenization in NLP Sabrina J. Mielke, Zaid Alyafeai, Elizabeth Salesky, Colin Raffel, Manan Dey, Matthias Gallé, Arun Raja, Chenglei Si, Wilson Y. Lee, Benoît Sagot, Samson Tan arXiv preprint · pdf
Assessing Evaluation Metrics for Speech-to-Speech Translation Elizabeth Salesky, Julian Mäder, Severin Klinger ASRU 2021 · pdf · data
Robust Open-Vocabulary Translation from Visual Text Representations Elizabeth Salesky, David Etter, Matt Post EMNLP 2021 · pdf · data · code · presentation
A surprisal–duration trade-off across and within the world's languages Tiago Pimentel, Clara Meister, Elizabeth Salesky, Simone Teufel, Damián Blasi, Ryan Cotterell EMNLP 2021 · pdf · data · code
The Multilingual TEDx Corpus for Speech Recognition and Translation Elizabeth Salesky, Matthew Wiesner, Jacob Bremerman, Roldano Cattoni, Matteo Negri, Marco Turchi, Douglas W. Oard, Matt Post INTERSPEECH 2021 · pdf · data · code: kaldi · code: fairseq · code: vecalign · website
Findings of the IWSLT 2021 Evaluation Campaign Antonios Anastasopoulos, Ondřej Bojar, Jacob Bremerman, Roldano Cattoni, Maha Elbayad, Marcello Federico, Xutai Ma, Satoshi Nakamura, Matteo Negri, Jan Niehues, Juan Pino, Elizabeth Salesky, Sebastian Stüker, Katsuhito Sudoh, Marco Turchi, Alexander Waibel, Changhan Wang, Matthew Wiesner IWSLT 2021 · pdf · data · website
SIGTYP 2021 Shared Task: Robust Spoken Language Identification Elizabeth Salesky, Badr M. Abdullah, Sabrina Mielke, Elena Klyachko, Oleg Serikov, Edoardo Maria Ponti, Ritesh Kumar, Ryan Cotterell, Ekaterina Vylomova SIGTYP 2021 · pdf · data · website · presentation
2020
SIGTYP 2020 Shared Task: Prediction of Typological Features Johannes Bjerva, Elizabeth Salesky, Sabrina J. Mielke, Aditi Chaudhary, Giuseppe G. A. Celano, Edoardo M. Ponti, Ekaterina Vylomova, Ryan Cotterell, Isabelle Augenstein SIGTYP 2020 · pdf · data · website
Relative Positional Encoding for Speech Recognition and Direct Translation Ngoc-Quan Pham, Thanh-Le Ha, Tuan-Nam Nguyen, Thai-Son Nguyen, Elizabeth Salesky, Sebastian Stueker, Jan Niehues, Alex Waibel INTERSPEECH 2020 · pdf · code · presentation
A Corpus For Large-Scale Phonetic Typology Elizabeth Salesky, Eleanor Chodroff, Tiago Pimentel, Matthew Wiesner, Ryan Cotterell, Alan W Black, Jason Eisner ACL 2020 · pdf · data (gDrive) · data (OSF) · code · presentation · slides · website
Generalized Entropy Regularization or: There's Nothing Special about Label Smoothing Clara Meister, Elizabeth Salesky, Ryan Cotterell ACL 2020 · pdf · code · presentation · slides
Phone Features Improve Speech Translation Elizabeth Salesky, Alan W Black ACL 2020 · pdf · data · code · presentation · slides
Findings of the 2020 IWSLT Evaluation Campaign Ebrahim Ansari, Nguyen Bach, Ondřej Bojar, Roldano Cattoni, Fahim Dalvi, Nadir Durrani, Marcello Federico, Christian Federmann, Jiatao Gu, Fei Huang, Kevin Knight, Xutai Ma, Ajay Nagesh, Matteo Negri, Jan Niehues, Juan Pino, Elizabeth Salesky, Xing Shi, Sebastian Stüker, Marco Turchi, Alex Waibel, Changhan Wang IWSLT 2020 · pdf · data · code · website
SIGMORPHON 2020 Shared Task 0: Typologically Diverse Morphological Inflection Ekaterina Vylomova, Jennifer White, Elizabeth Salesky, Sabrina J Mielke, Shijie Wu, Edoardo Ponti, Rowan Hall Maudslay, Ran Zmigrod, Josef Valvoda, Svetlana Toldova, Francis Tyers, Elena Klyachko, Ilya Yegorov, Natalia Krizhanovsky, Paula Czarnowska, Irene Nikkarinen, Andrew Krizhanovsky, Tiago Pimentel, Lucas Torroba Hennigen, Christo Kirov, Garrett Nicolai, Adina Williams, Antonios Anastasopoulos, Hilaria Cruz, Eleanor Chodroff, Ryan Cotterell, Miikka Silfverberg, Mans Hulden SIGMORPHON 2020 · pdf · data · code · presentation · website
Optimizing Segmentation Granularity for Neural Machine Translation Elizabeth Salesky, Andrew Runge, Alex Coda, Jan Niehues, Graham Neubig Machine Translation 2020. arXiv:1810.08641 Oct. 2018 · pdf · code
2019
The IWSLT 2019 Evaluation Campaign Jan Niehues, Roldano Cattoni, Sebastian Stüker, Matteo Negri, Marco Turchi, Thanh-Le Ha, Elizabeth Salesky, Ramon Sanabria, Loïc Barrault, Lucia Specia, Marcello Federico IWSLT 2019 · pdf · website
Exploring Phoneme-Level Speech Representations for End-to-End Speech Translation Elizabeth Salesky, Matthias Sperber, Alan W Black ACL 2019 · pdf · data · code · presentation
CMU-01 at the SIGMORPHON 2019 Shared Task on Crosslinguality and Context in Morphology Aditi Chaudhary, Elizabeth Salesky, Gayatri Bhat, David R. Mortensen, Jaime G. Carbonell, Yulia Tsvetkov SIGMORPHON 2019 · Interpretability Prize · pdf · data · code · poster · website
Fluent Translations from Disfluent Speech in End-to-End Speech Translation Elizabeth Salesky, Matthias Sperber, Alex Waibel NAACL 2019 · pdf · original data · new fluent data · code · poster
2018
Towards Fluent Translations from Disfluent Speech Elizabeth Salesky, Susanne Burger, Alex Waibel SLT 2018 · pdf · original data · new fluent data · code · poster
KIT Lecture Translator: Multilingual Speech Translation with One-Shot Learning Florian Dessloch, Thanh-Le Ha, Markus Müller, Jan Niehues, Thai-Son Nguyen, Ngoc-Quan Pham, Elizabeth Salesky, Matthias Sperber, Sebastian Stüker, Thomas Zenkel, Alex Waibel COLING 2018 · pdf · demo
2017
KIT’s Multilingual Neural Machine Translation systems for IWSLT 2017 Ngoc-Quan Pham, Matthias Sperber, Elizabeth Salesky, Thanh-Le Ha, Jan Niehues, Alex Waibel IWSLT 2017 · pdf
The AFRL-MITLL WMT17 Systems: Old, New, Borrowed, BLEU Jeremy Gwinnup, Timothy Anderson, Michaeel Kazi, Elizabeth Salesky, Grant Erdmann, Katherine Young, Brian Thompson, Jonathan Taylor WMT 2017 · pdf
2016
The MITLL-AFRL IWSLT 2016 Systems Michaeel Kazi, Elizabeth Salesky, Brian Thompson, Jonathon Taylor, Jeremy Gwinnup, Timothy Anderson, Grant Erdmann, Eric Hansen, Brian Ore, Katherine Young, Michael Hutt IWSLT 2016 · pdf
The AFRL-MITLL WMT16 News-Translation Task Systems Jeremy Gwinnup, Timothy Anderson, Michaeel Kazi, Elizabeth Salesky, Grant Erdmann, Katherine Young, Brian Thompson WMT 2016 · pdf
Operational Assessment of Keyword Search on Oral History Elizabeth Salesky, Jessica Ray, Wade Shen LREC 2016 · pdf
2015
The MITLL-AFRL IWSLT 2015 MT System Michaeel Kazi, Brian Thompson, Elizabeth Salesky, Timothy Anderson, Grant Erdmann, Eric Hansen, Brian Ore, Jeremy Gwinnup, Katherine Young, Michael Hutt, Christina May IWSLT 2015 · pdf
The AFRL-MITLL WMT15 System: There’s More than One Way to Decode It! Jeremy Gwinnup, Timothy Anderson, Michaeel Kazi, Elizabeth Salesky, Grant Erdmann, Katherine Young, Brian Thompson, Christina May WMT 2015 · pdf
2014
The MITLL-AFRL IWSLT 2014 MT System Michaeel Kazi, Elizabeth Salesky, Brian Thompson, Jessica Ray, Michael Coury, Wade Shen, Tim Anderson, Grant Erdmann, Jeremy Gwinnup, Katherine Young, Brian Ore, Michael Hutt IWSLT 2014 · pdf
Exploiting Morphological, Grammatical, and Semantic Correlates for Improved Text Difficulty Assessment Elizabeth Salesky, Wade Shen BEA 2014 · pdf
2013
The MIT-LL/AFRL IWSLT-2013 MT system Michaeel Kazi, Michael Coury, Elizabeth Salesky, Jessica Ray, Wade Shen, Terry Gleason, Tim Anderson, Grant Erdmann, Lane Schwartz, Brian Ore, Raymond Slyh, Jeremy Gwinnup, Katherine Young, Michael Hutt IWSLT 2013 · pdf
A Language-Independent Approach to Automatic Text Difficulty Assessment for Second-Language Learners Wade Shen, Jennifer Williams, Tamas Marius, Elizabeth Salesky PITR 2013 · pdf