About Me
Hi! I’m Liz Salesky (/lɪz səˈlɛski/), a PhD student at the Center for Language and Speech Processing at Johns Hopkins University, advised by Matt Post and Philipp Koehn.
I am very lucky to be supported by the Apple Scholars in AI/ML PhD fellowship.
My research primarily focuses on language representations for machine translation and multilinguality, including alternatives to traditional tokenization, multimodal representation learning, and more data-efficient and robust models. I am also interested in studying and modeling variation within and across languages.
Previously, I received my MSc from CMU in 2019, advised by Alex Waibel and collaborating often with the KIT ISL lab and Alan W Black. Before that, I worked at MIT Lincoln Laboratory from 2012 to 2017 on machine translation and language learning applications. I graduated from Dartmouth College in 2012, where I majored in Linguistics and Math.
When not at my computer, I like to learn languages, run, and bike to ice cream!
Publications
2023
Multilingual Pixel Representations for Translation and Effective Cross-lingual Transfer
Text Rendering Strategies for Pixel Language Models
Evaluating multilingual speech translation under realistic conditions with resegmentation and terminology
Findings of the IWSLT 2023 Evaluation Campaign
A Holistic Cascade System, Benchmark, and Human Evaluation Protocol for Expressive Speech-to-Speech Translation
Language Modelling with Pixels

2022
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
BibleTTS: a large, high-fidelity, multilingual, and uniquely African speech corpus
UniMorph 4.0: Universal Morphology
Findings of the IWSLT 2022 Evaluation Campaign

2021
Between words and characters: A Brief History of Open-Vocabulary Modeling and Tokenization in NLP
Assessing Evaluation Metrics for Speech-to-Speech Translation
Robust Open-Vocabulary Translation from Visual Text Representations
A surprisal–duration trade-off across and within the world's languages
The Multilingual TEDx Corpus for Speech Recognition and Translation
Findings of the IWSLT 2021 Evaluation Campaign
SIGTYP 2021 Shared Task: Robust Spoken Language Identification

2020
SIGTYP 2020 Shared Task: Prediction of Typological Features
Relative Positional Encoding for Speech Recognition and Direct Translation
A Corpus For Large-Scale Phonetic Typology
Generalized Entropy Regularization or: There's Nothing Special about Label Smoothing
Phone Features Improve Speech Translation
Findings of the 2020 IWSLT Evaluation Campaign
SIGMORPHON 2020 Shared Task 0: Typologically Diverse Morphological Inflection
Optimizing Segmentation Granularity for Neural Machine Translation

2019
The IWSLT 2019 Evaluation Campaign
Exploring Phoneme-Level Speech Representations for End-to-End Speech Translation
CMU-01 at the SIGMORPHON 2019 Shared Task on Crosslinguality and Context in Morphology
Fluent Translations from Disfluent Speech in End-to-End Speech Translation

2018
Towards Fluent Translations from Disfluent Speech
KIT Lecture Translator: Multilingual Speech Translation with One-Shot Learning

2017
KIT’s Multilingual Neural Machine Translation systems for IWSLT 2017
The AFRL-MITLL WMT17 Systems: Old, New, Borrowed, BLEU

2016
The MITLL-AFRL IWSLT 2016 Systems
The AFRL-MITLL WMT16 News-Translation Task Systems
Operational Assessment of Keyword Search on Oral History

2015
The MITLL-AFRL IWSLT 2015 MT System
The AFRL-MITLL WMT15 System: There’s More than One Way to Decode It!

2014
The MITLL-AFRL IWSLT 2014 MT System
Exploiting Morphological, Grammatical, and Semantic Correlates for Improved Text Difficulty Assessment

2013
The MIT-LL/AFRL IWSLT-2013 MT system
A Language-Independent Approach to Automatic Text Difficulty Assessment for Second-Language Learners