MT Marathon 2026 – Karlsruhe, Germany

Programme

What to Expect

Each Marathon follows a tried-and-tested format balancing learning, hands-on work, and collaboration. The detailed schedule will be announced closer to the event.

Keynote Talks

Invited presentations from leading researchers and practitioners at the forefront of MT.

Tutorials

MT fundamentals, advanced topics and hands-on tutorials led by experts from academia and industry.

Collaborative Projects

Teams form around proposed topics on day one. Most projects result in a final presentation — many continue long after.

Poster Session

Open session for short abstracts on MT/NLP research, open-source tools, and work in progress.

Social Activities

Events designed to bring participants, lecturers and local organisers together informally.

Final Presentations

Teams share project results on Friday afternoon with all participants and invited guests.

Schedule at a Glance

Specific talk titles and times will be filled in as the event approaches.

Monday, 14 Sep

09:00–09:30  Registration
09:30–11:00  Opening Session: Jan Niehues & Keynote: The Interplay of MT Evaluation and Multilingual LLMs (David Vilar)
11:00–11:30  Coffee Break
11:30–13:00  Keynote: Lessons from the Trenches of Building LLM Translation Model from Scratch (Tom Kocmi)
13:00–14:00  Lunch Break
14:00–16:00  Project Proposals
16:00–16:30  Coffee Break
16:30–17:30  Project Work

Tuesday, 15 Sep

09:00–09:30  Registration
09:30–11:00  Keynote: Toward Machine Interpreting: Lessons Learned from Human Interpreting Studies (Matthias Sperber)
11:00–11:30  Coffee Break
11:30–13:00  Panel Discussion: Speech Translation Today: Progress, Gaps, and What's Next
13:00–14:00  Lunch Break
14:00–16:00  Poster Session
16:00–16:30  Coffee Break
16:30–17:30  Project Work

Wednesday, 16 Sep

09:00–09:30  Registration
09:30–11:00  Keynote: Towards Multilingual, Multimodal Foundation Models (Jan Niehues)
11:00–11:30  Coffee Break
11:30–13:00  Keynote Talk
13:00–14:00  Lunch Break
14:00–16:00  Tutorial: Human Evaluation of Multilingual Tasks (Vilém Zouhar · Maike Züfle · Patricia Schmidtova)
16:00–16:30  Coffee Break
16:30–17:30  Project Work

Thursday, 17 Sep

09:00–09:30  Registration
09:30–11:00  Keynote: Data is an LLM's Source Code: Lessons from Building a Massive German Corpus (Letitia Parcalabescu)
11:00–11:30  Coffee Break
11:30–13:00  Keynote Talk
13:00–14:00  Lunch Break
14:00–16:00  Project Work
16:00–16:30  Coffee Break
16:30–17:30  Project Work

Friday, 18 Sep

09:00–09:30  Registration
09:30–11:00  Keynote: What Makes a Speech LLM Efficient? Metrics, Trade-offs, Design Principles, and Post-training Solutions (Marco Gaido)
11:00–11:30  Coffee Break
11:30–13:00  Keynote: Tokenization for LLMs and Machine Translation: Morphological Plausibility, Cross-lingual Alignment, and What Actually Matters (Jindřich Libovický)
13:00–14:00  Lunch Break
14:00–16:00  Project Work
16:00–16:30  Coffee Break
16:30–17:30  Final Project Reports

Call for Poster Abstracts

We invite students, developers, and researchers to submit 1-page abstracts for the open poster session. All relevant submissions will be accepted after a light review for topical scope.

Suitable topics include:

Previously published MT/NLP results
Open-source tool demonstrations
Work in progress and preliminary findings
System descriptions and evaluations

Submission deadline: 5 Sep 2026

Speakers & Instructors

Invited Speakers

Marco Gaido

Fondazione Bruno Kessler (FBK), Trento, Italy

What Makes a Speech LLM Efficient? Metrics, Trade-offs, Design Principles, and Post-training Solutions

Abstract

As speech translation systems and speech-enabled large language models (LLMs) continue to scale, efficiency is emerging as a central challenge rather than a secondary concern. But what does it actually mean for a speech model to be efficient? In this talk, we discuss the many axes along which efficiency can be measured and the metrics used for each, highlighting their implications and limits. We then explore practical approaches to improving efficiency in speech models, speech LLMs, and beyond, ranging from architectural choices to inference strategies and post-training solutions, with attention to their effectiveness and performance trade-offs. In conclusion, we highlight open research questions and potential directions for future work on the topic.

Bio

Marco Gaido is a researcher at Fondazione Bruno Kessler (Trento, Italy), specializing in speech translation. He obtained his PhD in Information and Communication Technology from the University of Trento in 2023, graduating cum laude with a thesis on direct speech translation systems. His research focuses on improving the quality, efficiency, and explainability of SpeechLLMs, with 60+ peer-reviewed publications, including at top-tier venues such as ACL, EMNLP, ICLR, and TACL. His work has received multiple awards, including the ACL Outstanding Paper & SAC Award, the COLING Outstanding Paper Award, and the Anthony C. Clarke Award for the 2023 EAMT Best Thesis, and he is a member of ELLIS. In addition to his research contributions, he has experience in large-scale systems and open-source development: he has contributed to frameworks such as Apache Spark, NeMo, and fairseq, and previously worked in big data engineering roles.

Tom Kocmi

Cohere, Prague, Czechia

Lessons from the Trenches of Building LLM Translation Model from Scratch

Abstract

In this talk, we explore the reality of building a large language model for machine translation from scratch, sharing practical lessons on what actually works and what doesn't. A major focus is the shift away from simply chasing data quality: the important new part of the pipeline is data difficulty, because training models on high-quality but easy data no longer moves the needle. Alongside data strategies, we will detail the importance of iterating on evaluation and how meta-evaluation helped us find the right evaluation judges for setting up a reward that drives actual performance gains.

Bio

Tom Kocmi has been actively involved in MT research for over a decade. He is currently a Staff Researcher at Cohere, where he leads work on machine translation capabilities, focusing on evaluation and on making models strong at multilingual tasks. His research is guided by the question of which evaluation methods are trustworthy enough to steer model development. He also serves as the lead organizer of the WMT General MT Shared Task, a rigorous annual benchmark that evaluates state-of-the-art machine translation systems across diverse language pairs and drives MT research forward.

Jindřich Libovický

Charles University, Prague, Czechia

Tokenization for LLMs and Machine Translation: Morphological Plausibility, Cross-lingual Alignment, and What Actually Matters

Abstract

Subword tokenization is a core component of modern NLP systems, yet its properties are not fully understood. The talk will cover two aspects of tokenization quality: morphological plausibility and cross-lingual alignment, with attention to both how to measure them and whether they can be improved. On morphological plausibility, the talk will present segmentation methods that incorporate lexical and morphological information and discuss the limitations of existing evaluation metrics, proposing a more broadly applicable alternative. On cross-lingual alignment, the talk will examine whether token-level alignment between languages predicts cross-lingual transfer, how it relates to alignment in hidden states, and what happens when tokenizers are optimized directly for cross-lingual token alignment.

Bio

Jindřich Libovický is a researcher at Charles University, Prague, Czechia. His research focuses on multilingual language modeling, machine translation, and tokenization. He leads a research group working on cross-lingual alignment and fairness in language models.

Jan Niehues

Karlsruhe Institute of Technology, Karlsruhe, Germany

Towards Multilingual, Multimodal Foundation Models

Abstract

Multimodal foundation models open up new opportunities for supporting multilingual communication beyond conventional, direct speech translation. By jointly processing speech, text, and visual context, these models can help users understand not only what was said, but also provide user-specific support. This keynote reviews the full development stack required to turn these capabilities into practical multilingual communication tools. It begins with the design of dedicated benchmarks and evaluation scenarios that reflect realistic communication settings. It then presents recent techniques for context-aware and multimodal language support. Finally, the talk highlights open challenges.

Bio

Jan Niehues is a professor at the Karlsruhe Institute of Technology, where he leads the AI for Language Technologies group. His research focuses on machine translation, spoken language translation, multilingual large language models, and AI-supported communication. He received his doctorate from KIT in 2014 and has been involved in several national and European research projects on language technologies. He is also active in the spoken language translation community and currently serves as an organizer of IWSLT.

Letitia Parcalabescu

Aleph Alpha Research, Heidelberg, Germany

Data is an LLM's Source Code: Lessons from Building a Massive German Corpus

Abstract

The race to scale LLMs has long been obsessed with quantity, but the real frontier is engineering data quality. Despite the "model collapse" narrative surrounding AI-generated content, we'll show how to systematically curate and synthesize high-quality pre-training data at scale. Using our 628-billion-word German corpus [1] as a case study, we'll walk through the mechanics of a modern pre-training data pipeline — from heuristic filters to synthetic generation — and show the measurable gains these choices deliver when training from scratch. This is a no-nonsense look at what actually makes data work for LLMs today.

Bio

Letitia Parcalabescu holds a PhD in Computational Linguistics and a background in Physics and Computer Science. She is an AI researcher at Aleph Alpha Research, focusing on training interpretable reasoning models and curating and synthesizing data for large-scale pretraining. Letitia also runs the YouTube channel AI Coffee Break with Letitia, where she explains cutting-edge AI research papers and tech.

Matthias Sperber

Apple, Aachen, Germany

Toward Machine Interpreting: Lessons Learned from Human Interpreting Studies

Abstract

Current speech translation systems, while having achieved impressive accuracies, are rather static in their behavior and do not adapt to real-world situations in ways human interpreters do. In order to improve their practical usefulness and enable interpreting-like experiences, a precise understanding of the nature of human interpreting is crucial. To this end, we discuss human interpreting literature from the perspective of the machine translation field, while considering both operational and qualitative aspects. We identify implications for the development of speech translation systems and argue that there is great potential to adopt many human interpreting principles using recent modeling techniques. We hope that our findings provide inspiration for closing the perceived usability gap, and can motivate progress toward true machine interpreting.

Bio

Matthias Sperber is a research engineer on Apple's machine translation team. He has published extensively on speech translation and related fields and holds a doctorate from the Karlsruhe Institute of Technology for a dissertation titled End-to-End Neural Speech Translation. His research interests span the technical challenges of speech translation and interdisciplinary approaches to identifying and addressing real user needs.

David Vilar

Google DeepMind, Berlin, Germany

The Interplay of MT Evaluation and Multilingual LLMs

Abstract

The rise of Large Language Models (LLMs) has deeply influenced the field of machine translation. Given their instruction-following and multilingual capabilities, translation has become "one more task" for many developers. But translation remains a very nuanced problem. In this talk we will see how years of research on machine translation, and specifically on machine translation evaluation, influence the development of multilingual LLMs. We will put special emphasis on incorporating quality metrics in all stages of the development process, from pre-training down to reinforcement learning. We will also address how LLMs are influencing the development of evaluation metrics, while pointing out their inherent limitations.

Bio

For more than 20 years, David Vilar has been actively involved in machine translation research. In this time, he has participated in the evolution from statistical phrase-based models to neural machine translation systems and LLM-based systems. He is currently a Staff Research Scientist at Google, where he works on the Google Translate and Gemini teams. His core research focuses heavily on leveraging evaluation metrics to drive modern generative models. In addition to his scientific publications, he has contributed to the community with the release of several open-source and open-weight models, including Jane, Sockeye, Gemma 3 and TranslateGemma.

Tutorial

Vilém Zouhar

ETH Zurich, Switzerland

Maike Züfle

KIT Karlsruhe, Germany

Patricia Schmidtova

Charles University, Prague, Czechia

Human Evaluation of Multilingual Tasks → Tutorial materials

Abstract

Human evaluation is the gold standard for multilingual NLP but is frequently omitted due to operational complexity. This tutorial demonstrates how to design and execute rigorous human evaluation campaigns focusing on multilingual tasks (e.g. translation, multilingual, or multimodal evaluation), covering the full lifecycle: data selection, protocol selection, setting up the evaluation campaign, annotator management, and analysis of results. The practical focus will be on setting up the evaluation campaign with examples, while the theoretical part will be devoted to modern statistical techniques, such as turning pairwise preferences into absolute scores, or modelling benchmarking competitions. At the end, participants will have detailed knowledge of how to design, implement, and run high-quality human evaluation in their scientific and industry applications.

Instructor Bios

Vilém Zouhar
Vilém is a final-year PhD student at ETH Zurich and a Google PhD Fellow. He researches natural language processing, focusing on both theoretical and practical aspects of evaluation (human and automatic), and multilinguality. He leads the human evaluation effort at WMT and, more recently, the large-scale Last Translation Benchmark.

Maike Züfle
Maike Züfle is a PhD student at KIT Karlsruhe and an Apple AI/ML Fellow. Her research focuses on instruction-following speech models with speech as both input and output, in particular full-duplex models and speech evaluation. She co-organises the instruction-following and speech translation metrics shared tasks at IWSLT.

Patricia Schmidtova
Patricia Schmidtova is a PhD student at Charles University. She investigates the semantic accuracy (faithfulness) of NLG, specializing in evaluation methodology. She received best paper awards at EACL 2024 and INLG 2024. She serves as the student board member of SIGGEN and has co-organized five workshops, including GEM 2026.

More Speakers to be Announced

We are currently finalising our list of invited keynote speakers. Check back soon or contact us to be notified when speakers are confirmed.
