A week-long gathering of MT researchers, developers, students and users
About the Event
The Machine Translation Marathon is a week-long gathering of machine translation researchers, developers, students, and users from around the world. It combines hands-on project work with lectures, tutorials, and keynote presentations.
The 2026 edition will be hosted by the Karlsruhe Institute of Technology (KIT) in Karlsruhe, Germany.
MTM 2026 will feature:
Timeline
Programme
Each Marathon follows a tried-and-tested format balancing learning, hands-on work, and collaboration. The detailed schedule will be announced closer to the event.
Invited presentations from leading researchers and practitioners at the forefront of MT.
MT fundamentals, advanced topics and hands-on tutorials led by experts from academia and industry.
Teams form around proposed topics on day one. Most projects result in a final presentation, and many continue long after the event.
Open session for short abstracts on MT/NLP research, open-source tools, and work in progress.
Events designed to bring participants, lecturers and local organisers together informally.
Teams share project results on Friday afternoon with all participants and invited guests.
Specific talk titles and times will be filled in as the event approaches.
We invite students, developers, and researchers to submit 1-page abstracts for the open poster session. All relevant submissions will be accepted after a light review for topical scope.
Suitable topics include:
Submission deadline: 5 Sep 2026
Register & Submit Abstract
Speakers & Lecturers
More keynote speakers and tutorial instructors will be announced as confirmations come in.
As speech translation systems and speech-enabled large language models (LLMs) continue to scale, efficiency is emerging as a central challenge rather than a secondary concern. But what does it actually mean for a speech model to be efficient? In this talk, we discuss the many axes along which efficiency can be measured and the metrics used for each, highlighting their implications and limitations. We then explore practical approaches to improving the efficiency of speech models and speech LLMs (and beyond), spanning architectural choices, inference strategies, and post-training solutions, and consider their effectiveness and performance trade-offs. Finally, we outline open research questions and potential directions for future work on the topic.
Marco Gaido is a researcher at Fondazione Bruno Kessler (Trento, Italy), specializing in speech translation. He obtained his PhD in Information and Communication Technology from the University of Trento in 2023, graduating cum laude with a thesis on direct speech translation systems. His research focuses on improving the quality, efficiency, and explainability of SpeechLLMs, with 60+ peer-reviewed publications at top-tier venues such as ACL, EMNLP, ICLR, and TACL. His work has received multiple awards, including the ACL Outstanding Paper & SAC Award, the COLING Outstanding Paper Award, and the Anthony C. Clarke Award for the 2023 EAMT Best Thesis, and he is a member of ELLIS. In addition to his research contributions, he has experience in large-scale systems and open-source development, having contributed to frameworks such as Apache Spark, NeMo, and fairseq, and he previously worked in big data engineering roles.
The race to scale LLMs has long been obsessed with quantity, but the real frontier is engineering data quality. Despite the "model collapse" narrative surrounding AI-generated content, we'll show how to systematically curate and synthesize high-quality pre-training data at scale. Using our 628-billion-word German corpus [1] as a case study, we'll walk through the mechanics of a modern pre-training data pipeline — from heuristic filters to synthetic generation — and show the measurable gains these choices deliver when training from scratch. This is a no-nonsense look at what actually makes data work for LLMs today.
Letitia Parcalabescu holds a PhD in Computational Linguistics and a background in Physics and Computer Science. She is an AI researcher at Aleph Alpha Research, focusing on training interpretable reasoning models and curating and synthesizing data for large-scale pretraining. Letitia also runs the YouTube channel AI Coffee Break with Letitia, where she explains cutting-edge AI research papers and tech.
The rise of Large Language Models (LLMs) has deeply influenced the field of machine translation. Given their instruction-following and multilingual capabilities, translation has become "one more task" for many developers. But translation remains a very nuanced problem. In this talk we will see how years of research on machine translation, and specifically on machine translation evaluation, inform the development of multilingual LLMs. We will put special emphasis on incorporating quality metrics in all stages of the development process, from pre-training through reinforcement learning. We will also address how LLMs are influencing the development of evaluation metrics, while pointing out their inherent limitations.
For more than 20 years, David Vilar has been actively involved in machine translation research, taking part in the evolution from statistical phrase-based models to neural machine translation and LLM-based systems. He is currently a Staff Research Scientist at Google, where he works on the Google Translate and Gemini teams. His research focuses on leveraging evaluation metrics to drive modern generative models. In addition to his scientific publications, he has contributed to the community by releasing several open-source toolkits and open-weight models, including Jane, Sockeye, Gemma 3 and TranslateGemma.
Sign Up
Registration is free of charge for EAMT members. Space is limited — register early to help with planning.
Complete the registration form (link below) and optionally submit a short abstract for the poster session. The form will remain open until the start of the event, but early registration is appreciated.
Location
KIT Karlsruhe is centrally located in southwestern Germany and easy to reach from anywhere in Europe.
The 2026 Machine Translation Marathon will be held at the Karlsruhe Institute of Technology (KIT), one of Germany's largest and most prestigious research universities, in Karlsruhe, Baden-Württemberg.
The specific building and room will be confirmed closer to the event. Please check back for updates.
Karlsruhe is well connected by train, plane and car.
Direct ICE connections from Frankfurt (1h), Stuttgart (40 min), and Paris (2h). Book via DB Bahn.
Nearest airports: Frankfurt (FRA, ~1h by ICE), Stuttgart (STR, ~1h), and Karlsruhe/Baden-Baden (FKB).
KIT is well served by Karlsruhe's tram network. Tram lines 2 and S2 stop at Durlacher Tor/KIT-Campus Süd. KVV tickets cover the whole region.
Where to Stay
The following hotels are informal recommendations by the organisers and offer convenient access to the KIT Institute for Anthropomatics by tram or on foot.
Support the Event
We welcome organisations working in MT, NLP, language services, and related fields. Sponsoring MT Marathon 2026 connects you with researchers and developers from across Europe and beyond.
We are finalising sponsorship packages. Confirmed sponsors will be listed here with their logos. Interested in sponsoring? Get in touch.