Name: MrT5: Dynamic Token Merging for Efficient Byte-level Language Models
Start: 2025-03-27T04:00:00-07:00
End: 2025-03-27T05:00:00-07:00

ISI Natural Language Seminar

MrT5: Dynamic Token Merging for Efficient Byte-level Language Models

When

Thursday, March 27, 2025 11:00am - 12:00pm PDT

Presenter

Presented by:

Julie Kallini, Stanford University

Location

Conference Rm #689. In-person attendance is permitted for USC/ISI faculty, staff, and students only. The seminar is open to the public virtually via the Zoom link.

This event is open to:

Everyone

Event Details

Location: CR#689 ISI-MDR

Speaker: Julie Kallini, Stanford University

REMINDER:

Meeting hosts admit only online guests they know to the Zoom meeting, so you are strongly encouraged to sign into Zoom with your USC account.

If you are an outside visitor, please email us at nlg-seminar-host(at)isi.edu at least one business day before the event so that we know you are attending and can admit you. Specify whether you will attend remotely or in person, and provide your full name, job title, and professional affiliation. Please arrive at least 10 minutes before the seminar begins.

If you do not have access to the 6th floor for in-person attendance, please check in at the main reception desk on the 10th floor to register as a visitor, and someone will escort you to the conference room.

https://usc.zoom.us/j/92986255795?pwd=mbJqNRr6isZBQ9mn643fgalO5gksDs.1

Meeting ID: 929 8625 5795

Passcode: 804448

Models that rely on subword tokenization have significant drawbacks, such as sensitivity to character-level noise like spelling errors and inconsistent compression rates across different languages and scripts. While character- or byte-level models like ByT5 attempt to address these concerns, they have not gained widespread adoption—processing raw byte streams without tokenization results in significantly longer sequence lengths, making training and inference inefficient. This work introduces MrT5 (MergeT5), a more efficient variant of ByT5 that integrates a token deletion mechanism in its encoder to dynamically shorten the input sequence length. After processing through a fixed number of encoder layers, a learned delete gate determines which tokens are to be removed and which are to be retained for subsequent layers. MrT5 effectively “merges” critical information from deleted tokens into a more compact sequence, leveraging contextual information from the remaining tokens. In continued pre-training experiments, we find that MrT5 can achieve significant gains in inference runtime with minimal effect on performance, as measured by bits-per-byte. Additionally, with multilingual training, MrT5 adapts to the orthographic characteristics of each language, learning language-specific compression rates. Furthermore, MrT5 shows comparable accuracy to ByT5 on downstream evaluations such as XNLI, TyDi QA, and character-level tasks while reducing sequence lengths by up to 75%. Our approach presents a solution to the practical limitations of existing byte-level models.
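For attendees new to the idea, the deletion step described in the abstract can be pictured with a toy sketch. This is not the MrT5 implementation: the gate form (a sigmoid over a linear projection), the 0.5 threshold, and every name and number below are assumptions chosen for illustration. In the model described above, the gate is learned and is applied after a fixed number of encoder layers.

```python
import math

def delete_gate_scores(hidden, weights, bias):
    """Score each token with a sigmoid gate; scores near 0 mean 'delete'.

    The gate form is an assumption for illustration, not MrT5's definition.
    """
    scores = []
    for vec in hidden:
        logit = sum(h * w for h, w in zip(vec, weights)) + bias
        scores.append(1.0 / (1.0 + math.exp(-logit)))
    return scores

def apply_hard_deletion(hidden, scores, threshold=0.5):
    """Keep only tokens whose gate score is at least the threshold."""
    kept = [i for i, s in enumerate(scores) if s >= threshold]
    return [hidden[i] for i in kept], kept

# Toy input: 6 byte positions with hidden size 2 (all values hypothetical).
hidden = [[2.0, 0.1], [-1.5, 0.3], [0.8, -0.2],
          [-2.0, 0.5], [1.2, 0.0], [-0.7, 0.4]]
weights = [1.0, 0.0]  # hypothetical gate parameters, not learned
bias = 0.0

scores = delete_gate_scores(hidden, weights, bias)
shortened, kept_positions = apply_hard_deletion(hidden, scores)
reduction = 1 - len(shortened) / len(hidden)
print(kept_positions, f"{reduction:.0%} shorter")
```

Later encoder layers would then run on `shortened` only, which is where the runtime savings would come from. The sketch covers only the removal step, not how information from deleted tokens is carried into the retained ones.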

Speaker Bio

Julie Kallini is a second-year Ph.D. student in Computer Science at Stanford University, advised by Christopher Potts and Dan Jurafsky. Her research focuses on natural language processing (NLP), with an emphasis on computational linguistics/cognitive science, tokenization, and model architecture. Her paper, "Mission: Impossible Language Models," won a Best Paper Award at ACL 2024. Her work is supported by the NSF Graduate Research Fellowship, the Stanford School of Engineering Graduate Fellowship, and the Stanford EDGE Fellowship. Before starting her Ph.D., Julie was a software engineer at Meta, where she worked on machine learning for advertisements. Julie graduated summa cum laude from Princeton University with a B.S.E. in Computer Science and a minor in Linguistics.

If the speaker approves recording of this seminar, the recording will be posted on the USC/ISI YouTube page within 1-2 business days: https://www.youtube.com/user/USCISI

Subscribe here to learn more about upcoming seminars: https://www.isi.edu/events/

For more information on the NL Seminar series and upcoming talks, please visit: https://www.isi.edu/research-groups-nlg/nlg-seminars/

Hosts: Jonathan May and Katy Felkner

This program is open to all eligible individuals. Information Sciences Institute operates all of its programs and activities consistent with the University’s Notice of Non-Discrimination. Eligibility is not determined based on race, sex, ethnicity, sexual orientation, or any other prohibited factor.