Name: An Investigation of Intermediate Representations in Spoken Language Models
Start: 2025-04-10T11:00:00-07:00
End: 2025-04-10T12:00:00-07:00

ISI Natural Language Seminar

An Investigation of Intermediate Representations in Spoken Language Models

When

Thursday, April 10, 2025 11:00am - 12:00pm PDT

Presenter

Presented by:

Tolúlọpẹ́ Ògúnrẹ̀mí, Stanford University

Location

Conference Room #689. In-person attendance is permitted for USC/ISI faculty, staff, and students only. The seminar is open to the public virtually via Zoom.

This event is open to:

Everyone

Event Details

Location: Conf Room CR#689

Speaker: Tolúlọpẹ́ Ògúnrẹ̀mí, Stanford University

REMINDER:

Meeting hosts will only admit online guests they know to the Zoom meeting, so you are strongly encouraged to sign into Zoom with your USC account.

If you are an outside visitor, please email us at (nlg-seminar-host(at)isi.edu) at least one business day before the event so that we can admit you. Include your full name, job title, and professional affiliation, and specify whether you will attend remotely or in person. Please arrive at least 10 minutes before the seminar begins.

If you do not have access to the 6th floor for in-person attendance, please check in at the main reception desk on the 10th floor to register as a visitor, and someone will escort you to the conference room.

https://usc.zoom.us/j/93979709729?pwd=v8abin7zGE0E7jWy4cGoEj8vyyFlUT.1

Meeting ID: 939 7970 9729

Passcode: 804448

Spoken language models, large language models trained to process speech and audio inputs by leveraging speech encoder representations, have rapidly increased in popularity as a new modelling approach to speech processing tasks. These models train modality adapters to adapt speech encoder output into language model input.

In this work, we use CommonVoice and FLEURS automatic speech recognition (ASR) data in several languages to investigate the output of the modality adapter of spoken language models. We introduce an algorithm to determine whether the modality adapter output resembles a transcription, transliteration or a semantic representation of the speech. We also find that the representation of a language in the language model affects the modality adapter output and transcription abilities of the spoken language models.

Speaker Bio

Tolúlọpẹ́ Ògúnrẹ̀mí is a Computer Science PhD candidate at Stanford University in the Stanford NLP Group. Her work focusses on speech and language processing for low-resource languages, currently African languages. Her research combines linguistic investigations of these languages and community-based projects that integrate the concerns of local language communities with technological advances. Previously, she completed a Master's in Speech and Language Processing at the University of Edinburgh.

If the speaker approves recording of this seminar, the recording will be posted on the USC/ISI YouTube page within 1-2 business days: https://www.youtube.com/user/USCISI

Subscribe here to learn more about upcoming seminars: https://www.isi.edu/events/

For more information on the NL Seminar series and upcoming talks, please visit: https://www.isi.edu/research-groups-nlg/nlg-seminars/

Hosts: Jonathan May and Katy Felkner

This program is open to all eligible individuals. Information Sciences Institute operates all of its programs and activities consistent with the University’s Notice of Non-Discrimination. Eligibility is not determined based on race, sex, ethnicity, sexual orientation, or any other prohibited factor.