Seminars and Events
An Investigation of Intermediate Representations in Spoken Language Models
Event Details
Location: Conf Room CR#689
Speaker: Tolúlọpẹ́ Ògúnrẹ̀mí, Stanford University
REMINDER:
Meeting hosts admit only online guests they recognize to the Zoom meeting, so you are strongly encouraged to sign in to Zoom with your USC account.
If you are an outside visitor, please email us at (nlg-seminar-host(at)isi.edu) at least one business day before the event so we can admit you. Specify whether you will attend remotely or in person, and provide your full name, job title, and professional affiliation. In-person attendees should arrive at least 10 minutes before the seminar begins.
If you do not have access to the 6th Floor for in-person attendance, please check in at the 10th floor main reception desk to register as a visitor, and someone will escort you to the conference room.
https://usc.zoom.us/j/93979709729?pwd=v8abin7zGE0E7jWy4cGoEj8vyyFlUT.1
Meeting ID: 939 7970 9729
Passcode: 804448
Spoken language models, large language models trained to process speech and audio inputs by leveraging speech encoder representations, have rapidly increased in popularity as a new modelling approach to speech processing tasks. These models train modality adapters to adapt speech encoder output into language model input.
In this work, we use CommonVoice and FLEURS automatic speech recognition (ASR) data in several languages to investigate the output of the modality adapter of spoken language models. We introduce an algorithm to determine whether the modality adapter output resembles a transcription, a transliteration, or a semantic representation of the speech. We also find that the representation of a language in the language model affects both the modality adapter output and the transcription abilities of the spoken language models.
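For readers unfamiliar with modality adapters, the following is a minimal sketch of the general idea, not the speaker's method or the algorithm introduced in the talk. It assumes a common adapter design (stacking consecutive encoder frames, then applying a linear projection into the language model's embedding space) and one possible way to inspect the result (looking up the nearest token embedding by cosine similarity). All function names, dimensions, and the random data are hypothetical.

```python
import numpy as np


def stack_frames(frames: np.ndarray, k: int) -> np.ndarray:
    """Concatenate every k consecutive frames, dropping leftover frames."""
    n, d = frames.shape
    usable = (n // k) * k
    return frames[:usable].reshape(n // k, k * d)


def adapt(frames: np.ndarray, weight: np.ndarray, bias: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical adapter: downsample by frame stacking, then project linearly."""
    return stack_frames(frames, k) @ weight + bias


def nearest_tokens(vectors: np.ndarray, embeddings: np.ndarray) -> np.ndarray:
    """Return, for each vector, the index of the most cosine-similar embedding row."""
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return np.argmax(v @ e.T, axis=1)


# Toy data standing in for real model components (assumed shapes).
rng = np.random.default_rng(0)
frames = rng.normal(size=(10, 4))      # 10 speech encoder frames, dimension 4
weight = rng.normal(size=(8, 6))       # maps 2 stacked frames (2 * 4) to LM dimension 6
bias = np.zeros(6)
embeddings = rng.normal(size=(20, 6))  # toy LM embedding table with 20 tokens

adapted = adapt(frames, weight, bias, k=2)
token_ids = nearest_tokens(adapted, embeddings)
print(adapted.shape, token_ids.shape)
```

In a real system the weights would be trained and the embedding table would come from the language model; decoding the nearest token IDs back to text is one way to see whether the adapter output looks like a transcription, a transliteration, or something else.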