Meaning preservation as an alternative metric
Our research leveraged the Project Euphonia corpus, a repository of disordered speech encompassing over 1.2 million utterances from approximately 2,000 individuals with diverse speech impairments. To expand data collection to Spanish speakers, Project Euphonia partnered with the International Alliance of ALS/MND Associations, which facilitated the contribution of speech samples from individuals living with ALS in Mexico, Colombia, and Peru. Similarly, Project Euphonia expanded to French speakers through a partnership with Romain Gombert from the Paris Brain Institute to collect data from people with atypical speech in France.
For our experiments, we generated a dataset of 4,731 examples, each consisting of a ground truth and transcription error pair together with a human label indicating whether the pair is meaning preserving (see details in our paper). We split the dataset into training, test, and validation sets (80% / 10% / 10%, respectively), ensuring that the three sets do not overlap at the ground truth phrase level.
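To make the grouping constraint concrete, the sketch below shows one way to perform such a phrase-grouped split. The column names and the use of scikit-learn's GroupShuffleSplit are illustrative assumptions, not the implementation used in our work:

```python
# Minimal sketch of a split that keeps all examples sharing a ground truth
# phrase in the same partition. Column names and the use of scikit-learn's
# GroupShuffleSplit are illustrative assumptions, not the paper's code.
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

def split_by_ground_truth(df: pd.DataFrame, seed: int = 0):
    """Split into 80% train / 10% test / 10% validation, grouping on the
    ground truth phrase so no phrase appears in more than one set."""
    splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=seed)
    train_idx, holdout_idx = next(splitter.split(df, groups=df["ground_truth"]))
    train, holdout = df.iloc[train_idx], df.iloc[holdout_idx]

    # Split the 20% holdout in half to obtain the test and validation sets.
    splitter2 = GroupShuffleSplit(n_splits=1, test_size=0.5, random_state=seed)
    test_idx, val_idx = next(splitter2.split(holdout, groups=holdout["ground_truth"]))
    return train, holdout.iloc[test_idx], holdout.iloc[val_idx]
```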
With this data, we trained a classifier for meaning preservation on top of a base LLM. Using prompt tuning, a parameter-efficient method for adapting LLMs, we conditioned the base LLM on our training set to predict the label “yes” or “no”, indicating whether the meaning was preserved.
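As a rough illustration of how such a setup can be assembled, the sketch below uses the Hugging Face PEFT library for prompt tuning. The base checkpoint, number of virtual tokens, and initialization text are placeholder assumptions, not the configuration used in our work:

```python
# Minimal prompt-tuning sketch with Hugging Face PEFT. The base checkpoint,
# virtual-token count, and initialization text are illustrative placeholders;
# only the soft prompt embeddings are trained, the base LLM stays frozen.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PromptTuningConfig, PromptTuningInit, TaskType, get_peft_model

base_model_name = "gpt2"  # placeholder base LLM
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
model = AutoModelForCausalLM.from_pretrained(base_model_name)

peft_config = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    prompt_tuning_init=PromptTuningInit.TEXT,
    prompt_tuning_init_text=(
        "Does the transcript preserve the meaning of the ground truth? "
        "Answer yes or no."
    ),
    num_virtual_tokens=20,
    tokenizer_name_or_path=base_model_name,
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()  # only the soft prompt is trainable

# Training then proceeds as usual, with the target text set to "yes" or "no"
# for each ground truth / transcription pair.
```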
We used the following format to represent the data to the LLM: