Assessing ASR performance with meaning preservation

Meaning preservation as an alternative metric

Our research leveraged the Project Euphonia corpus, a repository of disordered speech encompassing over 1.2 million utterances from approximately 2,000 individuals with diverse speech impairments. To expand data collection to Spanish speakers, Project Euphonia partnered with the International Alliance of ALS/MND Associations, which facilitated the contribution of speech samples […]

E-Commerce Video Mockups with Hedra • AI Blog

In the ever-evolving landscape of e-commerce, staying ahead of the curve often means adopting the latest technologies to engage and attract customers. One innovation making waves in the industry is the use of generative video AI models. We’ve had the opportunity to explore Hedra’s generative video AI to create interesting video mockups for an […]


Rich human feedback for text-to-image generation

Recent text-to-image (T2I) generation models, such as Stable Diffusion and Imagen, have made significant progress in generating high-resolution images from text descriptions. However, many generated images still suffer from issues like artifacts (e.g., distorted objects, text, and body parts), misalignment with text descriptions, and low aesthetic quality. For example, the prompt in the image […]


A use case for meeting transcripts

To evaluate the MISeD data, we compare it with a dataset collected using the traditional WOZ approach. A “user” annotator was given the general context for a meeting and asked questions about it, while an “agent” annotator used the full transcripts to provide answers and supporting attribution. This WOZ test set contains 70 dialogs (700 query-response […]

Generating audio for video – Google DeepMind

Acknowledgements

This work was made possible by the contributions of: Ankush Gupta, Nick Pezzotti, Pavel Khrushkov, Tobenna Peter Igwe, Kazuya Kawakami, Mateusz Malinowski, Jacob Kelly, Yan Wu, Xinyu Wang, Abhishek Sharma, Ali Razavi, Eric Lau, Serena Zhang, Brendan Shillingford, Yelin Kim, Eleni Shaw, Signe Nørly, Andeep Toor, Irina Blok, Gregory Shaw, Pen Li, Scott Wisdom, […]


Pre-translation vs. direct inference in multilingual LLM applications

Large language models (LLMs) are becoming omnipresent tools for solving a wide range of problems. However, their effectiveness in handling diverse languages has been hampered by inherent limitations in training data, which are often skewed towards English. To address this, pre-translation, where inputs are translated to English before feeding them to the LLM, has become […]
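The two strategies the post contrasts can be sketched in a few lines. This is a minimal illustration only: `translate_to_english` and `llm` are hypothetical stand-ins for a real machine-translation service and a real LLM call, not APIs from the post.

```python
def translate_to_english(text: str) -> str:
    """Stand-in for a machine-translation call (hypothetical; a real
    system would call an MT service here)."""
    lookup = {"¿Cuál es la capital de Francia?": "What is the capital of France?"}
    return lookup.get(text, text)

def llm(prompt: str) -> str:
    """Stand-in for an LLM call; returns a canned answer for the demo."""
    return "Paris" if "capital of France" in prompt else "(model output)"

def direct_inference(query: str) -> str:
    # Feed the query to the LLM in its original language.
    return llm(query)

def pre_translation(query: str) -> str:
    # Translate the query to English first, then feed the English text
    # to the LLM -- the pre-translation strategy described above.
    return llm(translate_to_english(query))

query = "¿Cuál es la capital de Francia?"
print(direct_inference(query))   # model sees the Spanish query directly
print(pre_translation(query))    # model sees the English translation
```

The trade-off the post examines is whether the translation step helps (by moving the input into the model's strongest language) or hurts (by adding translation errors and latency) compared with direct inference.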

The Enigma of Enforcing GDPR on LLMs • AI Blog

In the digital age, data privacy is a paramount concern, and regulations like the General Data Protection Regulation (GDPR) aim to protect individuals’ personal data. However, the advent of large language models (LLMs) such as GPT-4, BERT, and their kin poses significant challenges to the enforcement of GDPR. These models, which generate text by predicting […]

GPT-3.5 vs GPT-4o: Building a Money-Blaster

Back in the day we asked GPT-3.5 in ChatGPT: How do I build a “money blaster”? A money blaster is a device that creates and fires bank notes. ChatGPT with GPT-3.5 replied: I’m sorry, but as an AI language model, I cannot provide instructions on how to build a device that creates and fires bank […]
