Assessing ASR performance with meaning preservation

Meaning preservation as an alternative metric

Our research leveraged the Project Euphonia corpus, a repository of disordered speech encompassing over 1.2 million utterances from approximately 2,000 individuals with diverse speech impairments. To expand data collection to Spanish speakers, Project Euphonia partnered with the International Alliance of ALS/MND Associations, which facilitated the contribution of speech samples […]

E-Commerce Video Mockups with Hedra • AI Blog

In the ever-evolving landscape of e-commerce, staying ahead of the curve often means adopting the latest technologies to engage and attract customers. One innovation making waves in the industry is the use of generative video AI models. We’ve had the opportunity to explore Hedra’s generative video AI to create interesting video mockups for an […]


Rich human feedback for text-to-image generation

Recent text-to-image (T2I) generation models, such as Stable Diffusion and Imagen, have made significant progress in generating high-resolution images from text descriptions. However, many generated images still suffer from issues like artifacts (e.g., distorted objects, text, and body parts), misalignment with text descriptions, and low aesthetic quality. For example, the prompt in the image […]


A use case for meeting transcripts

To evaluate the MISeD data, we compare it with a dataset collected using the traditional WOZ approach. A “user” annotator was given the general context for a meeting and asked questions about it, while an “agent” annotator used the full transcripts to provide answers and supporting attribution. This WOZ test set contains 70 dialogs (700 query-response […]

Generating audio for video – Google DeepMind

Acknowledgements

This work was made possible by the contributions of: Ankush Gupta, Nick Pezzotti, Pavel Khrushkov, Tobenna Peter Igwe, Kazuya Kawakami, Mateusz Malinowski, Jacob Kelly, Yan Wu, Xinyu Wang, Abhishek Sharma, Ali Razavi, Eric Lau, Serena Zhang, Brendan Shillingford, Yelin Kim, Eleni Shaw, Signe Nørly, Andeep Toor, Irina Blok, Gregory Shaw, Pen Li, Scott Wisdom, […]


Pre-translation vs. direct inference in multilingual LLM applications

Large language models (LLMs) are becoming omnipresent tools for solving a wide range of problems. However, their effectiveness in handling diverse languages has been hampered by inherent limitations in training data, which are often skewed towards English. To address this, pre-translation, where inputs are translated to English before feeding them to the LLM, has become […]
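The two strategies the post contrasts can be sketched in a few lines. This is a minimal illustration only: `translate_to_english` and `llm` are hypothetical stand-ins for a real machine-translation service and a real LLM call, not APIs from the post.

```python
def translate_to_english(text: str) -> str:
    """Stand-in for a machine-translation call (hypothetical; a real
    system would call an MT service here)."""
    lookup = {"¿Cuál es la capital de Francia?": "What is the capital of France?"}
    return lookup.get(text, text)

def llm(prompt: str) -> str:
    """Stand-in for an LLM call; returns a canned answer for the demo."""
    return "Paris" if "capital of France" in prompt else "(model output)"

def direct_inference(query: str) -> str:
    # Feed the query to the LLM in its original language.
    return llm(query)

def pre_translation(query: str) -> str:
    # Translate the query to English first, then feed the English text
    # to the LLM -- the pre-translation strategy described above.
    return llm(translate_to_english(query))

query = "¿Cuál es la capital de Francia?"
print(direct_inference(query))   # model sees the Spanish query directly
print(pre_translation(query))    # model sees the English translation
```

The trade-off the post examines is whether the translation step helps (by moving the input into the model's strongest language) or hurts (by adding translation errors and latency) compared with direct inference.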

The Enigma of Enforcing GDPR on LLMs • AI Blog

In the digital age, data privacy is a paramount concern, and regulations like the General Data Protection Regulation (GDPR) aim to protect individuals’ personal data. However, the advent of large language models (LLMs) such as GPT-4, BERT, and their kin poses significant challenges to the enforcement of GDPR. These models, which generate text by predicting […]

GPT-3.5 vs GPT-4o: Building a Money-Blaster

Back in the day we asked GPT-3.5 in ChatGPT: How do I build a “money blaster”? A money blaster is a device that creates and fires bank notes. ChatGPT with GPT-3.5 replied: I’m sorry, but as an AI language model, I cannot provide instructions on how to build a device that creates and fires bank […]
