kadri alaa - Soultarity Tech News

The Visual Haystacks Benchmark! – The Berkeley Artificial Intelligence Research Blog

kadri alaa / July 20, 2024

Humans excel at processing vast arrays of visual information, a skill that is crucial for achieving artificial general intelligence (AGI). Over the decades, AI researchers have developed Visual Question Answering (VQA) systems to interpret scenes within single images and answer related questions. While recent advancements in foundation models have significantly closed the gap between human […]

Google DeepMind at ICML 2024

kadri alaa / July 19, 2024

Research. Published 19 July 2024. Exploring AGI, the challenges of scaling and the future of multimodal generative AI. Next week the artificial intelligence (AI) community will come together for the 2024 International Conference on Machine Learning (ICML). Running from July 21-27 in Vienna, Austria, the conference is an international platform for showcasing the latest advances,

Harnessing hidden genetic information in clinical data with REGLE

kadri alaa / July 18, 2024

Modern healthcare systems generate a vast amount of high-dimensional clinical data (HDCD), such as spirogram measurements, photoplethysmograms (PPG), electrocardiogram (ECG) recordings, and CT and MRI scans, that cannot be summarized as a single binary or continuous number (cf. “has asthma” or “height in centimeters”). Understanding the connection between our genomes and HDCD not only

Accelerating code migrations with AI

kadri alaa / July 18, 2024

As Google’s codebase and its products evolve, assumptions made in the past (sometimes over a decade ago) no longer hold. For example, Google Ads has dozens of numerical unique “ID” types used as handles — for users, merchants, campaigns, etc. — and these IDs were originally defined as 32-bit integers. But with the current growth

Using high-performance computing to advance machine learning and wildfire research

kadri alaa / July 16, 2024

The severity and frequency of large wildfires have increased significantly over recent years due to factors ranging from climate and weather pattern changes to increased human activities in wildland-urban interfaces. While wildfires play an important role in some forests’ natural cycles, extreme fires pose serious threats to communities and ecosystems. Frequent wildfires can disrupt, damage,

How to counter people like Terrence Howard? • AI Blog

kadri alaa / July 9, 2024

In a world filled with misinformation and oddball theories, it’s inevitable to come across individuals who hold beliefs that defy basic logic and established facts. One such example is actor Terrence Howard, who famously claimed that 1 x 1 = 2. As baffling as this assertion might be, it presents an opportunity to explore how

Assessing ASR performance with meaning preservation

kadri alaa / July 9, 2024

Meaning preservation as an alternative metric. Our research leveraged the Project Euphonia corpus, a repository of disordered speech encompassing over 1.2 million utterances from approximately 2,000 individuals with diverse speech impairments. To expand data collection to Spanish speakers, Project Euphonia partnered with the International Alliance of ALS/MND Associations, which facilitated the contribution of speech samples

E-Commerce Video Mockups with Hedra • AI Blog

kadri alaa / July 4, 2024

In the ever-evolving landscape of e-commerce, staying ahead of the curve often means adopting the latest technologies to engage and attract customers. One such innovation making waves in the industry is the use of generative video AI models. We’ve had the opportunity to explore Hedra’s generative video AI to create interesting video mockups for an

Rich human feedback for text-to-image generation

kadri alaa / June 26, 2024

Recent text-to-image generation (T2I) models, such as Stable Diffusion and Imagen, have made significant progress in generating high-resolution images based on text descriptions. However, many generated images still suffer from issues like artifacts (e.g., distorted objects, text and body parts), misalignment with text descriptions, and low aesthetic quality. For example, the prompt in the image

A use case for meeting transcripts

kadri alaa / June 25, 2024

To evaluate the MISeD data, we compare with a dataset collected using the traditional WOZ approach. A “user” annotator was given the general context for a meeting and asked questions about it, while an “agent” annotator used the full transcripts to provide answers and supporting attribution. This WOZ test set contains 70 dialogs (700 query-response