AI - Soultarity Tech News

The Download: how your data is being used to train AI, and why chatbots aren’t doctors

kadri alaa / July 21, 2025

Millions of images of passports, credit cards, birth certificates, and other documents containing personally identifiable information are likely included in one of the biggest open-source AI training sets, new research has found. Thousands of images—including identifiable faces—were found in a small subset of DataComp CommonPool, a major AI training set for image generation scraped from […]

Monthly Roundup #32: July 2025 — LessWrong

kadri alaa / July 21, 2025

Welcome to the monthly roundup of things that don’t fit into other categories and don’t rise to the level of their own posts. Bad News: When people tell you who they are, believe them (with obvious exceptions). In particular, if they explicitly describe themselves as evil, or demonic, or use other similar terms, definitely believe

Building a Smart Python-to-R Code Converter with Gemini AI-Powered Validation and Feedback

kadri alaa / July 21, 2025

class EnhancedPythonToRConverter:
    """
    Enhanced Python to R converter with Gemini AI validation
    """
    def __init__(self, gemini_api_key: str = None):
        self.validator = GeminiValidator(gemini_api_key)
        self.import_mappings = {
            'pandas': 'library(dplyr)\nlibrary(tidyr)\nlibrary(readr)',
            'numpy': 'library(base)',
            'matplotlib.pyplot': 'library(ggplot2)',
            'seaborn': 'library(ggplot2)\nlibrary(RColorBrewer)',
            'scipy.stats': 'library(stats)',
            'sklearn': 'library(caret)\nlibrary(randomForest)\nlibrary(e1071)',
            'statsmodels': 'library(stats)\nlibrary(lmtest)',
            'plotly': 'library(plotly)',
        }
        self.function_mappings = {
            'pd.DataFrame': 'data.frame',
            'pd.read_csv': 'read.csv',
            'pd.read_excel': 'read_excel',
            'df.head': 'head',
            'df.tail': 'tail',
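The excerpt stops before showing how the mappings are applied. As a rough sketch only, an import table like the one above could be used as follows. The helper name `convert_imports`, the regular expression, and the de-duplication step are assumptions made for illustration; they are not taken from the original post, and the table is a subset of the entries shown above.

```python
import re

# Subset of the import mappings shown in the excerpt above.
IMPORT_MAPPINGS = {
    "pandas": "library(dplyr)\nlibrary(tidyr)\nlibrary(readr)",
    "matplotlib.pyplot": "library(ggplot2)",
    "seaborn": "library(ggplot2)\nlibrary(RColorBrewer)",
    "scipy.stats": "library(stats)",
}

# Captures the module name in "import x.y", "import x.y as z" and "from x.y import ...".
_IMPORT_RE = re.compile(r"^\s*(?:import|from)\s+([A-Za-z_][\w.]*)")


def convert_imports(python_source: str) -> str:
    """Hypothetical helper: emit R library() calls for recognised Python imports."""
    r_lines = []
    for line in python_source.splitlines():
        match = _IMPORT_RE.match(line)
        if match is None:
            continue  # not an import statement
        mapped = IMPORT_MAPPINGS.get(match.group(1))
        if mapped is None:
            continue  # module has no entry in the table
        for r_line in mapped.split("\n"):
            if r_line not in r_lines:  # avoid repeating library(ggplot2) etc.
                r_lines.append(r_line)
    return "\n".join(r_lines)


if __name__ == "__main__":
    sample = "import pandas as pd\nimport seaborn as sns\nimport matplotlib.pyplot as plt\n"
    print(convert_imports(sample))
```

Unrecognised modules are skipped silently in this sketch; a real converter would more likely report them so the user knows which imports need manual translation.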

Autonomous High-Quality Image Editing Triplet Mining

kadri alaa / July 21, 2025

[Submitted on 18 Jul 2025] NoHumansRequired: Autonomous High-Quality Image Editing Triplet Mining, by Maksim Kuprashevich, Grigorii Alekseenko, Irina Tolstykh, Georgii Fedorov, Bulat Suleimanov, Vladimir Dokholyan, and Aleksandr Gordeev. Abstract: Recent advances in generative modeling enable image editing assistants that follow natural language

LLM-Guided Evaluation and Adversarial Generation of Safety-Critical Driving Scenarios

kadri alaa / July 21, 2025

[Submitted on 4 Feb 2025 (v1), last revised 18 Jul 2025 (this version, v4)] From Words to Collisions: LLM-Guided Evaluation and Adversarial Generation of Safety-Critical Driving Scenarios, by Yuan Gao and 4 other authors. Abstract: Ensuring the safety of autonomous vehicles requires virtual scenario-based testing,

[2411.03537] Two-Stage Pretraining for Molecular Property Prediction in the Wild

kadri alaa / July 21, 2025

[Submitted on 5 Nov 2024 (v1), last revised 18 Jul 2025 (this version, v2)] Two-Stage Pretraining for Molecular Property Prediction in the Wild, by Kevin Tirta Wijaya and 5 other authors. Abstract: Molecular deep learning models have achieved remarkable success in property prediction, but they

Automated Face Blurring and Human Movement Kinematics Extraction from Videos Recorded in Clinical Settings

kadri alaa / July 21, 2025

[Submitted on 21 Feb 2024 (v1), last revised 18 Jul 2025 (this version, v2)] SecurePose: Automated Face Blurring and Human Movement Kinematics Extraction from Videos Recorded in Clinical Settings, by Rishabh Bajpai and Bhooma Aravamuthan. Abstract: Movement disorder diagnosis often relies on expert evaluation of patient videos,

[2409.04617] Sparse Rewards Can Self-Train Dialogue Agents

kadri alaa / July 21, 2025

[Submitted on 6 Sep 2024 (v1), last revised 18 Jul 2025 (this version, v3)] Sparse Rewards Can Self-Train Dialogue Agents, by Barrett Martin Lattimer and 3 other authors. Abstract: Recent advancements in state-of-the-art (SOTA) Large Language Model (LLM) agents, especially in multi-turn dialogue tasks, have been primarily

A Survey of Long Chain-of-Thought for Reasoning Large Language Models

kadri alaa / July 21, 2025

[Submitted on 12 Mar 2025 (v1), last revised 18 Jul 2025 (this version, v5)] Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models, by Qiguang Chen and 9 other authors. Abstract: Recent advancements in reasoning with large language models (RLLMs), such

Uncertainty-Guided Progressive Learning for Evidence-Based Classification in Computed Tomography

kadri alaa / July 21, 2025

[Submitted on 18 Jul 2025] UGPL: Uncertainty-Guided Progressive Learning for Evidence-Based Classification in Computed Tomography, by Shravan Venkatraman and 3 other authors. Abstract: Accurate classification of computed tomography (CT) images is essential for diagnosis and treatment planning, but existing methods often struggle with the subtle