Evaluating and enhancing probabilistic reasoning in language models
To understand the probabilistic reasoning capabilities of three state-of-the-art LLMs (from the Gemini and GPT model families), we define three distinct tasks: estimating percentiles, drawing samples, and calculating probabilities. These tasks reflect key aspects of interpreting probability distributions: understanding where a sample falls within a distribution (percentiles), generating representative data (sampling), and assessing the likelihood of events (probabilities).
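To make the three tasks concrete, the sketch below computes ground-truth answers against which model responses could be compared. It is a minimal illustration, not the paper's evaluation code; the normal target distribution and the specific values `x`, `a`, and `b` are assumptions chosen for the example.

```python
# Minimal sketch (illustrative only): ground-truth answers for the three tasks,
# assuming a standard normal N(0, 1) as the target distribution.
import numpy as np
from scipy import stats

dist = stats.norm(loc=0.0, scale=1.0)  # hypothetical target distribution

# Task 1: percentiles -- where does a value x fall within the distribution?
x = 1.5
percentile = dist.cdf(x)  # P(X <= x)

# Task 2: sampling -- draw representative data from the distribution.
samples = dist.rvs(size=5, random_state=0)

# Task 3: probabilities -- likelihood that X lands in an interval [a, b].
a, b = 0.0, 1.0
prob = dist.cdf(b) - dist.cdf(a)

print(f"percentile of {x}: {percentile:.3f}")
print(f"samples: {np.round(samples, 3)}")
print(f"P({a} <= X <= {b}): {prob:.3f}")
```

Under this setup, an LLM's answer to each task can be scored against the corresponding closed-form or simulated ground truth.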