Political Bias in LLMs: Unaligned Moral Values in Agent-centric Simulations

Simon Münker

Abstract: Contemporary research in the social sciences increasingly uses state-of-the-art generative language models to annotate or generate content. While these models achieve benchmark-leading performance on common language tasks, their application to novel out-of-domain tasks remains insufficiently explored. To address this gap, we investigate how persona-prompted language models align with human responses on the Moral Foundations Questionnaire. We adapt open-source generative language models to different political personas and repeatedly survey these models to generate synthetic datasets in which model-persona combinations define our sub-populations. Our analysis reveals that models produce inconsistent results across repetitions, yielding high response variance. Furthermore, the synthetic data correlate only weakly with corresponding human data from psychological studies, with conservative persona-prompted models in particular failing to align with actual conservative populations. These results suggest that, owing to their alignment process, language models struggle to coherently represent ideologies through in-context prompting. Thus, using language models to simulate social interactions requires measurable improvements in in-context optimization or parameter manipulation to align properly with psychological and sociological stereotypes.
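
To make the described setup concrete, the sketch below mirrors the procedure the abstract outlines: prompt a model with a political persona, pose questionnaire items repeatedly, then compare the per-item response means and variances against human reference data. Everything here is a hypothetical stand-in, not the paper's materials: query_model is a stub to be replaced by an actual open-source model call, and the item wordings, persona prompts, and human_means values are invented for illustration.

# Minimal sketch of the persona-prompted survey loop, assuming a generic
# chat-style LLM behind query_model(). Items, personas, and reference
# means are illustrative placeholders, not the paper's actual data.
import random
import statistics

# Paraphrased MFQ-style relevance items (placeholders, not the instrument).
MFQ_ITEMS = [
    "Whether or not someone suffered emotionally.",
    "Whether or not some people were treated differently than others.",
    "Whether or not someone betrayed their group.",
    "Whether or not someone showed a lack of respect for authority.",
    "Whether or not someone did something disgusting.",
]

# Political personas injected via the system prompt (hypothetical wording).
PERSONAS = {
    "liberal": "Answer as a politically liberal person would.",
    "conservative": "Answer as a politically conservative person would.",
}

def query_model(system_prompt: str, item: str) -> int:
    """Placeholder for an LLM call returning a 0-5 relevance rating.

    Swap in a real client for an open-source model; random ratings are
    used here so the sketch runs standalone.
    """
    return random.randint(0, 5)

def survey(persona: str, repetitions: int = 10) -> dict:
    """Pose every item `repetitions` times under one persona prompt."""
    system = PERSONAS[persona]
    return {
        item: [query_model(system, item) for _ in range(repetitions)]
        for item in MFQ_ITEMS
    }

def pearson(xs, ys) -> float:
    """Pearson correlation without third-party dependencies."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

if __name__ == "__main__":
    # Invented per-item reference means, standing in for the human
    # survey data the paper compares against.
    human_means = {
        "liberal": [4.1, 4.0, 2.2, 1.9, 2.0],
        "conservative": [3.1, 3.0, 3.3, 3.4, 3.2],
    }
    for persona in PERSONAS:
        responses = survey(persona)
        item_means = [statistics.fmean(r) for r in responses.values()]
        item_vars = [statistics.pvariance(r) for r in responses.values()]
        r = pearson(item_means, human_means[persona])
        print(f"{persona}: mean within-item variance = "
              f"{statistics.fmean(item_vars):.2f}, "
              f"correlation with human reference = {r:.2f}")

Under this design, high within-item variance across repetitions corresponds to the inconsistency the abstract reports, and the Pearson correlation of per-item means against the human reference is the alignment measure; with a real model call in place of the stub, the same loop performs the comparison at questionnaire scale.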

Submission history

From: Simon Münker
[v1] Wed, 21 Aug 2024 08:20:41 UTC (42 KB)
[v2] Mon, 14 Jul 2025 08:34:57 UTC (44 KB)
