Synthetic idiolectal styles in authorship analysis

Krzysztof Kredens, John Blake

Research output: Unpublished contribution to conference, Unpublished Conference Paper, peer-review

Abstract

Recent advances in Large Language Models (LLMs) have opened new methodological possibilities in authorship analysis research (e.g. Huang et al. 2024, Huang and Grieve 2024, Przystalski et al. 2024). While ‘traditional’ approaches to identifying idiolectal features have relied on manual analysis and/or statistical modelling of naturally occurring text samples, LLMs offer capabilities for analysing and reproducing linguistic patterns at scale. As these models demonstrate abilities to capture and generate aspects of language variation, they present both opportunities and challenges for forensic linguistic research. This paper explores the potential of LLMs to synthesise idiolectal styles for use in authorship analysis experiments. The study uses the 100 Idiolects project data (Heini and Kredens 2023), a corpus of text samples from 112 individuals, each contributing input in seven prescribed discourse types. We used LLMs to reproduce each individual's idiolectal style in an eighth discourse type and used this output to run authorship attribution experiments to gauge the robustness of machine-generated ‘idiolectal’ styles. This experimental design allows us to evaluate both the ability of LLMs to capture individual linguistic patterns from across different discourse types and the reliability of using synthetic data in authorship attribution tasks. By comparing attribution results using human-authored versus LLM-generated texts, we assess the potential for LLMs to assist in authorship analysis tasks and discuss limitations in their ability to replicate idiolectal traits.
Original language: English
Publication status: Unpublished - 2025
Event: 17th Biennial Conference of the International Association for Forensic and Legal Linguistics - Cape Town, South Africa
Duration: 30 Jun 2025 to 4 Jul 2025

Conference

Conference: 17th Biennial Conference of the International Association for Forensic and Legal Linguistics
Country/Territory: South Africa
City: Cape Town
Period: 30/06/25 to 4/07/25

Keywords

  • authorship analysis, idiolect, large language models
