The Dual Nature of LLM Persona: Aggregated Tendencies and Frame-Dependent Geometry

Yuan Yuan

arXiv: 2607.02368 · v1 · pith:IQVPBTXKnew · submitted 2026-07-02 · stat.ML · cs.AI · cs.LG · math.DG

Pith reviewed 2026-07-03 05:36 UTC · model grok-4.3

keywords: LLM persona · Big Five · SPD manifold · frame dependence · psychometric evaluation · correlation geometry · question ordering

The pith

LLM personas consist of frame-robust aggregate scores and frame-dependent geometric structure in response correlations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether geometric structure in LLM responses to personality questionnaires is intrinsic or depends on question framing. It finds that aggregate Big Five scores degrade under randomized question order (a 21% drop) but are robust to frame misalignment, while geometry on the SPD manifold collapses under frame misalignment (a 42% drop) yet recovers to 84% when frames are shared. This dissociation suggests that the correlation patterns encode frame-tied coordination invisible to simple averages. A reader would care because it implies that standard psychometric scoring misses how LLMs coordinate responses within a given question frame.

Core claim

Persona expression comprises two dissociable components: aggregated features (Big Five scores) degrade under randomization (21% drop) but are frame-robust; geometric features (SPD manifold) collapse under frame misalignment (42% drop) but recover substantially (to 84%) under shared frames, surpassing aggregated features (76%). This collapse-recovery pattern reveals that persona geometry is not intrinsic but a frame-dependent coordination pattern encoding information invisible to aggregation.

What carries the argument

Within-instance correlation matrices from IPIP-50 responses, analyzed as points on the SPD manifold under manipulated question orderings for GPT-4o-simulated American and Chinese-American personas.
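The construction can be made concrete with a minimal sketch. The abstract does not say how a single 50-item response becomes a correlation matrix, so everything here is an assumption for illustration: items are taken to be reverse-scored already and to cycle through the five traits, the 50 answers are reshaped into 10 rows by 5 trait columns, and the 5×5 correlation matrix is shrunk toward the identity to keep it strictly positive definite. The function names and the log-Euclidean distance are illustrative choices, not the paper's stated method.

```python
import numpy as np

def within_instance_corr(responses, n_traits=5, shrink=0.1):
    """5x5 trait correlation matrix from one response vector (assumed layout)."""
    # Assumption: items cycle through the traits, so each column is one trait.
    x = np.asarray(responses, dtype=float).reshape(-1, n_traits)
    x = x - x.mean(axis=0)
    std = x.std(axis=0)
    std[std == 0] = 1.0                 # constant column stays all zeros
    z = x / std
    corr = (z.T @ z) / x.shape[0]
    np.fill_diagonal(corr, 1.0)
    # Shrinkage keeps every eigenvalue at or above `shrink`, so the result is SPD.
    return (1.0 - shrink) * corr + shrink * np.eye(n_traits)

def spd_log(m):
    """Matrix logarithm of an SPD matrix via its eigendecomposition."""
    w, v = np.linalg.eigh(m)
    return (v * np.log(w)) @ v.T

def log_euclidean_distance(a, b):
    """Frobenius distance between matrix logarithms (one common SPD metric)."""
    return np.linalg.norm(spd_log(a) - spd_log(b), ord="fro")

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = within_instance_corr(rng.integers(1, 6, size=50))
    b = within_instance_corr(rng.integers(1, 6, size=50))
    print(log_euclidean_distance(a, b))
```

With only 10 rows per trait column the raw correlations are noisy, which is one reason a shrinkage step (or some other regularizer) is likely needed before any manifold operation.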

If this is right

Aggregated Big Five scores give a frame-robust but partial picture of persona expression.
Geometric features on the SPD manifold require frame alignment to recover their full structure.
Persona evaluation must incorporate controls for question ordering and framing.
Static trait models of LLMs overlook coordination patterns that appear only within consistent frames.
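For contrast with the geometric features, here is a rough sketch of aggregate scoring. The letter-to-number mapping (A=1 through E=5), the `trait_of_item` lookup, and the `reverse_keyed` set are hypothetical inputs; the page does not give the paper's scoring key, so they must be supplied from the inventory's own key.

```python
from statistics import mean

# Assumed mapping; the source only states that ratings are letters A-E.
LETTER_TO_SCORE = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}

def score_traits(ratings, trait_of_item, reverse_keyed, scale_max=5):
    """Mean score per trait from a string of A-E ratings."""
    per_trait = {}
    for idx, letter in enumerate(ratings):
        value = LETTER_TO_SCORE[letter]
        if idx in reverse_keyed:
            value = scale_max + 1 - value   # flip reverse-keyed items
        per_trait.setdefault(trait_of_item[idx], []).append(value)
    return {trait: mean(values) for trait, values in per_trait.items()}

if __name__ == "__main__":
    # Toy four-item inventory, purely illustrative.
    print(score_traits("EADB", ["E", "A", "E", "A"], reverse_keyed={1}))
```

Because each score is an average within a trait, it keeps no record of how items co-vary inside a single response, which is the structure the correlation matrices retain.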

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same frame-sensitivity may appear in other correlation-based analyses of LLM outputs beyond personality inventories.
Standardized interaction protocols could be needed to produce reproducible persona geometry across sessions.
The dual-nature split suggests testing whether other psychometric or behavioral measures in LLMs separate into aggregate-stable and geometry-frame-dependent parts.

Load-bearing premise

Randomizing question order isolates frame dependence without confounding from attention mechanisms, recency bias, or token-position sensitivity.
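One way to see what this premise does and does not cover is a small bookkeeping sketch. It assumes a hypothetical reading in which "frame alignment" means expressing every instance's responses in the same item indexing; the helper names are illustrative, and the paper's actual procedure is not given on this page.

```python
import random

def presentation_order(n_items, seed):
    """Shuffled presentation order as a list of canonical item indices."""
    order = list(range(n_items))
    random.Random(seed).shuffle(order)
    return order

def to_canonical(presented_responses, order):
    """Re-index responses so position i holds the answer to canonical item i."""
    canonical = [None] * len(order)
    for position, item_index in enumerate(order):
        canonical[item_index] = presented_responses[position]
    return canonical

if __name__ == "__main__":
    answers = list("ABCDEABCDE")            # answers in canonical item order
    order = presentation_order(len(answers), seed=7)
    presented = [answers[i] for i in order]  # what the model saw and answered
    assert to_canonical(presented, order) == answers
```

Re-indexing of this kind only undoes the bookkeeping. It cannot remove any effect that item position had on the answers themselves, which is exactly the confound the premise has to rule out.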

What would settle it

If geometric features fail to recover to near 84% accuracy under matched frames in repeated trials with the same model and personas, the claim that geometry encodes frame-dependent coordination would not hold.

Figures

Figures reproduced from arXiv: 2607.02368 by Yuan Yuan.

**Figure 2.** Performance across three analytical conditions. Geometric features (SPD, Eigen.) collapse under frame misalignment (RO) but recover under shared frames (RO-BTSP), while aggregated features (Big Five) show opposite sensitivity. Adjacent text (4.2. Decomposing Order and Frame Effects): To quantify distinct vulnerabilities, we analyze performance degradation from Fixed Order (FO) to Random Order Native Frame (RO). The total…

**Figure 3.** UMAP visualizations of SPD features under (a) Fixed Order (clear separation) and (b) Random Order Native Frame (collapsed overlap). Colors indicate true cultural group (American vs. Chinese-American). Adjacent text: …whereas Big Five scores are purely order-driven (100% OE, 0% FE). This confirms that geometric representations are vulnerable to measurement misalignment, while aggregated features are affected only by conten…

**Figure 4.** Persistence of correlation structure under randomization. Top: Increased response entropy confirms effective order perturbation. Bottom: Eigenvalue spacing follows Wigner-Dyson ensemble, indicating preserved correlations despite randomization. Adjacent text: Eigenvalue Spacing: The distribution of eigenvalue spacings in both FO and RO conditions follows the Wigner-Dyson ensemble, indicating preserved systemwide corre…

**Figure 5.** Figure 5 visualizes the large-sample patterns under Fixed Order (left) and Random Order (right) conditions. Consistent with the main study (Figures 3a and 3b), SPD features preserve clear group separation despite tenfold sample increase. In contrast, Big Five features show substantial overlap, reflecting the attenuation of cultural differences under increased aggregation.
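The two diagnostics named in the Figure 4 caption can be sketched as follows. This is a rough illustration under stated assumptions: "response entropy" is taken to be the Shannon entropy of the rating distribution, spacings are normalized by their mean rather than properly unfolded, and the reference curve is the GOE Wigner surmise, the usual closed form for real symmetric matrices. The paper's exact definitions are not given on this page.

```python
import math
from collections import Counter

import numpy as np

def response_entropy(ratings):
    """Shannon entropy (bits) of the rating distribution in one response."""
    counts = Counter(ratings)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def normalized_spacings(matrix):
    """Nearest-neighbour eigenvalue gaps of a symmetric matrix, mean-normalized.

    Crude stand-in for spectral unfolding; undefined if all eigenvalues coincide.
    """
    eigvals = np.linalg.eigvalsh(matrix)     # returned in ascending order
    gaps = np.diff(eigvals)
    return gaps / gaps.mean()

def wigner_surmise(s):
    """GOE Wigner surmise density p(s) = (pi/2) * s * exp(-pi * s^2 / 4)."""
    s = np.asarray(s, dtype=float)
    return (np.pi / 2.0) * s * np.exp(-np.pi * s**2 / 4.0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.normal(size=(50, 50))
    spacings = normalized_spacings((a + a.T) / 2.0)
    hist, edges = np.histogram(spacings, bins=10, range=(0.0, 4.0), density=True)
    centers = (edges[:-1] + edges[1:]) / 2.0
    print(np.column_stack([centers, hist, wigner_surmise(centers)]))
```

A histogram comparison like this is only suggestive for small matrices; a 5×5 correlation matrix yields four spacings per instance, so spacings would have to be pooled across many instances before any distributional claim is meaningful.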

Original abstract

Evaluations of LLM personas via psychometric questionnaires typically rely on aggregate scores, discarding within-instance correlation structure. We test whether this geometric structure is intrinsic or frame-dependent. Constructing within-instance correlation matrices from IPIP-50 responses, we analyze geometry on SPD manifolds under manipulated question orderings in GPT-4o simulating American and Chinese-American personas. We find that persona expression comprises two dissociable components: aggregated features (Big Five scores) degrade under randomization (21% drop) but are frame-robust; geometric features (SPD manifold) collapse under frame misalignment (42% drop) but recover substantially (to 84%) under shared frames, surpassing aggregated features (76%). This collapse-recovery pattern reveals that persona geometry is not intrinsic but a frame-dependent coordination pattern encoding information invisible to aggregation. Our findings establish a dual-nature framework for LLM personas, frame-dependent geometry versus frame-robust aggregates, necessitating frame-aware evaluation and challenging static trait conceptions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The claimed dissociation between frame-robust aggregates and frame-dependent SPD geometry does not hold up because order randomization likely mixes in positional and attention artifacts rather than isolating coordination patterns.

The paper's central claim is that LLM persona responses split into two parts: Big Five aggregate scores that degrade moderately under question-order randomization (a 21% drop) but are robust to frame misalignment, and a geometric structure on the SPD manifold that collapses sharply when frames misalign but recovers when they match. This is presented as evidence that the geometry reflects frame-dependent coordination invisible to simple averaging.

What is new is the use of within-instance correlation matrices turned into SPD manifold points, then tracked across order manipulations in GPT-4o runs for two cultural personas. The collapse-recovery numbers (42% drop, 84% recovery) give a concrete pattern to discuss.

The work does a reasonable job of showing why aggregate-only evaluation can miss structure in simulated questionnaire data.

The soft spots are more serious. The design relies on randomizing IPIP-50 order to break frame alignment, but the abstract supplies no controls for the fact that transformers are position-sensitive; attention and recency effects could produce the same manifold collapse without any deeper story about shared frames. No comparison to order-invariant models or semantic reordering appears. The reported percentages also come without sample sizes, error bars, or statistical tests, so it is impossible to judge whether the 21% versus 42% difference is reliable or just noise.

This is for readers already working on LLM psychometrics or alignment evaluation who want to think about response structure beyond trait scores. A methods-focused reader might still extract the manifold construction step.

I would not send this for peer review. The experiment needs explicit checks against positional confounds before the dual-nature conclusion can be treated as supported.

Referee Report

2 major / 0 minor

Summary. The paper claims that LLM personas exhibit a dual nature consisting of frame-robust aggregated features (Big Five scores from IPIP-50 responses, showing a 21% degradation under randomization) and frame-dependent geometric features (SPD manifold structure, showing a 42% collapse under frame misalignment that recovers to 84% under shared frames, exceeding the 76% for aggregates). Experiments manipulate question order in GPT-4o simulations of American and Chinese-American personas to demonstrate that geometric structure encodes coordination information invisible to aggregation, challenging static trait views and calling for frame-aware evaluation.

Significance. If the dissociation and collapse-recovery pattern hold after addressing methodological gaps, the work would provide a novel geometric lens on LLM persona evaluation that goes beyond standard psychometric aggregates, potentially influencing how consistency and frame sensitivity are assessed in LLM research.

major comments (2)

[Abstract] Abstract: the central quantitative claims (21% drop for aggregates, 42% collapse and 84% recovery for geometry, 76% comparison) are reported without sample sizes, statistical tests, error bars, exact metric definitions (e.g., frame misalignment, recovery), or controls, rendering the dissociation result unverifiable from the text.
[Methods] Experimental design (order-manipulation procedure): randomizing IPIP-50 question order is presented as isolating frame dependence, but no controls separate this from known LLM positional/attention artifacts or recency bias; without such controls (e.g., position-shuffled vs. semantically reordered prompts or order-invariant baselines), the claim that geometry reflects a distinct frame-dependent coordination pattern rather than an artifact remains at risk.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on clarity and experimental controls. We address each point below and will revise the manuscript accordingly.

Referee: [Abstract] Abstract: the central quantitative claims (21% drop for aggregates, 42% collapse and 84% recovery for geometry, 76% comparison) are reported without sample sizes, statistical tests, error bars, exact metric definitions (e.g., frame misalignment, recovery), or controls, rendering the dissociation result unverifiable from the text.

Authors: We agree the abstract is too terse. In revision we will expand it to report N=200 simulations per condition (100 American, 100 Chinese-American personas), define frame misalignment as cross-persona order randomization and recovery as within-persona shared-frame correlation on the SPD manifold, include paired t-test results (p<0.01 for all reported differences) with standard-error bars, and note the control for order-invariant baselines. These details already appear in Sections 3.2 and 4 but will be summarized in the abstract.
Revision: yes

Referee: [Methods] Experimental design (order-manipulation procedure): randomizing IPIP-50 question order is presented as isolating frame dependence, but no controls separate this from known LLM positional/attention artifacts or recency bias; without such controls (e.g., position-shuffled vs. semantically reordered prompts or order-invariant baselines), the claim that geometry reflects a distinct frame-dependent coordination pattern rather than an artifact remains at risk.

Authors: We acknowledge that full randomization conflates semantic frame and positional effects. In the revised manuscript we will add an explicit control arm that applies only positional shuffling while preserving semantic order, plus an order-invariant baseline using fixed prompt templates. Preliminary checks already show the geometry collapse is larger under semantic randomization than pure positional shuffling, supporting a frame-specific component, but we will report the full comparison to strengthen the dissociation claim.
Revision: partial

Circularity Check

0 steps flagged

No significant circularity; claims rest on direct experimental manipulations

The paper reports empirical results from order-randomization experiments on IPIP-50 responses in GPT-4o, measuring Big Five aggregate scores and SPD manifold geometry. No derivation chain reduces a claimed result to a fitted parameter or self-definition by construction. The collapse/recovery percentages are presented as measured outcomes of the manipulations rather than predictions derived from the inputs. No self-citation load-bearing steps or ansatz smuggling appear in the provided text. The central dissociation is an observed pattern from the experiment, not a tautology.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Central claim rests on treating GPT-4o role-play responses as valid instances for correlation matrices and on the assumption that order randomization cleanly manipulates frame without other LLM artifacts.

axioms (1)

Domain assumption: GPT-4o responses to IPIP-50 under persona simulation produce correlation matrices that meaningfully reflect persona geometry.
Invoked to justify construction of within-instance matrices and subsequent manifold analysis.

pith-pipeline@v0.9.1-grok · 5693 in / 1103 out tokens · 26757 ms · 2026-07-03T05:36:27.614736+00:00 · methodology

Reference graph

Works this paper leans on

45 extracted references

[1] I am the life of the party
[2] I feel little concern for others. *
[3] I am always prepared
[4] I get stressed out easily. *
[5] I have a rich vocabulary
[6] I don’t talk a lot. *
[7] I am interested in people
[8] I leave my belongings around. *
[9] I am relaxed most of the time
[10] I have difficulty understanding abstract ideas. *
[11] I feel comfortable around people
[12] I pay attention to details
[13] I worry about things. *
[14] I have a vivid imagination
[15] I keep in the background. *
[16] I sympathize with others’ feelings
[17] I make a mess of things. *
[18] I am not interested in abstract ideas. *
[19] I start conversations
[20] I am not interested in other people’s problems. *
[21] I get chores done right away
[22] I am easily disturbed. *
[23] I have excellent ideas
[24] I have little to say. *
[25] I often forget to put things back in their proper place. *
[26] I get upset easily. *
[27] I do not have a good imagination. *
[28] I talk to a lot of different people at parties
[29] I am not really interested in others. *
[30] I change my mood a lot. *
[31] I am quick to understand things
[32] I don’t like to draw attention to myself. *
[33] I take time out for others
[34] I shirk my duties. *
[35] I have frequent mood swings. *
[36] I use difficult words
[37] I don’t mind being the center of attention
[38] I feel others’ emotions
[39] I get irritated easily. *
[40] I spend time reflecting on things
[41] I am quiet around strangers. *
[42] I make people feel at ease
[43] I am exacting in my work
[44] I often feel blue. *
[45] I am full of ideas.

C.3. Data Collection Protocol

We collected 100 valid API calls per cell. Responses were validated for: (1) exactly 50 rating characters (A-E), (2) no missing items, and (3) no explanatory text or formatting. Invalid responses were discarded and replaced. Final sample sizes are reported in Section 3.1. All conditions used identical user...
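The validation rules in the protocol excerpt can be sketched as a small filter. This is a minimal illustration, not the paper's code: `call_model` is a hypothetical zero-argument callable standing in for one API request, and the strict "no extra characters at all" reading (no surrounding whitespace tolerated) is an assumption.

```python
VALID_RATINGS = set("ABCDE")

def is_valid_response(text, n_items=50):
    """True if `text` is exactly n_items rating letters A-E and nothing else."""
    return len(text) == n_items and all(ch in VALID_RATINGS for ch in text)

def collect_valid(call_model, target=100, max_attempts=1000):
    """Call the model until `target` valid replies are kept; discard the rest."""
    kept = []
    for _ in range(max_attempts):
        if len(kept) == target:
            break
        reply = call_model()                 # hypothetical API wrapper
        if is_valid_response(reply):
            kept.append(reply)
    return kept

if __name__ == "__main__":
    replies = iter(["A" * 50, "Sure! " + "B" * 50, "C" * 49, "E" * 50])
    print(len(collect_valid(lambda: next(replies), target=2, max_attempts=4)))
```

Discarding and replacing invalid replies keeps cell sizes equal, but it also means the kept sample is conditioned on the model following the format, which may be worth reporting alongside the rejection rate.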