Talking Politics with Artificial Intelligence

Ziwen Zu

arxiv: 2607.00551 · v2 · pith:WY2Q4MBBnew · submitted 2026-07-01 · 💰 econ.GN · q-fin.EC


Pith reviewed 2026-07-03 18:10 UTC · model grok-4.3

classification 💰 econ.GN q-fin.EC

keywords artificial intelligence · large language models · political expression · conversational intermediaries · election effects · political content detection · dyadic conversations · regression discontinuity


The pith

AI conversations act as practical intermediaries for routine political tasks rather than arenas for public expression.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper analyzes 4.3 million human-AI conversations from three public datasets to test whether these exchanges create a new space for political expression or instead serve as tools that absorb everyday political needs. Political content shows up in only 3.9 percent of conversations and consists mostly of requests for information, text drafting, and document processing. Around the call of the 2024 U.S. presidential result, U.S. users' messages shifted toward more stance-taking, affective language, and ideological extremity, while conversations outside the U.S. showed no comparable shift. The pattern suggests that AI conversation absorbs routine political demand most of the time and becomes more expressive when major events raise the stakes.

Core claim

Political content appears in 3.9% of the conversations and is overwhelmingly practical, with users seeking information, drafting text, and processing documents far more often than stating opinions. The share of political content varies by platform publicness and conversation depth. A regression-discontinuity-in-time design around the 2024 U.S. presidential result call shows that the call increased stance-taking, affective language, and ideological extremity among U.S. users, with no comparable change in conversations elsewhere. The overall finding is that AI conversation functions as a conversational political intermediary rather than a public square.
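The prevalence figures (the conversation-level share per corpus, and the same share by user-turn bucket as in Figure 1) can be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's code: a conversation is counted as political if any of its user turns is flagged, and the field names (`corpus`, `turn_flags`) and bucket edges are hypothetical.

```python
from collections import defaultdict

# Hypothetical user-turn buckets: (inclusive lower bound, inclusive upper bound or None, label).
# The paper's actual bucket edges are not given in this summary.
BUCKETS = [(1, 1, "1"), (2, 3, "2-3"), (4, 9, "4-9"), (10, None, "10+")]


def turn_bucket(n_turns):
    """Map a count of user turns to its bucket label."""
    for lo, hi, label in BUCKETS:
        if n_turns >= lo and (hi is None or n_turns <= hi):
            return label
    raise ValueError(f"no bucket for {n_turns} user turns")


def _share(counts):
    political, total = counts
    return political / total


def political_shares(conversations):
    """Conversation-level political share by corpus and by (corpus, user-turn bucket).

    Each conversation is a dict with a 'corpus' name and 'turn_flags', one boolean
    per user turn from a political-content classifier (hypothetical layout).
    Assumption: a conversation counts as political if any user turn is flagged.
    """
    by_corpus = defaultdict(lambda: [0, 0])  # corpus -> [political, total]
    by_bucket = defaultdict(lambda: [0, 0])  # (corpus, bucket) -> [political, total]
    for conv in conversations:
        flags = conv["turn_flags"]
        political = int(any(flags))
        bucket_key = (conv["corpus"], turn_bucket(len(flags)))
        for counts in (by_corpus[conv["corpus"]], by_bucket[bucket_key]):
            counts[0] += political
            counts[1] += 1
    return ({k: _share(v) for k, v in by_corpus.items()},
            {k: _share(v) for k, v in by_bucket.items()})
```

Under this any-turn rule, conversations with more user turns have more chances to be flagged, so part of any depth gradient may be mechanical; comparing against a per-turn share is a reasonable cross-check.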

What carries the argument

Two validated classifiers that tag user messages for political content, use case, and expressed ideology, paired with a regression-discontinuity-in-time design around the 2024 U.S. election result call.
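For readers unfamiliar with the design, a local-linear regression-discontinuity-in-time (RDiT) estimate fits separate linear time trends on each side of the cutoff within a bandwidth and reads the effect off the jump in the intercept at the cutoff. The sketch below is a generic illustration, assuming a triangular kernel, a user-chosen bandwidth, and no bias correction or standard errors; the function name is hypothetical, and the paper's own estimates (Appendix Tables F.1 and F.2) may rest on different bandwidth, kernel, and inference choices.

```python
import numpy as np


def rdit_local_linear(t, y, cutoff, bandwidth):
    """Local-linear RDiT estimate of the jump in y at `cutoff`.

    t: time of each conversation (e.g., hours on any common clock);
    y: outcome per conversation (e.g., a 0/1 stance-taking indicator).
    Uses a triangular kernel on |t - cutoff| < bandwidth and allows the
    slope to differ before and after the cutoff.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    x = t - cutoff                             # running variable centered at the cutoff
    keep = np.abs(x) < bandwidth               # only observations inside the bandwidth
    x, y = x[keep], y[keep]
    w = 1.0 - np.abs(x) / bandwidth            # triangular kernel weights
    after = (x >= 0).astype(float)             # 1 after the result call, 0 before
    # Columns: intercept, post-cutoff jump, common slope, change in slope after the cutoff.
    X = np.column_stack([np.ones_like(x), after, x, after * x])
    sw = np.sqrt(w)                            # weighted least squares via row rescaling
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef[1]                             # coefficient on `after` is the discontinuity
```

In applied work one would add a robust or clustered standard error and report sensitivity to the bandwidth; both are omitted here to keep the estimator itself visible.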

If this is right

Political expression in AI conversations rises sharply after major events such as election results.
The large majority of political use remains instrumental rather than opinion-oriented.
Platform publicness and conversation length strongly shape how often political topics appear.
AI absorbs routine political demand until events make stakes explicit enough to trigger expressive shifts.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Private AI chats may reduce visible public political discussion by handling many queries one-on-one.
Designers of political monitoring systems would need to include AI channels to capture the full picture during events.
The same intermediary pattern might appear in other domains such as health or financial advice during crises.
Longer conversations could be tested as a lever to increase the share of political content in future studies.

Load-bearing premise

The two classifiers correctly and consistently identify political content, use case, and ideology across the three datasets without meaningful platform-specific bias.

What would settle it

Reapplying the classifiers to the same or new conversation data and obtaining substantially different rates of political-content or ideology identification across platforms would undermine both the prevalence and the event-response findings.
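One way to run this check on a labeled validation sample is to score classifier output against human labels separately for each platform, so that a site-specific drop in precision or recall shows up directly. The sketch below does this for a binary political/non-political label; the record layout and function names are hypothetical, and this summary does not describe the paper's own validation procedure.

```python
from collections import defaultdict


def binary_metrics(pairs):
    """Precision, recall, F1, and Cohen's kappa for (human_label, model_label) boolean pairs."""
    tp = sum(1 for h, m in pairs if h and m)
    fp = sum(1 for h, m in pairs if not h and m)
    fn = sum(1 for h, m in pairs if h and not m)
    tn = sum(1 for h, m in pairs if not h and not m)
    n = tp + fp + fn + tn
    if n == 0:
        raise ValueError("no labeled pairs")
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    observed = (tp + tn) / n
    # Chance agreement from each rater's marginal rates of positive and negative labels.
    expected = ((tp + fp) / n) * ((tp + fn) / n) + ((fn + tn) / n) * ((fp + tn) / n)
    kappa = (observed - expected) / (1 - expected) if expected < 1 else float("nan")
    return {"n": n, "precision": precision, "recall": recall, "f1": f1, "kappa": kappa}


def metrics_by_platform(records):
    """Group (platform, human_label, model_label) records and score each platform separately."""
    groups = defaultdict(list)
    for platform, human, model in records:
        groups[platform].append((bool(human), bool(model)))
    return {platform: binary_metrics(pairs) for platform, pairs in groups.items()}
```

The same `binary_metrics` function yields inter-annotator kappa when both label columns come from human coders rather than from a human and the classifier.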

Figures

Figures reproduced from arXiv: 2607.00551 by Ziwen Zu.

**Figure 1.** Political Prevalence. Notes: Panel A reports the conversation-level political share in each corpus. Panel B reports the same quantity by user-turn bucket, a variable available in all three corpora. The topic distribution shows why political AI use should not be reduced to elections. In WildChat, the two largest categories are policy and legislation and government services, each accounting for about 18.7% of…

**Figure 2.** Topic Composition. Notes: Topic identifies the substantive political domain of the conversation. The figure reports corpus-specific distributions and a pooled panel. … appears, but what the user is trying to do with the model. On this dimension, political AI use is overwhelmingly practical. Across 248,935 political user turns with use-case labels, 64.7% seek information or explanation, 13.0% involve writing o…

**Figure 3.** Use Cases. Notes: Use case identifies the user’s purpose or task within political material. The figure reports corpus-specific distributions and a pooled panel. Provider reports on general AI use provide a second benchmark for this interpretation (Appendix Table D.2). OpenAI’s privacy-preserving study of consumer ChatGPT use finds that practical guidance, seeking information, and writing account for about 7…

**Figure 4.** Geographic Variation. Notes: Left panel reports WildChat conversation-level political prevalence by world region. Right panel reports high-volume countries by political prevalence. Country labels should be read as descriptive corpus composition rather than population-level political interest. Table D.3 again points to publicness: the shared-link corpus is 16.9 percentage points more likely than WildChat to …

**Figure 5.** Expressed Ideology. Notes: Distributions are calculated among stance-taking political conversations. … one left–right scale. This partial constraint is familiar from mass belief systems. Ordinary political expression often combines ideological cues, cross-cutting issue positions, and context-specific reasoning rather than a fully bundled ideology (Page and Shapiro 1992; Zaller 1992). The affective evidence sh…

**Figure 5.** Expressed Ideology. Notes: Distributions are calculated among stance-taking political conversations. The ideological dimensions do not collapse into a single scale. Economic and social scores are positively associated, but only modestly so: the correlations are 0.21 in WildChat, 0.36 in LMSYS-Chat, and 0.27 in ShareChat (…

**Figure 6.** Ideology Structure. Notes: Points show deterministic samples of up to 3,000 stance-taking conversations per corpus; correlations use all stance-taking conversations with non-missing scores on the plotted dimensions.

**Figure 7.** Result-Call Effects on Expression. Notes: Figure reports U.S. RDiT plots around the result-call cutoff for political prevalence, stance-taking, and affective charge. Full local-linear RDiT estimates are reported in Appendix Table F.1. The more robust result is expressive and affective: after the AP call, U.S. stance-taking conversations became more charged and more ideologically extreme. RD = 0.55*** se = 0…

**Figure 8.** Result-Call Effects on Ideology. Notes: Figure reports U.S. RDiT plots around the result-call cutoff for economic position and affective polarization. Full local-linear RDiT estimates for ideology outcomes are reported in Appendix Table F.2.

**Figure 9.** Ideology Estimates by Geography. Notes: Figure reports result-call RDiT estimates for ideology outcomes in the United States and the rest of the world. Corresponding numerical estimates are reported in Appendix Table F.2. Figures 7–9 fit a theory of political expression better than a theory of attention alone. Agenda-setting and priming explain why a high-salience event makes politics more likely to enter c…
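The Figure 6 note separates two steps: a deterministic plotting sample of up to 3,000 conversations, and a correlation computed on all stance-taking conversations with both scores present. A minimal sketch of both follows. Hashing conversation IDs is one assumed way to make the sample deterministic (the note does not say how it was drawn), and the field names (`stance_present`, `economic`, `social`) are hypothetical stand-ins for the paper's variables.

```python
import hashlib
import math


def deterministic_sample(conv_ids, k=3000):
    """Pick up to k IDs reproducibly by ordering them on a SHA-256 hash (assumed scheme)."""
    return sorted(conv_ids, key=lambda cid: hashlib.sha256(cid.encode("utf-8")).hexdigest())[:k]


def pearson_nonmissing(rows, a="economic", b="social"):
    """Pearson correlation of two ideology scores over stance-taking rows with both scores present.

    rows: dicts with a boolean 'stance_present' and numeric scores (None marks missing).
    Returns NaN when fewer than two usable rows remain or a score has zero variance.
    """
    pairs = [(r[a], r[b]) for r in rows
             if r.get("stance_present") and r.get(a) is not None and r.get(b) is not None]
    n = len(pairs)
    if n < 2:
        return float("nan")
    mean_a = sum(x for x, _ in pairs) / n
    mean_b = sum(y for _, y in pairs) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in pairs)
    var_a = sum((x - mean_a) ** 2 for x, _ in pairs)
    var_b = sum((y - mean_b) ** 2 for _, y in pairs)
    if var_a == 0 or var_b == 0:
        return float("nan")
    return cov / math.sqrt(var_a * var_b)
```

Applied per corpus, this is the kind of quantity behind the reported correlations (0.21 in WildChat, 0.36 in LMSYS-Chat, 0.27 in ShareChat), although the paper's exact computation may differ. Sampling only for plotting keeps the scatter readable without discarding data from the correlation itself.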

Original abstract

Large language models (LLMs), a prominent form of artificial intelligence (AI), are becoming everyday interfaces for political questions, but most exchanges are dyadic rather than audience-facing. This paper asks whether AI conversation functions as a new arena for political expression or as a conversational intermediary for routine political demand. Using 4.30 million human-AI conversations from three large public datasets, we apply two validated classifiers to user messages, identifying political content, use case, and expressed ideology. Political content appears in 3.9% of conversations, varies sharply by platform publicness and conversation depth, and is mostly practical: users ask for information, draft text, and process documents far more often than they state opinions. A regression-discontinuity-in-time design around the 2024 U.S. presidential result call shows that the call changed the expressive subset: among U.S. users, stance-taking, affective language, and ideological extremity rose; comparable conversations elsewhere did not. AI conversation is less a public square than a conversational political intermediary, absorbing routine demand and becoming expressive when major events make political stakes explicit.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper finds political content in just 3.9% of 4.3M AI conversations, most of it practical, and uses an RD design to show a post-election expressive shift among U.S. users, but the results rest on classifier details that are not shown.


The main thing to know is that this paper uses 4.3 million conversations to argue that AI conversation mostly serves as a practical intermediary for information and drafting, not an expressive public square, except when the call of the 2024 election result triggered more stance-taking and affect among U.S. users.

It does a few things cleanly. The scale allows comparisons across platforms and conversation depths, and the regression discontinuity around the election call measures the shift directly without heavy modeling. The practical-versus-expressive split is a useful framing, and the concrete percentages give a baseline that prior work has not reported at this volume.

The soft spot is measurement. The abstract states the classifiers are validated for political content, use case, and ideology, yet supplies no accuracy figures, no platform-stratified checks, and no inter-annotator numbers. If detection of opinion or affective language varies by site, both the low 3.9% rate and the RD result become difficult to read. The post-hoc focus on U.S. users after the call also leaves some room for selection effects.

This is for researchers tracking real-world AI use in politics or digital discourse. A reader wanting large-scale descriptive evidence on routine versus event-driven political interaction would find the numbers and design worth examining. The work shows honest engagement with the data and a clear question, so it deserves a serious referee to check the classifier performance and any platform biases.

Referee Report

3 major / 2 minor

Summary. The paper analyzes 4.30 million human-AI conversations from three large public datasets using two validated classifiers to identify political content (found in 3.9% of conversations), use cases, and expressed ideology. It concludes that AI serves primarily as a conversational intermediary for routine practical tasks rather than a public square for expressive political discourse, with a regression-discontinuity-in-time design around the 2024 U.S. presidential election result call showing increased stance-taking, affective language, and ideological extremity among U.S. users but not elsewhere.

Significance. If the classifiers prove reliable and cross-platform consistent, the result would provide large-scale evidence that AI political use is mostly absorptive of routine demand and only shifts to expressive modes when major events raise stakes, with implications for theories of digital political communication and platform design.

major comments (3)

[Abstract / Methods] Abstract and methods: the classifiers are described as 'validated' and used to produce the headline 3.9% political-content rate, use-case distributions, and ideology labels, yet no accuracy, precision/recall, inter-annotator agreement, or platform-stratified performance metrics are reported; without these, neither the baseline prevalence nor the RD discontinuity can be interpreted.
[Results (RD section)] Results / RD design: the analysis restricts attention to U.S. users after the election result call and reports a shift in the expressive subset, but provides no evidence on whether the user pool, conversation sampling, or topic composition changed discontinuously at the cutoff, leaving open selection effects that could drive the observed change in stance-taking and affect.
[Data / Measurement] Data section: conversation depth and platform publicness are invoked to explain variation in political content, but the measurement rules for these variables (e.g., message count thresholds, platform classification criteria) are not specified, preventing assessment of whether the reported patterns are robust to alternative codings.
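A crude first pass at the composition concern in the second major comment is to compare the share of a covariate (for example, a topic category) in symmetric windows just before and just after the cutoff. The sketch below does this with a pooled two-proportion z-test; the function name, window width, and covariate are assumptions, and this is not a substitute for a formal RD covariate or density test, which would run an RDiT estimator on each covariate and on daily conversation volume.

```python
import math


def window_balance(times, flags, cutoff, window):
    """Compare a binary covariate in symmetric windows before and after a cutoff.

    times: event times (e.g., hours); flags: 0/1 covariate per conversation
    (e.g., a hypothetical 'topic is elections' indicator).
    Uses cutoff - window <= t < cutoff versus cutoff <= t < cutoff + window.
    Returns (share_before, share_after, z, two_sided_p) from a pooled two-proportion z-test.
    """
    before = [f for t, f in zip(times, flags) if cutoff - window <= t < cutoff]
    after = [f for t, f in zip(times, flags) if cutoff <= t < cutoff + window]
    n0, n1 = len(before), len(after)
    if n0 == 0 or n1 == 0:
        raise ValueError("empty window on one side of the cutoff")
    p0, p1 = sum(before) / n0, sum(after) / n1
    pooled = (sum(before) + sum(after)) / (n0 + n1)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n0 + 1 / n1))
    if se == 0:  # covariate constant across both windows: no detectable difference
        return p0, p1, 0.0, 1.0
    z = (p1 - p0) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return p0, p1, z, p_value
```

Because a raw before/after comparison absorbs any smooth time trend into the "jump", a small p-value here is a prompt for the formal RDiT check rather than evidence of selection on its own.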

minor comments (2)

[Data] The three source datasets are referred to as 'large public datasets' without naming them or providing links/DOIs in the main text; this should be added for reproducibility.
[Figures/Tables] Figure captions and table notes should explicitly state the sample restrictions (e.g., U.S. users only) used for the RD plots.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments, which highlight important areas for improving transparency and robustness. We respond to each major comment below.


Referee: [Abstract / Methods] Abstract and methods: the classifiers are described as 'validated' and used to produce the headline 3.9% political-content rate, use-case distributions, and ideology labels, yet no accuracy, precision/recall, inter-annotator agreement, or platform-stratified performance metrics are reported; without these, neither the baseline prevalence nor the RD discontinuity can be interpreted.

Authors: We agree that explicit performance metrics are required for credible interpretation of the prevalence estimates and downstream results. The classifiers were validated on human-annotated held-out data, but these details were omitted from the initial submission. In the revision we will add a methods appendix reporting accuracy, precision, recall, F1, inter-annotator agreement (Cohen's kappa), and platform-stratified performance on the validation sets. revision: yes
Referee: [Results (RD section)] Results / RD design: the analysis restricts attention to U.S. users after the election result call and reports a shift in the expressive subset, but provides no evidence on whether the user pool, conversation sampling, or topic composition changed discontinuously at the cutoff, leaving open selection effects that could drive the observed change in stance-taking and affect.

Authors: The concern about potential selection or composition shifts at the cutoff is valid and directly relevant to causal interpretation. In the revised manuscript we will report formal tests for discontinuities in conversation volume, inferred user characteristics, and topic distributions at the election-result cutoff, along with any robustness implications for the observed changes in stance-taking, affect, and extremity among U.S. users. revision: yes
Referee: [Data / Measurement] Data section: conversation depth and platform publicness are invoked to explain variation in political content, but the measurement rules for these variables (e.g., message count thresholds, platform classification criteria) are not specified, preventing assessment of whether the reported patterns are robust to alternative codings.

Authors: We accept that the operational definitions must be stated explicitly. The revision will define conversation depth as the total message count with the precise threshold used for 'deep' conversations, and platform publicness via the three datasets' documented sharing policies. We will also add sensitivity tables using alternative thresholds to demonstrate robustness of the reported patterns. revision: yes
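The promised sensitivity tables amount to recomputing the deep-versus-shallow gap under several cutoffs. A minimal sketch, assuming each conversation is reduced to a (message count, political flag) pair and that "deep" means at least `threshold` messages; both the representation and the function name are hypothetical, and the paper's actual threshold is not stated in this summary.

```python
def depth_sensitivity(conversations, thresholds):
    """Political share among shallow and deep conversations for each candidate threshold.

    conversations: iterable of (message_count, is_political) pairs.
    A conversation is 'deep' if message_count >= threshold (assumed definition).
    Returns {threshold: (shallow_share, deep_share, gap)}; an entry is None when a group is empty.
    """
    convs = list(conversations)
    table = {}
    for k in thresholds:
        shallow = [p for m, p in convs if m < k]
        deep = [p for m, p in convs if m >= k]
        s = sum(shallow) / len(shallow) if shallow else None
        d = sum(deep) / len(deep) if deep else None
        gap = d - s if s is not None and d is not None else None
        table[k] = (s, d, gap)
    return table
```

If the sign and rough size of the gap hold across several plausible thresholds, the depth pattern is unlikely to be an artifact of one particular cutoff.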

Circularity Check

0 steps flagged

Empirical measurement study with external RD design; no derivation reduces to inputs

full rationale

The paper is a data-driven empirical analysis of 4.3M conversations using two classifiers and a regression-discontinuity design around the external 2024 U.S. election result call. No equations, fitted parameters renamed as predictions, self-citations, or ansatzes appear in the abstract or described methods. The central claims rest on observed frequencies and the external discontinuity rather than any self-referential construction. This matches the default expectation of a non-circular empirical paper.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Based on abstract only; no free parameters, invented entities, or explicit axioms stated beyond reliance on classifier accuracy and dataset representativeness.

axioms (2)

domain assumption: The two classifiers are validated and accurately detect political content, use case, and ideology in user messages.
Abstract states 'two validated classifiers' without reporting validation metrics or error rates.
domain assumption: The three public datasets are representative of typical human-AI political interactions.
Abstract uses the datasets to draw general conclusions about AI conversation without discussing selection or coverage.

pith-pipeline@v0.9.1-grok · 5709 in / 1307 out tokens · 20030 ms · 2026-07-03T18:10:02.189705+00:00 · methodology


Reference graph

Works this paper leans on

19 extracted references · 19 canonical work pages

[1] Administrative Burden: Learning, Psychological, and Compliance Costs in Citizen-State Interactions (arXiv 2021)
https://about.fb.com/news/2021/02/reducing-political-content-in-news-feed/. Mettler, Suzanne. 2011. The Submerged State: How Invisible Government Policies Undermine American Democracy. Chicago: University of Chicago Press. Moynihan, Donald, Pamela Herd, and Hope Harvey. 2015. “Administrative Burden: Learning, Psychological, and Compliance Costs in Citizen-S...

[2] Confirm whether the text is genuinely political or civic in substance. If it now looks like a false positive or too borderline, set is_valid_political=false and primary_subtopic=non_political

[3] Exclude fictional/role-play politics, office/workplace politics, gaming lore, and generic wrappers with no real public-affairs substance
The construct is BROAD: real-world public power, governance, elections, parties, public policy, law, courts, rights, public administration and government services, welfare, taxation, immigration, war, diplomacy, national security, social movements, AND practical civic use (drafting complaints to government, processing official documents, asking about bene...

[4] primary_subtopic should capture the main political issue area. SI-5

[5] use_case_type should capture what the user is doing with the LLM

[6] Use information_seeking for factual questions, explanations, or requests to understand politics, policy, law, government, or public affairs

[7] Use opinion_or_argument for debate, persuasion, ideological positioning, or evaluative political discussion

[8] Use writing_or_editing for drafting or revising essays, speeches, letters, statements, or applications with political or civic substance

[9] Use document_processing for summarization, extraction, translation, keywording, or structured processing of political, legal, policy, court, or government texts

[10] Use civic_or_legal_help for practical help involving government services, public benefits, immigration, complaints to officials, tax, social security, courts, or administrative procedures

[11] Use party_organizational_life for party-membership applications, ideological-study materials, democratic-life-meeting materials, and related party organizational writing

[12] Use mobilization_or_collective_action for protest, petitions, campaigning, organizing, outreach, or movement coordination

[13] If is_valid_political=false, still choose the closest use_case_type based on the text

[14] Confidence must be between 0 and 1

[15] stance_present
If uncertain, set needs_human_review=true. Listing 3: Ideology prompt You score the political IDEOLOGY expressed in one user message sent to an AI chatbot. Judge ONLY the user’s own words and the stance they reveal, not the topic in the abstract and not what a "correct" answer would be. IMPORTANT: the message may itself be a task, instruction, or request....

[16] Only score an axis when the text gives evidence on it; if an axis is not expressed, you may set it to 0 only if the user is clearly centrist on it, otherwise leave the OVERALL stance via stance_present but still give your best integer estimate for axes that are expressed and 0 for axes with no signal

[17] Do not infer ideology purely from the topic. Asking "what is fascism?" is NOT a stance; praising or condemning it IS

[18] Be conservative: when the user is genuinely just seeking information, set stance_present=false. SI-6

[19] Judge the stance in the language as written; the message may be in any language or an English translation of one. For Stage 2 and ideology scoring, the item text was inserted into the following wrapper so that embedded requests were treated as text to classify rather than as actions to perform: Listing 4: Per-message wrapper used after the high-recall scr... (2026)
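Entries [2] through [19] in the extracted list above are fragments of the classifier prompts rather than bibliographic references; the label names `is_valid_political`, `primary_subtopic`, `use_case_type`, `confidence`, and `needs_human_review` come from them. As a hedged illustration of how such structured outputs could be checked before analysis, the sketch below validates one record against the rules quoted there. The dict layout and function names are assumptions, and the allowed `use_case_type` values are only the seven that appear in the fragments; the paper's full label set may contain more.

```python
# The seven use_case_type labels that appear in the extracted prompt fragments;
# the complete label set used in the paper may include others (assumption).
USE_CASE_TYPES = {
    "information_seeking", "opinion_or_argument", "writing_or_editing",
    "document_processing", "civic_or_legal_help", "party_organizational_life",
    "mobilization_or_collective_action",
}


def validate_label(record):
    """Return a list of rule violations for one classifier output record (empty list = passes)."""
    problems = []
    # "If is_valid_political=false, still choose the closest use_case_type": always required.
    if record.get("use_case_type") not in USE_CASE_TYPES:
        problems.append("unknown or missing use_case_type")
    conf = record.get("confidence")
    # "Confidence must be between 0 and 1" (NaN also fails this comparison).
    if not isinstance(conf, (int, float)) or not 0 <= conf <= 1:
        problems.append("confidence must be between 0 and 1")
    # A rejected record is expected to carry primary_subtopic=non_political.
    if record.get("is_valid_political") is False and record.get("primary_subtopic") != "non_political":
        problems.append("non-political record must have primary_subtopic=non_political")
    return problems


def needs_review(record):
    """Route a record to human review if the model flagged it or it fails validation."""
    return bool(record.get("needs_human_review")) or bool(validate_label(record))
```

Routing malformed records to review, instead of silently dropping them, keeps the denominator of any prevalence estimate explicit.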
