
arxiv: 2607.00551 · v2 · pith:WY2Q4MBBnew · submitted 2026-07-01 · 💰 econ.GN · q-fin.EC

Talking Politics with Artificial Intelligence

Pith reviewed 2026-07-03 18:10 UTC · model grok-4.3

classification 💰 econ.GN q-fin.EC
keywords artificial intelligence · large language models · political expression · conversational intermediaries · election effects · political content detection · dyadic conversations · regression discontinuity

The pith

AI conversations act as practical intermediaries for routine political tasks rather than arenas for public expression.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper analyzes 4.3 million human-AI conversations to test whether these exchanges create a new space for political expression or instead serve as tools that absorb everyday political needs. Political content shows up in only 3.9 percent of conversations and consists mostly of requests for information, text drafting, and document processing. Around the 2024 U.S. presidential election result, U.S. users shifted toward more stance-taking, affective language, and ideological extremity in their messages, while similar patterns did not appear outside the U.S. This pattern indicates that AI absorbs routine demand most of the time and becomes more expressive only when major events raise the stakes.

Core claim

Political content appears in 3.9% of the conversations and is overwhelmingly practical, with users seeking information, drafting text, and processing documents far more often than stating opinions. The share of political content varies by platform publicness and conversation depth. A regression-discontinuity-in-time design around the 2024 U.S. presidential result call shows that the call increased stance-taking, affective language, and ideological extremity among U.S. users, with no comparable change in conversations elsewhere. The overall finding is that AI conversation functions as a conversational political intermediary rather than a public square.

What carries the argument

Two validated classifiers that tag user messages for political content, use case, and expressed ideology, paired with a regression-discontinuity-in-time design around the 2024 U.S. election result call.
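
The regression-discontinuity-in-time step can be illustrated with a minimal sketch. This is not the paper's code: the function names `weighted_linear_fit` and `rdit_jump`, the triangular kernel, the bandwidth, and the synthetic data are all assumptions made for illustration. The sketch fits a weighted line on each side of a cutoff and takes the difference of the two fitted values at the cutoff as the estimated jump.

```python
import random


def weighted_linear_fit(xs, ys, ws):
    """Weighted least-squares line; returns (intercept, slope)."""
    total_w = sum(ws)
    mean_x = sum(w * x for w, x in zip(ws, xs)) / total_w
    mean_y = sum(w * y for w, y in zip(ws, ys)) / total_w
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rdit_jump(times, outcomes, cutoff, bandwidth):
    """Local-linear RD estimate of the jump in the outcome at `cutoff`.

    Time is re-centred at the cutoff, so each side's intercept is its
    fitted value at the cutoff; the estimate is right minus left.
    """
    left = ([], [], [])
    right = ([], [], [])
    for t, y in zip(times, outcomes):
        d = t - cutoff
        w = 1.0 - abs(d) / bandwidth  # triangular kernel (an assumption)
        if w <= 0:
            continue  # observation lies outside the bandwidth
        side = right if d >= 0 else left
        side[0].append(d)
        side[1].append(y)
        side[2].append(w)
    left_at_cutoff, _ = weighted_linear_fit(*left)
    right_at_cutoff, _ = weighted_linear_fit(*right)
    return right_at_cutoff - left_at_cutoff


# Synthetic illustration only: minutes relative to a hypothetical cutoff,
# with a built-in jump of 0.5 in the outcome after the cutoff.
rng = random.Random(0)
minutes = list(range(-600, 601, 3))
scores = [0.2 + 0.0002 * m + (0.5 if m >= 0 else 0.0) + rng.gauss(0, 0.05)
          for m in minutes]
estimate = rdit_jump(minutes, scores, cutoff=0, bandwidth=300)
print(round(estimate, 2))
```

A real analysis would also need standard errors and a principled bandwidth choice, which this sketch leaves out.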

If this is right

  • Political expression in AI conversations rises sharply after major events such as election results.
  • The large majority of political use remains instrumental rather than opinion-oriented.
  • Platform publicness and conversation length strongly shape how often political topics appear.
  • AI absorbs routine political demand until events make stakes explicit enough to trigger expressive shifts.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Private AI chats may reduce visible public political discussion by handling many queries one-on-one.
  • Designers of political monitoring systems would need to include AI channels to capture the full picture during events.
  • The same intermediary pattern might appear in other domains such as health or financial advice during crises.
  • Longer conversations could be tested as a lever to increase the share of political content in future studies.

Load-bearing premise

The two classifiers correctly and consistently identify political content, use case, and ideology across the three datasets without meaningful platform-specific bias.

What would settle it

Reapplication of the classifiers to the same or new conversation data that produces substantially different rates of political content or ideology identification across platforms would undermine the prevalence and event-response findings.
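
One way to make such a check concrete is to test whether the classifier agrees with human labels at the same rate in each corpus. The sketch below uses a Pearson chi-square test of homogeneity. The corpus names come from the figure notes, but the audit counts are hypothetical placeholders rather than results from the paper, and the function names are assumptions.

```python
import math


def agreement_homogeneity(audit):
    """Pearson chi-square test that agreement rates are equal across corpora.

    `audit` maps corpus name -> (n_agree, n_audited).
    Returns (statistic, degrees_of_freedom).
    Assumes the pooled agreement rate is strictly between 0 and 1.
    """
    total_agree = sum(agree for agree, _ in audit.values())
    total_n = sum(n for _, n in audit.values())
    pooled = total_agree / total_n
    stat = 0.0
    for agree, n in audit.values():
        for observed, expected in ((agree, n * pooled),
                                   (n - agree, n * (1 - pooled))):
            stat += (observed - expected) ** 2 / expected
    return stat, len(audit) - 1


def p_value_df2(stat):
    """Upper-tail p-value of a chi-square statistic with exactly 2 df."""
    return math.exp(-stat / 2.0)


# Hypothetical audit counts (placeholders, not taken from the paper).
audit = {
    "WildChat": (460, 500),
    "LMSYS-Chat": (455, 500),
    "ShareChat": (420, 500),
}
stat, df = agreement_homogeneity(audit)
assert df == 2  # the closed-form p-value above only holds for 2 df
print(f"chi2={stat:.2f}, p={p_value_df2(stat):.5f}")
```

A small p-value here would suggest that classifier reliability differs by platform, which is the condition the load-bearing premise rules out.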

Figures

Figures reproduced from arXiv: 2607.00551 by Ziwen Zu.

Figure 1
Figure 1: Political Prevalence. Notes: Panel A reports the conversation-level political share in each corpus. Panel B reports the same quantity by user-turn bucket, a variable available in all three corpora. The topic distribution shows why political AI use should not be reduced to elections. In WildChat, the two largest categories are policy and legislation and government services, each accounting for about 18.7% of…
Figure 2
Figure 2: Topic Composition. Notes: Topic identifies the substantive political domain of the conversation. The figure reports corpus-specific distributions and a pooled panel. … appears, but what the user is trying to do with the model. On this dimension, political AI use is overwhelmingly practical. Across 248,935 political user turns with use-case labels, 64.7% seek information or explanation, 13.0% involve writing o…
Figure 2
Figure 2: Topic Composition. Notes: Topic identifies the substantive political domain of the conversation. The figure reports corpus-specific distributions and a pooled panel. What people do. Topic breadth does not reveal the user’s purpose. A welfare-policy prompt can ask for an explanation, draft an appeal letter, summarize an official form, or state a view about redistribution. The use-case measure therefore asks …
Figure 3
Figure 3: Use Cases. Notes: Use case identifies the user’s purpose or task within political material. The figure reports corpus-specific distributions and a pooled panel. Provider reports on general AI use provide a second benchmark for this interpretation (Appendix Table D.2). OpenAI’s privacy-preserving study of consumer ChatGPT use finds that practical guidance, seeking information, and writing account for about 7…
Figure 3
Figure 3: Use Cases. Notes: Use case identifies the user’s purpose or task within political material. The figure reports corpus-specific distributions and a pooled panel. … explanations (17%), documents and reports (15%), and guidance (11%) among the most common outputs (Anthropic 2026a). These benchmarks do not estimate political use, and their populations differ from the public corpora studied here. Their value is co…
Figure 4
Figure 4: Geographic Variation. Notes: Left panel reports WildChat conversation-level political prevalence by world region. Right panel reports high-volume countries by political prevalence. Country labels should be read as descriptive corpus composition rather than population-level political interest. Table D.3 again points to publicness: the shared-link corpus is 16.9 percentage points more likely than WildChat to …
Figure 4
Figure 4: Geographic Variation. Notes: Left panel reports WildChat conversation-level political prevalence by world region. Right panel reports high-volume countries by political prevalence. Country labels should be read as descriptive corpus composition rather than population-level political interest. … left in all three corpora, while social positions lean mildly right, especially in LMSYS-Chat and ShareChat. Extreme…
Figure 5
Figure 5: Expressed Ideology. Notes: Distributions are calculated among stance-taking political conversations. … one left–right scale. This partial constraint is familiar from mass belief systems. Ordinary political expression often combines ideological cues, cross-cutting issue positions, and context-specific reasoning rather than a fully bundled ideology (Page and Shapiro 1992; Zaller 1992). The affective evidence sh…
Figure 5
Figure 5: Expressed Ideology. Notes: Distributions are calculated among stance-taking political conversations. The ideological dimensions do not collapse into a single scale. Economic and social scores are positively associated, but only modestly so: the correlations are 0.21 in WildChat, 0.36 in LMSYS-Chat, and 0.27 in ShareChat (…
Figure 6
Figure 6: Ideology Structure. Notes: Points show deterministic samples of up to 3,000 stance-taking conversations per corpus; correlations use all stance-taking conversations with non-missing scores on the plotted dimensions.
Figure 7
Figure 7: Result-Call Effects on Expression. Notes: Figure reports U.S. RDiT plots around the result-call cutoff for political prevalence, stance-taking, and affective charge. Full local-linear RDiT estimates are reported in Appendix Table F.1. The more robust result is expressive and affective: after the AP call, U.S. stance-taking conversations became more charged and more ideologically extreme. RD = 0.55*** se = 0…
Figure 7
Figure 7: Result-Call Effects on Expression. Notes: Figure reports U.S. RDiT plots around the result-call cutoff for political prevalence, stance-taking, and affective charge. Full local-linear RDiT estimates are reported in Appendix Table F.1. … whether users expressed positions, but also how they framed political disagreement. The finding is consistent with work showing that partisan conflict often operates through s…
Figure 8
Figure 8: Result-Call Effects on Ideology. Notes: Figure reports U.S. RDiT plots around the result-call cutoff for economic position and affective polarization. Full local-linear RDiT estimates for ideology outcomes are reported in Appendix Table F.2.
Figure 9
Figure 9: Ideology Estimates by Geography. Notes: Figure reports result-call RDiT estimates for ideology outcomes in the United States and the rest of the world. Corresponding numerical estimates are reported in Appendix Table F.2. Figures 7–9 fit a theory of political expression better than a theory of attention alone. Agenda-setting and priming explain why a high-salience event makes politics more likely to enter c…
Original abstract

Large language models (LLMs), a prominent form of artificial intelligence (AI), are becoming everyday interfaces for political questions, but most exchanges are dyadic rather than audience-facing. This paper asks whether AI conversation functions as a new arena for political expression or as a conversational intermediary for routine political demand. Using 4.30 million human-AI conversations from three large public datasets, we apply two validated classifiers to user messages, identifying political content, use case, and expressed ideology. Political content appears in 3.9% of conversations, varies sharply by platform publicness and conversation depth, and is mostly practical: users ask for information, draft text, and process documents far more often than they state opinions. A regression-discontinuity-in-time design around the 2024 U.S. presidential result call shows that the call changed the expressive subset: among U.S. users, stance-taking, affective language, and ideological extremity rose; comparable conversations elsewhere did not. AI conversation is less a public square than a conversational political intermediary, absorbing routine demand and becoming expressive when major events make political stakes explicit.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper analyzes 4.30 million human-AI conversations from three large public datasets using two validated classifiers to identify political content (found in 3.9% of conversations), use cases, and expressed ideology. It concludes that AI serves primarily as a conversational intermediary for routine practical tasks rather than a public square for expressive political discourse, with a regression-discontinuity-in-time design around the 2024 U.S. presidential election result call showing increased stance-taking, affective language, and ideological extremity among U.S. users but not elsewhere.

Significance. If the classifiers prove reliable and cross-platform consistent, the result would provide large-scale evidence that AI political use is mostly absorptive of routine demand and only shifts to expressive modes when major events raise stakes, with implications for theories of digital political communication and platform design.

major comments (3)
  1. [Abstract / Methods] Abstract and methods: the classifiers are described as 'validated' and used to produce the headline 3.9% political-content rate, use-case distributions, and ideology labels, yet no accuracy, precision/recall, inter-annotator agreement, or platform-stratified performance metrics are reported; without these, both the baseline prevalence and the RD discontinuity cannot be interpreted.
  2. [Results (RD section)] Results / RD design: the analysis restricts attention to U.S. users post-election call and reports a shift in the expressive subset, but provides no evidence on whether the user pool, conversation sampling, or topic composition changed discontinuously at the cutoff, leaving open selection effects that could drive the observed change in stance-taking and affect.
  3. [Data / Measurement] Data section: conversation depth and platform publicness are invoked to explain variation in political content, but the measurement rules for these variables (e.g., message count thresholds, platform classification criteria) are not specified, preventing assessment of whether the reported patterns are robust to alternative codings.
minor comments (2)
  1. [Data] The three source datasets are referred to as 'large public datasets' without naming them or providing links/DOIs in the main text; this should be added for reproducibility.
  2. [Figures/Tables] Figure captions and table notes should explicitly state the sample restrictions (e.g., U.S. users only) used for the RD plots.
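
The validation metrics requested in major comment 1 can be computed from paired human and classifier labels. The sketch below is an illustration, not the authors' validation code: the function names, the binary 0/1 label encoding, and the toy labels are assumptions. Cohen's kappa is shown here for classifier-versus-human agreement, and the same formula applies to agreement between two human annotators.

```python
def validation_metrics(gold, predicted):
    """Precision, recall, F1, and Cohen's kappa for binary labels (0/1).

    Assumes both classes occur in `gold` and in `predicted`; otherwise
    some denominators below are zero.
    """
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == 1 and p == 1)
    fp = sum(1 for g, p in pairs if g == 0 and p == 1)
    fn = sum(1 for g, p in pairs if g == 1 and p == 0)
    tn = sum(1 for g, p in pairs if g == 0 and p == 0)
    n = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    observed = (tp + tn) / n  # raw agreement
    # Agreement expected by chance from the two marginal distributions.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    return {"precision": precision, "recall": recall, "f1": f1, "kappa": kappa}


def metrics_by_platform(rows):
    """Platform-stratified metrics; rows are (platform, gold, predicted)."""
    grouped = {}
    for platform, g, p in rows:
        labels = grouped.setdefault(platform, ([], []))
        labels[0].append(g)
        labels[1].append(p)
    return {name: validation_metrics(g, p) for name, (g, p) in grouped.items()}


# Toy labels for illustration only (1 = political, 0 = not political).
gold = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
pred = [1, 1, 0, 0, 0, 0, 1, 0, 1, 0]
print(validation_metrics(gold, pred))
```

Reporting these per corpus via `metrics_by_platform` would address the referee's request for platform-stratified performance.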

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments, which highlight important areas for improving transparency and robustness. We respond to each major comment below.

Point-by-point responses
  1. Referee: [Abstract / Methods] Abstract and methods: the classifiers are described as 'validated' and used to produce the headline 3.9% political-content rate, use-case distributions, and ideology labels, yet no accuracy, precision/recall, inter-annotator agreement, or platform-stratified performance metrics are reported; without these, both the baseline prevalence and the RD discontinuity cannot be interpreted.

    Authors: We agree that explicit performance metrics are required for credible interpretation of the prevalence estimates and downstream results. The classifiers were validated on human-annotated held-out data, but these details were omitted from the initial submission. In the revision we will add a methods appendix reporting accuracy, precision, recall, F1, inter-annotator agreement (Cohen's kappa), and platform-stratified performance on the validation sets. revision: yes

  2. Referee: [Results (RD section)] Results / RD design: the analysis restricts attention to U.S. users post-election call and reports a shift in the expressive subset, but provides no evidence on whether the user pool, conversation sampling, or topic composition changed discontinuously at the cutoff, leaving open selection effects that could drive the observed change in stance-taking and affect.

    Authors: The concern about potential selection or composition shifts at the cutoff is valid and directly relevant to causal interpretation. In the revised manuscript we will report formal tests for discontinuities in conversation volume, inferred user characteristics, and topic distributions at the election-result cutoff, along with any robustness implications for the observed changes in stance-taking, affect, and extremity among U.S. users. revision: yes

  3. Referee: [Data / Measurement] Data section: conversation depth and platform publicness are invoked to explain variation in political content, but the measurement rules for these variables (e.g., message count thresholds, platform classification criteria) are not specified, preventing assessment of whether the reported patterns are robust to alternative codings.

    Authors: We accept that the operational definitions must be stated explicitly. The revision will define conversation depth as the total message count with the precise threshold used for 'deep' conversations, and platform publicness via the three datasets' documented sharing policies. We will also add sensitivity tables using alternative thresholds to demonstrate robustness of the reported patterns. revision: yes
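
The discontinuity checks promised in response 2 could start from something as simple as comparing conversation counts just before and just after the cutoff. The sketch below is a crude stand-in for a formal density test, with an assumed function name, an assumed window width, and hypothetical timestamps. A jump in volume at a news event is not by itself evidence of bias, so a check like this would complement composition checks and not replace them.

```python
import math


def volume_jump_z(timestamps, cutoff, window):
    """Crude check for a jump in conversation volume at the cutoff.

    Counts conversations in equal-width windows on each side and returns
    (n_left, n_right, z), where z is the normal-approximation statistic
    for the null that a conversation inside the two windows is equally
    likely to fall on either side of the cutoff.
    """
    n_left = sum(1 for t in timestamps if cutoff - window <= t < cutoff)
    n_right = sum(1 for t in timestamps if cutoff <= t < cutoff + window)
    n = n_left + n_right
    if n == 0:
        raise ValueError("no conversations inside the windows")
    return n_left, n_right, (n_right - n_left) / math.sqrt(n)


# Hypothetical timestamps in minutes relative to the cutoff.
balanced = list(range(-100, 100))
surge = balanced + [5] * 44  # 44 extra conversations just after the cutoff
print(volume_jump_z(balanced, cutoff=0, window=100))
print(volume_jump_z(surge, cutoff=0, window=100))
```

An absolute z value above roughly 1.96 would flag a volume change worth investigating before the stance-taking and affect estimates are read causally.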

Circularity Check

0 steps flagged

Empirical measurement study with external RD design; no derivation reduces to inputs

full rationale

The paper is a data-driven empirical analysis of 4.3M conversations using two classifiers and a regression-discontinuity design around the external 2024 U.S. election result call. No equations, fitted parameters renamed as predictions, self-citations, or ansatzes appear in the abstract or described methods. The central claims rest on observed frequencies and the external discontinuity rather than any self-referential construction. This matches the default expectation of a non-circular empirical paper.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Based on abstract only; no free parameters, invented entities, or explicit axioms stated beyond reliance on classifier accuracy and dataset representativeness.

axioms (2)
  • domain assumption: The two classifiers are validated and accurately detect political content, use case, and ideology in user messages.
    Abstract states 'two validated classifiers' without reporting validation metrics or error rates.
  • domain assumption: The three public datasets are representative of typical human-AI political interactions.
    Abstract uses the datasets to draw general conclusions about AI conversation without discussing selection or coverage.

pith-pipeline@v0.9.1-grok · 5709 in / 1307 out tokens · 20030 ms · 2026-07-03T18:10:02.189705+00:00 · methodology

