Talking Politics with Artificial Intelligence
Pith reviewed 2026-07-03 18:10 UTC · model grok-4.3
The pith
AI conversations act as practical intermediaries for routine political tasks rather than arenas for public expression.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Political content appears in 3.9% of the conversations and is overwhelmingly practical, with users seeking information, drafting text, and processing documents far more often than stating opinions. The share of political content varies by platform publicness and conversation depth. A regression-discontinuity-in-time design around the 2024 U.S. presidential result call shows that the call increased stance-taking, affective language, and ideological extremity among U.S. users, with no comparable change in conversations elsewhere. The overall finding is that AI conversation functions as a conversational political intermediary rather than a public square.
What carries the argument
Two validated classifiers that tag user messages for political content, use case, and expressed ideology, paired with a regression-discontinuity-in-time design around the 2024 U.S. election result call.
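As a rough illustration of the regression-discontinuity-in-time idea, the sketch below fits a local linear model on each side of a cutoff and reads the jump from the coefficient on a post-cutoff indicator. The data are synthetic, and the function name `rdit_jump`, the bandwidth, and the daily outcome series are assumptions made for illustration; the paper's actual specification is not given in this summary.

```python
import numpy as np

def rdit_jump(days, outcome, bandwidth):
    """Estimate the jump at day 0 with a local linear fit on each side.

    days: running variable, days relative to the event (0 = cutoff).
    outcome: daily outcome, e.g. share of messages that take a stance.
    bandwidth: keep only observations with |days| <= bandwidth.
    """
    days = np.asarray(days, dtype=float)
    outcome = np.asarray(outcome, dtype=float)
    keep = np.abs(days) <= bandwidth
    t, y = days[keep], outcome[keep]
    post = (t >= 0).astype(float)
    # Columns: intercept, post indicator, pre-trend, change in trend after cutoff.
    X = np.column_stack([np.ones_like(t), post, t, post * t])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(coef[1])  # estimated discontinuity at the cutoff

# Synthetic series (hypothetical): mild trend plus a true jump of 0.05 at day 0.
rng = np.random.default_rng(0)
days = np.arange(-30, 31)
outcome = 0.10 + 0.0005 * days + 0.05 * (days >= 0) + rng.normal(0, 0.005, days.size)
estimate = rdit_jump(days, outcome, bandwidth=30)
print(round(estimate, 3))
```

A real analysis would also need standard errors, bandwidth checks, and the composition tests the referee asks for further down; this sketch only shows where the jump estimate comes from.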
If this is right
- Political expression in AI conversations rises sharply after major events such as election results.
- The large majority of political use remains instrumental rather than opinion-oriented.
- Platform publicness and conversation length strongly shape how often political topics appear.
- AI absorbs routine political demand until events make stakes explicit enough to trigger expressive shifts.
Where Pith is reading between the lines
- Private AI chats may reduce visible public political discussion by handling many queries one-on-one.
- Designers of political monitoring systems would need to include AI channels to capture the full picture during events.
- The same intermediary pattern might appear in other domains such as health or financial advice during crises.
- Longer conversations could be tested as a lever to increase the share of political content in future studies.
Load-bearing premise
The two classifiers correctly and consistently identify political content, use case, and ideology across the three datasets without meaningful platform-specific bias.
What would settle it
Reapplication of the classifiers to the same or new conversation data that produces substantially different rates of political content or ideology identification across platforms would undermine the prevalence and event-response findings.
Original abstract
Large language models (LLMs), a prominent form of artificial intelligence (AI), are becoming everyday interfaces for political questions, but most exchanges are dyadic rather than audience-facing. This paper asks whether AI conversation functions as a new arena for political expression or as a conversational intermediary for routine political demand. Using 4.30 million human-AI conversations from three large public datasets, we apply two validated classifiers to user messages, identifying political content, use case, and expressed ideology. Political content appears in 3.9% of conversations, varies sharply by platform publicness and conversation depth, and is mostly practical: users ask for information, draft text, and process documents far more often than they state opinions. A regression-discontinuity-in-time design around the 2024 U.S. presidential result call shows that the call changed the expressive subset: among U.S. users, stance-taking, affective language, and ideological extremity rose; comparable conversations elsewhere did not. AI conversation is less a public square than a conversational political intermediary, absorbing routine demand and becoming expressive when major events make political stakes explicit.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper analyzes 4.30 million human-AI conversations from three large public datasets using two validated classifiers to identify political content (found in 3.9% of conversations), use cases, and expressed ideology. It concludes that AI serves primarily as a conversational intermediary for routine practical tasks rather than a public square for expressive political discourse, with a regression-discontinuity-in-time design around the 2024 U.S. presidential election result call showing increased stance-taking, affective language, and ideological extremity among U.S. users but not elsewhere.
Significance. If the classifiers prove reliable and cross-platform consistent, the result would provide large-scale evidence that AI political use is mostly absorptive of routine demand and only shifts to expressive modes when major events raise stakes, with implications for theories of digital political communication and platform design.
major comments (3)
- [Abstract / Methods] Abstract and methods: the classifiers are described as 'validated' and used to produce the headline 3.9% political-content rate, use-case distributions, and ideology labels, yet no accuracy, precision/recall, inter-annotator agreement, or platform-stratified performance metrics are reported; without these, both the baseline prevalence and the RD discontinuity cannot be interpreted.
- [Results (RD section)] Results / RD design: the analysis restricts attention to U.S. users post-election call and reports a shift in the expressive subset, but provides no evidence on whether the user pool, conversation sampling, or topic composition changed discontinuously at the cutoff, leaving open selection effects that could drive the observed change in stance-taking and affect.
- [Data / Measurement] Data section: conversation depth and platform publicness are invoked to explain variation in political content, but the measurement rules for these variables (e.g., message count thresholds, platform classification criteria) are not specified, preventing assessment of whether the reported patterns are robust to alternative codings.
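The validation metrics requested in the first major comment are straightforward to compute once a human-labeled sample exists. The sketch below, in plain Python, derives precision, recall, F1, and Cohen's kappa for a binary political/non-political label, stratified by platform. The platform names and labels are invented for illustration and do not come from the paper.

```python
from collections import defaultdict

def binary_metrics(gold, pred):
    """Precision, recall, F1, and Cohen's kappa for paired 0/1 labels."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    tn = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 0)
    n = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    observed = (tp + tn) / n
    # Chance agreement from the marginal label rates of the two raters.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (observed - expected) / (1 - expected) if expected < 1 else 1.0
    return {"precision": precision, "recall": recall, "f1": f1, "kappa": kappa}

def metrics_by_platform(rows):
    """rows: iterable of (platform, gold_label, predicted_label)."""
    grouped = defaultdict(lambda: ([], []))
    for platform, gold, pred in rows:
        grouped[platform][0].append(gold)
        grouped[platform][1].append(pred)
    return {name: binary_metrics(g, p) for name, (g, p) in grouped.items()}

# Hypothetical validation sample: (platform, human label, classifier label).
rows = [
    ("platform_a", 1, 1), ("platform_a", 0, 0),
    ("platform_a", 0, 1), ("platform_a", 1, 1),
    ("platform_b", 1, 0), ("platform_b", 0, 0),
    ("platform_b", 1, 1), ("platform_b", 0, 0),
]
report = metrics_by_platform(rows)
for platform, scores in report.items():
    print(platform, {k: round(v, 2) for k, v in scores.items()})
```

The same `binary_metrics` function can be applied to two human annotators' labels to obtain inter-annotator kappa, which is a different quantity from classifier-versus-human agreement.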
minor comments (2)
- [Data] The three source datasets are referred to as 'large public datasets' without naming them or providing links/DOIs in the main text; this should be added for reproducibility.
- [Figures/Tables] Figure captions and table notes should explicitly state the sample restrictions (e.g., U.S. users only) used for the RD plots.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which highlight important areas for improving transparency and robustness. We respond to each major comment below.
Point-by-point responses
- Referee: [Abstract / Methods] Abstract and methods: the classifiers are described as 'validated' and used to produce the headline 3.9% political-content rate, use-case distributions, and ideology labels, yet no accuracy, precision/recall, inter-annotator agreement, or platform-stratified performance metrics are reported; without these, both the baseline prevalence and the RD discontinuity cannot be interpreted.
  Authors: We agree that explicit performance metrics are required for credible interpretation of the prevalence estimates and downstream results. The classifiers were validated on human-annotated held-out data, but these details were omitted from the initial submission. In the revision we will add a methods appendix reporting accuracy, precision, recall, F1, inter-annotator agreement (Cohen's kappa), and platform-stratified performance on the validation sets.
  Revision: yes
- Referee: [Results (RD section)] Results / RD design: the analysis restricts attention to U.S. users post-election call and reports a shift in the expressive subset, but provides no evidence on whether the user pool, conversation sampling, or topic composition changed discontinuously at the cutoff, leaving open selection effects that could drive the observed change in stance-taking and affect.
  Authors: The concern about potential selection or composition shifts at the cutoff is valid and directly relevant to causal interpretation. In the revised manuscript we will report formal tests for discontinuities in conversation volume, inferred user characteristics, and topic distributions at the election-result cutoff, along with any robustness implications for the observed changes in stance-taking, affect, and extremity among U.S. users.
  Revision: yes
- Referee: [Data / Measurement] Data section: conversation depth and platform publicness are invoked to explain variation in political content, but the measurement rules for these variables (e.g., message count thresholds, platform classification criteria) are not specified, preventing assessment of whether the reported patterns are robust to alternative codings.
  Authors: We accept that the operational definitions must be stated explicitly. The revision will define conversation depth as the total message count with the precise threshold used for 'deep' conversations, and platform publicness via the three datasets' documented sharing policies. We will also add sensitivity tables using alternative thresholds to demonstrate robustness of the reported patterns.
  Revision: yes
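One way to picture the sensitivity tables promised in the last response is to recompute the political share of "shallow" and "deep" conversations under several message-count thresholds. The sketch below uses invented records and thresholds; the paper's actual cutoff and coding rules are not stated in this summary, and `share_by_depth` is a hypothetical helper.

```python
def share_by_depth(conversations, threshold):
    """Political share among shallow (< threshold) and deep (>= threshold) chats.

    conversations: list of (message_count, is_political) pairs.
    Returns None for a bucket that contains no conversations.
    """
    counts = {"shallow": [0, 0], "deep": [0, 0]}  # [political, total]
    for message_count, is_political in conversations:
        bucket = "deep" if message_count >= threshold else "shallow"
        counts[bucket][0] += int(is_political)
        counts[bucket][1] += 1
    return {name: (political / total if total else None)
            for name, (political, total) in counts.items()}

# Hypothetical records: (number of messages, flagged as political).
conversations = [
    (2, False), (2, False), (4, True), (6, False),
    (10, True), (12, True), (3, False), (8, False),
]
sensitivity = {t: share_by_depth(conversations, t) for t in (4, 6, 10)}
for threshold, shares in sensitivity.items():
    print(threshold, shares)
```

If the deep-versus-shallow gap keeps its sign across plausible thresholds, the pattern is less likely to be an artifact of one coding choice.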
Circularity Check
Empirical measurement study with external RD design; no derivation reduces to inputs
full rationale
The paper is a data-driven empirical analysis of 4.3M conversations using two classifiers and a regression-discontinuity design around the external 2024 U.S. election result call. No equations, fitted parameters renamed as predictions, self-citations, or ansatzes appear in the abstract or described methods. The central claims rest on observed frequencies and the external discontinuity rather than any self-referential construction. This matches the default expectation of a non-circular empirical paper.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: The two classifiers are validated and accurately detect political content, use case, and ideology in user messages.
- domain assumption: The three public datasets are representative of typical human-AI political interactions.
Reference graph
Works this paper leans on
- [1] Administrative Burden: Learning, Psychological, and Compliance Costs in Citizen-State Interactions
  https://about.fb.com/news/2021/02/reducing-political-content-in-news-feed/. Mettler, Suzanne. 2011. The Submerged State: How Invisible Government Policies Undermine American Democracy. Chicago: University of Chicago Press. Moynihan, Donald, Pamela Herd, and Hope Harvey. 2015. “Administrative Burden: Learning, Psychological, and Compliance Costs in Citizen-S...
- [2] Confirm whether the text is genuinely political or civic in substance. If it now looks like a false positive or too borderline, set is_valid_political=false and primary_subtopic=non_political
- [3] The construct is BROAD: real-world public power, governance, elections, parties, public policy, law, courts, rights, public administration and government services, welfare, taxation, immigration, war, diplomacy, national security, social movements, AND practical civic use (drafting complaints to government, processing official documents, asking about bene...
- [4] primary_subtopic should capture the main political issue area.
- [5] use_case_type should capture what the user is doing with the LLM
- [6] Use information_seeking for factual questions, explanations, or requests to understand politics, policy, law, government, or public affairs
- [7] Use opinion_or_argument for debate, persuasion, ideological positioning, or evaluative political discussion
- [8] Use writing_or_editing for drafting or revising essays, speeches, letters, statements, or applications with political or civic substance
- [9] Use document_processing for summarization, extraction, translation, keywording, or structured processing of political, legal, policy, court, or government texts
- [10] Use civic_or_legal_help for practical help involving government services, public benefits, immigration, complaints to officials, tax, social security, courts, or administrative procedures
- [11] Use party_organizational_life for party-membership applications, ideological-study materials, democratic-life-meeting materials, and related party organizational writing
- [12] Use mobilization_or_collective_action for protest, petitions, campaigning, organizing, outreach, or movement coordination
- [13] If is_valid_political=false, still choose the closest use_case_type based on the text
- [14] Confidence must be between 0 and 1
- [15] If uncertain, set needs_human_review=true. Listing 3: Ideology prompt. You score the political IDEOLOGY expressed in one user message sent to an AI chatbot. Judge ONLY the user’s own words and the stance they reveal, not the topic in the abstract and not what a "correct" answer would be. IMPORTANT: the message may itself be a task, instruction, or request....
- [16] Only score an axis when the text gives evidence on it; if an axis is not expressed, you may set it to 0 only if the user is clearly centrist on it, otherwise leave the OVERALL stance via stance_present but still give your best integer estimate for axes that are expressed and 0 for axes with no signal
- [17] Do not infer ideology purely from the topic. Asking "what is fascism?" is NOT a stance; praising or condemning it IS
- [18] Be conservative: when the user is genuinely just seeking information, set stance_present=false.
- [19] Judge the stance in the language as written; the message may be in any language or an English translation of one. For Stage 2 and ideology scoring, the item text was inserted into the following wrapper so that embedded requests were treated as text to classify rather than as actions to perform: Listing 4: Per-message wrapper used after the high-recall scr...
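The output rules quoted in the prompt excerpts above (a fixed set of use_case_type labels, a confidence between 0 and 1, and primary_subtopic=non_political for false positives) can be sketched as a small validation check. The field names are taken from the excerpts, but the flat dictionary shape, the `validate_label` helper, and the assumption that the seven quoted labels are the full set are illustrative guesses; the complete prompt is truncated here and may define more.

```python
# Labels quoted in the excerpts; the full prompt may define others.
USE_CASE_TYPES = {
    "information_seeking",
    "opinion_or_argument",
    "writing_or_editing",
    "document_processing",
    "civic_or_legal_help",
    "party_organizational_life",
    "mobilization_or_collective_action",
}

def validate_label(record):
    """Return a list of problems found in one classifier output record."""
    problems = []
    if not isinstance(record.get("is_valid_political"), bool):
        problems.append("is_valid_political must be a boolean")
    # A use_case_type is expected even when is_valid_political is false.
    if record.get("use_case_type") not in USE_CASE_TYPES:
        problems.append("use_case_type is not one of the quoted labels")
    confidence = record.get("confidence")
    is_number = isinstance(confidence, (int, float)) and not isinstance(confidence, bool)
    if not is_number or not 0 <= confidence <= 1:
        problems.append("confidence must be a number between 0 and 1")
    if (record.get("is_valid_political") is False
            and record.get("primary_subtopic") != "non_political"):
        problems.append("false positives should use primary_subtopic=non_political")
    return problems

# Hypothetical records, not taken from the paper's data.
good = {
    "is_valid_political": True,
    "primary_subtopic": "immigration",
    "use_case_type": "civic_or_legal_help",
    "confidence": 0.82,
    "needs_human_review": False,
}
bad = {
    "is_valid_political": False,
    "primary_subtopic": "immigration",
    "use_case_type": "chitchat",
    "confidence": 1.4,
}
print(validate_label(good))
print(validate_label(bad))
```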