Selecting the right electricity market region for a hyperscale AI datacenter requires reasoning across live electricity prices, grid carbon intensity, technology cost trajectories, and causal grid dynamics -- a multi-step, multi-source analytical task that static knowledge benchmarks cannot evaluate. We introduce EnergyAgentBench, the first agentic benchmark grounded in live electricity market data for this problem class. The benchmark comprises 70 task variants across five families: datacenter siting under cost-carbon trade-offs (F1), long-horizon portfolio siting (F1-LH), lifetime LCOE ranking over multi-decade cost trajectories (F2), 30-year portfolio optimization (F2-LH), and causal grid diagnosis (F3). Tasks require 3 to 48 sequential tool calls against live endpoints from the QuarluxAI infrastructure platform, the U.S. Energy Information Administration (EIA), and the National Renewable Energy Laboratory (NREL) with ground truth derived from trained XGBoost cost-surface models (R^2 0.967--0.995) and the NREL Annual Technology Baseline 2024. We evaluate nine models across Anthropic, OpenAI, and HuggingFace over 1,414 runs at three random seeds. Claude Sonnet 4.6 achieves the highest overall score (0.900) at one-quarter the cost of Claude Opus 4.7 (0.889). Claude Haiku 4.5 leads on long-horizon procedural siting (0.986), outperforming all frontier models including those costing 16x more per run. F3 Causal is the most discriminating family, with a 30.7-point spread between Sonnet (0.793) and Llama 3.3 70B (0.486), versus a 6.6-point spread on F1 Siting. A failure taxonomy of 135 coded failures identifies null-value integration in NREL ATB trajectories as the dominant failure mode (70%), followed by premature commitment on causal tasks (20%) and adversarial injection blindness (6%). Benchmark code, run trajectories, and the failure taxonomy dataset are publicly released.
We argue that AI-saturated markets are likely to create Veblen-good premiums, which we term human-provenance premiums, for verified human presence, and hence AI governance should treat human-provenance verification as labor infrastructure. Generative and agentic AI systems lower the cost of many standardized cognitive, creative, and coordination tasks, weakening the scarcity premiums that have supported much middle-tier knowledge work. We argue that this pressure may produce an asymmetric barbell-shaped structure of value capture in advanced economies: high-volume synthetic production controlled by owners of AI infrastructure at one pole, and scarce, high-status human labor valued for verified human presence at the other.
We advance three claims. First, AI compresses the value of standardized middle-tier labor by making good-enough synthetic substitutes scalable at low marginal cost, hollowing out the middle of the skill distribution currently occupied by knowledge work. Second, this compression reallocates demand for human labor toward work valued for its visible human character. We term this performative humanity and distinguish three forms of labor: relational presence, aesthetic provenance, and accountability. Third, as these premiums depend on credible verification, AI governance should treat human-provenance systems as labor infrastructure rather than as luxury authenticity labels.
To evaluate hybrid human-AI work, we propose constitutive human presence as the relevant standard: human labor retains premium value when human judgment, attention, accountability, authorship, or relational participation is not incidental to the output but constitutive of what is being purchased.
Consumers are increasingly delegating purchase decisions to AI agents, providing natural-language descriptions of their preferences and identity. We argue that these representations constitute an information channel, role coherence, through which sellers can infer willingness to pay without explicit disclosure by the buyer agent, leading to preference leakage. In an experiment where a language-model buyer agent shops on behalf of a verbal consumer profile, we show that seller-side inference from dialogue alone recovers willingness to pay nearly one-for-one. Comparing this setting to a numeric-budget condition with confidentiality instructions cleanly isolates role coherence as distinct from instruction-following failure. Because this leakage arises from delegation itself, it cannot be mitigated at the prompt level. Instead, we propose architectural interventions that trade off personalization against preference privacy.
We introduce endogenous shareholding auctions for production economies where a monopolist must elicit consumer demand in order to determine price and quantity. Each of these auctions has the property that the auction's profit is distributed across the monopolist and the consumers in accordance with ownership shares that are determined over the course of the auction. We characterize this class, and a larger class, on the basis of standard axioms. Finally, we investigate optimal auctions according to both prior-free domination and subjective expected welfare.
A common experimental research design is one in which individuals are randomly allocated into groups that then interact under different group-level treatment conditions. We develop design-based inference for such "group interaction" experiments, covering scenarios in which groups are either fixed or randomly formed and in which potential outcomes are either fixed relative to others' group assignments or subject to interference. For each scenario, we characterize the causal estimand that the design targets and the inferential strategy appropriate to it. Working in a sparse-sampling asymptotic regime, we show that cluster-robust inference remains consistent and accounts for dependencies from various sources when interference is present, delivering valid inference on marginalized exposure effects. When interference is absent and groups are formed randomly, the design reduces to an individually randomized experiment, and individual-level heteroskedasticity-robust inference suffices for the average treatment effect. Our results on the asymptotic distribution of commonly used estimators rely on a novel coupling strategy that may be useful for design-based inference in other complex experiments.
When an early information producer is judged only after others have reviewed and revised the work, the same review that sharpens the final decision can blur the question of who deserves the credit. This paper asks how an organization can still reward careful early work once a chain of later reviewers has acted on it. In the model, an analyst's hidden effort makes an initial report more likely to be right; a sequence of reviewers then reacts to it; and the organization can pay only on the record this process leaves behind. The main result splits the value of any such record into two parts: how much effort improved the first report, and how well the final record still indicates whether the first report was accurate. When later reviewers can overturn a flawed report, review improves decisions but washes the analyst's effort out of the final outcome; therefore, rewarding the final outcome stops working, while rewarding agreement between the first and last word has incentive value instead. When the first report is simply copied by downstream reviewers, the reverse holds. Which reward is better comes down to one thing: how likely the review process is to repair an initial mistake.
I develop the asymptotic theory of instrument strength for Granular Instrumental Variables (GIV) in large panels with both $N$ and $T$ growing. The strength of the GIV depends on the presence of dominant units. I formalise what dominance means and characterise three regimes of instrument strength. When a few units dominate the aggregate, the instrument is strong. The GIV estimator is consistent and asymptotically normal at the standard $\sqrt{T}$ rate. When large units stand out but do not dominate, the instrument weakens. But I show that the parameter of interest remains recoverable. The GIV estimator remains consistent and asymptotically normal, now at a rate slower than $\sqrt{T}$. When units are comparable in size and none stands out, the instrument is weak in the standard sense. The GIV estimator is inconsistent and has a non-standard distribution. Wald inference is reliable only outside the weak regime. When the instrument is weak, I recommend Anderson-Rubin confidence sets. In practice, the instrument must be constructed in a first stage. I show that the feasible estimator attains the same rate, but its asymptotic variance picks up an additional term from the first-stage estimation. Valid inference must use standard errors that account for this term. I apply the GIV estimator with the correct standard errors to recover the short-run demand elasticities of three commodities: refined copper, crude oil, and natural gas.
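To make the role of dominant units concrete, the sketch below builds a granular instrument in the form commonly used in the GIV literature: the size-weighted average of unit-level idiosyncratic shocks minus their equal-weighted average. The abstract does not spell out the construction, so this form, the function name `granular_instrument`, and the treatment of the shocks as already known (in practice they come from a first stage) are all assumptions made for illustration.

```python
def granular_instrument(shares, shocks_by_period):
    """Size-weighted minus equal-weighted average of unit shocks, per period.

    shares: non-negative size weights, one per unit (normalised here to sum to 1).
    shocks_by_period: list of periods, each a list of unit-level shocks.
    Illustrative assumption: the shocks are taken as given rather than estimated.
    """
    total = sum(shares)
    if total <= 0:
        raise ValueError("shares must have a positive sum")
    weights = [s / total for s in shares]
    n = len(weights)

    instrument = []
    for shocks in shocks_by_period:
        if len(shocks) != n:
            raise ValueError("each period needs one shock per unit")
        size_weighted = sum(w * y for w, y in zip(weights, shocks))
        equal_weighted = sum(shocks) / n
        instrument.append(size_weighted - equal_weighted)
    return instrument
```

When all units have equal shares, the two averages coincide and the instrument is identically zero. This is a mechanical counterpart of the weak regime described above, in which no unit stands out.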
Measuring the long-term opportunity cost of interventions remains a critical challenge in e-commerce A/B testing. While strategic levers (such as dynamic pricing, ranking algorithms, and promotional campaigns) trigger shifts in consumer behaviour that persist over months, operational constraints necessitate fast decision-making cycles that are typically limited to weekly experimental windows. Standard metrics like revenue and conversion are inherently short-sighted, biasing decisions toward immediate gains. We introduce Stock Lifetime Value (SLV), a stock-centric metric that captures long-term opportunity cost within short experiments by aggregating expected profit from current inventory through the end of its selling lifecycle. We develop the methodology in the context of fashion e-commerce at Zalando, where stock constraints and seasonal lifecycles make the trade-off between short-term and long-term outcomes particularly relevant. SLV thereby provides a way to evaluate interventions against their true profit impact. We discuss three applications: (a) SLV efficiency as a metric for article-level and customer-level A/B tests, validated against realized 18-month lifecycle outcomes; (b) SLV as an optimization target for pricing algorithms, aligning the metric used for measurement with the objective used for decision-making; and (c) a framework for annualizing treatment effects into financial reporting metrics required by business stakeholders. While our empirical setting is fashion retail, the framework applies broadly to any inventory-constrained environment where value decays over time or interventions shift demand across periods.
We present a multi-agent system for studying the allocation of discrete, congested resources among heterogeneous strategic agents, motivated by the problem of railway slot allocation under deregulation. Multiple operator-agents, differing in size and capacity, interact through a shared auction mechanism over repeated rounds under time-constrained decision-making. The mechanism combines a congestion-based base price that increases with aggregate demand with an asymmetric corrective adjustment that penalises the agent requesting the most slots and rewards the agent requesting the fewest, and is designed to mitigate strategic dominance by large agents while preserving transparency and congestion sensitivity. We formulate the interaction as a repeated game with incomplete information and implement the system as a real-time, web-based multi-agent environment in which human participants control individual agents and observe live marginal-cost and competitor feedback.
We report exploratory observations from two structured sessions with domain experts acting as operator-agents. The congestion mechanism responds to aggregate demand as designed and the corrective incentives are actively triggered, but agents representing large operators persist with high-request strategies despite the penalty, suggesting that corrective pricing is necessary but not sufficient to neutralise strategic dominance in this multi-agent setting. A post-session debrief indicates that participants' decisions were driven by the assumed agent role rather than personal disposition, and provides qualitative support for strategic motives, such as preserving market presence and raising rivals' costs, operating alongside short-term profit maximisation. We discuss implications for multi-agent mechanism design under asymmetric budgets and outline directions for analytical validation and larger-scale multi-agent experiments.
Average wages in Japan rose until the mid-1990s but stagnated thereafter. This paper studies Japan's long-run wage stagnation by decomposing changes in average log real hourly wages from 1980 to 2024 into four components: demographic change across worker types, changes in relative employment shares across job types, changes in relative wages across job types, and wage growth within job types. The framework combines a shift-share decomposition across worker types with an extension of the Olley-Pakes decomposition that separates employment reallocation from changes in relative wages across job types. Wage growth within job types contributes positively over the full sample period, but demographic change and employment reallocation partly offset it. Between 1996 and 2014, all four components are negative. The negative contribution from employment reallocation is not limited to the expansion of part-time employment, but reflects broader shifts across job types defined by employment type, establishment size, and industry.
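As a point of reference for the decomposition described above, the following sketch implements the textbook static Olley-Pakes identity: a share-weighted average equals the unweighted mean across job types plus a covariance-type term between employment shares and wages. It is not the paper's extension, which further separates employment reallocation from relative-wage changes over time; the function name and inputs are hypothetical.

```python
def olley_pakes(shares, wages):
    """Split a share-weighted mean wage into (unweighted mean, covariance term).

    shares: employment weights by job type (normalised here to sum to 1).
    wages: (log) wage by job type, in the same order.
    The weighted mean equals the sum of the two returned components.
    """
    n = len(shares)
    if n == 0 or n != len(wages):
        raise ValueError("shares and wages must be non-empty and equal in length")
    total = sum(shares)
    if total <= 0:
        raise ValueError("shares must have a positive sum")
    s = [x / total for x in shares]

    mean_wage = sum(wages) / n   # unweighted mean across job types
    mean_share = 1.0 / n         # normalised shares average to 1/n
    covariance_term = sum(
        (si - mean_share) * (wi - mean_wage) for si, wi in zip(s, wages)
    )
    return mean_wage, covariance_term
```

A positive covariance term means that employment is concentrated in higher-wage job types. Changes in that term over time are the kind of movement that a reallocation component is meant to capture.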
This article introduces an empirical condition for the nonparametric point-identification of multivariate instrumental variable models with continuous endogenous variables using binary instruments. Verifying this condition can confirm point-identification in settings in which traditional approaches are not applicable. In particular, it shows that nonlinear instrumental variable models with general heterogeneity can be point-identified with only a binary instrument. This generalizes existing identification results which either restrict the unobserved heterogeneity substantially or require the instrument to have a large support. The main assumption on the instrumental variable model is cyclic monotonicity of its first stage, a multivariate generalization of the classical rank-invariance assumption for univariate models. Asymptotic convergence results for the empirical observable distributions are derived that allow the condition to be checked in practice. The identification rests on a fixed-set convergence result of cyclically monotone maps between quasi-concave functions. The corrigendum corrects the proof of Lemma 1. The proof given there incorrectly identifies preservation of distributional level sets with preservation of the underlying probability measure via Brenier maps. We replace that argument by one based on inverse Brenier maps, which play the role of multivariate ranks. The corrected argument applies to a different but significantly more flexible class of distributions than the quasi-concave class considered in the original paper. In particular, it allows for smooth non-quasi-concave and multimodal densities on compact supports, provided the associated rank fixed set satisfies a nondegeneracy condition. Moreover, it is generically satisfied for smooth parametric classes of distributions.
We estimate Kyle's (1985) price-impact coefficient $\lambda$ directly from daily equity order flow and test its ability to forecast the cross-section of subsequent stock returns. Using CRSP data from 2020 to 2025, we construct firm-month measures of signed order flow and two estimators of $\hat\lambda_{it}$: a within-month price-impact regression and an Amihud-style ratio. Signed order flow strongly predicts contemporaneous and one-month-ahead returns, while volume volatility predicts lower subsequent returns, consistent with widening price impact degrading price discovery. Fama-MacBeth regressions confirm that our order-flow signal carries significant cross-sectional return information after Newey--West adjustment. Theoretically, we resolve the liquidity premium puzzle of Constantinides (1986) through an adverse-selection mechanism: low order flow widens $\lambda$ and depresses prices today; subsequent normalization restores prices, generating the illiquidity premium without risk-based compensation.
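The two estimator families named above can be sketched in a few lines. The abstract does not give the exact specifications, so the following assumes the simplest versions: an OLS slope of daily returns on signed order flow within a firm-month, and the standard Amihud ratio (the average of absolute return over dollar volume). Function names and inputs are hypothetical.

```python
def price_impact_slope(returns, signed_flow):
    """OLS slope of daily returns on signed order flow within one firm-month.

    Assumed specification: r_t = a + lambda * flow_t + e_t.
    """
    n = len(returns)
    if n < 2 or n != len(signed_flow):
        raise ValueError("need at least two matched observations")
    mean_x = sum(signed_flow) / n
    mean_y = sum(returns) / n
    sxx = sum((x - mean_x) ** 2 for x in signed_flow)
    if sxx == 0:
        raise ValueError("signed order flow has no variation")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(signed_flow, returns))
    return sxy / sxx


def amihud_ratio(returns, dollar_volumes):
    """Average of |return| / dollar volume over days with positive volume."""
    ratios = [abs(r) / v for r, v in zip(returns, dollar_volumes) if v > 0]
    if not ratios:
        raise ValueError("no days with positive dollar volume")
    return sum(ratios) / len(ratios)
```

Both functions return a single number per firm-month. A larger value indicates that a given amount of trading moves the price more.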
Law-invariance, monotonicity under vector dominance, and background-risk invariance force this exact structure.
Suppose we want to assign a certainty equivalent--one number--to a multivariate risk. Which such assignments are law-invariant, monotone with respect to vector stochastic dominance, and invariant to independent background risk? I show that every such certainty equivalent is a positive mixture of scalar entropic certainty equivalents applied to positive projections of the vector risk. The same representation yields a robust-order characterization: unanimity across such certainty equivalents is equivalent, up to closure, to dominance after adding independent multidimensional background risk. In a social-welfare specialization, the corresponding shadow valuations are welfare weights.
We examine the economic impact of increasingly productive AI and policies that spread its benefits across the economy. Improvements in AI productivity trigger labor reallocation and changes in absolute and relative wages for different types of labor. Wages of labor that is essential for building AI increase faster than overall GDP. Wages of labor that is substituted for by AI decrease in both absolute and relative terms. Wages of labor that is used only in final goods production and is not displaced by AI increase in line with overall GDP. We contrast the impact of productivity gains depending on whether AI production is competitive or monopolistic. Monopoly production of AI restricts its deployment, slowing the transition and impact of AI. Optimal tax and regulatory policies that achieve Pareto-improvements differ depending on whether there is competition in AI production.
This paper estimates the effect of cross-border transmission constraints on suspected market power abuse in the German wholesale electricity market. Using a 2SRI instrumental variables approach, we study suspected strategic behavior by German gas- and coal-fired power plants in 2022-2024. Cross-border transmission constraints are measured using the maximum and minimum bounds of zonal net position, while suspected market power abuse is measured as the upward or downward deviation of observed dispatch from a modeled competitive benchmark. We find that transmission constraints significantly elevate the likelihood of suspected market power abuse. When headroom for further imports is already scarce, reducing import headroom by one Gigawatt (GW) increases the odds of suspected capacity withholding by 15%. Similarly, reducing export headroom by one GW when it is scarce increases the odds of suspected capacity push-in, a strategy to depress prices, by 16%. These results provide empirical support for interconnection expansion as an instrument to mitigate market power.
A generalized decomposition lemma isolates the relevant dominance relations and yields a unique set computable by standard tools.
This paper studies the structure and computation of von Neumann-Morgenstern (vNM) stable sets in one-to-one matching markets. While pairwise stability and corewise stability coincide under strict preferences and provide a well-understood benchmark, vNM stability is defined through dominance relations among sets of matchings and remains considerably more difficult to characterize. A key contribution of the paper is a generalization of the classical Decomposition Lemma. We show that the structural decomposition traditionally used to compare stable matchings extends to any pair of matchings belonging to the same internally stable set. This result reveals a previously unexplored connection between internal stability and the cycle structure underlying matching markets. Building on this characterization, we identify the relationships that are relevant for dominance-based stability and derive a reduced environment that concentrates all undominated outcomes. Our main result shows that the vNM stable set is unique and admits a simple characterization in terms of the core of this reduced environment. The characterization provides both structural insight and a constructive procedure for computing the vNM stable set using standard matching theoretic tools.
Large language models (LLMs), a prominent form of artificial intelligence (AI), are becoming everyday interfaces for political questions, but most exchanges are dyadic rather than audience-facing. This paper asks whether AI conversation functions as a new arena for political expression or as a conversational intermediary for routine political demand. Using 4.30 million human-AI conversations from three large public datasets, we apply two validated classifiers to user messages, identifying political content, use case, and expressed ideology. Political content appears in 3.9% of conversations, varies sharply by platform publicness and conversation depth, and is mostly practical: users ask for information, draft text, and process documents far more often than they state opinions. A regression-discontinuity-in-time design around the 2024 U.S. presidential result call shows that the call changed the expressive subset: among U.S. users, stance-taking, affective language, and ideological extremity rose; comparable conversations elsewhere did not. AI conversation is less a public square than a conversational political intermediary, absorbing routine demand and becoming expressive when major events make political stakes explicit.
Centralized hydrothermal planning models determine generation schedules and electricity spot prices based on inflow forecasts in audited-cost power systems, such as those prevalent in Latin America, and provide operational benchmarks and decision support in hydro-dominated competitive electricity markets. Consequently, biased forecasts can propagate directly into both operational decisions and market outcomes. This paper studies how persistent optimistic inflow-forecast bias propagates through the Brazilian hydrothermal power system and market. For a stylized hydrothermal model, we show analytically that optimistic bias weakly reduces water values and weakly increases first-stage hydro discharge relative to the unbiased optimum, thereby lowering reservoir storage and postponing thermal commitment. Using official Brazilian planning and operational data, we provide empirical evidence consistent with this mechanism. We then conduct a controlled SDDP experiment to compare policies trained under biased and bias-corrected inflow-forecast processes, evaluating both under the same bias-corrected inflow scenarios. The policy trained under biased forecasts produces lower reservoir levels, delayed dry-season thermal dispatch, sharper spot-price peaks, higher reliability risk, and higher expected operating costs. Finally, we show that these distortions increase the price-quantity risk for hydropower producers and reduce their willingness to contract. The results indicate that inflow-forecast bias is not merely a statistical forecasting problem, but can be a source of operational inefficiency, reliability risk, and distorted market incentives in hydro-dominated power systems. We argue that the insights and policy implications drawn in this paper may be relevant beyond Brazil to other hydro-dominated systems and electricity markets that are increasingly reliant on energy storage.
Researchers often use the density of connections between groups of agents, such as communities, blocs, or markets, to characterize the structure of a social or economic network. In many cases, these groups are selected using the network data, making conventional fixed-group inference procedures potentially invalid. To address this issue, we develop two new confidence intervals that are universally valid post-selection in the sense that they guarantee simultaneous coverage asymptotically over all pairs of groups whose relative sizes do not vanish. Our first interval builds on a strategy of \cite{berk2013valid}. Our second interval is based on a Talagrand-type concentration inequality for empirical processes. Both intervals are simple to compute and scalable to large networks, but a key technical contribution of our paper is to show that only the second interval achieves the best-possible width asymptotically up to a constant factor. Three empirical illustrations show that accounting for selection can matter in practice. Some evidence for homophily in a social network and a hub-and-spoke structure in a trade network survives our correction, while evidence for disjoint market segments in a worker transition network does not.
Airbnb is a community based on connection and belonging -- many hosts on Airbnb are everyday people who share their worlds to provide guests with the feeling of connection and being at home; Airbnb strives to connect people and places. Among our efforts to connect guests and hosts, we provide tools to enable hosts to set competitive prices, which helps improve affordability for guests while helping hosts get more bookings. We also personalize the guest experience to show them the listings that match their needs.
To help inform these efforts, we combine economic modeling and causal inference techniques to understand how guests book stays based on the prices hosts set, among other factors, and how that preference varies across different guests and listings. Such understanding helps us identify opportunities for Airbnb to support the marketplace and better connect guests and hosts. For example, understanding how much guests respond to different prices helps optimize the tools that we provide to hosts, in order to enable hosts to choose and set competitive prices that further balance demand and supply. As another example, understanding heterogeneity in guest preferences helps us personalize the guest experience and better match them with the listings that meet their needs, based on how much they respond to different prices and other factors.
Census analysis 1981-2011 finds nights shift workers into paid agricultural roles but days move them to seasonal work, cutting output in bot
This paper finds diverging partial effects of diurnal warming (higher nighttime and daytime temperatures) on agricultural wage-labour shares from decadal Indian Censuses (1981-2011). Though both margins contract grain output and cultivated area, only higher maxima raise harvest prices locally, consistent with a model where warmer nights shock land but warmer days shock land and labour productivity. Warming nights shift seasonal workers and self-cultivators into agricultural labour; warming days push labour to the seasonal margin. Long differences show the labour divergence is rural. In towns, both margins depress non-agricultural worker shares.
We consider inference for parameters of the form $\theta_0 = E[F_Y^{-1}\circ F_Z(X)]$ for some variables $X$, $Y$ and $Z$. Such parameters appear, in particular, in the ``changes-in-changes'' model of \cite{AtheyImbens2006}. We first establish that $\widehat{\theta}$, a plug-in estimator of $\theta_0$, is root-$n$ consistent and asymptotically normal under weaker conditions than those previously available, allowing in particular for unbounded variables. Next, we propose a new estimator of the asymptotic variance of $\widehat{\theta}$ and show its consistency, also allowing for unbounded variables. Monte Carlo simulations suggest that the conditions for root-$n$ consistency and asymptotic normality are, in some sense, minimal. These simulations highlight that our variance estimator also leads to more accurate inference than some alternative approaches.
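A minimal sketch of a plug-in estimator for a parameter of this form replaces $F_Z$ with the empirical CDF of a $Z$ sample and $F_Y^{-1}$ with the empirical (left-continuous) quantile function of a $Y$ sample, then averages over the $X$ sample. The abstract does not state which empirical versions the authors use, so these choices, the convention of returning the sample minimum at level 0, and all function names are assumptions.

```python
import bisect
import math


def empirical_cdf(sample):
    """Return x -> share of sample points that are <= x."""
    ordered = sorted(sample)
    m = len(ordered)
    if m == 0:
        raise ValueError("sample must be non-empty")
    return lambda x: bisect.bisect_right(ordered, x) / m


def empirical_quantile(sample):
    """Return u -> smallest sample point y with empirical CDF(y) >= u."""
    ordered = sorted(sample)
    n = len(ordered)
    if n == 0:
        raise ValueError("sample must be non-empty")

    def quantile(u):
        # Small tolerance guards against floating-point error in u * n.
        k = math.ceil(u * n - 1e-9)
        k = min(max(k, 1), n)  # assumed convention: u = 0 maps to the minimum
        return ordered[k - 1]

    return quantile


def plug_in_estimate(x_sample, y_sample, z_sample):
    """Average of F_Y^{-1}(F_Z(X_i)) using empirical counterparts."""
    if not x_sample:
        raise ValueError("x_sample must be non-empty")
    cdf_z = empirical_cdf(z_sample)
    quantile_y = empirical_quantile(y_sample)
    return sum(quantile_y(cdf_z(x)) for x in x_sample) / len(x_sample)
```

Each $X_i$ is first mapped to its rank within the $Z$ sample and then to the $Y$ value at that rank, so the estimate is always an average of observed $Y$ values.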
We examine a choice between bonus contracts offered to dealers of a U.S. auto manufacturer. In our data, dealers select the non-profit-maximizing option in 20 percent of observations, costing the mistaken dealers $18,453 per year on average. We examine how the propensity to make this mistake varies with competition, identified both cross-sectionally and within dealers over time. Both analyses show that greater competition substantially lowers the rate of mistakes. However, even in the most competitive markets, consequential mistakes persist. Our results suggest that competition disciplines mainly through within-dealer changes in behavior rather than entry and exit.
This work addresses the problem of assessing player importance in coalitional settings where the available information concerns the relative strength between pairs of coalitions, rather than the absolute worth of each coalition. We introduce a novel framework that is flexible enough to represent all coalitional pseudo-games and, through the use of coalitional networks, naturally accommodates scenarios with limited or heterogeneous coalition comparisons. Importantly, this framework still enables the computation of semivalues of pseudo-games, such as the Banzhaf and Shapley values, that can be expressed as weighted sums of differences in specific coalition comparisons, thus offering interpretations beyond traditional approaches. Furthermore, for ranking players rather than computing exact numerical attributions, we introduce the concept of a player's score, which simplifies the process of determining rankings based on semivalues, and shifts the perspective from average marginal contribution to average coalitional worth. This turns out to be particularly enlightening for the Banzhaf value.
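For readers less familiar with the two semivalues named above, the sketch below computes them in the classical setting, where an absolute worth `v(S)` is available for every coalition. It does not implement the paper's pairwise-comparison framework or its player score; it only illustrates the average-marginal-contribution baseline that the abstract contrasts with. The function name and the representation of `v` as a callable on frozensets are assumptions.

```python
from itertools import combinations
from math import factorial


def semivalues(players, v):
    """Return (shapley, banzhaf) dictionaries for a classical coalitional game.

    players: list of hashable player labels.
    v: callable mapping a frozenset of players to that coalition's worth.
    Enumerates all coalitions, so it is only practical for small games.
    """
    n = len(players)
    shapley, banzhaf = {}, {}
    for i in players:
        others = [p for p in players if p != i]
        sh_total, bz_total = 0.0, 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude player i.
            sh_weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                coalition = frozenset(subset)
                marginal = v(coalition | {i}) - v(coalition)
                sh_total += sh_weight * marginal
                bz_total += marginal
        shapley[i] = sh_total
        banzhaf[i] = bz_total / 2 ** (n - 1)  # uniform weight over coalitions
    return shapley, banzhaf
```

In a three-player simple-majority game, for example, every player receives a Shapley value of 1/3 and a Banzhaf value of 1/2. Both values rank the players identically here, but the Banzhaf values do not sum to the worth of the grand coalition.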
Maximal inequalities show convergence speed depends on function-class complexity, graph growth, and how fast dependence fades with distance.
We develop maximal inequalities for empirical processes indexed by graph-dependent observations. Our bounds separate the complexity of the indexing class from two features specific to graph dependence: the geometry of the underlying graph and the cost of coupling graph-separated blocks to independent copies. The coupling construction combines a novel graph-adapted dependence coefficient with a coloring of a block partition. We specialize the results to graphs with polynomial and exponential growth and to directed dyadic graphs. We then derive Glivenko--Cantelli results and characterize the associated effective sample size. A central implication is that graph-dependent empirical processes need not exhibit a generic root-$n$ rate: convergence is jointly determined by function-class complexity, graph geometry, and the decay of dependence with graph distance. Finally, we apply the results to obtain uniform laws of large numbers for network autoregressive models, nonlinear local-propagation models, and treatment-interference settings.
AI agents increasingly operate inside digital accounts by exercising privileges that users already hold, raising a new control question: whether an existing account entitlement must be exercised manually or may be exercised through a user-authorized automated proxy. We define \emph{delegation rights} as the revocable, identity-preserving, scope-limited, and mode-specific authority of an account holder to authorize such proxy execution. We develop a three-party incomplete-contracts model with a User, an AI Agent provider, and a Platform. The contested object is not platform ownership, account transferability, data portability, or unrestricted API access, but residual control over the mode of account execution. Under Platform Control, the platform can protect infrastructure, identity systems, privacy boundaries, and third parties, but its discretionary veto weakens the User--Agent coalition's disagreement payoff and depresses relationship-specific investment. Under User Control, hold-up is reduced, but security, privacy, congestion, and third-party risks may remain insufficiently internalized. We then analyze \emph{Certified Delegation}, under which access protection is conditional on verifiable authorization, revocability, auditability, rate-limit compliance, data minimization, and risk mitigation. Certification is therefore not merely a technical safety screen; it is a conditional allocation of residual control. Illustrative mechanism simulations show how this regime can reduce deadweight loss by restoring delegation incentives while bounding residual risk.
Quasi-posterior mean preserves group objectives while the pooling term supplies decision-theoretic optimality in weak-GMM limits.
We develop the Quasi-Bayesian Hierarchical Model (QBHM) for grouped GMM settings. The framework combines Bayesian hierarchical modelling with Laplace-type estimation: it preserves each group-specific objective function, while introducing a pooling term for economically comparable parameters. When the number of studies is fixed, the QBHM estimator--the quasi-posterior mean--has the same asymptotic distribution as GMM when estimating strongly identified study parameters. For weakly identified studies, we analyze the asymptotic properties of the method via a weak-GMM limit experiment: an asymptotic approximation in which the sample-moment criterion remains a random function over the weak parameter space, and the upper-level pooling relation induces a family of priors over weak values. In this experiment, the weak-limit QBHM rule is a Bayes rule under squared loss for the hierarchy-induced weak-limit prior, which provides a decision-theoretic justification for our procedure. We also extend our results to mixed within-study blocks, allowing a single study to contain both strongly and weakly identified parameters. Pooling can also reduce the pointwise asymptotic mean squared error (MSE) relative to unpooled estimation when the bias--variance tradeoff is favorable. Gaussian likelihood, nonlinear weak-GMM, and weak-IV calculations show when this happens, while simulations and a microenterprise application illustrate the method.
In an incentivized laboratory experiment, we study how people account for and respond to others' incentives for paying attention. Participants learn a binary state from an attention task under high or low accuracy incentives. We ask subjects to predict their peers' accuracy based on the peers' incentives and to aggregate answers from multiple peers with different incentives. Most subjects fail to consistently understand that peers with stronger incentives are more accurate, and these subjects also perform worse in individual attention tasks. Subjects also participate in a social-learning task where they first learn the binary state from an attention task, then observe a peer's guess about the state in the same task, and finally make a guess themselves. We find behavior in these tasks is inconsistent with leading models of flexible costly information acquisition. In particular, subjects fail to pay more attention when paired with lower incentive peers. Overall, we find that many decision-makers do not respond to others' incentives for accuracy even when those incentives are transparent.
Many bipartite social networks exhibit pronounced asymmetries in selectivity and matching opportunities: members of one side can afford to be highly selective, while members of the opposite side are forced to accept less desirable matches. While it is natural to try to explain this asymmetry in terms of the intrinsic characteristics of the two sides or other exogenous factors, here we show that such asymmetries can also emerge endogenously through a feedback process generated by the matching process itself: as one side becomes more selective, the other side is pushed to be less selective due to reduced matching opportunities, and vice versa. We develop a model in which individuals repeatedly form one-to-one matches across two groups and adapt their selectivity to achieve a target matching rate. Using both analytic and numerical methods, we show that when encounters are sufficiently frequent, the unique equilibrium is for one group to be highly selective and the other non-selective. This qualitative outcome holds even for heterogeneous groups with overlapping, almost indistinguishable distributions of target matching rates. The model makes several testable predictions, and it provides a mechanism for behavioral differentiation in repeated matching environments, with applications ranging from online dating to hiring and housing markets.
Model balances housing costs, the 30 percent affordability limit, and weighted social benefits to decide relocation.
Deciding where to live involves a complex balance between commuting and moving, as households must weigh housing affordability, transportation expenses, access to workplaces, and social ties. Traditional urban economic theories focus on the balance between housing expenses and commuting costs, while modern studies also consider housing affordability, transportation access, and utility maximization. However, few studies have combined these elements into a clear mathematical model that can be used for both policy analysis and household decision-making. This paper introduces an algebraic model for deciding whether to commute or move, expanding on traditional residential location theories by including direct housing and commuting expenses, income-related affordability limits, indirect social and service access costs, and location-based utility within a single utility-maximization framework. The model uses the common 30% housing affordability rule as a constraint, acknowledging that residential choices are also shaped by social networks, access to institutions, neighborhood ties, and quality-of-life factors. The decision rule derived from the model integrates direct financial costs with weighted social benefits and indirect access costs to assess when moving offers more overall utility than staying put and commuting. Unlike complex discrete-choice, nested-logit, or agent-based models, this framework offers a mathematically clear, understandable, and flexible decision model that can easily be expanded to include more household characteristics, transportation options, or policy factors. The model advances urban economics, migration studies, and housing affordability research by providing a practical analytical tool for assessing residential mobility decisions within financial and behavioral limits.
This paper studies inference for time-series GMM when uncertainty comes from shock assignment within a realized historical episode. Rather than treating the data as one random draw from a population of hypothetical economies, the framework conditions on the historical environment and considers alternative realizations of shocks and instruments. For locally correctly specified GMM estimators, the centered moment has design long-run variance $\Omega_R$, which determines the sandwich covariance for the finite-history estimand. Conventional HAC estimators instead converge to $\Omega_R^+=\Omega_R+\Omega_\mu$, where $\Omega_\mu\succeq0$ is the long-run variance of the centered mean-moment path. HAC inference is therefore conservative for scalar functions of the finite-history estimand. Projection adjustment using predetermined covariates can reduce this HAC variance limit in Loewner order and, under an additional long-run orthogonality condition, yields a tighter conservative bound on the corresponding asymptotic covariance. Monte Carlo evidence shows when the distinction is quantitatively important. In a monetary-policy application, standard-error reductions from rich macro covariates provide a diagnostic for economically meaningful predictable variation in the mean-moment path.
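The conservativeness claim can be traced in a few lines. The sketch below assumes a standard GMM sandwich form with moment Jacobian $G$ and symmetric weight matrix $W$; these two symbols and the map $V(\cdot)$ are illustrative notation, not taken from the abstract.

```latex
\begin{align*}
V(\Omega) &= A\,\Omega\,A', \qquad A = (G'WG)^{-1}G'W, \\
V(\Omega_R^+) - V(\Omega_R) &= A\,(\Omega_R^+ - \Omega_R)\,A' = A\,\Omega_\mu\,A' \succeq 0, \\
c'\,V(\Omega_R^+)\,c &\ge c'\,V(\Omega_R)\,c \quad \text{for every fixed vector } c.
\end{align*}
```

Because $V$ is linear in $\Omega$ and $\Omega_\mu\succeq0$, the HAC-based variance for any scalar linear combination is at least as large as the design-based one, with equality when $A\,\Omega_\mu\,A'$ vanishes in the direction $c$.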
Goto and Nakada (2026) showed that the Baldwin rule can be characterized using Neutrality, Bottom Consistency, Faithfulness, Cancellation, and Bottom Independence. While their proof relies on techniques from linear algebra and graph theory, in this note we provide a simpler proof using purely combinatorial arguments based on permutations and amplified preference profiles, thereby making the characterization more transparent.
This study proposes a new set of a firm's "social statements" that represent social value, in contrast to conventional financial statements that represent economic value. Financial statements externalize social and environmental costs, and this externalization is one of the primary causes of contemporary social problems. Insights from anthropology, philosophy, and sociology suggest that social value is grounded in social relationships, joint actions, and communication. Building on this understanding, we assign numerical indicators of a firm's social relationships with external stakeholders to the items of a balance sheet and a profit-loss statement as social statements. This approach enables unified measurement units and simplified calculation compared with existing methods for evaluating social impact or social value. Moreover, similar to financial statements, social statements allow firms to be assessed using managerial indicators such as equity ratios and profit margins. The significance of social statements lies in incorporating social value--alongside financial value--into corporate decision-making, and in encouraging social transformation as firms publicly articulate their social value.
TRI uses title and abstract embeddings to estimate the chance a paper matches past patent-paired work, with external validation at multiple
Universities, funders, investors, and policy agencies often need to identify research with translational relevance before patents, licenses, startups, or industry collaborations are visible. This study introduces the Translation Readiness Index (TRI), a text-based measure evaluating a publication's semantic similarity to papers that appear in high-confidence patent-paper pairs. Using 20,610 publications from OpenAlex, including 9,431 publications from the Reliance on Science patent-paper pairs data and 11,179 matched comparison publications, we created paper-level 768-dimensional semantic embeddings from titles and abstracts with SPECTER2. After evaluating four machine learning classifiers, XGBoost achieved the highest ROC-AUC (0.77). We define TRI as the model-estimated probability that a publication belongs to the patent-paper-paired class. Linguistic analysis revealed that patent-paired publications more often use an invention-oriented framing, distinct from the observational language of the comparison group. External validation across University of Western Australia (UWA) publications and leading global universities demonstrated positive associations between high TRI scores and independent translational indicators. TRI provides a text-based method for identifying translation-ready research, though it should be interpreted as a measure of semantic proximity to patented science rather than a direct measure of realized commercialization.
Decision-makers often rely on multiple probabilistic forecasts that are individually calibrated but need not be fully informative. We develop a framework for aggregating such forecasts when the decision-maker knows only that experts satisfy calibration. We show that the joint distribution of calibrated forecasts can contain decision-relevant information that is unavailable from any single expert, so the standard optimal-in-hindsight (OIH) benchmark may substantially understate attainable performance. To formalize this idea, we introduce a robust max-min benchmark: the best payoff a decision-maker can guarantee against all profile-wise conditional-mean mappings compatible with calibration. This benchmark is tractable, admits a linear-programming formulation, and dominates the OIH benchmark up to calibration error. It can nevertheless be strictly below the Bayesian benchmark, clarifying the value of knowing experts' information structures. Finally, we provide online algorithms that attain the robust benchmark under forecast-only feedback and stronger contextual benchmarks under state feedback.
In two-sided marketplaces with heterogeneous products, it is important to understand the causal relationship between additional supply and marketplace outcomes, such as the total quantity transacted or transaction value in the marketplace. This paper studies a causal machine learning approach to estimating this relationship across product segments. We use the Airbnb marketplace as an example, focusing on the impact of additional listing supply on total bookings, but the methodology applies to other two-sided marketplaces. Our approach combines double/debiased machine learning with a hierarchical Bayesian framework that leverages pre-existing knowledge as priors. We construct tractable and informative features for the model by leveraging measures of product segment similarity from the geospatial literature. We find that such a model provides plausible estimates of the marketplace returns to additional supply and strong out-of-sample performance.
The coordination of prices is one of the most complex phenomena in economics. The classical and neoclassical approaches to economic theory provide some insights into this coordination based on different formulations. However, these formulations have not succeeded in explaining simple mechanisms for understanding and predicting a set of prices that theoretically clears all markets. Elementary cellular automata can help clarify this coordination problem by using simple computational rules to describe the theoretical bases of classical and neoclassical economics. We therefore propose to use this type of cellular automata to explain different scenarios of price coordination in which simple rules of price interaction generate stable and unstable patterns of coordination. We used an exploratory data analysis based on the Shannon entropy to compute the uncertainty related to the generated patterns of coordination, and a Monte Carlo simulation approximation based on a Spearman correlation to evaluate the statistical significance of such price coordination. Findings suggested that classical economics provides a consistent approach for understanding the coordination of prices because it emphasizes human interactions based on logical choices related to objective data. On the other hand, the neoclassical approach does not propose any type of mechanism for describing price coordination. The neoclassical individual is just a spectator and receiver of the unpredictable and supposed event of price coordination. As a result, by modeling economic theory with computational concepts, we reveal facts and beliefs behind classical and neoclassical economics.
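As a rough illustration of the tools named above, the following sketch runs an elementary cellular automaton and computes the Shannon entropy of the resulting pattern. How prices map to cell states is not given in the abstract, so the binary cells here are only stand-ins; the function names, the periodic boundary, the single-seed initial row, and the choice to measure entropy over all cell values are assumptions.

```python
import math
from collections import Counter


def step(cells, rule):
    """Apply one update of an elementary CA (Wolfram rule number 0-255)."""
    n = len(cells)
    out = []
    for i in range(n):
        # Periodic boundary: neighbours wrap around (an assumption of this sketch).
        left, mid, right = cells[(i - 1) % n], cells[i], cells[(i + 1) % n]
        idx = (left << 2) | (mid << 1) | right  # neighbourhood as a 3-bit number
        out.append((rule >> idx) & 1)           # look up that bit of the rule
    return out


def run(rule, width=31, steps=30):
    """Evolve from a single active cell and return every row of the pattern."""
    cells = [0] * width
    cells[width // 2] = 1
    rows = [cells]
    for _ in range(steps):
        cells = step(cells, rule)
        rows.append(cells)
    return rows


def shannon_entropy(rows):
    """Shannon entropy (in bits) of the cell-state frequencies in a pattern."""
    counts = Counter(c for row in rows for c in row)
    total = sum(counts.values())
    return -sum((k / total) * math.log2(k / total) for k in counts.values())


if __name__ == "__main__":
    for rule in (0, 90, 204):
        print(rule, round(shannon_entropy(run(rule)), 3))
```

Comparing entropy across rules gives a simple way to separate patterns that settle into uniform states from those that keep generating variation.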
Decision-makers routinely rely on expert judgments accompanied by written explanations, yet explanation quality is difficult to measure at scale. Forecasting tournaments offer a natural testing ground: probabilistic judgments are paired with natural-language rationales and scored against realized outcomes. We introduce Explanation Quality Markers (EQMs), a set of sixty theory-guided reasoning patterns scored by large language models (LLMs). In a pre-registered analysis of over 55,000 forecast-rationale pairs from a multiyear forecasting tournament, EQMs predict accuracy at both the forecast and forecaster levels, consistently outperforming pre-LLM text-analysis methods. More than 90% of statistically significant pattern-level EQM-accuracy correlations match our directional hypotheses. The signal is asymmetric: EQMs identify likely underperformers more reliably than they distinguish the very best forecasters. Benchmarked against traditional indicators of forecasting skill, EQMs are the strongest predictor at the forecast level and competitive at the forecaster level, though weaker than prior accuracy. Human ratings of rationale quality are less consistently correlated with accuracy and place disproportionate weight on rationale length. Results transfer to an independent forecasting study. EQMs provide a scalable, interpretable method for extracting judgment-relevant information from written explanations.
Agentic artificial intelligence is increasingly deployed not as a single assistant but as a collective of planners, solvers, reviewers, memory managers, tool users, and orchestrators. These systems are entering organisational workflows under familiar labels such as teams, managers, committees, markets, and workflows. This article asks whether such agent collectives exhibit organisational behaviour in a sense that is analytically comparable to, yet distinct from, human organisational behaviour. I argue that agentic AI is a partial organisational analogue. It resembles a human organisation because it differentiates work, coordinates interdependence, performs recurrent routines, crosses boundaries, and produces collective outcomes. It differs because these patterns are not sustained by motivation, identity, trust, employment, socialisation, or moral accountability. They are sustained by context architecture: prompts, memory, traces, schemas, tools, validators, and permissions. The article develops contextual transaction cost as the central mechanism linking these similarities and differences. Computational theorising, synthetic task simulations, real LLM agent traces, and robustness analyses show that human-imitation forms often underperform when they add lossy handoffs, correlated deliberation, and verification burdens, whereas shared-state and adaptive forms perform better when they make context durable, inspectable, and task-contingent. The article contributes to organisation studies by theorising agentic AI as an emerging object of organising and by specifying the interface conditions under which human and agentic organisational behaviour can jointly support collective intelligence.
Pareto optimal contracts with many policyholders and insurers across indemnity types reduce to one aggregate minimization problem.
This paper proves a sum-minimization characterization of Pareto efficient insurance with multiple policyholders, multiple insurers, and multiple indemnity environments. We also provide a result regarding the pairwise implementability of the policyholder- and insurer-aggregate level arrangements in the multiple policyholders and multiple insurers setting.
Using 380 trillion tokens of realized AI consumption across more than four hundred large language models from the licensed proprietary OpenRouter dataset, covering approximately 2 percent of current global monthly AI token consumption, we analyze how AI affects firms, markets, and workers. Leveraging the unprecedented size, scope, and granularity of the data, we construct the AI Factor from growth in tokens, dollars, and users, estimate firm-level AI Betas from stock return comovement, and characterize the AI Premium. First, we build a high-frequency AI factor and decompose it into salient components. Second, we show that firms whose returns covary more positively with the AI factor--high AI beta firms--earn higher subsequent returns, and the AI premium is large and heterogeneous. A value-weighted long-short strategy earns 64.1 basis points per week, and the premium is large for loadings on the intensive, frontier-oriented margin of AI consumption--closed-source models, paying and seasoned users, and long prompts--but not on casual or open-weight use. Third, the premium reaches beyond technology firms into consumer-facing and capital-heavy parts of the economy, but is absent in emerging markets, including China. Fourth, AI exposure is more positive in nonroutine interactive work and more negative in analytical, scientific, and operations-control skills--an occupation one standard deviation higher in interaction-and-communication content has a 0.36-standard-deviation higher market-implied AI premium. Additionally, we provide early evidence of the rise of the agentic economy.
We study retrieval over catalogs of structured metadata, where each record is a small schema whose fields answer different kinds of query. Embedding a record with a text encoder first serializes its fields into a string, which forces a choice of field order. We show this choice, usually treated as an implementation detail, silently controls retrieval quality once the encoder is fine-tuned. A standard fine-tune loses 7.4 nDCG@10 points when the index is rebuilt under a different field order, because it reads absolute position instead of the field labels. We propose permutation-invariant fine-tuning ($\textbf{PI-FT}$), which serializes each record under a freshly sampled field order with random field dropout, so meaning binds to the labels rather than to position. The change is about two lines in the data loader; it costs negligible in-distribution accuracy and cuts the order-change penalty to 0.2 points. We study this in the discovery of development statistics, a catalog of nearly 10,000 indicators that should be searchable in many languages by a model small enough to self-host. As AI assistants and agents increasingly mediate access to public data and statistics, this retrieval step decides whether an answer is grounded in the right indicator or series, making discoverability a precondition for disseminating data through AI. Because usage logs cannot provide training signal for indicators no one has searched, we generate the queries instead. $\textbf{DevDataBench}$ is a fully LLM-generated benchmark of grounded, facet-targeted queries across 15 languages, covering every indicator for both training and evaluation. A fine-tuned 118M-parameter CPU encoder outperforms every zero-shot baseline, including $\texttt{text-embedding-3-large}$ (0.707 vs.\ 0.556 nDCG@10), with the largest gains in low-resource languages. We release the benchmark, pipeline, models, and a reusable PI-FT framework.
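The serialization step behind PI-FT can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the released framework: the function name `serialize_record`, the `label: value` format, the ` | ` separator, the dropout rate, and the example field names are all hypothetical.

```python
import random


def serialize_record(record, rng, dropout=0.2):
    """Serialize a metadata record as 'label: value' pairs in a random field order."""
    # Keep only fields that have a value.
    fields = [(label, value) for label, value in record.items() if value]
    if not fields:
        return ""
    # Random field dropout: drop each field independently with probability `dropout`,
    # but always keep at least one field so the string is never empty.
    kept = [f for f in fields if rng.random() >= dropout]
    if not kept:
        kept = [rng.choice(fields)]
    # Freshly sampled field order on every call, so position carries no signal.
    rng.shuffle(kept)
    return " | ".join(f"{label}: {value}" for label, value in kept)


if __name__ == "__main__":
    # Hypothetical indicator record, for illustration only.
    record = {"name": "GDP per capita", "unit": "USD", "source": "national accounts"}
    rng = random.Random(0)
    for _ in range(3):
        print(serialize_record(record, rng))
```

Calling this inside the data loader each time a record is drawn means the encoder sees the same record under many orders and subsets during fine-tuning, while the index can be built with any fixed order at inference time.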
Cost-based allocation under a formal convention allowed evasion of standard detection methods.
This paper analyzes the internal organization and economic effects of a bid-rigging cartel in the road construction sector of the Swiss canton of Ticino, active from 1999 to 2005. Using exceptionally rich documentary evidence, we reconstruct how cartel members coordinated bids and allocated contracts under a formal agreement known as the 'convention'. We show that, despite the absence of side payments, the cartel implemented a cost-based allocation mechanism that closely approximated the first-best collusive outcome. Regression and machine-learning analyses indicate that observable cost proxies systematically predict both winning bids and bid rankings. The evidence further suggests that cartel members strategically mimicked competitive bidding behavior, allowing them to evade standard econometric detection methods. Using double machine learning, we estimate average overcharges of at least 45\%, and potentially substantially higher, highlighting the significant financial harm caused by this sophisticated form of collusion.
This paper studies whether news about banks' balance sheets propagates to aggregate financial conditions and macroeconomic activity. We construct high-frequency Canadian bank net-worth shocks using stock-price reactions around earnings announcements of the six large Canadian banks. Guided by a model in which higher intermediary net worth expands credit supply and lowers borrowing spreads, we use the co-movement between bank equity prices and Canadian corporate spreads to purge raw bank equity surprises from contaminating information. Favorable purged credit-supply bank net-worth shocks lower corporate spreads, raise bank valuations and broader equity prices, appreciate the Canadian dollar, and increase real activity over the medium run. The results are robust across specifications, samples, and additional outcomes, and suggest that bank earnings news is macroeconomically relevant in concentrated banking systems.
In tests with 24 rural users, internal coordination matched subsidies for investment under falling prices and low surplus pay.
The success of distributed photovoltaics may be undermining its own future. As solar penetration increases, electricity prices decline during periods of peak generation, reducing the value of surplus photovoltaic production. This raises a critical question: can citizen-led energy systems remain economically viable in electricity markets dominated by renewable generation?
Rather than exploring technically optimal but institutionally unrealistic solutions, we examine the options available under current regulatory and market conditions. Using high-resolution consumption data from a rural community sharing a PV facility among 24 users, we identify pathways for long-term sustainability. The study makes two contributions. First, it shows that effective internal coordination can mobilize participation and investment as successfully as external subsidies. Second, it compares static, dynamic, and hybrid energy-sharing models, with and without storage, providing a flexible framework that balances efficiency, fairness, and governance.
Results show that collective self-consumption reduces required PV capacity, lowers investment costs, and increases annual savings compared with individually operated systems. Alternative allocation schemes further improve benefit distribution and local electricity use, although gains depend on trade-offs between efficiency, fairness, and governance complexity. Under current electricity prices and remuneration schemes, battery storage provides limited additional economic value and becomes attractive only under specific market conditions. Overall, the long-term viability of citizen-led photovoltaic initiatives depends less on technological sophistication than on collective coordination and adaptive governance.
We study prophet inequalities with discounted rewards, where i.i.d. base rewards are multiplicatively discounted over time. Our main message is that even this structured and arbitrarily weak form of nonstationarity can erase the classical advantage of the stationary i.i.d. setting. Focusing on single-quantile threshold policies, we show that the competitive ratio transitions from the classical $1-1/e$ guarantee to a fundamental $1/2$ barrier as discounting accumulates over many phases in a canonical regime with a common-decay factor and equal-length phases. We further show that, in the same regime, the $1/2$ barrier persists even for arbitrary stopping rules. Consequently, i.i.d. base rewards under discounting can be as hard as the fully non-i.i.d. case. On the algorithmic side, we design single-quantile threshold rules that attain the tight bounds by calibrating acceptance decisions to an effective horizon induced by discounting, and we extend this calibration to heterogeneous decay factors and unequal phase lengths. We further show that a similar discontinuous breakdown persists in an infinite-horizon continuous-decay benchmark, where arbitrarily weak decay collapses the stationary benchmark from $1$ to $1/2$.
Large language models have proven to be remarkable if inconsistent parrots of public attitudes and opinions. The extent to which LLMs are able to produce reasonable approximations of cultural taste remains an open empirical question that becomes more urgent by the day, with market research companies already offering provisional `synthetic' survey panels and the contamination of standard survey data by LLM-generated responses. In this study, we build on past work on silicon sampling by extending considerations of its algorithmic fidelity and alignment to the domain of cultural consumption. We use large language models from OpenAI, Anthropic, and DeepSeek to each produce 277,470 (30x9249) silicon surrogates of survey respondents from the Survey of Public Participation in the Arts (SPPA). We find these silicon surrogates' tastes to be highly stylized facsimiles of human tastes. (1) Silicon samples have a systematic positive bias for liking, resulting in inflated ecological estimates of tastes. The individual-level bias of silicon samples is not well explained by the WEIRD bias often discussed in the literature. (2) The complex relationality in real taste structures is completely lost among silicon samples. (3) Finally, very little of the known cultural alignment between tastes and social space is preserved. Silicon samples attenuate age-taste associations, resurrect anachronistic class-taste associations, and caricature gender- and race-taste associations.
Spectral social aggregation obeys Pareto if and only if its spectrum uses quantiles present in society, rendering full representation dictat
Many collective decisions under risk are made by people who care about different parts of the outcome distribution: downside losses, typical performance, or upside gains. This paper models this disagreement with quantile preferences and studies how the represented quantile levels can be aggregated. Our main result is a spectral support theorem: a spectral social aggregation satisfies the Pareto principle if and only if its social spectrum puts mass only on quantile levels represented in society. Hence, Pareto consistency makes representative-quantile aggregation a dictatorial case. In addition, we derive spectral aggregation from rank-based axioms, develop finite and threshold-Pareto consequences, and show when local benchmark-affine and elliptical common-shape domains admit a representative-quantile reduction.
Macroeconomic expectations are usually observed through point forecasts or through asset prices whose mapping into beliefs is model-dependent. This paper uses prediction-market prices to recover high-frequency distributions of short-run macroeconomic beliefs. We construct a panel of Kalshi-implied distributions for CPI and core CPI releases by converting adjacent threshold contracts into probability mass over inflation outcomes. The data reveal market-implied means, uncertainty, and upper-tail probabilities from 30 days to one hour before each release. The market-implied mean contains meaningful forecast information, especially for headline CPI, but the main signal is distributional. Lagged Reuters Poll surprises do not predict systematic deviations of Kalshi means from the current Reuters consensus. By contrast, large lagged surprises are associated with higher implied uncertainty, and positive lagged surprises raise the probability assigned to fixed high-inflation outcomes. In the baseline specification with variable-by-horizon fixed effects, a 0.1 percentage point positive lagged surprise raises the probability of monthly inflation above 0.3 percent by about 4.7 percentage points, even after controlling for the current consensus forecast. In release-level validation tests, Kalshi upper-tail probabilities also predict the realization of high-inflation states, including episodes in which the market-implied mean remains close to the Reuters consensus. The evidence suggests that prediction markets can provide real-time information about inflation risk that is missed by point forecasts.
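The conversion from adjacent threshold contracts to probability mass can be illustrated as follows. This is a sketch under simplifying assumptions rather than the paper's procedure: each contract is assumed to pay 1 if the release exceeds its threshold, its price is read directly as a probability (ignoring fees, spreads, and risk premia), and the monotonicity clipping and the name `bin_probabilities` are choices made here for illustration.

```python
def bin_probabilities(thresholds, prices_above):
    """Turn prices of 'release above threshold' contracts into mass per outcome bin.

    Returns a list of (lower, upper, mass) tuples; None marks an open-ended bin.
    """
    pairs = sorted(zip(thresholds, prices_above))
    ts = [t for t, _ in pairs]

    # Read prices as survival probabilities P(X > t). Clip them to [0, 1] and
    # force them to be non-increasing in the threshold (an assumption of this sketch).
    survival = []
    prev = 1.0
    for _, price in pairs:
        p = min(max(price, 0.0), prev)
        survival.append(p)
        prev = p

    # Differences between adjacent survival probabilities give the bin masses.
    bins = [(None, ts[0], 1.0 - survival[0])]
    for i in range(len(ts) - 1):
        bins.append((ts[i], ts[i + 1], survival[i] - survival[i + 1]))
    bins.append((ts[-1], None, survival[-1]))
    return bins


if __name__ == "__main__":
    # Hypothetical prices for monthly inflation thresholds, in percent.
    for lower, upper, mass in bin_probabilities([0.1, 0.2, 0.3], [0.9, 0.6, 0.2]):
        print(lower, upper, round(mass, 3))
```

The last bin's mass is the upper-tail probability above the highest threshold, and a mean or dispersion measure can be approximated from the bins once representative values are chosen for the two open-ended ones.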
This paper develops misspecification-robust sensitivity and informativeness diagnostics for GMM estimators, evaluated at pseudo-true values. The sensitivity matrix nests that of Andrews, Gentzkow, and Shapiro (2017) under correct specification. The informativeness $\Delta$ measures the share of an estimator's asymptotic variance explained by sampling variation in the moments, a notion of structural efficiency that equals one under correct specification and can fall below one under misspecification, even when the Hansen $J$-test does not reject. We derive influence-function representations for one-step, two-step, iterated, and continuously updating GMM. We show that in minimum-distance estimation, estimating the optimal weight matrix adds estimator variance that the moments do not explain, lowering informativeness, while simpler weight matrices largely avoid it. The choice of weight matrix therefore involves a trade-off between classical efficiency and informativeness. In applications to the automobile demand model of Berry, Levinsohn, and Pakes (1995), the consumption insurance model of Blundell, Pistaferri, and Preston (2008), and the income-and-democracy regressions of Acemoglu, Johnson, Robinson, and Yared (2008), misspecification reorders sensitivity rankings, simpler weights preserve the informativeness that the optimal weight loses, and $\Delta$ detects structural-efficiency losses that the $J$-test does not.
Reliable generative AI models critically rely on expert human annotations to evaluate output quality, yet these "gold" labels are expensive to collect and limited in quantity. Organizations thus often turn to collecting vast but noisy "silver" labels from crowdsourced workers or vendor annotators as proxies for gold labels. Because gold remains the evaluation target, naively aggregating noisy silver labels may introduce bias, and estimators built on sparsely observed gold labels may have high variance to resolve the model performance gaps that guide practical decisions. Model evaluation has become an ongoing operational practice rather than a one-time exercise, with evaluation rounds repeating across model versions, releases, and content domains. A natural question is whether the previous historical evaluation data can be used to improve each new round of evaluation. We introduce HERO (History Enhanced RObust model evaluation), a novel framework that uses historical data to suppress bias (improve reliability) and reduce variance (improve sensitivity) in model performance evaluation. HERO calibrates silver labelers' performance learned from historical gold annotations, and stabilizes the resulting estimator by anchoring it to covariate information measured with high precision in the historical data. HERO can be broadly applied across multiple common evaluation tasks, and remains valid when only a subset of historical labelers appears in the current round. We establish conditions under which the bias and variance reductions hold, showcase HERO's performance in simulation studies, and demonstrate its effectiveness on real-world model evaluation benchmarking datasets.
Conditions derived for the model allow checking data without parameters, and most cases fail even when payments are risky.
We present a revealed preference characterization of the discounted expected utility model with a concave utility function. The characterization offers a nonparametric test of the model. We apply the test to an experimental data set in the literature and find that the model is almost always rejected even when all payments involved are subject to risk.
Calgary commuter survey shows the model allows simultaneous substitutions and reveals stronger responses to central and peak pricing.
Effective congestion management strategies require a detailed understanding of how travellers respond to different pricing interventions. This paper presents an in-depth analysis of traveller behaviour under congestion pricing scenarios, focusing specifically on mode and departure time decisions. Utilizing stated preference survey data from commuters in Calgary, Canada, three discrete choice models including Multinomial Logit, Nested Logit, and Cross-Nested Logit are developed and compared. Results indicate that the Cross-Nested Logit model provides superior behavioural realism and flexibility by capturing simultaneous substitutions across modes and departure times.
Spatial analysis and elasticity assessments reveal substantial geographic variation in traveller sensitivity to pricing, particularly highlighting stronger responses among commuters travelling to high-demand central locations and during peak travel periods. Further elasticity analyses clarify behavioural patterns, identifying traveller groups with varying degrees of flexibility. Policy analyses underscore the effectiveness of targeted, dynamic tolling, particularly cordon-based pricing combined with time-specific toll adjustments, in reducing congestion levels. Additionally, the findings highlight the necessity of complementary measures, including improved transit services and targeted discounts, to ensure equitable outcomes. The findings offer targeted insights into how specific pricing strategies such as cordon, distance, and travel time-based tolls can be used to influence travel behaviour, reduce peak-period congestion, and guide equitable policy design in urban transportation planning.
Framework dispenses with parallel trends, works for one or many treated units, and supplies confidence intervals.
We develop a factor-model framework for causal inference in panels with policy interventions. Treatment effects are represented as structural changes in treated units' exposure to latent common shocks and, in extensions, changes in the factor process itself. The approach does not impose the standard parallel-trends restriction, accommodates one or many treated units, and targets systematic effects when unit-time idiosyncratic effects are not point identified. We provide estimation and inference under both fixed and treatment-dependent factor processes. Simulations show coverage close to nominal levels. In applications to California tobacco control and German reunification, the method produces estimates broadly consistent with synthetic control while delivering formal confidence intervals.
Computing optimal policy in heterogeneous-agent economies is complicated by the possibility of multiple equilibria. We overcome this difficulty by showing that when the equilibrium manifold has a low-dimensional Negishi-weight parameterization, Bayesian optimization reliably finds approximate solutions and can be used to certify candidate solutions with high probability. This insight brings recent machine learning advances to bear on a core problem in macroeconomics. We apply Bayesian optimization to a dynamic economy with heterogeneous agents and climate change and compute optimal carbon taxes in this setting. Although in principle the presence of the carbon externality creates scope for multiple equilibria, we show that in an example with a realistic calibration of damages, competitive equilibria are most likely unique.
The company's role stays the same: create shared context where human and machine knowledge convert and amplify each other.
Nonaka emphasized that innovation is the result of a continuous back-and-forth between tacit and explicit knowledge. Artificial intelligence introduces a fundamentally new object into this process -- tacit machine knowledge -- but Nonaka's ideas are more relevant than ever. The central role of the knowledge-creating company remains the same: to create the shared context in which different kinds of knowledge can feed off each other, become organizational knowledge, and set off further cycles of innovation.
Global supply chains are highly interconnected, making them vulnerable to cascading disruptions induced by trade policy shocks. Understanding how such disruptions propagate through production networks, and how mitigation mechanisms such as trade reallocation and production adjustment can alleviate their impacts, remains a central challenge. In this work, we develop a linear programming formulation of an Input-Output (IO) system that captures cascading supply-chain disruptions together with trade reallocation and production expansion. Our formulation yields a system-level equilibrium characterization that enables the joint analysis of disruption propagation and mitigation within a unified framework. We propose an efficient algorithm for computing approximate equilibrium solutions by minimizing total unmet demand in large IO systems. We apply our approach to tariff-induced disruptions in the global oilseeds supply chain arising from the U.S.-China trade war. Our results show that a localized 70% disruption to flows from the U.S. oilseeds sector to China leads to a 3.27% loss in global output, with China experiencing a disproportionate loss of 14.02%. As a counterfactual mitigation strategy, allowing a 20% reallocation from Brazil's oilseed sector to China significantly reduces global output losses to 1.36%, although pressure remains high on final-demand flows. We further investigate production expansion as an additional mitigation mechanism and show that it introduces tradeoffs between reducing global final-demand losses and protecting Brazil's domestic flows. Domestic reallocation disproportionately shifts losses toward smaller economies, while globally sourced expansion redistributes losses more broadly across the network.
It captures context-dependent goal activation and resolution across time scales, shown in scheduling, ownership, and location applications.
Travel behavior and demand modeling seeks to understand the factors that motivate transportation decisions. At the same time, the field is increasingly adopting algorithmic and artificial intelligence (AI) tools that improve predictive accuracy, often at the cost of a grounding in hypothesis-based theory validation and behavioural explanation. In this discussion paper, we use goal pursuit theory (GPT) to illustrate why behavioral theory is a necessary complement to prediction in travel behavior research. Unlike random utility maximization (RUM) or close alternatives (e.g., random regret minimization (RRM)), GPT explicitly models how travelers (1) activate context-dependent goals (hedonic, gain, normative), (2) resolve conflicts between competing objectives, and (3) make sequential decisions across temporal scales. We demonstrate GPT's merits through three transport applications: activity scheduling (handling hierarchical goal structures), vehicle ownership (disentangling bundled mobility goals), and location choice (capturing latent goal interactions via matrix factorization). We provide actionable guidance for implementation, including: (a) hybrid choice model specifications linking goals to observable behaviors, (b) parallels to complementary behavioral theories from the transportation field, and (c) data requirements and comparative benchmarks against RUM/RRM models.
Firms shift from training the least-skilled to the most-skilled below the AI level when workers can switch jobs.
When firms deploy autonomous AI, they must decide how much work to leave to the system and how much to keep workers engaged. This decision affects current output and future human capital. We develop a parsimonious two-period model in which AI may outperform the worker when it functions, but may fail with positive probability. A firm chooses worker engagement; engagement lowers current output for below-benchmark workers, but changes future skill through learning and erosion. We distinguish two dimensions of AI progress: capability, the system's output when it works, and reliability, the probability that it works. In a single-firm benchmark, engagement is valuable only as fallback investment. The firm engages the least-skilled workers most, because they have the largest skill gaps and are least costly to bring toward a useful fallback level. With worker mobility, engagement also affects labor-market sorting: workers prefer jobs that build more valuable skill trajectories. This sorting motive targets higher-skill workers near the AI frontier, where skill gains are more valuable and engagement is less costly. Mobility can therefore reverse the engagement pattern, shifting investment from the least-skilled toward the most-skilled workers below the AI benchmark. Mobility also reshapes how AI progress affects engagement: greater capability raises engagement by increasing the value of the skill trajectory a firm offers, whereas greater reliability can raise or lower it because it reduces fallback need while also changing learning opportunities. Under worker mobility, human-AI work design becomes a problem of human-capital investment, in which allocating work today shapes future skill.
This article reconstructs the economic and social history of Bolivian neoliberalism and evaluates whether economic liberalization reduced or increased poverty and inequality in Bolivia. The historical argument is that the Bolivian neoliberal cycle was not a single event but a layered sequence: hyperinflation and emergency stabilization, the 1985 New Economic Policy, labor displacement and mining restructuring, second-generation reform in the 1990s, capitalization, decentralized state restructuring, commodity dependence, and the social conflicts that culminated in the collapse of the party system. The empirical contribution is to integrate macroeconomic indicators, economic-freedom indices, poverty and inequality series, IMF and financial-reform data, commodity and disaster controls, Bolivian export aggregates, and harmonized historical survey indicators. The preferred design is a heterogeneous instrumental-variables model that instruments domestic liberalization with lagged regional leave-one-out policy diffusion and allows Bolivia to differ from the Latin American average. The central estimate is that a 10-point increase in the Heritage economic-freedom score is associated, for Bolivia, with approximately +4.46 percentage points of poverty at the USD 4.20/day line, +3.61 percentage points at the USD 3/day line, +7.40 percentage points at the USD 8.30/day line, and +3.91 Gini points. These results remain socially regressive in sign after adding export-structure controls to the poverty specifications, although the causal interpretation remains conditional on the exclusion restriction. The article therefore advances a qualified conclusion: Bolivian neoliberalism stabilized hyperinflation, but the historically specific liberalization package appears to have increased social vulnerability and inequality rather than producing inclusive development.
F-CCEMG records lowest RMSE and near-nominal coverage in Monte Carlo tests for G7 renewable-energy data.
We study estimation of the mean slope in heterogeneous panels that combine cross-sectional dependence from unobserved common factors with unit-specific structural breaks occurring at different dates. We organize the available second-generation Mean Group estimators into a regime map indexed by the cross-section size, the strength of the cross-sectional dependence, and the nature of the structural change, and we examine two estimators for the small-to-moderate-dependence panels common in applied macroeconomics and energy economics. The Fourier SUR Mean Group (F-SURMG) estimator augments a seemingly unrelated regression system with unit-specific Fourier terms. The proposed Fourier Common Correlated Effects Mean Group (F-CCEMG) estimator augments the CCE regression with deterministic Fourier terms, filtering the common factor while absorbing the heterogeneously timed breaks. In a Monte Carlo study with R = 500 replications across weak, moderate, and strong dependence, F-CCEMG attains the lowest root mean squared error in almost every configuration and near-nominal coverage once the cross-section is not minimal, while F-SURMG gives the best-calibrated inference in the small-N, weak-dependence corner; estimators that do not filter the factor lose coverage as dependence rises. An application to the renewable energy-growth nexus in the G7 over 1965-2019 finds no significant aggregate effect of renewable energy consumption on growth.
The healthcare sector contributes approximately 4.4% of global greenhouse gas emissions, yet research on the organizational determinants of sustainable behaviors among healthcare workers remains limited. This study examines how green transformational leadership and ethical climate influence sustainable clinical behaviors among registered nurses, with green psychological climate as a mediator and perceived organizational hypocrisy as a moderator. Data were collected from 760 nurses across 11 public and private hospitals in Jordan using a cross-sectional survey design. Structural equation modeling with bootstrapping was employed to test the hypothesized relationships. The results revealed that both green transformational leadership and ethical climate positively predicted sustainable clinical behaviors. Green psychological climate partially mediated both relationships. Perceived organizational hypocrisy significantly weakened the positive effects of green transformational leadership and ethical climate on sustainable behaviors. The model explained 35.7% of the variance in sustainable clinical behaviors. These findings highlight that fostering sustainability in healthcare requires not only supportive leadership and ethical organizational environments but also authenticity and consistency between stated values and actual practices. The study extends green transformational leadership theory to healthcare settings, integrates ethical climate research with environmental sustainability, and introduces perceived organizational hypocrisy as a critical boundary condition. Practical implications for healthcare administrators seeking to reduce their environmental footprint are discussed.
Trade and price history recover the informed-trader versus market-maker distinction for linear strategies.
We show that the net demand for liquidity of an algorithmic strategy is identifiable from its trade and price history alone, with no knowledge of its signal or optimization problem. An exact multi-period regret decomposition implies that the sign of this statistic classifies a linear strategy as a net liquidity consumer or provider, recovering the Kyle (1985) informed-trader/market-maker dichotomy from observables alone. Under an AR(1) cost process, the same statistic equals the product of strategy size and the squared Roll (1984) implied spread, making the correction a direct proxy for prevailing illiquidity. Extending to endogenous price impact and aggregating across N correlated strategies yields a liquidity-balance condition whose violation produces welfare loss scaling as N squared, a closed-form fire-sale externality. We calibrate to CRSP equity data (2016-2025), tracking implied spreads through the COVID-19 and 2022 rate-shock episodes, with an estimator computable in O(Tnd) time.
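For readers unfamiliar with the Roll (1984) implied spread mentioned above, the standard textbook estimator is twice the square root of the negated first-order autocovariance of price changes. The sketch below illustrates only that estimator, not the paper's liquidity statistic or its O(Tnd) procedure. The simulated prices assume a constant efficient price with independent buy/sell bounce, and the helper names are hypothetical.

```python
import math
import random


def roll_implied_spread(prices):
    """Roll (1984) estimator: 2 * sqrt(-cov(dp_t, dp_{t-1})), or 0 if cov >= 0."""
    changes = [b - a for a, b in zip(prices, prices[1:])]
    lagged, current = changes[:-1], changes[1:]
    n = len(lagged)
    if n < 2:
        raise ValueError("need at least four prices")
    mean_lagged = sum(lagged) / n
    mean_current = sum(current) / n
    cov = sum((x - mean_lagged) * (y - mean_current)
              for x, y in zip(lagged, current)) / n
    # A non-negative autocovariance gives no real-valued spread; report zero.
    return 2.0 * math.sqrt(-cov) if cov < 0 else 0.0


def simulate_bounce_prices(n, half_spread, seed=0):
    """Toy trade prices: fixed mid price of 100 plus independent bid/ask bounce."""
    rng = random.Random(seed)
    return [100.0 + half_spread * rng.choice((-1, 1)) for _ in range(n)]


# With a half-spread of 0.5 the true quoted spread is 1.0.
estimate = roll_implied_spread(simulate_bounce_prices(20000, 0.5))
```

On a long enough simulated sample the estimate should sit close to the true spread of 1.0. A steadily trending series has zero autocovariance in its changes, so it returns 0.0.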
The equivalence holds for time-varying effects, surrogates, and mediation whether or not the models are correct.
Time-varying treatment effects, surrogate-identified treatment effects, and mediation effects can all be written as recursive regressions, in which each regression's predicted values become generated outcomes for the next regression. We study how standard causal estimators behave in this setting. Formally, we compare the recursive plug-in, recursive balancing weight, and recursive doubly robust estimators. When every stage is fitted by ordinary least squares (OLS), the three recursive estimators coincide in any finite sample, whether or not the models are correctly specified. As such, estimation by recursively regressing generated outcomes is numerically equivalent to estimation by recursively balancing generated regressors. Under ridge penalisation for the balancing weights, the doubly robust estimator is a backward recursion of stage-wise blends of penalised and OLS regressions. The weight on the recursive OLS regression decays geometrically in the number of time periods. Therefore, the intuition from the cross-sectional setting, where the bias correction moves the estimator towards OLS, applies less and less as the number of time periods increases. For general convex penalties, we derive an identity at each stage.
This paper studies how topping up -- allowing recipients of in-kind transfers to supplement subsidized consumption in a private market -- affects optimal redistribution. Consumers can access a competitive private market, while a social planner offers an alternative nonlinear price schedule. We show that the effect of topping up depends on the correlation between redistributive priority and demand. When the correlation is positive, topping up does not affect the optimal mechanism. When the correlation is negative, topping up weakens screening and reduces redistribution. At the extensive margin, topping up reduces the set of environments in which intervention is optimal. At the intensive margin, topping up weakly reduces both the scope of a free public option and the mass of consumers served, and shifts redistribution away from the consumers with the highest redistributive priority. We characterize the optimal mechanisms and show how topping up changes the comparative statics of optimal redistribution with respect to redistributive priorities.
Consider an analyst interested in predicting the size of an effect. She has identified a set of prior published studies of similar effects. We provide a toolkit for (i) summarizing the prior literature, (ii) making predictions of effects in new contexts, and (iii) correcting for the bias from selectivity in the prior literature. We illustrate these methods with empirical examples from labor, public, behavioral, environmental, and development economics. Some of the tools are relevant even when only three prior studies are available. We show how it is possible to use covariates to transparently make predictions for a new context by reweighting prior estimates. The mean effect, after correcting for selectivity, is between 12% and 21% of the simple mean in our empirical examples. We conclude with a cookbook for practitioners producing meta-analyses.
We introduce MACROCAST, a lightweight Time Series Foundation Model (TSFM) for real-time macroeconomic forecasting. Existing TSFMs suffer from data leakage in two forms: temporal contamination, as the model may have seen the realized values of the series it forecasts, and revision bias, as training on fully revised data diverges from the preliminary, vintage-specific releases available to real-time forecasters. MACROCAST is, to our knowledge, the first TSFM that rules out both forms of leakage entirely: at no stage of training is the model exposed to information that would not have been available to a forecaster in real time. We train MACROCAST first on purely synthetic time series in approximately one GPU-day and then fine-tune it on synthetic time series drawn from Bayesian VARs, dynamic factor models, and ARIMA specifications estimated on vintage-specific ALFRED data. Because pretraining uses only simulated data and fine-tuning uses only real-time vintages, no observed future or revised value ever enters the model; each fine-tuning run takes nine minutes. Evaluated on the FRED-MD database in a genuine real-time out-of-sample exercise, MACROCAST improves on the AR(1) benchmark for roughly 80% of series-horizon pairs, matches or surpasses Chronos-2 -- the strongest currently available TSFM -- and outperforms the Bayesian VAR and dynamic factor model benchmarks, all in a data-leakage-free manner.
We introduce a moral hazard model in which public information about a payoff-relevant state arrives over time, an agent decides when to make an irreversible investment, and a principal commits to a state-contingent policy to incentivize investment. To discourage the agent from waiting for more information, the principal's optimal policy provides certainty, reducing the degree to which the agent's payoff depends on the state. This is inefficient -- both players would be better off with less certainty. We study when the agent receives positive rent, and when moral hazard delays investment. Our results apply to environmental subsidies and R&D incentives.
The competitive equilibrium of general equilibrium theory exists as a fixed point, but the theory is, by its own results on aggregate excess demand, in general silent on whether that fixed point is unique, stable, or attained. This paper takes the economy to be not a configuration to be solved for but a process to be recovered: an asymptotically mean stationary information source carrying a partially identified operator of statistical dependence, populated by agents that are finite-capacity information channels. Within this adaptive order the competitive, rational expectations equilibrium is recovered exactly, as a joint limit taken along an explicit scaling path. Three parameter limits and two fixed-point conditions deliver it: the entropy rate falls to zero, agent channel capacity diverges, selection intensity grows infinitely sharp, adaptive learning reaches its expectationally stable rest point, and the recovered structure ceases to coevolve. At that corner the limiting object satisfies the axioms of the canon and its rest state is a Walrasian equilibrium; away from it the adaptive economy is a strict generalisation, carrying a positive entropy rate and a recovered dependence structure that the equilibrium primitive cannot express. We give the nesting as a theorem, establish the result-by-result correspondence with existence, with the Sonnenschein-Mantel-Debreu indeterminacy, and with the regular economies recovery, and characterise exactly what the equilibrium limit erases.
Machine learning (ML) has rapidly transformed economic history, lowering costs of digitization, data linkage, and imputation, and making information in historical text usable at scale. This paper offers a practical guide to using these tools well. However, ML tools have also created new problems. Prediction errors are often systematically correlated with covariates of interest, so even highly accurate models can distort and sometimes reverse coefficients, and standard validation cannot detect this. Given that ML tools often perform worse for historical data, this problem is especially severe for the field of economic history. We also identify a solution to this problem. We show that recent debiasing methods can correct such bias for a wide class of applications, using a small, randomly sampled set of expert-coded labels while retaining the efficiency of large-scale prediction. We organize the field with a taxonomy of three ML tasks, survey the literature along it, and indicate where debiasing applies and where validation against proxies remains the only recourse. We close with best-practice guidance on digitization, model choice, and reproducibility.
In many matching markets, agents care not only about their own partners but also about the matches formed by others. With externalities, stability depends on what agents believe would happen after a deviation. We introduce rationalizable conjectures: beliefs that survive iterated elimination, in the spirit of rationalizability in non-cooperative games. These beliefs define conjecture-rationalizable stability, a solution concept that always exists, extends Gale--Shapley stability, and coincides with it when externalities are absent. We also introduce rationalizable matchings, a non-equilibrium counterpart, and show that every conjecture-rationalizable stable matching is rationalizable. In matching with couples, our concept yields non-empty predictions even when standard stability is vacuous. Finally, we provide an epistemic foundation: rationalizability is behaviorally implied by pairwise rationality and common belief in pairwise rationality, while conjecture-rationalizable stability additionally requires belief correctness.
This paper provides a toolkit for the study of distributional treatment effects (DTEs) focused on treatment-effect discontinuities defined as points where marginal distributional effects change sign. Building on the Treatment Effects Curve (TEC, Verme, 2010), the paper makes three contributions. First, we propose a methodological framework comprising a Horizontal Discontinuity Analysis (HDA) comparing groups in regions of opposite-signed effects using causal forests, and a Vertical Discontinuity Analysis (VDA) examining sign-switch points. Second, we adapt crossing-point asymptotics to locate where a TEC crosses zero and to test the non-tangentiality of its local slope with a bias-corrected Wald statistic. Third, we illustrate the full workflow on synthetic data and add a diagnostic application to Mexico's PROGRESA data. The paper shows how these contributions complement and expand existing instruments for DTE analyses.
China's electric-vehicle (EV) sales share rose from about 1% in 2015 to roughly 45% in 2024. We evaluate this technology transition with an equilibrium differentiated-products model of the Chinese auto market, and quantify both its attribution and its welfare and reallocation consequences. Every yuan of 2024 EV subsidy delivered about 3.38 yuan of private surplus, but this surplus accrued asymmetrically. Per-capita consumer-surplus loss from subsidy removal is about five times larger in Tier 1 than in the Rest tier; about half of the aggregate welfare loss operates through indirect Wright's-law learning rather than the direct cash transfer; and EV-native firms (BYD, Tesla, New Forces) retain 16-27% of their 2024 EV business under subsidy removal while traditional state-owned manufacturers retain only 11%. A Shapley decomposition into six channels -- Quality, Variety, Battery, Subsidy, Residual, and Market -- attributes the historical 2015-2024 rise primarily to product-quality gains (+45.49%), choice-set expansion (+14.81%), and battery-cost decline (+8.20%). The Subsidy block is negative (-13.63%) because direct purchase subsidies were phased down, not because subsidies reduce demand: a separate counterfactual that removes the 2024 subsidy entirely lowers EV share by 23-33%.
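A Shapley decomposition of this kind averages each channel's marginal contribution over every order in which the channels could be switched on. The sketch below shows the generic computation. The three channel names are borrowed from the abstract, but the toy outcome function and its numbers are invented for illustration and do not reproduce the paper's model or estimates.

```python
from itertools import permutations


def shapley_decomposition(channels, value):
    """Exact Shapley shares: average marginal contribution over all orderings.

    `value` maps a frozenset of switched-on channels to an outcome level.
    """
    contributions = {c: 0.0 for c in channels}
    orderings = list(permutations(channels))
    for order in orderings:
        active = frozenset()
        previous = value(active)
        for channel in order:
            active = active | {channel}
            current = value(active)
            contributions[channel] += current - previous
            previous = current
    return {c: total / len(orderings) for c, total in contributions.items()}


def toy_ev_share(active):
    """Hypothetical outcome with an interaction between quality and battery."""
    share = 1.0                               # baseline with no channel active
    if "quality" in active:
        share += 10.0
    if "variety" in active:
        share += 4.0
    if "quality" in active and "battery" in active:
        share += 6.0                          # only pays off jointly
    return share


channels = ["quality", "variety", "battery"]
shares = shapley_decomposition(channels, toy_ev_share)
total_change = toy_ev_share(frozenset(channels)) - toy_ev_share(frozenset())
```

The shares always sum to the total change between the all-off and all-on scenarios, and the joint quality-battery gain is split evenly between the two channels. Exact enumeration is feasible for six channels (720 orderings) but grows factorially.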
Large Language Models (LLMs) are increasingly used as stand-ins in behavioural games. These stand-ins rely on the assumption that the LLM's distribution of choices meaningfully matches how humans play the same game. This study tests that assumption through two games. The first is a p-beauty contest, and the second one is a public goods game. The study first investigates five local-model settings within the same model family. These settings are varied together in a 360-cell factorial, which balances temperature, scale (0.5-32B), quantisation, instruct vs base, and framing. Each cell's distribution is then compared against whole choice distributions in published human data. Each deployment setting, except for quantisation, governs a different aspect of fidelity. Mechanically, while the dispersion of human players can be somewhat recovered through deployment settings, the strategic process behind it cannot. Through the lens of the level-k cognitive theory, we find that LLMs act as static, category-retrieved level-k players, where k is set by the model scale. The models also do not run within-game belief-updating or backward induction throughout multiple-round horizon settings. While human contributions decayed in the public goods game, LLMs stayed flat or rose at every scale. When the horizon test was administered, LLMs were more cooperative under an indefinite horizon compared to a finite one. However, LLMs ignore their relative round position, so no last-round defection was displayed. This implies that LLMs retrieved levels relative to the horizon category rather than working out iteratively from the specific game setting.
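As background for the level-k reading of the p-beauty contest, the textbook ladder of choices can be written down directly. The sketch assumes the standard convention that level-0 players guess 50 on average on a 0-100 scale and that a level-k player best responds to level-(k-1) by multiplying that guess by p. The function names are hypothetical and this is not the paper's classification procedure.

```python
def level_k_choice(k, p=2 / 3, level0_mean=50.0):
    """Choice of a level-k player: level0_mean * p**k (standard assumption)."""
    if k < 0:
        raise ValueError("k must be non-negative")
    return level0_mean * p ** k


def nearest_level(choice, p=2 / 3, level0_mean=50.0, max_k=10):
    """Level whose predicted choice is closest to an observed choice."""
    return min(
        range(max_k + 1),
        key=lambda k: abs(level_k_choice(k, p, level0_mean) - choice),
    )


# Predicted choices for levels 0 to 3 when p = 2/3.
ladder = [round(level_k_choice(k), 2) for k in range(4)]
```

Here `ladder` is `[50.0, 33.33, 22.22, 14.81]`, so an observed guess of 20 would be assigned to level 2. A player who reasons to the Nash equilibrium of zero corresponds to the limit of large k.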
We use web search data to construct monthly indexes of derived demand for cobalt, copper, and nickel, which are key inputs in technologies driving the energy and digital transitions. We incorporate these indexes into Structural Vector Autoregressive (SVAR) models of global metal markets and identify structural shocks using zero, sign, and magnitude restrictions. This approach disentangles supply shocks from several demand-side drivers of metal prices and isolates a transition demand (TD) shock linked to the diffusion of metal-intensive technologies. We find that TD shocks generate persistent price effects, especially for copper and nickel, whereas supply and metal-specific demand shocks are more immediate and less persistent.
In this paper, we study reactive strategies in repeated additive games between two players with finitely many actions. Reactive strategies condition only on the opponent's previous action, making them one of the simplest ways players can respond to past interactions. Additive games include important models of cooperation, such as the donation game and games with a punishment option. We show that, for this class of games and strategies, the conditions for symmetric Nash equilibria reduce to a system of linear equalities and inequalities in the strategy parameters, allowing us to characterise all such equilibria. We establish a one-to-one correspondence between non-empty subsets S of the action set and equilibrium classes, which we call S-supporting equilibria. These are equilibria that use exactly the actions in S when playing against themselves. As a special case, we recover the well-known equalizer strategies as the equilibria supported on the entire action set. To assess which equilibrium classes are most evolutionarily relevant, we complement our analytical characterisation with simulations of social learning dynamics. We find that their prevalence is determined by two factors: how likely they are to be generated and how robust they are against invasion.
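The equalizer property can be checked numerically in the simplest additive game, the two-action donation game. The sketch below assumes undiscounted long-run average payoffs, a benefit b and cost c of cooperation, and reactive strategies written as pairs (p, q): the probabilities of cooperating after the opponent cooperated or defected. In that setting a strategy with p - q = c/b fixes the opponent's payoff at b*q regardless of what the opponent does. The function names are hypothetical and the paper's general characterisation covers more actions than this example.

```python
def cooperation_rates(s1, s2):
    """Stationary cooperation rates of two reactive strategies (p, q).

    Solves x1 = q1 + (p1 - q1) * x2 and x2 = q2 + (p2 - q2) * x1.
    """
    (p1, q1), (p2, q2) = s1, s2
    r1, r2 = p1 - q1, p2 - q2
    denom = 1.0 - r1 * r2
    if abs(denom) < 1e-12:
        raise ValueError("stationary rates are not unique for these strategies")
    x1 = (q1 + r1 * q2) / denom
    x2 = (q2 + r2 * q1) / denom
    return x1, x2


def payoffs(s1, s2, b, c):
    """Long-run average donation-game payoffs (player 1, player 2)."""
    x1, x2 = cooperation_rates(s1, s2)
    return b * x2 - c * x1, b * x1 - c * x2


b, c = 3.0, 1.0
equalizer = (0.5 + c / b, 0.5)      # p - q = c / b
opponents = [
    (1.0, 0.0),                     # tit-for-tat
    (0.0, 0.0),                     # always defect
    (1.0, 1.0),                     # always cooperate
    (0.9, 0.1),                     # noisy reciprocator
]
opponent_payoffs = [payoffs(equalizer, opp, b, c)[1] for opp in opponents]
```

Every entry of `opponent_payoffs` equals b * q = 1.5 up to rounding, which is the sense in which the strategy equalizes the co-player's payoff.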
The 2024 Department of Justice antitrust complaint against RealPage, Inc. named five major residential REITs for coordinating algorithmic rent pricing across hundreds of thousands of apartment units in major US metropolitan areas. This paper studies whether census-tract-level corporate landlord concentration (CLC), measured from SEC EDGAR 10-K property filings geocoded to census tracts, the first such application in the literature, is associated with rent growth over 2019-2023, and whether that association is larger in majority-minority neighborhoods. Rent outcomes are measured using the Zillow Observed Rent Index (ZORI). To account for the possibility that corporate landlords preferentially locate in neighborhoods already seeing rent appreciation, all regressions control for a novel Algorithmic Housing Burden Index (AHBI), a composite of pre-existing rent burden and market tightness from ACS data. Across 665 census tracts in ten US metropolitan areas, doubling REIT concentration is associated with 2.8 percentage points higher rent growth (p = 0.086, p = 0.030, HC1 robust). This association is significantly stronger in majority-minority tracts. Within the same metro, high-CLC majority-minority tracts are associated with 5.9 percentage points higher rent growth than comparable white tracts (p = 0.039). An XGBoost model predicts 44 percent of out-of-sample rent growth variance, with SHAP analysis independently confirming that CLC's contribution is positive in minority tracts and negative in white tracts. Taken together, these findings provide the first tract-level evidence consistent with corporate landlord concentration being associated with disproportionately higher rent growth in communities of color.
Product complexity estimates no longer vary with chosen geographical scale and track GDP per capita and employment more closely.
Several network-based measures have been proposed to assess the economic complexity of countries. These measures have provided important insights into national economic development, and they are now widely applied at the subnational level as well. Here, we show that such applications lead to inconsistent results, in the sense that the estimated complexity of the same product appears to depend on methodological details such as the geographical scale of analysis. Building on these findings, we propose a measure of territorial economic complexity based on an exogenous and extensive computation. We show that these methodological choices yield estimates that are more consistent and more strongly aligned with standard economic indicators, such as GDP per capita and employment.
We analyze usage data from OpenAI's Codex tool to present large-scale evidence of how agentic AI technology, which can take actions on a user's behalf, changes how people work. We use an automated, privacy-protecting pipeline to contrast usage across three populations: external personal-account users, external organizational-account users, and workers within OpenAI. We find that agentic AI usage is growing rapidly: the number of active users has grown more than fivefold in the first half of 2026, with the most rapid increase occurring outside the initial audience of software developers. Uptake is uneven: within OpenAI, Codex usage is nearly universal and has largely replaced business usage of ChatGPT. We document a similar shift to agentic tooling outside OpenAI, particularly within organizations, although external adoption remains lower and more uneven. In addition to headline usage figures, we observe measures of sophistication, and find that a growing number of users have used Codex to change their workflows substantially. More than 10% of users manage three or more concurrent Codex agents at some point each week, and 26.6% use skills, which allow users to share instructions for complex workflows. Alongside these changes in usage practices, request complexity has increased: since the start of the year, the share of individual Codex users who submit at least one request for a task estimated to require more than eight hours for an experienced human to complete has increased nearly tenfold. Concurrently, output has grown rapidly -- in June 2026, the median OpenAI employee in a legal role generated 13 times more monthly output tokens across Codex and ChatGPT than they did in November 2025, while the median researcher generated more than 50 times as many. We conclude by discussing the implications of these patterns for productivity, job reorganization, and workforce restructuring.
In recent years, technological developments and activities by private actors have led to a renewed discussion of the potential of nuclear fusion to meet growing global energy demands. So far, however, fusion technologies remain at comparatively low development levels and their deployment in commercial power plants is probably still decades away. Nevertheless, over the last decades, many cost studies have estimated the future cost of potential fusion power plants. To date, however, there is no systematic and harmonized assessment of these projections. Therefore, this study conducts a stochastic analysis of future fusion power plant costs for three distinct technology lines: magnetic confinement fusion (MCF), inertial confinement fusion (ICF), and magneto-inertial confinement fusion (MIF), including cost assessments at different technology maturity levels. These levels are further assessed to determine projected learning rates for future fusion costs. For mature technologies, mean LCOE are determined at 114.6, 110.3, and 143.9 USD per MWh for MCF, ICF, and MIF devices, respectively. This implies learning rates of more than 30%. We find that these projected values are rather optimistic when compared to other literature or comparable technologies like fission. We therefore urge policymakers to exercise caution when potential fusion developers refer to the potential economic competitiveness of fusion power plants.
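To make the notion of a learning rate concrete, the sketch below uses the standard one-factor experience curve, in which cost falls by a fixed fraction with every doubling of cumulative deployment. This is an assumption for illustration: the abstract does not state which functional form the study uses, and the function names and input values here are hypothetical placeholders rather than figures from the study.

```python
def implied_learning_rate(cost_initial, cost_mature, doublings):
    """Learning rate implied by a cost decline over a number of capacity doublings.

    Assumes a one-factor experience curve:
        cost_mature = cost_initial * (1 - learning_rate) ** doublings
    """
    if cost_initial <= 0 or cost_mature <= 0:
        raise ValueError("costs must be positive")
    if doublings <= 0:
        raise ValueError("doublings must be positive")
    progress_ratio = (cost_mature / cost_initial) ** (1.0 / doublings)
    return 1.0 - progress_ratio


def cost_after_doublings(cost_initial, learning_rate, doublings):
    """Cost after a given number of doublings at a constant learning rate."""
    return cost_initial * (1.0 - learning_rate) ** doublings


# Placeholder values only: a cost index of 100 falling to 49 over two doublings.
rate = implied_learning_rate(100.0, 49.0, 2)
print(f"implied learning rate: {rate:.1%}")
print(f"cost after three doublings: {cost_after_doublings(100.0, rate, 3):.1f}")
```

Read this way, a learning rate above 30% means that each doubling of cumulative deployment removes nearly a third of the remaining cost, which is why the abstract compares the projections against other technologies.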