q-fin.EC is an alias for econ.GN. Economics, including micro and macro economics, international economics, theory of the firm, labor economics, and other economic topics outside finance
We argue that AI-saturated markets are likely to create Veblen-good premiums, which we term human-provenance premiums, for verified human presence, and hence AI governance should treat human-provenance verification as labor infrastructure. Generative and agentic AI systems lower the cost of many standardized cognitive, creative, and coordination tasks, weakening the scarcity premiums that have supported much middle-tier knowledge work. We argue that this pressure may produce an asymmetric barbell-shaped structure of value capture in advanced economies: high-volume synthetic production controlled by owners of AI infrastructure at one pole, and scarce, high-status human labor valued for verified human presence at the other.
We advance three claims. First, AI compresses the value of standardized middle-tier labor by making good-enough synthetic substitutes scalable at low marginal cost, hollowing out the middle of the skill distribution currently categorized by knowledge work. Second, this compression reallocates demand for human labor toward work valued for its visible human character. We term this performative humanity and distinguish three forms of labor: relational presence, aesthetic provenance, and accountability. Third, as these premiums depend on credible verification, AI governance should treat human-provenance systems as labor infrastructure rather than as luxury authenticity labels.
To evaluate hybrid human-AI work, we propose constitutive human presence as the relevant standard: human labor retains premium value when human judgment, attention, accountability, authorship, or relational participation is not incidental to the output but constitutive of what is being purchased.
Consumers are increasingly delegating purchase decisions to AI agents, providing natural-language descriptions of their preferences and identity. We argue that these representations constitute an information channel, role coherence, through which sellers can infer willingness to pay without explicit disclosure by the buyer agent, leading to preference leakage. In an experiment where a language-model buyer agent shops on behalf of a verbal consumer profile, we show that seller-side inference from dialogue alone recovers willingness to pay nearly one-for-one. Comparing this setting to a numeric-budget condition with confidentiality instructions cleanly isolates role coherence as distinct from instruction-following failure. Because this leakage arises from delegation itself, it cannot be mitigated at the prompt level. Instead, we propose architectural interventions that trade off personalization against preference privacy.
Average wages in Japan rose until the mid-1990s but stagnated thereafter. This paper studies Japan's long-run wage stagnation by decomposing changes in average log real hourly wages from 1980 to 2024 into four components: demographic change across worker types, changes in relative employment shares across job types, changes in relative wages across job types, and wage growth within job types. The framework combines a shift-share decomposition across worker types with an extension of the Olley-Pakes decomposition that separates employment reallocation from changes in relative wages across job types. Wage growth within job types contributes positively over the full sample period, but demographic change and employment reallocation partly offset it. Between 1996 and 2014, all four components are negative. The negative contribution from employment reallocation is not limited to the expansion of part-time employment, but reflects broader shifts across job types defined by employment type, establishment size, and industry.
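As a rough illustration of the Olley-Pakes idea mentioned above, the sketch below splits an employment-weighted average log wage into an unweighted mean across job types plus a covariance-like term between employment shares and wages. This is the textbook static identity, not the paper's extension; the function name and the numbers are hypothetical.

```python
def olley_pakes_split(shares, log_wages):
    """Split a share-weighted mean into (unweighted mean, covariance term).

    Identity, valid when shares sum to one:
        sum_j s_j * w_j = mean(w) + sum_j (s_j - mean(s)) * (w_j - mean(w))
    """
    n = len(shares)
    mean_share = sum(shares) / n
    mean_wage = sum(log_wages) / n
    covariance_term = sum(
        (s - mean_share) * (w - mean_wage) for s, w in zip(shares, log_wages)
    )
    return mean_wage, covariance_term


# Hypothetical employment shares and log hourly wages for three job types.
shares = [0.5, 0.3, 0.2]
log_wages = [3.2, 2.9, 2.6]

aggregate = sum(s * w for s, w in zip(shares, log_wages))
mean_wage, covariance_term = olley_pakes_split(shares, log_wages)

# The two components add back up to the aggregate.
print(aggregate, mean_wage + covariance_term)
```

A fall in the covariance term over time would indicate employment moving toward lower-wage job types, which is the kind of reallocation effect the abstract describes.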
We examine the economic impact of increasingly productive AI and policies that spread its benefits across the economy. Improvements in AI productivity trigger labor reallocation and changes in absolute and relative wages for different types of labor. Wages of labor that is essential for building AI increase faster than overall GDP. Wages of labor that is substituted for by AI decrease in both absolute and relative terms. Wages of labor that is used only in final goods production and is not displaced by AI increase in line with overall GDP. We contrast the impact of productivity gains depending on whether AI production is competitive or monopolistic. Monopoly production of AI restricts its deployment, slowing the transition and impact of AI. Optimal tax and regulatory policies that achieve Pareto-improvements differ depending on whether there is competition in AI production.
This paper estimates the effect of cross-border transmission constraints on suspected market power abuse in the German wholesale electricity market. Using a 2SRI instrumental variables approach, we study suspected strategic behavior by German gas- and coal-fired power plants in 2022-2024. Cross-border transmission constraints are measured using the maximum and minimum bounds of zonal net position, while suspected market power abuse is measured as the upward or downward deviation of observed dispatch from a modeled competitive benchmark. We find that transmission constraints significantly elevate the likelihood of suspected market power abuse. When headroom for further imports is already scarce, reducing import headroom by one Gigawatt (GW) increases the odds of suspected capacity withholding by 15%. Similarly, reducing export headroom by one GW when it is scarce increases the odds of suspected capacity push-in, a strategy to depress prices, by 16%. These results provide empirical support for interconnection expansion as an instrument to mitigate market power.
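To make the odds figures above concrete, the following sketch converts a multiplicative change in odds into a change in probability. The baseline probabilities are hypothetical and are not taken from the paper; only the 1.15 odds ratio corresponds to the reported 15% increase.

```python
import math


def apply_odds_ratio(baseline_prob, odds_ratio):
    """Return the probability implied by scaling the baseline odds."""
    if not 0.0 < baseline_prob < 1.0:
        raise ValueError("baseline_prob must lie strictly between 0 and 1")
    odds = baseline_prob / (1.0 - baseline_prob)
    new_odds = odds * odds_ratio
    return new_odds / (1.0 + new_odds)


# A 15% increase in odds corresponds to a logit coefficient of ln(1.15).
logit_coefficient = math.log(1.15)
print(f"implied logit coefficient: {logit_coefficient:.4f}")

# Hypothetical baseline probabilities of suspected withholding.
for baseline in (0.05, 0.20, 0.50):
    shifted = apply_odds_ratio(baseline, 1.15)
    print(f"baseline {baseline:.2f} -> {shifted:.4f}")
```

The same odds ratio moves the probability by different amounts depending on the baseline, which is why an odds-based effect should not be read as a 15 percentage point change.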
Large language models (LLMs), a prominent form of artificial intelligence (AI), are becoming everyday interfaces for political questions, but most exchanges are dyadic rather than audience-facing. This paper asks whether AI conversation functions as a new arena for political expression or as a conversational intermediary for routine political demand. Using 4.30 million human-AI conversations from three large public datasets, we apply two validated classifiers to user messages, identifying political content, use case, and expressed ideology. Political content appears in 3.9% of conversations, varies sharply by platform publicness and conversation depth, and is mostly practical: users ask for information, draft text, and process documents far more often than they state opinions. A regression-discontinuity-in-time design around the 2024 U.S. presidential result call shows that the call changed the expressive subset: among U.S. users, stance-taking, affective language, and ideological extremity rose; comparable conversations elsewhere did not. AI conversation is less a public square than a conversational political intermediary, absorbing routine demand and becoming expressive when major events make political stakes explicit.
Centralized hydrothermal planning models determine generation schedules and electricity spot prices based on inflow forecasts in audited-cost power systems, such as those prevalent in Latin America, and provide operational benchmarks and decision support in hydro-dominated competitive electricity markets. Consequently, biased forecasts can propagate directly into both operational decisions and market outcomes. This paper studies how persistent optimistic inflow-forecast bias propagates through the Brazilian hydrothermal power system and market. For a stylized hydrothermal model, we show analytically that optimistic bias weakly reduces water values and weakly increases first-stage hydro discharge relative to the unbiased optimum, thereby lowering reservoir storage and postponing thermal commitment. Using official Brazilian planning and operational data, we provide empirical evidence consistent with this mechanism. We then conduct a controlled SDDP experiment to compare policies trained under biased and bias-corrected inflow-forecast processes, evaluating both under the same bias-corrected inflow scenarios. The policy trained under biased forecasts produces lower reservoir levels, delayed dry-season thermal dispatch, sharper spot-price peaks, higher reliability risk, and higher expected operating costs. Finally, we show that these distortions increase the price-quantity risk for hydropower producers and reduce their willingness to contract. The results indicate that inflow-forecast bias is not merely a statistical forecasting problem, but can be a source of operational inefficiency, reliability risk, and distorted market incentives in hydro-dominated power systems. We argue that the insights and policy implications drawn in this paper may be relevant beyond Brazil to other hydro-dominated systems and electricity markets that are increasingly reliant on energy storage.
Census analysis 1981-2011 finds nights shift workers into paid agricultural roles but days move them to seasonal work, cutting output in bot
This paper finds diverging partial effects of diurnal warming (higher nighttime and daytime temperatures) on agricultural wage-labour shares from decadal Indian Censuses (1981-2011). Though both margins contract grain output and cultivated area, only higher maxima raise harvest prices locally, consistent with a model where warmer nights shock land but warmer days shock land and labour productivity. Warming nights shift seasonal workers and self-cultivators into agricultural labour; warming days push labour to the seasonal margin. Long differences show the labour divergence is rural. In towns, both margins depress non-agricultural worker shares.
We examine a choice between bonus contracts offered to dealers of a U.S. auto manufacturer. In our data, dealers select the non-profit-maximizing option in 20 percent of observations, costing the mistaken dealers $18,453 per year on average. We examine how the propensity to make this mistake varies with competition, identified both cross-sectionally and within dealers over time. Both analyses show that greater competition substantially lowers the rate of mistakes. However, even in the most competitive markets, consequential mistakes persist. Our results suggest that competition disciplines mainly through within-dealer changes in behavior rather than entry and exit.
This study proposes a new set of a firm's "social statements" that represent social value, in contrast to conventional financial statements that represent economic value. Financial statements externalize social and environmental costs, and this externalization is one of the primary causes of contemporary social problems. Insights from anthropology, philosophy, and sociology suggest that social value is grounded in social relationships, joint actions, and communication. Building on this understanding, we assign numerical indicators of a firm's social relationships with external stakeholders to the items of a balance sheet and a profit-loss statement as social statements. This approach enables unified measurement units and simplified calculation compared with existing methods for evaluating social impact or social value. Moreover, similar to financial statements, social statements allow firms to be assessed using managerial indicators such as equity ratios and profit margins. The significance of social statements lies in incorporating social value--alongside financial value--into corporate decision-making, and in encouraging social transformation as firms publicly articulate their social value.
TRI uses title and abstract embeddings to estimate the chance a paper matches past patent-paired work, with external validation at multiple
Universities, funders, investors, and policy agencies often need to identify research with translational relevance before patents, licenses, startups, or industry collaborations are visible. This study introduces the Translation Readiness Index (TRI), a text-based measure evaluating a publication's semantic similarity to papers that appear in high-confidence patent-paper pairs. Using 20,610 publications from OpenAlex, including 9,431 publications from the Reliance on Science patent-paper pairs data and 11,179 matched comparison publications, we created paper-level 768-dimensional semantic embeddings from titles and abstracts with SPECTER2. After evaluating four machine learning classifiers, XGBoost achieved the highest ROC-AUC (0.77). We define TRI as the model-estimated probability that a publication belongs to the patent-paper-paired class. Linguistic analysis revealed that patent-paired publications more often use an invention-oriented framing, distinct from the observational language of the comparison group. External validation across University of Western Australia (UWA) publications and leading global universities demonstrated positive associations between high TRI scores and independent translational indicators. TRI provides a text-based method for identifying translation-ready research, though it should be interpreted as a measure of semantic proximity to patented science rather than a direct measure of realized commercialization.
Decision-makers routinely rely on expert judgments accompanied by written explanations, yet explanation quality is difficult to measure at scale. Forecasting tournaments offer a natural testing ground: probabilistic judgments are paired with natural-language rationales and scored against realized outcomes. We introduce Explanation Quality Markers (EQMs), a set of sixty theory-guided reasoning patterns scored by large language models (LLMs). In a pre-registered analysis of over 55,000 forecast-rationale pairs from a multiyear forecasting tournament, EQMs predict accuracy at both the forecast and forecaster levels, consistently outperforming pre-LLM text-analysis methods. More than 90% of statistically significant pattern-level EQM-accuracy correlations match our directional hypotheses. The signal is asymmetric: EQMs identify likely underperformers more reliably than they distinguish the very best forecasters. Benchmarked against traditional indicators of forecasting skill, EQMs are the strongest predictor at the forecast level and competitive at the forecaster level, though weaker than prior accuracy. Human ratings of rationale quality are less consistently correlated with accuracy and place disproportionate weight on rationale length. Results transfer to an independent forecasting study. EQMs provide a scalable, interpretable method for extracting judgment-relevant information from written explanations.
Agentic artificial intelligence is increasingly deployed not as a single assistant but as a collective of planners, solvers, reviewers, memory managers, tool users, and orchestrators. These systems are entering organisational workflows under familiar labels such as teams, managers, committees, markets, and workflows. This article asks whether such agent collectives exhibit organisational behaviour in a sense that is analytically comparable to, yet distinct from, human organisational behaviour. I argue that agentic AI is a partial organisational analogue. It resembles a human organisation because it differentiates work, coordinates interdependence, performs recurrent routines, crosses boundaries, and produces collective outcomes. It differs because these patterns are not sustained by motivation, identity, trust, employment, socialisation, or moral accountability. They are sustained by context architecture: prompts, memory, traces, schemas, tools, validators, and permissions. The article develops contextual transaction cost as the central mechanism linking these similarities and differences. Computational theorising, synthetic task simulations, real LLM agent traces, and robustness analyses show that human-imitation forms often underperform when they add lossy handoffs, correlated deliberation, and verification burdens, whereas shared-state and adaptive forms perform better when they make context durable, inspectable, and task-contingent. The article contributes to organisation studies by theorising agentic AI as an emerging object of organising and by specifying the interface conditions under which human and agentic organisational behaviour can jointly support collective intelligence.
Using 380 trillion tokens of realized AI consumption across more than four hundred large language models from the licensed proprietary OpenRouter dataset covering approximately 2 percent of current global monthly AI token consumption, we analyze how AI affects firms, markets, and workers. Leveraging the unprecedented size, scope, and granularity of the data, we construct the AI Factor from growth in tokens, dollars, and users, estimate firm-level AI Betas from stock return comovement, and characterize the AI Premium. First, we build a high-frequency AI factor and decompose it into salient components. Second, we show that firms whose returns covary more positively with the AI factor--high AI beta firms--earn higher subsequent returns, and the AI premium is large and heterogeneous. A value-weighted long-short strategy earns 64.1 basis points per week, and the premium is large for loadings on the intensive, frontier-oriented margin of AI consumption--closed-source models, paying and seasoned users, and long prompts--but not on casual or open-weight use. Third, the premium reaches beyond technology firms into consumer-facing and capital-heavy parts of the economy, but is absent in emerging markets, including China. Fourth, the AI exposure is more positive in nonroutine interactive work and more negative in analytical, scientific, and operations-control skills--an occupation one standard deviation higher in interaction-and-communication content has a 0.36-standard-deviation higher market-implied AI premium. Additionally, we provide early evidence of the rise of the agentic economy.
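For scale, a weekly return quoted in basis points can be converted to an annual figure as sketched below. This is plain unit arithmetic under the assumption of 52 weeks per year and a constant weekly return; it ignores trading costs and says nothing about how the paper itself annualizes. The function name is hypothetical.

```python
def annualize_weekly_bps(weekly_bps, weeks_per_year=52):
    """Return (simple, compounded) annual returns as decimals."""
    weekly_return = weekly_bps / 10_000  # 1 basis point = 0.01%
    simple = weekly_return * weeks_per_year
    compounded = (1.0 + weekly_return) ** weeks_per_year - 1.0
    return simple, compounded


simple, compounded = annualize_weekly_bps(64.1)
print(f"simple: {simple:.2%}, compounded: {compounded:.2%}")
```

The simple figure is just 52 times the weekly return; the compounded figure is larger because weekly gains are reinvested.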
We study retrieval over catalogs of structured metadata, where each record is a small schema whose fields answer different kinds of query. Embedding a record with a text encoder first serializes its fields into a string, which forces a choice of field order. We show this choice, usually treated as an implementation detail, silently controls retrieval quality once the encoder is fine-tuned. A standard fine-tune loses 7.4 nDCG@10 points when the index is rebuilt under a different field order, because it reads absolute position instead of the field labels. We propose permutation-invariant fine-tuning ($\textbf{PI-FT}$), which serializes each record under a freshly sampled field order with random field dropout, so meaning binds to the labels rather than to position. The change is about two lines in the data loader; it costs negligible in-distribution accuracy and cuts the order-change penalty to 0.2 points. We study this in the discovery of development statistics, a catalog of nearly 10,000 indicators that should be searchable in many languages by a model small enough to self-host. As AI assistants and agents increasingly mediate access to public data and statistics, this retrieval step decides whether an answer is grounded in the right indicator or series, making discoverability a precondition for disseminating data through AI. Because usage logs cannot provide training signal for indicators no one has searched, we generate the queries instead. $\textbf{DevDataBench}$ is a fully LLM-generated benchmark of grounded, facet-targeted queries across 15 languages, covering every indicator for both training and evaluation. A fine-tuned 118M-parameter CPU encoder outperforms every zero-shot baseline, including $\texttt{text-embedding-3-large}$ (0.707 vs.\ 0.556 nDCG@10), with the largest gains in low-resource languages. We release the benchmark, pipeline, models, and a reusable PI-FT framework.
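The serialization step described above can be sketched as follows: each time a record is turned into a training string, its fields are shuffled and some are randomly dropped. This is a minimal illustration, not the released PI-FT framework; the `label: value` format, the separator, the dropout rate, the rule of keeping at least one field, and the example field names are all assumptions.

```python
import random


def serialize_record(record, rng, dropout=0.1, sep=" | "):
    """Serialize a dict of fields under a freshly sampled order.

    Each field is dropped independently with probability `dropout`.
    If every field would be dropped, one is kept at random (an
    assumption made here so the string is never empty).
    """
    fields = list(record.items())
    kept = [field for field in fields if rng.random() >= dropout]
    if not kept:
        kept = [rng.choice(fields)]
    rng.shuffle(kept)  # new field order on every call
    return sep.join(f"{label}: {value}" for label, value in kept)


# Hypothetical indicator record.
record = {
    "name": "Access to electricity",
    "unit": "% of population",
    "topic": "Energy",
    "source": "Household surveys",
}

rng = random.Random(0)
for _ in range(3):
    print(serialize_record(record, rng))
```

Calling this inside a data loader means the encoder sees the same record under many orders during fine-tuning, so it has to rely on the field labels rather than on absolute position.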
Cost-based allocation under a formal convention allowed evasion of standard detection methods.
This paper analyzes the internal organization and economic effects of a bid-rigging cartel in the road construction sector of the Swiss canton of Ticino, active from 1999 to 2005. Using exceptionally rich documentary evidence, we reconstruct how cartel members coordinated bids and allocated contracts under a formal agreement known as the 'convention'. We show that, despite the absence of side payments, the cartel implemented a cost-based allocation mechanism that closely approximated the first-best collusive outcome. Regression and machine-learning analyses indicate that observable cost proxies systematically predict both winning bids and bid rankings. The evidence further suggests that cartel members strategically mimicked competitive bidding behavior, allowing them to evade standard econometric detection methods. Using double machine learning, we estimate average overcharges of at least 45%, and potentially substantially higher, highlighting the significant financial harm caused by this sophisticated form of collusion.
This paper studies whether news about banks' balance sheets propagates to aggregate financial conditions and macroeconomic activity. We construct high-frequency Canadian bank net-worth shocks using stock-price reactions around earnings announcements of the six large Canadian banks. Guided by a model in which higher intermediary net worth expands credit supply and lowers borrowing spreads, we use the co-movement between bank equity prices and Canadian corporate spreads to purge raw bank equity surprises from contaminating information. Favorable purged credit-supply bank net-worth shocks lower corporate spreads, raise bank valuations and broader equity prices, appreciate the Canadian dollar, and increase real activity over the medium run. The results are robust across specifications, samples, and additional outcomes, and suggest that bank earnings news is macroeconomically relevant in concentrated banking systems.
In tests with 24 rural users, internal coordination matched subsidies for investment under falling prices and low surplus pay.
The success of distributed photovoltaics may be undermining its own future. As solar penetration increases, electricity prices decline during periods of peak generation, reducing the value of surplus photovoltaic production. This raises a critical question: can citizen-led energy systems remain economically viable in electricity markets dominated by renewable generation?
Rather than exploring technically optimal but institutionally unrealistic solutions, we examine the options available under current regulatory and market conditions. Using high-resolution consumption data from a rural community sharing a PV facility among 24 users, we identify pathways for long-term sustainability. The study makes two contributions. First, it shows that effective internal coordination can mobilize participation and investment as successfully as external subsidies. Second, it compares static, dynamic, and hybrid energy-sharing models, with and without storage, providing a flexible framework that balances efficiency, fairness, and governance.
Results show that collective self-consumption reduces required PV capacity, lowers investment costs, and increases annual savings compared with individually operated systems. Alternative allocation schemes further improve benefit distribution and local electricity use, although gains depend on trade-offs between efficiency, fairness, and governance complexity. Under current electricity prices and remuneration schemes, battery storage provides limited additional economic value and becomes attractive only under specific market conditions. Overall, the long-term viability of citizen-led photovoltaic initiatives depends less on technological sophistication than on collective coordination and adaptive governance.
Large language models have proven to be remarkable if inconsistent parrots of public attitudes and opinions. The extent to which LLMs are able to produce reasonable approximations of cultural taste remains an open empirical question that becomes more urgent by the day, with market research companies already offering provisional 'synthetic' survey panels and the contamination of standard survey data from LLM-generated responses. In this study, we build on past work on silicon sampling by extending considerations of its algorithmic fidelity and alignment to the domain of cultural consumption. We use large language models from OpenAI, Anthropic, and DeepSeek to each produce 277,470 (30x9249) silicon surrogates of survey respondents from the Survey of Public Participation in the Arts (SPPA). We find these silicon surrogates' tastes to be highly stylized facsimiles of human tastes. (1) Silicon samples have a systematic positive bias for liking, resulting in inflated ecological estimates of tastes. The individual-level bias of silicon samples is not well explained by the WEIRD bias often discussed in the literature. (2) The complex relationality in real taste structures is completely lost among silicon samples. (3) Finally, very little of the known cultural alignment between tastes and social space is preserved. Silicon samples attenuate age-taste associations, resurrect anachronistic class-taste associations, and caricature gender- and race-taste associations.
Computing optimal policy in heterogeneous-agent economies is complicated by the possibility of multiple equilibria. We overcome this difficulty by showing that when the equilibrium manifold has a low-dimensional Negishi-weight parameterization, Bayesian optimization reliably finds approximate solutions and can be used to certify candidate solutions with high probability. This insight brings recent machine learning advances to bear on a core problem in macroeconomics. We apply Bayesian optimization to a dynamic economy with heterogeneous agents and climate change and compute optimal carbon taxes in this setting. Although in principle the presence of the carbon externality creates scope for multiple equilibria, we show that in an example with a realistic calibration of damages, competitive equilibria are most likely unique.
The company's role stays the same: create shared context where human and machine knowledge convert and amplify each other.
Nonaka emphasized that innovation is the result of a continuous back-and-forth between tacit and explicit knowledge. Artificial intelligence introduces a fundamentally new object into this process -- tacit machine knowledge -- but Nonaka's ideas are more relevant than ever. The central role of the knowledge-creating company remains the same: to create the shared context in which different kinds of knowledge can feed off each other, become organizational knowledge, and set off further cycles of innovation.
Global supply chains are highly interconnected, making them vulnerable to cascading disruptions induced by trade policy shocks. Understanding how such disruptions propagate through production networks, and how mitigation mechanisms such as trade reallocation and production adjustment can alleviate their impacts, remains a central challenge. In this work, we develop a linear programming formulation of an Input-Output (IO) system that captures cascading supply-chain disruptions together with trade reallocation and production expansion. Our formulation yields a system-level equilibrium characterization that enables the joint analysis of disruption propagation and mitigation within a unified framework. We propose an efficient algorithm for computing approximate equilibrium solutions by minimizing total unmet demand in large IO systems. We apply our approach to tariff-induced disruptions in the global oilseeds supply chain arising from the U.S.-China trade war. Our results show that a localized 70% disruption to flows from the U.S. oilseeds sector to China leads to a 3.27% loss in global output, with China experiencing a disproportionate loss of 14.02%. As a counterfactual mitigation strategy, allowing a 20% reallocation from Brazil's oilseed sector to China significantly reduces global output losses to 1.36%, although pressure remains high on final-demand flows. We further investigate production expansion as an additional mitigation mechanism and show that it introduces tradeoffs between reducing global final-demand losses and protecting Brazil's domestic flows. Domestic reallocation disproportionately shifts losses toward smaller economies, while globally sourced expansion redistributes losses more broadly across the network.
Firms shift from training the least-skilled to the most-skilled below the AI level when workers can switch jobs.
When firms deploy autonomous AI, they must decide how much work to leave to the system and how much to keep workers engaged. This decision affects current output and future human capital. We develop a parsimonious two-period model in which AI may outperform the worker when it functions, but may fail with positive probability. A firm chooses worker engagement; engagement lowers current output for below-benchmark workers, but changes future skill through learning and erosion. We distinguish two dimensions of AI progress: capability, the system's output when it works, and reliability, the probability that it works. In a single-firm benchmark, engagement is valuable only as fallback investment. The firm engages the least-skilled workers most, because they have the largest skill gaps and are least costly to bring toward a useful fallback level. With worker mobility, engagement also affects labor-market sorting: workers prefer jobs that build more valuable skill trajectories. This sorting motive targets higher-skill workers near the AI frontier, where skill gains are more valuable and engagement is less costly. Mobility can therefore reverse the engagement pattern, shifting investment from the least-skilled toward the most-skilled workers below the AI benchmark. Mobility also reshapes how AI progress affects engagement: greater capability raises engagement by increasing the value of the skill trajectory a firm offers, whereas greater reliability can raise or lower it because it reduces fallback need while also changing learning opportunities. Under worker mobility, human-AI work design becomes a problem of human-capital investment, in which allocating work today shapes future skill.
The healthcare sector contributes approximately 4.4% of global greenhouse gas emissions, yet research on the organizational determinants of sustainable behaviors among healthcare workers remains limited. This study examines how green transformational leadership and ethical climate influence sustainable clinical behaviors among registered nurses, with green psychological climate as a mediator and perceived organizational hypocrisy as a moderator. Data were collected from 760 nurses across 11 public and private hospitals in Jordan using a cross-sectional survey design. Structural equation modeling with bootstrapping was employed to test the hypothesized relationships. The results revealed that both green transformational leadership and ethical climate positively predicted sustainable clinical behaviors. Green psychological climate partially mediated both relationships. Perceived organizational hypocrisy significantly weakened the positive effects of green transformational leadership and ethical climate on sustainable behaviors. The model explained 35.7% of the variance in sustainable clinical behaviors. These findings highlight that fostering sustainability in healthcare requires not only supportive leadership and ethical organizational environments but also authenticity and consistency between stated values and actual practices. The study extends green transformational leadership theory to healthcare settings, integrates ethical climate research with environmental sustainability, and introduces perceived organizational hypocrisy as a critical boundary condition. Practical implications for healthcare administrators seeking to reduce their environmental footprint are discussed.
This paper studies how topping up -- allowing recipients of in-kind transfers to supplement subsidized consumption in a private market -- affects optimal redistribution. Consumers can access a competitive private market, while a social planner offers an alternative nonlinear price schedule. We show that the effect of topping up depends on the correlation between redistributive priority and demand. When the correlation is positive, topping up does not affect the optimal mechanism. When the correlation is negative, topping up weakens screening and reduces redistribution. At the extensive margin, topping up reduces the set of environments in which intervention is optimal. At the intensive margin, topping up weakly reduces both the scope of a free public option and the mass of consumers served, and shifts redistribution away from the consumers with the highest redistributive priority. We characterize the optimal mechanisms and show how topping up changes the comparative statics of optimal redistribution with respect to redistributive priorities.
Machine learning (ML) has rapidly transformed economic history, lowering costs of digitization, data linkage, and imputation, and making information in historical text usable at scale. This paper offers a practical guide to using these tools well. However, ML tools have also created new problems. Prediction errors are often systematically correlated with covariates of interest, so even highly accurate models can distort and sometimes reverse coefficients, and standard validation cannot detect this. Given that ML tools often perform worse for historical data, this problem is especially severe for the field of economic history. We also identify a solution to this problem. We show that recent debiasing methods can correct such bias for a wide class of applications, using a small, randomly sampled set of expert-coded labels while retaining the efficiency of large-scale prediction. We organize the field with a taxonomy of three ML tasks, survey the literature along it, and indicate where debiasing applies and where validation against proxies remains the only recourse. We close with best-practice guidance on digitization, model choice, and reproducibility.
China's electric-vehicle (EV) sales share rose from about 1% in 2015 to roughly 45% in 2024. We evaluate this technology transition with an equilibrium differentiated-products model of the Chinese auto market, and quantify both its attribution and its welfare and reallocation consequences. Every yuan of 2024 EV subsidy delivered about 3.38 yuan of private surplus, but this surplus accrued asymmetrically. Per-capita consumer-surplus loss from subsidy removal is about five times larger in Tier 1 than in the Rest tier; about half of the aggregate welfare loss operates through indirect Wright's-law learning rather than the direct cash transfer; and EV-native firms (BYD, Tesla, New Forces) retain 16-27% of their 2024 EV business under subsidy removal while traditional state-owned manufacturers retain only 11%. A Shapley decomposition into six channels -- Quality, Variety, Battery, Subsidy, Residual, and Market -- attributes the historical 2015-2024 rise primarily to product-quality gains (+45.49%), choice-set expansion (+14.81%), and battery-cost decline (+8.20%). The Subsidy block is negative (-13.63%) because direct purchase subsidies were phased down, not because subsidies reduce demand: a separate counterfactual that removes the 2024 subsidy entirely lowers EV share by 23-33%.
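For readers unfamiliar with Shapley decompositions, the following is a minimal sketch of the general technique, not the paper's implementation. The function `shapley_decomposition`, the value function `toy_value`, and every number in it are hypothetical and chosen only for illustration; only the channel names are borrowed from the abstract. Each channel's contribution is its marginal effect averaged over all orderings in which channels can be switched on.

```python
from itertools import permutations
from typing import Callable, Dict, FrozenSet, Sequence


def shapley_decomposition(
    channels: Sequence[str],
    value: Callable[[FrozenSet[str]], float],
) -> Dict[str, float]:
    """Average each channel's marginal contribution over all orderings."""
    totals = {c: 0.0 for c in channels}
    orderings = list(permutations(channels))
    for order in orderings:
        active: FrozenSet[str] = frozenset()
        for c in order:
            with_c = active | {c}
            totals[c] += value(with_c) - value(active)
            active = with_c
    return {c: t / len(orderings) for c, t in totals.items()}


def toy_value(active: FrozenSet[str]) -> float:
    """Hypothetical outcome (e.g. change in EV share) when a set of channels is on."""
    v = 0.0
    if "Quality" in active:
        v += 4.0
    if "Battery" in active:
        v += 1.0
    if "Quality" in active and "Battery" in active:
        v += 2.0  # made-up interaction term, split equally by the Shapley rule
    if "Subsidy" in active:
        v -= 1.0
    return v


channels = ["Quality", "Battery", "Subsidy"]
shares = shapley_decomposition(channels, toy_value)
print(shares)
```

By construction the contributions sum to the total change between the all-off and all-on scenarios, which is why a decomposition of this kind can report one signed share per channel.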
Large Language Models (LLMs) are increasingly used as stand-ins in behavioural games. These stand-ins rely on the assumption that the LLM's distribution of choices meaningfully matches how humans play the same game. This study tests that assumption through two games. The first is a p-beauty contest, and the second one is a public goods game. The study first investigates five local-model settings within the same model family. These settings are varied together in a 360-cell factorial, which balances temperature, scale (0.5-32B), quantisation, instruct vs base, and framing. Each cell's distribution is then compared against whole choice distributions in published human data. Each deployment setting, except for quantisation, governs a different aspect of fidelity. Mechanically, while the dispersion of human players can be somewhat recovered through deployment settings, the strategic process behind it cannot. Through the lens of the level-k cognitive theory, we find that LLMs act as static, category-retrieved level-k players, where k is set by the model scale. The models also do not run within-game belief-updating or backward induction throughout multiple-round horizon settings. While human contributions decayed in the public goods game, LLMs stayed flat or rose at every scale. When the horizon test was administered, LLMs were more cooperative under an indefinite horizon compared to a finite one. However, LLMs ignore their relative round position, so no last-round defection was displayed. This implies that LLMs retrieved levels relative to the horizon category rather than working out iteratively from the specific game setting.
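As a reference point for the level-k reading above, here is a minimal sketch of textbook level-k guesses in a p-beauty contest. It assumes p = 2/3, a 0-100 guess range with a level-0 mean of 50, and ignores a player's own effect on the average; none of these parameters is taken from the study, and `level_k_guess` is a hypothetical helper name.

```python
def level_k_guess(k: int, p: float = 2 / 3, level0_mean: float = 50.0) -> float:
    """Guess of a level-k player who best-responds to level-(k-1) players.

    Level 0 is assumed to guess level0_mean on average; each higher level
    multiplies the previous level's guess by p.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    return level0_mean * p ** k


# Guesses for levels 0 through 3 under the assumed parameters.
guesses = [level_k_guess(k) for k in range(4)]
print([round(g, 2) for g in guesses])
```

A static level-k player in this sense picks one k and keeps it, whereas a player who updates beliefs within the game would move toward lower guesses across rounds.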
The 2024 Department of Justice antitrust complaint against RealPage, Inc. named five major residential REITs for coordinating algorithmic rent pricing across hundreds of thousands of apartment units in major US metropolitan areas. This paper studies whether census-tract-level corporate landlord concentration (CLC), measured from SEC EDGAR 10-K property filings geocoded to census tracts, the first such application in the literature, is associated with rent growth 2019-2023, and whether that association is larger in majority-minority neighborhoods. Rent outcomes are measured using the Zillow Observed Rent Index (ZORI). To account for the possibility that corporate landlords preferentially locate in neighborhoods already seeing rent appreciation, all regressions control for a fully novel Algorithmic Housing Burden Index (AHBI), a composite of pre-existing rent burden and market tightness from ACS data. Across 665 census tracts in ten US metropolitan areas, doubling REIT concentration is associated with 2.8 percentage points higher rent growth (p = 0.086, p = 0.030, HC1 robust). This association is significantly stronger in majority-minority tracts. Within the same metro, high-CLC majority-minority tracts are associated with 5.9 percentage points higher rent growth than comparable white tracts (p = 0.039). An XGBoost model predicts 44 percent of out-of-sample rent growth variance, with SHAP analysis independently confirming that CLC's contribution is positive in minority tracts and negative in white tracts. Taken all together, these findings provide the first tract-level evidence consistent with corporate landlord concentration being associated with disproportionately higher rent growth in communities of color.
Product complexity estimates no longer vary with chosen geographical scale and track GDP per capita and employment more closely.
Several network-based measures have been proposed to assess the economic complexity of countries. These measures have provided important insights into national economic development, and they are now widely applied at the subnational level as well. Here, we show that such applications lead to inconsistent results, in the sense that the estimated complexity of the same product appears to depend on methodological details such as the geographical scale of analysis. Building on these findings, we propose a measure of territorial economic complexity based on an exogenous and extensive computation. We show that these methodological choices yield estimates that are more consistent and more strongly aligned with standard economic indicators, such as GDP per capita and employment.
We analyze usage data from OpenAI's Codex tool to present large-scale evidence of how agentic AI technology, which can take actions on a user's behalf, changes how people work. We use an automated, privacy-protecting pipeline to contrast usage across three populations: external personal-account users, external organizational-account users, and workers within OpenAI. We find that agentic AI usage is growing rapidly: the number of active users has grown more than fivefold in the first half of 2026, with the most rapid increase occurring outside the initial audience of software developers. Uptake is uneven: within OpenAI, Codex usage is nearly universal and has largely replaced business usage of ChatGPT. We document a similar shift to agentic tooling outside OpenAI, particularly within organizations, although external adoption remains lower and more uneven. In addition to headline usage figures, we observe measures of sophistication, and find that a growing number of users have used Codex to change their workflows substantially. More than 10% of users manage three or more concurrent Codex agents at some point each week, and 26.6% use skills, which allow users to share instructions for complex workflows. Alongside these changes in usage practices, request complexity has increased: since the start of the year, the share of individual Codex users who submit at least one request for a task estimated to require more than eight hours for an experienced human to complete has increased nearly tenfold. Concurrently, output has grown rapidly -- in June 2026, the median OpenAI employee in a legal role generated 13 times more monthly output tokens across Codex and ChatGPT than they did in November 2025, while the median researcher generated more than 50 times as many. We conclude by discussing the implications of these patterns for productivity, job reorganization, and workforce restructuring.
In recent years, technological developments and activities by private actors have led to a renewed discussion of the potential of nuclear fusion to meet growing global energy demands. So far, however, fusion technologies remain at comparatively low development levels and their deployment in commercial power plants is probably still decades away. Regardless, over the last decades, many cost studies have been conducted that estimate the future cost of potential fusion power plants. But to date, there is no systematic and harmonized assessment of these projections. Therefore, this study conducts a stochastic analysis of future fusion power plant costs for three distinct technology lines, magnetic confinement, inertial confinement, and magneto-inertial confinement fusion, including cost assessments of different technology maturity levels. These levels are further assessed to determine projected learning rates for future fusion costs. For mature technologies, mean LCOE are determined at 114.6, 110.3, and 143.9 USD per MWh for MCF, ICF, and MIF devices, respectively. This implies learning rates of more than 30%. We find that these projected values are rather optimistic when compared to other literature or comparable technologies like fission. We therefore urge policymakers to exercise caution when potential fusion developers refer to the potential economic competitiveness of fusion power plants.
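To make the learning-rate figure concrete, here is a minimal sketch assuming the conventional Wright's-law definition, in which cost falls by a fixed fraction (the learning rate) with each doubling of cumulative deployment. The abstract does not state its exact definition, so this is an assumption, and the function names and example inputs below are hypothetical.

```python
import math


def learning_exponent(learning_rate: float) -> float:
    """Exponent b in cost(x) = cost0 * (x / x0) ** (-b) for a given learning rate."""
    if not 0.0 <= learning_rate < 1.0:
        raise ValueError("learning_rate must be in [0, 1)")
    return -math.log2(1.0 - learning_rate)


def projected_cost(initial_cost: float, doublings: float, learning_rate: float) -> float:
    """Cost after a number of doublings of cumulative deployment."""
    return initial_cost * (1.0 - learning_rate) ** doublings


def implied_learning_rate(initial_cost: float, final_cost: float, doublings: float) -> float:
    """Learning rate implied by an initial cost, a final cost, and the doublings between them."""
    if initial_cost <= 0 or final_cost <= 0 or doublings <= 0:
        raise ValueError("costs and doublings must be positive")
    return 1.0 - (final_cost / initial_cost) ** (1.0 / doublings)


# Illustrative inputs only: a 30% learning rate applied over three doublings.
b = learning_exponent(0.30)
cost_after_three = projected_cost(100.0, 3, 0.30)
print(round(b, 4), round(cost_after_three, 2))
```

Under this definition, a 30% learning rate cuts cost to roughly a third of its starting value after three doublings, which helps show why rates of that size are considered optimistic.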
Share of China-produced science behind Chinese patents rose from 1% in 2000 to 26% in 2025, overtaking the U.S. share in 2021.
U.S. policy increasingly seeks to slow China's technological rise by restricting its access to American science, on the assumption that Chinese innovation depends on U.S. science. Linking the full corpus of Chinese invention patents to the global scientific literature, we show that this dependence has fallen in recent years: the share of the China-produced science behind Chinese patents rose from 1% in 2000 to 26% in 2025, overtaking the U.S. share in 2021. As China's reliance on U.S.-produced science fades, policies restricting access fall out of alignment with the U.S.' actual strategic position.
A central challenge in modern energy market design is the formulation of a strategy-proof imbalance settlement layer that secures both the economic efficiency of the institution and the stability of the power grid. Public data reveals that the day-ahead market is strategically biased below actual consumer demand. Such empirical observations are explained by active prosumers which provide implementable incentives for demand under-reporting. Active prosumers buy energy in the day-ahead market and sell energy in the real-time market for balancing real-time energy deviations. By under-reporting their demand for the day ahead they inflate real-time imbalances and, under uniform pricing, they dispatch their generation assets more profitably. We model the two-stage institution under linear preferences and benchmark it against its associated competitive equilibria. We show that although consumers' incentives for demand under-reporting vanish when the day-ahead market scales, prosumers' incentives remain lower bounded by a positive gain which depends only on the real-time market generation stack and their shares over it. To restore incentive compatibility under the existing informational constraints, we design a leave-one-out contrastive scoring rule-based penalty that is implemented by the day-ahead market operator, incentivizes prosumers to report their demand truthfully and ensures small charges when participating honestly. We illustrate these results with numerical simulations on synthetic data and evaluate our mechanism on real-market data by first rationalizing demand reports as subjective equilibria of the induced game. Our mechanism demonstrates strong incentive alignment while retaining a low cost for honest participation.
We construct an analytical general equilibrium model of an economy with carbon offsets, and show that increasing the carbon offset price has an ambiguous effect on aggregate emissions and welfare. Using two carbon accounting metrics, we demonstrate that offsets are over-credited under many parameterizations; however, offset under-crediting can also occur. Due to general equilibrium effects, neither carbon accounting metric is a sufficient statistic for welfare. Furthermore, we define four margins whereby offsets can respond to payments, including a margin not previously identified. Our results suggest that market spillover effects warrant consideration when evaluating carbon offset policies.
Existing approaches to e-scooter mobility hub planning lack city-type-specific causal evidence. Demand models are typically correlational, built on proprietary trip data, and do not distinguish how driver profiles vary across urban typologies. This paper presents a three-phase agentic AI framework that constructs a Causal Template Library from public GBFS data across 29 German cities, encoding which environmental features causally drive hotspot demand for each combination of city type (large, university, industrial, hilly) and cluster type (core, peripheral). A large language model (LLM) orchestrated causal discovery pipeline adapts algorithm selection to local data conditions across 57 city-cluster units. The library reveals systematic variation. Core demand is driven by activity access and transit proximity, while peripheral demand responds to built form, with city-type-specific patterns supporting transferable siting templates. A planning tool built on the library scores candidate sites, calibrates infrastructure recommendations to local demographics, and generates practitioner-ready reports. In Heilbronn, Germany, two hub sites informed by the framework's causal evidence are currently under construction, illustrating how the outputs can support real-world siting decisions.
Some orbital locations are crowded while others remain unoccupied. We explain why using the geostationary orbit as a near-ideal laboratory: a mature, one-dimensional orbit in which satellite operators compete for position under first-come first-served allocation rules. Using the complete ITU registry and a simple competitive entry model, we predict the observed distribution of active GEO satellites with $R^2 = 0.64$. In walk-forward tests, the structural model also predicts individual slot choices out of sample better than a fitted conditional-logit discrete-choice model. Our model also predicts the distribution of inactive payloads in GEO with $R^2 = 0.44$, showing that the geography of debris risk can be predicted when it is a function of satellite launches. Surprisingly, we find that the current satellite distribution in GEO is relatively fair: driven by population rather than income and placing satellites in economically efficient locations. However, our model shows that this is only the case for mature slots.
Interviews across Europe show limited state power under growth paradigm and stress need for civil society ties.
Current societies face interconnected environmental and social crises. Post-growth research argues that addressing these challenges requires a reorganization of society around the priorities of environmental sustainability, social equity, and human wellbeing over economic growth. While scholars highlight the state's potential role in enabling post-growth transformations through changes from within government institutions, post-growth-minded public officials face tensions between aspiring for radical changes of established structures while working within these structures. To understand how public officials across contexts navigate this tension, we ask: How do post-growth-minded public officials promote post-growth approaches in their work? How do the strategies differ between civil servants and elected officials? What do these strategies reveal about the capacity of the state to advance post-growth transformations? To answer these questions, we interviewed 41 post-growth-minded civil servants and elected officials. Interviews covered seven European countries and local to supranational governance scales. We find that public officials aim to influence thinking and discourses as well as decision-making processes and the implementation of policies. Overall, elected officials tend to feel that they can be more outspoken in their activities whereas civil servants are more inclined to promote post-growth approaches indirectly. Both groups pursue coalitions with various actors within and beyond their institutions. Public officials' strategies underscore the limited capacity of the state to advance post-growth approaches under the current growth paradigm. We suggest that cooperation with civil society actors is central to building a sense of collective agency and to fostering the interactions between symbiotic and interstitial strategies for post-growth transformations.
Energy poverty persists even among households that are not income-poor, suggesting a deeper mechanism than mere budget constraints. We develop a model in which indoor thermal comfort is produced through a non-convex technology that couples energy input with dwelling efficiency. A critical efficiency threshold emerges below which the minimum comfort level is physically unattainable, regardless of how much energy is purchased. Households below this threshold suffer from structural energy poverty, which income transfers alone cannot cure. The model yields three sharp policy predictions: energy price shocks are strongly regressive, efficiency investments dominate income transfers and price subsidies in reducing energy poverty, and a cost-effective anti-poverty strategy must combine targeted retrofits with temporary income support. The results are illustrated with symbolic diagrams and formal proofs.
Food bank use in the UK has soared in recent years. The combination of a global pandemic, over-stretched and underfunded public services, and a cost-of-living crisis has meant that millions of people cannot afford basic essentials such as food, heating, housing, and baby supplies. Food bank use is driven by a complex range of factors, including poverty, health emergencies, income shocks, delays to universal credit payments, housing issues, and homelessness. In this study we identify an urban-rural divide in spatial accessibility to food banks. In cities, food banks tend to be highly accessible by public transport to deprived populations but, on average, have shorter opening hours. In rural areas, however, despite generally longer opening hours, food banks are typically not highly accessible except for the most deprived residents. This matters. We find that spatial accessibility to a Trussell food bank centre is a key predictor of food parcel uptake, with a significantly stronger relationship than factors emphasised in the literature such as disability and Universal Credit. Importantly, this relationship is markedly stronger for rural populations, suggesting an unmet need in deprived rural areas far from food banks. Our work has important implications for food bank policy, suggesting a need for improved public transport in rural areas, and optimising current food bank locations and delivery models.
This paper develops regenerative bonds as formal debt instruments whose disclosed use-of-proceeds and governance rules allocate proceeds to locally governed settlement systems designed to strengthen settlement capacity across locally specified productive, ecological, care, mutual-aid, and repair commitments without converting those commitments into investor collateral. It separates bondholder claims from local redeemable commitments and models commitment pools that curate, value, limit, exchange, route, and repair those commitments. Sarafu Network, based in Kenya, provides component evidence on commitment circulation, stable-value interaction, liquidity, topology, and report-linked activity. A Monte Carlo engine calibrated to privacy-safe empirical moments asks whether bond liquidity can act as reusable catalytic funding while preserving issuer responsibility for debt service. Under the reported assumptions, the frontier identifies a modeled guardrail-pass region in which scheduled service is preserved, mutual-aid circulation is maintained or amplified, and bond issuer headroom remains available in lower-stress cells; edge diagnostics show that higher debt-service pressure and capital intensity narrow this region. The contribution is a settlement-architecture framework for evaluating when formal debt can strengthen local capacity to fulfill and repair commitments without becoming hidden household collateral.
A set of exposure scores calculated in 2023 has become a central empirical input to the future of work debate. Produced by Eloundou et al. (2023) and referred to here as the GPTs are GPTs scores, they define exposure as the share of occupational tasks a large language model can assist with. This work is a genuine methodological contribution, but as the scores travel from the time and place they were produced, the limitations the authors named do not always travel with them. Two gaps have widened as a result. The first is structural, between what static exposure scores measure and what policy questions actually require. Taking the diffusion of these scores as a case study, we show how their temporal, geographic, and ontological limitations compound in policy-facing analyses, and we survey five families of research responding to these limits: dynamic and benchmark-based measures, ensemble methods, task-framework extensions, worker-centered metrics, and adoption and usage data. The second gap is the one we argue needs more attention: the coordination between researchers and policymakers. Policy-relevant work that asks who is harmed, who benefits, how, and when continues to reference the static GPTs are GPTs scores without engaging with the methodological updates that would let these questions be answered more reliably. We then ask what additional steps towards navigating uncertainty remain: ex-post frameworks and the deliberate, political work of reimagining what futures are worthy of building towards. Closing the research-policy gap is a shared task: policymakers must widen their evidence base, engage workers as epistemic partners, and shift from prediction to preparedness; researchers must build data infrastructure, adopt participatory methods, and write with policymakers in mind. Better measurement matters, but it will not close the second gap alone.
We introduce \emph{Equilibrium World Models} (EWMs), a deep-learning method for globally solving dynamic stochastic models that feature rare disasters, binding constraints, and counterfactual states. Standard unsupervised neural-network-based solvers impose equilibrium conditions only on states generated by their own simulated policy. Their solutions can therefore be self-confirming: accurate on the simulated path, but untested off it, sensitive to initialization, and costly when expectations must be recomputed at each step. EWMs change the computational representation, not the economics. They enforce the model's exact equilibrium conditions on a broader, model-generated distribution of ordinary, rare, stressed, and counterfactual states. They carry the continuation with a learned surrogate, but certify the resulting policy strictly against the true equilibrium conditions. We provide an error decomposition, an off-path residual bound, and a convergence result linking self-confirming solutions to rational-expectations equilibria. We demonstrate EWMs through a sequence of test cases that isolate the main pathologies of classical deep-learning solvers and then scale them to richer economies. In a rare-disaster Brock--Mirman laboratory, coverage reduces disaster-region residuals by an order of magnitude. In a high-dimensional international real-business-cycle model, classical deep-learning solvers fail from all random starts, whereas EWMs converge from nearly all and evaluate continuations up to two orders of magnitude less often. When actions move transition measures, EWMs use action-conditioned continuations to recover the relevant policy margin. In a heterogeneous-agent economy with aggregate risk, EWMs compress the numerical representation of the wealth distribution by at least 25x while imposing exact full-distribution rational-expectations conditions.
Non-price interventions targeting specific household water uses are increasingly central to conservation policy, but whether end-use savings translate into lower aggregate demand remains unresolved. This paper reports evidence from a pre-registered field experiment in which 775 Finnish households were randomized to a shower timer, a water-saving shower head, or the same shower head with real-time feedback. Utility-grade water meters measure household-level effects, while shower-level data provide complementary end-use evidence for the two shower-head treatments. The shower timer has no detectable effect. In contrast, the water-saving shower head reduces daily household demand by about 5%, and pairing it with real-time feedback doubles this reduction to about 10%. The convergence between shower- and meter-based estimates shows that end-use savings largely pass through to aggregate demand rather than being offset elsewhere in the home. Cost-benefit analysis indicates that combining technological constraint with salient point-of-use feedback dominates reminder-based strategies.
Automation and artificial intelligence (AI) are reshaping labor demand unevenly across space, creating an urgent imperative for place-sensitive education and workforce policy. This study asks whether regional exposure to automation and to AI relates to local employment and wages in opposite ways, and whether those relationships differ between urban and rural regions -- two questions whose answers carry direct implications for how skills training and digital education should be targeted. Using a region-by-year panel and shift-share measures of technological exposure built from baseline industry and occupation composition, we estimate two-way fixed-effects and instrumental-variable models that interact exposure with an urban indicator. The framework distinguishes automation exposure, concentrated in routine work, from AI exposure, concentrated in cognitive work -- a distinction that maps directly onto the types of skills that education systems need to develop or preserve. Estimates show automation exposure lowering employment and wages, with the employment loss cushioned in cities, while AI exposure raises wages and concentrates in urban regions. Technology therefore reshapes, rather than simply widens, the divide. The findings argue for place-sensitive policy: weighting reallocation and reskilling support toward routine-exposed rural regions, while extending digital infrastructure and AI-complementary skills outward so that rural workers can share AI's wage gains rather than absorb only automation's losses.
Large language models are increasingly deployed as autonomous decision makers, yet the behavioral mapping they exhibit can vary substantially across decision environments that are payoff-equivalent by construction: environments that share identical payoff-relevant structure but differ in surface presentation. This sensitivity renders suite-based evaluation fragile and raises a fundamental question of behavioral portability: how informative is a behavioral mapping learned in one decision environment about another that preserves the same underlying incentive structure? We introduce a formal framework to measure this property. Our protocol fits an interpretable behavioral model on data pooled from a set of source environments and evaluates its out-of-sample predictive performance in a held-out target environment, benchmarking against an oracle trained directly on target data. Portability is quantified via a loss-agnostic measure that delivers worst-case bounds on the performance of the induced prediction-action mapping in the target environment. In controlled experiments spanning seven canonical economic decision problems, we document substantial and systematic portability losses, suggesting that behavioral characterizations of LLMs obtained in one decision environment cannot be assumed to transfer reliably to structurally equivalent alternatives.
India's post-liberalisation higher education expansion was premised on widening credential access for historically excluded groups. We show that the groups most expected to benefit - Scheduled Caste and Scheduled Tribe (SC/ST) workers - instead bore a disproportionate share of the resulting wage cost, a pattern we term the double whammy. We merge eight rounds of the NSS Employment-Unemployment Survey (1987-2011) with a district-level measure of college-expansion intensity built from the All India Survey on Higher Education (AISHE) and estimate reduced-form triple- and quadruple-difference wage specifications across 91 districts in six states (N = 79,904), interacting graduate status, expansion intensity, and post-expansion cohort. The human capital return to a degree remains large and positive throughout (about 1.08 log points), yet the graduate wage premium erodes for post-2004 cohorts in high-expansion districts: non-SC/ST graduates earn roughly 9 per cent less than comparable graduates in low-expansion districts at mean intensity, and SC/ST graduates face an additional penalty of about 34 per cent (a combined shortfall near 43 per cent). The SC/ST differential is statistically indistinguishable from zero before the expansion and emerges only afterwards. Non-graduate placebo and pre-trend tests are broadly consistent with a credential-signalling channel, though we flag the limits of the design rather than claim clean identification. The results suggest that expanding access without commensurate investment in institutional quality can deepen, rather than narrow, labour-market inequality for disadvantaged groups.
In 2019, North Dakota repealed its Sunday closing law, which had required most non-grocery stores to close between midnight and noon. Using this policy change and consumer GPS data, we study the impact of opening hours on shopping behavior and welfare. We compare visits before and after the repeal in North Dakota and neighboring states using difference-in-differences and event-study designs. The repeal caused a large increase in Sunday morning visits, originating partly from intertemporal, store-type, and cross-border substitution. The closing law's welfare loss is equivalent to increasing the travel distance to affected stores by about 1.4 miles per consumer.
Testing single, adversarial, and multi-agent methods on mechanism design shows external checks catch errors that polished text misses.
Empirical economists often start their projects with a toolbox. Shared packages, replication archives, and circulated guides shorten the time between an idea and a rough initial draft. Theorists, on the other hand, largely start from a blank page. By 2026, large language models can produce and check nontrivial mathematics. They can also hallucinate and write wrong claims very convincingly. The current bottleneck on machine-assisted theory is no longer production but trust: a model will claim to prove a false theorem as readily as a true one. Building on recent attempts in mathematics, I present three methods for doing economic theory with a language model. These methods differ in how the work is verified: a single disciplined pass, an adversarial prover-verifier pair (Claude Opus 4.8 proposing, OpenAI Codex refuting), and a structured multi-agent project with a reviewer gate (inspired by the Google co-mathematician architecture). I demonstrate these protocols on one open worked example: designing a Groves/Pigouvian incentive mechanism for the Gans-Kominers eigengrade model of grade inflation. None of the three runs produced the strict direct-revelation VCG/Clarke mechanism that was requested, perhaps because no such mechanism exists. Three phenomena recur. First, convergent discovery: two runs derive the same effective-resistance externality kernel on opposite margins. Second, adversarial verification is load-bearing: the pair caught three of its own false claims and the gate rejected a sub-goal. Third, polish is not rigor: the most finished-looking output was the least verified. The methodological takeaway is that external verification, not model capability, is the design variable.
Battery energy storage systems (BESS) are expected to play an important role in electricity markets with increasing shares of renewable generation. While existing research has primarily focused on price arbitrage and ancillary services, the role of grid fees in shaping BESS operation and profitability remains insufficiently understood. This article investigates how different levels of distribution fees affect the scheduling and economic viability of BESS in the day-ahead electricity market.
The analysis employs a mixed-integer linear programming model of BESS operation combined with electricity price data from the German market. Four system configurations are considered: stand-alone storage and BESS combined with consumption, generation, or both. The value of storage is measured as the difference between system profits with and without BESS. In addition, a rolling-horizon optimization framework is used to evaluate the impact of forecast uncertainty and decision horizon length on operational outcomes.
The results show that grid fees significantly influence both BESS profitability and operational strategies. For stand-alone storage, higher transmission charges reduce arbitrage revenues and battery utilization. When BESS is integrated with consumption and generation units, load shifting and self-consumption become the dominant sources of value, leading to a non-monotonic relationship between grid fees and storage profitability. These findings highlight the importance of considering tariff structures when evaluating storage investments and designing regulatory frameworks for electricity markets with increasing flexibility needs.
Weaker segmentation for Czechs, Hungarians, Italians and Slovaks than for Serbs and Croats emerges only with the recently reinvented indicat
In this paper, we study the patterns of ethnic endogamy in Croatia in relation to six ethnic groups between 1970 and 2015. We find that, over the 45-year period analyzed, the segmentation of the Croatian marriage market was weaker between Czechs and non-Czechs, Hungarians and non-Hungarians, Italians and non-Italians, and Slovaks and non-Slovaks than between Serbs and non-Serbs or Croats and non-Croats. This finding is substantiated by survey evidence revealing similar patterns on relative social distances between different ethnic groups in Croatia and Serbia. From a methodological perspective, we show that a plausible ranking of the degree of segmentation of the Croatian marriage market along ethnic lines can be obtained only when marital sorting is characterized by a carefully selected indicator. While a recently reinvented indicator captures sensible patterns of ethnic endogamy, the commonly applied odds-ratio fails to produce results consistent with survey evidence.
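For readers unfamiliar with the commonly applied odds-ratio mentioned above, here is a minimal sketch of the cross-product odds ratio for a two-by-two marriage table. The function name and the couple counts are hypothetical, and the recently reinvented indicator is not sketched because the abstract does not specify it.

```python
def odds_ratio(both_in, husband_only, wife_only, neither):
    """Cross-product odds ratio of a 2x2 marriage table for one group.

    both_in      : couples where both spouses belong to the group
    husband_only : couples where only the husband belongs to the group
    wife_only    : couples where only the wife belongs to the group
    neither      : couples where neither spouse belongs to the group
    """
    return (both_in * neither) / (husband_only * wife_only)


# Hypothetical counts of couples (illustrative only, not from the paper).
couples = {"both_in": 40, "husband_only": 10, "wife_only": 20, "neither": 930}

ratio = odds_ratio(**couples)
print(ratio)
```

A value above one indicates that in-group marriages are more frequent than independent matching would imply.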
AI generated video summary of the paper: https://idil.li/wp-content/uploads/2026/04/A_Tale_of_Two_Numbers2.mp4
Has generative AI changed how labor markets value human capital? We study this question using contract-level data from Upwork, a large online labor market. We represent worker profiles with high-dimensional text embeddings, allowing us to capture rich human capital information from unstructured profile text. We then compute the predictive importance of workers' human capital information and posted hourly rates for client demand, and incorporate these measures into a difference-in-differences design around the release of ChatGPT. We find that in more AI-exposed job categories, the importance of human capital declines and the importance of price rises, suggesting a commoditization effect of AI on labor. Two additional findings support commoditization as a mechanism: The demand premium enjoyed by workers with strong human capital declines in more AI-exposed categories, and demand reallocates toward lower-priced workers. Our results have implications for the design of online labor markets, workers' incentives to invest in human capital, and labor welfare.
Total part-time measures produce larger downward adjustments than involuntary part-time alone, with Japan averaging 2.7 percent.
This paper extends the sufficient-statistics formula for efficient unemployment developed by Michaillat and Saez (2021) to account for part-time employment. I introduce two additional sufficient statistics that measure the share of part-time employment and part-time hours relative to full-time hours. Applying the framework to the United States (1951-2026) and Japan (1970-2025), I compare the effects of total part-time employment and involuntary part-time employment on efficient unemployment. Total part-time employment has substantially larger effects than involuntary part-time employment. While involuntary part-time employment provides information about labor-market slack, the main change in efficient unemployment comes from part-time work itself because part-time workers supply fewer market hours than full-time workers. Under the total part-time calibration, efficient unemployment averages 4.7 percent in the United States before COVID and 4.2 percent after COVID. In the Japanese application, the full-sample average is 2.7 percent. The distinction is especially important in Japan, where part-time employment is widespread and often reflects flexible work arrangements. These findings suggest that aggregate labor input, rather than involuntary part-time employment alone, is an important determinant of labor-market efficiency.
We propose a model-grounded RAG-based AI economist with an agentic framework for economic scenario analysis using large language models (LLMs) and knowledge graphs. While LLMs can generate fluent economic narratives, economists are often required to make economic claims grounded by economic theory and real-world data. Based on this motivation, this study proposes an RAG-based AI economist, which utilizes knowledge graphs including economic data and theory and LLM-based agents to plan the analysis, retrieve relevant evidence, select appropriate models, and generate reports. In our framework, we do not produce quantitative claims directly with the language model alone; instead, we generate narratives grounded in explicit model-based computations and linked to the retrieved evidence via AI agents. We refer to our framework as an AI economist agent. We evaluate the AI economist agent in two applications: economist report generation for U.S. inflation persistence and Federal Reserve policy, and bank stress-test narrative generation for U.S. commercial real estate refinancing stress. The results illustrate how grounding the generated reports improves their economic coherence and traceability.
AI augmentation breaks the accounting link between labor time and productive contribution, yet firms continue to evaluate talent through time-based overhead bundles. This paper develops a forecasting framework for the transition from time-based talent accounting to output-based talent ROI in the human-AI era. The framework centres on Theorem 3 (ROI Inversion at τ*) as the empirical spine, with four mechanism theorems: overhead non-additivity, augmentation-saved-time pathways, innovation-premium amplification, and human-AI dyad attribution uncertainty. Korea's staged 52-hour workweek mandate provides an empirical early-warning case. In a DART panel of 365 listed firms (2,281 firm-year observations), the SG&A-to-revenue ratio rose from 18.26 percent in 2018 to 20.06 percent in 2020, corrected mildly in 2021-2022, and peaked at 20.10 percent in 2024. Under the revenue-percentile cohort proxy, two-way fixed effects (+1.56 pp, p = 0.049), pooled event-study estimates (+4.21 pp at t = +3, p = 0.001), and Callaway-Sant'Anna doubly-robust staggered DiD estimates (+4.51 pp at t = +4) converge on a positive overhead-pressure signature. A 2015-2017 backward extension (224 firms, 601 observations) supplies pre-treatment data, providing evidence against pre-existing upward-trend confounds. We read the Korean evidence not as a direct τ* estimate or a point causal magnitude, but as, to our knowledge, the first empirically documented signature of the pre-τ overhead-pressure regime, where time-based accounting still dominates while AI augmentation and labor-time compression jointly raise overhead. Output-based firms are forecast to outperform time-based peers by 1.5-2.0 percentage points in firm-level TFP growth by 2032. The contribution is a forecasting model and managerial planning tool for the shift to AI-augmented talent ROI accounting.
Why does massive AI investment fail to generate commensurate productivity gains? We argue the paradox is theoretically generated: prevailing production function frameworks encounter a structural boundary by treating AI as a separable factor of production without modeling the cognitive mediation through which AI generates productive value. This directs investment toward deployment when productivity requires prior development of what we term convergence capacity (C). We propose the Intellectually Converged Human (ICH) framework, a fifth-stage framework for production function theory: H-hat = H[1 + phi(A,C)], where effective productive capacity equals human capital (H) scaled by an augmentation factor [1 + phi], with phi jointly determined by AI utilization intensity (A) and convergence capacity (C), a four-dimensional cognitive construct encompassing embodied understanding, metacognition, temporal integration, and integrative thinking. The production function Y = F(K, H-hat) provides a human-centered mechanism for Solow's TFP residual: A_Solow = [1 + phi(A,C)]^(1-alpha).
The framework predicts three augmentation regimes with distinct policy implications. Descriptive cross-national analysis of 20 OECD economies shows the AIxC interaction is associated with 86% of TFP variance versus 31% for AI alone, a pattern-consistent finding in the small-n theoretical tradition. South Korea exemplifies national-scale under-augmentation: high H, substantial A, low C produce phi = 0. We distinguish convergence capacity from adjacent constructs, absorptive capacity, dynamic capability, and human capital, and demonstrate that C constitutes the specific cognitive mediator that prior frameworks have left implicit. We derive C-first policy prescriptions and offer three empirically testable propositions with a falsifiable 10-year forecast.
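The stated link between the augmentation factor and the Solow residual can be reproduced in one line if F is assumed to be Cobb-Douglas with capital share alpha. The abstract does not state the functional form, so the Cobb-Douglas choice below is an assumption that happens to be consistent with the reported expression for A_Solow.

```latex
% Assumption: F(K, \hat{H}) = K^{\alpha} \hat{H}^{1-\alpha} (Cobb-Douglas).
\begin{align*}
Y &= K^{\alpha}\,\hat{H}^{\,1-\alpha}
   = K^{\alpha}\,\bigl(H\,[1+\phi(A,C)]\bigr)^{1-\alpha} \\
  &= \underbrace{[1+\phi(A,C)]^{\,1-\alpha}}_{A_{\mathrm{Solow}}}\;K^{\alpha} H^{1-\alpha}.
\end{align*}
```

Under this assumption, phi = 0 gives A_Solow = 1, so measured TFP shows no augmentation effect regardless of how large H and A are.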
We estimate that data centers caused average retail electricity rates to fall modestly in the United States from 2015 to 2024 using an instrumental variables approach. Despite prevailing sentiment, the finding is consistent with economic reasoning: existing large power system fixed costs, economies of scale in transmission and distribution, and declining unit costs for generation imply that durable demand growth lowers average prices. We find patterns of economies of scale for transmission, distribution, and generation costs as well as within and across retail customer classes. We caution that future supply constraints could reverse the effect.
Grassroots platforms keep ownership with users through cryptographic signatures instead of corporate servers.
Legal precedents protect computer code as copyrightable expression. They have enabled centralized digital platforms -- operating from corporate servers that hold all user data -- to construct private governance regimes through the interaction of copyright, contract, and technical architecture: people who create virtually all platform value must surrender effective copyright control through Terms of Service agreements as a condition of participation.
In contrast, grassroots platforms consist of cryptographically-identified people operating their networked smartphones independently of any server or global resource; each person holds their own data on their own device, with no third party in possession or intermediation. Here, we define the notion of a digital speech act -- a deliberate volitional act by a person of cryptographically signing personal content with the person's private key, carried out on the person's own device -- through which the person simultaneously establishes attribution, accountability, and authorship over the signed content. We contend that (i) digital speech acts qualify for copyright protection under existing U.S. precedent: Burrow-Giles locates authorship in volitional creative choices despite mechanical or algorithmic processes, Feist supplies the minimal-creativity threshold, and persistent device storage satisfies the Copyright Act's fixation requirement; (ii) the digital social contract underlying grassroots platforms preserves this copyright by design -- signed content cannot be unbundled from its signature, and the full provenance chain accumulates as content is forwarded -- so that copyright ownership and physical possession of authenticated digital expressions coalesce in the person; and (iii) this coalescence of legal ownership and physical possession provides the foundations for digital sovereignty and democratic self-governance.
This paper presents a reproducible synthetic benchmark comparing a computational planner, an agent-based market, and a hybrid meta-market within a common simulated economy. The benchmark incorporates input-output production networks, heterogeneous firms, capacity constraints, endogenous prices, welfare metrics, structural shocks, adversarial stress testing, and information-reporting experiments. Across training, holdout, and adversarial scenarios, the planner consistently achieves lower welfare losses than the decentralized alternatives.
The main contribution is methodological rather than ideological. While the benchmark demonstrates a falsifiable framework for comparing economic coordination mechanisms, it does not establish the empirical superiority of planning. Several design choices mechanically favor the planner, including informational asymmetries, incomplete market representation, and simplified institutional assumptions. The results should therefore be interpreted as validation of a synthetic experimental architecture and as a prototype for future research. The paper concludes by outlining a validation agenda based on empirical calibration, structural holdouts, sensitivity analysis, uncertainty quantification, mechanism-design tests, and independent replication.
Electricity markets are inherently complex systems characterised by strong nonlinearities, high-dimensional interactions, and increasing interdependence across regions. While deep neural networks (DNNs) have demonstrated strong predictive capabilities for electricity prices, their lack of interpretability limits their usefulness for understanding the underlying drivers of price formation. This paper addresses this gap by combining DNN models with explainable artificial intelligence (XAI) techniques to analyse the determinants of electricity prices across 39 European bidding zones. We employ SHAP (SHapley Additive exPlanations) to quantify feature contributions and apply and extend SSHAP, an aggregation framework to improve interpretability in high-dimensional settings. The analysis identifies that renewable energy sources, particularly solar, play a disproportionately important role in price formation despite their lower share in total power generation. Gas prices remain a dominant and consistent driver across electricity markets, while interconnections significantly shape price dynamics, highlighting the strong interdependence of European electricity systems. In addition, a synthetic EU-wide electricity market is constructed to explore the counterfactual scenario of a fully integrated market with a single price.
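To make the idea of SHAP feature contributions concrete, the sketch below computes exact Shapley values for a toy linear price model by enumerating all feature orderings. This is a swapped-in illustration rather than the paper's method: practical SHAP implementations approximate these values for deep networks, and the feature names, coefficients, and input values here are all hypothetical.

```python
from itertools import permutations


def shapley_values(predict, x, baseline):
    """Exact Shapley values by averaging marginal contributions over all orderings.

    Features not yet 'switched on' are held at their baseline value.
    Cost grows factorially, so this is only usable for a handful of features.
    """
    names = list(x)
    totals = {name: 0.0 for name in names}
    orders = list(permutations(names))
    for order in orders:
        current = dict(baseline)
        previous = predict(current)
        for name in order:
            current[name] = x[name]          # switch this feature to its actual value
            updated = predict(current)
            totals[name] += updated - previous
            previous = updated
    return {name: totals[name] / len(orders) for name in names}


def toy_price(features):
    """Hypothetical linear price model (coefficients are made up)."""
    return (20.0 + 0.5 * features["gas"]
            - 1.5 * features["solar"]
            - 0.5 * features["wind"])


x = {"gas": 40.0, "solar": 8.0, "wind": 4.0}          # observation to explain
baseline = {"gas": 30.0, "solar": 2.0, "wind": 6.0}   # reference point

contributions = shapley_values(toy_price, x, baseline)
print(contributions)
```

The contributions sum to the difference between the prediction at `x` and at `baseline`, which is the additivity property that makes such attributions convenient to aggregate across features or regions.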
This paper studies the macroeconomic dynamics of climate policy in a multi-sector dynamic general equilibrium model with renewable and non-renewable energy, sector-specific capital adjustment frictions, household energy demand, and endogenous fossil resource dynamics. The central mechanism is that decarbonization requires reallocating energy use and installed capital: fossil energy demand can contract immediately, while renewable capacity and abatement adjust only gradually. The analysis delivers four results. First, gradual policy implementation sharply reduces transition costs: relative to immediate implementation, gradual emissions caps improve welfare by 2.26 percentage points under comprehensive regulation and by 5.06 percentage points under firm-only regulation. Second, renewable energy subsidies and non-renewable energy taxes support renewable capital accumulation and reduce, but do not eliminate, the welfare cost of front-loaded tightening. Third, sectoral coverage changes the welfare ranking across implementation speeds. Firm-only regulation performs better under gradual implementation because it shields utility-relevant household energy services, but becomes nearly as costly as the carbon-price-only transition under immediate implementation. Fourth, endogenous fossil exploration and stock-dependent extraction costs transmit climate policy into lower extraction, fewer discoveries, and a declining shadow value of reserves, providing a structural mechanism for stranded fossil assets. The results show that deep decarbonization can be achieved at substantially lower macroeconomic cost when policy manages the speed and incidence of energy-capital reallocation.
Florida stadium data shows faster driving after disappointing home-team defeats in tight games, but not after wins or NBA contests.
Using average vehicle speed data in 10-minute increments at the Traffic Message Channel (TMC) location level, along with precise crash timing and location information, we analyze driving behavior around five Florida stadiums before and after NFL and NBA regular season games from 2015 to 2019. We find no evidence of emotional driving following NBA games, but strong and consistent effects following NFL games, concentrated in predicted-close games that end in disappointing home-team losses -- combining high pre-game suspense with negative outcome valence. These games are associated with significant increases in average vehicle speed within 3 km of stadiums during the first post-game hour, dissipating with increasing time and distance from the stadium. Average vehicle speed increases by up to 3 mph relative to predicted-close games that ended in a win -- an effect several times larger than the typical game day versus non-game day speed differential. Overall, our results highlight how the combination of sustained suspense and negative outcome valence in close sporting contests can spill over into risky post-game driving behavior, underscoring the behavioral and public safety implications of affective cues in large-scale sporting events.
Spanish firm data shows many listed intermediaries are manufacturer extensions or integrated firms, not independent traders.
Previous studies conclude that intermediaries account for a large share of exports. Using Spanish firm-level data, we show that many firms classified as intermediaries are either manufacturer-owned export arms that ship their parent firms' products or vertically integrated firms that control design, production, and distribution and predominantly export goods sold under their own brands. Once we exclude these export arms and vertically integrated firms, the share of intermediaries in exports in our sample falls by about 70%. We also show that pure intermediaries differ markedly from export arms and vertically integrated firms along key firm and export dimensions.
Manufacturing skill and commercial skill together determine whether firms export directly, indirectly, as intermediaries, or both.
Some firms export their own products directly, others rely on intermediary firms to export on their behalf, and still others both export their own products and intermediate exports for other producers. To explain this heterogeneity, we develop a model in which firms differ along two dimensions: manufacturing capability and commercial capability. Manufacturing capability lowers the marginal cost of producing a variety, whereas commercial capability lowers the variable cost of reaching foreign customers. Different combinations of these capabilities generate the different types of firms observed in export markets: direct exporters, indirect exporters, pure intermediaries, and hybrid firms. The model predicts that commercially capable intermediaries are matched with more manufacturing-capable producers, and that more commercially capable intermediaries export a broader set of varieties. We provide suggestive evidence for these predictions using Spanish firm-level export data.
Survey data across 63 countries links chronic seismic threat to stronger in-group orientation, conditional on religious alignment with the s
This paper studies how long-run earthquake risk shapes national identity, separating a distributive margin (national membership as a rule for allocating scarce resources) from an expressive margin (pride, willingness to fight, and affective attachment). Linking World Values Survey respondents (1981-2022; 63 countries, 494 subnational regions) to subnational seismic-risk geography, I find that people living closer to high-risk zones express stronger national in-group orientation: more pride, more willingness to fight, and more priority for nationals when jobs are scarce. Family attachment and out-group hostility do not rise, while religiosity increases in parallel. The expressive margin is conditional: the pride response is pronounced where state-religion alignment and a cohesive religious field lend the symbolic infrastructure to cast disaster as a shared national ordeal, and indistinguishable from zero where they do not. A complementary design exploiting earthquakes between adjacent survey waves finds no average short-run response, yet the response it does detect concentrates among older, place-attached residents who cannot leave -- consistent with attitudes tracking a chronic, inescapable risk rather than single events. Together, the results point to a demand-side origin of national attachment: where a covariate shock would overwhelm local and family insurance, people turn to larger communities of protection and meaning -- the nation and religion -- a logic I formalize in a simple social-interaction model.
Large language models (LLMs) are increasingly deployed as autonomous agents that make consumption decisions on behalf of users. This shift raises fundamental questions for consumer theory, which has traditionally modeled humans as the primary decision-makers. In this paper, we introduce LLM Consumer Behavior Theory, a new field of study concerned with analyzing consumer behavior in agentic markets. Drawing on classical and behavioral economics alongside recent advances in Natural Language Processing, we formalize how human preferences are reflected and acted upon by LLM-based agents, and how agent-level decisions aggregate into market demand. We unify previously fragmented literature on LLM decision-making, human behavior simulation, and preference elicitation under a common economic lens, highlighting where assumptions, such as rationality and heterogeneity, may fail in agentic markets. Rather than providing empirical validation, this paper outlines the scope of LLM consumer behavior and identifies open research questions related to alignment, preference representation, and market dynamics.
According to the recent Wealth Thermalization Hypothesis (WTH), wealth inequality in the world is described by the Rayleigh-Jeans (RJ) thermal distribution of interacting agents in a society with social stratification. In this picture, the wealth layers of society are associated with the energy levels of a nonlinear dynamical system that conserves two integrals of motion: total energy and probability norm. This leads to RJ condensation and the formation of a huge poverty phase of low wealth and a tiny oligarchic phase that captures the main part of society's total wealth. This RJ phenomenon has similarities with self-cleaning in multimode optical fibers and constraint-driven condensation in various physical systems. We analyze real Lorenz and Pareto curves for the wealth of households in individual countries and the world, the Gross Domestic Product of countries, the market capitalization of companies on the stock exchanges of Hong Kong, Shanghai, and London, bitcoin transactions, and world trade between countries, and show that the WTH gives a good description of these curves. On the basis of this comparison, we argue that the RJ thermal distribution provides a universal description of wealth inequality in the world.
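For reference, the Rayleigh-Jeans distribution in its standard form for a system with energy levels E_m is written below. The symbols rho_m, T, mu, and E are notation introduced here rather than taken from the abstract, and the normalization shown is one common convention.

```latex
% \rho_m : occupation probability of level m
% T, \mu : effective temperature and chemical potential, fixed by the two constraints
\[
\rho_m = \frac{T}{E_m - \mu},
\qquad
\sum_m \rho_m = 1,
\qquad
\sum_m E_m\,\rho_m = E .
\]
```

The two constraints correspond to the two conserved integrals of motion, the probability norm and the total energy.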
Despite near-universal electrification in many countries, electricity supply shortages continue to shape household energy use. This paper examines how households adapt to chronic grid failure in high-electrification, high-dependence contexts, using Lebanon as a case study. Drawing on original survey data from 1,000 households, we analyze both supply-side coping mechanisms such as diesel generators and solar photovoltaic (PV)-battery systems, and demand-side adaptations, including load shifting and demand suppression. The results reveal a landscape of household responses, where socioeconomic status plays a central role in determining access to backup solutions and the extent of met demand. While diesel generators remain widespread, a transition toward PV-battery systems is observed, especially among financially capable households. However, decentralized self-generation is associated with inefficiencies, including substantial levels of curtailed solar generation. On the demand side, households exhibit reductions in electricity use, leading to distinct consumption profiles depending on the type of backup system employed. These findings highlight the importance of distinguishing between met and unmet demand when assessing energy needs under unreliable supply. The paper contributes to the literature by providing a quantitative characterization of the interaction between self-generation and demand adaptation in a supply-constrained high-electrification context. It also offers empirical demand profiles that incorporate suppressed consumption, addressing a key gap in electricity system planning. From a policy perspective, the results underscore the need to account for unmet demand, address inequities in access to coping technologies, and reduce inefficiencies in decentralized systems.
New York City implemented the nation's first cordon-based congestion pricing program in January 2025, providing an opportunity to evaluate how system-wide urban mobility responds to large-scale pricing interventions. Because such policies generate spillovers across modes and locations, credible control groups are difficult to construct. We address this challenge using time series foundation models to generate probabilistic counterfactual demand forecasts with calibrated uncertainty. Applying this framework to bus, subway, and aggregate trip volume data, we find that post-policy bus and subway ridership increased significantly relative to expected no-policy demand, while overall travel demand decreased modestly. The effects are spatially heterogeneous: while reductions in overall travel demand are concentrated within the Congestion Relief Zone, transit gains extend beyond Manhattan's core. Socio-demographic analyses further reveal uneven adaptation across neighborhoods, highlighting spatial equity implications. Our framework provides a scalable approach for the uncertainty-aware evaluation of system-wide urban interventions when clean control groups are unavailable.
How should recommender systems be designed when recommendations shape access to scarce, short-lived opportunities? We study this question in a production setting: Timee, Japan's largest platform for spot work, where workers favorite job templates and receive notifications when firms post shifts from those templates. Maximizing predicted favoriting can generate misdirected concentration: recommendations accumulate on popular templates that create few viable job openings, while templates with unmet labor demand receive too little exposure. We design exposure-control mechanisms for favorite-list management, reallocating template exposure based on posting activity and unfilled capacity. The proposed recommender, thresholded eligibility control (TEC), is fully parallelizable and suitable for large-scale digital platforms. In simulations calibrated to Timee data, TEC raises the per-round job-finding rate from 57.6% to 70.0%. A prefecture-level randomized field experiment increases realized matches and exposure per active template, reduces the share of low-exposure templates, and improves impression-level favoriting and downstream matching.
Addressing open-texture yields necessary conditions based on the ontology of concrete objects.
Private Property is one of the central institutions of civilized society. We first consider its social, legal, and economic aspects. We then follow the Lockean tradition by focusing on a specific procedural definition: Homesteading is the acquisitive act of first using an object that is initially unowned. The ontology of concrete objects and the nature of their uses determine how objects may be acquired. In this article, we address the open-texture problem in the definition of property, then provide the necessary conditions for an object to be property in the Lockean Scheme.
Climate policy in global network industries is implemented across fragmented jurisdictions, yet firms respond through integrated operational networks. We develop a two-stage game-theoretic framework to analyze how firm-level responses interact with alternative governance structures. Regulators first choose emissions charges. Firms subsequently compete through pricing, service capacity and capital deployment decisions. The analytical results demonstrate that uniform global regulation maximizes welfare in symmetric markets. However, in sufficiently asymmetric markets, a uniform global charge is dominated by decentralized regimes. Multiple regulatory instruments better accommodate region-specific market externalities. We apply this framework to a calibrated case study of North American, Western European and transatlantic aviation markets. The numerical results establish that a globally coordinated regulator setting region-specific charges achieves the highest aggregate welfare. These aggregate gains nonetheless mask substantial distributional disparities across jurisdictions. Effective climate governance in network industries therefore requires more than determining an efficient emissions charge. Policy instruments ought to accommodate regional heterogeneity and transfer mechanisms will be necessary to ensure efficient, politically stable cooperation.
Players pick signal strengths to produce contrary receiver assessments, and the model maps exactly onto a square game with full equilibrium
A game of information concerns two players transmitting messages that are obscured by noise. A receiver digests the combination of the two information sources and makes an assessment rationally. The aim of the players is to generate opposing assessments for the receiver by choosing signal-to-noise ratios of their information. It is shown that this problem can be reduced to an elementary infinite game on the square, thus admitting a complete equilibrium solution. Three generalisations of the game are proposed.
Analysis of 6,047 Africa and Latin America contracts shows legibility, not public importance, drives which uncertainties become tradable.
Prediction markets are usually evaluated after their contracts exist, by asking how well prices forecast outcomes. We study the prior institutional margin of market formation, asking which uncertainties become tradable contracts at all. Using an audited dataset of 6,047 Africa-topic and Latin America-topic contracts listed on Polymarket and Kalshi, we construct a coded measure of settlement legibility, the degree to which an uncertainty can be worded, sourced, and credibly resolved by third parties, and validate it on 451 units under a frozen codebook, where independent double scoring reaches ordinal reliabilities of 0.92 and 0.96 on the primary dimensions and blind human benchmarks reach 0.97 and 0.92. Using this measure, we find that formation is selective in ways that public importance does not explain, with African inventory concentrated overwhelmingly in football while salient civic events produce little or no inventory, and Latin American inventory deeper but dominated by Venezuela, where attention to prospective United States military action sustains the largest civic cluster in the data. Legibility orders the inventory steeply, with sports and elections near the top of the scale and conflict at the bottom. In a formation test against an externally assembled frame of 131 civic events, legibility predicts listing in the expected direction but falls short of pre-specified acceptance criteria, while among listed contracts the relation between legibility and trading value is negative, as a model of selective listing implies and as we predicted before estimation. Prediction-market inventories therefore measure what platforms can settle as much as what traders believe, and reading them as maps of public interest conflates the two.
This volume develops a knowledge theory of capital for economies in which productive capacity increasingly resides in software, data, models, routines, expertise, platforms, organizations, commons, and public epistemic infrastructure. Beginning from Adam Smith's theory of labour, stock, specialization, and market extent, it asks what changes when knowledge becomes stock-like, mobile across forms, scalable, governable, recombinable, and imperfectly visible in accounting. The book introduces knowledge-bearing stock as the central object and analyses how it is generated, converted into governable form, deployed, improved through feedback, enclosed or shared, measured, impaired, and used as input to future production. It distinguishes embodied, disembodied, institutionalized, commons, and public knowledge forms and develops concepts such as first conversion, cognitive enclosure, feedback capture, dark capital, and expected knowledge loss. The argument is conditional and testable: modern wealth depends not only on capital accumulation, but on how productive knowledge is governed.
This paper analyses the distinction between educational and skill types of labour mismatch and their association with earnings. Drawing on cross-sectional data for 26 countries from the 1st Cycle of the OECD (2012) Survey of Adult Skills (PIAAC), I examine educational and skill mismatch using a comprehensive set of education- and skill-based indicators, explore heterogeneity across worker characteristics, and investigate the sources of conflicting country-level correlations with earnings through an error components model. The results show that country-level unobserved heterogeneity induces endogeneity bias, with both its direction and magnitude varying across mismatch measures. Once unobserved heterogeneity is controlled for, over-education and over-skilling are associated with wage penalties, whereas under-education and under-skilling are linked to wage premiums. These findings highlight both conceptual and empirical distinctions between educational and skill mismatch and demonstrate the importance of indicator choice in the analysis.
Who is exposed to generative AI in a developing-country labour market? We map three occupational AI-exposure indices to India's redesigned Periodic Labour Force Survey (2025) and document a steep caste gradient among 83,000 employed graduates: graduates from the Scheduled Castes and the Scheduled Tribes are 0.24 to 0.37 standard deviations less exposed than upper-caste graduates within the same district. Two channels drive the gap: one in four SC and one in three ST graduates work in farm or elementary occupations untouched by AI, and those in white-collar work are underrepresented in managerial, software, and finance occupations. Because exposure commands a wage premium of up to 20 per cent, generative AI stands to widen, not narrow, India's caste earnings gap.
In the fall of 2020, neural-network methods produced a large improvement in chess engines that became freely and widely available. By the end of 2021, the monthly draw rate in classical chess had risen by about four percentage points, but the distribution of player ratings, which are commonly read as measures of playing strength, had changed little. Ratings, however, are a relative measure, built from results against other rated players rather than from an absolute scale of play quality, so an improvement shared broadly across players need not change their ratings. Using 3.9 million rated classical games from March 2015 to November 2023, we document that the increased draw rate remains after conditioning on both players' ratings, holds within repeated same-color matchups, is not a continuation of a pre-existing trend, and persists through the end of the sample. A linear transformation that maps post-Covid ratings to higher pre-Covid equivalents, with a larger gap at lower ratings, accounts for more than 90 percent of the post-minus-pre shift in the fitted draw, White-win, and Black-win probabilities. Players' ratings and ranks, by contrast, show no additional rank reshuffling and no general widening of within-group dispersion relative to the pre-Covid benchmark. We interpret these findings as consistent with adoption across rating levels, with larger rating-equivalent gains for lower-rated players.
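The point that ratings are a relative measure can be illustrated with a minimal sketch. It assumes the standard Elo logistic expected-score formula, which the abstract does not name; the function `expected_score` and all rating values are hypothetical and chosen only for illustration. Under that assumption, the expected score depends only on the rating difference, so an equal improvement for both players leaves it unchanged.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score for player A against player B.

    Assumption: the logistic form with a 400-point scale, which is the
    textbook Elo formula and not taken from the abstract.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# Hypothetical ratings, chosen only for illustration.
before = expected_score(1500.0, 1600.0)

# Both players improve by the same hypothetical 200 rating-equivalent points.
after = expected_score(1500.0 + 200.0, 1600.0 + 200.0)

# The expected score depends only on the rating difference, so it is unchanged.
same_expected_score = abs(before - after) < 1e-12
print(same_expected_score)
```

This is why a shared gain in play quality can show up in outcome frequencies, such as the draw rate, while the rating distribution itself barely moves.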
Large language models (LLMs) are increasingly used for tasks once reserved for trained researchers, including hypothesis generation, specification choice, and drafting conclusions. We argue that the reliability of AI-assisted research depends not only on model capability, but also on how cognitive labour is structured between humans and machines. We study this problem through Human-in-the-Loop Economic Research (HLER), a decision architecture based on pre-commitment, decision sequencing, accountability, and attention allocation. In a pre-specified 2×4 factorial experiment with 280 complete research runs across four datasets, an unconstrained multi-agent baseline produced critical failures in 72% of runs. Using the same underlying model, the same agent decomposition, and identical prompts for the shared reasoning agents, HLER reduced the failure rate to 16% by imposing three architectural commitments: LLMs reason but do not execute data work, data and estimation are handled deterministically, and three human decision gates bind the workflow. Fisher's exact test rejects equality of failure rates at p<0.001. Reliability gains were largest on the least publicly represented dataset, a Qing-dynasty population register, consistent with a task-based production model with Fréchet-distributed output quality. An 80-run ablation suggests that deterministic computation and human gates contribute independently, with exploratory evidence of complementarity. We interpret HLER as a research harness rather than an autonomous AI scientist: it sharply reduces failures, makes residual weaknesses more visible, and prevents unreliable claims from being advanced as publication-ready outputs.
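For readers unfamiliar with Fisher's exact test, the comparison of two failure rates can be sketched as follows. The abstract does not report the number of runs per arm, so the counts below (100 runs per arm, with 72 and 16 failures) are hypothetical placeholders that only mimic the reported rates; they are not the paper's data. The function `fisher_exact_two_sided` is an illustrative pure-Python implementation, not the authors' code.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# HYPOTHETICAL counts: rows are arms, columns are (failures, successes).
baseline_fail, baseline_ok = 72, 28
gated_fail, gated_ok = 16, 84

p_value = fisher_exact_two_sided(baseline_fail, baseline_ok, gated_fail, gated_ok)
print(p_value < 0.001)
```

With a gap this large the exact arm sizes matter little for the qualitative conclusion, but the p-value itself should be recomputed from the paper's actual counts.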
Which regional exposure conclusions are identified when public data do not observe buyer-seller links across states? We study this question by treating the missing intermediate-input spatial kernel as an unknown coupling constrained by regional activity margins, support restrictions, and auxiliary shipment moments. For linear exposure statistics, the sharp identified set is computed by transportation linear programs. Applying the method to U.S. state-sector data, we find that shipment data are inconsistent with the spatial diffuseness implied by proportional regionalization in key goods sectors. However, they do not identify a unique regional production network or a precise ranking of state exposure to local shocks. Bilateral shipment restrictions tighten the bounds, but much of the remaining uncertainty comes from large service and mixed sectors that are weakly covered by goods-movement data. The results show which exposure conclusions are supported by public data and which are imposed by maintained regionalization assumptions.
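The idea of bounding a linear exposure statistic over all couplings consistent with observed margins can be sketched in the smallest case. With two selling and two buying regions, the transportation linear program has one free variable, so its optimum lies at one of two endpoints and no solver is needed. Everything here is an assumption for illustration: the function `coupling_bounds_2x2`, the margins, and the "own-region share" statistic are hypothetical, and the paper's programs also include support restrictions and shipment moments that this sketch omits.

```python
def coupling_bounds_2x2(row, col, weights):
    """Sharp bounds on sum_ij weights[i][j] * pi[i][j] over all nonnegative
    2x2 matrices pi whose row sums equal `row` and column sums equal `col`.
    """
    a1, a2 = row
    b1, b2 = col
    assert abs((a1 + a2) - (b1 + b2)) < 1e-12, "margins must have equal totals"

    # Feasible range of the top-left cell implied by nonnegativity.
    lo = max(0.0, a1 - b2)
    hi = min(a1, b1)

    def stat(p11):
        # The remaining cells are pinned down by the margins.
        pi = [[p11, a1 - p11], [b1 - p11, a2 - b1 + p11]]
        return sum(weights[i][j] * pi[i][j] for i in range(2) for j in range(2))

    # A linear objective on an interval is extremal at an endpoint.
    values = [stat(lo), stat(hi)]
    return min(values), max(values)


# HYPOTHETICAL margins: seller shares by region (rows), buyer shares (columns).
sellers = (0.6, 0.4)
buyers = (0.5, 0.5)
# Statistic: share of flows that stay within the same region.
own_region = [[1.0, 0.0], [0.0, 1.0]]

lower, upper = coupling_bounds_2x2(sellers, buyers, own_region)
# Proportional regionalization picks one point inside the identified set.
proportional = sellers[0] * buyers[0] + sellers[1] * buyers[1]
print(lower <= proportional <= upper)
```

In this toy case the margins alone leave the own-region share between roughly 0.1 and 0.9, which illustrates how a single imposed allocation rule can hide a wide identified set.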