q-bio — Pith

physics.bio-ph 2026-05-25 2 theorems

Lorentzian family is unique invariant under Riccati transport

by Hugues Berry (AISTROSIGHT), Leonardo Trujillo (AISTROSIGHT)

Geometric Origin of Exact Mean-Field Reductions: Möbius Symmetry and the Lorentzian Ansatz

Reformulating the dynamics on the circle reduces the problem to the uniqueness of the rotation-invariant measure, which projects to the Cauchy law and unifies exact mean-field reductions.

Low-dimensional descriptions of large systems of coupled oscillators and spiking neurons rely heavily on the Lorentzian Ansatz. We show that its privileged role is geometric rather than heuristic: for the transport induced by Riccati dynamics, the Cauchy-Lorentz family indeed emerges as the unique connected two-dimensional family of continuous probability densities that is invariant under the induced projective transport. The key step of the demonstration is to reformulate the dynamics on the circle, where the problem reduces to the uniqueness of the rotation-invariant probability measure. Under stereographic projection, this yields the standard Cauchy law and, under the full projective action, the Lorentzian family. This result gives a unified geometric foundation for the Ott-Antonsen [Chaos 18, 037113 (2008)] and Montbrió-Pazó-Roxin [Phys. Rev. X 5, 021028 (2015)] reductions, explains the failure of Gaussian closures, and identifies the structural condition underlying exact two-parameter reductions.
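The invariance described in this abstract can be illustrated with a standard fact about the Cauchy family: if X follows a Cauchy law with location x0 and scale gamma, then (aX + b)/(cX + d) is again Cauchy, with parameters read off from the image of z = x0 + i*gamma under the same map. The sketch below is an illustration of that textbook property only, not the authors' derivation; the function names and the numerical values are assumptions chosen for the example.

```python
import math
import random


def mobius_cauchy(x0, gamma, a, b, c, d):
    """Parameters of (aX + b)/(cX + d) when X ~ Cauchy(x0, gamma).

    The law is encoded as z = x0 + i*gamma and pushed through the same map.
    """
    if a * d - b * c == 0:
        raise ValueError("degenerate map: a*d - b*c must be nonzero")
    z = complex(x0, gamma)
    w = (a * z + b) / (c * z + d)
    # abs() covers orientation-reversing maps, which flip the sign of Im(w).
    return w.real, abs(w.imag)


def sample_cauchy(x0, gamma, n, rng):
    """Inverse-CDF sampling of a Cauchy(x0, gamma) variable."""
    return [x0 + gamma * math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]


def empirical_location_scale(samples):
    """Median and half the interquartile range (x0 and gamma for a Cauchy law)."""
    s = sorted(samples)
    n = len(s)
    return s[n // 2], 0.5 * (s[(3 * n) // 4] - s[n // 4])


# Illustrative values only (not taken from the paper).
rng = random.Random(0)
x0, gamma = 0.5, 2.0
a, b, c, d = 1.0, 2.0, 3.0, 4.0

xs = sample_cauchy(x0, gamma, 100_000, rng)
ys = [(a * x + b) / (c * x + d) for x in xs]

predicted = mobius_cauchy(x0, gamma, a, b, c, d)
observed = empirical_location_scale(ys)
print("predicted (x0, gamma):", predicted)
print("observed  (x0, gamma):", observed)
```

The empirical median and half-interquartile range of the transformed samples should agree with the predicted parameters up to sampling noise, which is the two-parameter closure that a Gaussian family does not share.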

math.PR 2026-04-30

Degree and distance contact rules slow epidemic growth

by Zylan Benjert, Júlia Komjáthy +3 more

Degree-dependent and distance-dependent contact rates interpolate between explosive, exponential and polynomial epidemic growth

Even mild dependencies shift spread from explosive to polynomial rates on networks with geometry.

It is a fundamental question in epidemiology to estimate, model and predict the growth rate of a pandemic. Analogously, analysing the diffusion of innovation, (fake) news, memes, and rumours is of key importance in the social sciences. The resulting epidemic growth curves can be classified according to their growth rates. These have been found to range from exponential to both faster super-exponential curves and slower subexponential or polynomial curves. Previous research has lacked a unified explanatory framework capable of accommodating super-exponential, (stretched) exponential, and polynomial growth patterns within the same contact network. In this paper we propose a simple agent-based network model that can capture all these phases. We provide such a framework by modelling how transmission rates depend on spatial distance and on individuals' numbers of contacts. By comparing the growth rate of spreading processes with or without degree-dependent and/or distance-dependent contact rates through data-driven and synthetic simulations on real and modelled networks with underlying geometry, we find evidence that even a 'sublinear presence' of these dependencies can significantly slow the growth rate on the same underlying network. We find that the growth rate is governed by a combination of three factors: geometry, the prevalence of weak ties, and superspreaders. We confirm our results with rigorous proofs in a theoretical model, using a spatial multiscale argument in long-range heterogeneous first passage percolation. Our results give a plausible explanation of why the consecutive waves of a single pandemic can differ in their growth even if their spreading mechanisms are similar.

q-bio.QM 2026-05-06

Compartment-stratified features classify IBD at AUROC 0.96 under donor-aware splits

by Jonathan Muhire

Donor-Aware scRNA-seq Benchmarks for IBD Classification

A benchmark on two cohorts shows CFN narrowly outperforming linear CLR in the colon, while compartment labels remove spurious dependency instability.

Donor-level disease classification from single-cell RNA sequencing (scRNA-seq) requires strict donor-aware cross-validation: naive pipelines that split cells randomly conflate training and test donors, inflating reported performance through pseudoreplication. We present a donor-aware benchmark evaluating three feature representations across two independent IBD cohorts: centered log-ratio (CLR) transformed cell-type composition, GatedStructuralCFN dependency embeddings, and scVI variational autoencoder latent embeddings. The cohorts are the SCP259 ulcerative colitis atlas (UC vs. Healthy, n=30 donors, 51 cell types) and the Kong 2023 Crohn's disease atlas (CD vs. Healthy, n=71 donors, 55-68 cell types across three intestinal regions). Compartment-stratified CLR composition achieves AUROC 0.956 +/- 0.061 on SCP259; GatedStructuralCFN on the same features achieves 0.978 +/- 0.050. In the Kong cohort, CFN achieves its best performance in the colon region (0.960 +/- 0.055 after feature filtering), exceeding linear CLR (0.900 +/- 0.100), while terminal ileum classification is dominated by linear models (CatBoost CLR 0.967 +/- 0.075 vs. CFN 0.811 +/- 0.164). Cross-dataset transfer (CD->UC, four shared cell types) achieves AUC 0.833 with XGBoost CLR; the reverse direction performs at chance. CFN edge stability analysis shows that compartment-wise composition eliminates spurious unit-sum-induced instability present in global composition (Jaccard 0.026 vs. top-20 recurrence 1.0). CFN shows a consistent numerical advantage over linear models in the colon region of CD (AUROC 0.960 vs. 0.900), though no inter-method comparison reached statistical significance at n<=34 donors per region. Compartment-aware feature construction is critical for both classification performance and structural interpretability. Code: https://github.com/Jonathan-321/sfn-scrna-study
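The donor-aware splitting that this abstract calls for can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not the benchmark's code: the function name `donor_aware_folds`, the round-robin fold assignment, and the toy donor labels are all hypothetical, and the sketch does not stratify folds by disease label.

```python
def donor_aware_folds(donor_ids, n_folds=3):
    """Yield (train_idx, test_idx) so that each donor's cells stay on one side.

    donor_ids: one donor label per cell, in cell order.
    Donors (not cells) are assigned to folds, which prevents the
    pseudoreplication caused by splitting cells at random.
    """
    donors = sorted(set(donor_ids))
    if n_folds > len(donors):
        raise ValueError("need at least one donor per fold")
    # Deterministic round-robin assignment; a real pipeline would likely shuffle.
    fold_of = {donor: i % n_folds for i, donor in enumerate(donors)}
    for k in range(n_folds):
        test = [i for i, d in enumerate(donor_ids) if fold_of[d] == k]
        train = [i for i, d in enumerate(donor_ids) if fold_of[d] != k]
        yield train, test


# Toy data: 10 cells from 4 hypothetical donors.
cells = ["d1", "d1", "d2", "d2", "d2", "d3", "d3", "d4", "d4", "d4"]
folds = list(donor_aware_folds(cells, n_folds=2))
for train, test in folds:
    train_donors = {cells[i] for i in train}
    test_donors = {cells[i] for i in test}
    print(sorted(train_donors), "|", sorted(test_donors))
```

A naive split would shuffle the ten cell indices directly, which lets cells from the same donor appear in both the training and the test set.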

q-bio.GN 2026-07-03

Pipeline annotates mechanisms for 19,293 human genes from papers

by Matteo Di Bernardo, Iain M. Cheeseman

Affinage: genome-scale mechanistic gene annotation from the published literature

Affinage pulls direct experimental evidence to fill gaps where UniProt entries are empty or minimal.

Understanding the mechanistic function of a gene is a critical starting point for biology. However, for much of the human proteome that knowledge is scattered across thousands of primary papers or remains poorly established, while the curated databases biologists rely on can lag years behind recent literature. Large language models can now read and synthesize that literature on demand, but doing so faithfully for many genes is an expensive, non-reproducible retrieval session that does not scale across users. Here, we present Affinage, an LLM pipeline that performs this retrieval and mechanistic reasoning once per gene, from the primary literature alone, and stores the result as a reusable, structured annotation. A biologist-designed reading pass extracts only direct experimental evidence, and a synthesis pass reasons over those findings alone. Applied across the genome, Affinage annotates 19,293 human protein-coding genes. This analysis provides mechanism for thousands of genes whose UniProt function is empty or a stub, beating the curated reference on 99.1% of head-to-head genes as scored by a cross-family LLM judge. Affinage also delineates the 10% of the proteome that remains mechanistically uncharacterized and will serve as a continuously-updated, literature-grounded census of gene function. All records are released openly at https://affinage.wi.mit.edu. More broadly, Affinage serves as an example of how domain experts can encode their expertise into scalable LLM pipelines to improve the publicly available data that guides biological hypotheses and experimentation.

q-bio.QM 2026-07-03

Graph-structured kernels improve omics classification

by Yue Zhang, Nandini Amit Gadhia +2 more

Structured Gaussian Processes for Uncertainty-Aware Classification of High-Dimensional, Small-Sampled Omics Data

Pathway propagation inside Gaussian process kernels captures both measurements and topology while reporting prediction confidence on small m

Classifying heterogeneous omics data remains a fundamental challenge in computational biology, particularly in high-dimensional, small-sample settings where nonlinear interactions dominate and class imbalance further complicates reliable prediction of minority phenotypes. While traditional kernel methods rely on feature abundance, they fail to leverage the known interaction landscapes of biological systems. In this work, we propose a structured Gaussian process classification framework that integrates graph-encoded biological pathways directly into the kernel construction. By propagating information along known interaction networks and combining this with abundance-derived features, the resulting classifier captures both quantitative measurements and topological context. We benchmark our proposed methodology on three publicly available gut and fecal microbiome datasets. To address severe class imbalance, we evaluate complementary strategies, including data-level resampling, threshold calibration, and confusion-matrix-based adjustments, and report minority-class performance alongside accuracy. The hybrid approach yields a performance gain over unstructured baselines and matches the performance of established benchmarks for similar datasets. Furthermore, the probabilistic nature of the framework naturally provides calibrated predictive uncertainty, enabling robust differentiation between confident predictions and ambiguous samples.

q-bio.OT 2026-07-03

Positive cues raise dog approaches to humans

by Srijaya Nandi, Dipanjan Roy +3 more

Operant Conditioning in Indian Free-Ranging Dogs: Effects of Positive and Threatening Cues on Sociability

Five-day experiments show learned sociability changes partly carry over to new people only after positive experiences.

Sociability toward humans is a key adaptive trait in free-ranging dogs, enabling them to access resources while navigating risks associated with human interactions. In this study, we investigated whether operant conditioning shapes sociability in Indian free-ranging dogs and whether learned responses generalize to unfamiliar individuals. We experimentally exposed 58 dog groups to either a positive or a threatening cue over five consecutive days and assessed their behaviour using approach proportion, approach latency, and demeanor across repeated interactions with a familiar experimenter, followed by a test with an unfamiliar individual. Using Bayesian generalized linear mixed models, we found that cue type and repeated exposure significantly influenced sociability. Dogs exposed to a positive cue showed increased approach behaviour and reduced approach latency over time, along with increased affiliative demeanor. In contrast, dogs exposed to threatening cues exhibited reduced approach behaviour, increased approach latency, and a shift toward neutral and less affiliative responses across days. Importantly, positive cues partially generalized across individuals, as dogs showed increased approach toward an unfamiliar experimenter, although this was accompanied by hesitation to approach. In contrast, threatening cues did not generalize in the same way; dogs did not reduce their approach toward unfamiliar individuals but displayed increased approach latency, indicating heightened caution. Our findings demonstrate that operant conditioning plays a crucial role in shaping dog-human interactions, with asymmetric generalization of positive and threatening experiences.

cs.CV 2026-07-03

MolSight adds bond adjacency to vision tokens for molecular images

by Wenda Wang, Yihan Tong +2 more

MolSight: A Graph-Aware Vision-Language Model for Unified Chemical Image Understanding

Two modules for topology and grounding let the model outperform prior VLMs and chemistry tools on visual tasks.

Using molecular large language models (LLMs) as a unified framework for understanding molecular structures and functions is emerging as a new trend in tasks such as molecular design and drug discovery. However, these models struggle to fully capture the visual representation of molecular structures, limiting their potential. While existing molecular vision-language models (VLMs) show promise, they still face challenges in structural alignment and lack the necessary topological modeling for accurate molecular understanding. To address this, we propose MolSight, a graph-aware vision-language model framework designed to enhance the understanding of molecular images by VLMs. MolSight integrates a Molecular Topology Module to inject chemical-bond adjacency information into vision tokens, and a Molecular Grounding Module to align visual features with chemical symbolic semantics. Our experiments demonstrate that MolSight significantly outperforms existing VLMs, molecular LLMs, and specialized tools across multiple chemical visual understanding tasks, achieving a new level of molecular image reasoning.

cs.CV 2026-07-03

3D models cut plant reconstruction time from 6.5 minutes to 1.6 seconds

by Hanyue Jia, Wei Zhou +4 more

The Turning Point of 3D Plant Phenotyping: 3D Foundation Models Enable Minute-to-Second Cross-Crop Reconstruction and Beyond

Foundation models enable fast cross-crop 3D phenotyping from sparse smartphone views while preserving accuracy.

3D plant phenotyping is notoriously complicated and low-throughput owing to extensive multi-view imaging, a fragile 3D reconstruction pipeline, and the additional cost of going from reconstructed geometry to phenotypic extraction. These limitations are further amplified in low-cost data acquisition, where smartphone videos or sparsely sampled multi-view images provide limited view overlap and self-occlusion. In this work, we show that the conventional 3D plant phenotyping pipeline could be streamlined and significantly accelerated with 3D Foundation Models (3DFMs), and particularly, present one of the first cross-crop 3D phenotyping frameworks powered by 3DFMs. The framework replaces COLMAP-style sparse initialization with 3DFM-based feed-forward geometric recovery, combines geometry-constrained 3D Gaussian Splatting for dense reconstruction, enables few-view reconstruction through iterative view synthesis and refinement, and converts reconstructed geometry into measurable organs through 2D-to-3D semantic transfer, metric scale recovery, and organ instance separation. We further construct a cross-crop dataset with smartphone-based image acquisition, diverse plant morphologies, and manual annotations for segmentation and phenotypic evaluation. Experiments across 26 plant sequences show that 3D Foundation Models reduce the average reconstruction time from 6.52 minutes to 1.58 seconds while maintaining high reconstruction quality and phenotyping accuracy. These results suggest a fresh technical route for high-throughput 3D plant phenotyping, from low-cost image acquisition to fast reconstruction, perception, scale recovery, and phenotypic measurement.

q-bio.QM 2026-07-03

Point source restores identifiability of spatial dynamics from snapshots

by Rujie Gu, Ray Zirui Zhang +1 more

Identifiability Limits of Physics-Informed Inference for Spatial Stochastic Dynamics from Static Snapshots

Distributed sources cannot be uniquely recovered from static patterns, but a transcription site allows physics-informed methods to infer the

Despite increasing scale and resolution, many biological measurements remain destructive, revealing only spatial information rather than the dynamics it encodes. By combining flexible representations with mechanistic constraints, physics-informed machine learning offers a promising route to inferring these dynamics from static snapshots. Motivated by subcellular imaging of gene expression, we ask when a static spatial pattern of molecules can identify spatially varying diffusivity, creation, destruction, and boundary exchange, and how different inference schemes perform on the task. A structural identifiability analysis shows that distributed sources are non-identifiable, whereas a point source such as a transcription site can restore identifiability. These limits are further shaped by seemingly innocuous modeling choices: the boundary conditions, the spatial regularity of the underlying dynamics, and even the stochastic calculus convention. We then adapt several physics-informed schemes, differing in how they represent the solution and enforce the governing equations, and demonstrate effective inference from a single snapshot. Physics-informed approaches can thus recover spatial heterogeneities of biological dynamics from static data, but their use should be accompanied and guided by careful identifiability analysis for meaningful interpretation of the results.

q-bio.CB 2026-07-03

Framework forecasts glycosylation under ammonia stress

by Yuming Zeng, Sarah W. Harcum +2 more

GlycoMAC: A Multiscale Metabolic-Glycosylation Framework for Predicting Glycosylation Across Conditions in Mammalian Cell Cultures

It connects single-cell metabolic states to population outcomes for better prediction of antibody quality attributes.

Antibody productivity and glycosylation quality in CHO cultures arise from a dynamically evolving metabolic environment, yet models often work in isolation or at a single scale. Here, we present a multiscale mechanistic framework linking molecular, cellular, and process levels to predict how inputs shape bioprocess trajectories. The framework is grounded on a single-cell kinetic model that couples metabolic and glycosylation networks governing yield and critical quality attributes (CQAs). A stochastic single-cell model describes environment-dependent transitions among growth, production, and decline, capturing population heterogeneity. We further introduce cumulative variation in the oxygen uptake rate, integrating total metabolic adjustment over time, as a compact biomarker for predicting metabolic shifts. Unlike population-averaged approaches, the model propagates cell-resolved metabolic states (including ammonia-regulated Golgi pH, nucleotide sugar availability, manganese cofactors, and synthesis rates) into glycan processing. The framework was evaluated using CHO-K1 fed-batch cultures producing VRC01 IgG1 under targeted ammonia stress, matched control conditions, and a pyramid-feeding strategy with tighter control. It accurately predicts trajectories of cell density, metabolites, productivity, and glycosylation, including increased G0F and reduced galactosylation under ammonia stress, and quantifies how metabolic heterogeneity drives variability in productivity and CQAs. This work provides a unified foundation for predictive biomanufacturing and advanced process control.

cs.SE 2026-07-02

Brain model signals show no link to YouTube replays

by Barada Sahu, Shivesh Pandey

A global predicted-fMRI drive signal from TRIBE does not predict YouTube replay heatmaps

Predicted fMRI engagement curve correlates near zero with re-watch heatmaps on 48 videos and does not beat simple baselines.

Deep multimodal brain-encoding models now predict fMRI responses to naturalistic video with high accuracy. Whether their predicted neural signals also forecast behavioral engagement is unknown. We run TRIBE, the winning model of the 2025 Algonauts brain-encoding challenge (Llama-3.2 + V-JEPA2 + Wav2Vec-BERT), on 48 YouTube videos and reduce its predicted cortical response to a per-second engagement curve, the global field power. Correlated against each video's "most replayed" heatmap, a passively-collected proxy for which moments viewers return to, the curve shows no evidence of predicting re-watch behavior. The pooled position-controlled partial correlation is +0.058 (95% CI [-0.04, 0.15]; one-sample t(47)=1.21, p=0.23), indistinguishable from zero and not significantly above simple loudness and motion baselines (loudness +0.04, paired p=0.74). The raw correlation is also near zero; the moderate values reported for music videos reflect a genre-specific intro/onset-replay artifact rather than content prediction, and do not generalize. The null holds across six cortical-network readouts and under an autocorrelation-preserving permutation test. We release the code, the video-ID manifest, and an acquisition method that works despite YouTube's SABR-only streaming.
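The "position-controlled partial correlation" reported in this abstract can be illustrated with the standard first-order partial correlation formula, which removes the linear effect of one control variable from both signals. The sketch below is an assumption-laden illustration rather than the authors' released code: it uses a single control variable, hypothetical function names, and made-up toy numbers.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z from both."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Toy example: both signals share a trend in z (for instance, position in
# the video) but are otherwise unrelated. All numbers are made up.
z = [-1.5, -0.5, 0.5, 1.5]
x = [-0.5, -1.5, -0.5, 2.5]   # z plus a component orthogonal to z
y = [-0.5, -3.5, 3.5, 0.5]    # z plus a different orthogonal component

raw = pearson(x, y)
controlled = partial_corr(x, y, z)
print("raw correlation:", raw)
print("partial correlation given z:", controlled)
```

In this toy case the raw correlation is positive only because both signals follow the shared trend; once that trend is controlled for, the partial correlation vanishes up to floating-point error.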

physics.chem-ph 2026-07-02

NNPs reach near-chemical accuracy on 545-atom enzyme clusters with under 1,000 points

by Weiliang Luo, Heather J. Kulik

Enerzyme: A Framework for Efficient Training of Reactive Neural Network Potentials for Enzyme Catalysis with Application to Methyltransferases

Methyltransferase reaction energetics and transition states match DFT after training on system-specific reactive data.

Quantum mechanical (QM) cluster models provide an effective framework for mechanistic studies of enzymatic reactions but remain computationally demanding. Neural network potentials (NNPs) offer a promising route to reduce this cost, but enzymes present challenges beyond small molecules, including large system sizes, implicit-solvent environments, substantial polarization, and charge transfer. Here, we present an integrated software framework for efficient NNP training for mechanistic studies of enzymes, demonstrated on QM cluster models of S-adenosyl-L-methionine-dependent methyltransferases (MTases). Our Enerzyme code introduces modular electrostatics-aware NNP architectures and combines automated QM-cluster construction with reactive dataset generation. The Enerzymette subpackage automates reaction pathway exploration at both NNP and DFT levels. We show that iterative flexible scans and nudged elastic band calculations impose stricter requirements on NNPs than conventional dataset metrics. Nevertheless, NNPs trained on fewer than 1,000 system-specific datapoints reproduce reaction energetics and transition-state structures for MTase clusters containing up to 545 atoms with near-chemical accuracy. Direct supervision of atomic charges and consistent dielectric screening substantially improve simulation stability and accuracy, while multitask-learned atomic charges capture charge transfer and polarization trends and provide chemically meaningful descriptors of reactivity. Finally, transferability across chemically diverse catechol O-methyltransferase substrates indicates that NNPs learn generalizable reactivity patterns as training data expand across multiple enzymes. Together, these results establish a foundation for accelerating enzyme mechanistic studies and guide future NNP development for biomolecular reactivity.

cs.LG 2026-07-02

New model raises CNS tumor classification accuracy from 82% to 86%

by Paulo R. Ferreira Jr., Lucas Coutinho Freitas +5 more

A Novel Machine Learning Approach for Central Nervous System Tumor Classification from DNA Methylation

Sparse projection and logistic regression improve on prior reference on an independent set of 1,104 clinical samples at both class and famil

DNA methylation profiling has become a powerful approach for central nervous system (CNS) tumor classification, yet important challenges remain regarding cross-cohort transferability, methodological correctness, and robust multiclass evaluation. In this work, we propose a novel and methodologically rigorous machine-learning approach for methylation-based CNS tumor classification that combines Sparse Random Projection for dimensionality reduction with multinomial logistic regression for classification. We evaluate the proposed approach in the same general experimental setting established by a widely used reference classifier. On the 2,801-sample reference cohort, our method achieves a mean accuracy of 96% under stratified 3-fold cross-validation. On the independent 1,104-sample clinical evaluation cohort, it reaches 86% accuracy at the 91-class level and 93% when predictions are evaluated at the methylation class family level. These results improve upon the corresponding state-of-the-art reference figures of 82% class-level concordance and 88% family-level concordance, yielding absolute gains of approximately 4 and 5 percentage points, respectively. This improvement is clinically relevant: in a diagnostic setting, a 5-point increase in correct tumor classification can directly affect cancer subtype assignment and, in turn, influence treatment selection and downstream clinical decision-making. Our results show that the proposed model, grounded in stronger methodological practice in machine learning, consistently outperforms the previous state of the art across evaluation settings and can materially improve the reliability of CNS tumor classification.

q-bio.PE 2026-07-02

Factor-two peak approximation holds under Erlang scaling in multistage SIR

by Denis Tverskoi, Andrew Gothard +1 more

Approximating Peak Prevalence in Multistage SIR Epidemics

The delay limit expresses prevalence and weighted stages as moving averages of incidence, showing when the approximation is accurate and how

Estimating peak prevalence is a central problem in epidemic modeling because it determines the period of greatest infectious burden and is closely linked to health-care demand. In multistage SIR models, however, peak prevalence is generally less tractable than in the classical model with exponentially distributed infectious periods. Motivated by the use of weighted infectious-stage aggregates as surrogates for prevalence, we investigate the relationship between the prevalence peak and the maximum of a weighted stage functional in deterministic SI$(k)$R epidemic models. We show that this relationship depends critically on how the stage-progression rate is scaled as the number of infectious stages increases. Under naive scaling, in which the progression rate remains fixed, the weighted peak is asymptotically equivalent to the prevalence peak and the commonly used factor-two approximation fails. Under Erlang scaling, which preserves the mean infectious period, the multistage model converges to a delay formulation in which prevalence and the weighted stage functional become unweighted and triangularly weighted moving averages of incidence. This limiting representation provides a theoretical basis for the factor-two approximation and identifies the regimes in which it is accurate. It also explains why this approximation deteriorates as epidemic waves become more sharply peaked. We derive analytical error bounds and develop curvature-based and parameter-based corrections that substantially improve accuracy. Numerical studies confirm these improvements across a broad range of epidemiological parameters. Overall, the results show when weighted-stage peaks can be used reliably as proxies for peak prevalence and how the resulting estimates can be refined when the standard approximation loses accuracy.
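The two scalings contrasted in this abstract differ in how the total infectious period behaves as the number of stages k grows. A short sketch, assuming each stage is an independent exponential waiting time (so the total period is a sum of k exponentials), makes the difference concrete. The function name and the parameter values are illustrative assumptions, not taken from the paper.

```python
def infectious_period_moments(k, gamma, scaling):
    """Mean and variance of the total infectious period with k exponential stages.

    scaling="naive":  every stage has rate gamma (fixed as k grows).
    scaling="erlang": every stage has rate k * gamma, preserving the mean.
    """
    if scaling == "naive":
        stage_rate = gamma
    elif scaling == "erlang":
        stage_rate = k * gamma
    else:
        raise ValueError("scaling must be 'naive' or 'erlang'")
    # A sum of k independent Exp(rate) variables has mean k/rate, variance k/rate^2.
    mean = k / stage_rate
    variance = k / stage_rate ** 2
    return mean, variance


gamma = 0.5  # illustrative value; the single-stage mean period is 1 / gamma = 2
for k in (1, 10, 100):
    print(k,
          "naive:", infectious_period_moments(k, gamma, "naive"),
          "erlang:", infectious_period_moments(k, gamma, "erlang"))
```

Under Erlang scaling the mean stays at 1/gamma while the variance shrinks like 1/k, so the infectious period approaches a fixed delay, consistent with the delay formulation the abstract describes. Under naive scaling the mean itself grows linearly in k.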

q-bio.PE 2026-07-02

Immune history stabilizes recurrent variant epidemics

by Ryuichi Kumata, Yuma Fujimoto +2 more

Immune history shapes recurrent epidemics of antigenically related variants

Recurrence map shows equal-sized waves are stable but size peaks at moderate transmission due to cross-immunity

Population immunity carried over from past epidemics of an antigenically variable pathogen influences the epidemic of new variants based on their antigenic similarity to the previous ones. We develop a recurrent SIR model where a population faces sequential, antigenically related variants. The model yields a recurrence map for the population susceptibility to successive variants under the assumption of status-based population immunity. The model reveals that stable, equal-sized recurrent epidemics occur across broad parameter ranges, but can be destabilized when transmission is strong and antigenic escape is limited, leading to cycles of period two or longer, or to even more complex epidemic dynamics. Epidemic size is maximized at an intermediate basic reproduction number: higher transmissibility boosts immediate infection but also enhances cross-immunity, reducing future susceptibility of the population. Our results clarify how immune history shapes recurrent epidemics and why success in one wave does not ensure larger future epidemics.

q-bio.BM 2026-07-02

Algebraic descriptors raise B-factor accuracy 34.5% over GNM

by Honghao Zhang, Hongsong Feng

Commutative Algebra Learning for Protein Flexibility Analysis

Localized commutative-algebra features at multiple scales enable both improved prediction on 364 proteins and blind cross-protein models.

Protein flexibility, commonly quantified by B-factors, is closely related to protein structure and function. However, accurate B-factor prediction remains challenging due to the multiscale nature of protein structures and the complexity of atomic interactions. In this work, we propose a commutative algebra-based learning framework, termed CAL, for protein B-factor prediction. Unlike many biomolecular prediction tasks that rely primarily on global structural representations, B-factor prediction requires an accurate characterization of the local geometric environments surrounding individual atoms. To address this challenge, CAL employs commutative algebra theory to construct localized algebraic descriptors at multiple spatial scales. On a benchmark dataset of 364 proteins, CAL improves prediction accuracy by 34.5% over the classical Gaussian network model (GNM). Extensive experiments demonstrate that CAL achieves robust and consistent performance across diverse datasets and is competitive with existing state-of-the-art methods. Furthermore, by integrating CAL with machine learning, we develop a blind prediction model capable of cross-protein B-factor prediction. Overall, CAL provides an effective, efficient, and mathematically principled framework for protein flexibility prediction and offers a powerful approach for analyzing and predicting localized structural properties in complex biomolecular systems.

q-bio.NC 2026-07-02

Toolkit unifies neuron selectivity with population manifold analysis

by Nikita Pospelov, Viktor Plusnin +8 more

DRIADA: A Python Toolkit for Cross-Scale Analysis of Single-Neuron Selectivity and Population Dynamics

DRIADA's shared model runs selectivity tests and dimensionality reduction on the same aligned recordings, recovering spatial structure in mo

Brain activity spans single-neuron, population, and network levels, and core questions in neural coding require moving between them. Yet current tools target a single paradigm and incompatible data formats, leaving cross-level questions hard to address. We present DRIADA, an open-source Python framework that unifies neural signals and time-aligned behavior in a shared data model, so selectivity testing, dimensionality reduction, and network analysis operate within a unified workflow. We evaluate it on synthetic data with known ground truth, hippocampal calcium imaging from 13 mice in an open field, and a simulated toroidal attractor network. In the hippocampal data, selectivity-based filtering restored a two-dimensional spatial embedding from a collapsed all-neuron embedding, while reverse analysis showed that ~57% of neurons informative about leading manifold dimensions were not selective to any of the 11 measured behavioral features. On the toroidal benchmark, four independent modules recovered the expected topology. DRIADA makes cross-scale analysis routine across calcium imaging, spike trains, and simulated networks.

q-bio.PE 2026-07-02

Birth-regulated species fixate more often despite neutral mean growth

by Yunbei Pan, Tom Chou

Effective population sizes for asymmetrically regulated birth-death processes

How regulation splits between birth suppression and death elevation biases stochastic outcomes even when deterministic rates match.

In multispecies birth-death processes, how population regulation -- through suppressed replication, elevated mortality, or both -- affects macroscopic stochastic dynamics has escaped detailed analysis. Here, we show that the distribution of regulation mechanisms can be invisible in deterministic or mean-field dynamics but play a significant role in the diffusive evolution of population frequencies. By introducing a tunable regulation partitioning parameter $\alpha_i$ and projecting a $d$-species birth-death process onto a $(d{-}1)$-dimensional Moran process, we find a regulation-mechanism-dependent diffusion tensor. For the simple two-species case, we derive exact fixation times and probabilities to show how different regulation mechanisms stochastically favor a more birth-regulated species, even under complete deterministic neutrality. Our model also allows us to define an $\alpha$-dependent effective population size $N_{\rm e}(\alpha)$ among neutral species, generalizing its classical interpretation. For near-neutral populations or populations that are heterogeneous in their regulation mechanism, we use perturbation theory to calculate the spectral gap, identifying it with a diversity loss timescale which can also be interpreted as setting an effective population size. Our results are particularly applicable to interacting subpopulations of T cells ("clones") which are near-neutral, are regulated through proliferation and apoptosis, and lose diversity with time.

q-bio.PE 2026-07-02

Pontryagin principle yields optimal control for heterogeneous SI epidemics

by Elisa Paparelli (SU)

Optimal control on a heterogeneous SI epidemic model

Minimizes final infection size under total drug-supply limit using the reduced macroscopic dynamics

This work addresses an optimal control problem for an SI epidemic model incorporating heterogeneities in resistance and viral load at the population level. Building upon the heterogeneous SI framework developed in [1], a minimization problem constrained to the macroscopic counterpart of the SI dynamics derived therein is proposed. Unlike traditional optimal control problems in homogeneous epidemic models, the present approach focuses on an optimal control problem that accounts for population heterogeneity, offering insights from a microscale perspective. The contribution aims to minimize the final size of the infection within a finite time horizon by developing a pharmaceutical strategy, under a supply constraint that translates into an integral equality constraint on the control function. By applying the Pontryagin Minimum Principle, a characterization of an optimal control is provided.

q-bio.PE 2026-07-02

Land cover beats single metrics as bird diversity predictor

by Dilusha Chandrasiri, Maneesha Herath +7 more

How Environment and Urbanization Shape Bird Diversity in Sri Lanka

Sri Lanka analysis of thinned grids shows ALAN favors generalists while cutting overall richness at multiple scales.

This study presents a comprehensive analysis of bird diversity across Sri Lanka by integrating spatial, temporal, and environmental data. Bird observation records were combined with environmental variables, including weather conditions, air pollution, the Normalized Difference Vegetation Index (NDVI), land cover, elevation, and Artificial Light At Night (ALAN), and rigorously preprocessed to ensure data quality. Spatial analyses were conducted on multiple grid scales (2 km, 5 km, 10 km) to evaluate patterns in species richness while minimizing sampling bias through spatial thinning. Temporal trends were assessed using effort-corrected metrics including rarefied richness and occupancy rates to account for variations in observation effort over time. Environmental drivers of bird diversity were examined using multivariate statistical models, including Poisson Generalized Linear Models (GLMs) and correlation analyses, to identify key associations between ecological factors and species richness. Additionally, community structure, dominance patterns, and beta diversity were analyzed to understand variations in species composition across regions and time. The study found that land-cover type is a stronger predictor of bird diversity than individual continuous variables such as NDVI or temperature alone. Urbanization, measured by ALAN, exhibits nuanced scale-dependent effects, supporting high abundances of a few generalist species while reducing overall richness. The findings provide actionable insights into the patterns and drivers of avian diversity in Sri Lanka, offering a scalable and reproducible framework for biodiversity research and conservation planning.
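
The Poisson GLM step mentioned in the abstract can be illustrated with a small self-contained fit. This is a sketch under stated assumptions, not the study's pipeline: the data are synthetic, the single predictor named `ndvi` and the coefficients 1.0 and 1.5 are invented for the example, and the fit uses a plain iteratively reweighted least squares loop rather than any particular statistics package.

```python
import numpy as np

def fit_poisson_glm(X, y, n_iter=50, tol=1e-10):
    """Fit a log-link Poisson GLM by iteratively reweighted least squares.

    Returns coefficients with the intercept first.
    """
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())  # start from the intercept-only fit
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu        # working response
        XtW = X.T * mu                 # Poisson working weights equal mu
        new_beta = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta

# Synthetic grid cells: richness depends on one hypothetical predictor.
rng = np.random.default_rng(0)
ndvi = rng.uniform(0.0, 1.0, size=500)
richness = rng.poisson(np.exp(1.0 + 1.5 * ndvi))
beta_hat = fit_poisson_glm(ndvi, richness)
```

With a categorical predictor such as land cover, the same function would be applied to one-hot encoded columns with one reference category dropped.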

cs.LG 2026-07-02

Active-GRPO raises average SRxSim to 0.1773 by updating references

by Xuefeng Liu, Mingxuan Cao +4 more

Active-GRPO: Adaptive Imitation and Self-Improving Reasoning for Molecular Optimization

The method switches from imitation to self-reinforcement per instance and replaces the reference with better policy candidates to exceed pri

Scientific reasoning is an increasingly important capability of large language models, yet improving the robustness and efficiency of training such reasoning remains a key open challenge. We study this problem in instruction-based molecular optimization, where answer-only supervised fine-tuning (SFT) collapses multi-step reasoning and reinforcement learning with verifiable rewards (RLVR) suffers from sparse feedback. Reference-guided Policy Optimization mitigates both by anchoring policy updates to dataset-provided references, but its effectiveness is tightly coupled to reference quality: weak or misaligned references impose a performance ceiling. To overcome this ceiling, we propose active reasoning, a paradigm in which the policy actively decides, on a per-instance basis, when to imitate a reference and when to reinforce its own discoveries, while continuously upgrading what it imitates. We instantiate this paradigm as Active Group Relative Policy Optimization (Active-GRPO), realized through two coupled mechanisms: active imitate-reinforce and active referencing. The former performs imitation learning when the reference still outperforms the policy's own candidates, and shifts to self-improvement via reinforcement learning once the policy has generated molecules that surpass the reference. The latter continuously upgrades the reference itself by replacing it with the best policy-generated candidate discovered so far, progressively raising the imitation target and ensuring that reference guidance remains informative, rather than restrictive, throughout training. Across TOMG-Bench MOLOPT, Active-GRPO improves average SRxSim from 0.0959 for GRPO and 0.1665 for RePO to 0.1773 under matched three-seed evaluation, with statistically significant gains on LogP, MR, and QED.
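
The two coupled mechanisms described above amount to a per-instance control rule, which can be sketched as follows. This is only an illustration of the decision logic as the abstract states it; the `Instance` class, `active_step`, and the toy scoring function are hypothetical names, and the actual imitation and reinforcement losses are not shown because the abstract does not specify them.

```python
from dataclasses import dataclass

@dataclass
class Instance:
    reference: str          # current imitation target for this prompt
    reference_score: float  # reward assigned to that reference

def active_step(instance, candidates, score_fn):
    """Pick the training mode for one instance and upgrade its reference.

    Returns "imitate" while the reference still beats every candidate,
    and "reinforce" once some candidate surpasses it. In the latter case
    the best candidate becomes the new reference.
    Assumes `candidates` is non-empty.
    """
    scored = [(score_fn(c), c) for c in candidates]
    best_score, best_candidate = max(scored, key=lambda pair: pair[0])
    if best_score > instance.reference_score:
        instance.reference = best_candidate
        instance.reference_score = best_score
        return "reinforce"
    return "imitate"

# Toy usage: string length stands in for a molecular reward.
inst = Instance(reference="CCO", reference_score=3.0)
mode_early = active_step(inst, ["C", "CC"], len)     # candidates are worse
mode_late = active_step(inst, ["CCCC", "CC"], len)   # one candidate is better
```

Because the reference is only ever replaced by a strictly better candidate, the imitation target is non-decreasing in score over training, which is the property the abstract attributes to active referencing.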

q-bio.NC 2026-07-02

LLM features form stable cognitive parcels conserved across models

by Zhongxiang Sun, Haolang Lu +12 more

NeuroCogMap Reveals Cognitive Organization of Large Language Models

The parcels mark distinct failure modes, connect to model outputs, and improve predictions of human brain responses during language tasks.

Understanding how complex cognitive functions are organized within artificial systems is central to interpreting large language models (LLMs) and relating them to biological cognition. Yet although LLMs exhibit broad cognitive-like behaviours, it remains unclear whether their internal representations form reproducible functional systems that explain behaviour, failure and links to human cognition. Here we present NeuroCogMap, a cognitive neuroscience-inspired framework that organizes internal features of LLMs into functional parcels and links them to interpretable functions, cognitive capabilities and a cognitive hierarchy. These parcels form a stable and semantically coherent organization that is partly conserved across models and functionally linked to model outputs. Within this organization, major LLM failures, including hallucination, bias, refusal failure and sycophancy, correspond to distinct disruptions in representational and behavioural-control systems, yielding internal signatures for mechanism-guided detection and targeted intervention. Beyond model behaviour, NeuroCogMap improves prediction of human cortical responses during naturalistic language comprehension, with the strongest correspondence in higher-order association cortex. At the cognitive level, its internal signatures expose latent strategies that guide refinements of classical models of human decision-making. Together, these findings establish NeuroCogMap as a system-level framework for mapping functional organization in artificial systems and for relating this organization to human cortical function and cognitive behaviour.

cs.CV 2026-07-02

One image yields pairwise fitness flows in expanding colonies

by Faruk Alpay, Baris Basaran

Radial Interaction Tomography: Recognizing Non-Transitive Evolutionary Games from One Range-Expansion Image

Boundary curves in log-polar view recover interaction histories and flag non-transitive games.

Colored sectors in a microbial range expansion encode more than lineage survival counts. We formulate a computer-vision inverse problem: from one endpoint image of an accretive multi-type expansion, recover the radius-indexed pairwise boundary-flow field and test whether the visual pattern is compatible with a transitive scalar fitness hierarchy. The observable is a geometric signal extracted from sector-boundary curves in log-polar coordinates. We prove endpoint observability and stability for frozen fronts, weighted transitive/cyclic decomposition, contact-complete circular design, physical-clock and mechanism non-identifiability, exact Gaussian cyclicity testing, and Bonferroni-valid interval scanning. The benchmark is deterministic: analytic endpoint images, blurred/noisy pixel round trips, scalar-null stress tests, public-image tracing, multi-resolution mechanistic endpoints, and a non-learning frozen-front simulator. The implementation recovers pairwise edge-flow histories from endpoint images, detects cyclic residuals in a mechanistic four-type expansion, and uses those residuals as forcing signals for a dimensionless active design-control layer covering reaction-diffusion control, phenotype-frontier optimization, protocol synthesis, Monte Carlo robustness, and a downstream population-state bridge.

q-bio.PE 2026-07-01

Senescence mortality matches multi-level selection patterns

by Ananda Shikhara Bhat, Hanna Kokko

Demographic senescence as multi-level selection in miniature

A two-level Moran process models both group competition and damage buildup, producing equivalent age-specific death rates through selective

Multi-level selection and senescence do not at first sight have much in common. Here, we demonstrate that the emergent mortality patterns generated by demographic senescence can be understood as the product of multi-level selection. We formulate a two-level Moran-type process and use its scaling limits to illustrate that a simple mathematical framework that models multi-level selection in group-structured populations also models damage accumulation patterns and resultant mortality curves in ageing organisms. To verbally make the connection, observe that defectors spread within a group consisting of cooperators and defectors; when groups compete against each other, defector-rich groups suffer, and between-group selection causes such groups to be systematically under-represented. Exactly analogously, senescing individuals accumulate damage to physiological sub-systems, and 'damage begets damage'; individuals who are more damaged are more likely to die, hence damage-rich individuals are systematically under-represented in later age classes. Thus, emergent senescence patterns in complex, integrated organisms are formally equivalent to the patterns generated by a within-generation multi-level selection process in which intra-organismal sub-systems play the role of particles, organisms play the role of collectives, and selective disappearance plays the role of group selection.

q-bio.BM 2026-07-01

Frustration patterns lift alternative conformation recovery over AF-Cluster

by Hanqun Cao, Zijun Gao +4 more

SF-Cluster: Frustration-Guided MSA Subsampling for Alternative Protein Conformation Recovery

Benchmark across 48 cases shows +15.5 point gain for allosteric proteins and transfer to other predictors.

Deep-learning structure predictors are sensitive to their multiple sequence alignment (MSA) input, making MSA subsampling a practical route to recovering alternative conformations. Existing approaches such as AF-Cluster operate in sequence space, providing limited control over which conformational basin is sampled. We introduce SF-Cluster, which subsamples MSAs using patterns of predicted local energetic frustration, a representation largely independent of sequence similarity. Across a benchmark of 48 cases spanning fold-switching, allosteric, oligomerization-coupled, and intrinsically disordered systems, and using an AF-Cluster-style dual-reference RMSD criterion, SF-Cluster improves target-state recovery of the alternative conformation over AF-Cluster across the two-state classes, with the largest improvement observed for allosteric systems (+15.5 percentage points). The selected MSAs transfer to an architecturally distinct predictor, indicating that the conformational signal resides in MSA composition. Mechanistically, matched-depth controls show that this recovery advantage is largely explained by the effective depth of the selected subsets, which frustration-pattern selection reliably reaches. At the same time, highly frustrated residues are enriched at sites supported by deep mutational scanning and NMR two-state exchange, and frustration covariation is enriched at state-switching contacts while remaining distinct from coevolutionary coupling. Together, these results identify frustration patterns as a transferable representation for conformational prediction and position MSA subsampling as a representation-guided reweighting problem.

q-bio.NC 2026-07-01

Closed equation derived for covariance spectra in discrete non-normal dynamics

by Jacob A. Zavatone-Veth

Stationary covariance spectra of discrete-time non-normal random recurrent dynamics

Free-probability method yields a scalar functional equation for stationary spectra under random Gaussian weights; continuous-time case produ

Principal component analysis is widely used to characterize structure in the dynamics of recurrent neural networks. For stationary noise-driven dynamics, the distribution of variance among the principal components is determined by the spectrum of the stationary covariance matrix. While the spectral properties of this matrix are well-understood for linear networks with normal synaptic weight matrices, our understanding of the stationary covariance spectrum for random non-normal dynamics remains incomplete. In this note, we use a free-probability approach to formally derive a closed functional equation for the moment generating function of the limiting stationary covariance spectrum of discrete-time dynamics with random non-normal Gaussian weights. This characterization allows us to analyze the behavior of tail eigenvalues in the critical regime. In contrast, applying the same approach to the analogous continuous-time dynamics leads to an infinite hierarchy of Schwinger-Dyson equations, rather than a closed scalar equation. We conclude with some comments regarding the relevance of these results to comparisons of models of non-normal dynamics to neural data.
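
The object studied here can be computed numerically for a finite network, which may help fix ideas. The sketch below assumes the simplest linear setting, $x_{t+1} = W x_t + \xi_t$ with unit-variance white noise and i.i.d. Gaussian weights of variance $g^2/N$; these modelling choices, the values of `n` and `g`, and the function name are assumptions for illustration and are not taken from the note. In that setting the stationary covariance solves the discrete Lyapunov equation $C = W C W^\top + I$.

```python
import numpy as np

def stationary_covariance(W, n_terms=2000, tol=1e-14):
    """Stationary covariance of x[t+1] = W x[t] + xi[t] with unit white noise.

    Sums the series C = sum_k W^k (W^k)^T, which converges when the
    spectral radius of W is below 1 and satisfies C = W C W^T + I.
    """
    n = W.shape[0]
    C = np.eye(n)
    term = np.eye(n)
    for _ in range(n_terms):
        term = W @ term @ W.T   # next term W^k (W^k)^T of the series
        C += term
        if np.linalg.norm(term) < tol:
            break
    return C

rng = np.random.default_rng(0)
n, g = 200, 0.5                                    # assumed size and gain
W = g * rng.standard_normal((n, n)) / np.sqrt(n)   # non-normal Gaussian weights
C = stationary_covariance(W)
eigenvalues = np.linalg.eigvalsh(C)                # the covariance spectrum
residual = np.max(np.abs(C - (W @ C @ W.T + np.eye(n))))
```

A histogram of `eigenvalues` over several draws of `W` gives an empirical spectrum against which an analytical prediction could be compared.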

q-bio.PE 2026-07-01

Size-dependent dispersal creates four invasion growth regimes

by Ulysse Marquis

Invasion with size-dependent dispersion range

Main colony shifts from linear to blow-up expansion as range scales with size, but symmetry breaks in explicit models

The coalescing colony model provides a minimal framework for biological invasions with long-range dispersion. In its standard formulation, the dispersion range is assumed independent of the size of the invading population. Here, we relax this assumption and consider size-dependent dispersal: a main colony of linear size $r$ emits secondary colonies at distance $r^\mu$, with $0 \leq \mu \leq 1$. We derive the generalized dynamical equations for this extended model and map out the growth phase diagram for the leading order contribution. Depending on $\mu$, the main colony exhibits distinct regimes: linear expansion, power-law growth, exponential growth, and finite-time blow-up. We confront these theoretical predictions with a spatially explicit physical model. While the coalescing colony approach correctly captures the scaling of the perimeter, it fails to predict the scaling of the volume. We trace this discrepancy to an effective breakdown of circular symmetry in the morphology of the main colony. Finally, we quantify the temporal evolution of the population fraction residing outside of the main colony. The coalescing colony model predicts its decay to $0$ as a power law when $\mu<1$, and a macroscopic amount of the population remains in the secondary colonies at $\mu=1$. Simulations of the physical model reveal a persistent satellite population not captured by the theory for $\mu>\mu^*\approx 0.7$. Broadly, our findings highlight how coupling dispersal range to population size fundamentally alters invasion dynamics, with implications for biological invasions, metastatic growth, and urban expansion.

q-bio.PE 2026-07-01

Full ITN use by one host can raise disease in another

by Shravani Shetgaonkar, Anupama Sharma

Nonlinear Feedbacks Between Host Behavior and Vector Adaptation in a Multi-Host Vector-Borne Disease Model

Vectors redirect bites when nets protect the primary host, increasing transmission to the secondary host in the model.

Insecticide-treated nets (ITN) are an effective and low-cost intervention for controlling vector-borne disease (VBD); however, their use depends on individual decisions based on perceived cost and risk of infection. This study investigates a nonlinear multi-host model for the transmission of VBD with endogenous strategic control. We assume that hosts' adoption of ITN emerges from payoff-based decision-making, creating a nonlinear coupling with disease prevalence. We model vector preference as a function of ITN coverage to probe the complex interplay among individual choices, disease prevalence, and its control in a multi-host setting. The qualitative behavior of the system is characterized by the thresholds $R_0$ and $R_c$, which determine the existence and local stability of the disease-free and endemic equilibria. The system exhibits rich dynamical behavior; hence, we provide a bifurcation analysis identifying the conditions for saddle-node and Hopf bifurcations. Our results demonstrate that the interaction between the perceived cost of ITN and the infection risk can induce critical transitions, including regime shifts from stable endemic states to sustained periodic oscillations. Furthermore, we identify a counterintuitive effect whereby complete ITN adoption by the primary host can increase the overall prevalence in the secondary host due to adaptive shifts of vector feeding behavior.

cs.GT 2026-07-01

Extrinsic dynamics cap cooperation

by Harry Foster, Vince Knight +1 more

The Cooperation Ceiling: Extrinsic Population Dynamics and the Intrinsic Escape

In the public goods game with varying contributions, outward payoff comparison hits a limit that inward evaluation can exceed.

Evolutionary game theory provides a framework by which to study the emergence of cooperation in a population of self-interested actors. In such a framework, players' decisions on whether or not to cooperate evolve according to decision rules called population dynamics. However, often games are studied under the assumption that all individuals play under the same conditions, and many common choices of update rule are not well suited for a heterogeneous population. In this paper, we categorise and compare four different population dynamics in such a population as "extrinsic", where players learn by looking outward at the payoffs of other players, and "intrinsic", where players look inwardly at their own attributes or potential payoffs. We show that extrinsic population dynamics admit a ceiling on the rate of cooperation which can be exceeded by intrinsic population dynamics, and demonstrate this using the public goods game with heterogeneous contributions.
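
To make the setting concrete, a public goods game with heterogeneous contributions can be written down in a few lines. The payoff form below (cooperators pay their own contribution, the pot is multiplied by a factor `r` and split equally among all players) is the standard textbook version and is assumed here; the paper's exact parametrisation may differ, and the function name and numbers are illustrative only.

```python
def public_goods_payoffs(contributions, cooperates, r):
    """Payoffs in a one-shot public goods game with unequal contributions.

    contributions: what each player would pay in if cooperating.
    cooperates: one boolean per player.
    r: multiplication factor applied to the common pot.
    """
    n = len(contributions)
    # Defectors pay nothing; cooperators pay their own contribution.
    paid = [c if s else 0.0 for c, s in zip(contributions, cooperates)]
    share = r * sum(paid) / n  # every player receives an equal share
    return [share - p for p in paid]

contributions = [1.0, 2.0, 3.0]
all_cooperate = public_goods_payoffs(contributions, [True, True, True], r=1.5)
one_defects = public_goods_payoffs(contributions, [True, True, False], r=1.5)
```

In this toy case the largest contributor raises their own payoff from 0.0 to 1.5 by defecting while lowering everyone else's, which is the tension that any update rule, extrinsic or intrinsic, has to resolve.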

q-bio.PE 2026-07-01

Transposable elements link development to ageing

by Alessandro Fontana

A conceptual model for the evo-devo role of transposable elements and its implications for the ageing phenomenon

Early epigenetic repression of their regulatory activity is released later, contributing to transcriptional randomness and decline.

The Evolvable Soma Theory of Ageing is a recently proposed model that frames development as a continuous process of change accompanying organisms throughout the lifespan. This process is driven by developmental genes which encode epigenetic changes on target cells, whereas ageing reflects the expression of late-acting modifications that are subject to ongoing evolutionary optimisation and function as somatic "experiments" to explore phenotypic novelty. In this work we examine the role of transposable elements in the model. Our proposal acknowledges that these elements facilitate the expansion and diversification of gene regulatory networks by providing transcription factor binding sites. To minimise disruption, their regulatory activity is tightly repressed by epigenetic mechanisms during early development; this repression may be progressively released by genetically driven, age-associated epigenetic changes in later life, thereby contributing to transcriptional pseudo-randomness and ageing-associated phenotypes. Within this framework, transposable elements are integrated into a unified view of evolution, development and ageing, providing a conceptual basis for their dual role in regulatory innovation and age-related decline.

cs.LG 2026-07-01

SAEs restore geometry in neuron image representations for RNA alignment

by Jisung Park, Seohyeon Kang +9 more

Resolving superposition in AI for interpretability and cross-modal alignment in patient-neuronal images

Purified representations allow adapting scRNA-seq methods to images and de novo reconstruction of pathology pathways like Calcium-AIS scaffo

Artificial intelligence is transforming our capability to solve biological challenges. In dimensionality bottleneck regimes exacerbated by high-dimensional biological data, neural networks force distinct concepts into shared lower dimensions, a phenomenon known as superposition. Although superposition is widely known to hinder interpretability, its role in corrupting the geometry of latent spaces remains critically overlooked. Here, we utilized sparse autoencoders (SAEs) trained on over 100,000 multiplexed images of patient-derived Parkinson's disease and healthy neurons to resolve superposition. This approach bypasses the mathematical non-uniqueness of feature attribution by shifting to interpretable latent representation analysis. We theoretically and empirically demonstrate that superposition contaminates representational metric spaces and that SAEs recover geometric fidelity. By treating these geometrically purified representations as single-cell state vectors, we adapted single-cell RNA sequencing (scRNA-seq) data analysis methodologies directly to the image domain. Finally, we introduce GW-map, utilizing Gromov-Wasserstein optimal transport to align these image representations with authentic scRNA-seq data de novo. This coupling reconstructs hierarchical neuronal pathology pathways such as the Calcium-AIS scaffold, without reference spatial transcriptomics, establishing a scalable foundation for spatial biology. Code is available at https://github.com/jijihihi/Bio_superposition

q-bio.PE 2026-07-01

Mutation induces effective mortality threshold for population persistence

by Phil. Pollett

Persistence, Thresholds, and Trait Composition in a Regulated Mutation-Selection Model

In two-trait regulated models this sets survival conditions, with initial composition mattering when inheritance dominates mutation.

We study a population model in which individuals carry one of two traits and evolve under mutation, selection, and density-dependent regulation. A deterministic large-population limit yields a nonlinear system coupling logistic growth with mutation-selection dynamics. We identify threshold conditions governing extinction, persistence, and long-term trait composition. In particular, mutation induces an effective mortality rate that determines whether the population can be sustained. When inheritance dominates mutation, a second threshold emerges: population establishment depends on initial trait composition as well as overall growth rates. Although extinction ultimately occurs, the system typically exhibits long-lived quasi-equilibrium behaviour. A diffusion approximation provides a tractable description of this, and reveals a transition in the sign of trait correlations. The model thus illustrates how mutation, selection, and resource limitation jointly shape both ecological persistence and evolutionary outcomes.

cs.LG 2026-07-01

Tabular in-context learners match on protein fitness with fixed reps

by Davy Guan, Lu Zhang +8 more

Can Tabular In-Context Learners Generalize to Biomolecular Property Prediction?

They stay competitive on ProteinGym using ESMC but performance on molecules hinges on descriptor choice.

Predicting biomolecular properties from limited labeled data is a central bottleneck in protein engineering and small-molecule design. As strong pretrained encoders now supply rich fixed-length representations, the difficulty has shifted from representation learning to building a data-efficient predictor for the few-shot regime. Tabular foundation models such as TabPFN3 and TabICL are unlikely candidates for this role: they are in-context learners pretrained on synthetic tables drawn from random causal graphs, a generative prior with no obvious correspondence to the processes that produce protein sequences or molecular graphs. That this tabular, causal inductive bias should transfer to biomolecular data at all is unintuitive, yet we find it does. Treating each method as a predictor-representation pair, we evaluate across two domains. Over a fixed ESMC representation, tabular in-context learning is consistently competitive for protein fitness regression on ProteinGym and a diverse esterase dataset. For small-molecule classification with ECFP/RDKit descriptors, no single pairing dominates across TDC ADMET, MoleculeNet, FS-Mol, and DrugOOD; representation choice becomes a primary determinant, as expected when the predictor's own prior is indifferent to molecular structure. We conclude that tabular foundation models are strong performers on biomolecular prediction tasks, but that their performance depends strongly on the sequence or molecular representation used.

q-bio.OT 2026-06-30

Soccer headers carry under 20% concussion risk

by Christopher Lewis, Anu Tripathi +6 more

Head Kinematics and Brain Tissue Deformation from Soccer Heading: A Review of Implications for Brain Injury Risk

Review of kinematics data shows higher motion in matches, corner kicks and older players but predicted mild brain injury odds remain low.

Purpose: Repeated heading of soccer balls has raised concerns of potential long-term neurological effects. Consequently, numerous studies have estimated head kinematics and brain deformation due to soccer headers across different cohorts and play scenarios to identify higher risk conditions. However, heterogeneity in study design, data collection, and analysis has produced inconsistent findings, and injury risk is infrequently reported. Therefore, a meta-analysis of the existing literature was conducted to identify knowledge gaps and inform future studies assessing injury risk in soccer. Methods: We synthesized data from studies reporting head kinematics or brain deformation from soccer headers on human subjects. The data from these studies were analyzed to obtain the risk of mild traumatic brain injury (mTBI) based on applicable injury metrics and risk curves. Results: The meta-analysis revealed specific trends, indicating that match scenarios, corner and goal-kicks, top and oblique impacts, and older age cohorts were associated with higher head kinematics, while sex-based trends were inconclusive. The choice of sensor system affected the estimated head kinematics, with headband sensors consistently measuring higher kinematics than mouthpiece sensors. The data showed large variability stemming from heterogeneous study designs, limiting the applicability of the observed trends. These factors also influenced injury risk predictions, with estimated concussion risks generally below 20%. Conclusion: This review reveals trends in mTBI risk from soccer heading across different cohorts and play scenarios. It also underscores the need for standardized reporting of kinematics and brain deformation to enable mTBI risk estimation and meaningful cross-study comparisons.

q-bio.BM 2026-06-30

TCR model shows generated structures yield weaker contact maps

by Jiarui Li, Zixiang Yin +5 more

Structure-Regularized Interpretable TCR-Epitope Prediction

TCR-SRIM matches top accuracy yet finds experimental structures produce more accurate and diverse interaction patterns than AlphaFold3 or TC

T cell receptor (TCR)-epitope binding prediction is essential for understanding adaptive immunity and developing immunotherapies. Existing sequence- and structure-based models often generalize poorly to unseen epitopes and provide limited interpretability. Furthermore, the impact of generated structures on model learning remains unclear. We present TCR-SRIM, a structure-regularized interpretable-by-design model that combines protein language model embeddings with interpretable contact prototypes to capture residue-level TCR-epitope interactions. TCR-SRIM achieves state-of-the-art predictive performance and improved interpretation quality on the TCR-XAI benchmark. Using its inherent interpretability, we further evaluate the effect of generated structures on model learning. While structures predicted by AlphaFold3, TCRModel2, and tFold-TCR yield competitive performance, they lead to less accurate interaction patterns and reduced binding-site diversity than experimentally resolved structures. Our results highlight limitations of current structure prediction models for TCR-epitope learning and demonstrate the value of interpretable-by-design models for studying generated biological structures.

cs.LG 2026-06-30

Locked prior lifts Andes virus source ranking from 13.8% to 37.9% top-1

by Md Ahsan Karim

A Transferable Learned Temporal Prior for Transmission Reconstruction and Decision-Relevant Uncertainty in Real Outbreak Labels

Model trained on eleven other disease families transfers without refitting and shows many real transmission labels are genomically unsupport

Outbreak transmission reconstruction treats epidemiological timing and transmission labels as deterministic ground truth; neither has been systematically evaluated. We trained a logistic regression temporal prior on eleven disease families, locked all parameters before accessing any target outbreak data, and applied it without refitting to a strict Andes virus (ANDV) parent-ranking benchmark of 29 tasks. The locked prior achieved mean reciprocal rank (MRR) 0.571 versus 0.274 and Top-1 accuracy 37.9% versus 13.8% against the best source-trained parametric baseline (permutation p <= 0.0002; 7-8 reversals to lose MRR significance). A phylogenetic concordance audit of 75 NYC mpox inter-host pairs - independent label-reliability evidence rather than a prior validation - found that 54.67% (exact 95% CI: 42.75-66.21%) were genomically unresolved or unsupported. Retaining uncertain edges in ANDV and Guangdong Delta graphs shifted top-5 source-priority sets (Jaccard 0.429-0.667). Transmission-label uncertainty was measurable in the outbreak evidence modules examined, and retaining uncertain links changed which source cases were prioritized for intervention.
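The ranking metrics quoted in this abstract can be illustrated with a minimal sketch. The function names and the example ranks below are hypothetical and are not taken from the paper. The arithmetic notes are only consistency checks: 37.9% and 13.8% match 11 and 4 correct top-ranked parents out of 29 tasks, and Jaccard values of 0.429 and 0.667 match two top-5 sets that share 3 and 4 members.

```python
def mean_reciprocal_rank(ranks):
    """ranks: 1-based rank of the true parent in each task's candidate list."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def top1_accuracy(ranks):
    """Fraction of tasks in which the true parent is ranked first."""
    return sum(1 for r in ranks if r == 1) / len(ranks)


def jaccard(a, b):
    """Jaccard index |A & B| / |A | B| of two collections treated as sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


# Hypothetical ranks for five tasks (illustration only, not the ANDV data).
example_ranks = [1, 2, 1, 4, 3]
example_mrr = mean_reciprocal_rank(example_ranks)
example_top1 = top1_accuracy(example_ranks)

# Consistency checks against the percentages quoted in the abstract.
top1_locked = round(100 * 11 / 29, 1)    # 11 of 29 tasks
top1_baseline = round(100 * 4 / 29, 1)   # 4 of 29 tasks

# Two top-5 sets sharing 3 members, then 4 members (hypothetical case ids).
jaccard_share3 = round(jaccard(range(5), range(2, 7)), 3)
jaccard_share4 = round(jaccard(range(5), range(1, 6)), 3)
```

Because both sets have five members, only a few Jaccard values are possible (k shared members gives k / (10 - k)), which is why the reported range endpoints fall on 3/7 and 4/6.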

q-bio.NC 2026-06-30

Adaptation tunes chaotic networks through four oscillatory regimes

by Bowen W. Zheng, Earl K. Miller +1 more

Mean-field theory of rich oscillatory dynamics in low-rank recurrent networks with activity-dependent adaptation

Mean-field theory links random and low-rank connections plus firing-rate adaptation to brain-like rhythms.

We develop a dynamical mean-field theory for random recurrent networks with low-rank structure and firing-rate-driven adaptation. When the random connectivity is strong enough to generate chaos, increasing adaptation strength drives the network through four regimes: a static coherent state, noise-sustained oscillations that progress from regular to irregular, stochastic switching between symmetric wells, and a global limit cycle. The theory identifies two instability mechanisms, chaos onset from the random connectivity and a Hopf bifurcation of the coherent mode, and shows how adaptation shapes both through the frequency-dependent single-neuron transfer function. A reduced three-dimensional model captures the bifurcation structure of the full network. Above the chaos threshold, coherent population-level oscillations coexist with heterogeneous firing rates and network-generated stochasticity at the single-neuron level. The interaction of adaptation with random and low-rank connectivity produces a rich oscillatory repertoire, including waxing-and-waning rhythmic episodes, persistent state switching, and slow Up-Down alternations, dynamics that have been observed during wakefulness, sleep, and anesthesia.

q-bio.NC 2026-06-30

Brain models personalize in seconds without sharing patient data

by Amirhossein Esmaeili, Marmaduke Woodman +9 more

Cohort-amortized personalization: navigating the privacy-utility frontier for virtual brain twins

Cohort simulations train a shared estimator that fits new subjects locally and matches full per-subject accuracy on epilepsy and aging cohor

Personalized generative brain models require individual neuroimaging data that privacy constraints and re-identification risk make difficult to share, while per-subject fitting procedures cost hours of compute -- limiting clinical translation and multi-site collaboration. We introduce cohort-amortized personalization (CAP), which replaces data sharing with model sharing: a neural density estimator is trained on simulations from a mechanistic whole-brain model under a low-rank cohort prior, and only the compact estimator is distributed, so new subjects are personalized in seconds on their own data alone. To make this prior both compact and atlas-independent, a cross-atlas autoencoder (CrossCoder) maps connectomes from 20 anatomical atlases into a shared latent space, enabling deployment across sites with heterogeneous atlases. We validate CAP on two cohorts: 21 patients with drug-resistant epilepsy (epileptogenic-zone localization F1=0.56) and 832 subjects from the 1000BRAINS aging cohort (predicted age r=0.44); in both, CAP matches or exceeds per-subject inference with hours-to-seconds speed-up. Because the shared artifact couples a cohort prior to a mechanistic simulator, it can serve as a mechanistic surrogate supporting in-silico experimentation and synthetic-cohort generation without raw-data access -- a governance-audited alternative we term synthetic access, allowing for wider adoption of personalized modeling in more diverse settings.

math.PR 2026-06-30

Phosphorylation model admits regime with two stable equilibria

by Lucie Laurence, Philippe Robert

Thermodynamic Limits of Stochastic Chemical Reaction Networks with Phosphorylation

With substrate fixed at N and enzymes scaling with N, specific catalytic constants yield three equilibria of which two are stable.

In this paper we investigate the stability properties of a fundamental mechanism of biological cells called phosphorylation. The system is a chemical reaction network (CRN) for which a chemical species, {\em the substrate}, can be sequentially transformed into two phosphorylated forms, by the activity of two types of enzymes, one type for phosphorylation, the other for dephosphorylation. We investigate a stochastic representation of this model, under the mass action kinetics. The total mass of the substrate is fixed at $N$, while the total mass of enzymes scales proportionally to $N$. The asymptotic behavior, when $N$ is large, of the concentrations of all chemical species is studied. We investigate the possible {\em stable} subsets of chemical species for the kinetics of the law of mass action. A stable subset is such that, with a convenient initial state, the number of copies of the species of this subset remains $O(1)$ on any finite time interval as $N$ gets large. The role of the twelve reaction rate constants, {\em the catalytic constants} of the CRN, is investigated from this point of view. An averaging principle of the corresponding Markov process is established for several regimes of the CRN. It is shown in particular that there exists a regime with three equilibrium points, with two of them stable. The proofs of the results rely on stochastic calculus with Poisson processes, convenient couplings of subsets of coordinates of the Markov process, technical results on $M/M/\infty$ queues, and a stability analysis of a dynamical system in $\mathbb{R}_+^4$.

q-bio.SC 2026-06-30

Clathrin coats develop stiffness and memory from growth conditions

by Johannes H. H. Dreckhoff, Ulrich S. Schwarz +2 more

Pathway variability, coat stiffening and mechanical adaptation during clathrin-mediated endocytosis

Simulations reveal how emergent properties create two gates that decide flat, stalled or closed fates and match experiments without fitting.

Clathrin assemblies in cells can persist as flat plaques, abort after partial invagination, or close into clathrin-coated vesicles, but the determinants of these different fates remain unresolved. To investigate the stochastic and complex dynamics of clathrin assemblies, we have developed a kinetic Monte Carlo simulation framework that couples individual clathrin agents to an adaptive continuum membrane. In this hybrid discrete-continuum description, the effective coat bending rigidity and the preferred coat curvature emerge during growth, rather than being prescribed as material parameters. Once connected, curved lattices stiffen from molecular bending modes to coat-level rigidities, because curvature changes require increased stretching or compression, while newly incorporated triskelia hardcode a history-dependent preferred curvature. An analytical theory for non-Euclidean elasticity identifies the relevant internal variables and predicts growth laws that are validated by the simulations. The same microscopic assembly rules yield flat, stalled, and closed coats through two sequential gates in the effective membrane-coat energy landscape. Comparisons with experimentally observed coat geometries and nanodissection-induced curvature changes agree with our theoretical predictions without any fitting parameters. The clathrin coat thus emerges as an adaptive assembly with prestress and memory, whose fate and material parameters reflect the environment in which it has been growing.

q-bio.GN 2026-06-30

Pretraining may not pay off for DNA transformers

by Romain Karpinsky, Julien Mozziconacci +1 more

DNA Language Models: An Assessment of Pre-Training for Fine-Tuning Tasks

Benchmarks compare transformers to convolutional models to quantify gains from pretraining and BPE on fine-tuning tasks

Recent breakthroughs in foundation models and Large Language Models (LLMs) have introduced new opportunities for studying and decoding genomic sequences. Several state-of-the-art approaches, such as DNABERT2, rely on transformer-based architectures, while others, such as ConvNova, still build upon more conventional convolutional models. However, systematic benchmark comparisons across these methods remain scarce. Given that transformer-based models require extensive and costly pretraining, it is crucial to evaluate whether their performance gains justify this overhead. Moreover, LLMs such as DNABERT2 typically rely on Byte Pair Encoding (BPE) tokenization, whose relevance for DNA sequence representation is still debated within the genomics community. In this work, we investigate three key questions: (i) do transformer-based models provide sufficient improvements on fine-tuning tasks upon heavy pretraining, (ii) what is the actual contribution of pretraining in this setting, and (iii) how does BPE tokenization impact performance on genomics-related tasks?

eess.IV 2026-06-30

Lightweight module turns H&E slides into molecular pathway predictors

by Dominik Winter, Dominik Vonficht +7 more

Data-Efficient Multimodal Alignment for Histopathology-based Molecular Prediction

Contrastive training on 1720 samples aligns frozen models for 25-fold better gene-set retrieval without new sequencing.

H&E-stained whole-slide images offer cohort-scale availability and rich spatial context but lack molecular specificity, whereas bulk RNA-seq provides transcriptome-wide resolution at high cost with limited archival availability. We show that training a lightweight alignment module atop frozen histopathology and RNA-Seq foundation models enables open-vocabulary molecular prompting -- querying H&E slides with gene-set signatures to predict pathway activity without sequencing or end-to-end retraining. Using contrastive learning on a multi-cancer cohort (N=1,720), we achieve a 25-fold improvement in retrieval over baseline methods. Systematic analysis reveals a graduated predictability spectrum: morphologically grounded programs (cell-cycle programs, immune-related) are most reliably predicted (R^2>0.5), while predicting pathways with no morphological footprint remains challenging as expected. We validate clinical utility on the POSEIDON clinical trial: H&E-predicted squamous cell carcinoma scores recapitulate NSCLC subtype identity and predicted IFN-gamma mirror PD-L1 tumor-cell expression groups. Furthermore, genesets describing immune activation and fibrosis predict known tumor microenvironment archetypes from histology alone. We further validate generalization of our approach across unseen cohorts and demonstrate data-efficient domain adaptation, establishing a slide-native framework for molecular analysis on H&E images.

cs.CL 2026-06-30

LLM reasoning graphs match within and between case clusters

by Nisarg A. Patel (University of California, San Francisco)

Clinical Reasoning Graphs: Structured Evaluation of LLM Diagnostic Reasoning Reveals Competence Without Consistency

Accuracy hits 60-70 percent on complex cases yet graph similarities show no difference for clinically similar versus dissimilar ones.

Modern large language models (LLMs) reach 60-70% diagnostic accuracy on complex clinical case benchmarks, but accuracy alone cannot distinguish stable clinically-grounded reasoning from pattern matching. We introduce clinical reasoning graphs, structured graph representations extracted from free-text LLM diagnostic traces using a domain-grounded ontology with 5 node types and 7 edge types. We apply this pipeline to 750 traces from five LLMs across 50 New England Journal of Medicine Clinicopathological Conference cases and three prompt conditions, and test whether diagnostic traces show stable structured reasoning patterns, or diagnostic schemas, for clinically similar cases. We operationalize this as higher graph similarity among clinically similar cases than among clinically dissimilar ones. Across 15 model-condition comparisons, within-cluster and between-cluster composite similarity are nearly equal, and no comparison survives multiple-testing correction; a component-level analysis finds any residual content signal far below schema scale. Graph similarity is also nearly identical for pairs of models that are both correct (0.488) and both incorrect (0.484), suggesting that graph structure captures a dimension not reflected in diagnostic accuracy. Structured reflection prompting increases explicit discriminating-feature analysis within traces (+33%) but does not increase cross-case consistency. These results show diagnostic competence without schema-scale reasoning consistency, and indicate that final-answer accuracy should be complemented by process-level evaluation. We release the ontology, extraction pipeline, validation protocol, and the extracted reasoning graphs and similarity artifacts as resources for structured evaluation of LLM clinical reasoning.

q-bio.NC 2026-06-30

Meditation raises brain signal-to-noise ratio

by Ruben Laukkonen

Clear Mind: Meditation and the Brain's Signal-to-Noise Ratio

A single construct unifies meditation effects by showing how practice sharpens relevant neural signals and reduces clutter, with possible us

Meditation is quintessentially associated with a clear mind. This paper proposes that diverse findings in the science of meditation can be mapped onto a single, empirically tractable construct: functional signal-to-noise ratio in the brain, or f-SNR. Signal denotes neural variance that tracks the goal-relevant causes of sensory input, while noise denotes residual activity, including irrelevant endogenous fluctuations. Mechanistically, meditation increases f-SNR through two primary operations: selectively enhancing signal and "decluttering" noise. Deepening practice is further proposed to increase f-SNR by reducing self-referential filtering and shifting global neural activity toward a critical regime, a thermodynamically efficient state that maximizes information transmission and dynamic range. This framework has a strong existing evidence base and is readily falsifiable using metrics such as neural variability quenching, mutual information, and multivariate decoding. The f-SNR account also offers a transdiagnostic explanation for the efficacy of meditation across a range of psychopathologies associated with low-SNR states. The theory also has implications for emerging technology: meditation may improve brain-computer interfaces, or BCIs, by making brain activity easier to read.

q-bio.QM 2026-06-30

Closed-loop cell-cycle head improves drug perturbation forecasts

by Dingping Zhao, Jie Lin

Modeling Cell-Cycle-Aware Single-Cell Drug Perturbation Responses

Deriving phase targets from the model's own treated-state predictions yields better expression and proliferative-state accuracy than standar

Single-cell drug perturbation models should predict not only transcriptional response magnitude, but also whether a treatment alters the proliferative state of a cell. This is challenging because cell-cycle variation is often treated as nuisance variation, and benchmark pipelines rarely treat drug-induced phase changes as a primary prediction target. We introduce scCycleMol, a cell-cycle-aware perturbation prediction framework built on a curated 24-hour SciPlex3 benchmark with standardized molecule identities, dose and cell-line metadata, and gene expression with cell-cycle supervision derived from treated states. Instead of using cell-cycle state as an input covariate, scCycleMol derives supervision from predicted treated expression and propagates it through a learnable full-expression cell-cycle head with circular G1/S/G2M phase targets. We evaluate marker-based supervision, molecular representations, and pretraining strategies to isolate sources of improvement. Across a SciPlex3 benchmark with over 600k cells, 186 perturbation conditions, multiple cancer cell lines, and thousands of genes, scCycleMol improves out-of-distribution expression prediction compared with conditional perturbation baselines. The best LINCS-pretrained circular model achieves 0.9093 expected all-gene r squared and 0.6843 expected differentially expressed gene r squared, compared with 0.6800 and 0.5400 for LINCS-pretrained ChemCPA. Closed-loop cell-cycle supervision improves phase accuracy by about 0.5 to 0.6 points while maintaining nearly unchanged expression prediction. A Tahoe-pretrained variant reaches 0.9609 phase accuracy, highlighting the benefit of explicit cell-cycle-aware supervision in perturbation modeling.

q-bio.NC 2026-06-29

Geometric stability of neural codes tracks behavior apart from drift

by Prashant C. Raju

Geometric Stability of Neural Population Codes: Regional Variation, Behavioral Relevance, and Circuit Dependence

Split-half distance consistency predicts trial-by-trial neural-behavioral coupling while centroid measures show none

Current models of representational reliability in neural populations focus on temporal stability: whether population centroids are preserved across sessions and days. This framing leaves a fundamental question unanswered: how reliably does the pairwise distance structure among stimuli reproduce across independent observations within a session? We argue that this property, geometric stability, constitutes an independent axis of representational analysis that existing frameworks do not capture. We formalize geometric stability as the Spearman rank correlation between split-half representational dissimilarity matrices (Shesha) and show that it is empirically dissociable from both temporal stability and decoding accuracy. Across 229 area-session observations spanning 68 brain regions in a visual discrimination task (Steinmetz et al. 2019), geometric stability predicts trial-by-trial neural-behavioral coupling ($\rho = 0.18$, $p = 0.005$) while centroid drift does not ($\rho = 0.002$, $p = 0.976$). The regional hierarchy, with striatum most stable ($\bar{S} = 0.44$) and hippocampus least ($\bar{S} = 0.19$), runs roughly opposite to the temporal stability hierarchy. Directionally consistent olfactory data (Bolding & Franks 2018) motivate an attractor network model in which recurrent excitatory coupling amplifies split-half RDM consistency by completing stimulus patterns from sparse feedforward input ($\rho = +0.64$, $p = 0.010$), providing a circuit-level account of how geometric stability emerges. These results establish geometric stability as a functionally relevant, circuit-dependent property of neural population codes, orthogonal to temporal drift measures and complementary to recent accounts of how recurrent connectivity balances representational stability with sequential dynamics in hippocampal circuits.
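The split-half measure described in this abstract can be sketched as follows. This is an illustrative reading, not the authors' implementation. The Euclidean distance metric, the random trial split, the synthetic data, and all function names are assumptions made for the example.

```python
import numpy as np


def rdm_upper(responses):
    """Upper triangle of the pairwise Euclidean distance matrix.

    responses: array of shape (n_stimuli, n_neurons), one mean response per stimulus.
    """
    diff = responses[:, None, :] - responses[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    rows, cols = np.triu_indices(responses.shape[0], k=1)
    return dist[rows, cols]


def spearman(x, y):
    """Spearman rho as the Pearson correlation of ranks (assumes no ties)."""
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return np.corrcoef(rank_x, rank_y)[0, 1]


def split_half_stability(trials, labels, rng):
    """Correlate the RDMs built from two random halves of each stimulus's trials.

    trials: (n_trials, n_neurons); labels: stimulus id for each trial.
    """
    half_a, half_b = [], []
    for stimulus in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == stimulus))
        mid = len(idx) // 2
        half_a.append(trials[idx[:mid]].mean(axis=0))
        half_b.append(trials[idx[mid:]].mean(axis=0))
    return spearman(rdm_upper(np.array(half_a)), rdm_upper(np.array(half_b)))


# Synthetic population: 8 stimuli, 50 neurons, 20 noisy repeats per stimulus.
rng = np.random.default_rng(0)
n_stim, n_neurons, n_rep = 8, 50, 20
tuning = rng.normal(size=(n_stim, n_neurons))
labels = np.repeat(np.arange(n_stim), n_rep)
trials = tuning[labels] + 0.5 * rng.normal(size=(n_stim * n_rep, n_neurons))
score = split_half_stability(trials, labels, rng)
```

A score near 1 means the two halves order the stimulus pairs by distance in the same way; raising the noise level in the synthetic data should lower it. The rank-based correlation makes the measure insensitive to a uniform rescaling of distances between halves.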

q-bio.GN 2026-06-29

Hybrid system turns spatial transcriptomics into reproducible SLURM bundles

by Myles Joshua Toledo Tan, Vasco Gerardo Hinostroza Fuentes +5 more

DiSTILL: A Hybrid Cloud-HPC Workflow System for Reproducible Spatial Transcriptomics Analysis

DiSTILL uses cloud registries and a pipeline generator to produce consistent HPC execution packages across restricted environments.

Spatial transcriptomics workflows increasingly combine large annotated data objects, notebook-based analyses, and resource-intensive statistical models that must be executed on high-performance computing (HPC) systems. In practice, these workflows are often difficult to reproduce because configuration, validation, stage execution, and artifact handling are fragmented across ad hoc scripts and manually edited notebooks. We present DiSTILL (Disease Diagnosis from Spatial Transcriptomics via Interpretable Latent Learning), a hybrid cloud-HPC workflow system for reproducible spatial transcriptomics (ST) analysis. DiSTILL combines an application programming interface (API) backend built with FastAPI, a web frontend, a dataset and preset registry, and a Python pipeline generator that materializes run-specific execution bundles and SLURM submission scripts. The system supports local, Secure Shell (SSH)-mediated, and pull-based poller execution modes, enabling HPC submission in environments where persistent API-initiated automation is restricted. We describe the system through the lens of an inflammatory bowel disease (IBD) ST workflow that operationalizes the analytical pipeline of Tan et al. into an auditable application layer. Accordingly, the contribution of this paper is a workflow systems contribution centered on reproducible execution, queue-based orchestration, configuration semantics, and deployment across a split cloud-HPC architecture. The broader application goal of DiSTILL is to support user-supplied datasets that satisfy the schema assumptions of the wrapped analytical pipeline.

q-bio.BM 2026-06-29

Manganese-GelMA hydrogels enable MRI-guided cancer immunotheranostics

by Motahareh Nazari, Keyvan Alavi

Manganese-Functionalized GelMA Hydrogels for MRI-Guided Immunotheranostics in Precision Oncology

Review outlines how these materials combine diagnosis, localized therapy, and immune modulation in one platform.

Precision oncology requires multifunctional platforms capable of integrating accurate tumor diagnosis, localized therapeutic delivery, immune modulation, and real-time monitoring of treatment response. Gelatin methacryloyl (GelMA) hydrogels have emerged as versatile biomaterials for biomedical engineering because of their biocompatibility, extracellular matrix-like structure, tunable mechanical properties, photocrosslinkability, and capacity to incorporate therapeutic agents, imaging probes, and functional nanomaterials. In parallel, manganese-based materials have gained increasing attention as promising alternatives to gadolinium-based magnetic resonance imaging contrast agents and as therapeutic components capable of modulating the tumor microenvironment. Manganese ions and manganese-based nanomaterials can enhance T1-weighted MRI contrast, generate reactive oxygen species, relieve tumor hypoxia, deplete glutathione, promote immunogenic cell death, and activate the cyclic GMP-AMP synthase-Stimulator of Interferon Genes pathway. The integration of manganese-based systems with GelMA hydrogels offers a promising strategy for developing localized, stimuli-responsive, and MRI-guided immunotheranostic platforms. This review summarizes the fundamental properties of GelMA hydrogels, the diagnostic and therapeutic roles of manganese-based materials, strategies for constructing manganese-functionalized GelMA systems, and their potential applications in precision oncology. Current challenges, including manganese-associated toxicity, controlled ion release, mechanical optimization, reproducibility, and clinical translation, are also discussed. Finally, future directions are proposed for the rational design of safe, scalable, and personalized manganese-functionalized GelMA platforms for cancer diagnosis and therapy.

q-bio.PE 2026-06-29

Generative models couple protein sequences to evolutionary dynamics

by Matteo Bisardi, Leonardo di Bari +4 more

Modeling Protein Evolution with Generative Models: from Extant Sequence Data to Evolutionary Dynamics

Probabilistic landscapes from sequence families are linked to population-genetic rules to simulate change on lab and tree timescales.

Protein sequences carry a record of evolutionary history shaped by mutation, selection, drift, and epistasis. Recent generative models trained on homologous sequence families offer a new way to read this record: they define probabilistic landscapes that score sequences, generate viable variants, and capture constraints that are difficult to measure experimentally. In this review, we discuss how such landscapes can be used not only for protein design or mutation-effect prediction, but also for modeling evolutionary dynamics. We focus particularly on Direct Coupling Analysis as an interpretable and experimentally validated framework, while placing it in the broader context of generative sequence modeling. We first describe how generative sequence landscapes are inferred and assessed, then review how they can be coupled to population-genetic or substitution-model dynamics to simulate protein evolution across experimental and phylogenetic timescales. Applications include viral evolution, laboratory drift experiments, historical contingency, entrenchment, epistatic drift over time, and long-term sequence-space exploration. We conclude by discussing open challenges, including score-fitness calibration, phylogenetic structure, codon-level mutation biases, indels, and the integration of experimental data.

math.DS 2026-06-29

Alzheimer's model reaches one equilibrium when new plaque formation stops

by Ruoyun Lang, Hui Zhou

Global stability analysis of a mathematical model from Alzheimer's disease

System of differential equations converges globally to a unique positive steady state from any starting concentrations.

This study focuses on a mathematical model of Alzheimer's disease involving $\beta$-amyloid, cellular prion protein, and their complex. The global asymptotic stability of the model indicates that the complex continues to induce neuronal damage regardless of the initial state. We rigorously prove that when the formation rate of new plaques is zero, the system is unconditionally globally asymptotically stable, without the restrictions imposed in previous work. Numerical simulations from random initial states corroborate the theoretical analysis: the system consistently converges to a unique positive equilibrium. From a therapeutic perspective, we propose targeted therapeutic strategies and verify their effectiveness through numerical simulations. These results provide a universal theoretical basis for understanding the dynamic mechanisms of Alzheimer's disease and offer critical guidance for developing targeted therapeutics.

cs.LG 2026-06-29

Single-pass graph model lifts mass-spectrum retrieval accuracy

by Rui-Xi Wang, Runzhong Wang +1 more

GLACIER: Rethinking Mass Spectrum Prediction as an Object Detection Problem

GLACIER detects fragments directly instead of enumerating candidates first, raising top-1 accuracy from 64 percent to 70 percent and deliver

Predicting tandem mass spectra (MS/MS) from molecular structures represents a central task in analytical chemistry with direct relevance to clinical metabolomics, systems biology, and adjacent disciplines. In this work, we revisit the problem through the lens of object detection on molecular graphs. Molecular fragmentation, a central step in MS/MS prediction, can be approximated as detecting a set of subgraphs (i.e., fragments) and their associated spectral contributions. Existing fragment-based models follow a two-stage paradigm -- first generating candidate fragments and then scoring them -- analogous to two-stage R-CNNs in computer vision. Towards higher accuracy and faster inference, we introduce GLACIER, a single-stage transformer-based fragment detection neural network for molecular graphs. This unified formulation eliminates the need for candidate enumeration, enabling scalable and globally consistent modeling of molecular fragmentation. GLACIER is faster and more accurate than existing state-of-the-art by a significant margin, achieving 70.0% and 69.7% Top-1 retrieval accuracy with and without contrastive finetuning on the MassSpecGym dataset (from the previous SOTA of 64.0%) and 52.5% and 38.5% respectively on the NIST'20 dataset (from 33.2%). Furthermore, GLACIER provides nearly 8-fold inference speedup over our prior two-stage model. Code is available at https://github.com/coleygroup/ms-pred

stat.ML 2026-06-29

Relaxed noise assumptions yield directed brain connectivity estimates

by Stephan Goerttler, Min Wu +1 more

Connectivity Estimation using Stochastic Graph Heat Modelling

Adding regularization to stochastic graph heat models recovers spatial structure across multiple real-world neurophysiological datasets.

A growing number of techniques leverage the spatial structures that underlie many real-world datasets. Despite these advances, the complementary task of estimating spatial structures and understanding their role within these techniques has often been overlooked. In neurophysiological data analysis specifically, numerous methods exist to estimate brain connectivity, but most are not explicitly model-based, dynamic, multivariate, or directed. To address these limitations, we previously introduced noise-driven heat modelling on graphs for neurophysiological connectivity estimation. In this study, we extend this framework by relaxing earlier noise assumptions and adding regularisation to improve robustness. We also develop a simulation procedure to characterise and evaluate our technique in a controlled setting. Finally, we demonstrate that the technique is able to capture meaningful spatial structure across two experiments, each using two real-world datasets. The explicit model formulation of our connectivity estimator has the potential to improve the interpretability of graph-based techniques across a wide range of applications. The code implementing our method is available at https://github.com/sgoerttler/Heat_Connectivity.

q-bio.QM 2026-06-29

Code generates models for elongated viral capsids

by Daniel Antonio Negrón, Antoni Luque

Democapsid

Parameters control lattice, axis, sphericity and length to study packaging capacity in viruses.

Capsids are the protein shells that protect the genetic material of viruses. The precise structural description of capsids informs how viruses assemble and evolve and is key to the development of antiviral targets. Most viruses form icosahedral capsids; among these, most adopt quasi-spherical shapes, and some form elongated architectures. However, elongated capsids have been understudied, despite their decoupling of width and length providing greater control over their packaging capacity, a feature of particular interest in capsid evolution and in virus-based biotechnological platforms. A key bottleneck is the lack of tools for the analysis and design of elongated viral capsids. To that end, this article introduces Democapsid as a versatile tool for generating coordinates of both quasi-spherical and elongated (and shrunk) icosahedral capsids, as well as for producing customizable graphical models and publication-quality figures. The underlying algorithm builds on the generalized geometrical theory of viral capsids and employs numerical methods to assemble capsid elements based on folding constraints. It includes parameters controlling protein tiling associated with the eight regular icosahedral lattices, elongation axes (5-fold, 3-fold, and 2-fold), sphericity, and discrete body length for prolate (extended) and oblate (shrunk) shapes. It is available as a JavaScript browser application, a Python package powering plugins for UCSF ChimeraX and Blender, and an R package for generating reproducible documents with embedded models. The code (MIT License) is available on GitHub. Democapsid will benefit both researchers and graphic designers by enabling the investigation and communication of research on viral capsids and other icosahedral compartments.

cs.AI 2026-06-29

Specialized clinical AI beats general models by 25-39 points on real questions

by Jean Feng, Vishal Patel +6 more

Expert Evaluation of Clinical AI Tools on Real Point-of-Care Clinical Queries

Blinded specialist physicians preferred the targeted tool on accuracy, utility, sources, verifiability and completeness across 620 actual po

Physicians now pose millions of clinical questions to AI tools each week, yet these tools are evaluated largely on hypothetical or exam-style questions, not those actually asked in practice. We report a blinded evaluation built on 620 Real-world Point-Of-Care Queries (Real-POCQi) submitted to the OpenEvidence (OE) platform by physicians spanning 30 specialties, as well as 187 questions from HealthBench. 149 practicing physicians across 36 states made head-to-head comparisons between answers from three frontier general-purpose models (Claude Opus 4.8, Gemini 3.1 Pro, and GPT-5.5) and a specialized clinical tool (OE), with graders matched to each question's specialty. When comparing answers along five dimensions relevant to clinical decision support -- accuracy, clinical utility, source quality, verifiability, & completeness -- physicians scored the specialized tool highest on all axes; in the primary analysis on Real-POCQi, win differences (margins between win and loss rates) ranged from 25 to 39 percentage points (p<0.001). Results remained consistent in sensitivity analyses stratifying by citation display, answer length, OE-user status, and Real-POCQi versus HealthBench. In parallel, LLM judges were found to systematically differ from expert judges, though both generally agreed on the best model. These findings underscore two conclusions: (i) AI tool evaluations should reflect real-world query distributions and use expert judges that mirror the specialization defining modern medicine and (ii) the consistent advantage of the specialized tool over general-purpose models does not necessarily mean that the latter cannot serve similar purposes, but that targeted engineering and customization can yield meaningful gains in performance for its users. We release Real-POCQi as a public benchmark, as well as the prespecified statistical analysis for reproducing results of this study.
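
The "win difference" metric used above (the margin between win and loss rates) can be illustrated with a minimal sketch. The counts below are hypothetical, and the treatment of ties (counted in the denominator) is an assumption, since the abstract does not say how ties were handled.

```python
def win_difference(wins: int, losses: int, ties: int = 0) -> float:
    """Win rate minus loss rate, in percentage points.

    Assumption: ties count toward the total number of comparisons.
    """
    total = wins + losses + ties
    if total == 0:
        raise ValueError("no comparisons to score")
    return 100.0 * (wins - losses) / total


# Hypothetical head-to-head tally, not data from the study.
margin = win_difference(wins=60, losses=25, ties=15)
print(margin)
```

A positive value means the first tool was preferred more often than it was rejected; zero means wins and losses balanced out.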

math.DS 2026-06-29

Generic parameters restrict exact lumping to obvious reductions

by Justin Eilertsen, Valery G. Romanovski +2 more

Lumping of reaction networks: Generic and critical parameters

Only elimination of non-reactants or projections along integrals survive for open sets of parameters; algorithms locate the special critical

We investigate linear lumping for parameter-dependent mass action reaction networks, distinguishing between generic and critical parameter regimes. For generic parameters -- those ranging in some non-empty open subset of parameter space -- we prove that exact linear lumping yields only "obvious" reductions: elimination of non-reactant species or projections along stoichiometric first integrals. This characterization extends to reaction networks with product-form kinetics, including Michaelis-Menten and Hill-type rate laws. For mass action systems we proceed to develop an algorithmic approach to identify critical parameter sets -- algebraic subvarieties in parameter space where non-trivial lumpings become available. This procedure reduces the determination of lumping maps to a system of finitely many polynomial equations. It also applies to constrained lumping scenarios (which are frequently motivated by chemical considerations). We then review and extend results about proper lumpings. Finally, we discuss lumpings of a self-replicator system, and of a two-pathway enzyme mechanism, to document the viability of our methods in relevant scenarios. Our results clarify the relationship between structural (parameter-independent) and fine-tuned (parameter-dependent) reductions, with implications for approximate lumping when system parameters lie near critical values.
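
For readers unfamiliar with the term, the usual definition of exact linear lumping can be written as follows. The symbols $x$, $f$, $L$, $g$, $n$ and $m$ are notation chosen here for illustration and may differ from the paper's.

```latex
% Original system for the concentration vector x in R^n:
\dot{x} = f(x), \qquad x \in \mathbb{R}^{n}.
% A linear lumping is a matrix L of rank m < n defining lumped variables
y = L x, \qquad L \in \mathbb{R}^{m \times n}.
% The lumping is exact when the lumped variables obey a closed system,
% i.e. there exists g with
L\, f(x) = g(L x) \quad \text{for all } x,
\qquad \text{so that} \qquad \dot{y} = g(y).
```

In this notation, a projection along a linear first integral corresponds to a row $\ell$ of $L$ with $\ell f(x) = 0$, which is one of the "obvious" reductions named in the abstract.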

q-bio.OT 2026-06-29

Three-tier data upgrade proposed to make space biology AI-ready

by Sylvain V. Costes, Sergio Garcia Busto +16 more

Building AI-Ready Data Systems for Space Life Sciences, Aerospace Medicine, and Deep Space Exploration

Progressing from FAIR to AI-ready to space-ready closes the access gap and enables agent-based research for deep space missions.

While AI holds the potential to revolutionize space life sciences, realizing this promise is contingent upon the systematic restructuring of heterogeneous spaceflight biological data into machine-actionable, AI-ready forms. Even though open access principles support human reuse and scientific reproducibility, this does not necessarily enable AI systems to access and analyze such a diverse set of scientific datasets. In addition, the growing array of AI approaches places distinct demands on data structure, metadata, and access interfaces. To meet these growing demands, we propose a three-tier approach, proceeding from FAIR to AI-ready to space-ready data. We discuss existing infrastructures and how they can be improved to close the AI access gap. We conclude by proposing a neutral international coordinating body as the governance backbone for the trustworthy, agent-accessible space biology infrastructure that deep space biological research will require.

q-bio.QM 2026-06-29

Alzheimer's proteins form many stable patterns

by Sun Lee, Wenrui Hao

Pattern formation in a Reaction-Diffusion Model for Amyloid-β and Tau Interactions in Alzheimer's Disease

Simulations of Aβ-Tau interactions locate multiple steady states, offering one account for resilient cases despite heavy pathology.

Alzheimer's disease (AD) is characterized by the accumulation of Amyloid-$\beta$ ($A\beta$) plaques and hyperphosphorylated Tau proteins. However, many individuals exhibit substantial $A\beta$ and Tau pathology without developing dementia, suggesting that disease progression may depend not only on pathological burden but also on the spatial organization of these proteins. Motivated by this observation, we adapt the Gray-Scott reaction-diffusion model to investigate pattern formation arising from the interactions between $A\beta$ and Tau. To systematically identify stable spatial configurations, we employ a Companion-Based Multi-Level Finite Element Method (CBMFEM) on both two-dimensional domains and anatomically realistic cortical surface meshes. Numerical simulations reveal a rich landscape of multiple steady-state solutions, which are subsequently classified into representative pattern phenotypes using principal component analysis and clustering techniques. The results demonstrate that the coupled $A\beta$--Tau system admits numerous stable spatial patterns rather than a single pathological endpoint. These findings provide a potential mathematical framework for understanding the heterogeneity of Alzheimer's disease and the existence of cognitively resilient individuals despite significant pathological burden. More broadly, the proposed framework suggests a pattern-based therapeutic paradigm in which disease dynamics are guided toward favorable stable states rather than solely targeting the elimination of pathological proteins.
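
For orientation, the classical Gray-Scott system that the abstract takes as its starting point is usually written as below. This is the textbook form only: the abstract does not say how the model is adapted, nor which of $u$, $v$ plays the role of $A\beta$ or Tau, so no such mapping is implied here.

```latex
% Classical Gray-Scott model: u is the substrate, v the autocatalyst,
% D_u and D_v are diffusion coefficients, F is the feed rate, k the removal rate.
\frac{\partial u}{\partial t} = D_u \nabla^{2} u - u v^{2} + F\,(1 - u),
\qquad
\frac{\partial v}{\partial t} = D_v \nabla^{2} v + u v^{2} - (F + k)\, v .
```

Steady-state patterns are solutions with both time derivatives equal to zero; the multiplicity of such solutions is what the abstract describes as multiple stable spatial patterns.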

q-bio.BM 2026-06-29

Transformers with active learning beat double-data baselines on PRRS epitopes

by Aspen Erlandsson Brisebois, Zahed Khatooni +6 more

Transformer-Based Active Learning for Data-Efficient Vaccine Epitope Selection in PRRS

Reach 86.8 percent accuracy at 60 labels while outperforming random acquisition and nearing the 85 percent noise limit.

High-fidelity molecular docking simulations can produce biologically relevant estimates of epitope-receptor binding affinity but are computationally expensive and therefore limit the number of candidates that can be screened for vaccine design. In this work, we evaluate machine learning (ML) approaches where variants of active learning are used to classify instances of high binding affinity between 9-mer epitopes and a well-conserved swine leukocyte antigen (SLA) receptor in the context of Porcine Reproductive and Respiratory Syndrome (PRRS). We use an internally generated dataset of 80 epitope-SLA docking affinities, each requiring more than 48 hours of high-performance computing (HPC). Multiple model families (linear, MLP, CNN, and a small transformer) are trained under strict low-data conditions within a pool-based active learning loop. In each case, optimal model configurations are identified by conducting large-scale hyperparameter optimization over the combined space of model architecture, training configuration, acquisition policy, and ensemble decision rules. To mitigate the effects of data subsample selection, each candidate configuration is evaluated by averaging performance over many randomized and balanced training and validation data subsets. Across experiments, transformer-based sequence models consistently emerged as the best-performing architecture, with active incremental learning yielding significant improvement over a baseline random sample acquisition strategy. Under moderate training data availability (N=30), the optimized ML-model configuration outperforms a standard baseline trained on twice the amount of data. Under higher training data availability (N=60), the same configuration achieves a peak accuracy of 86.8%, consistent with an upper bound of 85% classification accuracy based on two independent estimates of conformational noise.
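
A pool-based active learning loop of the kind described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' pipeline: the classifier is a plain logistic regression rather than a transformer, the acquisition policy is uncertainty sampling (the abstract does not name the policy), and `oracle` is a hypothetical stand-in for an expensive docking simulation.

```python
import numpy as np


def fit_logistic(X, y, steps=500, lr=0.5):
    """Fit logistic regression by gradient descent (stand-in model)."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        grad = p - y
        w -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b


def predict_proba(X, w, b):
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))


def active_learning(X_pool, oracle, n_init=10, n_total=30, seed=0):
    """Label n_total pool items, choosing each new one by model uncertainty."""
    rng = np.random.default_rng(seed)
    labeled = [int(i) for i in rng.choice(len(X_pool), size=n_init, replace=False)]
    labels = {i: oracle(i) for i in labeled}  # each oracle call is "expensive"
    while len(labeled) < n_total:
        y = np.array([labels[i] for i in labeled], dtype=float)
        w, b = fit_logistic(X_pool[labeled], y)
        unlabeled = [i for i in range(len(X_pool)) if i not in labels]
        p = predict_proba(X_pool[unlabeled], w, b)
        # Uncertainty sampling: query the item closest to the decision boundary.
        pick = unlabeled[int(np.argmin(np.abs(p - 0.5)))]
        labels[pick] = oracle(pick)
        labeled.append(pick)
    return labeled, labels
```

The random-acquisition baseline mentioned in the abstract corresponds to replacing the `argmin` line with a uniformly random choice from `unlabeled`.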

quant-ph 2026-06-29

High-entanglement ZZ map cuts overfitting in quantum binding classifier

by Aspen Erlandsson Brisebois, Luis Pablo Gonzalez Dominguez +9 more

Exploring the Effects of Entanglement on Quantum Machine Learning of Pathogen Epitope-Receptor Binding

All-to-all ZZ feature map delivers highest test-to-training AUAC ratio on 80-example PRRS epitope dataset while matching classical accuracy.

Parameterized quantum circuits (PQCs) provide a flexible substrate for hybrid quantum machine learning (QML), but their practical value on Noisy Intermediate-Scale Quantum (NISQ) devices remains an empirical question, especially because training depth and scale can introduce optimization challenges such as barren plateaus. Here we study how the number and topology of two-qubit entangling gates in the feature-map stage influence a fixed hybrid QNN workflow for classifying strong versus weak epitope-receptor binding in Porcine Reproductive and Respiratory Syndrome (PRRS) vaccine design. The dataset consists of docking-derived binding affinities for N=80 9-mer epitopes, labeled as Strong or Weak binding, and partitioned into training, validation, and test subsets using a 40:30:30 split. We compare a classical CNN benchmark with a hybrid Embedding-QNN architecture under four feature-map configurations: a non-entangling Z feature map, an all-to-all high-entanglement ZZ feature map, and two interleaved nearest-neighbour entanglement patterns of low and high depth. Among the configurations tested, the high-entanglement ZZ feature map is seen to provide the strongest evidence of reduced training-set overfit, with a lower training area under the accuracy curve (AUAC) and the highest test/training AUAC ratio, while preserving competitive test-set accuracy. These results do not establish a general QML advantage, but they suggest that feature-map entanglement topology is a meaningful design variable for sparse biological screening tasks and warrants further evaluation with additional metrics, larger datasets, and noise-aware or hardware-based experiments.

q-bio.QM 2026-06-29

Habitual timing explains circadian peaks

by Billy C. Smith, Zeel Pansara +12 more

Habitual lifestyle timing explains circadian timing, but daily lifestyle changes do not, in free-living humans across 2000 days

In 2000 days of real-life data, stable habits accounted for 42 percent of timing variance while day-to-day shifts accounted for under 1 perc

Background: Both between- and within-subject variations in circadian timing matter for health. If lifestyle changes could be used to regulate circadian timing, they would offer accessible and scalable routes to chronotherapy, but this link remains unclear under real-life conditions. Here, we explore how lifestyle 'traits' (such as typical wake time) and 'states' (day-to-day deviations from traits, such as waking up later than typical) explain between- and within-subject variation in acrophase (peak time) of the circadian rhythm of heart rate (CRHR). Methods: We collected free-living wearable data (smartwatch, continuous glucose monitor) from healthy volunteers for up to 4 weeks. The CRHR was derived from activity-adjusted heart rate, and acrophase was defined as time-of-day at daily CRHR peak. Sleep, food, and physical activity 'factors' were calculated and split into traits and states. Using a linear mixed-effects model, we tested how traits and states associate with between- and within-subject acrophase variance. Findings: Data from 105 healthy volunteers (66 female, age = 42.5 $\pm$ 15.7 years) spanning ~2000 days (18.8 $\pm$ 8.30 days each) were analysed. Traits were substantially more influential than states, explaining 42.3% versus 0.9% of total acrophase variance. Accordingly, traits explained 86.5% of between-subject variance, whereas states explained only 1.8% of within-subject variance. Sleep, food and physical activity factors contributed both jointly and uniquely, and lifestyle timing mattered most. Interpretation: Between-subject lifestyle traits explained acrophase better than within-subject lifestyle states. This asymmetry, alongside the considerable overlap between factors, supports sustained, holistic, timing-focused lifestyle adjustments as chronotherapy targets, testable through future interventional studies.
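
The trait/state split described above can be sketched as a per-subject centering step. This assumes a trait is the subject's mean value of a factor and a state is the daily deviation from it; the abstract says only "typical", so the study may use a different summary. The function name and the example wake times are hypothetical.

```python
import numpy as np


def split_trait_state(subject_ids, values):
    """Split daily values into a per-subject trait and a daily state.

    Assumption: trait = subject mean, state = value minus trait.
    """
    subject_ids = np.asarray(subject_ids)
    values = np.asarray(values, dtype=float)
    traits = np.empty_like(values)
    for subject in np.unique(subject_ids):
        mask = subject_ids == subject
        traits[mask] = values[mask].mean()
    states = values - traits
    return traits, states


# Hypothetical wake times in hours for two subjects over three days each.
ids = ["A", "A", "A", "B", "B", "B"]
wake = [7.0, 7.5, 8.0, 6.0, 6.0, 6.6]
traits, states = split_trait_state(ids, wake)
```

By construction the states average to zero within each subject, so traits carry only between-subject variation and states only within-subject variation, which is what lets a mixed-effects model attribute variance to each separately.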

cs.LG 2026-06-29

Tensor network models kids' emotional memory at 78% accuracy

by Henry Groves, Lucia F. Jackson +2 more

Modelling Emotional Memory in Children with Tensor Networks

Recall of a toy depends on the emotional tone of those before and after it, an effect standard models miss.

We demonstrate how emotional valence influences the order-dependent structure of children's recognition memory: correct recall of a sequence of emotionally-valenced toys depended not just on the valence of a given toy itself, but also on the valence of the toys shown before and after it. Whilst standard psychological models confirm that order-dependence differs across an event (a set of toys shown in sequence), their accuracy is low and they do not reflect how memory for an emotional object influences others in the set. A classical tensor network model factoring in valence achieves 77.98% accuracy in modelling the results of the study. While not strictly a "quantum cognition" model, this massive increase in accuracy shows the value of quantum-inspired methods for modelling order-dependent phenomena, such as emotional memory. Further, the task protocol we introduce presents a novel, real-world tool for exploring emotional temporal memory in children for analysis using classical and quantum-like models of cognition.

cs.GT 2026-06-29

One simulation run drives evolutionary updates in queues

by Vincent Knight, Geraint I. Palmer-Liyu +1 more

Discrete Event Population Updates: finding game theoretic emergent behaviour in queueing systems with simulation

DEPU feeds payoffs from a long discrete-event run straight into replicator or Moran rules, removing the closed-form barrier and speeding ana

Strategic behaviour in queueing systems has been studied extensively in the behavioural queueing literature, but almost exclusively for systems that admit closed-form expressions for the cost or utility experienced by a strategic user. Evolutionary game theory offers a mature framework for analysing populations whose individual payoffs depend on the composition of the population itself, and would in principle apply to a much wider class of queueing systems; its application has, however, been constrained by the same closed-form requirement. We introduce Discrete Event Population Updates (DEPU), a general algorithmic framework that couples a single long run of a discrete event simulation (DES) directly to an evolutionary population update rule, removing that constraint. We present two implementations: Discrete Event Replicator Dynamics (DERD), which follows an Euler discretisation of the replicator dynamics equation, and Discrete Event Moran Replacement (DEMR), which maintains a finite population updated via Moran-style copying events. Both are applied to a multi-server jockeying model for which no closed-form fitness expressions are available. On the jockeying model considered, DEPU reaches comparable precision tens of times faster than the standard practice of nesting short simulations inside an outer evolutionary loop, and because each operating point then costs only a single simulation run it also makes systematic parameter sweeps tractable. This brings the toolkit of evolutionary dynamics within reach of any system a modeller can build in a discrete event simulator.
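
The Euler discretisation of the replicator equation that DERD is said to follow can be sketched as a single update step. This is a generic illustration, not the authors' code: in DEPU the fitness vector would come from the running discrete event simulation, whereas here it is passed in directly, and the clipping and renormalisation are a numerical safeguard added as an assumption.

```python
import numpy as np


def replicator_euler_step(x, fitness, dt):
    """One Euler step of dx_i/dt = x_i * (f_i - mean fitness).

    x       : strategy proportions, non-negative and summing to 1
    fitness : per-strategy fitness estimates (here supplied by the caller)
    dt      : step size
    """
    x = np.asarray(x, dtype=float)
    f = np.asarray(fitness, dtype=float)
    mean_fitness = x @ f
    x_new = x + dt * x * (f - mean_fitness)
    # Safeguard (assumption): keep the state on the probability simplex.
    x_new = np.clip(x_new, 0.0, None)
    return x_new / x_new.sum()


# Hypothetical two-strategy example: the fitter strategy gains share.
x_next = replicator_euler_step([0.5, 0.5], [1.0, 2.0], dt=0.1)
```

Because the update needs only the current fitness estimates, it can be applied repeatedly while one long simulation keeps running, which is the coupling the abstract describes.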

q-bio.QM 2026-06-29

Drug condition accuracy in embeddings fails to ensure cross-drug prediction

by Jake Y. Chen, Huu Phong Nguyen +2 more

SVC-Probe: A Framework for Evaluating Perturbation Generalization in Spatial Foundation-Model Embeddings

Leave-one-drug-out tests drop cosine similarity from 0.944 to 0.30 on the MDA-MB-468 atlas, showing current benchmarks test only narrow case

This work examines perturbation generalization in spatial foundation-model embeddings derived from fluorescence microscopy images. Although these models can discriminate drug conditions accurately, it remains unclear whether the learned representations reflect patterns consistent with expected perturbation axes that transfer across drugs. We introduce SVC-Probe, a perturbation-aware framework that combines Subcellular Embedding Atlas Stability, Mondrian Neighborhood Graphs, and a Foundation Model Perturbation Probe to assess embedding stability, neighborhood rewiring, and centroid prediction under drug treatment. Applied to the CM4AI MDA-MB-468 chemical-perturbation atlas comprising 462 antibody labels and SubCell 1536-dimensional embeddings, SVC-Probe demonstrates that 98.6% three-way condition accuracy does not correlate with reliable cross-drug prediction, with cosine similarity diminishing from 0.944 in-domain to 0.30 under leave-one-drug-out evaluation, constituting a two-drug stress test rather than a general benchmark. Null calibration indicates that raw residual-turnover coupling is largely influenced by generic embedding structure, whereas a drug-specific signal emerges under vorinostat and is consistent with chromatin-related reorganization. In contrast, the paclitaxel axis is not robustly reconstructed, likely due to sparse coverage of microtubule-associated proteins. Together, these results introduce and demonstrate a reusable diagnostic framework for stress-testing spatial virtual-cell representations and indicate that perturbation generalization may serve as a stricter and more informative benchmark than baseline condition discrimination.
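
The cosine similarity figures quoted above compare embedding vectors by direction alone. A minimal sketch of the metric follows; applying it to a predicted versus an observed centroid is an assumption about how the abstract's numbers were obtained, and the vectors in the example are hypothetical.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1 = same direction)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return float(a @ b / denom)


# Hypothetical predicted and observed perturbation centroids.
predicted = np.array([1.0, 0.0, 1.0])
observed = np.array([1.0, 1.0, 0.0])
score = cosine_similarity(predicted, observed)
```

Under a leave-one-drug-out protocol, the model that produces the prediction would be fitted without any data from the held-out drug, which is why the score can fall well below its in-domain value.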

cs.LG 2026-06-29

Recovered expressions rebuild cell graphs for better scRNA-seq clusters

by Jun Tang, Pengwei Hu +4 more

scKDGM: KAN-guided Dynamic Graph Masked Learning for Single-Cell RNA-seq Clustering

scKDGM updates topology from mask-guided recoveries and beats fixed-graph baselines on 12 datasets by NMI and ARI.

Single-cell RNA sequencing (scRNA-seq) clustering is essential for identifying cell types, but high dimensionality, sparsity, dropout, and technical noise hinder robust expression representation and cell graph construction. Existing masked autoencoders mainly use expression recovery for feature reconstruction, while graph clustering methods usually depend on fixed KNN graphs and do not feed recovered expression back into graph optimization. We propose scKDGM, a KAN-guided dynamic graph masked learning framework for scRNA-seq clustering. scKDGM uses graph-aware distribution preserving gene masking (GDP-Mask) to perturb cell identity, a KAN-based TAKGCN encoder to learn masked-view representations, mask-guided expression recovery to construct a dynamic graph, and cross-view contrastive learning to transfer recovery signals into topology updates. A ZINB loss models overdispersion and zero inflation. Experiments on 12 real scRNA-seq datasets show that scKDGM outperforms 10 baselines in average NMI and ARI.
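
The ZINB loss mentioned above is, in its common form, the negative log-likelihood of a zero-inflated negative binomial distribution. The parameterisation below (mean $\mu$, dispersion $\theta$, dropout probability $\pi$) is the standard one and is given as an assumption; the paper may use different symbols or add regularisation terms.

```latex
% Zero-inflated negative binomial for a count x with mean \mu,
% dispersion \theta and zero-inflation (dropout) probability \pi:
P(x = 0) = \pi + (1 - \pi) \left( \frac{\theta}{\theta + \mu} \right)^{\theta},
\\[4pt]
P(x = k) = (1 - \pi)\,
  \frac{\Gamma(k + \theta)}{\Gamma(\theta)\, k!}
  \left( \frac{\theta}{\theta + \mu} \right)^{\theta}
  \left( \frac{\mu}{\theta + \mu} \right)^{k},
  \qquad k \ge 1,
\\[4pt]
% Loss over all cells i and genes j:
\mathcal{L}_{\mathrm{ZINB}} = - \sum_{i,j} \log P\!\left( x_{ij} \mid \mu_{ij}, \theta_{ij}, \pi_{ij} \right).
```

The $\pi$ term accounts for excess zeros (zero inflation) and the finite $\theta$ allows variance larger than the mean (overdispersion), matching the two properties named in the abstract.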

q-bio.PE 2026-06-29

Maximum-likelihood paths often atypical of real evolutionary histories

by Roberto Netti, Martin Weigt

Reconstructability of evolutionary intermediates in generative epistatic landscapes

Generative protein landscapes show conditional sampling recovers plausible ensembles better than point estimates, with topology setting the

Evolutionary intermediates connect observed proteins, but the sequence of steps that produced them is rarely recoverable from extant data alone. Here we ask what can, and cannot, be inferred about such intermediates from the endpoints. Using generative sequence landscapes as controlled models of protein-family evolution, we benchmark data-driven reconstruction against ground-truth simulated trajectories. We find that the best point prediction is not necessarily the most faithful evolutionary reconstruction: maximum-likelihood intermediates can be residue-wise accurate yet statistically atypical, whereas conditional sampling better captures the ensemble of plausible histories. Predictability is limited by the topology of the landscape. Constrained, low-mutability regions preserve information about the path, while permissive high-mutability regions open many alternative routes and erase path-specific memory. We also show that sequence divergence alone is an insufficient measure of elapsed evolutionary time; incorporating endpoint mutability provides a more reliable way to place intermediates in the landscape. These results recast intermediate reconstruction as a calibrated probabilistic problem. Rather than seeking a single "true" sequence, data-driven models should identify when endpoints contain evolutionary information, and return realistic ensembles.

q-bio.BM 2026-06-29

Method adds coevolution to ancestral protein reconstruction

by Alya Zeinaty, Leonardo di Bari +4 more

Towards coevolution-aware ancestral sequence reconstruction

DCA-integrated phylogeny produces ensembles that match both trees and natural sequence statistics under epistasis

Ancestral sequence reconstruction (ASR) is a powerful approach for studying molecular evolution and the emergence of protein function. Yet most ASR methods assume that sites evolve independently, neglecting the epistatic constraints that shape protein structure, stability, and function. This simplification affects both ancestral inference and its evaluation: maximum-a-posteriori reconstructions may over-concentrate probability into a single over-idealized sequence, whereas independent posterior sampling can generate implausible or poorly functional ancestors. Here, we introduce a coevolution-aware ASR framework that combines standard phylogenetic inference with Direct Coupling Analysis (DCA), thereby preserving site-wise ancestral uncertainty while enforcing residue-residue constraints learned from extant protein families. To benchmark the method, we develop a controlled forward-evolution framework based on a DCA evolutionary sampler, allowing reconstructed ancestors to be compared with known ground-truth sequences generated under realistic epistatic constraints. Applied to beta-lactamases and DNA-binding domains, the approach improves reconstruction when ancestral states are epistatically constrained, and yields ensembles of candidate ancestors that are both phylogenetically consistent and statistically compatible with natural protein families. This framework bridges the gap between single-sequence MAP reconstruction and unconstrained posterior sampling, providing a practical route toward ancestral reconstructions that better reflect the coupled nature of protein evolution.

cs.LG 2026-06-29

Two-stage tuning aligns proteins to target amino-acid mixes

by Violeta Basten-Romero, Rubén Muñoz-Tafalla +4 more

Two-Stage Fine-Tuning for Protein Sequence Generation with Targeted Amino-Acid Composition

Fine-tuning shifts average composition; reinforcement learning then enforces exact matches while sequence quality stays intact.

Protein language models are standard priors for biological sequence generation, but steering them toward explicit distributional design targets remains largely unexplored. We study a constrained protein generation problem in which sequences must match a desired amino-acid (AA) composition profile while preserving plausible sequence statistics and diversity. The motivating application is synthetic feed protein design, where the AA composition of dietary proteins directly determines their nutritional value. We propose a two-stage pipeline in which domain-adaptive fine-tuning (FT) on an in-domain protein dataset is followed by iterative reward-weighted FT via reinforcement learning (RL) anchored against the FT model as a frozen reference. We evaluate the pipeline on two AA compositions and find that FT brings the average composition close to the target, while the subsequent RL enforces specific sequence constraints that FT alone cannot satisfy. We additionally evaluate the design choices of the proposed composition reward term against two baselines and an ablated variant, isolate the contribution of each training stage, and verify that AA composition alignment is achieved without degrading sequence quality.

q-bio.QM 2026-06-29

MCIDs set for smartphone gait metrics in progressive MS

by Mike D Rinderknecht, Bernhard Fehlmann +8 more

Establishing the Minimal Clinically Important Difference (MCID) for Smartphone-Derived Gait Measures in Multiple Sclerosis

Anchor-based analysis of 243 patients yields thresholds including -0.16 m/s step velocity to interpret meaningful change.

Background: Digital health technologies allow for frequent, remote gait monitoring in people with multiple sclerosis (MS). However, to differentiate daily variability from actual disease progression in longitudinal data, established minimal clinically important differences (MCID) are required. Currently, there is limited literature defining these thresholds for digital gait metrics. Objective: To establish MCIDs for digital gait measures reflecting progression in MS. Methods: Digital gait measures were captured via daily, remote, smartphone-based Two-Minute Walk Tests in CONSONANCE (NCT03523858), a phase 3b study of ocrelizumab in progressive MS. Using an anchor-based approach, median changes from baseline at Week 96 on digital gait measures were computed for patients showing clinically meaningful worsening on either Timed 25-Foot Walk, Ambulation Score, Expanded Disability Status Scale, or 12-item Multiple Sclerosis Walking Scale. These changes were subsequently triangulated to derive the MCID estimates. Results: 243 patients with progressive MS (female: n=125 (51%); mean [SD] age: 49.3 [9.3]; mean [SD] EDSS: 4.8 [1.4]) had digital gait data available at baseline and Week 96. Median changes were generally consistent across anchors. Triangulated MCIDs are: Step Velocity = -0.16 m/s, Step Velocity Scaled to Walking Time = -0.18 m/s, Step Duration = 0.06 s, Step Length = -0.07 m, Total Number of Steps = -28, and Total Distance Walked = -24 m. Conclusion: These MCIDs provide a framework for interpreting meaningful gait changes and integrating digital measures into MS outcome evaluation. Beyond facilitating novel clinical trial endpoints to evaluate treatment efficacy, they enable objective, real-world monitoring to advance personalized patient care.
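
The anchor-based step described in the Methods can be sketched as follows. The per-anchor estimate (median change among patients who worsened on that anchor) follows the abstract; the triangulation rule, here a simple mean of the per-anchor medians, is an assumption because the abstract does not state it. All numbers in the example are hypothetical and are not study data.

```python
import numpy as np


def anchor_median_change(change, worsened):
    """Median change from baseline among patients who worsened on one anchor."""
    change = np.asarray(change, dtype=float)
    worsened = np.asarray(worsened, dtype=bool)
    if not worsened.any():
        raise ValueError("no worsened patients for this anchor")
    return float(np.median(change[worsened]))


def triangulate(anchor_medians):
    """Combine per-anchor estimates. Assumption: unweighted mean."""
    return float(np.mean(anchor_medians))


# Hypothetical step-velocity changes (m/s) for five patients and two anchors.
change = [-0.20, -0.10, 0.00, -0.30, 0.05]
worse_on_anchor_1 = [True, True, False, True, False]
worse_on_anchor_2 = [True, False, False, True, False]
estimates = [
    anchor_median_change(change, worse_on_anchor_1),
    anchor_median_change(change, worse_on_anchor_2),
]
mcid = triangulate(estimates)
```

Using the median rather than the mean within each anchor limits the influence of a few patients with very large changes.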

eess.AS 2026-06-29

Fusing Whisper acoustics with LLM language features detects dementia

by Olivier Jiyoun Jung, Jonghyeon Park +1 more

Listening Between the Lines: Joint Learning of ASR Embeddings and LLM-Augmented Linguistics for Dementia Detection

The pipeline combines acoustic embeddings and prompted linguistic descriptors through gated fusion to classify speakers on standard speech d

Early detection of dementia through speech analysis offers a non-invasive screening alternative, but capturing both acoustic and linguistic biomarkers remains challenging. We propose a multimodal framework leveraging Whisper for dual-purpose extraction: acoustic representations from encoder outputs and transcripts via automatic speech recognition (ASR). For the acoustic pathway, temporal networks with attention pooling aggregate variable-length sequences into fixed-dimensional embeddings. For the linguistic pathway, we prompt a large language model (LLM) to extract interpretable features spanning lexical diversity, syntactic complexity, semantic coherence, and discourse patterns. A gated fusion network integrates both modalities. On ADReSS and ADReSSo, our method achieves F1-scores of 89.47% and 90.14%, demonstrating effective integration of acoustic and LLM-augmented linguistic features. Ablation shows that multimodal fusion consistently outperforms either modality alone.
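
A gated fusion layer of the general kind mentioned above can be sketched with NumPy. The specific form here (project each modality to a shared size, compute a sigmoid gate from both, then take a gated convex combination) is a common design and an assumption; the abstract does not give the architecture. All weight names and dimensions are hypothetical.

```python
import numpy as np


def gated_fusion(acoustic, linguistic, W_a, W_l, W_g, b_g):
    """Fuse two modality vectors with a learned element-wise gate.

    acoustic, linguistic : input embeddings (may differ in size)
    W_a, W_l             : projections into a shared hidden size d
    W_g, b_g             : gate parameters, W_g has shape (d, 2 * d)
    """
    h_a = np.tanh(W_a @ acoustic)
    h_l = np.tanh(W_l @ linguistic)
    gate = 1.0 / (1.0 + np.exp(-(W_g @ np.concatenate([h_a, h_l]) + b_g)))
    # gate near 1 trusts the acoustic pathway, near 0 the linguistic one.
    return gate * h_a + (1.0 - gate) * h_l


# Hypothetical sizes: 8-dim acoustic, 5-dim linguistic, 4-dim fused output.
rng = np.random.default_rng(0)
fused = gated_fusion(
    rng.normal(size=8), rng.normal(size=5),
    rng.normal(size=(4, 8)), rng.normal(size=(4, 5)),
    rng.normal(size=(4, 8)), np.zeros(4),
)
```

In a trained model the weights would be learned jointly with the classifier, so the gate can down-weight a modality when, for example, the transcript is unreliable.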

q-bio.NC 2026-06-29

Toolkit unifies CANN simulation and attractor analysis

by Sichao He, Aiersi Tuerhong +5 more

CANNs: A Toolkit for Research on Continuous Attractor Neural Networks

Python library, Rust backend and homology pipeline recover results on spatial and directional encoding.

Continuous attractor neural networks (CANNs) are the canonical computational framework for how the brain encodes continuous variables such as spatial position, head direction, and movement direction, and explain the activity of hippocampal place cells, entorhinal grid cells, and head-direction cells. CANN research, however, is fragmented: most results rest on lab-specific implementations, general-purpose simulators lack CANN-specific abstractions, and the path from spike trains to attractor geometry in real recordings lacks a standardized toolkit. Here, we present a comprehensive open-source toolkit that unifies the full CANN research workflow. It combines three tightly integrated components: 1) canns, a Python library on BrainPy/JAX that provides standardized 1D/2D CANNs, spike-frequency-adaptation variants, grid cell networks, hierarchical path-integration models, and brain-inspired attractor architectures, together with curated datasets, task generators, an analyzer module and trainer modules for biologically plausible plasticity; 2) canns-lib, a Rust acceleration backend delivering hundreds-of-times speedups for spatial-navigation workloads and modest gains for Ripser-based persistent homology; 3) ASA (Attractor Structure Analyzer), a PySide6 pipeline applying persistent homology and cohomology to experimental neural recordings to detect ring-like and toroidal attractor signatures in real data. The toolkit ships with full-detail reproducible pipelines that recover recent CANN results including SFA-driven anticipative tracking, theta sweeps in head-direction/place/grid systems, and hierarchical path integration.

cs.CV 2026-06-29

XAI should become standard for validating ecological image models

by Brinnae Bent, Holly R. Houliston +3 more

Explainable AI for Biodiversity Monitoring and Ecological Image Analysis

Conservation needs to know not only whether AI predictions are correct but why they are correct.

Artificial intelligence is transforming biodiversity monitoring by enabling automated analysis of ecological imagery collected from camera traps, drones, satellites, underwater platforms, and other sensing systems. These tools can expand the scale and speed of conservation assessments, yet many computer vision models remain difficult to inspect, making it challenging to determine whether predictions are based on ecologically meaningful signals or on spurious correlations, sampling biases, and other artifacts that may undermine conservation decisions. We argue that explainable artificial intelligence (XAI) should become a standard component of ecological model validation because conservation practitioners increasingly depend on understanding not only whether a model is accurate, but why it is accurate. We provide practical guidance for applying XAI to three common ecological computer vision tasks: image classification, object detection, and image segmentation. To illustrate how XAI can support ecological model auditing, refinement, and deployment, we present two case studies using aerial imagery: harbor seal detection and cetacean anatomical segmentation. These examples demonstrate how explanation methods can identify biologically meaningful cues, reveal false positives driven by background and shape confounds, uncover edge and occlusion effects, and guide data collection, augmentation, and retraining strategies. More broadly, they show how explainability can help assess whether model reasoning aligns with ecological understanding. We conclude by identifying key challenges and opportunities. By making model behavior more transparent and scientifically interrogable, XAI can help ensure that AI-supported ecological evidence is more reliable, understandable, and actionable for biodiversity conservation.

q-bio.GN 2026-06-29

scRNA-seq maps human fat-cell development to 15 states

by Weny S. M Sitinjak, Humasak Tommy Argo Simanjuntak

Reconstructing the Developmental Trajectory of Adipocytes in Human Adipose Tissue Using Single-Cell RNA Sequencing

Analysis of adipose tissue finds seven transitional states and names IGF and FGF pathways as the main signals active throughout differentiat

Obesity is a global health crisis associated with metabolic disorders such as type 2 diabetes and cardiovascular disease. This study employed single-cell RNA sequencing to reconstruct the developmental trajectory of human adipocytes from adipose tissue samples. Our analysis identified 15 transcriptionally distinct cell clusters, including 7 transitional states, revealing the dynamic process of adipocyte differentiation. We detected 16 functionally active signaling pathways mediating cellular communication between adipocytes and their progenitors. Among these, insulin-like growth factor (IGF) and fibroblast growth factor (FGF) pathways emerged as the most prominent networks, showing consistent activity across differentiation stages (p<0.05). The study revealed depot-specific differences, with visceral adipocytes undergoing additional extracellular matrix remodeling absent in subcutaneous differentiation. Spatial analysis further showed that IGF signaling was particularly active in perivascular niches, while FGF activity dominated in mature adipocyte zones. These results provide the first comprehensive map of human adipocyte development, highlighting IGF and FGF pathways as potential therapeutic targets. The identified signaling networks offer new insights for developing interventions to promote healthy adipose expansion or inhibit pathological fat accumulation. This work advances our fundamental understanding of adipose tissue biology while providing clinically relevant data for metabolic disorder treatments.

cs.GT 2026-06-29

Reactive Nash equilibria map one-to-one to action subsets

by Franziska Lesigang, Christian Hilbe +1 more

Characterisation of reactive Nash equilibria in repeated additive games

Linear conditions on strategy parameters fully characterize symmetric cases and recover equalizers when all actions are supported.

In this paper, we study reactive strategies in repeated additive games between two players with finitely many actions. Reactive strategies condition only on the opponent's previous action, making them one of the simplest ways players can respond to past interactions. Additive games include important models of cooperation, such as the donation game and games with a punishment option. We show that, for this class of games and strategies, the conditions for symmetric Nash equilibria reduce to a system of linear equalities and inequalities in the strategy parameters, allowing us to characterise all such equilibria. We establish a one-to-one correspondence between non-empty subsets S of the action set and equilibrium classes, which we call S-supporting equilibria. These are equilibria that use exactly the actions in S when playing against themselves. As a special case, we recover the well-known equalizer strategies as the equilibria supported on the entire action set. To assess which equilibrium classes are most evolutionarily relevant, we complement our analytical characterisation with simulations of social learning dynamics. We find that their prevalence is determined by two factors: how likely they are to be generated and how robust they are against invasion.
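
The equalizer property mentioned above can be checked numerically in the simplest additive game. The following is a minimal sketch, not the authors' code: it assumes a two-action donation game with benefit `b` and cost `c`, no discounting, and reactive strategies written as pairs `(p, q)` (probability of cooperating after the opponent cooperated or defected). The function names are hypothetical. Under these assumptions, a strategy with `p - q = c / b` fixes the opponent's long-run payoff at `b * q`, whatever the opponent plays.

```python
def cooperation_rates(s1, s2):
    """Long-run cooperation frequencies (x, y) of two reactive strategies.

    Each strategy is a pair (p, q): probability of cooperating after the
    opponent cooperated (p) or defected (q) in the previous round.
    The frequencies solve x = q1 + r1 * y and y = q2 + r2 * x,
    where r = p - q for each player.
    """
    r1 = s1[0] - s1[1]
    r2 = s2[0] - s2[1]
    denom = 1.0 - r1 * r2
    if abs(denom) < 1e-12:
        raise ValueError("stationary frequencies are not unique for this pair")
    x = (s1[1] + r1 * s2[1]) / denom
    y = (s2[1] + r2 * s1[1]) / denom
    return x, y


def payoff(s1, s2, b, c):
    """Average per-round payoff of player 1 in the donation game."""
    x, y = cooperation_rates(s1, s2)
    # Player 1 receives b whenever player 2 cooperates and pays c when cooperating.
    return b * y - c * x


if __name__ == "__main__":
    b, c, q = 3.0, 1.0, 0.2
    equalizer = (q + c / b, q)  # p - q = c / b
    for opponent in [(1.0, 0.0), (0.0, 0.0), (0.9, 0.4), (0.3, 0.8)]:
        print(opponent, round(payoff(opponent, equalizer, b, c), 6))
```

Every opponent in the loop earns the same payoff against the equalizer, which is why no unilateral deviation is profitable.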

q-bio.PE 2026-06-26

Phylogenetic likelihoods gain up to 10x on multi-core CPUs and GPUs

by Karthik Gangavarapu, Xiang Ji +5 more

BEAGLE 4.1: A high-performance library for computation on phylogenetic trees across diverse parallel architectures

New gradient algorithms and hardware support in the library scale with taxa and site patterns for both nucleotide and codon models.

Efficient evaluation of sequence data likelihoods and their high-dimensional gradients on phylogenetic trees improves inference under both maximum-likelihood and Bayesian frameworks. Here, we present BEAGLE 4.1, a high-performance library for statistical phylogenetics that incorporates new algorithms to evaluate these gradients on phylogenetic trees. We also provide new hardware implementations for both likelihoods and gradients supporting ARM NEON intrinsics and optimized matrix multiplication units -- called tensor cores -- on NVIDIA graphics processing units (GPUs). We benchmark the performance scaling of the library across a number of patterns and taxa on multi-core CPUs and GPUs, and compare the speedup afforded by NVIDIA and AMD GPUs as well as performance scaling with an increasing number of GPUs. We show that multi-core CPU implementations provide up to a fourfold speedup over single-threaded CPU implementations and up to a tenfold speedup for nucleotide and codon models, respectively, with performance generally improving as the number of taxa and site patterns increases. GPUs outperform multi-threaded CPU implementations for a realistic number of patterns, even for nucleotide models with a small state-space size of 4, while for codon models they provide substantially higher performance gains even for a single pattern or four taxa. Tensor cores on GPUs provide up to 2-fold speedup relative to standard CUDA cores for codon models. Using NEON instructions on ARM CPUs affords up to a $\sim 1.3$-fold speedup over a non-SIMD implementation, with the speedup going down to 1.1-fold at 8 CPU threads. We provide these new algorithms to evaluate the gradient and efficient hardware implementations for both likelihood and gradient calculations through BEAGLE 4.1, such that they can be readily integrated into phylogenetic software packages.
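
For readers unfamiliar with the computation being accelerated, the per-site likelihood on a tree is usually obtained with Felsenstein's pruning recursion. The sketch below is a plain-Python illustration and does not use or reproduce the BEAGLE API. It assumes the Jukes-Cantor (JC69) nucleotide model with uniform base frequencies, and a hypothetical tree encoding in which a leaf is a one-letter string and an internal node is a list of `(child, branch_length)` pairs.

```python
import math

STATES = "ACGT"


def jc69_matrix(t):
    """JC69 transition probabilities for a branch of length t
    (expected substitutions per site)."""
    e = math.exp(-4.0 * t / 3.0)
    same = 0.25 + 0.75 * e
    diff = 0.25 - 0.25 * e
    return [[same if i == j else diff for j in range(4)] for i in range(4)]


def partial_likelihoods(node):
    """Probability of the observed leaves below `node`, for each state at `node`."""
    if isinstance(node, str):  # leaf: indicator vector of the observed base
        return [1.0 if s == node else 0.0 for s in STATES]
    result = [1.0] * 4
    for child, t in node:
        P = jc69_matrix(t)
        below = partial_likelihoods(child)
        for x in range(4):
            # Sum over the child's state, weighted by the transition probability.
            result[x] *= sum(P[x][y] * below[y] for y in range(4))
    return result


def site_likelihood(root):
    """Likelihood of one site pattern under uniform root frequencies."""
    return sum(0.25 * v for v in partial_likelihoods(root))


if __name__ == "__main__":
    # ((A:0.1, C:0.2):0.05, G:0.3) written in the hypothetical list encoding.
    tree = [([("A", 0.1), ("C", 0.2)], 0.05), ("G", 0.3)]
    print(site_likelihood(tree))
```

The inner sum over states is repeated for every node and every site pattern, and with a 61-state codon model it becomes a sizeable matrix-vector product, which is the kind of workload that parallel hardware handles well.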

cs.CV 2026-06-26

ZIBeta distribution model outperforms regression in TPS prediction

by Krzysztof Pysz, Artur Bartczak +3 more

Distribution-based deep multiple instance learning for tumor proportion scoring in NSCLC

Multiple instance learning estimates tumor proportion score distributions from slide labels alone, improving accuracy over linear baselines

Accurate assessment of tumor proportion score (TPS) in non-small cell lung cancer (NSCLC) is critical for treatment planning and prognosis. Key challenges include the tedious manual work required to annotate each slide, combined with the limited number of experts certified for this task. Multiple instance learning (MIL) has proven to be an effective approach for predicting TPS scores at the slide level; however, existing methods struggle with non-expressive (zero class) images. Our approach involves two models: (1) an embedding-extraction and multiclass-classification network that captures the histopathological features of individual patches, and (2) a MIL model that aggregates these embeddings to predict zero-inflated beta (ZIBeta) parameters representing the overall TPS probability distribution for the entire slide. Using only slide-level TPS scores as labels, we demonstrate how this end-to-end framework can leverage a novel distribution-based architecture to improve prediction accuracy and explainability. ZIBeta modeling significantly outperforms baseline linear and ridge regression while capturing expected accuracy through distribution concentration.
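
To make the zero-inflated beta idea concrete, here is a minimal sketch of the textbook ZIBeta log-density and mean. It is not taken from the paper: the parameter names `pi0`, `alpha`, and `beta` and the function names are assumptions, and the authors' parameterization and loss may differ. A point mass `pi0` sits at exactly zero (the non-expressive class), and the remaining probability follows a Beta(`alpha`, `beta`) density on the open interval (0, 1).

```python
import math


def zibeta_log_prob(y, pi0, alpha, beta):
    """Log-probability of a score y in [0, 1) under a zero-inflated beta.

    pi0   : probability of an exact zero
    alpha : first beta shape parameter (> 0)
    beta  : second beta shape parameter (> 0)
    """
    if not (0.0 < pi0 < 1.0 and alpha > 0.0 and beta > 0.0):
        raise ValueError("need 0 < pi0 < 1 and positive shape parameters")
    if y == 0.0:
        return math.log(pi0)
    if not 0.0 < y < 1.0:
        raise ValueError("y must lie in [0, 1)")
    # log of 1 / B(alpha, beta), computed stably through log-gamma
    log_norm = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    return (math.log1p(-pi0) + log_norm
            + (alpha - 1.0) * math.log(y)
            + (beta - 1.0) * math.log1p(-y))


def zibeta_mean(pi0, alpha, beta):
    """Expected score: the zero mass contributes nothing to the mean."""
    return (1.0 - pi0) * alpha / (alpha + beta)


if __name__ == "__main__":
    # A slide predicted to be mostly non-expressive with a low expected TPS.
    print(zibeta_mean(0.3, 2.0, 6.0))
    print(-zibeta_log_prob(0.25, 0.3, 2.0, 6.0))  # negative log-likelihood
```

A network that outputs these three parameters per slide can be trained by minimizing the negative log-likelihood of the slide-level score, and a tightly concentrated predicted distribution can then be read as higher confidence.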

q-bio.NC 2026-06-26

Stronger inhibitory-to-excitatory synapses explain chronic stress effects

by Mauricio A Diaz, Manuela A. Beyer +1 more

Modelling chronic stress as an excitatory-inhibitory perturbation in recurrent working-memory networks

This single change matches all observed prefrontal signatures and produces resilient but less flexible networks.

Stress is an adaptive response coordinated by neural and physiological systems. While acute stress can enhance survival, chronic stress drives structural brain changes, cognitive dysfunction, and increased psychiatric risk. At the cellular level, chronic stress shifts the excitatory-inhibitory (E/I) balance of prefrontal pyramidal neurons toward inhibitory dominance, yet the mechanisms underlying these alterations are still unknown. We here investigate possible mechanisms causing inhibitory dominance using recurrent neuronal networks trained on a working memory task. Chronic stress is modelled as a modulation in synaptic strength or neuronal activity, systematically comparing eight candidate operators against three experimentally motivated signatures of stress-induced prefrontal dysfunction: inhibitory dominance, excitatory hypofunction, and impaired task performance. These signatures are all recovered by a single stress mechanism, stronger inhibitory-to-excitatory synapses. Contrasting naive networks with resilient networks trained under the stress mechanism, we find that resilience training not only preserves task performance under stress, but also confines the network to the same dynamical subspace and energetic regime with and without stress. This resilience comes at a cost: resilient networks generalise less well when the task requires longer memory than seen during training, indicating that resilient networks find a specialised solution tuned to the trained regime. This trade-off between resilience and generalisation performance persists across stress magnitude and network size, offering a computational analogue of the shift toward rigid, habit-like behaviour reported in animals following chronic stress.

q-bio.GN 2026-06-26

GRAFT dataset links genes to traits in same Arabidopsis plants

by Manuel Serna-Aguilera, Vanshika Jindal +6 more

GRAFT: Biological Graph and Hypergraph Benchmarks for Linked Gene Expression and Phenotypic Trait Prediction in Arabidopsis thaliana

First resource pairs gene expression profiles with heterogeneous phenotypic data from identical specimens for genome-to-phenome mapping.

Understanding which genes control which traits in an organism remains one of the central challenges in biology. Despite significant advances in data collection technology, our ability to map genes to traits is still limited. This genome-to-phenome (G2P) challenge spans several problem domains, including plant breeding, and requires methods capable of reasoning over high-dimensional, heterogeneous, and biologically structured data. Current datasets and data repositories, however, are not well-equipped for this task. Current studies do not link gene expression and trait data, and most focus on very specific traits, limiting the breadth of possible correlations. To address this gap, we present the novel Gene-Graph Regression for Arabidopsis Functional Traits (GRAFT) dataset, a curated multi-modal dataset linking gene expression profiles with phenotypic trait measurements in Arabidopsis thaliana, a model organism in plant biology. GRAFT supports tasks such as phenotype prediction and interpretable graph learning. In addition, we benchmark conventional regression and explanatory baselines, including a biologically-informed hypergraph baseline, to validate gene-trait associations. To the best of our knowledge, this is the first dataset to provide multimodal gene information and heterogeneous trait or phenotype data for the same Arabidopsis thaliana specimens. With GRAFT, we aim to foster research to accurately understand the relationship between genotypes and phenotypes using gene information, higher-order gene pairings, and trait data from multiple sources.

q-bio.QM 2026-06-26

Emulators match marine model skill over decades at lower cost

by Jozef Skakala, Ieuan Higgs +1 more

Deep learning model emulators for marine biogeochemistry forecasting from days to decades

LSTM and 1D-CNN networks stay stable for multi-decadal runs, forecast spring blooms years ahead, and beat the parent model when trained on r

Deep-learning emulators have emerged as a promising approach for reducing the computational cost of Earth System Models while potentially improving forecasting skill. Here, we demonstrate the successful emulation of a high-complexity marine biogeochemistry model within a simplified one-dimensional water-column framework. We explore two emulator architectures: Long Short-Term Memory (LSTM) neural networks that emulate a selected subset of variables at daily resolution, and physics-informed one-dimensional Convolutional Neural Networks (1D CNNs) that emulate the full pelagic system throughout the water column, also at daily resolution. Using ocean physics simulator inputs, both emulators remain largely stable over multi-decadal timescales and accurately reproduce the parent model in both decadal climate projections and short-range (10-day) forecasting applications. The former includes the ability to predict the timing of phytoplankton spring blooms several years in advance. When trained on reanalysis data, the emulators substantially outperform the parent model's forecast skill score for several key ecosystem variables, including phytoplankton and zooplankton. If similar performance can be achieved in three-dimensional regional applications, these emulators could provide substantially higher-quality predictions at a fraction of the computational cost. We further apply novel explainability techniques to identify key drivers of emulator behaviour and gain insights into emergent ecosystem dynamics. Performance is evaluated using a range of metrics, including the reproduction of daily variability and extreme events. These approaches have considerable potential for future applications in operational forecasting, climate-scale simulations, and marine autonomous systems.

q-bio.QM 2026-06-26

Energy allocation links scaling exponents to von Bertalanffy growth

by Hana Krakovská, Klaus Stiefel +1 more

Metabolic scaling, von Bertalanffy growth and an exponent equation

Feasibility of energy fractions imposes constraints on developmental speed and mass scales.

In this work, we interpret developmental growth as a metabolic energy allocation problem and link the von Bertalanffy growth model to metabolic energy investments into the growth channel. Using a framework that specifies how metabolic energy is allocated among baseline maintenance, growth, and other processes, we analyse the resulting growth allocation patterns and derive direct relationships between key scaling exponents: the mass-growth exponent, the length-based exponent, the metabolic scaling exponent, and the geometric exponent, which describes the mass-length relationship. These exponents determine the metabolic investment exponent, which controls the qualitative behaviour of the growth-allocation function. Requiring the inferred allocation fraction to remain biologically feasible, we derive constraints on developmental velocity and characteristic mass scales. This provides a physical, energy-based interpretation of phenomenological growth curves and clarifies how metabolic scaling, geometric scaling, and growth dynamics are interrelated within a single allocation framework.
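
As a point of reference for the growth model named above, the following sketch evaluates the classic von Bertalanffy mass equation, dm/dt = a m^(2/3) - b m. The abstract works with general exponents and does not state this form, so the 2/3 supply exponent, the coefficient names `a` and `b`, and the function names are all assumptions made for illustration. Substituting u = m^(1/3) turns the equation into the linear form du/dt = (a - b u) / 3, which gives the closed-form curve used here, and a forward-Euler integration serves as an independent check.

```python
import math


def vb_mass(t, m0, a, b):
    """Closed-form mass at time t for dm/dt = a * m**(2/3) - b * m.

    With u = m**(1/3), du/dt = (a - b * u) / 3, so u relaxes
    exponentially to a / b and the asymptotic mass is (a / b)**3.
    """
    u_inf = a / b
    u0 = m0 ** (1.0 / 3.0)
    u = u_inf - (u_inf - u0) * math.exp(-b * t / 3.0)
    return u ** 3


def vb_mass_euler(t_end, m0, a, b, steps=20000):
    """Forward-Euler integration of the same equation, as a numerical check."""
    dt = t_end / steps
    m = m0
    for _ in range(steps):
        m += dt * (a * m ** (2.0 / 3.0) - b * m)
    return m


if __name__ == "__main__":
    a, b, m0 = 3.0, 0.5, 1.0  # illustrative values only
    for t in (0.0, 5.0, 20.0, 100.0):
        print(t, round(vb_mass(t, m0, a, b), 3))
    print("asymptotic mass:", (a / b) ** 3)
```

In this special case the supply term scales with surface area and the maintenance term with mass, so growth stops where the two balance.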

q-bio.PE 2026-06-26

Library scales phylogenetic shape inference to 850-node trees

by Gefan Yang, Marcus Teller +3 more

Hyperiax and Phylogenetic Inference from Shape Data

Hyperiax applies BFFG to 118-landmark butterfly wings and 79-landmark bird beaks on trees far larger than earlier studies allowed.

Phylogenetic inference on high-dimensional morphological traits requires algorithms that account for both the nonlinear geometry of the shape data and the phylogenetic tree structure. The Backward Filtering Forward Guiding (BFFG) framework provides smoothing for nonlinear stochastic processes on trees and enables inference of parameters and ancestral states. As practical adoption has been limited by a lack of efficient implementations, we present Hyperiax, an open-source library for tree traversal algorithms and message passing using JAX, designed particularly to support operations needed for BFFG. Hyperiax enables efficient execution of operations on trees with large numbers of nodes and, coupled with the BFFG-specific operations, this allows efficient inference in both discrete-time and stochastic differential equation models. Concretely, we demonstrate that Hyperiax enables parameter inference and ancestral reconstruction for butterfly wing shapes represented by landmarks in two dimensions, and analyses of avian beaks from landmarks in three dimensions. Both cases apply BFFG to substantially larger phylogenetic trees (850 and 696 nodes) and higher-resolution shape data (118 two-dimensional landmarks and 79 three-dimensional landmarks, respectively) than previously possible.
