q-bio.OT — Pith

q-bio.OT 2026-07-03

Positive cues raise dog approaches to humans

by Srijaya Nandi, Dipanjan Roy +3 more

Operant Conditioning in Indian Free-Ranging Dogs: Effects of Positive and Threatening Cues on Sociability

Five-day experiments show learned sociability changes partly carry over to new people only after positive experiences.

Sociability toward humans is a key adaptive trait in free-ranging dogs, enabling them to access resources while navigating risks associated with human interactions. In this study, we investigated whether operant conditioning shapes sociability in Indian free-ranging dogs and whether learned responses generalize to unfamiliar individuals. We experimentally exposed 58 dog groups to either a positive or a threatening cue over five consecutive days and assessed their behaviour using approach proportion, approach latency, and demeanor across repeated interactions with a familiar experimenter, followed by a test with an unfamiliar individual. Using Bayesian generalized linear mixed models, we found that cue type and repeated exposure significantly influenced sociability. Dogs exposed to a positive cue showed increased approach behaviour and reduced approach latency over time, along with increased affiliative demeanor. In contrast, dogs exposed to threatening cues exhibited reduced approach behaviour, increased approach latency, and a shift toward neutral and less affiliative responses across days. Importantly, positive cues partially generalized across individuals, as dogs showed increased approach toward an unfamiliar experimenter, although this was accompanied by hesitation to approach. In contrast, threatening cues did not generalize in the same way; dogs did not reduce their approach toward unfamiliar individuals but displayed increased approach latency, indicating heightened caution. Our findings demonstrate that operant conditioning plays a crucial role in shaping dog-human interactions, with asymmetric generalization of positive and threatening experiences.

q-bio.OT 2026-06-30

Soccer headers carry under 20% concussion risk

by Christopher Lewis, Anu Tripathi +6 more

Head Kinematics and Brain Tissue Deformation from Soccer Heading: A Review of Implications for Brain Injury Risk

Review of kinematics data shows higher motion in matches, corner kicks and older players but predicted mild brain injury odds remain low.

Purpose: Repeated heading of soccer balls has raised concerns of potential long-term neurological effects. Consequently, numerous studies have estimated head kinematics and brain deformation due to soccer headers across different cohorts and play scenarios to identify higher risk conditions. However, heterogeneity in study design, data collection, and analysis has produced inconsistent findings, and injury risk is infrequently reported. Therefore, a meta-analysis of the existing literature was conducted to identify knowledge gaps and inform future studies assessing injury risk in soccer. Methods: We synthesized data from studies reporting head kinematics or brain deformation from soccer headers on human subjects. The data from these studies were analyzed to obtain the risk of mild traumatic brain injury (mTBI) based on applicable injury metrics and risk curves. Results: The meta-analysis revealed specific trends, indicating that match scenarios, corner and goal-kicks, top and oblique impacts, and older age cohorts were associated with higher head kinematics, while sex-based trends were inconclusive. The choice of sensor system affected the estimated head kinematics, with headband sensors consistently measuring higher kinematics than mouthpiece sensors. The data showed large variability stemming from heterogeneous study designs, limiting the applicability of the observed trends. These factors also influenced injury risk predictions, with estimated concussion risks generally below 20%. Conclusion: This review reveals trends in mTBI risk from soccer heading across different cohorts and play scenarios. It also underscores the need for standardized reporting of kinematics and brain deformation to enable mTBI risk estimation and meaningful cross-study comparisons.

q-bio.OT 2026-06-29

Three-tier data upgrade proposed to make space biology AI-ready

by Sylvain V. Costes, Sergio Garcia Busto +16 more

Building AI-Ready Data Systems for Space Life Sciences, Aerospace Medicine, and Deep Space Exploration

Progressing from FAIR to AI-ready to space-ready closes the access gap and enables agent-based research for deep space missions.

While AI holds the potential to revolutionize space life sciences, realizing this promise is contingent upon the systematic restructuring of heterogeneous spaceflight biological data into machine-actionable, AI-ready forms. Although open access principles support human reuse and scientific reproducibility, they do not necessarily enable AI systems to access and analyze such a diverse set of scientific datasets. In addition, the growing array of AI approaches places distinct demands on data structure, metadata, and access interfaces. To meet these growing demands, we propose a three-tier approach, proceeding from FAIR to AI-ready to space-ready data. We discuss existing infrastructures and how they can be improved to close the AI access gap. We conclude by proposing a neutral international coordinating body as the governance backbone for the trustworthy, agent-accessible space biology infrastructure that deep space biological research will require.

cs.AI 2026-06-23

Distinct timescales turn closed organizations into agents

by Amahury J. López-Díaz, Carlos Gershenson

A Matter of Time: Towards a General Theory of Agency

A theory shows how associating processes in self-referential systems with different timescales produces endogenous anticipation and open-end

Agency is often invoked in research on philosophy, biology, and cognitive science without a clear account of how it originates from material organization. Building on temporally parametrized (F, A)-systems, this paper develops a graded organizational theory of agency grounded in relational biology, physical biosemiotics, and process ontology. We argue that self-referential closure cannot be adequately conceived outside time: once the constitutive processes of a semantically closed organization are associated with distinct characteristic timescales, the organization unfolds into an out-of-sync dependency structure that can be formally redescribed as a history-dependent, revisable Asynchronous Dynamic Bayesian Network. This move allows for a principled distinction between autonomy, goal-directedness, agency, and open-endedness. Autonomy arises from precarious closure to efficient causation under material openness; goal-directedness from the maintenance of viability-supporting organization; agency appears when such organization acquires an endogenous anticipatory structure that selectively modulates organism-environment coupling in light of possible futures; open-endedness begins when this anticipatory organization can reconstruct its own future space of possibilities. Our framework reconciles Rosennean anticipation with organizational closure, restricts Markov blankets and active inference to derived formal redescriptions rather than first principles, and reinterprets computational enactivism in non-Fristonian terms. By deriving weaker temporalized organizations, our contribution outlines a hierarchy from proto-agential chemical systems to fully semantically closed agents, with implications for multicellular organisms, synthetic lifeforms, and neuroscience.

cs.AR 2026-06-22

Computer systems must be redesigned for biological data analysis

by Nika Mansouri Ghiasi, Konstantina Koliogeorgi +1 more

Architecture for Health Initiative (Arch4Health): Computational Challenges in Health-Related Applications and the Role of Computer Architecture in Addressing Them

High-throughput biotech data outpaces conventional hardware, requiring architecture changes to deliver efficient, private healthcare process

Recent biotechnological advances enable high-throughput, low-cost, and accurate biological data generation. This wealth of data enables unique opportunities for advancing healthcare. Despite these opportunities, efficiently analyzing large-scale biological data poses significant challenges for conventional computing systems. These systems often cannot keep up with the high-throughput rate at which data is generated, and they face additional constraints related to energy efficiency, scalability, privacy, and security. Therefore, to facilitate the wide adoption of recent advances in healthcare, there is a need to optimize the computing systems to enable high-performance, energy-efficient, low-cost, private, and secure analysis of biological data. We introduce the Architecture for Health (Arch4Health) initiative, which aims to (i) identify and analyze key computational challenges in current and future health- and life science-related applications and (ii) explore how computer architects and computing system designers can advance healthcare by addressing these challenges. In this short paper, we first present the motivations behind the Arch4Health initiative and, second, elaborate on its vision and goals, related topics, Arch4Health workshops, and future outlooks.

q-bio.OT 2026-06-22

Randomized test isolates which prompt parts drive clinical AI accuracy

by Bin Hu, Avneek Sandhu +1 more

PROMPT: A Pre-registered Randomized Protocol for Component-Level Evaluation of Clinical AI Prompts

PROMPT uses matched controls to show a decoding rule as the sole active element while safety blocks and scaffolding prove neutral or harmful

BACKGROUND: Prompt engineering shapes medical AI outcomes, but prompt components are rarely tested as clinical interventions. We developed PROMPT (Pre-registered Randomized Outcome Measurement for Prompt Testing), a protocol using pre-specification, randomization, matched controls, dismantling, and decision rules. METHODS: Two pre-registered demonstrations used Claude Sonnet 4.6. Exp 1 used a synthetic tumbling-E orientation task: 630-trial main study, 480-trial dismantling study, and 1,050-trial 2x2 factorial extension. Exp 2 used the same arms on 16 label-masked CBIS-DDSM mammographic crops in four orientations: 256 confirmatory trials and a 64-trial Arm E extension. Matched controls removed the active component while preserving framing, structure, and output format. RESULTS: PROMPT identified beneficial, inactive, harmful, and task-dependent effects. In Exp 1, the full prompt achieved 98.6% orientation accuracy; removing the decoding rule reduced accuracy to 50.1% (difference, +48.5 pp; 95% CI, +43.2 to +53.7; P<0.001). A rule-only arm matched the full prompt (maximum difference, 2.3 pp), identifying the decoding rule as the sole measurable active component. A prohibited-reasoning block assumed to improve safety was inactive, an effect missed by whole-prompt comparison. Scaffolding without the task-specific rule underperformed the vehicle prompt, showing prompt structure alone was harmful. Exp 1 revealed a canonical-RIGHT error phenotype in no-rule arms, consistent with a RIGHT-orientation prior. In Exp 2, the phenotype recurred on mammographic images, but the rule's benefit was attenuated and did not meet the threshold (+14.1 pp; bootstrap 95% CI, -3.1 to +29.7; post-hoc mixed-model 95% CI, +3.5 to +24.6). CONCLUSION: PROMPT revealed component effects missed by whole-prompt evaluations, identifying safety vulnerabilities and performance failures before clinical AI deployment.
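
The Exp 1 headline result is a difference between two accuracy proportions reported with a confidence interval. The sketch below shows one common way to compute such an interval, a Wald interval on the risk difference. The abstract does not state which interval method the authors used, and per-arm trial counts are not given, so the function name `risk_difference_ci` and the counts in the usage lines are hypothetical.

```python
import math


def risk_difference_ci(successes_a, n_a, successes_b, n_b, z=1.96):
    """Return (difference, lower, upper) for p_a - p_b using a Wald interval.

    Assumption: a simple normal-approximation interval; the source abstract
    does not say which method produced its reported 95% CI.
    """
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    diff = p_a - p_b
    # Standard error of the difference of two independent proportions.
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return diff, diff - z * se, diff + z * se


# Hypothetical counts for illustration only (not taken from the study).
diff, lower, upper = risk_difference_ci(90, 100, 50, 100)
print(f"difference = {diff * 100:+.1f} pp, 95% CI {lower * 100:+.1f} to {upper * 100:+.1f}")
```

An interval that excludes zero, as in the Exp 1 contrast, is what supports calling a component active; the Exp 2 bootstrap interval (-3.1 to +29.7) includes zero, which is why the benefit did not meet the threshold there.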

q-bio.OT 2026-06-22

Top pistachio genotypes hit 99% germination and match in genetic clusters

by Aram Akram Mohammed, Ibrahim Maaroof Noori

Germination capacity of pistachio (Pistacia vera L.) seeds related to genotypic variation and phytochemical contents

ISSR and RAPD markers place high-performers together while low ones like G4 and G8 group separately.

Genetic diversity and phytochemical components are endogenous factors that influence seed germination. The current study aimed to compare the seed germination capacity of 15 Pistacia vera genotypes after assessing their genotypic variation using 32 primers (16 ISSR and 16 RAPD) and their phytochemical contents. The results showed that the ISSR primers classified the 15 P. vera genotypes into four groups, while the RAPD primers classified them into three groups. The genotypes G11, G5, G1, G9, G6, G14, and G10 had the highest germination percentages (98.89, 97.67, 96.67, 94.44, 93.33, 93.33, and 91.11%, respectively), and their germination speeds were also the highest. In contrast, the lowest germination percentages (62.22 and 68.59%) were recorded in G8 and G4, respectively. Meanwhile, (G9, G10, and G11), (G1 and G14), and (G5 and G6) were grouped together by both ISSR and RAPD primers. G4 and G8 also fell in the same subgroup based on RAPD primers. Moreover, the maximum protein percentages (21.88, 21.88, and 20.78%) were measured in the seed kernels of G9, G11, and G1, respectively. Soluble sugar content was highest (798.9 µg g⁻¹) in G11, and oil percentage was highest (45.3%) in G5.

q-bio.OT 2026-06-19

Dataset labels 5996 km² of cocoa and non-cocoa land at 99% accuracy

by Kasimir Orlowski, Michele Meroni +5 more

COLD-CI: A large-scale very high-resolution label polygon dataset for cocoa and non-cocoa classification in Cote d'Ivoire

123736 polygons from 0.5 m imagery released for transparent cocoa mapping and monitoring in Côte d'Ivoire

Spatially explicit information on cocoa cultivation is essential for land-use planning, deforestation monitoring, environmental assessment, and supply-chain analysis. Although several cocoa map products exist, their underlying reference data are often not publicly available, limiting transparency and methodological benchmarking. Here, we present a large-scale, very high-resolution cocoa and non-cocoa label polygon dataset for Cote d'Ivoire (COLD-CI), covering the main cocoa-producing regions as well as contrasting non-cocoa landscapes. COLD-CI consists of 123,736 vector polygons corresponding to a total labelled area of 5,996 km^2, including 58,107 cocoa polygons (1,788 km^2) and 65,629 background polygons (4,208 km^2). Polygon label candidates were first generated through conservative automated filtering of polygons from the West Africa Cocoa dataset and the combination of multiple external thematic datasets. These candidates were subsequently refined and complemented through systematic visual interpretation, manual correction, and digitisation using very high-resolution (0.5 m) satellite imagery. The resulting label polygons capture cocoa planted areas and associated fine-scale internal heterogeneity, as well as a wide range of non-cocoa land-cover types. Independent validation using field-based and expert photointerpreted reference data from the Copernicus4GEOGLAM validation dataset indicated an overall agreement of 99%, with producer's and user's accuracy exceeding 98% for both cocoa and background classes. COLD-CI is released as a vector dataset with associated metadata to support transparent benchmarking, model development, and validation across a wide range of spatial resolutions.
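
The validation figures quoted above (overall agreement, producer's accuracy, user's accuracy) are all derived from a confusion matrix of reference versus labelled classes. A minimal sketch of those standard definitions follows. The function name `accuracy_metrics` and the example counts are hypothetical and are not taken from the COLD-CI validation.

```python
def accuracy_metrics(matrix, labels):
    """Compute overall, producer's, and user's accuracy.

    matrix[i][j] is the number of samples whose reference class is labels[i]
    and whose mapped (dataset) class is labels[j].
    """
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(labels)))
    overall = correct / total

    producers = {}  # share of each reference class that was labelled correctly
    users = {}      # share of each mapped class that is correct on the ground
    for i, label in enumerate(labels):
        reference_total = sum(matrix[i])
        mapped_total = sum(row[i] for row in matrix)
        producers[label] = matrix[i][i] / reference_total
        users[label] = matrix[i][i] / mapped_total
    return overall, producers, users


# Hypothetical counts for illustration only.
labels = ["cocoa", "background"]
matrix = [
    [98, 2],   # reference cocoa: 98 labelled cocoa, 2 labelled background
    [1, 99],   # reference background: 1 labelled cocoa, 99 labelled background
]
overall, producers, users = accuracy_metrics(matrix, labels)
print(overall, producers, users)
```

Producer's accuracy corresponds to recall per reference class and user's accuracy to precision per mapped class, so reporting both for cocoa and background guards against a high overall figure hiding a weak class.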

q-bio.OT 2026-06-17

Compact model matches large AI on Bangladeshi fish ID

by Md Nasiat Hasan Fahim, Md. Abid Ullah Muhib +1 more

Protein-Based Fish Species Identification: Dataset, Models, and Insights from Native Bangladeshi Fish

Dataset of 2845 sequences lets a small hybrid rival ProtBERT while using no GPUs and far less memory

Correct identification of fish species is highly significant for food security, economic development, and climate resilience in Bangladesh. Protein sequences directly reflect functional and evolutionary constraints, which are important for species authentication and biodiversity monitoring. Yet no benchmark exists for identifying native Bangladeshi fish species from protein sequences. In this study, we address this gap by introducing the first curated dataset of 2845 high-quality protein sequences for nine native Bangladeshi fish species. We also establish the first protein sequence classification baseline for this domain through systematic benchmarking of seven architectural paradigms. Moreover, we propose a realistically deployable hybrid architecture combining MotifCNN and a Transformer with Terminal-Aware Positional Encoding (MotifCNN-Transformer+TA-PE). Our architecture achieves 79.80% accuracy with a macro-F1 of 0.80. The highest accuracy, 83.04%, is achieved by the fine-tuned protein language model ProtBERT, which has 420M parameters and requires dual 16GB GPUs for inference. According to McNemar's test, ProtBERT's 3.24 percentage-point accuracy gain over MotifCNN-Transformer+TA-PE is not statistically significant (p = 0.1120). Our architecture outperforms ProtBERT in six of the nine classes in per-class identification. MotifCNN-Transformer+TA-PE is also approximately 5x faster and 42x smaller, supports a 16x larger batch size than ProtBERT, and runs GPU-free inference, making it more practical for deployment in resource-constrained areas such as rural Bangladesh. Beyond this, our foundational work shows the effects of phylogenetic relationships on sequence similarity and establishes pathways for fisheries management, food authentication, and biodiversity conservation in South Asia's protein-dependent economy.
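
McNemar's test compares two classifiers on the same test set using only the discordant cases, that is, the samples one model gets right and the other gets wrong. The sketch below implements the exact (binomial) two-sided form. The abstract does not say whether the authors used the exact or the chi-square variant, and it does not report the discordant counts, so the function name `mcnemar_exact` and the counts in the usage lines are hypothetical.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value.

    b: samples only model A classified correctly.
    c: samples only model B classified correctly.
    Under the null hypothesis, each discordant sample is equally likely
    to favour either model, so min(b, c) follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical discordant counts for illustration only.
p_value = mcnemar_exact(b=30, c=19)
print(f"p = {p_value:.4f}")
```

Because the test ignores samples both models classify identically, a gap of a few percentage points on a modest test set can easily fail to reach significance, which is consistent with the result reported above.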

q-bio.OT 2026-06-15

Entropy cost per heartbeat stays constant across body sizes

by Mesfin Taye

Biological proper time and entropy-cost invariance in cardiac and respiratory lifespan scaling

Kleiber power scaling and quarter-power frequency scaling cancel, leaving thermodynamic cycle cost independent of mass.

Warm-blooded vertebrates accumulate approximately conserved numbers of physiological cycles over a natural lifetime: of order $10^9$ heartbeats and $10^8$--$3\times10^8$ breaths. These regularities are not exact constants, but their persistence across orders-of-magnitude variation in body mass, metabolic power, physiological frequency, and lifespan suggests that biological time is not measured by chronological duration alone. We develop the Principle of Biological Time Equivalence (PBTE), a thermodynamic framework in which lifetime cycle count is determined by the ratio between total lifetime entropy production and the entropy cost of one physiological cycle. Starting from the open-system entropy balance $\dot S=\dot e_p-\dot h_d$, we define the entropy cost per cycle as $\sigma_0=d\Sigma/dN$, where $d\Sigma$ is the entropy produced as the physiological clock advances by $dN$ cycles. For an adult homeostatic regime, this gives the cycle-count relation $N_\star=\Sigma/\langle\sigma_0\rangle$, with $\Sigma=\int_0^L \dot e_p(t)\,dt$, where $N_\star$ is the lifetime cycle count, $\Sigma$ is total lifetime entropy production, and $\langle\sigma_0\rangle$ is the lifetime-averaged entropy cost per cycle. In the homeostatic limit, $\dot e_p\simeq P/T$, so direct measurement of metabolic power $P$, body temperature $T$, and physiological frequency $f$ gives $\sigma_0\simeq P/(Tf)$. PBTE converts the empirical lifetime-cycle invariants into entropy-cost invariants. Under Kleiber metabolic scaling and quarter-power physiological-frequency scaling, the mass-specific entropy cost satisfies $\bar\sigma_0=P/(TfM)\propto M^{3/4+1/4-1}=M^0$, providing a thermodynamic interpretation of allometric mass cancellation.
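
The mass cancellation in the last sentence can be checked numerically. The sketch below evaluates $\sigma_0\simeq P/(Tf)$ and the mass-specific cost $\bar\sigma_0=P/(TfM)$ under Kleiber scaling $P=aM^{3/4}$ and frequency scaling $f=bM^{-1/4}$. The prefactors `a` and `b`, the temperature default, and the sample inputs are arbitrary placeholders chosen for illustration, not values from the paper.

```python
from fractions import Fraction


def entropy_cost_per_cycle(power_w, temp_k, freq_hz):
    """sigma_0 ~ P / (T f), in J/K per cycle, in the homeostatic limit."""
    return power_w / (temp_k * freq_hz)


def mass_specific_cost(mass_kg, a=3.4, b=4.0, temp_k=310.0):
    """sigma_0 / M under P = a M^(3/4) and f = b M^(-1/4).

    a and b are placeholder prefactors (assumptions), not fitted values.
    """
    power = a * mass_kg ** 0.75
    freq = b * mass_kg ** -0.25
    return entropy_cost_per_cycle(power, temp_k, freq) / mass_kg


# Exponent bookkeeping: P contributes 3/4, 1/f contributes 1/4, 1/M contributes -1.
exponent = Fraction(3, 4) + Fraction(1, 4) - 1
print("mass exponent:", exponent)

# The mass-specific cost is the same for very different body masses.
for mass in (0.02, 70.0, 5000.0):
    print(mass, mass_specific_cost(mass))

# Illustrative round inputs only: 80 W, 310 K, 1.2 Hz.
print(entropy_cost_per_cycle(80.0, 310.0, 1.2))
```

The cancellation depends only on the two exponents summing to one, so any deviation of the measured exponents from 3/4 and -1/4 would leave a residual mass dependence.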

q-bio.OT 2026-06-12

Langurs show intentionality hallmarks when begging from humans

by Dishari Dasgupta, Shriparna Chattopadhyay +5 more

Begging with a Purpose? Testing Behavioural Hallmarks of First-Order Intentionality in Free-ranging Hanuman Langurs

Trials across six sites find audience checking, flexible gestures, and signaling that stops after food is acquired

Intentional communication has been studied extensively in primates, yet evidence from free-ranging non-ape species remains limited. Human-directed food-solicitation gestures in Hanuman langurs (Semnopithecus entellus) have recently been described, but whether these behaviours exhibit behavioural hallmarks associated with first-order intentionality remains unknown. Here, we experimentally investigated the presence of these hallmarks in free-ranging Hanuman langurs across six anthropogenic sites in southern West Bengal, India. We conducted 360 experimental and control trials and quantified behavioural markers commonly used to operationalize intentional communication. Experimental trials elicited audience checking, recipient-directed orientation, rapid approach responses, food-solicitation gestures and gestural flexibility, whereas these behaviours were rare or absent in control trials. Differences between experimental and control conditions were significant across all six study sites. Signalling also ceased following food acquisition, consistent with the stopping rule associated with an Apparently Satisfactory Outcome. Our findings demonstrate the presence of multiple behavioural hallmarks linked to first-order intentionality in the human-directed gestural communication of free-ranging Hanuman langurs. These results extend the study of intentionality beyond apes and provide new insights into the evolutionary distribution of intentionality-related traits across primates.

q-bio.OT 2026-06-10

France builds national link from microscopes to HPC for bioimage data

by Guillaume Gay, Théo Barnouin +4 more

From the microscope to High Performance Computing centers, a national effort toward automated data workflows for microscopy facility users in France

Platform uses OMERO and iRODS to automate workflows across facilities, storage, and computing centers.

Modern biological microscopy routinely generates large and complex image datasets, including multidimensional, multimodal, and time-resolved acquisitions. While imaging technologies have rapidly evolved, data management infrastructures within microscopy facilities often remain fragmented, relying on heterogeneous local solutions that are difficult to maintain, scale, and integrate with High-Performance Computing (HPC) centers and public data repositories. To address these issues, France BioImaging (FBI), the French national infrastructure for biological imaging, has developed FBI.DATA and the associated BioImage Cloud platform. This initiative aims to provide a coordinated national infrastructure connecting microscopy facilities, centralized storage resources, HPC environments, and public bioimaging archives through interoperable and scalable workflows. The proposed architecture combines open-source technologies including OMERO for image management, iRODS for distributed data orchestration, Authentik for federated authentication, and emerging standards such as OME-Zarr and REMBI metadata recommendations. The infrastructure is designed to support the complete imaging data lifecycle, from acquisition and transfer to visualization, analysis, sharing, and long-term archiving. Beyond the technical implementation, this work presents the organizational and governance strategies required to deploy a shared national infrastructure across distributed imaging facilities. We discuss the challenges associated with interoperability, metadata standardization, sustainability, and user adoption, as well as the perspectives opened by tighter integration between imaging data and large-scale computing resources for future AI-driven bioimage analysis workflows.

q-bio.OT 2026-06-10

Competition pins systems to retain only the strongest components

by Omer Karin

Compositional proofreading through critical self-tuning

Pinning to the stability threshold of dominant species extends their lifetimes and drives weaker variants into rapid turnover until drive ex

High-dimensional multicomponent systems, including immune and epigenetic repertoires, must selectively retain rare, beneficial components while purging a massive influx of suboptimal variants. We demonstrate that critical tuning of component control parameters through competition naturally implements proofreading in these systems. Competition for shared inputs pins the system to the marginal stability threshold of the most persistent species. This grants dominant species extended lifetimes, concentrating the population into dominant components while forcing less-stable variants into rapid drift-driven turnover. When aggregate drive exceeds a characteristic scale, this pinning fails, producing a non-selective state where component lifetimes scale as a universal power law with aggregate drive. Applying this framework to biological memory, we identify the hallmarks of this effect in plasma cell accumulation dynamics and propose that de-pinning transitions may represent failure points across biological domains, including cancer, immunodeficiencies, and the aberrant activation of harmful genomic elements during ageing.

q-bio.OT 2026-06-09

Ω test proposed to end arbitrary replicate counts in simulations

by Eric T. Lofgren, Kellen Myers +1 more

When is Enough Enough? A Proposed Termination Point for the Number of Replicates in Computational Simulations

A stopping rule modeled on P-tests aims to decide when additional runs add no further value.

Computational simulation provides a powerful toolkit for in silico experimentation. However, while the field has developed best practices for the design and implementation of such models, there remains ambiguity about how to interpret their results, because simulations can overwhelm traditional frequentist statistics simply by increasing the number of trials simulated. This fails the discipline in two ways: first, it leaves the community unsure of what constitutes a best practice for uniform understanding, and second, it potentially overburdens computational studies that burn clock cycles solely to ensure "enough runs to satisfy peers" without any theoretical underpinning for a definition of "enough". We propose a simple and straightforward standard for when to stop simulating additional trials, the Ω test, designed to be analogous to the function of traditional frequentist P-tests. Community adoption of a reasonable and uniform standard will permit more efficient computational experimentation and clear communication and interpretation of the findings discovered in this way.

q-bio.OT 2026-06-09

Segmentation errors propagate in spatial transcriptomics

by Naveed Ishaque, Peter Kharchenko +21 more

The Challenge of Cell Segmentation in Spatially Resolved Transcriptomics

Technical challenges in assigning transcripts to cells can distort downstream biological conclusions without better benchmarks.

Spatially resolved transcriptomics (SRT) is transforming how we study tissues by measuring gene expression in cells in their spatial context. However, the field lacks robust methodological guidance on one of its most fundamental analytical steps: how to accurately segment cells and assign spatially localized transcripts to them. Major technical challenges include sparse molecular signals, transcript displacement, complex cellular morphologies, and the projection of three-dimensional tissue architecture onto two-dimensional imaging planes. These challenges make segmentation a major source of uncertainty, with errors that can propagate through downstream analyses and ultimately lead to misleading biological interpretations. Here, we argue that segmentation should be treated as a central unresolved problem in spatial omics rather than a routine preprocessing step. We review current approaches, highlight key methodological limitations, including the lack of appropriate metrics and gold-standard benchmarks, and propose a community-driven path forward. Establishing shared evaluation frameworks, scalable benchmark datasets, and transparent reporting standards will be essential for transforming SRT into a robust and reproducible foundation for biological discovery and clinical translation.

q-bio.OT 2026-06-01

Biofilm index combines biomass and counts to quantify Candida resistance

by Nikhil Ujlayan, Teena Singh +4 more

Quantifying biofilm-virulence index to predict antifungal resistance in Candida albicans

Additive BVI parameter merges two measurements into one for easier assessment of antifungal effects.

Candida albicans is a commensal microorganism that causes opportunistic infections, such as oral candidiasis and vaginitis, affecting females, newborns, and immunocompromised patients. Biofilm formation can turn a commensal organism into a life-threatening one by introducing antifungal resistance. Our experiment combines crystal violet staining for biofilm biomass with CFU counts to statistically construct an additive BVI model from the experimental data. From these data, we propose a Biofilm-Virulence Index (BVI) as a novel, quantitative parameter for assessing antifungal drug resistance in Candida albicans. The effect of the drugs on inhibition zone diameter is twofold: first, a linear increase with time during early biofilm formation; second, stabilization in later phases, correlating directly with virulence. Most BVI values remained in the mild infection range, indicating successful virulence reduction by antifungal drugs. The BVI model combines biofilm biomass and viable cell count in a single parameter, which makes comparison between samples easier during biofilm analysis. The findings suggest that combining CFU and biofilm measurements may improve interpretation of antifungal response in Candida albicans. This approach could be useful in future experimental studies investigating biofilm-associated resistance.

q-bio.OT 2026-05-27

Biodiversity harm from LLM serving adds up at scale

by Tianyao Shi, Yi Ding

BIRDS: Characterizing and Understanding Biodiversity Impact of Large Language Model Serving

Request-level measurements identify serving choices that cut ecological cost while preserving output quality.

Large language model (LLM) serving creates environmental impacts beyond carbon and water, including ecosystem damage through biodiversity-related pathways. We present BIRDS, a framework for Biodiversity Impact of Request-Driven LLM Serving. BIRDS defines request-level functional units, quantifies operational and embodied biodiversity impact, and introduces Quality-Normalized Biodiversity Impact (QNBI) to jointly analyze ecological impact and response quality. Across diverse workloads, models, GPUs, and regions, BIRDS reveals that biodiversity impact accumulates at scale and exposes actionable quality-aware serving tradeoffs.

physics.bio-ph 2026-05-27

Meditation reduces intermittency in palm biophoton emissions

by E. Pace, L. De Paolis +6 more

Biophoton Emission from Palm during Meditation: A Multi-Method Complexity Analysis

Three sessions analyzed with four complexity methods show consistent drop during box-breathing, aligning with heart and brain changes

Biophotons are ultra-weak photon emissions in the visible spectrum produced by living organisms. While extensively studied in plants, germinating seeds, and cell cultures, no systematic multi-method complexity analysis of human ultraweak photon emission (UPE) under physiological modulation has been reported. We address this gap by applying a comprehensive analytical framework to UPE measurements from the right palm of a human subject. Three independent sessions were conducted on different days, each comprising four consecutive 15-minute phases: Dark reference, pre-meditation resting state (Pre), structured meditation based on the Sama Vritti box-breathing protocol, and post-meditation recovery (Post). Photon count series are analysed with four complementary methods: distributional statistics (Fano factor, skewness, tail Expected Shortfall); multiscale Fano factor and Allan deviation; stripe-filtered Diffusion Entropy Analysis (DEA); and Renyi entropy with a Time Reversal test. The methods show complementary sensitivities, converging on a coherent picture: a systematic reduction of emission intermittency during meditation, consistently detected across all three sessions. Stripe-filtered DEA places the emission in the non-ergodic renewal regime with a Pre-to-Meditation decrease of the scaling exponent. Renyi analysis reveals two effects: reduced marginal amplitude burstiness (Tdir) and increased sequential pattern structure (Tseq), interpreted as entrainment to the Sama Vritti rhythm. These findings are consistent with cardiac complexity transitions during meditation reported by Tuladhar et al. and with EEG reorganization during Sama Vritti breathing by Zaccaro et al., suggesting a coordinated multi-channel physiological response. The results establish a proof-of-concept framework for complexity analysis of human UPE under physiological modulation.
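
The Fano factor named above is the variance-to-mean ratio of a count series: it equals 1 for a Poisson process and rises above 1 when emission is bursty or intermittent. The sketch below computes it at a single scale and after aggregating counts into longer non-overlapping windows, which is one simple reading of a "multiscale" Fano factor. The abstract does not specify the authors' estimator or windowing, so the function names and the sample series are hypothetical.

```python
from statistics import mean, variance


def fano_factor(counts):
    """Sample variance divided by mean for a series of photon counts.

    Requires at least two values and a non-zero mean.
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("Fano factor is undefined for a zero-mean series")
    return variance(counts) / m


def multiscale_fano(counts, window):
    """Fano factor after summing counts over non-overlapping windows.

    Assumption: trailing samples that do not fill a whole window are dropped.
    """
    binned = [
        sum(counts[i:i + window])
        for i in range(0, len(counts) - window + 1, window)
    ]
    return fano_factor(binned)


# Made-up count series for illustration only.
series = [2, 4, 4, 4, 5, 5, 7, 9]
print(fano_factor(series))
print(multiscale_fano(series, window=2))
```

Tracking how the ratio changes with window length distinguishes uncorrelated noise, where it stays flat, from clustered emission, where it grows with scale.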


q-bio.OT 2026-05-26

Simulator predicts post-meal metabolism in milliseconds with 18% error

by Alberto Calderone

Real-Time In Silico Modeling of Postprandial Macronutrient Kinetics: A Validated Computational Engine for Nutrition Research and Digital Health

Bi-compartmental model and finite state logic generate curves for protein synthesis and glucose uptake from any meal input


Simulation of post-prandial pharmacokinetics, such as muscle protein synthesis (MPS) through mTORC1 and insulin-induced glucose uptake, is often challenging due to the computational intensity of the multi-compartmental approach. In this study, I introduce an in silico metabolic simulator that uses bi-compartmental Bateman kinetic processes, gamma-variate distributions, and finite state machine reasoning to solve temporal differential equations instantaneously, generating metabolic curves and predictions depending on input meals. The novel underlying algorithm was custom-built entirely independent of third-party libraries or external services. This original computational engine, bridging the gap between academia and the digital health sector, is integrated within a web dashboard and provided as a service via REST APIs. The average response time is approximately 135 ms with a maximum below 750 ms. The multi-dimensional model was calibrated using a Landmark Validation approach across diverse dietary conditions (Whey Protein, mixed meal, OGTT) and optimized via Grid Search. Ultimately, the system achieved a global physiologically optimal Mean Absolute Percentage Error (MAPE) of $\sim18\%$ while maintaining an algorithmic complexity of $O(n \log n)$.
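For readers unfamiliar with the two quantities mentioned here, the following sketch shows the textbook Bateman function (first-order absorption and elimination) and the MAPE metric. It is only a minimal illustration under stated assumptions: the engine described in the abstract is not public in this listing, and the function names, parameters, and the single-compartment form used below are hypothetical choices for the example.

```python
import math


def bateman(t, dose, ka, ke, volume=1.0, bioavailability=1.0):
    """Textbook Bateman curve: concentration at time t after an oral dose.

    ka and ke are first-order absorption and elimination rate constants.
    All parameter names are illustrative, not taken from the paper.
    """
    if t < 0:
        return 0.0
    if math.isclose(ka, ke):
        # Limiting form when absorption and elimination rates coincide.
        return bioavailability * dose / volume * ka * t * math.exp(-ka * t)
    scale = bioavailability * dose * ka / (volume * (ka - ke))
    return scale * (math.exp(-ke * t) - math.exp(-ka * t))


def mape(observed, predicted):
    """Mean Absolute Percentage Error, in percent (observed values must be non-zero)."""
    pairs = list(zip(observed, predicted))
    return 100.0 * sum(abs((o - p) / o) for o, p in pairs) / len(pairs)
```

Under this form the curve peaks at t = ln(ka / ke) / (ka - ke), which is a convenient landmark to compare against measured peak times.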


cs.LG 2026-05-25

Gradient boosting predicts multi-organ diabetes risk at AUC 1.000

by Mini Han Wang, Liting Huang +2 more

Explainable Retinal Imaging for Prediction of Multi-Organ Dysfunction in Type 2 Diabetes

Routine lab biomarkers on 1,195 patients yield perfect separation and rank hyperglycaemia plus renal impairment as top drivers.


Background: Type 2 diabetes mellitus (T2DM) is increasingly recognised as a systemic disease characterised by coordinated dysfunction across metabolic, renal, lipid, and inflammatory pathways. Existing clinical assessments often fail to capture this multi-dimensional burden. Methods: We conducted a retrospective study of 1,195 patients using routinely collected laboratory biomarkers. System-level abnormality indices were constructed to quantify organ-specific dysfunction, and multi-system involvement was defined as abnormalities in two or more systems. Supervised machine learning models, including logistic regression, random forest, and gradient boosting, were trained to predict multi-system dysregulation. Model interpretability was achieved using SHapley Additive exPlanations (SHAP). Results: The gradient boosting model demonstrated near-perfect discrimination (AUC = 1.000), significantly outperforming logistic regression (AUC = 0.925). Feature attribution analysis revealed that hyperglycaemia, renal impairment, dyslipidaemia, and inflammation were the dominant drivers of multi-system risk. Dose-response relationships observed in partial dependence analyses further supported the biological plausibility of model predictions. Conclusion: This study presents an interpretable, data-driven framework for quantifying systemic disease burden in T2DM. By linking routine biomarkers to multi-organ dysfunction, our approach provides both predictive accuracy and mechanistic insight, offering potential for improved risk stratification and precision medicine in diabetes care. The data and code used in this study are openly available on GitHub at: https://github.com/MiniHanWang/Type-2-Diabetes-1.git


q-bio.OT 2026-05-25

Boundaries segregate chiral domains in 2D simulation

by Arturo Tozzi

Spatial confinement and boundary constraints governing biological chirality: a simulation study

Finite geometry and coupling in the model drive opposite-handed regions apart and stabilize local coherence.


Biological systems exhibit marked molecular asymmetry, with proteins based predominantly on L-amino acids and nucleic acids and carbohydrates largely composed of D-sugars. Explanations for homochirality include asymmetric photochemistry, autocatalytic amplification, stochastic symmetry breaking and mineral-surface stereoselectivity, but these mechanisms only partially address the influence of finite geometry and collective spatial interactions on stereochemical stabilization. Inspired by recent developments in condensed-matter physics, we investigated whether coherent chirality could emerge from the interplay among nonlinear stereochemical amplification, stochastic fluctuations and boundary-dependent spatial constraints. We developed a reaction-diffusion simulation in which local stereochemical populations evolved within finite two-dimensional domains under spatial coupling and weak geometrical bias fields. Our model combined bistable autocatalytic dynamics, nearest-neighbor interactions and suppression of locally inconsistent stereochemical configurations in order to quantify temporal evolution of enantiomeric excess, same-handed neighbor agreement and radial stereochemical organization under varying interaction strengths and fluctuation amplitudes. Our results showed progressive formation of chiral domains, segregation of opposite-handed regions and geometry-dependent modulation of local stereochemical organization. Spatial coupling increased local coherence and modified persistence of mixed stereochemical states, while finite boundaries influenced radial organization and anisotropic stabilization of molecular populations. Potential applications include geometrically controlled asymmetric synthesis, confined stereoselective catalytic systems, adaptive chiral materials and characterization of heterogeneous stereochemical distributions in microstructured reaction environments.


cs.CV 2026-05-25

5,000 vineyard images enable automated grape cluster closure scoring

by Xiangzhi Tong, Chengrui Zhang +6 more

ViViD-5K: Vineyard vision dataset for field-based berry detection and segmentation and grape cluster closure estimation

Dense berry-centroid and mask labels across 13 varieties power a pipeline that replaces subjective manual assessments.


Cluster closure, defined as the progressive filling of gaps between the berries in a grape bunch, is a key trait in vineyard management, impacting disease risk. However, traditional visual scoring methods are labor-intensive, subjective, and lack temporal resolution. Existing datasets rarely support fine-grained berry-level analysis, limiting the development of robust deep learning models. In this work, we present ViViD-5K, a large-scale in-field Vineyard Vision Dataset containing 5,000 images with dense annotations, including over 648,000 berry centroids and cluster segmentation masks spanning 13 grape varieties. Building on this dataset, we introduce GrapeSAM, a two-stage visual pipeline that combines point-based berry localization with prompt-based segmentation using Segment Anything, followed by transformer-based cluster segmentation. The pipeline enables automated, in-field estimation of cluster closure with minimal supervision. Quantitative results demonstrate strong segmentation and counting accuracy across diverse conditions, while visualizations confirm robustness on both in-domain and out-of-domain samples. This work provides a scalable and objective alternative to manual compactness scoring and supports high-throughput grape phenotyping with enhanced spatial detail.


q-bio.OT 2026-05-22 2 theorems

Fibrin networks keep stable topology around esophageal cancer surgery

by Thomas Burnett, Theresa Reinhold +6 more

Persistent Homology as a Morphological Signature of Fibrin Networks

Persistent homology on confocal z-stacks detects no changes across the perioperative period or between standard and intervention groups.


We present an investigation of the applicability of topological data analysis (TDA) to the study of high-resolution confocal microscopy images of fibrin network structures from patients with oesophageal cancer undergoing intended curative surgery. Investigation of clot structure brings new knowledge about blood coagulation, risk of bleeding, and thrombosis in this group of patients. Images of fibrin network formation in the collected blood samples were captured by confocal microscopy and three-dimensional z-stacks were analysed. Each z-stack was cropped to a centre region for analysis, the validity of which is assessed in detail. Overall, we found no significant differences in fibrin network topology across the perioperative period, and no consistent differences in network structure between the standard and intervention groups.


q-bio.OT 2026-05-18 1 theorem

Logistic model predicts malaria severity at 83% accuracy

by Mary Opokua Ansong, Asare Yaw Obeng +1 more

A Logistic Regression Model to Predict Malaria Severity in Children

Trained on 417 cases from a district in Ghana, it shows high infection rates but low severity and the importance of balanced class samples.


One of the main causes of death around the globe is malaria. Researchers have sought to develop predictive models for malaria outbreaks based on meteorological data, climate data and the breeding cycle of Plasmodium, the causative agent of malaria. This study predicts the severity of malaria based on environmental and biological factors. A logistic regression model was developed to predict the severity of malaria from factors such as sickle cell disease, stagnant water, garbage dumps, wet lawns, and the use of treated mosquito nets, and it achieved an accuracy of 83.3%. The study was carried out in the Bosomtwe District of Ghana with 417 respondents. It was deduced that although children in the District are highly prone to malaria infection, the severity is very low. The study concludes that, when developing machine learning models, a good overall sample size is not sufficient on its own: adequate representation of each class label is equally important.


cs.CE 2026-05-18 2 theorems

Emulator finds 181 maize traits that hold up under future climate

by Mojdeh Saadati, Juan Panelo +4 more

From Simulation to Discovery: AI Enabled Probabilistic Emulation of Mechanistic Crop Systems

Fast probabilistic model screens 100k trait options in Midwest soils to spot consistent performers through 2100.


Global food security depends on predicting crop responses to climate variability, yet process based crop models remain too computationally expensive for large scale exploration of genotype and environment interactions. Here we develop a probabilistic neural emulator of APSIM that reproduces key maize growth processes across 13 outputs with high fidelity (with R^2 of 0.93) while reducing simulation time by several orders of magnitude. Trained on two million simulations spanning diverse genetic, soil, and management conditions, and augmented with a convolutional synthetic weather generator that produces physically consistent climate sequences, the framework enables scalable exploration of crop responses under realistic and diverse environmental inputs while providing calibrated predictive uncertainty without costly Bayesian inference. Applying this framework across 100,000 trait configurations, six soil environments in Iowa and Illinois, and climate projections through the year 2100 under two emissions scenarios, we identify 181 maize trait combinations that consistently maintain high yield across all tested conditions, an analysis infeasible with the mechanistic model alone. We further show that radiation use efficiency and temperature driven root dynamics are dominant drivers of yield resilience. Notably, projected yield distributions vary substantially across locations, with some lower productivity sites exhibiting yield increases under future climate scenarios, indicating that climate change may reshape regional yield potential in nonintuitive ways. These results demonstrate how uncertainty aware emulation transforms mechanistic crop simulation from a computational bottleneck into an on demand discovery engine, one capable of interrogating the full genotype, environment and management space at a scale no process-based model can match.


q-bio.OT 2026-05-18 2 theorems

Exponential-log measure keeps dynamic range in drug binding curves

by Arturo Tozzi

An exponential logarithmic measure of drug receptor binding and saturation

Reveals asymmetric sensitivity at low exposure and saturation where bounded occupancy compresses variability


Ligand receptor interactions are commonly assessed through equilibrium occupancy and pharmacodynamic measures that describe binding and saturation by means of bounded response curves. Thermodynamic approaches relate binding affinity to logarithmic concentration scaling, while probabilistic descriptions of occupancy arise from exponential relations. We introduce an exponential logarithmic descriptor (ELD) that integrates ligand availability and thermodynamic binding propensity within a single quantity. The logarithmic component corresponds to a thermodynamic term derived from concentration dependent free energy relations, whereas the exponential component is represented through an inverse normalized concentration term corresponding to the reciprocal of the exponential occupancy factor emerging from Boltzmann type binding formulations. We explored ELD behavior through numerical simulations spanning sub affinity, transition and saturating concentration regimes under multiple affinity conditions and time dependent exposure profiles. Compared with conventional occupancy curves, ELD retained a broader dynamic range and revealed asymmetric sensitivity across concentration scales, particularly at low exposure and near saturation, where bounded occupancy measures progressively compress variability. The resulting behavior reflects the coexistence of amplification and constraint processes within ligand receptor dynamics. ELD may provide quantitative representation for biological systems in which exponential and logarithmic processes coexist across different scales. Potential applications include characterization of dose response transitions, identification of subtherapeutic and saturating exposure states, comparison of compounds with different affinities, normalization across heterogeneous datasets and continuous tracking of pharmacodynamic regimes during time dependent exposure.


q-bio.OT 2026-05-13 1 theorem

Latent gait shifts reveal viability differences under occlusion

by Jacques Raynal, Pierre Slangen +2 more

From Organization to Viability: A Multi-Level Analysis of Gait Dynamics Under Occlusal Constraint

Comparable performance across three occlusal levels hides distinct long-term reorganization in a Parkinsonian single-case study


Clinical interpretation often assumes that observable performance provides sufficient information about the organization of an adaptive system. However, similar observable performance may correspond to distinct latent organizations. This study extends a previous multi-level framework by introducing a fourth analytical level centered on longitudinal viability. Using an exploratory single-case design in a Parkinsonian patient, gait data were recorded with instrumented insoles under three occlusal conditions: neutral natural occlusion (ONL), a 2.5-degree increase in vertical dimension of occlusion (OC2.5), and a 3-degree increase in vertical dimension of occlusion (OC3). Two measurement sessions were conducted eleven weeks apart, during which the participant underwent a structured sensorimotor intervention. The vertical dimension of occlusion was considered as an experimentally varied constraint applied to an adaptive neuromechanical system. Although observable performance remained globally comparable across conditions, PCA-based latent-space analysis revealed differentiated longitudinal centroid displacements. OC3 exhibited the smallest displacement, ONL an intermediate displacement, and OC2.5 the largest displacement. This hierarchy supports the relevance of a Level 4 framework centered on viability, understood here as an exploratory proxy for a configuration's capacity to maintain lower longitudinal reorganization over time. These findings remain within-subject, exploratory, and non-causal. They do not establish a validated clinical threshold, causal occlusal effect, or therapeutic optimum. More generally, the work suggests that clinical relevance cannot be inferred solely from instantaneous performance or static latent structure, but may also depend on the capacity of a configuration to sustain a coherent trajectory over time.


eess.IV 2026-05-12 1 theorem

SpecX benchmark compares spectral models on 1.7M molecules

by Chengrui Xiang, Tengfei Ma +4 more

SpecX: A Large-Scale Benchmark for Multi-Modal Spectroscopy and Cross-Paradigm Evaluation

Specialized models lead on precise signals while multimodal models lead on reasoning, revealing gaps that spectrum-native models must close.


Existing spectral benchmarks are limited in scale, modality alignment, and evaluation scope, and typically focus on either specialized models or multimodal language models (MLLMs). We introduce SpecX, a large-scale benchmark for multi-modal spectroscopy with cross-paradigm evaluation. SpecX contains 1.7M molecules with diverse spectral modalities, including NMR (1H, 13C, HSQC), IR, MS, UV, Raman, and FL, and is organized into three tiers: a large-scale dataset for pretraining, an aligned multi-spectral subset for benchmarking, and a high-quality experimental subset for evaluation. SpecX supports a range of tasks such as molecular elucidation, spectrum simulation, and spectral understanding, and enables unified evaluation across both specialized spectral models and MLLMs. Experiments show that specialized models excel at signal-level modeling, while MLLMs exhibit strengths in high-level reasoning but lack precise spectral grounding. SpecX establishes a unified benchmark for spectral intelligence and highlights the need for spectrum-native foundation models.


q-bio.OT 2026-05-11 Recognition

Kurdistan study finds seven hawthorn taxa with major fruit variation

by Karzan Ezzalddin Mohammed

Morpho-Physiological and Genetic Diversity of Crataegus Taxa (Rosaceae) in Selected Locations of Iraqi Kurdistan-Region

Sixty-one accessions vary significantly in weight, size, seeds, pH and moisture, explained by eleven traits.


One of the major phytogeographic zones of the world's semi-arid lands is the Kurdistan region of Iraq, which hosts many important fruit species owing to its geographical location and ecology. Mountain hawthorn (Crataegus spp.) is a vital wild, edible, deciduous fruit tree for the region, valued for its ornamental, economic, industrial and medicinal uses. In the present study, morphological, phytochemical and molecular marker systems were applied to sixty-one hawthorn accessions collected from different locations in the Iraqi Kurdistan region between April 2022 and September 2023. Phenotypic markers have proven to be extremely useful in studies of genetic diversity in hawthorn genotypes. The morphological results identified seven taxa (five species and two hybrids): Crataegus azarolus, Crataegus meyeri, Crataegus monogyna, Crataegus orientalis, Crataegus pentagyna, Crataegus azarolus x Crataegus meyeri and Crataegus azarolus x Crataegus pentagyna. There was significant variation among ecotypes in plant type, reproductive stage, fruit morphology and production uses. Analysis of variance of the fruit physio-morphological data revealed a high level of variability among accessions, significant at the 0.01 level. The most important characteristics for explaining fruit morphological variability were 11 variables: fruit weight (FW), fruit length (FL), fruit width (FW), seed length (SL), seed width (SW), number of seeds per fruit (NSF), volume solution (VS), fruit fresh weight (WOF), seed weight (WS), potential of hydrogen (pH) and moisture content (MC). All of these traits differed significantly among the studied accessions.


q-bio.NC 2026-05-11 Recognition

Internal errors trigger sparse neural net updates

by Arturo Tozzi

Internally triggered retrospective learning in neural networks

Discrepancy thresholds replace continuous weight changes, focusing adaptation on informative sequential patterns.


Learning in artificial neural networks usually relies on continuous, externally driven weight updates, in which parameters are modified at every step in response to incoming data, error signals or reward feedback. In this setting, routine and informative inputs contribute similarly to parameter adjustment. We introduce a learning approach in which parameter updates are governed by internally generated events arising from the network's own representational dynamics. During ongoing activity, synaptic interactions are accumulated as latent traces encoding recent coactivation patterns, without immediately modifying the underlying parameters. In parallel, an internal predictive process estimates the evolving latent state, while a scalar measure of discrepancy between predicted and observed states is continuously computed. When discrepancy exceeds an adaptive threshold derived from recent error statistics, a learning event is triggered, inducing a retrospective update selectively integrating past activity into the current configuration. We performed simulations using a minimal neural network exposed to structured sequential inputs with transient perturbations. We found that learning occurs through sparse, temporally localized events associated with increases in prediction error, leading to stepwise changes in synaptic efficacy and discrete transitions in latent state organization. By selectively reorganizing parameters in response to internally detected discrepancies, our episodic updating may reduce unnecessary parameter drift while preserving informative patterns. Potential applications include systems requiring selective adaptation to rare or informative inputs such as physiological, industrial or environmental monitoring, edge computing under limited energy budgets, autonomous systems operating in dynamic conditions and sequential computational data processing.
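The triggering rule described in this abstract can be pictured with a small sketch. The abstract only says the threshold is "derived from recent error statistics"; the specific rule below (rolling mean plus k standard deviations over a fixed window) is an assumption chosen for illustration, and the function and parameter names are hypothetical.

```python
from collections import deque
from statistics import mean, pstdev


def triggered_events(discrepancies, window=5, k=3.0):
    """Return the step indices at which a learning event would fire.

    Assumed rule: fire when the current discrepancy exceeds
    mean + k * std of the previous `window` discrepancies.
    """
    recent = deque(maxlen=window)
    events = []
    for step, d in enumerate(discrepancies):
        if len(recent) == window:  # wait until the window is full
            threshold = mean(recent) + k * pstdev(recent)
            if d > threshold:
                events.append(step)
        recent.append(d)
    return events
```

With a mostly flat discrepancy series and one transient spike, only the spike crosses the threshold, which matches the sparse, temporally localized updates the abstract reports.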


q-bio.OT 2026-05-11 Recognition

Statin eligibility drops 3M or rises 21M under 2026 guidelines

by James A. Diao, Thomas A. Buckley +4 more

Statin Recommendations among US Adults with the 2026 Dyslipidemia Guidelines

The outcome hinges on whether the new 30-year risk pathway is applied, expanding access most for adults aged 40-59.


Importance: The 2026 multisociety dyslipidemia guideline recommended the PREVENT equations in place of the PCE equations, introduced 30-year risk assessment as a new treatment pathway, and lowered risk-based treatment thresholds. The net population impact of these concurrent changes on statin recommendations is unknown. Objective: To estimate changes in statin recommendations under 2026 PREVENT-based dyslipidemia guidelines compared with 2018 PCE-based guidelines. Design and Participants: Cross-sectional analysis of pooled data from NHANES, spanning 2011-2023 and comprising 24,199 participants aged 30-79 years. Main Outcomes and Measures: Number and proportion of US adults receiving or recommended for statin therapy. Results: At the class 1 threshold, the number of US adults receiving or recommended for statin therapy decreased by an estimated 3.0 million (95% CI, 2.3 million to 3.6 million), with larger reductions among Black adults (-4.2 percentage points [pp]), men (-4.0pp), and adults aged 50-69 years (-5.6pp). At the class 2 threshold--which additionally recommends statins for adults aged 30-59 years based on 30-year risk--the number of adults recommended increased by an estimated 20.8 million (95% CI, 19.6 million to 22.0 million), or +11.6pp. The increase was largest among adults aged 50-59 years (+19.7pp) and 40-49 years (+14.8pp). Conclusions: The net population impact of the 2026 dyslipidemia guidelines depends critically on which recommendation class is applied. At the class 1 threshold, statin recommendations decreased modestly; at the class 2 threshold, inclusion of 30-year risk assessment substantially expanded recommendations, particularly among younger adults. These divergent effects underscore the importance of the 30-year risk criterion as a major driver of new eligibility and the need for outcomes and equity monitoring during guideline implementation.


q-bio.OT 2026-05-08 3 theorems

Rhythms bias binary sequences to form catalytic and information systems

by Takeshi Ishida

Genetic Information as a "Chord" of Chemical Oscillations: Emergence of Catalyst-RNA Systems Driven by Superposed Rhythms

Simulations link superposed chemical oscillators to higher rates of functional polymer emergence compared with random selection.


A central challenge in the origin of life is understanding how catalytic peptide-like polymers and information-bearing nucleic acid-like polymers emerged as an interdependent system. This study constructs a primordial cognitive model incorporating two internal Lotka-Volterra chemical oscillators to investigate, through simulation, whether a catalytic loop, primordial tRNAs, and nucleic acids that record and amplify them, can form through the interaction of polymers represented by binary (0/1) sequences. In this model, a mechanism was introduced where the synthesis of internal oscillations provides a temporal bias for 0/1 selection during polymer elongation, while generated functional sequences are protected, recorded, and re-amplified. Simulation results demonstrated that the proposed cognitive model significantly outperformed a contrast model based on random 0/1 selection in terms of the establishment rate of catalytic loops, the accumulation of functional molecules, polymer elongation, and the reduction of Shannon entropy in sequence distribution. Furthermore, this superiority was generally maintained across sensitivity analyses, including batch calculations with different random seeds. While this study is a computational model based on abstract binary sequences and simplified translation/replication rules rather than a direct reconstruction of life's origin, it provides a working hypothesis for the interdependent emergence of catalytic function and information retention by demonstrating that internal oscillations can bias sequence exploration within a framework linking autocatalytic networks, recording, and group selection. Future research must verify the generality and empirical validity of this framework by expanding monomer types, evolving into multi-oscillator systems, and establishing correspondences with compartmentalized experimental systems.


q-bio.OT 2026-05-04

Dynamic systems model treats genes as state and environment as inputs for behavior

by Mengman Wei, Qian Peng

DynoSys: A Dynamic Systems Framework for Multimodal Integration of Genetic, Environmental, and Neurobiological Signals

Framework builds harmonized representations from genetic, environmental and brain data to support longitudinal and event-based prediction of


Understanding the development of adolescent behavioral and mental health outcomes requires integrating genetic predisposition, environmental exposures, and neurobiological processes over time. Here, we present a unified quantitative framework that models the human body as a dynamic system, where genetic factors form the foundational state, environmental exposures act as time-varying inputs, the brain might serve as a mediation processor, and behavioral phenotypes emerge as system outputs. Using longitudinal data from the Adolescent Brain Cognitive Development (ABCD) Study, we construct harmonized multi-domain representations across six phenotypes: externalizing behavior, internalizing behavior, and four substance use initiation outcomes (alcohol, nicotine, cannabis, and any substance use). We integrate polygenic risk scores (PRS), multi-domain environmental features, and multimodal neuroimaging representations derived through stability selection and dimensionality reduction. Our framework supports both continuous longitudinal modeling and survival-based event modeling through a unified data structure. We further develop interpretable domain-level representations using principal components, weighted risk scores, and cluster-based summaries. These representations enable downstream modeling using survival analysis, state-space models, and machine learning approaches. This work establishes a scalable and interpretable framework for studying how genetic and environmental factors interact over time to shape behavioral outcomes, providing a foundation for identifying modifiable risk factors and informing early intervention strategies.


q-bio.OT 2026-05-01

Agentic AI biologists assist with blocked dual-use tasks

by Kimon Antonios Provatas, Avery Self +2 more

BioVeil MATRIX: Uncovering and categorizing vulnerabilities of agentic biological AI scientists

Scaffolding on Biomni raises scores on WMDP biology proxies; new taxonomy with 10 categories maps the agentic risks.


Agentic AI scientists equipped with domain-specific tools are rapidly entering scientific workflows across disciplines, with especially strong uptake in the life sciences where they can be used for literature synthesis, sequence analysis, and experimental planning support. While these systems accelerate biological research, they also introduce risks for dual-use applications that are not captured by current model-centric safety evaluations. We present evidence that current agentic AI scientists, including Biomni and K-Dense, are willing to assist with dual-use tasks that are blocked by base model safeguards. We also found that in a paired evaluation framework for biology and chemistry prompts involving Weapons of Mass Destruction proxies (WMDP), agentic scaffolding of Biomni increased the benchmark performance relative to the underlying standalone model, producing measurable capability uplift. We believe it is necessary to include additional safeguards in existing models and build future tools from the ground up with agentic vulnerabilities in mind. To systematically categorize broader risks, we introduce BioVeil MATRIX, a defensive taxonomy that maps AI-enabled biosecurity risks using 10 tactical categories (TA01--TA10) and 22 different techniques. We propose to use this taxonomy as a baseline for future AI scientist development and generate specialized benchmarks and protocols for red-teaming these vulnerabilities before public deployment. BioVeil MATRIX can be found at: https://bioveilmatrix.com/


q-bio.OT 2026-05-01

The paper models tumor containment as an anti-percolation process in which preventing…

by Arturo Tozzi

Tumor containment as an anti-percolation process

Tumor containment framed as anti-percolation, with simulations indicating partial independence of malignant area from connectivity metrics…


Percolation theory from statistical physics has been applied to several aspects of tumor progression. Tumor growth on percolation clusters has been used to model spatial expansion, vascular percolation to describe nutrient supply and transport-related percolation to investigate drug and gene delivery. At the molecular level, mutational percolation has been employed to account for the emergence of malignant phenotypes, while inverse percolation has been used to represent treatment-induced structural disruption. We examined whether tumor containment can be interpreted as an anti-percolation problem, in which spatial expansion depends on the formation of a connected malignant domain. We implemented a spatial simulation with biologically scaled parameters to represent tissue heterogeneity, local growth, cell movement and clearance. We measured both total malignant area and connectivity metrics, including the largest connected component and the probability of forming a spanning cluster. Our results indicate that tumor size and spatial connectivity are partially independent, with configurations of similar size showing different connectivity patterns. A transition from fragmented to connected structures emerged within a limited parameter range, consistent with threshold-like behavior. By incorporating spatial connectivity into quantitative analysis, our approach provides a complementary way to characterize tumor organization. Potential applications include integration of structural descriptors into computational models of tumor growth, design of experimental systems to probe spatial organization and interpretation of therapeutic approaches via connectivity-based metrics.
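The distinction between malignant area and connectivity can be made concrete with a small sketch. This is not the paper's simulation: it only shows, for a binary grid, how total area, the largest connected component, and a left-to-right spanning check might be computed. The 4-neighbour connectivity rule, the spanning definition, and all function names are assumptions for this example.

```python
from collections import deque


def connected_components(grid):
    """List the 4-connected components of occupied (truthy) cells in a 2D grid."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if not grid[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, cells = deque([(r, c)]), []
            while queue:  # breadth-first flood fill
                y, x = queue.popleft()
                cells.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(cells)
    return components


def connectivity_metrics(grid):
    """Total occupied area, largest component size, and left-right spanning flag."""
    comps = connected_components(grid)
    last_col = len(grid[0]) - 1
    return {
        "area": sum(len(c) for c in comps),
        "largest_component": max((len(c) for c in comps), default=0),
        # A component spans if it touches both the first and last column.
        "spans": any({x for _, x in c} >= {0, last_col} for c in comps),
    }
```

Two grids with the same occupied area can differ completely on the other two metrics, for example four isolated cells versus one full row, which is the kind of partial independence the abstract describes.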

q-bio.OT 2026-05-01

Shared parameter component speeds cancer model personalization with scarce data

by Logan Rose, Jonathan Martinez +4 more

Personalizing Cancer Models under Data Scarcity via Parameter Decomposition

Decomposing parameters lets the population-level part serve as a prior so only the patient-specific part needs fitting to limited new data.

Personalized cancer modeling for clinical applications requires robust and efficient parameter calibration, particularly in settings with limited patient data. This need is especially critical for medical digital twins (MDTs), which are virtual representations of disease continuously updated using longitudinal patient measurements. In this work, we propose a novel parameter personalization framework for dynamical cancer models under data scarcity. Our approach decomposes selected model parameters into a common component, shared across patients, and a personalized component, which is patient-specific and can be updated as new data become available. The common component captures population-level structure and is estimated once, providing an informed prior that enables rapid and accurate personalization. We demonstrate the effectiveness of this framework using synthetic data generated from canonical dynamical systems, such as logistic growth models with optimized treatment interventions. Our results show that parameter decomposition significantly improves calibration performance in limited-data regimes, facilitating fast and reliable personalization and supporting the development of patient-specific cancer models and MDTs.
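
A minimal sketch of the decomposition idea, under stated assumptions: the growth rate of a logistic model is written as `r = r_common + delta`, the shared part `r_common` is taken as already estimated, and only the patient-specific offset `delta` is fitted to a handful of observations. The quadratic penalty, the grid search and all names and values below are illustrative choices, not the authors' calibration procedure.

```python
import math

def logistic(t, r, K=1.0, n0=0.05):
    """Closed-form logistic growth with carrying capacity K and initial size n0."""
    return K / (1.0 + (K / n0 - 1.0) * math.exp(-r * t))

def fit_personal_offset(times, observed, r_common,
                        penalty=1.0, half_width=0.5, steps=1001):
    """Grid-search the patient-specific offset delta in r = r_common + delta.

    The term penalty * delta**2 shrinks delta toward zero, so the shared
    component acts like a prior when patient data are scarce.
    """
    best_delta, best_loss = 0.0, float("inf")
    for i in range(steps):
        delta = -half_width + 2.0 * half_width * i / (steps - 1)
        sse = sum((logistic(t, r_common + delta) - y) ** 2
                  for t, y in zip(times, observed))
        loss = sse + penalty * delta ** 2
        if loss < best_loss:
            best_delta, best_loss = delta, loss
    return best_delta

# Hypothetical patient: true rate 0.6, shared population rate 0.5, four samples.
times = [0.0, 2.0, 4.0, 6.0]
observed = [logistic(t, 0.6) for t in times]
print(fit_personal_offset(times, observed, r_common=0.5, penalty=0.0))
print(fit_personal_offset(times, observed, r_common=0.5, penalty=1.0))
```

With no penalty the offset recovers the gap between the patient and the population rate; with a penalty the estimate is pulled toward the shared value, which is the trade-off a prior introduces.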

q-bio.OT 2026-04-30

Vocal entropy measures outperform static features for spotting depression

by Himadri S Samanta

Entropy-Dominated Temporal Vocal Dynamics as Digital Biomarkers for Depression Detection

Unpredictability in speech timing and pitch raises AUC from 0.593 to 0.646 on the DAIC-WOZ dataset under strict validation.

Automated depression detection often relies on static aggregation of conversational signals, potentially obscuring clinically meaningful behavioral dynamics. We investigated whether entropy-driven temporal biomarkers improve depression detection beyond standard pooled features using the DAIC-WOZ corpus. Using 142 labeled participants, we reconstructed utterance-level acoustic trajectories and compared pooled temporal baselines, trajectory dynamics, Shannon entropy biomarkers, recurrence quantification, sample entropy, fractal complexity, and coupling biomarkers under leakage-aware validation. Static pooling achieved an AUC of 0.593, trajectory dynamics improved performance to 0.637, and entropy biomarkers produced the strongest statistically significant improvement over pooled baselines (AUC 0.646; nested cross-validated AUC 0.615; permutation p = 0.017). Entropy biomarkers outperformed recurrence, coupling, sample entropy, and fractal-based features, with several biomarkers stable across folds. These findings suggest depression-related signal may lie less in average acoustic levels than in entropy of conversational dynamics, supporting temporally informed digital phenotypes for mental-health assessment.
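
As a rough illustration of what a Shannon entropy biomarker on an utterance-level trajectory might look like, the sketch below discretizes a sequence of values into equal-width bins and computes the entropy of the bin distribution. The binning scheme, the bin count and the function name are assumptions; the abstract does not specify how the entropy features were constructed.

```python
import math
from collections import Counter

def shannon_entropy(values, n_bins=4):
    """Shannon entropy (in bits) of a trajectory after equal-width binning."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant trajectory carries no uncertainty
    width = (hi - lo) / n_bins
    # Clamp the maximum value into the last bin.
    bins = [min(int((v - lo) / width), n_bins - 1) for v in values]
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(bins).values())

# Hypothetical per-utterance pitch values for two speakers.
flat_speaker = [120.0, 120.0, 120.0, 120.0]
varied_speaker = [100.0, 110.0, 120.0, 130.0]
print(shannon_entropy(flat_speaker))
print(shannon_entropy(varied_speaker))
```

Two speakers with the same mean pitch can differ sharply in this quantity, which is the contrast with pooled (averaged) features that the abstract emphasizes.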

q-bio.OT 2026-04-28

OxyPOM is a new biogeochemical model that applies different temperature response…

by Ovidio García-Oliva, Carsten Lemmen

OxyPOM: a biogeochemical model for Oxygen and Particulate Organic Matter dynamics with detailed temperature sensitivity

Uniform sensitivities in oxygen models underestimate particulate organic carbon and overestimate nutrients across seasons in low-oxygen water

Periods of low dissolved oxygen concentration -- hypoxia and anoxia -- threaten the health of aquatic ecosystems and the services they provide. Hypoxia is strongly influenced by temperature, but the different sensitivities and response functions of oxygen removal and production processes to temperature are not considered in most models. Here we present OxyPOM -- Oxygen and Particulate Organic Matter, a nuanced temperature-aware process-based biogeochemical model. OxyPOM incorporates nuanced temperature sensitivities for the key oxygen-related processes photosynthesis, re-aeration, respiration, mineralization, and nitrification. Further sensitive variables like optimal light intensity, winter grazing inhibition, and pathogenesis are also represented. Our model was tested in an idealized water column experiment, representing a typical estuarine seasonal low-oxygen environment. Differences between nuanced and uniform temperature sensitivities affect seasonal patterns of oxygen-related processes, resulting in under- or overestimation during different times of the year, with the largest differences in summer. While these changes may balance in the overall annual oxygen budget, uniform sensitivities underestimate particulate organic carbon production by up to a factor of four over the year and overestimate nutrient concentrations. This nuanced approach to temperature sensitivity allows us to explore and test new hypotheses related to climate warming and heatwaves, addressing the ecosystem changes demanded by climate change models.
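
The contrast between uniform and process-specific temperature sensitivity can be sketched with the textbook Q10 rule. This is an assumption made for illustration: the abstract does not state which response functions OxyPOM uses, and the Q10 values below are hypothetical placeholders, not model parameters.

```python
def q10_factor(temp_c, q10, ref_temp_c=20.0):
    """Rate multiplier relative to the reference temperature under a Q10 rule."""
    return q10 ** ((temp_c - ref_temp_c) / 10.0)

# Hypothetical sensitivities: one shared value versus one value per process.
UNIFORM_Q10 = 2.0
PROCESS_Q10 = {"photosynthesis": 1.8, "respiration": 2.2, "nitrification": 2.6}

def rate_multipliers(temp_c, q10_by_process):
    """Multiplier for every process at the given temperature."""
    return {name: q10_factor(temp_c, q10) for name, q10 in q10_by_process.items()}

summer_temp_c = 26.0
uniform = {name: q10_factor(summer_temp_c, UNIFORM_Q10) for name in PROCESS_Q10}
specific = rate_multipliers(summer_temp_c, PROCESS_Q10)
for name in PROCESS_Q10:
    print(name, round(uniform[name], 3), round(specific[name], 3))
```

At the reference temperature the two parameterizations agree by construction; they diverge as temperature departs from it, which is consistent with the abstract's observation that differences are largest in summer.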

q-bio.OT 2026-04-27

Microbial networks rewire by sex in three diseases

by Marianna Milano, Pietro Hiram Guzzi

Differential Analysis of Microbial Interaction Networks

Gene-family association analysis detects interaction shifts between men and women that abundance changes alone do not capture.

Microbiome studies increasingly indicate that disease-associated shifts cannot be understood from compositional changes alone. The functional architecture of microbial communities, encoded in patterns of association among microbial gene families, may reveal how these systems reorganize across biological conditions. Here, we present a network-based framework for characterizing microbiome rewiring across conditions. The approach combines condition-specific network inference, differential network analysis and pathway enrichment to identify interactions that are gained, lost or altered between groups, with a specific focus on sex-dependent differences. We apply the framework to inflammatory bowel disease, type 2 diabetes and atherosclerotic cardiovascular disease, comparing male- and female-specific microbial gene-family networks within each disease context. Across these settings, differential networks reveal extensive rewiring of microbial functional interactions, suggesting that microbiome alterations are shaped not only by changes in abundance but also by shifts in community organization. Importantly, pathway enrichment of rewired interactions uncovers functional signals that are not apparent from individual networks alone, highlighting latent disease- and sex-associated mechanisms. Code, data and supplementary information are available on the web site.

q-bio.OT 2026-04-27

Framework spots cirrhosis endothelial cells and seven key genes

by Xueyuan Huang, Yuheng Wang +9 more

A multi-stage soft computing framework for complex disease modelling and decision support: A liver cirrhosis case study

Multi-stage model stabilizes features with network analysis, uses CNN on disease maps, and beats standard ML on classification.

Liver cirrhosis is a major global health problem causing millions of deaths annually, and timely detection with aggressive treatment can significantly improve patients' quality of life. Modelling complex diseases from biomedical data is computationally challenging due to high dimensionality, strong feature correlations, noise, and limited labelled samples. Conventional Machine Learning (ML) pipelines often struggle with robustness, interpretability, and generalisation under such conditions. In this study, we propose an ML-driven multi-stage decision framework for complex disease modelling and therapeutic exploration. The framework integrates single-cell transcriptomic profiling, high-dimensional network-based feature stabilisation, multi-model learning, deep representation construction, and post-hoc decision support. Specifically, single-cell sequencing data were analysed to identify key cellular subpopulations, followed by high-dimensional weighted gene co-expression network analysis (hdWGCNA) to stabilise gene modules under sparsity and noise. To enhance non-linear feature interaction modelling, tabular molecular features were restructured into two-dimensional disease maps and analysed using a CNN. Finally, molecular docking was incorporated as a decision-support module to evaluate candidate therapeutic compounds. Using liver cirrhosis as a representative case, the framework identified a disease-associated endothelial subpopulation and extracted seven robust signature genes (HSPB1, GADD45A, CLDN5, ATP1B3, C1QBP, ENPP2, and PARL). The CNN-based representation learning module outperformed conventional pipelines in classification. The framework is disease-agnostic and readily extends to other omics-driven biomedical applications involving uncertainty, heterogeneity, and limited samples.

q-bio.OT 2026-04-27

Reproductive history clusters with early multimorbidity in young women

by Sunday A. Adetunji

AI-Derived Reproductive Phenotypes and Explainable ML for Concurrent Early Multimorbidity in U.S. Women: NHANES 2017-March 2020

National survey analysis identifies a high-risk group where 77 percent of women with adverse reproductive burdens already meet multimorbidity

Background: Adverse reproductive history is a multisystemic risk factor, but evidence is constrained by isolated outcome studies, limited adjustment, and non-interpretable algorithmic models. We re-frame the estimand from prediction to concurrent risk classification and emphasize calibration, interpretability, and systematic error. Methods: We analyzed 1,602 U.S. women aged 20-44 years from NHANES 2017-March 2020 with reproductive-history variables, chronic-condition indicators, and PHQ-9 data. Restricted multimorbidity was defined as at least two of hypertension, hypercholesterolemia, cardiovascular disease, kidney disease, and kidney stones. Features were summarized using principal components analysis and k-means clustering. We compared multivariable logistic regression with XGBoost and used SHAP values to quantify contributions. Results: Early multimorbidity occurred in 6.6% (106/1,602); 71.0% had no chronic condition and 22.4% had one. Adverse reproductive burden was common: 58% had at least one adverse reproductive factor and 12.6% had three or more. Four latent phenotypes emerged (n=398, 508, 102, 594), including a fragile subgroup in which 77.5% met the multimorbidity definition. In holdout evaluation, XGBoost improved discrimination relative to logistic regression (ROC-AUC 0.766 vs 0.667), but showed worse probability accuracy and calibration (Brier 0.069 vs 0.059; expected calibration error 0.113 vs 0.037). Dominant drivers were age, PHQ-9 score, income-to-poverty ratio, race/ethnicity, education, and the adverse reproductive index. Conclusions: Principal components analysis and k-means phenotyping revealed that adverse reproductive life-course structure is strongly clustered with concurrent early multimorbidity in U.S. women aged 20-44 years. Although XGBoost improved discrimination, calibration and feature attribution remained essential for reliable translation into practice.
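
The two calibration measures reported above can be sketched as follows. The Brier score is the mean squared difference between predicted probability and outcome; the expected calibration error shown here uses equal-width probability bins, which is one common definition and an assumption, since the abstract does not state the binning used.

```python
def brier_score(y_true, y_prob):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Weighted average gap between mean predicted probability and observed
    event rate, computed over equal-width probability bins."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes into the last bin
        bins[idx].append((y, p))
    n = len(y_true)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_prob = sum(p for _, p in members) / len(members)
        event_rate = sum(y for y, _ in members) / len(members)
        ece += (len(members) / n) * abs(event_rate - mean_prob)
    return ece

# Hypothetical predictions: both positives scored at 0.5, so the model is
# under-confident for this small group.
labels = [1, 1]
probs = [0.5, 0.5]
print(brier_score(labels, probs), expected_calibration_error(labels, probs))
```

A model can rank cases well (high ROC-AUC) and still score poorly on both measures, which is the pattern the abstract reports for XGBoost relative to logistic regression.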

q-bio.OT 2026-04-27

Dual-criterion iteration yields stable 5-miRNA signature from 332 features

by Akbar Yermekov, D.A. Herrera-Martí

StackFeat: a convergent algorithm for optimal predictor selection in genomic data

StackFeat combines signed coefficients and selection frequencies across repeated cross-validations to converge on reliable predictors that,

In high-dimensional genomic data, the curse of dimensionality (d >> n) and limited sampling make feature selection inherently unstable - a critical barrier to biomarker discovery. We introduce StackFeat, an iterative algorithm that accumulates two statistics across repeated cross-validation: signed coefficients (measuring effect strength and direction) and selection frequencies (estimating selection probability). Only features ranking highly by both criteria are retained. On a COVID-19 miRNA dataset (GSE240888), StackFeat identified a stable 5-miRNA signature from 332 features (98.5% reduction), achieving AUC 0.922, significantly outperforming the benchmark 9-gene set (AUC 0.907, p = 0.0016). The signature includes hsa-miR-150-5p, a marker implicated in both COVID-19 survival and Dengue infection. This dual-criterion approach provides convergence guarantees absent in single-criterion methods, enabling discovery of known biomarkers, novel candidates, and previously unknown relationships. Keywords: marker selection, feature selection, bioinformatics, dimensionality reduction, robust algorithm, stacking, miRNA, COVID-19
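
The dual-criterion idea can be illustrated with a loose sketch. It is not the StackFeat algorithm itself: the iterative, convergent part is omitted, and the input format (one coefficient dictionary per cross-validation run), the top-k rule and the function name are assumptions. The point is only that a feature must rank highly by both mean signed coefficient and selection frequency to be kept.

```python
def dual_criterion_select(runs, top_k):
    """Keep features that are in the top_k by |mean signed coefficient|
    and also in the top_k by selection frequency.

    runs: list of dicts mapping feature name -> fitted coefficient;
          a missing or zero coefficient means "not selected" in that run.
    """
    features = sorted({f for run in runs for f in run})
    n_runs = len(runs)
    # Signed averaging lets coefficients that flip sign cancel out.
    mean_coef = {f: sum(run.get(f, 0.0) for run in runs) / n_runs for f in features}
    freq = {f: sum(1 for run in runs if run.get(f, 0.0) != 0.0) / n_runs
            for f in features}
    top_by_coef = set(sorted(features, key=lambda f: -abs(mean_coef[f]))[:top_k])
    top_by_freq = set(sorted(features, key=lambda f: -freq[f])[:top_k])
    return sorted(top_by_coef & top_by_freq)

# Hypothetical coefficients from four cross-validation runs.
runs = [
    {"a": 1.0, "b": 0.9},
    {"a": 1.0, "b": -0.9},
    {"a": 1.0, "b": 0.9, "c": 3.0},
    {"a": 1.0, "b": -0.9},
]
print(dual_criterion_select(runs, top_k=2))  # prints ['a']
```

In this toy input, feature `b` is selected every time but its sign flips, and feature `c` has a large coefficient but appears once; either single criterion would retain one of them, while requiring both retains only `a`.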

q-bio.OT 2026-04-22

Energy gradients localize prebiotic reactions without membranes

by Arturo Tozzi

Energy gradients as potential drivers of pre-cellular chemical organization

Simulations show strong coupled gradients in pH, redox and temperature overcome diffusion to create stable confined chemical states.

The onset of life is often framed around membrane-bound compartments and encoded metabolism, leaving unresolved how spatial organization arose before stable boundaries. In this context, environmental gradients are usually treated as boundary conditions rather than variables structuring chemical dynamics. We ask whether spatial localization and functional coupling can emerge under realistic environmental gradients in the absence of membranes, proposing that spatial variations in energy availability act as organizing variables that bias transport and reaction. We introduce a reaction-diffusion model in which interacting chemical species evolve within an externally imposed activity landscape defined by coupled gradients in pH, redox potential and temperature, integrating diffusion, gradient-driven drift and position-dependent reaction kinetics. We performed simulations across a range of gradient strengths representative of hydrothermal-vent-like conditions. Our results suggest that sufficiently strong gradients induce spontaneous accumulation of reactants, spatial alignment of reaction maxima and the emergence of stable, confined chemical states. Localization arises above a threshold at which gradient-driven transport overcomes diffusive and degradative losses. We conclude that spatially structured energy landscapes can support organized chemical dynamics without predefined compartments, providing a mechanism for coupling and persistence in continuous media. Potential applications include experimental platforms for studying prebiotic chemistry, microfluidic systems with controlled gradients and the design of chemically responsive materials.

q-bio.OT 2026-04-21

Models enforce logic and yield intuition in microbiology

by Jamie A. Lopez, Amir Erez

Mathematical modeling and intuition in microbiology: a perspective

They require consistent hypotheses, support testable forecasts, extract unmeasured parameters, and guide choices of model detail for any set

Mathematical models are increasingly a part of microbiological research. Here, we share our perspective on how modeling advances the discipline by: (i) enforcing logical consistency, (ii) enabling quantitative prediction, (iii) extracting hidden parameters from data, and (iv) generating intuitive understanding. We map a spectrum of modeling frameworks, from whole-cell simulations to minimal logistic growth equations, and provide interactive examples for some common frameworks. Building on this overview, we outline pragmatic criteria for choosing an appropriate level of description to capture phenomena of interest. Finally, we present a case study in modeling of microbial ecosystems from our own work to illustrate how mechanistic modeling can yield generalizable intuition. This perspective aims to be an introductory roadmap for integrating mathematical modeling into experimental microbiology.

q-bio.OT 2026-04-21

Random Forest models predict anti-dementia activity in natural compounds

by Hafiza Syeda Yusra Tirmizi, Syed Ibad Hasnain +3 more

Predictive Modelling of Natural Medicinal Compounds for Alzheimer disease Using Machine Learning and Cheminformatics

Lipophilicity, molecular weight and polarity emerge as the key features driving accuracy in database-screened predictions

Alzheimer disease (AD) is a neurodegenerative disease that lacks specific treatment options. Natural drugs have displayed neuroprotective effects; however, their high-throughput discovery is challenging because of the expense of experimental testing. The study proposed a machine learning approach to identify the anti-dementia activity of natural compounds based on molecular descriptors obtained from cheminformatics. The study used a set of active and inactive compounds obtained from public databases like ChEMBL and PubChem. Various molecular descriptors, including molecular weight, lipophilicity (LogP), topological polar surface area (TPSA), and hydrogen bonding descriptors, were calculated with RDKit. Data preprocessing and feature selection were applied, followed by the development of several classification models (Random Forest, XGBoost, Support Vector Machines, Logistic Regression) and their evaluation based on accuracy, precision, recall, F1-score and ROC-AUC. The outcome suggests that ensemble techniques, such as Random Forest, delivered the best predictive accuracy and ROC-AUC values. Feature importance analysis also highlights that physicochemical descriptors, in particular lipophilicity, molecular weight and polarity, are important in driving neuroprotective activity. The integrated machine learning approach shows the potential of combining natural product research and machine learning in early drug discovery for dementia. Such approaches provide a means of rapidly exploring large datasets and selecting candidates for experimental confirmation, thus minimising costs and time in the development of drugs for neurodegenerative diseases.

q-bio.OT 2026-04-15

Glucose baselines shift with prior meal response size

by Arturo Tozzi

Baseline glycemia exhibits non-random, history-dependent variation across repeated meals

Repeated identical meals produce non-random baseline changes scaled to the last post-meal rise, indicating memory in glucose control.

Glycemic regulation is often described as maintaining glucose levels near a stable baseline. However, continuous glucose monitoring after meals displays intra-individual variability even under controlled conditions, suggesting intrinsic system dynamics beyond sensor noise, measurement error or short-term variability around a fixed set point. Therefore, we estimated pre-meal glucose baselines, tracking their changes across repeated identical meal challenges within individuals. The baseline was defined as the median glucose level in a pre-meal window, while successive displacements were computed between consecutive repetitions. Using a publicly available dataset of normoglycemic subjects, we observed systematic changes in baseline levels across repeated exposures. These displacements exceeded short-term fluctuations within the same pre-meal interval and were robust to alternative baseline definitions. Moreover, the magnitude of each baseline shift was positively related to the size of the preceding postprandial response. This association persisted under permutation testing, indicating that it cannot be explained by random temporal ordering. Overall, these findings suggest that glycemic dynamics cannot be fully described as independent fluctuations around a fixed baseline. Instead, baseline levels evolve across repeated perturbations through history-dependent adjustments, such that each perturbation influences subsequent system states. Potential applications include refined interpretation of continuous glucose monitoring data and development of models that incorporate temporal dependence in glucose dynamics.
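
The analysis steps named in the abstract (median pre-meal baseline, displacement between consecutive repetitions, permutation test of the association with the preceding response) can be sketched as below. The choice of Pearson correlation as the association statistic, the two-sided test, the function names and the example numbers are all assumptions; the abstract does not give these details.

```python
import math
import random
from statistics import median

def baseline(pre_meal_glucose):
    """Baseline as the median glucose reading in the pre-meal window."""
    return median(pre_meal_glucose)

def displacements(baselines):
    """Change in baseline between each pair of consecutive meal repetitions."""
    return [b - a for a, b in zip(baselines, baselines[1:])]

def pearson(x, y):
    """Pearson correlation; inputs must not be constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the correlation between x and y,
    obtained by shuffling the order of y."""
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(x, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical data: postprandial rise after meals 1..5 and the magnitude of
# the baseline shift observed before the following meal (mg/dL).
previous_rise = [20.0, 35.0, 28.0, 50.0, 42.0]
shift_magnitude = [1.0, 3.0, 2.5, 5.0, 3.5]
print(pearson(previous_rise, shift_magnitude))
print(permutation_p_value(previous_rise, shift_magnitude))
```

Shuffling the pairing destroys any dependence on meal order, so a small p-value indicates that the observed association is unlikely under random temporal ordering.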

cs.AI 2026-04-14 2 theorems

AI exercise advice stays similar in meaning but shifts in intensity

by Kihyuk Lee

Consistency of AI-Generated Exercise Prescriptions: A Repeated Generation Study Using a Large Language Model

Repeated prompts to one model produce high semantic match yet inconsistent workout numbers, pointing to limits for direct clinical use.

Background: Large language models (LLMs) have been explored as tools for generating personalized exercise prescriptions, yet the consistency of outputs under identical conditions remains insufficiently examined. Objective: This study evaluated the intra-model consistency of LLM-generated exercise prescriptions using a repeated generation design. Methods: Six clinical scenarios were used to generate exercise prescriptions using Gemini 2.5 Flash (20 outputs per scenario; total n = 120). Consistency was assessed across three dimensions: (1) semantic consistency using SBERT-based cosine similarity, (2) structural consistency based on the FITT principle using an AI-as-a-judge approach, and (3) safety expression consistency, including inclusion rates and sentence-level quantification. Results: Semantic similarity was high across scenarios (mean cosine similarity: 0.879-0.939), with greater consistency in clinically constrained cases. Frequency showed consistent patterns, whereas variability was observed in quantitative components, particularly exercise intensity. Unclassifiable intensity expressions were observed in 10-25% of resistance training outputs. Safety-related expressions were included in 100% of outputs; however, safety sentence counts varied significantly across scenarios (H=86.18, p < 0.001), with clinical cases generating more safety expressions than healthy adult cases. Conclusions: LLM-generated exercise prescriptions demonstrated high semantic consistency but showed variability in key quantitative components. Reliability depends substantially on prompt structure, and additional structural constraints and expert validation are needed before clinical deployment.
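
The semantic-consistency measure can be illustrated with a short sketch of mean pairwise cosine similarity over the outputs of one scenario. In the study the vectors would be SBERT sentence embeddings; here they are small hypothetical vectors, and averaging over all unordered pairs is an assumption about how the per-scenario mean was formed.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mean_pairwise_cosine(vectors):
    """Average cosine similarity over all unordered pairs of vectors."""
    pairs = list(combinations(vectors, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)

# Hypothetical 2-D "embeddings" of three generated prescriptions.
outputs = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(mean_pairwise_cosine(outputs))
```

High mean similarity of this kind says the outputs are close in meaning; it does not guarantee that numeric details such as intensity agree, which is the gap the study reports.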

math.DS 2026-04-10 2 theorems

Tiered framework eases QSP model reporting for regulators

by Susana Zaph, Blerta Shtylla +13 more

Best Practices on QSP Model Reporting for Regulatory Use: perspectives from ISoP QSP SIG Working Group

Flexible tiers based on development phase and model impact help accommodate QSP diversity while meeting regulatory needs.

Quantitative systems pharmacology (QSP) models are increasingly applied to inform decision making across drug development and to support regulatory interactions within model-informed drug development (MIDD). QSP supports a broad range of applications across drug development and can be tailored to specific therapeutic areas, mechanisms of action, and contexts of use (CoU). While this diversity is a core strength of QSP, it also presents challenges for reporting for regulatory use. Despite the growing impact of QSP models, there is currently no established guidance on how QSP analyses should be documented and reported for regulatory purposes. This white paper, developed by the International Society of Pharmacometrics (ISoP) QSP Special Interest Group Working Group on Credibility Assessment of QSP for Regulatory Use, seeks to address this gap by proposing best practices for QSP model reporting in regulatory settings. The recommendations are grounded in collective real-world experience from regulatory interactions and are aligned with reporting guidance established for physiologically based pharmacokinetic (PBPK) modeling and reporting principles outlined in ICH M15. Rather than prescribing a rigid, one-size-fits-all template, this work proposes a flexible, tiered reporting framework that accounts for development phase and model impact. The proposed framework is intended to facilitate regulatory review and enhance transparency while accommodating the inherent diversity of QSP modeling.

q-bio.OT 2026-03-02 Recognition

Healthy foot sensor data sets baseline for diabetic ulcer detection

by Md Tanvir Hasan Turja

Unsupervised Anomaly Detection in Wearable Foot Sensor Data: A Baseline Feasibility Study Towards Diabetic Foot Ulcer Prevention

Unsupervised Isolation Forest and KNN/LOF on temperature and pressure readings create reference pipeline for future patient studies.

Diabetic foot ulcers (DFUs) are a severe complication of diabetes associated with significant morbidity, amputation risk, and healthcare burden. Developing effective continuous monitoring frameworks requires first establishing reliable baseline models of normal foot biomechanics. This paper presents a feasibility study of an anomaly detection framework applied to time-series data from wearable foot sensors, specifically NTC thin-film thermocouples for temperature and FlexiForce A401 pressure sensors for plantar load monitoring. Data were collected from healthy adult subjects across 312 capture sessions on an instrumented pathway, generating 93,790 valid multi-sensor readings spanning September 2023 to June 2024. Two unsupervised algorithms, Isolation Forest and K-Nearest Neighbors using Local Outlier Factor (KNN/LOF), were applied to detect statistical deviations in foot temperature and pressure signals. Results show that Isolation Forest is more sensitive to subtle, distributed anomalies, while KNN/LOF identifies concentrated extreme deviations but flags a higher proportion of sessions not corroborated by Isolation Forest. Since no clinical ground truth is available, this difference is interpreted as lower specificity under the shared 5 percent contamination assumption rather than a confirmed false-positive rate. A mild positive correlation (0.41-0.48) between pressure and temperature features supports the case for combined multi-modal monitoring. These findings establish a validated baseline analytical pipeline and provide a methodological foundation for future clinical validation studies involving diabetic patients, where the relationship between detected anomalies and DFU-related pathophysiology can be directly assessed.

q-bio.OT 2025-11-17 2 theorems

Blood signatures cluster 103 diseases into 16 mechanistic groups

by Bolin Liu, Abicumaran Uthamacumaran +2 more

Multi-omic Enriched Blood-Derived Digital Signatures Reveal Mechanistic and Confounding Disease Clusters for Differential Diagnosis

Routine lab analytes reveal cytokine pathways that link conditions across traditional categories.

Understanding disease relationships through blood biomarkers offers a pathway toward data-driven taxonomy and precision medicine. In this study, we constructed a digital blood twin, a computational model derived from 103 disease signatures comprising longitudinal hematological and biochemical analytes. Profiles were standardized into a unified disease-analyte matrix, and pairwise Pearson correlations were computed to assess similarity across conditions. Hierarchical clustering revealed consistent grouping of hematopoietic disorders, while metabolic, endocrine, and respiratory diseases were more heterogeneous, reflecting weaker internal cohesion. To evaluate cluster structure, the tree was partitioned at a stringent distance threshold, yielding 16 groups. Enrichment analysis of the largest and most heterogeneous cluster demonstrated convergence on cytokine-signaling pathways, indicating shared inflammatory mechanisms that transcend conventional clinical boundaries. PCA and UMAP corroborated the correlation-based results, consistently separating hematological diseases as a distinct cluster. Random Forest feature selection identified neutrophils, mean corpuscular volume, red blood cell count, and platelet count as the most discriminative analytes, reinforcing the role of hematopoietic markers as key drivers of disease stratification. Collectively, these findings show that blood-derived digital signatures can recover clinically meaningful disease clusters while uncovering mechanistic overlaps across categories. This network physiology framework highlights the potential of integrating routine laboratory data with computational methods to refine disease ontology, map comorbidities, and advance precision diagnostics.

q-bio.OT 2025-11-13 2 theorems

Blood count reference intervals lack geographic structure

by Kunlin Wu, Abicumaran Uthamacumaran +1 more

Data from 28 countries shows high similarity across populations, supporting shift to personalized baselines instead of universal norms

Blood reference intervals (RIs) underpin diagnostic interpretation and therapeutic monitoring worldwide. However, many widely used RI systems originate from limited historical cohorts and have been propagated across health systems without harmonised derivation protocols, shared metadata, or cross-population validation. Consequently, the global RI landscape reflects a heterogeneous mixture of legacy standards and local laboratory practices rather than a biologically grounded framework. Here we examine published Complete Blood Count (CBC) reference intervals, one of the most commonly used laboratory panels worldwide. We compiled CBC RI data from 28 countries and analysed their similarity using variability mapping, hierarchical clustering, information-theoretic distances, cohesion benchmarking, and nonlinear manifold visualisation. Body mass index (BMI) served as a methodological positive benchmark and exhibited clear continent-level clustering (mean cohesion approximately 0.78-0.81). In contrast, CBC reference intervals showed no reproducible geography-linked clustering across methods, with uniformly high cohesion scores (mean approximately 1.27-1.30). Weak signals in red-cell indices (MCV, HGB) were unstable across sexes and distance metrics. This absence of structure should not be interpreted as evidence that current CBC reference intervals represent universal biological standards. Rather, it is more consistent with the fragmented and historically inherited nature of the global RI landscape. These findings indicate that published CBC reference intervals do not encode coherent global structure and provide limited support for universal population-based diagnostic thresholds. Instead, they support a transition toward recalibrated and personalised reference frameworks based on longitudinal individual baselines and harmonised derivation standards.

q-bio.OT 2025-11-06 2 theorems

Surgical AI validation often ignores video timing and structure

by Annika Reinke, Ziying O. Li +96 more

Current validation practice undermines surgical AI development

This leads to unstable and clinically irrelevant performance claims that slow adoption in operating rooms.

Surgical data science (SDS) is rapidly advancing, yet clinical adoption of artificial intelligence (AI) in surgery remains limited, with inadequate validation emerging as an important contributing factor. In fact, existing validation practices often neglect the temporal and hierarchical structure of intraoperative videos, producing misleading, unstable, or clinically irrelevant results. In a pioneering, consensus-driven effort, we introduce a comprehensive catalog of validation pitfalls in AI-based surgical video analysis that was derived from a multi-stage Delphi process with 92 international experts. The collected pitfalls span three categories: (1) data (e.g., incomplete annotation, spurious correlations), (2) metric selection and configuration (e.g., neglect of temporal stability, mismatch with clinical needs), and (3) aggregation and reporting (e.g., clinically uninformative aggregation, failure to account for frame dependencies in hierarchical data structures). A systematic review of surgical AI papers reveals that these pitfalls are widespread in current practice, with the majority of studies failing to account for temporal dynamics or hierarchical data structure, or relying on clinically uninformative metrics. Experiments on real surgical video datasets provide empirical evidence that ignoring temporal and hierarchical data structures can substantially understate uncertainty, obscure critical failure modes, and even alter algorithm rankings. To address these shortcomings, we provide a catalogue of best practices compiled in a multi-stage Delphi process. Together, this work provides an evidence-based framework to inform more rigorous validation of surgical video analysis algorithms and to guide future efforts in benchmarking, reporting, regulatory review, and clinical translation.

q-bio.OT 2025-09-03 1 theorem

Negative notes raise schizophrenia diagnosis odds for Black men

by Alissa A. Valentine, Lauren A. Lepow +4 more

Bias Detection in Emergency Psychiatry: Linking Negative Language to Diagnostic Disparities

Language model analysis of emergency notes shows high negative content weakens race effects and hits Black males hardest.

The emergency department (ED) is a high-stress environment with increased risk of clinician bias exposure. In the United States, Black patients are more likely than patients of other racial/ethnic groups to receive their first diagnosis of schizophrenia (SCZ), a highly stigmatizing disorder, in the ED. Therefore, understanding the link between clinician bias exposure and psychiatric outcomes is critical for promoting nondiscriminatory decision-making in the ED. This study examines the association between clinician bias exposure and psychiatric diagnosis using a sample of patients with anxiety, bipolar, depression, trauma, and SCZ diagnoses (N=29,005) from a diverse, large medical center. Clinician bias exposure was quantified as the ratio of negative sentences to the total number of sentences in psychiatric notes, labeled using a large language model (Mistral). We used logistic regression to predict SCZ diagnosis while controlling for patient demographics, risk factors, and negative sentence ratio (NSR). A high NSR significantly increased a patient's odds of receiving an SCZ diagnosis and attenuated the effects of patient race. Black male patients with high NSR had the highest odds of being diagnosed with SCZ. Our findings suggest sentiment-based metrics can operationalize clinician bias exposure with real-world data and reveal disparities beyond race or ethnicity.
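The negative sentence ratio described above is a simple proportion, and a minimal sketch may make the definition concrete. The function name, the label strings, and the example note are assumptions for illustration. In the study the sentence labels come from a language model, whereas here they are hard-coded.

```python
from typing import Sequence

def negative_sentence_ratio(labels: Sequence[str]) -> float:
    """NSR = (sentences labeled negative) / (total sentences in the note)."""
    if not labels:
        raise ValueError("note has no labeled sentences")
    negative = sum(1 for label in labels if label == "negative")
    return negative / len(labels)

# Hypothetical sentence-level labels for a single note.
note_labels = ["neutral", "negative", "neutral", "negative", "positive"]
nsr = negative_sentence_ratio(note_labels)
print(f"NSR = {nsr:.2f}")
```

A per-note value like this could then enter a regression as a covariate alongside demographics, which is the role the abstract assigns to NSR.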

q-bio.OT 2025-07-28 1 theorem

Ontology unifies NIRS concepts for acute respiratory care

by Md Fantacher Islam, Jarrod Mosier +1 more

Development and Evaluation of an Ontology for Non-Invasive Respiratory Support in Acute Care

145 classes and SWRL rules enable consistent documentation and therapy recommendations across scenarios.

Managing patients with respiratory failure increasingly involves noninvasive respiratory support (NIRS) strategies, which often prevent the need for invasive mechanical ventilation. However, despite the rapidly expanding use of NIRS, its optimal use across all medical circumstances remains a significant challenge. NIRS lacks a unified ontological structure, complicating guidance on NIRS modalities across healthcare systems. This study introduces a NIRS ontology to support knowledge representation in acute care settings by providing a unified framework that enhances data clarity and interoperability, laying the groundwork for future clinical decision-making. We developed the NIRS ontology using the Web Ontology Language (OWL) and Protégé to organize clinical concepts and relationships. To enable rule-based clinical reasoning beyond hierarchical structures, we added Semantic Web Rule Language (SWRL) rules. We evaluated logical reasoning by adding a sample of 6 patient scenarios and used SPARQL queries to retrieve and test targeted inferences. The ontology has 145 classes, 11 object properties, and 18 data properties across 949 axioms that establish concept relationships. To standardize clinical concepts, we added 392 annotations, including descriptive definitions based on controlled vocabularies. SPARQL query evaluations across clinical scenarios confirmed the ontology's ability to support rule-based reasoning and therapy recommendations, providing a foundation for consistent documentation practices, integration into clinical data models, and advanced analysis of NIRS outcomes. In conclusion, we unified NIRS concepts into an ontological framework and demonstrated its applicability through the evaluation of patient scenarios and alignment with standardized vocabularies.

cs.AI 2025-02-27 3 theorems

AI proposes lab-validated candidates for leukemia and fibrosis

by Juraj Gottweis, Wei-Hung Weng +32 more

Towards an AI co-scientist

Multi-agent Gemini system generates hypotheses that inhibit tumors in vitro and reduce fibrosis in organoids.

Scientific discovery relies on scientists generating novel hypotheses that undergo rigorous experimental validation. To augment this process, we introduce an AI co-scientist, a multi-agent system built on Gemini 2.0. The AI co-scientist is intended to help uncover new, original knowledge and to formulate demonstrably novel research hypotheses and proposals, building upon prior evidence and aligned to scientist-provided research objectives and guidance. The system's design incorporates a generate, debate, and evolve approach to hypothesis generation, inspired by the scientific method and accelerated by scaling test-time compute. Key contributions include: (1) a multi-agent architecture with an asynchronous task execution framework for flexible compute scaling; (2) a tournament evolution process for self-improving hypothesis generation. Automated evaluations show continued benefits of test-time compute, improving hypothesis quality. While general purpose, we focus development and validation in three biomedical areas: drug repurposing, novel target discovery, and explaining mechanisms of bacterial evolution and anti-microbial resistance. For drug repurposing, the system proposes candidates with promising validation findings, including candidates for acute myeloid leukemia that show tumor inhibition in vitro at clinically applicable concentrations. For novel target discovery, the AI co-scientist proposed new epigenetic targets for liver fibrosis, validated by anti-fibrotic activity and liver cell regeneration in human hepatic organoids. Finally, the AI co-scientist recapitulated unpublished experimental results via a parallel in silico discovery of a novel gene transfer mechanism in bacterial evolution. These results, detailed in separate, co-timed reports, demonstrate the potential to augment biomedical and scientific discovery and usher in an era of AI-empowered scientists.

q-bio.OT 2024-09-25 Recognition

Brain-heart signal methods establish interactions as nervous system biomarkers

by Diego Candia-Rivera, Luca Faes +2 more

Measures and Models of Brain-Heart Interactions

Review shows current processing techniques link cardiac inputs to brain changes, enabling evaluation of neurological state and disease risk.

The exploration of brain-heart interactions within various paradigms, including affective computing, human-computer interfaces, and sensorimotor evaluation, stands as a significant milestone in biomarker development and neuroscientific research. A range of techniques, spanning from molecular to behavioral approaches, has been proposed to measure these interactions. Different frameworks use signal processing techniques, from the estimation of brain responses to individual heartbeats to higher-order dynamics linking cardiac inputs to changes in brain organization. This review provides an overview of the most notable signal processing strategies currently used for measuring and modeling brain-heart interactions. It discusses their usability and highlights the main challenges that need to be addressed for future methodological developments. Current methodologies have deepened our understanding of the impact of neural disruptions on brain-heart interactions, solidifying their role as a biomarker for evaluation of the physiological state of the nervous system and holding immense potential for disease stratification. The vast outlook of these methods becomes apparent especially in neurological and psychiatric disorders. As we tackle new methodological challenges, gaining a more profound understanding of how these interactions operate, we anticipate further insights into the role of peripheral neurons and the environmental input from the rest of the body in shaping brain functioning.

stat.AP 2019-07-08 Recognition

Semi-Markov model yields analytic indexes for 3-base DNA periodicity

by Pavlos Kolias, Alexandra Papadopoulou

Investigating some attributes of periodicity in DNA sequences via semi-Markov modelling

Closed-form probabilities describe the repeating pattern that marks protein-coding gene segments, shown on synthetic and real sequences.

DNA segments and sequences have been studied thoroughly during the past decades. One of the main problems in computational biology is the identification of exon-intron structures inside genes using mathematical techniques. Previous studies have used different methods, such as Fourier analysis and hidden Markov models, to predict which parts of a gene correspond to a protein-coding region. In this paper, a semi-Markov model is applied to 3-base periodic sequences, which characterize the protein-coding regions of the gene. Analytic forms of the related probabilities and the corresponding indexes are provided, which yield a description of the underlying periodic pattern. Finally, the theoretical results are illustrated with synthetic and real DNA sequences.
