eess — Pith

Top Pith

cs.CV 2026-05-20 2 theorems

Vision-language models lag behind a text-only baseline on 3D brain MRI benchmark

by Mohammad H. Abbasi, Favour Nerrise +13 more

NeuroQA: A Large-Scale Image-Grounded Benchmark for 3D Brain MRI Understanding

Top vision-language models reach only 47.5 percent on verified questions while text statistics yield 49.4 percent.

We present NeuroQA, a large-scale benchmark for visual question answering in 3D brain magnetic resonance imaging (MRI), with 56,953 QA pairs from 12,977 subjects across 12 datasets. It spans ages 5-104 and five clinical domains: Alzheimer's, Parkinson's, tumors, white matter disease, and neurodevelopment. Unlike prior medical Visual Question Answering (VQA) efforts that operate on 2D slices or rely on narrow diagnostic labels, NeuroQA pairs every item with a full 3D volume. It evaluates 11 clinically grounded reasoning skills across Yes/No, multiple-choice, and open-ended formats. Of the 203 templates, 131 are image-grounded (answerable from a 3-plane viewer) and 72 are image-informed (ground truth from quantitative volumetry or clinical instruments). To remove text-only shortcuts, we apply answer-distribution refinement, reducing closed-format text-only accuracy from >80% to 44.6%; image necessity is assessed separately through an image-grounding protocol released with the benchmark. A 38-rule deterministic pipeline and two rounds of expert review verify every QA pair against FreeSurfer measurements, metadata, or radiology report fields, with zero same-subject contradictions across templates. We conduct a clinician evaluation in which two clinicians independently assess 100 frozen test items on a three-plane viewer. On closed-format (Yes/No + multiple-choice) test-public items, the best zero-shot vision-language model and a supervised 3D CNN baseline reach 47.5% and 43.7% accuracy respectively, both below the 49.4% text-only majority-template floor. NeuroQA adopts a two-tier release with public QA pairs for open-access datasets and reproducible generation scripts for datasets restricted by data use agreements (DUAs), plus subject-level splits, a held-out private test set, and an online leaderboard.
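
The abstract does not spell out how the "text-only majority-template floor" is computed. A minimal sketch of one plausible reading, assuming the floor is the accuracy obtained by always answering each template with its most frequent training answer (the function name, tuple layout, and toy data are hypothetical):

```python
from collections import Counter, defaultdict

def majority_template_floor(train_items, test_items):
    """Accuracy of answering every question with its template's majority answer.

    Items are (template_id, answer) pairs; no image is consulted at all.
    """
    # Count how often each answer occurs per template on the training split.
    counts = defaultdict(Counter)
    for template_id, answer in train_items:
        counts[template_id][answer] += 1

    # The prediction for a template is its single most common answer.
    majority = {t: c.most_common(1)[0][0] for t, c in counts.items()}

    # Score the fixed predictions on the test split.
    correct = sum(1 for t, a in test_items if majority.get(t) == a)
    return correct / len(test_items)

# Toy data (hypothetical, not from the benchmark).
train = [("t1", "yes"), ("t1", "yes"), ("t1", "no"),
         ("t2", "A"), ("t2", "B"), ("t2", "B")]
test = [("t1", "yes"), ("t1", "no"), ("t2", "B"), ("t2", "A")]
floor = majority_template_floor(train, test)
print(floor)
```

A model that scores below this number is, on closed-format items, doing no better than a lookup table keyed on the question template.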

eess.AS 2026-05-14 Recognition

Benchmark standardizes early Parkinson's speech detection

by Terry Yi Zhong, Cristian Tejedor-Garcia +4 more

A Benchmark for Early-stage Parkinson's Disease Detection from Speech

Speaker-independent splits on accessible datasets enable fair, replicable comparisons across tasks and training settings.

Early-stage Parkinson's disease (EarlyPD) detection from speech is clinically meaningful yet underexplored, and published results are hard to compare because studies differ in datasets, languages, tasks, evaluation protocols, and EarlyPD definitions. To address this issue, we propose the first benchmark for speech-based EarlyPD detection, with a speaker-independent split designed for fair and replicable cross-method evaluation on researcher-accessible datasets. The benchmark covers three common speech tasks and evaluates methods under different training-resource settings. We also present multi-dimensional evaluation breakdowns by dataset, aggregation level, gender, and disease stage to support fine-grained comparisons and clinical adoption. Our results provide a replicable reference and actionable insights, encouraging the adoption of this publicly available benchmark to advance robust and clinically meaningful EarlyPD detection from speech.

eess.SY 2026-05-06

EV with rooftop PV and V2G saves up to EUR 2410 yearly

by Francesco Popolizio, Albert Škegro +3 more

Online Energy Management for Bidirectional EV Charging with Rooftop PV: An Aging-Aware MPC Approach

Aging-aware MPC optimizes bidirectional flows for arbitrage and self-consumption, beating unidirectional charging with only 1.27 percent extra battery degradation.

This paper investigates the economic impact of vehicle-home-grid integration in the presence of rooftop PV, by proposing an online, aging-aware energy management strategy for an electric vehicle (EV), a household, and the electrical grid. The model predictive control-based framework explicitly exploits vehicle-to-grid (V2G) and vehicle-to-home (V2H) operation to perform energy arbitrage and increase self-consumption while respecting user-driven driving requirements. The framework optimizes power flows over a shrinking horizon using a detailed battery aging model that captures both calendar and cycle degradation, and a Transformer-based forecaster that provides short-term predictions of household load and solar irradiance. For a one-year horizon, the proposed strategy yields the lowest annual cost among all evaluated strategies. Adding PV increases the annual profit by EUR 1060.7 compared to operating without PV, and yields an economic gain of up to EUR 2410.5 over smart unidirectional charging, at the expense of only 1.27% extra battery degradation. Even in the least favorable case with no remuneration for V2G energy, bidirectional operation still delivers an economic gain of EUR 355.8 through V2H. Sensitivity analyses over V2G price ratio, EV battery size, household demand, and pickup time uncertainty confirm that these benefits persist across a wide range of scenarios and highlight the potential of EVs as active energy nodes, enabling sustainable energy management and cost-effective battery usage in real-world conditions.

eess.SY 2026-07-03

Sliding mode law docks 3D vehicles using range and sight angles

by Ram Milan Kumar Verma, Shashi Ranjan Kumar +1 more

Docking of Autonomous Vehicles with a Stationary Docking Station in 3D Space

Finite-time controller aligns orientation and brings speed to near zero for safe station approach.

In this letter, we present a strategy for autonomous docking of autonomous vehicles in three-dimensional space. Docking is a safety-critical task and requires expert piloting skills. Vehicles with autonomous docking capabilities are highly desirable in various applications, such as marine vehicle docking, aerial vehicle docking, spacecraft docking, and landing. To dock autonomously with the docking station, the vehicle must align itself to a specific desired orientation relative to the docking station and also reduce speed as it approaches. The vehicle achieves near-zero speed to dock successfully and safely without colliding with the docking station. Inspired by the philosophies from the guidance literature, we present a finite-time sliding mode-based strategy to achieve the same. The range and line-of-sight kinematics relations describing the motion of the vehicle with respect to the stationary docking station are used to steer the vehicle to achieve the desired orientation for docking. This docking strategy is validated in MATLAB® simulations for various initial locations and orientations of both the vehicle and the docking station.

cs.RO 2026-07-03

QuadRocket achieves almost global trajectory tracking with adaptive control

by Pedro Santos, Joel Reis +2 more

QuadRocket: An Aerial Robotic Testbed for Adaptive Thrust-Vector Control of Rocket-Like Vehicles

The quadrotor-based rocket prototype models the vehicle as an axisymmetric body to enable disturbance rejection in thrust-vector control.

This paper presents QuadRocket, a quadrotor-based rocket prototype that provides a low-cost, low-risk platform for validating advanced thrust-vector control strategies for launch vehicle-type systems. The prototype consists of a cylindrical main body mounted on top of a quadrotor through a universal joint, forming a flying inverted pendulum with non-negligible inertia. For control design, the coupled system is modeled as a single axisymmetric rigid body actuated by a vectored force applied along its longitudinal axis. A reduced-attitude representation on the two-sphere is adopted to explicitly exploit the vehicle's axial symmetry and to decouple yaw from the thrust-vector direction. On this model, we derive an adaptive backstepping controller that achieves almost global trajectory tracking in the presence of unknown constant disturbances, while a control-point transformation mitigates non-minimum-phase behavior. The quadrotor is then treated as a thrust-vector actuator, and a dynamic-surface-based attitude controller is designed to track the desired thrust vector, accounting for actuation dynamics and avoiding explicit differentiation of virtual control signals. The complete architecture is evaluated in simulation and validated experimentally in an indoor motion-capture arena. Results demonstrate accurate trajectory tracking and effective disturbance compensation, and confirm the suitability of the QuadRocket as a versatile testbed for thrust-vector-controlled robotic vehicles.

cs.CL 2026-07-03

Narration acoustics predict audiobook appeal beyond title

by Shahar Elisha, Mariano Beguerisse-Díaz +1 more

Audio-Based Understanding of Audiobook Narration Appeal

Vocal features extracted from recordings remain tied to view-rate and engagement after title controls are applied.

Narration is central to the audiobook listening experience, shaping how listeners engage with and understand the content. This work explores how narration qualities shape an audiobook's appeal, noting that their effects can vary by genre, title, and audience. We extract vocal and acoustic features (e.g., tone, pace, loudness) from LibriVox using pre-trained audio models and analyse their relationship with consumption data (specifically, view-rate) and their interplay with genre and title. Despite limited consumption data, we find that acoustic information alone has a robust association with appeal, even after accounting for title effects. We further validate these findings using more nuanced proprietary engagement metrics. To our knowledge, this is the first systematic computational study linking narration qualities, genre, title, and audiobook consumption, highlighting the potential of data-driven insights to improve audiobook personalisation and narrator casting.

eess.SY 2026-07-03

Torque tuning steers nonholonomic vehicle to source orbit

by Bo Wang

Nonholonomic Source Seeking by Torque Tuning: Local and Semi-Global Feedbacks

Two feedback laws achieve local and semi-global stability from scalar sensor data alone, without position or gradient information.

This paper studies source seeking for a torque-controlled nonholonomic vehicle with a laterally displaced scalar sensor. The vehicle has constant forward speed, while its yaw motion is controlled by torque input with unknown inertia and damping. The objective is to steer the vehicle to a source-centered circular motion so that the lateral sensor approaches the unknown source, without using position, heading, source-location, gradient, or source-value information. The proposed torque law combines a fast oscillatory component, which generates averaged steering through symmetric-product approximation, with a slowly tuned bias component, which selects the desired orbit. Two bias-tuning designs are developed. The first is an output-feedback design using only the scalar measurement; it applies a Lie-bracket extremum-seeking update and yields local practical stability. The second is a velocity-assisted design using forward-speed and yaw-rate measurements; it tunes the bias through the yaw-rate tracking error and yields a globally asymptotically stable averaged system, implying semi-global practical stability of the original system. Simulations illustrate the proposed designs.

cs.CV 2026-07-03

Multi-expert model cuts OOD false positives on medical scans

by A.S. Anudeep, Vaanathi Sundaresan

MARVEL: Margin-Aware Robust von Mises-Fischer Expert Learning for Long-Tailed Out-of-Distribution Detection

Margin-aware nonlinear von Mises-Fisher experts plus outlier specialist achieve up to 37 percent lower error rates on three datasets.

For clinical deployment, it is essential that automated diagnostic systems remain reliable when confronted with previously unseen cases, yet deep models routinely misclassify out-of-distribution (OOD) inputs with high confidence, underscoring the need for more robust OOD detection methods. Although substantial effort has been devoted to improving model robustness, most of the existing literature assumes balanced datasets, evaluates OOD detection on coarse or non-clinical OOD sources, or lacks comprehensive assessment across diverse OOD scenarios. To address these gaps, we propose a novel methodology trained on diverse and imbalanced medical datasets and evaluated across a clinically reflective OOD spectrum. Our framework comprises three key components: (1) a Nonlinear von Mises-Fisher (NvMF) classifier capable of learning non-linear decision boundaries, with theoretical proof of its asymptotic connection to cosine classifiers; (2) a multi-expert framework in which margin-aware NvMF classifiers specialise in different regions of label distribution to better handle imbalance; and (3) an outlier expert trained explicitly to distinguish inlier from outlier data, thereby strengthening OOD detection. Evaluation on RFMiD, ISIC2019, and NCTCRC datasets demonstrates consistent improvements over state-of-the-art methods, achieving mean FPR95 reductions of 8.45%, 13.02%, and 36.90% respectively. These gains are further supported by comprehensive ablations that validate the contributions of each component. This enables reliable identification of unfamiliar cases for deferral to clinicians, supporting safer AI-assisted diagnosis in real-world workflows. Our code is available at https://github.com/redboxup/MARVEL.
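
For readers unfamiliar with the FPR95 metric quoted above, here is a minimal sketch of how it is commonly computed. It assumes the usual convention that in-distribution (ID) samples are the positive class and that a higher score means "more in-distribution"; the function name and toy scores are hypothetical and not taken from the MARVEL repository.

```python
def fpr_at_95_tpr(id_scores, ood_scores):
    """False-positive rate on OOD data at the threshold that accepts 95% of ID data."""
    # Sort ID scores from most to least confident.
    ranked = sorted(id_scores, reverse=True)

    # Smallest number of ID samples that makes up at least 95% (integer ceiling).
    k = (95 * len(ranked) + 99) // 100

    # Accept everything scoring at least as high as the k-th ID sample.
    threshold = ranked[k - 1]

    # An OOD sample that clears the threshold is a false positive.
    false_positives = sum(1 for s in ood_scores if s >= threshold)
    return false_positives / len(ood_scores)

# Toy scores (hypothetical): 20 ID samples and 4 OOD samples.
id_scores = [i / 20 for i in range(1, 21)]
ood_scores = [0.02, 0.08, 0.3, 0.6]
fpr95 = fpr_at_95_tpr(id_scores, ood_scores)
print(fpr95)
```

Lower is better: a detector with a lower FPR95 lets fewer unfamiliar cases through as if they were familiar, while still accepting 95% of genuine in-distribution inputs.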

eess.IV 2026-07-03

Self-auditing drift model leads SSIM in accelerated knee MRI

by Qing Lyu, Jianxu Wang +3 more

Self-Auditing Residual Drifting for Pathology-Preserving Accelerated Knee MRI

It adds per-slice risk scores that flag unreliable outputs while preserving lesion detail at high acceleration.

Accelerated magnetic resonance imaging reduces acquisition time, but reconstruction from undersampled k-space can blur diagnostically relevant structures or introduce failures that are not captured by global image metrics. We propose SA-RDM-DC, a Self-Auditing Residual generative Drifting Model with Data Consistency for accelerated knee MRI. The method adapts the newly proposed generative drifting paradigm to accelerated MRI by training a physics-conditioned drift field from the zero-filled reconstruction toward the fully sampled residual correction. It predicts image- and missing-k-space residual corrections, enforces data consistency with acquired k-space, uses frequency-aware and residual drifting supervision to recover fine detail, and produces dense error maps and slice-level risk scores in the same inference pass. We evaluate SA-RDM-DC on multi-coil fastMRI knee data at acceleration factors of 4, 8, and 12, with fastMRI+ pathology annotations for region-level and classifier-based task preservation, and on SKM-TEA for zero-shot and fine-tuned protocol-shift evaluation. Compared with zero-filled reconstruction, UNet-image-SENSE, DC-UNet, Score-Diffusion, ELF-Diff, SENSE-VarNet, and MoDL baselines, SA-RDM-DC achieves the highest SSIM across fastMRI acceleration factors while retaining subsecond per-slice inference and avoiding the long sampling time of iterative diffusion baselines. In pathology-aware analysis, SA-RDM-DC preserves lesion-region structural fidelity and reduces meniscus prediction instability. Its self-auditing scores strongly identify high-error reconstructions on fastMRI and partially transfer as a selective-review signal under SKM-TEA protocol shift. These results support reconstruction evaluation that jointly considers image fidelity, pathology preservation, runtime, and case-specific reliability.
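
The abstract says the method "enforces data consistency with acquired k-space" without giving the exact form. A minimal sketch of the simplest common variant, hard data consistency, in which every sampled k-space location is overwritten with its measured value (the function name and the toy values are hypothetical; the paper may use a soft or learned variant instead):

```python
def enforce_data_consistency(predicted_kspace, acquired_kspace, mask):
    """Keep measured k-space samples; use the network prediction elsewhere.

    mask[i] is True where location i was actually acquired by the scanner.
    """
    return [
        acquired if sampled else predicted
        for predicted, acquired, sampled in zip(predicted_kspace, acquired_kspace, mask)
    ]

# Toy 1D k-space line (hypothetical values); unsampled entries are zero-filled.
predicted = [1 + 1j, 2 + 0j, 3 - 1j, 4 + 2j]
acquired = [1.5 + 1j, 0j, 0j, 4 + 0j]
mask = [True, False, False, True]
consistent = enforce_data_consistency(predicted, acquired, mask)
print(consistent)
```

The point of the step is that the reconstruction can only change what was never measured, so it cannot contradict the scanner data.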

eess.SP 2026-07-03

Optimization boosts sum-rate in RIS-aided RSMA-SWIPT with movable antennas

by Muhammad Asif, Asim Ihsan +5 more

Robust Transmission Design for RIS-Assisted RSMA-SWIPT Systems With Movable Antennas Under Hardware Distortions

Decomposes the coupled problem into convex surrogates to handle CSI errors and hardware distortions while jointly tuning beamforming, reflec

This paper investigates a robust transmission design for a multi-user rate-splitting multiple access (RSMA)-based simultaneous wireless information and power transfer (SWIPT) system empowered by movable antennas (MAs) and a reconfigurable intelligent surface (RIS) under channel state information (CSI) uncertainty and residual hardware impairments (HIs). The effective channels in MAs-enabled systems depend on antenna positions, causing CSI uncertainty to affect not only active and passive beamforming but also antenna position optimization. Furthermore, residual HIs distort the effective SINRs, creating additional coupling among beamforming, RIS reflection control, common-rate allocation, power-splitting ratio optimization, and antenna position optimization. Consequently, the joint impact of CSI uncertainty and HIs leads to a highly coupled and challenging resource allocation problem. To address this challenge, we propose a robust resource allocation framework that jointly optimizes common-rate allocation, transmit beamforming, RIS reflection coefficients, power-splitting ratios, and MAs positions to maximize the achievable sum-rate while satisfying practical system constraints. To obtain an efficient solution, the original problem is decomposed into active beamforming, RIS reflection design, power-splitting ratio optimization, and MAs position optimization subproblems, where tractable convex surrogate functions are constructed to handle the non-convex objective and constraints. Simulation results verify the effectiveness of the proposed framework and demonstrate substantial improvements in achievable sum-rate, robustness against CSI uncertainty and hardware impairments, and convergence performance compared with benchmark schemes.

eess.SY 2026-07-03

Decision Transformer cuts grid frequency error by 99 percent

by Mohamed Shamseldein

Generative Autonomous Grid Control: Integrating Decision Transformers with a Two-Stage Safety Stack

Offline sequence model plus two-stage safety stack achieves real-time control and 59.4 Hz nadir on 140-bus test system

The displacement of synchronous generation by inverter-based resources is accelerating power system frequency dynamics beyond the response capability of conventional automatic generation control. This paper presents Autonomous Grid Generation Control with Decision Transformers, a framework coupling an offline-trained Decision Transformer with a two-stage symbolic safety stack for secondary frequency control. The Decision Transformer learns a conditional dispatch policy from offline supervisory control and data acquisition records via sequence modeling, eliminating online exploration risks. A Constraint Verification Unit provides sub-ten-millisecond algebraic screening using real-time power transfer distribution factors, while an aggregate digital twin performs swing-equation-based dynamic stability certification. Validated on the Northeast Power Coordinating Council 140-bus system under low-inertia conditions, the proposed controller reduces the area control error integral by over 99% relative to tuned automatic generation control, maintains a 59.4 Hz frequency nadir, and achieves inference latency of approximately 10 ms, well within real-time constraints. Comparative evaluation against a linear quadratic regulator baseline and structural analysis against conservative Q-learning demonstrate the advantages of the sequence-modeling formulation. Small-signal eigenvalue analysis characterizes the dominant 1.87 Hz electromechanical mode and confirms that the safety stack maintains stable operation across operating points. By falling back to tuned automatic generation control whenever proposals are rejected, the safety stack bounds worst-case performance to industry-standard levels in simulation.

eess.SY 2026-07-03

Optimal reliability threshold below P90 cuts reserve costs 14.5%

by Torine R. Herstad, Jalal Kazempour +2 more

Refinement of Reliability Grid Codes in the Provision of Ancillary Services

Bilevel model treats the threshold as a design variable and shows fixed P90 is not cost-minimizing for stochastic providers in Nordic FCR-D

Stochastic resources such as wind farms, electric vehicle aggregators, and demand-side assets are increasingly participating as reserve providers in ancillary service markets. To manage delivery uncertainty, system operators impose minimum reliability thresholds on such providers. Energinet, the Danish transmission system operator (TSO), has pioneered this approach through the P90 requirement, requiring stochastic providers to make accepted reserve capacity bids available with at least 90% probability. Yet this threshold is set by regulatory convention, not optimization: no existing framework treats it as a design variable or characterizes the cost-reliability trade-off it governs. This paper closes that gap. We develop a bilevel optimization framework in which the TSO in the upper level sets the reliability threshold endogenously while providers in the lower levels respond through reliability-constrained bidding, with chance constraints reformulated analytically using a Weibull tail distribution. Applied to the Nordic frequency containment reserve for disturbances (FCR-D) market, the cost-optimal threshold lies below P90 in the studied cases, with cost reductions by up to 14.5% relative to the fixed standard. Dynamic hourly thresholds yield a further reduction of up to 2.4%, suggesting efficiency gains may increase in larger and more diverse reserve markets.
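
To make the reliability threshold concrete, here is a minimal sketch of how a P90-style chance constraint turns into a closed-form bid limit. It assumes, purely for illustration, that a provider's available capacity follows a two-parameter Weibull distribution; the abstract only states that a Weibull tail distribution is used, so the parameterization, function name, and numbers below are hypothetical.

```python
import math

def max_reliable_bid(scale, shape, reliability):
    """Largest bid b such that P(available capacity >= b) >= reliability.

    For a Weibull variable, P(X >= b) = exp(-(b / scale) ** shape), so the
    chance constraint is equivalent to b <= scale * (-ln(reliability)) ** (1 / shape).
    """
    return scale * (-math.log(reliability)) ** (1.0 / shape)

# Hypothetical provider: scale 10 MW, shape 2.
bid_p90 = max_reliable_bid(scale=10.0, shape=2.0, reliability=0.90)
bid_p80 = max_reliable_bid(scale=10.0, shape=2.0, reliability=0.80)
print(round(bid_p90, 3), round(bid_p80, 3))
```

Relaxing the threshold from 90% to 80% lets the same provider offer more capacity, which is the cost-reliability trade-off that arises once the threshold is treated as a design variable.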

eess.SP 2026-07-03

Rotatable arrays raise multiuser rates by shaping phase and gain

by Xingxiang Peng, Qingqing Wu +3 more

Beyond Beamforming: Phase-and-Gain Channel Shaping via Rotatable Antenna Arrays

Joint pose and boresight control improves channel strength and user separation beyond fixed beamforming.

This paper investigates geometry-reconfigurable transmission for multiuser communication systems enabled by a rotatable antenna array. In contrast to conventional fixed arrays, the proposed architecture jointly exploits array pose adjustment and element-level boresight steering, thereby reshaping both the array-induced phase responses and the direction-dependent channel gains. We formulate a weighted sum-rate maximization problem that jointly optimizes the transmit beamformers, array pose, and element boresights under practical visibility and steering constraints. To reveal the underlying design principles, we first provide a geometric interpretation via zero-forcing analysis, showing that the resulting rates stem from both channel-strength enhancement and spatial-separability improvement. Specifically, array-pose rotation improves inter-user channel orthogonality even with isotropic elements, whereas directional elements introduce a tradeoff between phase-based spatial separation and boresight-dependent gain alignment. Motivated by these insights, we develop an efficient optimization framework that jointly coordinates transmit beamforming, array-pose adaptation, and element-boresight steering to exploit the geometry-induced phase-and-gain channel-shaping capability. Simulation results demonstrate that the proposed joint design outperforms fixed-array, pose-only, and boresight-only benchmarks, with larger gains achieved under more directive element patterns and tighter boresight-steering constraints.

eess.AS 2026-07-03

Survey maps SSL, DSE, and ASR pipelines for spatial speech

by Pengyuan Shao, Dimitrios Kanoulas

Spatial Speech Perception Systems: A Survey of Sound Source Localization, Directional Enhancement, and Speech Recognition

Reviews classical and learning-based methods for robust performance in noisy, reverberant scenes.

Robust speech understanding in real-world acoustic environments remains a fundamental challenge for intelligent auditory systems such as robot audition, hearing aids, teleconferencing systems, smart speakers, and voice-controlled assistants. These systems must operate under background noise, reverberation, competing speakers, and dynamic acoustic conditions. Spatial speech perception addresses this challenge by exploiting microphone-array information to localize, enhance, and interpret target speech in complex acoustic scenes. This paper surveys spatial speech perception systems with emphasis on the roles of sound source localization (SSL), directional speech enhancement (DSE), and automatic speech recognition (ASR), both individually and within integrated processing pipelines. We review classical signal-processing approaches and recent learning-based methods for microphone-array localization, beamforming, neural enhancement, speech separation, and modern recognition architectures. Beyond component-level analysis, we discuss robustness to noise and reverberation, multi-speaker operation, real-time constraints, and computational efficiency. We also examine representative applications in robot audition, hearing assistance, smart speakers, and teleconferencing, and identify open challenges and future directions toward robust, low-latency, and perception-aware speech systems for complex acoustic environments.

eess.AS 2026-07-03

Adversarial contrastive training beats SOTA on cross-domain audio increments

by Yongjie Si, Yanxiong Li +2 more

Cross Domain Few-Shot Class-Incremental Audio Classification Via Adversarial Contrastive Learning

Freezing the encoder after base classes lets only the classifier adapt to new domains while raising average accuracy.

Current Few-shot Class-incremental Audio Classification (FCAC) methods assume that samples of base and incremental classes are in the same domain (following the same distribution). However, there is generally a domain shift between the above two types of samples. In this paper, we explore the problem of Cross Domain FCAC where samples of base and incremental classes have domain shift. We propose a strategy of adversarial contrastive training which enables the model to effectively classify samples of different classes from unseen domains. The model consists of an encoder and a classifier. The encoder is trained in base session but frozen in incremental sessions, whereas the classifier is trained in all sessions. Experiments are done on six pairs of cross-domain datasets. Results show that our method exceeds state-of-the-art methods in average accuracy. The code is at https://github.com/YongjieSi/ACL.

cs.CL 2026-07-03

Weight addition transfers instruction following to speech models

by Congrui Du, Yang Zhang +2 more

Unlocking Speech-Text Compositional Powers: Instruction-Following Speech Language Models without Instruction Tuning

A single speech pre-training round plus the text tuning delta yields capable speech instruction followers without dedicated speech tuning da

Instruction tuning for speech language models (SLMs) is substantially more challenging than for text-based large language models (LLMs), as it requires learning a new modality and a wide range of speech-specific instructions in addition to those supported by text LLMs. Existing SLM training approaches largely replicate the text LLM training paradigm by synthesizing large-scale speech pre-training and instruction-tuning datasets. However, this strategy is difficult to scale, since speech sequences are significantly longer than text sequences. In this paper, we propose SpeechCombine, an instruction-following speech language model trained without any instruction tuning, using only a single round of speech pre-training on 30k hours of data. Starting from a text LLM base model, we perform continuous pre-training on speech utterances to obtain a speech-adapted model, and then directly combine its weights with the weight difference between the instruction-tuned and base versions of the text LLM. Our results show that this simple combination strategy not only preserves the knowledge and capabilities of the original text LLM, but also effectively transfers them to the speech domain. These findings suggest a new direction for SLM training that avoids reliance on massive speech data.
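
The weight combination described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: weights are represented as plain dictionaries mapping parameter names to lists of floats, all three checkpoints are assumed to share the same parameter names and shapes, and the function name and toy values are hypothetical.

```python
def combine_weights(speech_adapted, text_base, text_instruct):
    """Add the text instruction-tuning delta to the speech-adapted weights.

    combined = speech_adapted + (text_instruct - text_base), per parameter.
    """
    combined = {}
    for name, speech_w in speech_adapted.items():
        # Delta learned by instruction tuning of the text model.
        delta = [wi - wb for wi, wb in zip(text_instruct[name], text_base[name])]
        # Apply that delta on top of the speech-adapted parameters.
        combined[name] = [ws + d for ws, d in zip(speech_w, delta)]
    return combined

# Toy checkpoints with a single two-element parameter (hypothetical values).
text_base = {"layer.weight": [1.0, 2.0]}
text_instruct = {"layer.weight": [1.5, 1.0]}
speech_adapted = {"layer.weight": [0.0, 3.0]}
merged = combine_weights(speech_adapted, text_base, text_instruct)
print(merged)
```

Because the delta is computed once from existing text checkpoints, no speech instruction-tuning data is needed at this step, which matches the claim in the abstract.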

eess.SP 2026-07-03

FFT preconditioning reduces neural feature error up to 50 percent

by Preston Pitzer, Anish Pradhan +1 more

Fourier Preconditioning for Neural Feature Learning

For stationary signals the transform packs dependence into dominant modes, cutting truncation error without extra training cost.

Mutual information (MI)-inspired feature learning techniques are capable of generating low-dimensional embeddings that retain nonlinear dependence structures, but direct estimations of MI suffer from noisy probability distribution estimates in the low-data regime. The H-Score objective, computed from second-order statistics, provides a practical proxy metric for training feature extraction networks. We prove that H-Score is invariant to invertible transformations in the unrestricted functional setting, but becomes sensitive to input basis rotations under constrained approximation classes. Consequently, we study unitary preconditioning for H-Score networks and show that selecting an appropriate basis rotation reduces finite-width truncation error by concentrating predictive dependence into fewer dominant modes. We identify the fast Fourier transform (FFT) as an effective data-independent, low-cost preconditioner for approximately stationary processes, where spectral structure induces concentration of the cross-covariance singular value spectrum. We introduce training-free metrics based on spectral entropy and cumulative dependence energy to quantify basis suitability and predict downstream inference gains prior to network training. Experiments across eight multivariate datasets demonstrate that FFT preconditioning is particularly useful in resource-constrained regimes, achieving up to 50% normalized mean squared error (NMSE) reduction, while the proposed metrics correlate with observed performance gains and correctly identify cases where spectral preconditioning is detrimental.

eess.SY 2026-07-03

Two-layer flow preserves optimal agreement under safety constraints

by Zhanglin Shangguan, Wei Xiao +2 more

Reference-Governed Distributed Safe Gradient Flow for Safe Optimal Output Agreement of Multi-Agent Systems

Separates regulation from optimization to avoid altering the steady-state solution in nonlinear multi-agent systems.

This paper studies safe optimal output agreement for nonlinear multi-agent systems with output safety constraints. Existing safe feedback optimization methods often implement gradient-flow dynamics directly through the plant input, which may require high-order control barrier functions (HOCBFs). The resulting derivative-chain design is tuning-sensitive and can introduce additional equilibrium conditions that alter the steady-state optimal solution. We propose a reference-governed two-layer architecture that separates lower-layer output regulation from upper-layer distributed optimization. The upper layer filters the reference gradient flow through first-order control barrier function constraints, which are easier to tune and preserve the steady-state optimality structure of the original agreement problem. The lower layer uses an internal-model-based output regulator with a reference-dependent Lyapunov function, from which dynamic safety margins (DSMs) are constructed to certify transient output safety. We prove forward invariance, optimal-solution preservation under DSM-compatibility conditions, and convergence via a Lyapunov small-gain argument. Simulations validate safe convergence, show advantages over HOCBF-based feedback optimization, and demonstrate adaptive tangential objective shaping for escaping spurious equilibria induced by nonconvex obstacles.

eess.SP 2026-07-03

Compressed LEO data locates GNSS jammers at high ratios

by Giacomo Pojani, Javier Tegedor +6 more

Complexity-Scalable Direct Geolocation and Cancellation of Terrestrial GNSS Jammers: Single-Satellite and Multi-Antenna Experiments in Low Earth Orbit

Quasi-direct geolocation processes quantized time-frequency samples from small satellites to track terrestrial jammers in near real time.

Monitoring the radio-frequency (RF) spectrum from space imposes demanding requirements on satellite platforms in terms of communication bandwidth and computational resources, which are necessary for the downlink, storage, and processing of high-throughput I/Q samples. This paper analyzes in depth quasi-direct geolocation (QDG) as a technique that enables satellites of opportunity in low Earth orbit (LEO) to sense the spectrum in the bands of global navigation satellite systems (GNSS). QDG is a passive RF geolocation technique consisting of an ensemble of signal processing algorithms, which compress the I/Q samples and process the compressed data through fast delay-Doppler shift matching and interferometry in a quantized time-frequency domain. These algorithms speed up the exhaustive search for multiple RF sources in the position domain. The efficiency gain addresses the bottleneck that prevents the use of satellites with limited downlink capacity and on-board computational power. These satellites are usually constrained in size, weight and power (SWaP) and represent most of the spacecraft in LEO. The ability to exploit such assets for the geolocation of terrestrial GNSS jammers in near real time is instrumental to the performance of a multi-constellation GNSS radio-frequency interference (RFI) monitoring system. The present work describes the mathematical framework and precision bounds, introduces single- and multi-antenna use cases, combines different compression methods, and evaluates the geolocation accuracy with real data. The I/Q samples were collected by a repurposed GNSS reflectometry (GNSS-R) satellite, OPS-SAT PRETTY, in a dedicated test session during Jammertest 2025. The experimental results demonstrate the capability to geolocate GNSS jammers with different signal-to-noise ratios (SNR) at extremely high compression ratios.

eess.SY 2026-07-03

RBF activation function shapes robotic tracking performance

by Kimmo Paldanius, Gabriel da Silva Lima +1 more

Influence of Radial Basis Activation Functions on Intelligent Controller for Robotic Manipulators

Experiments on a manipulator show stability for all kernels but clear differences in adaptation and accuracy

This paper presents an intelligent control framework for trajectory tracking of robotic manipulators using radial basis function (RBF) neural networks for online disturbance estimation. The proposed control structure combines model-based nonlinear control with an adaptive neural approximator that compensates for parametric uncertainties, friction, and unmodeled dynamics. A Lyapunov-based adaptation law with projection guarantees boundedness of the closed-loop signals and convergence of the tracking error to a compact region. The primary objective of this work is to investigate how the choice of activation function within the RBF network influences transient behavior, steady-state accuracy, and control smoothness. The controller is implemented on a robotic manipulator. Experimental results demonstrate that although stability is preserved for all kernels, activation function selection significantly affects adaptation dynamics and practical tracking performance. These findings demonstrate that activation function selection acts as a structural design parameter in intelligent control, directly shaping adaptation dynamics and practical closed-loop performance.

cs.LG 2026-07-03

Stacking ensemble flags early Alzheimer's from ADNI records

by Debopriya Ghosh

Predicting Early Stages Of Alzheimer's Disease And Identifying Key Biomarkers Using Deep Artificial Neural Network And Ensemble Of Machine Learning Methodologies

After fixing missing values and imbalance, the model ranks biomarkers while comparing classifiers on standard accuracy measures.

Alzheimer's disease (AD) is a brain disorder that develops slowly and mainly affects memory, thinking, language, and daily activities. It is one of the most common causes of dementia and creates many difficulties for patients as well as their families. In the early stage, the symptoms are often mild and may look like normal ageing. For this reason, many people are diagnosed late, when the disease has already progressed. At present, there is no complete cure for AD. Still, early detection can help doctors manage the condition better and take suitable steps at the right time. In this study, a machine learning model is proposed to detect the early stages of Alzheimer's disease using clinical details, neuropsychological test scores, and neuroimaging-related measures. The data used in this work are collected from the Alzheimer's Disease Neuroimaging Initiative (ADNI). As the dataset has missing values, iterative imputation is applied to fill them. The dataset also has class imbalance, which is handled using Borderline SVM-SMOTE. After that, feature selection is carried out using wrapper-based and embedded methods so that only important features are used for training. The selected features are divided into training and testing sets, and feature scaling is applied. A stacking ensemble model is developed using Logistic Regression, Extra Trees, Bagging KNN, and LightGBM as base classifiers. Along with this, an artificial neural network is also trained on the same dataset. The performance of these models is compared using precision, recall, F1-score, and AUC-ROC. This study aims to find the best classifier and also identify important biomarkers that may help in the early diagnosis of Alzheimer's disease.

cs.LG 2026-07-03

Learned time change improves diffusion sampling quality

by Yilie Huang, Wenpin Tang +1 more

ART for Diffusion Sampling: Continuous-Time Control and Actor-Critic Learning

ART-RL optimizes a sampling-clock speed via actor-critic RL to produce timestep grids that beat fixed schedules at the same budget and trans

We study timestep allocation for score-based diffusion sampling, where a learned reverse-time dynamics is discretized on a finite grid. Uniform and hand-crafted schedules are standard choices, but they rely on fixed prescriptions and can therefore be suboptimal. To address this limitation, we propose Adaptive Reparameterized Time (ART), a continuous-time control formulation that learns a time change by treating the speed of the sampling clock as the control, so that a uniform grid on the learned clock induces adaptive timesteps in the original diffusion time. Based on a leading-order Euler error surrogate, ART provides a principled objective for allocating timesteps along the sampling trajectory. To solve this deterministic control problem, we introduce ART-RL, an auxiliary randomized formulation with Gaussian policies that turns schedule learning into a continuous-time reinforcement learning problem. We prove that the randomized ART-RL formulation is equivalent to ART at the optimizer level, in the sense that its optimal Gaussian policy recovers the optimal ART time-warping rate through its mean. We further establish policy evaluation and policy improvement characterizations and derive trajectory-based moment identities that yield implementable actor--critic updates for learning the schedule. Across experiments ranging from controlled low-dimensional settings to image generation, ART-RL can be plugged into existing diffusion samplers by changing only the timestep grid, consistently improving sample quality over strong baseline schedules at matched budgets while leaving the rest of the sampling pipeline unchanged. The learned schedules also exhibit broad generalization, transferring without retraining across sampling budgets, datasets, solvers, pipelines, and representation spaces.

eess.SY 2026-07-03

Closed-form bounds certify safe approach to tumbling targets

by Omer Burak Iskender, Keck Voon Ling +2 more

Reachability-Based Safe-Start Regions for Approach to a Tumbling Target with Rotating LOS Constraints

Two conservative criteria run 250 times faster than Hamilton-Jacobi reachability while retaining 0.91 recall on 500 feasibility cases.

This paper presents a reachability-aware guidance architecture for autonomous approach to a tumbling, uncooperative target under a rotating line-of-sight (LOS) docking corridor. The LOS admissible set rotates with the target body frame, producing time-varying polyhedral constraints in the chaser's relative coordinates. A safe-start region is constructed via two conservative criteria: (i) directional per-constraint erosion, the margin consumed by rotation-induced drift before thrust can arrest it, and (ii) a synchronization range bound $r < 2a_{\max}/\omega_t^2$ ensuring the chaser can cancel the apparent rotational velocity without overshooting the hold point. Closed-loop guidance uses a receding-horizon MPC controller with Clohessy-Wiltshire-Hill (CWH) prediction dynamics and explicit LOS corridor constraints in the quadratic program. Truth propagation uses the exact discrete CWH state-transition matrix with sub-stepping, so feasibility claims are physically honest: no reference blending or state projection is applied. A three-regime tracking law manages the transition from long-range inertial approach to body-frame co-rotation and synchronized hold. The analytical safe-start region is benchmarked against four standard reachability engines (backward and forward polytopic reachable sets, Hamilton-Jacobi level sets, and closed-loop Monte Carlo): the closed-form criteria are 250x faster than Hamilton-Jacobi reachability while predicting closed-loop feasibility with precision 0.80 and recall 0.91 on a 500-case sweep. The residual 6% false-positive rate and the IoU gap against Hamilton-Jacobi quantify a structural property: the synchronization set (reach and co-rotate) is a strict subset of the positional reachable set, the gap widening with tumble rate. The analytical bound is thus a sound inner certificate for onboard go/no-go decisions where Hamilton-Jacobi is prohibitively expensive.
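
The synchronization range bound $r < 2a_{\max}/\omega_t^2$ is simple enough to evaluate directly. The helper below is a minimal sketch of that one criterion only; the function names and the example numbers are hypothetical, and the directional erosion criterion and the MPC layer are not represented.

```python
def sync_range_bound(a_max, omega_t):
    """Upper limit on range r implied by r < 2 * a_max / omega_t**2.

    a_max   : maximum chaser acceleration [m/s^2]
    omega_t : target tumble rate [rad/s]
    """
    if a_max <= 0.0 or omega_t <= 0.0:
        raise ValueError("a_max and omega_t must be positive")
    return 2.0 * a_max / omega_t**2

def within_sync_range(r, a_max, omega_t):
    """True if range r [m] satisfies the synchronization bound."""
    return r < sync_range_bound(a_max, omega_t)

# Hypothetical numbers: 0.01 m/s^2 of thrust authority, 0.05 rad/s tumble.
print(sync_range_bound(0.01, 0.05))      # about 8 m
print(within_sync_range(5.0, 0.01, 0.05))
```

Because the bound scales with $1/\omega_t^2$, doubling the tumble rate shrinks the admissible range by a factor of four.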

eess.IV 2026-07-03

Deep learning matches experts in penis MRI segmentation for 34k scans

by Jan Ernsting, Gunnar Paul Kordes +6 more

Population-Scale Segmentation of Penile Tissue in DIXON MRI using Deep Learning for Quantitative Phenotyping in Male Reproductive Health

Observer-level accuracy enables automated penile tissue volumetry in 34,412 UK Biobank participants.

Penile measurement is clinically relevant across male reproductive and urogenital health, including conditions such as micropenis, congenital and endocrine disorders, and sexual or urinary dysfunction. However, quantitative assessment of penile size has relied mainly on external length or circumference measurements, which are difficult to standardize, sensitive to measurement conditions, and unable to capture the internal portion of the penis. MRI enables volumetric assessment of the whole penis in vivo, but automated segmentation has not previously been established at population scale. Automated whole-organ volumetry would enable high-throughput phenotyping for multi-omics and clinical studies of male reproductive disease. Here, we present a deep learning framework for whole-penis segmentation in multi-channel DIXON MRI. Using a newly curated expert-annotated training dataset ($n = 145$ subjects; $13,050$ annotated slices) and a double-annotated independent test benchmark ($n = 24$ subjects; $2,160$ double-annotated slices), we optimized a 3D nnU-Net architecture. The model achieved a 5-fold cross-validation Dice score of $0.90$ and performed at observer-level accuracy on the independent test set (Dice: $0.92$; Hausdorff distance: $3.58$). We deployed the model in $34,412$ UK Biobank participants, enabling automated quantification of total penile tissue, including both external and internal components. Longitudinal evaluation in 2,282 men demonstrated high inter-session reproducibility ($r = 0.87$). This framework establishes a reproducible and population-scalable method for MRI-based assessment of penile anatomy and provides an open technical resource for future studies in urological imaging and male reproductive health. The trained model weights will be publicly released.

eess.SP 2026-07-03

Clustered THz HetNets outperform random models in coverage

by Hadeel Obaid

Coverage Analysis in Terahertz Clustered HetNets

Moderate spread of small base stations raises coverage probability when users cluster in hotspots.

Terahertz (THz) transmission technologies hold significant potential for enabling ultra-broadband, short-range communication in next-generation networks. Despite the vast bandwidth, THz signals suffer from limited transmission range, and a feasible approach is to deploy THz within clustered heterogeneous networks (HetNets) to enhance coverage. This paper investigates THz communication in clustered HetNets, leveraging stochastic geometry for performance analysis. Specifically, we consider two tiers of macro base stations (MBS) and small base stations (SBS). The MBS tier is modeled as a Poisson Point Process (PPP), and both the SBS tier and users are modeled as a Poisson Cluster Process (PCP) to capture user clustering and network hotspots. We derive the analytical expressions for user association probabilities, the Laplace transform of interference, and the coverage probability. The derived coverage probability is validated through Monte Carlo simulation. The numerical results show that the coverage in THz PCP-HetNets is higher than that achieved in THz PPP HetNets. In addition, a moderate spatial spread of SBSs is beneficial for coverage.

eess.AS 2026-07-03

vLLM keeps audio CFG at 80% of normal speed

by Haoran Wang, Jinchuan Tian +2 more

An Efficient vLLM-Based Inference Pipeline for Unified Audio Understanding and Generation

Co-scheduling conditional and unconditional requests inside the same batch absorbs the usual overhead while supporting delay-pattern de-inte

While Large Multimodal Models excel in comprehension, high-throughput inference engines lack native support for multimodal generation. This is severe in Speech Language Models, where generating multi-layered audio tokens via decoupled AR+NAR or synchronous Multi-Token Prediction (MTP) with delay-pattern interleaving conflicts with standard single-stream loops. We present a vLLM-based inference pipeline for unified speech understanding and generation. We extend autoregressive decoding to natively execute delay-pattern de-interleaving and coordinated multi-stream sampling, integrating an on-GPU acoustic decoder for end-to-end waveform synthesis. Crucially, we overcome the shared intuition that Classifier-Free Guidance (CFG) halves throughput. By co-scheduling paired conditional and unconditional requests within a continuous batch, our CFG implementation sustains 80% of non-CFG throughput, absorbing dual-request and logit merging overheads. We open-source our framework.
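
The logit merging step of Classifier-Free Guidance can be sketched in a few lines. The code below uses plain NumPy and the standard CFG combination, uncond + scale * (cond - uncond); it does not call any vLLM API, and the interleaved batch layout in `merge_paired_batch` is a hypothetical choice made for illustration rather than the pipeline's actual scheduling.

```python
import numpy as np

def cfg_merge(cond_logits, uncond_logits, scale):
    """Standard CFG combination of conditional and unconditional logits."""
    return uncond_logits + scale * (cond_logits - uncond_logits)

def merge_paired_batch(batch_logits, scale):
    """Merge a batch whose rows alternate [cond_0, uncond_0, cond_1, ...].

    The alternating layout is an assumption; it only illustrates that both
    halves of a pair can share one forward pass before being merged.
    """
    cond = batch_logits[0::2]
    uncond = batch_logits[1::2]
    return cfg_merge(cond, uncond, scale)

batch = np.array([[2.0, 0.0],   # request 0, conditional
                  [1.0, 1.0],   # request 0, unconditional
                  [0.0, 4.0],   # request 1, conditional
                  [0.0, 2.0]])  # request 1, unconditional
print(merge_paired_batch(batch, scale=2.0))  # rows [3, -1] and [0, 6]
```

With `scale=1.0` the merge returns the conditional logits unchanged, which is a quick sanity check for any implementation.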

eess.SP 2026-07-03

Closed-form spatial correlation derived for cylindrical mMIMO arrays

by Shasha Liu, Abla Kammoun +1 more

Three-Dimensional Spatial Correlation Modeling for Cylindrical mMIMO Arrays in HAPS

Exact expression uses spherical harmonics and Fourier coefficients to handle arbitrary patterns and angles for HAPS.

High-altitude platform stations (HAPS) are envisioned as a key component of future wireless networks, enabling ultra-wide coverage and providing direct connectivity to users with cylindrical massive multiple-input multiple-output (mMIMO) systems. Exploiting the channel degrees of freedom necessitates accurate modeling and characterization of three-dimensional (3D) channels in the presence of spatial correlation functions (SCFs). However, existing spatial correlation models are primarily developed for planar or linear antenna arrays and cannot be directly applied to cylindrical geometries commonly adopted by HAPS platforms. To address this limitation, this paper derives an exact closed-form expression for the SCF of 3D MIMO channels with antenna elements arranged in a cylindrical array. The proposed formulation is based on the spherical harmonic expansion (SHE) of plane waves and accommodates arbitrary antenna radiation patterns and angular distributions through the Fourier series (FS) coefficients of the power azimuth and zenith spectra. The derived SCF is validated through Monte Carlo simulations under standard-compliant settings.

eess.SY 2026-07-03

Nominal controller stabilizes jumping PDEs with small mismatch

by Yihuai Zhang, Yidan Cao +2 more

Robust Stabilization of Linear Markov-Jumping Hyperbolic PDEs with Boundary Input Delay

Mode-independent Lyapunov analysis gives mean-square stability for Markov parameter jumps in 2x2 hyperbolic systems with boundary delay.

This paper studies the robust stabilization of 2 $\times$ 2 linear hyperbolic partial differential equations (PDEs) with Markov-jumping parameters and boundary input delay. The main challenge arises from the simultaneous presence of stochastic parameter variations and input delay, which complicates both the stability analysis and controller design. To address this issue, a nominal delay-compensating backstepping controller is first designed for a fixed nominal system. Applying the nominal transformation to the stochastic system yields a target system with additional perturbation terms induced by parameter mismatch. A mode-independent Lyapunov functional is then constructed to establish a pathwise exponential estimate, which directly implies mean-square exponential stability under an explicit small-mismatch condition. The proposed analysis provides a direct robustness certificate for nominal delay compensation without using mode-dependent Lyapunov functionals. Finally, we present simulation results and discuss how the conservative small-mismatch condition should be interpreted for the numerical example.

eess.SY 2026-07-03

Time margin unifies fault clearing time and load drift

by Marián Mešter

A Time-to-Boundary Margin for Transient Stability: Unifying Critical Clearing Time and Operating-Point Drift

The index equals critical clearing time on the single-machine model and reproduces it within 6 percent on the New England 39-bus system.

The loading margin to voltage collapse -- the distance in parameter space to the closest saddle-node bifurcation -- is a standard proximity index for voltage stability. This paper develops its transient-stability counterpart: a margin M that measures the time to the synchronism boundary rather than a distance, and that unifies two limits usually treated separately. The critical clearing time (CCT) is the fast, fixed-parameter limit; the slow drift of the operating point toward a static loadability limit is the other. M is defined as the first-passage time of the joint state-parameter motion to the survival boundary. We prove and verify that M equals the CCT exactly on the one-machine-infinite-bus reduction (deviation <= 0.01% across loadings on a published benchmark), establishing a certified single-machine pillar. Under operating-point drift, M yields an operational lead time before faults become unclearable; we take the 28 April 2025 Iberian blackout timeline as an illustrative time scale for the drift rate. On the New England 39-bus system, an independent benchmark, the single-machine-equivalent reduction reproduces the CCT within 1.8-6.0% (conservatively), and a critical slowing-down signature flags proximity to the boundary. For the multimachine case we characterize the limits explicitly: the transfer-conductance work is tightly boundable, while the controlling unstable equilibrium is the binding obstruction to a certified margin.
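
For readers unfamiliar with the critical clearing time, the textbook equal-area calculation on a one-machine-infinite-bus model gives a closed form in the simplest case. The sketch below assumes a classical machine model without damping, zero electrical power during the fault, and a post-fault network identical to the pre-fault one; the function name and the numerical values are hypothetical, and this is not the margin M defined in the paper.

```python
import math

def critical_clearing(delta0, p_m, inertia_h, f0=50.0):
    """Critical clearing angle [rad] and time [s] from the equal-area criterion.

    delta0    : pre-fault rotor angle [rad], with P_m = P_max * sin(delta0)
    p_m       : mechanical power [per unit]
    inertia_h : inertia constant H [s]
    f0        : nominal frequency [Hz]
    Assumes electrical power is zero while the fault is on.
    """
    cos_dc = (math.pi - 2.0 * delta0) * math.sin(delta0) - math.cos(delta0)
    delta_c = math.acos(cos_dc)
    # During the fault the rotor accelerates uniformly:
    # delta(t) = delta0 + (pi * f0 * p_m / (2 * H)) * t**2
    t_c = math.sqrt(2.0 * inertia_h * (delta_c - delta0) / (math.pi * f0 * p_m))
    return delta_c, t_c

delta_c, t_c = critical_clearing(math.radians(30.0), p_m=0.8, inertia_h=5.0)
print(math.degrees(delta_c), t_c)  # about 79.6 degrees and about 0.26 s
```

As the operating point drifts toward higher loading, `delta0` grows and the clearing time shrinks, which is the fixed-parameter limit the abstract sets against slow operating-point drift.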

eess.AS 2026-07-03

480K-parameter model matches echo cancellation benchmarks in real time

by Chengwei Liu, Shaofei Xue +3 more

LMPAN: A Lightweight Multi-Path Alignment Network for Joint Full-Duplex Acoustic Echo Cancellation and Noise Suppression

LMPAN corrects signal mismatches via multi-path alignment and attention to enable full-duplex audio on devices.

We propose a lightweight multi-path alignment network (LMPAN) for on-device joint acoustic echo cancellation (AEC) and noise suppression (NS) in full-duplex spoken dialogue systems. To address hardware-induced distortions and dynamic acoustic conditions, we introduce three core innovations: (1) a multi-path alignment stage correcting temporal and energy mismatches across reference, linear AEC (LAEC) output, and microphone signals; (2) an attention-based mechanism that dynamically integrates enhanced LAEC and microphone features under varying acoustic scenarios; (3) a post-filtering module with a dynamic target generation strategy for downstream tasks (ASR, VAD). Furthermore, we adopt a two-stage training framework leveraging self-supervised learning representations to enhance perceptual quality. Experiments show that LMPAN, with only 480K parameters and 126 MACs, achieves performance comparable to the state-of-the-art lightweight model DeepVQE-S, while ensuring real-time inference capability.

cs.LG 2026-07-03

Online algorithm learns any LDS with O(k) parameters

by Yuval Ran-Milo, Angelos Assos +1 more

A Memory Efficient Unified Algorithm for Online Learning of Linear Dynamical Systems

It delivers sublinear regret when instability is limited to k modes and proves fewer filters cannot work.

Motivated by the challenge of stabilizing a general unknown linear dynamical system (LDS) from observations, we study the natural prerequisite of online prediction. Our goal is to achieve sublinear regret with a memory footprint that adapts to the intrinsic complexity of the dynamics rather than the full hidden-state dimension. We focus on the practically central regime of systems with low instability complexity -- eigenvalues outside the real stable interval that do not decay rapidly, together with non-semisimple modes -- potentially embedded in an otherwise stable real spectrum of much higher dimension; we write $k$ for this count. This regime is the primary setting in which stabilization is plausible: we show that many systems with high instability complexity cannot be stabilized without exponentially large controls. Thus, prediction is meaningful for stabilization precisely when the instability complexity is small. Within this regime, we introduce a unified online algorithm that handles every LDS (including non-diagonalizable systems with complex or exploding modes) with a learnable parameter count of $\widetilde{O}(k)$. Finally, we prove a lower bound showing that $k$ is a valid complexity measure: any filter-based predictor needs at least $k$ filters. Experiments corroborate our theory: on a high-dimensional system, our predictor sharply outperforms prior methods at an equal parameter budget.

eess.SP 2026-07-03

Dual antenna powers brain implants and sends data at 32 Mbps

by Ali Khaleghi, Aminolah Hassanvand +1 more

Antenna System for Simultaneous Wireless Power and Information Transfer to Brain Implants

Inductive link supplies energy while backscatter returns high-rate signals without batteries or wires.

Brain-Computer Interfaces (BCIs) have revolutionized neuroscience applications, from motor rehabilitation to neuroergonomics. Traditional implantable BCIs with invasive microelectrode arrays pose challenges, notably the need for wired connections and inherent implantation risks. This paper introduces a battery-free wireless BCI system, consolidating an implant and its external supporting system. Our design centers on a dual-function antenna system: firstly, an inductive coupling mechanism enables wireless power transfer, sufficiently powering the implant's Application-Specific Integrated Circuit (ASIC) for stimulation and readout without an implant battery. Secondly, a backscatter antenna in the implant facilitates battery-free, high-data-rate wireless connectivity (up to 32 Mbps). This system not only enhances the BCI experience by eliminating wires but also retains data fidelity and energy efficiency, promising a safer, more efficient interface for tasks like robotic arm control.

eess.SP 2026-07-03

MIMO beamforming cuts IoT false wake-ups by over 50%

by Israa Khaled, Ammar El Falou +2 more

Integrated Wake-Up Radio and MIMO Solution for Cellular IoT Networks

Specific antenna count focuses wake-up signals, raising reliability and extending battery life in multi-cell networks.

Wake-up radio (WUR) is a technology designed to enhance the energy efficiency of Internet of Things (IoT) networks and extend device battery life. While most studies focus on WUR performance with single-antenna base stations, this paper investigates the multiple-input multiple-output (MIMO) technology to improve device energy saving and extend the coverage of wake-up signals. By leveraging MIMO beamforming, the transmitted energy can be spatially focused toward the intended IoT devices, with high beamforming gain and minimal inter-device interference. We develop a preliminary analytical framework using stochastic geometry to evaluate the wake-up success probability of WUR-MIMO in multi-cell cellular IoT networks, when the number of antennas equals $2 \times (\text{number of devices}) - 1$. Monte Carlo simulations show that, relative to a single-antenna WUR baseline, MIMO beamforming significantly enhances wake-up reliability when this antenna configuration is applied, mitigates more than 50% of false activations across all settings, and thereby prolongs the lifetime of IoT devices.

cs.NI 2026-07-03

Gain control makes additive noise fail for CSI simulation

by Aymen Bouferroum (FUN), Ildi Alla (uni.lu) +2 more

CSI Simulation: Why Additive Noise Fails and How to Fix It

M_QTC learns the multiplicative amplitude mapping from measurements and lets classifiers recover 93% of real jamming detection performance.

Channel State Information (CSI) has become a widely used wireless channel sensing modality for applications such as indoor localization, activity recognition, and respiration monitoring. Because collecting labeled data under every target condition is impractical, training CSI-based models often relies on simulated data produced by adding noise or perturbations to recorded channel estimates, most commonly additive white Gaussian noise (AWGN). This practice assumes that the receiver chain between the antenna and the channel estimator is linear and gain-invariant. We test this assumption empirically using RF jamming as a controlled perturbation on 6 commodity receivers across 2 indoor environments. The assumption does not hold. Automatic gain control compresses the channel estimate multiplicatively before digitization, producing amplitude distributions that no additive noise variance can reproduce. To close the resulting fidelity gap, we propose M_QTC, a measurement-calibrated model that learns the per-subcarrier distribution transformation through quantile mapping, temporal filtering, and copula-based cross-subcarrier reordering. M_QTC reduces amplitude error 8-fold and closes 89% of the aggregate fidelity gap across four complementary dimensions. The improvement transfers directly to downstream tasks, where 5 classifiers from different families trained on M_QTC-simulated data recover 93% of real-data jamming detection performance, while AWGN-trained classifiers remain near random decision.
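
Quantile mapping, one of the three ingredients named above, can be sketched as an empirical transfer function between two amplitude distributions. The code below is a generic illustration for a single subcarrier, not M_QTC itself: the function names are hypothetical, the `tanh` compression is only a stand-in for gain-control behavior, and the temporal filtering and copula reordering stages are omitted.

```python
import numpy as np

def fit_quantile_map(simulated, measured, n_quantiles=101):
    """Return matching quantiles of the simulated and measured amplitudes."""
    q = np.linspace(0.0, 1.0, n_quantiles)
    return np.quantile(simulated, q), np.quantile(measured, q)

def apply_quantile_map(x, src_q, dst_q):
    """Map amplitudes x through the piecewise-linear quantile transfer curve.

    Assumes src_q is increasing, which holds for continuous-valued data.
    """
    return np.interp(x, src_q, dst_q)

rng = np.random.default_rng(0)
clean = rng.rayleigh(1.0, size=5000)   # toy "ideal" amplitudes
measured = np.tanh(clean)              # toy compressive receiver (assumption)

src_q, dst_q = fit_quantile_map(clean, measured)
mapped = apply_quantile_map(clean, src_q, dst_q)
print(np.median(mapped), np.median(measured))
```

Unlike adding noise of a chosen variance, the mapping reshapes the whole amplitude distribution, so it can reproduce compression that no additive term can.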

eess.AS 2026-07-03

Single neural audio codec model handles multiple token rates

by Tomohiko Nakamura, Wataru Nakata +2 more

Neural Audio Codec with Adjustable Token Temporal Resolution Using Sampling-Frequency-Independent Convolutional Layers

Shared parameters create resolution-specific kernels by scaling size and stride to each token interval

Discrete tokens obtained from neural audio codecs (NACs) have been used as compact representations in audio generation and understanding models. In such token-based systems, token temporal resolution (TTR), defined as the time interval between adjacent token frames, is important because it controls the trade-off between representing rapid acoustic events and reducing token-sequence length. However, most NACs are trained at a single TTR and require separate training for each TTR. This paper proposes a mechanism that enables a single NAC to operate at multiple TTRs using sampling-frequency-independent convolutional layers. The mechanism regards TTR as the sampling period of the token sequence and generates TTR-dependent convolutional kernels from a shared parameter set, while adjusting the kernel size and stride for each TTR. We incorporate the mechanism into Descript Audio Codec, leaving the quantizer unchanged. Experiments on environmental sound reconstruction show that the proposed model outperforms a single-model baseline that switches TTR-specific layers for each TTR.

eess.AS 2026-07-03

PLC models adapt using only received audio packets

by Yehoshua Dissen, Joseph Keshet

Self-Supervised Test-Time Tuning for Packet Loss Concealment

Self-supervised synthetic masking on arrived signals improves concealment of true losses without extra data or model changes.

Packet loss concealment (PLC) reconstructs audio packets that are missing at the receiver, usually with a trained model whose parameters remain fixed at deployment time. This treats the PLC model as static, even though each call or recording exposes signal-specific information through the packets that did arrive. We present TTT-PLC, a self-supervised test-time tuning framework that adapts existing PLC models using only those received packets. The method creates supervision by synthetically masking portions of the available signal, training the model to conceal them with its native PLC objective, and then using the adapted model to reconstruct the true packet losses. No clean reference signal, external adaptation data, or architectural modification is required. We study TTT-PLC in two deployment settings. In the non-causal setting, the received file is available before reconstruction, allowing repeated self-supervised adaptation passes and providing a per-file adaptation ceiling. In the causal setting, audio is streamed without revising emitted samples; adaptation is performed only on completed past blocks, and updated parameters affect only future audio. We instantiate the framework on two public PLC backbones: FRN, a recurrent full-band speech PLC model, and PARCnet, a hybrid autoregressive-neural model for networked music. Across these settings, the results show that pretrained PLC systems do not need to be treated as fixed at inference time: the still-observed portions of a lossy signal can provide an effective training signal for improving concealment on that same signal.
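
The supervision trick described above, hiding packets that did arrive so the model can be trained on them, can be sketched at the level of packet masks. The code below is only a bookkeeping illustration: the function name, the drop rate, and the mask layout are assumptions, and no PLC model or training loop is included.

```python
import numpy as np

def synthetic_mask(received, drop_rate, rng):
    """Choose received packets to hide during adaptation.

    Only packets that actually arrived may be hidden, so every hidden
    packet has a known ground truth to train against.
    """
    received = np.asarray(received, dtype=bool)
    hide = rng.random(received.shape) < drop_rate
    return hide & received

# True = packet arrived, False = packet truly lost (toy pattern).
received = np.array([1, 1, 0, 1, 1, 1, 0, 1], dtype=bool)
rng = np.random.default_rng(0)

hidden = synthetic_mask(received, drop_rate=0.3, rng=rng)
model_input_mask = received & ~hidden   # packets the model may see
loss_mask = hidden                      # packets the loss is computed on

print(model_input_mask.astype(int), loss_mask.astype(int))
```

The truly lost packets never enter the loss, which is why no clean reference signal is needed.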

eess.SY 2026-07-03

Koopman operator linearizes nonlinear dynamics via observables

by Igor Mezić, Jorge Cortés +3 more

Koopman operator theory: fundamentals, control, and applications

Data-driven EDMD approximations come with error bounds and support input-driven control design including MPC.

The Koopman operator has gained considerable attention due to its ability to provide a global linear representation of highly complex dynamical systems. The operator describes nonlinear dynamics in a linear way through the lens of real- or complex-valued observable functions. Recently proposed data-driven techniques, like extended dynamic mode decomposition (EDMD), its kernelized variant, and machine-learning methods, can be used to generate finite-dimensional approximations accompanied by finite-data error bounds. In this tutorial paper, we provide a concise introduction to Koopman operator theory and its use in systems and control. A particular focus is put on data-driven surrogate models, their extension to systems with inputs, and controller design using Koopman operator theory. Moreover, we demonstrate the key techniques, i.e., EDMD and Koopman MPC. To this end, we provide simulation studies including source code on GitHub to enable the interested reader to experience the Koopman operator in systems and control step by step.
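
A minimal EDMD sketch helps make the "linear through observables" idea concrete. The code below fits a finite-dimensional Koopman approximation by least squares on snapshot pairs; the dictionary $[x, x^2]$ and the scalar map $x^+ = 0.9x$ are toy assumptions chosen so the result can be checked by hand, and this is not the code accompanying the tutorial.

```python
import numpy as np

def edmd(x, y, observables):
    """Least-squares EDMD fit on snapshot pairs (x_i, y_i = F(x_i)).

    Returns A such that observables(y) is approximately observables(x) @ A
    (row-sample convention; A is the transpose of the matrix used in the
    column-vector convention).
    """
    psi_x = observables(x)
    psi_y = observables(y)
    a_matrix, *_ = np.linalg.lstsq(psi_x, psi_y, rcond=None)
    return a_matrix

def observables(x):
    """Toy dictionary: the state and its square."""
    return np.column_stack([x, x**2])

x = np.linspace(-1.0, 1.0, 21)   # sampled states
y = 0.9 * x                      # one step of the toy dynamics
A = edmd(x, y, observables)
print(np.round(A, 6))            # close to diag(0.9, 0.81)
```

Here the dictionary is invariant under the dynamics, so the fit is exact; for a general nonlinear system the dictionary only spans an approximately invariant subspace, which is where the finite-data error bounds mentioned above become relevant.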

eess.SP 2026-07-03

PINN-GNN builds accurate multipath RF maps from sparse points

by Lizhou Liu, Xiaohui Chen +3 more

Scene-Conditioned PINN-GNN for Multipath RF Maps: Cross-Scene Generation and In-Scene Completion

Electromagnetic constraints plus graph modeling let the method generate or complete maps across scenes better than image or diffusion baseli

Radio frequency (RF) maps provide a compact representation of multipath propagation characteristics and are fundamental to channel modeling, coverage analysis, and environment-aware wireless optimization. This paper proposes a unified RF map construction framework based on a physics-informed neural network (PINN) and a graph neural network (GNN), supporting both cross-scene generation and in-scene completion with 2D and 2.5D environmental representations. The PINN embeds electromagnetic propagation constraints to establish a physically consistent mapping from receiver locations to multipath parameters, including path gain, time of arrival, and angles, while the GNN enforces spatial consistency by modeling correlations among neighboring receivers. To comprehensively evaluate multipath reconstruction quality, we propose a peak-weighted dynamic time warping metric that jointly accounts for amplitude errors and peak delay misalignment in channel impulse responses. Extensive experiments demonstrate that the proposed method consistently outperforms image-based, diffusion-based, and interpolation baselines across both map-level and multipath-level metrics, achieving robust generalization and high-fidelity RF map construction under sparse observations.

cs.CV 2026-07-03

LLM channel prompts cut localization error 40% in driving tests

by Wen Wang, Yaping Sun +6 more

LLM-Empowered Multimodal Fusion Framework for Autonomous Driving: Semantic Enhancement and Channel-Adaptive Design

Channel quality prompts let the model fall back to vision in noise and add radar when clear, shown on nuScenes and VIRAT.

Vision-radar fusion is central to robust autonomous driving, combining dense visual semantics with precise range and velocity measurements from radar. However, real-world fusion quality is fundamentally challenged by dynamically varying input quality, stemming from occlusion, adverse weather, and channel noise. To address this, we re-frame the problem from static data fusion to channel-aware semantic reasoning and propose a Large Language Model-centric Semantic-layer Channel-aware Integrated Perception (LM-SCIP) framework. It places a Large Language Model (LLM) as a central reasoning core to fuse a local visual stream with a quality-varying external radar stream used to cover perception-blind spots. Concretely, LM-SCIP couples a hierarchical radar-vision encoder with a Channel-Adaptive Semantic Module (CASM) that maps link indicators into a "Channel Prompt" to dynamically gate external radar features. A parameter-efficient, LoRA-tuned LLM, in conjunction with a heterogeneous Mixture-of-Experts (H-MoE), then arbitrates between local visual cues and the channel-conditioned radar context. Finally, a decoupled multi-task decoder outputs localization, trajectory forecasting, and image reconstruction. Experiments on nuScenes and VIRAT validate our approach. On nuScenes, under a controlled toggle of radar input, LM-SCIP reduces localization RMSE by 40.0% versus a vision-only baseline. On VIRAT, the model attains a 0.214m localization RMSE and 0.179m minFDE (k=1). These results reveal that the proposed LM-SCIP enables a robust vision-dominant fallback at low SNR and synergistic fusion at high SNR.

eess.SP 2026-07-03

Tighter surrogate raises multicell uplink weighted sum rates

by Zihan Jiao, Xinping Yi +2 more

Rethinking Fractional Programming for Joint Uplink Scheduling and Power Control in Multicell Wireless Networks

A reciprocal-inversion transform improves the lower bound on log-rate functions inside fractional programming while keeping closed-form per-

This paper investigates the joint uplink scheduling and power control problem in a coordinated multicell wireless network, where at most one single-antenna user is allowed to access the single-antenna base station in each cell simultaneously. The resulting weighted sum-rate (WSR) maximization problem is a mixed discrete-continuous, nonconvex optimization problem that is notoriously difficult to solve directly. Classical fractional programming (FP) methods tackle this problem by leveraging the Lagrangian dual transform (LDT) followed by the quadratic transform (QT), yielding a tractable closed-form solution for scheduling and power control, with the LDT playing a crucial role in handling discrete variables. In this paper, we revisit the LDT from a minorization-maximization (MM) perspective and observe that its induced surrogate is somewhat conservative due to the reciprocal-coordinate construction. Motivated by this observation, we propose a novel reciprocal-inversion transform (RIT) that constructs a tighter first-order Taylor expansion lower bound for the logarithmic rate function. The proposed RIT remains fully compatible with the QT, leading to a surrogate-enhanced FP (SEFP) algorithm for joint uplink scheduling and power control. The proposed SEFP algorithm retains the desirable per-cell separability of the classical FP framework and admits closed-form updates for the auxiliary variables, scheduling decisions, and transmit powers. Simulation results demonstrate that the SEFP algorithm consistently outperforms the classical FP method and other baselines for different network utilities.
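For readers unfamiliar with the two classical transforms this abstract builds on, the scalar sketch below checks numerically that each is a lower bound that is tight at its closed-form auxiliary variable. It covers only the classical LDT and QT for a single ratio A/B (signal power over interference plus noise); the proposed RIT is not reproduced, and the values of `A` and `B` are arbitrary assumptions.

```python
import numpy as np

def quadratic_transform(A, B, y):
    """QT surrogate: 2*y*sqrt(A) - y^2*B <= A/B, tight at y = sqrt(A)/B."""
    return 2.0 * y * np.sqrt(A) - y**2 * B

def lagrangian_dual_transform(A, B, gamma):
    """LDT surrogate (rate in nats): lower-bounds log(1 + A/B), tight at gamma = A/B."""
    return np.log1p(gamma) - gamma + (1.0 + gamma) * A / (A + B)

A, B = 2.0, 0.5                 # arbitrary example values
y_star = np.sqrt(A) / B         # closed-form optimal QT auxiliary variable
gamma_star = A / B              # closed-form optimal LDT auxiliary variable

# Sweep the auxiliary variables around their optima.
ys = np.linspace(0.0, 2.0 * y_star, 401)
gammas = np.linspace(0.0, 2.0 * gamma_star, 401)
qt_vals = quadratic_transform(A, B, ys)
ldt_vals = lagrangian_dual_transform(A, B, gammas)
```

Alternating between these closed-form auxiliary updates and the update of the original variables is what gives FP methods their iterative, minorization-maximization structure.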

eess.SY 2026-07-03

Linear model matches low-speed ship maneuvers from real data

by Agnes N. Mwange, Taichi Kambara +3 more

Development and Identification of a Linear Low-Speed Ship Maneuvering Model from Full-Scale Data

State-space parameters identified via CMA-ES on full-scale trials reproduce observed trajectories.

Despite substantial technological progress, fully autonomous berthing and unberthing remains a significant challenge. One of the primary obstacles is the complex, non-linear nature of low-speed ship dynamics, which are difficult to model and control and often necessitate equally complex maneuvering models and control systems. This study proposes a simplified approach to bridge this gap by modeling the ship dynamics in the form of a time-invariant, continuous-time linear state-space system. The model parameters are estimated through system identification using the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) applied to full-scale maneuvering data. Validation results demonstrate a strong agreement between the model output and empirical data. This outcome demonstrates the potential of simplified models to describe the maneuvering motion of a ship at low speeds.

cs.LG 2026-07-03

CROF score selects world models for strong LunarLander control

by Nikolai Smolyanskiy

Predicting Closed-Loop Performance of Latent World Models: Offline Checkpoint Selection for MPC and Model-Based RL Under Non-Markovian Rewards in LunarLander

An offline metric using reward observability ranks checkpoints that beat model-free RL with 65 times less interaction data.

We study how to predict the downstream closed-loop performance of a learned latent world model from validation-time diagnostics alone. Choosing the right checkpoint from a world-model training run is difficult: validation loss and multi-step prediction RMSE keep improving long after closed-loop performance has collapsed. We present a suite of structural validation-time diagnostics drawn from optimal-control theory and apply them to Gymnasium's LunarLander v3, which features shaped rewards. We train an RSSM [5, 4] world model on it and treat per-checkpoint CEM-MPC return as the oracle for closed-loop quality. By evaluating 40 metrics against this oracle, we find that the strongest single predictor is the Reward Observability Fraction (ROF), which measures the reward predictor's dependence on the observable subspace. We combine ROF with three structural regularizers into a single-number offline checkpoint selection score, the Composite Reward Observability Fraction (CROF). The CROF-selected world model trains a model-based A2C policy that beats a fairly evaluated model-free A2C baseline by ~24.5 return points while using ~65x fewer real-environment interactions, and the same world model also drives a strong zero-shot CEM-MPC policy. Code and data: https://github.com/nsmoly/LunarLander_RSSM.

cs.CL 2026-07-03

Interleaving speech and text lifts ASR entity accuracy on 38k hours

by Ruchao Fan, Yiming Wang +11 more

Rethinking Speech-LLM Integration for ASR: Effective Joint Speech-Text Training by Interleaving

The method matches real domain text performance without synthetic pairs while keeping language model generation behavior.

Speech-LLM integration has shown promising results by leveraging extensive textual pretraining, yet its specific benefits for automatic speech recognition (ASR) remain unclear. We observe that as supervised ASR training data increases, the contribution of LLM priors becomes less evident, and simple speech-text joint training under-utilizes textual knowledge. We therefore propose Joint Speech-Text Interleaved Pretraining (JSTIP), an ASR-oriented pretraining strategy that constructs word-level and segment-level interleaved speech-text sequences within aligned pairs for speech-LLM architectures that accept continuous inputs. Experiments on 38k hours of ASR data show consistent entity accuracy improvement compared to ASR-only and joint speech-text training baselines. Using only domain transcription text, JSTIP matches the entity recognition performance obtained with synthetic speech-text pairs, simplifying domain adaptation. Benefiting from textual pretraining and domain text data, JSTIP is competitive with open-source ASR and Speech-LLM systems in medical entity recognition. The zero-shot speech question answering behaviors further suggest that interleaving reduces the speech-text modality gap and preserves the LLM generative prior, which is likely the reason for the entity improvements on the ASR task.

eess.IV 2026-07-03

Wave functions model images to explain low-light enhancement

by Yiquan Gao

Quantum-Inspired Vision: Leveraging Wave-Particle Duality for Low-Illumination Enhancement

Treating images as probabilistic waves integrates physics into AI for better bias handling and noise robustness.

This study provides a theoretical expansion of the recent Data Relativistic Uncertainty (DRU) framework by formalizing a physics-to-AI paradigm for image enhancement. By modeling images as probabilistic wave functions rather than deterministic states, the paradigm explicitly integrates wave-particle duality to illustrate the system flow of how DRU leverages the intrinsic physical uncertainty of light, a dimension requiring further theoretical discussion. Consequently, this paradigm provides a rigorous Explainable AI (XAI) approach that enhances the interpretability of how DRU mitigates illumination bias and maintains robustness against data noise.

eess.SY 2026-07-03

Dynamic phasors allow eigen analysis of subsynchronous oscillations

by Fiaz Hossain, Nilanjan Ray Chaudhuri +4 more

A Dynamic Phasor Framework for Analysis of Subsynchronous Oscillations in Multi-Machine Systems with IBRs and Large Loads

A mixed dq and pnz frame model supports root-cause studies and faster simulation of large systems with IBRs and big loads.

Although the electromagnetic transient (EMT) framework can capture subsynchronous oscillations (SSOs), it faces scalability issues for large-scale systems. Thus motivated, we propose a generalized dynamic phasor (DP) framework to analyze SSOs in multi-machine systems with inverter-based resources (IBRs) and large loads such as artificial intelligence data centers (AI DCs) under balanced and unbalanced conditions. The grid-following (GFL) and grid-forming (GFM) IBRs are modeled in their respective $dq$-frame DPs. In contrast, the detailed model of multi-mass turbine-driven synchronous generators (SGs) along with dynamic transmission network models and loads are represented in $pnz$-frame DPs. The linearizability and time-invariance of the framework enable us to perform eigen decomposition, which is a powerful tool for root-cause analysis of SSO modes and the design of damping controllers. In addition, the DP modeling approach facilitates faster simulation of large-scale systems. The generalized framework is validated with EMTDC/PSCAD simulations using the IEEE first benchmark model for subsynchronous resonance and the modified IEEE 4-machine system. Several use cases are presented on the modified IEEE 68-bus system with two GFL IBRs to show the applicability of the framework. First, time- and frequency-domain analyses of the IBR-induced SSO mode are presented. Then, two solutions are proposed to damp the poorly damped SSO mode: (a) a decentralized controller is designed using particle swarm optimization, and (b) the control of one GFL IBR is replaced by GFM control. Finally, the impact of AI DC load on the primary frequency response of the system and on the multi-mass turbines of the SGs is studied.

eess.SY 2026-07-03

Network loading tightens droop-gain limits for grid stability

by Zhimeng Wang, Sushobhan Chatterjee +3 more

Decentralized Stability Certificates in IBR-Dominated Grids: The Role of the Network State

Higher reactive mismatches and line loading shrink the set of stabilizing local controller parameters.

Small-signal instabilities, such as unforced sub-synchronous oscillations (SSOs), are increasingly observed in inverter-based resource (IBR) dominated grids. While decentralized stability certificates offer a scalable means to avoid instability onset, they are typically derived under restrictive network-state assumptions--such as small angle differences or negligible voltage drops--that cannot capture how departures from these conditions affect system stability. In this paper, we develop a network model and a decentralized analysis framework that explicitly characterizes how reactive power mismatches, line loading, and inverter control parameters jointly determine small-signal stability. We show that increased steady-state reactive power mismatches and line loading lead to more stringent conditions on admissible inverter droop gains. These results make decentralized stability certificates explicitly network-state dependent, showing how network stress shrinks the set of stabilizing local controller parameters.

eess.SY 2026-07-03

One inverter control structure covers forming and following modes by parameter tuning

by Xiaoyang Wang, Xin Chen

A Unified Framework for Hybrid Grid-Forming and Grid-Following Inverter Control

Continuous adjustments replace discrete switches for smooth shifts across PQ, PV, Qf, Vf, and hybrid operation

This paper proposes a novel unified control framework for achieving hybrid grid-forming (GFM) and grid-following (GFL) inverter operation by integrating dispatchable virtual oscillator control with reference-following synchronization. The proposed inverter control method supports multiple operating modes within a unified structure, including voltage- and frequency-following (PQ mode), voltage-forming and frequency-following (PV mode), voltage-following and frequency-forming (Qf mode), voltage- and frequency-forming (Vf mode), and a hybrid mode with mixed GFM and GFL behaviors. In particular, the proposed method achieves smooth pre-synchronization and enables seamless transitions across a spectrum of inverter operating modes by tuning a small set of continuous control parameters, rather than relying on discrete controller switching. This framework provides a flexible and physically interpretable approach for adapting inverter dynamics to varying grid conditions and operational requirements. The small-signal stability and input-output frequency-domain characteristics are further analyzed under different control parameter settings. The effectiveness and robustness of the proposed unified control method are demonstrated through extensive electromagnetic transient (EMT) simulations and hardware-in-the-loop (HIL) experiments.

eess.AS 2026-07-03

Three label targets train AAI models without SSL at test time

by Jesuraj Bandekar, Prasanta Kumar Ghosh

Enhancing Acoustic-to-Articulatory Inversion with Multi-Target Pretraining for Low-Resource Settings

Accuracy rises in low-data regimes and inference cost falls because the SSL extractor is removed after pretraining.

Acoustic-to-Articulatory Inversion (AAI) estimates vocal tract articulator movements from speech, benefiting tasks like ASR, speech synthesis, and speaker verification. While deep learning-based methods (CNNs, RNNs, Transformers) have advanced AAI, recent studies show that Self-Supervised Learning (SSL) features further enhance performance, particularly in low-resource settings. However, SSL feature extractors introduce inference latency and computational overhead. To address this, we propose a novel pretraining method leveraging three target representations (Phoneme Labels, Articulatory Feature Labels, and Critical-articulator Labels), eliminating the need for an SSL extractor during inference. We evaluate our approach against both baseline and SSL-based models across various data conditions. Results demonstrate that our method consistently improves AAI performance, particularly in low-resource scenarios, while significantly reducing inference costs without sacrificing accuracy.

eess.SP 2026-07-03

Multicarrier optimization boosts underwater acoustic power transfer

by Jinheng Kang, Yizhe Zhao +2 more

Waveform Design for Underwater Simultaneous Acoustic Information and Power Transfer

Including transducer frequency response and rectifier nonlinearity in the design raises energy transfer efficiency in simulations.

Simultaneous acoustic information and power transfer (SAIPT) plays a crucial role in enabling self-sustainable and maintenance-free Internet of Underwater Things (IoUT) networks. This paper studies a multicarrier underwater SAIPT system that jointly considers the frequency-dependent characteristics of acoustic transducers and the nonlinear behavior of rectifier circuits. The waveform vector is first optimized using the successive convex approximation (SCA) method under constraints on average and peak transmit power for acoustic power transfer (APT). Then, in the SAIPT scenario, both the power splitting factor and waveform vectors are jointly optimized through an alternating optimization (AO) framework based on SCA, subject to transmit power and achievable rate constraints. Simulation results demonstrate that incorporating the transducer's frequency response, rectifier nonlinearity, and the high peak-to-average power ratio (PAPR) of multicarrier waveforms leads to a significant improvement in acoustic energy transfer efficiency. The results also show that the energy harvesting DC output can be further enhanced by properly choosing system parameters, such as the number of subcarriers and subcarrier spacing.

eess.AS 2026-07-03

Data strategies lift rare nonverbal detection in ASR

by Gene Yang, Haibin Wu +11 more

Beyond Words: Towards Effective Modeling of Non-Verbal Vocalizations in ASR

Shared acoustic structure between common and rare vocal events enables better modeling of laughs, breaths, and cries without losing word acc

Modern automatic speech recognition (ASR) systems excel at transcribing lexical content but often omit nonverbal vocalizations (NVs), such as laughter, breaths, coughs, and cries, that carry conversational and affective information. Modeling NVs in ASR is challenging because NV annotations are sparse and highly long-tailed, with frequent categories such as breaths and laughter dominating rarer events such as cries and coughs. We study three data-centric strategies for improving low-resource NV recognition: (1) a two-stage curriculum that first maps all NV events to a generic token and then fine-tunes on target categories; (2) inter-token transfer from high-resource events, such as laughter and breath, to rare events, such as crying; and (3) voice-conversion augmentation with class balancing. Experiments show that shared acoustic structure across vocal events can be exploited to improve rare-category detection while preserving lexical ASR quality.

cs.LG 2026-07-02

Learned wind estimator cuts quadrotor tracking error 48%

by Abdullah Al Tasim, Wei Sun

Wind-Aware Reinforcement Learning Control of a Small Quadrotor Using Learned Onboard Wind Estimation in Simulated Atmospheric Turbulence

A reinforcement learning controller using an attention-based onboard estimator outperforms a wind-blind baseline across 4-12 m/s mean winds

Small multirotor aircraft are increasingly tasked with operations in the atmospheric boundary layer, where turbulent winds comparable to the vehicle's airspeed degrade trajectory tracking and can defeat conventional feedback control. This work illustrates a two-stage learning pipeline that first estimates the local wind from onboard kinematics and dynamics and then exploits that estimate inside a reinforcement learning (RL) flight controller. The wind estimator, an attention-augmented gated recurrent network trained on thousands of simulated flights through von Karman turbulence with power-law shear and veer, recovers the horizontal wind vector with a per-flight root-mean-square error of 0.40 m/s and a direction error of 3.2 degrees on unseen wind regimes, an accuracy near the floor imposed by unresolved turbulence, and generalizes to vertical ascent profiles with a skill score of 0.861 over a constant-wind reference. A proximal policy optimization controller receiving the frozen estimator's output reduces horizontal trajectory tracking error by 48% relative to a wind-blind proportional-derivative baseline across mean winds of 4 m/s to 12 m/s, winning on 100% of evaluation episodes. A three-way ablation decomposes this improvement into a kinematic component, available without wind information, and a wind-perception component; the perception share rises with wind speed, from small in light winds toward roughly half the total benefit in strong winds, consistent with the quadratic scaling of aerodynamic drag. The controller degrades gracefully on out-of-distribution winds of 13 m/s to 15 m/s, where the baseline fails catastrophically.

eess.SY 2026-07-02

Region graph transfers mode shape recognition across vehicles

by Tong Duy Son, Marc Brughmans +5 more

Robust and Explainable 3D Mode Shape Recognition Using Region-Aware Graph Neural Networks

Mapping models to shared structural regions lets AI work on new designs without identical meshes or retraining on full data.

Mode shape recognition is a fundamental task in automotive NVH development, yet it remains dependent on manual visual inspection by experienced engineers. Existing approaches based on engineering heuristics, Modal Assurance Criterion (MAC), or geometry-dependent AI representations often exhibit limited robustness across different vehicle architectures, finite element (FE) meshes, and experimental measurement layouts, restricting their industrial applicability. This paper presents a Canonical Engineering Graph Representation and region-aware graph learning framework for robust and explainable 3D mode shape recognition. Rather than learning directly from vehicle-specific FE meshes, heterogeneous FE models and experimental measurements are transformed into a common graph whose nodes represent semantically meaningful structural regions connected through engineering-informed relationships. Geometry-independent regional descriptors are combined with graph attention learning and region-aware pooling to capture structural interactions while preserving engineering semantics and enabling physically interpretable predictions. The resulting representation decouples engineering knowledge from numerical discretization, allowing transfer across different vehicle programs without requiring identical mesh topology or sensor configurations. The proposed framework is validated using FE and experimental datasets from four vehicle programs under severe label scarcity. Results demonstrate high classification accuracy, cross-vehicle transferability, and physically meaningful explanations by directly relating predictions to engineering-defined structural regions used in NVH analysis. Beyond mode shape recognition, the proposed Canonical Engineering Graph Representation provides a reusable engineering abstraction for trustworthy and transferable AI across heterogeneous simulation and experimental workflows.
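The Modal Assurance Criterion mentioned in this abstract as an existing approach has a standard closed form, sketched below for two mode-shape vectors sampled at the same degrees of freedom. The example vectors and the function name `mac` are illustrative assumptions; this is the textbook criterion, not the paper's graph-based method.

```python
import numpy as np

def mac(phi_a, phi_b):
    """Modal Assurance Criterion: |phi_a^H phi_b|^2 / ((phi_a^H phi_a)(phi_b^H phi_b)).

    Returns a value in [0, 1]; 1 means the shapes match up to a scale factor.
    Both vectors must be sampled at the same degrees of freedom.
    """
    num = np.abs(np.vdot(phi_a, phi_b)) ** 2
    den = np.vdot(phi_a, phi_a).real * np.vdot(phi_b, phi_b).real
    return float(num / den)

# Made-up shapes sampled at five points along a beam-like structure.
bending = np.array([0.0, 0.5, 1.0, 0.5, 0.0])
scaled_bending = -3.0 * bending                  # same shape, different scale and sign
torsion = np.array([1.0, 0.5, 0.0, -0.5, -1.0])  # orthogonal to the bending shape

mac_same = mac(bending, scaled_bending)
mac_diff = mac(bending, torsion)
```

The requirement that both vectors share the same degrees of freedom is one reason a MAC-based comparison is hard to carry across different meshes or sensor layouts, which is the limitation the abstract points to.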

eess.SY 2026-07-02

MPC value function certifies almost-sure RAS satisfaction

by Arash Bahari Kordabad, Satya Prakash Nayak +2 more

Context-Triggered Robust MPC for Temporal Logic Specifications

Robust constraints and local controller handle avoidance and stay; convex duality yields tractable quadratic and SOC programs for disturbed

We consider the problem of synthesizing robust feedback controllers for discrete-time linear systems that ensure the satisfaction of context-dependent linear temporal logic specifications in the presence of additive bounded disturbances. Building on existing results that reduce context-triggered temporal logic synthesis to the realization of context-dependent reach-avoid-stay (cRAS) objectives, we focus on the corresponding low-level control synthesis problem. We first employ certificate-based conditions for the almost-sure satisfaction of RAS specifications. Based on these conditions, we propose a switching control architecture that combines robust model predictive control (MPC) with a local invariant controller, and show that the resulting MPC value function serves as a reachability certificate while avoidance is enforced through robust constraints and the stay is enforced via the local controller. To obtain computationally tractable formulations for the resulting robust optimizations, we employ convex duality to reformulate the robust constraints into equivalent deterministic optimization problems, yielding convex quadratic and second-order cone programs for relevant geometric settings. The proposed framework is demonstrated on a robot navigation problem with context-triggered logical switches in both static and moving environments. The results show significantly larger feasible sets than Lyapunov-based approaches, while naturally accommodating dynamic environments and online task reconfiguration.

eess.SY 2026-07-02

MILP boosts lowest battery from 2.7% to 68.6% in 20-terminal network

by Pranay KC, Amin Taghieh +4 more

Optimal Reconfiguration of Distributed Battery Networks Under Connectivity and Energy Constraints

Algorithm with overlap correction and distance penalty manages energy under budget while cutting changes by 72%.

Networked battery systems arise in industrial automation, distributed energy applications, and multi-agent systems, where terminals consume energy locally and recharge only when connected to a source. Resource constraints often limit the number of simultaneous connections, requiring networks to be dynamically reconfigured to maintain system functionality. Managing such networks in dynamic environments is challenging, particularly when low-energy terminals must be prioritized for timely replenishment. This paper presents a battery-aware topology optimization algorithm that extends the GeoSteiner framework with a tailored Mixed-Integer Linear Program (MILP) formulation for Full Steiner Tree (FST) aggregation. The formulation minimizes network length while prioritizing low-battery terminals through a weighted objective subject to a global budget constraint, enabling partial network formation under realistic resource limits. An overlap-correction term is introduced that prevents double-counting when selected trees share terminals. To capture the network reconfiguration cost between time steps, a graph-distance metric penalizes frequent topology changes, resulting in a 72.2% reduction in topology changes compared to a baseline without the penalty. Simulations on a 20-terminal network show that battery levels are managed effectively: the lowest battery level improves from 2.7% to 68.6% over 30 iterations while topology stability and budget utilization (92%) are maintained. The framework offers a principled approach to designing energy-aware, adaptive connectivity in power-limited multi-agent systems.

eess.SP 2026-07-02

Pilot-first method cuts CKM error 0.79-1.33 dB from sparse measurements

by Zhonghao Jiu, Fan Meng +4 more

Channel Knowledge Map Reconstruction From Sparse Measurements via Pilot-Anchored Layout-Conditioned Fourier Refinement

Stabilizing supported pilots before layout-conditioned refinement improves accuracy at 5-10% coverage in outdoor scenarios.

Channel knowledge maps (CKMs) enable environment-aware wireless systems by providing location-specific channel knowledge, but long-term environmental variations, such as construction, traffic redistribution, and foliage changes, require periodic map refresh. In practice, channel measurements are often sparse and irregular, while environmental knowledge may be limited to coarse layout or topology descriptors. This paper studies CKM reconstruction from sparse measurements. We show that reconstruction pipelines that apply local aggregation or spectral operators directly to a zero-filled pilot grid can entangle the sampling mask with the channel field, allowing structural priors to act on mask-induced distortions before the measurements define a supported radio field. To address this issue, we propose Anchor-CKM, a measurement-first, knowledge-aided reconstruction framework. Anchor-CKM first uses support-aware partial convolutions to construct a pilot-supported representation, and then performs layout-conditioned dual-path Fourier refinement followed by coordinate-based heteroscedastic prediction of the CKM mean and per-location predictive variance. Experiments on transmitter-disjoint DeepMIMO scenarios cover missing ratios from 0.3 to 0.95, including stringent 5% to 10% pilot-coverage settings. In explicit-layout outdoor scenarios, Anchor-CKM reduces received-power root-mean-square error (RMSE) by 0.79 to 1.33 dB relative to the strongest reproduced baseline, while ablations identify pilot-support stabilization as the largest contributor and layout conditioning as beneficial for line-of-sight/non-line-of-sight (LOS/NLOS) boundary fidelity.

cs.CV 2026-07-02

Enhancing three capabilities brings trackers closer to human perception

by Shih-Fang Chen

Rethinking Generic Object Tracking Toward Human-Level Perceptual Intelligence

Methods for generic object tracking target failures from deformation, distractors, and unseen categories through better discrimination, adap

At the heart of human visual perception lies the ability to maintain a continuous and coherent understanding of the external world. By integrating observations with accumulated experience, the human visual system can continuously adapt to variations in both the target and its surrounding environment, while preserving robust visual continuity as scene dynamics evolve. Human vision can therefore integrate prior knowledge, spatial geometry, and semantic context to understand complex scenes and their changes. As a core problem in computer vision, visual object tracking aims to bring machine perception closer to human visual perception. These capabilities are central to the task of Generic Object Tracking (GOT). In this task, a visual tracker is initialized only with the bounding box of an arbitrarily specified target in the first frame, and must continuously localize the target in subsequent dynamic visual streams. However, future events, observations, and real-world variations are inherently unpredictable; therefore, the model's generalization and online adaptation capabilities remain bottlenecks. Tracking reliability can deteriorate when the target undergoes severe deformation, is affected by complex distractors, encounters significant environmental changes, or belongs to a category unseen during training. This dissertation aims to narrow the gap between machine visual tracking systems and human visual perception by proposing a series of methods that systematically enhance the target discrimination, robust adaptation, and geometric reasoning capabilities of tracking models.

cs.RO 2026-07-02

Flow matching safety guidance reaches 82.8% collision avoidance

by William English, Hao Zheng +1 more

Neuro-Symbolic Safety Guidance for Vision-Language-Action Models via Constrained Flow Matching

Predictive avoidance via constrained optimization during denoising improves task success by 19.8% over single-step methods.

Vision-Language-Action (VLA) models have demonstrated promising generalization capabilities across robotic manipulation tasks, yet their real-world deployment remains limited by the lack of effective safety measures. Specifically, existing safety measures only prevent collisions caused by the robot's next action. In this paper, we propose a neuro-symbolic safety guidance mechanism for flow matching based VLAs that enables predictive collision avoidance. Flow matching based VLAs determine the next actions by predicting a trajectory (a sequence of actions) through an iterative neural flow matching process. Our method formulates safety enforcement as a minimum-norm constrained optimization problem that corrects safety violations during the denoising process of noisy intermediate trajectory predictions. By analyzing predicted trajectories and applying corrections during iterative denoising, our approach anticipates collisions before they become unavoidable. This interleaving of symbolic constraint satisfaction with neural trajectory generation enables predictive collision avoidance rather than reactive intervention. On the SafeLIBERO benchmark, our method achieves 82.8% collision avoidance and 81.6% task success, a 6.3% and 19.8% improvement respectively over single-step methods, with the largest gains on long-horizon tasks where compounding distribution shift is most pronounced. Video demonstrations of our approach are included on our project page at https://willenglish.tech/SafetyGuidedFlowMatching/.
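The minimum-norm correction described in this abstract can be pictured with a toy sketch. It assumes a single linear safety constraint `a @ x <= b` on a flattened trajectory `x`, for which the smallest-norm fix has a closed form (projection onto a half-space). The function names, the constraint form, and the Euler-style update loop are illustrative assumptions, not the authors' formulation or solver.

```python
import numpy as np

def min_norm_correction(x, a, b):
    """Return the point closest to x (in Euclidean norm) that satisfies a @ x <= b."""
    x = np.asarray(x, dtype=float)
    a = np.asarray(a, dtype=float)
    violation = float(a @ x - b)
    if violation <= 0.0:
        return x.copy()  # already safe: leave the trajectory untouched
    # Closed-form projection onto the half-space {x : a @ x <= b}
    return x - (violation / float(a @ a)) * a

def guided_denoise(x0, velocity, a, b, steps=10):
    """Hypothetical loop: Euler steps of a flow, each followed by a safety correction."""
    x = np.asarray(x0, dtype=float)
    dt = 1.0 / steps
    for i in range(steps):
        x = x + dt * velocity(x, i * dt)   # neural flow step (stand-in callable)
        x = min_norm_correction(x, a, b)   # symbolic constraint correction
    return x
```

Because the correction is applied to every intermediate trajectory rather than only to the next action, a violation that lies several steps ahead is corrected while the trajectory is still being generated.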

eess.SP 2026-07-02

Wrist sensor framework spots eating episodes on new datasets

by Chunzhuo Wang, Emma De Schuyteneer +4 more

Generalizable framework of eating episode detection on free-living wrist-worn wearable data

Achieves F1 scores of 0.59 to 0.79 across varied sensors, hands, and even eating disorder groups.

Accurate assessment of eating behavior is essential for understanding and managing conditions such as eating disorders, obesity, and diabetes. Wearable-based food intake detection has shown considerable promise; however, most existing approaches are trained and evaluated using internal validation on a single dataset with fixed sensor orientation and known wearing hand, which limits their generalizability to real-world settings. Furthermore, many existing approaches rely on both accelerometer (acc) and gyroscope (gyro) signals to achieve strong performance. However, gyro measurements may be unavailable in some real-world deployments due to battery constraints, and performance often degrades when only acc data are used. We propose a generalizable framework for orientation-invariant eating episode detection, with an acc2gyro module to improve performance in acc-only settings. The framework is trained using fine-grained wrist-worn datasets and externally validated across three heterogeneous datasets: the Clemson All-Day (CAD) and Capture-24 datasets, as well as Physio-ED, a dataset collected from individuals with eating disorders. Across external evaluations, the proposed framework demonstrates robust performance despite substantial variations in sensor modality, wearing hand, participant population, and annotation protocols. Specifically, the framework achieved F1-scores of 0.751, 0.592, and 0.793 on CAD, Capture-24, and Physio-ED, respectively, with CAD performance exceeding recent state-of-the-art methods evaluated using internal validation only. This study provides the first external validation of eating episode detection in an eating disorder population. Additionally, the acc2gyro module improves the performance in acc-only settings. These findings demonstrate the potential of orientation-invariant wearable sensing for scalable and clinically applicable assessment of eating behavior.

cs.CR 2026-07-02

ML surrogates recover CPS after memory corruption attacks

by Mohsen Salehi, Karthik Pattabiraman

Chameleon: Recovering Cyber-Physical Systems from Memory Corruption Attacks via ML Surrogates

Replaces vulnerable compartments with accurate models to keep robotic vehicles running safely with low overhead.

Cyber-physical systems (CPSs) are increasingly deployed in every aspect of our lives and can be compromised through memory corruption vulnerabilities, allowing attackers to hijack the control flow and take over the system. Existing techniques mostly focus on detecting such attacks but respond by terminating or halting execution upon attack detection, which is not acceptable in CPSs used in safety-critical tasks, as interrupted tasks can have catastrophic consequences. Other techniques replace compromised CPS components with simplified defaults that degrade system behavior, or reboot the system upon attack detection. We propose Chameleon, a novel framework for automatically recovering CPSs from memory corruption attacks using machine learning (ML)-based surrogates trained at compartment granularity that nearly replicate their original compartments' behavior but do not have the same memory corruption vulnerabilities. Upon attack detection, Chameleon replaces the compromised compartment with its trained surrogate. We implemented Chameleon using the LLVM compiler and evaluated its efficiency and effectiveness on seven different robotic vehicles (RVs), including simulated and real ones. We found that Chameleon can generate surrogates that closely approximate the original compartments (with an average $R^2 = 0.96$), successfully recover the system despite real-world memory corruption attacks, unlike prior approaches, and complete their tasks while incurring low performance and memory overhead.

eess.SY 2026-07-02

Data refines probabilistic zonotopes for tighter reachability

by Amir Modares, Zhen Zhang +3 more

Reachability Analysis With Probabilistic Zonotopes: Learning Realized Disturbances and Refining Aleatory Uncertainty

Trajectory constraints shrink the model set and yield high-probability reachable sets with formal guarantees and less conservatism.

This paper develops a data-driven reachability framework for linear systems whose disturbances are modeled by probabilistic zonotopes (PZs), combining bounded deterministic and Gaussian stochastic components. In contrast to methods that require a precisely known disturbance model (either purely deterministic or purely stochastic), we assume only a conservative prior PZ and refine it from data. The framework separates two uncertainty sources: realized disturbances, which act along the collected trajectory and govern the size of the data-consistent model set, and aleatory disturbances, which enter as future additive uncertainty during reachable-set propagation; both shape the reachable sets, but through different mechanisms. Refinement exploits prior system knowledge together with trajectory-consistency constraints induced by the data, which impose affine couplings between deterministic and Gaussian latent variables. We accordingly develop a constrained-PZ calculus that absorbs the stochastic part of these constraints into an equivalent representation, removes infeasible latent directions, and reduces stochastic covariance, together with identification-aware fusion rules for combining heterogeneous constrained-PZ descriptions. The refined realized-disturbance proxies then serve as scenarios in a linear program that learns the smallest translated and scaled copy of the prior disturbance set that contains all proxy confidence sets while remaining nested in the prior. The resulting deterministic, high-probability reachable sets carry formal containment guarantees with substantially reduced conservatism, and numerical examples confirm that the pipeline tightens both the data-consistent model set and the propagated reachable sets.

eess.SY 2026-07-02

Repulsive cages trap hijacked agents using their own avoidance

by Luigi Petruzziello, Camilla Fioravanti +1 more

Distributed Containment of a Compromised Agent through Repulsive Cages

Defenders position themselves to shape a compromised target's safety responses, keeping it inside a safe region with sublinear regret for th

UAV swarms and cyber-physical multi-agent systems are increasingly deployed in safety-critical missions that require coordinated motion, distributed decision making, and autonomy. A major security risk arises when a legitimate agent is hijacked and driven by adversarial high-level commands. Rather than focusing on detection and isolation of malicious agents, we exploit a structural property common in autonomous platforms: low-level collision-avoidance modules are typically implemented as independent safety layers and may remain active even under high-level compromise. Building on this property, we propose a distributed containment framework that uses the compromised agent's uncompromised avoidance response as an indirect actuation channel. Defender agents select their geometric configuration to shape the repulsive field experienced by the target, with the goal of keeping it inside a prescribed admissible region and, when required, steering it toward a desired destination. The interaction is modeled as an online Stackelberg game in which defenders act as leaders and the adversary reacts by choosing the target command. Using support-function and normal-cone arguments, we derive an exact geometric characterization of robust one-step containment and introduce the notion of a repulsive cage. These results define a centralized Stackelberg oracle and motivate a fully distributed online approximation based on local communication and dynamic field estimation. We prove sublinear dynamic-regret bounds with respect to the centralized benchmark, quantifying the effect of network-induced estimation errors and temporal variability of the stage-wise optimum. Simulations validate the approach and corroborate the theory.

math.OC 2026-07-02

The paper develops an LMI-based convex program to approximate the infinite-horizon value…

by Huu-Thinh Do, Trung B. Tran +2 more

Computationally Efficient Near-Optimal Control for Current Ripple Reduction and Optimization of Three-Phase Motors via LMIs

LMI-based quadratic approximation of the value function via iterated Bellman inequalities yields a tractable offline convex program for…

The optimal control of three-phase permanent-magnet synchronous motors (PMSMs) is challenging due to their nonlinearity and the discrete nature of the control set. Existing approaches either rely on mixed-integer trajectory optimization or require computationally intensive value-iteration procedures. This paper proposes a Linear Matrix Inequality (LMI)-based method for approximating the infinite-horizon value function using a quadratic parameterization and iterated Bellman inequalities, yielding a tractable convex program. The computed function can be obtained efficiently offline and used online as a tail cost in a horizon-one optimal control law. Simulation results show that the proposed approach achieves a favorable trade-off between switching effort and current ripple, with performance comparable to that of finite-control-set MPC but with a significantly lower computational cost.

eess.SY 2026-07-02

The paper develops GPU-parallel methods to compute tight linearization error bounds for…

by Jeffrey Fang, Keyi Shen +2 more

GPU-Parallel Linearization Error Bounds for Real-Time Robust Optimal Control of Nonlinear and Neural Network Dynamics

Tight linearization bounds computed in parallel allow online optimization of feedback policies with formal guarantees up to 168 dimensions.

This paper studies real-time robust optimal control for uncertain nonlinear systems, where linear time-varying (LTV) approximations make planning tractable but require sound linearization error bounds (LEBs) to guarantee robust constraint satisfaction. We develop tight, differentiable, GPU-parallel LEBs for LTV approximations of nonlinear and neural network (NN) dynamics. For analytic dynamics, we introduce path-based Hessian bounds that are tighter than standard interval methods. For NN dynamics, we derive certified LEBs using NN verifier-generated affine relaxations and local Jacobian corrections. We adapt a GPU-parallel system-level synthesis LTV-based robust control solver to be compatible with these LEBs by extending it to handle right-invertible disturbance matrices and non-zero-centered disturbance sets for tight zonotopic uncertainty propagation. Our method, GPUSLS-LEO, enables online optimization of robust feedback policies that account for linearization error, producing tight, formally verified reachable tubes. On complex nonlinear and NN dynamics up to 168 state dimensions, our method can compute robust control policies on the GPU at rates up to 67 Hz, reducing solve times and conservativeness relative to baselines while preserving formal guarantees and real-time performance.

eess.SY 2026-07-02

Python framework unifies Taylor-model reachability for hybrid and stochastic systems

by Salma Iraky, Andrew Sogokon

TERA: A Unified Taylor Model Enabled Reachability Analysis Framework

TERA computes tight reachable-set over-approximations inside one open codebase that already covers ODEs, hybrid dynamics and continuous stoc

Reachability analysis of safety-critical systems requires computing rigorous enclosures of all possible state trajectories. Taylor Model (TM)-based methods have proved effective at mitigating the so-called wrapping effect which leads to overly conservative enclosures of reachable sets. However, existing tools are often hard to extend or focused on narrow system classes (e.g. deterministic systems modelled by ODEs, or hybrid systems). We develop TERA: a Python-native framework for TM-based reachability analysis of continuous, hybrid and stochastic systems within a single symbolic-numeric workflow. TERA is free and open-source, enabling rapid prototyping of reachability analysis techniques with rigorous enclosures. At present, our implementation is able to compute tight reachable set over-approximations for non-linear ODEs and hybrid systems on difficult benchmark problems, and already supports analysis of continuous-time stochastic systems. Our goal is to develop a robust open-source Python infrastructure for rigorous reachability analysis supporting a broad class of systems, including stochastic hybrid systems.

eess.AS 2026-07-02

Same-speaker tests isolate language mismatch as main SV loss driver

by Pol Buitrago, Javier Hernando

Disentangling Speaker and Language Effects in Cross-Lingual Speaker Verification for Iberian Languages

Bilingual evaluation set for five Iberian languages shows speaker effects explain only part of cross-lingual performance drop.

Cross-lingual speaker verification (SV) systems typically exhibit performance degradation when enrollment and test utterances are spoken in different languages. However, standard evaluation protocols confound language mismatch with inter-speaker variability, as evaluation is generally performed with different speakers across languages. In this work, we introduce a bilingual same-speaker evaluation set for five Iberian languages, enabling analysis of cross-lingual SV under constant speaker identity. We apply this setup to a HuBERT-based SV system previously shown to exhibit strong language dependence, and analyze results using the Cross-Lingual Transfer Matrix (CLTM) to study pairwise cross-lingual transfer. Our results show that speaker-related variability accounts for part of the observed degradation, but language mismatch remains the main driver of cross-lingual performance loss. These findings provide a more precise characterization of language dependence in cross-lingual SV.

cs.SI 2026-07-02

LLM agent networks form with preferential attachment and possible weaker-model dominance

by Yiming Zhang, Vikram Krishnamurthy

Emergence of Preferential Attachment and Glass-Ceiling Effects in Autonomous Networks of LLMs

Prominent agents gain more links while weaker ones can reach central positions; mean-field model proves stable type equilibria.

We investigate the emergence of structural disparities in networks of collaborating large language model (LLM) agents. When LLM agents autonomously choose collaborators, the resulting communication network exhibits preferential-attachment dynamics: agents that are already prominent become increasingly likely to attract additional connections. In some cases, weaker LLM agents (agents with a smaller base model or an older version) can disproportionately occupy central and influential network positions relative to stronger LLM agents. We interpret this as a type-dependent glass-ceiling effect (GCE). We model the network of LLM agents as a time-evolving sequence of directed weighted graphs, where the vector-valued edge weights represent cumulative tokens exchanged, number of interaction rounds, and reasoning effort. Using a contraction mapping argument on the mean-field dynamics, we prove that the importance (centrality) of each agent type converges to a unique stable equilibrium. To ground the model in LLM decision mechanisms, we introduce a cross-attention-inspired utility for collaborator selection. This utility specifies the local connection dynamics and, together with the mean-field model, yields a predictive characterization of the limiting network structure and its type-dependent centrality gaps. To validate the theory, we develop an experimental testbed with 100 LLM agents. Our experiments show that autonomous network formation can generate persistent centrality disparities, with their magnitude and direction depending on model family, model size, system-prompt design, and task context. They further show that the effect of preferential attachment depends on its alignment with model capability: reinforcing it improves collective performance when stronger agents become central, whereas weakening it improves performance when network dynamics instead favor weaker agents.

cs.LG 2026-07-02

Hierarchical JEPA hits SOTA on ECG benchmark with low compute

by Siwon Kim

A Lightweight Self-Supervised Learning Framework for Multivariate Time Series using Hierarchical-JEPA on ECG Data

Pretrained on 180000 recordings, ER-JEPA reaches top ST-MEM scores using minimal resources and rapid computation.

Data analysis in the medical domain often encounters scenarios involving a limited target dataset and a large, unannotated dataset with a general distribution. Under such circumstances, self-supervised learning (SSL) methods are highly effective for utilizing large datasets, making them a popular choice for electrocardiogram (ECG) analysis. This work presents the Event Reconstruction Joint-Embedding Predictive Architecture (ER-JEPA), a lightweight SSL framework for multivariate time series, whose name and two-fold hierarchical structure are inspired by the diagnostic approach of cardiologists. At its core, ER-JEPA features: (1) a two-stage structure that constructs representations for each time interval and subsequently processes these representations as a univariate time series, (2) the hierarchical integration of two Joint-Embedding Predictive Architectures (JEPAs), and (3) a Vision Transformer (ViT) backbone. The structural concatenation of two JEPAs categorizes the model as a Hierarchical JEPA (H-JEPA), designed to encode multiple levels of abstract representations for enhanced prediction on complex tasks. This study reports a successful application of H-JEPA to 12-lead ECG data as a multivariate time series alongside an analysis of the sensitivity of hierarchical representation during the pretraining stage. Pretrained on approximately 180,000 10-second recordings, the model achieves state-of-the-art downstream performance on the ST-MEM benchmark, with rapid computation and minimal resource usage.

physics.ins-det 2026-07-02

Multifocal plenoptic system reaches 1 mm 3D resolution in scintillators

by Xiang Dai, Chi-Jui Ho +10 more

Plenoptic imaging of particle interactions in scintillation detectors

Design with varied focal lengths boosts depth sensitivity when photons are scarce, shown in prototype tests with O(100) photons.

Accurate 3D localization of radiation interactions in scintillation detectors is essential for nuclear and particle physics, safeguards, and medical imaging, but remains difficult in light-starved regimes with limited photon statistics. We present PRISM, a multifocal plenoptic imaging system designed for millimeter-scale 3D position reconstruction in a single-volume scintillator. PRISM uses a multifocal microlens array with diverse focal lengths and high effective numerical aperture to balance photon collection with spatial and depth encoding. A Cramér-Rao lower bound analysis shows that the multifocal design improves axial sensitivity over conventional unifocal plenoptic systems under photon-limited conditions. We build a prototype system, calibrate its optical response with a tunable light source, and form photon-limited measurements with $\mathcal{O}(100)$ detected photons. For sparse single-vertex events, we reconstruct interaction locations using an Alternating Descent Conditional Gradient-inspired algorithm and demonstrate an average 3D localization error of approximately 1 mm. We also provide an initial evaluation of double-vertex events, showing that localization improves as the axial separation between interactions increases. These results demonstrate that multifocal plenoptic imaging can mitigate the traditional trade-off between light collection and spatial resolution, providing a photon-efficient approach to 3D reconstruction in scintillation detectors and a foundation for future multi-scattering event reconstruction.

eess.IV 2026-07-02

Invariant coresets skip symmetric copies to cut active learning labels

by L. C. Ayres, J. C. M. Bermudez +2 more

Group-invariant Coresets for Data-efficient Active Learning

By selecting orbits in quotient space instead of raw samples, the method reduces wasted queries on transformed duplicates when symmetries cr

Active learning reduces labeling cost by querying the most informative unlabeled samples, but standard coreset methods ignore known data symmetries and can waste budget on transformed versions of the same instance. We propose GRINCO, a group-invariant coreset framework that performs acquisition in the quotient space induced by a transformation group, so that selection operates on orbits rather than raw samples. The method uses either canonical representatives or learned orbit-separating invariant embeddings to define practical quotient metrics, and combines quotient-space k-center selection with invariant training through an orbit-averaged loss. We further derive a generalization bound that relates excess orbit-averaged risk to quotient-space coverage, label uncertainty, and intra-orbit variability. Experiments on synthetic scale-invariant data and image benchmarks with rotation-induced redundancy show that GRINCO improves orbit coverage and achieves stronger label efficiency than conventional coreset baselines, especially when group-induced redundancy is substantial.
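As a rough illustration of selecting orbits rather than raw samples, the sketch below runs greedy (farthest-first) k-center on canonical representatives. It assumes the positive-scaling group, with the unit-norm vector as the canonical representative of each orbit, and nonzero inputs. The names `canonical` and `quotient_k_center` are hypothetical and this is not the GRINCO implementation.

```python
import numpy as np

def canonical(X):
    """Assumed canonical representative under positive scaling: the unit-norm vector."""
    return X / np.linalg.norm(X, axis=1, keepdims=True)

def quotient_k_center(X, k, first=0):
    """Greedy k-center where distances are measured between canonical representatives."""
    Z = canonical(np.asarray(X, dtype=float))
    selected = [first]
    # Distance from every sample's orbit to the nearest selected orbit
    d = np.linalg.norm(Z - Z[first], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(d))  # farthest orbit from the current selection
        selected.append(nxt)
        d = np.minimum(d, np.linalg.norm(Z - Z[nxt], axis=1))
    return selected
```

Scaled copies of an already selected point sit at quotient distance zero from it, so the greedy step never spends a query on them while an uncovered orbit remains.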

eess.SP 2026-07-02

Sensing-aware reservation cuts PAPR and sidelobes in AFDM ISAC

by Eya Gourar, Abdul Karim Gizzini +3 more

Low-Complexity Sensing-Aware PAPR Reduction for AFDM-based ISAC Systems

Gradient minimization plus randomized local search improves both power efficiency and weak-target detection.

Integrated sensing and communication (ISAC) has emerged as a key technology for future wireless networks by enabling communication and environmental sensing through a common waveform and hardware platform. Among the candidate waveforms for ISAC, Affine Frequency Division Multiplexing (AFDM) has attracted significant attention due to its robustness in high-mobility environments, but it suffers from a high peak-to-average power ratio (PAPR). In this paper, we propose a sensing-aware chirp-subcarrier reservation (CSR) framework that reduces PAPR while improving ranging performance. The proposed method combines low-complexity gradient-based PAPR minimization with a randomized local search that exploits the phase sensitivity of the AFDM autocorrelation function to suppress delay low-ambiguity-zone (LAZ) sidelobes. Numerical results show that the proposed scheme achieves significant PAPR reduction together with significant sidelobe suppression, resulting in improved weak-target detection performance.

eess.IV 2026-07-02

Image tilt observations reduce UAV prediction error by 60 percent

by Minxing Sun, Yao Mao

Image-Domain Tilt Constrained Distributed Fusion for Maneuvering UAV Tracking with Multi-Camera Electro-Optical Observations

Apparent roll and pitch from rotorcraft images act as acceleration constraints in asynchronous multi-camera fusion

Short-horizon prediction is essential for electro-optical UAV tracking, especially when the target is small, maneuvering, or intermittently observed. Image center, line-of-sight, and range measurements provide direct constraints on target position, but their constraints on acceleration are weak. As a result, prediction can lag during aggressive maneuvers. This paper proposes an image-domain tilt constrained distributed fusion method for maneuvering UAV tracking. The method uses the apparent roll and pitch of a rotorcraft target in the image as low-level maneuver cues. A weak-prior auto-labeling pipeline first generates oriented bounding box and image-domain tilt labels from synchronized video, gimbal IMU, and UAV IMU data. A YOLO-OBB detector is then trained to provide online target position and tilt measurements. The front-end Python implementation is publicly available at github.com/ShineMinxing/PythonYOLO. In the fusion stage, the UAV state is modeled by position, velocity, and acceleration. Image-domain roll and pitch are introduced as acceleration-related pseudo-observations. For distributed tracking, one mobile gimbal camera and two fixed ground cameras are fused asynchronously. Camera attitude error states are augmented into the filter to absorb extrinsic drift and cross-camera systematic inconsistency. A Mahalanobis gate with time-since-last-valid covariance widening is used to reject false detections and handle dropouts. In simulation, adding roll/pitch observations reduces the prediction RMSE from 1.991 m to 0.821 m and decreases the cumulative prediction error by 60.75%. In real distributed experiments, a self-consistency evaluation shows an 18.10% reduction in cumulative prediction error. The results show that image-domain tilt can provide useful acceleration constraints for robust short-horizon UAV prediction.
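The gating step mentioned in this abstract can be sketched as follows. This is a minimal illustration that assumes the innovation covariance is inflated linearly with the time since the last accepted measurement. The `widen_rate` parameter, the linear widening rule, and the function name `gate` are assumptions made for illustration and are not taken from the paper or its repository. The default threshold 9.21 is approximately the 99% chi-square quantile for two degrees of freedom.

```python
import numpy as np

def gate(innovation, S, dt_since_valid, widen_rate=0.5, threshold=9.21):
    """Accept a measurement if its squared Mahalanobis distance passes the gate.

    innovation      : measurement minus predicted measurement
    S               : innovation covariance from the filter
    dt_since_valid  : seconds since the last accepted measurement
    widen_rate      : assumed linear inflation rate of S per second of dropout
    """
    nu = np.asarray(innovation, dtype=float)
    # Widen the covariance so the gate opens up after a long dropout
    S_wide = np.asarray(S, dtype=float) * (1.0 + widen_rate * dt_since_valid)
    d2 = float(nu @ np.linalg.solve(S_wide, nu))
    return d2 <= threshold, d2
```

With an identity covariance, an innovation of `[4.0, 0.0]` is rejected immediately after a valid update but accepted two seconds into a dropout, which is the behavior a widening gate is meant to provide when re-acquiring a target.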

eess.SP 2026-07-02

Joint optimization raises full-duplex fluid antenna rates

by Jingxuan Zhou, Yinchao Yang +3 more

Alternating Optimization for Joint Resource Allocation in Full-Duplex Multi-Sector Fluid Antenna-Enabled Near-Field Systems

Multi-sector system with antenna mobility and grouping outperforms half-duplex and fixed-position baselines in sum rate and efficiency.

This paper proposes a full-duplex fluid antenna near-field system (FD-FANS) with a multi-sector antenna array that jointly exploits resource allocation, antenna mobility, and group-based transmitting (TX) and receiving (RX) partitioning. A spherical-wave uplink-downlink channel is established that accounts for residual self-interference (SI), wireless energy transfer (WET), and geometric constraints on antenna motion. Within the FD-FANS framework, an efficient protocol is devised to enable simultaneous downlink energy transmission (DET) and uplink data transmission (UDT) at the base station (BS). Furthermore, we formulate, for both perfect and imperfect SI cancellation (SIC), a weighted sum rate (WSR) maximization problem over time-power allocation, antenna positions, and binary group selection, under practical average and peak power limits, per-antenna box constraints, minimum spacing, and a half-TX/half-RX balance. To tackle the resulting non-convex mixed-integer design, we develop an efficient alternating optimization (AO) framework based on majorization-minimization successive convex approximation (MM-SCA). The proposed algorithm monotonically improves the objective and converges to a stationary solution of a continuous relaxation. Simulation results demonstrate that the proposed scheme achieves consistent performance gains over several benchmark designs, including half-duplex FANS (HD-FANS), FD fixed-position antenna near-field system (FD-FPANS), non-grouped FD-FANS, and far-field counterparts, in terms of average sum rate (ASR), energy efficiency (EE), and user fairness, while exhibiting robustness to residual SI and channel uncertainty.

eess.SP 2026-07-02

LEO satellites achieve under 10 m positioning via standard signals

by Rainer Bachl, Muhammad Nabeel +1 more

Opportunistic Positioning with LEO Satellites based on SSB from NR NTN

NR NTN SSB provides pseudoranges whose ambiguities resolve geometrically for mean error below 10 meters in Starlink simulations.

Forthcoming Low Earth Orbit (LEO) satellite networks such as Starlink's Mobile Satellite Service (MSS) will incorporate the New Radio (NR) Non-Terrestrial Network (NTN) standard. The Synchronization Signal Block (SSB) specified as part of NR is periodically broadcast for cell search and initial access. We propose to exploit the SSB for opportunistic receiver positioning. Doppler shift measurements are modeled and pseudoranges are derived from the SSB while also taking into account the receiver's clock bias and drift. The resulting per-satellite integer ambiguity in the pseudorange is resolved by geometry alone, without inter-satellite differencing or an a priori position. Measurements are taken from SSBs of multiple satellites and at multiple occasions per satellite, whereby the SSBs are subject to different transmission timings and varying propagation delays. Finally, a simulation model is developed for positioning based on the actual Starlink constellation and the NR NTN standard to evaluate the positioning accuracy to be expected. The proposed approach achieves a mean positioning error of less than 10 m without requiring any modification of the NR NTN standard.

eess.SP 2026-07-02

MiLAC compression cuts estimation complexity 1540 times

by Qiaosen Zhang, Matteo Nerini +1 more

Channel Estimation and Beamforming for Microwave Linear Analog Computers (MiLACs)-Aided Multiuser MISO Systems

Rank-deficient correlations let limited-RF-chain multiuser beamforming match digital performance

Microwave linear analog computers (MiLACs) have recently gained attention for future gigantic multiple-input multiple-output (MIMO) systems by enabling beamforming with greatly reduced hardware and computational cost. However, channel estimation for MiLAC-aided multiuser systems remains an open problem. Conventional channel estimation requires many radio-frequency (RF) chains to access full-dimensional received signals, followed by massive digital processing, which undermines the advantages of MiLAC-aided systems in reducing the number of RF chains and computational complexity. In this paper, we propose computationally efficient channel estimation and beamforming schemes for MiLAC-aided multiuser multiple-input single-output (MU-MISO) systems with a limited number of RF chains. We consider the general case where different user groups experience different channel correlation matrices. By exploiting the rank deficiency of these matrices, the proposed schemes use MiLAC to compress the full-dimensional received signals in the analog domain, making them compatible with the available RF chains while preserving the essential channel information. Then, in the digital domain, only low-dimensional channel estimation is performed based on these compressed observations, substantially reducing computational cost. We further show how regularized zero-forcing beamforming (R-ZFBF) can be efficiently realized from the low-dimensional channel estimates through a cascade of two MiLACs, which offers greater computational flexibility than a single MiLAC. Numerical results show that the proposed schemes reduce computational complexity by up to $1540\times$ for channel estimation and $16108\times$ for beamforming, while achieving performance comparable to digital baselines.

eess.AS 2026-07-02

Fused prototypes let audio models learn new classes from few shots and reject unknowns

by Yanxiong Li, Jiaxin Tan +4 more

Few-Shot Open-Set Audio Classification Using Attention Information-Fused Prototypes

Attention-weighted support-query fusion plus one open-set prototype enables updates with limited samples while avoiding misclassification of

Most existing audio classification methods assume that each query (test) sample belongs to one of the classes of the support (training) samples, so they misclassify samples of unseen classes as seen classes instead of rejecting them. In this study, we propose a method for Few-shot Open-set Audio Classification (FOAC), which can recognize query samples of seen classes after updating the model using a few support samples, while rejecting query samples from unseen classes. We design a model consisting of an encoder and a classifier. The encoder is a ResNet backbone that extracts embeddings. The classifier consists of prototype generators for few-shot classes and open-set classes. Prototypes of few-shot classes are obtained by fusing the class-discriminative information of support and query embeddings and by assigning larger weighting coefficients to the representative parts of the support embeddings. One prototype is generated for open-set classes using the proposed prototype generator. The encoder is trained with abundant samples of base classes in a supervised manner, and then the prototypes of base classes are generated under the supervision of a joint loss. The classifier is trained using a few samples of few-shot classes in a meta-training manner. Three public datasets (LS-100, NSynth-100, and FSC-89) are used to assess the performance of our method. Experiments show that our method outperforms prior methods in AUROC and accuracy, and the improvement is statistically significant against most of them. Our method also has lower computational complexity than most prior methods. The code is available at https://github.com/Jessytan/FOAC-AIFP.

eess.SP 2026-07-02

Lightweight vision model tracks mmWave beams across environments at 84% accuracy

by Mengyuan Ma, Ahmed Alkhateeb +3 more

Lightweight Vision-Aided Beam Tracking for Cross-Environment mmWave Communications

Cuts parameters by 52x and complexity by 79x versus ResNet while generalizing on real data from two distinct scenarios

Sensing-aided beam tracking is a promising approach to reduce the overhead of millimeter-wave beam management. However, real-world application remains challenging due to rapid channel variations and substantial environmental differences across deployment scenarios. Developing low-complexity sensing-assisted approaches that generalize to diverse environments can alleviate the problem. With this motivation, this paper proposes a lightweight vision-aided model for cross-environment beam tracking. The task is formulated as a sequence-to-sequence classification problem, where the model jointly predicts the current and future optimal beams from past visual observations. We develop a low-complexity model based on depthwise separable convolutions and introduce hierarchical data augmentation and beam power-based label smoothing to improve robustness and generalization. Experimental results on real-world images from two geometrically distinct DeepSense 6G scenarios show that the proposed strategies consistently improve cross-environment beam prediction accuracy, raising it to as much as 84% across the current and three future time slots and outperforming the state-of-the-art solution. Notably, this performance is achieved while reducing the number of model parameters and computational complexity by factors of approximately 52 and 79, respectively, compared with the high-capacity ResNet baseline.
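
To show why depthwise separable convolutions lower model size, the sketch below counts weights for a single layer, ignoring biases. The kernel size and channel counts are hypothetical and are not taken from the paper. The per-layer saving shown here is a separate quantity from the whole-model factors of 52 and 79 quoted in the abstract.

```python
# Weight counts for one convolutional layer (biases ignored).
# The layer sizes below are illustrative assumptions, not the paper's.

def standard_conv_params(k, c_in, c_out):
    """A k x k standard convolution mixes space and channels at once."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """One k x k filter per input channel, then a 1 x 1 pointwise mix."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise

k, c_in, c_out = 3, 64, 128
std = standard_conv_params(k, c_in, c_out)
sep = depthwise_separable_params(k, c_in, c_out)

print(f"standard: {std}, separable: {sep}, ratio: {std / sep:.1f}x")
```

For a 3 x 3 kernel the ratio approaches 9 as the number of output channels grows, so larger whole-model reductions would have to come from other architectural choices as well.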

eess.SY 2026-07-02

Real-time EV policy cuts transformer aging while meeting urgent deadlines

by B Hari Kiran Reddy

Deadline-Aware Electric Vehicles Charging with Distribution Transformer Overload Mitigation

Convex aging model and marginal-cost urgency index allocate scarce capacity without hard feasibility assumptions.

High adoption of electric vehicles (EVs) can overload distribution transformers when charging requests with heterogeneous departure deadlines compete for limited capacity. Most existing coordination schemes enforce hard deadlines and strict transformer limits, implicitly assuming feasibility and failing under severe congestion. We propose a deadline-aware EV charging framework that explicitly trades off transformer thermal aging and charging service quality under capacity-constrained operation. We model transformer stress using a convex aging proxy and soften charging deadlines via penalty-weighted unmet energy at departure. We further develop a low-complexity online charging policy that prioritizes EVs based on a marginal-cost-aware urgency index. We demonstrate through case studies under increasing EV penetration that the proposed approach reduces transformer aging while preferentially allocating limited capacity to time-critical EVs, closely approximating offline benchmark performance using only real-time information.
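
The idea of ranking EVs by urgency under a shared capacity limit can be sketched as follows. This is a simplified stand-in: it uses a plain laxity-style score (energy still needed divided by the most energy deliverable before departure) in place of the paper's marginal-cost-aware urgency index, and it does not model transformer aging. The field names, the 1-hour slot length, and the numbers are all assumptions.

```python
# Hypothetical sketch: laxity-style urgency with greedy allocation.
# Not the paper's marginal-cost-aware index; transformer aging is not modeled.

def urgency(ev, now):
    """Share of the remaining slots the EV must charge at full rate to finish."""
    slots_left = ev["deadline"] - now
    if slots_left <= 0 or ev["need_kwh"] <= 0:
        return 0.0
    return ev["need_kwh"] / (ev["max_kw"] * slots_left)

def allocate(evs, now, capacity_kw):
    """Give power to the most urgent EVs first until capacity runs out.

    Assumes 1-hour slots, so energy in kWh equals average power in kW.
    """
    plan = {}
    remaining = capacity_kw
    for ev in sorted(evs, key=lambda e: urgency(e, now), reverse=True):
        if ev["deadline"] <= now:
            plan[ev["id"]] = 0.0  # already departed
            continue
        power = min(ev["max_kw"], ev["need_kwh"], remaining)
        plan[ev["id"]] = power
        remaining -= power
    return plan

evs = [
    {"id": "A", "need_kwh": 6.0, "max_kw": 7.0, "deadline": 1},
    {"id": "B", "need_kwh": 10.0, "max_kw": 7.0, "deadline": 8},
]
plan = allocate(evs, now=0, capacity_kw=10.0)
print(plan)
```

With these numbers, EV A departs after one slot and is served in full, while EV B takes whatever capacity is left and can catch up in later slots. The limit is never exceeded, and any shortfall becomes unmet energy rather than an infeasible schedule.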

eess.AS 2026-07-02

CNNs turn 4-mic covariance into 32-mic acoustic images

by Marianthi Adamopoulou, Parthasaarathy Sudarsanam +6 more

CNN Models for Microphone Array Covariance Matrix Upsampling and Acoustic Imaging

Models trained on real recordings achieve lower error than random guessing and produce sound maps nearly identical to those from a full 32-c

Acoustic imaging visualization is a core methodology in acoustics, enabling spatial analysis of sound sources and acoustic scenes. However, limited sensor availability in practical systems motivates approaches that enhance spatial resolution without increasing hardware complexity. In this paper, we focus on virtually upsampling a tetrahedral 4-microphone array to a spherical 32-microphone array by estimating the channel covariance matrices with deep learning techniques. Five neural network architectures are investigated for covariance upsampling for acoustic imaging using the real-world STARSS23 dataset. These models are developed to estimate a 32-microphone, time-frequency covariance matrix from a 4-microphone input covariance representation. The proposed architectures are based on 2D convolutional layers to capture the underlying spatial-spectral structure of covariance matrices, and are further enhanced with frequency dynamic convolution to model their frequency-dependent properties. The proposed architectures are evaluated in terms of root mean square error (RMSE) and using delay-and-sum beamforming acoustic imaging. Quantitative results show that all models outperform a random-guess baseline, which yields an RMSE of 0.548, with the best-performing architecture achieving an RMSE of 0.432. We qualitatively analyze the performance of the proposed models through beamforming heatmap visualizations derived from the 4-channel input covariance, the 32-channel ground truth, and the predicted 32-channel covariance matrices. These results demonstrate that covariance upsampling significantly enhances the effective performance of the 4-channel microphone array, producing sound maps that closely resemble those obtained with the 32-channel array.
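
Delay-and-sum imaging turns a covariance matrix $R$ into a power value for each look direction via $P = a^H R a / M^2$, where $a$ is the steering vector and $M$ is the number of microphones. The sketch below evaluates this for a hypothetical 4-element uniform linear array with half-wavelength spacing and a single noise-free source. This geometry is chosen only for simplicity; the paper uses tetrahedral and spherical arrays.

```python
import cmath
import math

# Illustrative delay-and-sum power from a covariance matrix.
# The uniform linear array and the single-source covariance are assumptions.

def steering_vector(num_mics, spacing_wavelengths, angle_rad):
    """Far-field steering vector of a uniform linear array."""
    return [
        cmath.exp(-2j * math.pi * spacing_wavelengths * m * math.sin(angle_rad))
        for m in range(num_mics)
    ]

def rank_one_cov(a):
    """Covariance a a^H of one unit-power source with steering vector a."""
    return [[ai * aj.conjugate() for aj in a] for ai in a]

def das_power(cov, a):
    """Delay-and-sum output power a^H R a / M^2."""
    m = len(a)
    total = 0j
    for i in range(m):
        for j in range(m):
            total += a[i].conjugate() * cov[i][j] * a[j]
    return total.real / (m * m)

source = steering_vector(4, 0.5, 0.0)   # source at broadside
cov = rank_one_cov(source)

p_on = das_power(cov, source)
p_off = das_power(cov, steering_vector(4, 0.5, math.radians(30)))
print(f"power toward source: {p_on:.3f}, toward 30 degrees: {p_off:.3f}")
```

Sweeping the look direction over a grid and plotting the resulting powers gives the kind of heatmap the abstract describes. An array with more microphones produces a narrower main lobe.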

cs.IT 2026-07-02

Outage probabilities derived for ISAC over Rician channels

by Marziyeh Soltani, Mahtab Mirmohseni +2 more

Fundamental Limits of Random Downlink Integrated Sensing and Communication over Rician Channels

SJB and LB schemes yield expressions and scaling laws showing K-factor impacts communication more than sensing.

This paper studies the stochastic performance of a downlink multiple-input multiple-output integrated sensing and communication (ISAC) system over Rician fading channels. Rician fading is important in line-of-sight (LoS)-dominated deployments, where a deterministic propagation component can strongly affect sensing and communication reliability. The base station (BS) simultaneously serves a user and senses a target. The BS-user channel contains LoS and non-line-of-sight components. The user LoS angle may be fixed or random, and the target angle may follow an arbitrary distribution potentially correlated with the user angle. Compared with Rayleigh fading, the deterministic LoS component introduces angle-dependent terms and leads to generally independent but non-identically distributed random vectors, requiring new analysis. We analyze two beamforming strategies: subspace joint beamforming (SJB), optimal for the shared waveform structure, and linear beamforming (LB), a practical alternative using separate sensing and communication beamformers. For both schemes, we derive communication outage probability (OP) and sensing OP based on the Cramér-Rao bound (CRB). We also identify special cases with simpler expressions. For LB, we derive upper and lower bounds on sensing OP and a tractable approximation. We characterize large-system and high-power scaling laws. LB without dirty paper coding (DPC) is interference-limited at high power due to radar self-interference. Results show the Rician K-factor affects communication more strongly than sensing, with non-monotonic behavior across regimes. LB with DPC achieves the best overall performance in strong LoS environments and is the only scheme achieving ultra-high communication reliability in Rayleigh fading, while SJB provides a robust lower-complexity alternative across operating conditions.

eess.AS 2026-07-02

Noise predictor defends speaker verification from attacks

by Yibo Bai, Sizhou Chen +6 more

Positive-Incentive Noise Predictor for Adversarial Purification in Speaker Verification

Input-adaptive positive-incentive noise replaces slow diffusion denoising, cuts real-time factor to 0.014, and preserves clean performance.

Modern automatic speaker verification (ASV) systems are vulnerable to adversarial perturbations. Diffusion-based purification has recently shown strong effectiveness against such perturbations, but its reverse denoising process requires iterative sampling and leads to high inference latency. We find that the forward noising process provides most of the robustness gain. Motivated by this observation, we reformulate adversarial purification as a learnable noising problem, and propose the Positive-Incentive Noise Predictor (PnP), the first framework that explicitly introduces positive-incentive noise ($\pi$-noise) into the purification task. PnP learns input-adaptive $\pi$-noise and mixes it with the input to improve the robustness of downstream ASV systems. Experiments on four advanced ASV backbones show that PnP effectively defends against adversarial attacks while preserving performance on natural speech. Compared with representative purification baselines, the proposed framework provides a competitive balance among defense effectiveness, impact on genuine utterances, and inference efficiency under white-box, black-box, and defender-aware adaptive attacks, with a real-time factor as low as 0.014. Moreover, PnP can be cascaded with a diffusion denoiser to further improve the perceptual quality of purified utterances. Code and purified audio examples are available at https://eurecom-asp.github.io/pnp/
