cs.CR — Pith

Top Pith

5

cs.CR 2026-06-26

Reverse engineering finds six flaws in AirDrop and Quick Share

by Arash Ale Ebrahim, Nils Ole Tippenhauer

Protocol Prying: Systematic Vulnerability Research in the Apple AirDrop and Android Quick Share Proximity Transfer Protocols

Protocols on over five billion devices accept complex untrusted data without pairing, exposing reachable denial-of-service and bypass paths.

Apple AirDrop and Google/Samsung Quick Share are proximity file-transfer protocols used by over five billion devices, yet their application-layer security properties remain largely unstudied because both stacks are proprietary and undocumented. Both protocols are reachable from wireless proximity without any prior pairing and process complex serialized content (binary plists, CPIO archives, Protocol Buffers, UKEY2 handshakes) inside privileged daemons, making them attractive zero-click targets across multiple operating systems. We perform the first cross-platform reverse engineering and protocol-aware fuzzing study of both stacks. We reconstruct AirDrop's seven-layer state machine and DVZip adaptive compression from binary analysis, build AIRFUZZ, a protocol-aware fuzzer that mutates pre-compression representations, and complement it with targeted hand-written analyses of Samsung's Quick Share service and Google's Quick Share for Windows. We discover six vulnerabilities (V1-V6): three pre-authentication issues in macOS/iOS AirDrop (V1: Swift fatalError DoS in the HTTP path router; V2: unbounded XML plist recursion in Foundation; V3: NULL dereference in Network.framework's HTTP/1.1 parser), two protocol-layer flaws in Samsung Quick Share (V4: pre-authentication OfflineFrame dispatch; V5: D2D encryption bypass for three frame types), and a heap use-after-free in Google Quick Share for Windows (V6) for which Google awarded a bounty. We responsibly disclosed all findings, and Apple, Samsung, and Google have acknowledged the reports.

Top Pith

4

cs.DS 2026-05-21 2 theorems

Generalized Thresholding Mechanism tests DP mechanisms near-optimally

by Anamay Chaturvedi, Monika Henzinger +1 more

Near-Optimal Generalized Private Testing

It accepts the first sufficiently successful mechanism from a sequence, rejects the rest, and uses a bounded number of evaluations while maintaining pure ε

In differential privacy (DP), the generalized private testing problem was introduced by Liu and Talwar (STOC 2019). Given a dataset $X \in \mathcal{X}$ and a sequence of black-box $\varepsilon_t$-DP mechanisms $M_t:\mathcal{X}\to\{+1,-1\}$, the analyst must accept the first mechanism whose success probability $p_t=\Pr[M_t(X)=+1]$ exceeds a given threshold $p^*\in(0,1)$, while achieving DP. Accuracy is measured by the gap between $p^*$ and a rejection threshold $\bar{p}$, such that with probability $1-\beta$ for all $t\geq1$, if $p_t\leq\bar{p}$, then $M_t$ is rejected, and if $p_t\geq p^*$, then it is accepted. This generalizes the standard private testing problem, whose solution, the Sparse Vector Technique, is ubiquitous in DP. We introduce the Generalized Thresholding Mechanism (GTM) for generalized private testing. For $\varepsilon>0$ and any sequence of $(\varepsilon_t,\delta_t)$-DP mechanisms $M_t$, the GTM is pure $\varepsilon$-DP. For $\theta>0$, $\gamma\in(1,2]$, and $\beta\in(0,1)$, $\bar{p}_t=\max(p^*/\gamma\Lambda_t, 1 - \gamma\Lambda_t(1-p^*))-\delta_t/\varepsilon_t$ for $\Lambda_t=(5t\ln^3(t+2))^{(2+\theta)\varepsilon_t/\varepsilon}(4/\beta)^{(3+\theta+2/\theta)\varepsilon_t/\varepsilon}$. With probability $1-\beta$, the number of evaluations of $M_t$ is at most $O((\ln(t/\beta)/(\gamma-1)^2)\max(\Lambda_t/p^*,(1-p^*)^{-1}))$ for all $t\geq 1$. Our lower bounds prove near-optimality of our accuracy and sample complexity guarantees. Via the GTM, we give a black-box reduction for DP optimization from the continual observation (CO) setting to the batch setting. This gives us the first DP-CO algorithms for many maximization problems. Further, the GTM permits an adaptive choice of acceptance thresholds $(p^*_t)_{t\geq1}$, addressing a challenge mentioned in prior work on using generalized private testing for hyperparameter optimization (Papernot and Steinke (ICLR 2022)).
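
As a worked illustration of the accuracy guarantee stated in the abstract above, the following Python sketch evaluates $\Lambda_t$ and the rejection threshold $\bar{p}_t$. It assumes that $p^*/\gamma\Lambda_t$ is read as $p^*/(\gamma\Lambda_t)$ and that $\ln^3$ means the cube of the natural logarithm. The function names and the sample parameter values are illustrative and do not come from the paper.

```python
import math


def lambda_t(t, eps_t, eps, theta, beta):
    """Lambda_t = (5 t ln^3(t+2))^((2+theta) eps_t/eps)
                  * (4/beta)^((3+theta+2/theta) eps_t/eps)."""
    ratio = eps_t / eps
    growth = (5 * t * math.log(t + 2) ** 3) ** ((2 + theta) * ratio)
    confidence = (4 / beta) ** ((3 + theta + 2 / theta) * ratio)
    return growth * confidence


def rejection_threshold(t, p_star, gamma, eps_t, delta_t, eps, theta, beta):
    """p_bar_t = max(p*/(gamma Lambda_t), 1 - gamma Lambda_t (1 - p*))
                 - delta_t / eps_t.
    Assumption: the first term is parsed as p* divided by (gamma * Lambda_t)."""
    lam = lambda_t(t, eps_t, eps, theta, beta)
    gap_term = max(p_star / (gamma * lam), 1 - gamma * lam * (1 - p_star))
    return gap_term - delta_t / eps_t


if __name__ == "__main__":
    # Illustrative parameters only (not taken from the paper).
    for t in (1, 10, 100):
        p_bar = rejection_threshold(
            t, p_star=0.5, gamma=2.0, eps_t=0.01, delta_t=0.0,
            eps=1.0, theta=1.0, beta=0.1,
        )
        print(t, p_bar)
```

Under these assumptions the threshold sits below $p^*$ and shrinks as $t$ grows, because $\Lambda_t$ increases with $t$ whenever $\varepsilon_t/\varepsilon>0$.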

Top Pith

2

cs.CR 2026-05-22 1 theorem

All 119 tested MCP OAuth servers show authentication flaws

by Huijun Zhou, Xiaohan Zhang +4 more

A First Measurement Study on Authentication Security in Real-World Remote MCP Servers

First study of real-world remote servers finds 325 total issues, dynamic client registration flaws in 96.6 percent.

The Model Context Protocol (MCP) is emerging as a common interface connecting large language models (LLMs) with external services. Remote deployments are becoming increasingly important as agents connect to user-linked online services, such as social, productivity, and financial services. In such deployments, the authentication boundary between MCP clients and remote servers becomes security-critical, yet remains underexplored. We present the first measurement study of authentication security in real-world remote MCP servers. We identify 7,973 live remote MCP servers, finding that 40.55% expose tools without authentication. Among authenticated servers, OAuth is the dominant authorization mechanism for reaching remote services, and OAuth deployments in the MCP ecosystem commonly exhibit three characteristics: open client environments, dynamic client registration, and delegated authorization. These characteristics distinguish MCP deployments from traditional OAuth and introduce new attack surfaces. Guided by this observation, we derive a taxonomy of authentication flaws comprising three MCP-specific categories and conventional OAuth misconfigurations, for a total of four categories and nine concrete flaw types. To evaluate these flaws at scale, we implement a semi-automated detection framework that combines passive traffic inspection with active dynamic probing. Applying it to 119 testable real-world OAuth-enabled MCP servers, we find that each server exhibits at least one flaw, with a total of 325 flaws identified, among which dynamic client registration flaws affect 96.6% of tested servers. Many of these flaws can lead to sensitive information leakage and account takeover. Through responsible disclosure, we obtained 9 CVE IDs. Our findings expose pervasive authentication weaknesses in the MCP ecosystem and underscore the urgent need for hardened OAuth-based remote deployments.

cs.CR 2026-07-03

Taxonomy organizes factors in cybersecurity incident response

by Thomas Biege, Marius Brockhoff +4 more

SoK: A Taxonomy for Cybersecurity Incident Response Influence Factors

Review of 457 publications yields a more complete classification than seven prior frameworks and NIST elements.

Cybersecurity incident response has emerged as a critical area of interest for both researchers and practitioners. The corpus of literature on cybersecurity incident response is expanding, yet a unified framework for systematically organizing the accumulated knowledge remains absent. The aspects of incident response span multiple domains, including technology, human-computer interaction, organizational theory, and human factors. A comprehensive, integrative perspective on these factors can enable researchers to identify underexplored areas and more effectively target their empirical and theoretical investigations. Our study systematizes the factors that influence organizational preparedness for and response to cybersecurity incidents. Through a systematic review of academic literature (n = 417) and non-scientific publications (n = 40), we derived the "Cybersecurity Incident Response Influencing Factor Taxonomy" (CIR-IF Taxonomy). Existing empirical findings were classified within this taxonomy, providing a comprehensive and up-to-date overview of knowledge from the period 1999 to mid-2024. The taxonomy categories were systematically compared with seven established scientific frameworks and with the NIST Cyber Security Framework elements referenced in the NIST Special Publication 800-61r3 incident response profile. The results of this comparison show that the CIR-IF Taxonomy delivers a richer, more rigorous, and more systematically organized view of the factors that drive and shape incident response.

cs.SE 2026-07-03

Traffic model spots REST API attacks at 82% recall without docs

by Ran Dubin, Amit Dvir

HTTP REST API Structure Learning

HRAL builds endpoint baselines from network data alone, outperforming alternatives when documentation is incomplete and hitting 100% with si

Application Programming Interfaces (APIs) are essential in software development, enabling web services, mobile apps, and microservices. However, their widespread use introduces significant security risks, highlighting the importance of API security. This paper presents HTTP REST API Learning (HRAL), a novel unsupervised anomaly detection approach that models the structure and behavior of API endpoints directly from network traffic, without relying on predefined rules or documentation. HRAL enables robust detection of malicious activity by understanding how APIs behave and flagging deviations as potential threats. We evaluate HRAL across varying levels of OpenAPI documentation detail and compare it with existing techniques. HRAL achieves strong performance, with an average recall of 82.07% and an F1-score of 87.24%, significantly outperforming alternatives when API documentation is limited. Moreover, our results approach the effectiveness of full API document definitions. When combined with signature-based rules such as the OWASP ModSecurity CRS, our system achieves 100% detection. These results highlight HRAL's effectiveness in real-world, partially documented API environments and its potential as a foundational layer for modern API security solutions.

cs.AI 2026-07-03

Constraints lift coding-agent backdoor recall from 54.5% to 90.9%

by Thomas Winninger

Steerability via constraints: a substrate for scalable oversight of coding agents

Access controls and enforced conventions let a small reviewer model catch most inserted backdoors while cutting token cost.

Coding agents are capable; human oversight is the bottleneck. Unconstrained agents introduce security risks, erode codebase scalability, and make human review increasingly costly. We argue that the same methods used for decades to manage large human engineering teams (access control, network policies, and strict coding conventions enforced by tooling) transfer directly to coding agents and are cheaper (in tokens) than recent agentic scaffolding. We sketch an end-to-end system on this principle, and report a controlled experiment in scalable oversight: a small reviewer (Gemma 4 e4b) inspects a Python codebase containing 11 inserted backdoors. Recall rises from 54.5% (unconstrained, no tools) to 90.9% (constrained substrate plus a ~200-LoC `docs` CLI), with substrate and tools contributing independently. We choose Python deliberately: substrate-level oversight gains are largest where the language gives the fewest guarantees by default; the principles extend to languages like Rust.

cs.CR 2026-07-03

Static scanners miss cloaked malicious agent skills

by Zimo Ji, Congying Xu +5 more

Cloak and Detonate: Scanner Evasion and Dynamic Detection of Agent Skill Malware

Runtime sandbox tracking catches attacks that hide from code inspection

LLM coding agents increasingly rely on third-party agent skills from public marketplaces, which execute with the agent's privileges and create a software supply-chain attack surface: a malicious skill can steal credentials, exfiltrate source code, or install backdoors. Existing defenses use static skill scanners based on pattern matching or LLM-as-judge analysis, but it remains unclear whether they withstand adaptive evasions that preserve malicious behavior while changing payload appearance. This paper first presents an adversarial study of existing skill scanners through SkillCloak, a payload-preserving evasion framework that keeps the attack semantics intact while transforming their visible form. SkillCloak uses two complementary strategies: Structural Obfuscation, which rewrites visible payload indicators into semantically equivalent forms, and Self-Extracting Skill (SFS) Packing, which hides malicious components from the install-time view and restores them during agent execution. Across eight scanners and 1,613 in-the-wild malicious skills, SFS Packing bypasses every scanner at over 90%, while Structural Obfuscation bypasses over 80% on most static scanners and reaches 96% on a hybrid scanner, showing that appearance-based auditing is insufficient. Motivated by this finding, we propose SkillDetonate, a behavior-centric runtime auditor that executes skills in a sandbox and detects malicious effects through OS-boundary information-flow evidence rather than install-time appearance. SkillDetonate combines on-demand closure lift, which observes instructions materialized during execution, with marker-based taint analysis, which tracks sensitive-data flows across the agent context, files, processes, and network operations. The results show that SkillDetonate detects 97% of attacks at a 2% false-positive rate and sustains 87% detection on real-world malicious skills.

cs.DC 2026-07-03

Social graph recovers lost keys via custodian majority

by Ohad Eitan, Idit Keidar +1 more

Securing People and their Machines Against Major Faults

Friends replace a person's public key after identity custodians approve, restoring the network without central servers.

We consider grassroots platforms -- distributed systems of agents consisting of people identified by self-chosen public keys and their machines (smartphones) -- and wish to make them secure against major faults: the loss of their private keys and/or their smartphones. As grassroots platforms have no global resource to rely on for recovery, our peer-based solution is based on: (a) a grassroots social graph in which agents establish and maintain friendships; (b) identity custodians, designated by each person; and (c) state custodians, which are grassroots platform-specific. Upon a person experiencing identity loss, and given a willing supermajority of the identity custodians of the person, the friends of the person replace the old public key with the new one across the graph and restore friendships, where all friends serve as state custodians for the social graph. Choosing a new keypair, obtaining a new smartphone, and convincing identity custodians to will a change of key all happen "off-chain". Recovery from machine loss without loss of key (e.g. smartphone run over by truck, or its memory wiped) is simpler, requiring only the help of state custodians. We specify the social graph and its secure version as guarded multiagent atomic transactions, and implement the secure social graph via communicating volitional agents, an eventually synchronous message-passing model one step closer to implementation. We prove the implementation maps runs with recoverable faults to correct runs of the specification. We follow a similar path for grassroots coins and bonds, showing a common core as well as the platform-specific aspects of state recovery: a currency's single-writer log is recovered exactly, the recovered sovereign resuming without double-spending.

cs.CY 2026-07-03

AI regulations drive need for structured risk assessment

by Javier Irigoyen, Roberto Daza +6 more

Overview of Risk Assessment and Management for Intelligent Systems under the AI Act and Beyond

Review of global rules and methodologies identifies best practices plus gaps in managing technical and ethical risks.

Society and emerging risk-based regulatory frameworks for AI underscore the need for rigorous risk assessment to ensure safe and reliable AI systems. In response, this paper presents an overview of AI risk assessment (identification and analysis) and management methodologies. It begins by reviewing the worldwide regulatory landscape that drives the need for systematic AI risk assessment. It then characterizes the spectrum of AI-related risks identified in the literature, from technical failures to ethical and social impacts. Subsequently, it reviews key risk assessment methodologies proposed for AI systems, focusing on general frameworks. The paper highlights best practices and identifies methodological gaps, pointing to areas for further research on AI risk assessment.

cs.LG 2026-07-03

Coded computing reduces privacy leakage in distributed ML

by Xavier Martínez-Luaña, Alba Gude-Santos +2 more

Privacy-Preserving and Verifiable Approximate Distributed Coded Computing

GPBACC plus aggregation and group testing limits leaks and isolates adversaries across federated and decentralized settings.

Distributed machine learning enables collaborative model training without centralizing data, but it also exposes learning processes to privacy leakage and malicious manipulation. Existing defenses typically address these threats in isolation and are often tailored to specific learning paradigms or model architectures, limiting their applicability in realistic deployments. In particular, federated learning and decentralized learning exhibit distinct adversarial surfaces that are rarely addressed within a unified framework. In this paper, we present a model-agnostic framework for adversary-resistant distributed learning that jointly addresses privacy preservation and malicious behavior across both federated and decentralized settings. Our approach combines paradigm-specific defense mechanisms with GPBACC, a privacy-enhancing coded computing technique applicable to arbitrary machine learning models. For federated learning, we integrate robust aggregation strategies to mitigate the impact of malicious participants, while for decentralized learning we employ approximate decode-and-compare and group testing techniques to enable lightweight verification and adversary isolation without relying on a trusted aggregator. Crucially, we evaluate the proposed framework through an explicit, attack-driven analysis. We implement representative privacy attacks and malicious behaviors, and empirically demonstrate that the combination of GPBACC with robust aggregation and verification mechanisms significantly reduces privacy leakage and improves resilience against active adversaries. These results suggest that privacy-enhancing coded computing, when combined with appropriate adversary-resistance strategies, provides a practical and deployable foundation for secure distributed machine learning.

cs.CR 2026-07-03

Black-box method detects guardrails via behavioral signals

by William Hackett, Peter Garraghan

Behind the Refusal: Determining Guardrail Activation via Behavioral Monitoring

HTTP, lexical and timing differences separate guardrail blocks from LLM rejections at 98 percent F1 on unseen prompts.

As Large Language Models (LLMs) and agentic systems become integrated into real-world applications, ensuring their safety and security is critical. Guardrail systems that detect and block malicious instructions sent to and from an LLM are an essential component of AI security. However, researchers conducting black-box adversarial emulation against production AI systems often struggle to determine whether a guardrail block or an LLM rejection has occurred. This distinction is important because the techniques used to bypass guardrails can differ substantially from those used to bypass LLM safety alignment, and has a material impact on attack technique selection and optimization. We propose the first black-box guardrail reconnaissance methodology, which detects the presence of a guardrail within a target AI system through behavioral monitoring of HTTP, lexical, and timing signals, assuming only black-box access and zero prior knowledge of the guardrail or AI system. Experiments demonstrate that our approach detects guardrail presence with 100% accuracy, with statistically significant behavioral separation between benign and malicious interactions (q < 0.001). Our approach further identifies the content categories a guardrail is designed to block, and distinguishes guardrail blocks from LLM rejection on unseen prompts with an average F1 score of 98%.

cs.CL 2026-07-03

0.8B open model tops safety benchmarks at 90.9 F1

by Navaneeth Sangameswaran, Preetham S +1 more

HaloGuard 1.0: An Open Weights Constitutional Classifier for Multilingual AI Safety

HaloGuard 1.0 beats 27B baselines on seven prompt-safety tests while holding low error rates across 46 languages.

We present HaloGuard 1.0, an open-weights implementation of the constitutional-classifier paradigm for input safety. It achieves state-of-the-art performance on English and multilingual prompt-safety benchmarks at roughly one-tenth the model size of current leading open guard models. The safety constitution is the organising structure of the corpus: a natural-language constitution of 46 policies and 2,940 subcategories drives synthetic data generation, with exhaustive one-to-one paired counterfactuals that hold topic and vocabulary fixed while flipping intent, a two-tier harmless design that separately targets boundary and baseline false positives (FPs), and balanced multilingual materialisation across 46 languages that treats language as a surface form appearing on both sides of the boundary rather than as an adversarial signal. Across seven prompt-safety benchmarks, HaloGuard 1.0-0.8B attains the best average F1 (90.9) of any open guard we evaluate, outperforming baselines up to 27B parameters (over 30 times larger) while holding false-positive rate (FPR) to 4.3 and false-negative rate (FNR) to 9.5. The HaloGuard 1.0-4B variant reaches average F1 of 92.1 and FPR of 3.5, spending its extra capacity on precision rather than recall. A structured adjudication of the remaining failures indicates that most apparent missed-harm cases are benchmark mislabels rather than genuine model misses. An always-on adversarial red-teaming protocol continuously hardens the guard against both content-level and agentic attacks. We release the models as open weights.

cs.LG 2026-07-03

kNN on LLM activations matches fine-tuned guardrails at 10x speed

by Mahmoud Abdelfattah, Hamid Nasiri +1 more

kNNGuard: Turning LLM Hidden Activations into a Training-Free Configurable Guardrail

A 50-prompt bank and multi-layer nearest-neighbor search classify unsafe inputs without any model training or slow inference.

Large language models (LLMs) are increasingly deployed in domains requiring guardrails to detect unsafe, off-topic, or adversarial prompts. Existing guardrails predominantly rely on fine-tuning to build classifiers, which often suffer from low generalization and high inference latency. We present kNNGuard, a training-free guardrail that utilizes the activation space of an off-the-shelf LLM. Given a small bank of 50 safe and unsafe prompts, kNNGuard extracts hidden activations and performs multi-layer kNN, fusing activation-space and embedding-space scores for classification. Across six domains spanning topical and security prompts, kNNGuard achieves competitive or superior F1 compared to fine-tuned state-of-the-art guardrails while running 2.7x faster than the best comparable guardrail and 10x faster than a fine-tuned safety classifier, all without gradient updates or fine-tuning. Domain adaptation requires only updating the labeled bank, which can be constructed in under 10 seconds, several orders of magnitude faster than established guardrails. We also analyze the impact of system prompts, layer selection, and integration into production LLM pipelines as a configurable, low-latency guardrail.
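
The general idea of nearest-neighbor classification over a small labeled bank can be sketched in a few lines of Python. This is not the kNNGuard implementation: the per-layer score (fraction of unsafe neighbors), the fusion rule (a plain average across layers), the Euclidean distance, and all names are assumptions made for illustration. Real activations would come from an LLM, and plain lists of floats stand in for them here.

```python
import math


def knn_unsafe_fraction(query, bank, k):
    """Fraction of the k nearest bank entries labeled unsafe (label 1).

    bank is a list of (vector, label) pairs; distance is Euclidean.
    """
    nearest = sorted(bank, key=lambda item: math.dist(query, item[0]))[:k]
    return sum(label for _, label in nearest) / len(nearest)


def fused_score(query_by_layer, bank_by_layer, k=3):
    """Average the per-layer kNN scores (hypothetical fusion rule)."""
    scores = [
        knn_unsafe_fraction(query_by_layer[layer], bank_by_layer[layer], k)
        for layer in query_by_layer
    ]
    return sum(scores) / len(scores)


def is_unsafe(query_by_layer, bank_by_layer, k=3, threshold=0.5):
    """Flag the prompt when the fused score exceeds the threshold."""
    return fused_score(query_by_layer, bank_by_layer, k) > threshold
```

Because the bank is the only labeled artifact in this sketch, adapting it to a new domain amounts to swapping the `(vector, label)` pairs, which mirrors the abstract's point that no training step is involved.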

cs.AI 2026-07-03

Protocol verifies agent state continuity to block poisoning

by Jiankai Jin, Xiangzheng Zhang +5 more

ElephantAgent: Contextual State Continuity in Agentic Systems

ElephantAgent recomputes digests against a trusted hardware ledger before each query to detect tampering in tools and memory.

Agentic systems enhance their capabilities by invoking external tools and maintaining persistent memory. However, these external dependencies introduce novel attack surfaces. Recent tool and memory poisoning attacks show that maliciously crafted tool descriptors and poisoned memory can covertly bias agent behavior. These threats reflect a deeper issue: the lack of verifiable continuity in the agent's contextual state for planning and execution. We present ElephantAgent, a protocol that enforces Contextual State Continuity to defend against contextual state poisoning. Inspired by prior state-continuity mechanisms (e.g., Nimble), ElephantAgent extends this protection to the evolving contextual state of agentic systems. We define the contextual state as the bounded, security-critical subset of the agent's entire context (e.g., tool state and memory). Before processing each query, ElephantAgent recomputes the digest of the local contextual state and verifies it against the latest authorized digest. Using replicated trusted hardware, ElephantAgent maintains a linearizable ledger of authorized contextual state transitions and detects out-of-band state tampering. To handle in-band semantic abuse, ElephantAgent additionally provides Historical Traceability, enabling conditional post-hoc audit and recovery to a known-good prior state.
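
The check described above (recompute the digest of the local contextual state, then compare it with the latest authorized digest) can be sketched as follows. This is a minimal illustration, not the ElephantAgent protocol: the in-memory `Ledger` class is a hypothetical stand-in for the replicated trusted-hardware ledger, and the choice of SHA-256 over canonical JSON is an assumption.

```python
import hashlib
import hmac
import json


def state_digest(state):
    """SHA-256 over a canonical JSON encoding of the contextual state."""
    canonical = json.dumps(state, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class Ledger:
    """Hypothetical in-memory stand-in for the trusted ledger."""

    def __init__(self):
        self._digests = []

    def authorize(self, state):
        # Record an authorized state transition by appending its digest.
        self._digests.append(state_digest(state))

    def latest(self):
        return self._digests[-1] if self._digests else None


def verify_before_query(state, ledger):
    """Return True only if the local state matches the latest authorized digest."""
    latest = ledger.latest()
    if latest is None:
        return False
    return hmac.compare_digest(state_digest(state), latest)
```

In this sketch, any change to the tool descriptors or memory that was not recorded through `authorize` makes `verify_before_query` return `False`, which corresponds to the out-of-band tampering case in the abstract.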

cs.CR 2026-07-03

Two signals detect abliterated checkpoints at 0.95 AUROC

by Gabriel Hurtado

Has This Checkpoint Been Abliterated? A Two-Signal Audit and Its Failure Map

Z-sum of refusal-gap and weight energy separates 57 stripped models from 37 benign fine-tunes on 273 checkpoints with 0.89 accuracy on new f

Can a platform tell, before deployment, whether an open-weight checkpoint has had its refusal mechanism stripped? Runtime guards cannot: they score generations, not the artifact. We combine two cheap internal signals, a reference-anchored activation refusal-gap and a weight-recovery energy of the base-to-candidate weight difference, into a threshold-free checkpoint audit. The two are negatively correlated and label-complementary: the gap supplies refusal-specificity and the weight energy supplies recall. On a 273-checkpoint registry spanning Qwen, DeepSeek-distilled Qwen, Llama, and Gemma, their z-sum separates 57 public abliterations from 37 benign fine-tunes, merges, and instruction-tunes at AUROC 0.95, significantly above either signal alone (0.84, 0.90), and a Youden-calibrated threshold transfers to held-out families at balanced accuracy 0.89 (FPR 0.11), missing only 4 of 57. We then map two failures, in order of severity: a spoofed reference evades both axes with no training ($\Delta W=0$, $\rho=1$ by construction), and a white-box owner trains a checkpoint past the threshold while it stays guard-unsafe and coherent. The audit is effective triage, not tamper-proofing: it presumes an attested reference, and its claims are bounded by the registry we evaluate it on.
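
A z-sum of two signals and its threshold-free evaluation by AUROC can be sketched as below. This is an illustration of the scoring idea only, not the paper's pipeline. It assumes both signals are already oriented so that larger values point toward abliteration, uses population z-scores over the evaluated set, and computes AUROC by pairwise comparison; the function names are hypothetical.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize values; a constant signal maps to all zeros."""
    m, s = mean(values), pstdev(values)
    if s == 0:
        return [0.0 for _ in values]
    return [(v - m) / s for v in values]


def z_sum(refusal_gap, weight_energy):
    """Sum of per-signal z-scores, one combined score per checkpoint."""
    return [a + b for a, b in zip(zscores(refusal_gap), zscores(weight_energy))]


def auroc(scores, labels):
    """Probability that a positive (label 1) outscores a negative (label 0),
    counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Summing z-scores puts the two signals on a common scale before combining them, so neither dominates merely because of its units. AUROC then measures separation without fixing a decision threshold.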

cs.CR 2026-07-03

Evolved rules from few examples beat large models at smart contract checks

by Yuqiang Sun, Han Liu +5 more

Knowledge Over Parameters: Evolving Smart Contract Vulnerability Detection

Portable logic built from ten samples per type transfers across models at under fifty dollars

Smart contract vulnerabilities are predominantly logic bugs whose detection requires structured, step-by-step procedural knowledge of attack patterns and contract semantics. Existing LLM-based methods struggle to generate this knowledge automatically: prompt-based methods rely on manually crafted detection rules, while fine-tuning requires massive labeled datasets that are inherently scarce in this domain. We present EvoVuln, an automated framework that reformulates vulnerability detection as a procedural knowledge evolution problem, synthesizing and refining detection logic using only a minimal number of labeled samples. To achieve this, EvoVuln introduces two key mechanisms. First, a Runtime with an Inversion of Control (IoC) architecture compiles detection rules into Executable Policies. This strictly decouples deterministic control flow from LLM semantic reasoning, ensuring faithful logical adherence and producing dense diagnostic telemetry for precise error localization. Second, a two-phase evolution pipeline refines the rule via abductive semantic debugging without any parameter updates: Cold Start bootstraps and stress-tests an initial rule using auto-synthesized corner cases; Few-Shot Evolving then grounds the policy in real-world semantics using only five vulnerable and five safe examples per vulnerability type. Evaluated across five real-world vulnerability types, EvoVuln achieves a 71% macro-average F1-score, outperforming all baselines. The evolved procedural knowledge is portable across models: it enables a lightweight, low-cost model to surpass a much larger zero-shot model by 19 percentage points, and transfers to other LLMs without retraining, at a one-time evolution cost under $50.

cs.CR 2026-07-03

Delegation helps accuracy only with large unrepresentative abstention

by Zhuolun Li, Evangelos Pournaras

Resilient Liquid Democracy: Mitigating Voting Power Imbalances via Secure Delegation Networks

Sealed networks reduce power concentration and ranked fallbacks cut vote loss in liquid democracy

Liquid democracy promises to improve collective decision-making by allowing voters to vote directly, delegate their voting power to trusted participants, or combine both approaches through fallback mechanisms. However, existing deployments typically rely on transparent delegation, which exposes voters to popularity-driven herding, makes coercion verifiable, and introduces systemic fragility when highly-backed delegates abstain. In this paper, we propose a secure liquid democracy mechanism that resolves the tension between informed expertise routing and systemic robustness. We introduce a sealed delegation regime using decentralized timed-release encryption, which cryptographically hides delegation choices during the formation phase to prevent herding and coercion, while restoring full public auditability for the final tally. To address delegate failures, we extend the protocol with ranked multi-delegation and personal fallback ballots. We formally prove pre-reveal secrecy and resubmission receipt-freeness for our protocol. Finally, we evaluate the mechanism on four real datasets, a municipal participatory-budgeting election with a calibration survey, twenty further participatory-budgeting elections, and 60,000 US voters with an objective competence measure. They show that whether delegation improves representational accuracy follows a recoverable-gap law; it helps only when abstention is large and systematically unrepresentative, and is otherwise neutral or harmful, with representative-style delegation safer than delegating to a competence elite. The benefit of sealed formation is primarily structural, sharply reducing power concentration rather than directly improving accuracy; and ranked multi-delegation with personal fallback ballots sharply reduces vote loss under realistic and targeted delegate failures, a result that replicates across all twenty elections.

cs.CR 2026-07-03

Syntactic checks at trust boundaries leave semantic security gaps

by Doyeon Kim, Jin-Young Choi +1 more

Trust Boundary Semantic Gaps: A Multi-dimensional Analysis and Mitigation for Security-by-Design

Study of 75 incidents shows signed or protocol-compliant artifacts can still enable compromise; design method makes hidden assumptions expli

Modern systems use format-, protocol-, and signature-based mechanisms before accepting artifacts across trust boundaries. These mechanisms are necessary: they show that an artifact is well formed, protocol-compliant, or properly authenticated. They do not, however, show that the artifact satisfies the semantic security properties required by the receiving domain. A signed update or an authenticated token may therefore be accepted yet enable compromise. We call this condition a Trust Boundary Semantic Gap (TBSG): an artifact crosses a trust boundary and passes correctly implemented syntactic validation, but the assertions established by that pass are insufficient to satisfy the receiving domain's security requirements. TBSG concerns what remains unestablished after a syntactic pass, not absent checks or implementation bugs. Analyzing 75 publicly reported security incidents (2014-2025) at the boundary level, we organize semantic misalignment into a four-dimensional analysis model: Identity, Spatial, Temporal, and Interpretation (MDTBSG). Building on it, we develop Trust Boundary Semantic Analysis and Mitigation (TBSAM), a design-time framework that identifies TBSGs from design specifications, prioritizes them, traces propagated gaps to their originating boundary, and maps each to candidate architectural controls. We apply TBSAM to a retrospective reconstruction of the SolarWinds/SUNBURST supply-chain attack, showing how it makes receiving-domain assumptions explicit, separates locally originating from propagated gaps, and identifies controls that interrupt the path. These results suggest that syntactic validation, while necessary, is not sufficient at trust boundaries, and that making trust-boundary assumptions explicit can complement Security-by-Design.

cs.CR 2026-07-03

Meta-learning adds multiple backdoors to speech models via timbre leak

by Yueming Huang, Wenhan Yao +3 more

Pmeta-TLA: Backdoor Attacks for Speech Classification Models via Meta-Learning with Timbre Leakage Attack

New trigger spreads frame-level timbre info to create natural poisoned samples that bypass detectors.

Speech classification methods have recently gained widespread adoption in intelligent gadgets. Current studies indicate that backdoor attacks pose a substantial security concern for these models, underscoring the pressing need to investigate additional potential attack techniques in order to expose and prevent such risks. This work discusses the vulnerability of current speech triggers to detection by deep neural network defenders and introduces the Timbre Leakage Attack (TLA). The proposed trigger disseminates timbre information at the frame level within deep self-supervised features, producing poisoned samples that appear natural to human perception. Furthermore, we introduce Pmeta-TLA, an innovative training mechanism for embedding multiple backdoors at once. This method proposes a multi-backdoor injection training strategy using meta-learning and Projected Conflicting Gradients (PCGrad) and introduces TLA as a multi-target attack tool within it. We performed tests on data-poisoning backdoor attacks in keyword spotting tasks using several deep neural network models. Experimental results indicate that the proposed strategy attains superior attack efficacy, enhanced stealthiness and robustness, and a reduced attack cost relative to baseline methods.

cs.CR 2026-07-03

Gradient-based attacks leave XGBoost predictions intact yet shift their SHAP explanations…

by Mona Rajhans, Vishal Khawarey

Beyond Gradient-Based Attacks: Adversarial Robustness and Explainability Stability in Cybersecurity Classifiers

ZOO leaves XGBoost predictions nearly untouched but drives ESI down to 0.06-0.16, showing robustness and stability are separate for tree sec

Adversarial attacks on cybersecurity classifiers pose a dual threat: degrading predictions and destabilising the SHAP-based explanations that security analysts rely on to understand and triage alerts. We extend our prior MLP conference study to Random Forest and XGBoost across four tabular security datasets (phishing URLs, UNSW-NB15, NF-ToN-IoT, HIKARI-2021), evaluating five attacks including three black-box methods applicable to non-differentiable tree models. We introduce the Explainability Stability Index (ESI), a scalar metric computed from TreeSHAP attribution drift under adversarial perturbation, reported on the same [0,1] scale as the Robustness Index (RI). A key finding is that gradient-based black-box attacks (ZOO) produce degenerate results against XGBoost (apparent RI ~0.98) due to piecewise-constant prediction surfaces, while score-based Square Attack reveals genuine vulnerability (RI ~0.36). These degenerate perturbations still drive substantial attribution drift: XGBoost ESI ~0.06-0.16 despite near-perfect ZOO robustness, versus 0.14-0.29 for RF, showing that prediction robustness and explanation stability are distinct axes requiring joint measurement. A two-axis framework (gradient dependence, query efficiency) explains the observed attack ranking and yields practical guidance for tree ensemble evaluation. A step-size ablation explains a counterintuitive PGD anomaly on z-score normalised tabular data.

cs.CR 2026-07-03

Multi-agent AI assistant runs security checks on RTL designs

by Dipayan Saha, Khan Thamid Hasan +4 more

VeriChat: An Agentic Conversational AI Assistant for Hardware Security Verification

Three specialized agents ground responses with retrieved data and live EDA tool calls for simulation and proofs.

Hardware security verification is a multi-stage process in which engineers must navigate complex design analyses, threat considerations, and verification strategies. They often need security-focused guidance, yet current verification environments provide little structured support for such assistance. Although conversational AI could offer such on-demand assistance, directly using general-purpose chatbots like ChatGPT or Gemini is risky due to their tendency to hallucinate and their reliance on static, outdated knowledge. We present VeriChat, a domain-specialized conversational assistant designed to support, rather than replace, existing verification workflows by providing context-aware security guidance. VeriChat employs a retrieval-augmented, multi-agent workflow in which three specialized agents collaboratively minimize hallucinations while improving the transparency and reliability of the response. Beyond question answering, VeriChat integrates open-source EDA tools, including Icarus Verilog, Yosys, and SymbiYosys, to perform syntax checking, synthesis analysis, simulation, and formal verification directly on user-provided RTL designs. Evaluated using a comprehensive methodology, VeriChat achieves a Faithfulness score of 87.73%, significantly outperforming the leading proprietary models. We demonstrate the framework through a hardware Trojan detection case study on an AES S-Box IP, where VeriChat autonomously identifies, simulates, and formally proves a covert key-leakage vulnerability through a multi-turn conversational workflow.

cs.SE 2026-07-03

AgentFlow maps 238 prompt-to-tool risks via dependency graphs

by Shenao Wang, Xinyi Hou +3 more

AgentFlow: Building Agent Dependency Graphs for Static Analysis of Agent Programs

The graphs recover more agent entities and flows than AST tools across 5,399 real programs from five frameworks.

LLM agents are increasingly developed as source-code applications built on agent frameworks. These agent programs combine conventional host-language code with framework-defined semantics for models, prompts, tools, memory, and multi-agent orchestration logic. As a result, their behavior depends not only on traditional control and data flows, but also on a new class of agent dependencies. Such dependencies are often expressed as framework-induced semantics, such as agent constructors, tool decorators, and agent handoff declarations, making them difficult to recover with existing static analysis or dependency tracking tools. In this paper, we present AgentFlow, the first static analysis framework for recovering and analyzing agent dependencies from agent programs. AgentFlow constructs an Agent Dependency Graph (ADG), a framework-agnostic graph representation that represents agents, prompts, models, capabilities, memory states, and control policies as typed nodes, and captures their component-dependency, control-flow, and data-flow dependencies as typed edges. Built on ADGs, AgentFlow supports a range of analyses for agent governance and security, including Agent Bill of Materials (BOM) generation and prompt-to-tool risk detection. We implement AgentFlow for five representative agent frameworks and evaluate it on AgentZoo, a corpus of 5,399 real-world agent programs. Our evaluation shows that AgentFlow recovers richer agent entities and dependencies than existing AST-based agent static analysis tools, generates more dependency-aware Agent BOMs, and uncovers 238 taint-style prompt-to-tool risks in real-world agent programs. These results show that ADG provides a practical foundation for understanding, governing, and securing emerging agent software.

cs.CR 2026-07-02

Tampered cell libraries mask hardware Trojans from chip designers

by Harish Kumar Dharavath, Md Muhtasim Alam Chowdhury +2 more

LIB-TRAP: Standard Cell Library Hardware Trojan Risk Assessment and Prevention

A foundry can swap deactivated Trojan cells for active ones during fabrication, shown on AES-128 and other benchmarks in 32nm and 130nm tech

Vulnerabilities inherent to the fabless semiconductor manufacturing model have significantly increased the risk of malicious Hardware Trojan (HT) insertion, posing severe threats to hardware security. Several HT mitigation and detection strategies have been developed, and existing works explore the insertion of HTs in the space between standard cells in an integrated circuit. However, there is a lack of research into the vulnerabilities posed by the building blocks of most digital designs on the market today, the standard cells. This work investigates a novel threat model in which standard cells are considered untrusted. Our proposed threat model provides the design house with a tampered standard cell library. The intended netlist is synthesized and implemented using the tampered library. During fabrication, a nefarious foundry replaces the library's deactivated HT cells with activated counterparts. Using open-source and industry-standard Electronic Design Automation (EDA) tools, existing standard cell libraries, Saed32nm and Sky130nm, are converted into malicious libraries capable of masking the presence of arbitrary HTs from IC designers. The malicious library is then applied and characterized in multiple standard benchmark designs. To demonstrate the efficacy and stealthiness of this standard cell-based attack vector, three benchmark circuits, an AES-128 encryption core, an Ethernet controller, and a WISHBONE DMA engine, were synthesized using both clean and Trojan-infected libraries across Synopsys 32nm and SkyWater 130nm technologies. Design-level features, including total cell count, total area, dynamic power consumption, and static power, were extracted from these synthesized circuits to serve as inputs for binary classification

cs.CR 2026-07-02

Scene text forces LVLMs to overthink and slows robots up to 7x

by Qiang Han, Jie Wu +1 more

Overthink-Triggered Slowdown Attacks on LVLM-Based Robotic Systems

Black-box attacks embed readable triggers in scenes to delay decisions, with physical prints still effective

Large Vision-Language Models (LVLMs) have been increasingly integrated into robotic systems. However, these models may exhibit overthinking behaviors, where they generate excessively long reasoning traces, incurring an excessive inference time. This overthinking behavior poses a serious risk to robotic systems, as the adversary can deliberately trigger overthinking to slow down the decision making of a victim robotic system, causing a variety of safety issues (i.e., an overthinking-induced slowdown attack). To initiate this attack, an adversary can embed carefully crafted, human-readable scene text into the visual scene observed by a victim robotic agent, causing significant inference delays even under a strict black-box setting. Therefore, the embedded scene text serves as a significant "trigger" for the attack. This work systematically identifies and validates transferable triggers of overthinking in robotic systems by introducing a three-stage framework. First, we construct a diverse corpus of reasoning-intensive scene text and extract overthinking-correlated lexical features from short response prefixes. Second, we perform an efficient black-box search guided by a prefix-based proxy score while selectively confirming a small set of top candidates with full latency measurements. Third, we evaluate black-box transfer using a fixed pool of triggers on unseen images and multiple LVLMs, reporting latency amplification and attack success rates under standard thresholds. Across three representative LVLMs, all triggers yield slowdown ratios greater than 1.0x, with the strongest single-trigger case reaching 6.96x. The physical printing of the text trigger still causes up to 4.74x latency amplification. These results demonstrate that our discovered triggers are transferred between multiple LVLM models and consistently cause significant slowdowns in robotic systems.

cs.AI 2026-07-02

User input boosts AI agent privacy but fatigue and context decide best design

by Natalie Grace Brigham, Eugene Bagdasarian +2 more

Janus: a Playground for User-Involved Agentic Permission Management

Six permission assistant designs tested across scenarios reveal trade-offs in security, load, and repeated-decision effects.

AI agents that autonomously execute tool calls on a user's behalf raise pressing questions about permission management: what role could users play, and what role should they play? Despite many proposed approaches, the user's role in agentic permission management remains underexplored. We introduce Janus, a playground system for implementing and evaluating user-involved agentic permission management designs. Janus consists of two components: Janus-Core, a modular agentic system supporting a diverse spectrum of permission management designs, and Janus-Harness, an automated evaluation framework. Grounded in a conceptual model that identifies key design axes for user involvement, we implement six permission assistants spanning the design space and evaluate them across three scenarios and three synthetic responders. We demonstrate that user input is critical and can significantly strengthen privacy and security, that AI augmentation of user decisions can help reduce cognitive load, and that realistic user behavior including permission fatigue must be accounted for in system design. No single design performs optimally across all contexts, motivating a more principled and context-sensitive approach to deploying permission assistants in agentic systems. Janus is publicly available to support future investigation into this dimension of agentic system design.

cs.LG 2026-07-02

Unveiling the Non-Monotonic Effect of Privacy on Generalization under Byzantine Robustness

by Thomas Boudou, Batiste Le Bars +2 more

The benefit reverses when privacy is weaker and the usual tension with robustness returns.

Recent work has established a fundamental trilemma between Byzantine robustness, local differential privacy (LDP), and optimization error in distributed learning. We show that this trilemma does not universally extend to generalization error, but instead depends critically on the privacy regime. Specifically, in the high-noise regime (strong privacy), we prove that increasing privacy reduces the generalization error, i.e., there is no tension between robustness and privacy. In the low-noise regime (weaker privacy), however, the tension between robustness and privacy reappears and increasing privacy indeed degrades generalization. Our theory explains this surprising non-monotonic behavior of the generalization error via matching lower and upper bounds on the algorithmic stability of Byzantine-robust distributed learning under LDP constraints. We corroborate and further analyze these theoretical findings with empirical evaluations.

cs.CR 2026-07-02

Hamm-grams with single wildcards outperform n-grams for malware detection

by Derek Everett, Edward Raff +1 more

Hamm-Grams: An Algorithm for Mining Regular Expressions of Bytes

Locality-sensitive hash plus clustering finds byte patterns that tolerate small variations better than fixed sequences.

Malware poses a critical and ever-evolving threat, and robust and effective systems for detecting and classifying malware are of essential importance. $n$-gram features are among the common static features used in effective machine learning systems for malware, but these features are inherently brittle. We propose an algorithm for constructing more robust features, hamm-grams, which are a special class of regular expressions having a fixed length and single-character wildcards. We devise an efficient algorithm for finding common hamm-grams using a new locality-sensitive hash designed to produce collisions among pairs of small Hamming distance and a clustering within hash buckets to place wildcards. We then demonstrate the advantages of these features in malware classification and detection tasks.
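
As a rough illustration of the feature class only (not the authors' mining algorithm, which relies on a locality-sensitive hash and clustering), a hamm-gram can be pictured as a fixed-length byte pattern in which some positions are single-byte wildcards. The sketch below uses hypothetical helper names. It derives such a pattern from two equal-length byte strings and compiles it to a bytes regular expression.

```python
import re
from typing import List, Optional


def hamming_distance(a: bytes, b: bytes) -> int:
    """Number of positions at which two equal-length byte strings differ."""
    if len(a) != len(b):
        raise ValueError("inputs must have equal length")
    return sum(x != y for x, y in zip(a, b))


def hamm_gram(a: bytes, b: bytes) -> List[Optional[int]]:
    """Keep bytes where a and b agree; mark disagreeing positions as wildcards (None)."""
    if len(a) != len(b):
        raise ValueError("inputs must have equal length")
    return [x if x == y else None for x, y in zip(a, b)]


def to_regex(pattern: List[Optional[int]]) -> "re.Pattern[bytes]":
    """Compile the pattern to a fixed-length bytes regex; '.' matches any single byte."""
    parts = [b"." if p is None else re.escape(bytes([p])) for p in pattern]
    # DOTALL so that the wildcard also matches the newline byte 0x0a.
    return re.compile(b"".join(parts), re.DOTALL)


a = b"\x55\x8b\xec\x83"
b = b"\x55\x8b\xed\x83"
pattern = hamm_gram(a, b)   # third byte differs, so it becomes a wildcard
rx = to_regex(pattern)
print(hamming_distance(a, b), pattern)
print(rx.search(b"\x00\x55\x8b\x0a\x83\x00") is not None)
```

A plain 4-gram would treat `a` and `b` as two unrelated features. The single wildcard lets one pattern cover both, as well as any other variant that differs only in the third byte.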

cs.CR 2026-07-02

Top models miss over 25% of AI-made ID cards in zero-shot tests

by Gourab Das, Pavan Kumar C +1 more

From Forgeries to Foundation Models: A Systematic Survey of Identity Document Attack and Detection

Benchmarking on unseen synthesised documents reveals limits in current detection under security conditions.

Identity document forgery has undergone a fundamental capability shift: generative AI tools now enable high-fidelity document synthesis and field-level manipulation with minimal technical expertise, while detection methods remain constrained by benchmarks that do not reflect this threat. The resulting attack surface spans physical presentation, digital injection, and fully generative synthesis, introducing distinct forensic failure modes that require a unified threat model and evaluation framework. This survey provides, to our knowledge, the first unified treatment of Presentation Attacks, Digital Injection Attacks, and GenAI-driven synthesis within a single identity verification threat model. We trace detection methodologies from rule-based heuristics through forensic localisation, injection-aware pipelines, foundation models, and few-shot frameworks. A systematic audit of public datasets from 2019--2025 exposes a persistent Reality Gap between benchmark conditions and operational deployment. We further analyse large multimodal models for identity document manipulation, identifying Script-Dependent Generative Instability (SDGI) as a recurring typographic failure mode in non-Latin script inpainting. Finally, zero-shot benchmarking on unseen synthesised ID cards shows that even the strongest publicly available models achieve APCER values above 25% under security-oriented operating conditions, highlighting substantial limits in cross-domain generalisation. We conclude by outlining future directions toward forensically grounded, privacy-preserving, and legally accountable identity verification systems.

cs.CV 2026-07-02

3D air signatures authenticate VR users at 2.5% error rate

by Neda Abdolrahimi, Thiru Siddharth +2 more

Sign in the Air to Unlock: An Interface for authentication in Virtual and Augmented Reality Powered by Point-Voxel Cross-Attention Network

Point-voxel network turns natural gestures into password-free access without extra hardware or broken immersion.

The significant advancement of immersive technologies such as Virtual and Augmented Reality (VR/AR) and their integration into diverse aspects of modern life call for authentication interfaces that are secure, intuitive, and compatible with embodied interaction. Traditional methods such as passwords, PINs, and device-based logins break immersion and rely on external hardware. Recent 3D-specific behavioral approaches, such as hand-gesture, eye-tracking, and electroencephalography (EEG)-based methods, offer promising alternatives but often require specialized sensors or constrain natural movement, limiting usability in dynamic environments. We present Sign in the Air to Unlock, an in-air signature interface that enables users to authenticate by signing naturally in 3D space, a familiar, personal, and reproducible gesture. To realize this interface, we design a point-voxel Cross-Attention Network (PV-Net) that jointly models local motion dynamics and global spatial structure from 3D trajectories. The model is evaluated on two datasets: the public DeepAirSig dataset (1,800 signatures from 40 users) and ImmAirsig, a new dataset collected using Meta Quest 2 in immersive VR (880 samples from 22 users). PV-Net achieves an Equal Error Rate of 2.5% on DeepAirSig and 76% classification accuracy on ImmAirSig. These findings highlight the potential of 3D behavioral interfaces for seamless, user-centric authentication that merges security with natural interaction in immersive environments.

cs.CR 2026-07-02

ML surrogates recover CPS after memory corruption attacks

by Mohsen Salehi, Karthik Pattabiraman

Chameleon: Recovering Cyber-Physical Systems from Memory Corruption Attacks via ML Surrogates

Replaces vulnerable compartments with accurate models to keep robotic vehicles running safely with low overhead.

Cyber-physical systems (CPSs) are increasingly deployed in every aspect of our lives and can be compromised through memory corruption vulnerabilities, allowing attackers to hijack the control flow and take over the system. Existing techniques mostly focus on detecting such attacks but respond by terminating or halting execution upon attack detection, which is not acceptable in CPSs used in safety-critical tasks, as interrupted tasks can have catastrophic consequences. Other techniques replace compromised CPS components with simplified defaults that degrade system behavior, or reboot the system upon attack detection. We propose Chameleon, a novel framework for automatically recovering CPSs from memory corruption attacks using machine learning (ML)-based surrogates trained at compartment granularity that nearly replicate their original compartments' behavior but do not have the same memory corruption vulnerabilities. Upon attack detection, Chameleon replaces the compromised compartment with its trained surrogate. We implemented Chameleon using the LLVM compiler and evaluated its efficiency and effectiveness on seven different robotic vehicles (RVs), including simulated and real ones. We found that Chameleon can generate surrogates that closely approximate the original compartments (with an average R$^2$=0.96), successfully recover the system despite real-world memory corruption attacks unlike prior approaches, and complete their tasks while incurring low performance and memory overhead.

cs.CR 2026-07-02

Isomorphisms distort samples and block attacks on fully-split PLWE

by Iván Blanco-Chacón, Rodrigo Martín Sánchez-Ledesma +1 more

An alternative approach towards attacks against fully-split PLWE instances

Every map between fully-split rings alters error terms enough to prevent distinction from uniform

In the present work we address some key questions regarding the generalization of root-based attacks presented in a recent work by the authors. In particular, we analyze potential extensions of root-based attacks via the construction of explicit isomorphisms from vulnerable instances, and provide a formal proof that this approach will not yield any new vulnerabilities in a fully-split setting. To do so, we first construct an explicit isomorphism between fully-split polynomial rings and polynomial rings where previous attacks apply, and show that applying such an isomorphism always distorts the samples in such a way that the resulting samples cannot be used to distinguish them from uniform. Then, we prove that any isomorphism between fully-split polynomial rings must be of the form of the constructed isomorphism.

cs.LG 2026-07-02

Single-logit APIs still leak LLM hidden dimensions

by Christopher Ellis, Shreyas Chaudhari +4 more

Black-Box Inference of LLM Architectural Properties with Restrictive API Access

NightVision recovers dimensions to 23 percent error and depth for large models from prompts and timing alone.

In practice, most commercial LLM providers do not publicly release details of underlying LLM architectures. However, prior work has shown that given limited API access to an LLM (namely, top-$k$ logits and/or a logit bias function), one can recover certain architectural details of an LLM, such as the hidden dimension of the feed-forward network. Perhaps in response to these results, most commercial LLM providers have restricted their APIs to expose only the single logit for each decoded token, and they no longer give users the ability to bias logits. We show that even under current restrictive APIs, several architectural parameters are still recoverable. We present NightVision, an attack that uses restrictive black-box API access to estimate the hidden dimension, depth, and parameter count of an LLM. Algorithmically, NightVision relies on a novel common set prompting technique in which multiple prompts expose log probabilities for the same set of output tokens; a spectral analysis of these results is used to infer hidden dimension. NightVision additionally uses end-to-end time to first token (TTFT) measurements and the estimated hidden dimension to estimate depth and parameter count. We empirically evaluate NightVision on 32 open-source LLMs, recovering hidden dimension to within 23% average relative error across all models (9% on MoE models), and depth and parameter count to within 53% for models exceeding three billion parameters. We run extensive ablations to demonstrate how these accuracies scale with token budget and model properties. Overall, our results suggest that current LLM APIs are not sufficiently restricted to fully obfuscate the architectural details of their underlying models.

cs.CR 2026-07-02

All-out attack optimal for withholding blocks in PPS pools

by Mustafa Doger, Sennur Ulukus

All-out Attack: Optimal Block Withholding Under Pay-Per-Share Scheme

Under pay-per-share, attackers gain α/(1-α) after adjustment while operators pay for shares without blocks.

Classical Block Withholding (BWH) attacks have been extensively studied in block-dependent reward schemes, where pool members are compensated upon a block discovery within the pool. However, most contemporary mining pools operate under a share-based scheme wherein participants are paid immediately upon submission of valid shares. In this paper, we analyze BWH under Pay-Per-Share (PPS) and Full-PPS (FPPS) schemes for Nakamoto-style blockchains and prove that these mechanisms are not incentive compatible -- contrary to claims in prior literature. Under PPS/FPPS, the optimal strategy for a BWH attacker is the All-out Attack (AoA): the adversary allocates its entire hashpower toward the victim pool, submitting only partial Proof-of-Work shares (pPoW) while withholding all valid blocks, i.e., full Proof-of-Work (fPoW). Under AoA, prior to the first difficulty adjustment, the adversary incurs negligible loss due to the withheld fPoWs. After the first difficulty adjustment, which reduces block difficulty, the adversary generates more pPoWs per unit time, achieving a relative gain of $\frac{\alpha}{1-\alpha}$ compared to pre-adjustment rates, where $\alpha$ is the fraction of adversarial hashpower. Moreover, per unit time and per unit hashpower, all honest miners benefit at the same rate as the adversary. In contrast, the victim pool operator incurs losses: it pays the attacker out-of-pocket for pPoW submissions but receives no fPoW compensation in return. Finally, advanced variants of BWH, such as Fork After Withholding (FAW), do not yield additional profit to the attacker.
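
As a quick worked check of the stated post-adjustment gain $\frac{\alpha}{1-\alpha}$, the snippet below evaluates it for a few hashpower fractions using exact rational arithmetic. The function name `aoa_relative_gain` is hypothetical, and the sample values of $\alpha$ are chosen for illustration rather than taken from the paper.

```python
from fractions import Fraction


def aoa_relative_gain(alpha: Fraction) -> Fraction:
    """Relative gain alpha / (1 - alpha) for an adversarial hashpower fraction alpha."""
    if not (0 <= alpha < 1):
        raise ValueError("alpha must satisfy 0 <= alpha < 1")
    return alpha / (1 - alpha)


# Illustrative hashpower fractions (not values from the paper).
for alpha in (Fraction(1, 10), Fraction(1, 5), Fraction(1, 3)):
    gain = aoa_relative_gain(alpha)
    print(f"alpha = {alpha}: relative gain = {gain} ({float(gain):.1%})")
```

For example, an adversary with one fifth of the hashpower gets a relative gain of 1/4, that is 25% over its pre-adjustment rate. The gain grows faster than linearly as $\alpha$ approaches 1.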

cs.CR 2026-07-02

Detectors identify PANDA attacks on NIDS at 99 percent accuracy

by Niklas Bunzel, Ashim Siwakoti

Detecting Adversarial Evasion Attacks Against Autoencoder-Based Network Intrusion Detection Systems

RLD and FPC check image reconstruction errors and packet features to catch adversarial evasions on autoencoder systems.

Evasion attacks deliberately manipulate input to an ML-based system to produce an incorrect prediction while the manipulated input still appears benign. The PANDA framework has demonstrated that adversarial examples developed for the vision domain can be transferred to the network domain by converting packet sequences into invertible grayscale images, enabling gradient-based attacks such as masked FGSM against autoencoder-based network intrusion detection systems (NIDS). These attacks manipulate the NIDS anomaly score without altering the underlying attack semantics, leaving defenders without a straightforward way to distinguish between benign flows and carefully perturbed malicious traffic. In this paper, we propose two complementary detectors: the Residual Localisation Detector (RLD), which tracks the spatial concentration of reconstruction errors in the inter-arrival time feature region in image space; and the Feature-Space Perturbation Consistency (FPC) Detector, which operates directly on packet-level inter-arrival time features in packet-feature space. We evaluate both detectors on benign, malicious, and adversarial traffic from multiple IoT devices in the UQ-IoT dataset. Both detectors achieve near-perfect detection performance (TNR, TPR, precision, recall, and F1-score $\geq 0.99$) against adversarial examples across the evaluated IoT traffic. Our results indicate that integrating reconstruction-based scoring with perturbation consistency checks, in both image space and packet-feature space, offers a practical defence against emerging PANDA-style adversarial attacks on NIDS.

cs.CR 2026-07-02

Generative AI and federated learning tackle intrusion detection limits

by Jiefei Liu, Abu Saleh Md Tayeen +6 more

Generative AI and Federated Learning for Intrusion Detection Systems: A Survey

They enable synthetic traffic creation and distributed training without sharing raw network data.

Intrusion Detection Systems (IDSs) are essential for monitoring network traffic and identifying malicious activities in modern cyber-physical, Internet of Things (IoT), enterprise, and distributed network environments. However, developing reliable IDS models remains challenging because attack behaviors evolve over time, realistic datasets are difficult to obtain, traffic records may be incomplete, attack classes are often imbalanced, and privacy constraints limit centralized data collection. Recent advances in generative artificial intelligence (AI) and Federated Learning (FL) provide new opportunities to address these limitations. Generative models can support anomaly detection, synthetic traffic generation, data augmentation, data imputation, adversarial traffic generation, and IDS alert explanation. FL enables distributed IDS training without directly sharing local network traffic, making it suitable for privacy-sensitive and geographically distributed environments. This survey provides a structured review of generative AI and FL techniques for IDS. We first summarize representative IDS research directions, including adversarial machine learning, anomaly-based detection, IoT-oriented IDS, explainable IDS, and benchmark datasets. We then categorize generative AI applications in IDS according to model families and task objectives, covering autoencoder-based models, Generative Adversarial Networks (GANs), diffusion models, and Large Language Models (LLMs). Finally, we review emerging studies that integrate generative AI with FL-based IDS and discuss open challenges, including synthetic data quality, realistic traffic generation, dual-use adversarial risks, non-IID client distributions, communication-efficient model sharing, federated IDS benchmarking, and domain-specific LLMs for network security.


cs.CR 2026-07-02

Context-grounded LLMs find 15 logic vulnerabilities in 28 repos

by Michele Armillotta, Nicolò Romandini +2 more

Antaeus: Hunting Repository-Level Logic Vulnerabilities via Context-Grounded LLM Reasoning

Full repository view helps identify implicit security rules that isolated code snippets hide from standard LLM detectors.


LLM-based vulnerability detectors have shown promising results in identifying memory-safety bugs and vulnerability classes whose violations can often be expressed through established security properties. Logic vulnerabilities, however, pose a different challenge, as their identification requires inferring application-specific security invariants and implicit assumptions about intended behavior. Even frontier agentic models struggle because these invariants are often implicit and buried among unrelated code. Motivated by this gap, we present Antaeus, a framework for detecting logic vulnerabilities that grounds LLM reasoning in repository-level code context. Antaeus follows a repository-scale pipeline combining function prioritization, context-grounded reasoning, comparative validation, and structured reporting. It ranks functions using lightweight repo-wide security signals, directing costly LLM analysis toward relevant code and reducing calls, cost, and triage effort. For each prioritized function, Antaeus combines local code context with a repository-level view of the application's functionality, security resources, and trust boundaries. This enables reasoning about how the function is executed within the broader application rather than as an isolated snippet. Antaeus identifies security-sensitive sinks, derives safety conditions for safe execution, and checks whether they are locally satisfied. Candidate findings undergo comparative validation, pruning concerns that reflect project-wide norms rather than distinctive violations. Finally, Antaeus reports sinks, violated safety conditions, and evidence, making findings actionable and traceable. We evaluate Antaeus on 28 repositories with confirmed logic vulnerabilities and compare it against function-level and agentic models. Antaeus detects and explains 15 vulnerabilities, outperforming baselines with comparable token usage and cost.


cs.CR 2026-07-02

Fragmented security leaves AI-native 6G exposed

by Bidushi Barua, Ahsan Khan +7 more

Toward a Unified Security and Privacy Framework for AI-Native 6G Networks

Survey shows isolated solutions cannot cover integrated threats from communication, computing, sensing, and AI, so a unified cross-layer fra


Sixth Generation (6G) communication networks are expected to evolve into AI-native, highly autonomous ecosystems that integrate communication, computing, sensing, and artificial intelligence. While these capabilities enable unprecedented connectivity and intelligent services, they also create a highly heterogeneous security and privacy landscape that cannot be addressed through isolated, technology-specific solutions. This paper presents a comprehensive survey of security and privacy in AI-native 6G networks from a cross-layer perspective. We first examine the fragmentation of existing security and privacy approaches across emerging technologies, network architectures, AI systems, and standardization efforts, motivating the need for a unified security and privacy framework. Building upon this framework, we develop a cross-layer threat taxonomy encompassing infrastructure, network and architectural, AI, privacy, and security management domains, and analyze representative threats across key AI-native 6G technologies. Furthermore, we map these threats to corresponding cross-layer countermeasures, including standards harmonization as a security function, and identify critical research gaps and future priorities for secure, interoperable, and trustworthy AI-native 6G ecosystems. Finally, we discuss future research directions toward realizing secure, privacy-preserving, resilient, and globally interoperable 6G networks. This survey provides researchers, practitioners, and standardization communities with a holistic foundation for the design, evaluation, and deployment of trustworthy AI-native 6G systems.


cs.DS 2026-07-02

Private continual counting needs Ω((log n)^{3/2}) error

by Konstantina Bairaktari, Kasper Green Larsen

The Binary Tree Mechanism is Optimal for Approximate Differentially Private Continual Counting

Lower bound matches binary tree upper bound, proving asymptotic optimality under approximate DP and tightest separation from hereditary disc


Private continual counting is a fundamental problem in differential privacy: given a binary stream of length $n$, where each $1$ corresponds to the contribution of one individual, the goal is to release all running counts while protecting the privacy of each individual. The standard algorithm is the binary tree mechanism, whose Gaussian-noise variant achieves expected $\ell_\infty$ error proportional to $\log^{3/2} n$ for approximate differential privacy. Whether this dependence on the stream length is necessary has remained a central open problem. In this work, we resolve the dependence on $n$ by proving that every differentially private mechanism for continual counting must incur expected $\ell_\infty$ error $\Omega(\log^{3/2} n)$. This shows that the binary tree mechanism is asymptotically optimal in the approximate-DP setting. As a consequence, we also obtain a largest-possible separation between hereditary discrepancy and private $\ell_\infty$ error for linear queries, showing that the known general upper bound in terms of hereditary discrepancy has the optimal dependence on the number of queries.
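For readers unfamiliar with the mechanism the abstract refers to, the following is a minimal offline sketch of a binary tree mechanism with Gaussian noise. It is an illustration under stated assumptions, not the authors' construction: the function name `binary_tree_counts` is hypothetical, the whole stream is assumed to be available up front (a real deployment would process it online), and `sigma` is taken as a given parameter rather than calibrated to any privacy budget. The idea shown is that each running count is assembled from at most one noisy dyadic block per level, and each stream element contributes to at most one block per level.

```python
import random


def binary_tree_counts(stream, sigma, rng=None):
    """Return noisy running counts of a 0/1 stream (illustrative sketch).

    sigma is the standard deviation of the Gaussian noise added to every
    dyadic block sum; calibrating it to a privacy budget is out of scope.
    """
    rng = rng or random.Random()
    n = len(stream)

    # Exact prefix sums, used only to compute exact block sums.
    prefix = [0]
    for bit in stream:
        prefix.append(prefix[-1] + bit)

    # Each dyadic block gets its noise drawn once and is then reused.
    noisy_nodes = {}

    def noisy_block(level, index):
        key = (level, index)
        if key not in noisy_nodes:
            lo = index << level
            hi = lo + (1 << level)
            exact = prefix[hi] - prefix[lo]
            noisy_nodes[key] = exact + rng.gauss(0.0, sigma)
        return noisy_nodes[key]

    released = []
    for t in range(1, n + 1):
        # Split [0, t) into dyadic blocks following the binary digits of t.
        total, start = 0.0, 0
        for level in reversed(range(t.bit_length())):
            if (t >> level) & 1:
                total += noisy_block(level, start >> level)
                start += 1 << level
        released.append(total)
    return released
```

With `sigma = 0.0` the function returns the exact running counts, which is a quick way to check the dyadic decomposition before adding noise.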


cs.CR 2026-07-02

Bitcoin privacy tools used in under 1% of transactions

by Ben Hawkins, Joshua Levett +1 more

No Country for Old Privacy: The Evolving Challenges of Anonymity in Bitcoin

Adoption falls after regulations with no standard for stealth addresses, suggesting shift to hidden methods.


We present a longitudinal measurement study on the adoption of detectable, second-generation anonymisation protocols in the Bitcoin network, including CoinJoin, CoinSwap, CoinShuffle and Stealth Addresses. By implementing and refining a suite of heuristic filters, we identify over 5.94 million CoinJoin and 23.3 million CoinSwap transactions. In addition, the use of CoinShuffle was unexpectedly found to be closely aligned with the Wasabi wallet operation period. Our analysis reveals consistently low adoption rates, with these protocols constituting less than 1% of network transactions, and a sharp decline in detectable usage following key regulatory events. Furthermore, we find no evidence of standardised Stealth Address adoption, indicating a failure to converge on a common privacy standard. This study provides a comprehensive picture of a niche ecosystem whose on-chain visibility has been largely suppressed, strongly suggesting the migration of privacy-seeking users to less transparent and less detectable methods.


cs.CR 2026-07-02

Synthetic data trains detectors to 0.96 F1 with matching SHAP explanations

by Jose Luis Vela Alonso, Carmen Pellicer

Forensic-Oriented Intrusion Detection Using Synthetic Network Traffic Data and Explainable Artificial Intelligence

TSTR evaluation on CICIDS2017 stays within real-data variance while preserving attack fingerprints needed for expert testimony.


Digital forensic investigations of network intrusions require analytical outputs that are traceable, reproducible, and court-defensible. Existing machine learning pipelines do not satisfy these requirements, since they treat original evidence as training data and produce opaque classifications without instance-level justification. This paper presents a forensic-oriented intrusion detection framework resolving both problems simultaneously, integrating synthetic data generation, binary classification, and explainability within a single pipeline governed by ISO/IEC 27037, 27041, 27042, and NIST SP 800-86. The framework operationalises the ISO/IEC 27037 requirement for strict separation between original digital evidence and derived analytical artefacts. Original datasets are treated as immutable, hash-verified artefacts; all training operates on parameterized synthetic derivatives via SDV + CTGAN. XGBoost binary classification provides high-performance detection on tabular network flow data, and SHAP TreeExplainer produces instance-level feature attributions mapping statistical predictions to observable network behaviour for forensic reporting. Train-on-Synthetic, Test-on-Real (TSTR) evaluation on CICIDS2017 achieves F1-macro = 0.96, within cross-validation variance of the real-data baseline (0.97). Kolmogorov-Smirnov testing confirms synthetic privacy preservation (mean |KS| = 0.38) alongside operational utility. Cross-dataset validation on UNSW-NB15 and Kitsune identifies feature space dimensionality as the primary determinant of synthetic training effectiveness, establishing a practical deployment boundary of approximately 30 numeric flow-level features. SHAP attributions for Brute Force, Port Scan, and DoS attacks are consistent across real and synthetic instances, confirming synthetic training preserves forensically relevant attack fingerprints required for expert witness testimony.
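To make the "mean |KS|" figure concrete, here is a small sketch of a two-sample Kolmogorov-Smirnov statistic averaged over features. It is an assumption about how such a number could be computed, not the authors' pipeline: the function names `ks_statistic` and `mean_ks`, and the dictionary-of-columns layout, are hypothetical, and the code is written in pure Python rather than with any particular statistics library.

```python
import bisect


def ks_statistic(real, synthetic):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    xs, ys = sorted(real), sorted(synthetic)
    gap = 0.0
    # The supremum of |F_real - F_synth| is reached at an observed value.
    for value in xs + ys:
        f_real = bisect.bisect_right(xs, value) / len(xs)
        f_synth = bisect.bisect_right(ys, value) / len(ys)
        gap = max(gap, abs(f_real - f_synth))
    return gap


def mean_ks(real_columns, synthetic_columns):
    """Average the KS statistic over features present in both tables.

    Both arguments map a feature name to a list of numeric values
    (hypothetical layout chosen for this sketch).
    """
    shared = sorted(set(real_columns) & set(synthetic_columns))
    stats = [ks_statistic(real_columns[f], synthetic_columns[f]) for f in shared]
    return sum(stats) / len(stats)
```

A value of 0 means the two samples have identical empirical distributions for that feature, and 1 means their ranges do not overlap at all.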


cs.DB 2026-07-02

Turns approved tasks into budgeted database sessions

by Minmin Wu

SessionBound: Turning Enterprise Task Approval into Budgeted Database Sessions

AI agents generate SQL freely but stay inside pre-set budgets enforced directly by the database without LLM safety checks.


Enterprise AI agents are useful for internal analysis, audit, compliance review, and operational investigation, but they create a difficult authorization problem. A manager or data owner may approve a business task, while the agent later generates open-ended SQL below the application layer. Existing systems help identify agents, delegate authority, govern data products, or enforce database policy, but they do not directly turn an approved enterprise task into a bounded database execution context. SessionBound fills this gap. It turns approved enterprise tasks into short-lived, budgeted, and auditable database sessions for AI agents. A control plane defines task templates, accepts task applications, records approvals, assigns budgets, and issues signed task tokens. A database runtime, SessionBoundDB, binds a token to a session and enforces safe views, row scope, denied fields, operation limits, query budgets, disclosure budgets, and receipts. The database does not rely on an LLM to decide whether a query is safe. The agent may generate SQL freely, but each attempt must stay inside the approved boundary. A PostgreSQL prototype passed a 24-scenario validation suite. Microbenchmarks show p50 SessionBound execution around 1.4–1.5 ms versus raw PostgreSQL p50 around 0.052–0.074 ms on small synthetic queries: high relative overhead, but low absolute latency.
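The division of labour described above (a control plane that signs a task token, and a runtime that binds the token to a session and enforces budgets) can be sketched roughly as follows. Everything here is hypothetical and chosen for illustration: the names `issue_task_token`, `BudgetedSession`, and `authorize`, the claim keys, and the use of an HMAC over a JSON payload are assumptions, not SessionBound's actual token format or enforcement logic.

```python
import hashlib
import hmac
import json


def issue_task_token(secret, claims):
    """Control-plane side: sign the approved task claims (assumed format)."""
    payload = json.dumps(claims, sort_keys=True)
    signature = hmac.new(secret, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    return payload + "." + signature


class BudgetedSession:
    """Runtime side: verify the token once, then charge every query against it."""

    def __init__(self, secret, token):
        payload, _, signature = token.rpartition(".")
        expected = hmac.new(secret, payload.encode("utf-8"), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8")):
            raise PermissionError("task token signature mismatch")
        claims = json.loads(payload)
        self.denied_fields = set(claims["denied_fields"])
        self.queries_left = claims["query_budget"]
        self.rows_left = claims["disclosure_budget"]
        self.receipts = []

    def authorize(self, columns, row_count):
        """Reject a query that touches denied fields or exceeds a budget."""
        blocked = self.denied_fields.intersection(columns)
        if blocked:
            raise PermissionError(f"denied fields requested: {sorted(blocked)}")
        if self.queries_left < 1:
            raise PermissionError("query budget exhausted")
        if row_count > self.rows_left:
            raise PermissionError("disclosure budget exhausted")
        self.queries_left -= 1
        self.rows_left -= row_count
        # Keep a receipt of what was charged, for later audit.
        self.receipts.append({"columns": list(columns), "rows": row_count})
```

The point of the sketch is that the decision is purely mechanical: no model is asked whether a query is safe, and a query either fits the remaining budget or is refused.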


cs.CR 2026-07-02

Protocol enables cross-TEE mutual attestation without full stacks

by Daniel Andrade, João N. Silva +1 more

Know Thy Neighbor: Cross-TEE Mutual Attestation

Hema lets TA instances on same or different TEE types verify each other efficiently while maintaining security.


Cloud services are composed of multiple heterogeneous distributed components and instances that communicate with one another. This occurs both in applications and services running in traditional execution environments and in trusted applications (TAs) running in trusted execution environments (TEEs). TA instances use attestation before exchanging information to ensure all parties meet the expected security conditions. The straightforward solution to mutually attesting two TA instances that are willing to communicate is employing remote attestation mechanisms in both directions. This is typically the case when the two TA instances are running on TEEs of the same type. In order to support cross-TEE attestation, such an approach, that is, using remote attestation in both directions, would require each TEE type (e.g., SGX, TrustZone) to support the attestation software stack of all other TEE types with which it needs to interact. A dedicated cross-TEE mutual attestation solution has multiple benefits in terms of efficiency and security. This paper presents the Heterogeneous Mutual Attestation (Hema) protocol, a formally-verified protocol for the mutual attestation of TA instances running on the same TEE type or on different TEE types.


cs.CR 2026-07-02

Privacy Sandbox APIs stagnated before retirement

by Rachid Youssef Grib, Alberto Verna +3 more

The Rise and Fall of Google's Privacy Sandbox

Weekly crawls show most APIs used by few actors with steady decline, except CHIPS, leaving ad privacy challenge open.


On October 17th, 2025, Google announced the retirement of most Privacy Sandbox APIs, concluding nearly five years of experimentation with its alternative to privacy-invasive data collection on the Web. Designed to balance privacy with advertising functionality and cross-site tracking, the initiative faced repeated redesigns and limited ecosystem support. In this work, we present the first longitudinal, consent-aware measurement of the Privacy Sandbox's deployment across the Web. Using a custom call listener and weekly crawls of the top-10,000 websites, we monitor the usage of all major APIs in the months preceding their retirement. Adoption had already stagnated well before Google's announcement: most APIs were used by only a handful of actors, whose activity declined steadily throughout our study. Even the APIs that Google plans to maintain show no sign of growth. The sole exception is Cookies Having Independent Partitioned State (CHIPS). Overall, the demise of the Privacy Sandbox leaves unresolved the challenge of enabling privacy-preserving interest-based advertising.


cs.AR 2026-07-02

Redundant arithmetic removes corrections from NTT hardware

by George Alexakis, Dimitrios Schoinianakis +1 more

High-Performance NTT Accelerators for PQC leveraging Unified Redundant Arithmetic and Fine-Tuned Microarchitecture

New representation eliminates conditional steps in Montgomery operations and folds scaling into butterfly units for faster FPGA results.


Post-quantum cryptography and privacy-preserving technologies are expected to play a central role in future secure communication systems. Lattice-based PQC schemes such as ML-KEM (CRYSTALS-Kyber) and ML-DSA (CRYSTALS-Dilithium) rely heavily on large-degree polynomial arithmetic, making the Number Theoretic Transform (NTT) a key computational primitive. Although existing hardware accelerators exploit parallelism and pipelining to support both NTT and INTT, their efficiency is often limited by the overhead of modular reduction and correction steps, inverse-transform scaling operations, and suboptimal FPGA implementations. This work addresses these limitations by proposing parallel iterative NTT/INTT accelerators based on optimized unified butterfly units. We introduce a novel redundant number representation that eliminates conditional corrections for both Montgomery modulo multiplication and combined subtract-multiply operations, and integrate inverse-transform scaling into existing arithmetic hardware to avoid dedicated scaling units. Furthermore, we design hierarchical Montgomery multipliers that map efficiently onto FPGA DSP resources, reducing hardware cost while enabling high operating frequencies. FPGA-based experimental results demonstrate higher clock frequencies, reduced execution times, and competitive resource utilization, supporting efficient NTT acceleration for PQC and related privacy-preserving applications.
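As background for the "conditional corrections" mentioned above, the sketch below shows textbook Montgomery multiplication with its final conditional subtraction. This is the conventional form, not the redundant representation the paper proposes; it is included only to show the step that the abstract says is eliminated. The modulus 3329 is the ML-KEM modulus, while the choice of R = 2^16 and all function names are assumptions made for this example.

```python
Q = 3329                 # ML-KEM (CRYSTALS-Kyber) modulus
R_BITS = 16              # assumed Montgomery radix R = 2**16 (R > Q, gcd(R, Q) = 1)
R = 1 << R_BITS
Q_NEG_INV = (-pow(Q, -1, R)) % R   # -Q^{-1} mod R


def montgomery_reduce(t):
    """Return t * R^{-1} mod Q for 0 <= t < Q * R."""
    m = (t * Q_NEG_INV) & (R - 1)   # m = t * (-Q^{-1}) mod R
    u = (t + m * Q) >> R_BITS       # t + m*Q is divisible by R, and u < 2Q
    if u >= Q:                      # the conditional correction step
        u -= Q
    return u


def to_mont(a):
    """Map a residue into Montgomery form a * R mod Q."""
    return (a * R) % Q


def mont_mul(a_mont, b_mont):
    """Multiply two values in Montgomery form; the result stays in that form."""
    return montgomery_reduce(a_mont * b_mont)


def from_mont(a_mont):
    """Map a value in Montgomery form back to an ordinary residue."""
    return montgomery_reduce(a_mont)
```

In hardware, the `if u >= Q` branch is a comparison and a subtraction on the data path of every butterfly, which is why avoiding it can matter for clock frequency.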


cs.CL 2026-07-02

Model spots toxicity only when benign images combine

by Jiaxian Lv, Shiyao Cui +4 more

Safe Alone, Unsafe Together: Safeguarding Against Implicit Toxicity When Benign Images Combine

MiShield-8B outperforms commercial APIs by analyzing cross-image correlations that create harm.


Multi-image content has become an increasingly prevalent form of visual communication in social media, giving rise to a new safety issue, multi-image implicit toxicity (MIIT), where each image appears benign in isolation, but harmful semantics emerge when the images are interpreted jointly. MIIT is particularly challenging for existing commercial moderation APIs and models due to the lack of explicit risky cues in each image. This paper aims to study how to identify MIIT. We first provide a formal definition of MIIT and analyze three key challenges for its detection. To alleviate the scarcity of data in this area, we construct MIIT-dataset, an image-only multi-image safety dataset covering seven representative risk categories through an automatic generation pipeline. Finally, we train MiShield with progressively distilled reasoning supervision, enabling it to produce safety judgments accompanied by explicit analyses of the correlated entities that result in the hazards. Experiments show that MiShield-8B models outperform representative moderation services and even larger-scale models, demonstrating their effectiveness and practical value for this widely used visual format. Warning: This paper contains potentially sensitive content.


cs.AI 2026-07-02

Coupling harm and refusal directions strengthens LLM safety

by Shei Pern Chua, Fangzhao Wu

HARC: Coupling Harmfulness and Refusal Directions for Robust Safety Alignment

A subspace-targeted fine-tuning method outperforms six major safety baselines while preserving capability and avoiding over-refusal.


Understanding how aligned LLMs internally represent safety is critical for diagnosing alignment vulnerabilities, as it explains why jailbreaks succeed and informs the design of robust alignment strategies. Prior work shows that aligned LLMs encode harmfulness and refusal as separable directions in the residual stream at prompt-side token positions. We show that jailbreaks succeed at prompt encoding by suppressing either the refusal or harmfulness direction before any token is generated, with distinct attack classes occupying separable regions of the harmfulness-refusal plane. Extending the analysis to response-token positions, we find that the model recognizes harmful content while it is generating that content, even when it failed to recognize the input as harmful at the prompt side. Motivated by our findings, we introduce HARC (Harmfulness-And-Refusal Coupling), a fine-tuning method that pairs the two directions across both prompt and response positions. Since the intervention is confined to the harmfulness-refusal subspace, it leaves the rest of the residual stream intact and does not degrade general capability or inflate over-refusal. Across extensive experiments, HARC achieves the strongest robustness-capability-usability trade-off among six baselines spanning the major training-time and inference-time safety methods. The harmfulness and refusal directions at prompt and response positions transfer across the five model families and two scales we tested without architecture-specific tuning.


cs.CR 2026-07-02

Lightweight IIoT detectors fail cross-network tests

by MD Azizul Hakim, Md Shihab Uddin +1 more

Cross-Domain Generalization Failure in Lightweight Intrusion Detection Models for IIoT Networks

They depend on port categories that occur 96-435 times more often in training attacks than in new domains.


Lightweight machine learning models are increasingly proposed for intrusion detection in Industrial Internet of Things (IIoT) networks due to their suitability for resource-constrained edge deployment. Most reported results evaluate these models only within their training network, leaving behavior on unseen networks unverified. This study trains four lightweight architectures on one IIoT dataset and evaluates them, without retraining, on two structurally distinct IIoT datasets using a feature representation restricted to attributes available across all three sources. Explainability analysis across two top-performing models shows both rely overwhelmingly on coarse port-category features; the most influential category occurs in source-domain attack traffic at 96 to 435 times the rate in the two target domains, indicating that coarsening port resolution relocates rather than removes a documented shortcut. Evaluation under naturally imbalanced class distributions reveals a further effect: the evaluation protocol used can reverse which target network appears to pose the greater generalization challenge. Adversarial robustness and recovery through limited target-domain exposure are also assessed; robustness to adversarial perturbation is unrelated to cross-network generalization, and recovery through adaptation varies considerably by architecture. These findings suggest deployment readiness should be assessed using cross-network evaluation under realistic class distributions, rather than within-domain accuracy alone.


cs.CR 2026-07-02

Four gates in oversight model cut LLM jailbreaks to 2 percent

by Michele Guida, Ruslan Shikhhamzayev +5 more

Cognitive Firewall: A Proactive, Zero-Trust, Multi-Gate Framework for LLM Safety

The approach handles multi-turn and human-crafted attacks while limiting over-refusals to 8 percent on safe queries.


Large language models (LLMs) can be induced to produce harmful content through multi-turn strategies in which no single user message appears clearly unsafe. Existing runtime safeguards commonly evaluate prompts or responses as isolated messages, which limits their ability to recover accumulated intent, verify asserted authority, or detect harmful objectives decomposed across a dialogue. This paper presents the Cognitive Firewall, a proactive runtime oversight framework that interposes an independent oversight model between a user and a protected target model. The framework decomposes safety assessment into four categorical gates: an intent gate that identifies the operational objective of a request, a zero-trust context gate that treats claimed roles and permissions as unverified evidence, a consistency gate that detects escalation and decomposition across turns, and an output risk gate that inspects candidate responses before release. Gate decisions are combined through escalation rather than score averaging, allowing any confident danger signal to block an interaction while preserving an auditable rationale. Experiments on four jailbreak benchmarks and a benign safety test set show that the Cognitive Firewall substantially reduces attack success across single-turn, multi-turn, authority-based, and human-crafted attacks. It lowers attack success to 2 percent or below on three attack sets and to 14 percent on the most difficult human-crafted set, while maintaining an 8 percent over-refusal rate. These results indicate that decomposed, conversation-level oversight can improve proactive containment and auditability for LLM safety.
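The difference between combining gate decisions by escalation and by score averaging can be illustrated with a short sketch. The verdict levels, the `GateResult` structure, and both function names are hypothetical choices for this example; the abstract does not specify how the framework represents gate outputs.

```python
from dataclasses import dataclass
from enum import IntEnum


class Verdict(IntEnum):
    """Assumed categorical levels, ordered by severity."""
    ALLOW = 0
    CAUTION = 1
    BLOCK = 2


@dataclass(frozen=True)
class GateResult:
    gate: str
    verdict: Verdict
    rationale: str


def combine_by_escalation(results):
    """The most severe gate verdict wins; every rationale is kept for audit."""
    worst = max(results, key=lambda r: r.verdict)
    audit_trail = [f"{r.gate}: {r.verdict.name} ({r.rationale})" for r in results]
    return worst.verdict, audit_trail


def combine_by_average(results):
    """Contrast case: averaging lets several ALLOWs dilute a single BLOCK."""
    return sum(int(r.verdict) for r in results) / len(results)
```

With three gates returning `ALLOW` and one returning `BLOCK`, the average is 0.5, which sits below the `CAUTION` level, while escalation returns `BLOCK` together with the per-gate rationale.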


cs.CR 2026-07-02

Simulated moderation jailbreaks function-calling LLMs

by Junlong Liu, Haobo Wang +2 more

Beyond the Prompt: Jailbreaking Function-Calling LLMs via Simulated Moderation Traces

Fabricated audit traces weaken safety constraints in commercial models using few queries.


Jailbreak attacks remain a critical threat to the safe deployment of large language models (LLMs). While prior work has primarily studied attacks and defenses at the prompt level, we show that this prompt-centric paradigm overlooks a structural vulnerability in stateful, function-calling environments. In such applications, developer-defined schemas, structured arguments, and untrusted tool outputs are interleaved into a single shared model context. This architecture expands the attack surface by blurring the boundary between trusted control logic and untrusted data, allowing adversarial intent to be distributed across a multi-turn execution path. We exploit this architectural flaw through SMT, a black-box attack framework based on Simulated Moderation Traces. Departing from purely prompt-based interactions, SMT constructs a multi-turn trajectory that simulates a legitimate moderation-auditing workflow. Within this trajectory, a fabricated moderation frame leverages red-team testing as a pretext to elicit harmful generations. The subsequent validation feedback treats safety refusals as execution failures, prompting refinements that gradually weaken the model's safety constraints and ultimately trigger harmful outputs. Extensive empirical evaluations on prominent commercial LLMs from five different providers across two standardized safety benchmarks show that SMT consistently achieves the highest average attack success rate and HarmScore while requiring a near-minimal number of queries, substantially outperforming existing baselines. These findings demonstrate that prompt-level sanitization alone is fundamentally insufficient for defending tool-enabled LLM systems and highlight the urgent need for context-aware validation across schemas, arguments, tool outputs, and accumulated conversation state. The code is available at https://github.com/liujlong27/SMT.


cs.CR 2026-07-02

Multi-agent LLM system tracks APTs at 0.92 recall

by Jiahui Wang, Zhenyuan Li +3 more

Minos: A Multi-Agent Collaborative Framework for Provenance-Based Backward Tracking

It replaces exhaustive provenance traversal with hypothesis-guided agents, yielding 49 percent more compact attack subgraphs than prior meth


Sophisticated cyber attacks, particularly Advanced Persistent Threats (APTs), require effective post-intrusion forensic analysis. Provenance-based backward tracking reconstructs attack scenarios by tracing causality from security alerts, but existing methods rely on low-level statistical features and rigid traversal strategies, limiting their ability to capture high-level adversarial intent and suffering from dependency explosion. We present Minos, a multi-agent framework that formulates backward tracking as an LLM-driven reasoning process. Minos adopts a two-tiered architecture: for event-level analysis, it combines hierarchical context management, retrieval-augmented reasoning with citation verification, and adversarial deliberation to improve reasoning quality; for graph exploration, it coordinates four specialized agents under a finite state machine (FSM), replacing exhaustive traversal with hypothesis-guided reasoning and count-first query protocols to efficiently prune the search space. Experiments on 14 attack scenarios across five public datasets show that Minos achieves an average recall of 0.92 and precision of 0.64, significantly outperforming state-of-the-art baselines while producing attack subgraphs that are 49% more compact. Moreover, Minos generates interpretable reasoning throughout the tracking process, facilitating forensic auditing and system refinement. These results demonstrate the effectiveness of LLM-driven reasoning for automated provenance-based backward tracking.


cs.CR 2026-07-02

Three poisoned documents hijack agentic RAG reasoning

by Chanwoo Choi, Euntae Kim +7 more

KidnapRAG: A Black-Box Attack for Hijacking Reasoning in Agentic Retrieval-Augmented Generation Systems

A black-box attack uses bait, link, and malicious evidence documents to redirect iterative retrieval without any model access.


Retrieval-Augmented Generation (RAG) systems are vulnerable to poisoning attacks that inject malicious documents into the retrieval process to manipulate model outputs. Recent Agentic RAG systems are more robust to such attacks because they iteratively perform retrieval and reasoning, allowing them to ignore weakly relevant poisoned documents and preserve the reasoning chain induced by the user query. However, existing attacks on Agentic RAG systems often assume white-box access to system prompts, reasoning traces, retrievers, or model parameters, limiting their applicability in realistic settings. In this paper, we study black-box poisoning attacks against Agentic RAG systems, where the attacker can only publish externally retrievable poisoned documents. We propose KidnapRAG, a sequential poisoning attack that hijacks the agent's multi-step reasoning chain using three role-specific documents: Bait, Chain-Link, and Mal-Ins, which attract initial retrieval, induce query reformulation, and provide attacker-controlled evidence, respectively. Experiments across multiple Agentic RAG frameworks, LLM backbones, and benchmarks show that KidnapRAG consistently outperforms existing poisoning baselines under black-box conditions. Further analyses show that KidnapRAG progressively weakens the original retrieval intent, redirects retrieval behavior, and increases reliance on attacker-controlled evidence. Our code is publicly available at https://github.com/chanwoochoi316/KidnapRAG.


cs.HC 2026-07-02

LLM use reaches over 80% on some survey platforms

by Zane Xu, Nathan Malkin

A Penny for Your Prompts: Experiments Detecting and Mitigating LLM Usage by Survey Respondents

Response patterns and keystroke logs allow detection, yet efforts to block AI do not always improve answer quality.


Large language models are increasingly used by participants on crowdsourcing platforms when responding to surveys, potentially undermining the validity of collected data. Our study aims to quantify the prevalence of this behavior and investigate methods to detect and prevent it. In a series of surveys (N = 250), we examined conditions such as platform choice, survey length, requests not to use AI, and disabling copy-paste functionality. We were able to identify distinct characteristics of LLM-assisted responses and found that their frequency varied widely, from under 10% on Prolific to over 80% on Mechanical Turk. Mitigation measures reduced LLM usage but did not necessarily improve data quality. No participants employed browser-use agents at the time of our survey, but we report on our own detection experiments. We recommend that researchers actively screen survey responses for LLM usage by recording and analyzing keystroke data and crafting instructions and questions aimed at AI.


cs.CR 2026-07-02

First survey maps attacks and defenses for mobile on-device AI

by Yujin Huang, Xin Zheng +2 more

SoK: Attack and Defense Landscape of Mobile On-device AI Systems

It organizes security pillars, local-model risks, and protections into one framework that future work can build on.


Mobile on-device AI (MoAI) systems that integrate locally deployed AI models with conventional mobile software components are emerging as a key paradigm for delivering intelligent functionality directly on end-user devices. By moving inference from remote cloud services to the local mobile environment, such systems enable privacy-preserving, low-latency, and offline-capable AI functionality, yet introduce new security risks arising from the local storage of AI models. This paper presents the first comprehensive systematization of knowledge on MoAI security, covering security pillars, attack landscape, and defense landscape of MoAI systems. We further identify unresolved gaps in current attack and defense research and point to promising directions for future research in this emerging area. Our work establishes the first systematic framework for understanding the attack and defense landscapes of MoAI systems, serving as a foundation for building secure MoAI systems and advancing research in this critical domain. Companion resources are available at https://github.com/Jinxhy/Awesome-MoAI-Security.

cs.CR 2026-07-02

ReShift redirects reasoning chains inside vision-language models

by Zhihao Dou, Qinjian Zhao +2 more

ReShift: Aha-Moment-Driven Reasoning-Level Backdoor Attacks on Vision-Language Models

The attack uses poisoned data construction and joint optimization to shift internal CoT trajectories on a trigger without harming clean accu

Vision-Language Models (VLMs) are increasingly deployed in safety-critical applications, yet remain vulnerable to backdoor attacks. Existing methods primarily manipulate final outputs, often producing reasoning traces that are inconsistent or easily detectable. In this paper, we propose ReShift, a novel aha-moment-driven reasoning-level backdoor framework that explicitly redirects the internal chain-of-thought (CoT) trajectory while preserving surface-level coherence. ReShift introduces a Poisoned Reasoning-Aware Data Construction (PRDC) pipeline and a Supervised-Reinforcement Joint Optimization (SRJO) strategy to induce stable trigger-conditioned reasoning shifts. We further formalize Entropy Rebound as a principled signal for characterizing reasoning redirection and provide theoretical guarantees linking entropy gaps to trajectory-level divergence. Extensive experiments demonstrate that ReShift achieves high attack success rates while maintaining clean-task performance and realistic reasoning traces, substantially improving stealthiness against existing defenses.

cs.CR 2026-07-02

Queries reveal which embedding model runs behind an IR API

by Cedric Fitiavana Raelijohn, Sébastien Gambs +1 more

Embedding Inference Attack

Black-box attacks identify the model from unordered document sets even when a reranker is present.

Embedding models are essential components of modern Information Retrieval (IR) systems, yet they are typically hidden behind APIs. Recent work has shown that dense IR systems can lead to security vulnerabilities such as embedding inversion attacks. However, such attacks usually require the attacker to know which embedding model is in use. In this paper, we study IR systems under a black-box setting in which the adversary observes only the unordered set of retrieved documents, without ranking or similarity scores. We demonstrate that in such contexts, tailored queries allow an adversary to identify which embedding model is in use from a set of known candidate models, which we term an embedding inference attack (EIA). We also show that certain queries remain discriminative even when the system includes a reranker as a potential defense mechanism. We further validate our method on a real Retrieval-Augmented Generation (RAG) system, in which the tailored queries bypass the LLM's tendency to reject inputs it does not recognize as well-formed questions. Finally, we propose and evaluate other mitigation strategies such as similarity thresholds.

cs.CR 2026-07-02

Malicious apps hijack AI phone agents without permissions

by Zidong Zhang, Zhentao Xie +2 more

(A)I Sees What You Don't: Exploiting New Attack Surfaces in Third-Party Mobile Agents

Gaps between human and machine vision let attackers seize control of screenshot-driven automation while the screen looks unchanged.

Third-party mobile agents powered by Vision-Language Models (VLMs) have emerged as a promising paradigm for automating smartphone interactions. These agents act as high-privilege decision-makers, perceiving device states through screenshots and executing actions via VLM reasoning, transforming how an agent app interacts with the environment (i.e., other apps or the OS). Correspondingly, this transformation introduces new attack surfaces or transforms benign/harmless interfaces into exploitable ones for mobile devices. In this paper, we summarize key differences between third-party mobile agent apps and general apps when interacting with the environment, analyze the security posture of agents, and identify two unique attack surfaces compared to general mobile apps: the Screen Perception Attack Surface, which exploits the gap between human and machine vision, and the Misused Channel Attack Surface, which intercepts or manipulates the agent's execution pipeline. We design and implement seven concrete attacks, from subliminal text injection and invisible pixel zone exploitation to screenshot tampering and host PC command injection. Our evaluation of five popular mobile agent frameworks demonstrates that a malicious app can hijack agent actions and achieve arbitrary command execution even without any privilege permissions, while remaining visually indistinguishable to users. These findings reveal a fundamental trust mismatch in autonomous agent design and highlight the urgent need for perception-aware security models on multi-tenant platforms.

cs.CR 2026-07-01

Smartphones reconstruct 3D printer G-code with 98.89% accuracy

by Amirhossein Jamarani, Diba Afroze +3 more

A Non-Line-of-Sight, Multi-Modality-based Side-Channel IP Theft Attack on Additive Manufacturing Using Dual Smartphones

Dual phones capture acoustic and magnetic signals from 60 cm away in non-line-of-sight to recover printing commands.

Additive Manufacturing (AM) has revolutionized major sectors, including aerospace, automotive, and healthcare, by enabling adjustable production. As the usage of AM increases, so does the risk of Intellectual Property (IP) leakage during the printing process due to unintended side-channel emissions. Current studies and attack scenarios on 3D printers face three challenges: low success and accuracy rates in final G-code reconstruction, limited distance range for attacking the 3D printer's IP, and reliance on specialized, overt data-collection tools. This paper presents a side-channel attack that addresses the noted limitations by using two smartphones' internal sensors. We position the smartphones 60 cm away in a non-line-of-sight setup to collect the 3D printer's acoustic and magnetic emissions. Our attack successfully reconstructs the G-code commands of the final objects at a rate of 98.89% on command-level reconstruction accuracy. Additionally, we evaluate the transferability of our attack strategy by applying it to another 3D printer in a different environment. Our proven unauthorized access to the reconstructed G-code and thus to the IP of the AM system indicates the security weaknesses in 3D printing, highlighting the need for mitigating side-channel attacks.

cs.NI 2026-07-01

Relay extracts semantic meaning from latent codes without source data

by Yalin E. Sagduyu, Tugba Erpek +2 more

Semantic Leakage and Privacy Preservation in Relay-Assisted Semantic Communications

This exposes a privacy flaw in semantic comms; adversarial training widens the accuracy gap at the relay while preserving receiver performan

Semantic communication (SemCom) has emerged as a promising paradigm in which the transmission of task-relevant information is prioritized over raw data, enabling efficient and robust communication under resource and channel constraints. In this paper, the privacy implications of relay-assisted SemCom systems are studied, where the intermediate relay node operates directly on learned latent representations. It is shown that the relay, even without access to source data, can reliably infer semantic meaning and reconstruct signals with performance comparable to that of the legitimate receiver, revealing a fundamental privacy vulnerability of semantic representations. To address this issue, an iterative adversarial training framework is proposed in which a strong, adaptively trained eavesdropper at the relay is explicitly accounted for. The proposed approach alternates between optimizing the relay's eavesdropping function and the legitimate system, resulting in representations that preserve semantic decoding performance at the intended receiver while degrading semantic inference at the relay. The semantic accuracy gap between the legitimate receiver and the eavesdropper is significantly enlarged across channel conditions. Importantly, this protection is achieved in a stealthy manner, with high reconstruction fidelity maintained while semantic leakage is selectively suppressed.

cs.CR 2026-07-01

Robocalls hit the US far harder than other countries

by Kemal Altwlkany, Andro Merćep +3 more

Robocalls: A Worldwide or US-only Problem? Analyzing Spam and Fraud in International Phone Calls

Data from 65 nations shows the problem is global yet far more severe inside the United States

Unsolicited automated phone calls (robocalls) are a serious threat: in the US alone, these calls resulted in reported losses of $1.1 billion during 2025. Phishing and spoofing consistently rank among the most reported crimes within the FBI's Internet Crime Complaint Center, with phone call scams having the highest reported median loss. Combating robocalls is difficult due to many legal and practical constraints: robocalls often span multiple legal jurisdictions of different countries/states, the large volume of robocalls, their multilingual nature, the lack of publicly available data, privacy concerns with obtaining data, etc. We present a study of international robocalls, aggregating robocall reports from countries across all inhabited continents and contribute by providing new findings on international robocalls from 65 different countries. We also present the first publicly available multimodal and international robocall dataset: 8.7 million call detail records, 839 robocall transcripts from 28 identified robocall campaign clusters, and 677 robocall recordings. We describe our methodology for collecting robocall data over a 9-month period and provide a detailed analysis comparing robocalls in the US with those in other countries. Our analysis covers several aspects, including uncovering calling patterns, identifying co-targeting attacks, discovering common robocall campaigns, extracting callback numbers, analyzing linguistic differences among robocalls in the same language but different regions, and other insights. Our results indicate that although robocalls are an international problem, the severity of the threat is significantly higher in the US than in other countries. We provide steps for future research and suggest remedies to reduce the effectiveness of robocalls based on our analysis.

cs.CR 2026-07-01

FPGA parallelism leaks ML-KEM keys despite higher-order masking

by Davis Ranney, Yashaswini I Makaram +2 more

Exploring Side-Channel Protections in Hardware Implementations of PQC ML-KEM Verification

Experiments recover full secret keys from masked verification on FPGAs via first-order leakage created by parallel processing.

As ML-KEM is adopted as a post-quantum cryptographic standard, resilience against physical side-channel attacks has become essential. Among the constituent steps, the decapsulation Fujisaki-Okamoto (FO) verification is particularly vulnerable to side-channel power and electromagnetic (EM) analysis. In this work, we focus on common FPGA-based implementations and examine their side-channel vulnerabilities, and compare them with those of microcontroller implementations. Three verification implementations, unprotected, hash-based (first-order), and higher-order masked, are evaluated for side-channel security on both a microcontroller and an FPGA. While FPGAs offer higher speed and parallelism, they often exhibit stronger side-channel leakage, especially in high bandwidth configurations. The higher-order masked designs still leak information about the underlying data due to hardware-level effects and data-dependent processing. Our experiments show that their parallelized processing on FPGAs introduces sufficient first-order leakage for full secret-key recovery. These results underscore the persistent challenge of securing PQC algorithms in performance-constrained and parallelized hardware environments.

cs.CR 2026-07-01

LLM attacks organized across eight lifecycle stages

by Seyed Bagher Hashemi Natanzi, Bo Tang

A Lifecycle and Application-Stack Survey of Large Language Model Vulnerabilities: Attacks, Risks, Defenses, and Open Problems

Mapping vulnerabilities from data collection to deployment shows why isolated defenses fail to protect systems that use tools and memory.

Large language models are no longer only text generators. They are increasingly embedded in retrieval pipelines, enterprise assistants, coding environments, robotic systems, security-operation workflows, and autonomous agents that can read private data, call tools, write files, execute code, and act across organizational boundaries. This shift changes the security problem: risks do not arise from the model weights alone, but from the full lifecycle and application stack through which data, prompts, model outputs, tools, memories, and user authority interact. This paper systematizes the literature on vulnerabilities in large language model systems through a lifecycle and application-stack lens. We organize attacks across eight stages: data collection, pretraining, post-training alignment, model packaging and supply chain, retrieval and memory, prompting and inference, tool/agent execution, and deployment/maintenance. For each stage, we analyze attacker capabilities, affected security objectives, representative attacks, practical risks, evaluation practices, and defenses. We further map LLM-specific vulnerabilities to confidentiality, integrity, availability, safety, privacy, fairness, accountability, and agency-control objectives. Unlike taxonomies that list isolated attack names, the proposed systematization emphasizes where trust boundaries fail, how untrusted data becomes executable instruction, how delegated authority amplifies model errors, and why point defenses rarely compose. We close with a research agenda for secure LLM systems, including compositional security, provenance-aware retrieval, tool-call containment, long-horizon agent evaluation, privacy-preserving adaptation, realistic red teaming, and deployment-grade incident response.

cs.CR 2026-07-01

TDA-LSTM hybrid reaches AUC 1.000 on CIC-IDS2017

by Amar Jeet, Bhaskar Ranjan Karn +1 more

Hybrid Topological Data Analysis and LSTM Networks for Enhanced Network Intrusion Detection Using CIC-IDS2017 Dataset

Model combines persistence diagrams with LSTM layers to separate normal traffic from 14 attack types in 2.8 million flows.

Network intrusion detection systems (NIDS) are crucial in cybersecurity infrastructure, needing advanced techniques to detect hostile activity in network traffic. This research introduces a hybrid approach that combines Topological Data Analysis (TDA) with Long Short-Term Memory (LSTM) networks to improve anomaly detection in network security. Our multi-layered design combines TDA's persistent homology with LSTM networks to capture topological characteristics of network traffic patterns and model temporal sequences. We assessed our methodology using the CIC-IDS2017 dataset, which includes over 2.8 million labelled flows, 77 network variables, and 14 attack categories that reflect modern threat landscapes such as DDoS, brute force, web attacks, penetration, and botnet activities. Integrating Betti curves and persistence diagrams with deep learning architectures enhances feature extraction performance. Our hybrid TDA+LSTM model has an AUC of 1.000 and F1-score of 1.000, with 5-fold cross-validation producing a mean AUC of 1.000 $\pm$ 0.000 and mean F1 of 0.999 $\pm$ 0.001. An ablation study demonstrates the complementary contributions of topological (F1=0.990) and temporal characteristics (F1=1.000). A comparative evaluation shows that the proposed approach outperforms TDA+Random Forest (F1=0.994) and Isolation Forest (F1=0.835) baselines in several attack categories.

cs.CL 2026-07-01

Dual embeddings keep LLM watermarks detectable after paraphrasing

by Jonas Schäfer, Cezary Pilaszewicz +1 more

Robust Text Watermarking for Large Language Models via Dual Semantic Embeddings

Token and context vectors produce a signal that survives rephrasing and translation while preserving output quality.

This work presents Dual-Embedding Watermarking (DEW), a semantic watermarking scheme for large language models (LLMs) that leverages contextual and token-level embeddings to enhance robustness against paraphrasing and translation. DEW utilizes a signal-processing methodology, applying algebraic vector-space operations to token and context embeddings to derive a watermark signal that degrades gracefully under semantic shifts. The method obfuscates the watermark by projecting embedding vectors through pseudo-random matrices seeded with a secret key. Relevant distributions derived from the underlying algebra are evaluated and employed for statistical testing and benchmarking of DEW. Experimental results across multiple LLMs indicate that DEW improves post-paraphrase detection while maintaining competitive text quality, and remains detectable after translation, even when prior semantic watermarks degrade significantly. These findings position DEW as a practical and robust solution for safeguarding LLM-generated text and addressing critical issues in responsible AI deployment.

cs.CR 2026-07-01

Restricted errors and code equivalence yield post-quantum signatures

by Sarah Arpin, Jason T. LeGrow +2 more

Digital signature schemes based on code equivalence and syndrome decoding from restricted errors

Review explains how sigma protocols for two coding problems become non-interactive signatures via Fiat-Shamir.

Digital signature schemes are an important cryptographic tool to ensure data authenticity and integrity in many applications that must be resilient to attacks, including those facilitated by quantum computers. We consider the two digital signature schemes based on error-correcting codes that are second-round candidates in NIST's call for Additional Signature Schemes, which is part of the Post-Quantum Cryptography Standardization Process. Specifically, we provide an overview of the Codes and Restricted Objects Signature Scheme (CROSS) and the Linear Equivalence Signature Scheme (LESS). We describe their underlying problems of syndrome decoding from restricted errors and code equivalence. We review sigma protocols and how they can be transformed into digital signature schemes via the Fiat-Shamir transform. Finally, we explain how this procedure yields code-based digital signatures believed to be post-quantum secure.
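
As a rough illustration of the Fiat-Shamir step mentioned in this abstract, the sketch below turns a sigma protocol into a signature by replacing the verifier's random challenge with a hash of the commitment and the message. It deliberately uses the textbook Schnorr identification protocol over a tiny prime-order group rather than CROSS or LESS, whose sigma protocols are not described here. The parameters `P`, `Q`, `G` and all function names are illustrative assumptions, and the group is far too small to be secure.

```python
import hashlib
import secrets

# Toy parameters (assumption, insecure): P = 2*Q + 1 with P and Q prime,
# and G = 4 generates the subgroup of order Q in Z_P^*.
P = 2039
Q = 1019
G = 4


def challenge(commitment, message):
    """Fiat-Shamir: derive the challenge by hashing commitment and message."""
    data = f"{commitment}|{message}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q


def keygen():
    """Return (secret key x, public key y = G^x mod P)."""
    x = secrets.randbelow(Q - 1) + 1
    return x, pow(G, x, P)


def sign(x, message):
    """Prover's commit and response steps, with the hashed challenge in between."""
    r = secrets.randbelow(Q - 1) + 1
    commitment = pow(G, r, P)
    c = challenge(commitment, message)
    response = (r + c * x) % Q
    return commitment, response


def verify(y, message, signature):
    """Recompute the challenge and check G^response == commitment * y^c (mod P)."""
    commitment, response = signature
    c = challenge(commitment, message)
    return pow(G, response, P) == (commitment * pow(y, c, P)) % P


secret_key, public_key = keygen()
signature = sign(secret_key, "hello")
print(verify(public_key, "hello", signature))
```

The same commit, hash, respond pattern is what the transform applies to any sigma protocol; only the underlying hard problem and the shape of the commitment and response change.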

cs.CR 2026-07-01

Random Forest reaches 0.99 F1-score on IoT attack data

by Rana Alharbi, Chuadhry Mujeeb Ahmed

Comparative Analysis of Machine Learning based Intrusion Detection in Realistic IoT Networks

Comparative tests on emulated 78-device network with MQTT, CoAP and RTSP show it leads five ML models for intrusion detection.

The Internet of Things (IoT) is rapidly growing and expanding into various sectors, such as healthcare, transportation, smart homes, and more. Despite the benefits of using IoT devices, they present several challenges. Given the significant role these devices play in our lives, it is crucial to address issues related to their security and privacy. These devices are limited in resources, which complicates their security and the protection of the data that they manage. The paper aims to examine intrusion detection systems using the Gotham2025 dataset, generated through the Gotham testbed, which consists of 78 emulated IoT devices utilising various protocols, including MQTT, CoAP, and RTSP, to assist in safeguarding IoT networks from attacks. We conduct a comparative analysis of five machine learning algorithms: Random Forest, XGBoost, Logistic Regression, Naive Bayes, and Deep Neural Network. We demonstrate that the Random Forest Classifier was the top-performing model, achieving an F1-score of 0.99 in classifying attacks.

cs.CR 2026-07-01

Graph maps CVEs to ATT&CK tactics and techniques

by Basant Agarwal, Dincy R. Arikkat +4 more

CVE-TTP KG: Knowledge Graph Linking Software Vulnerabilities to Attack Behaviors

Automated extraction turns vulnerability records into structured attack-behavior links for faster defense decisions.

In the evolving threat landscape, adversaries exploit software vulnerabilities to launch sophisticated attacks, challenging traditional defenses. Although databases like CVE and NVD provide detailed technical information, they often lack links to attacker behaviors such as tactics and techniques, limiting effective threat interpretation and response. This work bridges this gap by connecting vulnerabilities with behavioral patterns from the MITRE ATT&CK framework. We construct a CVE-TTP Knowledge Graph that links CVEs to tactics and techniques using classification and relation extraction. Transformer-based models are developed for behavior identification, with CySecBERT achieving macro F1-scores of 87.71% (techniques) and 96.16% (tactics). Also, we created an annotated dataset with 24,820 entities and 43,608 relations for entity and relation extraction. The pipeline-based approach achieves macro F1-scores of 0.86 (entity extraction) and 0.99 (relation extraction), while a span-based joint model achieves 0.78. These outputs are integrated into a Neo4j-based Cyber Threat Knowledge Graph, enabling structured visualization of vulnerabilities.

cs.CR 2026-07-01

Block.co accepts forged certificates as valid

by Giacomo Zonneveld, Giulia Rafaiani +1 more

A forgery attack on the Block.co blockchain-based digital credential certification system

Vulnerability shows decentralized blockchain credential systems cannot ensure issuer authenticity without central authority.

Certification of digital documents, such as academic credentials, seems a particularly suitable application for the use of blockchain and distributed ledger technologies. Indeed, these technologies enable decentralized certification systems that rely on the immutability and persistence of their distributed ledgers. However, in the absence of a central trusted authority, it is not easy to guarantee the authenticity of the connection between the real identity of an academic institution and the digital identity of the certificate issuer. In this paper, we demonstrate that one such system, known as Block.co, has a vulnerability that allows the production of forged certificates that are recognized as valid by the system. Since this is an inherent limitation of the approach used for blockchain-based certification, our attack is likely to be extendable to other systems adopting the same approach.

cs.CV 2026-07-01

Prompt-free diffusion hides secret images without text leaks

by Jingwen Cai, Fen Xiao +2 more

No Prompt, No Leaks: A Robust Generative Steganography Framework via Prompt-Free Diffusion

Style priors plus a predictor-corrector replace prompts for controllable, accurate stego generation.

Generative image steganography synthesizes stego images directly from secret information to achieve inherent security advantages. Latent Diffusion Models (LDMs) have recently emerged as a fundamental image steganography framework that modulates secret latent representations with text prompts. Limited by the inflexibility of text prompts, these methods still struggle to generate high-quality stego images and accurately recover secret images. In this work, we propose a prompt-free diffusion image steganography framework that integrates style semantic priors to control more robust and reliable stego image generation. Specifically, a Cascaded Affine Coupling Module (CACM) establishes a bijective, deterministic mapping between a secret image and its latent representation. Then, style semantics are integrated into the diffusion process to control latent representation and ensure visual imperceptibility in the generated stego images. To mitigate trajectory deviations stemming from the unconditioned reverse process, a predictor-corrector mechanism is introduced to iteratively refine the generation trajectory via feedback from the current and predicted next states. Extensive experimental results show that the proposed method achieves competitive performance compared to state-of-the-art methods in terms of security, secret image reconstruction accuracy and controllability.

cs.CR 2026-07-01

Workflow combines CPU and GPU TEEs for private AI

by Robert Schambach (TU Dresden), Quoc Do Le (STACKIT Cloud) +3 more

EnclaveX: End-to-End Confidential AI with CPU/GPU TEEs

End-to-end system protects models and data from cloud operators and Kubernetes admins while measuring overhead on TDX with H200.

Large Language Models (LLMs) have rapidly proliferated, driving widespread adoption of AI applications. Most deployments rely on centralized infrastructures such as Microsoft Azure, Google Cloud, or AWS, requiring users to share sensitive data and training or fine-tuning code. This dependence raises significant security and privacy concerns, as cloud providers must be trusted to ensure confidentiality and integrity. Trusted Execution Environments (TEEs), e.g., Intel SGX/TDX, AMD SEV-SNP, and ARM CCA, have been introduced to mitigate these risks. More recently, NVIDIA has developed GPU TEEs (e.g., H100/H200), yet comprehensive evaluations of end-to-end workflows that integrate CPU and GPU TEEs remain limited. Critical aspects, including performance overhead, remote attestation, and security guarantees for AI/LLM applications, have not been sufficiently studied. This paper addresses this gap by presenting an end-to-end workflow that combines CPU and GPU TEEs. We propose mechanisms to ensure confidentiality and integrity at both the VM level (via Intel TDX and AMD SEV-SNP) and the application level, highlighting vulnerabilities such as Kubernetes administrators' ability to access confidential VM contents. Finally, we evaluate the performance overhead of our system using industry benchmarks, focusing on configurations that integrate Intel TDX with NVIDIA H200 GPUs.

cs.CR 2026-07-01

Low KC strings can have high witness complexity

by Fabio F.G. Buono

Witness Complexity of Short Descriptions: A Cryptographic Perspective

gam(x) measures runtime of near-shortest descriptions and can exceed polynomial bounds even when KC is low, with a conditional separation fr

In cryptographic practice, where protocols impose strict time bounds, implementations demand predictable resource usage, and real-world systems require immediate verification for security and usability, a short key or certificate is useful only if it can be expanded or verified within a bounded time; otherwise a compact representation that requires superpolynomial work to expand offers no operational guarantee within a bounded-time protocol. This paper formalises that gap by introducing \emph{witness complexity} $\gam(x)$, the minimum running time over near-shortest descriptions of a string on a universal Turing machine. $\gam$ differs from Shannon entropy and Kolmogorov complexity $\KC$: low $\KC$ can coexist with high $\gam$. We prove invariance up to polynomial factors; a conditional separation (assuming $\PneqNP$); an unconditional lower bound from incomputability of $\KC$; a biconditional characterisation of $\PeqNP$ via the class-relative variant $\gP$; and polynomial-time tractability for structured $\classNP$ families. Part II develops companion measures and shows an unconditional gap between grammar size and derivation cost, positioning $\gam$ as a metric for the usability of keys and certificates.

cs.CR 2026-07-01

Vector figures encode data for automated certified recovery

by Bowen Sun, Chaowei Xiao

Automated High-Precision Extraction and Forensic Verification of Data-Bearing Vector Figures

Recovery is injective outside a tiny near-zero interval and a re-rendering certificate binds values to drawn markers and lines

The quantitative record of science and engineering is increasingly carried by figures rather than text or tables, and a reader who needs the underlying numbers must usually re-digitize them by hand: slowly, imprecisely, and with no way to prove the result is faithful. Yet when a figure is stored as vector graphics, its data are not approximated by the picture but encoded in it: the renderer writes each marker and vertex at a printed precision that, for the dominant scientific toolchain, exceeds the data's own. We turn this into three contributions, one per shortcoming of hand digitization. First, a precision theory bounding how accurately data can be recovered for a given renderer and export format: bit-exact float32 for matplotlib markers, and a calibration-limited three to four significant figures end to end. Second, an automatic extractor that decodes a figure in one pass with no human in the loop, in place of the slow point-by-point tracing a digitizer demands. Third, a verification theory: recovery is injective except on a characterized, vanishingly small interval near zero; accidental agreement between unrelated data is astronomically unlikely; and a re-rendering certificate binds the recovered values to the markers, lines, and ticks the figure draws, not its text, making a result non-repudiable. With no ground truth used during recovery, decoded figures match external archives (Planck 2018 to 10^-9; the Keeling CO2 record to 5*10^-4), and one decoded figure independently reproduces a correction to the Chinchilla scaling-law confidence interval. We map the achievable precision across common renderers and their PDF, SVG, and EPS formats. What we deliver is certified data; the scientific significance of any particular dataset lies outside this paper's scope, and recovered values are candidates for human review, never accusations.

cs.CR 2026-07-01

Orthogonalization detects LLM backdoors without blacklists

by Zhengxing Li, David J. Miller +2 more

CSO-LLM: Class Subspace Orthogonalization for Post-Training Backdoor Detection and Trigger Inversion in LLMs

Class subspace method in embedding space boosts detector accuracy and recovers ground-truth triggers across domains and architectures.

While post-training backdoor detection and trigger inversion schemes have been developed for AIs used e.g. for images, there is a paucity of such methods for LLMs. First, the LLM input space is discrete, with up to 150,000^k k-tuples to consider with k the token-length of a putative trigger. Second, one must blacklist tokens typical of the putative target response (class) of an attack, as such tokens may give false detection signals. However, a comprehensive blacklist is not available, in general, for a given domain. We develop a highly effective detection and inversion framework for LLMs treated as classifiers. Central to our approach is class subspace orthogonalization (CSO), a novel plug-and-play paradigm for backdoor detection that serves two fundamental roles when applied to LLMs: i) it enhances both sensitivity and specificity of a baseline detector; ii) it provides a form of implicit blacklisting, as it penalizes against inclusion, in a candidate trigger, of tokens that induce signal perturbations "in the direction of" the putative target class of an attack. One version of our detector performs continuous optimization in token embedding space, while a companion trigger-inversion and detection method performs greedy accretion in discrete token space. Our methods give both strong detection performance and accurate inversion of ground-truth triggers on several LLM classification domains, and for several different LLM architectures.

cs.CR 2026-07-01

Per-component triple fingerprint tracks reused agent skills across edits

by Hongliang Liu, Yuhao Wu +1 more

The Decomposition Is the Fingerprint: Per-Component Identity for Agent Skills

120-byte signature on prompt, code and tools recovers family identity when one part stays shared but flags independent rewrites, at 77x lowe

AI agents increasingly acquire and execute skills at runtime: bundles of prompt instructions, executable code, and tool declarations fetched from marketplaces and other agents. Governing them needs a stable notion of skill identity, yet cryptographic hashing is engineered to destroy the very similarity we need, as a one-character edit scrambles the digest. We present a compact, locality-sensitive fingerprint that embeds each component of a skill and projects it to bits with a multi-bank SimHash, giving a fixed 120-byte signature compared in constant time by Hamming distance. Our central claim is that keeping the fingerprint as a per-component triple (prompt, code, tools), rather than a single score, is what makes it useful: the triple recovers skill-family identity through paraphrase, renaming, refactoring, and controlled code translation when another component remains shared, while independent multilingual reimplementation is not recovered; it also localizes which component carries the reuse. We claim lineage, not behavioral equivalence: identity supplies the structural axis of a registry and leaves safety to behavioral verification. The fingerprint reaches an area under the ROC curve (AUC) of 0.974 (95% CI [0.956, 0.994]) over 4,950 pairwise comparisons while using 77x fewer bits than the embedding it approximates, with ranking preserved in expectation and finite-bit concentration; the per-component split turns one number into relationship classification, families, novelty, and a portable "SkillBOM" for a skill registry. On a 906-skill injection benchmark the fingerprint recognizes injected skills as tampered copies of a known base and localizes the change, but recognition is not trust: it remains, by design, an identity signal complementary to behavioral verification rather than a safety verdict.
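
A per-component SimHash triple compared by Hamming distance can be sketched as follows. This is a minimal illustration under stated assumptions: the embeddings are random toy vectors, a single bank of random hyperplanes stands in for the multi-bank scheme, and the split of 320 bits per component (3 x 320 bits = 120 bytes) is a guess at how a 120-byte signature might be divided. All function names are hypothetical.

```python
import random

DIM = 16     # toy embedding dimension (assumption)
BITS = 320   # bits per component (assumption): 3 * 320 / 8 = 120 bytes

def make_planes(dim, bits, seed=0):
    """Random Gaussian hyperplanes shared by every fingerprint."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]

def simhash(vec, planes):
    """One bit per hyperplane: the sign of the projection."""
    sig = 0
    for plane in planes:
        proj = sum(v * p for v, p in zip(vec, plane))
        sig = (sig << 1) | (1 if proj >= 0 else 0)
    return sig

def fingerprint(skill, planes):
    """Keep the signature as a (prompt, code, tools) triple."""
    return tuple(simhash(skill[c], planes) for c in ("prompt", "code", "tools"))

def compare(fp_a, fp_b):
    """Per-component Hamming distances between two fingerprints."""
    return tuple(bin(a ^ b).count("1") for a, b in zip(fp_a, fp_b))

planes = make_planes(DIM, BITS)
rng = random.Random(1)
rand_vec = lambda: [rng.gauss(0.0, 1.0) for _ in range(DIM)]

base = {"prompt": rand_vec(), "code": rand_vec(), "tools": rand_vec()}
variant = dict(base, prompt=rand_vec())  # rewritten prompt, shared code and tools

dist = compare(fingerprint(base, planes), fingerprint(variant, planes))
```

In this toy case the code and tools distances are zero while the prompt distance is not, so the triple both links the two skills and points at the component that changed, which a single aggregate score would blur.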

cs.CR 2026-07-01

Framework matches red teaming methods to four AI agent layers

by Yong Yang, Xing Zheng +8 more

Securing the AI Agent: A Unified Framework for Multi-Layer Agent Red Teaming

Pairs rules, auditing, multi-turn tests, and jailbreaks with infrastructure, tools, behavior, and models for full coverage including skill p

The fast growth of open-source AI infrastructure, from model serving engines and agent platforms to the Model Context Protocol (MCP) ecosystem and the language models themselves, has outpaced the security tooling available to defend it. We present AI-Infra-Guard, an open-source framework that organizes AI red teaming around a single observation: the attack surface of an AI agent is stratified across layers (infrastructure, protocol/tool, agent behavior, and model), and no single detection paradigm fits all of them. The framework therefore matches a paradigm to each layer, from deterministic rule matching over 75+ AI components and 1,400+ vulnerability rules, through LLM-driven agentic auditing of MCP servers and agent-skill packages and multi-turn black-box agent red teaming, to a jailbreak harness with 26+ attack operators over sixteen datasets. To our knowledge it is the only open-source framework to span all of these, including supply-chain auditing of the agent skills that increasingly extend AI agents. We release AI-Infra-Guard as open source so that layer-paradigm matching can serve as a practical foundation for agent security and a shared base for the community to build on.

cs.LG 2026-07-01

Tabular models memorize training data only under narrow fine-tuning

by Francesco Capano, Jonas Böhler

Probing Memorization of Tabular In-Context Learning

Probe isolates signals in 8 of 10 tasks with single-task repeated training, yet effects vanish under realistic conditions.

Large tabular models (LTMs), i.e., tabular foundation models leveraging in-context learning (ICL), achieve state-of-the-art performance on tabular tasks. While LLMs are known to unintentionally memorize training data, the memorization dynamics of LTMs remain largely unexplored. We investigate the potential for parametric memorization in tabular ICL. We introduce ICLMEM, a probing framework designed to separate context-based predictions from parametric memorization. Our zero-information multiple-choice context strips away valid contextual patterns to force the model to fall back on its parametric memory. Our controlled fine-tuning setup establishes membership ground truth and accounts for common pitfalls, e.g., distribution shift, feature contamination, and the base-rate fallacy; the pre-trained base model acts as a reference to calibrate for sample difficulty. Our controlled evaluation on a leading real-world-trained LTM detects moderate memorization signals in 8 out of 10 tasks (AUC up to 0.67 and TPR at 1% FPR > 0.1). Notably, memorization signals are strongest for low-cardinality and binary tasks. However, they largely vanish under realistic training conditions. Our findings show that LTM memorization signals arise under specific circumstances (single-task fine-tuning with fixed samples across many epochs and small query size). To protect sensitive data, appropriate measures must be taken, which we discuss.

cs.CR 2026-07-01

Probe choice flips canary memorization verdicts in three cases

by Zhichao Fan, Zexin Zhuang +1 more

Probe Choice Changes Canary-Memorization Verdicts: Three Post-Hoc Disagreement Case Studies in a Text-Dominant LoRA-Tuned Autoregressive Testbed

Fixed 20-token window mean-NLL disagrees with full-span NLL and exact recall on controlled secrets.

We audit a fixed prefix-window mean-NLL memorization probe (K=20) on a Qwen2.5-VL-7B canary testbed and report three post-hoc cases where it disagrees with full-span secret NLL or greedy exact-recall. C3 (false negative, window truncation): damage lands on hex tokens outside K=20; the probe stays flat while hit@1 drops. C4 (false positive, non-secret drift): the probe moves, but approximately 99% sits on non-secret preamble; the secret span and hit@1 are unchanged. C5 (ambiguous in-window drop): the probe falls on an undertrained baseline while full-span hex is positive and hit@1=0. Recommendation: report (i) full-span secret NLL, (ii) a span-localised decomposition, (iii) behavioural exact-recall at k>=4, and (iv) decoy probes before asserting secret-specificity. Evidence is on controlled canaries in one backbone; magnitudes are testbed-specific.
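
The window-truncation false negative (case C3) can be reproduced with made-up numbers. This is a hedged toy sketch, not the paper's testbed: the per-token NLL values, the 18-token preamble, and the 8-token secret are all invented for illustration, and `mean_nll` is a hypothetical helper.

```python
K = 20  # fixed prefix window, as in the audited probe

def mean_nll(token_nlls, start, end):
    """Mean negative log-likelihood over tokens [start, end)."""
    span = token_nlls[start:end]
    return sum(span) / len(span)

# Invented layout: 18 preamble tokens, then an 8-token secret at positions 18..25.
preamble = [0.5] * 18
secret_before = [0.1] * 8
secret_after = [0.1] * 2 + [3.0] * 6  # damage lands only on positions 20..25

seq_before = preamble + secret_before
seq_after = preamble + secret_after
secret_start, secret_end = 18, 26

window_before = mean_nll(seq_before, 0, K)
window_after = mean_nll(seq_after, 0, K)
full_before = mean_nll(seq_before, secret_start, secret_end)
full_after = mean_nll(seq_after, secret_start, secret_end)
```

Here the K=20 window probe is identical before and after the damage, while the full-span secret NLL rises sharply, which is the disagreement that motivates reporting full-span secret NLL alongside any windowed probe.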

cs.SE 2026-07-01

LLMs calibrate security better than function in generated code

by Mohammed Latif Siddiq, Md. Nafiu Rahman +1 more

An Empirical Study of Security Calibration in Large Language Models for Code

Models align stated confidence more closely with vulnerability presence than with whether the code actually works.

Large Language Models (LLMs) are rapidly transforming software development, yet their use in security-critical contexts raises a key question: do models know when their generated code is insecure? This property, known as calibration, measures whether a model's confidence aligns with the true correctness of its outputs. We present the first large-scale empirical study of security calibration in LLM-generated code. We evaluate GPT-4o-mini, Gemini-2.0-Flash, and Qwen3-Coder-Next across multiple temperature settings on two complementary benchmarks: self-contained security tasks and multi-language repository-level contexts. Our results suggest that overconfidence is prevalent across the evaluated LLMs. Functional calibration is consistently worse than security calibration, suggesting that models estimate security outcomes more reliably than functional correctness, potentially because functional correctness depends on complex execution behavior. We also examine whether calibration-guided automated repair can help remediate vulnerabilities in LLM-generated code, finding only limited improvements while frequently introducing functional regressions. Moreover, we study different mitigation strategies for reducing False Trust, where models assign high confidence to vulnerable code. The results show that although architectural gating improves calibration on controlled benchmarks, calibration deteriorates in realistic repository-level settings, increasing the risk of high-confidence vulnerable outputs.
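
One common way to quantify calibration is expected calibration error (ECE). The abstract does not say which metric the study uses, so the sketch below is an assumption chosen for illustration, with invented confidences and outcomes; it shows how the same stated confidences can be better calibrated against one outcome (security) than another (functional correctness).

```python
def expected_calibration_error(confidences, outcomes, n_bins=10):
    """Weighted mean gap between average confidence and observed accuracy,
    computed over equal-width confidence bins."""
    bins = [[] for _ in range(n_bins)]
    for c, y in zip(confidences, outcomes):
        idx = min(int(c * n_bins), n_bins - 1)
        bins[idx].append((c, y))
    n = len(confidences)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(y for _, y in b) / len(b)
        ece += len(b) / n * abs(avg_conf - accuracy)
    return ece

# Invented data: four generations, each with stated confidence 0.9.
confidences = [0.9, 0.9, 0.9, 0.9]
is_secure = [1, 1, 1, 0]      # 75% actually secure
is_functional = [1, 0, 0, 0]  # 25% actually pass tests

security_ece = expected_calibration_error(confidences, is_secure)        # about 0.15
functional_ece = expected_calibration_error(confidences, is_functional)  # about 0.65
```

Both values indicate overconfidence, and the larger functional gap mirrors the direction of the reported finding without reproducing any of its numbers.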

cs.CR 2026-07-01

Hybrid method spots backdoor attacks after federated models converge

by Guanming Che, Qiang Wang +2 more

Secure-CHG: A Comprehensive Framework for Robust and Fair Federated Learning via Hybrid Defense and Contribution-Aware Trust

Projecting updates into hardness-gradient space isolates attackers when statistical defenses lose all signal.

Federated Learning (FL) is highly susceptible to stealthy backdoor attacks, which aim to force a model into predicting an attacker-chosen target class for inputs containing a specific trigger. However, existing statistical defenses primarily focus on the early stages of model convergence. In this paper, we identify a fundamental vulnerability termed "Late-stage Failure." We demonstrate that as the global model converges, decaying gradient norms render malicious and benign updates morphologically indistinguishable. This vanishing statistical variance effectively blinds traditional defenses, enabling adaptive adversaries to remain dormant and subsequently hijack the training process. To overcome these constraints, we propose Secure-CHG, a hybrid framework that pivots the defense paradigm from superficial morphological detection toward intrinsic semantic contribution verification. Secure-CHG employs an adaptive defense pipeline: a cascaded statistical filter stabilizes optimization during the early oscillatory phase, while a novel CHG-Shapley mechanism takes over during late-stage convergence. By leveraging sample hardness (i.e., local training loss) to project updates into a composite Hardness-Gradient space, it effectively amplifies adversarial semantic traces, enabling the isolation of stealthy attackers even as gradient norms vanish. Furthermore, we derive a closed-form solution for CHG-Shapley, facilitating low-complexity, retraining-free node valuation and trust-modulated aggregation. Extensive evaluations on CIFAR-10, MedMNIST, and NEU-SDDB demonstrate that Secure-CHG effectively mitigates Late-stage Failure. Specifically, it significantly suppresses advanced backdoor attacks, reducing their attack success rate by 2.3x and 2.0x relative to the mainstream Krum and Trimmed Mean baselines, respectively.

cs.CR 2026-07-01

Contract turns untrusted LLMs into safe 3x faster solvers

by Chenyu Zhou, Qiliang Jiang +2 more

Certified Speculative Execution for Untrusted AI Agents

Zero violations and near-oracle regret on hard-constrained decisions even from sources that violate 98 percent of the time.

Hard-constrained sequential decision systems have no certified way to spend the test-time compute of modern AI: executing the multi-step drafts of a learned policy or a frozen LLM forfeits the feasibility guarantee a trusted solver provides, while invoking the solver at every step forfeits the speed the AI offers. Certificate-Gated Prefix Acceptance (CGPA) closes this gap with a certified speculative-execution contract for untrusted AI agents: a trusted verifier rejects constraint-violating transitions exactly, a conformally calibrated value boundary gates the longest low-cost prefix within a per-segment regret budget, and the rest defers to the solver, so safety, regret, and speed decouple by construction. The contract drives every untrusted proposal source - adversarial drafters and six heterogeneous frozen LLMs (including a 12B model that violates constraints in 98% of direct rollouts) - to zero applied violations; a certificate-aware learned boundary, conformally calibrated, drives mean regret three orders of magnitude below unguarded acceptance, to within sampling noise of the stepwise oracle (95% CI spanning zero), and under calendar shift a learned proposal source overtakes it on 15 of 18 held-out days. On a deployment-scale unit-commitment instance it turns a frozen 8B LLM into a 2.96x per-episode wall-clock speedup at 2.1% regret, outpacing the domain heuristic (1.79x) and a safe receding-horizon baseline (1.07x): the more capable the untrusted source, the faster the certified system, at guarantees that never change.
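
The accept-a-prefix, defer-the-rest contract can be sketched in a few lines. This is a simplified illustration, not CGPA itself: a fixed per-step regret estimate and a scalar budget stand in for the conformally calibrated value boundary, the toy constraint (a level that must stay between 0 and 10) is invented, and every function name is hypothetical.

```python
def accept_prefix(draft, state, is_feasible, step, est_regret, budget):
    """Accept the longest prefix of `draft` whose transitions all pass the
    verifier and whose cumulative estimated regret stays within `budget`."""
    accepted = []
    spent = 0.0
    for action in draft:
        if not is_feasible(state, action):  # exact rejection by the trusted verifier
            break
        cost = est_regret(state, action)
        if spent + cost > budget:           # regret gate (simplified stand-in)
            break
        accepted.append(action)
        spent += cost
        state = step(state, action)
    return accepted, state

# Toy hard constraint: the level must stay within [0, 10].
def is_feasible(level, delta):
    return 0 <= level + delta <= 10

def step(level, delta):
    return level + delta

def est_regret(level, delta):
    return 0.1  # invented constant estimate

draft = [3, 4, 5, 1]  # untrusted multi-step proposal; the third step would reach 12
accepted, state = accept_prefix(draft, 0, is_feasible, step, est_regret, budget=1.0)
remaining = draft[len(accepted):]  # handed to the trusted solver instead
```

Because no action is applied without passing `is_feasible`, the number of applied violations is zero regardless of how bad the draft is; the quality of the proposal source only changes how long the accepted prefix is, and therefore how much solver time is saved.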

cs.CR 2026-06-30

Localized curvature repair removes LLM backdoors

by Arash Raftari, Mehrdad Mahdavi +2 more

Curvature-Guided Module Localization for Low-Rank Detoxification of Backdoored Large Language Models

Activation patching plus Fisher analysis pinpoints modules for low-rank fixes that block trigger attacks while preserving normal outputs.

Backdoor attacks pose a serious threat to large language models (LLMs) by causing otherwise benign systems to produce attacker-specified malicious behavior when a hidden trigger is present. In this work, we study post hoc detoxification of backdoored LLMs in a practical setting where the defender has access to the poisoned model but does not wish to retrain the full network from scratch. We propose a mechanistically guided weight-space repair framework that first localizes modules involved in propagating trigger-induced behavior using activation patching and Fisher/K-FAC curvature analysis, and then applies targeted low-rank repair to only the most influential modules. We evaluate the method on poisoned variants of Llama-3.2-1B-Instruct with triggers inserted at the beginning, middle, and end of otherwise benign prompts. Results show that the proposed approach substantially suppresses trigger-conditioned malicious responses while preserving benign model behavior. These findings suggest that backdoor removal in LLMs can be formulated as a localized structural repair problem rather than only a broad behavioral alignment problem.
