
Agent skills: A data-driven analysis of Claude skills for extending large language model functionality

2026 · arXiv 2602.08004

Canonical reference. 80% of classified citations from Pith papers cite this work as background.

28 Pith papers citing it

Background: 80% of classified citations

citation-role summary: background 5

citation-polarity summary: background 4, unclear 1

representative citing papers

MalSkillBench: A Runtime-Verified Benchmark of Malicious Agent Skills

cs.CR · 2026-06-05 · unverdicted · novelty 8.0

MalSkillBench supplies the first sandbox-verified dataset of malicious agent skills and shows that existing detectors achieve high recall on code injection but collapse on prompt injection and agent-control attacks.

HarmfulSkillBench: How Do Harmful Skills Weaponize Your Agents?

cs.CR · 2026-04-16 · unverdicted · novelty 8.0

Harmful skills in open agent ecosystems raise average harm scores from 0.27 to 0.76 across six LLMs by lowering refusal rates when tasks are presented via pre-installed skills.

FermiLink: A Unified Agent Framework for Multidomain Autonomous Scientific Simulations

physics.chem-ph · 2026-04-03 · conditional · novelty 8.0

FermiLink is a unified AI agent framework that automates multidomain scientific simulations via separated package knowledge bases and a four-layer progressive disclosure mechanism, reproducing 56% of target figures in benchmarks and generating research-grade results on unpublished problems.

Cloak and Detonate: Scanner Evasion and Dynamic Detection of Agent Skill Malware

cs.CR · 2026-07-02 · unverdicted · novelty 7.0

SkillCloak evades existing static scanners for agent skill malware at high rates, while SkillDetonate detects 97% of attacks at 2% false-positive rate using sandboxed runtime behavior analysis.

From Registry to Repository: How AI Agent Skills Are Written, Adapted, and Maintained

cs.SE · 2026-07-01 · unverdicted · novelty 7.0

An empirical study of 41k+ AI agent skills finds that reuse is mostly one-time verbatim copying, with 53% never modified afterward, and that maintenance focuses on additive local adaptations.

Skill or Skip? Learning Selective Skill Invocation in Agentic Tasks via Dual-Granularity Preference Learning

cs.CL · 2026-05-30 · unverdicted · novelty 7.0

SelSkill applies dual-granularity preference learning to selective skill-or-skip decisions, improving task success by 10.9 points and execution precision by 29.1 points on ALFWorld with Qwen3-8B.

SkillSafetyBench: Evaluating Agent Safety under Skill-Facing Attack Surfaces

cs.CR · 2026-05-12 · unverdicted · novelty 7.0 · 2 refs

SkillSafetyBench is a benchmark of 155 cases across 47 tasks and 6 risk domains showing that non-user attacks via skills, artifacts, or environments can consistently induce unsafe agent behavior.

Skill-CMIB: Multimodal Agent Skill for Consistent Action via Conditional Multimodal Information Bottleneck

cs.LG · 2026-05-08 · unverdicted · novelty 7.0

CMIB uses a conditional multimodal information bottleneck to create reusable agent skills that separate verbalizable text content from predictive perceptual residuals, improving execution stability.

An Empirical Study of Agent Skills for Healthcare: Practice, Gaps, and Governance

cs.AI · 2026-05-04 · unverdicted · novelty 7.0

Public healthcare agent skills emphasize workflow automation over clinical diagnostics and treatments, with uneven lifecycle coverage and weak alignment between technical and clinical risk.

Runtime Skill Audit: Targeted Runtime Probing for Agent Skill Security

cs.CR · 2026-06-10 · unverdicted · novelty 6.0

Runtime Skill Audit introduces targeted runtime probing to detect malicious LLM agent skills, reporting 90% accuracy on 100 skills and resilience to self-evolving attacks, compared with static baselines.

Skill Coverage: A Test Adequacy Metric for Agent Skills

cs.AI · 2026-06-09 · unverdicted · novelty 6.0

Skill coverage is a binary test adequacy metric that extracts observable behavior constraints from skill documents and judges whether trajectories provide sufficient evidence to cover each constraint, revealing 39.90-43.98% coverage on SkillsBench.

Workflow-to-Skill: Skill Creation via Routing-Workflow-Semantics-Attachments Decomposition

cs.AI · 2026-06-05 · unverdicted · novelty 6.0

The W2S framework uses RWSA decomposition to convert heterogeneous traces into Skills, improving behavioral replay consistency by 10.5% over summarization baselines on 70 Skills.

SciVisAgentSkills: Design and Evaluation of Agent Skills for Scientific Data Analysis and Visualization

cs.AI · 2026-06-04 · unverdicted · novelty 6.0

SciVisAgentSkills provides reusable agent skills that raise mean task scores on a 108-task SciVis benchmark when paired with Codex and Claude Code agents.

Skill-RM: Unifying Heterogeneous Evaluation Criteria via Agent Skill

cs.LG · 2026-06-02 · unverdicted · novelty 6.0

Skill-RM unifies heterogeneous reward criteria by modeling reward computation as dynamic execution of a reusable Reward-Evaluation Skill within an agent framework.

FederatedSkill: Federated Learning for Agentic Skill Evolution

cs.LG · 2026-06-02 · unverdicted · novelty 6.0

FederatedSkill aggregates client semantic skill diffs via a server evolution agent to enable strictly personalized skill evolution, reporting up to 44.4% higher success rates and 37.5% lower compute cost than self-evolving baselines across 20 task families.

SkillGuard: A Permission Framework for Agent Skills

cs.CR · 2026-06-02 · unverdicted · novelty 6.0

SkillGuard presents a dual-plane permission framework for agent skills that achieves 99.76% taxonomy coverage and reduces attack success rates in evaluations on 315 skills.

AgensFlow: A Coordination-Policy Substrate for Multi-Agent Systems

cs.MA · 2026-05-26 · unverdicted · novelty 6.0

AgensFlow learns coordination policies from task trajectories and outperforms fixed pipelines on distributed-systems incident and security-advisory tasks.

CODESKILL: Learning Self-Evolving Skills for Coding Agents

cs.AI · 2026-05-25 · unverdicted · novelty 6.0

CODESKILL trains an LLM policy via RL on hybrid rewards to extract and maintain multi-granularity skills from agent trajectories, raising pass rates 9.69 points over no-skill baselines on three coding benchmarks while keeping the skill bank compact.

SearchSkill: Teaching LLMs to Use Search Tools with Evolving Skill Banks

cs.AI · 2026-05-09 · unverdicted · novelty 6.0 · 3 refs

SearchSkill improves LLM query planning on knowledge QA by using explicit skill selection from an evolving SkillBank and a two-stage SFT process that aligns training with inference-time skill-grounded execution.

Skill Retrieval Augmentation for Agentic AI

cs.CL · 2026-04-27 · unverdicted · novelty 6.0 · 3 refs

Introduces the SRA paradigm and the SRA-Bench benchmark (5,400 tasks, 26,262 skills), showing that retrieval improves performance but LLMs fail to selectively incorporate retrieved skills.

SkillHone: A Harness for Continual Agent Skill Evolution Through Persistent Decision History

cs.LG · 2026-06-07 · unverdicted · novelty 5.0

SkillHone introduces a harness that maintains persistent decision histories to support continual evolution of language-model agent skills, reporting 15.8-point gains on GAIA over a commercial deep-research agent.

Unsupervised Skill Discovery for Agentic Data Analysis

cs.AI · 2026-06-04 · unverdicted · novelty 5.0

DataCOPE uses verifier-guided contrastive distillation from agent trajectories to discover skills, yielding average gains of 9.71% on report-style and 32.30% on reasoning-style data analysis tasks across four model settings.

SkillComposer: Learning to Evolve Agent Skills for Specification and Generalization

cs.CL · 2026-06-04 · unverdicted · novelty 5.0

SkillComposer decomposes skill construction into create/improve/merge operations trained by rejection sampling, enabling self-evolving skills that improve agent and code task performance while generalizing to unseen domains.

Skill0.5: Joint Skill Internalization and Utilization for Out-of-Distribution Generalization in Agentic Reinforcement Learning

cs.CL · 2026-05-27 · unverdicted · novelty 5.0

Skill0.5 is an agentic RL framework that internalizes general skills for hard tasks and utilizes task-specific skills for easy tasks via a dynamic difficulty-aware router to improve out-of-distribution generalization.

citing papers explorer

Showing 28 of 28 citing papers.

MalSkillBench: A Runtime-Verified Benchmark of Malicious Agent Skills cs.CR · 2026-06-05 · unverdicted · none · ref 32
MalSkillBench supplies the first sandbox-verified dataset of malicious agent skills and shows that existing detectors achieve high recall on code injection but collapse on prompt injection and agent-control attacks.
HarmfulSkillBench: How Do Harmful Skills Weaponize Your Agents? cs.CR · 2026-04-16 · unverdicted · none · ref 38
Harmful skills in open agent ecosystems raise average harm scores from 0.27 to 0.76 across six LLMs by lowering refusal rates when tasks are presented via pre-installed skills.
FermiLink: A Unified Agent Framework for Multidomain Autonomous Scientific Simulations physics.chem-ph · 2026-04-03 · conditional · none · ref 34
FermiLink is a unified AI agent framework that automates multidomain scientific simulations via separated package knowledge bases and a four-layer progressive disclosure mechanism, reproducing 56% of target figures in benchmarks and generating research-grade results on unpublished problems.
Cloak and Detonate: Scanner Evasion and Dynamic Detection of Agent Skill Malware cs.CR · 2026-07-02 · unverdicted · none · ref 8
SkillCloak evades existing static scanners for agent skill malware at high rates, while SkillDetonate detects 97% of attacks at 2% false-positive rate using sandboxed runtime behavior analysis.
From Registry to Repository: How AI Agent Skills Are Written, Adapted, and Maintained cs.SE · 2026-07-01 · unverdicted · none · ref 7
An empirical study of 41k+ AI agent skills finds that reuse is mostly one-time verbatim copying, with 53% never modified afterward, and that maintenance focuses on additive local adaptations.
Skill or Skip? Learning Selective Skill Invocation in Agentic Tasks via Dual-Granularity Preference Learning cs.CL · 2026-05-30 · unverdicted · none · ref 56
SelSkill applies dual-granularity preference learning to selective skill-or-skip decisions, improving task success by 10.9 points and execution precision by 29.1 points on ALFWorld with Qwen3-8B.
SkillSafetyBench: Evaluating Agent Safety under Skill-Facing Attack Surfaces cs.CR · 2026-05-12 · unverdicted · none · ref 20 · 2 links
SkillSafetyBench is a benchmark of 155 cases across 47 tasks and 6 risk domains showing that non-user attacks via skills, artifacts, or environments can consistently induce unsafe agent behavior.
Skill-CMIB: Multimodal Agent Skill for Consistent Action via Conditional Multimodal Information Bottleneck cs.LG · 2026-05-08 · unverdicted · none · ref 13
CMIB uses a conditional multimodal information bottleneck to create reusable agent skills that separate verbalizable text content from predictive perceptual residuals, improving execution stability.
An Empirical Study of Agent Skills for Healthcare: Practice, Gaps, and Governance cs.AI · 2026-05-04 · unverdicted · none · ref 2
Public healthcare agent skills emphasize workflow automation over clinical diagnostics and treatments, with uneven lifecycle coverage and weak alignment between technical and clinical risk.
Runtime Skill Audit: Targeted Runtime Probing for Agent Skill Security cs.CR · 2026-06-10 · unverdicted · none · ref 4
Runtime Skill Audit introduces targeted runtime probing to detect malicious LLM agent skills, reporting 90% accuracy on 100 skills and resilience to self-evolving attacks, compared with static baselines.
Skill Coverage: A Test Adequacy Metric for Agent Skills cs.AI · 2026-06-09 · unverdicted · none · ref 10
Skill coverage is a binary test adequacy metric that extracts observable behavior constraints from skill documents and judges whether trajectories provide sufficient evidence to cover each constraint, revealing 39.90-43.98% coverage on SkillsBench.
Workflow-to-Skill: Skill Creation via Routing-Workflow-Semantics-Attachments Decomposition cs.AI · 2026-06-05 · unverdicted · none · ref 4
The W2S framework uses RWSA decomposition to convert heterogeneous traces into Skills, improving behavioral replay consistency by 10.5% over summarization baselines on 70 Skills.
SciVisAgentSkills: Design and Evaluation of Agent Skills for Scientific Data Analysis and Visualization cs.AI · 2026-06-04 · unverdicted · none · ref 26
SciVisAgentSkills provides reusable agent skills that raise mean task scores on a 108-task SciVis benchmark when paired with Codex and Claude Code agents.
Skill-RM: Unifying Heterogeneous Evaluation Criteria via Agent Skill cs.LG · 2026-06-02 · unverdicted · none · ref 67
Skill-RM unifies heterogeneous reward criteria by modeling reward computation as dynamic execution of a reusable Reward-Evaluation Skill within an agent framework.
FederatedSkill: Federated Learning for Agentic Skill Evolution cs.LG · 2026-06-02 · unverdicted · none · ref 2
FederatedSkill aggregates client semantic skill diffs via a server evolution agent to enable strictly personalized skill evolution, reporting up to 44.4% higher success rates and 37.5% lower compute cost than self-evolving baselines across 20 task families.
SkillGuard: A Permission Framework for Agent Skills cs.CR · 2026-06-02 · unverdicted · none · ref 18
SkillGuard presents a dual-plane permission framework for agent skills that achieves 99.76% taxonomy coverage and reduces attack success rates in evaluations on 315 skills.
AgensFlow: A Coordination-Policy Substrate for Multi-Agent Systems cs.MA · 2026-05-26 · unverdicted · none · ref 11
AgensFlow learns coordination policies from task trajectories and outperforms fixed pipelines on distributed-systems incident and security-advisory tasks.
CODESKILL: Learning Self-Evolving Skills for Coding Agents cs.AI · 2026-05-25 · unverdicted · none · ref 1
CODESKILL trains an LLM policy via RL on hybrid rewards to extract and maintain multi-granularity skills from agent trajectories, raising pass rates 9.69 points over no-skill baselines on three coding benchmarks while keeping the skill bank compact.
SearchSkill: Teaching LLMs to Use Search Tools with Evolving Skill Banks cs.AI · 2026-05-09 · unverdicted · none · ref 18 · 3 links
SearchSkill improves LLM query planning on knowledge QA by using explicit skill selection from an evolving SkillBank and a two-stage SFT process that aligns training with inference-time skill-grounded execution.
Skill Retrieval Augmentation for Agentic AI cs.CL · 2026-04-27 · unverdicted · none · ref 16 · 3 links
Introduces the SRA paradigm and the SRA-Bench benchmark (5,400 tasks, 26,262 skills), showing that retrieval improves performance but LLMs fail to selectively incorporate retrieved skills.
SkillHone: A Harness for Continual Agent Skill Evolution Through Persistent Decision History cs.LG · 2026-06-07 · unverdicted · none · ref 9
SkillHone introduces a harness that maintains persistent decision histories to support continual evolution of language-model agent skills, reporting 15.8-point gains on GAIA over a commercial deep-research agent.
Unsupervised Skill Discovery for Agentic Data Analysis cs.AI · 2026-06-04 · unverdicted · none · ref 18
DataCOPE uses verifier-guided contrastive distillation from agent trajectories to discover skills, yielding average gains of 9.71% on report-style and 32.30% on reasoning-style data analysis tasks across four model settings.
SkillComposer: Learning to Evolve Agent Skills for Specification and Generalization cs.CL · 2026-06-04 · unverdicted · none · ref 2
SkillComposer decomposes skill construction into create/improve/merge operations trained by rejection sampling, enabling self-evolving skills that improve agent and code task performance while generalizing to unseen domains.
Skill0.5: Joint Skill Internalization and Utilization for Out-of-Distribution Generalization in Agentic Reinforcement Learning cs.CL · 2026-05-27 · unverdicted · none · ref 3
Skill0.5 is an agentic RL framework that internalizes general skills for hard tasks and utilizes task-specific skills for easy tasks via a dynamic difficulty-aware router to improve out-of-distribution generalization.
SkillsVote: Lifecycle Governance of Agent Skills from Collection, Recommendation to Evolution cs.CL · 2026-05-18 · unverdicted · none · ref 29
SkillsVote is a governance system for agent skills that profiles corpora, recommends via search, and gates updates on successful reusable outcomes, yielding benchmark gains without model changes.
Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering cs.SE · 2026-04-09 · accept · none · ref 86
LLM agent progress depends on externalizing cognitive functions into memory, skills, protocols, and harness engineering that coordinates them reliably.
Contractual Skills: A GovernSpec Design Framework for Enterprise AI Agents cs.SE · 2026-05-21 · unverdicted · none · ref 5 · 2 links
The contractual-skills framework structures SKILL.md files as readable task contracts; A/B tests on synthetic tasks show mean quality rising from 4.692 to 4.914 and the critical-error rate falling from 0.083 to 0.013 across models.
Red Skills or Blue Skills? A Dive Into Skills Published on ClawHub cs.CL · 2026-03-19 · unverdicted · none · ref 7
An analysis of ClawHub shows language-based functional divides in agent skills, with over 30% flagged as suspicious and submission-time documentation enabling risk prediction with 73% accuracy.
