arxiv: 2607.00820 · v1 · pith:WQ4RXTL7new · submitted 2026-07-01 · 💻 cs.SE

Knowledge-Enhanced Agentic Vulnerability Repair

Pith reviewed 2026-07-02 08:31 UTC · model grok-4.3

classification 💻 cs.SE
keywords automated vulnerability repair · agentic systems · knowledge retrieval · patch generation · ReAct reasoning · C/C++ security · retrieval-augmented generation

The pith

KeaRepair grounds automated vulnerability repair in knowledge extracted from historical fixes and in agent-collected program facts, reaching an 83.64% repair rate on 55 C/C++ cases.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes KeaRepair as an agentic method that first builds retrieval bases of multi-dimensional vulnerability knowledge from past vulnerability-patch pairs. It then deploys a tool-using agent in ReAct style to gather verified facts about a target program and diagnose the root cause. Finally it retrieves relevant knowledge to guide patch synthesis, followed by iterative refinement that checks compilation, proof-of-concept replay, and test execution. This combination is shown to repair 46 of 55 reproducible vulnerabilities when using Gemini-3.1-Pro, including six cases no baseline fixes, and to transfer across languages.
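
The fact-collection step described above can be pictured as a thought/action/observation loop. The following is a minimal sketch, not the authors' implementation: the tool names (`read_source`, `run_poc`), the canned observations, and the `scripted_policy` that stands in for the language model are all hypothetical and exist only to make the control flow runnable.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


# Hypothetical tools: each takes a string argument and returns an observation.
def read_source(symbol: str) -> str:
    source = {"decode": "buf[idx] = v;  /* idx is not bounds-checked */"}
    return source.get(symbol, "symbol not found")


def run_poc(_: str) -> str:
    return "ASan: heap-buffer-overflow WRITE in decode"


TOOLS: Dict[str, Callable[[str], str]] = {
    "read_source": read_source,
    "run_poc": run_poc,
}


@dataclass
class Step:
    thought: str
    action: str
    argument: str
    observation: str


def scripted_policy(history: List[Step]) -> Tuple[str, str, str]:
    """Stand-in for the LLM: returns (thought, action, argument)."""
    if not history:
        return ("Reproduce the crash first.", "run_poc", "")
    if len(history) == 1:
        return ("The sanitizer names decode; inspect it.", "read_source", "decode")
    return ("The unchecked index explains the overflow.", "finish",
            "missing bounds check on idx in decode")


def react_diagnose(policy, tools, max_steps: int = 5) -> Tuple[str, List[Step]]:
    """Alternate reasoning and tool calls until the policy emits 'finish'."""
    history: List[Step] = []
    for _ in range(max_steps):
        thought, action, argument = policy(history)
        if action == "finish":
            return argument, history
        observation = tools[action](argument)  # facts come from tools, not from the model
        history.append(Step(thought, action, argument, observation))
    return "undiagnosed", history


diagnosis, trace = react_diagnose(scripted_policy, TOOLS)
```

The point of the structure is that every statement in the final diagnosis can be traced back to a tool observation recorded in `trace`, which is one reading of what "verified program facts" means here.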

Core claim

KeaRepair extracts multi-dimensional vulnerability knowledge from historical pairs into dedicated retrieval bases, pairs it with a ReAct-style tool-augmented agent that collects verified program facts, and performs knowledge-level retrieval-augmented generation followed by closed-loop validation, yielding 46 repairs out of 55 C/C++ vulnerabilities and six unique fixes.

What carries the argument

The knowledge extraction and retrieval-augmented generation pipeline inside the KeaRepair agent, which combines dual-view historical knowledge bases with runtime fact collection to drive patch synthesis and refinement.

If this is right

  • Existing AVR baselines can be improved by adding the same dual-view knowledge bases and closed-loop validation steps.
  • The approach generalizes to languages beyond C/C++ when the knowledge extraction step is repeated on their historical patches.
  • Nine of the 55 vulnerabilities remain unrepaired by KeaRepair, indicating that certain root-cause patterns still require additional diagnostic capabilities.
  • Iterative refinement through compilation, PoC replay, and test execution reduces false-positive patches that would otherwise pass static checks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the knowledge bases are kept up to date with new disclosures, the method could serve as a continuously improving repair service rather than a one-time tool.
  • The same retrieval-plus-agent structure might be applied to other code-transformation tasks such as refactoring or security-hardening beyond vulnerability repair.
  • A production deployment would need safeguards against retrieving knowledge from patches that themselves introduced new vulnerabilities.

Load-bearing premise

Multi-dimensional knowledge pulled from historical vulnerability-patch pairs can be reliably retrieved and fused with freshly collected program facts to produce correct patches for vulnerabilities never seen during knowledge extraction.

What would settle it

Run KeaRepair on a fresh set of 50+ reproducible vulnerabilities drawn from post-2024 disclosures whose patches were never included in the historical knowledge bases and measure whether the repair rate remains above 70%.
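
One way to set up that test is to filter candidate cases by disclosure date, by CVE identifier, and by near-duplicate code before computing the repair rate. The sketch below is hypothetical throughout: the `Case` fields, the identifiers, the 0.9 similarity threshold, and the toy snippets are assumptions, and normalized edit distance is only one possible overlap measure.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass(frozen=True)
class Case:
    cve_id: str
    disclosed: date
    code: str       # vulnerable snippet
    repaired: bool  # outcome reported by the repair tool


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, keeping one row at a time."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        current = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """1 - normalized edit distance, after collapsing whitespace."""
    a, b = " ".join(a.split()), " ".join(b.split())
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def clean_holdout(cases: List[Case], kb: Dict[str, str], cutoff: date,
                  threshold: float = 0.9) -> List[Case]:
    """Keep cases after `cutoff` that match no knowledge-base entry by ID or by code."""
    kept = []
    for case in cases:
        if case.disclosed <= cutoff or case.cve_id in kb:
            continue
        if any(similarity(case.code, code) >= threshold for code in kb.values()):
            continue
        kept.append(case)
    return kept


def repair_rate(cases: List[Case]) -> float:
    if not cases:
        raise ValueError("no cases left after filtering")
    return sum(c.repaired for c in cases) / len(cases)


kb = {"KB-0001": "memcpy(dst, src, n);"}  # hypothetical knowledge-base entry
cases = [
    Case("EV-0001", date(2025, 3, 1), "strcpy(name, input);", True),
    Case("EV-0002", date(2025, 6, 1), "memcpy(dst,  src, n);", True),  # near-duplicate code
    Case("EV-0003", date(2023, 1, 1), "free(p); use(p);", True),       # before the cutoff
    Case("KB-0001", date(2025, 9, 1), "x[i] = 0;", True),              # ID already in the base
    Case("EV-0004", date(2026, 1, 15), "len = a * b;", False),
]
holdout = clean_holdout(cases, kb, cutoff=date(2024, 12, 31))
rate = repair_rate(holdout)
headline = round(46 / 55 * 100, 2)  # the reported 46-of-55 figure as a percentage
```

On this toy data only two cases survive the filter, which illustrates why the test asks for 50 or more candidates: the repair rate on a small clean set is too coarse to compare against a 70% threshold.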

Figures

Figures reproduced from arXiv: 2607.00820 by Bo Wang, David Lo, Hao Ma, Kangyi Ding, Le Yu, Linzhang Wang, Sicong Cao, Terry Yue Zhuo, Xiaobing Sun, Xiaolei Liu, Xingwei Lin.

Figure 1: An illustrative example using a real-world vulnerability (CVE-2017-6828) from the …
Figure 2: Overview of KeaRepair.
Figure 3: Overlapping among CVEs fixed by different approaches.
Figure 4: Parameter analysis of the weight coefficient λ. Results …
Original abstract

Frontier foundation models have changed the math on vulnerability discovery, but the bigger challenge is how the remediation side keeps up. Despite recent progress in Automated Vulnerability Repair (AVR), current solutions struggle to reliably identify the root causes of vulnerabilities, and insufficiently utilize the prior fix knowledge to guide the patch generation process, thus undermining their effectiveness in practice. To address this gap, we propose KeaRepair, a novel agentic AVR approach that grounds patch generation in verified program facts and high-level vulnerability knowledge. Specifically, KeaRepair first extracts multi-dimensional vulnerability knowledge from historical vulnerability-patch pairs from dual complementary views, and constructs dedicated retrieval knowledge bases. It then employs a tool-augmented agent that performs ReAct-style reasoning to collect verified program facts for vulnerability diagnosis. Finally, based on the diagnostic results, KeaRepair performs knowledge-level retrieval-augmented patch generation and iteratively refines patches through a closed-loop validation process involving compilation, PoC replay, and test-suite execution. Experimental results show that KeaRepair significantly outperforms existing AVR approaches on 55 reproducible C/C++ vulnerabilities. When paired with Gemini-3.1-Pro, KeaRepair successfully repairs 46 vulnerabilities, achieving a repair rate of 83.64%. Moreover, KeaRepair fixes six unique vulnerabilities that none of the baselines can address, and further demonstrates strong cross-language generalizability.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The manuscript introduces KeaRepair, an agentic automated vulnerability repair (AVR) system. It extracts multi-dimensional vulnerability knowledge from historical vulnerability-patch pairs to construct retrieval knowledge bases, deploys a tool-augmented ReAct-style agent to collect verified program facts for root-cause diagnosis, and performs retrieval-augmented patch generation followed by iterative closed-loop validation (compilation, PoC replay, test-suite execution). On 55 reproducible C/C++ vulnerabilities, KeaRepair paired with Gemini-3.1-Pro repairs 46 cases (83.64% rate), fixes six vulnerabilities unique to it among the baselines, and exhibits cross-language generalizability.

Significance. If the reported performance gains are shown to arise from genuine generalization rather than data overlap, the combination of agentic fact collection with RAG over historical fix knowledge would constitute a meaningful advance over prior AVR techniques that struggle with root-cause identification. The six unique fixes and cross-language results, if reproducible, would indicate practical utility for improving remediation reliability in security-critical codebases.

major comments (3)
  1. [Experimental evaluation / abstract] Experimental evaluation (abstract and § on results): The headline claim of 83.64% repair rate and six unique fixes on 55 C/C++ cases rests on the assumption that the evaluation vulnerabilities are disjoint from the historical pairs used to build the retrieval knowledge bases. No explicit hold-out protocol, temporal split, or overlap-detection procedure is described, which directly threatens the generalization interpretation of the results.
  2. [Experimental evaluation] Experimental evaluation: The abstract states strong empirical results but supplies no information on baseline implementations (e.g., whether they received equivalent knowledge bases or tool access), statistical significance testing, data exclusion criteria, or error analysis. These omissions make it impossible to verify whether the performance delta and uniqueness claims are supported by the experimental design.
  3. [Approach / knowledge extraction] Approach description (knowledge extraction step): The claim that multi-dimensional knowledge is extracted from 'dual complementary views' is central to the retrieval-augmented generation component, yet the manuscript provides no concrete definition of the dimensions, the extraction algorithm, or how the dedicated knowledge bases are indexed and queried. This lack of specificity prevents assessment of whether the reported gains are attributable to the proposed mechanism.
minor comments (1)
  1. [Abstract / Introduction] The abstract and introduction would benefit from a brief comparison table summarizing prior AVR methods' reported repair rates on comparable benchmarks to contextualize the 83.64% figure.
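
Major comment 2 asks for statistical significance testing. For paired repair outcomes on the same cases, one standard option is McNemar's test, which the simulated rebuttal also names. The sketch below computes the exact two-sided version using only the standard library. The outcome lists are toy data invented for illustration and do not come from the paper.

```python
from math import comb
from typing import List, Tuple


def mcnemar_exact(a: List[bool], b: List[bool]) -> Tuple[int, int, float]:
    """Exact two-sided McNemar test on paired binary outcomes.

    Returns (cases only A repairs, cases only B repairs, p-value).
    Only the discordant pairs carry information about a difference.
    """
    if len(a) != len(b):
        raise ValueError("outcome lists must be paired case by case")
    only_a = sum(1 for x, y in zip(a, b) if x and not y)
    only_b = sum(1 for x, y in zip(a, b) if y and not x)
    n = only_a + only_b
    if n == 0:
        return only_a, only_b, 1.0
    k = min(only_a, only_b)
    # Under the null hypothesis each discordant pair favors A or B with probability 1/2.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return only_a, only_b, min(1.0, 2 * tail)


# Toy outcomes for two hypothetical tools on the same eight cases.
tool_a = [True, True, True, True, True, True, False, False]
tool_b = [True, False, False, False, False, False, True, False]
only_a, only_b, p_value = mcnemar_exact(tool_a, tool_b)
```

With five cases favoring one tool and one favoring the other, the exact p-value is 14/64, about 0.22. This shows how a visible gap in repair counts can still fall short of significance on a small benchmark, which is the concern behind the comment.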

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below with clarifications and commit to revisions that enhance experimental transparency and methodological detail without altering the core claims.

  1. Referee: Experimental evaluation (abstract and § on results): The headline claim of 83.64% repair rate and six unique fixes on 55 C/C++ cases rests on the assumption that the evaluation vulnerabilities are disjoint from the historical pairs used to build the retrieval knowledge bases. No explicit hold-out protocol, temporal split, or overlap-detection procedure is described, which directly threatens the generalization interpretation of the results.

    Authors: We agree that an explicit protocol is required to support the generalization interpretation. In the revised manuscript we will add a new subsection under Experimental Setup that details the data collection timeline: historical pairs were drawn from CVE entries and patches through December 2022, while the 55 evaluation vulnerabilities were selected exclusively from 2023–2024 reports. We will also describe and report results from an automated overlap-detection procedure that matches CVE identifiers and computes normalized edit-distance similarity on vulnerable code snippets, confirming zero overlap. These additions directly address the concern. revision: yes

  2. Referee: Experimental evaluation: The abstract states strong empirical results but supplies no information on baseline implementations (e.g., whether they received equivalent knowledge bases or tool access), statistical significance testing, data exclusion criteria, or error analysis. These omissions make it impossible to verify whether the performance delta and uniqueness claims are supported by the experimental design.

    Authors: We acknowledge the omissions. The revised experimental section will include: (i) explicit descriptions of baseline re-implementations confirming they operated without our knowledge bases or agent tooling; (ii) statistical significance results using McNemar’s test on paired repair outcomes; (iii) precise exclusion criteria (availability of reproducible PoCs, successful compilation, and test-suite coverage) that yielded the final 55 cases from an initial larger pool; and (iv) a categorized error analysis of the nine unsuccessful repairs. These additions will allow independent verification of the reported deltas and uniqueness claims. revision: yes

  3. Referee: Approach description (knowledge extraction step): The claim that multi-dimensional knowledge is extracted from 'dual complementary views' is central to the retrieval-augmented generation component, yet the manuscript provides no concrete definition of the dimensions, the extraction algorithm, or how the dedicated knowledge bases are indexed and queried. This lack of specificity prevents assessment of whether the reported gains are attributable to the proposed mechanism.

    Authors: We will expand the Approach section with the requested concrete details. The dual complementary views are defined as: (1) the structural view, extracting syntactic patterns, vulnerable code regions, and patch diffs; and (2) the semantic view, extracting CWE types, root-cause rationales, and fix strategies from commit messages and documentation. Extraction proceeds via an LLM-assisted parser with rule-based post-validation on a 10% sample. Dedicated knowledge bases are stored as vector embeddings generated by a sentence-transformer model and indexed with FAISS for cosine-similarity retrieval. The revised text will include pseudocode, a worked example, and query templates to illustrate the full pipeline. revision: yes
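
Response 3 describes retrieval over two views by cosine similarity. A minimal sketch of that idea follows, with heavy assumptions: `embed` is a toy bag-of-words counter standing in for a sentence-transformer, the two knowledge-base entries are invented, and the weight `lam` is a hypothetical fusion parameter (the paper's Figure 4 mentions a weight coefficient λ, but its exact role is not stated on this page).

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a learned sentence embedding."""
    return Counter(re.findall(r"[a-z_]+", text.lower()))


def cosine(u: Counter, v: Counter) -> float:
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


# Hypothetical entries, one text per view.
KB: List[Dict[str, str]] = [
    {"id": "entry-1",
     "structural": "memcpy(dst, src, len) without length check",
     "semantic": "buffer overflow: validate length before copy"},
    {"id": "entry-2",
     "structural": "free(ptr) then ptr->field read",
     "semantic": "use after free: null the pointer after release"},
]


def retrieve(query_structural: str, query_semantic: str, kb: List[Dict[str, str]],
             lam: float = 0.5, top_k: int = 1) -> List[Tuple[float, str]]:
    """Rank entries by a lam-weighted sum of per-view cosine similarities."""
    qs, qm = embed(query_structural), embed(query_semantic)
    scored = []
    for entry in kb:
        score = (lam * cosine(qs, embed(entry["structural"]))
                 + (1 - lam) * cosine(qm, embed(entry["semantic"])))
        scored.append((score, entry["id"]))
    scored.sort(reverse=True)
    return scored[:top_k]


hits = retrieve("memcpy(buf, input, n) copies n bytes",
                "overflow because length is not validated", KB)
```

At scale the linear scan would be replaced by an index. With FAISS, cosine similarity is commonly obtained by L2-normalizing the vectors (`faiss.normalize_L2`) and searching an inner-product index such as `faiss.IndexFlatIP`; whether the authors configure it this way is not stated.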

Circularity Check

0 steps flagged

No circularity; empirical evaluation on separate test cases

The paper contains no equations, derivations, or self-referential definitions. All claims rest on experimental repair rates (46/55 cases) and comparisons to baselines. Knowledge bases are described as built from historical vulnerability-patch pairs, with evaluation presented on a distinct set of 55 reproducible C/C++ vulnerabilities. No quoted text shows test cases included in the knowledge extraction by construction, nor any fitted parameter renamed as a prediction. The approach is self-contained against external benchmarks and does not reduce to its inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central claim rests on the domain assumption that historical vulnerability data yields extractable, reusable multi-dimensional knowledge and that agent-driven fact collection plus retrieval can generalize to new cases; no free parameters or invented physical entities are described.

axioms (2)
  • domain assumption: Historical vulnerability-patch pairs contain extractable multi-dimensional knowledge that generalizes to new vulnerabilities
    Invoked in the description of knowledge base construction from dual complementary views.
  • domain assumption: ReAct-style tool-augmented reasoning can collect verified program facts sufficient for accurate vulnerability diagnosis
    Central to the agent component of the pipeline.
invented entities (1)
  • KeaRepair agentic system (no independent evidence)
    purpose: To perform knowledge-grounded vulnerability diagnosis and patch generation
    The paper introduces this as a novel integrated approach; no independent evidence is recorded because no external falsifiable prediction is supplied in the abstract.

pith-pipeline@v0.9.1-grok · 5799 in / 1612 out tokens · 40703 ms · 2026-07-02T08:31:45.683634+00:00 · methodology
