Grounded Optimization: A Layered Engineering Framework for Reducing LLM Hallucination in Automated Personal Document Rewriting

Adarsh Agrawal; Shashank Indukuri

arXiv: 2607.01457 · v1 · pith:76EHTETYnew · submitted 2026-07-01 · cs.CL · cs.AI


Pith reviewed 2026-07-03 21:02 UTC · model grok-4.3


keywords LLM hallucination · resume optimization · grounded optimization · prompt grounding · hallucination mitigation · document rewriting · AI reliability


The pith

A five-layer framework reduces detected hallucinations in LLM resume rewriting from 2.48-5.36 to 0.04-0.24 per resume.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Grounded Optimization as a five-layer framework to cut specific hallucination failures when LLMs rewrite resumes for applicant tracking systems. The failures include anachronistic technology claims, cross-domain term mixing, structural changes, and outright fabrications. Ablation tests on 25 synthetic resumes across 14 industries show that undefended baselines produce several detected hallucinations per resume, while the full framework brings the rate near zero; its prompt-grounding layer alone reaches zero only at low temperature with a capable model. A reader would care because these tools now handle job-critical documents where errors can directly affect hiring outcomes.

Core claim

The authors establish that the Grounded Optimization framework, consisting of temporal context validation, deterministic contamination detection, structural invariant enforcement, prompt-level grounding, and an evaluator agent, lowers the overall detected hallucination rate to 0.04-0.24 per resume across three LLMs and four temperature settings, down from 2.48-5.36 in undefended baselines, while prompt-level grounding alone reaches zero detected hallucinations at low temperature with a capable model.
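To put the two reported ranges side by side, the short Python calculation below bounds the relative reduction they imply. It is an illustrative sketch, not the authors' analysis: the summary does not say which baseline condition pairs with which defended condition, so the code only computes the least and most favorable pairings of the range endpoints. The function name `relative_reduction` is a hypothetical helper.

```python
def relative_reduction(baseline: float, defended: float) -> float:
    """Fraction of baseline detected hallucinations removed by the defenses."""
    return 1.0 - defended / baseline


# Ranges quoted in the core claim (detected hallucinations per resume).
BASELINE_RANGE = (2.48, 5.36)
DEFENDED_RANGE = (0.04, 0.24)

# Least favorable pairing: lowest baseline against highest defended rate.
lower_bound = relative_reduction(BASELINE_RANGE[0], DEFENDED_RANGE[1])
# Most favorable pairing: highest baseline against lowest defended rate.
upper_bound = relative_reduction(BASELINE_RANGE[1], DEFENDED_RANGE[0])

print(f"relative reduction between {lower_bound:.1%} and {upper_bound:.1%}")
# prints "relative reduction between 90.3% and 99.3%"
```

Whatever the true pairing of conditions, the implied relative reduction in detected hallucinations falls inside this interval, provided the detectors themselves are trustworthy.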

What carries the argument

The five-layer Grounded Optimization framework: temporal validation, contamination detection, structural invariants, prompt grounding, and agent evaluation, which together aim to block anachronistic injection, terminology contamination, structural mutation, and fabrication.
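As a rough illustration of what the two deterministic checks named above might look like, the Python sketch below flags technologies that appear in an experience entry ending before the technology was released, and matches out-of-domain terms as whole tokens (so that, for example, "S3" is not found inside "MS365"). Everything here is an assumption made for illustration: the `Entry` type, the `RELEASE_YEAR` table, and the function names are hypothetical and are not taken from the authors' released code.

```python
import re
from dataclasses import dataclass

# Hypothetical release-year table; a real system would need a curated source.
RELEASE_YEAR = {"Kubernetes": 2014, "Docker": 2013, "ChatGPT": 2022}


@dataclass
class Entry:
    """One experience entry: the year it ended and its bullet points."""
    end_year: int
    bullets: list[str]


def mentions(term: str, text: str) -> bool:
    """Whole-token, case-insensitive match of term inside text."""
    pattern = r"(?<![A-Za-z0-9])" + re.escape(term) + r"(?![A-Za-z0-9])"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def temporal_violations(entry: Entry) -> list[str]:
    """Technologies named in an entry that ended before they were released."""
    return [
        tech
        for tech, year in RELEASE_YEAR.items()
        if year > entry.end_year and any(mentions(tech, b) for b in entry.bullets)
    ]


def contamination_hits(text: str, foreign_terms: set[str]) -> list[str]:
    """Out-of-domain terms that occur in text as whole tokens."""
    return sorted(t for t in foreign_terms if mentions(t, text))


entry = Entry(end_year=2012, bullets=["Deployed services on Kubernetes clusters"])
print(temporal_violations(entry))                                  # ['Kubernetes']
print(contamination_hits("Migrated mailboxes to MS365", {"S3"}))   # []
```

Both checks are pure functions of the text, which is what makes them deterministic: the same rewritten resume always yields the same flags, independent of model or temperature.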

If this is right

Temporal hallucinations fall by 50-95 percent across all tested conditions when the layers are active.
Prompt-level grounding by itself produces zero detected hallucinations at low temperature with a capable instruction-following model.
Higher temperatures and weaker models require the deterministic layers to maintain low hallucination rates.
The approach was tested on 25 synthetic resumes spanning 14 industries using three LLMs and six layer configurations.
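For a sense of the experiment's scale, the sketch below enumerates the ablation grid in Python. It assumes a fully crossed design with one rewrite per resume per condition, which the summary does not confirm; the model, temperature, and configuration names are placeholders, since the actual values are not given here.

```python
from itertools import product

# Placeholder labels: the summary gives only the counts, not the values.
models = ["model_a", "model_b", "model_c"]
temperatures = ["temp_1", "temp_2", "temp_3", "temp_4"]
layer_configs = [f"config_{i}" for i in range(1, 7)]
N_RESUMES = 25

# Assumption: every model is run at every temperature with every layer configuration.
conditions = list(product(models, temperatures, layer_configs))
total_rewrites = len(conditions) * N_RESUMES

print(len(conditions), "conditions,", total_rewrites, "rewrites")
# prints "72 conditions, 1800 rewrites"
```

Under that assumption each reported per-resume rate is an average over 25 rewrites, which is worth keeping in mind when reading rates as small as 0.04 (one detected hallucination across 25 resumes).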

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The released contamination taxonomy could be applied to measure similar hallucination patterns in LLM rewriting of other personal documents such as cover letters.
The layered structure offers a template for adding deterministic checks to other high-stakes LLM document tasks where synthetic test results need real-world validation.

Load-bearing premise

The synthetic resumes and the independent hallucination detectors used in the ablation experiments accurately capture and measure the real-world hallucination types without significant false positives or missed cases.

What would settle it

Running the framework on a collection of real user resumes, then having domain experts compare each output against the original facts for the four hallucination types, would confirm or refute whether the measured reductions occur outside synthetic test data.
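One way such an expert comparison could be scored is with precision and recall of the automated detectors against the expert labels. The Python sketch below is a minimal illustration under stated assumptions: each finding is represented as a hypothetical `"resume_id:type"` string, and the example labels are invented purely to exercise the function, not drawn from the paper's data.

```python
def precision_recall(detected: set[str], expert: set[str]) -> tuple[float, float]:
    """Precision and recall of detector findings against expert findings."""
    true_positives = len(detected & expert)
    # Convention assumed here: an empty set counts as a perfect score.
    precision = true_positives / len(detected) if detected else 1.0
    recall = true_positives / len(expert) if expert else 1.0
    return precision, recall


# Invented example findings, keyed as "resume_id:hallucination_type".
detected = {"r01:temporal", "r01:fabrication", "r02:structural"}
expert = {"r01:temporal", "r02:structural", "r03:contamination"}

precision, recall = precision_recall(detected, expert)
print(f"precision={precision:.2f} recall={recall:.2f}")
# prints "precision=0.67 recall=0.67"
```

Low precision would mean the detectors over-flag, inflating baseline counts; low recall would mean defended outputs could still contain hallucinations the reported rates never see.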

Figures

Figures reproduced from arXiv: 2607.01457 by Adarsh Agrawal, Shashank Indukuri.

Original abstract

Large language models (LLMs) are increasingly applied to resume optimization for applicant tracking systems, introducing hallucination failures distinct from general text generation: anachronistic technology injection, cross-domain terminology contamination, structural mutation, and content fabrication. We present Grounded Optimization, a five-layer framework combining temporal context validation, deterministic contamination detection, structural invariant enforcement, prompt-level grounding, and an evaluator agent. In ablation experiments across three LLMs, four temperature settings, and six layer configurations on 25 synthetic resumes spanning 14 industries, undefended baselines produce 2.48-5.36 detected hallucinations per resume. Among detectors independent of the active defenses, temporal hallucinations are reduced by 50-95% across all conditions; overall detected hallucination rate falls to 0.04-0.24. Prompt-level grounding alone achieves zero detected hallucinations at low temperature with a capable instruction-following model; higher temperatures and weaker models reveal the need for the deterministic layers as a complement. We release the contamination taxonomy, evaluation code, and raw data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The layered framework cuts detected hallucinations on synthetic resumes but depends on unvalidated custom detectors.


The main takeaway is that a five-layer stack (temporal checks, contamination detection, structural rules, prompt grounding, and an evaluator) drops detected hallucinations from 2.48-5.36 per resume to 0.04-0.24 across three models and four temperature settings. Prompt grounding alone reaches zero in the low-temperature, strong-model case.

What the paper does is apply established prompt and rule ideas to resume-specific failure modes like anachronistic tech injection and cross-domain contamination. The ablation table across layer combinations is clear, and releasing the taxonomy, code, and raw data on 25 synthetic resumes lets others inspect the setup.

The soft spot is the measurement itself. The numbers come from the authors' own detectors run on synthetic data, with no reported human validation, inter-annotator agreement, or tests on real resumes. If those detectors miss real cases or over-flag synthetic artifacts, the claimed reductions become hard to trust. The abstract does not address this.

This is for practitioners who need reliable LLM output on professional documents like resumes. It will not shift general hallucination theory but offers a concrete, testable pattern for one application.

Send it to review. The empirical work and open resources are enough to justify referee time, even if the detector validation needs strengthening.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces Grounded Optimization, a five-layer framework (temporal context validation, deterministic contamination detection, structural invariant enforcement, prompt-level grounding, and evaluator agent) to reduce four specific hallucination types in LLM-based resume rewriting. Ablation experiments across three LLMs, four temperatures, and six layer configurations on 25 synthetic resumes report baseline rates of 2.48-5.36 detected hallucinations per resume falling to 0.04-0.24 overall, with prompt-level grounding alone reaching zero at low temperature on capable models. The contamination taxonomy, evaluation code, and raw data are released.

Significance. If the empirical reductions hold under validated measurement, the work supplies a practical, composable engineering framework for constraining hallucinations in domain-specific LLM applications such as HR document processing. The explicit release of the taxonomy, code, and raw data is a clear strength that enables direct reproduction and extension by others.

major comments (2)

[Ablation experiments] The central quantitative claims (baseline 2.48-5.36 hallucinations/resume reduced to 0.04-0.24) rest on custom detectors for anachronistic injection, contamination, structural mutation, and fabrication applied to the 25 synthetic resumes. The manuscript provides no validation of these detectors (e.g., inter-annotator agreement, precision/recall against human labels on real resumes, or comparison to established hallucination benchmarks), which is load-bearing for the reported layer-wise reductions.
[Data and experimental setup] The construction and representativeness of the 25 synthetic resumes spanning 14 industries are not described in sufficient detail to determine whether they faithfully elicit the targeted real-world hallucination modes or whether detector outputs could be artifacts of the synthetic generation process itself.
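The inter-annotator agreement requested in the first major comment is commonly reported as Cohen's kappa. The Python sketch below shows the calculation for two annotators labeling the same items; it is offered as an illustration of the statistic, not as a protocol from the manuscript, and the example labels are invented.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Cohen's kappa: agreement between two annotators, corrected for chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's marginal label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    categories = count_a.keys() | count_b.keys()
    expected = sum(count_a[c] * count_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Invented labels: 1 = "hallucination present", 0 = "absent", for eight outputs.
annotator_a = [1, 1, 0, 0, 1, 0, 1, 0]
annotator_b = [1, 0, 0, 0, 1, 0, 1, 1]
print(cohens_kappa(annotator_a, annotator_b))  # 0.5
```

In the example the annotators agree on 6 of 8 items (0.75) while chance agreement is 0.5, giving a kappa of 0.5; the same function could compare a detector's output with a single human annotator.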

minor comments (1)

[Abstract and §4] The abstract states that 'detectors independent of the active defenses' were used, but the main text does not specify the exact protocol ensuring this independence across all six layer configurations.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments on our manuscript. The feedback highlights important considerations for the validation of our hallucination detectors and the description of our synthetic dataset. We respond to each major comment below and commit to revisions that address the concerns raised.


Referee: The central quantitative claims (baseline 2.48-5.36 hallucinations/resume reduced to 0.04-0.24) rest on custom detectors for anachronistic injection, contamination, structural mutation, and fabrication applied to the 25 synthetic resumes. The manuscript provides no validation of these detectors (e.g., inter-annotator agreement, precision/recall against human labels on real resumes, or comparison to established hallucination benchmarks), which is load-bearing for the reported layer-wise reductions.

Authors: We acknowledge the importance of validating the custom detectors. These detectors combine deterministic rules for temporal validation, contamination detection, and structural enforcement with an evaluator agent for fabrication. The full implementation is released with the paper to support reproducibility and external verification. We did not conduct inter-annotator agreement or human evaluation on real resumes due to challenges in accessing privacy-sensitive professional documents with reliable ground truth labels. We will add a new subsection in the discussion to explicitly address the detector design, its limitations, and the rationale for using synthetic data. Additionally, we will include a comparison to existing hallucination benchmarks where applicable to contextualize our approach.

revision: partial

Referee: The construction and representativeness of the 25 synthetic resumes spanning 14 industries are not described in sufficient detail to determine whether they faithfully elicit the targeted real-world hallucination modes or whether detector outputs could be artifacts of the synthetic generation process itself.

Authors: We agree that the synthetic resume construction requires more detailed exposition. The 25 resumes were generated to cover 14 industries with controlled variations designed to provoke each of the four hallucination types (e.g., inclusion of future-dated technologies for anachronisms). We will revise the experimental setup section to include the complete generation protocol, including the base templates, industry-specific adaptations, and the specific prompts or rules used to introduce hallucination-prone elements. This will allow readers to assess the fidelity to real-world scenarios and rule out generation artifacts.

revision: yes

Circularity Check

0 steps flagged

No circularity: empirical ablation results on synthetic data


The paper presents an engineering framework evaluated via ablation experiments on 25 synthetic resumes using custom detectors and a contamination taxonomy. No equations, derivations, fitted parameters renamed as predictions, or self-citations appear in the provided text. Results are reported as measured outcomes (e.g., hallucination rates dropping from 2.48-5.36 to 0.04-0.24) rather than quantities defined by construction from the authors' inputs or prior work. The measurement approach relies on author-defined detectors, but this is an empirical limitation, not a circular reduction in any derivation chain. The paper is self-contained as an applied study without load-bearing self-referential steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on the validity of the hallucination taxonomy and the synthetic dataset construction, neither of which is detailed in the abstract; no free parameters, axioms, or invented entities are explicitly introduced in the provided text.



