HTTP REST API Structure Learning

Amit Dvir; Ran Dubin

arxiv: 2607.02442 · v1 · pith:Y5H4DW7Mnew · submitted 2026-07-02 · 💻 cs.SE · cs.CR

Pith reviewed 2026-07-03 08:26 UTC · model grok-4.3

keywords: API security · anomaly detection · REST API · unsupervised learning · network traffic analysis · HTTP security · malicious request detection

The pith

HRAL learns REST API endpoint structures directly from network traffic to detect anomalies without documentation or rules.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces HRAL as an unsupervised anomaly detection method that builds models of normal REST API behavior and structure solely by observing network traffic. It does not depend on API documentation or hand-crafted rules to identify deviations that may signal attacks. This matters in practice because many deployed APIs lack complete or up-to-date documentation, which limits the reach of security tools that require such inputs. Evaluation across different levels of documentation completeness shows HRAL reaching average recall of 82.07 percent and F1-score of 87.24 percent while outperforming prior techniques in low-documentation settings. When paired with existing signature rules such as OWASP ModSecurity CRS, detection reaches 100 percent.

Core claim

HRAL is a novel unsupervised anomaly detection approach that models the structure and behavior of API endpoints directly from network traffic, without relying on predefined rules or documentation, enabling robust detection of malicious activity by understanding how APIs behave and flagging deviations as potential threats. It achieves an average recall of 82.07% and an F1-score of 87.24%, significantly outperforming alternatives when API documentation is limited, and reaches 100% detection when combined with OWASP ModSecurity CRS.
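The summary reports recall and F1 but not precision. Because F1 is the harmonic mean of precision and recall, a rough precision figure can be backed out of the two reported numbers. The sketch below does that arithmetic; it assumes, purely for illustration, that both figures describe the same set of predictions. Since the source gives averages over several documentation levels, the result is an indication only, not a number reported by the paper. The function name `implied_precision` is hypothetical.

```python
def implied_precision(recall: float, f1: float) -> float:
    """Solve F1 = 2PR / (P + R) for P, given R and F1 as fractions in [0, 1]."""
    denom = 2 * recall - f1
    if denom <= 0:
        # No precision in (0, 1] is consistent with this recall/F1 pair.
        raise ValueError("no valid precision for this recall/F1 pair")
    return f1 * recall / denom


# Reported averages from the summary: recall 82.07%, F1 87.24%.
# Assumption: both describe the same predictions (not confirmed by the source).
p = implied_precision(0.8207, 0.8724)
print(f"implied precision is roughly {p:.3f}")
```

Under that assumption the implied precision comes out near 0.93, which would mean most of the error sits in missed attacks rather than in false alarms.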

What carries the argument

HRAL, the unsupervised model that extracts and baselines endpoint structure and behavior patterns from raw network traffic for deviation-based detection.
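To make "baselining endpoint structure from traffic" concrete, here is a minimal sketch of the general idea. It is not HRAL's algorithm, which this summary does not describe. The ID-detection heuristic, the `{id}` placeholder, and the names `template_of` and `EndpointBaseline` are all assumptions made for illustration.

```python
import re
from collections import defaultdict
from urllib.parse import parse_qsl, urlsplit

# Assumed heuristic: all-digit segments and 36-character hex/dash strings
# (UUID-shaped) are treated as variable identifiers.
_ID_LIKE = re.compile(r"^(\d+|[0-9a-fA-F-]{36})$")


def template_of(path: str) -> str:
    """Collapse ID-like path segments so /users/42 and /users/7 share a template."""
    segments = [s for s in path.split("/") if s]
    return "/" + "/".join("{id}" if _ID_LIKE.match(s) else s for s in segments)


class EndpointBaseline:
    """Records which (method, path template) pairs and query names occur in benign traffic."""

    def __init__(self) -> None:
        self.params = defaultdict(set)  # (method, template) -> query parameter names

    @staticmethod
    def _key_and_names(method: str, url: str):
        parts = urlsplit(url)
        names = {name for name, _ in parse_qsl(parts.query, keep_blank_values=True)}
        return (method.upper(), template_of(parts.path)), names

    def learn(self, method: str, url: str) -> None:
        key, names = self._key_and_names(method, url)
        self.params[key].update(names)

    def is_anomalous(self, method: str, url: str) -> bool:
        key, names = self._key_and_names(method, url)
        if key not in self.params:
            return True  # unseen endpoint or unseen method on a known endpoint
        return not names <= self.params[key]  # unseen query parameter name


baseline = EndpointBaseline()
baseline.learn("GET", "/users/42?fields=name")
baseline.learn("GET", "/users/7")
print(baseline.is_anomalous("GET", "/users/99?debug=1"))
```

This sketch looks only at endpoint shape and parameter names. A malicious value inside a known parameter would pass unnoticed, so a real system would also need to model value types or content.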

If this is right

HRAL maintains high recall and F1 scores even when OpenAPI documentation is sparse or absent.
Performance approaches that of systems given complete API definitions.
Pairing the learned model with signature rules such as OWASP ModSecurity CRS yields complete detection coverage.
The method supports security in real-world environments where APIs are only partially documented.
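The pairing described above can be read as a logical OR over two detectors: a request is flagged if either the learned structural model or a signature rule fires. The sketch below illustrates that composition under stated assumptions. The three regular expressions are toy stand-ins and are not rules from the OWASP ModSecurity CRS, and the allowlist detector is a deliberately simple placeholder for a learned model. All function names are hypothetical.

```python
import re
from typing import Callable, Iterable

Detector = Callable[[str], bool]  # takes a raw request line, returns True if flagged

# Toy signatures for illustration only; NOT taken from the OWASP CRS.
_SIGNATURES = [
    re.compile(pattern, re.IGNORECASE)
    for pattern in (r"union\s+select", r"<script\b", r"\$\{jndi:")
]


def signature_detector(request_line: str) -> bool:
    """Flag a request if any toy signature matches anywhere in the line."""
    return any(sig.search(request_line) for sig in _SIGNATURES)


def make_allowlist_detector(known_paths: Iterable[str]) -> Detector:
    """Placeholder for a learned structural model: flag paths never seen in training."""
    known = set(known_paths)

    def detector(request_line: str) -> bool:
        parts = request_line.split(" ")
        target = parts[1] if len(parts) > 1 else ""
        return target.split("?")[0] not in known

    return detector


def combine_any(*detectors: Detector) -> Detector:
    """Flag a request when at least one detector flags it."""

    def combined(request_line: str) -> bool:
        return any(d(request_line) for d in detectors)

    return combined


structure = make_allowlist_detector(["/users"])
layered = combine_any(structure, signature_detector)
print(layered("GET /users?id=1 UNION SELECT password HTTP/1.1"))
```

An OR-combination can only keep or raise recall relative to each detector alone, but it also accumulates the false positives of both, so the trade-off needs to be measured on benign traffic as well.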

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Layered defenses could treat traffic-derived models as a default baseline that requires no manual documentation updates.
The same traffic-driven learning pattern could be tested on non-REST protocols or internal microservice meshes.
If traffic patterns prove stable across versions, the approach might reduce the documentation burden on API maintainers.

Load-bearing premise

Network traffic alone contains sufficient and representative information to model normal API endpoint structure and behavior without predefined rules or documentation.

What would settle it

A controlled test on a production API in which traffic logs are captured, a set of undocumented malicious requests is injected, and HRAL's detection rate falls well below the reported 82 percent recall.
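Scoring such a test comes down to comparing ground-truth labels (injected or benign) against the detector's verdicts. A minimal sketch of that bookkeeping follows; the function name `detection_metrics` and the toy labels are illustrative and not taken from the paper.

```python
from typing import Dict, Sequence


def detection_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute recall, precision, and F1 where 1 = malicious and 0 = benign."""
    if len(y_true) != len(y_pred):
        raise ValueError("label and prediction sequences must have equal length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"recall": recall, "precision": precision, "f1": f1}


# Toy run: four injected malicious requests followed by four benign ones.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
verdicts = [1, 1, 1, 0, 0, 0, 0, 1]  # one missed attack, one false alarm
print(detection_metrics(labels, verdicts))
```

In the toy run, three of four injected requests are caught, giving a recall of 0.75. A real test would compare the measured recall against the reported 82 percent on a much larger injected set.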

Figures

Figures reproduced from arXiv: 2607.02442 by Amit Dvir, Ran Dubin.

**Figure 1.** OpenAPI Request-Based Algorithm endpoint representation. By detecting abnormal patterns in API requests, we can identify changes to API endpoints, new APIs, or new attacks. In our evaluation, we have the following abnormal detection behaviors: Minimal/Basic/Full OpenAPI (Supervised) spec: The ground truth evaluation method does not utilize API understanding; instead, it relies solely on a comparison with t…

**Figure 2.** OpenAPI Request-based and Spec-building Algorithm.

Original abstract

Application Programming Interfaces (APIs) are essential in software development, enabling web services, mobile apps, and microservices. However, their widespread use introduces significant security risks, highlighting the importance of API security. This paper presents HTTP REST API Learning (HRAL), a novel unsupervised anomaly detection approach that models the structure and behavior of API endpoints directly from network traffic, without relying on predefined rules or documentation. HRAL enables robust detection of malicious activity by understanding how APIs behave and flagging deviations as potential threats. We evaluate HRAL across varying levels of OpenAPI documentation detail and compare it with existing techniques. HRAL achieves strong performance, with an average recall of 82.07% and an F1-score of 87.24%, significantly outperforming alternatives when API documentation is limited. Moreover, our results approach the effectiveness of full API document definitions. When combined with signature-based rules such as the OWASP ModSecurity CRS, our system achieves 100% detection. These results highlight HRAL's effectiveness in real-world, partially documented API environments and its potential as a foundational layer for modern API security solutions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper introduces HRAL for unsupervised REST API modeling from traffic with reported good metrics, but without methodology or data details the claims cannot be evaluated.

There are two things to know about this paper. First, it presents HRAL, a new unsupervised approach to learning the structure of HTTP REST APIs directly from network traffic for anomaly detection. Second, it reports an average recall of 82 percent and an F1-score of 87 percent, with complete detection when combined with OWASP rules. Both claims are framed around settings where API documentation is limited.

What the paper does well is address a practical issue in API security where documentation is often missing or incomplete. The unsupervised nature means it doesn't require predefined rules or full specs, which is a common constraint. It also shows how the method can be used alongside existing signature-based systems to improve overall detection. The comparison to alternatives when documentation is limited highlights a potential advantage in real-world settings.

The significant soft spots are in the evaluation and methodology. The abstract states the performance numbers but provides no information on the datasets used, the specific algorithm for modeling the API structure, the evaluation protocol, or any error analysis. This makes it impossible to verify whether the results are robust or whether they suffer from issues such as overfitting or biased test data. The assumption that network traffic alone contains enough information to model normal behavior is plausible but unexamined in the given text. Without these details, the soundness of the central claims is hard to judge.

This paper is aimed at researchers and practitioners in API security and network anomaly detection. A reader who works on building security tools for microservices or web APIs might find the high-level approach interesting as a starting point for traffic-based modeling. However, to get real value, the full paper would need to supply the missing technical details.

Overall, the work seems like it could be worth engaging with if the full manuscript includes reproducible methods and solid experiments. I would recommend that a serious editor send this to peer review rather than desk reject, because the problem is relevant and the unsupervised angle is worth checking out even if revisions are needed.

Referee Report

1 major / 0 minor

Summary. The paper presents HTTP REST API Learning (HRAL), a novel unsupervised anomaly detection approach that models the structure and behavior of REST API endpoints directly from network traffic without predefined rules or documentation. It evaluates the method across varying levels of OpenAPI documentation detail, claiming an average recall of 82.07% and F1-score of 87.24% that significantly outperforms alternatives when documentation is limited, approaches the effectiveness of full API definitions, and reaches 100% detection when combined with OWASP ModSecurity CRS.

Significance. If substantiated by rigorous evaluation, the work would represent a meaningful advance in API security by demonstrating a documentation-independent traffic-based modeling technique that can serve as a foundational layer alongside signature-based systems in partially documented environments.

major comments (1)

[Abstract] The central performance claims (average recall of 82.07% and F1-score of 87.24%) are stated without any description of the evaluation methodology, datasets, experimental protocol, baselines, or error analysis. This omission renders the quantitative results impossible to assess or reproduce.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed review and constructive comment on the abstract. We agree that additional context is needed there and will revise accordingly.

Referee: [Abstract] The central performance claims (average recall of 82.07% and F1-score of 87.24%) are stated without any description of the evaluation methodology, datasets, experimental protocol, baselines, or error analysis. This omission renders the quantitative results impossible to assess or reproduce.

Authors: We agree that the abstract would benefit from a concise description of the evaluation setup. In the revised version we will add a brief clause noting the use of real-world HTTP traffic traces, the protocol of testing across partial-to-full OpenAPI documentation levels, the comparison against signature-based and other unsupervised baselines, and that detailed methodology, results, and error analysis appear in Sections 4–5. This keeps the abstract within length limits while making the claims assessable.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity identified

The abstract and manuscript description present an unsupervised modeling approach from network traffic with reported empirical performance metrics (recall, F1-score) from comparisons to alternatives. No equations, derivations, self-citations, or fitted parameters are described that reduce any claim to its own inputs by construction. The evaluation relies on external benchmarks rather than internal definitions or self-referential predictions.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no information on free parameters, axioms, or invented entities.

pith-pipeline@v0.9.1-grok · 5712 in / 992 out tokens · 20841 ms · 2026-07-03T08:26:54.467593+00:00 · methodology
