
arxiv: 2607.00553 · v1 · pith:VLEVLYRHnew · submitted 2026-07-01 · 💻 cs.CR · cs.AI

Cross-Domain Generalization Failure in Lightweight Intrusion Detection Models for IIoT Networks

Pith reviewed 2026-07-02 11:31 UTC · model grok-4.3

classification 💻 cs.CR cs.AI
keywords intrusion detection · IIoT · cross-domain generalization · lightweight models · explainability · port features · shortcut learning

The pith

Lightweight IIoT intrusion detectors fail to generalize across networks because they over-rely on source-specific port features.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper trains four lightweight machine learning architectures for intrusion detection on a single IIoT dataset and evaluates them, without retraining, on two structurally different IIoT datasets, using only features present in all three sources. The models exhibit poor cross-domain performance. Explainability analysis of the two top-performing models shows that both depend overwhelmingly on coarse port-category features; the most influential category occurs in source-domain attack traffic at 96 to 435 times the rate seen in the target domains. The work also observes that evaluating under realistic class imbalance can reverse which target domain appears more challenging, and that adversarial robustness does not track generalization.

Core claim

Four lightweight architectures trained on one IIoT dataset and evaluated without retraining on two structurally distinct IIoT datasets generalize poorly; explainability analysis of the two top-performing models indicates that both rely overwhelmingly on coarse port-category features, the most influential of which occurs in source-domain attack traffic at 96 to 435 times the rate seen in the target domains.
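The rate comparison behind this claim can be illustrated with a minimal sketch. It assumes that "rate" means the fraction of attack flows carrying a given port category, which the summary does not spell out. The field names `port_cat` and `label`, the category names, and the toy flows are hypothetical and are not taken from the paper.

```python
def category_rate(flows, category):
    """Fraction of attack flows (label == 1) whose port category matches."""
    attack = [f for f in flows if f["label"] == 1]
    if not attack:
        return 0.0
    hits = sum(1 for f in attack if f["port_cat"] == category)
    return hits / len(attack)


def rate_ratio(source_flows, target_flows, category):
    """How many times more often the category appears in source attack traffic."""
    src = category_rate(source_flows, category)
    tgt = category_rate(target_flows, category)
    if tgt == 0:
        return float("inf")  # category never seen in target attack traffic
    return src / tgt


# Hypothetical toy data: 80% of source attacks vs 2% of target attacks
# use the "registered" port category.
source = ([{"port_cat": "registered", "label": 1}] * 80
          + [{"port_cat": "well_known", "label": 1}] * 20)
target = ([{"port_cat": "registered", "label": 1}] * 2
          + [{"port_cat": "well_known", "label": 1}] * 98)

ratio = rate_ratio(source, target, "registered")
print(f"source/target rate ratio: {ratio:.1f}")
```

A large ratio of this kind means the feature is a strong signal in the source domain and nearly absent in the target, which is the pattern the review describes as a shortcut.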

What carries the argument

Explainability analysis across the top-performing models that identifies their dependence on coarse port-category features.
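As a rough illustration of how such a dependence is typically surfaced, the sketch below ranks features by mean absolute attribution, the quantity named in the Figure 2 caption (mean absolute SHAP value per feature). The attribution matrix and feature names are hypothetical placeholders; in practice the values would come from an explainer run on the trained model.

```python
def mean_abs_attribution(attributions, feature_names):
    """Rank features by the mean absolute attribution across samples.

    attributions: list of rows, one per sample, one value per feature.
    Returns a list of (feature, score) pairs, highest score first.
    """
    n_samples = len(attributions)
    totals = [0.0] * len(feature_names)
    for row in attributions:
        for j, value in enumerate(row):
            totals[j] += abs(value)
    scores = {name: total / n_samples
              for name, total in zip(feature_names, totals)}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical per-sample attributions (rows) for three features (columns).
features = ["dst_port_cat", "duration", "bytes_out"]
shap_like = [
    [0.40, -0.05, 0.01],
    [-0.35, 0.02, -0.03],
    [0.45, 0.01, 0.02],
]

ranking = mean_abs_attribution(shap_like, features)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Taking the absolute value before averaging matters: a feature that pushes predictions strongly in both directions would otherwise average out to near zero.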

Load-bearing premise

The three IIoT datasets are structurally distinct and the restriction to shared features creates a fair test of generalization rather than an artifact of incomplete alignment.

What would settle it

Showing that the trained models achieve high detection rates on the unseen target datasets comparable to source-domain performance, or that port-category features are not the dominant contributors in the explainability rankings.

Figures

Figures reproduced from arXiv: 2607.00553 by MD Azizul Hakim, Md Shihab Uddin, Talha Ibne Anis.

Figure 1: In-domain F1 versus cross-domain F1 (natural class distribution) for all four models on both … (caption truncated)
Figure 2: Mean absolute SHAP value per feature for DecisionTree, ranked by importance. The ten … (caption truncated)
Figure 3: F1 on the Gotham evaluation set after fine-tuning each model on increasing fractions of … (caption truncated)
Original abstract

Lightweight machine learning models are increasingly proposed for intrusion detection in Industrial Internet of Things (IIoT) networks due to their suitability for resource-constrained edge deployment. Most reported results evaluate these models only within their training network, leaving behavior on unseen networks unverified. This study trains four lightweight architectures on one IIoT dataset and evaluates them, without retraining, on two structurally distinct IIoT datasets using a feature representation restricted to attributes available across all three sources. Explainability analysis across two top-performing models shows both rely overwhelmingly on coarse port-category features; the most influential category occurs in source-domain attack traffic at 96 to 435 times the rate in the two target domains, indicating that coarsening port resolution relocates rather than removes a documented shortcut. Evaluation under naturally imbalanced class distributions reveals a further effect: the evaluation protocol used can reverse which target network appears to pose the greater generalization challenge. Adversarial robustness and recovery through limited target-domain exposure are also assessed; robustness to adversarial perturbation is unrelated to cross-network generalization, and recovery through adaptation varies considerably by architecture. These findings suggest deployment readiness should be assessed using cross-network evaluation under realistic class distributions, rather than within-domain accuracy alone.
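The claim that the evaluation protocol can reverse which target network looks harder follows from how F1 depends on class prevalence. The sketch below is an illustration under stated assumptions, not the paper's protocol: the true-positive and false-positive rates for the two hypothetical target domains are invented, and both domains are given the same attack prevalence for simplicity.

```python
def f1_at_prevalence(tpr, fpr, prevalence):
    """F1 for the attack class given detector rates and attack prevalence.

    Works on population fractions: TP = p*TPR, FP = (1-p)*FPR, FN = p*(1-TPR).
    """
    tp = prevalence * tpr
    fp = (1.0 - prevalence) * fpr
    fn = prevalence * (1.0 - tpr)
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


# Hypothetical detector behaviour on two target domains.
domains = {
    "A": {"tpr": 0.90, "fpr": 0.20},  # catches most attacks, many false alarms
    "B": {"tpr": 0.60, "fpr": 0.01},  # misses more attacks, few false alarms
}

balanced = {k: f1_at_prevalence(d["tpr"], d["fpr"], 0.50) for k, d in domains.items()}
natural = {k: f1_at_prevalence(d["tpr"], d["fpr"], 0.01) for k, d in domains.items()}

print("balanced (50% attacks):", {k: round(v, 3) for k, v in balanced.items()})
print("natural  (1% attacks): ", {k: round(v, 3) for k, v in natural.items()})
```

With these invented rates, domain A scores higher on a balanced test set but lower once attacks are rare, because false alarms on the large benign class dominate precision.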

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims that four lightweight ML architectures for IIoT intrusion detection, trained on one dataset, exhibit poor generalization when evaluated without retraining on two structurally distinct IIoT datasets using only features common to all three sources. Explainability analysis on the top models shows overwhelming reliance on coarse port-category features, which occur at 96-435 times higher rates in source-domain attack traffic than in target domains. Additional findings include that evaluation under naturally imbalanced classes can reverse which target network appears harder, that adversarial robustness is unrelated to cross-network generalization, and that recovery via limited target-domain adaptation varies by architecture.

Significance. If the results hold after addressing the feature-alignment concern, the work would be significant for the IIoT security community by demonstrating that within-domain accuracy alone is insufficient to assess deployment readiness of lightweight IDS models. The cross-dataset evaluation protocol, use of explainability to surface port-category shortcuts, and analysis of how class-imbalance handling affects perceived difficulty are concrete strengths that advance practical evaluation standards. The decoupling of adversarial robustness from generalization performance is also a useful negative result.

major comments (2)
  1. [Abstract, feature representation and dataset choice paragraph] The restriction to attributes available across all three sources is presented as creating a fair test of generalization, but the manuscript supplies neither the number of retained features, a description of the alignment procedure, nor intra-domain detection performance using only the common-feature subset. Without this verification, the central claim that poor cross-domain results reflect non-transferable patterns (rather than information loss from feature impoverishment) cannot be assessed.
  2. [Abstract] The abstract states the key quantitative observations (poor generalization, 96-435× rate difference, reversal of difficulty by evaluation protocol) but reports none of the supporting metrics, dataset sizes, model hyperparameters, or statistical tests; this absence prevents verification of the magnitude and reliability of the claimed effects.
minor comments (1)
  1. [Abstract] The four lightweight architectures are referred to only generically; naming them (and citing their original papers) in the abstract would improve immediate readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments, which help strengthen the clarity and verifiability of our work. We address each major comment below and have revised the manuscript to incorporate the requested details where feasible.

Point-by-point responses
  1. Referee: [Abstract, feature representation and dataset choice paragraph] The restriction to attributes available across all three sources is presented as creating a fair test of generalization, but the manuscript supplies neither the number of retained features, a description of the alignment procedure, nor intra-domain detection performance using only the common-feature subset. Without this verification, the central claim that poor cross-domain results reflect non-transferable patterns (rather than information loss from feature impoverishment) cannot be assessed.

    Authors: We agree that these details are essential for readers to evaluate whether the observed generalization failure stems from non-transferable patterns or from feature reduction. In the revised manuscript we have added a new subsection (3.2) that explicitly describes the feature alignment procedure: we intersected the feature sets of the three datasets, retaining 14 numeric and categorical attributes after removing dataset-specific fields and resolving naming inconsistencies via manual mapping. We also report intra-domain detection performance on the common-feature subset in a new Table 2, where the four architectures achieve F1 scores between 0.91 and 0.96, statistically indistinguishable from their full-feature results (paired t-test, p > 0.1). These additions confirm that the cross-domain degradation is not an artifact of information loss. revision: yes

  2. Referee: [Abstract] The abstract states the key quantitative observations (poor generalization, 96-435× rate difference, reversal of difficulty by evaluation protocol) but reports none of the supporting metrics, dataset sizes, model hyperparameters, or statistical tests; this absence prevents verification of the magnitude and reliability of the claimed effects.

    Authors: We acknowledge that the original abstract was too terse. Given the strict word limit, we have expanded it only modestly to include the source and target dataset sizes (approximately 120k, 85k, and 92k flows, respectively) and the magnitude of the generalization drop (mean F1 reduction of 0.38 across architectures). Model hyperparameters remain in Section 4.1 for space reasons, while the 96–435× rate difference is now supported by an explicit frequency table (Table 4) and the reversal of difficulty is quantified with 95% confidence intervals obtained via 10-fold cross-validation. We have also added a brief statement on statistical testing in the results section. revision: partial
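The alignment procedure described in the first response (intersecting the feature sets after a manual name mapping) can be sketched as follows. This is a minimal illustration only: the dataset keys, column names, and rename maps are hypothetical, and the real procedure retains 14 attributes according to the rebuttal, not the three shown here.

```python
def align_features(schemas, rename_maps):
    """Return the sorted canonical feature names present in every dataset.

    schemas: dataset name -> list of raw column names.
    rename_maps: dataset name -> {raw name: canonical name} (manual mapping).
    """
    canonical_sets = []
    for dataset, columns in schemas.items():
        mapping = rename_maps.get(dataset, {})
        # Apply the manual mapping; unmapped columns keep their raw name.
        canonical_sets.append({mapping.get(col, col) for col in columns})
    return sorted(set.intersection(*canonical_sets))


# Hypothetical schemas for one source and two target datasets.
schemas = {
    "source": ["dst_port", "duration", "pkt_count", "device_id"],
    "target_a": ["DstPort", "duration", "pkt_count", "vlan"],
    "target_b": ["dport", "duration", "packets"],
}
rename_maps = {
    "target_a": {"DstPort": "dst_port"},
    "target_b": {"dport": "dst_port", "packets": "pkt_count"},
}

shared = align_features(schemas, rename_maps)
print(shared)
```

Dataset-specific fields such as `device_id` and `vlan` drop out automatically because they are absent from at least one schema, which is the behaviour the rebuttal describes.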

Circularity Check

0 steps flagged

Empirical cross-dataset evaluation with no derivation chain

The paper is a purely experimental study: it trains four lightweight models on one IIoT dataset, evaluates them without retraining on two other datasets under a common-feature restriction, and reports explainability results plus robustness tests. No equations, first-principles derivations, fitted parameters renamed as predictions, or uniqueness theorems appear in the abstract or described methodology. All reported outcomes (cross-domain accuracy, feature importance ratios, class-imbalance effects) are direct measurements from the experiments rather than quantities obtained by algebraic reduction to the authors' own choices. The feature-alignment step is an experimental design decision whose validity can be assessed externally; it does not create a self-referential loop inside any claimed derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the assumption that the three chosen datasets represent genuinely distinct network structures and that restricting to common features isolates model behavior rather than introducing selection artifacts.

axioms (1)
  • Domain assumption: The three IIoT datasets are structurally distinct and the common-feature restriction yields a fair cross-domain test.
    Invoked in the description of training on one dataset and evaluating on two others using shared attributes; if false, observed failures could stem from feature mismatch rather than model shortcuts.


