Cross-Domain Generalization Failure in Lightweight Intrusion Detection Models for IIoT Networks
Pith reviewed 2026-07-02 11:31 UTC · model grok-4.3
The pith
Lightweight IIoT intrusion detectors fail to generalize across networks because they over-rely on source-specific port features.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Lightweight architectures trained on one IIoT dataset and evaluated without retraining on two structurally distinct IIoT datasets show poor generalization; explainability analysis indicates both rely overwhelmingly on coarse port-category features that occur at 96 to 435 times the rate in source-domain attack traffic compared to target domains.
What carries the argument
Explainability analysis across the top-performing models that identifies their dependence on coarse port-category features.
Load-bearing premise
The three IIoT datasets are structurally distinct and the restriction to shared features creates a fair test of generalization rather than an artifact of incomplete alignment.
What would settle it
Showing that the trained models achieve high detection rates on the unseen target datasets comparable to source-domain performance, or that port-category features are not the dominant contributors in the explainability rankings.
Figures
read the original abstract
Lightweight machine learning models are increasingly proposed for intrusion detection in Industrial Internet of Things (IIoT) networks due to their suitability for resource-constrained edge deployment. Most reported results evaluate these models only within their training network, leaving behavior on unseen networks unverified. This study trains four lightweight architectures on one IIoT dataset and evaluates them, without retraining, on two structurally distinct IIoT datasets using a feature representation restricted to attributes available across all three sources. Explainability analysis across two top-performing models shows both rely overwhelmingly on coarse port-category features; the most influential category occurs in source-domain attack traffic at 96 to 435 times the rate in the two target domains, indicating that coarsening port resolution relocates rather than removes a documented shortcut. Evaluation under naturally imbalanced class distributions reveals a further effect: the evaluation protocol used can reverse which target network appears to pose the greater generalization challenge. Adversarial robustness and recovery through limited target-domain exposure are also assessed; robustness to adversarial perturbation is unrelated to cross-network generalization, and recovery through adaptation varies considerably by architecture. These findings suggest deployment readiness should be assessed using cross-network evaluation under realistic class distributions, rather than within-domain accuracy alone.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that four lightweight ML architectures for IIoT intrusion detection, trained on one dataset, exhibit poor generalization when evaluated without retraining on two structurally distinct IIoT datasets using only features common to all three sources. Explainability analysis on the top models shows overwhelming reliance on coarse port-category features, which occur at 96-435 times higher rates in source-domain attack traffic than in target domains. Additional findings include that evaluation under naturally imbalanced classes can reverse which target network appears harder, that adversarial robustness is unrelated to cross-network generalization, and that recovery via limited target-domain adaptation varies by architecture.
Significance. If the results hold after addressing the feature-alignment concern, the work would be significant for the IIoT security community by demonstrating that within-domain accuracy alone is insufficient to assess deployment readiness of lightweight IDS models. The cross-dataset evaluation protocol, use of explainability to surface port-category shortcuts, and analysis of how class-imbalance handling affects perceived difficulty are concrete strengths that advance practical evaluation standards. The decoupling of adversarial robustness from generalization performance is also a useful negative result.
major comments (2)
- [Abstract (feature representation and dataset choice paragraph)] Abstract (feature representation and dataset choice paragraph): The restriction to attributes available across all three sources is presented as creating a fair test of generalization, but the manuscript supplies neither the number of retained features, a description of the alignment procedure, nor intra-domain detection performance using only the common-feature subset. Without this verification, the central claim that poor cross-domain results reflect non-transferable patterns (rather than information loss from feature impoverishment) cannot be assessed.
- [Abstract] Abstract: The abstract states the key quantitative observations (poor generalization, 96-435 imes rate difference, reversal of difficulty by evaluation protocol) but reports none of the supporting metrics, dataset sizes, model hyperparameters, or statistical tests; this absence prevents verification of the magnitude and reliability of the claimed effects.
minor comments (1)
- [Abstract] Abstract: The four lightweight architectures are referred to only generically; naming them (and citing their original papers) in the abstract would improve immediate readability.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments, which help strengthen the clarity and verifiability of our work. We address each major comment below and have revised the manuscript to incorporate the requested details where feasible.
read point-by-point responses
-
Referee: [Abstract (feature representation and dataset choice paragraph)] Abstract (feature representation and dataset choice paragraph): The restriction to attributes available across all three sources is presented as creating a fair test of generalization, but the manuscript supplies neither the number of retained features, a description of the alignment procedure, nor intra-domain detection performance using only the common-feature subset. Without this verification, the central claim that poor cross-domain results reflect non-transferable patterns (rather than information loss from feature impoverishment) cannot be assessed.
Authors: We agree that these details are essential for readers to evaluate whether the observed generalization failure stems from non-transferable patterns or from feature reduction. In the revised manuscript we have added a new subsection (3.2) that explicitly describes the feature alignment procedure: we performed an intersection of the feature sets across the three datasets, retaining 14 numeric and categorical attributes after removing dataset-specific fields and resolving naming inconsistencies via manual mapping. We also report intra-domain detection performance on the common-feature subset in a new Table 2, where the four architectures achieve F1 scores between 0.91 and 0.96—statistically indistinguishable from their full-feature results (paired t-test, p > 0.1). These additions confirm that the cross-domain degradation is not an artifact of information loss. revision: yes
-
Referee: [Abstract] Abstract: The abstract states the key quantitative observations (poor generalization, 96-435 times rate difference, reversal of difficulty by evaluation protocol) but reports none of the supporting metrics, dataset sizes, model hyperparameters, or statistical tests; this absence prevents verification of the magnitude and reliability of the claimed effects.
Authors: We acknowledge that the original abstract was too terse. Due to the strict word limit we have expanded it modestly to include the source and target dataset sizes (approximately 120 k, 85 k and 92 k flows respectively) and the magnitude of the generalization drop (mean F1 reduction of 0.38 across architectures). Model hyperparameters remain in Section 4.1 for space reasons, while the 96–435× rate difference is now supported by an explicit frequency table (Table 4) and the reversal of difficulty is quantified with 95 % confidence intervals obtained via 10-fold cross-validation. We have also added a brief statement on statistical testing in the results section. revision: partial
Circularity Check
Empirical cross-dataset evaluation with no derivation chain
full rationale
The paper is a purely experimental study: it trains four lightweight models on one IIoT dataset, evaluates them without retraining on two other datasets under a common-feature restriction, and reports explainability results plus robustness tests. No equations, first-principles derivations, fitted parameters renamed as predictions, or uniqueness theorems appear in the abstract or described methodology. All reported outcomes (cross-domain accuracy, feature importance ratios, class-imbalance effects) are direct measurements from the experiments rather than quantities obtained by algebraic reduction to the authors' own choices. The feature-alignment step is an experimental design decision whose validity can be assessed externally; it does not create a self-referential loop inside any claimed derivation.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption The three IIoT datasets are structurally distinct and the common-feature restriction yields a fair cross-domain test.
Reference graph
Works this paper leans on
-
[1]
Classification and regression trees.Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 1(1):14–23, 2011
Wei-Yin Loh. Classification and regression trees.Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 1(1):14–23, 2011
2011
-
[2]
An analysis of intrusion detection systems in IIoT
R Latha and RM Bommi. An analysis of intrusion detection systems in IIoT. In2023 Eighth International Conference on Science Technology Engineering and Mathematics (ICONSTEM), pages 1–10. IEEE, 2023
2023
-
[3]
Mohamed Amine Ferrag, Othmane Friha, Djallel Hamouda, Leandros Maglaras, and Helge Janicke. Edge-IIoTset: A new comprehensive realistic cyber security dataset of IoT and IIoT applications for centralized and federated learning.IEEE Access, 10:40281–40306, 2022. doi: 10.1109/ACCESS.2022.3165809
-
[4]
Machine learning-based network vulnerability analysis of industrial internet of things.IEEE Internet of Things Journal, 6(4):6822–6834, 2019
Maede Zolanvari, Marcio A Teixeira, Lav Gupta, Khaled M Khan, and Raj Jain. Machine learning-based network vulnerability analysis of industrial internet of things.IEEE Internet of Things Journal, 6(4):6822–6834, 2019
2019
-
[5]
Rana, Pietro Carnelli, and Aftab Khan
Othmane Belarbi, Theodoros Spyridopoulos, Eirini Anthi, Omer F. Rana, Pietro Carnelli, and Aftab Khan. Gotham dataset 2025: A reproducible large-scale IoT network dataset for intrusion detection and security research.arXiv preprint arXiv:2502.03134, 2025
-
[6]
Machine learning in network intrusion detection: A cross-dataset generalization study.IEEE Access, 12:144489–144508, 2024
Marco Cantone, Claudio Marrocco, and Alessandro Bria. Machine learning in network intrusion detection: A cross-dataset generalization study.IEEE Access, 12:144489–144508, 2024
2024
-
[7]
In-depth comparative evaluation of supervised machine learning approaches for detection of cybersecurity threats
Laurens D’hooge, Tim Wauters, Bruno V olckaert, and Filip De Turck. In-depth comparative evaluation of supervised machine learning approaches for detection of cybersecurity threats. In 4th International Conference on Internet of Things, Big Data and Security (IoTBDS), pages 125–136, 2019
2019
-
[8]
To- wards model generalization for intrusion detection: Unsupervised machine learning techniques
Miel Verkerken, Laurens D’hooge, Tim Wauters, Bruno V olckaert, and Filip De Turck. To- wards model generalization for intrusion detection: Unsupervised machine learning techniques. Journal of Network and Systems Management, 30(1):12, 2022
2022
-
[9]
Giovanni Apruzzese, Luca Pajola, and Mauro Conti. The cross-evaluation of machine learning- based network intrusion detection systems.IEEE Transactions on Network and Service Man- agement, 19(4):5152–5169, Dec 2022. doi: 10.1109/tnsm.2022.3157344. 15
-
[10]
Explainable cross-domain evaluation of ml-based network intrusion detection systems.Computers and Electrical Engineering, 108:108692, 2023
Siamak Layeghy and Marius Portmann. Explainable cross-domain evaluation of ml-based network intrusion detection systems.Computers and Electrical Engineering, 108:108692, 2023
2023
-
[11]
Troubleshooting an intrusion detection dataset: the cicids2017 case study
Gints Engelen, Vera Rimmer, and Wouter Joosen. Troubleshooting an intrusion detection dataset: the cicids2017 case study. In2021 IEEE Security and Privacy Workshops (SPW), pages 7–12. IEEE, 2021
2021
-
[12]
Efficient detection of intrusions in ton-iot dataset using hybrid feature selection approach.Scientific Reports, 2026
N Dharini, VS Janani, and Jeevaa Katiravan. Efficient detection of intrusions in ton-iot dataset using hybrid feature selection approach.Scientific Reports, 2026
2026
-
[13]
Dataset-centric evaluation of federated intrusion de- tection models in iot networks.Scientific Reports, 16(1):2683, 2026
Muhammad Ahmad Bilal, Ihtesham Ul Islam, Sarmad Idrees, Muhammad Qasim, Muham- mad Junaid Khan, and Jaleed Khan. Dataset-centric evaluation of federated intrusion de- tection models in iot networks.Scientific Reports, 16(1):2683, 2026. doi: 10.1038/ s41598-025-32567-w
2026
-
[14]
Towards adversarial realism and robust learning for iot intrusion detection and classification.Annals of Telecommunications, 78(7):401–412, 2023
João Vitorino, Isabel Praça, and Eva Maia. Towards adversarial realism and robust learning for iot intrusion detection and classification.Annals of Telecommunications, 78(7):401–412, 2023
2023
-
[15]
Islam Debicha, Benjamin Cochez, Tayeb Kenaza, Thibault Debatty, Jean-Michel Dricot, and Wim Mees. Review on the feasibility of adversarial evasion attacks and defenses for network intrusion detection systems.arXiv preprint arXiv:2303.07003, 2023. doi: 10.48550/arXiv.2303. 07003
-
[16]
Preparing network intrusion detection deep learning models with minimal data using adversarial domain adaptation
Ankush Singla, Elisa Bertino, and Dinesh Verma. Preparing network intrusion detection deep learning models with minimal data using adversarial domain adaptation. InProceedings of the 15th ACM Asia Conference on Computer and Communications Security, pages 127–140, 2020
2020
-
[17]
Deep transfer learning for intrusion detection in industrial control networks: A comprehensive review.Journal of Network and Computer Applications, 220:103760, 2023
Hamza Kheddar, Yassine Himeur, and Ali Ismail Awad. Deep transfer learning for intrusion detection in industrial control networks: A comprehensive review.Journal of Network and Computer Applications, 220:103760, 2023
2023
-
[18]
Nidhi Ahuja, Souvik Singha Roy, Nitin Kumar, and Raj Jain. Machine learning based intrusion detection system for software defined industrial internet of things networks.arXiv preprint arXiv:2103.01410, 2021
-
[19]
An investigation of categorical variable encoding techniques in machine learning: binary versus one-hot and feature hashing, 2018
Cedric Seger. An investigation of categorical variable encoding techniques in machine learning: binary versus one-hot and feature hashing, 2018
2018
-
[20]
Multilayer perceptron (MLP)
Hind Taud and Jean-François Mas. Multilayer perceptron (MLP). InGeomatic Approaches for Modeling Land Change Scenarios, pages 451–455. Springer, 2017
2017
-
[21]
State-of-the-art in 1d convolutional neural networks: A survey.IEEE Access, 12:144082–144105, 2024
Ayokunle Olalekan Ige and Malusi Sibiya. State-of-the-art in 1d convolutional neural networks: A survey.IEEE Access, 12:144082–144105, 2024
2024
-
[22]
Long short-term memory.Supervised Sequence Labelling with Recurrent Neural Networks, pages 37–45, 2012
Alex Graves. Long short-term memory.Supervised Sequence Labelling with Recurrent Neural Networks, pages 37–45, 2012
2012
-
[23]
C. J. van Rijsbergen.Information Retrieval. Butterworth-Heinemann, 2 edition, 1979
1979
-
[24]
A unified approach to interpreting model predictions
Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems, 30, 2017
2017
-
[25]
HopSkipJumpAttack: A query- efficient decision-based attack
Jianbo Chen, Michael I Jordan, and Martin J Wainwright. HopSkipJumpAttack: A query- efficient decision-based attack. In2020 IEEE Symposium on Security and Privacy (SP), pages 1277–1294. IEEE, 2020
2020
-
[26]
Maria-Irina Nicolae, Mathieu Sinn, Minh Ngoc Tran, Beat Buesser, Ambrish Rawat, Mar- tin Wistuba, Valentina Zantedeschi, Nathalie Baracaldo, Bryant Chen, Heiko Ludwig, et al. Adversarial robustness toolbox v1.0.0.arXiv preprint arXiv:1807.01069, 2018
-
[27]
Effective use of the McNemar test
Matilda QR Pembury Smith and Graeme D Ruxton. Effective use of the McNemar test. Behavioral Ecology and Sociobiology, 74(11):133, 2020
2020
-
[28]
Practical recommendations for gradient-based training of deep architectures
Yoshua Bengio. Practical recommendations for gradient-based training of deep architectures. InNeural Networks: Tricks of the Trade, pages 437–478. Springer, 2 edition, 2012. 16
2012
-
[29]
Energy-efficient deep learning-based intrusion detection system for edge computing: a novel DNN-KDQ model.Journal of Cloud Computing, 14:32,
Hafiz Gulfam Ahmad Umar et al. Energy-efficient deep learning-based intrusion detection system for edge computing: a novel DNN-KDQ model.Journal of Cloud Computing, 14:32,
-
[30]
doi: 10.1186/s13677-025-00762-9
-
[31]
From tiny machine learning to tiny deep learning: A survey, 2025
Shriyank Somvanshi, Md Monzurul Islam, Gaurab Chhetri, Rohit Chakraborty, Mahmuda Sul- tana Mimi, Sawgat Ahmed Shuvo, Kazi Sifatul Islam, Syed Aaqib Javed, Sharif Ahmed Rafat, Anandi Dutta, and Subasish Das. From tiny machine learning to tiny deep learning: A survey, 2025
2025
-
[32]
Wei Gao, Mingyue Wang, Yadong Pei, Fangwei Li, and Chaonan Wang. A lightweight multi- classification intrusion detection model for edge IoT networks.Electronics, 15(5):938, 2026. doi: 10.3390/electronics15050938
-
[33]
Jonathan Lundqvist, Anel Hadzic, Torstein Mo Kirkeluten, and Moritz P. N. Halkjelsvik. Lightweight machine learning models for intrusion detection on IoT devices.Norsk IKT- konferanse for forskning og utdanning (NIKT), 37(3), 2025. doi: 10.5324/jrxdjb92
-
[34]
Semachew Fasika Misrak and Henock Mulugeta Melaku. Lightweight intrusion detection system for IoT with improved feature engineering and advanced dynamic quantization.Discover Internet of Things, 5:97, 2025. doi: 10.1007/s43926-025-00203-8. 17
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.