Do Newer Lightweight CNNs Perform Better Under Resource Constraints? A Controlled Multigenerational Study of Architecture, Initialization, Training Budget, and Efficiency

Tasnim Shahriar

arxiv: 2607.01984 · v1 · pith:4WMQJCSCnew · submitted 2026-07-02 · 💻 cs.LG · cs.AI· cs.CV

Do Newer Lightweight CNNs Perform Better Under Resource Constraints? A Controlled Multigenerational Study of Architecture, Initialization, Training Budget, and Efficiency

Tasnim Shahriar This is my paper

Pith reviewed 2026-07-03 17:18 UTC · model grok-4.3

classification 💻 cs.LG cs.AIcs.CV

keywords lightweight CNNmodel comparisonefficiencyPareto frontierCIFARTiny ImageNetMobileNetEfficientNet

0 comments

The pith

Controlled tests find newer lightweight CNNs deliver selective rather than universal gains in accuracy and efficiency.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper compares nine lightweight CNN model packages on CIFAR-10, CIFAR-100, and Tiny ImageNet using one shared training protocol and evaluation setup. It isolates architecture effects to check whether later designs consistently beat earlier ones on both accuracy and measured resource use. Results show EfficientNetV2-S leading accuracy on two datasets while EfficientNet-B0 stays within a few points using far fewer parameters and operations and lands on every Pareto frontier. MobileNetV3-Small also beats its successor MobileNetV4-Conv-Small in accuracy and speed under the fixed conditions. Practitioners selecting models for constrained hardware therefore cannot assume newer releases will dominate without running controlled comparisons.

Core claim

Under a shared downstream training protocol, newer lightweight CNN designs provide selective rather than universal gains. EfficientNetV2-S reaches the highest top-1 accuracy on CIFAR-10 and CIFAR-100, yet EfficientNet-B0 remains within 0.22 to 1.79 points of the best result while using roughly 79 percent fewer parameters and 86 percent fewer GMACs and appearing on every accuracy-resource Pareto frontier. MobileNetV3-Small records higher accuracy and lower measured latency than MobileNetV4-Conv-Small on all three datasets, and latency orderings shift between GPU and CPU hardware. Scratch training leaves EfficientNet-B0 well below its pretrained performance even after extended epochs.

What carries the argument

The shared downstream training protocol and evaluation setup that holds initialization, data augmentation, optimizer, and epoch budget fixed across nine model packages.

If this is right

EfficientNet-B0 remains competitive across datasets while using substantially fewer parameters and GMACs than later designs.
MobileNetV3-Small achieves lower GMAC count, faster CPU inference, and higher accuracy than MobileNetV4-Conv-Small under identical conditions.
Latency rankings differ sharply between NVIDIA L4 GPU and AMD Ryzen CPU, showing GMACs alone do not predict measured inference performance.
SqueezeNet1.1 records the fewest parameters and lowest peak CUDA memory but substantially weaker accuracy.
EfficientNet-B0 stays 3 to 17 points below its pretrained accuracy after 100 epochs of scratch training.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Model selection for edge devices benefits from empirical Pareto analysis rather than assuming later releases dominate.
The interaction between architecture and fixed training budget can favor certain earlier designs over their successors.
Hardware-specific latency measurements are required because operation counts do not reliably rank real inference speed.
Revisiting well-tuned older models may yield better efficiency trade-offs than adopting the newest variants without controlled re-evaluation.

Load-bearing premise

The shared downstream training protocol and evaluation setup fairly represents typical real-world use cases without systematic biases from initialization or optimization details that favor one generation of models.

What would settle it

Re-training the same nine models with each architecture's originally published training recipe and hyperparameters reverses the observed accuracy ordering between MobileNetV3-Small and MobileNetV4-Conv-Small on CIFAR-10.

Figures

Figures reproduced from arXiv: 2607.01984 by Tasnim Shahriar.

**Figure 1.** Figure 1: Pretrained top-1 test accuracy. Each model uses the same color and marker identity throughout the paper. Full top-5 results are provided in Appendix A.1. 4.2 Static resources, peak memory, and point estimate Pareto frontiers [PITH_FULL_IMAGE:figures/full_fig_p007_1.png] view at source ↗

**Figure 2.** Figure 2: Top-1 accuracy against parameters and GMACs. Black lines connect point estimate Pareto-efficient models in order of increasing resource demand. Across CIFAR-10 and CIFAR-100, the parameter frontier contains SqueezeNet1.1, ShuffleNetV2 x1.0, MobileNetV3-Small, EfficientNet-B0, and EfficientNetV2-S. The GMAC frontier contains MobileNetV3-Small, EfficientNet-B0, and EfficientNetV2-S. Tiny ImageNet changes the… view at source ↗

**Figure 3.** Figure 3: Top-1 accuracy against NVIDIA L4 median latency and Ryzen CPU one-thread median latency. Point estimate Pareto frontiers differ between the evaluated execution environments. The execution-environment-specific ranking shift is substantial. ResNet18 ranks first on the L4 9 [PITH_FULL_IMAGE:figures/full_fig_p009_3.png] view at source ↗

**Figure 4.** Figure 4: Latency rank changes across the evaluated NVIDIA L4 FP16 autocast and AMD Ryzen 5 5500U FP32 environments. Descriptive Spearman rank correlations [32] reinforce the same conclusion. GMACs correlate strongly with CPU median latency, ρ = 0.95 for one thread and ρ = 0.93 for four threads, but weakly with L4 median latency, ρ = 0.27. L4 and one-thread CPU latency rankings correlate only moderately, ρ = 0.40. T… view at source ↗

**Figure 5.** Figure 5: EfficientNet-B0 scratch validation accuracy against cumulative recorded training time, with separate endpoint bars for pretrained, maximum 20 epoch scratch, and fresh 100 epoch scratch schedules. The scratch curves represent distinct runs rather than one continued schedule. 4.6 MobileNetV3-Small and MobileNetV4-Conv-S Under pretrained initialization, MobileNetV3-Small records higher observed test accuracy … view at source ↗

**Figure 6.** Figure 6: MobileNetV3-Small and MobileNetV4-Conv-S final accuracy, scratch validation accuracy by epoch, and scratch validation accuracy by cumulative recorded training time. The result supports a conditional conclusion. The evaluated later-generation MobileNetV4-Conv-S model package does not improve upon MobileNetV3-Small under the tested public checkpoints, downstream recipe, scratch budget, CPU environment, param… view at source ↗

**Figure 7.** Figure 7: Selected paired test-set accuracy differences and 95% bootstrap intervals. Open markers indicate intervals that cross zero, while filled markers indicate intervals that exclude zero. These intervals condition on one trained model per setting and do not capture training-seed variability. The numerical paired-comparison table is provided in Appendix A.4. 5 Discussion 5.1 Do newer lightweight CNNs perform bet… view at source ↗

**Figure 8.** Figure 8: Top-1 accuracy against peak PyTorch CUDA allocated tensor memory during batch size 1 FP16 autocast inference. Frontiers are calculated independently for each dataset. A.3 Descriptive rank associations [PITH_FULL_IMAGE:figures/full_fig_p016_8.png] view at source ↗

read the original abstract

Newer lightweight convolutional neural networks are often presented as improving predictive performance and deployment efficiency, but such claims require controlled evaluation. This study compares nine lightweight CNN model packages across CIFAR-10, CIFAR-100, and Tiny ImageNet under a shared downstream protocol. We report top-1 accuracy, macro F1, top-5 accuracy, parameter count, FP32 storage, GMACs, batch-size-1 latency on an NVIDIA L4 and AMD Ryzen 5 5500U CPU, peak PyTorch CUDA allocated tensor memory, and point estimate Pareto frontiers. EfficientNetV2-S achieves the highest observed top-1 accuracy on CIFAR-10 and CIFAR-100 at 97.57% and 86.98%, while RepViT-M1.0 leads Tiny ImageNet at 79.87%. EfficientNet-B0 remains within 0.22, 0.85, and 1.79 percentage points of the best result on the three datasets while using approximately 79% fewer parameters and 86% fewer GMACs than EfficientNetV2-S. It also appears on every evaluated accuracy and resource Pareto frontier, making it the most consistently competitive intermediate-budget option. MobileNetV3-Small has the lowest GMAC count, is the fastest model under both CPU thread settings, and records higher observed accuracy than MobileNetV4-Conv-S on all three datasets. Under random initialization, it leads MobileNetV4-Conv-S by 2.55, 1.76, and 0.99 points, with paired test-set intervals excluding zero for the fixed trained models. EfficientNet-B0 remains 3.29, 10.10, and 17.54 points below its pretrained counterpart after 100 epochs of scratch training, despite requiring about five times the recorded training time. SqueezeNet1.1 has the fewest parameters and lowest peak CUDA allocation, but substantially weaker accuracy. Latency rankings differ sharply between the L4 and CPU environments, showing that GMACs alone do not predict measured inference performance. Overall, newer designs provide selective rather than universal gains

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Newer lightweight CNNs deliver selective gains at best under a shared training protocol; EfficientNet-B0 stays competitive on accuracy and efficiency across datasets.

read the letter

The main takeaway is that this controlled comparison finds no universal advantage for newer lightweight CNNs. Under one fixed downstream recipe, EfficientNet-B0 stays within a couple points of the best accuracy while using far fewer parameters and GMACs, MobileNetV3-Small beats MobileNetV4-Conv-S on all three datasets, and latency rankings flip between GPU and CPU. The paper supplies concrete numbers on accuracy, F1, storage, compute, memory, and hardware latency for nine models on CIFAR-10/100 and Tiny ImageNet, plus Pareto frontiers.

What stands out is the multigenerational setup run under identical conditions. That produces usable side-by-side data on where the claimed improvements actually appear and where they do not. The observation that GMACs alone do not predict measured latency is also worth noting for anyone deploying on mixed hardware.

The soft spot is the single shared protocol. Newer models were originally published with different augmentation schedules, progressive learning, and optimizers. Applying the same 100-epoch random-init recipe can compress their measured gains relative to older models whose original papers used simpler settings. The abstract already shows large scratch-to-pretrained gaps and the MobileNetV3 edge, but without per-model hyperparameter sweeps or ablations the results remain tied to this particular training choice. Methods details on statistical testing and run-to-run variance are also thin in the provided text.

This is for practitioners who need to pick a lightweight model for image classification under tight resource budgets and want numbers from one consistent setup rather than cherry-picked claims. It is not introducing new architectures or theory.

I would send it to peer review. The empirical contribution is narrow but the controlled framing makes the data usable, and referees can press on the protocol fairness and missing variance estimates.

Referee Report

2 major / 2 minor

Summary. The paper performs a controlled empirical comparison of nine lightweight CNN packages (EfficientNet-B0/V2-S, MobileNetV3-Small/V4-Conv-S, RepViT-M1.0, SqueezeNet1.1 and others) on CIFAR-10, CIFAR-100 and Tiny ImageNet. Using a single shared downstream training protocol (100 epochs, fixed initialization/augmentation/optimizer), it reports top-1 accuracy, macro F1, top-5 accuracy, parameter count, GMACs, GPU/CPU latency, peak CUDA memory and accuracy-resource Pareto frontiers. Main claims are that EfficientNetV2-S leads CIFAR accuracy, EfficientNet-B0 remains competitive with far lower resources and appears on all frontiers, MobileNetV3-Small outperforms MobileNetV4-Conv-S under the protocol (including under random init), and newer designs yield only selective rather than universal gains.

Significance. If the shared protocol is architecture-neutral, the work supplies concrete, multi-metric evidence that newer lightweight CNNs do not deliver uniform improvements under fixed resource budgets, underscoring the value of older baselines like EfficientNet-B0 and the poor correlation between GMACs and measured latency. The explicit Pareto analysis and cross-environment latency comparison are useful for practitioners choosing models under deployment constraints.

major comments (2)

[Methods (training protocol)] Methods (training protocol description): The central claim that 'newer designs provide selective rather than universal gains' rests on the assumption that the fixed 100-epoch shared recipe is fair across generations. Newer models (EfficientNetV2-S, MobileNetV4) were originally published with progressive learning, distinct RandAugment policies and optimizer schedules; no ablation or per-model hyperparameter search is reported to isolate architecture from protocol mismatch. This directly affects interpretation of the MobileNetV3-Small vs MobileNetV4-Conv-S gap and the EfficientNet-B0 competitiveness result.
[Results (accuracy and Pareto sections)] Results (accuracy tables and Pareto frontiers): Only point estimates are shown for most comparisons; while paired intervals are mentioned for the MobileNetV3/V4 case, no multiple random seeds, statistical tests, or variance estimates are provided for the 0.22–1.79 pp gaps cited for EfficientNet-B0 or the RepViT Tiny-ImageNet lead. This weakens the load-bearing claim that EfficientNet-B0 'remains within X points of the best result' and appears on every frontier.

minor comments (2)

[Abstract and Results] Abstract and results: The phrase 'point estimate Pareto frontiers' is used without clarifying whether the frontiers are constructed from single runs or whether uncertainty bands are considered.
[Methods] The manuscript should explicitly state the exact data-augmentation pipeline, learning-rate schedule, and optimizer hyperparameters used in the shared protocol so readers can assess compatibility with each model's original recipe.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We respond to each major point below, focusing on the controlled nature of the study.

read point-by-point responses

Referee: Methods (training protocol description): The central claim that 'newer designs provide selective rather than universal gains' rests on the assumption that the fixed 100-epoch shared recipe is fair across generations. Newer models (EfficientNetV2-S, MobileNetV4) were originally published with progressive learning, distinct RandAugment policies and optimizer schedules; no ablation or per-model hyperparameter search is reported to isolate architecture from protocol mismatch. This directly affects interpretation of the MobileNetV3-Small vs MobileNetV4-Conv-S gap and the EfficientNet-B0 competitiveness result.

Authors: The manuscript's contribution is a controlled comparison under one shared downstream protocol (explicitly described in Methods and the abstract) to isolate architecture effects rather than to optimize each model individually. This fixed-recipe design is the basis for the 'selective rather than universal gains' claim. We will add an explicit statement in the Methods and Discussion sections clarifying that no per-model hyperparameter search or protocol adaptation was performed, so results reflect performance under the common recipe and not necessarily the best attainable accuracy for each architecture. revision: partial
Referee: Results (accuracy tables and Pareto frontiers): Only point estimates are shown for most comparisons; while paired intervals are mentioned for the MobileNetV3/V4 case, no multiple random seeds, statistical tests, or variance estimates are provided for the 0.22–1.79 pp gaps cited for EfficientNet-B0 or the RepViT Tiny-ImageNet lead. This weakens the load-bearing claim that EfficientNet-B0 'remains within X points of the best result' and appears on every frontier.

Authors: Paired test-set intervals were computed and reported for the MobileNetV3/V4 comparison. For the remaining gaps we report single-run point estimates, which is standard in many architecture benchmark papers. We acknowledge that multi-seed variance would strengthen the claims. We will add a limitations paragraph noting the single-seed point estimates for most metrics and the consequent reliance on observed differences rather than statistical tests. revision: partial

Circularity Check

0 steps flagged

Purely empirical comparison; no derivations or self-referential predictions

full rationale

The manuscript performs a controlled multigenerational empirical study of nine lightweight CNN packages on CIFAR-10/100 and Tiny ImageNet. It reports measured top-1 accuracy, F1, latency, GMACs, memory, and Pareto frontiers under one fixed downstream protocol. No equations, fitted parameters, uniqueness theorems, or ansatzes appear; the headline claim of selective rather than universal gains is a direct summary of the tabulated experimental outcomes. No load-bearing step reduces to a self-citation or to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no information on specific free parameters, axioms, or invented entities; the study relies on standard CNN training and evaluation practices whose details are not stated.

pith-pipeline@v0.9.1-grok · 5943 in / 1036 out tokens · 23802 ms · 2026-07-03T17:18:43.589421+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

35 extracted references · 35 canonical work pages · 2 internal anchors

[1]

Deep learning,

Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,”Nature, vol. 521, no. 7553, pp. 436–444, 2015

work page 2015
[2]

ImageNet: A large-scale hierarchical image database,

J. Deng, W. Dong, R. Socher, L. Li, K. Li, and L. Fei-Fei, “ImageNet: A large-scale hierarchical image database,” inProc. IEEE Conf. Comput. Vis. Pattern Recognit., 2009, pp. 248–255

work page 2009
[3]

Learning multiple layers of features from tiny images,

A. Krizhevsky, “Learning multiple layers of features from tiny images,” University of Toronto, Technical Report, 2009

work page 2009
[4]

Tiny ImageNet visual recognition challenge,

Y. Le and X. Yang, “Tiny ImageNet visual recognition challenge,” Stanford CS231n Course Project, 2015

work page 2015
[5]

Deep residual learning for image recognition,

K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” inProc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 770–778

work page 2016
[6]

SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size

F. N. Iandola et al., “SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and less than 0.5MB model size,” arXiv:1602.07360, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016
[7]

MobileNetV2: Inverted residuals and linear bottlenecks,

M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, “MobileNetV2: Inverted residuals and linear bottlenecks,” inProc. IEEE Conf. Comput. Vis. Pattern Recognit., 2018, pp. 4510–4520

work page 2018
[8]

ShuffleNet V2: Practical guidelines for efficient CNN architecture design,

N. Ma, X. Zhang, H.-T. Zheng, and J. Sun, “ShuffleNet V2: Practical guidelines for efficient CNN architecture design,” inProc. Eur. Conf. Comput. Vis., 2018, pp. 116–131. 17

work page 2018
[9]

Searching for MobileNetV3,

A. Howard et al., “Searching for MobileNetV3,” inProc. IEEE Int. Conf. Comput. Vis., 2019, pp. 1314–1324

work page 2019
[10]

EfficientNet: Rethinking model scaling for convolutional neural networks,

M. Tan and Q. Le, “EfficientNet: Rethinking model scaling for convolutional neural networks,” inProc. Int. Conf. Mach. Learn., 2019, pp. 6105–6114

work page 2019
[11]

EfficientNetV2: Smaller models and faster training,

M. Tan and Q. Le, “EfficientNetV2: Smaller models and faster training,” inProc. Int. Conf. Mach. Learn., 2021, pp. 10096–10106

work page 2021
[12]

MobileOne: An improved one millisecond mobile backbone,

P. K. A. Vasu, J. Gabriel, J. Zhu, O. Tuzel, and A. Ranjan, “MobileOne: An improved one millisecond mobile backbone,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2023, pp. 7907–7917

work page 2023
[13]

Run, Don’t Walk: Chasing higher FLOPS for faster neural networks,

J. Chen et al., “Run, Don’t Walk: Chasing higher FLOPS for faster neural networks,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2023, pp. 12021–12031

work page 2023
[14]

FastViT: A fast hybrid vision transformer using structural reparameterization,

P. K. A. Vasu, J. Gabriel, J. Zhu, O. Tuzel, and A. Ranjan, “FastViT: A fast hybrid vision transformer using structural reparameterization,” inProc. IEEE/CVF Int. Conf. Comput. Vis., 2023, pp. 5785–5795

work page 2023
[15]

RepViT: Revisiting mobile CNN from ViT perspective,

A. Wang, H. Chen, Z. Lin, J. Han, and G. Ding, “RepViT: Revisiting mobile CNN from ViT perspective,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2024, pp. 15909–15920

work page 2024
[16]

MobileNetV4: Universal models for the mobile ecosystem,

D. Qin et al., “MobileNetV4: Universal models for the mobile ecosystem,” inProc. Eur. Conf. Comput. Vis., 2024, pp. 78–96

work page 2024
[17]

Rewrite the Stars,

X. Ma, X. Dai, Y. Bai, Y. Wang, and Y. Fu, “Rewrite the Stars,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2024, pp. 5694–5703

work page 2024
[18]

LSNet: See Large, Focus Small,

A. Wang, H. Chen, Z. Lin, J. Han, and G. Ding, “LSNet: See Large, Focus Small,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2025, pp. 9718–9729

work page 2025
[19]

UniConvNet: Expanding effective receptive field while maintaining asymptotically Gaussian distribution for ConvNets of any scale,

Y. Wang and W. Xi, “UniConvNet: Expanding effective receptive field while maintaining asymptotically Gaussian distribution for ConvNets of any scale,” inProc. IEEE/CVF Int. Conf. Comput. Vis., 2025, pp. 20922–20933

work page 2025
[20]

Do better ImageNet models transfer better?

S. Kornblith, M. Norouzi, H. Lee, and G. Hinton, “Do better ImageNet models transfer better?” inProc. IEEE Conf. Comput. Vis. Pattern Recognit., 2019, pp. 2661–2671

work page 2019
[21]

Rethinking ImageNet pre-training,

K. He, R. Girshick, and P. Dollár, “Rethinking ImageNet pre-training,” inProc. IEEE Int. Conf. Comput. Vis., 2019, pp. 4918–4927

work page 2019
[22]

Battle of the backbones: A large-scale comparison of pretrained models across computer vision tasks,

M. Goldblum et al., “Battle of the backbones: A large-scale comparison of pretrained models across computer vision tasks,” inAdvances in Neural Information Processing Systems, vol. 36, Datasets and Benchmarks Track, 2023

work page 2023
[23]

A comprehensive study of transfer learning under constraints,

T. Pégeot, I. Kucher, A. Popescu, and B. Delezoide, “A comprehensive study of transfer learning under constraints,” inProc. IEEE/CVF Int. Conf. Comput. Vis. Workshops, 2023, pp. 1148–1157

work page 2023
[24]

RCV2023 challenges: Benchmarking model training and inference for resource- constrained deep learning,

R. Tiwari et al., “RCV2023 challenges: Benchmarking model training and inference for resource- constrained deep learning,” inProc. IEEE/CVF Int. Conf. Comput. Vis. Workshops, 2023, pp. 1534–1543

work page 2023
[25]

Which backbone to use: A resource-efficient domain specific comparison for computer vision,

P. Jeevan and A. Sethi, “Which backbone to use: A resource-efficient domain specific comparison for computer vision,”Transactions on Machine Learning Research, 2025

work page 2025
[26]

Vision backbone efficient selection for image classification in low-data regimes,

J. Guerin, S. Bansal, A. Shaban, P. Mann, and H. Gazula, “Vision backbone efficient selection for image classification in low-data regimes,” inProc. 36th British Mach. Vis. Conf., 2025, Paper 788

work page 2025
[27]

Comparative Analysis of Lightweight CNNs for Resource-Constrained Devices: Predictive Performance, Efficiency Trade-offs, and Initialization Effects

T. Shahriar, “Comparative Analysis of Lightweight CNNs for Resource-Constrained Devices: Predictive Performance, Efficiency Trade-offs, and Initialization Effects,” arXiv:2505.03303, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[28]

Decoupled weight decay regularization,

I. Loshchilov and F. Hutter, “Decoupled weight decay regularization,” inProc. Int. Conf. Learn. Represent., 2019

work page 2019
[29]

SGDR: Stochastic gradient descent with warm restarts,

I. Loshchilov and F. Hutter, “SGDR: Stochastic gradient descent with warm restarts,” inProc. Int. Conf. Learn. Represent., 2017. 18

work page 2017
[30]

Rethinking the Inception architecture for computer vision,

C. Szegedy et al., “Rethinking the Inception architecture for computer vision,” inProc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 2818–2826

work page 2016
[31]

Miettinen,Nonlinear Multiobjective Optimization

K. Miettinen,Nonlinear Multiobjective Optimization. Boston, MA, USA: Kluwer Academic Publishers, 1999

work page 1999
[32]

The proof and measurement of association between two things,

C. Spearman, “The proof and measurement of association between two things,”The American Journal of Psychology, vol. 15, no. 1, pp. 72–101, 1904

work page 1904
[33]

Note on the sampling error of the difference between correlated proportions or percentages,

Q. McNemar, “Note on the sampling error of the difference between correlated proportions or percentages,” Psychometrika, vol. 12, pp. 153–157, 1947

work page 1947
[34]

Efron and R

B. Efron and R. J. Tibshirani,An Introduction to the Bootstrap. New York, NY, USA: Chapman and Hall, 1993

work page 1993
[35]

PyTorch Image Models,

R. Wightman, “PyTorch Image Models,” GitHub repository and Zenodo software archive, 2019. 19

work page 2019

[1] [1]

Deep learning,

Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,”Nature, vol. 521, no. 7553, pp. 436–444, 2015

work page 2015

[2] [2]

ImageNet: A large-scale hierarchical image database,

J. Deng, W. Dong, R. Socher, L. Li, K. Li, and L. Fei-Fei, “ImageNet: A large-scale hierarchical image database,” inProc. IEEE Conf. Comput. Vis. Pattern Recognit., 2009, pp. 248–255

work page 2009

[3] [3]

Learning multiple layers of features from tiny images,

A. Krizhevsky, “Learning multiple layers of features from tiny images,” University of Toronto, Technical Report, 2009

work page 2009

[4] [4]

Tiny ImageNet visual recognition challenge,

Y. Le and X. Yang, “Tiny ImageNet visual recognition challenge,” Stanford CS231n Course Project, 2015

work page 2015

[5] [5]

Deep residual learning for image recognition,

K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” inProc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 770–778

work page 2016

[6] [6]

SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size

F. N. Iandola et al., “SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and less than 0.5MB model size,” arXiv:1602.07360, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016

[7] [7]

MobileNetV2: Inverted residuals and linear bottlenecks,

M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, “MobileNetV2: Inverted residuals and linear bottlenecks,” inProc. IEEE Conf. Comput. Vis. Pattern Recognit., 2018, pp. 4510–4520

work page 2018

[8] [8]

ShuffleNet V2: Practical guidelines for efficient CNN architecture design,

N. Ma, X. Zhang, H.-T. Zheng, and J. Sun, “ShuffleNet V2: Practical guidelines for efficient CNN architecture design,” inProc. Eur. Conf. Comput. Vis., 2018, pp. 116–131. 17

work page 2018

[9] [9]

Searching for MobileNetV3,

A. Howard et al., “Searching for MobileNetV3,” inProc. IEEE Int. Conf. Comput. Vis., 2019, pp. 1314–1324

work page 2019

[10] [10]

EfficientNet: Rethinking model scaling for convolutional neural networks,

M. Tan and Q. Le, “EfficientNet: Rethinking model scaling for convolutional neural networks,” inProc. Int. Conf. Mach. Learn., 2019, pp. 6105–6114

work page 2019

[11] [11]

EfficientNetV2: Smaller models and faster training,

M. Tan and Q. Le, “EfficientNetV2: Smaller models and faster training,” inProc. Int. Conf. Mach. Learn., 2021, pp. 10096–10106

work page 2021

[12] [12]

MobileOne: An improved one millisecond mobile backbone,

P. K. A. Vasu, J. Gabriel, J. Zhu, O. Tuzel, and A. Ranjan, “MobileOne: An improved one millisecond mobile backbone,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2023, pp. 7907–7917

work page 2023

[13] [13]

Run, Don’t Walk: Chasing higher FLOPS for faster neural networks,

J. Chen et al., “Run, Don’t Walk: Chasing higher FLOPS for faster neural networks,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2023, pp. 12021–12031

work page 2023

[14] [14]

FastViT: A fast hybrid vision transformer using structural reparameterization,

P. K. A. Vasu, J. Gabriel, J. Zhu, O. Tuzel, and A. Ranjan, “FastViT: A fast hybrid vision transformer using structural reparameterization,” inProc. IEEE/CVF Int. Conf. Comput. Vis., 2023, pp. 5785–5795

work page 2023

[15] [15]

RepViT: Revisiting mobile CNN from ViT perspective,

A. Wang, H. Chen, Z. Lin, J. Han, and G. Ding, “RepViT: Revisiting mobile CNN from ViT perspective,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2024, pp. 15909–15920

work page 2024

[16] [16]

MobileNetV4: Universal models for the mobile ecosystem,

D. Qin et al., “MobileNetV4: Universal models for the mobile ecosystem,” inProc. Eur. Conf. Comput. Vis., 2024, pp. 78–96

work page 2024

[17] [17]

Rewrite the Stars,

X. Ma, X. Dai, Y. Bai, Y. Wang, and Y. Fu, “Rewrite the Stars,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2024, pp. 5694–5703

work page 2024

[18] [18]

LSNet: See Large, Focus Small,

A. Wang, H. Chen, Z. Lin, J. Han, and G. Ding, “LSNet: See Large, Focus Small,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2025, pp. 9718–9729

work page 2025

[19] [19]

UniConvNet: Expanding effective receptive field while maintaining asymptotically Gaussian distribution for ConvNets of any scale,

Y. Wang and W. Xi, “UniConvNet: Expanding effective receptive field while maintaining asymptotically Gaussian distribution for ConvNets of any scale,” inProc. IEEE/CVF Int. Conf. Comput. Vis., 2025, pp. 20922–20933

work page 2025

[20] [20]

Do better ImageNet models transfer better?

S. Kornblith, M. Norouzi, H. Lee, and G. Hinton, “Do better ImageNet models transfer better?” inProc. IEEE Conf. Comput. Vis. Pattern Recognit., 2019, pp. 2661–2671

work page 2019

[21] [21]

Rethinking ImageNet pre-training,

K. He, R. Girshick, and P. Dollár, “Rethinking ImageNet pre-training,” inProc. IEEE Int. Conf. Comput. Vis., 2019, pp. 4918–4927

work page 2019

[22] [22]

Battle of the backbones: A large-scale comparison of pretrained models across computer vision tasks,

M. Goldblum et al., “Battle of the backbones: A large-scale comparison of pretrained models across computer vision tasks,” inAdvances in Neural Information Processing Systems, vol. 36, Datasets and Benchmarks Track, 2023

work page 2023

[23] [23]

A comprehensive study of transfer learning under constraints,

T. Pégeot, I. Kucher, A. Popescu, and B. Delezoide, “A comprehensive study of transfer learning under constraints,” inProc. IEEE/CVF Int. Conf. Comput. Vis. Workshops, 2023, pp. 1148–1157

work page 2023

[24] [24]

RCV2023 challenges: Benchmarking model training and inference for resource- constrained deep learning,

R. Tiwari et al., “RCV2023 challenges: Benchmarking model training and inference for resource- constrained deep learning,” inProc. IEEE/CVF Int. Conf. Comput. Vis. Workshops, 2023, pp. 1534–1543

work page 2023

[25] [25]

Which backbone to use: A resource-efficient domain specific comparison for computer vision,

P. Jeevan and A. Sethi, “Which backbone to use: A resource-efficient domain specific comparison for computer vision,”Transactions on Machine Learning Research, 2025

work page 2025

[26] [26]

Vision backbone efficient selection for image classification in low-data regimes,

J. Guerin, S. Bansal, A. Shaban, P. Mann, and H. Gazula, “Vision backbone efficient selection for image classification in low-data regimes,” inProc. 36th British Mach. Vis. Conf., 2025, Paper 788

work page 2025

[27] [27]

Comparative Analysis of Lightweight CNNs for Resource-Constrained Devices: Predictive Performance, Efficiency Trade-offs, and Initialization Effects

T. Shahriar, “Comparative Analysis of Lightweight CNNs for Resource-Constrained Devices: Predictive Performance, Efficiency Trade-offs, and Initialization Effects,” arXiv:2505.03303, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[28] [28]

Decoupled weight decay regularization,

I. Loshchilov and F. Hutter, “Decoupled weight decay regularization,” inProc. Int. Conf. Learn. Represent., 2019

work page 2019

[29] [29]

SGDR: Stochastic gradient descent with warm restarts,

I. Loshchilov and F. Hutter, “SGDR: Stochastic gradient descent with warm restarts,” inProc. Int. Conf. Learn. Represent., 2017. 18

work page 2017

[30] [30]

Rethinking the Inception architecture for computer vision,

C. Szegedy et al., “Rethinking the Inception architecture for computer vision,” inProc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 2818–2826

work page 2016

[31] [31]

Miettinen,Nonlinear Multiobjective Optimization

K. Miettinen,Nonlinear Multiobjective Optimization. Boston, MA, USA: Kluwer Academic Publishers, 1999

work page 1999

[32] [32]

The proof and measurement of association between two things,

C. Spearman, “The proof and measurement of association between two things,”The American Journal of Psychology, vol. 15, no. 1, pp. 72–101, 1904

work page 1904

[33] [33]

Note on the sampling error of the difference between correlated proportions or percentages,

Q. McNemar, “Note on the sampling error of the difference between correlated proportions or percentages,” Psychometrika, vol. 12, pp. 153–157, 1947

work page 1947

[34] [34]

Efron and R

B. Efron and R. J. Tibshirani,An Introduction to the Bootstrap. New York, NY, USA: Chapman and Hall, 1993

work page 1993

[35] [35]

PyTorch Image Models,

R. Wightman, “PyTorch Image Models,” GitHub repository and Zenodo software archive, 2019. 19

work page 2019