Group-invariant Coresets for Data-efficient Active Learning
Pith reviewed 2026-07-02 04:04 UTC · model grok-4.3
The pith
A group-invariant coreset method selects samples by their orbits under known transformations to avoid querying redundant symmetric copies in active learning.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
GRINCO performs acquisition in the quotient space induced by a transformation group so that selection operates on orbits rather than raw samples. It uses canonical representatives or learned orbit-separating invariant embeddings to define quotient metrics, combines this with invariant training through an orbit-averaged loss, and derives a generalization bound relating excess orbit-averaged risk to quotient-space coverage, label uncertainty, and intra-orbit variability.
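As a rough illustration of the orbit-averaged loss mentioned above, the sketch below averages a per-sample loss over all elements of a small finite group. It assumes the four-element rotation group C4 acting on square images; the function names (`c4_orbit`, `orbit_averaged_loss`) and both toy predictors are hypothetical and do not come from the paper, whose exact loss is not reproduced on this page.

```python
import numpy as np

def c4_orbit(img):
    """Return the four 90-degree rotations of a square image (assumed group: C4)."""
    return [np.rot90(img, k) for k in range(4)]

def orbit_averaged_loss(predict, img, label, loss):
    """Average the per-sample loss over every element of the orbit of img."""
    return float(np.mean([loss(predict(g), label) for g in c4_orbit(img)]))

def squared_error(pred, target):
    return (pred - target) ** 2

# Hypothetical predictors, used only to contrast invariant and non-invariant behaviour.
def invariant_model(img):
    return float(img.sum())   # the pixel sum does not change under rotation

def sensitive_model(img):
    return float(img[0, 0])   # the top-left pixel changes under rotation

img = np.arange(4.0).reshape(2, 2)   # [[0, 1], [2, 3]]
label = 6.0
loss_inv = orbit_averaged_loss(invariant_model, img, label, squared_error)
loss_sens = orbit_averaged_loss(sensitive_model, img, label, squared_error)
```

In this toy case the invariant predictor incurs the same loss on every rotation, while the rotation-sensitive one is penalized for its variation across the orbit. For continuous groups an exact average is not available, and sampling group elements would be one plausible substitute.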
What carries the argument
GRINCO, the group-invariant coreset framework that performs acquisition in the quotient space induced by a transformation group.
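One way to picture acquisition in a quotient space is through canonical representatives: each sample is mapped to a fixed member of its orbit, and distances are measured between those representatives. The sketch below assumes the group of positive scalings acting on vectors, with unit-norm normalization as the canonical map. Both the choice of group action and the helper names are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def canonicalize(x, eps=1e-12):
    """Map x to the unit-norm representative of its orbit under positive scaling."""
    x = np.asarray(x, dtype=float)
    norm = np.linalg.norm(x)
    if norm < eps:
        return np.zeros_like(x)   # the origin forms its own orbit
    return x / norm

def quotient_distance(x, y):
    """Euclidean distance between canonical representatives."""
    return float(np.linalg.norm(canonicalize(x) - canonicalize(y)))

a = np.array([1.0, 2.0])
b = 5.0 * a                  # same orbit as a: a scaled copy
c = np.array([2.0, 1.0])     # a different orbit

d_same = quotient_distance(a, b)   # numerically zero
d_diff = quotient_distance(a, c)   # strictly positive
```

Under this metric, a scaled copy of an already selected sample looks identical to it, so a distance-based selector has no reason to query it again.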
If this is right
- GRINCO improves orbit coverage compared to conventional coreset baselines.
- It achieves stronger label efficiency, especially when group-induced redundancy is substantial.
- The generalization bound connects excess risk to how well the quotient space is covered.
- Performance gains appear on both synthetic scale-invariant data and image benchmarks with rotations.
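To make "orbit coverage" concrete, the following sketch computes a covering radius under an orbit-level distance: the largest distance from any pool sample to its nearest selected sample, where distance is minimized over group elements. It assumes the rotation group C4 and a Frobenius-norm base metric. The paper's own coverage measure is not given on this page, so this is one plausible reading rather than the authors' definition.

```python
import numpy as np

def c4_orbit(img):
    """The four 90-degree rotations of a square image (assumed group: C4)."""
    return [np.rot90(img, k) for k in range(4)]

def orbit_distance(a, b):
    """Minimum Frobenius distance between a and any rotation of b."""
    return min(float(np.linalg.norm(a - r)) for r in c4_orbit(b))

def covering_radius(pool, selected, dist):
    """Largest distance from a pool sample to its nearest selected sample."""
    return max(min(dist(x, s) for s in selected) for x in pool)

rng = np.random.default_rng(0)
base = [rng.normal(size=(4, 4)) for _ in range(3)]
# 12 samples in 3 orbits: indices 0-3, 4-7, and 8-11 each share an orbit.
pool = [np.rot90(b, k) for b in base for k in range(4)]

redundant = pool[:3]                    # three rotated copies of one image
diverse = [pool[0], pool[4], pool[8]]   # one sample from each orbit

r_redundant = covering_radius(pool, redundant, orbit_distance)
r_diverse = covering_radius(pool, diverse, orbit_distance)
```

With the same budget of three labels, the selection that takes one sample per orbit reaches a covering radius of zero, while the selection made of rotated copies leaves two orbits uncovered.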
Where Pith is reading between the lines
- Similar quotient methods could extend to other data types with known symmetries like translations or reflections.
- Learned invariant embeddings might allow the approach even when the group is only partially known.
- Reducing intra-orbit variability through the averaged loss could improve model robustness beyond label savings.
- Testing on sequential data with time-shift groups would check if the efficiency gains hold in other modalities.
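For the time-shift case raised in the last bullet, a standard invariant is the magnitude of the discrete Fourier transform, which does not change under cyclic shifts. The sketch below checks this numerically. It is offered as an assumption about how such an extension might start, not as something the paper evaluates. It also shows a limitation: the magnitude spectrum is invariant but not orbit-separating, since a reversed real signal has the same magnitudes without generally being a shift of the original.

```python
import numpy as np

def shift_invariant_embedding(signal):
    """DFT magnitude: unchanged by cyclic shifts, because a shift only alters phase."""
    return np.abs(np.fft.fft(np.asarray(signal, dtype=float)))

rng = np.random.default_rng(0)
x = rng.normal(size=16)
x_shifted = np.roll(x, 5)    # same orbit under cyclic time shifts
x_reversed = x[::-1]         # generally NOT a cyclic shift of x

emb = shift_invariant_embedding(x)
gap_shift = float(np.max(np.abs(emb - shift_invariant_embedding(x_shifted))))
gap_reverse = float(np.max(np.abs(emb - shift_invariant_embedding(x_reversed))))

# True only if some cyclic shift of x reproduces the reversed signal.
reversed_is_shift = any(np.allclose(np.roll(x, s), x_reversed) for s in range(len(x)))
```

Both gaps are numerically zero, so this embedding would merge the orbit of `x` with the orbit of its reversal. An orbit-separating embedding of the kind the summary describes would need to retain more information than magnitudes alone.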
Load-bearing premise
The transformation group must be known in advance and must create substantial redundancy that can be removed without discarding information needed for the learning task.
What would settle it
If the method were run on image data with known rotations and needed as many labels as standard coresets, or more, to reach the same accuracy, the claim would not hold.
Original abstract
Active learning reduces labeling cost by querying the most informative unlabeled samples, but standard coreset methods ignore known data symmetries and can waste budget on transformed versions of the same instance. We propose GRINCO, a group-invariant coreset framework that performs acquisition in the quotient space induced by a transformation group, so that selection operates on orbits rather than raw samples. The method uses either canonical representatives or learned orbit-separating invariant embeddings to define practical quotient metrics, and combines quotient-space k-center selection with invariant training through an orbit-averaged loss. We further derive a generalization bound that relates excess orbit-averaged risk to quotient-space coverage, label uncertainty, and intra-orbit variability. Experiments on synthetic scale-invariant data and image benchmarks with rotation-induced redundancy show that GRINCO improves orbit coverage and achieves stronger label efficiency than conventional coreset baselines, especially when group-induced redundancy is substantial.
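The quotient-space k-center selection described in the abstract can be sketched with the classic greedy farthest-first traversal, run once on raw features and once on canonical representatives. The toy pool below assumes positive scaling as the group and unit-norm normalization as the canonical map. The pool, the helper names, and the choice of starting index are illustrative assumptions, and the paper's actual selection procedure may differ.

```python
import numpy as np

def k_center_greedy(dist_matrix, budget, first=0):
    """Greedy farthest-first traversal on a precomputed pairwise distance matrix."""
    n = dist_matrix.shape[0]
    selected = [first]
    min_dist = dist_matrix[first].copy()   # distance of each point to the selected set
    while len(selected) < min(budget, n):
        nxt = int(np.argmax(min_dist))     # the point farthest from the selected set
        selected.append(nxt)
        min_dist = np.minimum(min_dist, dist_matrix[nxt])
    return selected

def pairwise(points):
    """Euclidean distance matrix for the rows of points."""
    diff = points[:, None, :] - points[None, :, :]
    return np.linalg.norm(diff, axis=-1)

# Toy pool: 3 directions (orbits under positive scaling), 4 scaled copies of each.
directions = np.array([[1.0, 0.0], [0.8, 0.6], [0.6, 0.8]])
scales = np.array([1.0, 2.0, 5.0, 10.0])
pool = np.concatenate([s * directions for s in scales])
orbit_id = np.arange(len(pool)) % 3        # row i belongs to orbit i % 3

canon = pool / np.linalg.norm(pool, axis=1, keepdims=True)   # canonical representatives

raw_pick = k_center_greedy(pairwise(pool), budget=3)
quot_pick = k_center_greedy(pairwise(canon), budget=3)
raw_orbits = sorted(set(orbit_id[raw_pick].tolist()))
quot_orbits = sorted(set(orbit_id[quot_pick].tolist()))
```

On this pool the raw-feature selection spends two of its three queries on the same orbit, because large scale differences dominate Euclidean distance. The quotient-space selection picks one sample from each of the three orbits.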
Editorial analysis
A structured set of objections, weighed in public.
Circularity Check
No significant circularity
The provided abstract and description contain no equations, fitted parameters presented as predictions, or self-citation chains that reduce the central claims (quotient-space coreset selection, orbit-averaged loss, or generalization bound) to inputs by construction. The bound is stated as relating excess orbit-averaged risk to coverage, uncertainty, and intra-orbit variability without visible self-referential fitting. No load-bearing self-citations or ansatzes smuggled via prior work are quoted. This is the common case of a self-contained extension with independent experimental support.