Rethinking Generic Object Tracking Toward Human-Level Perceptual Intelligence
Pith reviewed 2026-07-03 20:59 UTC · model grok-4.3
The pith
Enhancing target discrimination, robust adaptation, and geometric reasoning narrows the gap between machine trackers and human visual perception.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Generic object tracking can be advanced toward human-level performance by a series of methods that systematically enhance the target discrimination, robust adaptation, and geometric reasoning capabilities of tracking models, thereby addressing bottlenecks in generalization and online adaptation for unpredictable future events and variations.
What carries the argument
A series of methods that enhance target discrimination against distractors, robust online adaptation to variations, and geometric reasoning about spatial context, applied to trackers initialized from a single bounding box.
If this is right
- Trackers maintain visual continuity despite severe target deformation.
- Models better resist complex distractors and significant environmental changes.
- Performance improves on object categories unseen during training.
- Reliable localization continues from an initial bounding box in dynamic streams.
Where Pith is reading between the lines
- Stronger adaptation and geometry modules might reduce reliance on massive labeled training sets.
- The same three enhancements could transfer to related tasks like video object segmentation.
- Success would suggest that targeted capability boosts, rather than full scene semantics, suffice for human-like tracking.
Load-bearing premise
That the main bottlenecks of generalization and online adaptation can be addressed by systematically enhancing target discrimination, robust adaptation, and geometric reasoning.
What would settle it
A sequence of test videos where a tracker using the proposed enhancements still loses the target on a novel combination of severe deformation and unseen-category distractors.
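One way to make "loses the target" checkable is an overlap criterion. The sketch below is illustrative only and is not taken from the paper: the `(x, y, width, height)` box format, the function names `iou` and `first_lost_frame`, and the `iou_threshold` and `patience` defaults are all assumptions chosen for this example.

```python
from typing import Optional, Sequence, Tuple

# Assumed box format for this sketch: (x, y, width, height) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Overlap extent along each axis, clamped at zero when boxes are disjoint.
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def first_lost_frame(
    predicted: Sequence[Box],
    ground_truth: Sequence[Box],
    iou_threshold: float = 0.1,  # hypothetical cutoff, not from the paper
    patience: int = 3,           # hypothetical run length, not from the paper
) -> Optional[int]:
    """Return the index where a run of `patience` low-overlap frames begins.

    Returns None if the tracker never stays below the threshold that long.
    """
    run = 0
    for t, (p, g) in enumerate(zip(predicted, ground_truth)):
        if iou(p, g) < iou_threshold:
            run += 1
            if run == patience:
                return t - patience + 1
        else:
            run = 0
    return None
```

Under this assumed criterion, a test video would count against the claim if `first_lost_frame` returns an index on a sequence that combines severe deformation with unseen-category distractors.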
Original abstract
At the heart of human visual perception lies the ability to maintain a continuous and coherent understanding of the external world. By integrating observations with accumulated experience, the human visual system can continuously adapt to variations in both the target and its surrounding environment, while preserving robust visual continuity as scene dynamics evolve. Human vision can therefore integrate prior knowledge, spatial geometry, and semantic context to understand complex scenes and their changes. As a core problem in computer vision, visual object tracking aims to bring machine perception closer to human visual perception. These capabilities are central to the task of Generic Object Tracking (GOT). In this task, a visual tracker is initialized only with the bounding box of an arbitrarily specified target in the first frame, and must continuously localize the target in subsequent dynamic visual streams. However, future events, observations, and real-world variations are inherently unpredictable; therefore, the model's generalization and online adaptation capabilities remain bottlenecks. Tracking reliability can deteriorate when the target undergoes severe deformation, is affected by complex distractors, encounters significant environmental changes, or belongs to a category unseen during training. This dissertation aims to narrow the gap between machine visual tracking systems and human visual perception by proposing a series of methods that systematically enhance the target discrimination, robust adaptation, and geometric reasoning capabilities of tracking models.
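The task setup described in the abstract (initialize from one bounding box in the first frame, then localize the target in every later frame) can be sketched as a small interface. This is a minimal illustration under stated assumptions, not the dissertation's method: the `Tracker` protocol, the `StaticTracker` baseline, the `run_one_pass` helper, and the `(x, y, width, height)` box format are all hypothetical names introduced here.

```python
from typing import Any, Iterable, List, Protocol, Tuple

# Assumed box format for this sketch: (x, y, width, height) in pixels.
Box = Tuple[float, float, float, float]


class Tracker(Protocol):
    """Hypothetical interface for a generic object tracker."""

    def init(self, frame: Any, box: Box) -> None:
        """Receive the first frame and the only annotation the tracker gets."""

    def update(self, frame: Any) -> Box:
        """Return the predicted target box for a later frame."""


class StaticTracker:
    """Trivial baseline that keeps reporting the initial box.

    It exists only to make the loop below runnable; it performs no
    discrimination, adaptation, or geometric reasoning.
    """

    def init(self, frame: Any, box: Box) -> None:
        self._box = box

    def update(self, frame: Any) -> Box:
        return self._box


def run_one_pass(tracker: Tracker, frames: Iterable[Any], init_box: Box) -> List[Box]:
    """Initialize on the first frame, then predict one box per remaining frame."""
    it = iter(frames)
    first = next(it)  # raises StopIteration if the video is empty
    tracker.init(first, init_box)
    boxes = [init_box]
    for frame in it:
        boxes.append(tracker.update(frame))
    return boxes
```

A real tracker would replace `StaticTracker.update` with a model that adapts online; the loop itself only illustrates that no annotation beyond the first-frame box is available.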
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript is a dissertation proposal that identifies limitations in generic object tracking (GOT), including poor generalization and online adaptation to unpredictable events such as target deformation, distractors, environmental changes, and unseen categories. It claims that integrating human-like capabilities for target discrimination, robust adaptation, and geometric reasoning will narrow the gap to human-level perceptual intelligence, but provides no specific methods, derivations, experiments, or results.
Significance. Advancing visual tracking toward human-level robustness would be significant for computer vision applications. However, because the manuscript supplies no methods, data, or evidence, no assessment of achieved significance is possible; the contribution remains aspirational.
Major comments (1)
- [Abstract] The central claim that 'a series of methods' will systematically enhance target discrimination, robust adaptation, and geometric reasoning is unsupported by any description of those methods, any equations, any experimental design, or any preliminary results. This renders the claim an intention rather than a testable or verifiable contribution.
Simulated Author's Rebuttal
We thank the referee for the detailed review. Our manuscript is a dissertation proposal that frames open challenges in generic object tracking and outlines a research agenda; it does not claim to deliver completed methods or results. We address the single major comment below.
Referee: [Abstract] The central claim that 'a series of methods' will systematically enhance target discrimination, robust adaptation, and geometric reasoning is unsupported by any description of those methods, any equations, any experimental design, or any preliminary results. This renders the claim an intention rather than a testable or verifiable contribution.
Authors: We agree that no concrete methods, equations, experimental designs, or results are supplied. The manuscript is explicitly a dissertation proposal whose abstract states the intended research program ('This dissertation aims to narrow the gap ... by proposing a series of methods'). The contribution at this stage is the identification of the three core bottlenecks (target discrimination, robust adaptation, geometric reasoning) and the argument that addressing them would move tracking closer to human-level robustness. Because the document is a proposal rather than a completed study, the absence of implementation details is by design; the abstract accurately describes the scope of the planned dissertation work.
Revision: no
Circularity Check
No circularity: proposal without derivations or equations
The document is a dissertation proposal whose abstract states an intention to propose methods for target discrimination, robust adaptation, and geometric reasoning to approach human-level tracking. No equations, parameter fits, self-citations, uniqueness theorems, or ansatzes are supplied. The central text contains no derivation chain that could reduce to its own inputs by construction; the claim is aspirational rather than a completed result. This matches the default expectation of no significant circularity for a high-level goal statement.