WBMM: Windowed Batch Matrix Multiplication for Efficient Large Receptive Field Convolution

Jiajia Xu; Jun Yu; Rui Wang; Shu Zhan; Toru Kurihara; Wan Song; Wei Zhou

arxiv: 2607.02097 · v1 · pith:H6HEQTSC · submitted 2026-07-02 · 💻 cs.CV · cs.LG


Wan Song, Wei Zhou, Rui Wang, Jun Yu, Toru Kurihara, Jiajia Xu, Shu Zhan

Pith reviewed 2026-07-03 15:36 UTC · model grok-4.3

Classification: 💻 cs.CV, cs.LG

Keywords: large kernel convolution, depthwise convolution, batch matrix multiplication, receptive field, efficient convolution, window partitioning, relative position bias, computer vision


The pith

Windowed batch matrix multiplication enables efficient computation of large receptive field convolutions by converting irregular memory access into regular batched matrix operations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to show that large-kernel depthwise convolutions can be made faster and more scalable by partitioning the input into windows and using a compact relative-position bias table to build weight matrices for batched matrix multiplication. This reverses the usual trend in which larger kernels slow computation: throughput instead improves as windows grow larger. A sympathetic reader would care because it lets vision models use bigger receptive fields without the typical speed penalty, with reported training speedups of 1.31 to 1.88 times on ImageNet, COCO, and ADE20K at comparable or higher accuracy.

Core claim

WBMM partitions the input feature map into contiguous windows and indexes a compact relative-position bias table to construct the weight matrices, allowing the large receptive field depthwise convolution to be performed as a batched matrix multiplication with regular memory access patterns.
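To see why the table is called compact, here is a short calculation. It assumes, as in Swin-style relative position bias, that each channel stores one weight per relative offset inside a w×w window. That gives a (2w-1)×(2w-1) table, compared with w⁴ entries for an unconstrained w²×w² matrix. The table shape and the function names are illustrative assumptions, not details confirmed by the source.

```python
def table_entries(w: int) -> int:
    """Entries per channel in a relative-position table for a w x w window.

    Assumes one weight per relative offset (dy, dx) with -(w-1) <= dy, dx <= w-1,
    i.e. a (2w-1) x (2w-1) table (hypothetical layout, Swin-style).
    """
    return (2 * w - 1) ** 2


def dense_entries(w: int) -> int:
    """Entries per channel in an unconstrained (w*w) x (w*w) window weight matrix."""
    return (w * w) ** 2


for w in (7, 14):
    print(f"w={w}: table {table_entries(w)}, dense matrix {dense_entries(w)}")
```

For w = 14, the table has 729 entries, the same count as a single 27×27 depthwise kernel. A dense matrix would need 38,416 entries, roughly 53× more.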

What carries the argument

Windowed Batch Matrix Multiplication (WBMM), which partitions the input into windows and uses a compact relative-position bias table to construct weight matrices for batched matrix multiplication, enabling regular memory access.
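The pipeline in that description can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the released implementation. Several details are hypothetical: the table shape (C, 2w-1, 2w-1), the Swin-style relative index, the cross-correlation sign convention, and the names `relative_index` and `wbmm`. H and W are also assumed to be multiples of w.

```python
import numpy as np


def relative_index(w: int) -> np.ndarray:
    """(w*w, w*w) map from (output i, input j) to a flat index into a (2w-1)^2 table."""
    ys, xs = np.meshgrid(np.arange(w), np.arange(w), indexing="ij")
    coords = np.stack([ys.ravel(), xs.ravel()])        # (2, w*w), row-major positions
    rel = coords[:, None, :] - coords[:, :, None]      # rel[:, i, j] = pos_j - pos_i
    rel += w - 1                                       # offsets now lie in [0, 2w-2]
    return rel[0] * (2 * w - 1) + rel[1]


def wbmm(x: np.ndarray, table: np.ndarray, w: int) -> np.ndarray:
    """Hypothetical WBMM forward. x: (B, C, H, W); table: (C, 2w-1, 2w-1)."""
    B, C, H, W = x.shape
    nh, nw = H // w, W // w
    # 1) Partition into non-overlapping w x w windows, each flattened to w*w values.
    win = x.reshape(B, C, nh, w, nw, w).transpose(0, 1, 2, 4, 3, 5)
    win = win.reshape(B, C, nh * nw, w * w)
    # 2) Build one (w*w, w*w) matrix per channel by indexing the compact table.
    M = table.reshape(C, -1)[:, relative_index(w)]     # (C, w*w, w*w)
    # 3) Batched matrix multiply: out[b, c, n, i] = sum_j M[c, i, j] * win[b, c, n, j].
    out = np.einsum("cij,bcnj->bcni", M, win)
    # 4) Undo the partition.
    out = out.reshape(B, C, nh, nw, w, w).transpose(0, 1, 2, 4, 3, 5)
    return out.reshape(B, C, H, W)


rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8, 28, 28))   # (B, C, H, W): a 4 x 4 grid of 7 x 7 windows
table = rng.standard_normal((8, 13, 13))  # (C, 2w-1, 2w-1) for w = 7
y = wbmm(x, table, w=7)
print(y.shape)  # (2, 8, 28, 28)
```

All of the arithmetic happens in step 3, as a plain batched matrix multiply. Steps 1, 2 and 4 only reshape or index. Here `np.einsum` stands in for whatever batched GEMM routine (for example `torch.bmm`) a GPU implementation would actually call.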

If this is right

WBMM with 14x14 windows runs faster than 5x5 depthwise convolution while providing a 7.8 times larger per-layer receptive field.
Combined with inter-block cross-window communication and hierarchical window reparameterization, it achieves 1.31-1.88x training speedup with comparable or higher accuracy on ImageNet-1K, COCO, and ADE20K.
Throughput improves as window size increases, opposite to standard depthwise convolutions.
Advantages hold across GPU, CPU, and edge devices without needing specialized kernels.
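The 7.8× figure is consistent with comparing the number of input positions each output can draw on. This assumes every output in a 14×14 window sees the whole window, while a 5×5 kernel sees 25 positions:

$$\frac{14 \times 14}{5 \times 5} = \frac{196}{25} = 7.84 \approx 7.8$$

Under the same assumption, each WBMM output is a dense dot product over the window. WBMM therefore performs about 7.8× more multiply-accumulates per output than the 5×5 baseline. If it is nonetheless faster, the gain has to come from memory-access efficiency rather than from doing less arithmetic.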

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

This method may allow vision transformers or CNNs to incorporate much larger kernels than previously practical.
Similar windowing and bias table techniques could be applied to other operations suffering from irregular memory access in deep learning.
By avoiding the need for custom acceleration kernels, it lowers the barrier for deploying large receptive field models on diverse hardware.

Load-bearing premise

Constructing the weight matrices from the compact relative-position bias table inside each window exactly preserves the receptive field coverage and numerical behavior of a full large-kernel depthwise convolution.

What would settle it

A direct numerical comparison on identical inputs showing whether WBMM's output matches that of a standard large-kernel depthwise convolution within floating-point tolerance. Alternatively, an accuracy difference exceeding 0.5% on ImageNet-1K when one operator is swapped for the other in the same model.
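A minimal version of the first test can be sketched in NumPy. Everything here is an assumption for illustration and is not the paper's code:

- the names `wbmm_window`, `dwconv_same` and `per_window`;
- the (C, 2w-1, 2w-1) table layout;
- the cross-correlation convention;
- the absence of any cross-window term.

Under these assumptions, WBMM agrees with a (2w-1)×(2w-1) depthwise convolution applied to each window on its own. Agreement is up to floating-point rounding, because the accumulation order differs. WBMM does not agree with the same convolution run over the whole feature map, because a window's outputs cannot see pixels in neighboring windows.

```python
import numpy as np


def dwconv_same(x: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Gather-style depthwise cross-correlation with zero padding.

    x: (C, H, W); k: (C, K, K) with K odd. Output has the same size as x.
    """
    C, H, W = x.shape
    K = k.shape[-1]
    p = K // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros_like(x)
    for dy in range(K):
        for dx in range(K):
            # Each tap reads a shifted (H, W) slice: the scattered-neighbor gather.
            out += k[:, dy, dx, None, None] * xp[:, dy:dy + H, dx:dx + W]
    return out


def wbmm_window(xw: np.ndarray, table: np.ndarray) -> np.ndarray:
    """Hypothetical WBMM on one window. xw: (C, w, w); table: (C, 2w-1, 2w-1)."""
    C, w, _ = xw.shape
    pos = [(py, px) for py in range(w) for px in range(w)]   # row-major flattening
    M = np.empty((C, w * w, w * w))
    for i, (yi, xi) in enumerate(pos):
        for j, (yj, xj) in enumerate(pos):
            M[:, i, j] = table[:, yj - yi + w - 1, xj - xi + w - 1]
    # Batched matmul over channels: (C, w*w, w*w) @ (C, w*w, 1)
    return (M @ xw.reshape(C, w * w, 1)).reshape(C, w, w)


def per_window(fn, x, w, *args):
    """Apply fn independently to every non-overlapping w x w window of x (C, H, W)."""
    out = np.zeros_like(x)
    for r in range(0, x.shape[1], w):
        for s in range(0, x.shape[2], w):
            out[:, r:r + w, s:s + w] = fn(x[:, r:r + w, s:s + w], *args)
    return out


rng = np.random.default_rng(0)
C, w = 4, 7
table = rng.standard_normal((C, 2 * w - 1, 2 * w - 1))  # also used as a (2w-1)x(2w-1) kernel
x = rng.standard_normal((C, 2 * w, 2 * w))               # a 2 x 2 grid of windows

y_wbmm = per_window(wbmm_window, x, w, table)
y_isolated = per_window(dwconv_same, x, w, table)   # same kernel, each window zero-padded alone
y_global = dwconv_same(x, table)                    # same kernel over the whole map

print("max |WBMM - per-window conv|:", np.abs(y_wbmm - y_isolated).max())
print("max |WBMM - global conv|:    ", np.abs(y_wbmm - y_global).max())
```

In this sketch, "exact equivalence" holds only within each window. Coverage across window borders depends on the inter-block cross-window communication the paper adds, which the sketch does not model.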

Figures

Figures reproduced from arXiv: 2607.02097 by Jiajia Xu, Jun Yu, Rui Wang, Shu Zhan, Toru Kurihara, Wan Song, Wei Zhou.

**Figure 1.** Depthwise convolution vs. WBMM. (a) Depthwise convolution gathers k² scattered neighbors per output, causing irregular memory access that worsens with kernel size. (b) WBMM partitions the input into contiguous windows and constructs weights via table indexing, enabling regular memory access through batched matrix multiplication.

**Figure 2.** GPU memory access pattern: depthwise convolution vs. WBMM. (a, c) A 5×5 depthwise convolution gathers 25 values from 5 non-contiguous rows, requiring 5 separate cache fetches with stride W. (b, d) WBMM reads each window as a single contiguous block, fitting entirely in L1 cache and enabling coalesced access. A 4×4 window is shown here for illustration only; actual WBMM configurations use 7×7 or 14×14 win…

**Figure 3.** Operator-level benchmark (batch=128, 256 channels, FP32, single A800 GPU). DW-Std 5×5 serves as the baseline. See text for detailed analysis.

…& Hutter, 2019), 160k iterations for ADE20K, and a 3× schedule for COCO. Ablation metrics are mean±std over three runs; full configurations are in Section L. Why compare with UniRepLKNet. UniRepLKNet (Ding et al., 2024) identified 13×13 as the optimal kernel size thr…

**Figure 4.** Interpretable structure of learned WBMM weight matrices M. (a) Locality: M exhibits strong diagonal dominance and > 90% weight decay within distance 2. (b) Channel specialization: channels learn distinct horizontal, vertical, and diagonal patterns resembling oriented edge detectors. (c) Frequency selectivity: low-pass vs. high-pass channels coexist, with shallow stages biased toward high-pass and deeper st…
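The stride-W pattern described in the Figure 2 caption can be made concrete by listing flat memory offsets. The sketch below makes several assumptions: a single 28×28 plane in row-major order, and a Swin-style reshape/transpose partition into 4×4 windows, matching the caption's illustrative size. `runs` is a hypothetical helper. The sketch only counts contiguous runs of offsets. It does not model caches, coalescing, or the cost of the strided copy that performs the partition itself.

```python
import numpy as np


def runs(offsets):
    """Group flat memory offsets into maximal contiguous runs, returned as (start, length)."""
    offsets = np.sort(np.asarray(offsets).ravel())
    breaks = np.flatnonzero(np.diff(offsets) != 1) + 1
    return [(int(r[0]), len(r)) for r in np.split(offsets, breaks)]


H = W = 28
plane = np.arange(H * W).reshape(H, W)    # each entry equals its flat row-major offset

# A 5x5 neighborhood centered at (10, 10) in the original layout:
# five runs of 5, each starting W = 28 elements after the previous one.
print(runs(plane[8:13, 8:13]))  # [(232, 5), (260, 5), (288, 5), (316, 5), (344, 5)]

# Swin-style partition into 4x4 windows stored window-major: buffer[k] = original offset.
w = 4
buffer = plane.reshape(H // w, w, W // w, w).transpose(0, 2, 1, 3).reshape(-1)
new_pos = np.empty(H * W, dtype=int)
new_pos[buffer] = np.arange(H * W)        # original offset -> position in the new buffer

# The 4x4 window covering rows/cols 8..11 now occupies one contiguous run.
print(runs(new_pos[plane[8:12, 8:12]]))  # [(256, 16)]
```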

Original abstract

Large kernel depthwise convolutions achieve strong performance but suffer from significant degradation as kernel size grows due to irregular memory access from gather-based computation; while Large Kernel Acceleration (LKA) helps on small feature maps, it becomes counterproductive on large feature maps, even slower than non-accelerated implementations. We propose Windowed Batch Matrix Multiplication (WBMM), which partitions input into contiguous windows and indexes a compact relative position bias table to construct weight matrices, enabling regular memory access via batched matrix multiplication. This yields a unique property: WBMM's throughput improves with larger windows, opposite to depthwise convolutions that degrade with larger kernels. Operator-level benchmarks show WBMM with 14x14 windows outperforms 5x5 depthwise convolution baselines in speed while providing a 7.8x larger per-layer receptive field. Combined with inter-block cross-window communication and hierarchical window reparameterization, WBMM achieves comparable or higher accuracy on ImageNet-1K, COCO, and ADE20K with 1.31-1.88x training speedup, and demonstrates consistent advantages across GPU, CPU, and edge devices without requiring specialized acceleration kernels. Our code is available at http://github.com/wansong-s/WBMM

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

WBMM turns large-kernel depthwise conv into batched matmul over windows with a bias table, and the reported scaling where bigger windows run faster is the part that stands out.


The paper's main contribution is the WBMM operator. It partitions the input into windows, pulls weights from a compact relative-position bias table, and runs the work as regular batched matrix multiplication. This produces the unusual behavior that throughput improves as window size grows, which is the opposite of standard gather-based depthwise convolution.

They combine it with inter-block cross-window communication and hierarchical reparameterization, then report operator-level speedups and end-to-end training gains of 1.31-1.88x on ImageNet-1K, COCO, and ADE20K while keeping accuracy comparable or higher. The code is public and the method runs on GPU, CPU, and edge hardware without custom kernels.

The soft spot is the unverified equivalence claim. The approach assumes that indexing the bias table inside each window exactly reproduces the receptive field and floating-point behavior of a full large-kernel depthwise convolution. The abstract gives no element-wise or kernel-level check at the 14x14 size used for the speed numbers, so differences in boundary handling or accumulation order could exist even if final accuracies match after training. Experimental details such as hardware specs, error bars, and ablation tables are also missing from the summary.

This work is aimed at people optimizing CNN backbones for large receptive fields on current hardware. A reader who needs practical convolution speedups rather than a new architecture would find the operator and scaling results useful.

It deserves a serious referee because the core construction is concrete, the code is available, and the performance bottleneck it targets is real.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes Windowed Batch Matrix Multiplication (WBMM) to implement large receptive-field depthwise convolutions by partitioning feature maps into contiguous windows, indexing a compact relative-position bias table to build weight matrices, and performing batched matrix multiplication for regular memory access. It claims this yields throughput that improves with window size (opposite to standard depthwise conv), operator-level speedups with 14x14 windows versus 5x5 baselines while delivering 7.8x larger per-layer receptive field, and, when augmented with inter-block cross-window communication plus hierarchical reparameterization, 1.31-1.88x training speedups with comparable or higher accuracy on ImageNet-1K, COCO, and ADE20K across GPU/CPU/edge devices; code is released.

Significance. If the equivalence between WBMM and reference large-kernel depthwise convolution holds exactly (receptive-field coverage and numerics) and the reported speedups prove reproducible without hidden accuracy loss, the approach could provide a practical route to scale receptive fields in convolutional backbones without custom kernels or specialized hardware, with potential impact on efficient vision model design.

major comments (2)

§3 (WBMM construction): the central claim that indexing the compact relative-position bias table and performing windowed batched matmul exactly replicates the receptive-field coverage and numerical behavior of a gather-based large-kernel depthwise convolution is asserted but unsupported by any direct element-wise output comparison, kernel-equivalence test, or boundary-handling verification at the 14x14 scale used for the speed claims; without this check, discrepancies in effective kernel support or accumulation order remain possible even if downstream accuracies match.
§4 (operator benchmarks) and §5 (end-to-end results): the reported 1.31-1.88x training speedups and accuracy numbers on ImageNet-1K/COCO/ADE20K lack accompanying experimental protocol details, hardware specifications, error bars, or full ablation tables isolating the contribution of WBMM versus the added cross-window and reparameterization components; this makes it impossible to assess whether the performance advantage is load-bearing or reproducible.

minor comments (2)

The abstract states concrete speed/accuracy numbers but the main text should explicitly cross-reference the corresponding tables/figures for each claim.
Notation for window partitioning and bias-table indexing could be clarified with a small diagram or pseudocode snippet to aid reproducibility.
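One possible form of the requested notation is a printed index matrix for a tiny 2×2 window. The sketch assumes a (2w-1)×(2w-1) table flattened row-major and window positions flattened row-major. Both conventions, and the function name, are illustrative assumptions.

```python
def relative_index(w):
    """idx[i][j] = flat index into a (2w-1) x (2w-1) table for output position i and
    input position j of a w x w window, both flattened row-major (hypothetical notation)."""
    pos = [(y, x) for y in range(w) for x in range(w)]
    return [[(yj - yi + w - 1) * (2 * w - 1) + (xj - xi + w - 1)
             for (yj, xj) in pos]
            for (yi, xi) in pos]


for row in relative_index(2):
    print(row)
# [4, 5, 7, 8]
# [3, 4, 6, 7]
# [1, 2, 4, 5]
# [0, 1, 3, 4]
```

Each entry depends only on the offset between the two positions, which makes the matrix Toeplitz-block-Toeplitz. The diagonal always points at the zero-offset center entry, and a 2×2 window already uses all 9 table entries.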

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful review and constructive comments on the WBMM manuscript. We address the two major comments below and will incorporate revisions to strengthen the claims on equivalence and experimental reproducibility.


Referee: §3 (WBMM construction): the central claim that indexing the compact relative-position bias table and performing windowed batched matmul exactly replicates the receptive-field coverage and numerical behavior of a gather-based large-kernel depthwise convolution is asserted but unsupported by any direct element-wise output comparison, kernel-equivalence test, or boundary-handling verification at the 14x14 scale used for the speed claims; without this check, discrepancies in effective kernel support or accumulation order remain possible even if downstream accuracies match.

Authors: We agree that an explicit numerical verification would make the equivalence claim more robust. By construction, WBMM partitions the feature map into contiguous windows and uses the compact relative-position bias table to assemble the exact weight matrix for each window before batched matrix multiplication; this is mathematically identical to the gather-based depthwise convolution (same kernel weights, same receptive-field support per output position, and identical accumulation). Window boundaries are handled by the partitioning scheme to preserve the original convolution semantics without padding artifacts inside windows. Nevertheless, we will add a direct element-wise output comparison (including L2 difference and boundary cases) between WBMM and a reference large-kernel depthwise convolution implementation at the 14×14 scale in the revised §3.
Revision: yes.
Referee: §4 (operator benchmarks) and §5 (end-to-end results): the reported 1.31-1.88x training speedups and accuracy numbers on ImageNet-1K/COCO/ADE20K lack accompanying experimental protocol details, hardware specifications, error bars, or full ablation tables isolating the contribution of WBMM versus the added cross-window and reparameterization components; this makes it impossible to assess whether the performance advantage is load-bearing or reproducible.

Authors: We acknowledge that the current manuscript provides insufficient protocol transparency. The released code (http://github.com/wansong-s/WBMM) already contains the full training and benchmarking scripts, but we will expand §§4–5 with: (i) complete experimental protocols (optimizer, learning-rate schedule, data augmentation, batch size, number of epochs), (ii) hardware specifications (GPU/CPU/edge device models and software versions), (iii) error bars computed over at least three independent runs, and (iv) expanded ablation tables that isolate the WBMM operator from the inter-block cross-window communication and hierarchical reparameterization modules. These additions will allow readers to assess the contribution and reproducibility of each component.
Revision: yes.

Circularity Check

0 steps flagged

No circularity; direct algorithmic construction with empirical benchmarks


The provided abstract and description contain no equations, derivations, or self-citations that reduce any claimed result (speedups, receptive-field size, or accuracy) to a quantity defined by the method itself or by fitted parameters. WBMM is presented as an explicit construction (window partitioning + indexing of relative-position bias table + batched matmul), and the reported operator benchmarks and end-to-end speedups are measured outcomes rather than quantities forced by definition. No load-bearing self-citation chains, uniqueness theorems, or ansatzes appear. The skeptic's concern about exact numerical equivalence is a verification issue, not a circularity reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper introduces a new operator but rests on standard hardware assumptions about batched matrix multiplication efficiency; no free parameters, invented entities, or non-standard axioms are visible in the abstract.

axioms (1)

Domain assumption: batched matrix multiplication on modern GPUs and CPUs exhibits regular memory access and scales favorably with matrix size.
Invoked to explain why WBMM throughput improves with larger windows.
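A roofline-style back-of-envelope calculation shows what this assumption buys. The sketch assumes:

- each operand crosses DRAM exactly once;
- FP32 data;
- batch 128, as in the Figure 3 benchmark;
- an illustrative 56×56 feature map.

All of these, and the function name, are assumptions. Under them, FLOPs per byte for one channel's WBMM grow roughly as win²/4, so larger windows push the GEMM away from the memory-bound regime.

```python
def wbmm_intensity(batch: int, h: int, w_map: int, win: int, bytes_per_elem: int = 4) -> float:
    """Idealized FLOPs per byte for one channel of WBMM on a (batch, h, w_map) plane.

    Assumes every operand crosses DRAM exactly once: read the input plane,
    read one (win*win) x (win*win) weight matrix, write the output plane.
    Extra channels scale FLOPs and bytes equally, so the ratio is per channel.
    """
    pixels = batch * h * w_map
    flops = 2 * pixels * win * win        # each output is a dot product over win*win inputs
    traffic = bytes_per_elem * (2 * pixels + (win * win) ** 2)
    return flops / traffic


for win in (4, 7, 14):
    print(win, round(wbmm_intensity(128, 56, 56, win), 1))
```

An idealized k×k depthwise convolution with perfect reuse has the same form, about k²/4 FLOPs per byte. This calculation therefore does not by itself explain WBMM's advantage. The assumption doing the work is that batched GEMM reaches near-ideal reuse on real hardware, while gather-based depthwise kernels do not.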

pith-pipeline@v0.9.1-grok · 5764 in / 1312 out tokens · 22936 ms · 2026-07-03T15:36:00.166501+00:00


Reference graph

Works this paper leans on

27 extracted references · 27 canonical work pages

[1] Decoupled Weight Decay Regularization. International Conference on Learning Representations.

[2] Cai, Zhaowei and Vasconcelos, Nuno. Cascade… 2019.

[3] Chen, Honghao and Chu, Xiangxiang and Ren, Yongjian and Zhao, Xin and Huang, Kaiqi.

[4] Deng, Jia and Dong, Wei and Socher, Richard and Li, Li-Jia and Li, Kai and Fei-Fei, Li. 2009.

[5] Ding, Xiaohan and Guo, Yuchen and Ding, Guiguang and Han, Jungong.

[6] Ding, Xiaohan and Zhang, Xiangyu and Han, Jungong and Ding, Guiguang. Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in…

[7] Ding, Xiaohan and Zhang, Xiangyu and Ma, Ningning and Han, Jungong and Ding, Guiguang and Sun, Jian.

[8] Ding, Xiaohan and Zhang, Yiyuan and Ge, Yixiao and Zhao, Sijie and Song, Lin and Yue, Xiangyu and Shan, Ying.

[9] An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale. International Conference on Learning Representations.

[10] Conditional Positional Encodings for Vision Transformers. International Conference on Learning Representations.

[11] Self-Attention with Relative Position Representations. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), 2018.

[12] Ding, Xiaohan and Chen, Honghao and Zhang, Xiangyu and Han, Jungong and Ding, Guiguang.

[13] Liang, Jingyun and Cao, Jiezhang and Sun, Guolei and Zhang, Kai and Van Gool, Luc and Timofte, Radu.

[14] Hatamizadeh, Ali and Nath, Vishwesh and Tang, Yucheng and Yang, Dong and Roth, Holger R. and Xu, Daguang. 2021.

[15] Cao, Hu and Wang, Yueyue and Chen, Joy and Jiang, Dongsheng and Zhang, Xiaopeng and Tian, Qi and Wang, Manning. 2022.

[16] Liu, Ze and Lin, Yutong and Cao, Yue and Hu, Han and Wei, Yixuan and Zhang, Zheng and Lin, Stephen and Guo, Baining.

[17] Liu, Zhuang and Mao, Hanzi and Wu, Chao-Yuan and Feichtenhofer, Christoph and Darrell, Trevor and Xie, Saining. A…

[18] Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll… Computer Vision – ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13. 2014.

[19] Chen, Kai and Wang, Jiaqi and Pang, Jiangmiao and Cao, Yuhang and Xiong, Yu and Li, Xiaoxiao and Sun, Shuyang and Feng, Wansen and Liu, Ziwei and Xu, Jiarui and Zhang, Zheng and Cheng, Dazhi and Zhu, Chenchen and Cheng, Tianheng and Zhao, Qijie and Li, Buyu and Lu, Xin and Zhu, Rui and Wu, Yue and Dai, Jifeng and Wang, Jingdong and Shi, Jianping and Ouyan...

[20] Unified Perceptual Parsing for Scene Understanding. Proceedings of the European Conference on Computer Vision (ECCV).

[21] Zhou, Bolei and Zhao, Hang and Puig, Xavier and Xiao, Tete and Fidler, Sanja and Barriuso, Adela and Torralba, Antonio. Semantic Understanding of Scenes Through the… 2019.

[22] Liu, Shiwei and Chen, Tianlong and Chen, Xiaohan and Chen, Xuxi and Xiao, Qiao and Wu, Boqian and K… More… International Conference on Learning Representations.

[23] High Performance Convolutional Neural Networks for Document Processing. Tenth International Workshop on Frontiers in Handwriting Recognition.

[24] Chetlur, Sharan and Woolley, Cliff and Vandermersch, Philippe and Cohen, Jonathan and Tran, John and Catanzaro, Bryan and Shelhamer, Evan.

[25] Jia, Yangqing and Shelhamer, Evan and Donahue, Jeff and Karayev, Sergey and Long, Jonathan and Girshick, Ross and Guadarrama, Sergio and Darrell, Trevor.

[26] Paszke, Adam and Gross, Sam and Massa, Francisco and Lerer, Adam and Bradbury, James and Chanan, Gregory and Killeen, Trevor and Lin, Zeming and Gimelshein, Natalia and Antiga, Luca and Desmaison, Alban and K… Advances in Neural Information Processing Systems.

[27] Visual Attention Network. Computational Visual Media, 2023.
