Self-Supervised Test-Time Tuning for Packet Loss Concealment

Joseph Keshet; Yehoshua Dissen

arxiv: 2607.01823 · v1 · pith:2PKKHH47new · submitted 2026-07-02 · 📡 eess.AS · cs.CL


Pith reviewed 2026-07-03 05:18 UTC · model grok-4.3


keywords: packet loss concealment · self-supervised adaptation · test-time tuning · audio reconstruction · speech processing · music transmission · neural audio models


The pith

Pretrained packet loss concealment models can be adapted at test time using only received audio packets to better reconstruct the missing ones.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that PLC models need not remain fixed after initial training because each lossy audio signal carries usable supervision in the packets that arrived. By synthetically masking segments of those received packets and retraining the model on its original concealment objective, the approach produces an adapted model that then handles the actual losses. This self-supervised process requires no clean reference audio, no external data, and no architecture changes. It applies in both offline file processing, where multiple adaptation passes are possible, and in causal streaming, where updates from past blocks affect only future output. The central insight is that signal-specific patterns visible in the received portions can guide better concealment of the unseen portions.

Core claim

The still-observed portions of a lossy signal can provide an effective training signal for improving concealment on that same signal. TTT-PLC achieves this by creating synthetic masks on the received audio, applying the model's native PLC training objective to those masks, and then deploying the resulting adapted parameters on the true missing packets.
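As a rough illustration of that loop, the sketch below adapts a deliberately tiny stand-in model on a toy sinusoid. Everything in it is an assumption made for illustration: the two-weight predictor `conceal`, the helper names `ttt_adapt`, `reconstruct`, and `lost_mse`, the mean-squared-error objective, and the fixed loss pattern. It is not FRN, PARCnet, or the authors' code. The point it shows is only the data flow: supervision comes from received packets that are hidden from the model, and the clean signal is touched only to score the result.

```python
import math
import random

def conceal(w, prev2, prev1):
    """Toy PLC model (hypothetical): predict a packet from the two before it."""
    return [w[0] * a + w[1] * b for a, b in zip(prev1, prev2)]

def ttt_adapt(w, packets, received, steps=300, lr=0.05, seed=0):
    """Tune w using only packets that actually arrived."""
    rng = random.Random(seed)
    # A received packet can be synthetically masked only if its two
    # predecessors also arrived, so both context and target are known.
    candidates = [i for i in range(2, len(packets))
                  if received[i] and received[i - 1] and received[i - 2]]
    w = list(w)
    if not candidates:
        return w  # nothing to learn from; keep the pretrained weights
    for _ in range(steps):
        i = rng.choice(candidates)  # synthetic mask: hide received packet i
        pred = conceal(w, packets[i - 2], packets[i - 1])
        err = [p - t for p, t in zip(pred, packets[i])]
        n = len(err)
        # Gradient step on the mean squared error for each weight.
        w[0] -= lr * 2 * sum(e * x for e, x in zip(err, packets[i - 1])) / n
        w[1] -= lr * 2 * sum(e * x for e, x in zip(err, packets[i - 2])) / n
    return w

def reconstruct(w, packets, received):
    """Fill the truly lost packets with the model's prediction."""
    out = [list(p) for p in packets]
    for i in range(2, len(out)):
        if not received[i]:
            out[i] = conceal(w, out[i - 2], out[i - 1])
    return out

def lost_mse(clean, recon, received):
    """Mean squared error over samples in truly lost packets only."""
    sq = [(c - r) ** 2 for cp, rp, ok in zip(clean, recon, received)
          if not ok for c, r in zip(cp, rp)]
    return sum(sq) / len(sq)

# Demo: a sinusoid in 16-sample packets, every fifth packet lost.
P, N = 16, 40
clean = [[math.sin(0.3 * (i * P + k)) for k in range(P)] for i in range(N)]
received = [i % 5 != 4 for i in range(N)]
lossy = [p if ok else [0.0] * P for p, ok in zip(clean, received)]

w_pretrained = [1.0, 0.0]  # stand-in "pretrained" model: repeat the last packet
w_adapted = ttt_adapt(w_pretrained, lossy, received)
mse_base = lost_mse(clean, reconstruct(w_pretrained, lossy, received), received)
mse_adapted = lost_mse(clean, reconstruct(w_adapted, lossy, received), received)
```

The toy signal is one the predictor can fit exactly, so the adapted weights should do much better than the starting weights on the lost packets; a real model on real audio carries no such guarantee.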

What carries the argument

TTT-PLC framework: self-supervised test-time tuning that synthetically masks portions of received packets and adapts the model on the native PLC objective without external supervision.

If this is right

Pretrained PLC models improve on individual signals without requiring clean references or retraining from scratch.
Non-causal adaptation on an entire received file yields a performance ceiling reachable through repeated self-supervised passes.
Causal adaptation updates parameters from completed past blocks and applies them only to future audio blocks.
The same framework works on both recurrent full-band speech models and hybrid autoregressive-neural music models.
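The difference between the two deployment settings is mostly one of scheduling, which a short sketch can make concrete. The functions `causal_ttt` and `noncausal_ttt`, the toy `toy_conceal` and `toy_adapt` callables, and the single-number "model" are all hypothetical; the toy adapter simply tracks the mean of received samples and does not implement synthetic masking. What the sketch shows is when parameters are allowed to change relative to when audio is emitted.

```python
def causal_ttt(blocks, params, adapt, conceal):
    """Streaming schedule: emit a block, then learn from it."""
    outputs, versions = [], []
    for version, block in enumerate(blocks):
        outputs.append(conceal(params, block))  # emitted now, never revised
        versions.append(version)                # updates applied before emission
        params = adapt(params, block)           # uses only the completed block
    return outputs, versions

def noncausal_ttt(blocks, params, adapt, conceal, passes=3):
    """Offline schedule: several passes over the whole file, then reconstruct."""
    for _ in range(passes):
        for block in blocks:
            params = adapt(params, block)
    return [conceal(params, block) for block in blocks]

def toy_conceal(level, block):
    """Fill lost samples (None) with the current level estimate."""
    return [level if s is None else s for s in block]

def toy_adapt(level, block, rate=0.5):
    """Move the level estimate toward the mean of the received samples."""
    got = [s for s in block if s is not None]
    if not got:
        return level
    return level + rate * (sum(got) / len(got) - level)

blocks = [[1.0, None, 1.0, 1.0]] * 3
causal_out, versions = causal_ttt(blocks, 0.0, toy_adapt, toy_conceal)
offline_out = noncausal_ttt(blocks, 0.0, toy_adapt, toy_conceal)
```

In the causal run the first block is concealed with the unadapted parameters, so any gain can appear only from the second block onward. The offline run conceals every block with parameters that have already seen the whole file.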

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Similar self-supervised test-time adaptation may apply to other partial-observation audio tasks such as denoising or dereverberation where observed segments can supervise reconstruction of unobserved ones.
Reducing reliance on perfectly matched training distributions could allow smaller base models if per-signal adaptation compensates at deployment.
In variable network conditions, per-call or per-stream adaptation might lower average perceptual degradation compared with a single fixed model.

Load-bearing premise

Synthetically masking portions of the received signal creates a training distribution that is close enough to real packet losses for the adapted model to generalize to the actual missing packets.
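One way to probe this premise is to compare simple statistics of the synthetic masks with those of the observed loss pattern. The sketch below is an assumption-laden illustration: this page does not state the paper's masking policy, so a two-state Markov (Gilbert-style) loss model is swapped in as a stand-in, and the names `gilbert_mask`, `burst_lengths`, and `loss_stats` are hypothetical. For this chain the long-run loss rate is `p_loss / (p_loss + p_recover)` and the mean burst length is `1 / p_recover`.

```python
import random

def gilbert_mask(n, p_loss, p_recover, seed=0):
    """Return n booleans (True = lost) from a two-state Markov chain."""
    rng = random.Random(seed)
    lost, mask = False, []
    for _ in range(n):
        if lost:
            lost = rng.random() >= p_recover  # stay lost with prob. 1 - p_recover
        else:
            lost = rng.random() < p_loss      # enter the loss state
        mask.append(lost)
    return mask

def burst_lengths(mask):
    """Lengths of each run of consecutive lost packets."""
    bursts, run = [], 0
    for lost in mask:
        if lost:
            run += 1
        elif run:
            bursts.append(run)
            run = 0
    if run:
        bursts.append(run)
    return bursts

def loss_stats(mask):
    """Return (loss rate, mean burst length) for a loss mask."""
    bursts = burst_lengths(mask)
    rate = sum(mask) / len(mask)
    mean_burst = sum(bursts) / len(bursts) if bursts else 0.0
    return rate, mean_burst

# Compare a candidate synthetic policy against an observed loss pattern.
observed = [False, True, True, False, True]
synthetic = gilbert_mask(20000, p_loss=0.1, p_recover=0.5)
obs_rate, obs_burst = loss_stats(observed)
syn_rate, syn_burst = loss_stats(synthetic)
```

A large gap between the observed and synthetic pairs would be a warning sign that the adaptation objective is training on a different loss process than the one the model must conceal.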

What would settle it

Measure whether the adapted model produces higher error than the original fixed model when both are tested on audio with genuine network packet losses that were never seen during adaptation.
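A minimal sketch of that comparison follows, under stated assumptions: the error is plain mean squared error restricted to truly lost packets, each file is a tuple of packet lists, and the names `lost_region_mse` and `compare` are hypothetical. The clean reference appears here for scoring only; it plays no part in adaptation. A perceptual metric could be substituted for the squared error.

```python
def lost_region_mse(reference, reconstructed, lost):
    """Mean squared error over samples in lost packets only."""
    total, count = 0.0, 0
    for ref_p, rec_p, is_lost in zip(reference, reconstructed, lost):
        if is_lost:
            total += sum((a - b) ** 2 for a, b in zip(ref_p, rec_p))
            count += len(ref_p)
    return total / count if count else 0.0

def compare(files):
    """files: iterable of (reference, fixed_out, adapted_out, lost).

    Returns per-file deltas (adapted minus fixed) and the number of
    files on which the adapted model is worse than the fixed one.
    """
    deltas = [lost_region_mse(ref, adapted, lost) - lost_region_mse(ref, fixed, lost)
              for ref, fixed, adapted, lost in files]
    worse = sum(d > 0 for d in deltas)
    return deltas, worse

# Tiny worked example: one file, two packets, the second one lost.
reference = [[1.0, 1.0], [2.0, 2.0]]
fixed_out = [[1.0, 1.0], [0.0, 0.0]]
adapted_out = [[1.0, 1.0], [1.5, 2.5]]
lost = [False, True]
deltas, worse = compare([(reference, fixed_out, adapted_out, lost)])
```

A positive delta on a file means adaptation hurt on that file; counting such files alongside the mean delta guards against an average that hides per-signal regressions.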

Figures

Figures reproduced from arXiv: 2607.01823 by Joseph Keshet, Yehoshua Dissen.

**Figure 1.** Causal FRN block replay improves after the first completed block.

Original abstract

Packet loss concealment (PLC) reconstructs audio packets that are missing at the receiver, usually with a trained model whose parameters remain fixed at deployment time. This treats the PLC model as static, even though each call or recording exposes signal-specific information through the packets that did arrive. We present TTT-PLC, a self-supervised test-time tuning framework that adapts existing PLC models using only those received packets. The method creates supervision by synthetically masking portions of the available signal, training the model to conceal them with its native PLC objective, and then using the adapted model to reconstruct the true packet losses. No clean reference signal, external adaptation data, or architectural modification is required. We study TTT-PLC in two deployment settings. In the non-causal setting, the received file is available before reconstruction, allowing repeated self-supervised adaptation passes and providing a per-file adaptation ceiling. In the causal setting, audio is streamed without revising emitted samples; adaptation is performed only on completed past blocks, and updated parameters affect only future audio. We instantiate the framework on two public PLC backbones: FRN, a recurrent full-band speech PLC model, and PARCnet, a hybrid autoregressive-neural model for networked music. Across these settings, the results show that pretrained PLC systems do not need to be treated as fixed at inference time: the still-observed portions of a lossy signal can provide an effective training signal for improving concealment on that same signal.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

TTT-PLC gives a workable self-supervised adaptation loop for existing PLC models using only received packets, but the abstract supplies no numbers and the synthetic masking assumption is untested against real loss patterns.


The main takeaway is that this paper shows how to adapt a pretrained PLC model at test time without clean references or extra data. It creates its own supervision by masking parts of the received signal and retraining on the native concealment loss, then applies the updated model to the actual missing packets. The approach is tested in both non-causal file-level adaptation and causal streaming, on FRN for speech and PARCnet for music.

What is new is the specific self-supervised loop that turns the still-available packets into a per-signal training signal. The paper does a clean job of separating the two deployment regimes and showing that no architecture changes are needed.

The soft spots are straightforward. The abstract reports positive results but gives no quantitative numbers, no error bars, and no description of the masking policy or of how burst lengths and correlations were chosen. The concern about the masking assumption is well founded: if the synthetic masks do not match the true packet-loss process, especially in the causal case where adaptation uses only past blocks, the gains may not transfer. Without those details or ablations, it is difficult to know whether the central claim holds.

This paper is for people working on networked audio, speech codecs, or test-time adaptation in signal processing. A reader who already knows the PLC literature will see the practical angle quickly.

It deserves a serious referee. The idea is internally consistent and targets a real deployment limitation, even if the current evidence is thin. I would send it out for review rather than desk-reject.

Referee Report

2 major / 1 minor

Summary. The paper proposes TTT-PLC, a self-supervised test-time tuning framework that adapts pretrained PLC models (FRN and PARCnet) at inference time by synthetically masking portions of received packets to generate supervision signals, then applying the adapted model to true losses. It evaluates the approach in non-causal (full-file) and causal (streaming) settings, claiming that observed packets provide an effective training signal for the same signal without clean references or architectural changes.

Significance. If the results hold, the work shows that PLC models need not be treated as fixed at deployment and that per-signal adaptation is feasible from received data alone, which could improve robustness in real audio streaming without requiring new training corpora or model redesigns.

major comments (2)

[Abstract, §3] Abstract and method description: the central claim requires that synthetically masked segments drawn from received packets yield a training distribution sufficiently close to the true (unobserved) packet losses; however, no masking policy, burst-length statistics, or correlation measures are specified, and no ablation is reported when the synthetic distribution diverges from the test loss process.
[Abstract, causal setting paragraph] Causal setting (described in abstract): adaptation occurs only on past blocks whose loss statistics may differ from future blocks, yet the manuscript provides no analysis or experiment quantifying the impact of this temporal mismatch on adaptation gains.

minor comments (1)

[Abstract] The abstract states positive results on two backbones and two settings but reports no quantitative numbers, error bars, or baseline comparisons; including at least the key metrics (e.g., PESQ or STOI deltas) would strengthen the summary.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and will incorporate the requested clarifications and analyses into the revised manuscript.


Referee: [Abstract, §3] Abstract and method description: the central claim requires that synthetically masked segments drawn from received packets yield a training distribution sufficiently close to the true (unobserved) packet losses; however, no masking policy, burst-length statistics, or correlation measures are specified, and no ablation is reported when the synthetic distribution diverges from the test loss process.

Authors: We agree that the masking policy and its relation to real loss statistics must be stated explicitly to substantiate the central claim. In the revision we will (i) detail the exact synthetic masking procedure applied to received packets, (ii) report the burst-length distribution and any correlation statistics used, and (iii) add an ablation that measures performance degradation when the synthetic masking distribution is deliberately mismatched to the test loss process. revision: yes

Referee: [Abstract, causal setting paragraph] Causal setting (described in abstract): adaptation occurs only on past blocks whose loss statistics may differ from future blocks, yet the manuscript provides no analysis or experiment quantifying the impact of this temporal mismatch on adaptation gains.

Authors: We acknowledge that the possible mismatch between loss statistics in past blocks (used for adaptation) and future blocks (where the adapted model is applied) requires explicit quantification. The revised manuscript will include a dedicated analysis together with controlled experiments that vary the degree of temporal mismatch and report its effect on the observed adaptation gains. revision: yes

Circularity Check

0 steps flagged

No circularity; procedural adaptation loop is self-contained


The paper presents an empirical self-supervised test-time tuning procedure that creates synthetic masks on received packets to adapt a pretrained PLC model, then applies the adapted model to true losses. No equations, derivations, or first-principles claims are advanced that reduce any result to fitted inputs by construction. The central claim rests on the methodological assumption that synthetic masking provides useful supervision, but this is an explicit design choice rather than a self-referential reduction. No load-bearing self-citations or uniqueness theorems are invoked to force outcomes. The framework is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that synthetic masking of received audio produces a training distribution close enough to real losses for adaptation to help; no free parameters or invented entities are introduced in the abstract.

axioms (1)

Domain assumption: synthetic masking of available packets creates supervision whose statistics match those of true packet losses for the purpose of model adaptation.
Invoked in the description of how the self-supervised objective is constructed (abstract).


