Deep Multitask Learning for Mixed-Type Outcomes with Shared Sparsity

Huichao Li; Sanguo Zhang; Shuangge Ma; Tong Wang

arXiv: 2607.00995 · v1 · pith:HIPLOK4Nnew · submitted 2026-07-01 · stat.ML · cs.LG

Pith reviewed 2026-07-02 06:02 UTC · model grok-4.3

keywords: multitask learning, mixed outcomes, shared sparsity, rank-based criterion, deep neural network, variable selection, high-dimensional data, monotone transformation

The pith

A multitask framework models mixed outcomes as monotone transformations of a shared process to enable unified rank-based learning with shared sparsity.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Most multitask methods cannot handle mixed outcome types because their task-specific losses are incomparable. This work introduces a transformation framework where each task's response is an unknown monotone transform of a common latent response. A smoothed rank-based criterion with group-Lasso penalty is optimized in a deep neural network that shares the first layer across tasks. The method yields nonasymptotic excess-risk bounds and variable-selection consistency. It is motivated by and tested on high-dimensional gene expression data with continuous, binary, and mixed outcomes.

Core claim

The authors establish a multitask transformation framework in which task-specific responses may differ through unknown monotone transformations. Motivated by shared sparsity in high-dimensional settings, they estimate the target functions and identify important predictors by optimizing a smoothed rank-based criterion with a group-Lasso penalty, implemented through a multitask deep neural network with a shared first layer. They prove nonasymptotic excess-risk bounds and variable-selection consistency for the proposed estimator.
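The structure described above can be written out in symbols. The notation below is an assumption made for illustration, since this summary does not give the paper's own symbols or its exact criterion: K tasks, predictors X in R^p, unknown nondecreasing maps H_k, target functions f_k, a sigmoid smoother sigma with bandwidth h, and first-layer weights W_1 shared by all tasks. It is a sketch of one standard way to combine a smoothed rank criterion with a group-Lasso penalty, not a restatement of the authors' estimator.

```latex
% Task k: the observed response is an unknown nondecreasing transform of a latent signal
Y^{(k)} = H_k\bigl(f_k(X) + \varepsilon^{(k)}\bigr), \qquad k = 1,\dots,K

% Smoothed rank criterion for task k: the sigmoid \sigma replaces the
% indicator of f_k(X_i) > f_k(X_j), so the criterion is differentiable in f_k
\widehat{R}_k(f_k) = \frac{1}{n_k(n_k-1)} \sum_{i \neq j}
  \mathbf{1}\bigl\{Y_i^{(k)} > Y_j^{(k)}\bigr\}\,
  \sigma\!\left(\frac{f_k(X_i^{(k)}) - f_k(X_j^{(k)})}{h}\right)

% Penalized objective: reward rank agreement in every task and penalize
% the first-layer weights of each predictor j as one group
\min_{W_1,\,\theta_1,\dots,\theta_K}\;
  -\sum_{k=1}^{K} \widehat{R}_k(f_k)
  + \lambda \sum_{j=1}^{p} \bigl\lVert W_1[:, j] \bigr\rVert_2
```

Because only the ordering of the responses enters each task's criterion, the unknown maps H_k never have to be estimated, which is what makes the per-task terms comparable and lets them be summed.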

What carries the argument

Smoothed rank-based criterion with group-Lasso penalty on a multitask deep neural network with shared first layer, under the monotone transformation framework for mixed outcomes.
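To make this machinery concrete, here is a minimal NumPy sketch of a penalized smoothed rank objective evaluated on a two-task network with a shared first layer. It only evaluates the objective and does not train anything. The function names, the network sizes, the bandwidth `h`, the penalty weight `lam`, and the toy data are all hypothetical choices for illustration; the paper's actual criterion and architecture may differ in their details.

```python
import numpy as np

def sigmoid(z):
    # Numerically stable logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def smoothed_rank_score(y, scores, h=0.1):
    """Sum of smoothed score agreements over pairs with y[i] > y[j], divided by n(n-1)."""
    y = np.asarray(y, dtype=float)
    scores = np.asarray(scores, dtype=float)
    higher = y[:, None] > y[None, :]                 # pairs where y_i > y_j
    diff = (scores[:, None] - scores[None, :]) / h   # scaled score gap
    n = len(y)
    return float((higher * sigmoid(diff)).sum() / (n * (n - 1)))

def group_lasso(W1):
    """Sum over predictors of the L2 norm of that predictor's first-layer weights."""
    return float(np.linalg.norm(W1, axis=0).sum())

def forward(X, W1, b1, head):
    """Shared first layer (W1, b1) followed by a task-specific linear head."""
    hidden = np.maximum(X @ W1.T + b1, 0.0)          # ReLU activation
    return hidden @ head

def objective(tasks, W1, b1, heads, lam=0.05, h=0.1):
    """Negative smoothed rank agreement summed over tasks, plus the penalty."""
    fit = sum(smoothed_rank_score(y, forward(X, W1, b1, head), h)
              for (X, y), head in zip(tasks, heads))
    return -fit + lam * group_lasso(W1)

rng = np.random.default_rng(0)
p, m = 6, 4                       # hypothetical: 6 predictors, 4 hidden units
W1 = rng.normal(size=(m, p))
W1[:, 3:] = 0.0                   # predictors 3..5 switched off for every task
b1 = np.zeros(m)
heads = [rng.normal(size=m), rng.normal(size=m)]

X1, X2 = rng.normal(size=(30, p)), rng.normal(size=(25, p))
latent1, latent2 = X1[:, 0] + X1[:, 1], X2[:, 0] + X2[:, 1]
tasks = [(X1, np.exp(latent1)),                  # continuous outcome
         (X2, (latent2 > 0).astype(float))]      # binary outcome

value = objective(tasks, W1, b1, heads)
selected = np.flatnonzero(np.linalg.norm(W1, axis=0) > 0)
```

The same loss function is applied to the continuous and the binary task, which is the point of the rank-based formulation. Because each penalty group is a whole column of `W1`, a predictor is either used by all tasks or by none, so `selected` is read directly off the nonzero columns.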

Load-bearing premise

The responses for different tasks are related solely through unknown monotone transformations.
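The reason this premise matters is that pairwise orderings are unchanged by strictly increasing transformations, so a rank criterion gives the same value whichever monotone scale a task happens to be recorded on. The short check below illustrates this with a hard (unsmoothed) concordance measure; the function `concordance` and the simulated data are hypothetical and are not taken from the paper.

```python
import numpy as np

def concordance(y, scores):
    """Fraction of ordered pairs (i, j) with y[i] > y[j] and scores[i] > scores[j]."""
    y = np.asarray(y, dtype=float)
    scores = np.asarray(scores, dtype=float)
    agree = (y[:, None] > y[None, :]) & (scores[:, None] > scores[None, :])
    n = len(y)
    return float(agree.sum() / (n * (n - 1)))

rng = np.random.default_rng(1)
latent = rng.normal(size=200)                    # shared underlying signal
scores = latent + 0.5 * rng.normal(size=200)     # a noisy predictor of it

c_raw = concordance(latent, scores)
c_exp = concordance(np.exp(latent), scores)      # strictly increasing transform
c_cube = concordance(latent ** 3, scores)        # another strictly increasing transform
# Thresholding is nondecreasing but creates ties, so fewer pairs are comparable.
c_bin = concordance((latent > 0).astype(float), scores)
```

Here `c_raw`, `c_exp`, and `c_cube` coincide because the transformed responses order the samples identically. The binary version can only use pairs that straddle the threshold, so `c_bin` is no larger than `c_raw`; it carries less ordering information but never contradicts it.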

What would settle it

A counterexample dataset in which the inter-task relationships violate monotonicity, causing the method to lose its advantage over separate modeling.
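A toy version of such a counterexample can be sketched as follows. One task records a monotone transform of a shared signal and another records a non-monotone one (here, the square). The data-generating choices are hypothetical and serve only to show how the rank agreement of the true shared signal collapses once monotonicity fails.

```python
import numpy as np

def concordance(y, scores):
    """Fraction of ordered pairs (i, j) with y[i] > y[j] and scores[i] > scores[j]."""
    y = np.asarray(y, dtype=float)
    scores = np.asarray(scores, dtype=float)
    agree = (y[:, None] > y[None, :]) & (scores[:, None] > scores[None, :])
    n = len(y)
    return float(agree.sum() / (n * (n - 1)))

rng = np.random.default_rng(2)
latent = rng.normal(size=500)          # shared signal, used directly as the score

y_monotone = np.exp(latent)            # satisfies the premise
y_nonmonotone = latent ** 2            # violates it: large |latent| maps high on both sides

c_mono = concordance(y_monotone, latent)
c_nonmono = concordance(y_nonmonotone, latent)
```

With no ties, the monotone task reaches the maximum value of 0.5 (every ordered pair with the larger response also has the larger score). For the squared response and a symmetric signal, the expected value is about 0.25, so even the true shared signal looks like a poor ranker on that task and pooling the two criteria would no longer be justified.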

Figures

Figures reproduced from arXiv: 2607.00995 by Huichao Li, Sanguo Zhang, Shuangge Ma, Tong Wang.

Original abstract

Most existing multitask learning approaches are limited by their reliance on task-specific loss functions tailored to the scale and type of each outcome. When outcomes differ across tasks, these losses are generally not directly comparable, which makes it difficult to formulate a unified objective and may limit information sharing across tasks. We propose a multitask transformation framework in which task-specific responses may differ through unknown monotone transformations. Motivated by high-dimensional biological applications in which the predictor dimension may diverge with the sample size while only a common subset of predictors is informative, we consider shared sparsity across tasks. Under this framework, we estimate the target functions and identify important predictors by optimizing a smoothed rank-based criterion with a group-Lasso penalty, implemented through a multitask deep neural network with a shared first layer. We establish the nonasymptotic excess-risk bounds, and variable-selection consistency for the proposed estimator. Simulation studies show that the proposed method achieves competitive prediction and variable-selection performance compared with competing approaches. Analyses of gene-expression studies with continuous, binary, and mixed outcomes further illustrate that the proposed method improves prediction and identifies biologically meaningful shared predictors.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper gives a rank-based multitask method for mixed outcomes that uses unknown monotone transforms plus group-Lasso on a shared DNN layer, with claimed nonasymptotic bounds.

The paper's main move is to handle mixed outcome types in multitask learning by assuming each task's response is an unknown monotone transformation of some shared underlying signal. This lets them replace task-specific losses with one smoothed rank-based criterion, then add group-Lasso on the first layer of a deep net to force shared sparsity across tasks.

That construction is the concrete advance. It directly targets the problem that standard losses are not comparable when outcomes are continuous, binary, or mixed, which is common in the gene-expression settings they mention. The simulations report competitive prediction and selection performance, and the real-data examples show it can recover biologically plausible shared predictors.

The load-bearing assumption is the monotone transformation. The abstract and motivation tie the unified objective and the excess-risk bounds to it; if the relationship is not monotone the rank criterion no longer aligns with the data-generating process and the consistency claims do not apply. The nonasymptotic bounds and variable-selection consistency are stated as established, but the abstract alone does not show the derivation or the precise conditions, so those claims cannot be checked yet.

This is for researchers who work on high-dimensional multitask problems with heterogeneous responses, especially in biology or similar fields. A reader who needs a practical way to share information across outcome types will find a usable proposal here.

It should go to peer review. The problem is real, the construction is new on the points described, and the theory is at least attempted, even though the monotone assumption needs close examination and the full derivations are still to be seen.

Referee Report

1 major / 0 minor

Summary. The manuscript proposes a multitask transformation framework in which task-specific responses differ by unknown monotone transformations. It optimizes a smoothed rank-based criterion with group-Lasso penalty, implemented via a deep neural network with shared first layer, to estimate target functions and recover shared important predictors under diverging dimension. Nonasymptotic excess-risk bounds and variable-selection consistency are established for the estimator. Simulations demonstrate competitive prediction and selection performance; gene-expression analyses with continuous, binary, and mixed outcomes illustrate improved prediction and biologically meaningful shared predictors.

Significance. If the theoretical results hold under the stated framework, the work provides a unified objective for multitask learning with heterogeneous outcome types, enabling information sharing via rank-based losses and shared sparsity. This addresses a practical limitation in high-dimensional biological applications where outcome scales differ and only a common predictor subset is informative. The combination of nonasymptotic bounds, consistency, and empirical validation on mixed outcomes would be a useful contribution to multitask methods.

major comments (1)

[Abstract, paragraph 2] The nonasymptotic excess-risk bounds and variable-selection consistency are derived under the multitask transformation framework that assumes task-specific responses differ through unknown monotone transformations. The abstract motivates the method for mixed-type outcomes (continuous, binary), yet the rank-based objective and its theoretical guarantees lose justification if monotonicity fails to hold; no discussion or sensitivity analysis is indicated for this load-bearing modeling assumption in the general mixed-type setting.

Simulated Authors' Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed and constructive review. The single major comment is addressed point-by-point below; we agree that additional discussion of the monotonicity assumption is warranted and will revise accordingly.

Referee: [Abstract, paragraph 2] The nonasymptotic excess-risk bounds and variable-selection consistency are derived under the multitask transformation framework that assumes task-specific responses differ through unknown monotone transformations. The abstract motivates the method for mixed-type outcomes (continuous, binary), yet the rank-based objective and its theoretical guarantees lose justification if monotonicity fails to hold; no discussion or sensitivity analysis is indicated for this load-bearing modeling assumption in the general mixed-type setting.

Authors: We agree that the monotonicity assumption is central to the validity of the rank-based criterion and the ensuing nonasymptotic bounds. While the abstract already states that responses 'may differ through unknown monotone transformations,' we acknowledge that the manuscript would benefit from explicit discussion of when this assumption is plausible for mixed-type outcomes and from empirical checks when it is mildly violated. In the revision we will (i) expand the introduction to clarify the modeling rationale for continuous, binary, and mixed outcomes in biological settings and (ii) add a targeted sensitivity simulation that perturbs monotonicity and reports degradation in excess risk and selection consistency.

Revision: yes

Circularity Check

0 steps flagged

No circularity: bounds derived conditionally on explicit model assumptions

The paper defines a multitask transformation framework with unknown monotone transformations, introduces a smoothed rank-based objective with group-Lasso, and derives nonasymptotic excess-risk bounds plus variable-selection consistency for the resulting DNN estimator. These results are obtained by standard empirical process arguments applied to the proposed criterion under the stated model; they do not reduce by construction to fitted quantities, self-citations, or renamed inputs. The monotone-transformation assumption is an explicit modeling choice required for the rank-based unification, not a hidden tautology. No load-bearing step equates the claimed guarantees to the inputs by definition.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central claims rest on the domain assumption of monotone transformations linking task responses and the modeling choice of shared sparsity; no free parameters are explicitly named in the abstract, though the group-Lasso tuning parameter and smoothing bandwidth are implicit.

free parameters (2)

group-Lasso penalty parameter: controls the strength of shared sparsity; must be chosen or tuned.
smoothing parameter for the rank-based criterion: makes the non-smooth rank objective differentiable for optimization.
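The roles of these two parameters can be illustrated in a few lines. In the sketch below, a sigmoid with bandwidth `h` stands in for the indicator of a positive score gap, and a proximal group soft-thresholding step shows how a larger penalty step zeroes entire first-layer columns. Both choices are assumptions for illustration: the summary does not say which smoother or which optimizer the authors use, and proximal updates are only one common way to handle a group-Lasso penalty.

```python
import numpy as np

def smooth_indicator(u, h):
    """Sigmoid surrogate for the indicator 1{u > 0}; h is the smoothing bandwidth."""
    return 0.5 * (1.0 + np.tanh(0.5 * u / h))

def group_soft_threshold(W1, step):
    """Proximal step for the group-Lasso penalty: shrink each column norm by `step`."""
    norms = np.linalg.norm(W1, axis=0)
    # Columns with norm <= step get scale 0; the floor avoids division by zero.
    scale = np.maximum(1.0 - step / np.maximum(norms, 1e-12), 0.0)
    return W1 * scale

gaps = np.array([-1.0, -0.1, 0.1, 1.0])
approx_wide = smooth_indicator(gaps, h=1.0)      # very smooth, far from 0/1
approx_narrow = smooth_indicator(gaps, h=0.01)   # close to the hard indicator

W1 = np.array([[3.0, 0.3, 0.0],
               [4.0, 0.4, 0.0]])                 # column norms: 5.0, 0.5, 0.0
W1_shrunk = group_soft_threshold(W1, step=1.0)
kept = np.flatnonzero(np.linalg.norm(W1_shrunk, axis=0) > 0)
```

A small `h` tracks the rank indicator closely but makes gradients vanish away from zero gaps, while a large `h` is easier to optimize but blurs the ordering. In the shrinkage example only the first column survives, since its norm of 5.0 exceeds the step of 1.0 and the other two columns are set exactly to zero.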

axioms (2)

domain assumption: Task-specific responses differ through unknown monotone transformations. Enables a unified objective across outcome types (abstract paragraph 2).
domain assumption: Only a common subset of predictors is informative across tasks. Justifies the shared sparsity penalty in the high-dimensional regime.

pith-pipeline@v0.9.1-grok · 5730 in / 1337 out tokens · 30350 ms · 2026-07-02T06:02:39.594959+00:00 · methodology
