arxiv: 2607.00317 · v1 · pith:NLXQIVSKnew · submitted 2026-07-01 · 📊 stat.AP

Economic Disparities and Their Relationship to Destructive Health Behaviors in Five Western U.S. States

Pith reviewed 2026-07-02 00:42 UTC · model grok-4.3

classification 📊 stat.AP
keywords economic disparities · suicide rate · LASSO regression · linear regression · county health data · western United States · correlation analysis

The pith

Certain economic variables emerge as stronger predictors of county suicide rates than others in five western states.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines links between economic factors and health outcomes, focusing on suicide rates across counties in Washington, Idaho, Oregon, California, and Nevada. Data from census and health rankings sources were cleaned and imputed for missing values before applying principal component analysis, clustering, correlations, linear regressions, and LASSO. These methods reveal which economic variables most strongly associate with suicide rates and destructive health behaviors, while also indicating that counties tend to group according to state-level economic patterns.
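
The PCA and clustering step can be illustrated with a minimal sketch. It assumes standardized economic variables, PCA computed through an SVD, and a plain Lloyd's k-means whose within-cluster sum of squares is tracked over several values of k (the quantity an elbow plot shows). The data are synthetic and the function names are hypothetical; this is not the authors' code.

```python
import numpy as np

def standardize(X):
    """Center each column and scale it to unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def pca_scores(X, n_components=2):
    """Project standardized data onto its leading principal components via SVD."""
    Z = standardize(X)
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = S**2 / np.sum(S**2)  # share of variance per component
    return Z @ Vt[:n_components].T, explained[:n_components]

def kmeans_inertia(points, k, n_iter=50, seed=0):
    """Run Lloyd's algorithm and return the within-cluster sum of squares."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave an empty cluster's center unchanged
                centers[j] = points[labels == j].mean(axis=0)
    dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return float(dists.min(axis=1).sum())

# Synthetic stand-in for county-level economic variables (not the paper's data):
# three groups of 30 "counties" with different average profiles on 4 variables.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(loc=m, scale=1.0, size=(30, 4)) for m in (0.0, 5.0, 10.0)])

scores, explained = pca_scores(X, n_components=2)
inertias = [kmeans_inertia(scores, k) for k in range(1, 7)]
print("explained variance share:", explained)
print("inertia for k = 1..6:", inertias)
```

Reading the elbow means looking for the k after which the inertia stops dropping sharply; the choice remains a judgment call rather than a formal test.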

Core claim

Correlation analysis, LASSO regression, and linear fittings identify the destructive health variables most connected to economic variables by R-squared and correlation strength, pinpoint the economic variables most and least important for predicting suicide rate, and outline the possible relationships between suicide rate and those economic factors.

What carries the argument

LASSO regression combined with linear model fittings and correlation analysis on regression-imputed county-level economic and health data.
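
To make the variable-selection mechanism concrete, here is a small sketch of LASSO fitted by cyclic coordinate descent on standardized predictors. It minimizes (1/(2n))·||y − Xβ||² + λ·||β||₁. The solver choice, the synthetic data, and all names are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def soft_threshold(value, threshold):
    """Shrink value toward zero; return 0 when |value| <= threshold."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)

def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1/(2n))*||y - Xb||^2 + lam*||b||_1 on standardized predictors."""
    n, p = X.shape
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)  # unit-variance columns
    yc = y - y.mean()                          # centering removes the intercept
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # Residual with predictor j's current contribution added back in.
            partial_residual = yc - Xs @ beta + Xs[:, j] * beta[j]
            rho = Xs[:, j] @ partial_residual / n
            # Each column has variance 1, so the update needs no rescaling.
            beta[j] = soft_threshold(rho, lam)
    return beta

# Synthetic data (not from the paper): only the first two predictors matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

beta_sparse = lasso_coordinate_descent(X, y, lam=0.5)
beta_empty = lasso_coordinate_descent(X, y, lam=100.0)
print("lam = 0.5:", beta_sparse)
print("lam = 100:", beta_empty)
```

Predictors whose coefficients are driven exactly to zero are the ones LASSO treats as least important at that penalty level, which is why the ranking depends on how λ is chosen.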

If this is right

  • Counties group according to broader state-level economic patterns based on PCA and clustering.
  • Specific economic variables can be ranked by their importance in predicting suicide rates.
  • Relationships between suicide rates and economic factors can be quantified through R-squared and correlation values.
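
As a hedged illustration of how R-squared and correlation relate, the sketch below fits a one-predictor linear model and checks that its R² equals the squared Pearson correlation. The variable names `poverty_rate` and `suicide_rate` are hypothetical labels on synthetic numbers and do not reproduce any result from the paper.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

def r_squared_simple_fit(x, y):
    """R^2 of the least-squares line y ~ intercept + slope * x."""
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (slope * x + intercept)
    total = y - y.mean()
    return float(1.0 - (residuals @ residuals) / (total @ total))

# Synthetic values for illustration only (not county data).
rng = np.random.default_rng(1)
poverty_rate = rng.uniform(5.0, 25.0, size=60)
suicide_rate = 10.0 + 0.4 * poverty_rate + rng.normal(scale=2.0, size=60)

r = pearson_r(poverty_rate, suicide_rate)
r2 = r_squared_simple_fit(poverty_rate, suicide_rate)
print("r =", r, " R^2 =", r2, " r^2 =", r**2)
```

With a single predictor the two measures carry the same information; they diverge only once several predictors enter the model.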

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The identification of key predictors could guide selection of economic indicators for future health studies in similar regions.
  • State-level patterns in county groupings point to the value of including political or electoral data to interpret the clusters further.

Load-bearing premise

The regression imputation of missing values and the selected linear and LASSO models accurately reflect true relationships without major bias introduced by data cleaning decisions or unmeasured factors.
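
A minimal sketch of single (deterministic) regression imputation shows what this premise rests on. It assumes the predictor columns are fully observed and that missing entries are coded as NaN; the paper does not state which auxiliary variables were used, so the function name and column choices here are hypothetical.

```python
import numpy as np

def regression_impute(X, target_col, predictor_cols):
    """Fill NaNs in one column with predictions from an OLS fit on observed rows.

    Assumes the predictor columns contain no missing values.
    """
    X = X.astype(float).copy()
    missing = np.isnan(X[:, target_col])
    design = np.column_stack([np.ones(len(X)), X[:, predictor_cols]])
    coef, *_ = np.linalg.lstsq(design[~missing], X[~missing, target_col], rcond=None)
    X[missing, target_col] = design[missing] @ coef
    return X, missing

# Synthetic example (not the paper's data): column 1 depends linearly on column 0.
rng = np.random.default_rng(2)
x = rng.normal(size=50)
truth = 1.0 + 2.0 * x + rng.normal(scale=0.1, size=50)
data = np.column_stack([x, truth])
data[:10, 1] = np.nan  # knock out ten values

filled, was_missing = regression_impute(data, target_col=1, predictor_cols=[0])
print("imputed:", filled[:10, 1])
print("actual: ", truth[:10])
```

Because every imputed value sits exactly on the fitted line, this kind of imputation understates the variability of the filled-in variable, which is one route by which it can bias downstream correlations and coefficients.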

What would settle it

The premise can be tested by repeating the LASSO and linear analyses on the subset of counties with no missing data, or with alternative imputation methods. If those reruns yielded materially different sets of important economic predictors for suicide rate, the reported rankings would not hold.
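
One simple form of such a check, sketched under stated assumptions, compares a statistic computed on complete cases with the same statistic after regression imputation. The example uses one predictor, one outcome with values missing at random, and synthetic data; the function name is hypothetical.

```python
import numpy as np

def compare_complete_case_vs_imputed(x, y):
    """Return the x-y correlation on complete cases and after regression imputation."""
    observed = ~np.isnan(y)
    r_complete = float(np.corrcoef(x[observed], y[observed])[0, 1])
    # Fit the imputation line on complete cases only, then fill the gaps with it.
    slope, intercept = np.polyfit(x[observed], y[observed], deg=1)
    y_filled = np.where(observed, y, slope * x + intercept)
    r_imputed = float(np.corrcoef(x, y_filled)[0, 1])
    return r_complete, r_imputed

# Synthetic data (not from the paper) with about 30% of outcomes removed at random.
rng = np.random.default_rng(3)
x = rng.normal(size=80)
y = 0.5 * x + rng.normal(scale=1.0, size=80)
y[rng.random(80) < 0.3] = np.nan

r_cc, r_imp = compare_complete_case_vs_imputed(x, y)
print("complete-case r:", r_cc)
print("imputed r:      ", r_imp)
```

In this one-predictor setup the imputed correlation is never weaker than the complete-case one, since the filled values add spread along the fitted line but no residual scatter. A gap between the two numbers is the kind of imputation effect a sensitivity analysis would need to report.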

Figures

Figures reproduced from arXiv: 2607.00317 by Noah Jackson, Sergey Lapin.

Figure 1: Choropleths for Suicide rate and Poverty rate
Figure 2: PCA plot done for counties in the five western states via only using economic
Figure 3: Clustering for the PCA data using the elbow method
Figure 4: Correlation matrix between all the variables in the data set
Figure 5: Graph between Destructive variables and Economic variables
Figure 6: LASSO Coefficient Plot
Original abstract

In this paper, we look at the relationships that economic variables have with adverse health outcomes in the western counties of Washington, Idaho, Oregon, California, and Nevada, with specific emphasis on how suicide rate relates to such economic variables. Data was first gathered from Census and County Health Rankings for the entire United States (for website use and usefulness for future research), cleaned and regression-imputed, and then various exploratory data analysis methods were used, such as PCA, clustering, correlation gathering, linear fittings, and LASSO. PCA and clustering suggested that counties may group according to broader state-level economic patterns, although political interpretations would require additional electoral data. Correlation Analysis along with LASSO and linear fittings showed us the destructive variables that connected the most with economic variables (in terms of $R^2$ and correlation values seen), the economic variables that are most and least important in predicting suicide rate, and the possible relationships that suicide rate has with these economic variables.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript analyzes relationships between economic variables and destructive health behaviors, with emphasis on suicide rates, across counties in five western U.S. states using data from the Census and County Health Rankings. After cleaning and regression imputation of missing values, the authors apply PCA, clustering, correlation analysis, linear regressions, and LASSO to identify the economic variables most and least associated with suicide rate and to describe possible relationships.

Significance. If the reported associations prove robust after proper validation, the work could offer data-driven insights into socioeconomic correlates of suicide in the western U.S. by ranking predictors via LASSO and linear fits. The use of publicly available sources and multiple exploratory techniques is a modest strength, but the absence of any quantitative results, diagnostics, or validation steps in the abstract prevents evaluation of whether the central claims hold.

major comments (2)
  1. [Methods] Methods section (regression imputation step): the central claim that LASSO and linear fittings identify the most important economic predictors of suicide rate rests on the imputed dataset, yet the manuscript supplies no fraction of missing values, auxiliary variables used in the regression imputation, complete-case versus imputed comparison, or sensitivity analysis. Any systematic bias from the imputation therefore directly affects the reported correlations, R² rankings, and LASSO coefficient paths.
  2. [Abstract] Abstract and Results: the abstract states that correlation analysis, LASSO, and linear fittings revealed the strongest links and possible relationships, but reports none of the actual R² values, selected predictors, coefficient magnitudes, or model diagnostics, rendering the support for the claimed relationships unevaluable.
minor comments (2)
  1. The abstract notes that PCA and clustering suggest counties group by state-level economic patterns but defers political interpretation to future electoral data; a brief statement on the number of counties and years covered would improve context.
  2. Clarify whether the LASSO regularization strength was chosen by cross-validation and report the resulting sparsity level or selected variables.
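
For reference, choosing the regularization strength by cross-validation could look like the sketch below. It uses k-fold cross-validation over a small grid of λ values, with a proximal-gradient (ISTA) solver swapped in for brevity. The grid, the fold count, the solver, and the synthetic data are all assumptions; the manuscript does not say how λ was chosen.

```python
import numpy as np

def lasso_ista(X, y, lam, n_iter=500):
    """Minimize (1/(2n))*||y - Xb||^2 + lam*||b||_1 by proximal gradient (ISTA)."""
    n, p = X.shape
    step = 1.0 / np.linalg.eigvalsh(X.T @ X / n).max()  # 1 / Lipschitz constant
    beta = np.zeros(p)
    for _ in range(n_iter):
        z = beta + step * (X.T @ (y - X @ beta) / n)     # gradient step
        beta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-threshold
    return beta

def cv_choose_lambda(X, y, lambdas, k=5, seed=0):
    """Return the lambda with the lowest mean held-out squared error, plus all errors."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    mean_errors = []
    for lam in lambdas:
        fold_errors = []
        for held_out in folds:
            train = np.setdiff1d(np.arange(len(y)), held_out)
            # Standardize with training statistics only, to avoid leakage.
            mu, sd = X[train].mean(axis=0), X[train].std(axis=0)
            y_mean = y[train].mean()
            beta = lasso_ista((X[train] - mu) / sd, y[train] - y_mean, lam)
            pred = ((X[held_out] - mu) / sd) @ beta + y_mean
            fold_errors.append(np.mean((y[held_out] - pred) ** 2))
        mean_errors.append(np.mean(fold_errors))
    return lambdas[int(np.argmin(mean_errors))], np.array(mean_errors)

# Synthetic data (not from the paper): two informative predictors out of six.
rng = np.random.default_rng(4)
X = rng.normal(size=(150, 6))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=1.0, size=150)

lambdas = [0.01, 0.05, 0.1, 0.5, 1.0, 3.0]
best_lam, cv_errors = cv_choose_lambda(X, y, lambdas)
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
beta_full = lasso_ista(Xs, y - y.mean(), best_lam)
print("chosen lambda:", best_lam)
print("nonzero coefficients:", int(np.sum(beta_full != 0)))
```

Reporting the chosen λ together with the count of nonzero coefficients is what lets a reader judge the sparsity level the referee asks about.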

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our exploratory analysis of economic correlates of suicide rates. We address each major comment below and will revise the manuscript to incorporate the requested details.

  1. Referee: [Methods] Methods section (regression imputation step): the central claim that LASSO and linear fittings identify the most important economic predictors of suicide rate rests on the imputed dataset, yet the manuscript supplies no fraction of missing values, auxiliary variables used in the regression imputation, complete-case versus imputed comparison, or sensitivity analysis. Any systematic bias from the imputation therefore directly affects the reported correlations, R² rankings, and LASSO coefficient paths.

    Authors: We agree that the current manuscript lacks these essential details on the imputation process. In the revised version we will report the fraction of missing values per variable, list the auxiliary predictors used for each regression imputation, present side-by-side results from complete-case versus imputed analyses, and include a brief sensitivity check (e.g., comparing results under different imputation models or exclusion thresholds).

    Revision: yes

  2. Referee: [Abstract] Abstract and Results: the abstract states that correlation analysis, LASSO, and linear fittings revealed the strongest links and possible relationships, but reports none of the actual R² values, selected predictors, coefficient magnitudes, or model diagnostics, rendering the support for the claimed relationships unevaluable.

    Authors: We accept that the abstract currently omits quantitative results. We will revise the abstract to include the key numerical findings: the top LASSO-selected predictors for suicide rate, their relative importance or coefficient signs, the highest R² values from the linear models, and a short statement on model diagnostics, while remaining within the word limit.

    Revision: yes

Circularity Check

0 steps flagged

No significant circularity; purely empirical data analysis on external sources

The paper conducts standard exploratory data analysis (PCA, clustering, correlations, linear fits, LASSO) on public Census and County Health Rankings data after regression imputation. No mathematical derivations, self-referential equations, fitted inputs renamed as predictions, or self-citation chains exist. All steps operate on external data without reducing claims to inputs by construction. This matches the default expectation of no circularity for data-driven papers.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The analysis depends on standard statistical assumptions plus fitted parameters introduced by regression imputation and LASSO regularization; no new entities are postulated.

free parameters (2)
  • LASSO regularization strength
    Controls sparsity and fit in the suicide-rate prediction model; value not specified.
  • Regression imputation coefficients
    Parameters fitted during missing-value imputation from observed data.
axioms (2)
  • [domain assumption] Linear relationships adequately describe the associations between economic variables and suicide rate
    Invoked by the use of linear fittings and LASSO.
  • [domain assumption] Census and County Health Rankings data remain representative after cleaning and imputation
    Foundation for all downstream analyses.

