Post-selection inference for network structure

Eric Auerbach; Jonathan Auerbach; Sidonia McKenzie

arxiv: 2607.00312 · v1 · pith:VOWZFNMJnew · submitted 2026-07-01 · 💰 econ.EM · stat.ME

Post-selection inference for network structure

Eric Auerbach , Jonathan Auerbach , Sidonia McKenzie This is my paper

Pith reviewed 2026-07-02 00:51 UTC · model grok-4.3

classification 💰 econ.EM stat.ME

keywords post-selection inferencenetwork structureconfidence intervalsgroup densitiessimultaneous coverageempirical processessocial networkseconomic networks

0 comments

The pith

Two new confidence intervals remain valid for network group densities even after selecting groups from the data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops two confidence intervals for the density of connections between groups in a network. These intervals remain valid even after the groups have been chosen based on the network structure itself. A sympathetic reader would care because conventional methods can produce misleading inferences when selection depends on the data. The authors show that one interval based on a concentration inequality achieves optimal width asymptotically. Three empirical examples illustrate that selection correction can change conclusions about homophily, trade structures, and market segments.

Core claim

We develop two new confidence intervals that are universally valid post-selection in the sense that they guarantee simultaneous coverage asymptotically over all pairs of groups whose relative sizes do not vanish. Our first interval builds on a strategy of Berk et al. Our second interval is based on a Talagrand-type concentration inequality for empirical processes. Only the second interval achieves the best-possible width asymptotically up to a constant factor.

What carries the argument

The second confidence interval derived from a Talagrand-type concentration inequality for empirical processes, which delivers simultaneous coverage over selected group pairs and near-optimal width.

If this is right

Both intervals are simple to compute and scale to large networks.
The second interval achieves the best possible width asymptotically up to a constant factor.
Some evidence for homophily in social networks and hub-and-spoke trade structures survives the correction.
Evidence for disjoint market segments in worker transition networks does not survive the correction.
The intervals guarantee simultaneous coverage over all qualifying pairs of groups.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Researchers studying community detection in networks may need to reexamine earlier density estimates that ignored selection.
The concentration approach could extend to other network statistics such as betweenness or modularity if similar empirical process bounds hold.
In labor economics, claims about segmented markets might become more conservative once post-selection adjustments are routine.
Finite-sample performance could be checked via Monte Carlo experiments on networks of moderate size where group fractions are held fixed.

Load-bearing premise

The groups are selected from the observed network data in an asymptotic regime where their relative sizes do not vanish.

What would settle it

A simulation on a sequence of growing networks with known true densities where the intervals fail to cover at the nominal rate for fixed-fraction groups would disprove the asymptotic simultaneous coverage.

Figures

Figures reproduced from arXiv: 2607.00312 by Eric Auerbach, Jonathan Auerbach, Sidonia McKenzie.

**Figure 1.** Figure 1: A network with an apparent core-periphery structure. The selected core nodes are [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗

read the original abstract

Researchers often use the density of connections between groups of agents, such as communities, blocs, or markets, to characterize the structure of a social or economic network. In many cases, these groups are selected using the network data, making conventional fixed-group inference procedures potentially invalid. To address this issue, we develop two new confidence intervals that are universally valid post-selection in the sense that they guarantee simultaneous coverage asymptotically over all pairs of groups whose relative sizes do not vanish. Our first interval builds on a strategy of \cite{berk2013valid}. Our second interval is based on a Talagrand-type concentration inequality for empirical processes. Both intervals are simple to compute and scalable to large networks, but a key technical contribution of our paper is show that only the second interval achieves the best-possible width asymptotically up to a constant factor. Three empirical illustrations show that accounting for selection can matter in practice. Some evidence for homophily in a social network and a hub-and-spoke structure in a trade network survives our correction, while evidence for disjoint market segments in a worker transition network does not.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives two new post-selection intervals for network densities with simultaneous asymptotic coverage over large groups, and shows the Talagrand version is optimal in width up to a constant factor.

read the letter

The main takeaway is that this paper supplies two new confidence intervals for the density between data-selected groups in a network. Both are asymptotically valid simultaneously over all pairs whose relative sizes do not vanish. The first adapts the Berk et al. 2013 method; the second uses a Talagrand-type bound and is proved to achieve the best possible width asymptotically up to a constant factor. Both are simple to compute and scale to large networks.

What the work does well is tackle a practical problem that shows up whenever researchers form communities, markets, or blocs from the observed links and then want to report densities. The three empirical illustrations are useful: they show the correction can overturn a finding (worker transitions) while leaving others intact (homophily and hub-spoke trade). That kind of check helps readers see when the adjustment matters.

The soft spots are straightforward. Coverage holds only in the asymptotic regime where selected groups are not vanishingly small, so the intervals do not apply to tiny subgroups. The optimality claim is up to a constant factor rather than exact. The paper rests on cited concentration results rather than deriving everything from scratch, which is normal but means the novelty is in the tailoring and the comparison of the two intervals. Without the full proofs in front of me I cannot check the technical steps, but the abstract states the claims cleanly and without internal contradiction.

This is for applied network researchers in economics and statistics who already worry about post-selection. It is worth sending to a serious referee because it supplies concrete, usable tools for a known issue and includes empirical checks that let readers judge practical impact.

Referee Report

0 major / 3 minor

Summary. The manuscript develops two new confidence intervals for inference on network densities between groups that are selected from the observed network data itself. The intervals are constructed to deliver asymptotic simultaneous coverage over all pairs of groups whose relative sizes do not vanish. The first interval adapts the Berk et al. (2013) approach; the second uses a Talagrand-type concentration inequality for empirical processes. The paper establishes that only the second interval attains the best-possible asymptotic width up to a constant factor. Three empirical applications (social network, trade network, worker transition network) illustrate that selection correction can alter substantive conclusions about homophily, hub-and-spoke structure, and market segmentation.

Significance. If the coverage and optimality claims hold, the paper supplies a practical, scalable tool for a pervasive problem in empirical network analysis: valid inference after data-driven group selection. The explicit optimality result for the Talagrand-based interval (up to a constant factor) is a clear technical contribution that distinguishes the two proposals. The empirical illustrations provide concrete evidence that the correction matters in practice. These features would make the work a useful reference for applied researchers working with community detection, bloc identification, or market segmentation in networks.

minor comments (3)

Abstract: the phrase 'three empirical illustrations' would benefit from naming the three networks (social, trade, worker transition) to give readers immediate context without needing to reach the main text.
The manuscript relies on the external result of Berk et al. (2013) for the first interval and on Talagrand-type bounds for the second; a short self-contained sketch of how these are applied to the network-density functional would improve accessibility even if the full proofs remain in an appendix.
Notation: ensure that the definition of 'relative size' (used in the non-vanishing condition) is stated once in a single display equation or definition box so that readers can locate it without searching multiple sections.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary, significance assessment, and recommendation of minor revision. No specific major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The derivation builds new post-selection confidence intervals by adapting the strategy from the external citation Berk et al. (2013) for the first interval and applying a Talagrand-type concentration inequality for the second. The key technical result—that only the second achieves asymptotically optimal width up to a constant factor—is established via the external inequality and asymptotic analysis under the stated regime (relative group sizes not vanishing). No step reduces a claimed prediction or result to a fitted parameter or self-defined quantity by construction, and the cited results are independent external references rather than self-citations. The paper is self-contained against its external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claims rest on standard asymptotic assumptions for empirical processes and network sampling; no free parameters or invented entities are introduced in the abstract.

axioms (1)

domain assumption Asymptotic regime in which selected group relative sizes do not vanish
Required for the simultaneous coverage guarantee over all pairs of groups.

pith-pipeline@v0.9.1-grok · 5717 in / 1149 out tokens · 22688 ms · 2026-07-02T00:51:08.164630+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

28 extracted references

[1]

The following Lemma 4 relates || · ||† to || · ||∞→1 and || · ||♠

≤1.7823. The following Lemma 4 relates || · ||† to || · ||∞→1 and || · ||♠. The upper bound in (A.3) is Theorem 3 of Gittens and Tropp (2009). The lower bound in (A.3) builds on an argument given in the third paragraph of Section 4.2 of Rudelson and Vershynin (2007). In this result, our contribution is to replace their use of Khintchine’s inequality (whic...

2009
[2]

Proof.The inequalityE[||Z|| ∞→1]≤2E[||Z|| †] is Theorem 3 of Gittens and Tropp (2009)

≤1.7823. Proof.The inequalityE[||Z|| ∞→1]≤2E[||Z|| †] is Theorem 3 of Gittens and Tropp (2009). To demonstrate inequality (2 KG(∞))−1E[||Z|| †]≤E[||Z|| ∞→1], we actually show the stronger result that for any N1 ×N 2 matrix X, ||X|| † ≤ 2KG(∞)||X|| ∞→1. To show the stronger result, we first show that ||X||1,2 ≤K G(∞)||X|| ∞→1. Lemma 3 above implies that P ...

2009
[3]

≤1.7823. Proof. Fix x > 0 and c∈ (0, 1/2]. We demonstrate the result in three steps. In the first step, we represent ||ϵ||□;c/(2B) as the maximum of an empirical process indexed by elements of Gc and apply Lemma 8 above to bound ||ϵ||□;c/(2B). The bounds depend on the expected value of the maximum of the empirical process. We bound this expectation in the...

2005
[4]

∈argmax (g1,g2)∈Gc P i∈[N1],j∈[N2] ϵijgi,1gj,2 , using an arbitrary deterministic tie-breaking rule, and letm ∗ t =P i∈[Nt] gi,t fort∈ {1,2}. We start with the lower bound ˆθ(g∗ 1, g∗ 2)−θ(g ∗ 1, g∗ 2) = 1 m∗ 1m∗ 2 X i∈[N1],j∈[N2] ϵijg∗ i,1g∗ j,2 ≥ 1 2√m∗ 1m∗ 2  p N1 min   1 N1 X i∈[N1] (ui)+, 1 N1 X i∈[N1] (ui)−   + p N2 min   1 N2 X j∈[N2] (vj)+...
[5]

> 0 with probability approaching one, since ˆσ(g∗ 1, g∗ 2)/σ0 →p 1 and σ0 > 0. Combin- ing these facts with (A.22) and noting that 2 √ 2π = √ 8π, it follows that, with probability 74 approaching one along any asymptotic sequence of models inF 0 withN 1, N2 → ∞, ˆθ(g∗ 1, g∗ 2)−θ(g ∗ 1, g∗ 2) ≥(1−s/2) √N1 + √N2√ 8π ˆσ(g∗ 1, g∗ 2)√m∗ 1m∗ 2 >(1−s) √N1 + √N2√ ...
[6]

Since I(g∗ 1, g∗ 2; α) = ˆθ(g∗ 1, g∗

> 0 and the final inequality is (A.21). Since I(g∗ 1, g∗ 2; α) = ˆθ(g∗ 1, g∗
[7]

±[K×ˆσ(g ∗ 1, g∗ 2)]/ √m∗ 1m∗ 2, the event {θ(g∗ 1, g∗
[8]

Finally, since (g∗ 1, g∗

∈I (g∗ 1, g∗ 2; α)} implies |ˆθ(g∗ 1, g∗ 2)−θ(g ∗ 1, g∗ 2)| ≤Kˆσ(g∗ 1, g∗ 2)/√m∗ 1m∗ 2, and therefore lim F∈F 0:N 1,N2→∞ P(θ(g ∗ 1, g∗ 2)∈I(g ∗ 1, g∗ 2;α)) = 0. Finally, since (g∗ 1, g∗
[9]

∈ G c implies ∩(g1,g2)∈Gc{θ(g1, g2) ∈I (g1, g2; α)} ⊆ {θ (g∗ 1, g∗
[10]

Step 1:In this step we show the inequality (A.22)

∈I (g∗ 1, g∗ 2; α)}, and sinceF 0 ⊆ F, lim inf F∈F:N 1,N2→∞ P ∩(g1,g2)∈Gc {θ(g1, g2)∈I(g 1, g2;α)} = 0, which is the conclusion of the proposition. Step 1:In this step we show the inequality (A.22). By definition of ( g∗ 1, g∗ 2), we have that P i∈[N1],j∈[N2] ϵijg∗ i,1g∗ j,2 ≥ P i∈[N1],j∈[N2] ϵijg′ i,1g′ j,2 for any (g′ 1, g′
[11]

Choosing g′ j,2 = 1 for all j∈ [N2] and g′ i,1 = 1{ P j∈[N2] ϵij ≥ 0}1{ P i∈[N1] 1{ P j∈[N2] ϵij ≥ 0} ≥ N1/2} + 1{ P j∈[N2] ϵij ≤ 0}1{ P i∈[N1] 1{ P j∈[N2] ϵij ≥ 0}< N 1/2}

∈ G c. Choosing g′ j,2 = 1 for all j∈ [N2] and g′ i,1 = 1{ P j∈[N2] ϵij ≥ 0}1{ P i∈[N1] 1{ P j∈[N2] ϵij ≥ 0} ≥ N1/2} + 1{ P j∈[N2] ϵij ≤ 0}1{ P i∈[N1] 1{ P j∈[N2] ϵij ≥ 0}< N 1/2}. By construction, the entries of g′ 1 and g′ 2 take values in {0, 1},P j∈[N2] g′ j,2 = N2 andP i∈[N1] g′ i,1 ≥N 1/2, and so 75 (g′ 1, g′ 2)∈ G c for anyc∈[0,1/2]. Under this cho...
[12]

1− ˆσ(g∗ 1 ,g∗ 2 )−σ0 σ0 1+ ˆσ(g∗ 1 ,g∗ 2 )−σ0 σ0 # , C := E ui σ0 +

we have that X i∈[N1],j∈[N2] ϵijg′ i,1g′ j,2 = X i∈[N1]  X j∈[N2] ϵij   + 1{ X i∈[N1] 1{ X j∈[N2] ϵij ≥0} ≥N 1/2} + X i∈[N1]  X j∈[N2] ϵij   − 1{ X i∈[N1] 1{ X j∈[N2] ϵij ≥0}< N 1/2} ≥min  X i∈[N1]  X j∈[N2] ϵij   + , X i∈[N1]  X j∈[N2] ϵij   −   . By the same logic, choosingg′ i,1 = 1 for all i∈ [N1] and g′ j,2 = 1{ P i∈[N1] ϵij ≥ 0}1...
[13]

Since σ(g∗ 1, g∗

≤ 2 and so P     1 N1 X i∈[N1] ui σ0 + −E ui σ0 +   " 1− ˆσ(g∗ 1 ,g∗ 2)−σ0 σ0 1 + ˆσ(g∗ 1 ,g∗ 2)−σ0 σ0 # > t/3   ≤P   1 N1 X i∈[N1] ui σ0 + −E ui σ0 + > t/6   +P(E C N). Since σ(g∗ 1, g∗
[14]

Since E[(u i)+]≥E[u i] = 0 and E (ui)2 + ≤E[(u i)2] = σ2 0, it follows that Var((ui)+) ≤σ 2 0, Var ui σ0 + ≤ 1, and so Var 1 N1 P i∈[N1] ui σ0 + ≤ 1 N1

= σ0 under F0 and Assumption 3(i) holds uniformly over Gc, we have P (EC N) → 0. Since E[(u i)+]≥E[u i] = 0 and E (ui)2 + ≤E[(u i)2] = σ2 0, it follows that Var((ui)+) ≤σ 2 0, Var ui σ0 + ≤ 1, and so Var 1 N1 P i∈[N1] ui σ0 + ≤ 1 N1 . It follows from Chebyshev’s inequality that for anyt >0, P   1 N1 X i∈[N1] ui σ0 + −E ui σ0 + ≥t/6   ≤ 36 t2N1 which c...

2009
[15]

Because Assumption 3(i) holds uniformly overG c, it follows that ∆N = ˆσ(g∗ 1, g∗ 2)−σ(g ∗ 1, g∗ 2) σ(g ∗ 1, g∗

:= q 1 m∗ 1m∗ 2 P i∈[N1],j∈[N2] σ2 ijg∗ i,1g∗ j,2 = σ0. Because Assumption 3(i) holds uniformly overG c, it follows that ∆N = ˆσ(g∗ 1, g∗ 2)−σ(g ∗ 1, g∗ 2) σ(g ∗ 1, g∗
[16]

Now define the event EN := |∆N | ≤ 1 2

=o p(1). Now define the event EN := |∆N | ≤ 1 2 . Since ∆ N = op(1), we have P (EC N) → 0. On the eventE N, ∆N 1+∆N ≤2|∆ N |, and so P E ui σ0 + ∆N 1 + ∆N > t/3 ≤P 2E ui σ0 + |∆N |> t/3 +P(E C N). Since E ui σ0 + is eventually bounded and ∆ N = op(1), the first term on the right-hand side converges to zero. The second term also converges to zero. This dem...
[17]

≤ 1.7823 denote Krivine’s upper bound for KG(∞) (Lemma 3). Since KG(∞) ≤ ¯KG, the lower-tail inequality (A.12) of Lemma 10 re- mains valid with KG(∞) replaced by ¯KG, because (18 ¯KG)−1E[||ϵ|| †]≤ (18KG(∞))−1E[||ϵ|| †] so the event on the left-hand side of (A.12) only shrinks; likewise the inequality ||X||1,2 ≤ KG(∞)||X|| ∞→1 from the proof of Lemma 4 hol...
[18]

By construction, |ˆθ(g∗ 1, g∗ 2)−θ(g∗ 1, g∗ 2)| = ||ϵ||□;c/(m∗ 1m∗ 2)

∈argmax (g1,g2)∈Gc P i∈[N1],j∈[N2] ϵijgi,1gj,2 , using an arbitrary deterministic tie- breaking rule, and let m∗ t :=P i∈[Nt] g∗ i,t for t∈ { 1, 2}. By construction, |ˆθ(g∗ 1, g∗ 2)−θ(g∗ 1, g∗ 2)| = ||ϵ||□;c/(m∗ 1m∗ 2). Since (g ∗ 1, g∗ 2)∈ G c, P ∩(g1,g2)∈Gc {θ(g1, g2)∈I(g 1, g2;α)} ≤P(θ(g ∗ 1, g∗ 2)∈I(g ∗ 1, g∗ 2;α)).(A.25) On the event {θ(g∗ 1, g∗
[19]

∈I (g∗ 1, g∗ 2; α)}, both θ(g∗ 1, g∗
[20]

belong to I(g∗ 1, g∗ 2; α) — the latter by hypothesis — and therefore |ˆθ(g∗ 1, g∗
[21]

−θ(g∗ 1, g∗ 2)| ≤ |I (g∗ 1, g∗ 2; α)|. For all (N1, N2) with min(N1, N2) large enough that the width hypothesis (4.3) is in force, it follows that, almost surely, {θ(g∗ 1, g∗ 2)∈I(g ∗ 1, g∗ 2;α)} ⊆ |ˆθ(g∗ 1, g∗ 2)−θ(g ∗ 1, g∗ 2)| ≤(1−δ ′)c ∗(α) ˆ¯τ m∗ 1m∗ 2 . (If ˆ¯τ= 0, the convention in (4.3) forces |I(g∗ 1, g∗ 2; α)| = 0, in which case |ˆθ−θ| (g∗ 1, g∗
[22]

= 0 on the displayed event and the containment holds trivially.) Hence P(θ(g ∗ 1, g∗ 2)∈I(g ∗ 1, g∗ 2;α))≤P |ˆθ(g∗ 1, g∗ 2)−θ(g ∗ 1, g∗ 2)| ≤(1−δ ′)c ∗(α) ˆ¯τ m∗ 1m∗ 2 .(A.26) Let r be the sequence of real numbers in Assumption 3(ii) and set r′ := r¯τ. Decomposing the right-hand side of (A.26) according to whether ˆ¯τ≤¯τ+r ′ gives P |ˆθ−θ|(g ∗ 1, g∗ 2)≤(1...
[23]

residual

−θ (g∗ 1, g∗ 2)| = ||ϵ||□;c/(m∗ 1m∗ 2), the first inequality is becauseF∈ F β,δ implies||σ|| F ≤C −1 β,δ E[||ϵ|| †], the second equality rearranges terms, and the second inequality is the lower-tail inequality (A.12) of Lemma 10 (with ¯KG in place of KG(∞), and noting that 2V 2 = 2 (||σ||2 F + 4BE[||ϵ|| †] +B||σ|| F )) applied withx=x β,δ. 84 Since r→ 0, ...

2015
[24]

Define ˜Y:=Y /Bso that the entries of ˜Yare bounded by 1 in absolute value. 89
[25]

Let ˜Y= P i∈[Nmin] siuivT i be the singular value decomposition of ˜Y
[26]

DefineS:={i:s i ≥(2 +η) √Nmax}andW:= P i∈S siuivT i
[27]

Define ˆµij :=B(W ij1{|W ij| ≤1}+1{W ij >1} −1{W ij <−1})
[28]

weak” version that restricts the full matrices µ or σ, and a “strong

Define ˆ¯τ:= 1.01||Y−ˆµ||†+0.25||Y−ˆµ||F , ˆV := p ||Y−ˆµ|| 2 F +B||Y−ˆµ|| F + 4B||Y−ˆµ|| †, and ˆσ(g1, g2) := ||(Y−ˆµ )g1g2||F /√m1m2 for (g1, g2) ∈ G c where the matrix ˆµhas ˆµij as itsijth entry and the matrix (Y−ˆµ)g 1g2 has (Yij −ˆµij)gi,1gj,2 as itsijth entry. B.2.2 Assumptions We make four additional assumptions about the class of random graph mod...

2020

[1] [1]

The following Lemma 4 relates || · ||† to || · ||∞→1 and || · ||♠

≤1.7823. The following Lemma 4 relates || · ||† to || · ||∞→1 and || · ||♠. The upper bound in (A.3) is Theorem 3 of Gittens and Tropp (2009). The lower bound in (A.3) builds on an argument given in the third paragraph of Section 4.2 of Rudelson and Vershynin (2007). In this result, our contribution is to replace their use of Khintchine’s inequality (whic...

2009

[2] [2]

Proof.The inequalityE[||Z|| ∞→1]≤2E[||Z|| †] is Theorem 3 of Gittens and Tropp (2009)

≤1.7823. Proof.The inequalityE[||Z|| ∞→1]≤2E[||Z|| †] is Theorem 3 of Gittens and Tropp (2009). To demonstrate inequality (2 KG(∞))−1E[||Z|| †]≤E[||Z|| ∞→1], we actually show the stronger result that for any N1 ×N 2 matrix X, ||X|| † ≤ 2KG(∞)||X|| ∞→1. To show the stronger result, we first show that ||X||1,2 ≤K G(∞)||X|| ∞→1. Lemma 3 above implies that P ...

2009

[3] [3]

≤1.7823. Proof. Fix x > 0 and c∈ (0, 1/2]. We demonstrate the result in three steps. In the first step, we represent ||ϵ||□;c/(2B) as the maximum of an empirical process indexed by elements of Gc and apply Lemma 8 above to bound ||ϵ||□;c/(2B). The bounds depend on the expected value of the maximum of the empirical process. We bound this expectation in the...

2005

[4] [4]

∈argmax (g1,g2)∈Gc P i∈[N1],j∈[N2] ϵijgi,1gj,2 , using an arbitrary deterministic tie-breaking rule, and letm ∗ t =P i∈[Nt] gi,t fort∈ {1,2}. We start with the lower bound ˆθ(g∗ 1, g∗ 2)−θ(g ∗ 1, g∗ 2) = 1 m∗ 1m∗ 2 X i∈[N1],j∈[N2] ϵijg∗ i,1g∗ j,2 ≥ 1 2√m∗ 1m∗ 2  p N1 min   1 N1 X i∈[N1] (ui)+, 1 N1 X i∈[N1] (ui)−   + p N2 min   1 N2 X j∈[N2] (vj)+...

[5] [5]

> 0 with probability approaching one, since ˆσ(g∗ 1, g∗ 2)/σ0 →p 1 and σ0 > 0. Combin- ing these facts with (A.22) and noting that 2 √ 2π = √ 8π, it follows that, with probability 74 approaching one along any asymptotic sequence of models inF 0 withN 1, N2 → ∞, ˆθ(g∗ 1, g∗ 2)−θ(g ∗ 1, g∗ 2) ≥(1−s/2) √N1 + √N2√ 8π ˆσ(g∗ 1, g∗ 2)√m∗ 1m∗ 2 >(1−s) √N1 + √N2√ ...

[6] [6]

Since I(g∗ 1, g∗ 2; α) = ˆθ(g∗ 1, g∗

> 0 and the final inequality is (A.21). Since I(g∗ 1, g∗ 2; α) = ˆθ(g∗ 1, g∗

[7] [7]

±[K×ˆσ(g ∗ 1, g∗ 2)]/ √m∗ 1m∗ 2, the event {θ(g∗ 1, g∗

[8] [8]

Finally, since (g∗ 1, g∗

∈I (g∗ 1, g∗ 2; α)} implies |ˆθ(g∗ 1, g∗ 2)−θ(g ∗ 1, g∗ 2)| ≤Kˆσ(g∗ 1, g∗ 2)/√m∗ 1m∗ 2, and therefore lim F∈F 0:N 1,N2→∞ P(θ(g ∗ 1, g∗ 2)∈I(g ∗ 1, g∗ 2;α)) = 0. Finally, since (g∗ 1, g∗

[9] [9]

∈ G c implies ∩(g1,g2)∈Gc{θ(g1, g2) ∈I (g1, g2; α)} ⊆ {θ (g∗ 1, g∗

[10] [10]

Step 1:In this step we show the inequality (A.22)

∈I (g∗ 1, g∗ 2; α)}, and sinceF 0 ⊆ F, lim inf F∈F:N 1,N2→∞ P ∩(g1,g2)∈Gc {θ(g1, g2)∈I(g 1, g2;α)} = 0, which is the conclusion of the proposition. Step 1:In this step we show the inequality (A.22). By definition of ( g∗ 1, g∗ 2), we have that P i∈[N1],j∈[N2] ϵijg∗ i,1g∗ j,2 ≥ P i∈[N1],j∈[N2] ϵijg′ i,1g′ j,2 for any (g′ 1, g′

[11] [11]

Choosing g′ j,2 = 1 for all j∈ [N2] and g′ i,1 = 1{ P j∈[N2] ϵij ≥ 0}1{ P i∈[N1] 1{ P j∈[N2] ϵij ≥ 0} ≥ N1/2} + 1{ P j∈[N2] ϵij ≤ 0}1{ P i∈[N1] 1{ P j∈[N2] ϵij ≥ 0}< N 1/2}

∈ G c. Choosing g′ j,2 = 1 for all j∈ [N2] and g′ i,1 = 1{ P j∈[N2] ϵij ≥ 0}1{ P i∈[N1] 1{ P j∈[N2] ϵij ≥ 0} ≥ N1/2} + 1{ P j∈[N2] ϵij ≤ 0}1{ P i∈[N1] 1{ P j∈[N2] ϵij ≥ 0}< N 1/2}. By construction, the entries of g′ 1 and g′ 2 take values in {0, 1},P j∈[N2] g′ j,2 = N2 andP i∈[N1] g′ i,1 ≥N 1/2, and so 75 (g′ 1, g′ 2)∈ G c for anyc∈[0,1/2]. Under this cho...

[12] [12]

1− ˆσ(g∗ 1 ,g∗ 2 )−σ0 σ0 1+ ˆσ(g∗ 1 ,g∗ 2 )−σ0 σ0 # , C := E ui σ0 +

we have that X i∈[N1],j∈[N2] ϵijg′ i,1g′ j,2 = X i∈[N1]  X j∈[N2] ϵij   + 1{ X i∈[N1] 1{ X j∈[N2] ϵij ≥0} ≥N 1/2} + X i∈[N1]  X j∈[N2] ϵij   − 1{ X i∈[N1] 1{ X j∈[N2] ϵij ≥0}< N 1/2} ≥min  X i∈[N1]  X j∈[N2] ϵij   + , X i∈[N1]  X j∈[N2] ϵij   −   . By the same logic, choosingg′ i,1 = 1 for all i∈ [N1] and g′ j,2 = 1{ P i∈[N1] ϵij ≥ 0}1...

[13] [13]

Since σ(g∗ 1, g∗

≤ 2 and so P     1 N1 X i∈[N1] ui σ0 + −E ui σ0 +   " 1− ˆσ(g∗ 1 ,g∗ 2)−σ0 σ0 1 + ˆσ(g∗ 1 ,g∗ 2)−σ0 σ0 # > t/3   ≤P   1 N1 X i∈[N1] ui σ0 + −E ui σ0 + > t/6   +P(E C N). Since σ(g∗ 1, g∗

[14] [14]

Since E[(u i)+]≥E[u i] = 0 and E (ui)2 + ≤E[(u i)2] = σ2 0, it follows that Var((ui)+) ≤σ 2 0, Var ui σ0 + ≤ 1, and so Var 1 N1 P i∈[N1] ui σ0 + ≤ 1 N1

= σ0 under F0 and Assumption 3(i) holds uniformly over Gc, we have P (EC N) → 0. Since E[(u i)+]≥E[u i] = 0 and E (ui)2 + ≤E[(u i)2] = σ2 0, it follows that Var((ui)+) ≤σ 2 0, Var ui σ0 + ≤ 1, and so Var 1 N1 P i∈[N1] ui σ0 + ≤ 1 N1 . It follows from Chebyshev’s inequality that for anyt >0, P   1 N1 X i∈[N1] ui σ0 + −E ui σ0 + ≥t/6   ≤ 36 t2N1 which c...

2009

[15] [15]

Because Assumption 3(i) holds uniformly overG c, it follows that ∆N = ˆσ(g∗ 1, g∗ 2)−σ(g ∗ 1, g∗ 2) σ(g ∗ 1, g∗

:= q 1 m∗ 1m∗ 2 P i∈[N1],j∈[N2] σ2 ijg∗ i,1g∗ j,2 = σ0. Because Assumption 3(i) holds uniformly overG c, it follows that ∆N = ˆσ(g∗ 1, g∗ 2)−σ(g ∗ 1, g∗ 2) σ(g ∗ 1, g∗

[16] [16]

Now define the event EN := |∆N | ≤ 1 2

=o p(1). Now define the event EN := |∆N | ≤ 1 2 . Since ∆ N = op(1), we have P (EC N) → 0. On the eventE N, ∆N 1+∆N ≤2|∆ N |, and so P E ui σ0 + ∆N 1 + ∆N > t/3 ≤P 2E ui σ0 + |∆N |> t/3 +P(E C N). Since E ui σ0 + is eventually bounded and ∆ N = op(1), the first term on the right-hand side converges to zero. The second term also converges to zero. This dem...

[17] [17]

≤ 1.7823 denote Krivine’s upper bound for KG(∞) (Lemma 3). Since KG(∞) ≤ ¯KG, the lower-tail inequality (A.12) of Lemma 10 re- mains valid with KG(∞) replaced by ¯KG, because (18 ¯KG)−1E[||ϵ|| †]≤ (18KG(∞))−1E[||ϵ|| †] so the event on the left-hand side of (A.12) only shrinks; likewise the inequality ||X||1,2 ≤ KG(∞)||X|| ∞→1 from the proof of Lemma 4 hol...

[18] [18]

By construction, |ˆθ(g∗ 1, g∗ 2)−θ(g∗ 1, g∗ 2)| = ||ϵ||□;c/(m∗ 1m∗ 2)

∈argmax (g1,g2)∈Gc P i∈[N1],j∈[N2] ϵijgi,1gj,2 , using an arbitrary deterministic tie- breaking rule, and let m∗ t :=P i∈[Nt] g∗ i,t for t∈ { 1, 2}. By construction, |ˆθ(g∗ 1, g∗ 2)−θ(g∗ 1, g∗ 2)| = ||ϵ||□;c/(m∗ 1m∗ 2). Since (g ∗ 1, g∗ 2)∈ G c, P ∩(g1,g2)∈Gc {θ(g1, g2)∈I(g 1, g2;α)} ≤P(θ(g ∗ 1, g∗ 2)∈I(g ∗ 1, g∗ 2;α)).(A.25) On the event {θ(g∗ 1, g∗

[19] [19]

∈I (g∗ 1, g∗ 2; α)}, both θ(g∗ 1, g∗

[20] [20]

belong to I(g∗ 1, g∗ 2; α) — the latter by hypothesis — and therefore |ˆθ(g∗ 1, g∗

[21] [21]

−θ(g∗ 1, g∗ 2)| ≤ |I (g∗ 1, g∗ 2; α)|. For all (N1, N2) with min(N1, N2) large enough that the width hypothesis (4.3) is in force, it follows that, almost surely, {θ(g∗ 1, g∗ 2)∈I(g ∗ 1, g∗ 2;α)} ⊆ |ˆθ(g∗ 1, g∗ 2)−θ(g ∗ 1, g∗ 2)| ≤(1−δ ′)c ∗(α) ˆ¯τ m∗ 1m∗ 2 . (If ˆ¯τ= 0, the convention in (4.3) forces |I(g∗ 1, g∗ 2; α)| = 0, in which case |ˆθ−θ| (g∗ 1, g∗

[22] [22]

= 0 on the displayed event and the containment holds trivially.) Hence P(θ(g ∗ 1, g∗ 2)∈I(g ∗ 1, g∗ 2;α))≤P |ˆθ(g∗ 1, g∗ 2)−θ(g ∗ 1, g∗ 2)| ≤(1−δ ′)c ∗(α) ˆ¯τ m∗ 1m∗ 2 .(A.26) Let r be the sequence of real numbers in Assumption 3(ii) and set r′ := r¯τ. Decomposing the right-hand side of (A.26) according to whether ˆ¯τ≤¯τ+r ′ gives P |ˆθ−θ|(g ∗ 1, g∗ 2)≤(1...

[23] [23]

residual

−θ (g∗ 1, g∗ 2)| = ||ϵ||□;c/(m∗ 1m∗ 2), the first inequality is becauseF∈ F β,δ implies||σ|| F ≤C −1 β,δ E[||ϵ|| †], the second equality rearranges terms, and the second inequality is the lower-tail inequality (A.12) of Lemma 10 (with ¯KG in place of KG(∞), and noting that 2V 2 = 2 (||σ||2 F + 4BE[||ϵ|| †] +B||σ|| F )) applied withx=x β,δ. 84 Since r→ 0, ...

2015

[24] [24]

Define ˜Y:=Y /Bso that the entries of ˜Yare bounded by 1 in absolute value. 89

[25] [25]

Let ˜Y= P i∈[Nmin] siuivT i be the singular value decomposition of ˜Y

[26] [26]

DefineS:={i:s i ≥(2 +η) √Nmax}andW:= P i∈S siuivT i

[27] [27]

Define ˆµij :=B(W ij1{|W ij| ≤1}+1{W ij >1} −1{W ij <−1})

[28] [28]

weak” version that restricts the full matrices µ or σ, and a “strong

Define ˆ¯τ:= 1.01||Y−ˆµ||†+0.25||Y−ˆµ||F , ˆV := p ||Y−ˆµ|| 2 F +B||Y−ˆµ|| F + 4B||Y−ˆµ|| †, and ˆσ(g1, g2) := ||(Y−ˆµ )g1g2||F /√m1m2 for (g1, g2) ∈ G c where the matrix ˆµhas ˆµij as itsijth entry and the matrix (Y−ˆµ)g 1g2 has (Yij −ˆµij)gi,1gj,2 as itsijth entry. B.2.2 Assumptions We make four additional assumptions about the class of random graph mod...

2020