Missingness in ILD: diagnostics and sensitivity routes

Why missingness matters in intensive longitudinal data

In EMA and diary studies, missing responses are often non-ignorable in substance even when analysts assume missing at random (MAR) for estimation: burden, symptom severity, context, or device issues can co-determine both whether a prompt is answered and the outcome. tidyILD does not replace dedicated missing-data software; it gives structured diagnostics, person-level adherence views, time-oriented summaries, and hooks to IPW-based sensitivity workflows already in the package.

MNAR (missing not at random) means missingness depends on unobserved values or latent states. No routine plot proves MAR vs MNAR. Use multiple sensitivity routes and transparent reporting.
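
To make the labels concrete, here is a small base-R simulation, offered as a sketch rather than anything tidyILD provides. The variable names (severity, mood), the response probabilities, and the helper intercept_under() are all assumptions chosen for illustration. It fits the same regression to answered prompts under an MAR mechanism (response depends only on an observed covariate) and an MNAR mechanism (response depends on the outcome itself).

```r
# Illustrative simulation; every name here is hypothetical and not part of tidyILD.
set.seed(1)
n <- 20000
severity <- rnorm(n)                 # observed covariate
mood <- 0.8 * severity + rnorm(n)    # outcome; true intercept is 0

# Assumed probabilities that a prompt is answered
p_mar  <- plogis(1 - severity)       # depends only on the observed covariate
p_mnar <- plogis(1 - mood)           # depends on the (possibly unobserved) outcome

# Fit mood ~ severity on answered prompts only and return the intercept
intercept_under <- function(p) {
  answered <- rbinom(n, 1, p) == 1
  d <- data.frame(mood = mood, severity = severity)[answered, ]
  coef(lm(mood ~ severity, data = d))[["(Intercept)"]]
}

res <- c(mar = intercept_under(p_mar), mnar = intercept_under(p_mnar))
round(res, 2)
```

In this setup the MAR intercept should sit near zero while the MNAR intercept is pulled downward, yet the analyst sees only answered prompts in both cases and cannot tell the two mechanisms apart from the observed data alone.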

Types of missingness (useful labels)

Unit non-response / attrition: a person stops contributing (often modeled as monotone dropout on the outcome).
Intermittent missingness: gaps with later observed values again; common in EMA.
Item missingness: some variables missing while others are observed on the same prompt.

The ordinal occasion index .ild_seq (from ild_prepare()) is the default backbone for “wave” summaries. It orders occasions within a person and does not imply equal calendar spacing; see vignette("ild-decomposition-and-spacing", package = "tidyILD") when timing is irregular.

Descriptive profiling: `ild_missing_pattern()` and heatmaps

ild_missing_pattern() tabulates NA rates by variable and by person, and builds a person × occasion heatmap (sequence index on the x-axis). Pass outcome to enrich by_id with compliance metrics from ild_missing_compliance() (see below).

library(tidyILD)
set.seed(11)
d <- ild_simulate(n_id = 25, n_obs_per = 12, seed = 11)
d$stress <- rnorm(nrow(d))
d$mood <- d$y
miss_i <- sample(nrow(d), 45)
d$mood[miss_i] <- NA
x <- ild_prepare(d, id = "id", time = "time")
mp <- ild_missing_pattern(x, vars = c("mood", "stress"), outcome = "mood")
mp$summary
#> # A tibble: 2 × 4
#>   var    n_obs  n_na pct_na
#>   <chr>  <int> <int>  <dbl>
#> 1 mood     300    45     15
#> 2 stress   300     0      0
head(mp$by_id, 3)
#> # A tibble: 3 × 10
#>   .ild_id mood_n_obs mood_n_na stress_n_obs stress_n_na n_rows n_obs_outcome
#>     <int>      <int>     <int>        <int>       <int>  <int>         <int>
#> 1       1         10         2           12           0     12            10
#> 2       2          9         3           12           0     12             9
#> 3       3         11         1           12           0     12            11
#> # ℹ 3 more variables: pct_nonmissing_outcome <dbl>, longest_run_observed <int>,
#> #   monotone_missing <lgl>

Plot the same view with ild_plot(x, type = "missingness", var = "mood") (see ?ild_plot).

Person-level compliance: `ild_missing_compliance()`

tidyILD::ild_missing_compliance() returns, per .ild_id:

pct_nonmissing_outcome: percentage of rows with an observed outcome.
longest_run_observed: longest streak of observed values in time order.
monotone_missing: TRUE if, after the first missing outcome, all later values are missing (NA if there is no missingness for that person).
Optional expected_occasions: rough adherence vs planned N (pct_of_expected, meets_expected_rows).

cm <- ild_missing_compliance(x, outcome = "mood", expected_occasions = 12L)
summary(cm$pct_nonmissing_outcome)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>   58.33   83.33   83.33   85.00   91.67  100.00
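
The per-person definitions above can be reproduced by hand on a single outcome vector. The helper below, compliance_one(), is hypothetical and written only from the descriptions in this section; it is not tidyILD's implementation and it leaves out the expected_occasions columns.

```r
# Hypothetical helper built from the definitions in the text; not tidyILD code.
# y must already be sorted in time order for one person.
compliance_one <- function(y) {
  obs <- !is.na(y)
  runs <- rle(obs)
  # Longest streak of consecutive observed values (0 if nothing was observed)
  longest <- if (any(obs)) max(runs$lengths[runs$values]) else 0L
  # NA when nothing is missing; otherwise TRUE only if every value from the
  # first missing one onward is also missing
  monotone <- if (all(obs)) NA else all(!obs[which(!obs)[1]:length(obs)])
  data.frame(
    pct_nonmissing_outcome = 100 * mean(obs),
    longest_run_observed = longest,
    monotone_missing = monotone
  )
}

gap  <- compliance_one(c(1, 2, NA, 4, 5, 6))    # intermittent gap
drop <- compliance_one(c(1, 2, 3, NA, NA, NA))  # monotone dropout
rbind(gap, drop)
```

Working through one or two people this way is a quick check that monotone_missing is flagging dropout and not a single skipped prompt.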

When to use `ild_missing_model()` and `ild_missing_bias()`

ild_missing_model() fits a logistic model for is.na(outcome) ~ predictors (a pooled glm by default, or glmer when random = TRUE). Use it as a diagnostic for whether observed covariates predict missingness, not as proof of MAR.
ild_missing_bias() is a shortcut for one numeric predictor vs missingness (teaching / quick screening).

If predictors are associated with missingness, complete-case summaries of the outcome can be biased even when a mixed model uses all rows, because the composition of who contributes at each occasion may shift. Treat comparisons of descriptive means by missingness pattern as exploratory, not causal.
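
The pooled version of this diagnostic is an ordinary logistic regression, so its logic can be sketched with stats::glm() alone. The simulated mechanism below (stress raising the chance of a missed prompt) is an assumption made for illustration, and the code does not reproduce what ild_missing_model() returns.

```r
# Base-R sketch of a pooled missingness model; the data-generating mechanism
# is assumed for illustration only.
set.seed(2)
n <- 2000
stress <- rnorm(n)
mood <- rnorm(n)
# Higher stress makes a missed prompt more likely in this toy example
mood[rbinom(n, 1, plogis(-1.5 + 0.8 * stress)) == 1] <- NA

miss <- as.integer(is.na(mood))           # 1 = outcome missing on this row
fit <- glm(miss ~ stress, family = binomial())

round(coef(summary(fit)), 3)
# Odds ratios with Wald intervals
round(exp(cbind(OR = coef(fit), confint.default(fit))), 2)
```

A clearly non-zero stress coefficient says that observed stress predicts missingness. It says nothing about whether unobserved mood also does, which is why the result is a diagnostic and not a test of MAR.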

Complete-case vs mixed models (careful wording)

A linear mixed model fitted to all available rows builds its likelihood from the observed outcomes only. Under MAR, and with correctly specified mean and covariance structures, likelihood-based inference for the outcome model can be valid without modeling the missingness mechanism. That statement has scope limits:

It concerns outcome missingness in the modeled response, not necessarily intermittent predictor missingness handled ad hoc.
Under MNAR the likelihood argument no longer holds; informative dropout requires sensitivity analysis.
Complete-case analysis (drop any person-occasion with missing outcome) changes the estimand when missingness is not MCAR.

tidyILD encourages comparing descriptives and fits on full vs complete-case data as a coarse sensitivity check, not a formal test.

Cohort-level and hazard summaries

ild_missing_cohort(): fraction of non-missing outcomes at each .ild_seq plus an optional line plot.
ild_missing_hazard_first(): discrete hazard of being missing on the current row among rows at risk (previous occasion observed, or first occasion). Under intermittent missingness this is a rough first-event summary; under monotone dropout it aligns better with a discrete-time dropout hazard.

coh <- ild_missing_cohort(x, outcome = "mood", plot = FALSE)
head(coh$by_occasion)
#> # A tibble: 6 × 4
#>   .ild_seq n_rows n_obs pct_observed
#>      <int>  <int> <int>        <dbl>
#> 1        1     25    23           92
#> 2        2     25    21           84
#> 3        3     25    21           84
#> 4        4     25    24           96
#> 5        5     25    19           76
#> 6        6     25    21           84
head(ild_missing_hazard_first(x, outcome = "mood"))
#> # A tibble: 6 × 4
#>   .ild_seq n_at_risk n_missing hazard
#>      <int>     <int>     <int>  <dbl>
#> 1        1        25         2 0.08  
#> 2        2        23         3 0.130 
#> 3        3        21         2 0.0952
#> 4        4        21         0 0     
#> 5        5        24         5 0.208 
#> 6        6        19         3 0.158
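
The arithmetic behind both tables is simple enough to verify on a toy person × occasion matrix. The sketch below follows the at-risk rule stated above (first occasion, or previous occasion observed). The matrix and variable names are made up for illustration, and this is not the package's internal code.

```r
# Toy outcome matrix: rows = persons, columns = occasions (hypothetical data)
y <- rbind(
  c(1,  2,  NA, 4),
  c(1,  NA, NA, NA),
  c(NA, 2,  3,  4),
  c(1,  2,  3,  NA)
)
miss <- is.na(y)

# Cohort view: percentage of observed outcomes at each occasion
pct_observed <- 100 * colMeans(!miss)

# At risk at occasion t: t is the first occasion, or occasion t - 1 was observed
prev_observed <- cbind(TRUE, !miss[, -ncol(miss), drop = FALSE])
n_at_risk <- colSums(prev_observed)
n_missing <- colSums(prev_observed & miss)
hazard <- n_missing / n_at_risk

data.frame(occasion = seq_len(ncol(y)), pct_observed, n_at_risk, n_missing, hazard)
```

Person 2 drops out of the risk set after the first missed prompt, while person 3 re-enters it after returning. That re-entry is why the summary is only a rough first-event measure under intermittent missingness.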

One entry point: `ild_missingness_report()`

ild_missingness_report() bundles compliance, ild_missing_pattern() (with outcome enrichment), cohort and hazard tables, optional ild_missing_model(), the same late-dropout heuristic used in guardrails (GR_DROPOUT_LATE_CONCENTRATION), and short snippets for methods text.

rpt <- ild_missingness_report(
  x,
  outcome = "mood",
  predictors = "stress",
  fit_missing_model = TRUE,
  random = FALSE,
  cohort_plot = FALSE
)
names(rpt)
#> [1] "compliance"    "pattern"       "cohort"        "hazard"       
#> [5] "flags"         "missing_model" "snippets"
rpt$snippets["overview"]
#>                                                                                                                                                                                                                 overview 
#> "Outcome mood was summarized with tidyILD person-level compliance, cohort observed fractions by occasion (.ild_seq), and a discrete-time hazard of first missing row (ordinal schedule; see ?ild_missing_hazard_first)."

MNAR as sensitivity (no single fix)

tidyILD does not fit selection models, pattern-mixture models, or joint models for MNAR. Consider external packages and pre-specified sensitivity analyses. The snippets in ild_missingness_report() remind readers that logistic missingness models are diagnostic and sensitivity tools, not proof of MAR.

IPW and causal tools as one sensitivity route

If you fit ild_missing_model(), you can feed predicted probabilities into ild_ipw_weights() and ild_ipw_refit() for inverse-probability weighting (see ?ild_ipw_weights and causal vignettes). This addresses observed confounding of missingness under a MAR-like weighting story; it is not a blanket MNAR solution.

mm <- ild_missing_model(x, outcome = "mood", predictors = c("stress"), random = TRUE)
x_w <- ild_ipw_weights(x, mm, stabilize = TRUE)
fit_w <- ild_ipw_refit(mood ~ stress + (1 | id), data = x_w, weights = ".ipw")
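
For intuition about what the weights do, the following base-R sketch applies stabilized inverse-probability weights to a simple mean instead of a mixed model. The simulated data, the assumed MAR mechanism, and the weight formula (marginal probability of being observed divided by the fitted conditional probability) are illustrative assumptions; ild_ipw_weights() may construct its weights differently.

```r
# Base-R illustration of stabilized IPW for a mean; not the tidyILD workflow.
set.seed(3)
n <- 5000
stress <- rnorm(n)
mood <- 2 * stress + rnorm(n)
# Assumed MAR mechanism: high-stress prompts are answered less often
observed <- rbinom(n, 1, plogis(1 - stress))

# Probability of being observed, modeled from the observed covariate
p_obs <- fitted(glm(observed ~ stress, family = binomial()))

# Stabilized weights: marginal P(observed) over conditional P(observed | stress)
w <- mean(observed) / p_obs

keep <- observed == 1
full_mean <- mean(mood)                          # known only in a simulation
cc_mean   <- mean(mood[keep])                    # complete-case mean
ipw_mean  <- weighted.mean(mood[keep], w[keep])  # reweighted mean
round(c(full = full_mean, complete_case = cc_mean, ipw = ipw_mean), 2)
```

The reweighted mean should land much closer to the full-data mean than the complete-case mean does, but only because missingness here depends on stress alone. If it also depended on mood itself, the same weights would leave residual bias.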

Other templates (not evaluated here)

Compare complete-case vs full mixed model (same formula):

x_cc <- dplyr::filter(x, !is.na(mood))
fit_full <- ild_lme(mood ~ stress + (1 | id), data = x, warn_uncentered = FALSE)
fit_cc <- ild_lme(mood ~ stress + (1 | id), data = x_cc, warn_uncentered = FALSE)

Run multiple imputation outside tidyILD, call ild_prepare() on each imputed dataset, and pool the results with mice, mitools, or brms. Keep the imputation model and the substantive model aligned with your estimand.
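
The pooling step for a single coefficient follows Rubin's rules, which can be written out in a few lines of base R. The function pool_rubin() and the per-imputation estimates below are hypothetical; in practice the pooling functions in mice or mitools also handle degrees of freedom and multi-parameter models.

```r
# Minimal sketch of Rubin's rules for one scalar estimate; illustrative only.
pool_rubin <- function(est, se) {
  m <- length(est)
  qbar    <- mean(est)        # pooled point estimate
  within  <- mean(se^2)       # average within-imputation variance
  between <- var(est)         # between-imputation variance
  total   <- within + (1 + 1 / m) * between
  c(estimate = qbar, se = sqrt(total))
}

# Made-up estimates and standard errors from m = 5 imputed datasets
pooled <- pool_rubin(est = c(0.50, 0.55, 0.45, 0.52, 0.48), se = rep(0.10, 5))
pooled
```

The pooled standard error is larger than any single within-imputation standard error because it also carries the between-imputation spread, which is how uncertainty about the missing values enters the final inference.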

What tidyILD does not do (and where to look)

Full MI pipelines — mice, Amelia, jomo, etc.
MNAR selection / pattern-mixture — specialized books and packages; consult a statistician.
Continuous-time event models for dropout — survival / joint longitudinal-survival packages.
Replacing domain knowledge about why prompts are missed.