This deep-dive post gives a fuller account of the research by Northeastern University students that is summarized in KARA’s shorter child welfare crisis post here.
It is written for readers who want to look more closely at the research done at Northeastern University. If you wish to review the actual work, email info@invisiblechildren.org with “Northeastern University Research Request” in the subject line.
WHEN YOU SHARE KARA’s REPORTING WITH FRIENDS, INSTAGRAM & FACEBOOK,
CALL AND EMAIL YOUR STATE REPRESENTATIVE, AND
SHARE THIS POST AND YOUR CONCERNS,
at-risk children and your community will benefit.
Small efforts = real results.
GROUP 1XN MIDTERM:
Project overview
This capstone partners with Kids At Risk Action (KARA), a child welfare advocacy nonprofit focused on making disparities in child outcomes visible to policymakers and the public. The group delivers three independent but connected analytics studies that look at risk from the individual child up to the state system level.
- Study 1 builds a national, individual‑level model that predicts which infants are most at risk of dying in the first year of life using routinely collected U.S. birth certificate data.
- Study 2 focuses only on infants who die and predicts how they die, assigning each death to one of five clinically meaningful cause‑of‑death pathways.
- Study 3 leaves the vital‑statistics dataset and looks at Texas child welfare operations, describing how removal rates and placement patterns vary across counties and regions over time.
Together, the studies are designed to give KARA three things: (a) a defensible risk model for federal‑level infant mortality advocacy, (b) a pathway‑specific lens that maps risk to concrete interventions, and (c) a state‑system baseline KARA can use when arguing about removal policies and regional disparities.
Study 1: National infant mortality risk model…
Study 1 uses the CDC NCHS Period Linked Birth–Infant Death Public Use Files from 2018–2024, covering about 25.8 million births and 123,523 linked infant deaths, with a raw infant mortality rate of 4.79 per 1,000. The model draws on 48 features including maternal demographics, prenatal care timing and intensity, clinical risk factors, infant characteristics, and delivery details.
Because infant deaths are rare (about 1 in 208 births), the team downsamples negatives in training but keeps the natural mortality rate in testing to maintain honest performance and calibration estimates. Three model families are compared: logistic regression, histogram gradient boosting, and XGBoost, with careful attention to missing data and fairness.
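The sampling scheme described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the team's pipeline: the function name `downsample_negatives`, the 10:1 ratio, and the toy records are all assumptions made for the example.

```python
import random

def downsample_negatives(rows, label_key="died", neg_per_pos=10, seed=0):
    """Keep every positive row and a random subset of the negatives.

    Intended for the TRAINING split only. The test split is left at its
    natural event rate so performance and calibration stay honest.
    """
    positives = [r for r in rows if r[label_key] == 1]
    negatives = [r for r in rows if r[label_key] == 0]
    rng = random.Random(seed)
    n_keep = min(len(negatives), neg_per_pos * len(positives))
    return positives + rng.sample(negatives, n_keep)

# Toy training data (hypothetical): 5 deaths among 1,000 births.
train = [{"id": i, "died": 1 if i < 5 else 0} for i in range(1000)]
balanced = downsample_negatives(train, neg_per_pos=10)
event_rate = sum(r["died"] for r in balanced) / len(balanced)
print(len(balanced), round(event_rate, 3))
```

A model trained on the downsampled rows sees a much higher event rate than the real population, so its raw probabilities tend to overstate risk. That is one reason to evaluate calibration on a test set that keeps the natural rate.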
Key findings
- Performance: The headline XGBoost model trained on 2018–2022 and tested on 2024 achieves AUROC 0.934, PR‑AUC 0.519, and Brier score 0.063, competitive with or better than recent published work.
- Missingness as signal: A detailed audit shows that missing values in critical fields like gestational age and APGAR score are strongly associated with much higher mortality (9–10× higher than recorded values), so treating missingness as informative signal rather than noise is methodologically correct.
- Imputation strategy: A sidecar experiment compares median imputation, median plus missingness indicators, native NaN handling, and a MICE‑style IterativeImputer. Median‑plus‑indicators for logistic regression and native NaN for XGBoost perform best, while MICE degrades performance slightly by “smoothing away” informative missingness.
- Fairness: Within‑group AUROC is high and stable across all major race/ethnicity groups, but operational disparity remains. Under the single‑year model, the top‑decile screen‑in rate gap between non‑Hispanic Black and non‑Hispanic White infants is 16.85 percentage points; under the multi‑year model it narrows but persists at 14.78 points.
- Stability over time: Training on multiple years meaningfully improves discrimination and preserves stable performance across 2023 and 2024, with only small AUROC shifts by subgroup.
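The median-plus-indicators strategy named in the imputation finding can be illustrated as follows. This is a sketch under stated assumptions: the helper `median_plus_indicator` and the sample gestational ages are hypothetical, and the report's own experiment may have used library imputers instead.

```python
import math
from statistics import median

def median_plus_indicator(values):
    """Return (imputed_values, missing_flags) for one numeric column.

    Missing entries (None or NaN) are filled with the median of the
    observed values, and a parallel 0/1 flag records where they were,
    so a model can still learn from the fact that a value was missing.
    """
    def is_missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    observed = [v for v in values if not is_missing(v)]
    fill = median(observed)
    imputed = [fill if is_missing(v) else v for v in values]
    flags = [1 if is_missing(v) else 0 for v in values]
    return imputed, flags

# Hypothetical gestational ages in weeks, with two missing entries.
gest_weeks = [39, None, 40, 27, float("nan"), 38]
imputed, flags = median_plus_indicator(gest_weeks)
print(imputed)
print(flags)
```

Plain median imputation would produce the same filled column but discard the flag column, which is where the informative missingness lives.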
For KARA, Study 1 provides an individual‑level risk tool that quantifies racial and socio‑economic gradients in infant mortality and is explicitly audited for calibration and fairness. It offers a stronger evidence base than raw demographic rate ratios when arguing for investments in prenatal care, Medicaid, and maternal health.
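The top-decile screen-in gap used in the fairness audit can be computed along these lines. The sketch assumes the gap is defined as the difference in the share of each group that lands in the overall top 10% of risk scores; the function name, group labels, and scores are hypothetical.

```python
def top_decile_screen_in_rates(scores, groups):
    """Share of each group whose score falls in the overall top 10%.

    Ties at the cutoff are all flagged, so slightly more than 10% of
    records can be screened in when scores repeat.
    """
    ranked = sorted(scores, reverse=True)
    n_top = max(1, len(scores) // 10)
    cutoff = ranked[n_top - 1]
    totals, flagged = {}, {}
    for score, group in zip(scores, groups):
        totals[group] = totals.get(group, 0) + 1
        flagged[group] = flagged.get(group, 0) + int(score >= cutoff)
    return {g: flagged[g] / totals[g] for g in totals}

# Hypothetical scores for two groups of ten records each.
scores_a = [0.95, 0.90, 0.40, 0.35, 0.30, 0.25, 0.20, 0.15, 0.10, 0.05]
scores_b = [0.85, 0.45, 0.38, 0.33, 0.28, 0.23, 0.18, 0.13, 0.08, 0.03]
scores = scores_a + scores_b
groups = ["A"] * 10 + ["B"] * 10

rates = top_decile_screen_in_rates(scores, groups)
gap_points = (rates["A"] - rates["B"]) * 100
print(rates, round(gap_points, 2))
```

A gap of this kind can persist even when within-group AUROC is similar, because it depends on where each group's score distribution sits relative to a single shared cutoff.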
Study 2: Cause‑of‑death pathways for infant deaths
Study 2 looks only at infants who die and asks a different question: given birth‑time information, which cause‑of‑death pathway is most likely? It uses the numerator (death) side of the same CDC linked file, 2018–2023, yielding 121,688 infant death records linked back to birth‑certificate predictors.
ICD‑10 codes are collapsed into five mutually exclusive pathways: Prematurity/Respiratory, Congenital Anomalies, SIDS/Post‑Neonatal, Maternal/Perinatal, and Other. The share of deaths in each pathway is fairly stable from year to year, which supports pooling across years.
Methods and performance
Predictors overlap heavily with Study 1: maternal age, race/ethnicity, marital status, education, smoking, WIC participation, prenatal care, gestational age, birth weight, sex, and multiple birth, plus engineered flags for preterm and low birth weight. The team uses an 80/20 stratified train–test split and evaluates three models: one‑vs‑rest logistic regression, Random Forest, and XGBoost.
- Random Forest achieves macro ROC‑AUC 0.8119 and macro F1 0.46 on the test set.
- XGBoost slightly exceeds Random Forest on macro ROC‑AUC (0.8150) but has almost zero recall and F1 for the smallest class (Maternal/Perinatal).
- Logistic regression performs worse overall and fails entirely on Maternal/Perinatal.
Under the project’s equity‑first deployment rule, Random Forest is selected despite its marginally lower aggregate AUROC, because it recovers the minority Maternal/Perinatal class at a clinically meaningful recall of 0.58 (versus 0.01 for XGBoost). Feature importance concentrates heavily in gestational age and birth weight (about 47% of total importance), with engineered prematurity and very‑low‑birth‑weight flags adding additional signal.
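One way to express an equity-first selection rule in code is shown below. The report does not spell out its exact rule, so the recall floor of 0.5, the function name `select_model`, and the fallback behavior are assumptions. Only the two models with reported numbers are included.

```python
def select_model(candidates, min_recall=0.5):
    """Pick the highest-AUC model whose smallest-class recall clears a floor.

    `candidates` maps model name -> dict with 'macro_auc' and
    'minority_recall'. If no model clears the floor, fall back to the
    model with the best minority-class recall.
    """
    eligible = {name: m for name, m in candidates.items()
                if m["minority_recall"] >= min_recall}
    if eligible:
        return max(eligible, key=lambda name: eligible[name]["macro_auc"])
    return max(candidates, key=lambda name: candidates[name]["minority_recall"])

# Figures reported for the Maternal/Perinatal class in Study 2.
reported = {
    "random_forest": {"macro_auc": 0.8119, "minority_recall": 0.58},
    "xgboost": {"macro_auc": 0.8150, "minority_recall": 0.01},
}

print(select_model(reported))                  # recall floor applied
print(select_model(reported, min_recall=0.0))  # AUC alone decides
```

With the floor in place the rule returns Random Forest; with the floor removed it returns XGBoost, which mirrors the trade-off the report describes.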
Implications for intervention
Random Forest outputs support a pathway‑specific intervention map that links each pathway to likely stakeholders and candidate actions. For example, Prematurity/Respiratory deaths point toward NICU capacity, antenatal steroids, and high‑risk obstetric referral; SIDS/Post‑Neonatal cases highlight safe‑sleep counseling, smoking cessation, and home visiting; Maternal/Perinatal cases emphasize early prenatal enrollment and risk stratification; and Congenital pathways suggest investments in prenatal screening and anomaly surveillance.
For KARA and partners, Study 2 shows that cause‑of‑death pathways can be predicted reasonably well from birth‑time data alone, and that model choice must explicitly weight minority‑class recall to avoid writing off clinically important pathways.
Study 3: Texas child welfare removals and placements
Study 3 shifts focus from infant vital statistics to the operation of a large state child welfare system, using public county‑year data from the Texas Department of Family and Protective Services (DFPS) for fiscal years 2016–2025. Three DFPS datasets are joined: child population (ages 0–17), removals, and substitute‑care placements.
After cleaning, merging, and excluding very small counties (average child population under 1,000) from rate‑based rankings, the final analytical table covers 2,540 county‑year rows across 254 counties and 10 years. The study examines statewide trends, county‑level variation, regional patterns, placement composition, and a simple predictive baseline.
Statewide and county patterns
Statewide, total removals rise from 19,073 in 2016 to a peak of 20,674 in 2018, then decline sharply to 10,010 in 2025, with the steepest single‑year drop (almost 40%) between 2021 and 2022. Removal rates per 1,000 children follow the same pattern, indicating that the decline reflects fewer removals per child rather than a change in the size of the child population.
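The size of the statewide decline can be checked directly from the three totals quoted above. The helper name `pct_change` is illustrative; the figures are the ones reported for 2016, 2018, and 2025.

```python
def pct_change(old, new):
    """Percentage change from old to new."""
    return (new - old) / old * 100.0

removals = {2016: 19073, 2018: 20674, 2025: 10010}

start_to_end = pct_change(removals[2016], removals[2025])
peak_to_end = pct_change(removals[2018], removals[2025])
print(round(start_to_end, 1))  # -47.5
print(round(peak_to_end, 1))   # -51.6
```

On these numbers, 2025 removals are a little under half of the 2018 peak.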
At the county level, large urban counties such as Bexar, Dallas, and Harris account for the largest removal volumes. When standardized by child population, however, smaller rural or peri‑urban counties dominate the high‑rate ranking, with top average removal rates in Llano, Marion, Brown, Taylor, San Saba, Falls, and others. Aggregating to DFPS regions shows the highest rates in regions like Abilene and Austin, with much lower rates in Houston and El Paso.
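The contrast between volume rankings and rate rankings comes from dividing by child population and dropping counties with very small denominators. A minimal sketch follows; the county names and counts are hypothetical rather than DFPS figures, and the filter uses a single population value where the study uses a multi-year average.

```python
def rank_by_removal_rate(counties, min_child_pop=1000):
    """Rank counties by removals per 1,000 children, skipping tiny ones."""
    rows = []
    for c in counties:
        if c["child_pop"] < min_child_pop:
            continue  # rates built on very small denominators are unstable
        rate = c["removals"] / c["child_pop"] * 1000
        rows.append((c["name"], round(rate, 2)))
    return sorted(rows, key=lambda row: row[1], reverse=True)

# Hypothetical counties, not DFPS data.
counties = [
    {"name": "Big Urban", "child_pop": 1_000_000, "removals": 2_000},
    {"name": "Small Rural", "child_pop": 4_000, "removals": 40},
    {"name": "Tiny", "child_pop": 300, "removals": 6},
]

ranking = rank_by_removal_rate(counties)
print(ranking)
```

The urban county has fifty times the removals of the rural one but one fifth of its rate, which is the same reversal the county rankings show.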
Placement composition and baseline model
Among children who do enter substitute care, foster care is the most common setting, followed by other substitute care; relative and non‑relative placements are similar in overall volume, with relative placements slightly higher in several years. The youngest children (0–5) account for the largest share of placements.
A simple time‑aware linear regression is used as a baseline predictor for county‑year removals, with training on 2016–2023 and testing on 2024–2025. Using child population, several placement counts, and DFPS region as predictors, the model explains about 91.4% of the variance in held‑out removal volume and far outperforms a naive mean predictor. However, residuals are largest in major urban counties, and the high R‑squared is largely mechanical because placements and removals are tightly correlated, motivating future work with count‑aware and hierarchical models and external covariates such as poverty and healthcare access.
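The evaluation setup described above, a split by year plus a comparison against a naive mean predictor, can be sketched as follows. The toy series and the helper names `time_split` and `r_squared` are assumptions; the regression itself is omitted.

```python
def time_split(rows, last_train_year):
    """Split rows so that every test year comes after every training year."""
    train = [r for r in rows if r["year"] <= last_train_year]
    test = [r for r in rows if r["year"] > last_train_year]
    return train, test

def r_squared(y_true, y_pred):
    """Coefficient of determination on a held-out set."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical single-county series with a steady upward trend.
rows = [{"year": y, "removals": 100 + 5 * (y - 2016)} for y in range(2016, 2026)]
train, test = time_split(rows, last_train_year=2023)

y_test = [r["removals"] for r in test]
train_mean = sum(r["removals"] for r in train) / len(train)
naive_r2 = r_squared(y_test, [train_mean] * len(y_test))
print(len(train), len(test), naive_r2)
```

Held-out R-squared can be negative for a naive predictor, as it is here, so a high value for the fitted model is meaningful only relative to baselines like this one.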
For KARA, Study 3 offers a reusable template for mapping removal intensity and placement patterns across counties and regions, setting up a clear, data‑driven way to identify high‑priority regions for advocacy and to replicate the analysis in other states.
How the three studies fit together
The three studies form an advocacy pipeline: Study 1 identifies which infants are at highest risk, Study 2 indicates which pathway is most likely to kill them, and Study 3 shows how state systems respond to children at risk via removals and placements. The midterm report positions these as complementary views of the same underlying question: who is at risk, what kills them, and how does the system behave around them.
Planned next‑phase work includes feature‑parsimony experiments and neonatal‑only variants in Study 1, hyperparameter tuning and imbalance mitigation in Study 2, and richer forecasting models and multi‑state replication in Study 3. This roadmap is intended to move the project from strong descriptive and predictive baselines at midterm toward more parsimonious, interpretable, and policy‑ready deliverables by the final capstone milestone.
Group 2 Midterm Report Summary
Overview
Group 2’s midterm report presents a section-based analytics framework for Kids At Risk Action (KARA) rather than a single unified model. The team uses four separate workstreams to examine child welfare risk at different levels: state-level profiling, county-level forecasting, recurrence prediction, and a proxy linked-outcome workflow built from public health data.
The shared purpose across all four sections is to help KARA understand where child welfare risk is concentrated, which indicators are associated with worse outcomes, and how evidence can be translated into advocacy tools, dashboards, and clearer public communication. At this midterm stage, the report is best understood as a set of analytical prototypes that test multiple directions for a sponsor-facing child welfare risk analytics framework.
Project purpose
KARA is a nonprofit focused on advocacy, public education, and policy communication for abused and neglected children, so the project is designed around practical evidence needs rather than purely academic modeling goals. The report argues that child welfare data is difficult to interpret because counts, rates, screening decisions, and substantiation practices vary across places and systems, which means risk must be examined from multiple angles instead of through one metric alone.
For that reason, the team deliberately split the work into four tracks. Together, those tracks test how public, administrative, synthetic, and proxy datasets can support geographic comparison, future forecasting, recurrence triage, and linked-risk analysis.
State risk profiling
Kanishka’s section uses the NCANDS 2017 Combined Data Tables to profile state-level child welfare risk across the United States. The analysis moves beyond simple descriptive rankings and adds correlation analysis, Random Forest feature importance, clustering, PCA visualization, outlier detection, and regression comparison.
Several states appear consistently high-risk across multiple indicators, especially Alabama, Mississippi, West Virginia, Arkansas, and South Carolina. The report shows that these states do not stand out on just one measure, but across broader combinations of victim rates, fatality rates, maltreatment patterns, and screening-related measures.
A useful insight for KARA is that normalized rates tell a different story than raw counts. Large states such as California, Texas, and New York do not dominate when the data is adjusted by child population, while smaller states with more severe rates become much more visible.
The section also highlights how steep the national screening and investigation funnel is. Of about 10.9 million referrals in 2017, only 60.9 percent were screened in, and only 12.8 percent of all referrals ended in substantiation. That drop-off matters because it raises questions about how much risk is filtered out before a case reaches formal confirmation.
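The funnel percentages translate into approximate counts as shown below. The calculation assumes that substantiated referrals are a subset of screened-in referrals, and it inherits the rounding in "about 10.9 million".

```python
referrals = 10_900_000        # "about 10.9 million" (rounded in the report)
screened_in_share = 0.609     # share of all referrals screened in
substantiated_share = 0.128   # share of ALL referrals substantiated

screened_in = referrals * screened_in_share
substantiated = referrals * substantiated_share

# Share of screened-in referrals that end in substantiation.
substantiated_given_screened = substantiated_share / screened_in_share

print(round(screened_in / 1e6, 2))                    # millions screened in
print(round(substantiated / 1e6, 2))                  # millions substantiated
print(round(substantiated_given_screened * 100, 1))   # percent
```

On these figures roughly 6.6 million referrals are screened in and about 1.4 million are substantiated, so about one in five screened-in referrals reaches substantiation.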
The cluster analysis groups states into low-, moderate-, and high-risk profiles, and the high-risk cluster repeatedly includes Alabama, Mississippi, West Virginia, and Arkansas. Isolation Forest outlier detection also flags Alabama, Mississippi, South Carolina, and West Virginia, which reinforces the conclusion that some states show unusual combinations of risk indicators even when they do not top every single-variable ranking.
For KARA, this section offers a practical way to communicate where child welfare burden appears most severe and to frame policy conversations around multivariate risk profiles instead of isolated statistics.
County forecasting
Sreekarteek’s section focuses on Iowa county-level child abuse occurrences from 2004 through 2023 using data from the Iowa Department of Health and Human Services. The work has two phases: descriptive trend analysis and predictive modeling using a county-year panel.
The descriptive results show that child abuse reporting is heavily concentrated in a small number of metropolitan counties, especially Polk, Linn, Scott, Black Hawk, and Woodbury. The top ten counties account for more than half of statewide volume even though they represent only ten of Iowa’s ninety-nine counties.
The section also shows a major statewide trough in 2014, which the report attributes to reporting-system changes rather than a true collapse in underlying incidence. That distinction matters because the forecasting model must absorb regime change instead of treating it as substantive improvement.
Abuse-type composition is another important theme. Neglect remains the dominant category, while substance-related categories rise sharply after 2017, including a strong increase in dangerous-substance cases. This suggests that local child welfare burden is changing in composition, not just volume.
The predictive phase builds a county-year panel with lagged counts, abuse-type shares, a pre-2014 indicator, and county and service-area effects. Ridge regression, XGBoost, and a naive lag-1 baseline are compared using rolling-origin time-based validation across 2018 through 2023 holdout years.
All three models perform very well on paper, with mean R-squared values around 0.965 to 0.970, but the report is explicit that this performance is driven largely by autoregressive persistence. In fact, the naive lag-1 baseline beats the fitted models in some holdout years, showing that prior-year history carries most of the signal in this dataset.
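The naive lag-1 baseline is simple enough to write out in full, which is part of why it is a useful yardstick. The sketch below scores a "same as last year" forecast for each holdout year. The county names and counts are hypothetical, not Iowa HHS data, and the function name is an assumption.

```python
def lag1_mae_by_year(panel, holdout_years):
    """MAE of the 'same as last year' forecast for each holdout year.

    `panel` maps county -> {year: count}. Counties missing either the
    holdout year or the prior year are skipped for that year.
    """
    results = {}
    for year in holdout_years:
        errors = [abs(series[year] - series[year - 1])
                  for series in panel.values()
                  if year in series and (year - 1) in series]
        results[year] = sum(errors) / len(errors)
    return results

# Hypothetical county-year counts.
panel = {
    "County A": {2021: 100, 2022: 110, 2023: 105},
    "County B": {2021: 20, 2022: 18, 2023: 21},
}

baseline = lag1_mae_by_year(panel, [2022, 2023])
print(baseline)
```

A fitted model earns its place only if its error for the same holdout years is lower than this baseline's.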
XGBoost is still chosen as the production model for the 2024 forecast because it slightly improves average MAE and better captures nonlinear interactions. It projects Polk, Linn, Scott, Dubuque, and Pottawattamie as the top five Iowa counties for 2024 reported abuse occurrences.
For KARA, this section is valuable because it shows how county-level data can identify high-burden local areas and support planning, but it also models good analytical caution by separating true predictive learning from simple carry-forward persistence.
Recurrence prediction
Neeti’s section asks a more operational question: among children with a prior substantiated maltreatment history, can structured variables help predict recurrence within 12 months? Because restricted NCANDS child-level data was not yet available, the analysis uses a synthetic NCANDS Child File-style dataset of 12,000 records calibrated to national distributions, plus a second state-year aggregate panel covering 30 states across 8 years.
The individual-level pipeline compares Logistic Regression, Random Forest, and Gradient Boosting, then adds threshold optimization, fairness auditing, feature engineering, and SMOTE oversampling. At the default threshold, recall is too low to be useful, which leads the section to optimize the threshold to 0.43 for the Logistic Regression model.
That optimized deployment model reaches AUC 0.627, recall 77.9 percent, and F1 0.399. The key tradeoff is clear: the model is not highly discriminative in a broad academic sense, but it is tuned to catch many more true recurrence cases, which is often the more important operational goal in child welfare screening.
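The trade-off can be made concrete in two ways: by computing metrics at different thresholds, and by backing out the precision implied by the reported recall and F1. Both helpers below are illustrative, and the labels and scores are hypothetical. The implied precision follows algebraically from F1 = 2PR / (P + R).

```python
def metrics_at_threshold(y_true, scores, threshold):
    """Precision, recall, and F1 when flagging scores >= threshold."""
    pairs = list(zip(y_true, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

def implied_precision(f1, recall):
    """Solve F1 = 2PR / (P + R) for P."""
    return f1 * recall / (2 * recall - f1)

# Hypothetical labels and scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.80, 0.45, 0.30, 0.60, 0.44, 0.20, 0.10, 0.05]

default = metrics_at_threshold(y_true, scores, 0.50)
lowered = metrics_at_threshold(y_true, scores, 0.43)
print(default[1], lowered[1])                      # recall rises
print(round(implied_precision(0.399, 0.779), 3))   # from reported F1, recall
```

The reported recall and F1 imply a precision of roughly 0.27, meaning that about three of every four flagged cases would be false alarms at the tuned threshold.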
The section’s strongest substantive finding is that the Caregiver Risk Score composite is the single most important predictor. Caregiver drug abuse, domestic violence, and financial problems are repeatedly identified as the most important risk-elevating factors, while post-investigation services appear protective across major maltreatment types.
The fairness audit shows false negative rate differences by race and ethnicity, though calibration remains broadly maintained across groups in the synthetic dataset. The report appropriately cautions that these fairness results may change once restricted real-world data becomes available.
The macro-level state-year panel adds another layer by linking child poverty and maltreatment burden. Across 30 states and 8 years, child poverty rate has a positive relationship with maltreatment victim rate, with r = 0.62 and R-squared = 0.39, meaning poverty alone explains about 39 percent of cross-state variation in victimization rates.
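For a regression with a single predictor, R-squared is the square of the Pearson correlation, which is how the two reported figures relate. The sketch below defines the correlation and checks that r = 0.62 and R-squared = 0.39 agree once rounding of r is allowed for. The helper name and toy data are assumptions.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Toy check on perfectly linear data.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))

# r reported as 0.62 could be any value from 0.615 up to 0.625.
low, high = 0.615 ** 2, 0.625 ** 2
print(round(low, 3), round(high, 3))
print(low <= 0.39 <= high)
```

Squaring 0.62 exactly gives about 0.384, so the reported 0.39 is consistent only if the unrounded correlation sits in the upper part of that range.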
For KARA, this section provides a practical recurrence-risk framework, a strong prevention narrative around caregiver conditions and service access, and a policy argument connecting family-level risk factors to broader structural poverty.
Linked-outcome workflow
Haomin’s section uses CDC and NCHS public health data as a proxy workflow rather than a direct child welfare model. The goal is to demonstrate how large administrative files can be cleaned, linked, summarized, and modeled in a way that could later transfer to KARA-specific data.
The earlier mortality analysis examines suicide-related mortality by demographic group and finds that male deaths are much more common than female deaths, with the largest counts among adults aged 20 to 49. The report treats this as broad public health context rather than direct evidence about KARA’s client population.
The more relevant component uses linked birth–infant death data to compare general birth records with infant death records. In that comparison, preterm birth appears in 14.07 percent of the general birth sample but 64.57 percent of linked infant death records, while low birth weight appears in 9.63 percent of the birth sample and 65.99 percent of linked infant death records.
A logistic regression demonstration supports the same pattern. Low birth weight is the strongest predictor with an odds ratio of about 8.58, followed by preterm birth at about 2.82 and no prenatal care at about 1.91. The model reaches accuracy 0.881, specificity 0.939, precision 0.625, and recall 0.563.
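Odds ratios and logistic-regression coefficients are two views of the same quantity: the coefficient is the natural log of the odds ratio. The sketch below converts the reported odds ratios to coefficients and shows how an odds ratio shifts a probability. The baseline probability of 0.005 is purely hypothetical, and the variable names are illustrative.

```python
import math

# Odds ratios as reported in the section.
odds_ratios = {"low_birth_weight": 8.58, "preterm": 2.82, "no_prenatal_care": 1.91}

# Each coefficient is the log of the corresponding odds ratio.
coefficients = {name: math.log(value) for name, value in odds_ratios.items()}

def apply_odds_ratio(baseline_prob, odds_ratio):
    """Shift a baseline probability by an odds ratio."""
    odds = baseline_prob / (1 - baseline_prob) * odds_ratio
    return odds / (1 + odds)

for name, beta in coefficients.items():
    print(name, round(beta, 3))

# Hypothetical baseline risk of 0.5%, adjusted for low birth weight.
shifted = apply_odds_ratio(0.005, odds_ratios["low_birth_weight"])
print(round(shifted, 4))
```

When the baseline risk is small, the shifted probability is close to the baseline multiplied by the odds ratio, which is why odds ratios are often read loosely as risk multipliers for rare outcomes.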
The report emphasizes that the real value here is not the specific outcome, but the workflow itself. It shows how earlier background indicators can be linked to later adverse outcomes in a structured, interpretable way, while also demonstrating the limits of public outcome data for prevention-focused questions.
For KARA, this section suggests that future analytics tools should emphasize variables that exist before harm becomes severe, such as referral history, caregiver risk, service access, and timing of intervention.
Shared takeaways
Across all four workstreams, Group 2 reaches several common conclusions. First, child welfare risk is geographically uneven, whether measured across states or across counties. Second, prior history and early indicators matter, but models must be interpreted carefully because strong predictive performance can reflect persistence or reporting systems rather than deeper causal structure.
Third, communication is as important as modeling. The report repeatedly argues that KARA will benefit most from outputs that are easy to explain, such as state risk profiles, county forecasts, recurrence risk tiers, fairness summaries, and linked-indicator comparisons.
Fourth, the current midterm work is useful but not yet production-ready. The sections use different datasets, levels of analysis, and assumptions, including synthetic and proxy data, so the findings should be treated as analytical prototypes rather than a single operational decision system.
Why it matters
The report’s main value is that it gives KARA several tested pathways for using data more strategically. State-level profiling supports advocacy and policy communication, county forecasting supports local burden planning, recurrence modeling supports triage and prevention thinking, and the linked-outcome workflow demonstrates how future sponsor data could be organized into a more integrated risk framework.
The strongest direction for the final capstone phase is to narrow these separate tracks into a clearer sponsor-facing product, likely a dashboard or presentation framework built around practical questions rather than around datasets. The report specifically points toward a layered final product that could answer where risk is concentrated, which indicators are associated with higher risk, whether future burden can be estimated, and what limitations KARA should keep in mind when communicating the findings.
Group 3 Midterm Report Summary
Group 3’s midterm report presents four separate but connected analytics studies for Kids At Risk Action (KARA), all focused on child welfare risk, child fatalities, homicide, and foster care burden across the United States. The studies use different datasets, levels of analysis, and model types, but together they aim to give KARA a defensible evidence base for advocacy around where risk is highest, how child deaths are classified, what structural conditions drive lethal outcomes, and what future foster care trends may look like.
The report is organized as a multi-study framework rather than a single unified model. Study 1 identifies states at elevated risk for child maltreatment fatalities, Study 2 examines how child deaths are classified by manner of death, Study 3 analyzes state-level child homicide trends and structural drivers, and Study 4 classifies states by maltreatment burden while also forecasting foster care population under different policy scenarios.
Study 1: State-level child maltreatment fatality risk
Study 1 uses CDC NVSS Multiple Cause of Death data from 2010 to 2013 to classify U.S. states as higher-risk or lower-risk for child maltreatment fatalities. The data is aggregated to the state-year level and merged with Census child population denominators, producing a 196-row panel covering 49 states across four years. A state is labeled high-risk if its fatality rate is above the median for the panel, creating a balanced outcome split.
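The median-split labeling can be sketched in a few lines. The rates below are hypothetical, and the function name is an assumption; the point is that labeling by "above the median" produces an even split whenever there are no ties at the median.

```python
from statistics import median

def label_high_risk(rates):
    """Flag each rate as high-risk (1) if it exceeds the panel median."""
    cut = median(rates)
    return [1 if r > cut else 0 for r in rates]

# Panel size described in the study: 49 states across four years.
panel_rows = 49 * 4
print(panel_rows)

# Hypothetical fatality rates per 100,000 children.
rates = [1.2, 0.4, 2.9, 0.8, 1.7, 0.6]
labels = label_high_risk(rates)
print(labels)
```

Because the label is defined relative to the panel itself, "high-risk" here means higher than the typical state-year in the panel, not above any absolute threshold.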
The study engineers five main predictors: fatality rate per 100,000 children, percent male victims, percent of victims under age 5, percent of victims aged 15 to 17, and total child deaths from all causes. A later model also adds a lagged fatality-rate feature to capture state-level persistence over time.
Three models are compared on a 2013 holdout year: Random Forest, XGBoost, and a tuned XGBoost model with the lagged feature. The Random Forest baseline reaches 73.5 percent accuracy, F1-score 0.74, and AUC 0.779, while XGBoost reaches a similar performance with AUC 0.786. The tuned lagged-feature XGBoost model performs best, reaching accuracy 0.776, F1-score 0.78, and AUC 0.811.
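The two reported accuracies line up with whole-number counts of correctly classified states if the 2013 holdout contains one row for each of the 49 states. That holdout size is an inference from the panel description rather than something the report states, so the check below is a sketch under that assumption.

```python
def accuracy(correct, total):
    """Share of rows classified correctly."""
    return correct / total

holdout_rows = 49  # assumption: one row per state in the 2013 holdout year

for correct in (36, 38):
    print(correct, round(accuracy(correct, holdout_rows), 3))

# One additional correct state moves accuracy by about two points.
print(round(1 / holdout_rows * 100, 1))
```

Under this assumption the gap between the baseline and the tuned model is two states, which is worth keeping in mind when comparing models on such a small holdout.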
The most important demographic predictor in the Random Forest model is the share of victims under age 5, followed by total child deaths, adolescent victim share, and percent male. In the lagged model, prior-year fatality rate becomes the strongest feature, showing that state risk is partly autoregressive. The study also notes an important limitation: adolescent fatalities may partly reflect community violence cases included under broader assault codes rather than only caregiver-inflicted maltreatment.
For KARA, Study 1 offers a public-data-based framework for ranking states by elevated fatality risk without needing restricted child welfare records. Its value is less in perfect prediction than in giving KARA a reproducible way to identify where advocacy attention may be most urgent.
Study 2: Manner-of-death classification
Study 2 reframes child fatality analysis as a three-class prediction problem: Homicide, Accident, or Could Not Determine. It uses CDC NCHS mortality microdata from 2018 to 2024 and applies a three-tier case definition that includes core maltreatment codes, assault and homicide codes, and undetermined-intent or accidental suffocation codes, resulting in 24,161 child records.
A critical methodological choice in this study is explicit leakage prevention. The team excludes tier and maltreatment-type variables from the model because those are directly derived from ICD-10 codes and would artificially inflate performance. Once leakage is removed, the model must rely only on independent demographic and situational features such as age, sex, race indicators, ethnicity, place of injury, month, and year.
The final classification task uses the three dominant classes: Homicide, Accident, and Could Not Determine, for a filtered sample of 24,092 records split 80/20 into training and testing. Random Forest slightly outperforms XGBoost, reaching 76.0 percent accuracy and weighted F1 of 0.775. Per-class results show strong performance for Accident and Homicide but much weaker performance for Could Not Determine, where F1 is only 0.36.
The study shows that age is the dominant predictor because the population is strongly bimodal. Infants under age 1 make up 45.1 percent of deaths, while adolescents aged 12 to 17 account for 39.0 percent. Among infants, only 10.7 percent of deaths are classified as homicide, while 59.3 percent are classified as accident and 29.1 percent as could not determine. Among adolescents, 97.5 percent of deaths are classified as homicide.
This produces the report’s most important finding: the “Could Not Determine” category is overwhelmingly an infant classification problem, with about 80 percent of undetermined deaths involving children under age 1. For KARA, that means improved infant death investigation protocols could directly reduce surveillance gaps and improve fatality counting.
Study 3: Child homicide trends and structural drivers
Study 3 analyzes national and state-level child homicide using NCHS microdata, CDC WONDER state-year panels, Children’s Bureau outcomes reports, and Census denominators. Its central question is which structural factors are most strongly associated with child homicide and whether child welfare reporting intensity has a measurable protective effect.
The report documents a sharp national increase in child homicide during the pandemic era. National child homicide rose from 2,223 in 2019 to 3,082 in 2023, a 39 percent increase. Firearms became even more dominant during this period, with the firearm share rising from 73.7 percent to 84.1 percent. Non-Hispanic Black children account for 53 to 55 percent of homicide victims despite being only about 14 percent of the U.S. child population, showing severe racial disparity.
The cross-sectional OLS regression finds that child poverty alone explains 53.3 percent of cross-state variation in child homicide rates. A one-percentage-point increase in state child poverty is associated with 0.457 additional pediatric homicides per 100,000 children, and that relationship remains strong even after adding foster-care entry rate as a control.
The panel Ridge regression performs even better on a held-out 2023 test set, with an MAE of 0.87 per 100,000. SHAP decomposition shows that prior-year homicide rate is the strongest predictor, followed by poverty rate and Black population share. NCANDS victim rate shows a negative association, suggesting that stronger reporting and surveillance may have a protective effect.
The structural-only model removes prior-year homicide rate to isolate more actionable policy variables. In that model, bootstrap inference confirms poverty rate and Black population share as statistically significant risk factors, while NCANDS victim rate remains a statistically significant protective factor. The report interprets this as direct support for advocacy around poverty reduction and stronger mandatory-reporter systems.
For KARA, this section is especially powerful because it moves beyond description into a structural argument about what is driving child homicide and what public systems might reduce it.
Study 4: Maltreatment burden and foster care forecasting
Study 4 combines NCANDS aggregate statistics, AFCARS, and ACS data from 2011 to 2020 to classify states by child maltreatment burden and forecast foster care population through 2035. The classification dataset uses a median split on victim rate per 1,000 children to label 25 states as high-burden and 25 as low-burden.
The final six-feature version of the model includes poverty rate, unemployment rate, region code, fatality rate, foster care rate, and median income. Logistic Regression, Random Forest, and Gradient Boosting are compared using 5-fold stratified cross-validation. Gradient Boosting performs best and reaches a CV-AUC of 1.000 on the 50-state dataset. The report is careful not to overstate this result, noting that the sample is very small and the signal in foster care rate and fatality rate is unusually strong.
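An AUC of 1.000 means that every high-burden state was scored above every low-burden state. The pairwise definition below makes that concrete. The function is a plain-Python sketch rather than the report's evaluation code, and the labels and scores are hypothetical.

```python
def auc(y_true, scores):
    """AUROC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Perfectly separated scores: every positive outranks every negative.
print(auc([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9]))

# One positive scored below one negative.
print(auc([0, 1, 0, 1], [0.1, 0.2, 0.8, 0.9]))
```

With 25 states in each class there are only 625 such pairs, so a single feature that tracks the quantity used to define the label can produce perfect separation. That supports the report's caution about reading too much into the result.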
Feature interpretation shows that foster care rate is the dominant predictor of state burden. Poverty rate and fatality rate increase burden, while median income and employment are associated with lower burden. Region carries little importance once the stronger socioeconomic and system features are included.
The second half of Study 4 focuses on forecasting national foster care population under three policy scenarios: optimistic, baseline, and pessimistic. Using a linear baseline fit to 2011 to 2020 data, the report projects about 479,000 children in foster care by 2035 under the baseline scenario. Under an optimistic prevention-investment scenario, the projection falls to about 393,000, while under a pessimistic funding-reduction scenario it rises to about 565,000. This creates a 172,000-child policy gap by 2035.
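The policy gap is the difference between the pessimistic and optimistic projections, and a linear baseline is an ordinary least-squares line extended forward. The sketch below reproduces the gap from the reported 2035 figures and shows the extrapolation mechanics on a toy series. The toy values and the helper `fit_line` are assumptions, not AFCARS data or the report's code.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Reported 2035 projections by scenario.
scenarios_2035 = {"optimistic": 393_000, "baseline": 479_000, "pessimistic": 565_000}
gap = scenarios_2035["pessimistic"] - scenarios_2035["optimistic"]
print(gap)

# Toy series (hypothetical) to show how a linear trend is extended.
years = [2011, 2012, 2013]
counts = [100, 110, 120]
slope, intercept = fit_line(years, counts)
print(slope * 2015 + intercept)
```

The two alternative scenarios sit 86,000 children above and below the baseline, so the gap is symmetric around the baseline projection.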
That 172,000-child gap is one of the report’s clearest advocacy outputs. It translates child welfare policy choices into a concrete, understandable number that KARA could use in legislative and funding conversations.
Overall significance
Taken together, the four Group 3 studies give KARA a multi-scale evidence framework. Study 1 identifies which states are at greater fatality risk, Study 2 shows where death classification ambiguity may hide maltreatment, Study 3 links child homicide to structural conditions such as poverty and racial disparity, and Study 4 turns system burden into a forward-looking forecast.
A major strength of the report is that it consistently uses public and reproducible data sources, which fits KARA’s advocacy mission. Another strength is that the studies are framed around policy-relevant questions rather than only model performance. The main limitation is that the studies are independent rather than fully integrated, so they function more as complementary evidence streams than as one operational decision system.
Even so, the report gives KARA several concrete advocacy tools: a state fatality-risk ranking framework, evidence of classification ambiguity in infant deaths, strong support for poverty reduction and reporting-system investments as homicide prevention levers, and a forecasted foster care policy gap that translates system choices into long-term consequences.
Short summary (Group 3)
Group 3’s midterm report brings together four analytics studies for KARA focused on child maltreatment fatalities, child death classification, child homicide, and foster care forecasting across the United States. Together, the studies create a public-data-based evidence framework for identifying where risk is highest, how deaths may be misclassified, what structural conditions are driving lethal outcomes, and what future child welfare system burden may look like.
Main findings
- Study 1: State fatality risk classification. Using CDC mortality data from 2010 to 2013, the team builds a state-level classifier for elevated child maltreatment fatality risk. The best model reaches AUC 0.811, and the strongest predictors are under-5 victim share and prior-year fatality rate.
- Study 2: Manner-of-death classification. Using CDC mortality microdata from 2018 to 2024, the team classifies child deaths as Homicide, Accident, or Could Not Determine. Random Forest reaches weighted F1 0.775, but the Could Not Determine class performs poorly with F1 0.36, showing that infant death investigations remain highly ambiguous.
- Study 3: Child homicide drivers. Using CDC WONDER and other public data, the report finds that child poverty explains 53.3 percent of cross-state variation in child homicide rates. Poverty and Black population share are significant structural risk factors, while NCANDS victim rate shows a protective association, supporting stronger reporting systems.
- Study 4: Burden classification and foster care forecasting. Using NCANDS, AFCARS, and ACS data, the team classifies states by maltreatment burden and forecasts foster care population through 2035. The most policy-relevant result is a projected 172,000-child gap between optimistic and pessimistic foster care scenarios by 2035.
Why it matters for KARA
This report gives KARA several practical advocacy tools. It identifies high-risk states, shows that many infant deaths remain difficult to classify cleanly, ties child homicide to structural poverty and racial inequality, and quantifies the long-term foster care consequences of policy investment versus disinvestment.
The report’s biggest strength is that it relies on public, reproducible datasets and turns technical analysis into clear policy arguments. Its main limitation is that the four studies are separate rather than fully integrated, so the work is best seen as a strong midterm evidence portfolio rather than a final decision-support system.
Group 4 Midterm Report Summary
Group 4’s midterm report presents three connected studies on child maltreatment risk, child abuse victimization, and child fatality forecasting in collaboration with Kids at Risk Action (KARA). All three studies rely on NCANDS data and collectively cover national descriptive analysis, state-level feature engineering, county-level prediction, geographic comparison, and time-series forecasting across the 2013 to 2017 period.
The overall goal of the report is to turn fragmented child welfare administrative data into a structured, policy-relevant evidence base that KARA can use for advocacy, early intervention planning, and future predictive modeling. Rather than building one final operational model, the report creates a progression from data pipeline development, to exploratory analytics, to early predictive modeling and short-term forecasting.
Study 1: National child maltreatment risk profiling
Study 1 builds the foundation for the entire report by creating a reproducible analytical pipeline from the NCANDS 2017 Combined Data Tables and linked trend data from 2013 to 2017. The project uses state-level measures on fatalities, unique victims, investigations, maltreatment types, perpetrator relationships, referral screening, and age distribution. These raw values are transformed into four main engineered features: screened-in share, under-age-5 victim share, investigation-to-victim ratio, and fatalities per 1,000 victims.
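The four engineered features are ratios of raw counts. The sketch below shows one plausible way to compute them; the exact numerators and denominators are assumptions, as are the field names and the sample counts, none of which come from the NCANDS tables.

```python
def engineer_features(row):
    """Derive four ratio features from raw state counts (definitions assumed)."""
    return {
        "screened_in_share": row["screened_in"] / row["referrals"],
        "under5_victim_share": row["victims_under5"] / row["victims"],
        "investigation_to_victim_ratio": row["children_investigated"] / row["victims"],
        "fatalities_per_1000_victims": 1000 * row["fatalities"] / row["victims"],
    }

# Hypothetical counts for one state (not NCANDS values).
state = {
    "referrals": 10_000,
    "screened_in": 6_000,
    "victims": 2_000,
    "victims_under5": 900,
    "children_investigated": 8_000,
    "fatalities": 4,
}
features = engineer_features(state)
print(features)
```

Expressing each measure as a share or a rate is what lets small and large states be compared on the same scale.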
At the national level, the analysis finds that child fatalities rose from 1,548 in 2013 to 1,688 in 2017, a 9 percent increase. Over the same period, unique victims rose only 2.7 percent, while children investigated rose 10 percent. The report interprets this divergence as a warning sign that case severity may be worsening even if overall victim counts are not rising dramatically.
The report also shows that neglect dominates the maltreatment landscape, accounting for 64.1 percent of cases, far above physical abuse at 15.6 percent and sexual abuse at 7.4 percent. That finding matters because neglect often reflects structural and socioeconomic strain rather than only direct assault, which means prevention policy must include family support, not just punitive responses.
Referral screening is another major theme. Nationally, 62.4 percent of referrals are screened in, but the state range is extremely wide, from below 20 percent to nearly 100 percent. Alabama has one of the highest screened-in shares at 98.3 percent, while South Dakota is near the bottom at 15.6 percent. This variation suggests major differences in state policy thresholds, institutional capacity, and decision practices.
Age-based vulnerability is also highly concentrated in early childhood. Nationally, 72 percent of victims are under age 5, and states such as Arizona, Tennessee, Arkansas, and Idaho have especially high under-5 victim shares. The report consistently frames early childhood as the most vulnerable life stage for maltreatment exposure.
At the state level, the fatality-rate rankings identify Mississippi, Arkansas, Indiana, West Virginia, and Georgia among the highest-risk jurisdictions when normalized per 100,000 children. This is a key shift from raw fatality counts, where large states dominate due to population size.
The study then moves into advanced exploratory analytics. It creates an integrated 2017 analytic file for 51 jurisdictions with eight variables, checks missingness and structure, and produces a composite Child Welfare Priority Score based on fatality severity, early-childhood vulnerability, and system-response intensity. Using this score and related flagging systems, states such as Pennsylvania, Idaho, Tennessee, Washington, Georgia, and Missouri emerge as follow-up priorities because they show multiple overlapping concern signals.
The report also examines correlation and outlier structure. Investigation-to-victim ratio and fatalities per 1,000 victims show a moderate positive relationship, while most other engineered features capture more distinct dimensions of risk. Outlier analysis highlights states such as Georgia, Missouri, and Pennsylvania on severity-related measures, and Puerto Rico on under-5 victim share.
For KARA, Study 1’s value is that it converts raw national child welfare tables into a structured, interpretable, modeling-ready framework for identifying severity, vulnerability, and system-response patterns across states.
Study 2: County-level child abuse victimization and prediction
Study 2 shifts from state-level NCANDS aggregation to county-level child abuse victimization, focusing on exploratory analysis and predictive modeling. The report identifies clear temporal trends, age concentration, and geographic clustering in the county data, with particular emphasis on Iowa counties.
One of the most important descriptive findings is that children in the 3-to-5 age group and younger carry the heaviest victim burden. This reinforces the broader report theme that the youngest children are consistently the most vulnerable.
Geographic concentration is also strong. Polk, Linn, and Scott counties lead in Iowa victim counts, indicating that burden is not evenly spread across the state. This makes local prioritization possible and highlights the need to think beyond statewide averages when planning intervention or resource allocation.
The predictive component compares Linear Regression and Random Forest for county-level victim count prediction. Random Forest performs better, with lower prediction error and greater stability, suggesting that ensemble models capture the nonlinear structure of the county data more effectively than a linear baseline.
The report does not treat prediction as a substitute for interpretation. Instead, it uses model results to show that demographic and geographic variables contain meaningful signal and could support a more advanced future risk model, especially if socioeconomic variables are added later.
For KARA, Study 2 contributes a local, operational perspective. It shows that county-level burden can be mapped, compared, and predicted, and that predictive tools are most useful when paired with geographic context and demographic segmentation.
Study 3: Time-series forecasting and geographic analysis of child fatalities
Study 3 focuses specifically on child fatalities using NCANDS state-level fatality data from 2013 to 2017. This section combines descriptive statistics, geographic analysis, hierarchical clustering, policy-risk categorization, and forecasting.
The first major finding is structural persistence. Fatality patterns are highly stable over time, with uniformly high inter-year correlations. This means state rankings barely shift across the five-year period, suggesting deeply entrenched structural differences rather than short-term fluctuations.
The distribution of annual state fatality counts is strongly non-normal, with heavy right skew caused mainly by very large states such as Texas, California, and Florida. Shapiro-Wilk tests reject normality for all five annual distributions, which the report uses to justify nonparametric approaches in future phases.
Regional analysis shows that the South accounts for 51 percent of all national child fatalities across the study period. This is one of the report’s clearest structural findings, and it points to the South as the region where policy intervention may be most urgent.
The study also classifies states into four policy-risk quadrants using average burden and trend direction. Texas, Colorado, and Arkansas are labeled critical because they combine high burden with worsening trend. California and Florida are treated as stable-high, while states such as Vermont, Wyoming, and Maine are framed as well-managed benchmark cases.
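A quadrant scheme like this can be sketched from two ingredients: a state's average annual count and the sign of its trend. Everything in the snippet is an assumption made for illustration, including the burden cutoff, the use of a least-squares slope for trend direction, the sample series, and the "low-rising" label for the fourth quadrant, which the summary does not name.

```python
def trend_slope(counts):
    """Least-squares slope of counts against year index 0, 1, 2, ..."""
    n = len(counts)
    mean_x = (n - 1) / 2
    mean_y = sum(counts) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(counts))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    return sxy / sxx

def risk_quadrant(counts, burden_cutoff):
    """Place a state in one of four quadrants by average burden and trend sign."""
    high = sum(counts) / len(counts) >= burden_cutoff
    rising = trend_slope(counts) > 0
    if high and rising:
        return "critical"
    if high:
        return "stable-high"
    if rising:
        return "low-rising"  # hypothetical label, not taken from the report
    return "well-managed"

# Made-up five-year fatality series.
print(risk_quadrant([100, 110, 120, 130, 140], burden_cutoff=50))
print(risk_quadrant([130, 128, 126, 125, 124], burden_cutoff=50))
print(risk_quadrant([2, 1, 2, 1, 1], burden_cutoff=50))
```

A raw slope sign is a blunt rule with only five annual points, which is why a quadrant label is better read as a screening flag than as a confirmed trend.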
Bootstrap confidence intervals show that only Texas has a statistically confirmed worsening trend over the period. That matters because many apparent trend differences across states may simply reflect short-series noise rather than meaningful change.
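The summary does not say which bootstrap scheme the report used. As a rough illustration, the sketch below resamples (year, count) pairs with replacement and takes a percentile interval for the slope; the scheme, the function names, and both sample series are assumptions.

```python
import random

def ols_slope(xs, ys):
    """Least-squares slope of ys on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

def bootstrap_slope_ci(ys, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the trend slope of a short series."""
    rng = random.Random(seed)
    n = len(ys)
    slopes = []
    while len(slopes) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        if len(set(idx)) < 2:
            continue  # slope is undefined when every resampled year is the same
        slopes.append(ols_slope(idx, [ys[i] for i in idx]))
    slopes.sort()
    lower = slopes[int((alpha / 2) * n_boot)]
    upper = slopes[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Made-up series: one with a clean upward trend, one that is flat but noisy.
steady_lo, steady_hi = bootstrap_slope_ci([100, 110, 120, 130, 140])
noisy_lo, noisy_hi = bootstrap_slope_ci([50, 55, 48, 53, 50])
print(steady_lo, steady_hi)
print(noisy_lo, noisy_hi)
```

A trend counts as confirmed in this sense only when the whole interval lies above zero. With five data points, most noisy series will produce an interval that straddles zero.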
For forecasting, the report uses Holt’s damped exponential smoothing for both state and national fatality series. The damped form is selected because it avoids unrealistic extrapolation from such short time series. The method produces a backtest mean absolute error of about 8 to 12 fatalities per state, which the report describes as competitive with more complex machine learning approaches.
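For readers unfamiliar with the method, here is a minimal pure-Python sketch of damped-trend exponential smoothing. The smoothing parameters, the initialization, and the sample series are assumptions chosen for illustration; the report's fitted values are not given in this summary.

```python
def holt_damped_forecast(y, alpha=0.8, beta=0.2, phi=0.9, horizon=3):
    """Damped additive-trend exponential smoothing with no seasonality."""
    level, trend = y[0], y[1] - y[0]  # simple initialization (an assumption)
    for obs in y[1:]:
        prev_level = level
        level = alpha * obs + (1 - alpha) * (prev_level + phi * trend)
        trend = beta * (level - prev_level) + (1 - beta) * phi * trend
    forecasts, damp = [], 0.0
    for h in range(1, horizon + 1):
        damp += phi ** h  # phi + phi^2 + ... + phi^h
        forecasts.append(level + damp * trend)
    return forecasts

# Made-up annual counts for one series.
forecasts = holt_damped_forecast([100, 108, 115, 121, 130])
flat = holt_damped_forecast([10, 10, 10, 10, 10])
print(forecasts)
```

Because each additional step ahead adds only `phi` times the previous increment, the forecast flattens out instead of extending the recent trend indefinitely. That is the property that makes the damped form a cautious choice for five-point series.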
At the national level, the forecast projects 1,710 to 1,730 child fatalities in 2018 and 1,720 to 1,750 in 2019. At the state level, Texas is forecast at about 180 fatalities in 2018 and California at about 145. The report stresses that these are short-term planning forecasts, not long-range causal projections.
For KARA, Study 3 provides a practical surveillance framework. It identifies where fatality burden is structurally concentrated, where worsening appears statistically credible, and what near-term fatality counts may look like under persistent conditions.
Overall significance
Taken together, Group 4’s three studies form a coherent analytics progression. Study 1 builds the data pipeline and state-level risk framework, Study 2 adds local predictive modeling at the county level, and Study 3 extends the work into geographic interpretation and time-series fatality forecasting.
A major strength of the report is that it consistently turns raw NCANDS tables into interpretable indicators that KARA could use in advocacy and future dashboard development. The studies also balance descriptive policy insight with predictive ambition, rather than presenting model performance in isolation.
The most important cross-cutting conclusions are that fatalities are rising faster than victim counts, neglect remains the dominant form of maltreatment, children under age 5 are consistently the most vulnerable group, screening behavior varies dramatically across states, and structural fatality burden is highly persistent over time. These findings support a prevention strategy centered on early childhood intervention, standardized screening practices, stronger state monitoring, and deeper analysis of high-risk states and counties.
For KARA, the report provides a strong midterm evidence base for identifying priority geographies, framing child welfare severity in a risk-adjusted way, and preparing future predictive models and dashboards that can support policy conversations with clearer and more defensible data.
GROUP 5 MIDTERM REPORT
Group 5 Midterm Report Summary
Group 5’s midterm report presents a three-study analytics initiative for Kids at Risk Action (KARA), combining case-level machine learning, state-level socioeconomic risk modeling, and long-term national trend analysis. The three studies are independent in method and dataset, but they converge on a consistent message: child maltreatment risk in the United States is strongly shaped by poverty, structural conditions, and state policy choices rather than by short-term economic change alone.
The project uses federal and state child welfare data spanning fiscal years 2010 to 2023. Its purpose is not only to describe the problem, but also to give KARA practical tools for advocacy, prioritization, and future predictive decision support.
Study 1: Predicting maltreatment substantiation
Study 1 uses the NCANDS FY 2022 Child File to build a machine learning model that predicts whether a child maltreatment report will be substantiated. The starting dataset contains 3,216,847 CPS report records, and after excluding ambiguous dispositions, non-reporting states, and administrative or test records, the final analytic sample includes 3,089,412 completed reports.
The study focuses on 15 selected variables drawn from 112 original NCANDS fields, covering child demographics, report characteristics, maltreatment allegations, family risk factors, and administrative identifiers. Two outcomes are defined: a primary substantiation flag and a secondary within-year multiple-report flag used as a proxy for poly-victimization.
The preprocessing pipeline is extensive and clearly structured. It includes age binning into four developmental stages, a count of maltreatment types, consolidation of report sources into four categories, explicit handling of missing data, stratified train-test splitting, feature scaling, and SMOTE to address class imbalance in the training set. The final model is a baseline XGBoost classifier evaluated on a 20 percent holdout set of 617,882 records.
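Two of these preprocessing steps, age binning and counting maltreatment types, are easy to illustrate. The age cut points, stage names, and field names below are assumptions, since the summary does not list them. SMOTE and XGBoost are left out because they depend on third-party packages.

```python
# Hypothetical allegation field names, not actual NCANDS field codes.
ALLEGATION_FIELDS = ("neglect", "physical_abuse", "sexual_abuse", "psychological")

def age_stage(age_years):
    """Bin age into four developmental stages (cut points are assumed)."""
    if age_years <= 2:
        return "infant_toddler"
    if age_years <= 5:
        return "preschool"
    if age_years <= 12:
        return "school_age"
    return "adolescent"

def preprocess(record):
    """Derive the age stage and the number of alleged maltreatment types."""
    return {
        "age_stage": age_stage(record["age"]),
        "n_maltreatment_types": sum(1 for f in ALLEGATION_FIELDS if record.get(f)),
    }

example = preprocess({"age": 4, "neglect": True, "physical_abuse": True})
print(example)
```

Oversampling methods such as SMOTE are applied to the training split only, as the report describes, so that synthetic records never reach the holdout set.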
The model performs strongly for an administrative child welfare dataset. It reaches 76 percent accuracy, precision 0.71, recall 0.68, F1-score 0.69, and AUC-ROC 0.82. The report interprets this as strong enough for triage-support use, especially in high-volume CPS environments where better prioritization could redirect investigative resources toward the highest-risk cases.
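The reported F1-score is consistent with the reported precision and recall, as a quick check using the standard harmonic-mean formula shows.

```python
precision, recall = 0.71, 0.68

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 2))  # 0.69
```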
Explainability is a major strength of the study. SHAP analysis shows that prior victim history is the single strongest predictor of substantiation, followed by the number of maltreatment types alleged. Anonymous reports have a negative effect on substantiation likelihood compared with professional report sources. These findings are translated into concrete policy logic: cases involving prior CPS contact, multi-type allegations, and professional reporting sources should receive higher triage priority.
The study also evaluates fairness across racial groups using false negative rates, which is important because a false negative in this context means a true maltreatment case is missed. False negative rates range from 0.30 to 0.35 across White, Black, Hispanic, Other, and Unknown categories, indicating relatively consistent performance across groups, although the Unknown category performs slightly worse and may reflect data-quality issues.
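A per-group false negative rate is the share of truly substantiated cases in each group that the model predicted as unsubstantiated. The sketch below shows the calculation on a handful of invented records; the group labels and values are placeholders, not report data.

```python
from collections import defaultdict

def false_negative_rates(y_true, y_pred, groups):
    """False negative rate per group: missed positives / actual positives."""
    missed = defaultdict(int)
    positives = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            positives[group] += 1
            if pred == 0:
                missed[group] += 1
    return {g: missed[g] / positives[g] for g in positives}

# Invented labels and predictions for two placeholder groups.
y_true = [1, 1, 1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 1, 0, 0, 0, 1, 1]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
fnr = false_negative_rates(y_true, y_pred, groups)
print(fnr)
```

Comparing these rates across groups, rather than overall accuracy, is what reveals whether missed cases are concentrated in one population.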
The study’s main limitation is that the poly-victimization model is still in progress and the current proxy only captures multiple reports within the same fiscal year. Even so, Study 1 shows that NCANDS administrative data already contains substantial predictive value and could support a defensible ML-based triage framework for child welfare systems.
Study 2: Socioeconomic risk index framework
Study 2 shifts from case-level prediction to state-level structural vulnerability. It integrates ACS, BLS, NCANDS, AFCARS, and Child Welfare Outcomes data into a composite socioeconomic Risk Index designed to rank U.S. states by child welfare vulnerability.
The central idea is that child welfare burden is strongly linked to poverty, unemployment, and low income, but these drivers are often scattered across incompatible public datasets. To address that, the study constructs a composite Risk Index based on standardized poverty, unemployment, and income measures.
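One common way to build such an index is to convert each indicator to z-scores, flip the sign of income so that higher always means more risk, and average the results. The sketch below follows that recipe; the equal weighting, the function names, and the three sample states are assumptions, not the report's specification.

```python
from statistics import mean, pstdev

def zscores(values):
    """Standardize values to mean 0 and standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def risk_index(poverty, unemployment, income):
    """Equal-weight composite; income enters negatively (assumed design)."""
    zp, zu, zi = zscores(poverty), zscores(unemployment), zscores(income)
    return [(p + u - i) / 3 for p, u, i in zip(zp, zu, zi)]

# Made-up values for three states: poverty %, unemployment %, income in $1,000s.
index = risk_index([10, 15, 20], [3, 4, 5], [70, 60, 50])
print(index)
```

Standardizing first keeps an indicator measured in dollars from swamping one measured in percentage points.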
The analysis finds that poverty and low income are the dominant predictors of state-level child welfare vulnerability. Deep South states consistently appear as the highest-risk cluster in the resulting index framework. The report emphasizes that geography is not destiny, however, because states with similar demographics can produce very different outcomes depending on policy design and prevention infrastructure.
For KARA, this study’s practical value is straightforward. It provides a state-level ranking tool that can help prioritize advocacy targets and frame legislative conversations around structural vulnerability rather than isolated child welfare statistics.
The study also points toward future forecasting and lag-based predictive modeling, though those parts are still under development. At the midterm stage, its main contribution is the integrated risk framework and the empirical confirmation that poverty-related measures are the strongest state-level warning signals.
Study 3: Longitudinal national child welfare trends
Study 3 analyzes 14 years of national child welfare outcomes from FY 2010 to 2023 using four federal systems. It applies descriptive analysis, OLS trend modeling, and Pearson correlation analysis to produce five statistically significant findings, each tied directly to KARA’s policy agenda.
The first finding is that the national substantiation rate declined from 20.24 percent in 2010 to 15.40 percent in 2023, a drop of 4.84 percentage points, or 23.9 percent in relative terms. The report describes the slope as statistically significant. The report argues that this decline should not be interpreted as real improvement in child safety, but as a policy artifact linked to changes such as alternative response expansion.
The second finding is more alarming. Child fatalities increased over the same period, with total fatalities rising from 1,520 in 2010 to 1,980 in 2023, an increase of 460 deaths or 30.3 percent. This trend is also reported as highly significant. The report highlights a crucial contrast: unemployment fell 61 percent during the same period, so general economic recovery does not explain the rise in fatalities.
The study therefore argues that structural factors are driving the fatality increase. It specifically points to substance abuse epidemics, housing instability, and prevention-system gaps as the most plausible structural explanations.
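The headline figures in the first two findings mix percentage points with relative percent change, so it helps to verify both. The check below uses only the numbers quoted above.

```python
# Substantiation rate, in percent of reports.
rate_2010, rate_2023 = 20.24, 15.40
point_drop = rate_2010 - rate_2023            # change in percentage points
relative_drop = 100 * point_drop / rate_2010  # change relative to the 2010 level
print(round(point_drop, 2), round(relative_drop, 1))  # 4.84 23.9

# Child fatalities, in deaths per year.
deaths_2010, deaths_2023 = 1520, 1980
increase = deaths_2023 - deaths_2010
relative_increase = 100 * increase / deaths_2010
print(increase, round(relative_increase, 1))  # 460 30.3
```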
The third finding is that poverty is the strongest socioeconomic predictor of child welfare risk. The reported correlation corresponds to an R² of about 0.516, meaning poverty alone explains about 51.6 percent of the variance in the relevant child welfare outcome. This becomes the highest-impact finding in the report and the clearest cross-study point of convergence.
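For a single-predictor linear relationship, the share of variance explained equals the square of the Pearson correlation. The size of the correlation can therefore be recovered from the 51.6 percent figure. The sign is not recoverable this way and is assumed to be positive here.

```python
import math

r_squared = 0.516                    # 51.6 percent of variance explained
r_magnitude = math.sqrt(r_squared)   # absolute value of the correlation
print(round(r_magnitude, 3))  # 0.718
```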
The fourth finding is large geographic variation across states. The report describes a 270 percent spread in child welfare rates across jurisdictions, reinforcing the idea that policy choices, prevention systems, and reporting practices matter greatly.
The fifth finding is that Minnesota performs as a sustained best-practice model. Over 14 years, Minnesota’s average rate is 3.67 per 1,000 children, about 61.5 percent below the national average, with a very tight year-to-year range of 3.45 to 3.84. The report interprets this stability as evidence of durable prevention infrastructure rather than statistical noise, and it recommends “Minnesota Model” replication as a national policy strategy.
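The two Minnesota figures together imply a national average rate, which is useful as a sanity check. The value computed below is derived from those two numbers and is not quoted from the report.

```python
minnesota_rate = 3.67   # per 1,000 children, as reported
share_below = 0.615     # Minnesota is about 61.5 percent below the national average

# If Minnesota = national * (1 - share_below), solve for the national average.
implied_national = minnesota_rate / (1 - share_below)
print(round(implied_national, 2))  # 9.53
```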
For KARA, Study 3 provides a particularly strong legislative evidence package. It gives the organization statistically grounded trend findings, a warning against misreading falling substantiation as progress, evidence that poverty is the most powerful long-run driver, and a concrete example of a state that has outperformed national norms consistently over time.
Cross-study conclusions
One of the strongest parts of the Group 5 report is the integration across studies. Although the three studies use different units of analysis and different methods, they all point to the same broad conclusion: poverty is the dominant, consistent driver of child maltreatment risk.
Study 1 shows this at the case level through feature importance tied to prior victimization and report severity, Study 2 confirms it through the state-level socioeconomic Risk Index, and Study 3 demonstrates it in national trend and correlation analysis. Together, they argue that child welfare outcomes are structurally driven and geographically concentrated, not simply random or cyclical.
The report also makes a strong methodological claim that administrative child welfare data is far more useful than many agencies currently treat it. Study 1 achieves AUC-ROC 0.82 using NCANDS alone, and Study 3 shows that relatively simple regression methods can produce statistically robust policy findings from long-run administrative trends.
Another key cross-study insight is that economic improvement alone is insufficient. Fatality rates rose even while unemployment fell sharply, meaning that general labor-market recovery does not automatically reduce severe child welfare outcomes. The report therefore argues for structural interventions beyond economic growth, especially in housing support, substance abuse treatment, and prevention infrastructure.
Why it matters for KARA
The report gives KARA a strong midterm evidence base for advocacy and planning. It offers a case-level triage model with clear explainability, a state vulnerability ranking framework, and 14 years of statistically significant national trend findings that can support legislative testimony and public messaging.
Its most policy-relevant recommendations are also clearly stated. The report advocates for poverty-responsive prevention through TANF, EITC, housing assistance, and childcare subsidies; structural intervention through substance abuse treatment and housing stability programs; replication of Minnesota’s prevention model; correction of misleading substantiation narratives; and eventual deployment of ML triage tools in high-caseload systems.
The biggest limitation is that the three studies are not yet fully integrated and several pieces remain in progress, including poly-victimization modeling, county-level disaggregation, and additional forecasting work. Even so, the report already functions as a coherent and persuasive evidence portfolio showing that child welfare risk is measurable, predictable, and strongly shaped by policy-relevant structural conditions.
Short summary (Group 5)
Group 5’s midterm report combines three studies for KARA: a machine learning model for maltreatment substantiation, a state-level socioeconomic risk index, and a 14-year national trend analysis of child welfare outcomes. Together, the studies show that child welfare risk is strongly tied to poverty, structural conditions, and state policy choices, and that administrative data can be used effectively for both prediction and advocacy.
Main findings
- Study 1: ML-based substantiation prediction. Using 3,089,412 NCANDS FY 2022 records, the team built an XGBoost model that predicts substantiation with AUC-ROC 0.82, accuracy 0.76, recall 0.68, and F1-score 0.69. SHAP analysis shows that prior victim history and number of maltreatment types are the strongest predictors, while fairness assessment finds relatively similar false negative rates across racial groups.
- Study 2: Socioeconomic risk index. By integrating ACS, BLS, NCANDS, AFCARS, and Child Welfare Outcomes data, the study creates a composite state-level Risk Index based on poverty, unemployment, and income. Poverty and low income emerge as the strongest predictors of child welfare vulnerability, with Deep South states showing the highest risk profiles.
- Study 3: Longitudinal trend analysis. Over FY 2010 to 2023, the report finds a 4.84-point decline in substantiation rate, a 30.3 percent increase in child fatalities, poverty as the strongest predictor (explaining about 51.6 percent of variance), major geographic variation across states, and Minnesota as a sustained best-practice model performing about 61.5 percent below the national average.
Why it matters for KARA
This report gives KARA a strong multi-level advocacy toolkit. It supports ML-based case triage, state-level vulnerability ranking, and long-run legislative arguments showing that falling substantiation rates do not necessarily mean child welfare is improving.
The report’s clearest overall message is that poverty is the most consistent driver of child maltreatment risk across all three studies. It also argues that structural interventions such as housing support, substance abuse treatment, childcare subsidies, and prevention-system investment are more important than relying on general economic improvement alone.







