Research

Enabling disaggregation of Asian American subgroups: a dataset of Wikidata names for disparity estimation

Authors: Lin Q, Ouyang D, Guage C, Gallegos IO, Goldin J, Ho DE

Journal: Scientific Data, 2025

Abstract

Decades of research and advocacy have underscored the imperative of surfacing – as the first step towards mitigating – racial disparities, including among subgroups historically bundled into aggregated categories. Recent U.S. federal regulations have required increasingly disaggregated race reporting, but major implementation barriers mean that, in practice, reported race data remains inadequate. While imputation methods have enabled disparity assessments in many research and policy settings lacking reported race, the leading name algorithms cannot recover disaggregated categories, given the same lack of disaggregated data from administrative sources to inform algorithm design. Leveraging a Wikidata sample of over 300,000 individuals from six Asian countries, we extract frequencies of 25,876 first names and 18,703 surnames which can be used as proxies for U.S. name-race distributions among six major Asian subgroups: Asian Indian, Chinese, Filipino, Japanese, Korean, and Vietnamese. We show that these data, when combined with public geography-race distributions to predict subgroup membership, outperform existing deterministic name lists in key prediction settings, and enable critical Asian disparity assessments.

Do mandatory minimum penalties and penalty relief work? Evidence from California's clean water program

Authors: Treves RJ, Lin Q, Hilderbran M, Ouyang D, Rodolfa KT, Mustain E, Ho DE

Journal: PLOS Water, 2025

DOI: 10.1371/journal.pwat.0000326


Abstract

Promoting regulatory compliance in the face of limited resources poses a distinct challenge to regulators, who can find within rational choice theory a diverse toolkit of policy levers – ones that change the likelihood that noncompliance is sanctioned, the size of sanctions, or the cost of compliance – but must look beyond theory to understand how such levers actually work in practice. In 1999, California introduced changes to its clean water program that modified each of these components, and in the present work we explore the impact of these changes using a mixed-methods approach. While the state's introduction of $3,000 mandatory minimum penalties for certain Clean Water Act effluent and reporting violations by permitted wastewater facilities reflected a significant step-up in enforcement, the policy also allowed small communities with financial hardship to redirect penalties toward investments in compliance. Our results suggest that the increase in sanctions was associated with decreases in violations with relatively low compliance costs (such as reporting violations), but that there may be considerable mismatch between the scale of penalties and compliance costs for keeping many types of pollutants within regulatory limits, and an underappreciation of critical factors like political pressure that are uncaptured by classical theory. We also find suggestive evidence that penalty conversions reduced pollution limit violations, and highlight tensions between their eligibility criteria and environmental justice. Our case study highlights how policy design and implementation fidelity — how closely a policy is carried out as originally intended — shape regulatory effectiveness and equity, with lessons for regulators and researchers across policy domains.

Qiwei Lin
