U.S. Bureau of the Census

04/24/2025 | Press release | Archived content

A Machine Learning Approach for Counting Language Minority Groups in the United States

The U.S. Voting Rights Act (VRA) prohibits discrimination at the polls based on language minority status. The VRA requires the U.S. Census Bureau to use data on the voting-age population, including the number of citizens, limited English proficient individuals, and those with limited education, to identify those language minorities. In the 2021 cycle of determining which jurisdictions (states, counties, cities) must provide voting materials in languages in addition to English, Census Bureau statisticians developed both frequentist and Bayesian models to estimate the population sizes of language minority groups. In this paper, we present a new machine learning model that outperformed the 2021 statistical models for some language minority groups. The model was developed in the random forest (RF) framework and uses the beta-binomial posterior as the objective function for constructing RF trees. This choice is in the spirit of soft computing: the new RF method relaxes the objective function typically used for RF in order to accommodate the unique VRA data structure.
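To make the idea of a beta-binomial objective concrete, the following is a minimal sketch, not the Census Bureau's implementation. It assumes each area contributes a count pair (k, n), with k group members observed among n sampled persons, that all areas in a tree node share one proportion with a Beta(a, b) prior, and that a split is scored by the sum of the child nodes' beta-binomial log marginal likelihoods. The names `node_score`, `best_split`, and `leaf_estimate`, the prior values, and the toy data are all hypothetical.

```python
import math


def log_beta(a, b):
    """Natural log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def node_score(counts, a=1.0, b=1.0):
    """Beta-binomial log marginal likelihood of a node (assumed objective).

    counts is a list of (k, n) pairs. All areas in the node are assumed to
    share one proportion p ~ Beta(a, b). Binomial coefficients are omitted
    because they are identical for every candidate split.
    """
    k = sum(ki for ki, _ in counts)
    n = sum(ni for _, ni in counts)
    return log_beta(a + k, b + n - k) - log_beta(a, b)


def best_split(x, counts, a=1.0, b=1.0):
    """Return (threshold, score) for the best split on one feature.

    The threshold is None when no split beats leaving the node whole.
    """
    order = sorted(range(len(x)), key=lambda i: x[i])
    best = (None, node_score(counts, a, b))  # no-split baseline
    for pos in range(1, len(order)):
        lo, hi = x[order[pos - 1]], x[order[pos]]
        if lo == hi:
            continue  # cannot separate tied feature values
        left = [counts[i] for i in order[:pos]]
        right = [counts[i] for i in order[pos:]]
        score = node_score(left, a, b) + node_score(right, a, b)
        if score > best[1]:
            best = ((lo + hi) / 2.0, score)
    return best


def leaf_estimate(counts, a=1.0, b=1.0):
    """Posterior mean of the shared proportion in a leaf."""
    k = sum(ki for ki, _ in counts)
    n = sum(ni for _, ni in counts)
    return (a + k) / (a + b + n)


# Toy data (made up): one covariate and (k, n) counts for four areas.
x = [0.1, 0.2, 0.8, 0.9]
counts = [(1, 50), (2, 50), (40, 50), (45, 50)]
threshold, score = best_split(x, counts)
```

In this sketch the likelihood-based score takes the place of the variance or impurity reduction usually used to grow RF trees, and each leaf reports a posterior mean rather than a raw sample proportion. A full random forest would additionally bootstrap the areas and subsample covariates at each node.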

U.S. Bureau of the Census published this content on April 24, 2025, and is solely responsible for the information contained herein. Distributed via Public Technologies (PUBT), unedited and unaltered, on May 08, 2025 at 13:15 UTC.