NCSES - National Center for Science and Engineering Statistics

01/13/2025 | Press release

National Survey of College Graduates (NSCG) 2023

Data collection. The data collection period lasted approximately 6 months (25 May 2023 to 20 November 2023). The NSCG used a trimodal data collection approach: a self-administered online survey (Web), a self-administered paper questionnaire (via mail), and computer-assisted telephone interview (CATI). Individuals in the sample generally started in the Web mode, depending on their available contact information and past mode preference. After an initial survey invitation, the data collection protocol included sequential contacts by postal mail, e-mail, and telephone that continued throughout the data collection period. At any time during data collection, sample members could choose to complete the survey in any of the three modes. Nonrespondents to the initial survey invitation received follow-up contacts via alternate modes.

Quality assurance procedures were in place at each data collection step (e.g., address updating, printing, package assembly and mailing, questionnaire receipt, data entry, CATI, coding, and post-data collection processing).

Mode. About 91% of the participants completed the survey by Web, 7% by mail, and 2% by CATI.

Response rates. Response rates were calculated on complete responses, that is, from instruments with responses to all critical items. Critical items are those containing information needed to report labor force participation (including employment status, job title, and job description), college education (including degree type, degree date, and field of study), and location of residence as of the reference date. The overall unweighted response rate was 61%, and the weighted response rate was also 61%. Of the roughly 161,000 persons in the 2023 NSCG sample, 94,606 completed the survey.
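As an illustration of how unweighted and weighted response rates of this kind can be computed from a respondent-level file, the following sketch (in Python) applies both calculations to a small hypothetical case list; the field names and values are assumptions for illustration, not NSCG production data.

    # Illustrative sketch of unweighted and weighted response rate calculations.
    # The case records and field names are hypothetical, not NSCG production data.

    cases = [
        # each dict: base sampling weight and whether all critical items were answered
        {"base_weight": 410.2, "complete": True},
        {"base_weight": 395.7, "complete": False},
        {"base_weight": 512.9, "complete": True},
        {"base_weight": 288.4, "complete": True},
    ]

    # Unweighted rate: share of sampled cases with a complete response.
    unweighted_rate = sum(c["complete"] for c in cases) / len(cases)

    # Weighted rate: completes' share of the total base weight, so the rate
    # reflects the population each case represents rather than the raw count.
    total_weight = sum(c["base_weight"] for c in cases)
    complete_weight = sum(c["base_weight"] for c in cases if c["complete"])
    weighted_rate = complete_weight / total_weight

    print(f"Unweighted response rate: {unweighted_rate:.1%}")
    print(f"Weighted response rate:   {weighted_rate:.1%}")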

Data editing. Initial editing rules, specific to each mode of data capture, were applied to response data to check internal consistency and valid response ranges. The Web survey captured most of the survey responses and included internal editing controls where appropriate. A computer-assisted data entry (CADE) system was used to process the mailed paper forms. Responses from the three modes were merged for the subsequent coding, editing, and cleaning necessary to create an analytical database.
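The range and consistency checks described above can be thought of as rules applied record by record. The sketch below is a minimal, hypothetical illustration; the item names, valid ranges, and consistency rule are assumptions, not the actual NSCG edit specifications.

    # Minimal sketch of range and internal-consistency edit checks applied to one
    # response record. Item names and rules are hypothetical, not NSCG edit specs.

    def edit_checks(record):
        """Return a list of edit flags raised for a single response record."""
        flags = []

        # Range check: birth year must fall within a plausible window.
        if not (1920 <= record.get("birth_year", 0) <= 2007):
            flags.append("birth_year out of valid range")

        # Consistency check: a reported salary implies the respondent is employed.
        if record.get("salary", 0) > 0 and record.get("employed") == "no":
            flags.append("salary reported but employment status is 'no'")

        return flags

    record = {"birth_year": 1915, "employed": "no", "salary": 72000}
    print(edit_checks(record))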

Following established NCSES guidelines for coding NSCG survey data, including verbatim responses, staff were trained to conduct standardized review and coding of occupation and education information, certifications, "other/specify" verbatim responses, state and country geographic information, and postsecondary institution information. For standardized coding of occupation (including auto-coding), specially trained coders reviewed the respondent's reported job title, duties and responsibilities, and other work-related information from the questionnaire and corrected respondents' self-reporting errors to obtain the best occupation codes. For standardized coding of the field of study associated with any reported degree (including auto-coding), specially trained coders reviewed the respondent's reported department, degree level, and field of study information from the questionnaire and corrected respondents' self-reporting errors to obtain the best field of study codes.

Imputation. Logical imputation was primarily accomplished as part of editing. In the editing phase, the answer to a question with missing data was sometimes determined by the answer to another question. In some circumstances, editing checks found inconsistent data, which were removed and then subject to statistical imputation.
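A logical imputation of this kind fills a missing item only when another reported answer determines its value. The rule below is a hypothetical example of such a derivation, not an actual NSCG edit rule.

    # Hypothetical example of logical imputation: when a respondent skipped the
    # employment-status item but reported a job title, the status can be derived.

    def logically_impute_employment(record):
        if record.get("employment_status") is None and record.get("job_title"):
            record["employment_status"] = "employed"
        return record

    print(logically_impute_employment({"employment_status": None, "job_title": "Chemist"}))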

The item nonresponse rates reflect data missing after logical imputation or editing but before statistical imputation. The rates presented in this section are unweighted item nonresponse rates. For key employment items, such as employment status, sector of employment, and primary work activity, the item nonresponse rates ranged from 0.0% to 1.3%. Nonresponse to questions deemed sensitive was higher: nonresponse to salary and earned income was 5.1% and 7.6%, respectively, for new sample members and 4.5% and 7.0%, respectively, for returning sample members. Item nonresponse rates for the personal demographic data of new sample members varied: 0.8% for sex at birth, 0.04% for birth year, 0.6% for marital status, 0.4% for citizenship, 1.6% for ethnicity, and 3.7% for race. The nonresponse rates for returning sample members were 0.7% for marital status and 0.7% for citizenship.
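An unweighted item nonresponse rate of this kind follows from counting missing values for an item among responding cases; the short sketch below shows that calculation on hypothetical data.

    # Sketch of an unweighted item nonresponse rate: the share of respondents
    # with a missing value for one item. The data here are hypothetical.

    respondents = [
        {"salary": 84000}, {"salary": None}, {"salary": 61000},
        {"salary": 112000}, {"salary": None}, {"salary": 73000},
    ]

    missing = sum(1 for r in respondents if r["salary"] is None)
    item_nonresponse_rate = missing / len(respondents)
    print(f"Salary item nonresponse rate: {item_nonresponse_rate:.1%}")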

Item nonresponse was typically addressed using statistical imputation methods. Most NSCG variables were subjected to hot deck imputation, with each variable's class and sort variables chosen through regression modeling to identify nearest-neighbor donors for the imputed information. For some variables, such as day of birth, no set of class and sort variables was reliably related to the missing value or suitable for predicting it. In these instances, random imputation was used, so that the distribution of imputed values was similar to the distribution of reported values without relying on class or sort variables.
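In general terms, a sequential hot deck groups records into classes, orders them by the sort variables so that neighboring records are similar, and carries a donor's reported value to the nearest record with a missing value. The sketch below illustrates that generic logic only; it is not the NSCG production imputation system, and the class variable, sort variable, and data are assumptions.

    # Simplified sequential hot deck sketch: impute missing salary within classes
    # defined by degree level, after sorting by age. Generic illustration only,
    # not the NSCG production imputation system.

    def hot_deck_impute(records, class_var, sort_var, target):
        # Group records into imputation classes.
        classes = {}
        for rec in records:
            classes.setdefault(rec[class_var], []).append(rec)

        for recs in classes.values():
            # Sort within the class so neighboring records are similar on sort_var.
            recs.sort(key=lambda r: r[sort_var])
            last_donor = None
            for rec in recs:
                if rec[target] is not None:
                    last_donor = rec[target]      # remember the most recent donor
                elif last_donor is not None:
                    rec[target] = last_donor      # impute from the nearest donor
        return records

    sample = [
        {"degree": "BA", "age": 27, "salary": 58000},
        {"degree": "BA", "age": 29, "salary": None},   # receives the age-27 BA donor value
        {"degree": "PhD", "age": 41, "salary": 96000},
        {"degree": "PhD", "age": 44, "salary": None},  # receives the age-41 PhD donor value
    ]
    print(hot_deck_impute(sample, "degree", "age", "salary"))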

Imputation was not performed on critical items or on verbatim-based variables.

Weighting. Because the NSCG is based on a complex sampling design and subject to nonresponse bias, sampling weights were created for each respondent to support unbiased population estimates. The final analysis weights account for several factors, including the following:

  • Adjustments to account for undercoverage of recent immigrants and undercoverage of recent degree-earners
  • Adjustment for incorrect names or incomplete address information on the sampling frame
  • Differential sampling rates
  • Adjustments to account for non-locatability and unit nonresponse
  • Adjustments to align the sample distribution with population controls
  • Trimming of extreme weights
  • Overlap procedures to convert weights that reflect the population of each frame (2015 ACS, 2017 ACS, 2019 ACS, and 2021 ACS) into a final sample weight that reflects the 2023 NSCG target population

The final sample weights enable data users to derive survey-based estimates of the NSCG target population. The final sample weight appears on the NSCG public use data files as the variable WTSURVY.
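A weighted population estimate is then obtained by summing the final sample weight over the respondents in the group of interest. The sketch below illustrates this using the WTSURVY variable named above on a hypothetical extract; the EMPLOYED field and the values are assumptions for illustration.

    # Sketch of a weighted population estimate from the final sample weight.
    # WTSURVY is the NSCG final-weight variable name; the records and the
    # EMPLOYED field are hypothetical stand-ins for a public use file extract.

    records = [
        {"WTSURVY": 412.7, "EMPLOYED": True},
        {"WTSURVY": 389.1, "EMPLOYED": False},
        {"WTSURVY": 521.3, "EMPLOYED": True},
        {"WTSURVY": 298.9, "EMPLOYED": True},
    ]

    # Estimated number of employed persons in the target population:
    # the sum of final weights over respondents reporting employment.
    estimate = sum(r["WTSURVY"] for r in records if r["EMPLOYED"])
    print(f"Estimated employed population: {estimate:,.0f}")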

Variance estimation. The successive difference replication method (SDRM) was used to develop replicate weights for variance estimation. The theoretical basis for the SDRM is described in Wolter (1984); Fay and Train (1995); Ash (2014); and Opsomer et al. (2016). As with any replication method, successive difference replication involves constructing numerous subsamples (replicates) from the full sample and computing the statistic of interest for each replicate. The mean square error of the replicate estimates around their corresponding full sample estimate provides an estimate of the sampling variance of the statistic of interest. The 2023 NSCG produced 320 sets of replicate weights.
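In outline, the computation recomputes the estimate under each set of replicate weights and accumulates the squared deviations from the full-sample estimate. The sketch below uses only 4 hypothetical replicates rather than the 320 produced for the NSCG, and the 4/R multiplier shown is the factor commonly used with successive difference replication; the exact factor used in production should be taken from the official NSCG documentation.

    # Sketch of replicate-based variance estimation: recompute the weighted
    # estimate under each set of replicate weights and measure its squared
    # deviation from the full-sample estimate. The 4/R factor is the multiplier
    # commonly used with successive difference replication; data are hypothetical.

    def weighted_total(values, weights):
        return sum(v * w for v, w in zip(values, weights))

    # Hypothetical analysis variable and full-sample weights for 4 respondents.
    y = [1, 0, 1, 1]                      # e.g., an indicator of employment
    full_weights = [412.7, 389.1, 521.3, 298.9]

    # Hypothetical replicate weights: one list of weights per replicate.
    replicate_weights = [
        [620.1, 195.0, 523.0, 300.2],
        [410.9, 585.3, 260.4, 447.8],
        [205.6, 390.0, 780.5, 151.1],
        [618.2, 392.8, 519.9, 149.7],
    ]

    theta_0 = weighted_total(y, full_weights)
    R = len(replicate_weights)
    variance = (4.0 / R) * sum(
        (weighted_total(y, w) - theta_0) ** 2 for w in replicate_weights
    )
    standard_error = variance ** 0.5
    print(f"Estimate: {theta_0:,.1f}  SE: {standard_error:,.1f}")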

Disclosure protection. To protect against the disclosure of confidential information provided by NSCG respondents, the estimates presented in NSCG data tables are rounded to the nearest 1,000.

Data table cell values based on respondent counts that fall below a predetermined threshold are deemed sensitive to potential disclosure; such cells are suppressed and marked with the letter "D".
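Taken together, the two protections amount to a simple post-processing rule for each table cell: suppress the cell if it is based on too few respondents, otherwise round the estimate to the nearest 1,000. The sketch below illustrates that rule; the suppression threshold shown is an assumption, since the actual NSCG threshold is predetermined but not stated here.

    # Sketch of the two table-cell disclosure protections described above:
    # round estimates to the nearest 1,000 and replace cells based on too few
    # respondents with "D". The threshold below is illustrative only.

    SUPPRESSION_THRESHOLD = 25  # hypothetical minimum respondent count per cell

    def protect_cell(estimate, respondent_count):
        if respondent_count < SUPPRESSION_THRESHOLD:
            return "D"                              # suppressed cell
        return f"{round(estimate, -3):,.0f}"        # rounded to nearest 1,000

    print(protect_cell(148_612.4, 310))   # -> 149,000
    print(protect_cell(3_417.0, 12))      # -> D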