Wednesday, September 30, 2020

Discovery & Correction of an Error in the 2019 National Health Interview Survey Release

 

I recently discovered a critical error in the 2019 National Health Interview Survey (NHIS) release from the Centers for Disease Control and Prevention (CDC). Here is the story.

When the CDC announced the release of the 2019 NHIS on September 23, I downloaded the data and began using it. Right away, I discovered an error that made accurate population estimates impossible: it caused the U.S. population to be underestimated by about 100 million.

I phoned the CDC and was told that staff would investigate the matter.

The CDC typically releases data as a mass of numbers that cannot be used without a statistics program. The agency provides computer code for three common programs (SAS, SPSS and Stata) to help researchers convert the data into usable form. The code for each program consists of several thousand lines of instructions written in that program's language. I found a single erroneous line in the SPSS input statements. On September 25, I emailed my CDC contact with details of the problem and the solution. He replied on September 28, “We do have an error in our program that is affecting the total population and the point estimates.  We are working on a fix and will re-publish the SPSS program soon on our website.”
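An error of this kind can be caught with a quick sanity check after loading the data: the survey weights should add up to roughly the size of the population the survey represents. The sketch below illustrates the idea in Python with made-up numbers. The function names, the toy weights, the benchmark, and the 5% tolerance are all assumptions for illustration. It does not reproduce the CDC's code, and the "misread" weights are invented, since the actual faulty line is not shown in this post.

```python
def weighted_population_total(weights):
    """Sum the survey weights.

    Each weight is the number of people in the population
    that one survey respondent represents.
    """
    return sum(weights)


def check_population_total(weights, expected_total, tolerance=0.05):
    """Compare the weighted total with an external benchmark.

    Returns (total, ok), where ok is True when the total is within
    `tolerance` (as a fraction) of `expected_total`.
    """
    total = weighted_population_total(weights)
    relative_gap = abs(total - expected_total) / expected_total
    return total, relative_gap <= tolerance


# Toy data (hypothetical): five respondents who together
# represent a population of 50,000.
correct_weights = [8000.0, 12000.0, 10000.0, 9500.0, 10500.0]

# Hypothetical result of a faulty input statement that reads the
# weight column incorrectly. Illustrative values only.
misread_weights = [4800.0, 7200.0, 6000.0, 5700.0, 6300.0]

expected_population = 50_000  # assumed external benchmark

for label, weights in [("correct", correct_weights),
                       ("misread", misread_weights)]:
    total, ok = check_population_total(weights, expected_population)
    status = "OK" if ok else "MISMATCH"
    print(f"{label}: weighted total = {total:,.0f} ({status})")
```

With real survey data, the benchmark would come from an independent population figure, and a large gap would be a signal to inspect the input program before running any analysis.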

On September 29, the CDC corrected its error. The only acknowledgement of the correction was a message sent to members of the NHIS Users email list. The agency failed to note the correction on its website and did not change the date stamp on the page, which still reads, “Page last reviewed: September 23, 2020,” rather than September 29, as it should. This is inadequate: many users will not be aware of the error, and if they use the original program, their work will be inaccurate.

The NHIS is the premier instrument for assessing the health of the nation.  For decades, the CDC has used the annual NHIS to count the number of current and former smokers, and the surveys are a valuable source of health-related information for researchers across the globe.  The CDC should better document its errors and corrections.

 
