Addressing Uncertainty: The Emergence of the CRMS/CFSPID Diagnostic Category Following Newborn Screening for Cystic Fibrosis

This article uses cystic fibrosis as a case study to examine how physicians and scientists have navigated uncertainty following newborn screening. Despite the many benefits of newborn screening, including earlier diagnosis, therapeutic intervention, and a reduced diagnostic odyssey, this public health approach also comes with challenges. For example, by the early twenty-first century, physicians had begun to document infants with indeterminate diagnoses: those with a positive screen who did not clearly fit into the cystic fibrosis or “normal” categories. As a means of addressing this uncertainty and to facilitate long-term follow-up of such infants, the U.S. Cystic Fibrosis Foundation recommended in 2009 that a new diagnostic term, CFTR-related metabolic syndrome (CRMS), be used. However, the CRMS label was not favored in Europe, and a different term, cystic fibrosis screen positive, inconclusive diagnosis (CFSPID), was adopted in 2015 instead. Efforts to address the uncertainty associated with indeterminate diagnoses have been complicated, as stakeholders have held differing views about whether the screening algorithms should aim to maximize or minimize CRMS/CFSPID cases. Many who favor the identification of babies with CRMS/CFSPID note that they are at increased risk of developing symptoms and signs consistent with a cystic fibrosis diagnosis. In contrast, those who support algorithms that reduce CRMS/CFSPID cases point to iatrogenic harms associated with the medicalization of children who may remain healthy into adulthood. Furthermore, investigators have grappled with how best to ensure equity in newborn screening among different racialized groups while concomitantly attempting to minimize false positive results.
These issues are applicable beyond the context of cystic fibrosis, especially as programs contemplate the incorporation or expanded use of next generation sequencing in algorithms for a wide range of diseases, and this history highlights how efforts to reduce uncertainty in one setting can lead to new and persistent sources of uncertainty in other areas.


Introduction
Newborn screening (NBS) has been associated with numerous benefits, including potentially sparing parents "diagnostic odysseys," the possibility of early therapeutic intervention, and improved outcomes [1][2][3]. In the case of phenylketonuria (PKU), the first condition to be widely screened for in the U.S. and other jurisdictions in the 1960s, early detection allowed for the initiation of a special diet to prevent the cognitive disability seen in individuals with untreated disease [4]. With the introduction of tandem mass spectrometry into NBS algorithms in the 1990s, many more diseases could be detected prior to symptom development [5,6]. Recent advances in genetic and genomic technologies have expanded NBS even further, including pilot programs to apply whole genome sequencing to screening protocols [7][8][9]. Yet, despite the ability to screen for an increasing number of conditions, attempts to reduce uncertainty through NBS have led to "new uncertainties in unanticipated arenas," further prompting debates about the goals and appropriate targets for screening programs [1,3,10].
In the setting of discussions about strategies for newborn screening over the past several decades, many have pointed to James M. G. Wilson and Gunnar Jungner's 1968 criteria for screening for disease. Asked by the World Health Organization to develop principles and practices for screening programs, Wilson and Jungner noted that "the object of screening for disease is to discover those among the apparently well who are in fact suffering from disease" so that preventative measures could be put in place [11]. They laid out ten principles of early disease detection, the first two of which stipulated that "the condition sought should be an important health problem" and that "there should be an accepted treatment for patients with recognized disease" [11]. However, as Diane Paul and Jeffrey Brosco have noted, Wilson and Jungner's tenet that a therapeutic intervention be available "was being described by some as outmoded" by the 1990s [4]. Those who were opposed to strict adherence to Wilson and Jungner's principles suggested that newborns could benefit from early diagnosis even in the absence of an effective treatment, and some felt that "the family as a whole, and not only the child, should be considered a legitimate beneficiary of NBS" [4]. In contrast, those who favored screening algorithms that more closely followed Wilson and Jungner's principles expressed concern that expanded screening had the potential for harm [4]. In the twenty-first century, numerous investigators have suggested modifications to the Wilson-Jungner criteria [12,13].
In this paper, I use cystic fibrosis (CF) as a case study to examine how physicians have evaluated different screening algorithms and grappled with uncertainty in newborn screening. CF is an autosomal recessive genetic disease associated with a wide range of symptoms, including those affecting the respiratory and gastrointestinal systems. Prior to NBS, CF was typically diagnosed in the setting of symptoms, yet babies diagnosed following a positive newborn screen are often asymptomatic, resulting in uncertainty about symptom development and long-term prognosis [3]. Based on interviews with parents whose children received a diagnosis of CF following a positive newborn screen, Rachel Grob documents how challenging such uncertainty can be [1]. As one mother explained, her son "could be fine for the rest of his life" or "something terrible could happen" [1]. Although most individuals diagnosed with CF through NBS will go on to develop symptoms, this can be difficult for parents to come to terms with, especially if their child appears healthy. This type of ambiguity following newborn screening and a subsequent diagnosis can result in what Stefan Timmermans and Mara Buchbinder have termed "patients-in-waiting," defined "as an umbrella concept for those under medical surveillance between health and disease" [3,14]. Reflecting on the history of PKU, Diane Paul and Jeffrey Brosco have noted that "in a sense, NBS programs have always created patients-in-waiting" [4]. Furthermore, uncertainty associated with a diagnosis in the absence of symptoms has also been described outside of NBS, in the context of such diagnoses as breast cancer, high blood pressure, and elevated cholesterol [15][16][17][18][19].
In addition to uncertainty about the nature and timeline of symptoms that will develop following a diagnosis, confusion may also exist when a child with a positive newborn screen does not fall into a clear diagnostic category [1,4,20,21,22,23]. Since at least 2005, investigators have noted that there were a number of babies who did not meet the strict diagnostic criteria for CF following a positive screen, yet they did not clearly fit into the "normal" category either [24]. Based on ethnographic and historical work examining newborn screening in France, Joëlle Vailly argued that patients with an indeterminate diagnosis following NBS for CF are "typical of the 'patients-in-waiting' living between illness and health after screening" that Timmermans and Buchbinder described [20,21]. Since some of these babies who received an indeterminate diagnosis were later reclassified and given a CF diagnosis, many investigators felt that it was important to monitor these infants for signs of CF-related disease. However, others argued that the identification of these children was not in line with the intended goals of newborn screening and could lead to tangible harms resulting from their medicalization.
As a means of addressing this uncertainty and to facilitate long-term follow-up, the U.S. Cystic Fibrosis Foundation recommended in 2009 that such infants be given a new diagnostic label, termed CFTR-related metabolic syndrome (CRMS) [25]. As physicians Hara Levy and Philip Farrell noted years later, the CRMS diagnostic category was "literally invented" during a consensus conference that resulted in the 2009 guidelines [26]. While investigators in Europe initially opted not to provide these infants with a diagnostic label, by 2015 the term cystic fibrosis screen positive, inconclusive diagnosis (CFSPID) was formally proposed [27]. Soon after, the two terms were merged into CRMS/CFSPID to facilitate international collaboration and long-term follow-up of these infants [28]. Analysis of the CRMS/CFSPID category provides insight into how physicians have navigated the perceived benefits and drawbacks of different NBS algorithms, with important implications for newborn screening more generally, including how the composition of mutation panels can affect who gets diagnosed. The subsequent section includes a brief overview of the early years of newborn screening for CF, which provides important context for understanding how physicians responded to uncertainty following NBS.

The Early Years of IRT-Based Newborn Screening for Cystic Fibrosis
During the 1970s, investigators in New Zealand developed a screening test that is still used today to detect newborns with CF. As described in the March 2, 1979 issue of The Lancet, Jeanette Crossley and colleagues from the University of Auckland School of Medicine provided evidence that the digestive enzyme trypsin was elevated in the blood of young children with CF [29]. At that time, the investigators knew that digestive enzymes were produced by the pancreas and released into the intestine in healthy individuals, yet the enzymes were unable to leave the pancreas in many people with CF. As one of the 1979 coauthors reflected on their approach to newborn screening years later, "The rationale was quite simple. We knew the ducts draining the pancreatic juices into the intestine are blocked in CF, so where were all the enzymes going? Were the dammed up enzymes flowing back into the blood?" [30]. Subsequent work documented that the enzymes were entering the blood as predicted, "and this could be easily detected in the same dried blood spots collected for other diseases from all newborns in NZ" [30].
After demonstrating that the pancreatic enzyme immunoreactive trypsin (IRT; later referred to as immunoreactive trypsinogen, the precursor to trypsin) was elevated in the serum of children with CF, Crossley and colleagues developed an assay for dried blood spots. In particular, they used Guthrie cards from newborn screening that had been conducted for other conditions such as PKU, including eleven from children who were later diagnosed with CF. While the cost of the assay was a potential drawback, they argued that IRT screening using dried blood spots was preferred over existing screening methods because "the sample is already collected for other purposes in countries conducting Guthrie testing of the newborn" and "the test gives an abnormal result even for C.F. children with residual pancreatic function" [29]. Since these children with at least some pancreatic enzymes available for food digestion would not be detected using the stool trypsin and other screening tests that were already available, the IRT bloodspot screening test would be able to detect children with a wider range of disease manifestations [31][32][33]. By 1981, Crossley and colleagues provided evidence that the IRT method of newborn screening for CF was highly sensitive in the context of both retrospective and prospective studies, noting that "no false negatives are yet known" [34].
The use of the terms sensitivity and specificity in diagnostics can be traced to the early 20th century, in the context of immunological concepts stemming from the development of serologic tests for syphilis [35,36]. The sensitivity of a test, such as IRT screening for CF, refers to the proportion of people who have the disease who test positive, and specificity indicates the proportion of people who do not have the disease who test negative [11]. Ideally, sensitivity and specificity will be high, and, correspondingly, the false positive and false negative rates will be low [11]. However, as Wilson and Jungner note, "a relatively high proportion of false positives can be accepted" in the context of a screening test, as a diagnostic test will follow [11]. In contrast, scholars have noted that a high false positive rate in a compulsory diagnostic test, such as for HIV or syphilis, is not acceptable because of the associated stigma that can result from such a finding [37,38]. When considering the appropriate level of specificity for the newborn screening test for CF, Crossley and colleagues opted for lower specificity, and a correspondingly higher false positive rate, as they did not want to miss any infant with CF. Explaining their rationale for a lower IRT cutoff level, and a resultant increase in the false positive rate, Crossley and colleagues noted that "we prefer to err on the side of caution, and to accumulate information on those babies who are therefore designated 'false positives'" [34].
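The definitions above can be made concrete with a minimal Python sketch. The function name `screening_metrics` and all of the counts are hypothetical and chosen only for illustration; they are not data from Crossley and colleagues or from any screening program. The two scenarios simply show the kind of trade-off described: a more lenient cutoff that misses no affected infant comes at the cost of more false positives.

```python
def screening_metrics(tp, fp, tn, fn):
    """Compute screening test metrics from the four outcome counts.

    tp/fn: affected infants who screen positive/negative.
    tn/fp: unaffected infants who screen negative/positive.
    """
    sensitivity = tp / (tp + fn)   # proportion of affected who test positive
    specificity = tn / (tn + fp)   # proportion of unaffected who test negative
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "false_positive_rate": 1 - specificity,
        "false_negative_rate": 1 - sensitivity,
    }

# Hypothetical counts for 10,000 screened infants, 10 of whom have the disease.
strict_cutoff = screening_metrics(tp=8, fp=20, tn=9970, fn=2)
lenient_cutoff = screening_metrics(tp=10, fp=90, tn=9900, fn=0)

print(strict_cutoff)
print(lenient_cutoff)
```

In this invented example, lowering the cutoff raises sensitivity to 100% while lowering specificity, which is the direction of the trade-off that Crossley and colleagues described accepting.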
Shortly after the 1979 report by Crossley and colleagues, described by Philip Farrell as "the shot heard around the world," IRT-based newborn screening began in many countries, including Italy, Australia, and the United States [39,40]. Colorado was the first state to implement screening in 1982, followed by Wisconsin and Wyoming over the next several years [41]. Also around this time, in the mid-1980s, trials were initiated in the UK and Wisconsin to evaluate the clinical efficacy of NBS [33]. By 1997, the CDC recommended that pilot programs be implemented in the U.S. at the state level [39]. Noting that the "most convincing evidence of health outcomes from screening for CF is in the area of nutrition and growth," the CDC recommended CF NBS in 2004, followed by the CF Foundation's recommendation in 2005 [32,41,42]. By 2010, newborn screening for CF had extended to all 50 states in the U.S., covering virtually all newborns regardless of perceived risk of disease [43]. At that time, evidence had continued to accumulate about the clinical benefits of early diagnosis, especially in terms of nutrition and growth, yet the pulmonary benefits of CF NBS were "more controversial" [41,44,45,46,47,48,49,50].
While the exact protocol for newborn screening has varied over time and in different locations, all tests include an initial IRT measurement [51][52][53]. After the gene associated with CF (CFTR; cystic fibrosis transmembrane conductance regulator) was described in 1989, some state screening programs (and those in other jurisdictions) began to include a DNA testing component as well [24,54,55,56,57]. By 2002, investigators from Wisconsin and the Czech Republic reflected on the two-tier IRT/DNA approach to newborn screening, pointing to the challenges that DNA analysis posed in reaching the 100% sensitivity needed to detect all babies with CF [58]. At that time, close to 1,000 mutations had been identified in the CFTR gene, yet it was not technically feasible to detect all mutations in screening programs [58]. In fact, some programs only tested for a single mutation, ∆F508 (F508del), which had been found in the majority of CF chromosomes initially analyzed [59].
Since the distribution of CFTR mutations varied across locations and between different racialized groups, the composition of a mutation panel could have important implications for the sensitivity of the test [41]. As the Wisconsin and Czech Republic investigators cautioned in 2002, there was a "great risk of introducing an internal bias into any screening program that does not properly select the correct mutational array associated with the population or populations that it is servicing" [58]. Of relevance to this debate about mutation selection, states with a DNA testing component included between one and approximately 27 mutations in 2004 [41]. This concern about false negatives due to the DNA mutation panel component of newborn screening programs for CF has continued to receive attention. In 2007, in an attempt to increase the sensitivity of CF NBS, California began including a DNA sequencing component in its protocol. The resulting three-tiered algorithm included an initial IRT measurement, followed by mutation panel analysis in the setting of an elevated IRT, and, finally, CFTR gene sequencing if one disease-causing mutation was detected [51,60].
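The tiered logic just described can be sketched as a short Python function. This is an illustrative outline only, not the California program's implementation: the function name `tiers_performed`, the numeric cutoff, and the choice to treat an IRT value at or above the cutoff as "elevated" are all assumptions introduced here, and the sketch records only which tiers would run, since the text does not specify how results were reported.

```python
def tiers_performed(irt, irt_cutoff, panel_mutations_found=None):
    """Return the tiers that would be run under a three-tier IRT/DNA/sequencing scheme.

    irt: measured IRT value (units and cutoff are hypothetical here).
    panel_mutations_found: number of disease-causing mutations detected by the
        mutation panel, if the panel was run.
    """
    tiers = ["IRT"]                       # tier 1: every newborn gets an IRT measurement
    if irt < irt_cutoff:                  # assumption: "elevated" means at or above cutoff
        return tiers
    tiers.append("mutation panel")        # tier 2: only after an elevated IRT
    if panel_mutations_found == 1:        # tier 3: only if exactly one mutation was found
        tiers.append("CFTR sequencing")
    return tiers

# Hypothetical values for illustration.
print(tiers_performed(irt=40, irt_cutoff=60))
print(tiers_performed(irt=80, irt_cutoff=60, panel_mutations_found=1))
print(tiers_performed(irt=80, irt_cutoff=60, panel_mutations_found=2))
```

The sketch makes visible that sequencing is reached only along one path: an elevated IRT followed by a single panel-detected mutation.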

Addressing Diagnostic Uncertainty with a New Diagnostic Label
While many physicians were concerned about the possibility of false negative NBS results because of the inability to detect all CF mutations, investigators from the New England Newborn Screening Program directed attention to an additional source of diagnostic uncertainty in 2005 [24,61,62]. Based on analysis of results from a two-tier IRT/DNA screening algorithm in Massachusetts between 1999 and 2003, physician Richard Parad and Anne Marie Comeau, Deputy Director of the New England Newborn Screening Program, identified "four problematic diagnostic categories generated by CF NBS" [24]. In Massachusetts, as in other jurisdictions, infants who screened positive following the two-tier IRT/DNA test were referred for a sweat test, widely described as the "gold standard" diagnostic for the disease [63]. The sweat test takes advantage of the observation, initially made in the late 1940s, that people with CF generally have elevated levels of chloride in their sweat compared to those without the disease [64][65][66].
When a positive newborn screen was followed by a positive sweat test, the diagnosis of CF was straightforward. However, some infants with a positive newborn screen did not have an elevated sweat chloride level that was diagnostic for CF, yet they did not fit clearly into the "normal" category either. The diagnostic challenges characterized by the New England investigators each stemmed from a mismatch between the newborn screening and diagnostic test results. In particular, infants in Group I had an elevated IRT level, 2 CFTR mutations, and sweat chloride in the borderline range; Group II infants also had an elevated IRT and 2 CFTR mutations, but a negative sweat test; Group III was characterized by an elevated IRT, 1 CFTR mutation, and a borderline sweat test; and Group IV newborns had a highly elevated IRT in the absence of any detectable CFTR mutations and a borderline sweat test [24]. While infants in each of these four categories did not meet the diagnostic criteria for CF in place in the U.S. at that time, investigators were hesitant to place them into the "normal" category. Similar challenges with indeterminate diagnoses following NBS had also been seen in the context of PKU [4].
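The four result patterns can be restated compactly as a lookup, shown below as a minimal Python sketch. The function name `problematic_group` and the categorical labels are hypothetical, and the sketch assumes that a "highly elevated" IRT also counts as "elevated" for Groups I through III; the 2005 report is the authority on the actual definitions.

```python
def problematic_group(irt, n_mutations, sweat):
    """Map a screening/diagnostic result pattern to Group I-IV, or None.

    irt: "normal", "elevated", or "highly_elevated"
    n_mutations: number of CFTR mutations detected (0, 1, or 2)
    sweat: "negative", "borderline", or "positive"
    """
    # Assumption: a highly elevated IRT also satisfies "elevated".
    elevated = irt in ("elevated", "highly_elevated")
    if elevated and n_mutations == 2 and sweat == "borderline":
        return "I"
    if elevated and n_mutations == 2 and sweat == "negative":
        return "II"
    if elevated and n_mutations == 1 and sweat == "borderline":
        return "III"
    if irt == "highly_elevated" and n_mutations == 0 and sweat == "borderline":
        return "IV"
    return None  # pattern is not one of the four problematic categories

print(problematic_group("elevated", 2, "negative"))
print(problematic_group("highly_elevated", 0, "borderline"))
```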
In referring to these infants that presented a "diagnostic dilemma," the New England investigators noted that some "might go on to have severe, life-threatening, morbidity-causing ramifications of their CFTR abnormalities" and others might "go on to have such minimal mild phenotypes that they would never cross the threshold to come to clinical attention as part of the CF spectrum" [24]. The detection of such infants, labeled as having "atypical CF," was not the intended goal of newborn screening, which instead sought to identify babies with "classic CF…in whom early severe disease will develop" [24]. However, the New England investigators pointed to existing evidence that suggested that "early intervention could affect outcome" in some babies with "atypical CF" [24]. Based on the input of others involved in CF NBS, as well as CF Center directors, Parad and Comeau recommended that physicians "withhold a definitive diagnosis of classic CF, but explain to parents that a CF diagnosis may surface with time" [24]. In other words, they felt the benefits of identifying and monitoring infants with an uncertain diagnosis outweighed the potential harms.
Concerns about diagnostic uncertainty in the setting of CF NBS prompted the CF Foundation to sponsor a consensus conference on the topic in 2007 [26]. Based on recommendations from this conference, the CF Foundation published practice guidelines in 2009, in which they proposed that the term CFTR-related metabolic syndrome (CRMS) be used as a new diagnostic label for babies with an indeterminate diagnosis following NBS [25]. The CRMS diagnosis applied to infants with elevated IRT levels who fell into one of two categories. Category 1 infants had at least two sweat tests with an intermediate sweat chloride level and fewer than two CF-causing mutations in the CFTR gene; Category 2 infants had normal sweat chloride levels and two CFTR mutations, only one of which was known to be disease-causing [25]. In addition to diagnostic criteria, the practice guidelines also included specific recommendations about the management of children with CRMS, including repeat sweat testing and follow-up visits with a CF specialist [25]. While CRMS was not considered to be a metabolic disorder, the CRMS diagnostic label was agreed upon in part because it provided a "clear name to families" that was distinct from CF and allowed "for US healthcare delivery system follow-up and billing purposes" [25,67]. Furthermore, the guidelines published in 2009 indicated the specific International Classification of Diseases (ICD) codes that physicians could use in conjunction with the CRMS diagnosis [25,68].
As Geoffrey Bowker and Susan Leigh Star have noted, the ICD nomenclature began in the late nineteenth century and has subsequently undergone many revisions [69]. While ICD codes might seem "mundane," Anne Kveim Lie and Jeremy Greene argue that "the systems we use to classify disease shape the nature of medicine and public health in substantial and powerful ways" [70]. Charles Rosenberg has also documented how disease definitions have been influenced by insurance reimbursement considerations [17]. In fact, reflecting back on the CRMS label years later, Philip Farrell and colleagues noted that "this consensus-producing effort led to the general recognition that ICD coding was not included as part of the diagnosis considerations in previous CF Foundation-sponsored consensus conferences," and they suggested that consideration of ICD codes should be taken into account in the future [71]. In March 2010, shortly after the CRMS label had been proposed, the CF Foundation added this diagnostic category to their patient registry [62]. As noted in the 2011 CF Foundation Patient Registry Data Report, the "collection and analysis" of data on individuals with CRMS "will hopefully provide new information on this important group of patients" [72].
The same year that the CRMS label was proposed, European guidelines were also published, yet they focused exclusively on the management of children with an indeterminate diagnosis and did not recommend that a specific diagnostic label be used [73]. However, by 2015, the European Cystic Fibrosis Society Neonatal Screening Working Group organized a consensus process and came up with the term CF Screen Positive, Inconclusive Diagnosis (CFSPID) to designate infants with an unclear diagnosis following CF NBS [27,74]. Although the CRMS label was considered during discussions, those involved did not favor its use for two reasons. First, they felt that the terms metabolic and syndrome were not appropriate descriptors for this group of asymptomatic infants, who did not have a clear disease. Second, they were concerned that the CRMS label might result in the "overmedicalization of these children" [68]. Australian investigators had also expressed concerns about the CRMS label and recommended that it be dropped completely, pointing out that it "may not be so helpful to the parents whose child now has a very elaborately named condition, that might not turn out to be a condition at all" [75].
Like CRMS, the CFSPID category included infants in two groups: those in Group A had a "normal sweat chloride and two CFTR mutations, at least one of which has unclear phenotypic consequences" and Group B was defined by an "intermediate sweat chloride and one or no CFTR mutations" [27]. CFSPID was characterized as "a descriptive term rather than a diagnostic label, as these infants do not have a disease but have a number of risk factors for developing CF related issues in the future" [27]. The investigators also hoped that the CFSPID designation would facilitate long-term study of these infants to better understand outcomes [27]. In fact, the guidelines recommended that children with CFSPID undergo repeat sweat testing and follow-up monitoring at a CF clinic [27]. However, in 2016, Juerg Barben and colleagues from Switzerland argued that "children with inconclusive CF diagnosis (CFSPID) should not be detected, as there is no evidence for improvement through early treatment" [76].
Shortly after CFSPID was proposed, an international group of investigators collaborated to integrate the CFSPID and CRMS labels to "allow for collection of data from populations around the world and increase our understanding of the epidemiology and outcomes of CRMS/CFSPID" [67]. The collaboration culminated in recommendations for how to characterize and monitor infants with an inconclusive diagnosis following CF NBS [26,67,68,77,78]. As the resulting 2017 consensus guidelines note, "the term CRMS is used in the US for healthcare delivery purposes and CFSPID is used in other countries, but these both describe an inconclusive diagnosis following NBS" [28]. The CRMS/CFSPID label was used to describe an individual with a positive newborn screen and either a normal sweat chloride with 2 CFTR mutations, "at least 1 of which has unclear phenotypic consequences OR…an intermediate sweat chloride value…and 1 or 0 CF-causing mutations" [28].
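The harmonized definition quoted above has a simple either/or structure, which the following minimal Python sketch restates. It is an illustration, not a clinical tool: the function name `meets_crms_cfspid` and its parameters are hypothetical, and sweat chloride is passed as a category because the numeric thresholds are elided in the quoted passage and are not supplied here.

```python
def meets_crms_cfspid(positive_screen, sweat, n_mutations, n_unclear, n_cf_causing):
    """Check a result pattern against the two arms of the quoted 2017 definition.

    positive_screen: True if the newborn screen was positive.
    sweat: "normal", "intermediate", or "elevated" (thresholds not modeled).
    n_mutations: total CFTR mutations identified.
    n_unclear: mutations with unclear phenotypic consequences.
    n_cf_causing: mutations known to be CF-causing.
    """
    if not positive_screen:
        return False
    if sweat == "normal":
        # Arm 1: normal sweat chloride, 2 CFTR mutations, at least 1 unclear.
        return n_mutations == 2 and n_unclear >= 1
    if sweat == "intermediate":
        # Arm 2: intermediate sweat chloride and 1 or 0 CF-causing mutations.
        return n_cf_causing <= 1
    return False  # an elevated sweat chloride falls outside the inconclusive label

print(meets_crms_cfspid(True, "normal", 2, 1, 1))
print(meets_crms_cfspid(True, "intermediate", 1, 0, 1))
print(meets_crms_cfspid(True, "normal", 1, 0, 1))
```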
In addition to integrating the CRMS/CFSPID label, the diagnostic guidelines published in 2017 also cautioned physicians against using other terms to designate uncertain diagnoses. In particular, physicians were urged to "avoid the use of terms like classic/nonclassic CF, typical/atypical CF, delayed CF, because these terms have no harmonized definition and could be confusing for families or caregivers" [28]. Although the 2017 diagnostic guidelines cautioned against the use of terms such as nonclassic or atypical CF following inconclusive diagnostic testing in the setting of a positive newborn screen, the favored CRMS/CFSPID label did not eliminate confusion or uncertainty.

The CRMS/CFSPID Diagnostic Label and Continued Uncertainty
Shortly after the term CRMS was coined in 2009, investigators began carrying out studies to examine outcomes in individuals with this diagnostic label. In 2011, physician Clement Ren and colleagues published the results of a retrospective study that examined the clinical characteristics of individuals who would have qualified for a CRMS diagnosis following newborn screening for CF in New York from 2002 to 2010. While most children with CRMS "remained well and free of signs of CF disease," the study found that some individuals "did develop features of CF disease," including 25% who tested positive for a respiratory microorganism often seen in individuals with CF [61,79]. Furthermore, one child who would have been given the CRMS label at birth went on to develop a sweat chloride level consistent with a CF diagnosis [61].
Based on these data, the investigators felt that the creation of the CRMS label was justified and that such children should be followed due to their increased risk of developing "signs of CF disease" [61]. However, given that outcomes associated with CRMS varied, they also expressed concern about the risks of such an approach, including the possibility of "vulnerable child syndrome" in those with a CRMS label [61]. First described by physicians Morris Green and Albert Solnit in 1964, vulnerable child syndrome originally referred to an altered child-parent relationship seen in families with children "with a history of an illness or accident from which they were not expected to recover" due to the persistent belief that the child was "uniquely vulnerable" [80,81]. The New York investigators predicted that newborn screening for CF would result in the identification of a "substantial number of CRMS infants," with conversion from the CRMS to CF label dependent on the screened population and NBS protocol employed [61].
In fact, studies conducted in different regions did confirm that the ratio of CF to CRMS infants identified following NBS varied, as did the conversion from a CRMS to a CF diagnosis. As demonstrated in the New York study, 30 infants qualified for a CF diagnosis and 15 qualified for a CRMS diagnosis between 2002 and 2010, corresponding to a CF to CRMS ratio of 2:1 [61]. In contrast, as documented in a retrospective study examining individuals with a positive newborn screen in California from 2007 to 2010, investigators identified 248 infants with CF and 279 with CRMS, for a ratio of 1:1.1 [60]. In other words, the ratio of infants with CRMS to CF was much higher in California than in New York, which meant that an infant's chance of being labeled with CRMS, and of receiving the corresponding medical surveillance, depended in part on the state in which the infant was born. While the New York program used a two-tier IRT/DNA algorithm, California used a three-tier system that included an additional DNA sequencing step [60].
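The two ratios can be checked directly from the counts reported in the text. The short Python sketch below uses only those four figures; the helper name `crms_per_cf` is introduced here for illustration.

```python
def crms_per_cf(cf_count, crms_count):
    """Number of CRMS diagnoses per CF diagnosis."""
    return crms_count / cf_count

# Counts as reported in the text for each retrospective study.
new_york = crms_per_cf(cf_count=30, crms_count=15)      # 2002-2010
california = crms_per_cf(cf_count=248, crms_count=279)  # 2007-2010

print(f"New York:   1 CF : {new_york:.1f} CRMS")    # equivalent to CF:CRMS of 2:1
print(f"California: 1 CF : {california:.1f} CRMS")  # CF:CRMS of about 1:1.1
print(f"Relative difference: {california / new_york:.2f}x")
```

Expressed this way, California identified a little more than twice as many CRMS cases per CF case as New York did.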
Numerous investigators have noted that California's "ethnically and racially diverse population" prompted the state to include a DNA sequencing component in its NBS program to reduce the number of false negative test results [79]. Since early DNA mutation panels were based on studies done in people who were predominantly characterized as white, the tests were less likely to detect mutations more common in other racialized groups [82][83][84]. As Philip Farrell explained in a 2021 interview, "the dogma is that CF is a disease of white people; that's not true, never has been true, but it's part of the dogma in CF that lingers" [40]. Along these lines, historians Keith Wailoo and Stephen Pemberton have documented how descriptions of CF, as either a white disease or a panethnic disease, have changed according to social and cultural contexts, and Dorothy Roberts has illustrated the harms associated with "medical stereotyping" based on race [85,86]. Reflecting on the retrospective New York and California studies, physicians Hara Levy and Philip Farrell noted "that the number of CRMS cases identified in NBS programs varies depending on the screening protocol being used, the IRT method and cutoff values, and perhaps the region being screened" [26]. These differences in CRMS incidence would have important implications for the long-term follow-up of certain babies following NBS.
Several years later, the proportion of CRMS diagnoses increased even further in California. Based on the analysis of newborns with a positive screen in California from July 2007 to June 2012, the ratio of CF to CRMS diagnoses was 1:1.5 [43]. During that period, the California program continued to employ a 3-step NBS program for CF, in which an elevated IRT prompted DNA testing using a mutation panel (ranging from 28 to 40 mutations), followed by DNA sequencing of the CFTR gene if 1 CFTR mutation was detected using the panel [43]. As the California investigators pointed out at the time, they used a "broad definition of mutation" in the sequencing component of their NBS algorithm, resulting in "enhanced sensitivity" and an increase in CRMS diagnoses [43]. They felt that this increase was warranted, however, because "a lack of comprehensive genotyping" resulted in misdiagnosis of "CRMS cases and others" as carriers [43]. While physicians Patrick Sosnay and Philip Farrell noted that California "yields the highest positive predictive value ever observed in CF NBS," they expressed concern in 2015 about the increase in CRMS diagnoses, which they felt created "an unmeasured 'burden' for families and CF centers" [87]. By that time, investigators had already pointed to the possible medicalization of children with CRMS [61,88].
In the setting of debates about the identification of cases and the use of the CRMS label, physicians from various regions in the U.S. published a report in 2015 on outcomes associated with CRMS [62,79]. Drawing from CF Foundation (CFF) registry data of infants diagnosed following NBS from 2010-2012, they found that individuals with CRMS generally had "normal growth and nutrition" [62]. However, the incidence of Pseudomonas aeruginosa and Stenotrophomonas maltophilia infection in children with CRMS "was much higher" during their first year compared to children without a CF diagnosis [62]. Of particular interest, over 40% of infants who should have been given a CRMS diagnosis based on the consensus guidelines had actually been given a diagnosis of CF instead, further contributing to diagnostic uncertainty [62,89]. While the cause of this discrepancy was unclear, the investigators postulated that it may have been a result of "misclassification" due to misinterpretation of diagnostic guidelines, "reliance on NBS as a diagnostic test," and the "reluctance among CF clinicians to accept the use of CRMS as a diagnostic category" [62]. As another possibility, clinical data not included in the registry may have prompted physicians to label a child with CF instead of CRMS [62].
Also in 2015, investigators outside of the U.S. published outcomes data on children with an inconclusive diagnosis following NBS [90]. For example, a study carried out in Canada and Italy found that 11% of individuals with a CFSPID label were reclassified as having CF within three years of age [91]. In addition, a retrospective study carried out on NBS positive infants in Australia with intermediate sweat chloride levels found that 48% went on to be reclassified as having CF [90]. However, some have questioned the diagnostic criteria used for CF in this study, suggesting that the actual conversion rate may have been lower [46]. Nevertheless, data accumulating from studies provided evidence that some babies with the CRMS/CFSPID label were at risk of developing either CF or CF-like symptoms, whereas others did not have any detectable clinical consequences. However, at that time, physicians were unable to definitively predict outcomes for specific individuals labeled with CRMS.
In addition to debates about the significance of the CRMS/CFSPID label, investigators also noted variation in the incidence of CRMS/CFSPID in different racialized groups, with important implications for equity in newborn screening for CF. For example, the 2015 U.S. CF Foundation patient registry study described earlier found that "infants with CRMS were significantly more likely to be nonwhite (African American, Asian, Native American, or mixed race)" [62]. Of possible relevance, in a 2016 Children's Hospital Los Angeles report highlighting the DNA sequencing component of the California program for identifying babies with CF, physician Danieli Salinas explained that "if only a commercial panel is applied, a large number of diagnoses are missed among African Americans and Hispanics," resulting in the "devastating consequence of not detecting CF in these individuals until later in life, when lung damage is already irreversible" [92]. Furthermore, a California study published in 2016 stressed how "the CFTR variant spectrum and prevalence in black, Asian, Native American, and Middle Eastern CF patients have not been elucidated completely," possibly leading to "racial-ethnic disparities in the clinical sensitivity of neonatal screening algorithms" [93]. Reiterating the need for "more inclusive test approaches" that would detect a comparable number of mutations in all racialized groups, investigators from California suggested in 2017 that "a screening approach that includes comprehensive sequence analysis…is expected to miss the smallest number of patients" [94]. Reflecting on CF NBS in a 2021 interview, Philip Farrell noted that "we have a major problem with a lack of equity in CF newborn screening in this country" and explained that "we need to have newborn screening protocols, algorithms that cover the minority populations" [40].
While many agreed that NBS algorithms should include extended DNA analysis from an equity standpoint, others expressed concern about the harms associated with concomitantly identifying so many babies with CRMS/CFSPID [28,67,77,78,95-99]. For example, while acknowledging that some newborns with CRMS had been reclassified with CF, investigators from Colorado, Wyoming, and Texas nevertheless asserted that "the number of newborns with an ambiguous diagnosis of CRMS remains troubling, and the long-term psychosocial, medical, and financial impacts need to be carefully studied" [100]. Referencing the Wilson and Jungner criteria, the investigators noted that "Colorado, Wyoming, and Texas have remained committed to identifying newborns following the underlying principles of screening, namely the condition sought should be an important health problem and the infants who require treatment should be agreed upon" [100]. In fact, based on a retrospective study of pooled data from the three states, where a two-bloodspot IRT/IRT/DNA screening protocol was used, the ratio of CF to CRMS was 10.8:1 from 2008/2009 to 2012, and this lower incidence of CRMS cases identified following NBS was more in line with Wilson and Jungner's principles than algorithms such as California's [100]. However, reflecting on the Colorado, Wyoming, and Texas study in 2016, Martin Kharrazi of the California Department of Public Health reiterated the possible benefit of a less strict adherence to Wilson and Jungner's principles and the resulting identification of more babies with CRMS, noting that some "develop CF symptoms as they age and ultimately obtain a CF diagnosis" [95].
In fact, in a recent report of individuals with a CRMS/CFSPID label at six centers in Italy, Vito Terlizzi and colleagues noted that 5.3% converted to a CF diagnosis during the study period (2011-2018), and the vast majority of those cases were due to an increase in sweat chloride concentration [101].

Conclusion
As the history of newborn screening for CF makes clear, it has been "extremely challenging" for investigators to reach consensus about the most suitable algorithm to detect infants with the disease [46]. Decisions about sensitivity and specificity, as well as adherence to Wilson and Jungner's criteria for screening programs, are not straightforward. As Wilson and Jungner themselves pointed out in 1968, "in theory…screening is an admirable method of combating disease, since it should help detect it in its early stages and enable it to be treated adequately," but "in practice, there are snags" [12,102]. And the "snags" associated with CF NBS have arisen alongside its successes.
While programs with expanded CFTR sequence analysis have the advantage of promoting equity by detecting mutations in a wider range of racialized groups, the increased ability to detect CFTR variants has been associated with an elevated CRMS/CFSPID detection rate as well [45,46]. Some investigators view this as advantageous, since certain individuals with CRMS/CFSPID have gone on to develop signs and symptoms consistent with CF, yet other investigators have described the detection of such infants as concerning due to the associated parental anxiety and medicalization of children because many "may never develop the disease" [47,103-105]. In fact, reflecting on parental anxiety in the setting of the COVID-19 pandemic, physician Danieli Salinas recalled that "a lot of them contacted us for guidance, asking specifically about CRMS being a pre-existing condition that would put their child at a higher risk of dying from COVID" [106].
To further complicate the situation, evidence suggests that some babies with CRMS/CFSPID are at increased risk of being diagnosed, perhaps as adults, with a CFTR-related disorder (CFTR-RD) [45,74,79,101,107-111]. Distinct from CF, those with CFTR-RD typically have reduced CFTR protein function with symptoms in one organ system, resulting in conditions such as chronic pancreatitis, chronic sinusitis, or infertility due to congenital bilateral absence of the vas deferens (CBAVD) [112][113][114]. To add even more complexity, a recent retrospective study provided evidence that CF carriers are at significantly increased risk of 57 CF-related conditions, including pancreatitis, bronchiectasis, diabetes, and cholelithiasis, compared to controls [115]. While the individual risk for most of these conditions is low, Aaron Miller and colleagues noted in 2020 that "the morbidity attributable to the CF carrier state is likely substantial…given that there are more than 10 million CF carriers in the United States alone" [115]. Referring to this and other studies, Philip Farrell and colleagues suggested that genetic counseling following CF NBS "may eventually need to include the possibility that heterozygote infants have a higher risk of CFTR-RD conditions" [32,116]. Consistent with this suggestion, Maria Valeria Esposito and colleagues from Italy provided evidence that some CF carriers are living with "undiagnosed CFTR-RD" and suggested that more extensive genetic testing of carriers could help to identify such cases [117]. However, Farrell and colleagues felt that "incidental detection of carrier status in false-positive infants does not yet seem actionable for the child because of the low absolute risks and thus the expectation that most CF heterozygotes will be healthy - at least until later in life" [32].
In an attempt to further refine newborn screening for CF, some investigators have explored the use of next generation sequencing (NGS) technologies given their increased ease and affordability compared with other methods of DNA analysis such as Sanger sequencing [52,118]. For example, hoping "to address the shortcomings" of the IRT/DNA algorithm used in the Wisconsin NBS program, which targeted the 23 ACMG-recommended CFTR mutations, investigators carried out a retrospective study to evaluate an NGS assay capable of identifying 162 CFTR variants "for which clinical consequences have been described" [119]. Based on their analysis, they argued in a 2016 publication that an IRT/NGS algorithm could result in "high sensitivity" and "better specificity and positive predictive value" compared to the IRT/DNA algorithm in place at that time [119]. In April of 2016, Wisconsin switched to an IRT/NGS algorithm, in which the presence of at least one CF-causing variant (as determined by the CFTR2 project) prompted sweat testing. In infants with a sweat chloride over 30 mmol/L, the investigators would then reanalyze the NGS data by "removing preset panel restrictions and viewing all variants" [32]. In the same year, investigators from New York also explored the use of an NGS platform capable of detecting 139 mutations. While sensitivity was improved over the 39-mutation panel used in New York at that time, inequity across racialized groups still remained. As the investigators explained, "sensitivity was highest in Whites and lowest in the Black population" [120]. Thus, even with NGS, inequity was still a concern because the specific mutations targeted were more prevalent in individuals characterized as white.
While the Wisconsin and New York studies had used NGS to examine more than one hundred mutations, Martina Lefterova and colleagues in California and Texas reported in the same year that their CFseq NGS assay could identify a much larger set of variants in "all exons, flanking intronic regions, and key noncoding regions of the CFTR gene" [121]. At that time, more than 2,000 variants had been detected in the CFTR gene, although not all of them had been documented as disease-causing. Yet in response to this study, and consistent with long-standing debates about the ideal algorithm for CF NBS, Lawrence Silverman of Virginia pointed to the potential drawbacks of incorporating next generation sequencing into CF NBS [122]. Given that the California NBS program for CF, which included Sanger sequencing, resulted in an increase in CRMS diagnoses, Silverman was concerned that the number of CRMS diagnoses could increase even further with NGS. As Silverman pointed out, "the existing problems associated with sequenced-based NBS should be addressed before using NGS in routine newborn screening" [122]. As noted earlier, it is possible to mask CFTR variants that are non-disease-causing or of unknown clinical significance using NGS technology, an approach some investigators have favored to address these complexities.
As investigators consider expanding newborn screening further, including the possible use of whole genome sequencing to detect a wider range of conditions, the history of NBS for CF can provide important insights about the advantages and drawbacks of various NBS algorithms. Furthermore, this history makes clear how the same outcome, such as the detection of individuals with CRMS/CFSPID, can be viewed as a benefit by some and as an iatrogenic harm by others. And, as the CF case shows, efforts to reduce uncertainty in one area can lead to new and persistent forms of uncertainty in another.