It figures. On the very day that I posted a rather long post about a series of three papers discussing the use of mammography and MRI for screening women for breast cancer, there would have to be another paper relevant to the topic of the early detection of cancer, in this case again breast cancer. This one didn't get as much play in the media, but it fits in very well with the primary message of Part 1 and Part 2 of this series: that earlier detection is not necessarily better. This study, however, has a bit of a twist.
Now, despite the general tone of my commentary implying that newer, more sensitive technology like MRI might not necessarily be better for patients or produce better outcomes in terms of decreasing mortality due to breast cancer because of lead time and length bias, don’t misconstrue this as my being hostile to technology. I love technology. Indeed, there are times that I wish I had become a radiologist so that I could play with the high tech toys. However, I like science better, and I doubt I would have been able to devote the time necessary to learn how to play with all those high tech toys and still maintain an NIH-funded research laboratory. I also like evidence-based practice better, and I’d be more than happy to adopt screening MRI if it were shown to have a real benefit in terms of mortality or to use MRI with all cancer patients if it could be shown that it made a real difference in decreasing recurrence. Indeed, this message comes through again in another New England Journal of Medicine paper, entitled Influence of Computer-Aided Detection on Performance of Screening Mammography. You’d think that computer-aided detection on mammography would result in better performance and better identification of breast cancer.
You’d be wrong:
Background. Computer-aided detection identifies suspicious findings on mammograms to assist radiologists. Since the Food and Drug Administration approved the technology in 1998, it has been disseminated into practice, but its effect on the accuracy of interpretation is unclear.
Methods. We determined the association between the use of computer-aided detection at mammography facilities and the performance of screening mammography from 1998 through 2002 at 43 facilities in three states. We had complete data for 222,135 women (a total of 429,345 mammograms), including 2351 women who received a diagnosis of breast cancer within 1 year after screening. We calculated the specificity, sensitivity, and positive predictive value of screening mammography with and without computer-aided detection, as well as the rates of biopsy and breast-cancer detection and the overall accuracy, measured as the area under the receiver-operating-characteristic (ROC) curve.
Results. Seven facilities (16%) implemented computer-aided detection during the study period. Diagnostic specificity decreased from 90.2% before implementation to 87.2% after implementation (P<0.001), the positive predictive value decreased from 4.1% to 3.2% (P=0.01), and the rate of biopsy increased by 19.7% (P<0.001). The increase in sensitivity from 80.4% before implementation of computer-aided detection to 84.0% after implementation was not significant (P=0.32). The change in the cancer-detection rate (including invasive breast cancers and ductal carcinomas in situ) was not significant (4.15 cases per 1000 screening mammograms before implementation and 4.20 cases after implementation, P=0.90). Analyses of data from all 43 facilities showed that the use of computer-aided detection was associated with significantly lower overall accuracy than was nonuse (area under the ROC curve, 0.871 vs. 0.919; P=0.005).
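As a rough sanity check on these numbers, the reported sensitivities and specificities can be combined with the overall cancer prevalence (2351 cancers among 429,345 mammograms) using the standard positive-predictive-value formula. This is my own back-of-envelope arithmetic, not the paper's calculation, so it won't match the facility-level figures exactly:

```python
# Back-of-envelope check of the reported figures (my arithmetic, not the
# authors'): PPV = sens*p / (sens*p + (1 - spec)*(1 - p))
prevalence = 2351 / 429345  # cancers per screening mammogram (~0.55%)

def ppv(sens, spec, p=prevalence):
    """Positive predictive value from sensitivity, specificity, and prevalence."""
    return sens * p / (sens * p + (1 - spec) * (1 - p))

ppv_before = ppv(0.804, 0.902)  # before computer-aided detection: ~4.3%
ppv_after = ppv(0.840, 0.872)   # after: ~3.5%

# Extra false-positive recalls per 1000 screens implied by the specificity drop:
extra_recalls = (0.902 - 0.872) * (1 - prevalence) * 1000  # ~30 per 1000
print(f"PPV before: {ppv_before:.1%}, after: {ppv_after:.1%}")
print(f"Extra recalls per 1000 screens: {extra_recalls:.0f}")
```

The crude estimates (~4.3% and ~3.5%) land near the reported 4.1% and 3.2%, and the three-percentage-point drop in specificity by itself implies roughly 30 additional recalls per 1000 screens.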
It turns out that computer-assisted reading of mammography, at least as it is practiced now, may not be such a great thing after all. In reality, I have to admit that I was just as surprised by the results of this study as any mammographer. After all, it makes intuitive sense that screening mammograms with a computer algorithm to highlight parts of the films that might be suspicious for closer examination by a radiologist would result in better interpretation of the study. Oddly enough, it does not, and this was the largest, most comprehensive analysis of computer-aided detection in breast cancer screening to date. Although it's not a randomized trial but rather a longitudinal study, it controlled for confounding factors about as well as such a study can. Given that it's highly unlikely there will ever be a head-to-head randomized, double-blind comparison between computer-aided detection and old-fashioned radiologist reading of mammography, this is about as good as the evidence is likely to get on this particular question.
Key findings from this study are actually rather disturbing. My initial expectation upon hearing of the study was that the sensitivity of computer-aided mammography for detecting breast cancer would be greater. Such a result would then lead to exactly the same problem I mentioned with respect to MRI, specifically the detection of smaller tumors, with the resultant confounding factors of lead time and length biases making it difficult to determine whether the increased sensitivity saves lives. However, there was no significant increase in sensitivity, and the rate of cancer detection did not increase with computer-aided detection. Indeed, computer-aided detection was actually harmful, because the number of false-positive mammograms increased, resulting in more call-backs for more imaging and more biopsies. In other words, the downsides of increased sensitivity were realized without the benefit that actual increased sensitivity would bring, namely an increase in the diagnosis of invasive cancers. In addition, computer-assisted detection resulted in an increase in the detection of ductal carcinoma in situ (DCIS). The reason is that most DCIS is detected because of clusters of calcium deposits (known as clustered microcalcifications), which show up quite well on mammography and are more often associated with DCIS than with invasive cancer. The relationship between DCIS and invasive breast cancer is not entirely clear. All invasive breast cancers probably arise from DCIS, but many, if not most, cases of DCIS never progress to invasive breast cancer within a woman's lifetime. Because it is estimated that only 10% of the decrease in mortality attributable to screening mammography comes from the improved diagnosis of DCIS, identifying more DCIS may well not significantly lower mortality from breast cancer. Moreover, this questionable benefit comes at a high cost:
“Our study suggests that this technology may not offer a benefit in the way people would have hoped,” Dr. Fenton told Medscape. The investigators explain that approximately 157 women would be recalled and 15 women would undergo biopsy to detect 1 additional case of cancer, possibly a ductal carcinoma in situ. After accounting for the additional fees for the use of computer-aided detection and the costs of diagnostic evaluations after recalls resulting from the use of the technology, the group calculates that system-wide use could increase the annual national costs of screening mammography by approximately 18%.
Here's what I think is going on, at least to some extent. By way of background, it's known that the more experienced the mammographer, the less computer-aided detection helps. Consequently, what is probably happening is that the computer biases the mammographer to look more closely at the flagged areas. Less experienced and less confident mammographers are more likely to call an abnormality where the computer has placed a mark, and even more experienced mammographers may find the computer's call hard to resist. Indeed, this has been pointed out as a potential weakness of the study in an accompanying editorial:
One possible flaw in the study by Fenton et al. was the failure to assess the time it takes to adjust to computer-aided detection. Mammographers initially exposed to computer-aided detection may be unduly influenced by the three to four marks the software places on each mammogram, with the necessity to ignore the 1000 to 2000 false positive marks for every true positive mark. The adjustment to computer-aided detection has been estimated to take weeks to years.
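To get a feel for what those numbers mean for the reader of the films, here is a quick illustrative calculation based purely on the editorial's stated ranges (3 to 4 marks per mammogram, 1000 to 2000 false positive marks per true positive), not on any data from the paper itself:

```python
# Illustrative arithmetic from the editorial's ranges, not the paper's data.
def mark_precision(false_per_true):
    """Chance that any single computer-placed mark is a true positive."""
    return 1 / (1 + false_per_true)

def films_per_true_mark(false_per_true, marks_per_film):
    """Roughly how many mammograms a reader sees between true-positive marks."""
    return (1 + false_per_true) / marks_per_film

# Each individual mark has only a ~0.05-0.1% chance of pointing at a cancer:
best = mark_precision(1000)    # ~0.10%
worst = mark_precision(2000)   # ~0.05%

# At 3-4 marks per film, a true-positive mark turns up only once every few
# hundred mammograms; every other mark must be recognized and dismissed.
films_best = films_per_true_mark(1000, 4)   # ~250 films
films_worst = films_per_true_mark(2000, 3)  # ~667 films
```

If a reader has to dismiss computer marks hundreds of times in a row before one actually matters, it is easy to see how the marks would nudge borderline calls toward recall rather than sharpen the reading.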
This to me suggests a problem with the computer algorithm: it is tuned to be too sensitive. Even if a mammographer has the wherewithal to decide, after looking at the vast majority of those extra marks, that there's nothing there, it would take a very high ability to recognize the computer's false positives to avoid calling at least some of them suspicious. It is certainly possible that improved software and imaging technology will decrease this problem, but I highly doubt they will eliminate it. After all, the selling point of most image analysis software is precisely that it picks these things up. Also, as more and more mammography facilities purchase digital equipment, image analysis software designed to assist in reading digital mammography images will likely be offered along with it. A computer-assisted system requires a digital image, and thus digital mammography and computer-assisted systems complement each other.
Indeed, none of this means that digital mammography or computer-aided detection is going away. Even though studies have in general failed to show that digital mammography alone improves sensitivity or the ability to identify breast cancers, and the above study has shown serious problems with computer-assisted detection, the software used has already been updated. Moreover, digital mammography is a boon for storing mammography studies without film, and it allows for teleradiology, in which digitized images can be interpreted remotely by more experienced mammographers. These are not insignificant advantages.
Technology marches on, and I'd be surprised if, in five years or so, the problems with computer-aided detection revealed in this paper haven't been mostly ironed out. However, this study is a cautionary tale about the introduction of new technology before it's fully ready for prime time, something that happens a lot. Promising initial studies led to FDA approval of computer-aided detection systems in 1998, and Medicare and many insurance companies now pay extra for digital mammography and the use of these systems, creating a financial incentive, albeit not a very large one given the high initial cost of these systems, for radiology groups to adopt them. Once again, it's another example of newer and spiffier not necessarily meaning better.