Clinical Window Educational Program is sponsored by GE Healthcare
Automated Analysis of ECG Rhythm
Correspondence: Prof. Paul Kligfield, MD, Division of Cardiology, 525 East 68th Street, New York, New York 10021, (E-mail and other contact info can be obtained from CWWJ's Editor-in-Chief).

The electrocardiogram (ECG) records the electrical activity of the heart from the body surface, and it has been in general use worldwide for much of the past century (1). It is the first-line tool for the rapid determination of heart rhythm and for the immediate evaluation of, and guidance of therapy in, patients with suspected myocardial infarction and other types of acute coronary syndrome. The ECG is also used for the definition of intracardiac conduction, for the assessment of chamber hypertrophy and enlargement, and for the evaluation of many other aspects of cardiac physiology that affect its waveforms. It can be estimated from prior data that about 100 million 12-lead ECGs are done each year in North America, with similarly prevalent use also in Europe and in Asia (2).

Most modern electrocardiographs convert the filtered analog ECG signal to digital form, now generally at 500 samples per second. Simultaneous acquisition of all 12 leads allows representative average or median complexes to be formed from a full 10 seconds of recorded data in each lead, which reduces noise and provides a template for analysis of each waveform. As a result of the advances in microprocessing that have evolved during the past four decades (3,4), digital electrocardiographs of all major manufacturers now are capable of providing automated diagnostic statements that can help the physician. However, at times automated diagnostic statements mislead the physician (5-7).

Diagnostic algorithms

Automated ECG analysis statements are based on evolving diagnostic algorithms that have been either statistical or rule-based (3). Interpretive statements may be based on measurements in individual leads or on waveform patterns and correlations.
Interpretive statements that depend on precise measurement of ECG amplitudes and durations can approach experienced readers in sensitivity, specificity, and reproducibility (8,9). A good example is left ventricular hypertrophy, where ECG detection depends on criteria based on precise measurement of voltage amplitudes and interval durations. Indeed, some newer criteria for the detection of left ventricular hypertrophy depend on integrated measurements of waveform area that can only be made by computer. Similarly, precise measurements of waveform durations can improve the consistency of recognition of myocardial infarction, while newer algorithms for quantification of infarction may be strongly dependent on computer analysis for practical use. However, statements that depend on waveform configuration, such as repolarization, and relationships between waveforms, such as irregular P waves, may be less accurate (10,11). This is particularly true for the diagnosis of cardiac rhythm, which requires assessment of the presence and shape of P waves that often are small or embedded in unexpected parts of the ECG signal. As a consequence, evolving algorithms for the interpretation of cardiac rhythm have improved over time (9,11-14) but remain imperfect (11,15-18).

How well do these programs work?

Surprisingly, despite the widespread use of computerized electrocardiography, there have been relatively few published studies of the performance of currently available computer-based diagnostic rhythm algorithms in large, unselected patient populations (10,16-18). Improvement in the Hewlett-Packard algorithm for correct diagnosis of a range of arrhythmias in the eight years following 1975 was documented by Floro and Laks in 1983 (19). Using an earlier-generation Hewlett-Packard algorithm (version 5), Thomson et al (10) in 1989 reported a 96.6% sensitivity and 97.0% specificity for sinus rhythm.
Although rhythms other than sinus were not separately quantified in that report, overall sensitivity for combined arrhythmia and AV block was 89.0%, with specificity of 90.5%, positive predictive value of 77.9%, and negative predictive value of 95.7%. Progressive improvement in test performance over three generations of GE/Marquette 12SL software to 1998 was found in a single-center prospective study by Reddy et al (16), with the newest version having a specificity for sinus rhythm of 91% and a sensitivity for atrial fibrillation of 87.5%. More recently, Farrell et al, using version 20 of the GE software algorithm in a large population pooled from several teaching hospitals (17), found that 4.1% of primary rhythms were changed by the reviewing physicians during the course of routine review of the tracings. Sensitivity and specificity for sinus rhythm were 98.2% and 85.5%, respectively, while sensitivity and specificity for atrial fibrillation were 89.0% and 99.4%. Sensitivity for both rhythms was improved with comparable specificities in comparison with an older version of the rhythm algorithm.

Our own examination

We recently examined the accuracy of computer-based rhythm interpretation from one major manufacturer of automated electrocardiographs (GE Healthcare Technologies MUSE software 005C, 12SL version 19) in 4297 consecutive recordings in a single university hospital setting (18). Over-reading was performed by one of two experienced cardiologists. All disagreements with the initial computer rhythm statement were reviewed by the second cardiologist to achieve consensus used as the "gold standard" for rhythm diagnosis. For analysis of the data, rhythm statements were separated into primary and secondary categories. "Primary" rhythm statements were defined as those describing the dominant cause of repetitive QRS activation; examples include sinus rhythm, non-sinus atrial rhythm, atrial fibrillation, AV junctional rhythm, and pacemaker rhythms.
"Secondary" rhythm statements were defined as the modifiers of primary rhythms; examples include supraventricular premature complexes, ventricular premature complexes, and types of AV block.

In our population of consecutive tracings, 13.2% (565 of 4297) of computer-based rhythm statements required revision (18). In this version of the 12SL program, there was particular difficulty with the detection of pacemakers and with the recognition of atrial tracking by dual-chamber devices. Since this requires a separate engineering solution that is currently in development, attention was focused on rhythm diagnoses in patients without pacemakers. Excluding tracings with pacemakers, the overall revision rate was 7.8% (307 of 3954), including 3.8% involving the primary rhythm diagnosis and 3.9% involving definition of ectopic complexes. Of the changes required in primary rhythms in unpaced patients, one-third occurred in patients in sinus rhythm (48 of 151), while two-thirds (103 of 151) occurred in patients with rhythms other than sinus. There were no episodes of AV block that exceeded first-degree in this population.

Sensitivity, specificity, and predictive values for primary rhythm diagnoses in this population are shown in Table 1. The false negative rate for sinus rhythm was only 1.3%, but a computer statement of sinus rhythm was incorrect in 9.9% of primary rhythms that were not sinus in origin, including patients with non-sinus atrial rhythm, atrial flutter, atrial tachycardia, atrial fibrillation, and AV junctional rhythm. The false negative rate for atrial fibrillation was 9.2%, while a computer statement of atrial fibrillation was incorrect in 1.1% of other rhythms, including patients with atrial tachycardia with varying AV block, sinus rhythm, non-sinus atrial rhythm, and atrial flutter.
True atrial premature complexes were originally misinterpreted as ventricular in origin in 8.9% of cases, while true ventricular premature complexes were misinterpreted as atrial in origin in 6.0% of cases.

Table 1: Sensitivity, specificity, and predictive values of a computer-based algorithm for the detection of primary ECG rhythms.
Reproduced from reference 18, with permission.

The performance of computerized algorithms

Do these findings indicate that performance of computerized algorithms for the interpretation of ECG rhythm is good, bad, or somewhere in between? The answer to this important question depends in part on what test characteristic is being considered, in part on the prevalence of the different rhythms in the population in which computer-based rhythm analysis is applied, in part on the consequences of a wrong diagnosis, and in part on what we mean by good and bad. What do these considerations tell us about how these tests perform in clinical practice?

Sensitivity is the proportion of people with
a rhythm diagnosis who are correctly detected by the test; as sensitivity
falls below 100%, more true instances of the rhythm are missed (false
negative tests). Specificity is the proportion of people without a rhythm
diagnosis who do not receive that computer-based diagnosis; as specificity
falls below 100%, the rhythm diagnosis is more often misapplied (false
positive tests). Positive predictive value represents the likelihood
that a rhythm statement is correct; as positive predictive value falls
below 100%, it becomes less certain that the computer-based diagnosis is
correct. Negative predictive value represents the likelihood that absence
of a rhythm statement means the rhythm is not present; as negative
predictive value falls below 100%, it becomes less certain that absence of
the diagnosis means it
is not present.

Atrial fibrillation

The prevalences of atrial fibrillation, atrial flutter, and atrial tachycardia in the population are 6.3%, 1.0%, and 0.9%, respectively. When prevalence of a rhythm is low, as is the case for all rhythms other than sinus, algorithm specificity must be very high to avoid a large absolute number of false positive diagnoses in the total population. The high, nearly perfect specificities for these diagnoses are good. The moderate sensitivity for atrial fibrillation that results from keeping specificity high is not as good, while sensitivities for atrial flutter and atrial tachycardia are bad. Even so, the positive predictive values for atrial fibrillation and atrial flutter approach good, while all negative predictive values are very good. In effect, the computer-based rhythm algorithm appears to minimize the total number of incorrect rhythm diagnoses (and maximize the total number of correct rhythm diagnoses) by using highly sensitive criteria for high prevalence rhythms and highly specific criteria for low prevalence rhythms. The algorithm minimizes false positive diagnoses for the uncommon rhythms at the cost of a low sensitivity, but because of the low prevalence of these findings, the absolute number of false negative diagnoses is relatively small in the total population. This may reflect an effort to balance overall performance of the algorithm, whose performance for different rhythms ranges from good to bad. Individual areas for improvement are apparent, but altering criteria for any single rhythm may have unintended consequences for all diagnoses.

Variations in population prevalence of rhythms

Another factor to consider is the prevalence of each rhythm diagnosis within populations, which must vary considerably. For example, a healthy ambulatory population would likely have an even higher prevalence of normal sinus rhythm and lower prevalences of the other rhythms than in a general inpatient and outpatient population.
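The dependence of predictive values on prevalence can be illustrated with a short calculation. The sketch below is not the authors' method and does not reproduce Table 1; it applies Bayes' rule to a sensitivity and specificity for atrial fibrillation inferred from the error rates quoted above (a 9.2% false negative rate, and an incorrect atrial fibrillation statement in 1.1% of other rhythms, read here as the false positive rate). The function name `predictive_values` is hypothetical, and the 1% and 20% prevalences are assumed values chosen only for contrast with the reported 6.3%.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) as fractions for a test applied at a given prevalence."""
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")

    # Expected fraction of the whole population in each cell of the 2x2 table.
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)

    # A predictive value is undefined when no test result of that kind occurs.
    positives = true_pos + false_pos
    negatives = true_neg + false_neg
    ppv = true_pos / positives if positives > 0 else float("nan")
    npv = true_neg / negatives if negatives > 0 else float("nan")
    return ppv, npv


# Inferred from the error rates quoted in the text (an interpretation, not Table 1).
AF_SENSITIVITY = 1.0 - 0.092
AF_SPECIFICITY = 1.0 - 0.011

if __name__ == "__main__":
    # 0.063 is the reported prevalence; 0.01 and 0.20 are assumed for illustration.
    for prevalence in (0.01, 0.063, 0.20):
        ppv, npv = predictive_values(AF_SENSITIVITY, AF_SPECIFICITY, prevalence)
        print(f"prevalence {prevalence:.1%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

With sensitivity and specificity held fixed, positive predictive value rises and negative predictive value falls as prevalence increases, which is why the same algorithm can look good in one population and less good in another.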
An algorithm that simply labeled all rhythms as sinus, with no attempt to make other diagnoses, might be 98% accurate in such a healthy ambulatory population. Conversely, the prevalence of atrial fibrillation might be higher, and sinus rhythm correspondingly lower, in elderly nursing home populations. In that setting, an algorithm with rhythm statements limited to sinus only would be significantly less accurate, perhaps 80-85%, with a large proportion (and large absolute number) of missed diagnoses. Because the missed diagnoses are generally clinically meaningful, it is important to continue efforts to correctly identify the non-sinus rhythms. Variations in population prevalence of rhythms will alter both positive and negative predictive values, as well as overall test accuracy (which represents the proportion of correct diagnoses) for any diagnostic algorithm. As a result, the same algorithm might be considered good in one clinical setting, while not so good in another.

Final conclusion

In summary, there have been substantial recent improvements in computer algorithms used for the identification of cardiac rhythm. At the same time, enough clinically important errors in automated diagnosis exist that the computer remains an adjunct to the physician (20) for ECG interpretation, not a substitute. Physician over-reading to correct computer-based ECG rhythm statements remains essential, even as progress continues.

References