ROC Curves Analysis
Introduction
Receiver operating
characteristic (ROC) curves are used in medicine
to determine a cutoff value for a clinical test. For
example, the cutoff value of 4.0 ng/ml was determined
for the prostate specific antigen (PSA) test
for prostate cancer. A test value below 4.0
is considered to be normal and above 4.0 to be
abnormal. Clearly there will be patients with
PSA values below 4.0 that are abnormal (false
negative) and those above 4.0 that are normal
(false positive). The goal of an ROC curve analysis
is to determine the cutoff value.
Assume that there
are two groups of men and by using a "gold standard" technique
one group is known to be normal (negative), not
have prostate cancer, and the other is known
to have prostate cancer (positive). A blood
measurement of prostate-specific antigen is made
in all men and used to test for the disease. The
test will find some, but not all, abnormals to
have the disease. The ratio of the abnormals
found by the test to the total number of abnormals
known to have the disease is the true positive
rate (also known as sensitivity). The test will
find some, but not all, normals to not have the
disease. The ratio of the normals found by the
test to the total number of normals (known from
the gold standard technique) is the true negative
rate (also known as specificity). The hope is
that the ROC curve analysis of the PSA test will
find a cutoff value that will, in some way, minimize
the number of false positives and false negatives. Minimizing
the false positives and false negatives is the
same as maximizing the sensitivity and specificity.
For the PSA test
abnormal values are large (> 4) and normal
values are small (< 4). This is not always
the case, however, so the present program allows
for both conditions of abnormal being larger
and abnormal being smaller.
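The definitions above can be sketched in a few lines of Python. This is only an illustration, not the program's own code: the function name `sensitivity_specificity`, the `positive_high` flag and the PSA values are all hypothetical.

```python
def sensitivity_specificity(negatives, positives, cutoff, positive_high=True):
    """Return (sensitivity, specificity) for one cutoff.

    positive_high=True means values above the cutoff are called abnormal
    (as with PSA); False means values below the cutoff are called abnormal.
    """
    if positive_high:
        true_pos = sum(1 for v in positives if v > cutoff)
        true_neg = sum(1 for v in negatives if v <= cutoff)
    else:
        true_pos = sum(1 for v in positives if v < cutoff)
        true_neg = sum(1 for v in negatives if v >= cutoff)
    return true_pos / len(positives), true_neg / len(negatives)


# Hypothetical PSA values (ng/ml), invented for illustration only.
normal_psa = [1.2, 2.5, 3.1, 4.6, 0.8]
cancer_psa = [3.5, 5.2, 6.8, 9.4, 4.9]
sens, spec = sensitivity_specificity(normal_psa, cancer_psa, cutoff=4.0)
print(sens, spec)
```

With these invented values one normal lies above 4.0 (a false positive) and one abnormal lies below it (a false negative).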
The ROC curve is
a graph of sensitivity (y-axis) vs. 1 − specificity
(x-axis). An example is shown in Figure 1. Maximizing
sensitivity corresponds to some large y value
on the ROC curve. Maximizing specificity corresponds
to a small x value on the ROC curve. Thus a
good first choice for a test cutoff value is
that value which corresponds to a point on the
ROC curve nearest to the upper left corner of
the ROC graph. This is not always the best choice, however. For
example, in some screening applications it is
important not to miss an abnormal, so
it is more important to maximize sensitivity
(minimize false negatives) than to maximize specificity. In
this case the optimal cutoff point on the ROC
curve will move from the vicinity of the upper
left corner over toward the upper right corner. In
prostate cancer screening, however, because benign
enlargement of the prostate can lead to abnormal
(high) PSA values, false positives are common
and undesirable (expensive biopsy, emotional
impact). In this case maximizing specificity
is important (moving toward the lower left corner
of the ROC curve).

Figure 1. An example ROC curve.
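As a rough sketch of the "nearest to the upper left corner" rule, the following Python code builds the ROC points for one test and picks the cutoff closest to the point (0, 1). The function names and data are hypothetical, and placing candidate cutoffs midway between adjacent data values is an assumption made for this example.

```python
import math

def roc_points(negatives, positives, positive_high=True):
    """Return a list of (cutoff, sensitivity, specificity) tuples."""
    values = sorted(set(negatives) | set(positives))
    # Assumption: candidate cutoffs are midpoints between adjacent values.
    cutoffs = [(a + b) / 2 for a, b in zip(values, values[1:])]
    points = []
    for c in cutoffs:
        if positive_high:
            tp = sum(v > c for v in positives)
            tn = sum(v < c for v in negatives)
        else:
            tp = sum(v < c for v in positives)
            tn = sum(v > c for v in negatives)
        points.append((c, tp / len(positives), tn / len(negatives)))
    return points

def nearest_to_corner(points):
    """Pick the point closest to x = 1 - specificity = 0, y = sensitivity = 1."""
    return min(points, key=lambda p: math.hypot(1 - p[2], 1 - p[1]))

# Hypothetical data, invented for illustration only.
negatives = [1, 2, 3, 4, 6]
positives = [5, 7, 8, 9]
best = nearest_to_corner(roc_points(negatives, positives))
print(best)
```

This rule weights false positives and false negatives equally, which is why it is only a first choice.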
An important measure
of the accuracy of the clinical test is the area
under the ROC curve. If this area is equal to
1.0 then the ROC curve consists of two straight
lines, one vertical from (0,0) to (0,1) and the next
horizontal from (0,1) to (1,1). This test is 100%
accurate because both the sensitivity and specificity
are 1.0 so there are no false positives and no
false negatives. On the other hand a test that
cannot discriminate between normal and abnormal
corresponds to an ROC curve that is the diagonal
line from (0,0) to (1,1). The ROC area for this
line is 0.5. ROC curve areas are typically between
0.5 and 1.0, as shown in Figure 1.
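The two limiting areas can be checked numerically. The sketch below uses the known equivalence between the area under the empirical ROC curve and the fraction of (abnormal, normal) pairs that the test ranks correctly, with ties counted as one half. The function name `roc_area` is hypothetical and this is not necessarily how the program computes the area.

```python
def roc_area(negatives, positives, positive_high=True):
    """Area under the empirical ROC curve via pairwise comparison.

    Each (positive, negative) pair scores 1 if the positive value lies on
    the abnormal side of the negative value, 0.5 for a tie, 0 otherwise.
    """
    score = 0.0
    for p in positives:
        for n in negatives:
            if p == n:
                score += 0.5
            elif (p > n) == positive_high:
                score += 1.0
    return score / (len(positives) * len(negatives))

perfect = roc_area([1, 2, 3], [4, 5, 6])        # groups do not overlap
useless = roc_area([1, 2, 3, 4], [1, 2, 3, 4])  # groups are identical
print(perfect, useless)
```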
Two or more tests
can be compared by statistically comparing the
ROC areas for each test. The tests may be correlated
because they come from multiple measurements
on the same individuals, or they may be uncorrelated
because they come from measurements on different
individuals. The ROC Curves Analysis Module
refers to these cases as "Paired" and "Unpaired", respectively,
and can analyze either situation.
The test measurements
may contain missing values and two methods are
provided to handle missing values when comparing
ROC areas: pairwise deletion and casewise deletion. These
are described in detail later.
Given a value for
the probability that the patient has the disease
(pre-test probability) the probability that the
patient has the disease, given the value of the
test measurement, can be computed. Also, given
a value for the false-positive/false-negative
cost ratio (for the screening example above,
the false-negative cost would be greater than
the false-positive cost), an optimal test value
cutoff can be computed. The present program
allows entry of the pre-test probability and
the false-positive/false-negative cost ratio.
Data Entry
Data can be entered
in two formats in SigmaPlot: Indexed and Grouped.
Indexed Data Format
This is the format
found in statistics programs such as SYSTAT and
SigmaStat. "Indexed" is the terminology used
in SigmaStat. It has one column that indexes
another column (or other columns). It is also
the format of the output of logistic regression
where ROC curves are used to determine the ability
of different logistic models to discriminate
negative from positive test results (normals
from abnormals). Each data set consists of a
pair of columns: a classification variable and
a test variable. The classification variable
has a binary state that is either negative (normal)
or positive (abnormal). Many programs use a
value of 1 for positive and 0 for negative. The
classification variable is required to be located
in column 1 of the worksheet. The test variable
is a continuous numeric variable and contains
the test results. A single test variable will
be located in column 2. Multiple test variables
will be located in multiple columns starting
in column 2. There is no built-in limit for
the number of test variables. There is only
one classification variable for multiple test
variables and it is located in column 1. The
test variable columns must be left justified
and contiguous. Therefore no empty columns to
the left of or within the data are allowed.
The following example
shows a few rows of data for two data sets. The
first column is the classification variable. It
contains a column title "Thyroid Function" which
is the classification variable name. It also
contains the two classification states "Hypothyroid" and "Euthyroid" (normal
thyroid function). Hypothyroid and Euthyroid
are the abnormal and normal classification states,
respectively. T4 and T5 are the names of different
blood tests that will be used in the ROC analysis
to discriminate between normal and abnormal and
then compared to determine which is the better
test. The classification variable must be in
column 1 and the two test variables in the two
columns adjacent to it.
The classification
variable name will be obtained from the column
1 column title if it exists. The test names
will be obtained from the column titles of the
test variable columns if they exist. The classification
state names will be obtained from the entries
in the cells of column 1. If no column titles
have been entered for the test variables then
default names for the tests, "Test 1", "Test
2", etc., will be used and displayed in the graphs
and reports. The test variable names should
be unique; if they are not, the program will add
a subscript to any duplicated names.

Figure
2. Indexed data format for two tests. The test
names are T4 and T5, the classification states
are Euthyroid and Hypothyroid and the Classification
variable name is Thyroid Function. The index
column is always column 1 and data columns must
be left justified.
There must be two or more non-missing
data points for each test for each classification
state. Missing values are handled automatically
by the analysis. For data columns, missing values
are everything but numeric values (blank cells,
the SigmaPlot double-dash missing value symbol, "+inf", "-inf", "NaN",
etc.). Missing values are ignored for all computations
except the Paired area comparison (see the Missing
Value Method section) where they are handled using
one of two possible algorithms.
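The Indexed layout and the missing-value rule can be illustrated with a short Python sketch. It is not the program's implementation: the helper names and worksheet rows are hypothetical, and Python indexes columns from 0, so worksheet column 1 is `row[0]` here.

```python
import math

def to_number(cell):
    """Return a finite float, or None if the cell counts as missing."""
    try:
        value = float(cell)
    except (TypeError, ValueError):
        return None  # blank cells, "--" and other non-numeric text
    return value if math.isfinite(value) else None  # drops inf and NaN

def group_indexed(rows, test_column=1):
    """Group one test column by the classification state in column 0."""
    groups = {}
    for row in rows:
        value = to_number(row[test_column])
        if value is not None:
            groups.setdefault(row[0], []).append(value)
    return groups

# Hypothetical worksheet rows: (Thyroid Function, T4, T5).
rows = [
    ("Euthyroid", "8.1", "7.9"),
    ("Euthyroid", "--", "6.5"),
    ("Euthyroid", "9.4", "NaN"),
    ("Hypothyroid", "3.2", "4.1"),
    ("Hypothyroid", "+inf", "5.0"),
    ("Hypothyroid", "4.4", ""),
]
t4 = group_indexed(rows, test_column=1)
t5 = group_indexed(rows, test_column=2)
# Each test needs two or more non-missing points per classification state.
enough_data = all(len(v) >= 2 for g in (t4, t5) for v in g.values())
print(t4, t5, enough_data)
```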
Grouped Data Format
The grouped data
format consists of pairs of data columns, one
pair for each test. One column in a data pair
contains the negative (normal) data values
and the other column contains the positive (abnormal)
values. So, for example, if two tests are to
be compared, the worksheet will contain four
columns of data: the first two columns for the
first test and the third and fourth columns for
the second test.
A specific column
title format is used to identify the test associated
with the data column pair and the classification
states within each pair. The user is encouraged
to use this format since it clearly identifies
the data in the data worksheet and will annotate
all the graphs and reports generated. Column
titles are not required: without them the program
identifies column pairs starting in column
1, generates the test names Test 1, Test
2, etc., and arbitrarily assigns the classification
state names "1" and "0" to the first and second columns,
respectively. This is not the best
way to organize the data, however, because generated test names
and numerical classification states make the
results more difficult to interpret.
Column Title Convention for Grouped Data
This column title convention
is a simple way to identify worksheet data for
the Grouped data format. The following example
shows a few rows for two data sets. The first
two columns contain the data for the T4 test. The
first column "T4 - Euthyroid" is the column with
the normal data for test T4. The column title
consists of the test name followed by a minus sign
followed by the classification state. Spaces on
either side of the minus sign are ignored. The
second column "T4 - Hypothyroid" is the column
with the abnormal data for test T4. The third
and fourth column titles are the same as the first
two except the second test name T5 is used.

Figure 3. Grouped data
format for two tests. This is the same data as
in Figure 2. There are two tests, T4 and T5. Each
test consists of a pair of data columns. In this
case T4 is in columns 1 and 2 and T5 in columns
3 and 4. The "Test-State" column title format
is used to identify the two tests and the normal
(Euthyroid) and abnormal (Hypothyroid) states.
The test names
in both columns of a column pair must be the
same. Also there must be exactly two classification
states in the column titles.
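The "Test - State" title convention and the two rules above can be sketched as a small Python parser. This is an illustration only: the function name is hypothetical, and splitting at the first minus sign is a simplifying assumption that would mishandle a test name that itself contains a minus sign.

```python
def parse_grouped_titles(titles):
    """Map each test name to its ordered list of classification states."""
    tests = {}
    for title in titles:
        test, sep, state = title.partition("-")
        if not sep:
            raise ValueError(f"no minus sign in column title: {title!r}")
        # Spaces on either side of the minus sign are ignored.
        tests.setdefault(test.strip(), []).append(state.strip())
    all_states = {s for states in tests.values() for s in states}
    if len(all_states) != 2:
        raise ValueError("exactly two classification states are required")
    for test, states in tests.items():
        if len(states) != 2 or set(states) != all_states:
            raise ValueError(f"test {test!r} needs one column per state")
    return tests

titles = ["T4 - Euthyroid", "T4 - Hypothyroid", "T5-Euthyroid", "T5 - Hypothyroid"]
parsed = parse_grouped_titles(titles)
print(parsed)
```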
Like Indexed format,
missing values in the worksheet cells are ignored
except for special handling when comparing ROC
areas (see the Missing Value Method section).
Program Options
Selecting
ROC Curves from the SigmaPlot Toolbox menu opens
the dialog.

Test and classification state
names from the indexed data shown in Figure 2 of
the Data Entry section are displayed in this dialog.
Data Selection Options
Data Format (Automatic Determination)
In most cases the program will
identify the data format from the information in
the data worksheet. In the dialog above the format
was identified as Indexed. You may also select
either of the two formats, Indexed or Grouped, yourself.
Available Data Sets and Selected Data Sets
Select one or more of the available
data sets by clicking on them in the Available
Data Sets window and then clicking on the Add button. If
desired, you may then select a test name in the
Selected Data Sets window and click Remove to deselect
the test.

Data Type
If two or more data sets
are selected then the Data Type option for correlated
tests is made available.

You may select
either Paired, for correlated tests, or Unpaired. If
Paired is selected the ROC areas and area comparisons
are determined using the DeLong, DeLong and Clarke-Pearson
method (2). If Unpaired is selected
the areas are computed using the Hanley and McNeil
method (3) and the areas are compared
using a Z test.
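For the Unpaired case, a minimal sketch of the comparison might look as follows. It uses the standard error approximation published by Hanley and McNeil (3) and a two-sided Z test on the difference of two independent areas. Whether the program uses exactly this variant is an assumption, and the function names and input values are hypothetical.

```python
import math

def hanley_mcneil_se(area, n_pos, n_neg):
    """Standard error of an ROC area (Hanley and McNeil approximation)."""
    q1 = area / (2 - area)
    q2 = 2 * area ** 2 / (1 + area)
    variance = (area * (1 - area)
                + (n_pos - 1) * (q1 - area ** 2)
                + (n_neg - 1) * (q2 - area ** 2)) / (n_pos * n_neg)
    return math.sqrt(variance)

def unpaired_z(area1, se1, area2, se2):
    """Z statistic and two-sided P value for two independent ROC areas."""
    z = (area1 - area2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return z, p

# Hypothetical areas and sample sizes, invented for illustration only.
se_a = hanley_mcneil_se(0.85, n_pos=50, n_neg=50)
se_b = hanley_mcneil_se(0.75, n_pos=40, n_neg=60)
z, p = unpaired_z(0.85, se_a, 0.75, se_b)
print(se_a, se_b, z, p)
```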
Missing Value Method
If missing values exist then
two options are available for the pairwise comparison
of ROC areas: Pairwise Deletion and Casewise Deletion. This option is not available if no missing values exist.

Pairwise deletion
deletes a row only when it contains a missing value
in one of the two columns of the particular pair being analyzed,
not when a missing value occurs anywhere in the row. Fewer data values are
deleted using this method. There are situations
in which pairwise deletion will fail, but it is
the option to use whenever possible. Casewise
deletion deletes all cells in any row of data
containing a missing value. Much more data may
be deleted using this option. To better understand
the difference, consider a simple example of
two data columns of equal length, one of which
has no missing values and the other has one missing
value. When ROC areas are being compared, certain
computations on these two columns will be done
pairwise: the first column with itself, the
first column with the second column and the second
column with itself. When the column without
a missing value is being compared with itself
no row deletions occur for pairwise deletion. For
casewise deletion, however, the row that contains
the missing value will be deleted from both data
sets. So, for casewise deletion, the computation
involving the column without a missing value
with itself will be done with one row deleted
(the row corresponding to the missing value in
the other data set). The program determines
when pairwise deletion is not valid and informs
the user when this is the case.
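The two-column example above can be reproduced with a short Python sketch. The function names and data are hypothetical, and `None` stands in for a missing worksheet cell.

```python
def casewise_delete(columns):
    """Drop every row in which any column has a missing value (None)."""
    keep = [i for i in range(len(columns[0]))
            if all(col[i] is not None for col in columns)]
    return [[col[i] for i in keep] for col in columns]

def pairwise_delete(columns, a, b):
    """Drop only rows where column a or column b is missing."""
    keep = [i for i in range(len(columns[a]))
            if columns[a][i] is not None and columns[b][i] is not None]
    return [columns[a][i] for i in keep], [columns[b][i] for i in keep]

# Hypothetical data: the second column has one missing value.
first = [5.1, 6.3, 7.0, 8.2]
second = [4.9, None, 6.6, 7.8]
columns = [first, second]

self_pair = pairwise_delete(columns, 0, 0)   # first column with itself
cross_pair = pairwise_delete(columns, 0, 1)  # first with second
casewise = casewise_delete(columns)
print(len(self_pair[0]), len(cross_pair[0]), len(casewise[0]))
```

Pairwise deletion keeps all four rows when the first column is compared with itself, whereas casewise deletion leaves only three rows for every computation.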
Positive State Options - Classification State and Direction
The two classification
states are referred to as "Negative" (normal)
or "Positive" (abnormal). The ROC analysis software
must be informed which state is "Positive" and
whether the test measurement values for the positive
state are "High", meaning higher than those of
the negative state, or "Low", meaning lower than
those of the negative state.
Accepted normal
values for the PSA (prostate specific antigen)
test are less than 4 ng/ml and abnormal values
are higher than this. Thus if the two classification
states names are "positive" and "negative" then
the Positive state is "positive" and the Positive
Direction is "High". In this case you would
select the radio button next to "positive" and "High".

On the other hand, for the T4
(thyroxine) test for hypothyroidism the T4 values
are lower in the abnormal state than for the normal
state. In this case the abnormal Positive State
is "Hypothyroid" and the Positive Direction is "Low". So
you would select the radio button next to "Hypothyroid" and "Low".

What happens if you select the
incorrect option? Sensitivity (specificity) is
defined in terms of the positive (negative) state. So
if the positive state is incorrectly selected then
sensitivity and specificity will be incorrectly
defined (switched) and the ROC curve will have
the X and Y axes switched. This will result in
an ROC curve that appears below the diagonal unity
line. It will have an area less than 0.5. The
program will detect this and offer the options Abort, Retry and Ignore.

It is possible that there is
something wrong with the data so you can Abort
the analysis and correct the problem. More likely
you have selected the incorrect positive state
or direction so you can Retry the analysis with
correct selections. On rare occasions with multiple
tests, some tests will have areas greater than 0.5
and one or more will have areas less than 0.5. In
this case you can Ignore this warning and continue
with the analysis.
Report Options
Confidence Intervals
Confidence intervals are computed
for statistics in both the Sensitivity & Specificity
and Area Comparison reports. You can generate
90, 95 and 99% confidence intervals.
Create Sensitivity and Specificity Report
Cutoff values are created between
each test data value in the (sorted) data set. If
there are a large number of data points and several
tests then there will be a large number of cutoff
values and the Sensitivity & Specificity Report
can be very long. The Create Sensitivity and Specificity Report checkbox
allows you to turn off this
report. If you turn off this report then all report
options in the dialog below this are not required
and are disabled.
Fractions/Percents
You may display sensitivities,
specificities and probabilities in either fraction
or percent format. Selecting Percents also requires
the pre-test probability to be entered as a percent.
Create Post-Test Results
Selecting this option allows
entry of the pre-test probability. It also enables
the possible entry of the false-positive/false-negative
cost ratio. Given a pre-test probability the program
will create post-test probabilities, both the positive
predictive value (PV + = probability of disease
given a positive test result) and the negative
predictive value (PV - = probability of no disease
given a negative test result), for each cutoff
value. If the cost ratio option is selected then
the optimal cutoff value will be computed. All
of these results are displayed for each test in
the Sensitivity & Specificity report.
ROC Graph Options
All of the graph options in
the dialog apply to the ROC graph. They allow
you to add a diagonal line to the graph, add grid
lines, add symbols for sensitivity and specificity
at each cutoff point and change the ROC plot lines
from solid to different line styles.
Analysis Results
Introduction
Typical results of the ROC analysis
are shown in the following example from the Notebook
Manager.

The first section, entitled "Ovarian
Cancer", contains the worksheet with the raw
data. The program created the next three sections,
which contain two graphs and two reports. The two graphs
- ROC Curves
- Dot Histogram
and the two reports
- Sensitivity & Specificity
- ROC Areas
are described in
the next sections.
ROC Curves Graph
The ROC curves graph for three
data sets is shown in Figure 4. The curves are
derived from numerical results in the worksheet
entitled Graph Data. The graph title is obtained
from the section name containing the raw data. The
legend shows the test names and the ROC areas for
each curve. The diagonal line and grids options
were selected for this graph.

Figure 4. The ROC curves
graph for three tests.
Of course this graph can be
edited in any way you wish. You might want to
change the starting color of the color scheme used
for the line colors. You can do this by double
clicking on one of the ROC plot lines and then
right clicking on the Line Color listbox as shown
next.

Dot Histogram Graph
Dot histograms for the data
associated with the ROC curves in Figure 4 are
shown in Figure 5.

Figure 5. Dot histogram
pairs for each test. The horizontal lines and
the tables below the graph show the optimal cutoff
values determined from the pre-test probability
and cost ratio.
The graph title
is obtained from the title of the section containing
the raw data. The x-axis tick labels are obtained
from the test names and the classification state
names. The tick labels will rotate if they are
too long to fit horizontally. The symbol layout
design allows for symbols to touch horizontally
and nest vertically.
If values for pre-test
probability and false-positive/false-negative
cost ratio are entered then the optimal cutoff
values for each test are computed and represented
as a horizontal line across the two dot histograms
for each test. The numeric values for the optimal
cutoff parameters are shown as tables below the
x-axis.
Sensitivity & Specificity Report
The Sensitivity & Specificity
report contains results for all tests, with additional
test results placed in report rows below those
of prior tests. The results for each test can
be separated into three parts: 1) optimal cutoff
value, 2) sensitivity and specificity versus
cutoff values and 3) likelihood ratios and post-test
probabilities.
If values for
both pre-test probability and cost ratio have
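been entered then the optimal cutoff is calculated. The
slope m of the tangent to the ROC curve is
defined in terms of the two entered values (P
= pre-test probability; see reference 1):
$$m = \frac{\text{false-positive cost}}{\text{false-negative cost}} \cdot \frac{1 - P}{P} \tag{1}$$
The optimal cutoff
value is computed from sensitivity and specificity
using the slope m by finding the cutoff
that maximizes the function (see reference 1)
$$f_m = \text{sensitivity} - m\,(1 - \text{specificity}) \tag{2}$$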
The results of this computation in the Sensitivity & Specificity
report are shown in Table 1.

Table
1. Optimal cutoff results in the Sensitivity & Specificity
report.
For this data set,
the optimal cutoff is 7.125 for a pre-test probability
of 0.5 and cost ratio of 1.0.
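The optimal cutoff search can be sketched in Python as follows. The sketch assumes the form given by Zweig and Campbell (1): a slope m equal to the false-positive/false-negative cost ratio times (1 − P)/P, and a cutoff chosen to maximize sensitivity − m(1 − specificity). The function name and the table of values are hypothetical.

```python
def optimal_cutoff(table, pretest_probability, cost_ratio):
    """Return the (cutoff, sensitivity, specificity) row that maximizes
    sensitivity - m * (1 - specificity).

    cost_ratio is the false-positive cost divided by the false-negative cost.
    """
    m = cost_ratio * (1 - pretest_probability) / pretest_probability
    return max(table, key=lambda row: row[1] - m * (1 - row[2]))

# Hypothetical (cutoff, sensitivity, specificity) rows, for illustration only.
table = [
    (3.0, 1.00, 0.40),
    (5.0, 0.90, 0.70),
    (7.0, 0.80, 0.90),
    (9.0, 0.50, 1.00),
]
balanced = optimal_cutoff(table, pretest_probability=0.5, cost_ratio=1.0)
rare_disease = optimal_cutoff(table, pretest_probability=0.1, cost_ratio=1.0)
print(balanced, rare_disease)
```

With P = 0.5 and a cost ratio of 1.0 the slope is 1, so the rule reduces to maximizing sensitivity + specificity. A low pre-test probability raises the slope and pushes the optimum toward higher specificity.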
Sensitivities,
specificities and their confidence intervals
are listed as a function of cutoff value in the
second part of the report. A portion of these
results is shown in Table 2. These results can
be expressed as fractions or percents by using
the Fractions/Percents option.

Table
2. Sensitivity and specificity results in the
Sensitivity & Specificity report.
The third part of the Sensitivity & Specificity
report contains the likelihood ratios and post-test
probabilities.
The positive and negative
likelihood ratios are defined respectively as
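$$\mathrm{LR+} = \frac{\text{sensitivity}}{1 - \text{specificity}} \tag{3}$$
$$\mathrm{LR-} = \frac{1 - \text{sensitivity}}{\text{specificity}} \tag{4}$$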
The post-test probabilities
are the probability of disease given a positive
test (PV+) and the probability of no disease given
a negative test (PV-). These will be computed
when a pre-test probability has been entered. Using
P = pre-test probability, the equations used for
these probabilities are
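$$\mathrm{PV+} = \frac{P \cdot \text{sensitivity}}{P \cdot \text{sensitivity} + (1 - P)(1 - \text{specificity})} \tag{5}$$
$$\mathrm{PV-} = \frac{(1 - P) \cdot \text{specificity}}{(1 - P) \cdot \text{specificity} + P\,(1 - \text{sensitivity})} \tag{6}$$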
A portion of the report showing the likelihood ratio and post-test probability results
is shown in Table 3.
Table
3. Positive and negative likelihood ratios,
LR+ and LR-, and post-test probabilities, PV+
and PV-, in the Sensitivity & Specificity
report.
The positive likelihood ratio
is not defined for cutoff values where specificity
= 1, because 1 − specificity is then zero.
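The quantities in this part of the report can be sketched in Python using the standard definitions: LR+ = sensitivity / (1 − specificity), LR− = (1 − sensitivity) / specificity, and the predictive values from Bayes' rule. The function names and input values are hypothetical, and returning `None` for an undefined ratio is a choice made for this example.

```python
def likelihood_ratios(sens, spec):
    """Return (LR+, LR-); an undefined ratio is returned as None."""
    lr_pos = sens / (1 - spec) if spec < 1 else None  # undefined at specificity = 1
    lr_neg = (1 - sens) / spec if spec > 0 else None  # undefined at specificity = 0
    return lr_pos, lr_neg

def predictive_values(sens, spec, p):
    """Return (PV+, PV-) for pre-test probability p, with 0 < p < 1.

    Assumes the denominators are non-zero, which holds unless the test
    never returns a positive (or never returns a negative) result.
    """
    pv_pos = p * sens / (p * sens + (1 - p) * (1 - spec))
    pv_neg = (1 - p) * spec / ((1 - p) * spec + p * (1 - sens))
    return pv_pos, pv_neg

# Hypothetical sensitivity, specificity and pre-test probability.
lr_pos, lr_neg = likelihood_ratios(0.9, 0.8)
pv_pos, pv_neg = predictive_values(0.9, 0.8, p=0.5)
print(lr_pos, lr_neg, pv_pos, pv_neg)
```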
ROC Areas Report
The ROC Areas report consists
of two parts: 1) ROC areas and their associated
statistics and 2) pairwise comparison of ROC areas. An
example of a report is shown in Table 4.

Table
4. An example ROC Areas report. From top to
bottom it shows the type of analysis used together
with the missing value method, the ROC areas
and associated statistics and a pairwise comparison
of ROC areas.
In this case there
are three correlated tests. Row two of the report
shows that a Paired Analysis was performed and,
since there were missing values in the data,
Pairwise Deletion of missing values was selected
to compare the areas.
The first section
of the report shows the ROC curve areas for the
three tests. This is followed by the standard
error of the area estimate, the 95% confidence
interval (90% and 99% are also available) and
the P value that determines if the area value
is significantly different from 0.5. The sample
size and the number of missing values for each
classification state are given. The number of
missing values reflects only what is seen in
the data and does not give the number used for
each computation-pair in the pairwise-deleted
comparison of areas.
The second section
shows the results of the pairwise comparison
of areas. The method of DeLong, DeLong and Clarke-Pearson(2) is
used to compare areas when the Paired data type
option is selected. When the Unpaired data type
is selected, areas are compared using a Z test. The
report shows results for all pairs of data sets. The
difference of each area pair and its standard
error and 95% confidence interval are computed. This
is followed by the chi-square statistic for the
area comparison (or Z statistic if Unpaired is
selected) and its associated P value.
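For the Paired case, the following Python sketch outlines the nonparametric comparison of two correlated ROC areas in the style of DeLong, DeLong and Clarke-Pearson (2). It is a simplified illustration under stated assumptions, not the program's code: it handles exactly two tests, assumes higher values are abnormal, requires complete data (no missing values), and fails if the two tests give identical components. All names and data are hypothetical.

```python
import math

def psi(x, y):
    """Score one (positive, negative) pair: 1, 0.5 for a tie, or 0."""
    return 1.0 if x > y else 0.5 if x == y else 0.0

def components(pos, neg):
    """Per-subject components for the positive and negative groups."""
    v10 = [sum(psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01

def covariance(a, b):
    """Sample covariance with an n - 1 denominator."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def delong_compare(pos1, neg1, pos2, neg2):
    """pos1[i] and pos2[i] must come from the same subject; likewise neg."""
    v10_1, v01_1 = components(pos1, neg1)
    v10_2, v01_2 = components(pos2, neg2)
    area1 = sum(v10_1) / len(v10_1)
    area2 = sum(v10_2) / len(v10_2)
    # Variance of the area difference, including the covariance terms
    # that arise because both tests were measured on the same subjects.
    var_diff = ((covariance(v10_1, v10_1) + covariance(v10_2, v10_2)
                 - 2 * covariance(v10_1, v10_2)) / len(pos1)
                + (covariance(v01_1, v01_1) + covariance(v01_2, v01_2)
                   - 2 * covariance(v01_1, v01_2)) / len(neg1))
    chi_square = (area1 - area2) ** 2 / var_diff
    p_value = math.erfc(math.sqrt(chi_square / 2))  # chi-square, 1 d.f.
    return area1, area2, chi_square, p_value

# Hypothetical paired measurements, invented for illustration only.
pos1, neg1 = [6, 7, 8, 9, 5], [1, 2, 3, 4, 5.5]
pos2, neg2 = [4, 6, 3, 7, 5], [2, 5, 1, 4.5, 3.5]
area1, area2, chi_square, p_value = delong_compare(pos1, neg1, pos2, neg2)
print(area1, area2, chi_square, p_value)
```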
Formatted Full Precision Display
This report presents the numeric
results in a four significant digit format with
full precision available. Double click on any
cell (except the confidence intervals) to display
the number at full precision.
Additional Graphs
Results data in both reports
can be used to create additional graphs. Some
examples seen in the literature are shown here.
Sensitivity and Specificity vs. Cutoff
The data for the graph in Figure
6 is from the Sensitivity & Specificity report
in columns 1, 2 and 4. Use the Data Sampling option
in Graph Properties, Plots, Data to specify the
row range for the graph (you can also drag select
the rows in the worksheet to do this).

Figure 6. Graph of sensitivity
and specificity vs. cutoff for one test using data
from columns 1, 2 and 4 of the Sensitivity & Specificity
report.
Likelihood Ratios
The positive and negative likelihood
ratios for three different imaging modalities are
shown in Figure 7 (the data is artificial). The
data is in columns 1, 6 and 7 of the Sensitivity & Specificity
report. The values associated with the optimal
cutoff are shown as solid symbols. The largest
positive likelihood ratio and smallest negative likelihood ratio
at the optimal cutoff are associated with magnetic
resonance imaging (MR).

Figure 7. Positive and
negative likelihood ratios graphed from data in
the Sensitivity & Specificity report from columns
1, 6 and 7. The results for three tests are shown
together with values associated with the optimal
cutoff (solid symbols).
Optimal Cutoff vs. Cost Ratio
Frequently it can be difficult
to determine a value for the false-positive/false-negative
cost ratio. So it is worth performing a sensitivity
analysis (sensitivity here means how much one variable
changes with changes in a second variable) to see
whether the cutoff value changes significantly
in the range of cost-ratio values of interest. The
ROC Curves Module was run multiple times for different
cost ratios and a graph of optimal cutoff vs. cost
ratio for the three imaging modality tests is shown
below.

Figure 8. Optimal cutoff
values obtained from multiple runs of the program. Regions
of insensitivity, or strong sensitivity, to cost
ratio can be identified.
If the relative cost of a false-positive
is much greater than that of a false-negative then
the cost ratio is greater than 1. But let's assume
that we don't know exactly how much greater it
is but have some idea that it should be in the
range of 2 to 5, say. Looking at the optimal cutoff
for the best imaging modality (MR, green line)
we find that it doesn't change for cost ratios
from 2 to 20. So the optimal cutoff is insensitive
to cost ratio and, in this case, it is not important
to know a precise value for cost-ratio.
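Rather than running the module repeatedly, the same kind of sensitivity analysis can be sketched offline in Python from a table of cutoffs. The sketch assumes the optimal cutoff maximizes sensitivity − m(1 − specificity), with m equal to the cost ratio times (1 − P)/P. The table, the pre-test probability and the list of cost ratios are hypothetical.

```python
# Hypothetical (cutoff, sensitivity, specificity) rows, as might be copied
# from a Sensitivity & Specificity report.
table = [
    (3.0, 1.00, 0.40),
    (5.0, 0.90, 0.70),
    (7.0, 0.80, 0.90),
    (9.0, 0.50, 1.00),
]
pretest_probability = 0.5

def best_cutoff(cost_ratio):
    """Optimal cutoff for one false-positive/false-negative cost ratio."""
    m = cost_ratio * (1 - pretest_probability) / pretest_probability
    return max(table, key=lambda row: row[1] - m * (1 - row[2]))[0]

# Sweep the cost ratio and record where the optimal cutoff changes.
sweep = {ratio: best_cutoff(ratio) for ratio in (0.2, 1, 2, 5, 10, 20)}
for ratio, cutoff in sweep.items():
    print(ratio, cutoff)
```

In this invented example the cutoff stays the same for cost ratios from 5 to 20, which is the kind of plateau that makes a precise cost ratio unnecessary.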
Post-Test Probability vs. Pre-Test Probability
Given values of sensitivity
and specificity associated with the optimal cutoff
a graph of post-test probabilities as a function
of pre-test probability can be created using equations
(5) and (6). The post-test probability of disease
when the test is positive, blue lines in Figure
9, was obtained from equation (5) and the post-test
probability of disease when the test was negative,
red lines, was obtained from 1.0 minus equation
(6). A transform was written in SigmaPlot implementing
these two equations that generated the post-test
probabilities for a range of pre-test probabilities. The
results for the best test, MR, and worst test,
US, are shown. The MR test is clearly better since
the post-test probability range, from negative
test to positive test, is larger. Thus given a
positive test the patient is more likely to have
the disease using the MR test rather than the US
test. Similarly, given a negative test it is less
likely that the patient has the disease using the
MR test.

Figure 9. Post-test
probabilities of disease given positive and negative
test results. The MR test is based on sensitivity
= 0.94 and specificity = 0.97 whereas the US test
used sensitivity = 0.78 and specificity = 0.85.
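The curves in Figure 9 can be approximated with a short Python sketch in place of the SigmaPlot transform, which is not reproduced here. The sensitivity and specificity values are those quoted for the MR and US tests. The function name, the grid of pre-test probabilities and the Bayes' rule form of the two probabilities are assumptions of this example.

```python
def post_test_probabilities(sens, spec, p):
    """Return (P(disease | positive test), P(disease | negative test))."""
    pv_pos = p * sens / (p * sens + (1 - p) * (1 - spec))
    pv_neg = (1 - p) * spec / ((1 - p) * spec + p * (1 - sens))
    return pv_pos, 1 - pv_neg  # disease after a negative test is 1 - PV-

# Sensitivity and specificity at the optimal cutoff for each test.
tests = {"MR": (0.94, 0.97), "US": (0.78, 0.85)}
pretest = [i / 10 for i in range(1, 10)]  # 0.1, 0.2, ..., 0.9

curves = {name: [post_test_probabilities(se, sp, p) for p in pretest]
          for name, (se, sp) in tests.items()}

for p, mr, us in zip(pretest, curves["MR"], curves["US"]):
    print(f"P={p:.1f}  MR: +{mr[0]:.3f} -{mr[1]:.3f}  US: +{us[0]:.3f} -{us[1]:.3f}")
```

At every pre-test probability in this grid the MR test gives a higher probability of disease after a positive result and a lower one after a negative result than the US test.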
References
1. Zweig, MH, Campbell, G. Receiver-operating characteristic (ROC) plots: a fundamental evaluation tool in clinical medicine. Clin Chem 1993;39(4):561-577.
2. DeLong, ER, DeLong, DM, Clarke-Pearson, DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 1988;44:837-845.
3. Hanley, JA, McNeil, BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982;143:29-36.