Click
here to download a PDF of
the PowerPoint Slides used in the July 2010 Webinar:
"Statistical Analysis with SigmaPlot" presented
by Systat Software.
SigmaPlot is now bundled
with SigmaStat as an easy-to-use package for complete
graphing and data analysis. The statistical functionality
was designed with the non-statistician user in mind.
This wizard-based statistical software package guides
users through every step and performs powerful statistical
analysis without requiring the user to be a statistical expert.
Each statistical analysis has certain assumptions that
have to be met by a data set. If underlying assumptions
are not met, you may be given inaccurate or inappropriate
results without knowing it. However, SigmaPlot will
check if your data set meets test criteria and if not,
it will suggest what test to run.
New Feature added in SigmaPlot Version 11
One-Sample t-test
Feature Description – The one-sample t-test is used to test the hypothesis that the mean of a sampled normally-distributed population equals a value specified by the user. SigmaStat at the time had no one-sample testing except for normality.
The menu and test combo box will be modified to include a command for this test. The menu command will be on a submenu under a new test category called Single Group. The unpaired t-test that is currently in SigmaPlot is simply called t-test, and this name will be kept. The one-sample t-test will have its own options that are set from the Test Options dialog. The first panel will provide an edit control option for entering the value of the hypothesized population mean. The remaining options in the dialog will be a subset of the options available for the two-sample test. The Test Wizard for the one-sample case will provide the same data format options as the two-sample case except for the Indexed format, which makes no sense when you have one sample. In addition to producing a report, there will be three result graphs that are a subset of those produced for the two-sample t-test.
Computational Results
Hypothesis Testing
The null hypothesis is that the mean of the sampled
population equals the user supplied value. The sample
mean of the selected data is compared with the hypothesized
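population mean supplied by the user by computing:

$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$

where $\bar{x}$ is the sample mean, $s$ is the sample standard deviation, $n$ is the sample size, and $\mu_0$ is the hypothesized population mean.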
By random sampling of the population, assuming the
null hypothesis is true, this quantity defines a
random variable T, whose distribution is Student's
central T-distribution with n -1 degrees of freedom.
The (two-sided) P-value for this test is computed
as P(|T| > |t|), where P denotes
the probability distribution for T. This P-value
is then compared to the significance level α that
is set by the user. If the value is less than α,
there is a significant difference between the mean
of the sampled population and the hypothesized mean μ₀.
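The test described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not SigmaPlot's implementation: the function names are hypothetical, and the P-value is obtained by numerically integrating the density of Student's t-distribution (Simpson's rule) rather than by whatever method SigmaPlot uses.

```python
import math
from statistics import mean, stdev


def t_pdf(x, df):
    """Density of Student's central t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_two_sided_p(t, df, steps=20000):
    """Two-sided P-value P(|T| > |t|), via Simpson's rule on [0, |t|]."""
    a = abs(t)
    if a == 0:
        return 1.0
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    integral = total * h / 3
    return max(0.0, 1 - 2 * integral)


def one_sample_t(data, mu0):
    """Return (t, degrees of freedom, two-sided P) for H0: population mean == mu0."""
    n = len(data)
    t = (mean(data) - mu0) / (stdev(data) / math.sqrt(n))
    return t, n - 1, t_two_sided_p(t, n - 1)


t, df, p = one_sample_t([2, 4, 6, 8, 10], mu0=3.0)
print(f"t = {t:.4f}, df = {df}, P = {p:.4f}")
```

With a significance level of α = 0.05, the made-up sample above would not show a significant difference from the hypothesized mean of 3, because the P-value is roughly 0.10.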
Confidence Interval for the Population Mean:
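The (1 − α)100% confidence interval for
the true population mean is

$$\bar{x} \pm t_{\alpha}\,\frac{s}{\sqrt{n}}$$

where $\bar{x}$ is the sample mean, $s$ is the sample standard deviation, $n$ is the sample size, and $t_{\alpha}$ is
the value that satisfies P(|T| > $t_{\alpha}$) = α.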
Retrospective Power
It may be of interest to know the power of the test
based upon the difference in means that was observed
from the sample. This will not assist, however, in
reaching conclusions about the significance results
since you already know the test is not powerful enough
if you failed to reject the null hypothesis. For
retrospective, or observed, power one must keep in
mind that it is not simply defined as the probability
of rejecting a hypothesis that is known to be false,
but as the probability of detecting the difference
that was actually observed in the data.
To compute the power, the critical value $t_{\alpha}$ of
Student's central t-distribution is computed
for the given significance level α. This
value is found by solving the equation P(|T| > $t_{\alpha}$)
= α. Now, let $T'$ be
a random variable whose distribution is Student's
non-central t-distribution with n − 1 degrees of freedom
and with non-centrality parameter equal to the observed value of the test statistic.
Then the retrospective power is P(|T'| > $t_{\alpha}$).
Analysis of Variance
- Independent and paired t-tests
- One, two and three-way ANOVA
- One and two-way repeated measures ANOVA
ANOVA Profile Plots – Used to analyze
the main effects and higher-order interactions of
factors in a multi-factor ANOVA design by comparing
averages of the least square means.
Examples of Profile Plots for a 3-Way ANOVA design:
1. Main Effects Plots
2. Two-Way Effects Plot
3. Three-Way Effects Plot
Non-Parametric Statistics
- Mann-Whitney rank sum test
- Wilcoxon signed-rank test
- Kruskal-Wallis ANOVA
- Friedman repeated measures ANOVA
Correlation
- Spearman rank-order
- Pearson product-moment
Regression
- Linear and Multiple Linear
- Polynomial (up to 10th order)
- Stepwise (forward and backward)
- Best Subsets
- Multiple logistic
Rates and Proportions
- Chi-square analysis of contingency tables
- McNemar test
- Fisher's exact test
Power and Sample Size
- t tests and proportions
- ANOVA and correlation
- Chi-square
Descriptive Statistics
- Mean, median, standard deviation, standard error of mean
- Percentiles, sum of squares
- Skewness, kurtosis
- Confidence interval for the mean range
- Maximum and minimum values
- Normality, sample size and missing value content
- Size, sum, standard error, skewness, minimum positive, number of missing values
Normality
- Kolmogorov-Smirnov
- Shapiro-Wilk
Feature Description – The Shapiro-Wilk
normality test will be added to both SigmaStat and
SigmaPlot to determine if either worksheet data or
the residuals that result from curve fitting are
consistent with data drawn from a normal distribution.
Normally distributed data is a principal assumption
when using many of the tests in SigmaStat and in
the non-linear regression analysis of SigmaPlot.
The current normality test in both programs is based
upon the Kolmogorov-Smirnov (KS) statistic that computes
the maximum difference between the sample cumulative
distribution function of the data and the theoretical
normal distribution having the same population parameters
as the data.
The main advantage of this method is that the distribution
of the statistic is independent of the underlying
theoretical distribution (so long as it is continuous).
For example, the KS statistic could be used to test
data against a gamma or Weibull distribution. The
primary disadvantage of the method is the assumption
that the population parameters are known. In practice,
these parameters are only estimated from the data
by using the sample mean and the (unbiased) sample
variance. To compensate for this lack of information,
simulation studies by Lilliefors and Wilkinson have
yielded a correction for normal distributions that
is used by both programs. Another disadvantage is
that for small to moderate sample sizes, the test
has difficulty discriminating between distributions
that are roughly similar (low power).
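As an illustration of the KS statistic described above, the following sketch computes the maximum difference between the sample cumulative distribution function and a normal distribution whose parameters are estimated from the data. It uses only the Python standard library; the function name is hypothetical, and the Lilliefors correction used for the P-value is not reproduced here.

```python
from statistics import NormalDist, mean, stdev


def ks_statistic_normal(data):
    """Maximum distance between the empirical CDF and a fitted normal CDF.

    The normal parameters are estimated from the data (sample mean and
    unbiased sample standard deviation), as described in the text.
    """
    xs = sorted(data)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = fitted.cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x, so check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


print(round(ks_statistic_normal([1, 2, 3, 4, 5]), 4))
```

A small value of the statistic means the sample CDF stays close to the fitted normal CDF. Judging significance requires the corrected critical values mentioned above, which this sketch does not attempt.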
The Shapiro-Wilk W statistic is specifically
designed for the normal distribution and has higher
power than Kolmogorov-Smirnov. The statistic is a
ratio of two estimates of the variance of a normal
distribution based on a random sample. The numerator
of W is proportional to the square of the best linear
estimator of the standard deviation, and the denominator
is the sum of squares of the observations about the
sample mean. The main limitation of the method is
that the sample size is restricted to values between
3 and 5000, inclusive.
Equal Variance
ANOVA Multiple Comparison Options
- Holm-Sidak test
- Tukey test
- Duncan's multiple range test
- Fisher's least significant difference test
- Student-Newman-Keuls test
- Bonferroni t test
- Dunnett's test
- Dunn's test
Survival Analysis
- Kaplan-Meier product-limit estimation method
- Survival curve options: error bars, confidence intervals, censored or failure points, fraction or percentage scale
- Single group
- LogRank
- Gehan-Breslow
- Cox Regression
The Cox Regression feature in SigmaStat and
SigmaPlot consists of two separate analyses, the Proportional
Hazards model, with no stratification variable,
and the Stratified Model, where the user selects
a worksheet column containing the strata. Each test
is accessed from the Statistics menu on a submenu under
the item Survival – Cox Regression. For each
test, the user selects a time column, status column,
and any number of covariate columns from the worksheet.
The user can subsequently designate which selected
covariates should be interpreted as categorical. When
the analysis is completed, a report will be generated
to provide the numeric results. Result graphs will
also be available for obtaining covariate-adjusted
survival curves, cumulative hazard functions, and log-log
survival functions (discussed below). Various options
for controlling the regression process, displaying
report results, and for setting result graph attributes
can be set in the Test Options dialog box. The Advisor
wizard has also been modified to suggest the usage
of these tests.
Cox Regression – Includes the proportional-hazards model with stratification to study the impact of potential risk factors on the survival time of a population. Input data can be categorical.
[Figure: Examples of Cox Regression Result Graphs]
Cox Regression Background and Terminology – Cox Regression is a part of Survival Analysis that studies the impact of potential risk factors on the survival time of a population. The risk factors are often called covariates, predictors, or explanatory variables. We will use the term covariates in our applications. As an example, consider the possible effects of sex, age and two types of drug therapy on the survival of a population suffering from some form of cancer. The survival time may decrease as age increases. Death rates among males may be higher than for females.
Finally, drug A may increase survival time more than drug B. In this study, Sex, Age and Drug Therapy are the covariates that affect the survival experience. In Cox Regression, a model is defined that describes the relationship between the covariates and survival time. This model is then used to predict the likelihood of survival at each point in time for any values of the covariates. It also allows us to determine the significant effect of each covariate.
There are two types of covariates. The covariates Sex and Drug Therapy above each have two categories of non-numeric values and are called categorical covariates. Since the covariate Age can assume a continuous range of numeric values, it is called a continuous covariate. Frequently, a categorical covariate has numeric values assigned to its categories, but these values serve only as labels (a nominal scale) and do not indicate a measurement.
The simplest way to visualize the effect of covariates on survival time is to construct a survival curve. A survival curve plots the relationship between each value of time and the probability of surviving beyond that value. This relationship is called the survival function (or survivorship function). In Kaplan-Meier survival analysis, one survival function is defined that is independent of any covariates. In Cox survival analysis, specific values for each of the covariates lead to one estimated survival function for the population. The graph of such a function is called a covariate-adjusted survival curve.
In Cox Regression, the primary object of study is the hazard function of the population, as
estimated from the sampled survival data. This function
is closely related to the survival function. The
hazard function (sometimes known as the conditional
failure rate, hazard rate, or just the hazard)
is defined as the instantaneous rate of change in
the likelihood of failure at each point in time,
given survival up to that point.
As an example, suppose h is the hazard function
and h(t) = 0.1 at some time t.
An interpretation of this value is that there
is approximately a 10% chance that a subject will
fail within the next unit time period, given the
subject has survived up to time t. Another function,
the cumulative hazard function, is defined
at each value of time as the integral of the hazard
over all previous values of time. It provides a smoothed
alternative to the hazard function as estimates of
the hazard function itself can be too "noisy" for
practical use. If H denotes the cumulative
hazard function, then the above definitions can be
used to show that the survival function S is
defined at each time t by:
$$S(t) = \exp(-H(t)).$$
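The relationship between the hazard, the cumulative hazard and the survival function can be checked numerically. The sketch below is an illustration only: it assumes a hazard that is known as a function of time (here the constant 0.1 from the example above), integrates it with the trapezoid rule, and applies S(t) = exp(-H(t)). The function names are hypothetical.

```python
import math


def cumulative_hazard(hazard, t, steps=1000):
    """Integrate the hazard function from 0 to t with the trapezoid rule."""
    h = t / steps
    total = 0.5 * (hazard(0.0) + hazard(t))
    for i in range(1, steps):
        total += hazard(i * h)
    return total * h


def survival(hazard, t):
    """Survival probability S(t) = exp(-H(t))."""
    return math.exp(-cumulative_hazard(hazard, t))


def constant_hazard(_t):
    """Hazard of 0.1 per unit time, as in the example in the text."""
    return 0.1


s1 = survival(constant_hazard, 1.0)
print(f"S(1) = {s1:.4f}, chance of failing within one time unit = {1 - s1:.4f}")
```

For this constant hazard the chance of failing within one time unit is about 0.095, which is the "approximately 10%" interpretation given above.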
All of the functions discussed above are not only
functions of time, but also depend upon the covariates
in the survival study. In the Cox model, the hazard
function assumes a specific form given by:
$$h(t, X_1, X_2, \ldots, X_n) = h_0(t)\,\exp(b_1 X_1 + b_2 X_2 + \cdots + b_n X_n)$$
where X1, X2, . . ., Xn are
the covariates in the study. The function h0 is
called the baseline hazard function and only
depends upon time. The exponential factor on the
right-hand side of the equation involves the covariates,
but does not depend on time. In our implementation
of Cox Regression, we are assuming that every covariate
is time-independent and so its value for each subject
remains constant over time (it is possible, however,
to extend Cox Regression to include time-dependent
covariates).
The coefficients b1, b2, . . ., bn
in our model are constants, independent
of both time and the covariates, and their values
are determined from the regression analysis by
maximizing a quantity known as the partial likelihood
function. The resulting values of the coefficients
are called the best-fit coefficients or,
sometimes, the maximum likelihood estimates.
Once the coefficients are determined, there is
a procedure that estimates the values of the baseline
survival function at the sampled event times.
The baseline survival function is defined by setting
all covariates to zero. Denoting this function
by S0, the covariate-adjusted
survival functions and cumulative hazard functions
are determined for each event time t by:
$$H_0(t) = -\log(S_0(t))$$
$$H(t, X_1, \ldots, X_n) = H_0(t)\,\exp(b_1 X_1 + \cdots + b_n X_n)$$
$$S(t, X_1, \ldots, X_n) = S_0(t)^{\exp(b_1 X_1 + \cdots + b_n X_n)}$$
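The formulas for the covariate-adjusted functions can be sketched as follows. This is a minimal illustration, assuming the baseline survival value at an event time and the best-fit coefficients are already known; the function names and the numeric values are made up and are not output from SigmaPlot.

```python
import math


def risk_score(coeffs, covariates):
    """The factor exp(b1*X1 + ... + bn*Xn) from the Cox model."""
    return math.exp(sum(b * x for b, x in zip(coeffs, covariates)))


def adjusted_cumulative_hazard(s0, coeffs, covariates):
    """H(t, X) = H0(t) * exp(b.X), with H0(t) = -log(S0(t)); requires 0 < s0 <= 1."""
    return -math.log(s0) * risk_score(coeffs, covariates)


def adjusted_survival(s0, coeffs, covariates):
    """S(t, X) = S0(t) ** exp(b.X)."""
    return s0 ** risk_score(coeffs, covariates)


# Hypothetical values: baseline survival 0.8 at some event time,
# two covariates with coefficients 0.5 and -0.2.
coeffs = [0.5, -0.2]
subject = [1.0, 2.0]
s = adjusted_survival(0.8, coeffs, subject)
h = adjusted_cumulative_hazard(0.8, coeffs, subject)
print(f"S = {s:.4f}, H = {h:.4f}, exp(-H) = {math.exp(-h):.4f}")
```

The last printed value repeats the first, which is the identity S = exp(-H) applied to the adjusted functions. Setting every covariate to zero returns the baseline survival unchanged.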
Our model of the hazard function shows that if there
are two specifications for the values of the covariates,
then the corresponding values of the hazards are
proportional over time. This is the reason the Cox
model is called a proportional hazards model.
It is possible that a potential covariate for the
model does not satisfy this assumption.
For example, suppose we have the covariate Sex in
a survival study. If males are dying at twice the
rate of females during the first month of a study,
and both sexes die at the same rate during the next
month of the study, then the ratio of the hazards,
or the hazard ratio, for males to females
is not constant over time and the proportionality
assumption fails. Such a covariate cannot be included
in the hazard model.
A covariate may also be omitted from the model because
its value is based on the design of the study and
has secondary importance as a risk factor for survival.
For example, when a study is performed at two different
clinics to determine the impact of age and drug therapy
on patient recovery, then the variable Clinic is
such a covariate.
Any variable whose values have been included in
the survival data but is not included as a covariate
in the hazard model for the reasons described above
is called a stratification variable. Each
value or level of such a variable is called a stratum;
collectively, the levels are the strata.
When a stratification variable is present, then
the survival study is partitioned into groups, one
for each stratum, where each group has its own survival
function that is determined from the regression analysis.
The best-fit coefficients are the same for each stratum,
but the baseline time-dependent factors in the model
are different.
Test-Specific Options
- Assumption checking for normality, equal variance
and auto-correlation of residuals (Durbin-Watson)
- Confidence and Prediction Intervals
- DFFits, leverage and Cook's Distance
- Variance Inflation Factor for Multicollinearity
- Set alpha for power calculation
- Set significance level for multiple comparisons
Regression Wizard
- Linear and nonlinear regressions
- Over 100 built-in, graphically-illustrated equations
- Marquardt-Levenberg algorithm with up to 10 independent
variables and 100 parameters
- Define constraints, tolerance, step size and
iterations
- Automatically determines your initial parameters
- Writes a complete statistical report to your
SigmaPlot Notebook
- Automatically graphs your results on new or existing
graphs
- Edit code so you can customize the SigmaPlot
library of functions or create your own
- Specify the range for the predicted values output
by curve-fitter
Graphical Linear Regression
- Automatic Linear Regressions
- Up to 10th order with confidence and prediction
intervals and regression statistics
Column Statistics
- Column Statistics Generated Automatically
- Size, sum, mean, minimum, maximum, standard deviation, standard error, skewness, minimum positive, number of missing values and 95% & 99% confidence intervals
Global Curve Fitting
- Perform simultaneous fitting of multiple data
sets using a single fit equation
- Optionally share one or more equation parameters
across all data sets
- Several data formats are available to represent
your data
- Creates a graph containing the raw data and the
fit curves for each data set
- Creates a report with numeric results for each
data set
Minor New Statistical Tests
- One-Sample t-test – Tests the hypothesis that the
mean of a population equals a specified value.
- Odds Ratio and Relative Risk tests – Both test
the hypothesis that a treatment has no effect
on the rate of occurrence of some specified event
in a population. Odds Ratio is used in retrospective studies
to determine the treatment effect after the event
has been observed. Relative Risk is used in prospective studies
where the treatment and control groups have been
chosen before the event occurs.
- Shapiro-Wilk Normality test – A more powerful test than
Kolmogorov-Smirnov for assessing the normality
of sampled data. Used in assumption checking
for many statistical tests, but can also be used
directly on worksheet data.
Dynamic Curve Fitting Wizard
- Performs multiple fits of a single equation to
a data set using several sets of starting parameter
values randomly selected from specified ranges
- Improves the likelihood of obtaining the global
minimum solution
- Parameter ranges can be user-defined or computed
automatically
- Worksheet results provide several statistics
and performance measures for each fit
- Creates a Dynamic Fit Profile plot that summarizes
the performance of all convergent fits
- Creates a graph of the raw data with the fit
curve that corresponds to the overall best-fit
parameters
- Creates a report containing summary information
for all fits and detailed numeric results of the
overall best fit
24 New Probability Transforms
- Gamma, Weibull, Cauchy, Error, LogNormal, Exponential, Logistic, LogLogistic
New Probability Transforms
Feature Description – Seven new sets of probability
functions will be added to the Transform language
for computing the values of cumulative distribution
functions and their inverses, and for computing probability
density functions. In addition, we will add two other
functions frequently used in statistical calculations:
the error function and the complementary error function.
Finally, a function for computing the median of a
column of data will be added.
Like all transform language functions, these functions
can be accessed from the User-defined Transform dialog,
the Quick Transform dialog, SigmaPlot's automation
interface (macros), the Nonlinear Regression Wizard,
and SigmaPlot's Plot Equation dialog.
General Definitions of Probability Functions:
- Cumulative Distribution Function (CDF) – If X
is a random variable with respect to a particular
probability measure P, the CDF(x) is the probability
that the values of X are less than x, i.e. CDF(x)
= P(X < x). For a continuous random variable,
CDF(x) can be computed as the indefinite integral
of the probability density function (if it exists).
- Inverse Cumulative Distribution Function – This
function computes the value x of the random variable
X that yields a specified probability value p for
the CDF. That is, Inverse CDF(p) = x such that
P(X < x) = p.
- Probability Density Function (PDF) – For a continuous
random variable, the derivative of the CDF, if
it exists. In this case, the probability that the
values of the random variable lie within a small
interval can be estimated by the product of the
density at some point in the interval and the size
of the interval. For a discrete random variable,
PDF(x) = P(X=x). For sampled data, the density
function is approximated by a histogram.
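The three kinds of functions defined above fit together as shown in the sketch below, which uses the Weibull distribution because all three have closed forms. The Python function names and argument order are hypothetical and are not the signatures of SigmaPlot's transform functions; the formulas are the standard shape/scale parameterization.

```python
import math


def weibull_cdf(x, shape, scale):
    """CDF: probability that the random variable is less than x."""
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-((x / scale) ** shape))


def weibull_inv(p, shape, scale):
    """Inverse CDF: the value x for which weibull_cdf(x) equals p."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    if p == 0.0:
        return 0.0
    return scale * (-math.log(1.0 - p)) ** (1.0 / shape)


def weibull_pdf(x, shape, scale):
    """Density: derivative of the CDF (taken as 0 for x <= 0 in this sketch)."""
    if x <= 0:
        return 0.0
    z = x / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-(z ** shape))


x, shape, scale = 2.0, 1.5, 3.0
p = weibull_cdf(x, shape, scale)
print(f"CDF({x}) = {p:.4f}, inverse CDF gives back {weibull_inv(p, shape, scale):.4f}")

# Probability of a small interval is approximately density times interval width.
width = 1e-4
approx = weibull_pdf(x, shape, scale) * width
exact = weibull_cdf(x + width, shape, scale) - p
print(f"density * width = {approx:.8f}, CDF difference = {exact:.8f}")
```

The second printout illustrates the statement in the PDF definition that the probability of a small interval is estimated by the density times the interval size.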
Some applications of the functions being added:
A cumulative distribution function, an inverse distribution
function, and a density function will be added for
each of the seven families of probability distributions
below.
| Function Family | Use | Parameters |
|---|---|---|
| Gamma | Describes the distribution of time until the nth occurrence in a Poisson process. | Two positive parameters – a shape parameter and a scale parameter. Setting the shape parameter to 1 yields the exponential distribution. |
| Weibull | Describes failure time distributions when the failure rate is assumed to increase as some power of time. | Two positive parameters – a shape parameter and a scale parameter. |
| Cauchy | Gives the distribution of the ratio of two independent standard normal random variables. Also gives the distribution of the random variable Y = tan(X), where X has a uniform distribution. | Two parameters – a location parameter and a positive scale parameter. |
| Lognormal | Gives the distribution of the random variable Y = exp(X), where X has a normal distribution. | Two parameters – a location parameter and a positive scale parameter. They are the mean and standard deviation of the underlying normal distribution. |
| Exponential | A special case of the Gamma distribution. Gives the distribution of time until the first occurrence in a Poisson process. | One positive scale parameter. |
| Logistic | Similar in shape to the normal distribution, but with heavier tails and easier to compute. | Two parameters – a location parameter and a positive scale parameter. |
| LogLogistic | Gives the distribution of the random variable Y = exp(X), where X has a logistic distribution. | Two parameters – a location parameter and a positive scale parameter. |
Mathematical Descriptions
Some of the functions below are expressed in terms
of the gamma function.
- Gamma Cumulative Distribution Function
- Weibull Cumulative Distribution Function
- Cauchy Cumulative Distribution Function
- Error Function
- Complementary Error Function
- Lognormal Cumulative Distribution Function
- Exponential Cumulative Distribution Function
- Logistic Cumulative Distribution Function
- LogLogistic Cumulative Distribution Function
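For reference, the standard textbook forms of these functions can be written as below. This is a sketch under stated assumptions rather than SigmaPlot's own documentation: the symbol a denotes the shape parameter (Gamma, Weibull) or the location parameter (Cauchy, Logistic, LogLogistic), b denotes the positive scale parameter, and μ and σ are the mean and standard deviation of the underlying normal distribution. The exact parameterization and argument order used by the transform functions are not confirmed here.

```latex
\begin{aligned}
\Gamma(a) &= \int_0^\infty t^{a-1} e^{-t}\,dt \\
F_{\mathrm{Gamma}}(x; a, b) &= \frac{1}{\Gamma(a)} \int_0^{x/b} t^{a-1} e^{-t}\,dt, \qquad x > 0 \\
F_{\mathrm{Weibull}}(x; a, b) &= 1 - \exp\!\left(-(x/b)^{a}\right), \qquad x > 0 \\
F_{\mathrm{Cauchy}}(x; a, b) &= \frac{1}{2} + \frac{1}{\pi}\arctan\!\left(\frac{x - a}{b}\right) \\
\operatorname{erf}(x) &= \frac{2}{\sqrt{\pi}} \int_0^{x} e^{-t^{2}}\,dt \\
\operatorname{erfc}(x) &= 1 - \operatorname{erf}(x) \\
F_{\mathrm{Lognormal}}(x; \mu, \sigma) &= \frac{1}{2}\left[1 + \operatorname{erf}\!\left(\frac{\ln x - \mu}{\sigma\sqrt{2}}\right)\right], \qquad x > 0 \\
F_{\mathrm{Exponential}}(x; b) &= 1 - e^{-x/b}, \qquad x \ge 0 \\
F_{\mathrm{Logistic}}(x; a, b) &= \frac{1}{1 + e^{-(x - a)/b}} \\
F_{\mathrm{LogLogistic}}(x; a, b) &= \frac{1}{1 + e^{-(\ln x - a)/b}}, \qquad x > 0
\end{aligned}
```

The Lognormal and LogLogistic forms follow from the Normal and Logistic forms by substituting ln x for x, which matches the Y = exp(X) descriptions in the table of function families.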
The functions below have been added to SigmaPlot's
Transform language for calculating probabilities
and scores associated with distributions that arise
in many fields of study. The abbreviation CDF means
"Cumulative Distribution Function".
| Transform Language Name | Description |
|---|---|
| gammadist | Gamma CDF |
| gammainv | Inverse Gamma CDF |
| gammaden | Gamma Density |
| weibulldist | Weibull CDF |
| weibullinv | Inverse Weibull CDF |
| weibullden | Weibull Density |
| cauchydist | Cauchy CDF |
| cauchyinv | Inverse Cauchy CDF |
| cauchyden | Cauchy Density |
| erf | Error Function |
| erfc | Complementary Error Function |
| lognormdist | Lognormal CDF |
| lognorminv | Inverse Lognormal CDF |
| lognormden | Lognormal Density |
| expdist | Exponential CDF |
| expinv | Inverse Exponential CDF |
| expden | Exponential Density |
| logisdist | Logistic CDF |
| logisinv | Inverse Logistic CDF |
| logisden | Logistic Density |
| loglogisdist | Loglogistic CDF |
| loglogisin | Inverse Loglogistic CDF |
| loglogisden | Loglogistic Density |
| median | A number that is greater than or equal to at least half of the values in the data set and less than or equal to at least half of them. |
New Results Graphs
- ANOVA Profile Plots
Feature Description – Profile plots are useful
for comparing the least square means, also
called estimated marginal means, in a multifactor
ANOVA model. Differences in the means, or effects, among
the levels of a specified factor, when computed over
a range of levels of the remaining factors, determine
how the data is affected by that factor and its interaction
with other factors. Profile plots provide a quick
qualitative assessment of the various treatment effects
so that the investigator can determine the impact
of each factor on the data. The hypothesis testing
in ANOVA reports quantifies these effects to determine
if any of the differences are statistically significant.
In ANOVA analysis, the least square means are first
computed for the individual cells. A cell is defined
as the collection of observations made for a particular
combination of levels, where one level is selected
from each factor. Generally, the cell means are obtained
as the predicted values in a regression model that
is associated with the ANOVA model. The cells means
determine the two-way interaction effects in
a Two-Way ANOVA and the three-way interaction
effects in a Three-Way ANOVA. If the cell means
are averaged over all levels of one factor while
fixing the levels of the remaining factors, you obtain
lower-order effects. This is how the main effects are
computed in Two-Way ANOVA and the two-way interaction
effects are computed in Three-Way ANOVA. Finally,
the main effects for a given factor in a Three-Way
ANOVA are determined by averaging the cell means
over all levels of the remaining two factors while
fixing each level of the given factor.
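The averaging described above can be sketched as follows. The example assumes the cell means have already been obtained (the values are made up) and shows how lower-order effects are computed as unweighted averages of the cell means over the levels of the factors being averaged out. The function name and data layout are hypothetical.

```python
from collections import defaultdict


def marginal_means(cell_means, keep):
    """Average cell means over every factor whose index is not in `keep`.

    cell_means maps a tuple of factor levels (one level per factor) to the
    least square mean of that cell. `keep` lists the factor positions whose
    levels stay fixed; all other factors are averaged out.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for levels, value in cell_means.items():
        key = tuple(levels[i] for i in keep)
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Made-up cell means for a Two-Way design with factors A and B.
cell_means = {
    ("A1", "B1"): 10.0,
    ("A1", "B2"): 14.0,
    ("A2", "B1"): 20.0,
    ("A2", "B2"): 28.0,
}

main_a = marginal_means(cell_means, keep=(0,))  # main effects of factor A
main_b = marginal_means(cell_means, keep=(1,))  # main effects of factor B
print(main_a)
print(main_b)
```

For a Three-Way design the same function would give the two-way effects with, for example, `keep=(0, 1)` and the main effects with a single index, since the cell keys would then have three levels each.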
Profile plots are line plots with the levels of
one factor represented on the horizontal axis of
the graph and the experiment's data (and the
least square means of that data) represented on the
vertical axis. The least square means have the same
scale as the data and so are positioned relative
to the data axis for each factor level on the horizontal
axis.
We will use the following design for presenting
profile plots:
- For Main Effects, there is one plot per graph
and the number of graphs equals the number of factors.
- For 2-Way Effects, we have one graph for each
distinct pairwise-combination of factors (so one
graph for Two-Way ANOVA and three graphs for Three-Way
ANOVA). Each of these graphs contains multiple
profile plots, one for each level of one of the
factors.
- For 3-Way Effects in Three-Way ANOVA, the number
of graphs equals the number of levels of the third
factor (this factor is the last factor that was
selected for running the test). Each graph for
3-Way Effects contains multiple profile plots,
one for each level of one of the factors.
All of the data that is graphed for Profile plots
is listed in the Summary table of the report.
- Cox Regression Plots (Cumulative Hazard, Log Log
Survival)
Odds Ratio and Relative Risk
Feature Description – The odds ratio and relative
risk are values that measure the strength of
association between a treatment or risk factor
and a specified event that occurs in members of
a population. In a study for which these values
are computed, you have a control group and a treatment
group, each of whose members are randomly selected,
and you have an event, like a disease, whose frequency
in the population may be affected by the treatment
administered. The total number of subjects in each
group can be different.
A study using relative risk assumes the control
group and the treatment group have been selected
in advance. Observations are then made to determine
how many from each group experience the event. This
is an example of a prospective study. The
relative risk RR is defined as the probability
of the event in the treatment group divided by the
probability of the event in the control group, where
each probability is estimated as the relative frequency
of the event in the group.

Odds ratio is frequently used in case-control studies.
This type of study is done retrospectively,
in which the investigator samples two groups of subjects
from the population according to whether a subject
did or did not experience the event. The two groups
are called the Cases and Controls, respectively.
The number of subjects from each group who were exposed
to the treatment or risk factor is then noted. The
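odds ratio OR is defined by:

$$OR = \frac{\text{odds of exposure among the Cases}}{\text{odds of exposure among the Controls}}$$

where the odds of exposure in a group is the number of exposed subjects divided by the number of unexposed subjects. This ratio equals the odds of the event among exposed subjects divided by the odds of the event among unexposed subjects.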
The odds ratio is an estimate
of how much more likely the event occurs for an
individual in the population exposed to the risk
factor as compared to an individual not exposed
to the risk factor.
In summary, the main computational
difference between Relative Risk and Odds Ratio
is that the former is computed as a ratio of
probabilities whereas the latter is computed as
a ratio of odds.
The null hypothesis for both the relative risk and
the odds ratio is that its value equals 1. This means
that the treatment or risk factor does not affect
the event rate. A value significantly different from
1 indicates that the treatment either significantly
increases or decreases the risk of the event in the
population.
The data that is used for computing either quantity
can be represented in a 2x2 contingency table. The
probability of significance calculation for the test
uses the chi-square statistic for this table. If
the expected number of observations for any cell
of the table is less than 5, then the Fisher exact
test is used to compute the probability.
In SigmaPlot, and earlier in SigmaStat, we implement
relative risk and odds ratio as two separate tests
since they are used with different assumptions. The
menu and test combo box will be modified to include
a command for each test. The menu commands will be
on a submenu under the test category Rates and
Proportions. Each test will have its own options
in the Test Options dialog. These options will include
settings for the Yates continuity correction factor
for the chi-square statistic, the confidence interval
for the ratio, the control group row selection and
the power. The Test Wizard for each test will have
two data formats to select from: Tabulated and Raw.
The Tabulated format assumes the data is in the form
of a 2x2 contingency table where the two column selections
represent event counts and non-event counts. The
Raw data assumes the selected data is in two columns,
one for the risk factor/control group labels and
one for the event/non-event labels. After finishing
the Test Wizard, a report will be produced. There
are no result graphs for either test.
Example
Suppose we are given a 2x2 contingency table of
observations for studying the association of some
risk factor to an event. Suppose the risk factor
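is radiation and the event is cancer.

| | Cancer | No Cancer | Total |
|---|---|---|---|
| Radiation | 50 | 43 | 93 |
| No Radiation | 14 | 35 | 49 |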
The relative risk for the above table is RR = (50/93)/(14/49)
= 1.88, so that the risk of developing cancer in
the population is estimated to be 1.88 times higher
for those receiving the radiation. The chi-square
probability for this table is 0.007, so the risk of
developing cancer is significantly greater for those
exposed to the radiation.
The odds ratio for the above table is OR = (50/43)/(14/35)
= 2.91, so that exposure to radiation increases the
odds of developing cancer by an estimated 2.91 times
among the population. With the same probability value
as above, it is clear the effect of the radiation
is significant.
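The numbers in this example can be reproduced with the short sketch below. The cell counts (50, 43, 14, 35) are taken from the ratios quoted above; the function names are hypothetical. The chi-square probability is computed with the Yates continuity correction, which is an assumption here: it is the variant that reproduces the quoted value of 0.007.

```python
import math


def relative_risk(a, b, c, d):
    """Ratio of event probabilities: exposed group (a, b) vs. control group (c, d)."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """Ratio of event odds: exposed group (a, b) vs. control group (c, d)."""
    return (a / b) / (c / d)


def chi_square_2x2(a, b, c, d, yates=True):
    """Chi-square statistic and P-value (1 degree of freedom) for a 2x2 table."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)  # Yates continuity correction
    chi2 = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Exposed: 50 with cancer, 43 without. Not exposed: 14 with cancer, 35 without.
rr = relative_risk(50, 43, 14, 35)
oratio = odds_ratio(50, 43, 14, 35)
chi2, p = chi_square_2x2(50, 43, 14, 35)
print(f"RR = {rr:.2f}, OR = {oratio:.2f}, chi-square = {chi2:.2f}, P = {p:.3f}")
```

Without the continuity correction the same table gives a larger chi-square statistic and a smaller probability, so the conclusion of a significant effect does not depend on that choice.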
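Computations
Results in the report for relative risk and odds
ratio use the computations below. It is assumed the
input data can be put into the form of the 2x2 contingency
table below:

| | Event | No Event |
|---|---|---|
| Treatment | a | b |
| Control | c | d |

Relative Risk:
$$RR = \frac{a/(a+b)}{c/(c+d)}$$
Odds Ratio:
$$OR = \frac{a/b}{c/d} = \frac{ad}{bc}$$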