A Primer For Evaluating Clinical Trials
Gary H. Lyman, MD, MPH, and Nicole M. Kuderer
An understanding of the limitations of design and conduct
of clinical trials helps clinicians to assess the results
Background: Evidence-based medicine demands the use of information from clinical
trials to direct medical care. Knowledge of the principles of trial design and conduct is
important to assess the validity of results.
Methods: The authors review the key principles behind clinical study design and
conduct, and they summarize important biases and confounding issues.
Results: Clear hypotheses, a well-described study population, precise measurements,
freedom from bias, and consideration of any interactions are attributes of good clinical
trials.
Conclusions: The greatest level of evidence in support of a difference in outcome
is associated with randomized, controlled clinical trials, particularly when combined with
other randomized trials in a systematic fashion (meta-analysis).
Introduction
Clinicians are faced daily with the challenge of understanding and evaluating the
results of clinical investigations published in the medical literature. Clinical
researchers also face the task of translating their ideas into reasonable hypotheses and
designing appropriate studies to test these hypotheses. Editors and reviewers experience
similar challenges in evaluating clinical reports and selecting those of sufficient
interest and quality to justify publication. It is essential that clinician readers of the
medical literature and investigators conducting clinical research, as well as reviewers
and editors of medical journals, become familiar with the fundamentals of clinical
research methods.
This primer for designing, analyzing, and evaluating the results of clinical
investigations may serve as an introduction to these issues for students and young
clinicians, as well as a refresher or aid for established investigators and readers
of the medical literature. This primer is not a comprehensive treatise on this vast
subject; however, it provides a framework for designing studies, analyzing data, and
reading the medical literature for quick and ready reference.
Table 1 presents five basic questions to be addressed when designing and analyzing a
clinical investigation or when reviewing or reading a published report of a clinical
trial. Researchers, reviewers, and readers of the medical literature should be able to
address and successfully answer each of these fundamental questions in the evaluation of
any clinical trial. The optimal evaluation of a clinical investigation requires both an
understanding of the clinical issues involved and a basic awareness of statistical
methods.
Question 1: Are the Study Hypotheses Clearly Stated and Relevant?
Developing a clear statement of the clinical question or hypothesis to be studied
presents a difficult but critical challenge to the inexperienced investigator or reader of
the medical literature. In many cases, the researcher has a good understanding of the
issues involved but has difficulty in explicitly stating or translating these into
testable hypotheses. A clear statement of the hypothesis of interest requires a definition
of both the dependent (outcome) variables and the independent (treatment, prognostic
factors) variables of interest. The primary hypothesis generally addresses the effect of
an intervention variable, such as treatment, on the outcome variables of interest, such as
response or survival (Fig 1). Secondary hypotheses generally address the effect of the
intervention on outcome among specified subgroups (subgroup analysis). In randomized
clinical trials, the primary hypothesis is usually evident, while in nonrandomized trials
and especially in uncontrolled trials, it may be much less apparent. Both the primary and
any secondary hypotheses should be stated in advance of the study and should include any
planned subgroup analyses.
Testable hypotheses must be formulated in terms of measurable outcomes. The measures of
interest must be clearly stated, and the method of measurement must be described. Clinical
measurement scales include continuous, categorical, and time-to-event observations.
Time-to-event measures should specify the beginning and ending points and should clearly
define the criteria for an event. Methods used to ensure the completeness and quality
of the measured endpoints should be discussed. The measurement scale used will determine
the appropriate summary measures and applicable statistical methods. Finally, the
importance and clinical relevance of the study question should be clearly stated. Even
careful design and analysis cannot justify a study of limited clinical
importance.
Question 2: Is the Study Population Adequately Described?
No investigation can be properly evaluated without a detailed understanding of the
study population. Investigators often fail to adequately define and account for the
subjects evaluated in a study. The investigator must find the proper balance between
restricting eligibility in order to obtain a relatively uniform group of subjects and
minimizing eligibility restrictions in order to provide greater ability to generalize the
study results. Every study evaluation should specify the eligibility criteria for entry,
including any reasons for exclusion. Obviously, specifying such criteria may not fully
account for differences in referral patterns to a particular institution or in selection
bias imposed by clinicians or patients. While this is of great concern in uncontrolled or
historically controlled studies, it is also important to the interpretation and
extrapolation of the results of randomized, controlled trials. Fig 2 provides a flow chart
illustrating subject evaluation, entry, and loss during the process of a clinical trial.
The number of subjects, reasons for ineligibility or study refusal, failure to randomize,
alteration in treatment, or withdrawal from study should be presented, as well as a
description of the specific population on which subsequent analysis is based. An
inevaluability rate of 10% or less for major outcome measures should be achievable in most
studies. In addition, incomplete data can seriously restrict the sample population
considered in subgroup and multivariate analyses and can potentially bias the results.
Question 3: Are the Observed Differences Due to Random Error (Chance)?
The outcomes of a clinical study are seldom exactly equal among the study groups. The
differences observed may represent true outcome differences. However, it is imperative
that the investigator and the reader consider the possibility that the differences are due
to either random error or systematic error. Random error is the result of either biologic
or measurement variation, whereas systematic error is the result of a variety of biases
that can affect the results of a trial. The process of evaluating the outcomes of a study
for random error includes both estimation and statistical testing. Estimates summarizing
the distribution of measured variables may include point estimates (such as means or
proportions) and measures of precision (such as confidence intervals). Confidence limits
represent the upper and lower bounds likely to contain the true value of the variable. In
an uncontrolled study, the number of subjects should be sufficient to achieve the desired
level of precision in estimating an outcome (such as the response rate).
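As a minimal sketch of estimation in an uncontrolled study, the following computes a confidence interval for a response rate and the number of subjects needed for a desired precision. It assumes the normal-approximation (Wald) interval, which is only one of several available methods and can be poor for small samples or extreme rates. The function names and the example figures (12 responders among 40 subjects; a target of 30% plus or minus 10 percentage points) are hypothetical.

```python
import math
from statistics import NormalDist


def wald_interval(responders, n, level=0.95):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p = responders / n
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    half_width = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


def n_for_half_width(expected_p, half_width, level=0.95):
    """Smallest n for which the Wald half-width is at most the requested value."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return math.ceil(z ** 2 * expected_p * (1 - expected_p) / half_width ** 2)


# Hypothetical example: 12 responders among 40 subjects
low, high = wald_interval(12, 40)
# Hypothetical target: estimate a 30% response rate to within 10 percentage points
needed = n_for_half_width(0.30, 0.10)
print(f"95% CI: {low:.3f} to {high:.3f}; subjects needed: {needed}")
```

With these hypothetical figures the interval is wide (roughly 16% to 44%), and about twice as many subjects would be needed to reach the stated precision.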
Statistical testing involves an assessment of the probability of obtaining a difference
in outcome at least as large as that observed when there is actually no true difference
between the groups (P value). The critical value with which the P value is compared
(eg, .05) is the false-positive rate the investigator is willing to accept. When the P
value is greater than the critical value, the observed difference is considered to be not
statistically significant and may be attributed to random error or chance. When the P
value is less than the critical value, the difference is considered to be statistically
significant and is taken as evidence of a true effect. The larger the difference in
outcome and the greater the precision, the greater is the evidence for a true difference
between the groups.
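The dependence of the P value on both the size of the difference and the precision can be illustrated with a short sketch. It assumes a pooled two-proportion z-test, chosen here only for simplicity; the article does not prescribe a particular test, and the data are hypothetical.

```python
import math
from statistics import NormalDist


def two_proportion_p_value(events_a, n_a, events_b, n_b):
    """Two-sided P value from a pooled two-proportion z-test."""
    p_a, p_b = events_a / n_a, events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical data: the same 45% vs 30% response rates at two trial sizes
p_large = two_proportion_p_value(45, 100, 30, 100)  # 100 subjects per group
p_small = two_proportion_p_value(9, 20, 6, 20)      # 20 subjects per group
print(f"100 per group: P = {p_large:.3f}")
print(f" 20 per group: P = {p_small:.3f}")
```

The observed difference is identical in both cases, yet only the larger trial yields a P value below .05, because its estimate is more precise.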
Repeated or interim analyses will increase the chances of observing a statistically
significant difference purely due to chance. The number of interim analyses planned and
any measures taken to appropriately adjust the final analysis should be discussed.
Similarly, the risk of a false-positive result will increase with multiple subgroup
comparisons. Subgroup analysis should be considered in the design of the study and should
be limited to those with sufficient biologic or clinical rationale.
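The inflation of the false-positive risk with multiple comparisons can be sketched numerically. The calculation below assumes that the k tests are independent, which repeated looks at accumulating data are not, so the figures are illustrative rather than exact. The Bonferroni threshold shown is one common, conservative adjustment and is not a method named in this primer.

```python
def familywise_error(alpha, k):
    """Chance of at least one false-positive among k independent tests."""
    return 1 - (1 - alpha) ** k


def bonferroni_threshold(alpha, k):
    """Per-test critical value that keeps the overall rate at or below alpha."""
    return alpha / k


for k in (1, 5, 10, 20):
    overall = familywise_error(0.05, k)
    per_test = bonferroni_threshold(0.05, k)
    print(f"{k:>2} comparisons: overall false-positive risk {overall:.3f}, "
          f"adjusted critical value {per_test:.4f}")
```

Under the independence assumption, ten unadjusted comparisons at .05 carry roughly a 40% chance of at least one spurious significant result.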
Random error should be considered and controlled for in the design, conduct, and
analysis of a clinical investigation. Evidence supporting a true effect on outcome is
summarized in Table 2. It is imperative that sufficient numbers of subjects are included
to ensure that a nonsignificant difference will not be due to a false-negative result. The
probability of obtaining a significant result when a real difference exists is termed the power.
A sample size large enough to achieve a power of 80% (and preferably 90% or greater) for
detecting a clinically meaningful difference is generally considered desirable.
| Large effect size (difference) |
| Low variability |
| Large sample size |
| Small P value (low false-positive) |
| Large power (low false-negative) |
Table 2.--Factors Determining Power in a Clinical Study
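How the factors in Table 2 combine can be sketched with an approximate sample size calculation. The formula below is the standard normal-approximation for comparing two proportions with equal group sizes; it is an assumption of this sketch rather than a method given in the article, and the response rates are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate subjects per group for a two-sided test of two proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # false-positive control
    z_beta = NormalDist().inv_cdf(power)           # false-negative control
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_control - p_treatment
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Hypothetical response rates: 30% with control, 45% with treatment
print(n_per_group(0.30, 0.45, power=0.80))
print(n_per_group(0.30, 0.45, power=0.90))
# A smaller difference (30% vs 35%) requires far more subjects
print(n_per_group(0.30, 0.35, power=0.80))
```

Raising the power or shrinking the difference to be detected both increase the required sample size, the latter sharply because the effect size enters the formula squared.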
As a cautionary note, in large trials with high power, small differences in outcome may
be statistically significant and yet clinically meaningless. Alternatively, in small
trials, differences in outcome may appear clinically important and yet be statistically
nonsignificant due to low power. During the conduct of a trial, careful measuring of
clinical observations is essential. In the analysis, measures of precision (eg, standard
error or confidence limits) should be presented in addition to any statistical testing.
Question 4: Are the Observed Differences Due to Systematic Error (Bias)?
Apparent differences in measured outcomes may appear to be clinically and statistically
significant and yet may be the result of systematic error or bias within the study. Even
the most careful measurement and elegant statistical analysis will not salvage a biased
clinical trial. The most common types of bias in clinical investigations are those related
to subject selection, outcome measurement, and confounding. Confounding represents the
modification of the true relationship between treatment and outcome by another factor, eg,
prognostic factor (Fig 1). This occurs when the factor is associated with both the outcome
of interest and treatment group assignment. Confounding can obscure a true outcome
difference when it exists or can create an apparent difference that does not exist.
The most effective approach to controlling bias is in the design of a clinical
investigation. As illustrated in Fig 3, the level of evidence in support of a true
difference in outcome measures will be greater for controlled clinical trials, especially
when treatment is randomly assigned and when there is blinding of both the observer and
the subject to the treatment administered (double-blinding). In the process of
randomization, both known and unknown confounding factors will, on average, be evenly
distributed among the treatment groups. The balance of important factors within treatment
groups can be ensured by randomizing separately within subgroups (stratification). Support
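Randomizing separately within subgroups can be sketched as follows. This example assumes permuted blocks within each stratum, a common way to implement stratification that the article does not itself specify. The function name, arm labels, block size, and strata are all hypothetical.

```python
import random


def stratified_block_randomization(strata, arms=("A", "B"), block_size=4, seed=0):
    """Assign subjects to arms using permuted blocks kept separately per stratum."""
    rng = random.Random(seed)
    pending = {}       # stratum -> assignments remaining in its current block
    assignments = []
    for stratum in strata:
        if not pending.get(stratum):
            # Start a new balanced block for this stratum and shuffle it
            block = list(arms) * (block_size // len(arms))
            rng.shuffle(block)
            pending[stratum] = block
        assignments.append(pending[stratum].pop())
    return assignments


# Hypothetical enrollment order, stratified by disease stage
subjects = ["stage I", "stage II"] * 8
allocation = stratified_block_randomization(subjects)
for stage in ("stage I", "stage II"):
    arms_in_stage = [arm for arm, s in zip(allocation, subjects) if s == stage]
    print(stage, "A:", arms_in_stage.count("A"), "B:", arms_in_stage.count("B"))
```

Because each stratum draws from its own blocks, the arms are balanced within every stratum after each completed block, whatever the order in which subjects enroll.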
for an outcome difference may be provided by nonrandomized studies with one or more
concurrent control groups that are preferably matched on important prognostic factors.
However, historically controlled, uncontrolled, and purely descriptive studies provide
relatively limited evidence in support of an observed outcome difference. Perhaps the
greatest level of evidence comes from a systematic and quantitative overview of properly
conducted clinical trials in the form of a meta-analysis.
Bias can also be introduced in the conduct and measurement phase of a study. The
quality of observation, measurement, and data recording also affects the validity of
study results. Great care must be taken to obtain complete
information and to account for any missing data.
Despite the randomization process and other safeguards in the design and conduct of a
study, the distribution of known prognostic factors within treatment groups should be
compared in the analysis. In randomized trials, outcomes should be compared among the
groups based on the original treatment assignment rather than based on the treatment
received (intention-to-treat analysis). The investigator should present the distribution
of known prognostic factors among the treatment groups. Any prognostic factor associated
with both outcome and treatment group assignment must be considered as a potential
confounding factor and should be properly adjusted for in the analysis. If actual confounding
has occurred, the relationship between treatment and outcome will be either strengthened
or weakened in subsequent analysis.
Adjustment for confounding can be achieved through either stratified or multivariate
analysis. In stratified analysis, the relationship between treatment and outcome is
evaluated separately among the subgroups of a prognostic factor. The major strengths of
such analysis are the ease and clarity of presentation, while the major weakness is that
of small numbers in the subgroups considered. The Mantel-Haenzel method permits the
combining of subgroups, thus providing an adjusted overall test.
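A small worked example may help show how stratified analysis exposes confounding. The sketch computes the Mantel-Haenszel pooled odds ratio, the summary estimate that usually accompanies the Mantel-Haenszel test; the chi-square statistic itself is omitted. The counts are hypothetical and were constructed so that treated subjects fall mostly in the good-prognosis stratum.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table: a, b = treated responders, nonresponders;
    c, d = control responders, nonresponders."""
    return (a * d) / (b * c)


def mantel_haenszel_or(tables):
    """Mantel-Haenszel pooled odds ratio across strata of 2x2 tables."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator


# Hypothetical strata of a prognostic factor
strata = [
    (40, 10, 16, 4),   # good prognosis: 80% respond in both groups
    (4, 16, 10, 40),   # poor prognosis: 20% respond in both groups
]
crude = odds_ratio(*(sum(cell) for cell in zip(*strata)))  # strata collapsed
adjusted = mantel_haenszel_or(strata)
print(f"crude odds ratio: {crude:.2f}")
print(f"Mantel-Haenszel adjusted odds ratio: {adjusted:.2f}")
```

Collapsing the strata suggests a treatment benefit (odds ratio near 2.9), while the adjusted estimate is 1.0: the apparent difference arises entirely from the imbalance in prognosis between the groups.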
Multivariate analysis permits evaluation and adjustment for confounding for any
variable contained in the model. The coefficients for variables within the model represent
the rate of change in outcome with change in the variable. The effect of each variable on
outcome is adjusted for the effects of the other variables retained in the model. While
multivariate methods avoid the problem of small subgroups, subjects missing one or more
clinical measures will be deleted from the analysis, which may bias the results. The
investigator should present the actual number of subjects evaluated in each model.
Multivariate analysis is a sophisticated process requiring knowledge of the assumptions on
which the model is based. In addition, the results are often difficult to display and
explain to an inexperienced reader. Finally, the reader must recognize that the observed
outcome differences may still be the result of an imbalance of unmeasured or unrecognized
confounding factors that cannot be addressed in any analysis.
Question 5: Are the Observed Differences Modified by Other Factors?
When confounding is present, the relationship between treatment and outcome is altered
by the confounding factor. Interaction occurs when the relationship between
treatment and outcome is different within subgroups of the factor. Synergy occurs
when the relationship between treatment and outcome is greater than expected in a
subgroup, while a decrease in the expected relationship is termed antagonism. When
interaction is present, the relationship between treatment and outcome should be presented
separately for each subgroup. Combining results across subgroups of an interaction term
will produce an inaccurate measure that does not apply to any subgroup.
Inclusion of a variable in a multivariate model will adjust for confounding but not for
interaction. If interaction is present, the investigator must either present separate
models for each subgroup or include an interaction term (product term) in the model along
with each variable. Such models are often difficult to present and interpret; therefore,
the presence of interaction is often not properly considered or addressed in the
presentation of clinical trial results. The optimal consideration of interaction is aided
by an understanding of the underlying biological mechanisms and relationships of the
clinical measures involved.
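The role of the product term can be written out explicitly. The following is a sketch that assumes a logistic model for a binary outcome Y, a binary treatment indicator T, and a binary prognostic factor X; these symbols and the choice of model are illustrative and are not taken from the article.

```latex
\[
\operatorname{logit} P(Y = 1) = \beta_0 + \beta_1 T + \beta_2 X + \beta_3 (T \times X)
\]
\[
\text{log odds ratio for treatment} =
\begin{cases}
\beta_1 & \text{when } X = 0, \\
\beta_1 + \beta_3 & \text{when } X = 1.
\end{cases}
\]
```

When the coefficient of the product term is zero, the treatment effect is the same in both subgroups and a single adjusted estimate suffices. When it is not, the two subgroup-specific effects differ and should be reported separately.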
Conclusions
The appropriate design, presentation, and evaluation of a clinical investigation
require an explicit definition of the study population, a clear statement of the primary
and secondary hypotheses in terms of measurable outcomes, and careful consideration of any
observed differences for precision, bias, and possible interaction. Table 3 represents a
guide for the proper design, review, and evaluation of clinical trials. The reader of the
medical literature should consider each of these issues in evaluating the published
results of a clinical investigation.
Cancer Control Journal, Volume 4, Number 5