While assessing research evidence on any given topic can be very complex, EBP reviews categorize that evidence in a way designed to convey its quality in a simple format. (Note, however, that a lot of assumptions are built into, and omitted from, such ratings!)
Here is an example of the use of evidence hierarchies in EBP. It is taken from the U.S. Department of Health and Human Services' National Guideline Clearinghouse, which (as the name implies) collects and distributes treatment guidelines. The categories themselves are clearly imported from an uncited source used in the United Kingdom.
I: Evidence obtained from a single randomised [sic - British spelling from the original] controlled trial or a meta-analysis of randomised controlled trials
IIa: Evidence obtained from at least one well-designed controlled study without randomisation
IIb: Evidence obtained from at least one well-designed quasi-experimental study [i.e., no randomization and use of existing groups]
III: Evidence obtained from well-designed non-experimental descriptive studies, such as comparative studies, correlation studies, and case-control studies
IV: Evidence obtained from expert committee reports or opinions and/or clinical experience of respected authorities
Grade A - At least one randomised controlled trial as part of a body of literature of overall good quality and consistency addressing the specific recommendation (evidence level I) without extrapolation
Grade B - Well-conducted clinical studies but no randomised clinical trials on the topic of recommendation (evidence levels II or III); or extrapolated from level I evidence
Grade C - Expert committee reports or opinions and/or clinical experiences of respected authorities (evidence level IV) or extrapolated from level I or II evidence. This grading indicates that directly applicable clinical studies of good quality are absent or not readily available.
[Retrieved Feb 20,
One can plainly see that the Evidence Categories, or hierarchies, are used to 'grade' the Recommendations that constitute practice guidelines. Note too that research evidence based on multiple RCTs is privileged, while any work based on quasi-experiments, correlational studies, case studies, or qualitative research is viewed as not particularly useful. Quality of conceptualization (of disorder and of treatment) is assumed; sample sizes and composition are not mentioned beyond randomization; generalization from prior work is assumed to be non-problematic; analyses are presumed to be done appropriately; and issues of diversity and context are not considered.
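To make the grading logic explicit, here is a minimal Python sketch of the quoted rules, mapping an evidence level to a recommendation grade. The names `Level`, `grade`, and the `extrapolated` flag are hypothetical, introduced only for illustration; they are not part of the Clearinghouse's materials. One assumption is unavoidable: the quoted text lists extrapolation from level I evidence under both Grade B and Grade C, so the sketch has to pick one (it picks B).

```python
from enum import IntEnum


class Level(IntEnum):
    """Evidence categories from the quoted scheme (lower value = higher rank)."""
    I = 1
    IIA = 2
    IIB = 3
    III = 4
    IV = 5


def grade(level: Level, extrapolated: bool = False) -> str:
    """Return the recommendation grade ("A", "B" or "C") for the given evidence.

    `extrapolated` (hypothetical) is True when the recommendation is
    extrapolated from evidence at `level` rather than addressed by it directly.
    """
    if not extrapolated:
        if level == Level.I:
            return "A"  # RCT evidence applied without extrapolation
        if level in (Level.IIA, Level.IIB, Level.III):
            return "B"  # well-conducted studies, but no RCT on the topic
        return "C"      # level IV: expert opinion or clinical experience
    # Extrapolated evidence. The quoted text places extrapolation from
    # level I under both Grade B and Grade C; this sketch assumes B.
    if level == Level.I:
        return "B"
    if level in (Level.IIA, Level.IIB):
        return "C"
    # The quoted text says nothing about extrapolating from level III or IV.
    raise ValueError("scheme does not cover extrapolation from level III or IV")


print(grade(Level.III))                     # B
print(grade(Level.IIB, extrapolated=True))  # C
```

Notice that the only inputs are the study design and whether the recommendation was extrapolated. Sample size, sample composition, quality of conceptualization, and context have no parameter at all, which is exactly the omission described above.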
Another, quite similar, rating system is used by the United States Preventive Services Task Force. It works as follows:
Strength of Recommendations
The U.S. Preventive Services Task Force (USPSTF) grades its recommendations according to one of five classifications (A, B, C, D, I) reflecting the strength of evidence and magnitude of net benefit (benefits minus harms).
A.- The USPSTF strongly recommends that clinicians provide [the service] to eligible patients. The USPSTF found good evidence that [the service] improves important health outcomes and concludes that benefits substantially outweigh harms.
B.- The USPSTF recommends that clinicians provide [this service] to eligible patients. The USPSTF found at least fair evidence that [the service] improves important health outcomes and concludes that benefits outweigh harms.
C.- The USPSTF makes no recommendation for or against routine provision of [the service]. The USPSTF found at least fair evidence that [the service] can improve health outcomes but concludes that the balance of benefits and harms is too close to justify a general recommendation.
D.- The USPSTF recommends against routinely providing [the service] to asymptomatic patients. The USPSTF found at least fair evidence that [the service] is ineffective or that harms outweigh benefits.
I.- The USPSTF concludes that the evidence is insufficient to recommend for or against routinely providing [the service]. Evidence that the [service] is effective is lacking, of poor quality, or conflicting and the balance of benefits and harms cannot be determined.
Quality of Evidence
The USPSTF grades the quality of the overall evidence for a service on a 3-point scale (good, fair, poor):
Good: Evidence includes consistent results from well-designed, well-conducted studies in representative populations that directly assess effects on health outcomes.
Fair: Evidence is sufficient to determine effects on health outcomes, but the strength of the evidence is limited by the number, quality, or consistency of the individual studies, generalizability to routine practice, or indirect nature of the evidence on health outcomes.
Poor: Evidence is insufficient to assess the effects on health outcomes because of limited number or power of studies, important flaws in their design or conduct, gaps in the chain of evidence, or lack of information on important health outcomes.
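The USPSTF definitions above can likewise be read as a lookup on two inputs: the quality of the evidence and the size of the net benefit. The following Python sketch encodes only the wording quoted here. The enum names, the function `uspstf_grade`, and the four net-benefit categories are hypothetical labels for the phrases in the A through D definitions, and the sketch assumes that "poor" evidence always yields grade I.

```python
from enum import Enum


class Quality(Enum):
    GOOD = "good"
    FAIR = "fair"
    POOR = "poor"


class NetBenefit(Enum):
    # Hypothetical labels for the phrases used in grades A through D.
    SUBSTANTIAL = "benefits substantially outweigh harms"
    POSITIVE = "benefits outweigh harms"
    CLOSE = "balance too close to justify a general recommendation"
    NONE_OR_NEGATIVE = "ineffective, or harms outweigh benefits"


def uspstf_grade(quality: Quality, net: NetBenefit) -> str:
    """Return "A", "B", "C", "D" or "I" following the quoted definitions."""
    # Poor evidence: the balance of benefits and harms cannot be determined.
    if quality is Quality.POOR:
        return "I"
    # A requires good evidence AND a substantial net benefit.
    if quality is Quality.GOOD and net is NetBenefit.SUBSTANTIAL:
        return "A"
    # B: at least fair evidence that benefits outweigh harms.
    if net in (NetBenefit.SUBSTANTIAL, NetBenefit.POSITIVE):
        return "B"
    # C: at least fair evidence, but the balance is too close.
    if net is NetBenefit.CLOSE:
        return "C"
    # D: at least fair evidence of no benefit, or of net harm.
    return "D"
```

For example, `uspstf_grade(Quality.FAIR, NetBenefit.SUBSTANTIAL)` returns `"B"`, because grade A requires good evidence as well as a substantial net benefit.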
Once again, research evidence based on multiple RCTs is privileged, while any work based on quasi-experiments, correlational studies, case studies, or qualitative research is viewed as not particularly useful. Quality of conceptualization (of disorder and of treatment) is assumed; sample sizes and composition are not mentioned beyond randomization; generalization from prior work is assumed to be non-problematic; analyses are presumed to be done appropriately; and issues of diversity and context are not considered.
Still, the clarity is valuable, and useful if you understand the rating system and its underlying logic.
February 20, 2007 from
J. Drisko page begun 3/171/04, updated 2/20/08