Understanding Evidence
How do we rate evidence?
This site rates evidence using the GRADE framework — the same system used by Cochrane, WHO, and major clinical guideline bodies. This page explains what each certainty level means and how ratings are assigned.
The four certainty levels
High certainty
Standard language: "[Treatment] results in [effect]."
We are very confident the true effect lies close to the estimate. Further research is very unlikely to change confidence in the estimate. Typically requires multiple large, well-designed RCTs with consistent findings.
Moderate certainty
Standard language: "[Treatment] likely results in [effect]."
We are moderately confident the true effect lies near the estimate, but there is a possibility that it differs substantially. Further research may change our confidence or the estimate. Most acupuncture-vs-no-treatment findings for back pain fall here.
Low certainty
Standard language: "[Treatment] may result in [effect]."
Our confidence in the effect estimate is limited. The true effect may be substantially different. Further research is likely to have an important impact on our confidence and may change the estimate. Most acupuncture-vs-sham and most neck pain findings fall here.
Very low certainty
Standard language: "The evidence is very uncertain about whether [treatment] results in [effect]."
We have very little confidence in the effect estimate. The true effect is likely to be substantially different. Comparisons between active treatments (acupuncture vs. physical therapy, massage, or chiropractic care) often fall here because few head-to-head RCTs exist.
How are ratings assigned?
GRADE ratings are assigned per outcome and per comparison — not per treatment. "Acupuncture works" is not a GRADE-compatible claim. "Acupuncture likely results in meaningful pain reduction compared to no treatment for chronic low back pain" is.
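Evidence from randomized controlled trials (RCTs) starts at high certainty. Five factors can lower it, each typically by one level for a serious concern or two levels for a very serious one: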
- Risk of bias: methodological problems in the underlying studies (e.g., inadequate blinding, selective reporting)
- Inconsistency: unexplained variation in results across trials
- Indirectness: the evidence applies to a different population, intervention, or outcome than the question being asked
- Imprecision: wide confidence intervals, often because sample sizes are small
- Publication bias: signs that studies with negative or null results went unpublished, skewing the available evidence
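The downgrading steps can be pictured as a simple calculation. This is a minimal, simplified sketch, assuming each concern is scored as 0 (none), 1 (serious) or 2 (very serious) levels and that downgrades simply add up from a high starting point. Real GRADE ratings are structured reviewer judgments, not a formula. The function names (`rate_certainty`, `summarize`), the domain keys, and the example effect wording are hypothetical, chosen only for illustration.

```python
# Simplified sketch of GRADE downgrading for one outcome and one comparison.
# Assumption: each concern is scored 0 (none), 1 (serious) or 2 (very serious);
# real GRADE ratings are reviewer judgments, not arithmetic.

LEVELS = ["very low", "low", "moderate", "high"]  # ordered lowest to highest

DOMAINS = {"risk_of_bias", "inconsistency", "indirectness",
           "imprecision", "publication_bias"}

# Standard phrasing for each certainty level, as used on this page.
STANDARD_LANGUAGE = {
    "high": "{treatment} results in {effect}.",
    "moderate": "{treatment} likely results in {effect}.",
    "low": "{treatment} may result in {effect}.",
    "very low": ("The evidence is very uncertain about whether "
                 "{treatment} results in {effect}."),
}


def rate_certainty(concerns: dict[str, int], start: str = "high") -> str:
    """Lower the starting level by the total number of levels rated down."""
    for domain, steps in concerns.items():
        if domain not in DOMAINS:
            raise ValueError(f"unknown GRADE domain: {domain}")
        if steps not in (0, 1, 2):
            raise ValueError(f"{domain}: rate down by 0, 1 or 2 levels")
    index = LEVELS.index(start) - sum(concerns.values())
    return LEVELS[max(index, 0)]  # certainty cannot fall below "very low"


def summarize(treatment: str, effect: str,
              concerns: dict[str, int]) -> tuple[str, str]:
    """Return the certainty level and its standard-language sentence."""
    level = rate_certainty(concerns)
    sentence = STANDARD_LANGUAGE[level].format(treatment=treatment, effect=effect)
    return level, sentence[0].upper() + sentence[1:]  # capitalize first letter


if __name__ == "__main__":
    # Acupuncture vs. sham: one level each for indirectness and inconsistency.
    level, sentence = summarize(
        "acupuncture",
        "a small reduction in pain compared with sham",
        {"indirectness": 1, "inconsistency": 1},
    )
    print(level)     # low
    print(sentence)  # Acupuncture may result in a small reduction in pain compared with sham.
```

The usage example mirrors the acupuncture-vs-sham rating discussed on this page: two one-level concerns take RCT evidence from high to low. Plain addition is a deliberate simplification; in practice reviewers also avoid rating down twice for overlapping concerns, which this sketch does not capture.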
Why does the sham acupuncture problem lower certainty?
GRADE certainty for acupuncture-vs-sham comparisons is rated low rather than moderate because of two interacting problems:
- Indirectness: Sham acupuncture is not an inert placebo. It still provides sensory stimulation and may activate endogenous opioid pathways. A real-vs-sham comparison may therefore not answer "does acupuncture work?" but a narrower question: "does needle location matter?"
- Inconsistency: The real-vs-sham difference is small and varies across trials, making it less reliable as an effect estimate.
The full sham debate is discussed on the back pain evidence page.
Why study type is not the whole story
The evidence pyramid (systematic reviews at the top, case reports at the bottom) is a useful starting framework, but study type does not guarantee quality: a systematic review of small, biased studies cannot produce high-certainty evidence.
GRADE reflects both study design and study quality. High certainty requires high-quality evidence from the strongest study designs, typically well-conducted RCTs; being pooled in a systematic review is not enough on its own.
What sources were excluded?
- Studies from predatory journals or journals with documented peer-review problems
- Animal model studies extrapolated to humans without replication in human trials
- Testimonials and practitioner case series without control conditions
- Studies with undisclosed industry funding that could not be independently assessed
Key sources
- Guyatt GH et al. "GRADE: an emerging consensus on rating quality of evidence and strength of recommendations." BMJ. 2008;336:924.
- Schünemann H et al. "Interpreting GRADE's levels of certainty or quality of the evidence." J Clin Epidemiol. 2016.
- Full methodology and source list.
Page last reviewed: March 7, 2026 · Authored by Claude (Anthropic AI) · Research methodology