Meta-analysis: the analysis of analyses

“I am always certain about things that are a matter of opinion.” – Charlie Brown

Why do we need systematic reviews?

Traditionally, most journals publish review articles which attempt to summarize the current state of knowledge on specific issues. In these reviews, the reviewers take the trouble of wading through large numbers of original articles and then attempt to summarize them in an easily readable format. A good review can be an excellent resource, particularly when keeping up with all new contributions is so difficult in this era of information explosion. Review articles, thus, have an important place in health literature. However, what if the reviewer had already made up his/her mind about something and only quotes those original contributions which corroborate the reviewer's opinion? In other words, are traditional review articles objective and unbiased?

Another problem with any review is one of deciding whether a particular therapy is effective when results from various randomized controlled trials (RCTs) are conflicting. In such situations, reviewers are tempted to indulge in “vote counting” – comparing the number of “positive” trials with the number of “negative” trials. This can be a problem because a trial may be statistically “negative” (difference in treatment effects was not statistically significant) but it can still show a trend towards positive effect. In fact, many trials may not have the power (sample size) to pick up differences in treatment effects. A trial result may have clinical significance but not attain statistical significance. All these issues are missed when vote counting is done in traditional reviews.

The figure illustrates this using the review of beta-blockers in the prevention of deaths after myocardial infarction (Yusuf et al 1985). The figure shows the relative risk (RR) and 95% CI for death comparing those who did and did not get beta-blockers. As the figure reveals, only 2 of the 11 trials were “positive”; one trial showed risk instead of protection and all the other 8 trials showed a trend towards protection (but not statistically significant because of the 95% CI straddling 1.0). If simple vote counting were to be done, it would completely miss the efficacy of beta-blockers. In fact, pooling of data revealed a fairly precise protective effect.
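The pitfall of vote counting can be illustrated numerically. The sketch below uses hypothetical trial data (not the actual Yusuf et al figures): each trial on its own has a 95% CI straddling RR = 1.0, yet fixed-effect inverse-variance pooling of the log relative risks yields a precise, statistically significant protective effect.

```python
import math

# Hypothetical log relative risks and standard errors from five small
# trials, each individually "negative" (95% CI crosses RR = 1.0).
log_rr = [math.log(0.85), math.log(0.80), math.log(0.90),
          math.log(0.75), math.log(0.88)]
se = [0.15, 0.20, 0.25, 0.18, 0.22]

# Vote counting: a trial is "positive" only if its 95% CI excludes RR = 1.
positive = sum(1 for b, s in zip(log_rr, se) if b + 1.96 * s < 0)
print(f"'positive' trials: {positive} of {len(log_rr)}")   # 0 of 5

# Fixed-effect inverse-variance pooling: weight each trial by 1/SE^2.
w = [1 / s ** 2 for s in se]
pooled = sum(wi * bi for wi, bi in zip(w, log_rr)) / sum(w)
pooled_se = math.sqrt(1 / sum(w))
lo, hi = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
print(f"pooled RR = {math.exp(pooled):.2f} "
      f"(95% CI {math.exp(lo):.2f} to {math.exp(hi):.2f})")
```

For these invented numbers, vote counting finds no “positive” trials at all, while the pooled estimate (RR about 0.83, with a CI excluding 1.0) shows a clear protective effect.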

Given all these problems, it is now felt that traditional reviews are very prone to biases and there is a need for systematic reviews. A systematic review is an overview of primary studies which contains an explicit statement of objectives, materials and methods and has been conducted according to explicit and reproducible methodology [Greenhalgh T 1997].

Systematic reviews are considered advantageous because:

  • explicit methods limit bias in identifying and rejecting studies
  • conclusions are more reliable and accurate because of methods used
  • large amounts of information can be assimilated quickly
  • results of different studies can be formally compared to establish generalisability of findings and consistency of results
  • reasons for inconsistency in results across studies can be identified and new hypotheses generated for particular subgroups
  • quantitative systematic reviews (meta-analyses) increase the precision of the overall result

A meta-analysis is a quantitative type of systematic review. It is a mathematical synthesis of the results of two or more primary studies that address the same hypothesis in the same way [Greenhalgh T 1997]. It is essentially pooling of the quantitative results of different studies on the same research question into single estimates of, say, the effect of a treatment.

Meta-analysis has been defined as the process of using statistical methods to combine the results of different studies [Last 1995]. Compared with the 16 meta-analyses published during the 1970s, over 500 were published during the year 1996 alone! More than 2000 meta-analyses have been published to date, reflecting their popularity [Lau et al 1998].

However, in the last few years, the initial euphoria about their value has been undermined by the realization that poorly done meta-analyses can confuse rather than enlighten the reader! Indeed, meta-analyses have received severely discrediting reviews in the recent past: critics have opined that meta-analyses combine studies which are very heterogeneous and are therefore unreliable [Eysenck HJ 1994]; negative trials are often not published and hence can be missed by meta-analysts; and large randomized clinical trials do not always agree with prior meta-analyses [Lau et al 1998, Villar J et al 1995, Cappelleri JC et al 1996, LeLorier J et al 1997, Egger M et al 1995]. To understand these limitations of meta-analyses, one would first have to understand the process of a meta-analysis.

How is a meta-analysis done?

1. State the objectives of the meta-analysis: This is the first crucial step because it is the hypothesis which will determine which RCTs will be selected for inclusion in the meta-analysis. A distinction is often made between exploratory meta-analysis and hypothesis-driven meta-analysis. Ideally, the question addressed by the meta-analysis should be defined precisely so that a yes/no decision can be made at the end of the review. Exploratory reviews use all available studies without a specific hypothesis; this is similar to conducting a trial without defining a study hypothesis. In addition to clearly defining the hypothesis, the reviewer must also decide on issues like what types of studies should be included, what types of comparisons can be made, what types of therapies are to be compared, what outcomes should be compared, etc.

2. Search for trials that meet eligibility criteria: A very comprehensive search must be made for all studies on the research question. Even the best MEDLINE search will pick up only 30-80% of all published RCTs on the question. In fact, many negative trials are not published at all and trials published in non-English language journals are also often missed while searching. An effort must be made to get around these problems by consulting trial registers, checking reference lists of published papers, writing to first authors of published papers and asking them for information on similar studies, etc. An effort must be made to get the raw data from previous published trials since journal articles do not always report all the important baseline and outcome variables.

3. Assess methodological quality of RCTs: Once all trials on the topic have been identified, the next task is to assess them all in terms of their methodological quality (the extent to which the study has been designed and executed without any systematic errors or bias), precision (usually shown as the 95% CI) and external validity (how generalisable the results of a trial are to the general population). Many reviewers try to assign a “quality score” to each paper. Poor quality papers (for example, a trial where allocation was not done by randomization) may be excluded from the meta-analysis. It is important that reviewers do not reject studies without a valid justification. In fact, it is now considered desirable to assess the quality of studies in a blinded fashion – the reviewer should not know the outcome of the trial.

4. Collect data from chosen trials and tabulate them: This involves a complete tabulation of each trial, its quality, its sample size, comparison groups, therapy given including dose, duration, etc., outcomes measured, etc. During this stage, it is often necessary to write to the authors of the original trials and ask them for raw data. Despite all these efforts, it is possible that there may be several gaps in the data set.

5. Analyze data using statistical methods: Pooling of data is then done using statistical methods. Most often, pooling of odds ratios (OR) or relative risks (RR) is done. It is also possible to pool “effect sizes” (the magnitude of the difference in outcomes between the two trial arms) or the number needed to treat. Recently, the Centers for Disease Control, Atlanta, has come up with an early version of a software called Epi Meta for performing meta-analysis [web site for download:]. In recent times, meta-analysis results tend to be presented in a standard format such as the one shown in the figure below. This is called the “forest plot.” The point estimate of the OR or RR of each individual trial is shown as a solid box and the 95% CI is shown as a horizontal line. The line down the middle of the picture is known as the “line of no effect” and, in the case of the OR / RR, it corresponds to a risk ratio of 1.0. The diamond below all the individual horizontal lines represents the pooled data from all the trials. If the diamond touches the line of no effect, the pooled result is not statistically significant.
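As a minimal sketch of this pooling step, the code below computes a Mantel-Haenszel fixed-effect pooled odds ratio from hypothetical 2x2 tables (the event counts are invented for illustration; a real analysis would also report a confidence interval and heterogeneity statistics):

```python
# Hypothetical trials: (events_treatment, n_treatment, events_control, n_control)
trials = [
    (12, 100, 20, 100),
    ( 8,  80, 14,  80),
    (30, 250, 45, 250),
]

# Mantel-Haenszel fixed-effect pooled odds ratio.
num = den = 0.0
for a, n1, c, n2 in trials:
    b, d = n1 - a, n2 - c        # non-events in each arm
    n = n1 + n2                  # total patients in this trial
    num += a * d / n             # MH numerator contribution
    den += b * c / n             # MH denominator contribution
pooled_or = num / den
print(f"Mantel-Haenszel pooled OR = {pooled_or:.2f}")
```

For these invented tables the pooled OR comes out around 0.58, which on a forest plot would sit as a diamond to the left of the line of no effect.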

6. Interpretation of results: Meta-analyses aim to increase the sample size by combining studies (thereby increasing the overall power). Though this has many advantages (like increasing precision), it must be remembered that when numbers are large enough, even clinically insignificant differences can become statistically significant. Also, while interpreting the results of meta-analyses, it is important to consider issues like the heterogeneity of the trials which have been included, the quality of the trials that have been included, publication bias which may have influenced the selection of trials into the meta-analysis, etc.

Limitations of meta-analyses

  • Comparing apples and oranges: While the theory of pooling data to derive precise estimates of treatment effect is fairly sound, it presumes that differences among the various trials are primarily due to chance. This need not be true. RCTs are done on different populations and different subgroups representing various patient characteristics. Different subgroups may respond to the same therapy differently. In addition, individual trials use different inclusion and exclusion criteria. To this, we need to add differences in treatment protocols and outcome measurement across trials. With all this variability, is one justified in pooling data to generate single estimates from all the RCTs? Many authors feel that this kind of unthinking pooling of data can generate misleading results by ignoring meaningful heterogeneity among studies and introducing further biases through the process of selecting trials [Naylor 1997, Ioannidis JPA 1998]. It is now widely felt that meta-analyses should attempt to evaluate heterogeneity rather than just drown all differences by pooling data [Lau J et al 1998].
  • Publication bias: It is well known that studies with statistically significant outcomes are more likely to get published than non-significant studies [Naylor 1997]. Also, small trials are less likely to be published as compared to large trials. It has also been shown that negative studies take longer to appear in print. Another problem is one of covert duplicate publication (same trial data being published more than once). It has also been documented that papers which appear in languages other than English are more likely to be excluded from meta-analyses. All these biases are important and can invalidate the results of meta-analyses.
  • Discrepancies with mega-trials: One of the reasons for the recent skepticism about the value of meta-analyses is the discrepancy between the results of meta-analyses and subsequent large RCTs. For instance, a meta-analysis published in 1993 by Teo et al showed that magnesium given intravenously for acute myocardial infarction decreases in-hospital mortality. A subsequent mega-trial with 58,050 patients, ISIS-4 (1995), did not show any such effect. Similarly, aspirin was shown to prevent pre-eclampsia in a meta-analysis by Imperiale in 1991. A mega-trial published in 1994 (the CLASP trial) did not show such a protective effect.
  • Several authors have attempted to quantify the discrepancies between meta-analyses and subsequent large trials. Villar et al (1995) examined 30 meta-analyses in perinatal medicine and found that 80% of them agreed directionally with the results from the larger trials. Cappelleri et al (1996) reviewed 79 meta-analyses and also found about 80% directional agreement. More recently, LeLorier et al (1997) compared 12 RCTs with 19 previous meta-analyses. They found that the meta-analyses would have led to the adoption of an ineffective treatment in 32% of cases and the rejection of a useful treatment in 33%.
  • Authors have attempted to explain these discrepancies using various models [Lau J et al 1998, Woods KL 1995, Ioannidis et al 1998, Pogue J et al 1998, Egger M et al 1995]. Thanks to these controversies, the methods of meta-analysis are rapidly evolving. As Ioannidis et al point out, “discrepancies between meta-analyses and large trials should be expected, given the variable characteristics and treatment responses in different persons, protocols and populations. Not only are trials in meta-analyses frequently heterogeneous, but also the idea of the homogeneous single trial is often a myth. Discrepancies occur even within trials and between large trials themselves.”
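Because heterogeneity is central to these criticisms, reviewers now routinely report Cochran's Q and the I-squared statistic before pooling. A minimal sketch, using invented per-trial log odds ratios and standard errors:

```python
# Hypothetical per-trial log odds ratios and their standard errors.
log_or = [-0.40, -0.10, 0.05, -0.60, -0.25]
se = [0.20, 0.15, 0.25, 0.30, 0.18]

# Fixed-effect pooled estimate (inverse-variance weights).
w = [1 / s ** 2 for s in se]
pooled = sum(wi * yi for wi, yi in zip(w, log_or)) / sum(w)

# Cochran's Q: weighted squared deviations from the pooled estimate,
# referred to a chi-square distribution with k-1 degrees of freedom.
q = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, log_or))
df = len(log_or) - 1

# I^2: the share of total variability attributable to between-trial
# heterogeneity rather than chance.
i2 = max(0.0, (q - df) / q) * 100
print(f"Q = {q:.2f} on {df} df, I^2 = {i2:.0f}%")
```

A Q much larger than its degrees of freedom (or a high I-squared, say above 50%) would warn against blind pooling and argue for exploring subgroups or a random-effects model instead.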

Judging a meta-analysis

It is now recognized that meta-analysis has made and continues to make important contributions to clinical practice. It is also recognized that it is no panacea. We need to realize that for a meta-analysis to give good information, it should meet at least the minimum standards that would be expected of a well-designed, adequately powered, and carefully conducted randomized controlled trial. Thus, the time and effort required to do a reliable meta-analysis may be at least equivalent to that needed for a single RCT. If such high standards are not met, then meta-analyses can do more harm than good.

As Ioannidis and colleagues wrote, “Meta-analysis is not statistical alchemy that makes life easier by distilling one magic number from confounded data; it is a scientific discipline that aims to quantify evidence and to explore bias and diversity in research systematically.” It is the responsibility of the reader to evaluate a meta-analysis very critically and make the judgment about its quality and validity.

The following is a format [Riegelman & Hirsch 1996] that can be used for critiquing and judging meta-analyses:

Study Design

1. Was the meta-analysis hypothesis driven or exploratory?

2. If the meta-analysis was hypothesis driven, were inclusion and exclusion criteria defined by the hypothesis? If it was exploratory, were all potentially relevant studies included?

Assignment

1. Did the investigator identify methods used to search for all potentially relevant research? Was double counting avoided?

2. Was the possibility of publication bias evaluated using a technique such as the funnel diagram?

3. Was an assessment of the quality of the studies used? If so, was exclusion of low-quality studies justified by a difference in outcomes?

4. Were characteristics of patients and treatments identified that may differ between the investigations and may also affect the outcome?

Assessment

1. When assessing outcomes, were the investigators masked or blinded as to the identity of the authors of the investigation?

2. Was a common outcome measurement or end point used for all available studies?

3. Did the definition of the end point used affect the results?

Results

1. Was homogeneity evaluated using a method such as the graphic approach before combining investigations into one meta-analysis? If homogeneity did not appear to exist, were two or more separate meta-analyses conducted?

2. Were the number of individuals in each group and the proportion that experienced a particular outcome available for each investigation, allowing calculation of an effect size?

3. Was statistical significance of the meta-analysis determined even if effect size could not be calculated?

Interpretation

1. Was it recognized that the larger sample sizes of a meta-analysis increase the statistical power and make it especially important to distinguish between statistical significance and clinical importance?

2. Did the investigators recognize the potential for interpreting safety data with greater confidence because of larger sample sizes?

3. Did the investigators examine the impact and characteristics of outliers? Were outliers excluded only with good justification?

4. If data presented in the report are adequate, did the investigators selectively perform subgroup analyses on the basis of predetermined study questions?

5. Did the investigators determine a fail-safe N, which tells one the number of additional average-size investigations with zero effect size that must exist before the results would no longer be statistically significant?
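The fail-safe N mentioned above can be sketched using Rosenthal's approach: with a Stouffer-style combined test, the result stays significant at p = .05 (one-tailed Z = 1.645) until enough zero-effect studies are added. The per-study Z values below are invented for illustration:

```python
# Hypothetical per-study Z values from the trials in a meta-analysis.
z_scores = [2.1, 1.8, 2.5, 1.6, 2.0]
k = len(z_scores)

# Rosenthal's fail-safe N: the combined Stouffer Z is sum(Z)/sqrt(k + N),
# which stays above 1.645 while N < (sum(Z)/1.645)^2 - k.
n_fs = (sum(z_scores) / 1.645) ** 2 - k
print(f"fail-safe N = {n_fs:.0f} additional null studies")
```

A large fail-safe N (here about 32 hidden null studies) suggests the pooled result is robust to plausible publication bias; a small one is a warning sign.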

Extrapolation

1. Did the investigators recognize that meta-analyses aim to determine average effect size and that single studies may be more relevant to a particular patient or a particular setting?

2. Did the investigators recognize that meta-analysis does not allow better extrapolation beyond the data than do single investigations?

References & further reading

1. Ioannidis JPA et al. Meta-analyses and large randomized controlled trials. NEJM 1998;338:59.

2. Yusuf S et al. Beta blockade during and after myocardial infarction: an overview of the randomized trials. Prog Cardiovasc Dis 1985;27:335-371.

3. Last JM. A Dictionary of Epidemiology, 3rd Edition. Oxford University Press, 1995.

4. Greenhalgh T. How to read a paper. Papers that summarize other papers (systematic reviews and meta-analyses). BMJ 1997;315:672-675.

5. Lau J et al. Meta-analysis Duet. Summing up evidence: one answer is not always enough. Lancet 1998;351:123-127.

6. Pogue J, Yusuf S. Meta-analysis Duet. Overcoming the limitations of current meta-analysis of randomized controlled trials. Lancet 1998;351:47-52.

7. Naylor DC. Meta-analysis and the meta-epidemiology of clinical research. BMJ 1997;315:617-619.

8. Teo KK et al. Role of magnesium in reducing mortality in acute myocardial infarction. A review of the evidence. Drugs 1993;46:347-59.

9. ISIS-4 Collaborative Group. ISIS-4: a randomized factorial trial assessing early oral captopril, oral mononitrate, and intravenous magnesium sulphate in 58,050 patients with suspected acute myocardial infarction. Lancet 1995;345:669-87.

10. Imperiale TF et al. A meta-analysis of low-dose aspirin for the prevention of pregnancy-induced hypertensive disease. JAMA 1991;266:261-5.

11. CLASP Collaborative Group. CLASP: a randomized trial of low-dose aspirin for the prevention and treatment of pre-eclampsia among 9364 pregnant women. Lancet 1994;343:619-29.

12. Villar J et al. Predictive ability of meta-analyses of randomized controlled trials. Lancet 1995;345:772-6.

13. Cappelleri JC et al. Large trials versus meta-analysis of smaller trials. How do their results compare? JAMA 1996;276:1332-8.

14. LeLorier J et al. Discrepancies between meta-analyses and subsequent large randomized controlled trials. NEJM 1997;337:536-42.

15. Egger M et al. Misleading meta-analysis. Lessons from “an effective, safe, simple” intervention that wasn’t. BMJ 1995;310:752.

16. Woods KL. Mega-trials and management of acute myocardial infarction. Lancet 1995;346:611-4.

17. Riegelman RK, Hirsch RP. Studying a Study and Testing a Test: How to Read the Health Science Literature. 3rd Edition. Little, Brown and Company, 1996.

18. Eysenck HJ. Meta-analysis and its problems. BMJ 1994;309:789-792.


Dr. Madhukar Pai MD, DNB
Consultant, Community Medicine & Epidemiology
Email: [email protected]