“Give me an undigested heap of figures and I cannot see the wood for the trees. Give me a diagram and I am positively encouraged to forget detail until I have a real grasp of the overall picture.”
– M J Moroney
Quantitative research is all about collecting, analyzing and interpreting numbers. Sadly, the human mind is quite limited when it comes to understanding patterns behind a pile of naked numbers (raw data). For us to make any sense of data, we first have to reduce raw data into something intelligible. Thanks to descriptive statistics, we can group data, look at their frequencies, look at data as charts, graphs and pictures, and compute averages, etc.
Presentation of data in a visually appealing fashion, without sacrificing the richness of the data, is an art. Data should be presented in a manner that will communicate the maximum information in the most efficient manner. Pictorial presentations are advantageous because even a novice without any technical expertise can assimilate the information by looking at them. It helps to know some basic principles behind data presentation.
Types of presentation formats
Graphs: Histogram, Line diagram, Scatter plot
Charts: Bar chart, Pie chart
A graph is a visual display of the relationship between variables: the values of one variable are plotted along the horizontal or X axis, and those of a second variable along the vertical or Y axis. Three-dimensional graphs of relationships between three variables can also be made and depicted in two dimensions.
Graphs differ from charts: the variables on the X axis of a graph are usually on a continuous scale (e.g. time, height, weight). In charts, the categories depicted on the X axis are usually discrete and unrelated to each other, with no continuity between them (e.g. sex, diagnosis).
A histogram is a graphic representation of the frequency distribution of a variable. Vertical rectangles (bars) are drawn in such a way that their bases lie on a linear scale representing different intervals, and their heights are proportional to the frequencies of the values within each of the intervals. Each bar covers a class interval, and the centers of the bases of the bars are located at the mid-points of the class intervals.
A histogram is called a frequency histogram when simple frequencies are plotted along the vertical axis. Instead of frequencies, if relative frequencies (percentages) are plotted along the vertical axis, then it is called a relative frequency histogram.
If the mid-points of the tops of the bars of a histogram are connected by a line and the bars are omitted from the display, the resulting graph is a line graph (also called a frequency polygon).
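As a minimal sketch of the tabulation behind a frequency histogram and a relative frequency histogram, the Python snippet below groups values into equal-width class intervals and reports the mid-point, frequency and percentage of each interval. The blood pressure readings, the interval width and the function name `frequency_table` are illustrative assumptions, not data or methods from this article.

```python
def frequency_table(values, start, width, n_classes):
    """Return (mid_point, frequency, relative_frequency_percent) per class interval.

    Intervals are half-open: [lower, upper). Values outside the range are ignored.
    """
    counts = [0] * n_classes
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < n_classes:
            counts[index] += 1
    total = sum(counts)
    table = []
    for i, count in enumerate(counts):
        mid_point = start + (i + 0.5) * width
        percent = 100.0 * count / total if total else 0.0
        table.append((mid_point, count, percent))
    return table


# Hypothetical systolic blood pressure readings (mmHg), for illustration only.
readings = [112, 118, 121, 125, 127, 130, 131, 134, 138, 142, 145, 151]

for mid_point, count, percent in frequency_table(readings, start=110, width=10, n_classes=5):
    # A crude text histogram: one '#' per observation in the interval.
    print(f"{mid_point:6.1f}  {'#' * count:<6} {count:2d}  {percent:5.1f}%")
```

Plotting the frequency column gives a frequency histogram; plotting the percentage column gives a relative frequency histogram. The (mid-point, frequency) pairs are also the points that would be joined to draw the frequency polygon.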
Line graphs are good at showing trends over a period of time. When trends of rates (death rate, infant mortality rate, etc.) are to be displayed, this is better done with line graphs than with histograms.
A scatter plot, also called a scattergram, is a method of displaying the distribution of two variables in relation to each other. The values of one variable are measured on the X axis and the values of the other on the Y axis. The variables have to be on a continuous scale. Each plotted point thus has two values (coordinates), one from the X axis scale and one from the Y axis scale.
A wide scatter of the points denotes poor correlation between the two variables. If the two variables are perfectly correlated, then all the points will fall on a straight line (the regression line).
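To make the link between scatter and correlation concrete, here is a small sketch that computes Pearson's correlation coefficient from first principles. The function name `pearson_r` and both data sets are hypothetical and chosen only to contrast points on a straight line with widely scattered points.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 points)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical X values (e.g. heights in cm).
heights = [150, 155, 160, 165, 170, 175]

# Y values lying exactly on a straight line: perfect correlation.
on_a_line = [2 * h - 250 for h in heights]
# Widely scattered Y values: poor correlation.
scattered = [62, 48, 71, 50, 66, 55]

print(round(pearson_r(heights, on_a_line), 3))
print(round(pearson_r(heights, scattered), 3))
```

A coefficient near +1 or -1 corresponds to points hugging a straight line, while a value near 0 corresponds to a wide scatter.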
A bar chart is a method of presenting discrete data organized in such a way that each observation can fall into only one of several mutually exclusive categories. The frequencies (or percentages) are listed along the Y axis and the categories of the variable along the X axis. The heights of the bars correspond to the frequencies. The bars should be of equal width and they should not touch each other.
A compound bar chart (also called a component bar chart) is a variant: here the bars are cut into various components depending on what is being shown. If percentages are used for the various components of a compound bar, then the total bar height must be 100%. The compound bar chart is a little more complex, but if this method is used sensibly, a lot of information can be quickly shown in an attractive fashion.
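The rule that the components of a percentage-based compound bar must add up to 100% can be checked with a short sketch like the one below. The clinic counts and the helper name `component_percentages` are hypothetical.

```python
def component_percentages(counts):
    """Convert the component counts of one bar into percentages of that bar."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("bar has no observations")
    return {label: 100.0 * n / total for label, n in counts.items()}


# Hypothetical counts of diagnoses in two clinics, for illustration only.
clinics = {
    "Clinic A": {"Diabetes": 30, "Hypertension": 50, "Other": 20},
    "Clinic B": {"Diabetes": 45, "Hypertension": 30, "Other": 75},
}

for clinic, counts in clinics.items():
    shares = component_percentages(counts)
    parts = ", ".join(f"{label} {pct:.1f}%" for label, pct in shares.items())
    # Reporting n alongside the percentages keeps the bar interpretable.
    print(f"{clinic} (n={sum(counts.values())}): {parts}")
```

Because each bar is scaled to its own total, the two bars have the same height even though the clinics differ in size, which is why the sample size is printed next to each one.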
A pie chart is a circular diagram (it can be shown in 2-D or 3-D) divided into segments, each representing a category or subset of the data (part of the whole). The amount for each category is proportional to the area of the sector (slice of the pie). The total area of the circle is 100% and it represents the total population that is being shown.
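Since the area of a sector is proportional to its central angle, the size of each slice can be worked out as its share of 360 degrees. The following is a minimal sketch of that arithmetic; the category counts and the function name `sector_angles` are assumptions made for illustration.

```python
def sector_angles(counts):
    """Return the central angle, in degrees, of each pie sector."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no observations to display")
    return {label: 360.0 * n / total for label, n in counts.items()}


# Hypothetical counts by category, for illustration only.
causes = {"Cardiovascular": 40, "Cancer": 25, "Infections": 20, "Other": 15}

for label, angle in sector_angles(causes).items():
    print(f"{label:<15} {angle:6.1f} degrees")
```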
Misuse of graphics
“It pays to be wide awake in studying any graph. The thing looks so simple, so frank, and so appealing that the careless are easily fooled.” – M J Moroney
Graphs and charts are often misused. The honest researcher must have a good handle on how graphs can be used to deliberately mislead people so that such misadventures can be avoided.
The problem of scaling: The oldest trick in the book is to mislead people by exaggerating or compressing the scale of the graph. The figures below show the same data: trend of death rate in women with breast cancer in England and Wales between 1951 and 1981. The first graph shows an alarming increase in the trend while the second graph looks much less impressive. The difference between the graphs lies in the fact that the second one has a true zero on the Y axis. Researchers must be cautious of drawing graphs with the base cut on the vertical axis; a true zero should be included as far as possible.
The Advertiser’s Graph: This is very commonly used by drug companies to push their drugs in the market. Neat trends are shown to demonstrate the beneficial effects of a drug – without any scale! One must always be careful while interpreting graphs and charts which do not have a scale. Also, watch out when a scale is given but no units are mentioned!
The graph with scanty data: This is another trick which drug companies use often. Graphs with great-looking trends and comparisons are drawn without any mention of the sample size! The trial of the drug could have been made on only a handful of patients and this fact is neatly concealed.
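The effect of cutting the base of the vertical axis can be put into numbers. The sketch below compares how much taller the last plotted value looks than the first when the Y axis starts at zero and when it starts just below the data. The rates used are hypothetical (they are not the breast cancer figures discussed above), and `apparent_ratio` is a name introduced here for illustration.

```python
def apparent_ratio(first, last, axis_start=0.0):
    """Ratio of the plotted heights of two values when the Y axis starts at axis_start."""
    if axis_start >= min(first, last):
        raise ValueError("axis_start must lie below both values")
    return (last - axis_start) / (first - axis_start)


# Hypothetical death rates per 100,000 at the start and end of a period.
rate_start, rate_end = 40.0, 50.0

print(apparent_ratio(rate_start, rate_end))        # true zero on the Y axis: 1.25
print(apparent_ratio(rate_start, rate_end, 35.0))  # Y axis cut at 35: 3.0
```

The underlying increase is 25% in both cases, but with the axis cut at 35 the final point is drawn three times as high as the first.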
The transformed graph: Sometimes it is necessary to transform the data from an arithmetic scale to a logarithmic scale. If the reader is not careful in appreciating this, the effect can be quite misleading. If transformation has been done, this should be justified and clearly labeled.
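One way to see why a logarithmic scale changes the look of a graph: equal ratios become equal distances. The sketch below uses hypothetical case counts that double every year; on the arithmetic scale the steps keep growing, while on the log scale every step is the same size, so the curve would appear as a straight line.

```python
import math

# Hypothetical case counts that double every year, for illustration only.
cases = [100, 200, 400, 800, 1600]

# Year-to-year steps on the arithmetic scale.
arithmetic_steps = [b - a for a, b in zip(cases, cases[1:])]
# Year-to-year steps after a log10 transformation.
log_steps = [math.log10(b) - math.log10(a) for a, b in zip(cases, cases[1:])]

print(arithmetic_steps)
print([round(step, 3) for step in log_steps])
```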
The chart with too much data: This is a problem which is very commonly encountered when reading the literature. Graphs and charts tend to have too much information and all that they succeed in doing is give a headache! It is worth remembering Hill’s words on this matter: “In medical literature the amount of time wasted by useless or unreadable diagrams is quite astonishing… it should be remembered that quite simple data are perfectly clear in a table and therefore a graph is merely a waste of space. Alternatively with complicated data a diagram may be equally of no assistance and a waste of space.”
Guidelines for making good presentations
1. Before making a graph/chart, decide on the point that you wish to present and then choose the appropriate method.
2. Emphasize one idea at a time in a figure; too much information in a graph or a chart defeats the purpose of the entire presentation. Keep it simple and straightforward!
3. Use conventional graphing methods; e.g., time is almost always plotted along the X axis.
4. Pay careful attention to the scaling of the graph; use equal increments as far as possible.
5. Graphs and tables must be self-contained and must stand on their own without reference to the text. Clear labeling of the graph is a must; make sure you mention the what, when and where of your data.
6. Specify the units that are being used clearly; for example, if rates are being shown, mention whether the rates are per cent, per 1000, per 100000, etc.
7. As far as possible, mention the total sample size of the data set for which the graph or chart is made.
8. Only keys/legends should be within the field of the graph.
9. If colors are being used in the graphs, make sure too many colors are not used.
10. Be consistent in the use of colors and fonts in a series of graphs and tables.
11. Graphs and charts are subsidiary aids to the intelligence and should not be taken as evidence of associations. That evidence must be drawn from the statistical tables and tests. Hence, graphs are not a substitute for the actual tables. Statistical tables contain the basic data and they allow the reader to make their own calculations and judgments.
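Guidelines 6 and 7 matter because the same data yield very different-looking numbers depending on the base chosen for a rate. As a small sketch (the counts and the function name `rate_per` are hypothetical), a label might be generated like this:

```python
def rate_per(events, population, base):
    """Express events/population as a rate per `base` people (e.g. 100, 1000, 100000)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return events * base / population


# Hypothetical example: 36 deaths in a population of 48,000.
deaths, population = 36, 48_000

for base in (100, 1_000, 100_000):
    # Each label states both the unit of the rate and the sample size.
    print(f"{rate_per(deaths, population, base):g} per {base:,} (n = {population:,})")
```

The three printed rates describe the same observation, so an axis that shows only the bare number, without its base, cannot be interpreted.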
References & further reading
1. Last JM (Ed.). A Dictionary of Epidemiology, 3rd Edition. Oxford University Press, 1995.
2. Beaglehole R, Bonita R, Kjellstrom T. Basic Epidemiology. Geneva, World Health Organization, 1993.
3. Hill AB, Hill ID. Bradford Hill’s Principles of Medical Statistics, 12th edition (Indian). New Delhi: B.I. Publications, 1993.
4. Gonick L & Smith W. The Cartoon Guide to Statistics. HarperPerennial, 1993.
5. Moroney MJ. Facts from Figures. England: Penguin Books, 1953.
6. McNeil D. Epidemiological Research Methods. John Wiley & Sons Ltd, 1996.
7. Huff D. How to Lie with Statistics. Penguin Books, 1954.
Exercise: Basic biostatistics
1. The serum cholesterol levels of 20 patients attending a diabetes clinic were measured, and the following results were obtained:
260 210 240 230 260 280 280 300 300 210
150 220 190 210 240 200 276 288 300 310
Summarise the major features of this data set by calculating
a) the mean and median
b) the range and standard deviation
c) Depict the frequency distribution as a bar diagram.
2. In a study done to determine the daily energy requirements of healthy adult men, heights and weights of medical students (14 subjects) were measured. The following are the body mass index (BMI) values (kg/m²) for the 14 subjects:
24.4 30.4 21.4 25.1 21.3 23.8 20.8 22.9 23.2 21.1
23.0 20.6 26.0 20.9
a) Compute the mean, median, range and standard deviation for this data set.
b) Construct a histogram to display this graphically.
c) What percentage of the measurements are within one standard deviation of the mean? Within two standard deviations? Three standard deviations?
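For readers who want to check their hand calculations, the sketch below shows one way the summary measures asked for in both exercises could be computed with Python's standard `statistics` module. It assumes the sample standard deviation (divisor n - 1) is intended; the exercises do not say which form to use. The function names `summarise` and `percent_within` are introduced here for illustration.

```python
import statistics


def summarise(values):
    """Mean, median, range and sample standard deviation of a data set."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "range": max(values) - min(values),
        "sd": statistics.stdev(values),  # sample SD (divisor n - 1): an assumption
    }


def percent_within(values, k):
    """Percentage of values lying within k sample standard deviations of the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    inside = sum(1 for v in values if abs(v - mean) <= k * sd)
    return 100.0 * inside / len(values)


# Data copied from exercise 1 (serum cholesterol) and exercise 2 (BMI, kg/m²).
cholesterol = [260, 210, 240, 230, 260, 280, 280, 300, 300, 210,
               150, 220, 190, 210, 240, 200, 276, 288, 300, 310]
bmi = [24.4, 30.4, 21.4, 25.1, 21.3, 23.8, 20.8, 22.9, 23.2, 21.1,
       23.0, 20.6, 26.0, 20.9]

print(summarise(cholesterol))
print(summarise(bmi))
for k in (1, 2, 3):
    print(f"within {k} SD: {percent_within(bmi, k):.1f}%")
```

If the population standard deviation (divisor n) is wanted instead, `statistics.pstdev` can be substituted for `statistics.stdev`.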
Dr. Madhukar Pai MD, DNB
Consultant, Community Medicine & Epidemiology