SYSTAT 12
SYSTAT 12 Review
By: John Ludbrook
Department of Surgery, University of Melbourne
Published in: Clinical and Experimental Pharmacology and Physiology, Volume 35, Issue 1, pages 103–104, January 2008
I make no apology for this being an unusual and wide-ranging review. Although I have reviewed several specialised statistics programs (ref:1,2), I have never reviewed SYSTAT or any other general-purpose statistics package. I have used SYSTAT in successive versions for over 20 years. During that time, I have searched for the ‘perfect’ statistics package and have tried out SPSS, SAS, Statistica and Minitab. However, I have always rejected them in favour of SYSTAT because of its combination of user-friendliness and breadth of coverage.
In The American Statistician there are recent reviews of SYSTAT 11 (ref:3), SigmaStat (ref:4,5), Stata (ref:6), SPSS (ref:4,7,8) and Statistica (ref:9). However, these are reviews for statisticians, not for biomedical scientists or even biostatisticians.
SYSTAT is the brainchild of Leland (Lee) Wilkinson of Northwestern University (Chicago, IL, USA). His company, Systat Inc. (Chicago, IL, USA), bypassed mainframe computers and launched the first DOS-based, command-driven version of SYSTAT for PCs in 1983. Since then, it has been adapted to the Windows environment and has become menu driven. In 1995, SPSS Inc. took over SYSTAT. In turn, SPSS sold it in 2002 to Cranes Software International Ltd, which is based in Bangalore, India. The SYSTAT arm of their enterprise is now incorporated as Systat Software Inc. (San Jose, CA, USA). Cranes has been responsible for versions 11 and 12, under the direction of Dr T Krishnan, Chief Statistical Architect for SYSTAT. Cranes has also bought SigmaPlot and SigmaStat from SPSS. In addition, it has entered into an agreement with the Cytel Software Corporation to provide an ‘add-on’ to SYSTAT of exact tests on categorical variables, plus a limited number of permutation tests on means (T Krishnan, pers. comm., 2007).
What do biomedical scientists want from a statistics package?
Joseph Hilbe, the editor responsible for software reviews in The American Statistician, recently conducted a poll of readers to discover what procedures they regarded as important in general-purpose statistics software (ref:10). In order, these were linear models, generalised linear models, logistic regression, exact statistics, design of experiments, survival analysis, non-linear models and exploratory data analysis.
Of more relevance to biomedical scientists, the statistical techniques used in articles published in The New England Journal of Medicine and Nature Medicine were described recently (ref:11). These are of note for their simplicity: t-tests, simple ANOVA, regression and correlation, survival analysis and simple analyses of tables of frequencies. I use, and I continually urge my colleagues to use, much more complex statistical routines than these. A simple example is to test global hypotheses, for instance by way of the interaction terms in complex ANOVA, rather than make multiple pairwise comparisons of means.
What are the features of SYSTAT that make it attractive?
These are, in no particular order:
- it costs a great deal less than its major competitors;
- it is available under licence at 14 Australian universities and research institutions;
- it provides a similar range of tests to those in SPSS, SAS and Stata and a wider range than SigmaStat and Statistica;
- it is driven by a very comprehensive menu (unlike Stata or SAS), although commands are available for those who prefer them;
- this menu caters for even the most complex forms of ANOVA, including interactions, nesting and repeated measures, for all forms of linear and non-linear regression analysis, for logistic regression analysis and for survival analysis;
- SYSTAT's manuals are written in clear language and are very well referenced (over the years, I have acquired a good deal of my understanding of statistics from them).
This tradition continues and has been extended in SYSTAT 12. Thus, SYSTAT 10.1 had six manuals (3.70 kg) and SYSTAT 12 has 10 manuals (7.25 kg). The additional material has been contributed by Dr Krishnan’s team at Cranes International. However, size should not be the sole basis for judging quality. For instance, in SYSTAT 10.1 and earlier versions only five post hoc tests for comparison of means following one-way ANOVA were offered: the Duncan and Student–Newman–Keuls procedures were deliberately omitted because they fail to adequately control the familywise Type I error rate.
SYSTAT 12 offers no fewer than 15 post hoc tests (including the two mentioned above). There is a risk that the statistically naïve user will try out all 15 tests in turn and select the one that gives the most pleasing outcome. I would like to see these post hoc tests reviewed critically in the manual, together with recommendations about which are safe, in the sense of not inflating the Type I error rate.
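To see why procedures that fail to control the familywise Type I error rate are risky, a rough calculation helps. The Python sketch below is illustrative only. It assumes that the pairwise comparisons are independent, which is not strictly true for comparisons among group means, so the figures are an approximation rather than an exact result. The function names are hypothetical and have nothing to do with SYSTAT's own commands.

```python
def n_pairwise_comparisons(k_groups: int) -> int:
    """Number of distinct pairs of means among k_groups groups."""
    return k_groups * (k_groups - 1) // 2


def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive in n_tests unadjusted tests.

    Assumption (a simplification): the tests are independent.
    """
    return 1.0 - (1.0 - alpha) ** n_tests


if __name__ == "__main__":
    # Unadjusted pairwise comparisons, each made at the 0.05 level.
    for k in (3, 5, 8):
        m = n_pairwise_comparisons(k)
        rate = familywise_error_rate(0.05, m)
        print(f"{k} groups, {m} comparisons: familywise rate ~ {rate:.2f}")
```

With five groups there are 10 pairwise comparisons and, under this simplifying assumption, the chance of at least one false-positive result at the 0.05 level is about 0.40.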
I had better start at the beginning: data. SYSTAT’s data files look just like a spreadsheet. Data can be entered directly into a blank data file or they can be imported from a wide selection of spreadsheets and other statistics and graphics programs. There is also an advanced capability for editing data. It includes much that can be done on a spreadsheet, such as Microsoft Excel, but it has several additional features. These include the ability to select cases, select groups, select reference levels in logistic regression, create trimmed groups and so forth.
To go straight to the end: the output that results from statistical testing. In SYSTAT 12, the output display is much more elegant than in SYSTAT 10 and, importantly, it automatically prints to fit an A4 portrait format. The output also includes valuable graphical displays: plots of least squares means in ANOVA (although the numerical values are not given, as they are in SPSS), together with an easy facility for displaying the residuals; and scatterplots, lines of best fit and residuals following regression analysis.
Graphics were a special interest of Lee Wilkinson and SYSTAT has always had an advanced Graphics module. In the past, the graphs have not been of publication quality, but have included very advanced three-dimensional plots and surfaces and, unique among graphics programs, there is a ‘jitter’ option. Thus, coincident points in a scatterplot can be ‘jittered’ (displaced slightly at random) so that they can be distinguished from one another. In SYSTAT 12, the quality of the printed output of graphs is much improved, although there seem to have been no other major changes. However, the graphical capabilities of SYSTAT 12 are not as versatile as those of SigmaPlot, which has now also been taken over by Cranes International. It will be interesting to see whether these will be developed separately or will converge.
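For readers unfamiliar with the ‘jitter’ option mentioned above, the idea can be sketched in a few lines of Python. This is a generic illustration, not SYSTAT's algorithm: the function name, the uniform noise and the default width are all assumptions made for the example.

```python
import random


def jitter(values, width=0.1, seed=None):
    """Return a copy of values with small uniform random offsets added.

    Each point is moved by at most +/- width, so coincident points
    separate on a plot while staying close to their true position.
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)  # seed makes the display reproducible
    return [v + rng.uniform(-width, width) for v in values]


if __name__ == "__main__":
    scores = [3, 3, 3, 4, 4, 5]  # several coincident observations
    print(jitter(scores, width=0.1, seed=1))
```

The jittered values are for display only; any statistical analysis should use the original data.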
It has been impossible for me to try out every single routine offered by SYSTAT 12. Under the menu item ‘Utilities’, there are routines available for which I have, in the past, used other specialised software: for instance, probability tables and power (minimal sample size) analysis. The output of the latter is graphical rather than numerical and pretty comprehensive (except for repeated-measures ANOVA, which only nQuery Advisor provides for (ref:1)). Then there are a host of routines that I rarely, if ever, have had occasion to use: Monte Carlo random sampling, bootstrapping, quality analysis and so forth. So what I have done is scrutinise closely those routines that I use on a daily, weekly or monthly basis to analyse my own or others’ data. I have tested these out against published data sets. I discuss my findings below.
The great strength of SYSTAT resides in its ‘General Linear Model’ module, which encompasses everything from the simplest one-way ANOVA to the most complex multiway forms of ANOVA, including repeated-measures ANOVA. SYSTAT does an excellent job of these analyses.
The ‘Survival’ module includes the usual non-parametric procedures: Kaplan–Meier survival curves and comparisons of survival by the Mantel–Haenszel logrank and similar techniques. It also caters for Cox’s proportional hazards regression analysis, invaluable for comparing survival under more than two sets of conditions, but it does not permit the extraction of interactions of the sort exemplified by table 3.2 of Hosmer and Lemeshow (ref:12) (this can be done in Stata 9.2).
SYSTAT’s ‘Regression’ module is also excellent and provides for linear, non-linear, logistic and robust regression. However, it deals only with Model I regression analysis and does not provide explicitly for Model II regressions (although these can be executed by way of the loss function) (ref:13).
Perhaps I should explain. The most commonly used form of Model I regression is so-called ordinary least squares regression. It is a prerequisite of this that the x-values be fixed by the experimenter, whereas the y-values are free to vary. Often, neither the y- nor the x-values are fixed. In these circumstances, one or another form of Model II regression is required. Nor does SYSTAT address head-on the vexed matter of how best to compare two measures or measurers when an interval scale is used. The two best solutions are Model II regression analysis (e.g. ordinary least products regression) or the Altman–Bland method of differences (ref:14). The former can be executed by SYSTAT, but requires some ingenuity (ref:13). SYSTAT provides for robust regression techniques, useful when there are outlying x,y-values. My preference is for the least trimmed squares (LTS) technique. I am puzzled that when I executed this on an example there was an admittedly small, but nevertheless definite, discrepancy between the outcomes from SYSTAT 12 and SYSTAT 10.
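The two method-comparison approaches named above can be illustrated outside SYSTAT. The Python sketch below is a minimal version under stated assumptions: it takes ordinary least products regression to be the geometric mean (reduced major axis) line, whose slope is the ratio of the standard deviations of y and x carrying the sign of their covariance, and it takes the Altman–Bland limits of agreement to be the mean difference plus or minus 1.96 standard deviations of the differences. All function names are hypothetical, and no confidence intervals are computed.

```python
import math
import statistics


def _sums(x, y):
    """Means and corrected sums of squares and cross-products."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length (at least 2)")
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    return mx, my, sxx, syy, sxy


def ols_regression(x, y):
    """Model I: ordinary least squares, (slope, intercept)."""
    mx, my, sxx, _, sxy = _sums(x, y)
    slope = sxy / sxx
    return slope, my - slope * mx


def olp_regression(x, y):
    """Model II: ordinary least products, (slope, intercept).

    Slope = sign(covariance) * SD(y) / SD(x).
    """
    mx, my, sxx, syy, sxy = _sums(x, y)
    slope = math.copysign(math.sqrt(syy / sxx), sxy)
    return slope, my - slope * mx


def limits_of_agreement(x, y):
    """Altman-Bland bias and approximate 95% limits for y - x."""
    diffs = [yi - xi for xi, yi in zip(x, y)]
    bias = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


if __name__ == "__main__":
    # Invented readings from two hypothetical measurement methods.
    a = [1.0, 2.0, 3.0, 4.0, 5.0]
    b = [1.2, 1.9, 3.2, 3.8, 5.1]
    print("OLS:", ols_regression(a, b))
    print("OLP:", olp_regression(a, b))
    print("Limits of agreement:", limits_of_agreement(a, b))
```

Because the least squares slope equals the least products slope multiplied by the correlation coefficient, the least products line is never flatter than the least squares line, and the two coincide only when the correlation is perfect.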
SYSTAT provides for any sort of non-linear regression analysis, but users must be familiar with, and specify, the mathematical model: for instance, exponential, hyperbolic or, commonly used in dose–response and stimulus–response studies, the sigmoidal four-parameter logistic regression. It would be very helpful if worked examples were given of some of these commonly used non-linear regressions.
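As an indication of what ‘specifying the mathematical model’ involves, here is one common parameterisation of the four-parameter logistic curve, written as a Python function. It is a sketch only: the parameter names (bottom, top, ec50, hill) are assumptions chosen for readability, other equivalent parameterisations exist, and this is not the syntax SYSTAT expects.

```python
def four_parameter_logistic(x, bottom, top, ec50, hill):
    """Sigmoidal response at dose x (x must be > 0).

    bottom, top : lower and upper plateaux of the response
    ec50        : dose giving a response midway between the plateaux
    hill        : slope factor controlling the steepness of the curve
    """
    if x <= 0 or ec50 <= 0:
        raise ValueError("x and ec50 must be positive")
    return bottom + (top - bottom) / (1.0 + (ec50 / x) ** hill)


if __name__ == "__main__":
    # Invented parameter values, for illustration only.
    for dose in (0.1, 1.0, 10.0, 100.0, 1000.0):
        y = four_parameter_logistic(dose, bottom=0.0, top=100.0,
                                    ec50=10.0, hill=1.0)
        print(f"dose {dose:>7}: response {y:.1f}")
```

A non-linear regression routine would then estimate the four parameters from the data, which usually requires sensible starting values for each of them.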
How and why do biomedical researchers select a statistical software package?
For a start, they want a package that is relevant to the sort of research they do. The package they use is likely to be the one used by their postgraduate supervisor, one licensed by their institution or one that has been the focus of a course they have attended. Cost may also be an important influence: note that SYSTAT 12 for academics, plus manuals, costs A$1,450.00 all in (including 10% GST and 12 months of free product updates/upgrades). In contrast, SPSS Base 16 plus, say, five add-ins costs nearly A$12,000.00. It is also of enormous advantage if a colleague, or a biostatistical consultant, uses the selected software and is available for advice and troubleshooting. (Note: SYSTAT is available at the academic rate for hospitals in Australia and New Zealand.)
I make two suggestions. First, SYSTAT is advertised as being useful to those who work in disciplines such as astronomy, archaeology, engineering, geology and manufacturing. I wish Cranes International would add the biological and biomedical sciences to that list and show they are in earnest by adding worked examples from those disciplines. Second, in the old days of SYSTAT, there was a vigorous SYSTAT Users Group that was based in Melbourne and mailed out a very helpful newsletter called SysNet (from which I learned of Model II regression) (ref:13).
There is currently a SYSTAT website for interchange of problems and solutions (http://board.systat.com). Cranes International plans to start a global newsletter towards the end of 2007 (S George, pers. comm., 2007). This should be a great asset for SYSTAT users.
SYSTAT 12 requires the Microsoft Windows platform. Cranes International tell me that they hope to issue a version for the Linux platform in a couple of months and a Macintosh version in 18 months or so (S George, pers. comm., 2007).
I have confessed to my 20-year love affair with earlier versions of SYSTAT. Nevertheless, I have done my best to review SYSTAT 12 objectively, drawing attention to its deficiencies as well as its virtues. I commend it to biomedical scientists because of its ease of use, comprehensive coverage and low cost. Moreover, it is certain that Cranes International will continue to develop and expand SYSTAT. If readers have any doubts, I suggest they take up the offer of a free 30-day evaluation copy.
References
1. Ludbrook J. Software review: DBMS/COPY 7 for Windows, nQuery-Advisor 4.0 for Windows, MathType 5 for Windows. Clin. Exp. Pharmacol. Physiol. 2002; 29: 739.
2. Ludbrook J. Software review: StatXact, version 6 with Cytel Studio. Clin. Exp. Pharmacol. Physiol. 2004; 31: 367.
3. Hilbe JM. A review of SYSTAT 11. Am. Statistician 2005; 59: 104–10.
4. Hilbe JM. A review of current SPSS products: SPSS 12, SigmaPlot 8.02, SigmaStat 3.0. Part 1. Am. Statistician 2003; 57: 310–15.
5. Hilbe JM. SigmaStat 3.1: A second look. Am. Statistician 2005; 59: 187–91.
6. Hilbe JM. A review of Stata 9.0. Am. Statistician 2005; 59: 335–48.
7. Hilbe JM. A review of SPSS 12.01, Part 2. Am. Statistician 2004; 58: 168–71.
8. Hilbe JM. A review of SPSS, Part 3: Version 13.0. Am. Statistician 2005; 59: 185–6.
9. Hilbe JM. Statistica 7: An overview. Am. Statistician 2007; 61: 91–4.
10. Hilbe JM. Section editor’s notes. Am. Statistician 2007; 61: 78.
11. Strasak AM, Zaman Q, Marinell G, Pfeiffer KP, Ulmer H. The use of statistics in medical research: A comparison of The New England Journal of Medicine and Nature Medicine. Am. Statistician 2007; 61: 47–53.
12. Hosmer DW, Lemeshow S. Applied Survival Analysis: Regression Modelling of Time to Event Data. John Wiley & Sons, New York. 1999.
13. Ludbrook J. Comparing methods of measurement. Clin. Exp. Pharmacol. Physiol. 1997; 24: 193–203.
14. Ludbrook J. Statistical techniques for comparing measurers and methods of measurement: A critical review. Clin. Exp. Pharmacol. Physiol. 2002; 29: 527–36.