Different authors seem to have different conventions, for example:
Stata statistical package (release 9.1; Stata, College Station, TX)
Stata version 9.1
Is there an accepted format? Does it matter?
I would appreciate anyone's thoughts on this matter.
Thanks,
Patrick
-- Statistical Consultant, Flinders University, Adelaide SA
I know that SPSS has their own preferred way of being cited. They say to use the following format:
SPSS for Windows, Rel. 11.0.1. 2001. Chicago: SPSS Inc.
I don't know about any other packages though.
If possible, the software should be listed in the Bibliography proper,
and not in something extra like a "Reference Notes" section.
Often people cite the manual, as this has the usual information like
publisher, city of publication, etc.
Some ways to see what is customary:
1. Other articles in journals like JASA, NEJM, Science, Nature, etc.
2. Search for "citation" in the website of the software company
3. Check the software documentation for ideas
HTH
--
John Uebersax PhD
>This I feel is quite false. Software as a publication and explicit
>description of algorithms is critical, at least if you are doing "modern"
>statistics (i.e. requiring a computer, as opposed to a computer being nice).
>Results for methods at the research level are quite implementation
>dependent, and particular citations are critical to understand the basis
>for the results.
I agree. However, assuming one is talking about 'well-known software',
to be adequate in that respect all that is required is that the package
be named, together with its version number and 'upgrade status' (i.e.
service packs, 'hot fixes', etc.).
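That information can be captured programmatically rather than typed by hand. Here is a minimal sketch in Python (my own illustration, not code from the thread; the helper name `report_environment` is hypothetical) showing the kind of name/version/patch-level record a "fully specified" citation needs:

```python
import platform

def report_environment():
    """Collect the interpreter's name, version, and build tag --
    the 'package, version, upgrade status' triple discussed above.
    Hypothetical helper, for illustration only."""
    return {
        "implementation": platform.python_implementation(),  # e.g. "CPython"
        "version": platform.python_version(),                # e.g. "3.11.4"
        "build": platform.python_build()[0],                 # build/patch tag
        "os": platform.system(),                             # e.g. "Linux"
    }

info = report_environment()
print("Analysis run with {implementation} {version} on {os}".format(**info))
```

The same idea applies to any statistical package: record the exact version and patch level at run time, and paste the result into the methods section.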
>Now as to whether you need to provide free advertising, that clearly isn't
>mandatory; but proper citations are.
Again, I agree. The information I've mentioned above is all that is
required to fully define the implementation; anything beyond that is
probably undesirable.
> Software... is not in the same category as papers and books,
> and so there is no need to use the same rules for citation.
I think we will perhaps find that the line between software
and books is increasingly indistinct.
> Nowadays, anyone can find out about standard software by
> going to a search engine
Can't they also go to Amazon.com and find out about a book?
In any case, a search engine won't tell you what version
of the software a person used--something that not
infrequently makes a difference!
> certainly object to cluttering up papers with long-winded
> statements designed by software marketeers, mentioning trademarks
> and the legal status of the company (Inc, Ltd etc): they have no
> place in scientific publications.
This seems to attribute motives, which is in general something
to avoid, IMHO. As for Inc, Ltd, etc.: I don't know what your
concern is, but book publishers have the same assorted
abbreviations.
I believe that, among other things, references serve the goal
of letting others reproduce exactly one's results. For this goal,
specifying software can be just as important as formulas, etc.,
especially when there is any issue of ambiguity.
I think one needs to use good judgment here. I *wouldn't* cite
SAS just because I used SAS to perform a t-test. But if I
used something non-standard, which might be difficult for
another researcher to replicate otherwise, then I would.
For example, if it were logistic regression, I might cite
the software, as I can easily imagine that different packages
make somewhat different assumptions.
--
John Uebersax PhD
Fair points!
I want to weigh in on a more general aspect of this question,
however.
All too often, one sees "token" citations of software (and of the
procedures used) in journal articles. Admittedly, provided the
citation is sufficiently detailed, one is indeed in a position
to ascertain exactly what was done. But, usually, this is not
the case.
As an example only one step up from John's "t-test" example:
a regression, with associated ANOVA, of data whose explanatory
variables are factors depends on what system of contrasts was used
"behind the scenes". Any system of contrasts is valid in theory,
of course, but in a particular application one system is likely
to be more "relevant" than others.
How often does one see this mentioned in the "citation" of the
software procedure used?
Indeed, one wonders how often the investigators are aware of
the issue.
Any statistical software has a "default" system of contrasts
for this kind of procedure, and different software implementations
may have different default systems.
For example, R and S-Plus (as near equivalent as you're likely
to find) use different default systems. In the Linear Model
procedure "lm", S-Plus by default uses "Helmert" contrasts
for unordered factors according to an essentially arbitrary
ordering of the factor levels. So if Factor X has 3 levels
(A,B,C), the linear model lm(Y ~ X) would issue estimates of
treatment effects
(B - A)/2
(2*C - B - A)/6
(the "ordering" A, B, C having been assigned on alphabetical
grounds; or, if X was set up as an unordered factor with three
levels designated "1", "2", "3", then on numerical grounds.
If you want a different ordering, you need to set it explicitly
with the "levels" argument when creating the factor with the
"factor" command.)
On the other hand, R by default uses "treatment" contrasts,
in which the estimates of treatment effects are differences
from the baseline (first) level:
B - A
C - A
There are other built-in systems of contrasts available which
you can choose from using the "contrasts" parameter to "lm",
or you can roll your own.
The "contrasts" story in S-Plus and R can get quite convoluted,
so I'll only give this example. But it shows that someone using
S-Plus for the linear model "lm(Y~X)" will get different answers
from someone using "lm(Y~X)" in R, simply because of these
different default contrasts.
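The arithmetic behind this difference is package-independent and can be checked by hand. The following pure-Python sketch (my own illustration, not code from the thread) derives both parameterizations from the three group means of a balanced one-way layout, using the coding rows of R's `contr.treatment` and `contr.helmert`; the coefficients differ, but the cell means they reconstruct are identical:

```python
# Group means for a balanced one-way layout, factor levels A, B, C.
# (Values chosen to echo the simulated example below: true means 0, 2, 4.)
means = {"A": 0.0, "B": 2.0, "C": 4.0}
mA, mB, mC = means["A"], means["B"], means["C"]
grand = (mA + mB + mC) / 3

# Treatment parameterization (contr.treatment): baseline is level A.
treat = {"intercept": mA, "b1": mB - mA, "b2": mC - mA}
codes_treat = {"A": (0, 0), "B": (1, 0), "C": (0, 1)}

# Helmert parameterization (contr.helmert): intercept is the grand mean.
helm = {"intercept": grand, "h1": (mB - mA) / 2, "h2": (2 * mC - mA - mB) / 6}
codes_helm = {"A": (-1, -1), "B": (1, -1), "C": (0, 2)}

# Reconstruct each cell mean from both parameterizations.
fitted_treat, fitted_helm = {}, {}
for level in "ABC":
    c1, c2 = codes_treat[level]
    fitted_treat[level] = treat["intercept"] + c1 * treat["b1"] + c2 * treat["b2"]
    c1, c2 = codes_helm[level]
    fitted_helm[level] = helm["intercept"] + c1 * helm["h1"] + c2 * helm["h2"]
    print(level, fitted_treat[level], fitted_helm[level])  # both equal the group mean
```

The fitted model is the same either way; only the reported coefficients (and hence the numbers a reader sees in a table) differ. That is precisely why the contrast system in force needs to be stated.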
Yet I'm sure that very few investigators (who are not specialists
in the properties of the software they use) would be aware of
this kind of thing. And although it can be found by serious
research into the documentation in both cases, it certainly
does not leap to the eye, and you don't get any warnings "up
front" about it!
And that's an instance of only one kind of analysis -- a simple
linear model for the dependency of Y on a 3-level factor! It's
only one step up from John's t-test -- for which both systems
give identical tests, since a t-test is what you apply when
X has only two levels, and then
Helmert: (B - A)/2
Treatment: B - A
differ only by a factor of two; the standard error scales by the
same factor, so the t statistic and p-value are identical.
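The two-level case can be checked directly. In this pure-Python sketch (my own illustration; any two samples would do), the Helmert estimate is half the treatment estimate, but its standard error is halved too, so the t statistic is unchanged:

```python
import math

# Two fixed samples: factor levels A and B.
a = [1.0, 2.0, 3.0, 4.0]
b = [3.0, 4.0, 5.0, 6.0]

na, nb = len(a), len(b)
ma, mb = sum(a) / na, sum(b) / nb

# Pooled variance and the standard error of (B - A).
ssa = sum((x - ma) ** 2 for x in a)
ssb = sum((x - mb) ** 2 for x in b)
s2 = (ssa + ssb) / (na + nb - 2)
se_diff = math.sqrt(s2 * (1 / na + 1 / nb))

# Treatment contrast: estimate B - A, standard error se_diff.
t_treatment = (mb - ma) / se_diff
# Helmert contrast: estimate (B - A)/2, standard error se_diff/2.
t_helmert = ((mb - ma) / 2) / (se_diff / 2)

print(t_treatment, t_helmert)  # identical t statistics
```

So with only two levels the choice of contrasts is invisible in the test; with three or more levels, as shown above, it is not.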
But now imagine the scope for different approaches in more
complicated examples, and with other kinds of analysis.
[A formal account of the above issue can be found in Chapter 6,
"Linear Statistical Models" of "Modern Applied Statistics
with S-PLUS" by W.N. Venables and B.D. Ripley, though the
different defaults for R in this case are not noted there
and have to be elicited from the R documentation.]
And other software (such as SAS or Stata) may, for all I know,
use yet other systems of contrasts. And everyone and his dog has
produced software for regression (often written by the dog).
So my point is that, whatever the "polite" way to cite
software may be, what matters is giving the information
which readers will need in order to find out what really
happened. John's point about "letting others reproduce
exactly one's results" is all very well and is certainly
a necessary condition; but does not address the need to
let others KNOW WHAT WAS REALLY DONE!
To the extent that writers do not address this question,
imagining that a "token" citation of the form "The data were
analysed using the lm procedure in S-Plus version X.Y [1]"
tells the whole story, and to the extent that journal referees
let such token citations through as they stand without requiring
full relevant details (often on the same grounds as the writers),
surely this is very sloppy practice; and likely to mislead readers
who may have a different prior expectation of (e.g.) what
contrast system was used.
End of rant!
Best wishes to all,
Ted.
--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.H...@nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 22-Aug-06 Time: 12:32:41
------------------------------ XFMail ------------------------------
On 22-Aug-06 Ted Harding wrote:
> [...]
> For example, R and S-Plus (as near equivalent as you're likely
> to find) use different default systems. In the Linear Model
> procedure "lm", S-Plus by default uses "Helmert" contrasts
> for unordered factors according to an essentially arbitrary
> ordering of the factor levels. So if Factor X has 3 levels
> (A,B,C), the linear model lm(Y ~ X) would issue estimates of
> treatment effects
>
> (B - A)/2
> (2*C - B - A)/6
>
> [...]
>
> On the other hand, R by default uses "treatment" contrasts
> in which the estimates of treatment effects are
>
> B - A
> C - A
>
> There are other built-in systems of contrasts available which
> you can choose from using the "contrasts" parameter to "lm",
> or you can roll your own.
### Set up the values of Y and the 3-level factor X
Y <- c(rnorm(n=12, mean=0), rnorm(n=12, mean=2), rnorm(n=12, mean=4))
X <- factor(c(rep("A",12), rep("B",12), rep("C",12)))
### Present a summary analysis of the Linear Model Y ~ X
### using Treatment contrasts
summary(lm(Y ~ X, contrasts=list(X="contr.treatment")))
### RESULTS
Call:
lm(formula = Y ~ X, contrasts = list(X = "contr.treatment"))
Residuals:
Min 1Q Median 3Q Max
-1.8630 -0.8983 -0.1523 0.5190 2.7672
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.2303 0.3220 0.715 0.47944
XB 1.3057 0.4554 2.867 0.00716 **
XC 3.4204 0.4554 7.511 1.23e-08 ***
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
Residual standard error: 1.115 on 33 degrees of freedom
Multiple R-Squared: 0.6352, Adjusted R-squared: 0.6131
F-statistic: 28.73 on 2 and 33 DF, p-value: 5.935e-08
### Present a summary analysis of the Linear Model Y ~ X
### using Helmert contrasts
summary(lm(Y ~ X, contrasts=list(X="contr.helmert")))
### RESULTS
Call:
lm(formula = Y ~ X, contrasts = list(X = "contr.helmert"))
Residuals:
Min 1Q Median 3Q Max
-1.8630 -0.8983 -0.1523 0.5190 2.7672
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.8057 0.1859 9.713 3.34e-11 ***
X1 0.6529 0.2277 2.867 0.00716 **
X2 0.9225 0.1315 7.017 5.00e-08 ***
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
Residual standard error: 1.115 on 33 degrees of freedom
Multiple R-Squared: 0.6352, Adjusted R-squared: 0.6131
F-statistic: 28.73 on 2 and 33 DF, p-value: 5.935e-08
Ted.
However, I recall SPSS, and perhaps others, running large sets of
complex calculations through their software, and finding that the
bug didn't have any effect on any actual calculations.
Jeremy
A.J. Rossini wrote:
> This I feel is quite false. Software as a publication and explicit
> description of algorithms is critical, at least if you are doing
> "modern" statistics (i.e. requiring a computer, as opposed to a computer
> being nice).
>
> Results for methods at the research level are quite implementation
> dependent, and particular citations are critical to understand the basis
> for the results.
>
> Now as to whether you need to provide free advertising, that clearly isn't
> mandatory; but proper citations are.
>
> On 8/22/06, *Peter Lane* <Peter....@gsk.com
> <mailto:Peter....@gsk.com>> wrote:
>
>
> I agree with most of the previous responses, and disagree with John
> Uebersax's suggestion that citation of software is important. Software
> is not in the same category as papers and books, and so there is no
> need to use the same rules for citation. Nowadays, anyone can find out
> about standard software by going to a search engine, so the only need
> is to make clear what program, version and operating system was used. I
> certainly object to cluttering up papers with long-winded statements
> designed by software marketeers, mentioning trademarks and the legal
> status of the company (Inc, Ltd etc): they have no place in scientific
> publications.
> Peter
> Research Statistics Unit, GlaxoSmithKline
--
Jeremy Miles
Dept of Health Sciences (Area 4), University of York
Similarly, in papers and grant proposals, authors say "I used/will use
procedure X in package Y". Aaarrggghhh! Either (a) I don't know what
procedure X in package Y does, so I want the author to explain; or (b)
procedure X in package Y does about 74 different things, so it tells
me nothing.
</rant>
Jeremy
The point has also been made that software developers should be
acknowledged for their efforts. I couldn't agree more, having been one
myself for many years. But the place to do this is in the body of the
article, and there is no need to clutter up the references with lists
of software manuals unless there is some explicit section of a manual
that needs to be referred to when discussing algorithms. As for
citation indexes and the like, I don't think they are relevant for
assessing the popularity of software, unlike books and articles,
precisely because people will refer to different parts of its
documentation.
Some thoughts, then... If I am reviewing a paper that does cite the
statistical software, I must confess that this always triggers a
concern that the authors might not know what they are doing
(statistically speaking)! Given that the vast majority of the studies
I review use common statistical methods - t, chi-squared, Mann-Whitney,
ANOVA, logistic regression, Kaplan-Meier, Cox - it seems to me
unnecessary to quote SPSS or whatever; doing so implies that "since we
used SPSS, the stats must be OK!!!".
My gut feeling, in fact, is that the stats are more often than not
'not OK' when a stats package is cited!!
Richard Szydlo, PhD
Imperial College School of Medicine
Dept of Haematology
Thank you for articulating so clearly and simply the exact
reservations I experience myself!
In fairness, however, one must also allow that people who do
know exactly what they are doing may feel obliged to cite the
software in the "token tickbox" format, perhaps because some
journals expect it, or perhaps because they feel colleagues
expect it.
Best wishes,
Ted.
> myself for many years. But the place to do this is in the body of the
> article, and there is no need to clutter up the references with lists
You make a good point. I think what I've done (and also what I've seen
when I've reviewed medical papers) is to mention the algorithm, when it
makes a difference, in the body of the paper (e.g., "factor analysis
was performed using SAS PROC FACTOR (SAS, 2002) with the iterated
principal factors option") and then give a short citation of the manual
in the reference section.
Best,
John
--
John Uebersax PhD