Michele Weiss
Space Department (SIS)
Johns Hopkins University Applied Physics Laboratory
(240) 228-4806 or (443) 778-4806
Great example of how the set of metrics outlined in the original post can be
presented.
An example of the PDS/PPI Node metrics generated at Google Analytics is at:
http://hdmc-analytics.googlegroups.com/web/Analytics_ppi.pds.nasa.gov_20100515-20100614_DashboardReport.pdf
-Todd-
The types of questions to ask are: Are they adequate? What are the minimum requirements? Does each of our organizations' web statistics fulfill them, and if not, what would be involved in bringing them up to a state where they could? Are there any additional statistics that should be gathered, and what would the costs associated with those additional statistics be?
Instead of reinventing the wheel, perhaps if each VxO could send out the types of statistics that their IT departments routinely produce for other web projects, we could then analyze the differences from there?
Michele
John F. Cooper, Ph.D.
Heliospheric Physics Laboratory, Code 672
NASA Goddard Space Flight Center
8800 Greenbelt Road
Greenbelt, MD 20771
Phone: 301-286-1193
Cell: 301-768-3305
Fax: 301-286-1617
E-mail: John.F...@nasa.gov
________________________________________
From: hdmc-an...@googlegroups.com [hdmc-an...@googlegroups.com] On Behalf Of Weiss, Michele [Michel...@jhuapl.edu]
Sent: Tuesday, June 15, 2010 12:13 PM
To: hdmc-an...@googlegroups.com
Subject: RE: hdmc-analytics: Measuring success in the HPDE
http://soho.nascom.nasa.gov/stats/
and SDAC:
http://umbra.nascom.nasa.gov/usage/
Webstats pages. We have a non-public Webstats page that shows us STEREO and other individual sites' stats. I can produce page views if people are interested; they use the same webalizer format as the SDAC page, above.
Since instances of the VSO can be run anywhere, we have no central repository for VSO stats. We solicit updates whenever we have a senior review proposal/invited talk/whatever for which summing such stats would be appropriate.
For what it's worth, as far as I know, the prohibition on retaining Web server logs that include the IP addresses of client hosts remains in force, within NASA at least. Although it's several years old now, I have seen no updated directive, and have to assume that we can't retain logs with such information for more than a month, except for a forensic or technical investigation. We thus cannot amass statistics on new visitors, since we're not allowed to retain the old visitors' IP addresses. We also cannot ask people for PII unless we somehow ascertain their age (more "sensitive PII") and get parents' permission for children 13 and under, so we just don't ask.
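For anyone wanting to keep aggregate statistics while staying inside the retention limit, one approach is to strip the client-host field from each log line before the logs are kept. Below is a minimal sketch, assuming Apache combined-format logs where the client host/IP is the first whitespace-delimited field (the field layout is an assumption about a typical server configuration, not a statement of NASA policy):

```python
import re

# Sketch: strip client IP addresses from Apache combined-format
# access log lines before long-term retention, so aggregate counts
# (hits, bytes, URLs) survive without retaining PII.
IP_PATTERN = re.compile(r"^\S+")  # first field is the client host/IP

def anonymize_line(line: str) -> str:
    """Replace the leading client-host field with '-'."""
    return IP_PATTERN.sub("-", line, count=1)

def anonymize_log(lines):
    """Anonymize every line of a log."""
    return [anonymize_line(ln) for ln in lines]
```

The rest of each line (timestamp, request, status, bytes) is untouched, so tools like webalizer can still produce page-view and volume reports from the anonymized logs.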
We are now several months into a request to allow some persistent cookies that contain no PII other than a user-generated ID and preferences for browser appearance (to facilitate aggregating and de-aggregating huge search returns, e.g. for SDO); no end in sight (site?).
In a broader sense, I can't see Tbytes/month moved as a meaningful statistic, particularly if we're comparing SDO and an earlier, non-imaging instrument, where the data rates might differ by a factor of 100,000. I agree that machine-recognizable acknowledgments in refereed papers will produce the best "metric" we might easily obtain. More difficult ones might be, "How much time (or data tech salary and benefits) did you save writing this paper by using the VxO?" and "Would this paper have been possible at all without the VxO?"
Best,
Joe
----
"I love deadlines. I love the whooshing sound they make as they go by."
- Douglas Adams, 1952 - 2001
Joseph B. Gurman, Solar Physics Laboratory, NASA Goddard Space Flight Center, Greenbelt MD 20771 USA
As for the idea of tracking references to the services, I think this will be much harder. Let's say someone uses the lovely ability to download data directly to IDL; the VxO will never be visible, and it is not clear it should be. If we ended up with SOHO-like stats, I don't think anyone would question the success. We are orders of magnitude from that.
Joe--any idea how many requests for SOHO data are now mediated by VSO?
Aaron
Can you comment, from a headquarters perspective, on which metrics are expected?
What is typically included in division reports about projects?
Or is this something a project decides for itself?
-Todd-
> -----Original Message-----
> From: hdmc-an...@googlegroups.com [mailto:hdmc-
Aaron
What we get from the Web GUI (at each site) are the numbers in the pretty color bar graphs in the attachment (from our senior review proposal last year): the number of shopping carts, the number of items in the carts, and the data volume. N.b. that data in carts doesn't mean that any or all of the data were actually downloaded, only that the users requested, or at least asked to put in a cart, that much data. This is really on the infrared edge of the lies - damn lies - statistics - Web statistics (- VO statistics?) continuum. Possibly the mm-wave range.
Since at least half of SOHO data requests probably hit our site (the other half hit the Stanford archive, for MDI), we should be able to come up with some sort of stats.
Joe, George -
If it's not a major hassle, could we comb the server logs to count VSO IDL client download volume for a recent month or two?
Thanks,
Joe
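Combing the server logs for the IDL client's downloads could be as simple as filtering on the client's User-Agent string and summing the bytes field. A minimal sketch, assuming Apache combined-format logs; "VSO_IDL" is a placeholder for whatever User-Agent string the VSO IDL client actually sends, which would need to be checked against the real logs:

```python
import re

# Sketch: sum bytes served to the VSO IDL client by filtering an
# Apache combined-format access log on a user-agent substring.
LOG_RE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" (?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"[^"]*" "(?P<agent>[^"]*)"$'
)

def idl_download_bytes(lines, agent_substring="VSO_IDL"):
    """Total bytes in successful (200) responses to the matching client."""
    total = 0
    for line in lines:
        m = LOG_RE.match(line)
        if not m or agent_substring not in m.group("agent"):
            continue
        if m.group("status") == "200" and m.group("bytes") != "-":
            total += int(m.group("bytes"))
    return total
```

Run over a month or two of logs, this would give a lower bound on IDL-client download volume; requests that went to mirrors or other VSO instances would of course be missed.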
Since the objective of a VxO is to aid in the dissemination of data,
would the main measure of success be the same as for a mission?
Especially since a mission is designed to enable new discoveries,
which should result in refereed papers. Providing data doesn't
seem to have that same potential and may even be transparent
to the user (as in your IDL example).
It seems bytes delivered, number of visits, and similar metrics
would indicate VxO success.
-Todd-
> -----Original Message-----
> From: hdmc-an...@googlegroups.com [mailto:hdmc-
> anal...@googlegroups.com] On Behalf Of Roberts, Dana A. (GSFC-6720)
> Sent: Wednesday, June 16, 2010 11:43 AM
> To: hdmc-an...@googlegroups.com
> Subject: Re: hdmc-analytics: As long as we're all boasting....
>
Joe
True, the business of a VxO is to enable science. I didn't mean to imply
otherwise.
Perhaps my perspective can be better expressed with a different example.
If IDL or MATLAB is used in performing the analysis, are they referenced?
The software may be a critical part of the process, but it's a passive
part which is used in creative ways. I see VxOs as a similar
kind of science utility: enabling science, but not a creative part,
and thus not something one might reference. This may vary some from
VxO to VxO.
If we set a benchmark of the number of publications that reference a VxO
as a measure of success, we may be disappointed. Plus, if we use
publications as a measure, then to ensure proper attribution
we may find ourselves putting demands on users that
are unwarranted. For example: "You can use this data, but you must
do X or Y."
-Todd-
> -----Original Message-----
> From: hdmc-an...@googlegroups.com [mailto:hdmc-
> anal...@googlegroups.com] On Behalf Of Joseph B. Gurman
> Sent: Wednesday, June 16, 2010 2:35 PM
> To: hdmc-an...@googlegroups.com
> Subject: Re: hdmc-analytics: As long as we're all boasting....
>
Aaron
Web hits and the like are a very rough approximation of use, but I suspect most people would be underwhelmed by a few hits per day and impressed by tens of thousands.
I also think we should not underrate comments of supporters. I keep asking people to identify even one person who says they find their research to be greatly enabled by a particular VxO. We need advocates, and we don't seem to be at that stage, apart from VSO (yes?).
Aaron
SPDF/CDAWeb has produced reasonable acknowledgment statistics. But (a) we deliver data directly, so we can label the data with a request to acknowledge our role, and (b) acknowledgments and even simple usage statistics may reflect more on the importance of the underlying data than on the excellence of the specific service providing the data. I'm personally still comfortable asking for acknowledgments wherever I effectively can, because we use those acknowledgments to help argue that the services being used should be continued. I'm also confident that even SPDF data and services often contribute to presentations, and eventually to papers, where they are not acknowledged.
Even though it may feel like "something different from mentioning the data providers," I think the VxOs and VxO services need to take whatever opportunities they can to ask users to acknowledge in papers that their services have been used. Maybe over time our various data services will become so accepted as useful and necessary that acknowledgments won't be necessary, but I think everyone needs what they can get for now. That's a price the community may have to accept if they actually come to value such services and want them continued.
On the issue of statistics, SPDF/CDAWeb and our other services also have the luxury of being able to count when someone actually does something (creates and downloads a graphic, listing, or data file). In this context, I can consider it a good thing when the number of simple web hits is actually not too high, if I know the count of system "executions" is reasonable (given the size of our research community, given that SPDF's target audience is researchers, and given that our services are probably not of broad appeal to the general public), because that shows the system is working efficiently. We also have to live with the reality that when we created the 4D Orbit Viewer, we lost web statistics, because users could simply download orbits into their local environment to manipulate rather than request successive plots, for example. I can footnote and explain that on usage charts as much as I want, but the instinctive reaction is always going to be "looks like usage has gone down."
So some statistics may make it more plausible that a given service is useful, but I don't know how statistical measures can ever be used to rank different services against each other or against other activities (which is always the programmatic issue).
Positive user comments and success stories also help make it more plausible that a given service or program is useful. But what we all really want is for our services to be so broadly useful and well known that we'll typically find at least one independent but nonetheless informed and enthusiastic supporter on most future advisory and review panels.
Bob
In a discussion with Ray Walker, he mentioned a different type of metric
we haven't considered yet: references to data. If the data used in a
paper are referenced (like a publication) with a unique identifier, this
could be tracked and perhaps easily determined by bibliographic systems.
To meet publisher requirements this would require us to adhere to
persistence rules (and perhaps generate DOIs), which means
having a robust archive.
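If data sets carried DOI-style identifiers, counting citations could be as simple as scanning paper text for matching strings. A minimal sketch; the pattern below is a common heuristic for DOIs, not an exhaustive grammar, and any real tracking would presumably go through bibliographic services rather than raw text:

```python
import re

# Sketch: find DOI-style data identifiers in paper text so that
# citations of archived data sets could be counted. DOIs start
# with "10.", a registrant prefix, then a suffix after "/".
DOI_RE = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+')

def find_data_citations(text):
    """Return the list of DOI-like identifiers found in the text."""
    return DOI_RE.findall(text)
```

As noted above, this only works if the identifiers persist, which is exactly why the robust-archive requirement matters.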
-Todd-
Aaron
The discussions have been very informative and I thought they had
progressed enough that a draft report/recommendations could be written.
I've included references to NASA policies which I could find.
There was also a discussion topic on the NASA Policies which was posted
on the Google Groups site, but apparently did not get sent.
It provides a little bit of context. I did not include the
30 day hold limit on web logs Joe mentioned because I couldn't
find a policy. Perhaps I missed it. I did find a statement
that web logs are exempt from privacy policies if they meet
certain requirements. I've included a reference to the
document in the report.
Please edit/comment on the report/recommendations.
Most importantly:
Does it provide the right level of detail and analysis?
Do we have consensus on the recommendations?
Did I miss anything?
-Todd-
Aaron
> <2010-Recommendations-v1.pdf><2010-Recommendations-v1.docx>