Measuring success in the HPDE

Todd

Jun 15, 2010, 12:38:55 PM
to HDMC Analytics
To determine what information (metrics) should be collected
to assess the operations of the HPDE, we need to determine what
criteria the HDMC and NASA HQ will use to measure success.

For other projects we routinely collect the following information:

o Total bytes transferred
o Total requests / page views (with bots filtered out)
o Total bytes/requests by domain (or requesting host).

These would be reported for each element (service) in the system; a sketch of
how such counts might be computed from raw server logs follows the lists below.

Useful views into the basic metrics include:

o Transfers by geographical region (US, etc.)
o Visits (sessions with multiple requests)
o Time on site (length of a session)
o New visitors

Additional information that can be collected and is useful for
development purposes includes:

o Visits by browser type
o Connection speed
o Traffic sources (where visitors are coming from)
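
To make the basic counts concrete, here is a minimal sketch of how they might
be derived from a raw server access log. It assumes the common Apache/nginx
"combined" log format; the file name and the bot-detection substrings are
illustrative only, not part of any existing HDMC tooling.

```python
import re
from collections import Counter

# Apache/nginx "combined" log format (assumed):
#   host ident user [time] "request" status bytes "referer" "user-agent"
LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) "[^"]*" "(?P<agent>[^"]*)"'
)

BOT_HINTS = ("bot", "crawler", "spider", "slurp")  # illustrative patterns

total_bytes = 0
total_requests = 0
bytes_by_host = Counter()

with open("access.log") as log:  # hypothetical log file name
    for line in log:
        m = LINE.match(line)
        if not m:
            continue  # malformed line
        if any(h in m.group("agent").lower() for h in BOT_HINTS):
            continue  # filter out bots
        n = 0 if m.group("bytes") == "-" else int(m.group("bytes"))
        total_bytes += n
        total_requests += 1
        bytes_by_host[m.group("host")] += n  # bytes by requesting host

print(total_requests, "requests,", total_bytes, "bytes (bots excluded)")
for host, n in bytes_by_host.most_common(10):
    print(host, n)
```

Visits (sessions), time on site, and new-visitor counts require correlating
requests from the same client across time, which runs into the log-retention
constraints discussed later in this thread; hosted packages such as Google
Analytics or webalizer handle that bookkeeping for you.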

Comments?
-Todd-

Weiss, Michele

Jun 15, 2010, 1:13:13 PM
to hdmc-an...@googlegroups.com
I believe this webpage, http://www.timed.jhuapl.edu/stats/timed/current/, should be externally viewable; it shows the type of statistics that are collected for all websites in our Space Department. If it is not viewable, I can summarize it.

Michele Weiss
Space Department (SIS)
Johns Hopkins University Applied Physics Laboratory
(240) 228-4806 or (443) 778-4806

Todd King

Jun 15, 2010, 1:23:40 PM
to hdmc-an...@googlegroups.com
Hi Michele -

Great example of how the set of metrics outlined in the original post can be
presented.
An example of the PDS/PPI Node metrics generated at Google Analytics is at:

http://hdmc-analytics.googlegroups.com/web/Analytics_ppi.pds.nasa.gov_20100515-20100614_DashboardReport.pdf


-Todd-

Weiss, Michele

Jun 15, 2010, 1:55:06 PM
to hdmc-an...@googlegroups.com
And Todd, that is another very good example. One thing to note is that these types of statistics are extremely cheap to implement, and most of our institutions already collect a version of them for other websites within their organizations, so there would be very little cost associated with implementation for our respective VxOs.

The questions to ask are: Are they adequate? What are the minimum requirements? Do each of our organizations' web statistics fulfill them, and if not, what would be involved in bringing them up to a state where they could? Are there any additional statistics that should be gathered, and what would the costs associated with those additional statistics be?

Instead of reinventing the wheel, perhaps each VxO could send out the types of statistics that its IT department routinely produces for other web projects, and we could then analyze the differences from there.

Michele

Todd

Jun 15, 2010, 2:05:34 PM
to HDMC Analytics
The Google Analytics example report has been reposted with a better
URL:

http://groups.google.com/group/hdmc-analytics/web/example-google-analytics-DashboardReport.pdf


Cooper, John F. (GSFC-6720)

Jun 15, 2010, 2:09:34 PM
to hdmc-an...@googlegroups.com
Citations to data services in published papers are very important for continuing justification of support, as we have long been doing for the Space Physics Data Facility. Much thought is needed about how this would be implemented more generally for HDMC services.

John F. Cooper, Ph.D.
Heliospheric Physics Laboratory, Code 672
NASA Goddard Space Flight Center
8800 Greenbelt Road
Greenbelt, MD 20771
Phone: 301-286-1193
Cell: 301-768-3305
Fax: 301-286-1617
E-mail: John.F...@nasa.gov

Joseph B. Gurman

Jun 15, 2010, 3:23:44 PM
to hdmc-an...@googlegroups.com
Here are the Webstats pages for SOHO (ESA-run on a NASA network):

http://soho.nascom.nasa.gov/stats/

and SDAC:

http://umbra.nascom.nasa.gov/usage/

We have a non-public Webstats page that shows us STEREO and other individual sites' stats. I can produce page views if people are interested; they use the same webalizer format as the SDAC page above.

Since instances of the VSO can be run anywhere, we have no central depository for VSO stats. We solicit updates whenever we have a senior review proposal/invited talk/whatever for which summing such stats would be appropriate.

For what it's worth, as far as I know, the prohibition on retaining Webserver logs that include IP addresses of the client hosts remains in force, within NASA at least. Although it's several years old now, I have seen no updated directive, and have to assume that we can't retain logs with such information for more than a month, except for a forensic or technical investigation. We thus cannot amass statistics on new visitors, since we're not allowed to retain the old ones' IP addresses. We also cannot ask people for PII unless we somehow ascertain their age (more "sensitive PII") and get parents' permission for people 13 and under, so we just don't ask.
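
As an aside, one common way to keep session-level statistics while honoring
such a retention rule is to replace client IPs with salted hashes before the
logs are archived. A minimal sketch, purely illustrative and not a statement
of NASA policy:

```python
import hashlib
import sys

# Replace the client IP (first field of a combined-format log line) with a
# truncated salted hash. Requests from the same host still group together
# within one salt period, but the address itself is not retained.
SALT = b"rotate-this-salt-each-period"  # hypothetical; rotate regularly

def scrub(line: str) -> str:
    ip, sep, rest = line.partition(" ")
    if not sep:
        return line  # not a log line; leave untouched
    digest = hashlib.sha256(SALT + ip.encode()).hexdigest()[:12]
    return digest + sep + rest

# Usage (hypothetical): python scrub_ips.py < access.log > access.anon.log
for line in sys.stdin:
    sys.stdout.write(scrub(line))
```

Whether a truncated hash satisfies the directive is a policy question, not a
technical one; the point is only that session grouping and address retention
can be separated.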

We are now several months into a request to allow some persistent cookies that contain no PII other than a user-generated ID and preferences for browser appearance (to facilitate aggregating and de-aggregating huge search returns, e.g. for SDO); no end in sight (site?).

In a broader sense, I can't see Tbyte/month moved as a meaningful statistic, particularly if we're comparing SDO and an earlier, non-imaging instrument, where the data rates might be a factor of 100,000 different. I agree that machine-recognizable acknowledgments in refereed papers will produce the best "metric" we might easily obtain. More difficult ones might be, "How much time (or data tech salary and benefits) did you save writing this paper by using the VxO?" and "Would this paper have been possible at all without the VxO?"

Best,

Joe
----
"I love deadlines. I love the whooshing sound they make as they go by."

- Douglas Adams, 1952 - 2001

Joseph B. Gurman, Solar Physics Laboratory, NASA Goddard Space Flight Center, Greenbelt MD 20771 USA

Roberts, Dana A. (GSFC-6720)

Jun 16, 2010, 1:36:52 PM
to hdmc-an...@googlegroups.com
Although we like to say that such raw statistics aren't very meaningful, I think it is hard to deny that SOHO looks like a successful mission based just on these numbers. There comes a point where quantitative becomes qualitative.

As for the idea of tracking references to the services, I think this will be much harder. Let's say someone uses the lovely ability to download data directly to IDL; the VxO will never be visible, and it is not clear it should be. If we ended up with SOHO-like stats, I don't think anyone would question the success. We are orders of magnitude from that.

Joe--any idea how many requests for SOHO data are now mediated by VSO?

Aaron

Todd King

Jun 16, 2010, 2:01:16 PM
to hdmc-an...@googlegroups.com
Hi Aaron -

Can you comment from a headquarters perspective on which metrics are expected?
What is typically included in division reports about projects?
Or is this something a project decides for itself?

-Todd-

Roberts, Dana A. (GSFC-6720)

Jun 16, 2010, 2:42:50 PM
to hdmc-an...@googlegroups.com
I think everyone suffers from the problem of not knowing how to measure effectiveness. For the missions, the main measure is refereed papers published, and ideally this would be true for us too, but it is more difficult.

Aaron

Joseph B. Gurman

Jun 16, 2010, 4:04:39 PM
to hdmc-an...@googlegroups.com, Joe Hourcle, George Dimitoglou, Joseph B. Gurman
Actually, we can track IDL client uses of the VSO_GET function, because the transfer is effected via http and our server logs have entries for IDL's socket interface as the client software. In fact, that's the only VSO-built search and retrieval functionality where that's the case. I'll check with Joe H. for verification, but I'm pretty certain that's the only source of our file statistics --- and we'd have to get those from the data providers (in whose interest it presumably is to provide such usage information). We can get volume statistics from each site.
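
For what the log-combing might look like: a minimal sketch that sums completed
transfers whose User-Agent identifies IDL. The log path, the "IDL" substring,
and the combined log format are assumptions based on the description above,
not the VSO servers' actual configuration.

```python
import re

# Match the tail of a combined-format log line:
#   ... "request" status bytes "referer" "user-agent"
TAIL = re.compile(
    r'" (?P<status>\d{3}) (?P<bytes>\d+|-) "[^"]*" "(?P<agent>[^"]*)"'
)

files = 0
volume = 0
with open("access.log") as log:  # hypothetical log file name
    for line in log:
        m = TAIL.search(line)
        if not m or "IDL" not in m.group("agent"):
            continue  # not an IDL client request
        if m.group("status") == "200" and m.group("bytes") != "-":
            files += 1
            volume += int(m.group("bytes"))

print(f"{files} files, {volume / 1e9:.2f} GB downloaded via the IDL client")
```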

What we get from the Web GUI (at each site) are the numbers in the pretty color bar graphs in the attachment (from our senior review proposal last year): the number of shopping carts, the number of items in the carts, and the data volume. N.b. that data in carts doesn't mean that any or all of the data were actually downloaded, only that the users "requested" that much data (as opposed to downloading it), or at least asked to put it in a cart. This is really on the infrared edge of the lies - damn lies - statistics - Web statistics (- VO statistics?) continuum. Possibly the mm wave range.

Since at least half of SOHO data requests probably hit our site (the other half hit the Stanford archive, for MDI), we should be able to come up with some sort of stats.

Joe, George -

If it's not a major hassle, could we comb the server logs to count VSO IDL client download volume for a recent month or two?

Thanks,

Joe
----
"I love deadlines. I love the whooshing sound they make as they go by."

- Douglas Adams, 1952 - 2001

Joseph B. Gurman, Solar Physics Laboratory, NASA Goddard Space Flight Center, Greenbelt MD 20771 USA

vso_stats_2009_prop.pdf

Todd King

Jun 16, 2010, 5:28:42 PM
to hdmc-an...@googlegroups.com
Hi Aaron -

Since the objective of a VxO is to aid in the dissemination of data,
would the main measure of success be the same as for a mission?
Especially since a mission is designed to enable new discoveries,
which should result in refereed papers. Providing data doesn't
seem to have that same potential and may even be transparent
to the user (as in your IDL example).

It seems that bytes delivered, number of visits, and similar metrics
would indicate VxO success.

-Todd-

Joseph B. Gurman

Jun 16, 2010, 5:35:29 PM
to hdmc-an...@googlegroups.com
FWIW, I couldn't disagree more. If we're not in the business of enabling science, making it easier to do, and/or making it more likely to get done, why bother?

Joe
----
"I love deadlines. I love the whooshing sound they make as they go by."

- Douglas Adams, 1952 - 2001

Joseph B. Gurman, Solar Physics Laboratory, NASA Goddard Space Flight Center, Greenbelt MD 20771 USA

Todd King

Jun 16, 2010, 6:58:15 PM
to hdmc-an...@googlegroups.com
Hi Joe -

True, the business of a VxO is to enable science. I didn't mean to imply
otherwise.

Perhaps my perspective can be better expressed with a different example.

If IDL or MATLAB is used in performing the analysis, is it referenced?
The software may be a critical part of the process, but it's a passive
part which is used in creative ways. I see VxOs as a similar
kind of science utility: enabling science, but not a creative part,
and thus not something one might reference. This may vary some from
VxO to VxO.

If we set a benchmark of the number of publications that reference a VxO
as a measure of success, we may be disappointed. Plus, if we use
publications as a measure, then to ensure proper attribution
we may find ourselves putting demands on users that
are unwarranted. For example: "You can use this data, but you must
do X or Y."

-Todd-

Roberts, Dana A. (GSFC-6720)

Jun 16, 2010, 10:56:03 PM
to hdmc-an...@googlegroups.com
Certainly those are important measures. The ultimate measure is how well the services enable research, and thus the "papers test." On the other hand, if people end up coming to the VxOs for data in preference to other routes, then presumably the VxOs are doing something to help them. I find that browsing with VO-like tools greatly facilitates exploring ideas. Easy ways to subset and ingest data also help. Etc. It's all about efficiency.

Aaron

Roberts, Dana A. (GSFC-6720)

Jun 16, 2010, 11:40:14 PM
to hdmc-an...@googlegroups.com
We all agree on the main goal, but to put it perhaps more precisely: suppose someone uses IDL to get data, using some VSO-designed operator and middleware. I don't think I'd expect an acknowledgement of VSO any more than of IDL. If you would, how would you make sure the person (a) knows they used VSO and (b) cites it? We all know these are tough things to enforce. Or, to take a case nearer to me, I've used VSPO quite a bit in writing my recent papers, but I have not mentioned it in the papers. Maybe I should, but it feels like something different from mentioning the data providers.

Web hits and the like are a very rough approximation of use, but I suspect most people would be underwhelmed by a few hits per day and impressed by tens of thousands.

I also think we should not underrate comments of supporters. I keep asking people to identify even one person who says they find their research to be greatly enabled by a particular VxO. We need advocates, and we don't seem to be at that stage, apart from VSO (yes?).

Aaron

Bob McGuire

Jun 17, 2010, 11:40:09 AM
to hdmc-an...@googlegroups.com
To state the obvious, different kinds of services will produce different kinds of statistics, and all statistics probably say much less than we want.

SPDF/CDAWeb has produced reasonable acknowledgment statistics. But (a) we directly deliver data, so we can label the data with a request to acknowledge our role, and (b) acknowledgements and even simple usage statistics may reflect more on the importance of the underlying data than on the excellence of the specific service providing the data. I'm personally still comfortable asking for acknowledgements wherever I effectively can, because we use those acknowledgements to help argue that the services being used should be continued. I'm also confident that even SPDF data and services often contribute to presentations, and eventually to papers, where they are not acknowledged.

Even though it may feel like "something different from mentioning the data providers," I think the VxOs and VxO services need to take whatever opportunities they can to ask users to acknowledge in papers that their services have been used. Maybe over time our various data services will become so accepted as useful and necessary that acknowledgements won't be needed, but I think everyone needs what they can get for now. That is a price the community may have to accept if it actually comes to value such services and wants them continued.

On the issue of statistics, SPDF/CDAWeb and our other services also have the luxury of being able to count when someone actually does something (creates and downloads a graphic, listing, or data file). In this context, I can consider it a good thing when the number of simple web hits is not too high, as long as the count of system "executions" is reasonable (given the size of our research community, given that SPDF's target audience is researchers, and given that our services are probably not of broad appeal to the general public), because that shows the system is working efficiently. We also have to live with the reality that when we created the 4D Orbit Viewer, we lost web statistics, because users could simply download orbits into their local environment to manipulate rather than request successive plots, for example. I can footnote and explain that on usage charts as much as I want, but the instinctive reaction is always going to be "looks like usage has gone down."

So some statistics may make it more plausible that a given service is useful, but I don't know how statistical measures can ever be used to rank different services against each other or against other activities (which is always the programmatic issue).

Positive user comments and success stories are also always a good way to make it more plausible that a given service or program is useful. But what we all really want is for our services to be so broadly useful and well known that we'll typically find at least one independent but nonetheless informed and enthusiastic supporter on most future advisory and review panels.

Bob

Todd King

Jun 17, 2010, 4:09:29 PM
to hdmc-an...@googlegroups.com
Hi all -

In a discussion with Ray Walker, he mentioned a different type of metric
we haven't considered yet: references to data. If the data used in a
paper are referenced (like a publication) with a unique identifier, this
could be tracked and perhaps easily determined by bibliographic systems.

To meet publisher requirements this would require us to adhere to
persistence rules (and perhaps the generation of DOIs), which means
having a robust archive.
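
As a rough illustration of the "bibliographic systems" idea: once data carry
DOIs, reference lists can be scanned mechanically for them. A minimal sketch;
the DOI below is invented for the example.

```python
import re

# DOIs follow the form 10.<registrant>/<suffix>; scan reference text for them.
DOI = re.compile(r'\b10\.\d{4,9}/[^\s,;"]+')

references = """
Smith, J., et al. (2010), J. Geophys. Res.
Data set: doi:10.99999/hpde.example.dataset (hypothetical DOI)
"""

for doi in DOI.findall(references):
    print("data citation found:", doi)
```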

-Todd-

Roberts, Dana A. (GSFC-6720)

Jun 17, 2010, 4:23:45 PM
to hdmc-an...@googlegroups.com
This is a major reason that I'm pushing to have a unique set of SPASE IDs that are stable. These IDs could become accepted as the standard identifier.
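
For illustration only: SPASE resource IDs take the general form spase://NamingAuthority/ResourceType/Path, e.g. the invented spase://VSPO/NumericalData/ExampleMission/MAG/PT16S. The point is that the string is globally unique and, if kept stable, citable in the same way a DOI is.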

Aaron

Todd King

Jun 17, 2010, 7:19:22 PM
to hdmc-an...@googlegroups.com
Hi all -

The discussions have been very informative, and I thought they had
progressed enough that a draft report/recommendations could be written.
I've included references to the NASA policies I could find.
There was also a discussion topic on the NASA policies which was posted
on the Google Groups site but apparently did not get sent;
it provides a little bit of context. I did not include the
30-day hold limit on web logs that Joe mentioned because I couldn't
find a policy. Perhaps I missed it. I did find a statement
that web logs are exempt from privacy policies if they meet
certain requirements; I've included a reference to that
document in the report.

Please edit/comment on the report/recommendations.

Most importantly:
Does it provide the right level of detail and analysis?
Do we have consensus on the recommendations?
Did I miss anything?

-Todd-

2010-Recommendations-v1.pdf
2010-Recommendations-v1.docx

Roberts, Dana A. (GSFC-6720)

Jun 17, 2010, 10:49:09 PM
to hdmc-an...@googlegroups.com
The definition of HDMC elements is too broad; SDAC and SPDF take care of themselves, for example, and don't need HDMC to make a policy for them. In general I find this OK in principle, but overkill. Most groups seem to be doing something like this already. I don't think we should be imposing requirements on people in an area that is secondary to establishing some level of completeness and some sort of services that have an informal constituency. [As a suggestion, it should also be stated that hits from the organizing group itself should be excluded.] Maybe phrasing this as suggestions at this point would be better, with a lot less formality. The registry plan is a much higher priority than this.

Aaron
