I've done this for the proposed use of Omniture on the mozilla.com,
mozilla-europe and Mozilla Japan websites; these sites are the first step
for people experiencing our products. We can come back to the proposal
for Google Analytics.
It's a somewhat long discussion, and includes a bunch of information
that will be well known to many of you. I wanted to write something
that could be understood by people who aren't deep into technology or
privacy issues. I've put it on my blog as well.
Many thanks to Basil and Catherine for the enormous amount of work
represented by the discussion Basil launched, and for helping me with this.
Mitchell
Mozilla Websites, Web Analytics and Privacy
This document discusses the application of web analytics tools to Mozilla
websites.
We live in a world of data; we should be thinking carefully about that
data and its impact. Many people don't realize how much information
about them is collected by websites and used as a business asset. Some
of those who do understand don't care, or figure there's no sense
talking about it. But a core of the Mozilla community is intensely
focused on privacy and the individual person's ability to understand and
control personal information. This has always been the case, and it is
part of our strength. These aspects should continue to inform the
development of both our software and our websites. With this in mind,
I've put together a discussion of a particular data-gathering proposal,
together with the safeguards that make me comfortable with it.
We would like to understand how people interact with Mozilla's websites,
in particular the consumer-facing websites such as www.mozilla.com,
mozilla-europe.org and mozilla-japan.org. To do this we want to
implement tools that measure what people do when they visit these sites.
These tools are generally known as "web analytics" tools. In particular,
we want to implement a product called SiteCatalyst from a company called
Omniture for a range of Mozilla websites. The specific sites, the phased
rollout plan and the evaluation details are below. Using this service
means that data about Mozilla visitors will be processed by Omniture,
and will be stored on servers that are not under the direct, physical
control of Mozilla. This is new to us and requires consideration of
appropriate safeguards. Some wonder if it should even be done. I believe
the proposal below is worth trying, and that our arrangement with
Omniture includes appropriate safeguards.
Commitments
Mozilla will use the web analytics data only to determine aggregate
usage patterns for our websites. We will not seek to determine personal
information from this data. Omniture will use the data from Mozilla
websites only to provide and maintain the service for Mozilla; it will
not share the information with others or use the information for other
purposes. Omniture will not "correlate and report on any Customer Data
with any other data collected through other products, services or web
properties." The domain names in Mozilla cookies will clearly identify
their affiliation with Mozilla and the Omniture service. We will have
public discussions of the results. Before the end of 2008 we will have a
public discussion about the benefits (or lack thereof) of using this
system. There will be a clear public statement about which web analytics
services, if any, are in use with our websites. There will be a public
notice and discussion period before including other types of websites,
such as developer.mozilla.org and spreadfirefox.com.
Description
One aspect of the Mozilla project that is bigger than many people
realize is our website presence. There are actually a number of Mozilla
sites. (Or, in industry terms, "website properties.") There are the
development and community-focused sites like developer.mozilla.org and
spreadfirefox.com. And then there are the websites that consumers visit
-- in particular the download, support and services sites: mozilla.com,
mozilla-europe.org, and related sites. The latter are significant web
presences, causing Mozilla to periodically appear in the list of the 50
most visited websites published by comScore (an Internet measurement
firm analogous to Nielsen in the TV space).
1. Our websites act as integral components of our users’ experience.
They are also a primary way of communicating with most of our users, who
aren't likely to read Planet Mozilla, the newsgroups or other community
tools. Today we know very little about how people interact with our
websites, in particular the consumer-facing websites. To improve the
experience we first need to know some basic data about how users
interact with our website properties. We’d like to understand things
such as:
* Is something we think should be easy -- like getting from a
top-level page to useful add-ons -- simple enough for people who aren't
familiar with Mozilla?
* If we add a landing page with explanations, do people get lost at
those pages? Or do these pages help people as we had hoped?
* How many users successfully find, download, install and become
long-term Firefox users?
* What paths do people take through the website?
* Is something new (like the dropdown content on the "whatsnew"
page) useful to people? How many people see that page and actually click
on the links?
* Do people find the language version of Firefox that fits their
location?
2. Each of these websites is large and complex, and each gets an
enormous number of visits from general consumers -- that is, from people
who are not familiar with Mozilla, may not be power users, and whom we
can't claim to understand from our own experiences. Those of us who work
on the Mozilla project have -- by definition -- some familiarity with
Mozilla. That is not the case for most of our current 150 or so million
users. What feels "easy to use" or comfortable to us could be completely
wrong for many people who visit these websites. Furthermore, what might
make sense in one language or locale might not be helpful in other
languages or cultural contexts.
3. How do we develop a better understanding of how people interact with
a website? The basic answer is to gather aggregate data about how people
use the website. The term generally used to describe this is "web
analytics." Aggregate data will help us answer the types of questions
listed above.
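As a rough illustration of what "aggregate" means here, the following is a minimal Python sketch, not a description of how SiteCatalyst works. It assumes a hypothetical log of (visitor_token, page) records, grouped by visitor and in visit order, and reduces it to two kinds of totals: page views per page, and page-to-page transitions, the raw material for questions like "What paths do people take through the website?" The per-visitor token is used only to link one page to the next and does not appear in the output.

```python
from collections import Counter
from itertools import groupby


def aggregate(records):
    """Reduce per-visit records to aggregate counts.

    records: list of (visitor_token, page) tuples, grouped by visitor
    and in visit order (a hypothetical log format for illustration).
    Returns (page_views, transitions); neither contains visitor tokens.
    """
    # How many times each page was viewed, across all visitors.
    page_views = Counter(page for _, page in records)

    # How often visitors went directly from one page to another.
    transitions = Counter()
    for _, visits in groupby(records, key=lambda r: r[0]):
        pages = [page for _, page in visits]
        # Pair each page with the page viewed immediately after it.
        transitions.update(zip(pages, pages[1:]))

    return page_views, transitions


records = [
    ("a", "/"), ("a", "/firefox/"), ("a", "/download/"),
    ("b", "/"), ("b", "/addons/"),
]
views, paths = aggregate(records)
print(views["/"])                  # 2
print(paths[("/", "/firefox/")])   # 1
```

A real system would also have to cope with unordered logs, visits spread over several sessions and so on; the point is only that what gets reported are counts of behavior, not records about individual people.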
4. What techniques are used to instrument a website so that it
aggregates data about usage patterns? Two elements are used together to
gather data -- "cookies" and "web beacons." A cookie is a string of
information that a Web site stores on a visitor’s computer, and that the
visitor’s browser provides to the Web site each time the visitor
returns. Because the browser provides this cookie information to the
website at each visit, cookies serve as a sort of label that allows a
website to "recognize" a browser when it returns to the site. A "web
beacon" is a small marker placed in a webpage, typically a tiny image or
a snippet of script that the browser requests when the page loads. It
makes it easier to follow and record the activities of a recognized
browser, such as the path of pages visited at a website.
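To make the cookie and web beacon mechanics concrete, here is a minimal Python sketch of what a beacon endpoint might do with each request. It is a generic illustration under stated assumptions, not Omniture's implementation: the cookie name `visitor_id`, the function `handle_beacon` and the in-memory log are all hypothetical. A returning browser is recognized by the cookie it sends back; a new browser is given a random token, which by itself carries no personal information.

```python
import secrets
from http.cookies import SimpleCookie

# Hypothetical cookie name, chosen for illustration only.
COOKIE_NAME = "visitor_id"

# A 1x1 transparent GIF: the classic "web beacon" image body.
PIXEL = (b"GIF89a\x01\x00\x01\x00\x80\x00\x00\x00\x00\x00\xff\xff\xff"
         b"!\xf9\x04\x01\x00\x00\x00\x00,\x00\x00\x00\x00\x01\x00\x01\x00"
         b"\x00\x02\x02D\x01\x00;")


def handle_beacon(cookie_header, page, log):
    """Record one page view and return (image_body, set_cookie_value).

    cookie_header: the Cookie header the browser sent, or None.
    page: the page on which the beacon was embedded.
    log: a list that collects (token, page) records (hypothetical store).
    set_cookie_value is None when the browser already had a token.
    """
    cookies = SimpleCookie(cookie_header or "")
    if COOKIE_NAME in cookies:
        # Returning browser: the cookie acts as its "label".
        token = cookies[COOKIE_NAME].value
        set_cookie = None
    else:
        # New browser: issue a random token and ask the browser to store it.
        token = secrets.token_hex(8)
        out = SimpleCookie()
        out[COOKIE_NAME] = token
        out[COOKIE_NAME]["path"] = "/"
        set_cookie = out[COOKIE_NAME].OutputString()
    log.append((token, page))
    return PIXEL, set_cookie


log = []
_, set_cookie = handle_beacon(None, "/", log)           # first visit
token = set_cookie.split(";")[0].split("=", 1)[1]
handle_beacon(f"{COOKIE_NAME}={token}", "/firefox/", log)  # returning browser
print(log)  # two records sharing the same random token
```

The random token is exactly the "label" described above: it lets the service tie one page view to the next, but on its own says nothing about who the visitor is. The risk discussed in the next item arises only if such tokens are correlated with other data.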
5. Are there negative things that could happen with this data? As with
many kinds of data, yes. It is possible to correlate web analytics data
with other data and potentially figure out personal information. Mozilla
does not do this and Omniture is not allowed to correlate Mozilla data
with any other data to derive personal information.
6. What precisely is Mozilla proposing to do? Use a web analytics
product from Omniture called SiteCatalyst to measure interaction with a
number of our consumer-facing websites. The proposed rollout of
the web analytics is in phases:
* Phase 1: www.mozilla.com, firefox.com, getfirefox.com,
*.mozilla.com. Rollout is pending discussion and feedback on this
document. I believe the concerns raised in the newsgroup discussion are
addressed, so there may be very little discussion to be had. In that case,
the implementation will occur shortly. We would also amend our Privacy
Policy as appropriate to describe the storage and processing of this
data by a third party.
* Phase 2: www.mozilla-europe.org, possibly mozilla-japan.org,
pending discussion and feedback on this document.
* Phase 3: Discussion and review period of usefulness of data at
the end of 2008.
* Phase 4: (Pending outcome of Phase 3): add other Mozilla websites
such as: addons.mozilla.org, developer.mozilla.org, www.mozilla.org,
spreadfirefox.com, planet.mozilla.org; or consider use of a different or
additional web analytics program.
7. Isn't there an open-source or free software version that will do the
job? Not that we know of.
8. Why don't we build our own? This is a significant project in which we
have no expertise. We need a solution that works at scale, in a complex,
distributed setting, and is available now. That's a serious project to
take on, and one that would certainly take a lot of time and focus. We'd
need to build a new community of people that embodies Mozilla DNA and
values AND build a world-class piece of software. We're not experts in
analytics or in defining requirements, so we would have to wait until a
fair amount of development was done before we could even begin to
evaluate how helpful the project was. Those of you who have been around
Mozilla since the early days will undoubtedly remember the enormous
pain of trying to build the application (in those days the Mozilla
Application Suite) before we had a solid infrastructure (the Gecko
implementation). The idea of building an analytics package while trying
to use it at the same time on websites as complex as those in
question is a recipe for disaster.
9. Why Omniture? Omniture has many positive points. The use of the data
is limited to providing the web analytics service to Mozilla. The
product SiteCatalyst is a widely used solution for large websites; it’s
known to scale, be stable, and provide reliable, trustworthy results.
Access to the data is highly secured and Omniture provides support
resources. In addition, there is a user interface for allowing
individuals to opt out of the web analytics processing. There are some
drawbacks, of course; there usually are. Omniture's product is not open
source, and we always prefer open source. Our arrangement with them is contractual.
That's helpful in that it allows us to include the privacy safeguards in
the contract. But as is almost always the case the complete contract is
confidential. Omniture has been criticized for its business practice of
using cookies that don't clearly say they are from Omniture. It turns
out Omniture allows its customers to specify whether they want a cookie
with the Omniture name in it. Mozilla cookies will do so. And finally,
Omniture is not free. Use of Omniture requires payment, unlike other
options, and the cost generally rises with the usage of the sites. So it
could get expensive, and we'll have to monitor this.
10. How will we evaluate if the data is worth the effort to get it?
We'll look at the results. We have a set of people who are adept at
looking at data -- Ken, Polvi and Daniel, who just joined us. Ken and
Polvi have been publishing what we've learned from the data we do have,
and we'll see what can be learned from the additional data. We've
already moved the data (known as "metrics") discussions into the public
via the Metrics Blog. We will continue to do this.
11. Will Omniture be used with all Mozilla websites? We don't know yet.
As noted above, we'll do a review of the consumer-facing sites and see
how valuable the data is and how we feel about gathering it. We may also
look at alternative providers as part of this discussion. Then we can
decide about other sites as well, such as our developer- and
community-facing websites.
12. Privacy Policy. Our current privacy policy says that Mozilla data
won't go to an outside third party. So it will need amendment to allow
for this case. Details on the proposed changes will follow, but for now
I'd like to talk through the goals and proposed techniques.
13. Sensitivity to data, privacy and user control. Most websites (and
the organizations running them) are unabashed about collecting data, and
using that data to improve their business. The use of web analytics is a
standard practice, taken for granted by many website operators. This
proposal is an extremely mild version. Some people have suggested to me
that this discussion is "much ado about nothing" and reflects an extreme
focus on privacy by a portion of the Mozilla community. I agree that
this is a mild proposal, collecting the most basic of data. But I don't
believe this discussion, or the basic concern behind it, is irrelevant or extreme.
As noted above, we live in a world of data; we should be thinking
carefully about that data and its impact.