Could someone set the record straight on what the article is about?
That's not much to go on, to be honest.... It's fuzzy enough to allow anything.
Reading through that and the comments, I think the potential gain (whether to the company, the foundation, or the user community), *no matter how good it is* and *no matter how well it is marketed*, is going to be outweighed by the loss of goodwill. There's just no way we can possibly avoid enraging the screechy monkeys, nor is there any way we can shout louder than them.
Well... It seems to me that the core problem so far is that the whole thing is
posed in a vague enough way that people can easily jump to conclusions.
Personally, I can't really make sense of John's post. Either I'm missing some
very large pieces of information here, or the situation is just as open to
potential privacy and abuse issues (unintentionally so, I must add!) as the
Are we talking about standardizing in-use data-gathering techniques so as to
give users more control over them (aka <a ping>)? Are we talking about
gathering overall information about the Firefox user community (10% of our users
are in Germany; 7% of our users use Macs, 5% of our users view the BBC front
page at least once a week, that sort of thing)?
It seems to me that people are assuming (without much justification, but without
any counterindications either) that at least part of the process will be looking
at "anonymized" per-user data. I certainly hope this is not the case; I don't
think there's a reasonable way to anonymize such data. But I think overall
statistics, if they are gathered in ways that cannot be tied to users, might be
OK. In my personal judgment, of course.
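To make the distinction concrete, here is a minimal sketch of what "overall statistics" could look like, as opposed to per-user records. Everything in it is an assumption for illustration: the function name `aggregate_shares`, the `min_count` cutoff for hiding very small groups, and the sample data are hypothetical, and nothing here describes anything Mozilla has built or proposed.

```python
from collections import Counter


def aggregate_shares(values, min_count=5):
    """Return {value: percent of all reports}, omitting groups below min_count.

    `values` holds one coarse attribute per report (e.g. a country code) with
    no user identifier attached, so only totals ever leave this function.
    The min_count cutoff is an assumed safeguard against tiny groups.
    """
    counts = Counter(values)
    total = sum(counts.values())
    return {
        value: round(100 * n / total, 1)
        for value, n in counts.items()
        if n >= min_count
    }


# Made-up sample: 100 reports, each carrying only a country code.
reports = ["US"] * 60 + ["DE"] * 10 + ["GB"] * 28 + ["IS"] * 2
shares = aggregate_shares(reports)
```

With this sample the two-report group is left out of the result entirely, which is the kind of property that would have to hold before aggregate numbers could be called untied from users.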
Even then we have possible problems, I should note. If we are gathering such
aggregate data, and one day, say, all schools in China switch to using Firefox
as their official browser, and the next usage report shows some sort of
significant changes in aggregate usage of certain websites, people might draw
some (warranted or not; not sure which is worse) conclusions from this. Of
course it'd have to be a big group of users making the switch all at once to
rise above the expected noise level...
It might help to come out with a clear message here about what we are NOT doing,
if we know that. In fact, I would consider that a necessity. And it needs to
be very clear and understandable, not couched in fuzzy wording subject to
interpretation, if it's to lay fears to rest.
If this is impossible, then I think I agree with Zachary about the likely
effects of this, at least in certain circles.
- no, there is no secret data project.
- no, there is no secret plan to snoop or collect user data
- no, we are not already secretly collecting data
- yes, we are trying to figure out how we can accumulate better data about
how users are using their browsers, and what they're trying to accomplish;
as with everything we do, this starts with public discussion to make sure we
do it right in terms of respecting user privacy and our own community
ideals - that's what Lilly was saying.
- yes, any such program would be opt-in, not opt-out
Alex Polvi recently released a double-opt-in add-on (first you need to install
it, then you need to turn it on) which collects clickstream data to
understand what buttons are pressed most often in the UI. Even in this
add-on, Spectator, the data is double blinded and anonymized.
I know this is a touchy subject, and everyone is looking for a scoop, but
our record here is clear. We're open. We don't do things without talking
about them first. So we're talking about the goals and objectives and
constraints, and someone from TechCrunch decided to make a story out of it
and misrepresented the facts.
I hope this helps.
----- Original Message -----
To: dev-pl...@lists.mozilla.org <dev-pl...@lists.mozilla.org>
Sent: Mon May 19 10:38:09 2008
Subject: Re: Data snooping
Robert Accettura wrote:
> Pretty sure this is the latest and most detailed that's out there by
> who can speak with authority:
That's not much to go on, to be honest.... It's fuzzy enough to allow anything.
Data is a giant issue with people feeling safe and in control of their
Internet lives. It's true today among people who understand what's going
on, and it's an area of concern that more and more people are likely to feel.
That means it's an area where we should consider: what if anything can
Mozilla do to help people be in control?
A closely related topic -- and I think this is the one that sparked the
current discussion -- is whether there are large chunks of *aggregate*
non-personal data about how the Internet is used that it might be good to
make transparent. The big portals have a lot of information about
Internet trends and usage for which there is no open alternative. The
public can't see it, citizens can't see it, only the big commercial
enterprises can. Maybe this is the perfect setting, maybe not. We
should think about whether this is what we want and whether there are
things Mozilla could do to make the setting better.
Some people have jumped to the conclusion that this means Mozilla would
adulterate our core values and the primacy of user control. They
assert, or assume, or worry that thinking about data means somehow that
Mozilla will simply join the existing model of gathering and
commercializing personal data.
*This is not the case.*
Mozilla's approach to data must be improving the user experience. What
that means is the topic of discussion. Some may urge Mozilla to opt out
completely. I hope we don't take this approach. Maintaining individual
control in a data-centric world is incredibly important. I'd like to
see Mozilla provide some leadership here. I don't know what that
means specifically or how we would do this; the discussion is just
beginning. But I hope to feel safer about data relating to me in the
future than I do now.
I think there is a subset of the user base that will jump to negative
conclusions if Moz does ANYTHING AT ALL even remotely resembling user
tracking.  (They're probably already unhappy with the phoning home for
updates, even.)
This subset of the user base is also loud. Very loud. Capable of
shouting louder than us, no matter how hard we try.
> It might help to come out with a clear message here about what we are
> NOT doing, if we know that. In fact, I would consider that a necessity.
> And it needs to be very clear and understandable, not couched in fuzzy
> wording subject to interpretation, if it's to lay fears to rest.
While I think this is a great idea, especially if phrased as "things we
will not do, ever" and made legally binding on the foundation, I'm not
sure it's enough, if there remains permitted to us some sort of user
tracking, or thing that can be misinterpreted as user tracking.
> If this is impossible, then I think I agree with Zachary about the
> likely effects of this, at least in certain circles.
I should also point out that the Internet rumor mill makes it easy
for a smear that is *completely false* to circulate endlessly, shorn
of context and refutations. Probably people are going to be pointing
at that Reg article ten years from now and saying "see, Moz is evil."
Thank you. Note that I personally know that no one here is up to anything
underhanded. The problem is convincing others of this.
> - no, there is no secret data project.
> - no, there is no secret plan to snoop or collect user data
> - no, we are not already secretly collecting data
It would be good to say all this very publicly. In particular, item 2, which
was not made at all clear in any of the coverage I've seen so far.
> - yes, we are trying to figure out how we can accumulate better data about
> how users are using their browsers, and what they're trying to accomplish;
This phrasing is significantly different from anything anyone else has said
about it in terms of the connotations it raises with me. It's very much what I
was looking for in terms of explaining to others what the goal is.
I think it would be worthwhile to publicise this as the goal, emphasizing that
we want to gather such data without tying it to any individuals and without
infringing on anyone's privacy and in an opt-in way without the
did (since the specifics may well turn out to be wrong, in any case).
And we should make very sure that anyone we talk to about this carries this
message: "data on what users are trying to accomplish and how they use their
browsers" (as opposed to the focus being which websites they browse, though that
could be part of the "how they use their browsers" thing) and "opt-in".
> as with everything we do, this starts with public discussion to make sure we
> do it right in terms of respecting user privacy and our own community
> ideals - that's what Lilly was saying.
Certainly. And people aren't very good at listening. The problem I had with
John's post was that he didn't make it very clear what the goals were, so people
focused on their own guesses as to what the goals were, assuming the worst
as people do.
> - yes, any such program would be opt-in, not opt-out
That's pretty clear, though apparently some people are having a hard time
> Alex Polvi recently released a double-opt-in add-on (first you need to install
> it, then you need to turn it on) which collects clickstream data to
> understand what buttons are pressed most often in the UI. Even in this
> add-on, Spectator, the data is double blinded and anonymized.
This would be an excellent example to provide to make it clear that we're not
simply talking about tracking users' browsing here.
> I know this is a touchy subject, and everyone is looking for a scoop
Which is the problem here, yes, and which necessitates that we make it crystal
clear what the goal is and what we are NOT doing. The articles in the press so far
have done neither, unfortunately. It's sad that we have to be so careful in
this minefield to avoid people jumping to incorrect conclusions, but I think
that's just how things are here.
> and someone from TechCrunch decided to make a story out of it
> and misrepresented the facts.
I think it would have helped our case if John had made that clear in his post
pointing to said article....
> I hope this helps.
I didn't personally need help, for what it's worth; I trust you and I trust John. :)
I do think it helps in clearly articulating the information that we should be
getting out there when someone asks questions about it, which is what this
thread started out as.
Which is why I think Mike's characterization of the goals here (which are
nothing like user tracking, I must add) is very important.
> (They're probably already unhappy with the phoning home for
> updates, even.)
I think that if we could make this phoning fully anonymous, while still
providing the right response (the correct app and the right localization of that
app) we would... As it is, the update request ends up sending an IP address
(sad fact: we have to send the reply there) and the app/localization desired, at
the very least. There's no way we could do updates without those pieces of
information. Note that we don't _save_ the IP/app/locale association that I
know of, which is how it should be.
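As a rough sketch of the point above: an update check can be limited to the two things the server needs in order to pick the right download. The endpoint `updates.example.invalid` and the helper `build_update_request` are made up for illustration and are not the real update service or its parameters. The IP address does not appear in the request at all; the server sees it only because the reply has to be sent somewhere.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint, for illustration only; not a real update server.
UPDATE_URL = "https://updates.example.invalid/check"


def build_update_request(app, locale):
    """Build an update-check URL carrying only the app and the desired locale."""
    query = urlencode({"app": app, "locale": locale})
    return f"{UPDATE_URL}?{query}"


url = build_update_request("firefox", "de")

# Parse the query back out to show exactly which fields the request carries.
params = parse_qs(urlsplit(url).query)
```

Whether the server then stores the IP/app/locale association is a policy choice on the receiving side, which no amount of care in building the request can enforce.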
> This subset of the user base is also loud. Very loud. Capable of
> shouting louder than us, no matter how hard we try.
Perhaps. The question is whether they shout somewhere where someone is listening...
> I should also point out that the Internet rumor mill makes it easy
> for a smear that is *completely false* to circulate endlessly, shorn
> of context and refutations. Probably people are going to be pointing
> at that Reg article ten years from now and saying "see, Moz is evil."
This is why I think it's so important that we get the right message out to the
press every single time they contact us.
And most of the "screechy monkeys" appear to equate such data with
"selling my browser history to Corporation XYZ for $$$," which, AFAICT,
is far from what anybody is suggesting. I'm hard-pressed to think of
anyone proposing something more intrusive (for lack of a better word)
than an Amazon-esque "people who bought X also bought Y."
Following which, the primary purpose of such a project, at least as far as
I can glean, is for research, used here in the academic sense. Commercial
products allowing researchers to access data is not uncommon; I recall
an Economist article a few months ago about researchers who used World
of Warcraft to model what would happen in a mass-disease pandemic, and
found some surprising results (e.g., some people with a disease will
deliberately infect others). While my imagination is currently
experiencing a prolonged rut, I foresee that any data collected by
Firefox would be intended for similar purposes.
In conclusion, I would also like to point out that the majority of these
"screechy monkeys" appear to be Slashdot-like readers in that they are
complaining about something they know practically nothing about. Even
core Mozilla developers are rather fuzzy on the details, so very few can
actually claim to really know what's going on. But look at all of the
comments: several of the non-ranting ones actually have good gems, like
Alright, I'm going to weigh in here for a bit. Note that I have
*nothing* to do with whatever's going on in the Firefox data collecting
space, and I'm not going to talk about that, specifically. So what I am
saying could be completely irrelevant. Of course, I hope that it isn't.
ChatZilla, for those who are unfamiliar, is an IRC client that runs as
an extension to Firefox, is part of the SeaMonkey Suite, and can be run
standalone on XULRunner. I'm one of the ChatZilla developers, and we
recently started a data collection program (formally CEIP, Customer
Experience Improvement Program).
Basically, the idea that we had was that we wanted to know how our users
used the UI of ChatZilla to do what they do. We don't really care
precisely where they're going, but how they're going there, so to speak.
So what we do is logging, perhaps like Spectator (which I haven't looked
into at all, I should note), how often which commands get called, which
main dialogs get shown, and how this happens. What we don't log is what
exactly the commands are doing (that is, which channels people join,
which files they transfer, etc.). All the data is associated with a
randomly generated internal user ID, but this is not linked to any IPs
or actual data values (e.g. nicknames, which we obviously also don't
store), and hence anonymous.
In addition, the program is opt-in. ChatZilla does randomly select users
who are asked if they would like to participate, rather than waiting
for them to find out about the program and enable it without any
incentive whatsoever from our side. The request to opt-in is not modal,
however, and nothing is logged or sent until the user does opt-in (so,
doing nothing really still is opt-out, even for this user group).
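For those who prefer code to prose, the approach described above could be sketched roughly like this. It is an illustrative sketch only, not ChatZilla's actual implementation (which is not written in Python); the class `UsageLog` and its method names are hypothetical. The three properties it tries to show are: nothing is recorded before opt-in, the ID is random rather than derived from anything about the user, and command arguments are thrown away.

```python
import uuid
from collections import Counter


class UsageLog:
    """Hypothetical opt-in usage counter; not ChatZilla's real implementation."""

    def __init__(self):
        self.opted_in = False            # nothing is recorded until the user agrees
        self.user_id = None              # random ID, created only on opt-in
        self.command_counts = Counter()

    def opt_in(self):
        self.opted_in = True
        # uuid4 is random: it is not derived from an IP address or a nickname.
        self.user_id = uuid.uuid4().hex

    def record(self, command, *args):
        """Count that a command ran; its arguments are deliberately dropped."""
        if not self.opted_in:
            return
        self.command_counts[command] += 1

    def report(self):
        """Return what would be sent: the random ID and per-command counts."""
        if not self.opted_in:
            return None
        return {"id": self.user_id, "counts": dict(self.command_counts)}


log = UsageLog()
log.record("join", "#some-channel")    # ignored: the user has not opted in
log.opt_in()
log.record("join", "#some-channel")    # counted; the channel name is discarded
log.record("join", "#other-channel")
log.record("dcc-send", "file.txt")
report = log.report()
```

The report holds how often each command ran but never which channel was joined or which file was sent, which is the "how they're going there, not where" distinction.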
I've so far seen one user (admittedly, before I had added this privacy
policy) come to #chatzilla to complain about how we had turned into
spyware. They calmed down after we explained what was going on, however.
Now, comparing ChatZilla with Firefox would be completely ridiculous on
a userbase and usecase perspective (I would imagine that the general IRC
user is perhaps more technical/geeky than the average Firefox user, and
there are hundreds of other differences in purpose and so forth).
However, some things that I/we learned when doing this, that probably
apply to Firefox:
- Be very clear about what you do and don't collect. Don't just say "no
identifiable information is collected", because that gives no clue
whatsoever what you don't collect, or what you consider "identifiable
information". Try to imagine what users would be worried about (form
data they enter, websites they visit, preferences they set, search
terms, etc. - recall the debacle with people misunderstanding the
awesomebar as sending your search terms to google immediately, as well
as searching your history)
- Point out a usecase. Why do you need this data? I actually already
blogged a little bit about that as well - we are now noticing, for
example, how many ChatZilla users close their network tabs. There is a
bug on file for an option to have these always be closed after the
connection is successful, which we are considering giving higher
priority (because from a user perspective, it seems like many people
find them useless, so for most users it would be better if this
behaviour was improved so we did the repetitive things *for* them).
- Capitalize on openness. People are wary when it comes to giving away
any data whatsoever, as they should be, but the fact that it is
impossible for us to collect something without being spotted doing this,
even without active reverse-engineering by suspecting users, is a big
credibility boon, as far as I'm concerned.
If you have questions (or want to rant about how terrible a thing we're
doing, even after the above explanation), go for it. :-)
[ Hey, I know it's poor form to reply to your own post, but I forgot to
make this addendum, so here goes: ]
However screechy the current batch of readers are, we likely won't have
heard the ugliest vitriol until someone posts it on Slashdot. With any
luck, the fine people at Mozilla will have prepared a rebuttal to
counter the misinformation posted there. Hmm, I hate to say this, but
it's making me feel that I'm trying to defend a heinous crime when all
I'm trying to do is rebut some mischaracterized information.
This article was a direct result of a conversation with John, according
to John's blog.
But to answer your question, I think it's perfectly fine to issue a
press release to the effect of "Journalist X at publication Y is telling
a base lie", along with a link to the information that's already on the
web which shows that it is in fact a base lie. And send this press
release to the editors, as well as to the competitors of the publication.
Won't work so much for bloggers, except insofar as other bloggers blog
But sure, if there is a coordinated smear campaign life gets hard. That
could happen no matter what we do, though.
> We can push a message all we want, but that won't change the fact
> that the press can pick and choose facts to make the story come out how
> they want it to.
"The press" is not that monolithic on this issue, and still sometimes
has a reputation to keep. Maybe.
> In conclusion, I would also like to point out that the majority of
> these "screechy monkeys" appear to be Slashdot-like readers in that
> they are complaining about something they know practically nothing
Ayup. I borrowed the term "screechy monkeys" from John Scalzi - it
is intended to suggest exactly people who make a huge fuss about something
they know very little about, perhaps deliberately misunderstanding it or
extending it well beyond what a reasonable person would, and cannot be
reasoned with. Slashdot comments are one of the canonical places to find
My point is that these people exist, and (because the Internet rumor mill
works the way it does) other people listen to them; perhaps having no idea
that the factoid they heard ultimately originates in someone who was talking
complete nonsense. As such, screechy monkeys have the ability to generate
endless, enormous bad PR. In my opinion, any move by Moz to collect additional
information from users, no matter how reasonable on its face, risks drawing
the attention of the screechy monkeys - indeed, *has already* drawn their
attention - and this is a sufficient reason not to do any such thing.
Speaking as one who started the thread on mozilla.support.firefox
entitled "Preventing data gathering by browser" 19/05/2008
I'd just like to say I'm not a monkey and I didn't screech. I just asked
a simple question
"Is this true?
If so - can anyone recommend another browser?"
As my ISP is currently introducing Layer 7 interception of my browsing
using a company called Phorm Inc. I'm "attuned" to privacy issues over
browsing. I use Firefox because I believe it protects my privacy better
by nature of its design, and because I am more sympathetic to the ethics
of Open Source software than I am to those of Microsoft. But trust is a
fragile thing and easily lost, and once lost, very hard to regain.
I'll be watching the tone of the replies, as that indicates to me very
clearly the way Mozilla feels about its users.
I'm quite ready to admit I know nothing about the issue - that is why I
asked the question. It's what I do when I don't know something - ask for
Best wishes to all.
Rev Robert M Jones, Wimborne Baptist Church, UK
The original Techcrunch article was; the Register and Heise and other
knock-on articles were not, and I don't think that any of the other
(mis)reporters of the story were in contact with anyone.
I think one core problem here is that people are not used to hearing
about discussion of things as unformed as the putative "Data Project"
from the CEO of a major industry player. And, indeed, the other three
projects mentioned in that article (Weave, Mobile, and a little
browser experiment codenamed "Firefox") are much further along, so the
juxtaposition didn't help understanding at large.
There is no project underway to collect user browsing data, other than
the already-known and deeply opt-in Spectator, as mentioned previously
in this thread. There is no secret plan, there are no people working
on it, and the sum-total of the "project" is conversations John has
had with Arrington and Asay, plus his blog post (and then the
subsequent conversations here).
There is a growing *individual* understanding held by some people that
there are no good, open sources of web statistics, which are proving
to be quite important in guiding investment in the web (commercial and
non-commercial sorts of investment both). Some people, myself
included, believe that Mozilla could participate in creating such
sources in a way that was extremely respectful of user privacy and
provided a much-needed level of transparency into how the raw inputs
are collected and analyzed. The current sources for information about
traffic on the web are extremely black-box, and that makes it hard to
determine which of the various numbers are useful for making whatever
decision you have at hand.
But if Mozilla (in the very largest sense) can't figure out a way to
improve that space that is true to our decade-long history of
protecting user privacy, choice, and control...then we won't help
improve that space. I don't think that we should set our project
direction on the basis of avoiding "screechy monkey" reaction, though,
since that lets our activities be defined by people who are extremely
ill-informed about what's actually happening, or what the tradeoffs
I also read that as one thing that John wanted to point out in his post:
Lots of data are already collected somewhere in private right now, and
it would be a good idea if we at Mozilla could figure out a way to make
those processes and collections public.
Interestingly, that's something going directly in the opposite way of
the FUD posted by some people about our intentions.
Mike Beltzner's post above in this thread (5/19/08 11:47 AM) gives the
specific "no" you are looking for. My post of a minute or two later
(5/19/08 12:11 PM) lays out the framework. The article does not paint a
realistic picture of either.
Thank you - that was helpful.
Having been involved fairly heavily in the current UK debate over Phorm
(similar to NebuAd in US) I am familiar with the way certain stories get
misreported/hijacked/cause panic. Part of why this happens is because
trust has been lost - my ISP (BTBroadband/BTYahoo! over here in UK) have
lost my trust because of lies they have told over the last two years,
and things they did in secret (and denied doing until forced publicly to
retract those denials) - so the situation now is that it really doesn't
matter what they say or do - we assume the worst, assume they are lying
to us, and see significance in every detail.
I sincerely hope Mozilla Foundation never gets itself into that
position. So far so good. Many thanks.
Yes, the trust issue is important. That's why stories like that in the
Register are so disturbing. Thanks for the response.
I think Zachary was specifically referring to comments on John's blog post here.
> I just asked a simple question
> "Is this true?
> If so - can anyone recommend another browser?"
The answer is "that story is false".
Yes, that's the one I was referring to. It was written in a way pretty much
guaranteed to engender misunderstanding. The other articles just state blatant
falsehoods, but this one certainly said various things that were very easy to
interpret in all sorts of somewhat-paranoid (with good reason, given how various
organizations treat privacy) ways.
It doesn't help that such articles often simplify the issue, often to the point
where important parts go missing. :(
> the Register and Heise and other
> knock-on articles were not, and I don't think that any of the other
> (mis)reporters of the story were in contact with anyone.
> I think one core problem here is that people are not used to hearing
> about discussion of things as unformed as the putative "Data Project"
> from the CEO of a major industry player.
That seems pretty likely. Otherwise we wouldn't have to make it clear what this
is NOT about, since it would be obvious that this is just a very preliminary
discussion, not a plan...
> But if Mozilla (in the very largest sense) can't figure out a way to
> improve that space that is true to our decade-long history of
> protecting user privacy, choice, and control...then we won't help
> improve that space.
I think we all agree with each other (and have all along). The question is how
to communicate what we're doing to people, I think. The posts in this thread
from beltzner and Mitchell are a very good approach that I think we should
consider making more prominent (e.g. blogging or having The Register do a
followup article, preferably withdrawing their baseless accusations in the
> I don't think that we should set our project
> direction on the basis of avoiding "screechy monkey" reaction, though
I agree. We should still think a bit about how to mitigate the impact of said
reaction, if any.