On Feb 6, 8:02 pm, "David E. Ross" <nob...@nowhere.invalid> wrote:
> An enterprise the size of Mozilla must surely have attorneys on staff or
> retainer. You should find out if what is proposed is legal before
> expending any efforts to implement it. Besides Germany, there might be
> other nations with laws impacting on this concept.
>
> Furthermore, where such laws do not exist, Mozilla needs to have a firm
> policy on how the organization would respond to a warrant or subpoena
> for the data. That policy must be in place before the data collection
> begins and should address not only a government's request for the data
> but also a request resulting from a civil lawsuit.
>
We do have a legal team, and we also engaged outside legal counsel
specifically on the question of European and German law for this
project. We have asked the legal and privacy teams to share the
results of their reviews.
On Feb 7, 1:28 am, "Justin Wood (Callek)" <Cal...@gmail.com> wrote:
> Using this logic, SeaMonkey should gather all data about all users, we
> possibly can, because we have been losing market share heavily every
> since we became SeaMonkey from "the Mozilla Suite".
>
Please reconsider the phrase "should gather all data about all users
we possibly can". This project is not about gathering all data
possible. It has a very specific list of the minimal data determined
to be required to answer a defined set of questions. A lot of
information about what those questions are, and the justifications
for most of the data points, has been shared in other venues such as
the bugs and the wiki. I am happy to continue to work toward sharing
justifications and considerations for any of the data listed. It is
right for Mozilla and the community to ask for those explanations.
It is difficult, though, to maintain a productive discussion in which
everyone has a clear picture of the facts when exaggerated phrases
are used.
> From where I sit, the largest fault of our market share is the fact
> that Google has heavy brand awareness, and is doing LOTS of expensive
> advertising campaigns, and well-done in most cases. So "Google Chrome"
> is interesting to the ignorant-of-computer users.
>
> Also Microsoft is (Finally) developing a Sane IE, which means less
> reason for people to install a different web browser on Windows.
>
Both of these are great concerns that tie in to this project. These
market changes are significant, and they primarily involve a large
class of mainstream users who are under-represented in our current
understanding. These other companies are focusing a lot of
attention on understanding how the browser is used by mainstream
users. We are striving to improve our own understanding.
We don't want to just do things the same way as others, though. We
have tried to develop a project that can analyze usage without
collecting personally identifying information. We have worked with
the privacy and legal teams to propose policies that mitigate the
unavoidable PII, such as ensuring that IP addresses are never tied to
the data and that we leave no easy way to associate identifying
information such as an e-mail address or name with the data. We have
also put into the project a set of goals around giving the users
visibility, functionality, and control of the data generated by their
browser.
On Feb 7, 3:25 am, Henri Sivonen <hsivo...@iki.fi> wrote:
> ...
> Now Telemetry has been very carefully designed to have privacy
> characteristics that suit Mozilla's stated privacy principles and
> those characteristics have been bragged about. And then another team
> comes along, treats that design as a bug wants to send a per-user ID
> to enable longitudinal study. If doing what this metrics feature
> suggests to be done was OK, surely Telemetry would already have UUIDs
> and support for "longitudinal study".
We definitely spent a lot of time looking at Telemetry and working
with that team. The data that Telemetry collects and the purpose it
exists for are different, though. Telemetry was designed to enable
developers to understand the performance characteristics of individual
features or code paths "in the wild". Meeting those requirements does
not require retention or the same sort of longitudinal data that MDP
proposes. Putting those characteristics into Telemetry would be
doing the very thing that several people have spoken out against:
adding data to a system that is not directly needed by that system.
There is significant value in judiciously partitioning data by
purpose. It enables better policy governing the data. It allows
finer control over what data is collected and how it is reviewed. It
allows walls to be put up to prevent associations from being made
where the organization does not wish them to be made (for instance,
tying usage data directly to crash reports).
> As for the Germany/EU aspect: (Note the rest of this paragraph says
> nothing about law. I'm not trying to play a lawyer here.) Even if
> sending an UUID had no real privacy impact, sending an UUID would be
> bad publicity in Europe. The usage share of Firefox is in the decline.
> Europe in general and Germany in particular is a place where the usage
> share of Firefox is high. It seems like a bad idea to hurt that market
> share in order to study metrics related to it.
I just want to clarify precisely what is being discussed when we say
"sending an UUID". MDP is generating cumulative data on the client
and submitting that data as a document. That document is given a new
UUID and the client retains that document ID. Every time a new
submission is made, it will have a new document identifier. It is
even possible for the identifier to not be part of the URL (which is
sent using SSL). If the user wishes to delete the usage data for
their installation, the browser submits a delete request with the last
submitted ID. When a new document is generated on another day and
submitted, the client also sends the old document ID to be deleted so
that there are not two copies of the data on the server. This allows
us to look at retention. If a document is older than N days, we know
that there have been no further submissions from that installation.
This implementation does still require policy and trust. It requires
that we not record IP addresses with the data set. It requires that
we do not longitudinally track location. There might be further ways
we can make it easier to follow those policies.
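
To make the document ID rotation described above concrete, here is a
minimal sketch in Python. It is not the MDP implementation. The class
and method names (MetricsServer, Client, inactive_documents) and the
in-memory store are assumptions made purely for illustration.

```python
import uuid
from datetime import date, timedelta


class MetricsServer:
    """Hypothetical in-memory stand-in for the server-side document store."""

    def __init__(self):
        # document ID -> (submission date, payload); no IP address is kept.
        self.documents = {}

    def submit(self, new_id, payload, today, old_id=None):
        # Drop the previous document first so that at most one copy of an
        # installation's data exists on the server.
        if old_id is not None:
            self.documents.pop(old_id, None)
        self.documents[new_id] = (today, payload)

    def delete(self, doc_id):
        self.documents.pop(doc_id, None)

    def inactive_documents(self, today, n_days):
        # A document older than n_days means no further submission has
        # arrived from that installation since it was stored.
        cutoff = today - timedelta(days=n_days)
        return [d for d, (when, _) in self.documents.items() if when < cutoff]


class Client:
    """Hypothetical browser-side logic; it remembers only the last ID."""

    def __init__(self, server):
        self.server = server
        self.last_id = None

    def submit(self, payload, today):
        # Every submission gets a fresh random document ID.
        new_id = str(uuid.uuid4())
        self.server.submit(new_id, payload, today, old_id=self.last_id)
        self.last_id = new_id
        return new_id

    def delete_my_data(self):
        # User-initiated removal uses the last submitted document ID.
        if self.last_id is not None:
            self.server.delete(self.last_id)
            self.last_id = None
```

In this sketch the server never sees a stable per-installation
identifier, only a chain of one-time document IDs, and retention falls
out of the age of whatever document is left.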
On Feb 7, 6:19 am, Gervase Markham <g...@mozilla.org> wrote:
> On 06/02/12 22:16, Daniel E wrote:
>
> > It is an unfortunate fact that even in the other data available to us
> > today, there are occasional ways in which a user can modify their
> > system or browser such that some private information is leaked out.
> > One of the best examples I can give of that is the ability to change
> > variables that are used in the update or blocklist checks. There are
> > requests to those systems that have an e-mail address in the place of
> > the product name ("Firefox"). There are systems that have a changeset
> > or bug number or username in the channel or distribution name.
>
> I have no reason to doubt you that this happens, but there is a big
> difference between designing your system to request particular data, and
> accidentally receiving some of it because a user mis-configures their
> browser.
>
> If I have a web "contact me" form, and someone pastes their entire
> medical history into it and hits Submit, I probably want to delete the
> data - but I don't have to engineer my data handling process for content
> coming from that form so that it's robust for handling medical data!
>
We need the legitimate data that is expected to be in those
variables. We are designing the system to be able to use that data.
We do not want to be burdened by illegitimate data that is available
as the result of a mistake on the part of a developer or user, so we
have made sure that the system has checks and features to restrict and
eliminate that data easily.
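
One way such a check could work is an allow-list applied to each
incoming record, sketched below in Python. This is only an
illustration of the idea. The field names, the allowed values, and
the "OTHER" placeholder are assumptions, not the actual checks in the
system.

```python
# Hypothetical allow-lists; the real set of legitimate values would come
# from the products and channels that are actually shipped.
ALLOWED = {
    "product": {"Firefox", "Fennec", "SeaMonkey", "Thunderbird"},
    "channel": {"release", "beta", "aurora", "nightly", "esr"},
}
PLACEHOLDER = "OTHER"


def sanitize(record):
    """Keep only expected fields, and only expected values in them."""
    clean = {}
    for field, allowed in ALLOWED.items():
        value = record.get(field)
        # An e-mail address, username, or bug number in one of these
        # fields is replaced rather than stored.
        clean[field] = value if value in allowed else PLACEHOLDER
    # Fields not named in ALLOWED are dropped entirely.
    return clean
```

For example, a record whose product field holds an e-mail address
would be stored with the placeholder in that field, so the stray
value never reaches the data set.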
> > It was critical for us when we proposed this system to have data
> > collection that was focused on the browser installation rather than
> > any attempt to learn anything about an individual person.
>
> I'm not sure that's a distinction we can make. I am the only user of my
> browser, and I'm sure that's true of lots of other people too. What can
> you tell about me from my list of installed add-ons? I won't give you
> the full list, but I suspect you could tell:
>
> - I do web development of RESTful services using JSON
> - I work for Mozilla
> - I care about my privacy
I believe that it is important to consider even the worst cases, but
please keep in mind that this is not a normal case. The system is
designed such that it would have no way of telling that Gerv is a web
developer who works for Mozilla and cares about privacy. There are
specific policies and features put in place to prevent the system from
ever being able to associate those conclusions with a person. We
don't keep IP addresses with the data to prevent the possibility of
using that IP address to identify the person using an installation.
We use a document identifier so that even if one document ID were ever
leaked or shared by you (say via an e-mail), the ID would change at
the next submission so we would not be able to use that ID to look up
the data from your installation next month and see if you still care
about privacy.