Hey Hanno,
Thank you for your detailed discussion of your work and your thinking.
This is exactly the kind of explanation I was seeking. I would hope, as
you take Moz.Loc.Service live, that you will fold your explanation into
text on the web site for the benefit of all.
My comments below aim to push you a little to consider the impact on
User Experience that taking live a service like Mozilla Location Service
in its current state can have on users. (Issues which tend not to be on
the Mozilla Location Service side of things but on the Firefox OS app
developers' side.) Please take my comments below as discussion, or
critique, not criticism. You are surely doing good work. I hope only
that we have the energy and will to deal with the issues which arise
when going live with a data source which has a very peculiar pattern of
accuracy.
On 3/27/14, 8:39 PM, Hanno Schlichting wrote:
> To clarify the scope of the initial MLS use in FxOS: We are first
> looking at Tarako devices, which don’t have a GPS chip. So in order
> to provide user-value, all we have to do is beat “no position
> estimate”.
Can you push yourself a little here? By saying "all we have to do" you
are selling yourself short. Perhaps that one thing is not exactly 'all
we have to do'. I can see at least two issues.
First, there is the issue that you need to be better than, or at least
complementary to, the other available data sources. Could you clarify
what other data sources are in use, could be used, or will be in use for
the new device? Also, how do you compare Mozilla Location Service data
to that generated by those other sources? You talk about GeoIP so it
sounds like there is at least one such data source in use. What scale of
error comes out of that service, and what is its variability? Also, does
the Geolocation API use another service based on radio signals, as
Mozilla Location Service is, such as OpenCellID? I have a vague notion
that you have been using some Google service so far, but that may be
totally wrong (or it may be just for the GeoIP).
Second, user experience, when consuming GeoLocation data, is *deeply*
intertwined with the tuple {Position, Error} rather than with the single
element {Position}. Indeed, you actually talk about this below in your
discussion of GeoIP. Furthermore, the User Experience happens across all
applications, not merely applications that Mozilla controls.
Unfortunately, we (humanity) are very good at handling the {Position}
term and terrible at handling (visualizing, analyzing, reasoning with,
...) the {Error} term. (At the scientific level, from what I could
understand in my survey of techniques of spatial analysis, this is
fundamentally due to our complete lack of mathematics for spatial
analysis on things other than points.) Furthermore, we have been so
spoilt with exceedingly high quality spatial data and spatial position
estimates that we have learned to deal with {Position, ~0}, i.e. high
accuracy, and are no longer great at handling position estimates with
bad accuracy.
So the Mozilla task, both in the Mozilla Location Service and in the
Firefox OS team, seems to me to be twofold: to produce data with small
{Error} terms and to get *everyone* involved in the production chain
that eventually presents Users with a location estimate to think
effectively of how to handle, present, and analyze the tuple {Position,
Error}. In that context, if the Mozilla Location Service enters
production with a very different type of {Error} term (say highly
variable or with some fixed inaccuracy estimate), that will be important
for everyone to know and understand.
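To make this concrete, here is a small sketch of one way an app could 'handle the tuple' when presenting it: round the displayed coordinates to the precision the {Error} term actually supports, so the UI never pretends to know more than it does. This is my own illustration, not anything from Mozilla code; the digit heuristic is an assumption.

```javascript
// Sketch: present a {Position, Error} tuple honestly by rounding the
// displayed coordinates to the precision the error supports. Roughly
// 1e-5 degrees of latitude is about 1.1 m, so we drop one decimal
// digit for every factor of ten of inaccuracy (my own heuristic).
function formatEstimate(latitude, longitude, accuracyMeters) {
  // digits = 5 for ~1 m accuracy, 1 for ~5 km, never negative
  const digits = Math.max(
    0,
    5 - Math.round(Math.log10(Math.max(accuracyMeters, 1)))
  );
  return {
    lat: latitude.toFixed(digits),
    lon: longitude.toFixed(digits),
    accuracy: accuracyMeters,
  };
}
```

With a 5 km error the display collapses to a single decimal digit of latitude; with a 5 m GPS fix it keeps four.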
>
> One primary way to achieve this is using GeoIP, which will give you
> country or city-level position estimates. That’s not great for many
> use-cases, but it’s better than nothing. One example might be an app
> showing you restaurants around you.
Okay, let's work with this example. Let us set up three work flows, the
first based on text, the second on GPS, and the third on the Mozilla
Location Service, and consider the User Experiences.
Work flow 1: Text
1. Show user dialog 'what city are you in'
=> User picks city (this sucks and is hard to do efficiently)
2. Present user with list of 'Restaurants near you'
=> The user has a pretty good idea that these are simply restaurants in
the city and therefore she ought to do some work figuring out where in
the city they are. User kind of likes the app but is not crazy about it.
Work flow 2: GPS
1. Present the user a list of 'Restaurants near you'
=> The user thinks 'near you' really means 'near you' at the human scale
(i.e. walking) and indeed discovers that the restaurant he chooses is
three blocks away. User loves his 'Eat nearby app'.
Work flow 3: MozLocService
1. Present the user a list of 'Restaurants near you'
=> Ouch, now 'near you' is not really nearby since we don't actually
know where 'you' is at human scales. (At car scales, perhaps this is
fine.) If done badly, this means the user will hate the App because it
claims to do one thing but does another. Note that the app, to work
around this, has to be able to react to the {Error} term and present
visibly distinct user interfaces, one that is 'Restaurants near you' and
one that is 'Restaurants in your city'.
So I would suggest that, if we are really focused on the User
experience, "all we have to do" is help everyone deal with what are very
different consequences of {Error} terms which vary over at least three
orders of magnitude (5m for good GPS, 5km for 'I only recognize your
cell antenna in your rural area'). This is not easy at all and is
especially difficult because it goes far beyond any technological
solution we could set ourselves. We need to create good examples, good
documentation, and good communication. Developers creating User
Interfaces need to understand that despite having only a single API
call, they need to think hard about the different possible user
experiences based on vastly different values for {Error}.
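As a sketch of the kind of branching such an app would need: the 1 km threshold and the mode names below are my own illustrative choices, not any real app's API; only `position.coords.accuracy` comes from the actual W3C Geolocation API.

```javascript
// Sketch: pick a visibly distinct UI mode based on the accuracy the
// W3C Geolocation API reports. The 1 km threshold is an illustrative
// guess, not a recommendation.
function chooseRestaurantUi(position) {
  const accuracyMeters = position.coords.accuracy;
  if (accuracyMeters < 1000) {
    // GPS-quality fix: 'near you' can honestly mean walking distance.
    return { mode: "near-you", radiusMeters: 3 * accuracyMeters };
  }
  // CellId/GeoIP-quality fix: fall back to a city-level presentation
  // rather than pretend to know the street.
  return { mode: "in-your-city", radiusMeters: null };
}

// In a real app this would be fed by the Geolocation API:
if (typeof navigator !== "undefined" && navigator.geolocation) {
  navigator.geolocation.getCurrentPosition(
    (pos) => console.log(chooseRestaurantUi(pos)),
    (err) => console.log("no position estimate:", err.message)
  );
}
```

The point is precisely that a single API call forces two different user interfaces, and only the {Error} term tells the app which one to show.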
> The alternatives are either to
> show a default world-wide view (with no position estimate) or to zoom
> the map view to the country / city you are actually in. The latter
> still requires the user to pan and zoom a bit to get to the right
> area, but it’s better than a world-wide view. While GeoIP is pretty
> bad for mobile networks, a lot of the target users will use WiFi
> networks to access the internet, as the data connections are too
> unreliable/slow/expensive. GeoIP works better for those
> landline-based WiFi networks. In addition the code is already in FxOS
> to send observed cell and wifi networks along to MLS.
Shit. Really? Grr, then I just wasted time rewriting this code for
MvdStumbler. Where is this code?
> So once we get
> more and better data for the countries in question, the user
> experience will get better without any need to update the client
> software.
>
> A second use-case on more capable hardware is around misconfigured or
> missing A-GPS support in FxOS devices. One part of A-GPS is to inject
> a coarse position estimate into the GPS chip with an accuracy
> required to be in the order of 100-200km. This helps the GPS chip to
> figure out which satellites should be visible in the sky and look for
> their specific signals. As this use-case only requires a very coarse
> grained position estimate, it’s also something we can provide based
> on the current MLS. This helps with reducing the “time to first fix”
> from a worst case of 12 minutes to much less time.
Right, that's great. But again, is Moz.Loc.Service the only available
source of that data or merely one of several, and how do those compare?
>
> Of course we want to provide better position estimates and make the
> service useful for other use-cases. But that requires a mixture of
> getting much more data and better algorithms and approaches on
> collecting and processing that data. Those things are underway,
Wait. 'Underway' means what? Below you say you are not working on the
codebase. Now I am confused. What is 'underway'? That sounds like what I
wanted to learn about. Is this your work or someone else's?
> but
> don’t have to block MLS use in FxOS, as we can already provide some
> user value.
> On 25.03.2014, at 06:05 , Adrian Custer <a...@pocz.org> wrote:
>> It would be interesting to have the discussion of the work you are
>> doing to bring the whole system up to production quality on the
>> https://mozilla-ichnaea.readthedocs.org/en/latest/calculation.html
>> web page. That should probably include a discussion of all the
>> different error terms you are considering in the position analysis
>> and how multiple observations attenuate the different terms.
>
> The short answer is: We aren’t doing any real work in the current
> codebase.
Okay.
> The assumption so far is that we only get a single or very
> few observations for any given cell or wifi network, as most of the
> areas are only stumbled once or very infrequently for each cell
> network / cell standard combination.
This is contradicted by the intense discussion of privacy issues, which
centers on the fact that the user's 'home' areas are stumbled repeatedly,
leading to obvious patterns in the visual representation of the data.
There are issues with such repeated data, but 'stumbled only once' is
not one of them.
But, okay, if you are not even analyzing the data, then we have no clue
about the quality of what we are generating.
> With so little data per network,
> there’s not a whole lot of algorithms you can use. We are fully aware
> that this leads to position estimates with low accuracies. To cover
> the widest area we are going to rely on cell networks. As a result of
> limitations in GSM networks, the devices will only send us a single
> cell observation for the currently connected cell. The best we can do
> here is to equate the estimated user position with the center of the
> cell and a range estimate of the cell size (this method is generally
> known as CellId). That leads to accuracies in the kilometer range for
> urban areas or tens of kilometers in suburban/rural areas.
There are two issues here:
Stumble --(1)--> {cell pos, err} --(2)--> {user pos, err}
the first going from initial observation to estimate the cell antenna
location, and the second from that estimate to the user position. With
only one observation, we are essentially going to tell users they are
where the initial stumbler was located, and we then have a compound
error term: the cell size, or some function f{cell size, signal level},
on top of the stumbler's own position error.
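If the two stages are treated as independent, a naive way to combine them is a root-sum-square of the stumbler's own position error and the cell-range term. This is my own sketch of the reasoning, not a description of how the ichnaea code actually works:

```javascript
// Sketch: propagate error through the two-stage estimate
//   stumble --(1)--> {cell pos, err} --(2)--> {user pos, err}.
// With a single observation, the 'cell position' is really the
// stumbler's position, so the user-position error is roughly the
// stumbler's GPS error combined with the cell range. Root-sum-square
// assumes the two terms are independent -- an assumption on my part,
// not a statement about MLS internals.
function userPositionErrorMeters(stumblerAccuracyM, cellRangeM) {
  return Math.sqrt(stumblerAccuracyM ** 2 + cellRangeM ** 2);
}

// With a 10 m GPS fix and a 2 km urban cell, the cell term dominates:
const combined = userPositionErrorMeters(10, 2000); // just over 2000 m
```

This makes the earlier point quantitative: once the cell range is in the kilometer range, the stumbler's GPS error is negligible, which is presumably why Hanno says GPS inaccuracies don't matter at this scale.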
How are you figuring out the urban/rural distinction? Are you doing a
nearest neighbour tessellation based on the cells you actually have in
the database? (Hmm, that makes me wonder how one constrains Voronoi
tessellations given large areas without data.)
> The GPS
> inaccuracies or inaccuracies from sensor / timing mismatch aren’t
> having a big impact at this scale.
Right. But presumably we are also trying to gather data today that will
help us tomorrow. That would mean that we are worried about getting good
Wifi data now, even if we can't really use it yet.
>
> I’m more than interested in hearing about your work and your own
> conclusions.
So far I am writing MvdStumbler, a Firefox OS app for stumbling:
http://pocz.org/ac/projects/mvdstumbler/mvdstumbler.html
which is still a fragile piece-of-s***. It works fine but has all sorts
of unacceptable rough edges and lacks any kind of UI esthetic or polish.
It also lacks the map panel which is my current work. After that lands,
and if I have the energy to polish it up a bunch, I might announce its
existence formally.
For now, however, MvdStumbler *is* giving me my actual data (beyond
giving it to both MLS and OpenCellID). I am keeping that so that at some
point I might be able to analyze that raw data. (It seems that the data
I upload to Mozilla Location Service have been lost forever due to your
privacy policy.) I already know of all sorts of issues with the data I
am generating. I mentioned one, wifi collection, because it seems
unrelated to the way my application gathers data and instead seems a
consequence of the data itself: wifi signals are gathered from the
periodic broadcasts of the wifi base stations. But you say these issues
are not yet of concern for you all, so I guess that discussion does not
make sense yet.
I have done no analysis yet and therefore have no *conclusions*. I can
see from OpenCellID's approach to analysis (they nicely show you how
they arrive at the position estimate for each cell tower) that either my
data totally suck or the algorithms do. Therefore it seems there is
*lots* of work needed to come up with data that is even vaguely useful.
I will have to do some field work on that.
> We will look at better algorithms in the near future.
> But those are a secondary priority after getting any data at all.
Again, I am confused. Is Moz.Loc.Serv. the only source of data for the
GeoLocation API on your new device or one of several?
>
> We are following a very incremental approach here, where we try to
> provide value at each step of the process and not wait until we can
> beat the current market offerings from competitors with many years of
> experience.
My concern for users is to understand how relying on such a problematic
data source as Mozilla Location Services for the Geolocation API will
impact user interfaces and work flows. As a consumer of the Geolocation
API, I see not a simple calculus of 'results of any quality are better
than no results' but something more subtle.
Hmm, all of a sudden I realize that you might be using MozLocServices
only to feed the GPS chip and not for the GeoLocation API. So maybe I
should have clarified that first. Will you be using Moz.Loc.Service data
for the GeoLocation API?
>
> Hope this clarifies the current situation a bit. Hanno
Yes indeed. Sorry that it also raises many issues.
Thank you very much for your writeup.
~adrian