
Self-hosted Ichnaea Instance setup help (2.1.0)


Ben Withers

Jul 12, 2017, 6:47:01 AM
to mozilla-dev...@lists.mozilla.org
The company I work for has an instance of Ichnaea hosted on AWS. I set this up with dev version 2.1.0 and installation went fine. Then came the issue of uploading data into the database for geolocation requests. I got a combined dataset from https://www.mylnikov.org/download which combines MLS, openBmap and OpenCellID.

I’ve been trying to get geolocation working for the past couple of weeks, but to no avail. I’ve done my best by reading the source code and managed to figure out things such as the database structure and table/column formatting. However, after inserting presumably correct data records into the database, nothing works.

I need to know how to get data into the database (using v2.1.0) properly, as my method may be doing it wrong, and then get geolocate requests working. Currently I only get 404 errors, which seem to suggest that the location cannot be found, despite the cell towers in the request being in my current database. Another thing to note is that I only inserted into the cell_gsm and cell_lte tables so far and nothing else. I assumed the server could work in this state and that celery tasks would update the state accordingly, but this may not be the case. Maybe some steps on how to go from a newly installed Ichnaea 2.1.0 instance to working geolocate requests would help out massively.

I know that Hannosch is working on reintroducing the import cell data tool (location_load from previous versions), but considering it could be a while till he implements that, I need some immediate assistance to overcome this issue. I hope this group is still active, even a couple of pointers would help since I've started running out of ideas to try! :D

Thanks in advance,
Ben

Ben Withers

Jul 12, 2017, 11:49:49 AM
to mozilla-dev...@lists.mozilla.org
Here's a sample record from the cell_gsm table. I now have cell_lte and cell_wcdma and region_stat tables filled too.

| max_lat | min_lat | max_lon | min_lon | lat | lon | created | modified | radius | region | samples | source | weight | last_seen | block_first | block_last | block_count | hex(cellid) | radio | mcc | mnc | lac | cid | psc |

| 23.745807 | 23.745807 | 37.961832 | 37.961832 | 23.745807 | 37.961832 | 2013-11-12 14:05:43 | 2015-11-06 23:18:53 | 313 | GR | 1 | NULL | NULL | NULL | NULL | NULL | NULL | 0000CA000100010000034C | 0 | 202 | 1 | 1 | 844 | NULL |

I displayed the binary cellid column in hex to be more human readable. It follows the structure defined in ichnaea/models/cell.py, in which it represents the 5 columns after it.
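For readers following along, here is a minimal sketch of how the hex value above can be reproduced. The byte layout (one signed byte for radio, three unsigned 16-bit values for mcc, mnc and lac, one unsigned 32-bit value for cid, all big-endian) is an assumption inferred from the sample row; the authoritative definition is the one in ichnaea/models/cell.py. The function names are hypothetical and are not Ichnaea's own helpers.

```python
import binascii
import struct

# Assumed layout, inferred from the sample row: radio (1 byte),
# mcc, mnc, lac (2 bytes each), cid (4 bytes), network byte order.
CELLID_LAYOUT = struct.Struct("!bHHHI")


def encode_cellid_sketch(radio, mcc, mnc, lac, cid):
    """Pack the five cell key columns into an 11-byte binary value."""
    return CELLID_LAYOUT.pack(radio, mcc, mnc, lac, cid)


def decode_cellid_sketch(value):
    """Unpack an 11-byte binary value back into (radio, mcc, mnc, lac, cid)."""
    return CELLID_LAYOUT.unpack(value)


# Values from the sample cell_gsm record: radio 0, mcc 202, mnc 1, lac 1, cid 844.
packed = encode_cellid_sketch(0, 202, 1, 1, 844)
hex_cellid = binascii.hexlify(packed).decode("ascii").upper()
print(hex_cellid)  # 0000CA000100010000034C
```

Decoding the stored value the same way is a quick check that an import script wrote the binary column consistently with the other five columns.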

I also added the region country codes corresponding to the mcc of each record. I have a lot of records with weird/invalid MCCs; I may remove them later.

I set samples to 1 for now, as I don't have any sample values to use.

Max and min lat and lon I set to the lat and lon, since I don't know how those are set.

That's what I've tried so far, but still none of my requests work. Keep getting the 404 error as follows:

HTTP/1.1 404 Not Found
Server: gunicorn/19.7.1
Date: Wed, 12 Jul 2017 15:24:36 GMT
Connection: close
Content-Type: application/json
Content-Length: 122
Access-Control-Allow-Origin: *
Access-Control-Max-Age: 2592000

{"error":{"errors":[{"domain":"geolocation","reason":"notFound","message":"Not found"}],"code":404,"message":"Not found"}}

Here's the JSON I post to the server: https://gist.github.com/Manicben/6546759a3897f7b986e0b3ed98607a5e

I use curl to post the JSON, which I've confirmed works with MLS (with radioType instead of radio, etc.). I've tried both syntax versions, with radio and radioType.

radio produces the 404 error (this one doesn't work with MLS). radioType produces a 500 Internal Server Error (this one works with MLS).

HTTP/1.1 100 Continue

HTTP/1.1 500 Internal Server Error
Connection: close
Content-Type: text/html
Content-Length: 141

<html>
<head>
<title>Internal Server Error</title>
</head>
<body>
<h1><p>Internal Server Error</p></h1>

</body>
</html>

Very odd, maybe the 500 error means that it's trying to work but something is failing? Since 500 is the generic error, I can't trace what is causing it. Are there logs somewhere that I can see and trace the error back?
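As an illustration of the radioType syntax mentioned above, here is a minimal sketch that builds a geolocate request body for a single cell tower. The gist contents are not reproduced here; the field names are those of the Google-style geolocate format that MLS documents, and the helper name and the input dict keys are hypothetical.

```python
import json


def build_geolocate_payload(towers):
    """Map short column-style keys to the field names of the geolocate API."""
    cell_towers = []
    for tower in towers:
        cell_towers.append({
            "radioType": tower["radio"],
            "mobileCountryCode": tower["mcc"],
            "mobileNetworkCode": tower["mnc"],
            "locationAreaCode": tower["lac"],
            "cellId": tower["cid"],
        })
    return {"cellTowers": cell_towers}


# The tower from the sample cell_gsm record above.
payload = build_geolocate_payload(
    [{"radio": "gsm", "mcc": 202, "mnc": 1, "lac": 1, "cid": 844}]
)
body = json.dumps(payload, sort_keys=True)
print(body)
```

The printed JSON can be saved to a file and posted with curl (`-H "Content-Type: application/json" -d @request.json`) to the instance's /v1/geolocate?key=... endpoint, where the host and key depend on the installation.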

Thanks, Ben

Hanno Schlichting

Jul 12, 2017, 5:54:38 PM
to Discussion about Geo location services
On 12. Jul 2017, at 17:49, Ben Withers <manic...@gmail.com> wrote:
> I displayed the binary cellid column in hex to be more human readable. It follows the structure defined in ichnaea/models/cell.py, in which it represents the 5 columns after it.

That cellid looks good to me. I've run the five cell values through encode_cellid and binascii.hexlify locally and it produces the same result.

> Max and min lat and lon I set to the lat and lon, since I don't know how those are set.

In the old import code those were calculated as a bounding box around a circle, with the cell lat/lon as the center point and the cell radius as the radius of that circle.
That looks good as well.
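The bounding-box idea can be sketched roughly as follows. This is not the code from the old importer or from geocalc.pyx; it is a simple spherical approximation, and it assumes the radius column is in meters. The function name and the Earth radius constant are choices made for this example.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, an approximation


def bounding_box_sketch(lat, lon, radius_m):
    """Return (max_lat, min_lat, max_lon, min_lon) around a circle.

    A radius of zero (or less) collapses the box to the point itself.
    """
    if radius_m <= 0:
        return (lat, lat, lon, lon)
    # Angular size of the radius along a meridian, in degrees.
    dlat = math.degrees(radius_m / EARTH_RADIUS_M)
    # A degree of longitude shrinks with cos(latitude); guard against the poles.
    cos_lat = max(math.cos(math.radians(lat)), 1e-12)
    dlon = math.degrees(radius_m / (EARTH_RADIUS_M * cos_lat))
    return (
        min(lat + dlat, 90.0),
        max(lat - dlat, -90.0),
        min(lon + dlon, 180.0),
        max(lon - dlon, -180.0),
    )


# Center and radius taken from the sample record quoted earlier in the thread.
max_lat, min_lat, max_lon, min_lon = bounding_box_sketch(23.745807, 37.961832, 313)
print(max_lat, min_lat, max_lon, min_lon)
```

For a 313 m radius the latitude offset is about 0.0028 degrees, so the box is small but no longer degenerate.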

> radioType produces a 500 Internal Server Error (this one works with MLS).
>
> Very odd, maybe the 500 error means that it's trying to work but something is failing? Since 500 is the generic error, I can't trace what is causing it. Are there logs somewhere that I can see and trace the error back?

A 500 error is definitely suspicious. You shouldn't be able to get one of those with weirdly formatted POST data. Either there is an application code bug or there is something wrong with the installation. It should even return a 404 for things like a missing database backend connection. So getting a 500 should actually be rather hard.

For debugging, the most useful thing to do is to set up and configure Sentry. Either a self-hosted instance or an account from their sentry.io cloud offering should do. The application expects the Sentry DSN (project id + secret token) in an environment variable (https://mozilla.github.io/ichnaea/install/config.html#sentry).

Any kind of Python exceptions should be captured and forwarded to Sentry, which has full tracebacks and captures all local stack variables. There isn't really any local logging available.

Hope this helps,
Hanno

Ben Withers

Jul 17, 2017, 11:10:07 AM
to mozilla-dev...@lists.mozilla.org
I have managed to get everything working except for fallbacks.

For anyone else struggling, here's what I did:

1 - Follow everything at https://mozilla.github.io/ichnaea/install/debug.html (this will obviously wipe your database, but it was worth it for me). Everything worked as expected, and the Stumbler app my company previously modified from MozStumbler also worked with the instance. All good.

2 - Reupload GSM cell tower data. I have some custom Python scripts to do this. I may share them at some point. However, simply loading what the CSV has into the DB is not enough to get it working.

3 - Add more to the records. I proceeded to fill in more of the blank columns. I set the following:
a) max_lon, min_lon, max_lat, min_lat -> used geocalc.pyx to determine these from the radius and lat, lon. If radius was 0, set to lat,lon.
b) last_seen -> I set this one just in case to the date in the modified column.
c) samples -> set all to 1, just in case.
Those were the additional columns I edited and now the instance works as expected and very close to MLS in terms of output (the data is based on MLS after all, but prob an older version. I'll look into merging the current MLS data in). All of those columns may not be needed, but that's what I tried.

Now my only issue is fallbacks, which I assume can be enabled by editing the api_key table. Other than setting allow_fallback to 1, what else should be set (e.g. fallback_name, fallback_url, etc.)?

And just out of interest, does Ichnaea handle timing advance parameters well? As in, does it make a significant improvement to include them in requests in addition to signal strength?

Thanks, Ben

Hanno Schlichting

Jul 25, 2017, 8:57:29 AM
to Discussion about Geo location services
On 17. Jul 2017, at 17:09, Ben Withers <manic...@gmail.com> wrote:
> Now my only issue is fallbacks, which I assume can be enabled by editing the api_key table. Other than setting allow_fallback to 1, what else should be set (e.g. fallback_name, fallback_url, etc.)?

There is some help text available on the database model, at https://github.com/mozilla/ichnaea/blob/master/ichnaea/models/api.py#L49

Basically a fallback is a way to tell one ichnaea instance to call out and get help from a different location service. This is only done for API keys that have allow_fallback = 1 set, and only if the ichnaea instance itself doesn't find a position with a "good enough" quality score. In other words, it happens when the instance finds no position at all, or only one of dubious trustworthiness based on its own data.

The score is based on things like sample count, number of times the network has moved (blocklist), age of the data (modified) and also time difference between creation date and modification date. This accounts for networks which were seen dozens or hundreds of times, but only on a single day.

The fallback columns could be set to something like:

fallback_name: mozilla
fallback_url: https://location.services.mozilla.com/v1/geolocate?key=some_key
fallback_ratelimit: 10
fallback_ratelimit_interval: 60
fallback_cache_expire: 86400

The cache expire setting would cache the response values of the fallback service in the local Redis instance for a day (86400 seconds). This avoids repeated calls to the external service for the same queries.

The rate limit settings together define how many requests may be sent to the external service. It's a "number" per "time interval" combination, so in the above example 10 requests per 60 seconds.

The rate limit is tracked per fallback_name and the cache is global and shared across all fallbacks. So if one configures multiple API keys with all the same fallback values, they share the same information.
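To make the "number per time interval" idea concrete, here is a small in-memory sketch of a rate limit keyed by fallback_name. It only illustrates the behavior described above. Ichnaea keeps this state in Redis, and its exact algorithm is not shown in this thread, so the fixed-window approach, the class name and the injectable clock are all assumptions made for this example.

```python
import time


class FixedWindowRateLimiter:
    """Allow at most `limit` calls per `interval` seconds for each name."""

    def __init__(self, limit, interval, clock=time.time):
        self.limit = limit
        self.interval = interval
        self.clock = clock
        self.counts = {}  # (name, window index) -> calls made in that window

    def allow(self, name):
        """Return True and count the call if the name is still under its limit."""
        window = int(self.clock() // self.interval)
        key = (name, window)
        count = self.counts.get(key, 0)
        if count >= self.limit:
            return False
        self.counts[key] = count + 1
        return True


# Mirror fallback_ratelimit = 10 and fallback_ratelimit_interval = 60,
# with a fake clock so the example is deterministic.
now = [0.0]
limiter = FixedWindowRateLimiter(limit=10, interval=60, clock=lambda: now[0])

# Two API keys that share fallback_name "mozilla" draw from the same budget.
results = [limiter.allow("mozilla") for _ in range(11)]
print(results.count(True), results[-1])  # 10 False

now[0] = 60.0  # the next 60-second window starts
allowed_again = limiter.allow("mozilla")
print(allowed_again)  # True
```

The sketch never prunes counters from old windows, which a long-running service would need to do (Redis key expiry is one common way).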

API compatible services are other ichnaea instances (like MLS), Combain's location service and even Google's. Not all of these allow local caching of results and they have different rate limits according to the contracts one could have with them.

> And just out of interest, does Ichnaea handle timing advance parameters well? As in, does it make a significant improvement to include them in requests in addition to signal strength?

Currently ichnaea ignores timing advance data. I think it's accepted in the public APIs and sent through the analysis pipeline, but simply ignored in there. When we did most of the development of the positioning algorithms, timing advance data was not actually gathered and sent to us by any of the phones out there. For GSM it wasn't available in the Android APIs (only added recently in API level 26, https://developer.android.com/reference/android/telephony/CellSignalStrengthGsm.html#getTimingAdvance()). For LTE it was in theory available, but almost no phones actually sent us that data. I think most of the time the cell modems and firmware simply didn't report this data up to the Android OS layer.

Hanno