API for Pontoon

Staś Małolepszy

Jul 12, 2017, 1:15:24 PM

to tools...@lists.mozilla.org

I'd like to work on exposing Pontoon data via an API. The main driver
is the use case from bug 1302053:

- Stats for a locale: supported projects, status of each project.
- Stats for a project: supported locales, incomplete locales,
complete locales.

I researched using REST and GraphQL and wrote down my notes on the wiki:

https://wiki.mozilla.org/L10n:Pontoon/API

I'm still hesitating between REST and GraphQL (without Relay). The
former is familiar and well-established. The latter offers really nice
syntax and an amazing API explorer with built-in documentation. OTOH,
in case of some requests, using GraphQL naïvely may result in a lot of
queries being made to the DB.

In an effort to see if the problem of too many queries can be remedied
I implemented four simple entry points for Pontoon: projects, project,
locales and locale. Prefetch optimizations (Django's prefetch_related) are only added if
the query requires them. Cyclic queries are also explicitly forbidden
to prevent querying for projects of locales of projects etc. (I'm sure
the code I wrote for this can be generalized and made more robust, but
it's good enough for the purposes of the demo.)
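A minimal sketch of the two ideas above, not the code from the branch: it assumes the query's selection set has already been turned into nested dicts, and the helper names (field_paths, is_cyclic, needs_prefetch) are hypothetical.

```python
# Hypothetical sketch: a query's selection set is modelled as nested dicts,
# e.g. {"projects": {"name": {}, "locales": {"name": {}}}}.

# Field names that cross the relation between projects and locales.
RELATION_FIELDS = {"projects", "project", "locales", "locale"}


def field_paths(selection, prefix=()):
    """Yield every path of field names found in a nested selection."""
    for name, children in selection.items():
        path = prefix + (name,)
        yield path
        yield from field_paths(children, path)


def is_cyclic(selection):
    """Return True if any path visits the same kind of object twice."""
    for path in field_paths(selection):
        # "projects" and "project" count as the same kind of object.
        kinds = [name.rstrip("s") for name in path if name in RELATION_FIELDS]
        if len(kinds) != len(set(kinds)):
            return True
    return False


def needs_prefetch(selection, relation):
    """Return True only if the query actually asks for `relation`."""
    return any(relation in path for path in field_paths(selection))


flat = {"projects": {"name": {}, "locales": {"name": {}}}}
cyclic = {"projects": {"locales": {"projects": {"name": {}}}}}
```

With these inputs, is_cyclic(cyclic) is true and the query would be rejected, while needs_prefetch(flat, "locales") tells the resolver that a prefetch is worth adding.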

https://github.com/mozilla/pontoon/compare/master...stasm:graphql

The demo is read-only. The GUI editor is only available in DEV mode.
On production, the /graphql endpoint is available without
authentication, CSP, or CSRF protection.

This exercise nicely shows off the good things about GraphQL, too.
See the following 1.5-minute-long silent screencast of the GraphiQL
tool, which makes writing and debugging queries a pleasure:

https://drive.google.com/file/d/0B4XpFaGRPsjHai1nUXJRdWMya3c/view?usp=sharing

I'd love to open this up for discussion. Let me know if my notes
reflect your experience working with REST and GraphQL. What other
factors and considerations are important here? What would your
preferred way forward be?

Thanks,
Staś

Matjaz Horvat

Jul 12, 2017, 7:08:31 PM

to Staś Małolepszy, tools...@lists.mozilla.org

Stas,

Thanks so much for researching, implementing and documenting all of this!

I gave your patch a spin, which allowed me to use GraphQL for the first
time (yay!). I find it easy to use and also performant for the use case
from bug 1302053. The API explorer is indeed helpful. I like the
flexibility of the output GraphQL gives me.

Once he's back from vacation, I'd love to hear what flod - as the main
consumer of the bug 1302053 use case - thinks about these two approaches
and which one he would prefer.

BTW, the other day I came across this article and quickly scanned through
it:
https://philsturgeon.uk/api/2017/01/24/graphql-vs-rest-overview/

One thing worth pointing out is "You can definitely use both [REST and
GraphQL] at the same time". I understand that's not ideal, but it's a
useful reminder that even if we choose X over Y and sometime in the future
Y turns out to be much more suitable for use case A, it's not like we have
to rewrite the whole thing.

-Matjaž


Axel Hecht

Jul 13, 2017, 5:06:44 AM

to mozilla-t...@lists.mozilla.org

On 12.07.17 at 19:14, Staś Małolepszy wrote:

I also want to find some time to play with your branch, both in general
and in light of questions like "can I get suggestions that are older
than a week".

I also think that traditional REST as an impl of CRUD isn't going to
work for most of what we do. Things just scale too badly.

Which leaves us with inventing a query language, or using GraphQL. Using
something with a spec is nice. :-)

Which led me to investigate a bit what the libraries look like in
Python land. I ended up on
https://github.com/graphql-python/graphql-core, which implements quite a
bit. I'm not sure yet which functionality graphene provides on top of it
in the stack. What concerns me a bit is the following from the README:

> This library is a port of graphql-js to Python and currently is
> up-to-date with release 0.6.0.

Which is 18 releases in the past, and well over a year.

Which leads me to the question of whether the Python stack is actually
something we should bet on.

Axel

Francesco Lodolo [:flod]

Jul 17, 2017, 5:12:39 AM

to tools...@lists.mozilla.org

I'm a bit lost, since I've never seen GraphQL before and I've only spent
a short amount of time looking at documentation.

The first impression is that it offers a lot of flexibility, probably
more than we need. If it doesn't come at a cost (e.g. code complexity,
dependencies, etc.) I guess it won't hurt. Let me know if you deploy
that branch somewhere, I'll probably play with the editor (I don't have
Pontoon installed locally, but I might at some point).

As long as the API can also be accessed via dumb GET requests[1], I'm
fine with any solution you adopt.
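As a rough illustration of such a plain GET request: the GraphQL-over-HTTP convention referenced in [1] carries the query in a "query" URL parameter. The sketch below only builds the URL. The endpoint address is an assumption, and whether a given deployment accepts GET depends on the server.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed endpoint; adjust to wherever the branch is deployed.
ENDPOINT = "http://localhost:8000/graphql"


def graphql_get_url(query, endpoint=ENDPOINT):
    """Return a URL that carries the GraphQL query in the `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})


def query_from_url(url):
    """Recover the GraphQL query from such a URL (handy for checking)."""
    return parse_qs(urlsplit(url).query)["query"][0]


url = graphql_get_url("{ projects { name } }")
```

The resulting URL can be fetched with any HTTP client, and query_from_url(url) returns the original query text.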

Francesco

[1]
http://graphql.org/learn/serving-over-http/#http-methods-headers-and-body


Jarek Śmiejczak

Jul 23, 2017, 5:07:20 PM

to Francesco Lodolo [:flod], tools...@lists.mozilla.org

I think it would be easier to create some kind of bridge between REST and
GraphQL. In my interpretation, REST offers a subset of GraphQL's capabilities.
We could build the core GraphQL backend first and then add a REST backend that
only executes simplified queries against the GraphQL backend.
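One way such a bridge could look, as a sketch only: each REST path is translated into a fixed GraphQL query string, which would then be handed to the GraphQL backend. The REST routes and the rest_to_graphql helper are hypothetical. The field names and arguments (project(slug), locale(code)) follow the queries discussed elsewhere in this thread.

```python
import json
import re

# Hypothetical REST routes, each paired with a builder for the GraphQL query.
ROUTES = [
    (re.compile(r"^/projects/?$"),
     lambda m: "{ projects { name } }"),
    (re.compile(r"^/projects/([\w-]+)/?$"),
     lambda m: "{ project(slug: %s) { name } }" % json.dumps(m.group(1))),
    (re.compile(r"^/locales/?$"),
     lambda m: "{ locales { name } }"),
    (re.compile(r"^/locales/([\w-]+)/?$"),
     lambda m: "{ locale(code: %s) { name } }" % json.dumps(m.group(1))),
]


def rest_to_graphql(path):
    """Translate a REST path into a GraphQL query, or None if no route matches."""
    for pattern, build in ROUTES:
        match = pattern.match(path)
        if match:
            return build(match)
    return None


query = rest_to_graphql("/projects/amo")
```

The REST layer stays thin this way: it never touches the database itself and only exposes the subset of queries listed in ROUTES.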


--
Kind regards

Jarek "jotes" Śmiejczak,
Python programmer by day, mozillian by heart.
Homepage: https://evilb.it, Github: http://github.com/jotes
Mobile: +48693027040, Mozillian: http://mozillians.org/u/jotes

Staś Małolepszy

Aug 28, 2017, 11:57:32 AM

to mozilla-t...@lists.mozilla.org

flod pointed out a number of troubles he ran into when testing my branch. Here's the summary of the fixes:

I added Project.slug and also changed the arguments to the project(slug) and locale(code) queries.

For getting a JSON response something like this should work:

curl -X POST -d 'query={ project(slug: "amo") { name } }' \
    http://localhost:8000/graphql
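For scripts, roughly the same request can be prepared with Python's standard library. This is a sketch under the same assumptions as the curl command (a local server on port 8000). It only builds the request object, and the commented lines show how it could be sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://localhost:8000/graphql"  # same local address as the curl example
QUERY = '{ project(slug: "amo") { name } }'


def graphql_post_request(query, endpoint=ENDPOINT):
    """Build a form-encoded POST request carrying the query, like `curl -d`."""
    body = urlencode({"query": query}).encode("utf-8")
    return Request(endpoint, data=body, method="POST")


request = graphql_post_request(QUERY)

# To send it against a running server:
#   import json
#   from urllib.request import urlopen
#   with urlopen(request) as response:
#       result = json.load(response)
```

Unlike the raw curl body, urlencode percent-encodes the braces, quotes and spaces in the query.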

I removed pagination for now to make it easier to support the main
goal of this API experiment which is programmatically getting data out
of Pontoon. If we need pagination in the future we can add new fields
like project_pages and locale_pages.

Pagination helps prevent getting flooded with data from the API. I
guess that's useful when building UIs, but not very useful when doing
data mining. It also helps prevent DoS attacks, but I already
mitigated that to a large extent by doing introspection on the query
and forbidding cyclic relations.

I renamed Project.locales to Project.localizations and
Locale.projects to Locale.localizations to make it clearer that those are the "ProjectLocale" objects in Pontoon's DB.

I also added missingStrings where possible. missingStrings are not stored in the DB, so I had to add them explicitly by computing them:

https://github.com/stasm/pontoon/commit/f16c1eeb94b65d9f9fff1c112bf6f424983fd5d2
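The idea of a computed field can be sketched as follows. This is not the code from the commit above: the Stats class and its counter names are assumptions made for illustration, and the linked commit is the reference for which counters Pontoon actually stores.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    """Hypothetical per-project or per-locale counters (names are assumed)."""
    total_strings: int
    approved_strings: int
    translated_strings: int
    fuzzy_strings: int

    @property
    def missing_strings(self):
        # Not stored anywhere: derived from the stored counters on access.
        return (self.total_strings
                - self.approved_strings
                - self.translated_strings
                - self.fuzzy_strings)


stats = Stats(total_strings=100, approved_strings=60,
              translated_strings=10, fuzzy_strings=5)
```

A resolver for missingStrings would then simply return stats.missing_strings.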

The query structure was rather verbose with a lot of intermediate "items" objects. The goal of those was to make it possible to provide meta-information about the results, like the total count of items (which isn't an item on its own). I removed "items" for now and flattened the query structure. The main use-case for now is the data itself rather than the counts.

We could also establish a convention in the future that for any Foo, "foos" is a simple list of Foos and foo_pages is a paginated result with meta-information.
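A small sketch of what that convention could mean in practice: the plain list is returned as-is, while the paginated variant wraps a slice of it together with meta-information. The Page type, its field names and the paginate helper are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Page:
    """Hypothetical shape of a `foo_pages` result: a slice plus meta-information."""
    items: List[str]
    total_count: int
    has_next: bool


def paginate(items, page, per_page):
    """Return the 1-based `page` of `items` along with the overall count."""
    start = (page - 1) * per_page
    end = start + per_page
    return Page(items=items[start:end],
                total_count=len(items),
                has_next=end < len(items))


foos = ["a", "b", "c"]          # what a simple `foos` field would return
first = paginate(foos, page=1, per_page=2)
last = paginate(foos, page=2, per_page=2)
```

Keeping the two entry points separate means data-mining clients never have to unwrap an "items" layer they do not need.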

To illustrate the changes, here are a few example queries that you can try running on my branch:

query sumo {
  project(slug: "sumo") {
    missingStrings,
    localizations {
      locale {
        name
      }
      missingStrings
    }
  }
}

query french {
  locale(code: "fr") {
    missingStrings,
    localizations {
      project {
        name
      }
      missingStrings
    }
  }
}

query amo {
  project(slug: "amo") {
    name
  }
}

query allProjects {
  projects {
    name
    localizations {
      locale {
        name
      }
    }
  }
}
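A sketch of how a script might consume the result of the "sumo" query to split locales into complete and incomplete ones. The response below is made-up sample data in the standard GraphQL response shape (a top-level "data" key mirroring the query), and split_by_completion is a hypothetical helper.

```python
def split_by_completion(response):
    """Split a project's localizations into complete and incomplete locale names."""
    localizations = response["data"]["project"]["localizations"]
    complete = [item["locale"]["name"] for item in localizations
                if item["missingStrings"] == 0]
    incomplete = [item["locale"]["name"] for item in localizations
                  if item["missingStrings"] > 0]
    return complete, incomplete


# Made-up sample response; real numbers come from the server.
sample = {
    "data": {
        "project": {
            "missingStrings": 12,
            "localizations": [
                {"locale": {"name": "French"}, "missingStrings": 0},
                {"locale": {"name": "Polish"}, "missingStrings": 12},
            ],
        }
    }
}

complete, incomplete = split_by_completion(sample)
```

The same approach works for the "french" query by reading response["data"]["locale"] and item["project"]["name"] instead.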

Also interesting to note is that last night a big PR was merged to graphene and it looks like version 2.0 is in the works:

https://github.com/graphql-python/graphene/pull/500

This might answer some of our worries about graphene not being maintained. And introduce new ones about it not being stable enough :)

Staś

Matjaz Horvat

Aug 28, 2017, 4:21:36 PM

to Staś Małolepszy, mozilla-t...@lists.mozilla.org

Nice work, Staś!

I'm playing with the API and like it a lot.

What are the next steps here?

There's already a new possible use case around the corner:
https://addons.mozilla.org/sl/firefox/addon/pontoon-tools/
https://github.com/MikkCZ/pontoon-tools/

-Matjaž

Staś Małolepszy

Aug 29, 2017, 4:58:58 AM

to Matjaz Horvat, mozilla-t...@lists.mozilla.org

Thanks, Matjaž!

> What are the next steps here?

First of all we need to decide if GraphQL is what we want.

My take-aways from the discussion so far are:

1 The alternative to GraphQL is RPC, not REST.
2 RPC requires that we come up with some sort of querying language.
3 GraphQL is a query language and has a spec, which is nice.
4 graphql-python and graphene are not as well maintained as their
JavaScript counterparts.

In one of our meetings we considered a number of paths forward which
would mitigate the risk outlined in take-away #4.

1. Use GraphQL to parse the query; write own RPC backend.

The full GraphQL-in-Django stack includes:

1 A developer-built schema defining resolvers for data.
2 A layer allowing the resolvers to use Django models.
3 A query executor which uses the schema.
4 A query parser.

One of the biggest advantages for us is that GraphQL has a well-defined
spec. We could use only the query parser and translate queries into
calls to a custom RPC backend. This would allow us to skip layers 1-3 of
the stack.

2. Use the JS implementation of graphql and set up a separate server
talking to the DB.

GraphQL.js is well maintained and is the reference implementation.

This approach would require extra work to ensure our GraphQL shapes
don't break when Django models change. We'd need to assess how frequent
DB migrations are in Pontoon and consider the cost of setting up systems
which validate or even auto-generate the GraphQL schema from Django
models.

Approaches 1-2 require additional work before we have a working API. If
it then turns out that the API doesn't work well or that GraphQL is not
the right solution after all, this additional work will be for nothing.

The reasons for trying GraphQL are strong and my recommendation would be
to give it a spin in its current form. Let's call it approach #3:

3. Use graphene and graphene_django as they are

They both seem sufficient for the purpose of the experiment that I ran
so far. The most buggy part seemed to be the Python implementation of
Relay, which I don't think is a good match for Pontoon anyway.

As I mentioned in my previous email there has been recent activity in
Graphene and it looks like version 2.0 is being worked on. If we have
concerns about the codebase, it would be nice to take them up with the
authors and help fix them.

I have a WIP PR which I'd like to bring to a state in which it can be
reviewed.

https://github.com/mozilla/pontoon/pull/630

> There's already a new possible use case around the corner:
> https://addons.mozilla.org/sl/firefox/addon/pontoon-tools/
> https://github.com/MikkCZ/pontoon-tools/

Let's keep track of use-cases on the wiki. I updated it to make room for
the roadmap:

https://wiki.mozilla.org/L10n:Pontoon/API

I think it's important to start small and then expand. The main driver
for the time being for me is to allow access to the following data:

- Stats for a locale: supported projects, status of each project.

- Stats for a project: supported, incomplete, complete locales.

That's why I decided to remove pagination in this iteration. I still
think we should add it as another entry point because that's what we're
most likely to need to prototype the new UI which fetches data from the
API. Once we have a better understanding of the prototype we want to
build we'll be able to design the API more precisely.

Staś

Matjaz Horvat

Aug 29, 2017, 11:16:19 AM

to Staś Małolepszy, mozilla-t...@lists.mozilla.org

To update the list: we just decided to move forward with approach #3.

We'll finalize, review and deploy the current patch and see how it
behaves with production data.

We'll keep track of the future use cases and milestones (including
switching to Graphene 2.0) on the wiki:
https://wiki.mozilla.org/L10n:Pontoon/API

-Matjaž


Staś Małolepszy

Oct 10, 2017, 12:35:51 PM

to mozilla-t...@lists.mozilla.org

Another update: the first iteration of the API landed at the end of September and has been enabled on the production server!

I wrote a blog post with more details and examples:

https://blog.mozilla.org/l10n/2017/10/10/exposing-pontoon-data-through-api/

We’d like to keep the development of the API use-case-driven. If you’re interested in creating a report or an extension which pulls data from Pontoon please let us know about your use-case! We’d like to prioritize the upcoming features to best serve the needs of the community.

For more information about the planning process consult the wiki:

https://wiki.mozilla.org/L10n:Pontoon/API

-Staś