Hosting data on DoltHub


Oscar Batori

May 27, 2020, 3:37:39 AM
to opensport
Folks,

I am a passionate football fan, and I also happen to work on a product called DoltHub. Briefly, Dolt is an open-source database with Git-like features. It's as if MySQL and Git had a baby: it provides a Git-like experience for tables of data. DoltHub is a collaboration platform for Dolt data. Dolt is free, and always will be. DoltHub is free for open data.

I believe that Dolt and DoltHub provide an extremely elegant distribution model, very similar to Git, but for a relational database. Furthermore, when you clone a Dolt repo, you get a functioning SQL database, so you can start doing analysis right away. Dolt can run a MySQL-compatible server, so it connects to the usual Pandas toolchain. Our CEO wrote a blog post about distributing data using Dolt and DoltHub, and we have an extensive documentation site.

I notice that there are many different files for different leagues and different years. I believe this project calls for a unified database, with separate tables for leagues whose data have different schemas, and possibly views on top to present the data uniformly. Cloning Dolt data is easy:
$ dolt clone openfootball/club-football

Within such a repo we can imagine having a table for each league or country. Members of the community can open pull requests on DoltHub suggesting useful views. DoltHub lets the maintainers of OpenFootball keep total control of the repo by setting permissions appropriately.
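The tables-per-league-plus-views idea can be sketched as follows. This is a minimal illustration, not an existing OpenFootball or DoltHub schema: it uses Python's built-in sqlite3 module as a stand-in for Dolt's MySQL-compatible SQL engine, and every table name, column name, and row in it (premier_league, bundesliga, all_matches, "Team A", etc.) is hypothetical. Two league tables with different column names are mapped onto one common shape by a `CREATE VIEW ... UNION ALL` statement, which is plain SQL and should carry over to a MySQL dialect, though that is an assumption here.

```python
import sqlite3

# Hypothetical per-league tables: each league keeps its own schema.
LEAGUE_DDL = [
    "CREATE TABLE premier_league ("
    " season TEXT, match_date TEXT, home TEXT, away TEXT,"
    " home_goals INTEGER, away_goals INTEGER)",
    # A second (hypothetical) league whose source uses different column names.
    "CREATE TABLE bundesliga ("
    " saison TEXT, datum TEXT, heim TEXT, gast TEXT,"
    " tore_heim INTEGER, tore_gast INTEGER)",
]

# A view mapping both schemas onto one set of column names.
# In a UNION ALL, the result column names come from the first SELECT.
UNIFIED_VIEW = """
CREATE VIEW all_matches AS
    SELECT 'premier_league' AS league, season, match_date,
           home AS home_team, away AS away_team, home_goals, away_goals
      FROM premier_league
    UNION ALL
    SELECT 'bundesliga', saison, datum, heim, gast, tore_heim, tore_gast
      FROM bundesliga
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory demo database with placeholder rows."""
    conn = sqlite3.connect(":memory:")
    for ddl in LEAGUE_DDL:
        conn.execute(ddl)
    conn.execute(UNIFIED_VIEW)
    # Placeholder rows only; these are not real fixtures or results.
    conn.execute(
        "INSERT INTO premier_league VALUES (?, ?, ?, ?, ?, ?)",
        ("2019/20", "2019-08-01", "Team A", "Team B", 2, 1),
    )
    conn.execute(
        "INSERT INTO bundesliga VALUES (?, ?, ?, ?, ?, ?)",
        ("2019/20", "2019-08-02", "Team C", "Team D", 0, 0),
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    # Query both leagues through the single unified view.
    for row in conn.execute(
        "SELECT league, home_team, away_team, home_goals, away_goals "
        "FROM all_matches ORDER BY match_date"
    ):
        print(row)
```

The design choice is that each league table stays faithful to its source, while the view is the only place that knows how to translate between schemas. That makes the view a natural thing for community members to propose in a pull request.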

Let me know your thoughts; we are happy to put some hours into this on our side.

-Oscar

Gerald Bauer

May 27, 2020, 3:42:03 AM
to open...@googlegroups.com
Hello,
Thanks for your kind words and thanks for highlighting Dolt and
DoltHub. Great initiative.

Unfortunately, my time is limited for now, and the priority for the
next week is to get things back to normal after the format / setup change to v2.0.

Keep up your great work on Dolt and DoltHub. Cheers.

Oscar Batori

May 27, 2020, 8:19:18 PM
to open...@googlegroups.com
I can allocate time to this starting this weekend, and I am happy to do so. I have a good deal of experience designing data infrastructure, and I'd love to use this to make an amazing open football database. I will start with this one: https://github.com/openfootball/football.json.

I'd like to understand a bit more about the structure of your repos and your broader goals, so if you have time, it would be great to chat for a few minutes.




Gerald Bauer

May 28, 2020, 3:50:29 AM
to open...@googlegroups.com
Hello,
All datasets are dedicated to the public domain; that is, you're
free to serve yourself, remix, etc., without any requirement for
attribution.

To build a single-file SQLite database (e.g. sport.db) for
the top 4 leagues (as exported in football.json), create a
Datafile:

-----cut-----
football 'england'
football 'deutschland'
football 'espana'
football 'italy'
-----cut-----

and then run:
$ sportdb build

That will download the datasets to /tmp and build everything from
scratch.
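Once the build has produced the SQLite file, any SQLite client can inspect it. Below is a minimal sketch using Python's standard-library sqlite3 module, assuming the output file is named sport.db in the current directory. It deliberately makes no assumptions about the sportdb table layout: it only reads table names from SQLite's own sqlite_master catalog and counts the rows in each. The function name table_row_counts is hypothetical.

```python
import sqlite3
import sys
from typing import Dict


def table_row_counts(db_path: str) -> Dict[str, int]:
    """Return {table_name: row_count} for every user table in a SQLite file."""
    # Open read-only via a URI so a mistyped path fails instead of
    # silently creating a new, empty database file.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
            )
        ]
        counts: Dict[str, int] = {}
        for name in names:
            # Quote the identifier (doubling any embedded quotes) before
            # interpolating it, since table names cannot be bound as parameters.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        return counts
    finally:
        conn.close()


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "sport.db"
    for table, n in table_row_counts(path).items():
        print(f"{table}: {n}")
```

Pass a different path as the first argument to inspect one of the per-country databases instead (e.g. england.db).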
Alternatively, you can create one SQLite database per country, e.g.
england.db, italy.db, etc. See the quick-starter (Datafile) templates
for more examples [1]. Good luck. Let us know how it goes.
Cheers.

[1] https://github.com/openfootball/quick-starter