seek help for Sakai database structure

Zhang, Ruiling

unread,

Oct 5, 2017, 10:46:04 AM10/5/17

to saka...@apereo.org

Hello All,

Any one could help me out to point where I could get detail information about sakai database schema? Thanks!

Ruiling

Web application developer

Virginia Tech

Diego del Blanco (Unicon)

unread,

Oct 5, 2017, 11:35:09 AM10/5/17

to Zhang, Ruiling, saka...@apereo.org

Hi Ruiling:

AFAIK there is not any documentation about the database schema itself (unless there is something new).

If you want to view a more graphic representation you surely can use some of the tools included in several db managers. But it is huge, and still with that, I wouldn't trust about being there all the relationships between tables.

Do you have any specific question about the schema?

Regards

Diego del Blanco Orobitg

Sakai Tech Lead (UNICON)

Apereo LATAM Representative (Apereo Foundation)

Ph: +1-480-521-8527

Em: diegodelbl...@gmail.com

--
You received this message because you are subscribed to the Google Groups "Sakai Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sakai-dev+unsubscribe@apereo.org.
To post to this group, send email to saka...@apereo.org.
Visit this group at https://groups.google.com/a/apereo.org/group/sakai-dev/.

Reinier Post

unread,

Oct 18, 2017, 7:44:27 AM10/18/17

to saka...@apereo.org

I'd just like to add that we've spent many hours on this exact question.

In running Sakai we find that for many questions, the answer is provided by

crafting and executing a suitable SQL query against the database,

which requires sufficient understanding of the schema.

Reinier Post

TU Eindhoven

Gregory Guthrie

unread,

Oct 18, 2017, 12:24:37 PM10/18/17

to Sakai Development, rlz...@exchange.vt.edu

And one thing that has made it hard for us, is that there are many fields which are used as foreign keys, but not documented (tagged) as such in the tables. Thus the semantics and referrent of some mysterious long field value, requires a lot of detective work. Maybe that has changed by now, this was a few versions ago...

Charles Severance

unread,

Oct 18, 2017, 4:28:30 PM10/18/17

to Gregory Guthrie, dev sakai, rlz...@exchange.vt.edu

Part of the reason for the lack of fully modeled foreign keys, etc is the need to be portable across multiple databases.

/Chuck

--
You received this message because you are subscribed to the Google Groups "Sakai Development" group.

To unsubscribe from this group and stop receiving emails from it, send an email to sakai-dev+...@apereo.org.

Earle Nietzel

unread,

Oct 18, 2017, 4:44:06 PM10/18/17

to Zhang, Ruiling, saka...@apereo.org

Attached is a current graph of master's database.

I used DBVis https://www.dbvis.com/ to generate the image.

--

Steve Swinsburg

unread,

Oct 18, 2017, 5:07:38 PM10/18/17

to Severance Charles, rlz...@exchange.vt.edu, sakai-dev, Gregory Guthrie

I'm curious about't what this statement actually means. We only support two databases and both can have FKs so I don't believe that would be an issue. Perhaps it is the Sakai SqlService that is the culprit?

I've always believed it was more down to lack of db experience and then handling the referential integrity situation in code instead.

On 19 Oct. 2017 07:28, "Charles Severance" <cs...@umich.edu> wrote:

Part of the reason for the lack of fully modeled foreign keys, etc is the need to be portable across multiple databases.

/Chuck

On Oct 18, 2017, at 9:24 AM, Gregory Guthrie <grgu...@gmail.com> wrote:

And one thing that has made it hard for us, is that there are many fields which are used as foreign keys, but not documented (tagged) as such in the tables. Thus the semantics and referrent of some mysterious long field value, requires a lot of detective work. Maybe that has changed by now, this was a few versions ago...

On Thursday, October 5, 2017 at 9:46:04 AM UTC-5, Zhang, Ruiling wrote:
Hello All,

Any one could help me out to point where I could get detail information about sakai database schema? Thanks!

Ruiling
Web application developer
Virginia Tech

--
You received this message because you are subscribed to the Google Groups "Sakai Development" group.

To unsubscribe from this group and stop receiving emails from it, send an email to sakai-dev+unsubscribe@apereo.org.

To post to this group, send email to saka...@apereo.org.
Visit this group at https://groups.google.com/a/apereo.org/group/sakai-dev/.

--
You received this message because you are subscribed to the Google Groups "Sakai Development" group.

To unsubscribe from this group and stop receiving emails from it, send an email to sakai-dev+unsubscribe@apereo.org.

Hendrik Steller

unread,

Oct 19, 2017, 10:51:17 AM10/19/17

to saka...@apereo.org

On Mittwoch, 18. Oktober 2017 23:07:35 CEST Steve Swinsburg wrote:
> I'm curious about't what this statement actually means. We only support two
> databases and both can have FKs so I don't believe that would be an issue.
> Perhaps it is the Sakai SqlService that is the culprit?
>
> I've always believed it was more down to lack of db experience and then
> handling the referential integrity situation in code instead.
>

I optimistically assumed that it might have had something to do with the
modularity of the software or the development process.
Like, the term_eid might be a site property instead of a column in the site
table with a FK into the academicsession table because the academicsession
belongs to the course management API, which might only have appeared later as
a contrib project and thus at first it couldn't be assumend that the cm api
and its tables would be always present.

Or that it would make it easier to swap out components/services.
If I were in marketing I might be able to make up some bollocks and sell this
as a fashionable microservices thing: how each (micro)service needs his own,
independent data storage and how you lose flexibility and options for
horizontal scaling if your modular services are coupled together again on the
persistence tier. Something in that directions, but with more buzzwords, of
course. :-)

On a related note: what I haven't understood is why a few things like
calendar are storing XML in database fields so that you can't query the data
without walking over the entire database and parse the XML.
For example, I had to do a lot of caching to make a performancewise usable
tool which generates a filterable view of all exams in the system so that
instructors can schedule their exams without collisions (-> https://
kvv.imp.fu-berlin.de/x/LK0e6N ).
Maybe there was an idea for something or a requirement I don know about
behind this, like exchanging data with other calendar apps by doing XML
transformations. Or maybe it was just because XML was the hot thing to do at
that time.

Hendrik

Stephen Marquard

unread,

Oct 19, 2017, 10:55:12 AM10/19/17

to Hendrik Steller, saka...@apereo.org

On the XML blobs, XML was the hot thing to do at that time (around 2004). A by-now universally regretted design decision.

Some of the reasons you suggest for lack of cross-table FK relationships are also true though (modularity, independent APIs so different services own their own tables and relations).

Regards

Stephen

---
Stephen Marquard, Learning Technologies Co-ordinator,
Centre for Innovation in Learning and Teaching (CILT)
University of Cape Town
http://www.cilt.uct.ac.za
stephen....@uct.ac.za
Phone: +27-21-650-5037 Cell: +27-83-500-5290

--
You received this message because you are subscribed to the Google Groups "Sakai Development" group.

To unsubscribe from this group and stop receiving emails from it, send an email to sakai-dev+...@apereo.org.

To post to this group, send email to saka...@apereo.org.
Visit this group at https://groups.google.com/a/apereo.org/group/sakai-dev/.

Disclaimer - University of Cape Town This e-mail is subject to UCT policies and e-mail disclaimer published on our website at http://www.uct.ac.za/about/policies/emaildisclaimer/ or obtainable from +27 21 650 9111. If this e-mail is not related to the business of UCT, it is sent by the sender in an individual capacity. Please report security incidents or abuse via cs...@uct.ac.za

Matthew Jones

unread,

Oct 19, 2017, 11:08:02 AM10/19/17

to Hendrik Steller, saka...@apereo.org

Yeah, the XML was just part of the initial design. I wasn't around for this then but I'm guessing it's kind of because they wanted to be rapid and have quick schema changes, kind of like a NoSQL model. Hibernate was pretty new then and just starting to be used by some tools, and probably would have solved some of these problems better. It still is kind a "nice to have project" to replace the SqlService with hibernate. There was some attempts to optimize some of the XML (pull out the columns that needed indexes or searches) but this was just ad-hoc per-project.

In Sakai 12 Earle has been working on changing the assignment structure from XML to relational tables but this takes some time for every tool since it's so integrated into the service. Assignment was just the biggest one with the most issues.

A NoSQL database that did this document model probably would have pretty good for Sakai. But things like MongoDB weren't created for a few years until 2009. And the XML style that this used was just a way to replicate that (poorly) in a relational database. [1]

I always through the lack of FK's made it really nice for having a "soft delete", if we had FK's everywhere we'd have to hard cascade delete when things were deleted that linked across keys. This would have resulted in really deleting a lot less and needing a better soft delete (and hard delete) than what we've got now.

[1] https://www.mongodb.com/blog/post/why-xml-in-rdbms-is-not-mongodb

--
You received this message because you are subscribed to the Google Groups "Sakai Development" group.

To unsubscribe from this group and stop receiving emails from it, send an email to sakai-dev+...@apereo.org.

Charles Severance

unread,

Oct 19, 2017, 12:02:15 PM10/19/17

to Steve Swinsburg, rlz...@exchange.vt.edu, dev sakai, Gregory Guthrie

On Oct 18, 2017, at 2:07 PM, Steve Swinsburg <steve.s...@gmail.com> wrote:

I'm curious about't what this statement actually means. We only support two databases and both can have FKs so I don't believe that would be an issue. Perhaps it is the Sakai SqlService that is the culprit?

I've always believed it was more down to lack of db experience and then handling the referential integrity situation in code instead.

Steve,

What you say is true as well. In the beginning (2001) there was a lack of db expertise and a lack of cross-db expertise, and in that situation it was difficult enough to just get the table creation and the basic SQL working across all the databases. And I won’t even mention the flat-file serialization option for the DB models - NoSQL 15 years before it was cool. :)

The most frustrating thing to me looking at the initial DB tables and data model was the lack of integer primary keys in some tables. This was because we were at an Oracle school and the “experts” told us “don’t worry join on GUIDs - Oracle makes it all work magically”. One of many reasons that I dislike Oracle :)

We were not skilled enough (and badly advised by experts) to figure out how to make integer foreign keys across the board for multiple databases :( Also a bad decision that computing a “globally unique” UUID in the server was an OK thing to do instead of making an auto-increment field.

And (I think) I have been told that MySQL has some magic that makes GUIDs as foreign keys not too horrible. But to don’t like to trust magic personally. Also - Innodb was not used as universally in 2001/2002 as it is now. Remember ISAM :)

There are a lot of reasons - not much good to assign the blame 15 years later. BTW I am not defending myself here - the core Sakai database tables were laid down in 2001/2002 before I was associated with the project. I think that if you look at DB applications built in 2001 my guess is that they would make a few DB decisions that looked silly with 17 years of hind sight.

Of course with what we know now, with enough effort you definitely can model foreign keys across multiple databases - and we could do a retrofit to change the create statements and build a bunch of alter statements to make it happen in a post InnoDB world - if there was interest that would support the cost.

/Chuck

P.S. Given how much *I* have learned from this painful experience, Tsugi uses one database, integer primary keys across the board and fully modeled foreign keys form the core tables down to every table created by every tool - it is 100% connected. You can delete all evidence a tenant in a Tsugi instance with a single delete statement. :) It is nice to live in a post-InnoDB world.

Charles Severance

unread,

Oct 19, 2017, 12:06:01 PM10/19/17

to Hendrik Steller, dev sakai

On Oct 19, 2017, at 7:50 AM, Hendrik Steller <sa...@stellers.net> wrote:

On a related note: what I haven't understood is why a few things like
calendar are storing XML in database fields so that you can't query the data
without walking over the entire database and parse the XML.
For example, I had to do a lot of caching to make a performancewise usable
tool which generates a filterable view of all exams in the system so that
instructors can schedule their exams without collisions (-> https://
kvv.imp.fu-berlin.de/x/LK0e6N ).
Maybe there was an idea for something or a requirement I don know about
behind this, like exchanging data with other calendar apps by doing XML
transformations. Or maybe it was just because XML was the hot thing to do at
that time.

This was (again) a notion in 2001 that was kind of like “NoSQL” is today - allowing some aspects of the data model to be schema-free. JSON did not exist in 2001 and NoSQL databases did not exist in 2001.

Of course it turned out to be a bad idea - Earl is rewriting the assignments data model for 12 to eliminate this stuff so there are efforts where we have effort available to evolve beyond the XML approach.

It also allowed a 100% XML serialization of all of the data in the DB. Why was this a use case? Long story :)

/Chuck

Sam Ottenhoff

unread,

Oct 19, 2017, 12:24:43 PM10/19/17

to Charles Severance, Steve Swinsburg, Zhang, Ruiling, dev sakai, Gregory Guthrie

> We were not skilled enough (and badly advised by experts) to figure out how to make integer foreign keys across the board for multiple databases :( Also a bad decision that computing a “globally unique” UUID in the server was an OK thing to do instead of making an auto-increment field.

The use of GUIDs across the board can have great benefit for complex institutions with schools and programs that run multiple instances and eventually want to merge or diverge. The problem is consistency: GUID use across all tools should have been yes/no instead of maybe.

The use of XML probably sheds light on the traditional, institutional split between ops, dbas, and developers. Filing a change request with your ops and dba team to get a new column added to the assignments table to handle peer-review is great pain for developers so XML allowed developers to bypass these road blocks.

Charles Severance

unread,

Oct 19, 2017, 1:04:00 PM10/19/17

to Sam Ottenhoff, Steve Swinsburg, Zhang, Ruiling, dev sakai, Gregory Guthrie

On Oct 19, 2017, at 9:24 AM, Sam Ottenhoff <otte...@longsight.com> wrote:

> We were not skilled enough (and badly advised by experts) to figure out how to make integer foreign keys across the board for multiple databases :( Also a bad decision that computing a “globally unique” UUID in the server was an OK thing to do instead of making an auto-increment field.

The use of GUIDs across the board can have great benefit for complex institutions with schools and programs that run multiple instances and eventually want to merge or diverge. The problem is consistency: GUID use across all tools should have been yes/no instead of maybe.

Sam - If you were starting fresh (I am asking about Tsugi :)) - would you use GUID foreign keys or integer autoincrement fields and add a more global identifier at import / export time?

/Chuck

Charles Severance

unread,

Oct 19, 2017, 1:07:52 PM10/19/17

to Gregory Guthrie, Hendrik Steller, dev sakai

The “redo the assignment data model” does not move any direct database access from the service to the tool.

It is the very fact that we *always* keep the details of the data model in the service and behind the API that allows us to revise the data model without breaking the UI code.

The new data model does not break the old API but it makes it so certain uses of the API are far more efficient *and* additional features can be added to the API that would have been inefficient using the old data model that are now efficient with the new data model so (as you say) “We can discover new usages for the data”.

/Chuck

On Oct 19, 2017, at 10:01 AM, Gregory Guthrie <gut...@mum.edu> wrote:

We have also had this same issue.

This may be naïve – but while we also have tools that directly query the database, things like this:
“…is rewriting the assignments data model for 12 …”
Reflect on how fragile such access is, specific to some particular implementation.

It would be good if/as somehow we can continue to develop APIs for access to things as we discover new usages for the data.

Sam Ottenhoff

unread,

Oct 19, 2017, 1:17:16 PM10/19/17

to Charles Severance, Steve Swinsburg, Zhang, Ruiling, dev sakai, Gregory Guthrie

> We were not skilled enough (and badly advised by experts) to figure out how to make integer foreign keys across the board for multiple databases :( Also a bad decision that computing a “globally unique” UUID in the server was an OK thing to do instead of making an auto-increment field.

The use of GUIDs across the board can have great benefit for complex institutions with schools and programs that run multiple instances and eventually want to merge or diverge. The problem is consistency: GUID use across all tools should have been yes/no instead of maybe.

Sam - If you were starting fresh (I am asking about Tsugi :)) - would you use GUID foreign keys or integer autoincrement fields and add a more global identifier at import / export time?

Definitely the latter. Your import mechanism can handle rewriting the integer ids during the importing process.

Hendrik Steller

unread,

Oct 19, 2017, 1:51:03 PM10/19/17

to Matthew Jones, saka...@apereo.org

On Donnerstag, 19. Oktober 2017 17:07:49 CEST Matthew Jones wrote:
> Yeah, the XML was just part of the initial design. I wasn't around for this
> then but I'm guessing it's kind of because they wanted to be rapid and have
> quick schema changes, kind of like a NoSQL model.

Hm, so basically a different way of what is done in other places with the
"entity" and "entity_property" tables like site and site_property.

> A NoSQL database that did this document model probably would have pretty
> good for Sakai.

Well, Sakai ships with Elasticsearch already nowadays, so... ;-)

I was reminded of a system I got a peek at when looking for a job with some
company. They power *really* big media companies by stuffing all data into
Elasticsearch, with their DAOs being REST clients for ES and doing a bit of
converting between ES and business objects IIRC.
So their clients' way of dealing with increased "database" load is simply to
tell Amazon to fire up a few more Elastic nodes. That was pretty nifty.

[..although that whole "everything is a document" bit gave me flashbacks of
writing Java applications for Lotus Domino/Notes ca 15 years ago. ]

> I always through the lack of FK's made it really nice for having a "soft
> delete", if we had FK's everywhere we'd have to hard cascade delete when
> things were deleted that linked across keys. This would have resulted in
> really deleting a lot less and needing a better soft delete (and hard
> delete) than what we've got now.

I agree. I've actually been inspired by Sakai's habit of doing this when
writing a couple of custom tools which store somewhat important data.
Not only is the sakai user_id not a FK in those tools' tables, they also
replicate some data like student names to make it possible to make sense of
the data even if someone decides (again..) to delete "unnecessary" sakai users
from the table...

Hendrik

Mitch Golden

unread,

Oct 20, 2017, 9:13:04 AM10/20/17

to Hendrik Steller, Matthew Jones, Sakai Development

Hendrik

--
You received this message because you are subscribed to the Google Groups "Sakai Development" group.

To unsubscribe from this group and stop receiving emails from it, send an email to sakai-dev+unsubscribe@apereo.org.

Mitch Golden

unread,

Oct 20, 2017, 9:25:53 AM10/20/17

to Hendrik Steller, Matthew Jones, Sakai Development

On Thu, Oct 19, 2017 at 1:50 PM, Hendrik Steller <sa...@stellers.net> wrote:

Oops.

One under appreciated issue that is caused by the lack of foreign keys is the loss of the ability to do good reflection. Suppose you want to add a contrib tool that foreign keys into a normal table in master. If you have a foreign key, you can set up a system on the model associated with the master table that automatically calls hooks on the dependent table if it is there. The master table doesn't need to know at compile time what is pointing at it, it just discovers it. The database itself offers ON DELETE CASCADE, but there could be more subtle operations.

Chuck - aside from the possible performance issues, why do you think that it matters what the key is? I think that the PK is more or less like a pointer in C programming - it should never be thought of as having any semantic content whatsoever. From my (very old) understanding of how databases work, a randomly generated primary key (of whatever data type) might have performance advantages over ones that are sequential, since it distributes the rows randomly in the table.

Here's an article that discusses some of this:

https://www.psce.com/en/blog/2012/04/04/how-important-a-primary-key-can-be-for-mysql-performance/

Mitch Golden
CTO
Noodle Partners, Inc.
860 Broadway, 6th Floor, New York, NY 10003
mgo...@noodle.com | noodle-partners.com

Reply all

Reply to author

Forward