Jenkins on SQLite

Basil Crow

Apr 2, 2022, 9:59:10 PM
to jenkin...@googlegroups.com
In the past we have talked about our vision and goals for Jenkins 3.0
on this list. Here is one of mine.

Has anyone besides me been highly dissatisfied with the way Jenkins
does object persistence? I think we are leaving a lot of functionality
and performance on the table by using flat files rather than a
relational database. Just run syncsnoop.bt on any Jenkins controller
and observe that a standard installation writes out dozens of tiny
files per second while running a Pipeline job and calls fsync(2) on
every single one of them (!). This architectural choice is
constraining our ability to implement new features at reasonable
performance, especially with regard to test results and static
analysis checks.

I think SQLite is the ideal choice for a relational database for
Jenkins. SQLite directly competes with flat files, which is what we
are using today. Furthermore, it is serverless, so it would not
introduce any new installation or upgrade requirements. The migration
could be handled transparently on upgrade to the new version.

True, SQLite allows only one writer at a time per database. But do
we really need to support more than one concurrent writer for most
metadata, like the Configure System page? Obviously we need to support
concurrent builds of jobs. This can be handled by defining a set of
namespaces as concurrency domains, each one backed by its own SQLite
database. For example, we can have one SQLite database for global
configuration, one SQLite database for the build queue, one SQLite
database for each job (or even build), etc. In this way we can in fact
support multiple writers interacting with different parts of the
system concurrently. The point is that by grouping these into
high-level buckets we can take advantage of the economies of scale
provided by the database and OS page cache.
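
A minimal sketch of the concurrency-domain idea, written in Python rather than taken from the prototype. The domain names, the `kv` table, and the helpers `db_path`, `write_rows`, and `run_demo` are all hypothetical. Each domain gets its own SQLite file, so a writer in one domain never waits on the write lock of another:

```python
import sqlite3
import tempfile
import threading
from pathlib import Path

# Hypothetical concurrency domains; the names are illustrative only.
DOMAINS = ["global", "queue", "jobs/test"]


def db_path(home: Path, domain: str) -> Path:
    """Map a concurrency domain to its own SQLite file under the home directory."""
    path = home / domain / "sqlite.db"
    path.parent.mkdir(parents=True, exist_ok=True)
    return path


def write_rows(home: Path, domain: str, count: int) -> None:
    """Write `count` rows into one domain; only that file's write lock is taken."""
    conn = sqlite3.connect(db_path(home, domain))
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)"
            )
            conn.executemany(
                "INSERT OR REPLACE INTO kv VALUES (?, ?)",
                [(f"key{i}", domain) for i in range(count)],
            )
    finally:
        conn.close()


def run_demo(count: int = 100) -> dict:
    """Run one writer thread per domain and return the row count per domain."""
    with tempfile.TemporaryDirectory() as tmp:
        home = Path(tmp)
        threads = [
            threading.Thread(target=write_rows, args=(home, domain, count))
            for domain in DOMAINS
        ]
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join()

        counts = {}
        for domain in DOMAINS:
            conn = sqlite3.connect(db_path(home, domain))
            counts[domain] = conn.execute("SELECT COUNT(*) FROM kv").fetchone()[0]
            conn.close()
        return counts
```

Two writers inside the same domain would still serialize on that one file, which is the trade-off the grouping into high-level buckets accepts.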

I put together a quick prototype today at
https://github.com/basil/jenkins/tree/sqlite. My Jenkins home looks
like this:

${JENKINS_HOME}/sqlite.db (one primary SQLite database)
${JENKINS_HOME}/jobs/test/sqlite.db (one SQLite database per job in
this prototype)

The primary SQLite database has these tables:

$ sqlite3 sqlite.db .tables
config
hudson.model.UpdateCenter
hudson.plugins.git.GitTool
jenkins.security.QueueItemAuthenticatorConfiguration
jenkins.security.UpdateSiteWarningsConfiguration
jenkins.security.apitoken.ApiTokenPropertyConfiguration
jenkins.telemetry.Correlator
nodeMonitors
org.jenkinsci.plugins.workflow.flow.FlowExecutionList
queue
users/admin_12464527240177267930/config
users/users

Each table represents an old XML file. In this prototype I am just
serializing the object with XStream and Jettison as JSON rather than
XML and storing it in one JSON column. Why JSON, you ask? Because
SQLite has a fully featured JSON extension. So here is how config.xml
looks:

$ sqlite3 sqlite.db 'select json from config'
{"hudson":{"disabledAdministrativeMonitors":[""],"version":"2.342-SNAPSHOT","numExecutors":2,"mode":"NORMAL","useSecurity":true,"authorizationStrategy":{"@class":"hudson.security.AuthorizationStrategy$Unsecured"},"securityRealm":{"@class":"hudson.security.HudsonPrivateSecurityRealm","disableSignup":true,"enableCaptcha":false},"disableRememberMe":false,"projectNamingStrategy":{"@class":"jenkins.model.ProjectNamingStrategy$DefaultProjectNamingStrategy"},"workspaceDir":"${JENKINS_HOME}\/workspace\/${ITEM_FULL_NAME}","buildsDir":"${ITEM_ROOTDIR}\/builds","markupFormatter":{"@class":"hudson.markup.EscapedMarkupFormatter"},"jdks":[""],"viewsTabBar":{"@class":"hudson.views.DefaultViewsTabBar"},"myViewsTabBar":{"@class":"hudson.views.DefaultMyViewsTabBar"},"clouds":[""],"scmCheckoutRetryCount":0,"primaryView":"all","slaveAgentPort":-1,"label":"","crumbIssuer":{"@class":"hudson.security.csrf.DefaultCrumbIssuer","excludeClientIPFromCrumb":false},"nodeProperties":[""],"globalNodeProperties":[""],"nodeRenameMigrationNeeded":false}}
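
To illustrate why a JSON column is convenient, here is a small Python sketch that queries individual fields with SQLite's `json_extract` function. It assumes the SQLite build behind Python's `sqlite3` module includes the JSON functions, and it stores only a trimmed copy of the document above. The helper names `make_config_db` and `read_settings` are hypothetical:

```python
import json
import sqlite3


def make_config_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for the `config` table with one JSON column."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE config (json TEXT NOT NULL)")
    # Trimmed copy of the document shown above.
    document = {
        "hudson": {
            "version": "2.342-SNAPSHOT",
            "numExecutors": 2,
            "mode": "NORMAL",
        }
    }
    conn.execute("INSERT INTO config (json) VALUES (?)", (json.dumps(document),))
    return conn


def read_settings(conn: sqlite3.Connection) -> tuple:
    """Pull single fields out of the JSON column without parsing it in the client."""
    return conn.execute(
        """
        SELECT json_extract(json, '$.hudson.numExecutors'),
               json_extract(json, '$.hudson.version'),
               json_extract(json, '$.hudson.mode')
        FROM config
        """
    ).fetchone()
```

The same path expressions can be used in `WHERE` clauses, so individual settings can be filtered on without deserializing the whole object.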

The job SQLite database has these tables:

$ sqlite3 jobs/test/sqlite.db .tables
build config junitResult workflow

These correspond to the old XML files as well. So builds/1/build.xml
is row 1 in the build table with a JSON column for its content,
builds/1/junitResult.xml is row 1 in the junitResult table with a JSON
column for its content, builds/1/workflow/2.xml is a row in the
workflow table with a composite key of workflow 2 and build 1 and a
JSON column for its content, etc. I have not yet attempted to deal
with things like SCM changelogs, permalinks, nextBuildNumber, and the
like, but these could all be moved into the SQLite database as well.
Halfway through this prototype I realized I was building an ORM from
scratch, so it might be worth exploring an existing solution like
Hibernate. But I was able to get quite far just stuffing JSON from
XStream into a primitive table layout in SQLite.
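
A rough Python sketch of what such a primitive per-job layout could look like. The column names (`number`, `build`, `node`, `json`) and the helpers are guesses for illustration; the actual prototype's schema may differ. The `workflow` table uses a composite primary key of build and node, as described above:

```python
import json
import sqlite3

# Assumed column names; only the table names come from the prototype's listing.
SCHEMA = """
CREATE TABLE IF NOT EXISTS build (
    number INTEGER PRIMARY KEY,
    json   TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS junitResult (
    build INTEGER PRIMARY KEY,
    json  TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS workflow (
    build INTEGER NOT NULL,
    node  INTEGER NOT NULL,
    json  TEXT NOT NULL,
    PRIMARY KEY (build, node)
);
"""


def open_job_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a per-job database with the sketch schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def save_flow_node(conn, build: int, node: int, content: dict) -> None:
    """Insert or overwrite the row that replaces builds/<build>/workflow/<node>.xml."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO workflow (build, node, json) VALUES (?, ?, ?)",
            (build, node, json.dumps(content)),
        )


def load_flow_node(conn, build: int, node: int):
    """Return the stored object, or None when no such row exists."""
    row = conn.execute(
        "SELECT json FROM workflow WHERE build = ? AND node = ?", (build, node)
    ).fetchone()
    return json.loads(row[0]) if row else None
```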

How does this all stack up? Well, Freestyle and Pipeline jobs work
just fine, and performance seems quite fast. True, multiple concurrent
builds of the same Pipeline job will be contending with each other to
write new Pipeline steps out to the workflow table, yet also there are
economies of scale to be gained in letting the database manage the
layout of the data within a single file rather than laying out data
ourselves in multiple files and fsync(2)'ing each one. SQLite offers
"extra", "full", "normal", and "off" settings for its "synchronous"
option, which we can map to the existing Pipeline durability levels.
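
One possible mapping, sketched in Python. The pairing below is an assumption made for illustration and is not taken from the prototype; only the durability level names and the `PRAGMA synchronous` values themselves are existing names. The helper functions are hypothetical:

```python
import sqlite3

# Assumed mapping from Pipeline durability levels to PRAGMA synchronous values.
DURABILITY_TO_SYNCHRONOUS = {
    "MAX_SURVIVABILITY": "FULL",
    "SURVIVABLE_NONATOMIC": "NORMAL",
    "PERFORMANCE_OPTIMIZED": "OFF",
}


def open_with_durability(path: str, durability: str) -> sqlite3.Connection:
    """Open a database and set its synchronous mode from a durability level."""
    mode = DURABILITY_TO_SYNCHRONOUS[durability]  # KeyError on unknown levels
    conn = sqlite3.connect(path)
    # The value comes from the fixed table above, never from user input.
    conn.execute(f"PRAGMA synchronous = {mode}")
    return conn


def current_synchronous(conn: sqlite3.Connection) -> int:
    """Return the numeric setting: 0 = OFF, 1 = NORMAL, 2 = FULL, 3 = EXTRA."""
    return conn.execute("PRAGMA synchronous").fetchone()[0]
```

The setting is per connection, so each per-job database could in principle follow the durability level configured for that job.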

Obviously this code is a rough prototype, but I was surprised at how
much just worked out of the box after a few hours of hacking. I think
there could be a future for Jenkins where everything is managed by
SQLite databases and where we leave XStream behind in favor of an ORM
like Hibernate. On upgrade, we can read in all the data with XStream
and write it out to SQLite with the ORM. From then on, serialization
and deserialization would work through an ORM against the relevant
SQLite database(s). And this would be on by default for everyone on
upgrade, not some opt-in plugin.

I think the functionality and performance we could get out of such a
system would be better than what we have today. The real benefit would
come after the migration when we can optimize slow operations, like
loading builds or displaying test results and static analysis results,
with hand-rolled SQL queries. We could also allow people to do
full-text search of build console logs.
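
As an illustration of the kind of hand-rolled query this would allow, here is a Python sketch that finds the most frequently failing tests across builds. It assumes test results have been normalized into a hypothetical `test_case` table rather than kept as one JSON blob; the table, its columns, and the helper names are invented for this example:

```python
import sqlite3


def open_results_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical normalized results table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE test_case (
            build    INTEGER NOT NULL,
            name     TEXT NOT NULL,
            status   TEXT NOT NULL,   -- 'PASSED' or 'FAILED'
            duration REAL NOT NULL,
            PRIMARY KEY (build, name)
        );
        -- Lets the failure query below avoid scanning passing tests.
        CREATE INDEX test_case_status ON test_case (status, name);
        """
    )
    return conn


def record(conn, build: int, name: str, status: str, duration: float) -> None:
    """Store the outcome of one test case in one build."""
    with conn:
        conn.execute(
            "INSERT INTO test_case (build, name, status, duration) VALUES (?, ?, ?, ?)",
            (build, name, status, duration),
        )


def most_failing(conn, limit: int = 5) -> list:
    """Return (test name, failure count) pairs, most failures first."""
    return conn.execute(
        """
        SELECT name, COUNT(*) AS failures
        FROM test_case
        WHERE status = 'FAILED'
        GROUP BY name
        ORDER BY failures DESC, name
        LIMIT ?
        """,
        (limit,),
    ).fetchall()
```

With flat XML files, answering the same question means loading and parsing every build's result file.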

Herve Le Meur

Apr 3, 2022, 6:09:14 AM
to jenkin...@googlegroups.com
Impressed by how few modifications your prototype needed, and I really like the idea!

--
You received this message because you are subscribed to the Google Groups "Jenkins Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-de...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/jenkinsci-dev/CAFwNDjrq8OAZs%3DNL3-B7rYD2jqvSWsXs5iY8UyJxPCWEUHk6WA%40mail.gmail.com.

Markus Winter

Apr 3, 2022, 8:55:14 AM
to jenkin...@googlegroups.com
I like the idea.

Questions:
Is SQLite the best choice, given that it is platform dependent? Anyone who runs Jenkins on a platform that is not supported would not benefit.
So might H2 be the better choice (also from a concurrency standpoint)?

Plugin compatibility: can we be sure all plugins will still work? For example, ssh-slaves-plugin would need changes here: https://github.com/jenkinsci/ssh-slaves-plugin/blob/80a9538ba9b0bb6caa049cab9eb4d1ee26d51434/src/main/java/hudson/plugins/sshslaves/verifiers/HostKeyHelper.java#L66-L79

At work we configure our Jenkins via XML files that we generate before Jenkins starts (a holdover from before the CASC plugin existed). Others are potentially doing the same, so this would have a bigger impact.

You mentioned full-text search of build logs. Do you intend to also store the build logs in the DB? I guess many plugins rely on the logs being in the file system. Also, at work we upload the build logs to Splunk, which wouldn't work if they were put in a database.

Daniel Beck

Apr 3, 2022, 11:28:51 AM
to Jenkins Developers

> On 3. Apr 2022, at 03:58, Basil Crow <m...@basilcrow.com> wrote:
>
> I put together a quick prototype today at
> https://github.com/basil/jenkins/tree/sqlite.

This is really cool, thanks for sharing!

Basil Crow

Apr 3, 2022, 12:23:37 PM
to jenkin...@googlegroups.com
I have never used H2, but I have a strong preference for SQLite. It
has been deployed more widely than all other database engines combined
and is probably one of the top five most deployed software programs of
all time, competing only with zlib, libpng, and libjpeg. It is very
high quality code and its tests have 100% branch coverage. These are
not small achievements. They mean it when they write: "Small. Fast.
Reliable. Choose any three."

Obviously plugins would need to be prepared in advance for such a
migration. With the ugly schema from my experiment, not too many
preparations are needed. But I suspect that adopting a nice schema
backed by an ORM like Hibernate would likely require many changes to
the serializable objects. It would be interesting to play around with
this, but definitely more than a day's worth of hacking.

I haven't thought about build logs too much, but I could see some
value to storing them in an SQLite database, if not as the primary
storage then at least in a secondary cache for full-text search.

Alexander Brandes

Apr 3, 2022, 2:21:36 PM
to Jenkins Developers
That sounds impressive! Your prototype implementation looks very promising, thanks for dropping the link here!

~ Alex

mike cirioli

Apr 3, 2022, 8:01:07 PM
to jenkin...@googlegroups.com
This is very cool!

Tim Jacomb

Apr 4, 2022, 3:19:23 AM
to Jenkins Developers
Really nice, I tried it out on a clean setup and it worked well. (An existing setup hit an issue, which I'm sure wouldn't be hard to sort out.)

SQLite seems like it could be a good fit.

One thought, though: it would be great to improve the Jenkins HA story by allowing multiple controllers to operate as if they were one controller.
If files remain on the file system I'm not sure if we can do that.

But this could be a good stepping stone towards that or at least a great improvement overall.

Nice work :)

Thanks
Tim 

Basil Crow

Apr 21, 2022, 3:43:38 PM
to jenkin...@googlegroups.com
Someone asked me for the tl;dr about this, and I wrote the following:
  • The scalability limitations of flat XML files for storing thousands of builds or hundreds of thousands of Pipeline steps should be obvious, and they are a threat to the long-term sustainability of the project.
  • An embedded database can address many of these use cases without introducing new requirements on upgrade.
  • SQLite is the ideal embedded database. Though it has concurrency issues, these can be addressed by using multiple SQLite databases (think one SQLite database to store all the Pipeline build steps for a Pipeline run, another one to store the user database, etc).
  • Running Jenkins on SQLite is not a pipe dream. While a non-trivial project, it is within the realm of possibility, and it can be done retaining compatibility.

The Jenkins serialization API is object-oriented, and my prototype did not address the key point of adding an ORM (like Hibernate) with a session layer to manage caching. This would be essential for a full prototype, and implementing it in a backward-compatible way would be even trickier. I stopped short of this, as it would take me well over one Saturday afternoon to put something like this together (starting by learning Hibernate from scratch). But on a high level it seems doable.

With an ORM and a session layer one can start to think about replacing SQLite with a client-server database under the hood for more advanced installations. As others have implied, this opens the door to using e.g. Postgres as a warm standby in a high availability setup.

Antonio Muñiz

Apr 25, 2022, 5:44:46 AM
to jenkin...@googlegroups.com
This is cool Basil!

> and my prototype did not address the key point of adding an ORM (like Hibernate) 

If (or when) the PoC reaches this point, I would suggest evaluating MyBatis. Not a full ORM, but its simplicity and the lighter abstraction layer (compared to Hibernate) would make it a good fit given the necessary backward compatibility.

--
Antonio Muñiz
Human, Engineer
CloudBees, Inc.

Denys Digtiar

May 6, 2022, 5:10:05 AM
to Jenkins Developers
Totally agree with the premise, but still have some odd fond attachment to the current design :) 