Differences in 'DB Bootstrap', 'Data Provider' and 'Data Import'?


Jochen Zehnder

unread,
Aug 2, 2017, 6:59:53 AM
to Information Workbench Discussions
Hi,
I have a question regarding the differences in how data is handled when it is imported via the 'DB Bootstrap', 'Data Provider', and 'Data Import' mechanisms. I think I have a good understanding of the first two; however, the 'Data Import' mechanism is still a bit unclear to me.

The following description is my understanding of how the 'Data Provider' and 'DB Bootstrap' mechanism work:
  • The 'Data Provider' always deletes the data from the previous run and only stores the newly gathered data.
  • Data imported using the 'DB Bootstrap' mechanism is replaced based on the filename: when a file is changed, its old content is deleted and replaced with the file's new content when the data from 'data/dbBootstrap' is bootstrapped.
When I 'Import Data' (and specify a 'Target Context', e.g. using the filename I import as the context) and later want to replace some part of the previously imported data (e.g. due to a bug in the first version), how do I do this?

Is my understanding of the first two mechanisms correct, or have I misunderstood something? Is there documentation available that describes the handling and the differences?

Thanks
Jochen

Andreas Schwarte

unread,
Aug 3, 2017, 2:54:34 AM
to iwb-dis...@googlegroups.com

Hi Jochen,

Your understanding is correct.

A bit of technical background: in the platform we store data in contexts (i.e. RDF statements are persisted as quads in a named graph). We distinguish user contexts and system contexts.

For bootstrapping we use a 'repeatable' context identifier (basically composed of the dbBootstrap file name) to allow replacing previous content, i.e. as you stated: in an upgrade of a dbBootstrap file, the entire previous context is deleted and then replaced with the new data from the RDF dump file.
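To make the replace-on-bootstrap semantics concrete, here is a toy Python sketch (not the platform's actual code; the `urn:bootstrap:` URI scheme and the dict-based store are invented for illustration):

```python
# Toy model of a quad store: context URI -> set of (s, p, o) triples.
# The context URI is derived deterministically from the bootstrap file
# name, so re-bootstrapping the same file targets the same context.
store = {}

def bootstrap_context_uri(filename):
    # Hypothetical URI scheme; the real platform's scheme may differ.
    return "urn:bootstrap:" + filename

def bootstrap(filename, triples):
    ctx = bootstrap_context_uri(filename)
    store[ctx] = set(triples)  # old context content is fully replaced
    return ctx

ctx = bootstrap("myData.ttl", [(":a", ":p", "1"), (":a", ":q", "2")])
bootstrap("myData.ttl", [(":a", ":p", "1-fixed")])  # upgraded dump file
```

Because the context identifier is repeatable, the second bootstrap run wipes the first run's statements and only the upgraded dump's content remains.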

For providers (in the default configuration, see the hint below) we use a similar strategy: each provider run replaces the entire previous data with the result of the gathering process. This is achieved by identifying contexts via their context source, i.e. the provider is the context source of the respective context. Now the advanced behavior: we support different kinds of write strategies, which may behave differently. An example is the built-in delta write strategy, which does not replace the entire context but only persists the delta (to reduce the write workload on the database). See the provider documentation for details on this.
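The delta idea can be sketched in a few lines of Python (a simplified illustration, not the platform's implementation): instead of rewriting the whole context, only the statements that differ between the old and new gathering results are written.

```python
def compute_delta(old_triples, new_triples):
    """Return (to_remove, to_add) so that applying both to the old
    context yields the new gathering result."""
    old, new = set(old_triples), set(new_triples)
    return old - new, new - old

old = {(":a", ":p", "1"), (":a", ":q", "2")}
new = {(":a", ":p", "1"), (":a", ":q", "3")}
to_remove, to_add = compute_delta(old, new)
```

Here only one removal and one addition are persisted instead of rewriting all statements, which is the point of the delta strategy.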

The third case is importing data (e.g. from the UI via Admin:Import): here, too, data is always loaded into a specific context, optionally a target context provided by the user. The import is a simple loading process, i.e. no data is deleted. If you want to replace the data, the old context needs to be deleted beforehand (side note: a single named graph, i.e. context, can hold the same triple, i.e. subject, predicate, object, only once). This can be done in the UI from the Admin:ContentOverview page, or automated from code using the ReadWriteDataManager delete context method (if you know the URI of the context), e.g. in an upgrade handler.
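The additive import semantics can again be modeled with a toy Python store (names and URIs are invented for the example; the real API is the platform's ReadWriteDataManager):

```python
# Toy named-graph store: context URI -> set of triples. Because a
# context is a set, importing the same triple twice has no effect,
# but importing a CHANGED triple adds it alongside the old one --
# which is why the old context must be deleted before a replacing
# import.
store = {}

def import_data(target_context, triples):
    store.setdefault(target_context, set()).update(triples)  # additive

def delete_context(context):
    store.pop(context, None)

import_data("urn:my-import", [(":a", ":label", "Buggy name")])
# Re-importing a corrected file without deleting first leaves both
# the old and the new statement in the context:
import_data("urn:my-import", [(":a", ":label", "Fixed name")])
duplicated = len(store["urn:my-import"])  # both statements present
# The fix: delete the context, then import again.
delete_context("urn:my-import")
import_data("urn:my-import", [(":a", ":label", "Fixed name")])
```

This mirrors the delete-then-import workflow described above.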

Hope this answers your questions. If something is unclear, please send a short mail.

Best,

Andreas

--
You received this message because you are subscribed to the Google Groups "Information Workbench Discussions" group.
To unsubscribe from this group and stop receiving emails from it, send an email to iwb-discussio...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Jochen Zehnder

unread,
Aug 4, 2017, 11:50:06 AM
to Information Workbench Discussions
Hi Andreas,
thanks for the detailed information; this helps me a lot.

May I request an enhancement for the Data Import feature? Could you add a checkbox (or something similar) that deletes an already existing context with the same name before importing new data?

I'm asking because we load some additional configurations into the system using the Data Import functionality, and there we often end up doing the following:
  1. Import updated data
  2. Realize that we have duplicate information
  3. Delete the context
  4. Import the data again
Thanks
Jochen

Andreas Schwarte

unread,
Aug 7, 2017, 3:03:00 AM
to iwb-dis...@googlegroups.com

Hi Jochen,

I created an enhancement request for this in the backlog which we will discuss with product management.

For your use case of patching a staging / production system, I would suggest a different workflow.

You can also make use of the solution / app mechanism for this use case: just create a zip archive with the following structure:

- MyAppDataPatch.zip
  - data
    - dbBootstrap
      - myDataDump.ttl
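Such a patch artifact can be assembled with Python's standard-library zipfile module; this is only a sketch using the file names from the structure above, with a made-up one-line dump standing in for your real RDF data:

```python
import io
import zipfile

# Placeholder RDF content; substitute your real dump file here.
dump_content = b'<urn:a> <urn:p> "fixed value" .\n'

# Build the archive in memory with the data/dbBootstrap layout.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("data/dbBootstrap/myDataDump.ttl", dump_content)

# Inspect the resulting layout.
with zipfile.ZipFile(buf) as zf:
    names = zf.namelist()
```

In a build script you would write the archive to disk (e.g. MyAppDataPatch.zip) instead of an in-memory buffer.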

Optionally, you can provide a solution.properties file or version metadata.

If you now install this app (e.g. at runtime from the Admin:Apps dashboard), the entire context of the data will be replaced with the new content, i.e. in the same way as with the regular dbBootstrap.

Note that this is repeatable: just re-create different versions of your patch app (i.e. the zip file).

If you want, you can also automate this with your build scripts to create such patch artifacts.

Best,

Andreas
