Best Practices for High Availability and Low Latency IoT Data Storage with MariaDB

52 views
Skip to first unread message

dragon lu

unread,
May 15, 2023, 11:32:42 PM5/15/23
to MariaDB ColumnStore

I am currently experimenting with a MariaDB Server Community Galera Cluster (InnoDB)+ MariaDB Columnstore database combination for my IoT data storage needs. Specifically, I receive records from IoT devices every minute and need to query the latest data from all devices as well as data from specified time points. It is essential for me to achieve high availability while maintaining low latency for client queries and ensuring that no sensor data is lost. Ideally, I would like to be able to query the data immediately after receiving it.

To achieve this, I have stored the raw sensor data in a Columnstore table and other related data (such as user data) in an InnoDB table. However, I cannot ensure high availability for Columnstore and cannot use Columnstore Cluster as I am not using the Enterprise version. I had considered using the mariadb data adapter to import InnoDB data into Columnstore, but this package is no longer being updated.

Can you suggest any best practices or alternative solutions for achieving high availability and low latency for IoT data storage using MariaDB?

dragon lu

unread,
May 16, 2023, 2:34:57 AM5/16/23
to MariaDB ColumnStore
I have attempted to use rsync for incremental backups of my data to other hosts, and the transfer was successful. However, since rsync uses a script to execute the copy process, I am concerned that the Columnstore original remote host may be undergoing DML inserts during the copy process, potentially corrupting the backup data. Additionally, I must manually design disaster recovery and host switching mechanisms, which can result in significant costs and testing efforts.
dragon lu 在 2023年5月16日 星期二上午11:32:42 [UTC+8] 的信中寫道:

Todd Stoffel

unread,
May 16, 2023, 2:41:15 AM5/16/23
to MariaDB ColumnStore
If you really want to use rsync to copy your ColumnStore data directory, you'll need to stop writes during the process.  Depending on the amount of data, it is probably cost prohibitive to back up terabytes or petabytes of data on a regular basis for an analytics system. I would consider using S3 storage which is redundant and has very high reliability and/or reloading of the data as an alternative solution.

Todd Stoffel

unread,
May 16, 2023, 2:42:38 AM5/16/23
to MariaDB ColumnStore
Galera and Columnstore rely on different methods of replication and this configuration would not be supported.

dragon lu

unread,
May 16, 2023, 2:58:23 AM5/16/23
to MariaDB ColumnStore

You are right.
To achieve high availability, my application accesses two different databases(2 independent hosts): one for application-related data (such as user information), which uses InnoDB as the storage engine and Galera as the clustering architecture, and another for recording data from IoT devices, which uses Columnstore as the storage engine and is backed up to a remote host using rsync (both primary and backup hosts have the same database content). The rsync backup script shuts down the service before backup and restarts it after the backup is complete.
Todd Stoffel 在 2023年5月16日 星期二下午2:42:38 [UTC+8] 的信中寫道:

Serguey Zefirov

unread,
May 16, 2023, 9:48:24 AM5/16/23
to dragon lu, MariaDB ColumnStore
The following applies if your system is on-premises or you can format
your storage as you need to.

Some of the file systems provide snapshots and also provide snapshot
difference - list of files and data blocks changed (ZFS on linux has
that from 2016 on). If your installation is single machine, you can
use one of these file systems and make truly incremental backups.

If your MCS installation is multi-machine, you have to coordinate
snapshots somehow, and stop writes as Todd suggested to have a
consistent DB image on disk. But, the downtime to make snapshots will
be not that big, snapshots are almost instanteneous.

Lustre (Linux cluster file system) does something like this:
https://github.com/DDNStorage/lustre_manual_markdown/blob/master/03.19-Lustre%20ZFS%20Snapshots.md
> --
> You received this message because you are subscribed to the Google Groups
> "MariaDB ColumnStore" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to mariadb-columns...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/mariadb-columnstore/eedd2fb8-3378-4fa1-8ecc-858cdd94ad2cn%40googlegroups.com.
>

allen....@mariadb.com

unread,
May 16, 2023, 10:54:29 AM5/16/23
to MariaDB ColumnStore
Hey dragon lu,

I recommend mariadb professional hours if you want more detailed suggestions/advice. or an enterprise license which comes with ticketed support and consultation.

Columnstore is best for reporting and data warehousing, Since i see you're using galera i recommend testing MariaDB Xpand which has better write scale and can be used as a hybrid for lite analytics.
Both columnstore and xpand have IOT customers running successfully at large scale.
Reply all
Reply to author
Forward
0 new messages