ESOS SAN memory and SSD


Willard Griselda

May 6, 2026, 3:01:01 PM
to esos-users
Hi,
I am rebuilding my IB SRP SAN. It doesn't involve Ceph or ZFS, by the way. I have 2 questions:
1. How does ESOS take advantage of ECC RAM? Does ESOS benefit from a large amount of RAM?
2. Can I use SSDs to speed up read and write performance to HDDs?
Thank you in advance

Chris Kostecki

May 6, 2026, 4:03:42 PM
to esos-...@googlegroups.com
Unless something has changed, ESOS uses LVM. I don't think it would know whether you are using ECC, but ECC is still recommended because its benefits are at the hardware level, not the software level. There are reasons why having more RAM is better in ESOS, but it has been a while and I do not remember them.
Unless there has been a feature change, LVM has no way of using SSDs as a read/write cache. If you are looking for that feature, some HBAs support it, such as LSI/Broadcom controllers with the CacheCade feature.


Chris Kostecki

May 6, 2026, 4:13:08 PM
to esos-...@googlegroups.com
Looks like LVM does support SSD caching. It may not be in the TUI, so it would have to be set up manually.
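
For anyone curious what that manual setup might look like, here is a minimal dry-run sketch in Python. It only prints the LVM commands rather than running them. The volume group, LV, and device names are hypothetical, the `_cpool` naming scheme is made up for illustration, and it assumes the LVM tools on your ESOS build include cache support, which I have not verified.

```python
import shlex


def lvm_cache_commands(vg, origin_lv, ssd_pv, pool_size, mode="writethrough"):
    """Return the LVM commands (as argument lists) for attaching an SSD cache pool.

    Nothing is executed here; the caller decides whether to run the commands.
    """
    if mode not in ("writethrough", "writeback"):
        raise ValueError("mode must be 'writethrough' or 'writeback'")
    pool = f"{origin_lv}_cpool"  # hypothetical naming scheme for the cache pool
    return [
        # Add the SSD to the volume group that holds the HDD-backed LV.
        ["vgextend", vg, ssd_pv],
        # Create a cache pool that is allocated only on the SSD.
        ["lvcreate", "--type", "cache-pool", "-L", pool_size, "-n", pool, vg, ssd_pv],
        # Attach the pool to the origin LV in the chosen cache mode.
        ["lvconvert", "--type", "cache", "--cachepool", f"{vg}/{pool}",
         "--cachemode", mode, f"{vg}/{origin_lv}"],
    ]


if __name__ == "__main__":
    # All names below are placeholders, not values from a real system.
    for cmd in lvm_cache_commands("vg_san", "lv_lun0", "/dev/nvme0n1", "200G"):
        print(shlex.join(cmd))
```

Writethrough is the default in this sketch because it is the safer mode; switching to writeback only makes sense if the SSD can be trusted across a power loss.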

Wilard

May 7, 2026, 1:30:25 PM
to esos-...@googlegroups.com
This is what I got from Gemini AI. How much of it is accurate?

Rebuilding an InfiniBand (IB) SAN with the Enterprise Storage OS (ESOS) is a solid choice for low-latency performance. Since you are moving away from Ceph and ZFS, you are likely looking at a more "traditional" block storage approach using SCST (Generic SCSI Target Subsystem).

Here is how ESOS handles your hardware resources:

1. ESOS and RAM: ECC vs. Capacity

In a storage environment, RAM isn't just for running the OS; it’s the primary landing zone for data.

Does ESOS take advantage of ECC?

Yes, but indirectly. ESOS itself (the Linux kernel and the SCST mid-layer) doesn't have "ECC-specific" features, but it relies entirely on the hardware's ability to ensure data integrity.

Data Integrity: Without ZFS’s checksumming, you are relying on the hardware to ensure bits don't flip. In a SAN environment, a single bit flip in RAM can result in "silent data corruption" being written to your HDDs.

Stability: Storage targets are high-interrupt environments. ECC helps prevent kernel panics that could lead to an ungraceful shutdown of your entire fabric.

Does it benefit from Large RAM?

Absolutely. Since you aren't using ZFS (and its ARC cache), ESOS uses the standard Linux Page Cache.

Read Caching: Any RAM not used by the OS is used to cache frequently accessed blocks. If your active "hot" data set fits in RAM, your IB SRP performance will be limited only by your network throughput, not your disks.

Write Buffering: ESOS can use RAM to buffer incoming writes (if "write-back" is enabled). However, be cautious: without a battery-backed RAID controller or an NVDIMM, a power failure could lose data sitting in that RAM buffer.

2. Using SSDs to Speed Up HDDs

Since you aren't using ZFS (which uses L2ARC/SLOG), you need a Linux-based caching layer. ESOS supports several methods to use SSDs as a "front-end" for your HDDs.

Recommended Technologies in ESOS:

Technology | Description | Use Case
bcache | The most common Linux kernel caching shim. It hides the HDD behind an SSD. | Best for general-purpose read/write acceleration.
dm-cache | Part of the LVM (Logical Volume Manager) stack. | Best if you are already using LVM to manage your block devices.
OpenCAS | (Cache Acceleration Software) High-performance caching originally by Intel. | Best for high-throughput environments where you need granular control over cache metadata.

How to implement it:

Read Cache (Write-Through): Data is written to both the SSD and HDD simultaneously. This speeds up reads but doesn't accelerate writes. It is the safest method.

Write Cache (Write-Back): Data is written to the SSD first and flushed to the HDD later. This provides massive write speed boosts, but requires a reliable SSD (ideally PLP/Power Loss Protection) to prevent data loss during a crash.

Pro-Tip: If you are using SRP (SCSI RDMA Protocol), your bottleneck will almost always be the latency of the storage media. Using bcache with a pair of mirrored NVMe or SATA SSDs as the "caching" tier for your HDD RAID group is the most effective way to keep your IB pipes full.

How many initiators (clients) are you planning to connect to this SAN?

Chris Kostecki

May 7, 2026, 1:45:51 PM
to esos-...@googlegroups.com
Yeah, that sounds right to me

Andrei Wasylyk

May 7, 2026, 10:53:40 PM
to esos-...@googlegroups.com
I'll bite. 

If I'm not mistaken the pagecache only applies if you are using vdisk_fileio based devices...

In SCST, you have a great number of possible device types of which the most popular are vdisk_fileio and vdisk_blockio.

BlockIO devices are raw block devices that you use as LUNs on the targets you define. Let's say you are defining only 1 target, and exposing all LUNs via that target, then on the ESOS side of things each LUN would likely be backed by an individual LV. So LV1 would be LUN0, LV2 LUN1...etc.

FileIO devices are literally files. You can configure whatever partitioning layout or overlay whatever volume manager you want, then format it with any filesystem you please and create files within that filesystem to serve as backing devices for LUNs. You could have file1 for LUN0, fileblahblah for LUN1...etc.

In both cases, when you expose a LUN via SCST you define attributes for the virtual device that SCST is presenting to initiators - but that's all they are - virtual attributes. So setting write_through on SCST to 1 or 0 doesn't actually do anything - it's up to you to make sure that all the layers right down to the actual disks are also set to writethrough or writeback. Again, it's just what SCST tells the initiator - it doesn't actually do anything different if you turn it off or on, except that the initiator will think it is on when it isn't (or vice versa). There is an exception - nv_cache - this option in SCST completely disregards any effort made by the initiator to force a write to disk. All throughout the history of storage there has been a struggle between developers who want the ability to tell the storage subsystem "MAKE SURE YOU WRITE THIS TO DISK, not to cache - I want it on the DISK" and storage developers who in turn say "Hey, what you don't know can't hurt you, and I know storage much better than you do."

Now let's take the simplest scenario: 1 physical disk in ESOS.

If you use blockio, you may dedicate the entire drive as a single vdisk_blockio device. You would make sure that the device (/dev/sdb) has writeback caching enabled and then enable writeback on the device in SCST. However, because it's blockIO, SCST really doesn't do anything with regards to read or write caching. You basically just get whatever onboard cache that device came with.
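
To check what the kernel currently thinks the write cache mode of a disk is, a small Python sketch like the one below can read the block layer's `write_cache` attribute from sysfs. The device name `sdb` follows the example above. Note the assumption: this reports the kernel's view of the device, which is not guaranteed to match what the drive firmware or a RAID controller is actually doing.

```python
from pathlib import Path


def write_cache_mode(device, sysfs_root="/sys/block"):
    """Return the kernel's view of a disk's write cache.

    The attribute normally contains 'write back' or 'write through'.
    sysfs_root is a parameter only so the function can be pointed at a test tree.
    """
    attr = Path(sysfs_root) / device / "queue" / "write_cache"
    return attr.read_text().strip()


def is_write_back(device, sysfs_root="/sys/block"):
    """True if the kernel treats the device as having a volatile write-back cache."""
    return write_cache_mode(device, sysfs_root) == "write back"
```

On a live system, `write_cache_mode("sdb")` would be the call to make before deciding what to set on the SCST side.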

If you use fileio, well things become a tiny bit more interesting. Since it's a filesystem well you get the pagecache at your disposal so now whatever RAM you have available will be used as cache. For reads only. 
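
A quick way to see how much RAM the pagecache is actually holding is to look at `/proc/meminfo`. The sketch below parses that file's text and reports the `Cached` figure as a share of total memory. It is only an illustration: the function names are made up, and `Cached` covers everything in the pagecache, not just the files backing your LUNs.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: value in kB}."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def page_cache_summary(text):
    """Summarize total RAM and pagecache usage from /proc/meminfo text."""
    info = parse_meminfo(text)
    total_kb = info["MemTotal"]
    cached_kb = info.get("Cached", 0)
    return {
        "total_gib": total_kb / 1024**2,
        "cached_gib": cached_kb / 1024**2,
        "cached_pct": 100.0 * cached_kb / total_kb,
    }


# On a live Linux system:
#   with open("/proc/meminfo") as f:
#       print(page_cache_summary(f.read()))
```

If `cached_pct` stays low while initiators are reading heavily from fileio LUNs, more RAM is unlikely to be what is holding read performance back.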

In all cases writes are always going to a backend block device (whether it's an actual block device or a file on a filesystem). There is no mechanism in Linux for setting aside some amount of RAM for caching writes, because that would be insanely irresponsible. It's not just a power loss issue: a kernel panic or hang in the system would cause the same thing as losing power.

Now, if you wanted to use LVM's cache feature (dm-cache) or bcache to do tiered SSD/HDD storage, with incoming writes landing on the SSDs and then getting written back to the HDDs - well, that applies regardless of whether you do blockio or fileio. But neither requires any RAM to do the write caching.

So if you want to do fileio with file-based devices, then yes, RAM is good, for reads only. Otherwise you could run the server on 4GB of RAM and it'll be fine.

Side note, I have never ever seen good performance with SSD tiering on bcache. I wish I could be proven wrong one day. But I have never seen it perform well ever. I wish it did, because I highly suspect that in 200TB of data there is only ever maybe 10% of that which is "hot", so it would stand to reason that you should be able to have more or less SSD performance with 20TB SSDs in front of 200TB of HDDs. But I have never once seen that pan out. Invariably it seems like tiered storage basically performs just a tiny bit better than the slower tier no matter how powerful the faster one is.
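
One possible piece of the explanation is plain arithmetic: average latency is a weighted mix of the two tiers, and the misses dominate. The Python sketch below uses a deliberately simple model with assumed latencies of 0.1 ms for the SSD tier and 10 ms for the HDD tier. It ignores queueing, promotion and demotion traffic, and write-back flushing, all of which would make real results worse.

```python
def effective_latency_ms(hit_ratio, fast_ms, slow_ms):
    """Average access latency for a two-tier setup under a simple weighted model."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * fast_ms + (1.0 - hit_ratio) * slow_ms


if __name__ == "__main__":
    FAST_MS = 0.1   # assumed SSD latency, for illustration only
    SLOW_MS = 10.0  # assumed HDD latency, for illustration only
    for hit_ratio in (0.5, 0.9, 0.99, 0.999):
        avg = effective_latency_ms(hit_ratio, FAST_MS, SLOW_MS)
        print(f"hit ratio {hit_ratio:.3f}: {avg:.3f} ms average, "
              f"{avg / FAST_MS:.1f}x the SSD latency")
```

With these assumed numbers, even a 90% hit ratio averages about 1.09 ms, roughly eleven times the SSD figure, so the fast tier only shows through when nearly every request hits it.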
