Index Of The Raid


Beverly Zielonko

Aug 4, 2024, 10:29:04 PM
to preskiyrona
The RAID on the local MA was returning errors, so we replaced one of the faulty disks, which caused the whole RAID configuration to fail. We had to rebuild the entire virtual disk. Unfortunately, since there were only 3-6TB disks, everything (OS/DDB/IndexCache/Libraries) was on the same virtual disk, and we lost everything.

Since we had a copy of the backup data at the secondary location, the quickest way was to rebuild the MA. The RAID controller was not showing any errors, and it was not feasible to ship another MA to the site due to lockdown.


The backups are now running again (we took a new full for all the backups; these were all file system agents) and AuxCopies are running. Is there anything else we could have done to get the backups started quickly?


Thanks. Also, I am cleaning up the old mount paths, as those no longer exist; however, it is complaining about the mount path being used by a DDB, although that DDB is sealed. Am I able to delete the sealed DDB reference, just to tidy things up? I just want the library to point only to the new mount path.


When I give 32 GB of RAM to ES, it seems that the maximum storable index size for a node is around 4 TB.

So if I set up RAID 0, 20 TB of disk will be unused.

And I cannot increase replicas due to the ES heap limit.


Mappings look fine and properly optimised, so no problem there. You do, however, have quite large shards, and terms heap usage is high. I would recommend trying to reduce the average shard size to closer to 50 GB to see if this makes a difference, e.g. by increasing the number of primary shards to between 15 and 18.
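As a rough sketch of the arithmetic behind that suggestion (the 800 GB index size is a hypothetical figure, since the actual index size isn't stated here):

```shell
# Hypothetical: an index of ~800 GB, targeting ~50 GB per primary shard.
index_gb=800
target_gb=50

# Ceiling division: the number of primaries so no shard exceeds the target.
shards=$(( (index_gb + target_gb - 1) / target_gb ))
echo "primary shards needed: $shards"   # prints 16, inside the 15-18 range
```

Note that the number of primary shards is fixed at index creation, so in practice the change would be applied by reindexing into a new index (or, on recent versions, with the _split API).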


The RAID configuration, which I asked about first, seems somewhat independent of this 15-shard test, because the disk space involved is quite large. What do you think about setting up RAID regardless of the heap optimisation, if multiple RAID 1 arrays make sense?


It is a shame it did not help, but I have to admit it was a long shot. As the mappings look good, I am not sure I have any other suggestions apart from tweaking the circuit breaker thresholds a bit, but this is unlikely to give any massive improvement and could cause instability if pushed too far.


There is one thing I forgot to ask earlier: Do you allow Elasticsearch to automatically assign document IDs or do you set them yourself? If you set them yourself, what do the IDs look like and how are they generated?


RAID 6 is also parity-based. It just has two disks' worth of parity, so that a double drive failure still leaves you with your data. This means that every write to RAID 6 causes writes to two disks in addition to the data writes. That sounds like the practical problem, but the primary cost of parity is the additional reads: every time you want to write out data, you must calculate parity against all the other data in the stripe, which may mean fetching that data off disk. You pay for this either with forced sequential I/O (very slow) or with very large memory caching (memory that could otherwise serve your real workload). Typically you'll pay with some mix of both.
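A back-of-the-envelope sketch of the small-write (read-modify-write) cost described above, counting the disk operations for one logical write that touches a single chunk in a stripe:

```shell
# Small write under read-modify-write: read the old data and each old
# parity chunk, then write the new data and each recomputed parity chunk.
raid5_ios=$(( 1 + 1 + 1 + 1 ))   # 1 data read, 1 parity read, 1 data write, 1 parity write
raid6_ios=$(( 1 + 2 + 1 + 2 ))   # 1 data read, 2 parity reads, 1 data write, 2 parity writes
echo "RAID 5: $raid5_ios I/Os per small write"   # prints 4
echo "RAID 6: $raid6_ios I/Os per small write"   # prints 6
```

So RAID 6 roughly adds one extra read and one extra write per small write compared to RAID 5; both are far costlier per logical write than mirroring, which needs only the two data writes.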


So in short, RAID 5 and RAID 6 have the same type of performance costs. In fact, real-world RAID 5 typically features a few hot-standby drives, so the terms are quite blurry here. Traditional RAID 5 is unacceptably unreliable for most business needs: a drive failure during the rebuild causes total data loss, and one is statistically quite likely after the first drive failure. Thus most people mean a modified implementation of RAID 5 in most circumstances, of which RAID 6 is one.


Hi, I have a similar question. I'm building a new Splunk setup for an ISP with a 10 GB/day Splunk license. As you can see, the data amount isn't that big, but we want enough read performance for the NOC. What I'm suggesting is the following:


- 2x SLC SSDs = 256 GB, striped, where we use for example 20 GB as HOT and 220 GB as WARM
- 14x SAS disks in RAID 10 for COLD data

We aim to store roughly 2 years of data on the server. In addition, we have backup over iSCSI.
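As a rough retention check for this sizing (assuming, hypothetically, that the 10 GB/day license is fully used and that on-disk usage lands around 50% of raw ingest, a commonly quoted ballpark for compressed rawdata plus index files):

```shell
# Hypothetical inputs: full license usage, 2-year retention,
# ~50% of raw ingest actually stored on disk after compression.
daily_gb=10
days=730
stored_pct=50

stored_gb=$(( daily_gb * days * stored_pct / 100 ))
echo "estimated on-disk over 2 years: ${stored_gb} GB"   # prints 3650
```

That is roughly 3.6 TB, which can then be compared against the usable capacity of the 14-disk RAID 10 (half of its raw capacity) to sanity-check the 2-year goal.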


RAID 5/6-type storage is usually good for cold storage. The cold DB is usually written only when buckets are rolled, so you do not incur the write penalty that you would on the hot DB. Read-only performance of RAID 5/6 is almost comparable to RAID 0/10/01, assuming that the underlying disks and disk interfaces are the same, and that the number of data disks corresponds (e.g., RAID 5 over 5 disks vs RAID 6 over 6 disks vs RAID 0 over 4 disks vs RAID 10 over 8 disks).


While RAID 10 is superior to RAID 5 for performance and fault tolerance, this suggests that if your total storage is large enough (more than, say, 1 or 2 TB per indexer) and flexible enough, you can achieve moderate savings in disks with only a little performance loss by placing about 100 GB per index on RAID 10 for the hot DB, and the remainder on RAID 5 for the cold DB.
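A sketch of the savings being described, with hypothetical numbers (2 TB usable per indexer, 100 GB of it hot, and a 5-disk RAID 5 set, so the parity overhead factor is 5/4):

```shell
# Compare raw capacity needed: everything on RAID 10 (2x overhead)
# vs. hot on RAID 10 and cold on 5-disk RAID 5 (5/4 overhead).
awk 'BEGIN {
  total = 2000; hot = 100; cold = total - hot       # GB, hypothetical sizing
  pure_r10  = total * 2                             # all data mirrored
  split_raw = hot * 2 + cold * 5 / 4                # mirror hot, parity cold
  printf "pure RAID 10:    %d GB raw\n", pure_r10   # prints 4000
  printf "hot/cold split:  %d GB raw\n", split_raw  # prints 2575
}'
```

Under these assumptions, the split layout needs roughly a third less raw capacity while keeping the write-heavy hot DB on mirrored disks.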


Generally, on busy indexers it's not a good idea to go RAID 5 because of the extra work parity checking adds to each write. If high performance is your main concern, consider RAID 10, or RAID 0/0+1, instead of RAID 5.

Our documentation has further info on hardware specs.


I have a RAID 5 array composed of 3 hard drives, and suddenly it started being inactive at boot time. Since the home directory is mounted on it, the system cannot boot and asks for manual user intervention. I found similar reports in forums, but most of those users turned out to have a defective hard drive, which is not the case for me.


Stopping the array (mdadm --stop /dev/md0) and starting it again (mdadm --assemble --scan /dev/md0) shows no errors (there is no complaint or array rebuild), and it can then be mounted properly (manual mount), so why can't it be brought up at boot?
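For reference, one common reason an otherwise healthy md array assembles fine by hand but not at boot is a missing or stale ARRAY entry in the copy of mdadm.conf embedded in the initramfs. A sketch of what the entry looks like (the UUID is a placeholder, not taken from this thread; the real line comes from `mdadm --detail --scan`, and the initramfs must be regenerated afterwards, e.g. with `update-initramfs -u` on Debian/Ubuntu):

```
# /etc/mdadm/mdadm.conf -- hypothetical entry; replace the UUID with the
# output of: mdadm --detail --scan
ARRAY /dev/md0 metadata=1.2 UUID=<your-array-uuid>
```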


After checking smartctl for all the hard drives composing the RAID array (sda, sdb, sdc), I could not observe any errors (no Current_Pending_Sector, UDMA_CRC_Error_Count, or Offline_Uncorrectable). Short and long tests have already been run.


Running the same command with -v (verbose output), I can spot two lines saying "grub-probe: info: Found array md/0 (mdraid1x)." right after probing hd0 and hd1, which map to sda and sdb respectively. So does sdc not have RAID metadata readable by GRUB? People facing this problem suggested updating the RAID metadata from 0.90 to 1.x, but my array is already using 1.2.


I tried to manually fail the sdc hard drive twice (the first time I just removed it and re-added it, and the second time I used mdadm --zero-superblock /dev/sdc) and forced the array to rebuild, but the error would not go away, so now I am stuck. Does anyone have a clue what the problem might be and how it can be fixed?


After I did that for /dev/sdb, the problem was solved, and now grub-probe shows only one line with "grub-probe: info: Found array md/0 (mdraid1x)." instead of two as before (see the question).


So it must be the other way around from what I thought at the beginning about the index error. My assumption was that this metadata should be present on every disk in the array, which is why I was erasing sdc, the drive for which grub-probe was not showing any "grub-probe: info: Found array md/0 (mdraid1x)." message.


We have a Fujitsu server with a specific RAID controller (FUJITSU PRAID EP400i / EP420i). We want to export the configuration of that RAID controller, so that if the controller stops working, we can replace it with a new one carrying the exact configuration of the previous one. From what I know, I need to find a way to access the index file and copy it to another RAID controller. Does anyone here know if this is possible, and how? Greetings.


I was considering setting up servers with the data disk being RAID 0, since we have a replica on another server. I figured that would be a good way to save a lot of money (SSDs are by far the most expensive part of a new cluster setup).


It seems no one is actually using the multiple-data-paths approach, even though with multi-TB cluster sizes that's a lot of money saved, if it's more stable than RAID 0 (which, with 10 disks backing it, has 10x the risk of failure).


@warkholm - it seems ES no longer stripes shards across paths, avoiding the striping that caused shard failures when shards were spread over many paths. So from 2.0+ it should be fine to have multiple paths, and you'll only lose part of your data when a disk fails (instead of an entire RAID 0).
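For anyone trying this, a minimal sketch of what the multiple-paths layout looks like in elasticsearch.yml (the mount points are hypothetical, one per physical disk):

```yaml
# elasticsearch.yml -- hypothetical layout: one data path per physical disk.
# With multiple data paths, a given shard lives entirely on one path, so
# losing a disk loses only the shards stored on that disk.
path.data:
  - /mnt/disk1/es
  - /mnt/disk2/es
  - /mnt/disk3/es
```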


How is write performance compared to RAID 0? It should be much worse, depending on how well ES spreads writes over shards. And with multiple paths per server, it would probably make sense to have more shards per index?


Personally, I prefer RAID 0 over multiple data paths, even with 2.0's fixes. I see it as a performance vs. safety tradeoff, and usually I'm fine with the safety that comes from sharding. But I agree that it is a nice tradeoff to be able to make. I wouldn't, for example, RAID 0 four disks together. It is just too much bother.
