--
---
You received this message because you are subscribed to the Google Groups "zfs-macos" group.
To unsubscribe from this group and stop receiving emails from it, send an email to zfs-macos+...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
Well, that's one point of view and choice. I'm sure those you refer to are far more knowledgeable than any other individuals.
I can only speak for myself. I have intentionally attempted to destroy data for years under ZFS; amazingly enough, all data is always recoverable. I have intentionally stayed away from protected RAM to ensure data for clients is safe.
So back to trolling. Let's be honest, if you were not trolling you would have started a new thread for people to discuss your views.
Abstract: We present a study of the effects of disk and memory corruption on file system data integrity. Our analysis focuses on Sun’s ZFS, a modern commercial offering with numerous reliability mechanisms. Through careful and thorough fault injection, we show that ZFS is robust to a wide range of disk faults. We further demonstrate that ZFS is less resilient to memory corruption, which can lead to corrupt data being returned to applications or system crashes. Our analysis reveals the importance of considering both memory and disk in the construction of truly robust file and storage systems.
...memory corruptions still remain a serious problem to data integrity. Our results for memory corruptions indicate cases where bad data is returned to the user, operations silently fail, and the whole system crashes. Our probability analysis shows that one single bit flip has small but non-negligible chances to cause failures such as reading/writing corrupt data and system crashing.
The conclusion of the paper is that ZFS does not protect against in-memory corruption, and thus can't provide end-to-end integrity in the presence of memory errors. I am not arguing against that at all; obviously you'll want ECC on your ZFS-based server if you value data integrity -- just as you would if you were using any other file system. That doesn't really have anything to do with the claim that ZFS specifically makes lack of ECC more likely to cause total data loss, though.
Why is this a ZFS issue?
We might buy this argument if, in fact, no other program had the same
vulnerabilities. But *all* of them do -- including OS X. So it is disingenuous
to claim this as a ZFS deficiency.
Back to the OP, I'm not sure why he felt he had to mention being
part of SunOS. ZFS was never part of SunOS.
cyberjock is the biggest troll ever; not even the people actually
involved with FreeNAS (iXsystems) know what to do with him. He does
spend an awful amount of time on the FreeNAS forums helping others, and
they tolerate him on that basis.
Otherwise, he's just someone doing nothing, with a lot of time on his
hands, spewing the same stuff over and over simply because he has
heard about it.
Back to the ECC topic; one core issue with ZFS is that it will
specifically write to the pool even when all you are doing is reading, in
an attempt to correct any data found to have an incorrect checksum.
So say you have corrupted memory: you read from the disk, ZFS believes
the data is faulty (after all, the checksum will be incorrect due to the
faulty RAM) and starts to rewrite the data. That is one scenario where
ZFS will corrupt an otherwise healthy pool, until it's too late and all
your data is gone.
As such, ZFS is indeed more sensitive to bad RAM than other filesystems.
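The scenario above hinges on one detail: a checksum comparison cannot tell whether the on-disk block or the in-memory copy is the bad one. Here is a toy shell sketch of just that detection step (this is an illustration with made-up file names, not ZFS code; ZFS uses fletcher/SHA-256 checksums stored in block pointers, but the comparison logic is the same in spirit):

```shell
# Toy sketch (not ZFS itself): a checksum recorded at write time is
# compared against one recomputed at read time. A mismatch looks the
# same whether the disk block or the in-RAM copy got corrupted.
workdir=$(mktemp -d)
printf 'hello zfs' > "$workdir/block.bin"
sha256sum "$workdir/block.bin" | awk '{print $1}' > "$workdir/block.sha"

printf 'hellp zfs' > "$workdir/block.bin"   # simulate a one-byte corruption

stored=$(cat "$workdir/block.sha")
current=$(sha256sum "$workdir/block.bin" | awk '{print $1}')
if [ "$stored" != "$current" ]; then
    echo "CKSUM error: block would be queued for repair"
fi
rm -rf "$workdir"
```

If the recomputed checksum is wrong because RAM flipped a bit after the read, the "repair" would be aimed at a disk block that was actually fine, which is exactly the failure mode being debated in this thread.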
Having said that: find me *ONE* official source other than the FreeNAS
forum stating that ECC is a minimum requirement (and no, a wiki
written by cyberjock doesn't count). Solaris never said so, FreeBSD
didn't either, nor Sun.
Bad RAM, however, has nothing to do with the occasional bit flip that
would be prevented by using ECC RAM. The probability of a bit flip is
low, very low.
The actual error rate found was several orders of magnitude higher than previous small-scale or laboratory studies: 25,000 to 70,000 errors per billion device hours per megabit (about 2.5–7 × 10^-11 errors/bit·h, i.e. about 5 single-bit errors in 8 gigabytes of RAM per hour using the top-end error rate), and more than 8% of DIMM memory modules affected by errors per year.
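The "about 5 single-bit errors per hour" figure follows directly from the quoted top-end rate; a quick arithmetic check (70,000 is the study's upper bound, so this is a worst-case estimate):

```shell
# Top-end rate from the study: 70,000 errors per 10^9 device-hours per Mbit.
# 8 GiB of RAM is 8 * 1024 * 8 = 65536 Mbit.
awk 'BEGIN {
    rate  = 70000 / 1e9       # errors per Mbit per hour (7e-05)
    mbits = 8 * 1024 * 8      # 8 GiB expressed in megabits
    printf "%.1f errors/hour\n", rate * mbits
}'
# prints: 4.6 errors/hour
```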
Technically, what you qualify below is a truism under any hardware. ZFS is neither more nor less susceptible to RAM failure, as it has nothing to do with ZFS. Anything that gets written to the pool technically is sound. You have chosen a single possible point of failure; what of firmware, drive cache, motherboard, power surges, motion, etc.?
RAM vs. ECC RAM is like consumer drives vs. pro drives in your system; recent long-term studies have shown you don't get much more for the extra money.
I have been running ZFS in production using the past and current versions for OS X on over 60 systems (12 are servers) since Apple kicked ZFS loose. No systems (3 run ECC) have had data corruption or data loss.
Some pools have disappeared on the older ZFS but were easily recovered on modern (current development) and past OpenSolaris, FreeBSD, etc., as I keep clones of 'corrupted' pools for such tests. Almost always, these were the result of connector/cable failure. In that span of time no RAM has failed 'utterly', and all data and tests have shown quality storage. In that time 11 drives have failed and easily been replaced; 4 of those were OS drives, with data stored under ZFS and a regular clone of the OS also stored under ZFS, just in case. All pools are backed up/replicated off site. Probably a lot more than most are doing for data integrity. No, this data I'm providing is not a guarantee. It's just data from someone who has grown to trust ZFS in the real world, for clients that cannot lose data, for the most part due to legal regulations. I trust RAM manufacturers and drive manufacturers equally; I just verify for peace of mind with ZFS.
But if you insist: from "Oracle Solaris 11.1 Administration: ZFS File Systems", "Consider using ECC memory to protect against memory corruption. Silent memory corruption can potentially damage your data." [1]
So, since you've agreed that ZFS is more vulnerable than other file systems to memory errors, and Google says that these errors are a lot more frequent than most people think they are, the question becomes: just how much more vulnerable is ZFS, and is the extent of the corruption likely to be wider or more catastrophic than on other file systems?
It seems to me that if using ZFS without ECC memory puts someone's data at an increased risk over other file systems, then they ought to be told that so that they can make an informed decision. Am I really being unreasonable about this?
  scan: scrub in progress since Mon Mar 31 10:14:52 2014
        1.83T scanned out of 2.43T at 75.2M/s, 2h17m to go
        0 repaired, 75.55% done
config:

        NAME                                   STATE     READ WRITE CKSUM
        moon                                   ONLINE       0     0    91
          mirror-0                             ONLINE       0     0   110
            diskid/DISK-VB92cae47b-31125427p1  ONLINE       0     0   112
            diskid/DISK-VBd1496f13-1a733a17p1  ONLINE       0     0   114
          mirror-1                             ONLINE       0     0    72
            diskid/DISK-VB343ad927-b4a3f4f8p1  ONLINE       0     0    77
            diskid/DISK-VB245c2429-c36e13b0p1  ONLINE       0     0    74
        logs
          diskid/DISK-VB98bcd93f-cdf5113fp1    ONLINE       0     0     0
        cache
          diskid/DISK-VB56c14272-ddacbe50p1    ONLINE       0     0     0

errors: 43 data errors, use '-v' for a list
Doing a scrub is just obliterating my pool.
  scan: scrub in progress since Mon Mar 31 10:14:52 2014
        1.83T scanned out of 2.43T at 75.2M/s, 2h17m to go
        0 repaired, 75.55% done
I'm also running ZFS on FreeBSD 10.0 (RELEASE) in VirtualBox on Windows 7 Ultimate.
Things seem to be pointing to non-ECC RAM causing checksum errors. It looks like I'll have to swap out my memory to ECC RAM if I want to continue this project, otherwise the data is pretty much hosed right now.
On Mar 31, 2014, at 2:23 PM, Eric Jaw <nais...@gmail.com> wrote:

> Doing a scrub is just obliterating my pool.

Is it? I don’t think so:

>   scan: scrub in progress since Mon Mar 31 10:14:52 2014
>         1.83T scanned out of 2.43T at 75.2M/s, 2h17m to go
>         0 repaired, 75.55% done

Note the “0 repaired.”

> errors: 43 data errors, use '-v' for a list

> I'm also running ZFS on FreeBSD 10.0 (RELEASE) in VirtualBox on Windows 7 Ultimate.

Are the disks that the VM sees file-backed or passed-through raw disks?

> Things seem to be pointing to non-ECC RAM causing checksum errors. It looks like I'll have to swap out my memory to ECC RAM if I want to continue this project, otherwise the data is pretty much hosed right now.

Did you actually run a memory tester (e.g., memtest86), or is this just based on gut feeling? Lots of things can manifest as checksum errors. If you import the pool read-only, do successive scrubs find errors in different files (use “zpool status -v”) every time, or are they always in the same files? The former would indeed point to some kind of memory corruption issue, while in the latter case it’s much more likely that your on-disk data somehow got corrupted.
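The suggested read-only experiment might look like the following (the pool name "moon" is taken from the status output earlier in the thread; whether a scrub is permitted on a read-only import depends on the ZFS version in use, so treat this as a sketch rather than a guaranteed recipe):

```shell
# Export the pool, then re-import it read-only so nothing rewrites the disks.
zpool export moon
zpool import -o readonly=on moon

# Scrub and list the affected files; repeat a few times and compare lists.
zpool scrub moon
zpool status -v moon   # same files every run  -> on-disk damage;
                       # different files each run -> points at RAM
```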
As one who has gone through all kinds of permutations to 'corrupt' data under ZFS, I'm calling BS on the RAM as the culprit. As Bjoern mentioned, it sounds like connector issues, something I've seen a lot. However, depending on how you set your pool up, your data may be difficult to access but is most likely complete and healthy.

It amazes me how willing people are to blame something definitively with so little knowledge of what's going on, determined that quoting some discussion out of context will justify the irrationality. These threads are getting redundant and pointless, and I think some of these individuals are best served by Drobo or similar technology.
Jason
Sent from my iPhone 5S
I started using ZFS about a few weeks ago, so a lot of it is still new to me. I'm actually not completely certain about the "proper procedure" for repairing a pool. I'm not sure if I'm supposed to clear the errors before or after the scrub (little things). I'm not sure if it even matters. When I restarted the VM, the checksum counts cleared on their own.
On the first scrub it repaired roughly 1.65MB. None on the second scrub. Even after the scrub there were still 43 data errors. I was expecting they were going to go away.
errors: 43 data errors, use '-v' for a list
This is an excellent question. They're in 'Normal' mode. I remember looking in to this before and decided normal mode should be fine. I might be wrong. So thanks for bringing this up. I'll have to check it out again.
memtest86 and memtest86+ for 18 hours came out okay. I'm on my third scrub and the number of errors has remained at 43. Checksum errors continue to pile up as the pool is getting scrubbed.
I'm just as flustered about this. Thanks again for the input.
The long and the short of it is that most likely you have a failing disk or controller/connector more than anything. I used to run an 8-disk pool of 4 mirrored pairs on a small box without good airflow and slow SATA-150 controllers that were supported by Solaris 10. I ended up replacing the whole system with a new large box with 140mm fans as well as SATA-300 controllers to get better cooling. Over time, every disk failed because of heat issues. Many of my SATA cables failed too. They were cheap junk.
Equipment has to be selected carefully. I have not seen any failing bits in the 3+ years I have been running on the new hardware; all of the disks were replaced 2 years ago, and I have made no changes since. All is good for me with ZFS and non-ECC RAM.
On Mar 31, 2014, at 7:41 PM, Eric Jaw <nais...@gmail.com> wrote:

> I started using ZFS about a few weeks ago, so a lot of it is still new to me. I'm actually not completely certain about "proper procedure" for repairing a pool. I'm not sure if I'm supposed to clear the errors after the scrub, before or after (little things). I'm not sure if it even matters. When I restarted the VM, the checksum counts cleared on its own.

The counts are not maintained across reboots.

> On the first scrub it repaired roughly 1.65MB. None on the second scrub. Even after the scrub there were still 43 data errors. I was expecting they were going to go away.
>
> errors: 43 data errors, use '-v' for a list

What this means is that in these 43 cases, the system was not able to correct the error (i.e., both drives in a mirror returned bad data).

> This is an excellent question. They're in 'Normal' mode. I remember looking in to this before and decided normal mode should be fine. I might be wrong. So thanks for bringing this up. I'll have to check it out again.

The reason I was asking is that these symptoms would also be consistent with something outside the VM writing to the disks behind the VM’s back; that’s unlikely to happen accidentally with disk images, but raw disks are visible to the host OS as such, so it may be as simple as Windows deciding that it should initialize the “unformatted” (really, formatted with an unknown filesystem) devices. Or it could be a RAID controller that stores its array metadata in the last sector of the array’s disks.

> memtest86 and memtest86+ for 18 hours came out okay. I'm on my third scrub and the number of errors has remained at 43. Checksum errors continue to pile up as the pool is getting scrubbed.
>
> I'm just as flustered about this. Thanks again for the input.

Given that you’re seeing a fairly large number of errors in your scrubs, the fact that memtest86 doesn’t find anything at all very strongly suggests that this is not actually a memory issue.
ZFS is lots of parts, in most cases lots of cheap unreliable parts, refurbished parts, yadda yadda. As posted on this thread and many, many others, any issues are probably not ZFS but the parts of the whole. Yes, it could be ZFS, after you confirm that all the parts are pristine, maybe.
<ZFS.vbox>
ZFS issues infrequent flushes (every 5 seconds or so) after the uberblock updates. The flushing infrequency is fairly inconsequential, so no tuning is warranted here. ZFS also issues a flush every time an application requests a synchronous write (O_DSYNC, fsync, NFS commit, and so on).
12.2.2. Responding to guest IDE/SATA flush requests
If desired, the virtual disk images can be flushed when the guest issues the IDE FLUSH CACHE command. Normally these requests are ignored for improved performance. The parameters below are only accepted for disk drives. They must not be set for DVD drives.
VBoxManage setextradata "VM name" "VBoxInternal/Devices/ahci/0/LUN#[x]/Config/IgnoreFlush" 0
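For example, to stop VirtualBox from ignoring the guest's flush requests on the first SATA disk of a VM (the VM name "FreeBSD-ZFS" and port number 0 here are placeholders; substitute your own VM name and repeat for each disk backing the pool):

```shell
# Setting IgnoreFlush to 0 makes VirtualBox honor the guest's
# FLUSH CACHE commands instead of dropping them for performance.
# Run with the VM powered off; the [x] in LUN#[x] is the SATA port number.
VBoxManage setextradata "FreeBSD-ZFS" \
    "VBoxInternal/Devices/ahci/0/LUN#0/Config/IgnoreFlush" 0
```

Without this, a ZFS uberblock update the guest believes is on stable storage may still be sitting in the host's cache, which is exactly the kind of lie that can corrupt a pool when the host crashes.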
My oldest system running ZFS is a Mac Mini Intel Core Duo with 3GB RAM (not ECC); it is the home server for music, TV shows, movies, and some interim backups. The mini has been modded for eSATA and has 6 drives connected. The pool is 2 RAIDZs of 3 mirrored, with copies set at 2. It has been running since ZFS was released from the Apple builds. Lost 3 drives, eventually traced to a new cable that cracked at the connector, which when hot enough expanded, lifting 2 pins free of their connector counterparts and resulting in errors. Visually almost impossible to see. I replaced port multipliers, eSATA cards, RAM, minis, power supply, reinstalled the OS, reinstalled ZFS, restored ZFS data from backup, and finally found the bad connector end, only because it was hot and felt 'funny'. Frustrating, yes, educational also. The happy news is, all the data was fine; wife would have torn me to shreds if photos were missing, music was corrupt, etc., etc. And this was on the old, out-of-date but stable ZFS version we Mac users have been hugging onto for dear life. YMMV.

Never had RAM as the issue, here in the mad science lab across 10 rotating systems or in any client location, pick your decade. However, I don't use cheap RAM either, and I only have 2 systems requiring ECC currently, which don't even connect to ZFS as they are both Xserves with other lives.

--
Jason Belec
Sent from my iPad
I have no dog in this fight, but I wonder if possibly the late discovery of the need for ECC was a factor in Apple's abandoning the ZFS project. Unlikely they'd want to reengineer all their machines for it.
--
Michael Newbery
"I have a soft spot for politicians---it's a bog in the west of Ireland!"
Dave Allen
Hi Phil,