Corrupt data - RAID sata

Bengt Samuelsson

Jan 2, 2009, 7:20:08 AM

Hi,

I need some support with this software RAID system.

I am running RAID5 on four Samsung SpinPoint 500 GB SATA-300 disks, 1.3 TB in total.

It runs at http://sm7jqb.dnsalias.com
I use mdadm on Debian Linux.
CPU 1.2 GHz, 1 GB memory (my older 433 MHz / 512 MB machine doesn't work at all).

I have some corrupt data, and I don't understand why or how to fix it.
Maybe I should slow it down more, but how do I slow it down?

Does anyone have experience with this cheap way of building RAID systems?

Ask for more information and I can get it: logs, setup files, whatever you want
to know.

--
Bengt Samuelsson
Nydalavägen 30 A
352 48 Växjö

+46(0)703686441

http://sm7jqb.se

--
To UNSUBSCRIBE, email to debian-us...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listm...@lists.debian.org

Justin Piszcz

Jan 2, 2009, 7:50:07 AM

If this is an mdadm-related RAID (not dmraid), please show all relevant md
info (mdadm -D /dev/md0). I have cc'd linux-raid on this thread for you.

You'll want to read md.txt in /usr/src/linux/Documentation and read up on the
check and repair commands.

In addition, have you run memtest86 on your system first to make sure it's
not memory related?

Justin.

Twigathy

Jan 2, 2009, 5:10:04 PM

Hi,

I also had problems with the sata_sil driver with more than one
Silicon Image card in the same machine about a year or two back. I don't
remember the specifics, but basically the cards would occasionally
drop the SATA link. This was with Western Digital drives. With a
Samsung 750GB disk, the disk and controller absolutely refused to talk
to each other.

I've since got rid of all but one Silicon Image card and swapped out the
cables, and haven't had problems since. Coincidence? No idea.

04:01.0 RAID bus controller: Silicon Image, Inc. SiI 3512
[SATALink/SATARaid] Serial ATA Controller (rev 01)
Currently running kernel 2.6.24-21

Not much fun when disks don't work properly, is it? :-(

T

2009/1/2 Bernd Schubert <b...@q-leap.de>:
> Hello Bengt,
>
> sil3114 is known to cause data corruption with some disks. So far I only know
> about Seagate, but maybe there issues with newer Samsungs as well?
>
> http://lkml.indiana.edu/hypermail/linux/kernel/0710.2/2035.html
>
> Unfortuntely this issue has been simply ignored by the SATA developers :(
> So if you want to be on the safe side, go an get another controller.
>
> I hope I won't frighten you too much, but it also might be possible one of
> your disks has a problem, I have also seen a few broken disks, which don't
> return what you write to it...
>
>
> Cheers,
> Bernd

> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majo...@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

Bernd Schubert

Jan 2, 2009, 5:50:14 PM

Hello Bengt,

sil3114 is known to cause data corruption with some disks. So far I only know
about Seagate, but maybe there are issues with newer Samsungs as well?

http://lkml.indiana.edu/hypermail/linux/kernel/0710.2/2035.html

Unfortunately this issue has simply been ignored by the SATA developers :(
So if you want to be on the safe side, go and get another controller.

I hope I won't frighten you too much, but it is also possible that one of
your disks has a problem. I have also seen a few broken disks that don't
return what you wrote to them...

Cheers,
Bernd

Redeeman

Jan 2, 2009, 10:20:06 PM

On Fri, 2009-01-02 at 22:30 +0100, Bernd Schubert wrote:
> Hello Bengt,
>
> sil3114 is known to cause data corruption with some disks. So far I only know
> about Seagate, but maybe there issues with newer Samsungs as well?
>
> http://lkml.indiana.edu/hypermail/linux/kernel/0710.2/2035.html
>
> Unfortuntely this issue has been simply ignored by the SATA developers :(
> So if you want to be on the safe side, go an get another controller.

Are you sure? Is this not the "15" or "slow_down" thing mentioned here:
http://ata.wiki.kernel.org/index.php/Sata_sil ?

Bengt Samuelsson

Jan 3, 2009, 4:30:11 AM

Justin Piszcz wrote:

~# mdadm -D /dev/md0
------------------------------
/dev/md0:
Version : 00.90.03
Creation Time : Fri Sep 12 19:08:22 2008
Raid Level : raid5
Array Size : 1465151616 (1397.28 GiB 1500.32 GB)
Device Size : 488383872 (465.76 GiB 500.11 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 0
Persistence : Superblock is persistent

Update Time : Fri Jan 2 16:53:10 2009
State : clean
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0

Layout : left-symmetric
Chunk Size : 128K

UUID : 68439662:90431c4a:5e66217b:5a1a585f (local to host sm7jqb.dnsalias.com)
Events : 0.13406

Number   Major   Minor   RaidDevice   State
   0       8       1         0        active sync   /dev/sda1
   1       8      17         1        active sync   /dev/sdb1
   2       8      33         2        active sync   /dev/sdc1
   3       8      49         3        active sync   /dev/sdd1
------------------------------
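
The sizes reported above are consistent with a four-disk RAID5, where one disk's worth of space is used for parity. A quick arithmetic check, using only the numbers from the mdadm output (sizes are in 1 KiB blocks; the function name is made up for illustration):

```sh
#!/bin/sh
# raid5_usable_kib DEVICES DEVICE_SIZE_KIB
# RAID5 stores one device's worth of parity, so usable space is (n - 1) * size.
raid5_usable_kib() {
    echo $(( ($1 - 1) * $2 ))
}

# Four devices of 488383872 KiB each, as reported by mdadm -D.
raid5_usable_kib 4 488383872    # prints 1465151616, the reported Array Size
```

So the array geometry itself looks as expected; the output gives no hint of a degraded or mis-sized array.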

>
> You'll want to read md.txt in /usr/src/linux/Documentation and read on
> the check and repair commands.
>
> In addition, have you run memtest86 on your system first to make sure
> its not memory related?

I am working on this.
>
> Justin.

Bernd Schubert

Jan 3, 2009, 8:40:06 AM

On Saturday 03 January 2009 03:31:57 Redeeman wrote:
> On Fri, 2009-01-02 at 22:30 +0100, Bernd Schubert wrote:
> > Hello Bengt,
> >
> > sil3114 is known to cause data corruption with some disks. So far I only
> > know about Seagate, but maybe there issues with newer Samsungs as well?
> >
> > http://lkml.indiana.edu/hypermail/linux/kernel/0710.2/2035.html
> >
> > Unfortuntely this issue has been simply ignored by the SATA developers :(
> > So if you want to be on the safe side, go an get another controller.
>
> Are you sure? is this not the "15" or "slow_down" thing mentioned here:
> http://ata.wiki.kernel.org/index.php/Sata_sil ?
>

According to Jeff Garzik and Tejun Heo, the 3114 is not affected by the mod15 bug.
The mod15 workaround also helps in our case, but we are probably just lucky.

https://kerneltrap.org/mailarchive/linux-kernel/2007/10/11/334985/thread

Cheers,
Bernd

--
Bernd Schubert
Q-Leap Networks GmbH

Alan Cox

Jan 3, 2009, 10:00:14 AM

On Fri, 2 Jan 2009 22:30:07 +0100
Bernd Schubert <b...@q-leap.de> wrote:

> Hello Bengt,
>
> sil3114 is known to cause data corruption with some disks.

News to me. There are a few people with lots of SI and other devices
jammed into the same mainboard who had problems but that doesn't appear
to be an SI problem as far as I can tell.

There are some incompatibilities between certain silicon image chips and
Nvidia chipsets needing BIOS workarounds according to the errata docs.

Alan

Bernd Schubert

Jan 3, 2009, 11:40:06 AM

On Sat, Jan 03, 2009 at 01:39:36PM +0000, Alan Cox wrote:
> On Fri, 2 Jan 2009 22:30:07 +0100
> Bernd Schubert <b...@q-leap.de> wrote:
>
> > Hello Bengt,
> >
> > sil3114 is known to cause data corruption with some disks.
>
> News to me. There are a few people with lots of SI and other devices

No no, you just forgot about it, since you even reviewed the patches ;)

http://lkml.org/lkml/2007/10/11/137

> jammed into the same mainboard who had problems but that doesn't appear
> to be an SI problem as far as I can tell.
>
> There are some incompatibilities between certain silicon image chips and
> Nvidia chipsets needing BIOS workarounds according to the errata docs.

Well, I already posted the links to the discussion we had in the past.
The corruption issue is easily reproducible on Tyan S2882 with AMD-8111,
SiI 3114 and ST3250820AS disks. This is on a compute cluster, and we ran into
the problem when a few ST3200822AS disks failed and were replaced by newer 250GB
disks. The 200GB ST3200822AS work perfectly fine, while the 250GB ST3250820AS
disks cause data corruption.

Presently the cluster is empty, so if you want to help me, your help to
properly solve the issue would be highly appreciated (*).

Cheers,
Bernd

PS: The patches I posted work fine on these systems, but they are not upstream
and I really would prefer to find a way in vanilla linux to prevent this
data corruption.

PPS: It's a bit funny with this cluster, since it is located at my university
group and I did and still do many calculations on it myself. But presently I work
for the company we bought it from, which is responsible for maintaining it... ;)

Robert Hancock

Jan 3, 2009, 1:50:07 PM

Bernd Schubert wrote:
> On Sat, Jan 03, 2009 at 01:39:36PM +0000, Alan Cox wrote:
>> On Fri, 2 Jan 2009 22:30:07 +0100
>> Bernd Schubert <b...@q-leap.de> wrote:
>>
>>> Hello Bengt,
>>>
>>> sil3114 is known to cause data corruption with some disks.
>> News to me. There are a few people with lots of SI and other devices
>
> No no, you just forgot about it, since you even reviewed the patches ;)
>
> http://lkml.org/lkml/2007/10/11/137

And Jeff explained why they were not merged:

http://lkml.org/lkml/2007/10/11/166

All the patch does is try to reduce the speed impact of the workaround.
But as was pointed out, they don't reliably solve the problem the
workaround is trying to fix, and besides, the workaround is already not
applied to SiI3114 at all, as it is apparently not applicable on that
controller (only 3112).

>
>> jammed into the same mainboard who had problems but that doesn't appear
>> to be an SI problem as far as I can tell.
>>
>> There are some incompatibilities between certain silicon image chips and
>> Nvidia chipsets needing BIOS workarounds according to the errata docs.

Do you have details of these Alan?

>
> Well, I already posted the the links to the discussion we had in the past.
> The corruption issue is easily reproducible on Tyan S2882 with AMD-8111,
> SiI 3114 and ST3250820AS disks. This is on a compute cluster, and we run into
> the problem, when a few ST3200822AS failed and got replaced by newer 250GB
> disks. The 200GB ST3200822AS work perfectly fine, while the 250GB ST3250820AS
> disks cause data corrution.
>
> Presently the cluster is empty, so if you want do help me, your help to
> properly solve the issue would be highly appreciated (*).
>
>
> Cheers,
> Bernd
>
> PS: The patches I posted work fine on these systems, but they are not upstream
> and I really would prefer to find a way in vanilla linux to prevent this
> data corruption.

Some people have tried turning on the slow_down option or adding their
drive to the mod15 blacklist and found that problems went away, but that
in no way implies that their setup actually needs this workaround, only
that it slows down the IO enough that the problem no longer shows up.
It's a big hammer that can cover up all kinds of other issues and has
confused a lot of people into thinking the mod15write problem is bigger
than it actually is.

Bernd Schubert

Jan 3, 2009, 3:30:13 PM

[sorry sent again, since Robert dropped all mailing list CCs and I didn't
notice first]

On Sat, Jan 03, 2009 at 12:31:12PM -0600, Robert Hancock wrote:
> Bernd Schubert wrote:
>> On Sat, Jan 03, 2009 at 01:39:36PM +0000, Alan Cox wrote:
>>> On Fri, 2 Jan 2009 22:30:07 +0100
>>> Bernd Schubert <b...@q-leap.de> wrote:
>>>
>>>> Hello Bengt,
>>>>
>>>> sil3114 is known to cause data corruption with some disks.
>>> News to me. There are a few people with lots of SI and other devices
>>
>> No no, you just forgot about it, since you even reviewed the patches ;)
>>
>> http://lkml.org/lkml/2007/10/11/137
>
> And Jeff explained why they were not merged:
>
> http://lkml.org/lkml/2007/10/11/166
>
> All the patch does is try to reduce the speed impact of the workaround.
> But as was pointed out, they don't reliably solve the problem the
> workaround is trying to fix, and besides, the workaround is already not
> applied to SiI3114 at all, as it is apparently not applicable on that
> controller (only 3112).

Well, they do reliably solve the problem in our case (before taking the patch
into production I ran checksum tests for about two weeks). Anyway, I entirely
understand that the patches didn't get accepted.

But now more than a year has passed again without doing anything
about it and actually this is what I strongly criticize. Most people don't
know about issues like that and don't run file checksum tests as I now always
do before taking a disk into production. So users are exposed to known
data corruption problems without even being warned about it. Usually
even backups don't help, since one creates a backup of the corrupted data.

So IMHO, the driver should be deactivated for sil3114 until a real solution is
found. And it should only be possible to force-activate it by a kernel flag,
which then also would print a huuuge warning about possible data corruption
(unfortunately most distributions disable initial kernel messages *grumble*).

Cheers,
Bernd
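
A file checksum test of the kind Bernd describes can be sketched roughly as follows. This is not his tool, only a minimal illustration: the function name, the `DROP_CACHES` switch and the default size are assumptions. It writes random data, hashes the stream while it is being written, then re-reads the file and compares.

```sh
#!/bin/sh
# checksum_roundtrip DIR [SIZE_MB]
# Writes SIZE_MB of random data into DIR, hashing the stream as it is written,
# then re-reads the file and compares the two SHA-256 sums.
checksum_roundtrip() {
    dir=$1
    size_mb=${2:-64}
    file="$dir/checksum-test.$$"

    # tee writes the file while sha256sum hashes the very same byte stream.
    want=$(dd if=/dev/urandom bs=1048576 count="$size_mb" 2>/dev/null |
           tee "$file" | sha256sum | cut -d' ' -f1)
    sync

    # Optional and root-only: drop the page cache so the re-read comes from
    # the disk. Without this, the comparison may be served mostly from RAM.
    if [ "${DROP_CACHES:-0}" = 1 ]; then
        echo 3 > /proc/sys/vm/drop_caches 2>/dev/null || true
    fi

    got=$(sha256sum < "$file" | cut -d' ' -f1)
    rm -f "$file"

    if [ "$want" = "$got" ]; then
        echo "OK $want"
    else
        echo "MISMATCH wrote=$want read=$got"
        return 1
    fi
}

# Example (as root, on the filesystem under test):
#   DROP_CACHES=1 checksum_roundtrip /mnt/raid 1024
```

Run in a loop for days, with files larger than RAM, this is the sort of test that exposes silent corruption before a disk or controller goes into production.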

Robert Hancock

Jan 3, 2009, 4:20:06 PM

If the corruption was happening on all such controllers then people
would have been complaining in droves and something would have been
done. It seems much more likely that in this case the problem is some
kind of hardware fault or combination of hardware which is causing the
problem. Unfortunately these kinds of not-easily-reproducible issues tend
to be very hard to track down.

Bernd Schubert

Jan 3, 2009, 4:40:10 PM

Well yes, it only happens with certain drives, but these drives work fine on
other controllers. Still, these are by now known issues and nothing is being
done about them.
I would happily help to solve the problem; I just don't have any knowledge
of hardware programming. What would be your next step if you had remote
access to such a system?

Thanks,
Bernd

James Youngman

Jan 3, 2009, 5:40:09 PM

On Fri, Jan 2, 2009 at 9:30 PM, Bernd Schubert <b...@q-leap.de> wrote:
> Hello Bengt,
>
> sil3114 is known to cause data corruption with some disks. So far I only know
> about Seagate, but maybe there issues with newer Samsungs as well?

I've experienced data corruption with a SII 0680 ACLU144 (on an ST
Labs' A-132 card) with a pair of Seagate ST3300622A drives. I was
using them with MD in a RAID1 configuration.

James.

Robert Hancock

Jan 3, 2009, 6:40:07 PM

Have you been able to track down exactly what kind of corruption is occurring,
i.e. what is happening to the data: is data being zeroed out, are random bits
being flipped, are chunks of a certain size being corrupted, etc.? That would
likely be useful in determining where to go next.
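
One way to answer this kind of question, assuming a known-good copy of a corrupted file is available, is to compare the two byte by byte and summarise where and how they differ. The sketch below relies on `cmp -l`; the function name and output format are made up for illustration.

```sh
#!/bin/sh
# diff_summary GOOD BAD
# Summarises how BAD differs from GOOD: first and last differing byte offset
# (1-based), number of differing bytes, and how many of them are zero in BAD.
diff_summary() {
    # cmp -l prints one line per differing byte: the offset, then the byte
    # value in each file, in octal.
    cmp -l "$1" "$2" 2>/dev/null | awk '
        NR == 1 { first = $1 }
                { last = $1; count++ }
        $3 == 0 { zeroed++ }
        END     { printf "first=%d last=%d count=%d zeroed=%d\n",
                         first, last, count, zeroed }'
}

# Example: diff_summary /backup/file.iso /mnt/raid/file.iso
```

If `count` equals `zeroed`, data is being zeroed out; a span from `first` to `last` that lines up with a sector, page or chunk size points at the layer where the data was lost, while a few scattered bytes look more like bit flips.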

debia...@jamie-thompson.co.uk

Jan 4, 2009, 8:30:19 AM

For what it's worth...

I have a pair of 500GB Maxtor disks using software RAID on my Lenny server with
a sil3114 PCI board. I've not had any trouble with them, (though after all this
I might do some more tests!) Probably helps that the machine is a dinosaur and
is so slow CPU-wise that nothing comes anywhere near any dodgy tolerances.

I bought both disks at once though, and knowing that's a bad idea (but curse
special offers!), when I bought another one months later, I swapped out one of
them for the new drive and put the old drive in my XP PC, also with a sil3114
PCI board.

Here's where it gets interesting...I'm suffering similar corruption to that
mentioned previously in this thread (I reiterate, this is apparently identical
behaviour between the Linux issues and my XP issues) - under load the FS is
corrupted. I've been struggling for days to populate the disk with about 100GB
of data and it'll run fine for an hour or two then something pretty much always
gets corrupted. At one point it got so bad chkdsk would just segfault every
time. I had to format to fix it! I've tried smartctl for win32 and it says the
drive is fine. Everything is well within tolerances for a 6 month old drive
that's been used in a server.

I explicitly chose the sil3114 boards as I was of the impression that Silicon
Image chipsets were good with Linux. Shame they're not good overall it would seem :(

Who can I buy? I first got burned with Initio's utter tosh (So bad that I have
to use it under Linux (thanks to Mr Heo!) as the Windows drivers time out and/or
corrupt!), and now seemingly Silicon Image...

- Jamie

Bengt Samuelsson

Jan 4, 2009, 12:40:06 PM

No memory error

Next?

Bengt Samuelsson

Jan 4, 2009, 1:40:12 PM

No memory error

L1 Cache 128 7361MB/s
L2 Cache 64k 3260MB/s
Mem 1024M 275MB/s

Next to check for?

Justin Piszcz

Jan 4, 2009, 4:30:13 PM

You ran a check on the array and then checked mismatch_cnt?

Justin.

Justin Piszcz

Jan 5, 2009, 6:20:14 AM

On Mon, 5 Jan 2009, Bengt Samuelsson wrote:

> Justin Piszcz wrote:
>> You ran a check on the array and then checked mismatch_cnt?
>
> Like this?
> ~# fsck.ext3 -y -v /dev/md0
> Do you want to see a log? Maybe I do not understand.
> It works 95% of the time; I want it to work 100%.
>
> /var/log/fsck/checkfs
> Log of fsck -C -R -A -a
> Sun Jan 4 16:30:05 2009
>
> fsck 1.40-WIP (14-Nov-2006)
> /: clean, 21179/987712 files, 648652/1973160 blocks
> boot: clean, 30/32128 files, 22378/128488 blocks
> /dev/md0: clean, 142094/183156736 files, 23162450/366287904 blocks (check
> after next mount)
>
> Sun Jan 4 16:30:06 2009
> ----------------
>
> Can I see the sata_sil parameters?
> Or test something there?
> I think it should slow down a bit more.

Run a check on the array:
p34:~# echo check > /sys/devices/virtual/block/md0/md/sync_action
p34:~#

Watch the status:
p34:~# cat /proc/mdstat

When it's done, run:

p34:~# cat /sys/devices/virtual/block/md0/md/mismatch_cnt
0
p34:~#

And show the output.

Justin.
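
The three steps above can be strung together in a small script. This is only a sketch: the function names, `MD_SYSFS` and `POLL_SECONDS` are made up for illustration, and the default path assumes the array is md0 and that its md directory is reachable under /sys/block (on some kernels the same files also appear under /sys/devices/virtual/block).

```sh
#!/bin/sh
# Location of the array's md directory in sysfs (assumption: the array is md0).
MD_SYSFS=${MD_SYSFS:-/sys/block/md0/md}

# Ask md to read every stripe and verify the parity. Needs root.
md_start_check() {
    echo check > "$MD_SYSFS/sync_action"
}

# sync_action reads "idle" again once the check has finished.
md_wait_idle() {
    while [ "$(cat "$MD_SYSFS/sync_action")" != "idle" ]; do
        sleep "${POLL_SECONDS:-60}"
    done
}

# Number of sectors found inconsistent during the last check.
md_mismatches() {
    cat "$MD_SYSFS/mismatch_cnt"
}

# Typical use, as root:
#   md_start_check && md_wait_idle && echo "mismatch_cnt: $(md_mismatches)"
```

A non-zero mismatch_cnt after a check means data and parity disagree somewhere on the array, which is a different question from whether the filesystem on top of it passes fsck.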

Bengt Samuelsson

Jan 6, 2009, 5:40:07 AM

Justin Piszcz wrote:

> Run a check on the array:
> p34:~# echo check > /sys/devices/virtual/block/md0/md/sync_action

I found /sys/block/md0/md/sync_action
idle

I can't find 'check' on my box, but I run this every second day, and it helps a bit.
/etc/cron.d/mdadm
...
5 0 * * 1,3,5 root [ -x /usr/share/mdadm/checkarray ] \
&& /usr/share/mdadm/checkarray --cron --all --quiet
...

> p34:~#
>
> Watch the status:
> p34:~# cat /proc/mdstat

---
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sda1[0] sdd1[3] sdc1[2] sdb1[1]
1465151616 blocks level 5, 128k chunk, algorithm 2 [4/4] [UUUU]

unused devices: <none>
---

>
> When its done, run:
>
> p34:~# cat /sys/devices/virtual/block/md0/md/mismatch_cnt

/sys/block/md0/md/mismatch_cnt 0

> p34:~#
>
> And show the output.
>
> Justin.

I find
/sys/module/sata_sil/parameters/slow_down

0

and some more; it all looks OK to me!

My motherboard is a KT7A-RAID (I am not using its RAID)
AMD 1.2 GHz
I have a PDF manual.

SATA controller board: SA3114-4IR
http://it.us.syba.com/product/43/02/05/index.html

It's -3 °C and sunny outside, so I have to go out in the sun now!

Justin Piszcz

Jan 6, 2009, 5:50:10 AM

cc: linux-ide, linux-raid

There was some talk about corruption on these chips I believe, hopefully
someone on the list can offer further insight.

Tejun Heo

Jan 7, 2009, 12:50:05 AM

Hello,

Bernd Schubert wrote:
> But now more than a year has passed again without doing anything
> about it and actually this is what I strongly criticize. Most people
> don't know about issues like that and don't run file checksum tests
> as I now always do before taking a disk into production. So users
> are exposed to known data corruption problems without even being
> warned about it. Usually even backups don't help, since one creates
> a backup of the corrupted data.

sata_sil being one of the most popular controllers && data corruption
reports seem to be concentrated on certain chipsets, I don't think
it's a widespread problem. In some cases, the corruption was very
reproducible too.

I think it's something related to setting up the PCI side of things.
There have been hints that an incorrect CLS setting was the culprit, and I
tried the combinations but without any success, and unfortunately the
problem wasn't reproducible with the hardware I have here. :-(

Anyways, there was an interesting report that updating the BIOS on the
controller fixed the problem.

http://bugzilla.kernel.org/show_bug.cgi?id=10480

Taking "lspci -nnvvvxxx" output of before and after such BIOS update
will shed some light on what's really going on. Can you please try
that?

> So IMHO, the driver should be deactived for sil3114 until a real
> solution is found. And it only should be possible to force activate
> it by a kernel flag, which then also would print a huuuge warning
> about possible data corruption (unfortunately most distributions
> disables inital kernel messages *grumble*).

The problem is serious but the scope is quite limited and we can't
tell where the problem lies, so I'm not too sure about taking such
drastic measure. Grumble...

Yeah, I really want to see this long-standing problem fixed. To my
knowledge, this is one of two still-open data corruption bugs - the
other one being VIA putting CDB bytes into burnt CD/DVDs.

So, if you can try the BIOS update thing, please give it a shot.

Thanks.

--
tejun

Robert Hancock

Jan 7, 2009, 1:00:12 AM

Tejun Heo wrote:
> Hello,
>
> Bernd Schubert wrote:
>> But now more than a year has passed again without doing anything
>> about it and actually this is what I strongly criticize. Most people
>> don't know about issues like that and don't run file checksum tests
>> as I now always do before taking a disk into production. So users
>> are exposed to known data corruption problems without even being
>> warned about it. Usually even backups don't help, since one creates
>> a backup of the corrupted data.
>
> sata_sil being one of the most popular controllers && data corruption
> reports seem to be concentrated on certain chipsets, I don't think
> it's a wide spread problem. In some cases, the corruption was very
> reproducible too.
>
> I think it's something related to setting up the PCI side of things.
> There have been hints that incorrect CLS setting was the culprit and I
> tried thte combinations but without any success and unfortunately the
> problem wasn't reproducible with the hardware I have here. :-(

As far as the cache line size register, the only thing the documentation
says it controls _directly_ is "With the SiI3114 as a master, initiating
a read transaction, it issues PCI command Read Multiple in place, when
empty space in its FIFO is larger than the value programmed in this
register."

The interesting thing is the commit (log below) that added code to the
driver to check the PCI cache line size register and set up the FIFO
thresholds:

2005/03/24 23:32:42-05:00 Carlos.Pardo
[PATCH] sata_sil: Fix FIFO PCI Bus Arbitration

This patch set default values for the FIFO PCI Bus Arbitration to
avoid data corruption. The root cause is due to our PCI bus master
handling mismatch with the chipset PCI bridge during DMA xfer (write
data to the device). The patch is to setup the DMA fifo threshold so
that there is no chance for the DMA engine to change protocol. We have
seen this problem only on one motherboard.

Signed-off-by: Silicon Image Corporation <cpa...@siliconimage.com>
Signed-off-by: Jeff Garzik <jga...@pobox.com>

What the code's doing is setting the FIFO thresholds, used to assign
priority when requesting a PCI bus read or write operation, based on the
cache line size somehow. It seems to be trusting that the chip's cache
line size register has been set properly by the BIOS. The kernel should
know what the cache line size is but AFAIK normally only sets it when
the driver requests MWI. This chip doesn't support MWI, but it looks
like pci_set_mwi would fix up the CLS register as a side effect..

>
> Anyways, there was an interesting report that updating the BIOS on the
> controller fixed the problem.
>
> http://bugzilla.kernel.org/show_bug.cgi?id=10480
>
> Taking "lspci -nnvvvxxx" output of before and after such BIOS update
> will shed some light on what's really going on. Can you please try
> that?

Yes, that would be quite interesting.. the output even with the current
BIOS would be useful to see if the BIOS set some stupid cache line size
value..

>
>> So IMHO, the driver should be deactived for sil3114 until a real
>> solution is found. And it only should be possible to force activate
>> it by a kernel flag, which then also would print a huuuge warning
>> about possible data corruption (unfortunately most distributions
>> disables inital kernel messages *grumble*).
>
> The problem is serious but the scope is quite limited and we can't
> tell where the problem lies, so I'm not too sure about taking such
> drastic measure. Grumble...
>
> Yeah, I really want to see this long standing problem fixed. To my
> knowledge, this is one of two still open data corruption bugs - the
> other one being via putting CDB bytes into burnt CD/DVDs.
>
> So, if you can try the BIOS update thing, please give it a shot.
>
> Thanks.
>

--

Bernd Schubert

Jan 7, 2009, 10:50:07 AM

Unfortunately I can't update the BIOS/firmware of the SiI 3114 directly; it is
onboard, and its firmware is included in the mainboard BIOS. The installed BIOS
is not the most recent version, but when we initially had the problems, the
first thing we tried was a BIOS update, and it didn't help.

As suggested by Robert, I'm presently trying to figure out the corruption
pattern. Our test tool should easily provide this data. Unfortunately, so far
it hasn't reported anything, although the reiserfs has already become corrupted.
It might be that my colleague, who wrote the tool, recently broke something
(this is the second time it hasn't reported corruption); in the past it worked
reliably. Please give me a few more days...

03:05.0 Mass storage controller [0180]: Silicon Image, Inc. SiI 3114
[SATALink/SATARaid] Serial ATA Controller [1095:3114] (rev 02)
Subsystem: Silicon Image, Inc. SiI 3114 SATALink Controller
[1095:3114]
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR+ FastB2B-
Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Latency: 64, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 19
Region 0: I/O ports at bc00 [size=8]
Region 1: I/O ports at b880 [size=4]
Region 2: I/O ports at b800 [size=8]
Region 3: I/O ports at ac00 [size=4]
Region 4: I/O ports at a880 [size=16]
Region 5: Memory at feafec00 (32-bit, non-prefetchable) [size=1K]
Expansion ROM at fea00000 [disabled] [size=512K]
Capabilities: [60] Power Management version 2
Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA
PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=2 PME-
00: 95 10 14 31 07 01 b0 02 02 00 80 01 10 40 00 00
10: 01 bc 00 00 81 b8 00 00 01 b8 00 00 01 ac 00 00
20: 81 a8 00 00 00 ec af fe 00 00 00 00 95 10 14 31
30: 00 00 a0 fe 60 00 00 00 00 00 00 00 0a 01 00 00
40: 02 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 01 00 22 06 00 40 00 64 00 00 00 00 00 00 00 00
70: 00 00 60 00 d0 d0 09 00 00 00 60 00 00 00 00 00
80: 03 00 00 00 22 00 00 00 00 00 00 00 c8 93 7f ef
90: 00 00 00 09 ff ff 00 00 00 00 00 19 00 00 00 00
a0: 01 31 15 65 dd 62 dd 62 92 43 92 43 09 40 09 40
b0: 01 21 15 65 dd 62 dd 62 92 43 92 43 09 40 09 40
c0: 84 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Cheers,
Bernd
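
For reference, the Cache Line Size and Latency values in the decoded header can be cross-checked against the raw hex dump. In PCI configuration space, offset 0x0c holds the cache line size in units of 32-bit words and offset 0x0d holds the latency timer. A small sketch (the function name is made up) that decodes the `00:` row of `lspci -xxx` output read from standard input:

```sh
#!/bin/sh
# decode_cls
# Reads lspci -xxx style output on stdin and decodes two bytes of the "00:"
# row: offset 0x0c (cache line size, in 32-bit words) and 0x0d (latency timer).
decode_cls() {
    while read -r line; do
        case $line in
            00:*)
                # Split the row into fields: $1 is "00:", $2 is offset 0x00,
                # so offset 0x0c is field 14 and offset 0x0d is field 15.
                set -- $line
                echo "cache_line_bytes=$(( 0x${14} * 4 )) latency_timer=$(( 0x${15} ))"
                return 0
                ;;
        esac
    done
    return 1
}

# Example: lspci -s 03:05.0 -xxx | decode_cls
```

For the dump above, bytes 0x0c and 0x0d are `10` and `40`, i.e. 16 words (64 bytes) and a latency of 64, which agrees with the "Latency: 64, Cache Line Size: 64 bytes" line. So on this board the BIOS has programmed a plausible cache line size.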

--
Bernd Schubert
Q-Leap Networks GmbH

Bengt Samuelsson

Jan 10, 2009, 3:30:13 AM

I hope the problem is solved!

Make sure you do not run dmraid and mdadm at the same time!
So, if you have trouble with mdadm RAID, check that you are not also running dmraid!

I found this.

/etc/rcS.d/S04dmraid
/etc/rcS.d/S25mdadm-raid
/etc/rc0.d/S50mdadm-raid
/etc/rc0.d/S51dmraid
/etc/rc6.d/S50mdadm-raid
/etc/rc6.d/S51dmraid

They now look like this:
/etc/rcS.d/K04dmraid
/etc/rcS.d/S25mdadm-0-sata_sil_slowdown
/etc/rcS.d/S25mdadm-raid
/etc/rc0.d/S50mdadm-raid
/etc/rc0.d/K51dmraid
/etc/rc6.d/S50mdadm-raid
/etc/rc6.d/K51dmraid
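
To check for this situation on another machine, it is enough to look for start links (names beginning with S and a two-digit sequence number) for dmraid in the SysV runlevel directories. A minimal sketch, with a made-up function name and the /etc layout shown above assumed:

```sh
#!/bin/sh
# start_links SERVICE [ETC_DIR]
# Prints every SysV-style start link (S??SERVICE) found under ETC_DIR/rc*.d.
start_links() {
    svc=$1
    etc=${2:-/etc}
    for link in "$etc"/rc*.d/S[0-9][0-9]"$svc"; do
        # Skip the unexpanded pattern when nothing matches.
        [ -e "$link" ] || [ -L "$link" ] || continue
        echo "$link"
    done
}

# Example: start_links dmraid    (no output means nothing starts dmraid)
```

On Debian, `update-rc.d -f dmraid remove` removes such links without renaming them by hand; whether a later package upgrade recreates them is worth re-checking with the function above.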

I also run this just before mdadm-raid starts, to turn on the "15" block write limit.
/etc/init.d/sata_sil_slowdown, just to make it a bit safer.
---
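#!/bin/sh
# Turn on the sata_sil slow_down option before the md arrays are assembled.
# The parameter file is read-only by default, so make it writable first.
chmod 0644 /sys/module/sata_sil/parameters/slow_down
echo 1 > /sys/module/sata_sil/parameters/slow_down
chmod 0444 /sys/module/sata_sil/parameters/slow_down
#
exit 0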
---

As I write this, a test copy has been running for 29 hours, still without
errors. Before, the filesystem would have gone into read-only mode. :-)
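
Two related notes on the slow_down setting, as a sketch rather than a recommendation (Robert's point earlier in the thread still applies: slowing the I/O down can hide a problem without fixing it). The current value can be read back from sysfs, and the option can also be given when the module is loaded instead of being written at runtime. The function name and the configuration file name below are assumptions.

```sh
#!/bin/sh
# module_param MODULE PARAM [SYSFS_ROOT]
# Prints the current value of a loaded kernel module's parameter.
module_param() {
    cat "${3:-/sys}/module/$1/parameters/$2"
}

# Example: module_param sata_sil slow_down

# Setting the option at module load time, via a modprobe configuration line
# (the file name is an assumption):
#   /etc/modprobe.d/sata_sil.conf:
#       options sata_sil slow_down=1
#
# If the driver is built into the kernel, the boot parameter form is:
#   sata_sil.slow_down=1
```

Giving the option at load time has the advantage that it is in place before the driver first configures the disks, which an init script that runs later cannot guarantee.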

Bengt Samuelsson

Jan 10, 2009, 4:00:14 AM

I found this, and it is worth reading:
http://unclean.org/howto/sii3114_linux.html
or here on my site:
http://data-doc.se/howto/sii3114_linux.html

--

Bengt Samuelsson
Nydalavägen 30 A
352 48 Växjö

mobile: +46(0)703686441
http://sm7jqb.se
http://data-doc.se

Robert Hancock

Jan 10, 2009, 7:40:13 PM

Bernd Schubert wrote:
>>> I think it's something related to setting up the PCI side of things.
>>> There have been hints that incorrect CLS setting was the culprit and I
>>> tried the combinations but without any success and unfortunately the
>>> problem wasn't reproducible with the hardware I have here. :-(
>> As far as the cache line size register, the only thing the documentation
>> says it controls _directly_ is "With the SiI3114 as a master, initiating
>> a read transaction, it issues PCI command Read Multiple in place, when
>> empty space in its FIFO is larger than the value programmed in this
>> register."
>>
>> The interesting thing is the commit (log below) that added code to the
>> driver to check the PCI cache line size register and set up the FIFO
>> thresholds:
>>
>> 2005/03/24 23:32:42-05:00 Carlos.Pardo
>> [PATCH] sata_sil: Fix FIFO PCI Bus Arbitration
>>
>> This patch set default values for the FIFO PCI Bus Arbitration to
>> avoid data corruption. The root cause is due to our PCI bus master
>> handling mismatch with the chipset PCI bridge during DMA xfer (write
>> data to the device). The patch is to setup the DMA fifo threshold so
>> that there is no chance for the DMA engine to change protocol. We have
>> seen this problem only on one motherboard.
>>
>> Signed-off-by: Silicon Image Corporation <cpa...@siliconimage.com>
>> Signed-off-by: Jeff Garzik <jga...@pobox.com>


>> What the code's doing is setting the FIFO thresholds, used to assign
>> priority when requesting a PCI bus read or write operation, based on the
>> cache line size somehow. It seems to be trusting that the chip's cache
>> line size register has been set properly by the BIOS. The kernel should
>> know what the cache line size is but AFAIK normally only sets it when
>> the driver requests MWI. This chip doesn't support MWI, but it looks
>> like pci_set_mwi would fix up the CLS register as a side effect..
>>
>>> Anyways, there was an interesting report that updating the BIOS on the
>>> controller fixed the problem.
>>>
>>> http://bugzilla.kernel.org/show_bug.cgi?id=10480
>>>
>>> Taking "lspci -nnvvvxxx" output of before and after such BIOS update
>>> will shed some light on what's really going on. Can you please try
>>> that?
>> Yes, that would be quite interesting.. the output even with the current
>> BIOS would be useful to see if the BIOS set some stupid cache line size
>> value..
>
> Unfortunately I can't update the bios/firmware of the Sil3114 directly; it is
> onboard and the firmware is included in the mainboard bios. The most recent
> bios version is not installed, but when we initially had the
> problems, we first tried a bios update, and it didn't help.

Well if one is really adventurous one can sometimes use some BIOS image
editing tools to install an updated flash image for such integrated
chips into the main BIOS image. This is definitely for advanced users
only though..

>
> As suggested by Robert, I'm presently trying to figure out the corruption
> pattern. Actually our test tool easily provides these data. Unfortunately, it
> so far didn't report anything, although the reiserfs already got corrupted.
> Might be my colleague, who wrote that tool, recently broke something (as it
> is the second time, it doesn't report corruptions), in the past it did work
> reliably. Please give me a few more days...
>
>
> 03:05.0 Mass storage controller [0180]: Silicon Image, Inc. SiI 3114
> [SATALink/SATARaid] Serial ATA Controller [1095:3114] (rev 02)
> Subsystem: Silicon Image, Inc. SiI 3114 SATALink Controller
> [1095:3114]
> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
> Stepping- SERR+ FastB2B-
> Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
> <TAbort- <MAbort- >SERR- <PERR-
> Latency: 64, Cache Line Size: 64 bytes

Well, 64 seems quite reasonable, so that doesn't really give any more
useful information.
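
For anyone comparing their own controller: the PCI Cache Line Size register
holds the size in units of 32-bit words, so the raw register value has to be
multiplied by 4 to get the byte figure that lspci prints. A small sketch (the
function name cls_to_bytes is an illustrative assumption):

```shell
#!/bin/sh
# cls_to_bytes HEX
# Convert a raw Cache Line Size register value (hex, counted in 4-byte
# dwords) into bytes.
cls_to_bytes() {
    echo $(( 0x$1 * 4 ))
}

# Reading the register on real hardware needs pciutils and usually root:
#   cls_to_bytes "$(setpci -s 03:05.0 CACHE_LINE_SIZE)"
# 03:05.0 is the device address from the lspci output quoted above.
cls_to_bytes 10    # prints 64
```

A raw value of 10 (hex) therefore corresponds to the "Cache Line Size: 64
bytes" reported for this board.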

I'm CCing Carlos Pardo at Silicon Image who wrote the patch above, maybe
he has some insight.. Carlos, we have a case here where Bernd is
reporting seeing corruption on an integrated SiI3114 on a Tyan Thunder
K8S Pro (S2882) board, AMD 8111 chipset. This is reportedly occurring
only with certain Seagate drives. Do you have any insight into this
problem, in particular as far as whether the problem worked around in
the patch mentioned above might be related?

There are apparently some reports of issues on NVidia chipsets as well,
though I don't have any details at hand.

--

Robert Hancock

Jan 10, 2009, 7:50:07 PM

Well, Carlos' email bounces, so much for that one. Anyone have any other
contacts at Silicon Image?


Tejun Heo

Jan 11, 2009, 8:50:06 PM

Robert Hancock wrote:
>> There are apparently some reports of issues on NVidia chipsets as
>> well, though I don't have any details at hand.
>
> Well, Carlos' email bounces, so much for that one. Anyone have any other
> contacts at Silicon Image?

I'll ping my SIMG contacts, but I've pinged them about this problem in the
past and it didn't get anywhere.

Thanks.

--
tejun

Aaron Greenspan

Jan 19, 2009, 1:10:13 PM

Hi,

In case this is helpful to anyone, I've also been having similar issues
with the Syba SD-SATA-4P PCI RAID controller, which is based on the
Silicon Image 3114 chip. I've been using Western Digital 250GB hard
drives of various models, however, and there are no nVidia components
present in the server I'm using that I'm aware of.

I've written up a more detailed description of the problem I'm
encountering here:

http://bugzilla.kernel.org/show_bug.cgi?id=10480

I've contacted Syba's support e-mail, as well, but I'm not sure if that
will lead anywhere--probably just to Silicon Image.

Aaron

Aaron Greenspan
President & CEO
Think Computer Corporation

http://www.thinkcomputer.com

Dave Jones

Jan 19, 2009, 2:20:09 PM

On Mon, Jan 12, 2009 at 10:30:42AM +0900, Tejun Heo wrote:
> Robert Hancock wrote:
> >> There are apparently some reports of issues on NVidia chipsets as
> >> well, though I don't have any details at hand.
> >
> > Well, Carlos' email bounces, so much for that one. Anyone have any other
> > contacts at Silicon Image?
>
> I'll ping my SIMG contacts but I've pinged about this problem in the
> past but it didn't get anywhere.

I wish I'd read this thread last week.. I've been beating my head
against this problem all weekend.

I picked up a cheap 3114 card, and found that when I created a filesystem
with it on a 250GB disk, it got massive corruption very quickly.

My experience echoes that of most other people in this thread, but here are
a few data points I've been able to figure out..

I ran badblocks -v -w -s on the disk, and after running
for nearly 24 hours, it reported a huge number of blocks
failing at the upper part of the disk.

I created a partition in this bad area to speed up testing..

   Device Boot      Start         End      Blocks   Id  System
/dev/sde1               1       30000   240974968+  83  Linux
/dev/sde2           30001       30200     1606500   83  Linux
/dev/sde3           30201       30401     1614532+  83  Linux

Rerunning badblocks on /dev/sde2 consistently fails when
it gets to the reading back 0x00 stage.
(Somehow it passes reading back 0xff, 0xaa and 0x55)

I was beginning to suspect the disk may be bad, but when I
moved it to a box with Intel sata, the badblocks run on that
same partition succeeds with no problems at all.

Given the corruption happens at high block numbers, I'm wondering
if maybe there's some kind of wraparound bug happening here.
(Though why only the 0x00 pattern fails would still be a mystery).
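
To put a number on "high block numbers", the start of /dev/sde2 can be worked
out from the fdisk table. The sketch below assumes the usual 255 heads x 63
sectors geometry with 512-byte sectors, which agrees with the table (200
cylinders come out as 1606500 1K-blocks). The comparison against the 28-bit
LBA boundary is only one candidate for a wraparound point; nothing in this
thread establishes that it is involved.

```shell
#!/bin/sh
# Assumed geometry: 255 heads x 63 sectors per cylinder, 512-byte sectors.
sectors_per_cyl=$(( 255 * 63 ))
first_cyl=30001                                   # first cylinder of /dev/sde2
start_sector=$(( (first_cyl - 1) * sectors_per_cyl ))
start_bytes=$(( start_sector * 512 ))
lba28_limit=$(( 1 << 28 ))                        # sectors addressable with 28-bit LBA

echo "start sector: $start_sector"
echo "start offset: $start_bytes bytes"
if [ "$start_sector" -ge "$lba28_limit" ]; then
    echo "above the 28-bit LBA boundary ($lba28_limit sectors)"
else
    echo "below the 28-bit LBA boundary ($lba28_limit sectors)"
fi
```

Under these assumptions the partition starts at sector 481950000, roughly
246.8 GB into the disk, which is beyond the 268435456-sector (about 137.4 GB)
boundary.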

After reading about the firmware update fixing it, I thought I'd
give that a shot. This was pretty much complete fail.

The DOS utility for flashing claims I'm running BIOS 5.0.39,
which looking at http://www.siliconimage.com/support/searchresults.aspx?pid=28&cat=15
is quite ancient. So I tried the newer ones.
Same experience with both 5.4.0.3, and 5.0.73

"BIOS version in the input file is not a newer version"

Forcing it to write anyway gets..

"Data is different at address 65f6h"

Dave

--
http://www.codemonkey.org.uk

Robert Hancock

Jan 19, 2009, 10:10:05 PM

Yeah, that seems a bit bizarre.. Apparently somehow zeros are being
converted into non-zero.. Can you try zeroing out the partition by
dd'ing into it from /dev/zero or something, then dumping it back out to
see what kind of data is showing up?
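
A minimal sketch of that zero-and-read-back test follows. The function names
count_nonzero and show_nonzero are made up for illustration, and the
demonstration runs on a scratch file rather than a real partition, since
writing zeros to a partition destroys everything on it.

```shell
#!/bin/sh
# count_nonzero FILE: print how many bytes in FILE are not 0x00.
count_nonzero() {
    tr -d '\000' < "$1" | wc -c | tr -d ' '
}

# show_nonzero FILE: hex dump of FILE. od collapses runs of identical lines
# into "*", so on a zeroed target only the non-zero regions are shown in full.
show_nonzero() {
    od -A x -t x1 "$1" | head -n 40
}

# On the real hardware the target would be the partition under test, e.g.
#   dd if=/dev/zero of=/dev/sde2 bs=1M      (destroys all data on it)
#   count_nonzero /dev/sde2
#   show_nonzero /dev/sde2

# Safe demonstration on a scratch file of 64 KiB of zeros.
scratch=$(mktemp)
dd if=/dev/zero of="$scratch" bs=4096 count=16 2>/dev/null
count_nonzero "$scratch"    # prints 0
rm -f "$scratch"
```

When reading back from a real device, it is worth making sure the data comes
from the disk rather than from the page cache, for example by rebooting or
re-attaching the drive between the write and the read.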


Aaron Greenspan

Jan 20, 2009, 4:50:09 AM

Hello again,

For the sake of context and in case it got lost in the shuffle, I wrote
this post last night:
http://lists.debian.org/debian-user/2009/01/msg01928.html

Now, after several days of troubleshooting involving ext3-fs errors,
formatting problems, inexplicable read-only filesystem mounts, and
unbelievably bad drivers, my final diagnosis is that basically any
product based on the Silicon Image 3114 chipset isn't worth bothering
with. I'm returning mine.

The particular expansion card I purchased was the Syba SD-SATA-4P, also
known as SiI 3114RAID, and I was trying to use it with four different
250GB Western Digital WD2500JD-00GBB0 and WD2500JD-75FYB0 drives. (I had
two identical Syba cards and I was trying to use them in two identical
Dell PowerEdge 1650 servers.) Over the course of my travails, I tried to
make the Syba card work with CentOS 4.4, CentOS 4.7, CentOS 5.2, Windows
NT 4.0, and Windows XP SP3. I had no idea what I was up against. After
talking to Michael of Syba technical support, I came away with the
impression that it was all my fault, for I had made certain assumptions
in purchasing the card, and these assumptions are what led me astray.

My first assumption, which Michael corrected, was that the expansion
card was designed to be used in servers. He said that instead, Syba was
targeting consumers, and that servers weren't really their bread and
butter, so to speak. I found this odd given that this isn't exactly a
digital photo printer we're talking about--it's a PCI SATA adapter with
4 internal ports.

My next assumption was that "PCI 2.2 compliance" meant that the card was
actually compliant with the PCI 2.2 specifications. According to
http://www.pcisig.com/specifications/conventional/conventional_pci/2_2_checklist.doc,
requirement EE5 on the Expansion Card Electrical Checklist is that the
edge connector key correctly reflects the signaling supported. On the
Syba card, it simply doesn't. While the card is keyed both for 3.3V and
5V PCI slots, implying support of either voltage, according to Michael,
the card will *only* work in 5V slots--the kind he said were more
typically found in consumer systems, and the kind that aren't available
in my Dell PowerEdge 1650 systems.

A third assumption I made was that the Silicon Image drivers for Linux
actually worked on Linux. They don't, as many of the people on this
thread can attest. Michael said that the Syba (really, Silicon Image)
Linux drivers only work up to version 2.6.9 of the kernel, and that
later versions are not supported. (He also said that their engineers
were "probably" working on an updated version.) In any event, none of
the Red Hat drivers on the CD-ROM that came with the product are
recognizable by Red Hat or CentOS from what I can tell.

Yet another assumption I made was that even if I couldn't get the card
to work on my Dell PowerEdge systems because of their 64-bit 3.3V PCI
slots and my apparently foolish desire to run a modern Linux kernel, I
could still get it to work on Windows NT or XP. Depending on whether I
corrected for capitalization errors in the TXTSETUP.OEM file supplied
with several versions of the 32-bit driver, Windows NT 4.0 setup said
that a file was either missing or of an invalid type. The card
was actually recognized by Windows XP, the latest 32-bit SiI SATARAID5
drivers worked, and I could copy data to my drives--but then RAID arrays
that should have been in good shape fell apart spontaneously on
reboot, and no matter what, I couldn't boot off any of the connected
drives. Also, after I ran it the first time, the mysterious (and
impossible-to-use) SiI Windows software kept showing a bizarre 1K
"partition" that visually appeared to be as large as my other
232GB partition in each drive.

Therefore, I think it's safe to say that the Syba SD-SATA-4P and the
corresponding Silicon Image 3114 chipset have a fairly narrow appeal:
they work in 32-bit 5V PCI slots only, on consumer systems only, which
are running only the following operating systems that I own: none.

I hope this helps someone... I just wish I knew of an inexpensive (under
$50) and reliable PCI-to-SATA adapter so I could put some of my hard
drives to use in these servers!

Aaron

Aaron Greenspan
President & CEO
Think Computer Corporation

http://www.thinkcomputer.com

Mark Allums

Jan 20, 2009, 5:10:08 AM

Aaron Greenspan wrote:
> Therefore, I think it's safe to say that the Syba SD-SATA-4P and the
> corresponding Silicon Image 3114 chipset have a fairly narrow appeal:
> they work in 32-bit 5V PCI slots only, on consumer systems only, which
> are running only the following operating systems that I own: none.
>
> I hope this helps someone... I just wish I knew of an inexpensive (under
> $50) and reliable PCI-to-SATA adapter so I could put some of my hard
> drives to use in these servers!
>
> Aaron

I have that chip. It is soldered to my motherboard, an ASUS AMD64
socket 939 machine. I consider it "old". I expect you are running up
against obsolescence. For what it is worth, it worked in Windows XP SP2
with the ASUS supplied driver, but I only tried it as a SATA driver, not
a RAID driver. When ASUS updated that board, they did away with that
chip. That may tell you something.

I am using a US$499.99 Adaptec controller in my newest machine. I
nearly bought the 3ware $899.99 card. I don't regret the Adaptec, and
probably wouldn't regret the 3ware.

Mark Allums

Adrian Levi

Jan 20, 2009, 5:30:10 AM

2009/1/20 Aaron Greenspan <aar...@thinkcomputer.com>:

> Hello again,
>
> For the sake of context and in case it got lost in the shuffle, I wrote this
> post last night: http://lists.debian.org/debian-user/2009/01/msg01928.html
>
> Now, after several days of troubleshooting involving ext3-fs errors,
> formatting problems, inexplicable read-only filesystem mounts, and
> unbelievably bad drivers, my final diagnosis is that basically any product
> based on the Silicon Image 3114 chipset isn't worth bothering with. I'm
> returning mine.

I have been reading this thread with interest.
I have a 4 port card based on the sil 3114 chipset. I have never used
the card as a raid card but only in JBOD mode with mdadm controlling
the raid for me. I'm using 2 Seagate 1TB drives in raid 1.

Haven't noticed anything out of the ordinary so far *Keeps fingers crossed*.

Adrian

--
24x7x365 != 24x7x52 Stupid or bad maths?
<erno> hm. I've lost a machine.. literally _lost_. it responds to
ping, it works completely, I just can't figure out where in my
apartment it is.

Dave Jones

Jan 20, 2009, 3:30:17 PM

On Mon, Jan 19, 2009 at 08:50:06PM -0600, Robert Hancock wrote:

> > Given the corruption happens at high block numbers, I'm wondering
> > if maybe there's some kind of wraparound bug happening here.
> > (Though why only the 0x00 pattern fails would still be a mystery).
>
> Yeah, that seems a bit bizarre.. Apparently somehow zeros are being
> converted into non-zero.. Can you try zeroing out the partition by
> dd'ing into it from /dev/zero or something, then dumping it back out to
> see what kind of data is showing up?

Hmm, it seems the failed firmware update has killed the eeprom.
It no longer reports the right PCI vendor ID.

Dave

--
http://www.codemonkey.org.uk
