freebsd-stable Digest, Vol 241, Issue 5

freebsd-sta...@freebsd.org

Feb 27, 2008, 4:33:40 PM
to freebsd...@freebsd.org
Send freebsd-stable mailing list submissions to
freebsd...@freebsd.org

To subscribe or unsubscribe via the World Wide Web, visit
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
or, via email, send a message with subject or body 'help' to
freebsd-sta...@freebsd.org

You can reach the person managing the list at
freebsd-st...@freebsd.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of freebsd-stable digest..."


Today's Topics:

1. Re: ad0 READ_DMA TIMEOUT errors on install of 7.0-RELEASE
(Jeremy Chadwick)
2. Re: panic in ufs_lookup (6.2-STABLE) (Ivan Voras)
3. Re: em very slow, shared irq... on 6.3p8 (JoaoBR)
4. Tar regression from 6.2 to 6.3 with --strip-components
(Jan Mikkelsen)
5. Re: Tar regression from 6.2 to 6.3 with --strip-components
(Kris Kennaway)
6. Re: Tar regression from 6.2 to 6.3 with --strip-components
(Rink Springer)
7. Re: em very slow, shared irq... on 6.3p8 (Mike Tancsa)
8. Re: ntpd fails to synchronize on FreeBSD 6.3-STABLE
(Pongthep Kulkrisada)
9. Re: ntpd fails to synchronize on FreeBSD 6.3-STABLE
(Jeremy Chadwick)
10. Re: 7.0-PRERELEASE Fatal Trap 12 with sysctl and acpi (Jim Pingle)
11. Documentation of NO_* knobs (Patrick M. Hausen)
12. Re: Documentation of NO_* knobs (Jeremy Chadwick)
13. Re: ad0 READ_DMA TIMEOUT errors on install of 7.0-RELEASE
(Stephen Hurd)
14. Re: fsck_ufs: cannot alloc 94208 bytes for inoinfo (Oliver Fromme)
15. Re: Documentation of NO_* knobs (Patrick M. Hausen)
16. Re: ad0 READ_DMA TIMEOUT errors on install of 7.0-RELEASE
(Jeremy Chadwick)
17. Re: Tar regression from 6.2 to 6.3 with --strip-components
(Tim Kientzle)
18. Re: ad0 READ_DMA TIMEOUT errors on install of 7.0-RELEASE
(Alex Zbyslaw)
19. [solved/workaround?] Re: em very slow, shared irq... on 6.3p8
(Holger Kipp)
20. Re: fsck_ufs: cannot alloc 94208 bytes for inoinfo
(Matthew Dillon)
21. Re: [solved!] Re: em very slow, shared irq... on 6.2p8
(Holger Kipp)
22. Re: [solved!] Re: em very slow, shared irq... on 6.2p8
(Mike Tancsa)
23. Re: ad0 READ_DMA TIMEOUT errors on install of 7.0-RELEASE
(Scott Long)
24. Re: [solved/workaround?] Re: em very slow, shared irq... on
6.3p8 (Jack Vogel)
25. Re: [solved/workaround?] Re: em very slow, shared irq... on
6.3p8 (Holger Kipp)
26. Problems with promise SATA300 TX2Plus (Jisakiel)
27. RE: Tar regression from 6.2 to 6.3 with --strip-components
(Jan Mikkelsen)


----------------------------------------------------------------------

Message: 1
Date: Wed, 27 Feb 2008 04:11:29 -0800
From: Jeremy Chadwick <koi...@freebsd.org>
Subject: Re: ad0 READ_DMA TIMEOUT errors on install of 7.0-RELEASE
To: Stephen Hurd <sh...@sasktel.net>
Cc: freebsd...@freebsd.org
Message-ID: <20080227121...@eos.sc1.parodius.com>
Content-Type: text/plain; charset=us-ascii

On Wed, Feb 27, 2008 at 01:11:36AM -0800, Stephen Hurd wrote:
> ... The corrupted sync message scared the heck out of me:
> Waiting (max 60 seconds) for system process `vnlru' to stop...done
> Waiti
> Synncgi n(gm adxi sk6s0, svencoodnedss )r efmoari nsiynsgte.m. .pr1o0c ess
> `syncer' to stop...8 7 8 3 3 3 1 0 0 0 0 done

http://lists.freebsd.org/pipermail/freebsd-current/2007-October/078145.html
http://lists.freebsd.org/pipermail/freebsd-current/2007-November/079130.html
http://lists.freebsd.org/pipermail/freebsd-current/2007-November/079131.html
http://lists.freebsd.org/pipermail/freebsd-stable/2007-December/038727.html


> And after the reboot, the READ_DMA timeouts were back.

You're not the only one seeing this behaviour. There have been many posts
in the past reporting similar problems. Here's the breakdown:

* Some reporting this problem have been told to replace their ATA or
SATA cables (which have previously been known to be working, but cables
going bad does happen) -- and this has fixed the problem for a couple.

* Some have checked their SMART stats and found their disks to be in
perfect condition.

* Some have switched to alternate operating systems (usually Linux) for
a short while and seen no sign of DMA timeouts.

* Some have replaced the storage controller to no avail, and some have
replaced the entire motherboard to no avail. In some cases (myself
included), replacing the motherboard did in fact help.

However: in your case, your disk does look to have problems based on the
SMART output you provided. It does not matter how new/old the disk is,
by the way. I'll point out the problematic stats. You need to replace
the disk ASAP.

BTW, any SMART stats you see labelled "Offline" will not be updated
until you perform an offline test (smartctl -t short or
smartctl -t long).

> The only "odd" thing I can think of about my system is an unusually high HZ
> value (2386). I'm building a kernel now with 1000 to check if that makes a
> difference.

This is not the cause, rest assured.

> SMART Attributes Data Structure revision number: 16
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
> 5 Reallocated_Sector_Ct 0x0033 253 253 063 Pre-fail Always - 4

This shows you've had 4 reallocated sectors, meaning your disk does in
fact have bad blocks. In 90% of the cases out there, bad blocks
continue to "grow" over time, due to whatever reason (I remember reading
an article explaining it, but I can't for the life of me find the URL).

> 194 Temperature_Celsius 0x0032 253 253 000 Old_age Always - 48

This is excessive, and may be contributing to problems. A hard disk
running at 48C is not a good sign. This should really be somewhere
between high 20s and mid 30s.

> 195 Hardware_ECC_Recovered 0x000a 253 252 000 Old_age Always - 11498

This implies a large number of ECC (error correction) activities have
occurred, but all were successful.

> Error 2 occurred at disk power-on lifetime: 5171 hours (215 days + 11 hours)
> When the command that caused the error occurred, the device was in an unknown state.
> Error 1 occurred at disk power-on lifetime: 5171 hours (215 days + 11 hours)
> When the command that caused the error occurred, the device was in an unknown state.

These are automated SMART log entries confirming the DMA failures. The
fact that SMART saw them means that the disk is also aware of said
issues. These may have been caused by the reallocated sectors. It's
also interesting that the LBAs are different than the ones FreeBSD
reported issues with.
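
For anyone who wants to watch these counters over time, here is a
minimal Python sketch, assuming attribute rows in the same ten-column
layout as the smartctl output quoted above. The function name
parse_smart_attributes and the SAMPLE constant are illustrative only;
the three rows are copied from this thread, not from a fresh run.

```python
# Sketch: pull raw values out of smartctl attribute rows.
# SAMPLE holds the three rows quoted in this thread; the names used
# here are assumptions made for illustration.

SAMPLE = """\
5 Reallocated_Sector_Ct 0x0033 253 253 063 Pre-fail Always - 4
194 Temperature_Celsius 0x0032 253 253 000 Old_age Always - 48
195 Hardware_ECC_Recovered 0x000a 253 252 000 Old_age Always - 11498
"""


def parse_smart_attributes(text):
    """Return {attribute_name: raw_value} for well-formed attribute rows."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Expected columns:
        # ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        try:
            attrs[fields[1]] = int(fields[9])
        except ValueError:
            continue  # skip rows whose raw value is not a plain integer
    return attrs


if __name__ == "__main__":
    attrs = parse_smart_attributes(SAMPLE)
    if attrs.get("Reallocated_Sector_Ct", 0) > 0:
        print("reallocated sectors:", attrs["Reallocated_Sector_Ct"])
```

Logging the Reallocated_Sector_Ct value once a day is usually enough
to see whether the count is still growing.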

My advice to you is: replace the disk ASAP. This problem will only get
worse. Try another hard disk brand too (I don't have anything "against"
Maxtor, but usually it's recommended to avoid a brand you have problems
with until the next time you have issues, then switch brands, etc.
etc...). I'm very fond of Western Digital's SE16, RE, and RE2 series
currently. But avoid Fujitsu and Samsung (both have a long track record
of having buggy drive firmwares, forcing vendors to make custom
workarounds for issues); stick with Seagate, Western Digital, or Maxtor.

--
| Jeremy Chadwick jdc at parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, USA |
| Making life hard for others since 1977. PGP: 4BD6C0CB |

------------------------------

Message: 2
Date: Wed, 27 Feb 2008 13:21:20 +0100
From: Ivan Voras <ivo...@freebsd.org>
Subject: Re: panic in ufs_lookup (6.2-STABLE)
To: freebsd...@freebsd.org
Message-ID: <fq3ke1$e9v$1...@ger.gmane.org>
Content-Type: text/plain; charset="utf-8"

Andrew N. Below wrote:

> RELENG_6 cvsuped at 2007-01-15
>
> what should I check (source revisions) to ensure we have same bug?

This is a quite old version of FreeBSD. I don't know for sure if the bug
was fixed by that time, but you should update to newest RELENG_6 or
RELENG_6_3 to make sure.

Alternatively, something might be corrupting the file system data. Maybe
you should experiment with a few rounds of rsync followed by md5 on both
the source and destination file systems.
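
The comparison step can be sketched roughly as follows, assuming the
rsync run has already finished and both trees are mounted locally. The
function names are hypothetical, and plain Python hashing stands in for
the md5 command here.

```python
# Sketch: compare MD5 digests of every regular file under two trees.
# All names below are illustrative; run this after the copy completes.
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_digests(root):
    """Map each regular file's path (relative to root) to its MD5 digest."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.isfile(full) and not os.path.islink(full):
                result[os.path.relpath(full, root)] = md5_of_file(full)
    return result


def compare_trees(src, dst):
    """Return relative paths that are missing on one side or differ."""
    a, b = tree_digests(src), tree_digests(dst)
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))
```

A non-empty result from compare_trees(src, dst) on a quiet file system
would point at corruption somewhere between the two copies. It does not
say which side is wrong.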


------------------------------

Message: 3
Date: Wed, 27 Feb 2008 10:06:56 -0300
From: JoaoBR <jo...@matik.com.br>
Subject: Re: em very slow, shared irq... on 6.3p8
To: freebsd...@freebsd.org
Cc: Holger Kipp <h...@alogis.com>
Message-ID: <20080227100...@matik.com.br>
Content-Type: text/plain; charset="iso-8859-1"

On Wednesday 27 February 2008 07:49:42 Holger Kipp wrote:
> Hello,
>
> I updated a system with 12 dc-interfaces to a new hardware
> with 14 em-interfaces. Yes, it is a firewall.
> New System is 6.2-RELEASE-p8.
>
> What I now experience between two internal networks (100MBit/s each)
> is the following:
> 1318 packets transmitted, 1317 packets received, 0% packet loss
> round-trip min/avg/max/stddev = 0.387/246.153/2441.392/324.142 ms
>
> tcpdump on the firewall shows similar delays (on the outgoing
> interface).
>
> tcpdump on the system I ping however shows very quick responses
> for incoming packages (ie usually less than a millisecond).
>
> I therefore assume that the problem is between receiving the
> irq from em<x> and getting the data from the interface on the firewall
> itself.
>
> My first option would be to activate polling on em-interfaces - but as
> I did not experience this sort of noticeable slowdown with the old
> dc-based firewall (without polling), maybe someone can shed some light
> on this strange behaviour or has other suggestions as well?

I had a setup with 4 (dlink 4port + 1 nic nfe onboard) which ran extremely
stable on 6.3.

I upgraded the hardware (S939 -> AM2) and used two em 2-port cards. Without
polling it was certainly unusable, but with polling I got very good performance.

Without polling the machine normally hung within a day or less: no message,
it simply froze.

With polling it stays up for some days, up to two weeks, before it freezes
again.

I upgraded to 7.0 with the same result.

Check your setup with vmstat -i; if you see two NICs on the same interrupt,
I guess you will get the same result as I did.

Currently I am running one em 2-port card and two single-port PCI cards,
which seems to be stable.

Anyway, similar setups on Tyan motherboards, where I also have 8 NICs in one
system, do not have this problem.

I also do not have this problem on S939 boards, only on AM2, so I am not sure
where the problem is exactly, but it seems to be hardware related in some way.

You could try changing your power supply, because you might be short on power
with lots of em cards and SATA disks.
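
The vmstat -i check can be sketched in Python roughly like this. The
sample text is made up for illustration and only assumes the usual
"irqN: device ... total rate" row layout; it is not output from either
machine in this thread, and shared_irqs is a hypothetical helper name.

```python
# Sketch: list interrupts that have more than one device attached,
# given text in the layout printed by `vmstat -i`.
# SAMPLE_VMSTAT is invented example data, not real output.

SAMPLE_VMSTAT = """\
interrupt                          total       rate
irq14: ata0                        58233          3
irq16: em0 em1                   9120331        506
irq17: em2                        884120         49
cpu0: timer                     35998214       1999
Total                           46060898       2557
"""


def shared_irqs(vmstat_output):
    """Return {irq_name: [devices]} for interrupts with several devices."""
    shared = {}
    for line in vmstat_output.splitlines():
        if not line.startswith("irq") or ":" not in line:
            continue
        source, rest = line.split(":", 1)
        fields = rest.split()
        # The last two fields are the total and rate columns; whatever
        # precedes them are the device names on this interrupt.
        devices = fields[:-2]
        if len(devices) > 1:
            shared[source] = devices
    return shared


if __name__ == "__main__":
    for irq, devices in shared_irqs(SAMPLE_VMSTAT).items():
        print(irq, "is shared by", ", ".join(devices))
```

With the sample above only irq16 is reported, because em0 and em1 sit
on the same line.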


--

João

This message was scanned by the e-mail system and can be considered safe.
Service provided by Datacenter Matik https://datacenter.matik.com.br


------------------------------

Message: 4
Date: Thu, 28 Feb 2008 00:09:50 +1100
From: "Jan Mikkelsen" <ja...@transactionware.com>
Subject: Tar regression from 6.2 to 6.3 with --strip-components
To: <freebsd...@freebsd.org>
Message-ID: <000801c87942$098dceb0$0301a8c0@STUDYPC>
Content-Type: text/plain; charset="us-ascii"

Hi,

I've just noticed a regression in tar from 6.2 to 6.3:

Running this on 6.2 produces no output:

#!/bin/sh
mkdir -p a b output
touch a/file1 b/file2
tar cf test.tar a b
tar -x -C output --strip-components 1 -f test.tar

On 6.3, it produces this output:

: Invalid empty pathname
: Invalid empty pathname
tar: Error exit delayed from previous errors.

And the tar extraction returns a failure.

Is this known? Should I raise a PR?
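
One possible reading of the two messages, offered as an assumption and
not as a diagnosis: the archive holds the directory entries a/ and b/
as well as the two files, and removing one leading path element from a
directory entry leaves nothing. The Python sketch below only models
that path arithmetic. It is not bsdtar's code, and strip_components is
a hypothetical helper.

```python
# Sketch: what removing N leading path elements does to each archive
# member. This models the arithmetic only; it is not how tar is written.

def strip_components(path, count):
    """Drop `count` leading path elements; return "" if nothing remains."""
    parts = [p for p in path.split("/") if p]
    return "/".join(parts[count:])


if __name__ == "__main__":
    # Entries assumed to be stored by `tar cf test.tar a b`.
    for entry in ["a/", "a/file1", "b/", "b/file2"]:
        stripped = strip_components(entry, 1)
        print(repr(entry), "->", repr(stripped) if stripped else "(empty)")
```

Under that assumption, two of the four entries end up with an empty
name, which would match the count of "Invalid empty pathname" lines.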

Regards,

Jan Mikkelsen

------------------------------

Message: 5
Date: Wed, 27 Feb 2008 15:03:53 +0100
From: Kris Kennaway <kr...@FreeBSD.org>
Subject: Re: Tar regression from 6.2 to 6.3 with --strip-components
To: Jan Mikkelsen <ja...@transactionware.com>
Cc: Tim Kientzle <kien...@FreeBSD.org>, "freebsd...@freebsd.org"
<freebsd...@freebsd.org>
Message-ID: <47C56DC9...@FreeBSD.org>
Content-Type: text/plain; charset=UTF-8; format=flowed

Jan Mikkelsen wrote:
> Hi,
>
> I've just noticed a regression in tar from 6.2 to 6.3:
>
> Running this on 6.2 produces no output:
>
> #!/bin/sh
> mkdir -p a b output
> touch a/file1 b/file2
> tar cf test.tar a b
> tar -x -C output --strip-components 1 -f test.tar
>
> On 6.3, it produces this output:
>
> : Invalid empty pathname
> : Invalid empty pathname
> tar: Error exit delayed from previous errors.
>
> And the tar extraction returns a failure.
>
> Is this known? Should I raise a PR?

Let's see what Tim has to say.

Kris


------------------------------

Message: 6
Date: Wed, 27 Feb 2008 14:49:39 +0100
From: Rink Springer <ri...@freebsd.org>
Subject: Re: Tar regression from 6.2 to 6.3 with --strip-components
To: Jan Mikkelsen <ja...@transactionware.com>
Cc: freebsd...@freebsd.org
Message-ID: <20080227134...@rink.nu>
Content-Type: text/plain; charset=us-ascii

Hi,

On Thu, Feb 28, 2008 at 12:09:50AM +1100, Jan Mikkelsen wrote:
> And the tar extraction returns a failure.

I can confirm this does not work on 8-CURRENT either.

> Is this known? Should I raise a PR?

That seems a good idea to me. Thanks!

--
Rink P.W. Springer - http://rink.nu
"Anyway boys, this is America. Just because you get more votes doesn't
mean you win." - Fox Mulder


------------------------------

Message: 7
Date: Wed, 27 Feb 2008 09:50:16 -0500
From: Mike Tancsa <mi...@sentex.net>
Subject: Re: em very slow, shared irq... on 6.3p8
To: Holger Kipp <h...@alogis.com>, freebsd...@freebsd.org
Message-ID: <200802271452....@lava.sentex.ca>
Content-Type: text/plain; charset="us-ascii"; format=flowed

At 05:49 AM 2/27/2008, Holger Kipp wrote:

>I therefore assume that the problem is between receiving the
>irq from em<x> and getting the data from the interface on the firewall
>itself.

I would try upgrading to 6.3R (there are several em driver bug fixes)
and then try the box with
% cat /boot/loader.conf
hw.pci.enable_msi=1

...if the cards support msi.

I think pciconf -lvc should tell you if the cards and slots support it or not.

Also, if you don't need IPV6, use FAST_IPSEC. It does not need
mpsafe. If you do need IPSEC and IPV6, 7.0R got rid of that restriction.

---Mike

------------------------------

Message: 8
Date: Wed, 27 Feb 2008 21:58:28 +0700
From: Pongthep Kulkrisada <ptkr...@gmail.com>
Subject: Re: ntpd fails to synchronize on FreeBSD 6.3-STABLE
To: freebsd...@freebsd.org
Message-ID: <2008022714...@gmail.com>
Content-Type: text/plain; charset=us-ascii

> This isn't enough time. Please try this instead.
>
> # /etc/rc.d/ntpd stop
> # /etc/rc.d/ntpdate start
>
> This should set your clock, even if only by a few milliseconds.
> Assuming the ntpdate part is successful, continue on:
>
> # tcpdump -l -n -s 8192 -p "port 123"
>
> Now, in another window, execute:
>
> # /etc/rc.d/ntpd start
>
> Then let the tcpdump go for about 15 minutes. You aren't using the
> "iburst" feature on any of the servers, so it will take some time before
> they try to sync up.
Alright, here is the output.

Script started on Wed Feb 27 20:46:19 2008
root@bsdhost:~# /etc/rc.d/ntpd stop
Stopping ntpd.
root@bsdhost:~# /etc/rc.d/ntpdate start
Setting date via ntp.
27 Feb 20:46:53 ntpdate[2000]: no server suitable for synchronization found
root@bsdhost:~# tcpdump -l -n -s 8192 -p "port 123"
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on fxp0, link-type EN10MB (Ethernet), capture size 8192 bytes
20:51:46.149541 IP 192.168.1.10.123 > 122.154.11.67.123: NTPv4, Client, length 48
20:51:47.149369 IP 192.168.1.10.123 > 202.73.37.27.123: NTPv4, Client, length 48
20:51:48.149192 IP 192.168.1.10.123 > 133.243.238.163.123: NTPv4, Client, length 48
20:52:50.148777 IP 192.168.1.10.123 > 122.154.11.67.123: NTPv4, Client, length 48
20:52:50.148818 IP 192.168.1.10.123 > 202.73.37.27.123: NTPv4, Client, length 48
20:52:54.149147 IP 192.168.1.10.123 > 133.243.238.163.123: NTPv4, Client, length 48
20:53:53.149127 IP 192.168.1.10.123 > 202.73.37.27.123: NTPv4, Client, length 48
20:53:56.148700 IP 192.168.1.10.123 > 122.154.11.67.123: NTPv4, Client, length 48
20:53:57.149545 IP 192.168.1.10.123 > 133.243.238.163.123: NTPv4, Client, length 48
20:54:56.149586 IP 192.168.1.10.123 > 202.73.37.27.123: NTPv4, Client, length 48
20:55:02.149701 IP 192.168.1.10.123 > 122.154.11.67.123: NTPv4, Client, length 48
20:55:02.149749 IP 192.168.1.10.123 > 133.243.238.163.123: NTPv4, Client, length 48
20:56:00.148838 IP 192.168.1.10.123 > 202.73.37.27.123: NTPv4, Client, length 48
20:56:05.149070 IP 192.168.1.10.123 > 133.243.238.163.123: NTPv4, Client, length 48
20:56:07.148751 IP 192.168.1.10.123 > 122.154.11.67.123: NTPv4, Client, length 48
20:57:06.148789 IP 192.168.1.10.123 > 202.73.37.27.123: NTPv4, Client, length 48
20:57:11.148992 IP 192.168.1.10.123 > 133.243.238.163.123: NTPv4, Client, length 48
20:57:13.148718 IP 192.168.1.10.123 > 122.154.11.67.123: NTPv4, Client, length 48
20:58:10.149016 IP 192.168.1.10.123 > 202.73.37.27.123: NTPv4, Client, length 48
20:58:17.148954 IP 192.168.1.10.123 > 122.154.11.67.123: NTPv4, Client, length 48
20:58:17.148997 IP 192.168.1.10.123 > 133.243.238.163.123: NTPv4, Client, length 48
20:59:14.149296 IP 192.168.1.10.123 > 202.73.37.27.123: NTPv4, Client, length 48
20:59:22.149048 IP 192.168.1.10.123 > 122.154.11.67.123: NTPv4, Client, length 48
20:59:23.148886 IP 192.168.1.10.123 > 133.243.238.163.123: NTPv4, Client, length 48
21:00:19.149376 IP 192.168.1.10.123 > 202.73.37.27.123: NTPv4, Client, length 48
21:00:26.149309 IP 192.168.1.10.123 > 122.154.11.67.123: NTPv4, Client, length 48
21:00:29.148856 IP 192.168.1.10.123 > 133.243.238.163.123: NTPv4, Client, length 48
21:01:23.149634 IP 192.168.1.10.123 > 202.73.37.27.123: NTPv4, Client, length 48
21:01:30.149579 IP 192.168.1.10.123 > 122.154.11.67.123: NTPv4, Client, length 48
21:01:33.149117 IP 192.168.1.10.123 > 133.243.238.163.123: NTPv4, Client, length 48
21:02:29.149586 IP 192.168.1.10.123 > 202.73.37.27.123: NTPv4, Client, length 48
21:02:35.148637 IP 192.168.1.10.123 > 122.154.11.67.123: NTPv4, Client, length 48
21:02:37.149400 IP 192.168.1.10.123 > 133.243.238.163.123: NTPv4, Client, length 48
21:03:32.149004 IP 192.168.1.10.123 > 202.73.37.27.123: NTPv4, Client, length 48
21:03:40.148796 IP 192.168.1.10.123 > 133.243.238.163.123: NTPv4, Client, length 48
21:03:41.149618 IP 192.168.1.10.123 > 122.154.11.67.123: NTPv4, Client, length 48
21:04:35.149397 IP 192.168.1.10.123 > 202.73.37.27.123: NTPv4, Client, length 48
21:04:45.148898 IP 192.168.1.10.123 > 122.154.11.67.123: NTPv4, Client, length 48
21:04:46.148714 IP 192.168.1.10.123 > 133.243.238.163.123: NTPv4, Client, length 48
21:05:39.149665 IP 192.168.1.10.123 > 202.73.37.27.123: NTPv4, Client, length 48
21:05:50.148985 IP 192.168.1.10.123 > 122.154.11.67.123: NTPv4, Client, length 48
21:05:50.149032 IP 192.168.1.10.123 > 133.243.238.163.123: NTPv4, Client, length 48
21:06:44.148776 IP 192.168.1.10.123 > 202.73.37.27.123: NTPv4, Client, length 48
21:06:54.149246 IP 192.168.1.10.123 > 133.243.238.163.123: NTPv4, Client, length 48
21:06:56.148916 IP 192.168.1.10.123 > 122.154.11.67.123: NTPv4, Client, length 48
21:07:49.148879 IP 192.168.1.10.123 > 202.73.37.27.123: NTPv4, Client, length 48
21:07:58.149478 IP 192.168.1.10.123 > 133.243.238.163.123: NTPv4, Client, length 48
21:08:00.149183 IP 192.168.1.10.123 > 122.154.11.67.123: NTPv4, Client, length 48
21:09:56.149530 IP 192.168.1.10.123 > 202.73.37.27.123: NTPv4, Client, length 48
^C
49 packets captured
230 packets received by filter
0 packets dropped by kernel
root@bsdhost:~# ^D

Script done on Wed Feb 27 21:10:30 2008

I also ran ``/etc/rc.d/ntpd start'' on another root console right after
starting tcpdump. Note that I appended the following lines to /etc/rc.conf
and rebooted prior to running ``/etc/rc.d/ntpdate start''.

# grep ntpdate /etc/rc.conf
ntpdate_enable="YES"
ntpdate_flags="-b time.navy.mi.th asia.pool.ntp.org ntp.nict.jp"

These 3 NTP servers are the same as the ones in /etc/ntp.conf,
and the same as the ones on my other machine running MS Windows.
On my Windows machine behind NAT, I can always sync with these servers.
And on my FreeBSD dial-up connection, I can also sync with these servers.
But ntpdate and ntpd just don't work for the ``machine behind NAT'',
even with the firewalls on the machine and the router disabled.
There must be something wrong, but I don't know what.

Please also note that my clock has drifted, but by less than 1000 seconds,
for sure. According to ntpd(8), that should not be a problem for ntpd.

> > Man pages over there are all FreeBSD 6.2.
> > But some timestamps dated Feb 13, 2008; but footer is still FreeBSD 6.2
>
> I can confirm this on my RELENG_6 box (using 6.3). I wouldn't worry
> about the footer saying 6.2.
OK, thanks. I will not worry about it either.

> The procedure is documented in /usr/src/Makefile, and you should really
> follow that. I haven't read the Handbook's documentation on what to do,
> but the above seems awfully extensive for something that is described in
> the Makefile (which I have used since the days of 4.x without issue).
Noted, thanks.

> I can't help you with anything relating to updating doc-all or your
> /usr/doc tree. I'm not familiar with that, sorry.
No problems, actually it is not necessary.

Thanks,
Pongthep


------------------------------

Message: 9
Date: Wed, 27 Feb 2008 07:08:52 -0800
From: Jeremy Chadwick <koi...@freebsd.org>
Subject: Re: ntpd fails to synchronize on FreeBSD 6.3-STABLE
To: freebsd...@freebsd.org
Message-ID: <20080227150...@eos.sc1.parodius.com>
Content-Type: text/plain; charset=us-ascii

On Wed, Feb 27, 2008 at 09:58:28PM +0700, Pongthep Kulkrisada wrote:
> root@bsdhost:~# /etc/rc.d/ntpdate start
> Setting date via ntp.
> 27 Feb 20:46:53 ntpdate[2000]: no server suitable for synchronization found
> root@bsdhost:~# tcpdump -l -n -s 8192 -p "port 123"
> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> listening on fxp0, link-type EN10MB (Ethernet), capture size 8192 bytes
> 20:51:46.149541 IP 192.168.1.10.123 > 122.154.11.67.123: NTPv4, Client, length 48
> 20:51:47.149369 IP 192.168.1.10.123 > 202.73.37.27.123: NTPv4, Client, length 48
> 20:51:48.149192 IP 192.168.1.10.123 > 133.243.238.163.123: NTPv4, Client, length 48
> [... identical capture lines snipped ...]
> 21:09:56.149530 IP 192.168.1.10.123 > 202.73.37.27.123: NTPv4, Client, length 48
> ^C
> 49 packets captured
> 230 packets received by filter
> 0 packets dropped by kernel

You're not getting responses back from __any__ of those NTP servers. If
you have a firewall *in front* of your BSD box (meaning a separate box,
not ipfw/ipfilter/pf on the same BSD box!), then this is likely the
cause of the problem.

If not, there are two explanations:

* Your uplink provider is filtering incoming packets destined to your
network on port 123.
* If you're using NAT on this BSD box, somehow your NAT rules are
broken, or you're doing something bizarre with network interfaces.

The point here is that you should be seeing NTP responses destined to
192.168.1.10 (which is obviously a NAT'd IP -- again, I don't know where
or how you're doing the NAT), but you're not.

That explains why ntpdate and ntpd both are not working for you.

You also confirm this by stating that you're able to talk to NTP servers
if you use a dial-up connection on the same box, so it really sounds
like you have a NAT problem and not an NTP problem.
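
The same check can be automated. The Python sketch below assumes
capture lines in the format tcpdump printed above, and reports which
servers were queried but never answered. The function names are
illustrative, and SAMPLE_CAPTURE is just the first three lines of the
quoted capture.

```python
# Sketch: find NTP servers that were sent requests but never replied,
# from tcpdump text in the format shown in this thread.
import re
from collections import Counter

LINE_RE = re.compile(
    r"^\S+ IP (?P<src>\d+\.\d+\.\d+\.\d+)\.(?P<sport>\d+) > "
    r"(?P<dst>\d+\.\d+\.\d+\.\d+)\.(?P<dport>\d+): NTPv\d+, (?P<mode>\w+)"
)

SAMPLE_CAPTURE = """\
20:51:46.149541 IP 192.168.1.10.123 > 122.154.11.67.123: NTPv4, Client, length 48
20:51:47.149369 IP 192.168.1.10.123 > 202.73.37.27.123: NTPv4, Client, length 48
20:51:48.149192 IP 192.168.1.10.123 > 133.243.238.163.123: NTPv4, Client, length 48
"""


def count_ntp_traffic(capture, local_ip):
    """Return (sent, received) Counters keyed by the remote address."""
    sent, received = Counter(), Counter()
    for line in capture.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        if match.group("src") == local_ip:
            sent[match.group("dst")] += 1
        elif match.group("dst") == local_ip:
            received[match.group("src")] += 1
    return sent, received


def silent_servers(capture, local_ip):
    """Remote addresses that were queried but sent nothing back."""
    sent, received = count_ntp_traffic(capture, local_ip)
    return sorted(ip for ip in sent if received[ip] == 0)


if __name__ == "__main__":
    print(silent_servers(SAMPLE_CAPTURE, "192.168.1.10"))
```

On a working setup the list should be empty after a few polling
intervals. Here all three servers would be listed.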

--
| Jeremy Chadwick jdc at parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, USA |
| Making life hard for others since 1977. PGP: 4BD6C0CB |

------------------------------

Message: 10
Date: Wed, 27 Feb 2008 10:43:18 -0500
From: Jim Pingle <li...@pingle.org>
Subject: Re: 7.0-PRERELEASE Fatal Trap 12 with sysctl and acpi
To: sta...@freebsd.org
Message-ID: <47C58516...@pingle.org>
Content-Type: text/plain; charset=UTF-8; format=flowed

Jim Pingle wrote:
> Jim Pingle wrote:
>> I'm having some trouble with a SuperMicro SuperServer 6022L-6 that
>> previously ran 7.0-BETA4 without problems. Today, I updated this
>> machine to 7.0-PRERELEASE and now it will not fully boot unless I
>> disable ACPI. A quick search of the PR database didn't turn up
>> anything similar with sysctl and ACPI.

I wiped the machine, installed from the RC3 CD, and it did not crash. If I
update to RELENG_7, the crash comes back. If I go back to RELENG_7_0, there
is no crash.

> Kernel config is GENERIC, with ULE scheduler and "options ASR_COMPAT"

This happens with GENERIC, with no extra options, as well as with my custom
kernel.

>> If I get some time next week I might try a binary search of commits
>> between BETA4 and now, to pinpoint where it stopped working.
>
> As a buildworld/buildkernel takes about an hour and a half on this
> hardware (2x2GHz Xeon), I haven't fully narrowed this down yet. It is
> somewhere between 12/15/2007 (works) and 12/25/2007 (crashes). I glanced
> at the archives between those points but I didn't see any similar
> complaints. The only ACPI references I saw in the archives were
> referring to thermal zone problems, and a commit relating to those.
>
> I'll return to this early next week to see if I can narrow this down
> more precisely.

I tried a binary search of the source tree to narrow down the crash. I found
that one vector for the crash was introduced between 2007/12/19 20:00:00 and
2007/12/19 23:59:00, which left me with only a handful of files to test.

By process of elimination, I found that if I backed some changes out in
machdep.c, the crash stopped.

machdep.c v1.658 2007/08/09 njl - Boots OK
machdep.c v1.658.2.1 2007/12/19 rpaulo - Crashes

The confusing part (to me) is that my next step was to update all the way to
RELENG_7 as of yesterday, then back out those same changes, but the crash
still happened. So either I misidentified the cause of the crash -- which is
quite possible -- or it was reintroduced in some other change (or both!).

I have a debug kernel built now, and I can generate vmcore files at will.
Does anyone have any ideas? Is there some more information that I can gather
that will help find the cause?

Now that I have some more solid information, I'll open a PR.
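
For reference, the search over checkout dates described above can be
sketched as below. This is a simplified Python model, not the procedure
actually used: the crashes callback is hypothetical and stands for
"check out the tree as of that time, build, boot, and see whether it
panics". It also assumes a single change from working to crashing,
which the RELENG_7 result above suggests may not hold.

```python
# Sketch: bisect a time window down to the span in which a crash
# first appears. `crashes` is a caller-supplied, hypothetical test.
from datetime import datetime, timedelta


def bisect_checkout_time(good, bad, crashes, resolution=timedelta(hours=1)):
    """Narrow (good, bad) until the window is no wider than `resolution`.

    `good` must be a time at which the tree works and `bad` a later
    time at which it crashes. Returns the final (good, bad) pair.
    """
    while bad - good > resolution:
        mid = good + (bad - good) / 2
        if crashes(mid):
            bad = mid   # the breaking change is at or before mid
        else:
            good = mid  # the breaking change is after mid
    return good, bad


if __name__ == "__main__":
    # Stand-in for a real build-and-boot test, for demonstration only.
    cutoff = datetime(2007, 12, 19, 21, 30)
    window = bisect_checkout_time(
        datetime(2007, 12, 15), datetime(2007, 12, 25),
        crashes=lambda ts: ts >= cutoff,
    )
    print(window)
```

Each halving costs one buildworld/buildkernel, so a ten-day window
needs about eight builds to reach a one-hour span.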

Jim


------------------------------

Message: 11
Date: Wed, 27 Feb 2008 17:32:55 +0100
From: "Patrick M. Hausen" <hau...@punkt.de>
Subject: Documentation of NO_* knobs
To: sta...@freebsd.org
Message-ID: <20080227163...@hugo10.ka.punkt.de>
Content-Type: text/plain; charset=iso-8859-1

Hi, all,

is there an exhaustive list of all possible NO_* knobs
for make.conf? While experimenting with NanoBSD I found
e.g. that the handbook

http://www.freebsd.org/doc/en_US.ISO8859-1/articles/nanobsd/index.html

mentions

NO_EXAMPLES
NO_SYSCONS
...

- just two out of many. Yet, these are not in the manpage of
make.conf(5) on a 6.3-RELEASE. So, where did the handbook
author find them?

Thanks,
Patrick
--
punkt.de GmbH * Vorholzstr. 25 * 76137 Karlsruhe
Tel. 0721 9109 0 * Fax 0721 9109 100
in...@punkt.de http://www.punkt.de
Gf: Jürgen Egeling AG Mannheim 108285


------------------------------

Message: 12
Date: Wed, 27 Feb 2008 09:18:31 -0800
From: Jeremy Chadwick <koi...@freebsd.org>
Subject: Re: Documentation of NO_* knobs
To: "Patrick M. Hausen" <hau...@punkt.de>
Cc: sta...@freebsd.org
Message-ID: <2008022717...@eos.sc1.parodius.com>
Content-Type: text/plain; charset=us-ascii

On Wed, Feb 27, 2008 at 05:32:55PM +0100, Patrick M. Hausen wrote:
> is there an exhaustive list of all possible NO_* knobs
> for make.conf? While experimenting with NanoBSD I found
> e.g. that the handbook
> - just two out of many. Yet, these are not in the manpage of
> make.conf(5) on a 6.3-RELEASE. So, where did the handbook
> author find them?

I think you're looking for all the WITHOUT knobs in src.conf(5).

Starting with RELENG_7, all of the NO knobs for removal of features in
the base system were moved into /etc/src.conf and renamed to WITHOUT.
Some may also have changed names, so look closely.

Note that src.conf(5) does not apply to RELENG_6 or earlier, where the
knobs are named NO_xxx. You can get a list of those from
/usr/share/examples/etc/make.conf.
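
If the example file turns out to be incomplete, one rough way to see which
NO_* names the build actually tests is to scan the makefiles themselves. The
following is only a sketch: the helper names are made up for illustration,
the sample text is invented, and pointing it at /usr/src or /usr/share/mk
is an assumption about where the relevant makefiles live on a given system.

```python
import re
from pathlib import Path

# Matches make variables such as NO_EXAMPLES or NO_SYSCONS.
KNOB_RE = re.compile(r"\bNO_[A-Z0-9_]+\b")


def find_no_knobs(text):
    """Return the sorted, de-duplicated NO_* names mentioned in makefile text."""
    return sorted(set(KNOB_RE.findall(text)))


def scan_tree(root):
    """Collect NO_* names from every Makefile* and *.mk file below root."""
    knobs = set()
    for path in Path(root).rglob("*"):
        if path.is_file() and (path.name.startswith("Makefile") or path.suffix == ".mk"):
            knobs.update(find_no_knobs(path.read_text(errors="replace")))
    return sorted(knobs)


# Invented sample text, only to show the matching behaviour.
sample = ".if !defined(NO_EXAMPLES)\nSUBDIR+= examples\n.endif\n.if defined(NO_SYSCONS)\n.endif\n"
print(find_no_knobs(sample))

# On a real system one might try (assumed path): scan_tree("/usr/src")
```

A list built this way shows names that appear in makefiles, not names that
are supported or documented, so it is a starting point rather than an
official list.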

Hope this helps.

--
| Jeremy Chadwick jdc at parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, USA |
| Making life hard for others since 1977. PGP: 4BD6C0CB |

------------------------------

Message: 13
Date: Wed, 27 Feb 2008 10:32:48 -0800
From: Stephen Hurd <sh...@sasktel.net>
Subject: Re: ad0 READ_DMA TIMEOUT errors on install of 7.0-RELEASE
To: Jeremy Chadwick <koi...@freebsd.org>
Cc: freebsd...@freebsd.org
Message-ID: <47C5ACD0...@sasktel.net>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Jeremy Chadwick wrote:
>> And after the reboot, the READ_DMA timeouts were back.
>>
>
> You're not the only one seeing this behaviour. There are too many posts
> in the past reporting similar. Here's the breakdown:
>
> * Some have switched to alternate operating systems (usually Linux) for
> a short while and seen no sign of DMA timeouts.
>

Booting the 6.3-RELEASE CD seems to make the problem go away... possibly
7.0 stresses the HD more?

> However: in your case, your disk does look to have problems based on the
> SMART output you provided. It does not matter how new/old the disk is,
> by the way. I'll point out the problematic stats. You need to replace
> the disk ASAP.
>

Yeah, that's pretty much what I figured, the timing (ie: the moment I
boot 7.0-RELEASE) is the only bit that seems fishy. This HD has been
powered on pretty much continuously for around three years. Given that
it's a Maxtor, I'm honestly a bit surprised that it's lasted as well as
it has.

>> SMART Attributes Data Structure revision number: 16
>> Vendor Specific SMART Attributes with Thresholds:
>> ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
>> 5 Reallocated_Sector_Ct 0x0033 253 253 063 Pre-fail Always - 4
>>
>
> This shows you've had 4 reallocated sectors, meaning your disk does in
> fact have bad blocks. In 90% of the cases out there, bad blocks
> continue to "grow" over time, due to whatever reason (I remember reading
> an article explaining it, but I can't for the life of me find the URL).
>

This is unusual now? I've always "known" that a small number of bad
blocks is normal. Time to readjust my knowledge again?

>> 194 Temperature_Celsius 0x0032 253 253 000 Old_age Always - 48
>>
>
> This is excessive, and may be contributing to problems. A hard disk
> running at 48C is not a good sign. This should really be somewhere
> between high 20s and mid 30s.
>

Yeah, this is a known problem with this drive... it's been running hot
for years. I always figured it was due to the rotational speed increase
in commodity drives.

>> Error 2 occurred at disk power-on lifetime: 5171 hours (215 days + 11 hours)
>> When the command that caused the error occurred, the device was in an unknown state.
>> Error 1 occurred at disk power-on lifetime: 5171 hours (215 days + 11 hours)
>> When the command that caused the error occurred, the device was in an unknown state.
>>
>
> These are automated SMART log entries confirming the DMA failures. The
> fact that SMART saw them means that the disk is also aware of said
> issues. These may have been caused by the reallocated sectors. It's
> also interesting that the LBAs are different than the ones FreeBSD
> reported issues with.
>

If that power on lifetime is accurate, that was at least a year ago...
but I can't find any documentation as to when the power-on lifetime
wraps or what it actually indicates. I'm assuming that it is total
power on time since the drive was manufactured. If it's total hours as
a 16-bit integer, it shouldn't wrap. Is there a way of getting the
"current" power-on lifetime value that you're aware of? That power on
minutes is interesting, but its current value is lower than the value at
the error (but higher than the power uptime of the system):
9 Power_On_Minutes 0x0032 219 219 000 Old_age Always - 1061h+40m

Also interesting is that after getting more errors from FreeBSD, I did
not get more errors in smartctl.

> My advice to you is: replace the disk ASAP. This problem will only get
> worse. Try another hard disk brand too (I don't have anything "against"
> Maxtor, but usually it's recommended to avoid a brand you have problems
> with until the next time you have issues, then switch brands, etc.
> etc...). I'm very fond of Western Digital's SE16, RE, and RE2 series
> currently. But avoid Fujitsu and Samsung (both have a long track record
> of having buggy drive firmwares, forcing vendors to make custom
> workarounds for issues); stick with Seagate, Western Digital, or Maxtor.
>

Yeah, that's my plan... but I wanted to stake out some whining rights in
advance so I can do the "But you said it was a bad HD or cable! Now I'm
out $x00 and my system still doesn't work! Help me or I switch to
DragonFly BSD/Desktop BSD/Linux which is perfect and has no problems!"
thing. Then go on Slashdot and post long rambling messages about how
FreeBSD is dead and it doesn't matter that the manpages on any given
Linux box are useless.

------------------------------

Message: 14
Date: Wed, 27 Feb 2008 19:45:47 +0100 (CET)
From: Oliver Fromme <ol...@lurza.secnetix.de>
Subject: Re: fsck_ufs: cannot alloc 94208 bytes for inoinfo
To: freebsd...@FreeBSD.ORG, om-lis...@omx.ch
Message-ID: <200802271845....@lurza.secnetix.de>
Content-Type: text/plain; charset=ISO-8859-1

Olivier Mueller wrote:
> "fsck_ufs: cannot alloc 94208 bytes for inoinfo"
>
> This is what I get after about one hour while trying a fsck on a large
> (1.4TB) partition "broken" since a power outage.
>
> HW: HP DL380G5, under freebsd 6.1/i386, with 1GB of RAM and:

Your fsck will need roughly 1 GB memory per 1 TB file
system size. That formula was posted some time ago on
the -fs mailing list. It only applies with the default
newfs parameters -- if you used other parameters (inode
density, bsize/fsize), fsck's memory requirements are
different.

> Now I added this to the /boot/loader.conf:
> kern.maxdsiz="1073741824" # 1GB
> kern.dfldsiz="1073741824" # 1GB
> and I'm trying the fsck again, but I'm not sure it will help.

Given the above formula, it's probably not enough, so you
might have to increase it further. Also make sure that
you have enough swap space.
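
As a rough illustration of that rule of thumb, here is a minimal sketch. It
assumes binary units (GiB/TiB) and default newfs parameters, and the function
name is made up for the example; it is an estimate, not a measurement of what
fsck will really allocate.

```python
GIB = 1024 ** 3
TIB = 1024 ** 4


def fsck_memory_estimate(fs_bytes, gib_per_tib=1.0):
    """Rule of thumb from this thread: roughly 1 GB of memory per 1 TB of file system."""
    return fs_bytes / TIB * gib_per_tib * GIB


fs_size = 1.4 * TIB      # the 1.4TB partition from the report
maxdsiz = 1073741824     # kern.maxdsiz as set in /boot/loader.conf (1GB)

needed = fsck_memory_estimate(fs_size)
print(f"estimated need: {needed / GIB:.1f} GiB")
print(f"kern.maxdsiz:   {maxdsiz / GIB:.1f} GiB")
print("enough" if maxdsiz >= needed else "probably not enough")
```

With these numbers the estimate comes out above the configured limit, which
is why a larger kern.maxdsiz (plus enough swap) is probably needed.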

If you have a spare box, it might be helpful to transplant
some RAM from it so fsck doesn't have to swap.

Best regards
Oliver

--
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
Handelsregister: Registergericht Muenchen, HRA 74606, Geschäftsfuehrung:
secnetix Verwaltungsgesellsch. mbH, Handelsregister: Registergericht Mün-
chen, HRB 125758, Geschäftsführer: Maik Bachmann, Olaf Erb, Ralf Gebhart

FreeBSD-Dienstleistungen, -Produkte und mehr: http://www.secnetix.de/bsd

"If you think C++ is not overly complicated, just what is a protected
abstract virtual base pure virtual private destructor, and when was the
last time you needed one?"
-- Tom Cargil, C++ Journal


------------------------------

Message: 15
Date: Wed, 27 Feb 2008 20:04:10 +0100
From: "Patrick M. Hausen" <hau...@punkt.de>
Subject: Re: Documentation of NO_* knobs
To: Jeremy Chadwick <koi...@freebsd.org>
Cc: sta...@freebsd.org
Message-ID: <20080227190...@hugo10.ka.punkt.de>
Content-Type: text/plain; charset=iso-8859-1

Hello,

On Wed, Feb 27, 2008 at 09:18:31AM -0800, Jeremy Chadwick wrote:

> I think you're looking for all the WITHOUT knobs in src.conf(5).

I'm running RELENG_6_3 on production machines unlikely to change
any time real soon.

> Note that src.conf(5) does not apply to RELENG_6 or earlier, where the
> knobs are named NO_xxx. You can get a list of those from
> /usr/share/examples/etc/make.conf.

I knew about that file. NO_SYSCONS and NO_EXAMPLES are neither
in this one nor in the manpage for make.conf.
Yet they are in the handbook example for NanoBSD builds.

That's why I'm asking for the "official" exhaustive list.

Thanks for taking the time to answer,
Patrick
--
punkt.de GmbH * Vorholzstr. 25 * 76137 Karlsruhe
Tel. 0721 9109 0 * Fax 0721 9109 100
in...@punkt.de http://www.punkt.de
Gf: Jürgen Egeling AG Mannheim 108285


------------------------------

Message: 16
Date: Wed, 27 Feb 2008 11:05:09 -0800
From: Jeremy Chadwick <koi...@freebsd.org>
Subject: Re: ad0 READ_DMA TIMEOUT errors on install of 7.0-RELEASE
To: Stephen Hurd <sh...@sasktel.net>
Cc: freebsd...@freebsd.org
Message-ID: <2008022719...@eos.sc1.parodius.com>
Content-Type: text/plain; charset=us-ascii

On Wed, Feb 27, 2008 at 10:32:48AM -0800, Stephen Hurd wrote:
> Booting the 6.3-RELEASE CD seems to make the problem go away... possibly
> 7.0 stresses the HD more?

We don't know. The author of the ATA subsystem is somewhat MIA, likely
busy with real-life things (jobs, etc.). My main point was that you're
not alone with DMA timeouts and other oddities, but the reallocated
sector count being non-zero doesn't permit me to say "Yeah, you're
experiencing what others are".

>>> SMART Attributes Data Structure revision number: 16
>>> Vendor Specific SMART Attributes with Thresholds:
>>> ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED
>>> WHEN_FAILED RAW_VALUE
>>> 5 Reallocated_Sector_Ct 0x0033 253 253 063 Pre-fail Always
>>> - 4
>>
>> This shows you've had 4 reallocated sectors, meaning your disk does in
>> fact have bad blocks. In 90% of the cases out there, bad blocks
>> continue to "grow" over time, due to whatever reason (I remember reading
>> an article explaining it, but I can't for the life of me find the URL).
>
> This is unusual now? I've always "known" that a small number of bad blocks
> is normal. Time to readjust my knowledge again?

This isn't normal. The realloc sector count in SMART, when a disk comes
out of the factory, is zero. That number increases only when new
defects are found, and when those sectors are remapped to spares which
are available (there is a limited number of spares). This is also
called a "grown defect list".

This isn't to be confused with what's called a "physical defect list",
which are known sectors/LBAs which are bad, straight out of the factory.
On ATA disks, the manufacturer stores the list in the drive and it's not
modifiable via formatting or even a BIOS-based format (e.g. a SATA RAID
controller); some vendors do implement "low level formatting" via
undocumented ATA commands, which can erase that list, but that's beside
the point. On SCSI disks, the physical defect list is readable and also
erasable via a low-level format, but SCSI disks also have a grown defect
list which is separate.

What I'm trying to say is that your disk already has 4 bad blocks that
the disk firmware itself is aware of, which means chances are there are
others which it hasn't figured out. A high number of ECCs could
indicate that as well.

>>> 194 Temperature_Celsius 0x0032 253 253 000 Old_age Always
>>> - 48
>>
>> This is excessive, and may be contributing to problems. A hard disk
>> running at 48C is not a good sign. This should really be somewhere
>> between high 20s and mid 30s.
>
> Yeah, this is a known problem with this drive... it's been running hot for
> years. I always figured it was due to the rotational speed increase in
> commodity drives.

7200rpm disks shouldn't be running at 48C. None of my 7200rpm disks, in
my barely-cooled FreeBSD box at home (e.g. two 1100rpm fans and that's
it) get anywhere near that. 36C is the highest they've seen -- and
there's 4 stacked right on top of one another.

Heck, on my disks, the SMART warning threshold (set by the manufacturer,
which is Western Digital) is 45C.

10krpm disks probably run hotter, but are not commodity.

>>> Error 2 occurred at disk power-on lifetime: 5171 hours (215 days + 11
>>> hours)
>>> When the command that caused the error occurred, the device was in an
>>> unknown state.
>>> Error 1 occurred at disk power-on lifetime: 5171 hours (215 days + 11
>>> hours)
>>> When the command that caused the error occurred, the device was in an
>>> unknown state.
>>
>> These are automated SMART log entries confirming the DMA failures. The
>> fact that SMART saw them means that the disk is also aware of said
>> issues. These may have been caused by the reallocated sectors. It's
>> also interesting that the LBAs are different than the ones FreeBSD
>> reported issues with.
>
> If that power on lifetime is accurate, that was at least a year ago... but
> I can't find any documentation as to when the power-on lifetime wraps or
> what it actually indicates. I'm assuming that it is total power on time
> since the drive was manufactured.

Correct: it indicates how many hours the drive itself has been powered
on as an aggregate total. E.g. if powered on for 48 hours, then shut
off for 3 hours, then powered on for another 7, the stat would read 55
hours.

> If it's total hours as a 16-bit integer, it shouldn't wrap. Is there a way
> of getting the "current" power-on lifetime value that you're aware of?

I would have to go look at the SMART extension to ATA/SATA and find out
how large the counter is. It probably varies from vendor to vendor too,
as SMART, despite being a standard, has a lot of "loose ends" in the
specification which vendors take advantage of.
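
For what it's worth, the arithmetic behind the "16-bit shouldn't wrap"
remark can be checked quickly. This is only a sketch that assumes the counter
really is an unsigned 16-bit count of hours, which is exactly the part that
has not been verified against the specification here.

```python
HOURS_PER_DAY = 24
HOURS_PER_YEAR = 24 * 365


def days_and_hours(hours):
    """Split a power-on hour count into (days, remaining hours)."""
    return divmod(hours, HOURS_PER_DAY)


# Largest value an unsigned 16-bit hour counter could hold (assumed width).
max_16bit_hours = 2 ** 16 - 1

print(days_and_hours(5171))                         # (215, 11), as in the SMART error log
print(round(max_16bit_hours / HOURS_PER_YEAR, 2))   # 7.48 years before wrapping
print(3 * HOURS_PER_YEAR)                           # 26280 hours for three years powered on
```

Three years of continuous operation is about 26280 hours, well below 65535,
so under that assumption the counter would not have wrapped yet.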

> That power on minutes is interesting, but its current value is lower
> than the value at the error (but higher than the power uptime of the
> system):
> 9 Power_On_Minutes 0x0032 219 219 000 Old_age Always
> - 1061h+40m

smartctl contains an internal database of what attributes map to what
drive model (that's what the "In smartctl database" message is about).
smartctl believes that your Maxtor disk stores the number of powered on
*minutes* in attribute 9, while other vendors store the number of
*hours* in attribute 9. The smartctl(8) manpage outlines some of the
"one-offs" that are required to make smartctl show such counters
correctly, as they vary from vendor to vendor. Look at the -v N,OPTION
flag. You might consider trying '9,raw48' for the attribute, to get it to
print the raw values. Interpreting these values should really be punted
to the smartmontools-users list, though. Bruce can probably help.
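
To compare the formatted attribute 9 value with the hour count in the error
log, the "1061h+40m" string can be converted by hand. A minimal sketch,
assuming the value always has the "<hours>h+<minutes>m" shape seen in this
thread (the function name is made up for the example):

```python
import re


def parse_hours_minutes(raw):
    """Convert a string like '1061h+40m' into a total number of minutes."""
    match = re.fullmatch(r"(\d+)h\+(\d+)m", raw.strip())
    if match is None:
        raise ValueError(f"unexpected format: {raw!r}")
    hours, minutes = int(match.group(1)), int(match.group(2))
    return hours * 60 + minutes


total_minutes = parse_hours_minutes("1061h+40m")
current_hours = total_minutes // 60

print(total_minutes)                # 63700
print(divmod(current_hours, 24))    # (44, 5): about 44 days and 5 hours
print(current_hours < 5171)         # True: lower than the hours logged with the errors
```

That matches the observation that the current value is lower than the value
recorded at the time of the errors; it says nothing about which of the two
numbers is being interpreted correctly.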

> Also interesting is that after getting more errors from FreeBSD, I did not
> get more errors in smartctl.

Right, which goes back to what I said, re: this could indeed be a
FreeBSD issue, since others are reporting DMA timeouts with drives and
controllers that are guaranteed to be functional/working.

>> My advice to you is: replace the disk ASAP. This problem will only get
>> worse. Try another hard disk brand too (I don't have anything "against"
>> Maxtor, but usually it's recommended to avoid a brand you have problems
>> with until the next time you have issues, then switch brands, etc.
>> etc...). I'm very fond of Western Digital's SE16, RE, and RE2 series
>> currently. But avoid Fujitsu and Samsung (both have a long track record
>> of having buggy drive firmwares, forcing vendors to make custom
>> workarounds for issues); stick with Seagate, Western Digital, or Maxtor.
>
> Yeah, that's my plan... but I wanted to stake out some whining rights in
> advance so I can do the "But you said it was a bad HD or cable! Now I'm
> out $x00 and my system still doesn't work! Help me or I switch to
> DragonFly BSD/Desktop BSD/Linux which is perfect and has no problems!"
> thing. Then go on Slashdot and post long rambling messages about how
> FreeBSD is dead and it doesn't matter that the manpages on any given Linux
> box are useless.

Heh. :-) Well, it's all about troubleshooting I suppose. There's no
guaranteed way to pinpoint what piece is responsible; that depressing
fact applies to most technology these days. I can't even trust the term
"transport error" with SCSI mediums in this day and age; is it the
cable, the controller, a controller BIOS bug, bad terminator, or a buggy
OS? Lots of time and money is required to track it all down.

If you replace the disk and you still continue to see DMA errors, then
my vote would be that you're experiencing the same thing others (and
myself, on one occasion) are. I've done my best to bring this issue to
the attention of proper people in recent days, and that's all I can say
on the matter.

--
| Jeremy Chadwick jdc at parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, USA |
| Making life hard for others since 1977. PGP: 4BD6C0CB |

------------------------------

Message: 17
Date: Wed, 27 Feb 2008 10:28:02 -0800
From: Tim Kientzle <kien...@freebsd.org>
Subject: Re: Tar regression from 6.2 to 6.3 with --strip-components
To: Kris Kennaway <kr...@freebsd.org>
Cc: Jan Mikkelsen <ja...@transactionware.com>,
"freebsd...@freebsd.org" <freebsd...@freebsd.org>
Message-ID: <47C5ABB2...@freebsd.org>
Content-Type: text/plain; charset=us-ascii; format=flowed

>> I've just noticed a regression in tar from 6.2 to 6.3:
>>
>> Running this on 6.2 produces no output:
>>
>> #!/bin/sh
>> mkdir -p a b output
>> touch a/file1 b/file2
>> tar cf test.tar a b
>> tar -x -C output --strip-components 1 -f test.tar
>>
>> On 6.3, it produces this output:
>>
>> : Invalid empty pathname
>> : Invalid empty pathname
>> tar: Error exit delayed from previous errors.

Please file this in a PR so I won't lose
track. I don't have time to investigate this
right now, but I should be able to get to it
sometime next week.
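
Until that is investigated, one plausible reading of the two error lines (not
a confirmed diagnosis) is that stripping one component from the directory
entries themselves leaves nothing behind. The sketch below only models the
path arithmetic; the member names are an assumption about what
"tar cf test.tar a b" stores, and the function is not tar's actual code.

```python
def strip_components(path, count):
    """Drop the first `count` path elements; return '' when nothing is left."""
    parts = [p for p in path.split("/") if p]
    return "/".join(parts[count:])


# Assumed archive members for the test case quoted above.
entries = ["a/", "a/file1", "b/", "b/file2"]

for entry in entries:
    stripped = strip_components(entry, 1)
    print(repr(entry), "->", repr(stripped) if stripped else "(empty pathname)")
```

Two of the four entries end up empty, which would line up with the two
"Invalid empty pathname" messages if 6.3 reports such entries instead of
silently skipping them.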

Tim


------------------------------

Message: 18
Date: Wed, 27 Feb 2008 19:20:56 +0000
From: Alex Zbyslaw <xf...@dial.pipex.com>
Subject: Re: ad0 READ_DMA TIMEOUT errors on install of 7.0-RELEASE
To: Stephen Hurd <sh...@sasktel.net>
Cc: freebsd...@freebsd.org
Message-ID: <47C5B818...@dial.pipex.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Stephen Hurd wrote:

> Jeremy Chadwick wrote:
>
>>> SMART Attributes Data Structure revision number: 16
>>> Vendor Specific SMART Attributes with Thresholds:
>>> ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE
>>> UPDATED WHEN_FAILED RAW_VALUE
>>> 5 Reallocated_Sector_Ct 0x0033 253 253 063 Pre-fail
>>> Always - 4
>>
>> This shows you've had 4 reallocated sectors, meaning your disk does in
>> fact have bad blocks. In 90% of the cases out there, bad blocks
>> continue to "grow" over time, due to whatever reason (I remember reading
>> an article explaining it, but I can't for the life of me find the URL).
>
>
> This is unusual now? I've always "known" that a small number of bad
> blocks is normal. Time to readjust my knowledge again?

I have bought disks where the value of Reallocated_Sector_Ct was not 0,
at least by the time I looked at it with smartctl. Nothing bad has
happened to those disks in several years (hope that's not tempting fate).

I have always assumed that what matters is when this value *changes*.
If it's not changing, who cares? smartd will monitor disks and email
you when certain attributes change (e.g. Pre-fail attributes like
Reallocated_Sector_Ct). If it changed, it would mean that an attempt to
write data had failed and that reallocation had happened.

e.g. from smartd.conf

/dev/ad4 -o on -S on -a -m root -M daily

If your Current_Pending_Sector were non-zero you'd be in trouble, I believe.
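
The "only changes matter" idea can be sketched as follows. smartd already
does this kind of tracking itself, so this is purely an illustration; the
column layout is taken from the smartctl output quoted in this thread, and
the second reading (raw value 6) is a made-up example, not real data.

```python
def parse_attribute(line):
    """Parse one row of the smartctl attribute table into a dict."""
    fields = line.split()
    return {
        "id": int(fields[0]),
        "name": fields[1],
        "value": int(fields[3]),
        "worst": int(fields[4]),
        "thresh": int(fields[5]),
        "type": fields[6],
        "raw": fields[9],   # kept as text; some drives report e.g. '1061h+40m'
    }


def changed_prefail(before, after):
    """List Pre-fail attributes whose raw value differs between two readings."""
    old = {a["id"]: a for a in map(parse_attribute, before)}
    changed = []
    for attr in map(parse_attribute, after):
        prev = old.get(attr["id"])
        if attr["type"] == "Pre-fail" and prev is not None and prev["raw"] != attr["raw"]:
            changed.append((attr["name"], prev["raw"], attr["raw"]))
    return changed


first = [
    "5 Reallocated_Sector_Ct 0x0033 253 253 063 Pre-fail Always - 4",
    "194 Temperature_Celsius 0x0032 253 253 000 Old_age Always - 48",
]
# Hypothetical later reading, invented for the example.
second = [
    "5 Reallocated_Sector_Ct 0x0033 253 253 063 Pre-fail Always - 6",
    "194 Temperature_Celsius 0x0032 253 253 000 Old_age Always - 47",
]

print(changed_prefail(first, first))    # []
print(changed_prefail(first, second))   # [('Reallocated_Sector_Ct', '4', '6')]
```

The temperature change is ignored on purpose because it is an Old_age
attribute; only the Pre-fail counter moving is reported.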

0.02, pinch of salt, not an expert, slippery when hot, long time since I
read the specs, etc etc.

--Alex


------------------------------

Message: 19
Date: Wed, 27 Feb 2008 20:50:58 +0100
From: Holger Kipp <h...@alogis.com>
Subject: [solved/workaround?] Re: em very slow, shared irq... on 6.3p8
To: Mike Tancsa <mi...@sentex.net>
Cc: freebsd...@freebsd.org
Message-ID: <20080227195...@intserv.int1.b.intern>
Content-Type: text/plain; charset=us-ascii

On Wed, Feb 27, 2008 at 09:50:16AM -0500, Mike Tancsa wrote:

more details below. as it currently is, polling seems to do
the trick, however handling several em-interfaces with the
same irq (mind you, it is pci) shouldn't cause delays of
up to 1.5 seconds for a simple ping... Therefore I consider
using polling for a nearly idle system more a workaround
than a solution to this problem :-(

> At 05:49 AM 2/27/2008, Holger Kipp wrote:
>
> >I therefore assume that the problem is between receiving the
> >irq from em<x> and getting the data from the interface on the firewall
> >itself.
>
> I would try upgrading to 6.3R (there are several em driver bug fixes)
done. system is now 6.3-RELEASE-p1 which also gave me -c option for pciconf
and msi syscontrols (were missing in the old 6.2).


> and then try the box with
> % cat /boot/loader.conf
> hw.pci.enable_msi=1
>
> ...if the cards support msi.
>
> I think pciconf -lvc should tell you if the cards and slots support it or
> not.

pciconf -lvc says for all em<x>:
cap 05[d0] = MSI supports 1 message, 64 bit
so I assume they do support MSI.

with msi disabled I get

38 packets transmitted, 38 packets received, 0% packet loss
round-trip min/avg/max/stddev = 4.833/228.022/1539.337/339.768 ms

with msi enabled (via sysctl) I get

33 packets transmitted, 33 packets received, 0% packet loss
round-trip min/avg/max/stddev = 1.865/156.421/1339.841/239.375 ms

so looks equally bad (I don't consider 30-40 packets a meaningful sample).
I don't know if it makes any differences if switched on directly in
loader.conf, though.

enabling polling (without MSI) gives

30 packets transmitted, 30 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.366/0.790/1.339/0.290 ms

(maybe I should have used HZ=2000 to keep it below 0.6ms ;-)
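
To put the three runs side by side, the ping summary lines can be parsed
with a few lines of code. This is just a sketch over the numbers quoted
above; the labels are my own, and with 30-40 packets per run the comparison
is indicative at best.

```python
import re

SUMMARY_RE = re.compile(
    r"round-trip min/avg/max/stddev = ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms"
)


def parse_summary(line):
    """Turn a ping summary line into a dict of floats (milliseconds)."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError(f"not a ping summary line: {line!r}")
    return dict(zip(("min", "avg", "max", "stddev"), map(float, match.groups())))


runs = {
    "msi disabled": "round-trip min/avg/max/stddev = 4.833/228.022/1539.337/339.768 ms",
    "msi via sysctl": "round-trip min/avg/max/stddev = 1.865/156.421/1339.841/239.375 ms",
    "polling": "round-trip min/avg/max/stddev = 0.366/0.790/1.339/0.290 ms",
}

for label, line in runs.items():
    stats = parse_summary(line)
    print(f"{label:16s} avg {stats['avg']:8.3f} ms   max {stats['max']:9.3f} ms")
```

Both interrupt-driven runs have averages in the hundreds of milliseconds,
while the polling run stays below 1.5 ms even in the worst case.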

> Also, if you don't need IPV6, use FAST_IPSEC. It does not need
> mpsafe. If you do need IPSEC and IPV6, 7.0R got rid of that restriction.

I think this is enough changes in one go for a production system ;-)

many thanks for the recommendations!

Best regards,
Holger


------------------------------

Message: 20
Date: Wed, 27 Feb 2008 11:53:01 -0800 (PST)
From: Matthew Dillon <dil...@apollo.backplane.com>
Subject: Re: fsck_ufs: cannot alloc 94208 bytes for inoinfo
To: freebsd...@freebsd.org, om-lis...@omx.ch
Message-ID: <200802271953....@apollo.backplane.com>

fsck's memory usage is directly related to the number of inodes and
the number of directories in the filesystem. Directories are
particularly memory intensive.

I've found on my backup system that a UFS1 filesystem with 40 million
inodes is about the limit that can be fsck'd (at least with a 32 bit
architecture). My cron jobs keep my backup partition below that point.
Even in a 64 bit environment you will be limited by swap and the sheer
time it takes for fsck to run. It takes well over 8 hours for my
backup system to fsck.

You can also reduce fsck time by reducing the number of cylinder
groups on the disk. I usually max them out (-c 999 and newfs then
sets it to the maximum, usually in the 50-80 range). This will
improve performance but not reduce the memory required.

-Matt

------------------------------

Message: 21
Date: Wed, 27 Feb 2008 21:09:23 +0100
From: Holger Kipp <h...@alogis.com>
Subject: Re: [solved!] Re: em very slow, shared irq... on 6.2p8
To: freebsd...@freebsd.org
Message-ID: <20080227200...@intserv.int1.b.intern>
Content-Type: text/plain; charset=us-ascii

On Wed, Feb 27, 2008 at 08:50:58PM +0100, Holger Kipp wrote:
> On Wed, Feb 27, 2008 at 09:50:16AM -0500, Mike Tancsa wrote:
>
> more details below. as it currently is, polling seems to do
> the trick, however handling several em-interfaces with the
> same irq (mind you, it is pci) shouldn't cause delays of
> up to 1.5 seconds for a simple ping... Therefore I consider
> using polling for a nearly idle system more a workaround
> than a solution to this problem :-(
[...]
> with msi enabled (via sysctl) I get
>
> 33 packets transmitted, 33 packets received, 0% packet loss
> round-trip min/avg/max/stddev = 1.865/156.421/1339.841/239.375 ms
>
> so looks equally bad (I don't consider 30-40 packets a meaningful sample).
> I don't know if it makes any differences if switched on directly in
> loader.conf, though.

have now activated msi in loader.conf and get very good results again.

38 packets transmitted, 38 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.291/0.425/0.595/0.067 ms

without polling activated. So it was msi after all I needed here.
Maybe this should go into docu for em or ifconfig?

> enabling polling (without MSI) gives
>
> 30 packets transmitted, 30 packets received, 0% packet loss
> round-trip min/avg/max/stddev = 0.366/0.790/1.339/0.290 ms

this is still the same with msi activated in loader.conf

Best regards,
Holger Kipp


------------------------------

Message: 22
Date: Wed, 27 Feb 2008 15:13:59 -0500
From: Mike Tancsa <mi...@sentex.net>
Subject: Re: [solved!] Re: em very slow, shared irq... on 6.2p8
To: Holger Kipp <h...@alogis.com>, freebsd...@freebsd.org
Message-ID: <200802272016....@lava.sentex.ca>
Content-Type: text/plain; charset="us-ascii"; format=flowed

At 03:09 PM 2/27/2008, Holger Kipp wrote:
>On Wed, Feb 27, 2008 at 08:50:58PM +0100, Holger Kipp wrote:
> > On Wed, Feb 27, 2008 at 09:50:16AM -0500, Mike Tancsa wrote:
> >
> > more details below. as it currently is, polling seems to do
> > the trick, however handling several em-interfaces with the
> > same irq (mind you, it is pci) shouldn't cause delays of
> > up to 1.5 seconds for a simple ping... Therefore I consider
> > using polling for a nearly idle system more a workaround
> > than a solution to this problem :-(
>[...]
> > with msi enabled (via sysctl) I get
> >
> > 33 packets transmitted, 33 packets received, 0% packet loss
> > round-trip min/avg/max/stddev = 1.865/156.421/1339.841/239.375 ms
> >
> > so looks equally bad (I don't consider 30-40 packets a meaningful sample).
> > I don't know if it makes any differences if switched on directly in
> > loader.conf, though.
>
>have now activated msi in loader.conf and get very good results again.
>
>38 packets transmitted, 38 packets received, 0% packet loss
>round-trip min/avg/max/stddev = 0.291/0.425/0.595/0.067 ms
>
>without polling activated. So it was msi after all I needed here.
>Maybe this should go into docu for em or ifconfig?

Hi,
Yes, sorry I should have mentioned, you need to reboot. But
I strongly suggest upgrading to 6.3R as there are a number of em bugs
that are fixed.... Perhaps some IRQ issues as well. But for MSI in
general, I think the Intel guy recommended running that way for the NIC.

---Mike

------------------------------

Message: 23
Date: Wed, 27 Feb 2008 13:20:50 -0700
From: Scott Long <sco...@samsco.org>
Subject: Re: ad0 READ_DMA TIMEOUT errors on install of 7.0-RELEASE
To: Stephen Hurd <sh...@sasktel.net>
Cc: Jeremy Chadwick <koi...@freebsd.org>, freebsd...@freebsd.org
Message-ID: <47C5C622...@samsco.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Stephen Hurd wrote:
>>
>> This shows you've had 4 reallocated sectors, meaning your disk does in
>> fact have bad blocks. In 90% of the cases out there, bad blocks
>> continue to "grow" over time, due to whatever reason (I remember reading
>> an article explaining it, but I can't for the life of me find the URL).
>>
>
> This is unusual now? I've always "known" that a small number of bad
> blocks is normal. Time to readjust my knowledge again?

Modern drives hide bad sectors by keeping a pool of spare tracks and
automatically remapping bad sectors to that pool. The problem lies in
when the drive has aged enough that it's run out of spares.

>
>>> 194 Temperature_Celsius 0x0032 253 253 000 Old_age
>>> Always - 48
>>>
>>
>> This is excessive, and may be contributing to problems. A hard disk
>> running at 48C is not a good sign. This should really be somewhere
>> between high 20s and mid 30s.
>>
>
> Yeah, this is a known problem with this drive... it's been running hot
> for years. I always figured it was due to the rotational speed increase
> in commodity drives.

48C is high, but I wouldn't consider it excessive. Drives that start
generating "excessive" heat tend to fail shortly thereafter. I do agree
that the heat is probably shortening the lifespan on the drive.

>
>>> Error 2 occurred at disk power-on lifetime: 5171 hours (215 days + 11
>>> hours)
>>> When the command that caused the error occurred, the device was in
>>> an unknown state.
>>> Error 1 occurred at disk power-on lifetime: 5171 hours (215 days + 11
>>> hours)
>>> When the command that caused the error occurred, the device was in
>>> an unknown state.
>>>
>>
>> These are automated SMART log entries confirming the DMA failures. The
>> fact that SMART saw them means that the disk is also aware of said
>> issues. These may have been caused by the reallocated sectors. It's
>> also interesting that the LBAs are different than the ones FreeBSD
>> reported issues with.
>>
>
> If that power on lifetime is accurate, that was at least a year ago...
> but I can't find any documentation as to when the power-on lifetime
> wraps or what it actually indicates. I'm assuming that it is total
> power on time since the drive was manufactured. If it's total hours as
> a 16-bit integer, it shouldn't wrap. Is there a way of getting the
> "current" power-on lifetime value that you're aware of? That power on
> minutes is interesting, but its current value is lower than the value at
> the error (but higher than the power uptime of the system):
> 9 Power_On_Minutes 0x0032 219 219 000 Old_age Always - 1061h+40m
>
> Also interesting is that after getting more errors from FreeBSD, I did
> not get more errors in smartctl.
>

The errors you're getting from FreeBSD have nothing to do directly with
SMART. The driver thinks that commands are timing out and that the
drive is becoming unresponsive. Whether they actually are is another
question. Given that this problem changes behavior with the version of
FreeBSD that you're running (and even happens in completely virtual
environments like vmware) I'm betting that it's a driver problem and not
a hardware problem, though you should probably think about migrating
your data off to a new drive sometime soon.

I'd like to attack these driver problems. What I need is to spend a
couple of days with an affected system that can reliably reproduce the
problem, instrumenting and testing the driver. I have a number of
theories about what might be going wrong, but nothing that I'm
definitely sure about. If you are willing to set up your system with
remote power and remote serial, and if we knew a reliable way to
reproduce the problem, I could probably have the problem identified and
fixed pretty quickly.

Scott


------------------------------

Message: 24
Date: Wed, 27 Feb 2008 12:16:27 -0800
From: "Jack Vogel" <jfv...@gmail.com>
Subject: Re: [solved/workaround?] Re: em very slow, shared irq... on
6.3p8
To: "Holger Kipp" <h...@alogis.com>
Cc: freebsd...@freebsd.org
Message-ID:
<2a41acea0802271216m279...@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

On Wed, Feb 27, 2008 at 11:50 AM, Holger Kipp <h...@alogis.com> wrote:
> On Wed, Feb 27, 2008 at 09:50:16AM -0500, Mike Tancsa wrote:
>
> more details below. as it currently is, polling seems to do
> the trick, however handling several em-interfaces with the
> same irq (mind you, it is pci) shouldn't cause delays of
> up to 1.5 seconds for a simple ping... Therefore I consider
> using polling for a nearly idle system more a workaround
> than a solution to this problem :-(
>
> > At 05:49 AM 2/27/2008, Holger Kipp wrote:
> >
> > >I therefore assume that the problem is between receiving the
> > >irq from em<x> and getting the data from the interface on the firewall
> > >itself.
> >
> > I would try upgrading to 6.3R (there are several em driver bug fixes)
> done. system is now 6.3-RELEASE-p1 which also gave me -c option for pciconf
> and msi syscontrols (were missing in the old 6.2).
>
>
> > and then try the box with
> > % cat /boot/loader.conf
> > hw.pci.enable_msi=1
> >
> > ...if the cards support msi.
> >
> > I think pciconf -lvc should tell you if the cards and slots support it or
> > not.
>
> pciconf -lvc says for all em<x>:
> cap 05[d0] = MSI supports 1 message, 64 bit
> so I assume they do support MSI.
>
> with msi disabled I get
>
> 38 packets transmitted, 38 packets received, 0% packet loss
> round-trip min/avg/max/stddev = 4.833/228.022/1539.337/339.768 ms
>
> with msi enabled (via sysctl) I get
>
> 33 packets transmitted, 33 packets received, 0% packet loss
> round-trip min/avg/max/stddev = 1.865/156.421/1339.841/239.375 ms
>
> so it looks equally bad (I don't consider 30-40 packets a meaningful sample).
> I don't know if it makes any difference if switched on directly in
> loader.conf, though.
>
> enabling polling (without MSI) gives
>
> 30 packets transmitted, 30 packets received, 0% packet loss
> round-trip min/avg/max/stddev = 0.366/0.790/1.339/0.290 ms
>
> (maybe I should have used HZ=2000 to keep it below 0.6ms ;-)
>
> > Also, if you don't need IPV6, use FAST_IPSEC. It does not need
> > mpsafe. If you do need IPSEC and IPV6, 7.0R got rid of that restriction.
>
> I think this is enough changes in one go for a production system ;-)

Hmmm, something is really broken here that POLL is just bypassing.
What was the adapter type exactly (pciconf -l)? Sorry, but I must have
missed this email earlier.

Jack


------------------------------

Message: 25
Date: Wed, 27 Feb 2008 22:11:44 +0100
From: Holger Kipp <h...@intserv.int1.b.intern>
Subject: Re: [solved/workaround?] Re: em very slow, shared irq... on
6.3p8
To: Jack Vogel <jfv...@gmail.com>
Cc: freebsd...@freebsd.org
Message-ID: <20080227211...@intserv.int1.b.intern>
Content-Type: text/plain; charset=us-ascii

On Wed, Feb 27, 2008 at 12:16:27PM -0800, Jack Vogel wrote:
>
> Hmmm, something is really broken here that POLL is just bypassing.
> What was the adapter type exactly (pciconf -l)? Sorry, but I must have
> missed this email earlier.

The problem is with em0-em13: many of them share IRQ 16/17, and because I was
using 6.2-p8 I did not have MSI enabled/available. Polling was also not
compiled into the kernel.

Polling alone will give times between 0.2 and 1.3 ms (with HZ=1000), and
MSI alone (after being activated at boot in loader.conf) will give the
best results (around 0.4 to 0.5 ms for a ping through the firewall and back
across two LANs). This was with 6.3-p1.
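
For side-by-side comparison, the three ping summaries quoted earlier in this thread can be tabulated with a small script. This is only a sketch: the labels are taken from the wording of the earlier mail, the function name parse_summary is made up for illustration, and the numbers are copied from the quoted output, not re-measured.

```python
import re

# Summary lines copied from the ping output quoted earlier in the thread.
RESULTS = {
    "MSI disabled": "round-trip min/avg/max/stddev = 4.833/228.022/1539.337/339.768 ms",
    "MSI enabled (via sysctl)": "round-trip min/avg/max/stddev = 1.865/156.421/1339.841/239.375 ms",
    "polling (without MSI)": "round-trip min/avg/max/stddev = 0.366/0.790/1.339/0.290 ms",
}

SUMMARY_RE = re.compile(r"=\s*([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+)\s*ms")


def parse_summary(line):
    """Return min/avg/max/stddev (in ms) from a ping summary line."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError("not a ping summary line: %r" % line)
    keys = ("min", "avg", "max", "stddev")
    return dict(zip(keys, (float(value) for value in match.groups())))


def main():
    print("%-26s %10s %10s %10s" % ("configuration", "min", "avg", "max"))
    for label, line in RESULTS.items():
        stats = parse_summary(line)
        print("%-26s %10.3f %10.3f %10.3f"
              % (label, stats["min"], stats["avg"], stats["max"]))


if __name__ == "__main__":
    main()
```

With only 30 to 40 packets per run, the table shows the order of magnitude rather than a precise measurement.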

Still, 6.3-p1 without MSI and polling is very slow in handling IRQs from
the em NICs.

I still don't know exactly why this took up to 1.5 seconds (sometimes even
more) without MSI or POLLING. Please see my previous emails for dmesg.

pciconf -l gives:

em0@pci4:4:0: class=0x020000 card=0x11998086 chip=0x10b58086 rev=0x03 hdr=0x00
em1@pci4:4:1: class=0x020000 card=0x11998086 chip=0x10b58086 rev=0x03 hdr=0x00
em2@pci4:6:0: class=0x020000 card=0x11998086 chip=0x10b58086 rev=0x03 hdr=0x00
em3@pci4:6:1: class=0x020000 card=0x11998086 chip=0x10b58086 rev=0x03 hdr=0x00
em4@pci12:0:0: class=0x020000 card=0x10a48086 chip=0x10a48086 rev=0x06 hdr=0x00
em5@pci12:0:1: class=0x020000 card=0x10a48086 chip=0x10a48086 rev=0x06 hdr=0x00
em6@pci13:0:0: class=0x020000 card=0x10a48086 chip=0x10a48086 rev=0x06 hdr=0x00
em7@pci13:0:1: class=0x020000 card=0x10a48086 chip=0x10a48086 rev=0x06 hdr=0x00
em8@pci16:0:0: class=0x020000 card=0x10a48086 chip=0x10a48086 rev=0x06 hdr=0x00
em9@pci16:0:1: class=0x020000 card=0x10a48086 chip=0x10a48086 rev=0x06 hdr=0x00
em10@pci17:0:0: class=0x020000 card=0x10a48086 chip=0x10a48086 rev=0x06 hdr=0x00
em11@pci17:0:1: class=0x020000 card=0x10a48086 chip=0x10a48086 rev=0x06 hdr=0x00
em12@pci18:0:0: class=0x020000 card=0x108c15d9 chip=0x108c8086 rev=0x03 hdr=0x00
em13@pci19:0:0: class=0x020000 card=0x109a15d9 chip=0x109a8086 rev=0x00 hdr=0x00

We have one quad PCI-X 64bit card (the first four) and two quad PCI-Express cards.
The last two devices are on-board ports.

device = '82546GB PRO/1000 GT Quad Port Server Adapter'
device = '82546GB PRO/1000 GT Quad Port Server Adapter'
device = '82546GB PRO/1000 GT Quad Port Server Adapter'
device = '82546GB PRO/1000 GT Quad Port Server Adapter'
device = '82571EB Gigabit Ethernet Controller'
device = '82571EB Gigabit Ethernet Controller'
device = '82571EB Gigabit Ethernet Controller'
device = '82571EB Gigabit Ethernet Controller'
device = '82571EB Gigabit Ethernet Controller'
device = '82571EB Gigabit Ethernet Controller'
device = '82571EB Gigabit Ethernet Controller'
device = '82571EB Gigabit Ethernet Controller'
device = '82573E Intel Corporation 82573E Gigabit Ethernet Controller (Copper)'
device = '82573L Intel PRO/1000 PL Network Adaptor'
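
In case it helps with matching the two listings above: the chip= field in pciconf -l output packs the PCI device ID into the upper 16 bits and the vendor ID into the lower 16 bits (0x8086 is Intel). A minimal sketch that splits the field is shown here; the function name parse_pciconf and the sample subset are chosen for illustration only, and the parser assumes the older selector format shown in this mail.

```python
import re

# A few lines copied from the pciconf -l output above.
SAMPLE = """\
em0@pci4:4:0: class=0x020000 card=0x11998086 chip=0x10b58086 rev=0x03 hdr=0x00
em4@pci12:0:0: class=0x020000 card=0x10a48086 chip=0x10a48086 rev=0x06 hdr=0x00
em12@pci18:0:0: class=0x020000 card=0x108c15d9 chip=0x108c8086 rev=0x03 hdr=0x00
em13@pci19:0:0: class=0x020000 card=0x109a15d9 chip=0x109a8086 rev=0x00 hdr=0x00
"""

# name@selector: key=value key=value ...
LINE_RE = re.compile(r"^(?P<name>\w+)@(?P<sel>pci[\d:]+):\s+(?P<fields>.*)$")


def parse_pciconf(text):
    """Split each pciconf -l line into name, selector, vendor and device ID."""
    devices = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip anything that is not a device line
        fields = dict(item.split("=", 1) for item in match.group("fields").split())
        chip = int(fields["chip"], 16)
        devices.append({
            "name": match.group("name"),
            "selector": match.group("sel"),
            "vendor": chip & 0xFFFF,  # lower 16 bits: vendor ID
            "device": chip >> 16,     # upper 16 bits: device ID
            "rev": int(fields["rev"], 16),
        })
    return devices


if __name__ == "__main__":
    for dev in parse_pciconf(SAMPLE):
        print("%-5s %-10s vendor=0x%04x device=0x%04x rev=0x%02x"
              % (dev["name"], dev["selector"], dev["vendor"],
                 dev["device"], dev["rev"]))
```

Grouping the result by device ID reproduces the split described above: 0x10b5 for the PCI-X quad card, 0x10a4 for the PCI-Express quad cards, and 0x108c and 0x109a for the on-board ports.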

Is this helpful? Please let me know if you need anything else.

Best regards,
Holger Kipp


------------------------------

Message: 26
Date: Wed, 27 Feb 2008 22:19:02 +0100 (CET)
From: Jisakiel <jisa...@yahoo.es>
Subject: Problems with promise SATA300 TX2Plus
To: freebsd...@freebsd.org
Message-ID: <144237....@web27507.mail.ukl.yahoo.com>
Content-Type: text/plain; charset=iso-8859-1

Greetings. I was trying to install FreeBSD 7 on an old machine to make it a fileserver. The machine is an AMD K7 1200 on an Abit AN-7 mobo (nforce2 400), which has booted FreeBSD 7.0 RC1 before (for testing ZFS; only ACPI didn't work, as it hung the machine).

I recently bought a cheap PCI SATA card, a Promise SATA300 TX2Plus, and a couple of 500GB Maxtors. Unfortunately, when any hard drive (either of the 500s or an older 250 Maxtor which I use) is plugged into the SATA ports of the PCI card, I get an instantaneous reboot when trying to boot the bootonly CD (it doesn't reach the bootloader). If no disks are plugged in, it gets past that point but hangs while booting (with ACPI disabled).

Things I tried:

- Booting another OS. The card works perfectly in both Linux (2.6.24) and Windows, with the two hard drives visible.
- Leaving just one SATA drive connected to the whole system through the PCI card. It doesn't matter which drive I leave; both the 250G and the 500G make it reboot.
- Swapping the power cable.
- Unplugging my former 2 drives to ease the load on the Enermax 365W power supply. It works in Windows though, with everything on...
- Disabling both the motherboard's integrated SATA (SI3112, which works in both Linux and FreeBSD) and the integrated FireWire (just in case).
- Moving the PCI card to another slot. I only have one other PCI device, a soundcard, and moving it changed the BIOS boot order (first the PCI card, then the integrated SATA, or vice versa) with no change at all afterwards.
- Trying the 6.3 livecd. Same insta-reboot.
- Updating the card's BIOS. There is no newer one; 1.0.0.34 is what it came with and what's on the Promise website.

Is there any way to make this card work on FreeBSD 7, or anything else that I could try? I bought it specifically for that; I might still be in time to return it, though I'd then have to buy another one in the same price range (perhaps a Promise FastTrak TX2300, HighPoint RocketRaid 1520 or Adaptec 1210SA, which I seem to recall doesn't work too well on Linux). Otherwise I'd be more than willing to help debug it ^^.

Thanks everybody...


jisa...@yahoo.es



------------------------------

Message: 27
Date: Thu, 28 Feb 2008 08:33:21 +1100
From: "Jan Mikkelsen" <ja...@transactionware.com>
Subject: RE: Tar regression from 6.2 to 6.3 with --strip-components
To: "'Tim Kientzle'" <kien...@freebsd.org>, "'Kris Kennaway'"
<kr...@freebsd.org>
Cc: freebsd...@freebsd.org
Message-ID: <001201c87988$60dcb7d0$0301a8c0@STUDYPC>
Content-Type: text/plain; charset="us-ascii"

Tim Kientzle wrote:
> Please file this in a PR so I won't lose
> track. I don't have time to investigate this
> right now, but I should be able to get to it
> sometime next week.

Filed as PR bin/121158.

Thanks!

Jan.

------------------------------


End of freebsd-stable Digest, Vol 241, Issue 5
**********************************************
