Re: [GENERAL] READ ONLY & I/O ERROR

Grzegorz Jaśkiewicz

Nov 26, 2009, 8:44:26 AM

On Thu, Nov 26, 2009 at 1:40 PM, Sam Jas <samj...@yahoo.com> wrote:

> Hi Folks,
>
> I am frequently getting read-only file system error on my server.
>
> We are using postgreSQL, GridSQL database. The size of database is very huge.
> Architecture Details:
> CentOS 5.3 64 bit Areca high point rocket raid 3520 8 port
> 32 GB RAM
> assemble hardware
>
> We are daily processing millions of rows and loading into database. We have marked that when we create a new database it worked fine up to 20 or 25 days. After that we are getting errors like "read only file system", data is corrupted. Therefore we are running fsck to remove bad blocks from the disk. However, after running fsck also we are getting the same error.
>
> I will appreciate you if somebody help me to get rid out of this issue.

This looks more like filesystem corruption.
What FS is the database running on? Presumably ext3 (since it is CentOS 5).

If possible, consider checking the root cause of the FS corruption, and possibly test on another FS (xfs?).
Maybe you should also try to enable journaling, if the volume is running as ext2 (ext3 already has a journal).

--
GJ

Grzegorz Jaśkiewicz

Nov 26, 2009, 9:06:05 AM

2009/11/26 Sam Jas <samj...@yahoo.com>

> How can I enable journaling? I am not so good at the OS & H/W level. Can you give me a detailed description?

a) don't top post,
b) don't send emails in HTML,
c) man e2fsck. I am sure it is described all around the net a million times. It is something I haven't done in a while, so please search for instructions, for instance on Red Hat's website.

--
GJ

Grzegorz Jaśkiewicz

Nov 26, 2009, 9:09:03 AM

Oh, and fourth: if you get filesystem errors, I would inspect the drives, RAID card, etc., because those errors usually mean that something's fishy.

Alan Hodgson

Nov 26, 2009, 10:38:58 AM

On Thursday 26 November 2009, Sam Jas <samj...@yahoo.com> wrote:

> We are daily processing millions of rows and loading into database. We
> have marked that when we create a new database it worked fine up to 20 or
> 25 days. After that we are getting errors like "read only file system",
> data is corrupted. Therefore we are running fsck to remove bad blocks
> from the disk. However, after running fsck also we are getting the same
> error.

You have a hardware problem. Get your system administrator to isolate and
repair the bad hardware.

--
A hybrid Escalade is missing the point much in the same way that having a
diet soda with your extra large pepperoni pizza is missing the point.

--
Sent via pgsql-general mailing list (pgsql-...@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Scott Marlowe

Nov 26, 2009, 11:57:45 AM

On Thu, Nov 26, 2009 at 6:40 AM, Sam Jas <samj...@yahoo.com> wrote:
>
> Hi Folks,
>
> I am frequently getting read-only file system error on my server.
>
> We are using postgreSQL, GridSQL database. The size of database is very huge.
> Architecture Details:
> CentOS 5.3 64 bit Areca high point rocket raid 3520 8 port

Areca doesn't make the high point rocket raid cards (which are medium
quality RAID cards).

> 32 GB RAM
> assemble hardware

Did you follow proper ESD precautions when building this machine??

> We are daily processing millions of rows and loading into database. We have marked that when we create a new database it worked fine up to 20 or 25 days. After that we
> are getting errors like "read only file system", data is corrupted. Therefore we are running fsck to remove bad blocks from the disk. However, after running fsck also we are getting the same error.
>

> I will appreciate you if somebody help me to get rid out of this issue.

Sounds like your hardware is bad. Could be mobo / cpu / memory or
RAID card. Does this machine "hang" every so often or anything?

I'd run memtest86+ on it first to confirm good cpu / memory / mobo.

Quick factoid from my days as an electronics instructor in the USAF:
95% of all ESD-induced failures are latent in nature, resulting in
either catastrophic failure or thermal degradation some months or years
down the road.

Scott Marlowe

Nov 27, 2009, 11:01:52 AM

On Fri, Nov 27, 2009 at 4:53 AM, Sam Jas <samj...@yahoo.com> wrote:
>
> I will check that one. Also I have read on one forum that whenever you face a disk I/O error, running the "dmesg" command will give you detailed information. Today I again faced a disk I/O error and ran "dmesg"; it gave me the output below. Can somebody help explain what it is telling me?

> sd 0:0:3:0: SCSI error: return code = 0x00040000
> end_request: I/O error, dev sdd, sector 16
> Buffer I/O error on device sdd, logical block 2
> Buffer I/O error on device sdd, logical block 3
> sd 0:0:3:0: SCSI error: return code = 0x00040000
> end_request: I/O error, dev sdd, sector 0

Looks like you've got a bad drive.
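
To see at a glance which device the kernel is complaining about, the I/O error lines in dmesg output can be tallied per device. The following is a minimal Python sketch, not a diagnostic tool: the function name count_io_errors and the regular expression are illustrative assumptions, and they only cover the two message shapes quoted above.

```python
import re
from collections import Counter

# Matches kernel lines of the two shapes seen in the quoted output:
#   end_request: I/O error, dev sdd, sector 16
#   Buffer I/O error on device sdd, logical block 2
# (Assumption: other kernel versions may word these messages differently.)
IO_ERROR = re.compile(r"I/O error(?:, dev| on device) (\w+)")


def count_io_errors(dmesg_text):
    """Return a Counter mapping device name -> number of I/O error lines."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = IO_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# The dmesg excerpt posted in this thread.
SAMPLE = """\
sd 0:0:3:0: SCSI error: return code = 0x00040000
end_request: I/O error, dev sdd, sector 16
Buffer I/O error on device sdd, logical block 2
Buffer I/O error on device sdd, logical block 3
sd 0:0:3:0: SCSI error: return code = 0x00040000
end_request: I/O error, dev sdd, sector 0
"""

if __name__ == "__main__":
    for device, errors in count_io_errors(SAMPLE).items():
        print(device, errors)
```

In the posted excerpt every I/O error line names the same device, sdd, which is consistent with a single failing drive (or its cabling or controller port) rather than a problem spread across the array.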

Greg Smith

Nov 30, 2009, 3:29:30 PM

Scott Marlowe wrote:
> Areca doesn't make the high point rocket raid cards (which are medium
> quality RAID cards).
>

On a good day maybe. HighPoint is a pretty miserable RAID vendor--in
the same league as Promise from what I've seen as far as their Linux
driver support goes. In general, and for reasons I'm not completely
sure of, everyone selling "fake RAID" cards seems to be completely
incompetent. The page at http://linuxmafia.com/faq/Hardware/sata.html
hasn't been updated in a while, but as of 2007 all the current HighPoint
cards were still based on closed-source drivers only. Completely
worthless hardware IMHO.

> Sounds like your hardware is bad. Could be mobo / cpu / memory or
> RAID card. Does this machine "hang" every so often or anything?
>

It's not out of the question for this sort of problem to be caused by a
bad driver too. In this case it seems more likely it's a drive failure
though.

--
Greg Smith 2ndQuadrant Baltimore, MD
PostgreSQL Training, Services and Support
gr...@2ndQuadrant.com www.2ndQuadrant.com

Scott Marlowe

Dec 2, 2009, 10:35:22 AM

(please use text only email to the list)

On Wed, Dec 2, 2009 at 7:51 AM, Sam Jas <samj...@yahoo.com> wrote:
>
> We are getting the below errors after 20 or 25 days of database creation.
>
> ERROR: could not open relation 1919829/1152694/1921473: Read-only file system
> ERROR: could not read block 312320 of relation 1964206/1152694/1981329: Input/output error

PostgreSQL cannot make a file system read only. The OS does that.

What do your system logs in /var/log have to say when this happens?
There's got to be more context in there than we're getting evidence of
here on the list.
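
The trailing text of both PostgreSQL errors quoted above is the operating system's own error string, which is one way to see that the failure originates below the database. A minimal Python sketch, assuming the Linux wording of the messages exactly as they appear in those two errors; the helper name os_level_errno and the small lookup table are hypothetical:

```python
import errno

# Error strings as they appear at the end of the two quoted PostgreSQL
# messages, mapped to the errno values they correspond to on Linux.
# (Assumption: only these two strings are handled; other platforms may
# word them differently.)
OS_ERRORS = {
    "Read-only file system": errno.EROFS,
    "Input/output error": errno.EIO,
}


def os_level_errno(log_line):
    """Return the symbolic errno name if the line ends in a known OS error."""
    stripped = log_line.rstrip()
    for text, code in OS_ERRORS.items():
        if stripped.endswith(text):
            return errno.errorcode[code]
    return None


if __name__ == "__main__":
    line = ("ERROR: could not open relation 1919829/1152694/1921473: "
            "Read-only file system")
    print(os_level_errno(line))
```

Lines that match are reports of a failed system call passed through by PostgreSQL, so the place to look for the cause is the kernel log, not the database.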

> If we create a new database the problem is repeated after 20 or 25 days. Until then we don't have any issues with the new database.

My guess is that it's not a fixed number, just what you've seen so
far, could happen in a day or a month or a year.

>
> The size of database is very huge. We are loading millions of records every day and also fetching from the database is also high. Even the disks are not full. We are not dropping the old database.
>
> What is the reason for this issue?

Looks like bad hardware to me.

> How can we ensure that it is not a database issue?

It can't be a database issue, as the database isn't capable of
actually locking a file system. It can trigger an OS bug maybe that
causes this problem, but given that no one else is having this issue
with CentOS 5.3, I'm gonna bet on bad hardware.

> We are using
> GridSQL: 1.1.0.9
> PostgreSQL 8.3

> Architecture Details:
> CentOS 5.3 64 bit Areca high point rocket raid 3520 8 port

> 32 GB RAM

I will repeat, Areca does NOT MAKE the high point rocket raid. I will
also add that a Rocket Raid is not, IMHO, suitable for a production
environment. If it's an actual Areca, then the model will be
something like 11xx, 12xx, or 16xx numbers, not 3520.

Craig Ringer

Dec 2, 2009, 11:16:52 AM

On 2/12/2009 11:35 PM, Scott Marlowe wrote:
> (please use text only email to the list)
>
> On Wed, Dec 2, 2009 at 7:51 AM, Sam Jas <samj...@yahoo.com> wrote:
>>
>> We are getting the below errors after 20 or 25 days of database creation.
>>
>> ERROR: could not open relation 1919829/1152694/1921473: Read-only file system
>> ERROR: could not read block 312320 of relation 1964206/1152694/1981329: Input/output error
>
> PostgreSQL cannot make a file system read only. The OS does that.
>
> What do your system logs in /var/log have to say when this happens?
> There's got to be more context in there than we're getting evidence of
> here on the list.

In particular, if you're on a Linux system check the output of the
"dmesg" command. I expect to see warnings about file system errors and
about the file system being re-mounted read-only. I won't be surprised
to see disk/raid errors either.
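
Besides dmesg, the kernel's mount table shows whether a file system is currently mounted read-only. A minimal Python sketch, assuming a Linux system where /proc/mounts is readable; the function names and the sample table (device names, mount points) are made up for illustration:

```python
def read_only_mounts(mounts_text):
    """Return (device, mount_point, fs_type) for each read-only entry.

    Expects text in /proc/mounts format:
        device mount_point fs_type options dump pass
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mount_point, fs_type, options = fields[:4]
        # Split on commas so that "errors=remount-ro" is not mistaken for "ro".
        if "ro" in options.split(","):
            result.append((device, mount_point, fs_type))
    return result


def current_read_only_mounts(path="/proc/mounts"):
    """Read the live mount table (Linux only) and list read-only mounts."""
    with open(path) as handle:
        return read_only_mounts(handle.read())


# Hypothetical mount table: /data has been remounted read-only.
SAMPLE_MOUNTS = """\
/dev/sda1 / ext3 rw,errors=remount-ro 0 0
/dev/sdd1 /data ext3 ro,data=ordered 0 0
proc /proc proc rw 0 0
"""

if __name__ == "__main__":
    for entry in read_only_mounts(SAMPLE_MOUNTS):
        print(entry)
```

If the volume holding the PostgreSQL data directory shows up in that list although it was mounted read-write at boot, the kernel has remounted it after detecting errors, and the surrounding dmesg lines should say why.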

>> If we create a new database the problem is repeated after 20 or 25 days. Until then we don't have any issues with the new database.
>
> My guess is that it's not a fixed number, just what you've seen so
> far, could happen in a day or a month or a year.

Do you do any RAID scrubbing? On what schedule? Do you test the disks
that are part of your RAID array using their internal SMART diagnostics?

Is your server ever hard-reset or rebooted due to loss of power?
(PostgreSQL is fine with this on a proper setup, but if you have a buggy
RAID controller or one that caches writes without a battery backup, it's
going to have issues).

--
Craig Ringer