Hi Folks,
I am frequently getting a "read-only file system" error on my server.
We are using PostgreSQL with GridSQL. The database is very large.
Architecture Details:
CentOS 5.3 64 bit Areca high point rocket raid 3520 8 port
32 GB RAM
Self-assembled hardware
We process millions of rows daily and load them into the database. We have noticed that a newly created database works fine for 20 to 25 days. After that we
get errors like "read only file system" and the data is corrupted. So we run fsck to remove bad blocks from the disk. However, even after running fsck we get the same error.
I would appreciate it if somebody could help me resolve this issue.
How can I enable journaling? I am not very experienced at the OS and hardware level. Could you give me a detailed description?
> We process millions of rows daily and load them into the database. We
> have noticed that a newly created database works fine for 20 to 25
> days. After that we get errors like "read only file system" and the
> data is corrupted. So we run fsck to remove bad blocks from the disk.
> However, even after running fsck we get the same error.
You have a hardware problem. Get your system administrator to isolate and
repair the bad hardware.
--
A hybrid Escalade is missing the point much in the same way that having a
diet soda with your extra large pepperoni pizza is missing the point.
Areca doesn't make the HighPoint RocketRAID cards (which are
medium-quality RAID cards).
> 32 GB RAM
> assemble hardware
Did you follow proper ESD precautions when building this machine?
> We process millions of rows daily and load them into the database. We have noticed that a newly created database works fine for 20 to 25 days. After that we
> get errors like "read only file system" and the data is corrupted. So we run fsck to remove bad blocks from the disk. However, even after running fsck we get the same error.
>
> I would appreciate it if somebody could help me resolve this issue.
Sounds like your hardware is bad. Could be mobo / cpu / memory or
RAID card. Does this machine "hang" every so often or anything?
I'd run memtest86+ on it first to confirm good cpu / memory / mobo.
Quick factoid from my days as an electronics instructor in the USAF:
95% of all ESD induced failures are latent in nature, either resulting
in catastrophic failure or thermal degradation some months or years
down the road.
> sd 0:0:3:0: SCSI error: return code = 0x00040000
> end_request: I/O error, dev sdd, sector 16
> Buffer I/O error on device sdd, logical block 2
> Buffer I/O error on device sdd, logical block 3
> sd 0:0:3:0: SCSI error: return code = 0x00040000
> end_request: I/O error, dev sdd, sector 0
Looks like you've got a bad drive.
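For anyone wanting to confirm which device the kernel is complaining about, here is a minimal Python sketch that tallies I/O error lines per device. It only assumes the two message shapes quoted above ("end_request: I/O error, dev ..." and "Buffer I/O error on device ..."); the function name count_io_errors is made up for illustration, and other kernel versions may word these messages differently.

```python
import re
from collections import Counter

# Matches kernel lines such as
#   "end_request: I/O error, dev sdd, sector 16"
#   "Buffer I/O error on device sdd, logical block 2"
IO_ERROR = re.compile(r"I/O error(?:,| on) dev(?:ice)? (\w+)")


def count_io_errors(log_text):
    """Return a Counter mapping device name -> number of I/O error lines."""
    counts = Counter()
    for line in log_text.splitlines():
        match = IO_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# The log lines quoted in this thread.
SAMPLE = """\
sd 0:0:3:0: SCSI error: return code = 0x00040000
end_request: I/O error, dev sdd, sector 16
Buffer I/O error on device sdd, logical block 2
Buffer I/O error on device sdd, logical block 3
sd 0:0:3:0: SCSI error: return code = 0x00040000
end_request: I/O error, dev sdd, sector 0
"""

if __name__ == "__main__":
    for device, n in count_io_errors(SAMPLE).most_common():
        print(f"{device}: {n} I/O error lines")
```

If every error points at the same device, as it does in the quoted lines (all sdd), that supports the single bad drive theory over a controller or memory problem, though it does not prove it.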
> Sounds like your hardware is bad. Could be mobo / cpu / memory or
> RAID card. Does this machine "hang" every so often or anything?
>
It's not out of the question for this sort of problem to be caused by a
bad driver too. In this case it seems more likely it's a drive failure
though.
--
Greg Smith 2ndQuadrant Baltimore, MD
PostgreSQL Training, Services and Support
gr...@2ndQuadrant.com www.2ndQuadrant.com
On Wed, Dec 2, 2009 at 7:51 AM, Sam Jas <samj...@yahoo.com> wrote:
>
> We are getting the below errors after 20 or 25 days of database creation.
>
> ERROR: could not open relation 1919829/1152694/1921473: Read-only file system
> ERROR: could not read block 312320 of relation 1964206/1152694/1981329: Input/output error
PostgreSQL cannot make a file system read only. The OS does that.
What do your system logs in /var/log have to say when this happens?
There's got to be more context in there than we're getting evidence of
here on the list.
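To see which file systems the OS currently has mounted read-only, something like the following Python sketch may help. It assumes a Linux box where /proc/mounts is readable; the function name read_only_mounts and the sample text (device names, mount points) are made up for illustration and are not from the poster's machine.

```python
def read_only_mounts(mounts_text):
    """Return (device, mount_point, fs_type) for entries mounted read-only.

    Expects text in /proc/mounts format:
        device mount_point fs_type options dump pass
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mount_point, fs_type, options = fields[:4]
        # Options are comma-separated; "ro" marks a read-only mount.
        if "ro" in options.split(","):
            result.append((device, mount_point, fs_type))
    return result


# Illustrative sample only; these devices and mount points are hypothetical.
SAMPLE = """\
/dev/sda1 / ext3 rw,data=ordered 0 0
/dev/sdd1 /data ext3 ro,data=ordered 0 0
proc /proc proc rw 0 0
"""

if __name__ == "__main__":
    try:
        with open("/proc/mounts") as f:
            text = f.read()
    except OSError:
        text = SAMPLE  # fall back to the sample when not on Linux
    for device, mount_point, fs_type in read_only_mounts(text):
        print(f"{device} on {mount_point} ({fs_type}) is mounted read-only")
```

Some mounts are read-only on purpose, so a hit only matters if it is the file system holding the PostgreSQL data directory.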
> If we create a new database the problem is repeated after 20 or 25 days. Until then we don't have any issues with the new database.
My guess is that it's not a fixed number, just what you've seen so
far. It could happen in a day or a month or a year.
>
> The database is very large. We load millions of records every day, and the read load on the database is also high. The disks are not full. We are not dropping the old database.
>
> What is the reason for this issue?
Looks like bad hardware to me.
> How can we ensure that it is not a database issue?
It can't be a database issue, as the database isn't capable of
actually locking a file system. It could perhaps trigger an OS bug that
causes this problem, but given that no one else is having this issue
with CentOS 5.3, I'm gonna bet on bad hardware.
> We are using
> GridSQL: 1.1.0.9
> PostgreSQL 8.3
> Architecture Details:
> CentOS 5.3 64 bit Areca high point rocket raid 3520 8 port
> 32 GB RAM
I will repeat, Areca does NOT MAKE the high point rocket raid. I will
also add that a Rocket Raid is not, IMHO, suitable for a production
environment. If it's an actual Areca, then the model will be
something like 11xx, 12xx, or 16xx numbers, not 3520.
In particular, if you're on a Linux system check the output of the
"dmesg" command. I expect to see warnings about file system errors and
about the file system being re-mounted read-only. I won't be surprised
to see disk/raid errors either.
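As a rough illustration of what to look for, here is a Python sketch that sorts kernel log text into the three kinds of message mentioned above. The regular expressions are assumptions about typical ext2/ext3/ext4 and block-layer wording, which varies between kernel versions; the name classify_kernel_lines and the first two sample lines are illustrative and do not come from the poster's logs.

```python
import re

# Assumed patterns; exact kernel wording differs by version and file system.
PATTERNS = {
    "remounted read-only": re.compile(r"remount\w*\b.*read-only", re.IGNORECASE),
    "file system error": re.compile(r"\bEXT[234]-fs error\b", re.IGNORECASE),
    "I/O error": re.compile(r"\bI/O error\b", re.IGNORECASE),
}


def classify_kernel_lines(text):
    """Return {label: [matching lines]} for each pattern in PATTERNS."""
    hits = {label: [] for label in PATTERNS}
    for line in text.splitlines():
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                hits[label].append(line.strip())
    return hits


# Illustrative sample; only the last line is taken from this thread.
SAMPLE = """\
EXT3-fs error (device sdd1): unable to read inode block
Remounting filesystem read-only
end_request: I/O error, dev sdd, sector 16
"""

if __name__ == "__main__":
    for label, lines in classify_kernel_lines(SAMPLE).items():
        print(f"{label}: {len(lines)} line(s)")
        for line in lines:
            print(f"    {line}")
```

To use it on a real machine, pass the saved output of "dmesg" (or the contents of a file under /var/log) to classify_kernel_lines instead of SAMPLE.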
>> If we create a new database the problem is repeated after 20 or 25 days. Until then we don't have any issues with the new database.
>
> My guess is that it's not a fixed number, just what you've seen so
> far. It could happen in a day or a month or a year.
Do you do any RAID scrubbing? On what schedule? Do you test the disks
that are part of your RAID array using their internal SMART diagnostics?
Is your server ever hard-reset or rebooted due to loss of power?
(PostgreSQL is fine with this on a proper setup, but if you have a buggy
RAID controller or one that caches writes without a battery backup, it's
going to have issues).
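For the SMART part, one possible approach is to save the output of "smartctl -H" (from smartmontools) for each disk and check the health line. The Python sketch below only parses text. It assumes the health line reads "SMART overall-health self-assessment test result: ..." (usual for ATA disks) or "SMART Health Status: ..." (usual for SCSI/SAS disks); the function names and sample strings are illustrative. Disks behind a hardware RAID controller usually need a controller-specific "-d" option to smartctl, which is not shown here.

```python
import re

# Assumed health lines:
#   ATA:      "SMART overall-health self-assessment test result: PASSED"
#   SCSI/SAS: "SMART Health Status: OK"
HEALTH = re.compile(
    r"(?:SMART overall-health self-assessment test result"
    r"|SMART Health Status):\s*(.+)"
)


def smart_health(smartctl_output):
    """Return the reported health string, or None if no health line is found."""
    match = HEALTH.search(smartctl_output)
    return match.group(1).strip() if match else None


def looks_healthy(smartctl_output):
    """True only if the disk explicitly reports PASSED or OK."""
    return smart_health(smartctl_output) in ("PASSED", "OK")


# Illustrative samples, not output from the poster's disks.
SAMPLE_ATA = "SMART overall-health self-assessment test result: PASSED\n"
SAMPLE_SCSI = "SMART Health Status: OK\n"
SAMPLE_BAD = "SMART overall-health self-assessment test result: FAILED!\n"

if __name__ == "__main__":
    for name, text in [("ata", SAMPLE_ATA), ("scsi", SAMPLE_SCSI), ("bad", SAMPLE_BAD)]:
        print(name, smart_health(text), looks_healthy(text))
```

A passing health line is weak evidence: a disk can report PASSED and still return I/O errors, so the kernel log and a long SMART self-test are worth checking too.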
--
Craig Ringer