
Value too large for defined data type...


maps

Dec 22, 2009, 12:20:19 PM
This error has been popping up on our production servers for the past
few days. Googling it turned up the following article:
http://docs.sun.com/app/docs/doc/806-1075/6jacsnin5?a=view

I'm not into Solaris administration, but this issue has been bugging me
for quite some time. Can anyone explain it to me in simple terms, and
say what the recommended solutions are?

Thanks.

Richard B. Gilbert

Dec 22, 2009, 1:43:02 PM

In plain English, it's trying to put ten pounds of shit in a five pound
bag! Most likely it's trying to put a 32 bit value into a 16 bit field.
Possibly it's trying to put a 16 bit value into an eight bit field, or
even a 64 bit value into a smaller field. Whichever sizes are involved,
the value you are trying to save has more bits than the variable you are
trying to store it in.
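
As a rough illustration of that point (not taken from any program in this thread), here is a minimal Python sketch that tries to store a value in a fixed-width field. The helper name `fits_in_field` is made up for this example; `struct.pack` refuses values that need more bits than the field has.

```python
import struct


def fits_in_field(value: int, fmt: str) -> bool:
    """Return True if value can be stored in the fixed-size field described by fmt."""
    try:
        struct.pack(fmt, value)
        return True
    except struct.error:
        # The value needs more bits than the field provides.
        return False


# "<H" is an unsigned 16-bit field, "<I" an unsigned 32-bit field.
print(fits_in_field(60000, "<H"))   # 60000 fits in 16 bits
print(fits_in_field(70000, "<H"))   # 70000 does not fit in 16 bits
print(fits_in_field(70000, "<I"))   # but it does fit in 32 bits
```

A system call has no way to shrink the value, so it reports an error instead of silently truncating it.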

http://docs.sun.com/app/docs/doc/806-1075/6jacsnin5?a=view
explains in a little more detail.

Whatever program is issuing this error message is carelessly written!
The compiler may have recognized the problem and issued an error or
warning message. If so, somebody chose to ignore it.

Go thou and clean your house and maybe demote a programmer to a position
better suited to his abilities. If you wrote it, put a bag over your
head and hope that no one recognizes you.

Andrew Gabriel

Dec 22, 2009, 2:55:21 PM
In article <39d2df64-5d73-4650...@t42g2000yqd.googlegroups.com>,

I can think of several things, but first, you'll have to give some context.

--
Andrew Gabriel
[email address is not usable -- followup in the newsgroup]

da...@smooth1.co.uk

Dec 22, 2009, 6:16:09 PM

The program that you are running has not been compiled to handle large
user IDs or large group IDs, and it is trying to process a large user
ID or group ID.

http://docs.sun.com/app/docs/doc/802-5366/6i94lvccc?a=view

"Previous Solaris 2.x software releases used 32-bit data types to
contain the user IDs (UIDs) and group IDs (GIDs), but UIDs and GIDs
were constrained to a maximum useful value of 60000. In the Solaris
2.5.1 release, the limit on UID and GID values has been raised to the
maximum value of a signed integer, or 2147483647."

David.

maps

Dec 23, 2009, 11:15:35 AM
Thanks to all for your replies! I am just a humble programmer who has
to use Solaris because our production servers run on it. There is a
separate Solaris admin team which handles all administration tasks.

Coming back to the topic, allow me to cite a few examples and my
understanding on this whole issue:
1. We started facing this problem over the weekend with sendmail, with
the following command erroring out:
    sed 's/RECIPIENT_EMAIL_ID/someemailid/' mailfiletemplate | /usr/lib/sendmail -t
    stdin: Value too large for defined data type
When this stopped working we came up with a workaround:
    sed 's/RECIPIENT_EMAIL_ID/someemailid/' mailfiletemplate > mailfilefinal
    /usr/lib/sendmail -t < mailfilefinal
and this worked.
2. The following also stopped working:
    zcat somearchive.Z | diff somefile -
    diff: stdin: Value too large for defined data type
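
The two variants above differ in what the reading program finds on its standard input: a pipe in the failing case and a regular file in the workaround. A minimal Python sketch of that difference, assuming a POSIX system (the helper name `describe_fd` is invented for illustration; it says nothing about why the pipe case fails):

```python
import os
import stat
import tempfile


def describe_fd(fd: int) -> str:
    """Classify what a file descriptor refers to, using fstat()."""
    mode = os.fstat(fd).st_mode
    if stat.S_ISFIFO(mode):
        return "pipe"
    if stat.S_ISREG(mode):
        return "regular file"
    return "other"


# Case 1: input arrives through a pipe, as in `sed ... | sendmail -t`.
read_end, write_end = os.pipe()
print(describe_fd(read_end))
os.close(read_end)
os.close(write_end)

# Case 2: input is a regular file, as in `sendmail -t < mailfilefinal`.
with tempfile.TemporaryFile() as f:
    print(describe_fd(f.fileno()))
```

A program that calls fstat() on its input therefore gets back different metadata in the two cases, even though the bytes it reads are the same.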

It is interesting to note that none of the above programs (sendmail,
diff etc.) has a modification date within the past week (so they were
not compiled/replaced/modified). I am not sure if this has anything to
do with a 32-bit binary being executed on a 64-bit system (which
should work perfectly fine, as far as I know).

By the way, our production box has Solaris 5.9 64-bit for Sun Sparc
(obtained using isainfo -kv)

Thanks.

Chris Ridd

Dec 23, 2009, 11:21:41 AM

It looks more like pipes aren't working. Has the shell changed?

--
Chris

maps

Dec 23, 2009, 11:31:10 AM
I actually do not know if somebody from the admin team changed it. I
have tried it in bash, ksh and csh and this still fails.

Chris Ridd

Dec 23, 2009, 11:38:47 AM
On 2009-12-23 16:31:10 +0000, maps said:

> I actually do not know if somebody from the admin team changed it. I
> have tried it in bash, ksh and csh and this still fails.

It probably isn't a shell problem then. Has libc changed? The shells
will be calling pipe(2) which is in that library.

Does stracing each side of the pipe show the call that's failing?

--
Chris

maps

Dec 23, 2009, 2:06:53 PM
Chris, how does one check libc? How do I strace? I know about dtrace but
haven't used it yet. And isn't dtrace available only since Solaris 10?

Let me know.

Thanks.

Chris Ridd

Dec 23, 2009, 2:54:29 PM
On 2009-12-23 19:06:53 +0000, maps said:

> Chris, how does one check libc ? How to strace ? I know abt dtrace but
> havent used it yet. And isnt dtrace available only since solaris 10 ?

Check the modification time on /usr/lib/libc.*
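
If you would rather script that check than eyeball `ls -l`, here is a small Python sketch. It assumes Python is available on the box; the helper name `modified_within` and the 14-day window are arbitrary choices for illustration.

```python
import glob
import os
import time


def modified_within(pattern: str, days: float) -> list:
    """Return the paths matching pattern whose mtime is within the last `days` days."""
    cutoff = time.time() - days * 86400
    return sorted(
        p for p in glob.glob(pattern)
        if os.path.exists(p) and os.path.getmtime(p) >= cutoff
    )


# Show the modification time of every libc file, then list the recent ones.
for path in sorted(glob.glob("/usr/lib/libc.*")):
    if os.path.exists(path):
        mtime = time.localtime(os.path.getmtime(path))
        print(time.strftime("%Y-%m-%d %H:%M:%S", mtime), path)

print(modified_within("/usr/lib/libc.*", days=14))
```

An empty list at the end means nothing matching the pattern changed in that window.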

I meant truss, not strace. Sorry! Truss has a good manpage.

--
Chris

maps

Dec 23, 2009, 3:34:31 PM
> Check the modification time on /usr/lib/libc.*

None of the files were modified last weekend (when the problem
actually started). All of them are at least over 2 months old.

> I meant truss, not strace. Sorry! Truss has a good manpage.

Good idea! I will check it out. thanks!

-maps.

Thomas Tornblom

Dec 23, 2009, 6:22:06 PM
maps <mapsi...@gmail.com> writes:

I have seen this issue when some field in a stat buffer is out of
range, like atime, mtime or ctime, each of which is defined as a time_t,
which is typedef'ed as a long. A long is 32-bit in a 32-bit binary and
64-bit in a 64-bit binary, so you can set the times on a file with a
64-bit application that will be too large for a 32-bit application.
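
To make the range concrete, here is a minimal Python sketch of the check a 32-bit time_t implies. It assumes time_t is a signed 32-bit count of seconds since the epoch, as described above; the names `fits_time_t32` and `LAST_32BIT_MOMENT` are made up for this example.

```python
from datetime import datetime, timezone

INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def fits_time_t32(seconds_since_epoch: int) -> bool:
    """Return True if the timestamp is representable in a signed 32-bit time_t."""
    return INT32_MIN <= seconds_since_epoch <= INT32_MAX


# The last moment a signed 32-bit time_t can express.
LAST_32BIT_MOMENT = datetime.fromtimestamp(INT32_MAX, tz=timezone.utc)

print(LAST_32BIT_MOMENT.isoformat())   # 2038-01-19T03:14:07+00:00
print(fits_time_t32(INT32_MAX))        # True
print(fits_time_t32(INT32_MAX + 1))    # False
```

A file whose timestamp was set beyond that limit by a 64-bit program is the kind of case where a 32-bit stat() can no longer report the value.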

If you know how to provoke the issue, run the application under truss
to see if this is the issue, or if you are on s10 (or later) you may
try "dtrace" to see what happens.

Thomas

maps

Dec 28, 2009, 12:00:01 PM
Here's an update:

A case was opened with Sun and they suggested a workaround for the "-"
problem in the following manner:

zcat somearchive.Z | diff somefile /dev/stdin

I compared the truss results from both variations and the output looks
the same except for write:

diff: stdin: Value too large for defined data type

write(1, " C o m p a n y , S t o r".., 3748) Err#32 EPIPE
Received signal #13, SIGPIPE [default]

for /dev/stdin:

write(1, " C o m p a n y , S t o r".., 3748) = 3748

Is there a way I can dig deeper ?

-maps.

Chris Ridd

Dec 28, 2009, 2:17:11 PM

The man page for write says (Solaris 10) it returns EPIPE when:

EPIPE   An attempt is made to write to a pipe or a FIFO
        that is not open for reading by any process, or
        that has only one end open (or to a file descriptor
        created by socket(3SOCKET), using type
        SOCK_STREAM that is no longer connected to a peer
        endpoint). A SIGPIPE signal will also be sent to
        the thread. The process dies unless special
        provisions were taken to catch or ignore the signal.
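
The first case in that description is easy to reproduce. A minimal Python sketch, assuming a POSIX system (the function name `write_to_closed_pipe` is invented for this example):

```python
import errno
import os


def write_to_closed_pipe() -> int:
    """Write to a pipe whose read end is already closed; return the resulting errno."""
    read_end, write_end = os.pipe()
    os.close(read_end)  # nobody is left to read from the pipe
    try:
        os.write(write_end, b"data")
    except OSError as exc:
        # CPython ignores SIGPIPE by default, so the failed write surfaces
        # as an exception carrying EPIPE instead of killing the process.
        return exc.errno
    finally:
        os.close(write_end)
    return 0


print(write_to_closed_pipe() == errno.EPIPE)
```

A program that leaves SIGPIPE at its default disposition, like the zcat in the truss output above, is killed by the signal instead.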

So what's happening to the process with the other end of the pipe?

--
Chris

maps

Dec 28, 2009, 2:58:46 PM

> So what's happening to the process with the other end of the pipe?

we are comparing the standard input with somefile


zcat somearchive.Z | diff somefile -

-maps.

jgh

Dec 28, 2009, 3:15:23 PM

How big are these files? Any chance you're running
into a 2GB or 4GB limit?

--
Jeremy

maps

Dec 28, 2009, 3:16:57 PM
> How big are these files?  Any chance you're running
> into a 2GB or 4GB limit?

The files are about 3 to 4 kilobytes each.

-maps.

Chris Ridd

Dec 29, 2009, 4:40:20 AM

I mean at a lower level - is the process with the other end of the pipe
going away earlier than expected, or closing the pipe for some other
reason?

How's disk space in /tmp and /var? (Long shot is you're out of
temporary space somewhere.)

--
Chris

maps

Dec 29, 2009, 1:47:37 PM
> I mean at a lower level - is the process with the other end of the pipe
> going away earlier than expected, or closing the pipe for some other
> reason?

I cannot say for sure; diff is a standard unix tool and as yet I never
cared to learn how it works at the lower level.

> How's disk space in /tmp and /var? (Long shot is you're out of
> temporary space somewhere.)

No problems with disk space; there's plenty of space available for
both directories. /var is rwxr-xr-x; not sure if this will cause any
problems ?

-maps.

jgh

Dec 29, 2009, 2:04:43 PM
On Mon, 28 Dec 2009 19:17:11 +0000, Chris Ridd wrote:

> On 2009-12-28 17:00:01 +0000, maps said:
>
>> Heres an update:
>>
>> A case was opened with Sun and they suggested a workaround for the -
>> problem in the following manner:
>>
>> zcat somearchive.Z | diff somefile /dev/stdin
>>
>> I compared the truss results from both variations and the output looks
>> the same except for write:
>>
>> diff: stdin: Value too large for defined data type write(1, " C o m p a
>> n y , S t o r".., 3748) Err#32 EPIPE
>> Received signal #13, SIGPIPE [default]

Exactly what were you trussing here? The zcat process,
the diff process, or both?

I'd like to see a truss of the diff, but I don't
think that was it.

--
jgh

maps

Dec 29, 2009, 2:41:42 PM
> Exactly what were you trussing here?  The zcat process,
> the diff process, or both?

trussed the entire command line (i.e. zcat and diff)


> I'd like to see a truss of the diff, but I don't
> think that was it.

diff, per se, works, and it's only in this particular usage that it
fails. So I am not sure if trussing diff itself would help.

-maps.

Chris Ridd

Dec 29, 2009, 2:54:32 PM
On 2009-12-29 19:41:42 +0000, maps said:

>>
>> Exactly what were you trussing here?  The zcat process,
>> the diff process, or both?
>
> trussed the entire command line (i.e. zcat and diff)

In that case you only trussed the zcat :-) You need to do something
like this instead to truss both ends of the pipe:

truss -o /tmp/lhs zcat foo.Z | truss -o /tmp/rhs whatever command

>
>
>> I'd like to see a truss of the diff, but I don't
>> think that was it.
>
> diff, per se, works and its only in this particular usage it fails. So
> I am not sure if trussing diff itself would help

The left hand side of the pipe is failing to write data, and the main
documented way that can happen is if there's something odd happening to
the process on the right hand side of the pipe. So you need to look
closely at that.

You'll probably want to use truss's -d option (perhaps a better
timestamping option?) on both invocations so you can correlate what's
happening at any given point.

--
Chris

maps

Dec 29, 2009, 4:08:39 PM
OK, here's the truss output after calling truss on both sides with the
-d option:

Base time stamp: 1262120381.9711 [ Tue Dec 29 14:59:41 CST 2009 ]
0.0000 execve("/usr/bin/diff", 0xFFBFFABC, 0xFFBFFACC) argc = 3
0.0041 resolvepath("/usr/lib/ld.so.1", "/usr/lib/ld.so.1", 1023) = 16
0.0044 resolvepath("/usr/bin/diff", "/usr/bin/diff", 1023) = 13
0.0048 stat("/usr/bin/diff", 0xFFBFF880) = 0
0.0048 open("/var/ld/ld.config", O_RDONLY) Err#2 ENOENT
0.0052 stat("/opt/app/xxxxxx/ncr/tbuild/12.00.00.00/lib/libc.so.1", 0xFFBFF388) Err#2 ENOENT
0.0056 stat("/usr/lib/libc.so.1", 0xFFBFF388) = 0
0.0057 resolvepath("/usr/lib/libc.so.1", "/usr/lib/libc.so.1", 1023) = 18
0.0064 open("/usr/lib/libc.so.1", O_RDONLY) = 3
0.0067 mmap(0x00010000, 32768, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_ALIGN, 3, 0) = 0xFF3B0000
0.0072 mmap(0x00010000, 802816, PROT_NONE, MAP_PRIVATE|MAP_NORESERVE|MAP_ANON|MAP_ALIGN, -1, 0) = 0xFF280000
0.0076 mmap(0xFF280000, 703464, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 0) = 0xFF280000
0.0078 mmap(0xFF33C000, 24496, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 704512) = 0xFF33C000
0.0080 mmap(0xFF342000, 6720, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_ANON, -1, 0) = 0xFF342000
0.0081 munmap(0xFF32C000, 65536) = 0
0.0086 memcntl(0xFF280000, 117696, MC_ADVISE, MADV_WILLNEED, 0, 0) = 0
0.0087 close(3) = 0
0.0089 stat("/opt/app/xxxxxx/ncr/tbuild/12.00.00.00/lib/libdl.so.1", 0xFFBFF388) Err#2 ENOENT
0.0094 stat("/usr/lib/libdl.so.1", 0xFFBFF388) = 0
0.0096 resolvepath("/usr/lib/libdl.so.1", "/usr/lib/libdl.so.1", 1023) = 19
0.0098 open("/usr/lib/libdl.so.1", O_RDONLY) = 3
0.0103 mmap(0xFF3B0000, 32768, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 0) = 0xFF3B0000
0.0107 mmap(0x00010000, 8192, PROT_NONE, MAP_PRIVATE|MAP_NORESERVE|MAP_ANON|MAP_ALIGN, -1, 0) = 0xFF3A0000
0.0111 mmap(0xFF3A0000, 2210, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 0) = 0xFF3A0000
0.0115 close(3) = 0
0.0117 stat("/usr/platform/FJSV,GPUZC-L/lib/libc_psr.so.1", 0xFFBFF088) = 0
0.0122 resolvepath("/usr/platform/FJSV,GPUZC-L/lib/libc_psr.so.1", "/usr/platform/FJSV,GPUZC-M/lib/libc_psr.so.1", 1023) = 44
0.0128 open("/usr/platform/FJSV,GPUZC-L/lib/libc_psr.so.1", O_RDONLY) = 3
0.0133 mmap(0xFF3B0000, 32768, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 0) = 0xFF3B0000
0.0136 munmap(0xFF3B2000, 24576) = 0
0.0138 close(3) = 0
0.0140 mmap(0x00000000, 8192, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANON, -1, 0) = 0xFF390000
0.0148 getustack(0xFFBFF6C4)
0.0152 getrlimit(RLIMIT_STACK, 0xFFBFF6BC) = 0
0.0155 getcontext(0xFFBFF4F8)
0.0158 setustack(0xFF3439B4)
0.0165 issetugid() = 0
0.0167 brk(0x00028CD0) = 0
0.0168 brk(0x0002ACD0) = 0
0.0172 stat("/xxxxx/temp/20091221.src_file.csv.new", 0x00028BA0) = 0
0.0178 fstat(0, 0x00028C28) Err#79 EOVERFLOW
0.0188 fstat64(2, 0xFFBFE308) = 0
0.0192 write(2, " d i f f : ", 6) = 6
0.0196 open("/opt/app/xxxxx/ncr/tbuild/12.00.00.00/msg/SUNW_OST_OSLIB", O_RDONLY) Err#2 ENOENT
0.0199 open("/usr/lib/locale/C/LC_MESSAGES/SUNW_OST_OSLIB.mo", O_RDONLY) Err#2 ENOENT
0.0206 write(2, " s t d i n", 5) = 5
0.0208 write(2, " : ", 2) = 2
0.0212 write(2, " V a l u e t o o l a".., 37) = 37
0.0216 write(2, "\n", 1) = 1
0.0220 _exit(2)
= 0
0.0231 brk(0x000E8FA8) = 0
0.0237 fstat64(3, 0xFFBFE9E8) = 0
0.0240 ioctl(3, TCGETA, 0xFFBFEACC) Err#25 ENOTTY
0.0246 read(3, "1F9D90 CDEB48113C6 M1E16".., 8192) = 2206
0.0248 ioctl(1, TCGETA, 0xFFBFEA04) Err#22 EINVAL
0.0250 fstat64(1, 0xFFBFEA78) = 0
0.0252 brk(0x000E8FA8) = 0
0.0253 brk(0x000EAFA8) = 0
0.0254 fstat64(1, 0xFFBFE920) = 0
0.0258 read(3, 0x000E61CC, 8192) = 0
0.0259 write(1, " C o m p a n y , S t o r".., 3748) = 3748
0.0261 llseek(3, 0, SEEK_CUR) = 2206
0.0262 _exit(0)

maps

Dec 29, 2009, 4:10:08 PM
0.0178 fstat(0, 0x00028C28) Err#79 EOVERFLOW

This is probably the root cause of the issue. But then we already had
guessed it earlier; so how can this problem be resolved now ?

Darren Dunham

Dec 29, 2009, 4:59:49 PM

EOVERFLOW
The file size in bytes or the number of blocks allocated
to the file or the file serial number cannot be
represented correctly in the structure pointed to by
buf.
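
A rough Python sketch of the kind of check that description implies. The field widths are an assumption made for illustration (signed 32-bit for size and block count, unsigned 32-bit for the serial number), not something taken from the Solaris headers, and `overflowing_fields` is a made-up name.

```python
INT32_MAX = 2**31 - 1
UINT32_MAX = 2**32 - 1

# Assumed limits of a hypothetical 32-bit stat structure (illustration only).
LIMITS = {"size": INT32_MAX, "blocks": INT32_MAX, "serial": UINT32_MAX}


def overflowing_fields(size: int, blocks: int, serial: int) -> list:
    """Return the names of the fields that cannot be represented within LIMITS."""
    values = {"size": size, "blocks": blocks, "serial": serial}
    return [name for name, value in values.items() if value > LIMITS[name]]


print(overflowing_fields(3748, 8, 12345))        # []
print(overflowing_fields(3 * 2**30, 8, 12345))   # ['size']
```

A non-empty result corresponds to the situation where fstat() would fail with EOVERFLOW rather than return a truncated value.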


I wouldn't expect this failure on a pipe which doesn't have a size or
a serial number. I would expect it on a "large" file, but not how
you're using it.

Given that this used to work and now doesn't in more than one case,
and that the error doesn't make sense to me, I wonder if something got
screwed up on the system. Seems very odd to me.

--
Darren

Chris Ridd

Dec 30, 2009, 4:34:32 AM
On 2009-12-29 21:59:49 +0000, Darren Dunham said:

> On Dec 29, 1:10 pm, maps <mapsiddi...@gmail.com> wrote:
>> 0.0178 fstat(0, 0x00028C28)                           Err#79
>> EOVERFLOW
>>
>> This is probably the root cause of the issue. But then we already had
>> guessed it earlier; so how can this problem be resolved now ?
>
> EOVERFLOW
> The file size in bytes or the number of blocks allo-
> cated to the file or the file serial number cannot be
> represented correctly in the structure pointed to by
> buf.
>
>
> I wouldn't expect this failure on a pipe which doesn't have a size or
> a serial number. I would expect it on a "large" file, but not how
> you're using it.

It would appear in this truss that diff is opening the largefile
"/xxxxx/temp/20091221.src_file.csv.new".

> Given that this used to work and now doesn't in more than one case,
> and that the error doesn't make sense to me, I wonder if something got
> screwed up on the system. Seems very odd to me.

Me too. I'd repeat the trusses on the original pipe sequence which
didn't involve diff (IIRC).

--
Chris

maps

Dec 30, 2009, 10:38:14 AM
> Given that this used to work and now doesn't in more than one case,
> and that the error doesn't make sense to me, I wonder if something got
> screwed up on the system.  Seems very odd to me.

We've been wondering the same, and no one has been able to resolve it
so far. How do we move beyond this point? How do we dig deeper? Or
backtrace, maybe?

-maps.

Darren Dunham

Dec 30, 2009, 12:01:22 PM

Get another system. Try the same commands there. If it works,
something on your current system is screwed up. Consider
reinstalling.

If the same commands don't work on another system (and fail in the
same way), then we're all missing something.

Report back.

If you don't have other hardware, do you have anything that would run
VMware or Virtualbox? Maybe you could spin up a Solaris virtual
machine pretty quickly to try as another data point.

--
Darren

maps

Dec 30, 2009, 2:18:52 PM
> Get another system.  Try the same commands there.  If it works,
> something on your current system is screwed up.  Consider
> reinstalling.

Works on other systems. Interestingly, on one of the other systems fstat
with the same parameters works. I also tried this on another server
running Solaris 10 and the command runs just fine.


> If you don't have other hardware, do you have anything that would run
> VMware or Virtualbox?  Maybe you could spin up a Solaris virtual
> machine pretty quickly to try as another data point.

Maybe, but the problem is we don't understand where the problem is
originating from, and unless we do it may not be possible to
recreate the problem.

-maps.

Richard B. Gilbert

Dec 30, 2009, 2:32:26 PM

Do I assume that 0.0178 is a line number rather than a part of the fstat
call? Where did the value 0x00028C28 come from? I assume it's a
pointer to something, but what?

I really don't want to try to go back to the beginning of this thread in
order to make sense of your post. Try posting a "reproducer"; e.g.
reproduce the error with fewer than, say, fifteen lines of code.

maps

Dec 30, 2009, 2:44:54 PM
> Do I assume that 0.0178 is a line number rather than a part of the fstat
> call?  Where did the value 0x00028C28 come from?  I assume it's a
> pointer to something, but what?

That's the timestamp from the truss output; it wasn't part of the fstat
call.

> I really don't want to try to go back to the beginning of this thread in
> order to make sense of your post.  Try posting a "reproducer"; e.g.
> reproduce the error with fewer than, say, fifteen lines of code.

Well, this error didn't come up while executing our own code. It started
appearing all of a sudden on one of our production servers whenever we
used a pipe ( | ). In the specific case I quoted above, it occurred in
the following manner:

zcat foo.txt.Z | diff foo.txt -
diff: stdin: value too large for defined data type.

-maps.

Richard B. Gilbert

Dec 30, 2009, 4:28:18 PM

What has changed since it last worked? O/S upgrades? Patches
installed? Different hardware platform?

If you don't use "change control", problems like this are the reason why
you should! I thought change control was a PITA when my employers first
introduced it but I've seen the advantages. Briefly: before making any
change to the hardware, firmware, software, or operating procedures, you
document exactly what you are going to do and how you plan to back out
the change if it causes problems.

maps

Dec 30, 2009, 4:37:35 PM
> What has changed since it last worked?  O/S upgrades?  Patches
> installed?  Different hardware platform?

None to my knowledge.

> If you don't use "change control", problems like this are the reason why
> you should!  I thought change control was a PITA when my employers first
> introduced it but I've seen the advantages.  Briefly: before making any
> change to the hardware, firmware, software, or operating procedures, you
> document exactly what you are going to do and how you plan to back out
> the change if it causes problems.

Oh, we are quite sound in this aspect. Trust me, we have so many
processes that they do become a PITA (good abbreviation by the way, lol)
and I really mean it.

Coming back to the problem: I was wondering where fstat gets invoked
from. Is it in libc.so? One of our admins suggested that this might be
due to a 64-bit library getting replaced by a 32-bit one. But the last
modification timestamps on all of the files under suspicion look far too
old to suggest that possibility.

-maps.

Richard B. Gilbert

Dec 30, 2009, 4:50:30 PM
maps wrote:
>> What has changed since it last worked? O/S upgrades? Patches
>> installed? Different hardware platform?
>
> None to my knowledge.
>
>> If you don't use "change control", problems like this are the reason why
>> you should! I thought change control was a PITA when my employers first
>> introduced it but I've seen the advantages. Briefly: before making any
>> change to the hardware, firmware, software, or operating procedures, you
>> document exactly what you are going to do and how you plan to back out
>> the change if it causes problems.
>
> Oh we are quite sound in this aspect. Trust me, we have so many
> processes that they do become PITA (good abbrn by the way, lol) and I
> really mean it.
>
> Coming back to the problem; I was wondering where does fstat get

Can't help you there! The program that was executing at the time of the
failure is the guilty program. If it's part of the O/S you need to talk
to Sun about it. If it's something you bought from a third party, you
need to talk to the vendor. If it's home made you need to get a "loader
map" for the program to be able to pin down exactly what's going on.

Chris Ridd

Dec 31, 2009, 2:28:49 AM
On 2009-12-30 19:18:52 +0000, maps said:

>>
>> Get another system.  Try the same commands there.  If it works,
>> something on your current system is screwed up.  Consider
>> reinstalling.
>
> Works on other systems. Interestingly on one of the other system fstat
> with the same parameters works. I tried this on another server having
> solaris 10 and the command runs just fine.

Are you using exactly the same input files on each system? As it looks
like one file's now larger than 32-bits on the problem system, you need
to keep all the input the same when you're testing.

--
Chris

Casper H.S. Dik

Dec 31, 2009, 11:17:34 AM
maps <mapsi...@gmail.com> writes:

>> Do I assume that 0.0178 is a line number rather than a part of the fstat
>> call? Where did the value 0x00028C28 come from? I assume it's a
>> pointer to something, but what?

>thats the timestamp from truss output; it wasnt a part of the fstat
>call.

>> I really don't want to try to go back to the beginning of this thread in
>> order to make sense of your post. Try posting a "reproducer"; e.g.
>> reproduce the error with fewer than, say, fifteen lines of code.

>Well this error didnt come up while executing a code. It started
>appearing all of a sudden on one of our production servers whenever we
>used a pipe ( | ). In the specific case I quoted above, it occurred in
>the following manner :

>zcat foo.txt.Z | diff foo.txt -
>diff: stdin: value too large for defined data type.

Since a pipe is a file with a small number of bytes, the only possible issue
is the dev number of the pipe.

Is this a 64-bit system, and has it been up for quite some time?
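
To see what fstat() reports for a pipe on a given machine, here is a minimal Python sketch, assuming a POSIX system with Python installed. The helper name `pipe_identity` is invented, and the numbers it prints are system-specific; nothing here reproduces the Solaris behaviour.

```python
import os


def pipe_identity() -> dict:
    """Create a throwaway pipe and return the identifying fields fstat() reports for it."""
    read_end, write_end = os.pipe()
    try:
        st = os.fstat(read_end)
        return {"st_dev": st.st_dev, "st_ino": st.st_ino, "st_size": st.st_size}
    finally:
        os.close(read_end)
        os.close(write_end)


# Show each field and how many bits are needed to hold its value.
for name, value in pipe_identity().items():
    print(f"{name}={value} needs {value.bit_length()} bits")
```

A field that needs more bits than the caller's stat structure provides is the situation in which fstat() has to fail with EOVERFLOW.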

Casper
--
Expressed in this posting are my opinions. They are in no way related
to opinions held by my employer, Sun Microsystems.
Statements on Sun products included here are not gospel and may
be fiction rather than truth.

maps

Jan 1, 2010, 9:28:04 PM
> Are you using exactly the same input files on each system? As it looks
> like one file's now larger than 32-bits on the problem system, you need
> to keep all the input the same when you're testing.


Not sure I understand; can you please elaborate ?

-maps.

maps

Jan 1, 2010, 9:29:55 PM
> Since a pipe is a file with a small number of bytes, the only possible issue
> is the dev number of the pipe.
>
> Is this a 64 bit and is the system up for quite some time?

Spot on ! But how do these factors relate to the problem at hand ?

-maps.

Greg Andrews

Jan 4, 2010, 3:21:09 PM
I see a couple of puzzling things in the truss output:

1) In several lines, this /usr/bin/diff tries to find standard OS
files (libc.so.1, libdl.so.1, Locale files) in a strange place:

> 0.0052 stat("/opt/app/xxxxxx/ncr/tbuild/12.00.00.00/lib/libc.so.1",
>0xFFBFF388) Err#2 ENOENT

...
> 0.0089 stat("/opt/app/xxxxxx/ncr/tbuild/12.00.00.00/lib/libdl.so.1",
...


> 0.0196 open("/opt/app/xxxxx/ncr/tbuild/12.00.00.00/msg/
>SUNW_OST_OSLIB", O_RDONLY) Err#2 ENOENT
> 0.0199 open("/usr/lib/locale/C/LC_MESSAGES/SUNW_OST_OSLIB.mo",
>O_RDONLY) Err#2 ENOENT


2) /usr/bin/diff performs a different fstat() on stdin and stderr
(and in the middle of outputting the complaint to stderr, makes
the locale calls noted above):

> 0.0178 fstat(0, 0x00028C28) Err#79 EOVERFLOW
> 0.0188 fstat64(2, 0xFFBFE308) = 0
> 0.0192 write(2, " d i f f : ", 6) = 6
> 0.0196 open("/opt/app/xxxxx/ncr/tbuild/12.00.00.00/msg/
>SUNW_OST_OSLIB", O_RDONLY) Err#2 ENOENT
> 0.0199 open("/usr/lib/locale/C/LC_MESSAGES/SUNW_OST_OSLIB.mo",
>O_RDONLY) Err#2 ENOENT
> 0.0206 write(2, " s t d i n", 5) = 5
> 0.0208 write(2, " : ", 2) = 2
> 0.0212 write(2, " V a l u e t o o l a".., 37) = 37
> 0.0216 write(2, "\n", 1) = 1
> 0.0220 _exit(2)


Why would /usr/bin/diff invoke fstat() on stdin, but fstat64() on stderr?

But the real head scratcher is why the Solaris /usr/bin/diff would be
searching for libs under "/opt/app/xxxxx/ncr/tbuild/12.00.00.00", which
looks like an application's build directory.

Has the environment variable LD_LIBRARY_PATH been set to something on
this server? If so, what happens when it is removed?

-Greg
--
Do NOT reply via e-mail.
Reply in the newsgroup.

maps

Jan 4, 2010, 4:52:23 PM
> Why would /usr/bin/diff invoke fstat() on stdin, but fstat64() on stderr?

This is the real head scratcher !!

> But the real head scratcher is why the Solaris /usr/bin/diff would be
> searching for libs under "/opt/app/xxxxx/ncr/tbuild/12.00.00.00", which
> looks like an application's build directory.
>
> Has the environment variable LD_LIBRARY_PATH been set to something on
> this server?  If so, what happens when it is removed?

Indeed, it contains the lib path you mentioned above.

-maps.

Darren Dunham

Jan 4, 2010, 6:26:45 PM
On Jan 4, 1:52 pm, maps <mapsiddi...@gmail.com> wrote:
> > Has the environment variable LD_LIBRARY_PATH been set to something on
> > this server?  If so, what happens when it is removed?
>
> Indeed it contains the lib path u mentioned above.

He is suggesting that you unset LD_LIBRARY_PATH and run the command
again. Unexpected things can happen when you populate
LD_LIBRARY_PATH.
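
One way to run a single command with the variable removed, without touching the login environment, is sketched below in Python. The helper name `env_without` is made up, and the child command is only a stand-in that reports whether it can still see the variable.

```python
import os
import subprocess
import sys


def env_without(environ, name="LD_LIBRARY_PATH"):
    """Return a copy of the environment mapping with one variable removed."""
    return {key: value for key, value in environ.items() if key != name}


# Start a child process with the cleaned environment and ask it whether
# LD_LIBRARY_PATH is still set.
child = subprocess.run(
    [sys.executable, "-c", "import os; print('LD_LIBRARY_PATH' in os.environ)"],
    env=env_without(os.environ),
    capture_output=True,
    text=True,
)
print(child.stdout.strip())
```

Replacing the stand-in command with the failing pipeline's right-hand side gives a test that does not depend on what the shell startup files set.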

Does the behavior change?

--
Darren

bi-g...@id.ethz.ch

Jan 6, 2010, 2:59:43 AM
On Dec 31 2009, 5:17 pm, Casper H.S. Dik <Casper....@Sun.COM> wrote:

Hi Casper

We have had the same problem using PHP-mail(). The server had been
running for over 1000 days without a reboot. PHP-mail() suddenly stopped
working on 2 Jan 2010. After a reboot yesterday it works fine again.

Regards Ruedi :-)

maps

Jan 7, 2010, 10:26:57 AM
> He is suggesting that you unset LD_LIBRARY_PATH and run the command
> again.  Unexpected things can happen when you populate
> LD_LIBRARY_PATH.
>
> Does the behavior change?

No it does not.

-maps.

tony_c...@yahoo.com

Jan 7, 2010, 4:14:15 PM
ge...@panix.com (Greg Andrews) writes:

> But the real head scratcher is why the Solaris /usr/bin/diff would be
> searching for libs under "/opt/app/xxxxx/ncr/tbuild/12.00.00.00", which
> looks like an application's build directory.
>
> Has the environment variable LD_LIBRARY_PATH been set to something on
> this server? If so, what happens when it is removed?

Is diff being called from within the app, where a
different environment has been set (e.g. through a shell
script wrapper?).

Also, what does "crle" say?

hth
t
