
The interface of fpos_t is hardly sufficient for moving a block of data


Krzysztof Żelechowski

Apr 7, 2012, 10:41:28 AM
One of the important things you cannot do with the present definition of
fpos_t is reliably move a block of data from one position to another within
the same file.

The problem is with this layout:

0123456789
012345678XXXXXXXXXX

where the desired outcome is

0123456789
0123456780123456789

If you want to move this data, you have to work backwards; otherwise you
would get

0123456789
0123456780123456780

which is not what we want.

However, always working backward is not an option either:

1234567890123456789
XXXXXXXXXX

will give you

9123456789

in the end, instead of 0123456789.
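Once positions can be compared, the direction rule is the same one memmove() must follow. The following is a minimal in-memory sketch, with plain byte offsets standing in for file positions; the names copy_forward, copy_backward and move_block are made up for illustration. It reproduces both failures shown above and the rule that avoids them:

```c
#include <assert.h>
#include <stddef.h>

/* Copy len bytes from buf + src to buf + dst, first byte first. */
static void copy_forward(unsigned char *buf, size_t src, size_t dst, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[dst + i] = buf[src + i];
}

/* Copy len bytes from buf + src to buf + dst, last byte first. */
static void copy_backward(unsigned char *buf, size_t src, size_t dst, size_t len)
{
    for (size_t i = len; i-- > 0;)
        buf[dst + i] = buf[src + i];
}

/* If the target starts after the source, it may begin inside it, so copy
   backwards; otherwise copy forwards. The test dst > src is exactly the
   ordering that fpos_t does not offer for file positions. */
static void move_block(unsigned char *buf, size_t src, size_t dst, size_t len)
{
    assert(buf != NULL);
    if (dst > src)
        copy_backward(buf, src, dst, len);
    else
        copy_forward(buf, src, dst, len);
}
```

With the first layout held as "0123456789XXXXXXXXX", move_block(buf, 0, 9, 10) yields "0123456780123456789", while copy_forward alone yields "0123456780123456780". With "1234567890123456789", move_block(buf, 9, 0, 10) puts 0123456789 at the front, while copy_backward alone gives 9123456789.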

There are two possible solutions to this problem I can think of:

1. Recognise the condition that the target buffer begins in the source
buffer.

2. Copy the source to swap space first.

Solution (1) cannot be implemented directly because the standard library
provides no means of telling whether (t) < (s + l), where (s) and (t) are of
type fpos_t. It can be implemented indirectly by a method similar to a disk
scan: temporarily overwrite a segment of the target and observe whether the
last byte of the source changes. This, however, requires read access to the
target, which need not be given. Of course, the problem exists only if the
target is the same file as the source; this condition can be recognised by
moving the source pointer to the same position as the target, writing to the
target and examining the source. I am not entirely sure that applying a file
position obtained from one stream to another stream is supported, but it is
the only way to tell, given that the implementor is not allowed to write to
the target elsewhere. Still, the scandisk solution is rather cumbersome.

Solution (2) is error-prone when the segment to be moved is large and swap
space is scarce.
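For comparison, solution (2) can be sketched in ISO C with tmpfile() standing in for the swap space. It never needs to compare positions, at the price of staging all l bytes in temporary storage and doing the I/O twice. The names move_via_tmpfile and copy_bytes and the 4096-byte chunk size are illustrative assumptions:

```c
#include <assert.h>
#include <stdio.h>

enum { STAGE_CHUNK = 4096 };   /* arbitrary buffer size for this sketch */

/* Copy len bytes from the stream from to the stream to, in chunks.
   Returns 0 if every byte was transferred, -1 otherwise. */
static int copy_bytes(FILE *from, FILE *to, size_t len)
{
    unsigned char buf[STAGE_CHUNK];

    while (len > 0) {
        size_t want = len < STAGE_CHUNK ? len : STAGE_CHUNK;
        if (fread(buf, 1, want, from) != want || fwrite(buf, 1, want, to) != want)
            return -1;
        len -= want;
    }
    return 0;
}

/* Move len bytes from *src to *dst within the update stream f by staging
   them in a tmpfile(). Positions are used only through fsetpos(), so the
   direction of the move never matters. Returns 0 on success, -1 on failure. */
static int move_via_tmpfile(FILE *f, const fpos_t *src, const fpos_t *dst, size_t len)
{
    FILE *tmp;
    int rc = -1;

    assert(f != NULL && src != NULL && dst != NULL);
    if ((tmp = tmpfile()) == NULL)
        return -1;
    /* fsetpos() also serves as the positioning call required between
       reading from and writing to the same update stream. */
    if (fsetpos(f, src) == 0 && copy_bytes(f, tmp, len) == 0) {
        rewind(tmp);
        if (fsetpos(f, dst) == 0 && copy_bytes(tmp, f, len) == 0 && fflush(f) == 0)
            rc = 0;
    }
    fclose(tmp);
    return rc;
}
```

The temporary file has to hold the whole segment at once, which is exactly where this approach breaks down on a large move with little free space.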

Solution (1) can be implemented using a vendor-specific extension
(_POSIX_SOURCE). I believe that this extension (or an equivalent one) is a
necessary addition to the standard library, for the reasons detailed above.
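With the POSIX extension, the direction test reduces to comparing two off_t values, as the _POSIX_SOURCE branch of the implementation below does. The following is a standalone sketch of just that test. It assumes a POSIX system where ftello() is available and that both streams are open on the same file, and the name must_copy_backward is hypothetical:

```c
#define _POSIX_C_SOURCE 200809L   /* expose ftello()/fseeko() */
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* in is positioned at the source start, out at the target start.
   Returns 1 to copy backwards, 0 to copy forwards, -1 on error. */
static int must_copy_backward(FILE *in, FILE *out)
{
    off_t s, t;

    assert(in != NULL && out != NULL);
    if ((s = ftello(in)) == (off_t)-1 || (t = ftello(out)) == (off_t)-1) {
        perror("ftello");
        return -1;
    }
    /* A target that starts after the source may begin inside it. */
    return s < t;
}
```

This relies on off_t being an ordinary arithmetic type, which POSIX guarantees and ISO C does not offer for fpos_t.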

I append a sample (imperfect) implementation for reference.

/* exit_code and copysome() are defined elsewhere (not shown) */
static exit_code
movesome
(FILE *const f /* source and target */ [02],
 size_t /* amount of data to move */ l,
 fpos_t *t /* target position, necessary if source and target are the same
              file */)
{
    enum FileSelector { IN, OUT, FCOUNT }; enum Direction { FORWARD, BACKWARD };
    enum Configuration { IO_SIZE = 0100000 };

    if (l)
    {
        /* determine the direction of the displacement, if any */
        signed char d = -01;

#if 0
        /*
         * The following mechanism cannot be used in ANSI code.
         * Reason:
         * unsupported syntax (invalid operands to binary <=)
         * and no equivalent library function exists */
        fpos_t s; sizeof (s <= *t);
#endif

#ifdef _POSIX_SOURCE
        /*
         * this should be _LARGEFILE_SOURCE, except that it is not supported
         * (Sources Bugzilla – Bug 13960) */
        off_t o [FCOUNT];

        if ((o [IN] = ftello (f [IN])) == -01 || fsetpos (f [OUT], t) ||
            (o [OUT] = ftello (f [OUT])) == -01) perror ("seek");
        else d = o [IN] <= o [OUT]? BACKWARD: FORWARD;
#else /* _POSIX_SOURCE */
        size_t const l1 = l > IO_SIZE? IO_SIZE: l;
        unsigned char *const b = malloc (l1 << 01);

        if (b)
        {
            signed char d = 0;

            for (; d == 0;)
            {
                fpos_t s;

                if (fgetpos (f [IN], &s) || fsetpos (f [OUT], t) ||
                    fseek (f [IN], l - 01, SEEK_CUR)) perror ("seek");
                else
                {
                    fpos_t s1;

                    if (fgetpos (f [IN], &s1)) perror ("seek");
                    else
                    {
                        int const c = fgetc (f [IN]); unsigned char const x = ! c;
                        size_t r = fread (b, 01, l1, f [OUT]);

                        /*
                         * Reading holes in sparse files should fetch 0.
                         * Therefore, if f [OUT] is at eof,
                         * we do not need to restore contents of f [OUT].
                         * Otherwise, just bail out; there is no point dealing with bad blocks. */
                        if (ferror (f [OUT])) perror ("read");
                        else
                        {
                            size_t w = r? r: l1; memset (b + l1, x, l1);

                            /* fwrite returns the number of items written */
                            if (fwrite (b + l1, 01, w, f [OUT]) == w)
                            {
                                fpos_t t1;

                                if (fgetpos (f [OUT], &t1) || fsetpos (f [IN], &s1)) perror ("seek");
                                else if (fgetc (f [IN]) == x) d = 01;
                                else
#error "Dude, this is boring. Why don’t you continue yourself?"
                            } else perror ("write");
                            break;
                        }
                    }
                }
            }
            free (b);
        }
        else { perror ("malloc"); return EXIT_FAILURE; }
#endif /* _POSIX_SOURCE */

        return d == -01? EXIT_FAILURE: copysome (f, l, t, d);
    }
    else /* nothing to do */ return EXIT_SUCCESS;
}

Marcin Grzegorczyk

Apr 21, 2012, 12:23:35 PM
Krzysztof Żelechowski wrote:
> One of the important things you cannot do with the present definition of
> fpos_t is reliably moving a block of data from one position to another in
> the same file.
>
[snip examples]
>
> There are two possible solutions to this problem I can think of:
>
> 1. Recognise the condition that the target buffer begins in the source
> buffer.
>
> 2. Copy the source to swap space first.
>
> Solution (1) cannot be implemented directly because the standard library
> provides no means of telling whether (t)< (s + l), where (s) and (t) are of
> type fpos_t.
[snip]

A third solution is to keep track of file offsets yourself. This means
using your own file handles instead of FILE* (e.g. a struct holding a
FILE* and the offset). Of course, it is wasteful (offset information is
kept twice), you need to write your own functions for all file I/O,
etc., but it may be the only realistic solution if full portability
(beyond POSIX, plus possibly Windows-specific code) is a must.
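The following is a minimal sketch of such a wrapper. It assumes binary streams only and that all I/O goes through the wrappers, and the names tfile, tpos, tattach, tread, twrite, tskip, tgetpos, tsetpos and tbegins_inside are hypothetical. It keeps the opaque fpos_t for seeking back, plus a long long counter that, unlike fpos_t, can be compared:

```c
#include <assert.h>
#include <stdio.h>

struct tfile {
    FILE *fp;
    long long off;                /* bytes from the start of the file */
};

struct tpos {
    fpos_t pos;                   /* for fsetpos() */
    long long off;                /* for comparisons */
};

static void tattach(struct tfile *tf, FILE *fp)
{
    assert(tf != NULL && fp != NULL);
    rewind(fp);                   /* start from a known offset: 0 */
    tf->fp = fp;
    tf->off = 0;
}

static size_t tread(struct tfile *tf, void *buf, size_t n)
{
    size_t got = fread(buf, 1, n, tf->fp);
    tf->off += (long long)got;
    return got;
}

static size_t twrite(struct tfile *tf, const void *buf, size_t n)
{
    size_t put = fwrite(buf, 1, n, tf->fp);
    tf->off += (long long)put;
    return put;
}

/* Relative seek; tskip(tf, 0) also serves as the positioning call
   required between reading and writing on an update stream. */
static int tskip(struct tfile *tf, long delta)
{
    if (fseek(tf->fp, delta, SEEK_CUR) != 0)
        return -1;
    tf->off += delta;
    return 0;
}

static int tgetpos(struct tfile *tf, struct tpos *p)
{
    p->off = tf->off;
    return fgetpos(tf->fp, &p->pos);
}

static int tsetpos(struct tfile *tf, const struct tpos *p)
{
    if (fsetpos(tf->fp, &p->pos) != 0)
        return -1;
    tf->off = p->off;
    return 0;
}

/* The test fpos_t cannot express: does target t begin inside the source
   block [s, s + len)? If so, the move must run backwards. */
static int tbegins_inside(const struct tpos *t, const struct tpos *s, long long len)
{
    return s->off < t->off && t->off < s->off + len;
}
```

The duplication mentioned above is visible here: every position is stored twice, once opaquely for the library and once numerically for the program.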

That said, if you do not need to support files larger than LONG_MAX
outside Windows and POSIX, you may be able to get away with using
POSIX-specific code on POSIX, Win32-specific code on Win32, and
ftell() everywhere else.
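On the ftell() route, the whole move can be done with plain long offsets. The following sketch assumes a binary update stream and offsets obtained from ftell(), and the function name move_block_long is hypothetical. It copies in chunks through a caller-supplied buffer and takes the chunks back to front when the target lies after the source:

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Move len bytes from offset src to offset dst in the binary update
   stream f. Each chunk is read whole before it is written, and chunks
   are taken back to front when dst > src, so overlapping moves come out
   right. No overflow check on src/dst + len. Returns 0 or -1. */
static int move_block_long(FILE *f, long src, long dst, long len,
                           unsigned char *buf, size_t bufsize)
{
    int backward = dst > src;
    long done = 0;

    assert(f != NULL && buf != NULL && src >= 0 && dst >= 0 && len >= 0);
    assert(bufsize > 0 && bufsize <= (size_t)LONG_MAX);
    while (done < len) {
        long n = len - done < (long)bufsize ? len - done : (long)bufsize;
        long off = backward ? len - done - n : done;   /* chunk offset within the block */

        /* each fseek() separates a read from the following write */
        if (fseek(f, src + off, SEEK_SET) != 0
            || fread(buf, 1, (size_t)n, f) != (size_t)n
            || fseek(f, dst + off, SEEK_SET) != 0
            || fwrite(buf, 1, (size_t)n, f) != (size_t)n)
            return -1;
        done += n;
    }
    return fflush(f) == 0 ? 0 : -1;
}
```

Reading a whole chunk before writing it means the buffer only has to be as large as one chunk, not the whole block. The ordering of chunks then guarantees that no source byte is overwritten before it has been read.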

In any case, IMHO it is very unlikely that a function to compare objects
of type fpos_t will be added to the Standard; there seems to be almost
no demand.
--
Marcin Grzegorczyk