File.seek() interface


Wolverian

Jul 7, 2005, 1:18:40 PM
to perl6-l...@perl.org
Hello,

gaal is porting the Perl 5 filehandle functions to a Perl 6 OO
interface. The Perl 5 interface with global constants from Fcntl strikes
me as severely lacking in elegance and OO.

$fh.seek(-10, SEEK_END);

Instead of globals, how about a :from adverb?

$fh.seek(-10, :from<end>);

Or maybe we don't need such an adverb at all, and instead use

$fh.seek($fh.end - 10);

I'm a pretty high level guy, so I don't know about the performance
implications of that. Maybe we want to keep seek() low level, anyway.

Subject to change when it comes to the Perl 6 Unicode semantics, of
course. :)

Any thoughts/decisions?

--
wolverian
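
For comparison, the two byte-level styles under discussion (an offset plus a "whence" constant versus arithmetic on an end position) can be sketched in Python, used here only because the Perl 6 syntax above was still speculative. The in-memory stream and the variable names are illustrative assumptions, not part of any proposed Perl 6 interface.

```python
import io
import os

# An in-memory binary stream stands in for a real filehandle.
fh = io.BytesIO(b"0123456789abcdefghij")

# POSIX style: an offset plus a whence constant.
fh.seek(-10, os.SEEK_END)
tail_whence = fh.read()

# Position-arithmetic style: find the end position, then seek absolutely.
end = fh.seek(0, os.SEEK_END)  # seek() returns the new absolute position
fh.seek(end - 10)
tail_arith = fh.read()
```

Both reads return the same last ten bytes. The second form costs one extra seek call in this sketch, which is probably the kind of overhead the performance question is about.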

Dave Whipp

Jul 7, 2005, 2:44:00 PM
to perl6-l...@perl.org
Wolverian wrote:
> Or maybe we don't need such an adverb at all, and instead use
>
> $fh.seek($fh.end - 10);
>
> I'm a pretty high level guy, so I don't know about the performance
> implications of that. Maybe we want to keep seek() low level, anyway.

> Any thoughts/decisions?

We should approach this from the perspective that $fh is an iterator, so
the general problem is "how do we navigate a random-access iterator?".

I have a feeling that the "correct" semantics are closer to:

$fh = $fh.file.end - 10

though the short form ($fh = $fh.end - 10) is a reasonable shortcut.

Paul Seamons

Jul 7, 2005, 4:15:19 PM
to perl6-l...@perl.org
> We should approach this from the perspective that $fh is an iterator, so
> the general problem is "how do we navigate a random-access iterator?".

Well - I kind of thought that $fh was a filehandle that knew how to behave
like an iterator if asked to do so. There are too many applications that
need to jump around using seek.

The options that need to be there are:
seek from the beginning
seek from the end
seek from the current location

Now it could be simplified a bit to the following cases:

$fh.seek(10); # from the beginning forward 10
$fh.seek(-10); # from the end backwards 10
$fh.seek(10, :relative); # from the current location forward 10
$fh.seek(-10, :relative); # from the current location backward 10

Paul
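
The four cases above could be emulated on top of a POSIX-style seek roughly as follows. This is a Python sketch under the assumption that offsets are bytes; `simple_seek` is a hypothetical helper name, not an existing function.

```python
import io
import os


def simple_seek(fh, offset, relative=False):
    """Hypothetical helper: the sign of the offset picks start or end."""
    if relative:
        return fh.seek(offset, os.SEEK_CUR)
    if offset < 0:
        return fh.seek(offset, os.SEEK_END)
    return fh.seek(offset, os.SEEK_SET)


fh = io.BytesIO(b"x" * 100)
from_start = simple_seek(fh, 10)               # 10 bytes after the beginning
from_end = simple_seek(fh, -10)                # 10 bytes before the end
forward = simple_seek(fh, 10, relative=True)   # 10 bytes past the current spot
backward = simple_seek(fh, -10, relative=True) # 10 bytes back again
```

One consequence of letting the sign choose the reference point: an offset of 0 can only mean the beginning, so the very end of the file has no non-relative spelling in this scheme.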

Wolverian

Jul 7, 2005, 4:22:41 PM
to perl6-l...@perl.org
On Thu, Jul 07, 2005 at 08:18:40PM +0300, wolverian wrote:
> I'm a pretty high level guy, so I don't know about the performance
> implications of that. Maybe we want to keep seek() low level, anyway.

Sorry about replying to myself, but I want to ask a further question on
this.

Would it be possible to make this work, efficiently:

for =$fh[-10 ...] -> $line { ... }

to iterate over the last ten lines?

Can we generalise that to be as performance-effective as seek()?

Okay, that was two questions.

--
wolverian

Luke Palmer

Jul 7, 2005, 6:42:57 PM
to perl6-l...@perl.org
On 7/7/05, wolverian <wo...@sci.fi> wrote:
> On Thu, Jul 07, 2005 at 08:18:40PM +0300, wolverian wrote:
> > I'm a pretty high level guy, so I don't know about the performance
> > implications of that. Maybe we want to keep seek() low level, anyway.
>
> Sorry about replying to myself, but I want to ask a further question on
> this.
>
> Would it be possible to make this work, efficiently:
>
> for =$fh[-10 ...] -> $line { ... }
>
> to iterate over the last ten lines?

No. Most notably because -10 ... gives (-10, -9, ... -1, 0, 1, 2, 3,
...). I also don't think that without a special interface filehandles
can behave as an array of lines. If they could, then you'd have:

for $fh[-10..-1] -> $line {...}

> Can we generalise that to be as performance-effective as seek()?

Perhaps. That's what tail(1) does. But it's a tricky problem. You
have to guess where the end should be, then do a binary search on the
number of lines after your position. Sounds like a job for a
specialized module to me.

If you don't care about speed, then I suppose you could even do:

for [ =$fh ].[-10..-1] -> $line {...}

Which won't be speed efficient, and may or may not be memory
efficient, depending on the implementation. I'd guess not.

Luke
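
To make the trade-off concrete, here is a Python sketch of two ways to get the last ten lines. The first reads the whole stream but keeps only a bounded window; the second seeks to the end and reads fixed-size chunks backward until enough newlines have been seen. Note that the second is a simple backward scan, not the guess-and-binary-search strategy described above, and both function names are illustrative assumptions.

```python
import io
import os
from collections import deque


def last_lines_scan(fh, n):
    """Read every line, but keep only the last n in memory."""
    return list(deque(fh, maxlen=n))


def last_lines_seek(fh, n, chunk_size=4096):
    """Read chunks backward from the end until n whole lines are covered."""
    pos = fh.seek(0, os.SEEK_END)
    data = b""
    # More than n newlines means the n-th line from the end is complete.
    while pos > 0 and data.count(b"\n") <= n:
        step = min(chunk_size, pos)
        pos -= step
        fh.seek(pos)
        data = fh.read(step) + data
    return data.splitlines(keepends=True)[-n:]


fh = io.BytesIO(b"".join(b"line %d\n" % i for i in range(1000)))
by_scan = last_lines_scan(fh, 10)
by_seek = last_lines_seek(fh, 10)
```

The backward scan only works this simply because the stream is treated as bytes with a one-byte line terminator, which is exactly the assumption a byte-oriented seek() bakes in.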

Larry Wall

Jul 7, 2005, 8:58:53 PM
to perl6-l...@perl.org
On Thu, Jul 07, 2005 at 02:15:19PM -0600, Paul Seamons wrote:
: > We should approach this from the perspective that $fh is an iterator, so
: > the general problem is "how do we navigate a random-access iterator?".
:
: Well - I kind of thought that $fh was a filehandle that knew how to behave
: like an iterator if asked to do so.

Yes, basically. And they fall into that class of iterators that may
or may not know how to back up, so it may be quite possible to seek forward
10 items but not backward 10 items, if "item" is, for example, a line
defined by an asymmetric match rule.

: There are too many applications that
: need to jump around using seek.

We need to have a POSIXly correct layer, but that's no reason not to have
other layers on top of that with more useful semantics. I view files
as just funny-looking strings, in the abstract. So the same issues
arise that we've talked about concerning strings in Unicode, and that's
even before we get into counting lines or paragraphs. Like a string,
a file may naturally allow itself to be viewed as bytes (POSIX), codepoints,
graphemes, and/or characters in the current language. It can allow
multiple views into the same abstract string, but as with strings,
it may limit the minimum and maximum abstraction level you're allowed
to deal with the file. And depending on the file/string representation,
one of the abstraction levels is likely to be very efficient to seek
around in, and others have to be emulated by visiting all the intermediate
items. Some file structures are great at indexing into lines but lousy
at indexing into anything smaller than that. A file position in such
a file is not even going to be an integer, but a line number plus an
offset into the line.

I realize that most of us come from the POSIXly-correct worldview
that all files are really just a sequence of bytes that can always be
indexed by integer. This view doesn't make a lot of sense any more
in the world of Unicode. We see various versions of Unix/Linux being
caught with their pants down because there's no metadata to tell you
the character encoding of the filenames, for instance. Perl 6 must
not fall into that trap.

In the discussion of seek(), this primarily means that you must keep
reminding yourself that file positions (and string positions) are
not necessarily numbers. Treat them as opaque recipes for navigating
into a file, because you don't know what the most efficient underlying
representation is. It might even be some kind of URI.

At the same time, all relative navigation *must* specify the units.
We can't simply assume bytes any more. And if you specify navigation
in a smaller unit than the natural unit of the file/string in question,
you have to either give it a round-up or round-down instruction, or
be prepared to handle an exception of some sort. A UTF-8 handler has
the nice property that it can tell if it has landed in the middle of
a character, but it can't read your mind about what to do when that happens.
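
The self-synchronizing property of UTF-8 mentioned here comes from continuation bytes always having the bit pattern 10xxxxxx. A Python sketch of the round-down and round-up choices follows; it assumes the data is valid UTF-8, and the helper names are illustrative.

```python
def is_continuation(byte):
    """UTF-8 continuation bytes have the bit pattern 10xxxxxx."""
    return (byte & 0b1100_0000) == 0b1000_0000


def round_down(data, offset):
    """Move back to the first byte of the character that offset falls in."""
    while 0 < offset < len(data) and is_continuation(data[offset]):
        offset -= 1
    return offset


def round_up(data, offset):
    """Move forward to the first byte of the next character."""
    while offset < len(data) and is_continuation(data[offset]):
        offset += 1
    return offset


data = "naïve €5".encode("utf-8")   # ï is 2 bytes, € is 3 bytes
inside_i = 3                         # second byte of ï
down = round_down(data, inside_i)
up = round_up(data, inside_i)
```

The handler can detect the misalignment cheaply, but whether `down` or `up` is the right answer is a policy decision that only the caller can make.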

: The options that need to be there are:
: seek from the beginning
: seek from the end
: seek from the current location
:
: Now it could be simplified a bit to the following cases:
:
: $fh.seek(10); # from the beginning forward 10
: $fh.seek(-10); # from the end backwards 10

Apart from the units and alignment problem, does $fh.seek(-0) mean
the beginning or the end of the file?

: $fh.seek(10, :relative); # from the current location forward 10
: $fh.seek(-10, :relative); # from the current location backward 10

Again, 10 whats? Bytes? Codepoints? Lines?

I think I'd actually like to divorce the notion of going to a
particular position from the notion of relative navigation. So I'm
in favor of $fh.seek taking *only* an opaque position, and $fh.beg
and $fh.cur and $fh.end returning opaque positions. Then there are
navigation commands that can take an opaque position and move relative
to them a given number of units, and we force the units to be specified.
Something like:

$fh.pos = $fh.pos + 10`lines

Arguably, we could probably admit

$fh.pos = 10`bytes

for the case of seeking from the beginning. But I'd kind of like

$fh.pos = 10

to be considered an error.

Note also that we can treat string positions exactly the same way.
All the rule-ishly returned positions are defined as opaque objects already.

Larry
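
One way to picture the split between opaque positions and unit-carrying relative navigation is the Python sketch below. Every name in it (`Offset`, `Position`, `Handle`, `nbytes`, `nlines`) is a hypothetical stand-in, only bytes and forward-moving lines are supported, and `.end` is omitted; it is not a rendering of any actual Perl 6 design.

```python
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Offset:
    """A relative distance that always carries its unit."""
    amount: int
    unit: str  # this sketch only knows "bytes" and "lines"


def nbytes(n):
    return Offset(n, "bytes")


def nlines(n):
    return Offset(n, "lines")


class Position:
    """Opaque position token; only a Handle creates or interprets it."""

    def __init__(self, handle, raw):
        self._handle = handle
        self._raw = raw

    def __add__(self, offset):
        if not isinstance(offset, Offset):
            raise TypeError("relative navigation must specify units")
        return self._handle._advance(self, offset)


class Handle:
    def __init__(self, stream):
        self._stream = stream

    def readline(self):
        return self._stream.readline()

    @property
    def beg(self):
        return Position(self, 0)

    @property
    def pos(self):
        return Position(self, self._stream.tell())

    @pos.setter
    def pos(self, value):
        if not isinstance(value, Position):
            raise TypeError("pos must be an opaque Position")
        self._stream.seek(value._raw)

    def _advance(self, position, offset):
        if offset.unit == "bytes":
            return Position(self, position._raw + offset.amount)
        if offset.unit == "lines" and offset.amount >= 0:
            saved = self._stream.tell()
            self._stream.seek(position._raw)
            for _ in range(offset.amount):
                self._stream.readline()  # lines are emulated by reading through them
            raw = self._stream.tell()
            self._stream.seek(saved)
            return Position(self, raw)
        raise ValueError("unsupported navigation: %r" % (offset,))


fh = Handle(io.BytesIO(b"one\ntwo\nthree\nfour\n"))
fh.pos = fh.pos + nlines(2)
third = fh.readline()
fh.pos = fh.beg + nbytes(4)
second = fh.readline()
try:
    fh.pos = 10
    bare_int_rejected = False
except TypeError:
    bare_int_rejected = True
```

In this sketch byte navigation is plain arithmetic while line navigation has to visit every intermediate line, and backward line navigation is refused outright. A bare integer is rejected both as a position and as an offset.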

Paul Hodges

Jul 7, 2005, 9:15:03 PM
to Larry Wall, perl6-l...@perl.org

--- Larry Wall <la...@wall.org> wrote:
> Arguably, we could probably admit
>
> $fh.pos = 10`bytes
>
> for the case of seeking from the beginning. But I'd kind of like
>
> $fh.pos = 10
>
> to be considered an error.

It seems a logical extension also to say

$fh.pos += 10`bytes

as shorthand for

$fh.pos = $fh.cur + 10`bytes

Likewise for -=

But then that begs the questions of *= (not too nuts), /= (same),
%= (great for fixed length records?) and the predictable other host of
operators.

Am I reaching?

Paul



Wolverian

Jul 7, 2005, 10:03:21 PM
to perl6-l...@perl.org
On Thu, Jul 07, 2005 at 05:58:53PM -0700, Larry Wall wrote:
> $fh.pos = $fh.pos + 10`lines

I'm sorry if this has been discussed, but is the ` going to be in
Perl 6? I like it. :) How does it work, though?

sub *infix:<`> (Num $amount, Unit $class) { $class.new($amount) }

Or so?

Now I'm tempted to make it a generic infix .new.

(args)`Class;

It's almost as confusing as SML!

--
wolverian

Larry Wall

Jul 7, 2005, 11:17:59 PM
to perl6-l...@perl.org
On Thu, Jul 07, 2005 at 06:15:03PM -0700, Paul Hodges wrote:
:
:
: --- Larry Wall <la...@wall.org> wrote:
: > Arguably, we could probably admit
: >
: > $fh.pos = 10`bytes
: >
: > for the case of seeking from the beginning. But I'd kind of like
: >
: > $fh.pos = 10
: >
: > to be considered an error.
:
: It seems a logical extension also to say
:
: $fh.pos += 10`bytes
:
: as shorthand for
:
: $fh.pos = $fh.cur + 10`bytes

.pos and .cur are the same thing. So just call them both .pos, I think.

: Likewise for -=
:
: But then that begs the questions of *= (not too nuts), /= (same),
: %= (great for fixed length records?) and the predictable other host of
: operators.
:
: Am I reaching?

No. The stupid people are the ones proposing to outlaw stupidity. :-)

Larry

James Mastros

Jul 9, 2005, 2:02:35 AM
to perl6-l...@perl.org
Wolverian wrote:
> On Thu, Jul 07, 2005 at 05:58:53PM -0700, Larry Wall wrote:
>> $fh.pos = $fh.pos + 10`lines
>
> I'm sorry if this has been discussed, but is the ` going to be in
> Perl 6? I like it. :)
I was hoping it was going to be in the standard library, but non-core.
Using it for manipulating .pos, OTOH, would seem to make it core, which
I suppose is probably worth it.

> How does it work, though?
>
> sub *infix:<`> (Num $amount, Unit $class) { $class.new($amount) }
>
> Or so?
>
> Now I'm tempted to make it a generic infix .new.
>
> (args)`Class;

The problem with it is that somehow we have to get 5`m / 30`s to work,
even though m is an operator, which AFAIK means it needs to be a macro,
or the moral equivalent (is parsed).

Also, having every unit be a like-named class would very much crowd the
root of the namespace.

-=- James Mastros
theorbtwo
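
The namespace concern could in principle be sidestepped by keeping units as entries in a registry rather than as like-named classes. The Python sketch below shows that idea together with a division such as 5 metres over 30 seconds. The `UNITS` table, `Quantity`, and `q` are all hypothetical, and the sketch says nothing about the parsing problem of `m` also being an operator.

```python
from dataclasses import dataclass
from fractions import Fraction

# Hypothetical registry: unit name -> (dimension, scale relative to the base unit).
UNITS = {
    "m": ("length", 1),
    "km": ("length", 1000),
    "s": ("time", 1),
    "min": ("time", 60),
}


@dataclass(frozen=True)
class Quantity:
    value: Fraction
    dims: tuple  # sorted (dimension, exponent) pairs, e.g. (("length", 1),)

    def __truediv__(self, other):
        exponents = dict(self.dims)
        for dim, exp in other.dims:
            exponents[dim] = exponents.get(dim, 0) - exp
        # Drop dimensions that cancel out completely.
        dims = tuple(sorted((d, e) for d, e in exponents.items() if e != 0))
        return Quantity(self.value / other.value, dims)


def q(amount, unit):
    """Build a quantity from a registered unit name."""
    dim, scale = UNITS[unit]
    return Quantity(Fraction(amount) * scale, ((dim, 1),))


speed = q(5, "m") / q(30, "s")
ratio = q(1, "km") / q(1000, "m")
```

Here only `Quantity` and `q` occupy the namespace, however many units the registry holds.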
