read text from file, a chunk of more lines at a time

hans

Oct 30, 2011, 3:48:40 PM
How to read a file, one "record" (of more lines, with a consistent
record delimiter) at a time?

RECORD1
some
text
RECORD2
some
other
text
RECORD3
and
much
more
text
RECORD4
etc.

thanks


Paul Wallich

Oct 30, 2011, 4:09:00 PM
Probably the simplest way is a loop of readline (concatenating the
string) with a check for the delimiter, nested inside a loop that does
whatever you want with the records. Oh, and the inside loop will need an
EOF check as well. You can use an explicit loop or a while/until
construct or an if or a cond with an explicit transfer of control. You
could even use recursion.
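
A minimal sketch of the nested loop described above, in Common Lisp. It collects the lines of a record into a list rather than concatenating a string. The names READ-RECORD and MAP-RECORDS and the default delimiter "*RECORD*" are assumptions made for illustration only.

```lisp
;; Hypothetical helpers; the names and the default delimiter are assumptions.
(defun read-record (stream &optional (delim "*RECORD*"))
  "Read lines from STREAM up to a line equal to DELIM or to end of file.
Returns two values: the list of lines read, and T if end of file was hit."
  (loop for line = (read-line stream nil nil)
        until (or (null line) (string= line delim))
        collect line into lines
        finally (return (values lines (null line)))))

(defun map-records (function stream &optional (delim "*RECORD*"))
  "Call FUNCTION on each record (a list of lines) read from STREAM.
Text before the first DELIM line counts as a record, possibly an empty one."
  (loop
    (multiple-value-bind (lines eofp) (read-record stream delim)
      (funcall function lines)
      (when eofp (return)))))

;; Usage: collect the records of a small in-memory "file".
(with-input-from-string (in (format nil "a~%b~%*RECORD*~%c~%"))
  (let ((records '()))
    (map-records (lambda (record) (push record records)) in)
    (nreverse records)))
;; => (("a" "b") ("c"))
```

Only one record is held in memory at a time, so the size of the file should not matter much.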

paul

Tim Bradshaw

Oct 30, 2011, 8:37:19 PM
hans <schatze...@gmail.com> wrote:
> How to read a file, one "record" (of more lines, with a consistent
> record delimiter) at a time?
>

I will no doubt be crucified for saying so but: Perl. Read it in Perl, spit
out sexps from Perl, and read those with Lisp.

XeCycle

Oct 30, 2011, 9:47:54 PM
What's the variable "$/"? Check perlvar(1perl).

--
Carl Lei (XeCycle)
Department of Physics, Shanghai Jiao Tong University
OpenPGP public key: 7795E591
Fingerprint: 1FB6 7F1F D45D F681 C845 27F7 8D71 8EC4 7795 E591

XeCycle

Oct 30, 2011, 9:49:48 PM
XeCycle <xec...@gmail.com> writes:

> hans <schatze...@gmail.com> writes:
>
>> How to read a file, one "record" (of more lines, with a consistent
>> record delimiter) at a time?
>>
>> RECORD1
>> some
>> text
>> RECORD2
>> some
>> other
>> text
>> RECORD3
>> and
>> much
>> more
>> text
>> RECORD4
>> etc.
>
> What's the variable "$/"? Check perlvar(1perl).

Sorry, I thought I was in comp.lang.perl.misc.

But I recommend Perl, too.

Rob Warnock

Oct 30, 2011, 10:06:04 PM
hans <schatze...@gmail.com> wrote:
+---------------
+---------------

If your record delimiter is actually something that matches the
regexp pattern "RECORD[0-9]+", then you can match/parse it very
easily with MISMATCH and PARSE-INTEGER:

> (loop with lines = '("RECORD1"
                       "some"
                       "text"
                       "RECORD2"
                       "some"
                       "other"
                       "text")
        for line in lines
        for delim-p = (eql 6 (mismatch "RECORD" line))
        for datum = (if delim-p (parse-integer line :start 6) line)
        collect (list delim-p datum))

((T 1) (NIL "some") (NIL "text") (T 2) (NIL "some") (NIL "other")
(NIL "text"))
>

Adding the logic that batches the thus-tagged lines into "records"
is left as an exercise for the student.
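
One possible answer to that exercise, as a hedged sketch. BATCH-RECORDS is a hypothetical name. Dropping any lines that come before the first header is an assumption of this sketch, and a header such as "RECORDS" would make PARSE-INTEGER signal an error.

```lisp
;; Hypothetical helper, not part of the post above.
(defun batch-records (lines)
  "Group LINES into lists of the form (NUMBER LINE...), one per RECORD<n> header.
Lines before the first header are dropped (an assumption of this sketch)."
  (let ((records '())
        (current nil))                  ; current record, built in reverse
    (dolist (line lines)
      (if (eql 6 (mismatch "RECORD" line))
          (progn
            ;; A new header closes the record collected so far.
            (when current (push (nreverse current) records))
            (setf current (list (parse-integer line :start 6))))
          (when current (push line current))))
    (when current (push (nreverse current) records))
    (nreverse records)))

(batch-records '("RECORD1" "some" "text" "RECORD2" "some" "other" "text"))
;; => ((1 "some" "text") (2 "some" "other" "text"))
```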


-Rob

-----
Rob Warnock <rp...@rpw3.org>
627 26th Avenue <http://rpw3.org/>
San Mateo, CA 94403

Pascal J. Bourguignon

Oct 30, 2011, 10:14:08 PM
By programming. That is, using one's brain.


I don't understand this kind of question. What problem do you have?

Do you have a problem of not knowing lisp I/O primitives?

Do you have a problem of not knowing how to read structured files?

Do you have a problem of not recognizing the structure of the file
(i.e. not being able to come up with a specification)?


What's your problem?

--
__Pascal Bourguignon__ http://www.informatimago.com/
A bad day in () is better than a good day in {}.

hans

Oct 31, 2011, 2:00:40 AM
On Oct 31, 3:14 am, "Pascal J. Bourguignon" <p...@informatimago.com>
wrote:
The file is 145 MB, has about 20000 records, a record may have over
500 lines, but the record separator is simply and always *RECORD* on a
separate line.
Sorry for the above complication with RECORD1, RECORD2 ...

In Perl you would simply do
$/ = "*RECORD*";
as Tim Bradshaw and XeCycle say.

Kaz Kylheku

Oct 31, 2011, 3:20:44 AM
No way. There is a new text mangler with Lisp roots.

http://www.nongnu.org/txr

@(collect)
RECORD@num
@ (collect)
@text
@ (until)
RECORD@(skip)
@ (end)
@(end)
@(output)
@ (repeat)
(@num @(rep)@text @(last)@text@(end))
@ (end)
@(end)

Test:

$ ./txr rec2sexp.txr -
RECORD1
a
b
c
d
RECORD2
x
y
z
RECORD3
d
[Ctrl-D]
(1 a b c d)
(2 x y z)
(3 d)

Pascal J. Bourguignon

Oct 31, 2011, 10:33:53 AM
Ok, so it seems you can recognize more or less the structure of the
file.

You say "separator", but in your example, it looks like the 'RECORD'
token is a prefix. You must choose what file structure you have:

file ::= { record } .
record ::= 'RECORD' { line } .

file ::= { record } .
record ::= { line } 'RECORD' .

file ::= [ record { 'RECORD' record } ] .
record ::= { line } .
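
To make the difference between these grammars concrete, here is a small sketch that works on a list of lines. SPLIT-LINES-ON treats the token as a separator (third grammar), and RECORDS-WITH-PREFIX layers the first grammar on top of it. Both names are hypothetical, and returning one empty record for empty input is a simplification of this sketch.

```lisp
;; Hypothetical helpers illustrating the separator and prefix grammars.
(defun split-lines-on (delim lines)
  "Split the list LINES at every element STRING= to DELIM (DELIM as separator).
Always returns at least one record, which may be empty."
  (let ((records '())
        (current '()))
    (dolist (line lines)
      (cond ((string= line delim)
             (push (nreverse current) records)
             (setf current '()))
            (t (push line current))))
    (push (nreverse current) records)
    (nreverse records)))

(defun records-with-prefix (delim lines)
  "Interpret DELIM as a prefix: nothing may precede the first DELIM line."
  (let ((parts (split-lines-on delim lines)))
    (unless (null (first parts))
      (error "Input does not start with ~S" delim))
    (rest parts)))

(split-lines-on "RECORD" '("RECORD" "a" "RECORD" "b"))
;; => (NIL ("a") ("b"))
(records-with-prefix "RECORD" '("RECORD" "a" "RECORD" "b"))
;; => (("a") ("b"))
```

The same input yields three records under the separator reading and two under the prefix reading, which is why the choice has to be made explicitly.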


But you didn't answer the other questions:

>> Do you have a problem of not knowing lisp I/O primitives?
>>
>> Do you have a problem of not knowing how to read structured files?



Anton Kovalenko

Oct 31, 2011, 12:59:45 PM
"Pascal J. Bourguignon" <p...@informatimago.com> writes:

>> The file is 145 MB, has about 20000 records, a record may have over
>> 500 lines, but the record separator is simply and always *RECORD* on a
>> separate line.
>> Sorry for the above complication with RECORD1, RECORD2 ...
>
> Ok, so it seems you can recognize more or less the structure of the
> file.
>
> You say "separator", but in your example, it looks like the 'RECORD'
> token is a prefix. You must choose what file structure you have:

It's a wonderful illustration of Perl vs. CL differences.

Pascal Bourguignon mentioned possible variants of input grammar. Each of
them is fairly trivial to code in CL, and it might be just as trivial in
Perl. But that's not how people program in Perl, apparently: they
recognize $/ as something that has a chance to work, and they go on and
use it because it's simple and terse and "beautiful". Now let's look
closer at this beauty.

$/="*RECORD*" is obviously wrong:

*RECORD*
An item, that would set the *RECORD* straight.
*RECORD*
A previous record triggered a bug.

$/="\n*RECORD*\n" is somewhat better, but the *first* record header (if
there are headers) won't be recognized as record separator anymore.
(For a variant without \n's, we get an empty first record in this case,
but, of course, Perl people would "solve" it by ignoring empty records).

Now, a line-sensitive regular expression could be useful as separator
instead, but $/ is _not_ a regex, so we're out of luck. It's "better" to
leave it as $/="*RECORD*". A good Perl programmer would _document_ the
problem with inline *RECORD*s; that's the maximum quality we could
reasonably expect.

Seriously, such a thing is "perfect" as one-shot throwaway code _only_.
But when I want to massage a text file once and forget about it, I'd
better open it in the _editor_, and with some replace-regexps it will
become a 145-Mb file with S-expressions, which I'll then read with
CL:READ.
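
For the S-expression route, a rough sketch of the writing and reading sides in CL itself. The names WRITE-RECORD-SEXP and READ-RECORD-SEXP are hypothetical, as is the choice of one list of strings per record. PRIN1 takes care of escaping double quotes and backslashes inside the strings.

```lisp
;; Hypothetical helpers; one record is written as one list of strings per line.
(defun write-record-sexp (lines &optional (stream *standard-output*))
  "Print LINES, a list of strings, as one readable list followed by a newline."
  (with-standard-io-syntax
    (prin1 lines stream)
    (terpri stream)))

(defun read-record-sexp (stream)
  "Read one record back from STREAM; returns :EOF at end of file."
  (with-standard-io-syntax
    (let ((*read-eval* nil))            ; never evaluate #. forms from a data file
      (read stream nil :eof))))

;; Round trip through a string:
(let ((text (with-output-to-string (out)
              (write-record-sexp '("a\\b\"cdef" "g h i") out))))
  (with-input-from-string (in text)
    (read-record-sexp in)))
;; => ("a\\b\"cdef" "g h i")
```

:EOF is used as the end-of-file marker because an empty record reads back as NIL.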

--
http://github.com/akovalenko/sbcl-win32-threads/wiki
+7(916)345-34-02 | Elektrostal' MO, Russia

Kaz Kylheku

Oct 31, 2011, 1:57:17 PM
On 2011-10-31, Kaz Kylheku <k...@kylheku.com> wrote:
> On 2011-10-31, Tim Bradshaw <t...@tfeb.org> wrote:
>> hans <schatze...@gmail.com> wrote:
>>> How to read a file, one "record" (of more lines, with a consistent
>>> record delimiter) at a time?
>>>
>>
>> I will no doubt be crucified for saying so but: Perl. Read it in Perl, spit
>> out sexps from Perl, and read those with Lisp.
>
> No way. There is a new text mangler with Lisp roots.
>
> http://www.nongnu.org/txr
>
> @(collect)
> RECORD@num
> @ (collect)
> @text
> @ (until)
> RECORD@(skip)
> @ (end)
> @(end)
> @(output)
> @ (repeat)
> (@num @(rep)@text @(last)@text@(end))
> @ (end)
> @(end)

Improved.

- :vars in the inner collect ensures that an empty collect still
produces a binding for the text variable (a binding to the empty list nil),
even if there is no match.

- Output simplified.

@(collect)
RECORD@num
@ (collect :vars (text))
@text
@ (until)
RECORD@(skip)
@ (end)
@(end)
@(output)
@ (repeat)
(@num@(rep) @text@(end))
@ (end)
@(end)

$ ./txr rec2sexp.txr -
RECORD1
RECORD2
a
RECORD3
a
b
foo RECORD4
RECORD4
[Ctrl-D]
(1)
(2 a)
(3 a b foo RECORD4)
(4)

Tim Bradshaw

Oct 31, 2011, 3:42:28 PM
Kaz Kylheku <k...@kylheku.com> wrote:
> No way. There is a new text mangler with Lisp roots.

I don't think this is really different: my point wasn't really "use Perl"
it was "use the appropriate tool" (OK, I should have said that). There
probably are cases where there is a real reason to use x for everything,
but generally the "reason" is some kind of invented thing in people's
minds, and in fact it is just fine to use a combination of tools: AWK or
Perl or txr or what-have-you for file-munging and Lisp or ... for other
bits.

txr looks interesting.

Carlos

Oct 31, 2011, 4:18:49 PM
[Anton Kovalenko <an...@sw4me.com>, 2011-10-31 20:59]
[...]
> But that's not how people program in Perl,
> apparently: they recognize $/ as something that has a chance to work,
> and they go on and use it because it's simple and terse and
> "beautiful". Now let's look closer at this beauty.
>
> $/="*RECORD*" is obviously wrong:
>
> *RECORD*
> An item, that would set the *RECORD* straight.
> *RECORD*
> A previous record triggered a bug.
>
> $/="\n*RECORD*\n" is somewhat better, but the *first* record header
> (if there are headers) won't be recognized as record separator
> anymore. (For a variant without \n's, we get an empty first record in
> this case, but, of course, Perl people would "solve" it by ignoring
> empty records).
>
> Now, a line-sensitive regular expression could be useful as separator
> instead, but $/ is _not_ a regex, so we're out of luck. It's "better"
> to leave it as $/="*RECORD*". A good Perl programmer would _document_ the
> problem with inline *RECORD*s; that's the maximum quality we could
> reasonably expect.

To solve your "problem", a Perl programmer would probably just read and
discard the first header, and then set $/ to "\n*RECORD*\n". Your
strawman Perl programmers are too incompetent, you should fire them.
--

Anton Kovalenko

Oct 31, 2011, 7:34:54 PM
Carlos <an...@quovadis.com.ar> writes:

> To solve your "problem", a Perl programmer would probably just read and
> discard the first header, and then set $/ to "\n*RECORD*\n". Your
> strawman Perl programmers are too incompetent, you should fire them.

As we can see, even a competent, caring Perl programmer proposes "read
and discard" instead of "read, check and discard", and that's in the
discussion of correctness. Why would I need a strawman?

--
Regards, Anton Kovalenko
+7(916)345-34-02 | Elektrostal' MO, Russia

Kaz Kylheku

Oct 31, 2011, 8:02:22 PM
If you discard the first header, but that record is empty,
then you're again left with a header which does not match
"\n*RECORD*\n"

\n is not a good substitute for anchors like ^ and $ which are not
character matches, but a semantic extension to regexes.

--
Alan Perlis Epigram 32. Programmers are not to be measured by their ingenuity
and their logic but by the completeness of their case analysis.

Tim Bradshaw

Oct 31, 2011, 8:03:57 PM
Anton Kovalenko <an...@sw4me.com> wrote:

> As we can see, even a competent, caring Perl programmer proposes "read
> and discard" instead of "read, check and discard", and that's in the
> discussion of correctness. Why would I need a strawman?

It's this kind of thing that makes me want to take Lisp programmers out and
shoot them.

Anton Kovalenko

Oct 31, 2011, 9:08:31 PM
Tim Bradshaw <t...@tfeb.org> writes:

>> As we can see, even a competent, caring Perl programmer proposes "read
>> and discard" instead of "read, check and discard", and that's in the
>> discussion of correctness. Why would I need a strawman?
>
> It's this kind of thing that makes me want to take Lisp programmers out and
> shoot them.

Your own suggestion to spit out sexps was perfectly sane (and it
doesn't need Perl, which is a good sign). What's ridiculous here is not
Perl, or Perl's $/, it's how people stick to a specific Perl feature
($/), even after it was shown to be a wrong tool in a number of ways
(Kaz Kylheku noticed an additional danger of empty records).

A similar thing could happen with CL. Imagine that we're parsing
command-line arguments, and there's one that should be an
integer-bounded range, like 1222-33334. Let's use parse-integer. Then
it turns out that 0xDEAD-0xDEEF is also valid, and 0177-0755 should be
octal and it's silently misinterpreted as decimal. Let's insert some
special cases and still use parse-integer. Then it turns out that we
accept 0x+12-0x+FF, which we shouldn't, and we insert some more code but
_still_ use parse-integer. Then 0-283 turns out to be misdetected as
octal and signals an error on 8...

Surely it _could_ happen with CL, but I have yet to see it happening.
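
For the simplest form of the range example above (two unsigned decimal bounds), a strict parser might look like the following sketch. PARSE-DECIMAL-RANGE is a hypothetical name. The digit check is there because PARSE-INTEGER on its own accepts a sign and surrounding whitespace, and hex or octal notation is deliberately out of scope here.

```lisp
;; Hypothetical helper: accepts only "<digits>-<digits>".
(defun parse-decimal-range (string)
  "Parse \"LOW-HIGH\" where both bounds are unsigned decimal integers.
Returns two values, LOW and HIGH; signals an error on anything else."
  (let ((dash (position #\- string)))
    (flet ((bound (start end)
             ;; Reject empty bounds, signs, whitespace and prefixes like 0x.
             (unless (and (< start end)
                          (every #'digit-char-p (subseq string start end)))
               (error "Not an unsigned decimal integer in ~S" string))
             (parse-integer string :start start :end end)))
      (unless dash
        (error "No \"-\" in ~S" string))
      (values (bound 0 dash)
              (bound (1+ dash) (length string))))))

(parse-decimal-range "1222-33334")
;; => 1222, 33334
```

Inputs such as "0x+12-0x+FF" signal an error instead of being half-accepted. "0177-0755" is read as decimal 177 and 755, so octal support would need its own explicit rule.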

--
Regards, Anton Kovalenko <http://github.com/akovalenko/sbcl-win32-threads>
+7(916)345-34-02 | Elektrostal' MO, Russia

Carlos

Oct 31, 2011, 9:23:43 PM
[Kaz Kylheku <k...@kylheku.com>, 2011-11-01 00:02]
Come on, you are testing a sketch algorithm to a made up specification.
He was talking about *RECORD* being not a separator but a header. Now
you say there can be empty records? Then the Perl programmer would set
$/ to "*RECORD*\n" and join records if needed.

My point is that Perl programmers aren't necessarily stupid. That's all.

Oh, and also that Perl's augmented read-line simplifies the solution a
lot.

--

Carlos

Oct 31, 2011, 9:25:25 PM
[Anton Kovalenko <an...@sw4me.com>, 2011-11-01 03:34]
> Carlos <an...@quovadis.com.ar> writes:
>
> > To solve your "problem", a Perl programmer would probably just read
> > and discard the first header, and then set $/ to "\n*RECORD*\n".
> > Your strawman Perl programmers are too incompetent, you should fire
> > them.
>
> As we can see, even a competent, caring Perl programmer proposes "read
> and discard" instead of "read, check and discard", and that's in the
> discussion of correctness. Why would I need a strawman?

Because I said "read and discard the first header", not "read and
discard anything whatsoever".

--

Carlos

Oct 31, 2011, 9:29:53 PM
[Carlos <an...@quovadis.com.ar>, 2011-11-01 02:25]
^^^^^^^^^^ I think this "whatsoever" here isn't
right; I withdraw it.


Anton Kovalenko

Oct 31, 2011, 9:39:35 PM
Anton Kovalenko <an...@sw4me.com> writes:

>>> As we can see, even a competent, caring Perl programmer proposes "read
>>> and discard" instead of "read, check and discard", and that's in the
>>> discussion of correctness. Why would I need a strawman?
>>
>> It's this kind of thing that makes me want to take Lisp programmers out and
>> shoot them.

[...]

> [I]t's how people stick to a specific Perl feature
> ($/), even after it was shown to be a wrong tool in a number of ways
> (Kaz Kylheku noticed an additional danger of empty records).

[...]

> Surely it _could_ happen with CL, but I have yet to see it happening.

Well, that was a gross overstatement: it happens all the time with
FORMAT ("~a-~a" is incorrect for making symbol names from other symbol
names, but widely used). And I have an idea why it happens with Perl and
with FORMAT, but not with most other CL stuff.

If we leave out FORMAT, CL doesn't have "killer features", that is,
things so shining with elegance and brevity that we're instantly tempted
to use them. There's nothing magic about PARSE-INTEGER, or SEARCH, or
MAPCAR..., you can write your own and use it, sometimes without any
performance penalty. When a tool is appropriate, you use it; when it's
not quite there, you roll your own. The original tool we wanted to use
usually provides some good hints on the interface we want to export
(e.g. our own parse-c-integer could take string, end, start, radix,
junk-allowed too, and :test & :key are useful for many other things).

In Perl, OTOH, _any_ feature is a killer feature. How would I roll my
own $/ or $_, if they were not there? Therefore, each feature that we're
using for a specific task has a chance of becoming addictive: it looks
like too much work to do if we dare to throw it away, even if it's not
really so hard for a specific task. It's not hard in Perl, after all, to
read a line at a time in a loop, check for "*RECORD*", collect a list --
that kind of boring thing we would do in CL.

Kaz Kylheku

Oct 31, 2011, 9:53:30 PM
On 2011-10-31, Kaz Kylheku <k...@kylheku.com> wrote:
> Improved.
>
> - :vars in the inner collect ensures that an empty collect still
> produces a binding for the text variable (a binding to the empty list nil),
> even if there is no match.
>
> - Output simplified.
>
> @(collect)
> RECORD@num
> @ (collect :vars (text))
> @text
> @ (until)
> RECORD@(skip)
> @ (end)
> @(end)
> @(output)
> @ (repeat)
> (@num@(rep) @text@(end))
> @ (end)
> @(end)

Enough of the trivial Hello, World stuff, and on to a more robust, realistic
solution to the problem.

New requirements:

- produce literals, and escape occurrences of " and single
escapes within literals

- catch RECORDX where X is not a number

- enforce that records start with RECORD<NUM>

We use a filter (filters are based on a trie data structure) to do the
stringification. A sprinkle of TXR's "blub-style for the Java spewing masses"
exception handling for the errors. We define a custom exception, derived
from exception type error.

We tighten the record collect with :gap 0 so that it does not skip nonmatching
garbage in its search for a header (not because we have to, but just for the
hell of it).

Look, Ma, one single regex used. For what regexes are designed for:
recognizing/validating a token.

@(deffilter lispstr ("\"" "\\\"") ("\\" "\\\\"))
@(defex badusage error)
@(try)
@ (collect :gap 0)
@ (cases)
RECORD@{num /[0-9]+/}
@ (or)
RECORD@nonnum
@ (throw badusage `RECORD followed by "@nonnum" which is not a number`)
@ (or)
@blah
@ (throw badusage `RECORD<N> missing, "@blah" found instead`)
@ (end)
@ (collect :vars (text))
@text
@ (until)
RECORD@(skip)
@ (end)
@ (end)
@ (output :filter lispstr)
@ (repeat)
(@num@(rep) "@text"@(end))
@ (end)
@ (end)
@(catch badusage (message))
@ (output)
ERROR: @message
@ (end)
@ (fail)
@(end)

Tests:

$ echo "foo" | txr rec2sexp.txr -
ERROR: RECORD<N> missing, "foo" found instead

$ echo "RECORDB" | txr rec2sexp.txr -
ERROR: RECORD followed by "B" which is not a number

$ echo "RECORD1" | txr rec2sexp.txr -
(1)

$ ./txr rec2sexp.txr -
RECORD1
a
b
c
d
RECORDB
3
ERROR: RECORD followed by "B" which is not a number

$ ./txr rec2sexp.txr -
RECORD1
a\b"cdef
g h i
j k
RECORD2
RECORD3
\
RECORD4
"
[Ctrl-D]
(1 "a\\b\"cdef" "g h i" "j k")
(2)
(3 "\\")
(4 "\"")

Tim Bradshaw

Nov 1, 2011, 3:33:11 AM
Carlos <an...@quovadis.com.ar> wrote:

> My point is that Perl programmers aren't necessarily stupid. That's all.

My point was that as well, with the additional one that Lisp programmers
are often really disturbingly literal-minded (I'd like to believe it's just
the 8 of them remaining in cll, but I don't).

Tim Bradshaw

Nov 1, 2011, 3:33:12 AM
Anton Kovalenko <an...@sw4me.com> wrote:

> Your own suggestion to spit out sexps was perfectly sane (and it
> doesn't need Perl, which is a good sign). What's ridiculous here is not
> Perl, or Perl's $/, it's how people stick to a specific Perl feature
> ($/), even after it was shown to be a wrong tool in a number of ways
> (Kaz Kylheku noticed an additional danger of empty records).
>

That, of course, wasn't what I meant.

Frode V. Fjeld

Nov 1, 2011, 4:11:25 AM
Anton Kovalenko <an...@sw4me.com> writes:

> ("~a-~a" is incorrect for making symbol names from other symbol names,
> but widely used).

How is it incorrect?

--
Frode V. Fjeld

Anton Kovalenko

Nov 1, 2011, 4:43:35 AM
"Frode V. Fjeld" <fro...@gmail.com> writes:

>> ("~a-~a" is incorrect for making symbol names from other symbol names,
>> but widely used).
>
> How is it incorrect?

*PRINT-CASE* and readtable case may be different.

(let ((*print-case* :downcase))
(format nil "~a-~a" 'foo 'bar))

=> "foo-bar", while it should be "FOO-BAR" for standard,
upper-case-readtable Lisp.

(intern *)
=> |foo-bar|

Some argue that you shouldn't really expect third-party code to work in
any non-default case settings. This argument is legitimate for readtable
case, and for code that is /read/ from a third-party source files: if
your current read-table has a non-standard case setting, there's nothing
good to expect.

However, this argument doesn't apply to *print-case*, and it doesn't
apply to FORMATting that happens when I use a third-party
macro.

(setf *print-case* :downcase)
....
(defstruct foo)

If FORMAT was used in defstruct's expansion code, we'd get |MAKE-foo|
or |make-foo| for the constructor, yet it should be MAKE-FOO.

Setting *readtable* to my own one is perfectly legitimate when /my/ file
is read, even if it's non-standard to the point of non-lispyness. If
DEFSTRUCT-like macro generates incorrect symbol names depending on
readtable settings, that's a bug too. (This paragraph is not about "~a",
but about some other problematic solutions).

Some valid ways to build a new symbol name:

(apply #'concatenate 'string `(make - ,macro-caller-supplied-name))
(concatenate 'string "MAKE-" macro-caller-supplied-name)
....

Anton Kovalenko

Nov 1, 2011, 4:47:50 AM
Correction:

> (apply #'concatenate 'string `(make - ,macro-caller-supplied-name))

That would be
(apply #'concatenate 'string (mapcar 'string `(make - ,macro-caller-supplied-name))).


> (concatenate 'string "MAKE-" macro-caller-supplied-name)

That might be
(concatenate 'string "MAKE-" (string macro-caller-supplied-name)).

Tedious and error-prone enough to create a "symbolicate" helper
function (that's what is normally done).
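
Such a helper can be very small. The sketch below uses the hypothetical name MY-SYMBOLICATE and interns into the current package, which is an assumption; real code may want an explicit package argument.

```lisp
;; Hypothetical helper; the name and the use of the current package are assumptions.
(defun my-symbolicate (&rest things)
  "Intern a symbol whose name is the concatenation of the names of THINGS.
THINGS may be symbols, strings or characters; the printer is not involved."
  (values (intern (apply #'concatenate 'string (mapcar #'string things)))))

;; *PRINT-CASE* does not matter here, since nothing is printed:
(let ((*print-case* :downcase))
  (symbol-name (my-symbolicate 'make- 'foo)))
;; => "MAKE-FOO"  (with the standard upcasing reader)
```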

Lieven Marchand

Nov 1, 2011, 5:00:24 AM
Anton Kovalenko <an...@sw4me.com> writes:

> *PRINT-CASE* and readtable case may be different.

That's why you wrap these FORMATs in WITH-STANDARD-IO-SYNTAX.

Anton Kovalenko

Nov 1, 2011, 5:33:36 AM
Lieven Marchand <m...@wyrd.be> writes:

>> *PRINT-CASE* and readtable case may be different.
>
> That's why you wrap these FORMATs in WITH-STANDARD-IO-SYNTAX.

When I wrote that people use (format nil "~a-~a") to build symbol names,
I forgot to add that they /don't/ wrap these FORMATs in
WITH-STANDARD-IO-SYNTAX.

That would solve a problem, but I used to think that FORMAT is preferred
for its brevity. Let's compare

(concatenate 'string "MAKE-" (string symbol))
(with-standard-io-syntax (format nil "MAKE-~a" symbol))

Actually, (w-s-i-s (format..)) for building symbol names seems to be
less popular than an unwrapped format, /and/ less popular than
concatenate.

Barry Fishman

Nov 1, 2011, 11:53:59 AM
I think most of the less capable Perl programmers have moved on to other
languages like Ruby, PHP, and Python.

Perl does a good job of making simple programs even simpler. However,
as what you are trying to do becomes more than simple parsing, the
code starts to become a nightmare. It takes time to get used to Lisp,
and it takes more code to do simple things, but isn't coding about
making hard problems simpler? If all the original programmer wants to
do is make the data easier to read, or load it into a relational
database, Perl is probably just the thing.

I know that application specific languages are easy in Lisp but the
txr code seems longer than a brute force Lisp solution.

Assuming the records start with a *RECORD* line and end with the
next record or end of file, and that you don't need to hold all
the data in memory at one time, a solution might be something like:

(defun for-each-record (func &optional
                             (stream *standard-input*)
                             (delim "*RECORD*"))
  "Perform FUNC on each list of lines following a DELIM line"
  (let ((state :start))
    (flet ((read-record ()
             (loop as line = (read-line stream nil nil)
                   if (null line) do
                     (setf state :eof)
                   while (and line (string/= line delim))
                   collect line)))
      (read-record) ; Skip till first record
      (loop while (not (eq state :eof)) do
        (funcall func (read-record))))))

I'm sure someone better at Lisp can write something even simpler.
That wasn't too painful, was it?
--
Barry Fishman

Tim Bradshaw

Nov 1, 2011, 4:32:02 PM
Barry Fishman <barry_...@acm.org> wrote:

> Perl does a good job of making simple programs even simpler. However,
> as what you are trying to do becomes more than simple parsing, the
> code starts to become a nightmare.

I don't think it needs to become a nightmare, I have multi-thousand line
Perl programs which I don't find a significant pain to maintain, even after
years away from them. And I have seen some unspeakable horrors in Lisp.
Badly written code is badly written code, I think, in any language.

> It takes time to get used to Lisp,
> and it takes more code to do simple things, but isn't coding about
> making hard problems simpler? If all the original programmer wants to
> do is make the data easier to read, or load it into a relational
> database, Perl is probably just the thing.
>

That was, actually, my original point: use Perl (I'd probably really use
awk for the original task here) to do the data-massaging, then pump it into
your Lisp or what-have-you program which does the interesting bit. Use the
right tool, in other words.

Tim Bradshaw

Nov 1, 2011, 4:32:06 PM
Anton Kovalenko <an...@sw4me.com> wrote:

> (concatenate 'string "MAKE-" (string symbol))
> (with-standard-io-syntax (format nil "MAKE-~a" symbol))
>

And now think about what happens if you use Allegro in "modern" mode: which
of these is right now? (OK, "modern mode" is not a conforming CL, I think,
but you might want your code to work in it: I certainly spent a bunch of
time chasing down obscure bugs around this, long ago).

Dmitry Statyvka

Nov 1, 2011, 4:42:00 PM
>>>>> Anton Kovalenko writes:

[...]

AK> That would solve a problem, but I used to think that FORMAT is
AK> preferred for its brevity. Let's compare

AK> (concatenate 'string "MAKE-" (string symbol))
AK> (with-standard-io-syntax (format nil "MAKE-~a" symbol))

And (format nil "MAKE-~a" (string symbol))

:-)

[...]

--
Dmitry Statyvka

Anton Kovalenko

Nov 1, 2011, 5:47:04 PM
I normally think about modern mode when I write case-dependent code,
with the result of not breaking it /deliberately/ whenever I can. Doing
something special to support it is another idea, and I'm not too
enthusiastic about it. In real code I just use

(alexandria:symbolicate 'make- symbol) ; modern-sensitive prefix
(alexandria:symbolicate "TCP-" option) ; uppercase prefix.

And, of course, there is no format ~a in alexandria:symbolicate, but
rather an equivalent of CONCATENATE.

Now, let's compare modernity-compatible versions of my examples:

(concatenate 'string (string 'make-) (string symbol))
(with-standard-io-syntax (format nil "~a~a" 'make- symbol))

The one without FORMAT is still shorter. I believe they are both
correct, even though I'm not confident enough in my knowledge of how
with-standard-io-syntax interacts with case mode. CONCATENATE has
additional advantage of stating directly what I want, without
introducing "aesthetical" concepts.

There is another common idiom for format; something like this:

(with-standard-io-syntax ;; avoiding "straw format user" complaints
(format nil (string 'make-~a) name))

Making-NAMES-With-format. Just say no.

Tim Bradshaw

Nov 1, 2011, 7:01:56 PM
Anton Kovalenko <an...@sw4me.com> wrote:

> The one without FORMAT is still shorter. I believe they are both
> correct, even though I'm not confident enough in my knowledge of how
> with-standard-io-syntax interacts with case mode. CONCATENATE has
> additional advantage of stating directly what I want, without
> introducing "aesthetical" concepts.
>

My point wasn't that format is better - I don't think it is - but rather
just to reinforce that this is fiddly to get right, that lots of code gets
it wrong, and it's a bit silly for Lisp people to crow about how clever
they are compared to Perl people (not to imply you were doing such).

Kaz Kylheku

Nov 2, 2011, 1:21:59 PM
On 2011-11-01, Anton Kovalenko <an...@sw4me.com> wrote:
> "Frode V. Fjeld" <fro...@gmail.com> writes:
>
>>> ("~a-~a" is incorrect for making symbol names from other symbol names,
>>> but widely used).
>>
>> How is it incorrect?
>
> *PRINT-CASE* and readtable case may be different.
>
> (let ((*print-case* :downcase))
> (format nil "~a-~a" 'foo 'bar))

This is not a problem with ~a per se, but with what you're passing
to it.

(format nil "~a-~a" (symbol-name 'foo) (symbol-name 'bar))

Problem solved.
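
A quick side-by-side check of the two forms, assuming the standard readtable so that the symbols are named "FOO" and "BAR":

```lisp
;; With symbols, ~a goes through the printer and obeys *PRINT-CASE*;
;; with SYMBOL-NAME, FORMAT just copies the strings it is given.
(let ((*print-case* :downcase))
  (list (format nil "~a-~a" 'foo 'bar)
        (format nil "~a-~a" (symbol-name 'foo) (symbol-name 'bar))))
;; => ("foo-bar" "FOO-BAR")
```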