
reading tabular data


Zachary Kessin

Jul 16, 2002, 11:25:43 AM

OK, I am trying to write some software to crunch some data in Gambit
Scheme, and I have a pile of data that is tab-delimited. Is there any
easy way to just read it into a set of lists or vectors that does not
involve going character by character over a lot of floating-point
values? What would be great is some equivalent of the Perl split
function.

Thanks

--Zach

Barry Margolin

Jul 16, 2002, 12:13:28 PM
In article <r3k4rez...@daedalus.cs.brandeis.edu>,

Just call (read <port>) in a loop until you get the EOF object. The tabs and
newlines will be treated as delimiters.

--
Barry Margolin, bar...@genuity.net
Genuity, Woburn, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.

Benjamin Simon

Jul 16, 2002, 12:30:53 PM
>>>>> "ZK" == Zachary Kessin <zke...@cs.brandeis.edu> writes:

ZK> OK i am trying to write some software to crunch some data in Gambit
ZK> scheme and I have a pile of data that is tab delimited, is there
ZK> any easy way to just read it into a set of lists or vectors that
ZK> does not involve going character by character over a lot of
ZK> floating point variables? What would be great is some equivalent of
ZK> the perl split function.

This doesn't answer your question directly, but for some inspiration you
might want to read the following article. It may give you some ideas as
to how to process data in Scheme.

http://www.glug.org/docbits/sort-grep-tutorial.txt

-Ben

Scott G. Miller

Jul 16, 2002, 12:45:53 PM
Barry Margolin wrote:
> In article <r3k4rez...@daedalus.cs.brandeis.edu>,
> Zachary Kessin <zke...@cs.brandeis.edu> wrote:
>
>>OK i am trying to write some software to crunch some data in Gambit
>>scheme and I have a pile of data that is tab delimited, is there any
>>easy way to just read it into a set of lists or vectors that does not
>>involve going character by character over a lot of floating point
>>variables? What would be great is some equivalent of the perl split
>>function.
>
>
> Just call (read <port>) in a loop until you get the EOF object. The tabs and
> newlines will be treated as delimiters.
>

I believe that won't work, as I think he has several lines of data, each
of which should be treated as a list. The following should do the trick
if my assumptions were correct. Tweak as necessary.

Scott


;; Assumes that the data starts immediately, and the only whitespace
;; is a tab between each value, and a newline to separate each list
;; 3\t4\t5\n6\t7\n becomes '((3 4 5) (6 7))
;;
;; Requires SRFI-23, 8, and escaping continuations
(define split-on-tabs
  (letrec ((read-row
            (lambda (input-port)
              (call/cc
               (lambda (k)
                 (let loop ((acc '()))
                   (let* ((rv (read input-port))
                          (next-char (read-char input-port)))
                     (cond ((or (eqv? next-char #\newline)
                                (eof-object? next-char))
                            (k (reverse (cons rv acc)) next-char))
                           ((eqv? next-char #\tab) (loop (cons rv acc)))
                           (else
                            (error 'split-on-tab
                                   "Invalid data separator."))))))))))
    (lambda (input-port)
      (receive (row end-case) (read-row input-port)
        (if (eof-object? end-case)
            '()
            (cons row (split-on-tabs input-port)))))))

Bruce Lewis

Jul 16, 2002, 1:18:09 PM
Zachary Kessin <zke...@cs.brandeis.edu> writes:

> What would be great is some equivalent of the perl split function.

Yes, that's all you need. Gambit, like probably every Scheme
implementation, has read-line already.

See brl-split in gnu/brl/stringfun.scm in BRL. Not the most efficient
implementation, but it works.
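
For readers without BRL at hand, a minimal sketch of a Perl-like split on
a single delimiter character might look like the following. The names
split-on-char and line->numbers are hypothetical and are not taken from
BRL or from any library mentioned in this thread. It also assumes the
implementation accepts the #\tab character name, which R5RS does not
require.

```scheme
;; Hypothetical helper, not brl-split and not any library's string-split.
;; Splits STR at every occurrence of the character DELIM and returns the
;; fields as a list of strings. Empty fields are kept.
(define (split-on-char str delim)
  (let loop ((i (- (string-length str) 1))   ; scan from right to left
             (end (string-length str))       ; end index of current field
             (fields '()))
    (cond ((< i 0)
           ;; Ran off the front: the remaining prefix is the first field.
           (cons (substring str 0 end) fields))
          ((char=? (string-ref str i) delim)
           ;; Found a delimiter: the field is everything after it.
           (loop (- i 1) i (cons (substring str (+ i 1) end) fields)))
          (else
           (loop (- i 1) end fields)))))

;; Hypothetical convenience wrapper: one tab-delimited line to numbers.
;; string->number returns #f for a field that is not a valid number.
(define (line->numbers line)
  (map string->number (split-on-char line #\tab)))
```

Scanning from the right lets the result be built with cons and no final
reverse. Combined with a line reader, (line->numbers line) would give one
list per row.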

--
<brlewis@[(if (brl-related? message) ; Bruce R. Lewis
"users.sourceforge.net" ; http://brl.codesimply.net/
"alum.mit.edu")]>

ol...@pobox.com

Jul 16, 2002, 8:48:42 PM
Zachary Kessin <zke...@cs.brandeis.edu> wrote in message news:<r3k4rez...@daedalus.cs.brandeis.edu>...

If you're reading floating-point numbers, a simple read suffices:

(call-with-input-string
  "1.0\t2.0\t\t 3.0 4.0 12345678901234556789"
  (lambda (port)
    (let loop ((lst '()))
      (let ((item (read port)))
        (if (eof-object? item) (reverse lst)
            (loop (cons item lst)))))))
; => (1. 2. 3. 4. 12345678901234556789) on Gambit-C 3.0

> What would be great is some equivalent of the perl split function

http://pobox.com/~oleg/ftp/Scheme/util.html#string-split
which works on every Scheme I tried it on. It certainly works on
Gambit.


> Gambit, like probably every Scheme implementation, has read-line already.

It does not, but
http://pobox.com/~oleg/ftp/Scheme/parsing.html
does -- again, for any R5RS Scheme. BTW, if you need more complex
parsing, you might want to look into next-token or next-token-of.

Zachary Kessin

Jul 16, 2002, 11:26:09 PM
Bruce Lewis <brl...@yahoo.com> writes:

Thanks for all the good advice. What I ended up doing (at least for
now) is to simply output the data from the first program as a Scheme
s-expression; then it was easy to read it in, as Scheme will parse it.
I am looking at those libraries.

--Zach

Bruce Lewis

Jul 17, 2002, 9:10:08 AM
Zachary Kessin <zke...@cs.brandeis.edu> writes:

> Thanks for all the good advice, what I ended up doing (at least for
> now) is to simply output the data from the first program as if it was
> a scheme s-expression then it was easy to read it in, as scheme will
> parse it. I am looking at those libraries.

I had assumed the output format of the first program was outside your
control. Definitely the s-expression approach is the way to go.
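
A minimal sketch of that approach, assuming the first program writes the
whole data set as one list of rows. The procedure name read-rows and the
file name data.scm are hypothetical. Only R5RS procedures are used.

```scheme
;; Assumes the file named by FILENAME contains a single s-expression,
;; for example:  ((1.0 2.0 3.0) (4.0 5.0 6.0))
;; call-with-input-file opens the file and passes the port to read,
;; which parses and returns that one datum.
(define (read-rows filename)
  (call-with-input-file filename read))
```

With such a file in place, (read-rows "data.scm") would return the rows
as a list of lists, with no splitting code at all.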

David Rush

Jul 17, 2002, 4:34:05 AM
"Scott G. Miller" <scgm...@freenetproject.org> writes:
> Barry Margolin wrote:
> > In article <r3k4rez...@daedalus.cs.brandeis.edu>,
> > Zachary Kessin <zke...@cs.brandeis.edu> wrote:
> >
>
> >>OK i am trying to write some software to crunch some data in Gambit
> >>scheme and I have a pile of data that is tab delimited, is there any
> >>easy way to just read it into a set of lists or vectors that does not
> >>involve going character by character over a lot of floating point
> >>variables? What would be great is some equivalent of the perl split
> >> function.
>
> > Just call (read <port>) in a loop until you get the EOF object. The
> > tabs and newlines will be treated as delimiters.
>
> I believe that won't work, as I think he has several lines of data,
> each of which should be treated as a list. The following should do
> the trick if my assumptions were correct. Tweak as necessary.

I think your assumptions are correct, but my goodness...

> ;; Requires SRFI-23, 8, and escaping continuations
> (define split-on-tabs
> (letrec ((read-row
> (lambda (input-port)
> (call/cc

^^^^^^^^

It's not this hard, especially since he's using Gambit (which has
string-ports).

; assuming read-line, which is trivial to implement (since Gambit
; doesn't have it).

(define (read-tab-delimited port)
  ; slurp the tab-delimited lines from port into a list of lists
  (let get-lines ((line (read-line port)) (lines '()))
    (if (eof-object? line)
        (reverse lines)
        (let ((line-port (open-input-string line)))
          (let get-fields ((datum (read line-port)) (data '()))
            (if (eof-object? datum)
                (get-lines (read-line port)
                           (cons (reverse data) lines))
                (get-fields (read line-port) (cons datum data))))))))
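
For an implementation without read-line, a minimal sketch built on
read-char could look like this. The name my-read-line is hypothetical,
chosen so it does not clash with a built-in read-line where one exists.
It treats only #\newline as a line terminator, so a carriage return from
a DOS-style file would stay in the returned string.

```scheme
;; Hypothetical stand-in for read-line.
;; Returns the next line from PORT as a string without its newline,
;; or the EOF object when no characters remain.
(define (my-read-line port)
  (let ((first (read-char port)))
    (if (eof-object? first)
        first                               ; nothing left to read
        (let loop ((c first) (chars '()))
          (if (or (eof-object? c) (char=? c #\newline))
              (list->string (reverse chars)) ; line complete
              (loop (read-char port) (cons c chars)))))))
```

Substituting my-read-line for read-line in read-tab-delimited should be
enough on a system that has string ports.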

david rush
--
Scheme: Because closures are cool.
-- Anton van Straaten (the Scheme Marketing Dept from c.l.s)

Zachary Kessin

Jul 17, 2002, 12:53:24 PM
Bruce Lewis <brl...@yahoo.com> writes:

In this case it was easy enough to make it what I want, but I'm sure
sooner or later it won't be :). So the information was useful.

--Zach

Biep @ http://www.biep.org/

Jul 19, 2002, 6:58:00 AM
<ol...@pobox.com> wrote in message
news:7eb8ac3e.02071...@posting.google.com...
> http://pobox.com/~oleg/ftp/Scheme/util.html#string-split
> http://pobox.com/~oleg/ftp/Scheme/parsing.html

Since you point to your own website, I suppose this is not in SLIB. Any
reason why not? I suppose mentioning SLIB is useful given the discussions
going on about Scheme vs. CL libraries..

..which doesn't mean I don't highly appreciate your productivity, both
constructive (as in writing code) and destructive (as in uncovering
problems/ambiguities) regarding Scheme..

--
Biep
Reply via http://www.biep.org


Adrian Kubala

Jul 19, 2002, 8:35:34 AM

> <ol...@pobox.com> wrote in message
> news:7eb8ac3e.02071...@posting.google.com...
>> http://pobox.com/~oleg/ftp/Scheme/util.html#string-split
>> http://pobox.com/~oleg/ftp/Scheme/parsing.html
>
> Since you point to your own website, I suppose this is not in SLIB.
> Any reason why not? I suppose mentioning SLIB is useful given the
> discussions going on about Scheme vs. CL libraries..

I'm curious why more SRFIs aren't in slib. In particular 12, 13, and
26, which don't seem so widely supported but should just be a matter
of dropping in the reference code.

adrian

ol...@pobox.com

Jul 19, 2002, 8:39:49 PM
"Biep @ http://www.biep.org/" <repl...@my-web-site.com> wrote in message news:<ah8qv8$qq9gf$1...@ID-63952.news.dfncis.de>...

> <ol...@pobox.com> wrote in message
> news:7eb8ac3e.02071...@posting.google.com...
> > http://pobox.com/~oleg/ftp/Scheme/util.html#string-split
> > http://pobox.com/~oleg/ftp/Scheme/parsing.html
>
> Since you point to your own website, I suppose this is not in SLIB. Any
> reason why not?

A function find-string-from-port? is in SLIB. The other functions are
not. I asked the maintainer of SLIB a couple of times if he would be
interested in including the rest. I received no reply. I tried to push
string-split into SRFI-13, without success. I understand: different
people have different design and inclusion criteria. Mine is
minimalism: I'd like to be able to remember at least the names of the
library functions. Only when I observe that I keep writing roughly the
same code three or more times do I start considering it for
inclusion into the library.

Zachary Kessin

Jul 24, 2002, 7:11:34 PM
Bruce Lewis <brl...@yahoo.com> writes:

Actually, Gambit does not appear to have read-line. Or if it does, I can't
find it in the docs and the system does not appear to see it.

--Zach

Bruce Lewis

Jul 25, 2002, 9:05:32 AM
Zachary Kessin <zke...@cs.brandeis.edu> writes:

> Actually, Gambit does not appear to have read-line. Or if it does, I can't
> find it in the docs and the system does not appear to see it.

Sorry, my google search on "gambit read-line" led me to this:

http://www.iro.umontreal.ca/~feeley/cours/ift2030/doc/demo11.txt

(read-line port) ; extension propre a Gambit 4.0

I read it as read-line being a Gambit 4.0 extension to the Scheme
standard, but maybe it meant something else.
