> which creates a comma quoted string and tokenizes it twice, once as an array and once as a series of five scalar variables. This outputs the following:
> ARRAY => "Mr. John J. Jones Jr."
> SCALARS => ("Mr., John, J., Jones, Jr.")
> How would you tokenize a comma quoted string in Lisp?
> TIA, CC
http://www.cliki.net/SPLIT-SEQUENCE
btw. sometimes Google is a better and faster way to find answers to trivial questions like this one; you just need Google, or maybe cliki.net.
I don't believe that someone who wants to learn basic things needs to ask on the group.
> * cartercc Wrote on Sun, 1 Feb 2009 20:09:57 -0800 (PST):
> > Sorry, guys, but I'm just a Lisp newbie. My main language is Perl and in my day job I do a lot of data transformation. In Perl, I can do this -
> > $string = '"Mr.","John","J.","Jones","Jr."';
> Typically one would use one of the many CSV file parsers (you can search CLL for these) in Lisp to handle such data. However, in this case the strings can be readily parsed by the Common Lisp reader, which can be trivially modified to handle the commas.
> This is perhaps a more advanced technique not suitable for the rank newbie, nevertheless....
> I assume the actual data comes from a file like this:
> The following code illustrates how you can copy a readtable ($COMMA-RT), change the syntax for the #\, character, and wrap up calls to CL:READ after binding CL:*READTABLE* so as to parse the file into a list of lists.
> If you are serious about learning, you should look up the specifications of each new function in the standard (ANS, Common Lisp HyperSpec) to see exactly what it is defined to do.
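The readtable technique described in the quoted post can be sketched roughly as follows. Only the name $COMMA-RT comes from the post; the function names and the choice to give the comma the syntax of a space are assumptions made for illustration, so treat this as one possible reading rather than the original code.

```lisp
;; Hypothetical reconstruction: only the name $COMMA-RT appears in the post.
(defvar $comma-rt
  (let ((rt (copy-readtable nil)))          ; NIL means "copy the standard readtable"
    (set-syntax-from-char #\, #\Space rt)   ; make the comma behave like whitespace
    rt))

(defun read-comma-line (line)
  "Read every object in LINE, treating commas as whitespace."
  (let ((*readtable* $comma-rt)
        (*read-eval* nil))                  ; never evaluate #. forms found in data
    (with-input-from-string (in line)
      (loop for object = (read in nil in)   ; the stream itself serves as EOF marker
            until (eq object in)
            collect object))))

(defun read-comma-file (path)
  "Parse the file at PATH into a list of lists, one list per line."
  (with-open-file (in path)
    (loop for line = (read-line in nil nil)
          while line
          collect (read-comma-line line))))
```

Because the Lisp reader does the work, quoted fields come back as strings and bare numbers come back as numbers. Unquoted words would be read as symbols, which may or may not be what you want.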
p IO.readlines("file.text").map{|line| line.gsub(/^"|"\n/,"").split /","/ }
On Feb 2, 12:15 am, Madhu <enom...@meer.net> wrote:
> I assume the actual data comes from a file like this:
You are absolutely correct! I have very little discretionary time, and am attempting to learn Lisp in about ten minutes a day, but if I can start using Lisp to do real work, then I have an excuse to use it during work time. At work my typical practice is to open an infile and an outfile, read each row from the infile, manipulate each row, and write it to the outfile. This is a typical Perl routine for doing this:

--------------typical Perl routine----------------
open INFILE, "<", "infile.txt";
open OUTFILE, ">", "outfile.txt";
while (<INFILE>)    #reads each row until EOF
{
    chomp;                  #removes the newline from the row
    $_ =~ s/^"//;           #removes the leading double quote char
    $_ =~ s/"$//;           #removes the trailing double quote char
    @row = split /","/;     #reads each datum into an array element
    #manipulate data as necessary to accomplish task
    #when finished with data transformation, then
    $row = '"' . join('","', @row) . '"';   #creates a comma quoted string from @row
    print OUTFILE $row, "\n";               #writes $row to OUTFILE with newline
}
close OUTFILE;
close INFILE;
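For comparison, the same read-a-row, change-it, write-a-row loop might look like this in Common Lisp. This is a minimal sketch: the name TRANSFORM-FILE and the idea of passing the per-line manipulation in as a function are assumptions for illustration, not something from the thread.

```lisp
(defun transform-file (in-path out-path transform)
  "Copy IN-PATH to OUT-PATH, passing every line through the function TRANSFORM."
  (with-open-file (in in-path :direction :input)
    (with-open-file (out out-path :direction :output
                                  :if-exists :supersede)
      ;; READ-LINE strips the newline (like chomp) and returns NIL at EOF.
      (loop for line = (read-line in nil nil)
            while line
            ;; WRITE-LINE puts the newline back.
            do (write-line (funcall transform line) out)))))
```

WITH-OPEN-FILE closes both files on the way out, even on error, so there is no counterpart to the explicit close calls. Something like (transform-file "infile.txt" "outfile.txt" #'string-upcase) would copy the file in upper case.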
My motive was to break this process down into bite sized chunks, and you hosed me! That's okay, but it's really too advanced for what I need now.
Here's the deal: I'm used to dealing with individual items of data, like ID numbers, email addresses, names, dates, all kinds of numbers and identifiers, etc. I assign these to variables so I can manipulate them in an automated fashion, as in a loop. Your solution does tokenize the input string, which is what I asked, but I also need to assign the individual elements to variables so that I can learn how to manipulate the elements. IOW, I don't need a list, but an array or a set of scalars that I can manipulate.
I guess that I should have asked how to tokenize a string AND ASSIGN EACH TOKEN TO ITS OWN VARIABLE. Anyway, when I have time, I'm going to try your solution and milanj's, and report back. It will probably be several days. In the meantime, thanks for your response.
On Mon, 02 Feb 2009 06:25:19 -0800, cartercc wrote:
> attempting to learn Lisp in about ten minutes a day, but if I can start
That ain't gonna work; you are wasting your time. There is a minimal fixed cost (in time) if you want to learn anything. If you don't have that, it is futile and you will just be frustrated.
On Feb 2, 3:25 pm, cartercc <carte...@gmail.com> wrote:
> during work time. At work my typical practice is to open an infile and an outfile, read each row from the infile, manipulate each row, and write it to the outfile. This is a typical Perl routine for doing this:
Another typical Perl technique is to have a look in CPAN to see if anyone's already created a more robust version of the wheel you're considering inventing :)
There's no real equivalent of CPAN for Lisp; Google and cliki.net come closest. Google turned up something called "csv-parser" that (what d'you know) includes a macro to iterate through CSV records and bind the fields to variables.
Also check out destructuring-bind:
Perl: ($name, $rank, $number) = @array ;
Lisp: (destructuring-bind (name rank number) list ... )
destructuring-bind can destructure extended lambda lists, which can be a lot more complex than simple lists.
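A minimal sketch of both uses, assuming the tokens have already been split into a list of strings (the function names and field layout here are made up for illustration):

```lisp
(defun describe-person (fields)
  "FIELDS is assumed to be a list of exactly three items."
  ;; Each list element gets its own variable, like ($name, $rank, $number) = @array.
  (destructuring-bind (name rank number) fields
    (format nil "~A, ~A, no. ~A" name rank number)))

(defun split-title (fields)
  "Return the first element of FIELDS and the list of everything after it."
  ;; Lambda-list keywords such as &REST and &OPTIONAL work here as well.
  (destructuring-bind (title &rest names) fields
    (values title names)))
```

Note that the first form signals an error if the list does not have exactly three elements, which is often what you want when the input format is fixed.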
> On Mon, 02 Feb 2009 06:25:19 -0800, cartercc wrote:
>> attempting to learn Lisp in about ten minutes a day, but if I can start
> That ain't gonna work, you are wasting your time. There is a minimal fixed cost (in time) if you want to learn anything, if you don't have that, it is futile and you will just be frustrated.
But I think 70 minutes once a week would be way better than ten minutes a day. From trying to do difficult things while commuting, I'd say that 20 minutes is the absolute minimum time unit. Below that, you waste most of the time trying to figure out where you were.
-- (espen)
On Feb 2, 9:39 am, Tamas K Papp <tkp...@gmail.com> wrote:
> On Mon, 02 Feb 2009 06:25:19 -0800, cartercc wrote:
> > attempting to learn Lisp in about ten minutes a day, but if I can start
> That ain't gonna work, you are wasting your time. There is a minimal fixed cost (in time) if you want to learn anything, if you don't have that, it is futile and you will just be frustrated.
The expression 'ten minutes' was not meant to be taken literally, but figuratively. I'm working my way through Wilensky, and my goal is to read a chunk of the text and WRITE(!) some Lisp code daily.
There's no way you can learn a skill without practice, whether it be cooking, golf, playing a musical instrument, or programming. Practicing (at least) ten minutes a day is much better than practicing zero minutes a day, and daily practice means you can build on yesterday's progress. It's obviously better to spend more time developing a skill than less time, but the way to do that is to reach a point where you are no longer practicing but performing, and the way to do that is to turn practice into performance.
On Feb 2, 2:25 pm, cartercc <carte...@gmail.com> wrote:
> I guess that I should have asked how to tokenize a string AND ASSIGN EACH TOKEN TO ITS OWN VARIABLE.
I assume you have installed cl-ppcre. It provides a Perl-like regular expression library for Common Lisp (CL).
Lines starting with semicolons (;) are comments in CL. I show the line in Perl as a comment first, then the similar CL line.
Unfortunately there is no built-in way in CL to avoid escaping the double quote characters in the string. However, usually we would read such a line from a file so this ugliness would not be apparent. Also, we could define a read macro to make this neater if we needed to do it often...
This is not an ideal solution to your larger problem, but it is a direct answer to your question which I hope you find useful.
I programmed in Perl for many years and have been learning Lisp these last few years. Perl was built (in part) to work easily with files and strings and regular expressions. If we think of the domain of text munging as a dirt track then Perl is like a BMX bicycle, designed to work well off road. Maybe Ruby is a smaller, flashier sort of bicycle that people make 'ooh' noises at as it zips past.
So because of the need to mandate cl-ppcre, and to qualify the ugliness of escaping lots of double quotes it might appear that Lisp is not as suitable as Perl for text munging. But that is only true for as long as it takes you to work out that Lisp is not yet another type of bicycle but rather a 50 year old man who has been making bicycles with his bare hands since before you were born.
It took me a long time to work this out and there is no short cut. But I think it is worth it.
Thanks, this was exactly what I needed to see. I understand that it's not idiomatic Lisp, but it does show me something about Lisp (much as an interlinear translation of Cicero isn't idiomatic English, but it does show how Latin relates to English).
Two things:
(1) Perl was created for exactly this kind of work while Lisp wasn't, and it makes sense that a tool made for a particular job will do that job better than a superior tool NOT made for a particular job. I like your comparison of data munging (which is exactly what I do) to a dirt track. Lisp may help me manufacture data mungers, which is one reason why I am learning it.
(2) I don't have cl-ppcre. If I have trouble getting it, I'll certainly ask for help.
cartercc <carte...@gmail.com> writes:
> I guess that I should have asked how to tokenize a string AND ASSIGN EACH TOKEN TO ITS OWN VARIABLE. Anyway, when I have time, I'm going to try your solution and milanj's, and report back. It will probably be several days. In the meantime, thanks for your response.
Well, unless there is a fixed format, or else a very limited number of potential tokens in the string, you really don't want to assign each token to its own variable.
I would expect that having a list or vector would be a lot more useful: a list if you expect to process the items serially, and a vector if you want to have random or indexed access to the elements.
It certainly seems a lot clearer to me to have vector (array) accessors for indexed items rather than variables named arg1, arg2, arg3, arg4, etc. Variables would only make sense if you had fixed meanings for the positions in the vector. Even then, it might make more sense to still use a vector and write your own accessor functions or macros:
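As a minimal sketch of such accessors: the field positions (title, first name, middle initial, last name, suffix) and the PERSON- names below are hypothetical, chosen only to match the sample record from the original question.

```lisp
;; Hypothetical layout of one record: #(title first middle last suffix)
(defun person-title (person) (aref person 0))
(defun person-first-name (person) (aref person 1))
(defun person-last-name (person) (aref person 3))

;; Defining a SETF function makes the accessor writable as well.
(defun (setf person-last-name) (new-value person)
  (setf (aref person 3) new-value))
```

With these in place, code reads (person-last-name row) instead of (aref row 3), and the index lives in exactly one place if the file layout ever changes.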
Also look at the following CL functions, which you *do* already have:
POSITION SEARCH MISMATCH PARSE-INTEGER SUBSEQ REPLACE CONCATENATE
To get the most from these, you will need to read & understand the sections in the CLHS about "bounding index designators" and the :START and :END [and sometimes :START2 and :END2] keyword arguments which nearly all sequence functions take. Also learn about the :KEY and :TEST keyword arguments, again, which nearly all sequence functions take. [Oh, and :FROM-END, too.]
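To show how far the built-in sequence functions go, here is a minimal tokenizer sketch using only SEARCH and SUBSEQ with the :START2 keyword. The names SPLIT-ON and TOKENIZE-QUOTED are made up for illustration, and the second function assumes every line both starts and ends with a double quote.

```lisp
(defun split-on (delimiter string)
  "Split STRING at each occurrence of the non-empty substring DELIMITER."
  (assert (plusp (length delimiter)))       ; an empty delimiter would loop forever
  (loop with step = (length delimiter)
        for start = 0 then (+ end step)     ; resume just past the last delimiter
        for end = (search delimiter string :start2 start)
        collect (subseq string start end)   ; END of NIL means "to the end"
        while end))

(defun tokenize-quoted (line)
  "Turn a comma quoted line such as \"a\",\"b\" into a list of strings.
Assumes LINE begins and ends with a double quote character."
  ;; Drop the outer quotes, then split on the three characters \",\".
  (split-on "\",\"" (subseq line 1 (1- (length line)))))
```

This mirrors the Perl idiom of stripping the first and last quote and then splitting on `","`. Like that idiom, it does not handle escaped quotes inside a field.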
Additional hints:
- Coming from a C or Perl world, you may find the following bits of syntactic sugar helpful:
  (defun join (delimiter &rest strings)
    (apply #'concatenate 'string
           (if (zerop (length delimiter))   ; If explicit "" or NIL.
               strings                      ; do short-circuit optimization.
               (loop for s on strings       ; Long way.
                     collect (car s)
                     when (cdr s)
                       collect delimiter))))
- MISMATCH is one of the more underappreciated string-bashing functions in CL, since it actually tells you how much *was* matched. ;-} Very useful [especially with the :START2/:END2 options] to tell whether a (possibly-abbreviated) fixed substring exists at some specific location in a string, *without* having to do a SUBSEQ first to extract the portion to be tested. [Avoids unnecessary consing.]
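A minimal sketch of both uses of MISMATCH mentioned above. The helper names MATCHED-LENGTH and PREFIX-AT-P are made up for illustration and are not part of CL.

```lisp
(defun matched-length (candidate target)
  "Number of leading characters that CANDIDATE shares with TARGET."
  ;; MISMATCH returns NIL when the sequences are identical, otherwise the
  ;; index in the first sequence where they stop agreeing.
  (or (mismatch candidate target) (length candidate)))

(defun prefix-at-p (prefix string &key (start 0))
  "True if PREFIX occurs in STRING beginning at index START."
  (let ((end (+ start (length prefix))))
    (and (<= end (length string))
         ;; :START2/:END2 restrict the comparison to a window of STRING,
         ;; so no SUBSEQ copy is needed.
         (not (mismatch prefix string :start2 start :end2 end)))))
```

MATCHED-LENGTH is what makes abbreviation checks easy: if it equals the length of the candidate, the candidate is a prefix of the target.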
-Rob
-----
Rob Warnock              <r...@rpw3.org>
627 26th Avenue          <URL:http://rpw3.org/>
San Mateo, CA 94403      (650)572-2607
> http://www.cliki.net/SPLIT-SEQUENCE
> btw. sometimes Google is a better and faster way to find answers to trivial questions like this one; you just need Google, or maybe cliki.net.
> I don't believe that someone who wants to learn basic things needs to ask on the group.
Sorry but I can't agree. It is precisely answers to questions like this, and the Lisp code examples that seem so difficult to find, that help to sell the Lisp idea to newbies. Reading code is a vital part of learning a language, and this group is good at dishing out useful snippets.
A Google search for Lisp code repositories is, frankly, useless. The links that appear are 15-20 years old, and don't exist anymore. So keep up the good work guys. Answers to 'trivial' questions like this are what will help bring new blood into the Lisp fold. Being sniffy and saying it's beneath this group is just silly.
If there is a positive value to be got from the replies, then the OP was correct.
On 2009-02-05, webmasterATflymagnetic.com <webmas...@flymagnetic.com> wrote:
> On Feb 2, 4:44 am, "mil...@gmail.com" <mil...@gmail.com> wrote:
>> On Feb 2, 5:09 am, cartercc <carte...@gmail.com> wrote:
>> > How would you tokenize a comma quoted string in Lisp?
> Sorry but I can't agree. It is precisely answers to questions like this, and the Lisp code examples that seem so difficult to find, that help to sell the Lisp idea to newbies. Reading code is a vital part of learning a language, and this group is good at dishing out useful snippets.
I agree with you but for a different reason: look at some books on Lisp and you'll find examples of parsing strings. This is a basic part of ELIZA (only one classic Lisp program). There is also a great example of this in doctor.el that comes with GNU Emacs. The reason it may not be posted on anybody's site is that people who've learned Lisp pedagogically (from books or by direct instruction) think this is really basic.
Here's one way to do it (and yes, "there's more than one way to do it"):

  (defun get-file-as-strings (file)
    "Collect the lines of a file as strings."
    (with-open-file (infile file)
      (loop for line = (read-line infile nil nil)
            while line
            collect line)))

  (defun string-to-read-list (strng &key (comment-char #\;))
    "Take a string and read it as a list"
    (if (with-input-from-string (st strng)
          ;; NIL at end of input, so a blank line reads as the empty list
          ;; instead of signaling an end-of-file error.
          (eq (peek-char t st nil nil) comment-char))
        nil
        (read-from-string (concatenate 'string "(" strng ")"))))
Now your string is a list of tokens, but a slight modification would tokenize it into individual strings. From there on you can use `read' to deal with the tokens.
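One possible reading of that "slight modification", sketched under a loud assumption: the standard reader does not accept a bare comma outside a backquote, so this version first replaces every comma with a space, which means a comma inside a quoted field would be lost. The name LINE-TO-STRINGS is made up for illustration.

```lisp
(defun line-to-strings (line)
  "Read the comma-separated tokens in LINE and return each one as a string.
Assumes no field contains a comma of its own."
  (let ((*read-eval* nil))                  ; don't let data run code via #.
    (mapcar #'princ-to-string               ; strings stay as is, numbers get printed
            (read-from-string
             (concatenate 'string
                          "("
                          (substitute #\Space #\, line)
                          ")")))))
```

For anything beyond quick experiments, one of the CSV parsers mentioned earlier in the thread is likely the safer choice.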
@webmaster: Now, as to my agreement: it's examples like this that should show beginners that Lisp deals with things fundamentally different from other languages. It's taken me a long time to get used to it. Perl is already set up for text-processing: you can make Lisp do that, as much as you can make it do anything else, but you are the one who decides how to do it. What "newbies" need to understand is "there is no spoon." As Paul Graham has said, Lisp is not so much a programming language as it is an algorithmic abstraction. "Quick-and-dirty" is not a good way to learn.
@OP: Read On Lisp, Practical Common Lisp, Paradigms of Artificial Intelligence Programming and keep a copy of CLTL2 on hand, and after a while things will come to you. Spending a little time with Scheme might also be a good idea.
Joel
--
Joel J. Adamson -- http://www.unc.edu/~adamsonj
University of North Carolina at Chapel Hill
CB #3280, Coker Hall
Chapel Hill, NC 27599-3280