
Splitting delimited lines


Robert L.

19 Sept 2017, 16:38:38
> research on chat and email within a system. The log files are character
> delimited, one message per line, but the last field on the line is
> "dirty" with unescaped delimiters. So a typical line might look
> something like this (not an actual line):
>
> userID^username^gender^site^messageDate^world^messagetext
> 2706^user^m^center^2004-03-01 09:21^chatWorld^Dirty text with ^caret.
>
> What I want is a generalizable function that allows me to do something
> like this:
>
> ;;----
> ;;;the field list
> (defparameter *field-list* '("userID"
> "username"
> "gender"


People are of the male sex or of the female sex.
Only words have gender.



> "site"
> "messageDate"
> "world"
> "messageText")
> "the field list for splitting and identifying fields")
>
> (setf record (split-line #\^ line *field-list*))
> (get-field "userID" record)
> (get-field "gender" record)
>
> ;;---------
>
> Because I'm dealing with different log file formats, I really want to be
> able to reference field values by name rather than remember that the
> message text is (nth 4 line) in one format, and (nth 6 line) in another
> format.
>
> ;;---------
> ;;This function splits the line into count number of fields
> ;;tacking on the last field as a possibly "dirty" remainder.
> (defun pythonic-split (split-char line count)
>   "split the line using split-char to produce a maximum of count fields"
>   (multiple-value-bind (result-list place)
>       ;;subtract one from count so that you can pass the total number
>       ;;of desired fields
>       (split-sequence:split-sequence split-char line :count (- count 1))



"split-sequence" is not part of standard CL; it is a separate library.
That code will not work under SBCL unless that library is loaded.



>     (append result-list `(,(subseq line place)))))
>
> (defun split-line (split-char line labels)
>   "split a line into an alist of length count with labels"
>   ;;a solution for matching fields to labels, use the length of the
>   ;;labels list to get the count.
>   (pairlis labels (pythonic-split split-char line (length labels))))
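
For comparison, the behaviour that the name pythonic-split alludes to is
Python's built-in str.split with a maxsplit argument. The following is
only a minimal sketch of the same idea: the names split_line and FIELDS
are made up for illustration, and a dict stands in for the alist.

```python
def split_line(sep, line, labels):
    """Split line into len(labels) fields and pair each with its label."""
    # Splitting at most len(labels) - 1 times leaves any stray
    # separators inside the final ("dirty") field.
    values = line.split(sep, len(labels) - 1)
    return dict(zip(labels, values))


# Hypothetical field list mirroring the one quoted above.
FIELDS = ["userID", "username", "gender", "site",
          "messageDate", "world", "messageText"]

record = split_line(
    "^",
    "2706^user^m^center^2004-03-01 09:21^chatWorld^Dirty text with ^caret.",
    FIELDS,
)

print(record["userID"])
print(record["messageText"])
```

Looking fields up by label means a different log format only needs a
different label list, not different positional indexes.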




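(require srfi/13) ; string-tokenize
(require srfi/14) ; char sets

;; Split line on sep-char into (length labels) fields and pair each
;; field with its label. Separators past the last label stay in the
;; final field.
;; Note: string-tokenize skips empty fields, so "a^^b" yields 2 tokens.
(define (split-line sep-char line labels)
  (define parts
    (string-tokenize line (char-set-complement (char-set sep-char))))
  (define n-clean (- (length labels) 1))
  (define fields
    (append (take parts n-clean)
            (list (string-join (drop parts n-clean) (string sep-char)))))
  (map cons labels fields))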


(split-line #\^
  "2706^user^m^center^2004-03-01 09:21^chatWorld^Dirty text with ^caret."
  '("userID" "username" "sex" "site" "messageDate" "world" "messageText"))

===>
(("userID" . "2706")
 ("username" . "user")
 ("sex" . "m")
 ("site" . "center")
 ("messageDate" . "2004-03-01 09:21")
 ("world" . "chatWorld")
 ("messageText" . "Dirty text with ^caret."))
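
One caveat worth checking before using a tokenizing approach on real
logs: collecting runs of non-separator characters drops empty fields,
so every field after an empty one shifts left by a position. The sketch
below illustrates the effect in Python, with re.findall as a stand-in
for tokenizing (an assumption for illustration, not the Racket code
itself) and a made-up input line.

```python
import re

# Hypothetical line whose third field is empty.
line = "2706^user^^center"

# Stand-in for tokenizing: keep maximal runs of non-caret characters.
tokens = re.findall(r"[^^]+", line)

# Plain splitting keeps the empty field in its position.
fields = line.split("^")

print(tokens)  # ['2706', 'user', 'center']
print(fields)  # ['2706', 'user', '', 'center']
```

If the logs can contain empty fields, a splitter that preserves them is
the safer choice for matching values to labels.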


In Forth?
