Jinsong Zhao <jsz...@yeah.net> writes:
> Hi there,
>
> I hope to find a demo of Common Lisp parsing a text file. Or, if
> possible, would you please show me how to parse the following
> text:
>
> ;;;; start here ;;;;
>
> NET ATOMIC CHARGES AND DIPOLE CONTRIBUTIONS
>
> ATOM NO. TYPE CHARGE ATOM ELECTRON DENSITY
> 1 C 0.0699 3.9301
> 2 O -0.3088 6.3088
> 3 H 0.0085 0.9915
> 4 H 0.0410 0.9590
> 5 H 0.0085 0.9915
> 6 H 0.1809 0.8191
> DIPOLE X Y Z TOTAL
> POINT-CHG. -0.697 0.295 0.542 0.932
> HYBRID -0.149 0.289 0.532 0.624
> SUM -0.847 0.585 1.074 1.488
>
> ;;;; end here ;;;;
>
> I hope to assign "TYPE" and "CHARGE" to two variables, e.g., type and
> charge.
It looks like you have fixed-size records, so you could read
https://groups.google.com/forum/#!original/comp.lang.lisp/S2aqG-UPhe8/MTIYx8VNArgJ
adding some code to detect what kind of record you're going to read.
On the other hand, if the data is formatted in a subset of the Lisp
syntax, you could also just read it using the Lisp reader (easily enough
if there's no optional data).
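As a minimal sketch of the "just use the Lisp reader" option, the helper below reads every token of one line with READ. The name READ-LINE-TOKENS is hypothetical, and the sketch assumes every field on the line is a valid Lisp token (integer, float or symbol). Binding *READ-EVAL* to NIL is a precaution for files you do not fully trust.

```lisp
;; Hypothetical helper: read all the tokens of one line with the Lisp reader.
(defun read-line-tokens (line)
  (let ((*read-eval* nil))                ; refuse #. forms in the input
    (with-input-from-string (stream line)
      ;; The stream object itself serves as a unique end-of-input marker.
      (loop :for token = (read stream nil stream)
            :until (eq token stream)
            :collect token))))

;; Example call on one of the atom lines:
(read-line-tokens "2 O -0.3088 6.3088")
```

The symbols are interned in the current package, and floats are read according to *READ-DEFAULT-FLOAT-FORMAT*.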
> I try to read the text file using loop. I can find the proper
> position. However I don't know how to continue to parse the following
> lines.
>
> (with-open-file (stream "file.txt")
> (do ((line (read-line stream nil)
> (read-line stream nil)))
> ((null line))
> (if (search "ATOM ELECTRON DENSITY" line) (;; lost here...
> ))))
>
> Any suggestions will be appreciated.
https://groups.google.com/forum/#!original/comp.lang.lisp/gw1t5lTvu1k/CXQ-UCxZtDwJ
What you need to do is to analyse the structure of your file and give a
description of it. For example, you could come up with this grammar:
file ::= header atom-table dipole-table trailer .
header ::= { empty-line } 'NET' 'ATOMIC' 'CHARGES' 'AND' 'DIPOLE' 'CONTRIBUTIONS' { empty-line } .
atom-table ::= atom-table-header { atom-table-line } .
atom-table-header ::= 'ATOM' 'NO.' 'TYPE' 'CHARGE' 'ATOM' 'ELECTRON' 'DENSITY' .
atom-table-line ::= atomic-number type charge atom-electron-density .
atomic-number ::= integer .
type ::= symbol .
charge ::= floating-point-number .
atom-electron-density ::= floating-point-number .
dipole-table ::= dipole-table-header { dipole-table-line } .
dipole-table-header ::= 'DIPOLE' 'X' 'Y' 'Z' 'TOTAL'.
dipole-table-line ::= dipole-title x y z total .
dipole-title ::= symbol .
x ::= floating-point-number .
y ::= floating-point-number .
z ::= floating-point-number .
total ::= floating-point-number .
Then you can write a parser for it.
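For instance, one terminal-only rule of the grammar above, atom-table-header, could be checked as sketched here. SPLIT-TOKENS and ATOM-TABLE-HEADER-P are hypothetical names, and the sketch assumes that tokens are separated by spaces or tabs only.

```lisp
;; Hypothetical tokenizer: split LINE on runs of spaces and tabs.
(defun split-tokens (line)
  (let ((tokens '())
        (start nil))                      ; start index of the current token
    (dotimes (i (length line))
      (let ((space-p (member (char line i) '(#\Space #\Tab))))
        (cond ((and space-p start)        ; a token just ended
               (push (subseq line start i) tokens)
               (setf start nil))
              ((and (not space-p) (null start)) ; a token just started
               (setf start i)))))
    (when start                           ; token running up to the end of line
      (push (subseq line start) tokens))
    (nreverse tokens)))

;; atom-table-header ::= 'ATOM' 'NO.' 'TYPE' 'CHARGE' 'ATOM' 'ELECTRON' 'DENSITY' .
(defun atom-table-header-p (line)
  (equal (split-tokens line)
         '("ATOM" "NO." "TYPE" "CHARGE" "ATOM" "ELECTRON" "DENSITY")))
```

The other header rules can be checked the same way; the table-line rules additionally need each token converted to a number or a symbol.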
But the important thing here is that you have a data structure that is
composed of a sequence of two different repetitions.
        /
        |  file-header
        |
        |  atom-header
        |
        |            /
        |            |
        |            |  atom-no
        |            |
        |  atom     <   type
        |            |
        |            |  charge
        |            |
        |            |  electron-density
file   <             \
        |
        |  dipole-header
        |
        |            /
        |            |
        |            |  title
        |            |
        |            |  x
        |  dipole   <
        |            |  y
        |            |
        |            |  z
        |            |
        |            |  total
        \            \
That means that the program to process this file will consist of a
sequence of two loops:
(progn
  (read-file-header)
  (read-atom-header)
  (loop
    :named read-atom-lines
    :do …)
  (read-dipole-header)
  (loop
    :named read-dipole-lines
    :do …))
So you're starting on the wrong foot, by writing a single outer loop:
the file structure is NOT a repetition of things, it's a sequence of
different things!
Actually, we can skip the read-dipole-header phase: the header is a
single line, and the read-atom-lines loop already consumes it, since it
is the first line that fails the atomic-charge test and so ends that
loop. (If we had to process it, we could have that loop return it or
save it for further processing.)
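If the terminating line did have to be processed, one way to have the loop hand it back is sketched below. COLLECT-LINES-WHILE is a hypothetical helper, not part of the program that follows; it returns the accepted lines as a first value and the line that ended the loop as a second value.

```lisp
;; Hypothetical helper: collect the lines that satisfy PREDICATE, and also
;; return the first line that does not (NIL when end of file is reached).
(defun collect-lines-while (predicate stream)
  (loop :for line = (read-line stream nil nil)
        :while (and line (funcall predicate line))
        :collect line :into accepted
        :finally (return (values accepted line))))
```

The caller can then bind both values with MULTIPLE-VALUE-BIND and check, for example, that the second one really is the dipole header.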
(defun read-atomic-charges-and-dipole-contributions-file (path)
  (let ((atomic-charges '())
        (dipole-contributions '()))
    (with-open-file (stream path)
      (read-file-header stream)
      (read-atom-header stream)
      (loop
        :named read-atom-lines
        :for line = (read-line stream nil nil)
        :while (atomic-charge-line-p line)
        :do (push (parse-atomic-charge-line line) atomic-charges))
      (loop
        :named read-dipole-lines
        :for line = (read-line stream nil nil)
        :while (dipole-contribution-p line)
        :do (push (parse-dipole-contribution-line line) dipole-contributions)))
    (list atomic-charges dipole-contributions)))
;; Here we just skip a fixed number of lines, without any check. You could
;; also parse them, cf. the BNF above.
(defun read-file-header (stream)
  (read-line stream nil)
  (read-line stream nil)
  (read-line stream nil))
(defun read-atom-header (stream)
  (read-line stream nil))
;; Similarly, the detection of the type of a line is primitive, but
;; sufficient if we assume the input file is always correct.
;; The (and line …) guard makes the predicate return NIL at end of file.
(defun atomic-charge-line-p (line)
  (and line
       (with-input-from-string (stream line)
         (integerp (ignore-errors (read stream))))))
;; For the parsing functions, we could build structures instead of
;; returning lists.
(defun parse-atomic-charge-line (line)
  (with-input-from-string (stream line)
    (list (read stream) (read stream) (read stream) (read stream))))
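As a sketch of the structure-based variant mentioned in the comment above: the structure ATOMIC-CHARGE, its slot names and the function PARSE-ATOMIC-CHARGE-LINE-INTO-STRUCT are all hypothetical names chosen for illustration (the slots follow the diagram above).

```lisp
;; Hypothetical record type for one line of the atom table.
(defstruct atomic-charge
  atom-no type charge electron-density)

;; Same parsing as PARSE-ATOMIC-CHARGE-LINE, but returning a structure.
;; Keyword arguments are evaluated left to right, so the fields are read
;; in the order in which they appear on the line.
(defun parse-atomic-charge-line-into-struct (line)
  (with-input-from-string (stream line)
    (make-atomic-charge :atom-no (read stream)
                        :type (read stream)
                        :charge (read stream)
                        :electron-density (read stream))))
```

With this, the TYPE and CHARGE the question asks about are available through the accessors ATOMIC-CHARGE-TYPE and ATOMIC-CHARGE-CHARGE.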
(defun dipole-contribution-p (line)
  (and line (< 1 (length (string-trim " " line)))))
;; Be careful that the syntax of floating point numbers is not
;; universal. Lisp has its own syntax, and it is different from the syntax
;; issued by Fortran or C programs! So READ may not be suitable: you may
;; have to write your own scanner for those data items.
(defun parse-dipole-contribution-line (line)
  (with-input-from-string (stream line)
    (list (read stream) (read stream) (read stream) (read stream) (read stream))))
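To illustrate the warning about float syntax: the Lisp reader takes a token such as "1." to be the integer 1, not a float. A hand-written scanner for plain decimal notation might look like the sketch below. PARSE-DECIMAL is a hypothetical name; the sketch handles only an optional sign, digits and an optional fractional part (no exponents), and does little validation beyond what PARSE-INTEGER signals.

```lisp
;; Hypothetical scanner for [sign] digits [ "." digits ], without using READ.
;; Always returns a double-float.
(defun parse-decimal (string)
  (let* ((text (string-trim " " string))
         (signed-p (and (plusp (length text))
                        (find (char text 0) "+-")))
         (negative-p (and signed-p (char= (char text 0) #\-)))
         (start (if signed-p 1 0))
         ;; Position of the decimal point, or end of text if there is none.
         (dot (or (position #\. text :start start) (length text)))
         (int-part (if (> dot start)
                       (parse-integer text :start start :end dot)
                       0))
         (frac-digits (max 0 (- (length text) dot 1)))
         (frac-part (if (plusp frac-digits)
                        (parse-integer text :start (1+ dot))
                        0))
         ;; Exact rational arithmetic; converted to a float only at the end.
         (magnitude (+ int-part (/ frac-part (expt 10 frac-digits)))))
    (coerce (if negative-p (- magnitude) magnitude) 'double-float)))
```

Such a function would be applied to substrings of the line (for example, tokens obtained by splitting on spaces) instead of calling READ on the numeric fields.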
(read-atomic-charges-and-dipole-contributions-file "/tmp/file.txt")
--> (((6 h 0.1809 0.8191)
      (5 h 0.0085 0.9915)
      (4 h 0.041 0.959)
      (3 h 0.0085 0.9915)
      (2 o -0.3088 6.3088)
      (1 c 0.0699 3.9301))
     ((sum -0.847 0.585 1.074 1.488)
      (hybrid -0.149 0.289 0.532 0.624)
      (point-chg. -0.697 0.295 0.542 0.932)))
--
__Pascal Bourguignon__
http://www.informatimago.com/
"Le mercure monte ? C'est le moment d'acheter !"