Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

XML parser as reader macro?

26 views
Skip to first unread message

Josef Wolf

unread,
Sep 30, 2010, 2:56:50 PM9/30/10
to
Hello,

I am currently reading the book "Let over Lambda" by Doug Hoyte. In this
book, the author mentions an XML parser implemented as reader macros.
Out of curiosity, I'd like to take a look at this parser.

Unfortunately, the "References" section is somewhat strange, and I am
having troubles to find the parser. Google wasn't helpful, to.

Does anybody have a pointer to this parser?

Zach Beane

unread,
Sep 30, 2010, 4:27:34 PM9/30/10
to
Josef Wolf <j...@raven.inka.de> writes:

Perhaps it was a reference to http://www.agentsheets.com/lisp/XMLisp/ ?

Zach

Pascal J. Bourguignon

unread,
Sep 30, 2010, 5:16:34 PM9/30/10
to
Josef Wolf <j...@raven.inka.de> writes:

I don't but here is a quick and dirty proof-of-concept. A lot more of
work is needed to process the exact XML syntax from the lisp reader (eg.
you'll probably want to read the text elements as strings). Here I
demonstrate how you can just read lisp structures with a different
syntax:

(defstruct close-tag
tag)
(defstruct xml-node
tag attributes elements)

(defun eat-spaces (stream)
(peek-char t stream nil nil t))

(defun read-attributes (stream)
(eat-spaces stream)
(loop
:until (char= #\> (peek-char t stream t nil t))
:collect (let ((name (read stream t nil t))
(equa (read stream t nil t))
(valu (read stream t nil t)))
(assert (eql '= equa))
(list name valu))))

(defun lt-reader-macro (stream ch)
(if (char= #\/ (peek-char t stream t nil t))
(progn
(read-char stream t nil t)
(let ((tag (read stream t nil t)))
(assert (char= #\> (eat-spaces stream)))
(read-char stream t nil t)
(make-close-tag :tag tag)))
(let ((tag (read stream t nil t))
(attr (read-attributes stream)))
(assert (char= #\> (read-char stream t nil t)))
(make-xml-node :tag tag
:attributes attr
:elements (loop
:for item :|=| (read stream t nil t)
:until (and (close-tag-p item) (eql tag (close-tag-tag item)))
:when (close-tag-p item) :do (error "stray close tag </~A> in <~A>." (close-tag-tag item) tag)
:collect item)))))


(defun gt-reader-macro (stream ch)
'>)

(set-macro-character #\< (lambda (stream ch) (lt-reader-macro stream ch)) nil)
(set-macro-character #\> (lambda (stream ch) (gt-reader-macro stream ch)) nil)


C/USER[106]> (read-from-string "<book title = \"Structure and Interpretation of Computer Programs\" code = \"SICP\">Is a good book</book>")
--> #S(XML-NODE
:TAG BOOK
:ATTRIBUTES ((TITLE "Structure and Interpretation of Computer Programs") (CODE "SICP"))
:ELEMENTS (IS A GOOD BOOK))


C/USER[107]> <book title = "Structure and Interpretation of Computer Programs" code = "SICP">Is a good book</book>
--> #S(XML-NODE
:TAG BOOK
:ATTRIBUTES ((TITLE "Structure and Interpretation of Computer Programs") (CODE "SICP"))
:ELEMENTS (IS A GOOD BOOK))


C/USER[108]> <book title = "Structure and Interpretation of Computer Programs" code = "SICP">
<chapter number = "1">Building Abstractions with Procedures</chapter>
<chapter number = "2">Building Abstractions with Data</chapter>
<chapter number = "3">Modularity Objects and State</chapter>
<chapter number = "4">Metalinguistic Abstraction</chapter>
<chapter number = "5">Computing with Register Machines</chapter>
</book>
#S(XML-NODE :TAG BOOK
:ATTRIBUTES
((TITLE "Structure and Interpretation of Computer Programs") (CODE "SICP"))
:ELEMENTS
(#S(XML-NODE :TAG CHAPTER :ATTRIBUTES ((NUMBER "1"))
:ELEMENTS (BUILDING ABSTRACTIONS WITH PROCEDURES))
#S(XML-NODE :TAG CHAPTER :ATTRIBUTES ((NUMBER "2"))
:ELEMENTS (BUILDING ABSTRACTIONS WITH DATA))
#S(XML-NODE :TAG CHAPTER :ATTRIBUTES ((NUMBER "3"))
:ELEMENTS (MODULARITY OBJECTS AND STATE))
#S(XML-NODE :TAG CHAPTER :ATTRIBUTES ((NUMBER "4"))
:ELEMENTS (METALINGUISTIC ABSTRACTION))
#S(XML-NODE :TAG CHAPTER :ATTRIBUTES ((NUMBER "5"))
:ELEMENTS (COMPUTING WITH REGISTER MACHINES))))
C/USER[109]>

--
__Pascal Bourguignon__ http://www.informatimago.com/

Alessio Stalla

unread,
Sep 30, 2010, 6:40:20 PM9/30/10
to

I haven't read Let over Lambda, but I think it's referring to XMLisp
(http://www.agentsheets.com/lisp/XMLisp/).

Josef Wolf

unread,
Oct 1, 2010, 1:59:58 PM10/1/10
to
On 2010-09-30, Pascal J. Bourguignon <p...@informatimago.com> wrote:

> Josef Wolf <j...@raven.inka.de> writes:
>> I am currently reading the book "Let over Lambda" by Doug Hoyte. In this
>> book, the author mentions an XML parser implemented as reader macros.
>> Out of curiosity, I'd like to take a look at this parser.
>
> I don't but here is a quick and dirty proof-of-concept. A lot more of
> work is needed to process the exact XML syntax from the lisp reader (eg.
> you'll probably want to read the text elements as strings).

Well, I'm not really going to parse XML. Was just curious about how such a
parser could look like.

> Here I
> demonstrate how you can just read lisp structures with a different
> syntax:

Ugh! Really impressing, that! Took a while for me to grasp how the recursion
is constructed.

So here you exploit the fact that XML structure always starts with a '<'
character. How would you go when the first token can not be distinguished
from normal symbols? For example, given a syntax like that of the PCB
project (http://pcb.gpleda.org/):

PCB["some-name" 270000 250000]
Grid[1000.000000 0 0 1]
Flags("rubberband,clearnew,snappin")
Groups("1,c:2,s:3:4:5:6:7:8")
Styles["Signal,1000,3600,2000,1000:Power,2500,6000,3500,1000"]
Symbol('!' 12)
(
SymbolLine(0 45 0 50 8)
SymbolLine(0 10 0 35 8)
)
Element["" "R025" "R102" "10k" 40000 160000 14000 -3000 0 100 ""]
(
Pin[0 0 6800 3000 7400 3800 "1" "1" "square,edge2"]
ElementLine [10000 -5000 30000 -5000 2500]
)
NetList()
(
Net("GND" "(unknown)")
(
Connect("CONN201-2")
Connect("U101-3")
)
)

So this syntax is pretty close to s-expressions. If only there would be no
mix of squares/parens and the introducing symbol would be moved behind the
paren to form a proper s-expression. How would you define the macro
character in such a case?

Pascal J. Bourguignon

unread,
Oct 1, 2010, 4:14:42 PM10/1/10
to
Josef Wolf <j...@raven.inka.de> writes:

> On 2010-09-30, Pascal J. Bourguignon <p...@informatimago.com> wrote:
>> Josef Wolf <j...@raven.inka.de> writes:
>>> I am currently reading the book "Let over Lambda" by Doug Hoyte. In this
>>> book, the author mentions an XML parser implemented as reader macros.
>>> Out of curiosity, I'd like to take a look at this parser.
>>
>> I don't but here is a quick and dirty proof-of-concept. A lot more of
>> work is needed to process the exact XML syntax from the lisp reader (eg.
>> you'll probably want to read the text elements as strings).
>
> Well, I'm not really going to parse XML. Was just curious about how such a
> parser could look like.
>
>> Here I
>> demonstrate how you can just read lisp structures with a different
>> syntax:
>
> Ugh! Really impressing, that! Took a while for me to grasp how the recursion
> is constructed.
>
> So here you exploit the fact that XML structure always starts with a '<'
> character. How would you go when the first token can not be distinguished
> from normal symbols?

The point of XMLLisp and my example, is to allow the mix of lisp and XML
code. XML elements are taken as lisp data. I don't know if XMLLisp
does that, but we could extend my proof-of-concept to read XML syntax as
backquote/quote so that we could embed lisp code in the XML and loop the
loop.

> For example, given a syntax like that of the PCB
> project (http://pcb.gpleda.org/):

In this case, it could be better to just have a separate parser. You
could read that code and parse it into a sexp if you will. Then you
could wrap it in a simple reader macro to be able to read it as a
here-string.

Well you would have to define a macro character for each character, that
basically would enter the character into a full fleshed scanner and
parser. This scanner/parser would have to accept both that syntax and
lisp syntax. In general, you may have some incompatible languages.

As I said above, if you need to read that in Lisp, the easiest thing is
to write a stand alone scanner/parser, and then you can write a
here-string reader macro. For example:

(set-macro-character #\{ (lambda (stream ch)

(prog1 `(progn ,@(parse-pcb-expressions stream))
(assert (char= #\} (peek-char t stream))
(read-char stream))))

parse-pcb-expressions would read the stream and parse the above syntax
until it finds an end-of-file or a #\} character.

Rob Warnock

unread,
Oct 2, 2010, 6:38:57 AM10/2/10
to
Josef Wolf <j...@raven.inka.de> wrote:
+---------------

| How would you go when the first token can not be distinguished from
| normal symbols? For example, given a syntax like that of the PCB
| project (http://pcb.gpleda.org/):
|
| PCB["some-name" 270000 250000]
| Grid[1000.000000 0 0 1]
| Flags("rubberband,clearnew,snappin")
...
+---------------

I would look for a converter from "PCB" format to "EDIF" format
("Electronic Design Interchange Format"), which industry-standard
tools usually produce. EDIF files *are* sexprs, which can be read
with the CL reader [after setting the readtable case to either
:PRESERVE or (my preference) :INVERT]:

(edif MyProjectName
(edifVersion 2 0 0)
(edifLevel 0)
(keywordMap (keywordLevel 0))
(status
(written
(timeStamp 2005 05 23 23 05 04)
(program "CAPTURE.EXE" (Version "10.3.0.p001"))
(comment "Original data from OrCAD/CAPTURE schematic"))
...))


-Rob

-----
Rob Warnock <rp...@rpw3.org>
627 26th Avenue <URL:http://rpw3.org/>
San Mateo, CA 94403 (650)572-2607

Josef Wolf

unread,
Oct 2, 2010, 6:52:00 AM10/2/10
to
On 2010-10-01, Pascal J. Bourguignon <p...@informatimago.com> wrote:
> Josef Wolf <j...@raven.inka.de> writes:
>> So here you exploit the fact that XML structure always starts with a '<'
>> character. How would you go when the first token can not be distinguished
>> from normal symbols?
>
> The point of XMLLisp [ ... ]

To be honest, I have not taken a deep look into XMLisp yet. XMLisp seems to
be a big package, at least too big for me (yet). I think I should try to
master CL _basics_ before I dive into CLOS.

In contrast, your example was a perfect fit: it was small enough that I could
understand it within about ten minutes. And it showed me concepts which were
new to me.

> and my example, is to allow the mix of lisp and XML
> code. XML elements are taken as lisp data. I don't know if XMLLisp
> does that,

AFAICS, XMLisp reads XML into a CLOS object-tree and provides functions to
write this tree back into XML easily.

> but we could extend my proof-of-concept to read XML syntax as
> backquote/quote so that we could embed lisp code in the XML and loop the
> loop.

Interesting idea.

>> For example, given a syntax like that of the PCB
>> project (http://pcb.gpleda.org/):
>
> In this case, it could be better to just have a separate parser. You
> could read that code and parse it into a sexp if you will. Then you
> could wrap it in a simple reader macro to be able to read it as a
> here-string.

Since the syntax is _almost_ sexp, I thought there might be another option.

In fact, the syntax _is_ sexp. Just pretend that there is an additional
set of parens around the whole file and you have an assoc which can hold
more assoc's, recursively. When looking this way at the problem, only
squares and sharp-sign need special handling. Syntax for squares is identical
to the paren syntax, they only signal that dimensions are specified in
higher resolution.

I've written a parser (in Ruby) for this format a year ago. Took about 200
lines.

BTW: While we're at reader macros: How hard is it to eliminate the
traditional unix comment (sharp-sign till the end of the line)

Pascal J. Bourguignon

unread,
Oct 2, 2010, 9:47:53 AM10/2/10
to
Josef Wolf <j...@raven.inka.de> writes:

>>> For example, given a syntax like that of the PCB
>>> project (http://pcb.gpleda.org/):
>>
>> In this case, it could be better to just have a separate parser. You
>> could read that code and parse it into a sexp if you will. Then you
>> could wrap it in a simple reader macro to be able to read it as a
>> here-string.
>
> Since the syntax is _almost_ sexp, I thought there might be another option.
>
> In fact, the syntax _is_ sexp. Just pretend that there is an additional
> set of parens around the whole file and you have an assoc which can hold
> more assoc's, recursively. When looking this way at the problem, only
> squares and sharp-sign need special handling. Syntax for squares is identical
> to the paren syntax, they only signal that dimensions are specified in
> higher resolution.
>
> I've written a parser (in Ruby) for this format a year ago. Took about 200
> lines.
>
> BTW: While we're at reader macros: How hard is it to eliminate the
> traditional unix comment (sharp-sign till the end of the line)

You can consider almost any syntax as a sequence of token, which you can
read in a sexp an further parse.

But what I meant is that you have to be careful with syntaxes
ie. grammars, because some grammars that appear to generate similar
languages may actually be quite different. Trying to parse a language
with the wrong grammar may lead to very hard if not impossible problems.

In the simple cases, the fact fact parentheses are not placed the same
way requires a transformation phase that is not trivial in general. In
practice you will have to implement the other language pasing over your
sexp.

For example, in the case of PCB, it looks like some "operators" take a
single list of arguments, while some others take two.
You can trivially get it under the form of a sexp such as:

(PCB ("some-name" 270000 250000) GRID (1000.0 0 0 1) FLAGS
("rubberband,clearnew,snappin") GROUPS ("1,c:2,s:3:4:5:6:7:8") STYLES
("Signal,1000,3600,2000,1000:Power,2500,6000,3500,1000") SYMBOL ('! '12)
(SYMBOLLINE (0 45 0 50 8) SYMBOLLINE (0 10 0 35 8)) ELEMENT
("" "R025" "R102" "10k" 40000 160000 14000 -3000 0 100 "")
(PIN (0 0 6800 3000 7400 3800 "1" "1" "square,edge2") ELEMENTLINE
(10000 -5000 30000 -5000 2500))
NETLIST NIL
(NET ("GND" "(unknown)") (CONNECT ("CONN201-2") CONNECT ("U101-3"))))

But then you have to parse that to put the operators inside their lists,
perhaps something like:

((PCB "some-name" 270000 250000)
(GRID 1000.0 0 0 1)
(FLAGS "rubberband,clearnew,snappin")
(GROUPS "1,c:2,s:3:4:5:6:7:8")
(STYLES "Signal,1000,3600,2000,1000:Power,2500,6000,3500,1000")
((SYMBOL ('! '12))
(SYMBOLLINE 0 45 0 50 8)
(SYMBOLLINE 0 10 0 35 8))
((ELEMENT ("" "R025" "R102" "10k" 40000 160000 14000 -3000 0 100 ""))
(PIN 0 0 6800 3000 7400 3800 "1" "1" "square,edge2")
(ELEMENTLINE 10000 -5000 30000 -5000 2500))
((NETLIST NIL)
((NET ("GND" "(unknown)"))
(CONNECT "CONN201-2")
(CONNECT "U101-3"))))

Pascal J. Bourguignon

unread,
Oct 2, 2010, 9:48:59 AM10/2/10
to
p...@informatimago.com (Pascal J. Bourguignon) writes:

> But then you have to parse that to put the operators inside their lists,
> perhaps something like:
>
> ((PCB "some-name" 270000 250000)
> (GRID 1000.0 0 0 1)
> (FLAGS "rubberband,clearnew,snappin")
> (GROUPS "1,c:2,s:3:4:5:6:7:8")
> (STYLES "Signal,1000,3600,2000,1000:Power,2500,6000,3500,1000")
> ((SYMBOL ('! '12))
> (SYMBOLLINE 0 45 0 50 8)
> (SYMBOLLINE 0 10 0 35 8))
> ((ELEMENT ("" "R025" "R102" "10k" 40000 160000 14000 -3000 0 100 ""))
> (PIN 0 0 6800 3000 7400 3800 "1" "1" "square,edge2")
> (ELEMENTLINE 10000 -5000 30000 -5000 2500))
> ((NETLIST NIL)
> ((NET ("GND" "(unknown)"))
> (CONNECT "CONN201-2")
> (CONNECT "U101-3"))))

with:

(defun parse-pcb (src)
(if (atom src)
src
(loop
:with result = '()
:while src
:do (let ((op (pop src)))
(case op
((symbol element netlist net)
(push (cons (list op (pop src)) (parse-pcb (pop src))) result))
(otherwise
(if (symbolp op)
(push (cons op (parse-pcb (pop src))) result)
(push op result)))))
:finally (return (reverse result)))))


(parse-pcb '(PCB ("some-name" 270000 250000) GRID (1000.0 0 0 1) FLAGS


("rubberband,clearnew,snappin") GROUPS ("1,c:2,s:3:4:5:6:7:8") STYLES
("Signal,1000,3600,2000,1000:Power,2500,6000,3500,1000") SYMBOL ('! '12)
(SYMBOLLINE (0 45 0 50 8) SYMBOLLINE (0 10 0 35 8)) ELEMENT
("" "R025" "R102" "10k" 40000 160000 14000 -3000 0 100 "")
(PIN (0 0 6800 3000 7400 3800 "1" "1" "square,edge2") ELEMENTLINE
(10000 -5000 30000 -5000 2500))
NETLIST NIL

(NET ("GND" "(unknown)") (CONNECT ("CONN201-2") CONNECT ("U101-3")))))

Josef Wolf

unread,
Oct 7, 2010, 1:47:36 PM10/7/10
to
On 2010-10-02, Rob Warnock <rp...@rpw3.org> wrote:

> Josef Wolf <j...@raven.inka.de> wrote:
> I would look for a converter from "PCB" format to "EDIF" format
> ("Electronic Design Interchange Format"), which industry-standard
> tools usually produce. EDIF files *are* sexprs, which can be read
> with the CL reader [after setting the readtable case to either
>:PRESERVE or (my preference) :INVERT]:
>
> (edif MyProjectName
> (edifVersion 2 0 0)
> (edifLevel 0)
> (keywordMap (keywordLevel 0))
> (status
> (written
> (timeStamp 2005 05 23 23 05 04)
> (program "CAPTURE.EXE" (Version "10.3.0.p001"))
> (comment "Original data from OrCAD/CAPTURE schematic"))
> ...))

Hey, that sounds interesting! Unfortunately, there don't seem to be any
specification available on the net. Do you have any pointers?

Josef Wolf

unread,
Oct 7, 2010, 1:45:52 PM10/7/10
to
On 2010-10-02, Pascal J. Bourguignon <p...@informatimago.com> wrote:
> Josef Wolf <j...@raven.inka.de> writes:
[ ... ]

> For example, in the case of PCB, it looks like some "operators" take a
> single list of arguments, while some others take two.

You're right, I've missed that.

> You can trivially get it under the form of a sexp such as:
>
> (PCB ("some-name" 270000 250000) GRID (1000.0 0 0 1) FLAGS
> ("rubberband,clearnew,snappin") GROUPS ("1,c:2,s:3:4:5:6:7:8") STYLES
> ("Signal,1000,3600,2000,1000:Power,2500,6000,3500,1000") SYMBOL ('! '12)
> (SYMBOLLINE (0 45 0 50 8) SYMBOLLINE (0 10 0 35 8)) ELEMENT
> ("" "R025" "R102" "10k" 40000 160000 14000 -3000 0 100 "")
> (PIN (0 0 6800 3000 7400 3800 "1" "1" "square,edge2") ELEMENTLINE
> (10000 -5000 30000 -5000 2500))
> NETLIST NIL
> (NET ("GND" "(unknown)") (CONNECT ("CONN201-2") CONNECT ("U101-3"))))

To get this first step of transformation, I tried this approach:

(defun read-pcb-to-sexp (pathname)
(let ((*readtable* (copy-readtable *readtable*)))
(set-syntax-from-char #\# #\;)
(set-syntax-from-char #\[ #\()
(set-syntax-from-char #\] #\))
(with-open-file (file pathname :direction :input)
(let ((eof-value (gensym "EOF")))
(do* ((thing nil (read file nil eof-value))
(result '() (cons thing result)))
((eql thing eof-value) (nreverse (cdr result)))
(format t "~&~A~%" thing))))))

This works for the hash-comment character. But not for the squares. Turns out
that SBCL's READ-LIST function is hardwired to use #\) as the list termination
character.

> But then you have to parse that to put the operators inside their lists,
> perhaps something like:
>
> ((PCB "some-name" 270000 250000)
> (GRID 1000.0 0 0 1)
> (FLAGS "rubberband,clearnew,snappin")
> (GROUPS "1,c:2,s:3:4:5:6:7:8")
> (STYLES "Signal,1000,3600,2000,1000:Power,2500,6000,3500,1000")
> ((SYMBOL ('! '12))
> (SYMBOLLINE 0 45 0 50 8)
> (SYMBOLLINE 0 10 0 35 8))
> ((ELEMENT ("" "R025" "R102" "10k" 40000 160000 14000 -3000 0 100 ""))
> (PIN 0 0 6800 3000 7400 3800 "1" "1" "square,edge2")
> (ELEMENTLINE 10000 -5000 30000 -5000 2500))
> ((NETLIST NIL)
> ((NET ("GND" "(unknown)"))
> (CONNECT "CONN201-2")
> (CONNECT "U101-3"))))

Actually, there is more to do. For example, the SYMBOL, ELEMENT and NETLIST
keywords should create hashes (or assocs) of their child elements.

Pascal J. Bourguignon

unread,
Oct 7, 2010, 3:13:08 PM10/7/10
to
Josef Wolf <j...@raven.inka.de> writes:

The standard function is READ-DELIMITED-LIST, and it takes the
terminator character as an argument.


(with-input-from-string (inp "a b c ] d e f")
(read-delimited-list #\] inp t))
--> (A B C)


> Actually, there is more to do. For example, the SYMBOL, ELEMENT and NETLIST
> keywords should create hashes (or assocs) of their child elements.

This would be done I guess when evaluating them, not when parsing the
source. Once you have a lisp sexp, you can easily write functions and
macros to 'execute' it, or just write an interpreter.

Josef Wolf

unread,
Oct 7, 2010, 6:44:23 PM10/7/10
to
On 2010-10-07, Pascal J. Bourguignon <p...@informatimago.com> wrote:
> Josef Wolf <j...@raven.inka.de> writes:
[ ... ]
> The standard function is READ-DELIMITED-LIST, and it takes the
> terminator character as an argument.
>
> (with-input-from-string (inp "a b c ] d e f")
> (read-delimited-list #\] inp t))
> --> (A B C)

Thanks for the hint! The reading part seems to work fine now:

(defun read-pcb-to-sexp (pathname)
(let ((*readtable* (copy-readtable *readtable*))

(hires-flag (gensym "HIRES-"))) ; a unique flag to signal later fixup
(set-syntax-from-char #\# #\;) ; inherit comment syntax
(set-syntax-from-char #\] #\)) ; delimiter for high resolution list

(set-macro-character #\[ ; read a high resolution list
#'(lambda (stream char)
(declare (ignore char))
(cons hires-flag ; add flag so we can fix up later
(read-delimited-list #\] stream t))))

;; Characters are enclosed in quotes _without_ escaping.
;; Thus #\' and #\\ are encoded as ''' and '\'
(set-macro-character #\'
#'(lambda (stream char)
(declare (ignore char))
(let ((ch (read-char stream t nil t))
(delim (read-char stream t nil t)))
(unless (char= delim #\')
(error "illegal character specification |'~A~A|" ch delim))
ch)))

(with-open-file (file pathname :direction :input)

(let ((eof-value (gensym "EOF-")))


(do* ((thing nil (read file nil eof-value))
(result '() (cons thing result)))

((eql thing eof-value) (nreverse (cdr result))))))))

>> Actually, there is more to do. For example, the SYMBOL, ELEMENT and NETLIST
>> keywords should create hashes (or assocs) of their child elements.
>
> This would be done I guess when evaluating them, not when parsing the
> source. Once you have a lisp sexp, you can easily write functions and
> macros to 'execute' it, or just write an interpreter.

Yeah. I don't expect big problems here, too.

Rob Warnock

unread,
Oct 8, 2010, 6:12:44 AM10/8/10
to
Josef Wolf <j...@raven.inka.de> wrote:
+---------------
| Rob Warnock <rp...@rpw3.org> wrote:
| > I would look for a converter from "PCB" format to "EDIF" format
| > ("Electronic Design Interchange Format"), which industry-standard
| > tools usually produce. EDIF files *are* sexprs, which can be read
| > with the CL reader [after setting the readtable case to either
| >:PRESERVE or (my preference) :INVERT]:
...

| Hey, that sounds interesting! Unfortunately, there don't seem to be any
| specification available on the net. Do you have any pointers?
+---------------

Uh... Did you even *try* a Google search? From the first page of results:

http://en.wikipedia.org/wiki/EDIF
EDIF (Electronic Design Interchange Format)

http://portal.acm.org/citation.cfm?id=113938.149663
The electronic design interchange format EDIF: present and future
by HJ Kahn - 1992
[From the Proceedings of the 29th ACM/IEEE Design Automation Conference
Anaheim, California, United States, Pages: 666 - 671, 1992.]

http://www.rulabinsky.com/cavd/text/chapd.html
Computer Aids for VLSI Design
Steven M. Rubin
Copyright © 1994
Appendix D: Electronic Design Interchange Format

http://www.iue.tuwien.ac.at/phd/minixhofer/node53.html
B.3 Electronic Design Interchange Format (EDIF)
This description of EDIF format is taken from Appendix D of
Computer Aids for VLSI Design by Steven M. Rubin... [See URL above]

If you want the *actual* EDIF 4.0.0 spec itself [rather than the above
descriptions of the syntax], be prepared to shell out *big* bucks!!

http://webstore.ansi.org/RecordDetail.aspx?sku=IEC+61690-2+Ed.+1.0+en%3a2000
IEC 61690-2 Ed. 1.0 en:2000
Electronic design interchange format (EDIF) - Part 2: Version 4 0 0.
(This publication is available in electronic HTML format only)
Price: $388.00 Full MBR Price: $310.40
File Size: 6.3 MB

Standards bodies. Meh.

Sam Steingold

unread,
Oct 8, 2010, 12:36:45 PM10/8/10
to

Josef Wolf

unread,
Oct 8, 2010, 7:36:21 PM10/8/10
to
On 2010-10-08, Rob Warnock <rp...@rpw3.org> wrote:
> Josef Wolf <j...@raven.inka.de> wrote:
> [ ... ]

>| Hey, that sounds interesting! Unfortunately, there don't seem to be any
>| specification available on the net. Do you have any pointers?
> +---------------
>
> Uh... Did you even *try* a Google search? From the first page of results:

You don't have to believe me, but, really, I tried. And I got the same
results as you got.

> http://en.wikipedia.org/wiki/EDIF
> EDIF (Electronic Design Interchange Format)

That was my first try, of course. But this gives only a rough overview.
Far from being enough to implement a translator to/from this format.

> http://portal.acm.org/citation.cfm?id=113938.149663
> The electronic design interchange format EDIF: present and future
> by HJ Kahn - 1992
> [From the Proceedings of the 29th ACM/IEEE Design Automation Conference
> Anaheim, California, United States, Pages: 666 - 671, 1992.]

That's probably a good paper. Unfortunately, when I click on the pdf, I'm
faced with an login-frame, since I'm not subscribed to ACM.

> http://www.rulabinsky.com/cavd/text/chapd.html
> Computer Aids for VLSI Design
> Steven M. Rubin
> Copyright © 1994
> Appendix D: Electronic Design Interchange Format

Seen that also. Glossed over it, but ignored for several reasons:
- it seems to be VLSI specific
- No mention about EDIF version.
Err... there's (edifversion 1 0 0) in an example. The wikipedia page
mentioned above states that 2.0.0 is outdated and not worth to be
supported because of lots of problems. 4.0.0 is current,
So why should I go for 1.0.0?

> http://www.iue.tuwien.ac.at/phd/minixhofer/node53.html
> B.3 Electronic Design Interchange Format (EDIF)
> This description of EDIF format is taken from Appendix D of
> Computer Aids for VLSI Design by Steven M. Rubin... [See URL above]

Looks pretty similar to the paper above. No idea which of them is more
outdated.

> If you want the *actual* EDIF 4.0.0 spec itself [rather than the above
> descriptions of the syntax], be prepared to shell out *big* bucks!!
>
> http://webstore.ansi.org/RecordDetail.aspx?sku=IEC+61690-2+Ed.+1.0+en%3a2000
> IEC 61690-2 Ed. 1.0 en:2000
> Electronic design interchange format (EDIF) - Part 2: Version 4 0 0.
> (This publication is available in electronic HTML format only)
> Price: $388.00 Full MBR Price: $310.40
> File Size: 6.3 MB

Well, looks like they don't really want their 'standard' to be a _real_
standard. I'm definitely _not_ going to pay them a penny just to have
some input data for my exercises for learning CL. If they want dollars
for the spec, then they should not expect that anybody would implement
a translator for free.

Looks like I'll stick with the PCB format for a while. Even if it's worse
than EDIF, at least it's free and it's a good start for exercises.

Anyway, I'm still interested in a detailed description of (current) EDIF
format. But I'm not going to pay for the papers...

Josef Wolf

unread,
Oct 11, 2010, 1:59:09 PM10/11/10
to
On 2010-10-07, Josef Wolf <j...@raven.inka.de> wrote:
> On 2010-10-07, Pascal J. Bourguignon <p...@informatimago.com> wrote:
>> Josef Wolf <j...@raven.inka.de> writes:
[ ... ]
>>> Actually, there is more to do. For example, the SYMBOL, ELEMENT and NETLIST
>>> keywords should create hashes (or assocs) of their child elements.
>>
>> This would be done I guess when evaluating them, not when parsing the
>> source. Once you have a lisp sexp, you can easily write functions and
>> macros to 'execute' it, or just write an interpreter.
>
> Yeah. I don't expect big problems here, too.

Turns out to be not that easy as I assumed. Creating/managing hashes and
hashes of hashes in CL feels a little bit awkward to me. When I would do
something intuitive like $x{Element}{"Q102"}{Elementline}=[$parameters] in
perl, I turn up to creating functions to get the setf/gethash sequences
sorted out in CL.

I am pretty much sure this is because of my lack of experience with CL.
I have attached what I have sorted out so far below. While this solution
works, it doesn't exactly feel "right". For example, the STORE-HASHED-ITEM
function is so awkward, I could not even think of a good comment for this
beast.

The next thing is that working with hashes feels like driving a car with
closed eyes. I know there is the inspector. But somehow, perl's Data::Dumper
feels more natural.

Maybe somebody out there can help me sort out what I am missing and help me
improve this piece of code?

Here is what I have up to now (hope this is no too long for this group):

;;; Given a list of alternating keys and values, create a hash with list of
;;; values associated to their keys.
;;; E.g. '(a 1 b 2 a 3 c 4 a 5) ==> A=>(1 3 5), B=>(2), C=>(4)
;;;
(defun construct-hash-of-items (list)
(do ((lst list (cddr lst))
(result (make-hash-table)))
((null lst)
(maphash #'(lambda (k v)
(setf (gethash k result) (nreverse v)))
result)
result)
(push (cadr lst) (gethash (car lst) result))))

(defun store-hashed-item (result collection key contents)
(let ((c (gethash collection result)))
(setf (gethash key c)
(cons (second contents)
(construct-hash-of-items (caddr contents)))))
result)

;;; Build a PCB structure by collecting values from the reversed list. When
;;; we see a symbol, we have collected a complete structure form so we can
;;; store it into the result hash.
;;;
(defun do-extract-pcb-contents (hash-descriptors list result)
(do* ((lst (reverse list) (cdr lst))
(item (car lst) (car lst))
(form (cons item '()) (cons item form)))
((null lst) result)

(when (and (symbolp item) (not (null item)))
(let ((hd (assoc item hash-descriptors)))
(cond (hd (store-hashed-item result
(car hd)
(apply (cadr hd) (list form))
form))

;; Entries containing simple lists as values
((member item '(fileversion pcb grid cursor polyarea thermal drc
flags groups styles))
(setf (gethash item result) (cadr form)))

;; Vias are an exception to the general file format. They don't
;; have a key component, so we have to trore them into a list.
((member item '(via))
(push (cadr form) (gethash 'via result)))

;; The NetList() structure is of no interest. It is just a
;; container to hold the Net() structures.
((equal item 'netlist)
(do-extract-pcb-contents hash-descriptors (caddr form) result))

(t (warn "unknown definition ~S" form))))

;; We just consumed the form, start building a new one
(setf form '()))))

(defun extract-pcb-contents (list)
;; Define a description of how to extract key from a PCB structure
(let ((hash-descriptors (list
(list 'element #'(lambda (x) (third (second x))))
(list 'symbol #'caadr)
(list 'layer #'caadr)
(list 'net #'caadr)))
(result (make-hash-table)))
(mapcar #'(lambda (hd)
(setf (gethash (car hd) result) (make-hash-table)))
hash-descriptors)
(do-extract-pcb-contents hash-descriptors list result)))

;;; Square brackets signal that dimensions are given in high resolution. So we
;;; have to fix up all integer values which were found in parens. The only
;;; exception are layer numbers, which don't have a resolution.
;;;
(defun fixup-pcb-sexp (sexp hires-flag)
(cond ((integerp sexp) (* sexp 100)) ; fix resolution of integer values
((atom sexp) sexp)
((equal (car sexp) 'layer) ; don't need to fix layer number
(cons (car sexp)
(cons (cadr sexp)
(fixup-pcb-sexp (cddr sexp) hires-flag))))
((eq (car sexp) hires-flag) (cdr sexp)) ; new format, already fixed
(t (cons (fixup-pcb-sexp (car sexp) hires-flag)
(fixup-pcb-sexp (cdr sexp) hires-flag)))))

(defun read-pcb-sexp (stream)


(let ((*readtable* (copy-readtable *readtable*))
(hires-flag (gensym "HIRES-"))) ; a unique flag to signal later fixup
(set-syntax-from-char #\# #\;) ; inherit comment syntax
(set-syntax-from-char #\] #\)) ; delimiter for high resolution list

(set-macro-character #\[ ; read a high resolution list
#'(lambda (stream char)
(declare (ignore char))
(cons hires-flag ; add flag so we can fix up later
(read-delimited-list #\] stream t))))

;; Characters are enclosed in quotes _without_ escaping.
;; Thus #\' and #\\ are encoded as ''' and '\'
(set-macro-character #\'
#'(lambda (stream char)
(declare (ignore char))
(let ((ch (read-char stream t nil t))
(delim (read-char stream t nil t)))
(unless (char= delim #\')
(error "illegal character specification |'~A~A|" ch delim))
ch)))

(let ((eof-value (gensym "EOF-")))
(do* ((thing nil (read stream nil eof-value))


(result '() (cons thing result)))
((eql thing eof-value)

(fixup-pcb-sexp (nreverse (cdr result))
hires-flag))))))

(defun read-pcb-from-stream (stream)
(extract-pcb-contents
(read-pcb-sexp stream)))

(defun read-pcb-from-string (string)
(let ((stream (make-string-input-stream string)))
(read-pcb-from-stream stream)))

(defun read-pcb (pathname)
(with-open-file (file pathname)
(read-pcb-from-stream file)))

(defun read-pcb-test (pathname)
(with-open-file (file pathname)
(read-pcb-sexp file)))

(read-pcb-from-string "# release: pcb 20080202
# date: Thu Jun 4 20:51:21 2009
# user: jw (Josef Wolf)
# host: albatros.wolf.lan

# To read pcb files, the pcb version must be >= the file version
FileVersion[20070407]

PCB[\"\" 270000 250000]

Grid[1000.000000 0 0 1]

Cursor[0 0 0.000000]
PolyArea[200000000.000000]
Thermal[0.500000]
DRC[1000 1000 1000 1000 1500 1000]
Flags(\"rubberband,clearnew,snappin\")
Groups(\"1,c:2,s:3:4:5:6:7:8\")
Styles[\"Signal,1000,3600,2000,1000:Power,2500,6000,3500,1000\"]

Symbol(' ' 18)
(
)
Symbol('\"' 12)
(
SymbolLine(0 10 0 20 8)
SymbolLine(10 10 10 20 8)
)

Symbol('.' 12)
(
SymbolLine(0 10 0 20 8)
SymbolLine(10 10 10 20 8)
)
Via[110000 170000 6000 2000 0 3500 \"\" \"\"]
Via[160000 180000 6000 2000 0 3500 \"\" \"\"]

Element[\"\" \"R025\" \"R102\" \"10k\" 40000 160000 14000 -3000 0 100 \"\"]
(
Pin[0 0 6800 3000 7400 3800 \"1\" \"1\" \"square,edge2\"]
Pin[40000 0 6800 3000 7400 3800 \"2\" \"2\" \"edge2\"]
ElementLine [30000 0 40000 0 2500]
ElementLine [0 0 10000 0 2500]
)

Element[\"\" \"TO92\" \"Q201\" \"2N3904\" 170000 130000 8000 15000 0 100 \"\"]
(
Pin[0 20000 7200 3000 7800 4200 \"3\" \"3\" \"\"]
ElementLine [7000 3000 7000 17000 1000]
ElementArc [0 10000 10000 10000 225 270 1000]

)
Layer(1 \"component\")
(
Line[75000 200000 80000 200000 1000 2000 \"clearline,rubberend\"]
Line[124999 165000 115000 165000 2500 2000 \"clearline\"]
Arc[125000 155000 10000 10000 2500 2000 180 -90 \"clearline\"]
Arc[115000 170000 5000 5000 1000 2000 -90 90 \"clearline\"]
)
Layer(2 \"solder\")
(
)
Layer(9 \"silk\")
(
Line[220000 140000 220000 180000 1000 2000 \"clearline,lock\"]
Line[230000 140000 230000 180000 1000 2000 \"clearline,lock\"]
Arc[110000 40000 2000 2000 2000 2000 0 360 \"lock\"]
Arc[130000 40000 2000 2000 2000 2000 0 360 \"lock\"]
Text[78800 110000 0 100 \"A\" \"lock\"]
Text[88799 110000 0 100 \"B\" \"lock\"]


)
NetList()
(
Net(\"GND\" \"(unknown)\")
(
Connect(\"CONN201-2\")
Connect(\"U101-3\")
)

Net(\"Vcc\" \"(unknown)\")
(
Connect(\"Q201-3\")
Connect(\"U101-8\")
)
)
")

Peter Keller

unread,
Oct 11, 2010, 2:39:44 PM10/11/10
to
Josef Wolf <j...@raven.inka.de> wrote:
> On 2010-10-07, Josef Wolf <j...@raven.inka.de> wrote:
>> On 2010-10-07, Pascal J. Bourguignon <p...@informatimago.com> wrote:
>>> Josef Wolf <j...@raven.inka.de> writes:
> [ ... ]
>>>> Actually, there is more to do. For example, the SYMBOL, ELEMENT and NETLIST
>>>> keywords should create hashes (or assocs) of their child elements.
>>>
>>> This would be done I guess when evaluating them, not when parsing the
>>> source. Once you have a lisp sexp, you can easily write functions and
>>> macros to 'execute' it, or just write an interpreter.
>>
>> Yeah. I don't expect big problems here, too.
>
> Turns out to be not that easy as I assumed. Creating/managing hashes and
> hashes of hashes in CL feels a little bit awkward to me. When I would do
> something intuitive like $x{Element}{"Q102"}{Elementline}=[$parameters] in
> perl, I turn up to creating functions to get the setf/gethash sequences
> sorted out in CL.
>
> I am pretty much sure this is because of my lack of experience with CL.
> I have attached what I have sorted out so far below. While this solution
> works, it doesn't exactly feel "right". For example, the STORE-HASHED-ITEM
> function is so awkward, I could not even think of a good comment for this
> beast.
>
> The next thing is that working with hashes feels like driving a car with
> closed eyes. I know there is the inspector. But somehow, perl's Data::Dumper
> feels more natural.

I've needed to do a very similar thing and have thought about an
interface to a hash utility library.

Basically:

To access an element in a HoH (creating hash tables as needed), one would write:

(gethashref key1 key2 key3 key4 key5 ht) <settable>

which would return the same thing as gethash.

To set a hash table entry, one would write:

(setf (gethashref key1 key2 key3 key4 key5 ht) 'item)

Of course, you should do things like this with it:

(push (gethashref key1 key2 key3) 'thing)

and then the value at that location in the hash would be an array.

I hadn't gotten around to implementing it yet, but it shouldn't be
too terrible.

-pete

Pascal J. Bourguignon

unread,
Oct 11, 2010, 5:46:23 PM10/11/10
to
Peter Keller <psi...@cs.wisc.edu> writes:

> I hadn't gotten around to implementing it yet, but it shouldn't be
> too terrible.

Indeed, it's rather trivial:

(defstruct (hash-tree
(:constructor %make-hash-tree))
test table span default)

(defun make-hash-tree (&key (test (function eql)) (span 8) (default nil))
(%make-hash-tree :test test :span span :default default
:table (make-hash-table :test test :size span)))

(defun hash-tree-get (tree &rest keys)
(labels ((walk (hash keys)
(cond
((null keys) hash)
((null hash) (hash-tree-default tree))
(t (walk (gethash (first keys) hash) (rest keys))))))
(walk (hash-tree-table tree) keys)))

(defun (setf hash-tree-get) (new-value tree &rest keys)
(labels ((walk (hash keys)
(cond
((null keys) hash)
((null hash) (error "Overshot"))
(t (let ((table (gethash (first keys) hash)))
(if (typep table 'hash-table)
(walk table (rest keys))
(walk (setf (gethash (first keys) hash)
(make-hash-table :test (hash-tree-test tree)
:size (hash-tree-span tree)))
(rest keys))))))))
(setf (gethash (first (last keys)) (walk (hash-tree-table tree) (butlast keys))) new-value)))


CL-USER> (let ((h (make-hash-tree)))
(setf (hash-tree-get h 'a 'b 'c 'd 'e 'f) 42
(hash-tree-get h 'a 'b 'c 'd 'e 'g) 33
(hash-tree-get h 'a 'b 'h 'i) 44)
h)
#S(HASH-TREE :TEST #<SYSTEM-FUNCTION EQL>
:TABLE
#S(HASH-TABLE :TEST EXT:FASTHASH-EQL
(A .
#S(HASH-TABLE :TEST EXT:FASTHASH-EQL
(B .
#S(HASH-TABLE :TEST EXT:FASTHASH-EQL
(H . #S(HASH-TABLE :TEST EXT:FASTHASH-EQL (I . 44)))
(C .
#S(HASH-TABLE :TEST EXT:FASTHASH-EQL
(D .
#S(HASH-TABLE :TEST EXT:FASTHASH-EQL
(E .
#S(HASH-TABLE :TEST EXT:FASTHASH-EQL (G . 33)
(F . 42))))))))))))
:SPAN 8 :DEFAULT NIL)

Also have a look at
http://groups.google.com/group/comp.lang.lisp/msg/1bab4b5bae1fdca3?dmode=source
http://groups.google.com/group/comp.lang.lisp/msg/190d771c0e033a1c?dmode=source

Peter Keller

unread,
Oct 11, 2010, 7:10:15 PM10/11/10
to
Pascal J. Bourguignon <p...@informatimago.com> wrote:
> Peter Keller <psi...@cs.wisc.edu> writes:
>> Josef Wolf <j...@raven.inka.de> wrote:
>>> Turns out to be not that easy as I assumed. Creating/managing hashes and
>>> hashes of hashes in CL feels a little bit awkward to me. When I would do
>>> something intuitive like $x{Element}{"Q102"}{Elementline}=[$parameters] in
>>> perl, I turn up to creating functions to get the setf/gethash sequences
>>> sorted out in CL.
>>
>> I've needed to do a very similar thing and have thought about an
>> interface to a hash utility library.
>> I hadn't gotten around to implementing it yet, but it shouldn't be
>> too terrible.
>
> Indeed, it's rather trivial:
>
>
> (defstruct (hash-tree
> (:constructor %make-hash-tree))
> test table span default)

When I implemented something similar, I just built it on top of
a regular hash table. Why did you decide to abstract the hash table into
its own explicit datatype?

I just thought it was a matter of a convenient interface to a
hash table, not that the hash table itself was extended with new
functionality or changed and therefore became a new type.

Thank you.

-pete

Tim Bradshaw

unread,
Oct 11, 2010, 7:59:15 PM10/11/10
to
On 2010-10-11 19:39:44 +0100, Peter Keller said:

>
> I've needed to do a very similar thing and have thought about an
> interface to a hash utility library.
>
> Basically:

[I'm not sure if a previous article I wrote made it: I think not but it
may have].

This is sort of that. There are a number of deficiencies in this code
- the most annoying one is that you can't provide a default because
there's no obvious place in the interface for one. The whole
create-tree-on-demand thing is questionable as well.

Obviously these recursions are loops

(defun hashget (hashtree key &rest keys)
;; this is annoying as you can't specify a default anywhere.
(declare (dynamic-extent keys))
(labels ((hg/loop (ht k kt)
(if (null kt)
(gethash k ht)
(hg/loop (gethash k ht) (first kt) (rest kt)))))
(ht/loop hashtree key keys)))

(defun (setf hashget) (new hashtree key &rest keys)
;; This will create new elements in the tree as needed - should it? If
;; it should, what should it copy from the parent? I think the test is
;; all, though this is questionable at best.
(declare (dynamic-extent keys))
(labels ((shg/loop (ht k kt)
(if (null kt)
(setf (gethash k ht) new)
(multiple-value-bind (h hp) (gethash k ht)
(unless hp
(setf h (make-hash-table :test (hash-table-test ht))
(gethash k ht) h))
(shg/loop h (first kt) (rest kt))))))
(shg/loop hashtree key keys)))

Pascal J. Bourguignon

unread,
Oct 11, 2010, 7:59:46 PM10/11/10
to
Peter Keller <psi...@cs.wisc.edu> writes:

In terms of abstract data types (ADT), the hash table is a dictionary
with a single key and delivering 0 or 1 value.

If you want to index several keys in the manner of hashes of hashes
described above, you have a different kind of ADT, namely a tree, each
node being a hash-table, with children named by the keys. When you have
a path (a sequence of keys) you can walk the tree from the root down to
a leaf or a node (hash-table). Those are operations quite different
than the hash-table gethash.


You could also use lists as keys for a normal hash-table:

(let ((x (make-hash-table :test (function equal))))
(setf (gethash '(element "Q102" element-line) x) 112
(gethash '(attribute "Q10" name) x) "hello")
x)
--> #S(HASH-TABLE :TEST EXT:FASTHASH-EQUAL
((ATTRIBUTE "Q10" NAME) . "hello")
((ELEMENT "Q102" ELEMENT-LINE) . 112))

Thus having a single flat hash-table for all the path - value pairs.


But again, you will see that it's two different data structures, because
you will get different time and space complexities. One or the other
may be advantageous, depending on the algorithm used to compute the hash
of the sequence of keys, or the sequence of hash of the keys. Some
implementations use only the first few elements of the sequence, giving
therefore poor results when you have big subtrees. (Again it depends
also of the kind of data you have, you should use the data structure
that is adapted to your data and your algorithms).

Since all these considerations are elided as always in the OP, we may
have fun ignoring them totally and providing widely different solutions.

I also wanted to mention something earlier in the thread, that in Lisp
(and it should be the case in any other programming language too, but
sometimes it's not done because of the syntactic overhead), indeed, we
often 'create' functions to implement data types. This is called
functional abstraction, and this is something that's really done
routinely. Most of the defining macros create quite a number of
functions automatically.


For us lisper, there's something really strange in calling an expression
'intuitive' for its syntax. There's nothing really intuitive about
syntax, since it has to be processed by the compiler, something that is
usually hidden in depths of complex code.

On the other hand, without any syntax, (one grammar rule), and using
function whose call semantic is clearly and simply defined, we can get a
much better intuition of the semantics of an expression.

(setf (hash-tree-get tree 'element "Q102" 'element-line) parameters)

is much more readable and intuitive than:

$x{Element}{"Q102"}{Elementline}=[$parameters]

First we have a verb, SET, (intuitively we just ignore what we don't
understand such as this 'F' at the end of SETF), which intuitively means
that we will change something. Much more intuitive than a mathematical
equal sign which just means (ideographically!) that two elements are
equals. There no notion of assignment here.

Then we have an operator with a readable name: HASH-TREE-GET
which clearly and intuitively let us know that it has something to do
with GETting something out of a TREE of HASH. Then we have a variable
named TREE, quite intuitive, contrarily to stuff like $x. We're not
talking about x amount of dollars. Then after the tree, we have a list
of elements, which may be somewhat obscure, but in the context of a tree
(twice repeated, so it must be a big intuitive hint), that could be a
path, if our intuition doesn't betray us. Finally we see that SETF has
two arguments, that hash-tree-get operator, and that parameters
variable, so we may intuitively assume that it is setting one to the
other, and here you need to know a little more about the direction of
the assignment.

In any case if you want to know more about hash-tree-get, you can just
read its source, to understand what it means, which should be much
easier than to find the meaning of ${}{}{}=[$] in the sources of the
perl compiler...

Peter Keller

unread,
Oct 11, 2010, 10:45:42 PM10/11/10
to

A 0 or one value? Do you have a reference for that? I thought it was simply
a value of arbitrary type.

> If you want to index several keys in the manner of hashes of hashes
> described above, you have a different kind of ADT, namely a tree, each
> node being a hash-table, with children named by the keys. When you have
> a path (a sequence of keys) you can walk the tree from the root down to
> a leaf or a node (hash-table). Those are operations quite different
> than the hash-table gethash.

[snip]

> But again, you will see that it's two different data structures, because
> you will get different time and space complexities.

I'll accept that explanation. Additionally, another good reason for
making it its own ADT is that each node's hash table should be created
with the same test comparison as the other nodes in the tree. That
coupled with the complexity argument satisfies me.

Pascal, is it ok if I use and extend this snippet of code in an Apache
V2 licensed project as long as I attribute the original version of it
to you? If not, that's ok, I can reimagine it in CLOS and iterative
loops for a totally different implementation of the interface if
need be.

Thank you.

-pete

Peter Keller

unread,
Oct 11, 2010, 10:50:59 PM10/11/10
to

Ah yeah, this implementation from you (as opposed to Pascal's) is
much closer to what I was writing. The create-tree-on-demand thing
is necessary since you don't know how many nodes you'll need a-priori
in the tree you build because the tree is data dependent.

Thank you (and Pascal) for showing how it could be implemented. It
have definitely given me food for thought.

-pete

Pascal J. Bourguignon

unread,
Oct 12, 2010, 7:21:44 AM10/12/10
to
Peter Keller <psi...@cs.wisc.edu> writes:

> Pascal J. Bourguignon <p...@informatimago.com> wrote:
>> In terms of abstract data types (ADT), the hash table is a dictionary
>> with a single key and delivering 0 or 1 value.
>
> A 0 or one value? Do you have a reference for that? I thought it was simply
> a value of arbitrary type.


CL-USER> (let ((h (make-hash-table)))
(gethash :key h))

NIL
NIL
CL-USER> (let ((h (make-hash-table)))
(setf (gethash :key h) :value)
(gethash :key h))

:VALUE
T

It does so not by returning a different number of CL:VALUES, but by
indicating whether the value it returns was in the hash table or not.

If you meant CL:VALUES, this simple wrapper shows it:


CL-USER> (defun gh (k h) (multiple-value-bind (v p) (gethash k h) (if p (values v) (values))))

GH
CL-USER> (let ((h (make-hash-table)))
(gh :key h))

; No value
CL-USER> (let ((h (make-hash-table)))
(setf (gethash :key h) :value)
(gh :key h))

:VALUE

>> If you want to index several keys in the manner of hashes of hashes
>> described above, you have a different kind of ADT, namely a tree, each
>> node being a hash-table, with children named by the keys. When you have
>> a path (a sequence of keys) you can walk the tree from the root down to
>> a leaf or a node (hash-table). Those are operations quite different
>> than the hash-table gethash.
>
> [snip]
>
>> But again, you will see that it's two different data structures, because
>> you will get different time and space complexities.
>
> I'll accept that explanation. Additionally, another good reason for
> making it its own ADT is that each node's hash table should be created
> with the same test comparison as the other nodes in the tree. That
> coupled with the complexity argument satisfies me.
>
> Pascal, is it ok if I use and extend this snippet of code in an Apache
> V2 licensed project as long as I attribute the original version of it
> to you?

It's ok, no problem.

Peter Keller

unread,
Oct 12, 2010, 11:24:21 AM10/12/10
to
Pascal J. Bourguignon <p...@informatimago.com> wrote:
> Peter Keller <psi...@cs.wisc.edu> writes:
>> Pascal J. Bourguignon <p...@informatimago.com> wrote:
>>> In terms of abstract data types (ADT), the hash table is a dictionary
>>> with a single key and delivering 0 or 1 value.
>>
>> A 0 or one value? Do you have a reference for that? I thought it was simply
>> a value of arbitrary type.

> CL-USER> (let ((h (make-hash-table)))


> (setf (gethash :key h) :value)
> (gethash :key h))
>
> :VALUE
> T

Hrm. I now understand what you meant. I thought you meant the value of
a hash table key was literally the integer 0 or integer 1. I thought
you were digging out some obscure mathematical reference algorithm
of a hash table from Knuth or something.... :)

What you really meant was the first value of the 2 values from gethash
is either NIL if not present or something, but that something itself
cannot be a set of values as how values is used in CL. The second value is
used to solve the value gap of the first one, since NIL might have been the
actual thing associated with the key.

>> Pascal, is it ok if I use and extend this snippet of code in an Apache
>> V2 licensed project as long as I attribute the original version of it
>> to you?
>
> It's ok, no problem.

Awesome. Thanks!

-pete

Peder O. Klingenberg

unread,
Oct 13, 2010, 8:08:54 AM10/13/10
to
Peter Keller <psi...@cs.wisc.edu> writes:

> I've needed to do a very similar thing and have thought about an
> interface to a hash utility library.

It seems nearly everone has written one of these. I wrote one a few
years ago. It has grown some more features than the others posted so
far, like a analogues for maphash, remhash and clrhash, some
pretty-printing stuff, etc.

You can find it at
<http://heim.ifi.uio.no/pok/download/nested-hash.lisp> if you're
interested.

...Peder...
--
Sl�v uten dop.

Josef Wolf

unread,
Oct 13, 2010, 6:36:51 PM10/13/10
to
On 2010-10-11, Pascal J. Bourguignon <p...@informatimago.com> wrote:
> Peter Keller <psi...@cs.wisc.edu> writes:
>> Pascal J. Bourguignon <p...@informatimago.com> wrote:
>>> Peter Keller <psi...@cs.wisc.edu> writes:
>>>> Josef Wolf <j...@raven.inka.de> wrote:
>>>>> Turns out to be not that easy as I assumed. Creating/managing hashes and
>>>>> hashes of hashes in CL feels a little bit awkward to me. When I would do
>>>>> something intuitive like $x{Element}{"Q102"}{Elementline}=[$parameters] in
>>>>> perl, I turn up to creating functions to get the setf/gethash sequences
>>>>> sorted out in CL.
[ ... ]

> If you want to index several keys in the manner of hashes of hashes
> described above, you have a different kind of ADT, namely a tree, each
> node being a hash-table, with children named by the keys. When you have
> a path (a sequence of keys) you can walk the tree from the root down to
> a leaf or a node (hash-table). Those are operations quite different
> than the hash-table gethash.

Of course, you can look at it this way. Nevertheless, there are lots of
languages where building hashes/arrays of hashes/arrays is quite natural
with concepts similar to the one I mentioned above. Perl/Python/Ruby spring
immediately into mind, and they all use different syntaxes to do this thing.

I'm quite surprised to read that a new ADT is needed to do that. I used
hashes/arrays of hashes/arrays for many years now without feeling any need
to introduce new ATD's.

> You could also use lists as keys for a normal hash-table:
>
> (let ((x (make-hash-table :test (function equal))))
> (setf (gethash '(element "Q102" element-line) x) 112
> (gethash '(attribute "Q10" name) x) "hello")
> x)
> --> #S(HASH-TABLE :TEST EXT:FASTHASH-EQUAL
> ((ATTRIBUTE "Q10" NAME) . "hello")
> ((ELEMENT "Q102" ELEMENT-LINE) . 112))
>
> Thus having a single flat hash-table for all the path - value pairs.

Yeah, but that's a totally different beast.

> But again, you will see that it's two different data structures, because
> you will get different time and space complexities. One or the other
> may be advantageous, depending on the algorithm used to compute the hash
> of the sequence of keys, or the sequence of hash of the keys. Some
> implementations use only the first few elements of the sequence, giving
> therefore poor results when you have big subtrees. (Again it depends
> also of the kind of data you have, you should use the data structure
> that is adapted to your data and your algorithms).

I lost context here. You're comparing single-hash-with-lists-as-keys
against the hash tree structuring, aren't you?

> I also wanted to mention something earlier in the thread, that in Lisp
> (and it should be the case in any other programming language too, but
> sometimes it's not done because of the syntactic overhead), indeed, we
> often 'create' functions to implement data types. This is called
> functional abstraction, and this is something that's really done
> routinely. Most of the defining macros create quite a number of
> functions automatically.

Agree.

> For us lisper, there's something really strange in calling an expression
> 'intuitive' for its syntax. There's nothing really intuitive about
> syntax, since it has to be processed by the compiler, something that is
> usually hidden in depths of complex code.

Oh, when I wrote the above paragraph, I was not talking about syntax at all.
And indeed, Perl/Ruby/Python, for example, all use different syntax for this
concept.

Quite the opposite: I was talking about the semantics. What I was missing was
autovivification (in perl lingua). That is, that hash/arrays spring into
existence automatically if you use a path through an hash/array-tree to
assign something to the entry described by this path. This has absolutely
nothing to do with syntax. It's quite similar to your hash-tree-get example,
only that you can mix arrays/hashes arbitrarily and it's a really recursive
definition.

> On the other hand, without any syntax, (one grammar rule), and using
> function whose call semantic is clearly and simply defined, we can get a
> much better intuition of the semantics of an expression.
>
> (setf (hash-tree-get tree 'element "Q102" 'element-line) parameters)
>
> is much more readable and intuitive than:
>
> $x{Element}{"Q102"}{Elementline}=[$parameters]
>
> First we have a verb, SET, (intuitively we just ignore what we don't
> understand such as this 'F' at the end of SETF), which intuitively means
> that we will change something. Much more intuitive than a mathematical
> equal sign which just means (ideographically!) that two elements are
> equals. There no notion of assignment here.

Well, I doubt it's news to you, but the equal sign _is_ the assignment
operator in perl. Comparison is denoted by ==. Anyway, as I wrote above, I
was not talking about syntax. So I don't care whether I use SETF, = or :=
or yet some totally different token for assignment.

> Then we have an operator with a readable name: HASH-TREE-GET
> which clearly and intuitively let us know that it has something to do
> with GETting something out of a TREE of HASH. Then we have a variable
> named TREE, quite intuitive, contrarily to stuff like $x. We're not
> talking about x amount of dollars.

You are getting down to bad variable naming in my example? Well, since
it was meant just as an example, I thought any variable name would do.
Well, then I'll give you a real example in the context of the PCB files.

# assign a single value
#
$pcb_design->{"layer"}[3]{"lines"}[7]{"y1"} = 8000;

This describes a path through an array/hash-tree referenced by the
$pcb_design variable. Path components in curlies denote indexing by hash,
while path components in squares denote indexing by integer (that is: array)

So this statement gets the value from the hash referenced by $pcb_design
indexed by "layer". This value is treated as an array because of the
squares. At the next levels, hash/array/hash accesses are used. All the
hashes/arrays spring into existence automatically.

Oh, and here's another example:

# or assign some structure. Can be yet another array/hash tree
#
$pcb_design->{"layer"}[3]{"lines"} = [
{ x1=>7000, y1=>7000, x2=>7000, y2=>7000, width=>100 },
{ x1=>6000, y1=>7000, x2=>6000, y2=>7000, width=>100 },
];

Here we say that the "lines" entry of the hash at the third position of the
array that is pointed to by the "layer" key of the $pcb_design hash should
be assigned an array of two hashes.

And here's where your hash-tree-get fails, since it is not really a recursive
structure: your top-level is a HASH-TREE while all the descendants are
HASH-TABLE's. The resulting structure would have a HASH-TREE in the
(HASH-TREE-GET 'LAYER 3 'LINES) position. Therefore, an access of
(HASH-TREE-GET 'LAYER 3 'LINES 1 'X1) would fail, IMHO.

> Then after the tree, we have a list
> of elements, which may be somewhat obscure, but in the context of a tree
> (twice repeated, so it must be a big intuitive hint), that could be a
> path, if our intuition doesn't betray us. Finally we see that SETF has
> two arguments, that hash-tree-get operator, and that parameters
> variable, so we may intuitively assume that it is setting one to the
> other, and here you need to know a little more about the direction of
> the assignment.

I surely must have hit some wound point here... Guess I should avoid the
word "intuitive" in this group?

> In any case if you want to know more about hash-tree-get, you can just
> read its source, to understand what it means, which should be much
> easier than to find the meaning of ${}{}{}=[$] in the sources of the
> perl compiler...

I'm not going to read perl sources for that. It's described in the perlref
manpage. And if I find the details too gory, I'll go read the tutorial
that comes in the perlreftut manpage. ;-)

And BTW: I'm not here to advocate perl or its syntax. I just used perl
syntax to describe the concept I was talking about. It's hard to describe
something in a language you don't know, so I used a language I know.
I have not expected to hit some wound point here.

Josef Wolf

unread,
Oct 14, 2010, 12:10:20 PM10/14/10
to
On 2010-10-11, Pascal J. Bourguignon <p...@informatimago.com> wrote:

> CL-USER> (let ((h (make-hash-tree)))
> (setf (hash-tree-get h 'a 'b 'c 'd 'e 'f) 42
> (hash-tree-get h 'a 'b 'c 'd 'e 'g) 33
> (hash-tree-get h 'a 'b 'h 'i) 44)
> h)
> #S(HASH-TREE :TEST #<SYSTEM-FUNCTION EQL>
> :TABLE
> #S(HASH-TABLE :TEST EXT:FASTHASH-EQL
> (A .
> #S(HASH-TABLE :TEST EXT:FASTHASH-EQL
> (B .
> #S(HASH-TABLE :TEST EXT:FASTHASH-EQL
> (H . #S(HASH-TABLE :TEST EXT:FASTHASH-EQL (I . 44)))
> (C .
> #S(HASH-TABLE :TEST EXT:FASTHASH-EQL
> (D .
> #S(HASH-TABLE :TEST EXT:FASTHASH-EQL
> (E .
> #S(HASH-TABLE :TEST EXT:FASTHASH-EQL (G . 33)
> (F . 42))))))))))))
> :SPAN 8 :DEFAULT NIL)

How do you get this nested output? When I do the same in slime/sbcl on
linux, I get a rather consolidated output:

CL-USER> (let ((h (make-hash-tree)))
(setf (hash-tree-get h 'a 'b 'c 'd 'e 'f) 42
(hash-tree-get h 'a 'b 'c 'd 'e 'g) 33
(hash-tree-get h 'a 'b 'h 'i) 44)
h)

#S(HASH-TREE

:TEST #<FUNCTION EQL>
:TABLE #<HASH-TABLE :TEST EQL :COUNT 1 {100370B391}>
:SPAN 8
:DEFAULT NIL)

I know there's the inspector, but I find the output that you showed above
much more comprehensible. Is there a setting I can fiddle to get the output
expanded?

Pascal J. Bourguignon

unread,
Oct 14, 2010, 2:43:14 PM10/14/10
to
Josef Wolf <j...@raven.inka.de> writes:

I use a better implementation called clisp.


> I know there's the inspector, but I find the output that you showed above
> much more comprehensible. Is there a setting I can fiddle to get the output
> expanded?

Otherwise, you can obviously write a conforming function to do the
same... The problem is in generating a readable syntax for hash-table.

See HASHTABLE and PRINT-HASHTABLE in
http://www.informatimago.com/develop/lisp/com/informatimago/common-lisp/utility.lisp

Josef Wolf

unread,
Oct 15, 2010, 11:43:47 AM10/15/10
to

Uh, I've just gone from clisp to sbcl because I am told everywhere that sbcl
is better. In particular, clisp seems to give a pretty unusable backtrace in
slime.

http://groups.google.de/group/comp.lang.lisp/browse_thread/thread/764f658c0eb2688f#

>> I know there's the inspector, but I find the output that you showed above
>> much more comprehensible. Is there a setting I can fiddle to get the output
>> expanded?
>
> Otherwise, you can obviously write a conforming function to do the
> same... The problem is in generating a readable syntax for hash-table.

Yeah. No doubt this is possible. Many of the solvable problems can be solved
by writing a conforming function. I just thought such a function might be
already part of CL, since your REPL seems to have access to such a function.

Sorry for disturbing you...

Pascal J. Bourguignon

unread,
Oct 15, 2010, 12:02:30 PM10/15/10
to
Josef Wolf <j...@raven.inka.de> writes:

In slime any implementation gives unusable backtraces.
With some implementations this makes no difference though.

Well, clisp gives really nice backtrace natively, on interpreted code.

For debugging, clisp interpreter > slime + any implementation > clisp compiler


Anyways, let's dedramatize: each CL implementation has its strong
points, and unless you're just learning the language you should not
restrict yourself to a single one.

> http://groups.google.de/group/comp.lang.lisp/browse_thread/thread/764f658c0eb2688f#
>
>>> I know there's the inspector, but I find the output that you showed above
>>> much more comprehensible. Is there a setting I can fiddle to get the output
>>> expanded?
>>
>> Otherwise, you can obviously write a conforming function to do the
>> same... The problem is in generating a readable syntax for hash-table.
>
> Yeah. No doubt this is possible. Many of the solvable problems can be solved
> by writing a conforming function. I just thought such a function might be
> already part of CL, since your REPL seems to have access to such a function.
>
> Sorry for disturbing you...

No disturbing. That's the point of mail and news.

0 new messages