Proposal for "apropos" feature that searches exported symbols across all quicklisp libraries


Paul Sexton

May 18, 2012, 5:34:00 AM
to Quicklisp
Hi,
Quicklisp is really great. It reduces the pain of installing a lisp
working environment, downloading dependencies, and keeping them
updated by about 99.9%.

I was looking at the alexandria library, and some of the other
"miscellaneous utilities" libraries distributed by quicklisp, and
thinking how I really have no idea what's in any of them, and no easy
way to find out. Then I thought about whether an "apropos" command
that searched exported symbols would be feasible.

The idea is that part of the process of building each update to the
quicklisp distribution would be to run a function that scans every
library known to quicklisp, extracts all the symbols exported by the
DEFPACKAGE forms within the source files of those libraries (by
parsing the files without loading them), and saves that information to
a text file that is included with the quicklisp distribution.

The end user could then call a function from the REPL which would
search the text file for occurrences of a string.
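
The scanning step described above could look roughly like the following minimal sketch. It is not the proof-of-concept mentioned below: the names DEFPACKAGE-EXPORTS and SCAN-STREAM-FOR-EXPORTS are made up for illustration, and it relies on the standard reader, which typically signals an error at the first symbol qualified with a package that does not exist in the running image (and which interns the symbols it reads into the current package).

```lisp
(defun defpackage-exports (form)
  "If FORM looks like a DEFPACKAGE form, return two values: the package
name and the list of names in its :EXPORT clauses, all as strings."
  (when (and (consp form)
             (symbolp (first form))
             (string-equal (symbol-name (first form)) "DEFPACKAGE"))
    (values (string (second form))
            (loop for clause in (cddr form)
                  when (and (consp clause) (eq (first clause) :export))
                    append (mapcar #'string (rest clause))))))

(defun scan-stream-for-exports (stream)
  "Read forms from STREAM without evaluating them and return a list of
(PACKAGE-NAME . EXPORTED-NAMES) entries, one per DEFPACKAGE form found."
  (let ((*read-eval* nil)               ; never run #. forms from the source
        (eof (gensym "EOF"))
        (results '()))
    (handler-case
        (loop for form = (read stream nil eof)
              until (eq form eof)
              do (multiple-value-bind (name exports) (defpackage-exports form)
                   (when name
                     (push (cons name exports) results))))
      ;; Simplification: stop at the first form that cannot be read or
      ;; processed, keeping whatever was collected before it.
      (error () nil))
    (nreverse results)))
```

Package definitions usually sit at the top of a file, before any package-qualified symbols, so stopping at the first unreadable form still catches many of them. A more tolerant reader is needed to process whole files reliably.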

I have written a proof-of-concept which can "grovel" ASDF systems and
installed Quicklisp libraries for exported symbols. It uses
com.informatimago.common-lisp.lisp-text, which will need to be
installed via quicklisp.

Example:

CL-USER> (ql-search:ql-search-library "struct" "cffi")

Parsing source file: #P"C:/Users/paul/quicklisp/dists/quicklisp/software/cffi_0.10.6/examples/examples.lisp"
Parsing source file: #P"C:/Users/paul/quicklisp/dists/quicklisp/software/cffi_0.10.6/examples/gethostname.lisp"
Parsing source file: #P"C:/Users/paul/quicklisp/dists/quicklisp/software/cffi_0.10.6/examples/gettimeofday.lisp"
Parsing source file: #P"C:/Users/paul/quicklisp/dists/quicklisp/software/cffi_0.10.6/grovel/package.lisp"
Parsing source file: #P"C:/Users/paul/quicklisp/dists/quicklisp/software/cffi_0.10.6/grovel/invoke.lisp"
[etc]

System "cffi-uffi-compat": cffi-uffi-compat:def-struct
System "cffi": cffi:defcstruct
System "cffi": cffi:define-c-struct-wrapper
CL-USER>

Note how it searches several ASDF systems (cffi, cffi-uffi-compat and
others) contained in the cffi quicklisp library.

The code is at:
https://gist.github.com/1b7d2e444e21c139d12d

Pascal J. Bourguignon

May 18, 2012, 6:00:42 AM
to quic...@googlegroups.com
Paul Sexton <psext...@gmail.com> writes:

> Hi,
> Quicklisp is really great. It reduces the pain of installing a lisp
> working environment, downloading dependencies, and keeping them
> updated by about 99.9%.
>
> I was looking at the alexandria library, and some of the other
> "miscellaneous utilities" libraries distributed by quicklisp, and
> thinking how I really have no idea what's in any of them, and no easy
> way to find out. Then I thought about whether an "apropos" command
> that searched exported symbols would be feasible.

Also, have a look at Sven Van Caekenberghe's lispdoc or the version I
have patched here:
https://gitorious.org/com-informatimago/com-informatimago/blobs/master/lispdoc/lispdoc.lisp

While apropos is nice, reading the output of lispdoc may be more useful
to get an idea of what's in a library. On the other hand, lispdoc
requires loading the library first. While this doesn't help to document
implementation-specific variants, it greatly facilitates the processing
of lisp sources, if you can just load them and use the normal lisp
reader (i.e. with read-time side effects and other reader macros).


There are also other documentation generation systems.



cl-user> (ql:quickload :com.informatimago.lispdoc)
To load "com.informatimago.lispdoc":
Load 1 ASDF system:
com.informatimago.lispdoc
; Loading "com.informatimago.lispdoc"
[package com.informatimago.rdp]...................
[package com.informatimago.clext.character-sets]..
[package com.informatimago.common-lisp.lisp-reader.reader].
..................................................
[package com.informatimago.lispdoc]....
(:com.informatimago.lispdoc)
cl-user> (com.informatimago.lispdoc:lispdoc-html "/tmp/alexandria/" '(:alexandria))
;; Writing file hierarchical-package-index.html
;; Writing file flat-package-index.html
[…]
;; Writing file alexandria.0.dev.html
#P"/tmp/alexandria/"

and then:
M-x w3m-browse-url RET file:///tmp/alexandria/alexandria.0.dev.html RET
gives:

------------------------------------------------------------------------
  Informatimago CL Software   Documentation Index   Hierarchical Package Index   Flat
Package Index   Symbol Indices   Up: ALEXANDRIA.0

---------------------------------------------------------------------------------------

Package ALEXANDRIA.0.DEV

Nicknames: ALEXANDRIA

undocumented

(alist-hash-table alist &rest hash-table-initargs) function

Returns a hash table containing the keys and values of the association list
ALIST. Hash table is initialized using the HASH-TABLE-INITARGS.

(alist-plist alist) function

Returns a property list containing the same keys and values as the
association list ALIST in the same order.

(appendf g9076 &rest lists) macro

Modify-macro for APPEND. Appends LISTS to the place designated by the first
argument.

array-index type

Type designator for an index into array of LENGTH: an integer between
0 (inclusive) and LENGTH (exclusive). LENGTH defaults to
ARRAY-DIMENSION-LIMIT.





I should parameterize the header to avoid irrelevant references to
informatimago. I patched it to generate the documentation of my
packages.

--
__Pascal Bourguignon__ http://www.informatimago.com/
A bad day in () is better than a good day in {}.

Daimrod

May 18, 2012, 6:10:34 AM
to quic...@googlegroups.com
If you use SLIME there is slime-apropos-package bound to C-c C-d C-p.

e.g.
C-c C-d C-p ALEXANDRIA RET

ALIST-HASH-TABLE
Function: Returns a hash table containing the keys and values of the association list
ALIST-PLIST
Function: Returns a property list containing the same keys and values as the
APPENDF
Macro: Modify-macro for APPEND. Appends LISTS to the place designated by the first
ARRAY-INDEX
Type: Type designator for an index into array of LENGTH: an integer between
ARRAY-LENGTH
Type: Type designator for a dimension of an array of LENGTH: an integer between
ASSOC-VALUE
Function: ASSOC-VALUE is an alist accessor very much like ASSOC, but it can
Setf: (not documented)
[...]
WRITE-BYTE-VECTOR-INTO-FILE
Function: Write BYTES to PATHNAME.
WRITE-STRING-INTO-FILE
Function: Write STRING to PATHNAME.
XOR
Macro: Evaluates its arguments one at a time, from left to right. If more then one
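
Outside SLIME, a similar listing can be approximated with standard Common Lisp alone. The following is a minimal sketch; PACKAGE-APROPOS is a hypothetical helper name, not part of SLIME or Quicklisp. Like slime-apropos-package, it only sees packages that already exist in the running image.

```lisp
(defun package-apropos (substring package)
  "Return the sorted names of the external symbols of PACKAGE whose
names contain SUBSTRING, compared case-insensitively."
  (let ((names '()))
    (do-external-symbols (sym package)
      (when (search substring (symbol-name sym) :test #'char-equal)
        (push (symbol-name sym) names)))
    (sort names #'string<)))
```

For example, (package-apropos "hash" :alexandria) would list the matching exports once alexandria has been loaded.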

Paul Sexton

May 18, 2012, 4:15:55 PM
to Quicklisp
Yes, I'm aware of slime-apropos, and packages like lispdoc. They only
operate on installed/loaded packages. There are > 700 libraries
bundled with quicklisp -- most users only use a small handful, and are
ignorant of most of the others. There is a great deal of duplication
of code (particularly "utility" functions). Many of the libraries have
names which communicate nothing about their function. I wonder how
many versions of "with-gensyms" are in those libraries?

A global apropos would allow users to find out what is actually inside
all of these libraries, without the need to install them locally.

Paul

Zach Beane

May 18, 2012, 4:17:11 PM
to quic...@googlegroups.com
I'd very much like to collect this data and provide a search interface
to it. My personal use is trying to find out which project defines the
"extremum" function (although these days I can usually remember that
it's cl-utilities).

Zach

Pascal J. Bourguignon

May 18, 2012, 10:55:19 PM
to quic...@googlegroups.com
Yes, yes, it's useful.

But I also tried to load everything once, and save a fully loaded
image. It was rather successful (only half a dozen systems couldn't be
loaded in the implementation I used). The saved image wasn't
excessively big. I should try it again…

Elias Mårtenson

May 19, 2012, 12:12:07 AM
to quic...@googlegroups.com, p...@informatimago.com
On Friday, 18 May 2012 18:00:42 UTC+8, pjb wrote:
> Paul Sexton <psext...@gmail.com> writes:
>
> > I was looking at the alexandria library, and some of the other
> > "miscellaneous utilities" libraries distributed by quicklisp, and
> > thinking how I really have no idea what's in any of them, and no easy
> > way to find out. Then I thought about whether an "apropos" command
> > that searched exported symbols would be feasible.
>
> Also, have a look at Sven Van Caekenberghe's lispdoc or the version I
> have patched here:
> https://gitorious.org/com-informatimago/com-informatimago/blobs/master/lispdoc/lispdoc.lisp
>
> While apropos is nice, reading the output of lispdoc may be more useful
> to get an idea of what's in a library. On the other hand, lispdoc
> requires loading the library first. While this doesn't help to document
> implementation-specific variants, it greatly facilitates the processing
> of lisp sources, if you can just load them and use the normal lisp
> reader (i.e. with read-time side effects and other reader macros).


To blow my own horn a bit, another option is Docbrowser, which lets you browse all exported symbols and their documentation from a web browser. It should be included in the next release of Quicklisp. The project page is here: http://code.google.com/p/docbrowser/

If you just want to see what it looks like, a demo server is set up at: http://docbrowser.dhsdevelopments.com/

Paul Sexton

May 19, 2012, 1:41:25 AM
to Quicklisp
Obviously it would be much easier to extract exported symbols if all
the packages were actually loaded by the lisp image.
Probably not a very robust design however. Maybe one could batch-
process each quicklisp library in turn -- load it into a lisp, extract
exported symbols and save them somewhere, then quit.
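
The "extract and save" step for one already-loaded package might be sketched as follows. WRITE-EXPORT-ROWS and the tab-separated line layout are assumptions made for illustration; the system and library names are simply passed in by whatever drives the batch run.

```lisp
(defun write-export-rows (package system library stream)
  "Write one line per external symbol of PACKAGE to STREAM, in the form
SYMBOL <tab> PACKAGE <tab> SYSTEM <tab> LIBRARY."
  (let ((pkg (or (find-package package)
                 (error "No such package: ~S" package)))
        (names '()))
    (do-external-symbols (sym pkg)
      (push (symbol-name sym) names))
    ;; Sort so that repeated runs produce identical files.
    (dolist (name (sort names #'string<))
      (format stream "~A~C~A~C~A~C~A~%"
              name #\Tab (package-name pkg) #\Tab system #\Tab library))))
```

Running this in a fresh Lisp process per library, then quitting, keeps one library's load failure from affecting the rest.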


On May 19, 2:55 pm, "Pascal J. Bourguignon" <p...@informatimago.com>
wrote:

Mihai Călin Bazon

May 19, 2012, 4:07:23 AM
to quic...@googlegroups.com
The other day I was looking for a function to XML-encode a string. I
thought there must be one in CXML, which I had loaded, but I couldn't
find it in a couple of minutes (Google didn't help either) so I wrote
it myself.

It would be nice if Quicklisp had such a search feature. I'm pretty
sure 90% of my utils.lisp could be found in other projects :-) (but
that's how it happens when searching is harder than writing the
code...)

-Mihai
--
Mihai Bazon,
http://mihai.bazon.net/blog

Zach Beane

May 19, 2012, 6:23:58 AM
to quic...@googlegroups.com
Paul Sexton <psext...@gmail.com> writes:

> Obviously it would be much easier to extract exported symbols if all
> the packages were actually loaded by the lisp image.
> Probably not a very robust design however. Maybe one could batch-
> process each quicklisp library in turn -- load it into a lisp, extract
> exported symbols and save them somewhere, then quit.

I made https://github.com/xach/qlmapper for exactly that kind of thing.

Zach

Paul Sexton

May 20, 2012, 3:34:20 PM
to Quicklisp
Ah, that's perfect.
Using that package and the one I posted, I have extracted every
exported symbol from every system in quicklisp.
The file format is tab-delimited, with 4 columns per line:
SYMBOL PACKAGE ASDF-SYSTEM QUICKLISP-LIBRARY
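
Searching a file in this four-column format needs nothing beyond the standard library. A minimal sketch follows; SPLIT-ON-TAB and SEARCH-INDEX are hypothetical names, and matching is done only against the SYMBOL column.

```lisp
(defun split-on-tab (line)
  "Split LINE into a list of fields at each tab character."
  (loop with start = 0
        for pos = (position #\Tab line :start start)
        collect (subseq line start pos)
        while pos
        do (setf start (1+ pos))))

(defun search-index (needle stream)
  "Return the rows read from STREAM, each a list of field strings, whose
first field contains NEEDLE (case-insensitive)."
  (loop for line = (read-line stream nil)
        while line
        append (let ((fields (split-on-tab line)))
                 (when (search needle (first fields) :test #'char-equal)
                   (list fields)))))
```

With the index saved as a file, (with-open-file (in "symbols.txt") (search-index "struct" in)) would return the matching rows; the file name here is only an example.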

The total size of the file is a huge 246 MB, of which 233 MB is from
cl-glfw-* packages (OpenGL). For this reason I have split the file in
two:

The cl-glfw symbols:
https://docs.google.com/open?id=0B23v-ApxDrqAVXNiZlBpQ0ZlOU0

All the other symbols:
https://docs.google.com/open?id=0B23v-ApxDrqAWnI0UFpJT3JfSTQ

Some interesting stats:
52 packages export 'once-only'
108 packages export 'with-gensyms'
64 packages export 'flatten'.
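
Counts of this kind can be derived from the four-column rows by counting distinct packages per symbol name. The following is only a sketch of that arithmetic, not the code used to produce the numbers above; COUNT-EXPORTING-PACKAGES is a hypothetical name and ROWS is assumed to be a list of (SYMBOL PACKAGE SYSTEM LIBRARY) string lists.

```lisp
(defun count-exporting-packages (rows)
  "Return a hash table mapping each downcased symbol name in ROWS to the
number of distinct packages that list it."
  (let ((seen (make-hash-table :test #'equal))
        (counts (make-hash-table :test #'equal)))
    (dolist (row rows counts)
      (let ((key (cons (string-downcase (first row))
                       (string-downcase (second row)))))
        ;; Count each (symbol . package) pair once, even if it appears
        ;; in several rows (e.g. under more than one ASDF system).
        (unless (gethash key seen)
          (setf (gethash key seen) t)
          (incf (gethash (car key) counts 0)))))))
```

Note that this counts names, not symbol identity: two packages that export the very same symbol object are counted twice.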

I will post the code later.


On May 19, 10:23 pm, Zach Beane <x...@xach.com> wrote:
> Paul Sexton <psexton...@gmail.com> writes:
> > Obviously it would be much easier to extract exported symbols if all
> > the packages were actually loaded by the lisp image.
> > Probably not a very robust design however. Maybe one could batch-
> > process each quicklisp library in turn -- load it into a lisp, extract
> > exported symbols and save them somewhere, then quit.
>
> I made https://github.com/xach/qlmapper for exactly that kind of thing.
>
> Zach

Zach Beane

May 20, 2012, 4:29:42 PM
to quic...@googlegroups.com
Paul Sexton <psext...@gmail.com> writes:

> Some interesting stats:
> 52 packages export 'once-only'
> 108 packages export 'with-gensyms'
> 64 packages export 'flatten'.
>
> I will post the code later.

These stats sound a bit off to me. How do you determine that a package
exports a symbol? Plenty of packages import or inherit symbols that are
external in their home packages, but relatively few re-export them, and
I'd also be surprised if they were completely independent symbols by the
same name.
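
When the packages are actually loaded, the distinction between a symbol a package defines and one it merely re-exports can be checked through the symbol's home package. A minimal sketch, with OWN-EXTERNAL-SYMBOLS as a hypothetical name:

```lisp
(defun own-external-symbols (package)
  "Return the external symbols of PACKAGE whose home package is PACKAGE
itself, leaving out symbols that were imported and then re-exported."
  (let ((pkg (or (find-package package)
                 (error "No such package: ~S" package)))
        (own '()))
    (do-external-symbols (sym pkg)
      (when (eq (symbol-package sym) pkg)
        (push sym own)))
    own))
```

A purely textual scan of :export clauses cannot make this distinction, since a re-exported name looks the same as a locally defined one.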

I'd love to see the code.

Zach

Pascal J. Bourguignon

May 20, 2012, 7:05:53 PM
to quic...@googlegroups.com
Paul Sexton <psext...@gmail.com> writes:

> Some interesting stats:
> 52 packages export 'once-only'
> 108 packages export 'with-gensyms'
> 64 packages export 'flatten'.
>
> I will post the code later.

Now, either you load all the systems and these duplications are
possibly a problem, or you don't load all of them, but only the
required dependencies, and then these duplications mean that you cut
the dependency branch short.

That is, it is often worth copy-and-pasting a few small utility
functions in a system, to avoid a dependency on a big library.

And even if we load all the systems, we may want to have some
duplication. E.g. I'd imagine in a Lisp Machine that some duplication of
utility functions may occur between "system" layers and "user" layers,
so that the user may freely patch and redefine the user version without
impacting the system version.


There's also the question of the specification of library routines.

Take for example nreconc (it occurred to me in a real program so it's not
hypothetical):

nreconc reverses the order of elements in list (as if by
nreverse). It then appends (as if by nconc) the tail to that
reversed list and returns the result.

nreconc works like nreverse:

For nreverse, sequence might be destroyed and re-used to produce the
result. The result might or might not be identical to
sequence. Specifically, when sequence is a list, nreverse is
permitted to setf any part, car or cdr, of any cons that is part of
the list structure of sequence.

and nreverse gives no guarantee on how the cons cells are reused.
Therefore nreconc doesn't either.

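Now even if nreconc is implemented as you'd expect it to be implemented,
i.e.:

(defun nreconc (list tail)
  ;; Walk LIST once, pointing the cdr of each cons back at the result
  ;; built so far: 2ND is the cell being relinked, 3RD the result.
  (do ((1st (cdr list) (if (atom 1st) 1st (cdr 1st)))
       (2nd list 1st)
       (3rd tail 2nd))
      ((atom 2nd) 3rd)
    (rplacd 2nd 3rd)))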

you would still have to duplicate this code in your library if you
required this behavior with respect to the cons cells. (That's the
problem with documenting functions: you cannot rely on the code anymore,
the documentation specifies the contract). But the point is that some
code duplication may be due to the fact that the different copies
promise different contracts, even if they have the same name or even the
same implementation (and let's not even consider performance).
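
To make the point about contracts concrete: a library that needs the cell-reuse behavior would ship its own copy under its own name and document the stronger promise. The following is a sketch only; REUSING-NRECONC is a hypothetical name, and the body is the same loop as the implementation shown above.

```lisp
(defun reusing-nreconc (list tail)
  "Like NRECONC, but with an explicit additional promise: every cons
cell of LIST is reused, in reverse order, as the spine of the result,
and no new cons cells are allocated."
  (do ((1st (cdr list) (if (atom 1st) 1st (cdr 1st)))
       (2nd list 1st)
       (3rd tail 2nd))
      ((atom 2nd) 3rd)
    (rplacd 2nd 3rd)))
```

Callers can then rely on the identity of the cells, which the standard's description of NRECONC does not guarantee.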



Also, duplication of functions (foremost such "utility" functions) may
denote a duplication of libraries, but can we really reduce or merge
libraries? There are the problems of the sets of utilities provided (do
we have even one library that's a strict subset of another?), and of
licensing. It would be useless to go fetch a function in another
library (thus adding a new dependency) if we cannot free ourselves from
the dependency on the original library: this would just add a new
dependency. A body of code B depending on a library L may depend on a
set of functions that is not a subset of the functions provided by any
other library. Therefore it may not be easy to just replace one library
by another.




But your statistics are quite interesting, and there's probably
something to be done here. But we cannot blindly remove those
duplicates and add dependencies. We'd have to color the dependency
graph by license. (see
http://fossil.nasium-lse.ogamita.com/nasium-lse/artifact/ca229c462dcf420c4497020b3d4b795f0c0e4efb
for some code collecting the licenses from asdf systems). We'd have to
compare the various occurrences of the "same" function or macro: are they
really identical (e.g. my own with-gensyms is different from the
"canonical" one), or do they have specification or efficiency
differences (cf. nreconc)?

Paul Sexton

May 20, 2012, 7:17:01 PM
to Quicklisp
I have made a repository at:
https://bitbucket.org/eeeickythump/ql-grovel

Hopefully the README is self-explanatory. I haven't done much
programming inside ASDF or quicklisp so there are probably better ways
to do a lot of it.

I determine that a package exports a symbol simply by reading a
defpackage form and considering every symbol in its :export clause to
be exported by that package. So in other words, 108 systems contain
defpackage forms that contain '(:export ... #:once-only ... )'.

Paul



On May 21, 8:29 am, Zach Beane <x...@xach.com> wrote:

Zach Beane

May 20, 2012, 7:40:23 PM
to quic...@googlegroups.com
Paul Sexton <psext...@gmail.com> writes:

> I have made a repository at:
> https://bitbucket.org/eeeickythump/ql-grovel
>
> Hopefully the README is self-explanatory. I haven't done much
> programming inside ASDF or quicklisp so there are probably better ways
> to do a lot of it.
>
> I determine that a package exports a symbol simply by reading a
> defpackage form and considering every symbol in its :export clause to
> be exported by that package. So in other words, 108 systems contain
> defpackage forms that contain '(:export ... #:once-only ... )'.

I wanted to check it out, but when I go to the link above, it prompts me
to log in, and I don't think I have a bitbucket account. What should I
do?

Zach

Paul Sexton

May 21, 2012, 12:26:36 AM
to Quicklisp
Strange, it's not showing up via the search box either. Maybe because
it's such a new repository. I have just emailed a snapshot to you;
another thing to try is cloning it:

hg clone http://bitbucket.org/eeeickythump/ql-grovel

You may need to change the http to https.

Paul


On May 21, 11:40 am, Zach Beane <x...@xach.com> wrote:

Sebastian Tennant

Sep 29, 2012, 3:09:55 AM
to quic...@googlegroups.com
Quoth Paul Sexton <psext...@gmail.com>:
> Strange, it's not showing up via the search box either. Maybe because
> it's such a new repository.

No. It's because it's private.

> [...] another thing to try is cloning it:
>
> hg clone http://bitbucket.org/eeeickythump/ql-grovel
>
> You may need to change the http to https.

Neither works. In both cases you are asked to authenticate (username and
password).

It is clearly not a public repository. Care to make it public so we can see
what you're doing?

Sebastian
--
Emacs' AlsaPlayer - Music Without Jolts
Lightweight, full-featured and mindful of your idyllic happiness.
http://home.gna.org/eap
