alist keys: strings or symbols

exca...@tutanota.com

Jul 19, 2020, 12:28:58 PM
to help-gn...@gnu.org
Some questions about alists:

- Is it a better practice to convert string keys to symbols?  Is
  =intern= best for this?  What about handling illegal symbol names?
- If a symbol is used as a key and that symbol is already in use
  elsewhere, is there potential for conflict with the existing symbol?

I have an alist created from parsing meta data from a file.  The file
looks like:

#+begin_src emacs-lisp :results verbatim :session exc
(defvar exc-post-meta-data
  (concat
   "#+TITLE: Test post\n"
   "#+AUTHOR: Excalamus\n"
   "#+DATE: 2020-07-17\n"
   "#+TAGS: blogging tests\n"
   "\n")
  "Sample post meta information.")

(defvar exc-post-content
  (concat
   "* Header\n"
   "** Subheader\n"
   "Hello, world!\n\n"
   "#+begin_src python\n"
   "    print('Goodbye, cruel world...')\n"
   "#+end_src\n")
  "Sample post file without meta information.")

(defvar exc-post
  (concat
   exc-post-meta-data
   exc-post-content)
  "Sample post file.")

(message "%s" exc-post)
#+end_src

#+RESULTS:
#+begin_example
"#+TITLE: Test post
,#+AUTHOR: Excalamus
,#+DATE: 2020-07-17
,#+TAGS: blogging tests

,* Header
,** Subheader
Hello, world!

,#+begin_src python
    print('Goodbye, cruel world...')
,#+end_src
"
#+end_example

The meta data is parsed into an alist:

#+begin_src emacs-lisp :results verbatim :session exc
(require 'org-element)

(defun exc-parse-org-meta-data (data)
  "Parse Org formatted meta DATA into an alist.

Keywords are the '#+' options given within an Org file.  These
are things like TITLE, DATE, and FILETAGS.  Keywords are
case-sensitive!  Values are whatever remains on that line."
  (with-temp-buffer
    (insert data)
    (org-element-map (org-element-parse-buffer 'element) 'keyword
      (lambda (x) (cons (org-element-property :key x)
                        (org-element-property :value x))))))

(setq exc-alist (exc-parse-org-meta-data exc-post))
exc-alist
#+end_src

#+RESULTS:
: (("TITLE" . "Test post") ("AUTHOR" . "Excalamus") ("DATE" . "2020-07-17") ("TAGS" . "blogging tests"))

Notice that the keys are strings.  This means that =alist-get= needs
an equality predicate like ='string-equal= to retrieve them, unless I
use =assoc= and =cdr= instead:

#+begin_src emacs-lisp :results verbatim :session exc
(alist-get "TITLE" exc-alist)
#+end_src

#+RESULTS:
: nil

#+begin_src emacs-lisp :results verbatim :session exc
(cdr (assoc "TITLE" exc-alist))
#+end_src

#+RESULTS:
: "Test post"

I can use =assoc/cdr= well enough.  The bother starts when I need
a default.  It looks like =alist-get= is what I need.

#+begin_src emacs-lisp :results verbatim :session exc
(alist-get "TYPE" exc-alist 'post nil 'string-equal)
#+end_src

#+RESULTS:
: post

This works, but now the code is getting messy. There are two forms of
lookup: the verbose =alist-get= and the brute force =assoc/cdr=.  One
requires ='string-equal=, the other does not.  If I forget the
predicate, the lookup will fail silently.
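
The silent failure seems to come from the default test: =alist-get=
compares keys with =eq= unless a TESTFN is passed, while =assoc=
defaults to =equal=.  A minimal sketch, assuming Emacs 26 or later for
the TESTFN argument; the variable =exc-demo-alist= is a made-up name
used only for illustration:

#+begin_src emacs-lisp
(defvar exc-demo-alist
  (list (cons (copy-sequence "TITLE") "Test post"))
  "Hypothetical alist whose key is a freshly allocated string.")

;; Default test is `eq': a different string object never matches.
(alist-get "TITLE" exc-demo-alist)                  ; => nil
;; Passing a TESTFN makes the comparison content-based.
(alist-get "TITLE" exc-demo-alist nil nil #'equal)  ; => "Test post"
;; `assoc' already defaults to `equal'.
(cdr (assoc "TITLE" exc-demo-alist))                ; => "Test post"
#+end_src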

I could create a wrapper for =alist-get= which uses =string-equal=:

#+begin_src emacs-lisp :results none :session exc
(defun exc-alist-get (key alist &optional default remove)
  "Get value associated with KEY in ALIST using `string-equal'.

See `alist-get' for explanation of DEFAULT and REMOVE."
  (alist-get key alist default remove 'string-equal))
#+end_src

Now my calls are uniform and a bit safer:

#+begin_src emacs-lisp :results verbatim :session exc
(exc-alist-get "TITLE" exc-alist)
#+end_src

#+RESULTS:
: "Test post"

#+begin_src emacs-lisp :results verbatim :session exc
(exc-alist-get "TYPE" exc-alist 'post)
#+end_src

#+RESULTS:
: post

This works, but seems like a smell.  All these problems go
back to strings as keys.  Maybe there's a better way?

I could convert the keys to symbols using =intern=. 

#+begin_src emacs-lisp :results verbatim :session exc
(defun exc-parse-org-meta-data-intern (data)
  "Parse Org formatted meta DATA into an alist.

Keywords are the '#+' options given within an Org file.  These
are things like TITLE, DATE, and FILETAGS.  Keywords are
case-sensitive!  Values are whatever remains on that line."
  (with-temp-buffer
    (insert data)
    (org-element-map (org-element-parse-buffer 'element) 'keyword
      (lambda (x) (cons (intern (org-element-property :key x))
                        (org-element-property :value x))))))

(setq exc-alist-i (exc-parse-org-meta-data-intern exc-post))
exc-alist-i
#+end_src

#+RESULTS:
: ((TITLE . "Test post") (AUTHOR . "Excalamus") (DATE . "2020-07-17") (TAGS . "blogging tests"))

This has several apparent problems.

As I understand it, this would pollute the global obarray. Is that a
real concern?  I know the symbol is only being used as a lookup; the
variable, function, and properties shouldn't change.  Regardless, I
don't want my package to conflict with (i.e. overwrite) a person's
environment unknowingly.

The string may also have characters illegal for use as a symbol. 
Here's what happens with illegal symbol characters in the string.
#+begin_src emacs-lisp :results verbatim :session exc
(setq exc-bad-meta-data
  (concat
   "#+THE TITLE: Test post\n"
   "#+AUTHOR: Excalamus\n"
   "#+DATE: 2020-07-17\n"
   "#+POST TAGS: blogging tests\n"
   "\n"))

(setq exc-alist-i-bad (exc-parse-org-meta-data-intern exc-bad-meta-data))
exc-alist-i-bad
#+end_src

#+RESULTS:
: ((AUTHOR . "Excalamus") (DATE . "2020-07-17"))

How are situations like these best handled?

Dmitry Alexandrov

Jul 19, 2020, 7:23:12 PM
to exca...@tutanota.com, help-gn...@gnu.org
exca...@tutanota.com wrote:
> The string may also have characters illegal for use as a symbol.

Namely?

> Here's what happens with illegal symbol characters in the string.
>
> #+begin_src emacs-lisp :results verbatim :session exc
> (setq exc-bad-meta-data
>   (concat
>    "#+THE TITLE: Test post\n"
>    "#+AUTHOR: Excalamus\n"
>    "#+DATE: 2020-07-17\n"
>    "#+POST TAGS: blogging tests\n"
>    "\n"))
>
> (setq exc-alist-i-bad (exc-parse-org-meta-data-intern exc-bad-meta-data))
> exc-alist-i-bad
> #+end_src
>
> #+RESULTS:
> : ((AUTHOR . "Excalamus") (DATE . "2020-07-17"))
>
> How are situations like these best handled?

You mean space? Space is a perfectly valid character for a symbol.

I suppose the result above is due to space being an invalid character in Org mode keywords. ;-)
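
This is easy to check.  As a small sketch, =intern= accepts a name
containing a space, and the printer merely escapes the space with a
backslash:

#+begin_src emacs-lisp
(let ((sym (intern "THE TITLE")))
  (list (symbolp sym)             ; it really is a symbol
        (symbol-name sym)         ; the name keeps its space
        (prin1-to-string sym)))   ; printed form escapes the space
;; => (t "THE TITLE" "THE\\ TITLE")
#+end_src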

to...@tuxteam.de

Jul 20, 2020, 5:01:51 AM
to help-gn...@gnu.org
On Sun, Jul 19, 2020 at 06:23:52PM +0200, excalamus--- via Users list for the GNU Emacs text editor wrote:
> Some questions about alists:
>
> - Is it a better practice to convert string keys to symbols?

It depends. Strings have an "inner life", i.e. are sequences
of characters, symbols are atomic and have no innards (but
see below).

So if you just want to know whether two keys are equal or not,
symbols are the more appropriate choice: it'll be faster, too;
if you find yourself asking whether one key is "greater" (that'd
be lexicographically, I guess) or "less" than another, or whether
it has such-and-such a prefix, you'd rather want a string.

The borders are somewhat fuzzy, since it's possible to extract
the string representation of a symbol. In Emacs Lisp they are
even fuzzier, since you can treat, given the right context, a
symbol as a string. This works for Emacs Lisp:

(string< 'boo "far")
=> t

Emacs Lisp converts 'boo to "boo" and compares the strings
lexicographically.

* Different equalities:

What you have to bear in mind is that there are different measures
of equality: if you are comparing just the "objects" (if you come
from C, that's --basically-- the objects' addresses), you use =eq=.
In that case, asking for "greater" or "less" doesn't make much sense.

If you are comparing the objects' "innards", you use =equal=.
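
A small sketch of the two notions of equality, using a copied string
so that the two string objects are guaranteed to be distinct:

#+begin_src emacs-lisp
(let ((a "TITLE")
      (b (copy-sequence "TITLE")))
  (list (eq a b)                                  ; different objects
        (equal a b)                               ; same contents
        (eq (intern "TITLE") (intern "TITLE"))))  ; same interned symbol
;; => (nil t t)
#+end_src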

> Is =intern= best for this?  What about handling illegal symbol names?

Yes. And... there are few, if any, illegal symbol names. Try

(intern ".(")

It works. It's a funny symbol, but who cares ;-)

> - If a symbol is used as a key and that symbol is already in use
>   elsewhere, is there potential for conflict with the existing symbol?

No. Interning something gives you an address (well, there's a type
tag attached to it). If it's used somewhere else, it'll reuse that,
otherwise, a new symbol is created. Since a symbol's identity never
changes, and an alist lookup only compares identities, you don't care.

[...]

> Notice that the keys are strings.  This means that they require
> an equality predicate like ='string-equal= to retrieve unless I use
> =assoc= and =cdr=:

They only require it because you want them compared _as strings_. Had
you put symbols in there, then you could have used =eq= as comparison,
which is the default (so you can leave it out).
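
As a sketch of what that looks like with symbol keys (the alist
literal here is only an illustration modelled on the example data):

#+begin_src emacs-lisp
(let ((meta '((TITLE . "Test post") (AUTHOR . "Excalamus"))))
  (list (alist-get 'TITLE meta)          ; found with the default `eq' test
        (alist-get 'TYPE meta 'post)))   ; missing key falls back to DEFAULT
;; => ("Test post" post)
#+end_src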

[...]

> This works, but now the code is getting messy. There are two forms of
> lookup: the verbose =alist-get= and the brute force =assoc/cdr=.  One
> requires ='string-equal=, the other does not.  If I forget the
> predicate, the lookup will fail silently.

"fail silently" meaning that it's looking for the wrong thing in your
assoc list and not finding it.

> I could convert the keys to symbols using =intern=. 

All that said, I'd think you go with this... unless you find yourself
looking at the innards of your keys too often (extracting prefixes,
doing case-insensitive search, that kind of thing). Remember that
=eq= is just one comparison (address, basically), whereas =equal=
has to first dereference the string and then compare character by
character.

Your keywords are a choice from a limited set, and are immutable,
so to me, they /look/ like symbols. That seems to be the fitting
representation.

> This has several apparent problems.
>
> As I understand it, this would pollute the global obarray. Is that a
> real concern?

Shouldn't be. The global obarray is built for this.

> [...]  Regardless, I
> don't want my package to conflict with (i.e. overwrite) a person's
> environment unknowingly.

It won't. The obarray just maps a string to some immutable thingy
(basically a pointer with some decorations). This thingy can be
used for many things in different contexts. If some package out
there, say =shiny-widgets.el= binds some variable to the symbol
named "THE TITLE", that won't interfere with your usage. You just
happen to both use the symbol =0xdeadbef-plus-some-type-tags=
(which points to the symbol "THE TITLE" in the obarray) for
different things.
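
A rough sketch of that independence.  The =set= call stands in for a
hypothetical other package giving the same symbol a global value; the
alist lookup is unaffected because it only compares symbol identity:

#+begin_src emacs-lisp
(let* ((key (intern "THE TITLE"))
       (meta (list (cons key "Test post"))))
  ;; Pretend another package gives the same symbol a global value.
  (set key 42)
  (list (alist-get key meta)     ; alist lookup only uses symbol identity
        (symbol-value key)))     ; the value cell is a separate matter
;; => ("Test post" 42)
#+end_src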

>
> The string may also have characters illegal for use as a symbol. 
> Here's what happens with illegal symbol characters in the string.
> #+begin_src emacs-lisp :results verbatim :session exc
> (setq exc-bad-meta-data
>   (concat
>    "#+THE TITLE: Test post\n"
>    "#+AUTHOR: Excalamus\n"
>    "#+DATE: 2020-07-17\n"
>    "#+POST TAGS: blogging tests\n"
>    "\n"))
>
> (setq exc-alist-i-bad (exc-parse-org-meta-data-intern exc-bad-meta-data))

I haven't had a look at your code, but "THE TITLE" interns fine as a
symbol here.

The important thing is that you make a choice and stick consistently
to it. That includes being aware of the comparison functions used.

Cheers
-- t

Steve G

Mar 30, 2022, 4:01:59 PM
exca...@tutanota.com writes:

> Some questions about alists:
> - Is it a better practice to convert string keys to symbols?

I would recommend keeping the data the way it is. Strings are great;
Emacs has hash tables that will work well for strings.

>   Is =intern= best for this? 

No. `read-from-string' will do better. You don't want numbers and commas
in your symbols.
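
One thing worth checking before relying on this: `read-from-string'
runs the Lisp reader, so it stops after the first complete token and
returns numbers as numbers rather than as symbols.  A small sketch of
how that differs from `intern':

#+begin_src emacs-lisp
(list (car (read-from-string "THE TITLE"))   ; reader stops after THE
      (car (read-from-string "2020"))        ; a number, not a symbol
      (symbolp (intern "2020")))             ; `intern' always gives a symbol
;; => (THE 2020 t)
#+end_src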

> What about handling illegal symbol names?

The function `make-symbol' will handle this for you.
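
A caveat to keep in mind, shown as a sketch: `make-symbol' returns a
fresh uninterned symbol on every call, so two symbols made from the
same name are not `eq' and will not match each other as alist keys.

#+begin_src emacs-lisp
(list (eq (make-symbol "TITLE") (make-symbol "TITLE"))  ; distinct objects
      (eq (make-symbol "TITLE") 'TITLE)                 ; not the interned one
      (eq (intern "TITLE") 'TITLE))                     ; `intern' is stable
;; => (nil nil t)
#+end_src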

> - If a symbol is used as a key and that symbol is already in use
>   elsewhere, is there potential for conflict with the existing symbol?

Oh yes. In Common Lisp the convention for global variables is to put `*'
around the symbol name, as in *gensym-counter*. In Emacs Lisp, it is
considered bad form to use such decorations because they mess up the
minibuffer, etc.

> I have an alist created from parsing meta data from a file.  The file
> looks like:
>
> #+begin_src emacs-lisp :results verbatim :session exc

[ ... ]

I do not know why you are using the hash `#'?

> (defvar exc-post-meta-data
>   (concat
>    "#+TITLE: Test post\n"
>    "#+AUTHOR: Excalamus\n"
>    "#+DATE: 2020-07-17\n"
>    "#+TAGS: blogging tests\n"
>    "\n")
>   "Sample post meta information.")

You could do:

(defvar executable-find '((:TITLE . "Test Post") (:AUTHOR . "Name")))

> This works, but seems like a smell.  All these problems go
> back to strings as keys.  Maybe there's a better way?
>
> I could convert the keys to symbols using =intern=. 

I would try hash tables for strings. See below.
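
A minimal sketch of the hash table approach, assuming the same keys
as in the example data.  The table uses `equal' as its test so that
string keys compare by content:

#+begin_src emacs-lisp
(let ((meta (make-hash-table :test 'equal)))
  (puthash "TITLE" "Test post" meta)
  (puthash "AUTHOR" "Excalamus" meta)
  (list (gethash "TITLE" meta)
        (gethash "TYPE" meta 'post)))   ; third argument is the default
;; => ("Test post" post)
#+end_src

With this, no wrapper around `alist-get' is needed, since `gethash'
takes the default value directly.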

>
> #+RESULTS:
> : ((TITLE . "Test post") (AUTHOR . "Excalamus") (DATE . "2020-07-17") (TAGS . "blogging tests"))
>
> This has several apparent problems.

> As I understand it, this would pollute the global obarray. Is that a
> real concern?  I know the symbol is only being used as a lookup;

Yes it is.

> the variable, function, and properties shouldn't change. 

This is the key. You seem to understand it. In Emacs Lisp there are no
duplicate interned symbols. Two symbols are compared by their address (or
pointer) with `eq', meaning that they are the same thing.

If you are re-using symbols, i.e., if the set of symbols is finite, then
using them should not be a problem. Your case is like that: TITLE, AUTHOR,
and so on are good candidates for symbols. Parsing an email with each word
as a symbol can cause serious problems with the obarray.

However you could intern the symbols into a different obarray.
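
A sketch of what that might look like, assuming Emacs 25 or later for
`obarray-make'.  The name `exc-obarray' is hypothetical.  Note that a
symbol interned in a private obarray is not `eq' to the symbol of the
same name that the reader produces, so lookups have to go through the
same obarray:

#+begin_src emacs-lisp
(require 'obarray)

(let* ((exc-obarray (obarray-make))            ; hypothetical private obarray
       (key (intern "TITLE" exc-obarray))
       (meta (list (cons key "Test post"))))
  (list (eq key 'TITLE)                                   ; not the global symbol
        (alist-get (intern-soft "TITLE" exc-obarray) meta)
        (alist-get 'TITLE meta)))                         ; global symbol misses
;; => (nil "Test post" nil)
#+end_src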

The biggest difference will be at run time. You have to build a hash table
at run time (to put it simply). With an alist of symbols, Emacs will
already place the symbols into its obarray without the extra code.

> Regardless, I
> don't want my package to conflict with (i.e. overwrite) a person's
> environment unknowingly.

In Common Lisp there are namespaces (packages), but the problem is the same.
Partitioning code is not my specialty.

> The string may also have characters illegal for use as a symbol. 
> Here's what happens with illegal symbol characters in the string.
>
> #+begin_src emacs-lisp :results verbatim :session exc
> (setq exc-bad-meta-data
>   (concat
>    "#+THE TITLE: Test post\n"
>    "#+AUTHOR: Excalamus\n"
>    "#+DATE: 2020-07-17\n"
>    "#+POST TAGS: blogging tests\n"
>    "\n"))
>
> (setq exc-alist-i-bad (exc-parse-org-meta-data-intern exc-bad-meta-data))
> exc-alist-i-bad
> #+end_src
>
> #+RESULTS:
> : ((AUTHOR . "Excalamus") (DATE . "2020-07-17"))
>
> How are situations like these best handled?


I usually use sed, awk, or perl for the input. I write a shell script to
create a simple alist or a file with lines of strings.

I wish there was a way to write a hashtable to output.
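
For what it is worth, Emacs hash tables do have a printed
representation that the reader accepts (the `#s(hash-table ...)'
syntax), so a table can be written out and read back.  A minimal
round-trip sketch, going through a string rather than a file:

#+begin_src emacs-lisp
(let ((meta (make-hash-table :test 'equal)))
  (puthash "TITLE" "Test post" meta)
  (let* ((text (prin1-to-string meta))          ; printed form, "#s(hash-table ...)"
         (copy (car (read-from-string text))))  ; read it back as a new table
    (list (hash-table-p copy)
          (gethash "TITLE" copy))))
;; => (t "Test post")
#+end_src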