Python's append vs Racket's append and helping novices understand the implications

289 views
Skip to first unread message

Alex Harsanyi

unread,
Feb 2, 2019, 1:28:12 AM2/2/19
to Racket Users
Someone asked recently for help on Reddit[1] with a Racket performance issue.
The problem was they they were constructing a large list by appending many
short lists repeatedly; their code was calling `(set!  result (append result
shortList))` in a loop and this was slow (unsurprisingly.)

While trying to help them out, it occurred to me that this person was perhaps
translating a program from Python to Racket, maybe to evaluate Racket.  The
problem is that list-append operations are efficient in Python, but the
natural corresponding choice in Racket, the `append` function, is not.  I
wonder how many people are in a similar situation, where they try to convert a
Python program to Racket, see that the performance is bad, and conclude that
Racket is slow -- Every time Racket is mentioned on Reddit or HN there is at
least one person mentioning that Racket is slow and sadly they may even have
their own data to prove it.

Given the recent discussion in this group about promoting Racket, I am
wondering what can we do to help this category of people?  These might be
persons who never ask for help in any forum, after all the Racket
documentation is good enough to help anyone who is willing to read it.

One improvement that I can think of is to add a performance description to
each function that operates on the basic data structures (lists, vectors,
hash-tables)

What do others think?
Alex.

[1]: https://www.reddit.com/r/Racket/comments/am5r2w/how_to_read_a_file_linebyline_efficiently/

travis.h...@gmail.com

unread,
Feb 2, 2019, 2:14:01 AM2/2/19
to Racket Users
I don't have any suggestions for you but I agree that it is an important issue. My programming experience is primarily confined to R but I've recently started tinkering with Racket. My primary interest in Racket is to expand my programming horizons but I also see the potential to use it at work as replacement for R for building simulation models. The idea that Racket could replace R for those tasks is based on the expectation that Racket would be faster than R (relatively low bar for Racket to clear) but similarly expressive. However, in my 10+ years as an R programmer, there is a lot of hard-won knowledge of performance traps to avoid and optimization tricks to try. As I move forward learning Racket, it will be interesting to see how my naive Racket code stacks up to my better optimized R code. I personally will very much appreciate any efforts targeted towards making performance issues more transparent to Racket beginners.

Thanks,

Travis

Neil Van Dyke

unread,
Feb 2, 2019, 3:17:58 AM2/2/19
to Racket Users
Alex Harsanyi wrote on 2/2/19 1:28 AM:
> One improvement that I can think of is to add a performance
> description to each function that operates on the basic data
> structures (lists, vectors, hash-tables)

Before complete coverage, perhaps two first steps:

* Try to think of the minority of procedures (and use cases?) that might
be *surprisingly* expensive (either to anyone, or to users coming from
some other languages), and focus first on the documentation for those. 
`append` is likely one, for people who don't yet understand lists well,
or who aren't aware of more efficient patterns/idioms.  `list?` (or
`listp` elsewhere in our Lisp family) is one that bit me and others,
somehow not realizing it's potentially O(n) [1] (though apparently
Racket now optimizes some cases of that, for immutable pairs[2]).  Those
are the first two that come to mind, but I assume someone would find
others by looking methodically through the documentation.

* Exposing lists based on pairs means programmers must understand the
list model (list is not an opaque abstract data type), and it'll help
them if they know some patterns for working with the model. Teach every
Racketeer some old-school list processing.[3][4]  Maybe teach this
before `for`-something (before they think "Aha!  Now I know the Racket
way, which is pretty much the same as in many other languages I know!",
and then they consider the finer points of lists and recursion to be
esoteric stuff that maybe they don't learn well, soon).


[1] The current documentation even hides the recursive mathematical
definition of a list in innocent-looking text, trying to trick users
into the classic mistake! :)

[2] It looks like Racket C code now sometimes stores a bit with
immutable pairs to cache list-ness, though the documentation's exact
characterization of the performance of this isn't obvious from the C
function alone.  Also, I don't see where in that function the
PAIR_IS_NON_LIST flag is checked, for optimizing those cases. (BTW,

[3] https://m.xkcd.com/297/

[4] One day, I got all excited by Roombas, and accidentally typed an
alist primer.
https://www.neilvandyke.org/racket/roomba/#%28part._.Association._.List._.Primer%29

Thomas F. Burdick

unread,
Feb 2, 2019, 7:57:15 AM2/2/19
to Alex Harsanyi, Racket Users


On February 2, 2019 7:28:12 AM GMT+01:00, Alex Harsanyi <alexha...@gmail.com> wrote:
>Someone asked recently for help on Reddit[1] with a Racket performance
>issue.
>The problem was they they were constructing a large list by appending
>many
>short lists repeatedly; their code was calling `(set! result (append
>result
>shortList))` in a loop and this was slow (unsurprisingly.)

In addition to the general list-processing advice they were likely given, I suspect someone coming from Python might appreciate tconc structures. SRFI 117 is one attempt at fleshing this out into a contemporary API. But just classic tconc and lconc are pretty useful to know and have at your fingertips.

-Thomas

Robby Findler

unread,
Feb 2, 2019, 9:38:57 AM2/2/19
to Alex Harsanyi, Racket Users
Lists seem like a common pitfall here, due to the overlap in terminology but not functionality/performance. Maybe the right thing is to add a library to data/<something> that is the python list data structure and point to it from the list documentation?

Robby 

--
You received this message because you are subscribed to the Google Groups "Racket Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to racket-users...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Sam Tobin-Hochstadt

unread,
Feb 2, 2019, 9:45:29 AM2/2/19
to Robby Findler, Alex Harsanyi, Racket Users
Fortunately there are already libraries with good performance on these
operations for Racket, so we could point to the data/ralist library,
for example.

Sam

Robby Findler

unread,
Feb 2, 2019, 9:46:31 AM2/2/19
to Sam Tobin-Hochstadt, Alex Harsanyi, Racket Users
Oh, right! Mentioning python in the list docs seems like it might help. 

Robby

Sam Tobin-Hochstadt

unread,
Feb 2, 2019, 10:22:24 AM2/2/19
to Robby Findler, Alex Harsanyi, Racket Users
I agree with that 100%. 

George Neuner

unread,
Feb 2, 2019, 10:23:27 AM2/2/19
to Robby Findler, racket users

On 2/2/2019 9:38 AM, Robby Findler wrote:
> Lists seem like a common pitfall here, due to the overlap in
> terminology but not functionality/performance. Maybe the right thing
> is to add a library to data/<something> that is the python list data
> structure and point to it from the list documentation?

The problem is that a Python "list" isn't a list at all - it's an
extensible vector of pointers.  Perhaps the right thing is to educate
Pythonistas that they really haven't been using lists.

I know nuances are lost on most people, but there's a huge difference
between a list and a vector.

YMMV,
George

Robby Findler

unread,
Feb 2, 2019, 10:30:30 AM2/2/19
to George Neuner, racket users
On Sat, Feb 2, 2019 at 9:23 AM George Neuner <gneu...@comcast.net> wrote:
>
>
> On 2/2/2019 9:38 AM, Robby Findler wrote:
> > Lists seem like a common pitfall here, due to the overlap in
> > terminology but not functionality/performance. Maybe the right thing
> > is to add a library to data/<something> that is the python list data
> > structure and point to it from the list documentation?
>
> The problem is that a Python "list" isn't a list at all - it's an
> extensible vector of pointers. Perhaps the right thing is to educate
> Pythonistas that they really haven't been using lists.

Although eduction is certainly a good goal(!), that seems like the
long path to the helpful result for us. :)

Robby

Philip McGrath

unread,
Feb 2, 2019, 2:37:51 PM2/2/19
to Robby Findler, George Neuner, racket users
I am certain people coming from Python are confused by this, since I was bitten by this very difference when I had to write some Python for the first time in a while. (What do you mean append has side-effects?!?)

-Philip


Matthias Felleisen

unread,
Feb 2, 2019, 6:00:10 PM2/2/19
to racket users


Racket needs *you*. Please.

The proper approach is to have short pages for different language immigration groups: Python and R come to mind as obvious examples but I am sure there are others.

What I mean is we need help and *you* can help. Let me explain it with the Python example:

1. Set up a page (wiki?) called “From Python to Racket”

2. Create two sections that are immediately visible from the top:

— idioms
— performance pitfalls

3. In the specific case of Python, the second subsection needs to start with a subsection on

— Python Lists aren’t Racket Lists
— then point to data/ralis and show how to transliterate the loop/append example like this
— optionally also show the more native Racket idiom

4. When anyone observers another blog/social media/whatever post on Racket is slow because I come from Python,

(a) point the posters to the page or
(b) if it is a new case, write a section for this example then do (a)


If you want to help advertise Racket to others, this is an excellent way of helping out.

Thanks — Matthias

[[ p.s. For my very first Python program (a couple of days before meeting with GvR), I used Python’s append and was annoyed beyond belief. ]]


Alex Harsanyi

unread,
Feb 2, 2019, 11:37:08 PM2/2/19
to Racket Users

I put together some notes about available data structures in Racket, with some performance considerations.  It needs more work, but perhaps it can be used as a starting point and it can be added to the Racket wiki, if/when others consider it adequate:


I didn't write a "Python to Racket" guide, because I don't really know enough about Python to write such a document, and I also think that a more generic document is simpler to maintain and can be used by people who come from other languages as well.

I also tried to keep the document short, the aim being to provide a competent programmer who is new to Racket with a 5 minute overview to its data structures and some links to the starting points in the documentation.  We can add things to it, but I think it is better to keep it short rather than comprehensive in this case -- after all, there is the Racket Guide and Racket Reference and these documents contain all the details.  Perhaps new documents can be added to the wiki, exploring other topics in more detail.

I did not mention `ralist` because (1) I have no experience with it, but more importantly (2) the package is not part of the Racket distribution and has to be installed separately.  I don't it reflects well on Racket if we tell people to install a separate package if they want an efficient container...  I have no experience with `ralist`, but if it is indeed a good data structure and it has a potentially wide usage, it should be included in the default Racket installation.

Alex.

Jens Axel Søgaard

unread,
Feb 3, 2019, 6:55:09 AM2/3/19
to Alex Harsanyi, Racket Users

--
You received this message because you are subscribed to the Google Groups "Racket Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to racket-users...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


--
--
Jens Axel Søgaard

Zelphir Kaltstahl

unread,
Feb 3, 2019, 6:57:31 AM2/3/19
to racket...@googlegroups.com
I think having such migration pages for people coming from other
languages is a good idea. Maybe also should include basic programming
knowledge such as "a single linked list is not a vector", but then where
to begin? If such page says that some operations are slower on Racket
lists (append), then best to put also an explanation why this is not
necessarily a bad thing (because the lists are used in a certain way,
where such operations are avoided mostly and otherwise we use
differently named data structures) or in what situations some operations
are faster. Otherwise people might simply go and think: "Oh my, why
don't they simply use lists like in Python?" The page needs to tell
people: "What you found in Python, you also have in Racket, just use
data structure xyz."


Robby Findler

unread,
Feb 3, 2019, 8:21:12 AM2/3/19
to Alex Harsanyi, Racket Users
This is very nice!

Robby

Laurent

unread,
Feb 3, 2019, 10:12:33 AM2/3/19
to Matthias Felleisen, racket users
Now you just peaked my curiosity!

When was that and what was the outcome of this meeting?

Matthias Felleisen

unread,
Feb 3, 2019, 10:40:17 AM2/3/19
to Laurent, racket users


On Feb 3, 2019, at 10:12 AM, Laurent <laurent...@gmail.com> wrote:

When was that and what was the outcome of this meeting?


Nothing. It was a waste of my time. — Matthias

Laurent

unread,
Feb 3, 2019, 11:08:15 AM2/3/19
to Matthias Felleisen, racket users
Ouch!! :-D

Matthias Felleisen

unread,
Feb 3, 2019, 4:35:13 PM2/3/19
to Alex Harsanyi, Racket Users

1. I think this is a great start for a generic introduction to data structures. Someone should integrate Jens’s short table: 



2. I think language-to-language documents serve a different role, but your document could be cited from there. 

The point of say “From Python to Racket” would be to show how comprehensions translate or how classes work 1-1. And yes, it would also explain that Racket calls something a list that does __not___ at all correspond to a list. 

The corresponding Java write-up would be quite different again. In that case, we would be dealing with people who might not know more than classes and methods. But they might actually know proper design and might know that it calls for recursion (hidden in classes and interfaces). In Racket, that works even better than in Java. Plus it would need to say something brief about types. 

And R would be an entirely different story. 

— Matthias





Robby Findler

unread,
Feb 3, 2019, 4:43:37 PM2/3/19
to Matthias Felleisen, Alex Harsanyi, Racket Users
It seems like a great addition to the performance section of the guide. 

Robby 

Matthias Felleisen

unread,
Feb 3, 2019, 5:38:08 PM2/3/19
to Robby Findler, Alex Harsanyi, Racket Users

Agreed! 

Neil Van Dyke

unread,
Feb 3, 2019, 8:29:38 PM2/3/19
to Racket Users
Recent Python experience that might inform Racket docs...

I'm recently dusting off my Python skills[1], while switching to Python
3, so I'm checking documentation for every little thing.

It was a little surprising to me, that the current official Language and
Standard Library references did not seem very intuitive, compared to
those of Racket.  The short version is that some of the fragmenting of
topics seemed unnecessary, and particularly disruptive (e.g.,
documentation would just end before talking about something that had to
be there, so you'd confirm that you were in the most plausible manual
and section, then you'd go and start opening links from what you just
read in tabs).[2]

To try to inform Racket docs from this inspiration/itch, I'm aware that
there's very limited resources, but, when possible, I'd like to suggest:

* As I've mentioned before, I think it would be good if the Reference
and Guide were consolidated.  I still find the distinction between the
two counterproductive, when I go to read up on something, and need to
keep checking that I'm not missing half of the key information,
including what I need.  I think this part of the core is a closed world,
and small enough, for one manual, and I don't understand the schism.

* Consider which of the separate core manuals should be incorporated
into the main manual, rather than buried somewhere in the huge "wall of
manuals" list on the documentation page, possibly only discovered
through haphazard search.  Especially if they're key/central.  (Or,
alternatively, break up everything, so at least it's more consistent,
and we always go to the wall of manuals list for any topic.  But
semi-random fragmentation of now-central core stuff due to evolution or
historical accident would seem to cost us every time someone needs that
information.)

* Going forward, with new core documentation being written, try to put
it in the best place for discovery and core-ness in context (perhaps
inline as a section of topical material in core manual, or maybe it's a
chapter of the core manual, or maybe it's a big thing that's part of
core but really a fringe topic that should be its own doc, or maybe it's
not core).

* Tutorials... The main core manual can have tutorial-like discussions
like the current Guide does, in with the topics.  We also want to
encourage tutorials that are purely redundant with the core manuals in
the core-specific information they contain, and these can be separate
and developed in a very distributed community way.  (And, in the
development of each tutorial, maybe there's a fuzzy sense of the
distinction between "this part really belongs in the core manual --
probably lots of everyone wants to see this when they're looking at that
topical chapter" and "the rest of this narrative is redundant with the
core manual, primarily an alternative pedagogic approach for a
particular audience or side topic, and should be an independent
tutorial").  I'll group "cheat sheet for moving from language X to
Racket" in with tutorials, and suggest that they should also be
redundant with the core manuals in the core-specific info they contain.

* I'd lean towards putting noteworthy performance notes inline with the
respective topics.  For example, the contiguous sections that contain
all documentation about hashes (the keyed collection kind) also
characterize the performance, and might link to alternatives in this
discussion.  And maybe so very briefly mention that these are used
similar to particular abstractions in super-popular languages. I
appreciate one reason not to mention performance in the text that a new
Racketeer will be looking at is pedagogic (people get tripped-up
overthinking performance before they've learned basics and idioms), but
I think we can say a hash get is O(whatever) without crippling newbies
with analysis paralysis.  However, broader performance optimization
strategies and tools not specific to, say, hashes, would be its own
chapter, of course.  (If someone developed a rigorous cost model, even
that (the eventual Racket engineering documentation, not any separate
academic paper, nor any tool support) might be inline with the canonical
topical text, rather than its own separate chapter or manual; I'm not sure.)

Again, resources are tight, and, if a lot of this is not possible even
if you agree with it, then maybe just consider some gist of these
suggestions when further work is happening anyway?


[1] Doing Python even for things for which Racket would work even
better, partly for technical pragmatic reasons (learning ML tools that
use it for examples, and for a little server tool with minimal
dependencies), and partly because I briefly considered playing along
with FAANG hazing of principal&senior-level engineer applicants.  ("For
the first new-college-grad step, I need to see whether you can code." 
"Uh, you have my resume, and I know you guys have heard of open source. 
Is this inflexibility, or institutionalized negging?"  "You're pretty;
you shouldn't be self-conscious about that facial birthmark I notice." 
"I'm sure we have collegial mutual respect, and aligned goals to find
the right match, so perhaps we could focus on--"  "Dance for my
amusement, worm!" :)

[2] I'm not talking about OO class library doc organization, which
javadoc quickly taught everyone to navigate, if we hadn't learned it
from Smalltalk browsers or C++ library docs already.  javadoc was
imperfect, but I don't recall as much *additional* fragmentation of core
documentation then as I'm seeing in some places now.

Gustavo Massaccesi

unread,
Feb 4, 2019, 11:57:10 AM2/4/19
to Racket Users, Alex Harsanyi, Robby Findler, Matthias Felleisen
For some reason, the Racket vs Python performance is a question that arise from time to time in Hacke News (like 2 or 3 times per year). Last time I asked if it was possible to add a direct comparison in The Computer Language Benchmarks Game and the owner/maintainer added that page. The results are in https://benchmarksgame-team.pages.debian.net/benchmarksgame/faster/racket-python3.html [Standard disclaimer: The results change from program to program, so take this comparison only as a guide.]

A short version of the results is that:
* in 6 of the programs Racket is much faster
* in 1 of the programs Racket is faster
* in 2 of the programs there is almost a tie
* in 1 of the programs Racket is slower

Some programs in Racket need parallelization, so the results may improve in the future. [I didn't look too carefully at the programs in Python.]

My unofficial takeaway, not completely backed by the benchmarks, is that
* For numeric programs Racket is (may be) 5 to 20 times faster than Python
* For programs with too many stings and hashes, the speed is almost the same.



Also, there was an old thread about translating the classic Norvig's spelling correction from Python to Racket, without changing the structure of the implementation too much and using idiomatic code.
https://groups.google.com/forum/#!topic/racket-users/u0Ua1kTUSKw After a few attempts, the final version in Racket was a 30% slower. I didn't follow this too much after the discussion, so I don't know if there is a smaller or bigger difference using the current versions. (Also, I think we never tried to make is fast as possible, with the same algorithms, but changing the implementation as much as it was necessary.)

Gustavo

Matthias Felleisen

unread,
Feb 4, 2019, 1:34:55 PM2/4/19
to Gustavo Massaccesi, Racket Users, Alex Harsanyi, Robby Findler

Great. Let’s include this link in Alex’s write up as an example of a concrete comparison. Even if such simple benchmarks don’t reflect daily, end-to-end programs, they can help people by dispelling some prejudices. Thanks — Matthias

Claes Wallin (韋嘉誠)

unread,
Feb 11, 2019, 11:04:18 PM2/11/19
to racket users
For anyone creating such a web page for Python to Racket specifically,
there is probably a great deal of inspiration, and reminders of
stumbling blocks, to be found in Arne Babenhauserheide's
https://www.draketo.de/py2guile book (available online for free) about
going from Python to Guile Scheme.

--
/c

Matthias Felleisen

unread,
Feb 12, 2019, 12:20:43 PM2/12/19
to "Claes Wallin (韋嘉誠)", racket users

This is perfect! Thanks — Matthias

Stephen De Gabrielle

unread,
Feb 13, 2019, 3:03:22 PM2/13/19
to Matthias Felleisen, Claes Wallin (韋嘉誠), racket users
I created a DRAFT page on the Racket GitHub wiki:
It links to a 'Choosing a data structure' page https://github.com/racket/racket/wiki/Choosing-a-data-structure

Please edit/delete as you see fit

Kind regards
Stephen

Stephen De Gabrielle

unread,
Feb 13, 2019, 3:12:03 PM2/13/19
to Matthias Felleisen, Claes Wallin (韋嘉誠), racket users
Should there be similar pages for Javascript, Java,C#, C/C++, Ruby, PHP, Visual Basic, Scratch and Haskell?

S.

Matthias Felleisen

unread,
Feb 13, 2019, 4:27:00 PM2/13/19
to Stephen De Gabrielle, "Claes Wallin (韋嘉誠)", racket users

Not until there’s demand. I think we do see demand for Python transfers.

Stephen De Gabrielle

unread,
Feb 13, 2019, 5:35:48 PM2/13/19
to Matthias Felleisen, "Claes Wallin (韋嘉誠)", racket users
Thanks 

I should note that anyone with a GitHub account can edit https://github.com/racket/racket/wiki/Python-to-Racket

S.
--
----

John Clements

unread,
Feb 16, 2019, 4:00:23 PM2/16/19
to Stephen De Gabrielle, Matthias Felleisen, Claes Wallin (韋嘉誠), racket users, David Van Horn
The pointer to RaLists would be much more enticing if we could convince David Van Horn to begin his documentation with a couple of small examples….

John

David Van Horn

unread,
Feb 16, 2019, 5:00:35 PM2/16/19
to John Clements, Claes Wallin (韋嘉誠), Matthias Felleisen, Stephen De Gabrielle, racket users
There are several examples for every function provided by the ralist library.  (And it's almost verbatim what's in the Racket reference for pairs and lists.)

But... I can add some early examples.  No problem.

David

John Clements

unread,
Feb 16, 2019, 5:21:42 PM2/16/19
to David Van Horn, Claes Wallin (韋嘉誠), Matthias Felleisen, Stephen De Gabrielle, racket users
I struggled with whether to send that message… I saw the name, “random access lists”, thought, “hmm, I wonder what that would like”, and clicked on the link. I wound up reading a bit about whether I should use (first impresssion) superficial or in-depth contracts, and ran out of steam pretty quickly. Ultimately, of course, the real issue is that your documentation wasn’t designed to help python programmers jump into their first experience with Racket, but a link to your documentation has essentially just made you an involuntary ambassador.


John

David Van Horn

unread,
Feb 17, 2019, 12:21:32 AM2/17/19
to John Clements, Claes Wallin (韋嘉誠), Matthias Felleisen, Stephen De Gabrielle, racket users
I've updated the docs to get to the point faster with some early examples.


David

Will Jukes

unread,
Feb 17, 2019, 4:33:09 PM2/17/19
to David Van Horn, John Clements, Claes Wallin (韋嘉誠), Matthias Felleisen, Stephen De Gabrielle, racket users
I went ahead and added some info on the difference between Racket and Python appends to https://github.com/racket/racket/wiki/Python-to-Racket . Sorry to whoever for the salvo of poorly documented revisions.
Reply all
Reply to author
Forward
0 new messages