> * Barry Margolin <one...@nyhz.zvg.rqh> [2012-09-20 14:22:30 -0400]:
>> > * proton <yrbfnen...@tznvy.pbz> [2012-09-20 06:13:52 -0700]:
>> > I have a huge amount of text (more than 2 TB) that I want to process
>> > My question is: what is the most efficient way to treat the words, as
>> It all depends on what you do with the text.
>> 1. Space: a symbol is more expensive than a string (see Barry's
> He said that he was going to put the strings in a hash table, so there
Of course.
>> 2. Strings are compared character-by-character (EQUAL) while
>> symbols are compared as pointers (EQ). This could be big.
> But INTERN has to compare by character when it's looking the symbol up
Of course.
>> 3. You will be associating some information with each word, right? Use
>> symbols and put the information into the value slot; you will save huge
>> on access time compared with hash tables.
> How do you think INTERN finds the symbol? The package is mostly just
Of course.
>> 4. Reading a symbol is more expensive than reading a string because you
>> have to intern it. If you do a lot of i/o but little processing, symbols
>> are not for you.
> But if he's going to look up the string in a hash table, that's
Of course - but suppose he ignores the strings shorter than, say, 2
characters? To avoid auto-interning such strings by READ, he would have
to read them as strings and then decide whether to intern them.
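
A minimal sketch of that "read as a string, then decide" step. The package
name WORDS, the function MAYBE-INTERN-WORD and the MIN-LENGTH keyword are
made up for illustration; only DEFPACKAGE and INTERN are standard.

```lisp
;; Hypothetical package that holds nothing but the word symbols.
(defpackage :words (:use))

(defun maybe-intern-word (word &key (min-length 2))
  "Return the symbol named WORD in the WORDS package,
or NIL when WORD is shorter than MIN-LENGTH (nothing is interned then)."
  (when (>= (length word) min-length)
    ;; INTERN returns two values; keep only the symbol.
    (values (intern word :words))))
```

Because the word arrives as a string (e.g. from READ-LINE plus splitting)
rather than through READ, nothing is interned until this function says so.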
My point was that symbols would be more _convenient_.
Technically, he would use either
  string --[hash table]--> data structure
or
  string --[package]--> symbol --> data structure in the symbol value slot
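
For concreteness, the two lookup paths might look roughly like this. It is
only a sketch: the names *WORD-TABLE*, CORPUS-WORDS, COUNT-WORD/TABLE and
COUNT-WORD/SYMBOL are invented here, and a plain occurrence count stands in
for whatever data structure would really be attached to each word.

```lisp
;; Hypothetical package used only as a string -> symbol index.
(defpackage :corpus-words (:use))

;; Path 1: string --[hash table]--> data
(defvar *word-table* (make-hash-table :test #'equal))

(defun count-word/table (word)
  "Increment and return the count stored for WORD in *WORD-TABLE*."
  (incf (gethash word *word-table* 0)))

;; Path 2: string --[package]--> symbol --> value slot
(defun count-word/symbol (word)
  "Increment and return the count stored in the value slot of WORD's symbol."
  (let ((sym (intern word :corpus-words)))
    (if (boundp sym)
        (incf (symbol-value sym))
        (setf (symbol-value sym) 1))))
```

Both paths hash the string once per lookup; with the second, later code can
hold on to the symbol and reach the data through SYMBOL-VALUE directly.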
The second is, IMO, easier on the coding load: there would be less need
Also, using symbols will prevent him from a common error of multiple
At any rate, I agree that the difference is more