Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Re: Size and length limits for Emacs primitive types and etc data?

12 views
Skip to first unread message

Aurélien Aptel

unread,
Feb 3, 2013, 8:56:34 AM2/3/13
to Oleksandr Gavenko, help-gn...@gnu.org
On Tue, Jan 22, 2013 at 11:06 PM, Oleksandr Gavenko <gave...@gmail.com> wrote:
> Please correct me and answer the questions...

I'm just bumping this thread hoping someone knowledgeable can answer
because I'm also interested :)

Eli Zaretskii

unread,
Feb 3, 2013, 2:16:56 PM2/3/13
to help-gn...@gnu.org
> Date: Sun, 3 Feb 2013 14:56:34 +0100
> From: Aurélien Aptel <aurelien.a...@gmail.com>
> Cc: help-gn...@gnu.org
Everything was already said in the OP. And since no one jumped in to
correct that...


Aurélien Aptel

unread,
Feb 4, 2013, 7:38:23 AM2/4/13
to Eli Zaretskii, help-gn...@gnu.org
On Sun, Feb 3, 2013 at 8:16 PM, Eli Zaretskii <el...@gnu.org> wrote:
> Everything was already said in the OP. And since no one jumped in to
> correct that...

What about the hash table?

Eli Zaretskii

unread,
Feb 4, 2013, 10:57:28 AM2/4/13
to help-gn...@gnu.org
> Date: Mon, 4 Feb 2013 13:38:23 +0100
> From: Aurélien Aptel <aurelien.a...@gmail.com>
> Cc: help-gn...@gnu.org
>
What about it?


Oleksandr Gavenko

unread,
Feb 5, 2013, 4:41:55 AM2/5/13
to help-gn...@gnu.org
In my original post I can't analyse how many memory take hash tables...

And also it is interesting to know how much key-value pairs can hold hash
table.

================================================================

Also interesting thing about elisp that any related data stored in distinct
memory locations (ever any internal structs hold link to actual data, like
string for symbol name), non-together.

I think linked nature of elisp data structure cause very high rate of CPU
cache miss (but I don't actually run any AMD/Intel CPU profilers).

--
Best regards!


Eli Zaretskii

unread,
Feb 5, 2013, 1:14:08 PM2/5/13
to help-gn...@gnu.org
> From: Oleksandr Gavenko <gave...@gmail.com>
> Date: Tue, 05 Feb 2013 11:41:55 +0200
>
> In my original post I can't analyse how many memory take hash tables...

Depends on the size of the table, obviously.

> And also it is interesting to know how much key-value pairs can hold hash
> table.

most-positive-fixnum, I guess. (Why is that interesting?)

> I think linked nature of elisp data structure cause very high rate of CPU
> cache miss (but I don't actually run any AMD/Intel CPU profilers).

Prove it.

Burton Samograd

unread,
Feb 5, 2013, 2:06:35 PM2/5/13
to
Eli Zaretskii <el...@gnu.org> writes:

>> I think linked nature of elisp data structure cause very high rate of CPU
>> cache miss (but I don't actually run any AMD/Intel CPU profilers).
>
> Prove it.

Here's an analysis of cache misses with pointer based data structures
(linked list and trees):

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.25.9669

It's common knowledge that linked lists are not cache friendly, leading
to creation of structures like Unrolled Linked Lists
(http://en.wikipedia.org/wiki/Unrolled_linked_list) to help.

--
Burton Samograd

Oleksandr Gavenko

unread,
Feb 5, 2013, 3:04:56 PM2/5/13
to help-gn...@gnu.org
On 2013-02-05, Burton Samograd wrote:

> Eli Zaretskii <el...@gnu.org> writes:
>
>>> I think linked nature of elisp data structure cause very high rate of CPU
>>> cache miss (but I don't actually run any AMD/Intel CPU profilers).
>>
>> Prove it.
>
I don't familiar with appropriated CPU profilers:

http://developer.amd.com/tools/heterogeneous-computing/amd-codeanalyst-performance-analyzer/
CodeAnalyst Performance Analyzer
http://developer.amd.com/tools/heterogeneous-computing/codexl/
GPU debugging, comprehensive GPU and CPU profiling, and static
OpenCL™ kernel analysis capabilities

http://software.intel.com/en-us/intel-vtune-amplifier-xe
Intel® VTune™ Amplifier XE 2013 is the premier profiler for C,
C++, C#, Fortran, Assembly and Java.

Main reason that it is hard to interpret right what they show (modern CPU have
too complicate execution architecture).

> Here's an analysis of cache misses with pointer based data structures
> (linked list and trees):
>
> http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.25.9669
>
> It's common knowledge that linked lists are not cache friendly, leading
> to creation of structures like Unrolled Linked Lists
> (http://en.wikipedia.org/wiki/Unrolled_linked_list) to help.
>

I work with Isabelle proof assistant (written in ML) on my old laptop with 512
MiB of RAM (in 64-bit userspace). On large theories my laptop froze on disk
(swap) operations. While non-functional programs like Firefox and OpenOffice
operated with minor slowness in IU when out of 512 MiB.

It is possible to say that I run into 100% RAM misses!!

After installing additional memory module (up to 1 GiB) any disk operation go
away on those large theories!

Also interesting result we get when work on optimisation of GOST cipher. It
have permutation table and in order to increase speed we expand this table.
Usually larger precomputation give better performance... but in our case as
soon table out of CPU cache we lose speed in 2 or more times...

Deduce from theoretical positions (look to Burton Samograd link) it is
possible frequently cache misses in Elisp. As I don't know GNU Emacs internals
I ask about Elisp and CPU cache misses...

--
Best regards!


Oleksandr Gavenko

unread,
Feb 5, 2013, 3:17:47 PM2/5/13
to help-gn...@gnu.org
On 2013-02-05, Eli Zaretskii wrote:

>> From: Oleksandr Gavenko <gave...@gmail.com>
>> Date: Tue, 05 Feb 2013 11:41:55 +0200
>>
>> In my original post I can't analyse how many memory take hash tables...
>
> Depends on the size of the table, obviously.
>
I didn't visit CS courses. Does hash table take more then Coef*N memory space
from N keys? Like N*log_2(N) or similar?

Just looking to

http://en.wikipedia.org/wiki/Hash_table

shown that there are exist implementations with N^2 memory requirement...

>> And also it is interesting to know how much key-value pairs can hold hash
>> table.
>
> most-positive-fixnum, I guess. (Why is that interesting?)
>
I am start implementing ASN1 parser in Elisp. ASN1 data format designed to
store any theoretically possible data length (2^128 bytes...). But any
implementation have memory limits. I want to push this limits up to Elisp
abilities and document this in docs... So need concrete numbers...

--
Best regards!


Eli Zaretskii

unread,
Feb 5, 2013, 4:28:39 PM2/5/13
to help-gn...@gnu.org
> From: Burton Samograd <bur...@samograd.ca>
> Date: Tue, 05 Feb 2013 12:06:35 -0700
The relevant thing here is how Emacs Lisp programs fare in this
regard.

Eli Zaretskii

unread,
Feb 5, 2013, 4:35:02 PM2/5/13
to help-gn...@gnu.org
> From: Oleksandr Gavenko <gave...@gmail.com>
> Date: Tue, 05 Feb 2013 22:17:47 +0200
>
> I didn't visit CS courses. Does hash table take more then Coef*N memory space
> from N keys? Like N*log_2(N) or similar?

Look at 'struct Lisp_Hash_Table' definition, it's all there.
Basically, there's a vector of hash values, and for each value a chain
of key-value pairs.

Peter Dyballa

unread,
Feb 5, 2013, 5:25:33 PM2/5/13
to Oleksandr Gavenko, help-gn...@gnu.org

Am 05.02.2013 um 10:41 schrieb Oleksandr Gavenko:

> I think linked nature of elisp data structure cause very high rate of CPU
> cache miss (but I don't actually run any AMD/Intel CPU profilers).

It's also possible that this new hardware is a bit different compared to older ones. At least this is my explanation why on my Mac I see all the time 0.0 % cache hits…

--
Greetings

Pete

There are very few jobs that actually require a penis or vagina. All other jobs should be open to everybody.
– Florynce Kennedy


Stefan Monnier

unread,
Feb 6, 2013, 1:46:01 PM2/6/13
to help-gn...@gnu.org
> shown that there are exist implementations with N^2 memory requirement...

I don't see any obvious N^2 memory use in that page.
All normal implementations I know have a memory use of O(N).
Anything different is probably limited to very special application areas
(perfect hashing, maybe?).

The way hash tables work in the current Emacs implementation is
with a table of size N containing (upto) N objects (and N pointers used
to reference the next element in a chain), plus a bucket table of size
(N / rehash-threshold).


Stefan


0 new messages