[GSoC]use hash/map in POSIX key and Classic notepad

13 views
Skip to first unread message

阿四

unread,
Mar 30, 2012, 10:50:27 AM3/30/12
to rtems...@rtems.org, rtems...@rtems.org
Hi, all. I'm interesing in the hash/map in POSIX key and Classic notepad project[1] and want to apply it as my GSoC this year. I've submitted my proposal at GSoC. Here are some problems I want to figure out, thanks for reply to any of them:

1. First, I want to be sure where the entry point of this project exactly is in the RTEMS. For the notepad, I think RTEMS_API_Control structure, which includes the Notepads array member, is the entry of the Notepad part of this project. And to the POSIX key, the corresponding structure is POSIX_Keys_Control structure, and the key area mentioned in [1] is the Values array member of POSIX_Keys_Control. Am I missing something?

typedef struct {
  rtems_event_set          pending_events;
  rtems_event_set          event_condition;
  ASR_Information          Signal;
  uint32_t                 Notepads[ RTEMS_NUMBER_NOTEPADS ];
}  RTEMS_API_Control;

typedef struct {
   Objects_Control     Object;
   void              (*destructor)( void * );
   void              **Values[ OBJECTS_APIS_LAST + 1 ];
}  POSIX_Keys_Control;

2. I doesn't quite understand the key create procedure.First, I'm not sure what determines the maximum variable at line4 in code below, is it equal to CONFIGURE_MAXIMUM_POSIX_KEYS in configuration? Second, where the code also allocates key memory for OBJECTS_INTERNAL_API, OBJECTS_CLASSIC_API, while in my opinion, only key memory allocation for OBJECTS_POSIX_API is enough. what are other things used for?

typedef enum {
  OBJECTS_NO_API       = 0,
  OBJECTS_INTERNAL_API = 1,
  OBJECTS_CLASSIC_API  = 2,
  OBJECTS_POSIX_API    = 3
} Objects_APIs;

#define OBJECTS_APIS_LAST OBJECTS_POSIX_API


1   for ( the_api = 1; the_api <= OBJECTS_APIS_LAST; the_api++ ) {
2     the_key->Values[ the_api ] = NULL;
3
4     bytes_to_allocate = sizeof( void * ) * (_Objects_Information_table[ the_api ][ 1 ]->maximum + 1);
5     table = _Workspace_Allocate( bytes_to_allocate );
6     if ( !table ) {
7       _POSIX_Keys_Free_memory( the_key );
8
9       _POSIX_Keys_Free( the_key );
10      _Thread_Enable_dispatch();
11     return ENOMEM;
12    }
13
14    the_key->Values[ the_api ] = table;
15   memset( table, '\0', bytes_to_allocate );
16  }

3. What's the general purpose of Notepads in RTEMS? After look at the example in /testsuites/sptests/sp07, I feel it is used for inter-thread communication, am I right? And the same question for POSIX key, since I haven't found much information of it in RTEMS POSIX API User’s Guide, I learn POSIX key knowledge from here[2], does it also apply to RTEMS?

4.As mentioned in [1], std::map can be  an example for this project. I wonder is std::vector also appropriate for this project? Or maybe, several differection algorithms sould be implemented, and leave the choice to user? And I haven't study hash/map algorithm in depth, maybe, I should ask related question after do more homework on this.

5. What's the function naming convention in RTEMS, like _POSIX_Key_Get is an inline function, while _POSIX_Keys_Free_memory is not.


links:
[1]http://www.rtems.com/wiki/index.php/UseHashOrMapInNotepadsAndKeys
[2]http://www.cs.cf.ac.uk/Dave/C/node29.html#SECTION002945000000000000000

Best Regards!
zw_yao

Gedare Bloom

unread,
Mar 31, 2012, 10:06:34 AM3/31/12
to 阿四, rtems...@rtems.org, rtems...@rtems.org
On Fri, Mar 30, 2012 at 10:50 AM, 阿四 <ashi...@gmail.com> wrote:
> Hi, all. I'm interesing in the hash/map in POSIX key and Classic notepad
> project[1] and want to apply it as my GSoC this year. I've submitted my
> proposal at GSoC. Here are some problems I want to figure out, thanks for
> reply to any of them:
>
> 1. First, I want to be sure where the entry point of this project exactly is
> in the RTEMS. For the notepad, I think RTEMS_API_Control structure, which
> includes the Notepads array member, is the entry of the Notepad part of this
> project. And to the POSIX key, the corresponding structure is
> POSIX_Keys_Control structure, and the key area mentioned in [1] is the
> Values array member of POSIX_Keys_Control. Am I missing something?
>
> typedef struct {
>   rtems_event_set          pending_events;
>   rtems_event_set          event_condition;
>   ASR_Information          Signal;
>   uint32_t                 Notepads[ RTEMS_NUMBER_NOTEPADS ];
> }  RTEMS_API_Control;
>
Yes you have found where the fields are stored. See
http://rtems.org/onlinedocs/doc-current/share/rtems/html/c_user/c_user_83.html#Task-Manager-Directives
6.4.12 TASK_GET_NOTE - Get task notepad entry and 6.4.13 TASK_SET_NOTE
- Set task notepad entry for how the user gets at these fields

> typedef struct {
>    Objects_Control     Object;
>    void              (*destructor)( void * );
>    void              **Values[ OBJECTS_APIS_LAST + 1 ];
> }  POSIX_Keys_Control;
>

The documentation on keys is a little thin but find it here:
http://rtems.org/onlinedocs/doc-current/share/rtems/html/posix_users/posix_users_368.html#Key-Manager

> 2. I doesn't quite understand the key create procedure.First, I'm not sure
> what determines the maximum variable at line4 in code below, is it equal to
> CONFIGURE_MAXIMUM_POSIX_KEYS in configuration? Second, where the code also

It's actually the maximum number of classic or posix tasks
(CONFIGURE_MAXIMUM_TASKS or POSIX_THREADS) depending on the API. this
code makes use of the fact that both classic and posix APIs store the
tasks at the [1] offset of their respective object table entry. It's
not entirely clear, but you can see this by looking at how the table
is loaded by calling _Objects_Initialize_information and that the
table stores along the first dimension the apis and along the second
dimension the classes (not in a matrix but in a compact vector). Look
at score/include/rtems/score/object.h for the definitions of the
meanings of the different APIs and their respective classes.

> allocates key memory for OBJECTS_INTERNAL_API, OBJECTS_CLASSIC_API, while in
> my opinion, only key memory allocation for OBJECTS_POSIX_API is enough. what
> are other things used for?
>

I'm unsure. It appears that this will let non-posix (classic) tasks
store thread-specific data using the posix key interface. Good
question.

Yes that is roughly the idea. Here is the standard on key create..
http://pubs.opengroup.org/onlinepubs/009604499/functions/pthread_key_create.html

> 4.As mentioned in [1], std::map can be  an example for this project. I
> wonder is std::vector also appropriate for this project? Or maybe, several
> differection algorithms sould be implemented, and leave the choice to user?
> And I haven't study hash/map algorithm in depth, maybe, I should ask related
> question after do more homework on this.
>

I don't think you can use functions from C++ stl for this code. We
should evaluate some potential solutions and give preference to
algorithms that put the time complexity cost during key creation and
has fast, deterministic lookup times. The problem with just using a
hash+map function is that if the hash function does not give a good
distribution of keys or if the map algorithm limits the number of keys
then it may be hard to put a tight bound on the lookup time in the
worst-case.

> 5. What's the function naming convention in RTEMS, like _POSIX_Key_Get is an
> inline function, while _POSIX_Keys_Free_memory is not.
>

functions that are short usually get inlined. The naming convention
for internal code is (_API)_Package_name_Method_name(...), where _API
is optional; for posix internals it is _POSIX. For score the _API
normally is omitted (occasionally you will find something named
_CORE). For the public APIs the posix routines stick to the posix
standards, and the classic routines are rtems_package_method(...); the
chain and red-black tree data structures public APIs are implemented
in cpukit/sapi.

-Gedare

> _______________________________________________
> rtems-devel mailing list
> rtems...@rtems.org
> http://www.rtems.org/mailman/listinfo/rtems-devel
>

_______________________________________________
rtems-users mailing list
rtems...@rtems.org
http://www.rtems.org/mailman/listinfo/rtems-users

Joel Sherrill

unread,
Mar 31, 2012, 10:30:41 AM3/31/12
to Gedare Bloom, rtems...@rtems.org, rtems...@rtems.org
On 03/31/2012 09:06 AM, Gedare Bloom wrote:
> On Fri, Mar 30, 2012 at 10:50 AM, 阿四<ashi...@gmail.com> wrote:
>> Hi, all. I'm interesing in the hash/map in POSIX key and Classic notepad
>> project[1] and want to apply it as my GSoC this year. I've submitted my
>> proposal at GSoC. Here are some problems I want to figure out, thanks for
>> reply to any of them:
>>
>> 1. First, I want to be sure where the entry point of this project exactly is
>> in the RTEMS. For the notepad, I think RTEMS_API_Control structure, which
>> includes the Notepads array member, is the entry of the Notepad part of this
>> project. And to the POSIX key, the corresponding structure is
>> POSIX_Keys_Control structure, and the key area mentioned in [1] is the
>> Values array member of POSIX_Keys_Control. Am I missing something?
>>
>> typedef struct {
>> rtems_event_set pending_events;
>> rtems_event_set event_condition;
>> ASR_Information Signal;
>> uint32_t Notepads[ RTEMS_NUMBER_NOTEPADS ];
>> } RTEMS_API_Control;
>>
> Yes you have found where the fields are stored. See
> http://rtems.org/onlinedocs/doc-current/share/rtems/html/c_user/c_user_83.html#Task-Manager-Directives
> 6.4.12 TASK_GET_NOTE - Get task notepad entry and 6.4.13 TASK_SET_NOTE
> - Set task notepad entry for how the user gets at these fields
Note that the Notepads are at the end of structure. In cpukit/src/tasks.c,
this is downsized based on how many notepads are configured. If 0,
then no space is reserved.

>> typedef struct {
>> Objects_Control Object;
>> void (*destructor)( void * );
>> void **Values[ OBJECTS_APIS_LAST + 1 ];
>> } POSIX_Keys_Control;
>>
> The documentation on keys is a little thin but find it here:
> http://rtems.org/onlinedocs/doc-current/share/rtems/html/posix_users/posix_users_368.html#Key-Manager

Keys are simple from a user viewpoint so there isn't much to document. :)

The issue at hand is that the Values table is an array of tables per API.
For each API, an array of (void *) elements is allocated. When the unlimited
allocation is tripped, the allocated array is not resized. The Values
element
needs to be replaced with a hash.

If you consider unlimited keys, there is a lot of memory reserved.

If you consider unlimited tasks/threads, then there are a lot of keys
which need to be resized.

Hashing avoids that. But we need reasonable performance for
insertion, setting, getting, and deletion of a value.

>
>> 2. I doesn't quite understand the key create procedure.First, I'm not sure
>> what determines the maximum variable at line4 in code below, is it equal to
>> CONFIGURE_MAXIMUM_POSIX_KEYS in configuration? Second, where the code also
> It's actually the maximum number of classic or posix tasks
> (CONFIGURE_MAXIMUM_TASKS or POSIX_THREADS) depending on the API. this
> code makes use of the fact that both classic and posix APIs store the
> tasks at the [1] offset of their respective object table entry. It's
> not entirely clear, but you can see this by looking at how the table
> is loaded by calling _Objects_Initialize_information and that the
> table stores along the first dimension the apis and along the second
> dimension the classes (not in a matrix but in a compact vector). Look
> at score/include/rtems/score/object.h for the definitions of the
> meanings of the different APIs and their respective classes.

OBJECTS_APIS_LAST is the number of APIs supported in RTEMS.
<null>=0, Internal=1, Classic=2, POSIX=3, and itron was = 4

For each of those, you need enough pointer slots for the
CONFIGURE_MAXIMUM_TASKS or CONFIGURE_MAXIMUM_POSIX_THREADS,
etc.


>> allocates key memory for OBJECTS_INTERNAL_API, OBJECTS_CLASSIC_API, while in
>> my opinion, only key memory allocation for OBJECTS_POSIX_API is enough. what
>> are other things used for?
>>
> I'm unsure. It appears that this will let non-posix (classic) tasks
> store thread-specific data using the posix key interface. Good
> question.

Exactly. You need to have keys for all tasks/threads no matter which API
was used to create the task/thread. We want them all to be supported
by keys.

This is an example of a key feature in RTEMS. All tasks/threads
at the API level are instances of SuperCore threads and thus
have the same attributes. You can use Classic API services
with POSIX threads, and POSIX API services with Classic API threads.

IMHO they are generally not useful and our current solution
lets them consume 0 bytes so no need to address them at
all in this project.

We couldn't do that when this idea was written up.


>> 4.As mentioned in [1], std::map can be an example for this project. I
>> wonder is std::vector also appropriate for this project? Or maybe, several
>> differection algorithms sould be implemented, and leave the choice to user?
>> And I haven't study hash/map algorithm in depth, maybe, I should ask related
>> question after do more homework on this.
>>
> I don't think you can use functions from C++ stl for this code. We
> should evaluate some potential solutions and give preference to
> algorithms that put the time complexity cost during key creation and
> has fast, deterministic lookup times. The problem with just using a
> hash+map function is that if the hash function does not give a good
> distribution of keys or if the map algorithm limits the number of keys
> then it may be hard to put a tight bound on the lookup time in the
> worst-case.

No C++ is allowed. None at the SuperCore effort. Say that
to yourself over and over. C++ compilable though.

This is for a C implementation at the SuperCore level which
can be used by keys and promoted out via an RTEMS specific
API. It will need to be documented in the user's manual.


>
>> 5. What's the function naming convention in RTEMS, like _POSIX_Key_Get is an
>> inline function, while _POSIX_Keys_Free_memory is not.
>>
> functions that are short usually get inlined. The naming convention
> for internal code is (_API)_Package_name_Method_name(...), where _API
> is optional; for posix internals it is _POSIX. For score the _API
> normally is omitted (occasionally you will find something named
> _CORE). For the public APIs the posix routines stick to the posix
> standards, and the classic routines are rtems_package_method(...); the
> chain and red-black tree data structures public APIs are implemented
> in cpukit/sapi.

And in this case Key vs Keys indicates a mistake. Both are
part of POSIX Keys.

Inlining rules generally focus on performance, readability,
proper typing, and not making testing harder. Inlining a method
with many if expressions leads to test case explosion and
is bad.

--joel


>
> -Gedare
>> links:
>> [1]http://www.rtems.com/wiki/index.php/UseHashOrMapInNotepadsAndKeys
>> [2]http://www.cs.cf.ac.uk/Dave/C/node29.html#SECTION002945000000000000000
>>
>> Best Regards!
>> zw_yao
>>
>> _______________________________________________
>> rtems-devel mailing list
>> rtems...@rtems.org
>> http://www.rtems.org/mailman/listinfo/rtems-devel
>>
> _______________________________________________
> rtems-users mailing list
> rtems...@rtems.org
> http://www.rtems.org/mailman/listinfo/rtems-users


--
Joel Sherrill, Ph.D. Director of Research& Development
joel.s...@OARcorp.com On-Line Applications Research
Ask me about RTEMS: a free RTOS Huntsville AL 35805
Support Available (256) 722-9985

阿四

unread,
Apr 1, 2012, 1:49:09 PM4/1/12
to Joel Sherrill, rtems...@rtems.org, rtems...@rtems.org
2012/3/31 Joel Sherrill <joel.s...@oarcorp.com>
I see, it's very instructive!
 
Hashing avoids that.  But we need reasonable performance for
insertion, setting, getting, and deletion of a value.



2. I doesn't quite understand the key create procedure.First, I'm not sure
what determines the maximum variable at line4 in code below, is it equal to
CONFIGURE_MAXIMUM_POSIX_KEYS in configuration? Second, where the code also
It's actually the maximum number of classic or posix tasks
(CONFIGURE_MAXIMUM_TASKS or POSIX_THREADS) depending on the API. this
code makes use of the fact that both classic and posix APIs store the
tasks at the [1] offset of their respective object table entry. It's
not entirely clear, but you can see this by looking at how the table
is loaded by calling _Objects_Initialize_information and that the
table stores along the first dimension the apis and along the second
dimension the classes (not in a matrix but in a compact vector). Look
at score/include/rtems/score/object.h for the definitions of the
meanings of the different APIs and their respective classes
 
OBJECTS_APIS_LAST is the number of APIs supported in RTEMS.

OK, I've removed notepad related stuff in my proposal now. I should ask this more early.
 
We couldn't do that when this idea was written up.

4.As mentioned in [1], std::map can be  an example for this project. I
wonder is std::vector also appropriate for this project? Or maybe, several
differection algorithms sould be implemented, and leave the choice to user?
And I haven't study hash/map algorithm in depth, maybe, I should ask related
question after do more homework on this.

I don't think you can use functions from C++ stl for this code. We
should evaluate some potential solutions and give preference to
algorithms that put the time complexity cost during key creation and
has fast, deterministic lookup times. The problem with just using a
hash+map function is that if the hash function does not give a good
distribution of keys or if the map algorithm limits the number of keys
then it may be hard to put a tight bound on the lookup time in the
worst-case.
No C++ is allowed. None at the SuperCore effort. Say that
to yourself over and over. C++ compilable though.

This is for a C implementation at the SuperCore level which
can be used by keys and promoted out via an RTEMS specific
API. It will need to be documented in the user's manual.

Sorry, I think I didn't make myself understood. I means the final implementation may similar with std::vector, however, I should make effort to get more clear with the requirement of POSIX key to help choose proper algorithm. At my first thought, these requirement are needed in key POSIX: 1. random access to key 2. random insert and delete key. I'm not familiar with this enough, and need help to make it clear.
And then I'll make effort to make myself be clear of the map and hash as soon as possible. I know hash basicly and familiar with using std::map, however, I'm still not quite clear with how hash and map will be appied to POSIX key. (and I think I messed hash and map up in the proposal.) Now, my idea is we could hash all the key(pthread_key_t type) to a hash array table with proper size of buckets, and then we can lookup key quickly because of keys are hashed. and of course the array table is not appropriate for dynamic expanding or shrinking, other data structure should be adopted(such as rbtree, linked-lists etc.) Maybe this is wrong, I ask help to make it right and need more detail about the whole idea.

Thanks!
--zw_yao


5. What's the function naming convention in RTEMS, like _POSIX_Key_Get is an
inline function, while _POSIX_Keys_Free_memory is not.

functions that are short usually get inlined. The naming convention
for internal code is (_API)_Package_name_Method_name(...), where _API
is optional; for posix internals it is _POSIX. For score the _API
normally is omitted (occasionally you will find something named
_CORE). For the public APIs the posix routines stick to the posix
standards, and the classic routines are rtems_package_method(...); the
chain and red-black tree data structures public APIs are implemented
in cpukit/sapi.

Joel Sherrill

unread,
Apr 1, 2012, 2:21:02 PM4/1/12
to 阿四, rtems...@rtems.org, rtems...@rtems.org
From phone. I don't know how using rbtree would impact memory allocation in the sense of when allocations occur but since if the tree stays balanced, that may be a viable implementation.

Or a hash... I know the problem and don't claim to know the perfect solution. :)

But if you use thread ids as keys, you have known properties on the values being hashed or indexed.

--joel

阿四 <ashi...@gmail.com> wrote:

>2012/3/31 Joel Sherrill <joel.s...@oarcorp.com<mailto:joel.s...@oarcorp.com>>


>On 03/31/2012 09:06 AM, Gedare Bloom wrote:

_______________________________________________

Chris Johns

unread,
Apr 3, 2012, 6:08:57 PM4/3/12
to Joel Sherrill, rtems...@rtems.org, rtems...@rtems.org
On 2/04/12 4:21 AM, Joel Sherrill wrote:
> From phone. I don't know how using rbtree would impact memory allocation in the sense of when allocations occur but since if the tree stays balanced, that may be a viable implementation.
>
> Or a hash... I know the problem and don't claim to know the perfect solution. :)
>

Maybe this is something the project could focus on. Looking at both and
seeing how they stack up against each other.

Chris

Gedare Bloom

unread,
Apr 3, 2012, 9:38:49 PM4/3/12
to Chris Johns, rtems...@rtems.org, rtems...@rtems.org
On Tue, Apr 3, 2012 at 6:08 PM, Chris Johns <chr...@rtems.org> wrote:
> On 2/04/12 4:21 AM, Joel Sherrill wrote:
>>
>>  From phone. I don't know how using rbtree would impact memory allocation
>> in the sense of when allocations occur but since if the tree stays balanced,
>> that may be a viable implementation.
>>
>> Or a hash... I know the problem and don't claim to know the perfect
>> solution. :)
>>
>
> Maybe this is something the project could focus on. Looking at both and
> seeing how they stack up against each other.
>

Using the rbtree you don't need a hash since you can use thread ids
directly (I think?); using a hash you still need some structure to map
into for insert/search of key data, such as a tree or an array of
linked lists. The time complexity of the different map algorithms is
what matters most here. The usual hash+array solution is theoretically
quite bad in the worst case; leveraging properties about the threads
themselves might make it better. One problem that I see with using an
rbtree (or array+chain) is that the Key.Values will need to embed an
rbtree_node (or chain_node) along with the user data (void*).

I agree that evaluating different solutions for creating and finding
the Key data should be part of this project, and also determining
whether its possible to frontload some costs to key creation.

> Chris


>
> _______________________________________________
> rtems-devel mailing list
> rtems...@rtems.org
> http://www.rtems.org/mailman/listinfo/rtems-devel

_______________________________________________

pww19...@gmail.com

unread,
Apr 1, 2015, 9:28:25 PM4/1/15
to osdeve_mirror_r...@googlegroups.com, rtems...@rtems.org, ashi...@gmail.com

Title:       The core of the core of the big data solutions -- Map
Author:      pengwenwei
Email:      
Language:    c++
Platform:    Windows, linux
Technology:  Perfect hash algorithm
Level:       Advanced
Description: Map algorithm with high performance
Section      MFC c++ map stl
SubSection   c++ algorithm
License:     (GPLv3)

    Download demo project - 1070 Kb
    Download source - 1070 Kb

Introduction:
For the c++ program, map is used everywhere.And bottleneck of program performance is often the performance of map.Especially in the case of large data,and the business association closely and unable to realize the data distribution and parallel processing condition.So the performance of map becomes the key technology.

In the work experience with telecommunications industry and the information security industry, I was dealing with the big bottom data,especially the most complex information security industry data,all can’t do without map.

For example, IP table, MAC table, telephone number list, domain name resolution table, ID number table query, the Trojan horse virus characteristic code of cloud killing etc..

The map of STL library using binary chop, its has the worst performance.Google Hash map has the optimal performance and memory at present, but it has repeated collision probability.Now the big data rarely use a collision probability map,especially relating to fees, can’t be wrong.

Now I put my algorithms out here,there are three kinds of map,after the build is Hash map.We can test the comparison,my algorithm has the zero probability of collision,but its performance is also better than the hash algorithm, even its ordinary performance has no much difference with Google.

My algorithm is perfect hash algorithm,its key index and the principle of compression algorithm is out of the ordinary,the most important is a completely different structure,so the key index compression  is fundamentally different.The most direct benefit for program is that for the original map need ten servers for solutions but now I only need one server.
Declare: the code can not be used for commercial purposes, if for commercial applications,you can contact me with QQ 75293192.
Download:
https://sourceforge.net/projects/pwwhashmap/files

Applications:
First,modern warfare can’t be without the mass of information query, if the query of enemy target information slows down a second, it could lead to the delaying fighter, leading to failure of the entire war. Information retrieval is inseparable from the map, if military products use pwwhashMap instead of the traditional map,you must be the winner.

Scond,the performance of the router determines the surfing speed, just replace open source router code map for pwwHashMap, its speed can increase ten times.
There are many tables to query and set in the router DHCP ptotocol,such as IP,Mac ,and all these are completed by map.But until now,all map are  using STL liabrary,its performance is very low,and using the Hash map has error probability,so it can only use multi router packet dispersion treatment.If using pwwHashMap, you can save at least ten sets of equipment.

Third,Hadoop is recognized as the big data solutions at present,and its most fundamental thing is super heavy use of the map,instead of SQL and table.Hadoop assumes the huge amounts of data so that the data is completely unable to move, people must carry on the data analysis in the local.But as long as the open source Hadoop code of the map changes into pwwHashMap, the performance will increase hundredfold without any problems.


Background to this article that may be useful such as an introduction to the basic ideas presented:
http://blog.csdn.net/chixinmuzi/article/details/1727195


在 2012年3月30日星期五 UTC+8下午10:50:27,阿四写道:
Reply all
Reply to author
Forward
0 new messages