Doubt about comparing (ordering) float and double types with HAM_TYPE_BINARY

Miguel Ángel Díez Bielsa

Jun 2, 2014, 8:55:03 AM

to hamster...@googlegroups.com

Hi Christoph,

We are thinking about porting our database from C-Tree to Hamsterdb. In C-Tree some of our keys are segmented across different data types (char, double, ...). We are thinking of using HAM_TYPE_BINARY to compare keys, but our doubt is: will the order be correct with these segmented keys?

Thanks.

Christoph Rupp

Jun 2, 2014, 2:51:04 PM

Hi Miguel,

I'm not sure I understand... are all these keys in the same database, and are you worried that the sort order of those keys gets messed up because they have different types?

I have never seen or used C-Tree, but in hamsterdb the recommended approach would be to have one type per database. If you have a complex record then you could use a "star schema" approach to connect the various databases (http://en.wikipedia.org/wiki/Star_schema).

If you want to keep everything in one database and use HAM_TYPE_BINARY then hamsterdb uses memcmp for sorting. You could also supply your own compare function if you use HAM_TYPE_CUSTOM (but then your key might need an additional flag to describe its actual type). I can provide you with sample code if required.
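To make the HAM_TYPE_BINARY question concrete: memcmp compares raw bytes, so doubles stored as-is do not sort numerically (the sign bit alone makes -1.0 compare greater than 1.0, and on little-endian machines the low-order bytes are compared first). One common workaround, not something proposed in this thread, is to encode every key segment into an order-preserving big-endian form before storing it. The sketch below assumes IEEE 754 doubles, amounts that are never NaN, and a hypothetical 16-byte key layout (customer_id followed by amount); all function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: map a double to a uint64_t whose unsigned order
   matches numeric order. NaN is rejected; -0.0 sorts just before +0.0. */
uint64_t double_to_sortable(double d) {
    uint64_t bits;
    assert(d == d); /* NaN has no place in a total order */
    memcpy(&bits, &d, sizeof bits);
    if (bits & 0x8000000000000000ULL)
        return ~bits;                    /* negative: flip all bits */
    return bits | 0x8000000000000000ULL; /* positive: set the sign bit */
}

/* Hypothetical helper: store v as 8 big-endian bytes so that memcmp
   compares it numerically, independent of the host byte order. */
void put_be64(uint8_t *out, uint64_t v) {
    for (int i = 7; i >= 0; i--) {
        out[i] = (uint8_t)(v & 0xFF);
        v >>= 8;
    }
}

/* Hypothetical 16-byte binary key: customer_id first, then amount. */
void encode_key(uint8_t key[16], uint64_t customer_id, double amount) {
    put_be64(key, customer_id);
    put_be64(key + 8, double_to_sortable(amount));
}
```

With this encoding, plain memcmp ordering equals (customer_id, amount) ordering, so no custom compare function is needed; the trade-off is that the amount has to be decoded (reversing the bit transformation) before it can be read back from a key.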

I'm not sure if my answer helped - feel free to send more details and I'll try to figure out a solution.

Best regards
Christoph

--
You received this message because you are subscribed to the Google Groups "hamsterdb User" group.
To unsubscribe from this group and stop receiving emails from it, send an email to hamsterdb-use...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Miguel Ángel Díez Bielsa

Jun 2, 2014, 3:55:24 PM

Thank you very much for your answer, Christoph

You are right, sorry, maybe I have not explained it very well.

For example: if I want to create an index to sort by client + amount (double), what would be the best key type for that index?

Thanks again.


Christoph Rupp

Jun 2, 2014, 4:41:13 PM

Hi Miguel,

The database layout basically depends on what your indices should look like. If you want separate indices for order-id, customer-id and amount, then you will need to create a database for each index. If you would like to have compound indices, that's also possible - I have outlined suggestions for such a compound index below.

In Pseudo-SQL your ORDERS table would look like this:

CREATE TABLE ORDERS (
    uint64 order_id,    -- primary key, indexed, auto-increment
    uint64 customer_id,
    double amount,
    varchar other_info
);

and if I understood correctly, you want to run queries like:

SELECT * FROM Orders WHERE customer_id = 12 AND amount > 100.0;

** The ORDERS table **

The Orders-database can be set up for auto-incrementing keys (use the flag HAM_RECORD_NUMBER when calling ham_env_create_db), and all other fields will be stored in the record.

** Separate Indices **

This is a simple solution and should be "fast enough" for most use cases.

database 1 (Orders): as outlined above

database 2 (Customers): could be created similar to Orders, with an auto-incremented id

database 3: n:m-mapping for Customers to Orders; see the sample env2.c for a simple Customer/Orders database (https://github.com/cruppstahl/hamsterdb/blob/master/samples/env2.c)

For running a query (... WHERE customer_id = 12 AND amount > 100.0) you need a nested loop over the customer-to-orders database (#3): retrieve each of the customer's Orders records and keep only those with an amount > 100.0. This is simple to implement and does not require a separate index for amounts; for most use cases it is fast enough. If not, then a compound index (see below) should be faster. There are other solutions, but they would be difficult to implement (i.e. you would have to write code to perform optimized JOINs, like a DBMS does).
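A minimal in-memory sketch of that nested loop, assuming plain C arrays stand in for the Orders database and the customer-to-orders mapping. The types Order and CustomerOrder and the functions find_order and orders_above are hypothetical and not part of hamsterdb; in the real database, find_order would be a key lookup on Orders, and the scan over the mapping would start from a cursor positioned on the customer's first entry instead of walking the whole array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory stand-ins for the databases described above. */
typedef struct { uint64_t order_id; uint64_t customer_id; double amount; } Order;
typedef struct { uint64_t customer_id; uint64_t order_id; } CustomerOrder;

/* Stand-in for a key lookup on the Orders database; NULL if not found. */
const Order *find_order(const Order *orders, size_t n, uint64_t order_id) {
    for (size_t i = 0; i < n; i++)
        if (orders[i].order_id == order_id)
            return &orders[i];
    return NULL;
}

/* SELECT order_id FROM ORDERS WHERE customer_id = cid AND amount > min_amount:
   outer loop over the customer-to-orders mapping, inner lookup per order,
   then keep only orders above the threshold. Writes at most out_cap ids
   to out and returns how many were written. */
size_t orders_above(const CustomerOrder *map, size_t map_n,
                    const Order *orders, size_t orders_n,
                    uint64_t cid, double min_amount,
                    uint64_t *out, size_t out_cap) {
    size_t count = 0;
    for (size_t i = 0; i < map_n; i++) {
        if (map[i].customer_id != cid)
            continue;
        const Order *o = find_order(orders, orders_n, map[i].order_id);
        if (o != NULL && o->amount > min_amount && count < out_cap)
            out[count++] = o->order_id;
    }
    return count;
}
```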

** The Compound Index **

If both fields (customer_id, amount) should have a compound index, then you could store both fields in a single key and use a custom compare function for the sort order. The keys would be stored in a structure like the following:

typedef struct CompoundIndex {
    uint64_t customer_id;
    double amount;
} CompoundIndex;

When creating the database, set the following parameters:

HAM_PARAM_KEY_TYPE: HAM_TYPE_CUSTOM

HAM_PARAM_KEY_SIZE: sizeof(CompoundIndex)

HAM_PARAM_RECORD_SIZE: 8 (will contain the uint64-id of the order)

Immediately after creating (and opening) the database, call ham_db_set_compare_func to install a custom compare function which sorts first by customer_id, and second by amount.
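A sketch of such a compare function, sorting first by customer_id and second by amount. The exact callback type should be taken from ham_compare_func_t in ham/hamsterdb.h; the signature below, which only receives the two key buffers and their lengths, is a simplified assumption, and the library's typedef may include further parameters (such as the database handle) that the installed callback has to accept. Amounts are assumed never to be NaN.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct CompoundIndex {
    uint64_t customer_id;
    double amount;
} CompoundIndex;

/* Simplified compare callback (signature is an assumption, see above).
   Returns <0, 0 or >0 like memcmp: by customer_id, then by amount. */
int compound_compare(const uint8_t *lhs, uint32_t lhs_length,
                     const uint8_t *rhs, uint32_t rhs_length) {
    CompoundIndex a, b;
    assert(lhs_length == sizeof a && rhs_length == sizeof b);
    /* Copy instead of casting: no assumption about buffer alignment. */
    memcpy(&a, lhs, sizeof a);
    memcpy(&b, rhs, sizeof b);
    if (a.customer_id != b.customer_id)
        return a.customer_id < b.customer_id ? -1 : 1;
    if (a.amount != b.amount)
        return a.amount < b.amount ? -1 : 1;
    return 0;
}
```

Comparing the fields explicitly (rather than subtracting them) avoids unsigned wrap-around for customer_id and keeps the result in the -1/0/1 range.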

For lookups, use cursors (ham_cursor_find) in combination with HAM_FIND_GEQ_MATCH or HAM_FIND_LEQ_MATCH; these flags position the cursor on the exact key or, if there's no exact match, on the next ("Greater or Equal") or previous ("Less or Equal") key. Then use ham_cursor_move to iterate over the keys until you reach the first key with a different customer id.
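The lookup pattern can be simulated on a sorted array: a binary search stands in for ham_cursor_find with HAM_FIND_GEQ_MATCH, and stepping to the next array element stands in for ham_cursor_move. This illustrates the access pattern only and is not hamsterdb code; key_cmp, seek_geq and count_above are hypothetical names. Because a GEQ lookup may land on an exact match, a strict "amount > 100.0" query has to skip keys that are equal to the probe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct CompoundIndex {
    uint64_t customer_id;
    double amount;
} CompoundIndex;

/* Same ordering as the custom compare function: customer_id, then amount. */
int key_cmp(const CompoundIndex *a, const CompoundIndex *b) {
    if (a->customer_id != b->customer_id)
        return a->customer_id < b->customer_id ? -1 : 1;
    if (a->amount != b->amount)
        return a->amount < b->amount ? -1 : 1;
    return 0;
}

/* Index of the first key >= probe (like a GEQ cursor find); n if none. */
size_t seek_geq(const CompoundIndex *keys, size_t n, const CompoundIndex *probe) {
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (key_cmp(&keys[mid], probe) < 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Count keys with customer_id == cid and amount > min_amount: one seek,
   then move forward until the customer id changes. */
size_t count_above(const CompoundIndex *keys, size_t n, uint64_t cid, double min_amount) {
    CompoundIndex probe = { cid, min_amount };
    size_t count = 0;
    for (size_t i = seek_geq(keys, n, &probe); i < n && keys[i].customer_id == cid; i++)
        if (keys[i].amount > min_amount) /* skip an exact match on the probe */
            count++;
    return count;
}
```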

I recommend always setting HAM_PARAM_RECORD_SIZE, especially when the records are this small. Databases with a fixed-length record of 8 bytes will be fast because the record is stored directly in the Btree leaf and not in a separate "blob".

I hope that helped - if not then don't hesitate to ask. I can also point you to code which implements some of these things.

Best regards
Christoph

Miguel Ángel Díez Bielsa

Jun 3, 2014, 3:18:13 AM

Thanks, Christoph

I think we need "The Compound Index", which will compare keys with a callback function. Will it have good performance?

Regards

Christoph Rupp

Jun 3, 2014, 3:22:16 AM

Hi Miguel,

It depends on how you define "good", but the queries will require a single btree lookup and no full table scans. Cursors are very fast, too. Performance should be "good". :)

Best regards
Christoph

Miguel Ángel Díez Bielsa

Jun 3, 2014, 4:19:35 AM

OK, Christoph, we are going to do it.

Thank you very much for your great help.

Regards,

Miguel Ángel Díez Bielsa

Jun 4, 2014, 1:40:13 PM

Hi Christoph,

We have begun to integrate hamsterdb into our data layer and I have run into a problem: a collision between the "winsock.h" and "winsock2.h" headers. Our application uses winsock.h, and when I include the "ham/hamsterdb.h" file, the compiler shows messages and errors.

Regards.

Christoph Rupp

Jun 4, 2014, 2:29:07 PM

Hi Miguel,

Can you remove the following lines in ham/types.h:

line 67: #include <winsock2.h>

line 171: typedef SOCKET ham_socket_t;

and see if it works?

Thanks
Christoph

Miguel Ángel Díez Bielsa

Jun 5, 2014, 3:10:48 AM

Hi Christoph,

I have replaced winsock2.h with winsock.h and it runs OK. No problem. I will carry on.

Thanks,

Christoph Rupp

Jun 5, 2014, 3:12:17 AM

OK thanks for the feedback. I'll fix this for the next release.

Miguel Ángel Díez Bielsa

Jun 5, 2014, 4:00:44 AM

Hi again, Christoph

Is there a tool to explore an environment once it has been created?

Regards

Christoph Rupp

Jun 5, 2014, 4:25:02 AM

Hi Miguel,

There are command line tools: ham_info and ham_dump.

ham_info prints information about the Environment and its databases; ham_dump can dump a database to the screen.

Just run "ham_info --help" for more information.

best regards
Christoph

Miguel Ángel Díez Bielsa

Jun 5, 2014, 4:34:25 AM

Thanks, Christoph. It's very useful for us.

Another question: how can I delete a database from an environment?

Regards.

Christoph Rupp

Jun 5, 2014, 4:35:55 AM

You can use ham_env_erase_db; here's the documentation:

http://files.hamsterdb.com/scripts/html_www/group__ham__env.html#ga4ceb71003291e9eabe2df7140c89610c

Miguel Ángel Díez Bielsa

Jun 5, 2014, 4:54:55 AM

Sorry Christoph,

I was looking at the main page: http://files.hamsterdb.com/scripts/html_www/index.html and it is not there.

From now on, I will use your URL.

Thanks for your fast answer

Regards
