Document cells idea


ddorian

May 22, 2013, 10:58:54 AM
to hyperta...@googlegroups.com
I am using Hypertable to track video engagement. Basically, I have to track how many times each second of each video was watched, per day.

CREATE TABLE eng (seconds COUNTER);
row_key=video_id:date(yymmdd)
qualifier=which second of the video
value=viewers
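
A minimal sketch of this key layout in Python, with a plain dict standing in for the table. The names `row_key`, `record_view`, and the in-memory `table` are assumptions made for illustration, not Hypertable client API.

```python
from collections import defaultdict
from datetime import date

# In-memory stand-in for the `eng` table: row key -> {qualifier -> counter}.
table = defaultdict(lambda: defaultdict(int))

def row_key(video_id: str, day: date) -> str:
    """Build the row key video_id:yymmdd described above."""
    return f"{video_id}:{day.strftime('%y%m%d')}"

def record_view(video_id: str, day: date, start: int, end: int) -> None:
    """Increment the counter of every watched second in [start, end)."""
    row = table[row_key(video_id, day)]
    for second in range(start, end):
        # qualifier = second of the video, value = number of viewers
        row[str(second)] += 1

# Two hypothetical viewing sessions of the same video on the same day.
record_view("v42", date(2013, 5, 22), 0, 3)
record_view("v42", date(2013, 5, 22), 2, 5)
```

Every watched second becomes its own key-value under the row, which is why the per-cell overhead matters here.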

Of course, this creates many key-values, and the timestamp overhead of each key-value adds up.
Maybe a new cell type could be created: Document.

You could apply set, unset, increment, decrement, etc. to keys inside the cell. For example (MongoDB-style syntax):
insert into t values("r","doc","$set:{'a':2},$inc:{'f.a':5,'g':20}")
In this case "$" would be a forbidden character in keys, and "." would be used for subfields.
OR
Just as qualifiers are used to insert multiple cells in a row, a "field" could be introduced after/before the timestamp:
INSERT INTO t VALUES ("r","doc","+2","timestamp","a");

How it works
  • Like a counter: merged in memory, on read, and on compaction
  • A special format is kept in memory along with the key modifiers (set, inc, decr, etc.)
  • It can be returned as a JSON string
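
The merge step in the list above could look roughly like the following Python sketch. It is only an illustration of the idea: the `(op, path, value)` tuple format and the function names `apply_op` and `merge` are assumptions, not an existing Hypertable format or API.

```python
import json

def apply_op(doc: dict, op: str, path: str, value) -> None:
    """Apply one modifier to a (possibly dotted) key inside the document."""
    *parents, leaf = path.split(".")
    node = doc
    for name in parents:
        # "." addresses subfields, so intermediate dicts are created as needed
        node = node.setdefault(name, {})
    if op == "set":
        node[leaf] = value
    elif op == "unset":
        node.pop(leaf, None)
    elif op == "inc":
        node[leaf] = node.get(leaf, 0) + value
    elif op == "dec":
        node[leaf] = node.get(leaf, 0) - value
    else:
        raise ValueError(f"unknown modifier: {op}")

def merge(ops) -> dict:
    """Fold a time-ordered list of (op, path, value) into one document."""
    doc = {}
    for op, path, value in ops:
        apply_op(doc, op, path, value)
    return doc

# Hypothetical modifiers, as they might accumulate before a compaction.
ops = [("set", "a", 2), ("inc", "f.a", 5), ("inc", "g", 20), ("inc", "g", 1)]
merged = merge(ops)
as_json = json.dumps(merged, sort_keys=True)
```

The same fold could run in memory, on read, or on compaction, the way counters are merged; only the final document would be returned to the client as JSON.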
PROS:
  • You save the storage used by timestamps
  • Less data is transferred from rangeserver ---> thrift_broker ---> client, because you don't transfer (row_key + timestamp) for each key-value
  • Which means you get more data per batch when querying

CONS:

  • Depending on the format the data is saved in, you may need to keep the field keys short to keep the overhead to a minimum
  • Even if you need only one key from the document, you get the whole document
  • Updates could be slower, depending on the format used

Lists, maps, sets, etc. could maybe be added later. What do you think? Does this make sense?

Doug Judd

May 23, 2013, 1:06:59 AM
to hypertable-user
Thanks for the feedback.  Give us a chance to think about it.  There are a lot of different directions we can go with Hypertable.

- Doug



--
You received this message because you are subscribed to the Google Groups "Hypertable User" group.
To unsubscribe from this group and stop receiving emails from it, send an email to hypertable-us...@googlegroups.com.
To post to this group, send email to hyperta...@googlegroups.com.
Visit this group at http://groups.google.com/group/hypertable-user?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.
 
 



--
Doug Judd
CEO, Hypertable Inc.

ddorian

Feb 8, 2014, 3:57:53 PM
to hyperta...@googlegroups.com
Or could most of that be emulated by allowing cells to not have timestamps? (Now that I think about it, that seems better.)

Evgeny Pasynkov

Feb 9, 2014, 8:38:15 AM
to hyperta...@googlegroups.com
Hi,

With the Thrift API, you can disable the timestamp for the cell.

Evgeny Pasynkov,
JetBrains.

ddorian

Feb 9, 2014, 8:47:37 AM
to hyperta...@googlegroups.com
What I meant was:
Optionally allow column families to not store the timestamp of each cell on disk (so the per-cell overhead becomes smaller).
Internally, on merge (or for a timestamp predicate), every cell of a timestamp-less column family could use the cell store's timestamp, but the cells themselves would be written without one. (This limits them to 1 version/counters.)
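
A rough Python sketch of that idea, assuming each cell store carries a single timestamp and its cells carry none. `CellStore` and `merge_latest` are hypothetical names for illustration and do not reflect Hypertable internals.

```python
from dataclasses import dataclass, field

@dataclass
class CellStore:
    """Hypothetical on-disk store: one timestamp for the whole file."""
    timestamp: int
    # (row, qualifier) -> value; no per-cell timestamp is stored
    cells: dict = field(default_factory=dict)

def merge_latest(stores):
    """Keep one version per key; the store with the newest timestamp wins."""
    merged = {}
    for store in sorted(stores, key=lambda s: s.timestamp):
        for key, value in store.cells.items():
            # the cell inherits the timestamp of the store it came from
            merged[key] = (value, store.timestamp)
    return merged

old = CellStore(timestamp=100, cells={("r", "a"): 1, ("r", "b"): 2})
new = CellStore(timestamp=200, cells={("r", "a"): 7})
result = merge_latest([new, old])
```

Because cells within one store cannot be ordered against each other, only a single version per key can be kept; for counters the values would be summed instead of overwritten.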

And what you are saying is to "not transfer" timestamps when selecting, or to have timestamps set on the server when inserting?



Evgeny Pasynkov

Feb 9, 2014, 8:49:55 AM
to hyperta...@googlegroups.com
Hi,

The CellSerializer API has an option to skip the timestamp. I guess it won't be written to disk then. Am I right?

--
Evgeny Pasynkov

ddorian

Feb 9, 2014, 8:53:45 AM
to hyperta...@googlegroups.com
When you read the cell, you'll see that it has a timestamp (roughly the time you inserted it, because it was generated on the server).
Here you can see Doug's response.



Alex Kashirin

Feb 13, 2014, 4:10:43 AM
to hyperta...@googlegroups.com

To be precise, skipping means the cell's timestamp is set to epoch time "0". I came across this in a case where a timestamp was required.

Kashirin Alex 

Dorian Hoxha

Feb 13, 2014, 5:19:45 AM
to hyperta...@googlegroups.com
..... I don't really understand what you're saying, Alex?



Kashirin Alex

Feb 13, 2014, 5:36:44 AM
to hyperta...@googlegroups.com
That was in reply to what Evgeny Pasynkov said, that serialized cells can be without the TIMESTAMP.
