tagging every user's data - maximum limit of tag count


Alex SV

May 15, 2013, 5:28:24 AM
to disc...@googlegroups.com
Hi,

I'm building a tool for analysing user events. The system saves user events and groups them by date.
I want to use DDFS tags to tag every date.
It's also necessary to get the events for one particular user. The idea was to have a unique tag for each user.

The documentation says that it's possible to have 100,000 tags and the filesystem still works fine. What about larger numbers? If we have 10M or 100M users, I think having 100M tags would be a big overhead for the filesystem.
Please point me in the right direction.

Regards,
Alex

Jens Rantil

May 15, 2013, 6:40:33 AM
to disc...@googlegroups.com
Hi Alex,

A tag can point to multiple other tags. Doing this makes it possible to store tags hierarchically. In your case you could tag them by hour, and then day, then month. This would also make it easier for you to purge older tags etc.
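
As a rough illustration of the hierarchy described above, the sketch below derives month, day, and hour tag names from an event timestamp, plus the parent/child pairs that would be linked together. The `events` prefix, the colon-separated naming scheme, and both function names are assumptions made for this example; it only computes names and does not call DDFS.

```python
from datetime import datetime


def hierarchy_tags(ts, prefix="events"):
    """Return tag names for a timestamp, coarsest (month) to finest (hour).

    The naming scheme is hypothetical, chosen only for illustration.
    """
    month = f"{prefix}:{ts:%Y:%m}"
    day = f"{month}:{ts:%d}"
    hour = f"{day}:{ts:%H}"
    return [month, day, hour]


def parent_links(ts, prefix="events"):
    """Return (parent, child) pairs; each parent tag would point to its child."""
    tags = hierarchy_tags(ts, prefix)
    return list(zip(tags, tags[1:]))
```

With a layout like this, purging an old month would mean dropping one month tag and the day and hour tags below it.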

Jens


--
You received this message because you are subscribed to the Google Groups "Disco-development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to disco-dev+...@googlegroups.com.
To post to this group, send email to disc...@googlegroups.com.
Visit this group at http://groups.google.com/group/disco-dev?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.
 
 




Alex SV

May 15, 2013, 9:09:19 AM
to disc...@googlegroups.com
Hi Jens,

Thank you. Yes, it's possible to tag by hour too, but over a year that generates only 365*24 = 8,760 hourly tags, which is not a problem.
What I worry about is tagging by user: for 10M users I would need 10M tags, and I don't know how DDFS handles that.

Alex

Prashanth Mundkur

May 16, 2013, 1:14:55 AM
to disc...@googlegroups.com
On 06:09 Wed 15 May, Alex SV wrote:

> Thank you. Yes, it's possible to tag by hour too, but over a year that
> generates only 365*24 = 8,760 hourly tags, which is not a problem.
> What I worry about is tagging by user: for 10M users I would need
> 10M tags, and I don't know how DDFS handles that.

It's best to treat the number of DDFS tags as a large but bounded
resource, and not rely on it to scale indefinitely. This is due to
DDFS requiring garbage collection, which requires all tags and their
contents to be read into the master's memory (for GC only). A tag
per-user certainly sounds very suboptimal.

--prashanth

Alex SV

May 16, 2013, 4:23:28 AM
to disc...@googlegroups.com
So am I right that there are two strategies?
1. Create tags for groups of users (like sharding), or
2. use a server with a large amount of memory for the master.
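
A minimal sketch of the first strategy, assuming a fixed shard count and a hypothetical `users:shard:NNNN` naming scheme (neither comes from DDFS itself): each user ID is hashed to one of a bounded number of shard tags, so the tag count stays constant however many users there are.

```python
import hashlib

NUM_SHARDS = 1024  # assumed value; pick it to keep the total tag count modest


def shard_tag(user_id, num_shards=NUM_SHARDS, prefix="users"):
    """Map a user ID to one of num_shards tag names, stably across runs."""
    # md5 is used only as a stable, well-distributed hash, not for security.
    digest = hashlib.md5(str(user_id).encode("utf-8")).hexdigest()
    shard = int(digest, 16) % num_shards
    return f"{prefix}:shard:{shard:04d}"
```

The trade-off is that reading one user's events means reading that user's whole shard and filtering out the other users in it.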

~Alex

Jens Rantil

May 17, 2013, 4:35:53 AM
to disc...@googlegroups.com
There is a third strategy, too: you could create a chained map-reduce job that makes the initial selection in the first map phase. Obviously this could take longer, but it's worth mentioning.
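
A sketch of what that first map phase might look like, written as a plain Python generator in the `(entry, params)` shape that classic Disco map functions take. The tab-separated `user_id<TAB>event` line format, the `user_id` key in `params`, and the function name are assumptions for illustration only.

```python
def select_user_map(line, params):
    """Emit only the events that belong to params["user_id"].

    Assumes each input line looks like "user_id<TAB>event".
    """
    user_id, _, event = line.rstrip("\n").partition("\t")
    if user_id == params["user_id"]:
        yield user_id, event
```

The output of this selection step would then be the input of the next job in the chain, so only the later stages benefit from the smaller data set; the first stage still scans every event.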

Jens



Alex SV

May 17, 2013, 5:32:35 AM
to disc...@googlegroups.com
Hi Jens,

Yes, that is why I want to tag users: to access their events directly.

Alex