Hi,
I have two questions:
1) How can I delete segments older than X days from a historical node?
2) On version 0.6.66, if I add the following to
config/historical/runtime.properties
druid.storage.type=local
druid.storage.storageDirectory=/var/lib/druid/localStorage
it does not use this directory and still writes to /tmp/druid/localStorage,
but another setting,
druid.segmentCache.locations=[{"path": "/var/lib/druid/indexCache", "maxSize"\: 10000000000}]
works normally.
Did I miss anything?
Thanks,
Taras
--
You received this message because you are subscribed to the Google Groups "Druid Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to druid-developm...@googlegroups.com.
To post to this group, send email to druid-de...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/druid-development/4444025f-7f80-4efb-878a-c5e7d201bc73%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Is the documentation for the drop rules incorrect (or perhaps my reading of it incorrect)? From what I see in the documentation, “The interval of a segment will be compared against the specified period. The period is from some time in the past to the current time. The rule matches if the period contains the interval.”, which implies that a rule with a period of “P1M” would drop all segments for the last month, but keep segments older than that. This seems to be exactly the opposite of what someone would normally want. I would think that normally you would want a rule saying to drop segments older than a certain time (i.e. keep one month of segments and drop things as they get older than that).
Will
Will Lauer
Tech Yahoo, Software Sys Dev Eng, Sr
P: 217.255.4262 M: 508.561.6427
2021 S First St Suite 110 Champaign IL 61820
Hi Taras,
See Inline
--
Nishant
Software Engineer
OK, that makes more sense. I missed the bit about the rules being ordered. Once you add that, it makes more sense.
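The ordering mentioned here can be sketched as follows. This is a minimal illustration and not Druid's implementation: it assumes each segment is checked against the rule list from top to bottom and the first matching rule decides, and it uses the "period contains the interval" matching quoted from the documentation above. The rule dictionaries only mimic the loadByPeriod / dropByPeriod style; the `days` field, the `dropEverything` catch-all, and the function names are hypothetical simplifications (real rules take ISO 8601 periods such as P1M).

```python
from datetime import datetime, timedelta

# Hypothetical, simplified rule list: the first matching rule wins.
RULES = [
    {"type": "loadByPeriod", "days": 30},   # keep roughly the last month
    {"type": "dropEverything"},             # hypothetical catch-all drop
]

def rule_matches(rule, seg_start, seg_end, now):
    """Return True if the rule applies to the segment interval."""
    if rule["type"] == "dropEverything":
        return True
    # Period rules: the window [now - period, now] must contain the segment.
    window_start = now - timedelta(days=rule["days"])
    return window_start <= seg_start and seg_end <= now

def decide(rules, seg_start, seg_end, now):
    """Walk the rules in order and return 'load' or 'drop'."""
    for rule in rules:
        if rule_matches(rule, seg_start, seg_end, now):
            return "load" if rule["type"].startswith("load") else "drop"
    return "no rule matched"

now = datetime(2014, 4, 17)
recent = decide(RULES, datetime(2014, 4, 10), datetime(2014, 4, 11), now)
old = decide(RULES, datetime(2014, 1, 1), datetime(2014, 1, 2), now)
print(recent, old)
```

Under these assumptions a single period-based drop rule on its own would drop the most recent month, which is the reading that caused the confusion. Putting a load rule first and a catch-all drop after it gives the "keep one month, drop the rest" behaviour.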
Will Lauer
From: druid-de...@googlegroups.com [mailto:druid-de...@googlegroups.com]
On Behalf Of Fangjin Yang
Sent: Wednesday, April 16, 2014 10:53 PM
To: druid-de...@googlegroups.com
Subject: Re: [druid-dev] Segments delete
Hi Will,
Fangjin Yang,
1) As I understand it, I send a query to the coordinator node and it does what is specified in the JSON request. Am I right? For example:
curl -X POST "http://localhost:8082/druid/v2/?pretty" -H 'content-type: application/json' -d @query.body
2) When I inject 1 file (100k rows), it takes 5-6 seconds. When I inject 100 files (1,000 rows each, the same 100k rows in total), it takes 2 minutes 40 seconds. How can that be? What can I change to inject 100 files faster?
And another question:
3) I put the following in config/overlord/runtime.properties
druid.storage.type=local
druid.storage.storageDirectory=/var/lib/druid/localStorage
and it is working for me now, thanks to you. But if I reinject data for the same period, I end up with the new and old data in separate folders. Is there any way to overwrite the old data and keep only the new data when I reinject?
Thanks,
Taras
On 17 April 2014 06:55, Fangjin Yang <fan...@metamarkets.com> wrote:
Hi Taras,
There is no way to send rules during indexing. You have to specify them using the coordinator console.
On Wednesday, April 16, 2014 12:04:45 PM UTC-7, Taras Puhol wrote:
Nishant Bangarwa,
Does this mean I need to specify this rule during batch indexing?
Or is there a way to send a drop rule such as { "type" : "dropByInterval", "interval" : "2012-01-01/2013-01-01" } to the indexing service so that the interval is dropped?
Thanks,
Taras
Hi Taras, see inline.
On Thursday, April 17, 2014 10:13:55 AM UTC-7, Taras Puhol wrote:
> 1) As I understand it, I send a query to the coordinator node and it does what is specified in the JSON request. Am I right? For example:
> curl -X POST "http://localhost:8082/druid/v2/?pretty" -H 'content-type: application/json' -d @query.body
You send queries to broker nodes, not coordinator nodes. Broker nodes route queries to historical and real-time nodes, which compute answers in parallel. Coordinator nodes are responsible for load balancing, assigning new data to historical nodes, and dropping old data.
> 2) When I inject 1 file (100k rows), it takes 5-6 seconds. When I inject 100 files (1,000 rows each, the same 100k rows in total), it takes 2 minutes 40 seconds. What can I change to inject 100 files faster?
What do you mean by inject 1 file? Are you ingesting the file and if so, how are you ingesting the file?
> 3) I put druid.storage.type=local and druid.storage.storageDirectory=/var/lib/druid/localStorage in config/overlord/runtime.properties and it is working now. But if I reinject data for the same period, I end up with the new and old data in separate folders. Is there any way to overwrite the old data and keep only the new data?
If you use batch indexing, Druid will create immutable segments for a time period with a version identifier associated. If you reindex the same time period of data, segments will be created with new versions and once loaded into Druid, invalidate older segments for the same time period with older versions.
Does that make sense?
FJ
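The versioning behaviour described above can be illustrated with a small sketch. It is not Druid code: the `Segment` tuple and `visible_segments` function are hypothetical, it assumes that versions are timestamp strings that sort chronologically, and it only handles segments whose intervals are exactly equal (partial overlaps are ignored here).

```python
from collections import namedtuple

# Hypothetical, simplified segment descriptor; real segments carry more metadata.
Segment = namedtuple("Segment", ["datasource", "interval", "version"])

def visible_segments(segments):
    """Keep, per (datasource, interval), only the segment with the highest version."""
    newest = {}
    for seg in segments:
        key = (seg.datasource, seg.interval)
        if key not in newest or seg.version > newest[key].version:
            newest[key] = seg
    return sorted(newest.values())

segments = [
    Segment("logs", "2014-04-16/2014-04-17", "2014-04-17T06:00:00Z"),
    Segment("logs", "2014-04-16/2014-04-17", "2014-04-17T09:30:00Z"),  # reindexed
    Segment("logs", "2014-04-17/2014-04-18", "2014-04-18T06:00:00Z"),
]
current = visible_segments(segments)
for seg in current:
    print(seg.interval, seg.version)
```

Note that the older segment is only hidden, not removed from the input list, which matches the observation that the old and new data both remain in separate folders after a reinjection.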
Hi Fangjin Yang,
On 18 April 2014 08:11, Fangjin Yang <fan...@metamarkets.com> wrote:
> You send queries to broker nodes, not coordinator nodes.
That is clear now, thanks.
> What do you mean by inject 1 file? Are you ingesting the file and if so, how are you ingesting the file?
I am running a benchmark. I use batch ingestion to load segments into my datasources. In my system I want a separate datasource for each user. But loading 1 segment into 1 datasource takes 5 seconds, while loading 100 segments into 100 datasources (same total size) takes 2 minutes 40 seconds. Is there any way to make this faster?
> If you reindex the same time period of data, segments will be created with new versions and once loaded into Druid, invalidate older segments for the same time period with older versions.
Yes, I understand that older segments are invalidated, but are there rules that delete them, so that I do not use a lot of disk space if I need to reinject many times?
Thanks,
Taras
Inline.
On Thursday, April 17, 2014 11:53:49 PM UTC-7, Taras Puhol wrote:
> I am running a benchmark. I use batch ingestion to load segments into my datasources. Loading 1 segment into 1 datasource takes 5 seconds, while loading 100 segments into 100 datasources (same total size) takes 2 minutes 40 seconds. Is there any way to make this faster?
Can you describe your system specs? E.g. are you using SSDs? What is your deep storage? Are you loading 100 segments to 100 historical nodes? What is the size of the segments? FWIW, if you care about how fast data loads, you should look into realtime ingestion.
> Yes, I understand that older segments are invalidated, but are there rules that delete them, so that I do not use a lot of disk space if I need to reinject many times?
Yes, drop rules will drop segments that don't match the rule. You can also drop entire datasources in Druid. You can do all of this using the coordinator console (http://druid.io/docs/latest/Coordinator.html).
> Yes, drop rules will drop segments that don't match the rule. You can also drop entire datasources in Druid. You can do all of this using the coordinator console (http://druid.io/docs/latest/Coordinator.html).
Sorry, perhaps I was not clear. Why, when I reinject data, are both the old and new segments stored in deep storage (in my case druid.storage.storageDirectory=/var/lib/druid/localStorage)?
Is there any rule so that deep storage keeps only the latest version of a segment when I reinject data? For example, an hour later I receive a log that was delayed, so I reinject, and this can happen several times. Deep storage then grows quickly.
A related question: once the logs are saved and the data has been injected, so that I can see it on the historical node, will I have any issues if I delete everything in storageDirectory=/var/lib/druid/localStorage?
I'm using an SSD. All Druid nodes (broker, historical, overlord, coordinator) run on the same machine with default settings.
My deep storage is local, /var/lib/druid
Everything else uses the Druid defaults, as in http://druid.io/docs/latest/Tutorial:-Loading-Your-Data-Part-1.html, just with my own data.
Each segment is injected into its own datasource. I want to test how much time I will need to inject files for 1000 customers, with a separate datasource for each customer.
Here is my benchmark:
1) 1 file, 100,000 rows: 5 seconds
2) 100 files, 1,000 rows each (same total size as point 1): 2 minutes 40 seconds
3) 1000 files, 100 rows each (total size still the same): 28 minutes
Why is the injection time so different? How can I speed it up?
The difference between these is largely due to the number of segments being generated.
In the first case Druid generates only 1 segment (which includes writing segment metadata to MySQL and persisting the segment to deep storage), while in the last case 1000 segments are generated (1000x as many indexes generated and persisted to deep storage, and 1000x as much metadata written to MySQL).
Also, I wonder whether separating the data into different datasources changes the amount of compression achieved. Do you see any noticeable difference between the size of a single segment in deep storage and the total size of the 1000 segments in use case 3?
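A quick calculation on the benchmark figures reported in this thread supports the per-segment overhead explanation. The sketch below only divides the reported wall-clock times by the number of segments; the variable and function names are made up for illustration, and the 5 second figure for the single-file run is taken from the benchmark list (elsewhere it is given as 5-6 seconds).

```python
# Benchmark figures reported in the thread: (number of segments, total seconds).
runs = [(1, 5), (100, 2 * 60 + 40), (1000, 28 * 60)]

def seconds_per_segment(runs):
    """Return the average wall-clock seconds spent per segment for each run."""
    return [(count, total / count) for count, total in runs]

per_segment = seconds_per_segment(runs)
for count, secs in per_segment:
    print(f"{count:>4} segments: {secs:.2f} s per segment")
```

The 100-segment and 1000-segment runs both come out at roughly 1.6 to 1.7 seconds per segment even though each segment holds far fewer rows, which is what one would expect if a fixed cost per segment dominates rather than the amount of data.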
Hi Taras,
HadoopDruidIndexer
Taras