To view this discussion on the web visit https://groups.google.com/d/msgid/druid-development/10fbbbdf-cbd7-4ac8-ab9b-7607c2851368%40googlegroups.com.
--
You received this message because you are subscribed to the Google Groups "Druid Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to druid-developm...@googlegroups.com.
To post to this group, send email to druid-de...@googlegroups.com.
$ sysctl -a | grep machdep.cpu
machdep.cpu.max_basic: 13
machdep.cpu.max_ext: 2147483656
machdep.cpu.vendor: GenuineIntel
machdep.cpu.brand_string: Intel(R) Core(TM) i7-3740QM CPU @ 2.70GHz
machdep.cpu.family: 6
machdep.cpu.model: 58
machdep.cpu.extmodel: 3
machdep.cpu.extfamily: 0
machdep.cpu.stepping: 9
machdep.cpu.feature_bits: 3219913727 2142954495
machdep.cpu.leaf7_feature_bits: 641
machdep.cpu.extfeature_bits: 672139520 1
machdep.cpu.signature: 198313
machdep.cpu.brand: 0
machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX SMX EST TM2 SSSE3 CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC POPCNT AES PCID XSAVE OSXSAVE TSCTMR AVX1.0 RDRAND F16C
machdep.cpu.leaf7_features: SMEP ENFSTRG RDWRFSGS
machdep.cpu.extfeatures: SYSCALL XD EM64T LAHF RDTSCP TSCI
machdep.cpu.logical_per_package: 16
machdep.cpu.cores_per_package: 8
machdep.cpu.microcode_version: 21
machdep.cpu.processor_flag: 4
machdep.cpu.mwait.linesize_min: 64
machdep.cpu.mwait.linesize_max: 64
machdep.cpu.mwait.extensions: 3
machdep.cpu.mwait.sub_Cstates: 135456
machdep.cpu.thermal.sensor: 1
machdep.cpu.thermal.dynamic_acceleration: 1
machdep.cpu.thermal.invariant_APIC_timer: 1
machdep.cpu.thermal.thresholds: 2
machdep.cpu.thermal.ACNT_MCNT: 1
machdep.cpu.thermal.core_power_limits: 1
machdep.cpu.thermal.fine_grain_clock_mod: 1
machdep.cpu.thermal.package_thermal_intr: 1
machdep.cpu.thermal.hardware_feedback: 0
machdep.cpu.thermal.energy_policy: 0
machdep.cpu.xsave.extended_state: 7 832 832 0
machdep.cpu.arch_perf.version: 3
machdep.cpu.arch_perf.number: 4
machdep.cpu.arch_perf.width: 48
machdep.cpu.arch_perf.events_number: 7
machdep.cpu.arch_perf.events: 0
machdep.cpu.arch_perf.fixed_number: 3
machdep.cpu.arch_perf.fixed_width: 48
machdep.cpu.cache.linesize: 64
machdep.cpu.cache.L2_associativity: 8
machdep.cpu.cache.size: 256
machdep.cpu.tlb.inst.small: 64
machdep.cpu.tlb.inst.large: 8
machdep.cpu.tlb.data.small: 64
machdep.cpu.tlb.data.large: 32
machdep.cpu.tlb.shared: 512
machdep.cpu.address_bits.physical: 36
machdep.cpu.address_bits.virtual: 48
machdep.cpu.core_count: 4
machdep.cpu.thread_count: 8
$ java -cp MemoryMappedFiles-0.0.1-SNAPSHOT-jar-with-dependencies.jar:lib/* -XX:+UseG1GC -XX:+AggressiveOpts -XX:+UseFastAccessorMethods -XX:AllocatePrefetchLines=5 -XX:AllocatePrefetchStyle=3 -server -javaagent:./lib/SizeOf-1.0-SNAPSHOT.jar Benchmark ../real-roaring-datasets
Java runs single-threaded
objc[78140]: Class JavaLaunchHelper is implemented in both /Library/Java/JavaVirtualMachines/jdk1.7.0_51.jdk/Contents/Home/bin/java and /Library/Java/JavaVirtualMachines/jdk1.7.0_51.jdk/Contents/Home/jre/lib/libinstrument.dylib. One of the two will be used. Which one is undefined.
JAVAGENT: call premain instrumentation for class SizeOf
Results interpretation ::
RAM Size = the required RAM space, in KB and bytes/bitmap, to store the 200 bitmaps
Disk Size = the required disk space, in MB and KB/bitmap, to store the 200 serialized bitmaps
Horizontal unions time = average time in ms to compute the horizontal union of 200 bitmaps
Intersections time = average time in ms to compute the intersection of 200 bitmaps
Scans time = average time in ms to scan the 200 bitmaps
***************************
Roaring bitmap on census1881.csv dataset
***************************
RAM Size = 29.77 KB (152.40 bytes/bitmap)
Disk Size = 1.91 MB (9.76 KB/bitmap)
Horizontal unions time = 4 ms
Intersections time = 0.06 ms
Scans time = 12 ms
.ignore = 464741562
***************************
ConciseSet on census1881.csv dataset
***************************
RAM Size = 15.63 KB (80.00 bytes/bitmap)
Disk Size = 3.06 MB (15.66 KB/bitmap)
Unions time = 269 ms
Intersections time = 6 ms
Scans time = 41 ms
.ignore = 1574010228
***************************
Roaring bitmap on census-income.csv dataset
***************************
RAM Size = 26.50 KB (135.68 bytes/bitmap)
Disk Size = 2.26 MB (11.57 KB/bitmap)
Horizontal unions time = 4 ms
Intersections time = 0.06 ms
Scans time = 100 ms
.ignore = 1784600922
***************************
ConciseSet on census-income.csv dataset
***************************
RAM Size = 15.63 KB (80.00 bytes/bitmap)
Disk Size = 2.42 MB (12.40 KB/bitmap)
Unions time = 81 ms
Intersections time = 1 ms
Scans time = 119 ms
.ignore = -1673566912
***************************
Roaring bitmap on weather_sept_85.csv dataset
***************************
RAM Size = 34.66 KB (177.48 bytes/bitmap)
Disk Size = 8.33 MB (42.67 KB/bitmap)
Horizontal unions time = 4 ms
Intersections time = 0.09 ms
Scans time = 181 ms
.ignore = 544521326
***************************
ConciseSet on weather_sept_85.csv dataset
***************************
RAM Size = 15.63 KB (80.00 bytes/bitmap)
Disk Size = 9.02 MB (46.20 KB/bitmap)
Unions time = 593 ms
Intersections time = 13 ms
Scans time = 280 ms
.ignore = 1163895196
Yes I meant now*:P
On Mon, Nov 17, 2014 at 1:12 PM, Daniel Lemire <lem...@gmail.com> wrote:
The format change is intriguing... Effectively, they either sort the original data, or its negation, whichever is most economical. This can be cool in the sense that you might be able to implement super fast negation (complement). I do not have a very good idea right now as to whether it could improve compression ratios and/or performance in the context of Druid. Obviously, anything that changes the format has to be done with care (e.g., it is an opportunity to create new bugs).
They have a function called "advance"... you can find it by searching through their (relatively short) code:
https://svn.apache.org/viewvc/lucene/dev/branches/branch_5x/lucene/core/src/java/org/apache/lucene/util/RoaringDocIdSet.java?view=markup&pathrev=1629606
It would not be hard to extend the RoaringBitmap API to include something of the sort... Note that RoaringBitmap already offers things like "rank" and so on... So, unlike the format change, we could throw this in safely as a minor revision.
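To make the "original data or its negation" idea concrete, here is a minimal sketch. It is not Lucene's code, and the class and method names are hypothetical. It assumes one block covers the 65536 values that share the same high bits, and it keeps whichever sorted list is shorter: the values present or the values absent. It ignores any bitset representation a real implementation might use for mid-density blocks.

```java
import java.util.Arrays;

/** Hypothetical sketch: a block over [0, 65536) stored directly or as its complement. */
final class InvertibleBlock {
    static final int UNIVERSE = 1 << 16;

    final boolean inverted; // true when `stored` lists the values that are absent
    final char[] stored;    // sorted 16-bit values (char is Java's unsigned 16-bit type)

    private InvertibleBlock(boolean inverted, char[] stored) {
        this.inverted = inverted;
        this.stored = stored;
    }

    /** Builds a block from sorted, distinct values in [0, 65536). */
    static InvertibleBlock of(char[] sortedValues) {
        if (sortedValues.length <= UNIVERSE / 2) {
            return new InvertibleBlock(false, sortedValues.clone());
        }
        // Dense case: it is cheaper to list what is missing.
        char[] missing = new char[UNIVERSE - sortedValues.length];
        int out = 0;
        int next = 0;
        for (int v = 0; v < UNIVERSE; v++) {
            if (next < sortedValues.length && sortedValues[next] == v) {
                next++;
            } else {
                missing[out++] = (char) v;
            }
        }
        return new InvertibleBlock(true, missing);
    }

    /** Membership test for a value in [0, 65536). */
    boolean contains(int value) {
        boolean listed = Arrays.binarySearch(stored, (char) value) >= 0;
        return listed != inverted;
    }

    /** Complement in constant time: flip the flag and share the array. */
    InvertibleBlock complement() {
        return new InvertibleBlock(!inverted, stored);
    }

    int cardinality() {
        return inverted ? UNIVERSE - stored.length : stored.length;
    }
}
```

With this layout a block holding every value except one stores a single 16-bit entry, and negation only flips a flag, which is the "super fast negation" mentioned above.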
On Monday, November 17, 2014 3:50:44 PM UTC-5, Xavier wrote:
Interesting, do you think it would make sense to incorporate those enhancements?
Can you elaborate on what it means to skip over values quickly?
On Mon, Nov 17, 2014 at 7:31 AM, Daniel Lemire <lem...@gmail.com> wrote:
The Apache Lucene implementation differs in two interesting ways...
1. They have a special "reversed" format for compressing well super dense lists.
2. They can skip over values quickly while iterating.
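As an illustration of what "skipping over values quickly" can mean, here is a minimal sketch of an iterator over a sorted array with an advance operation. It is neither Lucene's implementation nor part of the RoaringBitmap API; the class name and the galloping-plus-binary-search strategy are assumptions chosen for the example.

```java
/** Hypothetical sketch: iterator over sorted, distinct ints with a skip operation. */
final class SkippingIterator {
    private final int[] sorted;
    private int pos = 0;

    SkippingIterator(int[] sortedDistinct) {
        this.sorted = sortedDistinct;
    }

    boolean hasNext() {
        return pos < sorted.length;
    }

    int next() {
        return sorted[pos++];
    }

    /**
     * Positions the iterator so that next() returns the smallest remaining
     * value >= target, without visiting every value in between.
     */
    void advance(int target) {
        if (pos >= sorted.length || sorted[pos] >= target) {
            return; // already at or past the target
        }
        // Gallop: double the step while the probed value is still too small.
        // Invariant: sorted[lo] < target.
        int lo = pos;
        int step = 1;
        while (lo + step < sorted.length && sorted[lo + step] < target) {
            lo += step;
            step <<= 1;
        }
        // Either hi is the array length, or sorted[hi] >= target.
        int hi = Math.min(lo + step, sorted.length);
        // Binary search inside (lo, hi].
        while (hi - lo > 1) {
            int mid = (lo + hi) >>> 1;
            if (sorted[mid] < target) {
                lo = mid;
            } else {
                hi = mid;
            }
        }
        pos = hi;
    }
}
```

When intersecting a long list with a short one, the long side can call advance with each value from the short side instead of stepping through every element, so the cost grows with the logarithm of the gap rather than with the gap itself.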
Regarding your concern about mapping thousands of objects: that's a non-issue in Druid, since the entire set of bitmaps is memory-mapped at once.
Fwiw, 18 bytes of header is quite a bit. Is there any documentation
that you can point us to which describes what each value in the header
represents?
One strategy that is specifically enabled by how we store
data is that any header information describing things like the
serialization version can be stored once in the column's metadata
rather than on each individual bitmap.
If it's true that the memory usage is largely from these headers, it
should be possible to remove the redundant bits and store them just
once as a "column header" instead.
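A minimal sketch of the "column header" idea, assuming every serialized bitmap in a column begins with the same fixed-length prefix. The layout, the class name, and the per-bitmap length prefix are hypothetical. The length prefix merely stands in for whatever offset index the column format already keeps. This is not Druid's or RoaringBitmap's actual format.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.List;

/** Hypothetical layout: [headerLen][header][count] then, per bitmap, [len][remaining bytes]. */
final class SharedHeaderColumn {

    /** Packs bitmaps whose first headerLen bytes are identical, storing that prefix once. */
    static ByteBuffer pack(List<byte[]> bitmaps, int headerLen) {
        byte[] header = Arrays.copyOf(bitmaps.get(0), headerLen);
        int size = 4 + headerLen + 4;
        for (byte[] b : bitmaps) {
            if (b.length < headerLen
                    || !Arrays.equals(Arrays.copyOf(b, headerLen), header)) {
                throw new IllegalArgumentException("bitmap does not share the header");
            }
            size += 4 + (b.length - headerLen);
        }
        ByteBuffer out = ByteBuffer.allocate(size);
        out.putInt(headerLen).put(header).putInt(bitmaps.size());
        for (byte[] b : bitmaps) {
            out.putInt(b.length - headerLen).put(b, headerLen, b.length - headerLen);
        }
        out.flip();
        return out;
    }

    /** Rebuilds the original bytes of the bitmap at the given index. */
    static byte[] unpack(ByteBuffer column, int index) {
        ByteBuffer in = column.duplicate(); // leave the caller's position untouched
        int headerLen = in.getInt();
        byte[] header = new byte[headerLen];
        in.get(header);
        int count = in.getInt();
        if (index < 0 || index >= count) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        for (int i = 0; i < index; i++) {
            int skip = in.getInt();
            in.position(in.position() + skip);
        }
        int len = in.getInt();
        byte[] result = Arrays.copyOf(header, headerLen + len);
        in.get(result, headerLen, len);
        return result;
    }
}
```

Whether this saves anything in practice depends on how much of the per-bitmap header really is identical across a column, which is exactly the question raised above about what each header value represents.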
Given some data, we could probably tell what is going on quickly.
...................................................................
Sascha Coenen
Senior Java Developer
sascha...@smaato.com
Smaato Inc.
San Francisco – New York – Hamburg – Singapore
www.smaato.com
Valentinskamp 70, Emporio, 19th Floor
20355 Hamburg
T: 0049 (40) 3480 949 0
F: 0049 (40) 492 19 055
I also noticed that Druid does not have Roaring run compression turned on by default, and actually it's not even configurable (you have to modify the source and recompile). Raised this PR for that: https://github.com/druid-io/druid/pull/3228
When writing the bitmap dumping tool I noticed that for single element bitmaps (common for high cardinality columns) the Concise bitmaps are 8 bytes and the Roaring bitmaps are 18 bytes. I'm not sure if Sascha is running into the same thing, but that difference adds up over the millions of unique values that could be present in a 5M row segment. Any thoughts on whether this can be improved?
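A rough back-of-the-envelope sketch of how that difference adds up. It uses only the two sizes quoted above (8 and 18 bytes) and assumes the worst case where every row of a 5M row segment has its own single-element bitmap. The class and method names are hypothetical.

```java
/** Hypothetical helper: extra bytes spent when single-element bitmaps cost 18 bytes instead of 8. */
final class SingleValueOverhead {
    static final int CONCISE_BYTES = 8;  // size quoted in the thread
    static final int ROARING_BYTES = 18; // size quoted in the thread

    static long extraBytes(long singleElementBitmaps) {
        return singleElementBitmaps * (ROARING_BYTES - CONCISE_BYTES);
    }

    public static void main(String[] args) {
        long bitmaps = 5_000_000L; // assumed worst case: one bitmap per row
        System.out.println(bitmaps + " bitmaps cost "
                + extraBytes(bitmaps) + " extra bytes");
    }
}
```

Under that assumption the gap is 10 bytes per bitmap, or 50,000,000 bytes for the segment. Real columns will usually have fewer distinct values than rows, so this is an upper bound for the scenario described.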
8 + 9 * ((long)x+65535)/65536 + 2 * N
RoaringBitmap.maximumSerializedSize
for a more precise estimate."
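The expression quoted above can be evaluated directly. The sketch below transcribes it as written, under the assumption that N is the number of integers stored and x is the exclusive upper end of the value range, which is how the RoaringBitmap documentation this fragment appears to come from uses those symbols. The class name is hypothetical, and the code does not call the library.

```java
/** Hypothetical helper evaluating the quoted bound with Java's operator precedence. */
final class RoaringSizeBound {

    /** Upper bound, in bytes, on the serialized size for n integers drawn from [0, x). */
    static long bound(long x, long n) {
        // Transcribed literally: 8 + 9 * ((long)x+65535)/65536 + 2 * N
        return 8 + 9 * (x + 65535) / 65536 + 2 * n;
    }

    public static void main(String[] args) {
        System.out.println(bound(1, 1));         // one value, smallest universe
        System.out.println(bound(5_000_000, 1)); // one value, 5M row segment
    }
}
```

For a single value the bound is dominated by the terms that depend on x rather than by the 2 bytes per integer, and the 18 bytes observed for single-element bitmaps sits within it.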