int MAX_ENTRIES = 1800000000;
double entriesWithMargin = MAX_ENTRIES * 1.1;
long actualSegments = Maths.nextPower2((long) (entriesWithMargin / (1L << 16)), 1L);
ChronicleMap<byte[], byte[]> map = ChronicleMapBuilder.of(byte[].class, byte[].class)
        .actualSegments((int) actualSegments)
        .actualEntriesPerSegment((long) (entriesWithMargin / actualSegments))
        .putReturnsNull(true)
        .removeReturnsNull(true)
        .createPersistedTo(file);
Map builder config while reading (after the write is completed):
map = ChronicleMapBuilder.of(byte[].class, byte[].class)
        .createPersistedTo(file);
byte[] using = new byte[24];
long time = System.currentTimeMillis();
byte[] value = map.getUsing(key.getBytes(), using);
System.out.println("Lookup time: " + (System.currentTimeMillis() - time));
Can you run the test with the following changes?
- Use the latest release; there have been some changes recently.
- If I remember correctly, this was tuned down to minimise the space used, which will slow performance. Can you try setting the entries to number * 1.2 and the entry size to 80 bytes? (A sketch combining these suggestions follows this list.)
- Move key.getBytes() so it is not in the timing.
- Use the actual key type as the key, which doesn't appear to be a byte[].
- Check the map size on disk with: du -h {file}
- Time warmed-up code, e.g. after 20k operations.
- Test with fewer entries, e.g. 50 million, to see whether it is a scalability problem.
- Having multiple threads will improve throughput, but at a cost to latency.
- Smaller maps shouldn't help much, but they might.
- getUsing should be fine.
- You might need to use BytesMarshallable for the key.
- I would consider using the natural type for the value, unless it really is just 24 bytes.
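A minimal re-test sketch combining the suggestions above. This is not the original test: the keys array, file and entry count are placeholders, and it assumes the key's natural type is String.
long entries = 1_800_000_000L;
ChronicleMap<String, byte[]> map = ChronicleMapBuilder.of(String.class, byte[].class)
        .entries((long) (entries * 1.2)) // suggested margin
        .entrySize(80)                   // suggested entry size
        .createPersistedTo(file);
byte[] using = new byte[24];
// warm up before timing, e.g. 20k operations
for (int i = 0; i < 20_000; i++)
    map.getUsing(keys[i % keys.length], using);
String key = keys[0]; // key prepared outside the timed section
long start = System.nanoTime();
byte[] value = map.getUsing(key, using);
System.out.println("Lookup time: " + (System.nanoTime() - start) + " ns");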
How many NUMA regions do you have? E.g.
numactl --show
If you have multiple NUMA regions, you effectively have multiple machines working together over a bus (or buses), and you don't get the same performance. I suggest you use no more than 90% of one NUMA region in a single process, so it doesn't have to go across the interconnect.
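For example (a hedged sketch; the node numbers depend on your machine, and your-test.jar is a placeholder), you could inspect the layout and pin the JVM to a single node like this:
numactl --hardware                                            # list NUMA nodes and their memory
numactl --cpunodebind=0 --membind=0 java -jar your-test.jar   # run on node 0 only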
My 5 cents:
- YES, there is surely something wrong with your code; the main miss is not specifying the key/value sizes. When you omit them, the entry size defaults to 256 bytes, while you need keySize(48).constantValueSizeBySample(new byte[24]). (A minimal config sketch follows this list.)
- Don't do the library's work yourself. The actual key type seems to be String, so just specify ChronicleMapBuilder.of(String.class, …) and don't bother translating to/from bytes, unless you need an encoding other than UTF-8.
- The same goes for values: if they are constant-size, there is a good chance they are suitable for the data value generation abstraction, which is zero-copy, produces zero garbage on ser/deser, and allows direct off-heap updates.
- It seems you copy-pasted some cumbersome code from a recent thread in this group, aiming to work around a bug which is now fixed in master. I'm not sure it is included in the latest release; please wait for the next release, rc1, which should be out in a couple of days. Generally, plain entries(ACTUAL_MAX_NUMBER_OF_ENTRIES_WITHOUT_MARGIN), without specifying the number of segments, entries per segment, margin, etc., MUST work well. Otherwise it is a bug which should be reported.
- BUT don't forget that the number of entries you configure is the number of "chunks" of the entry size. Since 90% of your entries should fit in 1 entrySize (if keySize(48) is configured), and almost all the others should fit in 2, you should configure .9 * 1 + .1 * 2 = 1.1 "entries" over your actual maximum number of entries. (A note for people reading this message: you should not blindly copy-paste entries * 1.1 everywhere! It depends on the exact configuration and variance profile!)
- If you have a special type of keys/values, i.e. not a boxed primitive / String / CharSequence / byte[] / char[] etc. for which efficient marshallers are already written and configured in the lib (all such types will be listed in the docs soon), and your type is not suitable for data value generation, then yes, you might consider implementing BytesMarshallable. But BytesReader/BytesWriter/BytesInterop are better; they can save some memory and CPU cost compared to implementing BytesMarshallable, if keys/values are small.
- Even if you do everything we noted, you could still have a problem, because there could be bugs in our code. That is very hard to identify without playing with your case live.
- Unfortunately, it is hard to account for all the details without good knowledge of the lib; currently only we (the developers) can tune Chronicle Map best.
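A minimal sketch of the configuration suggested above; the entry count, key size and file are assumptions for illustration, not exact figures from this thread:
ChronicleMap<String, byte[]> map = ChronicleMapBuilder.of(String.class, byte[].class)
        .entries((long) (MAX_ENTRIES * 1.1))     // margin as discussed above
        .keySize(48)                             // ~90% of keys fit in 48 bytes of UTF-8
        .constantValueSizeBySample(new byte[24]) // values are always 24 bytes
        .createPersistedTo(file);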
ChronicleMap<String, byte[]> map = ChronicleMapBuilder.of(String.class, byte[].class)
        .entries(110000000)
        .keySize(100)
        .entrySize(124)
        .constantValueSizeBySample(new byte[24])
        .createPersistedTo(file);
I modified the key to be a String (length would be 48-100 characters, not constant), but the value is always a byte[] of size 24 (more specifically, the value contains 6 float values packed as a byte[]).
While writing to the map, there were no reads. After completing the write:
map.size() = 205,202,432; map size on disk (du -sh map.dat) = 26G
Using the version: 2.0.14b
Questions:
-----------
1. Map.size() after the write is 205M+, but builder.entries() was configured as 110M. How did it accommodate more than 110M entries?
2. Lookup times seem to be between 5-7 millis. That sounds high, doesn't it?
byte[] using = new byte[24];
long time = System.currentTimeMillis();
byte[] value = map.getUsing(key, using);
System.out.println("Lookup time: " + (System.currentTimeMillis() - time));
Thanks,
KP
1) The config should be just keySize().constantValueSizeBySample(). entrySize() is redundant in this case, and it hides the other two configs. Actually, entrySize() is very rarely needed; in the next release it will be renamed to actualChunkSize() to emphasize its low-level nature.
2) Define an interface:
interface MyValue {
    void setValueAt(@MaxSize(6) int index, float value);
    float getValueAt(int index);
}
Obtain a value using map.newValueInstance(), i.e. use data value generation.
3) The reported 200M size looks strange, like a bug. It would be helpful if you tested with the latest master ChMap (you can install via git clone & mvn install -DskipTests=true, with 2.0.18rc1-SNAPSHOT in your deps), or wait a couple of days until the next release.
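For illustration, a minimal usage sketch of the data value generation approach described above; it assumes a map built with of(String.class, MyValue.class), and the key and floats array are placeholders:
MyValue v = map.newValueInstance();
for (int i = 0; i < 6; i++)
    v.setValueAt(i, floats[i]); // floats is the caller's float[6]
map.put(key, v);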
ChronicleMap<String, byte[]> map = ChronicleMapBuilder.of(String.class, MyTestValue.class)
        .entries(225000000) // as I would like to insert 205M+ entries
        .keySize(100)
        .constantValueSizeBySample(new MyTestValue())
        .createPersistedTo(file);
where MyTestValue implements MyValue (the interface you described). Does this look good to give it another try?
Thanks,
KP
If you need 100M entries, you should configure 100M. Or don't you know the number exactly? Reporting 200M when you inserted only ~100M seems like a bug.
Regarding key size: think about how many bytes 90% of your keys would fit into in UTF-8 encoding (a small sketch for estimating this follows below). From your previous mail, I thought this was 48 bytes.
There is no need to configure a constant value size if you are using data value generated values; just remove that line.
Use MyTestValue as the generic param on the left-hand side.
Everything else seems ok.
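As a rough illustration (not from the thread), one way to estimate the 90th-percentile UTF-8 key size from a sample of keys; sampleKeys is a hypothetical List<String>:
// requires java.util.List and java.nio.charset.StandardCharsets
static int percentileKeySize(List<String> sampleKeys, double percentile) {
    int[] sizes = sampleKeys.stream()
            .mapToInt(k -> k.getBytes(StandardCharsets.UTF_8).length)
            .sorted()
            .toArray();
    int idx = (int) Math.min(sizes.length - 1, Math.floor(percentile * sizes.length));
    return sizes[idx];
}
// e.g. builder.keySize(percentileKeySize(sampleKeys, 0.9))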
Yes, 15b now. It doesn't matter
public class LotsOfEntriesMain {
public static void main(String[] args) throws IOException {
workEntries(true);
workEntries(false);
}
private static void workEntries(boolean add) throws IOException {
int entries = 100_000_000;
File file = new File("/tmp/lotsOfEntries.dat");
ChronicleMap<CharSequence, byte[]> map = ChronicleMapBuilder
.of(CharSequence.class, byte[].class)
.entries(entries)
.entrySize(80)
.createPersistedTo(file);
Random rand = new Random();
StringBuilder sb = new StringBuilder();
byte[] bytes = new byte[24];
long start = System.nanoTime();
for (int i = 0; i < entries; i++) {
sb.setLength(0);
int length = (int) (24 / (rand.nextFloat() + 24.0 / 1000)); // skewed random key length, roughly 23 to 1000 chars
if (length > 2000)
throw new AssertionError();
sb.append(i);
while (sb.length() < length)
sb.append("-key");
if (add)
map.put(sb, bytes);
else
map.getUsing(sb, bytes);
if (i << -7 == 0 && i % 2000000 == 0) // cheap bit test (i % 128 == 0) short-circuits the modulo; prints progress every 2M entries
System.out.println(i);
}
long time = System.nanoTime() - start;
System.out.printf("Map.size: %,d took an average of %,d ns to %s.%n",
map.size(), time / entries, add ? "add" : "get");
map.close();
}
}
public class LotsOfEntriesMain {
public static void main(String[] args) throws IOException {
workEntries(true);
workEntries(false);
}
private static void workEntries(boolean add) throws IOException {
long entries = 100_000_000;
File file = new File("/tmp/lotsOfEntries.dat");
ChronicleMap<CharSequence, MyFloats> map = ChronicleMapBuilder
.of(CharSequence.class, MyFloats.class)
.entries(entries)
.entrySize(80)
.createPersistedTo(file);
Random rand = new Random();
StringBuilder sb = new StringBuilder();
MyFloats mf = map.newValueInstance();
if (add)
for (int i = 0; i < 6; i++)
mf.setValueAt(i, i);
long start = System.nanoTime();
for (long i = 0; i < entries; i++) {
sb.setLength(0);
int length = (int) (24 / (rand.nextFloat() + 24.0 / 1000)); // skewed random key length, roughly 23 to 1000 chars
if (length > 2000)
throw new AssertionError();
sb.append(i);
while (sb.length() < length)
sb.append("-key");
if (add)
map.put(sb, mf);
else
map.getUsing(sb, mf);
if (i << -7 == 0 && i % 2000000 == 0) // cheap bit test (i % 128 == 0) short-circuits the modulo; prints progress every 2M entries
System.out.println(i);
}
long time = System.nanoTime() - start;
System.out.printf("Map.size: %,d took an average of %,d ns to %s.%n",
map.size(), time / entries, add ? "add" : "get");
map.close();
}
}
interface MyFloats {
public void setValueAt(@MaxSize(6) int index, float f);
public float getValueAt(int index);
}
A data value generated type is better because 1) you don't need to convert from/to bytes yourself, 2) you can update the off-heap bytes directly, without ser/deser of the whole value and a repeated search by key (see the sketch below), and 3) it doesn't copy bytes on deserialization, only saves a reference to those bytes. If your value size is 24 bytes this might not matter much, but at larger sizes it makes some difference.
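As an illustration of point 2, a minimal sketch assuming the MyFloats interface above and a map of type ChronicleMap<CharSequence, MyFloats>; check the exact acquireUsing semantics in your version:
MyFloats mf = map.newValueInstance();
map.acquireUsing(key, mf);  // binds mf to the off-heap entry (creating it if absent)
mf.setValueAt(0, 1.5f);     // writes directly to the mapped memory, no ser/deser
float f = mf.getValueAt(0); // reads directly from the mapped memory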
int MAX_ENTRIES = 210000000; // I don't know the exact number before writing the map, and I also want to maintain a margin so that I can write to the same map in future
ChronicleMap<CharSequence, byte[]> map = ChronicleMapBuilder.of(CharSequence.class, byte[].class)
        .entries(MAX_ENTRIES)
        .entrySize(80)
        .createPersistedTo(file);
After writing:
---------
map.size() => 187,541,584
du -sh map.dat => 19GB
Lookup times (tried only a few keys) are still 6-7 ms
Key format:
<string_can_have_spaces><6_digit_int><varying_length_int>
example keys:
chronicle map1234561234567890
java345678637828292973
Value is fixed byte[24]
I understand that value types might perform better, but I want to see the lookup times in nanos rather than millis with the byte[24] values. Also, I'm not sure whether I need to clear any OS/disk caches on the machine, as I've been generating this map frequently (of course, I delete the old map before creating a new one); see the note on dropping caches below.
I'm still using 2.0.14b. Maybe I should wait until the next release and try this again.
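As an aside on the cache question above: on Linux you can drop the page cache before re-running the lookups. A hedged sketch (requires root; exact behaviour depends on your distribution):
sync
echo 3 > /proc/sys/vm/drop_caches   # drops page cache, dentries and inodes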
Thanks,
KP
There has already been a release since our discussion (16b); you can try it.
There must be a flaw either in your test, in the benchmarking method, or in our code, or else a problem with your machine. Peter posted an example above, quite close to your case, where put was taking 1,500 ns on average and get 600 ns.
Variable-sized keys/values cost additional bits and a few nanoseconds, but not milliseconds!
In the test I have done and provided, the latency is an average of 0.0014 ms, which I consider high, but nothing like 6 ms. Can you provide a reproducible test to demonstrate this, as I suspect you are testing something different to us?
import net.openhft.chronicle.map.ChronicleMap;
import net.openhft.chronicle.map.ChronicleMapBuilder;
import java.io.*;
import java.util.zip.GZIPInputStream;
public class CMapTest {
static int MAX_ENTRIES = 200000000;
ChronicleMap<CharSequence, byte[]> chMap = null;
public static final String inputPath = "/code/ChronicleMapTest/dataInput";
public static final String outputPath = "/code/ChronicleMapTest/dataOutput/cmapTest.dat";
public static void main(String[] args){
try {
new CMapTest().demoMap();
} catch (IOException e) {
e.printStackTrace();
}
}
public void demoMap() throws IOException {
chMap = createChronicleMap();
System.out.println("Map Size Before= " + chMap.size());
buildChronicleMap();
System.out.println("Map Size After = " + chMap.size());
chMap.close();
}
private boolean buildChronicleMap() throws IOException {
boolean status = false;
String line = null;
BufferedReader reader = null;
try {
int numOfProcessRecords = 0;
System.out.println("Processing input path " + inputPath);
for (Object inputFile : getFiles(inputPath)) {
System.out.println("Processing file " + inputFile.toString());
reader = getReader(inputFile);
int i = 0; boolean isProcessed = true;
long start = System.currentTimeMillis();
while ((line = reader.readLine()) != null) {
i++;
isProcessed = processRecord(line, i);
if(isProcessed) {
numOfProcessRecords++;
}
}
reader.close();
reader = null;
status = true;
System.out.println("No of lines processed: "+i+" in "+(System.currentTimeMillis()-start)/1000+" seconds");
}
if (numOfProcessRecords == 0) {
status = false;
}
System.out.println("numOfProcessRecords: "+numOfProcessRecords);
} catch (IOException e) {
status = false;
System.out.println("Failed building the map"+e);
} finally {
if (reader != null) {
try {
reader.close();
} catch (IOException e) {
System.out.println("Failed to close the reader"+e);
}
}
}
return status;
}
private boolean processRecord(String line, int lineNumber) throws IOException {
boolean status = true;
String[] parts = line.split("\t");
if (parts.length == 13) {
try {
if (parts[2].length() > 100){
return false;
}
float a = Float.parseFloat(parts[6]);
float b = Float.parseFloat(parts[5]);
float c = Float.parseFloat(parts[9]);
float d = Float.parseFloat(parts[8]);
float e = Float.parseFloat(parts[12]);
float f = Float.parseFloat(parts[11]);
add(createKey(parts[2], parts[0], parts[3]),
createValue(a, b, c, d, e, f));
} catch (Exception e) {
System.out.println("Failed to process record at lineNumber: "+lineNumber +" Record: "+ line);
e.printStackTrace();
}
} else {
status = false;
}
return status;
}
private ChronicleMap<CharSequence, byte[]> createChronicleMap() throws IOException {
File file = new File(outputPath);
ChronicleMap<CharSequence, byte[]> map = ChronicleMapBuilder.of(CharSequence.class, byte[].class)
.entries(MAX_ENTRIES)
// .keySize(41)
.entrySize(80)
.constantValueSizeBySample(new byte[24])
.createPersistedTo(file);
return map;
}
private void add(CharSequence key, byte[] val) throws IOException {
chMap.put(key, val);
}
private CharSequence createKey(String a, String b, String c) {
StringBuilder sb = new StringBuilder();
sb.append(a);
sb.append(b);
sb.append(c);
return sb.toString();
}
private byte[] createValue(float a, float b, float c, float d, float e, float f) {
byte[] value = new byte[4 + 4 + 4 + 4 + 4 + 4];
putInt(value, 0, Float.floatToRawIntBits(a));
putInt(value, 4, Float.floatToRawIntBits(b));
putInt(value, 8, Float.floatToRawIntBits(c));
putInt(value, 12, Float.floatToRawIntBits(d));
putInt(value, 16, Float.floatToRawIntBits(e));
putInt(value, 20, Float.floatToRawIntBits(f));
return value;
}
private void putInt(byte[] arr, int offset, int val) {
arr[offset] = (byte)((val >>> 24) & 0xFF);
arr[offset+1] = (byte)((val >>> 16) & 0xFF);
arr[offset+2] = (byte)((val >>> 8) & 0xFF);
arr[offset+3] = (byte)(val & 0xFF);
}
private Object[] getFiles(String dirName) throws IOException {
File dir = new File(dirName);
if (dir.isDirectory()) {
return dir.listFiles(new FilenameFilter() {
public boolean accept(File d, String name) {
return name.startsWith("part");
}
});
}
return null;
}
public BufferedReader getReader(Object inputFile) throws IOException {
InputStream is = new FileInputStream((File)inputFile);
if (is == null) {
return null;
}
return new BufferedReader(new InputStreamReader(new GZIPInputStream(is)));
}
}
There isn't a map.get() in this code.
Scanner in = new Scanner(System.in);
System.out.println("Key to lookup: ");
String key = in.nextLine();
ChronicleMap<CharSequence, byte[]> map = ChronicleMapBuilder.of(CharSequence.class, byte[].class)
        .keySize(50)
        .createPersistedTo(file);
byte[] using = new byte[24];
long start = System.nanoTime();
byte[] value = map.getUsing(key, using);
System.out.println("Lookup time: " + (System.nanoTime() - start) + " ns");
Why are you creating a new map for each getUsing operation? It seems you are measuring the map bootstrap :)
To be fair, taking 6-7 ms to reload a map from disk with 200 million entries is pretty good.
If you want to time how long get() takes, I suggest timing it repeatedly after the code has warmed up, e.g. running a test of at least 2 seconds multiple times (say 5 times) and taking the median (see the sketch below).
Btw, if you want to store 6 floats, I suggest using an off-heap reference with named fields, or otherwise a float[].
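A minimal sketch of that benchmarking approach; the map is opened once outside any timing, and keys, MAX_ENTRIES and file are placeholders rather than values from this thread:
ChronicleMap<CharSequence, byte[]> map = ChronicleMapBuilder.of(CharSequence.class, byte[].class)
        .entries(MAX_ENTRIES)
        .entrySize(80)
        .createPersistedTo(file);
byte[] using = new byte[24];
// warm up
for (int i = 0; i < 20_000; i++)
    map.getUsing(keys[i % keys.length], using);
long[] runs = new long[5];
for (int r = 0; r < 5; r++) {
    int ops = 0;
    long start = System.nanoTime();
    while (System.nanoTime() - start < 2_000_000_000L) { // ~2 second run
        map.getUsing(keys[ops % keys.length], using);
        ops++;
    }
    runs[r] = (System.nanoTime() - start) / ops; // average ns per get
}
java.util.Arrays.sort(runs);
System.out.println("Median get latency: " + runs[2] + " ns");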