As far as I understand, if the requested data is not found in DRAM (the DCPMM cache), the next step is to try to find it in DCPMM. That means in this case the total time to find the data will be the sum of the DRAM latency and the DCPMM latency. And if the requested data is not in DCPMM either, it must be read from disk; will it then be written to DRAM or to DCPMM? Can we read data directly into the CPU L1 cache, bypassing the L3/L2 CPU caches?
Anton
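(A minimal sketch of that lookup model in Python; the latencies below are made-up placeholders for illustration, not measured values for any real platform.)

# Toy model of a Memory Mode read, where DRAM acts as a cache in
# front of DCPMM. All latencies are illustrative placeholders.
DRAM_NS = 80        # assumed DRAM access latency, ns
DCPMM_NS = 300      # assumed DCPMM read latency, ns

def read_latency_ns(hit_in_dram: bool) -> int:
    """Time to return data to the CPU under the reading above."""
    if hit_in_dram:
        return DRAM_NS               # served from the DRAM cache
    # Miss: the DRAM cache is probed first, then DCPMM is read,
    # so both latencies are paid in sequence.
    return DRAM_NS + DCPMM_NS

print(read_latency_ns(True))    # 80
print(read_latency_ns(False))   # 380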
linux-4185:~ # numactl --cpunodebind=1 --membind=1 fio --filename=/dev/sda --rw=read --ioengine=sync --bs=128k --iodepth=1 --numjobs=1 --runtime=60 --group_reporting --name=perf_test
perf_test: (g=0): rw=read, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=sync, iodepth=1
fio-3.13-27-gef32d
Starting 1 process
Jobs: 1 (f=1): [R(1)][100.0%][r=2121MiB/s][r=16.0k IOPS][eta 00m:00s]
perf_test: (groupid=0, jobs=1): err= 0: pid=3315: Tue Apr 9 18:49:58 2019
read: IOPS=14.2k, BW=1777MiB/s (1863MB/s)(104GiB/60001msec)
clat (usec): min=22, max=2939, avg=70.07, stdev=122.08
lat (usec): min=22, max=2939, avg=70.10, stdev=122.08
So each time, due to the double miss, it reads data from DISK into DCPMM and then writes it to DRAM. That means the total latency should equal DISK latency + (2 x DRAM latency) + (2 x DCPMM latency). That's what I was asking about.
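Spelling that sum out (a rough Python sketch; all numbers are placeholders, since the real DRAM/DCPMM/disk latencies depend on the platform):

# Anton's double-miss accounting, with placeholder latencies in us.
DRAM_US = 0.08      # assumed DRAM access
DCPMM_US = 0.30     # assumed DCPMM access
DISK_US = 60.00     # assumed block-device read

# Miss in the DRAM cache, miss in DCPMM, read from disk, then the
# data is written back through DCPMM and DRAM on the fill path.
total_us = DISK_US + 2 * DRAM_US + 2 * DCPMM_US
print(round(total_us, 2))       # 60.76 with these assumptions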
|---------------------------------------||---------------------------------------|
|--             Socket 0              --||--             Socket 1              --|
|---------------------------------------||---------------------------------------|
|--     Memory Channel Monitoring     --||--     Memory Channel Monitoring     --|
|---------------------------------------||---------------------------------------|
|-- Mem Ch 0: Reads (MB/s):      4.91 --||-- Mem Ch 0: Reads (MB/s):    754.15 --|
|--           Writes(MB/s):      6.16 --||--           Writes(MB/s):   1128.36 --|
|--       PMM Reads(MB/s) :      0.00 --||--       PMM Reads(MB/s) :    371.25 --|
|--      PMM Writes(MB/s) :      0.00 --||--      PMM Writes(MB/s) :      0.00 --|
|-- Mem Ch 1: Reads (MB/s):      2.57 --||-- Mem Ch 1: Reads (MB/s):    753.91 --|
|--           Writes(MB/s):      3.12 --||--           Writes(MB/s):   1127.98 --|
|--       PMM Reads(MB/s) :      0.00 --||--       PMM Reads(MB/s) :    371.25 --|
|--      PMM Writes(MB/s) :      0.00 --||--      PMM Writes(MB/s) :      0.00 --|
|-- Mem Ch 2: Reads (MB/s):      2.28 --||-- Mem Ch 2: Reads (MB/s):    754.09 --|
|--           Writes(MB/s):      1.89 --||--           Writes(MB/s):   1128.32 --|
|--       PMM Reads(MB/s) :      0.00 --||--       PMM Reads(MB/s) :    371.25 --|
|--      PMM Writes(MB/s) :      0.00 --||--      PMM Writes(MB/s) :      0.00 --|
|-- Mem Ch 3: Reads (MB/s):      1.66 --||-- Mem Ch 3: Reads (MB/s):    754.86 --|
|--           Writes(MB/s):      1.81 --||--           Writes(MB/s):   1130.06 --|
|--       PMM Reads(MB/s) :      0.00 --||--       PMM Reads(MB/s) :    371.25 --|
|--      PMM Writes(MB/s) :      0.00 --||--      PMM Writes(MB/s) :      0.00 --|
|-- Mem Ch 4: Reads (MB/s):      2.17 --||-- Mem Ch 4: Reads (MB/s):    755.66 --|
|--           Writes(MB/s):      2.96 --||--           Writes(MB/s):   1131.09 --|
|--       PMM Reads(MB/s) :      0.00 --||--       PMM Reads(MB/s) :    371.25 --|
|--      PMM Writes(MB/s) :      0.00 --||--      PMM Writes(MB/s) :      0.00 --|
|-- Mem Ch 5: Reads (MB/s):      1.75 --||-- Mem Ch 5: Reads (MB/s):    756.39 --|
|--           Writes(MB/s):      2.17 --||--           Writes(MB/s):   1132.16 --|
|--       PMM Reads(MB/s) :      0.00 --||--       PMM Reads(MB/s) :    371.25 --|
|--      PMM Writes(MB/s) :      0.00 --||--      PMM Writes(MB/s) :      0.00 --|
|-- NODE 0 Mem Read (MB/s) :    15.34 --||-- NODE 1 Mem Read (MB/s) :  4529.07 --|
|-- NODE 0 Mem Write(MB/s) :    18.10 --||-- NODE 1 Mem Write(MB/s) :  6777.98 --|
|-- NODE 0 PMM Read (MB/s):      0.00 --||-- NODE 1 PMM Read (MB/s):   2227.51 --|
|-- NODE 0 PMM Write(MB/s):      0.00 --||-- NODE 1 PMM Write(MB/s):      0.00 --|
|-- NODE 0.0 NM read hit rate :  0.82 --||-- NODE 1.0 NM read hit rate :  0.51 --|
|-- NODE 0.1 NM read hit rate :  0.71 --||-- NODE 1.1 NM read hit rate :  0.51 --|
|-- NODE 0 Memory (MB/s):       33.45 --||-- NODE 1 Memory (MB/s):    13534.56 --|
|---------------------------------------||---------------------------------------|
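As a sanity check on the monitor output, the per-channel rates do add up to the node totals (quick Python, with the Socket 1 numbers typed in from the table above):

# Socket 1 per-channel rates (MB/s), copied from the monitor output.
reads = [754.15, 753.91, 754.09, 754.86, 755.66, 756.39]
writes = [1128.36, 1127.98, 1128.32, 1130.06, 1131.09, 1132.16]
pmm_reads = [371.25] * 6

print(round(sum(reads), 2))      # 4529.06  vs. reported NODE 1 Mem Read  4529.07
print(round(sum(writes), 2))     # 6777.97  vs. reported NODE 1 Mem Write 6777.98
print(round(sum(pmm_reads), 2))  # 2227.5   vs. reported NODE 1 PMM Read  2227.51

Note also that NODE 1 DRAM writes (~6778 MB/s) are roughly DRAM reads plus PMM reads (4529 + 2228 = 6757 MB/s), consistent with misses being filled from DCPMM into the DRAM cache.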
Hi Otto,
I have been working on pmem solutions since they appeared on the market three years ago, such as NVDIMM-N, Scalable Persistent Memory, etc. Yes, I understand that pmem is something completely new. But I'm talking, for example, about the simplest workloads, such as non-DAX 4k random reads/writes. Publishing incorrect results will just confuse people.
Anton
--
Amnon Izhar
Hi Andy,
I have a few questions:
"In the current Intel® Optane™ DC persistent memory product, there is only one DRAM access on a cache read hit because the data & tags are arranged so that they can be fetched together via a single DDR transaction."
1. The above answer to Amnon's question implies that the cache tags are indeed stored in the DRAM; am I right?
2. Can you please help me understand how it is possible to fetch both data and tags together via a single DDR transaction? AFAIK, every DDR transaction fetches 64 bytes (BL=8), i.e. one complete cache line. How do you stuff extra bits in there to include the cache tags? I can only think of one way: use x72 ECC DRAM DIMMs and repurpose the ECC bits as cache tags. In that case, each 8-cycle burst gives you an extra 64 bits for use as cache tags. However, you lose ECC capability. Am I right? (Some back-of-the-envelope arithmetic on questions 2 and 4 follows below.)
"Memory mode uses Optane DC to expand main memory capacity without persistence. It combines a Optane DCPMM with a conventional DRAM DIMM that serves as a direct-mapped cache for the Optane DC PMM. The cacheblock size is 4 KB, and the CPU’s memory controller manages the cache transparently."
3. The above quote appears on page 5 of the NVSL paper (https://arxiv.org/pdf/1903.05714.pdf). Since the CPU's L1/L2/L3 caches all have a block size of 64 bytes, I infer that on a read miss, 4KB (one block) is fetched from Optane and placed in the DRAM cache only. In other words, it is more like a transparent memory-side cache than a traditional L4 cache that also participates in the processor's cache-coherency protocols. Am I right?
4. Won't fetching a large 4KB block and placing it in DRAM on every read miss hog the memory channel, considering that Optane reads are much slower than DRAM? And, upon eventual eviction from the DRAM cache, won't it take a very long time to write 4KB blocks back to Optane, since writes are even slower than reads?
5. And if the program does not take advantage of spatial locality, won't all that prefetched data go to waste?
Thanks!
KH
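Some back-of-the-envelope arithmetic behind questions 2 and 4 (a Python sketch; the module sizes and Optane bandwidth are purely hypothetical inputs, and the ECC-repurposing idea is KH's hypothesis, not a confirmed layout):

import math

# Q2: tag bits needed by a direct-mapped cache with 4 KB blocks.
# Capacities are hypothetical; real module sizes vary.
dcpmm_bytes = 512 << 30    # 512 GiB of DCPMM behind the cache
dram_bytes = 64 << 30      # 64 GiB of DRAM acting as the cache
block = 4096               # 4 KB cache block (per the NVSL paper)

sets = dram_bytes // block
tag_bits = math.ceil(math.log2(dcpmm_bytes // block)) \
         - math.ceil(math.log2(sets))
print(tag_bits)            # 3 tag bits per block at these sizes

# A x72 ECC DIMM carries 8 extra bits per 64-bit beat, i.e. 64 spare
# bits per 64-byte burst, so a few tag bits would fit easily; whether
# Intel actually does this is not public (see Andy's reply below).

# Q4: time a 4 KB cache fill occupies the Optane DIMM, at an assumed
# sequential read bandwidth.
optane_read_gbs = 6.0      # hypothetical GB/s
fill_us = block / (optane_read_gbs * 1e9) * 1e6
print(round(fill_us, 2))   # 0.68 us per 4 KB fill with this assumption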
Thought I would share some basic price research, thinking of this availability question.
The best summary I've got is from (as usual) AnandTech, here:
https://www.anandtech.com/show/14146/intel-xeon-scalable-cascade-lake-deep-dive-now-with-optane
which shows Intel has moved high-density memory (DRAM and Optane) into upcharges in the Xeon line.
The key sentences:
    Out of the new letter configurations, users will notice that the no-letter designation now has support for 1.5 TB of memory, double the first generation. This goes up to 2 TB for M, and 4.5 TB for L. These values include parts that are fitted with Optane, and as such an 'L' CPU can support 3.0 TB of Optane plus 1.5 TB of DDR4 memory, to give that 4.5 TB total.
So if you want 3 TB of Optane, you have to make sure you get an L part. Anand doesn't say how much a base-level CPU can handle, whether anything that CAN do Optane can do 1.5 TB, or whether there are different values (like 0.5 TB for a base part). Still looking for that info.
Further down in the list, you see the price schedule, and L-rated parts carry a serious premium. The cheapest is the 5215L, which Anand shows as a $9119 part, compared to the base models, which are in the $1500 to $2k range, and the M, which is $4k.
This also shows that there are some Cascade Lakes without Optane DIMM support, but it's mostly the "Silver" and "Bronze" parts; Gold and better should do it.
And there is some leaked pricing (Anand again; I can't say whether it is correct or not, because Intel never told me):
https://www.anandtech.com/show/14180/pricing-of-intels-optane-dc-persistent-memory-modules-leaks
Regarding doing your own build to build a database: good luck! When I built what became Aerospike (which is an open-source database that supports PMEM), I had to spend about $25k in 2008, out of my own pocket, on a server substantial enough to prove 100x faster than Oracle. I hope you wouldn't expect starting a new database to be cheap.
-brian
First, I'll reiterate that a memory-side cache hit means exactly one fetch from DDR. The details of where all the required information lives are not public, but the cache lines do still have ECC protection. I'll point out that there are many ways to do this, analogous to how directory information is stored for each cache line on various CPU architectures; the details of Intel's exact layout are just not currently public.
Your characterization of the cache as a "transparent memory-side cache" is exactly right.
No, I am not sure; I am quoting AnandTech, which may be wrong. I look forward to better information.
The problem with the ARK statements is that it's unclear whether "memory" means DRAM or Optane + DRAM. The Anand article (which already seems faulty) appears to imply otherwise.
The ARK page says "Max Memory Size (dependent on memory type)". Are the types DRAM and Optane, and does the max depend on whether it is Optane?
Reading the ARK statement itself, it says literally: "Intel® Optane™ DC persistent memory is a revolutionary tier of non-volatile memory that sits between memory and storage to provide large, affordable memory capacity that is comparable to DRAM performance."
That sentence is wrong on its face: it doesn't sit between anything and anything, it's right on the bus. And it can't be memory that sits between memory and storage; that's just a logically impossible statement.
It would be nice if ARK could say whether "Max Memory Size" includes Optane or not.
I look forward to less confusion ....
-brian