When to use pmemobj_persist

S Kannoth

Aug 12, 2022, 5:51:53 AM
to pmem
Hi Guys,

I am new to PMEM and I am using libpmemobj for my development. I do not have the hardware, so I am experimenting with PMDK on my PC. I have a question about the simple example below.

int main() {
  PMEMobjpool* pop1 = pmemobj_create("/tmp/mypool_int_1", LAYOUT_NAME_1, PMEMOBJ_MIN_POOL, 0666);

  PMEMoid root_int = pmemobj_root(pop1, sizeof(int));
  int* root_int_p = pmemobj_direct(root_int);

  *root_int_p = 65535;

  /* Persist the int itself: pass root_int_p, not the address of the pointer variable. */
  pmemobj_persist(pop1, root_int_p, sizeof(int));
  pmemobj_close(pop1);

  return 0;
}

What will happen if we do not call the pmemobj_persist? Would the data already written with *root_int_p = 65535; be already persisted? Since we are writing directly into the PM address with Persistent Memory Direct Access, should I always invoke this?

The man page says "pmemobj_persist() forces any changes in the range [addr, addr+len) to be stored durably in persistent memory"

What do you mean by durably stored in persistent memory?

Cheers, Kannoth

steve.s...@gmail.com

Aug 12, 2022, 11:01:24 AM
to pmem
Hi Kannoth,

>> What do you mean by durably stored in persistent memory?

If you haven't already, grab yourself a free copy of 'Programming Persistent Memory: A Comprehensive Guide for Developers' from https://pmem.io/books/. Chapter 2 describes the platform support and the power-fail protected domain. In summary, platforms must provide Asynchronous DRAM Refresh (ADR), which defines the power-fail protected domain. That domain includes the memory controller's write pending queue (WPQ) and the media, but not the CPU caches. If the system crashes or loses power, any data still sitting in the CPU caches is lost, which can result in data corruption or loss. libpmemobj uses atomic database-like transactions with an undo log to protect against this failure scenario.

>> What will happen if we do not call the pmemobj_persist? 

If you do not flush, you run the risk of losing data at critical points within your code upon system failure.

>> Since we are writing directly into the PM address with Persistent Memory Direct Access, should I always invoke this?

This is a temporal write (through the CPU caches), so you're not writing directly to the PMem device. Your options are either forcing the data to be written to PMem with a call to pmemobj_persist() or allowing the CPU to naturally flush dirty data to PMem.

>> should I always invoke this?

On an ADR platform, yes. Flush/persist at critical points in your code, but don't go crazy. Flushing is an expensive operation (in CPU cycles), and depending on your platform the flush is done with either CLFLUSHOPT, which flushes the data from the CPU caches and invalidates the cache line, or CLWB, which flushes and does not invalidate. libpmemobj makes the correct call based on your platform features.

Flushing too frequently has a performance penalty. Not flushing frequently enough runs the risk of data loss.

>> Would the data already written with *root_int_p = 65535; be already persisted? 

As described in the previous answers, the answer is "maybe", but at that specific line in the code, likely not. It all depends on whether the CPU has already flushed the data from the caches or you have called *_persist().

Enhanced ADR (eADR) includes the CPU caches within the power-fail protected domain, so you no longer need to flush because the hardware ensures the data is written out. However, only the HPE Superdome actually implemented this feature for Cascade Lake, so it's not available on your typical 1- or 2-socket system (or the 4- or 8-socket ones). You can read more about ADR and eADR in the book and in "eADR: New Opportunities for Persistent Memory Applications". Upcoming Fast ADR (Sapphire Rapids) and Global ADR (CXL) platform features will negate the need to flush/persist, but it's up to the platform vendors to implement them, as they require significantly more stored energy than your typical PSU can provide.

HTH


Wu, Dennis

Sep 7, 2022, 9:09:10 PM
to S Kannoth, pmem

libpmem2 and libpmemobj are libraries at two different levels. libpmem2 only deals with how to make the data persistent; the application itself has to handle power-fail consistency and metadata management.

libpmemobj provides ACID semantics through its undo/redo logs and helps you manage the data, but its performance is not as good.

Another option is to use Storage over App Direct (AD), which configures the PMem as fast, low-latency block storage (like an SSD) where you can store your snapshots. With the current Storage over AD, two kernel patches can improve the performance:

[PATCH] ACPI/NFIT: Add no_deepflush param to dynamic control flush operation & [PATCH] BTT: Use dram freelist and remove bflog to otpimize perf

 

From: pm...@googlegroups.com <pm...@googlegroups.com> On Behalf Of S Kannoth
Sent: Thursday, September 8, 2022 4:37 AM
To: pmem <pm...@googlegroups.com>
Subject: Re: When to use pmemobj_persist

 

Hi Steve,

 

Thanks for the detailed reply and for pointing me to the PMEM programming book :) . I appreciate it.

In fact, my usage of PMEM is a little different. It would be great to get your suggestions on the following questions as well.

 

I have an application which uses the classic posix_memalign() function for allocations. Now I would like to extend the application with a snapshotting capability using PMEM in Filesystem DAX mode. This means I need an allocator for PMEM whose data persists across application restarts. Since my application is distributed, it might need to write to remote PMEM as well. My first consideration was libmemkind. But I learned that libmemkind can only be used in volatile mode, so I cannot access the data again after a restart.

 

My initial considerations were the libraries below.

1) libpmemobj and librpmem (for remote reads/writes) - Since I see on GitHub that librpmem is deprecated, this ended up not being an option.

2) libpmem2 and librpma (for remote)  -  What do you think about this?

I see there are quite a lot of differences between the APIs of libpmemobj and libpmem2. In particular, libpmemobj uses the term pool and stores the data in pools, which I do not see with libpmem2, where we use files and memory-map them.

 

I modified the examples from GitHub and created the naive allocator below. Do you think this is too simple to be an allocator in practical applications? If so, what do you think it lacks, and what are your suggestions?

 

...

 

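typedef struct {
   struct pmem2_config *cfg;
   struct pmem2_map *map;
   struct pmem2_source *src;
   pmem2_persist_fn persist;

   const char *path;
} naive_pmem_desc_t;

/* desc is passed by pointer so that the handles created here are still
   available to naive_pmem_free(). */
void *naive_pmem_malloc(size_t size, naive_pmem_desc_t *desc)
{
  int fd;

  /* O_CREAT requires a mode argument. */
  if ((fd = open(desc->path, O_CREAT | O_RDWR, 0666)) < 0)
    return NULL;

  /* posix_fallocate() returns 0 on success and an error number on failure. */
  if (posix_fallocate(fd, 0, size) != 0)
    goto err_fd;

  if (pmem2_config_new(&desc->cfg))
    goto err_fd;

  if (pmem2_config_set_required_store_granularity(desc->cfg, PMEM2_GRANULARITY_PAGE))
    goto err_cfg;

  if (pmem2_source_from_fd(&desc->src, fd))
    goto err_cfg;

  if (pmem2_map_new(&desc->map, desc->cfg, desc->src))
    goto err_src;

  close(fd);
  return pmem2_map_get_address(desc->map);

  /* Release what was acquired so far instead of leaking it. */
err_src:
  pmem2_source_delete(&desc->src);
err_cfg:
  pmem2_config_delete(&desc->cfg);
err_fd:
  close(fd);
  return NULL;
}

void naive_pmem_free(naive_pmem_desc_t *desc)
{
   pmem2_map_delete(&desc->map);
   pmem2_source_delete(&desc->src);
   pmem2_config_delete(&desc->cfg);

   // Delete the file
   unlink(desc->path);
}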

 

...

 

 

Cheers

~Kannoth
