New cache implementation, and new versioned namespace


Brent Burley

May 14, 2015, 3:54:54 PM
I've pushed a fairly big update to the newcache branch.  The latest version is tagged v2.1.5 and is intended to become the new master version once some platform-specific loose ends have been tied up - more below.

The main feature is the new cache implementation.  PtexCache was entirely rewritten to improve performance, especially when the cache is shared among threads.  We have been using this in our Hyperion renderer for a few months now with a single shared cache.  We have also ported all of our other products that use Ptex to this new implementation without incident.

Highlights of the new cache:
* A standalone production-scale test (reproducing filtered texture accesses from a render of Hiro's bedroom) using a shared cache ran 17x faster than the previous per-thread cache, and 171x faster than the previous shared cache.  In Hyperion, the typical improvement was closer to 10% in overall render time, but the bigger gains are the memory reduction from sharing the cache and the reduced I/O load on the network.
* The global cache lock has been eliminated (and the use of locks in general has been greatly reduced).  In particular, the new cache is lock-free (and wait-free) when reading data that is already present in the cache, requiring just a couple of atomics to update the LRU list.
* Cached data has been decoupled from file handles.  If a file handle is released (due to reaching the maxFiles limit), the data is now kept.  Only when data is needed that isn't in cache will the file be reopened.
* Re-opening a file is very cheap.  The header checksum is compared to the in-memory representation and if they match, all the previously cached data is kept intact.  Based on this, the maxFiles limit can be made quite small without incident.
* Cache granularity is now much coarser.  The LRU list now tracks PtexTextures, not individual data items; if a texture is evicted, all data for that texture is evicted.  This reduces the need for locking and reduces the LRU overhead.
* LRU tracking is managed via work queues to avoid contention.

In addition to the new cache, there are some other notable features:
* The entire Ptex API and library are wrapped in a versioned namespace.  This namespace includes an optional vendor token to allow insulation from vendor-specific changes.
* The stats are now live (not just in the debug build) and have been added to the API.  Stats can be polled asynchronously.  Memory accounting is much improved.
* A number of minor optimizations were made.

The remaining "to do" item is that the Windows and Mac atomics still need to be implemented.  These are AtomicIncrement/Add, AtomicCompareAndSwap, AtomicStore, and MemoryFence.
