July 9 SG5 Webex

Michael Wong

Jul 6, 2018, 2:16:45 PM

to SG5 - Transactional Memory, Herb Sutter, Paul McKenney

Start Time: Monday, July 9, 2018, 12:00 PM US Pacific Time (07:00 PM in GMT)
End Time: 1:00 PM US Pacific Time (duration: one hour)

https://codeplay.webex.com/codeplay/j.php?MTID=m38839aecbf655d241535c2fa02dab445

https://groups.google.com/a/isocpp.org/forum/#!topic/tm/1d9ZzKHfLUo

With large numbers of participants, audio interference can be a problem. Please try to keep
your phone muted whenever possible. If your phone does not have a mute
button, the bridge will mute or un-mute your line if you dial *6.

The current secretary rota list is (the person who took notes at the last meeting is moved to the end)

Maged, Jens, Victor, Hans, Michael Scott, Michael Spear, Michael W

Agenda:

1. Opening and introductions

1.1 Roll call of participants

1.2 Adopt agenda

1.3 Approve minutes from previous meeting, and approve publishing previously approved minutes to ISOCPP.org

1.4 Review action items from previous meeting (5 min)

1.5 Call schedules (please add your away days)

2. Main issues (50 min)

2.1 Continue discussion on future of TM logistics

Continue on minimal TM lite proposal based on Herb Sutter's ideas (Herb to come on)

Tim Sweeney has been occasionally pinging us or talking about us and has replied to our requests:

https://groups.google.com/a/isocpp.org/forum/#!topic/tm/60J04S7vaB4

From Jan 29 meeting:

https://groups.google.com/a/isocpp.org/forum/#!topic/tm/9RdEvSzP_1c

In my view, this is going the wrong way. Here's a summary of reasons, and an alternative path forward.

First, let's recognize that the following topics are closely linked:

- Transactional memory (how do we track all reads and writes to all shared memory?)

- Persistence (how do we allocate, find, persist, and manage data long-term without corruption?)

- Garbage collection (how do we find out what memory is actively being used?)

- ABI (how do we provide interface and data backwards-compatibility over multiple program invocations and even across platforms?)

- Reflection (what is the format of all of our data?)

There are several different ways that SG5 could approach this topic.

The current approach is to try to expose transactional memory at the language level. This is difficult, expensive, and not fully orthogonal to the other topics above. More generally, it seems aloof to the C++ way, which is to expose general abstractions to programmers so we can implement specific features. Examples of abstractions include functions (1960), templates (1990), and reflection (2020?).

My view is: Give us a great reflection spec, and we'll do the other things ourselves in libraries. Doing this in libraries would be a good thing because:

- Developers can experiment and discover what works best, as opposed to mandating a solution that's only roughly prototyped.

- Designing transactional, persistent, garbage-collected, binary-forward-compatible containers leads to very different designs than those in std.

- New transactional, persistent, garbage-collected code will need to coexist and interoperate with existing libraries, so fine-grained control will be needed -- which is natural with a library solution to these problems, and unnatural with cross-cutting language features.

- Reflection provides the full toolset needed to build the features above. Transactions via new templated container types; persistence via anything from serialization to patching memory to upgrade versions in-place; garbage collection via metadata; and ABI compatibility by automatically creating forward-compatible wrappers and adapters.

A minimalist alternative for SG5 is to simply bless (via std extensions) the kind of accelerated-but-not-guaranteed restricted transactional memory of Intel's TSX and similar related proposals. These are well-understood low-level features that libraries can build on to implement full transactional memory and the other things, on an opt-in basis.

From Tim Sweeney to SG7 reflection on 12/28/17

https://mail.google.com/mail/u/0/#search/tim.sweeney%40epicgames.com/1609fe1984b318f0

With just reflection, and no reliance on a future generative C++ proposal, we can generate specialized functions that mimic the behavior of constructors and destructors but are customized for special usage cases. For example: a "deserializing constructor" to generate a new instance of a class from a stream, or helper constructors for optimized garbage-collection schemes.

One thing we can't do with reflection alone is member-specific customization of smart pointers to classes. For example, given "gc_smart_pointer<t> p", we can implement "gc_smart_pointer<t>::operator->()", but it has to behave uniformly for all types.

Could we have a per-class overloadable templated variant of operator->(t&) which receives a meta object describing the particular member being accessed? Then it can customize its behavior according to the type and member being accessed. This would be useful for optimized garbage-collection schemes (where accessing a POD can be optimized compared to a garbage-collector-managed type); software transactional memory schemes (which would like to store data in a class as a simple type, but access it using a wrapper type), marshaling layers that connect C++ to scripting languages; etc.

From Tim Sweeney to SG7

https://mail.google.com/mail/u/0/#search/tim.sweeney%40epicgames.com/1606babe40ba17bc

Will P0194 be extended to support lambdas, and specifically reflecting on the number and type of lambda captures?

Reflecting on lambda captures is critically important in the case of implementing a garbage collector on top of standard C++, without hardcoding knowledge of memory layout or other things.

Background

In standard C++, a general-purpose garbage collector can be implemented on top of smart pointers with reference counting. Any allocation with a nonzero reference count is treated as a GC root. To get from this starting point to real garbage collection, we can provide a mechanism for certain types (such as containers) which are themselves heap-allocated and reference counted, to release the reference counts of their contents once they're initialized.

This can be automated by replacing "new t(parms)" with "newref<t>(parms)", which allocates memory, calls a constructor, and ensures smart pointers release their reference counts immediately rather than in their destructor. This approach breaks the reference-counting cycles for those heap-allocated types, while ensuring everything on the stack remains a GC root.

Using this approach, I have a neat concurrent, nonblocking garbage collector up and running on top of standard C++17. Without static reflection, this requires manually implementing reference-count-releasing functions for essential types.

With C++2a, reflection could make it completely automatic. So, instead of using raw pointers and new, you use a smart pointer and newref, and get free, safe GC within standard C++. For this to work well, we'd need to reflect lambda captures. If we can't reflect lambda captures, then we are nearly certain that they'll be held forever due to reference-counting cycles, because a lambda's purpose is often to manipulate an object it's stored in. Thus the lambda pins the object, and the object contains the lambda, so it's never released. Eager functional languages like ML require garbage collection solely because of these cyclic references between containers and lambdas within them.
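
The container/lambda cycle described above can be reproduced with plain std::shared_ptr. The following is a minimal sketch, not Tim's collector: Widget, make_cyclic, and make_acyclic are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <functional>
#include <memory>

// Hypothetical object that stores a callback, as a UI element or task might.
struct Widget {
    int clicks = 0;
    std::function<void()> on_click;
};

// The stored lambda captures a strong reference to the widget that owns it.
// Once the local reference goes away, the widget is still kept alive by its
// own callback: the lambda pins the object and the object contains the lambda.
std::weak_ptr<Widget> make_cyclic() {
    auto w = std::make_shared<Widget>();
    w->on_click = [w] { ++w->clicks; };
    return std::weak_ptr<Widget>(w);
}

// Same shape, but the capture is weak, so no ownership cycle is formed and
// the widget is destroyed when the local reference goes out of scope.
std::weak_ptr<Widget> make_acyclic() {
    auto w = std::make_shared<Widget>();
    std::weak_ptr<Widget> self = w;
    w->on_click = [self] {
        if (auto s = self.lock()) ++s->clicks;
    };
    return self;
}
```

The observer returned by make_cyclic() does not expire until someone clears on_click by hand, whereas the one from make_acyclic() has already expired on return. A collector that could enumerate the lambda's captures would be able to see and break the first cycle without the manual weak capture.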

Aside: Garbage Collection in Future C++

I believe C++ will fundamentally require concurrent GC in order to scale to many-threaded programs with complex data dependencies and asynchronous execution. Objects, lambdas, and futures interact in so many subtle ways that manual memory management seems intractable. (Unreal Engine has relied on a hand-coded C++ garbage collector since 1998.)

I feel the N2670 garbage collection track is neither tenable nor desirable. C++ isn't about heavyweight runtime plumbing; it's about giving the programmer control, and relying on standard and user libraries to solve common problems. Give us a thorough version of P0194 and we'll have garbage collection soon enough. And it won't be a conservative kluge that stops all threads and scans all stacks and memory looking for pointer-like things; it will be a standard C++ implementation which users can opt-in to, while remaining safe and composable with all other libraries, whether they use GC or not.

Axel Naumann <Axel.N...@cern.ch>

12/19/17

to reflection

Hi Tim,

On 18.12.17 23:08, tim.s...@epicgames.com wrote:

Will P0194 be extended to support lambdas, and specifically reflecting on the number and type of lambda captures?

Reflecting on lambda captures is critically important in the case of implementing a garbage collector on top of standard C++, without hardcoding knowledge of memory layout or other things.

Wow, super-interesting. I'd love to see a CppCon talk on that, to better understand that idea.

And while P0194 excludes lambdas, its follow-up paper P0670 includes them and will appear in LEWG + EWG for the next C++ standards meeting, i.e. it's on track to catch up with P0194.

Could you check whether it does what you need?

Cheers, Axel.

tim.s...@epicgames.com

12/20/17

to SG

Thanks for P0670! Its coverage of static reflection on functions and lambdas in particular looks like exactly what's needed for exposing lambda captures to a garbage collector.

As an aside, I agree with the importance of exposing function parameter names. These will all be important in doc tools, RPC frameworks, and marshalling from scripting languages built on C++ reflection.

From Herb Sutter:

I just got this in email from Tim Sweeney:

Transactional memory: Supporting this natively is totally crazy in C++, which has far too much low-level mutable state by default and will be hopelessly inefficient, and uses mutable containers whose internal implementations (reading and writing lots of state) will create vast false conflicts. Rather, we should approach this two ways: ISO C++ should quickly adopt and expose failable Intel TSX-style transactions for small, low-level operations; …

This matches my encouragement to the group to please consider “small local transactions” that are just some small fixed number of memory-only operations… that avoids the entire composability/annotation problem and immediately enables a whole class of lock-free data structures that need (only) multi-word/non-contiguous CAS.
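
To make "multi-word/non-contiguous CAS" concrete, here is a minimal sketch of a two-word compare-and-swap. The names dcas and g_tx_lock are hypothetical and not part of any proposal. The body is the kind of small, memory-only operation a "small local transaction" would run; as a stand-in, this sketch executes it under a single global mutex, which is an assumption of the sketch rather than what an HTM-backed implementation would do.

```cpp
#include <cassert>
#include <mutex>

// Stand-in for the transaction mechanism (assumption: one global lock).
inline std::mutex g_tx_lock;

// Hypothetical double compare-and-swap on two independent, possibly
// non-contiguous words. Both words are updated only if both still hold
// their expected values; otherwise neither is modified.
template <class T>
bool dcas(T& a, T expect_a, T desired_a,
          T& b, T expect_b, T desired_b) {
    std::lock_guard<std::mutex> guard(g_tx_lock);
    if (a != expect_a || b != expect_b) return false;
    a = desired_a;
    b = desired_b;
    return true;
}
```

For this to be atomic with respect to other threads, every access to the two words has to go through the same mechanism; a plain unsynchronized read elsewhere would still race with it.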

2.2: Interaction with Executors and Synchronized proposal

https://groups.google.com/a/isocpp.org/forum/#!topic/tm/jG9XPJetNkc

The last discussion has us considering an alternative lambda form.

See Paper emailed out on Lambda proposal

https://docs.google.com/document/d/1ICmcrCdigq3ataoM2Jl7m19h_Sa3aE3KfU6AVkPyT-4/edit#

2.3 future issues list:

1. llvm synchronized blocks
2. more smart ptrs? How fast can atomics and smart ptrs be outside tx if they have to interact with tx (for a world that does not care about tx)? The atomic nature of smart ptrs as a way towards atomics inside atomic blocks
3. more papers?
4. Issue 1-4 paper updates to current TM spec
5. std library

2.4 Discuss defects if any work done since last call
Issue 1: https://groups.google.com/a/isocpp.org/forum/#!topic/tm/SMVEiVLbdig
Issue 2: https://groups.google.com/a/isocpp.org/forum/#!topic/tm/Th7IFxFuIYo
Issue 3: https://groups.google.com/a/isocpp.org/forum/#!topic/tm/CXBycK3kgo0
Issue 4: https://groups.google.com/a/isocpp.org/forum/#!topic/tm/Ood8sP1jbCQ

3. Any other business

4. Review

4.1 Review and approve resolutions and issues [e.g., changes to SG's working draft]
N4513 is the official working draft (these links may not be active yet until ISO posts these documents)
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4513.pdf

N4514 is the published PDTS:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4514.pdf

N4515 is the Editor's report:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4514.html

GitHub is where the latest repository is (I have updated for latest PDTS published draft from post-Lenexa):
https://github.com/cplusplus/transactional-memory-ts

Bugzilla for filing bugs against TS:
https://issues.isocpp.org/describecomponents.cgi

4.2 Future backlog discussions:

4.2.1 Write up guidance for TM compatibility for when TM is included in C++ standard (SG5)

4.2.2 Continue Retry discussion
https://groups.google.com/a/isocpp.org/forum/?hl=en&fromgroups#!topic/tm/qB1Ib__PFfc
https://groups.google.com/a/isocpp.org/forum/#!topic/tm/7JsuXIH4Z_A

4.2.3 Issue 3 follow-up

Jens to follow up to see if anything needs to be done for Issue 3.

4.2.5 Future C++ Std meetings:

2018 06-04 RAP C++ Std meeting

4.3 Review action items (5 min)

5. Closing process

5.1 Establish next agenda

5.2 Future meeting
Next call: TBD

Tim Sweeney

Jul 6, 2018, 8:24:59 PM

to t...@isocpp.org, Herb Sutter, Paul McKenney

Thanks Michael.

I’ve reviewed the “TM lite” proposal and love it as a minimalist abstraction for executing code under Intel TSX style limited transactional memory where it’s available, and falling back to execution under a global lock.

That’s exactly what low-level implementors and library writers need for creating various concurrency abstractions and also high-level frameworks for STM “in the large”, which is a very complex and research-level topic at this point, definitely not standards-track stuff.

My one suggestion is for std::tm_synchronized to take an optional lock variable (bool* or whatever) so that updates with a known scope can execute the fallback under a lock that’s narrower than a global lock.
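
A rough sketch of what this suggestion might look like, showing only the lock-based fallback path. The real signature of std::tm_synchronized is not given in this thread, so everything below lives in a hypothetical sketch namespace; using std::mutex as the "lock variable" is also an assumption (the message says "bool* or whatever").

```cpp
#include <cassert>
#include <mutex>
#include <utility>

namespace sketch {

// Stand-in for the global lock the fallback would use (assumption).
inline std::mutex global_tm_lock;

// Fallback path only: run f under the single global lock.
template <class F>
decltype(auto) tm_synchronized(F&& f) {
    std::lock_guard<std::mutex> guard(global_tm_lock);
    return std::forward<F>(f)();
}

// Suggested variant: the caller passes a lock with a narrower scope, so
// unrelated updates do not serialize against each other in the fallback.
template <class F>
decltype(auto) tm_synchronized(std::mutex& scope, F&& f) {
    std::lock_guard<std::mutex> guard(scope);
    return std::forward<F>(f)();
}

}  // namespace sketch
```

With the second overload, two data structures guarded by different mutexes can run their fallbacks concurrently; the cost is that the caller must use the same lock consistently for the same data.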

Tim


Michael Spear

Jul 9, 2018, 12:00:18 AM

to t...@isocpp.org, Herb Sutter, Paul McKenney

Hi Tim,

Thanks for your feedback! With regard to your suggestion of an optional lock variable, we have discussed this idea before, and there is a lot of merit to it, but also a few pitfalls. I'll describe the two biggest ones below:

First, some current or future HTM may not interface nicely with the OS's lock implementation. Researchers have shown ways to use TSX with futexes in Linux. I'm not sure about other OSes or IBM's HTMs. There does exist a general solution by changing the implementation of std::mutex: in essence you move most of the scheduling mechanics into userspace by making std::mutex a queue of per-thread semaphores. But then the TM proposal stops being self-contained.

Second, there is a programmability argument. Once one part of the program uses a scoped transaction, then the rest of the program needs to comply with that scoping rule. There are two issues that can arise. The first is that it becomes easier to write racy code. The second is that an attempt to elide a set of locks can result in poor performance: it's harder for an implementation to get the progress guarantees right, each subscribed lock increases the transaction's contention footprint, and each subscribed lock decreases the transaction's capacity.

If I remember correctly, our conclusion was "we could probably do scoped transactions, but maybe that should wait until v2, so that we can move forward with a specification that is as lightweight as possible".

Would that approach be reasonable, or do you think we may have over-estimated the downsides and under-estimated the benefits of scoped transactions?

- Mike

Tim Sweeney

Jul 9, 2018, 2:40:14 AM

to t...@isocpp.org, Herb Sutter, Paul McKenney

Thanks for explaining the complexity of supporting a user-specified lock. That makes sense.

However, lacking that, isn’t it essential to provide a lower-level version of the API which just uses the functionality which TSX-style CPU features expose? E.g. try to run arbitrary code atomically and either succeed or revert its changes and tell the user it failed.

How about a lower-level, no-lock version of this function which takes a lambda returning t, tries to run it atomically, and returns std::optional<t> indicating either success or failure-and-reversion? If nothing like TSX is available on the target, it would do nothing and return failure. Given this primitive, users can write code exploring the full capability of HTM and implement their own fallbacks, which can be much less costly than a global lock.
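
The shape of such a primitive could be sketched as follows. The names try_atomically and atomically_or_lock are hypothetical. The stub models only the "nothing like TSX is available" case described above, so it never runs the callable and always reports failure; a real implementation would attempt a hardware transaction at that point.

```cpp
#include <cassert>
#include <mutex>
#include <optional>
#include <type_traits>
#include <utility>

namespace sketch {

// Hypothetical primitive: try to run f as one hardware transaction.
// Portable stub for a target without HTM: does nothing, returns failure.
template <class F>
auto try_atomically(F&&) -> std::optional<std::invoke_result_t<F&>> {
    return std::nullopt;
}

// Example of a user-written policy built on the primitive: retry a few
// times, then fall back to a caller-chosen lock instead of a global one.
// Assumes f returns a non-void, non-reference value.
template <class F>
auto atomically_or_lock(std::mutex& fallback, F f, int attempts = 3) {
    for (int i = 0; i < attempts; ++i) {
        if (auto r = try_atomically(f)) return *r;
    }
    std::lock_guard<std::mutex> guard(fallback);
    return f();
}

}  // namespace sketch
```

On a target with working HTM, a fallback of this kind is only safe if the transactional path also observes the fallback lock, which ties into the "subscribed lock" costs raised earlier in the thread; the sketch leaves that out.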

Tim

Jens Maurer

Jul 9, 2018, 4:02:33 PM

to t...@isocpp.org, Michael Wong, Herb Sutter, Paul McKenney

This is a draft of the minutes. Please, Michaels, fix your
attributions.

SG5 Transactional Memory
2018-07-09 19:00 UTC

Michael Wong, Jens Maurer, Michael Scott,
Herb Sutter, Victor Luchangco, Michael Spear

1.2 Adopt agenda
Herb Sutter's thoughts on TM light
No objections.

1.3 Approve the minutes from previous meetings
No objections.

1.4 Review action items
- Contact Herb Sutter to have him join today's call. Done.
- Contact Tim Sweeney for feedback. Tim sent e-mail.

2.1 Herb Sutter's thoughts on TM light

Alternate proposal for reduced TM interface (Oct 2017) by Michael Spear.
(atomic blocks, synchronized blocks, annotating transaction-safe functions)

Herb suggests "TM even lighter": In lock-free applications, we would
like to do a multi-word compare/swap without data being contiguous.
Just N (e.g. 10) memory operations in a transaction block; no exceptions,
no function calls, no I/O, just plain memory operations.
Is that palatable to the group?

Michael Spear: Lambda-Executors proposal is defined in terms of locks,
so transactions commit no matter what. The biggest fear under hardware
TM is that we have no strong progress guarantees. One proposal from 2008
attempts to be provably non-blocking, but no progress since then.

Herb Sutter: Lock-free for me means mostly obstruction-free; single
global lock semantics imposed by a brace-enclosed block specially
marked. We should go practical and useful.

Michael Scott: On Solaris, you have a kernel call that says "please
don't preempt me for a little bit"; that might be all you need.

Herb Sutter: On older Windows, you could force a context switch
and then have a good chance of not being interrupted again after
you get the CPU again.

Michael Scott: Oracle uses the Solaris system call quite a lot.

Jens Maurer: Yes, a limited facility such as Herb's is useful.

Michael Scott: When we started talking about TM-light, we wanted to
reduce the annotation and syntax burden. A lambda seemed useful.
Details need to be nailed down (atomics, shared_ptr).

Herb Sutter: My suggestion: ordinary memory reads and writes only.

Jens Maurer: Fine with me. Lambda proposal with restrictions on the
code appearing in the lambda is not good; instead expose the
restricted code environment at the core language level.

Victor Luchangco: Syntax issues are important, but are orthogonal to
the question of what we could implement. Exploring lock-free requirement?

Herb Sutter: There is no requirement on a lock-free implementation
for the "10 memory operations, but fast" proposal.
Tim Sweeney's point of view is that hardware vendors will make the
chosen model fast, because they will compete on it.

Herb Sutter: A proposal that supports exceptions is more costly than one
that doesn't. Both in terms of implementation and in terms of
programming model complexity.

Michael Scott: Maybe Michael Spear is considering a lambda-passing
proposal where you don't support STM. What's the extra runtime cost
when passing a lambda?

Jens Maurer, Herb Sutter: A core language extension is much more
optimizable.

Herb Sutter: If we have a very limited atomic{} block today, we can
expand later to allow e.g. function calls etc. Don't close the door
to future relaxation. We want static diagnostics.

Victor Luchangco: Where do we have a hard limit of 10 in the current
specification? Allow for larger sizes, with presumably worse
performance.

Herb Sutter: Having a number was intended to make the implementation
easier. If it doesn't, I'm fine with leaving the size unlimited.
See Annex on implementation limits in the C++ standard.

Herb Sutter: Yes, while loops would be forbidden as the first step.

Herb Sutter: All algorithms using MCAS work immediately with my proposal.

Michael Spear: I strongly disagree; all such algorithms are written
in Java and assume garbage collection. We must be able to traverse
a red-black tree within a single transaction. Relying on hazard pointers
or similar is a non-starter.

Michael Spear: This will only perform well on HTM.

Next meeting scheduled for July 23.

20:00 UTC

Michael Wong

Jul 11, 2018, 10:54:04 PM

to Jens Maurer, t...@isocpp.org, Herb Sutter, Paul McKenney

Thank you.

Michael Spear

Jul 12, 2018, 10:26:25 AM

to t...@isocpp.org, Jens....@gmx.net, Herb Sutter, Paul McKenney

I don't have any edits to the minutes. They look good to me.

- Mike
