Compute in a lock free way

170 views
Skip to first unread message

Dain Ironfoot

unread,
Jun 3, 2022, 6:53:55 PM6/3/22
to mechanical-sympathy
Hello, 

I am writing a class to calculate average of prices. We use it to generate buy/sell signal for some financial instruments therefore latency is crucial.  Prices are sent via different threads therefore class needs to be thread-safe.

Am I correct in understanding that I have to use a lock to make the two operations (adding and incrementing) atomic? I have looked at AtomicLong/LongAdder/LongAccumulator but looks like they can only sum the numbers atomically.

In other words, there is no way to do this in a lock-free manner? 

Thank you!



public final class Computer{

private ReentrantReadWriteLock rwLock;
private ReadLock readLock;
private WriteLock writeLock;

private BigDecimal sum;
private int count;

public AverageCalculatorThreadSafeImplementation2( ){
this.rwLock = new ReentrantReadWriteLock();
this.readLock = rwLock.readLock();
this.writeLock = rwLock.writeLock();
this.sum = new BigDecimal(0);
}

public final void add( double value ){
writeLock.lock();
try{
sum = sum.add( BigDecimal.valueOf(value) );
++count;
}finally{
writeLock.unlock();
}
}

public final double compute( ){
readLock.lock();
try{
return sum.divide(BigDecimal.valueOf(count)).doubleValue();
}finally{
readLock.unlock();
}
}

Remi Forax

unread,
Jun 4, 2022, 4:45:50 AM6/4/22
to mechanical-sympathy
From: "Dain Ironfoot" <vicva...@gmail.com>
To: "mechanical-sympathy" <mechanica...@googlegroups.com>
Sent: Saturday, June 4, 2022 12:53:55 AM
Subject: Compute in a lock free way
Hello, 
I am writing a class to calculate average of prices. We use it to generate buy/sell signal for some financial instruments therefore latency is crucial.  Prices are sent via different threads therefore class needs to be thread-safe.

Am I correct in understanding that I have to use a lock to make the two operations (adding and incrementing) atomic? I have looked at AtomicLong/LongAdder/LongAccumulator but looks like they can only sum the numbers atomically.

In other words, there is no way to do this in a lock-free manner? 

you can group them in an object and do a CAS (see VarHandle.compareAndSet) on the references

  record Stat(BigDecimal sum, int count) { }

public final class Computer {
  private volatile Stat stat;

  public void add(double value) {
    // do a CAS between stat and new Stat(BigDecimal.valueOf(value), stat.count + 1)
  }
}

BTW in your example, rwLock, readLock and writeLock should be declared final to avoid any publication issues,

regards,
Rémi

--
You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-symp...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/mechanical-sympathy/2d083587-2ee1-4f8f-a7ee-a99062a6ef5cn%40googlegroups.com.

Laurynas Biveinis

unread,
Jun 4, 2022, 5:03:44 AM6/4/22
to mechanica...@googlegroups.com
> I am writing a class to calculate average of prices. We use it to generate buy/sell signal for some financial instruments therefore latency is crucial. Prices are sent via different threads therefore class needs to be thread-safe.
>
> Am I correct in understanding that I have to use a lock to make the two operations (adding and incrementing) atomic? I have looked at AtomicLong/LongAdder/LongAccumulator but looks like they can only sum the numbers atomically.
>
> In other words, there is no way to do this in a lock-free manner?

I am not much of an expert here but I have done exactly this. The mean
and count were packed into a single 128-bit value, which was then
updated with x86_64 DWCAS operation (LOCK CMPXCHG16B). It was slower
than a locking version though. I think that is because 1) 128-bit
atomics are more limited than 64-bit ones on x86_64, i.e. to load the
128 bit value you still have to do CAS instead of simple load,
resulting in a more expensive implementation; 2) under high
concurrency this implementation trades a contended lock for a
contended 128-bit atomic, gaining nothing for scalability. To scale,
I'd look into per-thread averages with occasional global aggregation,
but I'm not sure your use case allows this


--
Laurynas

Remi Forax

unread,
Jun 4, 2022, 5:39:32 AM6/4/22
to mechanical-sympathy
----- Original Message -----
> From: "Laurynas Biveinis" <laurynas...@gmail.com>
> To: "mechanical-sympathy" <mechanica...@googlegroups.com>
> Sent: Saturday, June 4, 2022 11:03:28 AM
> Subject: Re: Compute in a lock free way

>> I am writing a class to calculate average of prices. We use it to generate
>> buy/sell signal for some financial instruments therefore latency is crucial.
>> Prices are sent via different threads therefore class needs to be thread-safe.
>>
>> Am I correct in understanding that I have to use a lock to make the two
>> operations (adding and incrementing) atomic? I have looked at
>> AtomicLong/LongAdder/LongAccumulator but looks like they can only sum the
>> numbers atomically.
>>
>> In other words, there is no way to do this in a lock-free manner?
>
> I am not much of an expert here but I have done exactly this. The mean
> and count were packed into a single 128-bit value, which was then
> updated with x86_64 DWCAS operation (LOCK CMPXCHG16B). It was slower
> than a locking version though. I think that is because 1) 128-bit
> atomics are more limited than 64-bit ones on x86_64, i.e. to load the
> 128 bit value you still have to do CAS instead of simple load,
> resulting in a more expensive implementation;

You do not have to, on x86_64 the SIMD mov operations are atomic on 128 bits.

> 2) under high concurrency this implementation trades a contended lock for a
> contended 128-bit atomic, gaining nothing for scalability. To scale,
> I'd look into per-thread averages with occasional global aggregation,
> but I'm not sure your use case allows this
>
>
> --
> Laurynas

Rémi

>
> --
> You received this message because you are subscribed to the Google Groups
> "mechanical-sympathy" group.
> To unsubscribe from this group and stop receiving emails from it, send an email
> to mechanical-symp...@googlegroups.com.
> To view this discussion on the web, visit
> https://groups.google.com/d/msgid/mechanical-sympathy/CAHkCEVePWi7ZQ6UagR_WX%2B_fJeGR3yLeC1td02nZHV_iKH%2Bgeg%40mail.gmail.com.

Dr Heinz M. Kabutz

unread,
Jun 4, 2022, 6:11:07 AM6/4/22
to mechanica...@googlegroups.com
What is the maximum count and sum? Could the two fit into a 64 bit long? If so a LongAccumulator could be made to work.

--
You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-symp...@googlegroups.com.
--
Dr Heinz M. Kabutz (PhD CompSci)
Author of "The Java(tm) Specialists' Newsletter"
Sun/Oracle Java Champion
JavaOne Rockstar Speaker
http://www.javaspecialists.eu
Tel: +30 69 75 595 262
Skype: kabutz

John

unread,
Jun 4, 2022, 6:41:48 AM6/4/22
to mechanical-sympathy
Hello anonymous internet user :) 

I cant understand the full context of the problem and am curious about it. I don't understand how aggregating prices into a mean value but  can be used to produce a meaningful signal. So some questions:

Do you reset the average periodically?
Do you take snapshots and analyse the discrete time buckets?
Do you just use an average over the whole day?


Why does the code use decimals? I have never measured performance of x87 but I would expect them to degrade performance. I've always seen money represented as int/long types and a fixed scale but its good that we can ignore accuracy.

Have you tried sticking everything on a lock free queue from all these threads and processed the queue and populated this in a single threaded context? I think that fundamentally you can't perform the two operations atomically without the hackery Dr Kabutz has propounded :P.

Final thoughts:
a. You care more that the read is fast rather than having the most recent data.
b. You don't want to block the writing threads.
c. Thread local averages sound but performing an aggregation is incompatible with a.
d. I think you should focus on more than this component of this code.

Thanks in advance if you can answer and of this :)  

p.s none of this is advice, im just another nobody on the internet
Message has been deleted

Dain Ironfoot

unread,
Jun 4, 2022, 11:11:49 AM6/4/22
to mechanical-sympathy
Thank you so much Remi!!
I took your advice of using CAS to update both of them and came up with this (Can't use record as still on JDK 11).

Does this look ok?


public final class  Computer  {

public static class Stat{
    private final BigDecimal sum;
    private final int count;

public Stat(BigDecimal sum, int count){
    this.sum = sum;
    this.count = count;
}
}


private final AtomicReference<Stat> statReference;

public  Computer  ( ){
     this.statReference = new AtomicReference<>();
}


public final void  add ( int value ){
     Stat oldStat = null;
     Stat newStat = null;

    do{
        oldStat = statReference.get();

        BigDecimal newSum = BigDecimal.valueOf(value);
        BigDecimal newStatCheck = (oldStat == null ) ? newSum : oldStat.sum.add(newSum);
        int newCount = (oldStat == null ) ? 1 : oldStat.count + 1;
        newStat = new Stat( newStatCheck, newCount);

}while( !statReference.compareAndSet(oldStat, newStat) );

}


public final double  compute  ( ){
    Stat stat = statReference.get();
    if( stat == null ){
        return 0;
    }else {
       return stat.sum.divide(BigDecimal.valueOf(stat.count)).doubleValue();
    }
}

Dain Ironfoot

unread,
Jun 4, 2022, 11:14:22 AM6/4/22
to mechanical-sympathy
Thanks Heinz!

Please correct me if I am wrong but your approach, on a high level, sounds similar to Remi's approach?
In Remi's approach, we are CASing and atomically updating the state ourselves.
However, in your approach, we would delegate it to LongAccumulator?

Remi Forax

unread,
Jun 4, 2022, 2:27:39 PM6/4/22
to mechanical-sympathy
From: "Dain Ironfoot" <vicva...@gmail.com>
To: "mechanical-sympathy" <mechanica...@googlegroups.com>
Sent: Saturday, June 4, 2022 5:11:49 PM
Subject: Re: Compute in a lock free way
Thank you so much Remi!!
I took your advice of using CAS to update both of them and came up with this (Can't use record as still on JDK 11).

Does this look ok?

I think you can separate the concurrency part from the algorithm part and always have the reference initialized to avoid those pesky nulls, like this

public final class Computer {
  private static final class Stat {

    private final BigDecimal sum;
    private final int count;

    public Stat(BigDecimal sum, int count) {
      this.sum = sum;
      this.count = count;
    }

    public Stat add(double value) {
      return new Stat(sum.add(BigDecimal.valueOf(value)), count + 1);
    }

    public double compute() {
      return sum.divide(BigDecimal.valueOf(count)).doubleValue();

    }
  }

  private final AtomicReference<Stat> statReference;

  public Computer() {
    this.statReference = new AtomicReference<>(new Stat(BigDecimal.ZERO, 0));
  }

  public void add(double value) {
    statReference.getAndUpdate(s -> s.add(value));
  }

  public double compute() {
    return statReference.get().compute();
  }
}


Dain Ironfoot

unread,
Jun 4, 2022, 8:04:16 PM6/4/22
to mechanical-sympathy
Thanks so much Remi!  Really like how you separated the computation and the thread-safety part.
Wish I had thought of that :)

Laurynas Biveinis

unread,
Jun 6, 2022, 3:11:16 AM6/6/22
to mechanica...@googlegroups.com
> > 128-bit
> > atomics are more limited than 64-bit ones on x86_64, i.e. to load the
> > 128 bit value you still have to do CAS instead of simple load,
> > resulting in a more expensive implementation;
>
> You do not have to, on x86_64 the SIMD mov operations are atomic on 128 bits.

Indeed. Thanks!

--
Laurynas
Reply all
Reply to author
Forward
0 new messages