# Compute in a lock free way

170 views

### Dain Ironfoot

Jun 3, 2022, 6:53:55 PM6/3/22
to mechanical-sympathy
Hello,

I am writing a class to calculate average of prices. We use it to generate buy/sell signal for some financial instruments therefore latency is crucial.  Prices are sent via different threads therefore class needs to be thread-safe.

Am I correct in understanding that I have to use a lock to make the two operations (adding and incrementing) atomic? I have looked at AtomicLong/LongAdder/LongAccumulator but looks like they can only sum the numbers atomically.

In other words, there is no way to do this in a lock-free manner?

Thank you!

public final class Computer{

private WriteLock writeLock;

private BigDecimal sum;
private int count;

this.writeLock = rwLock.writeLock();
this.sum = new BigDecimal(0);
}

public final void add( double value ){
writeLock.lock();
try{
++count;
}finally{
writeLock.unlock();
}
}

public final double compute( ){
try{
return sum.divide(BigDecimal.valueOf(count)).doubleValue();
}finally{
}
}

### Remi Forax

Jun 4, 2022, 4:45:50 AM6/4/22
to mechanical-sympathy
From: "Dain Ironfoot" <vicva...@gmail.com>
Sent: Saturday, June 4, 2022 12:53:55 AM
Subject: Compute in a lock free way
Hello,
I am writing a class to calculate average of prices. We use it to generate buy/sell signal for some financial instruments therefore latency is crucial.  Prices are sent via different threads therefore class needs to be thread-safe.

Am I correct in understanding that I have to use a lock to make the two operations (adding and incrementing) atomic? I have looked at AtomicLong/LongAdder/LongAccumulator but looks like they can only sum the numbers atomically.

In other words, there is no way to do this in a lock-free manner?

you can group them in an object and do a CAS (see VarHandle.compareAndSet) on the references

record Stat(BigDecimal sum, int count) { }

public final class Computer {
private volatile Stat stat;

// do a CAS between stat and new Stat(BigDecimal.valueOf(value), stat.count + 1)
}
}

BTW in your example, rwLock, readLock and writeLock should be declared final to avoid any publication issues,

regards,
Rémi

--
You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-symp...@googlegroups.com.

### Laurynas Biveinis

Jun 4, 2022, 5:03:44 AM6/4/22
> I am writing a class to calculate average of prices. We use it to generate buy/sell signal for some financial instruments therefore latency is crucial. Prices are sent via different threads therefore class needs to be thread-safe.
>
> Am I correct in understanding that I have to use a lock to make the two operations (adding and incrementing) atomic? I have looked at AtomicLong/LongAdder/LongAccumulator but looks like they can only sum the numbers atomically.
>
> In other words, there is no way to do this in a lock-free manner?

I am not much of an expert here but I have done exactly this. The mean
and count were packed into a single 128-bit value, which was then
updated with x86_64 DWCAS operation (LOCK CMPXCHG16B). It was slower
than a locking version though. I think that is because 1) 128-bit
atomics are more limited than 64-bit ones on x86_64, i.e. to load the
128 bit value you still have to do CAS instead of simple load,
resulting in a more expensive implementation; 2) under high
concurrency this implementation trades a contended lock for a
contended 128-bit atomic, gaining nothing for scalability. To scale,
I'd look into per-thread averages with occasional global aggregation,
but I'm not sure your use case allows this

--
Laurynas

### Remi Forax

Jun 4, 2022, 5:39:32 AM6/4/22
to mechanical-sympathy
----- Original Message -----
> From: "Laurynas Biveinis" <laurynas...@gmail.com>
> Sent: Saturday, June 4, 2022 11:03:28 AM
> Subject: Re: Compute in a lock free way

>> I am writing a class to calculate average of prices. We use it to generate
>> buy/sell signal for some financial instruments therefore latency is crucial.
>> Prices are sent via different threads therefore class needs to be thread-safe.
>>
>> Am I correct in understanding that I have to use a lock to make the two
>> operations (adding and incrementing) atomic? I have looked at
>> AtomicLong/LongAdder/LongAccumulator but looks like they can only sum the
>> numbers atomically.
>>
>> In other words, there is no way to do this in a lock-free manner?
>
> I am not much of an expert here but I have done exactly this. The mean
> and count were packed into a single 128-bit value, which was then
> updated with x86_64 DWCAS operation (LOCK CMPXCHG16B). It was slower
> than a locking version though. I think that is because 1) 128-bit
> atomics are more limited than 64-bit ones on x86_64, i.e. to load the
> 128 bit value you still have to do CAS instead of simple load,
> resulting in a more expensive implementation;

You do not have to, on x86_64 the SIMD mov operations are atomic on 128 bits.

> 2) under high concurrency this implementation trades a contended lock for a
> contended 128-bit atomic, gaining nothing for scalability. To scale,
> I'd look into per-thread averages with occasional global aggregation,
> but I'm not sure your use case allows this
>
>
> --
> Laurynas

Rémi

>
> --
> You received this message because you are subscribed to the Google Groups
> "mechanical-sympathy" group.
> To unsubscribe from this group and stop receiving emails from it, send an email
> To view this discussion on the web, visit

### Dr Heinz M. Kabutz

Jun 4, 2022, 6:11:07 AM6/4/22
What is the maximum count and sum? Could the two fit into a 64 bit long? If so a LongAccumulator could be made to work.

--
You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-symp...@googlegroups.com.
--
Dr Heinz M. Kabutz (PhD CompSci)
Author of "The Java(tm) Specialists' Newsletter"
Sun/Oracle Java Champion
JavaOne Rockstar Speaker
http://www.javaspecialists.eu
Tel: +30 69 75 595 262
Skype: kabutz

### John

Jun 4, 2022, 6:41:48 AM6/4/22
to mechanical-sympathy
Hello anonymous internet user :)

I cant understand the full context of the problem and am curious about it. I don't understand how aggregating prices into a mean value but  can be used to produce a meaningful signal. So some questions:

Do you reset the average periodically?
Do you take snapshots and analyse the discrete time buckets?
Do you just use an average over the whole day?

Why does the code use decimals? I have never measured performance of x87 but I would expect them to degrade performance. I've always seen money represented as int/long types and a fixed scale but its good that we can ignore accuracy.

Have you tried sticking everything on a lock free queue from all these threads and processed the queue and populated this in a single threaded context? I think that fundamentally you can't perform the two operations atomically without the hackery Dr Kabutz has propounded :P.

Final thoughts:
a. You care more that the read is fast rather than having the most recent data.
b. You don't want to block the writing threads.
c. Thread local averages sound but performing an aggregation is incompatible with a.
d. I think you should focus on more than this component of this code.

p.s none of this is advice, im just another nobody on the internet
Message has been deleted

### Dain Ironfoot

Jun 4, 2022, 11:11:49 AM6/4/22
to mechanical-sympathy
Thank you so much Remi!!
I took your advice of using CAS to update both of them and came up with this (Can't use record as still on JDK 11).

Does this look ok?

public final class  Computer  {

public static class Stat{
private final BigDecimal sum;
private final int count;

public Stat(BigDecimal sum, int count){
this.sum = sum;
this.count = count;
}
}

private final AtomicReference<Stat> statReference;

public  Computer  ( ){
this.statReference = new AtomicReference<>();
}

public final void  add ( int value ){
Stat oldStat = null;
Stat newStat = null;

do{
oldStat = statReference.get();

BigDecimal newSum = BigDecimal.valueOf(value);
BigDecimal newStatCheck = (oldStat == null ) ? newSum : oldStat.sum.add(newSum);
int newCount = (oldStat == null ) ? 1 : oldStat.count + 1;
newStat = new Stat( newStatCheck, newCount);

}while( !statReference.compareAndSet(oldStat, newStat) );

}

public final double  compute  ( ){
Stat stat = statReference.get();
if( stat == null ){
return 0;
}else {
return stat.sum.divide(BigDecimal.valueOf(stat.count)).doubleValue();
}
}

### Dain Ironfoot

Jun 4, 2022, 11:14:22 AM6/4/22
to mechanical-sympathy
Thanks Heinz!

Please correct me if I am wrong but your approach, on a high level, sounds similar to Remi's approach?
In Remi's approach, we are CASing and atomically updating the state ourselves.
However, in your approach, we would delegate it to LongAccumulator?

### Remi Forax

Jun 4, 2022, 2:27:39 PM6/4/22
to mechanical-sympathy
From: "Dain Ironfoot" <vicva...@gmail.com>
Sent: Saturday, June 4, 2022 5:11:49 PM
Subject: Re: Compute in a lock free way
Thank you so much Remi!!
I took your advice of using CAS to update both of them and came up with this (Can't use record as still on JDK 11).

Does this look ok?

I think you can separate the concurrency part from the algorithm part and always have the reference initialized to avoid those pesky nulls, like this

public final class Computer {
private static final class Stat {

private final BigDecimal sum;
private final int count;

public Stat(BigDecimal sum, int count) {
this.sum = sum;
this.count = count;
}

return new Stat(sum.add(BigDecimal.valueOf(value)), count + 1);
}

public double compute() {
return sum.divide(BigDecimal.valueOf(count)).doubleValue();

}
}

private final AtomicReference<Stat> statReference;

public Computer() {
this.statReference = new AtomicReference<>(new Stat(BigDecimal.ZERO, 0));
}

}

public double compute() {
return statReference.get().compute();
}
}

### Dain Ironfoot

Jun 4, 2022, 8:04:16 PM6/4/22
to mechanical-sympathy
Thanks so much Remi!  Really like how you separated the computation and the thread-safety part.
Wish I had thought of that :)

### Laurynas Biveinis

Jun 6, 2022, 3:11:16 AM6/6/22
> > 128-bit
> > atomics are more limited than 64-bit ones on x86_64, i.e. to load the
> > 128 bit value you still have to do CAS instead of simple load,
> > resulting in a more expensive implementation;
>
> You do not have to, on x86_64 the SIMD mov operations are atomic on 128 bits.

Indeed. Thanks!

--
Laurynas