Currently only the bitmap is optimized. Table capacity supported is
about ~90%. Going to add Hazard-Pointers support. This would allow
memory reclamation. Added preliminary benchmark section, would add
more algorithms to the benchmark.
Any constructive & positive ideas are welcome!
Thanks,
Ms. Moran Tzafrir. (MA Student, TAU)
P.S. link to C++ code
http://groups.google.com/group/hopscotch-hashing/web/hopscotch---implmentation-c-java
Isn't there an error in the following algorithm:
// located in `cpp_framework.h'
class TASLock {
public:
unsigned int volatile _lock;
TASLock() : _lock(0) {}
~TASLock() {_lock=0;}
void init() {_lock = 0;}
void lock() {
do {
if (0 == CASIO(&_lock, 0, 0xFFFFFFFF))
return;
} while(true);
}
bool tryLock() {
return (0 == CASIO(&_lock, 0, 0xFFFFFFFF));
}
void unlock() {
_lock = 0;
Memory::write_barrier();
}
};
?
How is this used? Don't you need a `MEMBAR #LoadStore | #StoreStore' before
you store to `TASLock::_lock' in `TASLock::unlock()'? Also, don't you need
`MEMBAR #StoreLoad | #StoreStore' after you CAS `TASLock::_lock' in
`TASLock::lock()', and if `TASLock::tryLock()' succeeds? e.g.:
class TASLock {
public:
unsigned int volatile _lock;
TASLock() : _lock(0) {}
~TASLock() {_lock=0;}
void init() {_lock = 0;}
void lock() {
while (CASIO(&_lock, 0, 0xFFFFFFFF)) {
// exponential backoff/sched_yield()...
}
MEMBAR #StoreLoad | #StoreStore;
}
bool tryLock() {
if (CASIO(&_lock, 0, 0xFFFFFFFF)) {
return false;
}
MEMBAR #StoreLoad | #StoreStore;
return true;
}
void unlock() {
MEMBAR #LoadStore | #StoreStore;
_lock = 0;
}
};
? What am I missing?
Thanks,
Moran Tzafrir.
I notice that in `sparc_defs.h" CASIO is defined as:
#define CASIO(_a,_o,_n) CASIO_internal((_u32 volatile *)(_a),_o,_n)
I notice that in `sparc_mcas.il' implementation of `CASIO_internal' is:
.inline CASIO_internal
cas [%o0], %o1, %o2
mov %o2, %o0
.end
So, CASIO is naked atomic op; no MEMBAR. CAS on TSO SPARC is not strong
enough to acquire lock. So, AFAICT, there seems to be a memory visibility
bug in `TASLock::lock()/trylock()'. And on SPARC RMO, there is a bug in
`TASLock::unlock()'. Also you mention TTASLock there is a MEMBAR; I cannot
seem to see it right now. BTW, what is point of `TTASLock' and `TASLock'?
AFAICT, as of now, if I use either `TASLock' or `TTASLock' as template
parameter `_tLock' to `HopscotchHashMap' in `HopscotchHashMap.h', then it
will be infested with race-conditions on SPARC TSO and/or RMO. Please clear
up my mind.
Thanks.
(Let not be too hasty, let’s talk about it, I worked on the code, it
works, putting bad words without talking about it is not academic)
Again in the benchmark code TTAS is used, so talking about TAS is
void, but thanks for the input (would remove it or check it again).
(1) TAS - Test & Set (2) TTAS - Test & Test & Set
Here is the line from the benchmark that use the TTAS
(CMDR::TTASLock) : HopscotchHashMap<int, int, HASH_INT,
CMDR::TTASLock, CMDR::Memory> _ds;
Here is the TTAS lock hiding in the .h file. Notice
"Memory::write_barrier();", hiding in the unlock function :-) those
memory barriers always manage to hide :-)
Let me know if you OK with you now, or I'll be here, lets talk, it is
really important to support this, so people would start using this.
class TTASLock {
public:
unsigned int volatile _lock;
TTASLock() : _lock(0) {}
~TTASLock() {_lock=0;}
void init() {_lock = 0;}
void lock() {
do {
if(0 == _lock) {
if (0 == CASIO(&_lock, 0, 0xFFFFFFFF))
return;
}
} while(true);
}
bool tryLock() {
return (0==_lock && 0 == CASIO(&_lock, 0, 0xFFFFFFFF));
Please notice that I recommended using BitmapHopscotchHashMap.h and
not HopscotchHashMap.h, until I'll optimize HopscotchHashMap.h.
Great Chris! people should note this. Follow the benchmark code, use
TTAS, with the memory barrier: (and not TAS, until I'll review this,
thanks Chris, really helpful)
void unlock() {
_lock = 0;
Memory::write_barrier();
}
Thanks,
Moran Tzafrir.
What the does a trailing `#StoreStore' do here... Ummm? I noticed that, but
its the wrong membar in the wrong place. Sorry, but that code above is
broken on RMO SPARC, and the `lock/trylock' portion is broken on RMO __and__
TSO. I boil `Memory::wrire_barrier()' down to the SPARC defines wrt the
`WMB()' macro; which is defined as:
#define WMB() MEMBAR_STORESTORE()
AFAICT, `MEMBAR_STORESTORE()' is implemented as:
.inline MEMBAR_STORESTORE, 0
membar #StoreStore
.end
`TASLock' and `TTASLock' are broken on unlock in RMO, and the lock members
are both broken wrt `lock/trylock' in TSO; all of them are broken on RMO
mode `lock/trylock/unlock'. First of all a `#StoreStore' on RMO is not
strong enough to unlock a mutex. You need at least a `#LoadStore |
#StoreStore'. Second, its in the wrong place! You need to place the membar
before the store to `_lock'. This is all very simple wrt locking primitives.
How can you possibly get that wrong? What am I missing? Also, even if you
got the membars right in TASLock and TTASLock, how do you deal with
link-time optimizations?
Perhaps you should model your algorihtm in Relacy:
http://groups.google.com/group/relacy
A proper non-asymmetric spinlock should have a `#StoreLoad | #StoreStore'
barrier after atomic lock acquisition, and a `#LoadStore | #StoreStore'
before atomic lock release. This is basic knowledge. I do not see any of
that in TASLock or TTASLock. Clear my mind Sir. I am trying to HELP you
here.
I am focusing on the SPARC TSO and RMO models and how your implementation
interacts with them.
TTAS is the only relevant here, and you continue to talk about TAS, so
you'll be able to try and educate me, and show how smart you are. This
is not needed, as I have mentioned.
If you had a bit of experience, you'll understand that you use here
brute force, which is not needed/must. TTAS, first load "_lock", which
is invalidated by the STORESORE in the unlock. Your mistake is that no
#LoadStore | #StoreStore is not needed before the unlock, since we are
already in the lock, and your inexperience comes to practice.
And if you want to access common memory after the lock LOADSTORE. I
would not add extra code, and would punish everyone, just because you
do not have experience.
Let’s stop this thread, you want to help everyone, do something, post
your _tLock, which would not be minimal, and would be for people that
want have things easy & simple, and do not want to micro optimize the
code !!
Stop misguiding people, you do not know what you are talking about in
this case, and fail to see the small details, others should not me
misguided by your miss education.
Thanks, Peace Man, Moran Tzafrir.
Be positive, be constructive, try to do new thing, instead of downing
other, build the world do not name it.
Again, I make sure the lock work, with minimal work,
you want to access common location LOADSTORE man, no need to punish
others.
Peace,
Moran.
> TTAS is the only relevant here, and you continue to talk about TAS, so
> you'll be able to try and educate me, and show how smart you are. This
> is not needed, as I have mentioned.
AFAICT, I cannot safely use TASLock or TTASLock as-is. They are not correct
in TSO and/or RMO SPARC modes. Here is a correct generic mutex in plain
pseudo-code:
class mutex {
enum constant {
UNLOCKED = 0,
LOCKED = 1,
CONTENTION = 2
};
atomic_word m_state; // = UNLOCKED
event m_waitset; // = false
public:
void lock() {
if (ATOMIC_SWAP(&m_state, LOCKED)) {
while (ATOMIC_SWAP(&m_state, CONTENTION)) {
m_waitset.wait();
}
}
MEMBAR #StoreLoad | #StoreStore;
}
void unlock() {
MEMBAR #LoadStore | #StoreStore;
if (ATOMIC_SWAP(&m_state, UNLOCKED) == CONTENTION) {
m_waitset.post();
}
}
};
Here is a correct generic spin-lock in pseudo-code:
class spinlock {
enum constant {
UNLOCKED = 0,
LOCKED = 1
};
atomic_word m_state; // = UNLOCKED;
public:
void lock() {
while (ATOMIC_SWAP(&m_state, LOCKED)) {
// backoff/yield;
}
MEMBAR #StoreLoad | #StoreStore;
}
void unlock() {
MEMBAR #LoadStore | #StoreStore;
ATOMIC_STORE(&m_state, UNLOCKED);
}
};
This is trivial to me. Those memory barriers and there #StoreLoad to
#LoadStore constraints are necessary to keep memory visibility confined
within the critical section. This is REQUIRED for general purpose locks.
I am not misleading anybody. I already know how to implement locks from
scratch. Its not that hard. Please use Relacy to test your algorithms wrt a
C++0x context. I am not insulting you. I am trying to help you; please try
to realize that! I want you algorithm to work. TASLock and TTASLock are
busted according to me. Please consider fixing them.
> TTAS is the only relevant here, and you continue to talk about TAS, so
> you'll be able to try and educate me, and show how smart you are. This
> is not needed, as I have mentioned.
> If you had a bit of experience, you'll understand that you use here
> brute force, which is not needed/must. TTAS, first load "_lock", which
> is invalidated by the STORESORE in the unlock. Your mistake is that no
> #LoadStore | #StoreStore is not needed before the unlock, since we are
> already in the lock, and your inexperience comes to practice.
>
You need a #StoreLoad | #StoreStore after the atomic update which grabs the
lock, and you need a #LoadStore | #StoreStore before you atomically release
it. This wrt to general purpose locking semantics.
Thanks, Ms. Moran.
YIKES!!!!!!!!!!!!
I am very sorry for that retarded mistake!
;^(...
Why should I supply brute force lock, if after one unlock, I want to
do private calculation, and only then call MEMBAR, it would be
better.
I am not implementing here Lock, but hash, the effort for the template
lock was just for that, let everyone do what they want, if they want
MCS lock, they can. If they want to control the LOADSTORE they can.
This is the idea behind the _tLock effort. It was not done for
nothing; generic algorithm is just for that. When you try to define
one implementation for a lock, you say that the generic lock algorithm
capabilities are not needed. And I say they are needed, people should
know if they need MSC Lock, TTAS lock, or MUTEX. I would not force
them to use one thing.
This is the idea, and I am sorry you are missing.
Thanks, Moran.
I point out what I think is an error; sorry.
> It seems people here think they in the town square, and must show
> everyone that no one know anything and they are smart, they are the
> only one that understand.
> And hard work of other people is nothing compared to their big
> brains.
>
> Be positive, be constructive, try to do new thing, instead of downing
> other, build the world do not name it.
> Again, I make sure the lock work, with minimal work,
> you want to access common location LOADSTORE man, no need to punish
> others.
Look at the memory barriers within `membar_enter()/membar_exit()':
http://docs.sun.com/app/docs/doc/816-5168/membar-enter-3c?a=view
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/common/atomic/sparcv9/atomic.s
I know how to create certian sync algorihtms on SPARC, so does SUN.
Here is SUN impl:
914 ENTRY(membar_enter)
915 membar #StoreLoad|#StoreStore
916 retl
917 nop
918 SET_SIZE(membar_enter)
919
920 ENTRY(membar_exit)
921 membar #LoadStore|#StoreStore
922 retl
923 nop
924 SET_SIZE(membar_exit)
Yout TASLock and TTASLock are not general purpose spinlocks. They have
busted memory visibility. If you want to fix them so that they can be used
generally try:
class TTASLock {
public:
unsigned int volatile _lock;
TTASLock() : _lock(0) {}
~TTASLock() {_lock=0;}
void init() {_lock = 0;}
void lock() {
do {
if(0 == _lock) {
if (0 == CASIO(&_lock, 0, 0xFFFFFFFF))
membar_enter();
return;
}
} while(true);
}
bool tryLock() {
return (0==_lock && 0 == CASIO(&_lock, 0, 0xFFFFFFFF));
}
void unlock() {
membar_exit();
_lock = 0;
// Memory::write_barrier();
}
};
What am I missing? How do you get around the #StoreLoad to #LoadStore
constraints in TASLock and TTASLock? They most certainly cannot be used for
general purpose spinlock primitives on SPARC in TSO and/or RMO mode.
I thought that TASLock and TTASLock were general purpose spinlocks. Your
saying I am dead wrong right? I want to clear that up.
I am assuming that TASLock and TTASLock are general purpose spinlocks which
guarantee memory visibility semantics that are expected/required. However,
the source code clearly shows that those locks do not have the proper
membars in place. Therefore, I can't quite see how multiple writer threads
can use them in the hash code. The mutations of the critical section can be
hoisted above the lock acquire in TSO and RMO mode and the, and the lock
release can be hoisted above the mutations within the critical section in
RMO mode.
I simply have to be missing something very important here. Please help me
clear my mind!
;^O
Anyway thanks :-) I'll add standard lock too, in next near version.
Would email you the first version.
Thanks, Moran.
I have gained a fairly deep understanding of memory visibility on several
arch's. Perhaps I can be of service to you. I am sorry for coming across so
harshly. I just got focused on the implementation of TASLock and TTASLock. I
have not even begun exploring the interesting hash algorihtm itself. See, I
assumed that those locks were general purpose. I think I am correct in my
assumption, unless I am missing something vital. I am interested in your
work, and have read MANY papers created by Maurice Herlihy and/or Nir
Shavit. I have great respect for them. In fact, Maurice's brother Greg
posted a couple of articles in this group. It seems like the entire Herlihy
family is very smart! :^)
I _really_ enjoyed reading the following paper referenced here:
http://groups.google.com/group/comp.programming.threads/browse_frm/thread/8ad297f61b369a41
The only "problem" I have with Maurice Herlihy is that his name is
referenced as an inventor in the following patent application referenced
here:
http://groups.google.com/group/comp.programming.threads/browse_frm/thread/f2c94118046142e8
IMVHO, that patent application takes Joe Seighs excellent invention and work
on atomic_ptr:
http://atomic-ptr-plus.sourceforge.net
AFAICT, the patent application is based on prior art and should be 100%
rejected; end of story. IMO, nothing new was invented here, except by Joe
Seigh. IIRC, they did not even reference his work! Here is Joe's original
work on atomic_ptr concepts:
I believe he contacted the so-called inventors, and his response was
something like: "talk to our lawyers". Oh well, shi% happens. Take care wrt
what you post on this group, it may be stolen from you...
;^o
> Anyway thanks :-) I'll add standard lock too, in next near version.
> Would email you the first version.
My email address is:
cristom<2347239798243>@<32424>charter.<3231>net
Remove all `<' and `>' characters, and all the crap in between.
> class TTASLock {
> public:
> unsigned int volatile _lock;
>
> TTASLock() : _lock(0) {}
> ~TTASLock() {_lock=0;}
>
> void init() {_lock = 0;}
>
> void lock() {
> do {
> if(0 == _lock) {
> if (0 == CASIO(&_lock, 0, 0xFFFFFFFF))
> return;
> }
> } while(true);
> }
>
> bool tryLock() {
> return (0==_lock && 0 == CASIO(&_lock, 0, 0xFFFFFFFF));
> }
>
> void unlock() {
> _lock = 0;
> Memory::write_barrier();
> }
> };
Provided following code:
TTASLock lock;
//...
lock.lock();
// critical section
// lock.unlock() contents below:
lock._lock = 0;
Memory::write_barrier();
It's perfectly possible that compiler will reorder the code as:
lock.lock();
// lock.unlock() contents below:
lock._lock = 0;
// critical section
Memory::write_barrier();
This makes lock senseless, because no code is executed under it
protection.
Or TTASLock is intended only for Microsoft VC++, which makes semantics
of volatile keyword stronger?
The same reordering is possible on hardware level (on relaxed memory
models - IA-64, PowerPC, Sparc RMO). Note that NO code crosses
Memory::write_barrier() in this case.
--
Dmitriy V'jukov
FWIW:
If you really want to optimize your Intel implementation please read section
7.2 here:
http://download.intel.com/design/processor/manuals/253668.pdf
As for SPARC, check opensparc.org.
PPC/Cell: http://www.ibm.com/developerworks/views/power/downloads.jsp
Check power.org as well.
Also check out:
http://www.rdrop.com/users/paulmck
Check out sicortex.com and see if your algorithm can scale on their systems.
I have conversed with a person who works there over on comp.arch wrt MPI
impl... Keep that newsgroup in mind as well!
One more thing... I would prefer RCU over SMR any day... However, Joe Seigh
cleverly combined the two and invented a new way to do hazard pointers.
Perhaps you should contact him. Using original SMR is WAY to expensive
because of the nasty #StoreLoad membar in hazard pointer load. You can do
proxy gc with SMR, but its nice to do it without #StoreLoad!
;^)
What kind of checks do your tests include?
I am asking because I was also thinking that my hash map "works" (I
knew that there are some bugs, but at least in simple tests it looks
like it works). After that I rewrite the test so that it uses
satellite data (keys and values are strings which dynamically
allocated and freed) and incorporate TONS of PARANOID checks
everywhere (add magic signs into objects, check consistency of data on
every read, zeroize objects before freeing, etc). This new test reveal
~10 extremely critical errors which cause inconsistent data, paging
faults, memory leaks, etc here, there and everywhere. Although on
simple test with embed keys and data everything looked perfectly Ok.
AFAIR your test also uses NON dynamically allocated data...
Btw, here is fixed version of my hash map:
http://groups.google.com/group/lock-free/files
--
Dmitriy V'jukov
Right. Also, even on SPARC TSO, some loads in the critical-section can be
hoisted up above the CAS in the lock function of TTASLock. AFAICT, the CAS
(e.g., CASIO) is naked in their implementation. IIRC, #StoreLoad is the only
membar that is not a NOP on TSO SPARC. For instance, you need it for
Petersons Algorihtm, even if you use atomic RMW. This differs from Intel
where a LOCK instruction acts like a #StoreLoad for WB memory.
Thanks, Moran Tzafrir.
P.S.
- checking your input about the locks, thanks.
- Maybe I am open-source woman, so I guess I do not think what in the
code I release, maybe I should ;-)
> Right. Also, even on SPARC TSO, some loads in the critical-section can be
> hoisted up above the CAS in the lock function of TTASLock. AFAICT, the CAS
> (e.g., CASIO) is naked in their implementation. IIRC, #StoreLoad is the only
> membar that is not a NOP on TSO SPARC. For instance, you need it for
> Petersons Algorihtm, even if you use atomic RMW. This differs from Intel
> where a LOCK instruction acts like a #StoreLoad for WB memory.
Here is one moment I don't quite understand.
CAS includes both load and store. Let's assume that there is no
#StoreLoad after CAS, only #LoadLoad | #LoadStore (i.e. load-acquire).
Here is pseudo code for mutex unlock followed by mutex lock:
// some code from previous critical section:
tmp = data;
data = tmp + 1;
// mutex unlock:
membar #StoreStore | #LoadStore;
lock = UNLOCKED;
// mutex lock (unrolled CAS):
tmp = lock;
if (tmp != UNLOCKED) goto retry;
lock = LOCKED;
membar #LoadLoad | #LoadStore;
// critical section:
tmp = data;
data = tmp + 1;
At maximum this can be reordered as:
// mutex lock (unrolled CAS):
tmp = lock;
if (tmp != UNLOCKED) goto retry;
membar #LoadLoad | #LoadStore;
// critical section:
tmp = data;
data = tmp + 1;
lock = LOCKED;
The question is: can this reordering lead to some problem?
Code from previous critical section and from current critical section
can not be interleaved (as I understand), because load that yield
UNLOCKED can not hoist above store of UNLOCKED.
Consider following simplified "one-shot" mutex:
initially lock = LOCKED
thread 1:
tmp = data;
data = tmp + 1;
membar #StoreStore | #LoadStore;
lock = UNLOCKED;
thread 2:
if (lock == UNLOCKED)
{
membar #LoadLoad | #LoadStore;
tmp = data;
data = tmp + 1;
}
Here #LoadLoad | #LoadStore is enough in thread 2, because there is
just no stores to lock in thread 2. So why #LoadLoad | #LoadStore is
not enough in general mutex lock() implementation?
--
Dmitriy V'jukov
M1: > tmp = lock;
M2: > if (tmp != UNLOCKED) goto retry;
M3: > membar #LoadLoad | #LoadStore;
> // critical section:
M4: > tmp = data;
M5: > data = tmp + 1;
M6: > lock = LOCKED;
> The question is: can this reordering lead to some problem?
That would allow two threads in the critical-section at the same time. Not
sure the reordering is valid anyway. I think your breaking apart the atomic
RMW operations down to far. It should go like:
void mutex::lock() {
M0: while (! CAS(&m_state, 0, 1)) backoff();
M1: MEMBAR #LoadStore | #LoadLoad;
}
void mutex::unlock() {
M2: MEMBAR #LoadStore | #StoreStore;
M3: STORE(&m_stater, 0);
}
I am not sure if its valid for `M1' to intermingle with the RMW portion of
the preceding atomic CAS. The membar should not be able to penetrate the
atomic operation.
void foo() {
mutex::lock();
tmp = data;
data = tmp + 1;
mutex::unlock();
}
> Code from previous critical section and from current critical section
> can not be interleaved (as I understand), because load that yield
> UNLOCKED can not hoist above store of UNLOCKED.
> Consider following simplified "one-shot" mutex:
> [...]
> So why #LoadLoad | #LoadStore is
> not enough in general mutex lock() implementation?
It should be good enough for read access on a rw-lock implementation. Using
`#LoadStore | #LoadLoad' might not be compatible with the actual
critical-section. Stores from the critical-section (e.g., `data = tmp + 1;')
can hoist above the atomic op which acquired the mutex. This can allow for
multiple stores to be committed outside of the protected region. I guess it
boils down to what the critical-section is comprised of. For the one shot
mutex, you probably don't even need the `#LoadStore' constraint:
foo m_data = 0;
atomic_word m_state = 0;
void foo::single_producer() {
init();
MEMBAR #StoreStore;
ATOMIC_STORE(&m_state, 1);
}
void foo::multiple_readers() const {
while (! ATOMIC_LOAD(&m_state)) backoff();
MEMBAR #LoadLoad;
use();
}
Humm...