Complete junk, I'm afraid.
Your code assumes that you can just magically stick a node into the shared
data structure without worrying about what another thread is doing.
You've written an ordinary doubly-linked list and simply *called*
it "wait-free".
What if two processors execute "last->v = v" at the same time?
What if two processors execute "last->next = tmp" at around the same time?
What if two processors execute "last = tmp" at the same time?
What if you have this scenario:
Processor A              Processor B
last->next = tmp
                         last->next = tmp
last = tmp
                         last = tmp
How about this one:
Processor A              Processor B
last->next = tmp
                         last->next = tmp
                         last = tmp
last = tmp
It is baffling how you can neglect such questions in code that is
supposed to be concurrent.
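
For contrast, here is a rough sketch of what an append looks like once those
questions are taken seriously. It is not your code and not a drop-in fix: it
is a singly-linked, append-only list using C11 atomics, in the style of the
well-known compare-and-swap queue algorithms. The names struct list,
list_init and list_append are made up for illustration, and the dummy head
node is my assumption. Note also that this is only lock-free, not wait-free
(a thread can lose the race and retry indefinitely), and that it sidesteps
deletion and memory reclamation entirely, which is where the hard part starts.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

struct node {
    int v;
    _Atomic(struct node *) next;
};

struct list {
    struct node *head;            /* dummy node, fixed after list_init */
    _Atomic(struct node *) last;  /* hint: may lag one node behind the tail */
};

static void list_init(struct list *l)
{
    struct node *dummy = malloc(sizeof *dummy);
    assert(dummy != NULL);        /* sketch only: no real error handling */
    dummy->v = 0;
    atomic_init(&dummy->next, NULL);
    l->head = dummy;
    atomic_init(&l->last, dummy);
}

/* Returns 0 on success, -1 if allocation fails. */
static int list_append(struct list *l, int v)
{
    struct node *tmp = malloc(sizeof *tmp);
    if (tmp == NULL)
        return -1;
    tmp->v = v;                   /* tmp is still private: no race here */
    atomic_init(&tmp->next, NULL);

    for (;;) {
        struct node *last = atomic_load(&l->last);
        struct node *next = atomic_load(&last->next);

        if (next != NULL) {
            /* Someone linked a node but has not advanced "last" yet.
               Help them along, then start over. */
            atomic_compare_exchange_weak(&l->last, &last, next);
            continue;
        }

        /* Link tmp only if last->next is still NULL. Of two threads
           racing here, at most one can win; any loser retries. */
        struct node *expected = NULL;
        if (atomic_compare_exchange_weak(&last->next, &expected, tmp)) {
            /* Try to advance "last". Failure is harmless: it means
               another thread already did it for us. */
            atomic_compare_exchange_strong(&l->last, &last, tmp);
            return 0;
        }
    }
}
```

The point is that "last->next = tmp" becomes a compare-and-swap that can fail,
and "last = tmp" becomes a step that any thread may complete on another's
behalf. Both of the interleavings above then have a defined outcome: one
processor's node is linked first, and the other processor notices and retries
behind it.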