• Wait-free Hazard Pointers Using Std Atomics

    From jseigh@21:1/5 to All on Mon May 26 17:58:58 2025
    and asymmetric memory barriers of course.

    https://threadnought.wordpress.com/2025/05/26/wait-free-hazard-pointers-using-std-atomics/

    Looks like it runs faster without the conditional branch.

    Not sure it's a new idea. The only wait-free version of hazard pointers
    I've seen so far involves a storage to storage move instruction which
    might not be available on all platforms.

    Joe Seigh

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From jseigh@21:1/5 to Chris M. Thomasson on Tue May 27 12:26:10 2025
    On 5/26/25 19:17, Chris M. Thomasson wrote:
    On 5/26/2025 2:58 PM, jseigh wrote:
    and asymmetric memory barriers of course.

    https://threadnought.wordpress.com/2025/05/26/wait-free-hazard-
    pointers- using-std-atomics/

    Looks like it runs faster without the conditional branch.

    Not sure it's a new idea.  The only wait-free version of hazard pointers
    I've seen so far involves a storage to storage move instruction which
    might not be available on all platforms.

    Interesting, thanks Joe! Fwiw, I think, something like FlushProcessWriteBuffers() should be in a new C++ std? Perhaps, allow a system to use DWCAS without resorting to not lock-free wrt the checks?
    Damn it. ;^o Ouch.

    https://learn.microsoft.com/en-us/windows/win32/api/processthreadsapi/ nf-processthreadsapi-flushprocesswritebuffers

    Sometimes I think that function was specifically designed for exotic
    RCU, ect...


    It's basically the equivalent of linux membarrier(). I don't know
    why they just focus on writes, it's a full membar AFAIK.

    As far as the wait-free hazard pointer goes, I though there would
    be forward progress issues with the reader threads but it's not
    an issue other schemes like URCU has with quiescent states.

    Joe Seigh

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From jseigh@21:1/5 to jseigh on Wed May 28 14:09:05 2025
    On 5/26/25 17:58, jseigh wrote:
    and asymmetric memory barriers of course.

    https://threadnought.wordpress.com/2025/05/26/wait-free-hazard-pointers- using-std-atomics/

    Added an additional post with a bit more explanation.

    Joe Seigh

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From jseigh@21:1/5 to Chris M. Thomasson on Thu May 29 17:17:08 2025
    On 5/28/25 18:34, Chris M. Thomasson wrote:
    On 5/26/2025 2:58 PM, jseigh wrote:
    and asymmetric memory barriers of course.

    https://threadnought.wordpress.com/2025/05/26/wait-free-hazard-
    pointers- using-std-atomics/

    Looks like it runs faster without the conditional branch.

    Not sure it's a new idea.  The only wait-free version of hazard pointers
    I've seen so far involves a storage to storage move instruction which
    might not be available on all platforms.

    Joe Seigh

    Actually wrt:
    _______________
    hz_ptr.store(INDETERMINATE, std::memory_order_relaxed);
    T* local = ptr.load(std::memory_order_relaxed);
    hz_ptr.store(local, std::memory_order_relaxed);
    _______________

    For some damn reason, it kind of reminds me of an older way to use only single word exchange to push into a lock-free stack.

    Iirc, something like this pseudo code for push:
    _____________
    cur->next = WAIT_STATE;
    node* prev = exchange(head, cur);
    cur->next = prev;
    _____________

    There is a "window" in there where cur->next will be WAIT_STATE. So,
    when a thread iterating the list, say after pop all, flush, it can do a couple of spins or do something else during iteration on a cur->next
    being in WAIT_STATE. It's a way to get exchange on push and pop of a stack.

    The window is very small. I am having trouble finding on this group
    where I posted about it before in a working program...


    Push may wait-free but pop isn't event lock-free.

    Anyway, after some consideration, I'm off on this
    wait-free hazard pointer. The lock-free version is
    more than performant.

    Joe Seigh

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From jseigh@21:1/5 to Chris M. Thomasson on Mon Jun 2 18:07:22 2025
    On 6/1/25 16:31, Chris M. Thomasson wrote:
    On 5/29/2025 2:17 PM, jseigh wrote:>>>

    Push may wait-free but pop isn't event lock-free.



    Actually, I like my futex stack experiment as is. normal CAS for push,
    SWAP for pop. It needs to have multiple stacks and distribute across
    them because popping all from a single stack can mean a thread gets too
    much work. Push lock-free, pop wait-free.


    Well you actually have a lot of leeway in concurrent queue semantics.
    Drop the FIFO requirements since they're not really observable for
    all practical purposes and just concentrate on throughput forward
    progress.

    And wait-free is highly overrated. It seems to be important only in
    theses, technical papers and obscure blogs, not so much is real world applications which is one reason I not going to implement it.


    Anyway, after some consideration, I'm off on this
    wait-free hazard pointer.  The lock-free version is
    more than performant.

    I did post on my blog how to improve reclaim forward progress for the
    wait-free hazard pointer so it's equivalent to a proxy collector.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From jseigh@21:1/5 to Chris M. Thomasson on Mon Jun 23 07:03:13 2025
    On 6/3/25 15:09, Chris M. Thomasson wrote:
    On 6/2/2025 3:07 PM, jseigh wrote:
    On 6/1/25 16:31, Chris M. Thomasson wrote:
    On 5/29/2025 2:17 PM, jseigh wrote:>>>

    Push may wait-free but pop isn't event lock-free.



    Actually, I like my futex stack experiment as is. normal CAS for
    push, SWAP for pop. It needs to have multiple stacks and distribute
    across them because popping all from a single stack can mean a thread
    gets too much work. Push lock-free, pop wait-free.


    Well you actually have a lot of leeway in concurrent queue semantics.
    Drop the FIFO requirements since they're not really observable for
    all practical purposes and just concentrate on throughput forward
    progress.

    And wait-free is highly overrated. It seems to be important only in
    theses, technical papers and obscure blogs, not so much is real world
    applications which is one reason I not going to implement it.

    Perhaps. However, using a loop-free exchange is "faster" than a loop
    based CAS? On Intel, XCHG vs a CMPXCHG loop?




    Anyway, after some consideration, I'm off on this
    wait-free hazard pointer.  The lock-free version is
    more than performant.

    I did post on my blog how to improve reclaim forward progress for the
    wait-free hazard pointer so it's equivalent to a proxy collector.


    I've been doing reader lock-free too long. I just realized how
    trivially simple it is to do read write lock-free for almost any data structure. That's slightly embarrassing.

    Joe Seigh

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From jseigh@21:1/5 to Chris M. Thomasson on Fri Jun 27 17:54:46 2025
    On 6/27/25 01:26, Chris M. Thomasson wrote:
    On 6/23/2025 4:03 AM, jseigh wrote:


    I've been doing reader lock-free too long.  I just realized how
    trivially simple it is to do read write lock-free for almost any data
    structure.  That's slightly embarrassing.

    :^) Indeed. But, its not a bad skill to have. Well, at least to me.

    I'm not talking about skill but a general lock-free technique for
    adding or removing nodes from an arbitrary linked data structures.
    It might be extendable to multiple nodes but I think there would
    be restrictions.

    I'm doing everything as though exercises these days. Looks good
    so far. :)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From jseigh@21:1/5 to Chris M. Thomasson on Sun Jun 29 12:16:08 2025
    On 6/28/25 18:41, Chris M. Thomasson wrote:

    How does the technique adapt to different methods of creating linked
    data structures?


    Working through that. It's low priority work. There's a bunch of
    Java stuff I need to get to.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From jseigh@21:1/5 to Chris M. Thomasson on Sun Jul 20 11:25:45 2025
    On 7/4/25 17:56, Chris M. Thomasson wrote:
    On 6/29/2025 9:16 AM, jseigh wrote:
    On 6/28/25 18:41, Chris M. Thomasson wrote:

    How does the technique adapt to different methods of creating linked
    data structures?


    Working through that.  It's low priority work.  There's a bunch of
    Java stuff I need to get to.

    I did figure out how to do certain operations on linked lists. It could
    work on other linked data structures provided the operations met certain criteria. Something to use if I can think of any interesting
    applications.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From jseigh@21:1/5 to Chris M. Thomasson on Tue Jul 22 16:17:05 2025
    On 7/21/25 16:36, Chris M. Thomasson wrote:
    On 7/20/2025 8:25 AM, jseigh wrote:
    On 7/4/25 17:56, Chris M. Thomasson wrote:
    On 6/29/2025 9:16 AM, jseigh wrote:
    On 6/28/25 18:41, Chris M. Thomasson wrote:

    How does the technique adapt to different methods of creating
    linked data structures?


    Working through that.  It's low priority work.  There's a bunch of
    Java stuff I need to get to.

    I did figure out how to do certain operations on linked lists.  It could
    work on other linked data structures provided the operations met certain
    criteria.  Something to use if I can think of any interesting
    applications.

    Iirc, you made an old post on c.p.t many years ago about a somewhat
    general case using RCU and/or your atomic-ptr-plus proxy collector, for linked node based tree algorithms in the past? Is this related to it?

    No, that is unrelated. A lot of partial copy on writes IIRC.

    Also unrelated - I found an old post of mine suggesting using
    a queue to avoid syscalls like io_uring does.

    https://groups.google.com/g/comp.arch/c/32oorU6fTts/m/d-dYChmSosEJ

    Seems to be a twenty year delay on when I predict things and
    they happen.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From jseigh@21:1/5 to jseigh on Sun Jul 27 13:13:40 2025
    On 5/26/25 17:58, jseigh wrote:
    and asymmetric memory barriers of course.

    https://threadnought.wordpress.com/2025/05/26/wait-free-hazard-pointers- using-std-atomics/

    Looks like it runs faster without the conditional branch.

    Not sure it's a new idea.  The only wait-free version of hazard pointers I've seen so far involves a storage to storage move instruction which
    might not be available on all platforms.



    For some reason I though restartable sequences (RSEQ) couldn't be
    inlined but apparently it can. I did a quick and dirty perf test
    of hp protect and it looks like it runs 3x faster. But this will
    only work on linux and only with asymmetric memory barriers.

    Joe Seigh

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From jseigh@21:1/5 to Chris M. Thomasson on Mon Jul 28 07:11:01 2025
    On 7/27/25 16:36, Chris M. Thomasson wrote:
    On 7/27/2025 10:13 AM, jseigh wrote:
    On 5/26/25 17:58, jseigh wrote:
    and asymmetric memory barriers of course.

    https://threadnought.wordpress.com/2025/05/26/wait-free-hazard-
    pointers- using-std-atomics/

    Looks like it runs faster without the conditional branch.

    Not sure it's a new idea.  The only wait-free version of hazard pointers >>> I've seen so far involves a storage to storage move instruction which
    might not be available on all platforms.



    For some reason I though restartable sequences (RSEQ) couldn't be
    inlined but apparently it can.  I did a quick and dirty perf test
    of hp protect and it looks like it runs 3x faster.  But this will
    only work on linux and only with asymmetric memory barriers.

    Please excuse my ignorance, by why only on Linux? windows has asymmetric memory barriers...

    No restartable sequences on windows AFAIK.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)