On 8/19/2024 7:22 AM, Bonita Montero wrote:
I've developed a lock-free alternative to atomic<shared_ptr<>>. The idea is that the pointer inside this atomic has two counters which come along[...]
in a size_t'd word. The first counter has a width of 24 bit and counts
the number of control block increments which are in progress, so I'm
also using double reference counting. The remaing 40 bits of the second
word contain a counter which constantly increments when the pointer is
being overwritten. This prevents the lowerr 24 bit to be decremented
if the pointer is already overwritten.
The code is somewhat faster than the futex'd solution of MSVC and
mangitudes faster than the clang's libc++ atomic<shared_ptr<>>.
Here's the code:
Need to read it, but are you familiar with Joe Seighs atomic_ptr? Its been out for a long time.
Am 20.08.2024 um 15:11 schrieb jseigh:
I need to update it with actual C++ atomics. I'll have to use the folly lib trick of packing a 48 bit ptr into 64 bits with 16 bits left over for
a reference count since it's not likely C++ will support DWCAS.
Only having 16 bits is dangerous and doesn't work with newer x86-CPUs
which use 55 bit for the userspace.
On 8/20/2024 9:21 AM, Bonita Montero wrote:
Am 20.08.2024 um 15:11 schrieb jseigh:That's seriously big. There's not enough memory to require something like that.
I need to update it with actual C++ atomics. I'll have to use the folly lib trick of packing a 48 bit ptr into 64 bits with 16 bits left over for
a reference count since it's not likely C++ will support DWCAS.
Only having 16 bits is dangerous and doesn't work with newer x86-CPUs
which use 55 bit for the userspace.
On 8/20/2024 8:28 PM, Bonita Montero wrote:
Am 20.08.2024 um 21:27 schrieb Chris M. Thomasson:
Wrt your atomic_ptr:
struct anchor
{
void* pointer;
word count;
};
Wrt to standard C++11 it should be lock-free on many arch's by now, aka, they have a real DWCAS. It can be checked with is_lock_free() wrt:
g++ doesn't support 16 byte atomics which are lock-free, also not MSVC
(but 8 byte lock-free atomics in 32 bit mode). But clang++ does support
it. I use it with a "if constexpr( SUBSTITUTABLE )" in my dcas_atomic.h.
It's been a while since I checked them, but damn it... C++11 should be using native DWCAS if the underlying arch supports it wrt (struct anchor). Ahh shit. Well, I need to check again.
https://en.cppreference.com/w/cpp/atomic/atomic/is_lock_free
is_always_lock_free is better
Touche. :^)
Am 21.08.2024 um 14:24 schrieb jseigh:
No, C++ won't. Their reasoning is they can only do it if a 128 bit atomic can be defined
with all the operations that other atomic types have. But some architectures don't have
128 bit atomic loads and/or stores (not performant ones anyway) so no go in their view.
The idea that you could provide DWCAS some other way seems not to fit into their
world view. And lock-free programming isn't important to them.
It's not necessary to define an explicit atomic for that in the
standard. clang++ supports lock-free atomics up to any 128 bit
trivial type. That's what all compilers for x84 and ARM should
support.
Am 22.08.2024 um 17:17 schrieb jseigh:
On 8/21/2024 8:44 AM, Bonita Montero wrote:
Am 21.08.2024 um 14:24 schrieb jseigh:Not sure what you are talking about. Clang documentation is only clear
No, C++ won't. Their reasoning is they can only do it if a 128 bit atomic can be defined
with all the operations that other atomic types have. But some architectures don't have
128 bit atomic loads and/or stores (not performant ones anyway) so no go in their view.
The idea that you could provide DWCAS some other way seems not to fit into their
world view. And lock-free programming isn't important to them.
It's not necessary to define an explicit atomic for that in the
standard. clang++ supports lock-free atomics up to any 128 bit
trivial type. That's what all compilers for x84 and ARM should
support.
if you already know the answer. I tried using _Atomic __uint128_t and
it didn't like it.
Anyway I have a gcc inline DWCAS so as far as I am concerned C/C++
support is still where it was in 2003, meaning I still have to use
the same techniques I used back then.
I tested this with clang++-18:
#include <iostream>
#include <atomic>
using namespace std;
int main()
{
atomic<__int128> a;
cout << sizeof(a) << endl;
cout << a.is_lock_free() << endl;
}
This prints:
16
0
So the atomic<__int128> is reported not to be lock-free, but
it calls libatomic, which internally uses CMPXCHG16B. Seems
the compiler thiks "I dont know what libatomic does, but I
guess it's not lockfree", although it actually.
On 8/22/2024 8:17 AM, jseigh wrote:
Anyway I have a gcc inline DWCAS so as far as I am concerned C/C++
support is still where it was in 2003, meaning I still have to use
the same techniques I used back then.
Does this logic ring a bell? I saw your DWCAS asm impl _way_ back on c.p.t, 20+ years now?
____________________
.align 16
.globl np_ac_i686_atomic_dwcas_fence
np_ac_i686_atomic_dwcas_fence:
pushl %esi
pushl %ebx
movl 16(%esp), %esi
movl (%esi), %eax
movl 4(%esi), %edx
movl 20(%esp), %esi
movl (%esi), %ebx
movl 4(%esi), %ecx
movl 12(%esp), %esi
lock cmpxchg8b (%esi)
jne np_ac_i686_atomic_dwcas_fence_fail
xorl %eax, %eax
popl %ebx
popl %esi
ret
np_ac_i686_atomic_dwcas_fence_fail:
movl 16(%esp), %esi
movl %eax, (%esi)
movl %edx, 4(%esi)
movl $1, %eax
popl %ebx
popl %esi
ret
____________________
;^)
On 8/22/2024 1:03 PM, jseigh wrote:
that would be for 32 bit cpus. I use gcc inline assembly. Also I update the expected value if the cas fails.
So does this wrt updating the expected value:
np_ac_i686_atomic_dwcas_fence_fail:
movl 16(%esp), %esi
movl %eax, (%esi)
movl %edx, 4(%esi)
movl $1, %eax
popl %ebx
popl %esi
ret
I remember you teaching me about atomic_ptr _way_ back over on c.p.t 20+ years ago. I saw your original inline asm wrt DWCAS for GCC.
I got an idea why split reference counting doesn't safely work: If one
thread increments the pointer's reference count, gets suspended then
and another thread moves the pointer to another atomic<shared_ptr<>>
and "installs" it with a new local reference count a thread then might
erase the pointer in its new position and deallocate then the object.
This situation is not covered by split reference counting. I guess
that's the reason while MSVC, libstdc++ and libc++ use futexes and
don't supply a lock-free alternative.
I've dropped my code and now I implemented a futex'd solution, but
still with the mentioned shortcut when I assign a thsared_obj to a
shared_obj that tests if both pointers are the same before the futex
is locked.
Am 03.10.2024 um 16:50 schrieb jseigh:
I never did figure out what split reference counting was all about.
Even if you have endless amount of time to kill by looking at the
source, it still may not tell you what need to know. There's
"how you use this" and what not to use it for, i.e. the semantics.
https://www.youtube.com/results?search_query=lockfree+atomic+shared_ptr
A corollary of the burden of proof principle. If someone can't
be bothered to explain something, I don't feel any obligation
to figure out what the heck they are talking about.
It't so complicated that I got to the afforementioned conclusion
very late.
Besides, I already have an atomic shared pointer implantation
that doesn't have those problems.
With hazard pointers that's still possible, but not otherwise.
Am 03.10.2024 um 18:29 schrieb jseigh:
With hazard pointers that's still possible, but not otherwise.
Don't take my word for it then.
https://patents.google.com/patent/US7769791B2/en
Which also shows you can patent prior art.
Maybe hazard pointers were invented 2005.
Yes, it is well known, or maybe not so well known, that you
can implement atomic lock-free reference counting with deferred
reclamation, e.g. hazard pointers, rcu, garbage collection,
etc... A lot faster for read access that doesn't need to
change the reference count.
Look at the code which copies a tshared_obj<T> to a shared_obj<T>:
template<typename T>
shared_obj<T> &shared_obj<T>::operator =( tshared_obj<T> const &tso ) noexcept
{
using namespace std;
ctrl_t *ctrl = tso.m_ctrl.load( memory_order_relaxed );
if( ctrl == m_ctrl )
return *this;
if( m_ctrl )
{
if( m_ctrl->decr() == 1 )
delete m_ctrl;
m_ctrl = nullptr;
}
lock_guard<futex> lock( tso.m_ftx );
ctrl = tso.m_ctrl.load( memory_order_relaxed );
if( ctrl == m_ctrl )
return *this;
m_ctrl = ctrl;
ctrl->incr();
return *this;
}
The shortcut in the beginning compares the two pointers before locking
the futex. So if the central tshared_ptr<> hasn't changed the whole
operation of updating the shared_obj<> is just a nanosecond and all participating cachelines stay in shared mode. With that you need no
hazard pointers, no RCU or whatever and you have immediate object
destruction without any delay.
I think it would be the best to suggest an assignment-operator for shared_ptr<> which takes an atomic<sharded_ptr<>> which also has this
simple optimization.
The idea is so brain damaged simple and makes all more advanced techno- logies for RCU-like patterns gratitious.
On 10/4/2024 12:50 PM, Bonita Montero wrote:
Am 04.10.2024 um 21:40 schrieb Chris M. Thomasson:
I assume you know exactly how SMR works.
Maybe I'll use hazard pointers if C++26 supports it. But as I've
shown this is not necessary for atomic<shared_ptr<>> with this
small optimization.
Actually, split reference counting is strange to me because atomic_ptr
has already been out there for a long time and works fine.
| Sysop: | Keyop |
|---|---|
| Location: | Huddersfield, West Yorkshire, UK |
| Users: | 715 |
| Nodes: | 16 (2 / 14) |
| Uptime: | 153:02:29 |
| Calls: | 12,091 |
| Calls today: | 4 |
| Files: | 15,000 |
| Messages: | 6,517,664 |