On 10/17/2024 5:10 AM, jseigh wrote:
I replaced the hazard pointer logic in smrproxy. It's now wait-free
instead of mostly wait-free. The reader lock logic, after loading
the address of the reader lock object into a register, is now 2
instructions: a load followed by a store. The unlock is the same
as before, just a store.
It's way faster now.
It's on the feature/003 branch as a POC. I'm working on porting
it to C++ and don't want to waste any more time on the C version.
No idea if it's a new algorithm. I suspect that since I use
the term epoch, it will be claimed that it's EBR (epoch-based
reclamation) and that all EBR algorithms are equivalent.
Though I suppose you could argue it's QSBR if I point out what
the quiescent states are.
I have to take a look at it! Been really busy lately. Shit happens.
On 10/17/2024 2:08 PM, jseigh wrote:
There's a quick and dirty explanation at
http://threadnought.wordpress.com/
and the repo is at https://github.com/jseigh/smrproxy
I'll need to create some memory access diagrams that
visualize how it works at some point.
Anyway, if it's new, it's another algorithm to use without
attribution.
Interesting. From a quick view, it kind of reminds me of a distributed seqlock for some reason. Are you using an asymmetric membar in here, in smr_poll?
On 10/18/2024 5:07 AM, jseigh wrote:
Yes, Linux membarrier() in smr_poll.
It's not a seqlock, not least because exiting the seqlock critical region
is 3 instructions, unless you use atomics, which are expensive and
usually have memory barriers.
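For contrast, here is a minimal single-writer seqlock sketch in standard C++ (the class and member names are hypothetical, not from smrproxy). The reader's exit path is an acquire fence, a reload of the sequence counter, and a compare-and-branch; on x86 the acquire fence emits no instruction, which leaves roughly the 3 instructions mentioned above, versus a single store for the smrproxy unlock. The data fields are relaxed atomics so that racing with the writer is not undefined behavior.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <utility>

// Single-writer seqlock protecting a pair of values (hypothetical example).
class SeqLock {
public:
    // Writer: make the sequence odd, update the data, make it even again.
    void write(std::uint64_t a, std::uint64_t b) {
        std::uint64_t s = seq_.load(std::memory_order_relaxed);
        seq_.store(s + 1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_release);
        a_.store(a, std::memory_order_relaxed);
        b_.store(b, std::memory_order_relaxed);
        seq_.store(s + 2, std::memory_order_release);
    }

    // Reader: retry until a consistent snapshot is observed.
    std::pair<std::uint64_t, std::uint64_t> read() const {
        for (;;) {
            std::uint64_t s1 = seq_.load(std::memory_order_acquire);
            if (s1 & 1) continue;  // writer in progress, retry
            std::uint64_t a = a_.load(std::memory_order_relaxed);
            std::uint64_t b = b_.load(std::memory_order_relaxed);
            // Exit path: fence, reload the counter, compare and branch.
            std::atomic_thread_fence(std::memory_order_acquire);
            std::uint64_t s2 = seq_.load(std::memory_order_relaxed);
            if (s1 == s2) return {a, b};
        }
    }

private:
    std::atomic<std::uint64_t> seq_{0};
    std::atomic<std::uint64_t> a_{0};
    std::atomic<std::uint64_t> b_{0};
};
```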
A lot of QSBR and EBR reader lock/unlock code is going to look
somewhat similar, so you have to know how the reclaim logic uses it.
In this case I am slingshotting off of the asymmetric memory barrier.
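A minimal Linux-only sketch of issuing that kind of asymmetric barrier through the membarrier(2) system call, assuming a kernel with MEMBARRIER_CMD_PRIVATE_EXPEDITED (Linux 4.14 or later). The helper names are hypothetical and this is not smrproxy's actual smr_poll code. glibc provides no wrapper for membarrier, so the sketch goes through syscall(2).

```cpp
#include <cassert>
#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>

// Register the process once before using the expedited private command.
// Returns false if the kernel does not support it.
bool asym_barrier_init() {
    return syscall(SYS_membarrier,
                   MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, 0, 0) == 0;
}

// Heavy side of the asymmetric pair: makes every running thread of this
// process pass through a full memory barrier, so reader threads can get
// away with compiler-only fences on their side.
bool asym_barrier() {
    return syscall(SYS_membarrier,
                   MEMBARRIER_CMD_PRIVATE_EXPEDITED, 0, 0) == 0;
}
```

The cost is concentrated on the caller (the reclaim side), which is the trade-off that keeps the reader lock down to a load and a store.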
At one point I was going to have smrproxy use either hazard pointer
logic or QSBR logic as a config option, but the extra code complexity
and the fact that QSBR required 2 grace periods kind of made that
infeasible. The QSBR logic was mostly ripped out, but there were still
some pieces left.
Anyway, I'm working on a C++ version, which involves a lot of extra work
besides just rewriting smrproxy. There's coming up with an API for
proxies, and testcases, which tend to be more work than the code
they are testing.
Damn! I almost missed this post! Fucking Thunderbird... Will get back to
you. Working on something else right now Joe, thanks.
https://www.facebook.com/share/p/ydGSuPLDxjkY9TAQ/
On 10/25/2024 3:56 PM, jseigh wrote:
No problem. The C++ work is progressing pretty slowly, not least
because the documentation is not always clear as to what
something does or even what problem it is supposed to solve.
To think I took a pass on Rust because I thought it was
more complicated than it needed to be.
Never even tried Rust, shit, I am behind the times. ;^)
Humm... I don't think we can do this in 100% C++, because of the damn asymmetric membar these rather "specialized" algorithms need, right?
Is C++ thinking about creating a standard way to get an asymmetric membar?
On 10/27/2024 3:29 PM, jseigh wrote:
I don't think so. It's platform dependent. Apart from Linux, it's
mostly done with a call to some virtual memory function that flushes
the TLBs (translation lookaside buffers), which involves IPIs
(inter-processor interrupts) to all the processors, and those have
memory barriers. This is old: 1973, patent 3,947,823, cited by the
patent I did.
Anyway, I version the code so there's an asymmetric memory barrier
version and an explicit memory barrier version, the latter
being much slower.
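A sketch of how the two variants of a reader lock might differ, assuming a per-reader slot holding a shadow copy of the epoch and the epoch the reader references (the template parameter and names are hypothetical, and epoch wrap-around is ignored here). In the explicit variant the reader pays for a store/load (seq_cst) fence after publishing its epoch; in the asymmetric variant a compiler-only fence suffices because the reclaimer issues the heavy barrier instead.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

using epoch_t = std::uint64_t;  // simplified: no wrap-around handling

template <bool Asymmetric>
struct reader_slot {
    std::atomic<epoch_t> shadow_epoch{1};  // copy of the global epoch
    std::atomic<epoch_t> ref_epoch{0};     // 0 means "not in a read section"

    void lock() {
        epoch_t e = shadow_epoch.load(std::memory_order_relaxed);
        ref_epoch.store(e, std::memory_order_relaxed);
        if constexpr (Asymmetric) {
            // Compiler barrier only; the reclaimer's asymmetric barrier
            // supplies the hardware store/load ordering.
            std::atomic_signal_fence(std::memory_order_acquire);
        } else {
            // Store/load fence on the reader side. It pairs with an
            // ordinary seq_cst fence on the reclaim side, so no asymmetric
            // barrier is needed, at a much higher cost per lock().
            std::atomic_thread_fence(std::memory_order_seq_cst);
        }
    }

    void unlock() { ref_epoch.store(0, std::memory_order_release); }
};
```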
Ahh, nice! acquire/release, no seq_cst, right? ;^)
On 10/27/2024 5:35 PM, jseigh wrote:
The membar version? That's a store/load membar so it is expensive.
I was wondering if you had to use any seq_cst barriers in your C++
version. I think acquire/release should be good enough. Now, when I say
C++, I mean pure C++: no calls to FlushProcessWriteBuffers and things
like that.
I take it that your pure C++ version has no atomic RMW, right? Just
loads and stores?
On 10/28/2024 4:45 AM, jseigh wrote:
While a lock action has acquire memory order semantics, if the
implementation has internal stores, you have to make sure those stores
are complete before any access from the critical section.
So you may need a store/load memory barrier.
Wrt acquiring a lock, the only class of mutex logic that comes to mind
that requires an explicit StoreLoad-style membar is Peterson's, and some
others along those lines, so to speak. That is for the version built from
plain stores and loads. Now, an RMW on x86 basically implies a StoreLoad
via the LOCK prefix (XCHG needs no explicit prefix, since its LOCK is
implied). For instance, the original SMR algo requires a StoreLoad as-is
on x86/x64: MFENCE or a LOCK prefix.
Fwiw, my experimental pure C++ proxy works fine with XADD, or atomic
fetch-add. It needs explicit membars (no #StoreLoad) on SPARC in RMO
mode. On x86, the LOCK prefix handles that wrt the RMWs themselves.
This is a lot different than using stores and loads. The original SMR
and Peterson's algo need that "store followed by a load to a different
location" ordering to hold true, aka StoreLoad...
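To make the StoreLoad point concrete, here is a minimal two-thread Peterson's lock sketch in C++ (the class name is hypothetical). Each thread stores to its own flag and then loads the other thread's flag; without seq_cst ordering (a StoreLoad barrier, e.g. MFENCE or a LOCK-prefixed instruction on x86), both loads could see stale values and both threads could enter.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

class peterson_lock {
public:
    void lock(int id) {
        assert(id == 0 || id == 1);
        int other = 1 - id;
        flag_[id].store(true, std::memory_order_seq_cst);
        turn_.store(other, std::memory_order_seq_cst);
        // seq_cst keeps the stores above ordered before the loads below:
        // the "store followed by a load to a different location" case.
        while (flag_[other].load(std::memory_order_seq_cst) &&
               turn_.load(std::memory_order_seq_cst) == other) {
            std::this_thread::yield();
        }
    }

    void unlock(int id) { flag_[id].store(false, std::memory_order_seq_cst); }

private:
    std::atomic<bool> flag_[2]{false, false};
    std::atomic<int> turn_{0};
};
```

Relaxing those operations to acquire/release is exactly what breaks the algorithm on hardware with store buffers.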
Now, I don't think that a data-dependent load can act like a StoreLoad.
I thought that they act sort of like an acquire, aka #LoadStore |
#LoadLoad wrt SPARC. SPARC in RMO mode honors data dependencies. Now,
the DEC Alpha is a different story... ;^)
On 10/28/2024 6:17 PM, jseigh wrote:
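Fwiw, here's the lock and unlock logic from the smrproxy rewrite:
inline void lock()
{
    // load the epoch copy kept in the same cache line as this reader's lock object
    epoch_t _epoch = shadow_epoch.load(std::memory_order_relaxed);
    // publish it as the epoch this reader is referencing
    _ref_epoch.store(_epoch, std::memory_order_relaxed);
    // compiler-only fence; the hardware store/load ordering comes
    // from the asymmetric barrier issued on the reclaim side
    std::atomic_signal_fence(std::memory_order_acquire);
}
inline void unlock()
{
    // release store of 0: this reader no longer references an epoch
    _ref_epoch.store(0, std::memory_order_release);
}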
epoch_t is interesting. It's a uint64_t but handles wrapped
compares, i.e. for an epoch_t x1 and uint64_t n
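A minimal sketch of one common way to get wrap-safe epoch compares, not necessarily what smrproxy's epoch_t does: serial-number style arithmetic (as in RFC 1982), where the difference of two unsigned 64-bit values is reinterpreted as signed, so ordering keeps working across wrap-around as long as the two epochs are less than 2^63 apart. The `epoch` class and its members are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical wrap-safe epoch (serial number arithmetic).
// Valid as long as compared epochs are less than 2^63 apart.
class epoch {
public:
    constexpr explicit epoch(std::uint64_t v = 0) : v_(v) {}
    constexpr std::uint64_t value() const { return v_; }

    // Advancing wraps modulo 2^64 (unsigned overflow is well defined).
    constexpr epoch operator+(std::uint64_t n) const { return epoch(v_ + n); }

    // a < b iff (a - b), read as a signed 64-bit value, is negative.
    // The unsigned-to-signed conversion is modular in C++20 and on all
    // mainstream two's-complement compilers before that.
    friend constexpr bool operator<(epoch a, epoch b) {
        return static_cast<std::int64_t>(a.v_ - b.v_) < 0;
    }
    friend constexpr bool operator==(epoch a, epoch b) { return a.v_ == b.v_; }

private:
    std::uint64_t v_;
};
```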
Only your single polling thread can mutate the shadow_epoch, right?
On 10/28/2024 6:17 PM, jseigh wrote:
std::atomic_signal_fence(std::memory_order_acquire);
^^^^^^^^^^^^^^^^^^^^^^^^
Still don't know how your pure C++ write-up can handle this without a std::atomic_thread_fence(std::memory_order_acquire).
On 10/28/2024 9:41 PM, Chris M. Thomasson wrote:
Ahhh, if you are using an async membar in your upcoming C++ version,
then it would be fine. No problem. A compiler fence a la
atomic_signal_fence, plus the explicit release, well, it will work. I don't see why it would not work.
For some reason, I thought you were not going to use an async membar in
your C++ version. Sorry. However, it would still be fun to test
against... ;^)
On 10/29/2024 4:27 AM, jseigh wrote:
Yes. It's just an optimization. The reader threads could read
from the global epoch, but it would be in a separate cache line
and be an extra dependent load. This way it's one dependent load
and the same cache line.
Are you taking advantage of the fancy alignment capabilities of C++,
https://en.cppreference.com/w/cpp/language/alignas
and friends? They seemed to work fine the last time I checked them.
It's nice to have a standard way to pad and align on cache line
boundaries. :^)
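A sketch of the layout idea from the last two posts, assuming a 64-byte cache line (std::hardware_destructive_interference_size could be used instead where the compiler provides it). Each reader gets its own cache-line-aligned record holding both its reference epoch and a shadow copy of the global epoch, so lock() stays within one line after a single dependent load of the record's address. The struct names are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t cache_line = 64;  // assumption: typical x86-64 / ARM64

// Per-reader record: both fields share one cache line, and alignas keeps
// neighboring readers' records on different lines.
struct alignas(cache_line) reader_record {
    std::atomic<std::uint64_t> ref_epoch{0};     // written by this reader
    std::atomic<std::uint64_t> shadow_epoch{1};  // refreshed by the poller
};

static_assert(alignof(reader_record) == cache_line, "aligned to a line");
static_assert(sizeof(reader_record) == cache_line, "padded to a full line");

// The global epoch sits on its own line, away from the reader records.
struct alignas(cache_line) global_state {
    std::atomic<std::uint64_t> epoch{1};
};
```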
On 10/17/2024 5:10 AM, jseigh wrote:
Joe, can you call dtors for nodes after a single epoch?
On 10/30/2024 9:39 AM, jseigh wrote:
The C version has both versions. The C++ version does only the
async membar version. But I'm not publishing that code, so it's
a moot point.
I got sidetracked with more heavy math. The problem with C++ code that
uses an async memory barrier is that it's automatically rendered non-portable... Yikes! Imvvvvvho, C/C++ should think about
including them in some future standard. It would be nice. Well, for us
at least! ;^)
On 11/4/24 00:14, Chris M. Thomasson wrote:
That's never going to happen. DWCAS has been around for more than
50 years and C++ doesn't support it and probably never will.
You can't write lock-free queues that are ABA-free and
performant without it. So async memory barriers won't
happen any time soon either.
Long term, I think C++ will fade into irrelevance along with
all the other programming languages based on an imperfect
knowledge of concurrency, which is basically all of them
right now.
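Standard C++ does allow std::atomic on a two-word struct, but whether that is lock-free (i.e. maps to a DWCAS such as CMPXCHG16B) is up to the platform and compiler flags. One common workaround, sketched here under stated assumptions, packs a 32-bit node index and a 32-bit version tag into a single 64-bit word, so a single-word CAS can detect ABA. It only works with a fixed node pool, and the tag can in principle still wrap. All names are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

// Lock-free LIFO free list over a fixed pool, using index+tag in one word.
class tagged_stack {
public:
    static constexpr std::uint32_t nil = 0xFFFFFFFFu;

    explicit tagged_stack(std::uint32_t capacity) : next_(capacity) {
        for (auto& n : next_) n.store(nil, std::memory_order_relaxed);
    }

    // Push pool node idx (the caller must own it).
    void push(std::uint32_t idx) {
        std::uint64_t old = head_.load(std::memory_order_relaxed);
        for (;;) {
            next_[idx].store(index(old), std::memory_order_relaxed);
            std::uint64_t desired = pack(idx, tag(old) + 1);
            if (head_.compare_exchange_weak(old, desired,
                    std::memory_order_release, std::memory_order_relaxed))
                return;
        }
    }

    // Pop a node index, or nil if the stack is empty.
    std::uint32_t pop() {
        std::uint64_t old = head_.load(std::memory_order_acquire);
        for (;;) {
            std::uint32_t idx = index(old);
            if (idx == nil) return nil;
            // Safe to read: pool nodes are never freed, and a stale value
            // is rejected by the tag check in the CAS below.
            std::uint32_t nxt = next_[idx].load(std::memory_order_relaxed);
            std::uint64_t desired = pack(nxt, tag(old) + 1);
            if (head_.compare_exchange_weak(old, desired,
                    std::memory_order_acquire, std::memory_order_acquire))
                return idx;
        }
    }

private:
    static std::uint64_t pack(std::uint32_t idx, std::uint32_t t) {
        return (std::uint64_t(t) << 32) | idx;
    }
    static std::uint32_t index(std::uint64_t w) { return std::uint32_t(w); }
    static std::uint32_t tag(std::uint64_t w) { return std::uint32_t(w >> 32); }

    std::atomic<std::uint64_t> head_{pack(nil, 0)};
    std::vector<std::atomic<std::uint32_t>> next_;
};
```

The fixed pool is the price: without DWCAS (or safe memory reclamation) there is no general way to attach a tag to arbitrary heap pointers.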
On 10/17/24 08:10, jseigh wrote:
I got a port to C++ working now. There are 5 proxy implementations:
1) smrproxy v2
2) arcproxy - reference counted proxy
3) rwlock based proxy
4) mutex based proxy
5) an unsafe proxy with no locking
The testcase is templated, so you can use any of the
5 proxy implementations without rewriting it for each proxy
type. You can do apples-to-apples comparisons. I
realize that's the complete antithesis of current
programming practices, but there you have it. :)
A bit of cleanup and performance tuning now.
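A minimal sketch of the templated-testcase idea, assuming each proxy type exposes lock() and unlock() for readers. The real smrproxy C++ API may differ, and the two proxy types here are hypothetical stand-ins for the mutex and unsafe implementations in the list. The same reader loop is instantiated for each type, so every proxy is exercised by identical code.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>

// Hypothetical proxy: readers serialize on a mutex.
struct mutex_proxy {
    std::mutex m;
    void lock() { m.lock(); }
    void unlock() { m.unlock(); }
};

// Hypothetical proxy: no synchronization at all (baseline only).
struct unsafe_proxy {
    void lock() {}
    void unlock() {}
};

// One reader loop, reused unchanged for every proxy type.
template <typename Proxy>
long run_reader(Proxy& proxy, const int* data, std::size_t n, int iterations) {
    long sum = 0;
    for (int i = 0; i < iterations; ++i) {
        proxy.lock();
        for (std::size_t j = 0; j < n; ++j) sum += data[j];
        proxy.unlock();
    }
    return sum;
}
```

Wrapping run_reader in a single timing routine would then measure every proxy type with the same code.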
On 11/23/2024 8:10 AM, jseigh wrote:
Ok, smrproxy lock/unlock is down to 0.6 nanoseconds now,
about what the C version was.
Nice! Are you using pthread_getspecific or tss_get in your C version?
On 11/24/2024 12:14 PM, jseigh wrote:
I've been using CPU time to measure performance. That's OK
for lock-free/wait-free locking. For normal mutexes and
shared locks, it doesn't measure wait time, so those didn't
look as bad as they really were. You can add logic
to measure how long it takes to acquire a lock, but that
adds significant overhead.
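A small sketch of why CPU time hides lock wait time: a thread that blocks (simulated here with a sleep, standing in for waiting on a contended mutex) accumulates elapsed time but almost no CPU time. On POSIX systems std::clock() reports processor time used by the process (MSVC's std::clock() measures wall time instead), while std::chrono::steady_clock measures elapsed time. The names are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <ctime>
#include <thread>

struct timings {
    double cpu_ms;
    double wall_ms;
};

// Measure both clocks around a blocking wait.
timings measure_blocking_wait(std::chrono::milliseconds wait) {
    std::clock_t c0 = std::clock();
    auto w0 = std::chrono::steady_clock::now();

    std::this_thread::sleep_for(wait);  // stands in for blocking on a lock

    std::clock_t c1 = std::clock();
    auto w1 = std::chrono::steady_clock::now();
    return {
        1000.0 * double(c1 - c0) / CLOCKS_PER_SEC,
        std::chrono::duration<double, std::milli>(w1 - w0).count()
    };
}
```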
I remember back in the day when I was comparing and contrasting various lock/wait-free algorithms with their 100% lock-based counterparts. Some
of the lock-based tests took so long that I just terminated the damn
program. Iirc, a lock-free test would take around 5 minutes. The lock-based
test would be around 30+ minutes. This was way back on c.p.t.
On 11/24/2024 4:09 PM, jseigh wrote:
Actually, I remember way back when a scenario that had a lot of writes
would start to mess with a deferred reclamation scheme based on a polling
thread. Too many deferred nodes would start to "pile up". Basically, the
single polling thread was having trouble keeping up with all of them. The
interlocked versions seemed to perform sort of "better" during periods with
a lot of frequent "writes". Of course, clever use of node caches helps
during heavy write periods. Anyway, some of the tests just used a mutex for
writes; others used lock-free writes and would generate high loads of them,
pushing and popping nodes and deferring them to the poll thread to test how
much load it (the poll thread) could take.
On 11/23/24 11:10, jseigh wrote:
Some timings with 128 reader threads:
proxy    cpu time         overhead elapsed time          overhead
unsafe     52.983 nsecs (    0.000)     860.576 nsecs (     0.000)
smr        54.714 nsecs (    1.732)     882.356 nsecs (    21.780)
smrlite    53.149 nsecs (    0.166)     870.066 nsecs (     9.490)
arc       739.833 nsecs (  686.850)  11,988.289 nsecs (11,127.713)
rwlock  1,078.306 nsecs (1,025.323)  17,309.882 nsecs (16,449.306)
mutex   3,203.034 nsecs (3,150.052)  51,479.407 nsecs (50,618.831)
unsafe is without any synchronized reader access. The overhead
values in parentheses have the unsafe access time subtracted
out, to separate out the synchronization overhead. smrlite is
smrproxy without the thread_local overhead. So smrproxy lock/unlock
by itself is about 0.1 - 0.2 nanoseconds.
I'm going to drop working on the whole proxy interface thing. The application can decide if it wants to hardcode a dependency on a
particular 3rd party library implementation or abstract it out
to a more portable API.
On 11/27/24 10:29, jseigh wrote:
I figured out where the smr vs smrlite overhead is likely coming from:
1) thread_local load, about .3 nsecs; there are 2 of them for lock/unlock, so .6 nsecs.
2) overhead from lazy initialization, about .6 nsecs.
smrlite most of the time doesn't show any measurable overhead,
0 nsecs.
Theoretically, you could do lazy initialization with zero
runtime overhead, but for most C++ apps, 1 millisecond is
considered fast, so I don't think there would be much interest
in it.
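A sketch of the kind of lazy per-thread initialization being discussed, with hypothetical names (reader, registry, current_reader) that are not smrproxy's. A constant-initialized thread_local pointer is checked on every access and filled in on first use, so the compiler typically needs no dynamic TLS initialization guard for it. The fast path is a TLS load plus a well-predicted branch; that per-call check is the kind of lazy-initialization cost measured above.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <vector>

struct reader {
    std::atomic<unsigned long> ref_epoch{0};
};

// Registry owning all reader records (hypothetical; never shrinks here).
class registry {
public:
    reader* add() {
        std::lock_guard<std::mutex> g(m_);
        readers_.push_back(std::make_unique<reader>());
        return readers_.back().get();
    }
    std::size_t size() {
        std::lock_guard<std::mutex> g(m_);
        return readers_.size();
    }

private:
    std::mutex m_;
    std::vector<std::unique_ptr<reader>> readers_;
};

registry g_registry;

// Constant-initialized: no dynamic TLS initialization is needed.
thread_local reader* tls_reader = nullptr;

inline reader& current_reader() {
    reader* r = tls_reader;
    if (r == nullptr) {  // slow path, taken once per thread
        r = g_registry.add();
        tls_reader = r;
    }
    return *r;
}
```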
On 11/4/2024 4:46 AM, jseigh wrote:
Hummm... If I remember correctly, you said something about using a
simple atomic exchange to pop a whole list (lock-free stack), then
simply reversing the list to get FIFO order? Do you remember any of
that from way back on c.p.t?
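A minimal sketch of the technique being asked about, with hypothetical names: producers push onto a lock-free LIFO with CAS, and a consumer detaches the whole list at once with an atomic exchange, then reverses it to get FIFO order. Since nothing ever pops individual nodes off the shared head, this pattern sidesteps the ABA problem on the pop side.

```cpp
#include <atomic>
#include <cassert>

struct node {
    int value;
    node* next;
};

class mpsc_stack {
public:
    // Any thread may push.
    void push(node* n) {
        n->next = head_.load(std::memory_order_relaxed);
        while (!head_.compare_exchange_weak(n->next, n,
                   std::memory_order_release, std::memory_order_relaxed)) {
            // on failure n->next was reloaded with the current head; retry
        }
    }

    // Detach everything at once (LIFO), then reverse into FIFO order.
    node* pop_all_fifo() {
        node* lifo = head_.exchange(nullptr, std::memory_order_acquire);
        node* fifo = nullptr;
        while (lifo != nullptr) {
            node* next = lifo->next;
            lifo->next = fifo;
            fifo = lifo;
            lifo = next;
        }
        return fifo;
    }

private:
    std::atomic<node*> head_{nullptr};
};
```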
On 12/12/2024 4:29 AM, jseigh wrote:
That kind of stuff pre-dates c.p.t. even.
It has to. Although it was not in the IBM Principles of Operation, where
they describe their lock-free stack in an appendix, iirc... I cannot
remember the exact one right now. Iirc, it was under free pool
manipulation?