Eventcounts are nifty. I can get a 25x performance improvement using eventcounts over the standard mutex/cvar based solution.
They're pretty trivial to implement. But since they were only invented
in 1979, probably too soon to hope they would be in the standard lib in
the foreseeable future.
On 31.01.2025 15:48, jseigh wrote:
Eventcounts are nifty. I can get a 25x performance improvement using
eventcounts over the standard mutex/cvar based solution.
They're pretty trivial to implement. But since they were only invented
in 1979, probably too soon to hope they would be in the standard lib in
the foreseeable future.
I'm not an expert in this field, but at first glance the eventcounts
seem similar to std::atomic<T>::wait(), std::atomic<T>::notify_*() (in
the standard since C++20).
So, would it be possible to compare eventcounts with c++20 std::atomic (functionality and performance wise)?
On 1/31/25 13:49, Paavo Helde wrote:
On 31.01.2025 15:48, jseigh wrote:
Eventcounts are nifty. I can get a 25x performance improvement using
eventcounts over the standard mutex/cvar based solution.
They're pretty trivial to implement. But since they were only invented >>> in 1979, probably too soon to hope they would be in the standard lib in
the foreseeable future.
I'm not an expert in this field, but at first glance the eventcounts
seem similar to std::atomic<T>::wait(), std::atomic<T>::notify_*() (in
the standard since C++20).
So, would it be possible to compare eventcounts with c++20 std::atomic
(functionality and performance wise)?
You can compare it but it wouldn't be a fair comparison. The eventcount
api let's you do optimizations that std::atomic can't even if it has
platform support like linux futex, windows WaitOnAddress, etc..
E.g. folly's eventcount api
void notify() noexcept;
void notifyAll() noexcept;
Key prepareWait() noexcept;
void cancelWait() noexcept;
void wait(Key key) noexcept;
let's you optimize notify to basically do nothing if there
are no waiters.
For std::atomic, you'd have to do an atomic increment and
a system call on every single notify.
And if there's no platform support for <T> then the implementation
is going to use mutex and condvars.
On 1/31/25 15:47, jseigh wrote:
On 1/31/25 13:49, Paavo Helde wrote:
On 31.01.2025 15:48, jseigh wrote:
Eventcounts are nifty. I can get a 25x performance improvement using >>>> eventcounts over the standard mutex/cvar based solution.
They're pretty trivial to implement. But since they were only invented >>>> in 1979, probably too soon to hope they would be in the standard lib in >>>> the foreseeable future.
I'm not an expert in this field, but at first glance the eventcounts
seem similar to std::atomic<T>::wait(), std::atomic<T>::notify_*() (in
the standard since C++20).
So, would it be possible to compare eventcounts with c++20 std::atomic
(functionality and performance wise)?
You can compare it but it wouldn't be a fair comparison. The eventcount
api let's you do optimizations that std::atomic can't even if it has
platform support like linux futex, windows WaitOnAddress, etc..
E.g. folly's eventcount api
void notify() noexcept;
void notifyAll() noexcept;
Key prepareWait() noexcept;
void cancelWait() noexcept;
void wait(Key key) noexcept;
let's you optimize notify to basically do nothing if there
are no waiters.
For std::atomic, you'd have to do an atomic increment and
a system call on every single notify.
And if there's no platform support for <T> then the implementation
is going to use mutex and condvars.
I did a comparison to std::atomic<uint32_t> on linux which is probably
using a futex. It's about 2x to 3x slower than using eventcounts.
Am 04.02.2025 um 09:10 schrieb Chris M. Thomasson:
On 2/3/2025 8:25 PM, Bonita Montero wrote:
Am 04.02.2025 um 01:17 schrieb Chris M. Thomasson:
Sigh.
These poll-only lock-free algorithms don't make sense.
Why do you assume poll-only?
Lock-free don't has kernel-waits.
On 2/4/25 12:33, Bonita Montero wrote:
Am 04.02.2025 um 09:10 schrieb Chris M. Thomasson:
On 2/3/2025 8:25 PM, Bonita Montero wrote:
Am 04.02.2025 um 01:17 schrieb Chris M. Thomasson:
Sigh.
These poll-only lock-free algorithms don't make sense.
Why do you assume poll-only?
Lock-free don't has kernel-waits.
The underlying queue is lock-free. Adding the eventcounts means
that it isn't technically lock-free any more. It's just a concurrent
queue. But one that's 25x faster than a concurrent queue
using mutexes and condvars. You are free to use the slower
implementations. Who's to know the difference if you don't tell
them.
On Tue, 4 Feb 2025 19:59:46 -0500
jseigh <[email protected]> wibbled:
On 2/4/25 12:33, Bonita Montero wrote:
Am 04.02.2025 um 09:10 schrieb Chris M. Thomasson:
On 2/3/2025 8:25 PM, Bonita Montero wrote:
Am 04.02.2025 um 01:17 schrieb Chris M. Thomasson:
Sigh.
These poll-only lock-free algorithms don't make sense.
Why do you assume poll-only?
Lock-free don't has kernel-waits.
The underlying queue is lock-free. Adding the eventcounts means
that it isn't technically lock-free any more. It's just a concurrent
queue. But one that's 25x faster than a concurrent queue
using mutexes and condvars. You are free to use the slower
implementations. Who's to know the difference if you don't tell
them.
The kernel could implement these in any way it sees fit, so long as it stops and restarts the user space program at the correct time in, correct state with
the correct data. Underlying implementations should always be black box to user space.
On 2/5/25 03:38, [email protected] wrote:
On Tue, 4 Feb 2025 19:59:46 -0500
jseigh <[email protected]> wibbled:
On 2/4/25 12:33, Bonita Montero wrote:
Am 04.02.2025 um 09:10 schrieb Chris M. Thomasson:
On 2/3/2025 8:25 PM, Bonita Montero wrote:
Am 04.02.2025 um 01:17 schrieb Chris M. Thomasson:
Sigh.
These poll-only lock-free algorithms don't make sense.
Why do you assume poll-only?
Lock-free don't has kernel-waits.
The underlying queue is lock-free. Adding the eventcounts means
that it isn't technically lock-free any more. It's just a concurrent
queue. But one that's 25x faster than a concurrent queue
using mutexes and condvars. You are free to use the slower
implementations. Who's to know the difference if you don't tell
them.
The kernel could implement these in any way it sees fit, so long as it stops >> and restarts the user space program at the correct time in, correct state >with
the correct data. Underlying implementations should always be black box to >> user space.
No argument there. Not entirely sure how this is germane to user
space code. Please clarify a bit more.
On Wed, 5 Feb 2025 06:59:41 -0500
jseigh <[email protected]> wibbled:
On 2/5/25 03:38, [email protected] wrote:
On Tue, 4 Feb 2025 19:59:46 -0500
jseigh <[email protected]> wibbled:
On 2/4/25 12:33, Bonita Montero wrote:
Am 04.02.2025 um 09:10 schrieb Chris M. Thomasson:
On 2/3/2025 8:25 PM, Bonita Montero wrote:
Am 04.02.2025 um 01:17 schrieb Chris M. Thomasson:
Sigh.
These poll-only lock-free algorithms don't make sense.
Why do you assume poll-only?
Lock-free don't has kernel-waits.
The underlying queue is lock-free. Adding the eventcounts means
that it isn't technically lock-free any more. It's just a concurrent
queue. But one that's 25x faster than a concurrent queue
using mutexes and condvars. You are free to use the slower
implementations. Who's to know the difference if you don't tell
them.
The kernel could implement these in any way it sees fit, so long as it stops
and restarts the user space program at the correct time in, correct state >> with
the correct data. Underlying implementations should always be black box to >>> user space.
No argument there. Not entirely sure how this is germane to user
space code. Please clarify a bit more.
Did you miss the bits above mentioning kernel-waits and underlying implementations?
On 2/5/25 08:30, [email protected] wrote:
Did you miss the bits above mentioning kernel-waits and underlyingOk, I see the source of your confusion. Eventcounts and mutex/condvars
implementations?
are 2 different synchronization mechanisms. On linux they both use
futexes. The mutex/condvar is slower not because of its implementation
but because there is a lock involved. If a thread time slice ends
with it holding the lock, other threads get blocked. With eventcounts
it doesn't matter if and where its time slice end happens because it
won't affect the progress of other threads. That's basically where
the 25x difference in performance is.
On Wed, 5 Feb 2025 12:15:18 -0500
jseigh <[email protected]> wibbled:
On 2/5/25 08:30, [email protected] wrote:
Did you miss the bits above mentioning kernel-waits and underlyingOk, I see the source of your confusion. Eventcounts and mutex/condvars
implementations?
are 2 different synchronization mechanisms. On linux they both use
futexes. The mutex/condvar is slower not because of its implementation
but because there is a lock involved. If a thread time slice ends
with it holding the lock, other threads get blocked. With eventcounts
it doesn't matter if and where its time slice end happens because it
won't affect the progress of other threads. That's basically where
the 25x difference in performance is.
Presumably however a lock must be involved somewhere even if in kernel space when waiting on a futex/eventcount otherwise you risk race conditions. Also
I don't understand people talking about polling - if polling were a solution to thread synchronisation then mutexes wouldn't exist.
On 1/31/25 15:47, jseigh wrote:
I did a comparison to std::atomic<uint32_t> on linux which is probably
using a futex. It's about 2x to 3x slower than using eventcounts.
| Sysop: | Keyop |
|---|---|
| Location: | Huddersfield, West Yorkshire, UK |
| Users: | 715 |
| Nodes: | 16 (2 / 14) |
| Uptime: | 29:45:00 |
| Calls: | 12,108 |
| Calls today: | 8 |
| Files: | 15,006 |
| Messages: | 6,518,245 |