open
https://gitlab.synchro.net/main/sbbs/-/issues/1153
## Summary
On Windows, `xp_lockfile()` takes an **exclusive** byte-range lock even for descriptors opened **read-only**, because it is implemented with the CRT `_locking()` function, which has no shared-lock mode. The POSIX implementation of the same function honors the descriptor's access mode and takes a **shared** (`F_RDLCK`) lock for `O_RDONLY` files.
The result is a silent cross-platform semantic divergence: a record read that allows **concurrent readers on Linux** **serializes all readers on Windows**. This is mostly invisible on a local disk, but it becomes a serious contention / DoS-amplification problem when the data directory is on a **network share (SMB/CIFS)**, where every lock/unlock is a round-trip and our retry loop spins up to 200 times.
## Root cause
`src/xpdev/filewrap.c`
POSIX branch — chooses shared vs. exclusive from the fd's open mode:
```c
// xp_lockfile(), ~line 94
if ((flags & (O_RDONLY | O_RDWR | O_WRONLY)) == O_RDONLY)
alock.l_type = F_RDLCK; /* set read lock to prevent writes (shared) */
else
alock.l_type = F_WRLCK; /* set write lock to prevent all access (exclusive) */
```
Windows branch — always exclusive, mode ignored:
```c
// xp_lockfile(), ~line 419
int xp_lockfile(int file, off_t offset, off_t size, bool block)
{
...
i = _locking(file, block ? LK_LOCK : LK_NBLCK, (long)size); // exclusive-only
...
}
```
`_locking()` only offers `LK_LOCK`/`LK_NBLCK` (exclusive) and `LK_UNLCK`. There is no shared/read-lock flavor, so a read-only consumer that *should* take a shareable read lock instead blocks every other reader of the same region.
## Why it bites: read-only `user.tab` access over SMB
The user database read path takes a lock on every single-record read, via a **read-only** descriptor:
- `openuserdat(cfg, /* for_modify */ false)` opens `user.tab` `O_RDONLY | O_DENYNONE` — `src/sbbs3/userdat.c:319`
- `readuserdat()` → `lockuserdat()` → `lock()` → `xp_lockfile()` — `src/sbbs3/userdat.c:395`, `:341-353`
- `lockuserdat()` retries up to `LOOP_USERDAT` (200) times with `FILE_RETRY_DELAY()` backoff before giving up — `src/sbbs3/userdat.c:348-350`
- `getuserdat()` does open → seek → lock → read → unlock → close for each lookup — `src/sbbs3/userdat.c:551-573`, `:368-412`
On Linux this is a shared read lock: many servers (terminal/mail/ftp/web) can read the same user record concurrently. **On Windows it is exclusive**: concurrent reads of the same record serialize, and on an SMB-mounted `data/` each lock + unlock is a network round-trip. The hottest record (user #1, the sysop/`admin` alias) is exactly the one an attacker probes.
### Observed impact
This was found while investigating an active web-scrape / SSH-login-probe flood against a production system (`vert`) whose `data/` is an **SMB-mounted** volume. Symptoms: legitimate, unrelated services stall waiting to read `user.tab` while the flood drives a storm of exclusive lock/unlock round-trips on hot records. Because the lock is exclusive, even pure *readers* contend with each other — the attack doesn't have to write anything to cause cross-service starvation. The retry loop (200 × backoff) turns each contended SMB lock into a multi-second stall, multiplied across every server sharing the mount.
So a flood that on Linux would be shared-read-lock-cheap becomes, on Windows-over-SMB, a self-inflicted lock convoy that blocks the rest of the BBS from using the user base.
## Proposed fix
Reimplement the Windows `xp_lockfile()` using `LockFileEx()` (obtaining the `HANDLE` via `_get_osfhandle()`), passing the lock **without** `LOCKFILE_EXCLUSIVE_LOCK` when the descriptor was opened `O_RDONLY`, and **with** it otherwise — mirroring the POSIX `F_RDLCK`/`F_WRLCK` selection already in place. Use `LOCKFILE_FAIL_IMMEDIATELY` for the non-blocking (`block == false`) path.
Notes / gotchas:
- `LockFileEx()` ranges must be released with `UnlockFileEx()`, **not** `_locking(LK_UNLCK)`. The Windows `unlock()` (`filewrap.c:252`, currently `_locking(file, LK_UNLCK, ...)`) must be converted to match, or locks won't be released correctly.
- This changes the behavior of **every** `lock()` caller on Windows, not just `user.tab` (SMB message bases under `data/`, etc.). That is the intended consistency fix — it brings Windows in line with the shared-read-lock semantics POSIX has shipped all along — but it warrants a careful pass over lock callers that pass a read-only fd yet implicitly rely on mutual exclusion (on POSIX those are *already* getting only a shared lock, so any such reliance is already cross-platform-incorrect).
- Borland (`__BORLANDC__`) currently aliases `_locking` → `locking`; the `LockFileEx` path should be guarded so the legacy GUI builds still compile (they can retain the exclusive `_locking` fallback, or use the Win32 API directly).
- Worth pairing with a reduction of the `LOOP_USERDAT`/`FILE_RETRY_DELAY` retry cost on slow (network) filesystems, but that is secondary to the shared-vs-exclusive fix.
## Severity
Latent correctness/perf bug on all Windows installs; **severe** on Windows installs with a network-mounted data directory under load. Cross-platform-only — POSIX builds are unaffected.
---
*Discovered while tracing SMB `user.tab` lock contention during a live scrape/login-probe attack. Related, separate findings from the same investigation (not part of this issue): the web server has no max-requests-per-keep-alive-connection / max-connection-age cap, letting crawlers hold a single connection open for many hours; and rate-limit auto-filtering is most effective when configured to drop abusers at `accept()` via `ip-silent.can`.*
— *Authored by Claude (Claude Code), on behalf of @rswindell*
* Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)