• INN 2.7.1 death spiral under high load?

    From Jesse Rehmer@21:1/5 to All on Fri Jun 30 00:29:09 2023
    I've been working on transitioning my tradspool setup to a new server with
    CNFS buffers. Some of my previous threads I asked about ways the increase concurrency to respool all the articles from server to server.

    When I used innfeed no matter the connection count or dynamic-method chosen,
    it takes 35 days to transmit 256 million articles to the new server. I decided to start experimenting with multiple innxmit processes instead. When I split
    my batch file into 20 parts and start the innxmit processes, somewhere between 10-15 innxmit processes (so the same number of connection to the remote
    server) the remote server starts using 100% CPU but stops accepting articles and prints the following in the logs:

    Jun 29 19:13:43 gatekeeper innd[90655]: free:-1 35 free but was in SMASK
    Jun 29 19:13:43 gatekeeper innd[90655]: free:-1 36 free but was in SMASK
    Jun 29 19:13:43 gatekeeper innd[90655]: free:-1 39 free but was in SMASK
    Jun 29 19:13:43 gatekeeper innd[90655]: SERVER cant select: Bad file
    descriptor

    I have to forcefully kill innd, it will not respond to any commands, etc. I'm wondering exactly what these errors mean. This server is running FreeBSD 13.2, has 32GB of physical memory, 64GB of swap (not using any) and INN is the only thing running.

    With 10 innxmit processes on the remote server running I'm pushing ~18Mbps of traffic and according the stats am accepting ~400,000 articles per 10 minutes, but I have quite a bit of spare CPU and I/O cycles on both ends and would like to go faster, if I can.

    Regards,

    Jesse

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jesse Rehmer@21:1/5 to [email protected] on Fri Jun 30 03:57:28 2023
    On Jun 29, 2023 at 7:29:09 PM CDT, "Jesse Rehmer" <[email protected]> wrote:

    I've been working on transitioning my tradspool setup to a new server with CNFS buffers. Some of my previous threads I asked about ways the increase concurrency to respool all the articles from server to server.

    When I used innfeed no matter the connection count or dynamic-method chosen, it takes 35 days to transmit 256 million articles to the new server. I decided
    to start experimenting with multiple innxmit processes instead. When I split my batch file into 20 parts and start the innxmit processes, somewhere between
    10-15 innxmit processes (so the same number of connection to the remote server) the remote server starts using 100% CPU but stops accepting articles and prints the following in the logs:

    Jun 29 19:13:43 gatekeeper innd[90655]: free:-1 35 free but was in SMASK
    Jun 29 19:13:43 gatekeeper innd[90655]: free:-1 36 free but was in SMASK
    Jun 29 19:13:43 gatekeeper innd[90655]: free:-1 39 free but was in SMASK
    Jun 29 19:13:43 gatekeeper innd[90655]: SERVER cant select: Bad file descriptor

    I have to forcefully kill innd, it will not respond to any commands, etc. I'm wondering exactly what these errors mean. This server is running FreeBSD 13.2,
    has 32GB of physical memory, 64GB of swap (not using any) and INN is the only thing running.

    With 10 innxmit processes on the remote server running I'm pushing ~18Mbps of traffic and according the stats am accepting ~400,000 articles per 10 minutes,
    but I have quite a bit of spare CPU and I/O cycles on both ends and would like
    to go faster, if I can.

    Regards,

    Jesse

    Purposely put it into this state and gathered some more information and the logs eventually gave more details.

    When running truss I see this happening at a high rate and nothing else:

    select(1047,{ 12 14 15 17 },{ 1027 1028 1030 1031 1035 1036 1037 1041 },0x0,{ 198.000000 }) = 8 (0x8)
    sigprocmask(SIG_SETMASK,{ SIGHUP|SIGPIPE|SIGTERM|SIGCHLD|SIGUSR1 },0x0) = 0 (0x0)
    sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0)

    After a few minutes the following is printed to news.notice:

    Jun 29 22:51:18 gatekeeper innd[16269]: news.blueworldhosting.com:647 NCmode "MODE STREAM" received
    Jun 29 22:51:18 gatekeeper innd[16269]: free:-1 20 free but was in SMASK
    Jun 29 22:51:18 gatekeeper innd[16269]: CNFS: lseek failed for 'TEXT' offset 0x7c4d40000: Bad file descriptor
    Jun 29 22:51:18 gatekeeper innd[16269]: SERVER cant store article: lseek
    failed
    Jun 29 22:51:18 gatekeeper innd[16269]: CNFS: lseek failed for 'TEXT' offset 0x7c4d40000: Bad file descriptor
    Jun 29 22:51:18 gatekeeper innd[16269]: SERVER cant store article: lseek
    failed
    Jun 29 22:51:18 gatekeeper innd[16269]: CNFS: lseek failed for 'TEXT' offset 0x7c4d40000: Bad file descriptor
    Jun 29 22:51:18 gatekeeper innd[16269]: SERVER cant store article: lseek
    failed
    Jun 29 22:51:18 gatekeeper innd[16269]: news.blueworldhosting.com connected 20 streaming allowed
    Jun 29 22:51:18 gatekeeper innd[16269]: CNFS: lseek failed for 'TEXT' offset 0x7c4d40000: Illegal seek
    Jun 29 22:51:18 gatekeeper innd[16269]: SERVER cant store article: lseek
    failed
    Jun 29 22:51:18 gatekeeper innd[16269]: CNFS: lseek failed for 'TEXT' offset 0x7c4d40000: Illegal seek
    Jun 29 22:51:18 gatekeeper innd[16269]: SERVER cant store article: lseek
    failed
    Jun 29 22:51:18 gatekeeper innd[16269]: news.blueworldhosting.com:22 22 sleeping without Waker
    Jun 29 22:51:18 gatekeeper innd[16269]: CNFS: CNFSflushallheads: flushing TEXT Jun 29 22:51:18 gatekeeper innd[16269]: CNFS: CNFSflushallheads: flushing BINARY
    Jun 29 22:51:18 gatekeeper innd[16269]: SERVER throttled Illegal seek writing SMstore file -- throttling
    Jun 29 22:51:18 gatekeeper innd[16269]: SERVER throttle Illegal seek writing SMstore file -- throttling
    Jun 29 22:51:18 gatekeeper innd[16269]: SERVER cant store article: lseek
    failed

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Syber Shock@21:1/5 to Jesse Rehmer on Thu Jun 29 22:51:36 2023
    On Fri, 30 Jun 2023 00:29:09 -0000 (UTC)
    Jesse Rehmer <[email protected]> wrote:

    I've been working on transitioning my tradspool setup to a new server
    with CNFS buffers. Some of my previous threads I asked about ways the increase concurrency to respool all the articles from server to
    server.

    <snip>

    With 10 innxmit processes on the remote server running I'm pushing
    ~18Mbps of traffic and according the stats am accepting ~400,000
    articles per 10 minutes, but I have quite a bit of spare CPU and I/O
    cycles on both ends and would like to go faster, if I can.

    Regards,

    Jesse

    My immediate thought is to ask, can the tradspool directories be
    pulled from a local tradspool mirror to CNFS storage on the same
    SSD/HDD? If yes, would it be faster to script some batch jobs of pushing
    the tradspool to tarballs and/or rsync and send the data over the
    network to the target machine?

    My thought is to tarball or rsync blocks of folders to the target
    machine in compressed tradspool blobs. Then deflate and pipe from local
    to the CNFS import jobs on a local proxy or unix socket, if that is
    possible.

    --
    SugarBug | https://sybershock.com

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Thomas Hochstein@21:1/5 to Jesse Rehmer on Fri Jun 30 08:10:36 2023
    Jesse Rehmer schrieb:

    Jun 29 19:13:43 gatekeeper innd[90655]: free:-1 35 free but was in SMASK
    Jun 29 19:13:43 gatekeeper innd[90655]: free:-1 36 free but was in SMASK
    Jun 29 19:13:43 gatekeeper innd[90655]: free:-1 39 free but was in SMASK
    Jun 29 19:13:43 gatekeeper innd[90655]: SERVER cant select: Bad file descriptor

    Perhaps you're running into resource limits?

    | news@weidegrund /etc/news $ ulimit -a
    [...]
    | open files (-n) 1024
    [...]

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Russ Allbery@21:1/5 to Thomas Hochstein on Thu Jun 29 23:19:19 2023
    Thomas Hochstein <[email protected]> writes:
    Jesse Rehmer schrieb:

    Jun 29 19:13:43 gatekeeper innd[90655]: free:-1 35 free but was in SMASK
    Jun 29 19:13:43 gatekeeper innd[90655]: free:-1 36 free but was in SMASK
    Jun 29 19:13:43 gatekeeper innd[90655]: free:-1 39 free but was in SMASK
    Jun 29 19:13:43 gatekeeper innd[90655]: SERVER cant select: Bad file descriptor

    Perhaps you're running into resource limits?

    | news@weidegrund /etc/news $ ulimit -a
    [...]
    | open files (-n) 1024
    [...]

    I was wondering about that too, but the SMASK error is very specific to a
    file descriptor being included in the select mask despite not
    corresponding to an open channel. That implies something has gone
    seriously wrong with the file descriptor tracking. innd then closes that
    file descriptor, which I bet is what's causing the CNFS errors since I bet
    it's closing the cycbuff file descriptor thinking it's a remote peer.

    Why this is happening, I don't know. The number of open connections
    described is not large and is way, way lower than I have run servers with
    in the past. You can see that the file descriptor numbers (35, 36, 39)
    are also quite low.

    --
    Russ Allbery ([email protected]) <https://www.eyrie.org/~eagle/>

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From =?UTF-8?Q?Julien_=c3=89LIE?=@21:1/5 to All on Fri Jun 30 13:31:19 2023
    Hi Russ,

    Jun 29 19:13:43 gatekeeper innd[90655]: free:-1 35 free but was in SMASK >>> Jun 29 19:13:43 gatekeeper innd[90655]: free:-1 36 free but was in SMASK >>> Jun 29 19:13:43 gatekeeper innd[90655]: free:-1 39 free but was in SMASK >>> Jun 29 19:13:43 gatekeeper innd[90655]: SERVER cant select: Bad file descriptor

    I was wondering about that too, but the SMASK error is very specific to a file descriptor being included in the select mask despite not
    corresponding to an open channel. That implies something has gone
    seriously wrong with the file descriptor tracking. innd then closes that file descriptor, which I bet is what's causing the CNFS errors since I bet it's closing the cycbuff file descriptor thinking it's a remote peer.

    It seems like a channel can both be waiting to write and sleeping. SITEchanclose() sets them both. (It may not be the right thing as
    normally only one state should be given at a time as far as I understand
    how states are supposed to work.)

    But anyway, it does not explain why we end up with a channel of type
    CTfree considered to be sleeping (which is the condition causing the
    "free:-1 35 free but was in SMASK" error).
    As far as I see, the sleeping state is removed when a channel gets the
    CTfree type.


    ** Jesse, do you happen to have any "internal closing free channel"
    error in your logs? In this case, the sleeping state is not explicitly enforced.

    --
    Julien ÉLIE

    « Whenever you set out to do something, something else must be done
    first. » (Murphy's Fourth Corollary)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jesse Rehmer@21:1/5 to [email protected] on Fri Jun 30 12:36:07 2023
    On Jun 30, 2023 at 6:31:19 AM CDT, "Julien ÉLIE" <[email protected]d> wrote:

    Hi Russ,

    Jun 29 19:13:43 gatekeeper innd[90655]: free:-1 35 free but was in SMASK >>>> Jun 29 19:13:43 gatekeeper innd[90655]: free:-1 36 free but was in SMASK >>>> Jun 29 19:13:43 gatekeeper innd[90655]: free:-1 39 free but was in SMASK >>>> Jun 29 19:13:43 gatekeeper innd[90655]: SERVER cant select: Bad file descriptor

    I was wondering about that too, but the SMASK error is very specific to a
    file descriptor being included in the select mask despite not
    corresponding to an open channel. That implies something has gone
    seriously wrong with the file descriptor tracking. innd then closes that
    file descriptor, which I bet is what's causing the CNFS errors since I bet >> it's closing the cycbuff file descriptor thinking it's a remote peer.

    It seems like a channel can both be waiting to write and sleeping. SITEchanclose() sets them both. (It may not be the right thing as
    normally only one state should be given at a time as far as I understand
    how states are supposed to work.)

    But anyway, it does not explain why we end up with a channel of type
    CTfree considered to be sleeping (which is the condition causing the
    "free:-1 35 free but was in SMASK" error).
    As far as I see, the sleeping state is removed when a channel gets the
    CTfree type.


    ** Jesse, do you happen to have any "internal closing free channel"
    error in your logs? In this case, the sleeping state is not explicitly enforced.

    I enabled the trace logs you mentioned. I don't have the specific error you mentioned above, but on this latest attempt and turning on debug logging in syslog, I have more tidbits in news.notice:

    *****
    Jun 30 07:24:18 gatekeeper innd[84944]: free:-1 29 free but was in SMASK
    Jun 30 07:24:18 gatekeeper innd[84944]: free:-1 30 free but was in SMASK
    Jun 30 07:24:18 gatekeeper innd[84944]: free:-1 31 free but was in SMASK
    Jun 30 07:24:18 gatekeeper innd[84944]: free:-1 32 free but was in SMASK
    Jun 30 07:24:18 gatekeeper innd[84944]: tradindexed: cannot get offset for article 168 in /usr/local/news/spool/overview/a/d/u/n/alt.dating.uk.north-west.DAT: Illegal seek
    Jun 30 07:24:18 gatekeeper innd[84944]: SERVER cant store overview for @03015445585400000000012F846A00000001@
    Jun 30 07:24:21 gatekeeper innd[84944]: tradindexed: cannot get offset for article 169 in /usr/local/news/spool/overview/a/d/u/n/alt.dating.uk.north-west.DAT: Illegal seek
    Jun 30 07:24:21 gatekeeper innd[84944]: SERVER cant store overview for @03015445585400000000012F885000000001@
    Jun 30 07:24:22 gatekeeper innd[84944]: tradindexed: cannot get offset for article 2902 in /usr/local/news/spool/overview/m/r/misc.rural.DAT: Illegal
    seek
    Jun 30 07:24:22 gatekeeper innd[84944]: SERVER cant store overview for @03015445585400000000012F896600000001@
    Jun 30 07:24:22 gatekeeper innd[84944]: tradindexed: cannot get offset for article 2903 in /usr/local/news/spool/overview/m/r/misc.rural.DAT: Illegal
    seek
    Jun 30 07:24:22 gatekeeper innd[84944]: SERVER cant store overview for @03015445585400000000012F898C00000001@
    Jun 30 07:24:23 gatekeeper innd[84944]: tradindexed: cannot get offset for article 2904 in /usr/local/news/spool/overview/m/r/misc.rural.DAT: Illegal
    seek
    Jun 30 07:24:23 gatekeeper innd[84944]: SERVER cant store overview for @03015445585400000000012F8AB300000001@
    Jun 30 07:24:23 gatekeeper innd[84944]: tradindexed: cannot get offset for article 2905 in /usr/local/news/spool/overview/m/r/misc.rural.DAT: Illegal
    seek
    Jun 30 07:24:23 gatekeeper innd[84944]: SERVER cant store overview for @03015445585400000000012F8ADB00000001@
    Jun 30 07:24:23 gatekeeper innd[84944]: tradindexed: cannot get offset for article 2906 in /usr/local/news/spool/overview/m/r/misc.rural.DAT: Illegal
    seek
    Jun 30 07:24:23 gatekeeper innd[84944]: SERVER cant store overview for @03015445585400000000012F8AED00000001@
    Jun 30 07:24:23 gatekeeper innd[84944]: tradindexed: cannot get offset for article 2907 in /usr/local/news/spool/overview/m/r/misc.rural.DAT: Illegal
    seek
    Jun 30 07:24:23 gatekeeper innd[84944]: SERVER cant store overview for @03015445585400000000012F8B0100000001@
    Jun 30 07:24:29 gatekeeper innd[84944]: tradindexed: cannot get offset for article 170 in /usr/local/news/spool/overview/a/d/u/n/alt.dating.uk.north-west.DAT: Illegal seek
    Jun 30 07:24:29 gatekeeper innd[84944]: SERVER cant store overview for @03015445585400000000012F91DC00000001@
    Jun 30 07:24:39 gatekeeper innd[84944]: tradindexed: cannot get offset for article 2908 in /usr/local/news/spool/overview/m/r/misc.rural.DAT: Illegal
    seek
    Jun 30 07:24:39 gatekeeper innd[84944]: SERVER cant store overview for @03015445585400000000012F9C8900000001@
    Jun 30 07:24:58 gatekeeper innd[84944]: news.blueworldhosting.com:878 checkpoint seconds 52 accepted 2000 refused 0 rejected 0 duplicate 0 accepted size 5263200 duplicate size 0 rejected size 0
    Jun 30 07:25:13 gatekeeper innd[84944]: news.blueworldhosting.com:22
    checkpoint seconds 98 accepted 1998 refused 0 rejected 2 duplicate 0 accepted size 10925664 duplicate size 0 rejected size 237059
    Jun 30 07:25:18 gatekeeper innd[84944]: news.blueworldhosting.com:821 checkpoint seconds 65 accepted 2000 refused 0 rejected 0 duplicate 0 accepted size 5087052 duplicate size 0 rejected size 0
    Jun 30 07:25:20 gatekeeper innd[84944]: SERVER cant select: Bad file
    descriptor
    Jun 30 07:25:50 gatekeeper syslogd: last message repeated 561451 times
    *****

    The news.debug file has a ton of entries so I'm not sure what I'm looking for, but it 'ends' like this:

    *****
    Jun 30 07:25:20 gatekeeper innd[84944]: news.blueworldhosting.com:226 NCreader Used=2896
    Jun 30 07:25:20 gatekeeper innd[84944]: news.blueworldhosting.com:226 NCproc Used=3509
    Jun 30 07:25:20 gatekeeper innd[84944]: news.blueworldhosting.com:226 cp->Start=0 cp->Next=2896 bp->Used=3509
    Jun 30 07:25:20 gatekeeper innd[84944]: news.blueworldhosting.com:226 NCproc state=8 next "y 'Quickie' Sto"
    Jun 30 07:25:20 gatekeeper innd[84944]: news.blueworldhosting.com:226 NCwritereply 18=write(226, "239 <58663374@1", 18)Jun 30 07:25:20 gatekeeper innd[84944]: news.blueworldhosting.com:226 >239 <58663374@1>
    Jun 30 07:25:20 gatekeeper innd[84944]: news.blueworldhosting.com:226 NCproc state=2 Start=3509 Next=3509 Used=3509
    Jun 30 07:25:20 gatekeeper innd[84944]: news.blueworldhosting.com:226 cp->Start=3509 cp->Next=3509 bp->Used=3509
    Jun 30 07:25:20 gatekeeper innd[84944]: news.blueworldhosting.com:226 NCproc state=2 next "as an^M >> "inva"
    Jun 30 07:25:20 gatekeeper innd[84944]: news.blueworldhosting.com:226 NCproc state=2 Start=3509 Next=3509 Used=3509
    Jun 30 07:25:20 gatekeeper innd[84944]: news.blueworldhosting.com:821 NCreader Used=2873
    Jun 30 07:25:20 gatekeeper innd[84944]: news.blueworldhosting.com:821 NCproc Used=2890
    Jun 30 07:25:20 gatekeeper innd[84944]: news.blueworldhosting.com:821 cp->Start=0 cp->Next=2873 bp->Used=2890
    Jun 30 07:25:20 gatekeeper innd[84944]: news.blueworldhosting.com:821 NCproc state=8 next "sympatico.ca^M ."
    Jun 30 07:25:20 gatekeeper innd[84944]: news.blueworldhosting.com:821 NCwritereply 18=write(821, "239 <76204197@1", 18)Jun 30 07:25:20 gatekeeper innd[84944]: news.blueworldhosting.com:821 >239 <76204197@1>
    Jun 30 07:25:20 gatekeeper innd[84944]: news.blueworldhosting.com:821 NCproc state=2 Start=2890 Next=2890 Used=2890
    Jun 30 07:25:20 gatekeeper innd[84944]: news.blueworldhosting.com:821 cp->Start=2890 cp->Next=2890 bp->Used=2890
    Jun 30 07:25:20 gatekeeper innd[84944]: news.blueworldhosting.com:821 NCproc state=2 next "f: Technique, c"
    Jun 30 07:25:20 gatekeeper innd[84944]: news.blueworldhosting.com:821 NCproc state=2 Start=2890 Next=2890 Used=2890
    Jun 30 07:25:20 gatekeeper innd[84944]: SERVER cant select: Bad file
    descriptor
    Jun 30 07:25:50 gatekeeper syslogd: last message repeated 561451 times
    Jun 30 07:27:51 gatekeeper syslogd: last message repeated 2241177 times
    Jun 30 07:29:17 gatekeeper syslogd: last message repeated 1581962 times
    Jun 30 07:29:17 gatekeeper innd[84944]: ME time 600000 hishave 5402(83753) hiswrite 11495(83560) hissync 11(2) idle 284870(4542195) artclean 499(83752) artwrite 4690(83554) artcncl 77(240) hisgrep/artcncl 35(240) overv
    24253(83554) perl 49799(83560) python 26(83561) nntpread 1307(175291) artparse 2053(239726) artlog 947(83786) datamove 253(145273)
    Jun 30 07:29:17 gatekeeper innd[84944]: ME HISstats 77 hitpos 0 hitneg 115 missed 83561 dne
    Jun 30 07:29:17 gatekeeper innd[84944]: SERVER cant select: Bad file
    descriptor
    Jun 30 07:29:47 gatekeeper syslogd: last message repeated 574345 times
    *****

    The innd process is using 100% of CPU and truss shows the following over and over (I'm going to let it run for a while and see if it eventually dies):

    *****
    __sysctl("kern.hostname",2,0x2bfb054c64d0,0x2bfb054c23b8,0x0,0) = 0 (0x0) getpid() = 84944 (0x14bd0)
    sendto(3,"<59>1 2023-06-30T07:31:55.121776"...,125,0,NULL,0) = 125 (0x7d) sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0)
    select(1057,{ 3 12 14 15 17 22 24 29 30 31 32 226 736 737 821 878 },{ },0x0,{ 442.000000 }) ERR#9 'Bad file descriptor'
    sigprocmask(SIG_SETMASK,{ SIGHUP|SIGPIPE|SIGTERM|SIGCHLD|SIGUSR1 },0x0) = 0 (0x0)
    write(2,"innd: SERVER cant select: Bad fi"...,46) = 46 (0x2e)
    *****

    *****
    [news@gatekeeper ~]$ ulimit -a
    number of pseudoterminals (-P) unlimited
    socket buffer size (bytes, -b) unlimited
    core file size (blocks, -c) unlimited
    data seg size (kbytes, -d) 33554432
    file size (blocks, -f) unlimited
    max kqueues (-k) unlimited
    max locked memory (kbytes, -l) 64
    max memory size (kbytes, -m) unlimited
    open files (-n) 942498
    pipe size (512 bytes, -p) 1
    stack size (kbytes, -s) 524288
    cpu time (seconds, -t) unlimited
    max user processes (-u) 34304
    virtual memory (kbytes, -v) unlimited
    swap size (kbytes, -w) unlimited
    *****

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From =?UTF-8?Q?Julien_=c3=89LIE?=@21:1/5 to All on Fri Jun 30 14:58:15 2023
    Hi Jesse,

    Jun 30 07:24:18 gatekeeper innd[84944]: free:-1 29 free but was in SMASK
    Jun 30 07:24:18 gatekeeper innd[84944]: free:-1 30 free but was in SMASK
    Jun 30 07:24:18 gatekeeper innd[84944]: free:-1 31 free but was in SMASK
    Jun 30 07:24:18 gatekeeper innd[84944]: free:-1 32 free but was in SMASK
    [...]
    Jun 30 07:25:20 gatekeeper innd[84944]: news.blueworldhosting.com:226 NCreader
    Used=2896
    Jun 30 07:25:20 gatekeeper innd[84944]: news.blueworldhosting.com:226 NCwritereply 18=write(226, "239 <58663374@1", 18)Jun 30 07:25:20 gatekeeper innd[84944]: news.blueworldhosting.com:226 >239 <58663374@1>

    It would be interesting to know what happened to channels 29 to 32.
    Do they appear in the logs as channels?
    Do you see patterns like "write(29, " or ":29 "?

    --
    Julien ÉLIE

    « Animi limina. »

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jesse Rehmer@21:1/5 to All on Fri Jun 30 13:59:28 2023
    It seems I've hit a 'stop' of some sort, or maybe it's just the throttle behavior but nothing has changed for several minutes.

    news.notice:

    Jun 30 08:44:12 gatekeeper innd[60826]: SERVER throttled Bad file descriptor writing creating overview file -- throttling
    Jun 30 08:44:12 gatekeeper innd[60826]: SERVER throttle Bad file descriptor writing creating overview file -- throttling
    Jun 30 08:44:12 gatekeeper innd[60826]: SERVER cant store overview for @030154455854000000000138F94B00000001@
    Jun 30 08:44:12 gatekeeper innd[60826]: SERVER cant write history <[email protected]> Bad file descriptor
    ****SNIPPED checkpoint / closed messages****
    Jun 30 08:54:02 gatekeeper innd[60826]: ME status seconds 0 accepted 0 refused 0 rejected 0 duplicate 0 accepted size 1 duplicate size 0 rejected size 0
    Jun 30 08:54:02 gatekeeper innd[60826]: news.blueworldhosting.com:1028
    inactive 40
    Jun 30 08:54:02 gatekeeper innd[60826]: news.blueworldhosting.com:1045
    inactive 40
    Jun 30 08:54:02 gatekeeper innd[60826]: ME time 600939 hishave 192(3318) hiswrite 471(3317) hissync 0(0) idle 594219(5121) artclean 23(3318) artwrite 498(3318) artcncl 0(1) hisgrep/artcncl 0(1) overv 2343(3318) hissync/overv 13(1) perl 1619(3318) python 81(3318) nntpread 58(7125) artparse 75(8613) artlog 35(3337) datamove 12(7346)
    Jun 30 08:54:02 gatekeeper innd[60826]: ME HISstats 0 hitpos 0 hitneg 0 missed 0 dne

    news.debug:

    Jun 30 08:44:12 gatekeeper innd[60826]: news.blueworldhosting.com:92 NCproc Used=3897
    Jun 30 08:44:12 gatekeeper innd[60826]: news.blueworldhosting.com:92 cp->Start=0 cp->Next=2896 bp->Used=3897
    Jun 30 08:44:12 gatekeeper innd[60826]: news.blueworldhosting.com:92 NCproc state=8 next "ivist ^M Richard"
    Jun 30 08:44:12 gatekeeper innd[60826]: news.blueworldhosting.com:92
    checkpoint seconds 203 accepted 1842 refused 0 rejected 2 duplicate 0 accepted size 5794716 duplicate size 0 rejected size 181352
    Jun 30 08:44:12 gatekeeper innd[60826]: news.blueworldhosting.com:92 closed seconds 1851 accepted 17826 refused 0 rejected 18 duplicate 16 accepted size 52758348 duplicate size 39793 rejected size 221145
    Jun 30 08:44:12 gatekeeper innd[60826]: news.blueworldhosting.com:882 NCreader Used=1448
    Jun 30 08:44:12 gatekeeper innd[60826]: news.blueworldhosting.com:882 NCproc Used=2310
    Jun 30 08:44:12 gatekeeper innd[60826]: news.blueworldhosting.com:882 cp->Start=0 cp->Next=1448 bp->Used=2310
    Jun 30 08:44:12 gatekeeper innd[60826]: news.blueworldhosting.com:882 NCproc state=8 next "-42e3-8733-69a8"
    Jun 30 08:44:12 gatekeeper innd[60826]: news.blueworldhosting.com:882 checkpoint seconds 115 accepted 881 refused 0 rejected 1 duplicate 0 accepted size 2532336 duplicate size 0 rejected size 2310
    Jun 30 08:44:12 gatekeeper innd[60826]: news.blueworldhosting.com:882 closed seconds 1877 accepted 14862 refused 0 rejected 20 duplicate 16 accepted size 77809304 duplicate size 294673 rejected size 967053
    Jun 30 08:44:12 gatekeeper innd[60826]: news.blueworldhosting.com:649 NCreader Used=2896
    Jun 30 08:44:12 gatekeeper innd[60826]: news.blueworldhosting.com:649 NCproc Used=4323
    Jun 30 08:44:12 gatekeeper innd[60826]: news.blueworldhosting.com:649 cp->Start=0 cp->Next=2896 bp->Used=4323
    Jun 30 08:44:12 gatekeeper innd[60826]: news.blueworldhosting.com:649 NCproc state=8 next "surface, GFS ha"
    Jun 30 08:44:12 gatekeeper innd[60826]: news.blueworldhosting.com:649 checkpoint seconds 141 accepted 1905 refused 0 rejected 1 duplicate 0 accepted size 3706440 duplicate size 0 rejected size 4323
    Jun 30 08:44:12 gatekeeper innd[60826]: news.blueworldhosting.com:649 closed seconds 956 accepted 11887 refused 0 rejected 19 duplicate 16 accepted size 23317392 duplicate size 27153 rejected size 38795
    Jun 30 08:54:02 gatekeeper innd[60826]: ME status seconds 0 accepted 0 refused 0 rejected 0 duplicate 0 accepted size 1 duplicate size 0 rejected size 0
    Jun 30 08:54:02 gatekeeper innd[60826]: news.blueworldhosting.com:1028
    inactive 40
    Jun 30 08:54:02 gatekeeper innd[60826]: news.blueworldhosting.com:1045
    inactive 40
    Jun 30 08:54:02 gatekeeper innd[60826]: ME time 600939 hishave 192(3318) hiswrite 471(3317) hissync 0(0) idle 594219(5121) artclean 23(3318) artwrite 498(3318) artcncl 0(1) hisgrep/artcncl 0(1) overv 2343(3318) hissync/overv 13(1) perl 1619(3318) python 81(3318) nntpread 58(7125) artparse 75(8613) artlog 35(3337) datamove 12(7346)
    Jun 30 08:54:02 gatekeeper innd[60826]: ME HISstats 0 hitpos 0 hitneg 0 missed 0 dne

    Truss output just stops with this:

    root@gatekeeper:/var/log/news # truss -p 60826
    sigprocmask(SIG_SETMASK,{ SIGHUP|SIGPIPE|SIGTERM|SIGCHLD|SIGUSR1 },0x0) = 0 (0x0)
    sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jesse Rehmer@21:1/5 to [email protected] on Fri Jun 30 13:26:03 2023
    On Jun 30, 2023 at 7:58:15 AM CDT, "Julien ÉLIE" <[email protected]d> wrote:

    Hi Jesse,

    Jun 30 07:24:18 gatekeeper innd[84944]: free:-1 29 free but was in SMASK
    Jun 30 07:24:18 gatekeeper innd[84944]: free:-1 30 free but was in SMASK
    Jun 30 07:24:18 gatekeeper innd[84944]: free:-1 31 free but was in SMASK
    Jun 30 07:24:18 gatekeeper innd[84944]: free:-1 32 free but was in SMASK
    [...]
    Jun 30 07:25:20 gatekeeper innd[84944]: news.blueworldhosting.com:226 NCreader
    Used=2896
    Jun 30 07:25:20 gatekeeper innd[84944]: news.blueworldhosting.com:226
    NCwritereply 18=write(226, "239 <58663374@1", 18)Jun 30 07:25:20
    gatekeeperinnd[84944]: news.blueworldhosting.com:226 >239 <58663374@1>

    It would be interesting to know what happened to channels 29 to 32.
    Do they appear in the logs as channels?
    Do you see patterns like "write(29, " or ":29 "?

    In news.debug I have this which seems to be the last messages logged by the channels?

    Jun 30 07:25:20 gatekeeper innd[84944]: news.blueworldhosting.com:29 NCwritereply 19=write(29, "239 <168029265@", 19)Jun 30 07:25:20 gatekeeper innd[84944]: news.blueworldhosting.com:29 >239 <168029265@1>
    Jun 30 07:25:20 gatekeeper innd[84944]: news.blueworldhosting.com:29 NCproc state=2 Start=1372 Next=1372 Used=1372
    Jun 30 07:25:20 gatekeeper innd[84944]: news.blueworldhosting.com:29 cp->Start=1372 cp->Next=1372 bp->Used=1372
    Jun 30 07:25:20 gatekeeper innd[84944]: news.blueworldhosting.com:29 NCproc state=2 next "@mid.individual"
    Jun 30 07:25:20 gatekeeper innd[84944]: news.blueworldhosting.com:29 NCproc state=2 Start=1372 Next=1372 Used=1372

    Jun 30 07:25:20 gatekeeper innd[84944]: news.blueworldhosting.com:30 NCproc state=2 next "TAKETHIS <14960"Jun 30 07:25:20 gatekeeper innd[84944]: news.blueworldhosting.com:30 < TAKETHIS <149602012@1>
    Jun 30 07:25:20 gatekeeper innd[84944]: news.blueworldhosting.com:30 NCproc state=7 Start=1550 Next=1550 Used=2896
    Jun 30 07:25:20 gatekeeper innd[84944]: news.blueworldhosting.com:30 cp->Start=1550 cp->Next=1550 bp->Used=2896
    Jun 30 07:25:20 gatekeeper innd[84944]: news.blueworldhosting.com:30 NCproc state=7 next "Path: spool1.us"
    Jun 30 07:25:20 gatekeeper innd[84944]: news.blueworldhosting.com:30 NCproc state=8 Start=1550 Next=2896 Used=2896

    There is so much output it is difficult for me to pick out specifics. If it would help I can upload a core dump I got and logs to a web server where they can be grabbed?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jesse Rehmer@21:1/5 to All on Fri Jun 30 13:39:32 2023
    On this latest run innd hasn't gone completely into the state I initially saw, but I do have quite a bit of bad file descriptor messages from tradindexed:

    Jun 30 08:13:11 gatekeeper innd[60826]: free:-1 4 free but was in SMASK
    Jun 30 08:13:22 gatekeeper innd[60826]: free:-1 21 free but was in SMASK
    Jun 30 08:14:01 gatekeeper innd[60826]: tradindexed: cannot write index record for 1202 in /usr/local/news/spool/overview/d/a/d/de.alt.dummschwatz.IDX: Bad file descriptor
    Jun 30 08:14:01 gatekeeper innd[60826]: SERVER cant store overview for @030154455854000000000131775A00000001@
    Jun 30 08:14:03 gatekeeper innd[60826]: tradindexed: cannot write index record for 1203 in /usr/local/news/spool/overview/d/a/d/de.alt.dummschwatz.IDX: Bad file descriptor
    Jun 30 08:14:03 gatekeeper innd[60826]: SERVER cant store overview for @030154455854000000000131780200000001@
    Jun 30 08:14:05 gatekeeper innd[60826]: tradindexed: cannot write index record for 1204 in /usr/local/news/spool/overview/d/a/d/de.alt.dummschwatz.IDX: Bad file descriptor
    Jun 30 08:14:05 gatekeeper innd[60826]: SERVER cant store overview for @0301544558540000000001317C9B00000001@
    Jun 30 08:24:10 gatekeeper innd[60826]: tradindexed: cannot write index record for 15855 in
    /usr/local/news/spool/overview/a/p/l/alt.politics.libertarian.IDX: Bad file descriptor
    Jun 30 08:24:10 gatekeeper innd[60826]: SERVER cant store overview for @030154455854000000000133D26300000001@
    Jun 30 08:24:23 gatekeeper innd[60826]: tradindexed: cannot write index record for 15856 in
    /usr/local/news/spool/overview/a/p/l/alt.politics.libertarian.IDX: Bad file descriptor
    Jun 30 08:24:23 gatekeeper innd[60826]: SERVER cant store overview for @030154455854000000000133DF7700000001@
    Jun 30 08:24:47 gatekeeper innd[60826]: tradindexed: cannot write index record for 15857 in
    /usr/local/news/spool/overview/a/p/l/alt.politics.libertarian.IDX: Bad file descriptor
    Jun 30 08:24:47 gatekeeper innd[60826]: SERVER cant store overview for @030154455854000000000133F33D00000001@
    Jun 30 08:34:09 gatekeeper innd[60826]: tradindexed: cannot write index record for 19832 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:09 gatekeeper innd[60826]: SERVER cant store overview for @030154455854000000000136542C00000001@
    Jun 30 08:34:09 gatekeeper innd[60826]: tradindexed: cannot write index record for 19833 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:09 gatekeeper innd[60826]: SERVER cant store overview for @030154455854000000000136545A00000001@
    Jun 30 08:34:10 gatekeeper innd[60826]: tradindexed: cannot write index record for 19834 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:10 gatekeeper innd[60826]: SERVER cant store overview for @030154455854000000000136548900000001@
    Jun 30 08:34:10 gatekeeper innd[60826]: tradindexed: cannot write index record for 19835 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:10 gatekeeper innd[60826]: SERVER cant store overview for @03015445585400000000013654A800000001@
    Jun 30 08:34:10 gatekeeper innd[60826]: tradindexed: cannot write index record for 19836 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:10 gatekeeper innd[60826]: SERVER cant store overview for @03015445585400000000013654B400000001@
    Jun 30 08:34:10 gatekeeper innd[60826]: tradindexed: cannot write index record for 19837 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:10 gatekeeper innd[60826]: SERVER cant store overview for @03015445585400000000013654C900000001@
    Jun 30 08:34:10 gatekeeper innd[60826]: tradindexed: cannot write index record for 19838 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:10 gatekeeper innd[60826]: SERVER cant store overview for @03015445585400000000013654E800000001@
    Jun 30 08:34:10 gatekeeper innd[60826]: tradindexed: cannot write index record for 19839 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:10 gatekeeper innd[60826]: SERVER cant store overview for @03015445585400000000013654F400000001@
    Jun 30 08:34:10 gatekeeper innd[60826]: tradindexed: cannot write index record for 19840 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:10 gatekeeper innd[60826]: SERVER cant store overview for @030154455854000000000136550400000001@
    Jun 30 08:34:10 gatekeeper innd[60826]: tradindexed: cannot write index record for 19841 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:10 gatekeeper innd[60826]: SERVER cant store overview for @030154455854000000000136550600000001@
    Jun 30 08:34:10 gatekeeper innd[60826]: tradindexed: cannot write index record for 19842 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:10 gatekeeper innd[60826]: SERVER cant store overview for @030154455854000000000136559900000001@
    Jun 30 08:34:10 gatekeeper innd[60826]: tradindexed: cannot write index record for 19843 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:10 gatekeeper innd[60826]: SERVER cant store overview for @03015445585400000000013655BF00000001@
    Jun 30 08:34:10 gatekeeper innd[60826]: tradindexed: cannot write index record for 19844 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:10 gatekeeper innd[60826]: SERVER cant store overview for @03015445585400000000013655C500000001@
    Jun 30 08:34:10 gatekeeper innd[60826]: tradindexed: cannot write index record for 19845 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:10 gatekeeper innd[60826]: SERVER cant store overview for @03015445585400000000013655D900000001@
    Jun 30 08:34:10 gatekeeper innd[60826]: tradindexed: cannot write index record for 19846 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:10 gatekeeper innd[60826]: SERVER cant store overview for @03015445585400000000013655FE00000001@
    Jun 30 08:34:10 gatekeeper innd[60826]: tradindexed: cannot write index record for 19847 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:10 gatekeeper innd[60826]: SERVER cant store overview for @030154455854000000000136560300000001@
    Jun 30 08:34:11 gatekeeper innd[60826]: tradindexed: cannot write index record for 19848 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:11 gatekeeper innd[60826]: SERVER cant store overview for @030154455854000000000136560900000001@
    Jun 30 08:34:11 gatekeeper innd[60826]: tradindexed: cannot write index record for 19849 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:11 gatekeeper innd[60826]: SERVER cant store overview for @030154455854000000000136561400000001@
    Jun 30 08:34:11 gatekeeper innd[60826]: tradindexed: cannot write index record for 19850 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:11 gatekeeper innd[60826]: SERVER cant store overview for @030154455854000000000136561C00000001@
    Jun 30 08:34:11 gatekeeper innd[60826]: tradindexed: cannot write index record for 19851 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:11 gatekeeper innd[60826]: SERVER cant store overview for @030154455854000000000136562F00000001@
    Jun 30 08:34:11 gatekeeper innd[60826]: tradindexed: cannot write index record for 19852 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:11 gatekeeper innd[60826]: SERVER cant store overview for @030154455854000000000136563C00000001@
    Jun 30 08:34:11 gatekeeper innd[60826]: tradindexed: cannot write index record for 19853 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:11 gatekeeper innd[60826]: SERVER cant store overview for @030154455854000000000136564B00000001@
    Jun 30 08:34:11 gatekeeper innd[60826]: tradindexed: cannot write index record for 19854 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:11 gatekeeper innd[60826]: SERVER cant store overview for @03015445585400000000013656B000000001@
    Jun 30 08:34:11 gatekeeper innd[60826]: tradindexed: cannot write index record for 19855 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:11 gatekeeper innd[60826]: SERVER cant store overview for @03015445585400000000013656B100000001@
    Jun 30 08:34:11 gatekeeper innd[60826]: tradindexed: cannot write index record for 19856 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:11 gatekeeper innd[60826]: SERVER cant store overview for @03015445585400000000013656CB00000001@
    Jun 30 08:34:12 gatekeeper innd[60826]: tradindexed: cannot write index record for 19857 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:12 gatekeeper innd[60826]: SERVER cant store overview for @030154455854000000000136571900000001@
    Jun 30 08:34:12 gatekeeper innd[60826]: tradindexed: cannot write index record for 19858 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:12 gatekeeper innd[60826]: SERVER cant store overview for @030154455854000000000136574600000001@
    Jun 30 08:34:12 gatekeeper innd[60826]: tradindexed: cannot write index record for 19859 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:12 gatekeeper innd[60826]: SERVER cant store overview for @030154455854000000000136575100000001@
    Jun 30 08:34:13 gatekeeper innd[60826]: tradindexed: cannot write index record for 19860 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:13 gatekeeper innd[60826]: SERVER cant store overview for @03015445585400000000013657F200000001@
    Jun 30 08:34:13 gatekeeper innd[60826]: tradindexed: cannot write index record for 19861 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:13 gatekeeper innd[60826]: SERVER cant store overview for @030154455854000000000136588400000001@
    Jun 30 08:34:14 gatekeeper innd[60826]: tradindexed: cannot write index record for 19862 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:14 gatekeeper innd[60826]: SERVER cant store overview for @030154455854000000000136593300000001@
    Jun 30 08:34:14 gatekeeper innd[60826]: tradindexed: cannot write index record for 19863 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:14 gatekeeper innd[60826]: SERVER cant store overview for @030154455854000000000136593B00000001@
    Jun 30 08:34:14 gatekeeper innd[60826]: tradindexed: cannot write index record for 19864 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:14 gatekeeper innd[60826]: SERVER cant store overview for @030154455854000000000136594B00000001@
    Jun 30 08:34:14 gatekeeper innd[60826]: tradindexed: cannot write index record for 19865 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:14 gatekeeper innd[60826]: SERVER cant store overview for @030154455854000000000136595600000001@
    Jun 30 08:34:14 gatekeeper innd[60826]: tradindexed: cannot write index record for 19866 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:14 gatekeeper innd[60826]: SERVER cant store overview for @030154455854000000000136596700000001@
    Jun 30 08:34:14 gatekeeper innd[60826]: tradindexed: cannot write index record for 19867 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:14 gatekeeper innd[60826]: SERVER cant store overview for @030154455854000000000136597900000001@
    Jun 30 08:34:14 gatekeeper innd[60826]: tradindexed: cannot write index record for 19868 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:14 gatekeeper innd[60826]: SERVER cant store overview for @030154455854000000000136598100000001@
    Jun 30 08:34:14 gatekeeper innd[60826]: tradindexed: cannot write index record for 19869 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:14 gatekeeper innd[60826]: SERVER cant store overview for @030154455854000000000136598B00000001@
    Jun 30 08:34:14 gatekeeper innd[60826]: tradindexed: cannot write index record for 19870 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:14 gatekeeper innd[60826]: SERVER cant store overview for @03015445585400000000013659CB00000001@
    Jun 30 08:34:14 gatekeeper innd[60826]: tradindexed: cannot write index record for 19871 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:14 gatekeeper innd[60826]: SERVER cant store overview for @03015445585400000000013659DA00000001@
    Jun 30 08:34:15 gatekeeper innd[60826]: tradindexed: cannot write index record for 19872 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:15 gatekeeper innd[60826]: SERVER cant store overview for @03015445585400000000013659EB00000001@
    Jun 30 08:34:15 gatekeeper innd[60826]: tradindexed: cannot write index record for 19873 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:15 gatekeeper innd[60826]: SERVER cant store overview for @03015445585400000000013659FE00000001@

    I don't see the OS reporting any filesystem or issues with the block device
    and tdx-util doesn't report any issues.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jesse Rehmer@21:1/5 to [email protected] on Fri Jun 30 13:51:08 2023
    On Jun 30, 2023 at 8:39:32 AM CDT, "Jesse Rehmer" <[email protected]> wrote:

    On this latest run innd hasn't gone completely into the state I initially saw,
    but I do have quite a bit of bad file descriptor messages from tradindexed:

    Jun 30 08:13:11 gatekeeper innd[60826]: free:-1 4 free but was in SMASK
    Jun 30 08:13:22 gatekeeper innd[60826]: free:-1 21 free but was in SMASK
    Jun 30 08:14:01 gatekeeper innd[60826]: tradindexed: cannot write index record
    for 1202 in /usr/local/news/spool/overview/d/a/d/de.alt.dummschwatz.IDX: Bad file descriptor
    Jun 30 08:14:01 gatekeeper innd[60826]: SERVER cant store overview for @030154455854000000000131775A00000001@
    Jun 30 08:14:03 gatekeeper innd[60826]: tradindexed: cannot write index record
    for 1203 in /usr/local/news/spool/overview/d/a/d/de.alt.dummschwatz.IDX: Bad file descriptor
    Jun 30 08:14:03 gatekeeper innd[60826]: SERVER cant store overview for @030154455854000000000131780200000001@
    Jun 30 08:14:05 gatekeeper innd[60826]: tradindexed: cannot write index record
    for 1204 in /usr/local/news/spool/overview/d/a/d/de.alt.dummschwatz.IDX: Bad file descriptor
    Jun 30 08:14:05 gatekeeper innd[60826]: SERVER cant store overview for @0301544558540000000001317C9B00000001@
    Jun 30 08:24:10 gatekeeper innd[60826]: tradindexed: cannot write index record
    for 15855 in /usr/local/news/spool/overview/a/p/l/alt.politics.libertarian.IDX: Bad file descriptor
    Jun 30 08:24:10 gatekeeper innd[60826]: SERVER cant store overview for @030154455854000000000133D26300000001@
    Jun 30 08:24:23 gatekeeper innd[60826]: tradindexed: cannot write index record
    for 15856 in /usr/local/news/spool/overview/a/p/l/alt.politics.libertarian.IDX: Bad file descriptor
    Jun 30 08:24:23 gatekeeper innd[60826]: SERVER cant store overview for @030154455854000000000133DF7700000001@
    Jun 30 08:24:47 gatekeeper innd[60826]: tradindexed: cannot write index record
    for 15857 in /usr/local/news/spool/overview/a/p/l/alt.politics.libertarian.IDX: Bad file descriptor
    Jun 30 08:24:47 gatekeeper innd[60826]: SERVER cant store overview for @030154455854000000000133F33D00000001@
    Jun 30 08:34:09 gatekeeper innd[60826]: tradindexed: cannot write index record
    for 19832 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:09 gatekeeper innd[60826]: SERVER cant store overview for @030154455854000000000136542C00000001@
    Jun 30 08:34:09 gatekeeper innd[60826]: tradindexed: cannot write index record
    for 19833 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:09 gatekeeper innd[60826]: SERVER cant store overview for @030154455854000000000136545A00000001@
    Jun 30 08:34:10 gatekeeper innd[60826]: tradindexed: cannot write index record
    for 19834 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:10 gatekeeper innd[60826]: SERVER cant store overview for @030154455854000000000136548900000001@
    Jun 30 08:34:10 gatekeeper innd[60826]: tradindexed: cannot write index record
    for 19835 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:10 gatekeeper innd[60826]: SERVER cant store overview for @03015445585400000000013654A800000001@
    Jun 30 08:34:10 gatekeeper innd[60826]: tradindexed: cannot write index record
    for 19836 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:10 gatekeeper innd[60826]: SERVER cant store overview for @03015445585400000000013654B400000001@
    Jun 30 08:34:10 gatekeeper innd[60826]: tradindexed: cannot write index record
    for 19837 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:10 gatekeeper innd[60826]: SERVER cant store overview for @03015445585400000000013654C900000001@
    Jun 30 08:34:10 gatekeeper innd[60826]: tradindexed: cannot write index record
    for 19838 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:10 gatekeeper innd[60826]: SERVER cant store overview for @03015445585400000000013654E800000001@
    Jun 30 08:34:10 gatekeeper innd[60826]: tradindexed: cannot write index record
    for 19839 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:10 gatekeeper innd[60826]: SERVER cant store overview for @03015445585400000000013654F400000001@
    Jun 30 08:34:10 gatekeeper innd[60826]: tradindexed: cannot write index record
    for 19840 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:10 gatekeeper innd[60826]: SERVER cant store overview for @030154455854000000000136550400000001@
    Jun 30 08:34:10 gatekeeper innd[60826]: tradindexed: cannot write index record
    for 19841 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:10 gatekeeper innd[60826]: SERVER cant store overview for @030154455854000000000136550600000001@
    Jun 30 08:34:10 gatekeeper innd[60826]: tradindexed: cannot write index record
    for 19842 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:10 gatekeeper innd[60826]: SERVER cant store overview for @030154455854000000000136559900000001@
    Jun 30 08:34:10 gatekeeper innd[60826]: tradindexed: cannot write index record
    for 19843 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:10 gatekeeper innd[60826]: SERVER cant store overview for @03015445585400000000013655BF00000001@
    Jun 30 08:34:10 gatekeeper innd[60826]: tradindexed: cannot write index record
    for 19844 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:10 gatekeeper innd[60826]: SERVER cant store overview for @03015445585400000000013655C500000001@
    Jun 30 08:34:10 gatekeeper innd[60826]: tradindexed: cannot write index record
    for 19845 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:10 gatekeeper innd[60826]: SERVER cant store overview for @03015445585400000000013655D900000001@
    Jun 30 08:34:10 gatekeeper innd[60826]: tradindexed: cannot write index record
    for 19846 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:10 gatekeeper innd[60826]: SERVER cant store overview for @03015445585400000000013655FE00000001@
    Jun 30 08:34:10 gatekeeper innd[60826]: tradindexed: cannot write index record
    for 19847 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:10 gatekeeper innd[60826]: SERVER cant store overview for @030154455854000000000136560300000001@
    Jun 30 08:34:11 gatekeeper innd[60826]: tradindexed: cannot write index record
    for 19848 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:11 gatekeeper innd[60826]: SERVER cant store overview for @030154455854000000000136560900000001@
    Jun 30 08:34:11 gatekeeper innd[60826]: tradindexed: cannot write index record
    for 19849 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:11 gatekeeper innd[60826]: SERVER cant store overview for @030154455854000000000136561400000001@
    Jun 30 08:34:11 gatekeeper innd[60826]: tradindexed: cannot write index record
    for 19850 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:11 gatekeeper innd[60826]: SERVER cant store overview for @030154455854000000000136561C00000001@
    Jun 30 08:34:11 gatekeeper innd[60826]: tradindexed: cannot write index record
    for 19851 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:11 gatekeeper innd[60826]: SERVER cant store overview for @030154455854000000000136562F00000001@
    Jun 30 08:34:11 gatekeeper innd[60826]: tradindexed: cannot write index record
    for 19852 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:11 gatekeeper innd[60826]: SERVER cant store overview for @030154455854000000000136563C00000001@
    Jun 30 08:34:11 gatekeeper innd[60826]: tradindexed: cannot write index record
    for 19853 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:11 gatekeeper innd[60826]: SERVER cant store overview for @030154455854000000000136564B00000001@
    Jun 30 08:34:11 gatekeeper innd[60826]: tradindexed: cannot write index record
    for 19854 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:11 gatekeeper innd[60826]: SERVER cant store overview for @03015445585400000000013656B000000001@
    Jun 30 08:34:11 gatekeeper innd[60826]: tradindexed: cannot write index record
    for 19855 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:11 gatekeeper innd[60826]: SERVER cant store overview for @03015445585400000000013656B100000001@
    Jun 30 08:34:11 gatekeeper innd[60826]: tradindexed: cannot write index record
    for 19856 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:11 gatekeeper innd[60826]: SERVER cant store overview for @03015445585400000000013656CB00000001@
    Jun 30 08:34:12 gatekeeper innd[60826]: tradindexed: cannot write index record
    for 19857 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:12 gatekeeper innd[60826]: SERVER cant store overview for @030154455854000000000136571900000001@
    Jun 30 08:34:12 gatekeeper innd[60826]: tradindexed: cannot write index record
    for 19858 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:12 gatekeeper innd[60826]: SERVER cant store overview for @030154455854000000000136574600000001@
    Jun 30 08:34:12 gatekeeper innd[60826]: tradindexed: cannot write index record
    for 19859 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:12 gatekeeper innd[60826]: SERVER cant store overview for @030154455854000000000136575100000001@
    Jun 30 08:34:13 gatekeeper innd[60826]: tradindexed: cannot write index record
    for 19860 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:13 gatekeeper innd[60826]: SERVER cant store overview for @03015445585400000000013657F200000001@
    Jun 30 08:34:13 gatekeeper innd[60826]: tradindexed: cannot write index record
    for 19861 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:13 gatekeeper innd[60826]: SERVER cant store overview for @030154455854000000000136588400000001@
    Jun 30 08:34:14 gatekeeper innd[60826]: tradindexed: cannot write index record
    for 19862 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:14 gatekeeper innd[60826]: SERVER cant store overview for @030154455854000000000136593300000001@
    Jun 30 08:34:14 gatekeeper innd[60826]: tradindexed: cannot write index record
    for 19863 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:14 gatekeeper innd[60826]: SERVER cant store overview for @030154455854000000000136593B00000001@
    Jun 30 08:34:14 gatekeeper innd[60826]: tradindexed: cannot write index record
    for 19864 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:14 gatekeeper innd[60826]: SERVER cant store overview for @030154455854000000000136594B00000001@
    Jun 30 08:34:14 gatekeeper innd[60826]: tradindexed: cannot write index record
    for 19865 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:14 gatekeeper innd[60826]: SERVER cant store overview for @030154455854000000000136595600000001@
    Jun 30 08:34:14 gatekeeper innd[60826]: tradindexed: cannot write index record
    for 19866 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:14 gatekeeper innd[60826]: SERVER cant store overview for @030154455854000000000136596700000001@
    Jun 30 08:34:14 gatekeeper innd[60826]: tradindexed: cannot write index record
    for 19867 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:14 gatekeeper innd[60826]: SERVER cant store overview for @030154455854000000000136597900000001@
    Jun 30 08:34:14 gatekeeper innd[60826]: tradindexed: cannot write index record
    for 19868 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:14 gatekeeper innd[60826]: SERVER cant store overview for @030154455854000000000136598100000001@
    Jun 30 08:34:14 gatekeeper innd[60826]: tradindexed: cannot write index record
    for 19869 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:14 gatekeeper innd[60826]: SERVER cant store overview for @030154455854000000000136598B00000001@
    Jun 30 08:34:14 gatekeeper innd[60826]: tradindexed: cannot write index record
    for 19870 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:14 gatekeeper innd[60826]: SERVER cant store overview for @03015445585400000000013659CB00000001@
    Jun 30 08:34:14 gatekeeper innd[60826]: tradindexed: cannot write index record
    for 19871 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:14 gatekeeper innd[60826]: SERVER cant store overview for @03015445585400000000013659DA00000001@
    Jun 30 08:34:15 gatekeeper innd[60826]: tradindexed: cannot write index record
    for 19872 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:15 gatekeeper innd[60826]: SERVER cant store overview for @03015445585400000000013659EB00000001@
    Jun 30 08:34:15 gatekeeper innd[60826]: tradindexed: cannot write index record
    for 19873 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad file descriptor
    Jun 30 08:34:15 gatekeeper innd[60826]: SERVER cant store overview for @03015445585400000000013659FE00000001@

    I don't see the OS reporting any filesystem or issues with the block device and tdx-util doesn't report any issues.

    A new error set including history write failure, but its still chugging along:

    Jun 30 08:44:01 gatekeeper innd[60826]: ME time 600000 hishave 9642(139443) hiswrite 19812(139443) hissync 23(2) idle 145170(177110) artclean 807(139443) artwrite 20093(139436) artcncl 17855(236) hisgrep/artcncl 407(236) overv 250828(139436) perl 70631(139443) python 4053(139443) nntpread 2128(260530) artparse 3000(333610) artlog 1666(139611) datamove 354(266420)
    Jun 30 08:44:01 gatekeeper innd[60826]: ME HISstats 0 hitpos 0 hitneg 0 missed 139443 dne
    Jun 30 08:44:01 gatekeeper innd[60826]: ME status seconds 1874 accepted 407939 refused 0 rejected 311 duplicate 288 accepted size 1192259072 duplicate size 864430 rejected size 1803392
    Jun 30 08:44:01 gatekeeper innd[60826]: news.blueworldhosting.com status seconds 1874 accepted 407939 refused 0 rejected 311 duplicate 288 accepted
    size 1192259072 duplicate size 864430 rejected size 1803392
    Jun 30 08:44:12 gatekeeper innd[60826]: tradindexed: cannot write index record for 3257 in /usr/local/news/spool/overview/i/c/o/d/it.comp.os.dibattiti.IDX: Bad file descriptor
    Jun 30 08:44:12 gatekeeper innd[60826]: SERVER cant store overview for @030154455854000000000138F91000000001@
    Jun 30 08:44:12 gatekeeper innd[60826]: tradindexed: cannot write index record for 3258 in /usr/local/news/spool/overview/i/c/o/d/it.comp.os.dibattiti.IDX: Bad file descriptor
    Jun 30 08:44:12 gatekeeper innd[60826]: python: state change from running to throttled - Bad file descriptor writing creating overview file -- throttling Jun 30 08:44:12 gatekeeper innd[60826]: CNFS: CNFSflushallheads: flushing TEXT Jun 30 08:44:12 gatekeeper innd[60826]: CNFS: CNFSflushallheads: flushing BINARY
    Jun 30 08:44:12 gatekeeper innd[60826]: SERVER throttled Bad file descriptor writing creating overview file -- throttling
    Jun 30 08:44:12 gatekeeper innd[60826]: SERVER throttle Bad file descriptor writing creating overview file -- throttling
    Jun 30 08:44:12 gatekeeper innd[60826]: SERVER cant store overview for @030154455854000000000138F94B00000001@
    Jun 30 08:44:12 gatekeeper innd[60826]: SERVER cant write history <[email protected]> Bad file descriptor
    Jun 30 08:44:12 gatekeeper innd[60826]: news.blueworldhosting.com:858 checkpoint seconds 64 accepted 1234 refused 0 rejected 1 duplicate 0 accepted size 3314568 duplicate size 0 rejected size 1564

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jesse Rehmer@21:1/5 to All on Fri Jun 30 14:38:18 2023
    After letting things sit for a while without any connections truss did output
    a little bit, and I decided to run a innxmit process on the remote side and in the logs I see:

    Jun 30 09:32:58 gatekeeper innd[60826]: any:0 1033 sleeping without Waker
    Jun 30 09:32:58 gatekeeper innd[60826]: any:0 closed
    Jun 30 09:32:58 gatekeeper innd[60826]: free:-1 1033 free but was in SMASK
    Jun 30 09:33:12 gatekeeper innd[60826]: any:0 1024 sleeping without Waker
    Jun 30 09:33:12 gatekeeper innd[60826]: any:0 closed
    Jun 30 09:33:21 gatekeeper innd[60826]: free:-1 1024 free but was in SMASK
    Jun 30 09:33:22 gatekeeper innd[60826]: any:0 1034 sleeping without Waker
    Jun 30 09:33:22 gatekeeper innd[60826]: any:0 closed
    Jun 30 09:33:22 gatekeeper innd[60826]: free:-1 1034 free but was in SMASK
    Jun 30 09:33:23 gatekeeper innd[60826]: free:-1 1024 free but was in SMASK

    root@gatekeeper:/var/log/news # truss -p 60826
    sigprocmask(SIG_SETMASK,{ SIGHUP|SIGPIPE|SIGTERM|SIGCHLD|SIGUSR1 },0x0) = 0 (0x0)
    sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0)
    select(1046,{ 12 14 15 17 },{ },0x0,{ 137.000000 }) = 0 (0x0) sigprocmask(SIG_SETMASK,{ SIGHUP|SIGPIPE|SIGTERM|SIGCHLD|SIGUSR1 },0x0) = 0 (0x0) open("/usr/local/news/http/inn_status.html",O_WRONLY|O_CREAT|O_TRUNC,0666) = 9 (0x9)
    dup2(9,4) = 4 (0x4)
    close(9) = 0 (0x0)
    fstat(4,{ mode=-rw-rw-r-- ,inode=3126,size=0,blksize=4096 }) = 0 (0x0)
    getpid() = 60826 (0xed9a) __sysctl("kern.hostname",2,0x2231d0316620,0x2231d0312508,0x0,0) = 0 (0x0) getpid() = 60826 (0xed9a)
    sendto(3,"<61>1 2023-06-30T09:04:03.389851"...,198,0,NULL,0) = 198 (0xc6) write(4,"<!DOCTYPE html>\n<html lang="en""...,1166) = 1166 (0x48e) open("/dev/null",O_RDONLY,0666) = 9 (0x9)
    dup2(9,4) = 4 (0x4)
    close(9) = 0 (0x0) __sysctl("kern.hostname",2,0x2231d03169a0,0x2231d0312888,0x0,0) = 0 (0x0) getpid() = 60826 (0xed9a)
    sendto(3,"<61>1 2023-06-30T09:04:03.392944"...,128,0,NULL,0) = 128 (0x80) __sysctl("kern.hostname",2,0x2231d03169a0,0x2231d0312888,0x0,0) = 0 (0x0) getpid() = 60826 (0xed9a)
    sendto(3,"<61>1 2023-06-30T09:04:03.393756"...,128,0,NULL,0) = 128 (0x80) __sysctl("kern.hostname",2,0x2231d0316930,0x2231d0312818,0x0,0) = 0 (0x0) getpid() = 60826 (0xed9a)
    sendto(3,"<61>1 2023-06-30T09:04:03.394272"...,325,0,NULL,0) = 325 (0x145) __sysctl("kern.hostname",2,0x2231d03169a0,0x2231d0312888,0x0,0) = 0 (0x0) getpid() = 60826 (0xed9a)
    sendto(3,"<61>1 2023-06-30T09:04:03.394707"...,130,0,NULL,0) = 130 (0x82) sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0)
    select(1046,{ 12 14 15 17 },{ },0x0,{ 600.000000 }) = 1 (0x1) sigprocmask(SIG_SETMASK,{ SIGHUP|SIGPIPE|SIGTERM|SIGCHLD|SIGUSR1 },0x0) = 0 (0x0)
    accept(17,{ AF_INET 104.152.52.99:53903 },0x2231d0315a5c) = 9 (0x9) setsockopt(9,SOL_SOCKET,SO_KEEPALIVE,0x2231d03158dc,4) = 0 (0x0) fcntl(9,F_GETFL,) = 6 (0x6)
    fcntl(9,F_SETFL,O_RDWR) = 0 (0x0)
    fork() = 85116 (0x14c7c)
    close(9) = 0 (0x0)
    sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0)
    select(1046,{ 12 14 15 17 },{ },0x0,{ 511.000000 }) ERR#4 'Interrupted system call'
    SIGNAL 20 (SIGCHLD) code=CLD_EXITED pid=85116 uid=8 status=1 sigprocmask(SIG_SETMASK,{ SIGCHLD },0x0) = 0 (0x0)
    wait4(-1,{ EXITED,val=1 },WNOHANG,0x0) = 85116 (0x14c7c) wait4(-1,0x2231d0316484,WNOHANG,0x0) ERR#10 'No child processes' sigreturn(0x2231d03164b0) EJUSTRETURN
    sigprocmask(SIG_SETMASK,{ SIGHUP|SIGPIPE|SIGTERM|SIGCHLD|SIGUSR1 },0x0) = 0 (0x0)
    sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0)
    select(1046,{ 12 14 15 17 },{ },0x0,{ 511.000000 }) = 1 (0x1) sigprocmask(SIG_SETMASK,{ SIGHUP|SIGPIPE|SIGTERM|SIGCHLD|SIGUSR1 },0x0) = 0 (0x0)
    accept(17,{ AF_INET 104.152.52.199:50860 },0x2231d0315a5c) = 9 (0x9) setsockopt(9,SOL_SOCKET,SO_KEEPALIVE,0x2231d03158dc,4) = 0 (0x0) fcntl(9,F_GETFL,) = 6 (0x6)
    fcntl(9,F_SETFL,O_RDWR) = 0 (0x0)
    fork() = 85997 (0x14fed)
    close(9) = 0 (0x0)
    sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0)
    select(1046,{ 12 14 15 17 },{ },0x0,{ 464.000000 }) ERR#4 'Interrupted system call'
    SIGNAL 20 (SIGCHLD) code=CLD_EXITED pid=85997 uid=8 status=1 sigprocmask(SIG_SETMASK,{ SIGCHLD },0x0) = 0 (0x0)
    wait4(-1,{ EXITED,val=1 },WNOHANG,0x0) = 85997 (0x14fed) wait4(-1,0x2231d0316484,WNOHANG,0x0) ERR#10 'No child processes' sigreturn(0x2231d03164b0) EJUSTRETURN
    sigprocmask(SIG_SETMASK,{ SIGHUP|SIGPIPE|SIGTERM|SIGCHLD|SIGUSR1 },0x0) = 0 (0x0)
    sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0)
    select(1046,{ 12 14 15 17 },{ },0x0,{ 463.000000 }) = 0 (0x0) sigprocmask(SIG_SETMASK,{ SIGHUP|SIGPIPE|SIGTERM|SIGCHLD|SIGUSR1 },0x0) = 0 (0x0) open("/usr/local/news/http/inn_status.html",O_WRONLY|O_CREAT|O_TRUNC,0666) = 9 (0x9)
    dup2(9,4) = 4 (0x4)
    close(9) = 0 (0x0)
    fstat(4,{ mode=-rw-rw-r-- ,inode=3126,size=0,blksize=4096 }) = 0 (0x0)
    getpid() = 60826 (0xed9a) __sysctl("kern.hostname",2,0x2231d0316620,0x2231d0312508,0x0,0) = 0 (0x0) getpid() = 60826 (0xed9a)
    sendto(3,"<61>1 2023-06-30T09:14:03.408448"...,198,0,NULL,0) = 198 (0xc6) write(4,"<!DOCTYPE html>\n<html lang="en""...,1166) = 1166 (0x48e) open("/dev/null",O_RDONLY,0666) = 9 (0x9)
    dup2(9,4) = 4 (0x4)
    close(9) = 0 (0x0) __sysctl("kern.hostname",2,0x2231d03169a0,0x2231d0312888,0x0,0) = 0 (0x0) getpid() = 60826 (0xed9a)
    sendto(3,"<61>1 2023-06-30T09:14:03.409333"...,128,0,NULL,0) = 128 (0x80) __sysctl("kern.hostname",2,0x2231d03169a0,0x2231d0312888,0x0,0) = 0 (0x0) getpid() = 60826 (0xed9a)
    sendto(3,"<61>1 2023-06-30T09:14:03.409640"...,124,0,NULL,0) = 124 (0x7c) __sysctl("kern.hostname",2,0x2231d0316940,0x2231d0312828,0x0,0) = 0 (0x0) getpid() = 60826 (0xed9a)
    sendto(3,"<61>1 2023-06-30T09:14:03.409953"...,233,0,NULL,0) = 233 (0xe9) __sysctl("kern.hostname",2,0x2231d0316940,0x2231d0312828,0x0,0) = 0 (0x0) getpid() = 60826 (0xed9a)
    sendto(3,"<61>1 2023-06-30T09:14:03.410270"...,229,0,NULL,0) = 229 (0xe5) close(1028) = 0 (0x0) __sysctl("kern.hostname",2,0x2231d03169a0,0x2231d0312888,0x0,0) = 0 (0x0) getpid() = 60826 (0xed9a)
    sendto(3,"<61>1 2023-06-30T09:14:03.410712"...,128,0,NULL,0) = 128 (0x80) __sysctl("kern.hostname",2,0x2231d03169a0,0x2231d0312888,0x0,0) = 0 (0x0) getpid() = 60826 (0xed9a)
    sendto(3,"<61>1 2023-06-30T09:14:03.411017"...,124,0,NULL,0) = 124 (0x7c) __sysctl("kern.hostname",2,0x2231d0316940,0x2231d0312828,0x0,0) = 0 (0x0) getpid() = 60826 (0xed9a)
    sendto(3,"<61>1 2023-06-30T09:14:03.411323"...,233,0,NULL,0) = 233 (0xe9) __sysctl("kern.hostname",2,0x2231d0316940,0x2231d0312828,0x0,0) = 0 (0x0) getpid() = 60826 (0xed9a)
    sendto(3,"<61>1 2023-06-30T09:14:03.411634"...,229,0,NULL,0) = 229 (0xe5) close(1045) = 0 (0x0) __sysctl("kern.hostname",2,0x2231d0316930,0x2231d0312818,0x0,0) = 0 (0x0) getpid() = 60826 (0xed9a)
    sendto(3,"<61>1 2023-06-30T09:14:03.412088"...,324,0,NULL,0) = 324 (0x144) __sysctl("kern.hostname",2,0x2231d03169a0,0x2231d0312888,0x0,0) = 0 (0x0) getpid() = 60826 (0xed9a)
    sendto(3,"<61>1 2023-06-30T09:14:03.412390"...,130,0,NULL,0) = 130 (0x82) sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0)
    select(1045,{ 12 14 15 17 },{ },0x0,{ 600.000000 }) = 0 (0x0) sigprocmask(SIG_SETMASK,{ SIGHUP|SIGPIPE|SIGTERM|SIGCHLD|SIGUSR1 },0x0) = 0 (0x0) open("/usr/local/news/http/inn_status.html",O_WRONLY|O_CREAT|O_TRUNC,0666) = 9 (0x9)
    dup2(9,4) = 4 (0x4)
    close(9) = 0 (0x0)
    fstat(4,{ mode=-rw-rw-r-- ,inode=3126,size=0,blksize=4096 }) = 0 (0x0)
    getpid() = 60826 (0xed9a) __sysctl("kern.hostname",2,0x2231d0316620,0x2231d0312508,0x0,0) = 0 (0x0) getpid() = 60826 (0xed9a)
    sendto(3,"<61>1 2023-06-30T09:24:03.449870"...,198,0,NULL,0) = 198 (0xc6) write(4,"<!DOCTYPE html>\n<html lang="en""...,1166) = 1166 (0x48e) open("/dev/null",O_RDONLY,0666) = 9 (0x9)
    dup2(9,4) = 4 (0x4)
    close(9) = 0 (0x0) __sysctl("kern.hostname",2,0x2231d0316930,0x2231d0312818,0x0,0) = 0 (0x0) getpid() = 60826 (0xed9a)
    sendto(3,"<61>1 2023-06-30T09:24:03.450754"...,324,0,NULL,0) = 324 (0x144) __sysctl("kern.hostname",2,0x2231d03169a0,0x2231d0312888,0x0,0) = 0 (0x0) getpid() = 60826 (0xed9a)
    sendto(3,"<61>1 2023-06-30T09:24:03.451068"...,130,0,NULL,0) = 130 (0x82) sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0)
    select(1045,{ 12 14 15 17 },{ },0x0,{ 600.000000 }) = 1 (0x1) sigprocmask(SIG_SETMASK,{ SIGHUP|SIGPIPE|SIGTERM|SIGCHLD|SIGUSR1 },0x0) = 0 (0x0)
    accept(17,{ AF_INET 199.102.165.15:54669 },0x2231d0315a5c) = 9 (0x9) fcntl(9,F_GETFD,) = 0 (0x0)
    fcntl(9,F_SETFD,FD_CLOEXEC) = 0 (0x0)
    fcntl(9,F_GETFL,) = 6 (0x6)
    fcntl(9,F_SETFL,O_RDWR|O_NONBLOCK) = 0 (0x0) setsockopt(9,SOL_SOCKET,SO_SNDBUF,0x2231d03159e4,4) = 0 (0x0) setsockopt(9,SOL_SOCKET,SO_RCVBUF,0x2231d03159e4,4) = 0 (0x0) setsockopt(9,SOL_SOCKET,SO_KEEPALIVE,0x2231d03159e4,4) = 0 (0x0)
    write(1,"Jun 30 08:44:12.171 - spool1.use"...,3147) = 3147 (0xc4b) write(2,"innd: any:0 1033 sleeping withou"...,40) = 40 (0x28) __sysctl("kern.hostname",2,0x2231d03169a0,0x2231d0312888,0x0,0) = 0 (0x0) getpid() = 60826 (0xed9a)
    sendto(3,"<59>1 2023-06-30T09:32:58.242206"...,119,0,NULL,0) = 119 (0x77) __sysctl("kern.hostname",2,0x2231d0316950,0x2231d0312838,0x0,0) = 0 (0x0) getpid() = 60826 (0xed9a)
    sendto(3,"<61>1 2023-06-30T09:32:58.242567"...,98,0,NULL,0) = 98 (0x62) close(0) = 0 (0x0)
    sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0)
    select(1045,{ 12 14 15 17 1033 },{ 9 },0x0,{ 66.000000 }) = 2 (0x2) sigprocmask(SIG_SETMASK,{ SIGHUP|SIGPIPE|SIGTERM|SIGCHLD|SIGUSR1 },0x0) = 0 (0x0)
    write(2,"innd: free:-1 1033 free but was "...,41) = 41 (0x29) __sysctl("kern.hostname",2,0x2231d03169a0,0x2231d0312888,0x0,0) = 0 (0x0) getpid() = 60826 (0xed9a)
    sendto(3,"<59>1 2023-06-30T09:32:58.243474"...,120,0,NULL,0) = 120 (0x78) close(1033) = 0 (0x0)
    sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0)
    select(1045,{ 12 14 15 17 },{ },0x0,{ 66.000000 }) = 1 (0x1) sigprocmask(SIG_SETMASK,{ SIGHUP|SIGPIPE|SIGTERM|SIGCHLD|SIGUSR1 },0x0) = 0 (0x0)
    accept(17,{ AF_INET 199.102.165.15:54670 },0x2231d0315a5c) = 0 (0x0) fcntl(0,F_GETFD,) = 0 (0x0)
    fcntl(0,F_SETFD,FD_CLOEXEC) = 0 (0x0)
    fcntl(0,F_GETFL,) = 6 (0x6)
    fcntl(0,F_SETFL,O_RDWR|O_NONBLOCK) = 0 (0x0) setsockopt(0,SOL_SOCKET,SO_SNDBUF,0x2231d03159e4,4) = 0 (0x0) setsockopt(0,SOL_SOCKET,SO_RCVBUF,0x2231d03159e4,4) = 0 (0x0) setsockopt(0,SOL_SOCKET,SO_KEEPALIVE,0x2231d03159e4,4) = 0 (0x0)
    write(2,"innd: any:0 1024 sleeping withou"...,40) = 40 (0x28) __sysctl("kern.hostname",2,0x2231d03169a0,0x2231d0312888,0x0,0) = 0 (0x0) getpid() = 60826 (0xed9a)
    sendto(3,"<59>1 2023-06-30T09:33:12.269818"...,119,0,NULL,0) = 119 (0x77) __sysctl("kern.hostname",2,0x2231d0316950,0x2231d0312838,0x0,0) = 0 (0x0) getpid() = 60826 (0xed9a)
    sendto(3,"<61>1 2023-06-30T09:33:12.270176"...,98,0,NULL,0) = 98 (0x62) close(0) = 0 (0x0)
    sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0)
    select(1045,{ 12 14 15 17 },{ },0x0,{ 52.000000 }) = 1 (0x1) sigprocmask(SIG_SETMASK,{ SIGHUP|SIGPIPE|SIGTERM|SIGCHLD|SIGUSR1 },0x0) = 0 (0x0)
    accept(17,{ AF_INET 199.102.165.15:16010 },0x2231d0315a5c) = 0 (0x0) fcntl(0,F_GETFD,) = 0 (0x0)
    fcntl(0,F_SETFD,FD_CLOEXEC) = 0 (0x0)
    fcntl(0,F_GETFL,) = 6 (0x6)
    fcntl(0,F_SETFL,O_RDWR|O_NONBLOCK) = 0 (0x0) setsockopt(0,SOL_SOCKET,SO_SNDBUF,0x2231d03159e4,4) = 0 (0x0) setsockopt(0,SOL_SOCKET,SO_RCVBUF,0x2231d03159e4,4) = 0 (0x0) setsockopt(0,SOL_SOCKET,SO_KEEPALIVE,0x2231d03159e4,4) = 0 (0x0)
    write(2,"innd: free:-1 1024 free but was "...,41) = 41 (0x29) __sysctl("kern.hostname",2,0x2231d03169a0,0x2231d0312888,0x0,0) = 0 (0x0) getpid() = 60826 (0xed9a)
    sendto(3,"<59>1 2023-06-30T09:33:21.619875"...,120,0,NULL,0) = 120 (0x78) close(1024) = 0 (0x0)
    sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0)
    select(1045,{ 12 14 15 17 },{ },0x0,{ 42.000000 }) = 1 (0x1) sigprocmask(SIG_SETMASK,{ SIGHUP|SIGPIPE|SIGTERM|SIGCHLD|SIGUSR1 },0x0) = 0 (0x0)
    accept(17,{ AF_INET 199.102.165.15:17875 },0x2231d0315a5c) = 10 (0xa) fcntl(10,F_GETFD,) = 0 (0x0)
    fcntl(10,F_SETFD,FD_CLOEXEC) = 0 (0x0)
    fcntl(10,F_GETFL,) = 6 (0x6) fcntl(10,F_SETFL,O_RDWR|O_NONBLOCK) = 0 (0x0) setsockopt(10,SOL_SOCKET,SO_SNDBUF,0x2231d03159e4,4) = 0 (0x0) setsockopt(10,SOL_SOCKET,SO_RCVBUF,0x2231d03159e4,4) = 0 (0x0) setsockopt(10,SOL_SOCKET,SO_KEEPALIVE,0x2231d03159e4,4) = 0 (0x0) write(2,"innd: any:0 1034 sleeping withou"...,40) = 40 (0x28) __sysctl("kern.hostname",2,0x2231d03169a0,0x2231d0312888,0x0,0) = 0 (0x0) getpid() = 60826 (0xed9a)
    sendto(3,"<59>1 2023-06-30T09:33:22.750984"...,119,0,NULL,0) = 119 (0x77) __sysctl("kern.hostname",2,0x2231d0316950,0x2231d0312838,0x0,0) = 0 (0x0) getpid() = 60826 (0xed9a)
    sendto(3,"<61>1 2023-06-30T09:33:22.751310"...,98,0,NULL,0) = 98 (0x62) close(0) = 0 (0x0)
    sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0)
    select(1045,{ 12 14 15 17 1034 },{ 10 },0x0,{ 41.000000 }) = 2 (0x2) sigprocmask(SIG_SETMASK,{ SIGHUP|SIGPIPE|SIGTERM|SIGCHLD|SIGUSR1 },0x0) = 0 (0x0)
    write(2,"innd: free:-1 1034 free but was "...,41) = 41 (0x29) __sysctl("kern.hostname",2,0x2231d03169a0,0x2231d0312888,0x0,0) = 0 (0x0) getpid() = 60826 (0xed9a)
    sendto(3,"<59>1 2023-06-30T09:33:22.752212"...,120,0,NULL,0) = 120 (0x78) close(1034) = 0 (0x0)
    sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0)
    select(1045,{ 12 14 15 17 },{ },0x0,{ 41.000000 }) = 1 (0x1) sigprocmask(SIG_SETMASK,{ SIGHUP|SIGPIPE|SIGTERM|SIGCHLD|SIGUSR1 },0x0) = 0 (0x0)
    accept(17,{ AF_INET 199.102.165.15:24294 },0x2231d0315a5c) = 0 (0x0) fcntl(0,F_GETFD,) = 0 (0x0)
    fcntl(0,F_SETFD,FD_CLOEXEC) = 0 (0x0)
    fcntl(0,F_GETFL,) = 6 (0x6)
    fcntl(0,F_SETFL,O_RDWR|O_NONBLOCK) = 0 (0x0) setsockopt(0,SOL_SOCKET,SO_SNDBUF,0x2231d03159e4,4) = 0 (0x0) setsockopt(0,SOL_SOCKET,SO_RCVBUF,0x2231d03159e4,4) = 0 (0x0) setsockopt(0,SOL_SOCKET,SO_KEEPALIVE,0x2231d03159e4,4) = 0 (0x0)
    write(2,"innd: free:-1 1024 free but was "...,41) = 41 (0x29) __sysctl("kern.hostname",2,0x2231d03169a0,0x2231d0312888,0x0,0) = 0 (0x0) getpid() = 60826 (0xed9a)
    sendto(3,"<59>1 2023-06-30T09:33:23.636694"...,120,0,NULL,0) = 120 (0x78) close(1024) ERR#9 'Bad file descriptor' sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From =?UTF-8?Q?Julien_=c3=89LIE?=@21:1/5 to All on Fri Jun 30 17:56:09 2023
    Hi Jesse,

    After letting things sit for a while without any connections truss did output a little bit, and I decided to run a innxmit process on the remote side and in
    the logs I see:

    Jun 30 09:32:58 gatekeeper innd[60826]: any:0 1033 sleeping without Waker
    Jun 30 09:32:58 gatekeeper innd[60826]: any:0 closed

    Hmm, now the file descriptors exceed 1024.

    In the man page of select(2) for FreeBSD, I see:

    The default size of FD_SETSIZE is currently 1024. In order to accommo-
    date programs which might potentially use a larger number of open files
    with select(), it is possible to increase this size by having the
    program
    define FD_SETSIZE before the inclusion of any header which includes
    <sys/types.h>.


    There's maybe something to do regarding that as we do not do that... Nonetheless, it does not explain the previous issue with file
    descriptors < 1024.

    --
    Julien ÉLIE

    « Un clavier azerty en vaut deux. »

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From =?UTF-8?Q?Julien_=c3=89LIE?=@21:1/5 to All on Sat Jul 1 08:26:53 2023
    Hi Jesse,

    In the man page of select(2) for FreeBSD, I see:

    The default size of FD_SETSIZE is currently 1024. In order to accommo-
    date programs which might potentially use a larger number of open files
    with select(), it is possible to increase this size by having the program
    define FD_SETSIZE before the inclusion of any header which includes
    <sys/types.h>.

    There's maybe something to do regarding that as we do not do that... Nonetheless, it does not explain the previous issue with file
    descriptors < 1024.

    Could this be tried please?

    The sleep set is just after the read set in memory, and I am under the impression that when FD_SET is used on a higher file descriptor than
    1024 by select() to advertise that this file descriptor should be read,
    it just marks that number minus 1024 as sleeping in the sleep set. The
    length of the set is not large enough.


    You could either try to rebuild INN with 4096 set in FD_SETSIZE in your
    system headers /usr/include/sys/select.h (and then revert the change
    after the build, just to test) or try this one for innd/chan.c only:

    --- a/innd/chan.c
    +++ b/innd/chan.c
    @@ -17,6 +17,9 @@

    /* Needed on AIX 4.1 to get fd_set and friends. */
    #ifdef HAVE_SYS_SELECT_H
    +# ifndef FD_SETSIZE
    +# define FD_SETSIZE 4096
    +# endif
    # include <sys/select.h>
    #endif


    Please tell if it happens to work better.

    --
    Julien ÉLIE

    « Un clavier azerty en vaut deux. »

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jesse Rehmer@21:1/5 to [email protected] on Sat Jul 1 07:47:10 2023
    On Jul 1, 2023 at 2:45:59 AM CDT, "Jesse Rehmer" <[email protected]> wrote:

    On Jul 1, 2023 at 1:26:53 AM CDT, "Julien ÉLIE" <[email protected]d> wrote:

    Hi Jesse,

    The sleep set is just after the read set in memory, and I am under the
    impression that when FD_SET is used on a higher file descriptor than
    1024 by select() to advertise that this file descriptor should be read,
    it just marks that number minus 1024 as sleeping in the sleep set. The
    length of the set is not large enough.

    You could either try to rebuild INN with 4096 set in FD_SETSIZE in your
    system headers /usr/include/sys/select.h (and then revert the change
    after the build, just to test) or try this one for innd/chan.c only:

    --- a/innd/chan.c
    +++ b/innd/chan.c
    @@ -17,6 +17,9 @@

    /* Needed on AIX 4.1 to get fd_set and friends. */
    #ifdef HAVE_SYS_SELECT_H
    +# ifndef FD_SETSIZE
    +# define FD_SETSIZE 4096
    +# endif
    # include <sys/select.h>
    #endif


    Please tell if it happens to work better.

    Hi Julien, I spent today in the data center setting up a Dell R440 with a couple PCI NVMe drives. This is a big upgrade from my source and destination hardware (R710s on old SAS RAID5) and am in the process of migrating the VMs to this new host. I will rebuild INN with the larger FD_SETSIZE and see what happens.

    One thing of note, I have been doing various things over the last year that involve shoving large amount of articles to INN as fast as I can, but it wasn't until I messed with a separate destination machine that has SSD storage
    and could write a ton faster that I ran into this, so I'm curious to see how things go when both source and destination are on NVMe drives.

    Much appreciate everyone's eyes and suggestions. Will report back when I've got the new environments running.

    Something I just noticed when looking at my Diablo feeder source code... some time ago one of the patches submitted increased FD_SETSIZE to 4096. Hmmm. :-)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jesse Rehmer@21:1/5 to [email protected] on Sat Jul 1 07:45:59 2023
    On Jul 1, 2023 at 1:26:53 AM CDT, "Julien ÉLIE" <[email protected]d> wrote:

    Hi Jesse,

    The sleep set is just after the read set in memory, and I am under the impression that when FD_SET is used on a higher file descriptor than
    1024 by select() to advertise that this file descriptor should be read,
    it just marks that number minus 1024 as sleeping in the sleep set. The length of the set is not large enough.

    You could either try to rebuild INN with 4096 set in FD_SETSIZE in your system headers /usr/include/sys/select.h (and then revert the change
    after the build, just to test) or try this one for innd/chan.c only:

    --- a/innd/chan.c
    +++ b/innd/chan.c
    @@ -17,6 +17,9 @@

    /* Needed on AIX 4.1 to get fd_set and friends. */
    #ifdef HAVE_SYS_SELECT_H
    +# ifndef FD_SETSIZE
    +# define FD_SETSIZE 4096
    +# endif
    # include <sys/select.h>
    #endif


    Please tell if it happens to work better.

    Hi Julien, I spent today in the data center setting up a Dell R440 with a couple PCI NVMe drives. This is a big upgrade from my source and destination hardware (R710s on old SAS RAID5) and am in the process of migrating the VMs
    to this new host. I will rebuild INN with the larger FD_SETSIZE and see what happens.

    One thing of note, I have been doing various things over the last year that involve shoving large amount of articles to INN as fast as I can, but it
    wasn't until I messed with a separate destination machine that has SSD storage and could write a ton faster that I ran into this, so I'm curious to see how things go when both source and destination are on NVMe drives.

    Much appreciate everyone's eyes and suggestions. Will report back when I've
    got the new environments running.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From =?UTF-8?Q?Julien_=c3=89LIE?=@21:1/5 to All on Sat Jul 1 17:44:55 2023
    Hi Jesse,

    One thing of note, I have been doing various things over the last year that >> involve shoving large amount of articles to INN as fast as I can, but it
    wasn't until I messed with a separate destination machine that has SSD storage
    and could write a ton faster that I ran into this, so I'm curious to see how >> things go when both source and destination are on NVMe drives.

    Thanks again for your bug report and all your testings!


    Something I just noticed when looking at my Diablo feeder source code... some time ago one of the patches submitted increased FD_SETSIZE to 4096. Hmmm. :-)

    Oh, thanks for the pointer.
    Maybe we should do something like that for INN too as it seems to have
    done the work for Diablo. Here is the related patch:

    https://github.com/jpmens/diablo/commit/9505abdc9d100e97ef89e2f58482fe9bd49ca215

    /*
    * Cannot increase FD_SETSIZE on Linux, but we can increase __FD_SETSIZE
    * with glibc 2.2 and 2.3 at least. We do this by including
    * bits/types.h which defines __FD_SETSIZE first (before any other
    include),
    * then we redefine __FD_SETSIZE. Ofcourse a user program may NEVER
    * include bits/whatever.h directly, so this is a dirty hack!
    */
    #if LARGE_FD_SETSIZE > 1024
    # ifdef __linux__
    # include <features.h>
    # if (__GLIBC__ > 2) || (__GLIBC__ == 2 && __GLIBC_MINOR__ >= 2)
    # include <bits/types.h>
    # undef __FD_SETSIZE
    # define __FD_SETSIZE LARGE_FD_SETSIZE
    # endif
    # else
    /* Works on most BSDs */
    # define FD_SETSIZE LARGE_FD_SETSIZE
    # endif
    #endif



    So it would for instance mean for INN to say in the documentation that -DLARGE_FD_SETSIZE=4096 should be given at build time to increase the
    limit of the number of file descriptors to 4096.

    Having googled a bit, there does not seem to be a more elegant way to
    achieve that, unless switching to another method to read sockets
    (poll(), libevent or).

    Any worries with that piece of code to add into INN?


    Jesse, the suggestion for
    #define FD_SETSIZE 4096
    I gave you in innd/chan.c may not work. The #define should be before
    any inclusion of <sys/types.h>, and we do that in several header files.
    Maybe it should be in include/inn/libinn.h instead, or maybe in include/portable/system.h or maybe elsewhere, I should have a deeper look.

    --
    Julien ÉLIE

    « La bête aux douze pieds qui marche sur la tête. » (Nougaro)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Russ Allbery@21:1/5 to [email protected] on Sat Jul 1 09:54:05 2023
    Julien ÉLIE <[email protected]d> writes:

    Having googled a bit, there does not seem to be a more elegant way to
    achieve that, unless switching to another method to read sockets
    (poll(), libevent or).

    Switching to libevent is almost certainly the right long-term solution
    (I've been thinking about this for ages and previously converted remctld
    to it), since then it handles all the many confusing variations of select/poll/epoll/etc. for you and you don't have to worry about figuring
    out which system has which variation. It provides an equivalent of innd's current select loop that's much cleaner but works in a very similar way.
    But unfortunately it's a whole lot of quite disruptive work.

    It would probably produce a noticable performance increase for really busy servers, too. select is notoriously inefficient.

    --
    Russ Allbery ([email protected]) <https://www.eyrie.org/~eagle/>

    Please post questions rather than mailing me directly.
    <https://www.eyrie.org/~eagle/faqs/questions.html> explains why.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jesse Rehmer@21:1/5 to [email protected] on Sat Jul 1 22:11:24 2023
    On Jul 1, 2023 at 10:44:55 AM CDT, "Julien ÉLIE" <[email protected]d> wrote:

    Hi Jesse,

    One thing of note, I have been doing various things over the last year that >>> involve shoving large amount of articles to INN as fast as I can, but it >>> wasn't until I messed with a separate destination machine that has SSD storage
    and could write a ton faster that I ran into this, so I'm curious to see how
    things go when both source and destination are on NVMe drives.

    Thanks again for your bug report and all your testings!


    Something I just noticed when looking at my Diablo feeder source code... some
    time ago one of the patches submitted increased FD_SETSIZE to 4096. Hmmm. :-)

    Oh, thanks for the pointer.
    Maybe we should do something like that for INN too as it seems to have
    done the work for Diablo. Here is the related patch:

    https://github.com/jpmens/diablo/commit/9505abdc9d100e97ef89e2f58482fe9bd49ca215

    /*
    * Cannot increase FD_SETSIZE on Linux, but we can increase __FD_SETSIZE
    * with glibc 2.2 and 2.3 at least. We do this by including
    * bits/types.h which defines __FD_SETSIZE first (before any other include),
    * then we redefine __FD_SETSIZE. Ofcourse a user program may NEVER
    * include bits/whatever.h directly, so this is a dirty hack!
    */
    #if LARGE_FD_SETSIZE > 1024
    # ifdef __linux__
    # include <features.h>
    # if (__GLIBC__ > 2) || (__GLIBC__ == 2 && __GLIBC_MINOR__ >= 2)
    # include <bits/types.h>
    # undef __FD_SETSIZE
    # define __FD_SETSIZE LARGE_FD_SETSIZE
    # endif
    # else
    /* Works on most BSDs */
    # define FD_SETSIZE LARGE_FD_SETSIZE
    # endif
    #endif



    So it would for instance mean for INN to say in the documentation that -DLARGE_FD_SETSIZE=4096 should be given at build time to increase the
    limit of the number of file descriptors to 4096.

    Having googled a bit, there does not seem to be a more elegant way to
    achieve that, unless switching to another method to read sockets
    (poll(), libevent or).

    Any worries with that piece of code to add into INN?


    Jesse, the suggestion for
    #define FD_SETSIZE 4096
    I gave you in innd/chan.c may not work. The #define should be before
    any inclusion of <sys/types.h>, and we do that in several header files.
    Maybe it should be in include/inn/libinn.h instead, or maybe in include/portable/system.h or maybe elsewhere, I should have a deeper look.

    I spun up a new FreeBSD VM backed by SSD storage, set FD_SETSIZE to 4096 in /usr/include/sys/select.h and compiled INN and started my process. Honestly, it's going so fast with one innxmit process now that both machines are on
    solid state storage I'm not sure I can make it go any faster on the receiving end. Doing ~3200 articles per second with a single innxmit process on the source side.

    INN isn't CPU bound yet, so tomorrow when I plan to do more playing around I will try multiple innxmit processes and see just how far I can push it and if
    I run into a similar crash. That said, the previous crashes I was not getting anywhere near this article rate.

    Can we talk about the implications of setting icdsynccount to a much much higher value than 10? I have it set to 500 now, but curious if there could be gains by setting this really high in my scenario where I know I'm not
    receiving duplicate articles and no articles exist on the destination? If
    there are any other tuning suggestions that can be had as far as reducing extraneous things (I did set incominglogfrequency to 2000 which still logs about twice a second, for example)?

    Jul 1 16:46:32 sorter innd[1098]: ME status seconds 1183 accepted 3722328 refused 0 rejected 1017 duplicate 16 accepted size 10797672448 duplicate size 42912 rejected size 149576720
    Jul 1 16:56:32 sorter innd[1098]: ME time 600000 hishave 2087(1997776)
    hiswrite 37175(1997776) hissync 6(2) idle 127733(2314098) artclean 3311(1997776) artwrite 20582(1997709) artcncl 90(431) hisgrep/artcncl 2(431) overv 50717(1997709) perl 295793(1997776) python 13162(1997776) nntpread 4811(2314098) artparse 14916(4102412) artlog 5645(1998040) datamove 709(2472871)
    Jul 1 16:56:32 sorter innd[1098]: ME HISstats 0 hitpos 0 hitneg 0 missed 1997776 dne
    Jul 1 16:56:32 sorter innd[1098]: ME status seconds 1783 accepted 5720026 refused 0 rejected 1084 duplicate 16 accepted size 16114387968 duplicate size 42912 rejected size 151971536

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Russ Allbery@21:1/5 to Jesse Rehmer on Sat Jul 1 15:39:26 2023
    Jesse Rehmer <[email protected]> writes:

    Can we talk about the implications of setting icdsynccount to a much
    much higher value than 10? I have it set to 500 now, but curious if
    there could be gains by setting this really high in my scenario where I
    know I'm not receiving duplicate articles and no articles exist on the destination?

    It decreases the frequency of syncs so it's a straight tradeoff between
    faster speed and more data loss if innd crashes (or the system crashes, or loses power, etc.). The risk is that if something goes wrong and innd
    crashes before updating the history and active files, the sending system
    may think it has successfully transmitted articles (they were acked), but
    the receiving system never recorded they were received and thus while
    they're on disk they're orphaned.

    If innd doesn't crash, you could set it arbitrarily high and I think it
    would just make things faster. Think of it as the checkpointing
    frequency.

    If there are any other tuning suggestions that can be had as far as
    reducing extraneous things (I did set incominglogfrequency to 2000 which still logs about twice a second, for example)?

    Make sure that you've disabled sync for all of your news logs. I think
    that's already covered in INSTALL but a lot of people forget about it.

    --
    Russ Allbery ([email protected]) <https://www.eyrie.org/~eagle/>

    Please post questions rather than mailing me directly.
    <https://www.eyrie.org/~eagle/faqs/questions.html> explains why.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jesse Rehmer@21:1/5 to Russ Allbery on Sat Jul 1 22:48:02 2023
    On Jul 1, 2023 at 5:39:26 PM CDT, "Russ Allbery" <[email protected]> wrote:

    Jesse Rehmer <[email protected]> writes:

    Can we talk about the implications of setting icdsynccount to a much
    much higher value than 10? I have it set to 500 now, but curious if
    there could be gains by setting this really high in my scenario where I
    know I'm not receiving duplicate articles and no articles exist on the
    destination?

    It decreases the frequency of syncs so it's a straight tradeoff between faster speed and more data loss if innd crashes (or the system crashes, or loses power, etc.). The risk is that if something goes wrong and innd crashes before updating the history and active files, the sending system
    may think it has successfully transmitted articles (they were acked), but
    the receiving system never recorded they were received and thus while
    they're on disk they're orphaned.

    If innd doesn't crash, you could set it arbitrarily high and I think it
    would just make things faster. Think of it as the checkpointing
    frequency.

    If there are any other tuning suggestions that can be had as far as
    reducing extraneous things (I did set incominglogfrequency to 2000 which
    still logs about twice a second, for example)?

    Make sure that you've disabled sync for all of your news logs. I think that's already covered in INSTALL but a lot of people forget about it.

    I'm not finding anything about that in the current INSTALL I have on my machine?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Russ Allbery@21:1/5 to Jesse Rehmer on Sat Jul 1 16:12:45 2023
    Jesse Rehmer <[email protected]> writes:
    On Jul 1, 2023 at 5:39:26 PM CDT, "Russ Allbery" <[email protected]> wrote:

    Make sure that you've disabled sync for all of your news logs. I think
    that's already covered in INSTALL but a lot of people forget about it.

    I'm not finding anything about that in the current INSTALL I have on my machine?

    Huh, yes, indeed. Edit /etc/syslog.conf to add - as the first character
    for the paths to which news logs are written, at least for news.notice.
    For example:

    news.notice -/var/log/news/news.notice

    Otherwise syslogd syncs on every write.

    --
    Russ Allbery ([email protected]) <https://www.eyrie.org/~eagle/>

    Please post questions rather than mailing me directly.
    <https://www.eyrie.org/~eagle/faqs/questions.html> explains why.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From =?UTF-8?Q?Julien_=c3=89LIE?=@21:1/5 to All on Mon Jul 3 13:03:08 2023
    Hi Russ,

    Having googled a bit, there does not seem to be a more elegant way to
    achieve that, unless switching to another method to read sockets
    (poll(), libevent or).

    Switching to libevent is almost certainly the right long-term solution
    (I've been thinking about this for ages and previously converted remctld
    to it), since then it handles all the many confusing variations of select/poll/epoll/etc. for you and you don't have to worry about figuring
    out which system has which variation. It provides an equivalent of innd's current select loop that's much cleaner but works in a very similar way.
    But unfortunately it's a whole lot of quite disruptive work.

    It would probably produce a noticable performance increase for really busy servers, too. select is notoriously inefficient.

    Yes, I totally agree for the long-term solution to switch to libevent.

    For the time being, I've tried to integrate Diablo's hack into INN. It
    seems to work (at least innd sees the wanted FD_SETSIZE size).

    In INSTALL and in the documentation of rlimitnofile in inn.conf, I just
    plan to add:

    If you really need using more than 1024 file descriptors, you can try
    to rebuild INN with for instance the "-DLARGE_FD_SETSIZE=4096" option
    given to the compiler to increase the possible value to 4096.


    Incidentally, is the "File Descriptor Limits" still accurate in INSTALL?

    INN won't be able to increase the limits above the hard limits set by
    your operating system; on some systems, that hard limit is normally
    256 file descriptors (Linux, for example). On others, like Solaris,
    it's 1024. Increasing the limit beyond that value may require serious
    system configuration work. (On some operating systems, it requires
    patching and recompiling the kernel. On Solaris it can be changed in
    /etc/system, but for 2.6 or earlier the limit cannot be increased
    beyond 1024 without breaking select(2) and thereby breaking all of
    INN. For current versions of Linux, you may be able to change the
    maximum by writing to /proc/sys/fs/file-max.)

    Isn't the hard limit 1024 (or more?) for Linux, and 256 for Solaris?




    I've found out that when systemd is in use, systemd/sd-daemon.h includes asm/socket.h when __USE_MISC is set, which in turn includes
    linux/posix_types.h which unconditionnaly redefines __FD_SETSIZE to 1024
    (even if it has another value) for use with __kernel_fd_set.
    So __FD_SETSIZE has to be forced before any inclusion of bits/types.h
    and after any inclusion of systemd/sd-daemon.h on Linux
    (systemd/sd-daemon.h also includes bits/types.h).

    I am unsure of the consequences but systemd might not work with INN if a systemd file descriptor exceeds 1024 (?) (though the fd_set size used
    with select(2) has the wanted LARGE_FD_SETSIZE value, from sys/select.h)
    In doubt, I could prevent LARGE_FD_SETSIZE from being used when systemd
    is also used?



    Jesse, I'll also change the log line:
    xxx:yyy zzz free but was in SMASK
    to:
    xxx:yyy zzz free but was in SMASK; looks like fd_set is not large
    enough (you may want to increase FD_SETSIZE to a higher value by
    rebuilding INN with -DLARGE_FD_SETSIZE=4096)


    This way, I hope it will be of help for other people coming across that
    issue.



    Also, the documentation of the rlimitnofile parameter should say that
    innd may malfunction, and not only "suffer reduced functionality" :)

    The maximum number of file descriptors that innd(8) or innfeed(8) can have
    open at once. If innd(8) or innfeed(8) attempts to open more file
    descriptors than this value, it is possible the program may throttle or -otherwise suffer reduced functionality. The number of open file
    +otherwise malfunction or suffer reduced functionality.



    I hope all of these changes will be useful, and you're happy with your
    new hardware setup Jesse!

    --
    Julien ÉLIE

    « – Quel a été votre plus beau jour ?
    – Une nuit. » (Brigitte Bardot)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Russ Allbery@21:1/5 to [email protected] on Mon Jul 3 09:04:37 2023
    Julien ÉLIE <[email protected]d> writes:

    Incidentally, is the "File Descriptor Limits" still accurate in INSTALL?

    INN won't be able to increase the limits above the hard limits set by
    your operating system; on some systems, that hard limit is normally
    256 file descriptors (Linux, for example). On others, like Solaris,
    it's 1024. Increasing the limit beyond that value may require serious
    system configuration work. (On some operating systems, it requires
    patching and recompiling the kernel. On Solaris it can be changed in
    /etc/system, but for 2.6 or earlier the limit cannot be increased
    beyond 1024 without breaking select(2) and thereby breaking all of
    INN. For current versions of Linux, you may be able to change the
    maximum by writing to /proc/sys/fs/file-max.)

    Isn't the hard limit 1024 (or more?) for Linux, and 256 for Solaris?

    1024 is now the default for Linux, it looks like, and you can now change
    it dynamically using the normal ulimit command (no need for messing around
    in proc). I have no idea what Solaris does these days. I suspect I last tested that part of INN against Solaris 7, which is a very long time ago.

    I've found out that when systemd is in use, systemd/sd-daemon.h includes asm/socket.h when __USE_MISC is set, which in turn includes linux/posix_types.h which unconditionnaly redefines __FD_SETSIZE to 1024 (even if it has another value) for use with __kernel_fd_set. So
    __FD_SETSIZE has to be forced before any inclusion of bits/types.h and
    after any inclusion of systemd/sd-daemon.h on Linux (systemd/sd-daemon.h
    also includes bits/types.h).

    I'm a little worried that something is going to be unhappy with redefining __FD_SETSIZE somewhere. That approach was always a bit of an intrusive
    hack. But if Diablo is happy, maybe it will be fine.

    I am unsure of the consequences but systemd might not work with INN if a systemd file descriptor exceeds 1024 (?)

    I suspect it will only matter if the notification file descriptor is over
    1024, which shouldn't happen because it gets passed in on startup.

    Jesse, I'll also change the log line:
    xxx:yyy zzz free but was in SMASK
    to:
    xxx:yyy zzz free but was in SMASK; looks like fd_set is not large
    enough (you may want to increase FD_SETSIZE to a higher value by
    rebuilding INN with -DLARGE_FD_SETSIZE=4096)

    We should be able to do better than this and detect the actual problem
    before we corrupt our data structures, I think, by testing whether a file descriptor is larger than FD_SETSIZE.

    --
    Russ Allbery ([email protected]) <https://www.eyrie.org/~eagle/>

    Please post questions rather than mailing me directly.
    <https://www.eyrie.org/~eagle/faqs/questions.html> explains why.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From =?UTF-8?Q?Julien_=c3=89LIE?=@21:1/5 to All on Mon Jul 3 18:21:12 2023
    Hi Russ,

    Can we talk about the implications of setting icdsynccount to a much
    much higher value than 10? I have it set to 500 now, but curious if
    there could be gains by setting this really high in my scenario where I
    know I'm not receiving duplicate articles and no articles exist on the
    destination?

    It decreases the frequency of syncs so it's a straight tradeoff between faster speed and more data loss if innd crashes (or the system crashes, or loses power, etc.). The risk is that if something goes wrong and innd crashes before updating the history and active files, the sending system
    may think it has successfully transmitted articles (they were acked), but
    the receiving system never recorded they were received and thus while
    they're on disk they're orphaned.

    If innd doesn't crash, you could set it arbitrarily high and I think it
    would just make things faster. Think of it as the checkpointing
    frequency.

    I'll improve the documentation of icdsynccount with some of your
    explanations.
    There's also an undocumented use of icdsynccount for buffindexed: the
    head of the buffer is flushed every 10*icdsynccount+1 articles.
    I can mention it in the documentation.

    As far as I see, other overview and storage methods flush indexes or
    headers for each article (tradindexed, cnfs, timecaf) or after a
    configurable period of time (transtimelimit of 10s by default for
    ovsqlite for instance).

    For buffindexed, maybe a dedicated option would be better, than to reuse icdsynccount? (buffindexedheadsync?)


    Make sure that you've disabled sync for all of your news logs. I think that's already covered in INSTALL but a lot of people forget about it.

    I'll add a note too, following one of your next articles in this thread.

    --
    Julien ÉLIE

    « Traversez la rivière en foule, le crocodile ne vous mangera pas. »
    (proverbe malgache)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jesse Rehmer@21:1/5 to [email protected] on Mon Jul 3 17:01:44 2023
    On Jul 3, 2023 at 11:46:51 AM CDT, "Julien ÉLIE" <[email protected]d> wrote:

    Hi Russ,

    Isn't the hard limit 1024 (or more?) for Linux, and 256 for Solaris?

    1024 is now the default for Linux, it looks like, and you can now change
    it dynamically using the normal ulimit command (no need for messing around >> in proc). I have no idea what Solaris does these days. I suspect I last
    tested that part of INN against Solaris 7, which is a very long time ago.

    Solaris 10 and 11 have in sys/select.h:

    #ifndef FD_SETSIZE
    #ifdef _LP64
    #define FD_SETSIZE 65536
    #else
    #define FD_SETSIZE 1024
    #endif /* _LP64 */

    Can confirm that Solaris 9 also has the same values.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From =?UTF-8?Q?Julien_=c3=89LIE?=@21:1/5 to All on Mon Jul 3 18:46:51 2023
    Hi Russ,

    Isn't the hard limit 1024 (or more?) for Linux, and 256 for Solaris?

    1024 is now the default for Linux, it looks like, and you can now change
    it dynamically using the normal ulimit command (no need for messing around
    in proc). I have no idea what Solaris does these days. I suspect I last tested that part of INN against Solaris 7, which is a very long time ago.

    Solaris 10 and 11 have in sys/select.h:

    #ifndef FD_SETSIZE
    #ifdef _LP64
    #define FD_SETSIZE 65536
    #else
    #define FD_SETSIZE 1024
    #endif /* _LP64 */

    so on Solaris 64-bit, the default is large enough :)

    I bet we can now remove the documentation of the 256 limit for Solaris
    as it no longer applies to current versions.



    I am unsure of the consequences but systemd might not work with INN if a
    systemd file descriptor exceeds 1024 (?)

    I suspect it will only matter if the notification file descriptor is over 1024, which shouldn't happen because it gets passed in on startup.

    Yep you're right.



    Jesse, I'll also change the log line:
    xxx:yyy zzz free but was in SMASK
    to:
    xxx:yyy zzz free but was in SMASK; looks like fd_set is not large
    enough (you may want to increase FD_SETSIZE to a higher value by
    rebuilding INN with -DLARGE_FD_SETSIZE=4096)

    We should be able to do better than this and detect the actual problem
    before we corrupt our data structures, I think, by testing whether a file descriptor is larger than FD_SETSIZE.

    Sure.
    The check could be done in RCHANadd, SCHANadd, and WCHANadd.
    It is when channels.max_fd and channels.max_sleep_fd are updated, when
    needed.
    We could then log a warn message each time these max_fd or max_sleep_fd
    counts *would* have been updated and be higher than FD_SETSIZE-1? And immediately close the channel?

    This way, the select(channels.max_fd + 1, &rdfds, &wrfds, NULL, &tv)
    call won't go beyond the expected limit.
    But no new connection will be seen (they will probably timeout).


    I see that innfeed already does something like that in endpoint.c:

    #if defined(FD_SETSIZE)
    if ((unsigned int) fd >= FD_SETSIZE) {
    warn("ME fd (%d) looks too big (%d -- FD_SETSIZE)", fd,
    FD_SETSIZE);
    return NULL;
    }
    #else
    if (fd > (sizeof(fd_set) * CHAR_BIT)) {
    warn("ME fd (%d) looks too big (%d -- sizeof (fd_set) * CHAR_BIT)",
    fd, (sizeof(fd_set) * CHAR_BIT));
    return NULL;
    }
    #endif



    Would a similar check be added for the xCHANadd functions?
    I guess the check could be done by a new function in lib/fdlimit.c that
    can be used for both innd and innfeed, returning true/false.

    --
    Julien ÉLIE

    « Traversez la rivière en foule, le crocodile ne vous mangera pas. »
    (proverbe malgache)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From =?UTF-8?Q?Julien_=c3=89LIE?=@21:1/5 to All on Tue Jul 4 12:17:18 2023
    Adding to my previous message:

    Isn't the hard limit 1024 (or more?) for Linux, and 256 for Solaris?

    1024 is now the default for Linux, it looks like, and you can now change
    it dynamically using the normal ulimit command (no need for messing
    around
    in proc).  I have no idea what Solaris does these days.  I suspect I last >> tested that part of INN against Solaris 7, which is a very long time ago.

    Solaris 10 and 11 have in sys/select.h:

    #ifndef FD_SETSIZE
    #ifdef _LP64
    #define FD_SETSIZE      65536
    #else
    #define FD_SETSIZE      1024
    #endif  /* _LP64 */

    I've came accross this page:
    https://support.oracle.com/knowledge/Sun%20Microsystems/1005979_1.html

    saying that the default fd limit is 256 up to Solaris 11.4 SRU 26.

    There is a difference between the *stdio* and *select* interfaces, which certainly explains the point in the recommendation of 256 for
    rlimitnofile for Solaris.

    stdio handles up to 256 file descriptors up to Solaris 10, and 32-bit
    Solaris 11.3. For 64-bit Solaris 11.x, and 32-bit Solaris 11.4, stdio no
    longer has that limit.

    I then believe the recommandation for rlimitnofile should remain 256,
    and can be increased only for Solaris 11.4 or 64-bit Solaris 11.0-11.3.
    If of course I understand well what is said in the Oracle support page.

    --
    Julien ÉLIE

    « Traversez la rivière en foule, le crocodile ne vous mangera pas. »
    (proverbe malgache)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From =?UTF-8?Q?Julien_=c3=89LIE?=@21:1/5 to All on Tue Jul 4 22:31:45 2023
    Hi all,

    This thread is pretty interesting. A few improvements to do in our documentation,
    besides a fix for the maximum number of file descriptors to use.


    I'll improve the documentation of icdsynccount with some of your explanations.

    Here follows a suggestion, with a recommendation of running makehistory
    for recovery. Any other caveat or recovery actions to recommend?

    icdsynccount
    How many article writes between updating the active and history
    files. The default value is 10.

    This is a trade-off between faster speed and more data loss if innd
    crashes (or the system crashes, or loses power, etc.). The higher
    this parameter is, the less frequent syncs are done. It somehow
    represents a checkpointing frequency: it would be the maximum
    number of articles stored on disk that might be orphaned in case of
    a crash as they wouldn't have been recorded in the history file.
    There would also be failures to store articles with already
    assigned article numbers, but not recorded in the active file. (In
    order not to encounter such errors, you should then rebuild the
    history file and the overview with makehistory(8). The active file
    will be automatically be renumbered after that operation.)



    There's also an undocumented use of icdsynccount for buffindexed: the
    head of the buffer is flushed every 10*icdsynccount+1 articles.
    I can mention it in the documentation.

    I'll add a dedicated buffindexedheadsync parameter for that.



    Make sure that you've disabled sync for all of your news logs.  I think
    that's already covered in INSTALL but a lot of people forget about it.

    I'll add a note too, following one of your next articles in this thread.

    Proposition for INSTALL and newslog(5):

    If you're using syslogd, edit /etc/syslog.conf on your system and add
    lines that look like the following:

    news.crit <pathlog in inn.conf>/news.crit
    news.err <pathlog in inn.conf>/news.err
    news.notice -<pathlog in inn.conf>/news.notice

    The minus sign as the first character for the path to news.notice
    instructs syslogd not to synchronize the log file to disk every time
    there is a new log entry (it otherwise degrades performance). Depending
    on the syslog daemon you are using, log synchronization may already be
    disabled by default (that is for instance the case for rsyslogd or
    syslog-ng).

    --
    Julien ÉLIE

    « – Obélix, mon garçon, quand tu viens chez moi, j'aimerais que tu
    laisses ton menhir à la porte !
    – Mais chef, le menhir se porte aussi bien à l'intérieur qu'à
    l'extérieur !
    – Sauf quand la porte est trop petite pour le faire entrer
    imbécile !!! » (Astérix)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Russ Allbery@21:1/5 to [email protected] on Tue Jul 4 14:42:56 2023
    Julien ÉLIE <[email protected]d> writes:

    I've came accross this page:
    https://support.oracle.com/knowledge/Sun%20Microsystems/1005979_1.html

    saying that the default fd limit is 256 up to Solaris 11.4 SRU 26.

    There is a difference between the *stdio* and *select* interfaces, which certainly explains the point in the recommendation of 256 for rlimitnofile for Solaris.

    stdio handles up to 256 file descriptors up to Solaris 10, and 32-bit
    Solaris 11.3. For 64-bit Solaris 11.x, and 32-bit Solaris 11.4, stdio no longer has that limit.

    I then believe the recommandation for rlimitnofile should remain 256, and
    can be increased only for Solaris 11.4 or 64-bit Solaris 11.0-11.3.
    If of course I understand well what is said in the Oracle support page.

    INN does some terrifying things to try to handle cases where file
    descriptors higher than 256 are possible but it still wants to use stdio.
    I'm not sure how well that stuff works in practice, but it's been there
    for forever and I think it's still enabled. See lib/reservefd.c.

    In theory, this is supposed to allow increasing the file descriptor limit
    above 256 on platforms such as Solaris that have this stdio limitation,
    and it did work at least at one time. Note that, critically, it only is implemented for innd; all the other programs have to either not use stdio
    or have to not open more than 256 file descriptors. I think that might
    apply to everything?

    --
    Russ Allbery ([email protected]) <https://www.eyrie.org/~eagle/>

    Please post questions rather than mailing me directly.
    <https://www.eyrie.org/~eagle/faqs/questions.html> explains why.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Russ Allbery@21:1/5 to [email protected] on Wed Jul 5 12:46:38 2023
    Julien ÉLIE <[email protected]d> writes:

    I'm wondering whether this is still useful nowadays (as it is only used
    for 5 file descriptors in innd), and we couldn't just use
    fopen()/fclose() for them...

    I believe the problem is that some of the things for which it uses stdio
    (most notably history) might be closed and reopened, and if a new incoming connection is handled between the close and reopen (I'm not sure if that's possible), it could grab that low-numbered file descriptor and leave no available file descriptors lower than 256, in which case stdio on those platforms will fail.

    The chosen number of file descriptors is intended to be enough for history
    plus whatever else used stdio (I think probably the active file and
    overview, although I don't think overview uses stdio any more).

    --
    Russ Allbery ([email protected]) <https://www.eyrie.org/~eagle/>

    Please post questions rather than mailing me directly.
    <https://www.eyrie.org/~eagle/faqs/questions.html> explains why.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From =?UTF-8?Q?Julien_=c3=89LIE?=@21:1/5 to All on Wed Jul 5 21:41:04 2023
    Hi Russ,

    I've came accross this page:
    https://support.oracle.com/knowledge/Sun%20Microsystems/1005979_1.html

    saying that the default fd limit is 256 up to Solaris 11.4 SRU 26.

    There is a difference between the *stdio* and *select* interfaces, which
    certainly explains the point in the recommendation of 256 for rlimitnofile >> for Solaris.

    stdio handles up to 256 file descriptors up to Solaris 10, and 32-bit
    Solaris 11.3. For 64-bit Solaris 11.x, and 32-bit Solaris 11.4, stdio no
    longer has that limit.

    I then believe the recommandation for rlimitnofile should remain 256, and
    can be increased only for Solaris 11.4 or 64-bit Solaris 11.0-11.3.
    If of course I understand well what is said in the Oracle support page.

    INN does some terrifying things to try to handle cases where file
    descriptors higher than 256 are possible but it still wants to use stdio.
    I'm not sure how well that stuff works in practice, but it's been there
    for forever and I think it's still enabled. See lib/reservefd.c.

    Oh indeed, I had not noticed this lib/reservefd.c file.
    innd reserves 4 or 5 file descriptors at startup and this is
    unconditionally enabled. It still works as otherwise a fatal error is returned, and Fopen()/Fclose() are still in use for these file
    descriptors for history (file, stats and dbz).

    I'm wondering whether this is still useful nowadays (as it is only used
    for 5 file descriptors in innd), and we couldn't just use
    fopen()/fclose() for them...


    In theory, this is supposed to allow increasing the file descriptor limit above 256 on platforms such as Solaris that have this stdio limitation,
    and it did work at least at one time. Note that, critically, it only is implemented for innd; all the other programs have to either not use stdio
    or have to not open more than 256 file descriptors. I think that might
    apply to everything?

    innfeed has its own parameter (stdio-max) to reserve the given number of
    file descriptors for stdio. It is not used by default, but one can set
    it so that network socket file descriptors are always higher than this
    number.

    --
    Julien ÉLIE

    « Sème du bonheur dans le champ du voisin, tu seras surpris de constater
    ce que le vent fera produire au tien. » (Juliette Saint Gelais)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Russ Allbery@21:1/5 to [email protected] on Wed Jul 5 22:11:10 2023
    Julien ÉLIE <[email protected]d> writes:

    Sure.
    The check could be done in RCHANadd, SCHANadd, and WCHANadd.
    It is when channels.max_fd and channels.max_sleep_fd are updated, when needed.
    We could then log a warn message each time these max_fd or max_sleep_fd counts *would* have been updated and be higher than FD_SETSIZE-1? And immediately close the channel?

    That sounds right to me.

    I see that innfeed already does something like that in endpoint.c:

    #if defined(FD_SETSIZE)
    if ((unsigned int) fd >= FD_SETSIZE) {
    warn("ME fd (%d) looks too big (%d -- FD_SETSIZE)", fd,
    FD_SETSIZE);
    return NULL;
    }
    #else
    if (fd > (sizeof(fd_set) * CHAR_BIT)) {
    warn("ME fd (%d) looks too big (%d -- sizeof (fd_set) *
    CHAR_BIT)",
    fd, (sizeof(fd_set) * CHAR_BIT));
    return NULL;
    }
    #endif

    Would a similar check be added for the xCHANadd functions?

    Yup, that's exactly what I was thinking of.

    I guess the check could be done by a new function in lib/fdlimit.c that
    can be used for both innd and innfeed, returning true/false.

    Sounds great to me!

    --
    Russ Allbery ([email protected]) <https://www.eyrie.org/~eagle/>

    Please post questions rather than mailing me directly.
    <https://www.eyrie.org/~eagle/faqs/questions.html> explains why.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Russ Allbery@21:1/5 to [email protected] on Wed Jul 5 22:16:41 2023
    Julien ÉLIE <[email protected]d> writes:

    Here follows a suggestion, with a recommendation of running makehistory
    for recovery. Any other caveat or recovery actions to recommend?

    icdsynccount
    How many article writes between updating the active and history
    files. The default value is 10.

    This is a trade-off between faster speed and more data loss if innd
    crashes (or the system crashes, or loses power, etc.). The higher
    this parameter is, the less frequent syncs are done. It somehow
    represents a checkpointing frequency: it would be the maximum

    "It is essentially the frequency of checkpoints: the maximum number of
    articles that may be orphaned in case of a crash..."

    number of articles stored on disk that might be orphaned in case of
    a crash as they wouldn't have been recorded in the history file.
    There would also be failures to store articles with already
    assigned article numbers, but not recorded in the active
    file.

    "The missing updates to the active file would cause other problems later,
    such as duplicate article numbers and corresponding errors when storing
    new articles."

    (In order not to encounter such errors, you should then
    rebuild the history file and the overview with
    makehistory(8). The active file will be automatically be
    renumbered after that operation.)

    "(If innd has crashed, you can fix these errors by rebuilding the history
    file and overview with makehistory(8)...."

    Proposition for INSTALL and newslog(5):

    If you're using syslogd, edit /etc/syslog.conf on your system and add
    lines that look like the following:

    news.crit <pathlog in inn.conf>/news.crit
    news.err <pathlog in inn.conf>/news.err
    news.notice -<pathlog in inn.conf>/news.notice

    The minus sign as the first character for the path to news.notice
    instructs syslogd not to synchronize the log file to disk every time
    there is a new log entry (it otherwise degrades performance). Depending
    on the syslog daemon you are using, log synchronization may already be
    disabled by default (that is for instance the case for rsyslogd or
    syslog-ng).

    Yup, looks good.

    --
    Russ Allbery ([email protected]) <https://www.eyrie.org/~eagle/>

    Please post questions rather than mailing me directly.
    <https://www.eyrie.org/~eagle/faqs/questions.html> explains why.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From =?UTF-8?Q?Julien_=c3=89LIE?=@21:1/5 to All on Thu Jul 6 19:50:41 2023
    Hi Russ,
    Here follows a suggestion, with a recommendation of running makehistory
    for recovery. Any other caveat or recovery actions to recommend?
    [...]
    "It is essentially the frequency of checkpoints: the maximum number of articles that may be orphaned in case of a crash..."

    OK, thanks for the rewording wich indeed is far better :)
    I'll update the documentation accordingly.

    --
    Julien ÉLIE

    « Il vaut mieux un tapis persan volé qu'un tapis volant percé ! »
    (Astérix)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jesse Rehmer@21:1/5 to [email protected] on Sun Jul 16 14:36:14 2023
    On Jul 3, 2023 at 6:03:08 AM CDT, "Julien ÉLIE" <[email protected]d> wrote:

    I hope all of these changes will be useful, and you're happy with your
    new hardware setup Jesse!

    Circling back to this after having spent time migrating to my shiny new hardware (amazing how much faster things go when you jump 10 generations of hardware!). I am building INN adding "-DLARGE_FD_SETSIZE=4096" to COPT in Makefile.global (is that the correct way?).

    I see a number of these warnings, but I'm unsure which value is actually being used?

    libtool: compile: cc -g -O2 -D_FORTIFY_SOURCE=2 -fstack-protector-strong -DLARGE_FD_SETSIZE=4096 -I../include -c network.c -fPIC -DPIC -o .libs/network.o
    In file included from network.c:49:
    ../include/portable/system.h:78:17: warning: 'FD_SETSIZE' macro redefined [-Wmacro-redefined]
    # define FD_SETSIZE LARGE_FD_SETSIZE
    ^
    /usr/include/sys/select.h:61:9: note: previous definition is here
    #define FD_SETSIZE 1024
    ^

    As a side - I appreciate the discussion and clarification on some of the
    tuning bits discussed. Hopefully someone else will find it useful as well.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jesse Rehmer@21:1/5 to [email protected] on Sun Jul 16 14:44:54 2023
    On Jul 16, 2023 at 9:36:14 AM CDT, "Jesse Rehmer" <[email protected]> wrote:

    On Jul 3, 2023 at 6:03:08 AM CDT, "Julien ÉLIE" <[email protected]d> wrote:

    I hope all of these changes will be useful, and you're happy with your
    new hardware setup Jesse!

    Circling back to this after having spent time migrating to my shiny new hardware (amazing how much faster things go when you jump 10 generations of hardware!). I am building INN adding "-DLARGE_FD_SETSIZE=4096" to COPT in Makefile.global (is that the correct way?).

    I see a number of these warnings, but I'm unsure which value is actually being
    used?

    libtool: compile: cc -g -O2 -D_FORTIFY_SOURCE=2 -fstack-protector-strong -DLARGE_FD_SETSIZE=4096 -I../include -c network.c -fPIC -DPIC -o .libs/network.o
    In file included from network.c:49:
    ../include/portable/system.h:78:17: warning: 'FD_SETSIZE' macro redefined [-Wmacro-redefined]
    # define FD_SETSIZE LARGE_FD_SETSIZE
    ^
    /usr/include/sys/select.h:61:9: note: previous definition is here
    #define FD_SETSIZE 1024
    ^

    As a side - I appreciate the discussion and clarification on some of the tuning bits discussed. Hopefully someone else will find it useful as well.

    Hmm I'm seeing other strange behavior like "make tests" seems to stall out and do nothing after getting to authprogs/ident. INN does start, but even actsync is core dumping.

    I see the following in kernel logs:

    pid 32149 (actsync), jid 0, uid 8: exited on signal 6 (core dumped)
    pid 37633 (ident.t), jid 0, uid 8: exited on signal 6 (core dumped)

    When I take the "-DLARGE_FD_SETSIZE=4096" option out of Makefile.global everything builds/tests/runs okay.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From =?UTF-8?Q?Julien_=c3=89LIE?=@21:1/5 to All on Sat Jul 29 14:11:02 2023
    Hi Jesse,

    Hmm I'm seeing other strange behavior like "make tests" seems to stall out and
    do nothing after getting to authprogs/ident. INN does start, but even actsync is core dumping.

    When I take the "-DLARGE_FD_SETSIZE=4096" option out of Makefile.global everything builds/tests/runs okay.

    Oh, that's bad. It seems that the large FD_SETSIZE patch does not work
    as it should on your installation.
    I do not see these problems with Debian, and I have also run "make
    tests" with success on an OpenBSD system. No stalling nor core dump
    seen. No macro-redefinition warning, and indeed that's a problem as the
    whole point of the patch is to define FD_SETSIZE before /usr/include/sys/select.h does it.


    I consider that subject still open and not fixed then.


    I unfortunately do not have access to a working FreeBSD system right now
    to test. These servers are down on the Compile Farm (https://cfarm.tetaneutral.net/machines/list/).


    Yes, adding -DLARGE_FD_SETSIZE=4096 to COPT in Makefile.global is the
    expected thing to do.




    Could you please try to add in include/portable/socket.h the following
    lines?

    #include "config.h"
    #include "portable/macros.h"

    +#if LARGE_FD_SETSIZE > 1024
    +# define FD_SETSIZE LARGE_FD_SETSIZE
    +#endif
    +
    #include <errno.h>
    #include <sys/types.h>



    When done, do you still see macro-redefinition warnings? Does it work
    better?

    --
    Julien ÉLIE

    « Moi, je n'ai pas goûté le sel de cette plaisanterie. » (Astérix)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From =?UTF-8?Q?Julien_=c3=89LIE?=@21:1/5 to All on Sat Jul 29 14:33:29 2023
    Adding to my previous message:

    Could you please try to add in include/portable/socket.h the following
    lines?

    +#if LARGE_FD_SETSIZE > 1024
    +#    define FD_SETSIZE LARGE_FD_SETSIZE
    +#endif

    When done, do you still see macro-redefinition warnings?  Does it work better?

    You can then ignore macro-redefinition warnings between include/portable/socket.h and include/portable/system.h; that would be fine.
    If that works, I'll see how to handle that gracefully.

    --
    Julien ÉLIE

    « Moi, je n'ai pas goûté le sel de cette plaisanterie. » (Astérix)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From =?UTF-8?Q?Julien_=c3=89LIE?=@21:1/5 to All on Sun Jul 30 17:56:43 2023
    Hi Jesse,

    When I take the "-DLARGE_FD_SETSIZE=4096" option out of Makefile.global everything builds/tests/runs okay.

    And normally you will no longer have errors like "free:-1 35 free but
    was in SMASK", "SERVER cant store overview", "tradindexed: cannot get
    offset for article", etc.
    No channel with a file descriptor above 1024 will be created.

    --
    Julien ÉLIE

    « Ne craignez pas d'être lent, craignez seulement d'être à l'arrêt. »

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jesse Rehmer@21:1/5 to [email protected] on Mon Jul 31 14:19:11 2023
    On Jul 29, 2023 at 7:11:02 AM CDT, "Julien ÉLIE" <[email protected]d> wrote:

    Hi Jesse,

    Hmm I'm seeing other strange behavior like "make tests" seems to stall out and
    do nothing after getting to authprogs/ident. INN does start, but even actsync
    is core dumping.

    When I take the "-DLARGE_FD_SETSIZE=4096" option out of Makefile.global
    everything builds/tests/runs okay.

    Oh, that's bad. It seems that the large FD_SETSIZE patch does not work
    as it should on your installation.
    I do not see these problems with Debian, and I have also run "make
    tests" with success on an OpenBSD system. No stalling nor core dump
    seen. No macro-redefinition warning, and indeed that's a problem as the whole point of the patch is to define FD_SETSIZE before /usr/include/sys/select.h does it.


    I consider that subject still open and not fixed then.


    I unfortunately do not have access to a working FreeBSD system right now
    to test. These servers are down on the Compile Farm (https://cfarm.tetaneutral.net/machines/list/).


    Yes, adding -DLARGE_FD_SETSIZE=4096 to COPT in Makefile.global is the expected thing to do.




    Could you please try to add in include/portable/socket.h the following
    lines?

    #include "config.h"
    #include "portable/macros.h"

    +#if LARGE_FD_SETSIZE > 1024
    +# define FD_SETSIZE LARGE_FD_SETSIZE
    +#endif
    +
    #include <errno.h>
    #include <sys/types.h>



    When done, do you still see macro-redefinition warnings? Does it work better?

    Tested and I didn't see the warnings at compile time, 'make tests' is good,
    and no more core dumping.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From =?UTF-8?Q?Julien_=c3=89LIE?=@21:1/5 to All on Fri Aug 4 22:38:25 2023
    Hi Jesse,

    When done, do you still see macro-redefinition warnings? Does it work
    better? If that works, I'll see how to handle that gracefully.

    Tested and I didn't see the warnings at compile time, 'make tests' is good, and no more core dumping.

    Thanks for the confirmation!

    I've found out a graceful way to enforce header inclusions in the right
    order, using our clang-format code reformatter to instruct it to begin
    with config.h, then portable/system.h and then all the other headers in alphabetical order.

    The problem was portable/socket.h being otherwise included before portable/system.h (because of the alphabetical order). This is now fixed!


    https://github.com/InterNetNews/inn/commit/5f260cbb794c847c74739f0197344ef3f42a2367


    I hope the -DLARGE_FD_SETSIZE build-time option now works fine, and you
    haven't noticed any problem since last week.

    --
    Julien ÉLIE

    « Se taire, c'est pareil dans toutes les langues. » (Philippe Geluck)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jesse Rehmer@21:1/5 to [email protected] on Fri Aug 4 23:58:50 2023
    On Aug 4, 2023 at 3:38:25 PM CDT, "Julien ÉLIE" <[email protected]d> wrote:

    Hi Jesse,

    When done, do you still see macro-redefinition warnings? Does it work
    better? If that works, I'll see how to handle that gracefully.

    Tested and I didn't see the warnings at compile time, 'make tests' is good, >> and no more core dumping.

    Thanks for the confirmation!

    I've found out a graceful way to enforce header inclusions in the right order, using our clang-format code reformatter to instruct it to begin
    with config.h, then portable/system.h and then all the other headers in alphabetical order.

    The problem was portable/socket.h being otherwise included before portable/system.h (because of the alphabetical order). This is now fixed!


    https://github.com/InterNetNews/inn/commit/5f260cbb794c847c74739f0197344ef3f42a2367


    I hope the -DLARGE_FD_SETSIZE build-time option now works fine, and you haven't noticed any problem since last week.

    Everything has been humming along well. I'll give this new commit a test too.

    I have a somewhat related question, why does the configure output for FreeBSD say to use gmake? I think I've overlooked this the entire time I've used FreeBSD and have always been using make.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From =?UTF-8?Q?Julien_=c3=89LIE?=@21:1/5 to All on Sat Aug 5 08:10:12 2023
    Hi Jesse,

    I hope the -DLARGE_FD_SETSIZE build-time option now works fine, and you
    haven't noticed any problem since last week.

    Everything has been humming along well. I'll give this new commit a test too.

    Thanks for the testing. (You need removing the additional #define
    FD_SETSIZE added in portable/socket.h manually last week, as it is no
    longer needed with this commit.)


    I have a somewhat related question, why does the configure output for FreeBSD say to use gmake? I think I've overlooked this the entire time I've used FreeBSD and have always been using make.

    I've already came across FreeBSD versions on which stock make did not
    work. I don't remember the versions but maybe current make versions
    work fine.
    According to INSTALL: "The INN Makefiles use the syntax include FILE,
    rather than the syntax expected by some BSDish systems of .include
    <FILE>. If your system expects the latter syntax, the recommended
    solution is to install GNU make".


    So in your case, if make works fine, you can just go on using it.

    Instead of saying in the configure output:

    Mandatory FreeBSD packages:
    - gmake (INN needs to be built using gmake instead of make)
    - autoconf (generate a configuration script)
    - libtool (for library creation)
    - perl5

    It would probably be better to change the first requirement to:

    - make (or gmake if a syntax error occurs during the build)

    ?

    --
    Julien ÉLIE

    « Medicus curat, natura sanat. »

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jesse Rehmer@21:1/5 to [email protected] on Sat Aug 5 13:51:57 2023
    On Aug 5, 2023 at 1:10:12 AM CDT, "Julien ÉLIE" <[email protected]d> wrote:

    Hi Jesse,

    I hope the -DLARGE_FD_SETSIZE build-time option now works fine, and you
    haven't noticed any problem since last week.

    Everything has been humming along well. I'll give this new commit a test too.

    Thanks for the testing. (You need removing the additional #define
    FD_SETSIZE added in portable/socket.h manually last week, as it is no
    longer needed with this commit.)

    Got it.

    I have a somewhat related question, why does the configure output for FreeBSD
    say to use gmake? I think I've overlooked this the entire time I've used
    FreeBSD and have always been using make.

    I've already came across FreeBSD versions on which stock make did not
    work. I don't remember the versions but maybe current make versions
    work fine.
    According to INSTALL: "The INN Makefiles use the syntax include FILE,
    rather than the syntax expected by some BSDish systems of .include
    <FILE>. If your system expects the latter syntax, the recommended
    solution is to install GNU make".


    So in your case, if make works fine, you can just go on using it.

    Instead of saying in the configure output:

    Mandatory FreeBSD packages:
    - gmake (INN needs to be built using gmake instead of make)
    - autoconf (generate a configuration script)
    - libtool (for library creation)
    - perl5

    It would probably be better to change the first requirement to:

    - make (or gmake if a syntax error occurs during the build)

    ?

    This makes sense... I don't know the history because I recently converted back to FreeBSD after a very long time spent in in the Linux realm. I recall from the past some of the non-GNU development tools behaved quite a bit
    differently. I used to have a default process to install all the GNU tools after OS install, but haven't done that on recent system builds. "make" is
    part of the base FreeBSD system.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)