• Bug#1107645: postgresql: Provide a mechanism to hook service restarts

    From Antonio Terceiro@21:1/5 to Helmut Grohne on Sat Jul 19 19:10:02 2025
    Hi,

    Thanks for the bug report.

    On Wed, Jun 11, 2025 at 06:01:05PM +0200, Helmut Grohne wrote:
    Control: reassign -1 debci-collector
    Control: retitle -1 debci-collector should handle a postgresql connection failure in a better way

    On Wed, Jun 11, 2025 at 02:43:14PM +0200, Christoph Berg wrote:
    Re: Helmut Grohne
    That said, I'm not super convinced that this is a good solution. Maybe
    spending more effort on the debci side is warranted. In principle, debci
    should work with a remote postgresql server and then no such
    notification can happen.

    You are not alone. Apps freaking out after database restarts is still widely seen.

    I'm reassigning the bug to debci-collector as there is no useful thing
    postgresql can do to support the use case. What follows is context for
    debci maintainers.

    When restarting postgresql (and thus closing existing connections), debci-collector gets stuck. You get this:

    | E, [2025-05-31T06:44:25.008155 #1704] ERROR -- #<Bunny::Session:0x938 [email protected]:5671, vhost=/, addresses=[ci.example.com:5671]>: Uncaught exception from consumer #<Bunny::Consumer:1355380 @channel_id=1 @queue=debci_results> @consumer_tag=bunny-1747892779000-472546353095>: #<ActiveRecord::StatementInvalid: PG::ConnectionBad: PQconsumeInput() FATAL: terminating connection due to administrator command
    | server closed the connection unexpectedly
    | This probably means the server terminated abnormally
    | before or while processing the request.
    | > @ /usr/share/rubygems-integration/all/gems/activerecord-6.1.7.10/lib/active_record/connection_adapters/postgresql_adapter.rb:687:in `exec_prepared'

    And then for every further result being processed, you get this:

    | E, [2025-05-31T06:45:05.396999 #1704] ERROR -- #<Bunny::Session:0x938 [email protected]:5671, vhost=/, addresses=[ci.example.com:5671]>: Uncaught exception from consumer #<Bunny::Consumer:1355380 @channel_id=1 @queue=debci_results> @consumer_tag=bunny-1747892779000-472546353095>: #<ActiveRecord::StatementInvalid: PG::ConnectionBad: PQsocket() can't get socket descriptor> @ /usr/share/rubygems-integration/all/gems/activerecord-6.1.7.10/lib/active_record/connection_adapters/postgresql_adapter.rb:687:in `exec_prepared'

    This is due to how debci in general uses ActiveRecord. Looking into lib/debci/db.rb, we may see that the last line is:

    | Debci::DB.establish_connection

    I understand this as one connection being opened at program startup and
    kept for the entire process lifetime. When it is closed, stuff just fails.

    It's not clear to me how to fix this, but ActiveRecord does have
    ActiveRecord::Base.connection_pool.with_connection. I guess a first step
    would be wrapping all database interactions with this such that
    ActiveRecord can keep track of when connections are leased and released.
    Then, we may request that the pool closes idle connections, but I
    wouldn't know how.
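
    A minimal sketch of that first step, not taken from debci. The helper
    name with_db is hypothetical, and the pool argument is assumed to behave
    like ActiveRecord::Base.connection_pool. Calling verify! on the leased
    connection is an addition of this sketch: in ActiveRecord it checks
    whether the connection is still alive and reconnects if it is not, at
    the cost of one extra round trip per lease.

```ruby
# Hypothetical helper: lease a connection from `pool`, make sure it is
# usable, run the block with it, and return it to the pool afterwards.
# `pool` only needs to respond to #with_connection, as
# ActiveRecord::Base.connection_pool does.
def with_db(pool)
  pool.with_connection do |conn|
    # Reconnect if the server dropped this connection while it sat idle.
    conn.verify! if conn.respond_to?(:verify!)
    yield conn
  end
end

# Hypothetical usage inside a result handler:
#   with_db(ActiveRecord::Base.connection_pool) do
#     # ... ActiveRecord queries for one result go here ...
#   end
```

    As for closing idle connections, ActiveRecord's connection pool appears
    to offer flush (disconnect connections idle longer than a timeout) and
    flush! (disconnect all idle connections); whether they fit
    debci-collector is untested here.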

    The key complaint in this bug report is the failure mode. I suggest that
    it becomes resilient to connection failure, but another way of dealing
    with this is propagating the exception and terminating the
    debci-collector process such that systemd can restart it. Solving it
    that way would be a reasonable thing to do from my point of view.
    Unfortunately, I did not figure out where that exception is caught and
    logged rather than propagated.
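
    A rough sketch of the "terminate and let systemd restart" approach. It
    assumes, based on the wording of the log lines above, that the exception
    ends up in Bunny's per-channel uncaught-exception handler, which can be
    replaced via Bunny::Channel#on_uncaught_exception. The names
    connection_lost? and exit_on_connection_loss and the list of fatal class
    names are hypothetical, and none of this has been tried against
    debci-collector.

```ruby
# True if the error, or anything in its #cause chain, looks like a lost
# database connection. Class names are compared as strings so this loads
# without the pg gem. The message prefix is checked too, because the log
# above shows ActiveRecord::StatementInvalid carrying
# "PG::ConnectionBad: ..." in its message.
# The default list of names is an assumption; extend it as needed.
def connection_lost?(error, fatal_names = %w[PG::ConnectionBad PG::UnableToSend])
  10.times do # bounded walk, in case of a cyclic cause chain
    break if error.nil?
    return true if fatal_names.include?(error.class.name)
    return true if fatal_names.any? { |n| error.message.start_with?("#{n}:") }
    error = error.cause
  end
  false
end

# Build a handler that logs every error and ends the process when the
# database connection is gone. `terminate` is injectable for testing; the
# default exits immediately with status 1.
def exit_on_connection_loss(terminate: ->(status) { Process.exit!(status) })
  lambda do |error, _consumer|
    warn "consumer error: #{error.class}: #{error.message}"
    terminate.call(1) if connection_lost?(error)
  end
end

# Hypothetical wiring, assuming `channel` is the Bunny::Channel in use:
#   channel.on_uncaught_exception(&exit_on_connection_loss)
```

    For systemd to bring the service back, the unit would also need a
    restart policy such as Restart=on-failure.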

    Any ideas on how to move forward here?

    I agree that this is a problem, and we have been bitten by it on
    ci.debian.net a few times. However, kanashiro and I tried to reproduce
    this for some time and failed. We just realized that your logs show a
    stable system, and thus activerecord 6.x, while we tested on trixie with
    activerecord 7.2, which AFAICT reestablishes the connection on its own.

    ci.debian.net was upgraded to trixie a few weeks ago, and all is
    well there, so I have little incentive to hunt this down. It may very
    well be that this is fixed by just upgrading.

    When you upgrade, if you remember, please update the bug here.
