• new in UDD: duck importer (URL checker)

    From Lucas Nussbaum@21:1/5 to All on Wed Jun 11 09:00:01 2025
    Hi,

    I added a duck importer[0] and a duck dashboard[1] to UDD.
    Previously, the duck.debian.net service ran regular checks on URLs
    in Debian packages, but it broke at some point around 2020.

    # importer internals

    The UDD importer[0] works as follows:
    - URLs are extracted from all source packages using the 'duck' package
      (a vendored version[2], because I needed to add a dump mode, which
      is also submitted as #1107642).
      This is tracked in the duck_raw (raw output from duck) and duck_urls
      (easier to work with, one row per (source/version, URL)) tables.
    - On a regular basis, UDD checks all known URLs (with a retry policy
      that depends on whether the failure is possibly transient).
      duck is _not_ used for this part (the importer has its own check
      logic), because the check happens URL by URL, not package by package.
      This is tracked in the duck_url_status table.
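
    To illustrate the retry idea described above, here is a minimal sketch.
    It is not the code from duck.rb: the UrlStatus record, the set of status
    codes treated as transient, and all intervals are assumptions made up
    for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Assumption: HTTP codes treated as "possibly transient" in this sketch.
TRANSIENT_HTTP = {408, 425, 429, 500, 502, 503, 504}

# Assumption: illustrative intervals, not the importer's real values.
RECHECK_OK = timedelta(days=7)
RECHECK_PERMANENT = timedelta(days=7)
RETRY_TRANSIENT_BASE = timedelta(hours=1)
RETRY_TRANSIENT_MAX = timedelta(days=1)


@dataclass
class UrlStatus:
    """Hypothetical per-URL record, loosely modelled on duck_url_status."""
    url: str
    http_code: Optional[int]  # None: no HTTP response at all (DNS, timeout, ...)
    consecutive_failures: int
    last_check: datetime


def is_ok(status: UrlStatus) -> bool:
    return status.http_code is not None and 200 <= status.http_code < 400


def is_transient(status: UrlStatus) -> bool:
    # Assumption: a missing response is treated like a transient failure.
    return status.http_code is None or status.http_code in TRANSIENT_HTTP


def next_check(status: UrlStatus) -> datetime:
    """Return when the URL should be checked again."""
    if is_ok(status):
        return status.last_check + RECHECK_OK
    if is_transient(status):
        # Exponential backoff, capped so a flaky URL is still retried daily.
        exponent = min(max(status.consecutive_failures - 1, 0), 10)
        delay = min(RETRY_TRANSIENT_BASE * 2 ** exponent, RETRY_TRANSIENT_MAX)
        return status.last_check + delay
    return status.last_check + RECHECK_PERMANENT
```

    With these made-up values, a URL that keeps returning 503 is retried
    after 1, 2, 4, ... hours (at most one day apart), while a 404 simply
    waits for the next regular pass.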

    # dashboards

    - per maintainer view: https://udd.debian.org/duck/?email1=pkg-perl-maintainers%40lists.alioth.debian.org#results
    - The DMD 'duck' column works again: https://udd.debian.org/dmd/?email1=pkg-perl-maintainers%40lists.alioth.debian.org&email2=&email3=&packages=&ignpackages=&format=html#details

    # statistics

    * UDD knows about 207055 URLs
    * 15207 URLs (7.34%) are failing
    * 937 (2.37%) source packages failed to be processed by duck (I need to
    look into that)
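
    As a quick cross-check of the figures above, the arithmetic can be
    redone as follows. The total number of source packages is not stated
    in this mail; the value computed here is only what the rounded 2.37%
    implies, so treat it as a rough estimate.

```python
known_urls = 207055
failing_urls = 15207
failed_packages = 937
failed_share_pct = 2.37  # as quoted above, already rounded

# Share of failing URLs, rounded the same way as in the list above.
failing_pct = round(100 * failing_urls / known_urls, 2)

# Estimate only: total source packages implied by 937 being 2.37%.
implied_total_packages = round(failed_packages / (failed_share_pct / 100))

print(f"{failing_pct}% of URLs failing")
print(f"roughly {implied_total_packages} source packages processed")
```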

    [0] https://salsa.debian.org/qa/udd/-/blob/master/rimporters/duck.rb?ref_type=heads
    [1] https://udd.debian.org/duck/
    [2] https://salsa.debian.org/qa/udd/-/tree/master/vendor/duck?ref_type=heads

    Best,

    Lucas

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Soren Stoutner@21:1/5 to All on Wed Jun 11 09:08:23 2025
    On Tuesday, June 10, 2025 11:52:13 PM Mountain Standard Time Lucas Nussbaum wrote:
    > # statistics
    >
    > * UDD knows about 207055 URLs
    > * 15207 URLs (7.34%) are failing
    > * 937 (2.37%) source packages failed to be processed by duck (I need to
    > look into that)

    I noticed that it is trying to check upstream metadata Repository URLs (which, in some cases, are not expected to return a result to a standard web request).

    For example, see the entry for privacy browser:

    https://udd.debian.org/duck/?email1=soren%40debian.org&email2=&email3=&packages=&ignpackages=&format=html#results

    It doesn’t like:

    Repository: https://git.stoutner.com/PrivacyBrowserPC.git

    But it shouldn’t be able to access that unless it is using the git protocol.

    The Repository-Browse URL works as expected:

    Repository-Browse: https://gitweb.stoutner.com/?p=PrivacyBrowserPC.git;a=summary

    https://salsa.debian.org/soren/privacybrowser/-/blob/master/debian/upstream/metadata?ref_type=heads#L7

    --
    Soren Stoutner
    [email protected]

  • From Lucas Nussbaum@21:1/5 to Soren Stoutner on Thu Jun 12 09:30:01 2025
    Hi Soren,

    On 11/06/25 at 09:08 -0700, Soren Stoutner wrote:
    > On Tuesday, June 10, 2025 11:52:13 PM Mountain Standard Time Lucas Nussbaum wrote:
    > > # statistics
    > >
    > > * UDD knows about 207055 URLs
    > > * 15207 URLs (7.34%) are failing
    > > * 937 (2.37%) source packages failed to be processed by duck (I need to
    > > look into that)
    >
    > I noticed that it is trying to check upstream metadata Repository URLs (which,
    > in some cases, are not expected to return a result to a standard web request).
    >
    > For example, see the entry for privacy browser:
    >
    > https://udd.debian.org/duck/?email1=soren%40debian.org&email2=&email3=&packages=&ignpackages=&format=html#results
    >
    > It doesn’t like:
    >
    > Repository: https://git.stoutner.com/PrivacyBrowserPC.git
    >
    > But it shouldn’t be able to access that unless it is using the git protocol.
    >
    > The Repository-Browse URL works as expected:
    >
    > Repository-Browse: https://gitweb.stoutner.com/?p=PrivacyBrowserPC.git;a=summary
    >
    > https://salsa.debian.org/soren/privacybrowser/-/blob/master/debian/upstream/metadata?ref_type=heads#L7

    Thanks for the feedback.

    I improved the URL tester to deal with the Git protocol (so it now
    checks that it is talking to a valid Git repository, e.g. for Vcs-Git
    URLs). I also added an override for the Repository metadata field, so
    that it is checked the same way when its value looks like a Git
    repository.
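
    For readers wondering what such a check can look like, here is a
    minimal sketch. It is not the code from the UDD importer. It assumes
    the repository is served over Git's smart HTTP transport, where a
    client first requests <repo>/info/refs?service=git-upload-pack and the
    server answers with a dedicated content type. The function names and
    the "ends in .git" heuristic are made up for this example.

```python
import urllib.request
from urllib.parse import urlsplit

# Content type used by Git smart HTTP servers for the ref advertisement.
GIT_ADVERT_TYPE = "application/x-git-upload-pack-advertisement"


def looks_like_git_url(url: str) -> bool:
    """Assumed heuristic: git:// scheme, or a path ending in '.git'."""
    parts = urlsplit(url)
    return parts.scheme == "git" or parts.path.rstrip("/").endswith(".git")


def info_refs_url(repo_url: str) -> str:
    """URL a Git client fetches first when talking smart HTTP."""
    return repo_url.rstrip("/") + "/info/refs?service=git-upload-pack"


def is_git_advertisement(content_type: str, body: bytes) -> bool:
    """True if the response looks like a smart HTTP ref advertisement."""
    media_type = content_type.split(";")[0].strip().lower()
    # The first pkt-line is 4 hex length digits followed by the service name.
    return media_type == GIT_ADVERT_TYPE and body[4:].startswith(
        b"# service=git-upload-pack"
    )


def check_git_repo(repo_url: str, timeout: float = 10.0) -> bool:
    """Network check for http(s) URLs; returns False on any I/O failure."""
    try:
        with urllib.request.urlopen(info_refs_url(repo_url), timeout=timeout) as resp:
            return is_git_advertisement(
                resp.headers.get("Content-Type", ""), resp.read(64)
            )
    except OSError:  # URLError, HTTPError and socket timeouts all derive from it
        return False
```

    Under this heuristic, https://git.stoutner.com/PrivacyBrowserPC.git
    would be checked as a Git repository, while the gitweb
    Repository-Browse URL would still get a plain HTTP check. Servers that
    only offer the "dumb" HTTP transport would need a different test.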

    It would have been better to name the field "Repository-Git" or
    something to avoid guessing the type of repository based on URL, but
    it's probably too late for that.

    Lucas
