• Bug#949506: RFP: wayback-machine-downloader -- Download an entire websi

    From Antoine Beaupré@1:229/2 to Alessandro Barbieri on Thu Jul 3 03:40:01 2025
    From: [email protected]

    owner 834177 [email protected]
    thanks

    On 2020-01-21 16:26:31, Alessandro Barbieri wrote:
    Package: wnpp
    Severity: wishlist

    * Package name : wayback-machine-downloader
    Version : 2.2.1
    Upstream Author : Julian Khaleghy
    * URL : https://github.com/hartator/wayback-machine-downloader
    * License : MIT
    Programming Lang: Ruby
    Description : Download an entire website from the Wayback Machine.

    It will download the latest version of every file present on the Wayback Machine to ./websites/example.com/. It will also re-create the directory structure and auto-create index.html pages so the result works seamlessly with Apache and Nginx. All downloaded files are the original ones, not the versions rewritten by the Wayback Machine, so the URL and link structure stays the same as before.
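
As a rough illustration of the layout described above, here is a minimal Python sketch. It is not the tool's actual code (which is Ruby): the function names, the `websites` default root and the rule that an extension-less path is treated as a directory are all assumptions made for the example.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlsplit


def local_path_for(url: str, root: str = "websites") -> PurePosixPath:
    """Map an archived URL to a path under root/<host>/ (hypothetical rule)."""
    parts = urlsplit(url)
    host = parts.hostname or "unknown-host"
    # Drop empty and dot segments so the result cannot escape the host directory.
    segments = [s for s in unquote(parts.path).split("/") if s not in ("", ".", "..")]
    # Assumption: a trailing slash or an extension-less last segment is a
    # directory, and gets an index.html so a static web server can serve it.
    is_dir = parts.path.endswith("/") or not segments or "." not in segments[-1]
    if is_dir:
        segments.append("index.html")
    return PurePosixPath(root, host, *segments)


def latest_snapshots(captures):
    """Keep only the newest timestamp for each URL in (timestamp, url) pairs."""
    latest = {}
    for timestamp, url in captures:
        # Wayback timestamps are fixed-width YYYYMMDDhhmmss strings, so plain
        # string comparison orders them chronologically.
        if url not in latest or timestamp > latest[url]:
            latest[url] = timestamp
    return latest
```

With these assumptions, `http://example.com/about` would land in `websites/example.com/about/index.html`, while `http://example.com/css/site.css` keeps its file name.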

    Just to let you know that there's also a Python version of this, called
    "wayback", which seems slightly better maintained. I've heard that this
    one (wayback-machine-downloader) might be "better", but it's clear that
    the repo mentioned above has been inactive for years, and one must find
    The Right Fork going forward.

    For now I'll be taking a look at the python version, wayback.

    A.

    --
    We must shift America from a needs- to a desires-culture. People must
    be trained to desire, to want new things, even before the old have
    been entirely consumed. Man's desires must overshadow his needs.
    - Paul Mazur, Lehman Brothers

    --- SoupGate-Win32 v1.05
    * Origin: you cannot sedate... all the things you hate (1:229/2)
  • From Antoine Beaupré@1:229/2 to All on Thu Jul 3 04:00:01 2025
    From: [email protected]

    noowner 834177
    thanks

    On 2025-07-02 21:26:21, Antoine Beaupré wrote:
    For now I'll be taking a look at the python version, wayback.

    So I've done that. I've audited the code and done some basic packaging
    (mostly thanks to py2dsp, but I did the job in full), and the result is
    here:

    https://salsa.debian.org/python-team/packages/waybackpack

    waybackpack works, but with one major limitation: it downloads only a
    single URL at a time, not an entire site.
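
To make the gap concrete: covering a whole site means first listing every URL the Wayback Machine has captured for it, then fetching each one. The sketch below only builds the relevant URLs and makes no network requests. It assumes the Internet Archive's public CDX search endpoint and its `id_` raw-capture URL form; the function names are made up for the example, and none of this is waybackpack's own API.

```python
from urllib.parse import urlencode

# Assumption: the public Wayback CDX search endpoint.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(domain: str) -> str:
    """Build a CDX query that lists one row per distinct URL under domain."""
    params = {
        "url": f"{domain}/*",        # trailing /* requests a prefix match
        "output": "json",
        "fl": "timestamp,original",  # only the fields needed to fetch a capture
        "filter": "statuscode:200",  # skip redirects and errors
        "collapse": "urlkey",        # one row per URL (not necessarily the newest)
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def raw_snapshot_url(timestamp: str, original: str) -> str:
    """URL of the capture as archived; the id_ suffix disables link rewriting."""
    return f"https://web.archive.org/web/{timestamp}id_/{original}"
```

Each `original` value returned by such a query could then be handed, one at a time, to a single-URL downloader.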

    So I'm abandoning the packaging for now. Anyone should feel free to pick
    it up and just upload it if they feel like it, but I don't quite see the
    point myself.

    I guess we now need to find the right wayback-machine-downloader fork...

    a.
    --
    La mer, cette grande unificatrice, est l'unique espoir de l'homme.
    Aujourd'hui plus que jamais auparavant, ce vieux dicton dit
    littéralement ceci: nous sommes tous dans le même bateau.
    - Jacques Yves Cousteau - Océanographe

  • From Antoine Beaupré@21:1/5 to All on Mon Jul 7 15:50:01 2025
    On 2025-07-02 21:54:15, Antoine Beaupré wrote:
    I guess we now need to find the right wayback-machine-downloader fork...

    That seems to be:

    https://github.com/StrawberryMaster/wayback-machine-downloader

    To test it, I ran it from the Docker container image: I created a user
    with `adduser --system archiver`, took the UID it was assigned, and
    passed it as --user to the `docker run` command:

    sudo docker run --rm -it --user 1023 -v /srv/mirror/example.com:/srv ghcr.io/strawberrymaster/wayback-machine-downloader:latest -d /srv example.com -s --reset
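
For anyone wanting to script this, here is a small Python sketch that assembles the same command line from its parts. The image name and the `-d`, `-s` and `--reset` flags are copied from the command above; the function names are hypothetical, and looking the UID up through `pwd` is just one way to avoid hard-coding 1023.

```python
import pwd
import shlex


def uid_of(username: str) -> int:
    """Look up the numeric UID of a local account (Unix only)."""
    return pwd.getpwnam(username).pw_uid


def docker_archive_command(uid: int, mirror_dir: str, domain: str) -> list[str]:
    """Assemble the docker run argv used above, with the variable parts filled in."""
    return [
        "docker", "run", "--rm", "-it",
        "--user", str(uid),         # run as the unprivileged archiver account
        "-v", f"{mirror_dir}:/srv",  # host mirror directory mounted at /srv
        "ghcr.io/strawberrymaster/wayback-machine-downloader:latest",
        "-d", "/srv", domain, "-s", "--reset",
    ]
```

Printing `shlex.join(docker_archive_command(uid_of("archiver"), "/srv/mirror/example.com", "example.com"))` gives a line that can be pasted after `sudo`, or the list can be passed to `subprocess.run` directly.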

    It works well, but I'm not familiar enough with Ruby packaging to follow through with the next steps.

    a.

    --
    There is no cloud, it's just someone else's computer.
    - Chris Watterson

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)