Forum: >>> Magnum BBS <<<

archiver for a rolling history of a file with sparse changes

From Phil Carmody@21:1/5 to All on Thu Aug 15 17:50:22 2019

I'm generating a couple of megs of (html) data per day and the data
really doesn't change that much from day to day. Is there an archiver
which will store the complete history of a file, taking advantage of
the knowledge of the previous contents of the file?

I was hoping ZPAQ would do the job, as it's designed to archive files'
full histories, but I'm now convinced it doesn't use this knowledge.
E.g. one 1.2MB file has a day-to-day diff of ~200-600KB, of which half
is removed material, so effectively noise, leaving ~100-300KB of new
data. Yet each subsequent day's version I've added to the ZPAQ archive
has expanded it by almost exactly as much as the first day's did.
I'm sure deltas that are 1/10-1/4 of the file's size should compress to
1/10-1/4 of the compressed size, as they're effectively the same type
of data.
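
One rough way to check the expectation above is to measure how many extra compressed bytes a new version costs when the previous version is already in the same compression window. The sketch below is an illustration only, not what ZPAQ or any archiver does internally: it uses Python's standard lzma module, and the helper names (marginal_cost, make_page, mutate) and the synthetic rows are assumptions made up for the example.

```python
import lzma
import random


def marginal_cost(previous: bytes, current: bytes) -> int:
    """Extra compressed bytes for `current` once `previous` is already stored."""
    alone = len(lzma.compress(previous))
    together = len(lzma.compress(previous + current))
    return together - alone


def make_page(rng: random.Random, n_rows: int) -> list:
    """Synthetic stand-in for a day's HTML: one incompressible token per row."""
    return [b"<tr><td>%032x</td></tr>\n" % rng.getrandbits(128) for _ in range(n_rows)]


def mutate(rng: random.Random, rows: list, fraction: float) -> list:
    """Replace roughly `fraction` of the rows with fresh content."""
    fresh = make_page(rng, len(rows))
    return [new if rng.random() < fraction else old for old, new in zip(rows, fresh)]


if __name__ == "__main__":
    rng = random.Random(0)
    day1_rows = make_page(rng, 5000)
    day2_rows = mutate(rng, day1_rows, 0.10)  # about 10% of rows change
    day1, day2 = b"".join(day1_rows), b"".join(day2_rows)
    print("day 2 compressed alone:   ", len(lzma.compress(day2)))
    print("day 2 on top of day 1:    ", marginal_cost(day1, day2))
```

Running the same measurement on two real consecutive snapshots gives a baseline for what an archiver that exploits the previous contents could plausibly achieve.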

Any ideas what would be a suitable program to use?

FOSS on linux preferred, happy to compile from source.

Phil
--
We are no longer hunters and nomads. No longer awed and frightened, as we have gained some understanding of the world in which we live. As such, we can cast aside childish remnants from the dawn of our civilization.
-- NotSanguine on SoylentNews, after Eugen Weber in /The Western Tradition/

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Keith Thompson@21:1/5 to Phil Carmody on Thu Aug 15 10:55:57 2019

Phil Carmody <[email protected]> writes:

> I'm generating a couple of megs of (html) data per day and the data
> really doesn't change that much from day to day. Is there an archiver
> which will store the complete history of a file, taking advantage of
> the knowledge of the previous contents of the file?

Any decent source control system (Git, CVS, RCS, etc.) should do the
job. CVS or RCS would do fairly well if the changes can be represented compactly as line-oriented diffs.
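
Whether the changes "can be represented compactly as line-oriented diffs" depends a lot on how the HTML is broken into lines. The following is a minimal sketch of that point using Python's difflib; it is not how RCS or CVS compute their deltas, and the function name line_delta_size and the sample list items are made up for illustration.

```python
import difflib


def line_delta_size(old: str, new: str) -> int:
    """Size in bytes of a unified line diff (no context lines) from old to new."""
    diff = difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile="old",
        tofile="new",
        n=0,
    )
    return len("".join(diff).encode("utf-8"))


# Hypothetical page content: 1000 records, one of which changes.
records = ["<li>item %d</li>" % i for i in range(1000)]
changed = list(records)
changed[500] = "<li>item 500 (updated)</li>"

# Same content laid out one record per line versus all on one line.
multi_old, multi_new = "\n".join(records) + "\n", "\n".join(changed) + "\n"
single_old, single_new = "".join(records) + "\n", "".join(changed) + "\n"

if __name__ == "__main__":
    print("file size:              ", len(multi_old))
    print("diff, one record a line:", line_delta_size(multi_old, multi_new))
    print("diff, single long line: ", line_delta_size(single_old, single_new))
```

With one record per line the diff only carries the changed record; with everything on one line the diff has to carry the whole old and new line, so it ends up larger than the file itself.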

--
Keith Thompson (The_Other_Keith) [email protected] <http://www.ghoti.net/~kst> Will write code for food.
void Void(void) { Void(); } /* The recursive call of the void */


From Phil Carmody@21:1/5 to Keith Thompson on Mon Aug 19 18:59:48 2019

Keith Thompson <[email protected]> writes:

> Phil Carmody <[email protected]> writes:
>
>> I'm generating a couple of megs of (html) data per day and the data
>> really doesn't change that much from day to day. Is there an archiver
>> which will store the complete history of a file, taking advantage of
>> the knowledge of the previous contents of the file?
>
> Any decent source control system (Git, CVS, RCS, etc.) should do the
> job. CVS or RCS would do fairly well if the changes can be represented
> compactly as line-oriented diffs.

I'm giving git a go, and after the occasional git gc it does seem to be
neck and neck with zpaq -m4, but I think -m5 would clearly beat it.
zpaq's not able to make use of any similarities between the new and the
old files: when I change the fragment parameter for deduplication,
compression gets worse. A quick comparison shows that zipping up a set
of patches, hand-mangled to remove anything I know shouldn't be
necessary to reproduce each version, is worse than what git does
internally, so that is probably a dead end. I'll keep both zpaq and git
running daily until one appears to be a clear winner.
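
A daily git routine like the one described could be scripted roughly as follows. This is a sketch under assumptions, not the setup actually used in this thread: the repository path, file name, commit messages and the throwaway commit identity are all hypothetical. It relies only on standard git commands (init, add, status, commit, gc, count-objects) and needs git on the PATH. The git gc step is what packs the stored versions together with delta compression.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Hypothetical identity so commits work without any global git configuration.
IDENTITY = ["-c", "user.name=archiver", "-c", "user.email=archiver@example.invalid"]


def git(repo: Path, *args: str) -> str:
    """Run one git command inside `repo` and return its standard output."""
    result = subprocess.run(
        ["git", "-C", str(repo), *IDENTITY, *args],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def snapshot(repo: Path, name: str, data: bytes, message: str) -> None:
    """Store `data` as the next version of `name`, creating the repo if needed."""
    if not (repo / ".git").exists():
        repo.mkdir(parents=True, exist_ok=True)
        git(repo, "init", "-q")
    (repo / name).write_bytes(data)
    git(repo, "add", "--", name)
    # Commit only if something is staged; committing with no changes would fail.
    if git(repo, "status", "--porcelain", "--untracked-files=no"):
        git(repo, "commit", "-q", "-m", message)


def packed_kib(repo: Path) -> int:
    """Run git gc, then return the pack size in KiB from count-objects -v."""
    git(repo, "gc", "--quiet")
    for line in git(repo, "count-objects", "-v").splitlines():
        key, _, value = line.partition(": ")
        if key == "size-pack":
            return int(value)
    raise RuntimeError("size-pack not reported by git count-objects")


if __name__ == "__main__":
    if shutil.which("git") is None:
        raise SystemExit("git not found on PATH")
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp) / "history"
        for day in range(1, 6):
            page = b"".join(b"<p>row %d, day %d</p>\n" % (i, max(1, day - i % 5))
                            for i in range(2000))
            snapshot(repo, "page.html", page, "day %d" % day)
        print("pack size after gc (KiB):", packed_kib(repo))
```

Comparing packed_kib against the size of the zpaq archive after the same set of days gives the side-by-side figure the daily comparison is after.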

Cheers,
Phil
--
We are no longer hunters and nomads. No longer awed and frightened, as we have gained some understanding of the world in which we live. As such, we can cast aside childish remnants from the dawn of our civilization.
-- NotSanguine on SoylentNews, after Eugen Weber in /The Western Tradition/

