• Make a backup only if necessary

    From Anton Shepelev@21:1/5 to All on Mon May 27 12:59:29 2024
    Hello, all

    I am working on an in-house backup solution for our MSSQL
    databases, and have found that a differential backup may
    take considerable space (around 1 Mb) even if /no/ changes
    have been made to the database since the previous full
    backup. Is it possible somehow, before making another
    differential backup, to check whether it makes sense, i.e.
    whether the database has changed since the previous (full or
    differential) backup? In fact, I do not need a criterium,
    and a mere necessary condition will suffice.

    I want to make sure that my differential backups are not
    useless. One solution is to compare byte-for-byte each new
    diff. backup with the previous one, and delete the copy if
    it turn out redundant, but I wondered if there were a
    simpler method.

    --
    () ascii ribbon campaign -- against html e-mail
    /\ www.asciiribbon.org -- against proprietary attachments

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Anton Shepelev@21:1/5 to All on Mon May 27 13:53:02 2024
    I wrote

    I want to make sure that my differential backups are not
    useless. One solution is to compare byte-for-byte each
    new diff. backup with the previous one, and delete the
    copy if it turn out redundant, but I wondered if there
    were a simpler method.

    Nope -- no two differential backup files compare equal.

    --
    () ascii ribbon campaign -- against html e-mail
    /\ www.asciiribbon.org -- against proprietary attachments

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Erland Sommarskog@21:1/5 to Anton Shepelev on Mon May 27 22:43:08 2024
    Anton Shepelev (anton.txt@g{oogle}mail.com) writes:
    I am working on an in-house backup solution for our MSSQL
    databases, and have found that a differential backup may
    take considerable space (around 1 Mb)

    I'm tempted to say that 1MB in size for a database back is very small.

    I want to make sure that my differential backups are not
    useless. One solution is to compare byte-for-byte each new
    diff. backup with the previous one, and delete the copy if
    it turn out redundant, but I wondered if there were a
    simpler method.

    To restore a database you need most recent full and most recent diff,
    so keeping both is not really necessary. Unless you would want to
    perform some sort of point-in-time restore. But normally, you use
    log backups for that, not diff backups.

    In any case, RESTORE HEADERONLY on the backup and checking last LSN
    seems to be like a tenable solution. You could also use sys.fn_dblog
    to find the last LSN - but keep in mind that this is an undocumented
    and unsupported function.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Anton Shepelev@21:1/5 to All on Tue May 28 12:41:12 2024
    Erland Sommarskog to Anton Shepelev:

    I am working on an in-house backup solution for our
    MSSQL databases, and have found that a differential
    backup may take considerable space (around 1 Mb)

    I'm tempted to say that 1MB in size for a database back is
    very small.

    Bear in mind that it is a /differential/ backup made
    /immediately/ after a full backup. The backup is essitially
    empty and 1Mb is the size of boilerplate.

    I want to make sure that my differential backups are not
    useless. One solution is to compare byte-for-byte each
    new diff. backup with the previous one, and delete the
    copy if it turn out redundant, but I wondered if there
    were a simpler method.

    To restore a database you need most recent full and most
    recent diff, so keeping both is not really necessary.

    Right.

    Unless you would want to perform some sort of point-in-
    time restore.

    Indeed I do. And I might need to compare the last diff.
    backups in order to determine whether the DB has changed in
    between. If it has not, the files will have equal size and
    one of them be redundant.

    But normally, you use log backups for that, not diff
    backups.

    They are a tad more difficult than full and diff. ones, and
    whereas I need at most one backup per day, rather than, say,
    one each 15 minutes, I have decided to content myself with a
    combination of full and diff. backups. That way, I can keep
    my DBs in the simple recovery model.

    In any case, RESTORE HEADERONLY on the backup and checking
    last LSN seems to be like a tenable solution. You could
    also use sys.fn_dblog to find the last LSN -- but keep in
    mind that this is an undocumented and unsupported
    function.

    I believe exactly the same LSNs are available in the
    `backupset' table:

    <https://learn.microsoft.com/en-us/sql/relational-databases/system-tables/backupset-transact-sql>

    But they are not what I thought there were. The `last_lsn'
    field seems always to be unique for each backup set, even
    for two consequtive diff. backups made within a second on an
    unused database.

    I belive that a non-trivial diff. backup may be detected by
    comparing its `first_lsn' to the `first_lsn' of the previous
    backup. They compare equal in all my redundant diff.
    backups. For example, the query:

    SELECT type, first_lsn, last_lsn
    FROM msdb.dbo.backupset
    WHERE database_name = 'my_db'
    ORDER BY backup_set_id

    Ends with:

    type first_lsn last_lsn
    --------------------------------------------
    I 362000000066900034 362000000068500001
    D 362000000068700037 362000000070400001
    I 363000000050400089 363000000054200001
    I 363000000054200001 363000000054500001
    I 363000000054500001 363000000054800001
    I 363000000054800001 363000000055100001
    I 363000000055100001 363000000055400001

    Thanks you for the hint, Erland.

    I still fail, however, to understand why `first_lsn' changes
    for each subsequent diff. backup. It is described as the
    "log sequence number of the first or oldest log record in
    the backup set." Since the last five diff. backups were
    generated with respect to the same full backup, I expected
    them to have the same first_lsn. Why do they not?

    --
    () ascii ribbon campaign -- against html e-mail
    /\ www.asciiribbon.org -- against proprietary attachments

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Erland Sommarskog@21:1/5 to Anton Shepelev on Tue May 28 20:11:54 2024
    Anton Shepelev (anton.txt@g{oogle}mail.com) writes:
    I still fail, however, to understand why `first_lsn' changes
    for each subsequent diff. backup. It is described as the
    "log sequence number of the first or oldest log record in
    the backup set." Since the last five diff. backups were
    generated with respect to the same full backup, I expected
    them to have the same first_lsn. Why do they not?


    Because the backup itself is logged.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Anton Shepelev@21:1/5 to All on Wed May 29 00:52:33 2024
    Erland Sommarskog to Anton Shepelev:

    I still fail, however, to understand why `first_lsn'
    changes for each subsequent diff. backup. It is
    described as the "log sequence number of the first or
    oldest log record in the backup set." Since the last
    five diff. backups were generated with respect to the
    same full backup, I expected them to have the same
    first_lsn. Why do they not?

    Because the backup itself is logged.

    Thank you. Now my system uses LSNs to avoid storing
    redundant differential backups, and I like the result. When
    I look at a "batch" of my backup files (one full and
    sequence of differential ones), instead of the typically
    reguarly spaced files I see only backups taken when the
    database was in flux, which is great.

    --
    () ascii ribbon campaign -- against html e-mail
    /\ www.asciiribbon.org -- against proprietary attachments

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Anton Shepelev@21:1/5 to All on Fri May 31 13:53:09 2024
    I wrote:

    Now my system uses LSNs to avoid storing redundant
    differential backups

    For now, I have decided upon detecting changes elibible for
    backup by the following query:

    SELECT TOP 1 1 FROM fn_dblog(NULL,NULL)
    WHERE operation IN
    ( 'LOP_INSERT_ROWS', 'LOP_MODIFY_ROW', 'LOP_DELETE_ROWS') AND
    AllocUnitName NOT LIKE 'sys.%'

    The output of `fn_dblog' is cleared after every full and
    diff backup, probably because the transaction log is
    cleared, although I don't know why since my recovery model
    is `simple' anyway, and it should not need a backup to be
    cleared...

    The major problem with this approach is that fn_dblog is an
    undocumented function apt to change or disappear any time.
    If anybody know how to perform an equivalent test using more
    stable and reliable method, I shall be grateful if they
    share it.

    --
    () ascii ribbon campaign -- against html e-mail
    /\ www.asciiribbon.org -- against proprietary attachments

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Erland Sommarskog@21:1/5 to Anton Shepelev on Sat Jun 1 11:34:27 2024
    Anton Shepelev (anton.txt@g{oogle}mail.com) writes:
    The major problem with this approach is that fn_dblog is an
    undocumented function apt to change or disappear any time.
    If anybody know how to perform an equivalent test using more
    stable and reliable method, I shall be grateful if they
    share it.


    Since you are settling on a recovery strategy that few other people
    would consider, I'm afraid that you are quite much on your own.

    Beside the fact that fn_dblog being undocumented, you are also relying
    on that there are some rows left around in the log after a write operation. This is something that could change over time.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Anton Shepelev@21:1/5 to All on Sun Jun 2 14:43:20 2024
    Erland Sommarskog:

    Since you are settling on a recovery strategy that few
    other people would consider, I'm afraid that you are quite
    much on your own.

    I believe my situation is not unique, when there are many
    infrequently used databases and no requirement of a
    15-minute backup granularity, so that keeping the simple
    recovery model and doing a backup /only/ when data has
    changed safes a log of resources.

    Beside the fact that fn_dblog being undocumented,

    Yet it is a function of long standing:

    <https://www.sqlskills.com/blogs/paul/?s=fn_dblog>

    and statiscitally the longer something has existsed, the
    longer is its expected reamaining lifespan. Back in 2010
    the function's author reassured users that rumours of its
    death were greately exagerrated:

    The Future -- fn_dblog() No More?
    <https://www.sqlskills.com/blogs/jonathan/an-xevent-a-day-22-of-31-the-future-fn_dblog-no-more-tracking-transaction-log-activity-in-denali/>

    you are also relying on that there are some rows left
    around in the log after a write operation. This is
    something that could change over time.

    Do you mean that in the future write operations may begin to
    clear the log?

    --
    () ascii ribbon campaign -- against html e-mail
    /\ www.asciiribbon.org -- against proprietary attachments

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Erland Sommarskog@21:1/5 to Anton Shepelev on Tue Jun 4 20:22:23 2024
    Anton Shepelev ([email protected]) writes:
    you are also relying on that there are some rows left
    around in the log after a write operation. This is
    something that could change over time.

    Do you mean that in the future write operations may begin to
    clear the log?


    Since the log is truncated on checkpoint, the log could look empty.

    Then again, if you track LSN:s you should be safe.


    I believe my situation is not unique, when there are many
    infrequently used databases and no requirement of a
    15-minute backup granularity, so that keeping the simple
    recovery model and doing a backup /only/ when data has
    changed safes a log of resources.


    Log of resources, hehe. :-)

    Then again, following the normal routines saves quite a bit of
    human resources.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Anton Shepelev@21:1/5 to All on Wed Jun 5 17:15:17 2024
    Erland Sommarskog:

    Since the log is truncated on checkpoint, the log could
    look empty.

    Indeed. I wish I knew how to track LSNs more reliably.

    Then again, if you track LSN:s you should be safe.

    Do you mean your suggestion to use `RESTORE HEADERONLY'? If
    so, I have already remarked that its FirstLSN LastLSN fields
    are also available from the msdb.backupset table.
    Unfortunately, tracking them produces a lot of false
    positives, because of all the event types, I need only those
    that refer to data changes, that is:

    LOP_INSERT_ROWS, LOP_MODIFY_ROW, LOP_DELETE_ROWS

    Other events should not trigger a backup.

    I believe my situation is not unique, when there are
    many infrequently used databases and no requirement of a
    15-minute backup granularity, so that keeping the simple
    recovery model and doing a backup /only/ when data has
    changed safes a log of resources.

    Log of resources, hehe. :-)

    Of course, so that nothing slips by.

    Then again, following the normal routines saves quite a
    bit of human resources.

    At the expense of increased boredom, and customisation.

    --
    () ascii ribbon campaign -- against html e-mail
    /\ www.asciiribbon.org -- against proprietary attachments

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Anton Shepelev@21:1/5 to All on Wed Jun 5 19:37:50 2024
    I wrote in reply to Erland Sommarskog:

    Then again, following the normal routines saves quite a
    bit of human resources.

    At the expense of increased boredom, and customisation.

    And /decreased/ customisation.

    --
    () ascii ribbon campaign -- against html e-mail
    /\ www.asciiribbon.org -- against proprietary attachments

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)