I backed up my system on Saturday (yesterday), and pulled a stupid.
I'll explain.
Normally I hibernate, and while it's hibernated, boot off a thumb drive
and back up (either by partition or the whole drive) to a dedicated
drive. The idea is if my main drive takes a dump, I could replace it
with the backup drive, boot, and be on my merry way. After the backup
has completed, I resume and it's right where I left off.
So. This time, while the backup was in progress, I mounted /home
read-only to check something out. Apparently that's not good enough to
keep the filesystem intact, because at the end when I resumed, several
things in $HOME didn't work right. E.g., some widgets in the panel were
misconfigured, T-bird had lost its configuration, and Firefox plugins
didn't work. I fixed the panel widgets, T-bird appears to be *mostly*
fixed (we'll see if this gets sent as text), but the Firefox plugins are
still a mess. The NoScript icon shows up in the row with the hamburger
menu, but doesn't show up in Tools → Add-ons and Themes. As for the
adblocker for YouTube, I'm not sure it's doing anything, because I still
see ads before a video and I didn't before.

So what can I do to fix this, while still keeping my history, cookies,
tabs, etc?
Even mounted read-only, the driver will replay the journal and resolve
any outstanding actions--but the hibernated system doesn't know that,
and will proceed without taking any changes into account. You could have
mounted with "-o ro,norecovery", which will prevent the journal replay
and make the mount truly read-only. Your best bet at this point is to
force an fsck to at least ensure that the filesystem is consistent, but
if there was any data corruption that won't uncorrupt it. To be certain
that all the data is OK you'll have to restore from the last backup made
before this happened.
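
As a rough illustration of the journal-replay point above: an ext4 filesystem that was not cleanly unmounted (which is the state of a hibernated system's partitions) carries a "needs_recovery" feature flag in its superblock, and that flag is what makes a plain read-only mount replay the journal. The sketch below reads the flag straight from an image or block device without mounting anything. The function name is made up for this example, and the offsets are the standard ext2/3/4 superblock layout as I understand it, so treat it as a sketch rather than a substitute for "dumpe2fs -h".

```python
import struct

SUPERBLOCK_OFFSET = 1024     # the superblock starts 1024 bytes into the filesystem
EXT_MAGIC = 0xEF53           # s_magic: little-endian u16 at superblock offset 0x38
INCOMPAT_RECOVER = 0x0004    # "needs_recovery" bit in s_feature_incompat (offset 0x60)


def needs_journal_recovery(path):
    """Return True if the ext2/3/4 superblock at `path` has needs_recovery set.

    `path` may be an image file or a block device (hypothetical helper,
    not part of any existing tool). The file is only opened for reading.
    """
    with open(path, "rb") as f:
        f.seek(SUPERBLOCK_OFFSET)
        superblock = f.read(1024)
    if len(superblock) < 0x64:
        raise ValueError("file too short to hold a superblock")
    (magic,) = struct.unpack_from("<H", superblock, 0x38)
    if magic != EXT_MAGIC:
        raise ValueError("no ext2/3/4 superblock magic found")
    (incompat,) = struct.unpack_from("<I", superblock, 0x60)
    return bool(incompat & INCOMPAT_RECOVER)
```

If this returns True for a partition belonging to a hibernated system, a plain "mount -o ro" would be expected to write to it, which is the situation described above.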
On 3/2/25 12:03, Charles Curley wrote:
> On Sun, 2 Mar 2025 10:49:41 -0500
> Eben King wrote:
> ...
> How did you do the backup? Per file (e.g. rsnapshot or amanda), or per
> block device (e.g. dd if=/dev/sda1 of=…)?

Block device. If I've resized or moved a partition since the last backup
I'll do a full copy (dd if=/dev/sda of=/dev/sdc); if not, I'll do it by
partition (dd if=/dev/sdaX of=/dev/sdcX). Since only maybe 70% of the
drive is in partitions, it's faster that way.

I just shut down and booted off a thumb drive, and ran "fsck -f" on each
ext4 partition. Most had no errors, but a few had "this inode is too
wide" errors. There may have been one or two others. Anyhow, I'll
check again in a few days and see if those (or other) errors recur.

> If you know the relevant files and have a per file backup, restoring
> them should be a matter of selecting the correct file.

The most obnoxious errors are definitely under ~/.mozilla/firefox and
probably ~/.mozilla/firefox/<profile>/extension* .

> I am assuming that the corruption is all in /home and that it is
> included in your backups. If either one of those is false, you may be
> in deep yogurt.

Yeah, I don't know how much I trust the backup of /home right now.
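
One way to gain some confidence in a block-level copy like the dd runs described above is to hash the source and the copy and compare the digests. This is only a sketch under assumptions: the helper names are invented here, the source must not change between the two reads (so neither side may be mounted read-write), and the optional length is there because a target partition may be larger than its source. It verifies that the copy is faithful, not that the source filesystem was healthy when it was copied.

```python
import hashlib


def sha256_of(path, length=None, chunk_size=4 * 1024 * 1024):
    """Hash the first `length` bytes of a file or block device (all of it if None)."""
    digest = hashlib.sha256()
    remaining = length
    with open(path, "rb") as f:
        while remaining is None or remaining > 0:
            want = chunk_size if remaining is None else min(chunk_size, remaining)
            chunk = f.read(want)
            if not chunk:
                break  # reached end of file/device
            digest.update(chunk)
            if remaining is not None:
                remaining -= len(chunk)
    return digest.hexdigest()


def copies_match(source, backup, length=None):
    """Return True if both paths hold identical data over the compared range."""
    return sha256_of(source, length) == sha256_of(backup, length)
```

For the per-partition case this might be called as copies_match("/dev/sdaX", "/dev/sdcX", length=size_of_sdaX_in_bytes), run from the thumb-drive system with root privileges; the size variable is a placeholder.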
eben@cerberus:/$ sudo smartctl -a /dev/sda
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-31-amd64] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Seagate BarraCuda 3.5 (SMR)
Device Model: ST2000DM008-2UB102
...
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
...
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   082   064   006    Pre-fail  Always   -           146369262
  3 Spin_Up_Time            0x0003   095   095   000    Pre-fail  Always   -           0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always   -           723
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always   -           0
  7 Seek_Error_Rate         0x000f   084   060   045    Pre-fail  Always   -           232382570
  9 Power_On_Hours          0x0032   093   093   000    Old_age   Always   -           6346h+20m+46.297s
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always   -           0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always   -           541
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always   -           0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always   -           0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always   -           0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always   -           0 0 0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always   -           0
190 Airflow_Temperature_Cel 0x0022   060   053   040    Old_age   Always   -           40 (Min/Max 27/40)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always   -           0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always   -           155
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always   -           1040
194 Temperature_Celsius     0x0022   040   047   000    Old_age   Always   -           40 (0 25 0 0 0)
195 Hardware_ECC_Recovered  0x001a   082   064   000    Old_age   Always   -           146369262
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always   -           0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline  -           0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always   -           0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline  -           6320h+12m+26.797s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline  -           18657342320
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline  -           92379620242
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
On 3/2/25 12:03, Charles Curley wrote:
> I smell a rat. I wonder if the corruption is because your hard
> drive is failing. I would first boot to a live CD and run smartctl
> tests on it.

At the end of this message.
Anssi Saari wrote:
> Eben King writes:
>> Normally I hibernate, and while it's hibernated, boot off a thumb drive
>> and back up (either by partition or the whole drive) to a dedicated
>> drive. The idea is if my main drive takes a dump, I could replace it
>> with the backup drive, boot, and be on my merry way. After the backup
>> has completed, I resume and it's right where I left off.
>
> So you actually back up the hibernated / partition? Is that really a
> sound backup strategy? Seems risky to me but if it works I guess it's
> fine.

With modern systems booting so fast I wonder why anyone bothers with
hibernate or sleep.

--
Chris Green
On 3/3/25 05:03, Dan Purgert wrote:
> Well, at least those read errors were all corrected ;)
>
> None of the first three bits are absolute proof that the drive is going,
> but they're certainly cause for suspicion.

Is there a way of seeing how many spare blocks are left?
On Mar 02, 2025, Eben King wrote:
> [...]
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate     0x000f   082   064   006    Pre-fail  Always   -           146369262

146 million read-errors.

>   7 Seek_Error_Rate         0x000f   084   060   045    Pre-fail  Always   -           232382570

230 million seek errors

>   9 Power_On_Hours          0x0032   093   093   000    Old_age   Always   -           6346h+20m+46.297s

~9 years on-time

> 195 Hardware_ECC_Recovered  0x001a   082   064   000    Old_age   Always   -           146369262

Well, at least those read errors were all corrected ;)

None of the first three bits are absolute proof that the drive is going,
but they're certainly cause for suspicion.
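
For context on the rows quoted above: smartctl itself does not judge an attribute by its raw number but by the normalized VALUE against THRESH, and reports the attribute as failed once VALUE drops to or below a non-zero THRESH. The sketch below parses attribute rows in the one-line form smartctl prints and shows the remaining margin. The function names are invented for this example, and the sample rows are copied from the output in this thread.

```python
def parse_attribute(line):
    """Parse one row of `smartctl -A` output into a dict."""
    # Split into at most 10 fields so a RAW_VALUE containing spaces stays whole.
    fields = line.split(None, 9)
    if len(fields) != 10:
        raise ValueError(f"not an attribute row: {line!r}")
    return {
        "id": int(fields[0]),
        "name": fields[1],
        "value": int(fields[3]),
        "worst": int(fields[4]),
        "thresh": int(fields[5]),
        "type": fields[6],
        "raw": fields[9],
    }


def is_failing(attr):
    """True if the normalized value has reached a non-zero threshold."""
    return attr["thresh"] > 0 and attr["value"] <= attr["thresh"]


rows = [
    "  1 Raw_Read_Error_Rate     0x000f   082   064   006    Pre-fail  Always   -   146369262",
    "  7 Seek_Error_Rate         0x000f   084   060   045    Pre-fail  Always   -   232382570",
    "195 Hardware_ECC_Recovered  0x001a   082   064   000    Old_age   Always   -   146369262",
]
for row in rows:
    attr = parse_attribute(row)
    status = "FAILING" if is_failing(attr) else "ok"
    print(f"{attr['name']}: value {attr['value']}, threshold {attr['thresh']}, {status}")
```

By this measure none of the quoted attributes has reached its threshold, although a healthy margin is not proof that a drive is fine.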
I thought the whole point of running the SMART tests was to detect a failing disk, so color me confused.
> I thought the whole point of running the SMART tests was to detect
> a failing disk, so color me confused.
In this particular case I do not think it is a drive failure. I suspect mounting /home was a mistake.
Booting a live image may destroy hibernation data, since the live system
may mount the same swap partition and reinitialize it. Eben, have you
found a reliable way to avoid swap reuse?

My impression is that some device drivers do not save complete state
during hibernation, so booting another OS may cause some problems after
resuming. I considered that a reason why Windows does not allow booting
another OS while it is hibernated (at least with BIOS, where a full
shutdown was required; I am unsure about UEFI).
I have always wondered if the decimal numbers in the smartctl(8)
"RAW_VALUE" column are actual event counts, or a decimal
representation of some binary bit field whose correct interpretation
only the manufacturer knows (?).
I typically look at the "VALUE" column, as it usually represents an
integer percentage that starts at 100 (%) and diminishes towards 0 (%)
as the drive degrades. That said, I have seen starting values of 120,
200, and possibly 255 (?). So, again, only the manufacturer knows.
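
On the bit-field question, a commonly repeated community interpretation (an assumption here, not something confirmed in this thread or by Seagate documentation I can point to) is that Seagate reports Raw_Read_Error_Rate and Seek_Error_Rate as a 48-bit value whose upper 16 bits count errors and whose lower 32 bits count operations. Under that assumption the numbers from this drive decode as follows; the function name is made up for the sketch.

```python
def split_seagate_raw(raw_value):
    """Split a 48-bit raw value into (error_count, operation_count).

    Assumes the widely cited Seagate layout: upper 16 bits are errors,
    lower 32 bits are the number of reads or seeks performed.
    """
    if not 0 <= raw_value < 1 << 48:
        raise ValueError("expected a 48-bit unsigned value")
    errors = raw_value >> 32
    operations = raw_value & 0xFFFFFFFF
    return errors, operations


for name, raw in [("Raw_Read_Error_Rate", 146369262),
                  ("Seek_Error_Rate", 232382570)]:
    errors, operations = split_seagate_raw(raw)
    print(f"{name}: {errors} errors in {operations} operations")
```

Both raw values are below 2^32, so if the layout assumption holds, the error part is zero and the large decimal numbers are operation counts rather than error counts.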
On Mon, 3 Mar 2025 at 10:03, Dan Purgert wrote:
> On Mar 02, 2025, Eben King wrote:
>> [...]
>> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>>   1 Raw_Read_Error_Rate     0x000f   082   064   006    Pre-fail  Always   -           146369262
>
> 146 million read-errors.
>
>>   7 Seek_Error_Rate         0x000f   084   060   045    Pre-fail  Always   -           232382570
>
> 230 million seek errors
>
>>   9 Power_On_Hours          0x0032   093   093   000    Old_age   Always   -           6346h+20m+46.297s
>
> ~9 years on-time
>
>> 195 Hardware_ECC_Recovered  0x001a   082   064   000    Old_age   Always   -           146369262
>
> Well, at least those read errors were all corrected ;)
>
> None of the first three bits are absolute proof that the drive is going,
> but they're certainly cause for suspicion.

I see no cause for concern in that data.

The Wikipedia page [1] regarding "1 Raw_Read_Error_Rate" says:

    The raw value has different structure for different vendors and is often
    not meaningful as a decimal number. For some drives, this number
    may increase during normal operation without necessarily signifying errors.

[...]

Why do you write that the "9 Power_On_Hours" data represents
"~9 years on-time"? It looks to me that it says 6346 hours.
There are 365 * 24 = 8760 hours / year.
So 6346 hours is less than one year.
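
The arithmetic above can be checked directly from the raw value smartctl printed. This is a small sketch; the function name is invented, and the pattern only covers the "h+m+s" form seen in this output.

```python
import re


def power_on_hours(raw):
    """Convert a smartctl raw value such as '6346h+20m+46.297s' to hours."""
    match = re.fullmatch(r"(\d+)h\+(\d+)m\+([\d.]+)s", raw)
    if match is None:
        raise ValueError(f"unrecognised format: {raw!r}")
    hours, minutes, seconds = match.groups()
    return int(hours) + int(minutes) / 60 + float(seconds) / 3600


HOURS_PER_YEAR = 365 * 24  # 8760

hours = power_on_hours("6346h+20m+46.297s")
print(f"{hours:.1f} hours = {hours / HOURS_PER_YEAR:.2f} years powered on")
# prints: 6346.3 hours = 0.72 years powered on
```

That is roughly 264 days of powered-on time, which is consistent with a machine that spends much of its life hibernated rather than running.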
> So you actually back up the hibernated / partition? Is that really a
> sound backup strategy?

The "norecovery" option for mount(8) seems like a dangerous design
choice. "readonly" is supposed to mean "do not write to disk". I must
remember that land mine if and when I want to do forensic work.

At this point, I am uncertain if the /home ext4 file systems are correct
on either the OS disc or the copied image disc (?).
> eben@cerberus:/$ sudo smartctl -a /dev/sda
> smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-31-amd64] (local build)
> Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org
>
> === START OF INFORMATION SECTION ===
> Model Family: Seagate BarraCuda 3.5 (SMR)
> Device Model: ST2000DM008-2UB102

AIUI SMR does not work well for OS (e.g. /tmp, swap) and general-purpose
(e.g. /home) disks that see frequent small random write workloads. I
prefer small high-quality 2.5" SSDs (Intel SSD 520 Series 60 GB) for my
OS and /home disks, and put my bulk data on a file server. I would
repurpose that HDD for images -- CMR should be okay for large sequential
write workloads.

> SMART Self-test log structure revision number 1
> No self-tests have been logged. [To run self-tests, use: smartctl -t]

Are you running tests periodically?
> The "norecovery" option for mount(8) seems like a dangerous design
> choice. "readonly" is supposed to mean "do not write to disk".

Yeah, that's what I thought too.
On 3/2/25 2:35 PM, David Christensen wrote:
> The "norecovery" option for mount(8) seems like a dangerous design
> choice. "readonly" is supposed to mean "do not write to disk". I
> must remember that land mine if and when I want to do forensic work.

To be fair, the first step of forensic work is "make an image of the
drive and save it somewhere read-only." This way if you attempt to
mount the image without norecovery, it barks at you because the
underlying medium is read-only.

You then work either with copies of the image or with thin layered
images that use the original as a backing image: writes are redirected
to the higher layer, leaving the original image untouched. That is
semantically the same as making a copy, but without wasting a bunch of
space.
On 3/2/25 14:35, David Christensen wrote:
> AIUI SMR does not work well for OS (e.g. /tmp, swap) and general-purpose
> (e.g. /home) disks that see frequent small random write workloads. I
> prefer small high-quality 2.5" SSDs (Intel SSD 520 Series 60 GB) for my
> OS and /home disks, and put my bulk data on a file server. I would
> repurpose that HDD for images -- CMR should be okay for large sequential
> write workloads.

?MR = Shingled / Conventional Magnetic Recording? How do I tell a priori
which drives are SMR and which are CMR?

>> SMART Self-test log structure revision number 1
>> No self-tests have been logged. [To run self-tests, use: smartctl -t]
>
> Are you running tests periodically?

I haven't been, but perhaps I should add that to my after-backup routine.
On 3/9/25 9:26 AM, Eben King wrote:
>> The "norecovery" option for mount(8) seems like a dangerous design
>> choice. "readonly" is supposed to mean "do not write to disk".
>
> Yeah, that's what I thought too.

"readonly" means "don't allow the contents of the filesystem to be
changed," e.g. attempts to alter files by userspace programs are
rejected. It doesn't mean the kernel won't write to the device.
mount(8) even documents this explicitly:

    Note that, depending on the filesystem type, state and kernel
    behavior, the system may still write to the device. For example,
    ext3 and ext4 will replay the journal if the filesystem is dirty. To
    prevent this kind of write access, you may want to mount an ext3 or
    ext4 filesystem with the ro,noload mount options or set the block
    device itself to read-only mode, see the blockdev(8) command.

This doesn't seem like "readonly does the wrong thing" so much as "you
should know what things do before you use them."
I have glanced at smartd(8), but have yet to try it because it seems
to prefer sending reports via e-mail (?). I have yet to figure out
how to fetch root mail messages from my daily driver mail client
(Thunderbird). My WAG is that I need to install an IMAP server on
each machine whose root mail I want to read (?).
On Sun, Mar 09, 2025 at 12:04:10PM -0700, David Christensen wrote:
> I have glanced at smartd(8), but have yet to try it because it seems
> to prefer sending reports via e-mail (?).

It's highly configurable. It also logs to syslog, and mails can be
disabled entirely or replaced by some other scripted function.
On Sun, 9 Mar 2025 12:04:10 -0700
David Christensen wrote:
> I have glanced at smartd(8), but have yet to try it because it seems
> to prefer sending reports via e-mail (?). I have yet to figure out
> how fetch root mail messages from my daily driver mail client
> (Thunderbird). My WAG is that I need to install an IMAP server on
> each machine whose root mail I want to read (?).

What I do is use postfix to send all of my machines' emails to one user
on one machine. I have an IMAP server (dovecot) for that user, and an
appropriate account in Claws Mail. Once you get it set up it works
nicely.

The only exception to that is laptops and their virtual machines; their
mail goes to the laptop.
Do you know the URL of a "howto" document that describes how to set
that up?