Grant Edwards wrote:
On 2024-09-03, Dale <[email protected]> wrote:
I was trying to re-emerge some packages. The ones I was working on
failed with "internal compiler error: Segmentation fault" or similar
being the common reason for failing.
In my experience, that usually means failing RAM. I'd try running
memtest86 for a day or two.
--
Grant
I've seen that before too. I'm hoping not. I may shutdown my rig,
remove and reinstall the memory and then test it for a bit. May be a
bad connection. It has worked well for the past couple months tho.
Still, it is possible to either be a bad connection or just going bad.
Dang those memory sticks ain't cheap. o_~
Thanks. See if anyone else has any other ideas.
Dale
:-) :-)
I wonder how much fun getting this memory replaced is going to be. o_O
I wasn't planning to go to 128GBs yet but guess I am now.
On 2024-09-04, Dale <[email protected]> wrote:
I ordered another set of memory sticks. I figure I will have to send
them both back which means no memory at all. I wasn't planning to go to 128GBs yet but guess I am now. [...]
Good luck.
[…]
I plugged them in alongside the recently purchased pair. Wouldn't
work. Either pair of SIMMs worked fine by themselves, but the only way
I could get both pairs to work together was to drop the clock speed
down to about a third the speed they were supposed to support.
On 2024-09-04, Dale <[email protected]> wrote:
At one point, I looked for a set of four sticks of the memory. I
couldn't find any. They only come in sets of two. I read somewhere
that the mobo expects each pair to be matched.
Yep, that's definitely how it was supposed to work. I fully expected
my two (identically spec'ed) sets of two to work. All the documentation I
could find said it should. It just didn't. :/
--
Grant
When I built this rig, I first booted the Gentoo Live boot image and
just played around a bit. Mostly to let the CPU grease settle in a
bit. Then I ran memtest through a whole test until it said it passed.
Only then did I start working on the install. The rig has run without
issue until I noticed gkrellm temps were stuck. They weren't updating as temps changed. So, I closed gkrellm but then it wouldn't open again.
Ran it in a console and saw the error about missing module or
something. Then I tried to figure out that problem which led to seg
fault errors. Well, that led to the thread and the discovery of a bad memory stick. I check gkrellm often so it was most likely less than a
day, and likely only a matter of a couple hours or so. The only reason it might
have gone longer, the CPU was mostly idle. I watch more often when the
CPU is busy, updates etc.
I've run fsck before mounting on every file system so far. I ran it on
the OS file systems while booted from the Live image. The others I just
did before mounting. I realize this doesn't mean the files themselves
are OK but at least the file system under them is OK.
I'm not sure how
to know if any damage was done between when the memory stick failed and
when I started the repair process. I could find the ones I copied from
place to place and check them but other than watching every single
video, I'm not sure how to know if one is bad or not. So far,
thumbnails work. o_O
Some MoBos are more tolerant than others.
Regarding Dale's question, which has already been answered - yes, anything the
bad memory has touched is suspect of corruption. Without ECC RAM a dodgy module can cause a lot of damage before it is discovered.
Maybe that it only catches 1-bit errors, but Dale has more broken bits?
Or it could be Dale's kit is DDR4?
Am Wed, Sep 04, 2024 at 11:38:01PM +0100 schrieb Michael:
Some MoBos are more tolerant than others.
Regarding Dale's question, which has already been answered - yes, anything the bad memory has touched is suspect of corruption. Without ECC RAM a dodgy module can cause a lot of damage before it is discovered.
Actually I was wondering: DDR5 has built-in ECC. But that’s not the same as the server-grade stuff, because it all happens inside the module with no communication to the CPU or the OS. So what is the point of it if it still causes errors like in Dale’s case?
Maybe that it only catches 1-bit errors, but Dale has more broken bits?
Michael wrote:
On Thursday 5 September 2024 09:36:36 BST Dale wrote:
I've run fsck before mounting on every file system so far. I ran it on
the OS file systems while booted from the Live image. The others I just
did before mounting. I realize this doesn't mean the files themselves
are OK but at least the file system under them is OK.
This could put your mind mostly at rest, at least the OS structure is OK and the error was not running for too long.
That does help.
I'm not sure how
to know if any damage was done between when the memory stick failed and
when I started the repair process. I could find the ones I copied from
place to place and check them but other than watching every single
video, I'm not sure how to know if one is bad or not. So far,
thumbnails work. o_O
If you have a copy of these files on another machine, you can run rsync with --checksum. This will only (re)copy the file over if the checksum
is different.
I made my backups last weekend. I'm sure it was working fine then.
After all, it would have failed to compile packages if it was bad. I'm thinking about checking against that copy like you mentioned but I have
other files I've added since then. I figure if I remove the delete
option, that will solve that. It can't compare but it can leave them be.
Use rsync with:
--checksum
and
--dry-run
You can also run find to identify which files were changed during the period
you were running with the dodgy RAM. Thankfully you didn't run for too long
before you spotted it.
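The idea of narrowing the check to files changed during the bad-RAM window can be sketched in Python as below. This is only an illustration: the function name `files_modified_between` is made up here, and it relies on modification times, so a file copied with its original timestamp preserved would not show up.

```python
import os
from datetime import datetime


def files_modified_between(root, start, end):
    """Yield paths under root whose modification time lies in [start, end]."""
    lo, hi = start.timestamp(), end.timestamp()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                mtime = os.stat(full).st_mtime
            except OSError:
                # File vanished or is unreadable; skip it rather than abort.
                continue
            if lo <= mtime <= hi:
                yield full
```

A call might look like `list(files_modified_between("VIDEO", datetime(2024, 9, 2), datetime(2024, 9, 5)))`, where both the path and the dates are placeholders for whatever window you suspect.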
I have just shy of 45,000 files in 780 directories or so. Almost 6,000
in another. Some files are small, some are several GBs or so. Thing
is, backups go from a single parent directory if you will. Plus, I'd
want to compare them all anyway. Just to be sure.
Am Thu, Sep 05, 2024 at 06:30:54AM -0500 schrieb Dale:
Use rsync with:
--checksum
and
--dry-run
I suggest calculating a checksum file from your active files. Then you don’t
have to read the files over and over for each backup iteration you compare
it against.
You can also run find to identify which files were changed during the period you were running with the dodgy RAM. Thankfully you didn't run for too long before you spotted it.
This. No need to check everything you ever stored. Just the most recent stuff, or at maximum, since you got the new PC.
I have just shy of 45,000 files in 780 directories or so. Almost 6,000
in another. Some files are small, some are several GBs or so. Thing
is, backups go from a single parent directory if you will. Plus, I'd
want to compare them all anyway. Just to be sure.
I acquired the habit of writing checksum files in all my media directories such as music albums, tv series and such, whenever I create one such directory. That way even years later I can still check whether the files are intact. I actually experienced broken music files from time to time (mostly on the MicroSD card in my tablet). So with checksum files, I can verify
which file is bad and which (on another machine) is still good.
Michael wrote:
On Thursday 5 September 2024 19:55:56 BST Frank Steinmetzger wrote:
Am Thu, Sep 05, 2024 at 06:30:54AM -0500 schrieb Dale:
Use rsync with:
--checksum
and
--dry-run
I suggest calculating a checksum file from your active files. Then you
don’t have to read the files over and over for each backup iteration you
compare it against.
You can also run find to identify which files were changed during the
period you were running with the dodgy RAM. Thankfully you didn't run
for too long before you spotted it.
This. No need to check everything you ever stored. Just the most recent
stuff, or at maximum, since you got the new PC.
I have just shy of 45,000 files in 780 directories or so. Almost 6,000
in another. Some files are small, some are several GBs or so. Thing
is, backups go from a single parent directory if you will. Plus, I'd
want to compare them all anyway. Just to be sure.
I acquired the habit of writing checksum files in all my media
directories such as music albums, tv series and such, whenever I create
one such directory. That way even years later I can still check whether
the files are intact. I actually experienced broken music files from time
to time (mostly on the MicroSD card in my tablet). So with checksum files,
I can verify which file is bad and which (on another machine) is still good.
There is also dm-verity for a more involved solution. I think for Dale something like this should work:
find path-to-directory/ -type f -print0 | xargs -0 md5sum > digest.log
then to compare with a backup of the same directory you could run:
md5sum -c digest.log | grep FAILED
Someone more knowledgeable should be able to knock out some clever python script to do the same at speed.
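For what it's worth, a minimal Python sketch of the same digest step could look like the following. It is not the script anyone in the thread actually uses; the names `md5_of` and `write_digest` are invented for this example. It writes lines in the usual "hash, two spaces, path" layout with paths relative to the top directory, and it does not handle file names containing newlines or backslashes.

```python
import hashlib
import os


def md5_of(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_digest(root, digest_path):
    """Hash every file under root and write 'hash  relative/path' lines.

    Returns the number of files hashed. Keep digest_path outside root,
    otherwise the digest file would end up hashing itself.
    """
    count = 0
    with open(digest_path, "w", encoding="utf-8") as out:
        for dirpath, dirnames, filenames in os.walk(root):
            dirnames.sort()  # walk subdirectories in a stable order
            for name in sorted(filenames):
                full = os.path.join(dirpath, name)
                rel = os.path.relpath(full, root)
                out.write(f"{md5_of(full)}  {rel}\n")
                count += 1
    return count
```

Because the paths are relative to the top directory, the resulting file should be checkable with `md5sum -c` when run from inside that directory, on the original or on a backup copy.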
I'll be honest here, on two points. I'd really like to be able to do
this but I have no idea where to or how to even start. My setup for
series type videos. In a parent directory, where I'd like a tool to
start, is about 600 directories. On a few occasions, there is another directory inside that one. That directory under the parent is the name
of the series. Sometimes I have a sub directory that has temp files;
new files I have yet to rename, considering replacing in the main series directory etc. I wouldn't mind having a file with a checksum for each
video in the top directory, and even one in the sub directory. As an
example.
TV_Series/
├── 77 Sunset Strip (1958)
│ └── torrent
├── Adam-12 (1968)
├── Airwolf (1984)
I got a part of the output of tree. The directory 'torrent' under 77
Sunset is temporary usually but sometimes a directory is there for
videos about the making of a video, history of it or something. What
I'd like, a program that would generate checksums for each file under
say 77 Sunset and it could skip or include the directory under it.
Might be best if I could switch it on or off. Obviously, I may not want
to do this for my whole system. I'd like to be able to target
directories. I have another large directory, lets say not a series but sometimes has remakes, that I'd also like to do. It is kinda set up
like the above, parent directory with a directory underneath and on
occasion one more under that.
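A rough sketch of what such a tool could look like in Python, assuming one checksum file per directory and a switch for whether subdirectories such as 'torrent' are included. The function name, the `include_subdirs` flag and the file name `checksums.md5` are all made up for illustration; this only reads the videos and writes the checksum file, it never deletes anything.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_directory_checksums(directory, include_subdirs=False, name="checksums.md5"):
    """Write a checksum file covering the files directly inside directory.

    With include_subdirs=True each subdirectory gets its own checksum
    file too. Returns the list of checksum files written.
    """
    directory = Path(directory)
    entries = sorted(directory.iterdir())
    files = [p for p in entries if p.is_file() and p.name != name]
    written = []
    if files:
        target = directory / name
        with open(target, "w", encoding="utf-8") as out:
            for path in files:
                out.write(f"{md5_of(path)}  {path.name}\n")
        written.append(target)
    if include_subdirs:
        for sub in entries:
            if sub.is_dir() and not sub.is_symlink():
                written.extend(write_directory_checksums(sub, True, name))
    return written
```

Targeting is then just a matter of which directory is passed in, for example one series directory at a time rather than the whole system.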
One thing I worry about is not just memory problems, drive failure but
also just some random error or even bit rot. Some of these files are
rarely changed or even touched. I'd like a way to detect problems and
there may even be a software tool that does this with some setup,
reminds me of Kbackup where you can select what to backup or leave out
on a directory or even individual file level.
While this could likely be done with a script of some kind, my scripting skills are minimum at best, I suspect there is software out there
somewhere that can do this. I have no idea what or where it could be
tho. Given my lack of scripting skills, I'd be afraid I'd do something
bad and it delete files or something. O_O LOL
I been watching videos again, those I was watching during the time the
memory was bad. I've replaced three so far. I think I noticed this
within a few hours. Then it took a little while for me to figure out
the problem and shutdown to run the memtest. I doubt many files were affected unless it does something we don't know about. I do plan to try
to use rsync checksum and dryrun when I get back up and running. Also,
QB is finding a lot of its files are fine as well. It's still
rechecking them. It's a lot of files.
Right now, I suspect my backup copy is likely better than my main copy.
Once I get the memory in and can really run some software, then I'll run rsync with those compare options and see what it says. I just got to remember to reverse things. Backup is the source not the destination.
If this works, I may run that each time, help detect problems maybe.
Maybe??
find path-to-directory/ -type f -print0 | xargs -0 md5sum > digest.log
then to compare with a backup of the same directory you could run:
md5sum -c digest.log | grep FAILED
Someone more knowledgeable should be able to knock out some clever python script to do the same at speed.
I'll be honest here, on two points. I'd really like to be able to do
this but I have no idea where to or how to even start. My setup for
series type videos. In a parent directory, where I'd like a tool to
start, is about 600 directories. On a few occasions, there is another directory inside that one. That directory under the parent is the name
of the series.
Sometimes I have a sub directory that has temp files;
new files I have yet to rename, considering replacing in the main series directory etc. I wouldn't mind having a file with a checksum for each video in the top directory, and even one in the sub directory. As an example.
TV_Series/
├── 77 Sunset Strip (1958)
│ └── torrent
├── Adam-12 (1968)
├── Airwolf (1984)
What
I'd like, a program that would generate checksums for each file under
say 77 Sunset and it could skip or include the directory under it.
Might be best if I could switch it on or off. Obviously, I may not want
to do this for my whole system. I'd like to be able to target
directories. I have another large directory, lets say not a series but sometimes has remakes, that I'd also like to do. It is kinda set up
like the above, parent directory with a directory underneath and on occasion one more under that.
As an example, let's assume you have the following fs tree:
VIDEO
├──TV_Series/
| ├── 77 Sunset Strip (1958)
| │ └── torrent
| ├── Adam-12 (1968)
| ├── Airwolf (1984)
|
├──Documentaries
├──Films
├──etc.
You could run:
$ find VIDEO -type f -print0 | xargs -0 md5sum > digest.log
The file digest.log will contain md5sum hashes of each of your files within the VIDEO directory and its subdirectories.
To check if any of these files have changed, become corrupted, etc. you can run:
$ md5sum -c digest.log | grep FAILED
If you want to compare the contents of the same VIDEO directory on a back up,
you can copy the same digest file with its hashes over to the backup top directory and run again:
$ md5sum -c digest.log | grep FAILED
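The check step can also be sketched in Python, which makes it easy to separate files that fail the hash from files that are simply absent in the backup. This assumes a digest of "hash, two spaces, path" lines as produced above; `verify_digest` is a name invented for this example, and `root` is the directory the recorded paths are relative to (the parent of VIDEO in the example tree).

```python
import hashlib
import os


def md5_of(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_digest(digest_path, root):
    """Check every 'hash  path' line of digest_path against files under root.

    Returns (failed, missing): paths whose hash differs, and paths that
    are listed in the digest but not present under root.
    """
    failed, missing = [], []
    with open(digest_path, encoding="utf-8") as lines:
        for line in lines:
            expected, sep, rel = line.rstrip("\n").partition("  ")
            if not sep:
                continue  # blank or unrecognised line
            full = os.path.join(root, rel)
            if not os.path.isfile(full):
                missing.append(rel)
            elif md5_of(full) != expected:
                failed.append(rel)
    return failed, missing
```

Running it once against the main copy and once against the backup with the same digest file shows which side holds the damaged version of a file.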
One thing I worry about is not just memory problems, drive failure but
also just some random error or even bit rot. Some of these files are rarely changed or even touched. I'd like a way to detect problems and there may even be a software tool that does this with some setup,
reminds me of Kbackup where you can select what to backup or leave out
on a directory or even individual file level.
Right now, I suspect my backup copy is likely better than my main copy.
This should work in rsync terms:
rsync -v --checksum --delete --recursive --dry-run SOURCE/ DESTINATION
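With --dry-run nothing is copied or deleted. It will output a list of files whose checksums differ or which are missing at the DESTINATION, as well as files which have been deleted from the SOURCE and would be deleted at the DESTINATION directory.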
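To make the kind of report such a dry run gives more concrete, here is a small Python sketch that compares two trees by content hash. It only mimics the report, not how rsync works internally, and `hash_tree` and `compare_trees` are names invented for this example. With the backup as SOURCE and the main copy as DESTINATION, the first list holds the files whose contents disagree.

```python
import hashlib
import os


def hash_tree(root, chunk_size=1 << 20):
    """Map each file's path relative to root to its MD5 hex digest."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            digest = hashlib.md5()
            with open(full, "rb") as handle:
                for chunk in iter(lambda: handle.read(chunk_size), b""):
                    digest.update(chunk)
            result[os.path.relpath(full, root)] = digest.hexdigest()
    return result


def compare_trees(source, destination):
    """Return (differing, only_in_source, only_in_destination) as sorted lists."""
    src, dst = hash_tree(source), hash_tree(destination)
    differing = sorted(rel for rel in src.keys() & dst.keys() if src[rel] != dst[rel])
    only_in_source = sorted(src.keys() - dst.keys())
    only_in_destination = sorted(dst.keys() - src.keys())
    return differing, only_in_source, only_in_destination
```

Files added since the last backup land in the third list, so they are reported but never touched, which matches the effect of leaving out --delete.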
Update. New memory sticks I bought came in today. I ran memtest from
Gentoo Live boot media and it passed. Of course, the last pair passed
when new too so let's hope this one lasts longer. Much longer.
Am Fri, Sep 06, 2024 at 01:21:20PM +0100 schrieb Michael:
find path-to-directory/ -type f -print0 | xargs -0 md5sum > digest.log
then to compare with a backup of the same directory you could run:
md5sum -c digest.log | grep FAILED
I had a quick look at the manpage: with md5sum --quiet you can omit the grep part.
Someone more knowledgeable should be able to knock out some clever python
script to do the same at speed.
And that is exactly what I have written for myself over the last 11 years. I call it dh (short for dirhash). As I described in the previous mail, I use
it to create one hash file per directory. But it also supports one hash
file per data file and – a rather new feature – one hash file at the root of a tree. Have a look here: https://github.com/felf/dh
Clone the repo or simply download the one file and put it into your path.
On Friday 6 September 2024 22:41:33 BST Frank Steinmetzger wrote:
Someone more knowledgeable should be able to knock out some clever python
script to do the same at speed.
And that is exactly what I have written for myself over the last 11 years. I
call it dh (short for dirhash). As I described in the previous mail, I use it to create one hash file per directory. But it also supports one hash file per data file and – a rather new feature – one hash file at the root
of a tree. Have a look here: https://github.com/felf/dh
Clone the repo or simply download the one file and put it into your path.
Nice! I've tested it briefly here. You've put quite some effort into this.
Thank you Frank!
Probably not your use case, but I wonder how it can be used to compare SOURCE
to DESTINATION where SOURCE is the original fs and DESTINATION is some backup,
without having to copy over manually all different directory/subdirectory Checksums.md5 files.
I suppose rsync can be used for the comparison to a backup fs anyway, your script would be duplicating a function unnecessarily.
On 05/09/2024 23:06, Michael wrote:
There is also dm-verity for a more involved solution. I think for Dale something like this should work:
Snag is, I think dm-verity (or do you actually mean dm-integrity, which
is what I use) merely checks that what you read from disk is what you
wrote to disk. If the ram corrupted it before it was written, I don't
think either of them will detect it.
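The point can be shown with a toy Python example. It is not how dm-verity or dm-integrity are implemented; `store` and `verify` are hypothetical stand-ins for "checksum on write" and "check on read". If a bit flips in RAM before the write, the checksum is computed over the already damaged data, so the later check passes.

```python
import hashlib


def store(block):
    """Stand-in for an integrity layer: keep the block plus its checksum."""
    return block, hashlib.sha256(block).hexdigest()


def verify(block, checksum):
    """Stand-in for the read-side check."""
    return hashlib.sha256(block).hexdigest() == checksum


original = b"video frame data"

# Simulate a single bit flipping in RAM before the data is written out.
in_ram = bytearray(original)
in_ram[0] ^= 0x01

stored, tag = store(bytes(in_ram))

print(verify(stored, tag))   # True: what is read back matches what was written
print(stored == original)    # False: what was written is not what was intended
```

This is why a checksum taken before the fault, such as a digest from a known-good backup, is needed to catch this kind of damage.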
Cheers,
Wol
On 04/09/2024 01:39, Dale wrote:
I've seen that before too. I'm hoping not. I may shutdown my rig,
remove and reinstall the memory and then test it for a bit. May be a
bad connection. It has worked well for the past couple months tho.
Still, it is possible to either be a bad connection or just going bad.
I've had *MOST* of my self-built systems force me to remove and replace
the ram several times before the system was happy.
And when a shop "fixed" my computer for me (replacing a mobo that wasn't broken - I told them I thought it needed a bios upgrade and I was
right!) they also messed up the ram. Memory is supposed to go in in
matched pairs. So what do they do? One stick in each pair of slots - the thing ran like a sloth on tranquillisers!
As soon as I realised what
they'd done and put both sticks in the same pair, it was MUCH faster.
Cheers,
Wol
Wols Lists wrote:
On 04/09/2024 01:39, Dale wrote:
I've seen that before too. I'm hoping not. I may shutdown my rig,
remove and reinstall the memory and then test it for a bit. May be a
bad connection. It has worked well for the past couple months tho.
Still, it is possible to either be a bad connection or just going bad.
I've had *MOST* of my self-built systems force me to remove and
replace the ram several times before the system was happy.
And when a shop "fixed" my computer for me (replacing a mobo that
wasn't broken - I told them I thought it needed a bios upgrade and I
was right!) they also messed up the ram. Memory is supposed to go in
in matched pairs. So what do they do? One stick in each pair of slots
- the thing ran like a sloth on tranquillisers! As soon as I realised
what they'd done and put both sticks in the same pair, it was MUCH
faster.
Cheers,
Wol
I noticed on the set I had to return, the serial numbers were in
sequence. One was right after the other. I don't know if that makes
them a matched set or if they run some test to match them.
From my understanding tho, each 'bank' or pair has to be a matched set.
I did finally find a set of four but it is a different brand. From what
I read tho, ASUS trains itself each time you boot up. It finds the
best setting for each set of memory. It does say that it is usually set
to a slower speed tho when all four are installed.
Just have to wait
and see I guess. Oh, when I boot the first couple times with new
memory, it takes quite a bit longer on the BIOS boot screen. After a
couple times, it doesn't seem to take so long. Not sure what, but it
does something.
The placement of DIMMs depends on the MoBo; its manual shows which slots the DIMMs should go in and the (maximum) size of each stick the MoBo can cope with. Normally OEMs provide a list of tested memory brands and models for their MoBos (QVL) and it is recommended to buy something on the list, rather than improvise.
I was running the command again and when I was checking on it, it
stopped with this error.
  File "/root/dh", line 1209, in <module>
    main()
  File "/root/dh", line 1184, in main
    directory_hash(dir_path, '', dir_files, checksums)
  File "/root/dh", line 1007, in directory_hash
    os.path.basename(old_sums[filename][1])
                     ~~~~~~~~^^^^^^^^^^
KeyError: 'Some Video.mp4'
I was doing a second run because I updated some files. So, it was
skipping some and creating new for some new ones. This is the command I
was running, which may not be the best way.
/root/dh -c -f -F 1Checksums.md5 -v
Also, what is the best way to handle this type of situation. Let's say
I have a set of videos. Later on I get a better set of videos, higher resolution or something. I copy those to a temporary directory then use your dmv script from a while back to replace the old files with the new
files but with identical names. Thing is, file is different, sometimes
a lot different. What is the best way to get it to update the checksums
for the changed files? Is the command above correct?
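One possible way to tell a deliberate replacement from silent damage, sketched in Python under its own assumptions and not a description of how dh behaves: record each file's modification time next to its hash. A file whose timestamp changed is treated as intentionally replaced and gets a fresh hash; a file whose timestamp is unchanged but whose hash differs is flagged. The names `refresh_state` and the layout of `state` are invented for this example.

```python
import hashlib
import os


def md5_of(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh_state(root, state):
    """Re-hash the files under root and compare against the previous state.

    state maps relative path -> {"mtime": int, "md5": str}.
    Returns (new_state, suspects). A suspect is a file whose content
    changed although its modification time did not; its old hash is
    kept on record instead of being overwritten.
    """
    new_state, suspects = {}, []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            mtime = os.stat(full).st_mtime_ns
            digest = md5_of(full)
            old = state.get(rel)  # None for files seen for the first time
            if old and old["mtime"] == mtime and old["md5"] != digest:
                suspects.append(rel)
                new_state[rel] = old
            else:
                new_state[rel] = {"mtime": mtime, "md5": digest}
    return new_state, suspects
```

The state dictionary could be kept between runs with the `json` module. One caveat: a replacement that happens to keep the old timestamp exactly would be reported as a suspect rather than accepted.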
I'm sometimes pretty good at finding software bugs. But hey, it just
makes your software better. ;-)