ZoneMinder 1.36.x locks up and throws kernel errors

Discussions related to the 1.36.x series of ZoneMinder
michaelbran
Posts: 2
Joined: Fri May 24, 2024 4:36 am

ZoneMinder 1.36.x locks up and throws kernel errors

Post by michaelbran

Hello,
I've been running this ZM install for a month now. Today I started getting filesystem and kernel errors in my log. The system completely locks up and drives CPU usage to the max. I reboot the VM and it runs for ~12 hours before throwing these errors again.

Code:

2022-02-11T04:20:12-05:00 zonemindergpu kernel - - -  [24773.395961] EXT4-fs error (device sdb1): ext4_lookup:1590: inode #5679841: comm zmaudit.pl: iget: checksum invalid
...
2022-02-11T04:51:31-05:00 zonemindergpu kernel - - -  [26652.159322] CPU: 1 PID: 82 Comm: kswapd0 Tainted: P           O     4.15.0-167-generic #175-Ubuntu
...
2022-02-11T04:51:31-05:00 zonemindergpu kernel - - -  [26652.158900] BUG: unable to handle kernel paging request at 0000000000002008
2022-02-11T04:51:31-05:00 zonemindergpu kernel - - -  [26652.160015] RIP: dentry_unlink_inode+0x43/0xe0 RSP: ffffb44382013bf0
Then I ran fsck:

Code:

jon@zonemindergpu:~$ sudo fsck -f /dev/sdb1
fsck from util-linux 2.31.1
e2fsck 1.44.1 (24-Mar-2018)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/sdb1: 3398986/13107200 files (0.0% non-contiguous), 44220381/52428539 blocks
Any ideas what's going on here?
mikb
Posts: 623
Joined: Mon Mar 25, 2013 12:34 pm

Re: ZoneMinder 1.36.x locks up and throws kernel errors

Post by mikb

"inode #5679841: "

I never like to see filesystem errors of any kind; they give me a nasty sinking feeling. The fact that fsck doesn't seem to complain is not ideal either. If it had found something and DID something about it, that would at least give you some confidence.

First: check the SMART data for /dev/sdb with smartctl to try to rule out an underlying hardware problem with the disk. At any sign of trouble, have a new disk and your backups ready!

Code:

sudo smartctl -a /dev/sdb
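The full `smartctl -a` dump is long, so it can help to pull out the handful of attributes that most often point at a dying SATA disk. This is a minimal sketch, assuming smartctl's usual ATA attribute table (NVMe drives print a different format and are not handled); the function name `smart_warnings` is made up for illustration. Since this ZM install runs in a VM, the virtual /dev/sdb will probably not expose real SMART data at all, in which case run it on the host against the physical disk.

```shell
# Hypothetical helper: reads `smartctl -A` output on stdin and prints any of
# the watched attributes whose raw value (10th column) is non-zero.
# Assumes the ATA/SATA attribute table layout; NVMe output is not handled.
smart_warnings() {
  awk '
    $2 == "Reallocated_Sector_Ct"  ||
    $2 == "Reported_Uncorrect"     ||
    $2 == "Current_Pending_Sector" ||
    $2 == "Offline_Uncorrectable" {
      if ($10 + 0 > 0) printf "%s raw=%s\n", $2, $10
    }
  '
}

# Usage (on the machine that can see the physical disk):
#   sudo smartctl -H /dev/sdb                    # overall health verdict
#   sudo smartctl -A /dev/sdb | smart_warnings   # no output = none flagged
```

Non-zero reallocated or pending sector counts are not proof of imminent failure on their own, but a count that keeps growing between checks is the usual cue to replace the disk.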
Second: what is inode number 5679841? Twice a day, ahead of backups, I run a script that records the inode number and path of every file on each filesystem, and files the results:

Code:

mkdir -p ~/inodes
cd /
find . -mount -printf "%i\t%p\n" > ~/inodes/inodes-root.txt
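This means that when I get an incomprehensible "inode 48272 ..." error, I can at LEAST turn it into a filename. Run the above into a temp file, starting from wherever /dev/sdb1 is mounted (-mount stops find from crossing into other filesystems, so starting at / only covers sdb1 if it is the root filesystem), then: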

Code:

grep -P '^5679841\t' your-temp-file.txt
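If you don't have a pre-built inode list, you can go straight from the kernel log to a path. A minimal sketch, assuming GNU grep/find and the log line format shown above; the helper names and the mount point in the usage comment are hypothetical placeholders, not anything from ZoneMinder.

```shell
# Hypothetical helper: prints each distinct inode number named in an
# "EXT4-fs error" line, reading kernel log text on stdin.
ext4_error_inodes() {
  grep 'EXT4-fs error' | grep -o 'inode #[0-9][0-9]*' | cut -d'#' -f2 | sort -un
}

# Hypothetical helper: prints the path(s) of inode number $2 on the
# filesystem mounted at $1. -xdev keeps find on that one filesystem,
# because inode numbers are only unique within a single filesystem.
# Errors are left visible: if the inode itself is damaged, find may fail
# to stat that entry, and the error message names the path anyway.
find_inode() {
  find "$1" -xdev -inum "$2"
}

# Usage (run as root; the mount point is a placeholder):
#   findmnt -n -o TARGET /dev/sdb1              # where is sdb1 mounted?
#   ext4_error_inodes < /var/log/kern.log       # or: journalctl -k -b -1 | ext4_error_inodes
#   find_inode /path/to/sdb1/mountpoint 5679841
#   debugfs -R 'ncheck 5679841' /dev/sdb1       # e2fsprogs alternative
```

The `debugfs ncheck` route reads directory blocks directly, so it can still name the file when a normal lookup of the damaged inode fails.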
If it is one particular corrupted file, you may get lucky deleting or moving it out of the way, and then restoring the content to a new, fresh file (which will get a new inode number).
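A minimal sketch of that move-aside-and-restore step, assuming you have a good copy of the file in a backup; the helper name `replace_from_backup` and all paths are hypothetical. Keep the quarantine directory on the same filesystem so mv is a plain rename and never has to read the damaged data; the copy made by cp then lands on a fresh inode.

```shell
# Hypothetical helper:
#   replace_from_backup DAMAGED_FILE BACKUP_FILE QUARANTINE_DIR
# QUARANTINE_DIR should be on the same filesystem as DAMAGED_FILE.
replace_from_backup() {
  local damaged=$1 backup=$2 quarantine=$3
  mkdir -p "$quarantine" || return 1
  # A same-filesystem mv only rewrites directory entries: the damaged inode
  # keeps its number, it is just no longer at the path ZM uses.
  mv -- "$damaged" "$quarantine/" || return 1
  # cp creates a brand-new file, so the restored copy gets a new inode.
  cp -p -- "$backup" "$damaged"
}
```

Once the bad file is out of the way, running e2fsck again on the unmounted filesystem is a sensible follow-up.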

Note: it is unlikely that ZM is causing this. Something has gone wrong in the filesystem, through either a hardware fault or a kernel-got-confused error, and ZM just happens to be the first thing that touched the affected files.

Post back if you find out more from the above.

EDIT: The mention of "kswapd0" makes me wonder: do you have a swap FILE (rather than a partition), and is the affected inode part of it? Trouble reading or writing your swap file would cause memory-based mayhem; paging things out to disk and not being able to get them back when needed is pretty fatal.
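Checking the swap theory is quick: `swapon --show` lists active swap areas and whether each is a file or a partition. The hypothetical helper below goes one step further and prints each swap FILE's inode number and the device it lives on, so you can compare it against inode 5679841 on /dev/sdb1. It is a sketch that assumes GNU coreutils (`stat -c`, `df --output`) and reads /proc/swaps by default; swap paths containing spaces (escaped as \040 in /proc/swaps) are not handled.

```shell
# Hypothetical helper: prints "inode<TAB>path<TAB>device" for every active
# swap area of type "file". Optional $1 overrides the /proc/swaps path.
swapfile_inodes() {
  local swaps=${1:-/proc/swaps}
  # Skip the header line; keep entries whose Type column is "file".
  awk 'NR > 1 && $2 == "file" { print $1 }' "$swaps" |
    while IFS= read -r f; do
      printf '%s\t%s\t%s\n' "$(stat -c %i "$f")" "$f" \
        "$(df --output=source "$f" | tail -n 1)"
    done
}

# Usage: a match here means the damaged inode IS your swap file.
#   swapfile_inodes | awk -F'\t' '$1 == 5679841 && $3 == "/dev/sdb1"'
```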
michaelbran
Posts: 2
Joined: Fri May 24, 2024 4:36 am

Re: ZoneMinder 1.36.x locks up and throws kernel errors

Post by michaelbran

Thanks for your answer. This is what I was looking for.