New ZM Install on Debian - Continuous rc Errors in syslog

Ruler · Post by **Ruler** » Mon Feb 19, 2024 8:42 pm

mikb wrote (Sun Feb 18, 2024 7:20 pm): Just a theory, but maybe things quitting/exiting is a sign of something fundamentally wrong -- so wrong, in fact, that syslogd/klogd or the equivalent part of systemd ... is exiting too. This means that the normal path of messages (that would be captured, filtered and filed according to your rules) is gone.

Kernel is still running and still has stuff to say. So, last resort, spit them onto the physical console.

"Remote logging" might turn into a camera pointed at the monitor at this rate

I'd be willing to accept that there's something wrong at a system level, except that it runs for weeks at a time without trouble between occurrences. If something were that fundamentally flawed, don't you think it would show up sooner?

The problem with pointing a camera at the screen is that this system is connected via a KVM, & the vast majority of the time my PC is what's shown on the screen. I'd also have to set up another ZoneMinder system / camera, and at this point I'm somewhat doubting my ability to make it reliable. (Really never thought I'd think, much less say, something like that...) I might be able to find a gizmo to clone the display to another monitor & set up a second screen for the camera to watch; I'll keep it in mind if the remote logging isn't helpful.

Searches finally uncovered someone else getting the same type of error messages, only on Arch, at https://bbs.archlinux.org/viewtopic.php?id=241005 - he was having problems with his system not resuming from hibernation. It doesn't appear that these messages caused any issues there. The solution ended up being to unload the sound modules, hibernate, then reload them after the system came back up. The Wayback Machine reveals that the dead link in that thread was specifically about hibernation problems / troubleshooting and not really applicable to my circumstances.

An old machine is being set up for remote logging as I write this. (I've never configured remote logging before, so it'll be interesting.) Here's the local syslog from when it died the night of February 15. It complains about the missing events, then there's a chunk of binary data in the log where it died, then nothing until I forcibly rebooted it on the 16th, when it shows the boot messages & starts complaining about missing events again. Am I missing something?

Pastebin link, because it's too big for the forum (password is ZoneMinder): https://pastebin.com/JziwFUWs

(Something odd I noticed when chopping up the original 7.8 GB syslog: the first 4 inputs [0-3] are being forced to type 42, as they should be from the module configuration, but the last 4 [4-7] are autodetected as type 98. The inputs seem to work... I wonder if this might be causing something.)

I really appreciate the pointers and feedback... trying to wrap my head around this and come up with a plausible theory that fits the observed behavior is sending my brain into an uncontrolled recursion loop.

Ruler · Post by **Ruler** » Tue Feb 20, 2024 5:15 pm

Well, I have good news and bad news. I'm going to document what I found & did for the benefit of anyone who might have similar issues in the future and find this thread.

This was uncovered while digging into the whole card 42 vs. card 98 thing. I discovered that if I pass card=42,42,42,42,42,42,42,42 to the bttv module, the logs are spammed with 'rc rc4: error -6' through 'rc rc7: error -6'! This seems to confirm iconnor's point that these messages come from the capture card driver, and it strongly suggests that mikb's hunch was right and these errors are connected to the capture cards. The thread I found on the Arch Linux forum also makes sense if you consider that the person there was working on a DVR system with video capture cards in it. (It's not improbable that he had the wrong card type in his config.) Further, it suggested a solution...
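A quick way to see which rc devices are complaining after each change to the card= option is to tally the messages per device. Below is a minimal Python sketch, assuming the kernel lines look exactly like the 'rc rcN: error -6' messages quoted in this thread; `count_rc_errors` is a hypothetical helper, not part of ZoneMinder or any kernel tool.

```python
import re
from collections import Counter

# Matches kernel lines such as "rc rc4: error -6" (format assumed from the
# messages quoted in this thread; other drivers may word things differently).
RC_ERROR = re.compile(r"\brc (rc\d+): error (-\d+)")


def count_rc_errors(lines):
    """Count 'rc rcN: error -E' messages per (device, error code) pair."""
    counts = Counter()
    for line in lines:
        match = RC_ERROR.search(line)
        if match:
            device, code = match.group(1), int(match.group(2))
            counts[(device, code)] += 1
    return counts
```

For a whole log, something like `count_rc_errors(open("/var/log/syslog", errors="replace"))` should work; `errors="replace"` keeps the chunk of binary data that lands in the syslog during a crash from raising a decode error.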

Because the last 4 cards [4-7] were being autodetected as type 98, and the 4 monitors using those autodetected cards worked, I simply removed the /etc/modprobe.d/bttv.conf file and rebooted, thinking that the driver would automatically detect all 8 as card #98. Unfortunately, ZoneMinder wouldn't work with this configuration: the monitors for /dev/video0 and /dev/video3 on both channels 0 and 1 showed green and capturing, but all the rest showed red. Even those that showed green & capturing didn't have a picture, so I moved the bttv.conf file back into /etc/modprobe.d & changed it to card=98,98,98,98,98,98,98,98. (Even though the manual that came with these cards clearly says they're supposed to be type 42, & that's what our old server ran with for nearly 20 years, I figured it was worth a try.) The 'rc rc[0-3]: error -6' errors disappeared, further confirming that passing card type 42 to the bttv module was the source of these errors.
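Since the bttv card= option takes one comma-separated value per card, a tiny helper makes it harder to miscount entries when switching all 8 cards between 42 and 98. This is a sketch assuming the standard modprobe.d `options <module> <param>=<values>` syntax; `bttv_options_line` is a hypothetical helper, and its output would go into /etc/modprobe.d/bttv.conf, followed by a reboot.

```python
def bttv_options_line(card_type, num_cards):
    """Build a modprobe.d line forcing the same bttv card type on every card.

    As I understand it, the bttv 'card' parameter is a per-card list, so
    value i applies to the i-th card the driver finds.
    """
    if num_cards < 1:
        raise ValueError("need at least one card")
    return "options bttv card=" + ",".join([str(card_type)] * num_cards)
```

For example, `bttv_options_line(98, 8)` produces `options bttv card=98,98,98,98,98,98,98,98`.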

Unfortunately, ZoneMinder still didn't work, even after rebooting, vehemently swearing at it, and threatening the machine. (Don't laugh - the last 2 are viable troubleshooting techniques that work a surprising amount of the time.) I changed the bttv configuration back to card type 42, added 4 more log suppression rules, and rebooted... and it still refused to work. I then discovered that during one of the numerous reboots, the BIOS had swapped the device nodes of the two drives: the SSD boot drive was now /dev/sdb & the HDD storage drive was /dev/sda. (The UUID reference in /etc/fstab obviously stopped working after I re-created the file system with a bigger inode area, so I had just changed it to reference /dev/sdb1 instead of the UUID. I've never seen a system change device nodes like that before...) I looked up how to get the file system's UUID (blkid /dev/sda1) and changed fstab to reference the UUID instead of the device node. Rebooted, corrected the typo in the UUID, rebooted again, and both file systems mounted properly. Unfortunately, ZoneMinder still wasn't working.

By this time, it was an hour after I was supposed to go home, so I just left and picked it back up this morning. A fresh look helped: I found that the zoneminder service simply was not starting. Running 'service zoneminder start' as root failed with the following:

```
Starting zoneminder.service - ZoneMinder CCTV recording and surveillance system...
FAT [Can't create missing temporary directory '/run/zm': Permission denied]
Feb 20 09:52:40 vss-new systemd[1]: zoneminder.service: Control process exited, code=exited, status=255/EXCEPTION
zoneminder.service: Failed with result 'exit-code'.
```

Looking in /run revealed that there was no zm directory, so I created it, changed its owner & group to www-data, and voilà! ZoneMinder starts.

To test the card type being passed to the bttv module, I changed the values to 98 for all 8 cards in the config file and rebooted... only to find ZoneMinder wouldn't start again, with the same error. Again /run/zm didn't exist, so I did the same thing again & it started; all monitors have a picture and the syslog is silent as far as rc errors go. Another reboot, this time without changing the bttv module options, confirmed that ZoneMinder is no longer creating this directory automatically.
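/run is a tmpfs that starts empty on every boot, so whatever creates /run/zm has to run on each startup, before zoneminder.service. Here is a minimal Python sketch of the create-and-chown step, assuming it runs as root and that www-data with mode 0755 is the right ownership (matching the manual fix above); `ensure_runtime_dir` is a hypothetical name. On a systemd system, the more conventional route would be a systemd-tmpfiles entry such as `d /run/zm 0755 www-data www-data -` in a file under /etc/tmpfiles.d/, which does the same job without a custom script.

```python
import os
import shutil


def ensure_runtime_dir(path="/run/zm", user="www-data", group="www-data", mode=0o755):
    """Create the runtime directory if it is missing and give it to user:group.

    Safe to run repeatedly. Changing ownership to another user requires root.
    """
    os.makedirs(path, mode=mode, exist_ok=True)
    shutil.chown(path, user=user, group=group)
    os.chmod(path, mode)  # makedirs' mode is filtered through the umask
    return path
```

Hooking this in as a oneshot unit ordered `Before=zoneminder.service` would match the intent, but that unit is also an assumption on my part.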

My plan is to simply script creating / chowning /run/zm at boot, but I'm curious why anything I did would have had any impact on that. I'm also going to let the system run with card type 98 (as it's functional & doesn't generate the rc errors) and see if it crashes again. Based on what I've found so far, I'd wager that my initial assumption that the rc errors were somehow responsible for the crashes was incorrect... which means I'm back to square one in that regard. I'm going to finish configuring remote logging today in the hope that it'll reveal something.

Finally, I'm going to script sending myself an e-mail notification that the server booted, set it to run at startup, and write a positive number to /proc/sys/kernel/panic so that the system automatically reboots that many seconds after a kernel panic. (I'm not even certain that the freeze IS a kernel panic: most panics I've seen put debugging information on the screen, leave behind dump files, and have the num lock/caps lock/scroll lock lights on the keyboard all flashing together. None of that happened with this last crash, which is why I'm questioning whether it's a kernel panic or something else. Regardless, it's one more thing that might help the system 'self-heal' after a failure... I can add file system checking, database repair, and running zmaudit.pl later if needed.)
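The two boot-time pieces of that plan could look something like the sketch below. It assumes an SMTP server reachable on localhost port 25 without authentication, and the function names are hypothetical. Writing to /proc/sys/kernel/panic only lasts until the next boot, so a `kernel.panic = N` line in a file under /etc/sysctl.d/ is the persistent equivalent. Either way, this only helps if the freeze really is a kernel panic.

```python
import smtplib
import socket
from email.message import EmailMessage

PANIC_SYSCTL = "/proc/sys/kernel/panic"


def set_panic_reboot(seconds, path=PANIC_SYSCTL):
    """Have the kernel reboot `seconds` after a panic (0 means hang forever).

    Needs root; not persistent across reboots (see /etc/sysctl.d/).
    """
    if seconds <= 0:
        raise ValueError("use a positive number of seconds to enable auto-reboot")
    with open(path, "w") as f:
        f.write(f"{seconds}\n")


def boot_notice(sender, recipient, host=None):
    """Build a 'server booted' e-mail; sending is left to send_notice()."""
    host = host or socket.gethostname()
    msg = EmailMessage()
    msg["Subject"] = f"{host} has booted"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(f"{host} started up. If this was not a planned reboot, check the logs.")
    return msg


def send_notice(msg, smtp_host="localhost", port=25):
    """Hand the message to an SMTP server (assumed reachable; no auth or TLS)."""
    with smtplib.SMTP(smtp_host, port) as smtp:
        smtp.send_message(msg)
```

Run from a oneshot service or a cron `@reboot` entry, usage would be along the lines of `set_panic_reboot(30)` followed by `send_notice(boot_notice("zm@example.com", "admin@example.com"))`, with placeholder addresses swapped for real ones.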

Thank you all for your input & advice. If anyone has thoughts on my plans, I'd be happy to hear them.

mikb · Post by **mikb** » Tue Feb 20, 2024 6:36 pm

Looking at your syslog dump, I notice the mentions of "rc rc0" and upwards, in the context of a much longer path to a "Provideo PV951" PCI bus device.

```
Registered IR keymap rc-pv951
2024-02-16T11:22:13.891735-05:00 vss-new kernel: [   13.229852] rc rc0: PV951 as /devices/pci0000:00/0000:00:1e.0/0000:0d:00.0/0000:0e:08.0/i2c-0/0-004b/rc/rc0
2024-02-16T11:22:13.891736-05:00 vss-new kernel: [   13.229922] rc rc0: lirc_dev: driver ir_kbd_i2c registered at minor = 0, scancode receiver, no transmitter
2024-02-16T11:22:13.891737-05:00 vss-new kernel: [   13.229972] input: PV951 as /devices/pci0000:00/0000:00:1e.0/0000:0d:00.0/0000:0e:08.0/i2c-0/0-004b/rc/rc0/input6
```
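To tie each phantom rc device back to a physical card, the registration lines above can be parsed for their sysfs path; the last PCI address in that path (0000:0e:08.0 here) can then be compared against `lspci` output. A rough sketch, assuming the line format shown above; `map_rc_devices` and `pci_slot` are hypothetical helpers.

```python
import re

# Matches registration lines like:
#   rc rc0: PV951 as /devices/pci0000:00/.../i2c-0/0-004b/rc/rc0
REGISTERED = re.compile(r"\brc (rc\d+): (.+?) as (/devices/\S+/rc/rc\d+)$")
PCI_ADDR = re.compile(r"[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]")


def map_rc_devices(lines):
    """Return {rcN: (remote name, sysfs path)} from kernel registration lines."""
    found = {}
    for line in lines:
        match = REGISTERED.search(line.strip())
        if match:
            found[match.group(1)] = (match.group(2), match.group(3))
    return found


def pci_slot(sysfs_path):
    """Return the last PCI address (e.g. '0000:0e:08.0') in a sysfs path."""
    slots = PCI_ADDR.findall(sysfs_path)
    return slots[-1] if slots else None
```

With the log above, rc0 maps to the PV951 remote on PCI slot 0000:0e:08.0, which `lspci -s 0e:08.0` should identify as one of the capture chips.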

Which led me to ...

https://sourceforge.net/p/lirc/mailman/ ... 40suas.cz/

It seems the driver thinks your misidentified capture cards have an LIRC-friendly piece of hardware (an infrared remote control TX/RX), which it's trying to set up and use.

This, fortunately, has nothing to do with the /etc/rc.d/rc.0...3 scripts for starting up the machine, and looking in there was a total red herring. It's just an unfortunate naming coincidence.

Very confusing!

Note, though, that the device the kernel/driver created ("rc0" and up) for this non-existent IR hardware is the source of your "ENXIO" errors (error -6, "No such device or address"). So something was trying to use IR hardware that you don't have.
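Negative numbers in kernel messages like these are usually negated errno values, so they can be decoded with Python's standard errno tables. A tiny sketch; `describe_kernel_error` is a hypothetical name, and the description text comes from the C library, so its wording may differ on non-glibc systems.

```python
import errno
import os


def describe_kernel_error(code):
    """Translate a kernel return code like -6 into its errno name and text."""
    n = abs(code)
    name = errno.errorcode.get(n, "unknown")
    return f"{code} = {name} ({os.strerror(n)})"
```

On a typical glibc-based Debian box, `describe_kernel_error(-6)` gives `-6 = ENXIO (No such device or address)`.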

Hard drive nodes swapping around can happen, especially if one drive is slow to respond, needs a reset, or gets disconnected and reconnected and ends up jumping over a letter (e.g. sda, sdb becomes sda (broken), sdb, sdc ...). That's why UUIDs and labels are more reliable: they reduce nasty accidents where the wrong things get mounted in the wrong places.

I suspect there's still something going wrong in there if you're having to patch things up manually like that, so you haven't entirely got to the bottom of it yet!

ZoneMinder Forums
