New ZM Install on Debian - Continuous rc Errors in syslog

Post by **Ruler** » Wed Dec 06, 2023 8:42 pm

Our ancient machine running ZoneMinder finally died recently, so I found a machine to replace it. Installed ZoneMinder and got it working. Got a complaint today that it wasn't responding, so checked it out. Turned out that there were millions of lines in /var/log/syslog like this:

2023-12-06T15:27:42.551930-05:00 vss-new kernel: [  351.497392] rc rc1: error -6
2023-12-06T15:27:42.551949-05:00 vss-new kernel: [  351.498154] rc rc3: error -6
2023-12-06T15:27:42.579927-05:00 vss-new kernel: [  351.525379] rc rc0: error -6
2023-12-06T15:27:42.595889-05:00 vss-new kernel: [  351.541434] rc rc2: error -6
2023-12-06T15:27:42.655898-05:00 vss-new kernel: [  351.601312] rc rc3: error -6
2023-12-06T15:27:42.655918-05:00 vss-new kernel: [  351.602082] rc rc1: error -6
2023-12-06T15:27:42.683931-05:00 vss-new kernel: [  351.629376] rc rc0: error -6
2023-12-06T15:27:42.703911-05:00 vss-new kernel: [  351.649364] rc rc2: error -6
2023-12-06T15:27:42.759887-05:00 vss-new kernel: [  351.705391] rc rc1: error -6
2023-12-06T15:27:42.759904-05:00 vss-new kernel: [  351.706097] rc rc3: error -6

It's repeating very, very, very fast, and my hypothesis is that the load produced by whatever is generating these messages is what caused it to stop responding.
Did a hard reset (as issuing a reboot command & then the three finger salute failed to reboot it), and after repairing the file system damage caused by this, started it back up again and it's functioning. The syslog file is still being spammed with these 4 messages, though. Amazingly, neither DDG nor Google turned up anything pertinent. Checked /etc: the rc[0-6].d directories are all set to 755, as are the scripts in /etc/init.d, which is where all the symbolic links in rc[0-6].d point. All are owned by root:root, which I believe is correct. The processes that are supposed to start with the machine do, so I'm somewhat at a loss as to how to proceed with troubleshooting.
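
To put a number on "very, very, very fast", the bracketed kernel timestamps in the excerpt above can be used to measure the gap between repeats for each rcN name. This is a minimal sketch that assumes every line has exactly the layout shown in the excerpt. On this excerpt each device repeats roughly every 0.1 s; a regular interval like that would suggest something retrying on a timer rather than reacting to events, though that is an inference, not a diagnosis.

```python
import re
from collections import defaultdict

# The log excerpt from the post above.
SAMPLE = """\
2023-12-06T15:27:42.551930-05:00 vss-new kernel: [  351.497392] rc rc1: error -6
2023-12-06T15:27:42.551949-05:00 vss-new kernel: [  351.498154] rc rc3: error -6
2023-12-06T15:27:42.579927-05:00 vss-new kernel: [  351.525379] rc rc0: error -6
2023-12-06T15:27:42.595889-05:00 vss-new kernel: [  351.541434] rc rc2: error -6
2023-12-06T15:27:42.655898-05:00 vss-new kernel: [  351.601312] rc rc3: error -6
2023-12-06T15:27:42.655918-05:00 vss-new kernel: [  351.602082] rc rc1: error -6
2023-12-06T15:27:42.683931-05:00 vss-new kernel: [  351.629376] rc rc0: error -6
2023-12-06T15:27:42.703911-05:00 vss-new kernel: [  351.649364] rc rc2: error -6
2023-12-06T15:27:42.759887-05:00 vss-new kernel: [  351.705391] rc rc1: error -6
2023-12-06T15:27:42.759904-05:00 vss-new kernel: [  351.706097] rc rc3: error -6
""".splitlines()

# syslog time, host, program, [kernel uptime], then "rc rcN: error E"
LINE_RE = re.compile(
    r"^\S+\s+(?P<host>\S+)\s+(?P<prog>[^:\s]+):\s+"
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+rc\s+(?P<dev>rc\d+):\s+error\s+(?P<err>-?\d+)"
)

def intervals_by_device(lines):
    """Return {device: [seconds between its consecutive messages]}.

    Uses the bracketed kernel uptime stamp, recorded when the kernel logged
    the message, rather than the time syslog wrote it to disk."""
    stamps = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            stamps[m["dev"]].append(float(m["uptime"]))
    return {
        dev: [round(later - earlier, 3) for earlier, later in zip(ts, ts[1:])]
        for dev, ts in sorted(stamps.items())
    }

if __name__ == "__main__":
    for dev, gaps in intervals_by_device(SAMPLE).items():
        print(dev, gaps)  # every gap here is roughly 0.10-0.11 s
```

Feeding it a longer stretch of /var/log/syslog (for example via `open(...).readlines()`) would show whether the interval stays that regular over time.
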

Wondering if anyone has encountered this on their systems or has any idea what error -6 from rc indicates.

Post by **dougmccrary** » Wed Dec 06, 2023 11:09 pm

Maybe some hardware issue?
Does stopping zm/mysql/et al. do anything?

Post by **Ruler** » Thu Dec 07, 2023 9:00 pm

Good thought. I tried stopping zoneminder, mysql, & apache2, checking /var/log/syslog after stopping each one - no change and those same 4 lines are continuously and rapidly written to syslog every time.

While doing this, I did discover something interesting... the errors were not being spammed to the console as they were when it stopped responding the other day. Wondering if I was running a tail -F /var/log/syslog on the console & forgot to break out of it when I left for the day. (I'm pulled in 3,491 different directions throughout the day & it isn't inconceivable that I forgot I had it running after getting called away on another task.) If that happened, I can believe the system spammed the console so fast that it overfilled whatever buffers are there for text output and this was the cause of the failure.

It's only been one day, but ZoneMinder was still running, so I'm going to cross my fingers that it's benign unless someone has ideas on how to continue troubleshooting.

Post by **Ruler** » Thu Dec 07, 2023 9:12 pm

I did think of a workaround... doesn't really fix whatever's causing this, but makes syslog useful again.

Created /etc/rsyslog.d/01-blocklist.conf and put these lines in it:

:msg,contains,"rc rc0: error -6" ~
:msg,contains,"rc rc1: error -6" ~
:msg,contains,"rc rc2: error -6" ~
:msg,contains,"rc rc3: error -6" ~

Restarted rsyslogd & /var/log/syslog is no longer being spammed. I'd *much* rather find what's causing this & fix it, but still haven't been able to find anything about this online.
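
Because `contains` in an rsyslog property-based filter is a plain, case-sensitive substring test, the effect of these four rules can be sanity-checked offline. This is a rough Python stand-in, not rsyslog itself, and the sample messages are made up from the log lines above. One thing it makes visible is that a message from any rc device other than rc0 to rc3 would pass straight through. (On current rsyslog versions, `stop` is the documented replacement for the legacy `~` discard action; `~` still works but can produce a deprecation warning.)

```python
# The four substrings from /etc/rsyslog.d/01-blocklist.conf above.
BLOCKED = [
    "rc rc0: error -6",
    "rc rc1: error -6",
    "rc rc2: error -6",
    "rc rc3: error -6",
]

def would_discard(msg: str) -> bool:
    """Rough stand-in for rsyslog's `:msg, contains, "..."` filter:
    a case-sensitive substring test against the message text."""
    return any(pattern in msg for pattern in BLOCKED)

if __name__ == "__main__":
    # Illustrative messages; only the first matches the real log format exactly.
    for msg in ["[  351.497392] rc rc1: error -6",
                "[  351.497392] rc rc4: error -6",
                "[  351.497392] RC RC1: ERROR -6"]:
        print(repr(msg), "->", "discarded" if would_discard(msg) else "kept")
```
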

Post by **mikb** » Fri Dec 08, 2023 6:24 pm

Ruler wrote: ↑Thu Dec 07, 2023 9:12 pm I did think of a workaround... doesn't really fix whatever's causing this, but makes syslog useful again.

Created /etc/rsyslog.d/01-blocklist.conf and put these lines in it:
:msg,contains,"rc rc0: error -6" ~
:msg,contains,"rc rc1: error -6" ~
:msg,contains,"rc rc2: error -6" ~
:msg,contains,"rc rc3: error -6" ~
Restarted rsyslogd & /var/log/syslog is no longer being spammed. I'd *much* rather find what's causing this & fix it, but still haven't been able to find anything about this online.

If that return value is an "errno", then 6 is ENXIO: No such device or address (which is something that sounds hardware related)

From the table partway down this page

https://stackoverflow.com/questions/503 ... rrno-means

Post by **Ruler** » Sat Dec 09, 2023 6:17 am

mikb wrote: ↑Fri Dec 08, 2023 6:24 pm If that return value is an "errno", then 6 is ENXIO: No such device or address (which is something that sounds hardware related)

From the table partway down this page

https://stackoverflow.com/questions/503 ... rrno-means

Thank you for that! Confirmed that same description is valid on the version of Debian that this machine is installed with. (They say that it can change between distros & releases... simply running perl -E 'say $!=shift' 6 returned "No such device or address".) Certainly sounds hardware related, but the problem now is figuring out what device is causing the trouble.
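
The same lookup can be done with Python's standard `errno` and `os` modules, in case Perl isn't to hand. Kernel code conventionally reports failure as a negated errno value, which is why the log shows -6 rather than 6; the sketch below simply drops the sign. Like the Perl one-liner, it reports the numbering of whatever machine it runs on, so run it on the affected box.

```python
import errno
import os

def describe_kernel_error(code: int) -> str:
    """Describe a kernel-style return value such as -6.

    Kernel functions conventionally return -ERRNO on failure, so the sign
    is dropped before looking the number up on *this* machine."""
    n = abs(code)
    name = errno.errorcode.get(n, "unknown errno")
    return f"{code}: {name} ({os.strerror(n)})"

if __name__ == "__main__":
    print(describe_kernel_error(-6))  # on Linux: -6: ENXIO (No such device or address)
```
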

The only add-on cards in this system are the 2 Spectra-8 capture cards, which are both working fine after much fiddling. Given that it affects all used runlevels - 0-3 (no GUI installed on this box) - I looked for commonalities in /etc/rc[0-3].d, thinking that I'd find one process in common that was being started at all these runlevels. Perplexingly, rc0.d & rc1.d have only kill scripts in them, which makes sense when you consider that 0 only gets entered when the system is shutting down. Which raises the question: why are the scripts executed when entering runlevel 0 continually spamming logs when the system is up and running?

user@vss-new:~$ ls /etc/rc0.d
K01apache2  K01apache-htcacheclean  K01hwclock.sh  K01mariadb  K01networking  K01udev  K01zoneminder
user@vss-new:~$ ls /etc/rc1.d
K01apache2  K01apache-htcacheclean  K01mariadb  K01zoneminder
user@vss-new:~$ ls /etc/rc2.d
K01apache-htcacheclean  S01apache2  S01console-setup.sh  S01cron  S01dbus  S01mariadb  S01rsync  S01ssh  S01zoneminder
user@vss-new:~$ ls /etc/rc3.d
K01apache-htcacheclean  S01apache2  S01console-setup.sh  S01cron  S01dbus  S01mariadb  S01rsync  S01ssh  S01zoneminder
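
Taking these listings at face value, the names common to all four directories can be computed mechanically. Done this way, hwclock.sh, networking and udev turn out to appear only in rc0.d, so once apache2, mariadb and zoneminder are ruled out, apache-htcacheclean is the only shared entry left. This only tells you which init scripts are linked at each runlevel; as later replies point out, the rcN names in the log may not refer to these directories at all.

```python
import re

# Directory listings copied from the post above.
LISTINGS = {
    "rc0.d": "K01apache2 K01apache-htcacheclean K01hwclock.sh K01mariadb "
             "K01networking K01udev K01zoneminder",
    "rc1.d": "K01apache2 K01apache-htcacheclean K01mariadb K01zoneminder",
    "rc2.d": "K01apache-htcacheclean S01apache2 S01console-setup.sh S01cron "
             "S01dbus S01mariadb S01rsync S01ssh S01zoneminder",
    "rc3.d": "K01apache-htcacheclean S01apache2 S01console-setup.sh S01cron "
             "S01dbus S01mariadb S01rsync S01ssh S01zoneminder",
}

# K = kill, S = start, followed by a two-digit ordering number.
LINK_RE = re.compile(r"^[KS]\d{2}(?P<name>.+)$")

def services(listing: str) -> set:
    """Strip the K/S prefix and ordering number from each rc link name."""
    return {LINK_RE.match(link)["name"] for link in listing.split()}

common = set.intersection(*(services(v) for v in LISTINGS.values()))

if __name__ == "__main__":
    print(sorted(common))  # ['apache-htcacheclean', 'apache2', 'mariadb', 'zoneminder']
```
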

The errors persist after I stop zoneminder, mysql, and apache2 services, so the apache hypertext cache cleaner, hardware clock daemon, networking, or udev must be the culprit. Of these, my suspicion is udev, but I've no idea how to get it to tell me details about what's causing the error. (Ever since it was introduced, I've felt that udev is unnecessarily complex & expects you to 'just know' WAY too much... not really a fan, but it is what it is.) My plan is to dig into this further on Monday, but wanted to thank you for the input right away as well as report on what little I found. I'm probably going off on the wrong track with this...

Post by **mikb** » Sat Dec 09, 2023 3:16 pm

Ruler wrote: ↑Sat Dec 09, 2023 6:17 am Thank you for that! Confirmed that same description is valid on the version of Debian that this machine is installed with. (They say that it can change between distros & releases... simply running perl -E 'say $!=shift' 6 returned "No such device or address".) Certainly sounds hardware related, but the problem now is figuring out what device is causing the trouble.

I don't understand why something in your /etc/rc.X directories would be so persistent, or why all of 0,1,2,3 seem to be running at once.

I have a horrible feeling that what it MIGHT be is -- through misconfiguration/typo -- something is trying to open VIDEO devices, e.g. /dev/video0 ... 1 ..2 3

And what's actually happening is "Open /etc/rc/rc.0/ for video please!" [ERROR: No, what are you on about?] and then trying in turn to open 1,2,3 ... which is doomed to fail as they are E-NOT-DEVICES but directories.

Look for something *really* annoyingly stupid like that, just in case!
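
One quick way to test that theory is to check what each candidate path actually is: a real capture device node such as /dev/video0 should be a character device, whereas /etc/rc0.d is a directory and could never be opened as a video device. The path list in this sketch is purely illustrative; substitute whatever paths ZoneMinder or any custom scripts are actually configured with.

```python
import os
import stat

def classify(path: str) -> str:
    """Report whether a path is a character device (what /dev/videoN
    should be), a directory, something else, or missing."""
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        return "missing"
    if stat.S_ISCHR(mode):
        return "character device"
    if stat.S_ISDIR(mode):
        return "directory"
    return "other"

if __name__ == "__main__":
    # Hypothetical list of paths a misconfigured tool might be handed.
    for p in ["/dev/video0", "/dev/video1", "/etc/rc0.d", "/etc/rc1.d"]:
        print(p, "->", classify(p))
```
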

Ruler wrote: ↑Sat Dec 09, 2023 6:17 am The only add-on cards that are in this system are the 2 Spectra-8 capture cards, which are both working fine after much fiddling.

Hmmm. Those are, yes. The "other" camera/capture devices it's looking for may not be so healthy.

Ruler wrote: ↑Sat Dec 09, 2023 6:17 am Of these, my suspicion is udev, but I've no idea how to get it to tell me details about what's causing the error.

Go and check any udev rules you've tampered with. Maybe one of them doesn't contain what you think it does! Especially those that might be related to assigning specific /dev/videoX devices to specific hardware items (to stop your video cards accidentally swapping around; same with CD/DVD and net devices, sometimes you need udev to nail them down so they don't renumber when one is removed or detected in the wrong order).
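
A quick way to follow that advice is to list every non-comment rule line that mentions video. This sketch assumes locally added rules live in /etc/udev/rules.d (the usual place for administrator rules on Debian); the search pattern is an arbitrary choice, so widen it (for example to "rc" or "input") if nothing turns up.

```python
import os
import re

def matching_lines(text, pattern="video"):
    """Return (line number, line) for non-comment lines matching pattern."""
    rx = re.compile(pattern, re.IGNORECASE)
    hits = []
    for lineno, line in enumerate(text.splitlines(), 1):
        stripped = line.strip()
        if stripped and not stripped.startswith("#") and rx.search(stripped):
            hits.append((lineno, stripped))
    return hits

def scan_rules(rules_dir="/etc/udev/rules.d", pattern="video"):
    """Print matching lines from every *.rules file in rules_dir."""
    if not os.path.isdir(rules_dir):
        print(f"{rules_dir} not found")
        return
    for name in sorted(os.listdir(rules_dir)):
        if name.endswith(".rules"):
            path = os.path.join(rules_dir, name)
            with open(path, encoding="utf-8", errors="replace") as fh:
                for lineno, line in matching_lines(fh.read(), pattern):
                    print(f"{path}:{lineno}: {line}")

if __name__ == "__main__":
    scan_rules()
```
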

Post by **iconnor** » Fri Dec 15, 2023 2:00 pm

I think I used to see these. They are coming from the kernel, not init runlevels. I think they are coming from the video capture add-on card drivers.

I don't think I was ever able to fix them.
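
One way to test that suggestion on the machine itself is to ask the kernel which driver each rcN device hangs off. This sketch assumes the rcN names are devices of the kernel's remote-control (rc-core) subsystem, which lists them under /sys/class/rc; the "rc rcN:" prefix has the class-name-then-device-name shape the kernel uses when logging against a device, but that is an inference, not something confirmed in this thread. Where no parent driver link exists, the sketch reports None.

```python
import os

def describe_rc_devices(root="/sys/class/rc"):
    """Return {rcN: driver name or None} for each entry under root.

    The driver name comes from the rcN/device/driver symlink, i.e. the
    driver bound to the parent device the rc device was registered under."""
    result = {}
    if not os.path.isdir(root):
        return result
    for name in sorted(os.listdir(root)):
        link = os.path.join(root, name, "device", "driver")
        result[name] = os.path.basename(os.readlink(link)) if os.path.islink(link) else None
    return result

if __name__ == "__main__":
    devices = describe_rc_devices()
    if not devices:
        print("no /sys/class/rc entries found")
    for name, driver in devices.items():
        print(name, "->", driver or "no driver link")
```

If the drivers listed turn out to be ones that came in with the capture cards, that would support this reading; `ls -l` on the same sysfs paths shows the links by hand.
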

Post by **mikb** » Fri Dec 15, 2023 6:22 pm

iconnor wrote: ↑Fri Dec 15, 2023 2:00 pm I think I used to see these. They are coming from the kernel, not init runlevels. I think they are coming from the video capture add-on card drivers.

I don't think I was ever able to fix them.

True on the source: The log messages do say "vss-new" (machine name) and source as "kernel:" -- you'd hope that e.g. udev would prefix its own errors/warnings with "udevd:" but maybe it's something the udev is passing to the kernel, and giving the kernel indigestion, which then udev just shrugs off and moves on from.

It's suspicious that it seems to be spitting out names of directories in /etc/rc.d -- which could lead you to think that there is a problem in those directories. But I think the problem is that something (e.g. udev?) is being fed those directories as input!

"List of my video devices to probe /dev/video0 /dev/video1 /etc/rc.d/* " ... it's very easy to end up with that sort of thing with a single misplaced bit of punctuation, or a rogue edit that joined two lines of a file together.

On that thought, I'd maybe experiment and create an /etc/rc.d/rc.fysh or rc.9 or something directory, with same permissions as the others, then reboot and see if I could get an extra error message from that. Obviously, remove it afterwards ...

Post by **Ruler** » Mon Dec 18, 2023 7:01 pm

I changed one of the log suppression rules, restarted rsyslogd, and confirmed that /var/log/syslog was again being spammed with these messages. Then I stopped the udev service and saw that the file was still being filled with these messages. Reverted the changes and syslog is again relatively quiet. To my knowledge, I didn't change any udev rules after system installation. There is a systemd rule that was put in, but all it does is enable /etc/rc.local. (I scripted that anyway, as I got sick of looking it up on every system I install.)

Interestingly, /etc/rc[4,5,6].d all exist (odd as I don't *think* I told it to install a GUI when setting up the operating system, but honestly do not remember), but messages about them are not being dumped into the system log. This is one reason that I originally thought it was something in one of the directories triggering it... system only reaches runlevel 3, not having a GUI, and that's where the errors stopped.

It's somehow nice to hear that I'm not alone in noticing these messages, iconnor - the system has been up and running stable since I put in the suppression rules on the 8th, so I'm not going to worry about it too much. While part of me would still like to figure out what's causing them, from what I can tell, I'm down to basically grasping at straws to identify why. If you weren't able to isolate it, I don't stand much of a chance...

---

Here are the exact details of the system for anyone who happens to find this thread in the future: There are two capture cards in the system, model Spectra-8, made by iTuner Networks Corp. Each card has 4 BT878 capture chips on it. There is also a small 'expansion board' that plugs directly into the main card with a ribbon cable - basically provides 4 more BNC inputs on the back of the machine. Each card has 8 cameras configured, two per chip (one on channel 0 and another on channel 1), for a total of 16 on the system. Experimentally, I found that I need 4 captures per frame to prevent the signal from one channel from bleeding into the other, which also has the added benefit of capping the frame rate to between 3 and 4 FPS per camera. (Strangely, if I put a value in the Max FPS field, I get the same image bleed between channels. Management does not want to pony up for a large hard drive, yet wants a decent amount of historical recordings retained before the space is recycled, so the frame rate being capped by the captures per frame setting is a significant advantage in my situation.) The parameters passed to the bttv driver module by our old Slackware system in /etc/modprobe.conf were lifted directly from the manual that came with the card & reused for this machine, only located in /etc/modprobe.d/bttv.conf on this new Debian system. They are:

options bttv card=42,42,42,42 radio=0 tuner=4 chroma_agc=1 vbibufs=4 v4l2=1 gbuffers=16

Post by **mikb** » Tue Dec 19, 2023 6:16 pm

Ruler wrote: ↑Mon Dec 18, 2023 7:01 pm Interestingly, /etc/rc[4,5,6].d all exist

Hmmm. So that's a good stand-in for rc.9/rc.fish suggestion. So something is only ingesting four indigestible pieces of directory, expecting video devices. Why only four?

options bttv card=42,42,42,42 radio=0 tuner=4 chroma_agc=1 vbibufs=4 v4l2=1 gbuffers=16

"Only four" ... could be a coincidence though.

I still think something is pointing to /etc/rc.d and treating it like a directory where N video devices can be found. Oh look, here they are, rc.0 rc.1 rc.2 rc.3 ... fail fail fail fail. OK, moving on ...
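
The "why only four?" question can at least be made concrete by counting. According to the system details posted above, there are two cards with four BT878 chips each, eight chips in total, yet `card=` lists four values. The parser below is a simple illustration, not how modprobe itself reads the file, and whether the four-entry `card=` list has anything to do with the four rc devices is untested speculation.

```python
def parse_modprobe_options(line: str):
    """Parse an 'options <module> key=value ...' line from modprobe.d.
    Every value is split on commas into a list of strings."""
    words = line.split()
    if len(words) < 2 or words[0] != "options":
        raise ValueError("not an 'options' line")
    module, params = words[1], {}
    for word in words[2:]:
        key, _, value = word.partition("=")
        params[key] = value.split(",")
    return module, params

# The line from /etc/modprobe.d/bttv.conf quoted above.
module, params = parse_modprobe_options(
    "options bttv card=42,42,42,42 radio=0 tuner=4 chroma_agc=1 vbibufs=4 v4l2=1 gbuffers=16"
)
chips = 2 * 4  # two Spectra-8 cards with four BT878 chips each (from the post)

if __name__ == "__main__":
    print(module, "card entries:", len(params["card"]), "of", chips, "chips")
```
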

Post by **Ruler** » Fri Feb 16, 2024 8:25 pm

OK, looks like I've gotta find and fix the source of this if I'm to have a stable system...

I left the zmaudit.pl script (viewtopic.php?t=33055) running when I clocked out last night to clean up events that were left behind after destroying & recreating the file system for the storage area (viewtopic.php?t=33019), because the system ran out of inodes long before it ran out of space. When I got in this morning, the ssh session from my PC to the server had been disconnected. Went to the console and saw these same errors scrolling as fast as the system could display them, and it didn't respond to any key presses or the power button. Had to hold the power button in to forcibly kill power and get the system to reboot, then dealt with the file system corruption that resulted. This is the second time this has happened - I figured that I had screwed up when changing the syslog config last time, when I was testing various hypotheses as to the cause of these, but I haven't touched it since, so this time can't be attributed to that.

While rebooting after repairing the file system corruption, I noticed something else there are four of in this system: CPU cores. How that might cause something like this is beyond me, but it's one more thing besides the 4 tuners on each of the two capture cards in the system.

Still haven't been able to find anything relevant about this when searching the internet and would be grateful for any tips on how to trace down what's causing it. Unfortunately, just leaving it doesn't seem to be a viable solution with the issue popping up at random & making the entire system lock up when it does.

Post by **mikb** » Sat Feb 17, 2024 6:08 pm

Ruler wrote: ↑Fri Feb 16, 2024 8:25 pm OK, looks like I've gotta find and fix the source of this if I'm to have a stable system...

... ssh session from my PC to the server had been disconnected.

[console] ... didn't respond to any key presses or the power button. Had to hold the power button in to forcibly kill power and get the system to reboot, then dealt with the file system corruption that resulted.

So original ssh session went away (and not just from an idle timeout because you went to sleep?)

Can't ssh in to get a new session?

No response to keypresses (not even CAPS LOCK -> LED toggle? -- That's my go-to 'Are you even alive?' test ...)

I assume you tried Ctrl-Alt-Del, and if no good ... did you try the magic-Sysrq-key sequence? (Ctrl-Alt-SysRq all held down, then in turn, R E I S U B waiting for a few seconds after each key) to try and at least forcibly unmount filesystems before a boot?
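
Whether those keys do anything depends on the kernel.sysrq setting (readable from /proc/sys/kernel/sysrq): 0 disables SysRq, 1 allows everything, and other values are a bitmask of permitted functions. This small decoder, with bit values taken from the kernel's sysrq documentation, shows which REISUB steps a given mask permits. A mask of 438, for instance, allows R, S, U and B but not E and I, which need the process-signalling bit (64).

```python
# Bit values from the kernel's sysrq documentation.
SYSRQ_BITS = {
    2: "console logging level",
    4: "keyboard control (SAK, unraw)",
    8: "debugging dumps of processes",
    16: "sync",
    32: "remount read-only",
    64: "signalling of processes (term, kill, oom-kill)",
    128: "reboot/poweroff",
    256: "nicing of RT tasks",
}

# The bit each REISUB key depends on.
REISUB = {"r": 4, "e": 64, "i": 64, "s": 16, "u": 32, "b": 128}

def reisub_allowed(mask: int) -> dict:
    """Return {key: allowed?} for REISUB; 0 disables all, 1 enables all."""
    if mask == 1:
        return {key: True for key in REISUB}
    return {key: bool(mask & bit) for key, bit in REISUB.items()}

def enabled_features(mask: int) -> list:
    """List the SysRq function groups a mask enables."""
    if mask == 1:
        return list(SYSRQ_BITS.values())
    return [desc for bit, desc in SYSRQ_BITS.items() if mask & bit]

if __name__ == "__main__":
    try:
        with open("/proc/sys/kernel/sysrq") as fh:
            mask = int(fh.read().strip())
    except OSError:
        print("could not read /proc/sys/kernel/sysrq")
    else:
        print("kernel.sysrq =", mask)
        print("REISUB:", reisub_allowed(mask))
        print("enabled:", enabled_features(mask))
```
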

I suppose you are losing any useful syslog/messages files when the system locks up/gets booted that might indicate the start of things going wrong? It may require you setting syslogging to be sent to another machine or a serial port to see what happened.

Post by **Ruler** » Sun Feb 18, 2024 1:02 am

Yep - ssh session was disconnected. Couldn't connect via ssh again or even ping the machine.

Didn't check caps / num lock. I did give it the three finger salute multiple times, even holding them down for about 10 seconds, in the hope of getting it to reboot cleanly. This is the first I've heard of the sysrq key sequence & have added it to my list of tricks for next time.

Great idea about setting up remote logging - I'll install an old PC and get that going when I get back in the office.
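
Until the old PC has a proper syslog daemon set up (rsyslog can receive as well as send), a throwaway listener like this can confirm that forwarded messages actually arrive. This is a minimal sketch under a few assumptions: the server forwards over UDP to port 5514 on the old PC (an arbitrary unprivileged port; the standard syslog port 514 needs root), and each datagram carries one message. If userspace logging itself is dying, as mikb suggests below, the kernel's netconsole module, which sends kernel messages over UDP without going through syslog, may be worth a look as a source to point at a listener like this.

```python
import datetime
import socketserver

class SyslogUDPHandler(socketserver.BaseRequestHandler):
    """Append each received datagram to a file with arrival time and sender."""

    def handle(self):
        data, _sock = self.request  # UDP servers pass (payload, socket)
        text = data.decode("utf-8", errors="replace").rstrip("\r\n\x00")
        stamp = datetime.datetime.now().isoformat(timespec="seconds")
        with open(self.server.log_path, "a", encoding="utf-8") as fh:
            fh.write(f"{stamp} {self.client_address[0]} {text}\n")

def make_server(host="0.0.0.0", port=5514, log_path="remote-syslog.log"):
    """Create (but do not start) the UDP listener."""
    server = socketserver.UDPServer((host, port), SyslogUDPHandler)
    server.log_path = log_path  # read by the handler via self.server
    return server
```

Start it with `make_server().serve_forever()` and stop it with Ctrl-C; because the file is reopened for every message, whatever arrived before a lock-up should already be on the old PC's disk.
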

Thing that confuses me is that the errors are continual unless I put the suppression rules in place. Then it shuts up and runs perfectly fine until, randomly and out of the blue, the system decides to disregard the suppression rules & start spamming the same messages to the console again, this time causing the system to lock up. I'm extremely curious as to what's causing them & wonder even more about why the suppression rules randomly stop working.

Post by **mikb** » Sun Feb 18, 2024 7:20 pm

Ruler wrote: ↑Sun Feb 18, 2024 1:02 am the system decides to disregard the suppression rules & start spamming the same messages to the console again

Just a theory, but maybe things quitting/exiting is a sign of something fundamentally wrong -- so wrong, in fact, that syslogd/klogd or the equivalent part of systemd ... is exiting too. This means that the normal path of messages (that would be captured, filtered and filed according to your rules) is gone.

Kernel is still running and still has stuff to say. So, last resort, spit them onto the physical console.

"Remote logging" might turn into a camera pointed at the monitor at this rate

ZoneMinder Forums