After reboot, some cameras are missing and never appear

Forum for questions and support relating to the 1.24.x releases only.
skier
Posts: 29
Joined: Thu Dec 10, 2009 11:54 am

Post by skier »

Disable ACPI on BIOS...


There is no option to do that in this BIOS.
and daemon...
What daemon would you like me to disable?
CoYoTe
Posts: 33
Joined: Sat Jul 18, 2009 12:56 pm
Location: Buenos Aires, Argentina

Post by CoYoTe »

I have the same issue here, with an ECS board and an AMD processor.

I noticed that if you restart ZoneMinder, everything goes OK.

It looks like the daemons and the hardware lose sync on startup (it's weird: some cams start with a green screen or with a line of video on top), but the problem is solved with a "zmpkg.pl restart".

I tried all kinds of modifications, switching PCI slots, etc., but the problem persisted.

I updated the server, and it looks like it was a kernel issue, because now it works perfectly.

Code: Select all

bagservice:~# uname -a
Linux bagservice.com.ar 2.6.32-trunk-amd64 #1 SMP Thu Dec 17 06:29:18 UTC 2009 x86_64 GNU/Linux
So I suggest you try updating the kernel.

Greetings
Alejandro
skier
Posts: 29
Joined: Thu Dec 10, 2009 11:54 am

Post by skier »

skier wrote:How do I start a missing zmc (capture daemon) from the command line?

I tried

/usr/local/bin/zmc -d /dev/video3

but it just returned silently and the process wasn't there in ps (plus, it would have been owned by me rather than by www-data).

There must be a wrapper script somewhere that invokes this properly. What is it?
I answer my own question here: by reading through the source of zmwatch.pl I found that the "official" way to start the missing zmc would be

zmdc.pl restart zmc -d /dev/video3

If I do that on the /dev/videoX of a running camera, it does indeed restart the corresponding zmc daemon: I see a new process number in ps.

However, for the cameras that don't come on (right now 0 and 3), this has no visible effect, which is in a sense understandable, otherwise zmwatch.pl would have resurrected those cameras by itself.


So, in summary: I now know the command, but it doesn't work. Something else is preventing the zmc from running. If it is an IRQ conflict, it's unclear to me why only those channels have a problem with the conflict whereas they're all in conflict with each other in more or less the same way.
skier
Posts: 29
Joined: Thu Dec 10, 2009 11:54 am

Post by skier »

CoYoTe wrote:I have the same issue here, with an ECS board and an AMD processor.

I noticed that if you restart ZoneMinder, everything goes OK.
Unfortunately this is not the case for me.
It looks like the daemons and the hardware lose sync on startup (it's weird: some cams start with a green screen or with a line of video on top), but the problem is solved with a "zmpkg.pl restart".
At the moment I am missing channels 0 and 3. I issued your command and they are still missing. At this stage I'm clutching at straws and trying anything, but it would be good to understand WHY things don't work...
I tried all kinds of modifications, switching PCI slots, etc., but the problem persisted.

I updated the server, and it looks like it was a kernel issue, because now it works perfectly.

Code: Select all

bagservice:~# uname -a
Linux bagservice.com.ar 2.6.32-trunk-amd64 #1 SMP Thu Dec 17 06:29:18 UTC 2009 x86_64 GNU/Linux
So I suggest you try updating the kernel.
Well, thanks a lot for your message and suggestions.
I'm on 2.6.31-16-server #53-Ubuntu SMP Tue Dec 8, which is the latest available in the distro's repositories. 2.6.32 isn't out yet. I'll have to compile it myself...

From which kernel did you upgrade when you fixed your problems? Was it also a fairly recent one?
skier
Posts: 29
Joined: Thu Dec 10, 2009 11:54 am

Post by skier »

@Coyote: compiling the 2.6.32 kernel as we speak... We'll see!



@all:

when I run

zmdc.pl restart zmc -d /dev/video3

and it fails to start a zmc, where can I find the error messages it may have written?
skier
Posts: 29
Joined: Thu Dec 10, 2009 11:54 am

Post by skier »

@Coyote:

OK, I have now compiled and installed a 2.6.32 kernel.

What do I get? All channels minus /dev/video3.
Restarting zm with

Code: Select all

sudo /etc/init.d/zm restart
or with

Code: Select all

sudo su www-data 
zmpkg.pl restart
does not revive channel 3.

Maybe I have now got a hardware fault on 3? Maybe, as mastertheknife said might happen, because of the lack of heatsinks on the Conexant chips?


The diagnostic output looks the same as before.

Code: Select all

$ cat /proc/interrupts 
           CPU0       CPU1       
  0:         26          0   IO-APIC-edge      timer
  1:          8          0   IO-APIC-edge      i8042
  8:          0          0   IO-APIC-edge      rtc0
  9:          0          0   IO-APIC-fasteoi   acpi
 12:        132          0   IO-APIC-edge      i8042
 16:          0          0   IO-APIC-fasteoi   uhci_hcd:usb3
 17:          0          0   IO-APIC-fasteoi   pata_marvell
 18:          0          0   IO-APIC-fasteoi   ehci_hcd:usb1, uhci_hcd:usb7
 19:          0          0   IO-APIC-fasteoi   uhci_hcd:usb6
 20:      40915          0   IO-APIC-fasteoi   bttv3, bttv7
 21:      82803          0   IO-APIC-fasteoi   uhci_hcd:usb4, bttv0, bttv4
 22:      68641          0   IO-APIC-fasteoi   bttv1, bttv5
 23:      57129          0   IO-APIC-fasteoi   ehci_hcd:usb2, uhci_hcd:usb5, bttv2, bttv6
 29:      44172          0   PCI-MSI-edge      ahci
 30:       3121          0   PCI-MSI-edge      eth0
 31:          1          0   PCI-MSI-edge      i915
NMI:          0          0   Non-maskable interrupts
LOC:      40890      60270   Local timer interrupts
SPU:          0          0   Spurious interrupts
PMI:          0          0   Performance monitoring interrupts
PND:          0          0   Performance pending work
RES:        343        334   Rescheduling interrupts
CAL:         26         67   Function call interrupts
TLB:       1443        294   TLB shootdowns
TRM:          0          0   Thermal event interrupts
THR:          0          0   Threshold APIC interrupts
MCE:          0          0   Machine check exceptions
MCP:          2          2   Machine check polls
ERR:          0
MIS:          0

Code: Select all

$ dmesg | fgrep IRQ | fgrep bttv
[    5.626719] bttv 0000:07:08.0: PCI INT A -> GSI 21 (level, low) -> IRQ 21
[    5.641455] IRQ 21/bttv0: IRQF_DISABLED is not guaranteed on shared IRQs
[    7.691374] bttv 0000:07:09.0: PCI INT A -> GSI 22 (level, low) -> IRQ 22
[    7.691409] IRQ 22/bttv1: IRQF_DISABLED is not guaranteed on shared IRQs
[    9.741352] bttv 0000:07:0a.0: PCI INT A -> GSI 23 (level, low) -> IRQ 23
[    9.741394] IRQ 23/bttv2: IRQF_DISABLED is not guaranteed on shared IRQs
[   11.791385] bttv 0000:07:0b.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20
[   11.791423] IRQ 20/bttv3: IRQF_DISABLED is not guaranteed on shared IRQs
[   13.841406] bttv 0000:07:0c.0: PCI INT A -> GSI 21 (level, low) -> IRQ 21
[   13.841449] IRQ 21/bttv4: IRQF_DISABLED is not guaranteed on shared IRQs
[   15.891347] bttv 0000:07:0d.0: PCI INT A -> GSI 22 (level, low) -> IRQ 22
[   15.891390] IRQ 22/bttv5: IRQF_DISABLED is not guaranteed on shared IRQs
[   17.941342] bttv 0000:07:0e.0: PCI INT A -> GSI 23 (level, low) -> IRQ 23
[   17.941500] IRQ 23/bttv6: IRQF_DISABLED is not guaranteed on shared IRQs
[   19.991339] bttv 0000:07:0f.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20
[   19.991383] IRQ 20/bttv7: IRQF_DISABLED is not guaranteed on shared IRQs
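To see at a glance which interrupt lines the capture chips share, the /proc/interrupts listing above can be condensed. This is only a sketch: bttv_irq_sharing is a made-up helper name, and it assumes the column layout shown in this post (IRQ number, one counter per CPU, a single controller column, then the device list). Other kernels may lay the file out differently.

```shell
# Summarise which IRQs carry bttv devices and what else shares them.
# Reads /proc/interrupts-style text on stdin.
# Assumes the layout shown in the post: "NN: count... TYPE dev1, dev2".
bttv_irq_sharing() {
    awk '
        /bttv[0-9]/ {
            irq = $1
            sub(/:$/, "", irq)
            i = 2
            while (i <= NF && $i ~ /^[0-9]+$/) i++   # skip per-CPU counters
            i++                                       # skip the controller column
            devs = ""
            n = 0
            while (i <= NF) {
                d = $i
                gsub(/,/, "", d)                      # drop list separators
                devs = (devs == "" ? d : devs " " d)
                n++
                i++
            }
            printf "IRQ %s: %d device(s): %s\n", irq, n, devs
        }'
}
```

It would be run as `bttv_irq_sharing < /proc/interrupts`. On the listing above, every IRQ from 20 to 23 shows two bttv devices, and IRQs 21 and 23 carry USB controllers as well.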
skier
Posts: 29
Joined: Thu Dec 10, 2009 11:54 am

Post by skier »

STOP PRESS --

It wasn't a hardware fault: the hardware of all 8 channels works fine.

It wasn't the old kernel: it is possible to get a picture from all 8
channels simultaneously even on 2.6.31.14-generic.

It wasn't an IRQ conflict: it is possible to get a picture from all 8
channels even with a setup where each IRQ from 20 to 23 has two video
channels and sometimes other things such as usb items (like
"uhci_hcd:usb4")

I can't cut and paste the output of dmesg easily because I don't have
ssh access into that box right now, but it has the same error message
(IRQF_DISABLED is not guaranteed on shared IRQs) that was earlier
flagged as pointing to the problem.

What I did:

* changed the hard disk with another one that also had ubuntu 9.10 (but
desktop edition, not server)

* installed zm from the distribution

* fiddled with it until I got a picture (adding a bttv.conf in
/etc/modprobe.d and that kind of bullshit)

[By the way: I *never* got a proper picture with xawtv---always broken
up, no matter what format I chose. Whereas with ZM I get a clean
picture on v4l2, PAL I, YUV420, 320x240.]

* added 8 monitors, one per card

* ...and I could get a montage of all 8 working together, as in the good
old times.

Now of course this is not the "proper" setup: it's the desktop edition
rather than the server, it doesn't have ssh access and all the
associated anti-intrusion thingies, it runs zm 1.24.1 instead of
1.24.2, it has lots of other junk software on it, the hd is too small,
it probably still has the bug where the video feed crashes after 15
mins etc etc etc.

*BUT* it proved that the hardware can work, with all 8 channels, with
the current IRQ settings (whatever they are) and so on. So it's "only"
a matter of transferring to the other setup whatever pixie magic makes
this one work.

Now, to be sure, this one doesn't work perfectly either. When I reboot
it, does it come back up with all 8 cameras going? No way. I just
rebooted, and I got only one camera. Then I restarted zm, and I got 6
working (all but 6 and 7). Then restarted zm again, and I got 7
working (all but 7). Then once again, and they were finally all
up. Including /dev/video3.

I still haven't found a solution that will work unattended and come
back up properly with all 8 cameras working after a power cut. But
some of the issues I have been banging my head against over the past
few days, with help from you guys (which I still appreciate---gotta
try SOMETHING!), seem not to be the cause of the problem.

More clues welcome...
cordel
Posts: 5210
Joined: Fri Mar 05, 2004 4:47 pm
Location: /USA/Washington/Seattle

Post by cordel »

The good thing is that you know it's not in fact the hardware.
But it also means that something is off, misconfigured, or broken in your previous install, and that it has been overlooked so far. I doubt it is distro-specific, or I suspect we would hear more complaints.

It might be worth putting the other drive back, going step by step through one of the install tutorials, and verifying what was done. Also, as mentioned in all the troubleshooting links, enable debug at about level 5 and see what errors show up in the debug logs.

Short of having a real error, we can only give a best guess.
skier
Posts: 29
Joined: Thu Dec 10, 2009 11:54 am

Post by skier »

What I've been doing is to take yet another hard disk and restart from zero: BIOS reset to the system defaults, HD reformat, clean install of ubuntu server, then zm only (1.24.1 from the distro) and no other software, bttv.conf and all that stuff, definition of the cameras one by one and trying to get to the stage where I used to have the problem.

I'm very glad to say that, thanks to the info from the debug logs, I got my EUREKA moment.

The log of the aborted zmc said that the zone extended outside the image dimensions. But I hadn't even DEFINED any zone in the first place! So...?

Didn't make much sense at first; but with this hint I eventually found the reason.

Since entering or editing the parameters for 8 cameras is so fiddly and time-consuming through the web UI, especially when you have to do it several times, I had defined one camera just like I wanted it and then I had entered the data for the others by dumping out the Monitors table in sql, editing it as text, reimporting it, and restarting the server.

"Obviously" this attempt at being clever backfired, because behind my back zm had defined its own zones that now no longer matched the new camera settings (eg new size). "Obviously" when you change the width and height through the web interface and press save, something else happens behind the scenes other than just saving those---probably including redefining the default zones. This of course didn't happen when I imported my edited sql dump and so I was left with an internally inconsistent setup that prevented zmc from starting for that camera on channel 3.

Creating all the cameras and redefining all the parameters by hand, though laborious, led to a working setup. The shortcut I had attempted, instead, led to failure. (It would still be good to know how to define the setup of all the cameras with an editable and a search-and-replaceable text file, so that eg if you want to change the label format or the picture size on all the cameras you can do it easily in one go; but that's a separate problem.) Now I have all 8 cameras at the same time again.
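The kind of inconsistency described above (a zone left over from the old image size) can be checked mechanically before restarting anything. This is a minimal sketch under two assumptions that this thread does not confirm: that a zone is stored as a space-separated list of x,y pixel pairs, and that valid coordinates run from 0 to width-1 and from 0 to height-1. zone_fits is a hypothetical helper name, and the sizes in the usage note are illustrative only.

```shell
# zone_fits WIDTH HEIGHT "x1,y1 x2,y2 ..."
# Succeeds only if every point lies inside a WIDTH x HEIGHT image.
# The coordinate format is an assumption, not taken from the thread.
zone_fits() {
    width=$1
    height=$2
    for point in $3; do
        x=${point%,*}    # text before the comma
        y=${point#*,}    # text after the comma
        if [ "$x" -lt 0 ] || [ "$x" -ge "$width" ] ||
           [ "$y" -lt 0 ] || [ "$y" -ge "$height" ]; then
            echo "point $point is outside ${width}x${height}"
            return 1
        fi
    done
    return 0
}
```

For example, `zone_fits 320 240 "0,0 639,0 639,479 0,479"` fails on the second point. That is the sort of mismatch left behind when a monitor's width and height are changed directly in the database and its zones are not updated to match.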

Do the cameras now all reappear after a reboot? Sadly not. So the original problem still hasn't been solved. But at least I'm making some progress, and I have an explanation for at least some of the strange behaviour I observed (ie why wasn't /dev/video3 ever coming back up lately?).
skier
Posts: 29
Joined: Thu Dec 10, 2009 11:54 am

Post by skier »

I am investigating the after-I-reboot-the-pc-some-cameras-arent-there issue on the reinstalled system where I know I can have all 8 going at some point.

What seems to happen (different from before) is that a zmc can't start because its /dev/videoX isn't even there! Apparently it takes a very long time (like: several minutes) for some of them to "appear". Check this out:

Code: Select all

$ dmesg | fgrep video
[    0.531194] pci 0000:00:02.0: Boot video device
[    5.443929] Linux video capture interface: v2.00
[   86.090082] bttv0: registered device video0
[  166.530083] bttv1: registered device video1
[  246.970097] bttv2: registered device video2
[  327.410089] bttv3: registered device video3
[  407.850083] bttv4: registered device video4
[  488.290095] bttv5: registered device video5
[  568.730104] bttv6: registered device video6
[  649.170087] bttv7: registered device video7
So after two minutes (120 seconds) we still only have /dev/video0. It takes over 10 minutes (600 seconds) before we have all 8. And the video devices come up in numerical order, pretty regularly, about one every 80 seconds or so. When they do, the corresponding zmc processes also appear, and so do the pictures.
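The spacing can be read straight off the dmesg timestamps. A small sketch (bttv_register_gaps is a made-up name) that prints the gap between consecutive "registered device" lines, assuming the bracketed seconds-since-boot timestamp format shown above:

```shell
# Print the delay between consecutive "registered device videoN" lines.
# Reads dmesg output on stdin; expects "[ seconds.micros] ..." timestamps.
bttv_register_gaps() {
    awk '
        /registered device video/ {
            ts = $0
            sub(/^\[ */, "", ts)     # strip the opening bracket and padding
            sub(/\].*/, "", ts)      # strip everything from the closing bracket
            t = ts + 0
            if (seen) printf "%s: +%.2f s\n", $NF, t - prev
            prev = t
            seen = 1
        }'
}
```

Fed the listing above with `dmesg | bttv_register_gaps`, each device from video1 onwards comes out about 80.44 s after the previous one. A gap that regular fits a fixed timeout better than it fits slow hardware.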

It seems a very long time but if it did it consistently then I wouldn't worry so much: I'd just know that after a power cycle I need to wait 15 minutes before the system is fully up and running. No problem, so long as it really is after that!

More experiments needed to prove or disprove this.

Behind it, though, there's surely some other bug. 80 seconds is too long for a piece of electronics. It's certainly a timeout, so I'd have to figure out what is timing out on what else. But that's now low priority if that's really all that's happening.
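If slow device registration really is the whole story, one possible workaround is to hold back the ZoneMinder start until every device node exists. This is a sketch only: wait_for_paths is a hypothetical helper, and the 900-second limit in the usage note is just the "15 minutes" figure from above.

```shell
# wait_for_paths TIMEOUT_SECONDS PATH...
# Polls once per second until every PATH exists.
# Returns 0 when all are present, 1 if TIMEOUT_SECONDS elapse first.
wait_for_paths() {
    timeout=$1
    shift
    elapsed=0
    while :; do
        missing=0
        for p in "$@"; do
            [ -e "$p" ] || missing=$((missing + 1))
        done
        [ "$missing" -eq 0 ] && return 0
        [ "$elapsed" -ge "$timeout" ] && return 1
        sleep 1
        elapsed=$((elapsed + 1))
    done
}
```

It could be used as `wait_for_paths 900 /dev/video0 /dev/video1 /dev/video2 /dev/video3 /dev/video4 /dev/video5 /dev/video6 /dev/video7 && /etc/init.d/zm start`. This does nothing about the underlying delay. It only stops zmc from being launched against devices that are not there yet.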
th
Posts: 23
Joined: Wed Dec 30, 2009 4:05 am
Location: Iowa

Post by th »

I know I'm a couple weeks behind your efforts here but I fought this same problem this weekend after I moved from two systems with a PV-149 in each into a single system with both cards. (I had to move from 64 bit to 32 bit as well.)

I started having the same random camera going red/disappearing after a couple hours running on the new consolidated server. It would fix itself after a system reboot. And even then, it only worked for a couple hours. Same errors in the logs as you discussed.

I googled for hours on it. This thread was the most useful. (Thanks!) But it didn't seem to give me a definitive solution.

So as of 12 hours ago, I added "noapic" and "acpi=off" to the kernel boot. And after that, the cameras stayed active. Running 8 cameras with great success!

Thanks to your tips. And hopefully my addition helps someone.

CentOS 5.3 i686 (yum updated to 5.4)
ZM 1.24.2
(2x) PV-149
(7x) Vissior XR-325
(1x) Vissior XR-350
th
Posts: 23
Joined: Wed Dec 30, 2009 4:05 am
Location: Iowa

Post by th »

So much for that suggestion. Just under 24 hours after the boot with the above settings, one of the 8 cams flaked out with the same errors again. It seemed to prevent it for longer than usual. Or maybe it occurs in longer intervals after a reboot than I suspected.

Code: Select all

Jan 18 08:07:02 zoom kernel: bttv0: SCERR @ 37430000,bits: HSYNC FBUS SCERR*
Jan 18 08:07:03 zoom last message repeated 8 times
Jan 18 08:07:03 zoom kernel: bttv0: timeout: drop=0 irq=4994691/38207616, risc=37430000, bits: HSYNC FBUS
Jan 18 08:07:03 zoom zmc_dvideo0[2750]: ERR [Sync failure for frame 7 buffer 1: Input/output error]
Jan 18 08:07:03 zoom zmc_dvideo0[2750]: ERR [Failed to capture image from monitor 4 (0/1)]
Jan 18 08:07:03 zoom zmdc[2692]: ERR ['zmc -d /dev/video0' exited abnormally, exit status 255] 
Jan 18 08:07:03 zoom zmdc[2692]: INF [Starting pending process, zmc -d /dev/video0] 
Jan 18 08:07:03 zoom zmdc[2692]: INF ['zmc -d /dev/video0' starting at 10/01/18 08:07:03, pid = 20268] 
Jan 18 08:07:03 zoom zmdc[20268]: INF ['zmc -d /dev/video0' started at 10/01/18 08:07:03] 
Jan 18 08:07:03 zoom zmc_dvideo0[20268]: INF [Debug Level = 0, Debug Log = <none>]
Jan 18 08:07:04 zoom zmc_dvideo0[20268]: INF [Starting Capture]
Jan 18 08:07:04 zoom kernel: bttv0: reset, reinitialize
Jan 18 08:07:04 zoom kernel: bttv0: PLL can sleep, using XTAL (28636363).
Jan 18 08:07:04 zoom kernel: bttv0: SCERR @ 37430000,bits: VSYNC HSYNC FBUS SCERR*
Jan 18 08:07:04 zoom last message repeated 3 times
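For anyone wondering where the zmc and zmdc error messages end up: in the excerpt above they go to the system log next to the kernel messages. This rough filter keeps only the ZoneMinder ERR lines. zm_errors is a made-up name, and the pattern assumes the "process[pid]: ERR [...]" layout shown here.

```shell
# Keep only ZoneMinder error lines (zmc_*, zmdc, ...) from syslog text on stdin.
# The pattern assumes the "name[pid]: ERR [...]" layout seen in the excerpt.
zm_errors() {
    grep -E ' zm[a-z]+(_[A-Za-z0-9]+)?\[[0-9]+\]: ERR '
}
```

The log file location is an assumption that depends on the distribution. It is typically /var/log/messages on CentOS and /var/log/syslog on Ubuntu, so usage would be something like `zm_errors < /var/log/messages`.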
th
Posts: 23
Joined: Wed Dec 30, 2009 4:05 am
Location: Iowa

Post by th »

I never did find a fix for this. Some forums just said the capture card was not as compatible with older hardware. So perhaps my system is "older hardware".

I did manage to use this workaround in a cron job running every minute:

Code: Select all

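#!/bin/bash
# Cron workaround: reload the bttv driver when a capture timeout shows up in dmesg.

# Look only at the last few kernel messages for a bttv timeout line.
CHECK=$(dmesg | tail -7 | grep "bttv[0-9]: timeout: drop=")

if [ -n "$CHECK" ]; then
    service zm stop      # stop ZoneMinder so the driver is no longer in use
    modprobe -r bttv     # unload the driver
    sleep 3
    modprobe bttv        # load it again
    service zm start
    # Replace YOUREMAIL@domain.com with a real address.
    echo "$CHECK" | mail -s "Camera Went Red" YOUREMAIL@domain.com
fi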