Motion - Netcam Retry Error Discussion

Netcam and the Retry Scheme when having errors Discussion

When should the netcam code give up? When should it retry? How often? How long? What messages should it give to the user?

This topic is for discussing this.

To start with, consider the basic data flow. When a network camera is configured, a single thread is started for "watching" the data for motion. This thread runs as a loop (the code is within motion.c), which periodically (based upon the framerate configuration parameter) calls the routine vid_next (in video.c), which in turn calls netcam_next. With a CCTV camera, the video frame is returned to the motion loop within a few milliseconds. The network camera would normally take somewhat longer (particularly if it's not "streaming", but rather has to go fetch the data), but will similarly return with the image reasonably soon.

The potential problem happens when errors occur with the netcam. Consider, for example, a router that goes down for 5 or 10 minutes (or even longer). It is probably not satisfactory to just stay within the netcam handling code, waiting for the camera to return, because the main motion loop may need to do something (like finish a movie file, or generate an alarm event). But if the netcam handler wants to return to the main loop, it needs to supply an image. If the image is NULL, the main loop takes that as a fatal error and shuts down the netcam. So, what should the netcam return? Should it be a solid black frame? Solid white? Some other data? Or should we modify the main loop to understand that there was an error?

The second consideration is what sorts of different errors we might encounter, and how we can best handle them. The table below lists some of the possibilities. Please add any others which you think need to be addressed.

| Condition | Description | During Startup | Retry | Messages |
| Invalid URL (e.g. hostname not found, or wrong path) | This condition occurs during the "parsing" (interpretation) of the URL during startup, and would normally be caused by incorrect data in the netcam_url or netcam_userpass parameters | Exit | No | Show the invalid URL |
| Connection refused | Could be caused by an incorrect hostname or path, or by "network problems" | Exit | Retry scheme 1 | Show the system error string |
| Header / Boundary String missing or invalid | Caused by incorrect data being received from the camera. This is often due to dropped packets | Retry scheme 2 | Retry scheme 1 during normal running | Indicate what was expected and what was received |
| JPEG decompression error | Usually caused by one or more missing packets, or corruption of some of the picture data | Retry scheme 2 | Do not log or display individual errors, but log if trouble persists for several consecutive frames | Log message if trouble persists for more than N [5?] frames |
| HTTP/1.0 404 Not found | Some netcams will return this error code randomly (Toshiba is certainly bad about it) to the netcam code whenever information is requested | Retry scheme 2 | Retry scheme 2 | No message required unless the retries fail, in which case a message describing the failure before exiting |
| JPEG image size 800x600, JPEG was 640x480 | When the thread first begins, an initial image is fetched and the dimensions are set (the 'was' value). This is then used for picture buffer allocation. Later, someone changes the camera to produce a different size (very unfriendly) | (Not applicable) | Because image sizes (i.e. the buffers used by the motion main-loop) have been changed, if the new dimensions are greater the threads for this camera should be restarted | Error message to log the event |

  • Retry Scheme 1: Retry fast a few times and then change to a once per minute retry for 24 hours
  • Retry Scheme 2: Retry up to 5 times, then give up
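
One way to pin down the two schemes is as functions that map the number of the upcoming retry to the delay before it. The following is only a sketch: the function and macro names are hypothetical, and the "few" fast retries (taken here as 5, one second apart) is an assumption, since the list above does not give a number. Only "once per minute for 24 hours" and "up to 5 times" come from the text.

```c
#include <assert.h>

/* Assumed values: the text only says "a few times" and "fast". */
#define FAST_RETRIES        5
#define FAST_DELAY_SEC      1
/* Values from the text. */
#define SLOW_DELAY_SEC      60                 /* once per minute */
#define SLOW_PERIOD_SEC     (24 * 60 * 60)     /* for 24 hours */
#define SCHEME2_MAX_RETRIES 5

#define RETRY_GIVE_UP (-1)

/* Scheme 1: seconds to wait before retry number `attempt` (1-based),
 * or RETRY_GIVE_UP once the 24 hour slow phase is used up. */
int retry_scheme1_delay(int attempt)
{
    assert(attempt >= 1);
    if (attempt <= FAST_RETRIES)
        return FAST_DELAY_SEC;
    if (attempt - FAST_RETRIES <= SLOW_PERIOD_SEC / SLOW_DELAY_SEC)
        return SLOW_DELAY_SEC;
    return RETRY_GIVE_UP;
}

/* Scheme 2: retry immediately up to 5 times, then give up. */
int retry_scheme2_delay(int attempt)
{
    assert(attempt >= 1);
    return (attempt <= SCHEME2_MAX_RETRIES) ? 0 : RETRY_GIVE_UP;
}
```

With these assumed numbers, scheme 1 makes 5 fast retries followed by 1440 slow ones before giving up.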

Return values to motion_loop

How the errors are fed from network code back to the main motion_loop.

motion_loop is changed so that:

  • vid_next returns an integer instead of a pointer to char.
    • 0 = success
    • negative value = fatal error - thread should stop
      • -1 = V4L fatal.
      • -2 = Netcam fatal
    • positive value = non-fatal - thread should continue copying previous frame into current
      • 1 = V4L non-fatal (we have an old ignored patch that could detect when a bttv camera loses sync)
      • 2 = Netcam non-fatal
      • 3+ = Can be assigned to any non-fatal
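
The numbering above could be captured in an enum plus a small helper that reduces any return value to one of three actions. This is a sketch only: the numeric values are the ones listed above, but the identifier names are hypothetical and do not come from the Motion source.

```c
/* Values from the list above; all identifier names are hypothetical. */
enum vid_result {
    VID_NETCAM_FATAL    = -2,
    VID_V4L_FATAL       = -1,
    VID_OK              =  0,
    VID_V4L_NONFATAL    =  1,
    VID_NETCAM_NONFATAL =  2
    /* 3 and up: free for other non-fatal conditions */
};

enum frame_action {
    USE_NEW_FRAME,          /* 0: a fresh image is in the buffer */
    REUSE_PREVIOUS_FRAME,   /* positive: copy previous frame into current */
    STOP_THREAD             /* negative: fatal, thread should stop */
};

/* Only the sign of the return value decides what the loop does next. */
enum frame_action classify_vid_result(int ret)
{
    if (ret == 0)
        return USE_NEW_FRAME;
    if (ret < 0)
        return STOP_THREAD;
    return REUSE_PREVIOUS_FRAME;
}
```

Because only the sign matters to the loop, new non-fatal codes (3+) can be added later without touching this logic.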


vid_next in video.c is changed from
unsigned char *vid_next (struct context *cnt, unsigned char *map)
{
        struct config *conf=&cnt->conf;
        unsigned char *ret=NULL;

        if (conf->netcam_url) {
                int retries=0;
                while ((!cnt->finish) && (!ret) && (retries++<100))
                        ret = netcam_next(cnt, map);
                return ret;
        }
#ifndef WITHOUT_V4L
        {
                int i=-1;
                int width, height;
                int dev = cnt->video_dev;

                /* NOTE: Since this is a capture, we need to use capture dimensions. */
                width = cnt->rotate_data.cap_width;
                height = cnt->rotate_data.cap_height;
                
                while (viddevs[++i])
                        if (viddevs[i]->fd==dev)
                                break;

                if (!viddevs[i])
                        return NULL;

                if (viddevs[i]->owner!=cnt->threadnr) {
                        pthread_mutex_lock(&viddevs[i]->mutex);
                        viddevs[i]->owner=cnt->threadnr;
                        viddevs[i]->frames=conf->roundrobin_frames;
                        cnt->switched=1;
                }

                v4l_set_input(viddevs[i], map, width, height, conf->input, conf->norm,
                               conf->roundrobin_skip, conf->frequency, conf->tuner_number, cnt);
                ret=v4l_next(viddevs[i], map, width, height, cnt);

                if (--viddevs[i]->frames <= 0) {
                        viddevs[i]->owner=-1;
                        pthread_mutex_unlock(&viddevs[i]->mutex);
                }
        
                if(cnt->rotate_data.degrees > 0) {
                        /* rotate the image as specified */
                        rotate_map(map, cnt);
                }
        }
#endif /*WITHOUT_V4L*/
        return ret;
}

to

int vid_next (struct context *cnt, unsigned char *map)
{
        struct config *conf=&cnt->conf;
        int ret = 0;

        if (conf->netcam_url) {
                ret = netcam_next(cnt, map);
                return ret;
        }
#ifndef WITHOUT_V4L
        {
                int i=-1;
                int width, height;
                int dev = cnt->video_dev;

                /* NOTE: Since this is a capture, we need to use capture dimensions. */
                width = cnt->rotate_data.cap_width;
                height = cnt->rotate_data.cap_height;
                
                while (viddevs[++i])
                        if (viddevs[i]->fd==dev)
                                break;

                if (!viddevs[i])
                        return -1;

                if (viddevs[i]->owner!=cnt->threadnr) {
                        pthread_mutex_lock(&viddevs[i]->mutex);
                        viddevs[i]->owner=cnt->threadnr;
                        viddevs[i]->frames=conf->roundrobin_frames;
                        cnt->switched=1;
                }

                v4l_set_input(viddevs[i], map, width, height, conf->input, conf->norm,
                               conf->roundrobin_skip, conf->frequency, conf->tuner_number, cnt);
                ret = v4l_next(viddevs[i], map, width, height, cnt);

                if (--viddevs[i]->frames <= 0) {
                        viddevs[i]->owner=-1;
                        pthread_mutex_unlock(&viddevs[i]->mutex);
                }
        
                if(cnt->rotate_data.degrees > 0) {
                        /* rotate the image as specified */
                        rotate_map(map, cnt);
                }
        }
#endif /*WITHOUT_V4L*/
        return ret;
}

Comments


  • For 'Header / Boundary' changed 'During Startup' from 'Exit' to 'Retry scheme 2'
  • For 'JPEG error' changed 'During Startup' from longer description to just 'Retry scheme 2'; for 'Retry' changed to be 'Do not log or display individual errors, but log if trouble persists for several consecutive frames'

-- BillBrack - 06 Aug 2005

We have this scenario.

  • The camera thread needs 5 frames per second according to the framerate set in motion.conf.
  • The Netcam cannot keep up with this - maybe only periodically, because the Ethernet is busy or whatever. It can do maybe two on average.
  • So the Netcam handler thread has nothing to present to Motion.

In this case I would think the Motion loop would be better off WAITING for a frame to arrive instead of receiving grey frames which will trigger false motion and grey jpeg pictures.

BUT - Much functionality depends on Motion completing a loop at least once per second.

Let us now take the scenario where the Netcam cannot deliver frames even once per second. Here the netcam_handler needs to send something to the motion loop to enable it to continue.

  • It could send the previous frame. No motion will be detected. The webcam will just appear as if nothing is moving.
  • It could send a grey frame. This triggers motion detection both when it arrives and when it disappears, but shows the user on the webcam that something is wrong.
  • Or we could combine the two. Maybe send the previous frame for the first several seconds. Yes, maybe always, so that the camera loop never has to wait! It has a small impact on the motion detection because it removes the old motion from the reference frame, but I think this may even be a desired side effect. After a period of time we change to the grey image.

If we implement this we have to take care not to reintroduce the old out of sync bug.

This was meant as a proposal - not a decision :-)

-- KennethLavrsen - 06 Aug 2005

I see netcam like a connection to a ccd cam.

If you unplug the ccd cam you get a black picture.

But not every network timeout should end in black pictures; instead, use a retry count.

-- PeterHolik - 06 Aug 2005

Yes and no.

A CCD camera that gets disconnected really gets physically disconnected. This happens only during service or if the camera breaks. You can trust the connection to a CCTV or USB camera.

A network camera sits on a network. Sometimes on a big network which is used for other things. In a big company you often have a 5 minute break while a router is booted or a patch panel gets reorganised.

And also Netcams are sometimes simply on the big Internet and the machine running Motion is far far away. And the big Internet is not always stable. There are hiccups on the way.

A 2nd difference is the fact that with both USB cams and CCTV cams (bttv driver) Motion decides when to pick a picture and it always gets it milliseconds later. On a Netcam with mjpeg the camera sets the pace.

So all in all a Network camera is anything but a connection to a CCD cam.

But your 2nd statement basically concludes the same as I do. Not every network timeout should result in a black/grey picture. Motion should retry first. But if the camera handler cannot produce a valid picture within a fraction of a second, I would propose that for a while the handler sends the previous picture to the camera thread, so that small network hiccups that last a few seconds, like a quick router boot or moving a patch cable, do not result in motion being detected and grey pictures being saved or added to an mpeg.

-- KennethLavrsen - 06 Aug 2005

At the moment, the netcam code (for a non-streaming camera) can't return to motion until a picture has been fetched, or a timeout has occurred. I'm changing this (see NetcamErrorImprovementPatch [patchfile coming soon...]) to fix that part of the problem. Once that's done, netcam_next can always return within a few milliseconds.

The more interesting part is what needs to be communicated to the motion main loop. Even in the absence of errors, there may not be a "new" frame available. When a new (error-free) frame is ready, it might be as old as (time of last request from motion - 1 nanosecond). It seems to me a good solution is for netcam_next to provide a return value which indicates the time at which the image was received (e.g. a long-long value). With that, the motion main_loop could do all kinds of clever things. We could simultaneously modify vid_next to provide a corresponding return value for v4l data, or to fake it (to be further considered).
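
To illustrate the timestamp idea, here is a rough sketch of how a receive time held in a long long could let the main loop tell a new frame from one it has already consumed. Everything here is hypothetical (the struct, the function names, and the choice of microseconds via gettimeofday); it is not taken from the netcam code or the patch.

```c
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical helper: current wall-clock time in microseconds. */
long long now_usec(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (long long)tv.tv_sec * 1000000LL + tv.tv_usec;
}

/* Hypothetical bookkeeping kept by the main loop per camera. */
struct frame_clock {
    long long last_seen_usec;   /* receive time of the last frame consumed */
};

/* Returns 1 and records the timestamp if the frame stamped
 * `received_usec` is newer than the last one consumed, else 0. */
int frame_is_new(struct frame_clock *fc, long long received_usec)
{
    if (received_usec <= fc->last_seen_usec)
        return 0;
    fc->last_seen_usec = received_usec;
    return 1;
}

/* Age of a frame at time `now`, both in microseconds. */
long long frame_age_usec(long long received_usec, long long now)
{
    return now - received_usec;
}
```

With something like this, the loop could skip motion detection on a repeated frame, or report how stale the picture is.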

-- BillBrack - 06 Aug 2005

I don't think the motion_loop - at the moment - has any real use for a timestamp. As I implemented things on Sunday 7th Aug, the motion_loop makes a time stamp the first time it fails to get an image, which it can use for reporting the error.

I think the best approach for non-streaming netcam is (we already discussed it on IRC - just want to put it on "paper").

  • The motion_loop calls vid_next during startup even before it enters the infinite while loop. The netcam_next has to make sure that the first time it is called it fetches an image and returns it. I think you have it covered but I have no idea how you do it now. There is no special error handling done at this early stage. Motion exits in case of error. (Is this smart?)

Once we are looping we assume that an image is waiting to be picked up.

  • motion_loop calls vid_next() which calls netcam_next().
  • netcam_next returns an image and return code 0.
  • motion_loop returns from vid_next and continues with business.
  • The netcam thread starts fetching the next image. It now has 1/framerate time to retry and get an image without affecting the flow of the motion_loop.
  • Once the picture is fetched the netcam thread just waits for netcam_next to be called again.
  • motion_loop calls netcam_next and we start all over.

The advantages of this are:

  • motion_loop never waits
  • netcam code has lots of time to fetch the next image and retry without delaying anything.
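
The hand-off described above might look roughly like the following: the fetch thread publishes a finished image into a shared slot, and the call made from motion_loop either picks it up or returns at once with the non-fatal code 2. This is a sketch under assumptions: the struct, the function names and the tiny fixed frame size are all hypothetical, and the real netcam code may be organised quite differently.

```c
#include <pthread.h>
#include <string.h>

#define FRAME_BYTES 16   /* tiny size, for illustration only */

/* Hypothetical state shared by the netcam fetch thread and motion_loop. */
struct netcam_slot {
    pthread_mutex_t lock;
    unsigned char latest[FRAME_BYTES];
    int ready;   /* 1 while `latest` holds a frame not yet picked up */
};

/* Called by the fetch thread once a complete, valid image has arrived. */
void netcam_publish(struct netcam_slot *s, const unsigned char *img)
{
    pthread_mutex_lock(&s->lock);
    memcpy(s->latest, img, FRAME_BYTES);
    s->ready = 1;
    pthread_mutex_unlock(&s->lock);
}

/* Called from motion_loop; never waits on the network.
 * Returns 0 and fills `map` if a frame is waiting, otherwise
 * returns 2 (netcam non-fatal) and leaves `map` untouched. */
int netcam_next_sketch(struct netcam_slot *s, unsigned char *map)
{
    int ret = 2;

    pthread_mutex_lock(&s->lock);
    if (s->ready) {
        memcpy(map, s->latest, FRAME_BYTES);
        s->ready = 0;
        ret = 0;
    }
    pthread_mutex_unlock(&s->lock);
    return ret;
}
```

The mutex is held only for the duration of a memory copy, so the motion_loop side stays within the "few milliseconds" budget regardless of what the network is doing.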

The error mode would typically be that the netcam code cannot get an image or the image continues to be rotten.

  • netcam code continues to retry.
  • motion_loop calls vid_next -> netcam_next()
  • netcam_next has no valid frame and, importantly, copies no data into the buffer. Instead it returns return code 2.
  • motion_loop returns from vid_next
    • If it is less than 5 seconds ago we got a valid frame we copy the last good frame to the new frame. Note that if pre_capture is 0 and minimum_motion_frames is 1 the ring buffer has size 1 and no copying is done.
    • If it is more than 5 seconds ago we put a grey image with a message in the buffer.
  • netcam continues to retry. If success we are back in the normal mode next time motion_loop calls vid_next.
  • If netcam still fails we cycle through the error mode once more. We never exit from this mode. User has to stop Motion.
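
The error-mode policy above could be sketched on the motion_loop side as follows. The 5 second threshold comes from the list above; the struct and function names, the 0x80 fill value for "grey", and the treatment of exactly 5 seconds as stale are assumptions for illustration. Drawing the message onto the grey image is left out.

```c
#include <string.h>
#include <time.h>

#define STALE_AFTER_SEC 5      /* from the text */
#define GREY_VALUE      0x80   /* assumed mid-grey fill byte */

/* Hypothetical per-camera bookkeeping. */
struct frame_state {
    time_t last_good;   /* when vid_next last returned 0 */
};

/* Apply the policy to one loop iteration.
 *   ret  - value returned by vid_next
 *   cur  - buffer for this iteration's frame
 *   prev - last good frame (may equal cur if the ring buffer has size 1)
 * Returns 0 to keep looping, -1 if the thread should stop. */
int handle_vid_result(struct frame_state *st, int ret,
                      unsigned char *cur, const unsigned char *prev,
                      size_t size, time_t now)
{
    if (ret < 0)
        return -1;                      /* fatal */
    if (ret == 0) {
        st->last_good = now;            /* fresh frame already in cur */
        return 0;
    }
    if (now - st->last_good < STALE_AFTER_SEC) {
        if (cur != prev)                /* ring buffer of size 1: no copy */
            memcpy(cur, prev, size);
    } else {
        memset(cur, GREY_VALUE, size);  /* long outage: grey image */
    }
    return 0;
}
```

As soon as vid_next returns 0 again, last_good is refreshed and the loop is back in normal mode, which matches the "never exit from this mode" behaviour described above.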

-- KennethLavrsen - 08 Aug 2005
Topic revision: r16 - 09 Aug 2005, BillBrack