Current netcam uses locking (mutex) to protect buffers. I found now a way to eliminate this locks using an atomic function (xchg - exchange). Atomic functions manipulates integers in one step so that no other thread or CPU can disturb.
Also netcam->video_image and the memcpy is no more needed.
Description of Patch
Three buffers are needed (next, latest, current). The netcam thread writes always to buffer next and atomically exchanges this buffer with latest. The motion thread (netcam_next) only exchanges atomically buffer latest with buffer current if a new image arrived.
- If jpeg decompression at netcam_next failed 0 is returned.
- motion_log at netcam_check_buffsize used with NETCAM_DEBUG is not usable because of missing motion context (cnt)
Installation of Patch
Within the motion source directory, execute the command:
gzip -dc motion-3.2.2_snap10-netcam_xchg3.patch.gz | patch
Change History of Patch
- 3.2.2-snap9-post1 3 Aug 2005
- motion-3.2.2_snap9-netcam_xchg2.patch 4 Aug 2005
Changed to three separate buffers. Generated patch against previously-installed ErrorLoggingEnhancementPatch
- motion-3.2.2_snap10-netcam_xchg3.patch 5 Aug 2005
Updated patch to work against snap10. Changed to not return NULL within netcam_next, but rather return a grey picture image. Changed libjpeg errors and warnings to only appear when netcam.c is compiled with NETCAM_DEBUG defined. Small code and comment cleanup in all three netcam-related modules, together with their include files.
- motion-3.2.2_snap11-xchg.diff 12 Aug 2005
I've extended configure.in with a check for a working xchg instruction and add a compiler directive HAVE_XCHG if found and modified the two parts where xchg is usefull.
You have some very good ideas. First on the two bugfix points - as the comment just above the jpeg_decompression error points out, I (quite arbitrarily) chose to return the partially-completed buffer. I certainly have no objection to returning with a NULL instead, and will go ahead and change that. The motion_log correction was also spotted by another contributor, and is being fixed (although in a somewhat different way from yours).
Now to the more important part. I actually spent much more time than I'm willing to admit to, trying to prove that your approach wouldn't work. At the end of it all, I'm convinced - your suggestion is excellent! However, I would prefer to do a little further cleanup (taking out some stuff that isn't needed any longer, and re-naming some of the components of netcam_context in a more meaningful way. When I've completed that, I'll attach the results here.
- 03 Aug 2005 19:58PDT
Lavr: You were all busy last night and Peter H for sure got the "small grey cells" going.
Yes, it took me some time till i found this solution
- 04 Aug 2005
I have completed my enhancements, and am attaching a new patch file. I have removed the concept of an array of image buffers, and rather put in three distinct, separate buffers within the netcam context. These buffers are named "receiving", "jpegbuf" and "latest". Rather than use indices to access them, they are directly accessed by pointers, and the "xchg" becomes an exchange of pointers.
I tested the code with a single (remote) netcam overnight, and had no problems at all with it. However, the new code consistently fails under my testing with 9 separate (simultaneous) cameras. Data errors occur, and threads unexpectedly terminate. For that reason, it should be considered experimental
Since I previously spent so much time convincing myself that the basic approach is correct, my thinking at this point is that the errors may not be directly related to the buffer switching. The big difference between this approach and the former (mutex) is that both
the camera-handling thread and
the jpeg-decompression thread are never put into a state of "waiting" for a buffer. I am continuing my investigation into this, but I wanted to get this patch published so that others (especially Peter) could check over what I have done and (if possible) conduct some simultaneous testing.
- 04 Aug 2005
I figured out where the problem came from. The idea of returning NULL within netcam_next if anything was wrong with the frame (e.g. a JPEG decompression problem) is not satisfactory. The NULL return causes the main Motion code to kill the thread (resulting in the unexpected termination I experienced). I'm not producing a different version of the patch at this time, but rather moving the discussion of the appropriate handling of such a case to the NetcamRetryErrorDiscussion
- 05 Aug 2005
I changed the NULL returns to be calls to a new routine "netcam_grey_picture" which returns an image according to it's name. I also re-generated the patch file to work against snap10 (if nothing bad happens in the near future, I would expect this feature to go into snap11). The patch also includes a little further cleanup of code and comments within the three netcam routines (as well as their include files), and changes the libjpeg errors to only appear if netcam.c is compiled with NETCAM_DEBUG. I believe that the only thing remaining is to enhance the error detetection and recovery, so I would suggest any further discussion go to the NetcamRetryErrorDiscussion
- 05 Aug 2005 16:50PDT
After further testing, and reports from users who were not able to implement this patch, further thought was given to what we were trying to accomplish. The idea of using the 'xchg' macro to atomically exchange two pointers is certainly attractive, but has three bad points. First, certain distributions (Red Hat in particular) do not include the 'xchg' macro in their asm/system.h include file. Second, although the macro is great for the i386 (and x86-64) architecture, it is not necessarily good on other architectures on which motion may be run. Third and last, the macro is actually intended to be used within the kernel, and is not guaranteed to be atomic (on all architectures) when used within user space. Apparently this is the reason Red Hat has decided not to include it within their distribution.
We discussed this further on the IRC chat channel, and our conclusion was that it was just not worthwhile implementing this approach. The potential savings in execution time do not justify the potential problems in implementation and support. For that reason, motion will retain the usage of pthread_mutex_lock / pthread_mutex_unlock. However, your suggestion to use a "triple buffer scheme" is still a very good one, and we will retain those enhancements.
The acceptable parts of this will be found under the NetcamErrorImprovementPatch
- 09 Aug 2005
I agree 100%.
I do not dare taking the risk that Motion will no longer on PowerPCs or other architectures then i386.
A performance gain per pixel is worth trying to persue. But a few extra instructions handling a mutex per frame is not worth limiting Motion to i386 when it is cross platform today. And I know we have non-386 users.
But other elements of this patch are great and will be used in the other Netcam patch.
- 09 Aug 2005
Third and last, the macro is actually intended to be used within the kernel, and is not guaranteed to be atomic (on all architectures) when used within user space. Apparently this is the reason Red Hat has decided not to include it within their distribution.
Do you have any links describing this, i can't believe?
- 09 Aug 2005
REF the xchg.diff "use xchg if configure checks it".
It is not worth it. If something as common as Red Hat and Fedora does not support this is will not go in.
It is two different ways to do things. When something does not work I will personally as a FC4 user not be able to test the same version of the running code as another user has.
I think the pthread method is pretty effective in the light that this is done normally 5 - 10 times per seconds (normal network camera speed).
- 12 Aug 2005
the main problem with xchg is, that not every arch supports it for using in userspace. Some archs needs locking of irq that is maybe not possible in userspace.
But for i386 archs and some others if anybody does not like locks
may be used.
- 13 Aug 2005