Motion - Motion Profiling
You are here: Foswiki>Motion Web>MotionProfiling (27 Jan 2008, BobSaggeth)Edit Attach

Profiling Motion

What is Profiling?

From the gprof documentation: "Profiling allows you to learn where your program spent its time and which functions called which other functions while it was executing." In other words, profiling is important to know where there is a need for optimization. Optimizations are good, but there's no point in optimizing rarely called code, or code that already is fast. Instead, optimization should be directed at bottlenecks and CPU-hogging code.

How is it Done?

First of all, you need a profiler. Here, I'm going to use gprof, the GNU profiler, which should be available on most Unix-based systems. To profile a program using gprof, here is what you do:

  1. Alter the makefile, adding -pg to both the compile flags and the link flags.
  2. Compile and build.
  3. Run the program. A file named gmon.out will be produced (provided that the program exits gracefully).
  4. Run gprof: gprof ./prog gmon.out, where prog of course is the program executable.

(Note: gmon.out is the default profile filename, and can be omitted.)

The output appearing on screen should begin with a "flat profile", which shows the amount of time spent in each function, and how often the function was called. For example, the following excerpt from a flat profile shows that the draw_textn function was called 2886 times during execution of Motion, and each call took 49ms. The function accounted for 1.9% of the total execution time.

  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
...
  1.90     74.03     1.41     2886     0.49     0.49  draw_textn
...

Profiling Multi-Threaded Programs

The problem with profiling Motion is that Motion is multi-threaded. If you try to profile Motion the way that is described above, you will get a pretty useless flat profile, since all times will be zero. A simple and very elegant solution is outlined in HOWTO: using gprof with multithreaded applications. Basically, you build a preload library that will wrap each call to pthread_create, preparing the newly created thread for profiling. So, the procedure for Motion becomes:

  1. Build the preload library, gprof-helper (I've attached an archive containing source code and a makefile).
  2. Alter Makefile (after running configure), adding -pg to both CFLAGS and LDFLAGS.
  3. Compile and build.
  4. Run Motion using LD_PRELOAD=<path to gprof-helper.so> ./motion (assuming you're in Motion's source directory).
  5. Exit Motion using, for example, motion-control (to force a graceful exit).
  6. Run gprof: gprof ./motion gmon.out (again, gmon.out can be omitted).

Now you should see the true flat profile for Motion (as in the short example above).

Example Profiles

Below are two example profiles from profiling Motion 3.1.17 with the updated rotate patch applied. To keep it short, only the first 10 or so entries are included.

Profile when idle

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 69.43    305.93   305.93                             motion_loop
 14.57    370.14    64.21     8681     7.40     7.40  alg_diff_fast
 12.98    427.36    57.21    26046     2.20     2.20  reverse_inplace_quad
  2.40    437.94    10.59    26043     0.41     0.41  draw_textn
  0.14    438.58     0.64     8682     0.07     6.75  vid_next
  0.08    438.91     0.34     8682     0.04     0.04  v4l_next
  0.07    439.20     0.29     8682     0.03     6.62  rotate_map
  0.05    439.44     0.24     8681     0.03     0.03  webcam_put
  0.04    439.61     0.17     8682     0.02     0.02  v4l_set_input
  0.03    439.75     0.14    17365     0.01     0.03  event
  0.03    439.88     0.13    17362     0.01     0.62  draw_text

Nothing strange here. The heaviest function is alg_diff_fast, but 7.4 ms per call isn't too bad, I think.

Profile when detecting motion

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 31.51     60.43    60.43                             motion_loop
 18.83     96.56    36.12      193   187.16   187.16  dilate9
 12.29    120.13    23.57      193   122.13   132.65  alg_labeling
  9.60    138.55    18.42      193    95.45    95.45  dilate5
  8.86    155.55    17.00      442    38.46    38.46  alg_diff_standard
  5.40    165.91    10.36     4854     2.13     2.13  reverse_inplace_quad
  4.47    174.49     8.58     1197     7.17     7.17  alg_diff_fast
  2.26    178.82     4.33      353    12.27    12.27  erode9
  1.65    181.99     3.17      104    30.48    30.48  alg_locate_center
  1.41    184.69     2.70      235    11.49    11.49  erode5
  1.16    186.91     2.22      104    21.35    21.35  alg_locate
  1.14    189.09     2.18     4851     0.45     0.45  draw_textn

Now it's getting more interesting. Apparently dilate9, alg_labeling and dilate5 are pretty CPU-intensive functions. In other words, they are the prime candidates for optimizations. The alg_diff_standard function is five times slower than alg_diff_fast, but then again it's also more precise.

Notes

  • For FreeBSD, read the comments below and use the separately attached makefile.
  • Don't stop Motion using kill -9 or by pressing Ctrl-C (in non-daemon mode). Then the profile file won't be created.

-- PerJonsson - 19 Oct 2004

Discussion and Comments

Nice doc Per ! smile

Unfortunately i didn't get working gprof in FreeBSD when i tested ( there's not any gprof-helper version ).

-- AngelCarpintero - 19 Oct 2004

Thanks smile

So the source cannot be built on FreeBSD? What errors do you get?

-- PerJonsson - 19 Oct 2004

Great doc. I moved it under Misc info. The menu on the front page is getting a little crowded.

I will have good use of this info.

-- KennethLavrsen - 19 Oct 2004


gprof-helper doesn't want to build , i've added the right includes and LD value to compile against linuxthreads but that's not the problem , here it's :

/usr/libexec/elf/ld: cannot find -ldl

-- AngelCarpintero - 20 Oct 2004

I just read something about FreeBSD having the dl* functions in libc, which means that you could remove -ldl from the build line. Does that work?

-- PerJonsson - 20 Oct 2004


Yes ! you're right smile i've attached the FreeBSD Makefile . To run in FreeBSD basically is the same :

  1. -. LD_PRELOAD=/usr/home/motion/gprof-helper/gprof-helper.so ./motion
  2. -. kill -15 <first_motion_pid>
  3. -. gprof ./motion motion.gmon > profile.motion

-- AngelCarpintero - 21 Oct 2004

Another important remark about the different steps during the profiling process:

When motion is started like 'LD_PRELOAD=/usr/home/motion/gprof-helper/gprof-helper.so ./motion', the actual profiling data is captured. To see the results, you have to terminate Motion and start 'gprof ./motion'.

-- JoergWeber - 25 Sep 2007

From this information, do you have any recommendations for us normal users to decrease our CPU load? -- Should I disable label or dilute or any other specific despeckle filter?

-- BobSaggeth - 26 Sep 2007

When it comes to CPU usage, the despeckle option is right on top of the list, yes. To reduce load, you can start disabling labelling. For indoor cams, you should not need despeckle anyway - except for very cheap and noisy cams. Smartmask may also need some resources and should be disabled when not needed.

-- JoergWeber - 26 Sep 2007

I did some system profiling while motion was recording an event and I had the webcam open -- On a 1,8ghz Core2Duo, the system was 50% busy from just 1 camera recording at 25fps 640x480.

Here is the profiling information:

CPU: Core 2, speed 1862.56 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
samples  %        image name               app name                 symbol name
47835    26.5178  libjpeg.so.62.0.0        libjpeg.so.62.0.0        (no symbols)
40494    22.4483  libavcodec.so.1d.51.38.0 libavcodec.so.1d.51.38.0 (no symbols)
21368    11.8456  motion                   motion                   alg_update_reference_frame
18182    10.0794  libc-2.6.1.so            libc-2.6.1.so            (no symbols)
9593      5.3180  motion                   motion                   alg_despeckle
6873      3.8101  mplayer                  mplayer                  (no symbols)
4796      2.6587  motion                   motion                   alg_locate_center_size
3482      1.9303  libGLcore.so.100.14.11   libGLcore.so.100.14.11   (no symbols)
1792      0.9934  motion                   motion                   alg_diff_standard
1790      0.9923  libmozjs.so              libmozjs.so              (no symbols)
1607      0.8909  motion                   motion                   alg_diff
1483      0.8221  motion                   motion                   iflood
1450      0.8038  nvidia_drv.so            nvidia_drv.so            (no symbols)
1204      0.6675  libflashplayer.so        libflashplayer.so        (no symbols)
1133      0.6281  Xorg                     Xorg                     (no symbols)
1013      0.5616  motion                   motion                   draw_textn
812       0.4501  nvidia                   nvidia                   (no symbols)
651       0.3609  saa7134                  saa7134                  (no symbols)
621       0.3443  libglib-2.0.so.0.1400.1  libglib-2.0.so.0.1400.1  (no symbols)
376       0.2084  libvte.so.9.2.13         libvte.so.9.2.13         (no symbols)
374       0.2073  jbd                      jbd                      (no symbols)
366       0.2029  oprofiled                oprofiled                (no symbols)
364       0.2018  ext3                     ext3                     (no symbols)
290       0.1608  vmlinux                  vmlinux                  unix_poll
270       0.1497  libfb.so                 libfb.so                 fbCopyAreammx
260       0.1441  compiz.real              compiz.real              (no symbols)
243       0.1347  libglib-2.0.so.0.1400.1  libglib-2.0.so.0.1400.1  (no symbols)

-- RomanGaufman - 04 Dec 2007

And here's the same without the webcam being accessed - why is mpeg4 encoding eating so much CPU? -- mpeg4 encoding done by the windows software takes a mere 2-3% cpu on the same system with the same card.

CPU: Core 2, speed 1862.56 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
samples  %        image name               app name                 symbol name
201611   67.4810  libavcodec.so.1d.51.38.0 libavcodec.so.1d.51.38.0 (no symbols)
23815     7.9711  motion                   motion                   alg_update_reference_frame
18878     6.3186  libc-2.6.1.so            libc-2.6.1.so            (no symbols)
9200      3.0793  motion                   motion                   alg_despeckle
7422      2.4842  libjpeg.so.62.0.0        libjpeg.so.62.0.0        (no symbols)
2831      0.9476  motion                   motion                   alg_locate_center_size
2635      0.8820  libmozjs.so              libmozjs.so              (no symbols)
2330      0.7799  motion                   motion                   alg_diff
2021      0.6764  motion                   motion                   alg_diff_standard
1377      0.4609  libflashplayer.so        libflashplayer.so        (no symbols)
1118      0.3742  libGLcore.so.100.14.11   libGLcore.so.100.14.11   (no symbols)
1083      0.3625  motion                   motion                   draw_textn
1053      0.3524  libgklayout.so           libgklayout.so           (no symbols)
848       0.2838  nvidia                   nvidia                   (no symbols)
833       0.2788  Xorg                     Xorg                     (no symbols)
747       0.2500  ext3                     ext3                     (no symbols)
719       0.2407  jbd                      jbd                      (no symbols)
717       0.2400  saa7134                  saa7134                  (no symbols)
599       0.2005  oprofiled                oprofiled                (no symbols)
583       0.1951  libxpcom_core.so         libxpcom_core.so         (no symbols)
553       0.1851  vmlinux                  vmlinux                  copy_user_generic_string
539       0.1804  libglib-2.0.so.0.1400.1  libglib-2.0.so.0.1400.1  (no symbols)
477       0.1597  libxpconnect.so          libxpconnect.so          (no symbols)
431       0.1443  libavformat.so.1d.51.10.0 libavformat.so.1d.51.10.0 (no symbols)
365       0.1222  nvidia_drv.so            nvidia_drv.so            (no symbols)
306       0.1024  libglib-2.0.so.0.1400.1  libglib-2.0.so.0.1400.1  (no symbols)
300       0.1004  vmlinux                  vmlinux                  __d_lookup

-- RomanGaufman - 04 Dec 2007

BTW, I used oprofile for the above profiling - couldn't get gprof to work straight away so decided to use the next best thing smile

-- RomanGaufman - 04 Dec 2007

Hmm, I'm having the same with realtime FPS, has anyone tried to use something other than ffmpeg? - Or use ffmpeg differently? -- Wonder how the windows software manages to get the CPU usage so low as the videos are also h263 or divx.

-- BobSaggeth - 27 Jan 2008
I Attachment Action Size Date Who Comment
MakefileEXT Makefile manage 313 bytes 21 Oct 2004 - 05:31 AngelCarpintero freebsd version , added linuxthreads lib and headers
gprof-helper.tar.gzgz gprof-helper.tar.gz manage 1 K 19 Oct 2004 - 12:06 UnknownUser Archive containing gprof-helper source and a makefile.
Topic revision: r14 - 27 Jan 2008, BobSaggeth
Copyright © 1999-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Please do not email Kenneth for support questions (read why). Use the Support Requests page or join the Mailing List.
This website only use harmless session cookies. See Cookie Policy for details. By using this website you accept the use of these cookies.