7. Performance

video-pipeline is a Python package that uses the multiprocessing library to speed up image manipulations and filters applied to a PiCamera video stream in real time.

The intent of this approach is to spread image manipulation work across many processes, “parallelizing” the work done on each frame. This holds under the assumption that the operations do not depend on the order in which frames are captured, so multiple frames can be processed in parallel and reassembled in chronological order once processing completes.
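The sketch below illustrates the concept in isolation; it is not video-pipeline’s actual implementation. A multiprocessing.Pool processes stand-in frames, and Pool.imap yields results in the original submission order even when workers finish out of order.

# A minimal sketch of the concept, not video-pipeline's internals:
# workers process frames in parallel, and Pool.imap returns results
# in the original capture order regardless of completion order.
from multiprocessing import Pool

def process(frame):
    # Placeholder per-frame operation; order-independent by assumption.
    return frame.upper()

if __name__ == "__main__":
    frames = ["frame-%d" % i for i in range(8)]  # stand-in captured frames
    with Pool(processes=4) as pool:
        for result in pool.imap(process, frames):
            print(result)  # printed in capture order: FRAME-0, FRAME-1, ...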

Processing many frames in parallel with video-pipeline should, in principle, achieve higher throughput than processing each frame sequentially (capture -> process -> send out -> repeat). However, the code architecture needed to support multiprocessing is much more complex and likely introduces additional overhead versus serial processing. The true performance gains of using video-pipeline have not yet been quantified.

7.1. Testing approach

With this set of tests, I will quantify the performance of video-pipeline versus serial image processing on a PiCamera video stream. I will also quantify the operational overhead of the pipeline itself compared to no image processing. The primary metric for performance is frames per second (fps) of the video output.

Isolate image processing performance

I am interested only in the image processing part of video-pipeline’s performance, so I will create a “control” script that uses video-pipeline’s interfaces for capturing from PiCamera and outputting a video stream to a client. The control script will NOT use video-pipeline tools to handle images captured from PiCamera, but it WILL apply the same operations to each frame, processing frames directly and in order. In other words, the control script quantifies the performance of the test setup itself and establishes a baseline.

Quantify performance impacts from overhead

While the main benefit of using video-pipeline is its multiprocessing support, it is possible to run video-pipeline with a single process. This effectively forces video-pipeline to operate on frames in series. While this is not a realistic use case for the package, it gives us an opportunity to quantify performance losses from any overhead the package introduces versus plain serial processing. Ideally there would be little to no overhead, and a single-process video-pipeline would perform like any other serial script sitting between frame capture and display.
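As a rough illustration of how that overhead could be measured in isolation, the sketch below (a dummy workload, not video-pipeline code) times the same no-op run directly against a single-process pool:

# Sketch of the overhead comparison with a trivial no-op task.
import time
from multiprocessing import Pool

def noop(frame):
    return frame

if __name__ == "__main__":
    frames = [bytes(640 * 480 * 3)] * 100  # fake 640x480 RGB frames

    start = time.monotonic()
    for frame in frames:
        noop(frame)
    direct = time.monotonic() - start

    with Pool(processes=1) as pool:
        start = time.monotonic()
        for _ in pool.imap(noop, frames):
            pass
        pooled = time.monotonic() - start

    # The gap is pure pipeline overhead: pickling, IPC, and scheduling.
    print("direct: %.3fs  single-process pool: %.3fs" % (direct, pooled))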

Quantify gains from parallel processing

video-pipeline allows the user to select an arbitrary number of parallel processes to use for image processing. The number of truly concurrent processes is capped by the hardware, but we can still assess the performance gains relative to a single-process baseline: even on hardware with one CPU core, the Python multiprocessing module abstracts the core count away, so we can request an arbitrary number of processes. For these tests we will compare the performance of video-pipeline with 1, 2, 4, 8, and 16 processes in the pool. The expectation is that performance improves beyond one process, but with diminishing returns as the process count grows.
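A sketch of the pool-size sweep, using a CPU-bound dummy workload in place of real camera frames so the effect of extra workers is visible:

# Sketch: time the same workload across pool sizes 1, 2, 4, 8, 16.
import time
from multiprocessing import Pool

def busy(frame):
    # Stand-in CPU-bound work per "frame".
    return sum(i * i for i in range(20000))

if __name__ == "__main__":
    frames = range(200)
    for n in (1, 2, 4, 8, 16):
        with Pool(processes=n) as pool:
            start = time.monotonic()
            list(pool.imap(busy, frames))
            elapsed = time.monotonic() - start
        print("%2d processes: %.1f frames/s" % (n, len(frames) / elapsed))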

Try various image processing operations

The performance of an image processor is heavily dependent on the operations it must perform on each frame. As an image processing task runs more operations, or more complex computations, on every pixel, its throughput (fps) is expected to drop. As such, I will subject the serial baseline and video-pipeline to the following operations:

- No-op. Output frames exactly match the captured frames. Any performance losses are attributed to overhead.
- Grayscale filter. Convert captured frames (RGB) to single-channel grayscale, then output the grayscale frame as an equivalent 3-channel (RGB) frame. The number of operations is proportional to the number of pixels in the frame. This method is built into PIL.
- Sobel filter. Run Sobel edge detection on the captured frame and output the filtered, grayscale result as an equivalent 3-channel (RGB) frame. The number of operations is proportional to roughly 8x the number of pixels in the frame, since a 3x3 kernel is convolved with every pixel. This method is built into PIL.
- Color select filter. Convert captured frames (RGB) to HSV color space, create a binary mask of the pixels that fall within the desired HSV bounds, apply the mask to the original frame as a logical AND, then output the result as a 3-channel RGB frame. I have not counted the per-pixel operations here, but it is likely costlier than the Sobel filter. These methods are built into OpenCV.
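For reference, hedged sketches of the four operations follow. Note that Pillow exposes no filter named Sobel, so the Sobel sketch builds one pass with ImageFilter.Kernel; the HSV bounds in the color select are arbitrary example values.

# Sketches of the four per-frame operations; frame is a PIL RGB image.
import numpy as np
import cv2
from PIL import Image, ImageFilter, ImageOps

def no_op(frame):
    return frame

def grayscale(frame):
    # RGB -> single-channel grayscale -> back to 3-channel RGB
    return ImageOps.grayscale(frame).convert("RGB")

# One Sobel pass as a 3x3 convolution (scale=1 keeps the raw response)
SOBEL_X = ImageFilter.Kernel((3, 3), [-1, 0, 1, -2, 0, 2, -1, 0, 1], scale=1)

def sobel(frame):
    # Horizontal pass only; a full edge map would also apply the vertical
    # kernel and combine the two gradient magnitudes.
    return ImageOps.grayscale(frame).filter(SOBEL_X).convert("RGB")

def color_select(frame, lower=(35, 50, 50), upper=(85, 255, 255)):
    # RGB -> HSV, mask pixels inside the (example) bounds, AND with original
    rgb = np.asarray(frame)
    hsv = cv2.cvtColor(rgb, cv2.COLOR_RGB2HSV)
    mask = cv2.inRange(hsv, np.array(lower), np.array(upper))
    return Image.fromarray(cv2.bitwise_and(rgb, rgb, mask=mask))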

7.2. Baseline Script

# TODO@phil: write me
# the source for the baseline serial, single process image processor will go here
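Until the actual script is filled in, here is a rough sketch of the shape it could take, assuming picamera and Pillow are available; process_frame is a placeholder for the per-frame operation under test, not a video-pipeline API.

# Sketch of a serial baseline: capture -> process -> send out -> repeat.
import io
import time

import picamera
from PIL import Image, ImageOps

def process_frame(frame):
    # Stand-in per-frame operation: grayscale, then back to 3-channel RGB.
    return ImageOps.grayscale(frame).convert("RGB")

with picamera.PiCamera(resolution=(640, 480), framerate=30) as camera:
    time.sleep(2)  # give the sensor time to settle
    stream = io.BytesIO()
    frames = 0
    start = time.monotonic()
    for _ in camera.capture_continuous(stream, format="jpeg", use_video_port=True):
        stream.seek(0)
        result = process_frame(Image.open(stream))
        # sending `result` to the client is omitted here
        frames += 1
        stream.seek(0)
        stream.truncate()
        if frames == 300:
            break
    print("baseline: %.1f fps" % (frames / (time.monotonic() - start)))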

7.3. Test Execution

TODO@phil: explain the scene and how the test is conducted. Use gifs where applicable.

7.4. Test Results

TODO@phil: include plots showing diminishing returns from increasing number of processes
TODO@phil: include gif of sample video

Raspberry Pi 2

640x480           # processes   No-op    Grayscale   Sobel    Color Select
Baseline          1             ?? fps   ?? fps      ?? fps   ?? fps
video-pipeline    1             ?? fps   ?? fps      ?? fps   ?? fps
video-pipeline    2             ?? fps   ?? fps      ?? fps   ?? fps
video-pipeline    4             ?? fps   ?? fps      ?? fps   ?? fps
video-pipeline    8             ?? fps   ?? fps      ?? fps   ?? fps
video-pipeline    16            ?? fps   ?? fps      ?? fps   ?? fps

Raspberry Pi 3 B+

640x480           # processes   No-op    Grayscale   Sobel    Color Select
Baseline          1             ?? fps   ?? fps      ?? fps   ?? fps
video-pipeline    1             ?? fps   ?? fps      ?? fps   ?? fps
video-pipeline    2             ?? fps   ?? fps      ?? fps   ?? fps
video-pipeline    4             ?? fps   ?? fps      ?? fps   ?? fps
video-pipeline    8             ?? fps   ?? fps      ?? fps   ?? fps
video-pipeline    16            ?? fps   ?? fps      ?? fps   ?? fps