Codecs and Bitrates: Comparing MJPEG to H.264 Round 1

5Sep2018

15 min read

Both the 5D mark IV and 1DX Mark II use Motion JPEG (MJEPG) compression for their 4K video streams. The conventional wisdom, all be it paraphrased, has been that this super high bitrate was chosen since it was necessary to make the less efficient MJPEG codec produce a usable quality file.

At first blush, this sounds completely reasonable. I mean using a less efficient codec should mean you need more bits to get the same image quality.

The funny thing is that if you google around for information comparing MJPEG to H.264 it’s actually rather hard to find useful information. A lot of the discussion is focused on security cameras and generally low bit rates, not cameras and really high bitrates.

About the only reasonably close article was one by Thomas Skowron. Though his test isn’t really all that useful either, at least in my opinion. In his testing, he shot some 4K test videos with his 5D4. Then he converted them to H.264 files at varying bit rates, and looked at the how the PSNR and SSIM related ot the already compressed MJPEG files.

Fundamentally what he was asking is, what bitrate one needs H.264 to run at to recompress already lossy compressed MJPEG files without loosing too much quality. This is great if you want to slim down the storage situation, but it doesn’t really tell us much about the initial recordings.

Instead, the question we need to ask, is what is the difference in quality between H.264 and MJPEG compared to a “uncompressed” source.

After all, if the argument is that Canon’s choice to use MJPEG means that they need to run a much higher bitrate to compensate for the poor efficiency of the codec, then it stands to reason that H.264 would allow much lower bit rates to be used while still maintaining the same quality.

The monkey wrench in this argument, of course, is now that Canon is releasing a camera with H.264 based 4K, they’re still offering a 500 Mbps bitrate at the highest setting.

Of course, part of the compounding problem here is that this isn’t something that can be tested in the camera.

To start with, there’s no 4K clean HDMI out on the 5D4. So you can’t plug it into something and record a lossless or uncompressed video clip. This also means that you can’t look at the exact implementation details for the compression Canon uses either. Instead we have to look at this almost completely synthetically.

That said, ultimately the question here isn’t whether or not Canon’s implementation is ideal or not, it’s the comparative differences between the efficiency of the two codecs that we’re after. Canon will, at least with the H.264 codec have to contend with performance and power problems that simply don’t exist on a PC.

Test Overview

For this test, and I say that singularly, as I only generated a single round of tests based on a single scene. However, part of what I’ve spent time on with this initial test is automating the execution and ultimately the data collation so I can just change the scene and rerun the test on it.

With that said, for this test, I created a 5 second pan from the following image of the horseshoe bend overlook of the Snake River in Grant Teton National Park^[1].

The image started life as a seiners of losslessly compressed Canon CR2 raw images that were stitched into a 155 MP pano in Lightroom. Tough the pano was ultimately compressed as a lossy DNG, the starting point for all test videos is identical.

I chose this image due to the fact that it generally represents the kinds of scenes that I would expect myself to shoot. Though since the efficiency of these compression algorithms is dependent on the content of the video being compressed, this is in no way indicative of the kinds of performance you can expect in all situations. This is also the reason for the automation and working towards testing more video clips.

The clip was rendered out of Premiere Pro as a 24-bit PNG sequence, and this PNG sequence (all 2 GB of it) was then run through the open source video processing tool FFMPEG for both conversion and analysis.

For the sake of completeness, these tests were performed on:

Intel Xeon E3-1220 v2
16 GB DDR3-1333/PC3-10600 RAM
Ubuntu 18.04.1 LTS
Linux Kernel 4.15.0-33 generic x86_64
FFMPEG version 3.4.4-0
Libx264 version 152
x264 speed veryfast

Test Caveats and Gotcahs

I think the biggest think to be aware of here, is the that this test should over state the quality of H.264 video compared to MJPEG. This is the case for a couple of reasons.

First, I’m doing these tests on a general purpose computer using high quality software encoders that aren’t running in real time — in fact, they tend to run about 6 FPS on my system.

When H.264 is run in real-time applications some of the more effective, but time consuming, optimizations have to be disabled, and other parts of the algorithm have to be tuned for performance over quality.

In an effort to deal with the real-time problem, I ran my H.264 encoder (libx264) using the in the veryfast preset. This tuned the algorithms to towards a balance for speed over compression size. That said, I have to admit, whether I should have used veryfast or something faster, like ultrafast is a bit of a guess.

The second major difference is that FFMPEG uses the x264 library. X264 is a very high quality H.264 library that generally preforms substantially better than most, if not all, hardware implementations.

Conversely, MJPEG and JPEG itself, is very much a long solved problem. Canon’s in camera JPEG engine isn’t appreciably better or worse than any software. As a result, I would expect the FFMPEG MJPEG output to be very close in quality to what Canon’s cameras end up producing.

Finally, as I noted, this is a test of a single reference video. While the video is related to my workloads, it’s not indicative of all workloads. I intend to do further testing with new data sets at some point in the future. However, even a single 5s clip takes 20 minute to process to generate the 35 data points that I’m collecting.

Data

The full dataset for this test can be viewed here.

I’ve excerpted some from the full data sheet in theta table below.

Codec	Bitrate (Mbps)	Chroma	CPU %	Max Mem (MB)	PSNR (avg)	PSNR (min)	PSNR (max)	SSIM (all)
MJPEG	500	4:2:2	202	503.8	41.831248	32.470795	45.807426	0.983781
H.264 ALL-I	500	4:2:2	385	827.8	43.983679	43.15598	57.672527	0.986265
H.264 ALL-I	100	4:2:2	385	786.9	32.528070	30.989464	45.623106	0.885640
H.264 ALL-I	100	4:2:0	383	720.9	31.861992	30.472548	45.097514	0.878120
H.264 IPB	100	4:2:0	383	1220.9	45.215650	42.005638	56.130851	0.990605

Discussion

Comparative Image Quality: How Bad is MJPEG compared to H.264?

So the million dollar question here is just how bad is MJPEG compared to H.264.

Unsurprisingly, the answer is it depends.

What really surprised me is that in ALL-I mode, H.264 and MJPEG are pretty comparable — at least under these test conditions. MJPEG definitely lags behind H.264, but the differences aren’t unreasonably large.

At 500 Mbps, the SSIM and PSRN averages are only 0.3% and 5% respectively apart. Both put in PSNRs above 40, and SSIMs above 0.98 H.264’s more advanced processing does put it ahead, with much better minimum PSNRS (43.1 to 32.5) and higher max PSNRs (57.7 to 45.8).

That said, it’s not like you can halve the bitrate with H.264 and maintain the averages. Dropping H.264 down to 250 Mbps, ALL-I, 4:2:2, dropped the PSNR average to 37.5, and the SSIM to .954.

Of course all of this is again, to be expected. ALL-I and MJPEG are fundamentally doing something similar in storing all the frames as complete images. H.264 has some more advanced spatial compression options (like variable sized blocks) so it comes out a slightly ahead.

Where H.264 really makes huge gains is when you turn on the interframe compression options — that is, use an IPB mode.

Interframe compression dramatically reduces the number of bits needed to represent most of the frames because they’re no longer self contained. Saying that block 15 in frame 2 is the same as a block 15 in frame 1, uses a lot less space that it does to describe block 15 all over again. That in turn frees up bits that can be used to more accurately store the initial information more accurately.

We can see this in the PSRN and SSIM values as you’d expect.

Codec	Mode	Avg PSNR
MJPEG	500Mbps	41.831248
H.264	500Mbps ALL-I	43.983679
H.264	500Mbps IPB	54.742351

In fact, IPB is so much more efficient than ALL-I, that you have to drop the bitrate to around 60 Mbps before the IPB file has a PSNR as low as the 500Mbps ALL-I files (either MJPEG or H.264).

This of course raises the question then, for the best quality should you use ALL-I or IPB on your camera when recording. Unfortunately, there is a level of implementation dependency in the performance of these compression algorithms. Since I’m not testing the camera’s implementation — since there’s no way to compare anything — it’s impossible to know the camera’s specifics.

RAM Usage and Compression Algorithms

Granted, the RAM usage on a general purpose computer doing everything in software is going to be higher than that in a device with dedicated hardware. However, the trends are going to be generally the same.

The first thing to notice about RAM usage is that MJPEG used the least, by about 40% over the best case for high bitrate H.264. Further H.264 doing ALL-I encoding used, on average, just over half as much RAM as IPB compression did. All of this should be expected, if you know even the slightest bit about how these compression algorithms work.

MJPEG is nothing more than a sequence of JPEG images sorted in a container file. On a Canon camera it should use very little memory as it should be able to take advantage of the hardware JPEG encoder and the only thing the camera has hold in buffers (memory) are the frames on the way from the sensor to the encoder, and the encoder to the flash card.

Next up is ALL-I H.264. H.264’s gains in compression come from having more sophisticated algorithms in place to find more compressibility in the data.

For example, both JPEG and H.264 break images up into blocks to process and compress. For JPEG files, the blocks are always square and always the same size, 8×8. Blocks in H.264, however, aren’t fixed in size they can range from 4×4 to 16×16 with non-square shapes like 8×16 as well.

A bigger block that covers a large area of similar data, will compress data better than multiple smaller blocks. However, it takes more time and memory to figure out how to distribute these blocks of differing sizes.

That said, while the details are different, ALL-I H.264 is still just a sequence of compressed images much like MJPEG. The increase in RAM usage is there because of the more sophisticated processing that needs to happen, but images are only being buffered to the encoder, and from the encoder to the card.

Finally, H.264 in IPB mode consumes the most RAM, and generally puts the most data in to the least amount of bits. IPB compression uses all the tools in the H.264 toolbox to compress the data, both inside the frame (intraframe) and between frames (interframe).

The IPB designation represents the 3 types of frames that are available to the interframe compression engine. I frames are the stand alone pictures; the same as the ones in ALL-I and what ever frame is in MJPEG. P frames, are single predictive frames. These frames encode the difference between itself and the previous I frame. Finally, B frames are Bi-Predictive frames. These frames encode data as changes between two other frames, for example a P frame and an I frame.

The caveat for interframe compression is that since the P and B frames have to be able to reference other I and P frames, all the frames in a group of pictures (GOP) have to be kept in memory somewhere so the differences can be calculated. As a result, this drives the memory usage up substantially.

Instead of the codec having to buffer just the frame it’s dealing with, it has to buffer all the frames in the GOP. With GOPs around 30 frames in most cameras, and 4K frames being around 8.3 MP, that quickly burns through a lot of ram.

Conclusions

At least under these test conditions and with this sample video file. The differences between H.264 and MJPEG at 500 Mbps aren’t as significant as I had previously expected.

Again, this not representative of all scenes, and is being computed based on the use of software encoders not the hardware in available cameras.

Finally, I intend to make some of the tools available, but I’m still working on cleaning them up.

This is near the location where Anas Adams took his famous picture of the Tetons behind the Snake River. (The trees have grown appreciably since he shot his image). ↩︎

Comments

Matt

January 4, 2021 6:35 PM

Fantastic breakdown. I’m looking for a good high speed Proxy codec that still does some decent compression. My software (Blender) on a PC doesn’t do ProRes (which I use a fair bit) and I was debating between MJPEG and H.264 All-I or something else. I appreciate the clear data for this case!

Jason Franke | admin

January 4, 2021 8:32 PM

If Blender supports it (I’ve played around with Blender, but not for long enough to really do anything), you might want to look at Avid’s DNxHR format. It’s basically ProRes for non Apple users. You may need to get the codec pack from Avid (it was a free download last time I needed to install it).