I just finished writing an article looking at MJPEG and H.264 and how they compared to each other in at least one test scene. In that article, I found myself wanting to write a section about why Canon may have chosen the codecs they did when they did, only it didn’t really fit, and as I wrote it, it got really long.
Canon has definitely received a lot of flak across the internet for the way their cameras, at least the 5D mark IV, 1DX mark II, and to a lesser extent the EOS 1DC, record 4K video. Broadly, the complaints seem to come down to two things.
- The files are huge
- They shouldn’t have used MJPEG for “reasons”
Unfortunately, when it gets to those reasons, well that’s when things get problematic. The biggest implication seems to be that if Canon had used H.264 instead of MJPEG the files would have been both better and substantially smaller.
The problem is, I’ve had a very hard time finding meaningful comparative tests of the two codecs on equal footing. There’s a lot of “but H.264 is better,” but I couldn’t actually find much in the way of head to head comparisons done against the same reference files to show by how much and under what conditions.
With that said, this is something that’s been bugging me for a while. On one hand, I’m very much in the group that would have liked to have had smaller 4K files on my 5D mark IV. On the other hand, my background in computer engineering tells me that it may not have been possible with the time and processing equipment at hand.
Oh yeah, this article is a bit of a ramble on cameras and engineering in general. I’d apologize, but the reality is you can’t really condense complex discussions into 500 words and not lose all of the complexity.
Manufactured Products are Designed
This should be obvious: manufactured goods do not naturally occur in the wild, they must be designed and then, well, manufactured.
If you’re a hobbyist, say building electronics projects for fun or to do something for your photography (e.g., sound triggers for the guys doing rocket launches), the name of the game is being agile. While you’re not going to set out with no plan, the plan can be changed and evolved rapidly as you work through the design. At least this is how I work on most, if not all, small electronics projects.
That’s not the case with something like a camera. The amount of money involved to get something like this out the door isn’t something that can be thrown around freely. Consequently, there’s a whole lot more planning involved and the design process starts with detailed goals, objectives, and specifications. The design brief will identify things like the target market segment, the features and performance targets for that market segment, and details on the technical capabilities for things that might be integrated.
Some of that information is sourced from internal metrics and timelines. For example, Canon would know that their gen. x pixel architecture is ready to go into production, or that their processor with a new capability is ready to go. Or for that matter, that some part in a previous design was more prone to failure.
Some of this information comes from market research. This could be in the form of product analysis, where companies look at what their competition is doing. Another source is product advisory boards, where carefully selected industry experts provide specific feedback on either capabilities they want or on designs in progress. Finally, there’s feedback from both customers (e.g., through surveys or direct contact and complaints) and from internal sources like the repair/service people about specific problem areas.
Before you can even start talking about possible engineering challenges, it’s really necessary to try and figure out what the design goals were to start with. Without understanding that, there’s no context to understand anything else.
Design Goals: Cinema Down, not Consumer Up
Canon is never going to disclose their internal design documents, so we can’t know for sure what Canon’s actual focus was. However, we can draw some inferences from what they’ve said and done.
One thing that I’ve always suspected is that with the 5D mark IV and 1DX mark II, the feedback Canon was working from was heavily influenced by high level cinematographers, not lower tier videographers and prosumers. The simple reality is, there are too many design points that lean towards pro cinematography that clearly aren’t driven by technical limitations.
To start with, Canon chose to use the cinema standard DCI-4K (4096×2160) not the consumer standard UHD 4K (3840×2160) resolution. Moreover, given that DCI-4K has more pixels per frame, there’s no technical limitation in terms of pixel processing that would prevent Canon from offering a UHD crop. The only reasonable explanation for why UHD wasn’t included is a deliberate design choice, and one that’s aimed at the very high end.
Then there are the ALL-I modes that debuted in the prior generation. Intraframe only compression sacrifices compression efficiency for some, arguably minimal for most people, benefits in editing and post. This is something that very high end pros were, and are, asking for, if not demanding. But for prosumers and even lower tier pros, it’s not so important.
Finally, there’s the 500 Mbps bitrate that leads to the “huge” files. From a consumer perspective 500 Mbps is insanely high. However, in a pro context, for 4K video that’s pretty much the standard bitrate for compressed video. Apple’s ProRes 422 targets a bitrate of 629 Mbps for 4K files, and Avid’s DNxHR SQ format targets 588 Mbps.
Ultimately, I think this is where a lot of the criticism stems from too. It’s not high-end pro video and cinema people complaining, it’s those of us much lower down the production quality totem pole that are.
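To put those bitrates in terms of storage, here is a quick back-of-the-envelope conversion. It is only a sketch: it assumes decimal units (1 Mbps = 1,000,000 bits per second, 1 GB = 1,000,000,000 bytes) and ignores audio and container overhead. The function name is mine, not anything from Canon, Apple, or Avid.

```python
def gigabytes_per_minute(mbps: float) -> float:
    """Approximate storage used by one minute of video at a given bitrate."""
    bits_per_minute = mbps * 1_000_000 * 60
    return bits_per_minute / 8 / 1_000_000_000


# Bitrates quoted above, in megabits per second.
bitrates_mbps = {
    "Canon 4K MJPEG": 500,
    "ProRes 422 (4K)": 629,
    "DNxHR SQ (4K)": 588,
}

for name, mbps in bitrates_mbps.items():
    print(f"{name}: {gigabytes_per_minute(mbps):.2f} GB per minute")
```

At 500 Mbps that works out to 3.75 GB per minute of footage, which is large by consumer standards but below both of the professional intermediate codecs listed.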
That said, with the benefit of hindsight, it’s easy to say that Canon screwed up and didn’t appreciate the lower end of the market as much as they should have. However, Canon didn’t have the benefit of hindsight at the time that they were designing the cameras. Now, we know both what the competition did, and how related industries (like TV sales, and the associated demand) developed.
However, Canon was doing most of this design work back in late 2013 or 2014, and potentially even earlier for some of the hardware. Back then, 4K TVs, while available, were extremely expensive, niche, and didn’t sell in nearly as much volume as they did in 2016, let alone today. Netflix was just starting to do 4K, but to get your content on Netflix you had to be a pro level production anyway. Finally, YouTube wasn’t even talking about 4K, and wouldn’t roll out 4K support until 2016 when the 5D4 was actually released (at that point it would be far too late for Canon to change anything anyway).
Changes and Lead Time in Hardware Design
Of course, the easy armchair argument then is if the industry changes, change with it.
Only when you start talking about hardware changes, especially when they involve significant redesigns, they simply don’t happen quickly. In fact, the flexibility to adapt generally decreases over time as more details are solidified and the product gets closer to launch. In some respects, the initial design team needs to predict the future, often the future 2, 3, or 4 years down the road, and do so fairly accurately too.
Changing direction is a substantial process. I think a good illustration of this kind of thing is actually something from Canon themselves.
From around 2008 to 2012, Canon’s cameras largely stagnated. They reused the same 18 MP APS-C sensor, with minor tweaks (if that), in camera after camera, and their higher end camera development basically stopped during that time frame. During this era, Nikon (largely due to their partnership with Sony) and Sony saw consistent improvements in their sensor tech putting them ahead of Canon; a margin that to one extent or another still exists today.
The picture painted by the camera press, and a lot of people around the net, was more or less that Canon was completely incompetent and had no idea what they were doing.
However, if you pay attention to comments made by Canon employees at the time, the picture you start getting is substantially different. Canon Japan, where the design happens, clearly wasn’t prepared for the response to the video features that came with the 5D mark II, and their response was to radically shift Canon’s priorities to exploit that.
There’s a video out there (if I can find it again, I’ll post a link) of a Canon USA marketing member talking about when the Canon engineers came over to brief them on the 5D mark II. The story went that Canon Japan’s guys spent the entire meeting on still photography stuff, and only mentioned that it could shoot video moments before the meeting was adjourned. Canon USA’s people immediately recognized the impact, but apparently Canon Japan’s engineers and design people hadn’t seen it as being that big of a deal.
As an aside, I would note that this kind of thing, where people deeply involved in a product or a field miss the implications of substantial shifts in the technology, isn’t uncommon. It happens in most fields, especially when the change is a radical departure from the status quo.
It’s immediately following the 5D mark II’s release that Canon enters this period where they seem to have completely, well, if you go by the conventional wisdom, “given up” on designing cameras. However, what you’re really seeing is what it looks like when a major company completely changes course on their design and engineering priorities. The three years of minimal change and progress were capped off by a rapid succession of new video-centric sensor designs, an entire new product line (the Cinema EOS), and ultimately pretty substantial technology shifts like the development of DPAF.
None of that comes cheap, or quickly, though. Most big companies aren’t even remotely nimble, and a major change to their strategic plan is not a small feat. Things like Dual Pixel AF don’t just spring up overnight for use back in 2016, or even 2014. The R&D work, not just for DPAF but for the foundations needed to support it (such as the read interface and supporting electronics), didn’t happen overnight.
In fact, some of the formative underlying work was probably happening at least as far back as 2010. That year, Canon announced, and took a lot of flak for doing so, that they had successfully developed a 120 MP APS-H sensor that could sustain 9.5 FPS, and shot windowed video using windows at 1/16th of its surface.
The only thing most internet commenters could see at the time was a stupidly high resolution sensor that nobody wanted because of the prevailing wisdom.
However, consider that sensor again, only now in the context of the Dual Pixel technology. While that sensor wasn’t a dual pixel design, the tech needed to read 120 million pixels out at nearly 10 FPS is not at all different from the tech needed to handle the increased load of dual pixel sensors. 120 MP at 9.5 FPS is 1.14 billion pixels per second, the same throughput as 17.7 MP at roughly 64 FPS, and 17.7 MP is the number of sub pixels in a DCI-4K frame (4096×2160 pixels, two sub pixels each).
Similar to changing direction, making mistakes can be even worse when you’re talking about hardware. Unlike software, there’s no pushing an update to fix a broken chip design or a missing feature. Consequently, everything is done more deliberately, which, of course, takes time. The cost of mistakes is especially high if it requires a product recall to fix.
Unfortunately, insofar as discussions go, this is also a topic that’s so distant from most people’s realities and lives that there’s no appreciation or intuition for the processes or time scales involved.
Even a recent article by a German author suggests that because cameras use silicon IC parts, the development time is minimal. The reality, and you can look at this in virtually any industry that builds silicon based chips, is not quite that simple. There’s still a substantial amount of lead time involved in building custom, or even semi-custom, silicon.
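The readout comparison can be checked with a few lines of arithmetic. This is a sketch under one stated assumption: that a dual pixel sensor reads two sub pixels for every output pixel. Counting it that way, a DCI-4K frame has about 17.7 million sub pixels.

```python
# Figures from Canon's 2010 announcement as quoted above.
SENSOR_PIXELS = 120_000_000
SENSOR_FPS = 9.5

# DCI-4K frame size; two sub pixels per pixel is an assumption about
# how a dual pixel sensor would be read out.
DCI_4K_WIDTH, DCI_4K_HEIGHT = 4096, 2160
SUB_PIXELS_PER_PIXEL = 2

throughput = SENSOR_PIXELS * SENSOR_FPS  # pixels read per second
sub_pixels_per_frame = DCI_4K_WIDTH * DCI_4K_HEIGHT * SUB_PIXELS_PER_PIXEL
equivalent_fps = throughput / sub_pixels_per_frame

print(f"Readout throughput: {throughput / 1e9:.2f} billion pixels/s")
print(f"Sub pixels per DCI-4K frame: {sub_pixels_per_frame / 1e6:.1f} million")
print(f"Equivalent dual pixel DCI-4K frame rate: {equivalent_fps:.1f} FPS")
```

In other words, the readout bandwidth demonstrated in 2010 would, on paper, cover a dual pixel DCI-4K sensor at a bit over 60 FPS.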
CMOS Image Sensors are not Just Computer Chips
More importantly, the author of that above linked piece makes a perfectly reasonable mistake in his assumptions too. CMOS Image Sensors (CIS) are not “computer chips”.
While they use silicon wafers and similar manufacturing processes to digital logic devices, they aren’t digital parts. At best, they’re hybrid analog/digital parts; at worst, they’re pure analog parts. The photo diodes and read amplifiers are all analog circuits; in fact, everything before the signal gets to the ADC is analog.
I’m making this point because there are substantial differences in the design and requirements of analog ICs compared to digital ones.
For example, with a digital circuit the design is really only concerned with switching at a threshold voltage. That is, if the voltage is lower than the threshold the transistor doesn’t conduct; if it’s above that voltage then the transistor conducts. The voltage itself isn’t really that important as long as it’s within the designed ranges.
In an analog circuit, the voltage (or current) is the important thing, and the circuit has to take care not to influence the signal improperly. Moreover, the circuit designer has to take care to route other signal lines in such a way that the EMI produced by current moving through them doesn’t influence parallel signal lines. This is a big differentiating factor from digital systems. While it’s important for a digital design to ensure that signals remain clean, there’s a lot more room for error because the exact voltage doesn’t matter as much.
Looking at, say, CPUs or GPUs and trying to draw inferences about how long it should take to design a CMOS image sensor doesn’t necessarily yield relevant conclusions.
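A toy example makes the difference in error tolerance concrete. All of the voltages here are hypothetical numbers picked for illustration, not measurements from any real sensor or logic family.

```python
THRESHOLD_V = 0.5  # hypothetical logic threshold
NOISE_V = 0.1      # hypothetical interference picked up from a nearby trace


def read_digital(voltage: float) -> int:
    """A digital input only cares which side of the threshold it is on."""
    return 1 if voltage > THRESHOLD_V else 0


def relative_error(ideal: float, measured: float) -> float:
    """An analog signal carries the error directly into the measurement."""
    return abs(measured - ideal) / ideal


logic_high = 1.0     # nominal digital "1"
pixel_signal = 0.30  # nominal analog pixel voltage

# The digital bit survives the noise unchanged.
print(read_digital(logic_high), read_digital(logic_high - NOISE_V))

# The same noise is a large fraction of the analog pixel value.
print(f"{relative_error(pixel_signal, pixel_signal + NOISE_V):.0%} error")
```

The same 0.1 V of interference that the digital input shrugs off shifts the analog reading by about a third of its value, which is why analog layout gets so much more attention.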
Constraints of an Embedded System
Cameras, like all embedded systems, pose interesting problems to their designers due to one simple reality: limited resources.
In almost all cases, this means limited computational power, limited memory, and in the case of portable devices limited power availability (in terms of instantaneous current draw, voltage, and sustained load). This in turn puts constraints on what the hardware can do.
An example of these kinds of limitation is the situation that prompted Apple to throttle CPUs in devices with worn down batteries. Without the throttling, under load the CPU drew so much current from the battery that the voltage dropped in response. Under the right conditions the voltage would fall enough that it would cause the phone to spontaneously reboot.
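The mechanism can be sketched with the simplest possible battery model: an ideal voltage source in series with an internal resistance, so the terminal voltage sags by current times resistance. Every number below is a made-up illustration (the voltages, resistances, load current, and cutoff are all assumptions), not a figure from Apple or any specific battery.

```python
def voltage_under_load(open_circuit_v: float, internal_ohms: float,
                       load_amps: float) -> float:
    """Terminal voltage of a simple series-resistance battery model."""
    return open_circuit_v - load_amps * internal_ohms


OPEN_CIRCUIT_V = 3.8  # hypothetical resting cell voltage
CUTOFF_V = 3.3        # hypothetical minimum voltage the system tolerates
PEAK_LOAD_A = 2.0     # hypothetical peak CPU current draw

# Internal resistance rises as a cell ages; both values are illustrative.
for label, ohms in (("new battery", 0.1), ("worn battery", 0.4)):
    v = voltage_under_load(OPEN_CIRCUIT_V, ohms, PEAK_LOAD_A)
    status = "ok" if v >= CUTOFF_V else "brownout, device reboots"
    print(f"{label}: {v:.2f} V under load -> {status}")
```

In this model, throttling the CPU is just a way of lowering the peak current so the sagging voltage stays above the cutoff.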
These limits mean that in application specific processors, like Canon’s Digic, there’s a lot of customization to the hardware to implement their processing in hardware instead of trying to run it in software.
Beyond power consumption, the next significant hurdle is going to be memory use and availability.
By my estimates, there’s around 500 MB of DRAM in the 5D mark IV. This appears to be implemented as a single 4 gigabit DRAM module connected to the CPU.
A 2 Gbit DRAM chip only provides 250 MB of RAM, which clearly isn’t enough for a camera that empirically can have 300+ MB of data buffered to be written to the card. A 4 Gbit part provides 500 MB of RAM, which is consistent with the needed RAM buffer space.
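For reference, the conversion from DRAM density to usable bytes is straightforward. The sketch below assumes binary units, where a gigabit is 2**30 bits, so the results come out slightly above the rounded 250 MB and 500 MB figures used in the text.

```python
def gbit_to_mib(gigabits: int) -> float:
    """Convert a DRAM density in gigabits (2**30 bits) to mebibytes."""
    bits = gigabits * 2**30
    return bits / 8 / 2**20


OBSERVED_BUFFER_MB = 300  # rough buffered data figure quoted above

for density in (2, 4, 8):
    capacity = gbit_to_mib(density)
    fits = "enough" if capacity > OBSERVED_BUFFER_MB else "too small"
    print(f"{density} Gbit = {capacity:.0f} MiB ({fits} for a 300 MB buffer)")
```

A 2 Gbit part falls short of the observed buffer, and a 4 Gbit part is the smallest standard density that covers it.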
RAM and Compression
In any event, how much RAM is needed by any given compression process is dictated by how much data needs to be kept around to do the compression. For example, a Canon camera shooting MJPEG should only require enough memory for a buffer to hold the raw data from the sensor, and a buffer to hold the JPEG output of the JPEG engine before it’s written to disk.
H.264 does more sophisticated compression than JPEG, and so, even in the ALL-I modes, it will require more RAM to hold the temporary data that’s generated to do the compression.
Interframe compression, such as IPB H.264, has the highest memory overhead. This is because the compression software has to keep copies of previous frames to determine what’s changed between frames. This means that all the intra (I-frames), and predictive (P-frames) frames in a GOP have to be kept in memory for the duration of the GOP so the future P and B frames can be calculated.
Storing these frames requires some memory, though I can’t say for sure how much, as I’ve never gotten that far into the H.264 algorithm and the data structures it creates.
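A rough lower bound on the frame storage alone can still be estimated. This sketch assumes 8-bit 4:2:0 video (1.5 bytes per pixel), and the reference frame counts are hypothetical. It says nothing about what Canon’s hardware, or any particular encoder, actually keeps in memory; a real encoder also needs working buffers for motion search and for the output bitstream.

```python
def frame_bytes(width: int, height: int, bytes_per_pixel: float = 1.5) -> int:
    """Size of one uncompressed frame; 1.5 bytes/pixel assumes 8-bit 4:2:0."""
    return int(width * height * bytes_per_pixel)


DCI_4K = (4096, 2160)
one_frame = frame_bytes(*DCI_4K)
print(f"One DCI-4K frame: {one_frame / 1_000_000:.1f} MB")

# Hypothetical numbers of frames held for inter-frame prediction.
for reference_frames in (1, 2, 4, 8):
    total_mb = one_frame * reference_frames / 1_000_000
    print(f"{reference_frames} reference frame(s): {total_mb:.0f} MB")
```

Each uncompressed frame is a little over 13 MB, so every additional frame the encoder has to hold onto costs a noticeable slice of a 500 MB budget that is already shared with everything else the camera does.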
All this talk about memory got me thinking though. In my software tests, I found that libx264 on Linux required nearly 1.5 GB of memory to do IPB compression at 4K resolutions. A hardware solution will almost certainly use less, but how much less is an open question.
Which raises the question, how is it that a GoPro can manage to do it?
In digging, somewhat surprisingly, there’s a lot more RAM in a GoPro than you would probably guess. This teardown of a Hero5 shows that there’s an 8 gigabit LPDDR3 chip on board. 8 gigabits translates to 1 gigabyte, twice the estimated RAM available in the 5D mark IV, and the same as the total estimated RAM in the 1DX mark II.
So why can’t the 1DX mark II do 4K H.264?
Finding details about the internal architecture of Canon cameras is, put bluntly, next to impossible. Even with the help of the opensource project Magic Lantern, I’ve not been able to find a whole lot of information about the way the EOS 7D is architected, and they won’t even touch the EOS-1D bodies.
Simply put, the problem with a 1DX having 1 GB of RAM comes down to how that memory is organized and accessible to the processors. And unfortunately this is something I can only guess at.
Looking at the PCBs, there’s clearly one RAM module attached to each CPU. The laptop/desktop/server computers we’re normally used to use a design called symmetric multiprocessing. In these designs all the CPU cores are treated equally, and the attached memory is available to all parts of the system.
However, we’re not talking about a desktop multiprocessor system. Instead of two CPUs operating as peers with one unified address space, it’s entirely possible that the two CPUs are running in a master/slave arrangement.
In a master/slave arrangement, the master CPU would handle all the camera management and control functions, as well as its half of the image processing tasks. The other half of the image processing tasks would be offloaded to the slave CPU by the master. However, this model doesn’t require shared access to memory, though it doesn’t preclude it.
And in fact, there are, arguably, some advantages to this kind of arrangement, mostly because it simplifies the system software design since the main OS can treat the second CPU as a device instead of having to deal with coordinating parallel processing and resources.
My point is, if the camera isn’t a symmetric multiprocessing system, there may be no way for the master CPU to access the secondary CPU’s memory directly. Ultimately, that means that instead of having 1 GB of RAM to work with, it only has 500 MB. Without the full 1 GB, it’s likely that there isn’t sufficient memory for H.264 IPB compression to work at 4K resolutions.
Okay, good “science” requires looking at falsifying the premise. So let’s give it a shot: now there are two cameras on the market from Canon that do 4K H.264 video, the EOS M50 and the EOS R.
In both cases, the cameras have newer Digic processors than the 5D mark IV and 1DX mark II. But that’s kind of a weak argument, as a newer processor should still require similar buffers to do the same tasks.
In the case of the EOS M50, it would seem (at least if you add up the buffer needed to support 10 frames at 10 FPS) that the camera should probably only have 500 MB of RAM. However, I can’t find any images of the mainboard, or teardowns of the camera yet. It’s entirely possible that it has a larger DRAM buffer, but the buffer is artificially limited due to marketing reasons.
As things currently stand, the problem with the M50 is that there’s very little technical information that I can find of the type that’s needed to draw any real conclusions. I can’t find any teardowns that show the DRAM chip in enough resolution to get a part number. Nor can I find any relevant info in the Magic Lantern community.
As for the EOS R, the published specifications support the premise that the camera has 1 GB of RAM in it. The rated burst is 34 frames when using a regular SD card, and 47 frames when using a UHS-II SD card. Compare this to the 18–21 frames the 5D mark IV manages with high speed SD and CF cards respectively.
What about a Sony camera, say the A7 (their first 4K capable “still” camera)? Based on Imaging Resource’s performance numbers, with a 95MB/s Sandisk SD card, the A7 takes 14 seconds to clear the buffer. If we assume the card gets the full 90MB/s writes Sandisk claims, that’s 1.26 GB of data that’s been transferred. Even assuming a slower transfer rate, this supports there being more than 500 MB of DRAM in the A7.
Okay, I think I’ve rambled about this for longer than I intended to, so to wrap this up: it seems that hardware 4K H.264 compression requires more than 500 MB of RAM.
Of course, in hindsight it’s always really easy to say that Canon should have done this or that. After all, these products were engineered, and the engineering could have been done to reach a different target. What’s far harder is to identify which features are important and prioritize the resources for them while essentially predicting the future.
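The buffer estimate is just clear time multiplied by write speed, and the same arithmetic shows how slow the card would have to be before the 500 MB conclusion stops holding. This is a sketch that assumes the camera writes at a constant rate for the whole 14 seconds; the variable names are mine.

```python
CLEAR_TIME_S = 14        # buffer clear time quoted from Imaging Resource
RATED_WRITE_MB_S = 90    # Sandisk's claimed write speed
SUSPECTED_RAM_MB = 500   # the RAM size being tested against


def buffered_megabytes(clear_time_s: float, write_mb_s: float) -> float:
    """Data written while the buffer empties at a constant write speed."""
    return clear_time_s * write_mb_s


full_speed_mb = buffered_megabytes(CLEAR_TIME_S, RATED_WRITE_MB_S)
print(f"At the rated speed: {full_speed_mb / 1000:.2f} GB written")

# Sustained write speed below which the buffer would fit in 500 MB.
break_even_write_speed = SUSPECTED_RAM_MB / CLEAR_TIME_S
print(f"Break-even write speed: {break_even_write_speed:.1f} MB/s")
```

The card would have to sustain well under half of its rated write speed for the whole transfer before the buffered data could fit in 500 MB.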
Looking at pictures of the main board in Lens Rentals’ teardown and comparing that image to other images (namely the Digic 6+ one on this SLR Lounge page, and this one on DPReview’s forums), the largest chip is identified in both of those images as the Digic 6+ processor. The smaller chip, labeled Elpida in the Lens Rentals post, is identified as the Digic 6 processor.
Cross referencing the boards with images of the mainboard for the EOS 1DX mark II, both of the Digic 6+ parts on that camera have similarly placed angled chips next to them as well. And since the 1DX has 2/3rds the resolution and 2.6 times the buffer capacity, that strongly suggests that it has double the RAM, and two chips to provide it.
Also, angling the DRAM chips controls the distance of various traces, allowing for more uniform lengths for the DRAM bus. In any event, DDR3/4 DRAM modules are offered in specific size increments of 1, 2, 4, 8, and 16 Gbit capacities. ↩︎
One characteristic of DRAM busses is that they need to have the signal traces all nearly the same length within a tolerance. If you look at the traces on many computer motherboards, at least when they’re visible and not hidden under paint layers, you’ll often find that some traces will have weird snaky patterns in them. This is to even out the length differences in the traces. ↩︎
Even in a NUMA, or Non-Uniform Memory Access, system the memory is all contained in one logical address space, even though it might take longer to access the memory that’s attached to a different CPU. ↩︎