A Brief Summary of Raw Ideas

17May2018

Raw file formats have gone through a number of iterations over the years. In this article I’m going to briefly look at several aspects of various raw file formats as they’ve evolved over time, including what’s good and bad about them. This isn’t intended to be a serious or in-depth investigation into any specific format; it’s more of a light-reading summary of what’s been done and how those decisions hold up.

Uncompressed Raw Sensor Data – 👍/👎

e.g., Nikon’s Uncompressed raw format

It’s hard to argue against the idea that the point of a raw file is to store the raw output of the sensor, and this strategy does exactly that. However, it’s also hard to argue that there aren’t more elegant or more space-efficient ways to do the same thing.

About the only plus to an uncompressed raw file is that it’s easier for the developer to reverse engineer. However, this is less of a plus than it seems since compression algorithms are generally well defined and the people reverse engineering raw files are smart and haven’t had any appreciable problems in figuring out compression methods used so far.

On the flip side, the big practical downside for a photographer is that uncompressed raw files are big. That means we lose storage space, both on our cards in the field and on our computers. Moreover, the read and write bandwidth of a flash card is fixed by the implementation of that card, not by the size of the file, so smaller files write more quickly and clear out of the buffer faster.

At best, this is a baseline standard. It does what it needs to do, but it doesn’t do it as elegantly or efficiently as it could.

Lossless Compression – 👍

E.g. Canon’s raw format, and Nikon’s lossless raw format

Compression adds complexity to both the code bases for saving and reading the file, as well as to the reverse engineering process. Though honestly, as a photographer none of that matters, and as a programmer I would expect anyone who’s implementing a raw engine to be competent enough to be able to deal with it. The big up side from compression is that you get smaller, potentially much smaller, file sizes and that translates to more pictures on a card.

When it comes to compression algorithms, they can be classified into two broad kinds: lossless, and lossy.

Lossless compression algorithms permit the complete bit-for-bit 100% accurate reconstruction of the original data from the compressed data.

Lossy compression, as the name implies, irrecoverably throws away some information. As a result you cannot completely 100% accurately reconstruct the original data from the lossy compressed file.

Because the original data can be completely recovered, lossless compression is generally a safe bet. However, there are limitations: compared to lossy compression, lossless compression won’t save as much space.

Additionally, as with all compression algorithms, the compression ratio is determined by the randomness of the data being compressed. Fundamentally all compression algorithms attempt to find a way to combine repetitive blocks of data into a smaller representation. For highly repetitive data, the algorithm can replace lots of duplicates with smaller representations. However, for random data, where nearly every bit is unique, this isn’t possible and the compression ratio drops as a result.

We can see the impacts of noise (randomness) in compression by observing the increase in file sizes with increased ISO. Noisier files have more randomness and become less compressible.
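The effect of noise on compressibility is easy to reproduce with a toy experiment. The sketch below is only an illustration: it uses Python’s general-purpose zlib compressor rather than anything a camera actually runs, and the ramp signal, noise amplitudes, and the `compressed_size` helper are all made up for the demonstration.

```python
import random
import zlib

def compressed_size(noise_amplitude, n=65536, seed=0):
    """Compress a smooth 8-bit ramp with uniform noise of the given amplitude added."""
    rng = random.Random(seed)  # fixed seed so the experiment is repeatable
    data = bytearray()
    for i in range(n):
        base = (i * 200) // n  # slow ramp from 0 to 199
        noise = rng.randint(-noise_amplitude, noise_amplitude)
        data.append(max(0, min(255, base + noise)))
    return len(zlib.compress(bytes(data), 9))

# Same underlying "image", increasing amounts of noise.
sizes = {amp: compressed_size(amp) for amp in (0, 2, 8, 32)}
for amp, size in sizes.items():
    print(f"noise amplitude {amp:2d}: {size} bytes")
```

The clean ramp shrinks to a tiny fraction of its original size, and every increase in noise amplitude makes the compressed output larger, which is the same trend raw files show as ISO goes up.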

Ultimately, losslessly compressed raw files are a flat-out better approach than uncompressed ones. You don’t lose any data and you still get smaller files. What little complexity is added to the camera and conversion code is entirely immaterial to us photographers, and the programmers who have to deal with it already have the knowledge and experience to do so.

Nikon’s Not-Really-Lossy Lossy Raw Compression – 👍👍

In many ways, I would argue that Nikon’s lossy raw format has simultaneously an accurate name and the worst possible name Nikon could have given it.

Strictly speaking any compression algorithm that throws away data is lossy, and Nikon’s lossy raw format does discard something that to many looks like data.

Generally, lossy compression algorithms, like MP3 for music or H.264 for video, work because they exploit the perceptual limitations of the human who will ultimately look at or listen to the file.

An easy example of this is chroma subsampling. Since the human eye is less sensitive to color differences than brightness differences, some color information can be discarded to save space. Video systems have long exploited this by sampling the color information at a lower resolution than the brightness information for the same frame. For example, 4:2:0 chroma subsampling stores 1/4 as much information for each of the color channels as it does for brightness, because it samples color at half the vertical and half the horizontal resolution.
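The 4:2:0 arithmetic can be checked in a few lines. This is just a sample-counting sketch; the 1920×1080 frame size and the `sample_counts` helper are arbitrary choices for illustration.

```python
def sample_counts(width, height, h_div=2, v_div=2):
    """Return (luma samples, samples per chroma channel) for a subsampled frame."""
    luma = width * height
    chroma_each = (width // h_div) * (height // v_div)
    return luma, chroma_each

luma, chroma = sample_counts(1920, 1080)  # 4:2:0 halves both chroma dimensions
total_420 = luma + 2 * chroma             # one luma plane plus two chroma planes
total_444 = 3 * luma                      # all three planes at full resolution

print(chroma / luma)          # fraction of luma samples kept per chroma channel
print(total_420 / total_444)  # overall sample count relative to no subsampling
```

Each color channel keeps a quarter of the samples, so the frame as a whole needs half as many samples as it would with full-resolution color.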

However, Nikon’s lossy raw format doesn’t approach the problem as trying to exploit the end viewer’s perceptual limitations. Instead, Nikon looked at the behavior of light itself.

There’s an inherent type of noise in light called shot noise (or Poisson noise). Basically, even if you have what appears to be a continuous light source, there will be tiny fluctuations in the times photons arrive at the camera’s sensor. If you want to understand the physics behind why this happens, the Wikipedia page on it is a reasonably good read.

Put simply, you can’t get around this noise with a better camera or sensor; it’s an inherent part of the way light behaves. And noise, as I noted earlier, is bad for compressing data.

This is where the Nikon engineers went and did something real slick.

Photon shot noise can be roughly modeled as the square root of the number of photons in the signal. This means that as the brightness (number of photons) increases, so does the shot noise, and it does so in a predictable way.

For example, suppose you have a signal that reads 200 units. There’s roughly 14 units of noise in that signal (√200 ≈ 14), meaning that if you have 3 different values, say 200, 201, and 195, you can’t tell whether those are all supposed to be 200 but are being affected by shot noise, or whether they are genuinely different values.
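Here is the same example as a small sketch. It assumes one unit equals one photon and treats two readings as indistinguishable when they differ by less than one square root’s worth of noise; both are simplifications of mine, and the function names are only for illustration.

```python
import math

def shot_noise(signal):
    """Approximate shot noise as the square root of the signal level."""
    return math.sqrt(signal)

def indistinguishable(a, b):
    """True if two readings differ by less than the shot noise of the larger one."""
    return abs(a - b) < shot_noise(max(a, b))

readings = [200, 201, 195]
print(round(shot_noise(200), 1))                          # about 14 units
print(all(indistinguishable(200, r) for r in readings))   # all within the noise
print(indistinguishable(200, 230))                        # a 30 unit gap is not
```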

Simultaneously, digital imaging sensors work linearly, and raw files store their data linearly. As the brightness increases, the number of values available to store each stop increases.

For example, in an idealized 8-bit camera, the brightest stop will contain the 128 values from 128 to 255. The next stop down will contain only 64 values, from 64 to 127. And so forth until the darkest stop contains only the bottommost bit, and the values 0 and 1.

What this means is that as the brightness increases, more and more values represent noise instead of signal. Not only are there more values being used to store each successive stop, but the number of those values that are noise for each distinct step in signal increases as well. That is, while you might have 128 possible values to store a stop, there aren’t 128 uniquely different steps in brightness between them.
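Counting the values in each stop makes the imbalance obvious. The sketch below does that for an idealized linear encoding, and adds a very rough estimate of how many noise-sized steps fit in each stop. That last column assumes one unit equals one photon, which real sensors don’t guarantee, so treat it as an illustration only.

```python
import math

def values_per_stop(bits):
    """Return (low, high, count) for each stop of a linear encoding, brightest first.

    Value 0 (pure black) is left out, since it has no stop of its own.
    """
    stops = []
    high = 2 ** bits - 1
    while high >= 1:
        low = (high + 1) // 2
        stops.append((low, high, high - low + 1))
        high = low - 1
    return stops

stops = values_per_stop(8)
for low, high, count in stops:
    # Rough number of steps in this stop that are at least one shot noise apart.
    noise_sized_steps = count / math.sqrt(low)
    print(f"{low:3d}-{high:3d}: {count:3d} values, ~{noise_sized_steps:.0f} noise-sized steps")
```

The number of stored values doubles with every stop, while the number of steps that can actually be told apart grows far more slowly.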

Instead of simply storing the raw sensor values in a file, Nikon stores only the values that would be statistically differentiable signals once shot noise is taken into account. By binning multiple noise-correlated signal levels together, you need fewer numbers to represent the entire range of brightnesses. With fewer total numbers needed, you don’t need as many bits to store them either.

At low brightnesses, Nikon’s lossy raw format records the output of the sensor exactly. However, at a threshold point (DN 689 for 12-bit, and DN 2753 for 14-bit^[1]) Nikon stops storing the exact sensor output and stores only noise-correlated steps^[2]. As a result, Nikon can store the full range of a 12-bit camera in around 9 bits, and 14 bits in around 11. That alone is a 25% reduction in file size for 12-bit data (about 21% for 14-bit) before you do anything further to the file, like compressing it.
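To make the idea concrete, here is a minimal sketch of a noise-scaled lookup table. It is not Nikon’s actual curve: the `noise_fraction` knob, the step rule, and the function names are my own assumptions, and only the 12-bit threshold of 689 comes from the text above. The point is just to show that exact storage below a threshold plus square-root-sized steps above it needs far fewer codes than the full linear range.

```python
import bisect
import math

def build_levels(max_dn, threshold, noise_fraction=0.25):
    """Build the list of sensor values that get their own code.

    Below the threshold every value is kept. Above it, the gap between kept
    values grows with sqrt(value), i.e. with the shot noise (hypothetical rule).
    """
    levels = []
    v = 0
    while v <= max_dn:
        levels.append(v)
        step = 1 if v < threshold else max(1, int(noise_fraction * math.sqrt(v)))
        v += step
    return levels

def encode(value, levels):
    """Map a sensor value to the code of the nearest kept level at or below it."""
    return bisect.bisect_right(levels, value) - 1

def decode(code, levels):
    return levels[code]

levels_12 = build_levels(4095, 689)
bits_needed = math.ceil(math.log2(len(levels_12)))
print(len(levels_12), "codes instead of 4096, fits in", bits_needed, "bits")
```

With this particular step rule, the error introduced by a round trip through `encode` and `decode` always stays below the shot noise of the value being stored, which is the property that makes the scheme visually harmless.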

The real key point here, and why this gets 2 thumbs up from me, is that Nikon isn’t throwing away data, but is throwing away noise. Moreover, if you were inclined to do so, you could actually recreate the discarded noise by applying an appropriate amount of brightness-correlated noise to the decompressed file. Randomness is randomness; in this case it doesn’t matter if it was added by the light itself or by a computer fuzzing up the values after the fact.

Personally, I think this is really what everybody should have been doing from the start. The only real argument I can make against it is that if you’re using your camera for scientific measurements, where you need to record shot noise, then this would be a problematic approach. Otherwise, there’s no actual loss of information, at least not information we care about photographically (i.e. everything that isn’t noise), and the files are smaller as a result of it. Win-win.

Sony Raw Lossy cRAW/ARW2 – 👎

Sony’s engineers attempted to take Nikon’s photon shot noise modeled raw and compress it further. However, in doing so they chose a seemingly bizarre approach, and to be honest, I can’t quite figure out what their thought process might have been.

Sony starts with a similar kind of photon shot noise modeling to what Nikon did in their lossy raw format. They store a 14-bit linear sensor output in an 11-bit raw value by binning brighter values relative to their shot noise.

However, to further reduce the file size, instead of using a lossless compression algorithm across the whole file, they turned to a technique that’s reminiscent of, though not the same as, the macroblocking used in JPEG or H.264.

Sony divides their images into groups 1 pixel high by 16 pixels wide, each stored in a 128-bit block of data. In each block, two pixels are stored in full 11-bit form; these set the minimum and maximum values in that group of pixels. A further two 4-bit numbers store the positions in the block of those min and max pixels. The remaining 14 pixels are stored as 7-bit offsets between those minimum and maximum values. The result is that 16 pixels get stored in 128 bits of space, or an average of 8 bits per pixel^[3].
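The bit budget for one block adds up exactly, as this quick check shows (the constant names are just labels for the numbers given above):

```python
FULL_BITS = 11    # min and max pixels stored at full precision
INDEX_BITS = 4    # position of the min or max pixel within the block
DELTA_BITS = 7    # each remaining pixel stored as a 7-bit offset
PIXELS = 16       # pixels per block

block_bits = 2 * FULL_BITS + 2 * INDEX_BITS + (PIXELS - 2) * DELTA_BITS
bits_per_pixel = block_bits / PIXELS

print(block_bits)       # total bits in one block
print(bits_per_pixel)   # average bits per pixel
```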

Sony’s binning certainly makes the files smaller, but it comes with huge consequences for image quality, and it all comes down to the 7-bit offset values.

7 bits provide enough space to store 128 values (0 to 127). If the difference between the min and max pixels is 127 steps or smaller, then 7 bits are enough to cover the entire range 1:1 and there aren’t any problems.

However, when you have large jumps in brightness between the brightest and darkest pixels, such as when you have a star trail or the dark shadowed side of a building against a bright sky, then the 128 available offset values aren’t enough to cover the range 1:1. Instead the intermediate values have to jump multiple brightness levels, and that opens the door for artifacts.
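A small sketch shows how the error appears. This is an illustration of the block scheme as described in this article, not Sony’s actual codec: the function names are mine, and coarsening the step by powers of two until the range fits in 7 bits is an assumption I’m making for the demonstration.

```python
def encode_block(pixels):
    """Pack 16 pixel values into (min, max, positions, shift, 14 offsets)."""
    lo, hi = min(pixels), max(pixels)
    i_lo = pixels.index(lo)
    # In a completely flat block min == max, so pick a different slot for the max.
    i_hi = pixels.index(hi) if hi != lo else (i_lo + 1) % len(pixels)
    shift = 0
    while ((hi - lo) >> shift) > 127:  # coarsen until the range fits in 7 bits
        shift += 1
    offsets = [(p - lo) >> shift
               for i, p in enumerate(pixels) if i not in (i_lo, i_hi)]
    return lo, hi, i_lo, i_hi, shift, offsets

def decode_block(lo, hi, i_lo, i_hi, shift, offsets):
    """Rebuild the 16 pixels; only the min and max come back exactly."""
    remaining = iter(offsets)
    out = []
    for i in range(len(offsets) + 2):
        if i == i_lo:
            out.append(lo)
        elif i == i_hi:
            out.append(hi)
        else:
            out.append(lo + (next(remaining) << shift))
    return out

def max_error(pixels):
    decoded = decode_block(*encode_block(pixels))
    return max(abs(a - b) for a, b in zip(pixels, decoded))

smooth = list(range(1000, 1016))  # gentle gradient, range of 15
contrasty = [100, 103, 107, 111, 118, 125, 131, 140,
             2000, 1990, 1985, 1970, 150, 160, 170, 180]  # dark edge next to a highlight

print(max_error(smooth))     # range fits in 7 bits, so nothing is lost
print(max_error(contrasty))  # range of 1900 forces a coarse step
```

In the high-contrast block the offsets have to advance in steps of 16, so the subtle differences among the dark pixels get flattened. That is the kind of posterization that shows up along hard edges.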

This artifacting isn’t a problem for all images, which is probably why Sony thought it was fine to use. It’s only the ones where there are significant transitions between bright and dark that are problematic.

However, I find it hard to understand why there was such a pressing need to shave off those few extra bytes that this looked like a good idea.

Even if you fallaciously appeal to popularity and argue that the failure cases are relatively rare, they’re still there. Moreover, I’m not sure what the selling point really is. “Our raw files are smaller than Canon’s or Nikon’s” doesn’t really strike me as something that’s going to push anybody one way or another, and certainly not when you have to add “and sometimes have visual artifacts because of it”.

And it’s this artifacting that, for me, makes this approach a bad idea. Nikon’s “lossy” strategy was to throw away noise, which isn’t really losing anything of value and ultimately doesn’t compromise image quality in any kind of image. Sony’s approach, however, very much is lossy, in the sense that it throws away actual data in the hope that you just won’t care. Moreover, there’s a ton of research out there on how to do perceptually invisible lossy compression, all of which works better than what Sony came up with here.

I have to wonder if it wouldn’t have been better for Sony to simply pass a resulting 11-bit raw file through a standard lossless compression engine (e.g., zip or lzma) instead of what they did.

Canon’s Dual Pixel RAW – 👍

Canon’s dual pixel raw format, along with the tech behind it, is surprisingly interesting to me in the context of this discussion. It’s not so much that there’s some amazing design feature that thumps everything else. Rather, what intrigues me is the clever way Canon approached extending their existing cr2 raw format to add the capabilities.

The DPR raw files aren’t radical departures from the existing Canon cr2 format. In fact, they basically leverage everything that already existed in the cr2 format, and by extension the capabilities of the Canon Digic processors and raw conversion software. Canon certainly could have chosen to introduce a new raw format, much as they did with the cr3 format and the EOS M50.

So what does a DPR enabled cr2 do that’s so interesting?

For me the main point is how Canon added in capabilities without radically changing the cr2 format or creating a completely new one.

Up until Canon introduced DPR, the relationship between raw values and pixels had been 1:1. Canon’s dual pixel architecture changes this slightly. Each “pixel” is actually two sub-pixels, not just one. During normal operation, the camera combines the values of the two sub-pixels into a single value for the pixel.

When it comes to storing the dual pixel data, there are at least two viable approaches that could have been taken. One is to store the value for each sub-pixel in its own area in the file. In effect, you’d have a big array of left sub-pixels, and another big array of right sub-pixels.

Doing it that way would mean that raw processing software (even software that wasn’t doing fancy things with the dual pixel data, like Canon’s focus shifting) would have to be rewritten to recombine the sub-pixel data into the complete value of the actual pixel.

However, that’s not what Canon did. Instead of splitting out the sub-pixel data, the first block of raw data stored in a DPR file is the sum of the two subpixels (call this the A+B data). This, not coincidentally, is identical to the contents of the raw block in a non-DPR enabled cr2 file, or functionally a cr2 file from any other camera. The dual pixel data is then stored as a second block of only one of the sub pixel’s data (we’ll call this A data).

This approach has some interesting consequences. Backwards compatibility with existing software should require virtually no work after adding support for the 5D4 cr2 files. Any cr2 raw converter should be able to follow the existing offset information and extract the A+B data and ignore the DPR part and have an image.

The Canon implemented DP specific processing (e.g. focus and bokeh shifting) can be done by recovering the values of the B pixel data by subtracting the A values from the A+B data.
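As a minimal sketch of that subtraction, assuming the two blocks have already been decoded into plain lists of numbers (the sample values and the `recover_b` name are made up for illustration):

```python
def recover_b(sum_ab, a):
    """Recover the B sub-pixel values from the A+B block and the A block."""
    return [s - x for s, x in zip(sum_ab, a)]

sum_ab = [1200, 3050, 8000, 640]  # first raw block: what any cr2 reader sees
a_only = [610, 1500, 4100, 300]   # second raw block: one sub-pixel per pixel

b_only = recover_b(sum_ab, a_only)
print(b_only)
```

A converter that knows nothing about DPR simply reads `sum_ab` and never touches the second block.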

This is also where things get really interesting too.

Back in early 2018, the developers behind Fast Raw Viewer and RawDigger, LibRaw LLC, made headlines by claiming that you could recover a stop of highlights (increasing the dynamic range by a stop) by exploiting the properties of the way Canon did their DPR files. You can read about the details here and here, since I’m not going to go into great detail here.
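The basic intuition can be sketched in a few lines. To be clear, this is my own simplified illustration and not how DPRSplit or any real converter works: it assumes both sub-pixels receive roughly equal light, so the A value is about half of the sum, and it uses a hypothetical 14-bit clip level.

```python
CLIP = 16383  # hypothetical saturation level for 14-bit data

def recover_highlights(sum_ab, a, clip=CLIP):
    """Where the A+B value is clipped but A is not, estimate the true sum as 2 * A."""
    out = []
    for s, x in zip(sum_ab, a):
        if s >= clip and x < clip:
            out.append(2 * x)  # assumes the B sub-pixel saw the same light as A
        else:
            out.append(s)      # unclipped, or both clipped: nothing to gain
    return out

sum_ab = [8000, 16383, 16383]
a_only = [4000, 10000, 16383]
print(recover_highlights(sum_ab, a_only))
```

Because the A data sits roughly a stop below the combined data, it can still hold detail in areas where the combined value has already saturated.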

Ultimately, DPR files provide access to some interesting side effects while still fundamentally being able to be treated as bog-standard cr2 files if your software isn’t interested in that stuff. Canon’s Digital Photo Pro software can use the dual pixel data to shift the point of focus and the bokeh. And DPRSplit, and hopefully more direct implementations in third-party raw processors like Lightroom, stand to enable an additional stop of dynamic range to be extracted from Canon’s raw files.

The cost of DPR enabled files isn’t earth shattering either. A DPR file is certainly bigger than a non-DPR file. However, it’s not quite twice as big, or rather, not quite as big as two standard raw files. There’s an inherent level of redundancy that a bracketed set of images has that doesn’t exist in a DPR file; there’s only one copy of the metadata for the shot, and only one embedded JPEG preview.

In my limited testing (again remember Canon’s raw files are all losslessly compressed, so their size varies with ISO and image content) DPR enabled files are between 2 and 4 MB smaller than a bracketed set of standard raw files. That’s not a tremendous difference, but it does add up to an extra file every 30 frames or so.

Shooting DPR files also limits the frame rate on the 5D4 to 5 FPS and the buffer to approximately 7 frames. This is an appreciable drop from the 7 FPS and 17+ frames the camera is capable of when shooting regular raw files.

This too is a pro and con situation. In some respects DPR files can be thought of as a one shot replacement for a 2 frame 0 and -1Ev bracket set. Shooting a 2 frame bracket set drops the 5D4’s effective frame rate from 7 to 3.5 (since you need to shoot both frames for the bracket). So shooting DPR instead of bracketing like that nets a faster frame rate. The buffer situation may still be slightly better for shooting standard frames, but that would require extensive testing to verify.

However, the real killer feature of the DPR built in bracketing is the lack of a temporal discontinuity. The two sub frames are recorded at the same time instead of sequentially. This means that for moving subjects (animals, waves, trees on a windy day), the DPR exposure won’t have the same kind of HDR ghosting issues that a 2 frame bracket set would.

Canon clearly aimed to make their DPR files as similar to non-DPR files as possible. The base image is fundamentally the same information that’s stored in any other cr2 raw file. This means that it can reuse all the same code paths and hardware in the camera and in raw converters to get to at least a baseline image.

Then there’s the half sub-pixel data. One could make an argument that there are better ways, or at least ways that take up less space, to store it. On the other hand, doing what they did does enable us to recover another stop of dynamic range from the raw data, which is a nice win for the photographer.

Unlike previous sections, there’s no concrete takeaway here in a broader sense. Canon’s DPR files are an artifact of the dual-pixel AF tech; without that there’s no lesson to learn. Maybe the real takeaway here is how Canon’s engineers didn’t simply throw up their hands and say they needed to completely reinvent the wheel to implement DPR.

Canon Lossy Compressed Raw – 😕

Canon’s new lossy compressed raw supported by the new cr3 format is, well right now it’s a largely open question to me as far as what’s going on. The file size savings is significant, with lossy raw files coming in 30-40% smaller than Canon’s normal lossless compression. However, I’m having a hard time finding how Canon actually went about it.

If we were to look at Nikon’s lossy and lossless compression methods, their already excellently designed lossy format is nowhere near 30% smaller than a losslessly compressed file. Moreover, Canon’s RAW files aren’t that far away from Nikon’s when looking at comparative compression ratios. (E.g., a 36MP D810 lossless file is ~41MB, and a 30MP 5D4 lossless file is 36.8MB; ~16% less resolution, ~10% smaller files.)

Just going by those numbers, Canon probably isn’t just adopting Nikon’s photon statistics modeling and throwing out shot noise. I’ve seen some discussions on various open-source raw conversion tools that the compression is in some ways similar to JPEG, but not exactly either.

The short of it, for me at least, is that right now it’s too early to tell. Looking at it from a pure image quality perspective, the few tests I’ve seen done so far don’t show an appreciable degradation in quality. DPReview seems to think that the dynamic range may be slightly compromised compared to standard RAW, but they didn’t quantify it^[4].

By the same token, without a better understanding of what’s going on, it’s hard for me to really weigh in. When it comes to storage formats, I tend towards being conservative. I can accept “losses” for compression when I understand the implications and know that I’m not actually giving up something potentially useful. On the other hand, once you throw data away, you can’t recover it.

Conclusions

My intent here wasn’t to fully explain the ins and outs of every raw format out there. Rather, I wanted to look at some of the bigger picture details of various raw formats and see if I could glean some of the engineering decisions that went into them.

Clearly there are varying underlying decision processes going on in the way various companies have approached their raw files. Canon’s DPR format is especially telling: it looks like it was designed both to add new features and to maintain a reasonably straightforward path for backward compatibility.

[1] https://web.archive.org/web/20150103013811/http://theory.uchicago.edu/~ejm/pix/20d/tests/noise/noise-p3.html#NEFcompression
[2] In practice it’s subtly more complicated than that, and I believe some noise is left in to provide some dithering to the stored image data. But the approximation is good enough for this discussion.
[3] https://www.rawdigger.com/howtouse/sony-craw-arw2-posterization-detection
[4] https://www.dpreview.com/articles/0483301097/quick-look-canons-new-compressed-raw-format
