JPEG Optimizations

Adam C. Clifton
15 Feb 2022

While working on Number Duck Next, a big new update to the library, I noticed that when inserting pictures we fully load and decode them to get the pixel data, but never use it. All we need is the width, height and raw file data.

Since these values are not too difficult to read directly from the file, I've been able to remove the need for the libjpeg library. This not only saves us over a megabyte of source code that needs compiling, it also reduces the number of licences the end user has to include in their own project.

The main thing to keep in mind when manually reading a JPEG file is that it is made up of many small chunks, so we just need to loop through until we find the ones we want.

First, there is a file header of two bytes: 0xFF followed by 0xD8.
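
As a small illustration, here is how that check might look in Python. Every JPEG file opens with the start-of-image marker 0xFF 0xD8; the names `JPEG_SOI` and `looks_like_jpeg` are made up for this sketch and are not part of Number Duck.

```python
JPEG_SOI = b"\xff\xd8"  # start-of-image marker that opens every JPEG file


def looks_like_jpeg(data: bytes) -> bool:
    """Return True if the buffer begins with the two-byte JPEG file header."""
    return data[:2] == JPEG_SOI
```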

Then we can start looping through the chunks. Each one starts with the same 4 byte header pattern:
One byte, always 0xFF
One byte, the chunk type
Two bytes, the chunk size (this counts the two size bytes themselves, but not the first two bytes for 0xFF and the chunk type).

The two bytes for size are stored most significant byte first (big-endian), which may not match the endianness of your system, so it's best to manually combine them like so: (s[0] << 8) | s[1]
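
A rough Python sketch of reading one chunk header this way might look like the following. The helper names `be16` and `read_chunk_header` are hypothetical, chosen only for illustration.

```python
def be16(s: bytes) -> int:
    """Combine two bytes, most significant first, into one integer."""
    return (s[0] << 8) | s[1]


def read_chunk_header(data: bytes, pos: int):
    """Return (chunk_type, size) for the 4-byte chunk header starting at pos."""
    if data[pos] != 0xFF:
        raise ValueError("expected 0xFF at the start of a chunk")
    chunk_type = data[pos + 1]
    size = be16(data[pos + 2:pos + 4])
    return chunk_type, size
```

Because the bytes are shifted and combined by hand, the result is the same on little-endian and big-endian machines.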

So we can loop through these chunks, skipping ahead by the computed size (measured from the start of the two size bytes) to get to the next chunk.
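
The skipping itself could be sketched in Python roughly as below. The generator name `iter_chunks` is hypothetical, and this version only shows the stepping logic: it stops when the data runs out or the next byte is not 0xFF, and makes no attempt to deal with the compressed image data later in the file.

```python
def iter_chunks(data: bytes):
    """Yield (chunk_type, body_offset, size) for each chunk after the file header."""
    pos = 2  # skip the two-byte file header
    while pos + 4 <= len(data) and data[pos] == 0xFF:
        chunk_type = data[pos + 1]
        size = (data[pos + 2] << 8) | data[pos + 3]
        yield chunk_type, pos + 4, size
        # The size counts its own two bytes but not the 0xFF and type bytes,
        # so the next chunk starts 2 + size bytes further on.
        pos += 2 + size
```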

There are two types of chunks we are interested in: SOF0 and SOF2, which have the chunk types 0xC0 and 0xC2 respectively.

Both these chunks have the same initial format:
One byte, the sample precision of the image data (bits per colour channel, usually 8)
Two bytes, the height of the image
Two bytes, the width of the image

Height and width are each combined from their two bytes in the same way as the size.
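
Decoding those first five bytes might look like this in Python. `FrameInfo` and `parse_sof` are hypothetical names for this sketch, and `body` is assumed to be the chunk contents that follow the 4 byte chunk header.

```python
from typing import NamedTuple


class FrameInfo(NamedTuple):
    precision: int  # first byte of the chunk body
    height: int
    width: int


def parse_sof(body: bytes) -> FrameInfo:
    """Decode the first five bytes of an SOF0 or SOF2 chunk body."""
    if len(body) < 5:
        raise ValueError("SOF chunk body too short")
    precision = body[0]
    height = (body[1] << 8) | body[2]  # height comes before width
    width = (body[3] << 8) | body[4]
    return FrameInfo(precision, height, width)
```

For example, the bytes 08 01 E0 02 80 decode to a precision of 8, a height of 480 and a width of 640.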

Now that we have what we need, we are done and don't need to scan through any more chunks. Note that there are other formats not yet supported, so if you see a chunk with type 0xDA, you've gone too far and should quit out. That chunk is followed by the compressed image data, which is not counted in its size, so if you try to read the size and step ahead from there, you will end up in a bad part of the file.
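
Putting the steps together, a minimal Python sketch of the whole scan could look like the following. This is not the Number Duck implementation; the function name `read_jpeg_size` and the choice to return `None` on any failure are assumptions made for the example.

```python
SOF0, SOF2, SOS = 0xC0, 0xC2, 0xDA


def read_jpeg_size(data: bytes):
    """Return (width, height), or None if no SOF0/SOF2 chunk is found."""
    if data[:2] != b"\xff\xd8":
        return None  # missing the two-byte file header
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            return None  # lost track of the chunk structure
        chunk_type = data[pos + 1]
        if chunk_type == SOS:
            return None  # gone too far: compressed image data follows
        size = (data[pos + 2] << 8) | data[pos + 3]
        if chunk_type in (SOF0, SOF2):
            if pos + 9 > len(data):
                return None  # file is cut off inside the chunk
            height = (data[pos + 5] << 8) | data[pos + 6]
            width = (data[pos + 7] << 8) | data[pos + 8]
            return width, height
        if size < 2:
            return None  # a valid size always counts its own two bytes
        pos += 2 + size
    return None
```

Given the full contents of a file read in binary mode, this returns the dimensions without decoding any pixel data.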
