The Science of Raster to Vector Conversion

Why JPEG Files are Bad for Raster to Vector Conversion

Raw, lossless files are sizable, which becomes a problem when archiving because they need a lot of storage space. Now, picture yourself in the late 20th century, in a world with slow internet, small-capacity storage devices, and slow computers. It’s unfathomable, isn’t it? But this was life for technology enthusiasts then. The myriad problems they faced called for measures to simplify the use of technology and the internet. One vital development that came out of the research of that time was image compression and its allied file formats.

JPEG and Graphics Interchange Format (GIF) happen to be two of the most popular compressed image formats that were created out of necessity. JPEG is commonly used to save photographs/images, while GIF is utilized to save animations and images that feature line art or simple shapes. JPEG and GIF are relatively old formats – the former was created in 1992 while the latter was first released in 1987.

Each had its shortcomings. JPEG, for instance, was – and still is – a lossy compression format. On the other hand, patent issues made GIF’s usage problematic. These problems prompted the creation of the Portable Network Graphics (PNG) lossless file format in 1995 as a solution. Since then, the popularity of PNG has grown in leaps and bounds, to the extent that in 2013, PNG was being used on 62.4% of all websites compared to GIF’s 62.3%.

This article isn’t about PNG or GIF, though. It’s about JPEG, whose lossy compression was one of the shortcomings that PNG was created to address. The issues that plagued JPEG’s usage in the 1990s still exist today, and they make it a bad file format for raster to vector conversion. Here’s why and how.


History of JPEG

JPEG is an eponymous name, given that the file format was named after its developers, the Joint Photographic Experts Group. Although the file format became a formal standard in 1992, its history didn’t start then. Its story began in 1972 with Nasir Ahmed’s proposal of the discrete cosine transform (DCT), the technique that forms the backbone of JPEG’s lossy compression.

In 1983, individuals working with the International Organization for Standardization (ISO) started looking for ways of adding photograph-quality graphics to computer terminal screens. At that time, these terminal screens only displayed text. Through the engagements of these ISO researchers, the Joint Photographic Experts Group was formed in 1986.

The committee worked on the JPEG standard throughout the late 1980s. Their work partly entailed selecting the compression technique for the images. DCT was chosen because it was the most efficient compression technique at that time that could practically be applied to photographs. The group’s work culminated in the publication of the JPEG standard in 1992.

What is JPEG?

JPEG is a standard that defines the following:

  • How an image is compressed
  • How the compressed data is encoded into a stream of bytes
  • How that stream is decoded back into an image for viewing or editing

JPEG’s compression made the format reign supreme when the internet was still a new phenomenon. At that time, the internet was much slower than it is today. Additionally, moving files physically or even archiving them relied on floppy disks, whose capacity was limited.

Thus, JPEG’s compression enabled – and still enables – the reduction of large, complex images into small files. Far too small, in fact. For instance, with JPEG, you can reduce a file to about 5% of its original size. Compression that aggressive discards detail, which has made the format unappealing for situations that require an image’s detail and quality to be maintained.

JPEG’s Compression

While the mention of JPEG’s compression brings the discrete cosine transform (DCT) algorithm into the picture, the process – known as encoding – is much more elaborate. It includes multiple steps that combine different compression techniques, namely:

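  • Chroma subsampling or downsampling
  • The DCT followed by quantization, which divides the DCT coefficients by the entries of a quantization matrix and rounds the results
  • Entropy coding, a lossless final step that packs the quantized values into fewer bits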
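
The middle step is the easiest one to see on a single block. What follows is a minimal Python sketch, not a reference encoder: it runs a textbook 8×8 DCT on a smooth gradient and divides the result by the example luminance quantization table printed in Annex K of the JPEG standard. The function names `dct_2d` and `quantize` and the sample block are assumptions made for illustration.

```python
import math

N = 8  # baseline JPEG works on 8x8 pixel blocks

# Example luminance quantization table from Annex K of the JPEG standard.
LUMA_TABLE = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def dct_2d(block):
    """Textbook (slow) 2D DCT-II of an 8x8 block with orthonormal scaling."""
    def scale(k):
        return math.sqrt((1 if k == 0 else 2) / N)

    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            coeffs[u][v] = scale(u) * scale(v) * total
    return coeffs


def quantize(coeffs, table):
    """Divide each coefficient by its table entry and round: the lossy step."""
    return [[round(coeffs[u][v] / table[u][v]) for v in range(N)]
            for u in range(N)]


# Hypothetical sample: a smooth left-to-right gradient, shifted down by 128
# the way JPEG centers pixel values before the transform.
block = [[(100 + 4 * y) - 128 for y in range(N)] for _ in range(N)]

quantized = quantize(dct_2d(block), LUMA_TABLE)
nonzero = sum(1 for row in quantized for q in row if q != 0)
print(f"{nonzero} of {N * N} quantized coefficients are non-zero")
```

Most of the 64 values round to zero for a smooth block like this one, and long runs of zeros are exactly what the entropy coding step compresses well.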

Lossless JPEG

Lossless JPEG is a separate mode that was added to the standard later, in 1993. Instead of the DCT, it uses a predictive technique that eliminates the lossy compression for which JPEG was known. It shouldn’t be confused with entropy coding, which is lossless on its own but runs after the lossy steps have already discarded information. However, despite preserving every pixel, lossless JPEG wasn’t widely adopted, and it remains far less popular than its older brother, even now.

As such, the disadvantages that plagued JPEG at the beginning, and that compelled its developers to create a lossless version, still plague it today. Indeed, not much has changed despite there being a vehicle to bring about change. Thus, in this article, “JPEG’s compression” means lossy compression. It’s this lack of migration from lossy compression to the lossless version that makes JPEG bad for vector conversion.

Lossy JPEG Compression

Chroma subsampling, the discrete cosine transform (DCT), and quantization did an exemplary job during the early stages of JPEG’s deployment and usage. They reduced file sizes as expected. DCT, on which several encoding steps were anchored, proved to be as efficient and practical as the eponymous group had judged it to be.

For one, web developers could reduce images to very small sizes and subsequently populate their websites with them. The result would be a much more appealing site that would load relatively fast, considering the number of images therein. Further, given that devices at that time weren’t as fast as today, they’d experience bottlenecks when reading large files. But not with JPEG in the picture; JPEG’s compression dealt with this problem effectively, making even the slower devices able to read images.

However, perhaps JPEG was too practical and efficient: by reducing file sizes, its compression also reduced the quality of certain aspects of the photograph or image. Because lossy compression is still so widely used, this problem persists, and it continues to make JPEG images bad for vector conversion and even printing.

How JPEG’s lossy compression works

JPEG’s compression works on the biological premise that the human eye is far more sensitive to differences in brightness than to differences in hue and saturation. The compression, therefore, separates brightness (luma) from color and targets the two color components, the blue-difference and red-difference (chroma) channels. It reduces their spatial resolution, typically by half, in a process called chroma subsampling or downsampling, which is described using ratios such as 4:2:2 and 4:2:0. In doing so, JPEG simplifies the original image.
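
To make the idea concrete, here is a small Python sketch of chroma subsampling under stated assumptions: it uses the JFIF color conversion formulas and averages every 2×2 neighborhood of the color channels (the 4:2:0 layout). Real encoders may filter differently, and the helper names and the 4×4 test image are invented for this example.

```python
def rgb_to_ycbcr(r, g, b):
    """JFIF full-range RGB -> YCbCr conversion for one pixel."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def subsample_420(plane):
    """Average each 2x2 neighborhood, halving both width and height."""
    height, width = len(plane), len(plane[0])
    return [
        [
            (plane[i][j] + plane[i][j + 1]
             + plane[i + 1][j] + plane[i + 1][j + 1]) / 4
            for j in range(0, width, 2)
        ]
        for i in range(0, height, 2)
    ]


# Hypothetical 4x4 image: a one-pixel-wide red line on white paper.
WHITE, RED = (255, 255, 255), (255, 0, 0)
image = [[RED if x == 1 else WHITE for x in range(4)] for _ in range(4)]

ycbcr = [[rgb_to_ycbcr(*pixel) for pixel in row] for row in image]
luma = [[pixel[0] for pixel in row] for row in ycbcr]          # kept at full size
cb = subsample_420([[pixel[1] for pixel in row] for row in ycbcr])
cr = subsample_420([[pixel[2] for pixel in row] for row in ycbcr])

samples_before = 3 * 4 * 4
samples_after = 4 * 4 + len(cb) * len(cb[0]) + len(cr) * len(cr[0])
print(samples_before, "->", samples_after, "samples")
print("Cr of the 2x2 cell holding the line:", cr[0][0])
print("Cr of a pure white 2x2 cell:", cr[0][1])
```

Half of the samples are gone before the DCT even runs. Notice, too, that the cell containing the line ends up with a color value between red and white, so the line’s color is smeared onto the white pixel next to it.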

JPEG also reduces the file size by eliminating three types of redundancy, namely coding, interpixel, and psychovisual redundancy. Removing the first two is reversible, but removing the third is not, because it discards detail that the eye is unlikely to notice. The technicalities of how the compression eliminates each of these redundancies don’t fall within the purview of this text. But they make for an interesting read if you have a knack for mathematical processes and formulae.

Combined, the various size-reduction avenues result in a much smaller file than the original. Even so, the level of compression varies with the image and its characteristics.

Furthermore, image editing or compression software will give you the flexibility to choose the quality you desire. The higher the quality you need, the less the compression. But regardless of the quality you’ve chosen, the compression still targets the same image components, and these aspects, therefore, suffer. The resulting reduced quality makes JPEG bad for raster to vector conversion.
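
One way to see why the quality setting changes the amount of damage but not its nature is to look at how a quality number turns into quantization divisors. The Python sketch below follows the scaling convention used by the widely deployed libjpeg library. Other encoders map quality differently, so treat it as an illustration. The name `scale_table` is hypothetical, and only the first row of the Annex K example luminance table is used to keep things short.

```python
# First row of the example luminance quantization table (JPEG standard, Annex K).
BASE_ROW = [16, 11, 10, 16, 24, 40, 51, 61]


def scale_table(base, quality):
    """Scale quantization divisors from a 1-100 quality value (libjpeg convention)."""
    quality = max(1, min(100, quality))
    scale = 5000 // quality if quality < 50 else 200 - 2 * quality
    # Larger divisors mean coarser rounding; baseline JPEG limits them to 1..255.
    return [max(1, min(255, (q * scale + 50) // 100)) for q in base]


for quality in (10, 50, 90, 100):
    print(quality, scale_table(BASE_ROW, quality))
```

Under this convention, every divisor becomes 1 only at quality 100. At quality 90, the divisors are smaller than at quality 50, but they’re still greater than 1, so the same coefficients are still being rounded.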

Why JPEG is bad for raster to vector conversion

The assumption throughout this article is that you intend to convert a technical drawing. And technical drawings are mainly made up of linework and text, two aspects that JPEG’s compression greatly impacts.


Compression Artifacts

JPEG’s compression causes two compression artifacts: blocky images and shadows around the outlines of hard solids (lines and text). Vectorizing a blocky image has the same result as vectorizing a pixelated raster image: in both cases, the accuracy of the vectorization suffers.

Blockiness in a JPEG Image

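But I should point out that blocky and pixelated images aren’t the same. How so? Blockiness is a product of the compression: JPEG processes an image in 8×8 pixel blocks, and under heavy compression each block is flattened toward a single color, so the boundaries between neighboring blocks become visible. Pixelation, in comparison, is a product of low resolution: every pixel is enlarged into a visible square, and the squares are the same size throughout. It may, therefore, be accurate to say that pixelated images are blocky but blocky images aren’t pixelated.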

On the other hand, hard solids, i.e., lines and text, usually acquire visible shadows that look like clouds around their outlines. Such components are significantly affected because they’re made up of sharp edges, and sharp edges depend on exactly the fine detail that the compression discards.

Linework surrounded by clouds/rings in a JPEG file
Ideal linework for vectorization
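
The clouds around linework can be reproduced on a single scanline. The Python sketch below is a simplified, one-dimensional stand-in for what happens inside a JPEG block: it transforms eight pixels that cross a hard black-to-white edge, rounds the coefficients with a deliberately coarse uniform step, and transforms them back. The step size of 100 and the function names are assumptions chosen to make the effect obvious. A real encoder works in two dimensions with a full quantization table.

```python
import math

N = 8


def dct_1d(samples):
    """Orthonormal 1D DCT-II of eight samples."""
    return [
        math.sqrt((1 if k == 0 else 2) / N)
        * sum(s * math.cos((2 * n + 1) * k * math.pi / (2 * N))
              for n, s in enumerate(samples))
        for k in range(N)
    ]


def idct_1d(coeffs):
    """Inverse of dct_1d (a DCT-III with matching scaling)."""
    return [
        sum(math.sqrt((1 if k == 0 else 2) / N) * c
            * math.cos((2 * n + 1) * k * math.pi / (2 * N))
            for k, c in enumerate(coeffs))
        for n in range(N)
    ]


STEP = 100  # hypothetical, deliberately coarse quantization step

# One scanline crossing a hard edge: black linework (0) next to white paper (255).
edge = [0, 0, 0, 0, 255, 255, 255, 255]

quantized = [round(c / STEP) * STEP for c in dct_1d(edge)]
pixels = [max(0, min(255, round(v))) for v in idct_1d(quantized)]

print("original:", edge)
print("restored:", pixels)
print("distinct gray levels:", len(set(edge)), "->", len(set(pixels)))
```

The original scanline contains two gray levels, pure black and pure white. The restored one contains several in-between grays on both sides of the edge, which is the halo a vectorizer then has to interpret.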

Raster images and vector images differ greatly. One of the differences that makes JPEG images bad for vector conversions is that, while raster images can represent shadows and shade effectively, vector images cannot. Instead, the vectorization software has to create a new object and assign it a color for the shadow representation to be accurately portrayed.

As such, where lines or text are surrounded by rings that resemble shadows, the vectorization software will have to create a new object for them. Do you see where this is going? Ultimately, your vector image won’t be a faithful replica of the original drawing. As a result, the raster to vector conversion becomes inaccurate.

Furthermore, if the text and lines are close together, creating new objects may even be too problematic to complete successfully because the objects are bound to overlap. An error message is the likely outcome in such a case.

Inaccurate Vectorization

In this regard, the compression artifacts cause inaccurate vectorization. The inaccuracies have trickle-down effects since they’ll force you to start the process from the very beginning, i.e., choosing new scanner settings, scanning your document anew, saving it under a lossless file format, and then converting.

If you’re not looking to scan your document afresh, perhaps because it was archived a long time ago, then your only recourse is cleaning the raster file. However, some raster files can never be cleaned well enough to be converted successfully into vector images.

As if that’s not enough, most of JPEG’s compression is irreversible. In essence, saving a scan in the JPEG file format is damaging because converting the JPEG image to a higher-quality file format afterward can’t bring back the detail that was discarded.
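
The reason is that the rounding at the heart of the compression maps many different inputs to one output. Here’s a tiny Python sketch of that idea, using a made-up step size of 20 rather than any real JPEG table:

```python
STEP = 20  # hypothetical quantization step


def lossy_round_trip(value, step=STEP):
    """Quantize then dequantize: only multiples of `step` survive."""
    return round(value / step) * step


originals = list(range(91, 110))
restored = {lossy_round_trip(v) for v in originals}
print(f"{len(originals)} different inputs -> {sorted(restored)}")
```

Once nineteen different values have all become 100, no later conversion can tell which one was there to begin with.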

Best file format for vectorization

The additional work and accompanying frustrations aren’t worth it. With that in mind, save your file as TIFF, as it’s the best file format for vectorization. TIFF can compress your file, but when it’s saved with a lossless compression option such as LZW, your image won’t lose any of the information that vectorization depends on. After all, the compression is lossless.
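
If you’d like to see what lossless means in practice, the Python sketch below compresses a synthetic scan and checks that every byte comes back. It uses Deflate from Python’s standard `zlib` module as a stand-in, since Deflate is one of the lossless schemes a TIFF file can use. It doesn’t write an actual TIFF file, and the scan data is invented for the example.

```python
import zlib

# Hypothetical scan: 200 rows of white paper (255) with a black pixel (0)
# every 40 columns, stored as one byte per pixel.
row = bytes(0 if x % 40 == 0 else 255 for x in range(400))
scan = row * 200

packed = zlib.compress(scan, 9)      # 9 = strongest compression level
unpacked = zlib.decompress(packed)

print(len(scan), "bytes ->", len(packed), "bytes")
print("identical after round trip:", unpacked == scan)
```

The file gets much smaller because line drawings are highly repetitive, yet the black pixels stay black and the white pixels stay white, with nothing in between.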

Alternatively, you could save your file as a PDF but not before choosing TIFF as the compression method in case your PDF file is too large.

Whether you’re dealing with a photograph or scanned technical drawing, JPEG’s lossy compression is an inescapable trap if you choose to save your file in this format anyway. For photography, it may be ideal if you’re only looking to reduce the original file size, without much regard for the quality.

However, storing scanned technical drawings as JPEG files is tantamount to shooting yourself in the foot. You might not see the issue now, but the truth might dawn on you later on, when it’s too late, perhaps while battling the frustrations of unsuccessful vectorizations. In short, the JPEG file format is bad for raster to vector conversion.