In this article, you will read about how to do JavaScript implementation of the so-called Seam Carving algorithm for the content-aware image resizing and objects removal. A dynamic programming approach is applied to optimize the resizing time.
TL;DR
There are many great articles written about the Seam Carving algorithm already, but I couldn't resist the temptation to explore this elegant, powerful, and yet simple algorithm on my own, and to write about my personal experience with it. Another point that drew my attention was the fact that the Dynamic Programming (DP) approach might be smoothly applied to solve it. And, if you're like me and still on your "learning algorithms" journey, this algorithmic solution may enrich your personal DP arsenal.
So, with this article, I want to do three things:
- Provide you with an interactive content-aware resizer so that you could play around with resizing your own images
- Explain the idea behind the Seam Carving algorithm
- Explain the dynamic programming approach to implement the algorithm (we'll be using TypeScript for it)
Content-Aware Image Resizing
Content-aware image resizing might be applied when it comes to changing the image proportions (i.e., reducing the width while keeping the height) and when losing some parts of the image is not desirable. Doing the straightforward image scaling, in this case, would distort the objects in it. To preserve the proportions of the objects while changing the image proportions, we may use the Seam Carving algorithm that was introduced by Shai Avidan and Ariel Shamir.
The example below shows how the original image width was reduced by 50% using content-aware resizing (left image) and straightforward scaling (right image). In this particular case, the left image looks more natural since the proportions of the balloons were preserved.
The Seam Carving algorithm's idea is to find the seam (continuous sequence of pixels) with the lowest contribution to the image content and then carve (remove) it. This process repeats over and over again until we get the required image width or height. In the example below, you may see that the hot air balloon pixels contribute more to the content of the image than the sky pixels. Thus, the sky pixels are being removed first.
Finding the seam with the lowest energy is a computationally expensive task (especially for large images). To make the seam search faster, the dynamic programming approach might be applied (we will go through the implementation details below).
Objects Removal
The importance of each pixel (so-called pixel's energy) is being calculated based on its color (R
, G
, B
, A
) difference between two neighbor pixels. Now, if we set the pixel energy to some really low level artificially (i.e., by drawing a mask on top of them), the Seam Carving algorithm would perform an object removal for us for free.
JS IMAGE CARVER Demo
I've created the JS IMAGE CARVER web app that you may use to play around with resizing your custom images.
More Examples
Here are some more examples of how the algorithm copes with more complex backgrounds.
Mountains in the background are being shrunk smoothly without visible seams.
The same goes for the ocean waves. The algorithm preserved the wave structure without distorting the surfers.
We need to keep in mind that the Seam Carving algorithm is not a silver bullet, and it may fail to resize the images where most of the pixels are edges (look important to the algorithm). In this case, it starts distorting even the important parts of the image. In the example below, the content-aware image resizing looks pretty similar to a straightforward scaling since for the algorithm all the pixels look important, and it is hard for it to distinguish Van Gogh's face from the background.
How Seam Carving Algorithms Works
Imagine we have a 1000 x 500 px
picture, and we want to change its size to 500 x 500 px
to make it square (let's say the square ratio would better fit the Instagram feed). We might want to set up several requirements to the resizing process in this case:
- Preserve the important parts of the image (i.e., if there were 5 trees before the resizing, we want to have 5 trees after resizing as well).
- Preserve the proportions of the important parts of the image (i.e., circle car wheels should not be squeezed to the ellipse wheels)
To avoid changing the important parts of the image, we may find the continuous sequence of pixels (the seam), that goes from top to bottom and has the lowest contribution to the content of the image (avoids important parts) and then remove it. The seam removal will shrink the image by 1 pixel. We will then repeat this step until the image will get the desired width.
The question is how to define the importance of the pixel and its contribution to the content (in the original paper the authors are using the term energy of the pixel). One of the ways to do it is to treat all the pixels that form the edges as important ones. In case a pixel is a part of the edge, its color would have a greater difference between the neighbors (left and right pixels) than the pixel that isn't a part of the edge.
Assuming that the color of a pixel is represented by 4 numbers (R
- red, G
- green, B
- blue, A
- alpha), we may use the following formula to calculate the color difference (the pixel energy):
Where:
mEnergy
- Energy (importance) of the middle pixel ([0..626]
if rounded) lR
- Red channel value for the left pixel ([0..255]
) mR
- Red channel value for the middle pixel ([0..255]
) rR
- Red channel value for the right pixel ([0..255]
) lG
- Green channel value for the left pixel ([0..255]
) - and so on...
In the formula above, we're omitting the alpha (transparency) channel, for now, assuming that there are no transparent pixels in the image. Later, we will use the alpha channel for masking and for object removal.
Now, since we know how to find the energy of one pixel, we can calculate, so-called, energy map which will contain the energies of each pixel of the image. On each resizing step, the energy map should be re-calculated (at least partially, more about it below) and would have the same size as the image.
For example, on the 1st resizing step, we will have a 1000 x 500
image and a 1000 x 500
energy map. On the 2nd resizing step, we will remove the seam from the image and re-calculate the energy map based on the new shrunk image. Thus, we will get a 999 x 500
image and a 999 x 500
energy map.
The higher the energy of the pixel, the more likely it is a part of an edge, and it is important for the image content and the less likely that we need to remove it.
To visualize the energy map, we may assign a brighter color to the pixels with the higher energy and darker colors to the pixels with the lower energy. Here is an artificial example of how the random part of the energy map might look like. You may see the bright line which represents the edge and which we want to preserve during the resizing.
Here is a real example of the energy map for the demo image you saw above (with hot air balloons).
We may use the energy map to find the seams (one after another) with the lowest energy and by doing this to decide which pixels should be ultimately deleted.
Finding the seam with the lowest energy is not a trivial task and requires exploring many possible pixel combinations before making the decision. We will apply the dynamic programming approach to speed it up.
In the example below, you may see the energy map with the first lowest energy seam that was found for it.
In the examples above, we were reducing the width of the image. A similar approach may be taken to reduce the image height. We need to "rotate" the approach though:
- start using top and bottom pixel neighbors (instead of left and right ones) to calculate the pixel energy
- when searching for a seam, we need to move from left to right (instead of from up to bottom)
Implementation in TypeScript
To implement the algorithm, we will be using TypeScript. If you want a JavaScript version, you may ignore (remove) type definitions and their usages.
For simplicity reasons, let's implement the seam carving algorithm only for the image width reduction.
Content-Aware Width Resizing (The Entry Function)
First, let's define some common types that we're going to use while implementing the algorithm.
type ImageSize = { w: number, h: number };
type Coordinate = { x: number, y: number };
type Seam = Coordinate[];
type EnergyMap = number[][];
type Color = [
r: number,
g: number,
b: number,
a: number,
] | Uint8ClampedArray;
On the high level, the algorithm consists of the following steps:
- Calculate the energy map for the current version of the image.
- Find the seam with the lowest energy based on the energy map (this is where we will apply Dynamic Programming).
- Delete the seam with the lowest energy seam from the image.
- Repeat until the image width is reduced to the desired value.
type ResizeImageWidthArgs = {
img: ImageData,
toWidth: number,
};
type ResizeImageWidthResult = {
img: ImageData,
size: ImageSize,
};
export const resizeImageWidth = (
{ img, toWidth }: ResizeImageWidthArgs,
): ResizeImageWidthResult => {
const size: ImageSize = { w: img.width, h: img.height };
const pxToRemove = img.width - toWidth;
if (pxToRemove < 0) {
throw new Error('Upsizing is not supported for now');
}
let energyMap: EnergyMap | null = null;
let seam: Seam | null = null;
for (let i = 0; i < pxToRemove; i += 1) {
energyMap = calculateEnergyMap(img, size);
seam = findLowEnergySeam(energyMap, size);
deleteSeam(img, seam, size);
size.w -= 1;
}
return { img, size };
};
The image that needs to be resized is being passed to the function in ImageData format. You may draw the image on the canvas and then extract the ImageData
from the canvas
like this:
const ctx = canvas.getContext('2d');
const imgData = ctx.getImageData(0, 0, imgWidth, imgHeight);
Let's break down each step only be one and implement the calculateEnergyMap()
, findLowEnergySeam()
and deleteSeam()
functions.
Calculating the Pixel's Energy
Here, we apply the color difference formula described above. For the left and right borders (when there are no left or right neighbors), we ignore the neighbors and don't take them into account during the energy calculation.
const getPixelEnergy = (left: Color | null, middle: Color, right: Color | null): number => {
const [mR, mG, mB] = middle;
let lEnergy = 0;
if (left) {
const [lR, lG, lB] = left;
lEnergy = (lR - mR) ** 2 + (lG - mG) ** 2 + (lB - mB) ** 2;
}
let rEnergy = 0;
if (right) {
const [rR, rG, rB] = right;
rEnergy = (rR - mR) ** 2 + (rG - mG) ** 2 + (rB - mB) ** 2;
}
return Math.sqrt(lEnergy + rEnergy);
};
Calculating the Energy Map
The image we're working with has the ImageData format. It means that all the pixels (and their colors) are stored in a flat (1D) Uint8ClampedArray array. For readability purposes, let's introduce a couple of helper functions that will allow us to work with the Uint8ClampedArray
array as with a 2D matrix instead.
const getPixel = (img: ImageData, { x, y }: Coordinate): Color => {
const i = y * img.width + x;
const cellsPerColor = 4;
return img.data.subarray(i * cellsPerColor, i * cellsPerColor + cellsPerColor);
};
const setPixel = (img: ImageData, { x, y }: Coordinate, color: Color): void => {
const i = y * img.width + x;
const cellsPerColor = 4;
img.data.set(color, i * cellsPerColor);
};
To calculate the energy map, we go through each image pixel and call the previously described getPixelEnergy()
function against it.
const matrix = <T>(w: number, h: number, filler: T): T[][] => {
return new Array(h)
.fill(null)
.map(() => {
return new Array(w).fill(filler);
});
};
const calculateEnergyMap = (img: ImageData, { w, h }: ImageSize): EnergyMap => {
const energyMap: number[][] = matrix<number>(w, h, Infinity);
for (let y = 0; y < h; y += 1) {
for (let x = 0; x < w; x += 1) {
const left = (x - 1) >= 0 ? getPixel(img, { x: x - 1, y }) : null;
const middle = getPixel(img, { x, y });
const right = (x + 1) < w ? getPixel(img, { x: x + 1, y }) : null;
energyMap[y][x] = getPixelEnergy(left, middle, right);
}
}
return energyMap;
};
The energy map is going to be recalculated on every resize iteration. It means that it will be recalculated, let's say, 500 times if we need to shrink the image by 500 pixels which is not optimal. To speed up the energy map calculation on the 2nd, 3rd, and further steps, we may re-calculate the energy only for those pixels that are placed around the seam that is going to be removed.
Finding the Seam With the Lowest Energy (Dynamic Programming Approach)
The issue we need to solve now is to find the path (the seam) on the energy map that goes from top to bottom and has the minimum sum of pixel energies.
The Naive Approach
The naive approach would be to check all possible paths one after another.
Going from top to bottom, for each pixel, we have 3 options (↙︎ go down-left, ↓ go down, ↘︎ go down-right). This gives us the time complexity of O(w * 3^h)
or simply O(3^h)
, where w
and h
are the width and the height of the image. This approach looks slow.
The Greedy Approach
We may also try to choose the next pixel as a pixel with the lowest energy, hoping that the resulting seam energy will be the smallest one.
This approach gives not the worst solution, but it cannot guarantee that we will find the best available solution. On the image above, you may see how the greedy approach chose 5
instead of 10
at first and missed the chain of optimal pixels.
The good part about this approach is that it is fast, and it has a time complexity of O(w + h)
, where w
and h
are the width and the height of the image. In this case, the cost of the speed is the low quality of resizing. We need to find a minimum value in the first row (traversing w
cells) and then we explore only 3 neighbor pixels for each row (traversing h
rows).
The Dynamic Programming Approach
You might have noticed that in the naive approach, we summed up the same pixel energies over and over again while calculating the resulting seams' energy.
In the example above, you see that for the first two seams, we are re-using the energy of the shorter seam (which has the energy of 235
). Instead of doing just one operation 235 + 70
to calculate the energy of the 2nd seam, we're doing four operations (5 + 0 + 80 + 150) + 70
.
This fact that we're re-using the energy of the previous seam to calculate the current seam's energy might be applied recursively to all the shorter seams up to the very top 1st row seam. When we have such overlapping sub-problems, it is a sign that the general problem might be optimized by dynamic programming approach.
So, we may save the energy of the current seam at the particular pixel in an additional seamsEnergies
table to make it re-usable for calculating the next seams faster (the seamsEnergies
table will have the same size as the energy map and the image itself).
Let's also keep in mind that for one particular pixel on the image (i.e., the bottom left one) we may have several values of the previous seams energies.
Since we're looking for a seam with the lowest resulting energy, it would make sense to pick the previous seam with the lowest resulting energy as well.
In general, we have three possible previous seems to choose from:
You may think about it this way:
- The cell
[1][x]
: contains the lowest possible energy of the seam that starts somewhere on the row [0][?]
and ends up at cell [1][x]
- The current cell
[2][3]
: contains the lowest possible energy of the seam that starts somewhere on the row [0][?]
and ends up at cell [2][3]
. To calculate it we need to sum up the energy of the current pixel [2][3]
(from the energy map) with the min(seam_energy_1_2, seam_energy_1_3, seam_energy_1_4)
If we fill the seamsEnergies
table completely, then the minimum number in the lowest row would be the lowest possible seam energy.
Let's try to fill several cells of this table to see how it works.
After filling out the seamsEnergies
table, we may see that the lowest energy pixel has an energy of 50
. For convenience, during the seamsEnergies
generation for each pixel, we may save not only the energy of the seam but also the coordinates of the previous lowest energy seam. This will give us the possibility to reconstruct the seam path from the bottom to the top easily.
The time complexity of DP approach would be O(w * h)
, where w
and h
are the width and the height of the image. We need to calculate energies for every pixel of the image.
Here is an example of how this logic might be implemented:
type SeamPixelMeta = {
energy: number,
coordinate: Coordinate,
previous: Coordinate | null,
};
const findLowEnergySeam = (energyMap: EnergyMap, { w, h }: ImageSize): Seam => {
const seamsEnergies: (SeamPixelMeta | null)[][] = matrix<SeamPixelMeta | null>(w, h, null);
for (let x = 0; x < w; x += 1) {
const y = 0;
seamsEnergies[y][x] = {
energy: energyMap[y][x],
coordinate: { x, y },
previous: null,
};
}
for (let y = 1; y < h; y += 1) {
for (let x = 0; x < w; x += 1) {
let minPrevEnergy = Infinity;
let minPrevX: number = x;
for (let i = (x - 1); i <= (x + 1); i += 1) {
if (i >= 0 && i < w && seamsEnergies[y - 1][i].energy < minPrevEnergy) {
minPrevEnergy = seamsEnergies[y - 1][i].energy;
minPrevX = i;
}
}
seamsEnergies[y][x] = {
energy: minPrevEnergy + energyMap[y][x],
coordinate: { x, y },
previous: { x: minPrevX, y: y - 1 },
};
}
}
let lastMinCoordinate: Coordinate | null = null;
let minSeamEnergy = Infinity;
for (let x = 0; x < w; x += 1) {
const y = h - 1;
if (seamsEnergies[y][x].energy < minSeamEnergy) {
minSeamEnergy = seamsEnergies[y][x].energy;
lastMinCoordinate = { x, y };
}
}
const seam: Seam = [];
if (!lastMinCoordinate) {
return seam;
}
const { x: lastMinX, y: lastMinY } = lastMinCoordinate;
let currentSeam = seamsEnergies[lastMinY][lastMinX];
while (currentSeam) {
seam.push(currentSeam.coordinate);
const prevMinCoordinates = currentSeam.previous;
if (!prevMinCoordinates) {
currentSeam = null;
} else {
const { x: prevMinX, y: prevMinY } = prevMinCoordinates;
currentSeam = seamsEnergies[prevMinY][prevMinX];
}
}
return seam;
};
Removing the Seam With the Lowest Energy
Once we found the lowest energy seam, we need to remove (to carve) the pixels that form it from the image. The removal is happening by shifting the pixels to the right of the seam by 1px
to the left. For performance reasons, we don't actually delete the last columns. Instead, the rendering component will just ignore the part of the image that lays beyond the resized image width.
const deleteSeam = (img: ImageData, seam: Seam, { w }: ImageSize): void => {
seam.forEach(({ x: seamX, y: seamY }: Coordinate) => {
for (let x = seamX; x < (w - 1); x += 1) {
const nextPixel = getPixel(img, { x: x + 1, y: seamY });
setPixel(img, { x, y: seamY }, nextPixel);
}
});
};
Objects Removal
The Seam Carving algorithm tries to remove the seams which consist of low energy pixels first. We could leverage this fact and by assigning low energy to some pixels manually (i.e., by drawing on the image and masking out some areas of it), we could make the Seam Carving algorithm to do objects removal for us for free.
Currently, in getPixelEnergy()
function, we were using only the R
, G
, B
color channels to calculate the pixel's energy. But there is also the A
(alpha, transparency) parameter of the color that we didn't use yet. We may use the transparency channel to tell the algorithm that transparent pixels are the pixels we want to remove.
Here is how the algorithm works for object removal.
Issues and What's Next
The JS IMAGE CARVER web app is far from being a production-ready resizer of course. Its main purpose was to experiment with the Seam Carving algorithm interactively. So the plan for the future is to continue experimentation.
The original paper describes how the Seam Carving algorithm might be used not only for the downscaling but also for the upscaling of the images. The upscaling, in turn, might be used to upscale the image back to its original width after the objects' removal.
Another interesting area of experimentation might be to make the algorithm work in real-time.
Those are the plans for the future, but for now, I hope that the example with image downsizing was interesting and useful for you. I also hope that you've got the idea of using dynamic programming to implement it.
So, good luck with your own experiments!
History
- 18th January, 2022: Initial version