On a whim I decided to update my thumbnail algorithm. In short, the previous iteration
would look for areas of sharp contrast at a few preconfigured areas of the image. Intuitively, this had the twofold benefit of finding high-contrast areas (sometimes where the important stuff is) and finding sharp areas (as out-of-focus parts won't have sharp contrast). I decided to iterate on this, preserving the sharp-contrast element of the equation while adding parameters to resolve some of the issues with the original algorithm.
Both new ingredients are based on creating a histogram of brightness values
using luminance (greyscale that treats R/G/B as the human eye does).
- Distance from midtones. That is, how far the most common brightness value is from the midpoint. This would de-emphasize areas that are predominantly dark or bright.
- Peak contrast. I started with the distance between the two highest points in the histogram, but revised it to take the largest separation among a handful of the highest values. This would emphasize areas that have contrast across the entire measured surface.
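To make the two ingredients concrete, here's a minimal sketch. The Rec. 601 luminance weights, the choice of top 5 as the "handful" of highest values, and all function names are my assumptions, not the actual implementation:

```python
import numpy as np

def luminance_histogram(rgb):
    """256-bin histogram of perceptual brightness.

    `rgb` is an (H, W, 3) uint8 array; the Rec. 601 weights
    approximate how the human eye weighs red, green, and blue.
    """
    luma = (0.299 * rgb[..., 0] +
            0.587 * rgb[..., 1] +
            0.114 * rgb[..., 2]).astype(np.uint8)
    hist, _ = np.histogram(luma, bins=256, range=(0, 256))
    return hist

def midtone_distance(hist):
    """How far the most common brightness value sits from the midpoint.

    A large distance means the region is predominantly dark or bright,
    so it gets de-emphasized.
    """
    return abs(int(np.argmax(hist)) - 128)

def peak_contrast(hist, top_n=5):
    """Largest separation among the top_n most populated brightness bins.

    A wide spread between frequent brightness values suggests contrast
    across the whole sampled area.
    """
    top_bins = np.argsort(hist)[-top_n:]
    return int(top_bins.max() - top_bins.min())
```

Note that `peak_contrast` assumes the histogram has at least `top_n` meaningfully populated bins; a nearly flat or nearly empty histogram would need a guard in practice.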
Rather than use preconfigured sample locations, I wrote the interface to let the user define a number of circular sampling areas
spaced evenly across the image. Running the algorithm would give me:

    ( x, y): [score] [values...]
    (88, 76):  midtone: 96 contrast: 96 sharpness: 73
    (88, 152):  midtone: 40 contrast: 40 sharpness: 24
    (88, 228):  midtone: 48 contrast: 24 sharpness: 16
    (88, 304):  midtone: 64 contrast: 24 sharpness: 14
    (88, 380):  midtone: 48 contrast: 24 sharpness: 7
    (88, 456):  midtone: 48 contrast: 24 sharpness: 14

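An evenly spaced layout like the one that produced the (88, 76), (88, 152), ... coordinates above could be generated along these lines; the function name is hypothetical and the dimensions in the usage note are illustrative, not the actual image size:

```python
def sample_centers(width, height, cols, rows):
    """Evenly spaced centers for circular sampling areas.

    Divides the image into a (cols x rows) layout and places each
    center one step in from the previous, so the margins at the
    edges match the spacing between circles.
    """
    step_x = width // (cols + 1)
    step_y = height // (rows + 1)
    return [(step_x * (c + 1), step_y * (r + 1))
            for r in range(rows) for c in range(cols)]
```

For example, a 616x532 image with a 6x6 layout yields steps of 88 and 76, reproducing coordinates like (88, 76) and (88, 456).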
Taking the combined midtone/contrast/sharpness values, I normalized them to give each region a percentage "thumbnailability"
relative to the rest of the image.
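One way to read that normalization is each region's combined score as a percentage of the best region's score; that interpretation (and the function name) is my assumption:

```python
def thumbnailability(scores):
    """Normalize combined per-region scores to percentages of the max.

    `scores` maps (x, y) centers to a combined midtone/contrast/
    sharpness value; the most interesting region scores 100.
    """
    top = max(scores.values())
    return {pt: round(100 * s / top) for pt, s in scores.items()}
```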
It gets a little clearer and more aligned when I draw the sampling circles onto the image.
Generating a thumbnail is fairly trivial from here, but still heuristic. I could take the point of highest interest and expand outward if necessary, but that could fall into an adversarial case fairly easily. Instead, iterating over the entire grid is pretty straightforward, and I can take the maximum grouping that meets the provided size constraints.
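The grid iteration can be sketched as a brute-force window search; `best_window` is a hypothetical name and the details are assumptions about the approach, not the actual code:

```python
import numpy as np

def best_window(grid, win_w, win_h):
    """Find the window (in grid cells) with the highest total score.

    `grid` is a 2D array of per-cell interest scores; win_w and win_h
    express the thumbnail's size constraint in grid cells. Brute-force
    iteration is fine at these grid sizes.
    """
    rows, cols = grid.shape
    best, best_pos = -1.0, (0, 0)
    for r in range(rows - win_h + 1):
        for c in range(cols - win_w + 1):
            total = grid[r:r + win_h, c:c + win_w].sum()
            if total > best:
                best, best_pos = total, (r, c)
    return best_pos, best
```

Because every window is scored on its total rather than a single peak, one adversarially bright cell can't dominate the way an expand-from-maximum approach would allow.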
And beyond the thumbnail application, the high/low interest region labeling could be used for cropping or culling datasets for other algorithms.
Video games break the heuristic, but not as easily as the last one. Text provides the sharp contrast that the code likes to home in on. Creating a center-left/right bias would help with this, but then I'd need manual or automatic recognition that the image is a screencap.
Even photos don't always get the intended result.