Tags: canvas, html5, fingerprinting
Disclaimer: This post is mostly a write-up of my own notes from exploring HTML5 canvas fingerprinting and privacy-preserving techniques.
How accurate are HTML5 canvas fingerprints? According to AmIUnique, only about 0.73% of users share the same canvas fingerprint as I do, highlighting its uniqueness.
Canvas fingerprinting is widely used in ad tracking and user identification systems, and it has recently been explored in risk-based authentication research [1]. While there is extensive research into detecting and mitigating canvas fingerprinting, few studies have examined just how privacy-invasive these techniques are in practice.
This post will explore the effectiveness of HTML5 canvas fingerprinting, its limitations, and a privacy-preserving approach using differential privacy mechanisms to add noise to fingerprints.
HTML5 canvas fingerprinting works by rendering text, shapes, and graphics on an invisible canvas, extracting the resulting image data, and generating a hash from it. Slight rendering differences across devices and browsers make these fingerprints relatively unique. Some sources claim an accuracy between 80% and 99% for correctly re-identifying the same user (e.g. [2]).
In theory, differences in how each system renders a font are what distinguish one user from another. Below is JavaScript code that demonstrates the process:
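A minimal sketch of that process, assuming a browser-like `document` object is passed in. The function names (`fnv1a`, `canvasFingerprint`), the drawn text and colors, and the choice of a simple FNV-1a hash are all hypothetical and only meant to illustrate the render, extract, hash steps:

```javascript
// FNV-1a (32-bit): a small non-cryptographic hash, used here only for illustration.
function fnv1a(str) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i) & 0xff;
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by FNV prime, keep unsigned 32-bit
  }
  return hash.toString(16).padStart(8, "0");
}

// Renders text and a shape on an off-screen canvas and hashes the exported image.
// `doc` is assumed to behave like the browser's `document`.
function canvasFingerprint(doc) {
  const canvas = doc.createElement("canvas"); // never attached to the page, so invisible
  canvas.width = 240;
  canvas.height = 60;
  const ctx = canvas.getContext("2d");

  ctx.textBaseline = "top";
  ctx.font = "16px Arial";
  ctx.fillStyle = "#f60";
  ctx.fillRect(10, 10, 100, 30);
  ctx.fillStyle = "#069";
  ctx.fillText("Canvas fingerprint 123", 4, 20);
  ctx.fillStyle = "rgba(102, 204, 0, 0.7)";
  ctx.fillText("Canvas fingerprint 123", 6, 22);

  // The data URL encodes the rendered pixels; small rendering differences change it.
  return fnv1a(canvas.toDataURL());
}
```

In a browser, `canvasFingerprint(document)` returns an 8-character hex string. A production system would more likely use a cryptographic hash, but the overall flow stays the same.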
The idea of identifying users this way is old, and most studies have only tried it on small datasets [3]. For instance, this is my canvas in Chrome:
And this is the same canvas in Firefox:
Notice the subtle differences? They give me two very distinct hashes.
In reality, studies like Laperdrix et al. [3] found that, while the fingerprint is unique for some users, around 57% of desktop devices share the same canvas fingerprint. This raises the question of whether advanced defenses against canvas fingerprinting, like those in Brave or certain browser extensions, are truly necessary. These defenses may even make a user stand out, reducing privacy rather than enhancing it.
Still, we can use deviations from a user's regular canvas fingerprint to detect potentially suspicious logins in risk-based authentication. Combining these deviations with additional signals and metrics increases the reliability of identification.
According to research [4], canvas fingerprints can group up to 1,000 users, which still poses a privacy concern. To improve privacy, we can apply a Laplacian noise mechanism based on differential privacy. By adding controlled randomness to the fingerprint, we can reduce its specificity while preserving some utility. For instance, adding Laplace noise with a scale of 15 gives me this:
Since every pixel has three channels, each with values ranging from 0 to 255, the per-channel sensitivity is $\Delta f = 255$. With a Laplace scale of $b = 15$, this gives an epsilon of:
$\epsilon_{channel} = \frac{\Delta f}{b} = \frac{255}{15} = 17$
$\epsilon_{image} = 17 \times 3 = 51$
We can add Laplacian noise to the canvas by using:
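One way to sketch this, assuming the pixels come from `ctx.getImageData(...).data` (a `Uint8ClampedArray` in RGBA order). The helper names `sampleLaplace` and `addLaplaceNoise` and the injectable `rng` parameter are hypothetical, added so the behavior can be checked deterministically. Each color channel gets independent noise and the alpha channel is left untouched:

```javascript
// Draws one sample from a zero-mean Laplace distribution with the given scale b,
// using inverse transform sampling on a uniform value.
function sampleLaplace(scale, rng = Math.random) {
  const u = rng() - 0.5; // uniform in [-0.5, 0.5)
  return -scale * Math.sign(u) * Math.log(1 - 2 * Math.abs(u));
}

// Returns a noisy copy of RGBA pixel data; the input is not modified.
function addLaplaceNoise(data, scale, rng = Math.random) {
  // Writes into a Uint8ClampedArray are rounded and clamped to 0..255.
  const out = new Uint8ClampedArray(data);
  for (let i = 0; i < out.length; i += 4) {
    for (let c = 0; c < 3; c++) { // R, G, B only; index i + 3 is alpha
      out[i + c] = data[i + c] + sampleLaplace(scale, rng);
    }
  }
  return out;
}
```

In a browser, the noisy pixels can be written back with `ctx.putImageData(new ImageData(noisy, width, height), 0, 0)`, where `noisy` is the array returned by `addLaplaceNoise(imageData.data, 15)`.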
While this noise makes hashing unreliable due to its randomness, we can store the raw image data instead, then apply image comparison techniques or even machine learning to classify it.
Instead of the fingerprint, we just send the raw image data (which is usually around 6 KB). Unfortunately, the random noise makes the image harder to compress, and in my experiments the size increased by up to 150% (16 KB). Thus, we can apply a slight blur to improve both the privacy guarantees and the compression ratio:
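As a rough illustration of such a blur, the sketch below applies a 3x3 box blur to RGBA pixel data, averaging each pixel with its in-bounds neighbors. This only approximates a radius-1 Gaussian blur, since a true Gaussian weights the center pixel more heavily. In a browser, setting `ctx.filter = "blur(1px)"` before drawing may be a simpler route where that property is supported. The name `boxBlur3x3` is hypothetical:

```javascript
// Returns a blurred copy of RGBA pixel data (width * height * 4 bytes).
// Each color channel becomes the mean of the 3x3 neighborhood; alpha is copied as-is.
function boxBlur3x3(data, width, height) {
  const out = new Uint8ClampedArray(data.length);
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      const idx = (y * width + x) * 4;
      for (let c = 0; c < 3; c++) {
        let sum = 0;
        let count = 0;
        for (let dy = -1; dy <= 1; dy++) {
          for (let dx = -1; dx <= 1; dx++) {
            const nx = x + dx;
            const ny = y + dy;
            // Skip neighbors that fall outside the image.
            if (nx < 0 || ny < 0 || nx >= width || ny >= height) continue;
            sum += data[(ny * width + nx) * 4 + c];
            count++;
          }
        }
        out[idx + c] = sum / count;
      }
      out[idx + 3] = data[idx + 3];
    }
  }
  return out;
}
```

Interior pixels are averaged over 9 values, while edge and corner pixels are averaged over 6 and 4, because out-of-bounds neighbors are skipped rather than padded.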
Applying a Gaussian blur with a radius of $1$ effectively averages each pixel with its 8 immediate neighbors, so a single pixel's contribution to the output shrinks by roughly a factor of 9:
$\epsilon_{channel} = \frac{\Delta f_{blurred}}{b} \approx \frac{\frac{255}{9}}{15} = 1.89$
$\epsilon = 1.89 \times 3 = 5.67$
The result looks like this and is only about 10 KB in size:
Processing and uploading these images can take time (20–1200 ms in my tests), which may block the main thread. To keep the page loading smoothly, we can defer this work using requestIdleCallback, which runs the processing only when the browser is idle.
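A hedged sketch of that deferral follows. The names `scheduleFingerprint` and `isMobile`, the 5-second `timeout`, the user-agent regex, and the decision to skip the work entirely on mobile browsers without `requestIdleCallback` are all assumptions made for illustration. The `env` parameter defaults to the global object and only exists so the logic can be exercised outside a browser:

```javascript
// Crude, hypothetical mobile check based on the user-agent string.
function isMobile(userAgent = "") {
  return /Android|iPhone|iPad|iPod|Mobi/i.test(userAgent);
}

// Schedules `task` for an idle period when possible.
// Returns "idle", "timeout", or "skipped" to indicate which path was taken.
function scheduleFingerprint(task, env = globalThis) {
  if (typeof env.requestIdleCallback === "function") {
    // Run when the browser is idle, but no later than the timeout.
    env.requestIdleCallback(task, { timeout: 5000 });
    return "idle";
  }
  const ua = env.navigator ? env.navigator.userAgent : "";
  if (isMobile(ua)) {
    // Assumption: without idle scheduling, skip the work on mobile to avoid lag.
    return "skipped";
  }
  // Fallback: yield to the event loop once before running the task.
  env.setTimeout(task, 0);
  return "timeout";
}
```

In a page, this could be called as `scheduleFingerprint(() => collectAndUpload())`, where `collectAndUpload` stands in for whatever function renders, perturbs, and sends the canvas data.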
We also add an isMobile check for when requestIdleCallback is not present, which helps prevent lag on mobile devices, where network and processing resources may be limited.
Canvas fingerprinting offers high uniqueness for user tracking but also raises privacy concerns. By adding differential privacy techniques, such as Laplacian noise and blur, we can reduce the specificity of canvas fingerprints, allowing us to gather insights without compromising individual privacy.
This exploration of canvas fingerprinting shows that with thoughtful design, we can enhance privacy and still retain some utility in user identification. As privacy standards evolve, techniques like differential privacy will be essential in bridging the gap between user tracking and personal privacy.