3.3

Perceptual hashing and fingerprinting

A perceptual hash is a short signature that says, "this is the same image, even if the bytes are different." The math is older than C2PA and the applications are wider.

More precisely, a perceptual hash is a short, fixed-size signature derived from an image's content rather than its byte representation, designed so that two images that look the same to a human produce similar hashes. The algorithms are decades old in their core form and have wide application: duplicate detection in photo libraries, copyright matching on stock-photo sites, CSAM detection at major platforms, and now soft-binding lookup for durable Content Credentials.

This page covers the dominant algorithms in production — pHash, dHash, aHash, and PDQ — and the trade-offs that determine where each is appropriate. It also covers the false-match floor, which is the practical limit on what perceptual hashing can do at scale, and the recent neural-fingerprint approaches that improve on the classical methods at higher computational cost.

The basic idea

A perceptual hash is computed by reducing an image to a small set of features that are robust against typical transformations and then encoding those features as bits. The Hamming distance between two hashes — the count of differing bits — is taken as an inverse similarity score. Two identical images produce identical hashes (distance 0); two similar images produce small distances; two unrelated images produce distances near half the hash length, the expected value for two random bit strings.
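
A minimal sketch in Python of the distance computation (the hash values here are invented purely for illustration):

```python
def hamming(a: int, b: int) -> int:
    """Count differing bits between two equal-length hashes stored as ints."""
    return bin(a ^ b).count("1")  # XOR marks the differing bits; count them

h_original = 0xD1C4F0A39B2E7710   # invented 64-bit hash values
h_reencoded = 0xD1C4F0A39B2E7712  # one bit flipped: a near-duplicate
h_unrelated = 0x0F96A35C41D8E2BB

print(hamming(h_original, h_reencoded))  # small distance -> likely match
print(hamming(h_original, h_unrelated))  # near 32 for random 64-bit pairs
```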

The choice of features determines what the hash is robust against. Pixel-domain features survive minor editing but break under re-encoding. DCT-coefficient features survive JPEG compression but are sensitive to geometric transformations. Learned features survive whatever transformations they were trained against. The classical handcrafted algorithms — pHash, dHash, aHash — make specific feature choices that work well in their original domains and degrade gracefully outside.

The classical algorithms

aHash (average hash)

The simplest perceptual hash. Resize the image to 8×8 pixels in grayscale, compute the mean pixel value, and emit one bit per pixel based on whether the pixel is above or below the mean. The result is a 64-bit hash that is fast to compute and trivially robust against resizing and minor color changes. It is also weakly discriminating: many unrelated images can produce similar aHashes by accident. aHash is useful for very high-confidence duplicate detection but not for similarity search at scale.
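
A sketch of aHash, assuming Pillow and NumPy are available (the choice of resampling filter is a detail implementations vary on):

```python
from PIL import Image
import numpy as np

def ahash(image: Image.Image) -> int:
    """64-bit average hash: 8x8 grayscale thumbnail, threshold at the mean."""
    small = image.convert("L").resize((8, 8), Image.LANCZOS)
    px = np.asarray(small, dtype=np.float64)
    bits = (px > px.mean()).flatten()            # one bit per pixel
    return int("".join("1" if b else "0" for b in bits), 2)

# Usage: hashes within a few bits of each other indicate likely duplicates.
# ahash(Image.open("a.jpg")), ahash(Image.open("b.jpg")) -> compare via hamming()
```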

pHash (perceptual hash)

The DCT-based variant. Resize the image to 32×32 grayscale, compute the 32×32 DCT, take the 8×8 low-frequency block in the top-left, compute its median, and emit one bit per coefficient based on whether it is above or below the median. The 64-bit result is robust against JPEG compression (the low-frequency coefficients are exactly what the codec preserves) and modest geometric transformations (low-frequency coefficients change slowly under small shifts and rescaling). pHash is the default choice for many duplicate-detection and similarity-search applications.
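
A matching pHash sketch using SciPy's DCT; this follows the common open-source formulation, and real libraries differ in small details such as how the median is taken:

```python
from PIL import Image
import numpy as np
from scipy.fft import dct

def phash(image: Image.Image) -> int:
    """64-bit DCT hash: 32x32 grayscale, keep the top-left 8x8 frequencies."""
    small = image.convert("L").resize((32, 32), Image.LANCZOS)
    px = np.asarray(small, dtype=np.float64)
    freq = dct(dct(px, axis=0, norm="ortho"), axis=1, norm="ortho")
    low = freq[:8, :8]                           # lowest-frequency block
    bits = (low > np.median(low)).flatten()      # median split: ~32 ones
    return int("".join("1" if b else "0" for b in bits), 2)
```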

dHash (difference hash)

A gradient-based variant. Resize to 9×8 grayscale and emit one bit per horizontally adjacent pixel pair (eight comparisons per row across eight rows), set by whether the left pixel is brighter than its right neighbor. The 64-bit result encodes directional gradients and is robust against global brightness and contrast adjustments. dHash and pHash perform comparably on most benchmarks; dHash is slightly faster to compute and somewhat less sensitive to color shifts.
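
And the dHash equivalent, under the same assumptions:

```python
from PIL import Image
import numpy as np

def dhash(image: Image.Image) -> int:
    """64-bit difference hash: 9x8 grayscale, compare horizontal neighbors."""
    small = image.convert("L").resize((9, 8), Image.LANCZOS)
    px = np.asarray(small, dtype=np.int16)
    bits = (px[:, :-1] > px[:, 1:]).flatten()    # 8 comparisons x 8 rows
    return int("".join("1" if b else "0" for b in bits), 2)
```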

PDQ

Developed by Meta (then Facebook) and released as open source in 2019. PDQ produces a 256-bit hash using a more sophisticated pipeline than the 64-bit classics: luminance extraction, filtered downsampling, a two-dimensional DCT, and median quantization of the 16×16 low-frequency block, along with a gradient-based quality score for the input. It is the standard hash for the GIFCT (Global Internet Forum to Counter Terrorism) shared hash database and is widely used in CSAM detection. The 256-bit length provides much better discrimination than 64-bit hashes at scale, at the cost of larger storage and slower comparison.

PhotoDNA

Microsoft's older perceptual hashing algorithm, originally developed for CSAM detection in partnership with NCMEC. It produces a 144-byte hash with proprietary internals; the algorithm has not been published openly, but the comparison metric and the operational role are well-documented. PhotoDNA is the foundation of much of the existing CSAM-detection infrastructure and remains in widespread use alongside PDQ.

| Algorithm | Hash length | Robustness | Typical use |
| --- | --- | --- | --- |
| aHash | 64 bits | Low | Quick duplicate check |
| pHash | 64 bits | Moderate (JPEG-robust) | Duplicate detection at small scale |
| dHash | 64 bits | Moderate (gradient-based) | Quick similarity check |
| PDQ | 256 bits | High | Large-scale matching, GIFCT |
| PhotoDNA | 144 bytes | High | CSAM detection, Microsoft ecosystem |
| Neural fingerprints | Variable (typically 512–2048 bits) | Trained-distribution dependent | Modern stock-photo and social-platform matching |

Matching at scale and the false-match floor

When the hash database is small, matching is trivial: compare the query hash to every stored hash and report those within a threshold Hamming distance. When the database has billions of entries, this is too expensive, and approximate-nearest-neighbor structures are used instead (LSH, HNSW indices, and libraries such as FAISS that implement both exact and approximate binary search). Production matching systems at the major platforms operate on databases in the hundreds of millions to billions of hashes with sub-second query latency.
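
As a sketch of the exact-search baseline, FAISS's binary indexes operate directly on packed hashes (eight bits per byte); the approximate variants such as IndexBinaryIVF expose the same interface:

```python
import numpy as np
import faiss  # pip install faiss-cpu

BITS = 256                                     # e.g. PDQ-length hashes
index = faiss.IndexBinaryFlat(BITS)            # exact Hamming search

# One packed uint8 row of BITS/8 bytes per stored hash (random stand-ins).
db = np.random.randint(0, 256, (1_000_000, BITS // 8), dtype=np.uint8)
index.add(db)

query = db[42:43]                              # query with a known entry
distances, ids = index.search(query, k=5)      # top-5 by Hamming distance
print(ids[0], distances[0])                    # ids[0][0] == 42, distance 0
```

At billion scale the flat index's linear scan becomes the bottleneck, which is where the IVF and hashing variants trade a little recall for large speedups.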

At this scale, the false-match floor becomes the dominant concern. Hash collisions between unrelated images — small Hamming distances between hashes of perceptually different images — occur at a rate that depends on hash length and the distribution of image content. For 64-bit hashes against a billion-entry database, even very strict thresholds produce occasional false matches. PDQ's 256-bit length pushes the false-match rate down to acceptable levels for production CSAM detection; the trade-off is storage and comparison cost.
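
A back-of-envelope model makes the floor concrete. Treating unrelated hashes as independent uniform bit strings, the probability that two of them land within a threshold t is a binomial tail; real hashes are correlated with image content, so actual collision rates are worse than this idealized bound:

```python
from math import comb

def collision_prob(n_bits: int, threshold: int) -> float:
    """P(Hamming distance <= threshold) for two independent random hashes."""
    return sum(comb(n_bits, d) for d in range(threshold + 1)) / 2 ** n_bits

for n_bits, threshold in ((64, 8), (256, 32)):   # same fraction of bits
    p = collision_prob(n_bits, threshold)
    print(n_bits, p, p * 1e9)  # per-pair prob, expected hits per 1e9 entries
```

Run as written, the 64-bit case already predicts occasional accidental matches per query against a billion entries, while the 256-bit case is effectively collision-free under the same idealized model.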

Published false-match rates for production deployments are typically in the 10⁻⁶ to 10⁻⁸ range, depending on threshold and database composition. For C2PA durable credentials, the false-match rate translates into the rate at which a query for one image returns a manifest belonging to a different image. This is an alarming category of error, and registry operators target the lowest false-match rates technically feasible.

Robustness to transformations

The dominant transformations a perceptual hash must survive depend on the deployment. For platform de-duplication, the relevant transformations are re-encoding at different qualities, resizing to platform-specific dimensions, color-space conversion, and minor cropping. For CSAM detection, the transformations include all of the above plus adversarial scrubbing (blur, noise injection, color-channel scrambling).
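
The difference in difficulty is easy to observe with the open-source imagehash library (assumed installed; the synthetic test image keeps the sketch self-contained, and exact distances will vary):

```python
import io
import numpy as np
from PIL import Image
import imagehash  # pip install ImageHash

# Synthetic test image: a gradient plus noise, standing in for a photo.
rng = np.random.default_rng(0)
arr = np.linspace(0, 255, 256)[None, :] + rng.normal(0, 20, (256, 256))
img = Image.fromarray(arr.clip(0, 255).astype("uint8")).convert("RGB")

h = imagehash.phash(img)

buf = io.BytesIO()                        # benign: re-encode at low quality
img.save(buf, format="JPEG", quality=30)
buf.seek(0)
h_jpeg = imagehash.phash(Image.open(buf))

h_crop = imagehash.phash(img.crop((51, 0, 256, 256)))  # 20% off the left

print("jpeg distance:", h - h_jpeg)       # typically small
print("crop distance:", h - h_crop)       # typically much larger
```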

Classical perceptual hashes handle the platform-distribution cases well. They handle adversarial cases poorly. A determined adversary can produce small pixel-level changes that move the hash outside the matching threshold without perceptibly altering the image, which is one of the reasons CSAM detection has been moving toward neural fingerprints with explicit adversarial robustness training.

Neural fingerprints

Neural fingerprints replace handcrafted features with learned ones. A network (typically a contrastive-trained encoder) maps images to a fixed-dimension feature vector; the feature vector is binarized into a hash. Training pairs images with their transformations and optimizes for the property that transformed pairs produce close vectors while unrelated pairs produce distant ones.
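
A minimal sketch of the binarization step, assuming an encoder has already produced the embedding. Production schemes usually learn the quantizer jointly with the encoder; the random-hyperplane projection shown here is the classical baseline, whose Hamming distance approximates angular distance between embeddings:

```python
import numpy as np

def binarize(embedding: np.ndarray, n_bits: int = 512, seed: int = 7) -> np.ndarray:
    """Sign of fixed random projections (SimHash-style binarization)."""
    rng = np.random.default_rng(seed)    # fixed seed: same planes everywhere
    planes = rng.standard_normal((n_bits, embedding.shape[-1]))
    return (planes @ embedding > 0).astype(np.uint8)

rng = np.random.default_rng(1)
v = rng.standard_normal(768)                  # stand-in for an encoder output
v_near = v + 0.05 * rng.standard_normal(768)  # a "transformed" neighbor
v_far = rng.standard_normal(768)              # an unrelated embedding

print(int(np.sum(binarize(v) != binarize(v_near))))  # small distance
print(int(np.sum(binarize(v) != binarize(v_far))))   # near n_bits / 2
```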

Neural fingerprints outperform classical hashes on benchmarks that include transformations they were trained against. They underperform when faced with transformations outside training distribution, which is the standard adversarial concern. Production deployments — Adobe's CAI fingerprint, Truepic's, several proprietary platform schemes — combine neural and classical approaches, using each as a check on the other.

Computationally, neural fingerprints are 10×–100× more expensive than classical hashes to compute and store, which matters at platform scale. The deployment trend has been to use cheap classical hashes for first-pass filtering and neural fingerprints for confirmation, which is the standard pattern in any retrieval system with a quality-versus-cost trade-off.
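
In outline, the cascade looks like the sketch below. Every name in it is hypothetical and stands in for whatever index and encoder a deployment actually uses:

```python
def lookup(image, coarse_index, encoder, fine_index,
           coarse_threshold=10, fine_threshold=0.15):
    """Hypothetical two-stage match: cheap classical prefilter,
    expensive neural confirmation on the surviving candidates only."""
    # Stage 1: Hamming-ball query against the classical-hash index
    # (phash as in the earlier sketch; all other names are placeholders).
    candidates = coarse_index.query(phash(image), coarse_threshold)
    if not candidates:
        return None
    # Stage 2: score candidates with the neural fingerprint.
    emb = encoder.embed(image)
    scored = [(fine_index.distance(emb, c), c) for c in candidates]
    dist, best = min(scored)
    return best if dist <= fine_threshold else None
```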

In practice

A perceptual hash match is a similarity claim, not an identity claim. Two images with the same hash may be the same image after re-encoding, may be one image and a slight derivative, or may be unrelated images that happen to collide. Always combine perceptual matching with at least one secondary check before treating the match as authoritative.

Privacy considerations

Perceptual hashing is sometimes proposed as a privacy-preserving alternative to image transmission: send the hash, not the image. The privacy properties are weaker than this framing suggests. A hash is a many-to-one function from image to short string; without the original image you cannot recover the exact content, but with a sufficiently large image corpus you can find very similar images by hash matching, which leaks substantial information about what was hashed.

The 2021 Apple CSAM-detection proposal — which used a neural perceptual hash on-device against a server-held hash list — drew extensive criticism on exactly this point. The mechanism could be repurposed for non-CSAM hash lists, and the false-match rate was non-zero against arbitrary image populations. Apple eventually withdrew the proposal. The underlying technical questions have not gone away; they have shifted to other deployment contexts, including the durable credentials registry.

Where the field is moving

Two trends shape perceptual hashing through the rest of the 2020s. First, neural fingerprints continue to displace classical hashes in high-value applications, particularly where adversarial robustness matters. The compute cost is becoming less of an issue as hardware improves. Second, the integration with C2PA durable credentials is pushing perceptual hashing into a new role: not just duplicate detection but manifest recovery after stripping. This role has different accuracy requirements (false positives are more consequential because they return a wrong manifest) and is driving the false-match-rate targets down further.

The longer-term question is whether perceptual hashing as a category remains distinct from invisible watermarking. The two are converging in their applications and in some of their technical approaches; the difference between "extract a watermark the producer embedded" and "compute a feature signature the producer did not embed" is blurring as learned-feature schemes that work both ways emerge. The standardization layer (registries, C2PA assertion slots) is likely to absorb the technical distinction over time, leaving the user-facing question simply: did the lookup return a manifest, yes or no.