MinHash

Concept

A probabilistic algorithm used for efficiently estimating the Jaccard similarity between sets, crucial for near-deduplication in large datasets.

Mentioned in 1 video