MinHash
Concept
A probabilistic algorithm used for efficiently estimating the Jaccard similarity between sets, crucial for near-deduplication in large datasets.
Mentioned in 1 video
A probabilistic algorithm used for efficiently estimating the Jaccard similarity between sets, crucial for near-deduplication in large datasets.