Log in
Enquire now
K-means++

K-means++

A clustering method that augments the vanilla K-means algorithm with a randomized seeding technique.

OverviewStructured DataIssuesContributors

Contents

Is a
Technology
Technology

Technology attributes

Related Industries
Machine learning
Machine learning
Date Invented
2016

Other attributes

Wikidata ID
Q13414807

K-means++ is an optimization technique used to initialize the K-means algorithm in unsupervised machine learning.

It is often used as it is faster and more accurate compared to regular K-means.

K-means++ initialization differs from regular K-means initialization in that it is based on a probabilistic approach to centroid definition. The algorithm tries to initialize centroids that are far apart from each other.

Following the first, random centroid initialization, the next point is chosen based on a probability that depends on the distance to the first point: the further apart the point is, the more probable it is to be chosen.

This adds complexity in the K-means algorithm but it reduces the probability of a bad initialization and thus to a bad clustering result.

K-means++ has a different seeding technique for centroid definition compared to K-means.

The steps are the following:

  1. The smallest distance between point x and the nearest centroid is denoted by
  2. A point is chosen randomly from the dataset
  3. The point is chosen belonging to the dataset with probability
  4. Step 3 is repeated until the points are chosen

K-means ++ is said to be competitive. The O notation is used to describe asymptotic behavior of functions. Its objective is to characterize the behavior of a function for high value arguments topics in a simple but rigorous way, in order to be able to compare the behavior of several functions with one another. The O notation is commonly used to describe a strict asymptotic limit, but the narrow asymptotic limits are more formally and properly denoted by the letter Θ.

K-means attempts to minimize a "potential" that is calculated as the sum of the squared distances of points from their cluster centers. For any possible clustering scenario, the potential of a K-means++ solution can be at most

times the potential of the best possible solution. This notation is shortened to .

Timeline

No Timeline data yet.

Further Resources

Title
Author
Link
Type
Date

k-means++: The Advantages of Careful Seeding

David Arthur, Sergei Vassilvitskii

http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf

Academic paper

References

Find more entities like K-means++

Use the Golden Query Tool to find similar entities by any field in the Knowledge Graph, including industry, location, and more.
Open Query Tool
Access by API
Golden Query Tool
Golden logo

Company

  • Home
  • Press & Media
  • Blog
  • Careers
  • WE'RE HIRING

Products

  • Knowledge Graph
  • Query Tool
  • Data Requests
  • Knowledge Storage
  • API
  • Pricing
  • Enterprise
  • ChatGPT Plugin

Legal

  • Terms of Service
  • Enterprise Terms of Service
  • Privacy Policy

Help

  • Help center
  • API Documentation
  • Contact Us
By using this site, you agree to our Terms of Service.