# k-Means Clustering Algorithms

*k*-means clustering is a method to partition observation into clusters where .
Each observation belongs to a cluster with the nearest mean compared to other clusters :

The objective function is to minimize the sum of squared distances of each observation to its cluster mean.

Since has numerous local minima, there is no algorithm known today to guarantee an optimal solution.

## Lloyd’s algorithm

It is most widely known algorithm to solve *k*-means clustering
and sometimes it is mistaken for the method itself. However, it is:

- Sensitive to the initialization of the means
- Promise nothing about quality, hard to decide about number of iterations

## Hartigan’s Algorithm

- Converges quickly
- Sensitive to initial random class assignment of points.
- First animation
- Second animation with different initialization

## MacQueen’s Algorithm

- Convenient for streams
- Sensitive to the order of stream and the initialization of the means
- First animation
- Second animation with shuffled data

[code] [presentation]

**References**

C. Bauckhage. *“NumPy/SciPy Recipes for Data Science: Computing Nearest Neighbors.”*