Posts Tagged ‘markerclusterer’
MarkerClusterer with C# example and html-canvas (part 3)
This is part 3 of my topic on point clustering.
I will look at the marker clustering algorithm.
This is a follow up of my previous post.
http://kunuk.wordpress.com/2011/09/17/clustering-k-means-and-grid-with-c-example-and-html-canvas-part-2/
The MarkerClusterer algorithm is simple. It iterates the point and put them in boxes.
The first item is put in a box. The following items are only put in a box if they are within a specified distance from any existing box. If not then a new box is created and the item is put in the new box.
The algorithm is also explained here:
http://code.google.com/intl/en-US/apis/maps/articles/toomanymarkers.html#markerclusterer
My implementation running time is O(k*n) where k is number of boxes and n is number of points.
(Please make a comment if I am wrong).
It is possible to manipulate the running time by setting the box-size. If the box-size is big the k will be smaller and the running time faster.
Imagine you have a set of point like this.
And you are interested in clustering to avoid overlapping points and for reducing the visual overload.

Here’s and example of how marker clustering works.
Every point has been put in box by running the defined algorithm above. the boxsize is user-defined.
The distance of the cluster point can minimum be boxsize / 2.
Thus we don’t need to consider merging clusters if they should be to close to each other and overlap.

Here is the box-area removed for a more clean look.
Notice that the points are not all inside the nearest box. This usually happens but there is a simple fix explained later if needed.

And the points are removed for an even more clean look. Now there are only cluster points with numbers that tells how many points they contain.

As the last step in the algorithm all the points are iterated and compared to the existing cluster points. They are assigned to the nearest cluster. This takes O(k*n). The cluster points are located at the center of the boxes.
Here you can see an example how it looks like but this image is for another dataset.
Notice at top left corner there is one red point inside the green box which is not inside it’s ‘correct’ box. This is not an error but because the nearest cluster point for the red point is the red box, the distances are calculated by comparing the center of the boxes to the points. If needed it can also be ‘fixed’ but that is usually not needed.


Alternatively I could also have used a distance-based approach, where you use circles instead of boxes. This is explained in the google code reference link at the top and I will skip the description of the algorithm because it is very similar to the marker clusterer. But I have included the source code for the distance clustering at the bottom. Here’s an image of how the clustering would like like.

Here are some performance test for the marker clusterer algorithm executed from my laptop.
about 0.15 sec for 10.000 points.
about 7.5 sec for 1.000.000 points where the k = 4.

With a bigger box area which can contain all the points, i.e. k = 1, the performance is about 4.8 sec for 1.000.000 points and 0.4 sec. for 100.000 points.
Summary
For the task of clustering overlapping points.
In summary of the 4 clusters: grid, k-means, marker and distance I would usually prefer grid, marker or distance clustering as they have fast running time O(k*n) and you can easily manipulate the running time by adjusting the k-value by defining the grid-size, boxsize or distance-size. K-means can have poor running time depending on your dataset and max_error level. K-means will run slower than the other 3 algorithms for large n.
This is my bundle file that contains the 4 clustering algorithms (grid, k-means, marker, distance) in one package.
The other two clustering algorithms (grid and k-means) are explained in my previous post.
Out of the box, the C# file runs marker clustering with similar result as the images above. It generates a draw.js
file with Canvas drawing information used by the canvas.html file.
All the classes are put in one file for simple distribution. Refactor as you like.
AlgorithmClusterBundle.cs
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;
using System.Text;
namespace ClusterBundle
{
// By Kunuk Nykjaer
// Clustering package, Grid clustering, MarkerClusterer and auto increasing K-means clustering
// should split all class to each file and refactor all global const away
// quick source code bundle version
// Version 0.4
//
// for visualization this generates a draw.js file which is used by the canvas.html file
// the canvas.html is distributed elsewhere, e.g. at my blog at http://kunuk.wordpress.com
public class AlgorithmClusterBundle
{
public static DateTime Starttime;
static void Main(string[] args)
{
Starttime = DateTime.Now;
var algo = new AlgorithmClusterBundle();
// run clustertype
algo.Run(clusterType);
algo.GenerateJavascriptDrawFile(clusterType); //create draw.js
var timespend = DateTime.Now.Subtract(Starttime).TotalSeconds;
Console.WriteLine(timespend + " sec. press a key ...");
Console.ReadKey();
}
//------------------------------
// USER CONFIG, CUSTOMIZE BELOW
public const ClusterType clusterType = ClusterType.MarkerClusterer;
public static readonly int N = 100; //number of points
// use centroid or semi-centroid cluster point placement visualization?
public const bool DoUpdateAllCentroidsToNearestContainingPoint = false;
public const bool UseProfiling = false; //debug, output time spend
// window frame boundary, aka viewport
public const double MinX = 10;
public const double MinY = 10;
public const double MaxX = 400;
public const double MaxY = 300;
// where the files are saved or loaded
public const string FolderPath = @"c:\temp\";
// K-MEANS config
// heuristic, set tolerance for cluster density, has effect on running time.
// Set high for many points in dataset, can be lower for fewer points
public const double MAX_ERROR = 50;
// MARKERCLUSTERER config
// box area i.e. cluster size
public const int MARKERCLUSTERER_SIZE = 100;
// DISTANCECLUSTERER config
// radius size i.e. cluster size
public const int DISTANCECLUSTERER_SIZE = 62;
// GRIDCLUSTER config
// grid size
public const int GridX = 7;
public const int GridY = 6;
public const bool DoMergeGridIfCentroidsAreCloseToEachOther = true;
// USER CONFIG, CUSTOMIZE ABOVE
//------------------------------
public enum ClusterType { Unknown = -1, Grid, KMeans, MarkerClusterer, DistanceClusterer } ;
static readonly CultureInfo Culture_enUS = new CultureInfo("en-US");
const string S = "G";
readonly List<XY> _points = new List<XY>();
List<XY> _clusters = new List<XY>();
public static readonly Random Rand = new Random();
const double AbsSizeX = MaxX - MinX;
const double AbsSizeY = MaxY - MinY;
// for bucket placement calc, grid cluster algo
const double DeltaX = AbsSizeX / GridX;
const double DeltaY = AbsSizeY / GridY;
static private readonly string NL = Environment.NewLine;
private const string ctx = "ctx"; // javascript canvas
public bool DisplayGridInCanvas = false; //autoset by used cluster type
public AlgorithmClusterBundle()
{
CreateTestDataSet(N); // create points in memory
//SaveDataSetToFile(_points); // save test points to file
//_points = LoadDataSetFromFile(); // load test points from file
}
// dictionary lookup key used by grid cluster algo
public static string GetId(int idx, int idy) //O(1)
{
return idx + ";" + idy;
}
// save points from mem to file
private const string DatasetSerializeName = "dataset.ser";
static void SaveDataSetToFile(List<XY> dataset)
{
var objectToSerialize = new DatasetToSerialize { Dataset = dataset };
new Serializer().SerializeObject(FolderPath + DatasetSerializeName, objectToSerialize);
}
// load points from file to mem
static List<XY> LoadDataSetFromFile()
{
var objectToSerialize = (DatasetToSerialize)
(new Serializer().DeSerializeObject(FolderPath + DatasetSerializeName));
return objectToSerialize.Dataset;
}
void CreateTestDataSet(int n) //O(n)
{
// CREATE DATA SET
// Create random scattered points
//for (int i = 0; i < n; i++)
//{
// var x = MinX + Rand.NextDouble() * (AbsSizeX);
// var y = MinY + Rand.NextDouble() * (AbsSizeY);
// _points.Add(new XY(x, y));
//}
// Create random region of clusters
for (int i = 0; i < n / 3; i++)
{
var x = MinX + 50 + Rand.NextDouble() * 100;
var y = MinY + 50 + Rand.NextDouble() * 100;
_points.Add(new XY(x, y));
}
for (int i = 0; i < n / 3; i++)
{
var x = MinX + 260 + Rand.NextDouble() * 100;
var y = MinY + 110 + Rand.NextDouble() * 100;
_points.Add(new XY(x, y));
}
for (int i = 0; i < n / 3; i++)
{
var x = MinX + 100 + Rand.NextDouble() * 100;
var y = MinY + 200 + Rand.NextDouble() * 100;
_points.Add(new XY(x, y));
}
}
public void Run(ClusterType clustertype)
{
switch (clustertype)
{
case ClusterType.Grid:
_clusters = new GridCluster(_points).GetCluster();
DisplayGridInCanvas = true;
break;
case ClusterType.KMeans:
_clusters = new KMeans(_points).GetCluster();
break;
case ClusterType.MarkerClusterer:
_clusters = new MarkerClusterer(_points).GetCluster();
break;
case ClusterType.DistanceClusterer:
_clusters = new DistanceClusterer(_points).GetCluster();
break;
case ClusterType.Unknown:
break;
}
}
public static string GetRandomColor()
{
int r = Rand.Next(10, 250);
int g = Rand.Next(10, 250);
int b = Rand.Next(10, 250);
return string.Format("'rgb({0},{1},{2})'", r, g, b);
}
static void CreateFile(string s)
{
var path = new FileInfo(FolderPath + "draw.js");
var isCreated = FileUtil.WriteFile(s, path);
Console.WriteLine(isCreated + " = File is Created");
}
public void GenerateJavascriptDrawFile(ClusterType clusterType)
{
var sb = new StringBuilder();
// markers
var head = "function drawMarkers(ctx) {" + NL;
var tail = NL + "}" + NL;
sb.Append(head);
// grid
if (DisplayGridInCanvas)
{
sb.Append("ctx.beginPath();" + NL);
for (int i = 0; i < GridX + 1; i++)
{
sb.Append(String.Format("ctx.moveTo({0}, {1});{2}",
ToStringEN(MinX + i * DeltaX),
ToStringEN(MinY), NL));
sb.Append(String.Format("ctx.lineTo({0}, {1});{2}",
ToStringEN(MinX + i * DeltaX),
ToStringEN(MaxY), NL));
}
for (int j = 0; j < GridY + 1; j++)
{
sb.Append(String.Format("ctx.moveTo({0}, {1});{2}",
ToStringEN(MinX),
ToStringEN(MinY + j * DeltaY), NL));
sb.Append(String.Format("ctx.lineTo({0}, {1});{2}",
ToStringEN(MaxX),
ToStringEN(MinY + j * DeltaY), NL));
}
sb.Append("ctx.stroke(); " + NL);
}
// if to many points, the canvas can not be drawn or is slow,
// use max points and clusters for drawing
const int max = 10000;
// markers
if (_points != null)
{
sb.Append(NL);
for (int i = 0; i < _points.Count; i++)
{
var p = _points[i];
sb.Append(string.Format("drawMark({0}, {1}, {2}, {3});{4}",
ToStringEN(p.X), ToStringEN(p.Y), p.Color, ctx, NL));
if (i > max)
break;
}
}
string clusterInfo = "0";
if (_clusters != null)
{
sb.Append(NL);
if (clusterType == ClusterType.Grid || clusterType == ClusterType.KMeans)
{
for (int i = 0; i < _clusters.Count; i++)
{
var c = _clusters[i];
sb.Append(string.Format("drawCluster({0}, {1}, {2}, {3}, {4});{5}",
ToStringEN(c.X), ToStringEN(c.Y), c.Color,
c.Size, ctx, NL));
if (i > max)
break;
}
}
else if (clusterType == ClusterType.MarkerClusterer)
{
for (int i = 0; i < _clusters.Count; i++)
{
var c = _clusters[i];
sb.Append(string.Format(
"drawMarkerCluster({0}, {1}, {2}, {3}, {4}, {5});{6}",
ToStringEN(c.X), ToStringEN(c.Y), c.Color,
c.Size, MARKERCLUSTERER_SIZE, ctx, NL));
if (i > max)
break;
}
}
else if (clusterType == ClusterType.DistanceClusterer)
{
for (int i = 0; i < _clusters.Count; i++)
{
var c = _clusters[i];
sb.Append(string.Format(
"drawDistanceCluster({0}, {1}, {2}, {3}, {4}, {5});{6}",
ToStringEN(c.X), ToStringEN(c.Y), c.Color,
c.Size, DISTANCECLUSTERER_SIZE, ctx, NL));
if (i > max)
break;
}
}
clusterInfo = _clusters.Count + string.Empty;
}
// bottom text
sb.Append("ctx.fillStyle = 'rgb(0,0,0)';" + NL);
sb.Append(string.Format("ctx.fillText('Clusters = ' + {0}, {1}, {2});{3}",
clusterInfo, ToStringEN(MinX + 10), ToStringEN(MaxY + 20), NL));
sb.Append(tail);
CreateFile(sb.ToString());
//Console.WriteLine(sb.ToString());
}
public static string ToStringEN(double d)
{
double rounded = Math.Round(d, 3);
return rounded.ToString(S, Culture_enUS);
}
public static void Profile(string s)//O(1)
{
if (!UseProfiling)
return;
var timespend = DateTime.Now.Subtract(Starttime).TotalSeconds;
Console.WriteLine(timespend + " sec. " + s);
}
[Serializable()]
public class XY : IComparable, ISerializable
{
public double X { get; set; }
public double Y { get; set; }
public string Color { get; set; }
public int Size { get; set; }
//public string XToString(){return X.ToString(S, Culture_enUS);}
//public string YToString(){return Y.ToString(S, Culture_enUS);}
public XY() { }
public XY(double x, double y)
{
X = x;
Y = y;
Color = "'rgb(0,0,0)'";//default
}
public XY(XY p) //clone
{
this.X = p.X;
this.Y = p.Y;
this.Color = p.Color;
this.Size = p.Size;
}
public int CompareTo(object o) // if used in sorted list
{
if (this.Equals(o))
return 0;
var other = (XY)o;
if (this.X > other.X)
return -1;
if (this.X < other.X)
return 1;
return 0;
}
// used by k-means random distinct selection of cluster point
public override int GetHashCode()
{
var x = X * 10000; //make the decimals be important
var y = Y * 10000;
var r = x * 17 + y * 37;
return (int)r;
}
private const int ROUND = 6;
public override bool Equals(Object o)
{
if (o == null)
return false;
var other = o as XY;
if (other == null)
return false;
// rounding could be skipped
// depends on granularity of wanted decimal precision
// note, 2 points with same x,y is regarded as being equal
var x = Math.Round(this.X, ROUND) == Math.Round(other.X, ROUND);
var y = Math.Round(this.Y, ROUND) == Math.Round(other.Y, ROUND);
return x && y;
}
public XY(SerializationInfo info, StreamingContext ctxt)
{
this.X = (double)info.GetValue("X", typeof(double));
this.Y = (double)info.GetValue("Y", typeof(double));
this.Color = (string)info.GetValue("Color", typeof(string));
this.Size = (int)info.GetValue("Size", typeof(int));
}
public void GetObjectData(SerializationInfo info, StreamingContext ctxt)
{
info.AddValue("X", this.X);
info.AddValue("Y", this.Y);
info.AddValue("Color", this.Color);
info.AddValue("Size", this.Size);
}
}
public class Bucket
{
public string Id { get; private set; }
public List<XY> Points { get; private set; }
public XY Centroid { get; set; }
public int Idx { get; private set; }
public int Idy { get; private set; }
public double ErrorLevel { get; set; } // clusterpoint and points avg dist
private bool _IsUsed;
public bool IsUsed
{
get { return _IsUsed && Centroid != null; }
set { _IsUsed = value; }
}
public Bucket(string id)
{
IsUsed = true;
Centroid = null;
Points = new List<XY>();
Id = id;
}
public Bucket(int idx, int idy)
{
IsUsed = true;
Centroid = null;
Points = new List<XY>();
Idx = idx;
Idy = idy;
Id = GetId(idx, idy);
}
}
public abstract class BaseClusterAlgorithm
{
public List<XY> BaseDataset; // all points
//id, bucket
public readonly Dictionary<string, Bucket> BaseBucketsLookup =
new Dictionary<string, Bucket>();
public BaseClusterAlgorithm(){}
public BaseClusterAlgorithm(List<XY> dataset)
{
if (dataset == null || dataset.Count == 0)
throw new ApplicationException(
string.Format("dataset is null or empty"));
BaseDataset = dataset;
}
public abstract List<XY> GetCluster();
//O(k?? random fn can be slow, but is not slow because atm the k is always 1)
public static XY[] BaseGetRandomCentroids(List<XY> list, int k)
{
Profile("BaseGetRandomCentroids beg");
var set = new HashSet<XY>();
int i = 0;
var kcentroids = new XY[k];
int MAX = list.Count;
while (MAX >= k)
{
int index = Rand.Next(0, MAX - 1);
var xy = list[index];
if (set.Contains(xy))
continue;
set.Add(xy);
kcentroids[i++] = new XY(xy.X, xy.Y) { Color = GetRandomColor() };
if (i >= k)
break;
}
return kcentroids;
}
public List<XY> BaseGetClusterResult()
{
Profile("BaseGetClusterResult beg");
// collect used buckets and return the result
var clusterPoints = new List<XY>();
foreach (var item in BaseBucketsLookup)
{
var bucket = item.Value;
if (bucket.IsUsed)
{
bucket.Centroid.Size = bucket.Points.Count;
clusterPoints.Add(bucket.Centroid);
}
}
return clusterPoints;
}
public static XY BaseGetCentroidFromCluster(List<XY> list) //O(n)
{
Profile("BaseGetCentroidFromCluster beg");
int count = list.Count;
if (list == null || count == 0)
return null;
// color is set for the points and the cluster point here
var color = GetRandomColor(); //O(1)
XY centroid = new XY(0, 0) { Size = list.Count };//O(1)
foreach (XY p in list)
{
p.Color = color;
centroid.X += p.X;
centroid.Y += p.Y;
}
centroid.X /= count;
centroid.Y /= count;
var cp = new XY(centroid.X, centroid.Y) { Size = count, Color = color };
return cp;
}
//O(k*n)
public static void BaseSetCentroidForAllBuckets(IEnumerable<Bucket> buckets)
{
Profile("BaseSetCentroidForAllBuckets beg");
foreach (var item in buckets)
{
var bucketPoints = item.Points;
var cp = BaseGetCentroidFromCluster(bucketPoints);
item.Centroid = cp;
}
}
public double BaseGetTotalError()//O(k)
{
int centroidsUsed = BaseBucketsLookup.Values.Count(b => b.IsUsed);
double sum = BaseBucketsLookup.Values.
Where(b => b.IsUsed).Sum(b => b.ErrorLevel);
return sum / centroidsUsed;
}
public string BaseGetMaxError() //O(k)
{
double maxError = -double.MaxValue;
string id = string.Empty;
foreach (var b in BaseBucketsLookup.Values)
{
if (!b.IsUsed || b.ErrorLevel <= maxError)
continue;
maxError = b.ErrorLevel;
id = b.Id;
}
return id;
}
public XY BaseGetClosestPoint(XY from, List<XY> list) //O(n)
{
double min = double.MaxValue;
XY closests = null;
foreach (var p in list)
{
var d = MathTool.Distance(from, p);
if (d >= min)
continue;
// update
min = d;
closests = p;
}
return closests;
}
public XY BaseGetLongestPoint(XY from, List<XY> list) //O(n)
{
double max = -double.MaxValue;
XY longest = null;
foreach (var p in list)
{
var d = MathTool.Distance(from, p);
if (d <= max)
continue;
// update
max = d;
longest = p;
}
return longest;
}
// assign all points to nearest cluster
public void BaseUpdatePointsByCentroid()//O(n*k)
{
Profile("UpdatePointsByCentroid beg");
int count = BaseBucketsLookup.Count();
// clear points in the buckets, they will be re-inserted
foreach (var bucket in BaseBucketsLookup.Values)
bucket.Points.Clear();
foreach (XY p in BaseDataset)
{
double minDist = Double.MaxValue;
string index = string.Empty;
//for (int i = 0; i < count; i++)
foreach (var i in BaseBucketsLookup.Keys)
{
var bucket = BaseBucketsLookup[i];
if (bucket.IsUsed == false)
continue;
var centroid = bucket.Centroid;
var dist = MathTool.Distance(p, centroid);
if (dist < minDist)
{
// update
minDist = dist;
index = i;
}
}
//update color for point to match centroid and re-insert
var closestBucket = BaseBucketsLookup[index];
p.Color = closestBucket.Centroid.Color;
closestBucket.Points.Add(p);
}
}
// update centroid location to nearest point,
// e.g. if you want to show cluster point on a real existing point area
//O(n)
public void BaseUpdateCentroidToNearestContainingPoint(Bucket bucket)
{
Profile("BaseUpdateCentroidToNearestContainingPoint beg");
if (bucket == null || bucket.Centroid == null ||
bucket.Points == null || bucket.Points.Count == 0)
return;
var closest = BaseGetClosestPoint(bucket.Centroid, bucket.Points);
bucket.Centroid.X = closest.X;
bucket.Centroid.Y = closest.Y;
}
//O(k*n)
public void BaseUpdateAllCentroidsToNearestContainingPoint()
{
Profile("BaseUpdateAllCentroidsToNearestContainingPoint beg");
foreach (var bucket in BaseBucketsLookup.Values)
BaseUpdateCentroidToNearestContainingPoint(bucket);
}
}
public class DistanceClusterer : BaseClusterAlgorithm
{
public DistanceClusterer(List<XY> dataset)
: base(dataset)
{
}
public override List<XY> GetCluster()
{
var cluster = RunClusterAlgo();
return cluster;
}
// O(k*n)
List<XY> RunClusterAlgo()
{
// put points in buckets
int allPointsCount = BaseDataset.Count;
var firstPoint = BaseDataset[0];
firstPoint.Color = GetRandomColor();
var firstId = 0.ToString();
var firstBucket = new Bucket(firstId) { Centroid = firstPoint };
BaseBucketsLookup.Add(firstId, firstBucket);
for (int i = 1; i < allPointsCount; i++)
{
var set = new HashSet<string>(); //cluster candidate list
var p = BaseDataset[i];
// iterate clusters and collect candidates
foreach (var bucket in BaseBucketsLookup.Values)
{
var isInCluster = MathTool.DistWithin(p, bucket.Centroid, DISTANCECLUSTERER_SIZE);
if (!isInCluster)
continue;
set.Add(bucket.Id);
//use first, short dist will be calc at last step before returning data
break;
}
// if not within box area, then make new cluster
if (set.Count == 0)
{
var pid = i.ToString();
p.Color = GetRandomColor();
var newbucket = new Bucket(pid) { Centroid = p };
BaseBucketsLookup.Add(pid, newbucket);
}
}
//important, align all points to closest cluster point
BaseUpdatePointsByCentroid();
return BaseGetClusterResult();
}
}
public class MarkerClusterer : BaseClusterAlgorithm
{
public MarkerClusterer(List<XY> dataset) : base(dataset)
{
}
public override List<XY> GetCluster()
{
var cluster = RunClusterAlgo();
return cluster;
}
// O(k*n)
List<XY> RunClusterAlgo()
{
// put points in buckets
int allPointsCount = BaseDataset.Count;
var firstPoint = BaseDataset[0];
firstPoint.Color = GetRandomColor();
var firstId = 0.ToString();
var firstBucket = new Bucket(firstId) { Centroid = firstPoint };
BaseBucketsLookup.Add(firstId, firstBucket);
for (int i = 1; i < allPointsCount; i++)
{
var set = new HashSet<string>(); //cluster candidate list
var p = BaseDataset[i];
// iterate clusters and collect candidates
foreach (var bucket in BaseBucketsLookup.Values)
{
var isInCluster = MathTool.BoxWithin(p, bucket.Centroid, MARKERCLUSTERER_SIZE);
if (!isInCluster)
continue;
set.Add(bucket.Id);
//use first, short dist will be calc at last step before returning data
break;
}
// if not within box area, then make new cluster
if (set.Count == 0)
{
var pid = i.ToString();
p.Color = GetRandomColor();
var newbucket = new Bucket(pid) { Centroid = p };
BaseBucketsLookup.Add(pid, newbucket);
}
}
//important, align all points to closest cluster point
BaseUpdatePointsByCentroid();
return BaseGetClusterResult();
}
}
// O(exponential) ~ can be slow when n or k is big
public class KMeans : BaseClusterAlgorithm
{
private readonly int _InitClusterSize; // start from this cluster points
// Rule of thumb k = sqrt(n/2)
// cluster point optimization iterations
private const int _MaxIterations = 100;
private const int _MaxClusters = 100;
public KMeans(List<XY> dataset) : base(dataset)
{
_InitClusterSize = 1;
}
public override List<XY> GetCluster()
{
var cluster = RunClusterAlgo();
return cluster;
}
List<XY> RunClusterAlgo()
{
/*
ITERATIVE LINEAR ADDING CLUSTER UNTIL
REPEATE iteration of clusters convergence
until the max error is small enough
if not, insert a new cluster at worst place
(farthest in region of worst cluster area) and run again
keeping current cluster points
// one iteration of clusters convergence is defined as ..
1) Random k centroids
2) Cluster data by euclidean distance to centroids
3) Update centroids by clustered data,
4) Update cluster
5) Continue last two steps until error is small, error is sum of diff
between current and updated centroid
*/
RunAlgo();
if (DoUpdateAllCentroidsToNearestContainingPoint)
BaseUpdateAllCentroidsToNearestContainingPoint();
return BaseGetClusterResult();
}
void RunAlgo()
{
// Init clusters
var centroids = BaseGetRandomCentroids(BaseDataset, _InitClusterSize);
for (int i = 0; i < centroids.Length; i++)
{
var pid = i.ToString();
var newbucket = new Bucket(pid) { Centroid = centroids[i] };
BaseBucketsLookup.Add(pid, newbucket);
}
//
double currentMaxError = double.MaxValue;
while (currentMaxError > MAX_ERROR && BaseBucketsLookup.Count < _MaxClusters)
{
RunIterationsUntilKClusterPlacementAreDone();
var id = BaseGetMaxError();
var bucket = BaseBucketsLookup[id];
currentMaxError = bucket.ErrorLevel; //update
if (currentMaxError > MAX_ERROR)
{
// Here it is linear speed when putting one new centroid at a time
// should be semi-fast because the new point is inserted at best area
//from current centroids view.
// possible improvement exists by log2 search by inserting multiple centroids and
// reducing centroid again if needed
// put new centroid in area where maxError but farthest away from current centroid in area
var longest = BaseGetLongestPoint(bucket.Centroid, bucket.Points);
var newcentroid = new XY(longest);
var newid = BaseBucketsLookup.Count.ToString();
var newbucket = new Bucket(newid) { Centroid = newcentroid };
BaseBucketsLookup.Add(newid, newbucket);
}
}
}
void RunIterationsUntilKClusterPlacementAreDone()
{
Profile("RunIterationsUntilKClusterPlacementAreDone beg");
double prevError = Double.MaxValue;
double currError = Double.MaxValue;
for (int i = 0; i < _MaxIterations; i++)
{
prevError = currError;
currError = RunOneIteration();
if (currError >= prevError) // no improvement
break;
}
}
double RunOneIteration() //O(k*n)
{
Profile("RunOneIteration beg");
// update points, assign points to cluster
BaseUpdatePointsByCentroid();
// update centroid pos by its points
BaseSetCentroidForAllBuckets(BaseBucketsLookup.Values);//O(k*n)
var clustersCount = BaseBucketsLookup.Count;
for (int i = 0; i < clustersCount; i++)
{
var currentBucket = BaseBucketsLookup[i.ToString()];
if (currentBucket.IsUsed == false)
continue;
//update centroid
var newcontroid = BaseGetCentroidFromCluster(currentBucket.Points);
//no need to update color, autoset
currentBucket.Centroid = newcontroid;
currentBucket.ErrorLevel = 0;
//update error
foreach (var p in currentBucket.Points)
{
var dist = MathTool.Distance(newcontroid, p);
currentBucket.ErrorLevel += dist;
}
var val = currentBucket.ErrorLevel / currentBucket.Points.Count;
currentBucket.ErrorLevel = val; //Math.Sqrt(val);
}
return BaseGetTotalError();
}
}
public class GridCluster : BaseClusterAlgorithm
{
public GridCluster(List<XY> dataset) : base(dataset)
{
}
public override List<XY> GetCluster()
{
var cluster = RunClusterAlgo(GridX, GridY);
return cluster;
}
// O(k*n)
void MergeClustersGrid()
{
foreach (var key in BaseBucketsLookup.Keys)
{
var bucket = BaseBucketsLookup[key];
if (bucket.IsUsed == false)
continue;
var x = bucket.Idx;
var y = bucket.Idy;
// get keys for neighbors
var N = GetId(x, y + 1);
var NE = GetId(x + 1, y + 1);
var E = GetId(x + 1, y);
var SE = GetId(x + 1, y - 1);
var S = GetId(x, y - 1);
var SW = GetId(x - 1, y - 1);
var W = GetId(x - 1, y);
var NW = GetId(x - 1, y - 1);
var neighbors = new[] { N, NE, E, SE, S, SW, W, NW };
MergeClustersGridHelper(key, neighbors);
}
}
void MergeClustersGridHelper(string currentKey, string[] neighborKeys)
{
double minDistX = DeltaX / 2.0;//heuristic, higher value gives less merging
double minDistY = DeltaY / 2.0;
//if clusters in grid are too close to each other, merge them
double minDist = MathTool.Max(minDistX, minDistY);
foreach (var neighborKey in neighborKeys)
{
if (!BaseBucketsLookup.ContainsKey(neighborKey))
continue;
var neighbor = BaseBucketsLookup[neighborKey];
if (neighbor.IsUsed == false)
continue;
var current = BaseBucketsLookup[currentKey];
var dist = MathTool.Distance(current.Centroid, neighbor.Centroid);
if (dist > minDist)
continue;
// merge
var color = current.Centroid.Color;
foreach (var p in neighbor.Points)
{
//update color
p.Color = color;
}
current.Points.AddRange(neighbor.Points);//O(n)
// recalc centroid
var cp = BaseGetCentroidFromCluster(current.Points);
current.Centroid = cp;
neighbor.IsUsed = false; //merged, then not used anymore
neighbor.Points.Clear(); //clear mem
}
}
public List<XY> RunClusterAlgo(int gridx, int gridy)
{
// params invalid
if (gridx <= 0 || gridy <= 0)
throw new ApplicationException("GridCluster.RunClusterAlgo gridx or gridy is <= 0");
// put points in buckets
foreach (var p in BaseDataset)
{
// find relative val
var relativeX = p.X - MinX;
var relativeY = p.Y - MinY;
int idx = (int)(relativeX / DeltaX);
int idy = (int)(relativeY / DeltaY);
// bucket id
string id = GetId(idx, idy);
// bucket exists, add point
if (BaseBucketsLookup.ContainsKey(id))
BaseBucketsLookup[id].Points.Add(p);
// new bucket, create and add point
else
{
Bucket bucket = new Bucket(idx, idy);
bucket.Points.Add(p);
BaseBucketsLookup.Add(id, bucket);
}
}
// calc centrod for all buckets
BaseSetCentroidForAllBuckets(BaseBucketsLookup.Values);
// merge if gridpoint is to close
if (DoMergeGridIfCentroidsAreCloseToEachOther)
MergeClustersGrid();
if (DoUpdateAllCentroidsToNearestContainingPoint)
{
BaseUpdateAllCentroidsToNearestContainingPoint();
}
// check again
// merge if gridpoint is to close
if (DoMergeGridIfCentroidsAreCloseToEachOther
&& DoUpdateAllCentroidsToNearestContainingPoint)
{
MergeClustersGrid();
// and again set centroid to closest point in bucket
BaseUpdateAllCentroidsToNearestContainingPoint();
}
return BaseGetClusterResult();
}
}
public class MathTool
{
private const double Exp = 2; // 2=euclid, 1=manhatten
//Minkowski dist
public static double Distance(XY a, XY b)
{
return Math.Pow(Math.Pow(Math.Abs(a.X - b.X), Exp) +
Math.Pow(Math.Abs(a.Y - b.Y), Exp), 1.0 / Exp);
}
public static double Min(double a, double b)
{
return a <= b ? a : b;
}
public static double Max(double a, double b)
{
return a >= b ? a : b;
}
public static bool DistWithin(XY a, XY b, double d)
{
var dist = Distance(a, b);
return dist < d;
}
public static bool BoxWithin(XY a, XY b, double boxsize)
{
var d = boxsize / 2;
var withinX = a.X - d <= b.X && a.X + d >= b.X;
var withinY = a.Y - d <= b.Y && a.Y + d >= b.Y;
return withinX && withinY;
}
}
public static class FileUtil
{
public static bool WriteFile(string data, FileInfo fileInfo)
{
bool isSuccess = false;
try
{
using (StreamWriter streamWriter =
File.CreateText(fileInfo.FullName))
{
streamWriter.Write(data);
isSuccess = true;
}
}
catch (Exception ex)
{
Console.WriteLine(ex.StackTrace + "\nPress a key ... ");
Console.ReadKey();
}
return isSuccess;
}
}
[Serializable()]
public class DatasetToSerialize : ISerializable
{
private const string Name = "Dataset";
public List<XY> Dataset { get; set; }
public DatasetToSerialize()
{
Dataset = new List<XY>();
}
public DatasetToSerialize(SerializationInfo info, StreamingContext ctxt)
{
this.Dataset = (List<XY>)info.GetValue(Name, typeof(List<XY>));
}
public void GetObjectData(SerializationInfo info, StreamingContext ctxt)
{
info.AddValue(Name, this.Dataset);
}
}
public class Serializer
{
public void SerializeObject(string filename, object objectToSerialize)
{
try
{
using (Stream stream = File.Open(filename, FileMode.Create))
{
BinaryFormatter bFormatter = new BinaryFormatter();
bFormatter.Serialize(stream, objectToSerialize);
}
}
catch (Exception ex)
{
Console.WriteLine(ex.StackTrace + "\nPress a key ... ");
Console.ReadKey();
}
}
public object DeSerializeObject(string filename)
{
try
{
using (Stream stream = File.Open(filename, FileMode.Open))
{
BinaryFormatter bFormatter = new BinaryFormatter();
var objectToSerialize = bFormatter.Deserialize(stream);
return objectToSerialize;
}
}
catch (Exception ex)
{
Console.WriteLine(ex.StackTrace + "\nPress a key ... ");
Console.ReadKey();
}
return null;
}
}
}
}
Put both the html and javascript file in the c:\temp folder.
The C# program creates a new draw.js file in the c:\temp folder.
Open the html file with a modern browser that support canvas, like Firefox or Chrome.
If you want to test with Internet Explorer on your local machine, you should download the excanvas javascript plugin
canvas.html
<html>
<head>
<title>Cluster Test</title>
<!-- IE canvas not supported fallback -->
<!--[if lt IE 10]>
<script src="excanvas.js"></script>
<![endif]-->
<script type="text/javascript" src="draw.js"></script>
<!-- default -->
<script type="text/javascript">
var G = {
yellow: "#FF0",
red: "#F00",
green: "#0F0",
blue: "#00F",
black: "#000"
}
function drawMark(cx, cy, color, ctx) {
//draw a circle
var radius = 4;
ctx.beginPath();
ctx.arc(cx, cy, radius, 0, Math.PI * 2, true);
ctx.closePath();
ctx.fillStyle = color;
ctx.fill();
//ctx.stroke();
}
function drawCluster(cx, cy, color, n, ctx) {
//draw a bigger circle
var radius = 12;
ctx.strokeStyle = color;
ctx.beginPath();
ctx.arc(cx, cy, radius, 0, Math.PI * 2, true);
ctx.closePath();
ctx.stroke();
ctx.fillStyle = color;
ctx.fillText("" + n, cx - 8, cy + 2);
}
function drawMarkerCluster(cx, cy, color, n, size, ctx) {
drawCluster(cx, cy, color, n, ctx);
// draw rect
var a = cx - (size / 2);
var b = cy - (size / 2);
var len = size;
ctx.lineWidth = 2;
ctx.strokeStyle = color;
ctx.strokeRect(a, b, len, len);
}
function drawDistanceCluster(cx, cy, color, n, size, ctx) {
drawCluster(cx, cy, color, n, ctx);
// draw radius circle
var radius = size;
ctx.lineWidth = 2;
ctx.strokeStyle = color;
ctx.beginPath();
ctx.arc(cx, cy, radius, 0, Math.PI * 2, true);
ctx.closePath();
ctx.stroke();
}
window.onload = function draw() {
var canvas = document.getElementById('canvas');
if (canvas != null && canvas != undefined && canvas.getContext) {
var ctx = canvas.getContext('2d');
ctx.strokeStyle = G.black;
ctx.font = "10pt Arial";
drawMarkers(ctx);
}
}
</script>
<style type="text/css">
body
{
margin-left: 10px;
margin-top: 10px;
}
canvas
{
border: 1px solid red;
}
</style>
</head>
<body>
<canvas id="canvas" width="610" height="430">
<p>Your browser doesn't support canvas. Try Firefox, Chrome or an another modern browser.</p>
</canvas>
</body>
</html>
The javascript file will be overwritten when you run the C# code.
draw.js
function drawMarkers(ctx) {
drawMark(156.434, 157.921, 'rgb(182,114,144)', ctx);
drawMark(156.788, 139.939, 'rgb(182,114,144)', ctx);
drawMark(150.712, 67.882, 'rgb(67,235,136)', ctx);
drawMark(445.846, 165.556, 'rgb(153,85,202)', ctx);
drawMark(382.982, 179.917, 'rgb(153,85,202)', ctx);
drawMark(466.365, 172.098, 'rgb(153,85,202)', ctx);
drawMark(302.81, 243.478, 'rgb(48,234,127)', ctx);
drawMark(235.35, 265.31, 'rgb(48,234,127)', ctx);
drawMark(262.092, 258.321, 'rgb(48,234,127)', ctx);
drawMarkerCluster(156.434, 157.921, 'rgb(182,114,144)', 2, 160, ctx);
drawMarkerCluster(150.712, 67.882, 'rgb(67,235,136)', 1, 160, ctx);
drawMarkerCluster(445.846, 165.556, 'rgb(153,85,202)', 3, 160, ctx);
drawMarkerCluster(302.81, 243.478, 'rgb(48,234,127)', 3, 160, ctx);
ctx.fillStyle = 'rgb(0,0,0)';
ctx.fillText('Clusters = ' + 4, 20, 320);
}