Software Programming

Kunuk Nykjaer

MarkerClusterer with C# example and html-canvas (part 3)

This is part 3 of my topic on point clustering.
I will look at the marker clustering algorithm.

This is a follow up of my previous post.
https://kunuk.wordpress.com/2011/09/17/clustering-k-means-and-grid-with-c-example-and-html-canvas-part-2/

The MarkerClusterer algorithm is simple: it iterates over the points and puts them in boxes.
The first point is put in a box. Each following point is only put in an existing box if it is within a specified distance of that box; otherwise a new box is created and the point is put in it.
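The box-assignment loop can be sketched in a few lines of JavaScript (an illustrative toy version with made-up names, not the C# implementation further below):

```javascript
// Minimal marker-clusterer sketch: a point joins the first existing
// box whose center is within boxSize/2 of it on both axes; otherwise
// it seeds a new box at its own position.
function markerClusterer(points, boxSize) {
  const boxes = []; // each box: { cx, cy, points }
  for (const p of points) {
    let box = boxes.find(b =>
      Math.abs(p.x - b.cx) <= boxSize / 2 &&
      Math.abs(p.y - b.cy) <= boxSize / 2);
    if (!box) {
      box = { cx: p.x, cy: p.y, points: [] };
      boxes.push(box);
    }
    box.points.push(p);
  }
  return boxes;
}
```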

The algorithm is also explained here:

My implementation's running time is O(k*n), where k is the number of boxes and n is the number of points.
(Please leave a comment if I am wrong.)
You can manipulate the running time by setting the box size: a bigger box makes k smaller and the running time faster.
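To illustrate that tradeoff (a toy seed counter, not the real implementation): with the same points, a larger box absorbs more of them, so fewer boxes are created and k drops.

```javascript
// Count how many cluster seeds ("boxes") a given boxSize produces:
// a point outside every existing box becomes a new seed.
function countClusters(points, boxSize) {
  const centers = [];
  for (const p of points) {
    const hit = centers.some(c =>
      Math.abs(p.x - c.x) <= boxSize / 2 &&
      Math.abs(p.y - c.y) <= boxSize / 2);
    if (!hit) centers.push({ x: p.x, y: p.y });
  }
  return centers.length;
}

const pts = [{ x: 0, y: 0 }, { x: 60, y: 0 }, { x: 120, y: 0 }];
console.log(countClusters(pts, 100)); // small boxes: k = 3
console.log(countClusters(pts, 300)); // one big box:  k = 1
```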

Imagine you have a set of points like this,
and you are interested in clustering them to avoid overlapping points and to reduce the visual overload.

Here’s an example of how marker clustering works.
Every point has been put in a box by running the algorithm defined above; the box size is user-defined.
Two cluster points can never be closer than boxsize / 2, because a new cluster is only created for a point that falls outside every existing box.
Thus we don’t need to consider merging clusters that would otherwise be too close to each other and overlap.

Here the box areas are removed for a cleaner look.
Notice that not all points are inside their nearest box. This happens regularly, but there is a simple fix, explained later, if needed.

And the points are removed for an even cleaner look. Now there are only cluster points, with numbers telling how many points each of them contains.

As the last step in the algorithm, all the points are iterated and compared to the existing cluster points, and each is assigned to the nearest cluster. This takes O(k*n). The cluster points are located at the centers of the boxes.
Here you can see an example of how it looks, although this image is for another dataset.
Notice at the top-left corner there is one red point inside the green box, i.e. not inside its ‘correct’ box. This is not an error: the nearest cluster point for the red point is the red one, because the distances are calculated by comparing the centers of the boxes to the points. If needed this can also be ‘fixed’, but that is usually not necessary.
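The final assignment step can be sketched like this (illustrative names; Euclidean distance, as in the C# MathTool class below):

```javascript
// Assign each point the index of its nearest cluster center, O(k*n).
function assignToNearest(points, centers) {
  return points.map(p => {
    let best = 0, bestDist = Infinity;
    centers.forEach((c, i) => {
      const d = Math.hypot(p.x - c.x, p.y - c.y);
      if (d < bestDist) { bestDist = d; best = i; }
    });
    return best;
  });
}
```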

Alternatively, I could have used a distance-based approach, which uses circles instead of boxes. This is explained in the Google Code reference linked at the top, and I will skip the description of the algorithm because it is very similar to the marker clusterer. I have included the source code for the distance clustering at the bottom. Here’s an image of how that clustering would look.
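For comparison, a sketch of the distance-based variant, where the membership test is a circle instead of a box (again an illustrative toy; the full C# version is at the bottom):

```javascript
// Distance-clusterer sketch: a point joins an existing cluster when it
// lies within `radius` of the cluster seed, otherwise it seeds a new one.
function distanceClusterer(points, radius) {
  const clusters = [];
  for (const p of points) {
    let c = clusters.find(c =>
      Math.hypot(p.x - c.cx, p.y - c.cy) < radius);
    if (!c) {
      c = { cx: p.x, cy: p.y, points: [] };
      clusters.push(c);
    }
    c.points.push(p);
  }
  return clusters;
}
```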

Here are some performance tests for the marker clusterer algorithm, executed on my laptop:
about 0.15 sec for 10,000 points;
about 7.5 sec for 1,000,000 points with k = 4.

With a bigger box area that can contain all the points, i.e. k = 1, the performance is about 4.8 sec for 1,000,000 points and 0.4 sec for 100,000 points.

Summary
For the task of clustering overlapping points, comparing the 4 algorithms (grid, k-means, marker and distance), I would usually prefer grid, marker or distance clustering: they have a fast O(k*n) running time, and you can easily manipulate that running time by adjusting the k-value through the grid size, box size or distance size. K-means can have a poor running time depending on your dataset and max_error level, and it will run slower than the other 3 algorithms for large n.

This is my bundle file that contains the 4 clustering algorithms (grid, k-means, marker, distance) in one package.
The other two clustering algorithms (grid and k-means) are explained in my previous post.
Out of the box, the C# file runs marker clustering with results similar to the images above. It generates a draw.js
file with Canvas drawing information used by the canvas.html file.
All the classes are put in one file for simple distribution. Refactor as you like.

AlgorithmClusterBundle.cs

```
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;
using System.Text;

namespace ClusterBundle
{
// By Kunuk Nykjaer
// Clustering package, Grid clustering, MarkerClusterer and auto increasing K-means clustering
// TODO: split each class into its own file and refactor the global consts away
// quick source-code bundle version
// Version 0.4
//
// for visualization this generates a draw.js file which is used by the canvas.html file
// the canvas.html is distributed elsewhere, e.g. at my blog at https://kunuk.wordpress.com
public class AlgorithmClusterBundle
{
public static DateTime Starttime;
static void Main(string[] args)
{
Starttime = DateTime.Now;
var algo = new AlgorithmClusterBundle();

// run clustertype
algo.Run(clusterType);
algo.GenerateJavascriptDrawFile(clusterType); //create draw.js

var timespend = DateTime.Now.Subtract(Starttime).TotalSeconds;
Console.WriteLine(timespend + " sec. press a key ...");
Console.ReadKey();
}

//------------------------------
// USER CONFIG, CUSTOMIZE BELOW
public const ClusterType clusterType = ClusterType.MarkerClusterer;
public static readonly int N = 100; //number of points

// use centroid or semi-centroid cluster point placement visualization?
public const bool DoUpdateAllCentroidsToNearestContainingPoint = false;
public const bool UseProfiling = false; //debug, output time spend

// window frame boundary, aka viewport
public const double MinX = 10;
public const double MinY = 10;
public const double MaxX = 400;
public const double MaxY = 300;
// where the files are saved or loaded
public const string FolderPath = @"c:\temp\";

// K-MEANS config
// heuristic, set tolerance for cluster density, has effect on running time.
// Set high for many points in dataset, can be lower for fewer points
public const double MAX_ERROR = 50;

// MARKERCLUSTERER config
// box area i.e. cluster size
public const int MARKERCLUSTERER_SIZE = 100;

// DISTANCECLUSTERER config
// radius size i.e. cluster size
public const int DISTANCECLUSTERER_SIZE = 62;

// GRIDCLUSTER config
// grid size
public const int GridX = 7;
public const int GridY = 6;
public const bool DoMergeGridIfCentroidsAreCloseToEachOther = true;
// USER CONFIG, CUSTOMIZE ABOVE
//------------------------------

public enum ClusterType { Unknown = -1, Grid, KMeans, MarkerClusterer, DistanceClusterer } ;
static readonly CultureInfo Culture_enUS = new CultureInfo("en-US");
const string S = "G";

readonly List<XY> _points = new List<XY>();
List<XY> _clusters = new List<XY>();

public static readonly Random Rand = new Random();

const double AbsSizeX = MaxX - MinX;
const double AbsSizeY = MaxY - MinY;

// for bucket placement calc, grid cluster algo
const double DeltaX = AbsSizeX / GridX;
const double DeltaY = AbsSizeY / GridY;

static private readonly string NL = Environment.NewLine;
private const string ctx = "ctx"; // javascript canvas
public bool DisplayGridInCanvas = false; //autoset by used cluster type

public AlgorithmClusterBundle()
{
CreateTestDataSet(N); // create points in memory
//SaveDataSetToFile(_points); // save test points to file
}

// dictionary lookup key used by grid cluster algo
public static string GetId(int idx, int idy) //O(1)
{
return idx + ";" + idy;
}

// save points from mem to file
private const string DatasetSerializeName = "dataset.ser";
static void SaveDataSetToFile(List<XY> dataset)
{
var objectToSerialize = new DatasetToSerialize { Dataset = dataset };
new Serializer().SerializeObject(FolderPath + DatasetSerializeName, objectToSerialize);
}
// load points from file to mem
static List<XY> LoadDataSetFromFile()
{
var objectToSerialize = (DatasetToSerialize)
(new Serializer().DeSerializeObject(FolderPath + DatasetSerializeName));
return objectToSerialize.Dataset;
}

void CreateTestDataSet(int n) //O(n)
{
// CREATE DATA SET

// Create random scattered points
//for (int i = 0; i < n; i++)
//{
//    var x = MinX + Rand.NextDouble() * (AbsSizeX);
//    var y = MinY + Rand.NextDouble() * (AbsSizeY);
//    _points.Add(new XY(x, y));
//}

// Create random region of clusters
for (int i = 0; i < n / 3; i++)
{
var x = MinX + 50 + Rand.NextDouble() * 100;
var y = MinY + 50 + Rand.NextDouble() * 100;
_points.Add(new XY(x, y));
}

for (int i = 0; i < n / 3; i++)
{
var x = MinX + 260 + Rand.NextDouble() * 100;
var y = MinY + 110 + Rand.NextDouble() * 100;
_points.Add(new XY(x, y));
}

for (int i = 0; i < n / 3; i++)
{
var x = MinX + 100 + Rand.NextDouble() * 100;
var y = MinY + 200 + Rand.NextDouble() * 100;
_points.Add(new XY(x, y));
}
}

public void Run(ClusterType clustertype)
{
switch (clustertype)
{
case ClusterType.Grid:
_clusters = new GridCluster(_points).GetCluster();
DisplayGridInCanvas = true;
break;
case ClusterType.KMeans:
_clusters = new KMeans(_points).GetCluster();
break;
case ClusterType.MarkerClusterer:
_clusters = new MarkerClusterer(_points).GetCluster();
break;
case ClusterType.DistanceClusterer:
_clusters = new DistanceClusterer(_points).GetCluster();
break;
case ClusterType.Unknown:
break;
}
}

public static string GetRandomColor()
{
int r = Rand.Next(10, 250);
int g = Rand.Next(10, 250);
int b = Rand.Next(10, 250);
return string.Format("'rgb({0},{1},{2})'", r, g, b);
}

static void CreateFile(string s)
{
var path = new FileInfo(FolderPath + "draw.js");
var isCreated = FileUtil.WriteFile(s, path);
Console.WriteLine(isCreated + " = File is Created");
}
public void GenerateJavascriptDrawFile(ClusterType clusterType)
{
var sb = new StringBuilder();

// markers
var head = "function drawMarkers(ctx) {" + NL;
var tail = NL + "}" + NL;
sb.Append(head); // open the drawMarkers function

// grid
if (DisplayGridInCanvas)
{
sb.Append("ctx.beginPath();" + NL);
for (int i = 0; i < GridX + 1; i++)
{
sb.Append(String.Format("ctx.moveTo({0}, {1});{2}",
ToStringEN(MinX + i * DeltaX),
ToStringEN(MinY), NL));
sb.Append(String.Format("ctx.lineTo({0}, {1});{2}",
ToStringEN(MinX + i * DeltaX),
ToStringEN(MaxY), NL));
}
for (int j = 0; j < GridY + 1; j++)
{
sb.Append(String.Format("ctx.moveTo({0}, {1});{2}",
ToStringEN(MinX),
ToStringEN(MinY + j * DeltaY), NL));
sb.Append(String.Format("ctx.lineTo({0}, {1});{2}",
ToStringEN(MaxX),
ToStringEN(MinY + j * DeltaY), NL));
}
sb.Append("ctx.stroke(); " + NL);
}

// if too many points, the canvas cannot be drawn or is slow,
// so use max points and clusters for drawing
const int max = 10000;

// markers
if (_points != null)
{
sb.Append(NL);
for (int i = 0; i < _points.Count; i++)
{
var p = _points[i];
sb.Append(string.Format("drawMark({0}, {1}, {2}, {3});{4}",
ToStringEN(p.X), ToStringEN(p.Y), p.Color, ctx, NL));
if (i > max)
break;
}
}
string clusterInfo = "0";
if (_clusters != null)
{
sb.Append(NL);

if (clusterType == ClusterType.Grid || clusterType == ClusterType.KMeans)
{
for (int i = 0; i < _clusters.Count; i++)
{
var c = _clusters[i];
sb.Append(string.Format("drawCluster({0}, {1}, {2}, {3}, {4});{5}",
ToStringEN(c.X), ToStringEN(c.Y), c.Color,
c.Size, ctx, NL));

if (i > max)
break;
}
}

else if (clusterType == ClusterType.MarkerClusterer)
{
for (int i = 0; i < _clusters.Count; i++)
{
var c = _clusters[i];
sb.Append(string.Format(
"drawMarkerCluster({0}, {1}, {2}, {3}, {4}, {5});{6}",
ToStringEN(c.X), ToStringEN(c.Y), c.Color,
c.Size, MARKERCLUSTERER_SIZE, ctx, NL));

if (i > max)
break;
}
}
else if (clusterType == ClusterType.DistanceClusterer)
{
for (int i = 0; i < _clusters.Count; i++)
{
var c = _clusters[i];
sb.Append(string.Format(
"drawDistanceCluster({0}, {1}, {2}, {3}, {4}, {5});{6}",
ToStringEN(c.X), ToStringEN(c.Y), c.Color,
c.Size, DISTANCECLUSTERER_SIZE, ctx, NL));

if (i > max)
break;
}
}

clusterInfo = _clusters.Count + string.Empty;
}

// bottom text
sb.Append("ctx.fillStyle = 'rgb(0,0,0)';" + NL);
sb.Append(string.Format("ctx.fillText('Clusters = ' + {0}, {1}, {2});{3}",
clusterInfo, ToStringEN(MinX + 10), ToStringEN(MaxY + 20), NL));

sb.Append(tail);
CreateFile(sb.ToString());
//Console.WriteLine(sb.ToString());
}
public static string ToStringEN(double d)
{
double rounded = Math.Round(d, 3);
return rounded.ToString(S, Culture_enUS);
}
public static void Profile(string s)//O(1)
{
if (!UseProfiling)
return;
var timespend = DateTime.Now.Subtract(Starttime).TotalSeconds;
Console.WriteLine(timespend + " sec. " + s);
}

[Serializable()]
public class XY : IComparable, ISerializable
{
public double X { get; set; }
public double Y { get; set; }
public string Color { get; set; }
public int Size { get; set; }

//public string XToString(){return X.ToString(S, Culture_enUS);}
//public string YToString(){return Y.ToString(S, Culture_enUS);}

public XY() { }
public XY(double x, double y)
{
X = x;
Y = y;
Color = "'rgb(0,0,0)'";//default
}
public XY(XY p) //clone
{
this.X = p.X;
this.Y = p.Y;
this.Color = p.Color;
this.Size = p.Size;
}

public int CompareTo(object o) // if used in sorted list
{
if (this.Equals(o))
return 0;

var other = (XY)o;
if (this.X > other.X)
return -1;
if (this.X < other.X)
return 1;

return 0;
}

// used by k-means random distinct selection of cluster point
public override int GetHashCode()
{
var x = X * 10000; //make the decimals be important
var y = Y * 10000;
var r = x * 17 + y * 37;
return (int)r;
}
private const int ROUND = 6;
public override bool Equals(Object o)
{
if (o == null)
return false;
var other = o as XY;
if (other == null)
return false;

// rounding could be skipped
// depends on granularity of wanted decimal precision
// note, 2 points with same x,y is regarded as being equal
var x = Math.Round(this.X, ROUND) == Math.Round(other.X, ROUND);
var y = Math.Round(this.Y, ROUND) == Math.Round(other.Y, ROUND);
return x && y;
}

public XY(SerializationInfo info, StreamingContext ctxt)
{
this.X = (double)info.GetValue("X", typeof(double));
this.Y = (double)info.GetValue("Y", typeof(double));
this.Color = (string)info.GetValue("Color", typeof(string));
this.Size = (int)info.GetValue("Size", typeof(int));
}

public void GetObjectData(SerializationInfo info, StreamingContext ctxt)
{
info.AddValue("X", X);
info.AddValue("Y", Y);
info.AddValue("Color", Color);
info.AddValue("Size", Size);
}
}
public class Bucket
{
public string Id { get; private set; }
public List<XY> Points { get; private set; }
public XY Centroid { get; set; }
public int Idx { get; private set; }
public int Idy { get; private set; }
public double ErrorLevel { get; set; } // clusterpoint and points avg dist
private bool _IsUsed;
public bool IsUsed
{
get { return _IsUsed && Centroid != null; }
set { _IsUsed = value; }
}
public Bucket(string id)
{
IsUsed = true;
Centroid = null;
Points = new List<XY>();
Id = id;
}
public Bucket(int idx, int idy)
{
IsUsed = true;
Centroid = null;
Points = new List<XY>();
Idx = idx;
Idy = idy;
Id = GetId(idx, idy);
}
}

public abstract class BaseClusterAlgorithm
{
public List<XY> BaseDataset; // all points
//id, bucket
public readonly Dictionary<string, Bucket> BaseBucketsLookup =
new Dictionary<string, Bucket>();

public BaseClusterAlgorithm(){}
public BaseClusterAlgorithm(List<XY> dataset)
{
if (dataset == null || dataset.Count == 0)
throw new ApplicationException(
string.Format("dataset is null or empty"));

BaseDataset = dataset;
}

public abstract List<XY> GetCluster();
//O(k?? random fn can be slow, but is not slow because atm the k is always 1)
public static XY[] BaseGetRandomCentroids(List<XY> list, int k)
{
Profile("BaseGetRandomCentroids beg");
var set = new HashSet<XY>();
int i = 0;
var kcentroids = new XY[k];

int MAX = list.Count;
while (MAX >= k)
{
int index = Rand.Next(0, MAX - 1);
var xy = list[index];
if (set.Contains(xy))
continue;

set.Add(xy); // remember picked point so centroids stay distinct
kcentroids[i++] = new XY(xy.X, xy.Y) { Color = GetRandomColor() };

if (i >= k)
break;
}
return kcentroids;
}

public List<XY> BaseGetClusterResult()
{
Profile("BaseGetClusterResult beg");

// collect used buckets and return the result
var clusterPoints = new List<XY>();
foreach (var item in BaseBucketsLookup)
{
var bucket = item.Value;
if (bucket.IsUsed)
{
bucket.Centroid.Size = bucket.Points.Count;
clusterPoints.Add(bucket.Centroid); // collect the used centroid
}
}

return clusterPoints;
}
public static XY BaseGetCentroidFromCluster(List<XY> list) //O(n)
{
Profile("BaseGetCentroidFromCluster beg");
if (list == null || list.Count == 0)
return null;
int count = list.Count;

// color is set for the points and the cluster point here
var color = GetRandomColor(); //O(1)
XY centroid = new XY(0, 0) { Size = list.Count };//O(1)
foreach (XY p in list)
{
p.Color = color;
centroid.X += p.X;
centroid.Y += p.Y;
}
centroid.X /= count;
centroid.Y /= count;
var cp = new XY(centroid.X, centroid.Y) { Size = count, Color = color };

return cp;
}
//O(k*n)
public static void BaseSetCentroidForAllBuckets(IEnumerable<Bucket> buckets)
{
Profile("BaseSetCentroidForAllBuckets beg");
foreach (var item in buckets)
{
var bucketPoints = item.Points;
var cp = BaseGetCentroidFromCluster(bucketPoints);
item.Centroid = cp;
}
}
public double BaseGetTotalError()//O(k)
{
int centroidsUsed = BaseBucketsLookup.Values.Count(b => b.IsUsed);
double sum = BaseBucketsLookup.Values.
Where(b => b.IsUsed).Sum(b => b.ErrorLevel);
return sum / centroidsUsed;
}
public string BaseGetMaxError() //O(k)
{
double maxError = -double.MaxValue;
string id = string.Empty;
foreach (var b in BaseBucketsLookup.Values)
{
if (!b.IsUsed || b.ErrorLevel <= maxError)
continue;

maxError = b.ErrorLevel;
id = b.Id;
}
return id;
}
public XY BaseGetClosestPoint(XY from, List<XY> list) //O(n)
{
double min = double.MaxValue;
XY closest = null;
foreach (var p in list)
{
var d = MathTool.Distance(from, p);
if (d >= min)
continue;

// update
min = d;
closest = p;
}
return closest;
}
public XY BaseGetLongestPoint(XY from, List<XY> list) //O(n)
{
double max = -double.MaxValue;
XY longest = null;
foreach (var p in list)
{
var d = MathTool.Distance(from, p);
if (d <= max)
continue;

// update
max = d;
longest = p;
}
return longest;
}
// assign all points to nearest cluster
public void BaseUpdatePointsByCentroid()//O(n*k)
{
Profile("UpdatePointsByCentroid beg");
int count = BaseBucketsLookup.Count();

// clear points in the buckets, they will be re-inserted
foreach (var bucket in BaseBucketsLookup.Values)
bucket.Points.Clear();

foreach (XY p in BaseDataset)
{
double minDist = Double.MaxValue;
string index = string.Empty;
//for (int i = 0; i < count; i++)
foreach (var i in BaseBucketsLookup.Keys)
{
var bucket = BaseBucketsLookup[i];
if (bucket.IsUsed == false)
continue;

var centroid = bucket.Centroid;
var dist = MathTool.Distance(p, centroid);
if (dist < minDist)
{
// update
minDist = dist;
index = i;
}
}
//update color for point to match centroid and re-insert
var closestBucket = BaseBucketsLookup[index];
p.Color = closestBucket.Centroid.Color;
closestBucket.Points.Add(p);
}
}

// update centroid location to nearest point,
// e.g. if you want to show cluster point on a real existing point area
//O(n)
public void BaseUpdateCentroidToNearestContainingPoint(Bucket bucket)
{
Profile("BaseUpdateCentroidToNearestContainingPoint beg");
if (bucket == null || bucket.Centroid == null ||
bucket.Points == null || bucket.Points.Count == 0)
return;

var closest = BaseGetClosestPoint(bucket.Centroid, bucket.Points);
bucket.Centroid.X = closest.X;
bucket.Centroid.Y = closest.Y;
}
//O(k*n)
public void BaseUpdateAllCentroidsToNearestContainingPoint()
{
Profile("BaseUpdateAllCentroidsToNearestContainingPoint beg");
foreach (var bucket in BaseBucketsLookup.Values)
BaseUpdateCentroidToNearestContainingPoint(bucket);
}
}

public class DistanceClusterer : BaseClusterAlgorithm
{
public DistanceClusterer(List<XY> dataset)
: base(dataset)
{
}

public override List<XY> GetCluster()
{
var cluster = RunClusterAlgo();
return cluster;
}

// O(k*n)
List<XY> RunClusterAlgo()
{
// put points in buckets
int allPointsCount = BaseDataset.Count;
var firstPoint = BaseDataset[0];
firstPoint.Color = GetRandomColor();
var firstId = 0.ToString();
var firstBucket = new Bucket(firstId) { Centroid = firstPoint };
BaseBucketsLookup.Add(firstId, firstBucket);

for (int i = 1; i < allPointsCount; i++)
{
var set = new HashSet<string>(); //cluster candidate list
var p = BaseDataset[i];
// iterate clusters and collect candidates
foreach (var bucket in BaseBucketsLookup.Values)
{
var isInCluster = MathTool.DistWithin(p, bucket.Centroid, DISTANCECLUSTERER_SIZE);
if (!isInCluster)
continue;

//use first candidate; exact nearest is calc at last step before returning data
set.Add(bucket.Id);
break;
}

// if not within any existing cluster radius, then make a new cluster
if (set.Count == 0)
{
var pid = i.ToString();
p.Color = GetRandomColor();
var newbucket = new Bucket(pid) { Centroid = p };
BaseBucketsLookup.Add(pid, newbucket);
}
}

//important, align all points to closest cluster point
BaseUpdatePointsByCentroid();

return BaseGetClusterResult();
}

}

public class MarkerClusterer : BaseClusterAlgorithm
{
public MarkerClusterer(List<XY> dataset) : base(dataset)
{
}

public override List<XY> GetCluster()
{
var cluster = RunClusterAlgo();
return cluster;
}

// O(k*n)
List<XY> RunClusterAlgo()
{
// put points in buckets
int allPointsCount = BaseDataset.Count;
var firstPoint = BaseDataset[0];
firstPoint.Color = GetRandomColor();
var firstId = 0.ToString();
var firstBucket = new Bucket(firstId) { Centroid = firstPoint };
BaseBucketsLookup.Add(firstId, firstBucket);

for (int i = 1; i < allPointsCount; i++)
{
var set = new HashSet<string>(); //cluster candidate list
var p = BaseDataset[i];
// iterate clusters and collect candidates
foreach (var bucket in BaseBucketsLookup.Values)
{
var isInCluster = MathTool.BoxWithin(p, bucket.Centroid, MARKERCLUSTERER_SIZE);
if (!isInCluster)
continue;

//use first candidate; exact nearest is calc at last step before returning data
set.Add(bucket.Id);
break;
}

// if not within any existing box area, then make a new cluster
if (set.Count == 0)
{
var pid = i.ToString();
p.Color = GetRandomColor();
var newbucket = new Bucket(pid) { Centroid = p };
BaseBucketsLookup.Add(pid, newbucket);
}
}

//important, align all points to closest cluster point
BaseUpdatePointsByCentroid();

return BaseGetClusterResult();
}
}

// O(exponential) ~ can be slow when n or k is big
public class KMeans : BaseClusterAlgorithm
{
private readonly int _InitClusterSize; // start from this cluster points
// Rule of thumb k = sqrt(n/2)

// cluster point optimization iterations
private const int _MaxIterations = 100;
private const int _MaxClusters = 100;

public KMeans(List<XY> dataset) : base(dataset)
{
_InitClusterSize = 1;
}

public override List<XY> GetCluster()
{
var cluster = RunClusterAlgo();
return cluster;
}

List<XY> RunClusterAlgo()
{
/*
REPEAT iteration of clusters convergence
until the max error is small enough
if not, insert a new cluster at worst place
(farthest in region of worst cluster area) and run again
keeping current cluster points

// one iteration of clusters convergence is defined as ..
1) Random k centroids
2) Cluster data by euclidean distance to centroids
3) Update centroids by clustered data,
4) Update cluster
5) Continue last two steps until error is small, error is sum of diff
between current and updated centroid
*/

RunAlgo();

if (DoUpdateAllCentroidsToNearestContainingPoint)
BaseUpdateAllCentroidsToNearestContainingPoint();
return BaseGetClusterResult();
}

void RunAlgo()
{
// Init clusters
var centroids = BaseGetRandomCentroids(BaseDataset, _InitClusterSize);
for (int i = 0; i < centroids.Length; i++)
{
var pid = i.ToString();
var newbucket = new Bucket(pid) { Centroid = centroids[i] };
BaseBucketsLookup.Add(pid, newbucket);
}

//
double currentMaxError = double.MaxValue;
while (currentMaxError > MAX_ERROR && BaseBucketsLookup.Count < _MaxClusters)
{
RunIterationsUntilKClusterPlacementAreDone();

var id = BaseGetMaxError();
var bucket = BaseBucketsLookup[id];
currentMaxError = bucket.ErrorLevel; //update
if (currentMaxError > MAX_ERROR)
{
// Here it is linear speed when putting one new centroid at a time
// should be semi-fast because the new point is inserted at best area
//from current centroids view.
// possible improvement exists by log2 search by inserting multiple centroids and
// reducing centroid again if needed

// put new centroid in area where maxError but farthest away from current centroid in area
var longest = BaseGetLongestPoint(bucket.Centroid, bucket.Points);
var newcentroid = new XY(longest);
var newid = BaseBucketsLookup.Count.ToString();
var newbucket = new Bucket(newid) { Centroid = newcentroid };
BaseBucketsLookup.Add(newid, newbucket);
}
}
}

void RunIterationsUntilKClusterPlacementAreDone()
{
Profile("RunIterationsUntilKClusterPlacementAreDone beg");
double prevError = Double.MaxValue;
double currError = Double.MaxValue;

for (int i = 0; i < _MaxIterations; i++)
{
prevError = currError;
currError = RunOneIteration();
if (currError >= prevError) // no improvement
break;
}
}

double RunOneIteration() //O(k*n)
{
Profile("RunOneIteration beg");

// update points, assign points to cluster
BaseUpdatePointsByCentroid();
// update centroid pos by its points
BaseSetCentroidForAllBuckets(BaseBucketsLookup.Values);//O(k*n)
var clustersCount = BaseBucketsLookup.Count;
for (int i = 0; i < clustersCount; i++)
{
var currentBucket = BaseBucketsLookup[i.ToString()];
if (currentBucket.IsUsed == false)
continue;

//update centroid
var newcentroid = BaseGetCentroidFromCluster(currentBucket.Points);
//no need to update color, autoset
currentBucket.Centroid = newcentroid;
currentBucket.ErrorLevel = 0;
//update error
foreach (var p in currentBucket.Points)
{
var dist = MathTool.Distance(newcentroid, p);
currentBucket.ErrorLevel += dist;
}
var val = currentBucket.ErrorLevel / currentBucket.Points.Count;
currentBucket.ErrorLevel = val; //Math.Sqrt(val);
}

return BaseGetTotalError();
}

}

public class GridCluster : BaseClusterAlgorithm
{
public GridCluster(List<XY> dataset) : base(dataset)
{
}

public override List<XY> GetCluster()
{
var cluster = RunClusterAlgo(GridX, GridY);
return cluster;
}

// O(k*n)
void MergeClustersGrid()
{
foreach (var key in BaseBucketsLookup.Keys)
{
var bucket = BaseBucketsLookup[key];
if (bucket.IsUsed == false)
continue;

var x = bucket.Idx;
var y = bucket.Idy;

// get keys for neighbors
var N = GetId(x, y + 1);
var NE = GetId(x + 1, y + 1);
var E = GetId(x + 1, y);
var SE = GetId(x + 1, y - 1);
var S = GetId(x, y - 1);
var SW = GetId(x - 1, y - 1);
var W = GetId(x - 1, y);
var NW = GetId(x - 1, y + 1);
var neighbors = new[] { N, NE, E, SE, S, SW, W, NW };

MergeClustersGridHelper(key, neighbors);
}
}
void MergeClustersGridHelper(string currentKey, string[] neighborKeys)
{
double minDistX = DeltaX / 2.0;//heuristic, higher value gives more merging
double minDistY = DeltaY / 2.0;
//if clusters in grid are too close to each other, merge them
double minDist = MathTool.Max(minDistX, minDistY);

foreach (var neighborKey in neighborKeys)
{
if (!BaseBucketsLookup.ContainsKey(neighborKey))
continue;

var neighbor = BaseBucketsLookup[neighborKey];
if (neighbor.IsUsed == false)
continue;

var current = BaseBucketsLookup[currentKey];
var dist = MathTool.Distance(current.Centroid, neighbor.Centroid);
if (dist > minDist)
continue;

// merge
var color = current.Centroid.Color;
foreach (var p in neighbor.Points)
{
//update color
p.Color = color;
}

// recalc centroid
var cp = BaseGetCentroidFromCluster(current.Points);
current.Centroid = cp;
neighbor.IsUsed = false; //merged, then not used anymore
neighbor.Points.Clear(); //clear mem
}
}

public List<XY> RunClusterAlgo(int gridx, int gridy)
{
// params invalid
if (gridx <= 0 || gridy <= 0)
throw new ApplicationException("GridCluster.RunClusterAlgo gridx or gridy is <= 0");

// put points in buckets
foreach (var p in BaseDataset)
{
// find relative val
var relativeX = p.X - MinX;
var relativeY = p.Y - MinY;

int idx = (int)(relativeX / DeltaX);
int idy = (int)(relativeY / DeltaY);

// bucket id
string id = GetId(idx, idy);

// existing bucket, add the point to it
if (BaseBucketsLookup.ContainsKey(id))
{
BaseBucketsLookup[id].Points.Add(p);
}
// new bucket, create and add point
else
{
Bucket bucket = new Bucket(idx, idy);
bucket.Points.Add(p);
BaseBucketsLookup.Add(id, bucket);
}
}

// calc centroid for all buckets
BaseSetCentroidForAllBuckets(BaseBucketsLookup.Values);

// merge if grid centroids are too close
if (DoMergeGridIfCentroidsAreCloseToEachOther)
MergeClustersGrid();

if (DoUpdateAllCentroidsToNearestContainingPoint)
{
BaseUpdateAllCentroidsToNearestContainingPoint();
}

// check again
// merge if grid centroids are too close
if (DoMergeGridIfCentroidsAreCloseToEachOther
&& DoUpdateAllCentroidsToNearestContainingPoint)
{
MergeClustersGrid();
// and again set centroid to closest point in bucket
BaseUpdateAllCentroidsToNearestContainingPoint();
}

return BaseGetClusterResult();
}
}

public class MathTool
{
private const double Exp = 2; // 2 = euclidean, 1 = manhattan
//Minkowski dist
public static double Distance(XY a, XY b)
{
return Math.Pow(Math.Pow(Math.Abs(a.X - b.X), Exp) +
Math.Pow(Math.Abs(a.Y - b.Y), Exp), 1.0 / Exp);
}

public static double Min(double a, double b)
{
return a <= b ? a : b;
}
public static double Max(double a, double b)
{
return a >= b ? a : b;
}

public static bool DistWithin(XY a, XY b, double d)
{
var dist = Distance(a, b);
return dist < d;
}

public static bool BoxWithin(XY a, XY b, double boxsize)
{
var d = boxsize / 2;
var withinX = a.X - d <= b.X && a.X + d >= b.X;
var withinY = a.Y - d <= b.Y && a.Y + d >= b.Y;
return withinX && withinY;
}
}

public static class FileUtil
{
public static bool WriteFile(string data, FileInfo fileInfo)
{
bool isSuccess = false;
try
{
using (StreamWriter streamWriter =
File.CreateText(fileInfo.FullName))
{
streamWriter.Write(data);
isSuccess = true;
}
}
catch (Exception ex)
{
Console.WriteLine(ex.StackTrace + "\nPress a key ... ");
}
return isSuccess;
}
}

[Serializable()]
public class DatasetToSerialize : ISerializable
{
private const string Name = "Dataset";
public List<XY> Dataset { get; set; }
public DatasetToSerialize()
{
Dataset = new List<XY>();
}
public DatasetToSerialize(SerializationInfo info, StreamingContext ctxt)
{
this.Dataset = (List<XY>)info.GetValue(Name, typeof(List<XY>));
}
public void GetObjectData(SerializationInfo info, StreamingContext ctxt)
{
info.AddValue(Name, Dataset);
}
}

public class Serializer
{
public void SerializeObject(string filename, object objectToSerialize)
{
try
{
using (Stream stream = File.Open(filename, FileMode.Create))
{
BinaryFormatter bFormatter = new BinaryFormatter();
bFormatter.Serialize(stream, objectToSerialize);
}
}
catch (Exception ex)
{
Console.WriteLine(ex.StackTrace + "\nPress a key ... ");
}
}
public object DeSerializeObject(string filename)
{
try
{
using (Stream stream = File.Open(filename, FileMode.Open))
{
BinaryFormatter bFormatter = new BinaryFormatter();
var objectToSerialize = bFormatter.Deserialize(stream);
return objectToSerialize;
}
}
catch (Exception ex)
{
Console.WriteLine(ex.StackTrace + "\nPress a key ... ");
}
return null;
}
}
}
}
```

Put both the html and javascript file in the c:\temp folder.
The C# program creates a new draw.js file in the c:\temp folder.

Open the html file with a modern browser that supports canvas, like Firefox or Chrome.
If you want to test with Internet Explorer on your local machine, you should download the excanvas javascript plugin.
canvas.html

```<html>
<title>Cluster Test</title>
<!-- IE canvas not supported fallback -->
<!--[if lt IE 10]>
<script src="excanvas.js"></script>
<![endif]-->
<script type="text/javascript" src="draw.js"></script>
<!-- default -->
<script type="text/javascript">
var G = {
yellow: "#FF0",
red: "#F00",
green: "#0F0",
blue: "#00F",
black: "#000"
}

function drawMark(cx, cy, color, ctx) {
//draw a circle
var radius = 2; // point radius
ctx.beginPath();
ctx.arc(cx, cy, radius, 0, Math.PI * 2, true);
ctx.closePath();
ctx.fillStyle = color;
ctx.fill();
//ctx.stroke();
}

function drawCluster(cx, cy, color, n, ctx) {
//draw a bigger circle
var radius = 10; // cluster circle radius
ctx.strokeStyle = color;
ctx.beginPath();
ctx.arc(cx, cy, radius, 0, Math.PI * 2, true);
ctx.closePath();
ctx.stroke();
ctx.fillStyle = color;
ctx.fillText("" + n, cx - 8, cy + 2);
}

function drawMarkerCluster(cx, cy, color, n, size, ctx) {
drawCluster(cx, cy, color, n, ctx);
// draw rect
var a = cx - (size / 2);
var b = cy - (size / 2);
var len = size;
ctx.lineWidth = 2;
ctx.strokeStyle = color;
ctx.strokeRect(a, b, len, len);
}

function drawDistanceCluster(cx, cy, color, n, size, ctx) {
drawCluster(cx, cy, color, n, ctx);
ctx.lineWidth = 2;
ctx.strokeStyle = color;
ctx.beginPath();
ctx.arc(cx, cy, size, 0, Math.PI * 2, true);
ctx.closePath();
ctx.stroke();
}

window.onload = function () {
var canvas = document.getElementById('canvas');
if (canvas != null && canvas != undefined && canvas.getContext) {
var ctx = canvas.getContext('2d');
ctx.strokeStyle = G.black;
ctx.font = "10pt Arial";
drawMarkers(ctx);
}
};
</script>
<style type="text/css">
body
{
margin-left: 10px;
margin-top: 10px;
}
canvas
{
border: 1px solid red;
}
</style>
<body>
<canvas id="canvas" width="610" height="430">
<p>Your browser doesn't support canvas. Try Firefox, Chrome or another modern browser.</p>
</canvas>
</body>
</html>
```

The javascript file will be overwritten when you run the C# code.
draw.js

```function drawMarkers(ctx) {
drawMark(156.434, 157.921, 'rgb(182,114,144)', ctx);
drawMark(156.788, 139.939, 'rgb(182,114,144)', ctx);
drawMark(150.712, 67.882, 'rgb(67,235,136)', ctx);
drawMark(445.846, 165.556, 'rgb(153,85,202)', ctx);
drawMark(382.982, 179.917, 'rgb(153,85,202)', ctx);
drawMark(466.365, 172.098, 'rgb(153,85,202)', ctx);
drawMark(302.81, 243.478, 'rgb(48,234,127)', ctx);
drawMark(235.35, 265.31, 'rgb(48,234,127)', ctx);
drawMark(262.092, 258.321, 'rgb(48,234,127)', ctx);

drawMarkerCluster(156.434, 157.921, 'rgb(182,114,144)', 2, 160, ctx);
drawMarkerCluster(150.712, 67.882, 'rgb(67,235,136)', 1, 160, ctx);
drawMarkerCluster(445.846, 165.556, 'rgb(153,85,202)', 3, 160, ctx);
drawMarkerCluster(302.81, 243.478, 'rgb(48,234,127)', 3, 160, ctx);
ctx.fillStyle = 'rgb(0,0,0)';
ctx.fillText('Clusters = ' + 4, 20, 320);
}
```

Written by Kunuk Nykjaer

September 20, 2011 at 10:49 pm

Clustering K-means and Grid with C# example and html-canvas (part 2)

This is a follow up to my previous post.
https://kunuk.wordpress.com/2011/09/15/clustering-grid-cluster

The next post with updated code can be seen here.
https://kunuk.wordpress.com/2011/09/20/markerclusterer-with-c-example-and-html-canvas-part-3/

I have implemented a K-means clustering algorithm which automatically increments the K value until the error level is low enough to be accepted. The K-means runs reasonably fast. It is best used in combination with choosing the right number of points and setting the max_error. If the max_error is small and the number of points is big, then the running time will be very slow. A maximum iteration count is set to avoid very long running times, as I believe K-means is NP-hard.

The insertion of the extra cluster point is linear and works by inserting it at the best place while keeping the current cluster points. It finds the currently worst-performing cluster point with respect to error value and inserts a new cluster point in that region, farthest away from the bad-performing cluster point. By doing this, the new points are inserted at places which seem logical while minimizing the overall error level.
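The insertion heuristic above can be sketched in a few lines of JavaScript. This is an illustration, not the C# implementation: the cluster shape (`centroid`, `points`, `errorLevel`) and the function names are my own.

```javascript
// Sketch of the "insert a new cluster point" heuristic: pick the cluster with
// the highest average error, then place the new centroid at the point in that
// cluster farthest from its current centroid.
function dist(a, b) {
    return Math.sqrt((a.x - b.x) ** 2 + (a.y - b.y) ** 2);
}

// 1) find the worst performing cluster (highest error level)
function worstCluster(clusters) {
    return clusters.reduce((worst, c) =>
        c.errorLevel > worst.errorLevel ? c : worst);
}

// 2) new cluster point = point farthest from the worst cluster's centroid
function newCentroidFor(cluster) {
    let farthest = cluster.points[0];
    for (const p of cluster.points)
        if (dist(cluster.centroid, p) > dist(cluster.centroid, farthest))
            farthest = p;
    return { x: farthest.x, y: farthest.y };
}
```

Both steps are linear scans, so inserting one extra cluster point costs O(n).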

The Grid clustering algorithm runs pretty fast. Even with 100,000 points the time is below 2 sec. For 10,000 points the time is below 1 sec.
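The grid algorithm is fast because each point maps to its cell in O(1), so filling the buckets is a single linear pass. A minimal sketch, reusing the `MinX`/`MinY`/`DeltaX`/`DeltaY` constants and the `"idx;idy"` key format from the C# code below (the function names are illustrative):

```javascript
// Grid bucketing sketch: cell id is computed directly from coordinates,
// so assigning n points to buckets costs O(n).
const MinX = 10, MinY = 10;
const DeltaX = (400 - 10) / 8;   // AbsSizeX / GridX
const DeltaY = (300 - 10) / 7;   // AbsSizeY / GridY

function cellId(x, y) {
    const idx = Math.floor((x - MinX) / DeltaX);
    const idy = Math.floor((y - MinY) / DeltaY);
    return idx + ";" + idy;      // same key format as GetId in the C# code
}

function fillBuckets(points) {
    const buckets = {};
    for (const p of points) {
        const id = cellId(p.x, p.y);
        (buckets[id] = buckets[id] || []).push(p);
    }
    return buckets;
}
```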

An optional feature has been added to move the cluster point position to the closest point in the region of the clustered area after the final centroid calculation. By using this, your cluster point will be positioned visually on a real existing point in the region.
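That snapping step is just a nearest-neighbor scan over the cluster's own points. A minimal sketch (the function name is mine, not from the source):

```javascript
// After the centroid is computed, move it onto the closest real data point
// inside the cluster. O(n) per cluster.
function snapToNearestPoint(centroid, points) {
    let best = null, min = Infinity;
    for (const p of points) {
        const d = Math.hypot(centroid.x - p.x, centroid.y - p.y);
        if (d < min) { min = d; best = p; }
    }
    return best; // a true existing point, used as the visual cluster position
}
```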

When used with maps and websites, the grid clustering algorithm is a good choice because of its fast running time, which is O(m*n) where m is the number of grids and n is the number of points, please comment if I have made a wrong statement 🙂
The K-means can also be used when you know the expected number of points and set the max_error accordingly.

Below are some image examples and the C# source code bundled in one file to keep it simple.

K-means: 0.7 sec, 10,000 points, max_error 100

K-means: 2.2 sec, 10,000 points, max_error 50

K-means: 0.2 sec, 150 points, max_error 30

K-means, an example of how K-means can't "see" all the true clusters. Here K is manually set to 3.
This will not happen with the current code; it is mainly shown to illustrate a weakness of the K-means algorithm.

Too many points view

Grid clustering algorithm with a 7×6 grid took 0.04 sec with 160 points.
In this example, the cluster point locations are moved to the nearest point in the region after the centroid calculation. Some grid merging has also occurred because the clusters were too close to each other.

Final clustered result view

Performance test of the grid using lots of points.
Grid 8×7, 1.3 sec, 100,000 points.
With grid 6×5, it takes about 3 sec for 1,000,000 points, without centroid calculation and without saving to file.

Algorithm.cs

```
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;
using System.Text;

// By Kunuk Nykjaer
// Clustering package, Grid clustering and auto increasing K-means clustering
// Version 0.2
public class Algorithm
{
public static DateTime Starttime;
static void Main(string[] args)
{
Starttime = DateTime.Now;
var algo = new Algorithm();

// run kmeans or grid
algo.Run(ClusterType.KMeans);
//algo.Run(ClusterType.Grid);

algo.GenerateJavascriptDrawFile(); //create draw.js

var timespend = DateTime.Now.Subtract(Starttime).TotalSeconds;
Console.WriteLine(timespend + " sec. press a key ...");
Console.ReadKey();
}

// use centroid or semi-centroid cluster point placement visualization?
public const bool DoUpdateAllCentroidsToNearestContainingPoint = false;
public const bool UseProfiling = false; //debug, output time spend

public enum ClusterType { Unknown = -1, Grid, KMeans } ;
static readonly CultureInfo Culture_enUS = new CultureInfo("en-US");
const string S = "G";

readonly List<XY> _points = new List<XY>();
List<XY> _clusters = new List<XY>();

public static readonly Random _rand = new Random();
public const double MinX = 10;
public const double MinY = 10;
public const double MaxX = 400;
public const double MaxY = 300;

// used with Grid cluster algo
public const int GridX = 8;
public const int GridY = 7;

const double AbsSizeX = MaxX - MinX;
const double AbsSizeY = MaxY - MinY;

// for bucket placement calc, grid cluster algo
const double DeltaX = AbsSizeX / GridX;
const double DeltaY = AbsSizeY / GridY;

static private readonly string NL = Environment.NewLine;
private const string ctx = "ctx"; // javascript canvas
public bool DisplayGridInCanvas; //autoset by used cluster type

public Algorithm()
{
CreateDataSet();
}

public static string GetId(int idx, int idy) //O(1)
{
return idx + ";" + idy;
}

void CreateDataSet() //O(1)
{
// CREATE DATA SET

// Create random scattered points
//for (int i = 0; i < 150; i++)
//{
//    var x = MinX + _rand.NextDouble() * (AbsSizeX);
//    var y = MinY + _rand.NextDouble() * (AbsSizeY);
//    _points.Add(new XY(x, y));
//}

// Create random region of clusters
for (int i = 0; i < 50; i++)
{
var x = MinX + _rand.NextDouble() * 100;
var y = MinY + _rand.NextDouble() * 100;
_points.Add(new XY(x, y));
}

for (int i = 0; i < 50; i++)
{
var x = MinX + 200 + _rand.NextDouble() * 100;
var y = MinY + 10 + _rand.NextDouble() * 100;
_points.Add(new XY(x, y));
}

for (int i = 0; i < 50; i++)
{
var x = MinX + 50 + _rand.NextDouble() * 100;
var y = MinY + 150 + _rand.NextDouble() * 100;
_points.Add(new XY(x, y));
}

}

public void Run(ClusterType clustertype)
{
switch (clustertype)
{
case ClusterType.Grid:
_clusters = new GridBaseCluster(_points).GetCluster();
DisplayGridInCanvas = true;
break;
case ClusterType.KMeans:
_clusters = new KMeans(_points).GetCluster();
DisplayGridInCanvas = false;
break;
}
}

public static string GetRandomColor()
{
int r = _rand.Next(10, 240);
int g = _rand.Next(10, 240);
int b = _rand.Next(10, 240);
return string.Format("'rgb({0},{1},{2})'", r, g, b);
}

static void CreateFile(string s)
{
var path = new FileInfo(@"C:\temp\draw.js");
var isCreated = FileUtil.WriteFile(s, path);
Console.WriteLine(isCreated + " = File is Created");
}
public void GenerateJavascriptDrawFile()
{
var sb = new StringBuilder();

// markers
var head = NL + "function drawMarkers(ctx) {" + NL;
var tail = NL + "}" + NL;

// grid
if (DisplayGridInCanvas)
{
sb.Append("ctx.beginPath();" + NL);
for (int i = 0; i < GridX + 1; i++)
{
sb.Append(String.Format("ctx.moveTo({0}, {1});{2}",
ToStringEN(MinX + i * DeltaX),
ToStringEN(MinY), NL));
sb.Append(String.Format("ctx.lineTo({0}, {1});{2}",
ToStringEN(MinX + i * DeltaX),
ToStringEN(MaxY), NL));
}
for (int j = 0; j < GridY + 1; j++)
{
sb.Append(String.Format("ctx.moveTo({0}, {1});{2}",
ToStringEN(MinX),
ToStringEN(MinY + j * DeltaY), NL));
sb.Append(String.Format("ctx.lineTo({0}, {1});{2}",
ToStringEN(MaxX),
ToStringEN(MinY + j * DeltaY), NL));
}
sb.Append("ctx.stroke(); " + NL);
}

// markers
if (_points != null)
foreach (var p in _points)
sb.Append(string.Format("drawMark({0}, {1}, {2}, {3});{4}",
ToStringEN(p.X), ToStringEN(p.Y), p.Color, ctx, NL));

string clusterInfo = "0";
if (_clusters != null)
{
foreach (var c in _clusters)
sb.Append(string.Format("drawCluster({0}, {1}, {2}, {3}, {4});{5}",
ToStringEN(c.X), ToStringEN(c.Y), c.Color,
c.Size, ctx, NL));

clusterInfo = _clusters.Count + string.Empty;
}

// bottom text
sb.Append("ctx.fillStyle = 'rgb(0,0,0)';" + NL);
sb.Append(string.Format("ctx.fillText('Clusters = ' + {0}, {1}, {2});{3}",
clusterInfo, ToStringEN(MinX + 10), ToStringEN(MaxY + 20), NL));

sb.Append(tail);
CreateFile(sb.ToString());
//Console.WriteLine(sb.ToString());
}
public static string ToStringEN(double d)
{
double rounded = Math.Round(d, 3);
return rounded.ToString(S, Culture_enUS);
}
public static void Profile(string s)
{
if(!UseProfiling)
return;
var timespend = DateTime.Now.Subtract(Starttime).TotalSeconds;
Console.WriteLine(timespend + " sec. " + s);
}//O(1)

public class XY : IComparable
{
public double X { get; set; }
public double Y { get; set; }
public string Color { get; set; }
public int Size { get; set; }
public string XToString { get { return X.ToString(S, Culture_enUS); } }
public string YToString { get { return Y.ToString(S, Culture_enUS); } }
public XY(double x, double y)
{
X = x;
Y = y;
Color = "'rgb(0,0,0)'";//default
Size = 1;
}
public XY(XY p) //clone
{
this.X = p.X;
this.Y = p.Y;
this.Color = p.Color;
this.Size = p.Size;
}

public int CompareTo(object o) // if used in sorted list
{
if (this.Equals(o))
return 0;

var other = (XY)o;
if (this.X > other.X)
return -1;
if (this.X < other.X)
return 1;

return 0;
}
public override int GetHashCode()
{
var x = X * 10000;
var y = Y * 10000;
var r = x * 17 + y * 37;
return (int)r;
}
private const int ROUND = 6;
public override bool Equals(Object o)
{
if (o == null)
return false;
var other = o as XY;
if (other == null)
return false;

var x = Math.Round(this.X, ROUND) == Math.Round(other.X, ROUND);
var y = Math.Round(this.Y, ROUND) == Math.Round(other.Y, ROUND);
return x && y;
}
}
public class Bucket
{
public string Id { get; private set; }
public List<XY> Points { get; private set; }
public XY Centroid { get; set; }
public int Idx { get; private set; }
public int Idy { get; private set; }
public double ErrorLevel { get; set; } // clusterpoint and points avg dist
private bool _IsUsed;
public bool IsUsed
{
get { return _IsUsed && Centroid != null; }
set{_IsUsed = value; }
}
public Bucket(string id)
{
IsUsed = true;
Centroid = null;
Points = new List<XY>();
Id = id;
}
public Bucket(int idx, int idy)
{
IsUsed = true;
Centroid = null;
Points = new List<XY>();
Idx = idx;
Idy = idy;
Id = GetId(idx, idy);
}
}

public abstract class BaseClusterAlgorithm
{
public List<XY> BaseDataset; // all points
//id, bucket
public readonly Dictionary<string, Bucket> BaseBucketsLookup =
new Dictionary<string, Bucket>();
public abstract List<XY> GetCluster();
//O(k?? random fn can be slow)
public static XY[] BaseGetRandomCentroids(List<XY> list, int k)
{
var set = new HashSet<XY>();
int i = 0;
var kcentroids = new XY[k];

int MAX = list.Count;
while (MAX >= k) // requires at least k distinct points in the list
{
int index = _rand.Next(0, MAX); // upper bound is exclusive
var xy = list[index];
if (set.Contains(xy))
continue;

set.Add(xy); // remember picked points to avoid duplicate centroids
kcentroids[i++] = new XY(xy.X, xy.Y) { Color = GetRandomColor() };

if (i >= k)
break;
}

Profile("BaseGetRandomCentroids");
return kcentroids;
}

public List<XY> BaseGetClusterResult()//O(k*n)
{
var clusterPoints = new List<XY>();
// collect used buckets and return the result

if (DoUpdateAllCentroidsToNearestContainingPoint)
BaseUpdateAllCentroidsToNearestContainingPoint();

foreach (var item in BaseBucketsLookup)
{
var bucket = item.Value;
if (bucket.IsUsed)
clusterPoints.Add(bucket.Centroid); // collect the cluster point
}

Profile("BaseGetClusterResult");
return clusterPoints;
}
public static XY BaseGetCentroidFromCluster(List<XY> list) //O(n)
{
if (list == null || list.Count == 0)
return null;

int count = list.Count;

// color is set for the points and the cluster point here
var color = GetRandomColor(); //O(1)
XY centroid = new XY(0, 0) { Size = list.Count };//O(1)
foreach (XY p in list)
{
p.Color = color;
centroid.X += p.X;
centroid.Y += p.Y;
}
centroid.X /= count;
centroid.Y /= count;
var cp = new XY(centroid.X, centroid.Y) { Size = count, Color = color };

Profile("BaseGetCentroidFromCluster");
return cp;
}
//O(k*n)
public static void BaseSetCentroidForAllBuckets(IEnumerable<Bucket> buckets)
{
foreach (var item in buckets)
{
var bucketPoints = item.Points;
var cp = BaseGetCentroidFromCluster(bucketPoints);
item.Centroid = cp;
}
Profile("BaseSetCentroidForAllBuckets");
}
public double BaseGetTotalError()//O(k)
{
int centroidsUsed = BaseBucketsLookup.Values.Count(b => b.IsUsed);
double sum = BaseBucketsLookup.Values.Where(b => b.IsUsed).Sum(b => b.ErrorLevel);
return sum / centroidsUsed;
}
public string BaseGetMaxError() //O(k)
{
double maxError = -double.MaxValue;
string id = string.Empty;
foreach (var b in BaseBucketsLookup.Values)
{
if (!b.IsUsed || b.ErrorLevel <= maxError)
continue;

maxError = b.ErrorLevel;
id = b.Id;
}
return id;
}
public XY BaseGetClosestPoint(XY from, List<XY> list) //O(n)
{
double min = double.MaxValue;
XY closest = null;
foreach (var p in list)
{
var d = MathTool.Distance(from, p);
if (d >= min)
continue;

// update
min = d;
closest = p;
}
return closest;
}
public XY BaseGetLongestPoint(XY from, List<XY> list) //O(n)
{
double max = -double.MaxValue;
XY longest = null;
foreach (var p in list)
{
var d = MathTool.Distance(from, p);
if (d <= max)
continue;

// update
max = d;
longest = p;
}
return longest;
}

// update centroid location to nearest point,
// e.g. if you want to show cluster point on a real existing point area
//O(n)
public void BaseUpdateCentroidToNearestContainingPoint(Bucket bucket)
{
if(bucket==null || bucket.Centroid==null ||
bucket.Points==null||bucket.Points.Count==0)
return;

var closest = BaseGetClosestPoint(bucket.Centroid, bucket.Points);
bucket.Centroid.X = closest.X;
bucket.Centroid.Y = closest.Y;
Profile("BaseUpdateCentroidToNearestContainingPoint");
}
//O(k*n)
public void BaseUpdateAllCentroidsToNearestContainingPoint()
{
foreach (var bucket in BaseBucketsLookup.Values)
BaseUpdateCentroidToNearestContainingPoint(bucket);
Profile("BaseUpdateAllCentroidsToNearestContainingPoint");
}
}

// O(exponential) ~ can be slow when n or k is big
public class KMeans : BaseClusterAlgorithm
{
private const int _InitClusterSize = 1; // start from this cluster points

// heuristic, set tolerance for cluster density, has effect on running time.
// Set high for many points in dataset, can be lower for fewer points
private const double MAX_ERROR = 30;

private const int _MaxIterations = 100; // cluster point optimization iterations
private const int _MaxClusters = 100;

public KMeans(List<XY> dataset)
{
if (dataset == null || dataset.Count == 0)
throw new ApplicationException("KMeans dataset is null or empty");

this.BaseDataset = dataset;
}

public override List<XY> GetCluster()
{
var cluster = RunClusterAlgo();
return cluster;
}

List<XY> RunClusterAlgo()
{
/*
REPEAT iteration of clusters convergence
* until the max error is small enough
if not, insert a new cluster at worst place
(farthest in region of worst cluster area) and run again
keeping current cluster points

// one iteration of clusters convergence is defined as ..
1) Random k centroids
2) Cluster data by euclidean distance to centroids
3) Update centroids by clustered data,
4) Update cluster
5) Continue last two steps until error is small, error is sum of diff
between current and updated centroid
*/

var clusterPoints = new List<XY>();
// params invalid
if (BaseDataset == null || BaseDataset.Count == 0)
return clusterPoints;

RunAlgo();
return BaseGetClusterResult();
}

void RunAlgo()
{
// Init clusters
var centroids = BaseGetRandomCentroids(BaseDataset, _InitClusterSize);
for (int i = 0; i < centroids.Length; i++)
{
var newbucket = new Bucket(i.ToString()) { Centroid = centroids[i] };
BaseBucketsLookup.Add(newbucket.Id, newbucket);
}

//
double currentMaxError = double.MaxValue;
while (currentMaxError > MAX_ERROR && BaseBucketsLookup.Count < _MaxClusters)
{
RunIterationsUntilKClusterPlacementAreDone();

var id = BaseGetMaxError();
var bucket = BaseBucketsLookup[id];
currentMaxError = bucket.ErrorLevel; //update
if (currentMaxError > MAX_ERROR)
{
// Here it is linear speed when putting one new centroid at a time
// should be semi-fast because the new point is inserted at best area
//from current centroids view.
// possible improvement exists by log2 search by inserting multiple centroids and
// reducing centroid again by using closest pair algo which can be O(nlogn) to
// find closest cluster dist and analysing the dist length, if its to small,
//then reduce centroids.

// put new centroid in area where maxError but farthest away from current centroid in area
var longest = BaseGetLongestPoint(bucket.Centroid, bucket.Points);
var newcentroid = new XY(longest);
var newid = BaseBucketsLookup.Count.ToString();
var newbucket = new Bucket(newid) { Centroid = newcentroid };
BaseBucketsLookup.Add(newid, newbucket);
}
}
}

void RunIterationsUntilKClusterPlacementAreDone()
{
double prevError = Double.MaxValue;
double currError = Double.MaxValue;

for (int i = 0; i < _MaxIterations; i++)
{
prevError = currError;
currError = RunOneIteration();
if (currError >= prevError) // no improvement then stop
break;
}
Profile("RunIterationsUntilKClusterPlacementAreDone");
}

double RunOneIteration() //O(k*n)
{
// update points, assign points to cluster
UpdatePointsByCentroid();
// update centroid pos by its points
BaseSetCentroidForAllBuckets(BaseBucketsLookup.Values);//O(k*n)
var clustersCount = BaseBucketsLookup.Count;
for (int i = 0; i < clustersCount; i++)
{
var currentBucket = BaseBucketsLookup[i.ToString()];
if (currentBucket.IsUsed == false)
continue;

//update centroid
var newcontroid = BaseGetCentroidFromCluster(currentBucket.Points);
//no need to update color, autoset
currentBucket.Centroid = newcontroid;
currentBucket.ErrorLevel = 0;
//update error
foreach (var p in currentBucket.Points)
{
var dist = MathTool.Distance(newcontroid, p);
currentBucket.ErrorLevel += dist;
}
var val = currentBucket.ErrorLevel/currentBucket.Points.Count;
currentBucket.ErrorLevel = val; //Math.Sqrt(val);
}

Profile("RunOneIteration");
return BaseGetTotalError();
}
void UpdatePointsByCentroid()//O(n*k)
{
int count = BaseBucketsLookup.Count();

// clear points in the buckets, they will be re-inserted
foreach (var bucket in BaseBucketsLookup.Values)
bucket.Points.Clear();

foreach (XY p in BaseDataset)
{
double minDist = Double.MaxValue;
int index = -1;
for (int i = 0; i < count; i++)
{
var bucket = BaseBucketsLookup[i.ToString()];
if (bucket.IsUsed == false)
continue;

var centroid = bucket.Centroid;
var dist = MathTool.Distance(p, centroid);
if (dist < minDist)
{
// update
minDist = dist;
index = i;
}
}
//update color for point to match centroid and re-insert
var closestBucket = BaseBucketsLookup[index.ToString()];
p.Color = closestBucket.Centroid.Color;
closestBucket.Points.Add(p); // re-insert point into its nearest bucket
}

Profile("UpdatePointsByCentroid");
}
}

public class GridBaseCluster : BaseClusterAlgorithm
{
public GridBaseCluster(List<XY> dataset)
{
BaseDataset = dataset;
}

public override List<XY> GetCluster()
{
var cluster = RunClusterAlgo(GridX, GridY);
return cluster;
}

// O(k*n)
void MergeClustersGrid()
{
foreach (var key in BaseBucketsLookup.Keys)
{
var bucket = BaseBucketsLookup[key];
if (bucket.IsUsed == false)
continue;

var x = bucket.Idx;
var y = bucket.Idy;

// get keys for neighbors
var N = GetId(x, y + 1);
var NE = GetId(x + 1, y + 1);
var E = GetId(x + 1, y);
var SE = GetId(x + 1, y - 1);
var S = GetId(x, y - 1);
var SW = GetId(x - 1, y - 1);
var W = GetId(x - 1, y);
var NW = GetId(x - 1, y + 1);
var neighbors = new[] { N, NE, E, SE, S, SW, W, NW };

MergeClustersGridHelper(key, neighbors);
}
}
void MergeClustersGridHelper(string currentKey, string[] neighborKeys)
{
double minDistX = DeltaX / 2.0;//heuristic, higher value gives less merging
double minDistY = DeltaY / 2.0;
//if clusters in grid are too close to each other, merge them
double minDist = MathTool.Max(minDistX, minDistY);

foreach (var neighborKey in neighborKeys)
{
if (!BaseBucketsLookup.ContainsKey(neighborKey))
continue;

var neighbor = BaseBucketsLookup[neighborKey];
if (neighbor.IsUsed == false)
continue;

var current = BaseBucketsLookup[currentKey];
var dist = MathTool.Distance(current.Centroid, neighbor.Centroid);
if (dist > minDist)
continue;

// merge
var color = current.Centroid.Color;
foreach (var p in neighbor.Points)
{
//update color
p.Color = color;
}

// recalc centroid
var cp = BaseGetCentroidFromCluster(current.Points);
current.Centroid = cp;
neighbor.IsUsed = false; //merged, then not used anymore
}
}

public List<XY> RunClusterAlgo(int gridx, int gridy)
{
var clusterPoints = new List<XY>();
// params invalid
if (BaseDataset == null || BaseDataset.Count == 0 ||
gridx <= 0 || gridy <= 0)
return clusterPoints;

// put points in buckets
foreach (var p in BaseDataset)
{
// find relative val
var relativeX = p.X - MinX;
var relativeY = p.Y - MinY;

int idx = (int)(relativeX / DeltaX);
int idy = (int)(relativeY / DeltaY);

// bucket id
string id = GetId(idx, idy);

// existing bucket, add point
if (BaseBucketsLookup.ContainsKey(id))
BaseBucketsLookup[id].Points.Add(p);
// new bucket, create and add point
else
{
Bucket bucket = new Bucket(idx, idy);
bucket.Points.Add(p);
BaseBucketsLookup.Add(id, bucket);
}
}

// calc centroid for all buckets
BaseSetCentroidForAllBuckets(BaseBucketsLookup.Values);

// merge if grid points are too close
MergeClustersGrid();

return BaseGetClusterResult();
}
}

public class MathTool
{
private const double Exp = 2; // 2=euclidean, 1=manhattan
//Minkowski dist
public static double Distance(Algorithm.XY a, Algorithm.XY b)
{
return Math.Pow(Math.Pow(Math.Abs(a.X - b.X), Exp) +
Math.Pow(Math.Abs(a.Y - b.Y), Exp), 1.0 / Exp);
}

public static double Min(double a, double b)
{
return a <= b ? a : b;
}
public static double Max(double a, double b)
{
return a >= b ? a : b;
}
}

public static class FileUtil
{
public static bool WriteFile(string data, FileInfo fileInfo)
{
bool isSuccess = false;
try
{
using (StreamWriter streamWriter =
File.CreateText(fileInfo.FullName))
{
streamWriter.Write(data);
isSuccess = true;
}
}
catch (Exception ex)
{
Console.WriteLine(ex.StackTrace + "\nPress a key ... ");
}
return isSuccess;
}
}
}
```

Put both the HTML and the JavaScript file in the c:\temp folder.
The C# program creates a new draw.js file in the c:\temp folder.

Open the html file with a modern browser that supports canvas, like Firefox or Chrome.

canvas.html

```<html>
<title>Cluster test</title>
<script type="text/javascript">
var G = {
yellow: "#FF0",
red: "#F00",
green: "#0F0",
blue: "#00F",
black: "#000"
};
var radius = 4; // circle radius used by drawMark and drawCluster

window.onload = function () {
var canvas = document.getElementById('canvas');
if (canvas != null && canvas.getContext) {
var ctx = canvas.getContext('2d');
ctx.strokeStyle = G.black;
ctx.font = "10pt Arial";
drawMarkers(ctx); //use fn defined in draw.js
}
};

function drawMark(cx, cy, color, ctx) {
//draw a circle
ctx.beginPath();
ctx.arc(cx, cy, radius, 0, Math.PI * 2, true);
ctx.closePath();
ctx.fillStyle = color;
ctx.fill();
ctx.stroke();
}

function drawCluster(cx, cy, color, n, ctx) {
//draw a bigger circle
ctx.beginPath();
ctx.arc(cx, cy, radius, 0, Math.PI * 2, true);
ctx.closePath();
ctx.stroke();
ctx.fillStyle = color;
ctx.fillText("" + n, cx-8, cy+2);
}
</script>
<script type="text/javascript" src="draw.js"></script>

<style type="text/css">
body{ margin-left: 10px; margin-top:10px;}
canvas  {  border: 1px solid red; }
</style>
<body>
<canvas id="canvas" width="410" height="330">
</canvas>
</body>
</html>
```

The javascript file will be overwritten when you run the C# code.
draw.js

```function drawMarkers(ctx) {
drawMark(161.089, 215.198, 'rgb(82,128,16)', ctx);
drawMark(283.556, 12.712, 'rgb(92,49,66)', ctx);
drawMark(98.915, 37.67, 'rgb(181,136,150)', ctx);
drawMark(318.432, 102.905, 'rgb(92,49,66)', ctx);
drawMark(378.091, 238.713, 'rgb(182,83,25)', ctx);
drawMark(318.736, 80.199, 'rgb(92,49,66)', ctx);
drawMark(122.702, 252.349, 'rgb(82,128,16)', ctx);
drawMark(173.836, 44.669, 'rgb(181,136,150)', ctx);
drawMark(141.157, 172.273, 'rgb(82,128,16)', ctx);
drawMark(102.749, 56.737, 'rgb(181,136,150)', ctx);
drawCluster(306.908, 65.272, 'rgb(92,49,66)', 3, ctx);
drawCluster(378.091, 238.713, 'rgb(182,83,25)', 1, ctx);
drawCluster(141.649, 213.273, 'rgb(82,128,16)', 3, ctx);
drawCluster(125.167, 46.359, 'rgb(181,136,150)', 3, ctx);
ctx.fillStyle = 'rgb(0,0,0)';
ctx.fillText('Clusters = ' + 4, 20, 320);
}
```

Written by Kunuk Nykjaer

September 17, 2011 at 9:35 am

Clustering – Grid cluster with C# and html-canvas example (part 1)

Grid clustering, with merging supported if the centroids of the grid clusters are too close to each other.
The second post with updated code can be seen here.
https://kunuk.wordpress.com/2011/09/17/clustering-k-means-and-grid-with-c-example-and-html-canvas-part-2

Sometimes when you want to visualize points, there can be too many points and they overlap.
To avoid this problem you can use different clustering methods and group points which are close to each other. A cluster contains information such as the number of points it holds. The idea is to add interaction with a click-event on the cluster, which zooms in. The view gets expanded and the points stop overlapping.

Here I look at a grid version of clustering. You divide the screen into grid sections and put the points into grid clusters. Then you apply a centroid calculation to set the location of each cluster point.

In this example I will visualize both the points and the cluster point. Normally you would only visualize the cluster points.
For the visualization I will use html and canvas. The clustering algorithm is implemented in C#. The C# program generates a draw.js file with point and cluster points drawing information.

The running time is something like O(m*n) where m is the number of grids and n is the number of points.
This clustering visualization works well enough but is not always perfect. Perfect would be to always assign points to the closest cluster point.
I will compare this with K-means at a later time.
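The "perfect" assignment mentioned above would be a final O(k*n) pass that gives every point to its nearest cluster point instead of leaving it in whatever grid cell it fell into. A minimal sketch (names are illustrative, not from the source):

```javascript
// Reassign each point to its nearest cluster point. k cluster points,
// n points: O(k*n) total.
function assignToNearestCluster(points, clusterPoints) {
    const groups = clusterPoints.map(() => []);
    for (const p of points) {
        let best = 0, min = Infinity;
        for (let i = 0; i < clusterPoints.length; i++) {
            const d = Math.hypot(p.x - clusterPoints[i].x,
                                 p.y - clusterPoints[i].y);
            if (d < min) { min = d; best = i; }
        }
        groups[best].push(p);
    }
    return groups;
}
```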

Here’s an example of how the points are clustered in grids.

The cluster points have a number which tells how many points they contain. Notice the bottom cluster with size 8. It has merged with its neighbor cluster, and you can see its points are spread over two grid areas.
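The merge rule can be sketched in isolation: two neighboring clusters merge when their centroids are closer than half a grid cell, the heuristic used in `MergeClustersGridHelper` in the code below. The function name here is mine.

```javascript
// Merge check sketch: clusters merge when their centroids are within
// half the smaller grid-cell dimension of each other.
function shouldMerge(a, b, deltaX, deltaY) {
    const minDist = Math.min(deltaX, deltaY) / 2; // heuristic threshold
    const d = Math.hypot(a.x - b.x, a.y - b.y);
    return d <= minDist;
}
```

A higher threshold gives more merging; the C# code exposes this as the `minDistX`/`minDistY` heuristic.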

The code examples will generate an image similar to above when run out of the box.

Here are the source codes.

Program.cs

```using System;

// By Kunuk Nykjaer
class Program
{
static void Main(string[] args)
{
var algo = new Algorithm();
algo.Run("gridcluster");
algo.GenerateDataString();

Console.WriteLine("press a key ...");
Console.ReadKey();
}
}
```

Algorithm.cs

```using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;
using System.Text;

// By Kunuk Nykjaer
public class Algorithm
{
static readonly CultureInfo Culture_enUS = new CultureInfo("en-US");
const string S = "G";

readonly List<XY> _points = new List<XY>();
List<XY> _clusters = new List<XY>();

public static readonly Random _rand = new Random();
public const double MinX = 10;
public const double MinY = 10;
public const double MaxX = 400;
public const double MaxY = 300;
public const int GridX = 4;
public const int GridY = 3;

const double AbsSizeX = MaxX - MinX;
const double AbsSizeY = MaxY - MinY;

// for bucket placement calc
const double DeltaX = AbsSizeX / GridX;
const double DeltaY = AbsSizeY / GridY;

static private readonly string NL = Environment.NewLine;
private const string ctx = "ctx";

public Algorithm()
{
CreateDataSet();
}

void CreateDataSet()
{
// CREATE DATA SET

//for (int i = 0; i < 100; i++)
//{
//    var x = MinX + _rand.NextDouble() * (AbsSizeX);
//    var y = MinY + _rand.NextDouble() * (AbsSizeY);
//    _points.Add(new XY(x, y));
//}

// Create random region of clusters
for (int i = 0; i < 10; i++)
{
var x = MinX + _rand.NextDouble() * 100;
var y = MinY + _rand.NextDouble() * 100;
_points.Add(new XY(x, y));
}

for (int i = 0; i < 10; i++)
{
var x = MinX + 200 + _rand.NextDouble() * 100;
var y = MinY + 10 + _rand.NextDouble() * 100;
_points.Add(new XY(x, y));
}

for (int i = 0; i < 10; i++)
{
var x = MinX + 50 + _rand.NextDouble() * 100;
var y = MinY + 150 + _rand.NextDouble() * 100;
_points.Add(new XY(x, y));
}
}

public void Run(string cluster)
{
if (cluster == "gridcluster")
{
GridCluster grid = new GridCluster(_points);
_clusters = grid.GetCluster();
}
}

public static string GetRandomColor()
{
int r = _rand.Next(10, 240);
int g = _rand.Next(10, 240);
int b = _rand.Next(10, 240);
return string.Format("'rgb({0},{1},{2})'", r, g, b);
}

public static void SetCentroidForAllBuckets(IEnumerable<Bucket> buckets)
{
foreach (var item in buckets)
{
var bucketPoints = item.Points;
var cp = GetCentroidFromCluster(bucketPoints);
item.Centroid = cp;
}
}

public static string GetId(int idx, int idy)
{
return idx + ";" + idy;
}

static void CreateFile(string s)
{
var path = new FileInfo(@"C:\temp\draw.js");
var isCreated = FileUtil.WriteFile(s, path);
Console.WriteLine(isCreated + " = File is Created");
}

static XY GetCentroidFromCluster(List<XY> list)
{
if (list == null || list.Count == 0)
return null;

var color = GetRandomColor();

XY centroid = new XY(0, 0);
foreach (XY p in list)
{
p.Color = color;
centroid.X += p.X;
centroid.Y += p.Y;
}

int count = list.Count();
centroid.X /= count;
centroid.Y /= count;

var cp = new XY(centroid.X, centroid.Y) { Size = count, Color = color };
return cp;
}

public void GenerateDataString()
{
var sb = new StringBuilder();

// markers
var head = NL + "function drawMarkers(ctx) {" + NL;
var tail = NL + "}" + NL;

// grid
sb.Append("ctx.beginPath();" + NL);
for (int i = 0; i < GridX + 1; i++)
{
sb.Append(String.Format("ctx.moveTo({0}, {1});{2}",
ToStringEN(MinX + i * DeltaX), ToStringEN(MinY), NL));
sb.Append(String.Format("ctx.lineTo({0}, {1});{2}",
ToStringEN(MinX + i * DeltaX), ToStringEN(MaxY), NL));
}
for (int j = 0; j < GridY + 1; j++)
{
sb.Append(String.Format("ctx.moveTo({0}, {1});{2}",
ToStringEN(MinX), ToStringEN(MinY + j * DeltaY), NL));
sb.Append(String.Format("ctx.lineTo({0}, {1});{2}",
ToStringEN(MaxX), ToStringEN(MinY + j * DeltaY), NL));
}
sb.Append("ctx.stroke(); " + NL);

// markers
if (_points != null)
foreach (var p in _points)
sb.Append(string.Format("drawMark({0}, {1}, {2}, {3});{4}",
ToStringEN(p.X), ToStringEN(p.Y), p.Color, ctx, NL));

string clusterInfo = "0";
if (_clusters != null)
{
foreach (var c in _clusters)
sb.Append(string.Format("drawCluster({0}, {1}, {2}, {3}, {4});{5}",
ToStringEN(c.X), ToStringEN(c.Y), c.Color,
c.Size, ctx, NL));

clusterInfo = _clusters.Count + string.Empty;
}
// bottom text

sb.Append("ctx.fillStyle = 'rgb(0,0,0)';" + NL);
sb.Append(string.Format("ctx.fillText('Clusters = ' + {0}, {1}, {2});{3}",
clusterInfo, ToStringEN(MinX + 10), ToStringEN(MaxY + 16), NL));

sb.Append(tail);

CreateFile(sb.ToString());
//Console.WriteLine(sb.ToString());
}

public static string ToStringEN(double d)
{
double rounded = Math.Round(d, 3);
return rounded.ToString(S, Culture_enUS);
}

public class XY
{
public double X { get; set; }
public double Y { get; set; }
public string Color { get; set; }
public int Size { get; set; }
public string XToString { get { return X.ToString(S, Culture_enUS); } }
public string YToString { get { return Y.ToString(S, Culture_enUS); } }
public XY(double x, double y)
{
X = x;
Y = y;
Color = "'rgb(0,0,0)'";//default
Size = 1;
}
}
public class Bucket
{
public string Id { get; private set; }
public List<XY> Points { get; private set; }
public XY Centroid { get; set; }
public int Idx { get; private set; }
public int Idy { get; private set; }
private bool _IsUsed;
public bool IsUsed
{
get { return _IsUsed && Centroid != null; }
set { _IsUsed = value; }
}

public Bucket(int idx, int idy)
{
IsUsed = true;
Centroid = null;
Points = new List<XY>();
Idx = idx;
Idy = idy;
Id = GetId(idx, idy);
}
}

public abstract class ClusterAlgorithm
{
//id, bucket
public readonly Dictionary<string, Bucket> bucketsLookup = new Dictionary<string, Bucket>();
public abstract List<XY> GetCluster();
}
public class KMeans : ClusterAlgorithm
{
//NOT IMPLEMENTED YET
public override List<XY> GetCluster()
{
return null;
}
/*
1) Random k centroids
2) Cluster data by euclidean distance to centroids
3) Update centroids by clustered data,
http://en.wikipedia.org/wiki/Centroid#Of_a_finite_set_of_points
4) Update cluster
5) Continue last two steps until error is small, error is sum of diff
between current and updated centroid
*/
}

public class GridCluster : ClusterAlgorithm
{
public GridCluster(List<XY> dataset)
{
_dataset = dataset;
}

public override List<XY> GetCluster()
{
var cluster = GridClusterAlgo(GridX, GridY);
return cluster;
}

void MergeClustersGrid()
{
foreach (var key in bucketsLookup.Keys)
{
var bucket = bucketsLookup[key];
if (bucket.IsUsed == false)
continue;

var x = bucket.Idx;
var y = bucket.Idy;

// get keys for neighbors
var N = GetId(x, y + 1);
var NE = GetId(x + 1, y + 1);
var E = GetId(x + 1, y);
var SE = GetId(x + 1, y - 1);
var S = GetId(x, y - 1);
var SW = GetId(x - 1, y - 1);
var W = GetId(x - 1, y);
var NW = GetId(x - 1, y + 1);
var neighbors = new[] { N, NE, E, SE, S, SW, W, NW };

MergeClustersGridHelper(key, neighbors);
}
}

void MergeClustersGridHelper(string currentKey, string[] neighborKeys)
{
double minDistX = DeltaX / 2.0;//heuristic
double minDistY = DeltaY / 2.0;
//if clusters in grid are too close to each other, merge them
double minDist = MathTool.Min(minDistX, minDistY);

foreach (var neighborKey in neighborKeys)
{
if (!bucketsLookup.ContainsKey(neighborKey))
continue;

var current = bucketsLookup[currentKey];
var neighbor = bucketsLookup[neighborKey];
if (neighbor.IsUsed == false)
continue;

var dist = MathTool.Distance(current.Centroid, neighbor.Centroid);
if (dist >= minDist)
continue;

// merge
var color = current.Centroid.Color;
foreach (var p in neighbor.Points)
{
//update color
p.Color = color;
}

// recalc centroid
var cp = GetCentroidFromCluster(current.Points);
current.Centroid = cp;
neighbor.IsUsed = false; //merged, then not used anymore
}
}

public List<XY> GridClusterAlgo(int gridx, int gridy)
{
var clusterPoints = new List<XY>();
// params invalid
if (_dataset == null || _dataset.Count == 0 || gridx <= 0 || gridy <= 0)
return clusterPoints;

// put points in buckets
foreach (var p in _dataset)
{
// find relative val
var relativeX = p.X - MinX;
var relativeY = p.Y - MinY;

int idx = (int)(relativeX / DeltaX);
int idy = (int)(relativeY / DeltaY);

// bucket id
string id = GetId(idx, idy);

// existing bucket, add the point
if (bucketsLookup.ContainsKey(id))
bucketsLookup[id].Points.Add(p);
// new bucket, create it and add the point
else
{
Bucket bucket = new Bucket(idx, idy);
bucket.Points.Add(p);
bucketsLookup.Add(id, bucket);
}
}

// calc centroid for all buckets
SetCentroidForAllBuckets(bucketsLookup.Values);

// merge clusters if grid points are too close to each other
MergeClustersGrid();

// collect centroids from the non-empty, still used buckets
foreach (var item in bucketsLookup)
{
var bucket = item.Value;
if (bucket.IsUsed)
clusterPoints.Add(bucket.Centroid);
}
return clusterPoints;
}
}

public class MathTool
{
private const double Exp = 2; // 2=euclid, 1=manhattan
//Minkowski dist
public static double Distance(Algorithm.XY a, Algorithm.XY b)
{
return Math.Pow(Math.Pow(Math.Abs(a.X - b.X), Exp) +
Math.Pow(Math.Abs(a.Y - b.Y), Exp), 1.0 / Exp);
}

public static double Min(double a, double b)
{
return a <= b ? a : b;
}
}

public static class FileUtil
{
public static bool WriteFile(string data, FileInfo fileInfo)
{
bool isSuccess = false;
try
{
using (StreamWriter streamWriter = File.CreateText(fileInfo.FullName))
{
streamWriter.Write(data);
isSuccess = true;
}
}
catch (Exception ex)
{
Console.WriteLine(ex.StackTrace + "\nPress a key ... ");
Console.ReadKey();
}

return isSuccess;
}
}
```
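The C# listing above is long, so here is a compact JavaScript sketch of the same grid-clustering idea. The function and parameter names (`gridCluster`, `gridX`, `gridY`, etc.) are mine, and the merge step tests all centroid pairs instead of only the 8 neighboring cells, so treat this as an illustration of the technique rather than a port:

```javascript
// Minimal grid-clustering sketch: hash points into cells, take the mean
// of each non-empty cell, then merge centroids closer than half a cell.
function gridCluster(points, minX, minY, maxX, maxY, gridX, gridY) {
    var deltaX = (maxX - minX) / gridX;
    var deltaY = (maxY - minY) / gridY;
    var buckets = {}; // cell id "idx;idy" -> list of points

    // 1) put every point in its bucket by integer division
    points.forEach(function (p) {
        var idx = Math.floor((p.x - minX) / deltaX);
        var idy = Math.floor((p.y - minY) / deltaY);
        var id = idx + ";" + idy;
        (buckets[id] = buckets[id] || []).push(p);
    });

    // 2) centroid of each non-empty bucket = mean of its points
    var centroids = [];
    Object.keys(buckets).forEach(function (id) {
        var pts = buckets[id];
        var cx = pts.reduce(function (s, p) { return s + p.x; }, 0) / pts.length;
        var cy = pts.reduce(function (s, p) { return s + p.y; }, 0) / pts.length;
        centroids.push({ x: cx, y: cy, size: pts.length });
    });

    // 3) merge centroids closer than half a cell; the real code only
    //    tests the 8 surrounding cells, here all pairs for brevity
    var minDist = Math.min(deltaX, deltaY) / 2;
    for (var i = 0; i < centroids.length; i++) {
        for (var j = i + 1; j < centroids.length; j++) {
            var a = centroids[i], b = centroids[j];
            if (b.size === 0) continue; // already merged away
            var d = Math.sqrt(Math.pow(a.x - b.x, 2) +
                              Math.pow(a.y - b.y, 2));
            if (d < minDist) {
                // weighted merge of b into a
                var n = a.size + b.size;
                a.x = (a.x * a.size + b.x * b.size) / n;
                a.y = (a.y * a.size + b.y * b.size) / n;
                a.size = n;
                b.size = 0; // mark b as merged
            }
        }
    }
    return centroids.filter(function (c) { return c.size > 0; });
}
```

The `minDist = min(deltaX, deltaY) / 2` threshold is the same heuristic used in MergeClustersGridHelper above: two grid centroids closer than half a cell are collapsed into one.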

Put both the html file and the javascript file in the c:\temp folder.
The C# program writes a new draw.js file to the c:\temp folder.

Open the html file in a modern browser that supports canvas, such as Firefox or Chrome.
Out of the box, the html file displays the same result as the example image above.

canvas.html

```
<html>
<title>Cluster test</title>
<script type="text/javascript">
var G = {
yellow: "#FF0",
red: "#F00",
green: "#0F0",
blue: "#00F",
black: "#000"
}

window.onload = function () {
var canvas = document.getElementById('canvas');
if (canvas.getContext) {
var ctx = canvas.getContext('2d');
ctx.strokeStyle = G.black;
ctx.font = "10pt Arial";
drawMarkers(ctx); //use fn defined in draw.js
}
}

function drawMark(cx, cy, color, ctx) {
//draw a circle
var radius = 4; //marker radius
ctx.beginPath();
ctx.arc(cx, cy, radius, 0, Math.PI * 2, true);
ctx.closePath();
ctx.fillStyle = color;
ctx.fill();
ctx.stroke();
}

function drawCluster(cx, cy, color, n, ctx) {
//draw a bigger circle
var radius = 10; //cluster radius, bigger than a marker
ctx.beginPath();
ctx.arc(cx, cy, radius, 0, Math.PI * 2, true);
ctx.closePath();
ctx.stroke();
ctx.fillStyle = color;
ctx.fillText("" + n, cx-8, cy+2);
}
</script>
<script type="text/javascript" src="draw.js"></script>

<style type="text/css">
body{ margin-left: 10px; margin-top:10px;}
canvas  {  border: 1px solid red; }
</style>
<body>
<canvas id="canvas" width="410" height="330">
</canvas>
</body>
</html>
```

The javascript file will be overwritten when you run the C# code.

draw.js

```
function drawMarkers(ctx) {
// By Kunuk Nykjaer
ctx.beginPath();
ctx.moveTo(10, 10);
ctx.lineTo(10, 300);
ctx.moveTo(107.5, 10);
ctx.lineTo(107.5, 300);
ctx.moveTo(205, 10);
ctx.lineTo(205, 300);
ctx.moveTo(302.5, 10);
ctx.lineTo(302.5, 300);
ctx.moveTo(400, 10);
ctx.lineTo(400, 300);
ctx.moveTo(10, 10);
ctx.lineTo(400, 10);
ctx.moveTo(10, 106.667);
ctx.lineTo(400, 106.667);
ctx.moveTo(10, 203.333);
ctx.lineTo(400, 203.333);
ctx.moveTo(10, 300);
ctx.lineTo(400, 300);
ctx.stroke();
drawMark(24.566, 48.809, 'rgb(197,120,199)', ctx);
drawMark(63.55, 22.318, 'rgb(197,120,199)', ctx);
drawMark(36.17, 66.005, 'rgb(197,120,199)', ctx);
drawMark(30.083, 23.226, 'rgb(197,120,199)', ctx);
drawMark(96.758, 53.378, 'rgb(197,120,199)', ctx);
drawMark(97.994, 13.421, 'rgb(197,120,199)', ctx);
drawMark(91.573, 15.839, 'rgb(197,120,199)', ctx);
drawMark(59.451, 23.006, 'rgb(197,120,199)', ctx);
drawMark(63.699, 81.094, 'rgb(197,120,199)', ctx);
drawMark(21.665, 15.711, 'rgb(197,120,199)', ctx);
drawMark(298.417, 77.83, 'rgb(11,24,61)', ctx);
drawMark(291.38, 86.759, 'rgb(11,24,61)', ctx);
drawMark(281.345, 95.517, 'rgb(11,24,61)', ctx);
drawMark(284.307, 92.13, 'rgb(11,24,61)', ctx);
drawMark(240.746, 106.23, 'rgb(11,24,61)', ctx);
drawMark(247.004, 85.546, 'rgb(11,24,61)', ctx);
drawMark(291.715, 33.158, 'rgb(11,24,61)', ctx);
drawMark(216.012, 33.45, 'rgb(11,24,61)', ctx);
drawMark(260.559, 46.371, 'rgb(11,24,61)', ctx);
drawMark(239.061, 103.01, 'rgb(11,24,61)', ctx);
drawMark(67.791, 210.702, 'rgb(111,163,157)', ctx);
drawMark(146.852, 211.327, 'rgb(111,163,157)', ctx);
drawMark(82.629, 164.594, 'rgb(106,75,235)', ctx);
drawMark(71.372, 244.351, 'rgb(111,163,157)', ctx);
drawMark(110.755, 255.688, 'rgb(111,163,157)', ctx);
drawMark(144.035, 185.499, 'rgb(96,173,79)', ctx);
drawMark(116.13, 217.092, 'rgb(111,163,157)', ctx);
drawMark(112.972, 216.736, 'rgb(111,163,157)', ctx);
drawMark(117.428, 246.791, 'rgb(111,163,157)', ctx);
drawMark(100.972, 210.653, 'rgb(111,163,157)', ctx);
drawCluster(58.551, 36.281, 'rgb(197,120,199)', 10, ctx);
drawCluster(265.055, 76, 'rgb(11,24,61)', 10, ctx);
drawCluster(105.534, 226.668, 'rgb(111,163,157)', 8, ctx);
drawCluster(82.629, 164.594, 'rgb(106,75,235)', 1, ctx);
drawCluster(144.035, 185.499, 'rgb(96,173,79)', 1, ctx);
ctx.fillStyle = 'rgb(0,0,0)';
ctx.fillText('Clusters = ' + 5, 20, 316);
}

```
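The grid-line coordinates in the generated file follow directly from the canvas bounds and the grid size. A quick sanity check (the bounds MinX=10, MaxX=400, MinY=10, MaxY=300 and the 4x3 grid are read off the draw.js listing above, not taken from the C# source):

```javascript
// Recompute the grid-line positions that drawMarkers draws.
var minX = 10, maxX = 400, minY = 10, maxY = 300;
var gridX = 4, gridY = 3;
var deltaX = (maxX - minX) / gridX; // 97.5
var deltaY = (maxY - minY) / gridY; // ~96.667

var xLines = [], yLines = [];
for (var i = 0; i <= gridX; i++)
    xLines.push(+(minX + i * deltaX).toFixed(3));
for (var j = 0; j <= gridY; j++)
    yLines.push(+(minY + j * deltaY).toFixed(3));

console.log(xLines); // [10, 107.5, 205, 302.5, 400]
console.log(yLines); // [10, 106.667, 203.333, 300]
```

These match the moveTo/lineTo coordinates emitted by the C# grid-drawing loop, including the rounding to 3 decimals done by ToStringEN.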

Written by Kunuk Nykjaer

September 15, 2011 at 11:24 pm