# Software Programming

Kunuk Nykjaer

## K-nearest neighbor

There are various algorithms for searching for nearest neighbors.

An alternative method is to use a grid indexing strategy.
Slowly expand the grid boxes from the center to find the k-nearest neighbors.
The grid is used as a filtering mechanism to reduce the search space.

This is fast for data distributed across the space and is a good alternative algorithm for dynamic data that changes position frequently. The origin lies in ring 0.
The origin can be anywhere inside its box, so the distance tested on the first iteration must be at least 1 x the grid distance. The first iteration therefore starts with the data from rings 0 and 1.

The algorithm goes like this:

```
i = 0
Point origin
list currRing = empty
list nextRing = empty
list temp

while all rings not explored
    i = i + 1
    temp = empty
    for all points in nextRing, test distance between origin and point
        if distance is within i * grid, put it in currRing
        else put it in temp

    nextRing = empty
    add all from temp to nextRing

    temp = empty
    if (i == 1) temp = points from ring 0 and ring 1
    else temp = points from ring i

    for all points in temp, test distance between origin and point
        if distance is within i * grid, put it in currRing
        else put it in nextRing

    if there are at least k points in currRing then exit while loop
end while loop

if currRing count < k then add all from nextRing to currRing

sort currRing by distance
take first k from currRing
```

Grid version algorithm:
Apply the algorithm described above.
Searching: `O(n * m)`, where m is the number of grid cells and k << n.
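The ring expansion above can be sketched in Python. The uniform cell size, the function names, and the exact stopping rule are my own assumptions for illustration, not the original implementation:

```python
import math
from collections import defaultdict

def build_grid(points, cell):
    """Bucket 2D points by integer cell coordinates (uniform grid)."""
    grid = defaultdict(list)
    for x, y in points:
        grid[(int(x // cell), int(y // cell))].append((x, y))
    return grid

def knn_grid(points, origin, k, cell=1.0):
    """Ring-expanding k-nearest-neighbor search over a grid index."""
    grid = build_grid(points, cell)
    cx, cy = int(origin[0] // cell), int(origin[1] // cell)
    collected = []
    ring = 0
    while True:
        # gather the cells whose Chebyshev distance to the origin cell is `ring`
        for dx in range(-ring, ring + 1):
            for dy in range(-ring, ring + 1):
                if max(abs(dx), abs(dy)) == ring:
                    collected.extend(grid.get((cx + dx, cy + dy), ()))
        # points within ring * cell of the origin are guaranteed hits:
        # no unexplored cell can contain anything closer
        certain = sum(1 for p in collected if math.dist(p, origin) <= ring * cell)
        if certain >= k or len(collected) == len(points):
            break
        ring += 1
    collected.sort(key=lambda p: math.dist(p, origin))
    return collected[:k]
```

The early exit is safe because any point in an unexplored cell is at least `ring * cell` away from the origin, so it cannot beat the k certain hits.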

Naive version algorithm:
For the origin point, check the distance to every other point.
Take the first k items sorted by distance.
Searching: `O(n log n)` (dominated by the sort)
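The naive version is a one-liner in Python (a hedged sketch with hypothetical names):

```python
import math

def knn_naive(points, origin, k):
    # check the distance from the origin to every point, sort, take the first k
    return sorted(points, key=lambda p: math.dist(p, origin))[:k]
```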

Written by Kunuk Nykjaer

January 21, 2013 at 9:36 pm

## Single detection

Here I define single detection as a data point that has at least a certain minimum distance to every other data point.
It can be used for anomaly or outlier detection in a data set. It can also be used for collision detection.

Given a list of points in 2D, how can you find the ones that are not close to any other point?

I will include my Big O notation calculations and some test runs of the implementations.

There is a naive implementation in `O(n^2)`.
There are also better alternatives which run faster than the naive version.

First generate a grid of squares like this and put all your data in the grid according to their positions. In the figure above there is a single item, illustrated with a red dot inside the blue box.

For anomaly detection, iterate the grid boxes and find those which have a single item inside. Those items are candidates for being anomalies. If there are multiple items inside the blue box, then none of them can be a candidate because of the distance constraint. The candidate can be anywhere inside the box, thus the outer circle shows the area that must be examined. If there is no other item inside the outer circle, then we know the candidate is an anomaly. In the worst case, 20 boxes must be examined, as illustrated above.

Euclidean distance is used.
The relation between the max distance and the grid square distance can be calculated using Pythagoras.

Max distance = c
Grid box distance = a

`c^2 = a^2 + a^2 => a = sqrt(c^2 / 2)`

Naive version algorithm:
For each point, check the distance to every other point.
Searching: `O(n^2)`

Grid version algorithm:
Insert all the items into the grid buckets.
Take the neighbor items of the candidate item’s grid box. You will select items from at most 20 boxes in the worst case. Then iterate those items and test the distances. If all the distances are above the maximum allowed, the candidate is detected as an anomaly.

This algorithm is much better than the naive version; it runs very fast and is relatively simple to implement. Each grid box is a bucket of items; use a hash table for each bucket.
Inserting the items into the grid takes `O(n)`. Removing and inserting a single item takes `O(1)`.
Searching for anomalies takes `O(m * n)`.

If there are A anomalies in the data set and m grid boxes, this algorithm runs in:
Initialization: `O(n + m)` + Searching: `O(m * A * n/A)` => `O(m * n)`
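A minimal Python sketch of the grid version, using the cell size from the Pythagoras relation above. The function names and the 5x5 neighborhood scan are my own assumptions (the post's "20 boxes" excludes the 4 corner cells and the candidate's own cell; scanning the full 5x5 superset is simply safe):

```python
import math
from collections import defaultdict

def single_detection(points, max_dist):
    """Grid-based single (anomaly) detection: a point is an anomaly if
    no other point lies within max_dist of it."""
    # Pythagoras: cell = sqrt(max_dist^2 / 2), so two points sharing a
    # cell are always within max_dist of each other
    cell = math.sqrt(max_dist ** 2 / 2)
    grid = defaultdict(list)
    for p in points:
        grid[(int(p[0] // cell), int(p[1] // cell))].append(p)

    anomalies = []
    for (ix, iy), bucket in grid.items():
        if len(bucket) != 1:
            continue  # several points share the box: none can be an anomaly
        candidate = bucket[0]
        # the 5x5 block of cells covers the outer circle of radius
        # max_dist around any position inside the candidate's cell
        alone = True
        for dx in range(-2, 3):
            for dy in range(-2, 3):
                for q in grid.get((ix + dx, iy + dy), ()):
                    if q != candidate and math.dist(candidate, q) <= max_dist:
                        alone = False
        if alone:
            anomalies.append(candidate)
    return anomalies
```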

K-d tree with nearest neighbor algorithm:
Inserting all the items into the grid buckets takes `O(n)`.
Also build a K-d tree data structure. For each candidate item found in a single-item box, run a first-nearest-neighbor query. If the distance is above the maximum allowed, the candidate is detected as an anomaly.

It takes `O(n log n)` time to build the K-d tree. A nearest neighbor search takes `O(log n)`.
Removing and inserting an item takes `O(log n)`. If there are A anomalies in the data set and m grid boxes, then searching for anomalies takes `O(m * A * log n)`.

This algorithm will run in:
Initialization: `O(m + n * log n)` + Searching: `O(m * A * log n)`

Test results

In my test runs with randomly distributed data, the grid version performs best. The naive version is very slow. The K-d tree version is much faster than the naive version.

I found this C# K-d tree implementation with nearest neighbor search, which I used for testing the algorithm.

Sample picture: the test runs measure the time to detect the anomalies, i.e. the searching running time.
The data structure initialization is not included.
For n = 1.000.000, building the K-d tree took about 30 seconds;
initializing the grid and naive versions took about 2 seconds.

```
Random distributed, n = 100
Algorithm:     Milli seconds:   Single detection found:
NeighborGrid       12                   40
K-d Tree          128                   40
Naive               2                   40
```

```
Random distributed, n = 10.000
Algorithm:     Milli seconds:   Single detection found:
NeighborGrid       141                  8384
K-d Tree          2543                  8384
Naive            36421                  8384
```

```
Random distributed, n = 20.000
Grid: 4286 x 3572
MaxDistance: 0.2

Algorithm:     Milli seconds:   Single detection found:
NeighborGrid       1434                   18682
K-d Tree           5878                   18682
Naive             129401                  18682
```

```
Random distributed, n = 30.000
Grid: 1705 x 1421
MaxDistance: 0.5

Algorithm:     Milli seconds:   Single detection found:
NeighborGrid        521                     27100
K-d Tree            7936                    27100
Naive               slow
```

The grid is smaller here, thus the NeighborGrid runs fast.

```
Random distributed, n = 50.000
Grid: 4286 x 3572
MaxDistance: 0.2

Algorithm:     Milli seconds:   Single detection found:
NeighborGrid        1472                   42268
K-d Tree           13361                   42268
Naive              very slow
```

```
Random distributed, n = 1.000.000
Grid: 1705 x 1421
MaxDistance: 0.5

Algorithm:     Milli seconds:   Single detection found:
NeighborGrid           983              35824
K-d Tree             23966              35824
Naive              forever
```

Conclusion
These are just sample runs on randomly distributed data.
If the number of anomalies were very high and the data set were different, then the K-d tree algorithm might run faster. I would guess that such a data set would have to be very specific and is not likely to be a common scenario. I have given my best estimated Big O running time analysis for the algorithms.

The algorithms run fast because of the single-item grid box test. First they iterate the grid and skip every grid box except those that have a single item inside. If you expect few anomalies in a large data set, this runs very fast because most grid boxes are skipped without further examination. Thus the running time for the anomaly search is closer to `O(m)` for `A << n`.

If you are working with dynamic data where positions change, then the grid algorithm will run faster because of its `O(1)` operations. With a K-d tree you will have to rebuild the tree occasionally, and delete/update takes `O(log n)`.

Sample picture for n = 100

Written by Kunuk Nykjaer

January 13, 2013 at 8:14 am

## Google Maps Server-side Clustering with Asp.net C#

• This project is available at Github
• Original project location is available at Google Code

For a project in my work I made a server-side clustering of data to be displayed in Google Maps.

The clustering is grid-based, using geo-stationary grid placement.
I was impressed by the performance of the solution from Maptimize.
They have this example website, Crunch Panorama.

I wanted to make something similar and have made an example project of server-side clustering. The demo can be seen here Google Maps Clustering. The grid has been included in the view for demonstration. Clusters that are close to each other are merged.

The search options are included via example code from Cibul Tech Blog. Here is an overview of how the viewport works.

### Client-server Ajax communication demonstration

This Google Maps clustering project uses two web service methods: GetMarkers and GetMarkerDetail.
The GetMarkers method returns the marker and cluster information, and the GetMarkerDetail method returns specific information for a selected marker.

GetMarkers method:
The client sends the boundary of the screen using north-east and south-west latlon points and the zoom level.
Client json get markers request:

```
{"nelat":"62.511244050310154","nelon":"30.008460312500006",
"swlat":"47.574956142655","swlon":"-5.235680312499994","zoomlevel":"5"}
```

The server does the server-side clustering based on the information from the client and returns only the data that is to be displayed. Here there are two markers with latlon information. When the count (the C) is larger than 1 it is a cluster marker, otherwise it is a details marker. The details marker has the I and T information set: the I is the id and the T is the type information.

```
{"Points":[
{"Y":"56.05777","X":"9.30294","C":1624,"I":"","T":""},
{"Y":"55.09446","X":"15.11401","C":"1","I":"aa005c3b-9301-4040-bac3-3dee8cb29b48","T":"2"}
]}
```

The client examines the reply, removes markers that should no longer be visible, and only adds new markers to the screen. The latlon and count information are used to build a key for identifying unique marker objects.
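A small Python sketch of that bookkeeping. The key format and the `on_screen` dictionary are my own assumptions for illustration; the original client code is not shown here:

```python
def marker_key(marker):
    """Identity key built from latlon and count, so a cluster that keeps
    the same position and count between requests is treated as the same
    on-screen marker."""
    return "{0};{1};{2}".format(marker["Y"], marker["X"], marker["C"])

reply = {"Points": [
    {"Y": "56.05777", "X": "9.30294", "C": 1624, "I": "", "T": ""},
    {"Y": "55.09446", "X": "15.11401", "C": "1",
     "I": "aa005c3b-9301-4040-bac3-3dee8cb29b48", "T": "2"},
]}

wanted = {marker_key(m) for m in reply["Points"]}
on_screen = {}  # key -> marker currently displayed
# remove markers that should no longer be visible ...
for key in list(on_screen):
    if key not in wanted:
        del on_screen[key]
# ... and only add markers that are new
for m in reply["Points"]:
    on_screen.setdefault(marker_key(m), m)
```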

GetMarkerDetail method:
The client sends a details request for a specific marker using an id-information.
Client json get marker detail request:

```
{"id":"aa005c3b-9301-4040-bac3-3dee8cb29b48"}
```

The server replies by giving content for the specific marker.
Server json show marker detail reply:

```
{"Content":"CONTENT INFORMATION HERE"}
```

### Math information

The centroid calculation is based on the circular mean. This is because the longitude wraps around at -180 and 180 degrees. There is no wrapping of latitude with Google Maps (I haven’t seen or noticed it).

The distances are calculated using Euclidean distance. This calculation is fast and it is good enough for grouping the points into clusters.

You usually need at most 6 decimals of precision for latlon with Google Maps. This gives precision within 1 meter.

The grid size is fixed and adapts to the zoom level.
One zoom step in Google Maps doubles or halves the latlon window range.
The geo-stationary grid-placement indexes are computed using division and modulus.
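A sketch of how such a geo-stationary index can be computed. The base window of 360 degrees and the exact halving rule are my assumptions based on the description above, not the project's actual constants:

```python
def grid_index(lat, lon, zoom_level, base_size=360.0):
    """Cell edges sit at fixed world coordinates for a given zoom level,
    so the grid does not shift when the viewport pans; one zoom step
    halves the cell size, matching the halved latlon window."""
    cell = base_size / (2 ** zoom_level)
    return int(lon // cell), int(lat // cell)
```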

Time complexity is on average `O(m*n)` and space complexity is `O(m*n)`,
where n is the total number of points and m is the number of grid cells returned to the client.
You are welcome to analyse it yourself and correct me. I use a hashset and a hashmap to minimize the time complexity. For a fast enough response time (below 1 sec) the number of markers should be at most 300.000.
Time complexity is ~ `O(n^2)` in the worst case, but that is extremely unlikely;
it happens if most centroids are merged with neighbor centroids.

The DB in this project is a simple file with a list of points, which is loaded into memory. The algorithm uses linear search and can (probably) be improved using a spatial data structure.

Written by Kunuk Nykjaer

November 5, 2011 at 6:46 pm

## MarkerClusterer with C# example and html-canvas (part 3)

This is part 3 of my topic on point clustering.
I will look at the marker clustering algorithm.

This is a follow-up to my previous post:
https://kunuk.wordpress.com/2011/09/17/clustering-k-means-and-grid-with-c-example-and-html-canvas-part-2/

The MarkerClusterer algorithm is simple. It iterates the points and puts them in boxes.
The first item is put in a box. Each following item is only put in a box if it is within a specified distance from an existing box; otherwise a new box is created and the item is put in the new box.

The algorithm is also explained here:

My implementation’s running time is O(k*n), where k is the number of boxes and n is the number of points.
(Please leave a comment if I am wrong.)
It is possible to manipulate the running time by setting the box size: with a bigger box size, k will be smaller and the running time faster.
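The two passes can be sketched in Python. The function names and the tuple representation are my own; the C# listing at the bottom of this post is the actual implementation:

```python
import math

def marker_cluster(points, box_size):
    """First pass: a point joins an existing box if it falls within the
    box area around that box's center; otherwise it opens a new box.
    Second pass: every point is assigned to the nearest cluster, O(k*n)."""
    half = box_size / 2.0
    centers = []
    for p in points:
        if not any(abs(p[0] - c[0]) <= half and abs(p[1] - c[1]) <= half
                   for c in centers):
            centers.append(p)
    clusters = {c: [] for c in centers}
    for p in points:
        nearest = min(centers, key=lambda c: math.dist(p, c))
        clusters[nearest].append(p)
    return clusters
```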

Imagine you have a set of points like this, and you are interested in clustering to avoid overlapping points and to reduce the visual overload. Here’s an example of how marker clustering works.

Every point has been put in a box by running the algorithm defined above. The box size is user-defined.
The distance between cluster points is at minimum boxsize / 2.
Thus we don’t need to consider merging clusters in case they are too close to each other and overlap.

Here the box area is removed for a cleaner look. Notice that the points are not all inside the nearest box. This usually happens, but there is a simple fix, explained later, if needed. And here the points are removed for an even cleaner look. Now there are only cluster points with numbers that tell how many points they contain.

As the last step in the algorithm, all the points are iterated and compared to the existing cluster points. They are assigned to the nearest cluster. This takes O(k*n). The cluster points are located at the centers of the boxes.

Here you can see an example of how it looks, although this image is for another dataset. Notice at the top left corner there is one red point inside the green box, which is not inside its ‘correct’ box. This is not an error: the nearest cluster point for the red point is the red box, because the distances are calculated by comparing the centers of the boxes to the points. If needed it can also be ‘fixed’, but that is usually not necessary.

Alternatively I could also have used a distance-based approach, where you use circles instead of boxes. This is explained in the google code reference link at the top, and I will skip the description of the algorithm because it is very similar to the marker clusterer. But I have included the source code for the distance clustering at the bottom. Here’s an image of how that clustering would look.

Here are some performance tests for the marker clusterer algorithm executed from my laptop:
about 0.15 sec for 10.000 points.
about 7.5 sec for 1.000.000 points where k = 4. With a bigger box area which can contain all the points, i.e. k = 1, the performance is about 4.8 sec for 1.000.000 points and 0.4 sec for 100.000 points.

Summary
For the task of clustering overlapping points, comparing the 4 clustering algorithms (grid, k-means, marker and distance): I would usually prefer grid, marker or distance clustering, as they have a fast running time O(k*n) and you can easily manipulate the running time by adjusting the k-value through the grid size, box size or distance size. K-means can have a poor running time depending on your dataset and max_error level, and it will run slower than the other 3 algorithms for large n.

This is my bundle file that contains the 4 clustering algorithms (grid, k-means, marker, distance) in one package.
The other two clustering algorithms (grid and k-means) are explained in my previous post.
Out of the box, the C# file runs marker clustering with results similar to the images above. It generates a draw.js file with canvas drawing information used by the canvas.html file.
All the classes are put in one file for simple distribution. Refactor as you like.

AlgorithmClusterBundle.cs

```
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;
using System.Text;

namespace ClusterBundle
{
// By Kunuk Nykjaer
// Clustering package, Grid clustering, MarkerClusterer and auto increasing K-means clustering
// should split all class to each file and refactor all global const away
// quick source code bundle version
// Version 0.4
//
// for visualization this generates a draw.js file which is used by the canvas.html file
// the canvas.html is distributed elsewhere, e.g. at my blog at https://kunuk.wordpress.com
public class AlgorithmClusterBundle
{
public static DateTime Starttime;
static void Main(string[] args)
{
Starttime = DateTime.Now;
var algo = new AlgorithmClusterBundle();

// run clustertype
algo.Run(clusterType);
algo.GenerateJavascriptDrawFile(clusterType); //create draw.js

var timespend = DateTime.Now.Subtract(Starttime).TotalSeconds;
Console.WriteLine(timespend + " sec. press a key ...");
Console.ReadKey();
}

//------------------------------
// USER CONFIG, CUSTOMIZE BELOW
public const ClusterType clusterType = ClusterType.MarkerClusterer;
public static readonly int N = 100; //number of points

// use centroid or semi-centroid cluster point placement visualization?
public const bool DoUpdateAllCentroidsToNearestContainingPoint = false;
public const bool UseProfiling = false; //debug, output time spend

// window frame boundary, aka viewport
public const double MinX = 10;
public const double MinY = 10;
public const double MaxX = 400;
public const double MaxY = 300;
// where the files are saved or loaded
public const string FolderPath = @"c:\temp\";

// K-MEANS config
// heuristic, set tolerance for cluster density, has effect on running time.
// Set high for many points in dataset, can be lower for fewer points
public const double MAX_ERROR = 50;

// MARKERCLUSTERER config
// box area i.e. cluster size
public const int MARKERCLUSTERER_SIZE = 100;

// DISTANCECLUSTERER config
// radius size i.e. cluster size
public const int DISTANCECLUSTERER_SIZE = 62;

// GRIDCLUSTER config
// grid size
public const int GridX = 7;
public const int GridY = 6;
public const bool DoMergeGridIfCentroidsAreCloseToEachOther = true;
// USER CONFIG, CUSTOMIZE ABOVE
//------------------------------

public enum ClusterType { Unknown = -1, Grid, KMeans, MarkerClusterer, DistanceClusterer } ;
static readonly CultureInfo Culture_enUS = new CultureInfo("en-US");
const string S = "G";

readonly List<XY> _points = new List<XY>();
List<XY> _clusters = new List<XY>();

public static readonly Random Rand = new Random();

const double AbsSizeX = MaxX - MinX;
const double AbsSizeY = MaxY - MinY;

// for bucket placement calc, grid cluster algo
const double DeltaX = AbsSizeX / GridX;
const double DeltaY = AbsSizeY / GridY;

static private readonly string NL = Environment.NewLine;
private const string ctx = "ctx"; // javascript canvas
public bool DisplayGridInCanvas = false; //autoset by used cluster type

public AlgorithmClusterBundle()
{
CreateTestDataSet(N); // create points in memory
//SaveDataSetToFile(_points); // save test points to file
}

// dictionary lookup key used by grid cluster algo
public static string GetId(int idx, int idy) //O(1)
{
return idx + ";" + idy;
}

// save points from mem to file
private const string DatasetSerializeName = "dataset.ser";
static void SaveDataSetToFile(List<XY> dataset)
{
var objectToSerialize = new DatasetToSerialize { Dataset = dataset };
new Serializer().SerializeObject(FolderPath + DatasetSerializeName, objectToSerialize);
}
// load points from file to mem
static List<XY> LoadDataSetFromFile()
{
var objectToSerialize = (DatasetToSerialize)
(new Serializer().DeSerializeObject(FolderPath + DatasetSerializeName));
return objectToSerialize.Dataset;
}

void CreateTestDataSet(int n) //O(n)
{
// CREATE DATA SET

// Create random scattered points
//for (int i = 0; i < n; i++)
//{
//    var x = MinX + Rand.NextDouble() * (AbsSizeX);
//    var y = MinY + Rand.NextDouble() * (AbsSizeY);
//    _points.Add(new XY(x, y));
//}

// Create random regions of clusters
for (int i = 0; i < n / 3; i++)
{
var x = MinX + 50 + Rand.NextDouble() * 100;
var y = MinY + 50 + Rand.NextDouble() * 100;
_points.Add(new XY(x, y)); // bug fix: actually store the point
}

for (int i = 0; i < n / 3; i++)
{
var x = MinX + 260 + Rand.NextDouble() * 100;
var y = MinY + 110 + Rand.NextDouble() * 100;
_points.Add(new XY(x, y));
}

for (int i = 0; i < n / 3; i++)
{
var x = MinX + 100 + Rand.NextDouble() * 100;
var y = MinY + 200 + Rand.NextDouble() * 100;
_points.Add(new XY(x, y));
}
}

public void Run(ClusterType clustertype)
{
switch (clustertype)
{
case ClusterType.Grid:
_clusters = new GridCluster(_points).GetCluster();
DisplayGridInCanvas = true;
break;
case ClusterType.KMeans:
_clusters = new KMeans(_points).GetCluster();
break;
case ClusterType.MarkerClusterer:
_clusters = new MarkerClusterer(_points).GetCluster();
break;
case ClusterType.DistanceClusterer:
_clusters = new DistanceClusterer(_points).GetCluster();
break;
case ClusterType.Unknown:
break;
}
}

public static string GetRandomColor()
{
int r = Rand.Next(10, 250);
int g = Rand.Next(10, 250);
int b = Rand.Next(10, 250);
return string.Format("'rgb({0},{1},{2})'", r, g, b);
}

static void CreateFile(string s)
{
var path = new FileInfo(FolderPath + "draw.js");
var isCreated = FileUtil.WriteFile(s, path);
Console.WriteLine(isCreated + " = File is Created");
}
public void GenerateJavascriptDrawFile(ClusterType clusterType)
{
var sb = new StringBuilder();

// markers
var head = "function drawMarkers(ctx) {" + NL;
var tail = NL + "}" + NL;

// grid
if (DisplayGridInCanvas)
{
sb.Append("ctx.beginPath();" + NL);
for (int i = 0; i < GridX + 1; i++)
{
sb.Append(String.Format("ctx.moveTo({0}, {1});{2}",
ToStringEN(MinX + i * DeltaX),
ToStringEN(MinY), NL));
sb.Append(String.Format("ctx.lineTo({0}, {1});{2}",
ToStringEN(MinX + i * DeltaX),
ToStringEN(MaxY), NL));
}
for (int j = 0; j < GridY + 1; j++)
{
sb.Append(String.Format("ctx.moveTo({0}, {1});{2}",
ToStringEN(MinX),
ToStringEN(MinY + j * DeltaY), NL));
sb.Append(String.Format("ctx.lineTo({0}, {1});{2}",
ToStringEN(MaxX),
ToStringEN(MinY + j * DeltaY), NL));
}
sb.Append("ctx.stroke(); " + NL);
}

// if too many points, the canvas cannot be drawn or is slow,
// use max points and clusters for drawing
const int max = 10000;

// markers
if (_points != null)
{
sb.Append(NL);
for (int i = 0; i < _points.Count; i++)
{
var p = _points[i];
sb.Append(string.Format("drawMark({0}, {1}, {2}, {3});{4}",
ToStringEN(p.X), ToStringEN(p.Y), p.Color, ctx, NL));
if (i > max)
break;
}
}
string clusterInfo = "0";
if (_clusters != null)
{
sb.Append(NL);

if (clusterType == ClusterType.Grid || clusterType == ClusterType.KMeans)
{
for (int i = 0; i < _clusters.Count; i++)
{
var c = _clusters[i];
sb.Append(string.Format("drawCluster({0}, {1}, {2}, {3}, {4});{5}",
ToStringEN(c.X), ToStringEN(c.Y), c.Color,
c.Size, ctx, NL));

if (i > max)
break;
}
}

else if (clusterType == ClusterType.MarkerClusterer)
{
for (int i = 0; i < _clusters.Count; i++)
{
var c = _clusters[i];
sb.Append(string.Format(
"drawMarkerCluster({0}, {1}, {2}, {3}, {4}, {5});{6}",
ToStringEN(c.X), ToStringEN(c.Y), c.Color,
c.Size, MARKERCLUSTERER_SIZE, ctx, NL));

if (i > max)
break;
}
}
else if (clusterType == ClusterType.DistanceClusterer)
{
for (int i = 0; i < _clusters.Count; i++)
{
var c = _clusters[i];
sb.Append(string.Format(
"drawDistanceCluster({0}, {1}, {2}, {3}, {4}, {5});{6}",
ToStringEN(c.X), ToStringEN(c.Y), c.Color,
c.Size, DISTANCECLUSTERER_SIZE, ctx, NL));

if (i > max)
break;
}
}

clusterInfo = _clusters.Count + string.Empty;
}

// bottom text
sb.Append("ctx.fillStyle = 'rgb(0,0,0)';" + NL);
sb.Append(string.Format("ctx.fillText('Clusters = ' + {0}, {1}, {2});{3}",
clusterInfo, ToStringEN(MinX + 10), ToStringEN(MaxY + 20), NL));

sb.Append(tail);
CreateFile(sb.ToString());
//Console.WriteLine(sb.ToString());
}
public static string ToStringEN(double d)
{
double rounded = Math.Round(d, 3);
return rounded.ToString(S, Culture_enUS);
}
public static void Profile(string s)//O(1)
{
if (!UseProfiling)
return;
var timespend = DateTime.Now.Subtract(Starttime).TotalSeconds;
Console.WriteLine(timespend + " sec. " + s);
}

[Serializable()]
public class XY : IComparable, ISerializable
{
public double X { get; set; }
public double Y { get; set; }
public string Color { get; set; }
public int Size { get; set; }

//public string XToString(){return X.ToString(S, Culture_enUS);}
//public string YToString(){return Y.ToString(S, Culture_enUS);}

public XY() { }
public XY(double x, double y)
{
X = x;
Y = y;
Color = "'rgb(0,0,0)'";//default
}
public XY(XY p) //clone
{
this.X = p.X;
this.Y = p.Y;
this.Color = p.Color;
this.Size = p.Size;
}

public int CompareTo(object o) // if used in sorted list
{
if (this.Equals(o))
return 0;

var other = (XY)o;
if (this.X > other.X)
return -1;
if (this.X < other.X)
return 1;

return 0;
}

// used by k-means random distinct selection of cluster point
public override int GetHashCode()
{
var x = X * 10000; //make the decimals be important
var y = Y * 10000;
var r = x * 17 + y * 37;
return (int)r;
}
private const int ROUND = 6;
public override bool Equals(Object o)
{
if (o == null)
return false;
var other = o as XY;
if (other == null)
return false;

// rounding could be skipped
// depends on granularity of wanted decimal precision
// note, 2 points with same x,y is regarded as being equal
var x = Math.Round(this.X, ROUND) == Math.Round(other.X, ROUND);
var y = Math.Round(this.Y, ROUND) == Math.Round(other.Y, ROUND);
return x && y;
}

public XY(SerializationInfo info, StreamingContext ctxt)
{
this.X = (double)info.GetValue("X", typeof(double));
this.Y = (double)info.GetValue("Y", typeof(double));
this.Color = (string)info.GetValue("Color", typeof(string));
this.Size = (int)info.GetValue("Size", typeof(int));
}

public void GetObjectData(SerializationInfo info, StreamingContext ctxt)
{
// write the same keys the deserialization constructor reads
info.AddValue("X", X);
info.AddValue("Y", Y);
info.AddValue("Color", Color);
info.AddValue("Size", Size);
}
}
public class Bucket
{
public string Id { get; private set; }
public List<XY> Points { get; private set; }
public XY Centroid { get; set; }
public int Idx { get; private set; }
public int Idy { get; private set; }
public double ErrorLevel { get; set; } // clusterpoint and points avg dist
private bool _IsUsed;
public bool IsUsed
{
get { return _IsUsed && Centroid != null; }
set { _IsUsed = value; }
}
public Bucket(string id)
{
IsUsed = true;
Centroid = null;
Points = new List<XY>();
Id = id;
}
public Bucket(int idx, int idy)
{
IsUsed = true;
Centroid = null;
Points = new List<XY>();
Idx = idx;
Idy = idy;
Id = GetId(idx, idy);
}
}

public abstract class BaseClusterAlgorithm
{
public List<XY> BaseDataset; // all points
//id, bucket
public readonly Dictionary<string, Bucket> BaseBucketsLookup =
new Dictionary<string, Bucket>();

public BaseClusterAlgorithm(){}
public BaseClusterAlgorithm(List<XY> dataset)
{
if (dataset == null || dataset.Count == 0)
throw new ApplicationException(
string.Format("dataset is null or empty"));

BaseDataset = dataset;
}

public abstract List<XY> GetCluster();
//O(k?? random fn can be slow, but is not slow because atm the k is always 1)
public static XY[] BaseGetRandomCentroids(List<XY> list, int k)
{
Profile("BaseGetRandomCentroids beg");
var set = new HashSet<XY>();
int i = 0;
var kcentroids = new XY[k];

int MAX = list.Count;
while (set.Count < MAX)
{
int index = Rand.Next(0, MAX); // upper bound is exclusive
var xy = list[index];
if (set.Contains(xy))
continue;

set.Add(xy); // track picked points so the k centroids are distinct
kcentroids[i++] = new XY(xy.X, xy.Y) { Color = GetRandomColor() };

if (i >= k)
break;
}
return kcentroids;
}

public List<XY> BaseGetClusterResult()
{
Profile("BaseGetClusterResult beg");

// collect used buckets and return the result
var clusterPoints = new List<XY>();
foreach (var item in BaseBucketsLookup)
{
var bucket = item.Value;
if (bucket.IsUsed)
{
bucket.Centroid.Size = bucket.Points.Count;
clusterPoints.Add(bucket.Centroid); // bug fix: collect the cluster point
}
}

return clusterPoints;
}
public static XY BaseGetCentroidFromCluster(List<XY> list) //O(n)
{
Profile("BaseGetCentroidFromCluster beg");
if (list == null || list.Count == 0) // null check before dereferencing
return null;
int count = list.Count;

// color is set for the points and the cluster point here
var color = GetRandomColor(); //O(1)
XY centroid = new XY(0, 0) { Size = list.Count };//O(1)
foreach (XY p in list)
{
p.Color = color;
centroid.X += p.X;
centroid.Y += p.Y;
}
centroid.X /= count;
centroid.Y /= count;
var cp = new XY(centroid.X, centroid.Y) { Size = count, Color = color };

return cp;
}
//O(k*n)
public static void BaseSetCentroidForAllBuckets(IEnumerable<Bucket> buckets)
{
Profile("BaseSetCentroidForAllBuckets beg");
foreach (var item in buckets)
{
var bucketPoints = item.Points;
var cp = BaseGetCentroidFromCluster(bucketPoints);
item.Centroid = cp;
}
}
public double BaseGetTotalError()//O(k)
{
int centroidsUsed = BaseBucketsLookup.Values.Count(b => b.IsUsed);
double sum = BaseBucketsLookup.Values.
Where(b => b.IsUsed).Sum(b => b.ErrorLevel);
return sum / centroidsUsed;
}
public string BaseGetMaxError() //O(k)
{
double maxError = -double.MaxValue;
string id = string.Empty;
foreach (var b in BaseBucketsLookup.Values)
{
if (!b.IsUsed || b.ErrorLevel <= maxError)
continue;

maxError = b.ErrorLevel;
id = b.Id;
}
return id;
}
public XY BaseGetClosestPoint(XY from, List<XY> list) //O(n)
{
double min = double.MaxValue;
XY closests = null;
foreach (var p in list)
{
var d = MathTool.Distance(from, p);
if (d >= min)
continue;

// update
min = d;
closests = p;
}
return closests;
}
public XY BaseGetLongestPoint(XY from, List<XY> list) //O(n)
{
double max = -double.MaxValue;
XY longest = null;
foreach (var p in list)
{
var d = MathTool.Distance(from, p);
if (d <= max)
continue;

// update
max = d;
longest = p;
}
return longest;
}
// assign all points to nearest cluster
public void BaseUpdatePointsByCentroid()//O(n*k)
{
Profile("UpdatePointsByCentroid beg");
int count = BaseBucketsLookup.Count();

// clear points in the buckets, they will be re-inserted
foreach (var bucket in BaseBucketsLookup.Values)
bucket.Points.Clear();

foreach (XY p in BaseDataset)
{
double minDist = Double.MaxValue;
string index = string.Empty;
//for (int i = 0; i < count; i++)
foreach (var i in BaseBucketsLookup.Keys)
{
var bucket = BaseBucketsLookup[i];
if (bucket.IsUsed == false)
continue;

var centroid = bucket.Centroid;
var dist = MathTool.Distance(p, centroid);
if (dist < minDist)
{
// update
minDist = dist;
index = i;
}
}
//update color for point to match centroid and re-insert
var closestBucket = BaseBucketsLookup[index];
p.Color = closestBucket.Centroid.Color;
closestBucket.Points.Add(p); // bug fix: re-insert into the nearest bucket
}
}

// update centroid location to nearest point,
// e.g. if you want to show cluster point on a real existing point area
//O(n)
public void BaseUpdateCentroidToNearestContainingPoint(Bucket bucket)
{
Profile("BaseUpdateCentroidToNearestContainingPoint beg");
if (bucket == null || bucket.Centroid == null ||
bucket.Points == null || bucket.Points.Count == 0)
return;

var closest = BaseGetClosestPoint(bucket.Centroid, bucket.Points);
bucket.Centroid.X = closest.X;
bucket.Centroid.Y = closest.Y;
}
//O(k*n)
public void BaseUpdateAllCentroidsToNearestContainingPoint()
{
Profile("BaseUpdateAllCentroidsToNearestContainingPoint beg");
foreach (var bucket in BaseBucketsLookup.Values)
BaseUpdateCentroidToNearestContainingPoint(bucket);
}
}

public class DistanceClusterer : BaseClusterAlgorithm
{
public DistanceClusterer(List<XY> dataset)
: base(dataset)
{
}

public override List<XY> GetCluster()
{
var cluster = RunClusterAlgo();
return cluster;
}

// O(k*n)
List<XY> RunClusterAlgo()
{
// put points in buckets
int allPointsCount = BaseDataset.Count;
var firstPoint = BaseDataset[0]; // bug fix: first point, not the list
firstPoint.Color = GetRandomColor();
var firstId = 0.ToString();
var firstBucket = new Bucket(firstId) { Centroid = firstPoint };
BaseBucketsLookup.Add(firstId, firstBucket); // register the first cluster

for (int i = 1; i < allPointsCount; i++)
{
var set = new HashSet<string>(); //cluster candidate list
var p = BaseDataset[i];
// iterate clusters and collect candidates
foreach (var bucket in BaseBucketsLookup.Values)
{
var isInCluster = MathTool.DistWithin(p, bucket.Centroid, DISTANCECLUSTERER_SIZE);
if (!isInCluster)
continue;

set.Add(bucket.Id);
//use first, short dist will be calc at last step before returning data
break;
}

// if not within distance of any cluster, then make new cluster
if (set.Count == 0)
{
var pid = i.ToString();
p.Color = GetRandomColor();
var newbucket = new Bucket(pid) { Centroid = p };
BaseBucketsLookup.Add(pid, newbucket);
}
}

//important, align all points to closest cluster point
BaseUpdatePointsByCentroid();

return BaseGetClusterResult();
}

}

public class MarkerClusterer : BaseClusterAlgorithm
{
public MarkerClusterer(List<XY> dataset) : base(dataset)
{
}

public override List<XY> GetCluster()
{
var cluster = RunClusterAlgo();
return cluster;
}

// O(k*n)
List<XY> RunClusterAlgo()
{
// put points in buckets
int allPointsCount = BaseDataset.Count;
var firstPoint = BaseDataset[0];
firstPoint.Color = GetRandomColor();
var firstId = 0.ToString();
var firstBucket = new Bucket(firstId) { Centroid = firstPoint };
BaseBucketsLookup.Add(firstId, firstBucket);

for (int i = 1; i < allPointsCount; i++)
{
var set = new HashSet<string>(); //cluster candidate list
var p = BaseDataset[i];
// iterate clusters and collect candidates
foreach (var bucket in BaseBucketsLookup.Values)
{
var isInCluster = MathTool.BoxWithin(p, bucket.Centroid, MARKERCLUSTERER_SIZE);
if (!isInCluster)
continue;

set.Add(bucket.Id);
//use first, short dist will be calc at last step before returning data
break;
}

// if not within box area, then make new cluster
if (set.Count == 0)
{
var pid = i.ToString();
p.Color = GetRandomColor();
var newbucket = new Bucket(pid) { Centroid = p };
BaseBucketsLookup.Add(pid, newbucket);
}
}

//important, align all points to closest cluster point
BaseUpdatePointsByCentroid();

return BaseGetClusterResult();
}
}

// O(exponential) ~ can be slow when n or k is big
public class KMeans : BaseClusterAlgorithm
{
private readonly int _InitClusterSize; // start from this cluster points
// Rule of thumb k = sqrt(n/2)

// cluster point optimization iterations
private const int _MaxIterations = 100;
private const int _MaxClusters = 100;

// heuristic, max accepted avg cluster error (value assumed; the constant
// is referenced in the loop below)
private const double MAX_ERROR = 30;

public KMeans(List<XY> dataset) : base(dataset)
{
_InitClusterSize = 1;
}

public override List<XY> GetCluster()
{
var cluster = RunClusterAlgo();
return cluster;
}

List<XY> RunClusterAlgo()
{
/*
REPEAT iteration of clusters convergence
until the max error is small enough
if not, insert a new cluster at worst place
(farthest in region of worst cluster area) and run again
keeping current cluster points

// one iteration of clusters convergence is defined as ..
1) Random k centroids
2) Cluster data by euclidean distance to centroids
3) Update centroids by clustered data,
4) Update cluster
5) Continue last two steps until error is small, error is sum of diff
between current and updated centroid
*/

RunAlgo();

if (DoUpdateAllCentroidsToNearestContainingPoint)
BaseUpdateAllCentroidsToNearestContainingPoint();
return BaseGetClusterResult();
}

void RunAlgo()
{
// Init clusters
var centroids = BaseGetRandomCentroids(BaseDataset, _InitClusterSize);
for (int i = 0; i < centroids.Length; i++)
{
var pid = i.ToString();
var newbucket = new Bucket(pid) { Centroid = centroids[i] };
BaseBucketsLookup.Add(pid, newbucket);
}

//
double currentMaxError = double.MaxValue;
while (currentMaxError > MAX_ERROR && BaseBucketsLookup.Count < _MaxClusters)
{
RunIterationsUntilKClusterPlacementAreDone();

var id = BaseGetMaxError();
var bucket = BaseBucketsLookup[id];
currentMaxError = bucket.ErrorLevel; //update
if (currentMaxError > MAX_ERROR)
{
// Here it is linear speed when putting one new centroid at a time
// should be semi-fast because the new point is inserted at best area
//from current centroids view.
// possible improvement exists by log2 search by inserting multiple centroids and
// reducing centroid again if needed

// put new centroid in area where maxError but farthest away from current centroid in area
var longest = BaseGetLongestPoint(bucket.Centroid, bucket.Points);
var newcentroid = new XY(longest);
var newid = BaseBucketsLookup.Count.ToString();
var newbucket = new Bucket(newid) { Centroid = newcentroid };
BaseBucketsLookup.Add(newid, newbucket);
}
}
}

void RunIterationsUntilKClusterPlacementAreDone()
{
Profile("RunIterationsUntilKClusterPlacementAreDone beg");
double prevError = Double.MaxValue;
double currError = Double.MaxValue;

for (int i = 0; i < _MaxIterations; i++)
{
prevError = currError;
currError = RunOneIteration();
if (currError >= prevError) // no improvement
break;
}
}

double RunOneIteration() //O(k*n)
{
Profile("RunOneIteration beg");

// update points, assign points to cluster
BaseUpdatePointsByCentroid();
// update centroid pos by its points
BaseSetCentroidForAllBuckets(BaseBucketsLookup.Values);//O(k*n)
var clustersCount = BaseBucketsLookup.Count;
for (int i = 0; i < clustersCount; i++)
{
var currentBucket = BaseBucketsLookup[i.ToString()];
if (currentBucket.IsUsed == false)
continue;

//update centroid
var newcentroid = BaseGetCentroidFromCluster(currentBucket.Points);
//no need to update color, autoset
currentBucket.Centroid = newcentroid;
currentBucket.ErrorLevel = 0;
//update error
foreach (var p in currentBucket.Points)
{
var dist = MathTool.Distance(newcentroid, p);
currentBucket.ErrorLevel += dist;
}
var val = currentBucket.ErrorLevel / currentBucket.Points.Count;
currentBucket.ErrorLevel = val; //Math.Sqrt(val);
}

return BaseGetTotalError();
}

}

public class GridCluster : BaseClusterAlgorithm
{
public GridCluster(List<XY> dataset) : base(dataset)
{
}

public override List<XY> GetCluster()
{
var cluster = RunClusterAlgo(GridX, GridY);
return cluster;
}

// O(k*n)
void MergeClustersGrid()
{
foreach (var key in BaseBucketsLookup.Keys)
{
var bucket = BaseBucketsLookup[key];
if (bucket.IsUsed == false)
continue;

var x = bucket.Idx;
var y = bucket.Idy;

// get keys for neighbors
var N = GetId(x, y + 1);
var NE = GetId(x + 1, y + 1);
var E = GetId(x + 1, y);
var SE = GetId(x + 1, y - 1);
var S = GetId(x, y - 1);
var SW = GetId(x - 1, y - 1);
var W = GetId(x - 1, y);
var NW = GetId(x - 1, y + 1);
var neighbors = new[] { N, NE, E, SE, S, SW, W, NW };

MergeClustersGridHelper(key, neighbors);
}
}
void MergeClustersGridHelper(string currentKey, string[] neighborKeys)
{
double minDistX = DeltaX / 2.0;//heuristic, higher value gives less merging
double minDistY = DeltaY / 2.0;
//if clusters in grid are too close to each other, merge them
double minDist = MathTool.Max(minDistX, minDistY);

foreach (var neighborKey in neighborKeys)
{
if (!BaseBucketsLookup.ContainsKey(neighborKey))
continue;

var neighbor = BaseBucketsLookup[neighborKey];
if (neighbor.IsUsed == false)
continue;

var current = BaseBucketsLookup[currentKey];
var dist = MathTool.Distance(current.Centroid, neighbor.Centroid);
if (dist > minDist)
continue;

// merge
var color = current.Centroid.Color;
foreach (var p in neighbor.Points)
{
//update color
p.Color = color;
}

// recalc centroid
var cp = BaseGetCentroidFromCluster(current.Points);
current.Centroid = cp;
neighbor.IsUsed = false; //merged, then not used anymore
neighbor.Points.Clear(); //clear mem
}
}

public List<XY> RunClusterAlgo(int gridx, int gridy)
{
// params invalid
if (gridx <= 0 || gridy <= 0)
throw new ApplicationException("GridCluster.RunClusterAlgo gridx or gridy is <= 0");

// put points in buckets
foreach (var p in BaseDataset)
{
// find relative val
var relativeX = p.X - MinX;
var relativeY = p.Y - MinY;

int idx = (int)(relativeX / DeltaX);
int idy = (int)(relativeY / DeltaY);

// bucket id
string id = GetId(idx, idy);

if (BaseBucketsLookup.ContainsKey(id))
BaseBucketsLookup[id].Points.Add(p);

// new bucket, create and add point
else
{
Bucket bucket = new Bucket(idx, idy);
bucket.Points.Add(p);
BaseBucketsLookup.Add(id, bucket);
}
}

// calc centroid for all buckets
BaseSetCentroidForAllBuckets(BaseBucketsLookup.Values);

// merge if grid cluster points are too close
if (DoMergeGridIfCentroidsAreCloseToEachOther)
MergeClustersGrid();

if (DoUpdateAllCentroidsToNearestContainingPoint)
{
BaseUpdateAllCentroidsToNearestContainingPoint();
}

// check again
// merge if grid cluster points are too close
if (DoMergeGridIfCentroidsAreCloseToEachOther
&& DoUpdateAllCentroidsToNearestContainingPoint)
{
MergeClustersGrid();
// and again set centroid to closest point in bucket
BaseUpdateAllCentroidsToNearestContainingPoint();
}

return BaseGetClusterResult();
}
}

public class MathTool
{
private const double Exp = 2; // 2=euclidean, 1=manhattan
//Minkowski dist
public static double Distance(XY a, XY b)
{
return Math.Pow(Math.Pow(Math.Abs(a.X - b.X), Exp) +
Math.Pow(Math.Abs(a.Y - b.Y), Exp), 1.0 / Exp);
}

public static double Min(double a, double b)
{
return a <= b ? a : b;
}
public static double Max(double a, double b)
{
return a >= b ? a : b;
}

public static bool DistWithin(XY a, XY b, double d)
{
var dist = Distance(a, b);
return dist < d;
}

public static bool BoxWithin(XY a, XY b, double boxsize)
{
var d = boxsize / 2;
var withinX = a.X - d <= b.X && a.X + d >= b.X;
var withinY = a.Y - d <= b.Y && a.Y + d >= b.Y;
return withinX && withinY;
}
}

public static class FileUtil
{
public static bool WriteFile(string data, FileInfo fileInfo)
{
bool isSuccess = false;
try
{
using (StreamWriter streamWriter =
File.CreateText(fileInfo.FullName))
{
streamWriter.Write(data);
isSuccess = true;
}
}
catch (Exception ex)
{
Console.WriteLine(ex.StackTrace + "\nPress a key ... ");
}
return isSuccess;
}
}

[Serializable()]
public class DatasetToSerialize : ISerializable
{
private const string Name = "Dataset";
public List<XY> Dataset { get; set; }
public DatasetToSerialize()
{
Dataset = new List<XY>();
}
public DatasetToSerialize(SerializationInfo info, StreamingContext ctxt)
{
this.Dataset = (List<XY>)info.GetValue(Name, typeof(List<XY>));
}
public void GetObjectData(SerializationInfo info, StreamingContext ctxt)
{
info.AddValue(Name, Dataset);
}
}

public class Serializer
{
public void SerializeObject(string filename, object objectToSerialize)
{
try
{
using (Stream stream = File.Open(filename, FileMode.Create))
{
BinaryFormatter bFormatter = new BinaryFormatter();
bFormatter.Serialize(stream, objectToSerialize);
}
}
catch (Exception ex)
{
Console.WriteLine(ex.StackTrace + "\nPress a key ... ");
}
}
public object DeSerializeObject(string filename)
{
try
{
using (Stream stream = File.Open(filename, FileMode.Open))
{
BinaryFormatter bFormatter = new BinaryFormatter();
var objectToSerialize = bFormatter.Deserialize(stream);
return objectToSerialize;
}
}
catch (Exception ex)
{
Console.WriteLine(ex.StackTrace + "\nPress a key ... ");
}
return null;
}
}
}
}
```

Put both the html and javascript file in the c:\temp folder.
The C# program creates a new draw.js file in the c:\temp folder.

Open the html file with a modern browser that supports canvas, such as Firefox or Chrome.
If you want to test with Internet Explorer on your local machine, download the excanvas javascript plugin.
canvas.html

```<html>
<title>Cluster Test</title>
<!-- IE canvas not supported fallback -->
<!--[if lt IE 10]>
<script src="excanvas.js"></script>
<![endif]-->
<script type="text/javascript" src="draw.js"></script>
<!-- default -->
<script type="text/javascript">
var radius = 4; // marker circle radius (value assumed, used by drawMark below)
var G = {
yellow: "#FF0",
red: "#F00",
green: "#0F0",
blue: "#00F",
black: "#000"
}

function drawMark(cx, cy, color, ctx) {
//draw a circle
ctx.beginPath();
ctx.arc(cx, cy, radius, 0, Math.PI * 2, true);
ctx.closePath();
ctx.fillStyle = color;
ctx.fill();
//ctx.stroke();
}

function drawCluster(cx, cy, color, n, ctx) {
//draw a bigger circle
ctx.strokeStyle = color;
ctx.beginPath();
ctx.arc(cx, cy, radius, 0, Math.PI * 2, true);
ctx.closePath();
ctx.stroke();
ctx.fillStyle = color;
ctx.fillText("" + n, cx - 8, cy + 2);
}

function drawMarkerCluster(cx, cy, color, n, size, ctx) {
drawCluster(cx, cy, color, n, ctx);
// draw rect
var a = cx - (size / 2);
var b = cy - (size / 2);
var len = size;
ctx.lineWidth = 2;
ctx.strokeStyle = color;
ctx.strokeRect(a, b, len, len);
}

function drawDistanceCluster(cx, cy, color, n, size, ctx) {
drawCluster(cx, cy, color, n, ctx);
ctx.lineWidth = 2;
ctx.strokeStyle = color;
ctx.beginPath();
ctx.arc(cx, cy, radius, 0, Math.PI * 2, true);
ctx.closePath();
ctx.stroke();
}

function draw() {
var canvas = document.getElementById('canvas');
if (canvas != null && canvas != undefined && canvas.getContext) {
var ctx = canvas.getContext('2d');
ctx.strokeStyle = G.black;
ctx.font = "10pt Arial";
drawMarkers(ctx);
}
}
window.onload = draw;
</script>
<style type="text/css">
body
{
margin-left: 10px;
margin-top: 10px;
}
canvas
{
border: 1px solid red;
}
</style>
<body>
<canvas id="canvas" width="610" height="430">
<p>Your browser doesn't support canvas. Try Firefox, Chrome or another modern browser.</p>
</canvas>
</body>
</html>
```

The javascript file will be overwritten when you run the C# code.
draw.js

```function drawMarkers(ctx) {
drawMark(156.434, 157.921, 'rgb(182,114,144)', ctx);
drawMark(156.788, 139.939, 'rgb(182,114,144)', ctx);
drawMark(150.712, 67.882, 'rgb(67,235,136)', ctx);
drawMark(445.846, 165.556, 'rgb(153,85,202)', ctx);
drawMark(382.982, 179.917, 'rgb(153,85,202)', ctx);
drawMark(466.365, 172.098, 'rgb(153,85,202)', ctx);
drawMark(302.81, 243.478, 'rgb(48,234,127)', ctx);
drawMark(235.35, 265.31, 'rgb(48,234,127)', ctx);
drawMark(262.092, 258.321, 'rgb(48,234,127)', ctx);

drawMarkerCluster(156.434, 157.921, 'rgb(182,114,144)', 2, 160, ctx);
drawMarkerCluster(150.712, 67.882, 'rgb(67,235,136)', 1, 160, ctx);
drawMarkerCluster(445.846, 165.556, 'rgb(153,85,202)', 3, 160, ctx);
drawMarkerCluster(302.81, 243.478, 'rgb(48,234,127)', 3, 160, ctx);
ctx.fillStyle = 'rgb(0,0,0)';
ctx.fillText('Clusters = ' + 4, 20, 320);
}
```

Written by kunuk Nykjaer

September 20, 2011 at 10:49 pm

## Clustering K-means and Grid with C# example and html-canvas (part 2)

This is a follow up to my previous post.
https://kunuk.wordpress.com/2011/09/15/clustering-grid-cluster

The next post with updated code can be seen here.
https://kunuk.wordpress.com/2011/09/20/markerclusterer-with-c-example-and-html-canvas-part-3/

I have implemented a K-means clustering algorithm which automatically increments the K value until the error level is low enough to be accepted. K-means runs reasonably fast; it is best used in combination with choosing the right number of points and setting the max_error. If the max_error is small and the number of points is large, the running time will be very slow. A maximum iteration count is set to avoid very long running times, as I believe K-means is NP-hard.
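The auto-incrementing loop can be sketched language-neutrally. Below is a minimal Python version (function and parameter names are my own, not taken from the C# listing): grow K until every cluster's average point-to-centroid distance falls below `max_error`, inserting each new centroid at the point farthest from the worst cluster's centroid.

```python
import math
import random

def kmeans_auto_k(points, max_error=30.0, max_clusters=100, max_iters=100):
    """Sketch of K-means with auto-incrementing K (illustrative, not the C# API)."""
    centroids = [random.choice(points)]
    while True:
        # one "convergence" pass: assign points, move centroids, repeat
        for _ in range(max_iters):
            clusters = [[] for _ in centroids]
            for p in points:
                i = min(range(len(centroids)),
                        key=lambda j: math.dist(p, centroids[j]))
                clusters[i].append(p)
            new_centroids = [
                (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
                if c else centroids[j]
                for j, c in enumerate(clusters)]
            if new_centroids == centroids:  # no improvement, stop iterating
                break
            centroids = new_centroids
        # error level per cluster = average distance to its centroid
        errors = [
            sum(math.dist(p, centroids[j]) for p in c) / len(c) if c else 0.0
            for j, c in enumerate(clusters)]
        worst = max(range(len(errors)), key=errors.__getitem__)
        if errors[worst] <= max_error or len(centroids) >= max_clusters:
            return centroids, clusters
        # grow K: new centroid at the point farthest from the worst centroid
        centroids.append(max(clusters[worst],
                             key=lambda p: math.dist(p, centroids[worst])))
```

The existing centroids are kept between rounds, so each round only has to place the one new centroid.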

The insertion of the extra cluster point is linear and works by inserting it at the best place while keeping the current cluster points. It finds the currently worst-performing cluster point with respect to the error value and inserts a new cluster point in that region, farthest away from the badly performing cluster point. By doing this, new points are inserted at places which seem logical while minimizing the overall error level.
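Isolated from the full loop, the insertion rule described above can be sketched as follows (a hypothetical helper for illustration, not part of the C# listing):

```python
import math

def split_worst_cluster(centroids, clusters):
    """Add one new centroid in the region of the worst cluster,
    at the point farthest from that cluster's centroid."""
    # error level = average distance from a cluster's points to its centroid
    def error(i):
        c = clusters[i]
        return sum(math.dist(p, centroids[i]) for p in c) / len(c) if c else 0.0
    worst = max(range(len(centroids)), key=error)
    farthest = max(clusters[worst],
                   key=lambda p: math.dist(p, centroids[worst]))
    return centroids + [farthest]
```

A single call grows K by one; the surrounding loop then re-runs the convergence iterations with the enlarged centroid list.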

The Grid clustering algorithm runs pretty fast. Even with 100.000 points the time is below 2 sec. For 10.000 points the time is below 1 sec.
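The speed comes from the grid needing only a single bucketing pass over the points. A minimal sketch in Python (names are illustrative; points assumed to be (x, y) tuples):

```python
from collections import defaultdict

def grid_cluster(points, minx, miny, maxx, maxy, gridx, gridy):
    """One pass: drop each point into its grid cell, then return one
    centroid per non-empty cell. Roughly O(n + m) for n points, m cells."""
    dx = (maxx - minx) / gridx
    dy = (maxy - miny) / gridy
    buckets = defaultdict(list)
    for x, y in points:
        idx = int((x - minx) / dx)  # cell column
        idy = int((y - miny) / dy)  # cell row
        buckets[(idx, idy)].append((x, y))
    # centroid of each bucket = mean of its points
    return {
        cell: (sum(x for x, _ in pts) / len(pts),
               sum(y for _, y in pts) / len(pts))
        for cell, pts in buckets.items()}
```

Cell lookup is O(1) per point, which is why even hundreds of thousands of points stay fast.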

Optional functionality has been added to move the cluster point position to the closest point in the clustered region after the final centroid calculation. With this, your cluster point will be positioned visually on a real existing point in the region.
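The snap-to-nearest-point step amounts to a single linear scan; sketched here as an illustrative helper (not the C# API):

```python
import math

def snap_to_nearest(centroid, points):
    """Move the computed centroid onto the closest real point in its
    cluster, so the marker lands on an existing data point."""
    return min(points, key=lambda p: math.dist(p, centroid))
```

Run once per cluster after the final centroid calculation, this costs O(n) overall.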

When used with maps and websites, the grid clustering algorithm is a good candidate of choice because of its fast running time, which is O(m*n) where m is the number of grid cells and n is the number of points; please comment if I have made a wrong statement 🙂
The K-means can also be used when you know the expected number of points, by setting the max_error accordingly.

Below are some image examples and the C# source code bundled in one file to keep it simple.

K-means, 0.7 sec, 10.000 points, max_error 100.
K-means, 2.2 sec, 10.000 points, max_error 50.
K-means, 0.2 sec, 150 points, max_error 30.
K-means with K manually set to 3, an example of how K-means can't "see" all the true clusters. This will not happen with the current code; it is mainly shown as a weakness of the K-means algorithm.
Grid clustering with a 7x6 grid took 0.04 sec with 160 points. In this example the cluster point locations are moved to the nearest point in the region after the centroid calculation, and some grid merging has occurred because the clusters were too close to each other.
Performance test of the grid with many points: grid 8x7, 1.3 sec, 100.000 points. With grid 6x5 it is about 3 sec for 1.000.000 points, without centroid calculation and without saving to file.

Algorithm.cs

```
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;
using System.Text;

// By Kunuk Nykjaer
// Clustering package, Grid clustering and auto increasing K-means clustering
// Version 0.2
public class Algorithm
{
public static DateTime Starttime;
static void Main(string[] args)
{
Starttime = DateTime.Now;
var algo = new Algorithm();

// run kmeans or grid
algo.Run(ClusterType.KMeans);
//algo.Run(ClusterType.Grid);

algo.GenerateJavascriptDrawFile(); //create draw.js

var timespend = DateTime.Now.Subtract(Starttime).TotalSeconds;
Console.WriteLine(timespend + " sec. press a key ...");
Console.ReadKey();
}

// use centroid or semi-centroid cluster point placement visualization?
public const bool DoUpdateAllCentroidsToNearestContainingPoint = false;
public const bool UseProfiling = false; //debug, output time spend

public enum ClusterType { Unknown = -1, Grid, KMeans } ;
static readonly CultureInfo Culture_enUS = new CultureInfo("en-US");
const string S = "G";

readonly List<XY> _points = new List<XY>();
List<XY> _clusters = new List<XY>();

public static readonly Random _rand = new Random();
public const double MinX = 10;
public const double MinY = 10;
public const double MaxX = 400;
public const double MaxY = 300;

// used with Grid cluster algo
public const int GridX = 8;
public const int GridY = 7;

const double AbsSizeX = MaxX - MinX;
const double AbsSizeY = MaxY - MinY;

// for bucket placement calc, grid cluster algo
const double DeltaX = AbsSizeX / GridX;
const double DeltaY = AbsSizeY / GridY;

static private readonly string NL = Environment.NewLine;
private const string ctx = "ctx"; // javascript canvas
public bool DisplayGridInCanvas; //autoset by used cluster type

public Algorithm()
{
CreateDataSet();
}

public static string GetId(int idx, int idy) //O(1)
{
return idx + ";" + idy;
}

void CreateDataSet() //O(1)
{
// CREATE DATA SET

// Create random scattered points
//for (int i = 0; i < 150; i++)
//{
//    var x = MinX + _rand.NextDouble() * (AbsSizeX);
//    var y = MinY + _rand.NextDouble() * (AbsSizeY);
//    _points.Add(new XY(x, y));
//}

// Create random regions of clusters
for (int i = 0; i < 50; i++)
{
var x = MinX + _rand.NextDouble() * 100;
var y = MinY + _rand.NextDouble() * 100;
_points.Add(new XY(x, y));
}

for (int i = 0; i < 50; i++)
{
var x = MinX + 200 + _rand.NextDouble() * 100;
var y = MinY + 10 + _rand.NextDouble() * 100;
_points.Add(new XY(x, y));
}

for (int i = 0; i < 50; i++)
{
var x = MinX + 50 + _rand.NextDouble() * 100;
var y = MinY + 150 + _rand.NextDouble() * 100;
_points.Add(new XY(x, y));
}

}

public void Run(ClusterType clustertype)
{
switch (clustertype)
{
case ClusterType.Grid:
_clusters = new GridBaseCluster(_points).GetCluster();
DisplayGridInCanvas = true;
break;
case ClusterType.KMeans:
_clusters = new KMeans(_points).GetCluster();
DisplayGridInCanvas = false;
break;
}
}

public static string GetRandomColor()
{
int r = _rand.Next(10, 240);
int g = _rand.Next(10, 240);
int b = _rand.Next(10, 240);
return string.Format("'rgb({0},{1},{2})'", r, g, b);
}

static void CreateFile(string s)
{
var path = new FileInfo(@"C:\temp\draw.js");
var isCreated = FileUtil.WriteFile(s, path);
Console.WriteLine(isCreated + " = File is Created");
}
public void GenerateJavascriptDrawFile()
{
var sb = new StringBuilder();

// markers
var head = NL + "function drawMarkers(ctx) {" + NL;
var tail = NL + "}" + NL;

// grid
if (DisplayGridInCanvas)
{
sb.Append("ctx.beginPath();" + NL);
for (int i = 0; i < GridX + 1; i++)
{
sb.Append(String.Format("ctx.moveTo({0}, {1});{2}",
ToStringEN(MinX + i * DeltaX),
ToStringEN(MinY), NL));
sb.Append(String.Format("ctx.lineTo({0}, {1});{2}",
ToStringEN(MinX + i * DeltaX),
ToStringEN(MaxY), NL));
}
for (int j = 0; j < GridY + 1; j++)
{
sb.Append(String.Format("ctx.moveTo({0}, {1});{2}",
ToStringEN(MinX),
ToStringEN(MinY + j * DeltaY), NL));
sb.Append(String.Format("ctx.lineTo({0}, {1});{2}",
ToStringEN(MaxX),
ToStringEN(MinY + j * DeltaY), NL));
}
sb.Append("ctx.stroke(); " + NL);
}

// markers
if (_points != null)
foreach (var p in _points)
sb.Append(string.Format("drawMark({0}, {1}, {2}, {3});{4}",
ToStringEN(p.X), ToStringEN(p.Y), p.Color, ctx, NL));

string clusterInfo = "0";
if (_clusters != null)
{
foreach (var c in _clusters)
sb.Append(string.Format("drawCluster({0}, {1}, {2}, {3}, {4});{5}",
ToStringEN(c.X), ToStringEN(c.Y), c.Color,
c.Size, ctx, NL));

clusterInfo = _clusters.Count + string.Empty;
}

// bottom text
sb.Append("ctx.fillStyle = 'rgb(0,0,0)';" + NL);
sb.Append(string.Format("ctx.fillText('Clusters = ' + {0}, {1}, {2});{3}",
clusterInfo, ToStringEN(MinX + 10), ToStringEN(MaxY + 20), NL));

sb.Append(tail);
CreateFile(sb.ToString());
//Console.WriteLine(sb.ToString());
}
public static string ToStringEN(double d)
{
double rounded = Math.Round(d, 3);
return rounded.ToString(S, Culture_enUS);
}
public static void Profile(string s)
{
if(!UseProfiling)
return;
var timespend = DateTime.Now.Subtract(Starttime).TotalSeconds;
Console.WriteLine(timespend + " sec. " + s);
}//O(1)

public class XY : IComparable
{
public double X { get; set; }
public double Y { get; set; }
public string Color { get; set; }
public int Size { get; set; }
public string XToString { get { return X.ToString(S, Culture_enUS); } }
public string YToString { get { return Y.ToString(S, Culture_enUS); } }
public XY(double x, double y)
{
X = x;
Y = y;
Color = "'rgb(0,0,0)'";//default
Size = 1;
}
public XY(XY p) //clone
{
this.X = p.X;
this.Y = p.Y;
this.Color = p.Color;
this.Size = p.Size;
}

public int CompareTo(object o) // if used in sorted list
{
if (this.Equals(o))
return 0;

var other = (XY)o;
if (this.X > other.X)
return -1;
if (this.X < other.X)
return 1;

return 0;
}
public override int GetHashCode()
{
var x = X * 10000;
var y = Y * 10000;
var r = x * 17 + y * 37;
return (int)r;
}
private const int ROUND = 6;
public override bool Equals(Object o)
{
if (o == null)
return false;
var other = o as XY;
if (other == null)
return false;

var x = Math.Round(this.X, ROUND) == Math.Round(other.X, ROUND);
var y = Math.Round(this.Y, ROUND) == Math.Round(other.Y, ROUND);
return x && y;
}
}
public class Bucket
{
public string Id { get; private set; }
public List<XY> Points { get; private set; }
public XY Centroid { get; set; }
public int Idx { get; private set; }
public int Idy { get; private set; }
public double ErrorLevel { get; set; } // clusterpoint and points avg dist
private bool _IsUsed;
public bool IsUsed
{
get { return _IsUsed && Centroid != null; }
set{_IsUsed = value; }
}
public Bucket(string id)
{
IsUsed = true;
Centroid = null;
Points = new List<XY>();
Id = id;
}
public Bucket(int idx, int idy)
{
IsUsed = true;
Centroid = null;
Points = new List<XY>();
Idx = idx;
Idy = idy;
Id = GetId(idx, idy);
}
}

public abstract class BaseClusterAlgorithm
{
public List<XY> BaseDataset; // all points
//id, bucket
public readonly Dictionary<string, Bucket> BaseBucketsLookup =
new Dictionary<string, Bucket>();
public abstract List<XY> GetCluster();
//O(k?? random fn can be slow)
public static XY[] BaseGetRandomCentroids(List<XY> list, int k)
{
var set = new HashSet<XY>();
int i = 0;
var kcentroids = new XY[k];

int MAX = list.Count;
while (MAX >= k)
{
int index = _rand.Next(0, MAX - 1);
var xy = list[index];
if (set.Contains(xy))
continue;

set.Add(xy);
kcentroids[i++] = new XY(xy.X, xy.Y) { Color = GetRandomColor() };

if (i >= k)
break;
}

Profile("BaseGetRandomCentroids");
return kcentroids;
}

public List<XY> BaseGetClusterResult()//O(k*n)
{
var clusterPoints = new List<XY>();
// collect used buckets and return the result

if (DoUpdateAllCentroidsToNearestContainingPoint)
BaseUpdateAllCentroidsToNearestContainingPoint();

foreach (var item in BaseBucketsLookup)
{
var bucket = item.Value;
if (bucket.IsUsed)
clusterPoints.Add(bucket.Centroid);
}

Profile("BaseGetClusterResult");
return clusterPoints;
}
public static XY BaseGetCentroidFromCluster(List<XY> list) //O(n)
{
if (list == null || list.Count == 0)
return null;

int count = list.Count;

// color is set for the points and the cluster point here
var color = GetRandomColor(); //O(1)
XY centroid = new XY(0, 0) { Size = list.Count };//O(1)
foreach (XY p in list)
{
p.Color = color;
centroid.X += p.X;
centroid.Y += p.Y;
}
centroid.X /= count;
centroid.Y /= count;
var cp = new XY(centroid.X, centroid.Y) { Size = count, Color = color };

Profile("BaseGetCentroidFromCluster");
return cp;
}
//O(k*n)
public static void BaseSetCentroidForAllBuckets(IEnumerable<Bucket> buckets)
{
foreach (var item in buckets)
{
var bucketPoints = item.Points;
var cp = BaseGetCentroidFromCluster(bucketPoints);
item.Centroid = cp;
}
Profile("BaseSetCentroidForAllBuckets");
}
public double BaseGetTotalError()//O(k)
{
int centroidsUsed = BaseBucketsLookup.Values.Count(b => b.IsUsed);
double sum = BaseBucketsLookup.Values.Where(b => b.IsUsed).Sum(b => b.ErrorLevel);
return sum / centroidsUsed;
}
public string BaseGetMaxError() //O(k)
{
double maxError = -double.MaxValue;
string id = string.Empty;
foreach (var b in BaseBucketsLookup.Values)
{
if (!b.IsUsed || b.ErrorLevel <= maxError)
continue;

maxError = b.ErrorLevel;
id = b.Id;
}
return id;
}
public XY BaseGetClosestPoint(XY from, List<XY> list) //O(n)
{
double min = double.MaxValue;
XY closest = null;
foreach (var p in list)
{
var d = MathTool.Distance(from, p);
if (d >= min)
continue;

// update
min = d;
closest = p;
}
return closest;
}
public XY BaseGetLongestPoint(XY from, List<XY> list) //O(n)
{
double max = -double.MaxValue;
XY longest = null;
foreach (var p in list)
{
var d = MathTool.Distance(from, p);
if (d <= max)
continue;

// update
max = d;
longest = p;
}
return longest;
}

// update centroid location to nearest point,
// e.g. if you want to show cluster point on a real existing point area
//O(n)
public void BaseUpdateCentroidToNearestContainingPoint(Bucket bucket)
{
if(bucket==null || bucket.Centroid==null ||
bucket.Points==null||bucket.Points.Count==0)
return;

var closest = BaseGetClosestPoint(bucket.Centroid, bucket.Points);
bucket.Centroid.X = closest.X;
bucket.Centroid.Y = closest.Y;
Profile("BaseUpdateCentroidToNearestContainingPoint");
}
//O(k*n)
public void BaseUpdateAllCentroidsToNearestContainingPoint()
{
foreach (var bucket in BaseBucketsLookup.Values)
BaseUpdateCentroidToNearestContainingPoint(bucket);
Profile("BaseUpdateAllCentroidsToNearestContainingPoint");
}
}

// O(exponential) ~ can be slow when n or k is big
public class KMeans : BaseClusterAlgorithm
{
private const int _InitClusterSize = 1; // start from this cluster points

// heuristic, set tolerance for cluster density, has effect on running time.
// Set high for many points in dataset, can be lower for fewer points
private const double MAX_ERROR = 30;

private const int _MaxIterations = 100; // cluster point optimization iterations
private const int _MaxClusters = 100;

public KMeans(List<XY> dataset)
{
if (dataset == null || dataset.Count == 0)
throw new ApplicationException("KMeans dataset is null or empty");

this.BaseDataset = dataset;
}

public override List<XY> GetCluster()
{
var cluster = RunClusterAlgo();
return cluster;
}

List<XY> RunClusterAlgo()
{
/*
REPEAT iteration of clusters convergence
until the max error is small enough
if not, insert a new cluster at worst place
(farthest in region of worst cluster area) and run again
keeping current cluster points

// one iteration of clusters convergence is defined as ..
1) Random k centroids
2) Cluster data by euclidean distance to centroids
3) Update centroids by clustered data,
4) Update cluster
5) Continue last two steps until error is small, error is sum of diff
between current and updated centroid
*/

var clusterPoints = new List<XY>();
// params invalid
if (BaseDataset == null || BaseDataset.Count == 0)
return clusterPoints;

RunAlgo();
return BaseGetClusterResult();
}

void RunAlgo()
{
// Init clusters
var centroids = BaseGetRandomCentroids(BaseDataset, _InitClusterSize);
for (int i = 0; i < centroids.Length; i++)
{
var newbucket = new Bucket(i.ToString()) { Centroid = centroids[i] };
BaseBucketsLookup.Add(i.ToString(), newbucket);
}

//
double currentMaxError = double.MaxValue;
while (currentMaxError > MAX_ERROR && BaseBucketsLookup.Count < _MaxClusters)
{
RunIterationsUntilKClusterPlacementAreDone();

var id = BaseGetMaxError();
var bucket = BaseBucketsLookup[id];
currentMaxError = bucket.ErrorLevel; //update
if (currentMaxError > MAX_ERROR)
{
// Here it is linear speed when putting one new centroid at a time
// should be semi-fast because the new point is inserted at best area
//from current centroids view.
// possible improvement exists by log2 search by inserting multiple centroids and
// reducing centroid again by using closest pair algo which can be O(nlogn) to
// find closest cluster dist and analysing the dist length, if its to small,
//then reduce centroids.

// put new centroid in area where maxError but farthest away from current centroid in area
var longest = BaseGetLongestPoint(bucket.Centroid, bucket.Points);
var newcentroid = new XY(longest);
var newid = BaseBucketsLookup.Count.ToString();
var newbucket = new Bucket(newid) { Centroid = newcentroid };
BaseBucketsLookup.Add(newid, newbucket);
}
}
}

void RunIterationsUntilKClusterPlacementAreDone()
{
double prevError = Double.MaxValue;
double currError = Double.MaxValue;

for (int i = 0; i < _MaxIterations; i++)
{
prevError = currError;
currError = RunOneIteration();
if (currError >= prevError) // no improvement then stop
break;
}
Profile("RunIterationsUntilKClusterPlacementAreDone");
}

double RunOneIteration() //O(k*n)
{
// update points, assign points to cluster
UpdatePointsByCentroid();
// update centroid pos by its points
BaseSetCentroidForAllBuckets(BaseBucketsLookup.Values);//O(k*n)
var clustersCount = BaseBucketsLookup.Count;
for (int i = 0; i < clustersCount; i++)
{
var currentBucket = BaseBucketsLookup[i.ToString()];
if (currentBucket.IsUsed == false)
continue;

//update centroid
var newcentroid = BaseGetCentroidFromCluster(currentBucket.Points);
//no need to update color, autoset
currentBucket.Centroid = newcentroid;
currentBucket.ErrorLevel = 0;
//update error
foreach (var p in currentBucket.Points)
{
var dist = MathTool.Distance(newcentroid, p);
currentBucket.ErrorLevel += dist;
}
var val = currentBucket.ErrorLevel/currentBucket.Points.Count;
currentBucket.ErrorLevel = val; //Math.Sqrt(val);
}

Profile("RunOneIteration");
return BaseGetTotalError();
}
void UpdatePointsByCentroid()//O(n*k)
{
int count = BaseBucketsLookup.Count;

// clear points in the buckets, they will be re-inserted
foreach (var bucket in BaseBucketsLookup.Values)
bucket.Points.Clear();

foreach (XY p in BaseDataset)
{
double minDist = Double.MaxValue;
int index = -1;
for (int i = 0; i < count; i++)
{
var bucket = BaseBucketsLookup[i.ToString()];
if (bucket.IsUsed == false)
continue;

var centroid = bucket.Centroid;
var dist = MathTool.Distance(p, centroid);
if (dist < minDist)
{
// update
minDist = dist;
index = i;
}
}
//update color for point to match centroid and re-insert
var closestBucket = BaseBucketsLookup[index.ToString()];
closestBucket.Points.Add(p);
p.Color = closestBucket.Centroid.Color;
}

Profile("UpdatePointsByCentroid");
}
}

public class GridBaseCluster : BaseClusterAlgorithm
{
public GridBaseCluster(List<XY> dataset)
{
BaseDataset = dataset;
}

public override List<XY> GetCluster()
{
var cluster = RunClusterAlgo(GridX, GridY);
return cluster;
}

// O(k*n)
void MergeClustersGrid()
{
foreach (var key in BaseBucketsLookup.Keys)
{
var bucket = BaseBucketsLookup[key];
if (bucket.IsUsed == false)
continue;

var x = bucket.Idx;
var y = bucket.Idy;

// get keys for neighbors
var N = GetId(x, y + 1);
var NE = GetId(x + 1, y + 1);
var E = GetId(x + 1, y);
var SE = GetId(x + 1, y - 1);
var S = GetId(x, y - 1);
var SW = GetId(x - 1, y - 1);
var W = GetId(x - 1, y);
var NW = GetId(x - 1, y + 1);
var neighbors = new[] { N, NE, E, SE, S, SW, W, NW };

MergeClustersGridHelper(key, neighbors);
}
}
void MergeClustersGridHelper(string currentKey, string[] neighborKeys)
{
double minDistX = DeltaX / 2.0;//heuristic, a higher value gives more merging
double minDistY = DeltaY / 2.0;
//if clusters in grid are too close to each other, merge them
double minDist = MathTool.Max(minDistX, minDistY);

foreach (var neighborKey in neighborKeys)
{
if (!BaseBucketsLookup.ContainsKey(neighborKey))
continue;

var neighbor = BaseBucketsLookup[neighborKey];
if (neighbor.IsUsed == false)
continue;

var current = BaseBucketsLookup[currentKey];
var dist = MathTool.Distance(current.Centroid, neighbor.Centroid);
if (dist > minDist)
continue;

// merge
var color = current.Centroid.Color;
foreach (var p in neighbor.Points)
{
//update color
p.Color = color;
}

// recalc centroid
var cp = BaseGetCentroidFromCluster(current.Points);
current.Centroid = cp;
neighbor.IsUsed = false; //merged, then not used anymore
}
}

public List<XY> RunClusterAlgo(int gridx, int gridy)
{
var clusterPoints = new List<XY>();
// params invalid
if (BaseDataset == null || BaseDataset.Count == 0 ||
gridx <= 0 || gridy <= 0)
return clusterPoints;

// put points in buckets
foreach (var p in BaseDataset)
{
// find relative val
var relativeX = p.X - MinX;
var relativeY = p.Y - MinY;

int idx = (int)(relativeX / DeltaX);
int idy = (int)(relativeY / DeltaY);

// bucket id
string id = GetId(idx, idy);

// existing bucket, add point
if (BaseBucketsLookup.ContainsKey(id))
{
BaseBucketsLookup[id].Points.Add(p);
}
// new bucket, create and add point
else
{
Bucket bucket = new Bucket(idx, idy);
bucket.Points.Add(p);
BaseBucketsLookup.Add(id, bucket);
}
}

// calc centroid for all buckets
BaseSetCentroidForAllBuckets(BaseBucketsLookup.Values);

// merge if grid cluster points are too close
MergeClustersGrid();

return BaseGetClusterResult();
}
}

public class MathTool
{
private const double Exp = 2; // 2=euclidean, 1=manhattan
//Minkowski dist
public static double Distance(Algorithm.XY a, Algorithm.XY b)
{
return Math.Pow(Math.Pow(Math.Abs(a.X - b.X), Exp) +
Math.Pow(Math.Abs(a.Y - b.Y), Exp), 1.0 / Exp);
}

public static double Min(double a, double b)
{
return a <= b ? a : b;
}
public static double Max(double a, double b)
{
return a >= b ? a : b;
}
}

public static class FileUtil
{
public static bool WriteFile(string data, FileInfo fileInfo)
{
bool isSuccess = false;
try
{
using (StreamWriter streamWriter =
File.CreateText(fileInfo.FullName))
{
streamWriter.Write(data);
isSuccess = true;
}
}
catch (Exception ex)
{
Console.WriteLine(ex.StackTrace + "\nPress a key ... ");
}
return isSuccess;
}
}
}
```

Put both the html and javascript file in the c:\temp folder.
The C# program creates a new draw.js file in the c:\temp folder.

Open the html file with a modern browser that supports canvas, like Firefox or Chrome.

canvas.html

```<html>
<title>Cluster test</title>
<script type="text/javascript">
var radius = 5; //marker circle size

var G = {
yellow: "#FF0",
red: "#F00",
green: "#0F0",
blue: "#00F",
black: "#000"
}

window.onload = function () {
var canvas = document.getElementById('canvas');
if (canvas.getContext) {
var ctx = canvas.getContext('2d');
ctx.strokeStyle = G.black;
ctx.font = "10pt Arial";
drawMarkers(ctx); //use fn defined in draw.js
}
}

function drawMark(cx, cy, color, ctx) {
//draw a circle
ctx.beginPath();
ctx.arc(cx, cy, radius, 0, Math.PI * 2, true);
ctx.closePath();
ctx.fillStyle = color;
ctx.fill();
ctx.stroke();
}

function drawCluster(cx, cy, color, n, ctx) {
//draw a bigger circle
ctx.beginPath();
ctx.arc(cx, cy, radius * 2, 0, Math.PI * 2, true);
ctx.closePath();
ctx.stroke();
ctx.fillStyle = color;
ctx.fillText("" + n, cx-8, cy+2);
}
</script>
<script type="text/javascript" src="draw.js"></script>

<style type="text/css">
body{ margin-left: 10px; margin-top:10px;}
canvas  {  border: 1px solid red; }
</style>
<body>
<canvas id="canvas" width="410" height="330">
</canvas>
</body>
</html>
```

The javascript file will be overwritten when you run the C# code.
draw.js

```function drawMarkers(ctx) {
drawMark(161.089, 215.198, 'rgb(82,128,16)', ctx);
drawMark(283.556, 12.712, 'rgb(92,49,66)', ctx);
drawMark(98.915, 37.67, 'rgb(181,136,150)', ctx);
drawMark(318.432, 102.905, 'rgb(92,49,66)', ctx);
drawMark(378.091, 238.713, 'rgb(182,83,25)', ctx);
drawMark(318.736, 80.199, 'rgb(92,49,66)', ctx);
drawMark(122.702, 252.349, 'rgb(82,128,16)', ctx);
drawMark(173.836, 44.669, 'rgb(181,136,150)', ctx);
drawMark(141.157, 172.273, 'rgb(82,128,16)', ctx);
drawMark(102.749, 56.737, 'rgb(181,136,150)', ctx);
drawCluster(306.908, 65.272, 'rgb(92,49,66)', 3, ctx);
drawCluster(378.091, 238.713, 'rgb(182,83,25)', 1, ctx);
drawCluster(141.649, 213.273, 'rgb(82,128,16)', 3, ctx);
drawCluster(125.167, 46.359, 'rgb(181,136,150)', 3, ctx);
ctx.fillStyle = 'rgb(0,0,0)';
ctx.fillText('Clusters = ' + 4, 20, 320);
}
```

Written by kunuk Nykjaer

September 17, 2011 at 9:35 am

## Clustering – Grid cluster with C# and html-canvas example (part 1)

Grid clustering, with merging of grid clusters whose centroids are too close to each other.
The second post with updated code can be seen here.
https://kunuk.wordpress.com/2011/09/17/clustering-k-means-and-grid-with-c-example-and-html-canvas-part-2
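The merge rule itself is just a distance test between neighboring centroids. A minimal JavaScript sketch of the idea (the name `shouldMerge` and the half-cell threshold are illustrative assumptions, not the C# implementation below):

```javascript
// Sketch of the merge heuristic: merge two neighboring grid clusters
// when their centroids are closer than half a grid cell side.
// shouldMerge is an illustrative helper name, not from the C# code.
function shouldMerge(a, b, deltaX, deltaY) {
    // threshold: half the smaller cell side (a higher value gives more merging)
    var minDist = Math.min(deltaX / 2, deltaY / 2);
    var dx = a.x - b.x, dy = a.y - b.y;
    return Math.sqrt(dx * dx + dy * dy) < minDist;
}
```

When two clusters merge, the surviving cluster keeps its color, takes over the neighbor's points, and recalculates its centroid.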

Sometimes when you want to visualize points, there can be too many points and they overlap.
To avoid this problem you can use a clustering method and group points which are close to each other. A cluster carries information such as the number of points it contains. The idea is to attach a click-event to each cluster which zooms in; the view gets expanded and the points stop overlapping.

Here I look at a grid version of clustering. You divide the screen into grid sections and put each point into its grid cluster. Then you apply a centroid calculation to place the cluster point.
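The bucket placement and centroid steps can be sketched in JavaScript (`bucketId` and `centroid` are illustrative helper names; the C# code below does the same with integer division):

```javascript
// Which grid cell does a point fall into?
// deltaX/deltaY are the cell sizes; minX/minY is the grid origin.
function bucketId(p, minX, minY, deltaX, deltaY) {
    var idx = Math.floor((p.x - minX) / deltaX);
    var idy = Math.floor((p.y - minY) / deltaY);
    return idx + ";" + idy; // cell key, e.g. "2;1"
}

// Centroid of the points collected in one cell: the coordinate average.
function centroid(points) {
    var sx = 0, sy = 0;
    points.forEach(function (p) { sx += p.x; sy += p.y; });
    return { x: sx / points.length, y: sy / points.length };
}
```

Points that map to the same cell key end up in the same bucket, and each non-empty bucket becomes one cluster point at the centroid position.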

In this example I will visualize both the points and the cluster points. Normally you would only visualize the cluster points.
For the visualization I will use html and canvas. The clustering algorithm is implemented in C#. The C# program generates a draw.js file with point and cluster point drawing information.

The running time is O(m*n) where m is the number of grid cells and n is the number of points.
This clustering visualization works well enough but is not always perfect. Perfect would be always assigning each point to the closest cluster point.
I will compare this with K-means at a later time.
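For comparison, the "perfect" assignment mentioned above would test each point against every cluster point, roughly like this (an illustrative O(n*k) sketch; `nearestCluster` is not part of the grid algorithm):

```javascript
// Assign a point to the closest cluster point by Euclidean distance.
// nearestCluster is an illustrative helper, not in the C# code.
function nearestCluster(p, clusters) {
    var best = -1, bestDist = Infinity;
    clusters.forEach(function (c, i) {
        var dx = p.x - c.x, dy = p.y - c.y;
        var d = Math.sqrt(dx * dx + dy * dy);
        if (d < bestDist) { bestDist = d; best = i; }
    });
    return best; // index of the closest cluster point
}
```

The grid version avoids this per-point scan by only comparing a point with its own cell, which is why a point near a cell border can end up in a cluster that is not strictly the closest one.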

Here’s an example of how the points are clustered in grids. The cluster points have a number which tells how many points they contain. Notice the bottom cluster with size 8. It has merged with its neighbor cluster, and you can see its points are spread over two grid areas.

The code examples will generate an image similar to the one above when run out of the box.

Here are the source files.

Program.cs

```using System;

// By Kunuk Nykjaer
class Program
{
static void Main(string[] args)
{
var algo = new Algorithm();
algo.Run("gridcluster");
algo.GenerateDataString();

Console.WriteLine("press a key ...");
Console.ReadKey();
}
}
```

Algorithm.cs

```using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;
using System.Text;

// By Kunuk Nykjaer
public class Algorithm
{
static readonly CultureInfo Culture_enUS = new CultureInfo("en-US");
const string S = "G";

readonly List<XY> _points = new List<XY>();
List<XY> _clusters = new List<XY>();

public static readonly Random _rand = new Random();
public const double MinX = 10;
public const double MinY = 10;
public const double MaxX = 400;
public const double MaxY = 300;
public const int GridX = 4;
public const int GridY = 3;

const double AbsSizeX = MaxX - MinX;
const double AbsSizeY = MaxY - MinY;

// for bucket placement calc
const double DeltaX = AbsSizeX / GridX;
const double DeltaY = AbsSizeY / GridY;

static private readonly string NL = Environment.NewLine;
private const string ctx = "ctx";

public Algorithm()
{
CreateDataSet();
}

void CreateDataSet()
{
// CREATE DATA SET

//for (int i = 0; i < 100; i++)
//{
//    var x = MinX + _rand.NextDouble() * MaxX;
//    var y = MinY + _rand.NextDouble() * MaxY;
//    _points.Add(new XY(x, y));
//}

// Create random regions of clusters
for (int i = 0; i < 10; i++)
{
var x = MinX + _rand.NextDouble() * 100;
var y = MinY + _rand.NextDouble() * 100;
_points.Add(new XY(x, y));
}

for (int i = 0; i < 10; i++)
{
var x = MinX + 200 + _rand.NextDouble() * 100;
var y = MinY + 10 + _rand.NextDouble() * 100;
_points.Add(new XY(x, y));
}

for (int i = 0; i < 10; i++)
{
var x = MinX + 50 + _rand.NextDouble() * 100;
var y = MinY + 150 + _rand.NextDouble() * 100;
_points.Add(new XY(x, y));
}
}

public void Run(string cluster)
{
if (cluster == "gridcluster")
{
GridCluster grid = new GridCluster(_points);
_clusters = grid.GetCluster();
}
}

public static string GetRandomColor()
{
int r = _rand.Next(10, 240);
int g = _rand.Next(10, 240);
int b = _rand.Next(10, 240);
return string.Format("'rgb({0},{1},{2})'", r, g, b);
}

public static void SetCentroidForAllBuckets(IEnumerable<Bucket> buckets)
{
foreach (var item in buckets)
{
var bucketPoints = item.Points;
var cp = GetCentroidFromCluster(bucketPoints);
item.Centroid = cp;
}
}

public static string GetId(int idx, int idy)
{
return idx + ";" + idy;
}

static void CreateFile(string s)
{
var path = new FileInfo(@"C:\temp\draw.js");
var isCreated = FileUtil.WriteFile(s, path);
Console.WriteLine(isCreated + " = File is Created");
}

static XY GetCentroidFromCluster(List<XY> list)
{
var color = GetRandomColor();

if (list.Count == 0)
return null;

XY centroid = new XY(0, 0);
foreach (XY p in list)
{
p.Color = color;
centroid.X += p.X;
centroid.Y += p.Y;
}

int count = list.Count();
centroid.X /= count;
centroid.Y /= count;

var cp = new XY(centroid.X, centroid.Y) { Size = count, Color = color };
return cp;
}

public void GenerateDataString()
{
var sb = new StringBuilder();

// markers
var head = NL + "function drawMarkers(ctx) {" + NL;
var tail = NL + "}" + NL;

// grid
sb.Append("ctx.beginPath();" + NL);
for (int i = 0; i < GridX + 1; i++)
{
sb.Append(String.Format("ctx.moveTo({0}, {1});{2}",
ToStringEN(MinX + i * DeltaX), ToStringEN(MinY), NL));
sb.Append(String.Format("ctx.lineTo({0}, {1});{2}",
ToStringEN(MinX + i * DeltaX), ToStringEN(MaxY), NL));
}
for (int j = 0; j < GridY + 1; j++)
{
sb.Append(String.Format("ctx.moveTo({0}, {1});{2}",
ToStringEN(MinX), ToStringEN(MinY + j * DeltaY), NL));
sb.Append(String.Format("ctx.lineTo({0}, {1});{2}",
ToStringEN(MaxX), ToStringEN(MinY + j * DeltaY), NL));
}
sb.Append("ctx.stroke(); " + NL);

// markers
if (_points != null)
foreach (var p in _points)
sb.Append(string.Format("drawMark({0}, {1}, {2}, {3});{4}",
ToStringEN(p.X), ToStringEN(p.Y), p.Color, ctx, NL));

string clusterInfo = "0";
if (_clusters != null)
{
foreach (var c in _clusters)
sb.Append(string.Format("drawCluster({0}, {1}, {2}, {3}, {4});{5}",
ToStringEN(c.X), ToStringEN(c.Y), c.Color,
c.Size, ctx, NL));

clusterInfo = _clusters.Count + string.Empty;
}
// bottom text

sb.Append("ctx.fillStyle = 'rgb(0,0,0)';" + NL);
sb.Append(string.Format("ctx.fillText('Clusters = ' + {0}, {1}, {2});{3}",
clusterInfo, ToStringEN(MinX + 10), ToStringEN(MaxY + 16), NL));

sb.Append(tail);

CreateFile(sb.ToString());
//Console.WriteLine(sb.ToString());
}

public static string ToStringEN(double d)
{
double rounded = Math.Round(d, 3);
return rounded.ToString(S, Culture_enUS);
}

public class XY
{
public double X { get; set; }
public double Y { get; set; }
public string Color { get; set; }
public int Size { get; set; }
public string XToString { get { return X.ToString(S, Culture_enUS); } }
public string YToString { get { return Y.ToString(S, Culture_enUS); } }
public XY(double x, double y)
{
X = x;
Y = y;
Color = "'rgb(0,0,0)'";//default
Size = 1;
}
}
public class Bucket
{
public string Id { get; private set; }
public List<XY> Points { get; private set; }
public XY Centroid { get; set; }
public int Idx { get; private set; }
public int Idy { get; private set; }
private bool _IsUsed;
public bool IsUsed
{
get { return _IsUsed && Centroid != null; }
set { _IsUsed = value; }
}

public Bucket(int idx, int idy)
{
IsUsed = true;
Centroid = null;
Points = new List<XY>();
Idx = idx;
Idy = idy;
Id = GetId(idx, idy);
}
}

public abstract class ClusterAlgorithm
{
//id, bucket
public readonly Dictionary<string, Bucket> bucketsLookup = new Dictionary<string, Bucket>();
public abstract List<XY> GetCluster();
}
public class KMeans : ClusterAlgorithm
{
//NOT IMPLEMENTED YET
public override List<XY> GetCluster()
{
return null;
}
/*
1) Random k centroids
2) Cluster data by euclidean distance to centroids
3) Update centroids by clustered data,
http://en.wikipedia.org/wiki/Centroid#Of_a_finite_set_of_points
4) Update cluster
5) Continue last two steps until error is small, error is sum of diff
between current and updated centroid
*/
}

public class GridCluster : ClusterAlgorithm
{
private readonly List<XY> _dataset;

public GridCluster(List<XY> dataset)
{
_dataset = dataset;
}

public override List<XY> GetCluster()
{
var cluster = GridClusterAlgo(GridX, GridY);
return cluster;
}

void MergeClustersGrid()
{
foreach (var key in bucketsLookup.Keys)
{
var bucket = bucketsLookup[key];
if (bucket.IsUsed == false)
continue;

var x = bucket.Idx;
var y = bucket.Idy;

// get keys for neighbors
var N = GetId(x, y + 1);
var NE = GetId(x + 1, y + 1);
var E = GetId(x + 1, y);
var SE = GetId(x + 1, y - 1);
var S = GetId(x, y - 1);
var SW = GetId(x - 1, y - 1);
var W = GetId(x - 1, y);
var NW = GetId(x - 1, y + 1);
var neighbors = new[] { N, NE, E, SE, S, SW, W, NW };

MergeClustersGridHelper(key, neighbors);
}
}

void MergeClustersGridHelper(string currentKey, string[] neighborKeys)
{
double minDistX = DeltaX / 2.0;//heuristic
double minDistY = DeltaY / 2.0;
//if clusters in grid are too close to each other, merge them
double minDist = MathTool.Min(minDistX, minDistY);

foreach (var neighborKey in neighborKeys)
{
if (!bucketsLookup.ContainsKey(neighborKey))
continue;

var current = bucketsLookup[currentKey];
var neighbor = bucketsLookup[neighborKey];
if (neighbor.IsUsed == false)
continue;

var dist = MathTool.Distance(current.Centroid, neighbor.Centroid);
if (dist >= minDist)
continue;

// merge
var color = current.Centroid.Color;
foreach (var p in neighbor.Points)
{
//update color
p.Color = color;
}

// recalc centroid
var cp = GetCentroidFromCluster(current.Points);
current.Centroid = cp;
neighbor.IsUsed = false; //merged, then not used anymore
}
}

public List<XY> GridClusterAlgo(int gridx, int gridy)
{
var clusterPoints = new List<XY>();
// params invalid
if (_dataset == null || _dataset.Count == 0 || gridx <= 0 || gridy <= 0)
return clusterPoints;

// put points in buckets
foreach (var p in _dataset)
{
// find relative val
var relativeX = p.X - MinX;
var relativeY = p.Y - MinY;

int idx = (int)(relativeX / DeltaX);
int idy = (int)(relativeY / DeltaY);

// bucket id
string id = GetId(idx, idy);

// existing bucket, add point
if (bucketsLookup.ContainsKey(id))
{
bucketsLookup[id].Points.Add(p);
}
// new bucket, create and add point
else
{
Bucket bucket = new Bucket(idx, idy);
bucket.Points.Add(p);
bucketsLookup.Add(id, bucket);
}
}

// calc centroid for all buckets
SetCentroidForAllBuckets(bucketsLookup.Values);

// merge if grid cluster points are too close
MergeClustersGrid();

// collect non-empty buckets
foreach (var item in bucketsLookup)
{
var bucket = item.Value;
if (bucket.IsUsed)
clusterPoints.Add(bucket.Centroid);
}
return clusterPoints;
}
}

public class MathTool
{
private const double Exp = 2; // 2=euclidean, 1=manhattan
//Minkowski dist
public static double Distance(Algorithm.XY a, Algorithm.XY b)
{
return Math.Pow(Math.Pow(Math.Abs(a.X - b.X), Exp) +
Math.Pow(Math.Abs(a.Y - b.Y), Exp), 1.0 / Exp);
}

public static double Min(double a, double b)
{
return a <= b ? a : b;
}
}

public static class FileUtil
{
public static bool WriteFile(string data, FileInfo fileInfo)
{
bool isSuccess = false;
try
{
using (StreamWriter streamWriter = File.CreateText(fileInfo.FullName))
{
streamWriter.Write(data);
isSuccess = true;
}
}
catch (Exception ex)
{
Console.WriteLine(ex.StackTrace + "\nPress a key ... ");
}
return isSuccess;
}
}
}
```

Put both the html and javascript file in the c:\temp folder.
The C# program creates a new draw.js file in the c:\temp folder.

Open the html file with a modern browser that supports canvas, like Firefox or Chrome.
The html file will display the same as the example image above out of the box.

canvas.html

```<html>
<title>Cluster test</title>
<script type="text/javascript">
var radius = 5; //marker circle size

var G = {
yellow: "#FF0",
red: "#F00",
green: "#0F0",
blue: "#00F",
black: "#000"
}

window.onload = function () {
var canvas = document.getElementById('canvas');
if (canvas.getContext) {
var ctx = canvas.getContext('2d');
ctx.strokeStyle = G.black;
ctx.font = "10pt Arial";
drawMarkers(ctx); //use fn defined in draw.js
}
}

function drawMark(cx, cy, color, ctx) {
//draw a circle
ctx.beginPath();
ctx.arc(cx, cy, radius, 0, Math.PI * 2, true);
ctx.closePath();
ctx.fillStyle = color;
ctx.fill();
ctx.stroke();
}

function drawCluster(cx, cy, color, n, ctx) {
//draw a bigger circle
ctx.beginPath();
ctx.arc(cx, cy, radius * 2, 0, Math.PI * 2, true);
ctx.closePath();
ctx.stroke();
ctx.fillStyle = color;
ctx.fillText("" + n, cx-8, cy+2);
}
</script>
<script type="text/javascript" src="draw.js"></script>

<style type="text/css">
body{ margin-left: 10px; margin-top:10px;}
canvas  {  border: 1px solid red; }
</style>
<body>
<canvas id="canvas" width="410" height="330">
</canvas>
</body>
</html>
```

The javascript file will be overwritten when you run the C# code.

draw.js

```function drawMarkers(ctx) {
// By Kunuk Nykjaer
ctx.beginPath();
ctx.moveTo(10, 10);
ctx.lineTo(10, 300);
ctx.moveTo(107.5, 10);
ctx.lineTo(107.5, 300);
ctx.moveTo(205, 10);
ctx.lineTo(205, 300);
ctx.moveTo(302.5, 10);
ctx.lineTo(302.5, 300);
ctx.moveTo(400, 10);
ctx.lineTo(400, 300);
ctx.moveTo(10, 10);
ctx.lineTo(400, 10);
ctx.moveTo(10, 106.667);
ctx.lineTo(400, 106.667);
ctx.moveTo(10, 203.333);
ctx.lineTo(400, 203.333);
ctx.moveTo(10, 300);
ctx.lineTo(400, 300);
ctx.stroke();
drawMark(24.566, 48.809, 'rgb(197,120,199)', ctx);
drawMark(63.55, 22.318, 'rgb(197,120,199)', ctx);
drawMark(36.17, 66.005, 'rgb(197,120,199)', ctx);
drawMark(30.083, 23.226, 'rgb(197,120,199)', ctx);
drawMark(96.758, 53.378, 'rgb(197,120,199)', ctx);
drawMark(97.994, 13.421, 'rgb(197,120,199)', ctx);
drawMark(91.573, 15.839, 'rgb(197,120,199)', ctx);
drawMark(59.451, 23.006, 'rgb(197,120,199)', ctx);
drawMark(63.699, 81.094, 'rgb(197,120,199)', ctx);
drawMark(21.665, 15.711, 'rgb(197,120,199)', ctx);
drawMark(298.417, 77.83, 'rgb(11,24,61)', ctx);
drawMark(291.38, 86.759, 'rgb(11,24,61)', ctx);
drawMark(281.345, 95.517, 'rgb(11,24,61)', ctx);
drawMark(284.307, 92.13, 'rgb(11,24,61)', ctx);
drawMark(240.746, 106.23, 'rgb(11,24,61)', ctx);
drawMark(247.004, 85.546, 'rgb(11,24,61)', ctx);
drawMark(291.715, 33.158, 'rgb(11,24,61)', ctx);
drawMark(216.012, 33.45, 'rgb(11,24,61)', ctx);
drawMark(260.559, 46.371, 'rgb(11,24,61)', ctx);
drawMark(239.061, 103.01, 'rgb(11,24,61)', ctx);
drawMark(67.791, 210.702, 'rgb(111,163,157)', ctx);
drawMark(146.852, 211.327, 'rgb(111,163,157)', ctx);
drawMark(82.629, 164.594, 'rgb(106,75,235)', ctx);
drawMark(71.372, 244.351, 'rgb(111,163,157)', ctx);
drawMark(110.755, 255.688, 'rgb(111,163,157)', ctx);
drawMark(144.035, 185.499, 'rgb(96,173,79)', ctx);
drawMark(116.13, 217.092, 'rgb(111,163,157)', ctx);
drawMark(112.972, 216.736, 'rgb(111,163,157)', ctx);
drawMark(117.428, 246.791, 'rgb(111,163,157)', ctx);
drawMark(100.972, 210.653, 'rgb(111,163,157)', ctx);
drawCluster(58.551, 36.281, 'rgb(197,120,199)', 10, ctx);
drawCluster(265.055, 76, 'rgb(11,24,61)', 10, ctx);
drawCluster(105.534, 226.668, 'rgb(111,163,157)', 8, ctx);
drawCluster(82.629, 164.594, 'rgb(106,75,235)', 1, ctx);
drawCluster(144.035, 185.499, 'rgb(96,173,79)', 1, ctx);
ctx.fillStyle = 'rgb(0,0,0)';
ctx.fillText('Clusters = ' + 5, 20, 316);
}

```

Written by kunuk Nykjaer

September 15, 2011 at 11:24 pm