Software Programming

Kunuk Nykjaer

Clustering – Grid cluster with C# and html-canvas example (part 1)

with 2 comments


Grid clustering With merging supported if the centroid of the grid clusters are too close to each other.
The second post with updated code can be seen here.
http://kunuk.wordpress.com/2011/09/17/clustering-k-means-and-grid-with-c-example-and-html-canvas-part-2

Sometimes when you want to visualize points, there can be to many points and they overlap.
To avoid this problem you can use different clustering methods and group points which are close to each. A cluster contains information such as the number of points. The idea is to apply interaction method with a click-event for the cluster where you will zoom in. The view gets expanded and the points stop overlapping.

Here I look at a grid version of clustering. You divide the screen into grid sections and put the points in a grid cluster. Then you apply a centroid calculation to put the location of the cluster point.

In this example I will visualize both the points and the cluster point. Normally you would only visualize the cluster points.
For the visualization I will use html and canvas. The clustering algorithm is implemented in C#. The C# program generates a draw.js file with point and cluster points drawing information.

The running time is something like O(m*n) where m is number of grids and n is number of points.
This clustering visualization works good enough but is not always perfect. Perfect would be always putting points to the closest cluster point.
I will compare this with K-means at a later time.

Here’s an example of how the points are clustered in grids.

The cluster points has a number which tells how many points it contains. Notice the bottom cluster with size 8. It has merged with it’s neighbor cluster where you can see the points are spread in two grid area.

The code examples will generate an image similar to above when run out of the box.

Here are the source codes.

Program.cs

using System;

// By Kunuk Nykjaer
class Program
{
    static void Main(string[] args)
    {
        var algo = new Algorithm();
        algo.Run("gridcluster");
        algo.GenerateDataString();

        Console.WriteLine("press a key ...");
        //Console.ReadKey();
    }
}

Algorithm.cs

using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;
using System.Text;

// By Kunuk Nykjaer
public class Algorithm
{
    static readonly CultureInfo Culture_enUS = new CultureInfo("en-US");
    const string S = "G";

    readonly List<XY> _points = new List<XY>();
    List<XY> _clusters = new List<XY>();

    public static readonly Random _rand = new Random();
    public const double MinX = 10;
    public const double MinY = 10;
    public const double MaxX = 400;
    public const double MaxY = 300;
    public const int GridX = 4;
    public const int GridY = 3;

    const double AbsSizeX = MaxX - MinX;
    const double AbsSizeY = MaxY - MinY;

    // for bucket placement calc
    const double DeltaX = AbsSizeX / GridX;
    const double DeltaY = AbsSizeY / GridY;

    static private readonly string NL = Environment.NewLine;
    private const string ctx = "ctx";

    public Algorithm()
    {
        CreateDataSet();
    }

    void CreateDataSet()
    {
        // CREATE DATA SET

        // Create random spread points
        //for (int i = 0; i < 100; i++)
        //{        
        //    var x = MinX + rand.NextDouble() * (MaxX);
        //    var y = MinY + rand.NextDouble() * (MaxY);
        //    _points.Add(new XY(x, y));
        //}

        // Create random region of clusters
        for (int i = 0; i < 10; i++)
        {
            var x = MinX + _rand.NextDouble() * 100;
            var y = MinY + _rand.NextDouble() * 100;
            _points.Add(new XY(x, y));
        }

        for (int i = 0; i < 10; i++)
        {
            var x = MinX + 200 + _rand.NextDouble() * 100;
            var y = MinY + 10 + _rand.NextDouble() * 100;
            _points.Add(new XY(x, y));
        }

        for (int i = 0; i < 10; i++)
        {
            var x = MinX + 50 + _rand.NextDouble() * 100;
            var y = MinY + 150 + _rand.NextDouble() * 100;
            _points.Add(new XY(x, y));
        }
    }

    public void Run(string cluster)
    {
        if (cluster == "gridcluster")
        {
            GridCluster grid = new GridCluster(_points);
            _clusters = grid.GetCluster();
        }
    }

    public static string GetRandomColor()
    {
        int r = _rand.Next(10, 240);
        int g = _rand.Next(10, 240);
        int b = _rand.Next(10, 240);
        return string.Format("'rgb({0},{1},{2})'", r, g, b);
    }

    public static void SetCentroidForAllBuckets(IEnumerable<Bucket> buckets)
    {
        foreach (var item in buckets)
        {
            var bucketPoints = item.Points;
            var cp = GetCentroidFromCluster(bucketPoints);
            item.Centroid = cp;
        }
    }

    public static string GetId(int idx, int idy)
    {
        return idx + ";" + idy;
    }

    static void CreateFile(string s)
    {       
        var path = new FileInfo(@"C:\temp\draw.js");
        var isCreated = FileUtil.WriteFile(s, path);
        Console.WriteLine(isCreated + " = File is Created");
    }

    static XY GetCentroidFromCluster(List<XY> list)
    {
        var color = GetRandomColor();

        if (list.Count == 0)
            return null;

        XY centroid = new XY(0, 0);
        foreach (XY p in list)
        {
            p.Color = color;
            centroid.X += p.X;
            centroid.Y += p.Y;
        }

        int count = list.Count();
        centroid.X /= count;
        centroid.Y /= count;

        var cp = new XY(centroid.X, centroid.Y) { Size = count, Color = color };
        return cp;
    }

    public void GenerateDataString()
    {
        var sb = new StringBuilder();

        // markers
        var head = NL + "function drawMarkers(ctx) {" + NL;
        var tail = NL + "}" + NL;

        sb.Append(head);

        // grid
        sb.Append("ctx.beginPath();" + NL);
        for (int i = 0; i < GridX + 1; i++)
        {
            sb.Append(String.Format("ctx.moveTo({0}, {1});{2}", 
                ToStringEN(MinX + i * DeltaX), ToStringEN(MinY), NL));
            sb.Append(String.Format("ctx.lineTo({0}, {1});{2}", 
                ToStringEN(MinX + i * DeltaX), ToStringEN(MaxY), NL));
        }
        for (int j = 0; j < GridY + 1; j++)
        {
            sb.Append(String.Format("ctx.moveTo({0}, {1});{2}", 
                ToStringEN(MinX), ToStringEN(MinY + j * DeltaY), NL));
            sb.Append(String.Format("ctx.lineTo({0}, {1});{2}", 
                ToStringEN(MaxX), ToStringEN(MinY + j * DeltaY), NL));
        }
        sb.Append("ctx.stroke(); " + NL);


        // markers
        if (_points != null)
            foreach (var p in _points)
                sb.Append(string.Format("drawMark({0}, {1}, {2}, {3});{4}", 
                    ToStringEN(p.X), ToStringEN(p.Y), p.Color, ctx, NL));


        string clusterInfo = "0";
        if (_clusters != null)
        {
            foreach (var c in _clusters)
                sb.Append(string.Format("drawCluster({0}, {1}, {2}, {3}, {4});{5}", 
                    ToStringEN(c.X), ToStringEN(c.Y), c.Color,
                                        c.Size, ctx, NL));

            clusterInfo = _clusters.Count + string.Empty;
        }
        // bottom text                


        sb.Append("ctx.fillStyle = 'rgb(0,0,0)';" + NL);
        sb.Append(string.Format("ctx.fillText('Clusters = ' + {0}, {1}, {2});{3}", 
            clusterInfo, ToStringEN(MinX + 10), ToStringEN(MaxY + 16), NL));

        sb.Append(tail);

        CreateFile(sb.ToString());
        //Console.WriteLine(sb.ToString());
    }

    public static string ToStringEN(double d)
    {
        double rounded = Math.Round(d, 3);
         return rounded.ToString(S, Culture_enUS); 
    }

    public class XY
    {
        public double X { get; set; }
        public double Y { get; set; }
        public string Color { get; set; }
        public int Size { get; set; }
        public string XToString { get { return X.ToString(S, Culture_enUS); } }
        public string YToString { get { return Y.ToString(S, Culture_enUS); } }
        public XY(double x, double y)
        {
            X = x;
            Y = y;
            Color = "'rgb(0,0,0)'";//default
            Size = 1;
        }
    }
    public class Bucket
    {
        public string Id { get; private set; }
        public List<XY> Points { get; private set; }
        public XY Centroid { get; set; }
        public int Idx { get; private set; }
        public int Idy { get; private set; }
        private bool _IsUsed;
        public bool IsUsed
        {
            get { return _IsUsed && Centroid != null; }
            set { _IsUsed = value; }
        }

        public Bucket(int idx, int idy)
        {
            IsUsed = true;
            Centroid = null;
            Points = new List<XY>();
            Idx = idx;
            Idy = idy;
            Id = GetId(idx, idy);
        }
    }

    public abstract class ClusterAlgorithm
    {
        //id, bucket
        public readonly Dictionary<string, Bucket> bucketsLookup = new Dictionary<string, Bucket>(); 
        public abstract List<XY> GetCluster();
    }
    public class KMeans : ClusterAlgorithm
    {
        //NOT IMPLE YET
        public override List<XY> GetCluster()
        {
            return null;
        }
        /* 
        1) Random k centroids
        2) Cluster data by euclidean distance to centroids
        3) Update centroids by clustered data, 

http://en.wikipedia.org/wiki/Centroid#Of_a_finite_set_of_points

        4) Update cluster
        5) Continue last two steps until error is small, error is sum of diff 
            between current and updated centroid         
         */
    }

    public class GridCluster : ClusterAlgorithm
    {
        private readonly List<XY> _dataset;
        public GridCluster(List<XY> dataset)
        {
            _dataset = dataset;
        }

        public override List<XY> GetCluster()
        {
            var cluster = GridClusterAlgo(GridX, GridY);
            return cluster;
        }


        void MergeClustersGrid()
        {
            foreach (var key in bucketsLookup.Keys)
            {
                var bucket = bucketsLookup[key];
                if (bucket.IsUsed == false)
                    continue;

                var x = bucket.Idx;
                var y = bucket.Idy;

                // get keys for neighbors
                var N = GetId(x, y + 1);
                var NE = GetId(x + 1, y + 1);
                var E = GetId(x + 1, y);
                var SE = GetId(x + 1, y - 1);
                var S = GetId(x, y - 1);
                var SW = GetId(x - 1, y - 1);
                var W = GetId(x - 1, y);
                var NW = GetId(x - 1, y - 1);
                var neighbors = new[] { N, NE, E, SE, S, SW, W, NW };

                MergeClustersGridHelper(key, neighbors);
            }
        }

        void MergeClustersGridHelper(string currentKey, string[] neighborKeys)
        {
            double minDistX = DeltaX / 2.0;//heuristic
            double minDistY = DeltaY / 2.0;
            //if clusters in grid are too close to each other, merge them
            double minDist = MathTool.Min(minDistX, minDistY); 

            foreach (var neighborKey in neighborKeys)
            {
                if (!bucketsLookup.ContainsKey(neighborKey))
                    continue;

                var current = bucketsLookup[currentKey];
                var neighbor = bucketsLookup[neighborKey];
                if (neighbor.IsUsed == false)
                    continue;

                var dist = MathTool.Distance(current.Centroid, neighbor.Centroid);
                if (dist >= minDist)
                    continue;

                // merge
                var color = current.Centroid.Color;
                foreach (var p in neighbor.Points)
                {
                    //update color
                    p.Color = color;
                }

                current.Points.AddRange(neighbor.Points);

                // recalc centroid
                var cp = GetCentroidFromCluster(current.Points);
                current.Centroid = cp;
                neighbor.IsUsed = false; //merged, then not used anymore
            }
        }


        public List<XY> GridClusterAlgo(int gridx, int gridy)
        {
            var clusterPoints = new List<XY>();
            // params invalid
            if (_dataset == null || _dataset.Count == 0 || gridx <= 0 || gridy <= 0) 
                return clusterPoints;

            // put points in buckets
            foreach (var p in _dataset)
            {
                // find relative val
                var relativeX = p.X - MinX;
                var relativeY = p.Y - MinY;

                int idx = (int)(relativeX / DeltaX);
                int idy = (int)(relativeY / DeltaY);

                // bucket id
                string id = GetId(idx, idy);

                // bucket exists, add point
                if (bucketsLookup.ContainsKey(id))
                    bucketsLookup[id].Points.Add(p);

                // new bucket, create and add point
                else
                {
                    Bucket bucket = new Bucket(idx, idy);
                    bucket.Points.Add(p);
                    bucketsLookup.Add(id, bucket);
                }
            }

            // calc centrod for all buckets
            SetCentroidForAllBuckets(bucketsLookup.Values);

            // merge if gridpoint is to close
            MergeClustersGrid();

            // collect buckets non-empty
            foreach (var item in bucketsLookup)
            {
                var bucket = item.Value;
                if (bucket.IsUsed)
                    clusterPoints.Add(bucket.Centroid);
            }
            return clusterPoints;
        }
    }


    public class MathTool
    {
        private const double Exp = 2; // 2=euclid, 1=manhatten
        //Minkowski dist
        public static double Distance(Algorithm.XY a, Algorithm.XY b) 
        {
            return Math.Pow(Math.Pow(Math.Abs(a.X - b.X), Exp) + 
                Math.Pow(Math.Abs(a.Y - b.Y), Exp), 1.0 / Exp);
        }

        public static double Min(double a, double b)
        {
            return a <= b ? a : b;
        }
    }

    public static class FileUtil
    {
        public static bool WriteFile(string data, FileInfo fileInfo)
        {
            bool isSuccess = false;
            try
            {
                using (StreamWriter streamWriter = File.CreateText(fileInfo.FullName))
                {
                    streamWriter.Write(data);
                    isSuccess = true;
                }
            }
            catch (Exception ex) { Console.WriteLine(ex.StackTrace+"\nPress a key ... ");
                Console.ReadKey(); }

            return isSuccess;
        }
    }
}

Put both the html and javascript file in the c:\temp folder.
The C# program creates a new draw.js file in the c:\temp folder.

Open the html file with a modern browser that support canvas, like Firefox or Chrome.
The html file will display the same as the example image above out of the box.

canvas.html

<html>
<head>
    <title>Cluster test</title>            
    <script type="text/javascript">
        var G = {
            yellow: "#FF0",
            red: "#F00",
            green: "#0F0",
            blue: "#00F",
            black: "#000"
        }       

        window.onload = function draw() {
            var canvas = document.getElementById('canvas');
            if (canvas.getContext) {
                var ctx = canvas.getContext('2d');
                ctx.strokeStyle = G.black;
                ctx.font = "10pt Arial";
                drawMarkers(ctx); //use fn defined in draw.js
            }
        }

        function drawMark(cx, cy, color, ctx) {
            //draw a circle
            var radius = 4;
            ctx.beginPath();
            ctx.arc(cx, cy, radius, 0, Math.PI * 2, true);
            ctx.closePath();
            ctx.fillStyle = color;
            ctx.fill();
            ctx.stroke();
        }

        function drawCluster(cx, cy, color, n, ctx) {
            //draw a bigger circle            
            var radius = 12;
            ctx.beginPath();
            ctx.arc(cx, cy, radius, 0, Math.PI * 2, true);
            ctx.closePath();            
            ctx.stroke();
            ctx.fillStyle = color;
            ctx.fillText("" + n, cx-8, cy+2);
        }	  
    </script>
    <script type="text/javascript" src="draw.js"></script> 

    <style type="text/css">
        body{ margin-left: 10px; margin-top:10px;}
        canvas  {  border: 1px solid red; }        
    </style>
</head>
<body>
    <canvas id="canvas" width="410" height="330">
        <p>Your browser doesn't support canvas.</p>
    </canvas>
</body>
</html>

The javascript file will be overwritten when you run the C# code.

draw.js

function drawMarkers(ctx) {
    // By Kunuk Nykjaer
    ctx.beginPath();
    ctx.moveTo(10, 10);
    ctx.lineTo(10, 300);
    ctx.moveTo(107.5, 10);
    ctx.lineTo(107.5, 300);
    ctx.moveTo(205, 10);
    ctx.lineTo(205, 300);
    ctx.moveTo(302.5, 10);
    ctx.lineTo(302.5, 300);
    ctx.moveTo(400, 10);
    ctx.lineTo(400, 300);
    ctx.moveTo(10, 10);
    ctx.lineTo(400, 10);
    ctx.moveTo(10, 106.667);
    ctx.lineTo(400, 106.667);
    ctx.moveTo(10, 203.333);
    ctx.lineTo(400, 203.333);
    ctx.moveTo(10, 300);
    ctx.lineTo(400, 300);
    ctx.stroke();
    drawMark(24.566, 48.809, 'rgb(197,120,199)', ctx);
    drawMark(63.55, 22.318, 'rgb(197,120,199)', ctx);
    drawMark(36.17, 66.005, 'rgb(197,120,199)', ctx);
    drawMark(30.083, 23.226, 'rgb(197,120,199)', ctx);
    drawMark(96.758, 53.378, 'rgb(197,120,199)', ctx);
    drawMark(97.994, 13.421, 'rgb(197,120,199)', ctx);
    drawMark(91.573, 15.839, 'rgb(197,120,199)', ctx);
    drawMark(59.451, 23.006, 'rgb(197,120,199)', ctx);
    drawMark(63.699, 81.094, 'rgb(197,120,199)', ctx);
    drawMark(21.665, 15.711, 'rgb(197,120,199)', ctx);
    drawMark(298.417, 77.83, 'rgb(11,24,61)', ctx);
    drawMark(291.38, 86.759, 'rgb(11,24,61)', ctx);
    drawMark(281.345, 95.517, 'rgb(11,24,61)', ctx);
    drawMark(284.307, 92.13, 'rgb(11,24,61)', ctx);
    drawMark(240.746, 106.23, 'rgb(11,24,61)', ctx);
    drawMark(247.004, 85.546, 'rgb(11,24,61)', ctx);
    drawMark(291.715, 33.158, 'rgb(11,24,61)', ctx);
    drawMark(216.012, 33.45, 'rgb(11,24,61)', ctx);
    drawMark(260.559, 46.371, 'rgb(11,24,61)', ctx);
    drawMark(239.061, 103.01, 'rgb(11,24,61)', ctx);
    drawMark(67.791, 210.702, 'rgb(111,163,157)', ctx);
    drawMark(146.852, 211.327, 'rgb(111,163,157)', ctx);
    drawMark(82.629, 164.594, 'rgb(106,75,235)', ctx);
    drawMark(71.372, 244.351, 'rgb(111,163,157)', ctx);
    drawMark(110.755, 255.688, 'rgb(111,163,157)', ctx);
    drawMark(144.035, 185.499, 'rgb(96,173,79)', ctx);
    drawMark(116.13, 217.092, 'rgb(111,163,157)', ctx);
    drawMark(112.972, 216.736, 'rgb(111,163,157)', ctx);
    drawMark(117.428, 246.791, 'rgb(111,163,157)', ctx);
    drawMark(100.972, 210.653, 'rgb(111,163,157)', ctx);
    drawCluster(58.551, 36.281, 'rgb(197,120,199)', 10, ctx);
    drawCluster(265.055, 76, 'rgb(11,24,61)', 10, ctx);
    drawCluster(105.534, 226.668, 'rgb(111,163,157)', 8, ctx);
    drawCluster(82.629, 164.594, 'rgb(106,75,235)', 1, ctx);
    drawCluster(144.035, 185.499, 'rgb(96,173,79)', 1, ctx);
    ctx.fillStyle = 'rgb(0,0,0)';
    ctx.fillText('Clusters = ' + 5, 20, 316);
}

About these ads

Written by kunuk Nykjaer

September 15, 2011 at 11:24 pm

2 Responses

Subscribe to comments with RSS.

  1. […] clustering is a grid-based clustering using geo-stationary grid-placement. I was impressed by the performance of the solution from […]


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 42 other followers

%d bloggers like this: