Software Programming

Kunuk Nykjaer

Archive for the ‘Framework’ Category

Dependency Injection in .NET

I have worked with miscellaneous IoC containers for C# and wanted to get more in-depth knowledge about the IoC topic.

If you are a software developer then the IoC concept is a must for you.
It’s a pattern used for loose coupling, and IoC containers will make your software architecture easier to manage.

Here are some benchmarks of the various containers, which are frequently updated.

I recently read a book called Dependency Injection in .NET which I highly recommend if you are into C#. I will get geeky about the DI topic and geeky is good if you want to learn 🙂

The book covers the concept and introduces you to a poor man’s DI container. It then demonstrates DI patterns and anti-patterns, and ends with a review of five different DI containers.
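
To make the core idea concrete, here is a minimal constructor injection sketch in the spirit of the book’s poor man’s DI. The types and names are my own illustration, not code from the book:

using System;

public interface IMessageWriter
{
    void Write(string message);
}

public class ConsoleMessageWriter : IMessageWriter
{
    public void Write(string message) { Console.WriteLine(message); }
}

public class Greeter
{
    private readonly IMessageWriter _writer;

    // The dependency is injected through the constructor; Greeter only knows the abstraction.
    public Greeter(IMessageWriter writer)
    {
        if (writer == null) throw new ArgumentNullException("writer");
        _writer = writer;
    }

    public void Greet(string name) { _writer.Write("Hello, " + name); }
}

public class Program
{
    public static void Main()
    {
        // Poor man's DI: the object graph is wired by hand at the entry point
        // (the composition root) instead of by a container.
        var greeter = new Greeter(new ConsoleMessageWriter());
        greeter.Greet("world");
    }
}

An IoC container essentially automates this kind of wiring for larger object graphs.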

Written by kunuk Nykjaer

June 28, 2013 at 10:44 am

Posted in Csharp, Framework

Q-learning Library example with C#

This is a follow-up to my previous post – Q-learning example with Java
https://kunuk.wordpress.com/2010/09/24/q-learning/

q-learn-bender
Imagine you had a robot, just like Bender here, and you want it to learn to navigate correctly to the destination (the beer). The possible moves are up, right, down or left from any room, and you want some kind of automatic system that gives you the solution. This is what this library can do for you. You configure the code with the possible states, the possible actions and the action outcomes. E.g. when you move right from the bedroom, the action result is that Bender has moved to the Hallway; if you move left from the bedroom, Bender hits the wall and stays in the bedroom. You also need to set up a reward value for when Bender reaches the beer, e.g. 100 points. That’s it!
No logic programming on your part, just some configuration.
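
To give a feel for that configuration, here is a stripped-down two-room sketch using the types from QLearning.cs below. This is my own illustration; the full, working example is PathFindingBenderDemo.cs further down. Note that an end state such as the Kitchen does not need to be registered as a QState.

using System;
using QLearningFramework;

class TinyBenderSketch
{
    static void Main()
    {
        var q = new QLearning();
        q.EndStates.Add("Kitchen");   // reaching the beer ends an episode

        var bedroom = new QState("Bedroom", q);
        q.AddState(bedroom);

        // moving right always takes Bender to the Kitchen and pays the reward
        var right = new QAction("Bedroom", new QActionName("right"));
        bedroom.AddAction(right);
        right.AddActionResult(new QActionResult(right, "Kitchen", 1.0) { Reward = 100 });

        // moving left hits the wall, so Bender stays in the Bedroom
        var left = new QAction("Bedroom", new QActionName("left"));
        bedroom.AddAction(left);
        left.AddActionResult(new QActionResult(left, "Bedroom", 1.0));

        q.RunTraining();
        q.ShowPolicy();   // expected: from state Bedroom do action right
    }
}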

DISCLAIMER:
There might be a bug in the implementation. Read at the comments section from Jens for more info.

In my previous post I made a quick example in Java and this time I will use C#.
This version will support non-deterministic action outcomes and I will include examples of how to use this library. The code examples are included at the bottom. You need QLearning.cs to run these examples.

  • Path finding Bender
  • Path finding demo (same example as in my previous post)
  • Rock paper scissors demo
  • Path finding advanced demo

Path finding Bender example
Path finding Bender result:

** Q-Learning structure **
State Bedroom
  Action up
     ActionResult State Bedroom, Prob. 1, Reward 0, PrevState Bedroom, QE 72.9
  Action right
     ActionResult State Hallway, Prob. 1, Reward 0, PrevState Bedroom, QE 81
  Action down
     ActionResult State Bedroom, Prob. 1, Reward 0, PrevState Bedroom, QE 72.9
  Action left
     ActionResult State Bedroom, Prob. 1, Reward 0, PrevState Bedroom, QE 72.9
State Hallway
  Action up
     ActionResult State Hallway, Prob. 1, Reward 0, PrevState Hallway, QE 81
  Action right
     ActionResult State Hallway, Prob. 1, Reward 0, PrevState Hallway, QE 81
  Action down
     ActionResult State Stairwell, Prob. 1, Reward 0, PrevState Hallway, QE 90
  Action left
     ActionResult State Bedroom, Prob. 1, Reward 0, PrevState Hallway, QE 72.9
State Stairwell
  Action up
     ActionResult State Hallway, Prob. 1, Reward 0, PrevState Stairwell, QE 81
  Action right
     ActionResult State Stairwell, Prob. 1, Reward 0, PrevState Stairwell, QE 90
  Action down
     ActionResult State Stairwell, Prob. 1, Reward 0, PrevState Stairwell, QE 90
  Action left
     ActionResult State Kitchen, Prob. 1, Reward 100, PrevState Stairwell, QE 100

** Show Policy **
From state Bedroom do action right, max QEstimated is 81
From state Hallway do action down, max QEstimated is 90
From state Stairwell do action left, max QEstimated is 100

Bender has learned the fastest way to get some beer. He knows which direction to take from any room.
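
The QE numbers above match the library defaults (Gamma = 0.9, a reward of 100 for reaching the beer): each extra step away from the Kitchen discounts the value by 0.9. A quick sanity check of the printed values, not part of the library:

class DiscountCheck
{
    static void Main()
    {
        const double gamma = 0.9;             // default Gamma in QLearning.cs
        double stairwell = gamma * 100;       // 90   (one step from the reward)
        double hallway   = gamma * stairwell; // 81
        double bedroom   = gamma * hallway;   // 72.9
        System.Console.WriteLine("{0} {1} {2}", stairwell, hallway, bedroom);
    }
}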

Path finding example
q-learn1

Path finding result:

** Q-Learning structure **
State A
  Action from_A_to_B
     ActionResult State B, Prob. 1, Reward 0, PrevState A, QE 90
  Action from_A_to_D
     ActionResult State D, Prob. 1, Reward 0, PrevState A, QE 72.9
State B
  Action from_B_to_A
     ActionResult State A, Prob. 1, Reward 0, PrevState B, QE 81
  Action from_B_to_C
     ActionResult State C, Prob. 1, Reward 100, PrevState B, QE 100
  Action from_B_to_E
     ActionResult State E, Prob. 1, Reward 0, PrevState B, QE 81
State C
  Action from_C_to_C
     ActionResult State C, Prob. 1, Reward 0, PrevState C, QE 0
State D
  Action from_D_to_A
     ActionResult State A, Prob. 1, Reward 0, PrevState D, QE 81
  Action from_D_to_E
     ActionResult State E, Prob. 1, Reward 0, PrevState D, QE 81
State E
  Action from_E_to_B
     ActionResult State B, Prob. 1, Reward 0, PrevState E, QE 90
  Action from_E_to_D
     ActionResult State D, Prob. 1, Reward 0, PrevState E, QE 72.9
  Action from_E_to_F
     ActionResult State F, Prob. 1, Reward 0, PrevState E, QE 90
State F
  Action from_F_to_C
     ActionResult State C, Prob. 1, Reward 100, PrevState F, QE 100
  Action from_F_to_E
     ActionResult State E, Prob. 1, Reward 0, PrevState F, QE 81

** Show Policy **
From state A do action from_A_to_B, max QEstimated is 90
From state B do action from_B_to_C, max QEstimated is 100
From state C do action from_C_to_C, max QEstimated is 0
From state D do action from_D_to_A, max QEstimated is 81
From state E do action from_E_to_B, max QEstimated is 90
From state F do action from_F_to_C, max QEstimated is 100

The results are exactly the same as in my previous post.

rock-paper-scissors

Rock paper scissors result:

** Q-Learning structure **
State Begin
  Action from_Begin_to_Rock
     ActionResult State Rock, Prob. 0.2, Reward 0, PrevState Begin, QE 0
     ActionResult State Paper, Prob. 0.5, Reward -10, PrevState Begin, QE -0.91
     ActionResult State Scissor, Prob. 0.3, Reward 100, PrevState Begin, QE 4.11

  Action from_Begin_to_Paper
     ActionResult State Rock, Prob. 0.2, Reward 100, PrevState Begin, QE 2.44
     ActionResult State Paper, Prob. 0.5, Reward 0, PrevState Begin, QE 0
     ActionResult State Scissor, Prob. 0.3, Reward -10, PrevState Begin, QE -0.41
  Action from_Begin_to_Scissor
     ActionResult State Rock, Prob. 0.2, Reward -10, PrevState Begin, QE -0.24
     ActionResult State Paper, Prob. 0.5, Reward 100, PrevState Begin, QE 9.09
     ActionResult State Scissor, Prob. 0.3, Reward 0, PrevState Begin, QE 0

** Show Policy **
From state Begin do action from_Begin_to_Scissor, max QEstimated is 9.09

** Opponent style **
style is rock 0.2 paper 0.5 scissor 0.3

Here the opponent’s style is to pick paper 50% of the time. The robot correctly learns to pick scissors as the action policy for maximizing the reward outcome.

Path finding advanced example
q-learn-adv

In the advanced version of path finding, there is a hill between state B and state C.
Whenever the robot tries to climb the hill from state B to C, there is a 10% chance of success and a 90% chance that it slides back to state A.
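
In the library setup this is expressed as two action results on the same B -> C action, one per outcome. This is the relevant excerpt from PathFindingAdvDemo.cs further below:

// B -> C: the climb succeeds with probability 0.1 (reward 100),
// otherwise the robot slides back to A with probability 0.9
fromTo.AddActionResult(new QActionResult(fromTo, StateNameEnum.C.EnumToString(), 0.1) { Reward = 100 });
fromTo.AddActionResult(new QActionResult(fromTo, StateNameEnum.A.EnumToString(), 0.9));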

Path finding advanced example result:

** Q-Learning structure **
State A
  Action from_A_to_B
     ActionResult State B, Prob. 1, Reward 0, PrevState A, QE 72.9
  Action from_A_to_D
     ActionResult State D, Prob. 1, Reward 0, PrevState A, QE 72.9
State B
  Action from_B_to_A
     ActionResult State A, Prob. 1, Reward 0, PrevState B, QE 65.61
  Action from_B_to_C
     ActionResult State C, Prob. 0.1, Reward 100, PrevState B, QE 1.1
     ActionResult State A, Prob. 0.9, Reward 0, PrevState B, QE 31.08
  Action from_B_to_E
     ActionResult State E, Prob. 1, Reward 0, PrevState B, QE 81
State C
  Action from_C_to_C
     ActionResult State C, Prob. 1, Reward 0, PrevState C, QE 0
State D
  Action from_D_to_A
     ActionResult State A, Prob. 1, Reward 0, PrevState D, QE 65.61
  Action from_D_to_E
     ActionResult State E, Prob. 1, Reward 0, PrevState D, QE 81
State E
  Action from_E_to_B
     ActionResult State B, Prob. 1, Reward 0, PrevState E, QE 72.9
  Action from_E_to_D
     ActionResult State D, Prob. 1, Reward 0, PrevState E, QE 72.9
  Action from_E_to_F
     ActionResult State F, Prob. 1, Reward 0, PrevState E, QE 90
State F
  Action from_F_to_C
     ActionResult State C, Prob. 1, Reward 100, PrevState F, QE 100
  Action from_F_to_E
     ActionResult State E, Prob. 1, Reward 0, PrevState F, QE 81

** Show Policy **
From state A do action from_A_to_B, max QEstimated is 72.9
From state B do action from_B_to_E, max QEstimated is 81
From state C do action from_C_to_C, max QEstimated is 0
From state D do action from_D_to_E, max QEstimated is 81
From state E do action from_E_to_F, max QEstimated is 90
From state F do action from_F_to_C, max QEstimated is 100

The robot has learned to take the action ‘go to state E’ instead of ‘go to state C’ when it is in state B.
A 10% chance of success is not good enough, so the robot wisely chooses a longer but more rewarding path.

The data structure is as follows:
The QLearning algorithm has multiple states; for every state there are multiple possible actions, and for each action there are multiple possible action outcomes.
q-learn

PathFindingBenderDemo.cs

using System;
using QLearningFramework;

namespace ConsoleQLearning
{
    class PathFindingBenderDemo
    {
        // ----------- Insert the state names here -----------
        internal enum StateNameEnum
        {
            Bedroom, Hallway, Stairwell, Kitchen
        }
        // ----------- End Insert the state names here -------

        static void Main(string[] args)
        {
            DateTime starttime = DateTime.Now;

            PathFinding();            

            double timespend = DateTime.Now.Subtract(starttime).TotalSeconds;
            Console.WriteLine("\n{0} sec. press a key ...", timespend.Pretty()); Console.ReadKey();
        }         

        static void PathFinding()
        {
            QLearning q = new QLearning();
            QAction fromTo;
            QState state;
            string stateName;
            string stateNameNext;

            // ----------- Begin Insert the path setup here -----------
            // insert the end states here, e.g. goal state
            q.EndStates.Add(StateNameEnum.Kitchen.EnumToString()); 

            // State Bedroom           
            stateName = StateNameEnum.Bedroom.EnumToString();
            q.AddState(state = new QState(stateName, q));
            // action up
            stateNameNext = StateNameEnum.Bedroom.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName("up")));
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0));
            // action right
            stateNameNext = StateNameEnum.Hallway.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName("right")));
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0));
            // action down
            stateNameNext = StateNameEnum.Bedroom.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName("down")));
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0));
            // action left
            stateNameNext = StateNameEnum.Bedroom.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName("left")));
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0));

            // State Hallway           
            stateName = StateNameEnum.Hallway.EnumToString();
            q.AddState(state = new QState(stateName, q));
            // action up
            stateNameNext = StateNameEnum.Hallway.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName("up")));
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0));
            // action right
            stateNameNext = StateNameEnum.Hallway.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName("right")));
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0));
            // action down
            stateNameNext = StateNameEnum.Stairwell.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName("down")));
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0));
            // action left
            stateNameNext = StateNameEnum.Bedroom.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName("left")));
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0));

            // State Stairwell           
            stateName = StateNameEnum.Stairwell.EnumToString();
            q.AddState(state = new QState(stateName, q));
            // action up
            stateNameNext = StateNameEnum.Hallway.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName("up")));
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0));
            // action right
            stateNameNext = StateNameEnum.Stairwell.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName("right")));
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0));
            // action down
            stateNameNext = StateNameEnum.Stairwell.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName("down")));
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0));
            // action left
            stateNameNext = StateNameEnum.Kitchen.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName("left")));
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0){Reward = 100});            
            // ----------- End Insert the path setup here -----------

            q.RunTraining();
            q.PrintQLearningStructure();
            q.ShowPolicy();
        }
    }
}

PathFindingDemo.cs

using System;
using QLearningFramework;

namespace ConsoleQLearning
{
    class PathFindingDemo
    {
        // ----------- Insert the state names here -----------
        internal enum StateNameEnum
        {
            A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, 
            V, W, X, Y, Z
        }
        // ----------- End Insert the state names here -------

        static void Main(string[] args)
        {
            DateTime starttime = DateTime.Now;

            PathFinding();            

            double timespend = DateTime.Now.Subtract(starttime).TotalSeconds;
            Console.WriteLine("\n{0} sec. press a key ...", timespend.Pretty()); Console.ReadKey();
        }         

        static void PathFinding()
        {
            QLearning q = new QLearning { Episodes = 1000, Alpha = 0.1, Gamma = 0.9, 
                MaxExploreStepsWithinOneEpisode = 1000 };

            QAction fromTo;
            QState state;
            string stateName;
            string stateNameNext;

            // ----------- Begin Insert the path setup here -----------
            // insert the end states here, e.g. goal state
            q.EndStates.Add(StateNameEnum.C.EnumToString()); 


            // State A           
            stateName = StateNameEnum.A.EnumToString();
            q.AddState(state = new QState(stateName, q));
            // action A -> B
            stateNameNext = StateNameEnum.B.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName(stateName, stateNameNext) ));
            // action outcome probability
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0));
            // action A -> D
            stateNameNext = StateNameEnum.D.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName(stateName, stateNameNext)));
            // action outcome probability
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0));

            // State B
            stateName = StateNameEnum.B.EnumToString();
            q.States.Add(state = new QState(stateName, q));
            // B -> A
            stateNameNext = StateNameEnum.A.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName(stateName, stateNameNext)));
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0));
            // B -> C
            stateNameNext = StateNameEnum.C.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName(stateName, stateNameNext)));
            // action outcome probability
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0) { Reward = 100 });
            // B -> E
            stateNameNext = StateNameEnum.E.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName(stateName, stateNameNext)));
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0));

            // State C
            stateName = StateNameEnum.C.EnumToString();
            q.States.Add(state = new QState(stateName, q));
            // C -> C
            stateNameNext = stateName;
            state.AddAction(fromTo = new QAction(stateName, new QActionName(stateName, stateNameNext)));
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0));

            // State D
            stateName = StateNameEnum.D.EnumToString();
            q.States.Add(state = new QState(stateName, q));
            // D -> A
            stateNameNext = StateNameEnum.A.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName(stateName, stateNameNext)));
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0));
            // D -> E
            stateNameNext = StateNameEnum.E.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName(stateName, stateNameNext)));
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0));

            // State E
            stateName = StateNameEnum.E.EnumToString();
            q.States.Add(state = new QState(stateName, q));
            // E -> B
            stateNameNext = StateNameEnum.B.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName(stateName, stateNameNext)));
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0));
            // E -> D
            stateNameNext = StateNameEnum.D.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName(stateName, stateNameNext)));
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0));
            // E -> F
            stateNameNext = StateNameEnum.F.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName(stateName, stateNameNext)));
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0));

            // State F
            stateName = StateNameEnum.F.EnumToString();
            q.States.Add(state = new QState(stateName, q));
            // F -> C
            stateNameNext = StateNameEnum.C.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName(stateName, stateNameNext)));
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0) { Reward = 100 });
            // F -> E
            stateNameNext = StateNameEnum.E.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName(stateName, stateNameNext)));
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0));

            // ----------- End Insert the path setup here -----------

            q.RunTraining();
            q.PrintQLearningStructure();
            q.ShowPolicy();
        }
    }
}

PathFindingAdvDemo.cs

using System;
using QLearningFramework;

namespace ConsoleQLearning
{
    class PathFindingAdvDemo
    {
        // ----------- Insert the state names here -----------
        internal enum StateNameEnum
        {
            A, B, C, D, E, F
        }
        // ----------- End Insert the state names here -------

        static void Main(string[] args)
        {
            DateTime starttime = DateTime.Now;

            PathFinding();            

            double timespend = DateTime.Now.Subtract(starttime).TotalSeconds;
            Console.WriteLine("\n{0} sec. press a key ...", timespend.Pretty()); Console.ReadKey();
        }         

        static void PathFinding()
        {
            QLearning q = new QLearning { Episodes = 1000, Alpha = 0.1, Gamma = 0.9, 
                MaxExploreStepsWithinOneEpisode = 1000 };

            QAction fromTo;
            QState state;
            string stateName;
            string stateNameNext;

            // ----------- Begin Insert the path setup here -----------
            // insert the end states here, e.g. goal state
            q.EndStates.Add(StateNameEnum.C.EnumToString()); 


            // State A           
            stateName = StateNameEnum.A.EnumToString();
            q.AddState(state = new QState(stateName, q));
            // action A -> B
            stateNameNext = StateNameEnum.B.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName(stateName, stateNameNext) ));
            // action outcome probability
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0));
            // action A -> D
            stateNameNext = StateNameEnum.D.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName(stateName, stateNameNext)));
            // action outcome probability
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0));

            // State B
            stateName = StateNameEnum.B.EnumToString();
            q.States.Add(state = new QState(stateName, q));
            // B -> A
            stateNameNext = StateNameEnum.A.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName(stateName, stateNameNext)));
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0));
            // B -> C
            stateNameNext = StateNameEnum.C.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName(stateName, stateNameNext)));
            // action outcome probability
            fromTo.AddActionResult(new QActionResult(fromTo, StateNameEnum.C.EnumToString(), 0.1) { Reward = 100 });
            fromTo.AddActionResult(new QActionResult(fromTo, StateNameEnum.A.EnumToString(), 0.9));
            // B -> E
            stateNameNext = StateNameEnum.E.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName(stateName, stateNameNext)));
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0));

            // State C
            stateName = StateNameEnum.C.EnumToString();
            q.States.Add(state = new QState(stateName, q));
            // C -> C
            stateNameNext = stateName;
            state.AddAction(fromTo = new QAction(stateName, new QActionName(stateName, stateNameNext)));
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0));

            // State D
            stateName = StateNameEnum.D.EnumToString();
            q.States.Add(state = new QState(stateName, q));
            // D -> A
            stateNameNext = StateNameEnum.A.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName(stateName, stateNameNext)));
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0));
            // D -> E
            stateNameNext = StateNameEnum.E.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName(stateName, stateNameNext)));
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0));

            // State E
            stateName = StateNameEnum.E.EnumToString();
            q.States.Add(state = new QState(stateName, q));
            // E -> B
            stateNameNext = StateNameEnum.B.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName(stateName, stateNameNext)));
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0));
            // E -> D
            stateNameNext = StateNameEnum.D.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName(stateName, stateNameNext)));
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0));
            // E -> F
            stateNameNext = StateNameEnum.F.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName(stateName, stateNameNext)));
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0));

            // State F
            stateName = StateNameEnum.F.EnumToString();
            q.States.Add(state = new QState(stateName, q));
            // F -> C
            stateNameNext = StateNameEnum.C.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName(stateName, stateNameNext)));
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0) { Reward = 100 });
            // F -> E
            stateNameNext = StateNameEnum.E.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName(stateName, stateNameNext)));
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0));

            // ----------- End Insert the path setup here -----------

            q.RunTraining();
            q.PrintQLearningStructure();
            q.ShowPolicy();
        }
    }
}

RockPaperScissorsDemo.cs

using System;
using System.Collections.Generic;
using QLearningFramework;

namespace ConsoleQLearning
{
    class RockPaperScissorsDemo
    {
        // ----------- Insert the state names here -----------
        internal enum StateNameEnum
        {
            Begin, Rock, Paper, Scissor
        }
        // ----------- End Insert the state names here -------

        static void Main(string[] args)
        {
            DateTime starttime = DateTime.Now;
            
            RockPaperScissors();

            double timespend = DateTime.Now.Subtract(starttime).TotalSeconds;
            Console.WriteLine("\n{0} sec. press a key ...", timespend.Pretty()); Console.ReadKey();
        }

         static void RockPaperScissors()
         {
             QLearning q = new QLearning { Episodes = 1000, Alpha = 0.1, Gamma = 0.9, 
                 MaxExploreStepsWithinOneEpisode = 1000 };            

             var opponentStyles = new List<double[]>();
             // rock paper scissor probability styles
             opponentStyles.Add(new []{ 0.33,0.33,0.33} );
             opponentStyles.Add(new [] { 0.5, 0.3, 0.2 });
             opponentStyles.Add(new [] { 0.2, 0.5, 0.3 });
             opponentStyles.Add(new[] { 0.1, 0.1, 0.8 });
             int index = new Random().Next(opponentStyles.Count);
             var opponent = opponentStyles[index];

             // opponent probability pick 
             double rockOpponent = opponent[0];
             double paperOpponent = opponent[1];
             double scissorOpponent = opponent[2];

             QAction fromTo;
             QState state;
             string stateName;
             string stateNameNext;

             // ----------- Begin Insert the path setup here -----------

             // insert the end states here
             q.EndStates.Add(StateNameEnum.Rock.EnumToString());
             q.EndStates.Add(StateNameEnum.Paper.EnumToString());
             q.EndStates.Add(StateNameEnum.Scissor.EnumToString());

             // State Begin
             stateName = StateNameEnum.Begin.EnumToString();
             q.AddState(state = new QState(stateName, q));             
             
             // action Rock
             stateNameNext = StateNameEnum.Rock.EnumToString();
             state.AddAction(fromTo = new QAction(stateName, new QActionName(stateName, stateNameNext)));             
             // action outcome probability
             fromTo.AddActionResult(new QActionResult(fromTo, StateNameEnum.Rock.EnumToString(), 
                 rockOpponent) { Reward = 0 });
             fromTo.AddActionResult(new QActionResult(fromTo, StateNameEnum.Paper.EnumToString(), 
                 paperOpponent) { Reward = -10 });
             fromTo.AddActionResult(new QActionResult(fromTo, StateNameEnum.Scissor.EnumToString(), 
                 scissorOpponent) { Reward = 100 });             
             
             // action paper
             stateNameNext = StateNameEnum.Paper.EnumToString();
             state.AddAction(fromTo = new QAction(stateName, new QActionName(stateName, stateNameNext)));
             // action outcome probability
             fromTo.AddActionResult(new QActionResult(fromTo, StateNameEnum.Rock.EnumToString(), 
                 rockOpponent) { Reward = 100 });
             fromTo.AddActionResult(new QActionResult(fromTo, StateNameEnum.Paper.EnumToString(), 
                 paperOpponent) { Reward = 0 });
             fromTo.AddActionResult(new QActionResult(fromTo, StateNameEnum.Scissor.EnumToString(), 
                 scissorOpponent) { Reward = -10 });

             // action scissor
             stateNameNext = StateNameEnum.Scissor.EnumToString();
             state.AddAction(fromTo = new QAction(stateName, new QActionName(stateName, stateNameNext)));
             // action outcome probability
             fromTo.AddActionResult(new QActionResult(fromTo, StateNameEnum.Rock.EnumToString(), 
                 rockOpponent) { Reward = -10 });
             fromTo.AddActionResult(new QActionResult(fromTo, StateNameEnum.Paper.EnumToString(), 
                 paperOpponent) { Reward = 100 });
             fromTo.AddActionResult(new QActionResult(fromTo, StateNameEnum.Scissor.EnumToString(), 
                 scissorOpponent) { Reward = 0 });
             // ----------- End Insert the path setup here -----------

             q.RunTraining();
             q.PrintQLearningStructure();
             q.ShowPolicy();

             Console.WriteLine("\n** Opponent style **");
             Console.WriteLine(string.Format("style is rock {0} paper {1} scissor {2}", 
                 opponent[0].Pretty(), opponent[1].Pretty(), opponent[2].Pretty()));
         }        
    }
}

QLearning.cs

using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text;

namespace QLearningFramework
{  
    /// <summary>
    /// Version 0.1
    /// Author: Kunuk Nykjaer
    /// one to many relationships data structure
    /// QLearning [1 -- *] State [1 -- *] Action [1 -- *] ActionResult
    /// </summary>
    class QLearning
    {
        public List<QState> States { get; private set; }
        public Dictionary<string, QState> StateLookup { get; private set; }

        public double Alpha { get; internal set; }
        public double Gamma { get; internal set; }

        public HashSet<string> EndStates { get; private set; }
        public int MaxExploreStepsWithinOneEpisode { get; internal set; } //avoid infinite loop
        public bool ShowWarning { get; internal set; } // show runtime warnings regarding q-learning
        public int Episodes { get; internal set; }

        public QLearning()
        {
            States = new List<QState>();
            StateLookup = new Dictionary<string, QState>();
            EndStates = new HashSet<string>();

            // Default when not set
            MaxExploreStepsWithinOneEpisode = 1000;
            Episodes = 1000;
            Alpha = 0.1;
            Gamma = 0.9;
            ShowWarning = true;
        }

        public void AddState(QState state)
        {
            States.Add(state);
        }

        public void RunTraining()
        {
            QMethod.Validate(this);

            /*		 
		    For each episode: Select random initial state 
			Do while not reach goal state
				Select one among all possible actions for the current state 
				Using this possible action, consider to go to the next state 
				Get maximum Q value of this next state based on all possible actions 				
                Set the next state as the current state
		    */

            // For each episode
            var rand = new Random();
            long maxloopEventCount = 0;

            // Train episodes
            for (long i = 0; i < Episodes; i++)
            {
                long maxloop = 0;
                // Select random initial state			
                int stateIndex = rand.Next(States.Count);
                QState state = States[stateIndex];
                QAction action = null;
                do
                {
                    if (++maxloop > MaxExploreStepsWithinOneEpisode)
                    {
                        if(ShowWarning)
                        {
                            string msg = string.Format(
                            "{0} !! MAXLOOP state: {1} action: {2}, {3} endstate is to difficult to reach?",
                            ++maxloopEventCount, state, action, "maybe your path setup is wrong or the ");
                            QMethod.Log(msg);
                        }
                        
                        break;
                    }

                    // no actions, skip this state
                    if(state.Actions.Count==0)
                        break;

                    // Selection strategy is random based on probability
                    int index = rand.Next(state.Actions.Count);
                    action = state.Actions[index];

                    // Using this possible action, consider to go to the next state
                    // Pick random Action outcome
                    QActionResult nextStateResult = action.PickActionByProbability();
                    string nextStateName = nextStateResult.StateName;

                    double q = nextStateResult.QEstimated;
                    double r = nextStateResult.Reward;
                    double maxQ = MaxQ(nextStateName);

                    // Q(s,a)= Q(s,a) + alpha * (R(s,a) + gamma * Max(next state, all actions) - Q(s,a))
                    double value = q + Alpha * (r + Gamma * maxQ - q); // q-learning                  
                    nextStateResult.QValue = value; // update

                    // is end state go to next episode
                    if (EndStates.Contains(nextStateResult.StateName))
                        break;

                    // Set the next state as the current state                    
                    state = StateLookup[nextStateResult.StateName];

                } while (true);
            }
        }


        double MaxQ(string stateName)
        {
            const double defaultValue = 0;

            if(!StateLookup.ContainsKey(stateName))            
                return defaultValue;                            

            QState state = StateLookup[stateName];
            var actionsFromState = state.Actions;
            double? maxValue = null;
            foreach (var nextState in actionsFromState)
            {
                foreach (var actionResult in nextState.ActionsResult)
                {
                    double value = actionResult.QEstimated;
                    if (value > maxValue || !maxValue.HasValue)
                        maxValue = value;
                }
            }

            // no update
            if (!maxValue.HasValue && ShowWarning) 
                QMethod.Log(string.Format("Warning: No MaxQ value for stateName {0}",
                    stateName) );

            return maxValue.HasValue ? maxValue.Value : defaultValue;
        }

        public void PrintQLearningStructure()
        {
            Console.WriteLine("** Q-Learning structure **");
            foreach (QState state in States)
            {
                Console.WriteLine("State {0}", state.StateName);
                foreach (QAction action in state.Actions)
                {
                    Console.WriteLine("  Action " + action.ActionName);
                    Console.Write(action.GetActionResults());
                }
            }
            Console.WriteLine();
        }

        public void ShowPolicy()
        {
            Console.WriteLine("** Show Policy **");
            foreach (QState state in States)
            {
                double max = Double.MinValue;
                string actionName = "nothing";
                foreach (QAction action in state.Actions)
                {
                    foreach (QActionResult actionResult in action.ActionsResult)
                    {
                        if (actionResult.QEstimated > max)
                        {
                            max = actionResult.QEstimated;
                            actionName = action.ActionName.ToString();
                        }
                    }
                }

                Console.WriteLine(string.Format("From state {0} do action {1}, max QEstimated is {2}",
                    state.StateName, actionName, max.Pretty()));
            }
        }
    }

    class QState
    {        
        public string StateName { get; private set; }
        public List<QAction> Actions { get; private set; }

        public void AddAction(QAction action)
        {
            Actions.Add(action);
        }

        public QState(string stateName, QLearning q)
        {            
            q.StateLookup.Add(stateName, this);
            StateName = stateName;
            Actions = new List<QAction>();
        }

        public override string ToString()
        {
            return string.Format("StateName {0}", StateName);
        }
    }

    class QAction
    {
        private static readonly Random Rand = new Random();
        public QActionName ActionName { get; internal set; }
        public string CurrentState { get; private set; }
        public List<QActionResult> ActionsResult { get; private set; }

        public void AddActionResult(QActionResult actionResult)
        {
            ActionsResult.Add(actionResult);
        }

        public string GetActionResults()
        {
            var sb = new StringBuilder();
            foreach (QActionResult actionResult in ActionsResult)
                sb.AppendLine("     ActionResult " + actionResult);

            return sb.ToString();
        }

        public QAction(string currentState, QActionName actionName = null)
        {
            CurrentState = currentState;
            ActionsResult = new List<QActionResult>();
            ActionName = actionName;
        }

        // The sum of action outcomes must be close to 1
        public void ValidateActionsResultProbability()
        {
            const double epsilon = 0.1;

            if (ActionsResult.Count == 0)
                throw new ApplicationException(string.Format(
                    "ValidateActionsResultProbability is invalid, no action results:\n {0}", 
                    this));

            double sum = ActionsResult.Sum(a => a.Probability);
            if (Math.Abs(1 - sum) > epsilon)
                throw new ApplicationException(string.Format(
                    "ValidateActionsResultProbability is invalid:\n {0}", this));
        }

        public QActionResult PickActionByProbability()
        {
            double d = Rand.NextDouble();
            double sum = 0;
            foreach (QActionResult actionResult in ActionsResult)
            {
                sum += actionResult.Probability;
                if (d <= sum)
                    return actionResult;
            }
            
            // we might get here if sum probability is below 1.0 e.g. 0.99 
            // and the d random value is 0.999
            if (ActionsResult.Count > 0) 
                return ActionsResult.Last();

            throw new ApplicationException(string.Format("No PickAction result: {0}", this));
        }

        public override string ToString()
        {
            double sum = ActionsResult.Sum(a => a.Probability);
            return string.Format("ActionName {0} probability sum: {1} actionResultCount {2}",
                ActionName, sum, ActionsResult.Count);
        }
    }

    class QActionResult
    {
        public string StateName { get; internal set; }
        public string PrevStateName { get; internal set; }
        public double QValue { get; internal set; } // Q value is stored here        
        public double Probability { get; internal set; }
        public double Reward { get; internal set; }

        public double QEstimated
        {
            get { return QValue * Probability; }
        }

        public QActionResult(QAction action, string stateNameNext = null, 
            double probability = 1, double reward = 0)
        {
            PrevStateName = action.CurrentState;
            StateName = stateNameNext;
            Probability = probability;
            Reward = reward;
        }

        public override string ToString()
        {
            return string.Format("State {0}, Prob. {1}, Reward {2}, PrevState {3}, QE {4}",
                StateName, Probability.Pretty(), Reward, PrevStateName, QEstimated.Pretty());
        }
    }

    public class QActionName
    {
        public string From { get; private set; }
        public string To { get; private set; }

        public QActionName(string from, string to = null)
        {
            From = from;
            To = to;
        }

        public override string ToString()
        {
            return GetActionName();
        }

        public string GetActionName()
        {
            if (To == null) 
                return From;
            return QMethod.ActionNameFromTo(From,To);
        }
    }

    static class QMethod
    {
        public static void Log(string s)
        {
            Console.WriteLine(s);
        }

        public static readonly CultureInfo CultureEnUs = new CultureInfo("en-US");

        public static string ToStringEnUs(this double d)
        {
            return d.ToString("G", CultureEnUs);
        }
        public static string Pretty(this double d)
        {
            return ToStringEnUs(Math.Round(d, 2));
        }

        public static string ActionNameFromTo(string a, string b)
        {
            return string.Format("from_{0}_to_{1}", a, b);
        }

        public static string EnumToString<T>(this T type)
        {
            return Enum.GetName(typeof(T), type);
        }

        public static void ValidateRange(double d, string origin = null)
        {
            if (d < 0 || d > 1)
            {
                string s = origin ?? string.Empty;
                throw new ApplicationException(string.Format("ValidateRange error: {0} {1}", d, s));
            }
        }

        public static void Validate(QLearning q)
        {
            foreach (var state in q.States)
            {
                foreach (var action in state.Actions)
                {
                    action.ValidateActionsResultProbability();
                }
            }
        }
    }
}

Currently you have to know all the probability values in advance for the action outcomes, which are used for the Q-learning structure setup. The source code can easily be adapted if you need runtime knowledge of probability outcomes. Just add a probability function lookup and look at the Q-learning formula calculation area. Maybe I will look more into that at a later time.

Written by kunuk Nykjaer

January 14, 2012 at 7:53 pm

Google Maps Server-side Clustering with Asp.net C#

  • This project is available at Github
  • Original project location is available at Google Code

For a project at work I made server-side clustering of data to be displayed in Google Maps.

The clustering is grid-based, using geo-stationary grid placement.
I was impressed by the performance of the solution from Maptimize.
They have this example website: Crunch Panorama.

I wanted to make something similar and have made an example project of server-side clustering. The demo can be seen here Google Maps Clustering. The grid has been included in the view for demonstration. Clusters that are close to each other are merged.

The search option is included, based on example code from the Cibul Tech Blog.


googlemapclustering

Google Maps Clustering Viewport:

Here is an overview of how the viewport works.
googlemaps-clustering-viewport

Client-server Ajax communication demonstration:

This Google Maps clustering project uses two webservice methods: GetMarkers and GetMarkerDetail.
The GetMarkers method returns the marker and cluster information, and the GetMarkerDetail method returns specific information for a selected marker.

googlemapclustering SD

GetMarkers method:
The client sends the boundary of the screen as north-east and south-west latlon points plus the zoom level.
Client json get markers request:

 
{"nelat":"62.511244050310154","nelon":"30.008460312500006",
  "swlat":"47.574956142655","swlon":"-5.235680312499994","zoomlevel":"5"}

The server does the server-side clustering based on the information from the client and returns only the data that should be displayed to the client. Here there are two markers with latlon information. When the count (the C) is larger than 1 it is a cluster marker, otherwise it is a details marker. The details marker has the I and T information set; the I is the id and the T is the type information.
Server json show markers reply:

 
{"Points":[
  {"Y":"56.05777","X":"9.30294","C":1624,"I":"","T":""},
  {"Y":"55.09446","X":"15.11401","C":"1","I":"aa005c3b-9301-4040-bac3-3dee8cb29b48","T":"2"}
]}

The client examines the reply, removes markers that should no longer be visible and only adds new markers to the screen. The latlon and count information is used to build a key for identifying unique marker objects.

GetMarkerDetail method:
The client sends a details request for a specific marker using an id-information.
Client json get marker detail request:

 
  {"id":"aa005c3b-9301-4040-bac3-3dee8cb29b48"}

The server replies by giving content for the specific marker.
Server json show marker detail reply:

 
  {"Content":"CONTENT INFORMATION HERE"}

Math information

The centroid calculation is based on the circular mean. This is because longitude wraps around at -180 and 180 degrees. There is no such overlap for latitude with Google Maps (at least I haven’t seen or noticed it).
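
A minimal sketch of a circular mean for longitudes, as my own illustration of the idea (this is not the project’s actual implementation):

using System;
using System.Linq;

static class CircularMeanSketch
{
    // Average the unit vectors and convert back with Atan2,
    // so the -180/180 wrap-around is handled correctly.
    public static double MeanLongitude(params double[] longitudesDeg)
    {
        double sumSin = longitudesDeg.Sum(l => Math.Sin(l * Math.PI / 180.0));
        double sumCos = longitudesDeg.Sum(l => Math.Cos(l * Math.PI / 180.0));
        return Math.Atan2(sumSin, sumCos) * 180.0 / Math.PI;
    }

    static void Main()
    {
        // A naive average of 179 and -179 gives 0 (the wrong side of the globe);
        // the circular mean gives +/-180 as expected.
        Console.WriteLine(MeanLongitude(179.0, -179.0));
    }
}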

The distances are calculated using the Euclidean distance. This calculation is fast and is good enough for grouping the points into clusters.

You usually need at most 6 decimals of precision for latlon values with Google Maps. This gives a precision within about 1 meter.

The grid size is fixed and adapts to the zoom level.
One zoom step in Google Maps doubles or halves the latlon window range.
The geo-stationary grid-placement indexes are computed with division and modulus.
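
For illustration only (not the project’s code), the index calculation could look like the sketch below; the grid size of 0.5 degrees in the usage example is hypothetical and would in practice follow the zoom level:

using System;

static class GridPlacementSketch
{
    // The "divide" part: which grid cell the coordinate falls into.
    // Math.Floor keeps the index consistent for negative coordinates.
    public static int CellIndex(double coordinate, double gridSize)
    {
        return (int)Math.Floor(coordinate / gridSize);
    }

    // The "modulus" part: the position inside the cell, normalized to [0, gridSize).
    public static double OffsetInCell(double coordinate, double gridSize)
    {
        double r = coordinate % gridSize;
        return r < 0 ? r + gridSize : r;
    }

    static void Main()
    {
        Console.WriteLine(CellIndex(9.30294, 0.5));    // 18
        Console.WriteLine(OffsetInCell(9.30294, 0.5)); // approximately 0.30294
    }
}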

Time complexity is on average O(m*n) and space complexity is O(m*n),
where n is the total number of points and m is the number of grid cells returned to the client.
You are welcome to analyse it yourself and correct me. I use a hashset and a hashmap to minimize the time complexity. For a fast enough response time (below 1 sec) the number of markers should be at most around 300,000.
The worst-case time complexity is ~O(n^2), but that is extremely unlikely;
it happens if most centroids are merged with neighboring centroids.

The DB in this project is a simple file with a list of points which is loaded into memory. The algorithm uses linear search and can (probably) be improved with a spatial data structure.

Written by kunuk Nykjaer

November 5, 2011 at 6:46 pm

Simple debug framework example in Java

Just a simple, quick-and-dirty roll-your-own debug/log framework.
It lets you turn debug prints on/off and set the debugging level.
Implemented in Java.

D.java

import java.text.DecimalFormat;
import java.text.NumberFormat;

/**
 * @author Kunuk Nykjaer
 * DebugFramework
 */

public class D {
	private static boolean debug = true;
	public final static int DEBUG_LEVEL_FATAL = 7;
	public final static int DEBUG_LEVEL_ERROR = 6;
	public final static int DEBUG_LEVEL_WARNING = 5;
	public final static int DEBUG_LEVEL_INFO = 4;
	public final static int DEBUG_LEVEL_FINE = 3;
	public final static int DEBUG_LEVEL_FINER = 2;
	public final static int DEBUG_LEVEL_FINEST = 1;
	private static int debugLevel = DEBUG_LEVEL_INFO; // default
	private static long begin = System.currentTimeMillis();	
	private final static NumberFormat nf = new DecimalFormat("#.##");
	
	public static void setDebugLevel(int level)
	{
		debugLevel = level;
	}
	public static void setDebug(boolean b)
	{
		debug = b;
	}
	public static void resetTime()
	{
		begin = System.currentTimeMillis();
	}
	
	
	public static void debug(Object o)
	{
		debug(o, DEBUG_LEVEL_INFO); //default
	}
	public static void debug(Object o, int level)
	{
		
		if(debug && debugLevel <= level){
			double secs = (System.currentTimeMillis() - begin) / 1000.0;
			String time =  nf.format(secs) + " sec: ";
			System.out.println("debug "+time +o);
		}
	}
}

Test.java

public class Test {
		public static void main(String[] args) {

			D.setDebug(true);
			D.resetTime();

			String one = "one";
			String two = "two";
			String three = "three";	

			D.setDebugLevel(D.DEBUG_LEVEL_INFO);
			System.out.println("-- DEBUG_LEVEL_INFO");
			D.debug(one);	// implicit INFO level
			D.debug(two, D.DEBUG_LEVEL_WARNING);
			D.debug(three, D.DEBUG_LEVEL_ERROR);	
			
			D.setDebugLevel(D.DEBUG_LEVEL_WARNING);
			System.out.println("-- DEBUG_LEVEL_WARNING");
			D.debug(one);	// implicit INFO level
			D.debug(two, D.DEBUG_LEVEL_WARNING);
			D.debug(three, D.DEBUG_LEVEL_ERROR);

			D.setDebugLevel(D.DEBUG_LEVEL_ERROR);
			System.out.println("-- DEBUG_LEVEL_ERROR");
			D.debug(one);	// implicit INFO level
			D.debug(two, D.DEBUG_LEVEL_WARNING);
			D.debug(three, D.DEBUG_LEVEL_ERROR);	

			// turn off debug
			D.setDebug(false);
			System.out.println("-- DEBUG set to false");
			D.debug(one);	// implicit INFO level
			D.debug(two, D.DEBUG_LEVEL_WARNING);
			D.debug(three, D.DEBUG_LEVEL_ERROR);
		}		
}

Result:

-- DEBUG_LEVEL_INFO
debug 0 sec: one
debug 0 sec: two
debug 0 sec: three
-- DEBUG_LEVEL_WARNING
debug 0 sec: two
debug 0 sec: three
-- DEBUG_LEVEL_ERROR
debug 0 sec: three
-- DEBUG set to false

Written by kunuk Nykjaer

September 28, 2010 at 12:50 am

Posted in Framework, Java