Software Programming

Kunuk Nykjaer

Q-learning Framework example with C#

with one comment

This is a follow up of my previous post – Q-learning example with Java
http://kunuk.wordpress.com/2010/09/24/q-learning/

q-learn-bender
Imagine you had a robot, just like Bender here and you want it to learn to navigate correctly to the destination (the beer). Your possible moves are move up, right, down or left from any room. And you want some kind of automatic system that gives you the solution. This is what this framework can do for you. You configure the framework with possible states, possible actions and the action outcomes. E.g. when you move right from the bedroom the action result is Bender has moved to the Hallway and if you move left from the bedroom then Bender hits the wall and stays in the bedroom. You will need to set up a reward value when Bender reaches the beer e.g. 100 points. That’s it!
No logic programming from your part, just some configuration.

In my previous post I made a quick example in Java and this time I will use C#.
This version will support non-deterministic action outcomes and I will include examples of how to use this framework. The code examples are included at the bottom. You need QLearning.cs to run these examples.

  • Path finding Bender
  • Path finding demo (same example as in my previous post)
  • Rock paper scissors demo
  • Path finding advanced demo

Path finding Bender example
Path finding Bender result:

** Q-Learning structure **
State Bedroom
  Action up
     ActionResult State Bedroom, Prob. 1, Reward 0, PrevState Bedroom, QE 72.9
  Action right
     ActionResult State Hallway, Prob. 1, Reward 0, PrevState Bedroom, QE 81
  Action down
     ActionResult State Bedroom, Prob. 1, Reward 0, PrevState Bedroom, QE 72.9
  Action left
     ActionResult State Bedroom, Prob. 1, Reward 0, PrevState Bedroom, QE 72.9
State Hallway
  Action up
     ActionResult State Hallway, Prob. 1, Reward 0, PrevState Hallway, QE 81
  Action right
     ActionResult State Hallway, Prob. 1, Reward 0, PrevState Hallway, QE 81
  Action down
     ActionResult State Stairwell, Prob. 1, Reward 0, PrevState Hallway, QE 90
  Action left
     ActionResult State Bedroom, Prob. 1, Reward 0, PrevState Hallway, QE 72.9
State Stairwell
  Action up
     ActionResult State Hallway, Prob. 1, Reward 0, PrevState Stairwell, QE 81
  Action right
     ActionResult State Stairwell, Prob. 1, Reward 0, PrevState Stairwell, QE 90
  Action down
     ActionResult State Stairwell, Prob. 1, Reward 0, PrevState Stairwell, QE 90
  Action left
     ActionResult State Kitchen, Prob. 1, Reward 100, PrevState Stairwell, QE 100

** Show Policy **
From state Bedroom do action right, max QEstimated is 81
From state Hallway do action down, max QEstimated is 90
From state Stairwell do action left, max QEstimated is 100

Bender has learned the fastest way to get some beer. He knows which direction to take from any room.

Path finding example
q-learn1

Path finding result:

** Q-Learning structure **
State A
  Action from_A_to_B
     ActionResult State B, Prob. 1, Reward 0, PrevState A, QE 90
  Action from_A_to_D
     ActionResult State D, Prob. 1, Reward 0, PrevState A, QE 72.9
State B
  Action from_B_to_A
     ActionResult State A, Prob. 1, Reward 0, PrevState B, QE 81
  Action from_B_to_C
     ActionResult State C, Prob. 1, Reward 100, PrevState B, QE 100
  Action from_B_to_E
     ActionResult State E, Prob. 1, Reward 0, PrevState B, QE 81
State C
  Action from_C_to_C
     ActionResult State C, Prob. 1, Reward 0, PrevState C, QE 0
State D
  Action from_D_to_A
     ActionResult State A, Prob. 1, Reward 0, PrevState D, QE 81
  Action from_D_to_E
     ActionResult State E, Prob. 1, Reward 0, PrevState D, QE 81
State E
  Action from_E_to_B
     ActionResult State B, Prob. 1, Reward 0, PrevState E, QE 90
  Action from_E_to_D
     ActionResult State D, Prob. 1, Reward 0, PrevState E, QE 72.9
  Action from_E_to_F
     ActionResult State F, Prob. 1, Reward 0, PrevState E, QE 90
State F
  Action from_F_to_C
     ActionResult State C, Prob. 1, Reward 100, PrevState F, QE 100
  Action from_F_to_E
     ActionResult State E, Prob. 1, Reward 0, PrevState F, QE 81

** Show Policy **
From state A do action from_A_to_B, max QEstimated is 90
From state B do action from_B_to_C, max QEstimated is 100
From state C do action from_C_to_C, max QEstimated is 0
From state D do action from_D_to_A, max QEstimated is 81
From state E do action from_E_to_B, max QEstimated is 90
From state F do action from_F_to_C, max QEstimated is 100

The results are exactly the same as in my previous post.

rock-paper-scissors

Rock paper scissors result:

** Q-Learning structure **
State Begin
  Action from_Begin_to_Rock
     ActionResult State Rock, Prob. 0.2, Reward 0, PrevState Begin, QE 0
     ActionResult State Paper, Prob. 0.5, Reward -10, PrevState Begin, QE -0.91
     ActionResult State Scissor, Prob. 0.3, Reward 100, PrevState Begin, QE 4.11

  Action from_Begin_to_Paper
     ActionResult State Rock, Prob. 0.2, Reward 100, PrevState Begin, QE 2.44
     ActionResult State Paper, Prob. 0.5, Reward 0, PrevState Begin, QE 0
     ActionResult State Scissor, Prob. 0.3, Reward -10, PrevState Begin, QE -0.41
  Action from_Begin_to_Scissor
     ActionResult State Rock, Prob. 0.2, Reward -10, PrevState Begin, QE -0.24
     ActionResult State Paper, Prob. 0.5, Reward 100, PrevState Begin, QE 9.09
     ActionResult State Scissor, Prob. 0.3, Reward 0, PrevState Begin, QE 0

** Show Policy **
From state Begin do action from_Begin_to_Scissor, max QEstimated is 9.09

** Opponent style **
style is rock 0.2 paper 0.5 scissor 0.3

Here the opponents style is to pick paper 50% of the time. The robot learns correctly to pick scissors as the action policy for maximizing the reward outcome.

Path finding advanced example
q-learn-adv

In the advanced version of path finding, there is a hill from state B to C.
whenever the robot wants to go climb the hill from state B to C, there is a 10% of success and 90% of chance it will slide back to state A.

Path finding advanced example result:

** Q-Learning structure **
State A
  Action from_A_to_B
     ActionResult State B, Prob. 1, Reward 0, PrevState A, QE 72.9
  Action from_A_to_D
     ActionResult State D, Prob. 1, Reward 0, PrevState A, QE 72.9
State B
  Action from_B_to_A
     ActionResult State A, Prob. 1, Reward 0, PrevState B, QE 65.61
  Action from_B_to_C
     ActionResult State C, Prob. 0.1, Reward 100, PrevState B, QE 1.1
     ActionResult State A, Prob. 0.9, Reward 0, PrevState B, QE 31.08
  Action from_B_to_E
     ActionResult State E, Prob. 1, Reward 0, PrevState B, QE 81
State C
  Action from_C_to_C
     ActionResult State C, Prob. 1, Reward 0, PrevState C, QE 0
State D
  Action from_D_to_A
     ActionResult State A, Prob. 1, Reward 0, PrevState D, QE 65.61
  Action from_D_to_E
     ActionResult State E, Prob. 1, Reward 0, PrevState D, QE 81
State E
  Action from_E_to_B
     ActionResult State B, Prob. 1, Reward 0, PrevState E, QE 72.9
  Action from_E_to_D
     ActionResult State D, Prob. 1, Reward 0, PrevState E, QE 72.9
  Action from_E_to_F
     ActionResult State F, Prob. 1, Reward 0, PrevState E, QE 90
State F
  Action from_F_to_C
     ActionResult State C, Prob. 1, Reward 100, PrevState F, QE 100
  Action from_F_to_E
     ActionResult State E, Prob. 1, Reward 0, PrevState F, QE 81

** Show Policy **
From state A do action from_A_to_B, max QEstimated is 72.9
From state B do action from_B_to_E, max QEstimated is 81
From state C do action from_C_to_C, max QEstimated is 0
From state D do action from_D_to_E, max QEstimated is 81
From state E do action from_E_to_F, max QEstimated is 90
From state F do action from_F_to_C, max QEstimated is 100

The robot has learned to take the action ‘go to state E’ instead of ‘go to state C’ when it is in state B.
10% of success is not good enough and the robot wisely chooses a longer but more rewarding path.

The data structure is as follows:
The QLearning algorithm has multiple states, for every state there are multiple possible actions and for each action there are multiple possible action outcomes.
q-learn

PathFindingBenderDemo.cs

using System;
using QLearningFramework;

namespace ConsoleQLearning
{
    class PathFindingBenderDemo
    {
        // ----------- Insert the state names here -----------
        internal enum StateNameEnum
        {
            Bedroom, Hallway, Stairwell, Kitchen
        }
        // ----------- End Insert the state names here -------

        static void Main(string[] args)
        {
            DateTime starttime = DateTime.Now;

            PathFinding();            

            double timespend = DateTime.Now.Subtract(starttime).TotalSeconds;
            Console.WriteLine("\n{0} sec. press a key ...", timespend.Pretty()); Console.ReadKey();
        }         

        static void PathFinding()
        {
            QLearning q = new QLearning();
            QAction fromTo;
            QState state;
            string stateName;
            string stateNameNext;

            // ----------- Begin Insert the path setup here -----------
            // insert the end states here, e.g. goal state
            q.EndStates.Add(StateNameEnum.Kitchen.EnumToString()); 

            // State Bedroom           
            stateName = StateNameEnum.Bedroom.EnumToString();
            q.AddState(state = new QState(stateName, q));
            // action up
            stateNameNext = StateNameEnum.Bedroom.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName("up")));
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0));
            // action right
            stateNameNext = StateNameEnum.Hallway.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName("right")));
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0));
            // action down
            stateNameNext = StateNameEnum.Bedroom.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName("down")));
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0));
            // action left
            stateNameNext = StateNameEnum.Bedroom.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName("left")));
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0));

            // State Hallway           
            stateName = StateNameEnum.Hallway.EnumToString();
            q.AddState(state = new QState(stateName, q));
            // action up
            stateNameNext = StateNameEnum.Hallway.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName("up")));
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0));
            // action right
            stateNameNext = StateNameEnum.Hallway.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName("right")));
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0));
            // action down
            stateNameNext = StateNameEnum.Stairwell.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName("down")));
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0));
            // action left
            stateNameNext = StateNameEnum.Bedroom.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName("left")));
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0));

            // State Stairwell           
            stateName = StateNameEnum.Stairwell.EnumToString();
            q.AddState(state = new QState(stateName, q));
            // action up
            stateNameNext = StateNameEnum.Hallway.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName("up")));
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0));
            // action right
            stateNameNext = StateNameEnum.Stairwell.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName("right")));
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0));
            // action down
            stateNameNext = StateNameEnum.Stairwell.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName("down")));
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0));
            // action left
            stateNameNext = StateNameEnum.Kitchen.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName("left")));
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0){Reward = 100});            
            // ----------- End Insert the path setup here -----------

            q.RunTraining();
            q.PrintQLearningStructure();
            q.ShowPolicy();
        }
    }
}

PathFindingDemo.cs

using System;
using QLearningFramework;

namespace ConsoleQLearning
{
    class PathFindingDemo
    {
        // ----------- Insert the state names here -----------
        internal enum StateNameEnum
        {
            A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, 
            V, W, X, Y, Z
        }
        // ----------- End Insert the state names here -------

        static void Main(string[] args)
        {
            DateTime starttime = DateTime.Now;

            PathFinding();            

            double timespend = DateTime.Now.Subtract(starttime).TotalSeconds;
            Console.WriteLine("\n{0} sec. press a key ...", timespend.Pretty()); Console.ReadKey();
        }         

        static void PathFinding()
        {
            QLearning q = new QLearning { Episodes = 1000, Alpha = 0.1, Gamma = 0.9, 
                MaxExploreStepsWithinOneEpisode = 1000 };

            QAction fromTo;
            QState state;
            string stateName;
            string stateNameNext;

            // ----------- Begin Insert the path setup here -----------
            // insert the end states here, e.g. goal state
            q.EndStates.Add(StateNameEnum.C.EnumToString()); 


            // State A           
            stateName = StateNameEnum.A.EnumToString();
            q.AddState(state = new QState(stateName, q));
            // action A -> B
            stateNameNext = StateNameEnum.B.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName(stateName, stateNameNext) ));
            // action outcome probability
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0));
            // action A -> D
            stateNameNext = StateNameEnum.D.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName(stateName, stateNameNext)));
            // action outcome probability
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0));

            // State B
            stateName = StateNameEnum.B.EnumToString();
            q.States.Add(state = new QState(stateName, q));
            // B -> A
            stateNameNext = StateNameEnum.A.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName(stateName, stateNameNext)));
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0));
            // B -> C
            stateNameNext = StateNameEnum.C.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName(stateName, stateNameNext)));
            // action outcome probability
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0) { Reward = 100 });
            // B -> E
            stateNameNext = StateNameEnum.E.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName(stateName, stateNameNext)));
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0));

            // State C
            stateName = StateNameEnum.C.EnumToString();
            q.States.Add(state = new QState(stateName, q));
            // C -> C
            stateNameNext = stateName;
            state.AddAction(fromTo = new QAction(stateName, new QActionName(stateName, stateNameNext)));
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0));

            // State D
            stateName = StateNameEnum.D.EnumToString();
            q.States.Add(state = new QState(stateName, q));
            // D -> A
            stateNameNext = StateNameEnum.A.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName(stateName, stateNameNext)));
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0));
            // D -> E
            stateNameNext = StateNameEnum.E.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName(stateName, stateNameNext)));
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0));

            // State E
            stateName = StateNameEnum.E.EnumToString();
            q.States.Add(state = new QState(stateName, q));
            // E -> B
            stateNameNext = StateNameEnum.B.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName(stateName, stateNameNext)));
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0));
            // E -> D
            stateNameNext = StateNameEnum.D.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName(stateName, stateNameNext)));
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0));
            // E -> F
            stateNameNext = StateNameEnum.F.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName(stateName, stateNameNext)));
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0));

            // State F
            stateName = StateNameEnum.F.EnumToString();
            q.States.Add(state = new QState(stateName, q));
            // F -> C
            stateNameNext = StateNameEnum.C.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName(stateName, stateNameNext)));
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0) { Reward = 100 });
            // F -> E
            stateNameNext = StateNameEnum.E.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName(stateName, stateNameNext)));
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0));

            // ----------- End Insert the path setup here -----------

            q.RunTraining();
            q.PrintQLearningStructure();
            q.ShowPolicy();
        }
    }
}

PathFindingAdvDemo.cs

using System;
using QLearningFramework;

namespace ConsoleQLearning
{
    class PathFindingAdvDemo
    {
        // ----------- Insert the state names here -----------
        internal enum StateNameEnum
        {
            A, B, C, D, E, F
        }
        // ----------- End Insert the state names here -------

        static void Main(string[] args)
        {
            DateTime starttime = DateTime.Now;

            PathFinding();            

            double timespend = DateTime.Now.Subtract(starttime).TotalSeconds;
            Console.WriteLine("\n{0} sec. press a key ...", timespend.Pretty()); Console.ReadKey();
        }         

        static void PathFinding()
        {
            QLearning q = new QLearning { Episodes = 1000, Alpha = 0.1, Gamma = 0.9, 
                MaxExploreStepsWithinOneEpisode = 1000 };

            QAction fromTo;
            QState state;
            string stateName;
            string stateNameNext;

            // ----------- Begin Insert the path setup here -----------
            // insert the end states here, e.g. goal state
            q.EndStates.Add(StateNameEnum.C.EnumToString()); 


            // State A           
            stateName = StateNameEnum.A.EnumToString();
            q.AddState(state = new QState(stateName, q));
            // action A -> B
            stateNameNext = StateNameEnum.B.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName(stateName, stateNameNext) ));
            // action outcome probability
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0));
            // action A -> D
            stateNameNext = StateNameEnum.D.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName(stateName, stateNameNext)));
            // action outcome probability
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0));

            // State B
            stateName = StateNameEnum.B.EnumToString();
            q.States.Add(state = new QState(stateName, q));
            // B -> A
            stateNameNext = StateNameEnum.A.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName(stateName, stateNameNext)));
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0));
            // B -> C
            stateNameNext = StateNameEnum.C.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName(stateName, stateNameNext)));
            // action outcome probability
            fromTo.AddActionResult(new QActionResult(fromTo, StateNameEnum.C.EnumToString(), 0.1) { Reward = 100 });
            fromTo.AddActionResult(new QActionResult(fromTo, StateNameEnum.A.EnumToString(), 0.9));
            // B -> E
            stateNameNext = StateNameEnum.E.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName(stateName, stateNameNext)));
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0));

            // State C
            stateName = StateNameEnum.C.EnumToString();
            q.States.Add(state = new QState(stateName, q));
            // C -> C
            stateNameNext = stateName;
            state.AddAction(fromTo = new QAction(stateName, new QActionName(stateName, stateNameNext)));
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0));

            // State D
            stateName = StateNameEnum.D.EnumToString();
            q.States.Add(state = new QState(stateName, q));
            // D -> A
            stateNameNext = StateNameEnum.A.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName(stateName, stateNameNext)));
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0));
            // D -> E
            stateNameNext = StateNameEnum.E.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName(stateName, stateNameNext)));
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0));

            // State E
            stateName = StateNameEnum.E.EnumToString();
            q.States.Add(state = new QState(stateName, q));
            // E -> B
            stateNameNext = StateNameEnum.B.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName(stateName, stateNameNext)));
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0));
            // E -> D
            stateNameNext = StateNameEnum.D.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName(stateName, stateNameNext)));
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0));
            // E -> F
            stateNameNext = StateNameEnum.F.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName(stateName, stateNameNext)));
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0));

            // State F
            stateName = StateNameEnum.F.EnumToString();
            q.States.Add(state = new QState(stateName, q));
            // F -> C
            stateNameNext = StateNameEnum.C.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName(stateName, stateNameNext)));
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0) { Reward = 100 });
            // F -> E
            stateNameNext = StateNameEnum.E.EnumToString();
            state.AddAction(fromTo = new QAction(stateName, new QActionName(stateName, stateNameNext)));
            fromTo.AddActionResult(new QActionResult(fromTo, stateNameNext, 1.0));

            // ----------- End Insert the path setup here -----------

            q.RunTraining();
            q.PrintQLearningStructure();
            q.ShowPolicy();
        }
    }
}

RockPaperScissorsDemo.cs

using System;
using System.Collections.Generic;
using QLearningFramework;

namespace ConsoleQLearning
{
    class RockPaperScissorsDemo
    {
        // ----------- Insert the state names here -----------
        internal enum StateNameEnum
        {
            Begin, Rock, Paper, Scissor
        }
        // ----------- End Insert the state names here -------

        static void Main(string[] args)
        {
            DateTime starttime = DateTime.Now;
            
            RockPaperScissors();

            double timespend = DateTime.Now.Subtract(starttime).TotalSeconds;
            Console.WriteLine("\n{0} sec. press a key ...", timespend.Pretty()); Console.ReadKey();
        }

         static void RockPaperScissors()
         {
             QLearning q = new QLearning { Episodes = 1000, Alpha = 0.1, Gamma = 0.9, 
                 MaxExploreStepsWithinOneEpisode = 1000 };            

             var opponentStyles = new List<double[]>();
             // rock paper scissor probability styles
             opponentStyles.Add(new []{ 0.33,0.33,0.33} );
             opponentStyles.Add(new [] { 0.5, 0.3, 0.2 });
             opponentStyles.Add(new [] { 0.2, 0.5, 0.3 });
             opponentStyles.Add(new[] { 0.1, 0.1, 0.8 });
             int index = new Random().Next(opponentStyles.Count);
             var opponent = opponentStyles[index];

             // opponent probability pick 
             double rockOpponent = opponent[0];
             double paperOpponent = opponent[1];
             double scissorOpponent = opponent[2];

             QAction fromTo;
             QState state;
             string stateName;
             string stateNameNext;

             // ----------- Begin Insert the path setup here -----------

             // insert the end states here
             q.EndStates.Add(StateNameEnum.Rock.EnumToString());
             q.EndStates.Add(StateNameEnum.Paper.EnumToString());
             q.EndStates.Add(StateNameEnum.Scissor.EnumToString());

             // State Begin
             stateName = StateNameEnum.Begin.EnumToString();
             q.AddState(state = new QState(stateName, q));             
             
             // action Rock
             stateNameNext = StateNameEnum.Rock.EnumToString();
             state.AddAction(fromTo = new QAction(stateName, new QActionName(stateName, stateNameNext)));             
             // action outcome probability
             fromTo.AddActionResult(new QActionResult(fromTo, StateNameEnum.Rock.EnumToString(), 
                 rockOpponent) { Reward = 0 });
             fromTo.AddActionResult(new QActionResult(fromTo, StateNameEnum.Paper.EnumToString(), 
                 paperOpponent) { Reward = -10 });
             fromTo.AddActionResult(new QActionResult(fromTo, StateNameEnum.Scissor.EnumToString(), 
                 scissorOpponent) { Reward = 100 });             
             
             // action paper
             stateNameNext = StateNameEnum.Paper.EnumToString();
             state.AddAction(fromTo = new QAction(stateName, new QActionName(stateName, stateNameNext)));
             // action outcome probability
             fromTo.AddActionResult(new QActionResult(fromTo, StateNameEnum.Rock.EnumToString(), 
                 rockOpponent) { Reward = 100 });
             fromTo.AddActionResult(new QActionResult(fromTo, StateNameEnum.Paper.EnumToString(), 
                 paperOpponent) { Reward = 0 });
             fromTo.AddActionResult(new QActionResult(fromTo, StateNameEnum.Scissor.EnumToString(), 
                 scissorOpponent) { Reward = -10 });

             // action scissor
             stateNameNext = StateNameEnum.Scissor.EnumToString();
             state.AddAction(fromTo = new QAction(stateName, new QActionName(stateName, stateNameNext)));
             // action outcome probability
             fromTo.AddActionResult(new QActionResult(fromTo, StateNameEnum.Rock.EnumToString(), 
                 rockOpponent) { Reward = -10 });
             fromTo.AddActionResult(new QActionResult(fromTo, StateNameEnum.Paper.EnumToString(), 
                 paperOpponent) { Reward = 100 });
             fromTo.AddActionResult(new QActionResult(fromTo, StateNameEnum.Scissor.EnumToString(), 
                 scissorOpponent) { Reward = 0 });
             // ----------- End Insert the path setup here -----------

             q.RunTraining();
             q.PrintQLearningStructure();
             q.ShowPolicy();

             Console.WriteLine("\n** Opponent style **");
             Console.WriteLine(string.Format("style is rock {0} paper {1} scissor {2}", 
                 opponent[0].Pretty(), opponent[1].Pretty(), opponent[2].Pretty()));
         }        
    }
}

QLearning.cs

using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text;

namespace QLearningFramework
{  
    /// <summary>
    /// Version 0.1
    /// Author: Kunuk Nykjaer
    /// one to many relationships data structure
    /// QLearning [1 -- *] State [1 -- *] Action [1 -- *] ActionResult
    /// </summary>
    class QLearning
    {
        public List<QState> States { get; private set; }
        public Dictionary<string, QState> StateLookup { get; private set; }

        public double Alpha { get; internal set; }
        public double Gamma { get; internal set; }

        public HashSet<string> EndStates { get; private set; }
        public int MaxExploreStepsWithinOneEpisode { get; internal set; } //avoid infinite loop
        public bool ShowWarning { get; internal set; } // show runtime warnings regarding q-learning
        public int Episodes { get; internal set; }

        public QLearning()
        {
            States = new List<QState>();
            StateLookup = new Dictionary<string, QState>();
            EndStates = new HashSet<string>();

            // Default when not set
            MaxExploreStepsWithinOneEpisode = 1000;
            Episodes = 1000;
            Alpha = 0.1;
            Gamma = 0.9;
            ShowWarning = true;
        }

        public void AddState(QState state)
        {
            States.Add(state);
        }

        public void RunTraining()
        {
            QMethod.Validate(this);

            /*		 
		    For each episode: Select random initial state 
			Do while not reach goal state
				Select one among all possible actions for the current state 
				Using this possible action, consider to go to the next state 
				Get maximum Q value of this next state based on all possible actions 				
                Set the next state as the current state
		    */

            // For each episode
            var rand = new Random();
            long maxloopEventCount = 0;

            // Train episodes
            for (long i = 0; i < Episodes; i++)
            {
                long maxloop = 0;
                // Select random initial state			
                int stateIndex = rand.Next(States.Count);
                QState state = States[stateIndex];
                QAction action = null;
                do
                {
                    if (++maxloop > MaxExploreStepsWithinOneEpisode)
                    {
                        if(ShowWarning)
                        {
                            string msg = string.Format(
                            "{0} !! MAXLOOP state: {1} action: {2}, {3} endstate is to difficult to reach?",
                            ++maxloopEventCount, state, action, "maybe your path setup is wrong or the ");
                            QMethod.Log(msg);
                        }
                        
                        break;
                    }

                    // no actions, skip this state
                    if(state.Actions.Count==0)
                        break;

                    // Selection strategy is random based on probability
                    int index = rand.Next(state.Actions.Count);
                    action = state.Actions[index];

                    // Using this possible action, consider to go to the next state
                    // Pick random Action outcome
                    QActionResult nextStateResult = action.PickActionByProbability();
                    string nextStateName = nextStateResult.StateName;

                    double q = nextStateResult.QEstimated;
                    double r = nextStateResult.Reward;
                    double maxQ = MaxQ(nextStateName);

                    // Q(s,a)= Q(s,a) + alpha * (R(s,a) + gamma * Max(next state, all actions) - Q(s,a))
                    double value = q + Alpha * (r + Gamma * maxQ - q); // q-learning                  
                    nextStateResult.QValue = value; // update

                    // is end state go to next episode
                    if (EndStates.Contains(nextStateResult.StateName))
                        break;

                    // Set the next state as the current state                    
                    state = StateLookup[nextStateResult.StateName];

                } while (true);
            }
        }


        double MaxQ(string stateName)
        {
            const double defaultValue = 0;

            if(!StateLookup.ContainsKey(stateName))            
                return defaultValue;                            

            QState state = StateLookup[stateName];
            var actionsFromState = state.Actions;
            double? maxValue = null;
            foreach (var nextState in actionsFromState)
            {
                foreach (var actionResult in nextState.ActionsResult)
                {
                    double value = actionResult.QEstimated;
                    if (value > maxValue || !maxValue.HasValue)
                        maxValue = value;
                }
            }

            // no update
            if (!maxValue.HasValue && ShowWarning) 
                QMethod.Log(string.Format("Warning: No MaxQ value for stateName {0}",
                    stateName) );

            return maxValue.HasValue ? maxValue.Value : defaultValue;
        }

        public void PrintQLearningStructure()
        {
            Console.WriteLine("** Q-Learning structure **");
            foreach (QState state in States)
            {
                Console.WriteLine("State {0}", state.StateName);
                foreach (QAction action in state.Actions)
                {
                    Console.WriteLine("  Action " + action.ActionName);
                    Console.Write(action.GetActionResults());
                }
            }
            Console.WriteLine();
        }

        public void ShowPolicy()
        {
            Console.WriteLine("** Show Policy **");
            foreach (QState state in States)
            {
                double max = Double.MinValue;
                string actionName = "nothing";
                foreach (QAction action in state.Actions)
                {
                    foreach (QActionResult actionResult in action.ActionsResult)
                    {
                        if (actionResult.QEstimated > max)
                        {
                            max = actionResult.QEstimated;
                            actionName = action.ActionName.ToString();
                        }
                    }
                }

                Console.WriteLine(string.Format("From state {0} do action {1}, max QEstimated is {2}",
                    state.StateName, actionName, max.Pretty()));
            }
        }
    }

    class QState
    {        
        public string StateName { get; private set; }
        public List<QAction> Actions { get; private set; }

        public void AddAction(QAction action)
        {
            Actions.Add(action);
        }

        public QState(string stateName, QLearning q)
        {            
            q.StateLookup.Add(stateName, this);
            StateName = stateName;
            Actions = new List<QAction>();
        }

        public override string ToString()
        {
            return string.Format("StateName {0}", StateName);
        }
    }

    class QAction
    {
        private static readonly Random Rand = new Random();
        public QActionName ActionName { get; internal set; }
        public string CurrentState { get; private set; }
        public List<QActionResult> ActionsResult { get; private set; }

        public void AddActionResult(QActionResult actionResult)
        {
            ActionsResult.Add(actionResult);
        }

        public string GetActionResults()
        {
            var sb = new StringBuilder();
            foreach (QActionResult actionResult in ActionsResult)
                sb.AppendLine("     ActionResult " + actionResult);

            return sb.ToString();
        }

        public QAction(string currentState, QActionName actionName = null)
        {
            CurrentState = currentState;
            ActionsResult = new List<QActionResult>();
            ActionName = actionName;
        }

        // The sum of action outcomes must be close to 1
        public void ValidateActionsResultProbability()
        {
            const double epsilon = 0.1;

            if (ActionsResult.Count == 0)
                throw new ApplicationException(string.Format(
                    "ValidateActionsResultProbability is invalid, no action results:\n {0}", 
                    this));

            double sum = ActionsResult.Sum(a => a.Probability);
            if (Math.Abs(1 - sum) > epsilon)
                throw new ApplicationException(string.Format(
                    "ValidateActionsResultProbability is invalid:\n {0}", this));
        }

        public QActionResult PickActionByProbability()
        {
            double d = Rand.NextDouble();
            double sum = 0;
            foreach (QActionResult actionResult in ActionsResult)
            {
                sum += actionResult.Probability;
                if (d <= sum)
                    return actionResult;
            }
            
            // we might get here if sum probability is below 1.0 e.g. 0.99 
            // and the d random value is 0.999
            if (ActionsResult.Count > 0) 
                return ActionsResult.Last();

            throw new ApplicationException(string.Format("No PickAction result: {0}", this));
        }

        public override string ToString()
        {
            double sum = ActionsResult.Sum(a => a.Probability);
            return string.Format("ActionName {0} probability sum: {1} actionResultCount {2}",
                ActionName, sum, ActionsResult.Count);
        }
    }

    class QActionResult
    {
        public string StateName { get; internal set; }
        public string PrevStateName { get; internal set; }
        public double QValue { get; internal set; } // Q value is stored here        
        public double Probability { get; internal set; }
        public double Reward { get; internal set; }

        public double QEstimated
        {
            get { return QValue * Probability; }
        }

        public QActionResult(QAction action, string stateNameNext = null, 
            double probability = 1, double reward = 0)
        {
            PrevStateName = action.CurrentState;
            StateName = stateNameNext;
            Probability = probability;
            Reward = reward;
        }

        public override string ToString()
        {
            return string.Format("State {0}, Prob. {1}, Reward {2}, PrevState {3}, QE {4}",
                StateName, Probability.Pretty(), Reward, PrevStateName, QEstimated.Pretty());
        }
    }

    public class QActionName
    {
        public string From { get; private set; }
        public string To { get; private set; }

        public QActionName(string from, string to = null)
        {
            From = from;
            To = to;
        }

        public override string ToString()
        {
            return GetActionName();
        }

        public string GetActionName()
        {
            if (To == null) 
                return From;
            return QMethod.ActionNameFromTo(From,To);
        }
    }

    static class QMethod
    {
        public static void Log(string s)
        {
            Console.WriteLine(s);
        }

        public static readonly CultureInfo CultureEnUs = new CultureInfo("en-US");

        public static string ToStringEnUs(this double d)
        {
            return d.ToString("G", CultureEnUs);
        }
        public static string Pretty(this double d)
        {
            return ToStringEnUs(Math.Round(d, 2));
        }

        public static string ActionNameFromTo(string a, string b)
        {
            return string.Format("from_{0}_to_{1}", a, b);
        }

        public static string EnumToString<T>(this T type)
        {
            return Enum.GetName(typeof(T), type);
        }

        public static void ValidateRange(double d, string origin = null)
        {
            if (d < 0 || d > 1)
            {
                string s = origin ?? string.Empty;
                throw new ApplicationException(string.Format("ValidateRange error: {0} {1}", d, s));
            }
        }

        public static void Validate(QLearning q)
        {
            foreach (var state in q.States)
            {
                foreach (var action in state.Actions)
                {
                    action.ValidateActionsResultProbability();
                }
            }
        }
    }
}

Currently you have to know all the probability values in advance for the action outcomes, which are used for the Q-learning structure setup. The source code can easily be adapted if you need runtime knowledge of probability outcomes. Just add a probability function lookup and look at the Q-learning formula calculation area. Maybe I will look more into that at a later time.

Written by kunuk Nykjaer

January 14, 2012 at 7:53 pm

Facebook Friends Tag Cloud

leave a comment »

I thought it would be fun to make some kind of visualization using your Facebook friends. I decided to make a tag cloud of the facebook friends’ names.

You can try it yourself here. The styling part was made by using this tutorial: How to Style a Tag Cloud.


tag cloud example

Written by kunuk Nykjaer

December 4, 2011 at 10:31 pm

Posted in Visualization

Tagged with ,

Facebook login to 3rd party, how much data do you give away?

leave a comment »

Do you have a Facebook account and have you used it to login to other sites than Facebook?
Do you know how much data you give away about yourself by using the Facebook login?

You can test by using this page. This page requires you to accept logging in to the Jory web-application using your Facebook account. The application doesn’t require any extra permission, just the permission to accept the login.

The page will examine all the data that is available and display them in a table format. It will also extract information about your friends and display your friends in an image grid view format. You can mouse-over your friends and it will show you the names of your friends.

You can adjust your Facebook privacy settings and test again to see if you want to know whether you show too much private data about yourself.

Written by kunuk Nykjaer

December 4, 2011 at 10:19 pm

Posted in Visualization

Tagged with , ,

Social data exploration

with 2 comments

I will have fun playing with Social data. The idea is to make a site where users can login with a social profile such as Facebook, Google or Twitter account. After the users login the user data will be extracted using the social API. The purpose is to explore Social sites API and use them to make web-tools for easy data exploration, make data visualizations or maybe even knowledge discovery using Data mining techniques.

I will put the content on this site.

I will use the Socialauth-net login framework.

Written by kunuk Nykjaer

December 4, 2011 at 9:56 pm

Certified Web Applications Development with Microsoft .NET Framework 4

leave a comment »

I am now certified as Web Applications Development with Microsoft .NET Framework 4.

If you are going to take this certification I can recommend this book: MCTS Self-Paced Training Kit (Exam 70-515): Web Applications Development with Microsoft .NET Framework 4. But it lacks JQuery materials and you only learn MVC briefly. The certification exam contained a high amount on jQuery related questions imho (jQuery is not a MS-technology). For the exam preparation I recommend reading the book, learn jQuery and practice Asp.net MVC.

Written by kunuk Nykjaer

December 4, 2011 at 9:38 pm

Posted in Certification, Csharp

Tagged with , ,

Follow

Get every new post delivered to your Inbox.