Posts Tagged ‘probability’
Probability of getting certified by guessing
I am curious about the probability of passing a certification exam by guessing.
Therefore I made a quick implementation to calculate some probabilities.
In this example I will use
Website .NET Developer Certification for Sitecore CMS as the test experiment.
The exam is in multiple-choice format with 3 possible answers.
In total there are 40 questions and you have to get at least 28 to pass the exam.
You have 55 minutes to answer all the questions (about 1 minute per answer).
Almost always two possible answers look very similar and one is usually easier to spot as a wrong candidate from the other two.
The math example of how to calculate this kind of probabilities can be read here.
The probability of passing the exam by only guessing AtLeastProbability(0.33, 40, 28) ~ 0%; The probability of passing the exam by knowing 10 answers and guessing the rest minimum 18 correct AtLeastProbability(0.33, 30, 18) ~ 0%; The probability of passing the exam by knowing 20 answers and guessing the rest minimum 8 correct AtLeastProbability(0.33, 20, 8) ~ 34%; The probability of passing the exam by knowing 22 answers and guessing the rest minimum 6 correct AtLeastProbability(0.33, 18, 6) ~ 59%;
As you can see, even if you know 22 correct answers out of 40 then the chance of passing is only about 59% if you guess the rest of the questions where you need to guess 6 correct. That is not promising if your strategy is to pass by guessing.
Now let’s assume you will be able to spot one wrong candidate and are left with 50% of guessing correct. The calculations are now:
The probability of passing the exam by only guessing AtLeastProbability(0.5, 40, 28) ~ 1%; The probability of passing the exam by knowing 10 answers and guessing the rest minimum 18 correct AtLeastProbability(0.5, 30, 18) ~ 18%; The probability of passing the exam by knowing 20 answers and guessing the rest minimum 8 correct AtLeastProbability(0.5, 20, 8) ~ 87%; The probability of passing the exam by knowing 22 answers and guessing the rest minimum 6 correct AtLeastProbability(0.5, 18, 6) ~ 95%;
If you are able to spot 1 wrong candidate, then you should at least know 22 correct answers out of 28 to pass the exam.
The strategy of passing a certification exam by guessing seems like a failing strategy.
Out of the box this program will calculate the probability of having at least 3 successes out of 5 where the success probability is 33%.
AtLeastProbability(0.33, 5, 3) ~ 21%;
Program.cs
using System;
using System.Collections.Generic;
using System.Diagnostics;
/// <summary>
/// Author: Kunuk Nykjaer
/// Binomial probability
/// At least x success out of n tries with probability p.
/// </summary>
public class Program
{
public static void Main(string[] args)
{
const double T = 1.0 / 3; // odds of success
const int knownCorrect = 0;
long n = 5; // total of chances
long atleast = 3; // must have at least this number of success
n -= knownCorrect;
atleast -= knownCorrect;
AtLeastProbability(T, n, atleast);
}
// http://www.regentsprep.org/Regents/math/algtrig/ATS7/BLesson2.htm
static void AtLeastProbability(double T, long n, long atleast)
{
if (T < 0 || T > 1 || n < 0 || atleast < 0 || atleast > n)
throw new ApplicationException("AtLeastProbability invalid");
double F = 1.0 - T; // odds of failure
Console.WriteLine(string.Format("AtLeastProbability({0}, {1}, {2}, {3})",
Math.Round(T,2), Math.Round(F,2), n, atleast));
var stopwatch = new System.Diagnostics.Stopwatch();
stopwatch.Start();
double odds = 0;
for (long r = atleast; r <= n; r++)
odds += C(n, r) * Math.Pow(T, r) * Math.Pow(F, n - r);
stopwatch.Stop();
odds = Math.Round(odds * 100, 3); // %
Console.WriteLine(string.Format(
"odds: {0}%\nTime elapsed msec: {1}", odds, stopwatch.ElapsedMilliseconds));
Console.WriteLine("press a key to exit"); Console.ReadKey();
}
// Binomial
// using tricks to avoid too big numbers in factorisation,
// runtime might be slow for really big numbers
// nCr see http://en.wikipedia.org/wiki/Binomial_probability
static long C(long n, long r)
{
if (r < 0 || n < r)
throw new ApplicationException("C(n,r) invalid n,r");
if (n <= 1) return 1;
long bigr = n - r;
if (bigr < r)
bigr = r;
var uppers = new System.Collections.Generic.List<long>();
var lowers = new System.Collections.Generic.List<long>();
for (long i = n; i > bigr; i--)
uppers.Add(i);
for (long i = n - bigr; i > 1; i--)
lowers.Add(i);
// O(n^2)
// trick to reduce big numbers, datatype long has a limit
for (int i = 0; i < uppers.Count; i++)
{
for (int j = 0; j < lowers.Count; j++)
{
long u = uppers[i];
long l = lowers[j];
long gcd = Gcd(u, l);
if (gcd <= 1) continue;
// reduce number value
uppers[i] = u / gcd;
lowers[j] = l / gcd;
}
}
long upper = 1;
long lower = 1;
foreach (var i in lowers) lower *= i;
foreach (var i in uppers) upper *= i;
if (upper < 0 || lower < 0)
//long datatype has a limit
throw new ApplicationException("C(n,r) negative, datatype limit issue");
long result = upper / lower;
return result;
}
// greatest common divisor
public static long Gcd(long a, long b)
{
return b == 0 ? a : Gcd(b, a % b);
}
}
Lotto probability Monte Carlo method with Java
Monte Carlo is a method of approximating the probability value by trying a lot of times. You can read more about Monte Carlo at wiki here link.
The program example is:
Simple Lotto simulation with n possible numbers and m winner numbers where you have to pick m numbers.
The data structure strategy is to use primitive type int and arrays for reducing memory usage. This may not have the fastest running time, but the memory usage is kept as small as possible for large calculation samples.
This example pick a random winning numbers (ordering not important). Then it tries to select a lot of random coupons and compare the probability of how many correct numbers you select from your winning numbers.
Lotto.java
import java.text.*;
import java.util.*;
/**
*
* @author Kunuk Nykjaer
* Lotto simulation Monte Carlo
* Simple lotto with possible range of n numbers and m range of winner numbers
*
*/
public class Lotto {
private final NumberFormat nf = new DecimalFormat("#.#####");
public static void main(String[] args) {
long begin = System.currentTimeMillis();
int coupons = 5000000;
int numberRange = 36; // 1..numberRange
int pickNumbers = 6;
new Lotto().playLotto(coupons, numberRange, pickNumbers);
long end = System.currentTimeMillis();
System.out.println("Time: " + (end - begin) / 1000.0 + " sec.");
}
/**
*
* Lotto simulation (combination of numbers not same) Check to see how often I win
* Pick random winner numbers, play with n coupons and use random numbers for each coupon
* First index is the winner numbers
*/
private int[][] getResult(int numbersOfCoupons, int numbersRange, int numbersOfuse) {
Random rand = new Random();
int[][] lottos = new int[numbersOfCoupons+1][numbersOfuse];
// O(numbersOfCoupons * k)
for (int coupon = 0; coupon < numbersOfCoupons+1; coupon++) {
int size = numbersRange;
int[] possibleNumbers = new int[size];
int[] lottoNumbers = new int[numbersOfuse];
// O(numbersRange * k)
// store possible numbers for random selection
for (int i = 0; i < possibleNumbers.length; i++)
possibleNumbers[i] = i + 1;
// O(numbersOfuse * k)
// pick random numbersOfuse numbers not same
for (int i = 0; i < lottoNumbers.length; i++) {
int last = size - 1;
int next = rand.nextInt(size);
int temp = possibleNumbers[last];
int value = possibleNumbers[next];
possibleNumbers[next] = temp;
lottoNumbers[i] = value;
size--;
}
lottos[coupon] = lottoNumbers;
}
return lottos;
}
/**
* buy n coupons with random numbers
* lotto is possible numbers 1..n
* I pick x random numbers where there are x winner numbers
*/
public void playLotto(int coupons, int n, int pickSize)
{
HashSet<Integer> winnerNumbersLookup = new HashSet<Integer>();
int[][] lotto = getResult(coupons, n, pickSize);
int[] winnerNumbers = lotto[0];
for (int i = 0; i < winnerNumbers.length; i++) {
winnerNumbersLookup.add( winnerNumbers[i] );
}
int[] countCorrect = new int[pickSize+1];
for (int i = 1; i < lotto.length; i++) {
int numbersCorrect = 0;
int[] coupon = lotto[i];
for (int j = 0; j < coupon.length; j++) {
int number = coupon[j];
if( winnerNumbersLookup.contains(number))
numbersCorrect++;
}
countCorrect[numbersCorrect]++;
}
// Print result
System.out.print("Winner numbers are: ");
Arrays.sort(winnerNumbers);
for (int i = 0; i < winnerNumbers.length; i++) {
System.out.print(winnerNumbers[i]+ " ");
}
System.out.println("\nNumber range is 1.."+n);
System.out.println("PickSize is "+pickSize);
System.out.println("Correct hit count with coupons "+coupons);
for (int i = 0; i < countCorrect.length; i++) {
System.out.println(i+" hit: " +countCorrect[i] + " ~ "+nf.format(100*(countCorrect[i]/(double)coupons))+ "%");
}
}
}
Result:
Winner numbers are: 2 7 11 12 14 33
Number range is 1..36
PickSize is 6
Correct hit count with coupons 5000000
0 hit: 1522834 ~ 30,45668%
1 hit: 2195431 ~ 43,90862%
2 hit: 1056309 ~ 21,12618%
3 hit: 208328 ~ 4,16656%
4 hit: 16653 ~ 0,33306%
5 hit: 441 ~ 0,00882%
6 hit: 4 ~ 0,00008%
Time: 21.63 sec.
The result of the Monte Carlo method (experience based) can be compared with the Math using hyper-geometric distribution (theory based).
I made a probability tool in Javascript a long time ago which can calculate hyper-geometric distribution. By comparing the probability values it show the values are more or less equal.
Heres for 0 hit link for mathematical calculating 0 hit
The value is 0.3048451785406245 which is pretty close to the Monte Carlo method.