Multi-Armed Bandit Problem

I am interested in reinforcement learning, but I find it difficult. @_@ So I tried to implement a very simple and famous problem called the 'multi-armed bandit'. (Image from Wikipedia.) The multi-armed bandit problem is a problem in which a fixed, limited set of resources must be allocated between competing (alternative) choices in a way that maximizes their