結果 : q learning algorithm solved example