Results: what is the first step in developing a Q-learning algorithm