Results: convergence of Q-learning with linear function approximation