Results: which machine learning algorithm training method is based on rewards and punishments