Use Sarsa (State–Action–Reward–State–Action), an on-policy prediction method, to estimate player performance in ice hockey
A Neural Programmer-Interpreter trained in a hybrid of strong-supervision and reinforcement-learning modes
Use dual learning on monolingual data for the neural machine translation task
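
As a rough illustration of the Sarsa method named in the first item, the sketch below shows the standard tabular update Q(s, a) += alpha * (r + gamma * Q(s', a') - Q(s, a)). It is a minimal sketch, not the model used in the hockey work: the function name `sarsa_update`, the dictionary-based table `Q`, the default values of `alpha` and `gamma`, and the state and action labels are all assumptions made for this example.

```python
from collections import defaultdict


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9, terminal=False):
    """Apply one tabular Sarsa step and return the updated Q(s, a).

    On-policy: the target uses the action a_next that was actually
    chosen in s_next, not the greedy (max) action.
    """
    # At a terminal transition there is no next state-action value to bootstrap from.
    target = r if terminal else r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q[(s, a)]


# Hypothetical usage: states and actions are plain labels invented for this sketch.
Q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
sarsa_update(Q, "neutral_zone", "pass", 0.0, "offensive_zone", "shot")
sarsa_update(Q, "offensive_zone", "shot", 1.0, None, None, terminal=True)
```

Because the target bootstraps from the action that was actually taken next, the learned values describe the observed behavior policy, which is the property that makes Sarsa a prediction method here and not a control method.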