Natural Language Generation as Inverse Reinforcement Learning with Neural Machine Translation

Modern robotics applications that involve human-robot interaction require robots to be able to communicate with humans seamlessly and effectively. Natural language provides a flexible and efficient medium through which robots can exchange information with their human partners. Significant advancements have been made in developing robots capable of interpreting free-form instructions, but less attention has been devoted to endowing robots with the ability to generate natural language. We propose a model that enables robots to generate natural language instructions that allow humans to navigate a priori unknown environments. We first decide which information to share with the user according to their preferences, using a policy trained from human demonstrations via inverse reinforcement learning. We then “translate” this information into a natural language instruction using a neural sequence-to-sequence model that learns to generate free-form instructions from natural language corpora. We evaluate our method on a benchmark route instruction dataset and achieve a BLEU score of 72.18% compared to human-generated reference instructions. We additionally conduct navigation experiments with human participants demonstrating that our method generates instructions that people follow as accurately and easily as those produced by humans.


We formulate the problem as two sub-problems, namely, Content Selection and Surface Realization. Content Selection is the problem of deciding how much and which information to share with the user. Surface Realization is the problem of deciding how to convey the information previously selected.
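As a minimal sketch of this two-stage decomposition, the snippet below chains a selection step and a realization step. The names `Fact`, `generate_instruction`, `keep_first_two`, and `join_with_then` are hypothetical and used only for illustration; the two toy functions are stand-ins, not the policy learned via inverse reinforcement learning or the neural sequence-to-sequence model.

```python
from typing import Callable, List, Sequence

Fact = str  # hypothetical: one candidate piece of route information


def generate_instruction(
    candidates: Sequence[Fact],
    select: Callable[[Sequence[Fact]], List[Fact]],
    realize: Callable[[List[Fact]], str],
) -> str:
    """Run Content Selection, then Surface Realization."""
    selected = select(candidates)  # which information, and how much, to share
    return realize(selected)       # how to convey the selected information


# Toy stand-ins (assumptions), NOT the learned components described above.
def keep_first_two(candidates: Sequence[Fact]) -> List[Fact]:
    return list(candidates[:2])


def join_with_then(selected: List[Fact]) -> str:
    return ", then ".join(selected) + "."
```

Keeping the two stages behind separate callables mirrors the split described above: either stage could be swapped out without touching the other.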


Automatic Evaluation:

Our Surface Realization module achieved a sentence-level BLEU-4 score of 74.67% on the test set.
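For readers unfamiliar with the metric, the following is a minimal sketch of unsmoothed sentence-level BLEU-4 against a single reference. The tokenization, smoothing, and number of references used to obtain the score reported here are not stated on this page, so this illustrates the metric in general and should not be read as the exact evaluation script; the function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu4(reference, candidate):
    """Unsmoothed sentence-level BLEU-4 against a single reference.

    Both arguments are lists of tokens. Returns a value in [0, 1];
    multiply by 100 to express it as a percentage.
    """
    if not candidate:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, 5):
        cand_counts = ngrams(candidate, n)
        ref_counts = ngrams(reference, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        matched = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or matched == 0:
            return 0.0  # without smoothing, any zero precision gives BLEU = 0
        log_precision_sum += math.log(matched / total)
    # The brevity penalty punishes candidates shorter than the reference.
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1.0 - r / c)
    return brevity_penalty * math.exp(log_precision_sum / 4.0)
```

Because short instructions often share no 4-gram with the reference, practical sentence-level evaluations usually apply some form of smoothing, which this sketch deliberately omits.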

Human Evaluation:

We asked 42 participants on Amazon Mechanical Turk to navigate a three-dimensional virtual environment according to a provided route instruction. The route instructions were randomly sampled from those generated using our method and those provided by humans as part of the SAIL corpus. No participant experienced the same scenario with both human-annotated and machine-generated instructions. We evaluate the accuracy with which participants followed the natural language instructions in terms of the Manhattan distance between the desired destination and the participant’s location when they finished the scenario. The results are shown in Figure 1.
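The distance measure can be sketched as follows, assuming positions are given as integer `(x, y)` grid coordinates; that representation and the function name `manhattan_distance` are assumptions made for illustration.

```python
def manhattan_distance(goal, final):
    """Manhattan (L1) distance between two (x, y) grid positions."""
    return abs(goal[0] - final[0]) + abs(goal[1] - final[1])
```

A participant who stops exactly at the desired destination has a distance of 0; larger values mean the participant ended farther from the goal.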

Figure 1: Participants’ distances from the goal.

The participants were presented with a survey consisting of eight questions, three requesting demographic information and five requesting feedback on their experience and the quality of the instructions that they followed (Figures 2 to 6).

Figure 2: How would you evaluate the task in terms of difficulty?

Figure 3: How many times did you have to backtrack?

Figure 4: Who do you think generated the instructions?

Figure 5: How would you define the amount of information provided by the instructions?

Figure 6: How confident are you that you followed the desired path?


#  File                    Size      Type
1  SAIL-original.pickle    977.8 KB  Python pickle v2
2  SAIL-augmented.pickle   457.2 MB  Python pickle v2
3  CAS-specifications.pdf  461 KB    PDF
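A minimal sketch for opening the pickle files is shown below. The internal structure of the files is not documented on this page, so the example only loads the object for inspection. Passing `encoding="latin1"` is an assumption that the files may have been written by Python 2, and the helper name `load_pickle` is hypothetical. Unpickling can execute arbitrary code, so only load files obtained from a trusted source.

```python
import pickle


def load_pickle(path):
    """Load a pickle file, tolerating files written by Python 2."""
    with open(path, "rb") as f:
        # encoding="latin1" lets Python 3 decode Python 2 byte strings;
        # it is harmless for pickles that contain none.
        return pickle.load(f, encoding="latin1")


# Example usage (assumes the file was downloaded to the working directory):
#   data = load_pickle("SAIL-original.pickle")
#   print(type(data))
```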

Files hosted on at


Note: The code for the model will be released soon!

Code hosted on at


Note: If you use our code/data, please cite the following publication:

Navigational Instruction Generation as Inverse Reinforcement Learning with Neural Machine Translation.
Andrea F. Daniele, Mohit Bansal, and Matthew R. Walter.
In Proceedings of the ACM/IEEE International Conference on Human-Robot Interaction (HRI), March 2017, Vienna, Austria.
