Voyager LLM paper notes
by Shreyas Srivastava
Skimmed the Voyager paper, which presents what the authors describe as the first LLM-powered embodied lifelong learning agent in Minecraft: it plays the game and keeps learning new skills.
The key components of the system are:
- Automated curriculum: GPT is prompted to propose the next best skill to learn based on past experience. The key is to pick tasks that are neither too easy (wasting effort on skills already mastered by tackling related tasks) nor too difficult (the agent may not yet have the resources or knowledge to execute them). GPT can suggest this curriculum because of its internet-scale knowledge of Minecraft. The paper shows that the automated curriculum is a key reason for the success of exploration.
- Iterative prompting mechanism: GPT outputs code (as opposed to motor actions) for a new skill and repairs that code based on feedback from the Minecraft code-interpreter environment (i.e. execution errors, or an action that cannot be performed in the current game state).
- Skill library: essentially an embedding search keyed on text, namely the skill type and a description of the skill (generated by GPT). The library returns the code associated with each matching skill, which helps when learning related skills later and reduces the number of iterations. The agent first queries the skill library, and the retrieved code is passed to the LLM as context. Without the skill library, the model plateaus much earlier and learns fewer skills.
- Skill completion validation: to establish that a skill has been acquired successfully, a separate LLM is used as a verifier. It is given the agent's state (here, the inventory) and the task, and it judges whether the task was completed.
  Example prompt: Inventory (8/36): {'oak_planks': 5, 'cobblestone': 2, 'porkchop': 2, 'wooden_sword': 1, 'coal': 5, 'wooden_pickaxe': 1, 'oak_log': 3, 'dirt': 9} Task: Mine 5 coal ores
  LLM output: Reasoning: Mining coal_ore in Minecraft will get coal. You have 5 coal in your inventory. Success: True
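
To make the skill library and the iterative prompting loop more concrete, here is a minimal Python sketch of how they might fit together. It is not the paper's implementation. The names `embed`, `SkillLibrary`, and `learn_skill` are hypothetical, the bag-of-words `embed` is a toy stand-in for a real embedding model, and `generate_code`, `execute`, and `verify` are placeholder callables standing in for the GPT code generator, the Minecraft environment, and the verifier LLM.

```python
import math
import re
from collections import Counter
from typing import Callable, List, Optional, Tuple


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (assumption: a real system would call an embedding model)."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SkillLibrary:
    """Stores (description, code) pairs and retrieves code by description similarity."""

    def __init__(self) -> None:
        self._skills: List[Tuple[Counter, str]] = []

    def add(self, description: str, code: str) -> None:
        self._skills.append((embed(description), code))

    def retrieve(self, query: str, k: int = 2) -> List[str]:
        q = embed(query)
        ranked = sorted(self._skills, key=lambda s: cosine(q, s[0]), reverse=True)
        return [code for _, code in ranked[:k]]


def learn_skill(
    task: str,
    library: SkillLibrary,
    generate_code: Callable[[str, List[str], str], str],  # stand-in for the GPT call
    execute: Callable[[str], Tuple[bool, str]],           # stand-in for the game environment
    verify: Callable[[str], bool],                        # stand-in for the verifier LLM
    max_rounds: int = 4,
) -> Optional[str]:
    """Generate code, run it, feed errors back, and store the skill once verified."""
    context = library.retrieve(task)  # related skills are passed in as context
    feedback = ""
    for _ in range(max_rounds):
        code = generate_code(task, context, feedback)
        ok, feedback = execute(code)
        if ok:
            if verify(task):
                library.add(task, code)  # a verified skill becomes reusable later
                return code
            feedback = "code ran, but the verifier judged the task incomplete"
    return None  # give up after max_rounds; a curriculum could propose another task
```

In this sketch a skill enters the library only after the verifier accepts it, so later retrievals only ever return code that has already worked at least once.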