We cordially invite you to the colloquium of Prof Gill Barequet!
On May 17, 2024, from 10:00 to 11:00 a.m, Ankit Anand, Senior Research Scientist bei Google DeepMind Montreal einen Research Talk (in englischer Sprache) halten.
The research talk is dedicated to the topic "Taking reinforcment learning(RL) to formal world of theorem proving and bringing formal world (code) back to RL".
Abstract:
AI has made great strides in multiple domains including robotics, game playing, biology and climate science etc.. In this talk, we will describe some of our recent attempts to use AI in formal world of theorem proving and how we also brought back formal methods for improvements in world of reinforcement learning using LLMs.
Firstly, I will describe how we adapted the idea of hindsight experience replay from reinforcement learning to the automated theorem proving domain, so as to use the intermediate data generated during unsuccessful proofs. We show that provers trained in this way can outperform previous machine learning approaches and compete with the state of the art heuristic-based theorem prover E in its best configuration, on the popular benchmarks MPTP2078, M2k and Mizar40. The proofs generated by our algorithm are also almost always significantly shorter than E’s proofs.
Secondly, I will describe our recent work "Code-As-Reward" on how we brought the formal representation to the world of reinforcement learning. Specifically, one key challenge in applying RL to any problem is reward specification. In this work, we use pre-trained large Vision Language models to produce interpretable dense reward functions as code blocks. This hinges on the strong capability of these large models to produce interpretable code. It not only brings formal specification to reward functions but is also significantly cheap as these code blocks are generated with very few calls to VLMs and can be subsequently used in multiple tasks.
The event is free of charge. Interested individuals are warmly invited to attend!