14. Mai 2024

Kolloquium: Research Talk von Dr. Ankit Anand Kolloquium: Research Talk von Dr. Ankit Anand am 17. Mai

Kolloquium: Research Talk von Dr. Ankit Anand © Unsplash

Alle Bilder in Originalgröße herunterladen Der Abdruck im Zusammenhang mit der Nachricht ist kostenlos, dabei ist der angegebene Bildautor zu nennen.

Wir laden herzlich zum Kolloquium von Dr. Ankit Anand ein!

Am 17. Mai 2024, von 10:00 bis 11:00 Uhr, wird Dr. Ankit Anand, Senior Research Scientist bei Google DeepMind Montreal einen Research Talk (in englischer Sprache) halten.

Der Research Talk widmet sich dem Thema „Taking reinforcment learning(RL) to formal world of theorem proving and bringing formal world (code) back to RL“.

Abstract:

AI has made great strides in multiple domains including robotics, game playing, biology and climate science etc.. In this talk, we will describe some of our recent attempts to use AI in formal world of theorem proving and how we also brought back formal methods for improvements in world of reinforcement learning using LLMs.

Firstly, I will describe how we adapted the idea of hindsight experience replay from reinforcement learning to the automated theorem proving domain, so as to use the intermediate data generated during unsuccessful proofs. We show that provers trained in this way can outperform previous machine learning approaches and compete with the state of the art heuristic-based theorem prover E in its best configuration, on the popular benchmarks MPTP2078, M2k and Mizar40. The proofs generated by our algorithm are also almost always significantly shorter than E’s proofs.

Secondly, I will describe our recent work "Code-As-Reward" on how we brought the formal representation to the world of reinforcement learning. Specifically, one key challenge in applying RL to any problem is reward specification. In this work, we use pre-trained large Vision Language models to produce interpretable dense reward functions as code blocks. This hinges on the strong capability of these large models to produce interpretable code. It not only brings formal specification to reward functions but is also significantly cheap as these code blocks are generated with very few calls to VLMs and can be subsequently used in multiple tasks.

Die Veranstaltung ist kostenfrei. Interessierte sind herzlich eingeladen, teilzunehmen!

Weitere Informationen

Wann? Freitag, 17. Mai 2024, von 10:00 bis 11:00 Uhr

Wo? Institut für Informatik, Friedrich-Hirzebruch-Allee 5, 53113 Bonn, Raum 0.016

Kontakt

Michaela Musselmann
Institut für Informatik
Universität Bonn
Tel.: +49 228 73-4502
E-Mail: musselmann@iai.uni-bonn.de

Urheberrechte

Universität Bonn