Towards Automatic Code Reproduction for Scientific Papers: Benchmarks and Methodologies
Invited Talk, Meta, LLaMA Community Meet-up, London, United Kingdom
I presented our latest work on SciReplicate-Bench and shared methodologies for building agentic LLM systems that can reliably reproduce the code described in scientific publications. The talk covered benchmarking strategies, memory management, and tooling considerations for research automation.
