发布时间:2024-07-01
On June 13th, the ICML 2024 Challenges on Automated Math Reasoning, jointly organized by the top international conference in the field of machine learning, ICML, and multiple enterprises and universities, came to a successful conclusion. Under the guidance of Associate Professor Zhao Hongke from the College of Management and Economics (hereafter called CoME), Wang Minmao, a 2021 undergraduate student in Information Management and Information Systems, won the global runner up in the field of automated theorem generation and proof as the first team member. This achievement was jointly completed by teachers and students from CoME of Tianjin University (hereafter called TJU) and the State Key Laboratory of Cognitive Intelligence.
International Conference on Machine Learning (hereafter called ICML) is the most prestigious and recognized academic conference in the field of machine learning and artificial intelligence worldwide, and is recommended as an A-class conference by the Chinese Computer Federation (hereafter called CCF). The purpose of this ICML 2024 Automated Math Reasoning Competition is to explore and promote the application of artificial intelligence technology in various mathematical reasoning tasks. This competition is jointly organized by many top enterprises and universities, including Google, Huawei, Oxford University, Cambridge University, Carnegie Mellon University, Peking University, and University of California, Los Angeles, to explore the prospects and challenges of AI technology in various mathematical reasoning and problem-solving tasks.
Mathematical reasoning is the most challenging and profound part of human intelligence. In the development process of mathematical reasoning, humans have summarized various formal languages that can strictly describe mathematical problems and proof processes. In recent years, machine learning algorithms and large-scale language models are gradually approaching or even surpassing human performance in some mathematical reasoning. So, how should our team develop AI mathematical reasoning in the next step, making it the strongest aid for humans to break through unknown mathematical fields? The task of this challenge is to correctly generate new theorems for the model, given a set of axioms and symbols, in order to simplify the proof path.
In this challenge, the team used the Automatic Theorem Generation dataset, which is developed based on the open-source theorem proving library "set.mm" and contains approximately 38,000 human theorems written in the Metamath formal language. This dataset mainly focuses on up to 2,000 propositional theorems, and by giving a set of axioms and symbols, new provable theorems need to be inferred. During the evaluation process, the team used random problem samples from "set.mm" to measure the practicality of the generation theorem.
Scheme Design
Overall flowchart
Prompt template diagram
In this challenge, the team used GPT-3.5 as the inference model. Due to the lack of clear labels for this task, it is not possible to directly test the model's performance locally. To this end, the team first carefully designed the Prompt, compared the Prompts of different templates, and after running them in the cloud, obtained the most effective Prompt. Input 15 randomly selected theorems (tested by sensitivity analysis) as a set of inference theorems in Prompt to maximize the normalization of the model output. Next, the team will run the model multiple times in the cloud, and based on the cloud results, obtain some excellent new theorems, which will be added to the new theorem set. Subsequently, a theorem derivation graph is constructed locally, with each node representing a theorem, and the degree centrality of each node is calculated and sorted. The highest degree theorems are added to the set of highly repetitive theorems. Afterwards, they ran the model again in the cloud and inputted these highly repetitive theorems into Prompt to enhance the inference effect. By using parallel processing methods, a large number of theorems can be generated in a short period of time, and a theorem verification module can be set up to improve the pass rate of model generated theorems.
This competition attracted 184 teams from top universities and technology companies around the world, demonstrating the high attention and active participation of the global research community in the application of AI in the field of mathematics. It is the best performance achieved by CoME of TJU as the first participating unit in the top academic conference competition in the field of AI. In recent years, CoME has strongly supported scientific research and talent cultivation in the field of artificial intelligence. The achievement of this result is attributed to the solid foundation established by the college in talent cultivation in the field of AI.