A unified M-tree self-correction solver for math word problems

马志远; 刘嘉聿; 黄振亚

doi:10.52396/JUSTC-2023-0145

Abstract

Abstract: Automatically solving math word problems is a long-standing challenge in artificial intelligence. Previous solvers construct mathematical expressions as sequences or binary trees. These approaches suffer from two issues. First, models that rely on such structures reason in a fixed order (e.g., left to right), which limits flexibility and increases susceptibility to errors. Second, prior models rely on single-pass autoregressive reasoning, which accumulates minor errors (e.g., incorrect math symbols) during generation and reduces accuracy. To address these issues, we emulate the human “check and modify” process in reasoning and propose a unified M-tree self-correction solver (UTSC-Solver) that performs iterative inference with a self-correction mechanism. First, we generate mathematical expressions with an iterative, non-autoregressive process that is free from a fixed generation order, so it can handle complex and diverse problems. Second, we design a self-correction mechanism based on alternating execution of a generator and a discriminator. This mechanism iteratively detects and rectifies errors in generated expressions and uses information from the previous iteration to guide subsequent generation. Experimental results show that UTSC-Solver outperforms traditional models in accuracy on two popular datasets and improves the interpretability of mathematical reasoning.
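The alternating generator/discriminator loop described in the abstract can be illustrated with a minimal sketch. This is not the UTSC-Solver implementation: the expression is a flat token list rather than an M-tree, and `toy_generator`, `toy_discriminator`, `solve`, and the example problem are all hypothetical stand-ins. In particular, the toy discriminator compares against a known reference expression, whereas a real discriminator would be a learned model with no access to the answer.

```python
from typing import List, Optional, Tuple

OPERATORS = ["+", "-", "*", "/"]


def toy_generator(previous: Optional[List[str]], flagged: List[int]) -> List[str]:
    """Hypothetical generator: emits a whole draft at once, then revises
    only the positions flagged in the previous iteration."""
    if previous is None:
        # Initial draft for "3 boxes of 4 apples, 2 eaten" with a wrong operator.
        return ["3", "+", "4", "-", "2"]
    revised = list(previous)
    for i in flagged:
        # Assumption: only operator positions are flagged in this toy setup.
        nxt = (OPERATORS.index(revised[i]) + 1) % len(OPERATORS)
        revised[i] = OPERATORS[nxt]
    return revised


def toy_discriminator(expr: List[str], reference: List[str]) -> List[int]:
    """Hypothetical discriminator: returns the positions it judges wrong.
    Here it simply compares against a reference expression."""
    return [i for i, (a, b) in enumerate(zip(expr, reference)) if a != b]


def solve(reference: List[str], max_iters: int = 5) -> Tuple[List[str], List[List[str]]]:
    """Alternate generation and checking until nothing is flagged
    or the iteration budget runs out."""
    expr = toy_generator(None, [])
    history = [expr]
    for _ in range(max_iters):
        flagged = toy_discriminator(expr, reference)
        if not flagged:
            break
        expr = toy_generator(expr, flagged)  # reuse the previous draft as guidance
        history.append(expr)
    return expr, history


if __name__ == "__main__":
    final, drafts = solve(["3", "*", "4", "-", "2"])
    for step, draft in enumerate(drafts):
        print(step, " ".join(draft))
```

Each revision touches only the flagged positions and keeps the rest of the previous draft, which is the sense in which earlier iterations guide later ones in this sketch.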
