Large Language Models

BatCoder: Self-Supervised Bidirectional Code-Documentation Learning via Back-Translation

Jingwen Xu, Yiyang Lu, Zisu Huang, Changze Lv, Xiaohua Wang, Shizheng Li, Zhibo Xu, Zhengkang Guo, Zhengyuan Wang, Muzhao Tian, Xuanjing Huang, Xiaoqing Zheng
Published: January 30, 2026
Authors: 12
Word Count: 7,635
Code: Includes code

BatCoder: Self-supervised code-documentation generation via back-translation.

Abstract

Training LLMs for code-related tasks typically depends on high-quality code-documentation pairs, which are costly to curate and often scarce for niche programming languages. We introduce BatCoder, a self-supervised reinforcement learning framework designed to jointly optimize code generation and documentation production. BatCoder employs a back-translation strategy: documentation is first generated from code, and the generated documentation is then used to reconstruct the original code. The semantic similarity between the original and reconstructed code serves as an implicit reward, enabling reinforcement learning to improve the model's performance both in generating code from documentation and vice versa. This approach allows models to be trained using only code, substantially increasing the number of available training examples. Evaluated on HumanEval and MBPP with a 7B model, BatCoder achieved 83.5% and 81.0% pass@1, outperforming strong open-source baselines. Moreover, the framework demonstrates consistent scaling with respect to both training corpus size and model capacity.
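The back-translation reward described above can be sketched as a round trip: code is translated into documentation, the documentation is translated back into code, and the similarity between the original and reconstructed code becomes the reward signal. The sketch below is a minimal illustration, not the paper's implementation: the two generator functions are hypothetical stubs standing in for LLM calls, and `difflib.SequenceMatcher` is a crude string-level placeholder for the semantic code-similarity metric the paper uses.

```python
import difflib

def code_to_doc(code: str) -> str:
    """Hypothetical stand-in for the LLM's code -> documentation direction."""
    return f"Function that computes: {code}"

def doc_to_code(doc: str) -> str:
    """Hypothetical stand-in for the LLM's documentation -> code direction."""
    return doc.removeprefix("Function that computes: ")

def backtranslation_reward(code: str) -> float:
    """Round-trip code -> doc -> code' and score similarity(code, code').

    The resulting scalar in [0, 1] would serve as the implicit reward
    for a reinforcement learning update on both generation directions.
    """
    doc = code_to_doc(code)            # forward pass: describe the code
    reconstructed = doc_to_code(doc)   # backward pass: regenerate the code
    # Placeholder metric: string similarity instead of semantic similarity.
    return difflib.SequenceMatcher(None, code, reconstructed).ratio()

reward = backtranslation_reward("def add(a, b):\n    return a + b")
```

Because only raw code enters the loop, any unpaired code corpus can supply training examples, which is what removes the dependence on curated code-documentation pairs.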

Key Takeaways

  1. BatCoder uses self-supervised back-translation for bidirectional learning.

  2. It eliminates the need for high-quality paired code-documentation data.

  3. It optimizes code generation and documentation production simultaneously.

Limitations

  • Requires structural validity filtering for generated documentation.

  • Dependent on code similarity metrics for optimization.

Keywords

self-supervised reinforcement learning, back-translation, code generation, documentation production, semantic similarity, implicit reward, reinforcement learning, pass@1, model capacity
