Kimi K2.5: Visual Agentic Intelligence

Kimi Team, Tongtong Bai, Yifan Bai, Yiping Bao, S. H. Cai, Yuan Cao, Y. Charles, H. S. Che, Cheng Chen, Guanduo Chen, Huarong Chen, Jia Chen, Jiahao Chen, Jianlong Chen, Jun Chen, Kefan Chen, Liang Chen, Ruijue Chen, Xinhao Chen, Yanru Chen, Yanxu Chen, Yicun Chen, Yimin Chen, Yingjiang Chen, Yuankun Chen, Yujie Chen, Yutian Chen, Zhirong Chen, Ziwei Chen, Dazhi Cheng, Minghan Chu, Jialei Cui, Jiaqi Deng, Muxi Diao, Hao Ding, Mengfan Dong, Mengnan Dong, Yuxin Dong, Yuhao Dong, Angang Du, Chenzhuang Du, Dikang Du, Lingxiao Du, Yulun Du, Yu Fan, Shengjun Fang, Qiulin Feng, Yichen Feng, Garimugai Fu, Kelin Fu, Hongcheng Gao, Tong Gao, Yuyao Ge, Shangyi Geng, Chengyang Gong, Xiaochen Gong, Zhuoma Gongque, Qizheng Gu, Xinran Gu, Yicheng Gu, Longyu Guan, Yuanying Guo, Xiaoru Hao, Weiran He, Wenyang He, Yunjia He, Chao Hong, Hao Hu, Jiaxi Hu, Yangyang Hu, Zhenxing Hu, Ke Huang, Ruiyuan Huang, Weixiao Huang, Zhiqi Huang, Tao Jiang, Zhejun Jiang, Xinyi Jin, Yu Jing, Guokun Lai, Aidi Li, C. Li, Cheng Li, Fang Li, Guanghe Li, Guanyu Li, Haitao Li, Haoyang Li, Jia Li, Jingwei Li, Junxiong Li, Lincan Li, Mo Li, Weihong Li, Wentao Li, Xinhang Li, Xinhao Li, Yang Li, Yanhao Li, Yiwei Li, Yuxiao Li, Zhaowei Li, Zheming Li, Weilong Liao, Jiawei Lin, Xiaohan Lin, Zhishan Lin, Zichao Lin, Cheng Liu, Chenyu Liu, Hongzhang Liu, Liang Liu, Shaowei Liu, Shudong Liu, Shuran Liu, Tianwei Liu, Tianyu Liu, Weizhou Liu, Xiangyan Liu, Yangyang Liu, Yanming Liu, Yibo Liu, Yuanxin Liu, Yue Liu, Zhengying Liu, Zhongnuo Liu, Enzhe Lu, Haoyu Lu, Zhiyuan Lu, Junyu Luo, Tongxu Luo, Yashuo Luo, Long Ma, Yingwei Ma, Shaoguang Mao, Yuan Mei, Xin Men, Fanqing Meng, Zhiyong Meng, Yibo Miao, Minqing Ni, Kun Ouyang, Siyuan Pan, Bo Pang, Yuchao Qian, Ruoyu Qin, Zeyu Qin, Jiezhong Qiu, Bowen Qu, Zeyu Shang, Youbo Shao, Tianxiao Shen, Zhennan Shen, Juanfeng Shi, Lidong Shi, Shengyuan Shi, Feifan Song, Pengwei Song, Tianhui Song, Xiaoxi Song, Hongjin Su, Jianlin Su, Zhaochen Su, Lin Sui, Jinsong Sun, Junyao Sun, Tongyu Sun, Flood Sung, Yunpeng Tai, Chuning Tang, Heyi Tang, Xiaojuan Tang, Zhengyang Tang, Jiawen Tao, Shiyuan Teng, Chaoran Tian, Pengfei Tian, Ao Wang, Bowen Wang, Chensi Wang, Chuang Wang, Congcong Wang, Dingkun Wang, Dinglu Wang, Dongliang Wang, Feng Wang, Hailong Wang, Haiming Wang, Hengzhi Wang, Huaqing Wang, Hui Wang, Jiahao Wang, Jinhong Wang, Jiuzheng Wang, Kaixin Wang, Linian Wang, Qibin Wang, Shengjie Wang, Shuyi Wang, Si Wang, Wei Wang, Xiaochen Wang, Xinyuan Wang, Yao Wang, Yejie Wang, Yipu Wang, Yiqin Wang, Yucheng Wang, Yuzhi Wang, Zhaoji Wang, Zhaowei Wang, Zhengtao Wang, Zhexu Wang, Zihan Wang, Zizhe Wang, Chu Wei, Ming Wei, Chuan Wen, Zichen Wen, Chengjie Wu, Haoning Wu, Junyan Wu, Rucong Wu, Wenhao Wu, Yuefeng Wu, Yuhao Wu, Yuxin Wu, Zijian Wu, Chenjun Xiao, Jin Xie, Xiaotong Xie, Yuchong Xie, Yifei Xin, Bowei Xing, Boyu Xu, Jianfan Xu, Jing Xu, Jinjing Xu, L. H. Xu, Lin Xu, Suting Xu, Weixin Xu, Xinbo Xu, Xinran Xu, Yangchuan Xu, Yichang Xu, Yuemeng Xu, Zelai Xu, Ziyao Xu, Junjie Yan, Yuzi Yan, Guangyao Yang, Hao Yang, Junwei Yang, Kai Yang, Ningyuan Yang, Ruihan Yang, Xiaofei Yang, Xinlong Yang, Ying Yang, Yi Yang, Yi Yang, Zhen Yang, Zhilin Yang, Zonghan Yang, Haotian Yao, Dan Ye, Wenjie Ye, Zhuorui Ye, Bohong Yin, Chengzhen Yu, Longhui Yu, Tao Yu, Tianxiang Yu, Enming Yuan, Mengjie Yuan, Xiaokun Yuan, Yang Yue, Weihao Zeng, Dunyuan Zha, Haobing Zhan, Dehao Zhang, Hao Zhang, Jin Zhang, Puqi Zhang, Qiao Zhang, Rui Zhang, Xiaobin Zhang, Y. Zhang, Yadong Zhang, Yangkun Zhang, Yichi Zhang, Yizhi Zhang, Yongting Zhang, Yu Zhang, Yushun Zhang, Yutao Zhang, Yutong Zhang, Zheng Zhang, Chenguang Zhao, Feifan Zhao, Jinxiang Zhao, Shuai Zhao, Xiangyu Zhao, Yikai Zhao, Zijia Zhao, Huabin Zheng, Ruihan Zheng, Shaojie Zheng, Tengyang Zheng, Junfeng Zhong, Longguang Zhong, Weiming Zhong, M. Zhou, Runjie Zhou, Xinyu Zhou, Zaida Zhou, Jinguo Zhu, Liya Zhu, Xinhao Zhu, Yuxuan Zhu, Zhen Zhu, Jingze Zhuang, Weiyu Zhuang, Ying Zou, Xinxing Zu
Published
February 2, 2026
Authors
326

Abstract

We introduce Kimi K2.5, an open-source multimodal agentic model designed to advance general agentic intelligence. K2.5 emphasizes the joint optimization of text and vision so that the two modalities enhance each other. This includes a series of techniques such as joint text-vision pre-training, zero-vision SFT, and joint text-vision reinforcement learning. Building on this multimodal foundation, K2.5 introduces Agent Swarm, a self-directed parallel agent orchestration framework that dynamically decomposes complex tasks into heterogeneous sub-problems and executes them concurrently. Extensive evaluations show that Kimi K2.5 achieves state-of-the-art results across various domains, including coding, vision, reasoning, and agentic tasks. Agent Swarm also reduces latency by up to 4.5× over single-agent baselines. We release the post-trained Kimi K2.5 model checkpoint to facilitate future research and real-world applications of agentic intelligence.
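
The abstract does not say how Agent Swarm is implemented. As a generic illustration only, the pattern of splitting a task into sub-problems and running them concurrently might be sketched as follows. The names `decompose`, `run_subagent`, and `orchestrate` are hypothetical, the semicolon-based split is a toy stand-in for model-driven decomposition, and the short sleep is a placeholder for model and tool latency.

```python
import asyncio


def decompose(task: str) -> list[str]:
    """Hypothetical decomposition: split a task string into sub-problems."""
    return [part.strip() for part in task.split(";") if part.strip()]


async def run_subagent(subtask: str) -> str:
    """Hypothetical sub-agent; the sleep stands in for model and tool latency."""
    await asyncio.sleep(0.01)
    return f"done: {subtask}"


async def orchestrate(task: str) -> list[str]:
    """Run one sub-agent per sub-problem concurrently; results keep input order."""
    subtasks = decompose(task)
    return await asyncio.gather(*(run_subagent(s) for s in subtasks))


results = asyncio.run(orchestrate("search docs; write code; run tests"))
print(results)
```

When sub-problems are independent, the wall-clock time of such a scheme is bounded below by the slowest sub-task instead of the sum of all sub-tasks, which is the general reason concurrent execution can reduce latency relative to a single sequential agent.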

Keywords

multimodal agentic model, joint text-vision pre-training, zero-vision SFT, joint text-vision reinforcement learning, Agent Swarm, self-directed parallel agent orchestration framework, heterogeneous sub-problems
