GDPO
My internship work, Group Diffusion Policy Optimization (GDPO), is now available on arXiv! It proposes a principled RL method for diffusion language models!