Reinforcement Learning Methods Research in Low-Resource Languages

Applied RLHF techniques (GRPO, PPO) to optimize Large Language Models for low-resource languages, outperforming Supervised Fine-Tuning (SFT) and standard baselines.

  • Ranked 4th out of 35 research projects after developing an automated LLM-as-a-judge evaluation framework to determine the most effective reinforcement learning strategies.