Reinforcement Learning Methods Research in Low-Resource Languages
Applied RLHF techniques (GRPO, PPO) to optimize large language models for low-resource languages, outperforming supervised fine-tuning (SFT) and standard baselines.
- Developed an automated LLM-as-a-judge evaluation framework to identify the most effective reinforcement learning strategies; ranked 4th out of 35 research projects.
