Socratic Tutoring LLM via Multi-Stage Policy Optimization
Fine-tuning open-source LLMs with SFT, offline DPO, and online GRPO to act as Socratic tutors that guide students without prematurely leaking answers.
An internal RAG chatbot powered by a local LLM for confidential data handling, designed to assist employees with personalized services.
Won the User Experience Award at the Vakıfbank Hack to the Future Hackathon. A hub for organizational memory with local document and voice memo querying.
Won the 2025 Cherry New Grad Hackathon. An interactive map-based interface to help consumers find merchants offering Cherry’s services.
Applied RLHF techniques (GRPO, PPO) to optimize LLMs for low-resource languages, outperforming SFT and standard baselines. Ranked 4th/35.