I’ve started delving deeper into LLMs, and I personally find it much easier to learn through hands-on practice.
That way you can grasp the key concepts and build a reading list of papers for further exploration.
I began with the StackLLaMA note: A hands-on guide to train LLaMA with RLHF.
There you can get a first look at Reinforcement Learning from Human Feedback (RLHF), parameter-efficient training with LoRA, and PPO.
You’ll also get acquainted with the Hugging Face library zoo: accelerate, bitsandbytes, peft, and trl.
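To make the LoRA idea concrete before reaching for peft, here is a minimal sketch of the mechanism itself: the pretrained weight is frozen, and a low-rank update B·A is learned on top of it. This is an illustrative toy layer in plain PyTorch, not the peft implementation; the class name and hyperparameters are my own.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Toy LoRA adapter: frozen base linear layer plus a trainable low-rank update."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the pretrained weight
        # Low-rank factors: A projects down to rank r, B projects back up.
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
        self.scaling = alpha / r

    def forward(self, x):
        # Output = frozen base(x) + scaled low-rank correction
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

layer = LoRALinear(nn.Linear(64, 64), r=4)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
# Only the two small factors train: 4*64 + 64*4 = 512 of 4672 parameters.
```

In the real setup, peft's `LoraConfig` and `get_peft_model` apply exactly this kind of wrapper to the attention projections of the LLaMA model.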
The note uses a StackExchange dataset, but for variety I can recommend the Anthropic/hh-rlhf dataset.
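Anthropic/hh-rlhf stores preference pairs: each record has a "chosen" and a "rejected" dialogue continuation, which is exactly what reward-model training consumes. Below is a sketch of the Bradley–Terry-style pairwise loss on such a pair; the dialogue texts and scalar scores are made up for illustration, with the scores standing in for a reward model's outputs.

```python
import math

# Illustrative record mimicking the Anthropic/hh-rlhf schema (texts are made up).
pair = {
    "chosen": "\n\nHuman: How do I bake bread?\n\nAssistant: Mix flour, water, salt and yeast, then let the dough rise...",
    "rejected": "\n\nHuman: How do I bake bread?\n\nAssistant: I can't help with that.",
}

# Toy scalar scores standing in for a reward model's outputs on each continuation.
r_chosen, r_rejected = 1.3, -0.4

# Pairwise reward-model loss: -log sigmoid(r_chosen - r_rejected).
# It shrinks as the model scores "chosen" above "rejected".
loss = -math.log(1 / (1 + math.exp(-(r_chosen - r_rejected))))
```

The trained reward model then supplies the scalar reward that PPO optimizes in the RLHF loop.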
In the second part, we’ll go through key papers.