I’ve started delving deeper into LLMs, and personally I find it much easier to learn by diving in through practice.
That way you can grasp the key concepts and build a list of papers for further exploration.
I began with the StackLLaMA note: a hands-on guide to training LLaMA with RLHF.

It introduces the core concepts right away: Reinforcement Learning from Human Feedback (RLHF), parameter-efficient training with LoRA, and PPO.
You’ll also get acquainted with the Hugging Face library zoo: accelerate, bitsandbytes, peft, and trl.
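To build intuition for what LoRA actually does before reaching for peft, here is a minimal NumPy sketch of the idea (not the peft API): the pretrained weight matrix stays frozen, and only a low-rank update is trained. All names and sizes here are illustrative.

```python
import numpy as np

# LoRA sketch: instead of updating a full d x d weight matrix W,
# learn a low-rank update B @ A with rank r << d, so only
# 2 * r * d parameters are trainable.
rng = np.random.default_rng(0)
d, r = 768, 8

W = rng.normal(size=(d, d))          # frozen pretrained weight
A = rng.normal(size=(r, d)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                 # trainable up-projection, zero-initialized

x = rng.normal(size=(d,))

# Forward pass: base output plus the low-rank adaptation.
y = W @ x + B @ (A @ x)

# Because B starts at zero, the adapted model initially matches the base model.
assert np.allclose(y, W @ x)

full_params = d * d        # a full fine-tune would train all of these
lora_params = 2 * r * d    # LoRA trains only these
print(f"LoRA trains {lora_params} params vs {full_params} for a full fine-tune")
```

In peft, the same idea is expressed through a `LoraConfig` wrapped around the base model; the zero initialization of one projection is what makes adding the adapter a no-op at the start of training.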
The note uses a StackExchange dataset, but for variety I can recommend the Anthropic/hh-rlhf dataset.
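For context on what that dataset looks like: each record in Anthropic/hh-rlhf is a preference pair with `chosen` and `rejected` fields, each containing a full dialogue. The dialogue text below is made up for illustration, not a real record.

```python
# A made-up example mirroring the Anthropic/hh-rlhf schema:
# two continuations of the same dialogue, one preferred over the other.
example = {
    "chosen": "\n\nHuman: How do I boil an egg?"
              "\n\nAssistant: Put it in boiling water for about 8 minutes.",
    "rejected": "\n\nHuman: How do I boil an egg?"
                "\n\nAssistant: I don't know.",
}

# A reward model for RLHF is trained so that
# score(chosen) > score(rejected) on pairs like this.
assert set(example) == {"chosen", "rejected"}

# Loading the real dataset (requires network access):
#     from datasets import load_dataset
#     ds = load_dataset("Anthropic/hh-rlhf", split="train")
```

These pairs feed the reward-model stage of the StackLLaMA pipeline; the PPO stage then optimizes the policy against that reward.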
In the second part, we’ll go through key papers.