Data Science, ML and Analytics Engineering

Trending Articles on Large Language Models

Google DeepMind has developed a multi-turn online reinforcement learning approach to improve the self-correction capabilities of large language models (LLMs).

Self-Correction in LLMs

The authors show that supervised fine-tuning (SFT) is largely ineffective for learning self-correction, because the training data does not match the model’s own responses. To address this, they propose a two-stage approach: the first stage optimizes self-correction behavior, and the second uses an additional reward to reinforce self-correction during training. The method relies entirely on data generated by the model itself.

When applied to the Gemini 1.0 Pro and 1.5 Flash models, it achieved state-of-the-art self-correction performance, improving on the base models by 15.6% and 9.1%, respectively, on the MATH and HumanEval benchmarks.
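
The additional reward in the second stage can be illustrated with a small sketch. The function name, the 0/1 correctness rewards, and the `alpha` weight are assumptions made for illustration, not details taken from the summary above. The idea is to pay a bonus when the second attempt improves on the first and a penalty when it gets worse.

```python
def shaped_reward(first_reward: float, second_reward: float, alpha: float = 2.0) -> float:
    """Reward for the second attempt plus a bonus for progress between attempts.

    Hypothetical sketch: `alpha` and the exact form of the bonus are assumptions.
    """
    progress_bonus = alpha * (second_reward - first_reward)
    return second_reward + progress_bonus


# Rewards here are 1.0 for a correct answer and 0.0 for an incorrect one.
fixed = shaped_reward(0.0, 1.0)   # wrong -> right: bonus added
kept = shaped_reward(1.0, 1.0)    # right -> right: no bonus
broken = shaped_reward(1.0, 0.0)  # right -> wrong: penalized
```

With this shaping, an attempt that fixes a mistake scores higher than one that merely repeats a correct answer, which is the behavior the extra reward is meant to encourage.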

All the Latest in the World of LLM

Over the past month, there have been some very interesting and significant events in the world of Large Language Models (LLMs).

Major companies have released fresh versions of their models. First, Google launched two new Gemini models: Gemini-1.5-Pro-002 and Gemini-1.5-Flash-002.

Key Features:

  • More than a 50% price reduction for the 1.5 Pro version
  • Output is generated twice as fast, with three times lower latency

The main focus has been on improving performance and speed and reducing costs for models intended for industrial-grade systems.

Key Trends in LLM Reasoning Development

In these notes, I’d like to highlight the latest trends and research in reasoning and new prompting techniques that improve output.

Simply put, reasoning is the process of multi-step thinking, where several consecutive steps of reflection are performed, with each step depending on the previous one.

It may seem that Reasoning and Chain of Thought (CoT) are the same thing. They are related but represent different concepts.

Reasoning is the general concept of thinking and making inferences. It encompasses any form of reflection and drawing conclusions. Chain of Thought is a specific technique for improving reasoning: it adds intermediate steps that help the model lay out its thoughts explicitly and reach more accurate solutions.
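
As a minimal illustration of the difference, the sketch below builds a direct prompt and a Chain of Thought prompt for the same question. The helper name, the prompt layout, and the sample question are assumptions chosen for illustration. The "Let's think step by step" suffix is one common way to request intermediate steps, not the only one.

```python
def build_prompt(question: str, chain_of_thought: bool = False) -> str:
    """Return a plain prompt, or one that asks for intermediate steps first."""
    prompt = f"Q: {question}\nA:"
    if chain_of_thought:
        # Nudge the model to write out its intermediate steps before answering.
        prompt += " Let's think step by step."
    return prompt


question = "A shop sells pens at 3 for $2. How much do 12 pens cost?"
direct_prompt = build_prompt(question)
cot_prompt = build_prompt(question, chain_of_thought=True)
```

Both prompts ask the same question. Only the second one explicitly invites the model to reason through the steps before giving an answer.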

Deep dive into LLM Part Two

In the first part, we covered the practical side of diving into LLMs.

In this part, we will talk about the key papers that will help you understand LLMs and pass interviews =) But more on that later.

It all starts with the first GPT paper.

Then I recommend reading the InstructGPT paper, which covers training with human feedback.

Then there are a couple of interesting papers:
SELF-INSTRUCT
Information Retrieval with Contrastive Learning

Then I recommend that you familiarize yourself with two truly iconic papers, LoRA and QLoRA, which address the following problems:
– training speed
– computing resources
– memory efficiency
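
To see where the savings come from, here is a rough parameter count for a single weight matrix. It assumes the usual low-rank formulation, in which the update to a `d_out x d_in` matrix is the product of a `d_out x rank` matrix and a `rank x d_in` matrix. The layer size and the rank are illustrative values, not figures from the papers.

```python
def full_update_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when the whole weight matrix is fine-tuned."""
    return d_out * d_in


def lora_update_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters for a low-rank update B @ A.

    B has shape (d_out, rank) and A has shape (rank, d_in).
    """
    return rank * (d_out + d_in)


# Illustrative sizes: one 4096 x 4096 projection with a rank-8 update.
full = full_update_params(4096, 4096)
lora = lora_update_params(4096, 4096, rank=8)
ratio = full // lora
```

For these example sizes the low-rank update trains 256 times fewer parameters than full fine-tuning of the same matrix, which is why less memory and compute are needed.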

Two more equally important papers are PPO and DPO. Understanding them will help with reward modeling.
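
As a taste of what the DPO paper covers, below is a sketch of its loss for a single preference pair, written in plain Python. The argument names and the `beta` default are assumptions for illustration. The inputs are the summed log-probabilities of the chosen and rejected responses under the trained policy and under a frozen reference model.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """DPO loss for one (chosen, rejected) pair of responses."""
    # How much more likely each response became relative to the reference model.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) written as log(1 + exp(-x)).
    return math.log1p(math.exp(-logits))


# A policy that favors the chosen response gets a lower loss than one that favors the rejected one.
good = dpo_loss(-1.0, -3.0, -2.0, -2.0)
bad = dpo_loss(-3.0, -1.0, -2.0, -2.0)
```

Unlike PPO-based RLHF, this objective needs no separately trained reward model. The preference signal comes directly from comparing log-probabilities against the reference model.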

And finally:
Switch Transformers – as the foundation for Mixture of Experts
Mixtral of Experts – as the open-source SOTA
Llama 2

Happy reading, everyone!

Deep dive into LLM Part One

I’ve started delving deeper into LLMs, and personally, I find it much easier to dive in through practice.

This way, one can grasp all the key concepts and outline a list of papers for further exploration.

I began with the StackLLaMA note: A hands-on guide to train LLaMA with RLHF



Here, you can immediately familiarize yourself with the concepts of Reinforcement Learning from Human Feedback (RLHF), efficient training with LoRA, and PPO.

You’ll also get acquainted with the Hugging Face library zoo: accelerate, bitsandbytes, peft, and trl.

The note uses the StackExchange dataset, but for variety, I recommend also trying the Anthropic/hh-rlhf dataset.

In the second part, we’ll go through key papers.