LLM - Renat Alimbekov's personal blog

Open LLM Models GPT-OSS from OpenAI and More

August 2025 was marked by the release of several significant updates and completely new models in the field of artificial intelligence that promise to substantially change the AI landscape. Anthropic, Google DeepMind, and OpenAI presented their latest achievements, demonstrating progress in agentic tasks, world generation, and open language models. Let’s examine these releases.

Open GPT-OSS Models from OpenAI

OpenAI has finally opened their models by releasing GPT-OSS – a family of open models designed for powerful reasoning, agentic tasks, and universal use cases for developers. This series includes two models:

gpt-oss-120b: A large model with 117 billion total parameters (and 5.1 billion active), designed for production, general, and high-reasoning use cases that fits on a single H100 GPU (80 GB).
gpt-oss-20b: A smaller model with 21 billion total parameters (and 3.6 billion active), designed for latency reduction, local or specialized use, operating within 16 GB of memory, perfectly suited for consumer hardware.

What is an llms.txt file? Structure of llms.txt file

Llms.txt is a special text file that allows websites to be understood more effectively by artificial intelligence systems and large language models. The file is placed in the root directory of the website and helps AI systems like ChatGPT, Google Gemini, Claude, and Perplexity process content more accurately.

Origin and Purpose

The llms.txt format was proposed by Jeremy Howard in September 2024 as a solution to the problem of HTML structure complexity for AI systems. Web content often contains complex structures, navigation menus, advertisements, and JavaScript, which makes it difficult for language models to understand the content.

Retrieval-Augmented Generation (RAG): Recent Research and Challenges

In today’s AI-driven world, Retrieval-Augmented Generation (RAG) is becoming an increasingly significant approach that combines the capabilities of information retrieval with the generative abilities of large language models (LLMs). This overcomes a number of limitations faced by traditional LLMs and provides more accurate and fact-based answers.

What is RAG?

RAG is not a single technology, but an entire umbrella of different components, designs, and domain-specific adaptations. A typical RAG system includes:

A data ingestion component: where data is processed, embedded, and stored as context documents in a vector database
A retrieval component: where context documents are retrieved and ranked for relevance to the query
Query component: where the prompt with the query is combined with the search results and sent to the LLM

How to Speed Up LLMs and Reduce Costs: Edge Models

RouteLLM reduces your costs for using LLMs by 3.6 times.

It chooses whether to use a strong or weak LLM model depending on the complexity of the user’s query. This optimizes the balance between cost and quality of the response.

The Python library allows you to use this approach directly.

import os
from routellm.controller import Controller

os.environ["OPENAI_API_KEY"] = "sk-XXXXXX"
# Replace with your model provider, we use Anyscale's Mixtral here.
os.environ["ANYSCALE_API_KEY"] = "esecret_XXXXXX"

client = Controller(
  routers=["mf"],
  strong_model="gpt-4-1106-preview",
  weak_model="anyscale/mistralai/Mixtral-8x7B-Instruct-v0.1",
)

Category: LLM