LLM - alimbekov.com

Choosing an LLM for Content Generation: Model Characters


Over the past year I’ve run a dozen models on real content generation, from Reels scripts to long reads. The takeaway is simple and slightly inconvenient: there is no universal “best” model. Each one has its own character, and the right pick depends on the type of content and the audience.

This post covers how I match a model to the task, the pipeline I run in production, and the evidence behind it: LMArena, FLASK, Arena Expert data, and my own experiment on measuring creativity.

In short

There’s no “best” model, only characters. Gemini explains, Claude tells stories, DeepSeek drafts cheaply, GPT is strict and formal.
The working pipeline: generate drafts cheap (DeepSeek / Gemini Flash) → filter → polish on Claude Sonnet.
Thinking models give no edge for creative work, while Claude leads Problem Handling on FLASK.
I put a number on “creativity”: one phrase in the prompt nearly doubles the spread of responses.
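
The draft, filter, polish pipeline from the list above can be sketched roughly as follows. This is a minimal illustration, not the production code: `run_pipeline`, `draft_fn`, `score_fn`, and `polish_fn` are hypothetical names, and the stand-in functions only imitate the cheap model, the filter, and the expensive model so the example runs without any API access.

```python
from typing import Callable, List


def run_pipeline(
    brief: str,
    draft_fn: Callable[[str], str],    # cheap model (assumed interface)
    score_fn: Callable[[str], float],  # filter: higher score = better draft
    polish_fn: Callable[[str], str],   # expensive model (assumed interface)
    n_drafts: int = 5,
    keep: int = 2,
) -> List[str]:
    """Generate cheap drafts, keep the top-scoring few, polish only those."""
    drafts = [draft_fn(brief) for _ in range(n_drafts)]
    shortlisted = sorted(drafts, key=score_fn, reverse=True)[:keep]
    return [polish_fn(draft) for draft in shortlisted]


# Stand-in callables so the sketch runs without any API keys.
def make_fake_drafter() -> Callable[[str], str]:
    calls = {"n": 0}

    def draft(brief: str) -> str:
        calls["n"] += 1
        return f"{brief} (draft {calls['n']})" + "!" * calls["n"]

    return draft


def fake_score(text: str) -> float:
    return float(len(text))  # placeholder filter: longer counts as "better"


def fake_polish(text: str) -> str:
    return text.upper()  # placeholder for the expensive polishing model


if __name__ == "__main__":
    drafter = make_fake_drafter()
    for result in run_pipeline("Reels script", drafter, fake_score, fake_polish):
        print(result)
```

The point of the shape is cost: the expensive model is called `keep` times, not `n_drafts` times, so widening the draft pool only increases spend on the cheap model.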

How llms.txt Increased AI Chat Traffic by 23%: Real Results

Four months ago, I added llms.txt and llms-full.txt files to my blog. Time to look at the numbers.

What is llms.txt

It’s essentially robots.txt for language models. A file in the site’s root directory that helps AI systems — ChatGPT, Perplexity, Claude, Gemini — better understand the structure and content of your website. I wrote more about it in a separate post.

My files:

llms.txt — short version
llms-full.txt — full version
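
For orientation, here is a rough sketch of how such a file could be generated. It assumes the layout from the public llms.txt proposal as I understand it (a title heading, a short quoted summary, then sections of links). The function name `render_llms_txt` and all titles and URLs are placeholders, not the contents of my actual files.

```python
from typing import Dict, List, Tuple

Link = Tuple[str, str, str]  # (link text, URL, short note)


def render_llms_txt(title: str, summary: str, sections: Dict[str, List[Link]]) -> str:
    """Render a minimal llms.txt-style document as plain text."""
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for name, url, note in links:
            lines.append(f"- [{name}]({url}): {note}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    text = render_llms_txt(
        "Example Blog",  # placeholder title
        "Posts about data science and LLMs.",  # placeholder summary
        {"Posts": [("Sample post", "https://example.com/sample-post", "placeholder entry")]},
    )
    print(text)
```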

Methodology

I compared two 4-month periods:

Period	Dates	Status
Before	March 18 — July 17, 2024	without llms.txt
After	July 18 — November 18, 2024	with llms.txt

Data source — Yandex.Metrica, “Referral links” report.

I filtered AI chat domains: chatgpt.com, perplexity.ai, chat.deepseek.com, gemini.google.com, chat.qwen.ai, copilot.microsoft.com, alice.yandex.ru.
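
The comparison boils down to summing visits from the listed domains in each period and computing the relative change. A minimal sketch, assuming the report is exported as (referrer, visits) rows; the helper names and the sample numbers are made up for illustration and are not my Yandex.Metrica data.

```python
from typing import Iterable, Tuple
from urllib.parse import urlparse

# Domains taken from the list above.
AI_CHAT_DOMAINS = {
    "chatgpt.com", "perplexity.ai", "chat.deepseek.com", "gemini.google.com",
    "chat.qwen.ai", "copilot.microsoft.com", "alice.yandex.ru",
}


def is_ai_chat_referrer(referrer: str) -> bool:
    """True if the referrer's host is one of the tracked AI chat domains."""
    # Bare hosts like "chatgpt.com" need a "//" prefix to be parsed as a netloc.
    parsed = urlparse(referrer if "://" in referrer else "//" + referrer)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return host in AI_CHAT_DOMAINS


def count_ai_visits(rows: Iterable[Tuple[str, int]]) -> int:
    """Sum visits over rows whose referrer is an AI chat domain."""
    return sum(visits for referrer, visits in rows if is_ai_chat_referrer(referrer))


def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    if before == 0:
        raise ValueError("percent change is undefined when the baseline is zero")
    return (after - before) / before * 100


if __name__ == "__main__":
    # Placeholder rows, NOT real traffic data.
    before_rows = [("https://chatgpt.com/", 30), ("perplexity.ai", 10), ("https://google.com/", 500)]
    after_rows = [("https://chatgpt.com/", 35), ("https://www.perplexity.ai/search", 15)]
    change = percent_change(count_ai_visits(before_rows), count_ai_visits(after_rows))
    print(f"{change:.1f}%")
```

Matching on the parsed host rather than on a substring avoids counting unrelated referrers that merely contain one of the domain names in their path.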

MCPMark Benchmark: Why AI Agents Fail 47% of Tasks

If you, like me, wonder how LLMs work with MCP and how well they carry out the tasks you assign them, a new study called MCPMark addresses exactly that. The research confronts illusions about artificial intelligence with harsh reality.

Why Existing Tests Don’t Work

Imagine evaluating a person’s ability to work as a programmer by giving them nothing but documentation-reading tasks. Absurd, right? But that’s exactly how most existing benchmarks for AI agents work.

Researchers from the National University of Singapore, EvalSys, and other organizations noticed a critical problem: modern tests for evaluating AI agents’ work with Model Context Protocol (MCP) remain narrow and unrealistic. They either focus on tasks that only require reading information or offer interactions with minimal depth.

It’s like testing driving skills by having a person only sit in the passenger seat and describe what they see through the window.

How to Become an AI-First Specialist Right Now

For the past couple of years, I’ve been working as a Data Science and Data Analytics consultant. My clients are companies and startups from different countries around the world.

Today I’m convinced: whatever your profession (lawyer, recruiter, product manager, or designer), working in 2025 requires being AI-first and fully integrating AI tools and approaches into your work processes.

Why is this critical right now?

Over the past year, I’ve increased my productivity by 3-4 times thanks to the proper use of AI tools. What used to take several days of research, I now do in a day. Projects that required a team of 2-3 people, I now execute alone.

In this post, I’ll share my top AI tools that I use every day, plus additional tools for specific tasks.

Open LLM Models GPT-OSS from OpenAI and More

August 2025 brought several significant updates and entirely new AI models that promise to substantially change the landscape. Anthropic, Google DeepMind, and OpenAI presented their latest achievements, demonstrating progress in agentic tasks, world generation, and open language models. Let’s examine these releases.

Open GPT-OSS Models from OpenAI

OpenAI has finally opened their models by releasing GPT-OSS – a family of open models designed for powerful reasoning, agentic tasks, and universal use cases for developers. This series includes two models:

gpt-oss-120b: A large model with 117 billion total parameters (5.1 billion active), designed for production, general-purpose, and high-reasoning use cases. It fits on a single H100 GPU (80 GB).
gpt-oss-20b: A smaller model with 21 billion total parameters (3.6 billion active), designed for low-latency, local, or specialized use. It runs within 16 GB of memory, which makes it well suited to consumer hardware.
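
A back-of-the-envelope check makes these memory figures plausible. The sketch below assumes all weights are stored at roughly 4.25 bits per parameter (4-bit values plus a shared scale per block); that figure is my simplifying assumption, and the estimate ignores activations, the KV cache, and any layers kept at higher precision.

```python
def weight_memory_gb(total_params: float, bits_per_param: float = 4.25) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes).

    bits_per_param=4.25 is an assumed average, not an official figure.
    """
    return total_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    for name, params, budget_gb in [
        ("gpt-oss-120b", 117e9, 80),
        ("gpt-oss-20b", 21e9, 16),
    ]:
        estimate = weight_memory_gb(params)
        print(f"{name}: ~{estimate:.1f} GB of weights vs {budget_gb} GB budget")
```

Under this assumption both estimates land below the stated budgets, leaving some headroom for runtime overhead.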