Data Science, ML and Analytics Engineering

How to Speed Up LLMs and Reduce Costs: Edge Models

RouteLLM reduces your costs for using LLMs by 3.6 times.

It chooses whether to use a strong or weak LLM model depending on the complexity of the user’s query. This optimizes the balance between cost and quality of the response.

The Python library allows you to use this approach directly.

import os
from routellm.controller import Controller

os.environ["OPENAI_API_KEY"] = "sk-XXXXXX"
# Replace with your model provider, we use Anyscale's Mixtral here.
os.environ["ANYSCALE_API_KEY"] = "esecret_XXXXXX"

client = Controller(
  routers=["mf"],
  strong_model="gpt-4-1106-preview",
  weak_model="anyscale/mistralai/Mixtral-8x7B-Instruct-v0.1",
)

Read more