Engineers Should Handle Prompting LLMs (and Prompts Should Live in Your Codebase)

Published on Mar 29, 2024

We’ve seen many discussions around Large Language Model (LLM) software development allude to a workflow where prompts live apart from LLM calls and are managed by multiple stakeholders, including non-engineers. In fact, many popular LLM development frameworks and libraries are built in a way that requires prompts to be managed separately from their calls. 

We think this is an unnecessarily cumbersome approach that’s not scalable for complex, production-grade LLM software development. 

Here’s why: For anyone developing production-grade LLM apps, prompts that include code will necessarily be a part of your engineering workflow. Therefore, separating prompts from the rest of your codebase, especially from their API calls, means you’re splitting that workflow into different, independent parts.

Separating concerns and assigning different roles to manage each may seem to bring certain efficiencies, for example, easing collaboration between tech and non-tech roles. But it introduces fundamental complexity that can disrupt the engineering process.

For instance, introducing a change in one place—like adding a new key-value pair to an input for an LLM call—means manually hunting down every place that change needs to be reflected. Even then, you will likely not catch all the errors.

Don’t get us wrong. If your prompts are purely text or have very minimal code, then managing them separately from your calls may not have much of an impact. And there are legitimate examples of prompts with minimal or no code, like prompts for ChatGPT. In such cases, managing prompts separately from calls can make sense.

But any enterprise-grade LLM apps require sophisticated prompts, which means you’ll end up writing code for such prompts anyway.

In fact, trying to write out that logic in text would be even more complicated. In our view, code makes prompting both efficient and manageable, as well as the purview of engineers.

Below, we outline how we arrived at this view, and how the solution we’ve developed (a Python-based LLM development library) helps developers manage prompts in the codebase easily and efficiently, making LLM app development, in our experience, faster and more enjoyable.

Our Frustrations with Developer Tools for Prompt Engineering

Our view on prompting started as we were using an early version of the OpenAI SDK to build out interpretable machine learning tools at a previous company. This was the standard OpenAI API for accessing GPT model functionalities.

Back then we didn’t have the benefit of any useful helper libraries, so we wrote all the API code ourselves. This amounted to writing lots of boilerplate to accomplish what seemed like simple tasks, such as automatically extracting the model configuration (e.g., constraints) from just the names of the features in a given dataset. This required many prompt iterations, and evaluating them was a pain.

It was around that time that we began asking ourselves: why aren’t there better developer tools in the prompt engineering space? Is it because people are bringing experimental stuff into production too quickly? Or simply because the space is so new?

The more we worked to develop our LLM applications, the clearer it became that, from a software engineer's perspective, the separation of prompt management from the calls was fundamentally flawed. It made the actual engineering slow, cumbersome, and more error prone. It was almost as if current tools weren't built around developer best practices but rather around Jupyter notebook best practices (if there even is such a thing).

Beyond that, we noticed some other issues:

  • Our prompts became unmanageable past two versions. We weren’t using a prompt management workflow back then, so implementing changes was a manual process. We started telling colleagues not to touch the code because it might break a function somewhere else.
  • A lot of libraries tried to offer functionality for as many use cases as possible, sometimes making you feel dependent on them. They required you to do things their way, or you’d have to wait for them to catch up with new features from the LLMs.

All this led us to rethink how prompts should be managed to make developers’ lives easier. In the end, these frustrations boiled over into us wanting to build our own library that approached LLM development in a developer-first way to make LLM app development faster and more enjoyable. This ultimately became Mirascope.

How Mirascope Makes Prompt Engineering Intuitive and Scalable

For us, prompt engineering boils down to the relationship between the prompt and the API call. Mirascope represents what we feel is a best-in-class approach for generating that prompt, taking the LLM response, and tracking all aspects of that flow.

As developers, we want to focus on innovation and creativity, rather than on managing and troubleshooting underlying processes.

To that end, we designed Mirascope with the following features and capabilities to make your prompting more efficient, simpler, and scalable.

Code Like You Already Code, with Pythonic Simplicity

It was important to us to be able to just code in Python, without having to learn superfluous abstractions or extra, fancy structures that make development more cumbersome than it needs to be. So we designed Mirascope to do just that. 

For instance, we don’t make you implement directed acyclic graphs in the context of sequencing function calls. We provide code that’s eminently readable, lightweight, and maintainable.

An example of this is our `BasePrompt` class, which encapsulates as much logic within the prompt as feasible.

Within it, the `prompt_template` class variable (shown below) provides a string for generating a prompt that requests book recommendations based on topic and genre pairs. The `topics_x_genres` property constructs these pairs and the combined string is integrated into `prompt_template` to create the final list of messages.

from mirascope import BasePrompt


class BookRecommendationPrompt(BasePrompt):
    prompt_template = """
    Can you recommend some books on the following topic and genre pairs?
    {topics_x_genres}
    """

    topics: list[str]
    genres: list[str]

    @property
    def topics_x_genres(self) -> str:
        """Returns every topic-genre pair, one per line."""
        return "\n".join(
            [
                f"Topic: {topic}, Genre: {genre}"
                for topic in self.topics
                for genre in self.genres
            ]
        )


prompt = BookRecommendationPrompt(
    topics=["coding", "music"], genres=["fiction", "fantasy"]
)
print(prompt)
#> Can you recommend some books on the following topic and genre pairs?
#  Topic: coding, Genre: fiction
#  Topic: coding, Genre: fantasy
#  Topic: music, Genre: fiction
#  Topic: music, Genre: fantasy

By default, Mirascope’s `BasePrompt` treats the prompt template as a single user message, which simplifies initial use of the class for straightforward scenarios.

But you may want to add more context to prompts in the form of different roles, such as SYSTEM, USER, ASSISTANT, MODEL, or TOOL (depending on which roles an LLM model can use), to generate responses that are more relevant and nuanced, as shown here:

from mirascope import BasePrompt


class BookRecommendationPrompt(BasePrompt):
    prompt_template = """
    SYSTEM:
    You are the world's greatest librarian.

    USER:
    Can you recommend some books on {topic}?
    """

    topic: str


prompt = BookRecommendationPrompt(topic="coding")
print(prompt.messages())
#> [{"role": "system", "content": "You are the world's greatest librarian"}, {"role": "user", "content": "Can you recommend some books on coding?"}]

The `messages` method in the code above provides a structured interaction (through various roles) with the LLM and returns a list of messages formatted according to a predefined template.
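To illustrate the idea, here is a simplified sketch of our own (not Mirascope’s actual implementation) showing how role headers in a template can be parsed into OpenAI-style message dicts:

```python
import re


def parse_messages(template: str, **kwargs) -> list[dict]:
    """Split a template with SYSTEM/USER/ASSISTANT headers into message dicts."""
    parts = re.split(r"(SYSTEM|USER|ASSISTANT):", template)
    # parts alternates: [prefix, role, content, role, content, ...]
    return [
        {"role": role.lower(), "content": content.strip().format(**kwargs)}
        for role, content in zip(parts[1::2], parts[2::2])
    ]


template = """
SYSTEM:
You are the world's greatest librarian.

USER:
Can you recommend some books on {topic}?
"""
print(parse_messages(template, topic="coding"))
#> [{'role': 'system', 'content': "You are the world's greatest librarian."}, {'role': 'user', 'content': 'Can you recommend some books on coding?'}]
```

The real class does considerably more (validation, dedenting, multiple providers), but the core transformation is this simple.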

You can extend `BasePrompt` to fit whatever use case is needed, from few-shot prompting to chat interactions with an LLM.

We also wanted to avoid introducing complexity where it’s not absolutely necessary. For example, given a choice in how we would chain together components, we’d prefer relying on native Python—perhaps instantiating one class within another and calling its method directly (as shown in the code sample directly below)—rather than relying on the pipe operator.

Note: This isn’t to say there’s one “best” way to accomplish chaining, and you're certainly not required to do it with the subclass style shown below. For instance, you can also have two separate calls where you pass the output of the first into the second as an attribute during construction (rather than as an internal property). It's not our recommendation since it breaks colocation, but you're free to do what you like. We just have opinionated guidelines, not requirements.

Nevertheless, this approach to chaining encapsulates each step of the process within class methods, allowing for a clean and readable way to sequentially execute tasks that depend on the outcome of previous steps:

import os
from functools import cached_property

from mirascope.openai import OpenAICall, OpenAICallParams

os.environ["OPENAI_API_KEY"] = "YOUR_API_KEY"


class ChefSelector(OpenAICall):
    prompt_template = "Name a chef who is really good at cooking {food_type} food"

    food_type: str

    call_params = OpenAICallParams(model="gpt-4-turbo")


class RecipeRecommender(ChefSelector):
    prompt_template = """
    SYSTEM:
    Imagine that you are chef {chef}.
    Your task is to recommend recipes that you, {chef}, would be excited to serve.

    USER:
    Recommend a {food_type} recipe using {ingredient}.
    """

    ingredient: str

    call_params = OpenAICallParams(model="gpt-4-turbo")

    @cached_property
    def chef(self) -> str:
        """Uses `ChefSelector` to select the chef based on the food type."""
        return ChefSelector(food_type=self.food_type).call().content


recommender = RecipeRecommender(food_type="japanese", ingredient="apples")
response = recommender.call()
print(response.content)
#> Certainly! Here's a recipe for a delicious and refreshing Japanese Apple Salad: ...

Finally, we show an example of how you can use Mirascope to do few-shot prompting, which provides the language model with a few examples (shots) to help it understand the task and generate better output.

Below are three example sets of book recommendations for different topics to guide the model in understanding the format and type of response expected when asked to recommend books on a new topic, such as "coding."

from mirascope import BasePrompt


class FewShotBookRecommendationPrompt(BasePrompt):
    prompt_template = """
    I'm looking for book recommendations on various topics. Here are some examples:

    1. For a topic on 'space exploration', you might recommend:
       - 'The Right Stuff' by Tom Wolfe
       - 'Cosmos' by Carl Sagan

    2. For a topic on 'artificial intelligence', you might recommend:
       - 'Life 3.0' by Max Tegmark
       - 'Superintelligence' by Nick Bostrom

    3. For a topic on 'historical fiction', you might recommend:
       - 'The Pillars of the Earth' by Ken Follett
       - 'Wolf Hall' by Hilary Mantel

    Can you recommend some books on {topic}?
    """

    topic: str


few_shot_prompt = FewShotBookRecommendationPrompt(topic="coding")

Minimizing complexity lowers the learning curve. In Mirascope’s case, beyond knowing our library and Python, the only framework to learn is Pydantic.

Built-in Data Validation for Error-Free Prompting

We find that high-quality prompts—ones that are type and error checked—lead to more accurate and useful LLM responses, and so data validation is at the heart of what we do. 

Automatic validation against predefined schemas is built into the fabric of our framework, allowing you to be more productive rather than having to chase down bugs or code your own basic error handling logic.

For starters, our `BasePrompt` class extends Pydantic’s `BaseModel`, ensuring valid and well-formed inputs for your prompts. This means:

  • Mirascope’s prompt class inherits Pydantic’s capability to ensure the data is correctly typed before it’s processed and sent over to the API, leading to cleaner, more maintainable code. Developers can focus more on the business logic specific to prompting rather than on writing boilerplate.
  • Pydantic easily serializes data both to and from JSON format, which simplifies the process of preparing request payloads and handling responses, eases integrations with any systems that accept JSON, and helps you quickly spin up FastAPI endpoints.
  • Pydantic is well supported in many IDEs, offering autocompletion and type hints.
  • It also lets developers define custom validation methods if needed, allowing them to enforce complex rules that go beyond type checks and basic validations.
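As a quick illustration of the serialization point, here is a sketch using plain Pydantic v2 (which works the same way for `BasePrompt`, since it extends `BaseModel`; the class below is our own example, not from Mirascope):

```python
from pydantic import BaseModel


class BookRecommendationPrompt(BaseModel):
    topic: str
    genre: str


prompt = BookRecommendationPrompt(topic="coding", genre="fantasy")

# Serialize to JSON, e.g. for a request payload or a FastAPI response...
json_str = prompt.model_dump_json()
print(json_str)
#> {"topic":"coding","genre":"fantasy"}

# ...and validate/deserialize it back into a typed object.
restored = BookRecommendationPrompt.model_validate_json(json_str)
assert restored == prompt
```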

An example of using Pydantic for enforcing type validation (with graceful error handling) is shown below:

from typing import Type

from mirascope.openai import OpenAIExtractor
from pydantic import BaseModel, ValidationError


class Book(BaseModel):
    title: str
    price: float


class BookRecommender(OpenAIExtractor[Book]):
    extract_schema: Type[Book] = Book
    prompt_template = "Please recommend a book."


try:
    book = BookRecommender().extract()
    assert isinstance(book, Book)
    print(book)
    #> title='The Alchemist' price=12.99
except ValidationError as e:
    print(e)
    #> 1 validation error for Book
    #  price
    #    Input should be a valid number, unable to parse string as a number [type=float_parsing, input_value='standard', input_type=str]
    #      For further information visit

You can also validate data in ways that are difficult if not impossible to code successfully, but that LLMs excel at, such as analyzing sentiment. For instance, you can add Pydantic’s `AfterValidator` annotation to Mirascope’s extracted output as shown below:

from enum import Enum
from typing import Annotated, Type

from mirascope.openai import OpenAIExtractor
from pydantic import AfterValidator, BaseModel, ValidationError


class Label(Enum):
    HAPPY = "happy story"
    SAD = "sad story"


class Sentiment(OpenAIExtractor[Label]):
    extract_schema: Type[Label] = Label
    prompt_template = "Is the following happy or sad? {text}."

    text: str


def validate_happy(story: str) -> str:
    """Check if the content follows the guidelines."""
    label = Sentiment(text=story).extract()
    assert label == Label.HAPPY, "Story wasn't happy."
    return story


class HappyStory(BaseModel):
    story: Annotated[str, AfterValidator(validate_happy)]


class StoryTeller(OpenAIExtractor[HappyStory]):
    extract_schema: Type[HappyStory] = HappyStory
    prompt_template = "Please tell me a story that's really sad."


try:
    story = StoryTeller().extract()
except ValidationError as e:
    print(e)
    #> 1 validation error for HappyStoryTool
    #  story
    #    Assertion failed, Story wasn't happy. [type=assertion_error, input_value="Once upon a time, there every waking moment.", input_type=str]
    #      For further information visit

Simplify LLM Interactions with Wrappers and Integrations

We believe in freeing you from writing boilerplate to interact with APIs, so we made available a number of wrappers for common providers. 

Our `OpenAICall` extends `BasePrompt` and `BaseCall` for interacting with OpenAI. Below is an example of how we initialize an `OpenAICall` instance and call the `call` method to generate an `OpenAICallResponse`:

import os

from mirascope import OpenAICall, OpenAICallParams

os.environ["OPENAI_API_KEY"] = "YOUR_API_KEY"


class RecipeRecommender(OpenAICall):
    prompt_template = "Recommend recipes that use {ingredient} as an ingredient"

    ingredient: str

    call_params = OpenAICallParams(model="gpt-4-turbo")


response = RecipeRecommender(ingredient="apples").call()
print(response.content)  # prints the string content of the call

The attribute `call_params` ties the parameters used for making the API call to OpenAI with the specific instance of `OpenAICall`. This means each instance carries with it all the necessary information for making a tailored API request to OpenAI.

For streaming LLM responses, `OpenAICall` also provides the `stream` method, which sets `stream=True` and offers the `OpenAICallResponseChunk` convenience wrappers around the response chunks:

import os

from mirascope import OpenAICall, OpenAICallParams

os.environ["OPENAI_API_KEY"] = "YOUR_API_KEY"


class RecipeRecommender(OpenAICall):
    prompt_template = "Recommend recipes that use {ingredient} as an ingredient"

    ingredient: str

    call_params = OpenAICallParams(model="gpt-4-turbo")


stream = RecipeRecommender(ingredient="apples").stream()

for chunk in stream:
    print(chunk.content, end="")

The `stream` method returns an `OpenAICallResponseChunk` instance, which is a convenience wrapper around OpenAI’s `ChatCompletionChunk` class.

from mirascope.openai.types import OpenAICallResponseChunk

chunk = OpenAICallResponseChunk(...)

chunk.content  # original.choices[0].delta.content
chunk.delta    # original.choices[0].delta
chunk.choice   # original.choices[0]
chunk.choices  # original.choices
chunk.chunk    # ChatCompletionChunk(...)

You can access functionality of other libraries that implement an OpenAI wrapper by using the `wrapper` parameter within `OpenAICallParams`. Setting this parameter will internally wrap the OpenAI client within an `OpenAICall`, giving you access to both sets of functionalities, and executing every call to `call`, `call_async`, `stream`, and `stream_async` on top of the wrapped OpenAI client.

from langsmith import wrappers

from mirascope.openai import OpenAICall, OpenAICallParams


class BookRecommender(OpenAICall):
    prompt_template = "Can you recommend some books on {topic}?"

    topic: str

    call_params = OpenAICallParams(
        model="gpt-4-turbo",
        wrapper=wrappers.wrap_openai,
    )

We also made sure that Mirascope integrates with Weights & Biases, LangSmith, and LangChain for (respectively) tracking machine learning experiments, visualizing data, and improving prompt effectiveness through automated refinement and testing. We provide code examples for integrating with these libraries in our documentation.

You can use Mirascope with other LLM providers that also implement the OpenAI API, including:

  • Ollama
  • Anyscale
  • Together
  • Groq

Beyond OpenAI, Mirascope provides access to these other LLM providers (and those using their APIs):

  • Anthropic
  • Gemini
  • Mistral

If you wanted to switch to another model provider, like Anthropic, you’d just need to change the relevant classes and parameters:

from mirascope.anthropic import AnthropicCall, AnthropicCallParams


class BookRecommender(AnthropicCall):
    prompt_template = "Please recommend some books."

    call_params = AnthropicCallParams(
        model="claude-3-opus-20240229",
    )

Expand LLM Capabilities with Tools

Although LLMs are known mostly for text generation, you can provide them with specific tools (also known as function calling) to extend their capabilities. 

Examples of what you can do with tools include:

  • Granting access to the Bing API for internet search to fetch the latest information on various topics.
  • Providing a secure sandbox environment for dynamically running code snippets provided by users in a coding tutorial platform.
  • Allowing access to the Google Cloud Natural Language API for evaluating customer feedback and reviews to determine sentiment and help businesses quickly identify areas for improvement.
  • Providing a Machine Learning (ML) recommendation engine API for giving personalized content or product recommendations for an e-commerce website, based on natural language interactions with users.

Mirascope lets you easily define a tool by documenting any function using a docstring as shown below. It automatically converts this into a tool, saving you additional work.

from typing import Literal

from mirascope.openai import OpenAICall, OpenAICallParams


def get_current_weather(
    location: str, unit: Literal["celsius", "fahrenheit"] = "fahrenheit"
):
    """Get the current weather in a given location."""
    if "tokyo" in location.lower():
        print(f"It is 10 degrees {unit} in Tokyo, Japan")
    elif "san francisco" in location.lower():
        print(f"It is 72 degrees {unit} in San Francisco, CA")
    elif "paris" in location.lower():
        print(f"It is 22 degrees {unit} in Paris, France")
    else:
        print(f"I'm not sure what the weather is like in {location}")


class Forecast(OpenAICall):
    prompt_template = "What's the weather in Tokyo?"

    call_params = OpenAICallParams(model="gpt-4", tools=[get_current_weather])


tool = Forecast().call().tool
if tool:
    tool.fn(**tool.args)
    #> It is 10 degrees fahrenheit in Tokyo, Japan

Mirascope supports Google, ReST, Numpydoc, and Epydoc style docstrings for creating tools.

If a particular function doesn’t have a docstring, you can define your own `OpenAITool` class and attach the function with the `tool_fn` decorator, making it immediately accessible and callable via `tool.fn` with a single line of code.

from typing import Literal

from pydantic import Field

from mirascope.base import tool_fn
from mirascope.openai import OpenAITool


def get_current_weather(
    location: str, unit: Literal["celsius", "fahrenheit"] = "fahrenheit"
):
    """Assume this function does not have a docstring."""
    if "tokyo" in location.lower():
        print(f"It is 10 degrees {unit} in Tokyo, Japan")
    elif "san francisco" in location.lower():
        print(f"It is 72 degrees {unit} in San Francisco, CA")
    elif "paris" in location.lower():
        print(f"It is 22 degrees {unit} in Paris, France")
    else:
        print(f"I'm not sure what the weather is like in {location}")


@tool_fn(get_current_weather)  # attaches the function so it's callable via `tool.fn`
class GetCurrentWeather(OpenAITool):
    """Get the current weather in a given location."""

    location: str = Field(..., description="The city and state, e.g. San Francisco, CA")
    unit: Literal["celsius", "fahrenheit"] = "fahrenheit"

Tools allow you to dynamically generate prompts based on current or user-specified data. For example, you might extract current weather data for a given city before generating a prompt like, “Given the current weather conditions in Tokyo, what are fun outdoor activities?”

See our documentation for details on generating prompts in this way (for instance, by calling the `call` method).

Extract Structured Data from LLM-Generated Text

LLMs are great at producing conversations in text, which is unstructured information. But many applications need structured data from LLM outputs. Scenarios include:

  • Extracting structured information from a PDF invoice (i.e., invoice number, vendor, total charges, taxes, etc.) so that you can automatically insert that information into another system like a CRM or tracking tool, a spreadsheet, etc.
  • Automatically extracting sentiment, feedback categories (product quality, service, delivery, etc.), and customer intentions from customer reviews or survey responses.
  • Pulling out specific medical data such as symptoms, diagnoses, medication names, dosages, and patient history from clinical notes.
  • Extracting financial metrics, stock data, company performance indicators, and market trends from financial reports and news articles.

To handle such scenarios, we have extractors with an `extract` method, which leverages tools like `OpenAITool` to reliably extract structured data from the outputs of LLMs according to the schema defined in Pydantic’s `BaseModel`. In the example below you can see how due dates, priorities, and descriptions are being extracted from a prompt, `task`:

from typing import Literal, Type

from mirascope.openai import OpenAIExtractor
from pydantic import BaseModel


class TaskDetails(BaseModel):
    due_date: str
    priority: Literal["low", "normal", "high"]
    description: str


class TaskExtractor(OpenAIExtractor[TaskDetails]):
    extract_schema: Type[TaskDetails] = TaskDetails
    prompt_template = """
    Extract the task details from the following task:
    {task}
    """

    task: str


task = "Submit quarterly report by next Friday. Task is high priority."
task_details = TaskExtractor(task=task).extract()
assert isinstance(task_details, TaskDetails)
print(task_details)
#> due_date='next Friday' priority='high' description='Submit quarterly report'

You can define schema parameters against which to extract data in Pydantic’s `BaseModel` class, by setting certain attributes and fields in that class. Mirascope also lets you set the number of retries to extract data in case a failure occurs.
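Mirascope’s retries re-run the extraction call for you when validation fails; the mechanics are roughly analogous to this plain-Pydantic sketch (the helper below is our own illustration, not Mirascope’s API):

```python
from pydantic import BaseModel, ValidationError


class TaskDetails(BaseModel):
    due_date: str
    priority: str


def extract_with_retries(outputs: list[dict], retries: int = 1) -> TaskDetails:
    """Illustration only: validate successive (simulated) LLM outputs until one parses."""
    error = None
    for raw in outputs[: retries + 1]:
        try:
            return TaskDetails.model_validate(raw)
        except ValidationError as e:
            error = e  # remember the failure and try the next attempt
    raise error


# The first "LLM output" is missing a field; the retry succeeds.
outputs = [
    {"due_date": "next Friday"},
    {"due_date": "next Friday", "priority": "high"},
]
print(extract_with_retries(outputs).priority)
#> high
```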

But you also don’t have to use a detailed schema like `BaseModel` if you’re extracting base types like strings, integers, booleans, etc. The code sample below shows how extraction for a simple structure like a list of strings doesn’t need a full-fledged schema definition.

from typing import Type

from mirascope.openai import OpenAIExtractor


class BookRecommender(OpenAIExtractor[list[str]]):
    extract_schema: Type[list[str]] = list[str]
    prompt_template = "Please recommend some science fiction books."


books = BookRecommender().extract()
print(books)
#> ['Dune', 'Neuromancer', "Ender's Game", "The Hitchhiker's Guide to the Galaxy", 'Foundation', 'Snow Crash']

Mirascope makes things as simple as feasible, requiring you to write less code in cases where more code isn't necessary.

Facilitate Your Prompt Workflows with CLI and IDE Support

As mentioned earlier, our experience with prompts is that they generally become unmanageable after a certain number of iterations. Versioning is obviously a good idea, and we see some cloud prompting tools that offer this, but as they don’t generally colocate prompts with LLM calls, not all the relevant information gets versioned, unfortunately.

We believe it’s important to colocate as much information with the prompt as feasible, and that it should all be versioned together as a single unit. Our prompt management CLI is inspired by Alembic and lets you:

  • Create a local prompt repository and add prompts to it
  • Commit new versions of prompts
  • Switch between different versions of prompts
  • Remove prompts from the repository
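A typical session covering those operations might look like the sketch below (command names are illustrative of our CLI at the time of writing; see the documentation for the exact interface):

```shell
mirascope init                          # create the local prompt management environment
mirascope add book_recommender          # commit a new version of the prompt
mirascope status                        # check which prompts have uncommitted changes
mirascope use book_recommender 0001     # switch to a specific version
mirascope remove book_recommender 0001  # remove a version from the repository
```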

Our CLI lets you commit your versions from development as part of your standard Git workflow, ensuring colleagues can see everything that was tried, as well as the differences between prompts. It’s also worth noting that the CLI works with calls and extractors since they subclass the `BasePrompt` class.

When installed, our CLI creates predefined working subdirectories and files as shown below:

|-- mirascope.ini
|-- mirascope
|   |-- prompt_template.j2
|   |-- versions/
|   |   |-- <directory_name>/
|   |   |   |-- version.txt
|   |   |   |-- <revision_id>_<directory_name>.py
|-- prompts/

This creates a prompt management environment that supports collaboration and allows you to centralize prompt development in one place. 

When you save a prompt in the `versions` subdirectory above with a command like:

mirascope add book_recommender

It versions the prompt, creating a subdirectory `book_recommender` and adding a version number to the prompt’s filename.

As well, the version number is added inside the file itself:

# versions/book_recommender/
from mirascope.openai import OpenAICall, OpenAICallParams

prev_revision_id = "None"
revision_id = "0001"


class BookRecommender(OpenAICall):
    prompt_template = "Can you recommend some books on {topic} in a list format?"

    topic: str

    call_params = OpenAICallParams(model="gpt-4-turbo")

Once the prompt file is versioned, you can continue iterating on the prompt, as well as switch and remove versions, etc.

Both Mirascope’s and Pydantic’s documentation are available in your IDE; for example, Mirascope provides help information for inline errors and autocomplete suggestions.

[Screenshot: inline error highlighting for `BookRecommendationPrompt`]

[Screenshot: autocomplete suggestions for `BookRecommendationPrompt`]

If you want to give Mirascope a try, you can get started with our source code on GitHub. You can find our documentation (and more code samples) on our documentation site as well.

Join our beta list!

Get updates and early access to try out new features as a beta tester.
