8 of the Best Prompt Engineering Tools in 2024

Published on
Apr 3, 2024

While anyone can develop LLM applications using just the OpenAI SDK—we used to do that since we didn’t find helper functions at the time to be particularly helpful—prompt engineering tools that simplify LLM interactions and enhance productivity are emerging as key players.

We’ve tried several libraries and have built our own. In our experience, if you want to develop robust, production-grade LLM applications, you should look for six capabilities in a prompt engineering tool:

  1. The tool offers dynamic prompting with automatic data validation to reduce the likelihood of errors and ensure that inputs to the model are well formed and adhere to specified constraints. Not all tools do this: many leave this as an afterthought and expect you to take care of the error handling logic.
  2. It colocates prompts with calls to the LLM to solve a major problem with prompt engineering where scattered calls and prompts become hard to manage at scale. We know of no other library (besides ours) that embraces this principle, which, as a leaner approach, could be considered good software practice.
  3. It simplifies model interactions with convenience wrappers and integrations, reducing the complexity of working with LLMs while offering more choice among them, so developers can select the models and libraries that best fit their use cases. Here we emphasize “simplifies,” as many libraries do offer abstractions, yet these (paradoxically) increase complexity rather than decrease it.
  4. The library allows you to extract structured data from the unstructured outputs of LLMs to better integrate with other systems like databases, CRMs, etc., minimizing or eliminating the need for manual data entry or parsing. LLMs are designed to produce unstructured text, so getting structured data from their outputs is like a superpower, and you should check that your tool provides it (not all of them do).
  5. The library offers prompt management features like version control so you can better track changes, revert to previous versions, and more. Version control makes all the difference in being able to track changes in complex prompts involving code, and for collaborative work. Many tools indeed provide prompt versioning, but as they don’t colocate the rest of the information affecting the quality of the LLM call (such as temperature, LLM model version, etc.) together with the prompt, only a part of the relevant information gets tracked consistently.
  6. It extends model capabilities by letting you add your own tools (i.e., function calling), like adding a calculator to increase the accuracy of math calculations, or granting it access to the Bing Web Search API to find answers. While some libraries provide methods to use such tools, many don’t, implying a reliance on developers to implement these functionalities independently.

Below, we present eight different prompt engineering tools, beginning with our own prompt engineering library, Mirascope. In our list, we highlight the approach and strengths of each tool:

  • Mirascope—best for building production-grade LLM applications.
  • Guidance—best for tailoring prompts via advanced constraints.
  • Haystack—best for structuring prompt pipelines.
  • Priompt—best for priority-controlled context management.
  • Agenta—best for rapid prototyping and collaborative LLM application development.
  • LangChain—best for customizable and scalable LLM applications.
  • PromptHub—best for facilitating collaboration with non-tech roles.
  • Promptmetheus—best for developers seeking an IDE for optimizing and testing prompts.

1. Mirascope—Best for Building Production-Grade LLM Applications

Mirascope homepage: An intuitive approach to building with LLMs

Automatically Validate Inputs of Dynamic Prompts to Ensure Accuracy

Back when we started working with the OpenAI SDK, we found that complex prompts involving code became unmanageable past two versions. So we began enforcing manual versioning, telling our developers not to touch the code without authorization, which made workflows inefficient.

It became important for us to catch basic errors before they reached the LLM, just as you’d expect in any standard software development workflow.

In Mirascope, the `BasePrompt` class for prompting is actually an extension of Pydantic’s `BaseModel`, which provides automatic data validation to ensure the prompt’s inputs are well formed and correspond to those defined by the prompt. This contributes to the overall quality of the prompt and lessens the chance of errors silently going into the LLM. 
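To make the idea concrete, here is a minimal sketch of validating prompt inputs before they are formatted into a template. It uses only the standard library as a stand-in for Pydantic, so `ValidatedPrompt`, `GenrePrompt`, and `render` are hypothetical names for illustration and not Mirascope's API.

```python
from dataclasses import dataclass, fields


@dataclass
class ValidatedPrompt:
    """Hypothetical base class: checks field types when an instance is created."""

    def __post_init__(self) -> None:
        for f in fields(self):
            value = getattr(self, f.name)
            # Only plain classes such as str or int are checked in this sketch.
            if isinstance(f.type, type) and not isinstance(value, f.type):
                raise TypeError(
                    f"{f.name} must be {f.type.__name__}, got {type(value).__name__}"
                )

    def render(self, template: str) -> str:
        """Format the template with the validated field values."""
        values = {f.name: getattr(self, f.name) for f in fields(self)}
        return template.format(**values)


@dataclass
class GenrePrompt(ValidatedPrompt):
    genre: str


prompt = GenrePrompt(genre="fantasy")
print(prompt.render("Please recommend a {genre} book."))

try:
    GenrePrompt(genre=42)  # wrong type is rejected before any LLM call
except TypeError as err:
    print(err)
```

The point of the design is that a malformed input fails loudly at construction time, rather than silently producing a bad prompt.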

Both Mirascope and Pydantic offer inline documentation and linting for your editor to ensure the quality of your code. Two examples are shown below.

A missing argument in a function call:

Missing argument in a function call example: "BookRecommendationPrompt"

Autocompletion for a function call:

Autocompletion for a function call: "BookRecommendationPrompt"

You can customize and extend Mirascope’s `BasePrompt` class to build prompts using dynamic content, according to your use case:

from mirascope import BasePrompt


class BookRecommendationPrompt(BasePrompt):
    """A prompt for recommending a book."""

    prompt_template = """
    SYSTEM: You are the world's greatest librarian.
    USER: Please recommend a {genre} book.
    """

    genre: str


prompt = BookRecommendationPrompt(genre="fantasy")
print(prompt.messages())
#> [{"role": "system", "content": "You are the world's greatest librarian."}, {"role": "user", "content": "Please recommend a fantasy book."}]

Above, the class variable `prompt_template` creates a structured, context-specific prompt, while `messages()` uses this class variable to construct a list of messages that represent the prompt to be sent to the language model.

Calling `str(prompt)` in the code below formats `prompt_template` using the properties of the class matching the template variables, allowing you to easily print and view even more complex prompts based on the instance’s state. 

from mirascope import BasePrompt


class BookRecommendationPrompt(BasePrompt):
    prompt_template = """
    Can you recommend some books on the following topic and genre pairs?
    {topics_x_genres}
    """

    topics: list[str]
    genres: list[str]

    @property
    def topics_x_genres(self) -> str:
        """Returns every topic and genre pair, one pair per line."""
        return "\n".join(
            [
                f"Topic: {topic}, Genre: {genre}"
                for topic in self.topics
                for genre in self.genres
            ]
        )


prompt = BookRecommendationPrompt(
    topics=["coding", "music"], genres=["fiction", "fantasy"]
)
print(str(prompt))
#> Can you recommend some books on the following topic and genre pairs?
#  Topic: coding, Genre: fiction
#  Topic: coding, Genre: fantasy
#  Topic: music, Genre: fiction
#  Topic: music, Genre: fantasy
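For intuition, the following sketch shows one way template variables could be resolved against an instance's attributes and properties. It is a simplified stand-in rather than Mirascope's implementation; `TemplatePrompt` and `PairsPrompt` are hypothetical names, and only the standard library is used.

```python
import inspect
from string import Formatter


class TemplatePrompt:
    """Hypothetical stand-in: fills `prompt_template` from attributes and properties."""

    prompt_template = ""

    def __str__(self) -> str:
        template = inspect.cleandoc(self.prompt_template)
        # Formatter().parse yields (literal_text, field_name, format_spec, conversion).
        names = [name for _, name, _, _ in Formatter().parse(template) if name]
        # getattr resolves plain attributes and @property methods alike.
        return template.format(**{name: getattr(self, name) for name in names})


class PairsPrompt(TemplatePrompt):
    prompt_template = """
    Recommend books for these pairs:
    {pairs}
    """

    def __init__(self, topics: list[str], genres: list[str]) -> None:
        self.topics = topics
        self.genres = genres

    @property
    def pairs(self) -> str:
        return "\n".join(f"{t} / {g}" for t in self.topics for g in self.genres)


print(PairsPrompt(["coding"], ["fiction", "fantasy"]))
```

Because the lookup goes through `getattr`, a computed property and a stored field are interchangeable from the template's point of view.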

Colocate Prompts with LLM Calls for Better Code Management

One frustration we experienced with prompting was that too little of the relevant information was centralized within the prompt itself, which made overall code management harder. This included not only API calls but also model configuration and other details.

Having to manage code in disparate locations entailed greater manual efforts on our part to catch errors and keep track of changes. For example, if we decided to change model specification, which was located in one place and affected multiple prompts in other places in the codebase, then it could be a pain to ensure this change was properly accounted for.

Therefore we typically locate information like the model type within the prompt, as shown below:

import os

from mirascope.openai import OpenAICall, OpenAICallParams

os.environ["OPENAI_API_KEY"] = "YOUR_API_KEY"


class RecipeRecommender(OpenAICall):
    prompt_template = "Recommend recipes that use {ingredient} as an ingredient"

    ingredient: str

    call_params = OpenAICallParams(model="gpt-4-turbo")


response = RecipeRecommender(ingredient="apples").call()
print(response.content)  # prints the string content of the call

Mirascope both provides high-level wrappers for calling APIs and locates these calls within the prompt code.

`call_params` above specifies the parameters for the OpenAI API call, and shows how every instance of the `OpenAICall` class carries with it the necessary information for making the call to the LLM.

Further down in the code, an instance of `RecipeRecommender` makes the call to the OpenAI API (using the model specified in `call_params`) to generate content based on the `prompt_template` defined in the class.

Simplify Model Interactions with Wrappers and Integrations

Being able to use convenience wrappers and integrations for working with other libraries and LLMs saves developers time and effort, allowing them to focus more on their craft and less on the intricacies of API communications.

When working with the OpenAI API, the Mirascope `call` method returns an `OpenAICallResponse` class instance (which extends `BaseCallResponse`). This convenience wrapper encapsulates the response from the OpenAI API (i.e., the `ChatCompletion` class in `openai`), making it easier to work with by providing a simpler interface.

Each commented line of the code below shows the equivalent path to access the same information directly from a `ChatCompletion` object if you were not using the wrapper:

from mirascope.openai.types import OpenAICallResponse

response = OpenAICallResponse(...)

response.content     # original.choices[0].message.content
response.tool_calls  # original.choices[0].message.tool_calls
response.message     # original.choices[0].message
response.choice      # original.choices[0]
response.choices     # original.choices
response.response    # ChatCompletion(...)
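The wrapper pattern itself is simple to sketch. The example below is an illustration under stated assumptions, not Mirascope's code: `CallResponse` is a hypothetical class, and a `SimpleNamespace` object stands in for a chat-completion-shaped response so that no API call is needed.

```python
from types import SimpleNamespace


class CallResponse:
    """Hypothetical wrapper exposing shortcuts into a chat-completion-shaped object."""

    def __init__(self, response) -> None:
        self.response = response  # the raw provider response stays accessible

    @property
    def choice(self):
        return self.response.choices[0]

    @property
    def message(self):
        return self.choice.message

    @property
    def content(self) -> str:
        return self.message.content or ""


# A fake object shaped like the nested response.
raw = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(content="Try Dune."))]
)
wrapped = CallResponse(raw)
print(wrapped.content)
```

Keeping the raw response on the wrapper means nothing is lost: the shortcuts cover the common paths, and the original object remains available for everything else.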

Mirascope also supports interactions with other LLMs besides OpenAI to refine the execution of your calls. Currently, the following models (and any providers using their APIs) are supported:

  • OpenAI
  • Anthropic
  • Gemini
  • Mistral

Mirascope provides wrappers to integrate with a number of popular libraries in the machine learning and LLM space, such as Weights & Biases, LangSmith, and LangChain.

For working with libraries that implement an OpenAI wrapper, you can set the `wrapper` parameter within `OpenAICallParams` to internally wrap the `OpenAI` client within an `OpenAICall` to get access to the functionalities of both Mirascope and the other library. From now on, all calls will be executed on top of the wrapped OpenAI client.

from some_library import some_wrapper
from mirascope.openai import OpenAICall, OpenAICallParams


class BookRecommender(OpenAICall):
    prompt_template = "Can you recommend some books on {topic}?"

    topic: str

    call_params = OpenAICallParams(
        model="gpt-4-turbo",
        wrapper=some_wrapper
    )

Conveniently Extract Structured Data from Unstructured LLM Outputs

The conversational outputs of LLMs are valuable but can be difficult to act on or integrate with other systems.

Extracting structured data from these unstructured outputs allows you to feed them as inputs to other processes and workflows, making that information more useful and actionable.

For this task, Mirascope offers its `BaseExtractor` class, which extends `BasePrompt` to extract structured information from unstructured LLM output via prompt-based interactions. Because it’s built on top of Pydantic, `BaseExtractor` defines a schema for the expected data, validates the output against it, and uses function calling with LLMs.

In the example below, `OpenAIExtractor` (which extends `BaseExtractor`) uses the `extract` method to get task details like due date, priority, and description from a user's natural language input:

from typing import Literal, Type

from mirascope.openai import OpenAIExtractor
from pydantic import BaseModel


class TaskDetails(BaseModel):
    due_date: str
    priority: Literal["low", "normal", "high"]
    description: str


class TaskExtractor(OpenAIExtractor[TaskDetails]):
    extract_schema: Type[TaskDetails] = TaskDetails
    prompt_template = """
    Extract the task details from the following task:
    {task}
    """

    task: str


task = "Submit quarterly report by next Friday. Task is high priority."
task_details = TaskExtractor(task=task).extract()
assert isinstance(task_details, TaskDetails)
print(task_details)
#> due_date='next Friday' priority='high' description='Submit quarterly report'
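The core of schema-based extraction can be sketched without any LLM call. In the sketch below, the JSON string is simulated model output (for example, function-call arguments), and `Task` and `parse_task` are hypothetical names; it uses standard-library dataclasses in place of Pydantic and is not Mirascope's implementation.

```python
import json
from dataclasses import dataclass
from typing import Literal, get_args

Priority = Literal["low", "normal", "high"]


@dataclass
class Task:
    due_date: str
    priority: Priority
    description: str


def parse_task(raw_json: str) -> Task:
    """Turn a JSON string (e.g. function-call arguments) into a checked Task."""
    data = json.loads(raw_json)
    task = Task(**data)  # raises TypeError on missing or unexpected keys
    if task.priority not in get_args(Priority):
        raise ValueError(f"invalid priority: {task.priority!r}")
    return task


# Simulated model output; a real call would return this from the LLM.
simulated = (
    '{"due_date": "next Friday", "priority": "high", '
    '"description": "Submit quarterly report"}'
)
print(parse_task(simulated))
```

Once the output is a typed object rather than free text, downstream code can rely on its fields, and invalid values are rejected at the boundary.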

Mirascope also provides ways to automatically extract base types from natural language, offering the shorthand convenience of not having to write the `pydantic.BaseModel` for when you want to extract a single base type instance:

from typing import Type

from mirascope.openai import OpenAIExtractor


class BookRecommender(OpenAIExtractor[list[str]]):
    extract_schema: Type[list[str]] = list[str]
    prompt_template = "Please recommend some science fiction books."


books = BookRecommender().extract()
print(books)
#> ['Dune', 'Neuromancer', "Ender's Game", "The Hitchhiker's Guide to the Galaxy", 'Foundation', 'Snow Crash']

Mirascope supports extraction of numerous base types, including `str`, `int`, `float`, `bool`, `Literal`, `tuple`, and more.

Manage Prompt Iterations with Version Control

As mentioned previously, a major pain point of ours has been successfully managing prompts after several iterations of changes. Prompts as code under development should be subject to version tracking like any code developed under collaboration.

For this reason we provide our CLI that’s based on git and Alembic. You can create or download a local repo of your prompt code, version your prompt changes, revert to previous versions, etc.

We simplify prompt management by providing convenient commands like `add`, `init`, `use`, and a few more, and by intuitively structuring a working directory:

|-- mirascope.ini
|-- mirascope
|   |-- prompt_template.j2
|   |-- versions/
|   |   |-- <directory_name>/
|   |   |   |-- version.txt
|   |   |   |-- <revision_id>_<directory_name>.py
|-- prompts/

For example, you can save a prompt under `prompts/book_recommender.py` by entering:

mirascope add book_recommender

This adds the prompt to the local repo as a versioned copy, `versions/book_recommender/0001_book_recommender.py`.

A new subdirectory, `versions/book_recommender`, is created to hold it. The version number is added to the code inside the prompt file:

# versions/book_recommender/0001_book_recommender.py
from mirascope.openai import OpenAICall, OpenAICallParams

prev_revision_id = "None"
revision_id = "0001"


class BookRecommender(OpenAICall):
    prompt_template = "Can you recommend some books on {topic} in a list format?"

    topic: str

    call_params = OpenAICallParams(model="gpt-4-turbo")

These features allow you to fully manage any changes to prompts, including any information colocated with the prompt code, like changes to LLM models, calls, etc.
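As a rough illustration of how a revision chain like `prev_revision_id` and `revision_id` could be computed, consider the sketch below. It assumes revision IDs are sequential, zero-padded numbers at the start of each file name, as in the example above; `next_revision` is a hypothetical helper and not part of the Mirascope CLI.

```python
def next_revision(existing_files: list[str]) -> tuple[str, str]:
    """Return (prev_revision_id, revision_id) for the next saved version.

    Assumes file names start with a zero-padded number, e.g. "0001_name.py".
    """
    ids = sorted(name.split("_", 1)[0] for name in existing_files)
    if not ids:
        # First version: nothing precedes it.
        return "None", "0001"
    return ids[-1], f"{int(ids[-1]) + 1:04d}"


print(next_revision([]))
print(next_revision(["0001_book_recommender.py"]))
```

Linking each revision to its predecessor is what makes it possible to walk the history backward and revert to an earlier prompt.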

Extend Model Capabilities by Adding Tools (Function Calling)

LLMs can deliver impressive results with their conversational outputs, but relying solely on these introduces some limitations:

  • Developers are restricted to the inherent outputs of LLMs, which are mainly text generation based on training data. This limitation hinders the scope of applications that can be developed, confining these to scenarios where only text generation is needed.
  • Without access to real-time data or external databases through tools, LLMs may generate outdated or inaccurate information. This increases the risk of incorrect answers, potentially eroding trust.
  • Tasks that involve data retrieval, processing, or manipulation would become cumbersome. Developers would have to implement separate AI systems to handle these tasks outside the LLM, leading to complexity and fragmented system architectures.

To overcome these challenges, Mirascope provides a number of ways to use tools in your LLM workflows. 

For example, the `tool.fn` decorator is used to attach a function to a `BaseTool` extension like `OpenAITool`. The example below uses functions as-is with no additional work, automatically generating the tool and attaching the function under the hood for you:

from mirascope.openai import OpenAICall, OpenAICallParams


def get_weather(location: str) -> str:
    """Gets the weather for `location` and prints it.

    Args:
        location: The "City, State" or "City, Country" for which to get the weather.
    """
    print(location)
    if location == "Tokyo, Japan":
        return f"The weather in {location} is 72 degrees and sunny."
    elif location == "San Francisco, CA":
        return f"The weather in {location} is 45 degrees and cloudy."
    else:
        return f"I'm sorry, I don't have the weather for {location}."


class Forecast(OpenAICall):
    prompt_template = "What's the weather in Tokyo and San Francisco?"

    call_params = OpenAICallParams(model="gpt-4-turbo", tools=[get_weather])


tool = Forecast().call().tool  # the first tool call in the response, if any
if tool:
    print(tool.fn(**tool.args))
#> The weather in Tokyo, Japan is 72 degrees and sunny.

Otherwise, Mirascope can use a Tool with an attached function, when the function you want to use needs more description for the prompt to work well.

import json

from typing import Literal


def get_current_weather(
    location: str, unit: Literal["celsius", "fahrenheit"] = "fahrenheit"
) -> str:
    """Get the current weather in a given location.

    Args:
        location: The city and state, e.g. San Francisco, CA.
        unit: The unit for the temperature.

    Returns:
        A JSON object containing the location, temperature, and unit.
    """
    if "tokyo" in location.lower():
        return json.dumps({"location": "Tokyo", "temperature": "10", "unit": unit})
    elif "san francisco" in location.lower():
        return json.dumps({"location": "San Francisco", "temperature": "72", "unit": unit})
    elif "paris" in location.lower():
        return json.dumps({"location": "Paris", "temperature": "22", "unit": unit})
    else:
        return json.dumps({"location": location, "temperature": "unknown"})

Mirascope supports a number of docstring styles, such as Google-, ReST-, Numpydoc-, and Epydoc-style docstrings. You can also define your own `OpenAITool` class in cases where the function you want to convert to a tool doesn’t have a docstring. You can find further use cases for using tools, and code samples, in our overview documentation.
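To show roughly what generating a tool from a function involves, here is a minimal sketch that builds a tool definition from a signature and docstring, then dispatches a simulated tool call. The dictionary layout follows the OpenAI function-calling tool format; `function_to_tool`, `JSON_TYPES`, and `add` are hypothetical names, and this is not Mirascope's implementation.

```python
import inspect
import json
from typing import Callable

# Simplified mapping from Python annotations to JSON Schema type names.
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def function_to_tool(fn: Callable) -> dict:
    """Build an OpenAI-style tool definition from a signature and docstring."""
    properties, required = {}, []
    for name, param in inspect.signature(fn).parameters.items():
        # Unknown annotations fall back to "string" in this sketch.
        properties[name] = {"type": JSON_TYPES.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "type": "function",
        "function": {
            "name": fn.__name__,
            "description": inspect.getdoc(fn) or "",
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }


def add(a: int, b: int = 0) -> int:
    """Add two integers."""
    return a + b


tool = function_to_tool(add)
# Dispatch a simulated tool call; a real one would come back from the model.
call = {"name": "add", "arguments": json.dumps({"a": 2, "b": 3})}
result = {"add": add}[call["name"]](**json.loads(call["arguments"]))
print(tool["function"]["parameters"]["required"], result)
```

The docstring becomes the tool's description, which is why a function with a thin or missing docstring may need a hand-written tool class instead.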

If you want to give Mirascope a try, you can get started with our source code on GitHub. You can find our documentation (and more code samples) on our documentation site as well.

2. Guidance—Best for Tailoring Prompts via Advanced Constraints

Guidance homepage

Guidance is a prompt engineering library that constrains generation using regexes and context-free grammars in Python.

Constrained Generation Using Regular Expressions and Grammars

Guidance allows for dynamic prompting by constraining text generation to adhere to specific formats or rules. As an indirect form of data validation, it ensures you only get well-formed outputs meeting the specified constraints.

Simplified Model Interactions with Convenience Wrappers and Integrations

The library works with a variety of LLM backends like Transformers, Llama.cpp, VertexAI, and OpenAI by providing convenience wrappers that simplify interactions with these models and allowing developers the flexibility to choose the most suitable LLM for their needs.

Extension of Model Capabilities with Custom Tools

The library allows for the integration of custom tools or functions during the text generation process, enhancing the model's capabilities. Examples include adding a calculator function or potentially integrating external APIs for additional data access, although its documentation does not mention specific API integrations such as Bing Web Search.

Guidance, along with its documentation, is available on GitHub. 

3. Haystack—Best for Structuring Prompt Pipelines

Haystack homepage: Open-source LLM framework to build production-ready applications

Haystack is an orchestration framework for building customizable, production-ready LLM applications. Its prompt engineering features allow you to dynamically construct effective prompts, tailor interactions with a variety of LLMs, and leverage Retrieval-Augmented Generation (RAG) for context-enriched responses.

Its PromptHub offers an open source repository for prompt templates.

Simplify Model Interactions

Haystack is technology agnostic and allows users to decide which vendor or technology they want to use, making it easy to switch out components. It supports models from OpenAI, Cohere, and Hugging Face, providing flexibility in model selection.

It’s also generally extensible and flexible, allowing for custom components like modules or classes.

Extract Structured Data from Unstructured Outputs

The framework’s architecture, with its focus on pipelines and components, is geared towards processing and manipulating unstructured data (like documents) to perform tasks such as question answering and semantic search.

Extend Model Capabilities with Tools

Haystack’s design as an extensible framework supports the integration of custom components, which could include tools for enhanced calculations or external API calls. 

The framework features a GitHub repository, a dedicated documentation site, and a website.

4. Priompt—Best for Priority-Controlled Context Management

Priompt homepage

Priompt (priority + prompt) is a JSX-based prompting library that uses priorities to decide what to include in the context window. Priompt embodies the philosophy that prompting should be referred to as prompt design and therefore likened to web design.

JSX-based Prompt Composition

Priompt offers an adaptable prompting system using a syntax reminiscent of modern web development frameworks like React, where input variables dynamically alter the prompt’s structure. 

By leveraging the familiar JSX syntax, developers can easily create complex, nested prompts that adapt to the specific needs of their application, improving the clarity and effectiveness of communication with language models.

Priority-Driven Context Inclusion

Priompt’s priority mechanism ensures that the most relevant information is presented within the context window, enhancing the quality of generated responses. Developers can specify absolute and relative priorities for different segments of the prompt, allowing for fine-tuned control over what information is critical and must be included versus what can be omitted if space constraints require it. 
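The general idea of priority-driven inclusion can be sketched as follows. Priompt itself is a JSX-based library, so this Python version is only an illustration: it uses a simple greedy selection with a crude whitespace token count, and `Segment` and `fit_to_budget` are hypothetical names rather than Priompt's algorithm or API.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    text: str
    priority: int  # higher means more important


def fit_to_budget(segments: list[Segment], max_tokens: int) -> str:
    """Keep the highest-priority segments that fit, preserving original order."""

    def cost(segment: Segment) -> int:
        return len(segment.text.split())  # crude whitespace "token" count

    kept: set[int] = set()
    used = 0
    # Visit segments from most to least important.
    for i in sorted(range(len(segments)), key=lambda i: -segments[i].priority):
        if used + cost(segments[i]) <= max_tokens:
            kept.add(i)
            used += cost(segments[i])
    return "\n".join(s.text for i, s in enumerate(segments) if i in kept)


segments = [
    Segment("You are a helpful assistant.", priority=10),
    Segment("Older chat history that can be dropped.", priority=1),
    Segment("User: summarize the report.", priority=9),
]
print(fit_to_budget(segments, max_tokens=10))
```

With a budget of 10, the low-priority history is dropped while the system and user lines survive in their original order.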

Priompt’s source code is actively maintained on GitHub.

5. Agenta—Best for Rapid Prototyping and Collaborative LLM Application Development

Agenta homepage: Your Collaborative, All-in-One, Open Source AI Development Platform

Agenta is an open-source, end-to-end LLMOps platform that helps developers and product teams build robust generative AI applications powered by large language models.

Collaborative Development and Evaluation

Agenta lowers the barrier for non-developers to participate in prompt engineering and evaluation (such as collaborating on natural language prompts). By allowing the development of custom LLM applications through a user interface, Agenta empowers product teams to iterate on the configuration of any custom LLM application, evaluate it, annotate it, test it, and deploy it, all within the user interface.

Simplifies Model Interactions

Agenta provides convenience through integrations with various model providers (e.g., OpenAI, Cohere) and compatibility with frameworks like LangChain or LlamaIndex. This simplifies model interactions and gives developers flexibility in choosing suitable models for their needs and in aligning their prompt engineering techniques with those models.

More information about Agenta is available on its website, its documentation site, and on GitHub.

6. LangChain—Best for Scalable and Customizable LLM Applications

LangChain homepage: Applications that can reason. Powered by LangChain.

LangChain offers structured, reusable prompt templates as classes incorporating dynamic content for basic use cases such as few-shot prompting and LLM chat interactions.

LangChain also provides its LangChain Hub, to which you can upload and manage prompts.

Dynamic Prompting and Advanced Customization

LangChain, through its LangChain Expression Language (LCEL) and modular design, allows for dynamic composition of chains. This includes sophisticated prompt construction that can incorporate customized logic and components, adapting to varying contexts and requirements.

Simplifies Model Interactions with Wrappers and Integrations

LangChain introduces a variety of NLP wrappers that abstract away the complexities of dealing directly with large language models. These wrappers provide developers with high-level abstractions for model I/O, enabling straightforward interaction with AI models. 

The framework also offers libraries with integrations for components like large language models, agents, output parsers, and more, as well as off-the-shelf chains for common tasks.

Extend Model Capabilities with Custom Tools

LangChain's extensible design allows developers to automate and integrate custom AI tools and components into their chains.

More information about LangChain is available on its website, GitHub page, and documentation site. We’ve also written an article about LangChain alternatives for AI development.

7. PromptHub—Best for Facilitating Collaboration with Non-Tech Roles

PromptHub homepage: Level up your prompt management

PromptHub is a SaaS prompt management platform built for team collaboration, which includes features such as commonly used templates and prompt versioning.

Dynamic Prompting with Templates Containing Variable Placeholders

PromptHub offers a large number of predefined templates for use cases in areas such as marketing, sales, and engineering. The templates contain placeholders into which you can inject variable values to create dynamic content for your prompts.

Side-by-Side Prompt Version Evaluation

It simplifies the process of experimenting with and comparing different prompt versions, offering a straightforward platform for side-by-side output evaluation. Starting with writing and testing a prompt, users can easily version changes, adjust prompts, AI models, or parameters, and rerun tests to view the previous and new versions alongside each other for direct comparison.

For more information you can read about PromptHub on its website or consult its documentation site.

8. Promptmetheus—Best for Developers Seeking an IDE for Optimizing and Testing Prompts

Promptmetheus homepage: Prompt Engineering IDE

Promptmetheus is a cloud-based IDE for prompt development, which provides AI tools and functionality to help you write, test, and optimize LLM prompts faster and more efficiently.

Prompt Engineering IDE

Promptmetheus facilitates the creation, optimization, and deployment of complex prompts and enables users to assemble prompts using different text and data blocks.

Prompt Design and Testing Toolset

The IDE provides a toolset and user experience tailored for prompt engineering, including features for composability, traceability, input variables, and a centralized prompt library. These features support the iterative development process, allowing users to experiment with various AI prompt configurations and fine-tune them for optimal performance. The traceability feature, in particular, keeps a detailed history of prompt versions and their outputs, enabling users to track the evolution of their prompts and understand the impact of changes.

Collaboration and Synchronization Features

Promptmetheus supports real-time collaboration for teams and device synchronization, to streamline the prompt development process for distributed teams and individuals working across multiple devices. These features ensure that all team members have access to the latest versions of AI prompts and can contribute to the development process effectively, regardless of their location or the device they are using.

You can find more information about Promptmetheus’ features on its website.

Want to learn more? You can find Mirascope’s code samples mentioned in this article on both our documentation site and on GitHub.
