---
title: Get consistent data from your LLM with JSON Schema
teaser: How to parse content from a tool that is made to speak in human sentences.
tags: json,large language models,artificial intelligence,ruby
author: Rémy Hannequin
published_on: 2024-02-13
---

We recently started an internal project to gain experience with **Large Language
Models (LLMs)** and generate content for a role play-based application. If you
are unfamiliar with LLMs or <abbr title="Artificial Intelligence">AI</abbr> in
general, a good starting point could be [Mike and Steffani's article] on the
topic.

[Mike and Steffani's article]: https://thoughtbot.com/blog/what-llms-are-in-ai

One of our app's uses is to generate the structure of a new game, such as the
main character, the background story or the first choice the player will have to
face. From the generated content, we want to create objects and store the data
in database tables.

Many of us have already chatted with an AI model. They are impressive at
understanding human sentences and answering in the same way. For our project, we
can simply ask in a natural way:

```
> I want to play a role-play game. Can you generate a character for
me and a background story for my adventure? I would also like to
face a first situation and have to choose between three different
choices to move forward.
```

This is an actual answer based on the previous input:

```
Of course! Let's begin with the character generation and background
story:

Character Generation:
Name: Aria Moonstone
Age: 25
Gender: Female
Race: Half-Elf (Elven ancestry on her mother's side, Human father)
Class: Rogue/Bard
Appearance: Aria has long silver hair that she often keeps in a
braid. Her eyes are a mix of blue and green, [...].

Background Story:
Aria Moonstone grew up in the bustling city of Elvendale, [...].

One day, after performing a particularly heart-wrenching
ballad, [...].

First Situation:
You find yourself at the edge of Whisperwood Forest, a dense and
ominous forest [...]. You can choose one of the following options
to proceed:

1. Follow the path on the left, which appears to be more
well-traveled and easier to navigate but leads through the heart of
the forest where dangerous creatures are known to reside.
2. Take the middle path, which is less trodden and seems to lead
towards a small village that might provide some information about the
lost relic.
3. Choose the rightmost path, which appears to be overgrown and
difficult to follow but could potentially lead to an ancient ruin
where the relic might be hidden.
```

There are already a few lessons here.

First, **each model can be more or less talkative**. For this experiment, we are
using a fine-tuned model based on [Mistral 7B] called [OpenHermes]. It generated
such long paragraphs that I had to trim most of the content with `[...]` to make
this article more pleasant to read.

[Mistral 7B]: https://huggingface.co/mistralai/Mistral-7B-v0.1

[OpenHermes]: https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B

Second, many of **these models are made to answer a human, like a human**, and
are polite. Notice the first words: *"Of course! Let's begin"*. It is always
pleasant to feel like we are having a nice conversation with someone helping us.
However, in our case these greetings just make it harder to parse the content.

Finally, the generated content is not too hard to parse as it stands. It is
**well structured**, so a few regular expressions would do the job.
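
As a rough sketch of what that parsing could look like, assuming the answer
always uses the section titles and `Label: value` lines shown above (the
`parse_character` helper and the shortened sample text are made up for
illustration):

```ruby
# Shortened copy of the answer above, used as sample input.
ANSWER = <<~TEXT
  Of course! Let's begin with the character generation and background
  story:

  Character Generation:
  Name: Aria Moonstone
  Age: 25
  Gender: Female
  Class: Rogue/Bard

  Background Story:
  Aria Moonstone grew up in the bustling city of Elvendale.
TEXT

# Hypothetical helper: extracts "Label: value" pairs from the character
# section. It relies on the exact section titles seen in this one answer.
def parse_character(text)
  # Capture everything between the two section titles.
  section = text[/^Character Generation:\n(.*?)\n\nBackground Story:/m, 1]
  return {} if section.nil?

  # Turn each "Label: value" line into a hash entry.
  section.scan(/^([A-Z][\w\/ ]*): (.+)$/).to_h
end

character = parse_character(ANSWER)
```

Here `character` ends up as a hash of strings keyed by label, and the polite
introduction is simply ignored because it sits outside the captured section.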

## Problem with consistency

Even if this payload is parsable, it still is not something we can safely rely
on.

Using the same model, I started a new prompt and provided the exact same input.
This time, a few differences appeared in the structure of the answer:

```diff
- Character Generation:
+ Character:
```

```diff
- Race: Half-Elf (Elven ancestry on her mother's side, Human father)
+ Race/Species: Half-Elf
```

```diff
- First Situation:
+ Adventure Situation:
```

```diff
-
-
+
+ What choice do you make, Aria Moonstone?
```

While these don't seem like big differences, especially from a human reader's
perspective, they make things less predictable for the program in charge of
parsing the content.

It looks like the AI model could unpredictably change the order, labels or even
presence of each section.
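
To make the problem concrete, here is a small sketch of a pattern written
against the labels of the first answer and then run on the second one. The
pattern and both sample strings are simplified assumptions, not our real
parser:

```ruby
# Hypothetical pattern tied to the section title of the first answer.
SECTION_PATTERN = /^Character Generation:\n(.*?)\n\n/m

first_answer = "Character Generation:\nName: Aria Moonstone\nRace: Half-Elf\n\nBackground Story:\nAria grew up in Elvendale."
second_answer = "Character:\nName: Aria Moonstone\nRace/Species: Half-Elf\n\nBackground Story:\nAria grew up in Elvendale."

# String#[] with a regexp returns the capture group, or nil without a match.
first_section = first_answer[SECTION_PATTERN, 1]
second_section = second_answer[SECTION_PATTERN, 1]
```

`first_section` holds the two attribute lines, while `second_section` is `nil`:
a single renamed title is enough to make the parser miss the whole character.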

What about changing models, if at some point we want to experiment with another
trained model to generate original content?
I tried with [OpenChat], another fine-tuned model not based on Mistral. I am not
going to list every single difference, but the response was very different. The
most important difference was missing information: we don't have age, gender,
race or class attributes anymore.

[OpenChat]: https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v0.6

We could improve our prompt to make it explicit what kind of information we are
expecting. While improving the prompt is beneficial most of the time, it will
never give us full confidence in the format of the generated output.

## Bring consistency with a schema

In programming, we use schemas all the time. In a web application context with
Rails, we have the database schema `db/schema.rb`. Using or providing a
Web <abbr title="Application Programming Interface">API</abbr>, we often have to
deal with schemas as well (SOAP, JSON, [GraphQL], ...).

[GraphQL]: https://thoughtbot.com/blog/tdding-with-react-and-apollo

These schemas exist for a reason very similar to our content parser's:
**sharing a vocabulary to enable consistency, validity, interaction control and
documentation.**

I have used [JSON Schema] on multiple projects, mostly in tests to validate the
format of a JSON API output, but also to validate user input. It is just one
tool to make it easier for two systems to communicate together. Why not apply
this to the LLM?

[JSON Schema]: https://json-schema.org/

## About JSON Schema

JSON Schema is a [specification] used for **testing, validation and
documentation**. As I write these lines, the latest stable version is from
December 2020, which is old enough for most trained models out there to
probably know it.

[specification]: https://json-schema.org/specification

You can specify different data types such as strings, numbers, arrays, objects,
but also constraints or presence validation.

For example, if I want the JSON object to have a character described as an
object with a mandatory `health_points` attribute (an integer between 0 and
100), I can use the following schema:

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "character": {
      "type": "object",
      "properties": {
        "health_points": {
          "description": "Character's health points (HP)",
          "type": "integer",
          "minimum": 0,
          "maximum": 100
        }
      },
      "required": [
        "health_points"
      ]
    }
  },
  "required": [
    "character"
  ]
}
```
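
To give an idea of what such a schema buys us, here is a hand-rolled sketch
that checks a payload against a trimmed-down copy of it. It only understands
the few keywords used here (`type`, `properties`, `required`, `minimum`,
`maximum`), the `schema_errors` helper is hypothetical, and in a real project
a dedicated JSON Schema validation gem would be a better fit:

```ruby
require "json"

# Trimmed copy of the schema, with "integer" assumed for health_points.
SCHEMA = JSON.parse(<<~DOC)
  {
    "type": "object",
    "properties": {
      "character": {
        "type": "object",
        "properties": {
          "health_points": { "type": "integer", "minimum": 0, "maximum": 100 }
        },
        "required": ["health_points"]
      }
    },
    "required": ["character"]
  }
DOC

TYPES = { "object" => Hash, "integer" => Integer, "number" => Numeric, "string" => String }.freeze

# Returns a list of human-readable errors, empty when the data is valid.
def schema_errors(schema, data, path = "$")
  type = TYPES[schema["type"]]
  return ["#{path}: expected #{schema["type"]}"] if type && !data.is_a?(type)

  errors = []
  if data.is_a?(Numeric)
    errors << "#{path}: below minimum" if schema["minimum"] && data < schema["minimum"]
    errors << "#{path}: above maximum" if schema["maximum"] && data > schema["maximum"]
  end

  if data.is_a?(Hash)
    schema.fetch("required", []).each do |key|
      errors << "#{path}.#{key}: missing" unless data.key?(key)
    end
    # Recurse into every described property that is present.
    schema.fetch("properties", {}).each do |key, subschema|
      errors.concat(schema_errors(subschema, data[key], "#{path}.#{key}")) if data.key?(key)
    end
  end

  errors
end

valid = schema_errors(SCHEMA, JSON.parse('{"character": {"health_points": 80}}'))
too_high = schema_errors(SCHEMA, JSON.parse('{"character": {"health_points": 250}}'))
missing = schema_errors(SCHEMA, JSON.parse('{"character": {}}'))
```

`valid` is an empty array, while `too_high` and `missing` each contain one
error pointing at `$.character.health_points`.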

This looks a lot like something we could use.

## Consistency with business logic constraints

Now that we have access to a great tool to describe the format and values we
expect, let's have a look at a real example we have implemented in our project.

We have the following models: `Player`, `Step` and `Choice`.

```rb

class Player < ApplicationRecord
  validates :name, presence: true # string
  validates :health_points, presence: true # integer
  validates :category, presence: true # integer
  validates :background, presence: true # text

  enum category: {
    witch: 0,
    wizard: 1,
    warrior: 2,
    elf: 3,
    dwarf: 4,
    hobbit: 5
  }
end

class Step < ApplicationRecord
  has_many :choices, dependent: :destroy

  validates :description, presence: true # text
end

class Choice < ApplicationRecord
  belongs_to :step

  validates :description, presence: true # string
end
```

When we generate the game, we need to give the AI model a comprehensive request
that will make it return all the information we want, exactly how we want.
Thankfully, we can use a plain Ruby object to fetch all the constraints and make
them readable to the model.

Because we are a little bit greedy, we want the request to be dynamic. The
models and the database have some constraints that we want to be automatically
reflected in the JSON Schema. Basically, **the fewer human sentences we write,
the better our chances of receiving accurate and consistent data**.

We also want the character's background story and the step description to be
multiple sentences long, so that the player feels immersed in the story. The
possible choices offered to the character, though, must be brief.

Finally, we just have to wrap the JSON Schema inside a few sentences to let the
model understand our intentions. **Some constraints about the meaning of the
content itself cannot be defined as rules in the schema**. If we want, for
example, the `character.background` attribute to be written in Old English,
JSON Schema cannot express it, as it only describes structure and format. We
tried to use the `description` attribute available in the specification, but we
got poor and inconsistent results.

```rb

class NewGamePrompt
  def self.generate
    new.generate
  end

  def generate
    "
    Consider the following JSON Schema based on the 2020-12
    specification:

    ```json

    #{schema.to_json}

    ```

    This JSON Schema represents the format I want you to follow to
    generate your answer.
    Now, generate a JSON object that will contain the following
    information:
    I want to create a role play game. You will provide a character
    and a background story for the quest.
    Based on the background story, you will provide choices the
    player can take to continue the story.
    Based on all this information, generate a valid JSON object
    containing the game information I requested.
   "
  end

  private

  def character_types
    Player.categories.keys
  end

  def schema
    {
      "$schema": "https://json-schema.org/draft/2020-12/schema",
      type: :object,
      properties: {
        character: {
          type: :object,
          properties: {
            name: {
              description: "Name of the character",
              type: :string
            },
            type: {
              description: "Class of the character",
              type: :string,
              enum: character_types
            },
            background: {
              description: "Background story of the character",
              type: :string,
              minLength: 300
            },
            health: {
              description: "Initial health points of the character",
              type: :integer,
              minimum: 0,
              maximum: 100
            }
          },
          required: %w[name type background health]
        },
        step: {
          type: :object,
          properties: {
            description: {
              description: "Detailed first step in the story",
              type: :string,
              minLength: 300
            },
            choices: {
              type: :array,
              minItems: 3,
              maxItems: 3,
              items: {
                type: :object,
                properties: {
                  description: {
                    description: "Possible choice",
                    type: :string,
                    maxLength: 250
                  }
                },
                required: %w[description]
              }
            }
          },
          required: %w[description choices]
        }
      },
      required: %w[character step]
    }
  end
end
```

*Side note*: Since this is just a plain Ruby object, it is easy to test. Here is
an example of what test coverage for this object could look like:

```rb
RSpec.describe NewGamePrompt do
  describe "::generate" do
    it "returns a prompt" do
      prompt = described_class.generate

      expect(prompt).to be_a String
    end

    it "includes the player's possible categories" do
      prompt = described_class.generate

      Player.categories.keys.each do |category|
        expect(prompt).to include category
      end
    end
  end
end
```

## Final reliable response

After these efforts to properly describe the format we expect, so that we can
parse it safely, here is an actual response from the LLM:

```json
{
  "character": {
    "name": "Eadric Stormwind",
    "type": "wizard",
    "background": "Eadric Stormwind is a skilled mage from the mystical city of Arcanis. He was born into a family of renowned wizards and grew up in a magical household, learning the art of spell casting at an early age. As he matured, Eadric developed a strong affinity for the element of fire, which led him to specialize in Pyro-magic.",
    "health": 100
  },
  "step": {
    "description": "After months of travel, Eadric finally reached the ruins of the a civilization. As Eadric approached the entrance to the ruins, a mysterious figure emerged from the shadows. The figure introduced itself as the guardian of the ruins and demanded that Eadric prove their worth before being allowed to enter.",
    "choices": [
      {
        "description": "Challenge the guardian to a magical duel to prove your worth."
      },
      {
        "description": "Convince the guardian of your good intentions by sharing your knowledge of ancient magic."
      },
      {
        "description": "Sneak past the guardian and hope to avoid detection."
      }
    ]
  }
}
```

The generated content is safe to parse and use for record creation.
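
As an illustration, here is a sketch of how such a payload could be mapped to
our models. Plain `Struct`s stand in for the ActiveRecord classes so the
example runs without Rails, and the `build_game` helper, the shortened payload
and the attribute mapping (`type` to `category`, `health` to `health_points`)
are assumptions made for this example:

```ruby
require "json"

# Stand-ins for the ActiveRecord models, so the sketch runs without Rails.
Player = Struct.new(:name, :category, :background, :health_points, keyword_init: true)
Choice = Struct.new(:description, keyword_init: true)
Step = Struct.new(:description, :choices, keyword_init: true)

# Hypothetical mapper from the parsed response to model attributes.
# Hash#fetch raises a KeyError as soon as an expected key is missing.
def build_game(payload)
  character = payload.fetch("character")
  step = payload.fetch("step")

  player = Player.new(
    name: character.fetch("name"),
    category: character.fetch("type"),
    background: character.fetch("background"),
    health_points: character.fetch("health")
  )
  choices = step.fetch("choices").map do |choice|
    Choice.new(description: choice.fetch("description"))
  end

  [player, Step.new(description: step.fetch("description"), choices: choices)]
end

# Shortened version of the response above.
response = <<~DOC
  {
    "character": {
      "name": "Eadric Stormwind",
      "type": "wizard",
      "background": "Eadric Stormwind is a skilled mage from the mystical city of Arcanis.",
      "health": 100
    },
    "step": {
      "description": "A mysterious figure emerged from the shadows.",
      "choices": [
        { "description": "Challenge the guardian to a magical duel to prove your worth." },
        { "description": "Sneak past the guardian and hope to avoid detection." }
      ]
    }
  }
DOC

player, first_step = build_game(JSON.parse(response))
```

With the real models, the same attributes would go through the usual
validations before being saved.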

### Disclaimer

Depending on the trained model we used, we discovered that sometimes the AI
_really_ wanted to answer like a human. We would get this kind of sentence
before the JSON code block:

```
Here's an example of a JSON object that follows the provided schema:

[...]
```

For our internal experiment we just added a regular expression to only parse
what's inside the `json` block.
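
That extraction could look something like the sketch below. The
`extract_json` helper and the exact pattern are illustrative guesses, not the
expression we actually shipped:

```ruby
require "json"

# Three backticks, built this way to keep this listing readable.
FENCE = "`" * 3
# Captures whatever sits between an opening json fence and the next fence.
JSON_BLOCK = /#{FENCE}json\s*(.*?)#{FENCE}/m

# Hypothetical helper: parses the fenced JSON, or the whole answer when the
# model did not wrap its JSON in a code block.
def extract_json(answer)
  body = answer[JSON_BLOCK, 1]
  JSON.parse(body || answer)
end

answer = [
  "Here's an example of a JSON object that follows the provided schema:",
  "",
  "#{FENCE}json",
  '{"character": {"name": "Eadric Stormwind", "health": 100}}',
  FENCE
].join("\n")

game = extract_json(answer)
```

If the model adds prose without any code fence, `JSON.parse` raises a
`JSON::ParserError`, which is a reasonable signal to retry the request.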

Another thing to keep in mind is that it took us a few iterations to find the
right sentences, and the right order for them, to go with the JSON Schema.
With three different models (`openhermes2.5-mistral`, `llama2` and `openchat`),
we tried the order "desired content → requested format" instead of
"requested format → desired content → reminder of the format". Each time, the
model returned the JSON Schema itself rather than a valid JSON object that
follows the schema.

Our prompt is not perfect, and it might take you a few attempts to find the
right context to accompany a JSON Schema so that the model you use understands
exactly what you expect.

## Conclusion

LLMs are powerful tools. Just like any tool, if we want to use them properly, we
need to understand them and adapt to their capabilities. **Using a schema is one
solution for getting closer to consistency.**

Let's keep in mind that all models are different. Powerful and famous ones are
sometimes proprietary: we don't know how they were built or what they were
trained on. Alongside questions about ethics, this also means **we need to
develop skills and tools when using them to get the best outcome**.
