---
title: Get consistent data from your LLM with JSON Schema
teaser: How to parse content from a tool that is made to speak in human sentences.
tags: json,large language models,artificial intelligence,ruby
author: Rémy Hannequin
published_on: 2024-02-13
---

We recently started an internal project to gain experience with **Large Language Models (LLMs)** and generate content for a role play-based application. If you are unfamiliar with LLMs or AI in general, a good starting point could be [Mike and Steffani's article] on the topic.

[Mike and Steffani's article]: https://thoughtbot.com/blog/what-llms-are-in-ai

One of the things our app does is generate the structure of a new game, such as the main character, the background story or the first choice the player will have to face. From the generated content, we want to create objects and store the data in database tables.

Many of us have already chatted with an AI model. They are impressive at understanding human sentences and answering in the same way. For our project, we can simply ask in a natural way:

```
> I want to play a role-play game. Can you generate a character for me and a background story for my adventure? I would also like to face a first situation and have to choose between three different choices to move forward.
```

This is an actual answer based on the previous input:

```
Of course! Let's begin with the character generation and background story:

Character Generation:
Name: Aria Moonstone
Age: 25
Gender: Female
Race: Half-Elf (Elven ancestry on her mother's side, Human father)
Class: Rogue/Bard
Appearance: Aria has long silver hair that she often keeps in a braid. Her eyes are a mix of blue and green, [...].

Background Story:
Aria Moonstone grew up in the bustling city of Elvendale, [...]. One day, after performing a particularly heart-wrenching ballad, [...].

First Situation:
You find yourself at the edge of Whisperwood Forest, a dense and ominous forest [...].
You can choose one of the following options to proceed:
1. Follow the path on the left, which appears to be more well-traveled and easier to navigate but leads through the heart of the forest where dangerous creatures are known to reside.
2. Take the middle path, which is less trodden and seems to lead towards a small village that might provide some information about the lost relic.
3. Choose the rightmost path, which appears to be overgrown and difficult to follow but could potentially lead to an ancient ruin where the relic might be hidden.
```

There are already a few lessons here.

First, **each model can be more or less talkative**. For this experiment, we are using a fine-tuned model based on [Mistral 7B] called [OpenHermes]. It generated such long paragraphs that I trimmed most of the content, marked with `[...]`, to make this article more pleasant to read.

[Mistral 7B]: https://huggingface.co/mistralai/Mistral-7B-v0.1
[OpenHermes]: https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B

Second, many of **these models are made to answer a human, like a human**, and are polite. Notice the first words: *"Of course! Let's begin"*. It is always pleasant to feel like we are having a nice conversation with someone helping us. However, in our case these greetings just make the content harder to parse.

Finally, the generated content is already not too bad to parse. It is **well structured**; a few regular expressions would do the job.

## Problem with consistency

Even if this payload is parsable, it is still not something we can safely rely on. Using the same model, I started a new prompt and provided the exact same input.
This time, a few differences appeared in the structure of the answer:

```diff
- Character Generation:
+ Character:
```

```diff
- Race: Half-Elf (Elven ancestry on her mother's side, Human father)
+ Race/Species: Half-Elf
```

```diff
- First Situation:
+ Adventure Situation:
```

```diff
-
-
+
+ What choice do you make, Aria Moonstone?
```

While these don't seem like big differences, especially from a human reader's perspective, they make things less predictable for the program in charge of parsing the content. It looks like the AI model could unpredictably change the order, labels or even presence of each section.

What about changing models, if at some point we want to experiment with another trained model to generate original content? I tried with [OpenChat], another fine-tuned model not based on Mistral. I am not going to list every single difference, but the response was very different. The most important difference was missing information: we no longer had the age, gender, race or class attributes.

[OpenChat]: https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v0.6

We could improve our prompt to make explicit what kind of information we are expecting. While improving the prompt is beneficial most of the time, prompt wording alone will never give us full confidence in the format of the generated output.

## Bring consistency with a schema

In programming, we use schemas all the time. In a web application context with Rails, we have the database schema `db/schema.rb`. When using or providing a web API, we often have to deal with schemas as well (SOAP, JSON, [GraphQL], ...).

[GraphQL]: https://thoughtbot.com/blog/tdding-with-react-and-apollo

These schemas exist for much the same reason we need one for our content parser: **to share a vocabulary that enables consistency, validity, interaction control and documentation.**

I have used [JSON Schema] on multiple projects, mostly in tests to validate the format of a JSON API output, but also to validate user input.
It is just one tool that makes it easier for two systems to communicate with each other. Why not apply this to the LLM?

[JSON Schema]: https://json-schema.org/

## About JSON Schema

JSON Schema is a [specification] used for **testing, validation and documentation**. As I write these lines, the latest stable version dates from December 2020, which is old enough that most trained models out there probably know it.

[specification]: https://json-schema.org/specification

You can specify different data types such as strings, numbers, arrays and objects, but also constraints or presence validation.

For example, if I want the JSON object to have a character described as an object with a mandatory `health_points` attribute (an integer between 0 and 100), I can use the following schema:

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "character": {
      "type": "object",
      "properties": {
        "health_points": {
          "description": "Character's health points (HP)",
          "type": "integer",
          "minimum": 0,
          "maximum": 100
        }
      },
      "required": [
        "health_points"
      ]
    }
  },
  "required": [
    "character"
  ]
}
```

This looks close to what we need.

## Consistency with business logic constraints

Now that we have access to a great tool to describe the format and values we expect, let's have a look at a real example we have implemented in our project.
We have the following models: `Player`, `Step` and `Choice`.

```rb
class Player < ApplicationRecord
  validates :name, presence: true # string
  validates :health_points, presence: true # integer
  validates :category, presence: true # integer
  validates :background, presence: true # text

  enum category: {
    witch: 0,
    wizard: 1,
    warrior: 2,
    elf: 3,
    dwarf: 4,
    hobbit: 5
  }
end

class Step < ApplicationRecord
  has_many :choices, dependent: :destroy

  validates :description, presence: true # text
end

class Choice < ApplicationRecord
  belongs_to :step

  validates :description, presence: true # string
end
```

When we generate the game, we need to give the AI model a comprehensive request that will make it return all the information we want, exactly how we want it. Thankfully, we can use a plain Ruby object to fetch all the constraints and make them readable to the model.

Because we are a little bit greedy, we want the request to be dynamic. The models and the database have some constraints that we want to be automatically reflected in the JSON Schema. Basically, **the fewer human sentences we write, the better our chances of receiving accurate and consistent data**.

We also want the character's background story and the step description to be several sentences long, so that the player feels immersed in the story. The choices offered to the character, though, must be brief.

Finally, we just have to wrap the JSON Schema inside a few sentences to let the model understand our intentions.

**Some constraints about the meaning of the content itself cannot be defined as rules in the schema**. If, for example, we want the `character.background` attribute to be written in Old English, JSON Schema cannot enforce that: it only describes structure and format. We tried using the `description` keyword available in the specification for this, but the results were poor and inconsistent.
```rb
class NewGamePrompt
  def self.generate
    new.generate
  end

  # Builds the full prompt: the schema first, then the request, then a
  # reminder of the expected format.
  def generate
    "
      Consider the following JSON Schema based on the 2020-12 specification:

      ```json
      #{schema.to_json}
      ```

      This JSON Schema represents the format I want you to follow to generate your answer.

      Now, generate a JSON object that will contain the following information:

      I want to create a role play game. You will provide a character and a background story for the quest.
      Based on the background story, you will provide choices the player can take to continue the story.

      Based on all this information, generate a valid JSON object containing the game information I requested.
    "
  end

  private

  # Allowed character classes, read from the Player enum so the schema
  # stays in sync with the model.
  def character_types
    Player.categories.keys
  end

  def schema
    {
      "$schema": "https://json-schema.org/draft/2020-12/schema",
      type: :object,
      properties: {
        character: {
          type: :object,
          properties: {
            name: {
              description: "Name of the character",
              type: :string
            },
            type: {
              description: "Class of the character",
              type: :string,
              enum: character_types
            },
            background: {
              description: "Background story of the character",
              type: :string,
              minLength: 300
            },
            health: {
              description: "Initial health points of the character",
              type: :integer, # matches the integer health_points column
              minimum: 0,
              maximum: 100
            }
          },
          required: %w[name type background health]
        },
        step: {
          type: :object,
          properties: {
            description: {
              description: "Detailed first step in the story",
              type: :string,
              minLength: 300
            },
            choices: {
              type: :array,
              minItems: 3,
              maxItems: 3,
              items: {
                type: :object,
                properties: {
                  description: {
                    description: "Possible choice",
                    type: :string,
                    maxLength: 250
                  }
                },
                required: %w[description]
              }
            }
          },
          # Only properties defined above can be required.
          required: %w[description choices]
        }
      },
      # Top-level requirement: a sibling of `properties`, not nested in it.
      required: %w[character step]
    }
  end
end
```

*Side note*: This is just a plain Ruby object, so it can be tested.
Here is an example of what test coverage for this object could look like:

```rb
RSpec.describe NewGamePrompt do
  describe ".generate" do
    it "returns a prompt" do
      prompt = described_class.generate

      expect(prompt).to be_a String
    end

    it "includes the player's possible categories" do
      prompt = described_class.generate

      Player.categories.keys.each do |category|
        expect(prompt).to include category
      end
    end
  end
end
```

## Final reliable response

After these efforts to properly describe the format we expect, so that we can parse it safely, here is an actual response from the LLM:

```json
{
  "character": {
    "name": "Eadric Stormwind",
    "type": "wizard",
    "background": "Eadric Stormwind is a skilled mage from the mystical city of Arcanis. He was born into a family of renowned wizards and grew up in a magical household, learning the art of spell casting at an early age. As he matured, Eadric developed a strong affinity for the element of fire, which led him to specialize in Pyro-magic.",
    "health": 100
  },
  "step": {
    "description": "After months of travel, Eadric finally reached the ruins of the a civilization. As Eadric approached the entrance to the ruins, a mysterious figure emerged from the shadows. The figure introduced itself as the guardian of the ruins and demanded that Eadric prove their worth before being allowed to enter.",
    "choices": [
      {
        "description": "Challenge the guardian to a magical duel to prove your worth."
      },
      {
        "description": "Convince the guardian of your good intentions by sharing your knowledge of ancient magic."
      },
      {
        "description": "Sneak past the guardian and hope to avoid detection."
      }
    ]
  }
}
```

The generated content can be safely parsed and used to create records.

### Disclaimer

Depending on the trained model we used, we discovered that sometimes the AI _really_ wanted to answer like a human. We sometimes got this kind of sentence before the JSON code block:

```
Here's an example of a JSON object that follows the provided schema:
[...]
```

For our internal experiment, we just added a regular expression to parse only what's inside the `json` block.

Another thing to keep in mind is that it took us a few iterations to find the right sentences, and the right order, to accompany the JSON Schema. With three different models (`openhermes2.5-mistral`, `llama2` and `openchat`), we tested changing the order of sentences to follow the pattern "desired content → requested format" instead of "requested format → desired content → reminder of the format". Each time, the model returned the JSON Schema itself rather than a valid JSON object that follows said schema.

Our prompt is not perfect, and it might take a few attempts to accompany a JSON Schema with the right context so that the model you use understands exactly what you expect.

## Conclusion

LLMs are powerful tools. Just like any tool, if we want to use them properly, we need to understand them and adapt to their capabilities. **Using a schema is one solution for getting closer to consistency.**

Let's keep in mind that all models are different. Powerful and famous ones are sometimes proprietary: we don't know how they were built or what they were trained on. Alongside questions about ethics, this also means **we need to develop skills and tools when using them to get the best outcome**.
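
To make the workaround from the disclaimer concrete, here is a minimal sketch of pulling the `json` block out of a chatty response before parsing it. The `extract_game_json` helper, the `JSON_BLOCK` pattern and the sample response are assumptions written for illustration; they are not the code from our project, and they assume the model wraps its answer in a single `json` fence.

```ruby
require "json"

# Three backticks, built at runtime so this snippet never contains a literal
# Markdown fence of its own.
FENCE = "`" * 3

# Hypothetical pattern: captures whatever sits between an opening "json"
# fence and the next closing fence, across multiple lines.
JSON_BLOCK = /#{FENCE}json\s*(?<payload>.*?)#{FENCE}/m

# Hypothetical helper. Parses the first fenced "json" block of a response,
# or the whole response when no fence is found. Raises JSON::ParserError
# when the selected text is not valid JSON.
def extract_game_json(response)
  match = JSON_BLOCK.match(response)
  payload = match ? match[:payload] : response
  JSON.parse(payload)
end

# Made-up response imitating a model that greets before answering.
response = <<~TEXT
  Here's an example of a JSON object that follows the provided schema:

  #{FENCE}json
  { "character": { "name": "Eadric Stormwind", "type": "wizard", "health": 100 } }
  #{FENCE}
TEXT

game = extract_game_json(response)
puts game.dig("character", "name") # prints Eadric Stormwind
```

Parsing only proves that the payload is JSON, not that it follows the schema. A reasonable next step would be to check the resulting hash against the same JSON Schema with a validator library before creating any record; that step is left out of this sketch.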