We recently started an internal project to gain experience with Large Language Models (LLMs) and generate content for a role-play-based application. If you are unfamiliar with LLMs or AI in general, a good starting point could be Mike and Steffani’s article on the topic.
One of our app’s use cases is to generate the structure of a new game, such as the main character, the background story or the first choice the player will have to face. From the generated content, we want to create objects and store the data in database tables.
Many of us have already communicated with an AI model. These models are impressive at understanding human sentences and answering in the same way. For our project, we can simply ask in a natural way:
> I want to play a role-play game. Can you generate a character for
> me and a background story for my adventure? I would also like to
> face a first situation and have to choose between three different
> choices to move forward.
This is an actual answer based on the previous input:
Of course! Let's begin with the character generation and background
story:
Character Generation:
Name: Aria Moonstone
Age: 25
Gender: Female
Race: Half-Elf (Elven ancestry on her mother's side, Human father)
Class: Rogue/Bard
Appearance: Aria has long silver hair that she often keeps in a
braid. Her eyes are a mix of blue and green, [...].
Background Story:
Aria Moonstone grew up in the bustling city of Elvendale, [...].
One day, after performing a particularly heart-wrenching
ballad, [...].
First Situation:
You find yourself at the edge of Whisperwood Forest, a dense and
ominous forest [...]. You can choose one of the following options
to proceed:
1. Follow the path on the left, which appears to be more
well-traveled and easier to navigate but leads through the heart of
the forest where dangerous creatures are known to reside.
2. Take the middle path, which is less trodden and seems to lead
towards a small village that might provide some information about the
lost relic.
3. Choose the rightmost path, which appears to be overgrown and
difficult to follow but could potentially lead to an ancient ruin
where the relic might be hidden.
There are already a few learnings here.
First, each model can be more or less talkative. For this experiment, we are using OpenHermes, a fine-tuned model based on Mistral 7B, and it generated long paragraphs; I had to trim most of the content, marking the cuts with [...], to make this article more pleasant to read.
Second, many of these models are made to answer a human, like a human, and are polite. You can notice the first words being “Of course! Let’s begin”. It is always pleasant to feel like we are having a nice conversation with someone helping us. However, in our case these greetings just make the content harder to parse.
Finally, the generated content is already not too bad to parse. It is well structured; a few regular expressions would do the job.
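For illustration, here is what such a parser could look like (a minimal sketch with a hypothetical helper, assuming the model keeps the exact “Label: value” layout shown above):

```ruby
# Hypothetical parser for the free-text answer above. It assumes the
# model keeps the exact "Label: value" layout.
def parse_character(answer)
  {
    name:  answer[/^Name: (.+)$/, 1],
    age:   answer[/^Age: (\d+)$/, 1]&.to_i,
    race:  answer[/^Race: (.+)$/, 1],
    class: answer[/^Class: (.+)$/, 1]
  }
end
```

The fragility is easy to see: as soon as the model renames a label or drops a section, each of these expressions silently returns nil.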
Problem with consistency
Even if this payload is parsable, it is still not something we can safely rely on.
Using the same model, I started a new prompt and provided the exact same input. This time, a few differences appeared in the structure of the answer:
- Character Generation:
+ Character:
- Race: Half-Elf (Elven ancestry on her mother's side, Human father)
+ Race/Species: Half-Elf
- First Situation:
+ Adventure Situation:
-
-
+
+ What choice do you make, Aria Moonstone?
While these don’t seem like big differences, especially from a human-reading perspective, they make things less predictable for the program in charge of parsing the content.
It looks like the AI model could unpredictably change the order, labels or even presence of each section.
What about changing the model? What if, at some point, we want to experiment with another trained model to generate original content? I tried OpenChat, another fine-tuned model, this one not based on Mistral. I am not going to list every single difference, but the response was very different. The most important one was the missing information: we no longer have the age, gender, race or class attributes.
We could improve our prompt to make explicit what kind of information we expect, and while improving the prompt is beneficial most of the time, wording alone will never give us full confidence in the format of the generated output.
Bring consistency with a schema
In programming, we use schemas all the time. In a web application context with Rails, we have the database schema db/schema.rb. Using or providing a Web API, we often have to deal with schemas as well (SOAP, JSON, GraphQL, …).
These schemas exist for a reason very similar to our content parser’s: share a vocabulary to enable consistency, validity, interaction control and documentation.
I have used JSON Schema on multiple projects, mostly in tests to validate the format of a JSON API output, but also to validate user input. It is just one tool that makes it easier for two systems to communicate. Why not apply this to the LLM?
About JSON Schema
JSON Schema is a specification used for testing, validation and documentation. As I write these lines, the latest stable version is from December 2020, which is old enough for most of the trained models out there to probably know it.
You can specify different data types such as strings, numbers, arrays and objects, but also constraints or presence validations.
For example, if I want the JSON object to have a character described as an object with a mandatory health_points attribute (a number between 0 and 100), I can use the following schema:
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "character": {
      "type": "object",
      "properties": {
        "health_points": {
          "description": "Character's health points (HP)",
          "type": "number",
          "minimum": 0,
          "maximum": 100
        }
      },
      "required": [
        "health_points"
      ]
    }
  },
  "required": [
    "character"
  ]
}
This looks a lot like something we could be interested in.
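In Ruby, checking a payload against such a schema is straightforward. Here is a sketch using the json_schemer gem (one validation option among others; the schema is the one shown above, trimmed down):

```ruby
require "json"
require "json_schemer"

# Build a validator from the schema above (shortened for brevity).
schema = JSON.parse(<<~JSON)
  {
    "type": "object",
    "properties": {
      "character": {
        "type": "object",
        "properties": {
          "health_points": { "type": "number", "minimum": 0, "maximum": 100 }
        },
        "required": ["health_points"]
      }
    },
    "required": ["character"]
  }
JSON

schemer = JSONSchemer.schema(schema)
schemer.valid?("character" => { "health_points" => 50 })  # => true
schemer.valid?("character" => { "health_points" => 150 }) # => false, above maximum
schemer.valid?("character" => {})                         # => false, health_points is required
```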
Consistency with business logic constraints
Now that we have access to a great tool to describe the format and values we expect, let’s have a look at a real example we have implemented in our project.
We have the following models: Player, Step and Choice:
class Player < ApplicationRecord
  validates :name, presence: true          # string
  validates :health_points, presence: true # integer
  validates :category, presence: true      # integer
  validates :background, presence: true    # text

  enum category: {
    witch: 0,
    wizard: 1,
    warrior: 2,
    elf: 3,
    dwarf: 4,
    hobbit: 5
  }
end

class Step < ApplicationRecord
  has_many :choices, dependent: :destroy

  validates :description, presence: true # text
end

class Choice < ApplicationRecord
  belongs_to :step

  validates :description, presence: true # string
end
When we generate the game, we need to give the AI model a comprehensive request that will make it return all the information we want, exactly how we want. Thankfully, we can use a plain Ruby object to fetch all the constraints and make them readable to the model.
Because we are a little bit greedy, we want the request to be dynamic. The models and the database have some constraints that we want to be automatically reflected in the JSON Schema. Basically, the fewer human sentences we write, the better our chances of receiving accurate and consistent data.
We also want the character’s background story and the step description to span multiple sentences, so the player feels immersed in the story. The possible choices offered to the character, though, must be short and brief.
Finally, we just have to wrap the JSON Schema inside a few sentences to let the model understand our intentions. Some constraints about the meaning of the content itself cannot be defined as rules in the schema. If we want, for example, the character.background attribute to be written in Old English, JSON Schema only provides a structure for the format. We tried to use the description attribute available in the specification, but we had poor and inconsistent results.
class NewGamePrompt
  def self.generate
    new.generate
  end

  def generate
    <<~PROMPT
      Consider the following JSON Schema based on the 2020-12 specification:

      ```json
      #{schema.to_json}
      ```

      This JSON Schema represents the format I want you to follow to generate your answer.

      Now, generate a JSON object that will contain the following information:

      I want to create a role play game. You will provide a character and a background story for the quest.
      Based on the background story, you will provide choices the player can take to continue the story.

      Based on all this information, generate a valid JSON object containing the game information I requested.
    PROMPT
  end

  private

  def character_types
    Player.categories.keys
  end

  def schema
    {
      "$schema": "https://json-schema.org/draft/2020-12/schema",
      type: :object,
      properties: {
        character: {
          type: :object,
          properties: {
            name: {
              description: "Name of the character",
              type: :string
            },
            type: {
              description: "Class of the character",
              type: :string,
              enum: character_types
            },
            background: {
              description: "Background story of the character",
              type: :string,
              minLength: 300
            },
            health: {
              description: "Initial health points of the character",
              type: :number,
              minimum: 0,
              maximum: 100
            }
          },
          required: %w[name type background health]
        },
        step: {
          type: :object,
          properties: {
            description: {
              description: "Detailed first step in the story",
              type: :string,
              minLength: 300
            },
            choices: {
              type: :array,
              minItems: 3,
              maxItems: 3,
              items: {
                type: :object,
                properties: {
                  description: {
                    description: "Possible choice",
                    type: :string,
                    maxLength: 250
                  }
                },
                required: %w[description]
              }
            }
          },
          required: %w[description choices]
        }
      },
      required: %w[character step]
    }
  end
end
Side note: This is just a plain Ruby object, so it can be tested. Here is an example of what test coverage for this object could look like:
RSpec.describe NewGamePrompt do
  describe "::generate" do
    it "returns a prompt" do
      prompt = described_class.generate

      expect(prompt).to be_a String
    end

    it "includes the player's possible categories" do
      prompt = described_class.generate

      Player.categories.keys.each do |category|
        expect(prompt).to include category
      end
    end
  end
end
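For completeness, here is roughly how the prompt can be sent to a model. This sketch assumes a local Ollama server exposing its /api/generate endpoint, which is one common way to run models such as OpenHermes; adapt it to whatever runtime you use:

```ruby
require "net/http"
require "json"

# Hypothetical glue code: send the generated prompt to a local Ollama
# server and read back the raw text answer.
uri = URI("http://localhost:11434/api/generate")
body = { model: "openhermes", prompt: NewGamePrompt.generate, stream: false }

response = Net::HTTP.post(uri, body.to_json, "Content-Type" => "application/json")
answer = JSON.parse(response.body)["response"]
```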
Final reliable response
After these efforts to properly describe the format we expect, so that we can safely parse it, here is an actual response from the LLM:
{
  "character": {
    "name": "Eadric Stormwind",
    "type": "wizard",
    "background": "Eadric Stormwind is a skilled mage from the mystical city of Arcanis. He was born into a family of renowned wizards and grew up in a magical household, learning the art of spell casting at an early age. As he matured, Eadric developed a strong affinity for the element of fire, which led him to specialize in Pyro-magic.",
    "health": 100
  },
  "step": {
    "description": "After months of travel, Eadric finally reached the ruins of a civilization. As Eadric approached the entrance to the ruins, a mysterious figure emerged from the shadows. The figure introduced itself as the guardian of the ruins and demanded that Eadric prove their worth before being allowed to enter.",
    "choices": [
      {
        "description": "Challenge the guardian to a magical duel to prove your worth."
      },
      {
        "description": "Convince the guardian of your good intentions by sharing your knowledge of ancient magic."
      },
      {
        "description": "Sneak past the guardian and hope to avoid detection."
      }
    ]
  }
}
The generated content is now safe to parse and use for record creation.
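To make that last step concrete, here is a minimal sketch of the record creation, based on the models shown earlier (llm_answer is a hypothetical variable holding the raw response; validation and error handling are omitted):

```ruby
# Minimal sketch: create the records from the parsed payload.
payload = JSON.parse(llm_answer)

Player.create!(
  name: payload["character"]["name"],
  category: payload["character"]["type"],
  background: payload["character"]["background"],
  health_points: payload["character"]["health"]
)

step = Step.create!(description: payload["step"]["description"])
payload["step"]["choices"].each do |choice|
  step.choices.create!(description: choice["description"])
end
```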
Disclaimer
Depending on the trained model we used, we discovered that sometimes the AI really wanted to answer like a human. We happened to get this kind of answer before the JSON code block:
Here's an example of a JSON object that follows the provided schema:
[...]
For our internal experiment we just added a regular expression to only parse what’s inside the json block.
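The extraction can be as simple as the following sketch (a hypothetical helper: it grabs the content of the first fenced json block and falls back to the whole answer if there is none):

```ruby
# Extract the content of the first ```json fenced block, if any.
def extract_json_payload(answer)
  answer[/```json\s*(.*?)```/m, 1] || answer
end
```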
Another thing to keep in mind is the few iterations it took us to find the right sentences, and the right order, to go with the JSON Schema. We tested changing the order of sentences to “desired content → requested format”, instead of “requested format → desired content → reminder of the format”, with three different models (openhermes2.5-mistral, llama2 and openchat). Each time, the model would return the JSON Schema itself, rather than a valid JSON object following said schema.
Our prompt is not perfect, and it might take a few attempts to accompany a JSON Schema with the right context so that the model you use understands exactly what you expect.
Conclusion
LLMs are powerful tools. Just like any tool, if we want to use them properly, we need to understand them and adapt to their capabilities. Using a schema is one solution for getting closer to consistency.
Let’s keep in mind that all models are different. Powerful and famous ones are sometimes proprietary; we don’t know how they were built or what they were trained with. Alongside questions about ethics, this also means we need to develop skills and tools when using them to get the best outcome.