---
title: Running AI client-side
teaser: Running AI client-side is a great way to provide privacy and save costs. Learn
  how to do it and when it is useful.
tags: javascript,artificial intelligence,machine learning,web
author: Matheus Richard
published_on: 2024-09-17
---

Everyone is doing AI nowadays. While [using LLMs] or [diffusion models] via API
is a [great way] to use AI, it is also possible to run AI client-side. This
article will show you how to do it and some of the trade-offs involved.

## What can be done?

Actually, quite a lot. We're going to use the [Transformers.js] library to run
AI client-side. It supports a variety of tasks, in different categories. Some
examples are:

- **Natural Language Processing**: [summarization], [translation], and [question answering].
- **Computer Vision**: [object detection], [upscaling images], and [removing backgrounds].
- **Audio**: [automatic speech recognition] and [text to speech].

You can see the [full list of tasks and models here]. I'll use a Rails app to
demonstrate some of the features. The full code is available [in this repo].

<aside class="info">
  <p>For the sake of brevity, I'll use the simplest models in the examples. They're not the best available, but they're good enough for a demo.</p>
  <p>Feel free to explore the full list of tasks and models to find the best one for your use case.</p>
</aside>

## Describing images

Writing alt text for images is [important for accessibility], and AI can help
by giving us a head start. Install the `@xenova/transformers` package (I used
`yarn`, but your project might have a different setup), and you're good to go.

The idea is to use an image-to-text model to automatically add a description to
an image when the user uploads it. Unfortunately, Action Text doesn't have
events for when an image finishes uploading (that [should be out in Rails 8],
though), so we'll put the description in an `<output>` tag and let the user
copy it from there.

Assuming we have a basic Article model with a title and content, on the
`app/views/articles/_form.html.erb` view we can add the following:

```erb
<%= form_with(
  model: article,
  data: {
    controller: "autocaption",
    action: "trix-attachment-add->autocaption#saveAttachment"
  }) do |form| %>

  <!-- the rest of the form -->

  <div>
    <%= button_tag "Describe image", type: :button, class: "secondary", data: {action: "autocaption#describeImage", autocaption_target: "trigger"} %>
    <span><strong>Description: </strong><output data-autocaption-target="output"></output></span>
  </div>
<% end %>
```

After setting up the controller and the targets, we save the attachment when
the user uploads one. The `autocaption_controller.js` file will look like this:

```javascript
import { Controller } from "@hotwired/stimulus"
import { pipeline, RawImage } from '@xenova/transformers';

export default class extends Controller {
  static targets = ["output", "trigger"]

  async initialize() {
    this.triggerTarget.disabled = true;
    this.captioner = await pipeline('image-to-text');
  }

  saveAttachment(event) {
    this.attachment = event.attachment
    this.triggerTarget.disabled = false;
  }

  async describeImage() {
    const previousLabel = this.triggerTarget.textContent;
    this.triggerTarget.textContent = 'Analyzing...';
    this.triggerTarget.disabled = true;

    const img = await RawImage.fromBlob(this.attachment.file);
    const caption = (await this.captioner(img))[0].generated_text;

    this.outputTarget.textContent = caption;
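    this.triggerTarget.textContent = previousLabel;
    // Re-enable the button so the same image can be described again.
    this.triggerTarget.disabled = false;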
  }
}
```

After loading the model, we use it to describe the image and update the output
text with the description. Here's that in action:

<video controls muted playsinline height="auto" src="https://images.thoughtbot.com/ajry43z2ue00apk8kx1iefyjjxjb_describing.mp4"></video>
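
The controller above starts downloading the model as soon as it initializes, and `saveAttachment` enables the button as soon as an image is attached, even if the model is still loading. One possible way around that is to load the model lazily and let every caller await the same promise. The sketch below is only an illustration and is not part of the demo repo: `lazy` is a hypothetical helper, and the controller usage is shown in a comment rather than executed.

```javascript
// Hypothetical helper: wraps an async factory so it runs at most once.
// Every call returns the same promise, so concurrent callers share one load.
function lazy(factory) {
  let promise = null;

  return () => {
    if (promise === null) {
      promise = new Promise((resolve) => resolve(factory())).catch((error) => {
        promise = null; // forget the failed attempt so a later call can retry
        throw error;
      });
    }
    return promise;
  };
}

// Assumed usage inside the Stimulus controller (not run here):
//   const loadCaptioner = lazy(() => pipeline('image-to-text'));
//   const captioner = await loadCaptioner();
```

With something like this, `describeImage` could await the loader itself, so the button would work no matter whether the upload or the model download finishes first.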

## Text to speech

Let's use a text-to-speech model to read the article content aloud for users.
The view is straightforward:

```erb
<article data-controller="text-to-speech">
  <header class="flex items-center gap-4">
    <h1><%= @article.title %></h1>

    <%= button_tag "Read aloud", type: :button, class: "secondary", data: { action: "text-to-speech#play", text_to_speech_target: "trigger" } %>
  </header>

  <div data-text-to-speech-target="text">
    <%= @article.content %>
  </div>

  <!-- ... -->
</article>
```

Again, we set up the controller and the targets. We also have a button to start
the reading. The `text_to_speech_controller.js` file will look like this:

```javascript
import { Controller } from "@hotwired/stimulus"
import { pipeline } from '@xenova/transformers';

const speaker_embeddings = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/speaker_embeddings.bin';

export default class extends Controller {
  static targets = ["text", "trigger"]

  async initialize() {
    this.triggerTarget.disabled = true;
    this.synthesizer = await pipeline('text-to-speech');
    this.triggerTarget.disabled = false;
  }

  async play() {
    const previousLabel = this.triggerTarget.textContent;
    this.triggerTarget.textContent = 'Preparing...';
    this.triggerTarget.disabled = true;

    const audioData = await this.synthesizer(this.textTarget.textContent, { speaker_embeddings });

    this.triggerTarget.textContent = 'Reading...';
    // Wait for playback to finish before restoring the button.
    await this.#playAudio(audioData);

    this.triggerTarget.textContent = previousLabel;
    this.triggerTarget.disabled = false;
  }

  // Plays the generated samples and resolves once playback has ended.
  #playAudio(audioData) {
    const audioContext = new (window.AudioContext || window.webkitAudioContext)();
    const audioBuffer = audioContext.createBuffer(1, audioData.audio.length, audioData.sampling_rate);
    audioBuffer.copyToChannel(audioData.audio, 0, 0);

    const source = audioContext.createBufferSource();
    source.buffer = audioBuffer;
    source.connect(audioContext.destination);

    return new Promise((resolve) => {
      source.onended = resolve;
      source.start();
    });
  }
}
```

This is pretty similar to the previous controller, with some extra work to play
the audio (in a production scenario, you could save the audio as a wav file and
play that instead). It will look like this:

<video controls playsinline height="auto" src="https://images.thoughtbot.com/jw1y3c1so4n395rwl6ai10314b3x_read-aloud.mp4"></video>
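
For the wav file idea mentioned above, here is a minimal sketch. It assumes the synthesizer output has the shape used in the controller: `audio` is a `Float32Array` of mono samples in the -1 to 1 range, and `sampling_rate` is the sample rate in Hz. `encodeWav` is a hypothetical helper, not something Transformers.js provides; it writes a standard 44-byte header followed by 16-bit PCM samples.

```javascript
// Hypothetical helper: encodes mono Float32 samples as a 16-bit PCM WAV file.
function encodeWav(samples, sampleRate) {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);

  const writeString = (offset, text) => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeString(0, 'RIFF');
  view.setUint32(4, 36 + dataSize, true); // file size minus the first 8 bytes
  writeString(8, 'WAVE');
  writeString(12, 'fmt ');
  view.setUint32(16, 16, true); // size of the fmt chunk
  view.setUint16(20, 1, true); // audio format: 1 = PCM
  view.setUint16(22, 1, true); // channels: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeString(36, 'data');
  view.setUint32(40, dataSize, true);

  for (let i = 0; i < samples.length; i++) {
    // Clamp to [-1, 1], then scale to the signed 16-bit range.
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * bytesPerSample, Math.round(clamped * 32767), true);
  }

  return buffer;
}
```

In the browser, the result could be wrapped with `new Blob([encodeWav(audioData.audio, audioData.sampling_rate)], { type: 'audio/wav' })` and passed through `URL.createObjectURL` to an `<audio>` element, which also gives users pause and seek controls.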

## When is client-side AI useful?

Let's look at the pros and cons of running AI client-side to understand when it
is useful.

### Pros

- **Privacy**: no data is sent to a server. Users don't have to worry about their
  data being stored or shared. This is especially important for sensitive data
  or in countries with strict data protection laws.
- **Cost**: no need to pay for inference servers or a third-party API, since
  the model runs in the user's browser. Useful for small projects or prototypes.
- **Offline**: once the model is downloaded, it can run offline. This is useful for
  PWAs, mobile apps, or as a fallback for an external API not being available.
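
The fallback scenario from the last point could be wired up along these lines. This is a sketch under assumptions: `withFallback` is a hypothetical helper, and the `describeRemotely` and `describeLocally` names in the comment stand in for a call to an external API and a call to a client-side pipeline.

```javascript
// Hypothetical helper: tries the primary async function first and, if it
// throws or rejects (for example, because the external API is unreachable),
// calls the fallback with the same arguments.
function withFallback(primary, fallback) {
  return async (...args) => {
    try {
      return await primary(...args);
    } catch (error) {
      console.warn('Primary failed, using fallback:', error);
      return fallback(...args);
    }
  };
}

// Assumed usage (both functions are placeholders, not run here):
//   const describe = withFallback(describeRemotely, describeLocally);
//   const caption = await describe(imageFile);
```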

### Cons

- **Performance**: client-side AI is often slower than server-side AI, and its
  speed depends on the user's device, so the experience can be sluggish on
  low-end hardware.
- **Quality**: client-side AI is often less accurate than server-side AI, since
  the models have to be much smaller to run on the client. If you need high
  accuracy, this is not the way to go.
- **Bandwidth**: client-side AI requires downloading the model, which can be
  large and slow to fetch on a poor connection. Models aren't shared across
  websites, so each site has to download its own copy, which is taxing on the
  user's bandwidth. This is one of the reasons why
  [Chrome is experimenting with shipping an AI model with the browser][chrome-ai].

You are now equipped to make a better decision on whether to use client-side AI
or not. If privacy and performance are concerns, you might also consider [hosting
your own models], or even [using transformers in your backend][transformers_rb].

[using LLMs]: https://thoughtbot.com/blog/tags/artificial-intelligence
[diffusion models]: https://thoughtbot.com/blog/tips-for-using-ai-generated-images-in-your-slides
[great way]: https://thoughtbot.com/blog/ruby-on-rails-is-great-for-ai
[chrome-ai]: https://developer.chrome.com/docs/ai/built-in
[hosting your own models]: https://thoughtbot.com/blog/how-to-use-open-source-LLM-model-locally
[transformers_rb]: https://github.com/ankane/transformers-ruby
[Transformers.js]: https://huggingface.co/docs/transformers.js/index
[full list of tasks and models here]: https://huggingface.co/docs/transformers.js/en/index#supported-tasksmodels
[summarization]: https://huggingface.co/tasks/summarization
[translation]: https://huggingface.co/tasks/translation
[question answering]: https://huggingface.co/tasks/question-answering
[object detection]: https://huggingface.co/tasks/object-detection
[automatic speech recognition]: https://huggingface.co/tasks/speech-recognition
[text to speech]: https://huggingface.co/tasks/text-to-speech
[upscaling images]: https://huggingface.co/fal/AuraSR-v2
[removing backgrounds]: https://huggingface.co/spaces/Xenova/remove-background-web
[important for accessibility]: https://thoughtbot.com/blog/alt-vs-figcaption
[in this repo]: https://github.com/MatheusRich/client-side-ai-demo
[should be out in Rails 8]: https://github.com/rails/rails/pull/52680
