---
title: Running AI client-side
teaser: Running AI client-side is a great way to provide privacy and save costs. Learn how to do it and when it is useful.
tags: javascript,artificial intelligence,machine learning,web
author: Matheus Richard
published_on: 2024-09-17
---

Everyone is doing AI nowadays. While [using LLMs] or [diffusion models] via API is a [great way] to use AI, it is also possible to run AI client-side. This article shows how to do it and covers some of the trade-offs involved.

## What can be done?

Actually, quite a lot. We're going to use the [Transformers.js] library to run AI client-side. It supports a variety of tasks in different categories. Some examples are:

- **Natural Language Processing**: [summarization], [translation], and [question answering].
- **Computer Vision**: [object detection], [upscaling images], and [removing backgrounds].
- **Audio**: [automatic speech recognition] and [text to speech].

You can see the [full list of tasks and models here].

I'll use a Rails app to demonstrate some of the features. The full code is available [in this repo].

## Describing images

Writing alt text for images is [important for accessibility], and AI can help by giving us a head start.

Install the `@xenova/transformers` package (I used `yarn`, but you might have a different setup in your project), and you're good to go.

The idea is to use an image-to-text model to automatically add a description to an image when the user uploads it. Unfortunately, Action Text doesn't have events for when an image finishes uploading (they [should be out in Rails 8], though), so we'll put the description in a separate element, the controller's `output` target, and let the user copy it from there.

Assuming we have a basic `Article` model with a title and content, we can add the following to the `app/views/articles/_form.html.erb` view:

```erb
<%= form_with(
  model: article,
  data: {
    controller: "autocaption",
    action: "trix-attachment-add->autocaption#saveAttachment"
  }) do |form| %>

  <%= button_tag "Describe image",
    type: :button,
    class: "secondary",
    data: {action: "autocaption#describeImage", autocaption_target: "trigger"} %>

  <%# Any element works here, as long as it is registered as the "output" target. %>
  Description: <output data-autocaption-target="output"></output>

<% end %>
```

After setting up the controller and the targets, we save the attachment when the user uploads one. The `autocaption_controller.js` file will look like this:

```javascript
import { Controller } from "@hotwired/stimulus"
import { pipeline, RawImage } from '@xenova/transformers';

export default class extends Controller {
  static targets = ["output", "trigger"]

  async initialize() {
    // Keep the button disabled until there is an attachment to describe.
    this.triggerTarget.disabled = true;
    // Loads the default image-to-text model (downloaded on first use).
    this.captioner = await pipeline('image-to-text');
  }

  // Runs on trix-attachment-add: remember the uploaded attachment.
  saveAttachment(event) {
    this.attachment = event.attachment
    this.triggerTarget.disabled = false;
  }

  async describeImage() {
    const previousLabel = this.triggerTarget.textContent;
    this.triggerTarget.textContent = 'Analyzing...';
    this.triggerTarget.disabled = true;

    try {
      const img = await RawImage.fromBlob(this.attachment.file);
      const caption = (await this.captioner(img))[0].generated_text;
      this.outputTarget.textContent = caption;
    } finally {
      // Restore the button even if captioning fails, so the user can retry.
      this.triggerTarget.textContent = previousLabel;
      this.triggerTarget.disabled = false;
    }
  }
}
```

After loading the model, we use it to describe the image and update the output text with the description.

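One detail worth guarding against: `trix-attachment-add` fires for any attached file, not only images. A minimal sketch of a check that `saveAttachment` could run before enabling the button follows. The `isCaptionableFile` helper is hypothetical (it is not part of Trix or Transformers.js) and assumes the attachment exposes a browser `File`, whose `type` property holds the MIME type.

```javascript
// Hypothetical helper (not part of Trix or Transformers.js): decide whether an
// attached file looks like something an image-to-text model can describe.
function isCaptionableFile(file) {
  // Browser File objects expose their MIME type as `type`, e.g. "image/png".
  return Boolean(file) && typeof file.type === "string" && file.type.startsWith("image/");
}

// Plain objects stand in for browser File objects in this demo.
console.log(isCaptionableFile({ name: "cat.png", type: "image/png" }));          // true
console.log(isCaptionableFile({ name: "report.pdf", type: "application/pdf" })); // false
console.log(isCaptionableFile(undefined));                                       // false
```

In the controller, `saveAttachment` could return early when `isCaptionableFile(event.attachment.file)` is false, leaving the button disabled for non-image uploads.
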
## Text to speech

Let's use the text-to-speech model to read the article content to users. The view is straightforward:

```erb
<%# The wrapper elements are illustrative; what matters is the controller and its "text" and "trigger" targets. %>
<div data-controller="text-to-speech">
  <h1><%= @article.title %></h1>

  <%= button_tag "Read aloud",
    type: :button,
    class: "secondary",
    data: { action: "text-to-speech#play", text_to_speech_target: "trigger" } %>

  <div data-text-to-speech-target="text">
    <%= @article.content %>
  </div>
</div>
```

Again, we set up the controller and the targets. We also have a button to start the reading. The `text_to_speech_controller.js` file will look like this:

```javascript
import { Controller } from "@hotwired/stimulus"
import { pipeline } from '@xenova/transformers';

const speaker_embeddings = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/speaker_embeddings.bin';

export default class extends Controller {
  static targets = ["text", "trigger"]

  async initialize() {
    // Keep the button disabled until the model has loaded.
    this.triggerTarget.disabled = true;
    this.synthesizer = await pipeline('text-to-speech');
    this.triggerTarget.disabled = false;
  }

  async play() {
    const previousLabel = this.triggerTarget.textContent;
    this.triggerTarget.textContent = 'Preparing...';
    this.triggerTarget.disabled = true;

    try {
      const audioData = await this.synthesizer(this.textTarget.textContent, { speaker_embeddings });
      this.triggerTarget.textContent = 'Reading...';
      // Wait for playback to finish before restoring the button.
      await this.#playAudio(audioData);
    } finally {
      this.triggerTarget.textContent = previousLabel;
      this.triggerTarget.disabled = false;
    }
  }

  // Plays the generated samples and resolves once playback ends.
  #playAudio(audioData) {
    const audioContext = new (window.AudioContext || window.webkitAudioContext)();
    const audioBuffer = audioContext.createBuffer(1, audioData.audio.length, audioData.sampling_rate);
    audioBuffer.copyToChannel(audioData.audio, 0, 0);

    const source = audioContext.createBufferSource();
    source.buffer = audioBuffer;
    source.connect(audioContext.destination);

    return new Promise((resolve) => {
      source.onended = resolve;
      source.start();
    });
  }
}
```

This is pretty similar to the previous controller, but with some extra work to play the audio (in a production scenario, you could save the audio as a WAV file and play that instead).

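As a rough illustration of the WAV idea, here is a minimal sketch of a 16-bit PCM encoder. It assumes mono audio given as a `Float32Array` of samples between -1 and 1 plus a sampling rate, which matches the `audio` and `sampling_rate` fields used by the controller above. The `encodeWav` name is hypothetical and not part of Transformers.js.

```javascript
// Hypothetical helper: encode mono Float32 samples (range -1..1) as a
// 16-bit PCM WAV file and return the bytes as an ArrayBuffer.
function encodeWav(samples, sampleRate) {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize); // 44-byte header + samples
  const view = new DataView(buffer);

  const writeString = (offset, text) => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeString(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true);                // remaining file size
  writeString(8, "WAVE");
  writeString(12, "fmt ");
  view.setUint32(16, 16, true);                          // fmt chunk size
  view.setUint16(20, 1, true);                           // format: PCM
  view.setUint16(22, 1, true);                           // channels: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true);              // block align
  view.setUint16(34, 16, true);                          // bits per sample
  writeString(36, "data");
  view.setUint32(40, dataSize, true);

  for (let i = 0; i < samples.length; i++) {
    // Clamp, then scale to the signed 16-bit range.
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    const scaled = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
    view.setInt16(44 + i * bytesPerSample, Math.round(scaled), true);
  }

  return buffer;
}

// Three samples at 16 kHz: 44 header bytes + 6 data bytes.
const demoWav = encodeWav(new Float32Array([0, 1, -1]), 16000);
console.log(demoWav.byteLength); // 50
```

In the browser, the result could be wrapped in `new Blob([buffer], { type: "audio/wav" })` and handed to an `<audio>` element via `URL.createObjectURL`, which also gives users pause and seek controls for free.
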
## When is client-side AI useful?

Let's look at the pros and cons of running AI client-side to understand when it is useful.

### Pros

- **Privacy**: no data is sent to a server. Users don't have to worry about their data being stored or shared. This is especially important for sensitive data or in countries with strict data protection laws.
- **Cost**: there is no inference server or API usage to pay for, since the model runs in the user's browser. This is useful for small projects or prototypes.
- **Offline**: once the model is downloaded, it can run offline. This is useful for PWAs, mobile apps, or as a fallback when an external API is unavailable.

### Cons

- **Performance**: client-side AI is often slower than server-side AI. It also depends on the user's device, which can make the whole experience slow.
- **Quality**: client-side AI is often less accurate than server-side AI, since the models have to be much smaller to run on the client. If you need high accuracy, this is not the way to go.
- **Bandwidth**: client-side AI requires the model to be downloaded, which can be large and slow depending on the user's connection. Downloaded models aren't shared across websites, so each site taxes the user's bandwidth again. This is one of the reasons why [Chrome is experimenting with shipping an AI model with the browser][chrome-ai].

You are now equipped to make a better decision on whether to use client-side AI or not. If privacy and performance are concerns, you might also consider [hosting your own models], or even [using transformers in your backend][transformers_rb].
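
To put a rough number on the bandwidth trade-off listed in the cons, here is a small sketch that estimates how long a model download would take and decides whether to attempt it. Both helpers are hypothetical, and the 40 MB model size and 30-second budget are made-up values for illustration. In browsers that implement the Network Information API, `navigator.connection.downlink` (an estimate in Mbps) and `navigator.connection.saveData` could supply the inputs, but that API is not available everywhere.

```javascript
// Hypothetical helpers for illustration. Sizes use decimal units,
// so 1 megabyte equals 8 megabits.
function estimateDownloadSeconds(modelMegabytes, downlinkMbps) {
  if (downlinkMbps <= 0) return Infinity;
  return (modelMegabytes * 8) / downlinkMbps;
}

// Skip the download when the user asked to save data or the wait is too long.
function shouldDownloadModel({ modelMegabytes, downlinkMbps, saveData = false, maxWaitSeconds = 30 }) {
  if (saveData) return false;
  return estimateDownloadSeconds(modelMegabytes, downlinkMbps) <= maxWaitSeconds;
}

console.log(estimateDownloadSeconds(40, 10));                               // 32
console.log(shouldDownloadModel({ modelMegabytes: 40, downlinkMbps: 10 })); // false
console.log(shouldDownloadModel({ modelMegabytes: 40, downlinkMbps: 20 })); // true
```

When the check fails, an app could fall back to a server-side API or simply hide the AI feature.
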

[using LLMs]: https://thoughtbot.com/blog/tags/artificial-intelligence
[diffusion models]: https://thoughtbot.com/blog/tips-for-using-ai-generated-images-in-your-slides
[great way]: https://thoughtbot.com/blog/ruby-on-rails-is-great-for-ai
[chrome-ai]: https://developer.chrome.com/docs/ai/built-in
[hosting your own models]: https://thoughtbot.com/blog/how-to-use-open-source-LLM-model-locally
[transformers_rb]: https://github.com/ankane/transformers-ruby
[Transformers.js]: https://huggingface.co/docs/transformers.js/index
[full list of tasks and models here]: https://huggingface.co/docs/transformers.js/en/index#supported-tasksmodels
[summarization]: https://huggingface.co/tasks/summarization
[translation]: https://huggingface.co/tasks/translation
[question answering]: https://huggingface.co/tasks/question-answering
[object detection]: https://huggingface.co/tasks/object-detection
[automatic speech recognition]: https://huggingface.co/tasks/speech-recognition
[text to speech]: https://huggingface.co/tasks/text-to-speech
[upscaling images]: https://huggingface.co/fal/AuraSR-v2
[removing backgrounds]: https://huggingface.co/spaces/Xenova/remove-background-web
[important for accessibility]: https://thoughtbot.com/blog/alt-vs-figcaption
[in this repo]: https://github.com/MatheusRich/client-side-ai-demo
[should be out in Rails 8]: https://github.com/rails/rails/pull/52680