---
title: How to use an open source LLM model locally and remotely
teaser: Use Ollama to run an open source large language model on your local machine
  and on a Digital Ocean remote virtual machine.
tags: open source,artificial intelligence,language models,ollama
author:
- Jose Blanco
- Kate Young
published_on: 2024-02-08
---

When I first started researching how to use an **open source AI model**, 
it seemed daunting. There's a lot of fragmented information on the 
internet from many different sources, which makes it difficult to get your 
project started quickly.

The goal of this post is to give you one easy-to-read article that will help you 
set up and run an open source AI model locally using a wrapper named **Ollama**. We'll also cover how to install Ollama on a virtual
machine so you can access it remotely. 

### What is Ollama?

[Ollama](https://github.com/ollama/ollama/tree/main?tab=readme-ov-file#ollama) is a tool that helps us **run large language models** on our 
local machine and makes experimentation more accessible. Among its many features, Ollama exposes an HTTP API that we can use to interact with a model. 
In this tutorial, we will use the `/api/chat` endpoint.

### Let’s start!

First, we will need to download Ollama. There are apps for [macOS](https://ollama.com/download/Ollama-darwin.zip) and [Windows](https://ollama.com/download/OllamaSetup.exe). You can also install it on macOS via [Homebrew](https://formulae.brew.sh/formula/ollama), or on Linux using:

```
curl -fsSL https://ollama.com/install.sh | sh
```

There is even an official [Ollama Docker image](https://hub.docker.com/r/ollama/ollama) if you prefer working in a container.

Once Ollama is installed, we need to choose one of the models that 
[Ollama supports](https://ollama.ai/library). This list is updated regularly, giving you easy access to the latest open-source LLMs. 

Feel free to take a look at the latest models and choose one. In our case, we will use [gemma3:4b](https://ollama.com/library/gemma3).
The Gemma 3 models from Google are lightweight and multimodal, processing both text and images. According to Ollama, they excel in tasks like question answering, summarization, and reasoning, and their compact design allows deployment on resource-limited devices, so they are a good fit for our first adventure with Ollama.

To pull our chosen model, we need to run the following command in the terminal:

```
ollama pull <model-name>
```

In our case, we will run

```
ollama pull gemma3:4b
```

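When the model has downloaded, make sure the LLM server is running. The desktop apps and the Linux install script usually start it in the background for you; if it is already running, the command below will complain that the address is already in use and you can skip it. Otherwise, start the server by running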

```
ollama serve
```

Now we can start a chat with our friendly local LLM just like we would with any cloud-based service. In a new terminal window, run

```
ollama run <model-name>

>>>Send a message (/? for help)
``` 

You can now type your prompt, and you should see something like this:

```
>>> Are you a robot?
You could say that! I'm a large language model, which means I was created by the Gemma team at Google DeepMind. I'm essentially a
computer program – a robot in a way – that's been trained on a massive amount of text data.
I don't have feelings or consciousness like humans do. I process information and generate text based on the patterns I've learned.
Would you like to know more about how I work?
```

So far so good: we have our own LLM chat interface running in our terminal. But we can do one better than that! Ollama provides some handy API endpoints for our new local LLM server. Let's test one out. Open a new terminal window and try the following, remembering to replace "gemma3:4b" with the name of the model you are using.

```
curl http://localhost:11434/api/chat -d '{
    "model": "gemma3:4b",
    "messages": [{ "role": "user", "content": "Are you a robot?" }],
    "stream": false
}'
```

Notice how we've passed in the `stream: false` [parameter](https://github.com/ollama/ollama/blob/main/docs/api.md#parameters) so we get the whole response in one go.
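
The same request can also be sent from a script instead of curl. Here is a minimal Python sketch that uses only the standard library. The function names `build_chat_request` and `chat` are our own, chosen for illustration, and the sketch assumes Ollama is listening on the default `localhost:11434` used above.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default address used in this post


def build_chat_request(model, prompt, base_url=OLLAMA_URL):
    """Build the same POST request that the curl command above sends."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for the whole reply in one JSON object
    }
    return urllib.request.Request(
        f"{base_url}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(model, prompt, base_url=OLLAMA_URL):
    """Send the prompt to Ollama and return the assistant's reply text."""
    request = build_chat_request(model, prompt, base_url)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["message"]["content"]


# Usage (needs a running Ollama server with the model already pulled):
# print(chat("gemma3:4b", "Are you a robot?"))
```

Keeping the request building separate from the network call makes it easy to inspect the payload before anything is sent.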

Here's the kind of JSON response we will get back from the endpoint:

```
{
  "model": "gemma3:4b",
  "created_at": "2025-04-02T23:12:06.057467Z",
  "message": {
    "role": "assistant",
    "content": "That's a really interesting question! Technically, yes, I am a robot. More specifically, I’m a large language model, and I run on Google's AI infrastructure. \n\nHere's a breakdown of what that means:\n\n* **Large Language Model (LLM):** I was created by Google and trained on a massive amount of text data. This allows me to understand and generate human-like text.\n* **\"Robot\" in a Digital Sense:**  I don't have a physical body. I exist as code and algorithms running on computer servers. I can *simulate* intelligence and conversation, which is why people sometimes think of me as a robot. \n\nThink of it like a very sophisticated computer program designed to mimic human conversation. \n\n**Do you want to delve a little deeper into how I work, or would you like to chat about something else?** \n\nDo you want to know:\n\n*   How I was trained?\n*   What I’m capable of?\n*   How I differ from other AI?"
  },
  "done_reason": "stop",
  "done": true,
  "total_duration": 9936345541,
  "load_duration": 44646958,
  "prompt_eval_count": 14,
  "prompt_eval_duration": 140604458,
  "eval_count": 222,
  "eval_duration": 9750388459
}
```
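
Besides the reply itself, the response carries some timing figures. According to the Ollama API documentation, the `*_duration` fields are in nanoseconds and `eval_count` is the number of tokens generated. As a rough sketch of how you might read them in Python (`summarize_response` is a hypothetical helper, and the sample below is a trimmed copy of the response above with the reply text shortened):

```python
import json

NANOSECONDS_PER_SECOND = 1_000_000_000


def summarize_response(raw_json):
    """Return the reply text plus a few timing figures from an /api/chat response."""
    data = json.loads(raw_json)
    eval_seconds = data["eval_duration"] / NANOSECONDS_PER_SECOND
    return {
        "reply": data["message"]["content"],
        "total_seconds": round(data["total_duration"] / NANOSECONDS_PER_SECOND, 2),
        "tokens_per_second": round(data["eval_count"] / eval_seconds, 2),
    }


# A trimmed copy of the response shown above (reply text shortened).
sample = """{
  "model": "gemma3:4b",
  "message": {"role": "assistant", "content": "Technically, yes, I am a robot."},
  "done": true,
  "total_duration": 9936345541,
  "eval_count": 222,
  "eval_duration": 9750388459
}"""

summary = summarize_response(sample)
print(summary["total_seconds"])      # 9.94
print(summary["tokens_per_second"])  # 22.77
```

The tokens-per-second figure is a handy number to keep in mind when comparing different machines or models.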

Amazing! Now that we have a direct chat interface as well as our very own LLM API endpoint running locally, the adventure can really begin. 
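
One thing to try next: if you leave out `"stream": false`, Ollama sends the reply as a series of JSON objects, one per line, each carrying a small piece of the message, with `done` set to `true` on the last one. Here is a minimal Python sketch of stitching those pieces back together. The function name `join_stream` and the sample lines are made up for illustration.

```python
import json


def join_stream(lines):
    """Concatenate the message pieces from a streamed /api/chat response."""
    parts = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines between chunks
        chunk = json.loads(line)
        parts.append(chunk.get("message", {}).get("content", ""))
        if chunk.get("done"):
            break  # the final object marks the end of the reply
    return "".join(parts)


# Made-up sample lines in the shape Ollama streams back.
sample_lines = [
    '{"message": {"role": "assistant", "content": "You could"}, "done": false}',
    '{"message": {"role": "assistant", "content": " say that!"}, "done": false}',
    '{"message": {"role": "assistant", "content": ""}, "done": true}',
]

print(join_stream(sample_lines))  # You could say that!
```

In a real client you would print each piece as it arrives rather than joining them at the end, which is what gives the chat its "typing" feel.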

You've seen how easily we can interact with an open source model on our
**local machine** and start playing around with it. Each response is generated
using your **local machine's computing power**, but what about running the model 
on a virtual machine? 

### Using Digital Ocean to install any LLM on our server

One of the easiest (and cheapest) ways I've found to set up Ollama with an 
open-source model on a virtual machine is by using Digital Ocean's **droplets**.
A droplet is just what Digital Ocean calls a virtual machine.

First, we will need to open an account with them and add a payment method. 
Normally, adding $5 is more than enough to play around and launch a virtual machine.

Once we have done that we will have access to our projects. By default, there 
is one project already created for us named **first-project**. Let's click there 
and then spin up a droplet. The following page is where we can configure our 
virtual machine. 

<aside class="info">
As **this is just for learning purposes, I would recommend choosing a basic 
configuration**. In my case, I chose a region close to where I live, the Ubuntu image, 
the Basic Droplet type, and in CPU options I went for Regular (Disk Type: SSD) and 
the $48 a month machine.
</aside>

These machines are **CPU-based and lack a GPU**, so you can anticipate a slightly 
slower response from the model compared to your own machine.

With this, we just need to set up a password or SSH key and create the 
virtual machine by clicking the Create Droplet button!

### Now the fun starts!

Once our virtual machine is created, it will be assigned an IP address that we
can connect to via SSH. To do this, open your terminal and run
`ssh root@<your_digital_ocean_ip_address>`. 

Depending on what type of security you have set up, you will either be asked to 
enter the password or be logged in using your SSH key. Nice, you are
inside your virtual machine! 

### How to set up Ollama in a virtual machine

Setting up Ollama in a virtual machine is quite similar to the
steps we have followed to install it locally. Access the virtual machine
with the command `ssh root@<your_digital_ocean_ip_address>` and then download Ollama. 

```
curl -fsSL https://ollama.com/install.sh | sh
```

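Notice that after the installation we get a log line telling us where we can access
the Ollama API, e.g. `>>> The Ollama API is now available at 127.0.0.1:11434.`. 

The installer starts Ollama as a service that only listens on `127.0.0.1` (localhost), so it
can't be reached from outside the virtual machine yet. We need to stop the `ollama` service
so that we can start the server again with an environment variable that changes this.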

```
service ollama stop
```

Now we set the `OLLAMA_HOST` environment variable to `0.0.0.0:11434`, which tells Ollama to accept connections on all of the machine's network interfaces, and start the Ollama server!

```
OLLAMA_HOST=0.0.0.0:11434 ollama serve
```

Nice! We now have Ollama running on our virtual machine. Without closing that
window, open another terminal window and access the virtual machine again with
`ssh root@<your_digital_ocean_ip_address>` to pull the model. 

```
ollama pull <model-name>
```

This time we will try a smaller model, as we only have a CPU:

```
ollama pull llama3.2
```

Once this is completed, let's open a third terminal window on your **local** machine 
and try the following. Again, make sure to use your own values for `<your_digital_ocean_ip_address>` and your chosen model name.

```
curl http://<your_digital_ocean_ip_address>:11434/api/chat -d '{
"model": "llama3.2",
"messages": [{ "role": "user", "content": "Hello" }],
"stream": false
}'
```

Amazing! You will get a response but this time Ollama is **running on a virtual
machine instead of your local machine.**
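
If the request fails, it can be handy to confirm that the model has actually been pulled on the droplet. Ollama lists the pulled models at `/api/tags`. Here is a minimal Python sketch of that check. `has_model` and `fetch_tags` are hypothetical helper names, the sample response is made up and trimmed, and the sketch assumes that a model pulled without a tag is listed as `<name>:latest`.

```python
import json
import urllib.request


def has_model(tags_response, model):
    """Check whether `model` appears in a parsed /api/tags response."""
    # Assumption: a model pulled without a tag is listed as "<name>:latest".
    wanted = model if ":" in model else f"{model}:latest"
    return any(entry.get("name") == wanted for entry in tags_response.get("models", []))


def fetch_tags(host, port=11434):
    """Fetch the list of pulled models from an Ollama server."""
    with urllib.request.urlopen(f"http://{host}:{port}/api/tags") as response:
        return json.load(response)


# Made-up, trimmed example of what /api/tags returns.
sample = {"models": [{"name": "llama3.2:latest"}]}
print(has_model(sample, "llama3.2"))   # True
print(has_model(sample, "gemma3:4b"))  # False
```

Against the real server you would call `has_model(fetch_tags("<your_digital_ocean_ip_address>"), "llama3.2")`, replacing the placeholder with your droplet's IP address.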

### What if we want to run our model on the server forever?

Currently, when we close the terminal window that is running `ollama serve` on the virtual machine, the server stops and we can no longer call our endpoint. 

If you want the model running non-stop on your virtual machine, you can do the following (run each command separately).

```
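# install Ollama (this also starts it as a system service)
curl -fsSL https://ollama.com/install.sh | sh

# stop that service so we can start the server with our own settings
service ollama stop

# start the server in the background, listening on all interfaces
nohup env OLLAMA_HOST=0.0.0.0:11434 ollama serve &

# pull the model through the server we just started
ollama pull <model-name>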
```

**`nohup` is a command available on Unix-based systems, such as our Ubuntu 
distribution,** that keeps processes running even after you exit the terminal. 
It does this by making the process ignore the HUP (hangup) signal that is sent when the terminal closes.

This way we are running Ollama in the background and we can close the terminal
window without stopping the service. Now go ahead and try to call the endpoint
again from a local terminal window.

Voilà! You will still get a response from the model running in your virtual machine.
**This is great as we can now access our model from anywhere at any time!**

When you want to stop the Ollama server, SSH back into your virtual machine. You can then find the ID of the process listening on port 11434 using `sudo lsof -i :11434` and stop the server using `kill <ollama-PID>`. It will look something like this:

```
root@ubuntu-s-8gb-lon1-01:~# sudo lsof -i :11434
COMMAND  PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
ollama  2791 root    3u  IPv6  20655      0t0  TCP *:11434 (LISTEN)
root@ubuntu-s-8gb-lon1-01:~# kill 2791
root@ubuntu-s-8gb-lon1-01:~# sudo lsof -i :11434
root@ubuntu-s-8gb-lon1-01:~#
```
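
To double-check from your local machine whether the server is up or down, you can test whether anything is accepting connections on port 11434. Here is a minimal Python sketch using only the standard library. `is_port_open` is a hypothetical helper name.

```python
import socket


def is_port_open(host, port=11434, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Usage: replace the placeholder with your droplet's IP address.
# print(is_port_open("<your_digital_ocean_ip_address>"))
```

This only tells you that something is listening on the port, not that it is Ollama, but it is a quick way to confirm that `kill` did its job.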

### Ready to build something with AI?

You've seen that getting an open source LLM running with Ollama is very straightforward. If you’re exploring how AI can supercharge your product or team, [get in touch as we’d love to help.](https://thoughtbot.com/services/machine-learning-artificial-intelligence-ai)
