How to use an open source LLM model locally and remotely

Jose Blanco

When I first started researching how to use an open source AI model, it seemed daunting. There is a lot of fragmented information on the internet from many different sources, making it difficult to start your project quickly.

The goal of this post is to be one easy-to-read article that helps you set up and run an open source AI model locally using Ollama, a wrapper around the model. We will also cover how to install Ollama on a virtual machine and access it remotely.

What is Ollama?

Ollama is a tool that helps us run large language models on our local machine and makes experimentation more accessible. Among other features, it exposes an endpoint that we can use to interact with a model. In this tutorial, we will use the /api/chat endpoint.

Let’s start!

First, we will need to download and install Ollama:

curl -fsSL https://ollama.com/install.sh | sh

Once it is installed, we need to pull one of the models that Ollama supports and that we would like to run.

In our case, we will use openhermes2.5-mistral. OpenHermes 2.5 is a fine-tuned version of Mistral 7B. It was trained on 1,000,000 entries of primarily GPT-4 generated data, as well as other high-quality datasets, and according to its model card it outperforms many comparable open models of its size on various benchmarks.

To pull this model, we need to run the following command in our terminal:

ollama pull openhermes2.5-mistral
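
To confirm that the download finished, we can ask Ollama to list the models it has available locally:

ollama list

This prints the name, size, and last-modified time of every model stored on the machine.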

To test that everything is working as expected, we can use our terminal. First, start Ollama by running ollama serve. Then, in a separate terminal window, let's try the following:

curl http://localhost:11434/api/chat -d '{
    "model": "openhermes2.5-mistral",
    "messages": [{ "role": "user", "content": "Are you a robot?" }]
}'

This is the type of response we will get; by default, the reply is streamed token by token:

{"model":"openhermes2.5-mistral","created_at":"2024-02-06T19:02:41.653849Z",
"message":{"role":"assistant","content":"I"},"done":false}
{"model":"openhermes2.5-mistral","created_at":"2024-02-06T19:02:41.736723Z",
"message":{"role":"assistant","content":" am"},"done":false}
{"model":"openhermes2.5-mistral","created_at":"2024-02-06T19:02:41.82815Z",
"message":{"role":"assistant","content":" not"},"done":false}

Thankfully, in this call to the /api/chat endpoint we can add some parameters. In our case, we will pass the stream: false parameter so that we get the whole response in one go instead of a stream of tokens.

curl http://localhost:11434/api/chat -d '{
    "model": "openhermes2.5-mistral",
    "messages": [{ "role": "user", "content": "Are you a robot?" }],
    "stream": false
}'

This is the typical response we will get back from the endpoint:

{"model":"openhermes2.5-mistral",
  "created_at":"2024-01-30T11:52:56.244775Z",
  "message":{"role":"assistant",
  "content":"No, I'm not a robot. I am an AI-powered chatbot designed to provide
   helpful information and engage in conversation with users like yourself. 😊},
  "done":true,
  "total_duration":22872800167,
  "load_duration":7669185084,
  "prompt_eval_count":21,
  "prompt_eval_duration":284746000,
  "eval_count":195,"eval_duration":14899744000}

Amazing! That is how easily we can interact with an open source model on our local machine and start playing around with it. This response is generated using our local machine's computing power, so what about running the model in a virtual machine?

Using Digital Ocean to install any LLM on our server

One of the easiest (and cheapest) ways I’ve found to set up Ollama with an open source model on a virtual machine is by using Digital Ocean’s droplets. Droplet is simply Digital Ocean’s name for a virtual machine.

First, we need to open an account with them and add a payment method. Normally, adding $5 is more than enough to play around and launch a virtual machine.

Once we have done that, we will have access to our projects. By default, there is already one project created for us named first-project. Let’s click on it and then on Spin up a Droplet. The following page is where we can configure our virtual machine.

These machines are CPU-based and lack a GPU, so you can anticipate a slightly slower response from the model compared to your own machine.

With this done, we just need to set up a password or an SSH key and create the virtual machine by clicking the Create Droplet button!
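
As a side note, if you prefer the command line over the web console, Digital Ocean's doctl CLI can create the same droplet. This is only a rough sketch; the name, region, image slug, and size below are placeholders that you should check against doctl's own listing commands:

doctl compute droplet create ollama-server \
    --region nyc1 \
    --image ubuntu-22-04-x64 \
    --size s-2vcpu-4gb \
    --ssh-keys your_key_fingerprint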

Now the fun starts!

Once our virtual machine is created, it will be assigned an IP address that we can access via SSH. To do this, just open your terminal and run ssh root@ip_of_your_address.
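
If the SSH key you registered with Digital Ocean is not your default one, you can point ssh at it explicitly (the key path here is just an example):

ssh -i ~/.ssh/id_ed25519 root@ip_of_your_address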

Depending on what type of security we have set up, we will be asked to enter the password or it will log us in using our SSH key. Nice, you are inside your virtual machine!

How to set up Ollama in the virtual machine

Setting up Ollama in the virtual machine is quite similar to installing it locally. Access the virtual machine with ssh root@ip_of_your_address and download Ollama:

curl -fsSL https://ollama.com/install.sh | sh

Notice that after the installation we get a log line telling us where we can access the Ollama API: >>> The Ollama API is now available at 0.0.0.0:11434.

We need to stop the Ollama service, as we will restart it with an environment variable set:

service ollama stop

Now we need to set the OLLAMA_HOST environment variable and start Ollama. Binding it to 0.0.0.0:11434 makes Ollama listen on all network interfaces, so the API can be reached from outside the droplet.

OLLAMA_HOST=0.0.0.0:11434 ollama serve
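
Before going any further, you can quickly check from your local machine that the API is now reachable from outside the droplet; it should answer with a short message along the lines of "Ollama is running":

curl http://my_virtual_machine_ip:11434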

Nice! Ollama is now running in the virtual machine. Without closing that window, open another terminal window, access the virtual machine again with ssh root@ip_of_your_address, and pull the model:

ollama pull openhermes2.5-mistral

Once this is completed, let’s open another terminal window on your local machine and try the following:

curl http://my_virtual_machine_ip:11434/api/chat -d '{
    "model": "openhermes2.5-mistral",
    "messages": [{ "role": "user", "content": "Hello" }],
    "stream": false
}'

Amazing! You will get a response, but this time Ollama is running in a virtual machine instead of on your local machine.

What if we want to run our model on the server forever?

Currently, when we close the virtual machine terminal, the ollama serve process we started in the foreground stops and we can no longer call our endpoint.

If we want the virtual machine to keep running Ollama with the open source model non-stop, we can do the following. Run each command separately:

curl https://ollama.ai/install.sh | sh

ollama pull openhermes2.5-mistral

service ollama stop

nohup env OLLAMA_HOST=0.0.0.0:11434 ollama serve &

nohup is a command available on Unix-based systems, such as our Ubuntu droplet, that keeps a process running even after you exit the terminal by preventing it from receiving the HUP (hangup) signal. The trailing & sends the process to the background, and its output is appended to a file named nohup.out.

This way, Ollama keeps running in the background and we can close the terminal window without stopping the service. Now go ahead and call the endpoint from your local machine again.

Voilà! You will get a response from the model running in your virtual machine. This is great, as we can now access our model from anywhere, at any time!
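
As an alternative to nohup: the install script registers Ollama as a systemd service on Ubuntu (which is why service ollama stop works), so we can instead configure the service itself to listen on all interfaces and let systemd keep it running and restart it after a reboot. A sketch, assuming the standard ollama.service unit created by the installer:

systemctl edit ollama.service
# In the editor that opens, add:
#   [Service]
#   Environment="OLLAMA_HOST=0.0.0.0:11434"
systemctl daemon-reload
systemctl restart ollama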

Conclusion

As we have seen, getting started with Ollama and an open source LLM to poke around and see what we can do is straightforward. This is a great first step towards creating an application that uses real AI.
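
As a small next step, the same /api/chat endpoint also accepts a system message that shapes the model's behaviour, which is handy when you start building on top of it. For example (the prompt text here is just an illustration):

curl http://my_virtual_machine_ip:11434/api/chat -d '{
    "model": "openhermes2.5-mistral",
    "messages": [
        { "role": "system", "content": "You are a concise assistant that answers in one sentence." },
        { "role": "user", "content": "What is Ollama?" }
    ],
    "stream": false
}'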