The other day I was going through my old laptops (I have a collection of them 😅) and had an idea: what if I could run Ollama on one of them and access it remotely from my main machine? Would that mean I am effectively hosting an AI model on my old laptop, for free? (Well, not exactly free, but you get the idea.)
This is how this post started.
Running Ollama Locally and Accessing It Remotely with ngrok
Running AI models locally has become increasingly popular, especially with tools like Ollama that allow you to deploy and interact with large language models on your own machine. However, accessing these local instances remotely can be challenging.
In a previous post I talked about how you can install Ollama and run it in a virtual machine so you can access it remotely. But what if you want to run Ollama on your local machine and access it remotely? This is where ngrok comes in.
What is ngrok?
What’s ngrok anyway? And why do I need to know about it?
ngrok is a cool tool! Their documentation gives the serious explanation:
ngrok is a cross-platform application that allows developers to expose their local web servers to the internet
Basically, it opens a door from your local machine to the internet. This means that, if you set it up correctly, you can access your machine from anywhere in the world. How cool is that?
Setting Up ngrok
First, you’ll need to create an account on ngrok’s website. After signing up, visit the setup page (https://dashboard.ngrok.com/get-started/setup/macos) to get your authentication token.
If you’re using macOS, installing ngrok is straightforward using Homebrew:
brew install ngrok
After installing ngrok, you can authenticate your account using the token you received earlier:
ngrok config add-authtoken YOUR_KEY
The auth token will be saved in your configuration file, allowing you to use ngrok without logging in each time.
Authtoken saved to configuration file: /Users/your_user/Library/Application Support/ngrok/ngrok.yml
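If you want to double-check that the token landed where it should, ngrok v3 ships a small subcommand that validates the configuration file:

ngrok config check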
With just these two steps you can expose any port on your local machine to the internet. For a quick example, say you are already running something locally; in my case I have open-webui running on port 8080. I can expose it to the internet with the following command:
ngrok http http://localhost:8080
This will give you a public URL that you can use to access your local machine from anywhere in the world.
ngrok (Ctrl+C to quit)
❤️ ngrok
Session Status online
Account your_email (Plan: Free)
Version 3.20.0
Region Europe (eu)
Web Interface http://127.0.0.1:4040
Forwarding https://4dcf-92-233-157-239.ngrok-free.app -> http://localhost:8080
Connections ttl opn rt1 rt5 p50 p90
0 0 0.00 0.00 0.00 0.00
If you pay attention to the output above, you can see the Forwarding line. That is the URL you use to reach your local machine from outside.
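As a quick sanity check you can hit the forwarding URL from any other device (the URL below is the one from my run; yours will differ) and you should get an HTTP response served through the tunnel:

curl -I https://4dcf-92-233-157-239.ngrok-free.app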
Note: The URL will change every time you run ngrok. If you want a persistent URL, you’ll need to claim a static domain by following these steps: https://ngrok.com/blog-post/free-static-domains-ngrok-users. Your free ngrok account allows you to claim one domain.
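Once you have claimed it, you can (as far as I understand the ngrok docs) pass the domain explicitly when starting the tunnel; the domain below is just a made-up placeholder:

ngrok http 8080 --domain=your-name.ngrok-free.app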
Making sure Ollama is up and running
The above is an easy way to reach your local machine running the open-webui portal and, through it, the different LLM models’ endpoints. But what if we want to run the model locally, expose the machine running Ollama, and use it from another, remote machine?
First, from the machine we will expose to the world, let’s pull a model with Ollama:
ollama pull llama3:8b
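Optionally, confirm the model actually landed on disk:

ollama list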
Next, start the Ollama server so we have something to test against:
ollama serve
Now we can hit the Ollama endpoint!
curl http://localhost:11434/api/chat -d '{
"model": "llama3:8b",
"messages": [{ "role": "user", "content": "Are you a robot?" }]
}'
Response:
{"model":"llama3:8b","created_at":"2025-03-05T14:52:18.117339Z","message":{"role":"assistant","content":"I"},"done":false}
{"model":"llama3:8b","created_at":"2025-03-05T14:52:18.154583Z","message":{"role":"assistant","content":" am"},"done":false}
{"model":"llama3:8b","created_at":"2025-03-05T14:52:18.190636Z","message":{"role":"assistant","content":" not"},"done":false}
{"model":"llama3:8b","created_at":"2025-03-05T14:52:18.226585Z","message":{"role":"assistant","content":" a"},"done":false}
{"model":"llama3:8b","created_at":"2025-03-05T14:52:18.26238Z","message":{"role":"assistant","content":" human"},"done":false}
[...]
Ok so we know that Ollama is responding. Let’s call the same endpoint from a remote machine.
The moment of truth - Hitting Ollama endpoint from anywhere
So far we have seen that we can use ngrok to expose an application running locally. What about exposing the Ollama endpoint and calling it from anywhere?
This is again very easy but needs an extra step.
First, stop Ollama and run it again with:
OLLAMA_HOST=0.0.0.0 ollama serve
This tells the service to listen for incoming connections from any IP address, not just from localhost (127.0.0.1). Thanks to this, we will be able to hit the Ollama endpoint through the host ngrok gives us.
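If you run Ollama as the macOS app and don’t want to remember the environment variable every time, the Ollama FAQ describes setting it at the user level with launchctl and then restarting the app (a small aside; the one-liner above is all you strictly need for this post):

# macOS app only: make OLLAMA_HOST persistent, then restart Ollama
launchctl setenv OLLAMA_HOST "0.0.0.0"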
After this let’s expose the port with ngrok:
ngrok http 11434
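A troubleshooting note (an assumption on my side; I did not need it in this run): Ollama can be picky about the Host header it receives, so if the forwarded requests get rejected, asking ngrok to rewrite the header is a common fix:

ngrok http 11434 --host-header="localhost:11434"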
Again, you will get a forwarding URL like in the examples we have seen before:
Session Status online
Account your_email (Plan: Free)
Version 3.20.0
Region Europe (eu)
Web Interface http://127.0.0.1:4040
Forwarding https://73f1-92-233-157-239.ngrok-free.app -> http://localhost:11434
Connections ttl opn rt1 rt5 p50 p90
0 0 0.00 0.00 0.00 0.00
And now, the moment of truth. You can hit the Ollama endpoint from anywhere in the world.
curl https://73f1-92-233-157-239.ngrok-free.app/api/chat \
-H "Content-Type: application/json" \
-d '{
"model": "llama3:8b",
"messages": [{ "role": "user", "content": "Are you a robot?" }],
"stream": false
}'
Response:
{"model":"llama3:8b","created_at":"2025-03-06T14:43:59.303248Z",
"message":{"role":"assistant","content":"I am not a human, but I'm also not a traditional robot.
I'm an artificial intelligence language model designed to simulate conversation and answer questions
to the best of my ability based on my training data.\n\nI'm often referred to as a \"chatbot\" or a
\"conversational AI,\" which means I can understand and respond to natural language inputs in a way
that's similar to how humans communicate. However, I don't have consciousness, emotions, or physical
capabilities like humans do.\n\nMy primary function is to provide helpful and accurate information
to users who interact with me through text-based interfaces like this chat window. I'm designed to
be informative, engaging, and sometimes even entertaining!"},
"done_reason":"stop","done":true,
"total_duration":6849062709,
"load_duration":552888167,
"prompt_eval_count":15,
"prompt_eval_duration":868000000,
"eval_count":139,
"eval_duration":5426000000}
Nicely done! 🤩
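To make this a bit nicer to use from the remote machine, you can wrap the call in a small shell helper. This is just a convenience sketch: ask_llama is a name I made up, and the URL is the forwarding address from my run (yours will differ):

# Put this in your ~/.zshrc or ~/.bashrc on the remote machine
OLLAMA_URL="https://73f1-92-233-157-239.ngrok-free.app"   # your forwarding URL here

ask_llama() {
  # Send a single, non-streaming chat request (prompts containing double quotes would need escaping)
  curl -s "$OLLAMA_URL/api/chat" \
    -H "Content-Type: application/json" \
    -d "{\"model\": \"llama3:8b\", \"messages\": [{\"role\": \"user\", \"content\": \"$1\"}], \"stream\": false}"
}

# Usage: ask_llama "Are you a robot?"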
Security Considerations
When exposing local services to the internet, security is paramount. The free tier of ngrok provides some inherent protection by assigning a new URL each time you run it, which limits persistent unauthorized access. For additional security, consider using ngrok’s authentication features.
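For example, ngrok can put HTTP basic auth in front of the tunnel so that only callers with the credentials get through (flag name as I understand it from the ngrok v3 docs; pick your own username and password):

ngrok http 11434 --basic-auth="someuser:a-long-random-password"

Remote callers then have to supply the same credentials, for example with curl’s -u flag.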
Practical Use Cases
This setup opens numerous possibilities. We can repurpose older hardware into dedicated AI servers, and enjoy on-the-go AI access from mobile devices while traveling.
If you find this approach valuable, consider creating a more permanent solution with dedicated hardware like a Raspberry Pi, developing a simple web interface, or automating the startup process. Those could be nice side projects!
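As a tiny starting point for the automation idea, something along these lines could bring both services up on the old laptop at once (a rough sketch; the filename and the sleep are my own assumptions):

#!/usr/bin/env bash
# start-ai-server.sh -- rough sketch: start Ollama on all interfaces, then tunnel it with ngrok
set -e

# Start Ollama listening on every interface, in the background
OLLAMA_HOST=0.0.0.0 ollama serve &

# Give the server a few seconds to come up
sleep 5

# Open the tunnel (add --domain=... here if you claimed a static domain)
ngrok http 11434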
Conclusion
With just a few simple tools, we’ve transformed a local AI model into a remotely accessible service. This approach gives you the privacy benefits of running models locally while providing the convenience of remote accessibility.
Whether you’re a developer looking to experiment with AI or someone who wants to make the most of their existing hardware, this combination of Ollama and ngrok offers a flexible, low-cost solution for accessing AI capabilities from anywhere.
The real power here is that you’ve created your own personal AI API—one that you fully control, doesn’t require monthly subscriptions, and keeps your data private. Not bad for repurposing an old laptop!