Beginner’s Guide to Running Small Language Models Locally
If you’ve heard about AI models like ChatGPT but only recently came across the term ‘Small Language Models’, here’s what it actually means.
Small language models (SLMs) are essentially smaller versions of large language models (LLMs), which are generative AI models designed to understand and produce human-like text. SLMs are often fine-tuned on domain-specific data so that they perform well on a narrower set of tasks.
They are designed to be more compact and efficient, typically containing fewer parameters than large language models. This smaller size doesn’t necessarily mean reduced capability. In many cases, it results in faster processing and lower computational cost, especially in resource-constrained environments like smartphones.
Depending on the use case, SLMs can also shorten development cycles, by as much as 60–70% according to some industry reports, because they are easier to fine-tune and iterate on.
SLMs typically have between a few hundred million and a few billion parameters. As a result, they are:
· More efficient to train than LLMs
· Less demanding of computational power and memory
· More accessible, thanks to their smaller size
· Easier to customize for specific domains and tasks
Using small language models, developers can fine-tune AI models for specific domains and applications.
We could soon see highly specialized and very responsive AI models running directly on our smartphones without compromising user data. Unlike ChatGPT or other online tools, a locally run model keeps every word you type and every response it generates on your own machine, in local storage and RAM. No prompt is sent to a server, so your conversations stay private.
SLMs can sometimes be more useful than LLMs for specific, narrowly defined tasks, especially after they have been fine-tuned on data from that domain.
You might be wondering why you would bother running a smaller language model locally when you can just call a powerful API like GPT-4o.
The answer is that in the real world there are countless scenarios where you cannot, or should not, share data with an external service.
These include:
1) Privacy regulations to consider
2) Latency requirements that rule out network round trips
3) Cost constraints at scale
4) Edge deployment situations where internet connectivity isn’t guaranteed
Running your own AI model on a laptop might sound technical or intimidating, but it has become surprisingly easy. In this guide, I’ll walk you through how to set up and run a small language model (SLM) directly on your machine: no cloud services, no external APIs, and completely free. And yes, once the model is downloaded, it even works offline!
What You’ll Be Doing
By the end of this guide, you’ll be able to:
· Set up and run your own AI chatbot right on your laptop
· Use it to answer questions, write code, and handle everyday tasks
· Keep using it even without an internet connection once everything is set up
Prerequisites
Before you run a small language model (SLM) locally, make sure you have a few basics in place. Nothing too heavy, just enough to ensure things run smoothly.
1. A decent laptop or PC (at least 8GB RAM, though 16GB is much better for smoother performance)
2. Some free storage space: models can take anywhere from 2GB to 10GB depending on size and quantization. (Quantization is a technique that 'compresses' a model by storing its weights at lower precision. Ollama typically downloads quantized models, which is why a 3B model like Llama 3.2 only needs about 2GB of space and can run smoothly even on laptops with limited memory.)
3. A modern CPU (most SLMs can run on CPU, but a GPU will speed things up if you have one)
4. Basic familiarity with using the terminal or command prompt
Once you have these ready, you’re good to go.
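If you’re curious where a number like "about 2GB" comes from, here is a rough back-of-the-envelope sketch in Python. It assumes the file size is dominated by the weights (parameters × bits per weight), and the bit widths shown are illustrative choices of mine, not Ollama’s exact settings. Real quantized files mix precisions and carry some metadata, so they tend to come out a little larger than this estimate.

```python
def estimate_size_gb(num_parameters: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    total_bytes = num_parameters * bits_per_weight / 8  # 8 bits per byte
    return total_bytes / 1e9


if __name__ == "__main__":
    params = 3e9  # a 3B model, as in the Llama 3.2 example above
    # Illustrative precisions only; actual quantization schemes vary.
    for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
        print(f"{label}: about {estimate_size_gb(params, bits):.1f} GB")
```

The takeaway is simply that halving the bits per weight roughly halves the download, which is why a quantized 3B model fits comfortably on a laptop.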
Now that everything is ready, let’s start setting things up step by step.
Step 1: Install Ollama
To run a language model locally, we first need a tool that simplifies the entire process. For this guide, we’ll use Ollama, which handles downloading and running models for you.
Ollama manages models in the GGUF format (used in the llama.cpp ecosystem), where the model weights and metadata are packaged in an optimized, often quantized (compressed) form that your laptop can load efficiently. In most cases, it automatically downloads pre-quantized versions of models, making them lightweight and efficient to run on local machines. This removes the need for manual setup, model conversion, or working directly with tools like llama.cpp.
Follow the steps below:
- Go to the Ollama website
Open your browser and visit the official Ollama website. You’ll see options to download it for different operating systems.
- Download the installer
Choose the version that matches your system (Windows, macOS, or Linux) and start the download.
- Run the installer
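Once the file is downloaded, open it and follow the installation steps. It’s just like installing any regular application: click Next and finish the setup. On Linux, the website may instead give you a one-line install command to paste into your terminal.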
- Wait for the installation to complete
It should only take a minute or two. Once it’s done, Ollama will be installed on your system and ready to use.
Note: version numbers may vary, as the tool is updated frequently.
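If you’d like to double-check that the installation worked before moving on, here is a small Python sketch. It assumes the installer has put the ollama command on your PATH and that the command accepts a --version flag; find_ollama and ollama_version are just helper names I made up for this example.

```python
import shutil
import subprocess
from typing import Optional


def find_ollama() -> Optional[str]:
    """Return the full path of the ollama executable, or None if it is not on PATH."""
    return shutil.which("ollama")


def ollama_version() -> Optional[str]:
    """Return whatever `ollama --version` prints, or None if ollama is not installed."""
    path = find_ollama()
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    # Some tools print version info to stderr, so fall back to that if stdout is empty.
    return result.stdout.strip() or result.stderr.strip()


if __name__ == "__main__":
    version = ollama_version()
    if version is None:
        print("ollama was not found on PATH. Try reopening your terminal or reinstalling.")
    else:
        print(version)
```

Typing ollama --version directly in your terminal does the same job; the script is only handy if you want to check from code.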
After completing these steps, you’ve set up the main tool we’ll use. In the next step, we’ll run your first AI model.
Step 2: Run Your First AI Model
Here comes the fun part - actually running your first AI model.
Open your terminal
This could be Command Prompt, PowerShell, or Terminal, whichever you normally use.
You don’t need to open a specific folder or activate an environment. The installer makes the ollama command available system-wide, so you can run it from anywhere.
Pull the model
Just type this and hit enter:
ollama pull llama3.2:3b
The :3b at the end of the name stands for 3 billion parameters.
Run the model
ollama run llama3.2:3b
Alternatively, you can skip the pull step and just type this:
ollama run llama3.2
If you don't have the model yet, Ollama will download it automatically and then start the chat session, all in one go.
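To make the naming a little more concrete, here is a minimal Python sketch of how a name like llama3.2:3b breaks down into a model name and a tag. It assumes that a name without a tag falls back to a tag called latest; split_model_name is a hypothetical helper written for this guide, not part of Ollama, and which size latest points to is decided by each model’s page in the Ollama library.

```python
def split_model_name(name: str) -> tuple[str, str]:
    """Split 'model:tag' into (model, tag), assuming 'latest' when no tag is given."""
    model, separator, tag = name.partition(":")
    if not separator or not tag:
        tag = "latest"  # assumption: untagged names resolve to the 'latest' tag
    return model, tag


if __name__ == "__main__":
    for name in ["llama3.2:3b", "llama3.2"]:
        model, tag = split_model_name(name)
        print(f"{name} -> model={model}, tag={tag}")
```

If you ever want a specific size, it is safest to spell out the tag yourself, as in the pull command above.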
Step 3: Start Using It
That’s it! You now have your own AI running locally.
You can use it just like you would use ChatGPT. Just type your questions or tasks, and it will respond right in the terminal.
Try things like:
· “Explain photosynthesis in simple words”
· “Write a Python function to reverse a string”
· “Summarize this paragraph: …”
Play around with it, ask different kinds of questions, and see what it can do.
And the best part is everything is running directly on your machine.
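If you’d rather send a few prompts from a script than type them one by one, here is a rough Python sketch. It assumes ollama run accepts the prompt as an extra argument and prints the reply before exiting, and that you have already pulled llama3.2:3b; build_command and ask are hypothetical helper names, so treat this as a starting point rather than the official way to do it.

```python
import shutil
import subprocess

MODEL = "llama3.2:3b"  # the model pulled earlier in this guide

PROMPTS = [
    "Explain photosynthesis in simple words",
    "Write a Python function to reverse a string",
]


def build_command(model: str, prompt: str) -> list[str]:
    """Build the argument list for a one-shot `ollama run MODEL PROMPT` call."""
    return ["ollama", "run", model, prompt]


def ask(model: str, prompt: str) -> str:
    """Run one prompt through the local model and return its reply as text."""
    result = subprocess.run(
        build_command(model, prompt),
        capture_output=True,
        text=True,
        check=True,  # raise an error if ollama exits with a failure code
    )
    return result.stdout.strip()


if __name__ == "__main__":
    if shutil.which("ollama") is None:
        print("ollama was not found on PATH; install it first.")
    else:
        for prompt in PROMPTS:
            print(f"> {prompt}")
            print(ask(MODEL, prompt))
```

Each call starts fresh, so the model will not remember the previous question the way it does inside an interactive chat session.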
When you’re done chatting and want to exit, simply type
/bye
and hit Enter to return to your normal terminal. Alternatively, you can press Ctrl + D.
And if you don't want to use the terminal, Ollama also comes with a built-in app where you can chat with models just like ChatGPT:
1. Search for the Ollama app on your computer
2. Open the app
3. Select the SLM you installed earlier
4. Start using it like any other AI chatbot

Want to try other models?
Ollama has a full library of models you can explore.
If you want to use a different one, it’s just two simple commands:
ollama pull <model_name>
ollama run <model_name>
That’s it. Pick a model, run it, and start experimenting.
If It Feels Slow
If it feels a bit slow, that’s completely normal.
These models are still pretty heavy, and your laptop is handling everything on its own. So some delay is expected, especially on the first few runs or with longer responses.
A few simple things you can do:
· Close any heavy apps running in the background
· Try using smaller models
· Give it a few extra seconds for longer responses
Once you get used to it, it actually feels pretty smooth.
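If you’re not sure whether a model is simply too big for your machine, here is a rough rule-of-thumb check in Python. It assumes a model runs comfortably when its file size plus some headroom for the operating system and other apps fits in your RAM. The 4GB headroom is my own guess, and "larger-model" is a made-up placeholder, not a real model name.

```python
def fits_in_memory(model_size_gb: float, total_ram_gb: float, headroom_gb: float = 4.0) -> bool:
    """Rough check: does the model plus some headroom fit in total RAM?"""
    return model_size_gb + headroom_gb <= total_ram_gb


if __name__ == "__main__":
    # Sizes in GB: about 2GB for llama3.2:3b (from this guide); the second entry is hypothetical.
    candidates = {"llama3.2:3b": 2.0, "larger-model": 9.0}
    for ram_gb in (8, 16):
        for name, size_gb in candidates.items():
            verdict = "should fit" if fits_in_memory(size_gb, ram_gb) else "probably too big"
            print(f"{ram_gb}GB RAM, {name} ({size_gb}GB): {verdict}")
```

This is only a sanity check. Actual memory use also depends on how long your conversation gets and what else is running.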
What Can You Use This For?
Once it’s set up, you can use it for simple, everyday things like:
· Asking random questions
· Getting unstuck while coding
· Quickly summarizing something you don’t want to read fully
· Thinking through ideas when you’re stuck
· Using it anytime, even without internet
It basically becomes something you can open and use whenever you need a quick answer or help.
That’s pretty much it.
Running AI locally isn’t complicated anymore. With tools like Ollama, you can get everything working in just a few minutes and start chatting with your own AI right from your laptop.
Try it out! You’ll be surprised how far small models have come.
