Local LLM on a MacBook Air - Intel i5 & 4GB RAM


I recently bought a new MacBook M4 and decided to use my old MacBook Air to run a local, private LLM instead of leaving it in the cabinet unused!

Background on the hardware:

In order to do that, let's check the current state of my old Mac. It has a dual-core Intel i5 CPU with 4GB of DDR3 RAM and only Intel HD Graphics 5000, a very weak system compared to the latest models. However, its macOS Big Sur just received a security update to version 11.7.11 this February 2026! Despite its age, the system still functions well as a backup notebook when I need it.


My setup:

With the above specs, my options are pretty limited, and GUI frontends like Open WebUI or LM Studio are out of the question. Based on my research, I decided to go with the llama.cpp CLI approach. llama.cpp is an open-source C/C++ library and inference engine that lets users run large language models (LLMs) locally and efficiently on consumer hardware, including standard CPUs, GPUs, and Apple Silicon. It supports aggressive quantization (for example, 4-bit models in the GGUF file format), which shrinks model sizes and reduces memory requirements significantly with minimal impact on quality. This makes it possible to run capable models on devices with limited memory, perfect for my case!
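To get a feel for why quantization matters on 4GB of RAM, here is a back-of-the-envelope weight-size calculation. The parameter count is for a 1.5B model; real GGUF files come out somewhat larger because some tensors stay at higher precision:

```shell
# Approximate weight size in bytes: parameters × bits_per_weight / 8.
# Illustrative arithmetic only, not exact GGUF file sizes.
PARAMS=1500000000   # 1.5B parameters
echo "f16: $((PARAMS * 16 / 8)) bytes"   # ~3 GB, too big for 4GB RAM
echo "q4:  $((PARAMS * 4 / 8)) bytes"    # ~0.75 GB, fits comfortably
```

So dropping from 16-bit to 4-bit weights cuts the memory for the weights themselves by a factor of four, which is the difference between swapping to death and running smoothly on this machine.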

So here are the steps I followed to get my Local LLM up and running:


Step-by-step instructions:

  1. Install Developer Tools:

bash

# Open Terminal and install developer tools
xcode-select --install

If you run into problems installing the tools, use the command below to remove the corrupted installation, then run the install again.

bash

# Remove corrupted installation
sudo rm -rf /Library/Developer/CommandLineTools

# Verify the installation with
xcode-select -p

It should return: /Library/Developer/CommandLineTools.

  2. Download and Build llama.cpp for Intel i5

Download llama.cpp as zip file from GitHub repository. https://github.com/ggml-org/llama.cpp

Download llama.cpp by clicking the green Code button and choosing Download ZIP.

Once downloaded, go to your Downloads folder in Terminal and unzip it.

bash

cd ~/Downloads
unzip llama.cpp-master.zip
cd llama.cpp-master

Now let’s proceed with compiling the LLM engine.

llama.cpp recently shifted its primary build system to CMake. On an older system like Big Sur, we need to make sure CMake is actually installed first.

bash

# Run this command to check for CMake
cmake --version

If it says "command not found", you need to install it: visit CMake.org and download the macOS binary DMG.

Download the CMake dmg file. Choose the "macOS 10.13 or later" installer.

Once the file is downloaded, double-click the dmg file to install it, then run the command below to add cmake to your PATH.

bash

sudo "/Applications/CMake.app/Contents/bin/cmake-gui" --install

Build with CMake (Optimized for Intel i5) by running these commands below inside your llama.cpp-master folder:

bash

# Create a build directory:
mkdir build
cd build

# Config the build
cmake .. -DGGML_METAL=OFF -DGGML_AVX=ON -DGGML_AVX2=ON -DGGML_ACCELERATE=OFF

# Compile your customized build
cmake --build . --config Release -j 1

The -j 1 argument forces the build to run only one compile job at a time to save RAM. This process will take a while.

Once the llama.cpp build is finished, the executable file won’t be in the main folder; it will be inside build/bin/. To run it, go back to the main folder.

Next proceed to download the model file.

  3. Download and run the LLM model

For models, I wanted to test the smallest one first and check the performance. So I went with a Qwen 0.5B-parameter model with 4-bit quantization and found it quite fast at both ingestion and output. You can download and try the model from the link below:

https://huggingface.co/bakongi/Qwen-0.5B_Instruct_RuAlpaca

After testing a few models, I found the sweet spot for my old MacBook Air i5 to be Qwen2.5 with 1.5B parameters. The file size is 1.07GB, which is close to the maximum my Mac can handle. Below is the link I downloaded it from.

https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct-GGUF

You can try other models as well.

Download the LLM model from Hugging Face. The 4-bit model is already 1.07GB!
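If you prefer to stay in the terminal, you can fetch the GGUF file with curl instead of the browser. The filename below is taken from the Hugging Face repo linked above; double-check it on the repo's "Files and versions" tab before downloading, as filenames can change:

```shell
# Download the 4-bit GGUF into a models/ folder inside llama.cpp-master.
cd ~/Downloads/llama.cpp-master
mkdir -p models
curl -L -o models/qwen2.5-1.5b-instruct-q4_k_m.gguf \
  https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct-GGUF/resolve/main/qwen2.5-1.5b-instruct-q4_k_m.gguf
```

The -L flag matters here: Hugging Face serves model files through redirects, and without it curl saves the redirect page instead of the model.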

Once the model is downloaded, run the command below to start your local LLM in conversation mode.

bash

cd ..
./build/bin/llama-cli -m models/qwen2.5-1.5b-instruct-q4_k_m.gguf -n 256 -t 2 --color -i -cnv

If you have a system prompt for the LLM, you can put the content in a file such as system.md and run the command:

bash

./build/bin/llama-cli -m models/qwen2.5-1.5b-instruct-q4_k_m.gguf -n 256 -t 2 --color -i -cnv --file system.md

In case the model crashes on your low-memory hardware, you can use the -c argument to lower the LLM's short-term memory (context). Using -c 2048 caps the context at 2048 tokens; a higher context uses significantly more memory!

bash

./build/bin/llama-cli -m models/qwen2.5-1.5b-instruct-q4_k_m.gguf -n 256 -t 2 --color -i -cnv -c 2048
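To see why context size costs memory, here is a rough f16 KV-cache estimate. The layer count and KV dimension below are illustrative assumptions for a small model, not values read from the GGUF; llama.cpp prints the real figures when it loads the model:

```shell
# KV cache bytes ≈ 2 (K and V) × n_layers × n_ctx × kv_dim × 2 (f16 bytes).
# N_LAYERS and KV_DIM are assumptions for illustration only.
N_LAYERS=28
N_CTX=2048
KV_DIM=256
echo $((2 * N_LAYERS * N_CTX * KV_DIM * 2))   # bytes, ~56 MB
```

The cost scales linearly with -c, so doubling the context to 4096 roughly doubles this figure on top of the 1.07GB of weights.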

  4. Test and have fun with your local LLM!

If you followed the instructions, your terminal should now run llama.cpp with the model loaded and give you a prompt for LLM conversation! Each reply includes the 'ingestion' speed (Prompt: xx t/s), measured in tokens per second (t/s), and the 'generation' or output speed (Generation: xx t/s). With my setup, I can achieve around 22-24 tokens/second for ingestion and around 10-11 tokens/second for output generation. Not bad for such an old Mac!
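Those generation numbers line up with a simple sanity check: each generated token has to read every weight once, so output speed is roughly bounded by memory bandwidth divided by model size. The 12.8 GB/s figure below is an assumed value for this machine's LPDDR3 memory, not a measurement:

```shell
# Rough upper bound on generation speed: bandwidth / model size.
# BW_MB_S is an assumed LPDDR3 figure; MODEL_MB is the q4_k_m file size.
BW_MB_S=12800
MODEL_MB=1070
echo "$((BW_MB_S / MODEL_MB)) tokens/s upper bound"
```

That bound of roughly 11-12 tokens/second is consistent with the observed 10-11 t/s, and it is also why the smaller 0.5B model feels so much snappier: less data to stream per token.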


Final Thoughts

For the ultimate convenience, you can use macOS Automator to create an application that runs a script to automatically open Terminal and start your model for you.

Just follow the instructions below:

  1. Search for and run Automator from the Applications folder.

  2. Choose ‘File’ -> ‘New’ -> Then choose ‘Application’ type.

  3. From the search bar, find ‘Run AppleScript’.

  4. Copy and paste the script below into the input text box.

on run {input, parameters}
    tell application "Terminal"
        activate
        do script "cd ~/Downloads/llama.cpp-master && ./build/bin/llama-cli -m models/qwen2.5-1.5b-instruct-q4_k_m.gguf -n 256 -t 2 --color -i -cnv"
    end tell
end run
  5. Click 'Save' and save the script to the 'Applications' folder.

  6. Open your 'Applications' folder, and you will see your app. Click to open!
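If you would rather skip Automator, a plain shell launcher works too: macOS opens .command files in Terminal when you double-click them. The paths below assume the folder layout from the earlier steps:

```shell
#!/bin/sh
# Save this as run-llm.command, then make it executable once:
#   chmod +x run-llm.command
# Double-clicking it in Finder opens Terminal and starts the model.
cd ~/Downloads/llama.cpp-master || exit 1
exec ./build/bin/llama-cli -m models/qwen2.5-1.5b-instruct-q4_k_m.gguf \
  -n 256 -t 2 --color -i -cnv
```

The exec replaces the shell with llama-cli, so closing the chat also closes the window cleanly.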

And that's it! Have fun testing different models and fine-tuning your local and private LLM!


What’s Next

  • Check back for more later, so stay tuned!

🔗 Connect

I’m building Prevalis Strategies as a technical + strategic consulting venture. Follow the journey, learn with me, or drop suggestions or questions!

Domain: https://prevalis.ai
Email: info@prevalis.ai
Built & maintained by: prevalis.ai