Running LLaMA on Apple Silicon Mac
Introduction
Are you ready to take your AI research to the next level? Look no further than LLaMA, the Large Language Model Meta AI. Designed to help researchers advance their work in this subfield of AI, LLaMA has been released under a noncommercial license focused on research use cases, granting access to academic researchers; those affiliated with organizations in government, civil society, and academia; and industry research laboratories around the world. In this article, we will dive into the exciting world of LLaMA and explore how to run the 7B and 13B models on an M1/M2 MacBook Pro with llama.cpp. Get ready to unlock the full potential of large language models and revolutionize your research!
So how do you run it on your MacBook Pro?
Running LLaMA
Thanks to Georgi Gerganov and his llama.cpp project, it is possible to run Meta's LLaMA on a single computer without a dedicated GPU. That's amazing!
Step 1 Install some dependencies
You need to install Xcode to build the C++ project. If you don't have it yet:
xcode-select --install
You also need Homebrew to install the tools for building the C++ project (pkg-config and cmake), plus Python 3.11:
brew install pkgconfig cmake python@3.11
(Optional) Install virtualenv so you can work in an isolated Python environment:
pip3 install virtualenv
Create a Python virtual environment:
virtualenv (Your Project Name)
Activate your virtual environment:
source (Your Project Name)/bin/activate
Next, install PyTorch (the Nightly version is recommended) and some other packages:
pip install --pre torch --extra-index-url https://download.pytorch.org/whl/nightly/cpu
pip install numpy sentencepiece
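To confirm the packages are importable before going further, here is a quick sanity check (this one-liner is my suggestion, not part of the original setup):
python3 -c "import torch, numpy, sentencepiece; print(torch.__version__)"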
(Optional) Try the Metal Performance Shaders (MPS) backend for GPU acceleration. You can check that PyTorch sees it from a Python REPL:
Python 3.11.2 (main, Feb 16 2023, 02:55:59) [Clang 14.0.0 (clang-1400.0.29.202)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.backends.mps.is_available()
True
Step 2 Download Project
Clone the llama.cpp repository 🫡:
git clone https://github.com/ggerganov/llama.cpp
Run make inside the repo to compile the C++ code:
cd llama.cpp
make
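If the build succeeds, the repo should now contain the main and quantize binaries that the later steps use. A quick way to check (an illustrative command, not from the original guide):
ls -l main quantize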
Step 3 Download LLaMA Model
There are two ways to get the model:
- Official form: https://forms.gle/jk851eBVbX1m5TAv5
- BitTorrent, via the magnet link posted on GitHub: https://github.com/facebookresearch/llama/pull/73
Note
If you download the model from GitHub, do not use the IPFS link; use BitTorrent. A bad download will leave you unable to convert the model later.
After you download the models, the directory structure will look like this (shown for 7B and 13B):
.
├── 7B
│   ├── checklist.chk
│   ├── consolidated.00.pth
│   └── params.json
├── 13B
│   ├── checklist.chk
│   ├── consolidated.00.pth
│   ├── consolidated.01.pth
│   └── params.json
├── tokenizer.model
└── tokenizer_checklist.chk
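Each model folder also ships with a checklist.chk of MD5 checksums, so you can verify the download before converting it. A minimal sketch, assuming you get md5sum from the md5sha1sum Homebrew formula (this verification step is my addition, not part of the original guide):
brew install md5sha1sum
cd models/7B && md5sum --check checklist.chk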
Step 4 Convert LLaMA model 7B
Place the downloaded models under models/ in the llama.cpp repo, then convert the 7B model to the ggml format (the trailing 1 selects f16 output):
python convert-pth-to-ggml.py models/7B 1
Your output will look like this:
{'dim': 4096, 'multiple_of': 256, 'n_heads': 32, 'n_layers': 32, 'norm_eps': 1e-06, 'vocab_size': 32000}
If you get the error RuntimeError: PytorchStreamReader failed reading zip archive: not a ZIP archive, your model download is corrupt; check your model files and the note above.
This should produce models/7B/ggml-model-f16.bin, another 13 GB file.
This script "quantizes the model to 4-bits:
1 | ./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin 2 |
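Quantization shrinks the model considerably. Listing the files makes this visible; the sizes below are rough expectations, not captured output:
ls -lh ./models/7B/ggml-model-*.bin
# ggml-model-f16.bin  ≈ 13 GB
# ggml-model-q4_0.bin ≈ 4 GB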
Step 5 Running LLaMA model 7B
Run the model with a prompt, for example:
./main -m ./models/7B/ggml-model-q4_0.bin \
  -t 8 \
  -n 128 \
  -p 'The first man on the moon was '
Output:
The first man on the moon was 38 years old in July, ’69. Neil Armstrong had been born only a month or two after my mother’s family settled at their little homestead between Saginaw and Flint/Bay City; they came from Pennsylvania (like most of us).
In his eulogy to him later this year – in which he paid tribute to Armstrong for the “greatness” that was reflected by how much more people were talking about it than usual, despite living through 2018’s worst political disaster yet) … Obama noted: ‘I don’t think…
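The flags do the obvious things: -m selects the model file, -t the number of threads, -n the number of tokens to generate, and -p the prompt. To try a longer completion with your own prompt, for example (the prompt here is just an illustration):
./main -m ./models/7B/ggml-model-q4_0.bin -t 8 -n 256 \
  -p 'A poem about a MacBook running a large language model: '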
Step 6 Running LLaMA model 13B
Convert the 13B model to ggml:
python convert-pth-to-ggml.py models/13B/ 1
This time the conversion produces two f16 files (ggml-model-f16.bin and ggml-model-f16.bin.1), and the quantize command needs to be run for each of those in turn:
./quantize ./models/13B/ggml-model-f16.bin ./models/13B/ggml-model-q4_0.bin 2
./quantize ./models/13B/ggml-model-f16.bin.1 ./models/13B/ggml-model-q4_0.bin.1 2
Then run the 13B model just as before, pointing -m at the new file; for example:
./main \
  -m ./models/13B/ggml-model-q4_0.bin \
  -t 8 \
  -n 128 \
  -p 'The first man on the moon was '
Enjoy!
References 🙏🏻
1. https://dev.l1x.be/posts/2023/03/12/using-llama-with-m1-mac/
2. https://til.simonwillison.net/llms/llama-7b-m2