Today, tools like LM Studio make it easy to find, download, and run large language models on consumer-grade hardware. A typical quantized 7B model (a model whose 7 billion parameters have been squeezed into 8 bits each, or even fewer) needs roughly 4-7GB of RAM/VRAM, which an average laptop can provide.
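The 4-7GB figure follows from simple arithmetic on the weights. A back-of-the-envelope sketch (the function name is mine, and this counts only the weights; the KV cache and runtime buffers add more on top):

```python
def model_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Raw weight footprint in GiB: params * bits / 8 bits-per-byte, converted to GiB."""
    return n_params * bits_per_param / 8 / 1024**3

# A 7B model at common quantization levels (weights only):
for bits in (4, 5, 8):
    print(f"{bits}-bit: ~{model_memory_gb(7e9, bits):.1f} GB")
```

At 4 bits per parameter the weights alone come to about 3.3GB, and at 8 bits about 6.5GB, which lines up with the 4-7GB range once you add inference overhead.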
LM Studio lets you choose whether to run the model on the CPU with system RAM or on the GPU with VRAM. It also shows a tokens/s metric at the bottom of the chat dialog.
I used this 5.94GB version of a fine-tuned Mistral 7B and ran a quick test of both options (CPU vs GPU) on 3 machines; here are the results.
Tokens/second
| Spec | Result |
|---|---|
| Apple M1 Pro CPU | 14.8 tok/s |
| Apple M1 Pro GPU | 19.4 tok/s |
| AMD Ryzen 7 7840U CPU | 7.3 tok/s |
| AMD Radeon 780M iGPU | 5.0 tok/s |
| AMD Ryzen 5 7535HS CPU | 7.4 tok/s |
| GeForce RTX 4060 Mobile GPU | 37.9 tok/s |
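If you want to reproduce a measurement like the table above outside the chat UI, LM Studio can run a local server that speaks the OpenAI-compatible chat-completions format (by default at http://localhost:1234). A minimal sketch, assuming that default port; the wall-clock timing here includes prompt processing, so it will read slightly lower than LM Studio's in-app generation metric:

```python
import json
import time
import urllib.request

def tokens_per_second(completion_tokens: int, elapsed_s: float) -> float:
    """Throughput in the same units as the table above."""
    return completion_tokens / elapsed_s

def benchmark(prompt: str,
              url: str = "http://localhost:1234/v1/chat/completions") -> float:
    # Build an OpenAI-style chat-completions request for the loaded model.
    payload = json.dumps({
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
        "stream": False,
    }).encode()
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"})
    start = time.perf_counter()
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    elapsed = time.perf_counter() - start
    # The response reports how many tokens were generated.
    return tokens_per_second(body["usage"]["completion_tokens"], elapsed)

# Usage (with a model loaded and the server running):
# print(f"{benchmark('Explain quantization in one paragraph.'):.1f} tok/s")
```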
Hardware Specs
- 2021 MacBook Pro, Apple M1 Pro, 10 cores (8 performance and 2 efficiency), 16GB of RAM
- 2023 AOKZOE A1 Pro gaming handheld, AMD Ryzen 7 7840U CPU (8 cores, 16 threads), 32GB LPDDR5X RAM, Radeon 780M iGPU (using system RAM as VRAM), TDP at 30W
  - 3DMark Time Spy GPU score: 3000
  - 3DMark Time Spy CPU score: 7300
- 2024 MSI Bravo 15 C7VF-039XRU laptop, AMD Ryzen 5 7535HS CPU (6 cores, 12 threads, 54W), 16GB DDR RAM, GeForce RTX 4060 Mobile (8GB VRAM, 105W)
  - GPU slightly undervolted/overclocked, 3DMark Time Spy GPU score: 11300
  - 3DMark Time Spy CPU score: 7600
Screenshots
Mac
AOKZOE
MSI