How to choose a suitable model file when deploying a DeepSeek large model locally?

What is the problem?

As everyone knows, DeepSeek has open-sourced its large models, and many people want to try deploying them locally with Ollama or LM Studio to see how they perform. But after browsing the various model platforms, some are confused: why are the recommended DeepSeek models named things like "DeepSeek-R1-Distill-Qwen"? What is the relationship between DeepSeek and Qwen, and what does Distill mean?

What are DeepSeek, Distill, and Qwen?

  • DeepSeek-R1 is a reasoning model developed by DeepSeek, an AI company backed by the quantitative fund High-Flyer (Huanfang).

    • Its standout strength is reasoning ability.
  • Qwen is a family of large models from Alibaba.

    • Its strength is a rich lineup of parameter sizes built on strong base models.
  • Distill refers to knowledge distillation, which transfers a large "teacher" model's behavior into a smaller "student" model, combining the strengths of both (a minimal code sketch follows the summary below).

    • Qwen's strong base models + DeepSeek-R1's strong reasoning ability

In summary, DeepSeek-R1-Distill-Qwen is like grafting in fruit trees: DeepSeek-R1's reasoning ability is grafted onto Qwen's foundation.
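For a concrete picture of distillation, below is a minimal PyTorch sketch of the classic soft-label (logit-matching) form, where a student model is trained to imitate a teacher's output distribution. All names and values here are illustrative assumptions; DeepSeek's distilled models were in fact produced by fine-tuning smaller Qwen and Llama models on reasoning data generated by R1, but the teacher-to-student idea is the same.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """Soft-label distillation: push the student's output distribution
    toward the teacher's temperature-softened distribution."""
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    loss = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
    return loss * temperature ** 2  # standard scaling for comparable gradients

# Toy usage: a batch of 4 positions over a 10-token vocabulary.
teacher_logits = torch.randn(4, 10)                      # stand-in for the teacher (e.g. R1)
student_logits = torch.randn(4, 10, requires_grad=True)  # stand-in for the student (e.g. Qwen)
print(distillation_loss(student_logits, teacher_logits))
```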

How to choose a suitable large model for your hardware?

  • Model 1: DeepSeek-R1-Distill-Qwen-32B-IQ3_M.gguf, 14.81 GB
  • Model 2: DeepSeek-R1-Distill-Qwen-7B-f16.gguf, 15.24 GB

Take these two models as examples: their file sizes are nearly identical, yet in most cases model 2 is the better choice.

  • Model 1 has 32 billion parameters; IQ3_M denotes a quantization strategy that balances quality and size. Quantization introduces some precision loss, but most of the original information is retained (a toy sketch of quantization error follows this list).
  • Model 2 has only 7 billion parameters but is stored at full f16 precision, with no pruning or quantization. In theory it comes closest to its original trained model, especially in accuracy and detail handling. However, with far fewer parameters, it may not match the 32B version at understanding and generating complex text.
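To see where that precision loss comes from, here is a toy Python sketch of naive symmetric quantization. It only illustrates the rounding-error idea; the actual IQ3_M scheme used in llama.cpp GGUF files is far more sophisticated (block-wise, with per-block scales).

```python
import numpy as np

def quantize_dequantize(weights: np.ndarray, bits: int) -> np.ndarray:
    """Naive symmetric quantization: round floats to 2**(bits-1)-1 integer
    levels per sign and map back, exposing the rounding error."""
    levels = 2 ** (bits - 1) - 1
    scale = np.abs(weights).max() / levels
    return np.round(weights / scale) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=100_000).astype(np.float32)  # stand-in for a weight tensor

for bits in (16, 8, 3):
    err = np.abs(w - quantize_dequantize(w, bits)).mean()
    print(f"{bits}-bit mean absolute error: {err:.6f}")
```

The error grows sharply as the bit width drops; that is exactly the trade an IQ3_M file makes in exchange for fitting a 32B model into under 15 GB.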

This is genuinely a trade-off that requires weighing several factors. Generally speaking, the version closer to its original model is better, but ordinary users are limited by GPU memory and computing power and usually have to settle for quantized or smaller versions.

For a GPU with 16 GB of VRAM, 7B or 14B models are generally recommended, as running 32B may be too demanding. Conversely, if you have a top-end card such as the latest RTX 5090, you can choose models with more parameters.
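As a rule of thumb, a model's weight file size is roughly parameter count × bits per weight ÷ 8. The Python sketch below applies this to the two files above and adds headroom for the KV cache and runtime. The bits-per-weight values are approximate assumptions for common GGUF quantization types, and the 7.62B figure is inferred from the quoted 15.24 GB f16 file (two bytes per weight).

```python
# Approximate (assumed) bits per weight for common GGUF quantization types.
QUANT_BITS = {"f16": 16.0, "Q8_0": 8.5, "Q4_K_M": 4.8, "IQ3_M": 3.7}

def estimated_size_gb(params_billion: float, quant: str) -> float:
    """Weight file size in decimal GB: parameters * bits per weight / 8."""
    return params_billion * QUANT_BITS[quant] / 8

for name, params, quant in [
    ("DeepSeek-R1-Distill-Qwen-32B", 32.0, "IQ3_M"),
    ("DeepSeek-R1-Distill-Qwen-7B", 7.62, "f16"),  # 15.24 GB / 2 bytes per weight
]:
    size = estimated_size_gb(params, quant)
    # Leave a few GB of headroom for the KV cache and runtime overhead.
    print(f"{name} ({quant}): ~{size:.1f} GB weights, "
          f"roughly {size + 2:.0f}-{size + 4:.0f} GB VRAM in practice")
```

Both estimates land on the quoted file sizes, and with headroom both files sit at or above the 16 GB mark, which is why a 16 GB card is usually paired with smaller or more heavily quantized files instead.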

For ordinary users, a locally deployed DeepSeek-R1 will not produce particularly impressive results; the highest-quality output usually comes from the various providers' full-size models served through their online web versions.

What do you think?