What is the problem?

DeepSeek is well known for open-sourcing its large models, and many people want to try deploying them locally with Ollama or LM Studio to see how they perform. But after browsing the various model platforms, some are confused: why are the recommended DeepSeek models so often of the “DeepSeek-R1-Distill-Qwen” type? What is the relationship between DeepSeek and Qwen? And what does “Distill” mean?

What are DeepSeek, Distill, and Qwen?

  • DeepSeek-R1 is a reasoning model developed by DeepSeek, a company founded by the quantitative hedge fund High-Flyer (幻方量化).

    • DeepSeek-R1’s standout strength is its reasoning ability.
  • Qwen is a family of large models from Alibaba.

    • Its strength is the wide range of parameter sizes it is released in, which makes it a flexible base model.
  • Distill refers to knowledge distillation, a process that transfers the strengths of a large “teacher” model into a smaller “student” model, producing a new model that combines both (a minimal sketch follows the summary below).

    • Qwen’s strong base models + DeepSeek-R1’s strong reasoning

In summary, DeepSeek-R1-Distill-Qwen is similar to grafting in fruit trees: DeepSeek-R1’s reasoning ability is grafted onto a Qwen base model.
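
To make distillation concrete, here is a minimal sketch of the classic logit-matching form of knowledge distillation (Hinton et al., 2015) in Python/PyTorch. This is illustrative only: the released DeepSeek-R1-Distill-Qwen models were reportedly produced by fine-tuning Qwen on reasoning samples generated by DeepSeek-R1 rather than by logit matching, and the tensor shapes and temperature below are assumed values.

```python
# Minimal sketch of logit-matching knowledge distillation.
# Illustrative only; shapes and the temperature are assumed values.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    # Soften both distributions with a temperature, then push the
    # student's distribution toward the teacher's via KL divergence.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * temperature ** 2

# Toy usage: a batch of 4 token positions over a 32k-token vocabulary.
teacher = torch.randn(4, 32000)                       # frozen teacher (e.g. DeepSeek-R1)
student = torch.randn(4, 32000, requires_grad=True)   # trainable student (e.g. Qwen)
loss = distillation_loss(student, teacher)
loss.backward()  # gradients flow only into the student
```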

How to choose a suitable large model for your hardware?

  • Model 1: DeepSeek-R1-Distill-Qwen-32B-IQ3_M.gguf, 14.81GB
  • Model 2: DeepSeek-R1-Distill-Qwen-7B-f16.gguf, 15.24GB

Taking these two models as examples: they are nearly identical in file size, but most of the time model 2 (the full-precision 7B) is the better choice over model 1 (the quantized 32B).

  • Model 1 has 32 billion parameters; IQ3_M denotes a quantization strategy that balances performance and efficiency. Quantization causes some loss of precision, but much of the original information is retained.
  • Model 2 has only 7 billion parameters and is stored at full 16-bit precision (f16), with no pruning or quantization applied. In theory it delivers performance closest to the originally trained model, especially in accuracy and detail handling. However, because it has far fewer parameters, it may not match the 32B version at understanding and generating complex text.
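
A quick back-of-the-envelope calculation shows why two such different models end up at nearly the same file size: size ≈ parameters × bits per weight ÷ 8. The sketch below uses rounded parameter counts and an assumed ~3.7 effective bits per weight for IQ3_M; real GGUF files also carry metadata, so treat the results as estimates.

```python
# Rough GGUF file-size estimate: parameters x bits-per-weight / 8.
# Rounded parameter counts and ~3.7 effective bits for IQ3_M are assumptions.
def approx_size_gb(params_billions: float, bits_per_weight: float) -> float:
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9  # decimal GB

print(f"32B at ~3.7 bits (IQ3_M): {approx_size_gb(32.0, 3.7):.1f} GB")   # ~14.8 GB
print(f"7.6B at 16 bits (f16):    {approx_size_gb(7.6, 16.0):.1f} GB")   # ~15.2 GB
```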

This is a genuinely complex trade-off. Generally speaking, the version closest to the original model is best; in practice, however, ordinary users are limited by GPU memory and compute, and usually have to settle for a quantized version.

With 16GB of VRAM, 7B or 14B models are the usual choice; running a 32B model is generally too demanding, since the weights plus runtime overhead must fit in memory (a rough fit check is sketched below). With a newer card such as the RTX 5090 and its 32GB of VRAM, you can move up to larger models.
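
As a rough rule of thumb, the model file has to fit in VRAM with headroom left over for the KV cache, activations, and the driver. Here is a minimal sketch of that check; the 2GB overhead figure and the ~9GB size for a 14B Q4_K_M build are assumptions for illustration.

```python
# Hypothetical fit check: does a GGUF file fit in VRAM with headroom?
# The 2 GB overhead (KV cache, activations, driver) is an assumed figure.
def fits_in_vram(file_size_gb: float, vram_gb: float, overhead_gb: float = 2.0) -> bool:
    return file_size_gb + overhead_gb <= vram_gb

candidates = [
    ("DeepSeek-R1-Distill-Qwen-7B-f16",     15.24),
    ("DeepSeek-R1-Distill-Qwen-14B-Q4_K_M",  9.0),   # approximate size
    ("DeepSeek-R1-Distill-Qwen-32B-IQ3_M",  14.81),
]
for name, size_gb in candidates:
    print(f"{name}: fits on 16GB -> {fits_in_vram(size_gb, 16.0)}")
```

Note that even the 7B f16 file exceeds 16GB once overhead is counted, which is why 16GB cards typically run quantized 7B or 14B builds.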

For ordinary users, a locally deployed DeepSeek-R1 distill will not perform all that well; the highest generation quality usually comes from the online web versions offered by the various large-model vendors.