DeepSeek R1 Local Ai Server LLM Testing on Ollama - Video Insight
Digital Spaceport


The reviewer assesses DeepSeek R1's reasoning performance, noting precision issues while highlighting its potential against competitors.

In the video, the creator presents a detailed review of DeepSeek R1's reasoning capabilities, specifically its large 128k-token context window and its performance compared to other models such as Claude and Llama 3.3. The reviewer is enthusiastic about testing the model against practical scenarios, probing its reasoning with unusual, engaging questions that rarely receive enough scrutiny in typical AI assessments. As they experiment with the model's responses, they document both strengths and weaknesses, ultimately expressing concern over its performance on easy tasks and its precision, and noting clear room for improvement.
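For readers who want to reproduce this kind of local testing, a minimal sketch of a request to Ollama's local HTTP API is shown below. The model tag (`deepseek-r1`) and the context-size option are assumptions on our part, not details confirmed in the video; check `ollama list` for the tags actually pulled on your machine.

```python
import json

# Default endpoint for a local Ollama server (assumption: default port).
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(prompt: str, model: str = "deepseek-r1",
                  num_ctx: int = 131072) -> dict:
    """Build the JSON body for a single non-streaming generation call.

    num_ctx raises the context window; 128k tokens = 131072.
    The model tag is a hypothetical example."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": num_ctx},
    }

payload = build_request("How many times does the letter p appear in 'peppermint'?")
print(json.dumps(payload, indent=2))
```

The payload would then be POSTed to `OLLAMA_URL` with any HTTP client; building it as a plain dict keeps the model tag and context size easy to vary between test runs.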


Content rate: B

The content is informative, covering both the strengths and weaknesses of the DeepSeek R1 model relative to its competitors. The reviewer engages the audience with practical experiments and insights, despite the occasional lack of clear conclusions about some model failures. Overall, while it presents valid testing claims and firsthand experience, the inclusion of unsupported sweeping claims slightly diminishes its value.

Tags: AI, DeepSeek, modeling, reasoning, testing

Claims:

Claim: The DeepSeek R1 model with a 128k context window is superior to Claude.

Evidence: The reviewer notes improvements in response speed and depth compared to Claude, though the model fails some standard tests, prompting skepticism about its overall performance.

Counter evidence: The reviewer ultimately states that the model fails to outperform Llama 3.3 in certain scenarios, suggesting it may not be as superior as claimed.

Claim rating: 7 / 10

Claim: The model encountered numerous precision issues, affecting its ability to answer basic questions correctly.

Evidence: During the demonstration, the model incorrectly answered simple queries, such as counting the letters in 'peppermint', indicating flaws in its precision on basic tasks.

Counter evidence: Some answers were correct, but the persistent errors on straightforward tasks indicate a variability that undermines the model's reliability.

Claim rating: 8 / 10
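The letter-counting test above has a deterministic ground truth that is easy to verify; a short check gives the answer the model should have produced (which specific letter the video asked about is an assumption on our part):

```python
# Ground-truth check for the letter-counting test: how many times
# the letter "p" appears in "peppermint". (Which letter the video
# actually asked about is an assumption.)
word = "peppermint"
count = word.count("p")
print(count)  # → 3
```

Checks like this make it trivial to score a local model's answers automatically rather than eyeballing each response.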

Claim: Running DeepSeek R1 efficiently requires significant power resources and may need dual power supplies.

Evidence: The reviewer discusses increased power consumption when using the model and considers adding a second power supply to support its operation under heavy loads.

Counter evidence: While the power demands were noted, the video does not provide specific benchmarks or data showing consistent power supply issues across various operational scenarios.

Claim rating: 9 / 10

Model version: 0.25, ChatGPT: gpt-4o-mini-2024-07-18