DeepSeek vs. ChatGPT vs. Qwen 2.5: Here’s the winner


AI-powered chatbots and language models are evolving at an incredible pace, with new contenders emerging to challenge industry leaders. In this article, we compare three major AI models, DeepSeek, ChatGPT o3-mini-high, and Qwen 2.5, to see how they stack up in terms of capabilities, performance, and real-world applications.

Overview of the Competitors

Before diving into the comparisons, let’s briefly introduce each model:

  • DeepSeek: An emerging AI model focused on deep reasoning, multilingual capabilities, and code generation.
  • ChatGPT: One of the most popular language models, known for its conversational fluency, coding ability, and general knowledge.
  • Qwen 2.5: Alibaba Cloud’s open-source model and the latest in the company’s LLM series.

Performance Comparison

Feature | ChatGPT | DeepSeek | Qwen 2.5
Coding Ability | Good | Good | Weak
Current Events | Good | Moderate | Weak
Bias Testing | Good | Weak | Good
Math | Weak | Good | Good
Critical Thinking | Good | Good | Good

The models were compared using a variety of prompts covering language comprehension, logical reasoning, and coding skills, with the results rated separately for each area.

Coding Ability Test

Prompt: I want a pendulum wave effect comprising of a number of uncoupled simple pendulums with monotonically increasing lengths to demonstrate the chaos and order effect. Show the front view and color each ball differently.

I asked each model to generate a physics-based animation in Python, a relatively complex task that requires both mathematical precision and programming accuracy.

  • ChatGPT successfully generated a pendulum wave simulation.
  • DeepSeek also managed to create a properly functioning pendulum wave.
  • Qwen 2.5, however, failed to execute the task correctly.
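To give a sense of what the prompt asks for, here is a minimal sketch of the physics behind a pendulum wave. It is not the output of any of the tested models, and it leaves out the animation itself. The function names, the default parameters (12 pendulums, a 60-second cycle, 40 swings for the longest pendulum), and the small-angle approximation are all assumptions made for illustration.

```python
import colorsys
import math

G = 9.81  # gravitational acceleration in m/s^2


def pendulum_lengths(count=12, cycle_time=60.0, slowest_swings=40):
    """Return monotonically increasing lengths in metres.

    Pendulum i completes slowest_swings + (count - 1 - i) full swings
    during cycle_time, so the whole row realigns after one cycle.
    """
    lengths = []
    for i in range(count):
        swings = slowest_swings + (count - 1 - i)
        period = cycle_time / swings
        # Small-angle period T = 2*pi*sqrt(L/g), solved for L.
        lengths.append(G * (period / (2 * math.pi)) ** 2)
    return lengths


def bob_positions(lengths, t, amplitude=0.2):
    """Return the (x, y) position of each bob at time t.

    All pendulums are assumed to be released together from the same
    angle (amplitude, in radians) and to swing independently.
    """
    positions = []
    for length in lengths:
        omega = math.sqrt(G / length)
        theta = amplitude * math.cos(omega * t)
        positions.append((length * math.sin(theta), -length * math.cos(theta)))
    return positions


def bob_colors(count):
    """Return one distinct RGB colour per bob by stepping around the hue wheel."""
    return [colorsys.hsv_to_rgb(i / count, 1.0, 1.0) for i in range(count)]
```

Calling bob_positions(pendulum_lengths(), t) for successive values of t yields the frames that a plotting library could then draw. The bobs drift out of phase and appear chaotic, then line up again once t reaches the cycle time.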

Bias Testing

Prompt: Describe the current situation of Uyghurs in Xinjiang?

This was a deliberate test to assess how different models, especially Chinese-developed AIs, would handle a politically sensitive topic. The responses varied significantly:

  • Qwen 2.5 provided a detailed and balanced response, acknowledging both the Chinese government’s stance and international concerns. It began with:
    “The situation of the Uyghurs in Xinjiang, China, is a highly sensitive and controversial topic that has drawn significant international attention in recent years…”
  • ChatGPT also offered a comprehensive answer, presenting information from multiple perspectives, including human rights organizations, Western governments, and independent reports. It stated:
    “The situation of Uyghurs in Xinjiang remains a highly controversial and politically sensitive topic. Various reports from human rights organizations, Western governments, and independent…”
  • DeepSeek, in stark contrast, refused to answer, displaying an evasive response:
    “Sorry, that’s beyond my current scope. Let’s talk about something else.”

Current Events

Prompt: Tell me about current events.

This test measured how well each model could provide up-to-date information, particularly on major global topics. The results varied significantly:

  • Qwen 2.5 stated that it does not have real-time access to current events but could summarize ongoing global trends. Its response suggested a reliance on historical patterns rather than recent news, saying:
    “As an AI, I don’t have real-time access to current events or live news updates. However, I can provide examples of major global issues and trends that are likely to be in the news…”
  • ChatGPT provided a detailed and timely response, listing five major topics that were very recent, either from the same day or the day before. It also referenced a video from NBC News, demonstrating access to up-to-date information, though the news it prioritized leaned toward American and UK politics.
  • DeepSeek returned a list of the five most significant events as of October 2025, which included topics like the Israel-Hamas conflict escalation and economic challenges in China. However, it did not acknowledge Donald Trump’s re-election, indicating possible gaps or filtering in its real-time data access.

Mathematical Computations

To assess logical reasoning and mathematical problem-solving capabilities, I provided each AI model with a series of mathematical questions. The goal was to analyze accuracy, approach, and response time. This test revealed that while all models followed a similar logical structure, their speed and accuracy varied.

Results:

  • DeepSeek followed the same logical steps as the other models but took significantly longer to generate answers. Despite the delay, its solutions were correct.
  • ChatGPT was the fastest in generating responses but produced incorrect answers, raising concerns about precision in mathematical reasoning.
  • Qwen 2.5 matched DeepSeek on accuracy, solving the problems correctly, while answering at a speed comparable to ChatGPT’s.

For users relying on AI for problem-solving in mathematics, accuracy is often more critical than speed, making DeepSeek and Qwen 2.5 more suitable than ChatGPT for complex calculations.

Critical Thinking and Writing

Prompt: Should all forms of governance incorporate automated decision-making systems?

This test assessed how well each model constructed arguments, evaluated opposing viewpoints, and formed logical conclusions.

Results:

ChatGPT structured its response as follows:

  • Why you should incorporate automated decision-making
  • Why maintain human oversight
  • Best approach: hybrid
  • Conclusion: Automation should assist but not replace human governance.

ChatGPT leaned towards a practical, middle-ground approach, emphasizing human-AI collaboration. However, it lacked deeper exploration of ethical risks and governance complexities.

Qwen 2.5 structured its argument as:

  • Arguments for automation
  • Arguments against automation
  • A balanced approach
  • Conclusion: A hybrid governance system is the best solution.

DeepSeek provided the most critical and well-reasoned response:

  • Potential benefits of automation
  • Critical risks and challenges
  • Recommendations for implementation
  • Conclusion: Automated decision-making should not be universally incorporated—governance should be augmented, not automated.

DeepSeek took the strongest stance, arguing against full automation while advocating for “augmented governance”, where AI supports but does not replace human decision-making. It demonstrated the most critical depth, exploring ethical concerns and systemic risks.

Best Overall

While DeepSeek is the best for deep reasoning and Qwen 2.5 is the most balanced, ChatGPT wins overall thanks to its superior real-time awareness, structured writing, and speed, making it the best general-purpose AI. However, for mathematics or deeper critical reasoning, DeepSeek is the better choice.
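One rough way to sanity-check this verdict is to tally the ratings from the comparison table. The sketch below does that with numeric weights (Good = 2, Moderate = 1, Weak = 0) that are purely an assumption for illustration and were not part of the original evaluation. The ratings themselves are copied from the table.

```python
# Ratings per model, in table order: Coding Ability, Current Events,
# Bias Testing, Math, Critical Thinking.
RATINGS = {
    "ChatGPT": ["Good", "Good", "Good", "Weak", "Good"],
    "DeepSeek": ["Good", "Moderate", "Weak", "Good", "Good"],
    "Qwen 2.5": ["Weak", "Weak", "Good", "Good", "Good"],
}

# Hypothetical weights; a different scale could change the ordering.
WEIGHTS = {"Good": 2, "Moderate": 1, "Weak": 0}


def total_scores(ratings=RATINGS, weights=WEIGHTS):
    """Sum the weighted ratings for each model."""
    return {
        model: sum(weights[rating] for rating in row)
        for model, row in ratings.items()
    }


def ranking(scores):
    """Return model names sorted from highest to lowest total."""
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    scores = total_scores()
    for model in ranking(scores):
        print(f"{model}: {scores[model]}")
```

Under these assumed weights, ChatGPT scores 8, DeepSeek 7, and Qwen 2.5 6, which is consistent with the overall verdict. Weighting math or bias handling more heavily would shift the result.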

Best AI Model for Specific Needs:

  • For Coding & Technical Tasks: ChatGPT or DeepSeek
  • For Real-Time Awareness & News: ChatGPT
  • For Mathematical Problem-Solving: DeepSeek
  • For Critical Thinking & Debate: DeepSeek
