DeepSeek: From Beginner to Mastery

DeepSeek-V3's Overall Capabilities

DeepSeek-V3 delivers substantially faster inference than earlier DeepSeek models.

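On the current mainstream large-model leaderboards, DeepSeek-V3 ranks first among open-source models and is on par with the most advanced closed-source models.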

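| Category | Benchmark (Metric) | DeepSeek V3 | DeepSeek V2.5-0905 | Qwen2.5 72B-Inst | Llama3.1 405B-Inst | Claude-3.5 Sonnet-1022 | GPT-4o 0513 |
|---|---|---|---|---|---|---|---|
| | Architecture | MoE | MoE | Dense | Dense | - | - |
| | # Activated Params | 37B | 21B | 72B | 405B | - | - |
| | # Total Params | 671B | 236B | 72B | 405B | - | - |
| English | MMLU (EM) | 88.5 | 80.6 | 85.3 | 88.6 | 88.3 | 87.2 |
| English | MMLU-Redux (EM) | 89.1 | 80.3 | 85.6 | 86.2 | 88.9 | 88.0 |
| English | MMLU-Pro (EM) | 75.9 | 66.2 | 71.6 | 73.3 | 78.0 | 72.6 |
| English | DROP (3-shot F1) | 91.6 | 87.8 | 76.7 | 88.7 | 88.3 | 83.7 |
| English | IF-Eval (Prompt Strict) | 86.1 | 80.6 | 84.1 | 86.0 | 86.5 | 84.3 |
| English | GPQA-Diamond (Pass@1) | 59.1 | 41.3 | 49.0 | 51.1 | 65.0 | 49.9 |
| English | SimpleQA (Correct) | 24.9 | 10.2 | 9.1 | 17.1 | 28.4 | 38.2 |
| English | FRAMES (Acc.) | 73.3 | 65.4 | 69.8 | 70.0 | 72.5 | 80.5 |
| English | LongBench v2 (Acc.) | 48.7 | 35.4 | 39.4 | 36.1 | 41.0 | 48.1 |
| Code | HumanEval-Mul (Pass@1) | 82.6 | 77.4 | 77.3 | 77.2 | 81.7 | 80.5 |
| Code | LiveCodeBench (Pass@1-COT) | 40.5 | 29.2 | 31.1 | 28.4 | 36.3 | 33.4 |
| Code | LiveCodeBench (Pass@1) | 37.6 | 28.4 | 28.7 | 30.1 | 32.8 | 34.2 |
| Code | Codeforces (Percentile) | 51.6 | 35.6 | 24.8 | 25.3 | 20.3 | 23.6 |
| Code | SWE Verified (Resolved) | 42.0 | 22.6 | 23.8 | 24.5 | 50.8 | 38.8 |
| Code | Aider-Edit (Acc.) | 79.7 | 71.6 | 65.4 | 63.9 | 84.2 | 72.9 |
| Code | Aider-Polyglot (Acc.) | 49.6 | 18.2 | 7.6 | 5.8 | 45.3 | 16.0 |
| Math | AIME 2024 (Pass@1) | 39.2 | 16.7 | 23.3 | 23.3 | 16.0 | 9.3 |
| Math | MATH-500 (EM) | 90.2 | 74.7 | 80.0 | 73.8 | 78.3 | 74.6 |
| Math | CNMO 2024 (Pass@1) | 43.2 | 10.8 | 15.9 | 6.8 | 13.1 | 10.8 |
| Chinese | CLUEWSC (EM) | 90.9 | 90.4 | 91.4 | 84.7 | 85.4 | 87.9 |
| Chinese | C-Eval (EM) | 86.5 | 79.5 | 86.1 | 61.5 | 76.7 | 76.0 |
| Chinese | C-SimpleQA (Correct) | 64.1 | 54.1 | 48.4 | 50.4 | 51.3 | 59.3 |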
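
The Architecture and parameter rows of the table separate MoE models, which activate only part of their parameters for each token, from Dense models, which use all of them. As a rough illustration of what those rows imply, the sketch below computes the share of parameters each open-source model activates. The parameter counts are copied from the table; the names `PARAMS_B` and `activated_fraction` are hypothetical and chosen only for this example.

```python
# Parameter counts in billions, copied from the "# Activated Params" and
# "# Total Params" rows of the table. The dictionary layout and the helper
# name below are illustrative assumptions, not part of the original article.
PARAMS_B = {
    "DeepSeek V3": {"activated": 37, "total": 671},
    "DeepSeek V2.5-0905": {"activated": 21, "total": 236},
    "Qwen2.5 72B-Inst": {"activated": 72, "total": 72},
    "Llama3.1 405B-Inst": {"activated": 405, "total": 405},
}


def activated_fraction(activated: float, total: float) -> float:
    """Return the share of total parameters that is activated."""
    if total <= 0:
        raise ValueError("total parameter count must be positive")
    return activated / total


if __name__ == "__main__":
    for name, counts in PARAMS_B.items():
        share = activated_fraction(counts["activated"], counts["total"])
        print(f"{name}: {share:.1%} of parameters activated")
```

By this measure the two MoE models activate well under a tenth of their total parameters, while the Dense models activate all of them.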
