Published 8 months ago
Y0K0
Updated 8 months ago
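On the current mainstream large-model leaderboards, DeepSeek-V3 ranks first among open-source models and is on par with the world's most advanced closed-source models.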
Benchmark (Metric)	DeepSeek V3	DeepSeek V2.5 0905	Qwen2.5 72B-Inst	Llama3.1 405B-Inst	Claude-3.5 Sonnet-1022	GPT-4o 0513
Architecture	MoE	MoE	Dense	Dense	-	-
English
MMLU (EM)	88.5	80.6	85.3	88.6	88.3	87.2
MMLU-Redux (EM)	89.1	80.3	85.6	86.2	88.9	88.0
MMLU-Pro (EM)	75.9	66.2	71.6	73.3	78.0	72.6
DROP (3-shot F1)	91.6	87.8	76.7	88.7	88.3	83.7
IF-Eval (Prompt Strict)	86.1	80.6	84.1	86.0	86.5	84.3
GPQA-Diamond (Pass@1)	59.1	41.3	49.0	51.1	65.0	49.9
SimpleQA (Correct)	24.9	10.2	9.1	17.1	28.4	38.2
FRAMES (Acc.)	73.3	65.4	69.8	70.0	72.5
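
The ranking claim can be checked directly against the rows above. The following is a minimal sketch in Python, not part of the original post: the names `MODELS`, `OPEN_MODELS`, `SCORES`, and `leader` are illustrative, the first four columns are assumed to be the open-source models, and the FRAMES row is left out because its last value is missing from the table.

```python
# Scores copied from the table above, in column order.
# FRAMES is omitted because its row is incomplete in the source.
MODELS = ["DeepSeek V3", "DeepSeek V2.5", "Qwen2.5", "Llama3.1", "Claude-3.5", "GPT-4o"]

# Assumption for this sketch: the first four columns are the open-source models.
OPEN_MODELS = MODELS[:4]

SCORES = {
    "MMLU": [88.5, 80.6, 85.3, 88.6, 88.3, 87.2],
    "MMLU-Redux": [89.1, 80.3, 85.6, 86.2, 88.9, 88.0],
    "MMLU-Pro": [75.9, 66.2, 71.6, 73.3, 78.0, 72.6],
    "DROP": [91.6, 87.8, 76.7, 88.7, 88.3, 83.7],
    "IF-Eval": [86.1, 80.6, 84.1, 86.0, 86.5, 84.3],
    "GPQA-Diamond": [59.1, 41.3, 49.0, 51.1, 65.0, 49.9],
    "SimpleQA": [24.9, 10.2, 9.1, 17.1, 28.4, 38.2],
}


def leader(benchmark, candidates=MODELS):
    """Return the highest-scoring model among `candidates` on one benchmark."""
    row = dict(zip(MODELS, SCORES[benchmark]))
    return max(candidates, key=lambda model: row[model])


# Leader per benchmark, first among open-source models only, then overall.
open_leaders = {name: leader(name, OPEN_MODELS) for name in SCORES}
overall_leaders = {name: leader(name) for name in SCORES}

for name in SCORES:
    print(f"{name}: open-source leader = {open_leaders[name]}, "
          f"overall leader = {overall_leaders[name]}")
```

On these seven rows, DeepSeek V3 leads the open-source models on six; the exception is MMLU, where Llama3.1 is ahead by 0.1 point.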