DeepSeek-V2.5-1210

Introduction
How to run locally
License
Citation
Contact

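Introduction

DeepSeek-V2.5-1210 is an upgraded version of DeepSeek-V2.5, with improvements across several capabilities:

Mathematics: performance on the MATH-500 benchmark improved from 74.8% to 82.8%.
Coding: accuracy on the LiveCodebench (08.01 - 12.01) benchmark increased from 29.2% to 34.38%.
Writing and reasoning: corresponding improvements were observed on internal test datasets.

In addition, the new model version optimizes the user experience for the file upload and webpage summarization features.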

How to run locally

To run DeepSeek-V2.5 inference in BF16 format, 8 GPUs with 80GB of memory each are required.
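As a rough sanity check on that hardware figure, the weight memory can be estimated from the parameter count. The sketch below assumes the 236B total parameter count reported for the DeepSeek-V2 architecture (a figure not stated in this document) and 2 bytes per parameter in BF16. It ignores the KV cache and activations, which have to fit in the remaining headroom.

```python
# Back-of-the-envelope estimate of BF16 weight memory.
# ASSUMPTION: 236e9 total parameters (reported for DeepSeek-V2; not stated in this README).
TOTAL_PARAMS = 236e9
BYTES_PER_PARAM_BF16 = 2   # bfloat16 stores each parameter in 16 bits
NUM_GPUS = 8
GPU_MEMORY_GB = 80

weights_gb = TOTAL_PARAMS * BYTES_PER_PARAM_BF16 / 1e9  # decimal gigabytes
cluster_gb = NUM_GPUS * GPU_MEMORY_GB
headroom_gb = cluster_gb - weights_gb

print(f"weights: {weights_gb:.0f} GB, cluster: {cluster_gb} GB, headroom: {headroom_gb:.0f} GB")
```

Under these assumptions the weights alone exceed what four or five 80GB GPUs could hold once runtime buffers are included, which is consistent with the 8-GPU requirement.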

Inference with Huggingface's Transformers

You can use Huggingface's Transformers directly for model inference.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig

model_name = "deepseek-ai/DeepSeek-V2.5-1210"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
# `max_memory` should be set based on your devices
max_memory = {i: "75GB" for i in range(8)}
# `device_map` cannot be set to `auto`
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True, device_map="sequential", torch_dtype=torch.bfloat16, max_memory=max_memory, attn_implementation="eager")
model.generation_config = GenerationConfig.from_pretrained(model_name)
model.generation_config.pad_token_id = model.generation_config.eos_token_id

messages = [
    {"role": "user", "content": "Write a piece of quicksort code in C++"}
]
input_tensor = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(input_tensor.to(model.device), max_new_tokens=100)

result = tokenizer.decode(outputs[0][input_tensor.shape[1]:], skip_special_tokens=True)
print(result)

The complete chat template can be found in tokenizer_config.json in the Huggingface model repository.

Note: The chat template has been updated compared to the previous DeepSeek-V2-Chat version.

An example of the chat template is shown below:

<｜begin▁of▁sentence｜><｜User｜>{user_message_1}<｜Assistant｜>{assistant_message_1}<｜end▁of▁sentence｜><｜User｜>{user_message_2}<｜Assistant｜>

You can also add an optional system message:

<｜begin▁of▁sentence｜>{system_message}<｜User｜>{user_message_1}<｜Assistant｜>{assistant_message_1}<｜end▁of▁sentence｜><｜User｜>{user_message_2}<｜Assistant｜>
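To make the layout concrete, here is a minimal sketch that renders a message list into the prompt format shown above. `render_chat` is a hypothetical helper written for illustration, and the token strings are assumed to match those in tokenizer_config.json. In practice `tokenizer.apply_chat_template` should be preferred, since the real template may handle cases (such as tool messages) that this sketch does not.

```python
# Special-token strings, assumed to match the model's tokenizer_config.json.
BOS = "<｜begin▁of▁sentence｜>"
EOS = "<｜end▁of▁sentence｜>"
USER = "<｜User｜>"
ASSISTANT = "<｜Assistant｜>"


def render_chat(messages, add_generation_prompt=True):
    """Render chat messages as one prompt string (hypothetical helper)."""
    parts = [BOS]
    for message in messages:
        role, content = message["role"], message["content"]
        if role == "system":
            # The system message is placed right after BOS, with no role tag.
            parts.append(content)
        elif role == "user":
            parts.append(USER + content)
        elif role == "assistant":
            # Each completed assistant turn is closed with EOS.
            parts.append(ASSISTANT + content + EOS)
        else:
            raise ValueError(f"unsupported role in this sketch: {role}")
    if add_generation_prompt:
        # A trailing assistant tag asks the model to produce the next reply.
        parts.append(ASSISTANT)
    return "".join(parts)


example = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello"},
    {"role": "user", "content": "Bye"},
]
print(render_chat(example))
```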

Inference with vLLM (recommended)

To use vLLM for model inference, merge this Pull Request into your vLLM codebase: https://github.com/vllm-project/vllm/pull/4650

from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

max_model_len, tp_size = 8192, 8
model_name = "deepseek-ai/DeepSeek-V2.5-1210"
tokenizer = AutoTokenizer.from_pretrained(model_name)
llm = LLM(model=model_name, tensor_parallel_size=tp_size, max_model_len=max_model_len, trust_remote_code=True, enforce_eager=True)
sampling_params = SamplingParams(temperature=0.3, max_tokens=256, stop_token_ids=[tokenizer.eos_token_id])

messages_list = [
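    [{"role": "user", "content": "Who are you?"}],
    [{"role": "user", "content": "Translate the following content into Chinese directly: DeepSeek-V2 adopts innovative architectures to guarantee economical training and efficient inference."}],
    [{"role": "user", "content": "Write a piece of quicksort code in C++."}],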
]

prompt_token_ids = [tokenizer.apply_chat_template(messages, add_generation_prompt=True) for messages in messages_list]

outputs = llm.generate(prompt_token_ids=prompt_token_ids, sampling_params=sampling_params)

generated_text = [output.outputs[0].text for output in outputs]
print(generated_text)

Function calling

Function calling allows the model to call external tools to extend its capabilities.

Here is an example:

# Assumes `model` and `tokenizer` are already loaded
model.generation_config = GenerationConfig(do_sample=False, max_new_tokens=128, eos_token_id=tokenizer.eos_token_id, pad_token_id=tokenizer.eos_token_id)

tool_system_prompt = """You are a helpful assistant.

## Tools

### Function

You have the following functions available:

- `get_current_weather`:
```json
{
    "name": "get_current_weather",
    "description": "Get the current weather in a given location",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "The city and state, e.g. San Francisco, CA"
            },
            "unit": {
                "type": "string",
                "enum": [
                    "celsius",
                    "fahrenheit"
                ]
            }
        },
        "required": [
            "location"
        ]
    }
}
```"""

tool_call_messages = [{"role": "system", "content": tool_system_prompt}, {"role": "user", "content": "What is the weather like in Tokyo and Paris?"}]
tool_call_inputs = tokenizer.apply_chat_template(tool_call_messages, add_generation_prompt=True, return_tensors="pt")
tool_call_outputs = model.generate(tool_call_inputs.to(model.device))
# Generated text: '<｜tool▁calls▁begin｜><｜tool▁call▁begin｜>function<｜tool▁sep｜>get_current_weather\n```json\n{"location": "Tokyo"}\n```<｜tool▁call▁end｜>\n<｜tool▁call▁begin｜>function<｜tool▁sep｜>get_current_weather\n```json\n{"location": "Paris"}\n```<｜tool▁call▁end｜><｜tool▁calls▁end｜><｜end▁of▁sentence｜>'

# Mock responses from calling `get_current_weather`
tool_messages = [{"role": "tool", "content": '{"location": "Tokyo", "temperature": "10", "unit": null}'}, {"role": "tool", "content": '{"location": "Paris", "temperature": "22", "unit": null}'}]
tool_inputs = tokenizer.apply_chat_template(tool_messages, add_generation_prompt=False, return_tensors="pt")[:, 1:]
tool_inputs = torch.cat([tool_call_outputs, tool_inputs.to(model.device)], dim=1)
tool_outputs = model.generate(tool_inputs)
# Generated text: The current weather in Tokyo is 10 degrees, and in Paris, it is 22 degrees.<｜end▁of▁sentence｜>
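Before the mock responses can be replaced by real tool results, the generated text has to be turned into structured calls. The following is a minimal sketch of one way to do that with a regular expression. `parse_tool_calls` is a hypothetical helper, and the token strings are assumed to match the generated text shown above. The text should be decoded without skipping special tokens, otherwise the delimiters may be dropped.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built programmatically to keep this listing readable

# Delimiter strings, assumed to match the model's tokenizer; verify before relying on them.
CALL_BEGIN = "<｜tool▁call▁begin｜>"
CALL_END = "<｜tool▁call▁end｜>"
SEP = "<｜tool▁sep｜>"

# One call looks like: BEGIN type SEP name NEWLINE fenced-json END
TOOL_CALL_RE = re.compile(
    re.escape(CALL_BEGIN)
    + r"(?P<type>\w+)"
    + re.escape(SEP)
    + r"(?P<name>[\w.-]+)\n"
    + re.escape(FENCE) + r"json\n(?P<args>.*?)\n" + re.escape(FENCE)
    + re.escape(CALL_END),
    re.DOTALL,
)


def parse_tool_calls(text):
    """Return one dict per tool call found in the generated text (hypothetical helper)."""
    return [
        {"type": m["type"], "name": m["name"], "arguments": json.loads(m["args"])}
        for m in TOOL_CALL_RE.finditer(text)
    ]


sample = (
    "<｜tool▁calls▁begin｜>"
    + CALL_BEGIN + "function" + SEP + "get_current_weather\n"
    + FENCE + 'json\n{"location": "Tokyo"}\n' + FENCE + CALL_END + "\n"
    + CALL_BEGIN + "function" + SEP + "get_current_weather\n"
    + FENCE + 'json\n{"location": "Paris"}\n' + FENCE + CALL_END
    + "<｜tool▁calls▁end｜><｜end▁of▁sentence｜>"
)
calls = parse_tool_calls(sample)
print(calls)
```

Each parsed entry can then be dispatched to the matching local function, and its result serialized into a `tool` message as in the mock responses above.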

JSON output

You can use JSON output mode to make the model generate a valid JSON object. To activate this mode, append a special instruction to your system prompt.

# Assumes `model` and `tokenizer` are already loaded
model.generation_config = GenerationConfig(do_sample=False, max_new_tokens=128, eos_token_id=tokenizer.eos_token_id, pad_token_id=tokenizer.eos_token_id)

user_system_prompt = 'The user will provide some exam text. Please parse the "question" and "answer" and output them in JSON format.'
json_system_prompt = f"""{user_system_prompt}

## Response Format

Reply with JSON object ONLY."""

json_messages = [{"role": "system", "content": json_system_prompt}, {"role": "user", "content": "Which is the highest mountain in the world? Mount Everest."}]
json_inputs = tokenizer.apply_chat_template(json_messages, add_generation_prompt=True, return_tensors="pt")
json_outputs = model.generate(json_inputs.to(model.device))
# Generated text: '```json\n{\n  "question": "Which is the highest mountain in the world?",\n  "answer": "Mount Everest."\n}\n```<｜end▁of▁sentence｜>'
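Because the generated text wraps the JSON object in a fenced block and ends with an end-of-sentence token, it cannot be passed to a JSON parser as-is. The sketch below shows one way to unwrap it. `extract_json` is a hypothetical helper, and the end-of-sentence string is assumed to match the tokenizer's.

```python
import json

FENCE = "`" * 3  # three backticks
EOS = "<｜end▁of▁sentence｜>"  # assumed to match the tokenizer's EOS string


def extract_json(generated_text):
    """Strip the EOS token and an optional code fence, then parse the JSON (hypothetical helper)."""
    text = generated_text.replace(EOS, "").strip()
    if text.startswith(FENCE):
        # Drop the opening fence line (for example the one tagged "json") ...
        _, _, text = text.partition("\n")
        # ... and everything from the closing fence onward.
        text = text.rsplit(FENCE, 1)[0]
    return json.loads(text)


sample = (
    FENCE + "json\n"
    + '{\n  "question": "Which is the highest mountain in the world?",\n'
    + '  "answer": "Mount Everest."\n}\n'
    + FENCE + EOS
)
parsed = extract_json(sample)
print(parsed["answer"])
```

`json.loads` raises `json.JSONDecodeError` on malformed output, so callers may still want to catch that error and retry generation.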

FIM completion

In FIM (Fill In the Middle) completion, you provide a prefix and an optional suffix, and the model completes the content in between.

# Assumes `model` and `tokenizer` are already loaded
model.generation_config = GenerationConfig(do_sample=False, max_new_tokens=128, eos_token_id=tokenizer.eos_token_id, pad_token_id=tokenizer.eos_token_id)

prefix = """def quick_sort(arr):\n    if len(arr) <= 1:\n        return arr\n    pivot = arr[0]\n    left = []\n    right = []\n"""

suffix = """
        if arr[i] < pivot:\n            left.append(arr[i])\n        else:\n            right.append(arr[i])\n    return quick_sort(left) + [pivot] + quick_sort(right)"""

fim_prompt = f"<｜fim▁begin｜>{prefix}<｜fim▁hole｜>{suffix}<｜fim▁end｜>"
fim_inputs = tokenizer(fim_prompt, add_special_tokens=True, return_tensors="pt").input_ids
fim_outputs = model.generate(fim_inputs.to(model.device))
# Generated text: "    for i in range(1, len(arr)):<｜end▁of▁sentence｜>"
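The model returns only the missing middle, so the caller has to splice it back between the prefix and suffix. Here is a minimal sketch of both steps. `build_fim_prompt` and `splice_completion` are hypothetical helpers, and the token strings are assumed to match the tokenizer's.

```python
# FIM and EOS token strings, assumed to match the model's tokenizer.
FIM_BEGIN = "<｜fim▁begin｜>"
FIM_HOLE = "<｜fim▁hole｜>"
FIM_END = "<｜fim▁end｜>"
EOS = "<｜end▁of▁sentence｜>"


def build_fim_prompt(prefix, suffix=""):
    """Wrap prefix and optional suffix in the FIM layout (hypothetical helper)."""
    return f"{FIM_BEGIN}{prefix}{FIM_HOLE}{suffix}{FIM_END}"


def splice_completion(prefix, completion, suffix=""):
    """Insert the generated middle between prefix and suffix, dropping EOS (hypothetical helper)."""
    return prefix + completion.replace(EOS, "") + suffix


demo_prefix = "def add_one(x):\n"
demo_suffix = "\n    return y\n"
demo_completion = "    y = x + 1" + EOS  # stands in for the decoded model output

source = splice_completion(demo_prefix, demo_completion, demo_suffix)
print(source)
```

The same splice applied to the quick_sort prefix, generated line, and suffix above yields the complete function.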

License

This code repository is licensed under the MIT License. Use of the DeepSeek-V2 Base/Chat models is subject to the Model License. The DeepSeek-V2 series (including Base and Chat) supports commercial use.

Citation

@misc{deepseekv2,
      title={DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model},
      author={DeepSeek-AI},
      year={2024},
      eprint={2405.04434},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Contact

If you have any questions, please raise an issue or contact us at service@deepseek.com.
