Supra-50M-Reasoning: A Small Open-Source Chain-of-Thought Reasoning Model

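SupraLabs has released Supra-50M-Reasoning (ThinkSupra-50M), a small 50M-parameter model that generates a complete chain of thought (CoT) before it responds. It is the reasoning variant of Supra-50M-Instruct and was fine-tuned from Supra-50M-Base on a synthetic dataset of 500 samples generated by Qwen3 1.7B, using supervised fine-tuning (SFT) in bfloat16 for 6 epochs. The model is experimental, prone to hallucination, and fully open.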
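As a rough back-of-the-envelope illustration of the figures above, the sketch below estimates the memory taken by the weights alone and the number of training examples the SFT run processed. It assumes exactly 50,000,000 parameters (the real count may differ from the nominal "50M") and ignores activations, the KV cache, and framework overhead; `weight_memory_mb` is a hypothetical helper, not part of any library.

```python
def weight_memory_mb(num_params: int, bytes_per_param: int) -> float:
    """Approximate memory for the weights alone, in megabytes (10**6 bytes)."""
    return num_params * bytes_per_param / 1e6


NUM_PARAMS = 50_000_000  # nominal "50M"; the exact parameter count may differ
BF16_BYTES = 2           # bfloat16 is a 16-bit format
FP32_BYTES = 4           # float32 is a 32-bit format

bf16_mb = weight_memory_mb(NUM_PARAMS, BF16_BYTES)
fp32_mb = weight_memory_mb(NUM_PARAMS, FP32_BYTES)

# 500 synthetic samples seen once per epoch for 6 epochs.
examples_seen = 500 * 6

print(f"bfloat16 weights: ~{bf16_mb:.0f} MB, float32 weights: ~{fp32_mb:.0f} MB")
print(f"training examples processed: {examples_seen}")
```

Halving the bytes per parameter is why bfloat16 roughly halves the weight footprint compared with float32.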

Reasoning Format

Every response follows this structure:

<|begin_of_thought|> ... reasoning ... <|end_of_thought|> <|begin_of_solution|> ... final answer ... <|end_of_solution|>
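Because the reasoning and the final answer are delimited by fixed markers, a caller can separate them with a regular expression. The sketch below assumes the markers survive decoding as literal text; if the tokenizer registers them as special tokens, decoding with special tokens skipped may strip them. `split_response` is a hypothetical helper name.

```python
import re
from typing import Optional, Tuple

# Marker strings copied from the documented response format; DOTALL lets "."
# match newlines inside multi-line reasoning.
_THOUGHT = re.compile(r"<\|begin_of_thought\|>(.*?)<\|end_of_thought\|>", re.DOTALL)
_SOLUTION = re.compile(r"<\|begin_of_solution\|>(.*?)<\|end_of_solution\|>", re.DOTALL)


def split_response(text: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (thought, solution); either is None if its marker pair is incomplete."""
    thought = _THOUGHT.search(text)
    solution = _SOLUTION.search(text)
    return (
        thought.group(1).strip() if thought else None,
        solution.group(1).strip() if solution else None,
    )


sample = (
    "<|begin_of_thought|> 2 + 3 means adding three to two. <|end_of_thought|> "
    "<|begin_of_solution|> 5 <|end_of_solution|>"
)
thought, solution = split_response(sample)
print(thought)   # 2 + 3 means adding three to two.
print(solution)  # 5
```

Returning None rather than raising makes it easy to detect a response that was cut off by the token budget before `<|end_of_solution|>` was emitted.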

Quick Start

import torch
from transformers import pipeline, AutoTokenizer

MODEL_ID = "SupraLabs/Supra-50M-Reasoning"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, clean_up_tokenization_spaces=False)
# device_map="auto" needs the `accelerate` package; use bfloat16 on GPU, float32 on CPU.
pipe = pipeline("text-generation", model=MODEL_ID, tokenizer=tokenizer, device_map="auto",
                torch_dtype=torch.bfloat16 if torch.cuda.is_available() else torch.float32)

# Alpaca-style prompt; the "### Input:" section is added only when input_text is non-empty.
def build_prompt(instruction, input_text=""):
    if input_text.strip():
        return f"Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Input:\n{input_text}\n\n### Response:\n"
    return f"Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response:\n"

def generate(instruction, input_text=""):
    # Fall back to the EOS token when the tokenizer defines no pad token.
    pad_id = tokenizer.pad_token_id if tokenizer.pad_token_id is not None else tokenizer.eos_token_id
    result = pipe(build_prompt(instruction, input_text), max_new_tokens=512, do_sample=True,
                  temperature=0.3, top_k=50, top_p=0.9, repetition_penalty=1.15,
                  pad_token_id=pad_id, eos_token_id=tokenizer.eos_token_id,
                  return_full_text=False)  # return only the generated continuation, not the prompt
    return result[0]["generated_text"].strip()

print(generate("What is 12 + 7?"))
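Since generation uses sampling (`do_sample=True`, `temperature=0.3`) and the model is experimental, a response can occasionally break the reasoning format or be truncated at `max_new_tokens`. One possible mitigation, sketched below under that assumption, is to resample until the output is well-formed. `answer_with_retries` is a hypothetical helper that accepts any function mapping an instruction to generated text, such as the `generate` function above; here it is exercised with a stub so it runs without downloading the model.

```python
import re
from typing import Callable, Optional

# Well-formed = a closed thought block followed by a closed solution block.
_FORMAT = re.compile(
    r"<\|begin_of_thought\|>.*?<\|end_of_thought\|>\s*"
    r"<\|begin_of_solution\|>(.*?)<\|end_of_solution\|>",
    re.DOTALL,
)


def answer_with_retries(
    generate_fn: Callable[[str], str], instruction: str, max_attempts: int = 3
) -> Optional[str]:
    """Call generate_fn until its output matches the reasoning format.

    Returns the stripped solution text, or None if no attempt was well-formed.
    """
    for _ in range(max_attempts):
        match = _FORMAT.search(generate_fn(instruction))
        if match:
            return match.group(1).strip()
    return None


# Stub standing in for the real model: the first output is truncated.
outputs = iter([
    "<|begin_of_thought|> 4 * 2 is",
    "<|begin_of_thought|> 4 * 2 <|end_of_thought|> <|begin_of_solution|> 8 <|end_of_solution|>",
])
result = answer_with_retries(lambda _: next(outputs), "What is 4 * 2?")
print(result)  # 8
```

With the real model, `answer_with_retries(generate, "...")` would work the same way; capping `max_attempts` bounds the extra generation cost.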