Qwen/SAE-Res-Qwen3.5-27B-W80K-L0_50
The Qwen Team releases Qwen-Scope for Qwen3/Qwen3.5 interpretability: TopK Sparse Autoencoders trained on the residual stream of layers 0–63 of Qwen3.5-27B (d_sae 81920, d_model 5120, Top-K 50), together with PyTorch checkpoints, a feature activation extraction example, a Gradio demo, and a technical report.
Qwen-Scope: Decoding Intelligence, Unleashing Potential

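We are pleased to introduce Qwen-Scope, an interpretability module trained on the Qwen3 and Qwen3.5 model series. Specifically, we integrate and train Sparse Autoencoders (SAEs) on Qwen's hidden layers. By imposing sparsity constraints, we can automatically extract features that are highly disentangled, low in redundancy, and significantly more interpretable. Qwen-Scope can be used to analyze the internal mechanisms behind Qwen's behavior, and it also has considerable potential for model optimization. Applications include controllable inference, analysis and comparison of evaluation sample distributions, data classification and synthesis, and model training and optimization. For more details, please see our technical report.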
Model Details
| Property | Value |
|---|---|
| Base model | Qwen3.5-27B |
| SAE width (d_sae) | 81920 |
| Hidden size (d_model) | 5120 |
| Expansion factor | 16× |
| Top-K | 50 |
| Hook point | Residual stream |
| Layers covered | 0–63 (64 layers in total) |
| File format | PyTorch .pt dict |
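The expansion factor and the rough size of each checkpoint follow from the values in the table. The short sketch below only redoes that arithmetic. The float32 storage figure (4 bytes per parameter) is an assumption made for illustration, since this card does not state the checkpoint dtype.

```python
# Values taken from the Model Details table.
D_MODEL = 5120
D_SAE = 81920
TOP_K = 50
NUM_LAYERS = 64

expansion_factor = D_SAE // D_MODEL   # 16
active_fraction = TOP_K / D_SAE       # share of features kept by the top-k step

# Parameters in one SAE: W_enc + W_dec + b_enc + b_dec.
params_per_sae = 2 * D_SAE * D_MODEL + D_SAE + D_MODEL

# ASSUMPTION: float32 storage (4 bytes per parameter); the card does not state the dtype.
bytes_per_sae = params_per_sae * 4

print(f"expansion factor : {expansion_factor}x")
print(f"active fraction  : {active_fraction:.4%}")
print(f"params per SAE   : {params_per_sae:,}")
print(f"size per SAE     : {bytes_per_sae / 2**30:.2f} GiB (if float32)")
print(f"all {NUM_LAYERS} layers    : {NUM_LAYERS * bytes_per_sae / 2**30:.1f} GiB (if float32)")
```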
Architecture
This is a TopK SAE: on every forward pass, exactly 50 features are kept non-zero.
Each checkpoint file layer{n}.sae.pt is a Python dict containing four tensors:
| Key | Shape | Description |
|---|---|---|
| W_enc | (81920, 5120) | Encoder weight matrix |
| W_dec | (5120, 81920) | Decoder weight matrix |
| b_enc | (81920,) | Encoder bias |
| b_dec | (5120,) | Decoder bias |
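To make the tensor layout concrete, here is a minimal sketch of an encode/decode round trip that uses the same four keys and shape conventions. The encoder mirrors the extraction example in this card. The decoder form (`acts @ W_dec.T + b_dec`) is an assumption based on the listed shapes and the usual SAE formulation; the card itself does not spell it out. Toy dimensions and random weights stand in for the real checkpoint.

```python
import torch

def sae_encode(x, W_enc, b_enc, k):
    """x: (..., d_model) -> sparse activations (..., d_sae), k entries kept per vector."""
    pre_acts = x @ W_enc.T + b_enc
    vals, idx = pre_acts.topk(k, dim=-1)
    acts = torch.zeros_like(pre_acts)
    acts.scatter_(-1, idx, vals)
    return acts

def sae_decode(acts, W_dec, b_dec):
    """acts: (..., d_sae) -> reconstruction (..., d_model).

    ASSUMPTION: plain linear decoder; not confirmed by the model card.
    """
    return acts @ W_dec.T + b_dec

# Toy sizes so the sketch runs instantly; the real checkpoint uses 5120 / 81920 / 50.
d_model, d_sae, k = 8, 32, 4
torch.manual_seed(0)
sae = {
    "W_enc": torch.randn(d_sae, d_model),   # (d_sae, d_model)
    "W_dec": torch.randn(d_model, d_sae),   # (d_model, d_sae)
    "b_enc": torch.zeros(d_sae),            # (d_sae,)
    "b_dec": torch.zeros(d_model),          # (d_model,)
}

x = torch.randn(2, 3, d_model)              # (batch, seq_len, d_model)
acts = sae_encode(x, sae["W_enc"], sae["b_enc"], k)
recon = sae_decode(acts, sae["W_dec"], sae["b_dec"])

print(acts.shape, recon.shape)
print((acts != 0).sum(dim=-1))              # non-zero features per position
```

With random weights the reconstruction is meaningless; the point is only that the shapes line up and that at most `k` features per position survive the top-k step.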
Files
This repository contains one SAE checkpoint per transformer layer (layers 0–63):
layer0.sae.pt
layer1.sae.pt
...
layer63.sae.pt
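A small helper can build the file name for a layer and sanity-check a loaded dict against the shapes listed under Architecture. This is a sketch rather than part of the release: `checkpoint_filename`, `check_sae`, and `load_sae` are hypothetical names, and fetching through `huggingface_hub.hf_hub_download` assumes the files sit at the repository root under the names shown above.

```python
import torch

REPO_ID = "Qwen/SAE-Res-Qwen3.5-27B-W80K-L0_50"
NUM_LAYERS, D_SAE, D_MODEL = 64, 81920, 5120

EXPECTED_SHAPES = {
    "W_enc": (D_SAE, D_MODEL),
    "W_dec": (D_MODEL, D_SAE),
    "b_enc": (D_SAE,),
    "b_dec": (D_MODEL,),
}

def checkpoint_filename(layer: int) -> str:
    """Return the checkpoint file name for a layer in 0..63."""
    if not 0 <= layer < NUM_LAYERS:
        raise ValueError(f"layer must be in 0..{NUM_LAYERS - 1}, got {layer}")
    return f"layer{layer}.sae.pt"

def check_sae(sae: dict, expected: dict = EXPECTED_SHAPES) -> None:
    """Raise if a key is missing or a tensor has an unexpected shape."""
    for key, shape in expected.items():
        if key not in sae:
            raise KeyError(f"missing key: {key}")
        if tuple(sae[key].shape) != shape:
            raise ValueError(f"{key}: expected {shape}, got {tuple(sae[key].shape)}")

def load_sae(layer: int) -> dict:
    """Download (or reuse the cached copy of) one checkpoint and load it on CPU.

    ASSUMPTION: the files are stored at the repo root as layer{n}.sae.pt.
    """
    from huggingface_hub import hf_hub_download  # imported lazily; needs network access
    path = hf_hub_download(repo_id=REPO_ID, filename=checkpoint_filename(layer))
    sae = torch.load(path, map_location="cpu")
    check_sae(sae)
    return sae
```

Calling `load_sae(0)` would then return the dict for layer 0, provided the assumption about the repository layout holds.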
Feature Activation Extraction
End-to-end demo: run the base LLM, hook the residual stream at a chosen layer, and extract sparse SAE feature activations. In most cases, it is also reasonable to use SAEs trained on base models to explore the internals of post-training checkpoints.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
# ── 1. Load base model ────────────────────────────────────────────────────────
model_name = "Qwen/Qwen3.5-27B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float32)
model.eval()
# ── 2. Load SAE for a target layer ───────────────────────────────────────────
LAYER = 0 # choose any layer in 0–63
sae = torch.load(f"layer{LAYER}.sae.pt", map_location="cpu")
W_enc = sae["W_enc"] # (81920, 5120)
b_enc = sae["b_enc"] # (81920,)
def get_feature_acts(residual: torch.Tensor) -> torch.Tensor:
    """residual: (..., 5120) → sparse feature activations (..., 81920)"""
    pre_acts = residual @ W_enc.T + b_enc
    topk_vals, topk_idx = pre_acts.topk(50, dim=-1)
    acts = torch.zeros_like(pre_acts)
    acts.scatter_(-1, topk_idx, topk_vals)
    return acts
# ── 3. Hook residual stream after the target transformer layer ────────────────
captured = {}
def _hook(module, input, output):
    hidden = output[0] if isinstance(output, tuple) else output
    captured["residual"] = hidden.detach().cpu()
hook = model.model.layers[LAYER].register_forward_hook(_hook)
# ── 4. Forward pass ───────────────────────────────────────────────────────────
text = "The capital of France is"
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    model(**inputs)
hook.remove()
# ── 5. Extract feature activations ───────────────────────────────────────────
residual = captured["residual"] # (1, seq_len, 5120)
feature_acts = get_feature_acts(residual) # (1, seq_len, 81920)
# Inspect active features for the last token
last_token_acts = feature_acts[0, -1] # (81920,)
active_idx = last_token_acts.nonzero(as_tuple=True)[0]
print(f"Active features : {active_idx.tolist()}")
print(f"Feature values : {last_token_acts[active_idx].tolist()}")
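The demo above inspects only the last token. As a possible follow-up, the sketch below lists the strongest features for every token position. `top_features_per_token` is a hypothetical helper, not part of the release, and the toy tensor stands in for `feature_acts[0]`.

```python
import torch

def top_features_per_token(feature_acts, tokens, n=3):
    """feature_acts: (seq_len, d_sae); tokens: list of seq_len strings.

    Returns one (token, [(feature_index, activation), ...]) entry per position,
    strongest feature first, skipping entries whose activation is zero.
    """
    vals, idx = feature_acts.topk(n, dim=-1)
    rows = []
    for tok, v, i in zip(tokens, vals.tolist(), idx.tolist()):
        pairs = [(fi, fv) for fi, fv in zip(i, v) if fv != 0]
        rows.append((tok, pairs))
    return rows

# Toy stand-in for feature_acts[0]: 2 positions, 10 features.
acts = torch.zeros(2, 10)
acts[0, 3] = 2.0
acts[0, 7] = 0.5
acts[1, 1] = 1.5

for tok, pairs in top_features_per_token(acts, ["The", " capital"], n=2):
    print(repr(tok), pairs)
```

With the real script, the token strings could come from `tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])` and the activations from `feature_acts[0]`.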
Gradio Demo
We also provide a Gradio demo, app.py. You can run it locally:
python app.py \
--model Qwen/Qwen3.5-27B \
--model-name-sae-trained-from qwen3.5-27b \
--model-name-analyzing-now qwen3.5-27b \
--sae-path Qwen/SAE-Res-Qwen3.5-27B-W80K-L0_50 \
--top-k 50 \
--num-layers 64 \
--sae-width 81920 \
--d-model 5120 \
--server-port 7860
Caution
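It is strictly prohibited to use interpretability tools for purposes other than scientific research in order to interfere with model capabilities, or to fabricate, generate, or disseminate harmful information that violates public order, good morals, or core socialist values, including pornographic, violent, discriminatory, or inciting content. Violators' authorization terminates automatically, and they shall bear all resulting legal liability. The right of final interpretation of this statement belongs to the project owner.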
Citation
If you use these SAEs in your research, please cite:
@misc{qwen_scope,
title = {{Qwen-Scope}: Turning Sparse Features into Development Tools for Large Language Models},
url = {https://qianwen-res.oss-accelerate.aliyuncs.com/qwen-scope/Qwen_Scope.pdf},
author = {{Qwen Team}},
month = {April},
year = {2026}
}