发现和使用优秀的技能扩展
自动将任务路由至能够正确运行的最便宜的z.ai(GLM)模型。三级递进:Flash→Standard→Plus/32B。响应前先进行分类。
FLASH(默认):事实问答、问候、提醒、状态检查、查询、简单文件操作、心跳检测、日常聊天、1-2句话的任务、定时任务。
升级至STANDARD:超过10行的代码、分析、比较、规划、报告、多步骤推理、表格、超过3段的长篇写作、总结、研究综合、大多数用户对话。
升级至PLUS/32B:架构决策、复杂调试、多文件重构、战略规划、细致判断、深入研究、关键生产决策。
规则:如果人类需要超过30秒的专注思考,则升级。如果Standard在复杂度方面表现吃力,则使用Plus/32B。通过从低成本模型开始,仅在需要时升级,以节省大量API成本。
Auto-route tasks to the cheapest z.ai (GLM) model that works correctly. Three-tier progression: Flash → Standard → Plus/32B. Classify before responding.
FLASH (default): factual Q&A, greetings, reminders, status checks, lookups, simple file ops, heartbeats, casual chat, 1–2 sentence tasks, cron jobs.
ESCALATE TO STANDARD: code >10 lines, analysis, comparisons, planning, reports, multi-step reasoning, tables, long writing >3 paragraphs, summarization, research synthesis, most user conversations.
ESCALATE TO PLUS/32B: architecture decisions, complex debugging, multi-file refactoring, strategic planning, nuanced judgment, deep research, critical production decisions.
Rule: If a human needs >30 seconds of focused thinking, escalate. If Standard struggles with complexity, go to Plus/32B. Save major API costs by starting cheap and escalating only when needed.