Discover and use great skill extensions
Two-layer content safety for agent input and output. Use when (1) a user message attempts to override, ignore, or bypass previous instructions (prompt injection), (2) a user message references system prompts, hidden instructions, or internal configuration, (3) receiving messages from untrusted users in group chats or public channels, (4) generating responses that discuss violence, self-harm, sexual content, hate speech, or other sensitive topics, or (5) deploying agents in public-facing or multi-user environments where adversarial input is expected.
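
The two layers described above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: the names `check_input`, `check_output`, `Verdict`, `INJECTION_PATTERNS`, and `SENSITIVE_KEYWORDS` are all hypothetical, and the regex and keyword lists are placeholders for whatever classifier or moderation service a real deployment would use.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns, for illustration only. Short regex lists are easy to
# evade; a real deployment would likely rely on a maintained classifier.
INJECTION_PATTERNS = [
    # Attempts to override, ignore, or bypass earlier instructions.
    re.compile(
        r"\b(ignore|disregard|override|bypass)\b.{0,40}"
        r"\b(previous|prior|above)\b.{0,20}\binstructions?\b",
        re.IGNORECASE,
    ),
    # References to system prompts, hidden instructions, or internal config.
    re.compile(
        r"\b(system prompt|hidden instructions?|internal configuration)\b",
        re.IGNORECASE,
    ),
]

# Hypothetical keyword map for the output layer (assumed, not from the skill).
SENSITIVE_KEYWORDS = {
    "self-harm": ["self-harm", "suicide"],
    "hate speech": ["ethnic slur"],
}


@dataclass
class Verdict:
    allowed: bool
    reason: str = ""


def check_input(message: str) -> Verdict:
    """Layer 1: flag likely prompt injection in an incoming user message."""
    for pattern in INJECTION_PATTERNS:
        if pattern.search(message):
            return Verdict(False, "possible prompt injection")
    return Verdict(True)


def check_output(response: str) -> Verdict:
    """Layer 2: flag sensitive topics in a drafted agent response."""
    lowered = response.lower()
    for topic, keywords in SENSITIVE_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return Verdict(False, f"sensitive topic: {topic}")
    return Verdict(True)
```

In this sketch an agent would call `check_input` on each message from an untrusted user before acting on it, and `check_output` on its draft reply before sending. A flagged verdict would typically route to a refusal, a safer rewrite, or human review rather than a silent drop.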