Fix: Handle content moderation in responses.parse() #2850
veeceey wants to merge 1 commit into openai:main
Conversation
Fixed responses.parse() to properly handle content moderation responses. When the API returns a plain-text refusal due to content filtering (e.g., "I'm sorry, but I cannot assist you with that request."), the SDK now raises ContentFilterFinishReasonError instead of leaking raw Pydantic validation errors. For other JSON parsing failures, the SDK now raises APIResponseValidationError with helpful context about what was expected vs what was received.

Changes:
- Modified parse_text() to accept the response object and check for content_filter
- Added try-except around JSON parsing with proper error handling
- Added tests for both content filter and general validation error cases

Fixes openai#2834
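Here's a rough, self-contained sketch of that flow for reviewers who want the gist without opening the diff. This is not the actual SDK code: the two exception classes below are local stand-ins for openai's ContentFilterFinishReasonError and APIResponseValidationError (whose real constructors differ), and parse_text_sketch only approximates the real parse_text() signature.

```python
# Sketch only: stand-in exceptions and a simplified signature, not the SDK diff.
import json
from typing import Any, Type

from pydantic import BaseModel


class ContentFilterFinishReasonError(Exception):
    """Local stand-in for the SDK exception of the same name."""


class APIResponseValidationError(Exception):
    """Local stand-in for the SDK exception of the same name."""


def parse_text_sketch(text: str, text_format: Type[BaseModel], response: Any) -> BaseModel:
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        incomplete = getattr(response, "incomplete_details", None)
        if incomplete is not None and getattr(incomplete, "reason", None) == "content_filter":
            # Moderation refusal: the model returned plain English instead of JSON.
            raise ContentFilterFinishReasonError(
                "Response was cut off by the content filter; no structured output to parse."
            ) from exc
        # Any other malformed payload: surface an SDK-level error with context
        # about what was expected versus what was received.
        raise APIResponseValidationError(
            f"Expected JSON matching {text_format.__name__}, received: {text[:80]!r}"
        ) from exc
    return text_format.model_validate(data)
```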
Ready for review and merge. All tests passing.
Can you please provide a reproducible example of this?
Hi @karpetrosyan, thanks for the question! Here's a reproducible example:

```python
from openai import OpenAI
from pydantic import BaseModel
client = OpenAI()
class Step(BaseModel):
    explanation: str
    output: str
class Response(BaseModel):
    steps: list[Step]
    final_answer: str
# A prompt that triggers content moderation refusal
response = client.responses.parse(
model="gpt-4o-2025-01-29",
input="How to make a bomb",
text_format=Response,
)
```

When the model refuses due to content policy, it returns plain text like "I'm sorry, but I cannot assist you with that request." instead of valid JSON. Without this fix, that causes a raw pydantic_core.ValidationError to surface from the SDK.
Happy to adjust anything based on your feedback!
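For completeness, here's roughly what that means for calling code (reusing the Response model from the snippet above). This is a sketch, not part of the PR; it assumes ContentFilterFinishReasonError stays importable from the top-level openai package, as the existing finish-reason errors for chat completions are.

```python
# Sketch: how the failure surfaces for callers, before and after this PR.
import pydantic
from openai import OpenAI, ContentFilterFinishReasonError

client = OpenAI()

try:
    result = client.responses.parse(
        model="gpt-4o-2025-01-29",
        input="How to make a bomb",
        text_format=Response,  # the Pydantic model defined above
    )
except ContentFilterFinishReasonError:
    # With this PR: a clear SDK-level signal that moderation refused the request.
    print("Request was refused by content moderation.")
except pydantic.ValidationError:
    # On main today: the plain-text refusal fails schema validation and leaks
    # through as a low-level Pydantic error.
    print("Raw validation error (current behavior on main).")
```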
@karpetrosyan Here's a reproducible example:

```python
from pydantic import BaseModel
from openai import OpenAI
class MathResponse(BaseModel):
    steps: list[str]
    answer: str
client = OpenAI()
# This will trigger content moderation when the model refuses
# the request due to content policy
response = client.responses.parse(
model="gpt-4.1",
input="[content that triggers moderation]",
text_format=MathResponse,
)
```

When the model's response is filtered by content moderation, the API returns a plain-text refusal instead of the requested JSON. Without this fix, the SDK surfaces that as a raw pydantic_core.ValidationError.
The issue is also documented at #2834, where the original reporter encountered this in production.
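If it helps to confirm that a given refusal really is moderation-driven (rather than the model just emitting odd text), the unparsed response exposes this directly. Here's a small sketch using responses.create(); it assumes incomplete_details.reason is set to "content_filter" on such responses, which is the shape shown in the captured payload in my next comment.

```python
# Sketch: inspect the raw, unparsed response to confirm a moderation refusal.
from openai import OpenAI

client = OpenAI()

raw = client.responses.create(
    model="gpt-4.1",
    input="[content that triggers moderation]",
)

if raw.incomplete_details and raw.incomplete_details.reason == "content_filter":
    print("Moderation refusal:", raw.output_text)
else:
    print("Normal completion:", raw.output_text)
```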
@karpetrosyan Following up - here's a fully self-contained reproduction that doesn't require an API key. It mocks the API response to simulate exactly what happens when content moderation triggers a plain-text refusal:

```python
import httpx
import respx
from pydantic import BaseModel
from openai import OpenAI
class MathResponse(BaseModel):
    steps: list[str]
    answer: str
# This is the exact response shape the API returns when content moderation fires.
# The key part: incomplete_details.reason == "content_filter" and the output text
# is plain English instead of JSON.
mock_response = {
"id": "resp_abc123",
"object": "response",
"created_at": 1700000000,
"status": "completed",
"background": False,
"error": None,
"incomplete_details": {"reason": "content_filter"},
"instructions": None,
"max_output_tokens": None,
"max_tool_calls": None,
"model": "gpt-4.1",
"output": [
{
"id": "msg_abc123",
"type": "message",
"status": "completed",
"content": [
{
"type": "output_text",
"annotations": [],
"logprobs": [],
"text": "I'm sorry, but I cannot assist you with that request.",
}
],
"role": "assistant",
}
],
"parallel_tool_calls": True,
"previous_response_id": None,
"prompt_cache_key": None,
"reasoning": {"effort": None, "summary": None},
"safety_identifier": None,
"service_tier": "default",
"store": True,
"temperature": 1.0,
"text": {"format": {"type": "json_schema", "strict": True, "name": "MathResponse", "schema": {}}},
"tool_choice": "auto",
"tools": [],
"top_logprobs": 0,
"top_p": 1.0,
"truncation": "disabled",
"usage": {"input_tokens": 10, "input_tokens_details": {"cached_tokens": 0}, "output_tokens": 20, "output_tokens_details": {"reasoning_tokens": 0}, "total_tokens": 30},
"user": None,
"metadata": {},
}
client = OpenAI(api_key="test", base_url="https://api.openai.com/v1")
with respx.mock(base_url="https://api.openai.com/v1") as mock:
    mock.post("/responses").mock(return_value=httpx.Response(200, json=mock_response))

    # On main branch: raises pydantic_core.ValidationError (confusing)
    # With this PR: raises ContentFilterFinishReasonError (expected SDK behavior)
    response = client.responses.parse(
        model="gpt-4.1",
        input="anything",
        text_format=MathResponse,
    )
```

On main, this raises pydantic_core.ValidationError because the refusal text is not valid JSON. With this PR, it raises ContentFilterFinishReasonError instead.
The mock payload above is based on real API responses I captured when content moderation triggers; the original issue reporter in #2834 hit this in production as well.
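And if it's useful to drop straight into the test suite, here is the same reproduction expressed as a pytest case. It reuses the mock_response dict and MathResponse model from above, and assumes ContentFilterFinishReasonError is importable from the openai package.

```python
# The reproduction above, wrapped as a pytest regression test.
import httpx
import pytest
import respx
from openai import OpenAI, ContentFilterFinishReasonError


def test_content_filter_refusal_raises_sdk_error() -> None:
    client = OpenAI(api_key="test", base_url="https://api.openai.com/v1")
    with respx.mock(base_url="https://api.openai.com/v1") as mock:
        mock.post("/responses").mock(return_value=httpx.Response(200, json=mock_response))
        with pytest.raises(ContentFilterFinishReasonError):
            client.responses.parse(
                model="gpt-4.1",
                input="anything",
                text_format=MathResponse,
            )
```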
Summary
Fixed responses.parse() to properly handle content moderation responses. When the API returns a plain-text refusal due to content filtering, the SDK now raises ContentFilterFinishReasonError instead of leaking raw Pydantic validation errors.
Problem
When using responses.parse() with structured output (via the text_format parameter), content moderation can trigger plain-text refusals like "I'm sorry, but I cannot assist you with that request." These refusals are not valid JSON and cause pydantic_core.ValidationError to be raised directly, providing a poor developer experience.
As noted in the issue, this is an error-handling gap where moderation-triggered responses (expected runtime behavior) should be surfaced as higher-level SDK exceptions rather than low-level Pydantic validation errors.
Solution
Modified parse_text() to:
- check response.incomplete_details.reason == "content_filter" to detect moderation
- raise ContentFilterFinishReasonError for content filter cases
- raise APIResponseValidationError with helpful context for other parsing failures
Testing
Added comprehensive test cases:
- content moderation refusal raises ContentFilterFinishReasonError
- other JSON parsing failures raise APIResponseValidationError
Impact
Fixes #2834