OpenAI hosted an Ask Me Anything (AMA) session on Reddit on April 30, 2025, featuring Joanne Jang, Head of Model Behavior. During the session, Jang discussed the evolution and future direction of ChatGPT's personality and model behavior. She emphasized that future developments will focus on providing users with more intuitive options to customize AI personalities. Jang highlighted challenges with current model refusals, noting that ideal refusals should cite specific rules without making assumptions about user intent or sounding condescending. She acknowledged that models sometimes hallucinate rules and that current prompt-layer patches are fragile and not scalable. OpenAI plans to implement more substantial training-loop changes to improve model behavior rather than relying on temporary fixes.
Key take-aways from @joannejang 30 April 2025 Reddit AMA (Head of Model Behavior, OpenAI) Straight analysis 1. Expect bigger training-loop changes, fewer band-aid prompts. OpenAI is admitting that prompt-layer patches break easily and won’t scale. Anticipate heavier use of
Summary of OpenAI AMA on Model Behavior with Joanne Jang, Head of Model Behavior at OpenAI (April 30, 2025) Model Refusals - Ideal refusals should cite exact rules without assumptions but can sound preachy, accusatory, or condescending - Models sometimes hallucinate rules like https://t.co/k69KF3kGEQ
🚨 Highlights from OpenAI’s AMA with Joanne Jang, Head of Model Behavior (4/30/25) “We think that an ideal refusal would cite the exact rule the model is trying to follow, but do so without making assumptions about the user's intent or making them feel bad.” “We're training the