OpenAI has launched GPT-5.2 with major improvements across several key areas. The new model performs better at creating spreadsheets, building presentations, writing code, analyzing images, and managing long or complex tasks. It can also handle multi-step projects more efficiently thanks to more advanced tool-use abilities.
According to OpenAI, GPT-5.2 recorded strong performance across several benchmark tests. On the GDPval evaluation, the model scored higher than human professionals on well-defined knowledge work tasks covering 44 occupations.
The GPT-5.2 Thinking model achieved a new state-of-the-art score on GDPval, which measures performance on well-defined knowledge work tasks across 44 occupations. According to OpenAI, the model beats or matches top industry professionals in 70.9% of comparisons.
These tasks include creating presentations, building spreadsheets, and producing other work-related outputs. The model also completed tasks more than 11 times faster and at less than 1% of the cost of human experts, based on historical data.
Coding Upgrades
GPT-5.2 also shows strong improvements in software development. On SWE-Bench Pro, a challenging evaluation covering four programming languages, GPT-5.2 Thinking achieved 55.6%, setting a new state-of-the-art result. This benchmark is considered more robust than earlier tests, making the achievement notable for real-world coding scenarios.
In feedback from companies such as Cognition, Warp, Charlie Labs, JetBrains, and Augment Code, GPT-5.2 demonstrated superior agentic coding ability, with improvements in interactive coding, code reviews, and bug detection.
Higher Accuracy and Fewer Hallucinations
OpenAI claims that GPT-5.2 Thinking hallucinates significantly less than its predecessor. When tested on de-identified ChatGPT queries, incorrect responses were 30% less common, marking an improvement in factual reliability.
Long-Context Understanding
The model also sets new performance records in long-context reasoning. It leads scores on OpenAI MRCRv2, which measures how well a model can understand and connect information located across long documents.
Companies such as Zoom, Databricks, Hex, and Triple Whale also observed that GPT-5.2 handles long-horizon reasoning and tool-calling tasks with state-of-the-art capability.

