OpenAI have announced two new AI models that some folks are claiming are "genius level".
At the same time as dropping these new models, OpenAI also released a free CLI-based tool for programmers called Codex.
Are these tools truly 'revolutionary', or just the latest incremental step in a crowded field?
TL;DR: Results show AI is getting smarter, but we are not close to "genius" yet, especially when it comes to coding.
What Did OpenAI Deliver?
Let's look at the fresh stuff from OpenAI:
- o3 and o4-mini: These are new AI models designed specifically to be good at "reasoning" tasks. This means they are built to understand problems, figure out steps to solve them, and, importantly for programmers, write computer code. The "mini" in o4-mini suggests it might be a smaller, potentially faster version compared to larger models.
- Codex: This is an open-source (yes, the code is publicly available) command-line interface (CLI) tool. Think of it as a helper you can talk to in your computer's terminal. You give it instructions (a "prompt"), and it uses the power of OpenAI's models (like o4-mini) to write code, create files, run commands, and analyze code directly on your computer or in your coding environment (IDE). It's very similar in concept to another tool called Claude Code.
OpenAI isn't alone. There's a huge amount of activity right now, and the competition includes:
- Google: Offers Firebase Studio (previously known as Project IDX), a web-based coding environment powered by its Gemini 2.5 AI model, which many consider one of the best for programming tasks currently.
- Microsoft: Owns GitHub and develops Visual Studio Code (VS Code), a very popular code editor. Their AI tool, Copilot, recently received a major upgrade called Agent Mode, designed to handle more complex tasks like creating files and running commands, competing directly with tools like Codex.
- Windsurf and Cursor (which are modified versions of VS Code with added AI features) have also gained popularity. Claude Code, Devin, Augment, and Bolt all try to help developers write code faster and better.
What Are They Like in Practice?
Claims of "genius" AI sound impressive, but how well do these tools work?
Here's how the installation and setup for Codex works:
- Install the tool using npm:
npm i -g @openai/codex
- Set your OpenAI API key as an environment variable (this lets the tool connect to your OpenAI account):
export OPENAI_API_KEY="********"
- Run commands using `codex` followed by your instructions. For example: `codex "build me a nextjs boilerplate website"`
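If you want to catch the two most likely setup mistakes (a missing API key or a missing install) before sending a prompt, a small pre-flight check can help. The following is only a rough sketch: `has_api_key` and `has_command` are hypothetical helper names I made up for illustration, and they are not part of Codex itself.

```shell
#!/bin/sh
# Illustrative pre-flight check to run before calling codex.
# has_api_key and has_command are hypothetical helpers, not part of Codex.

# Succeeds (exit status 0) only when OPENAI_API_KEY is set and non-empty.
has_api_key() {
  if [ -z "${OPENAI_API_KEY:-}" ]; then
    echo "OPENAI_API_KEY is not set. Export it first." >&2
    return 1
  fi
  return 0
}

# Succeeds only when the named command is available to the shell
# (on PATH, or as a builtin or function).
has_command() {
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "$1 was not found." >&2
    return 1
  fi
  return 0
}

# Example usage (commented out so the sketch has no side effects):
# has_api_key && has_command codex && codex "build me a nextjs boilerplate website"
```

The usage line at the bottom only calls `codex` when both checks pass, so a failed setup produces a short hint instead of a confusing error.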
Right now results are ... mixed. And not just because I quickly ran out of API credit!
Based on my testing, calling the current AI coding tools "genius" seems like hyperbole, if I am being polite.
However, it's also wrong to say these tools are worthless.
Bottom line? Be realistic. Don't fall for the hype.