AI · 28 March 2024 · 3 min read

Claude 3 vs GPT-4: a practitioner's comparison for business use

Anthropic launched the Claude 3 model family in March 2024. After several weeks of testing alongside GPT-4, here's a practical comparison for people who use these tools for real work.

by Matt Roberts

Anthropic released the Claude 3 model family on 4 March 2024: Haiku, Sonnet, and Opus, each at a different capability and cost tier. Opus is the most capable of the three and benchmarks competitively with GPT-4 on a range of standard evaluations.

I've been running both alongside each other for real work tasks for the past few weeks. Here's how it breaks down.

Quick context on where I'm coming from

I'm not an AI researcher. I'm an IT professional who uses these tools for: writing and editing, PowerShell and Python scripting, technical research and summarisation, customer communication drafting, and general problem-solving. My comparison is practical, not academic.

Where Claude 3 Opus has an edge

Nuance in writing tasks

For anything requiring careful tone (a difficult email, a proposal that needs to land exactly right, communication that has to balance honesty with diplomacy), I've found Opus marginally better than GPT-4. It seems to pick up more reliably on the implicit requirements in how I describe a task.

Long document handling

Claude 3 Opus has a 200,000-token context window; GPT-4 Turbo has 128,000 tokens. In practice, both are more than sufficient for most tasks, but for genuinely long documents (full product specifications, lengthy contracts, extended code reviews), Claude's additional headroom has mattered a few times.
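As a rough illustration of where that headroom starts to matter, here is a minimal sketch that estimates whether a document is likely to fit in each window. The four-characters-per-token ratio is a rule-of-thumb assumption for English prose, not either vendor's tokenizer. The function names, the dictionary keys (plain labels, not API model identifiers), and the 4,000-token reply reserve are all hypothetical. For a real count, use the provider's own tokenizer.

```python
# Context window sizes quoted above, in tokens.
# The keys are plain labels for this sketch, not API model identifiers.
CONTEXT_WINDOWS = {
    "claude-3-opus": 200_000,
    "gpt-4-turbo": 128_000,
}

# Assumption: roughly 4 characters per token for English prose.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Rough token estimate from character count (heuristic only)."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def fits(text: str, reserve_for_reply: int = 4_000) -> dict[str, bool]:
    """For each model, report whether the text plus a reply budget fits."""
    needed = estimate_tokens(text) + reserve_for_reply
    return {model: needed <= window for model, window in CONTEXT_WINDOWS.items()}


if __name__ == "__main__":
    # Stand-in for a long contract: about 150,000 tokens under the heuristic.
    contract = "x" * 600_000
    print(fits(contract))
```

Under these assumptions, a document of around 600,000 characters fits the larger window but not the smaller one, which is the kind of case where the difference shows up.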

Following complex instructions

When I give Claude a prompt with multiple constraints ("write this at this length, in this tone, for this audience, avoiding these topics, structured like this"), it tends to adhere to the full set of constraints more consistently than GPT-4, which sometimes drops one of several requirements.
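One way to make "drops a requirement" measurable is to check each reply against the constraints that can be tested mechanically. The sketch below is an illustration under stated assumptions, not a description of how either model was evaluated here: the `Constraints` class and `dropped_constraints` function are hypothetical names, and it only covers length, banned terms, and required headings. Tone and audience fit still need a human read.

```python
from dataclasses import dataclass, field


@dataclass
class Constraints:
    """Mechanically checkable parts of a multi-constraint prompt (hypothetical)."""
    max_words: int
    banned_terms: list[str] = field(default_factory=list)
    required_headings: list[str] = field(default_factory=list)


def dropped_constraints(reply: str, c: Constraints) -> list[str]:
    """Return a description of each constraint the reply violates."""
    problems = []
    lowered = reply.lower()

    # Length: count whitespace-separated words.
    word_count = len(reply.split())
    if word_count > c.max_words:
        problems.append(f"too long: {word_count} words (limit {c.max_words})")

    # Topics to avoid: simple case-insensitive substring match.
    for term in c.banned_terms:
        if term.lower() in lowered:
            problems.append(f"mentions banned term: {term}")

    # Structure: every required heading must appear somewhere in the reply.
    for heading in c.required_headings:
        if heading.lower() not in lowered:
            problems.append(f"missing heading: {heading}")

    return problems


if __name__ == "__main__":
    rules = Constraints(
        max_words=50,
        banned_terms=["pricing"],
        required_headings=["Summary", "Next steps"],
    )
    reply = "Summary\nThe migration is on track.\nOur pricing will change."
    for problem in dropped_constraints(reply, rules):
        print(problem)
```

Running the same prompt through both models and comparing the lists this returns gives a simple tally of which constraints each one misses.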

Where GPT-4 has an edge

Code generation

For scripting tasks (PowerShell, Python, Graph API calls), GPT-4 remains my preference. The output tends to be closer to idiomatic, production-ready code. Claude's code output is good, but occasionally over-verbose or structured in ways that feel slightly academic.

Plugin and tool ecosystem

ChatGPT with GPT-4 has a broader plugin ecosystem and a built-in code interpreter, both of which add useful capability for certain tasks. Claude.ai's interface is clean but more limited in these integrations.

Familiarity and predictability

I've been using GPT-4 for over a year. I know how to prompt it. I know where it struggles. That familiarity has real value in a working context.

Where I've ended up

These models are genuinely close in capability for the kind of work I do. The gap between them is smaller than the gap that separated either of them from GPT-3.5. Choosing between them for a specific use case is a matter of marginal differences, not transformative ones.

My current practice: GPT-4 is my default, including for scripting, and I switch to Claude 3 Sonnet for writing-heavy tasks and long document work.

If you haven't tried Claude 3 and you're a regular GPT-4 user, it's worth at least a few weeks of parallel testing. The differences are real, even if they're not dramatic. And the model landscape is moving fast enough that "I'm settled on X" is a position you should revisit regularly.

