I put ChatGPT-4o and 5.1 through 9 real-world tests — from logic puzzles to coding, writing and image analysis.