ChatGPT v4 aces the bar, SATs and can identify exploits in ETH contracts

GPT-4 completed many of the tests within the top 10% of the cohort, while the original version of ChatGPT often finished up in the bottom 10%.

GPT-4, the latest version of the artificial intelligence chatbot ChatGPT, can pass high school tests and law school exams with scores ranking in the 90th percentile and has new processing capabilities that were not possible with the prior version.

The figures from GPT-4’s test scores were shared on March 14 by creator OpenAI, revealing it can also convert image, audio and video inputs to text in addition to handling “much more nuanced instructions” more creatively and reliably. 

“It passes a simulated bar exam with a score around the top 10% of test takers,” OpenAI added. “In contrast, GPT-3.5’s score was around the bottom 10%.”

The figures show that GPT-4 scored 163 on the LSAT, placing it in the 88th percentile. The LSAT is the test college students in the United States take to be admitted to law school.

Exam results of GPT-4 and GPT-3.5 on a range of recent U.S. exams. Source: OpenAI

GPT-4’s score would put it in a good position to be admitted into a top 20 law school and is only a few marks short of the reported scores needed for acceptance to prestigious schools such as Harvard, Stanford or Yale.

The prior version of ChatGPT scored only 149 on the LSAT, putting it in the bottom 40%.

GPT-4 also scored 298 out of 400 on the Uniform Bar Exam, a test taken by recent law school graduates that permits them to practice law in the U.S. jurisdictions that have adopted it.

UBE scores needed to be admitted to practice law in each U.S. jurisdiction. Source: National Conference of Bar Examiners

The old version of ChatGPT struggled in this test, finishing in the bottom 10% with a score of 213 out of 400.

On the SAT Evidence-Based Reading & Writing and SAT Math exams, which U.S. high school students take to measure their college readiness, GPT-4 scored in the 93rd and 89th percentiles, respectively.

GPT-4 excelled in the “hard” sciences too, posting well above average percentile scores in AP Biology (85-100%), Chemistry (71-88%) and Physics 2 (66-84%).

Exam results of GPT-4 and GPT-3.5 on a range of recent U.S. exams. Source: OpenAI

However, its AP Calculus score was fairly average, ranking in the 43rd to 59th percentile.

Another area where GPT-4 was lacking was in English literature exams, posting scores in the 8th to 44th percentile across two separate tests.

OpenAI said GPT-4 and GPT-3.5 were tested on the 2022-2023 practice exams and that the models received “no specific training” for them:

“We did no specific training for these exams. A minority of the problems in the exams were seen by the model during training, but we believe the results to be representative.”

The results also prompted fear in the Twitter community.


Nick Almond, the founder of FactoryDAO, told his 14,300 Twitter followers on March 14 that GPT-4 is going to “scare people” and will “collapse” the global education system.

Former Coinbase director Conor Grogan said he inserted a live Ethereum smart contract into GPT-4, and the chatbot instantly pointed to several “security vulnerabilities” and outlined how the code might be exploited.

Earlier smart contract audits on ChatGPT found that its first version was also capable of spotting code bugs to a reasonable degree.

Rowan Cheung, the founder of the AI newsletter The Rundown, shared a video of GPT turning a hand-drawn mock-up of a website on a piece of paper into code.

