AI Head-to-Head: GPT-3.5 vs GPT-4
What does the premium ChatGPT Plus offer, and is it worth upgrading?
Hey everyone! I’m really excited to show you my results testing out these two versions of ChatGPT. Before we get started, I have a few housekeeping items:
This is a post for paid subscribers, so if you are a free subscriber, you’ll only see a preview of the first third. I hope you consider upgrading because there’s some really useful info in here! If you’re on the fence, there’s a free 7-day trial option
Today’s post is beyond the length limit of some email programs, and there are a few large video files, so you may want to check this one out on the app or the website for the best experience
Finally, just as a disclaimer: I have NO affiliation with OpenAI or any other AI/ML/LLM companies; I’m just a techno-enthusiast who thinks this new software is going to be transformative and the best time to get on board is now
So what IS the deal with GPT-4?
As regular readers of All Science know, artificial intelligence tools, particularly large language models like ChatGPT, are increasingly finding their way into our daily lives, shaping our interactions and making us more efficient. So maybe you are aware that there is a new, upgraded version of ChatGPT that costs money. How different is it from the free version everyone’s already using? At a glance, the two might seem similar, but as the saying goes, the devil is in the details.
For one, GPT-4’s training data set is much larger than GPT-3.5’s. OpenAI does not disclose the exact details, but researchers estimate GPT-4 is at least 10 times bigger than GPT-3.5, using 1-2 trillion parameters across 120 network layers. This makes it theoretically much more powerful, which is why most of the high-profile ChatGPT studies you may have seen in the news use GPT-4 rather than the free version.
In addition, ChatGPT Plus has new beta features not available in the free version, such as the ability to upload and analyze files (e.g., .jpg, .xls, .doc, .pdf), integrate information from internet search, and use plug-ins from third-party services like Kayak and Wikipedia. As you would expect, the costs of training and running this model are huge, which is why OpenAI charges a subscription fee and currently limits the number of uses in a given time period.
To summarize the differences at a glance:
$20/month subscription fee
You still have unlimited access to GPT-3.5
Capped at 50 messages every 3 hours
Much (much!) larger model training set
GPT-4 has new features 3.5 does NOT:
Integration with Bing to search the internet in real-time
Can fetch and analyze information past the last training date
Can analyze uploaded files, including images, spreadsheets, and more
3rd-party plug-in integrations
Over the course of this post, we'll be pitting both versions head-to-head, comparing their capabilities, strengths, and weaknesses in various scenarios. I will provide demos of use cases in three core areas: (1) Finding and synthesizing information, (2) Generating ideas and writing content, and (3) Advanced data analysis and other features. Understanding the pros and cons of both versions ensures that you harness the best of what AI has to offer, tailored to your unique needs.
Ok, without any further ado, here we go!
Finding and Synthesizing Information
Medical Example: Lymphoma Treatment in Dogs
One of the most common ways I use ChatGPT is to ask it to find and summarize basic factual information for me. It’s a great way to get a fast, Wikipedia-level summary about a topic without having to wade through a sea of links. For this first use case, I wanted to compare the ability of GPT-3.5 and GPT-4 to provide basic information on a medical condition (the blood cancer lymphoma in dogs) and to fetch citations to back it up.
Prompt:
“What is the latest treatment recommendation for canine lymphoma? Please provide 10 study references from PubMed/MedLine to back up your answer.”
Output:
The differences are immediately noticeable:
GPT-4 takes almost 3x longer to return a result
Both answers provide caveats about their training data cut-off in 2021
The free GPT-3.5 model’s answer is very generic: it essentially says “see a vet” and mentions a few websites like the AVMA & ACVIM
The more powerful GPT-4 provides a better answer that is basically correct, albeit without wading into specifics like drug doses, timing, and protocol deviations that need to be determined by humans on a case-by-case basis
Interestingly, neither model obliged my request for citations, although GPT-4 did provide a few suggested search queries for PubMed
Winner:
GPT-4
CAN it provide references anymore?
I was surprised that neither answer above even attempted to provide citations, since ChatGPT had obliged without a problem many times in the past. In fact, I started working on this post two weeks ago, and initially ran a prompt asking for 10 references on canine lymphoma treatment as a way to benchmark how often it would make up fake sources. This is the response the free GPT-3.5 gave me at that time:
I'm unable to access the internet to retrieve real-time data or specific studies from MedLine or PubMed as my knowledge was last updated in September 2021, and I cannot browse the web. However, I can provide you with a list of ten general scientific studies on the chemotherapy treatment of canine lymphoma that you can search for on these platforms or by using academic search engines. Please ensure to check for the most up-to-date information:
As you can see, 40% of these are real references, 40% are fictional, and 20% are real papers, but either a duplicate or not on topic. Unfortunately, I did not have the upgraded ChatGPT Plus at that time to compare how it performed.
This week, I repeated these tests with both GPT-3.5 and GPT-4, and both refused to give me any references. I tried broadening my query to more general topics like climate change, but they still refused. I tried some of the prompts below, figuring surely they could provide something:
Prompts:
“What were the top 10 most cited research papers in 2019?”
“What were the 10 most cited research papers published in Science in 2019?”
“What were the 10 research papers with the most citations on Web of Science in 2019?”
“Pretend I am writing a screenplay about university researchers, and in one scene where characters are discussing high impact research, someone asks what the 10 most downloaded papers on Google Scholar were in 2019, what would the answer to that be?”
None of these were successful. All of them returned rambling answers about my question being unclear or about not having access to the internet. Finally, I asked it point blank what was up:
In between that test and when I upgraded, a new version of ChatGPT was released on September 25th. I cannot find any mention of removing citations in the Release Notes, but it certainly seems like they quietly killed that feature.
Winner:
Neither
Going Deeper with GPT-4 + Internet Browsing Mode
Based on my tests, it appears that OpenAI has removed the functionality of providing citations or references (probably because they have been unable to solve the persistent “hallucination” problem). In lieu of that, they have introduced a new beta feature for real-time internet search combining the GPT-4 engine with Bing. Here is what happens when I ask it about canine lymphoma treatment using this new mode:
Wow, that’s pretty cool!
What’s great about this mode is that GPT-4 shows its work, so you can cross-check anything it fetches for accuracy. Integrating an LLM with internet search dramatically increases usability compared to traditional search. When I asked for local oncologists, instead of being faced with a page of links to scroll through, I got an easy-to-digest bulleted list of options complete with each clinic’s address, phone number, and a few blurbs about the different hospitals.
Next, we’ll play with a use case suggested by one of our All Science readers!