Do you trust the algorithm? It’s an intriguing question as we consider the possibilities of machine learning and artificial intelligence. Some might even say it’s a question of faith, as we don’t fully understand some of the models and their results.
Background
For the past few weeks, I’ve been experimenting with ChatGPT, the chatbot from OpenAI that is built on a family of large language models. I’m left with one big question – in its current form, can we trust the algorithm?
It’s been difficult to avoid the wave of ChatGPT prompts and learnings; it’s being discussed everywhere and looks like it may soon be integrated into Microsoft Bing.
ChatGPT and generative art algorithms like Dall-E and Midjourney represent new tools that we should all become more comfortable using. There is no doubt additional use cases will continue to emerge, as will new models.
However, many people forget that ChatGPT isn’t necessarily creating new thoughts; its output is generated from patterns in human-created training content. I wonder what will happen if future models continue to be trained on algorithmically created content.
The Opportunity
There is no shortage of prompts and examples from ChatGPT doing phenomenal things, but I wondered just how accurate the algorithm was with a pretty benign data set.
As a baseball fan, I hopped back into the world of collectibles before the pandemic. Reliving my youth, I turned to the booming card industry. I collect The Topps Company Living Set, which is an ever-growing collection of hand-painted baseball cards.
As a goal and legacy, I’m trying to have every living player sign their card. Being the geek that I am, I’ve also organized the collection and player data into a Google Sheet.
Testing the Accuracy of ChatGPT
As part of the Google Sheet, I’ve slowly been updating personal information about each player to track whom I may need to prioritize based on age.
What would have taken me hours to aggregate manually, or to write a script for, looked like an excellent use case for ChatGPT.
I started with a relatively easy prompt, “can you give me the birth date and death date of the following baseball players in table format?”
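For anyone who wants to run the same question in bulk rather than through the chat window, a minimal sketch against the OpenAI chat completions API might look like the following. The model name, temperature, and sample batch of players are assumptions of mine, not part of the original experiment.

```python
import os
import requests

API_URL = "https://api.openai.com/v1/chat/completions"
API_KEY = os.environ["OPENAI_API_KEY"]


def ask_birth_and_death_dates(players):
    """Send one batch of player names and return the model's answer as text."""
    prompt = (
        "Can you give me the birth date and death date of the following "
        "baseball players in table format?\n" + "\n".join(players)
    )
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            # The model name and temperature are assumptions; the chat
            # interface I used does not expose these choices.
            "model": "gpt-3.5-turbo",
            "messages": [{"role": "user", "content": prompt}],
            "temperature": 0,
        },
        timeout=60,
    )
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]


if __name__ == "__main__":
    print(ask_birth_and_death_dates(["Jack Morris", "Fergie Jenkins", "Bobby Witt Jr."]))
```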
I was initially surprised by how quickly the information was generated, and my spot checks seemed to indicate the data was correct. But I was unwilling to stop there.
While the algorithm seemed to save me a ton of time collecting this information, I saw two errors that made me question the results: ChatGPT indicated that Hall of Fame inductees Jack Morris and Fergie Jenkins were dead.
I knew that wasn’t true, but it sent me to Google to confirm my suspicions. Upon discovering the issue, I noticed that many of the younger players on the list had suspect birthdays as well.
Specifically, I dug into the birthday of Bobby Witt Jr. and, even after correcting the model, still received incorrect information. Further, when I tried to understand where the information came from, ChatGPT couldn’t provide an answer.
The Results
There are currently 580 players on my checklist, and I prompted ChatGPT to provide the birth date and death date for each of them.
I then cross-referenced every player manually against their entry on Baseball Reference (a programmatic version of that check is sketched after the list below). I was shocked by my findings.
- 32%, or 188, of the players had incorrect information
- 23%, or 139, of the players had birth dates that were off by months or years
- 4%, or 25, of the players had a birth date that was off by exactly one year
- 3%, or 18, of the players had a birth date that was off by a matter of days
- Five players were incorrectly reported as deceased: Jack Morris, Fergie Jenkins, Rod Carew, Carlton Fisk, and David Ortiz
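Here is a rough sketch of how that cross-check could be automated with pandas. The file names, column names, and the 31-day threshold for "a matter of days" are my own assumptions for illustration; the actual comparison was done by hand.

```python
import pandas as pd

# Hypothetical file and column names; the real comparison was done by hand
# against a Google Sheet and Baseball Reference.
chatgpt = pd.read_csv("chatgpt_dates.csv")      # player, birth_date, death_date
reference = pd.read_csv("reference_dates.csv")  # player, birth_date, death_date

merged = chatgpt.merge(reference, on="player", suffixes=("_gpt", "_ref"))
for col in ["birth_date_gpt", "birth_date_ref", "death_date_gpt", "death_date_ref"]:
    merged[col] = pd.to_datetime(merged[col], errors="coerce")


def classify_birth_date(row):
    """Bucket each birth date roughly the way the list above does."""
    gpt, ref = row["birth_date_gpt"], row["birth_date_ref"]
    if pd.isna(gpt) or pd.isna(ref):
        return "unparseable"
    if gpt == ref:
        return "correct"
    if gpt.day == ref.day and gpt.month == ref.month and abs(gpt.year - ref.year) == 1:
        return "off by exactly one year"
    if abs((gpt - ref).days) <= 31:  # the "matter of days" cutoff is an assumption
        return "off by days"
    return "off by months or years"


merged["birth_date_check"] = merged.apply(classify_birth_date, axis=1)

# Flag players ChatGPT reported as deceased when the reference has no death date.
falsely_deceased = merged.loc[
    merged["death_date_gpt"].notna() & merged["death_date_ref"].isna(), "player"
]

print(merged["birth_date_check"].value_counts(normalize=True).round(2))
print("Incorrectly reported as deceased:", ", ".join(falsely_deceased))
```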
It was eye-opening to see nearly a third of the list misrepresented. Many people have rightly celebrated the abilities of ChatGPT, but its accuracy on factual data is quite concerning.
Further, it would be helpful if we could understand the source of the information or have ChatGPT provide a confidence score with fact-based results.
Perhaps that feature will be coming soon; if not, I’d recommend taking great care with similar prompts.
It isn’t premature to evangelize these tools or to consider their implications; this experiment just illustrates how much room the tools, and the space as a whole, still have to grow.
The question remains: can we trust the algorithms that could, or do, rule our world? It has many parallels to the faith people put in religion, and I’m curious how often we’ll simply accept the answers provided versus understanding where they come from.
Proceed with care and caution.