ChatGPT: a fuller picture

ChatGPT is the successor of the very successful GPT-3.

I can still remember my first conversations with GPT-3; the human-like responses were so impressive. Then I took another look at the alternatives. We should always remember that there is more out there than just OpenAI and its family of language models.

ChatGPT (or GPT-3.5) has generated more hype and become even more popular than any other language engine.

The number one consequence of its free trial is that everybody is aware of Large Language Models (LLMs) now, both inside and outside of the IT community. Conversations with ChatGPT are being shared, reshared, and liked all over the Internet. This is great engineering and an even greater PR achievement by OpenAI. This kind of popularization may turn into an increased demand for paid versions of their Large Language Models.

But, what about the actual performance and usefulness of the new engine? How suitable is it for business applications?

Avenga Labs is never eager to repeat word-for-word the bold claims of technology vendors or social media-era tech journalists. So let’s take a calm, cooled-down look at this technology, starting with its flaws and limitations, and then moving on to its potential business opportunities and applications.

Challenging aspects

These are the parts usually skipped over by others, but not by us.

Inaccuracy

“May occasionally generate incorrect information” (ChatGPT page).

ChatGPT responds with a great deal of confidence, but with considerably less accuracy.

Factual errors are very common, so each output needs to be checked by someone who knows the true and accurate answers. Right now, ChatGPT is best seen as a great sidekick, a digital AI assistant for someone who is an expert but short on time: it quickly generates draft text that still has to be reviewed before publishing. Even then, factual errors are hard to detect.

There is no knowledge in the model and it does not understand the world; it is extremely good at auto-completing the text of a ‘conversation’ so that it seems natural. I’m deliberately not saying it ‘just’ autocompletes text, because the way it does so is impressive, even for cooler tech heads like me.

Out of date

“I’m sorry, I do not have the information about specific events that happened last week as my training data only goes up until 2021 and I do not have access to current events. You may refer to news sources or search engines for the latest information.” (ChatGPT response)

The ChatGPT engine itself is not expected to be updated frequently due to the vast amount of computing power required and the associated costs.

It is already two years behind current events and knowledge. For me, given how much has happened since 2021 (war, the cost of living crisis, far less talk about pandemics), we live in an entirely different world. However, for someone not looking for the latest ‘knowledge’, this might be perfectly acceptable.

Privacy concerns

“Conversations may be reviewed by our AI trainers to improve our systems.”

“Please don’t share any sensitive information in your conversations.” – ChatGPT web page.

So, what about business applications requiring privacy and sensitive data protection, which actually means virtually all of them?
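A common mitigation, given those warnings, is to strip or mask obviously sensitive data before anything leaves your own systems. Below is a minimal, purely illustrative sketch in Python; the regex patterns are nowhere near production-grade PII detection, and send_to_chatgpt is only a hypothetical placeholder for whatever client call you would use:

```python
import re

# Hypothetical, illustrative patterns only; real PII detection needs far more care.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def redact(text: str) -> str:
    """Replace obviously sensitive substrings with placeholders before the text leaves your systems."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"<{label} redacted>", text)
    return text

user_message = "Please summarize the complaint from jane.doe@example.com, phone +1 202 555 0101."
safe_message = redact(user_message)
# safe_message is what would be forwarded to the hosted model, e.g. send_to_chatgpt(safe_message)
print(safe_message)
```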

Cheating made easy and accessible

Unfortunately, ChatGPT’s powerful model makes it easy to pass most known exams and to write homework and entire articles.

There are already tools attempting to detect GPT-generated content, but for now, we have to be skeptical, and expect the flood of fake content and cheaters to continue.

This is a huge problem for online schools; the centuries-old way of writing a text on paper seems to be the best defense against cheaters.

The black box

The ChatGPT model has not been released publicly and probably never will be; it is sold as a REST API and offered through a free-to-try portal, and that’s it.

We don’t know what’s really going on in there, as this is a classic black box. This limits trust and makes it harder to build and tune applications on top of ChatGPT.

Errors

The APIs respond with errors quite frequently. Regrettably, this is rarely reported, which is unfortunate for anyone considering using them in a real-world production environment.

ChatGPT’s error messages are not helpful at all. This is consistent with my experience with the (paid) GPT-3 APIs, which break very often from a real-world business perspective (don’t count on 99.9999% availability; expect less than 99% of API calls to succeed).
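In practice, this means any production integration has to assume that calls will fail and retry defensively. A minimal sketch in Python; call_gpt_api here is a hypothetical stand-in for whatever client call you actually use, not a real SDK function:

```python
import random
import time

def call_with_retries(call_gpt_api, prompt, max_attempts=5):
    """Retry a flaky API call with exponential backoff and jitter.

    call_gpt_api is a stand-in for your actual client call; it is expected
    to raise an exception on HTTP errors or timeouts.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return call_gpt_api(prompt)
        except Exception as exc:  # in real code, catch the client's specific error types
            if attempt == max_attempts:
                raise
            delay = min(2 ** attempt, 30) + random.uniform(0, 1)
            print(f"Attempt {attempt} failed ({exc}); retrying in {delay:.1f}s")
            time.sleep(delay)
```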

Vendor lock-in

You cannot simply buy the model and run it on your own infrastructure. ChatGPT is a cloud-based API solution, which means convenience and a faster time to market, but also vendor lock-in and business risk tied to the future of this service.

Prompts don’t survive version changes

Perfectly crafted prompts and configurations for GPT-3 worked differently with each incremental upgrade of the server-side engine; with a major release such as ChatGPT, they behave even less predictably and need to be, at the very least, thoroughly retested.

So, I wouldn’t expect prompts that work well for ChatGPT to still be good enough when GPT-4 is released.

It’s a huge problem, because prompts are tuned by trial and error, testing consumes API quotas, and nobody likes to redo work that was working fine before.
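The only practical mitigation I know of is to treat prompts like code: keep a small regression suite of prompts with expected properties and rerun it after every engine change. A rough, hypothetical sketch follows; call_model stands in for your API client, and the checks are purely illustrative:

```python
# Illustrative prompt regression checks; call_model is a placeholder for your API client.
PROMPT_CASES = [
    {
        "prompt": "Classify the sentiment of: 'The delivery was late again.' Answer with one word.",
        "check": lambda out: "negative" in out.lower(),
    },
    {
        "prompt": "List three EU capitals, comma-separated.",
        "check": lambda out: len(out.split(",")) >= 3,
    },
]

def run_prompt_suite(call_model):
    """Return the list of prompts whose outputs no longer satisfy their checks."""
    failures = []
    for case in PROMPT_CASES:
        output = call_model(case["prompt"])
        if not case["check"](output):
            failures.append((case["prompt"], output))
    return failures  # rerun this after every engine/model upgrade
```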

Improvements over GPT-3

Limitations communicated better

Once you open the ChatGPT web page, there’s a list of engine limitations directly communicated to the user. It’s much clearer and more transparent than it used to be.

Fooling the engine got harder

When asked “Why are quarks larger than atoms?”, I was unable to force the engine to provide an explanation for a question based on a false premise. That was much easier with GPT-3.

NOTE: Quarks are not larger than atoms. Quarks are elementary particles and the building blocks of protons and neutrons, which are the components of atoms. Atoms are much larger than quarks because they are composed of multiple quarks and other subatomic particles.

Bias

As everything is trained on existing articles, books, and web pages, ChatGPT is prone to bias.

From my own experience, though, ChatGPT generates fewer sexist responses than its predecessor GPT-3. It is also much harder than before to make the engine generate hate speech.

“I’m sorry, I cannot respond to that request as it contains a harmful statement that promotes hate and discrimination against a particular group of people based on their national identity. It’s important to respect the dignity and rights of all individuals and communities, and avoid making blanket statements that are harmful or offensive.”

Harmful effects reduced

GPT-3 was (too) easily convinced to recommend suicide or other harmful activities to the user. With ChatGPT, this seems much less likely to happen.

Now you’ll receive a message like “If you are feeling overwhelmed or hopeless, it is important to reach out for help. There are many resources available, including hotlines for suicide prevention, that can offer support and guidance. Please consider reaching out to someone you trust or a professional for assistance.”

These are major improvements over GPT-3, but I still wouldn’t recommend using it with sensitive user groups.

Future

Prompt engineering as a new skill

Traditional NLP model training methods deliver a high degree of what businesses need the most: predictability and accuracy, even at the expense of the model’s flexibility.

How do we combine the best of the two worlds, then?

ChatGPT’s API can be controlled with a few parameters, which change the randomness of the engine’s sampling and its general ’behavior’, but they don’t provide the same degree of control.
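For illustration, here is roughly what tuning those parameters looked like at the time of writing, assuming the pre-1.0 openai Python package and GPT-3’s Completion endpoint (a sketch, not an official recipe); temperature, top_p and the penalty settings are the kind of knobs referred to above:

```python
import openai  # pre-1.0 openai package assumed here, purely for illustration

openai.api_key = "YOUR_API_KEY"

# These parameters shift how the model samples tokens, but they are no substitute
# for training and controlling your own model on your own data.
response = openai.Completion.create(
    model="text-davinci-003",  # the GPT-3-family engine available via the paid API at the time
    prompt="Summarize the benefits of code reviews in two sentences.",
    temperature=0.2,        # lower = more deterministic, less creative output
    top_p=1.0,              # nucleus sampling cut-off
    max_tokens=120,         # hard cap on the length of the completion
    presence_penalty=0.0,
    frequency_penalty=0.3,  # discourage verbatim repetition
)
print(response["choices"][0]["text"].strip())
```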

NLP experts have fewer traditional activities to perform, as they don’t have access to the training data and have very limited access to the model itself. Instead, they focus on so-called prompt engineering: ’convincing’ the model to behave in a certain way by providing intro texts and formulating questions carefully.
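To make that concrete, much of this ‘convincing’ amounts to prepending an instruction block to the user’s actual question. A minimal sketch of what such an intro text can look like; the wording and the build_prompt helper are my own illustrative choices, not an OpenAI recommendation:

```python
# Prompt engineering in its simplest form: an intro text that pins down role,
# tone and output format, followed by the actual question.
INTRO = (
    "You are a cautious assistant for domain experts.\n"
    "Answer in at most three bullet points.\n"
    "If you are not certain, say so explicitly instead of guessing.\n\n"
)

def build_prompt(question: str) -> str:
    """Wrap the user's question in the behavioral instructions above."""
    return f"{INTRO}Question: {question}\nAnswer:"

prompt = build_prompt("What are the main limitations of large language models?")
# This prompt is then sent through the same Completion call shown earlier.
print(prompt)
```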

There’s even an entirely new question about who should do that. Is it really still a data science profile or is it more for business domain experts?

The big question about business value

Once you set aside the impression of great answers (from a language perspective) and factor in the accuracy, bias, and privacy issues, what is left for business?

I would never recommend ChatGPT for an actual conversation with patients, especially those suffering from complicated diseases.

Who benefits most?

Current real-world applications seem to be limited to various kinds of writers and computer programmers.

A disappointment? I don’t think so; a useful language model is an outstanding achievement, because it works as an AI assistant beyond experimentation and research applications.

Everybody using the engine for professional purposes reiterates that outputs need to be double-checked and cannot be trusted, but still, the perceived productivity gain is worth it. It won’t create software developers from non-developers, but it will likely enable developers to work faster. The same applies to all types of writers.

Pressure on search engines grows

Why can’t we just ask questions and talk to the search engines?

To some extent, we can. Google, Bing, and other search engines accept queries and attempt to respond with a direct text answer first, before providing the usual list of links to the highest-ranking web pages.

Still, entire business models are based on page content scanning, ranking, and ads pushed to our browser windows or smartphones.

Figure 1. New Bing search UI backed by ChatGPT

Microsoft announced that a new version of their bing.com search engine is going to utilize a conversational UI backed by ChatGPT on top of the traditional list of matching websites.

The new Bing was already available as a preview at the time of writing this article (February 2023). Microsoft opted for a hybrid approach, attempting to combine the best of both worlds.

Google vs. Microsoft, aka the AI wars of 2023

Google has already responded to ChatGPT and to Microsoft offering ChatGPT as part of its Azure AI suite. Google’s response is called Bard.

It’s also expected to be available as a paid API for developers in the future, and to augment Google search results with direct answers to users’ questions, which is an upgrade over what’s available now.

What’s next for GPT?

The hype is stronger than ever, and the expectations are higher than ever and still growing.

This is expected to lead to a vast... disappointment with the incoming GPT-4 and other new competing engines. Despite their significant advancements, they still won’t reach artificial general intelligence (AGI).

Having said that, with each iteration, the risks go down and the output quality goes up in every criterion, making LLMs useful in more private and business applications.

We should look at AI advancements in the context of how useful they are for the humans using them and not for any AGI science fiction scenarios. So let’s focus on what we can get today, not what we are still missing.

Useful digital AI-driven sidekicks for tens of dollars per month are an outstanding achievement available today and a great hope for the future.

And nobody is resting on their laurels; the AI-driven competition is expected to heat up in 2023.
