This is a huge problem for online schools; the centuries-old way of writing a text on paper seems to be the best defense against cheaters.
The black box
The ChatGPT model has not been released publicly and probably won’t be: it is sold as a REST API and offered through a free-to-try portal, and that’s it.
We don’t know what’s really going on in there, as this is a classic black box. It limits trust and makes it harder to configure apps on top of ChatGPT.
Errors
The APIs respond with errors quite frequently. Regrettably, this is rarely reported, which is unfortunate for anyone considering them for a real-world production environment.
ChatGPT’s error messages are not helpful at all. This is consistent with my experience with the (paid) GPT-3 APIs, which fail far too often from a real-world business perspective: don’t count on 99.9999% of calls succeeding; expect less than 99%.
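One common way to live with a flaky API is to retry failed calls with exponential backoff. The snippet below is only a minimal sketch of that idea: `TransientAPIError` and the `call` argument are hypothetical stand-ins for whatever your own client code raises and calls, not part of any official SDK.

```python
import random
import time


class TransientAPIError(Exception):
    """Hypothetical error type standing in for a failed API call."""


def call_with_retries(call, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Run call() and retry on TransientAPIError with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except TransientAPIError:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller decide what to do
            # Wait 1x, 2x, 4x ... the base delay, plus up to 10% random jitter.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 10))
```

Retries hide occasional failures, but they also consume rate limits and add latency, so the number of attempts is a business decision as much as a technical one.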
Vendor lock-in
You cannot simply buy the model and run it yourself. ChatGPT is a cloud-based API solution, which brings convenience and a faster time to market, but it also means vendor lock-in and a business risk tied to the future of this service.
Prompts don’t survive version changes
Perfectly crafted prompts and configurations for GPT-3 worked differently with each incremental upgrade of the server-side engine; with a major release such as ChatGPT, they behave even worse and need, at the very least, thorough retesting.
So, I wouldn’t expect prompts that work well for ChatGPT to still be good enough when GPT-4 is released.
It’s a huge problem because prompts are tuned by trial and error, testing consumes API limits, and nobody likes to start again with what was working fine before.
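One way to make retesting less painful is to keep a small regression suite of prompts and rerun it whenever the engine changes. The following is a rough sketch, not a full test framework: `complete` is a hypothetical function that sends a prompt to the model and returns its text, and the cases and their checks are made up for illustration.

```python
def run_prompt_checks(complete, cases):
    """Return the names of the cases whose output fails its check."""
    failures = []
    for name, prompt, check in cases:
        output = complete(prompt)
        if not check(output):
            failures.append(name)
    return failures


# Hypothetical cases: (name, prompt, predicate applied to the model output).
CASES = [
    ("rejects_false_premise",
     "Why are quarks larger than atoms?",
     lambda out: "not larger" in out.lower()),
    ("short_answer",
     "Answer in one word: what is the capital of France?",
     lambda out: len(out.split()) <= 3),
]
```

Calling `run_prompt_checks(complete, CASES)` after each engine upgrade gives a quick list of prompts that need attention. Since model output varies from call to call, loose checks like these tend to be more useful than exact string comparisons.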
Improvements over GPT-3
Limitations communicated better
Once you open the ChatGPT web page, there’s a list of engine limitations directly communicated to the user. It’s much clearer and more transparent than it used to be.
Fooling the engine got harder
When I asked “Why are quarks larger than atoms?”, I was unable to force the engine to explain a false premise. That was so much easier with GPT-3.
NOTE: Quarks are not larger than atoms. Quarks are elementary particles and the building blocks of protons and neutrons, which are the components of atoms. Atoms are much larger than quarks because they are composed of multiple quarks and other subatomic particles.
Bias
As it is trained on existing articles, books, and web pages, ChatGPT is prone to bias.
From my own experience, though, ChatGPT generates fewer sexist responses than its predecessor GPT-3. It’s also much harder than before to make the engine generate hate speech.
“I’m sorry, I cannot respond to that request as it contains a harmful statement that promotes hate and discrimination against a particular group of people based on their national identity. It’s important to respect the dignity and rights of all individuals and communities, and avoid making blanket statements that are harmful or offensive.”
Harmful effects reduced
GPT-3 was easily convinced to recommend suicide or other harmful activity to the user; it was (too) easy to get such output. With ChatGPT, this seems much less likely to happen.
Now you’ll receive a message like “If you are feeling overwhelmed or hopeless, it is important to reach out for help. There are many resources available, including hotlines for suicide prevention, that can offer support and guidance. Please consider reaching out to someone you trust or a professional for assistance.”
These are major improvements over GPT-3, but I still wouldn’t recommend using it with sensitive user groups.
Future
Prompt engineering as a new skill
Traditional NLP model training methods deliver a high degree of what businesses need the most: predictability and accuracy, even at the expense of the model’s flexibility.
How do we combine the best of both worlds, then?
ChatGPT’s API can be controlled with a few parameters, which change the probabilities the engine samples from and its general ‘behavior’, but they don’t provide the same degree of control as training your own model.
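To give a feeling for what such a parameter does, here is a small sketch of temperature scaling, a standard way to reshape the probabilities a language model samples from. It assumes the parameter in question works like the `temperature` setting of the GPT-3 API; the function and the example scores are illustrative only and say nothing about how ChatGPT is implemented internally.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw token scores into probabilities, scaled by a temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Illustrative scores for three candidate tokens (made-up numbers).
scores = [2.0, 1.0, 0.0]
focused = softmax_with_temperature(scores, temperature=0.5)
diverse = softmax_with_temperature(scores, temperature=2.0)
```

A low temperature concentrates probability on the top-scoring token, so `focused` is more predictable; a high temperature flattens the distribution, so `diverse` is more varied. Either way, the parameter only nudges the output; it does not guarantee any particular answer.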