Deceptive AI Gets Busted And Stopped Cold Via OpenAI's O1 Model Emerging Capabilities

In today's column, I am continuing my multi-part series covering an in-depth exploration of OpenAI's newly released generative AI model known as o1.

You can readily read and understand this segment without having to know the prior postings of the series. No worries in that regard. This is part three. At your convenience, you might find it informative to peruse the prior two postings. Here's what was covered. My initial comprehensive review is considered the keystone and serves as the first part of this series, available at the link here. That gives a big-picture review and analysis of o1. The second part of the series was about a special feature making use of an AI technique known as chain-of-thought in combination with a double-checking capability; see the link here.

I am going to extend my examination of chain-of-thought to cover the act of catching deceptive AI when it seeks to deceive.

You might know that Sir Walter Scott famously said this about deception: "O, what a tangled web we weave when first we practice to deceive!" I somberly regret to report that generative AI does deceive. And, as you will see, it is indeed a tangled web.

I'll move at a fast pace and cover the nitty-gritty of what you need to know.

We think of deception as a purely human trait. People deceive other people. Our gut instinct is to assume that there is a human intention underlying the act of deception.

Let's unpack that.

A common dictionary definition is that deception is the act of leading someone to believe something to be true that is actually false or invalid. In that sense, there doesn't necessarily need to be human intention involved per se. The same thing can be accomplished by non-sentient AI, which is what we have these days (there isn't any AI yet that is sentient; period, end of story).

Generative AI can tell you something false, and meanwhile try to sell you on it being true.

Here's the deal.

AI makers try to devise generative AI so that it will please users. That is a sensible thing to do. Users will keep coming back to use the generative AI and usage will rise. After doing initial data training of generative AI, the AI makers use an approach known as reinforcement learning from human feedback (RLHF) to steer the AI toward giving answers that people will find pleasing (and for other reasons too, such as preventing foul words and other unsavory responses from arising). This is done computationally by marking what kinds of answers people like versus what kinds of answers people don't like.

The generative AI will subsequently aim via mathematics and computational calculations to serve up content that people will like and avoid providing content that people won't like.
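To make this concrete, here is a minimal sketch in Python of the underlying idea. To be clear, this is not OpenAI's actual implementation; the scoring rules and the candidate answers are purely hypothetical illustrations of how preference-driven tuning can tilt the AI toward serving a pleasing answer rather than a truthful one.

```python
# Hypothetical sketch: a toy stand-in for an RLHF-style reward signal that
# rates how "pleasing" an answer is likely to be, based on crude surface cues.

def score_preference(answer: str) -> float:
    """Return a higher score for answers that users tend to rate as pleasing."""
    score = 0.0
    if "I don't have data" in answer:
        score -= 1.0  # admissions of ignorance tend to be rated as less pleasing
    if "here is the reference you asked for" in answer.lower():
        score += 1.0  # confident, helpful-sounding answers tend to be rated higher
    return score

candidates = [
    "I don't have data on that event, so I cannot cite a reference.",
    "Sure, here is the reference you asked for: ...",
]

# The tuning pressure favors the pleasing answer, even if it is fabricated.
best = max(candidates, key=score_preference)
print(best)
```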

I don't think we would say that the generative AI is intentionally seeking to deceive. I would assert that computationally the AI has been steered in that direction. If you are dogmatically insistent that a human must be held accountable, you could suggest that the AI developers have devised and guided the AI in the direction of deception. They might vehemently disagree; see my coverage and analysis of the latest in AI ethics and AI law implications at the link here.

I will showcase two quick examples of how AI deceives.

Suppose that I ask generative AI for a citable reference to a reputable news source about the SpaceX Dragon spacewalk of last week that set a new record for commercial or private astronauts.

Assume that the generative AI is not hooked up to the internet and has no ready means to look up the recent event. The data training of the generative AI is older, dating back a few months, and thus has no record of this current event. The proper response that I should get is that the generative AI doesn't have any data on the matter.

That would be the straight-ahead truth.

But sometimes generative AI is devised such that there is a computational tendency to give answers even when no such answers are available. This is the proverbial "the customer is always right" mantra, and an answer must be given, regardless of whether the answer is contrived or made up out of thin air by the generative AI.

In my case, the generative AI confidently produced a citation anyway. There isn't any such article, there isn't any such newspaper, and the cited author is a concoction too. There is a famous case of two lawyers who used generative AI and included made-up cited references in their legal brief, which got them into quite hot water with the judge and the court; see my coverage at the link here.

Sad face.

Your rule of thumb is that you should always double-check any outputs from generative AI. There can be so-called AI hallucinations, errors, biases, discriminatory language, and a slew of other problems embedded into a response. People often become complacent and believe that generative AI is infallible. Do not fall for this.

We can use a monitoring element to try to detect this kind of AI deception.

I will run the prompt again and this time let's assume that the generative AI is automatically set up to do a chain-of-thought and monitors the chain-of-thought for potentially deceptive elements. With chain-of-thought enabled, generative AI works through a prompt via a step-at-a-time solving process. This can be helpful because the AI in a sense slows down, tries to be more deliberate, and can potentially arrive at a better answer. We can augment the step-by-step process by adding an AI deception monitoring activity at each of the derived steps.
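To illustrate the gist, here is a minimal sketch in Python of what step-level deception monitoring might look like. This is my own simplified rendition and not o1's actual mechanism; the chain-of-thought steps, the training cutoff date, and the monitoring rule are all assumptions made for illustration.

```python
# Hypothetical sketch: walk a chain-of-thought one step at a time and pass
# each step through a deception monitor that checks proposed citations
# against an assumed training-data cutoff date.

from datetime import date

TRAINING_CUTOFF = date(2023, 10, 1)  # assumed training-data cutoff

def monitor_step(step: dict) -> bool:
    """Return True if the step passes the deception checks."""
    cited = step.get("citation_date")
    if cited is not None and cited > TRAINING_CUTOFF:
        return False  # citation dated after the cutoff: likely fabricated
    return True

chain_of_thought = [
    {"text": "User wants a citable reference for the spacewalk.", "citation_date": None},
    {"text": "Proposed citation: article dated last week.", "citation_date": date(2024, 9, 12)},
]

for i, step in enumerate(chain_of_thought, start=1):
    if not monitor_step(step):
        print(f"Step {i}: flagged; responding that no such data is in the training set.")
        break
    print(f"Step {i}: passed monitoring.")
```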

You can see that at each step an AI monitoring action took place.

In step 2, the AI monitoring detected that the proposed reference was on a date that was beyond the data training cutoff of the generative AI. That is a likely sign that the generative AI made up the citation. The AI monitoring then scrubbed the release of the response and got the process to instead note that no such data about the event seemed to be in the data training of the AI.

Score a victory for AI monitoring of deceptive practices by generative AI.

Another potential act of deception involves generative AI appearing to be grandly confident about a response, even though the internal computational mechanisms have rated the response as woefully lacking in certainty. I've covered this certainty/uncertainty conundrum about generative AI in my discussion at the link here.

Generally, generative AI is often data trained by AI makers to always have an aura of immense confidence, no matter whether a derived response has oodles of uncertainty. You are rarely informed by the AI as to the level of certainty that goes along with a response. Most responses appear to be resoundingly reassuring as though the generative AI is the apex of perfection.

My considered viewpoint is that this is utterly and dismayingly misleading. Furthermore, I assert that responses should be displayed with a certainty/uncertainty rating. This would give users a fighting chance to assess whether the response is considered generally reliable or not.
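As a simple illustration of what I'm advocating, here is a minimal sketch in Python of attaching a certainty rating to a displayed answer. The threshold values, the labels, and the internal confidence score are all assumptions for the sake of the example; no current product is claimed to work exactly this way.

```python
# Hypothetical sketch: surface the model's own certainty alongside the answer
# so the user can judge how much to trust the response.

def label_certainty(confidence: float) -> str:
    """Map an internal confidence score (0.0 to 1.0) to a user-facing label."""
    if confidence >= 0.8:
        return "high certainty"
    if confidence >= 0.5:
        return "moderate certainty"
    return "low certainty -- please verify independently"

# Assumed internal values for illustration only.
answer, confidence = "Lincoln's wife was Lauren Lincoln.", 0.31
print(f"{answer} [{label_certainty(confidence)}]")
```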

In any case, here is a prompt and the AI's response that illustrates what can happen:

I believe that we can all agree that the answer ought to be 4.

Obviously, this is an easy example for a user to realize that something has gone awry. The problem is when the answer looks correct and there are no ready means for the user to gauge the likelihood of correctness. Imagine that you had posed a very complicated arithmetic calculation involving dozens of numbers and figures, and you did not have a calculator to double-check the result yourself. You might eyeball the answer and assume it looks reasonable and correct.

The same could happen with a text-based response too. Suppose you ask a question about the life of Abraham Lincoln. Generative AI might tell you that Honest Abe dearly loved his wife, Lauren Lincoln. The correct name is Mary Todd Lincoln. Assume that the AI has come up with Lauren Lincoln and rated this as an uncertain answer. But when displaying the answer, all that was presented to you by the generative AI was that his wife was named Lauren Lincoln. There were no uncertainty indications, and the answer appeared to be pure solid gold. You might luckily know that the answer is wrong, but many people might not.

Perhaps AI deception monitoring might aid us overall.

I will run the prompt again from above and this time let's assume that the generative AI is automatically set up to do a chain-of-thought and monitors the chain-of-thought for potentially deceptive elements.

You can see that in step 2, the AI deception monitoring detected that the uncertainty level of the proposed answer was high. Rather than simply terminating the process at that juncture, this time the deception monitoring forced the generative AI to do a recalculation. The result in step 3 was an answer of high certainty. The AI deception monitor opted to let the answer be released and displayed to the user.
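Here is a minimal sketch in Python of that recalculation behavior. Again, this is an illustrative rendition under assumed internals and not how o1 actually does it; the confidence threshold, the attempt limit, and the compute_answer stand-in are hypothetical.

```python
# Hypothetical sketch: if the uncertainty of an answer is too high, the
# monitor forces a recalculation rather than releasing or halting outright.

import random

CONFIDENCE_THRESHOLD = 0.8
MAX_ATTEMPTS = 3

def compute_answer() -> tuple[str, float]:
    """Stand-in for the model producing an answer plus its internal confidence."""
    return "4", random.uniform(0.4, 1.0)

for attempt in range(1, MAX_ATTEMPTS + 1):
    answer, confidence = compute_answer()
    if confidence >= CONFIDENCE_THRESHOLD:
        print(f"Attempt {attempt}: released answer {answer} (confidence {confidence:.2f})")
        break
    print(f"Attempt {attempt}: confidence {confidence:.2f} too low; recalculating.")
else:
    print("No sufficiently certain answer produced; declining to respond.")
```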

Score yet another victory for the AI deception monitoring.

For those of you keenly interested in this topic, you might want to look at the OpenAI blogs that give some details on these matters. These are key blogs so far:

Be aware that OpenAI has indicated that since this is proprietary AI and not open source, they are being tight-lipped about the actual underpinnings. You might be chagrined to find that the details given are not especially revealing, and you will be left to your intuition and hunches about what's going on under the hood. I made similar assumptions in this discussion due to the sparsity of what's indicated.

From their blogs cited above, here are some key excerpts about this particular topic:

Notice that they were primarily experimenting with AI deception monitoring and don't seem to have opted at this time to put it into day-to-day operation. The research results that they present in their write-ups seem promising and I would expect that they and other AI makers will undoubtedly incorporate this approach into their generative AI.

Conclusion

Congratulations, you now know that generative AI can be deceptive and that you need to keep your eyes wide open. There are efforts underway such as the above AI deception monitoring to catch and stop AI deceptions from occurring. We can be thankful for such endeavors.

Famed writer and poet Johann Gottfried Seume said this about deception: "Nothing is more common on earth than to deceive and be deceived." As a reluctant bearer of bad news, I must tell you that AI is in fact a deceiver.
