AI isn't as smart as you think


2023 was the year of ChatGPT. Everyone is using AI to solve problems, reduce labor, and try to make a quick buck. It's an amazing and extremely powerful technology, but at the same time it is super dumb.

We have always known that computers are really dumb and only as smart as the developers who program them. They are, however, extremely quick and excel at tasks that require repetitive operations. I can say this, and you probably already know it, but let me show you some examples of what I mean.

I love AI. It is a fascinating technology and extremely fun to work with. In the past, AI was mostly available only to those with endless amounts of money, but with the announcement of ChatGPT that all changed. Bleeding-edge AI is now available to everyone on the planet. It doesn't end with ChatGPT either; ChatGPT was just the catalyst for the AI arms race.

You can now run AI on your own machine, provided you have good enough hardware. By good enough, I mean something as simple as a typical gaming PC. The faster the better, but some AI models can run without a GPU at all.

I currently run around eight different Large Language Models (LLMs, like ChatGPT) on my local machine. Each has its own uses and performance characteristics. Some are better for chat, some are better at coding, and some are just larger. I actively use some of them for projects, and others I am just checking out to see if they are a suitable replacement. Open-source LLMs are changing rapidly, and hardly a day goes by without another one being announced. The problem, however, is that most of them are trained to perform very well on AI leaderboards, but in actual use they are fairly useless.

There are a few riddles and problems I like to use to see how well a model performs. While this isn't the be-all and end-all of tests, it does give you a glimpse of how well a model reasons. It isn't without a catch, which I will talk about later.

Here is a simple problem a typical human can reason through.

Claire has 6 brothers, and each of her brothers have two sisters. How many sisters does Claire have?

I think most people can figure this out; it's a pretty simple problem. Let's see how AI does.

ChatGPT (3.5 Turbo)

This answer is obviously wrong, but it's pretty close.

Bing Chat

Google Bard

Open Orca

DeepSeek 7B

ChatGPT 4

Here we have the correct answer, and rightly so, as ChatGPT 4 is a huge improvement over all the other models, including the default ChatGPT (which uses GPT-3.5 Turbo).

You can see that each model interprets the question differently, with only ChatGPT 4 giving the correct answer. This is a particularly challenging problem for AI: it requires logic that seems simple to a human, but for a system designed to predict the next token (roughly, the next word fragment) in a sentence it is a much more difficult task.
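For reference, the logic the riddle calls for can be spelled out in a few lines of Python. This is only a sketch, and it assumes everyone in the riddle is a full sibling (no half-sisters). The helper names `build_family` and `sisters_of` are made up for illustration.

```python
def build_family(num_brothers, sisters_per_brother):
    """Build a family where every brother has the same set of sisters."""
    family = {f"brother_{i}": "M" for i in range(1, num_brothers + 1)}
    family["Claire"] = "F"
    # Claire is one of each brother's sisters, so only add the remaining girls.
    for i in range(1, sisters_per_brother):
        family[f"sister_{i}"] = "F"
    return family


def sisters_of(name, family):
    """Return every girl in the family except the named person."""
    return [p for p, sex in family.items() if sex == "F" and p != name]


family = build_family(num_brothers=6, sisters_per_brother=2)
claire_sisters = sisters_of("Claire", family)

print(len(sisters_of("brother_1", family)))  # 2: Claire plus one other girl
print(len(claire_sisters))                   # 1
```

The key step is realizing that Claire herself counts as one of her brothers' two sisters, which is exactly the step the models tend to miss.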

I mentioned above there is a catch with this test, and I'm going to get into that. This specific riddle is one that was made famous by someone on Reddit who posed this challenge a while ago. He did something similar, using it to test multiple models. Over time, two things were discovered.

First, he noticed that models started to solve the problem, but not authentically: they were just regurgitating training data. He discovered that his riddle was being used to train large language models so they could solve it, along with many others. As a result, certain models recognize the problem and spit back an answer as if reciting multiplication tables. The model doesn't actually do the thinking, it just "knows". This was demonstrated by using a different name than the original riddle, as I did in my example. In its explanation of how it solves the riddle, the AI spits back the original name used by the Redditor, which was never given in the context. In my case I named the sister Claire, and the AI would initially say Claire has x sisters, but when asked to break it down step by step, the last step would refer to the name from the original riddle.
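A rough way to spot this kind of leak is to scan the model's explanation for names that were never in the prompt. The sketch below assumes you already have the response as a string. The function name `leaked_names` is hypothetical, and "Alice" is only a placeholder, not the name from the original riddle.

```python
import re


def leaked_names(response, prompt_name, candidate_names):
    """Return candidate names found in the response that were not in the prompt."""
    leaked = []
    for name in candidate_names:
        if name.lower() == prompt_name.lower():
            continue  # the name we actually used is expected to appear
        # Whole-word, case-insensitive match to avoid partial hits.
        if re.search(rf"\b{re.escape(name)}\b", response, flags=re.IGNORECASE):
            leaked.append(name)
    return leaked


# Made-up response text, for illustration only.
response = (
    "Step 1: Claire has 6 brothers. "
    "Step 2: Each brother has 2 sisters. "
    "Step 3: So Alice has 1 sister."
)
print(leaked_names(response, "Claire", ["Alice", "Claire"]))  # ['Alice']
```

A hit doesn't prove the model memorized the riddle, but a name that appears from nowhere is a strong hint.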

The second discovery is that you can guide an AI to solve this problem, sometimes by simply asking it to work step by step. This does not always change the outcome, but in some cases it did.
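If you want to try both ideas yourself, a small prompt builder makes it easy to swap the name and toggle the step-by-step nudge. This is a minimal sketch: `build_prompt` and the exact wording of the nudge are my own assumptions, numbers are written as digits rather than words, and sending the prompt to a model is left out.

```python
# Template based on the riddle above, with the name and numbers as placeholders.
RIDDLE = (
    "{name} has {brothers} brothers, and each of her brothers have "
    "{sisters} sisters. How many sisters does {name} have?"
)


def build_prompt(name="Claire", brothers=6, sisters=2, step_by_step=False):
    """Fill in the riddle and optionally ask for step-by-step reasoning."""
    prompt = RIDDLE.format(name=name, brothers=brothers, sisters=sisters)
    if step_by_step:
        prompt += " Work through it step by step before giving the final answer."
    return prompt


# Try the same riddle with different names, with and without the nudge.
for name in ["Claire", "Dana"]:
    print(build_prompt(name))
    print(build_prompt(name, step_by_step=True))
```

Running the same model against each variant shows whether the answer depends on the name or on the nudge.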

AI models are changing rapidly, and one of the things that differentiates a model is how it was trained and what data was used to do it. Many open-source models are fine-tuned on specific data sets to improve their performance on specific tasks. A good example of this is the myriad of programming-tuned models, most of which only perform well on Python. The amount of work and hardware needed to train a model is high, and in some cases a single training run costs millions of dollars.

I'll leave you with another interesting riddle that typically trips up AI.





42 comments
avatar

Your insights into the dynamic landscape of AI, especially with the rise of accessible models like ChatGPT, provide a fascinating perspective. The demonstration with the sister riddle illustrates both the progress and the challenges in AI comprehension. It's intriguing to see the variations in responses across models, emphasizing the evolving nature of these technologies. The revelations about models regurgitating data for specific challenges and the potential for guided problem solving shed light on the complexities of AI training.

avatar

AI is one of man’s greatest discoveries, but also a great distraction for young folk, leaving them to rely on it too much.

avatar

ChatGPT 4 says that the answer is two sisters, but isn't the correct answer to the question posed ("How many sisters does Claire have?") just one?

The model gets its explanation right, but the answer to the direct question at the end of the riddle is incorrect.

avatar

Yes, technically it is wrong, but only in how it expresses the answer; the logic is accurate.

It correctly realizes there are two sisters in total and that Claire is one of them, so it comes to the correct conclusion, but it inaccurately answers the direct question, "How many sisters does Claire have?"

This is another example of how AI isn't as smart as people give it credit for, and why AI isn't going to replace developers any time soon. You need to be a pretty good developer to take advantage of AI for development, as it frequently makes mistakes and you have to know what you are doing to identify and correct them.

avatar

I edited my question for clarity just before you replied, but yeah. I am of the opinion that genuine Artificial Intelligence doesn't exist!

avatar

There are three widely accepted stages of AI development.

ANI
AGI
ASI

ANI - Artificial Narrow Intelligence. This is where we are. If I teach a computer to play Chess, I can't expect it to be able to play Checkers. ANI learns how to do one task very well, based on how well you teach it.

AGI - Artificial General Intelligence. This is way in the future, although many claim we are getting close. If I teach an AGI how to be a lawyer, I can expect it to know how to be a doctor. AGI can do most things well, though not necessarily as well as a human, and you don't have to train it on everything individually.

ASI - Artificial Super Intelligence. This is the stage many believe may result in the end of the world as we know it. Here you no longer need to train the AI; it is capable of learning on its own, at a pace that dwarfs a human's. Think of how quickly a straight-A student can learn advanced calculus compared to how quickly a high school dropout would learn the same subject. This is what is referred to as a snowball effect: once an AI gets 5x smarter than humans, it will quickly get 10x smarter, 100x smarter, and ultimately 1M times smarter.

avatar

The development of ASI would mark the start of the 'technological singularity,' as it's called?

avatar

Thanks for sharing these comparisons... AI is very good at working with absolutely huge amounts of data, and finding patterns, etc... but it cannot actually think or solve problems. I think people keep forgetting that it's just a tool and not, you know, a god.

I do find it weird that not a lot of people seem to be too upset with the energy usage of a lot of LLMs, maybe it's not super common knowledge?

avatar

> I do find it weird that not a lot of people seem to be too upset with the energy usage of a lot of LLMs, maybe it's not super common knowledge?

The energy used to run a model is tiny compared with training it. An AI model goes through two phases:

  • Training
  • Usage

Training is extremely time-consuming and costly, but once that's done you can run many models on a Raspberry Pi to solve real problems. In the end, the final result is just a bunch of weights (numbers) that determine how specific inputs flow through the neural network.
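To make the "just a bunch of numbers" point concrete, here is a toy sketch of what running a trained model looks like: a few multiplications and additions over fixed weights. The weights below are made up for illustration and the network is far smaller than any real model.

```python
import math

# Made-up "trained" weights for a tiny 2-input, 2-hidden, 1-output network.
W1 = [[0.5, -0.2], [0.1, 0.8]]  # one row of input weights per hidden unit
B1 = [0.0, 0.1]                 # hidden-layer biases
W2 = [1.0, -1.0]                # output weights, one per hidden unit
B2 = 0.05                       # output bias


def relu(x):
    return max(0.0, x)


def forward(inputs):
    """Run one inference pass: no training, just arithmetic on the weights."""
    hidden = [
        relu(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(W1, B1)
    ]
    z = sum(w * h for w, h in zip(W2, hidden)) + B2
    return 1.0 / (1.0 + math.exp(-z))  # squash the score into the range (0, 1)


print(forward([1.0, 2.0]))
```

Real models have billions of weights instead of eight, but inference is the same kind of arithmetic, which is why small models can run on modest hardware.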

The amount of energy AI can save in the long run is astronomical. Imagine solving cancer, developing new energy solutions, making tedious tasks far more efficient, taking over the world with robots.

avatar

Thanks for that differentiation. I didn't realize the training only happened in the initial phase; I thought LLMs were continuously being trained on the latest available data. I'll have to look into it more. Thanks.

avatar

They do get updated, but it isn't something that happens in real time, like sending create and update operations to a database. The model needs to be retrained, and then the new model is deployed. Training these models costs millions of dollars and can take months, so you won't see frequent updates until the technology improves.

avatar

I’m sooo looking forward to AGI. Now that will get many people bamboozled

avatar

The problem lies in how you ask the question. Use the following structure to help you come up with an accurate reasoning:

Claire has 6 brothers and each of her brothers have two sisters. How many sisters does Claire have?

The logical connective "and" without a comma establishes the relation that both premises must be given.



Bing Chat with GPT-4

avatar

Yes, prompt engineering is an entire field of its own right now. A big part of working with AI is figuring out the best prompts, and this varies from model to model.

As a human, I don't have to play around with how I ask questions like that.

avatar

The real problem is that the common user does NOT know how to ask the question. Logic, mathematics, and the like have an exact way of wording things. A misplaced "as", an accent, or even a grammatical error can confuse any human.

If I were to ask you a question like this: "How many bricks does it take to complete a building made of bricks?", neither you nor the AI could give me the answer, and neither could any engineer. Solving this question requires data, specifically the dimensions of the bricks and the dimensions of the building. The way we formulate the question, the information provided, the logic, and the structure of the sentence matter to anyone, human or not.

avatar

> If I were to ask you a question like this: "How many bricks does it take to complete a building made of bricks?", neither you nor the AI can give me the answer and neither can any engineer. To solve this question requires data, specifically dimensions of the bricks and dimensions of the building... The way we formulate the question, the information provided, the logic and the structure of the sentence, count for anyone, human or not.

This, though, is a perfect question for Google to ask in an interview to test how you solve problems. Questions like "How many golf balls can you fit in a limo?" are good examples of this type of question.

The question I asked, though, is easily solved by a human, but AI models just miss the point.

avatar

My speciality is Numerical Analysis (abstract thinking), and one of the problems with common language is that we expect to be understood no matter what. Logical language, on the other hand, does not tend to confuse; its structures are exact.

avatar

Great examples, and I agree, having started to use it more and work with it. It became very clear how legit "dumb" AI is and how it's pretty much just an algo built on scraped data that everyone else has already produced.

You legit have to train people how to write prompts to even correctly interact with the things lol.

What is cool, though, is the new way people are going about it. Before, everyone coded "in this case do A, and in this case do B." Now it's coded in a way that the script tries to figure it out and then learns and adapts. It's actually pretty awesome what they are doing with some machines.

avatar

AI still seems to be little more than ever-increasing amounts of data (albeit cleaned up a bit with labeling) and more compute massed together.

avatar

I dunno, I like Bard's whole 'it depends' shitlordery. The brothers could indeed have half sisters XD


It's interesting to me as a musician to see how utterly useless these AIs are with the subject. They very quickly get confused by a lot of basic requests that aren't even riddles but just, like, chord progressions.

avatar

> It's interesting to me as a musician to see how utterly useless these AI are with the subject. It very quickly gets confused with a lot of basic requests that aren't even riddles but just like, chord progressions

You can fine-tune them to greatly improve them, and prompt engineering is a real science. You can drastically improve your results if you know how to ask properly.

avatar

Yeah the whole art of prompting is beyond me based on my attempts in the past! I'd enjoy a music-specific bot though, for whatever purpose

avatar

It was also funny to me how Bard went "this is a big family and who knows how all of those people are related" 😄

avatar

On the 27th of September last year (my birthday) I asked ChatGPT, "When is my birthday?" Of course it answered that there was no way it could know that. So I gave it a hint: when are the birthdays of Google and of Cosme Damian Churruca (a Spanish admiral from over two centuries ago)? It answered that, told me that given that information my birthday was on September 27, and said it would congratulate me on that day. I told it that it was already the 27th, and it apologized to me. So I think ChatGPT is very smart but lost in time.

avatar

Almost every day there is a new announcement related to AI, and it's very hard to keep up. I do like the assistants that you get in the OpenAI playground; they help so much with my day job.

Fantastic cover image, was that done in Midjourney? Oh, and if you like to watch TV shows about AI, the best one ever is called Beacon 23, with very deep character development and an interesting storyline. (Watched it twice now.)

avatar

Ah, yeah. This reminds me of the Loebner Prize, where chatbots are tested by a jury that tries to expose them as non-human. They ask them these kinds of silly questions. I understand their motivation, but I also know that the AIs you have shown are not pretending to be human, so I don't expect them to be perfect.

Maybe these puzzles are part of linear reasoning, which has been a challenge for language models for a long time, as I learned from a TED-Ed talk. But as you said, they are improving.

I also notice that some models give very formal answers to what humans would consider simple riddles. I know they have limitations, but I also think that the riddle is not a pure logic problem. It is just a riddle. In logic, you need to ensure that your propositions are valid. I wonder if the AI could do better if the riddle was rephrased and broken down into valid propositions.

This also makes me think of the cognitive development puzzles that Piaget studied and that educators use with children under 10. I guess they don't mock them for thinking that a pile of coins has more than a stack, even though they have the same number.

I see these AI issues as fun and interesting examples that can help us find areas for improvement. But if I can't quote a rigorous study showing that language models struggle with this type of problem, I would be cautious about judging them. Maybe I am ignoring hundreds of riddles where the models do well and only focusing on their failures. But if I clarify that this is just an observation and not a generalization, I think it is fine.

Good read. Congrats!

avatar

Thanks for your contribution to the STEMsocial community. Feel free to join us on Discord to get to know the rest of us!

Please consider delegating to the @stemsocial account (85% of the curation rewards are returned).

You may also include @stemsocial as a beneficiary of the rewards of this post to get stronger support.

avatar

The hamburger was eaten and turned into poop. So eventually in the sewers, transcended into a divine deity?

avatar

I think we will see these engines get much better in the next few years. AI research has gone on for ages with Turing doing some of the early work, but access to lots of processing power is accelerating things. I see lots of people saying they are 'dumb', but they can do a lot. They just lack that 'common sense' element that even some people struggle with. I would expect that people are developing knowledge bases of certain fundamental facts that the engines can use, so they don't give people too many fingers/noses/eyes in generated images.

This technology will create lots of challenges. We're already seeing it used to generate posts and replies here, with PeakD integrating it. It is a revolution and there will be casualties. I just hope that a lot of good can come out of it too.

avatar

Do you mind if I steal some of these illustrations and examples for a library presentation on AI which I have been developing?

avatar

I think AI is learning from us while we use it for so many purposes. It won't learn humanity, though. Can you check out my latest article? Thanks if you have 2 minutes to give some feedback with your opinion.

avatar

Wait until these LLMs start being trained on their own output, gonna be like the models on DMT.
