
AIEEEE! OK sometimes we are a debate club. Ye olde AI discussion thread

AI is

  • Going to be the death of humanity

    Votes: 1 14.3%
  • Or at least the death of our current economic system

    Votes: 3 42.9%
  • The dawn of the age of Superabundance

    Votes: 0 0.0%
  • Stop The World, I Want to Get Off

    Votes: 3 42.9%

Total voters: 7
The guy is consistent in his belief; I just thought this video was a good example because I generally see it the same way, at least the overall trend. I'm just not a fan of cloud stuff, I'd rather have everything local and independent, with the cloud as backup, but that keeps getting riskier.

But I've definitely shifted more toward talking to AI about stuff than reading it up, especially for science topics. At least for the initial reference. If it's important enough for me to check, I can always follow the links to the original sources.

That's true of everything. You can read a book that has wrong information, you may remember something incorrectly, or you may have misunderstood something and end up believing something false.

Actually, AI has an advantage here. Because it siphons up so much info, it can a) iron out or dismiss anomalies, or b) spot the discrepancies and tell you about them.

I do like that when talking to an LLM, it's often inclined to give me multiple perspectives, not just a single one as humans tend to do.

But training an AI is essentially akin to lossy information compression. It is inherently not guaranteed to be accurate. But I don't hold that against it. If I need to store information accurately, that's what old-fashioned storage systems are for.

Although I wonder how that's gonna develop. Will we still need to store pictures in image formats if AI is able to recreate them from memory? And is that much different in principle from using all the image tools and filters to make photos nicer?

How about video? That takes a ton of data. But if you have locally stored visual information about the actors and all the stuff, maybe all we need is some simple description of how things move about, and we stream/store that.

Well, so are humans. Not really sure why people expect AI to be 'perfect', I think it's pretty cool if it can decrease the human error rate, and that keeps improving. See self-driving cars. It can never be perfect, but nothing is.

Yea, always taking an output from an AI at face value is risky. But that's always been the case. In recent years/decades, it seems we're more and more conditioned to just take anything in without thinking about it: news, religion, science, political speech, everything is presented as facts not to be questioned. Then drop the authoritative-sounding AI into the mix. No wonder, then, that people don't want to think about its outputs.
I'm not anti-AI, and I think that's the way systems are going to go simply because of cost. But I still think there is value in knowing what underlying data it based its decisions on, and there is none of that. That's why books always had fucking footnotes and bibliographies, so you could see where they got their data from.

And AI has offered none of that to date, just a summary. fuck that.
 
How do you think Google decides what search results to show you first or what YT videos to recommend... It's the same problem, big tech deciding what you should know and how to think.


It's not just that. AI isn't actually a database. It's more like a ridiculously complex algorithm where it's difficult to parse what exactly it's doing.

But again, it's made by humans, and humans get to decide what biases it should have or what kind of errors are acceptable, even if they catch them.
It's pulling off a database; the ridiculously complex algorithm is just about which data it chooses from that database and why. And of course companies are now gimmicking that algorithm if they aren't getting the answer they want. And that is REALLY troubling.
 
And AI has offered none of that to date, just a summary. fuck that.
You can ask it to elaborate on details.

From the cloud LLMs, I mostly use Perplexity, and that always provides links where it's pulling information from. Tho I think most AIs do that these days.

And when coding, it can quote the specific piece of code it's referring to.

Yea if you just take the first output you get, it can be hit or miss, but you can always talk to it to get more info. It also helps because if it's not actually sure, when asked to elaborate, it tends to trip up in obvious ways.

Similarly, introducing the problem first helps it understand what you want. That's technically pre-prompting, but it's the same thing as when you ask a person to do something; you may need to explain what you want first before telling them the task.

Idk, maybe it's just me. Since I work with models a lot, I'm used to spotting when they talk out of their ass. It's like they have a tell. But even if you're not used to that, you can always check other sources and use the AI as a starting point.

Btw funny story...

The other day, I was doing something with one of the GPTs, and it told me to compare hashes, saying it should be some specific string 'or similar'.

The fuck you mean, 'or similar'? It's a hash, either it matches or it doesn't lol. But that's one of the things they tend to do. If they don't know or are borked, they start adding such qualifiers. So even someone who doesn't know what a hash is, may realise it's a red flag. For the time being, at least.
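To illustrate the point, using Python's standard hashlib (the file contents here are made up for the example): a hash comparison is an exact string match, and changing even one byte of the input flips the whole digest.

```python
import hashlib

# SHA-256 digests of two inputs that differ by a single character.
a = hashlib.sha256(b"release-1.0.tar.gz contents").hexdigest()
b = hashlib.sha256(b"release-1.0.tar.gz content").hexdigest()

# A hash either matches exactly or it doesn't -- there is no 'similar'.
print(a == a)  # True: identical input, identical digest
print(a == b)  # False: one byte off, completely different digest
```

So "it should be this string or similar" can never be a correct statement about a hash check.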

But it's already an improvement over a few years ago, when they didn't do that and just kept going with full authority even if they didn't know what they were saying.

Still best to check the important parts yourself tho.
 
It's pulling off a database, the ridiculously complex algorithm is just about which data it chooses from that database and why.
Not really. The original data it's trained on may be a database or whatever, but once it's compiled into the payload (LLM or some other style), that data is melted into the neural network.

If you train an LLM on the Lord of the Rings books, you're not actually gonna find the full text of LOTR anywhere when you try to disassemble it. The only way to recreate it is to run the algorithm so that it outputs the original text. But the output may be accurate to different degrees depending on the training.

It's something like lossy compression, or turning a bitmap image into a vector image. The data is stored as a mathematical representation that's supposed to be similar enough to the original. And it really is an algorithm in the sense that with the same sets of inputs, you get the same output.

But in the case of LLMs, the original data is almost irrelevant, because it's just a word soup. The important part is all the steps to get the 'right' output, and that's where all the biases come in, because it's the humans making the general decisions on how the AI should behave.

So yes, it's about who chooses what you get out of it, but no, it's not a database unless you hook one up as an external source. In which case, the AI needs to have some interface to work with it.
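A toy sketch of that lossy-compression idea (everything here, the STEP size and the sample values, is made up for illustration, and real model training is nothing this simple):

```python
# Toy 'lossy compression': quantize samples to a coarse grid.
# Like an LLM's weights, the compressed form holds an approximation
# of the data, not the data itself -- decoding is deterministic
# (same input, same output), but the original is not exactly recoverable.

STEP = 0.25  # grid coarseness; bigger step = lossier

def compress(samples):
    return [round(x / STEP) for x in samples]

def decompress(codes):
    return [c * STEP for c in codes]

original = [0.1, 0.93, 0.52]
restored = decompress(compress(original))
print(restored)              # [0.0, 1.0, 0.5] -- close, but not the original
print(restored == original)  # False
```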
 
Not really. The original data it's trained on may be a database or whatever, but once it's compiled into the payload (LLM or some other style), that data is melted into the neural network.
If you take the human interpretation of that, it's sort of like a doctor who makes diagnoses based on his training, which came from essentially a database of medical knowledge at the time he went to school. How good his diagnoses are depends on how well he retained that knowledge, how good the knowledge actually was, and how much the data has changed since medical school if he isn't updating his training.

So no, a human doctor isn't a database, but it's really hard to separate them from one. If the data they were trained on had flaws, so do they.

The difference between an AI doctor and a human one for diagnoses is cost: while they both have training expenses, the human doctor can only see one patient at a time and has to be replaced every 40 years or so. The AI doctor can diagnose as many patients as needed; humans only need to run the tests to give it the results it needs.

But you know, AI is on considerably more solid ground there, and I suspect that will be one of the better applications of it because it is dealing with what is, and not making a prediction.

On the database issue though, both systems (human and AI) need continuing training, because the available data changes all the time. An AI doctor trained before 2019 would have been worth shit in diagnosing COVID for example.

I don't think separating out AI from the underlying data is going to be possible, because new agents are going to have to constantly be trained, and I think you see that going on now with constant new releases.

It doesn't matter how you store it, LLM or whatever; what it spits out will be based on what it was fed. GIGO
 
So no, a human doctor isn't a database, but it's really hard to separate them from one. If the data they were trained on had flaws, so do they.
I guess, but that's not really a point against AI, it's just the uncertainty of everything.

Aside from that, we have more and less reliable or authoritative sources, and that is taken into consideration during training. An official medical manual, well-reviewed science papers, or articles by recognised experts are rated higher than a random reddit post, so a well-trained model is less likely to be influenced by some fringe theory. This is one of the things that keeps improving the most as people figure out new training techniques.

Funnily enough, most training these days is already done mostly by other AIs, so the risk isn't so much garbage data as some older influence/bias being propagated all the way into the future if it's not caught at some point. It's like the classic 'trusting trust' compiler problem: if some old compiler back in the 60's had a bug or backdoor, it could still be present in compilers today, since compilers are compiled at least partially by older compilers. In the same fashion, if a trainer AI has a bias for whatever reason, it will teach it to a trainee AI, which may then be used to train future ones.
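A toy illustration of how that propagation works (the names buggy_teacher and train_student are invented, and real distillation is vastly more complex than memorizing labels):

```python
# Toy sketch of a bias surviving AI-trains-AI generations.
# The 'teacher' labels numbers as even/odd but has a bug: it calls 0 odd.
# Each 'student' just memorizes its teacher's labels, then becomes the
# next teacher -- so the bug survives every generation unexamined.

def buggy_teacher(n):
    if n == 0:
        return "odd"  # the original flaw
    return "even" if n % 2 == 0 else "odd"

def train_student(teacher, data):
    memory = {n: teacher(n) for n in data}  # learn from the teacher's outputs
    return lambda n: memory[n]

data = range(10)
model = buggy_teacher
for generation in range(3):  # three rounds of AI-trained-AI
    model = train_student(model, data)

print(model(0))  # 'odd' -- the generation-0 flaw is still there
print(model(4))  # 'even'
```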

The difference between an AI doctor for diagnoses and a human one is the cost, while they both have training expense, the human doctor can only see one patient at a time, and has to be replaced every 40 years or so.
AI has another advantage, that any new knowledge can be very quickly distributed to all the AI doctors. Although that comes with risks too. Push a bad update and all the cancer patients or people around self-driving cars in the world have a bad day.

I wonder how this will be dealt with. Looking at how the industry works, probably "fuck it, YOLO".
On the database issue though, both systems (human and AI) need continuing training, because the available data changes all the time. An AI doctor trained before 2019 would have been worth shit in diagnosing COVID for example.
Hard to say. AI is actually best at prediction, because the current paradigm is all based on recognising and generalising patterns, so it would probably have been quite helpful with that.

That's why AlphaFold managed to figure out so many protein folding problems: by predicting. But obv prediction is inherently inexact, so it may still be necessary to check whether the structures actually turned out that way.
I don't think separating out AI from the underlying data is going to be possible, because new agents are going to have to constantly be trained, and I think you see that going on now with constant new releases.

It doesn't matters how you store it, LLM or whatever, what it spits out will be based on what it was fed. GIGO
The new models that get released differ more in finetuning and post-training than in the input data. Almost all available input data has already been sampled, so there isn't much else to go on, and after some point it stops making much difference. OpenAI found that out the hard way when they spent billions training GPT5 on many times as much data as GPT4, only to find it wasn't much better.

Nowadays it makes much more of a difference to make the models better and smarter even with the same original data. That's what we want from AI after all.

At least that goes for text and images, where there isn't much else to feed into the training. Other domains are still just crunching through the available material, which is why we keep seeing new AI generators and editors for other fields popping up.
 