FYI: Galactica: A Large Language Model for Science

alex.shkotin

Nov 23, 2022, 12:32:20 PM

to ontolog-forum

https://galactica.org/static/paper.pdf

https://galactica.org/explore/

It would be great to compare it later with Knowledge concentrator.

alex.shkotin

Nov 26, 2022, 3:26:43 AM

to ontolog-forum

Pragmatic:

"Language Models can Hallucinate. There are no guarantees for truthful or reliable output from language models, even large ones trained on high-quality data like Galactica. NEVER FOLLOW ADVICE FROM A LANGUAGE MODEL WITHOUT VERIFICATION."

https://galactica.org/mission/

Wednesday, November 23, 2022 at 20:32:20 UTC+3, alex.shkotin:

Azamat Abdoullaev

Nov 26, 2022, 5:50:13 AM

to ontolo...@googlegroups.com

Alex wrote:

Language Models can Hallucinate. There are no guarantees for truthful or reliable output from language models, even large ones trained on high-quality data like Galactica. NEVER FOLLOW ADVICE FROM A LANGUAGE MODEL WITHOUT VERIFICATION.

As I mentioned before, all LLMs are "stochastic parrots", ideal for mindlessly spitting out biases and nonsense. Like all statistical learning software trained on gigabytes, terabytes, or petabytes of data, they are dull and dumb by design.

Galactica is an LLM for science, "trained on 48 million examples of scientific articles, websites, textbooks, lecture notes, and encyclopedias". In the company’s words, Galactica “can summarize academic papers, solve math problems, generate Wiki articles, write scientific code, annotate molecules and proteins, and more.”

Its demo failed. That was not entirely the fault of Meta's team. LeCun, Meta's chief AI scientist, defended it to the last. On the day the model was released, LeCun tweeted: “Type a text and Galactica will generate a paper with relevant references, formulas, and everything.” Three days later, he tweeted: “Galactica demo is off line for now. It’s no longer possible to have some fun by casually misusing it. Happy?” The same happened with Microsoft's bot on Twitter.

The weakest point of all LLMs is the sheer lack of world models; they have only language models. It is the same with human intelligence: without a worldview, the core of sentience, consciousness, awareness, and conscience or morality, you are a living zombie or a soldier. In other words, such ML models are not aware of what they do, read, compose, translate, transcribe, recognize, drive, etc. Everything is performed automatically, mechanically, and mindlessly, without sentience, knowledge, or self-knowledge.

Again, LeCun is aware of this and now states that without controllable, predictable world models there is no true path to autonomous machine intelligence. This is why he is absent from the article you mentioned.

Another weak point is the lack of a mathematical model/theory of intelligence in terms of reality, interaction and data.

So they have a lot of money rather than a lot of mind, or a theoretical foundation for real-world AI/ML systems.

Then why do they do it? Meta does it because it can afford to ... until it goes down.

https://www.linkedin.com/pulse/mathematical-theory-intelligence-azamat-abdoullaev/?published=t

--
All contributions to this forum are covered by an open-source license.
For information about the wiki, the license, and how to subscribe or
unsubscribe to the forum, see http://ontologforum.org/info/

John F Sowa

Nov 26, 2022, 11:20:32 PM

to ontolo...@googlegroups.com

Alex and Azamat,

I agree with Alex. Language models can produce good results for many kinds of machine translation because every phrase is tied to a corresponding phrase in the source language. But those methods can fail on several kinds of highly important technical texts.

In the following examples, subject-matter knowledge matters more to a human translator than native experience in the source language. These are also areas in which even the best MT systems are useless:

1. Scientific texts that contain large numbers of rare words and symbols, such as the names of new chemical compounds, organic molecules, and drugs. The probabilities of those symbols are so low that even language models trained on huge numbers of words from the same branch of science cannot make reliable translations. A chemist can distinguish the name of a new chemical from a misspelling of an old chemical. But an MT system that does not understand chemistry cannot.

2. Texts on financial transactions, which have highly specialized technical terms that have low probabilities and require absolute precision.

3. Texts on legal terminology, especially for international organizations such as the UN and EU. International treaties have specialized terms with precise definitions for which the slightest error could cause an international conflict. Furthermore, many of those international treaties may involve highly specialized terminology for navigation, mineral rights, geography, etc. Any errors could be a disaster.

4. Scientific, engineering, and business innovation. New inventions, scientific discoveries, or new business products can introduce new terminology or use old terms in new senses. Every new publication or patent introduces new terminology in new patterns, so language models become obsolete or misleading for the new subject matter.
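
To make point 1 concrete, here is a minimal sketch of a purely frequency-based corrector. It is not how any real MT system works; the toy corpus, its counts, and the names `COUNTS`, `edits1`, and `correct` are all assumptions made up for illustration. It shows only the general failure mode: a term missing from the training data is silently mapped to the nearest frequent term.

```python
from collections import Counter

# Hypothetical toy corpus; the words and counts are invented for illustration.
CORPUS = "ethanol methanol ethanol propanol ethanol methanol".split()
COUNTS = Counter(CORPUS)

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """Return all strings one delete, transpose, replace, or insert away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in ALPHABET]
    inserts = [a + c + b for a, b in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def correct(word):
    """Keep known words; otherwise pick the most frequent known neighbour."""
    if word in COUNTS:
        return word
    candidates = [w for w in edits1(word) if w in COUNTS]
    if not candidates:
        return word  # nothing similar is known, so leave the word alone
    return max(candidates, key=lambda w: COUNTS[w])


# "ethanal" is a distinct compound, but it is absent from the toy corpus,
# so the frequency-based corrector treats it as a misspelling of "ethanol".
print(correct("ethanal"))
```

A chemist knows that the two names refer to different substances; the sketch has only counts and string similarity to go on, so it cannot tell a new name from a typo.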

Finally, the worst MT systems are those that are so good for familiar texts that people fail to recognize the kinds of texts for which they can fail in dangerous, catastrophic, or extremely expensive ways.

John

_________________________________

From: "Azamat Abdoullaev" <ontop...@gmail.com>

Alex wrote:

Language Models can Hallucinate. There are no guarantees for truthful or reliable output from language models, even large ones trained on high-quality data like Galactica. NEVER FOLLOW ADVICE FROM A LANGUAGE MODEL WITHOUT VERIFICATION.