So in summary, decoding techniques like temperature sampling, beam search, and top-k/top-p (nucleus) sampling allow LLMs to generate varied outputs by sampling from, or tracking, multiple possible token sequences. The core model still predicts one token at a time, but these techniques introduce diversity.
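To make the idea concrete, here is a minimal sketch of top-k followed by top-p (nucleus) filtering over a vector of logits. The function name, parameters, and NumPy implementation are illustrative, not any particular library's API.

```python
import numpy as np

def sample_next_token(logits, k=50, p=0.9, temperature=1.0, rng=None):
    """Sample one token id from logits using top-k, then top-p (nucleus) filtering."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64) / temperature
    # Softmax to turn logits into probabilities.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # Top-k: keep only the k most probable token ids.
    order = np.argsort(probs)[::-1][:k]
    # Top-p: within those, keep the smallest prefix whose cumulative mass reaches p.
    cum = np.cumsum(probs[order])
    cutoff = np.searchsorted(cum, p) + 1
    keep = order[:cutoff]
    # Renormalize over the surviving tokens and sample.
    kept = probs[keep] / probs[keep].sum()
    return int(rng.choice(keep, p=kept))
```

With k=1 or a very small p this collapses to greedy decoding; loosening either parameter admits lower-probability tokens and increases output diversity.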
DONE
Very interesting as a first step.
Alex
That's an interesting way to conceptualize it. If we consider the core neural network model that produces token probabilities as just one component of a larger LLM system, some other key components would be:
Vocabulary - This defines the set of discrete tokens that the model can output. Schemes like BPE and WordPiece break words down into subword units.
Tokenizer - This preprocesses text into numeric token ids for feeding into the model. It converts between raw text and model tokens.
Memory - LLMs need a way to retain context across the sequence. Mechanisms like transformer attention over the context window act as a working memory.
Sampling algorithms - As we discussed, techniques like nucleus sampling introduce variation into the outputs.
Beam search - Maintains multiple candidate token sequences and prunes low probability ones.
Length control - Components to determine how many tokens should be generated.
Post-processing - Things like detokenization to convert the model's tokens back into proper readable text.
So in summary, the core neural network that predicts tokens is one key component, but it interacts with various other components for tasks like preprocessing, sampling, length control, and formatting to enable practical text generation. The full LLM system comprises multiple moving parts!
DONE--
All contributions to this forum are covered by an open-source license.
For information about the wiki, the license, and how to subscribe or
unsubscribe to the forum, see http://ontologforum.org/info/
---
You received this message because you are subscribed to the Google Groups "ontolog-forum" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ontolog-foru...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ontolog-forum/b1d7f1e49c5d40a3b41a14d77d3edd9d%40bestweb.net.