LLMs, K8s, and OpenNARS


Ryan Peach
Mar 11, 2023, 1:01:43 AM
to open-nars
Hey everyone, long time no see.

Since I last contributed I have become a pretty successful cloud engineer, with an ML MS, working for engineering companies like Ansys and Keysight.

When GPT-3 and ChatGPT came out, I was excited thinking back on this project, because I quickly realized you could fine-tune a foundation model to translate natural language into NARS statements. Has anyone tried this?
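As a rough illustration of what such a fine-tuning dataset could look like (the English/Narsese pairs and the prompt/completion schema below are invented for illustration, not taken from any existing dataset):

```python
import json

# Hypothetical English -> Narsese pairs. The Narsese follows NAL syntax
# ("<subject --> predicate>." for judgments, "?" for questions), but the
# pairs themselves are made up for this sketch.
PAIRS = [
    ("A cat is an animal.", "<cat --> animal>."),
    ("Is a cat an animal?", "<cat --> animal>?"),
    ("Tweety is a bird.", "<{Tweety} --> bird>."),
]

def to_finetune_records(pairs):
    """Convert (english, narsese) pairs into prompt/completion records
    of the kind most fine-tuning APIs accept."""
    return [
        {"prompt": f"Translate to Narsese: {en}", "completion": nal}
        for en, nal in pairs
    ]

# Emit one JSON record per line (the common JSONL fine-tuning format).
records = to_finetune_records(PAIRS)
with open("nl_to_narsese.jsonl", "w") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")
```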

As I learned Kubernetes, I also realized this: if you hosted OpenNARS on a server and used the bag-based approach to knowledge processing described in the book, you could have a highly horizontally auto-scalable system. It would just need a performant database and a stateless server architecture.
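For reference, the bag that makes this work is essentially a bounded collection with priority-proportional sampling. A minimal Python sketch (a linear toy version for illustration, not the bucketed structure real implementations use):

```python
import random

class Bag:
    """Toy sketch of the NARS "bag": a fixed-capacity collection where
    items are sampled with probability proportional to their priority."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = {}  # key -> priority

    def put(self, key, priority):
        if key not in self.items and len(self.items) >= self.capacity:
            # Evict the lowest-priority item to stay within capacity.
            del self.items[min(self.items, key=self.items.get)]
        self.items[key] = priority

    def take(self, rng=random):
        """Remove and return a key, chosen with probability ~ priority."""
        keys = list(self.items)
        weights = [self.items[k] for k in keys]
        key = rng.choices(keys, weights=weights)[0]
        del self.items[key]
        return key
```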

Putting the two together, I think you could use OpenNARS as an "oracle" for ChatGPT, maybe to prevent hallucinations, or to aid long-term chat memory?

Has anyone done any work on either problem? Where is the latest codebase? Is there a dataset of Natural Language to NARS pairs?

Ryan Peach
Mar 11, 2023, 1:15:16 AM
to open-nars
I see there are some examples of NAL to natural-language pairs here: https://github.com/opennars/opennars/tree/master/src/main/resources/nal
Some basic conversion here: https://github.com/opennars/OpenNARS-for-Applications/blob/master/english_to_narsese.py
And here: https://github.com/opennars/OpenNARS-for-Applications/tree/master/examples

And here is a websocket server into a reasoner: https://github.com/opennars/opennars-web

Is there any kind of database that opennars uses or is it all in-memory?

Are there any dockerfiles for this web server?

Pei Wang
Mar 13, 2023, 11:16:29 AM
to open...@googlegroups.com
Hi Ryan,

Nice to hear from you again. You raise an interesting possibility. We have explored similar ideas, though haven't spent much effort on them. Currently people are busy with AGI-23, and hopefully will come back to this topic at a later date.

Regards,

Pei

--
You received this message because you are subscribed to the Google Groups "open-nars" group.
To unsubscribe from this group and stop receiving emails from it, send an email to open-nars+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/open-nars/f645aa59-748d-42eb-9a84-21bb5343ca27n%40googlegroups.com.

Pei Wang
Apr 10, 2023, 5:03:00 PM
to open...@googlegroups.com
Hi Jarrad,

On Fri, Apr 7, 2023 at 6:21 AM m...@jarradhope.com <m...@jarradhope.com> wrote:
I was also thinking about this, in the form of two questions:

Three? ;-)

I can only give my intuitive answers (conjectures) without detailed arguments:
 
"could a term logic dataset (NAL/Narsese) improve the reasoning abilities of LLMs?"

Yes, by comparing the conclusion of NARS and that of LLM to the same question.
 
"could a NARS system be implemented as an LLM?"

No, especially because NARS may be applied to situations where the system's experience is very different from that of human beings.
 
"are human language(English) pairings necessary?"

Not really necessary, but useful for some purposes.

Similar agent-model architectures are popping up around LLMs, like BabyAGI and Auto-GPT.
We could build an LLM-NARS system around LangChain / LlamaIndex; given the extensions they have, it would immediately be a more practical agent.

The LLM-NARS inference and control would be an LLM model:
Trained/finetuned on a generated NAL ruleset and derivations, plus a Narsese dataset (from the various implementations?).
For control, we could align/finetune/RLHF on another dataset of prompts and desired end-state outputs, à la ChatGPT/Alpaca.

The buffer would be the prompt context length (standard 2048 tokens) and output.
Memory would be a vector or tensor database.

We could keep in alignment with AIKR by keeping the database fixed-size relative to system resources, i.e., by pruning the vector/tensor database of its least-accessed vectors, and/or pruning metadata based on truth values.
With 4-bit quantised models, Alpaca.cpp has shown the 7B model's memory/disk requirement is 4.21 GB; it can run on CPU, and on Raspberry Pis / smartphones.
With the latest mmap patches, even the 30B model can have a memory usage of 6 GB during inference.
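A minimal sketch of such AIKR-style pruning (the scoring formula blending access count and recency is an illustrative choice, not taken from any NARS implementation):

```python
import time

class BoundedMemory:
    """Fixed-capacity store in the spirit of AIKR: when full, evict the
    item with the worst combination of access frequency and recency."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = {}  # key -> (value, access_count, last_access)

    def _score(self, access_count, last_access):
        # Higher = more useful. An illustrative frequency/recency blend.
        age = time.monotonic() - last_access
        return access_count / (1.0 + age)

    def put(self, key, value):
        if key not in self.items and len(self.items) >= self.capacity:
            # Prune the least useful entry before inserting a new one.
            worst = min(
                self.items,
                key=lambda k: self._score(self.items[k][1], self.items[k][2]),
            )
            del self.items[worst]
        self.items[key] = (value, 0, time.monotonic())

    def get(self, key):
        value, count, _ = self.items[key]
        self.items[key] = (value, count + 1, time.monotonic())
        return value
```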

What's unclear to me is what would be the best way to go about generating a clean dataset?
What exactly would the control dataset look like?
Would it be problematic by introducing biases/idiosyncrasies of existing control mechanisms? (would that be an issue?)
How would you measure the difference in reasoning ability of LLMs with/without the term logic dataset?

I need to consider these suggestions and questions more before making comments.
 
Regards,

Pei

Ryan Peach
Apr 11, 2023, 7:36:05 AM
to open...@googlegroups.com
I'm actually working on the Auto-GPT project a lot in my spare time.

However, I think your best solution honestly would be to implement it in LangChain. AutoGPT as a framework right now is quite immature in terms of code quality and organization; it's more of a toy.




--
Regards,
Ryan Peach

Patrick Hammer
Apr 15, 2023, 9:23:40 AM
to open-nars
Hi everyone!

There is a new video of our attempt of using GPT with NARS (NARSGPT): https://www.youtube.com/watch?v=cpu6TooJ0Dk
The approach of only using GPT as a translator to Narsese loses some of the power of GPT, though.
But I think this could be a valuable addition to AutoGPT, as it would allow for proper long-term memory management, which features:
- Instead of having to explicitly write and then read a file and its contents, it could automatically store and retrieve a particular relationship.
- When it writes something which is already there, it will revise with it, so that when it finds contradicting or new information, it will cancel out or overrule the previous result.
- And when it reaches maximum capacity, the usage count and recency of usage of items will make sure that only information which was used long ago and not often will be removed.
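The revision step can be sketched with the NAL revision truth function, which pools the evidence behind two truth values for the same statement (using the standard frequency/confidence representation, with evidential-horizon parameter k = 1):

```python
K = 1.0  # evidential horizon ("personality") parameter

def revise(f1, c1, f2, c2):
    """NAL revision: merge two (frequency, confidence) truth values for
    the same statement, obtained from independent evidence sources."""
    w1 = K * c1 / (1.0 - c1)  # confidence -> evidence weight
    w2 = K * c2 / (1.0 - c2)
    w = w1 + w2
    f = (w1 * f1 + w2 * f2) / w  # evidence-weighted frequency
    c = w / (w + K)              # pooled evidence -> higher confidence
    return f, c
```

For example, two contradicting beliefs `(1.0, 0.9)` and `(0.0, 0.9)` revise to frequency 0.5 with increased confidence, which is the cancelling-out behavior described above.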

Best regards,
Patrick

Ryan Peach
Apr 15, 2023, 11:29:00 AM
to open...@googlegroups.com
I think using it as a LangChain tool (ReAct), ChatGPT plugin, Toolformer tool, etc. is the most likely use case. Just as ChatGPT can now use SymPy to solve math problems and inform itself of the answer, or use vector memory storage to achieve long-term memory, a persistent NARS server with a ChatGPT-accessible API could store experience in the form of Narsese and then use that experience to evaluate claims or questions at runtime, when it is most necessary.
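A minimal sketch of such a tool wrapper (the server URL and the JSON request/response schema are assumptions; no such endpoint currently exists):

```python
import json
import urllib.request

NARS_URL = "http://localhost:8080/ask"  # hypothetical persistent NARS server

def build_payload(question_narsese):
    """Encode a Narsese question as the JSON body the (assumed) server expects."""
    return json.dumps({"input": question_narsese}).encode()

def ask_nars(question_narsese, url=NARS_URL):
    """Tool-style call an agent framework could register: POST a Narsese
    question to the reasoner and return its answer string."""
    req = urllib.request.Request(
        url,
        data=build_payload(question_narsese),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read())["answer"]
```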

Patrick Hammer
Apr 15, 2023, 12:00:06 PM
to open-nars
This would be possible indeed.
But I think a proper long-term memory needs revision support.
Just ingesting and believing whatever it finds, as is the case now, will not lead to proper learning at runtime.
I think that's where NARS principles can provide more value to systems like AutoGPT.

Best regards,
Patrick

Adrian Borucki
Apr 16, 2023, 8:54:10 AM
to open-nars
The Toolformer comparison is apt, as it can be done in the style of `<text> [Deduce(…) -> <deduction result>] <text based on deduction><rest of text>`. One could also try to build a dataset just as it was built for the Toolformer (they have an automatic procedure to calculate the "helpfulness" of using an API for the LM). That could be used to further tune the model to use such reasoning tools; it would require a large number of samples to be robust, though (1M+).
For full integration, NARS can't just be a server called as an API, though; it has to receive all the experiences, so that it learns in real time as the LM interface gathers more inputs.
Both approaches can be combined, of course.
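A small sketch of how an agent runtime might pull such inline calls back out of LM output (the bracket syntax follows the example above; the regex is illustrative):

```python
import re

# Matches Toolformer-style inline calls like
# "[Deduce(<rain --> wet>) -> <ground --> wet>]"
CALL_RE = re.compile(
    r"\[(?P<tool>\w+)\((?P<args>[^)]*)\)\s*->\s*(?P<result>[^\]]*)\]"
)

def extract_calls(text):
    """Return (tool, args, result) triples found in annotated LM output."""
    return [
        (m.group("tool"), m.group("args").strip(), m.group("result").strip())
        for m in CALL_RE.finditer(text)
    ]
```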

Dwane
Apr 16, 2023, 10:58:32 AM
to open...@googlegroups.com
Hi Patrick,

Many thanks for this; it is interesting, and I have duplicated it at the command line.

A general (very basic) question:

In the video you are running this in a GUI and producing a graph. How do I replicate this? I.e., which GUI is this, and how is it launched?

many thanks

Dwane


Patrick Hammer
Apr 16, 2023, 11:55:39 AM
to open...@googlegroups.com
Hi Dwane!

The relevant commands to generate the graphs (pull from the ONA repo first, as I have now enabled negation handling too):

Semantic graph:

echo "a cat is just another form of small animal. Cats usually eat mice but sometimes they also eat cat food" | python3 english_to_narsese_gpt3.py | ./../../NAR shell InspectionOnExit | python3 ./../../concept_usefulness_filter.py 5 | python ./../../concepts_to_graph.py

Temporal graph:

echo "the deer left the forest as it was searching for food. the hunter immediately saw the deer and shot at it. the deer survived and ran back into the forest" | python3 english_to_narsese_gpt3.py EventOutput BetweenEventDelay=15 | ./../../NAR shell InspectionOnExit | python3 ./../../concept_usefulness_filter.py 7 | python ./../../concepts_to_graph.py NoImages NoLinkLabels NoTermlinks

Also congrats for getting the "NUTS, NARS, and Speech" paper accepted!
If you have time we can synch up about this effort and also the other stuff that is going on. :)
I'm finally done with most of the AGI-23 PC chair and editor work, so my calendar is less tightly packed.

Best regards,
Patrick

Maxim Tarasov
Apr 16, 2023, 12:28:40 PM
to open-nars

Hi Patrick! Interesting demo using an LLM to do the parsing.
For what it's worth, I tried your "humulu" example in ChatGPT and it seems to work as is. Has your experience been different?

Patrick Hammer
Apr 16, 2023, 12:33:16 PM
to open...@googlegroups.com
Hi Maxim!

Yes, it works (most of the time) with one category with one property.
Introduce a handful of them, and you will not get a reliable answer about what matches best.
Instead you will get things like "from the information provided it is not clear what kind of animal inst42 is",
or "as a language model I cannot make a guess on non-factual information",
or 100 variations thereof. :)

But there might be ways to prompt it to perform this task more reliably, just not in a natural way.

Best regards,
Patrick

Patrick Hammer
Apr 17, 2023, 12:32:36 AM
to open...@googlegroups.com
Hi everyone!

By the way, even though English-to-Narsese translation using GPT worked quite well, I also tried something entirely different: https://github.com/patham9/NarsGPT
It implements a NARS "on top of" a GPT, with some NARS-based memory and control structures and a NAL inference interface.
You can read more in the repository, and of course feel free to try it.

Best regards,
Patrick