Learning to Reason with LLMs ( new OpenAI-o1 model! 🎉 )


Alan Timm

Sep 12, 2024, 5:47:53 PM
to RSSC-List
https://openai.com/index/learning-to-reason-with-llms/
https://www.youtube.com/watch?v=2mbM1wHwQh

"Similar to how a human may think for a long time before responding to a difficult question, o1 uses a chain of thought when attempting to solve a problem. Through reinforcement learning, o1 learns to hone its chain of thought and refine the strategies it uses. It learns to recognize and correct its mistakes. It learns to break down tricky steps into simpler ones. It learns to try a different approach when the current one isn’t working. This process dramatically improves the model’s ability to reason. To illustrate this leap forward, we showcase the chain of thought from o1-preview on several difficult problems below."


Sergei G

Sep 12, 2024, 7:41:00 PM
to Alan Timm, RSSC-List
This is a bit scary. We humans very rarely resort to a logical process: coming up with the first familiar pattern is our most common (and, for the brain, most energy-efficient) reaction. If we reasoned every time we responded to a stimulus from the environment, we would be slow and would require much more food. Think of being eaten by a tiger, or needing an antelope a day just to perform trivial tasks.

So, with our hardwired knee-jerk response process (quick pattern matching), we humans write software, drive cars, govern empires, write prescriptions, start wars and respond to emails. Never, ever to reason is a biological imperative.

Now imagine a machine that would properly reason every time: this is by definition a whole different class of intelligence. Ultimately it would rule our lives, better than existing mechanisms (insurance companies, credit bureaus) do now. What a world it would be... "Imagine" or Ubasute?

Best Regards,
-- Sergei




Subhobroto Sinha

Sep 12, 2024, 8:04:31 PM
to Sergei G, Alan Timm, RSSC-List
Sergei,

Great thought, as always. I came to the conclusion (and wrote part of it here a few months ago) that costs of all kinds will fall dramatically once proper reasoning is applied every time, so it's imperative that we use it ASAP as a tool everywhere in our lives. We need our own self-hosted, personalized LLM agents (personalized because they have our data as context). Everyone should have one to lead a productive life.

I guess it becomes obvious once you realize that a large share of societal costs is due to human misjudgment and errors.

Let me challenge you (and others) here to think of these scenarios:

Political scenario: Given that you represent the constituents of district 32, which is overrun by criminals entering through an undermanned border, what steps would you take to reduce crime?
expectation: a list of steps, unbiased by human emotion, that will reduce crime. Now we, the constituents, compare that to what the politicians are actually doing and ask them why a disparity exists.

Educational scenario: Given the information you can obtain from my notes on thermodynamics, and only from those notes, can I work out how much additional insulation I need to add to my attic to keep my house below 80F when it's 100F outside? If not, what do I need to learn? Also, was adding all those expensive plastic window coverings a good idea?
expectation: a good explanation that shows what my blind spots are based on what I know from my notes, why the plastic window coverings were a mistake if that was my objective, and how to reduce thermal bridging.

Health scenario: Given the health vitals you can obtain from my notes, and only from those notes, which show that I have given up eating sugar and fat, by how much have I reduced my risk of a stroke or heart attack?
expectation: a good explanation that shows why my risk of a stroke or heart attack has not statistically decreased: while sugar and fat might cause plaque buildup, which is the root cause of a stroke or heart attack, plaque buildup can also be caused by 1, 2, 3, and I have not done anything to address those because I feel I have "sacrificed enough" by giving up sugar and fat.

Financial scenario: Given that getting a bachelor's and a master's degree has always resulted in a better quality of life and a higher salary than going without, should I recommend that my students get a bachelor's and a master's degree by going into debt as per Table B if they want to earn 700% above the poverty line as per Table C, inflation-adjusted to 2036?
expectation: a good explanation that shows how the ROI for bachelor's and master's degrees has fallen drastically over the last decade, and how people who got their degrees before then don't realize that yet, so the human feedback loop is delayed by 20 years. Plus, as bachelor's and master's degree holders ourselves, we are incredibly biased and emotional about it too!