LLaVA - This Open Source Model Can SEE Just like GPT-4-V


Alan Timm

Oct 11, 2023, 3:30:45 PM
to RSSC-List
This is amazing and would fit well in an LLM-driven robotic solution.
I wonder how it would work out to hook this into a camera video feed as a data source for a robot?  Just let it spin full time?

I think a vision pipeline using YOLO for real-time object classification and image segmentation, followed by a LLaVA description, could be useful.  The LLaVA step is computationally expensive but only needs to be done once per identified image segment.
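
Here is a rough sketch of the "describe once per segment" idea, just to make it concrete. Everything in it is an assumption for illustration: the Detection class, the DescribeOncePipeline name, and the detect/describe callables are hypothetical stand-ins, not the YOLO or LLaVA APIs. It also assumes the detector supplies a stable track ID for each object across frames.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Box = Tuple[int, int, int, int]  # x1, y1, x2, y2 in pixels


@dataclass
class Detection:
    """One detected object (hypothetical structure, not a YOLO type)."""
    track_id: int  # assumed to stay the same for an object across frames
    label: str     # class name from the fast detector, e.g. "person"
    box: Box


class DescribeOncePipeline:
    """Run a fast detector on every frame and a slow describer once per object."""

    def __init__(
        self,
        detect: Callable[[Any], List[Detection]],
        describe: Callable[[Any, Detection], str],
    ) -> None:
        self.detect = detect        # stand-in for the YOLO step
        self.describe = describe    # stand-in for the expensive LLaVA step
        self.descriptions: Dict[int, str] = {}  # cache keyed by track ID

    def process(self, frame: Any) -> List[Tuple[Detection, str]]:
        results = []
        for det in self.detect(frame):
            # Only pay for a description the first time an object is seen.
            if det.track_id not in self.descriptions:
                self.descriptions[det.track_id] = self.describe(frame, det)
            results.append((det, self.descriptions[det.track_id]))
        return results
```

With this shape the camera loop can call process() on every frame, and the expensive step only runs when a new track ID shows up. A real version would probably also need to expire old IDs and decide what to do when the tracker loses and re-acquires an object.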


Online Demo:  https://llava.hliu.cc/

llava online demo.jpg

Prompt example:

"What is in this photo?  JSON format."

Response:

{
    "people": [
        {
            "name": "Man on left",
            "age": "30s",
            "occupation": "Engineer",
            "clothing": "Brown shirt",
            "accessories": "Watch",
            "action": "Talking"
        },
        {
            "name": "Man on right",
            "age": "50s",
            "occupation": "Engineer",
            "clothing": "Black shirt",
            "accessories": "Glasses",
            "action": "Listening"
        }
    ],
    "location": "Warehouse",
    "items": [
        {
            "name": "Robot",
            "description": "A large metal robot with a blue arm"
        },
        {
            "name": "Boxes",
            "description": "Numerous boxes on shelves and in the background"
        }
    ],
    "conversation": "Two engineers discussing a project in a warehouse"
}
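
If a robot were to consume a reply like the one above, it would need to pull the JSON out of whatever text the model returns. A minimal sketch follows, assuming the reply may have extra prose around the JSON object. The extract_json helper is a hypothetical name; it uses only Python's standard json module.

```python
import json
from typing import Any, Optional


def extract_json(reply: str) -> Optional[Any]:
    """Return the first JSON object found in a model reply, or None if there is none."""
    decoder = json.JSONDecoder()
    start = reply.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start`
            # and ignores any trailing text after it.
            obj, _end = decoder.raw_decode(reply, start)
            return obj
        except json.JSONDecodeError:
            # Not valid JSON from this brace; try the next one.
            start = reply.find("{", start + 1)
    return None
```

Usage would look something like data = extract_json(reply), followed by a check for None before reading fields such as data["items"]. The model is not guaranteed to return the same keys every time, so anything downstream should treat fields as optional.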

Jim DiNunzio

Oct 18, 2023, 1:43:59 AM
to Alan Timm, RSSC-List

A group of people gathered around a table with various electronic equipment, such as laptops, keyboards, and a droid, could provide an opportunity to learn about different aspects of electronics, technology, and robotics. You could gain insights into the design, construction, and programming of electronic devices, as well as the practical applications of these technologies. Additionally, you might learn about the latest trends and advancements in the field, and engage in discussions with experts and enthusiasts who can share their knowledge and experiences. This gathering could also serve as a platform for networking and collaboration, allowing you to connect with like-minded individuals and potentially explore new projects or opportunities in the field.

Wow, that is amazing!


[Attached images: image002.jpg, image003.png]

Jim DiNunzio

Oct 18, 2023, 1:50:23 AM
to j...@dinunzio.com, Alan Timm, RSSC-List

The name of the club is the Southern California Robotics Club.

Almost… just from the partial signs. And if you Google that, guess who’s first in the results…

[Attached images: image001.png, image004.png, image005.png, image006.png, image007.jpg]