So I'm working on a demo that takes a set of images, comic book pages, and asks Chrome to summarize what's going on in them. One thing I noticed right away is that the model *really* wanted to comment on the art style and such. So I worked on my prompt to try to get it to *not* do that, to only focus on what happens, and to give me a one-paragraph summary for the page. Here's the prompt I have now:
initialPrompts: [{
	role: "system",
	content: "You analyze images that are part of a comic book. Each image represents one page of a story. I will prompt you with the image as well as any previous summary from earlier pages. You should summarize the current image and use any previous summary to help guide you with the current page. If the current page is an advertisement or promotional page, simply return nothing. If the image appears to be a cover, return nothing. Your summary should be one paragraph that is no more than three to four sentences and focused on describing what is being shown on the page. Do not give your opinion on the art or color. Just summarize what happens on the page."
}],
However, this never seemed to work. I got great explanations of the pages in question that went into a lot of detail, but it was far too much detail, and they never focused on the story itself.
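For context, the system prompt says each request carries the page image plus any summary from earlier pages. Here's a minimal sketch of how that per-page message might be put together. To be clear, `buildPageMessage` and `summarizePages` are hypothetical helpers I'm using for illustration, and the `{ type, value }` content shape is my assumption about the Prompt API's multimodal input format, which may differ depending on your Chrome version.

```javascript
// Hypothetical helper: builds the user message for one comic page.
// Assumption: multimodal content is an array of { type, value } parts.
function buildPageMessage(pageImage, previousSummary = "") {
  const text = previousSummary
    ? `Summary of earlier pages: ${previousSummary}\nSummarize the current page.`
    : "This is the first page. Summarize the current page.";

  return [{
    role: "user",
    content: [
      { type: "text", value: text },
      { type: "image", value: pageImage }
    ]
  }];
}

// Hypothetical loop: carries each page's summary into the next request.
// Assumption: `session` was created with the system prompt above and with
// image input enabled, and `session.prompt()` resolves to a string.
async function summarizePages(session, pageImages) {
  const summaries = [];
  let previous = "";
  for (const image of pageImages) {
    const result = (await session.prompt(buildPageMessage(image, previous))).trim();
    // The system prompt asks for nothing on covers and ads, so skip empty results.
    if (result) {
      summaries.push(result);
      previous = summaries.join(" ");
    }
  }
  return summaries;
}
```

Keeping the message building in its own function means you can check exactly what is being sent for a page without needing the model at all.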
I then tried adding a schema: