Questions about evaluating the quality of procedural levels in Super Mario Bros.

Peizhi Shi


Mar 4, 2015, 1:36:19 PM

to procedur...@googlegroups.com

Dear all,

Hi, I am implementing a level generator for Super Mario Bros. We are currently trying to compare our generator with other generators. However, we found that existing evaluation metrics (e.g., leniency, linearity, density, compression distance, pattern density and variation) can describe the properties of a level generator but cannot evaluate its quality. These metrics cannot tell us which level generator is better.

My questions are:

1. Is there a relationship between the current evaluation metrics (e.g., linearity) and player preferences?

2. Should a good generator cover an extremely wide expressive range for these metrics?

3. Apart from using human judges to make a comparison, are there any methods or metrics that allow us to evaluate the quality of a level generator?

4. Are current evaluation metrics enough for us to make a comparison?
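To make question 1 concrete, here is one simplified way to compute a linearity score for a level. It is a sketch under stated assumptions, not the published definition: the level is assumed to be reduced to a list of ground heights, one per column, and linearity is taken as the mean absolute distance of those heights from a least-squares regression line. The function name and the example profiles are hypothetical, and the metrics discussed in this thread may differ in which geometry they fit and how they normalize the result.

```python
from statistics import mean


def linearity_residual(heights: list[float]) -> float:
    """Mean absolute vertical distance of each column's height from the
    least-squares line through (column index, height).

    0.0 means the height profile is perfectly linear; larger values mean
    the profile deviates more from a straight line.
    """
    n = len(heights)
    if n < 2:
        return 0.0
    xs = range(n)
    mx, my = mean(xs), mean(heights)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, heights))
    slope = sxy / sxx
    intercept = my - slope * mx
    return mean(abs(y - (slope * x + intercept)) for x, y in zip(xs, heights))


# Hypothetical height profiles.
flat = [2, 2, 2, 2, 2, 2]
ramp = [0, 1, 2, 3, 4, 5]
bumpy = [2, 6, 1, 7, 2, 6]

for name, profile in [("flat", flat), ("ramp", ramp), ("bumpy", bumpy)]:
    print(name, linearity_residual(profile))
```

Note that a steady ramp scores as perfectly linear here, just like flat ground; whether players experience those two levels similarly is exactly the open question raised in question 1.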

Thanks very much for reading this! I would really appreciate it if you could give me some suggestions ^_^

Peizhi Shi

Adam Smith


Mar 4, 2015, 5:07:23 PM

to procedur...@googlegroups.com

Your evaluation methods should somehow match your project goals. Even if you had a fully automatic and trusted "level quality" metric to apply, it still wouldn't tell a meaningful story if your goal was to... give users intuitive controls, produce levels with some features and not others, produce levels in the style of a given set of levels, require that the player practice certain movements, grow the level continuously in response to past inputs, match the rhythm of a piece of music, have solutions that cluster around a user-provided target path, run within certain real-time constraints, usefully exploit multiple CPU cores or GPUs, work across more than one game, etc. If your generator was created in response to some limitations or flaws in one or more past generators, try developing a metric that specifically captures those limitations or flaws so you can see how your response fares. By the way, when you are judging the quality of a generator by examining its outputs, keep in mind that two different generators might have identical output spaces (where one perhaps runs more efficiently, performs better, or outputs the exact same artifacts under a different probability distribution). If you want to make a clear statement about your generator, consider evaluating more than just the outputs (inputs are a good place to start!).
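The point about identical output spaces under different probability distributions can be illustrated with two toy generators. This is a hypothetical sketch: each "level" is reduced to a single named chunk, both generators can produce exactly the same set of chunks, and only their sampling weights differ. Comparing the sets of observed outputs shows no difference, while comparing the empirical distributions (here with total variation distance, one common choice among several) does.

```python
import random
from collections import Counter

CHUNKS = ["flat", "gap", "pipe"]  # hypothetical level chunks


def generator_a(rng: random.Random) -> str:
    # Hypothetical generator: picks each chunk with equal probability.
    return rng.choice(CHUNKS)


def generator_b(rng: random.Random) -> str:
    # Hypothetical generator: same possible outputs, skewed probabilities.
    return rng.choices(CHUNKS, weights=[0.8, 0.1, 0.1])[0]


def empirical(gen, samples: int, seed: int) -> dict[str, float]:
    """Estimate a generator's output distribution by sampling it."""
    rng = random.Random(seed)
    counts = Counter(gen(rng) for _ in range(samples))
    return {chunk: counts[chunk] / samples for chunk in CHUNKS}


def support(dist: dict[str, float]) -> set[str]:
    """The outputs that were actually observed (the sampled output space)."""
    return {k for k, p in dist.items() if p > 0}


def total_variation(p: dict[str, float], q: dict[str, float]) -> float:
    """Half the L1 distance between two discrete distributions, in [0, 1]."""
    keys = p.keys() | q.keys()
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


dist_a = empirical(generator_a, 10_000, seed=1)
dist_b = empirical(generator_b, 10_000, seed=2)
print(support(dist_a) == support(dist_b))  # same output space
print(total_variation(dist_a, dist_b))     # clearly different distributions
```

For these weights the exact total variation distance is about 0.47, so an evaluation that only asks "which levels can this generator make?" would miss a large difference in what players are likely to see.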

You received this message because you are subscribed to the Google Groups "Procedural Content Generation" group.

Gillian Smith


Mar 5, 2015, 2:50:33 AM

to proceduralcontent

Regarding the "quality" of a level generator, I think perhaps we should first agree on what is meant by "quality". The metrics identified in our paper that I think you're talking about were explicitly not intended to be used to determine what makes a "better" generator. I would posit that it may be impossible to actually answer what makes a "better" generator without a concrete game context, and even then, it may not be possible or desirable to do so. It is perhaps more important to look at the impact of the generative space on player experience or the desired game aesthetic. It really depends on the goals that you have in making a generator.

To answer your specific questions:

1) Maybe? But we don't really know for sure. This sounds like a great set of research papers in itself! :)

2) This depends entirely on your goals with the generator, what kind of game or tool you want to use it in, and how general purpose you want it to be. I believe that this question is impossible to answer as it is formulated. You could imagine a generator that could make only one beautiful and emotionally resonant level. I would hate to call this hypothetical generator "worse" than one that can create thousands upon thousands of mediocre pieces of content. Size doesn't matter.

3) Again, I think the term "quality" might be overloaded here....

4) There is a lot of research to be done in identifying new metrics! Defining metrics that are relevant to your design context is a perfectly reasonable thing to do.

In general, my goal with evaluation is to get a nuanced understanding of generative space and expressive range. By "generative space", I mean the properties of content that can be generated and the "shape" of that space. By "expressive range", I mean how the generative space responds to input, and how design-relevant the input controls are.
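One common way to look at the "shape" of a generative space is to compute two metrics for many generated levels and bin them into a 2D histogram. The sketch below assumes each level has already been scored on two metrics (for example linearity and leniency) normalized to [0, 1]; the function names, the bin count, and the example metric values are hypothetical. The `coverage` helper summarizes how much of the grid is occupied, which describes the space but, per the reply above, is not a measure of how "good" the generator is.

```python
def expressive_range(points, bins: int = 5) -> list[list[int]]:
    """Count generated levels per cell of a bins x bins grid over [0, 1]^2.

    points: iterable of (metric_x, metric_y) pairs, one per generated level,
    each metric assumed to be normalized to [0, 1].
    Rows index metric_y, columns index metric_x.
    """
    grid = [[0] * bins for _ in range(bins)]
    for x, y in points:
        if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
            raise ValueError(f"metric values must be in [0, 1], got {(x, y)}")
        col = min(int(x * bins), bins - 1)  # a value of exactly 1.0 goes in the last bin
        row = min(int(y * bins), bins - 1)
        grid[row][col] += 1
    return grid


def coverage(grid: list[list[int]]) -> float:
    """Fraction of grid cells that contain at least one generated level."""
    cells = [count for row in grid for count in row]
    return sum(1 for count in cells if count > 0) / len(cells)


# Hypothetical (linearity, leniency) scores for five generated levels.
scores = [(0.1, 0.9), (0.15, 0.85), (0.5, 0.5), (0.95, 0.05), (1.0, 1.0)]
grid = expressive_range(scores)
for row in reversed(grid):  # print with high metric_y at the top
    print(row)
print(coverage(grid))
```

Running the same analysis while varying the generator's input parameters shows how the space responds to input, which corresponds to the "expressive range" sense described above.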

It of course depends on the goals you have for creating a generator, but I prefer to think of PCG evaluation as a sort of computationally-assisted reflection and analysis on the generator, the process it follows, and its range of potential outputs in the context of its particular design aims. I don't believe in saying that any one generator is "better" than another. What would it mean for Monet to be "better" than Van Gogh? One is not objectively better than another, but there is much to be learned about each artist by performing a deep analysis of their works while bearing in mind the broader cultural context in which the artists lived.

Gillian Smith

Assistant Professor, Game Design and Computer Science

Northeastern University

Playable Innovative Technologies Lab

http://www.sokath.com

On Wed, Mar 4, 2015 at 10:36 AM, Peizhi Shi <wssp...@gmail.com> wrote:


Peizhi Shi


Mar 5, 2015, 6:23:16 AM

to procedur...@googlegroups.com

Dear Dr. Adam Smith,

Hi, thank you very much for your reply and suggestions. You are right! Two generators could have identical output spaces. I really appreciate it.

Hope you can have a nice day :)

Peizhi

On Wednesday, March 4, 2015 at 10:07:23 PM UTC, Adam Smith wrote:

Peizhi Shi


Mar 5, 2015, 6:24:38 AM

to procedur...@googlegroups.com, gil...@ccs.neu.edu

Dear Prof. Gillian Smith,

Hi, thank you so much for answering these questions and explaining your evaluation metrics in detail! It has given me a better understanding of PCG, game design and game quality. I really appreciate your help.

Thanks again.

Have a nice day :)

Peizhi

On Thursday, March 5, 2015 at 7:50:33 AM UTC, Gillian Smith wrote:
