[GSoC] Elixir StreamData type generation

Nikola Jichev

Mar 5, 2018, 4:56:43 AM

to BEAM Community

Hello Mentors,

I'm Nikola, a third-year undergraduate in computer science from Sofia, Bulgaria. My interests lie in functional programming. I also help out with our university's Elixir courses.

My proposal concerns the property testing library stream_data. At the moment you can generate streams of random data for your properties with generators such as `StreamData.integer()`, check properties of functions, and shrink failing data.

As discussed with Jose and Andrea, the project would aim to improve the user experience of the stream_data library.

Here are the two main goals of the project.

Goal 1: More often than not, users define their own types and structs, and today they have to write generators for those types by hand, which is inconvenient.

Example:

# Given those simple types:
defmodule Example do
  @type name :: binary()
  @type age :: integer()
  @type user :: {name(), age()}
end

# You would generate the following data stream.
users = gen all name <- binary(),
                age <- integer(),
                do: {name, age}

# We should be able to make this more concise:
users = gen all Example.user()
# or
import Example
users = gen all user

Goal 2: Whenever a user defines a function spec, they should be able to declare that the function should be tested with properties. If so, we would read the function's spec, generate the arguments using the type generators from Goal 1, invoke the function, and check whether the result belongs to the output type's generator.

Example:

defmodule Example do
  @spec add(number(), number()) :: number()
  def add(a, b), do: a + b
end

defmodule ExampleTest do
  # Long variant:
  property "adding two numbers results in a number" do
    check all a <- number(),
              b <- number(),
              do: assert is_number(add(a, b))
  end

  # We could reduce this to:
  property spec: :add
  # or
  property :add
end

I already have a simple PoC here. (The APIs in the code and examples above are ones I made up ad hoc; I expect them to change over time to better suit users.)

The hard parts of the project, I imagine, would be generating recursive types and inferring which module the types come from if we want some implicitness in the tests.

Looking forward to your guidance, advice and recommendations on books/code to read,

Nikola!

José Valim

Mar 5, 2018, 5:14:07 AM

to Nikola Jichev, BEAM Community

Hi Nikola, thank you for the summary!

I have two notes below.

== Simpler type generation

I think for the type generation we can go with a simple API such as:

from_type(Mod, :type)

from_type(Mod, :type_with_args, [generator1, generator2])

That's more explicit and suits Elixir better.

I agree though that generating recursive types will be a challenge, especially co-recursive types. Handling type arguments can also be challenging.

== Generator inclusiveness

In your spec example, you wrote:

assert is_number(add(a, b))

Note that you retrieved the is_number guard from the spec output. This means you need to tackle two things here:

* Convert input types to generators - which is the first part of your proposal

* Convert output types to Elixir assertions

That would be a big project because you would need to implement the code that traverses the types *twice*: once to build generators and once to build "assertions".

I think it would be better if we improve StreamData generators to also be able to answer the question: does this value belong to the generator? This was one of the original goals of StreamData. It just has not been implemented yet!

If we assume that we can ask a generator if a value belongs to it, then the problem reduces to:

* Convert input types to generators - which is the first part of your proposal

* Convert output types to generators - and then simply ask if the result of add/2 belongs to it

I believe this approach is preferable because it will make generators more useful and likely reduce the amount of code you have to implement/maintain.
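To make the idea concrete, here is a minimal sketch in plain Elixir of a generator that can both produce values and answer the membership question. It does not use StreamData at all: the `CheckableGen` module, its `generate` and `contains` fields, and `check_spec/4` are hypothetical names invented for illustration, and the integer range is arbitrary.

```elixir
defmodule CheckableGen do
  # Hypothetical: pairs a way to produce values with a way to recognise them.
  defstruct [:generate, :contains]

  # Integers in -100..100; the range is chosen only to keep the sketch small.
  def integer do
    %__MODULE__{
      generate: fn -> :rand.uniform(201) - 101 end,
      contains: &is_integer/1
    }
  end

  def float do
    %__MODULE__{
      generate: fn -> (:rand.uniform() - 0.5) * 200 end,
      contains: &is_float/1
    }
  end

  # A value belongs to one_of/1 if it belongs to any of the alternatives.
  def one_of(gens) do
    %__MODULE__{
      generate: fn -> Enum.random(gens).generate.() end,
      contains: fn value -> Enum.any?(gens, fn gen -> gen.contains.(value) end) end
    }
  end

  # In typespecs, number() is integer() | float().
  def number, do: one_of([integer(), float()])

  # Generate the arguments, call the function, and ask the output
  # generator whether the result belongs to it.
  def check_spec(fun, arg_gens, result_gen, runs \\ 100) do
    Enum.all?(1..runs, fn _ ->
      args = Enum.map(arg_gens, fn gen -> gen.generate.() end)
      result_gen.contains.(apply(fun, args))
    end)
  end
end

number = CheckableGen.number()
add = fn a, b -> a + b end
CheckableGen.check_spec(add, [number, number], number)
```

With this shape, checking `add/2` against its spec needs only one traversal of the types (to build generators); the output check reuses the same structure through `contains`.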

Thoughts?

José Valim
www.plataformatec.com.br
Founder and Director of R&D

Nikola Jichev

Mar 5, 2018, 6:48:31 AM

to BEAM Community

Hello Jose,

I agree that we could go with a more explicit API; I'm still in the lazy mindset of Ruby.

For the second part, I was showing an example of how users would manually check the type of the result.

Adding a function to ask a generator whether a certain value belongs to it would be useful in other scenarios too. That raises a question: how would you check whether an email, for example, belongs to this generator?

domains = ~w(gmail.com yahoo.com)

emails = gen all name <- StreamData.string(:alphanumeric),
                 name != "",
                 domain <- StreamData.member_of(domains),
                 do: name <> "@" <> domain

Maybe we could add an option when creating a generator to specify a rule (a captured or anonymous function) to check against, or only support this for the built-in generators, or both?

This gets harder when you factor in types like red-black trees, where even our generators would not generate correct values since the trees won't be balanced. Users would have to write their own generators anyway, or bind a balancing function onto ours.

Another question: how would you shrink recursively generated values? I guess something like shrinking a term to a subterm (more aggressive; for trees, this would try replacing a tree with one of its nodes) or applying shrinking to all subterms (mapping shrink over all subtrees) should work?
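The two strategies can be sketched on a small binary tree. This is plain Elixir and makes no claim about how stream_data shrinks values internally; the tree shape (`:leaf` or `{:node, left, value, right}`), the `TreeShrink` module, and all of its function names are assumptions made for illustration. Values stored in the nodes are not shrunk here.

```elixir
defmodule TreeShrink do
  # Strategy 1 (aggressive): replace the tree with one of its direct subtrees.
  def to_subterms(:leaf), do: []
  def to_subterms({:node, left, _value, right}), do: [left, right]

  # Strategy 2 (finer): keep the root and shrink one child at a time.
  def within_subterms(:leaf), do: []

  def within_subterms({:node, left, value, right}) do
    lefts = for l <- candidates(left), do: {:node, l, value, right}
    rights = for r <- candidates(right), do: {:node, left, value, r}
    lefts ++ rights
  end

  # Try the aggressive candidates first, then the finer-grained ones.
  def candidates(tree), do: to_subterms(tree) ++ within_subterms(tree)

  # Greedy search: move to the first smaller tree that still fails, and
  # stop when no candidate fails. Every candidate has fewer nodes than
  # its parent, so the recursion terminates.
  def shrink(tree, fails?) do
    case Enum.find(candidates(tree), fails?) do
      nil -> tree
      smaller -> shrink(smaller, fails?)
    end
  end

  # Helper for the example: does the tree contain the given value?
  def member?(:leaf, _x), do: false
  def member?({:node, l, v, r}, x), do: v == x or member?(l, x) or member?(r, x)
end

tree = {:node, {:node, :leaf, 1, :leaf}, 3, {:node, {:node, :leaf, 7, :leaf}, 9, :leaf}}

# Pretend the property fails for any tree that contains 7.
TreeShrink.shrink(tree, fn t -> TreeShrink.member?(t, 7) end)
# → {:node, :leaf, 7, :leaf}
```

Trying subterms first reaches small counterexamples quickly, while shrinking within subterms covers cases where the root itself is needed for the failure.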

José Valim

Mar 5, 2018, 6:54:28 AM

to Nikola Jichev, BEAM Community

> Maybe we could add an option when creating a generator to specify a rule (a captured or anonymous function) to check against, or only support this for the built-in generators, or both?

All built-in generators will provide both data generation and data validation functions. Some of the generator composition functions, such as StreamData.filter, will be able to augment the data validation function. Others, such as bind/gen all, won't allow that. For those cases, we will introduce something like StreamData.validator(generator, fn arg -> ... end), similar to what you proposed, where we can add or override the data validation function.
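As a rough illustration of that proposal, the sketch below attaches a validation function to the email generator from earlier in the thread. It is plain Elixir, not StreamData: `ValidatedGen`, its fields, and its `validator/2`, `filter/2`, and `member?/2` functions are hypothetical stand-ins, and the name generator only produces lowercase letters to keep the example short.

```elixir
defmodule ValidatedGen do
  # Hypothetical: a zero-arity generate function plus a validation function.
  defstruct [:generate, :valid]

  # Attach (or override) the validation function explicitly, for generators
  # built with arbitrary composition where it cannot be derived.
  def validator(generate, valid) when is_function(generate, 0) and is_function(valid, 1) do
    %__MODULE__{generate: generate, valid: valid}
  end

  # filter can augment validation on its own: a value must satisfy both the
  # original validator and the predicate. Generation retries until the
  # predicate holds, so a predicate that never holds would loop forever.
  def filter(%__MODULE__{} = gen, pred) do
    %__MODULE__{
      generate: fn -> generate_until(gen.generate, pred) end,
      valid: fn value -> gen.valid.(value) and pred.(value) end
    }
  end

  def member?(%__MODULE__{valid: valid}, value), do: valid.(value)

  defp generate_until(generate, pred) do
    value = generate.()
    if pred.(value), do: value, else: generate_until(generate, pred)
  end
end

domains = ~w(gmail.com yahoo.com)

# Non-empty name of 1 to 8 lowercase letters.
random_name = fn ->
  len = :rand.uniform(8)
  for _ <- 1..len, into: "", do: <<Enum.random(?a..?z)>>
end

emails =
  ValidatedGen.validator(
    fn -> random_name.() <> "@" <> Enum.random(domains) end,
    fn value ->
      is_binary(value) and
        case String.split(value, "@") do
          [name, domain] ->
            String.match?(name, ~r/\A[a-zA-Z0-9]+\z/) and (domain in domains)

          _ ->
            false
        end
    end
  )

gmail_only = ValidatedGen.filter(emails, fn email -> String.ends_with?(email, "@gmail.com") end)

ValidatedGen.member?(emails, "nikola@yahoo.com")
ValidatedGen.member?(gmail_only, "nikola@yahoo.com")
```

Here the first membership check returns `true` and the second returns `false`, since `filter/2` narrowed the validator along with the generator.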

For the typespecs part, because all type-based generators will be built on top of the built-in types, we should have all aspects covered.

> Another question: how would you shrink recursively generated values? I guess something like shrinking a term to a subterm (more aggressive; for trees, this would try replacing a tree with one of its nodes) or applying shrinking to all subterms (mapping shrink over all subtrees) should work?

The shrinking topic would be better raised in the StreamData issue tracker, as Andrea is more familiar with it.
