Let me go a step further in describing my requirement.
I work on data migration projects.
At the beginning of a data migration project, we have something called
a data mapping, which expresses
how a program might operate on source data in order to produce target
data.
This is usually done in plain English with spreadsheet support.
The mapping process itself is complex and error-prone.
But the main issue is how slow it is.
Business users define the mapping, developers code it, and one week
later, business users get the data back.
Usually we need to do one more loop, because either
- developers did not capture the intent of the mapping, or
- business users missed something.
I think a better approach would be to accelerate the
loop by giving business users
a means to express the mapping simply.
As they do not write SQL, the simplest way is to work from examples.
So, in my idea, business users give some examples of source data and
target data,
and a 'program' would interactively tell them:
- here's the query or queries
- here's the program explanation (in plain English).
We don't need much data, as the input is filled in manually.
But we need something clever that would find the mapping.
It's not exactly "query by example", it's more a query discovery.
From a technical standpoint, the program would be twofold:
- first, given the source tables and the target table primary keys, find
the query that, when applied to the source keys,
gives the same target keys. This is important because, most of the
time, the scalar mapping is simple (e.g. concat firstname and
lastname) but the vectorial mapping, i.e. rowset-based, is not (e.g.
what joins are applied, what group bys, what transpositions or pivots?)
- next, find the scalar mappings, that is, the transformations to apply
at the row level (not the rowset level).
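To make the second step concrete, here is a minimal Python sketch of scalar mapping discovery. It assumes the source and target rows have already been aligned by key, and it only tries a tiny, fixed set of candidate transformations (copy, upper, concat with a few separators). The example rows and the function names are made up for illustration.

```python
from itertools import permutations

# Hypothetical example rows, already aligned by key (source row i -> target value i).
source_rows = [
    {"firstname": "Ada", "lastname": "Lovelace"},
    {"firstname": "Alan", "lastname": "Turing"},
]
target_values = ["Ada Lovelace", "Alan Turing"]


def candidate_mappings(columns):
    """Yield (description, function) pairs for a small, fixed set of scalar transformations."""
    for col in columns:
        yield f"copy {col}", lambda row, c=col: row[c]
        yield f"upper({col})", lambda row, c=col: row[c].upper()
    for a, b in permutations(columns, 2):
        for sep in ("", " ", ", "):
            yield (f"concat({a}, {sep!r}, {b})",
                   lambda row, a=a, b=b, sep=sep: row[a] + sep + row[b])


def find_scalar_mappings(rows, targets):
    """Return the description of every candidate that reproduces all target values."""
    columns = list(rows[0])
    return [desc for desc, fn in candidate_mappings(columns)
            if all(fn(row) == t for row, t in zip(rows, targets))]


matches = find_scalar_mappings(source_rows, target_values)
print(matches)
```

With so few example rows, several candidates may fit at once, so the tool would probably have to show all of them and let the business user pick one or add a distinguishing example.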
Moreover, we have some kind of tree to explore.
The initial state is (source1, source2, ..., sourcen, metadata), where
- source1 ... sourcen are the example tables
- metadata describes the sources (mainly discovered keys and fkeys)
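As an illustration of what "discovered keys and fkeys" could mean, here is a rough Python sketch. It assumes single-column keys only: a column is a candidate key if its values are unique in the example rows, and a candidate fkey if all its values appear in a candidate key of another table. The tables and function names are hypothetical.

```python
# Hypothetical example tables.
sources = {
    "customer": [
        {"cust_id": 1, "name": "Ada"},
        {"cust_id": 2, "name": "Alan"},
        {"cust_id": 3, "name": "Grace"},
    ],
    "orders": [
        {"order_id": 10, "cust_id": 1},
        {"order_id": 11, "cust_id": 1},
        {"order_id": 12, "cust_id": 2},
    ],
}


def candidate_keys(rows):
    """Columns whose values are unique across the example rows."""
    return [col for col in rows[0]
            if len({row[col] for row in rows}) == len(rows)]


def candidate_fkeys(tables):
    """Tuples (table, column, referenced table, referenced key column)."""
    keys = {name: candidate_keys(rows) for name, rows in tables.items()}
    found = []
    for name, rows in tables.items():
        for col in rows[0]:
            values = {row[col] for row in rows}
            for ref_name, ref_keys in keys.items():
                if ref_name == name:
                    continue
                for key in ref_keys:
                    # Inclusion test: every value must exist in the referenced key.
                    if values <= {r[key] for r in tables[ref_name]}:
                        found.append((name, col, ref_name, key))
    return found


metadata = {
    "keys": {name: candidate_keys(rows) for name, rows in sources.items()},
    "fkeys": candidate_fkeys(sources),
}
print(metadata)
```

On small hand-filled examples this reports accidental keys as well (here `name` is unique by chance), so the discovered metadata is only a set of hypotheses to be confirmed by the user or pruned by the search.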
We have transitions between states:
- we can apply joins
- we can apply group bys
by iterating over the subkeys and the different relationships.
Finally, we have a stop condition: either we have found the target keys or
we have iterated too much (as in a recursive breadth-first search).
Is this a typical Prolog problem?
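The search described above can be sketched outside Prolog too. Below is a small, bounded breadth-first search in Python, under several assumptions of mine: states are (joined tables, rows, plan), joins follow the declared fkeys only, group by is limited to a single column and ends a branch, and the goal test compares a projection of the rows with the target keys as a sorted list (so duplicates matter). The data, the `max_steps` bound and all the names are hypothetical.

```python
from collections import deque
from itertools import permutations

# Hypothetical example sources and metadata.
sources = {
    "customer": [{"cust_id": 1, "name": "Ada"}, {"cust_id": 2, "name": "Alan"},
                 {"cust_id": 3, "name": "Grace"}],
    "orders": [{"order_id": 10, "cust_id": 1}, {"order_id": 11, "cust_id": 1},
               {"order_id": 12, "cust_id": 2}],
}
# Foreign keys as (table, column, referenced table, referenced column).
fkeys = [("orders", "cust_id", "customer", "cust_id")]


def qualify(table, rows):
    """Prefix column names with the table name to avoid collisions after joins."""
    return [{f"{table}.{c}": v for c, v in row.items()} for row in rows]


def join(rows, left_col, other_rows, right_col):
    return [{**l, **r} for l in rows for r in other_rows if l[left_col] == r[right_col]]


def group_by(rows, col):
    """Distinct values of one column, keeping first-seen order."""
    seen = []
    for row in rows:
        if {col: row[col]} not in seen:
            seen.append({col: row[col]})
    return seen


def matches_target(rows, target_keys):
    """Return the columns whose projection equals the target keys, or None."""
    if not rows:
        return None
    for cols in permutations(sorted(rows[0]), len(target_keys[0])):
        if sorted(tuple(r[c] for c in cols) for r in rows) == sorted(target_keys):
            return cols
    return None


def successors(tables, rows, plan):
    if any(step.startswith("GROUP BY") for step in plan):
        return  # grouped states are leaves in this sketch
    for t, c, rt, rc in fkeys:
        cond = f"{t}.{c} = {rt}.{rc}"
        if t in tables and rt not in tables:
            yield (tables | {rt}, join(rows, f"{t}.{c}", qualify(rt, sources[rt]), f"{rt}.{rc}"),
                   plan + [f"JOIN {rt} ON {cond}"])
        elif rt in tables and t not in tables:
            yield (tables | {t}, join(rows, f"{rt}.{rc}", qualify(t, sources[t]), f"{t}.{c}"),
                   plan + [f"JOIN {t} ON {cond}"])
    if rows:
        for col in sorted(rows[0]):
            yield tables, group_by(rows, col), plan + [f"GROUP BY {col}"]


def discover(target_keys, max_steps=3):
    """Breadth-first search for a plan whose rows project onto the target keys."""
    queue = deque((frozenset([t]), qualify(t, rows), [f"FROM {t}"])
                  for t, rows in sources.items())
    while queue:
        tables, rows, plan = queue.popleft()
        cols = matches_target(rows, target_keys)
        if cols:
            return plan + [f"SELECT {', '.join(cols)}"]
        if len(plan) <= max_steps:  # stop condition: iterated too much
            queue.extend(successors(tables, rows, plan))
    return None


plan = discover([("Ada",), ("Alan",)])  # customers having at least one order
print(plan)
```

The returned plan is a list of steps in discovery order, not literal SQL; turning it into a query and a plain English explanation would be a separate step. Breadth-first order means the shortest plan is found first, which seems a reasonable default when several plans fit the examples.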