Data Mapping Discovery

manimal45

unread,

Feb 13, 2012, 3:08:38 AM2/13/12

to

Hi,

I'm very new to prolog but I believe it might be a good fit for my
problem.
I want to generate data mapping expressions/queries based on source/
target example data instances.

Here's an example :
SOURCE: Person
ID FNAME LNAME DATE COUNTRY
------------------------------------
1 john doe monday c.2
2 john doe tuesday c.1

SOURCE : Country
CID NAME
------------
c1 France
c2 UK

TARGET : TPerson
// For each person, find the latest date and join with country
ID FNAME LNAME DATE COUNTRY
------------------------------------
p1 john doe monday UK

All data examples are submitted by the user.
Now, the "discovery" tool might return the following query :
select
"P"||Person.id, // Complex mapping Tperson.PID =
concat("P",Person.PID)
Person.Fname,
Person.Lname,
PErson.date,
country.name
from
Person
inner join
(
select
pid,
min(date) as date
from
Person
group by
pid
) as P on
Person.pid = p.pid and
Person.date = p.date
inner join
Country on
Person.COUNTRY = Country.CID

Do you know of something similar in prolog ?
Is prolog a good fit for this king of problem ?

A. K.

unread,

Feb 13, 2012, 7:32:22 AM2/13/12

to

If it is not more complex than that (eg no more rules to process), then
you might be better off with SQLite.

Chip Eastham

unread,

Feb 13, 2012, 11:40:19 AM2/13/12

to

Googling on "query-by-example" and prolog turned up a number of seemingly useful links, including a 1981 book on the subject by K.M. Snell.

This more recent blog article discusses Query-by-Example in connection with some SWI-Prolog code:

[Salmon Run: Ontology Rules with Prolog]
http://sujitpal.blogspot.com/2009/06/ontology-rules-with-prolog.html

regards, chip

manimal45

unread,

Feb 14, 2012, 3:37:15 AM2/14/12

to

Chip,

you perfectly understood my problem -- though it was not that clear as
AK answer tend to show.

Thanks, for the intersting links.
However, I'm not into RDF/OWL and the like, I'm just a relational
database user,
who's looking for intersting ways to solve a common issue in data
integration.
Would you think it could be doable in prolog ?
Or do you know of a technology which might be better suited for the
job ?

A. K.

unread,

Feb 14, 2012, 4:10:18 AM2/14/12

to

IMHO one should use the right tools for the job, not force the job to
fit the tools. Prolog's strength is finding solutions for goals in a
world consisting of facts (databases) and rules/constraints. The better
if the whole database fits into main memory.

FWIW here's an article using F#
http://msdn.microsoft.com/en-us/magazine/gg598923.aspx

And of course the classic RDB way
http://dev.mysql.com/doc/refman/5.0/en/pattern-matching.html

Chip Eastham

unread,

Feb 14, 2012, 7:01:38 AM2/14/12

to

Please tell us a bit more about the goal.

I tend to think of query-by-example as a way of presenting
relational database access (e.g. report generation) to users
who would be baffled by SQL.

In that respect QBE strikes me as an illustration of the
"garden path fallacy," that with a little effort one gets
some simple results, falsely suggesting that with more
effort alone one can "reach the mountains." Instead one
finds that QBE proves to be mostly a dead-end in meeting
real world user demands.

However it can be a flashy demo, so if that's the goal,
then it might be worth investing development time.

On the other hand there are a lot of tools for RDBMS
which already feature some QBE functionality. Is it
really the right wheel for you to reinvent?

So, back to my question at the top of this post...
what is the goal you are trying to reach?

Prolog should be an excellent tool for generating
SQL code in a parametric way and/or for looking
for ways to optimize SQL (index generation, etc.).
So if the goal was less about a simplified user
interface and more about back-end optimization
and/or "table-driven" application development,
then I'd be inclined to push Prolog as a credible
tool for the job.

regards, chip

manimal45

unread,

Feb 14, 2012, 7:46:17 AM2/14/12

to

Let me go a step further in my requirement.
I work data migration projects.
At the beginning of a data migration project, we have something called
a data mapping which expresses
how a program might operate source data in order to produce target
data.

This is usually done in plain english with spreadsheet support.
The mapping process itself is complex, error-prone.
But the main issue is that of the slowness.
Business users define the mapping, developers code, and one week
later, business users get the data back,
and usually we need to do one more loop because
- developers did not capture the mapping intend
- business users missed something

What I think would be a better approach would be to accelerate the
loop by giving business users
means to express the mapping simply.
As they do not practice SQL, the simple way is to go with examples.
So, in my idea, business users give some example of source data, and
target data
and a 'program' interactively would tell them :
- here's the query or queries
- here's the program explanation (in planin english).

We don't need too much data as input is manually filled.
But we need something clever that would find the mapping.
It's not exactly "query by example", it's more a query discovery.

From a technical stand point, the program would be 2 fold :
- first, given the source tables and target table primary keys, find
the query that when applied to the source keys
give the same target key. This is important because, most of the
time, the scalar mapping is simple (e.g : concat firstname and
lastname) bu the vectorial mapping,i.e rowset-based, is not (e.g :
what joins are applied, what group bys, what transpositions, pivot ?)
- next, find the scalar mappings, that is the transformations to apply
on a row-based level( not rowset level).

Moreover, we have some kind of tree to explore.
The initial state is (source1, source2, ..., sourcen, metadata), where
- source1 ... are example tables
- meta defines the sources metadata (mainly discovered keys and fkeys)

We have transitions between states :
- we can apply joins
- group bys
by iterating the subkeys and th different relationships

Finally, we have a stop condition, either we found the target keys or
we have iterated too much (like in recursive breadth first search).

Is this a typical prolog problem ?

Message has been deleted