odd join behavior

29 views
Skip to first unread message

Stephen Graham

unread,
Dec 3, 2012, 3:41:05 PM12/3/12
to ql...@googlegroups.com
Set up:  include the attached file in a webapp subdirectory called testfeed.

I initially wrote a simple join between the atom:feed and the mapping info, using a script variable to contain the map:

mapping = [
  {
    "name" : "A",
    "user" : "engineering"
  },
  {
    "name" : "B",
    "user" : "pre-sales"
  }
];

create table test.feed on select get from "http://localhost:8080/testfeed/test.xml" resultset "atom:feed.atom:entry";

select test.atom:content.vsc:Thing.vsc:displayName as t:displayName,
       umap.name as umap_name,
       test.atom:content.vsc:Thing.inst:p1 as t:p1,
       umap.user as user

  from test.feed as test, mapping as umap
  where umap.name = test.atom:content.vsc:Thing.vsc:displayName;


The response is what I expected:
[
  {
    "t:displayName": "A",
    "umap_name": "A",
    "t:p1": "A property1",
    "user": "engineering"
  },
  {
    "t:displayName": "B",
    "umap_name": "B",
    "t:p1": "B property1",
    "user": "pre-sales"
  }
]

When I converted the script to instead pull the mapping file in from an http call (to the same data contained in a file in the /testfeed webapp called user-map.json), I get different join results:
create table mapping on select get from "http://localhost:8080/testfeed/user-map.json"  ;

The exact same select statement seems to create a union instead of a join:

[

    {
        "t:displayName": "A",
        "umap_name": "A",
        "t:p1": "A property1",
        "user": "engineering"
    },
    {
        "t:displayName": "A",
        "umap_name": "B",
        "t:p1": "A property1",
        "user": "pre-sales"
    },
    {
        "t:displayName": "B",
        "umap_name": "A",
        "t:p1": "B property1",
        "user": "engineering"
    },
    {
        "t:displayName": "B",
        "umap_name": "B",
        "t:p1": "B property1",
        "user": "pre-sales"
    },
    {
        "t:displayName": "C",
        "umap_name": "A",
        "t:p1": "C property1",
        "user": "engineering"
    },
    {
        "t:displayName": "C",
        "umap_name": "B",
        "t:p1": "C property1",
        "user": "pre-sales"
    }

]


What am I missing that would explain why these two scripts give different results?

thanks in advance
sgg
test.xml

shimonchayim

unread,
Dec 4, 2012, 1:08:26 AM12/4/12
to ql...@googlegroups.com
Join is a bit of work in progress. typically the whole data-structure is treated like a single row hence joining it with service table response may be a little weird. A way around this would be to use udf to filter what you want.

Stephen Graham

unread,
Dec 4, 2012, 9:46:21 AM12/4/12
to ql...@googlegroups.com
So should I add this as a bug?  It is surprising that when the join condition is "A = B" would result in responses where field A and field B are clearly not equal (as shown in my first POST on this thread).

Is making the join semantics tighter a high priority for current ql.io work?  It seems this is a core capability to join feeds of info from one or more tables, but it does not seem to be reliable at the moment.  That is very distressing!

I don't quite see how a UDF would help me here, basically are you suggesting that I somehow pull the entire where clause into the UDF function and recompute the predicates to see if the given row should be included in the result set? 

shimonchayim

unread,
Dec 4, 2012, 12:38:36 PM12/4/12
to ql...@googlegroups.com
Stephen,

I understand your frustration but as I mentioned earlier data structures and service backed table results are treated differently hence this behavior. When you join results from two service backed table results it behaves the way you expect. Example:


or you can paste the following:

select e.ItemID as id, e.Title as title, e.ViewItemURLForNaturalSearch as url, g.geometry.location as latlng
  from details  as e, google.geocode as g
  where e.itemId in (select itemId from finditems where keywords = "iPad")
        and g.address = e.Location


Sure you can put an issue and we will surely looking into it (we are quite swamped at the moment though and will prioritize this).

The way udf in where works (i.e on the result returned by service backed table) is that it will give you a row of data that you can choose to keep or ignore. More like foreach().

- Cylus
Message has been deleted

shimonchayim

unread,
Dec 5, 2012, 1:43:51 AM12/5/12
to ql...@googlegroups.com
As suggested by another user for issue 572

I tried this and it worked.

mapping = [
{
"name" : "A",
"user" : "engineering"
},
{
"name" : "B",
"user" : "pre-sales"
}
];

data = [{
    "t:displayName": "A",
    "p1": "A property1",
    "user": "engineering"
},
{
    "t:displayName": "A",
    "p1": "A property1",
    "user": "pre-sales"
},
{
    "t:displayName": "B",
    "p1": "B property1",
    "user": "engineering"
},
{
    "t:displayName": "B",
    "p1": "B property1",
    "user": "pre-sales"
},
{
    "t:displayName": "C",
    "p1": "C property1",
    "user": "engineering"
},
{
    "t:displayName": "C",
    "p1": "C property1",
    "user": "pre-sales"
}
];

return select d.t:displayName as name, m.user as user from data as d, mapping as m where d.t:displayName = m.name;

result:
[
    • {
      • "name""A",
      • "user""engineering"
      },
    • {
      • "name""A",
      • "user""engineering"
      },
    • {
      • "name""B",
      • "user""pre-sales"
      },
    • {
      • "name""B",
      • "user""pre-sales"
      }
      ]

      So you can copy your service backed table results in vars and then do joins across them. there is no significant performance overhead for doing v/s what you were trying.

      - Cylus

      Stephen Graham

      unread,
      Dec 5, 2012, 9:01:20 AM12/5/12
      to ql...@googlegroups.com
      Agreed.

      I ended up with
      create table map on select get from "http://localhost:8080/testfeed/user-map.json"
      mapping = select * from map;


      create table test.feed on select get from "http://localhost:8080/testfeed/test.xml" resultset "atom:feed.atom:entry";
      feed = select atom:content.vsc:Thing from test.feed;

      select test.vsc:displayName as t:displayName,
             umap.name as umap:name,

             test.atom:content.vsc:Thing.inst:p1 as t:p1,
             umap.user as umap:user

        from feed as test, mapping as umap
        where umap.name = test.vsc:displayName;

      Basically replacing the hard coded JSON in the script with a create table and a select statement.

      Thanks for your help.
      sgg

      shimonchayim

      unread,
      Dec 5, 2012, 1:29:14 PM12/5/12
      to ql...@googlegroups.com
      Glad it worked! Thanks for the issues you logged in github.
      Reply all
      Reply to author
      Forward
      0 new messages