Oracle Left Outer Join problem - first occurance of a match

trp...@gmail.com

unread,

Oct 28, 2008, 1:24:58 PM10/28/08

to

I am trying to left outer join to a table with mismatched results...
let me explain

Table: first_names
id first
1 Tom
2 Mike
3 Jack

Table: last_names
id last
1 Jones
2 Smith
2 Smiths

When I query the tables here is what I get

SELECT a.id, a.first, b.last FROM first_names a, last_names b
WHERE a.id=b.id(+);

Here is what I get

id first last
1 Tom Jones
2 Mike Smith
2 Mike Smiths
3 Jack

What I really want to get is 3 results:
id first last
1 Tom Jones
2 Mike Smith
3 Jack

I just want to join to the first occurance of the id in the second
table, no matter what the result is.

I have simplified this query a good deal to show the problem I am
having, so the data set should not be taken literally as I already see
there will be questions as to why there would be duplicate id's with
different last names, but without going into great detail on the
actual query this is necessary for my circumstances.

So is there a way to do a join and only pick up the first occurance of
a match on the second table?

Thanks!!

ddf

unread,

Oct 28, 2008, 4:43:24 PM10/28/08

to

Possibly this will work:

SQL> create table first_names(
2 id number,
3 first varchar2(20)
4 );

Table created.

SQL>
SQL> create table last_names(
2 id number,
3 last varchar2(30)
4 );

Table created.

SQL>
SQL> insert all
2 into first_names
3 values (1, 'Tom')
4 into first_names
5 values (2, 'Mike')
6 into first_names
7 values (3, 'Jack')
8 into last_names
9 values (1, 'Jones')
10 into last_names
11 values (2, 'Smith')
12 into last_names
13 values (2, 'Smiths')
14 select * From dual;

6 rows created.

SQL>
SQL> commit;

Commit complete.

SQL>
SQL> select f.id, f.first, l.last
2 from
3 first_names f left outer join
4 (select id, last
5 from
6 (select id, last, dense_rank() over (order by id, last)
rk
7 from last_names)
8 where rk = id) l
9 on l.id = f.id
10 /

ID FIRST LAST
---------- -------------------- ------------------------------

1 Tom Jones
2 Mike Smith
3 Jack

SQL>

David Fitzjarrell

trp...@gmail.com

unread,

Oct 28, 2008, 7:52:33 PM10/28/08

to

> David Fitzjarrell- Hide quoted text -
>
> - Show quoted text -

Thanks! This is close, however the id's actually are non linear and
look something like 1001922 so rank doesn't work so well

So I guess to make my example better reflect what I am doing, the
tables should look more like this

SQL> insert all
> 2 into first_names

> 3 values (1001922, 'Tom')
> 4 into first_names
> 5 values (1001957, 'Mike')
> 6 into first_names
> 7 values (1001988, 'Jack')
> 8 into last_names
> 9 values (1001922, 'Jones')
> 10 into last_names
> 11 values (1001957, 'Smith')
> 12 into last_names
> 13 values (1001957, 'Smiths')

> 14 select * From dual;

Table: first_names
id first
1001922 Tom
1001957 Mike
1001988 Jack

Table: last_names
id last
1001922 Jones
1001957 Smith
1001957 Smiths

ddf

unread,

Oct 29, 2008, 11:47:09 AM10/29/08

to

> 1001957 Smiths- Hide quoted text -

>
> - Show quoted text -

Using RANK() in place of DENSE_RANK() and using only the ID I get this
with your 'new' data:

SQL> select f.id, f.first, l.last
2 from
3 first_names f left outer join
4 (select id, last
5 from

6 (select id, last, rank() over (order by id) rk
7 from last_names)
8 where rk = rownum) l

9 on l.id = f.id
10 /

ID FIRST LAST
---------- -------------------- ------------------------------

1001922 Tom Jones
1001957 Mike Smiths
1001988 Jack

SQL>

David Fitzjarrell

ddf

unread,

Oct 29, 2008, 11:52:47 AM10/29/08

to

Of couse that probably won't work once you get past the first set of
duplicates; providing such a limited set of test data can create
'solutions' which work for that set alone, and won't solve the issue
in an extended set of data:

SQL> select f.id, f.first, l.last
2 from
3 first_names f left outer join
4 (select id, last
5 from
6 (select id, last, rank() over (order by id) rk
7 from last_names)
8 where rk = rownum) l
9 on l.id = f.id
10 /

ID FIRST LAST
---------- -------------------- ------------------------------
1001922 Tom Jones
1001957 Mike

1001988 Jack

SQL>

You might be better served with a PL/SQL implementation, either
through an anonymous block or via a stored procedure.

David Fitzjarrell

Bob Jones

unread,

Oct 29, 2008, 9:37:13 PM10/29/08

to

<trp...@gmail.com> wrote in message
news:9290db80-0c54-46e3...@v13g2000pro.googlegroups.com...

By first occurrence, do you mean the row with the smallest rowid? As you may
know, the use of rowid in queries is not a good practice. The result of this
query will vary depending on which order rows are stored. Sorry it just
makes no application sense to me.

trp...@gmail.com

unread,

Oct 30, 2008, 3:45:47 PM10/30/08

to

I got an email from z1h...@gmail.com and this works:

SELECT f.id, f.first, l.second
FROM first_names f, (SELECT min(l1.id),l1.last_name FROM last_names
l1
group by l1.last_name ) l
WHERE f.id = l."MIN(l1.id)"
UNION ALL
SELECT f.id, f.first,NULL
FROM first_names f
WHERE f.id NOT IN (SELECT l.id FROM last_names l)

Thanks!!

trp...@gmail.com

unread,

Oct 30, 2008, 7:15:44 PM10/30/08

to

> makes no application sense to me.- Hide quoted text -

>
> - Show quoted text -

Yeah I knew it would raise questions... I did not design the original
database which I have some issues with, but am just trying to run a
report from it... In the database there will be one login id, shared
by many users, but differntiated by login/password primary key
combination.. i.e.:

login_id pass
user1 11111
user1 22222
user1 33333
user2 44444
user3 55555

so what I am doing is getting a total from a different join based on
the login_id i.e.

login_id count
user1 5000
user2 1050
user3 2200

Now the last piece is I wanted to tie a company name to the login_id
as that is how logins are grouped. The problem is whoever designed the
database assigned a company name to each login/password pair.

The company name is manually entered and I noticed that sometimes
there is a small typo, which results in duplicate entries when I join
the data. i.e.

login_id count company
user1 5000 Microsoft
user1 5000 Micro soft
user2 1050 Google
user3 2200 Yahoo

This is an internal report and I am not real picky if the result shows
the typo or shows the correct name that is why I just want the first
one, whatever it may be.

I know you could argue fix the data or fix the design, but I am just
working with what I have for a very legacy system.

Thanks for all your assistance!