A Strange Use of UDFs?

Garth Wells

unread,

Dec 2, 2007, 11:18:52 PM12/2/07

to

I started a new project recently and the .Net/SQL Server 2000
application I was hired to help with was developed by a large
consulting firm. All the data access for the application is done via
stored procedures, but most stored procedure data access is done
via functions. So, you might have a SELECT inside a stored
procedure that looks similar to this:

SELECT e.LastName, et.TerritoryDescription, en.NADescription
FROM Employees e
LEFT JOIN fxEmpNat() en ON e.EmployeeID = en.EmployeeID
LEFT JOIN fxEmpTer() et ON e.EmployeeID = et.EmployeeID
WHERE e.LastName = 'Fuller'

And the function definitions are *similar* to those created in the
script listed at the end of this post (using the Northwind database).
I would solve the same problem using this approach:

SELECT e.LastName, t.TerritoryDescription, n.NADescription
FROM dbo.Employees e
LEFT JOIN dbo.EmployeeTerritories et ON e.EmployeeID = et.EmployeeID
JOIN dbo.Territories t ON et.TerritoryID = t.TerritoryID
LEFT JOIN dbo.EmployeeNationality en ON e.EmployeeID = en.EmployeeID
JOIN dbo.Nationality n ON en.NationalityID = n.NationalityID
WHERE e.LastName = 'Fuller'

The original developers are long gone and no one currently working
on the project knows why the function-centric approach was used.
There are no security restrictions that would merit such an approach,
and even if there were I would think a VIEW solution would be the
more traditional approach. And there are only a couple of complicated
relationships that one might want to "hide" from a less experienced
developer that did not know the intricacies of the data. Anybody have
any ideas when this approach would be justified?

-- Addl. Table and Function Definitions
CREATE TABLE Nationality
(
NationalityID int PRIMARY KEY,
NADescription varchar(20) NOT NULL
)
go
CREATE INDEX IX_Nationality_NADescription ON Nationality(NADescription)
go
INSERT Nationality VALUES (1,'America')
INSERT Nationality VALUES (2,'Canada')
INSERT Nationality VALUES (3,'Angola')
go

CREATE TABLE EmployeeNationality
(
EmployeeID int,
NationalityID int,
CreateDate datetime
CONSTRAINT PK_EmployeeNationality
PRIMARY KEY NONCLUSTERED
(EmployeeID,NationalityID,CreateDate)
)
go
INSERT EmployeeNationality values(1,1,'01/01/80')
INSERT EmployeeNationality values(1,1,'01/01/90')
INSERT EmployeeNationality values(2,1,'01/01/90')
INSERT EmployeeNationality values(3,2,'01/01/90')
INSERT EmployeeNationality values(4,3,'01/01/90')
go

CREATE FUNCTION fxEmpNat()

RETURNS TABLE

AS

RETURN (
SELECT e.EmployeeID, n.NADescription
FROM dbo.Employees e
JOIN dbo.EmployeeNationality en
ON e.EmployeeID = en.EmployeeID
AND en.CreateDate = (SELECT MAX(en2.CreateDate)
FROM EmployeeNationality en2
WHERE en.EmployeeID = en2.EmployeeID
AND en.NationalityID = en2.NationalityID )
JOIN dbo.Nationality n ON en.NationalityID = n.NationalityID
)
go

CREATE FUNCTION fxEmpTer()

RETURNS TABLE

AS

RETURN (
SELECT e.EmployeeID, t.TerritoryDescription
FROM dbo.Employees e
JOIN dbo.EmployeeTerritories et ON e.EmployeeID = et.EmployeeID
JOIN dbo.Territories t ON et.TerritoryID = t.TerritoryID
)
go

Uri Dimant

unread,

Dec 3, 2007, 1:09:55 AM12/3/07

to

Garth
Those udfs are the same as VIEWs. Test it and see what approach performs
better

"Garth Wells" <nob...@nowhere.com> wrote in message
news:%23uZuXOW...@TK2MSFTNGP02.phx.gbl...

Erland Sommarskog

unread,

Dec 3, 2007, 3:10:53 AM12/3/07

to

Garth Wells (nob...@nowhere.com) writes:
> The original developers are long gone and no one currently working
> on the project knows why the function-centric approach was used.
> There are no security restrictions that would merit such an approach,
> and even if there were I would think a VIEW solution would be the
> more traditional approach. And there are only a couple of complicated
> relationships that one might want to "hide" from a less experienced
> developer that did not know the intricacies of the data. Anybody have
> any ideas when this approach would be justified?

Views or inline functions are really not much different. The latter takes
parameters, but else they are the same. (Multi-statement functions and
scalar functions are another matter.)

Personally, I've always been a little skeptic to using views and functions.
There is a risk that a developer thinks "hey, this view has some columns
I look for, and that view has a few more". But the views has some tables
in common, so the resulting query becomes unnecessarily complex and
inefficient.

An advantage with using functions over views, is that the parenthesis
clearly stand out, so you know when you see the query that it is a function.

--
Erland Sommarskog, SQL Server MVP, esq...@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx

--CELKO--

unread,

Dec 3, 2007, 1:29:09 PM12/3/07

to

>> Views or inline functions are really not much different. The latter takes parameters, but else they are the same. <<

The optimizer can take a VIEW, expand it in-line and optimize the
query using it. I don't think that you can do that with the text of a
UDF, can you?

I also stuck in SQL Server with this proprietary syntax, people who
don't know the dialect have to maintain it, etc.

--CELKO--

unread,

Dec 3, 2007, 1:29:33 PM12/3/07

to

One reason I can think of is job security. This code will never port
and can be read only by dialect speakers. A function call cannot be
optimized like a VIEW, so you are at risk for poor performance, as
well as the maintenance problems, as time goes on.

Another reason is that they are not SQL programmers and do not think
in terms of declarations. They want to see the familiar function call
they know from procedural languages.

Russell Fields

unread,

Dec 3, 2007, 2:42:49 PM12/3/07

to

Joe,

You may be right in that the procedural language paradigm led them down this
route. However, you might be interested that in SQL Server oneclass of UDFs
(in-line table valued functions) does indeed expand in-line (thus the name)
and benefits from optimization.

RLF

"--CELKO--" <jcel...@earthlink.net> wrote in message
news:dae05c2d-b7e8-4344...@y43g2000hsy.googlegroups.com...

Erland Sommarskog

unread,

Dec 3, 2007, 6:14:22 PM12/3/07

to

--CELKO-- (jcel...@earthlink.net) writes:
>>> Views or inline functions are really not much different. The latter
takes parameters, but else they are the same. <<
>
> The optimizer can take a VIEW, expand it in-line and optimize the
> query using it. I don't think that you can do that with the text of a
> UDF, can you?

That is precisly what SQL Server can do with an inline table function.
An inline table function is a parameterised view with another name.

JRStern

unread,

Dec 4, 2007, 6:05:00 PM12/4/07

to

I know why.

>AND en.CreateDate = (SELECT MAX(en2.CreateDate)

That's why.

Your data model has surrogate keys galore, because the natural keys are not
unique unless they also contain the date field, which they probably don't.

The ugliness of having a temporal dimension to your data needs
encapsulating, by view or functions. When you want to query by a time
function, the view wouldn't work, hence a more general use of UDFs.

Been there, done that, got the t-shirt.

Josh

--CELKO--

unread,

Dec 4, 2007, 9:21:28 PM12/4/07

to

>> When you want to query by a time function, the view wouldn't work, hence a more general use of UDFs. <<

Reasonable guess as to their mindset. But if you follow the
recommended pattern (Snodgrass, et al), you have durations (start,
end) pairs, a VIEW with the current status and a Calendar table in the
schema. No need for UDFs at all.

Alex Kuznetsov

unread,

Dec 5, 2007, 10:06:19 AM12/5/07

to

On Dec 3, 12:29 pm, --CELKO-- <jcelko...@earthlink.net> wrote:
> and can be read only by dialect speakers. A function call cannot be
> optimized like a VIEW,

Wrong. It can.

JRStern

unread,

Dec 5, 2007, 7:51:00 PM12/5/07

to

Probably best if they followed that pattern, BUT first, they think they may
need to do it by parameter for a non-current time, so they want to
encapsulate, and second, I don't really like that pattern, it puts a huge
burden on the data model and optimizer, I prefer to keep archive in a
separate table.

Half or more of these issues come up regarding SOX, people want an audit
trail and are afraid to ever delete anything. The other half comes from some
kind of legitimate temporal requirement, but in about four or five go-rounds
on this now in different shops, I've NEVER seen anyone do the analysis and
implementation with logical correctness or completeness, they always shoot
for way too much, underestimate the costs, and generally kill performance as
a side-effect.

When oh when will a database vendor implement the Snodgrass model - or their
own similar temporal models - in some push-button way, that hides the
temporal fields, defaults to current, and makes the hidden ones appear only
on demand?

Given that in twenty years they haven't moved on this or hardly on any kind
of object/relational technology, I ain't holding my breath.

Josh

--CELKO--

unread,

Dec 6, 2007, 2:10:56 PM12/6/07

to

>> Probably best if they followed that pattern, BUT first, they think they may need to do it by parameter for a non-current time, so they want to encapsulate, and second, I don't really like that pattern, it puts a huge burden on the data model and optimizer, I prefer to keep archive in a separate table.<<

I agree about the archives, and I want to put it into a Data
Warehouse, too. You usually have three classes of temporal data --
current, recent and archive. The trick is with the recent data; it it
one year ago? A week? A day? Hours ago? And as usual, the answer is
that it all depends ...

>> Half or more of these issues come up regarding SOX, people want an audit trail and are afraid to ever delete anything. <<

I was hoping that maybe the Democrats would make some effort to fix
this mess. It is not just an audit trail -- you have to explain every
tiny little thing in detail. <insert rant her>

>> The other half comes from some kind of legitimate temporal requirement, but in about four or five go-rounds on this now in different shops, I've NEVER seen anyone do the analysis and implementation with logical correctness or completeness, they always shoot for way too much, underestimate the costs, and generally kill performance as a side-effect. <<

Agreed.

JRStern

unread,

Dec 6, 2007, 6:37:00 PM12/6/07

to

"--CELKO--" wrote:

> >> Probably best if they followed that pattern, BUT first, they think they may need to do it by parameter for a non-current time, so they want to encapsulate, and second, I don't really like that pattern, it puts a huge burden on the data model and optimizer, I prefer to keep archive in a separate table.<<
>
> I agree about the archives, and I want to put it into a Data
> Warehouse, too. You usually have three classes of temporal data --
> current, recent and archive. The trick is with the recent data; it it
> one year ago? A week? A day? Hours ago? And as usual, the answer is
> that it all depends ...

Well yeah, but with the general temporal pattern, current is the current
row(s), "recent" is any non-current row, typically one year or your disk's
capacity or you haven't actually written the book-off to warehouse code yet,
whichever comes first.

> >> Half or more of these issues come up regarding SOX, people want an audit trail and are afraid to ever delete anything. <<
>
> I was hoping that maybe the Democrats would make some effort to fix
> this mess. It is not just an audit trail -- you have to explain every

> tiny little thing in detail. <insert rant here>

At the risk of falling off-topic, I'm not at all certain SOX wants you to
explain every little thing, and it should really have minimal impact on IT,
*but* for lack of wanting to address the real ethical and governance impact
it was supposed to have, the suits have decided to pretend it's really an IT
problem, bought a few terabytes of SAN, and told the grubs to save everything
<insert rant here>

> >> The other half comes from some kind of legitimate temporal requirement, but in about four or five go-rounds on this now in different shops, I've NEVER seen anyone do the analysis and implementation with logical correctness or completeness, they always shoot for way too much, underestimate the costs, and generally kill performance as a side-effect. <<
>
> Agreed.

So, how about the idea of actually tailoring physical-level storage to
facilitate this obnoxious new requirement, and building in support for it all
through the data-modeling system? IOW, extending the relational model
officially in these directions. Which I think is further than Snodgrass has
ever tried to take it.

Josh

--CELKO--

unread,

Dec 6, 2007, 10:18:44 PM12/6/07

to

>> So, how about the idea of actually tailoring physical-level storage to facilitate this obnoxious new requirement, and building in support for it all through the data-modeling system? <<

That was the idea of Red Brick and the DW products that followed it ..

>> IOW, extending the relational model officially in these directions. Which I think is further than Snodgrass has ever tried to take it. <<

When Rick was on X3H2, he proposed then withdrew a very elaborate
temporal model for the SQL Standard based on his work with the
temporal DBs in the research labs at the time. Unfortunately, this
was not an add-on for the existing products and would have meant
throwing out existing engines.

JRStern

unread,

Dec 11, 2007, 1:16:01 PM12/11/07

to

"--CELKO--" wrote:

> >> So, how about the idea of actually tailoring physical-level storage to facilitate this obnoxious new requirement, and building in support for it all through the data-modeling system? <<
>
> That was the idea of Red Brick and the DW products that followed it ..

Aha, I did not know that.

> >> IOW, extending the relational model officially in these directions. Which I think is further than Snodgrass has ever tried to take it. <<
>
> When Rick was on X3H2, he proposed then withdrew a very elaborate
> temporal model for the SQL Standard based on his work with the
> temporal DBs in the research labs at the time. Unfortunately, this
> was not an add-on for the existing products and would have meant
> throwing out existing engines.

I was reading his stuff back when, but never really followed that closely.
But my point was not really how it fit with the engines, so much as the
algebraic theory.

On the other hand, I can really see a set of design tools that honors (that
is, hides!) the temporal dimension and still works on current engines.

Josh