I started a new project recently and the .Net/SQL Server 2000 application I was hired to help with was developed by a large consulting firm. All the data access for the application is done via stored procedures, but most stored procedure data access is done via functions. So, you might have a SELECT inside a stored procedure that looks similar to this:
SELECT e.LastName, et.TerritoryDescription, en.NADescription FROM Employees e LEFT JOIN fxEmpNat() en ON e.EmployeeID = en.EmployeeID LEFT JOIN fxEmpTer() et ON e.EmployeeID = et.EmployeeID WHERE e.LastName = 'Fuller'
And the function definitions are *similar* to those created in the script listed at the end of this post (using the Northwind database). I would solve the same problem using this approach:
SELECT e.LastName, t.TerritoryDescription, n.NADescription FROM dbo.Employees e LEFT JOIN dbo.EmployeeTerritories et ON e.EmployeeID = et.EmployeeID JOIN dbo.Territories t ON et.TerritoryID = t.TerritoryID LEFT JOIN dbo.EmployeeNationality en ON e.EmployeeID = en.EmployeeID JOIN dbo.Nationality n ON en.NationalityID = n.NationalityID WHERE e.LastName = 'Fuller'
The original developers are long gone and no one currently working on the project knows why the function-centric approach was used. There are no security restrictions that would merit such an approach, and even if there were I would think a VIEW solution would be the more traditional approach. And there are only a couple of complicated relationships that one might want to "hide" from a less experienced developer that did not know the intricacies of the data. Anybody have any ideas when this approach would be justified?
-- Addl. Table and Function Definitions CREATE TABLE Nationality ( NationalityID int PRIMARY KEY, NADescription varchar(20) NOT NULL ) go CREATE INDEX IX_Nationality_NADescription ON Nationality(NADescription) go INSERT Nationality VALUES (1,'America') INSERT Nationality VALUES (2,'Canada') INSERT Nationality VALUES (3,'Angola') go
RETURN ( SELECT e.EmployeeID, n.NADescription FROM dbo.Employees e JOIN dbo.EmployeeNationality en ON e.EmployeeID = en.EmployeeID AND en.CreateDate = (SELECT MAX(en2.CreateDate) FROM EmployeeNationality en2 WHERE en.EmployeeID = en2.EmployeeID AND en.NationalityID = en2.NationalityID ) JOIN dbo.Nationality n ON en.NationalityID = n.NationalityID ) go
CREATE FUNCTION fxEmpTer()
RETURNS TABLE
AS
RETURN ( SELECT e.EmployeeID, t.TerritoryDescription FROM dbo.Employees e JOIN dbo.EmployeeTerritories et ON e.EmployeeID = et.EmployeeID JOIN dbo.Territories t ON et.TerritoryID = t.TerritoryID ) go
>I started a new project recently and the .Net/SQL Server 2000 > application I was hired to help with was developed by a large > consulting firm. All the data access for the application is done via > stored procedures, but most stored procedure data access is done > via functions. So, you might have a SELECT inside a stored > procedure that looks similar to this:
> SELECT e.LastName, et.TerritoryDescription, en.NADescription > FROM Employees e > LEFT JOIN fxEmpNat() en ON e.EmployeeID = en.EmployeeID > LEFT JOIN fxEmpTer() et ON e.EmployeeID = et.EmployeeID > WHERE e.LastName = 'Fuller'
> And the function definitions are *similar* to those created in the > script listed at the end of this post (using the Northwind database). > I would solve the same problem using this approach:
> SELECT e.LastName, t.TerritoryDescription, n.NADescription > FROM dbo.Employees e > LEFT JOIN dbo.EmployeeTerritories et ON e.EmployeeID = et.EmployeeID > JOIN dbo.Territories t ON et.TerritoryID = t.TerritoryID > LEFT JOIN dbo.EmployeeNationality en ON e.EmployeeID = en.EmployeeID > JOIN dbo.Nationality n ON en.NationalityID = n.NationalityID > WHERE e.LastName = 'Fuller'
> The original developers are long gone and no one currently working > on the project knows why the function-centric approach was used. > There are no security restrictions that would merit such an approach, > and even if there were I would think a VIEW solution would be the > more traditional approach. And there are only a couple of complicated > relationships that one might want to "hide" from a less experienced > developer that did not know the intricacies of the data. Anybody have > any ideas when this approach would be justified?
> -- Addl. Table and Function Definitions > CREATE TABLE Nationality > ( > NationalityID int PRIMARY KEY, > NADescription varchar(20) NOT NULL > ) > go > CREATE INDEX IX_Nationality_NADescription ON Nationality(NADescription) > go > INSERT Nationality VALUES (1,'America') > INSERT Nationality VALUES (2,'Canada') > INSERT Nationality VALUES (3,'Angola') > go
> RETURN ( > SELECT e.EmployeeID, n.NADescription > FROM dbo.Employees e > JOIN dbo.EmployeeNationality en > ON e.EmployeeID = en.EmployeeID > AND en.CreateDate = (SELECT MAX(en2.CreateDate) > FROM EmployeeNationality en2 > WHERE en.EmployeeID = en2.EmployeeID > AND en.NationalityID = en2.NationalityID ) > JOIN dbo.Nationality n ON en.NationalityID = n.NationalityID > ) > go
> CREATE FUNCTION fxEmpTer()
> RETURNS TABLE
> AS
> RETURN ( > SELECT e.EmployeeID, t.TerritoryDescription > FROM dbo.Employees e > JOIN dbo.EmployeeTerritories et ON e.EmployeeID = et.EmployeeID > JOIN dbo.Territories t ON et.TerritoryID = t.TerritoryID > ) > go
Garth Wells (nob...@nowhere.com) writes: > The original developers are long gone and no one currently working > on the project knows why the function-centric approach was used. > There are no security restrictions that would merit such an approach, > and even if there were I would think a VIEW solution would be the > more traditional approach. And there are only a couple of complicated > relationships that one might want to "hide" from a less experienced > developer that did not know the intricacies of the data. Anybody have > any ideas when this approach would be justified?
Views or inline functions are really not much different. The latter takes parameters, but else they are the same. (Multi-statement functions and scalar functions are another matter.)
Personally, I've always been a little skeptic to using views and functions. There is a risk that a developer thinks "hey, this view has some columns I look for, and that view has a few more". But the views has some tables in common, so the resulting query becomes unnecessarily complex and inefficient.
An advantage with using functions over views, is that the parenthesis clearly stand out, so you know when you see the query that it is a function.
-- Erland Sommarskog, SQL Server MVP, esq...@sommarskog.se
One reason I can think of is job security. This code will never port and can be read only by dialect speakers. A function call cannot be optimized like a VIEW, so you are at risk for poor performance, as well as the maintenance problems, as time goes on.
Another reason is that they are not SQL programmers and do not think in terms of declarations. They want to see the familiar function call they know from procedural languages.
You may be right in that the procedural language paradigm led them down this route. However, you might be interested that in SQL Server oneclass of UDFs (in-line table valued functions) does indeed expand in-line (thus the name) and benefits from optimization.
RLF
"--CELKO--" <jcelko...@earthlink.net> wrote in message
> One reason I can think of is job security. This code will never port > and can be read only by dialect speakers. A function call cannot be > optimized like a VIEW, so you are at risk for poor performance, as > well as the maintenance problems, as time goes on.
> Another reason is that they are not SQL programmers and do not think > in terms of declarations. They want to see the familiar function call > they know from procedural languages.
--CELKO-- (jcelko...@earthlink.net) writes: >>> Views or inline functions are really not much different. The latter
takes parameters, but else they are the same. <<
> The optimizer can take a VIEW, expand it in-line and optimize the > query using it. I don't think that you can do that with the text of a > UDF, can you?
That is precisly what SQL Server can do with an inline table function. An inline table function is a parameterised view with another name.
-- Erland Sommarskog, SQL Server MVP, esq...@sommarskog.se
Your data model has surrogate keys galore, because the natural keys are not unique unless they also contain the date field, which they probably don't.
The ugliness of having a temporal dimension to your data needs encapsulating, by view or functions. When you want to query by a time function, the view wouldn't work, hence a more general use of UDFs.
"Garth Wells" wrote: > I started a new project recently and the .Net/SQL Server 2000 > application I was hired to help with was developed by a large > consulting firm. All the data access for the application is done via > stored procedures, but most stored procedure data access is done > via functions. So, you might have a SELECT inside a stored > procedure that looks similar to this:
> SELECT e.LastName, et.TerritoryDescription, en.NADescription > FROM Employees e > LEFT JOIN fxEmpNat() en ON e.EmployeeID = en.EmployeeID > LEFT JOIN fxEmpTer() et ON e.EmployeeID = et.EmployeeID > WHERE e.LastName = 'Fuller'
> And the function definitions are *similar* to those created in the > script listed at the end of this post (using the Northwind database). > I would solve the same problem using this approach:
> SELECT e.LastName, t.TerritoryDescription, n.NADescription > FROM dbo.Employees e > LEFT JOIN dbo.EmployeeTerritories et ON e.EmployeeID = et.EmployeeID > JOIN dbo.Territories t ON et.TerritoryID = t.TerritoryID > LEFT JOIN dbo.EmployeeNationality en ON e.EmployeeID = en.EmployeeID > JOIN dbo.Nationality n ON en.NationalityID = n.NationalityID > WHERE e.LastName = 'Fuller'
> The original developers are long gone and no one currently working > on the project knows why the function-centric approach was used. > There are no security restrictions that would merit such an approach, > and even if there were I would think a VIEW solution would be the > more traditional approach. And there are only a couple of complicated > relationships that one might want to "hide" from a less experienced > developer that did not know the intricacies of the data. Anybody have > any ideas when this approach would be justified?
> -- Addl. Table and Function Definitions > CREATE TABLE Nationality > ( > NationalityID int PRIMARY KEY, > NADescription varchar(20) NOT NULL > ) > go > CREATE INDEX IX_Nationality_NADescription ON Nationality(NADescription) > go > INSERT Nationality VALUES (1,'America') > INSERT Nationality VALUES (2,'Canada') > INSERT Nationality VALUES (3,'Angola') > go
> RETURN ( > SELECT e.EmployeeID, n.NADescription > FROM dbo.Employees e > JOIN dbo.EmployeeNationality en > ON e.EmployeeID = en.EmployeeID > AND en.CreateDate = (SELECT MAX(en2.CreateDate) > FROM EmployeeNationality en2 > WHERE en.EmployeeID = en2.EmployeeID > AND en.NationalityID = en2.NationalityID ) > JOIN dbo.Nationality n ON en.NationalityID = n.NationalityID > ) > go
> CREATE FUNCTION fxEmpTer()
> RETURNS TABLE
> AS
> RETURN ( > SELECT e.EmployeeID, t.TerritoryDescription > FROM dbo.Employees e > JOIN dbo.EmployeeTerritories et ON e.EmployeeID = et.EmployeeID > JOIN dbo.Territories t ON et.TerritoryID = t.TerritoryID > ) > go
>> When you want to query by a time function, the view wouldn't work, hence a more general use of UDFs. <<
Reasonable guess as to their mindset. But if you follow the recommended pattern (Snodgrass, et al), you have durations (start, end) pairs, a VIEW with the current status and a Calendar table in the schema. No need for UDFs at all.
Probably best if they followed that pattern, BUT first, they think they may need to do it by parameter for a non-current time, so they want to encapsulate, and second, I don't really like that pattern, it puts a huge burden on the data model and optimizer, I prefer to keep archive in a separate table.
Half or more of these issues come up regarding SOX, people want an audit trail and are afraid to ever delete anything. The other half comes from some kind of legitimate temporal requirement, but in about four or five go-rounds on this now in different shops, I've NEVER seen anyone do the analysis and implementation with logical correctness or completeness, they always shoot for way too much, underestimate the costs, and generally kill performance as a side-effect.
When oh when will a database vendor implement the Snodgrass model - or their own similar temporal models - in some push-button way, that hides the temporal fields, defaults to current, and makes the hidden ones appear only on demand?
Given that in twenty years they haven't moved on this or hardly on any kind of object/relational technology, I ain't holding my breath.
"--CELKO--" wrote: > >> When you want to query by a time function, the view wouldn't work, hence a more general use of UDFs. <<
> Reasonable guess as to their mindset. But if you follow the > recommended pattern (Snodgrass, et al), you have durations (start, > end) pairs, a VIEW with the current status and a Calendar table in the > schema. No need for UDFs at all.
>> Probably best if they followed that pattern, BUT first, they think they may need to do it by parameter for a non-current time, so they want to encapsulate, and second, I don't really like that pattern, it puts a huge burden on the data model and optimizer, I prefer to keep archive in a separate table.<<
I agree about the archives, and I want to put it into a Data Warehouse, too. You usually have three classes of temporal data -- current, recent and archive. The trick is with the recent data; it it one year ago? A week? A day? Hours ago? And as usual, the answer is that it all depends ...
>> Half or more of these issues come up regarding SOX, people want an audit trail and are afraid to ever delete anything. <<
I was hoping that maybe the Democrats would make some effort to fix this mess. It is not just an audit trail -- you have to explain every tiny little thing in detail. <insert rant her>
>> The other half comes from some kind of legitimate temporal requirement, but in about four or five go-rounds on this now in different shops, I've NEVER seen anyone do the analysis and implementation with logical correctness or completeness, they always shoot for way too much, underestimate the costs, and generally kill performance as a side-effect. <<
"--CELKO--" wrote: > >> Probably best if they followed that pattern, BUT first, they think they may need to do it by parameter for a non-current time, so they want to encapsulate, and second, I don't really like that pattern, it puts a huge burden on the data model and optimizer, I prefer to keep archive in a separate table.<<
> I agree about the archives, and I want to put it into a Data > Warehouse, too. You usually have three classes of temporal data -- > current, recent and archive. The trick is with the recent data; it it > one year ago? A week? A day? Hours ago? And as usual, the answer is > that it all depends ...
Well yeah, but with the general temporal pattern, current is the current row(s), "recent" is any non-current row, typically one year or your disk's capacity or you haven't actually written the book-off to warehouse code yet, whichever comes first.
> >> Half or more of these issues come up regarding SOX, people want an audit trail and are afraid to ever delete anything. <<
> I was hoping that maybe the Democrats would make some effort to fix > this mess. It is not just an audit trail -- you have to explain every > tiny little thing in detail. <insert rant here>
At the risk of falling off-topic, I'm not at all certain SOX wants you to explain every little thing, and it should really have minimal impact on IT, *but* for lack of wanting to address the real ethical and governance impact it was supposed to have, the suits have decided to pretend it's really an IT problem, bought a few terabytes of SAN, and told the grubs to save everything <insert rant here>
> >> The other half comes from some kind of legitimate temporal requirement, but in about four or five go-rounds on this now in different shops, I've NEVER seen anyone do the analysis and implementation with logical correctness or completeness, they always shoot for way too much, underestimate the costs, and generally kill performance as a side-effect. <<
> Agreed.
So, how about the idea of actually tailoring physical-level storage to facilitate this obnoxious new requirement, and building in support for it all through the data-modeling system? IOW, extending the relational model officially in these directions. Which I think is further than Snodgrass has ever tried to take it.
>> So, how about the idea of actually tailoring physical-level storage to facilitate this obnoxious new requirement, and building in support for it all through the data-modeling system? <<
That was the idea of Red Brick and the DW products that followed it ..
>> IOW, extending the relational model officially in these directions. Which I think is further than Snodgrass has ever tried to take it. <<
When Rick was on X3H2, he proposed then withdrew a very elaborate temporal model for the SQL Standard based on his work with the temporal DBs in the research labs at the time. Unfortunately, this was not an add-on for the existing products and would have meant throwing out existing engines.
"--CELKO--" wrote: > >> So, how about the idea of actually tailoring physical-level storage to facilitate this obnoxious new requirement, and building in support for it all through the data-modeling system? <<
> That was the idea of Red Brick and the DW products that followed it ..
Aha, I did not know that.
> >> IOW, extending the relational model officially in these directions. Which I think is further than Snodgrass has ever tried to take it. <<
> When Rick was on X3H2, he proposed then withdrew a very elaborate > temporal model for the SQL Standard based on his work with the > temporal DBs in the research labs at the time. Unfortunately, this > was not an add-on for the existing products and would have meant > throwing out existing engines.
I was reading his stuff back when, but never really followed that closely. But my point was not really how it fit with the engines, so much as the algebraic theory.
On the other hand, I can really see a set of design tools that honors (that is, hides!) the temporal dimension and still works on current engines.