Running tSQLt tests against real data?


Karri

May 26, 2021, 4:15:36 AM
to tSQLt
Hi,

I've been getting started with tSQLt and have successfully created and run a test case. The case uses FakeTable, inserts data, and then compares the expected result using AssertEqualsTable. In this scenario, I need to supply the test data myself.

The tables in my database that I'm testing already have data from the source systems. How do I write a test case to test against the *actual* data in a table/view?

For example, say I have a Unit dimension that shows part of an organization's structure, and I want to write a unit test that checks against the actual data that "HR", "Finance", "IT", "Sales" and "Distribution" all exist in this dimension. How do I do that with tSQLt?

Thanks,
Karri

Dmitrij Kultasev

May 26, 2021, 4:46:47 AM
to ts...@googlegroups.com
Unit tests should run in any environment, so running them against real data is somewhat against the philosophy. tSQLt doesn't stop you from doing it, but it would introduce problems. What happens if somebody messes with the data? You might have failing tests on other dev machines. How will you add data without affecting others? What happens if you have different versions of the code in different branches? And so on.

Normally you should have some kind of initial data setup for each test class. There is built-in functionality for this: if you create a stored procedure named [class name].SetUp, it will automatically be executed before every test in that class. Alternatively, you can create a setup procedure with a different name, for example [class name].initial_setup, and call it in each test that needs it.
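
For illustration, a minimal sketch of that built-in pattern, assuming a hypothetical test class [AccountTests] and table dbo.DimAccount:

CREATE PROCEDURE [AccountTests].[SetUp]
AS
BEGIN
    -- Runs automatically before every test in the [AccountTests] class.
    -- Replace the real table with an empty, constraint-free copy...
    EXEC tSQLt.FakeTable @TableName = N'dbo.DimAccount';

    -- ...and seed the rows that every test in this class relies on.
    INSERT INTO dbo.DimAccount (AccountID) VALUES ('1000'), ('2000');
END;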


Karri

May 26, 2021, 5:07:25 AM
to tSQLt
I understand the concern.
However, in our trunk-based development approach, all devs share the same source code. And because we use a shared development environment (for better or worse), changes in the data haven't been a real issue. At least not up to now.

But I'd be interested in seeing, by example, what a test case like this would look like when you want to read actual data from a table.

Alex Yates

May 26, 2021, 5:17:48 AM
to ts...@googlegroups.com
Test case:

The insert sproc should handle negative inputs

1. Run FakeTable to clear the table
2. Run the sproc with a specific negative input
3. Read the table to validate that the correct data is now in it

Test case:

The return sproc should only return data in a valid format

1. Run FakeTable to clear the table
2. Insert test rows, some valid, some invalid, and invalid in all those ways that good testers instinctively know to test against
3. Run the sproc
4. Check that only the data in the valid format is returned

Etc
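
As a rough sketch, the first case might look like the test below; the table dbo.Reading, the sproc dbo.InsertReading and its @Value parameter are all invented for the example:

CREATE PROCEDURE [InsertTests].[test that the insert sproc handles negative inputs]
AS
BEGIN
    -- 1. Fake the table so the test starts from a clean, empty copy
    EXEC tSQLt.FakeTable @TableName = N'dbo.Reading';

    -- 2. Run the sproc with a specific negative input
    EXEC dbo.InsertReading @Value = -5;

    -- 3. Read the table to validate that the correct data landed
    SELECT ReadingValue INTO #Actual FROM dbo.Reading;
    SELECT TOP(0) ReadingValue INTO #Expected FROM #Actual;
    INSERT INTO #Expected (ReadingValue) VALUES (-5);
    EXEC tSQLt.AssertEqualsTable @Expected = N'#Expected', @Actual = N'#Actual';
END;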


Jasper Lai

May 26, 2021, 5:22:53 AM
to ts...@googlegroups.com
Hi, 
I haven't used tSQLt for a while, but as far as I know tSQLt is for unit testing, and the data must be a 'cleared' or 'known' INPUT, PROCESS and OUTPUT controlled by the developer or tester.
So you can fake the INPUT data with FakeTable, fake the PROCESS with FakeFunction or SpyProcedure, and then assert whether the actual outcome equals the expected OUTPUT.
If the INPUT is real data, maybe you can select all the real rows into the faked table, but you must know what the expected OUTPUT is before you can assert expected against actual.
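
A rough sketch of that idea, using the Unit dimension from the original question (dbo.DimUnit is an assumed name); the real rows have to be copied aside before faking, because FakeTable leaves the fake table empty:

-- Snapshot the real rows first; FakeTable swaps in an empty copy
SELECT * INTO #RealRows FROM dbo.DimUnit;
EXEC tSQLt.FakeTable @TableName = N'dbo.DimUnit';

-- Reload the faked table so the test sees a fixed snapshot of the data
INSERT INTO dbo.DimUnit SELECT * FROM #RealRows;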

The official tutorial on the web site: https://tsqlt.org/user-guide/tsqlt-tutorial/

Best Regards,
   Jasper  


Karri

May 26, 2021, 5:23:01 AM
to tSQLt
Thanks.
So basically I can run FakeTable to fake the object under test, insert data from the real table into the fake table, and then read it to validate the results?
Why is this not documented anywhere? :)

Bryant McClellan

May 26, 2021, 6:47:14 AM
to ts...@googlegroups.com
Karri,
We make use of the SSMSBoost feature that lets you script the results of a query as, for example, an INSERT statement. That becomes the seed data for a faked table in a test. Of course, that is only a starting point; normally the data currently in the table does not reflect all cases. You always want to simulate data that won't be affected by the thing being tested (like a stored procedure) as well as data that will be.
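
The scripted output then drops straight into the Assemble step of a test, something like this (the table and column names here are invented):

EXEC tSQLt.FakeTable @TableName = N'dbo.Customer';

-- Seed rows scripted from a real query, then trimmed by hand:
-- one row the procedure under test should change, one it must leave alone
INSERT INTO dbo.Customer (CustomerID, CustomerName, IsActive)
VALUES (1, 'Affected by the proc', 1),
       (2, 'Must stay untouched', 0);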

G Bryant McClellan
Senior Solutions Architect
260.748.5393 (direct)

Do it Best Corp.
6502 Nelson Road | Fort Wayne, IN 46803




Karri

May 26, 2021, 11:11:11 AM
to tSQLt
I feel really stupid for not understanding how this works.
If I put tSQLt into the actual database that holds my objects (to make things easier) and run the following unit test,

CREATE PROCEDURE [test_Account].[test that the main Accounts exist]
AS
BEGIN
    --Assemble
    EXEC tSQLt.FakeTable @TableName = N'dbo.DimAccount';
    IF OBJECT_ID('tempdb..#Actual') IS NOT NULL DROP TABLE #Actual;
    IF OBJECT_ID('tempdb..#Expected') IS NOT NULL DROP TABLE #Expected;
    SELECT DISTINCT AccountID INTO #Actual FROM dbo.DimAccount;

    --Act
    SELECT TOP(0) AccountID INTO #Expected FROM #Actual;
    INSERT INTO #Expected (AccountID)
    VALUES
        ('1000')
        ,('2000')
        ,('3000')
        ,('4000')
        ,('5000');

    --Assert
    EXEC tSQLt.AssertEqualsTable @Expected = N'Expected', @Actual = N'Actual';
END;

The runtime error for the test case is: [test_Account].[test that the main Accounts exist] failed: (Failure) 'Expected' does not exist

However, if I run a test with manual values, it works without issues:

CREATE PROCEDURE [test_Account].[test inserting an account number into Account]
AS
BEGIN
    --Assemble
    EXEC tSQLt.FakeTable @TableName = N'dbo.DimAccount';
    INSERT INTO dbo.DimAccount (AccountID) VALUES ('1000');

    --Act
    SELECT AccountID INTO #Actual
    FROM dbo.DimAccount;

    SELECT TOP(0) AccountID INTO #Expected
    FROM #Actual;

    INSERT INTO #Expected (AccountID) VALUES ('1000');

    --Assert
    EXEC tSQLt.AssertEqualsTable @Expected = N'#Expected', @Actual = N'#Actual';
END;


Bryant McClellan

May 26, 2021, 11:23:00 AM
to ts...@googlegroups.com
Include the pound signs for the actual and expected temp tables in the assertion call. That differs between your two examples.
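
That is, with the temp table names from the first example, the assertion should read:

EXEC tSQLt.AssertEqualsTable @Expected = N'#Expected', @Actual = N'#Actual';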


Bill Wood

May 26, 2021, 11:28:43 AM
to ts...@googlegroups.com
Maybe I'm missing something, but I think there is a fundamental misunderstanding of the purpose of tSQLt with respect to testing production data.

tSQLt is for unit testing, NOT for data validation. When doing unit tests, you want to use data that represents all of the potential use cases. You don't want to rely on production data to do that because there are cases you may want to test for that don't exist in your production data. What do you do then? Dummy up data in production? Skip the tests?

So you use unit testing to exhaustively test the use cases for functional accuracy. You create separate non-tSQLt tests to validate data. That is often done in ETL processes, in front ends, or in nightly processes you run against the database to look for exceptions.

In the latter case (nightly processes) you'd create the procedures, use tSQLt to test that they are accurate for all identified use cases, then use them on your production database to look for exceptions.

That is my approach.

Regards,

Bill


Bryant McClellan

May 26, 2021, 11:55:40 AM
to ts...@googlegroups.com
Bill,
Absolutely. I noted earlier in the thread that we use SSMSBoost to help us turn queries against production data into INSERT statements for faked tables. Once the data is in the test, it is fixed. No reference to production data exists in any of our tests. Relying on production data is extremely dicey because it does what it is supposed to do...it changes.

As you note, data validation is an entirely different animal. When I taught tSQLt internally I told people to assume the object under test is a black box. If I feed data A into the box, I expect data B to come out. Whether the data would be valid in a production situation is irrelevant; what matters is that the predicted changes occur, no more and no less.

I was assuming that Karri was just making a simple example to work from and that this was not his finished product.


djla...@gmail.com

May 26, 2021, 12:20:01 PM
to ts...@googlegroups.com
In the example that does NOT work, you're missing the hash marks in the @Expected = .....

Karri

May 27, 2021, 2:44:48 AM
to tSQLt
Thanks, all, for the feedback. I accept that I might be doing things wrong here.

Regarding the data used for a unit test, how do you handle situations where you need a substantial amount of data for the test?
Do you write procedures with the actual INSERT INTO statements in the same test class as your unit test? Adding 1,000 rows of inserts to a test case doesn't really feel like a good idea.

PS. Thanks for noticing the missing #-sign in my query; that resolved the issue :)

Ed Elliott

May 27, 2021, 2:52:17 AM
to ts...@googlegroups.com
Unit tests are for checking the logic. When I write them I use just enough rows to satisfy any forks in the code (ifs, join filters) and that's it (maybe a row that shouldn't match as well).

Most times this means I have about 5 or so rows to set up; any more and you are moving into integration-test territory.
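
For example, if the code under test joins order lines to headers and filters on Status = 'Open', three rows are enough to exercise every fork (all names here are invented):

-- One row per branch: passes the filter, fails the filter, misses the join
INSERT INTO dbo.OrderHeader (OrderID) VALUES (1), (2);
INSERT INTO dbo.OrderLine (OrderID, Status)
VALUES (1,  'Open'),   -- should be returned
       (1,  'Closed'), -- excluded by the Status filter
       (99, 'Open');   -- excluded by the join (no matching header)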

Why do you need 1000 rows?

Ed



Karri

May 27, 2021, 6:47:18 AM
to tSQLt
Thanks for the reply. That does sound like a smart way of working.
And maybe I don't really need 1000 rows for a test case. I'm just wondering (out loud) how you test scenarios where the logic of, say, a stored procedure or view is built to calculate cumulative values from daily records.
Again, maybe I'm just overthinking this :)

Karri

May 27, 2021, 6:50:08 AM
to tSQLt
But you can of course calculate cumulative values for just 5 rows :)
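
For instance, a running-total test over five rows might look like this, assuming a view dbo.v_DailyCumulative over a dbo.DailySales table (both names are invented):

CREATE PROCEDURE [SalesTests].[test cumulative amount over five days]
AS
BEGIN
    EXEC tSQLt.FakeTable @TableName = N'dbo.DailySales';
    INSERT INTO dbo.DailySales (SalesDate, Amount)
    VALUES ('20210501', 10), ('20210502', 20), ('20210503', 30),
           ('20210504', 40), ('20210505', 50);

    SELECT SalesDate, CumulativeAmount INTO #Actual
    FROM dbo.v_DailyCumulative;

    -- Expected running totals: 10, 30, 60, 100, 150
    SELECT TOP(0) SalesDate, CumulativeAmount INTO #Expected FROM #Actual;
    INSERT INTO #Expected (SalesDate, CumulativeAmount)
    VALUES ('20210501', 10), ('20210502', 30), ('20210503', 60),
           ('20210504', 100), ('20210505', 150);

    EXEC tSQLt.AssertEqualsTable @Expected = N'#Expected', @Actual = N'#Actual';
END;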

Bill Wood

May 27, 2021, 8:09:18 AM
to ts...@googlegroups.com
Hi Karri,

You might need 1000 rows or more if you are stress testing, or validating some overflow condition, or some other scenario, so you should consider whether that should be done within tSQLt or outside it. It depends on your test. If I were stress testing and monitoring performance with another tool I probably wouldn't use tSQLt, but if I were stress testing to exercise my error handling logic I would.

If you need 1000 rows and they need to be in a table that you are faking with tSQLt, then I'd create a permanent staging table with that data, create the fake table within my test procedure, and load the faked table from the staging table. If the table doesn't need to be faked, you could just use it as is.
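
A sketch of that staging pattern, with invented names (dbo.Staging_LargeBatch would be a permanent, version-controlled table holding the large data set):

CREATE PROCEDURE [StressTests].[test error handling with a large batch]
AS
BEGIN
    -- Fake the target, then load it from the permanent staging table
    EXEC tSQLt.FakeTable @TableName = N'dbo.Transactions';
    INSERT INTO dbo.Transactions SELECT * FROM dbo.Staging_LargeBatch;

    -- ...run the code under test and assert on the outcome as usual
END;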

Regards,

Bill.

djla...@gmail.com

May 27, 2021, 12:30:01 PM
to ts...@googlegroups.com
Hi: every test class can have a SetUp procedure, and you can do a lot of prep work there. Things like creating all the fake tables, fake functions, etc. Because most of the procedures we're testing are quite complex, it is hard to build all the test objects and populate them with data (for all the various tests that need to be done) in that one SetUp procedure. So we usually just put what is common (like creating/faking objects) there and do the populating in the discrete tests.

There are many ways to reduce the burden of populating tables in the tests. A couple of things we learned the hard way:

1) Faked tables are flexible and usually ignore the NOT NULL and other constraints that might be present on the real object. So you can populate just the handful of columns that are pertinent to the test.
2) If a sproc uses a view, the HARD way to fake that view is to fake all of its component tables and populate them with values so that the resulting view has the data needed. The SIMPLER way is something like:
EXEC tSQLt.RemoveObject 'dbo.v_ivLpnDetail';
EXEC ('
CREATE TABLE dbo.v_ivLpnDetail (
    location location NULL,  -- "location" and "id" are user-defined types
    itemId   id NULL,
    lotId    id NULL
);
INSERT INTO dbo.v_ivLpnDetail (location, itemId, lotId)
VALUES (''myDestinationLocation'', 2, 3),
       (''mySourceLocation'',      1, 1),
       (''mySourceLocation'',      2, 2);
');
That's one way to do it, and there are others.  The view v_ivLpnDetail is immensely complex, and it would take hundreds of rows to generate what is represented in that simple, stripped down fake view.

3) Make liberal use of tSQLt.RemoveObject to get rid of UDFs. For example:
    EXEC tSQLt.RemoveObject 'dbo.udf_xyz';
    EXEC ('CREATE FUNCTION dbo.udf_xyz(@location location, @transactionType transactionType) RETURNS BIT AS BEGIN RETURN (1); END;');
will bypass a complex function that you don't want to test.

4) When testing actual vs. expected results, you can reduce the columns tested to the bare minimum. If a complex sproc ends up changing only a qty in a detail table, then the key and the quantity are enough to test, as sketched below.
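
For instance, even if dbo.OrderDetail has dozens of columns (the names here are invented), the assertion can compare just the key and the quantity:

-- Compare only the columns the sproc is supposed to change
SELECT OrderID, LineNumber, Qty INTO #Actual FROM dbo.OrderDetail;

SELECT TOP(0) OrderID, LineNumber, Qty INTO #Expected FROM #Actual;
INSERT INTO #Expected (OrderID, LineNumber, Qty) VALUES (1, 1, 5);

EXEC tSQLt.AssertEqualsTable @Expected = N'#Expected', @Actual = N'#Actual';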

hth  David



Karri

May 28, 2021, 2:47:00 AM
to tSQLt
Right.
We don't use any UDFs in our implementation; any business logic required is either in a view or, if it becomes a performance problem, in a stored proc.

Question about setting up test cases:
If I have a complex view or stored proc and it depends on a bunch of tables or even other views, what's the best way to set up the test case data?
Do I just fake each table that is used in the view and populate that with the test data? Or is there some smarter way?

Dmitrij Kultasev

May 28, 2021, 4:03:31 AM
to ts...@googlegroups.com
As was mentioned already, the best way is to fake all the tables in that view and populate them with only the data you need for the test. That is a smart enough way.
It might look like a lot of work, but if there are many objects in the view, then you probably need a bunch of tests anyway. So in the end you'll spend some time on the initial setup, but for every subsequent test you won't need to do it again.

So, to be short:
* mock/fake all objects used by the object under test
* create an initial setup procedure that provides all the data needed for this test suite

Bryant McClellan

Jun 1, 2021, 7:54:57 AM
to ts...@googlegroups.com
It depends on what you need to accomplish. You also start from the premise that you can't test everything and that certain things already have tests elsewhere. For example, if you are testing a procedure that calls another procedure, you don't test them both in one set of tests; you have independent tests for the called procedure.

A view is a perfectly valid target to fake. tSQLt creates a table that uses the schema of the view. If you are only reading from the view, this will work, your code will be shorter, and it will run faster. If the view is complex, then likely you are not attempting to manipulate data through it; if you are, then faking the underlying tables is the way to go.
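
A minimal sketch of faking a view directly, assuming a read-only view dbo.v_ActiveAccounts (the name is made up):

-- FakeTable accepts a view name too: it builds a plain table with the
-- view's column list, so a read-only consumer can be tested without
-- faking any of the underlying tables
EXEC tSQLt.FakeTable @TableName = N'dbo.v_ActiveAccounts';
INSERT INTO dbo.v_ActiveAccounts (AccountID, IsActive)
VALUES ('1000', 1);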

