Advice wanted - Cucumber and SBE for data transformations.


kwangomango

Dec 19, 2016, 8:34:59 AM
to Cukes

Hi

I'm looking for some advice on whether Cucumber and SBE are a good fit for the data transformation logic in ETL applications.

 

I work on an integration platform where the core functionality is routing and transforming messages between source and target systems. The various interfaces include CSV, JSON and XML formats.

 

A typical epic for our dev teams might be:

 

 'A cash trade (in json format) from System A needs to be transformed (to an industry standard xml format) and routed to accounting System B.'

 

The development process for such a story begins with an analyst (often in isolation) documenting the mapping rule for each piece of data which must be transformed between source and target systems. A piece of data is represented by a value and a location (an XPath, a JSONPath or a CSV header, depending on the format). There could be in the region of a few hundred pieces of data on a trade, and each one has a transformation rule.

The bulk of transformations are what we call simple 1 to 1 mappings: the data value doesn't change, but the location is obviously different between the source and target formats. More complicated transformation rules may have conditional logic based on several values.

 

Here is an example rule as written by the analyst for a JSON to XML transformation for a bond trade. The target settlement amount value is based on the source trade principal value:

 

Target location xpath:

StockTransaction/paymentDetails/settlementAmount/amount

 

Target value:

If SM_CURRENCY = TRD_CURRENCY

TRD_PRINCIPAL

else

 TRD_PRINCIPAL*SETTLE_EXCH_RATE
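
Read as code, the analyst's rule above is a single conditional. A minimal Python sketch, assuming the source trade arrives as a dict keyed by the field names in the rule (the function name `settlement_amount` and the use of `Decimal` are illustrative assumptions, not part of the mapping spec):

```python
from decimal import Decimal


def settlement_amount(trade: dict) -> Decimal:
    """Target value for StockTransaction/paymentDetails/settlementAmount/amount.

    `trade` is assumed to be a dict holding the source fields named in the rule.
    """
    principal = Decimal(str(trade["TRD_PRINCIPAL"]))
    if trade["SM_CURRENCY"] == trade["TRD_CURRENCY"]:
        # Same currency: the principal is carried over unchanged.
        return principal
    # Different currencies: convert using the settlement exchange rate.
    return principal * Decimal(str(trade["SETTLE_EXCH_RATE"]))
```

Going through `Decimal(str(...))` avoids binary floating point rounding on monetary values; whether the real platform needs that is a separate decision.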

 

 

 

So I had an attempt at writing this rule as a Gherkin feature, with a view to replacing the mapping spec with a series of features which could then be automated using either Cucumber or Spock.

 

Feature: Settlement amount

  Target location xpath:
    StockTransaction/paymentDetails/settlementAmount/amount

  Target value:
    If SM_CURRENCY = TRD_CURRENCY
      TRD_PRINCIPAL
    else
      TRD_PRINCIPAL*SETTLE_EXCH_RATE

  Scenario: Settlement currency is the same as trade currency
    Given SM_CURRENCY is 'USD'
    And TRD_CURRENCY is 'USD'
    And TRD_PRINCIPAL is 1000000
    Then amount is 1000000

  Scenario: Settlement currency and trade currency are different
    Given SM_CURRENCY is 'USD'
    And TRD_CURRENCY is 'GBP'
    And TRD_PRINCIPAL is 1000000
    And SETTLE_EXCH_RATE is 1.5
    Then amount is 1500000

 

 

To me that certainly illustrates the business rule for that particular piece of data, and it looks like a specification the devs could code from. The feature could also easily be automated using Cucumber or Spock.

What do you guys think about this as an approach, especially when it needs to scale to the many more pieces of data which make up the trade? Is it too low level?

 

The analysts are currently very comfortable using Excel to document the business rules but rarely give example scenarios. As you can imagine, this Excel specification frequently gets out of sync with the code, and the tests we automate from it are often just decided by devs and testers without any input from the analysts. SBE seems like a good way to remove these discrepancies and have a single testable artefact.

Andrew Premdas

Dec 19, 2016, 8:36:31 AM
to cu...@googlegroups.com
It would seem to me that Cukes is not a particularly good fit for this sort of work. All the business rules described here are detail orientated and about how the data is transformed. There is nothing about why data is transformed in a particular way, or about the value this gives to the business. So I'd see the Cukes you'd be producing in this space as being:

1. Full of detail
2. Prone to error (typos etc.)
3. Expensive to change (every minor specification change involves rewriting scenarios)
4. Not very useful when something goes wrong (is it the scenario that's wrong, or the code?)

A unit test tool would be much more suitable for this sort of work.

All best

Andrew

--
Posting rules: http://cukes.info/posting-rules.html
---
You received this message because you are subscribed to the Google Groups "Cukes" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cukes+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
------------------------
Andrew Premdas

kwangomango

Dec 19, 2016, 11:48:28 AM
to Cukes
Thanks for your reply Andrew.

So if we ignore the Cucumber automation part for a minute, are you saying that we would not benefit from Specification By Example and capturing the work the analyst does as Gherkin features?
Our main priority is to do away with the spreadsheets which the analysts produce and replace it with some form of living documentation. The spreadsheet is frequently getting out of sync with the code.

The current process involves a handover of this spreadsheet to a dev and a tester, who write the code and produce unit tests and acceptance tests to meet the spreadsheet rules. However, there is way too much interpretation involved: one dev might write a FitNesse test that tests multiple rules, and another might write several individual unit tests. I'm trying to remove the inconsistencies and have a consistent executable specification.

Andrew Premdas

Dec 19, 2016, 12:08:47 PM
to cu...@googlegroups.com
Yes, that's my opinion. All you'd be doing is replacing one sort of documentation that gets out of sync with the code with another sort of documentation that gets out of sync with the code. In this case I think using Gherkin would probably be a worse solution than your current one.

Only the code can be the single point of truth. The idea that Gherkin can be a single point of truth is a common one, but it's completely wrong. Gherkin at its best is a high level map of a problem space (a bit like the London tube map). It can be great at telling you where to look, but it can never be the territory.

What I would be looking to explore in this problem space is ways of making the code itself self-documenting. I wonder if a DSL might be a viable solution for this.

As far as testing is concerned, I think clever use of a unit testing tool would be effective here. I suspect you have a large amount of functional repetition in this problem space and that gives an opportunity to find commonality, name it and write tools to support the testing of it.
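
One way to exploit that repetition, sketched in Python under stated assumptions: keep the 1 to 1 mappings in a table and run a single generic check over every row. The field names, target paths and the `transform` stand-in below are hypothetical; a real suite would call the actual transformer instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical 1 to 1 mapping table: source JSON key -> target path,
# relative to the root element of the output XML.
ONE_TO_ONE = {
    "TRD_ID": "tradeHeader/tradeId",
    "TRD_CURRENCY": "paymentDetails/tradeCurrency",
}


def transform(trade: dict) -> ET.Element:
    """Stand-in for the real transformer, hand-written for this sketch."""
    root = ET.Element("StockTransaction")
    header = ET.SubElement(root, "tradeHeader")
    ET.SubElement(header, "tradeId").text = str(trade["TRD_ID"])
    payment = ET.SubElement(root, "paymentDetails")
    ET.SubElement(payment, "tradeCurrency").text = str(trade["TRD_CURRENCY"])
    return root


def check_one_to_one(trade: dict, output: ET.Element) -> list:
    """Return (key, path, actual) for every mapping row that does not match."""
    failures = []
    for key, path in ONE_TO_ONE.items():
        actual = output.findtext(path)
        if actual != str(trade[key]):
            failures.append((key, path, actual))
    return failures
```

Adding another simple mapping then means adding one row to the table rather than writing another test.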

Hope that's of some use

All best

Andrew

George Dinwiddie

Dec 19, 2016, 1:43:00 PM
to cu...@googlegroups.com
Kwangomango,

I've helped people implement such scenarios by generating input data
from the Given (e.g., by setting certain values in a default XML or
other input file) and checking values in the output with the Then (e.g.,
making assertions about values retrieved using xpath), but as Andrew
says, there is limited value in these tests.
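
A rough sketch of that Given/Then plumbing in Python: the Given step overrides a few values in a default input, and the Then step reads a value from the output by path. `DEFAULT_TRADE`, `given`, `then_value` and the `transform_to_xml` stand-in are hypothetical names; only the field names and the XML path come from the original post.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Default input; each scenario overrides only the values it cares about.
DEFAULT_TRADE = {
    "SM_CURRENCY": "USD",
    "TRD_CURRENCY": "USD",
    "TRD_PRINCIPAL": "1000000",
    "SETTLE_EXCH_RATE": "1",
}


def given(**overrides) -> dict:
    """Given step: copy the default trade and apply the scenario's values."""
    trade = dict(DEFAULT_TRADE)
    trade.update(overrides)
    return trade


def transform_to_xml(trade: dict) -> str:
    """Stand-in for the system under test (assumption for this sketch)."""
    amount = Decimal(trade["TRD_PRINCIPAL"])
    if trade["SM_CURRENCY"] != trade["TRD_CURRENCY"]:
        amount *= Decimal(trade["SETTLE_EXCH_RATE"])
    root = ET.Element("StockTransaction")
    payment = ET.SubElement(root, "paymentDetails")
    settlement = ET.SubElement(payment, "settlementAmount")
    ET.SubElement(settlement, "amount").text = str(amount)
    return ET.tostring(root, encoding="unicode")


def then_value(xml_text: str, path: str) -> str:
    """Then step: fetch a value by path, relative to the root element."""
    return ET.fromstring(xml_text).findtext(path)
```

Because `StockTransaction` is the root element here, the path passed to `then_value` is `paymentDetails/settlementAmount/amount`.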

I suggest trying to think about what is intended to be accomplished at
the next highest conceptual level. The business surely has a higher goal
than ETL.

As Andrew suggests, you can unit test the ETL transformations. Or, since
there are many and they're probably not being developed one by one as
tests are specified, you might find approval tests helpful. In this
case, you run the transformation and carefully examine the result. If
it's satisfactory, you save it as "golden data" expected results for
future runs. This is a form of the "guru checks output once" pattern.
It's not BDD, but it can be useful testing.
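
A minimal sketch of such an approval check in Python, assuming the transformation output can be compared as text. The `approve` helper and the `.approved.txt` / `.received.txt` file naming are assumptions made for illustration, not the API of any particular approval testing library.

```python
import tempfile
from pathlib import Path


def approve(name: str, actual: str, golden_dir: Path) -> bool:
    """Compare `actual` with the approved ("golden") output for `name`.

    If there is no approved file yet, or the output differs, the actual
    output is written to a '.received.txt' file for a person to examine.
    Approving means renaming that file to '.approved.txt'.
    """
    approved = golden_dir / f"{name}.approved.txt"
    received = golden_dir / f"{name}.received.txt"
    if approved.exists() and approved.read_text() == actual:
        # Output matches the golden data: clear any stale received file.
        if received.exists():
            received.unlink()
        return True
    received.write_text(actual)
    return False
```

The first run always fails and leaves a received file behind; a person checks it once, renames it to approve it, and later runs only fail when the output changes.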

- George


--
----------------------------------------------------------------------
* George Dinwiddie * http://blog.gdinwiddie.com
Software Development http://www.idiacomputing.com
Consultant and Coach http://www.agilemaryland.org
----------------------------------------------------------------------

Mark Levison

Dec 19, 2016, 1:53:14 PM
to cu...@googlegroups.com
Kwangomango - I would also add that, since BDD/SpecByExample/et al. are tools for helping with collaboration, the idea that scenarios would be created in isolation is a warning sign. Consider where the business/team collaboration is, and that is where I would use these tools.

I return you to the genius of Andrew and George

Cheers
Mark

kwangomango

Dec 20, 2016, 8:39:08 AM
to Cukes
Thanks for all the responses guys, I really appreciate it. I must admit though, I'm starting to question my understanding of a few things!

I now have about a thousand questions, but I'll just start with a point that may help add some clarification. The Gherkin features I had in my original post describing a transformation were never going to be automated using Cucumber step definitions or some other full stack integration test. They were always intended to end up as unit tests using something like Spock or another BDD flavoured unit test framework which makes the test understandable by the less technical people in our team. So the example I gave earlier may have a scenario which looks like this when automated in Spock:

def "Settlement currency is the same as trade currency"() {
    given: "SM_CURRENCY is 'USD'"
    and: "TRD_CURRENCY is 'USD'"
    and: "TRD_PRINCIPAL is 1000000"
    // createPrototypeTrade is assumed to build a default trade with these values overridden
    def inputTrade = createPrototypeTrade(SM_CURRENCY: 'USD', TRD_CURRENCY: 'USD', TRD_PRINCIPAL: 1000000)

    when: "settlement amount is generated"
    def outputTrade = tradeTransformer.transform(inputTrade)

    then: "settlement amount is 1000000"
    outputTrade.settlementAmount == 1000000
}

Note: I'd probably parameterise this using a 'where' block to remove the hardcoded values.

The point I am trying to make is that I'm looking for a way to make that Spock test and the analyst's requirement be the same (or nearly the same) artefact. Hence my thought of using Gherkin. It would force the devs into writing the necessary unit tests and leave less to their interpretation. Would that still be considered SBE?

Thanks,

K.



Andrew Premdas

Dec 20, 2016, 9:32:49 AM
to cu...@googlegroups.com
My point/opinion is that Gherkin is not a good fit for the problems you are describing. From what I've seen your Gherkin scenarios obfuscate the problem rather than bring clarity to it. This happens whether you implement these scenarios in Cukes or a unit test framework.

GWT is all about uncovering and expressing what and why. So in your example GWT would be a good way to explore 'why' you're transferring a bond trade in JSON to XML, but it's a horrible way to express 'how' each field in the bond trade becomes a field in the XML representation.

How about bringing your analysts and programmers together and using that collaboration to explore better ways of getting business rules implemented and kept up to date?


kwangomango

Dec 21, 2016, 9:50:02 AM
to Cukes
Thanks again Andrew. I think I'm beginning to see what you are saying...

If, however, I remove the transformation between JSON and XML formats from the picture and just focus on a piece of business logic, do you think GWT/Gherkin is still an appropriate way of framing these unit tests?

Feature: Determine settlement amount

  If SM_CURRENCY = TRD_CURRENCY
    settlement amount = TRD_PRINCIPAL
  else
    settlement amount = TRD_PRINCIPAL*SETTLE_EXCH_RATE

  Scenario: Settlement currency and trade currency are different
    Given settlement currency is 'USD'
    And TRD_CURRENCY is 'GBP'
    And TRD_PRINCIPAL is £1000000
    And SETTLE_EXCH_RATE is 1.5
    Then settlement amount is $1500000

or like this in a more wordy style:

Feature: Determine settlement amount
  • If trade currency and settlement currency are different, then settlement amount is settlement exchange rate multiplied by the trade principal amount.
  • If trade currency and settlement currency are the same, then settlement amount is the trade principal amount.

  Scenario: Settlement currency and trade currency are different
    Given settlement currency is 'USD'
    And trade currency is 'GBP'
    And trade principal is £1000000
    And settlement exchange rate is 1.5
    Then settlement amount is $1500000


Thanks,

K.

Andrew Premdas

Dec 21, 2016, 10:30:54 AM
to cu...@googlegroups.com
At the core of your example is one simple business rule, which isn't very well expressed: that you need to take account of the exchange rate when calculating settlement amounts. All that the GWT is adding to this is a really wordy example that obfuscates this simple statement. In this case it's much clearer to write

sa = ser * tpa

and, as ser = tcr/scr, we can write

sa = tcr/scr * tpa

(forgive me for not writing these out in longhand, and there are some assumptions I'm making about currency rates to make this quite so simple)

sa = tcr/scr * tpa explains 'how' you calculate, and is efficient and accurate for doing this. GWT and SBE are terribly inefficient and, worse, inaccurate for doing this. What GWT is for is to explain 'why' you are doing this and perhaps what the meaning behind doing this is.

By the way, did you notice how GWT made you think there was some business logic taking place here (do one thing if the currencies are the same, do something different if the currencies are different) when there is no actual need for that?
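
A small worked example of that point in Python. It reads the abbreviations as sa = settlement amount, tpa = trade principal amount, and tcr / scr = the trade and settlement currency rates against a common base; that reading is an assumption, since the post does not spell them out. The function name and the sample rates are illustrative only.

```python
from fractions import Fraction


def settlement_amount_from_rates(tpa, tcr, scr) -> Fraction:
    """sa = tcr/scr * tpa, with no branch on the currencies."""
    return Fraction(tcr) / Fraction(scr) * Fraction(tpa)


# Different currencies: a GBP trade settled in USD, both rates quoted
# against USD (sample values, not market data).
different = settlement_amount_from_rates(tpa=1000000, tcr="1.5", scr="1")

# Same currency: tcr and scr are the same rate, so the ratio is 1 and the
# principal comes through unchanged without any special case.
same = settlement_amount_from_rates(tpa=1000000, tcr="1.5", scr="1.5")
```

Here `different` works out to 1500000 and `same` to 1000000, matching the two scenarios from the original post.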

In the domain you've described, what's important is how one thing becomes another thing. There are lots of rules that apply to lots of fields involving transformations and changes of location. This is all HOW stuff, so GWT is a bad fit.

In the business domain described, the problem is that the rules the analysts write and the rules that are implemented don't remain in sync. George made a great suggestion for solving this using "golden data". An alternative would be to have the analysts write the code. This would involve the coders creating a DSL that allows the analysts to express their rules in a familiar way, whilst actually writing code. I'm not sure that this is a good idea (I'm no expert on DSLs) but I strongly believe it's a much better idea than using GWT.
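
To make the DSL idea a little more concrete, here is one possible shape for it: a handful of Python combinators that let a rule read roughly like the analyst's spreadsheet entry while still being executable code. Every name here (`field`, `product`, `same`, `when`, `MAPPING`, `apply_mapping`) is hypothetical; this is a sketch of the idea, not a recommendation of a particular design.

```python
from decimal import Decimal
from typing import Callable, Dict

Rule = Callable[[dict], object]


def field(name: str) -> Rule:
    """Copy a source field unchanged (a simple 1 to 1 mapping)."""
    return lambda trade: trade[name]


def product(a: str, b: str) -> Rule:
    """Multiply two source fields."""
    return lambda trade: Decimal(str(trade[a])) * Decimal(str(trade[b]))


def same(a: str, b: str) -> Callable[[dict], bool]:
    """Condition: two source fields hold the same value."""
    return lambda trade: trade[a] == trade[b]


def when(condition, then: Rule, otherwise: Rule) -> Rule:
    """Pick one of two rules depending on a condition."""
    return lambda trade: then(trade) if condition(trade) else otherwise(trade)


# The mapping spec itself: target location -> rule.
MAPPING: Dict[str, Rule] = {
    "StockTransaction/paymentDetails/settlementAmount/amount": when(
        same("SM_CURRENCY", "TRD_CURRENCY"),
        then=field("TRD_PRINCIPAL"),
        otherwise=product("TRD_PRINCIPAL", "SETTLE_EXCH_RATE"),
    ),
}


def apply_mapping(trade: dict) -> dict:
    """Evaluate every rule against a source trade."""
    return {target: rule(trade) for target, rule in MAPPING.items()}
```

With something like this, the `MAPPING` table is both the documentation and the implementation, so the two cannot drift apart.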

All best

Andrew





kwangomango

Dec 21, 2016, 12:16:53 PM
to Cukes
Brilliant, that makes sense. sa = tcr/scr * tpa is the most efficient way of describing the rule for calculating settlement amounts. Everything else is just fluff.

So how about the following business rule? Is it benefiting from the GWT scenarios below, or am I just restating the rule?
  • A trade cancellation message must be rejected if the trade it is cancelling does not exist

Scenario: Cancellation message is valid
  Given a trade execution message with trade id 1234 has been processed
  When a trade cancellation message with trade id 1234 is received
  Then the cancellation message is valid and should be processed

Scenario: Cancellation message is invalid
  Given a trade execution message with trade id 1234 has NOT been processed
  When a trade cancellation message with trade id 1234 is received
  Then the cancellation message is invalid and should be discarded
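
For reference, the behaviour those two scenarios describe fits in a few lines. A Python sketch, assuming processed trades can be tracked by id in memory (the `TradeBook` class and its method names are hypothetical):

```python
class TradeBook:
    """Tracks which trade ids have had an execution message processed."""

    def __init__(self):
        self._live = set()

    def on_execution(self, trade_id: int) -> None:
        """Record that a trade execution message has been processed."""
        self._live.add(trade_id)

    def on_cancellation(self, trade_id: int) -> str:
        """Process the cancellation if the trade exists, otherwise discard it."""
        if trade_id not in self._live:
            return "discarded"
        self._live.remove(trade_id)
        return "processed"
```

In this sketch a second cancellation for the same trade id is discarded, which is a choice the scenarios themselves leave open.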

Thanks,

K.

Mark Levison

Dec 21, 2016, 12:21:02 PM
to cu...@googlegroups.com
Kwangomango - I’m not Andrew, but I will step in for fun. Is this GWT the result of a conversation between the people who want/use the system and the people building it? If it is and they care about this level of detail then we’ve achieved something. Key point (and it only took a few eons for me to get it) - Cucumber is about collaboration between the business and the doers.

Cheers
Mark
-- 
Mark Levison
Sent with Airmail

kwangomango

Dec 22, 2016, 8:43:52 AM
to Cukes
Historically, analysts were documenting business logic in isolation and handing over to testers and devs just before the start of a sprint. I am trying to replace this with a more collaborative approach. The other consideration was to get rid of their spreadsheets and have a better form of living documentation, hence my looking at SBE.

Do the scenarios I described in my last post look more appropriate?

Cheers,

K

Mark Levison

Dec 22, 2016, 9:38:06 AM
to cu...@googlegroups.com
K/Kwangomango/Insert Real Name here, ....

To the extent I understand your problem domain, they look much closer to something effective. When done well, these should bridge the communication gap between stakeholders and doers. My challenge is that I don't know what your stakeholders would value, so I can't tell whether this is it. However, you've clearly taken to heart Andrew and George's comments about what to express in GWT.

I also smile that you've understood the deep point of collaboration BA <-> Dev <-> QA (etc). Your focus on facilitating that will eventually lead to awesomeness.

My tiny business AgilePainRelief Consulting is about an hour away from winding down for 2016 so I will sign off this thread. 

Merry Christmas, Happy Chanukah, .... or whatever you prefer to celebrate.
Mark 


Andrew Premdas

Dec 23, 2016, 3:39:36 PM
to cu...@googlegroups.com
On 21 December 2016 at 17:16, 'kwangomango' via Cukes <cu...@googlegroups.com> wrote:
> Brilliant, that makes sense. sa = tcr/scr * tpa is the most efficient way of describing the rule for calculating settlement amounts. Everything else is just fluff.
>
> So how about following business rule. Is it benefiting from the GWT scenarios below or am i just restating the rule?
>   • A trade cancellation message must be rejected if the trade it is cancelling does not exist
>
> Scenario: Cancellation message is valid
> Given: A trade execution message with trade id 1234 has been processed
> When: A trade cancellation message with trade id 1234 is received
> Then The cancellation message is valid and should be processed

These are so much better. But all they do is state the bleeding obvious, which perhaps isn't a bad thing, but it's not a place to stop. A really important part of GWT is to challenge what's here. So here goes (kinda tongue in cheek):

So any trade that has been cancelled should be cancelled. Aren't there any trades that can't be cancelled? Can anybody cancel anything? etc. etc.

 

> Scenario: Cancellation message is invalid
> Given: A trade execution message with trade id 1234 has NOT been processed
> When: A trade cancellation message with trade id 1234 is received
> Then The cancellation message is invalid and should be discarded


So how have I cancelled a trade that doesn't exist? Where is my trade? What if the trade is there but the system can't find it for a little while? etc. etc.

Some more techniques
===================

Remove the fluff: As you ask questions and explore stuff you'll see that a lot of the text in your scenarios really isn't required. It's fluff. In this case 'with trade id 1234' is just fluff, so get rid of it.


Pop the why stack e.g. Why should a cancellation be discarded if the trade has not been processed? This will lead to things like

I've just created a trade by mistake
And then I cancelled it
And then the stupid system told me I couldn't cancel it because the trade isn't processed yet
But how do I know when the trade is processed?
And why do I have to wait until the trade is processed to cancel it?
And what the heck is processing anyhow!!!

And you can use these to pull out various scenarios e.g.

Scenario: Cancel newly created trade
Scenario: Detect attempt to cancel non existent trade
Scenario: Cancel really old trade

etc. etc.

then determine which of these are important, and refine, challenge, ask why again.

Now you are getting to the heart of what makes GWT useful and practical. It really helps if you have other people around to do this stuff, but you can and should challenge yourself.

Hope that's of some use :)


Andrew

 

kwangomango

Dec 30, 2016, 5:27:05 AM
to Cukes
Again, that's great stuff Andrew, thanks!
I see that the real power of scenarios and GWT is to discover what you don't know: all kinds of questions and additional scenarios start falling out from the discussions and scenario creation workshops. I get that it's great as a discovery tool for what I call the high level business logic, e.g.

Can you amend a cancelled trade?
How do we identify a duplicate trade? Do we discard or overwrite?
etc.

So are you saying it is basically pointless to use it to describe the rules which are mostly obvious and that everyone understands? e.g.

Trades can be created
Trades can be cancelled
Trades can be amended
Trades must be transformed between source and target formats

I'm looking at this mostly from a documentation/specification point of view.

So maybe I've just completely misunderstood what SBE is? All the books and blog posts I've read don't state that it is only applicable at a certain level of detail - the high level business logic I mentioned above. Or the 50000ft view as you called it. Why can't it be applicable to all levels of detail if that detail needs to be visible to lots of different people? It's all just behaviour whatever way you look at it.
I know we've discussed why my JSON to XML transformation logic isn't benefiting from GWT, but those many rules still need to be documented and given to a developer somehow. And automated tests should be written against those rules to prove they have been implemented correctly. And those rules need to be visible months and years in the future as living documentation when the rules need to be modified or new ones added. So SBE doesn't apply here?
I know you described some other techniques for generating documentation from the code in an earlier post, but I'm not sure we'd know where to start with that. Is that what other teams are doing?

I hope that all makes sense. Lots of questions!

Many thanks,

K.