Re: Schedule Tracking

Rob LaRubbio

Sep 15, 2011, 1:06:24 PM
to motech-ar...@googlegroups.com
I read the top part of the link you provided then skimmed the rest.  To be honest it was a little hard to follow since it seems to be arguing with itself and never really settles on a clear answer.  Anyway, a couple of things that stuck out to me were:

  • That's arguably a "converter", not a generator.
Legitimate Exceptions:
  • You are stuck with a verbose language that cannot be made very compact, and so need code generation to help generate the needed bulk.
  • You are not allocated time to design a decent framework
    • Even if you do have time to design and implement a decent framework, it might be better to apply that time elsewhere
  • You desire strong compile-time type checking ...
However I wouldn't even say this is code generation but rather rule generation.  I also think that is more than a semantic difference.  I see this as being no different than what the decision tree does (takes a JSON format and generates VXML or some other markup) or what your IVR code does with Kookoo markup, or potentially even what our web controllers do by generating HTML.  We have a system (the rules engine) that requires a certain input format.  To me that is analogous to having a web browser that requires an input format (HTML) or an IVR that requires an input format (VXML, Kookoo tunes, etc.).

I think we have a miscommunication on the looping.  I'm referring to a schedule that loops.  I'm not referring to message retry.  So imagine a system in the schedule tracking module that wants to track an event that it expects to happen each day.  Obviously you can't write an infinite schedule, so instead you create one that says milestone A references milestone A.  You can do that with the rule engine; I don't think you can with a tree.
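
To make the looping case concrete, here is a minimal sketch in Python (used only for brevity; the platform itself is Java). The config shape, the field names (`first`, `due_in_days`, `next`) and the helper `next_due` are all hypothetical and not part of any agreed format. It only shows how a milestone that names itself as its successor lets a finite config describe a schedule that never ends.

```python
import json

# Hypothetical schedule config: milestone "A" names itself as its successor,
# so a one-entry config describes an event expected every day, indefinitely.
SCHEDULE_JSON = """
{
  "name": "daily-export",
  "first": "A",
  "milestones": {
    "A": {"due_in_days": 1, "next": "A"}
  }
}
"""


def next_due(schedule, fulfilled, on_day):
    """Given the milestone just fulfilled and the day it happened,
    return (next milestone name, day it is due), or None if the schedule ends."""
    successor = schedule["milestones"][fulfilled]["next"]
    if successor is None:
        return None
    return successor, on_day + schedule["milestones"][successor]["due_in_days"]


schedule = json.loads(SCHEDULE_JSON)
print(next_due(schedule, "A", 3))  # milestone A was fulfilled on day 3
```

Because the successor is looked up by name rather than held as a child node, the same lookup works whether the reference points forward, backward, or at the milestone itself.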

I also think we already have experience on the debugging front with the xmind -> symptom tree code generation tool.

-Rob

On 09/15/2011 12:36 AM, Vivek Singh wrote:
I didn't realize until now that we are going to generate the rule, hence I was confused about how it would all work. Now I do get that bit and at least I understand the approach. Though I disagree with your conclusions, I am reluctantly open to us doing it this way. I have provided the reasons I disagree below. In any case, I would create a story which can follow this approach.

Programming to generate (rule) code vs Just programming
The following link probably says all I have to say on this matter: http://c2.com/cgi/wiki?CodeGenerationIsaDesignSmell. It does talk about exceptions to this rule but I don't believe we would be up against any of them. I feel the layer of indirection caused by code generation would complicate things overall rather than simplify them.

Debugging, easier to write and maintain
I do agree that it would be easier to debug issues, but only in production, not during development. During development it would in fact be a nightmare because of the level of indirection, as one would be debugging the by-product. For similar reasons I don't think it would be easier to write and maintain.
Unit tests are going to be even more horrible, as one would be asserting on the generated code. My guess is we might be able to parse the rule into some data structure and assert on that, but then I see it more as an integration test.

Handling looping
Looping shouldn't be handled by the schedule tracking module; rather, we should build it into the way we do asynchronous processing in the system, which applies to all modules. If a part of the system is down and background jobs cannot be processed as a result, then we should have standard techniques in the platform to handle it. They can be retry, retry-after-processing-other-items, dont-retry-as-it-is-time-critical, etc.

On 14 September 2011 22:22, Rob LaRubbio <rlar...@grameenfoundation.org> wrote:
Comments inline


On 09/14/2011 01:50 AM, Vivek Singh wrote:
>> The larger point I'm trying to make is that I think this module maps very nicely onto a rules engine.  We have constraints within the system and a set of facts.  We expect outputs letting us know which constraints have been met.  I have confidence we can implement a subset of the above features with a linked list approach, and all of them with a graph or tree approach but then we would have to maintain that code (and I don't think it would really be as flexible or performant as just using a rules engine)

Looking at all the requirements a tree kind of representation would solve the problem even with JSON config.
Sure it would, but what if we decide for the next release that we want to allow looping schedules?  Then a tree based implementation would require a rewrite while the rules based implementation would need only a tweak to the config format and rule generation code.  I don't think a looping schedule is that far-fetched either.  In the example below where I mention monitoring regular server tasks (like backups/exports) a looping schedule makes perfect sense.

This definitely wouldn't be as flexible as the rule based approach, though it would be a slightly more domain specific expression of the problem. I didn't understand why it wouldn't be as performant as the rule based approach.
I don't see how that would be any more domain specific.  Both would have the same domain specific interface (the JSON config).  One would evaluate it into its own data structure (a tree) and then add facts and try to solve it.  The other would evaluate it into rules and let the existing rule engine solve it.

I don't think we can say at this point which would be more performant, but given that the rule engine has logic built in (the Rete algorithm) to prune the tree so it only evaluates branches that can potentially be true, plus the fact that its development team is larger and focused on building one thing (a rule engine), I believe it has the greater chance of being more performant.  Additionally I see this as code we don't need to spend time writing, debugging and maintaining.  Building our own sounds a lot like not-invented-here syndrome.

Since the rule based implementation would be extremely flexible and would be able to cover all scenarios (possible via JSON config) should the platform have two competing approaches to do the same thing?
I'm not sure what you mean by this.  My goal is to eliminate having two competing ways to do the same thing.  We already have the rule engine in the code, so why write another smaller engine that can only evaluate this limited problem space?  Instead let's leverage the existing engine and only have one approach.  What I'm suggesting is that we write code that allows end users to code these rules up in a simpler config, since 1) I don't believe our end users will be able to write the rules effectively, 2) the config also allows them to express extra actions related to the rules matching (i.e. alerts), and 3) we can add a layer of checks around the JSON config to help eliminate user error.  So we should think of this module as us writing a better UI to the rules engine (or DSL) for our users and not us writing a smaller rule engine in addition to that UI work.
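
As an illustration of point 3, a layer of checks around the config could be as small as the following Python sketch (Python only for brevity; the platform is Java). The config shape and the field names `first`, `next` and `due_in_days` are assumptions made for the example, as is the helper `check_schedule`.

```python
def check_schedule(schedule):
    """Return a list of human-readable problems found in a schedule config.

    The config shape (first / milestones / next / due_in_days) is hypothetical.
    """
    problems = []
    milestones = schedule.get("milestones", {})
    if schedule.get("first") not in milestones:
        problems.append("'first' does not name a defined milestone")
    for name, milestone in milestones.items():
        successor = milestone.get("next")
        if successor is not None and successor not in milestones:
            problems.append(
                f"milestone {name!r} points at unknown milestone {successor!r}"
            )
        due = milestone.get("due_in_days")
        if not isinstance(due, int) or due <= 0:
            problems.append(
                f"milestone {name!r} needs a positive whole number for due_in_days"
            )
    return problems


good = {"first": "A", "milestones": {"A": {"due_in_days": 1, "next": "A"}}}
bad = {"first": "A", "milestones": {"A": {"due_in_days": 0, "next": "B"}}}

print(check_schedule(good))
print(check_schedule(bad))
```

Checks like these run before any rule is generated, so a user sees a message about their own config rather than an error from the engine about rules they never wrote.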

-Rob





--
Vivek Singh | +91 98452 32929 | http://sites.google.com/site/petmongrels | petmongrels@twitter

Vivek Singh

Sep 15, 2011, 2:09:31 PM
to motech-ar...@googlegroups.com
Important points in the link (this is a wiki managed by multiple people, hence the lack of clear cut argument):
>> If the generated code will never be touched by humans, then why not "run" things off the original input instead of the outputted code?
>> The input to the code generator is the higher abstraction that is being converted to a lower abstraction: the output code. Why is the input to the code generator not sufficient in itself?

Generating mark-up is trying to solve the essential (not chosen but present) complexity of language mismatch. In fact, even there a template approach is preferred, to keep the mismatch to a minimum, as against generating the entire HTML. The complexity of generating rules is not essential; it is chosen by us as a design. "We have a rules engine" doesn't mean we have to use it.

A message that loops can be better handled using the appointments module (e.g. going to the dentist every 6 months for the rest of your life). We don't have to solve every problem using this. Besides, if it can be specified in JSON config then it can be handled in Java code as well.
"Anything you can do by generating code, I can do by calling data driven subroutines."

I don't like our xmind approach; it was done in a hurry. Ideally we should be reading the xmind as configuration and constructing decision tree objects at runtime. That would have been more extensible. In today's call Aakash was asking whether he can change the xmind and whether it would work. It wouldn't, because every time we would have to manually add code to the generated code. So it is not an ideal example to follow. Essentially it suffers from the problem mentioned in the link:
(It's only a problem if you need to muck around with the generated code after generating it (which would violate OnceAndOnlyOnce because it would require mucking around after every time you generate it from source).)
Generating significant blocks of code to be customized for each use is BAD.

Rob LaRubbio

Sep 15, 2011, 3:04:26 PM
to motech-ar...@googlegroups.com
Comments inline


On 09/15/2011 11:09 AM, Vivek Singh wrote:
Important points in the link (this is a wiki managed by multiple people, hence the lack of clear cut argument):
But I think we have good answers to those questions, and like you say the wiki doesn't really take a clear stance on the issue.  In fact it doesn't even define the issue.  Are we talking about passive or active code generation?

>> If the generated code will never be touched by humans, then why not "run" things off the original input instead of the outputted code?
1) The humans who will write the original code may not be capable of writing the generated rules
2) We can provide compile time checks on the original code as opposed to execution time checking on the generated rules
3) The original code is a superset of the rules engine since it also allows the specification of alert schedules
4) The rule engine doesn't run off the original input

I'm sure there are more; those are just off the top of my head.


>> The input to the code generator is the higher abstraction that is being converted to a lower abstraction: the output code. Why is the input to the code generator not sufficient in itself?
For the reasons listed above.  However you could also say this is the same reason why we are writing the app in Java and not byte code.


Generating mark-up is trying to solve the essential (not chosen but present) complexity of language mismatch. In fact, even there a template approach is preferred, to keep the mismatch to a minimum, as against generating the entire HTML.
I would be very disappointed if we did not use templates to generate the rules.  Again I really see this as no different.  We have an interpreter (browser == rules engine) and we need to generate input for it.

The complexity of generating rules is not essential; it is chosen by us as a design. "We have a rules engine" doesn't mean we have to use it.
First I question that the generation of the rules is complex.  This should be fairly straightforward using a template.  Additionally we have a solution that maps perfectly to the problem.  This is a logic problem that we are trying to use data structures and algorithms to solve.  Choosing to write code that you don't have to is rarely a good engineering choice.
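
A rough sketch of what template-based generation might look like, in Python for brevity (the platform is Java). The `rule ... when ... then ... end` layout follows the Drools rule language, but the fact types `Enrollment` and `MilestoneOverdue`, their fields, and the config field names are invented for this example, and the output has not been run through Drools.

```python
# Template for one rule. Enrollment, MilestoneOverdue and their fields are
# made-up fact types for illustration only.
RULE_TEMPLATE = """rule "{schedule} - {milestone} overdue"
when
    $e : Enrollment( schedule == "{schedule}", milestone == "{milestone}", daysWaiting > {due_in_days} )
then
    insert( new MilestoneOverdue( $e ) );
end
"""


def generate_rules(schedule):
    """Render one rule per milestone, in a stable (sorted) order."""
    return "\n".join(
        RULE_TEMPLATE.format(
            schedule=schedule["name"],
            milestone=name,
            due_in_days=milestone["due_in_days"],
        )
        for name, milestone in sorted(schedule["milestones"].items())
    )


# Hypothetical config, already parsed from JSON.
schedule = {
    "name": "daily-export",
    "milestones": {"A": {"due_in_days": 1, "next": "A"}},
}
print(generate_rules(schedule))
```

The generator only substitutes values into fixed text; it makes no decisions about which milestone is due or overdue. That part is left to the engine.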


A message that loops can be better handled using the appointments module (e.g. going to the dentist every 6 months for the rest of your life). We don't have to solve every problem using this. Besides, if it can be specified in JSON config then it can be handled in Java code as well.
"Anything you can do by generating code, I can do by calling data driven subroutines."
I'm not talking about going to the dentist.  I'm talking about a case where each day we expect an input from some remote system, then an output to a different remote system, followed by two other actions that can occur in any order.  That isn't something you can do with the appointments module but is something you could do with this module if it allowed loops in the schedule.

Again just because you can do it by calling data driven subroutines doesn't make that the best engineering choice.


I don't like our xmind approach; it was done in a hurry. Ideally we should be reading the xmind as configuration and constructing decision tree objects at runtime. That would have been more extensible. In today's call Aakash was asking whether he can change the xmind and whether it would work. It wouldn't, because every time we would have to manually add code to the generated code.
Why doesn't the xmind XML get converted to the JSON that is the input to the decision tree?  If it was done that way then Aakash could update the trees.

So it is not an ideal example to follow.
Ok, then what about Spring Roo, annotations or aspects? :)

Essentially it suffers from the problem mentioned in the link:
(It's only a problem if you need to muck around with the generated code after generating it (which would violate OnceAndOnlyOnce because it would require mucking around after every time you generate it from source).)
Generating significant blocks of code to be customized for each use is BAD.
Again if we went xmind -> json we wouldn't have this problem.  It is because you are going xmind -> code that you do.  Similarly we are going json -> rules, not json -> code, so changing the json allows the rules to be regenerated without the issue you mention above.

-Rob

Rob LaRubbio

Sep 16, 2011, 12:06:15 AM
to motech-ar...@googlegroups.com
I was giving this a little more thought, and like most disagreements I think there are some communication gaps.  I think we are actually talking about two things and we've been confusing them.  One is if using the rule engine as the base implementation is appropriate.  The second is about the interface to that.

So I was wondering first what thoughts you might have about the rule engine.  Do you think it's appropriate for this problem?  Second if you don't like the json DSL for it, then can you propose alternatives that you think are better?

Additionally, the rule engine does have a facility for defining a DSL:

http://docs.jboss.org/drools/release/5.3.0.Beta1/drools-expert-docs/html_single/index.html#d0e6300

However that doesn't sound too different from what the original design proposal is, just that they provide the implementation that does the conversion to the more verbose rule language.

-Rob

Vivek Singh

Sep 16, 2011, 2:02:56 AM
to motech-ar...@googlegroups.com
JSON DSL
I very much like the JSON DSL we have come up with, as it clearly (in a domain specific way) articulates the configuration of a schedule. It could be better, but then we would have to use Ruby etc., which would be overkill. So yes, what we have got is good.

Using rules engine directly
A rules engine may or may not be a good way to solve this problem. Java programs are equally equipped to implement this logic. I would introduce a rules engine only if the end users want to edit the rules. In this case it would be too much for them, hence I wouldn't.

Generating rules from Java/JSON
I think this is overkill. I don't think we need to generate rules from Java reading the JSON configuration. This is where I go to the principle that "if my program has intelligence to generate a code which would do the job, then it also has intelligence to do the job directly". I do not agree with the premise that if we generate the rules from Java reading the JSON, it would somehow be easier to maintain than doing it in Java directly.
On one hand, to enhance or bug-fix, I can write unit tests and change the Java program. On the other hand, I would have to change the Java program to generate the right rules, which then do the right job. I find this additional step unnecessary.

So overall we should have a JSON based configuration (enhanced like the message campaign email you sent) and a Java program which does the job.

Rob LaRubbio

Sep 16, 2011, 7:25:36 PM
to motech-ar...@googlegroups.com
Just a couple of responses:

1) I don't think the only reason to use the rule engine is because end users would want to tweak the rules (although they will by tweaking the JSON).  We should use the right tool for the job and when the job is one that is logic based we should use a tool that is optimized for that.

2) Continuing on the right tool for the right job theme, I'm not disputing that a Java program can solve the problem (especially since the rules engine is written in Java).  I'm saying we can solve the problem more easily using a different tool.  For examples, search for sudoku solvers and compare the solutions in a logic language like Prolog to the ones in other languages like Ruby, Python and Java.  The length and complexity difference is dramatic.

3) "if my program has intelligence to generate a code which would do the job, then it also has intelligence to do the job directly"  If from the JSON we were generating the Java code that you are planning to write and it was solving the problem without the help of any other system then I would agree.  However from the JSON we aren't generating code or even doing anything particularly intelligent.  We are generating rules.  The rules aren't intelligent they are simply a description of the problem to be solved.  We are leaving finding the solution to that problem to the rules engine.  I do think there is a fundamental difference here.

4) I also think we need to have some forethought about where this module is going.  We've already moved from a linked list based solution to a tree.  Pretty soon it wouldn't surprise me if we have to ditch the tree for a directed acyclic graph and then for a graph and then for who knows what.  I expect us to eventually need to solve for schedules with cycles between nodes, node dependencies (i.e. Milestone C follows A but only after B is complete) and even references where a particular path is only valid if the preceding milestone was completed with a certain outcome or associated metadata.  All of these enhancements are trivial with a rules engine.  Each adds complexity and code to a solution that we develop ourselves.
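
To illustrate points 3 and 4, here is a toy sketch in Python (for brevity; the platform is Java) of rules as a plain description of the problem, with a separate loop standing in for the engine. The fact strings and the `infer` helper are hypothetical, and the loop is naive forward chaining that rescans every rule; it is not the Rete algorithm and not how a real rules engine is implemented.

```python
# Each rule: (set of facts that must all hold, fact to conclude).
# The fact strings are hypothetical stand-ins for whatever the
# generated rules would match on.
RULES = [
    ({"A fulfilled"}, "B open"),
    ({"A fulfilled", "B fulfilled"}, "C open"),  # C follows A, but only after B
    ({"C fulfilled"}, "A open"),                 # a cycle back to the start
]


def infer(facts, rules=RULES):
    """Naive forward chaining: keep firing rules until nothing new is derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


print(sorted(infer({"A fulfilled"})))
print(sorted(infer({"A fulfilled", "B fulfilled"})))
```

In this sketch a dependency or a cycle is one more entry in `RULES`; the `infer` loop does not change.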

I do really enjoy these debates and I think I learn a lot from them so know that I appreciate having them, but I think we've debated this long enough and I'm not sure I see a path to agreement.  We can discuss this on the Tues. call if people think it will help, if not I'm going to have to ask that we at least spike out the rules engine solution.  If there is a fundamental reason why it won't work then we can revisit the decision.

-Rob