On 03/18/2016 11:18 AM, Misagh wrote:
> I am not sure I entirely follow you here. There needs to be adaptation
> regardless of who pushes and who pulls, right?
Right. But in the current ConnId design the adaptation is always on the
IDM side. E.g. for iPlanetoid LDAP servers it will use cn=changelog
entries. For OpenLDAP/ApacheDS it will use syncrepl. For RDBMS data it
will use an SQL query filtering on the last-modification column. And so
on. So the system that is the data source does not need to change
(except to allow IDM access to the data).
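E.g. the RDBMS case boils down to something like this (a minimal JDBC
sketch; the "users" table and its columns are hypothetical, not taken
from any real connector):

import java.sql.*;

public class PollBasedSync {
    // Return the new sync token after handing all changed rows to the IDM.
    static Timestamp pollChanges(Connection conn, Timestamp lastToken)
            throws SQLException {
        Timestamp newToken = lastToken;
        String sql = "SELECT login, full_name, last_modified FROM users"
                + " WHERE last_modified > ? ORDER BY last_modified";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setTimestamp(1, lastToken);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // Hand each changed row over for IDM processing.
                    System.out.println("changed: " + rs.getString("login"));
                    newToken = rs.getTimestamp("last_modified");
                }
            }
        }
        return newToken; // persist as the token for the next poll
    }
}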
> if I am an app/SOR and you're pulling data from me, then you need to
> adapt to whatever I provide in terms of data schematics and endpoints
> and the semantics of all of that. And you most certainly are going to
> have to "say yes" to whatever I provide most of the times, because I
> as the app/SOR don't have the resources, time or the will to change
> myself to fit your needs. Sometimes there is no control at all because
> the SOR is a total blackbox (i.e. "we have PeopleSoft, and it does
> X"). From a client standpoint, it should just work regardless of the
> technical gotchas, right? So I don't think it matters so much whether
> you're in an event-based system for pulls or some other transport.
> There always needs to be an agreement on what the data is, what it
> contains and what it means,
Yes. But the "agreement" is actually quite unilateral. I.e. we have to
adapt to whatever the application provides. And there is no room for
negotiation. E.g. if the application is AD with a broken LDAP schema and
insane <GUID=...> DNs, then our connector must be able to work with
that, even though it violates the LDAP standard. What else can we do?
> which means you're going to have to come up with some sort of semantics
> and standards on "this is how we accept and pull data from X. Please
> adapt". Does that make sense at all?
Well, not exactly "we". It is "they" who come up with the semantics and
define the interfaces. It is Microsoft and SAP and PeopleSoft that
define the "standards" here. And the connector has to adapt.
The standards approach has been tried many times already. There were
SPML1, SPML2 and a couple of other protocols before them that everybody
has already forgotten about. That failed completely. Now there is SCIM.
But it is not any closer to solving this problem. Even if SCIM were a
perfect protocol (which it is not) and even if all the IDM vendors
adopted it (which is unlikely), I do not think that Microsoft or SAP
will be providing data in SCIM format anytime soon. And there are always
homebrew applications for which SCIM and its likes are just too complex.
Writing a couple hundred lines of connector code on the IDM side is
quite easy. We have done it many times. And it works. The other option
is to go through complex negotiations about which
version/flavour/profile/interpretation of which standard all the
applications are required to support, when they will support it, how
well they will support it, who is going to test that, who is going to
fix interoperability issues and, most importantly, who is going to pay
for all the man-months spent modifying the applications.
> There are certainly no standards for this sort of thing. The TIER
> initiative has an API working group that is working on putting a
> framework and API around this sort of thing. Might be worth looking into.
Yes, it might be worth looking at. But honestly, having lived through
all this SPML and SCIM, I'm a bit skeptical about the
design-during-standardization approach. Designing a good provisioning
interface is a huge issue. It might look easy. But it is not. Not easy
at all. E.g. Waveset was one of the first companies in the IDM field.
They almost completely ruined their first iteration of the provisioning
API (called "adapter" at that time). A few years later the next attempt
was made by Sun. The result was the Identity Connector Framework (ICF).
But that also failed. It was barely usable. The ConnId project had to
fix a lot of issues in the ICF framework to make it really work. And
still many issues remain:
https://wiki.evolveum.com/display/midPoint/ICF+Issues ...
and this is a relatively simple Java API, not a network protocol.
Creating a working protocol would be much harder.
Therefore my approach is slightly different: start with working code.
Distill the API out of that. All the time, make sure that the API works
for practical cases. Do it all in open source. That's what we have done
with ConnId: we started with ICF and improved it. But now we all know
that we are getting close to a dead end. ICF has fundamental issues, and
sooner or later we will need to make a ConnId 2.0 that fixes these
problems. So we want to build working code backed by many (real-world)
tests to make sure that it works, and create a protocol out of that. The
problem is funding, of course :-) ... but anyway. For now, ConnId 1.x
remains probably the best provisioning API that is available. It is
practical and it works. And I do not think that any protocol designed
from scratch by a committee can really provide much benefit. I would
really like to be proven wrong here :-) I like standards. But practical
experience in this field suggests that standards have limited value
here.
> As the IDM, when you're pushing data out to downstream systems for
> consumption, an event based approach seems to be all the rage.
Yes and no. The vast majority of applications have synchronous APIs for
user management. So the benefit of an inherently asynchronous
event-based approach is quite questionable here.
> Traditionally, I suppose folks have had native connectors for
> provisioning and these connectors hook onto the native APIs of the IDM
> to consume change events and process them.
> Connectors were native to the platform they were built for, as I
> understand your point: an LDAP one, a JDBC one, a PeopleSoft one, etc.
> To loosen the coupling, the IDM would instead send change events over
> a queue, and you'd have consumers listen on the queue, pick up events
> they care about, etc., and they generally abide by the transactional
> semantics of whatever the queue implementation is, to guarantee SLAs,
> etc. The IDM cares not who's on the queue and it's free to change
> internal APIs as long as the message syntax remains the same.
E.g. this is the way Novell IDM did it. Theoretically it is very elegant
and robust. But in practice it is an absolute nightmare to debug. If you
have any non-trivial provisioning topology, you are going to get lost
very quickly. It is very, very difficult to follow cause-and-effect
chains in any purely event-based system. It is difficult to replicate
issues due to message reordering. It is even quite hard to test
properly. So the benefits of the async approach are somewhat outweighed
by the fact that too many people are not able to properly understand and
configure an async system.
Therefore midPoint is slightly different. We are somewhat based on
events, philosophically. We call them "deltas". But when processing a
delta we try to process the entire chain of events caused by that delta.
And we process it all using one algorithm in a single process. This
gives the system administrator quite a nice overview of what the cause
was, what logic was used to compute the effect, which expressions were
used, how the data got transformed and what the final effect was. And we
keep a firm link between the events (we call it "model context":
https://wiki.evolveum.com/display/midPoint/Model+Context). We can do
this because IDM is basically just a hub-and-spoke topology and we are
the hub. It is a good way to keep our sanity even in complicated
scenarios (e.g. scenarios like this:
https://wiki.evolveum.com/display/midPoint/OrgSync+Story+Test). We have
done some very complex configurations with midPoint, and this approach
has saved our backsides many, many times.
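To give a flavour of what a "delta" is (a toy sketch only; the names
below are mine, midPoint's real delta classes are much richer):

import java.util.*;

// A toy "delta": a description of a change, not the changed object itself.
enum ModType { ADD, REPLACE, DELETE }
record ItemDelta(String attribute, ModType type, List<Object> values) {}
record Delta(String objectId, List<ItemDelta> modifications) {}

class DeltaProcessor {
    // Process the primary delta and every secondary delta it causes in one
    // pass, so the whole cause-and-effect chain stays in a single context.
    List<Delta> process(Delta primary) {
        List<Delta> chain = new ArrayList<>();
        chain.add(primary);
        // ... mappings and expressions would compute and append secondary
        // deltas here, e.g. "employeeNumber changed -> update LDAP login" ...
        return chain;
    }
}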
This is practically almost the same approach as you describe. But the
interface is not a message format between connector and IDM, because
there are no real messages between connector and IDM. There is a
synchronous Java API (with some async aspects for application->IDM
synchronization). The connector is in fact part of the IDM itself. It is
Java code that runs in the same process as the IDM. Yes, it is pluggable
code. And the IDM-connector interface is well defined (this is the
ConnId framework). But there are no real messages. In our case the real
network interface is the application-connector interface. And this is
the interface which adapts to whatever the application provides (nice
LDAP, lame LDAP, SQL, SOAP, REST, even command-line over ssh, screen
scraping and similar abominations).
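For illustration, the connector side looks roughly like this: a minimal
sketch of a ConnId connector implementing just the create operation (the
class name and the placeholder body are mine; configuration, error
handling and the actual protocol calls are omitted):

import java.util.Set;
import org.identityconnectors.framework.common.objects.*;
import org.identityconnectors.framework.spi.*;
import org.identityconnectors.framework.spi.operations.CreateOp;

public class SketchConnector implements Connector, CreateOp {
    private Configuration configuration;

    @Override
    public void init(Configuration cfg) {
        this.configuration = cfg; // set up the connection to the application
    }

    @Override
    public Uid create(ObjectClass objectClass, Set<Attribute> attrs,
            OperationOptions options) {
        // Translate the generic attribute set into the application's own
        // protocol (LDAP add, SQL INSERT, REST POST, ...) and return the
        // identifier the application assigned.
        String generatedId = "TODO"; // whatever the application returns
        return new Uid(generatedId);
    }

    @Override
    public Configuration getConfiguration() { return configuration; }

    @Override
    public void dispose() { /* release connections */ }
}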
> There are certainly no standards for this sort of thing either! Guess
> you could say the native messaging syntax for ActiveMQ is "the
> standard" one might choose. But that's hardly useful.
That's absolutely right. It is not useful at all.
> Because while you can somewhat tighten the transport model (and hope
> it never changes when you upgrade the queue impl), you still need to
> come up with a framework on "so how are we going to send these
> messages across the queue? what headers, fields, attributes, etc?"
This is what ConnId does, except that there is no queue and it is a
synchronous Java API. But there is one important point: the provisioning
interface cannot have fixed attributes. Every application has a
different set of attributes. We have tried several times to create an
interface with a fixed attribute set, because customers wanted that. All
such attempts failed in the end. A fixed attribute set seemed like a
benefit at the beginning, but it became an obstacle a couple of years
down the road when the systems changed.
So, our solution is that ConnId connectors are simply protocol adapters.
If I abstract from the details, I can say that what the LDAP connector
does is convert entries in LDAP messages to simple Java attribute-value
maps, and vice versa. It does not really understand the meaning of the
attributes (except for a couple of pre-defined attributes such as
password). It is the IDM itself that does most of the mapping and
adaptation logic, not the connector. And that makes perfect sense: the
logic is written and maintained in only one place. If connectors did
that, then the same logic would be repeated many, many times. And this
is no simple logic. It has to adapt to differences in data types, string
capitalization, data normalization and matching, format conversions,
etc. E.g. in midPoint the subsystem where this code resides has approx.
140K lines of code. Maintaining even a small part of that in several
places would be a nightmare.
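The adapter idea in a nutshell (a simplified sketch using the JNDI and
ConnId APIs; the real LDAP connector handles far more, e.g. object
classes, entryUUID, binary attributes and paging):

import java.util.ArrayList;
import java.util.List;
import javax.naming.NamingEnumeration;
import javax.naming.NamingException;
import javax.naming.directory.Attributes;
import org.identityconnectors.framework.common.objects.*;

public class LdapEntryAdapter {
    static ConnectorObject toConnectorObject(String dn, Attributes ldapAttrs)
            throws NamingException {
        ConnectorObjectBuilder builder = new ConnectorObjectBuilder();
        builder.setObjectClass(ObjectClass.ACCOUNT);
        builder.setUid(dn);  // real connectors prefer entryUUID here
        builder.setName(dn);
        NamingEnumeration<? extends javax.naming.directory.Attribute> attrs =
                ldapAttrs.getAll();
        while (attrs.hasMore()) {
            javax.naming.directory.Attribute ldapAttr = attrs.next();
            List<Object> values = new ArrayList<>();
            NamingEnumeration<?> vals = ldapAttr.getAll();
            while (vals.hasMore()) {
                values.add(vals.next());
            }
            // No semantics here: just name + values, passed to the IDM as-is.
            builder.addAttribute(AttributeBuilder.build(ldapAttr.getID(), values));
        }
        return builder.build();
    }
}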
> Indeed. Which sort of brings me to my next question, (and going back
> to my original query on "how do various OSS components compare"): one
> of the key strengths of Grouper, as an example, is its ability to not
> only manage groups and members, etc but also allow advanced set math
> operations on them. Such that you could construct very fancy groups,
> that are composites, whose members are a combination of group X and
> group Y but not group Z and members 1-100 (who could come from any
> source), where X, Y and Z can all be composite groups with direct and
> indirect members, and so on. Turtles all the way down. Very powerful.
> Do the IDM systems of today that we know of, midPoint, Syncope, etc,
> support or provide similar functionality when they manage the org
> tree, roles, workflow processes, etc? or even groups if they do
> support that?
I do not think that any IDM system has a full set algebra for groups. In
fact, most IDM systems struggle with managing group membership at all
:-) I'm not aware of the capabilities of recent Syncope versions (2.x).
But midPoint can definitely do a good part of what Grouper does. For
midPoint a "group" is just a concrete application of the "org" concept
that we have in midPoint. Simply speaking, for us a group is yet another
type of organizational unit. As org units can be placed in any number of
directed acyclic graphs, we naturally have composition operations. With
turtles all the way down, of course. But we do not have subtraction
operations. Honestly, we try quite hard to avoid them. :-) Subtraction
complicates many things. If everything is just composition, then merging
policies is quite easy (and groups are in fact just a special case of a
policy). If there are subtractions, then there may be conflicts. E.g. in
midPoint the same org or role can be assigned to the same user several
times (e.g. with slight variations in parameters). If there were
subtractions, the evaluation order could make a difference. That would
need explicit priorities and operator precedence, and it would
complicate an already complex system. Set operations also introduce
dependencies between groups. And we try quite hard for the roles/orgs
not to depend on one another. That also simplifies evaluation. And it
opens up the way for more massive parallelism in the future. Yes, these
may be limitations. But so far we have been able to do everything that
we have needed by designing good structures, using just composition and
using conditional assignments if necessary. It works beautifully.
We might add at least some subtraction-like functionality in the future
if it is really necessary. But there has to be a really good use case
for it, one that is worth complicating the system for.
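To show why union-only composition stays simple, here is a toy sketch
(the class and field names are illustrative, not midPoint's): membership
of a composite is just the union over its parts, so one recursive walk
over the DAG suffices and evaluation order never matters. Add
subtraction and order starts to matter, e.g. (X union Y) minus Z is not
the same as (X minus Z) union Y.

import java.util.*;

class Org {
    final String name;
    final Set<String> directMembers = new HashSet<>();
    final Set<Org> parts = new HashSet<>();  // composed sub-orgs (DAG edges)
    Org(String name) { this.name = name; }

    // Union of direct members and all parts, turtles all the way down.
    // Acyclicity of the org graph guarantees this recursion terminates.
    Set<String> members() {
        Set<String> result = new HashSet<>(directMembers);
        for (Org part : parts) {
            result.addAll(part.members());  // union commutes: order irrelevant
        }
        return result;
    }
}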
> Definitely. I would heartily encourage you to join the TIER mailing
> lists and participate in the calls. While some content may be very
> higher-ed specific, there is definitely room for contribution there.
> The WG is starting to evaluate use cases and requirements on what a
> typical IDM system might look like (based on the collective experience
> and processes of institutions in the higher-ed space and others) and
> how they could fill the gap in what would be a comprehensive IAM
> package that would include Shibboleth, Grouper, etc. The only viable
> options discussed are a combination of COmanage and Grouper, and it
> would be nice to throw in midPoint and Syncope for further evaluation.
> Options are a very good thing :)
That's good advice. Thanks. I will try to find the time.