Evaluation of Identity/Person registries

Misagh M

Feb 26, 2016, 12:22:53 PM
to Identity Ecosystem
Happy Friday!

The IAM team at my company, Unicon, has started to evaluate open-source solutions and options for identity and person registries, particularly those with niches useful for the HigherEd space and its use cases. We have found this to be a difficult problem: institutions today employ so many varying use cases that it may be difficult for one system to manage and rule them all.

So in today's IAM ecosystem, I think there are a few very good candidates: WSO2, Apache Syncope and midPoint.

- Does this group have any recommendations as to which platform may be suited for which type of architecture and deployment?
- Does this group have some sort of comparison chart that would outline the differences, pros/cons of said platforms? I'd be especially interested in factors such as build/deployment practices, tech stack used, UX, ease of maintenance and upgrade, ease of extension, provisioning functionality and ease of integration with other common OSS IAM platforms of today such as CAS, Shibboleth, Grouper, etc.

Feedback is most welcome. Online demos/videos? Even better :)

Misagh

Shawn McKinney

Feb 28, 2016, 11:17:07 AM
to identity-...@googlegroups.com
Misagh,

Others can chime in, but unfortunately I'm not aware of any docs like these.

Indeed, this is one of the reasons we started the OSS IdM ecosystem in the first place.

What we're trying to do now is establish guidelines for the cross-organizational cooperation that will be needed before we can gather these data points.

Best,

Shawn

--
You received this message because you are subscribed to the Google Groups "Identity Ecosystem" group.
To unsubscribe from this group and stop receiving emails from it, send an email to identity-ecosys...@googlegroups.com.
To post to this group, send email to identity-...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/identity-ecosystem/a8e0e58a-0610-4f13-8bce-2ff492c969e8%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Radovan Semancik

Feb 29, 2016, 4:45:39 AM
to identity-...@googlegroups.com
Hi Misagh,

and welcome aboard.

On 02/26/2016 06:22 PM, Misagh M wrote:
> - Does this group have any recommendations as to which platform may
> be suited for which type of architecture and deployment?

As there is a strong presence of Syncope and midPoint people in this
group, I think the recommendations will be no surprise.

I can talk about midPoint. It is currently deployed at several academic
institutions; I can make sure you get the list if you are interested.
The challenges of the academic environment are indeed quite unique, and
I will not pretend midPoint is a silver bullet that solves them all.
But I can say we have been quite successful in solving at least some of
them, and we plan to work on academia-related features in the future.

> - Does this group have some sort of comparison chart that would
> outline the differences, pros/cons of said platforms?

There is a comparison: https://compare.evolveum.com/

But please take it with a grain of salt. It was done by our team and
therefore may be a bit subjective, and it is also getting a bit dated
now. I have consulted the Syncope team on some parts of it, but it is
quite expected that they have a rather different point of view.

WSO2 Identity Server is not part of that comparison. Its provisioning
capabilities are not really powerful, and it is questionable whether
that product really belongs in the IDM field.

> I'd be especially interested in factors such as build/deployment
> practices, tech stack used, UX, ease of maintenance and upgrade, ease
> of extension, provisioning functionality and ease of integration with
> other common OSS IAM platforms of today such as CAS, Shibboleth,
> Grouper, etc.

Build, deployment, tech stack and UX are very similar for both midPoint
and Syncope: Maven, Java, a web-app form factor, Spring, and Apache
Wicket for both products. Even the look and feel is similar, as we are
both using the same template. The code repository is git for both
teams, and both teams are very open to contributions. One
under-the-hood difference is the internal architecture: midPoint has
had a clean architecture and component structure from the very
beginning, while the Syncope team is working on a proper component
structure right now (in Syncope 2).

AFAIK, the WSO2 Identity Server is quite different here. It is still
Maven and Java, but it requires components from the WSO2 platform, and
the source code is maintained in a fairly closed Subversion repository.
I tried to cooperate with WSO2 a couple of years ago and it didn't work.

Comparison of provisioning capabilities is not straightforward, as the
products use slightly different mechanisms. But the study I mentioned
above may give you some idea about midPoint and Syncope, especially
this table: https://compare.evolveum.com/features-table.html
When it comes to WSO2, as far as I know the only option is SCIM, and
overall the provisioning options are extremely limited.

Extensibility of midPoint is quite easy. The data model is extensible
by providing an XML schema definition and dropping it into the midPoint
instance. No rebuild is necessary, and no database schema change is
necessary. Syncope has a similar concept, although it is not XSD-based.
Extensibility of the code seems to be similar in both Syncope and
midPoint: you can create overlay projects with your customizations.
MidPoint has a nice set of interfaces to the "core" that are stable,
well documented, and available as local Java, REST or SOAP. Syncope has
something similar, although it is REST-only (as far as I remember).
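To make the extension idea concrete, here is a rough sketch of the kind of XSD one might drop in; the namespace and element names are invented for illustration and are not midPoint's actual conventions:

```xml
<!-- Hypothetical sketch: custom attributes added to the data model by
     providing an XSD like this to the running instance. No rebuild and
     no database schema change is needed; the names here are made up. -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://example.com/xml/ns/hr-extension">
    <xsd:element name="employeeNumber" type="xsd:string"/>
    <xsd:element name="department" type="xsd:string"/>
</xsd:schema>
```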

MidPoint was tested with CAS and Shibboleth. We have a nice use case
for integration of midPoint with CAS and SAML libraries: CAS acts as a
SAML service provider, takes the token, invokes midPoint services to do
just-in-time provisioning, and then passes the session on to the
application. This is part of the Open Data Node solution
(http://opendatanode.org/). A similar midPoint-CAS integration is part
of an open data initiative used to manage publishing of open data sets
by the Slovak government.

We also did some experiments with Shibboleth and we are confident that
a similar integration will work. But there is no productized solution
or reference for Shibboleth yet. Currently we are exploring
possibilities to integrate with Grouper; our guys are working on that
right now.

I think Syncope also has some CAS/Shibboleth integration stories, but
I'm sure Francesco will tell those himself :-)
When it comes to WSO2, I do not think there is any substantial
integration with non-WSO2 products. As WSO2 is a complete platform, I
can quite understand that the motivation to integrate outside the
platform may be somewhat limited. Maybe I'm wrong ... but ... for
example, some time ago I was looking for an Apache module for WSO2
Identity Server. The WSO2 guys assured me that such a module exists,
but I was not able to find the source code or get any further
information about it from WSO2.

Disclaimer: these are my personal opinions. I'm sure the
midPoint-related data are quite precise, and the Syncope guys will
surely correct me if I got something wrong. As for WSO2, I'm describing
the state from my last look, which was 1-2 years ago. The situation may
be different now.

> Feedback is most welcomed. Online demos/videos? even better :)

MidPoint online demo is here: http://demo.evolveum.com/
The demo description is here:
https://wiki.evolveum.com/display/midPoint/Live+Demo

--
Radovan Semancik
Software Architect
evolveum.com

Francesco Chicchiriccò

Feb 29, 2016, 8:07:21 AM
to identity-...@googlegroups.com
Hi Misagh,
you can find my replies embedded below, with reference to Apache Syncope, naturally.

Regards.


On 26/02/2016 18:22, Misagh M wrote:
Happy Friday!
The IAM team at my company, Unicon, has started to evaluate open-source solutions and options for identity and person registries, particularly those with niches useful for the HigherEd space and its use cases. We have found this to be a difficult problem: institutions today employ so many varying use cases that it may be difficult for one system to manage and rule them all.

So in today's IAM ecosystem, I think there are a few very good candidates: WSO2, Apache Syncope and midPoint.

I'm not sure I would include WSO2 Identity Server in the comparison; it looks more like the identity backend of the reference WSO2 infrastructure than a general-purpose identity manager like Apache Syncope and midPoint. But I am no expert on it, so I can't be sure of this point.

Anyway, we don't have (yet?) any WSO2 representative here, so I'd stick with the other options.


- Does this group have any recommendations as to which platform may be suited for which type of architecture and deployment?
- Does this group have some sort of comparison chart that would outline the differences, pros/cons of said platforms? I'd be especially interested in factors such as build/deployment practices, tech stack used, UX, ease of maintenance and upgrade, ease of extension, provisioning functionality and ease of integration with other common OSS IAM platforms of today such as CAS, Shibboleth, Grouper, etc.

Unfortunately, we don't have such ecosystem-wide recommendations and comparisons yet, but producing them is definitely part of the purpose here.
Radovan published some work in this regard a while ago: besides being outdated by now, we clearly don't agree on some points, as you can imagine.
We might take the opportunity to jointly review and update it in the near future.

Apache Syncope supports deployment on Apache Tomcat / TomEE, JBoss / Wildfly and Glassfish, and is able to natively manage its internal storage on PostgreSQL, MySQL, MariaDB, Oracle DB and MS SQL Server.

You can check a brief intro to the high-level architecture and components at [1].
Syncope supports high availability and flexible deployment scenarios, with the REST APIs exposed by the core as the pivot element.

As we often repeat on the user@ and dev@ mailing lists, Syncope is extensible by design: since the beginning we have designed its architecture with the caveats of Sun Identity Manager (now Oracle Waveset) in mind.
The most effective way to work on your own deployments is to generate a multi-module Maven project, which allows you to rely on / extend / replace every single feature available.
Moreover, this approach lets you stay in sync with updates and easily get all the fixes provided with new releases.

About reference cases: at Tirasa we have gathered some of them [2] (where the hardest part is getting permission to publish...), even though every day we are discovering new implementations all around the world (Holland, Germany, Canada, Finland, South Africa, Colombia, Argentina, Mexico, USA, Russia, China, ...), especially in the academic and healthcare fields.
Most - if not all - of these stories feature integration with access management technologies (especially CAS and Shibboleth), even though they are delivered as solutions rather than as bundled packages.

I firmly believe that one of the added values of Apache Syncope is the fact that it is Apache Syncope - and these are also some of the arguments behind Tirasa's choice to bring its former Syncope IdM to The Apache Software Foundation in 2012:
  • we work in strict cooperation with other projects at The ASF, in particular CXF, Wicket and OpenJPA; the consequences of this direct cross-project involvement are particularly evident when it comes to reporting and fixing issues, submitting improvements and enjoying new features - a clear and recent example being the Swagger UI extension in Syncope 2.0 (see blog post and video below)
  • we enjoy the legal support, checks and protection provided by the ASF legal team (especially for releases)
  • we are upstream for at least three different IDaaS cloud providers
  • platform users benefit from more vendor-neutral project management (open source is nice, open governance is a completely different story)
In the next few months we should be ready to release Apache Syncope 2.0 (2.0.0-M1 is already out), which brings a whole lot of new features, fixes, improvements and refactoring (summary at [3]), while the current stable version is 1.2.


Feedback is most welcomed. Online demos/videos? even better :)

You can find different ways to try out Syncope at [4] (several cloud providers, a live demo, ...), even though all such resources are currently set up for 1.2.
For 2.0 the most effective way is probably to take a look at the Standalone distribution [5].

There are blog posts (featuring videos) introducing some of the most relevant new features available with Syncope 2.0:


Final note: ConnId - the underlying provisioning framework shared by Apache Syncope and midPoint - is IMHO a solid example of cooperation among "competitor" open source projects, and it somewhat anticipates the concepts we are trying to support here with the Identity Ecosystem effort.

Should you have specific questions, I am available.
Regards.

[1] http://syncope.apache.org/docs/getting-started.html#a-bird-s-eye-view-on-the-architecture-of-apache-syncope
[2] http://syncope.tirasa.net/success-stories.html
[3] http://cwiki.apache.org/confluence/display/SYNCOPE/Jazz
[4] http://syncope.tirasa.net/trysyncope.html
[5] http://syncope.apache.org/docs/getting-started.html#standalone
-- 
Francesco Chicchiriccò
Tel +393290573276

Amministratore unico @ Tirasa S.r.l.
Viale D'Annunzio 267 - 65127 Pescara
Tel +39 0859116307 / FAX +39 0859111173
http://www.tirasa.net

Involved at The Apache Software Foundation:
member, Syncope PMC chair, Cocoon PMC, Olingo PMC, CXF committer
http://home.apache.org/~ilgrosso/

"To Iterate is Human, to Recurse, Divine"
(James O. Coplien, Bell Labs)

Misagh M

Mar 16, 2016, 4:46:37 AM
to Identity Ecosystem
Thank you all. This is extremely helpful. I'll continue to poke at the provided resources.

A few follow-up questions:

1. My admittedly limited understanding of the OpenICF project is that it's very much dead. Is that the case? (Dead could also mean that it's very stable.) Writing provisioning connectors is definitely on the TODO list for any given deployment. If the framework is dead or inactive, would that not raise a red flag?

2. More on provisioning: I think the approach the overall community is taking towards provisioning and consumption of change events is very much moving towards message queues and asynchronous processing - AMQP and the like. Is that something supported by either of the IDM platforms here? Would that be a viable direction to generally adopt and recommend? And how would that align with OpenICF connectors?

3. From a deployment perspective, what is the process of deploying an IDM registry like? To draw a parallel example: if I were asked to, say, deploy Grouper for a given institution, there is a fine-tuned process I'd follow to assess, analyze, define and consolidate client requirements. There are questions to be asked about the most common use cases, the capability and maturity of the client environment, the systems of record at hand, provisioning targets, group design and authZ rules, etc. So I'd work through these questions and concerns with an institution using a prepped agenda (that will of course be thrown out the window once we start talking!), typically onsite for about a week, until we all come to an agreement on "we shall do X". What is that process like for IDMs? Is there a defined walkthrough of a readiness assessment for any given client to comb through? What assumptions does either IDM platform make about the deployment environment and SORs? (I realize this may be more of a question for Tirasa and midPoint, the companies, rather than for the projects, but feel free to skirt around the edges where appropriate)

Misagh

Radovan Semancik

Mar 16, 2016, 8:02:08 AM
to identity-...@googlegroups.com
Hi,

See below.

On 03/16/2016 09:46 AM, Misagh M wrote:
> 1. My admittedly-limited understanding of the OpenICF project is that
> it's very much dead. Is that the case? (Dead could also mean that it's
> very stable) Writing provisioning connectors is definitely on the TODO
> list for any given deployment. If the framework is dead or inactive,
> would that not raise a red flag?

No, OpenICF is not entirely dead. But as far as I know, ForgeRock is
the only entity that contributes to it. Tirasa and Evolveum do not
cooperate on OpenICF at all; Evolveum tried to cooperate on OpenICF for
many years, but that didn't really work.

Tirasa and Evolveum cooperate on the ConnId project, and this is an
excellent cooperation. ConnId originated from the same code base as
OpenICF: the Sun Microsystems Identity Connector Framework. ConnId is
definitely NOT dead, albeit I have to admit that progress on the
framework is currently a bit slow. But there is progress, and we have
good incentives to maintain it.

On the other hand, the connectors themselves are under very active
development. There is a next-generation LDAP connector that we
(Evolveum) developed during the last year. It was a major investment.
It is based on the Apache Directory API (as opposed to the JNDI used by
the old connectors) and it supports all the latest and greatest ConnId
features. It has been tested with OpenLDAP, OpenDJ and 389ds, but also
with AD and eDirectory - and it supports their "quirks". We have also
recently developed connectors for Atlassian Jira and SAP (sponsored by
our customers), we have significantly improved the Unix connector
originally developed by Tirasa, and so on. There is a lot happening in
the connectors.

> 2. More on provisioning, I think the approach the overall community is
> taking towards provisioning and consumption of change events is very
> much moving towards message queues and asynchronous processing. AMQP
> and the like. Is that something that is supported by either of the IDM
> platforms here? Would that be a viable direction to generally adopt
> and recommend? and how would that align with OpenICF connectors?

There is an event-based mechanism in SunICF/OpenICF/ConnId. However, it
is a pull-based mechanism: the IDM is always the client; it connects to
the application and pulls the changes. As far as I know, the vast
majority of IDM systems work like this (including commercial IDMs). And
there are some very good reasons for that.

It would be nice to have really async event listeners on the IDM side.
But that would significantly complicate the architecture. E.g. one of
the fundamental principles of provisioning-based IDM systems is that no
application depends on the IDM: it is the IDM system that adapts, not
the applications. If the IDM becomes an event listener, someone needs
to send the events, which means the sender needs to adapt to the
interface that describes the events. There is no established standard
for that. Therefore this approach is not very practical: in practice it
leads to a very complex spaghetti of inter-dependent applications that
is almost impossible to debug.

Therefore we always pull the events, and the IDM (connector) converts
the events to a common format. When used properly this gives us
latencies on the order of 1-10 seconds, which is usually perfectly
acceptable. And it dramatically simplifies the architecture.
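As a sketch of this pull model (illustrative Java only, not the ConnId API; all names are made up): the IDM periodically asks the source for everything changed since its last token and converts each row to a common event format.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the pull model: poll the source for rows whose
// "modified" timestamp is newer than the last token, convert each one to
// a common change-event format, and advance the token. A real connector
// would run an SQL query ("... WHERE modified > ?") or read an LDAP
// changelog instead of scanning an in-memory list.
public class PullSync {

    // The common event format the IDM works with internally.
    public record ChangeEvent(String uid, long token, Map<String, Object> attributes) {}

    // Pull everything modified after the given token.
    public static List<ChangeEvent> pullSince(List<Map<String, Object>> sourceRows, long lastToken) {
        List<ChangeEvent> events = new ArrayList<>();
        for (Map<String, Object> row : sourceRows) {
            long modified = (Long) row.get("modified");
            if (modified > lastToken) {
                events.add(new ChangeEvent((String) row.get("uid"), modified, row));
            }
        }
        return events;
    }

    // The next token is the highest timestamp seen; the IDM persists it
    // between polling cycles.
    public static long nextToken(List<ChangeEvent> events, long lastToken) {
        return events.stream().mapToLong(ChangeEvent::token).max().orElse(lastToken);
    }
}
```

Polling every few seconds with this shape is what yields the 1-10 second latencies mentioned above.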

I'm not against a fully async event-based approach. Actually, I'm a big
fan of such architectures, and one day we might get there. But for now
we have to be pragmatic. The very first priority for any IDM system is
that it has to work reliably, and the cost to deploy and maintain the
whole solution has to be reasonable. The current approach gives us that.

> 3. From a deployment perspective, what is the process of deploying IDM
> registry like? To draw a parallel example, if I was asked to say,
> deploy Grouper for a given institution, there is fine-tuned process
> I'd follow to assess, analyze, define and consolidate client
> requirements. There are questions to be asked about the most common
> use cases, the capability and maturity of the client environment,
> systems of records at hand, provisioning targets, group design and
> authZ rules, etc. So I'd work through these questions and concerns
> with an institutions with a prepped agenda (that will of course be
> thrown out the window once we start talking!), typically onsite for
> about a week until we all come to an agreement on "we shall do X".
> What is that process like for IDMs? Is there a defined walkthrough of
> a readiness assessment for any given client to comb through? What
> assumptions do either IDM platforms make about the deployment
> environment and SORs? (I realize this may be more of a question for
> Tirasa and midPoint, the companies, rather than for the project but
> feel free to skirt around the edges where appropriate)

The most important fact to realize is that an advanced IDM system is
much more flexible than Grouper or any other similar technology. A
typical IDM system supports users and accounts, but also entitlements,
organizational structure, roles, workflow processes, access reviews, ...

In the IDM space there have traditionally been methodologies similar to
the one you describe, but these are mostly proprietary. I'm not aware
of any comprehensive IDM deployment methodology that is openly
published (and I would really appreciate one!). But as Grouper's
management features are essentially a subset of IDM features, I think
you can easily adapt a Grouper methodology to IDM.

However, there is one major drawback to using waterfall-like
methodologies like this one: the customer usually has only a very rough
idea of what he needs. E.g. it is often very difficult to compile even
such a simple thing as a list of the systems that need to be integrated
and a description of their interfaces. Any plans made at the beginning
are extremely likely to change when reality kicks in. That happened
almost all the time when we were deploying first-generation IDMs during
the 2000s. For commercial IDMs this was really the only feasible
option. But now we have other options ...

When we designed midPoint in 2010-2011 we considered the lessons
learned from first-generation IDM systems. We designed midPoint
specifically to support more agile methodologies, and now we strongly
recommend deploying midPoint in several small phases (each 1-3 months
long): analyze systems incrementally, integrate them incrementally,
focus on the major pain points first, and leave the marginally
important systems for later. MidPoint is also well equipped to
introduce RBAC policies gradually. It has several enforcement modes
(e.g. it can be set up to only add privileges and never remove them, to
add/remove only if there is an explicit change, or to fully enforce the
policies). MidPoint can easily support exceptions from policies and
legalize the status quo (if needed). MidPoint 3.4 will have full
support for access reviews, which is yet another method to keep
semi-ad-hoc policies under control. And there are other features that
support iterative and incremental deployment.

The features are there, and we have very good experience using this
approach. Yet we haven't documented it as a comprehensive methodology.
Maybe we can cooperate on that?

Misagh

Mar 18, 2016, 6:18:45 AM
to identity-...@googlegroups.com
On Wed, Mar 16, 2016 at 5:02 AM Radovan Semancik <radovan....@evolveum.com> wrote:

It would be nice to have really async event listeners on the IDM side.
But that would significantly complicate the architecture. E.g. one of
the fundamental principles of provisioning-based IDM systems is that no
application depends on the IDM: it is the IDM system that adapts, not
the applications. If the IDM becomes an event listener, someone needs
to send the events, which means the sender needs to adapt to the
interface that describes the events. There is no established standard
for that. Therefore this approach is not very practical: in practice it
leads to a very complex spaghetti of inter-dependent applications that
is almost impossible to debug.

I am not sure I entirely follow you here. There needs to be adaptation regardless of who pushes and who pulls, right? If I am an app/SOR and you're pulling data from me, then you need to adapt to whatever I provide in terms of data schematics, endpoints and the semantics of all of that. And you most certainly are going to have to "say yes" to whatever I provide most of the time, because I as the app/SOR don't have the resources, time or will to change myself to fit your needs. Sometimes there is no control at all because the SOR is a total black box. (I.e. "we have PeopleSoft, and it does X.") From a client standpoint, it should just work regardless of the technical gotchas, right? So I don't think it matters so much whether you're in an event-based system for pulls or some other transport. There always needs to be an agreement on what the data is, what it contains and what it means, which means you're going to have to come up with some sort of semantics and standards for "this is how we accept and pull data from X. Please adapt". Does that make sense at all?

There are certainly no standards for this sort of thing. The TIER initiative has an API working group that is working on putting a framework and API around this sort of thing. Might be worth looking into.

That said, I do agree that as an IDM you always want to pull data rather than expect people/apps to push it to you. Maybe they could, maybe you expose an option for them to do so, but generally you pull. To draw a parallel example, the Grouper world calls this a defined "Subject Source", from which it starts to pull in order to populate a virtual representation of all subjects (which in Grouper terminology are mostly groups or anything that can be or go into a group). But this applies only when you're feeding data to the IDM, right? As the IDM, when you're pushing data out to downstream systems for consumption, an event-based approach seems to be all the rage. Traditionally, I suppose, folks have had native connectors for provisioning, and these connectors hook onto the native APIs of the IDM to consume change events and process them. Connectors were native to the platform they were built for, as I understand your point: an LDAP one, a JDBC one, a PeopleSoft one, etc. To loosen the coupling, the IDM would instead send change events over a queue, and you'd have consumers listen on the queue, pick up the events they care about, etc., generally abiding by the transactional semantics of whatever the queue implementation is, to guarantee SLAs, etc. The IDM cares not who's on the queue, and it's free to change internal APIs as long as the message syntax remains the same.

There are certainly no standards for this sort of thing either! I guess you could say the native messaging syntax for ActiveMQ is "the standard" one might choose. But that's hardly useful, because while you can somewhat tighten the transport model (and hope it never changes when you upgrade the queue implementation), you still need to come up with a framework for "so how are we going to send these messages across the queue? What headers, fields, attributes, etc.?"
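A minimal sketch of that queue pattern, using an in-memory queue in place of a real broker such as ActiveMQ (the message shape and all names here are invented for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative sketch only: the IDM publishes change events with an
// agreed message shape (type + flat attribute map) and knows nothing
// about who consumes them; consumers drain the queue and keep only the
// event types they care about.
public class QueueProvisioning {

    // The "agreed message syntax" the thread is talking about.
    public record Message(String type, Map<String, String> attrs) {}

    private final BlockingQueue<Message> queue = new LinkedBlockingQueue<>();

    // IDM side: publish a change event onto the queue.
    public void publish(Message m) {
        queue.add(m);
    }

    // Consumer side: drain the queue, keeping matching event types.
    // (A real broker would use selectors/topics instead of discarding.)
    public List<Message> consume(String wantedType) {
        List<Message> picked = new ArrayList<>();
        Message m;
        while ((m = queue.poll()) != null) {
            if (m.type().equals(wantedType)) {
                picked.add(m);
            }
        }
        return picked;
    }
}
```

The decoupling point is exactly the one made above: the IDM may change its internal APIs freely as long as the `Message` shape stays stable.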


The most important fact to realize is that an advanced IDM system is
much more flexible than Grouper or any other similar technology. A
typical IDM system supports users and accounts, but also entitlements,
organizational structure, roles, workflow processes, access reviews, ...

Indeed. Which sort of brings me to my next question (going back to my original query on "how do the various OSS components compare"): one of the key strengths of Grouper, as an example, is its ability not only to manage groups, members, etc. but also to allow advanced set-math operations on them, such that you could construct very fancy composite groups whose members are a combination of group X and group Y but not group Z and members 1-100 (who could come from any source), where X, Y and Z can all themselves be composite groups with direct and indirect members, and so on. Turtles all the way down. Very powerful. Do the IDM systems of today that we know of, midPoint, Syncope, etc., support or provide similar functionality when they manage the org tree, roles, workflow processes, etc.? Or even groups, if they support those?
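The set math being described can be sketched in a few lines (illustrative only; this is not Grouper's or either IDM's actual API):

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of composite-group semantics: members = (X union Y) minus Z,
// plus any direct members. Since each input set may itself be the result
// of another composite, the construction nests arbitrarily deep
// ("turtles all the way down").
public class CompositeGroups {

    public static Set<String> union(Set<String> a, Set<String> b) {
        Set<String> r = new HashSet<>(a);
        r.addAll(b);
        return r;
    }

    public static Set<String> minus(Set<String> a, Set<String> b) {
        Set<String> r = new HashSet<>(a);
        r.removeAll(b);
        return r;
    }

    // (X ∪ Y) \ Z, then add the directly assigned members.
    public static Set<String> composite(Set<String> x, Set<String> y,
                                        Set<String> z, Set<String> direct) {
        return union(minus(union(x, y), z), direct);
    }
}
```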
 

The features are there, and we have very good experience using this
approach. Yet we haven't documented it as a comprehensive methodology.
Maybe we can cooperate on that?

Definitely. I would heartily encourage you to join the TIER mailing lists and participate in the calls. While some content may be very higher-ed specific, there is definitely room for contribution there. The WG is starting to evaluate use cases and requirements for what a typical IDM system might look like (based on the collective experience and processes of institutions in the higher-ed space and elsewhere) and how it could fill the gap in what would be a comprehensive IAM package including Shibboleth, Grouper, etc. The only viable option discussed so far is a combination of COmanage and Grouper, and it would be nice to throw in midPoint and Syncope for further evaluation. Options are a very good thing :)
 
Appreciate the feedback, as always. 

--
-Misagh

Radovan Semancik

Mar 18, 2016, 1:44:50 PM
to identity-...@googlegroups.com
On 03/18/2016 11:18 AM, Misagh wrote:
> I am not sure I entirely follow you here. There needs to be adaptation
> regardless of who pushes and who pulls right?

Right. But in the current ConnId design the adaptation is always on the
IDM side. E.g. for iPlanet-style LDAP servers it will use cn=changelog
entries, for OpenLDAP/ApacheDS it will use syncrepl, and for RDBMS data
it will use an SQL search with a filter over the last-modification
column. And so on. So the system that is the data source is not changed
(except for allowing the IDM access to the data).

> if I am an app/SOR and you're pulling data from me, then you need to
> adapt to whatever I provide in terms of data schematics and endpoints
> and the semantics of all of that. And you most certainly are going to
> have to "say yes" to whatever I provide most of the times, because I
> as the app/SOR don't have the resources, time or the will to change
> myself to fit your needs. Sometimes there is no control at all because
> the SOR is a total black box. (I.e. "we have PeopleSoft, and it does
> X.") From a client standpoint, it should just work regardless of the
> technical gotchas, right? so I don't think it matters so much whether
> you're in an event-based system for pulls or some other transports.
> There always needs to be an agreement on what the data is, what it
> contains and what it means,

Yes. But the "agreement" is actually quite unilateral, i.e. we have to
adapt to whatever the application provides, and there is no room for
negotiation. E.g. if the application is AD with a broken LDAP schema
and insane <GUID=...> DNs, then our connector must be able to work with
that, even though it violates the LDAP standard. What else can we do?

> which means you're going to have come up with some sort of semantics
> and standards on "this is how we accept and pull data from X. Please
> adapt". Does that make sense at all?

Well, not exactly "we". It is "they" that come up with the semantics and
define the interfaces. It is Microsoft and SAP and Peoplesoft that
define the "standards" here. And the connector has to adapt.

The standards approach has been tried many times already. There were SPML1,
SPML2 and a couple of other protocols before them that everybody has
already forgotten about. That has failed completely. Now there is SCIM. But
it is not any closer to solving this problem. Even if SCIM were a perfect
protocol (which it is not) and even if all the IDM vendors adopted it
(which is unlikely), I do not think that Microsoft or SAP will be
providing data in SCIM format anytime soon. And there are always
homebrew applications for which SCIM and its likes are just too complex.

Writing a couple hundred lines of connector code on the IDM side is quite
easy. We have done it many times. And it works. The other option is to
go through complex negotiations about which
version/flavour/profile/interpretation of which standard all the
applications are required to support, when they will support it, how well
they will support it, who is going to test that, who is going to fix
interoperability issues and, most importantly, who is going to pay for
all the man-months spent modifying the applications.

> There are certainly no standards for this sort of thing. The TIER
> initiative has an API working group that is working on putting a
> framework and API around this sort of thing. Might be worth looking into.

Yes, it might be worth looking at. But honestly, having lived through all
this SPML and SCIM, I'm a bit skeptical about the
design-during-standardization approach. Designing a good provisioning
interface is a huge issue. It might look easy. But it is not. Not easy
at all. E.g. Waveset was one of the first companies in the IDM field.
They almost completely ruined their first iteration of the provisioning
API (called an "adapter" at that time). A few years later the next attempt
was made by Sun. The result was the Identity Connector Framework. But that
also failed. It was barely usable. The ConnId project had to fix a lot
of issues in the ICF framework to make it really work. And still many
issues remain: https://wiki.evolveum.com/display/midPoint/ICF+Issues ...
and this is a relatively simple Java API, not a network protocol. Creating
a working protocol would be much harder.

Therefore my approach is slightly different: start with working code.
Distill an API out of that. All the time make sure that the API works for
practical cases. Do it all in open source. That's what we have done with
ConnId. We started with ICF and improved it. But now we all know
that we are getting close to a dead end. The ICF has fundamental issues.
And sooner or later we will need to make a ConnId 2.0 that fixes these
problems. So, we want to build working code backed by many (real-world)
tests to make sure that it works. And create a protocol out of that.

The problem is funding, of course :-) ... but anyway. For now ConnId
1.x remains probably the best provisioning API that is available. It is
practical and it works. And I do not think that any protocol designed
from scratch by a committee can really provide much benefit. I would
really like to be proven wrong here :-) I like standards. But practical
experience in this field suggests that standards have limited
value here.

> As the IDM, when you're pushing data out to downstream systems for
> consumption, an event based approach seems to be all the rage.

Yes and no. The vast majority of applications have synchronous APIs for
user management. So the benefit of an inherently asynchronous event-based
approach is quite questionable here.

> Traditionally, I suppose folks have had native connectors for
> provisioning and these connectors hook onto the native APIs of the IDM
> to consume change events and process them.
> Connectors were native to the platform they were built for, as I
> understand your point: an LDAP one, a JDBC one, a PeopleSoft one, etc.
> To loosen the coupling, the IDM would instead send change events over
> a queue, and you'd have consumers listen on the queue, pick up events
> they care about, etc. and they generally abide by the transactional
> semantics of whatever the queue implementation is, to guarantee SLAs,
> etc. The IDM cares not who's on the queue and it's free to change
> internal APIs as long as the message syntax remains the same.

E.g. this is how Novell IDM did it. Theoretically it is very
elegant and robust. But in practice it is an absolute nightmare to debug.
If you have any non-trivial provisioning topology you are going to get
lost very quickly. It is very, very difficult to follow cause-and-effect
chains in any purely event-based system. It is difficult to replicate
issues due to message reordering. It is even quite hard to test it
properly. Therefore the benefits of the async approach are somewhat
outweighed by the fact that too many people are not able to properly
understand and configure an async system.

Therefore midPoint is slightly different. We are, philosophically, based
on events. We call them "deltas". But when processing a
delta we try to process the entire chain of events that are caused by
that delta. And we process it all using one algorithm in a single
process. This gives the system administrator quite a nice overview of what
the cause was, what logic was used to compute the effect, which
expressions were used, how the data got transformed and what the final
effect was. And we keep a firm link between the events (we call it the
"model context":
https://wiki.evolveum.com/display/midPoint/Model+Context). We can do
this because IDM is basically just a hub-and-spoke topology and we are the
hub. It is a good way to keep our sanity even in complicated scenarios
(e.g. scenarios like this:
https://wiki.evolveum.com/display/midPoint/OrgSync+Story+Test). We have
done some very complex configurations with midPoint. And this approach
has saved our backsides many, many times.
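The "process the whole chain in one pass" idea can be sketched like this (hypothetical attribute names and mapping table; this is the shape of the idea, not midPoint's actual model-context API):

```python
def process_delta(delta, mappings, trace):
    """Process one delta and the entire chain of effects it causes in a
    single synchronous pass, recording every step in 'trace' -- a rough
    analogue of keeping a firm link between cause and effect."""
    queue = [delta]
    effects = []
    while queue:
        d = queue.pop(0)
        trace.append(d)
        for target in mappings.get(d["attr"], []):
            effect = {"attr": target, "value": d["value"]}
            effects.append(effect)
            queue.append(effect)  # an effect may cause further effects
    return effects

# Hypothetical mappings: givenName drives displayName, which in turn
# drives mailNickname.
mappings = {"givenName": ["displayName"], "displayName": ["mailNickname"]}
trace = []
effects = process_delta({"attr": "givenName", "value": "Alice"},
                        mappings, trace)
```

Because everything runs in one process, the `trace` holds the full cause-and-effect chain for inspection, which is exactly what is hard to reconstruct in a purely message-based system.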

Which is practically almost the same approach as you describe. But the
interface is not a message format between connector and IDM, because
there are no real messages between connector and IDM. There is a
synchronous Java API (with some async aspects for application->IDM
synchronization). The connector is in fact part of the IDM itself. It is
Java code that runs in the same process as the IDM. Yes, it is pluggable
code. And the IDM-connector interface is well defined (this is the
ConnId framework). But there are no real messages. In our case the real
network interface is the application-connector interface. And this is the
interface which adapts to whatever the application provides (nice LDAP,
lame LDAP, SQL, SOAP, REST, even command-line over ssh, screen scraping
and similar abominations).

> There are certainly no standards for this sort of thing either! Guess
> you could say the native messaging syntax for Active MQ is "the
> standard" one might choose. But that's hardly useful.

That's absolutely right. It is not useful at all.

> Because while you can somewhat tighten the transport model (and hope
> it never changes when you upgrade the queue impl), you still need to
> come up with a framework on "so how are we going to send these
> messages across the queue? what headers, fields, attributes, etc?"

This is what ConnId does, except that there is no queue and it is a
synchronous Java API. But there is one important point: the provisioning
interface cannot have fixed attributes. Every application has a
different set of attributes. We have tried several times to create an
interface with a fixed attribute set because customers wanted that. All
such attempts failed in the end. What seemed like a benefit at the
beginning became an obstacle a couple of years down the road when the
systems changed.

So, our solution is that ConnId connectors are simply protocol
adapters. Abstracting from the details, what the LDAP connector does is
convert entries in LDAP messages into simple Java attribute-value maps.
And vice versa. It does not really understand the meaning of the
attributes (except for a couple of pre-defined attributes such as
password). It is the IDM itself that does most of the mapping and
adaptation logic, not the connector. And that makes perfect sense: the
logic is written and maintained in only one place. If connectors did
that, then the same logic would be repeated many, many times. And this
is no simple logic. It has to adapt to differences in data types, string
capitalization, data normalization and matching, format conversions,
etc. E.g. in midPoint the subsystem where this code resides has approx.
140K lines of code. Maintaining even a small part of that in several
places would be a nightmare.
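The "connector as a dumb protocol adapter" idea reduces, in caricature, to something like this (a Python sketch with made-up names; ConnId's real Java types are `ConnectorObject`, `Attribute` and friends, not shown here):

```python
# A few pre-defined attributes the framework does treat specially;
# this table is illustrative, not ConnId's actual reserved-name list.
SPECIAL = {"userPassword": "__PASSWORD__"}

def entry_to_attr_map(dn, ldap_attrs):
    """A connector as a pure protocol adapter: translate an LDAP entry
    into a plain attribute->values map without interpreting attribute
    meaning. Mapping and normalization logic stays in the IDM."""
    attrs = {SPECIAL.get(name, name): list(values)
             for name, values in ldap_attrs.items()}
    attrs["__UID__"] = [dn]  # identifier attribute, ConnId-style
    return attrs

entry = entry_to_attr_map(
    "uid=alice,ou=people,dc=example,dc=org",
    {"cn": ["Alice A."],
     "mail": ["alice@example.org"],
     "userPassword": ["secret"]},
)
```

Everything except the handful of special names passes through untouched, which is why the same connector can serve wildly different deployments: the per-deployment semantics live in the IDM's mappings.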

> Indeed. Which sort of brings me to my next question, (and going back
> to my original query on "how do various OSS components compare"): one
> of the key strengths of Grouper, as an example, is its ability to not
> only manage groups and members, etc but also allow advanced set math
> operations on them. Such that you could construct very fancy groups,
> that are composites, whose members are a combination of group X and
> group Y but not group Z and members 1-100 (who could come from any
> source), where X, Y and Z can all be composite groups with direct and
> indirect members, and so on. Turtles all the way down. Very powerful.
> Do the IDM systems of today that we know of, midPoint, Syncope, etc,
> support or provide similar functionality when they manage the org
> tree, roles, workflow processes, etc? or even groups if they do
> support that?

I do not think that any IDM system has a full set algebra for groups. In
fact, most IDM systems struggle with managing group membership at all
:-) I'm not aware of the capabilities of recent Syncope versions (2.x).
But midPoint can definitely do a good part of what Grouper does. For
midPoint a "group" is just a concrete application of the "org" concept
that we have in midPoint. Simply speaking, for us a group is yet another
type of organizational unit. As org units can be placed in any number of
directed acyclic graphs, we naturally have composition operations. With
turtles all the way down, of course. But we do not have subtraction
operations. Honestly, we try quite hard to avoid them. :-) Subtraction
complicates many things. If everything is just composition then
merging policies is quite easy (and groups are in fact just a special
case of a policy). If there are subtractions then there may be
conflicts. E.g. in midPoint the same org or role can be assigned to the
same user several times (e.g. with slight variations in parameters). If
there are subtractions, the evaluation order could make a difference.
This would need explicit priorities and operator/operation precedence,
and it would complicate an already complex system. Set operations also
introduce dependencies between groups. And we try quite hard for
roles/orgs not to depend on one another. That also simplifies evaluation.
And it opens up the way for more massive parallelism in the future. Yes,
these may be limitations. But so far we have been able to do everything
that we have needed by designing good structures, using just composition
and using conditional assignments if necessary. It works beautifully.

We might add at least some subtraction-like functionality in the future
if it proves really necessary. But there has to be a really good use case
for it, one that is worth complicating the system for.
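Why composition alone stays simple can be seen in a few lines (a generic sketch of union-only membership over an acyclic group graph; the group and user names are made up and this is not midPoint's org implementation):

```python
def members(group, children, direct):
    """Resolve membership by pure composition (set union) over an
    acyclic group graph -- turtles all the way down. With no subtraction
    operator, evaluation order never matters and merging two groups is
    always just a union, so there are no conflicts to arbitrate."""
    result = set(direct.get(group, ()))
    for child in children.get(group, ()):
        result |= members(child, children, direct)
    return result

# Composite group C contains groups X and Y; X itself contains Z.
children = {"C": ["X", "Y"], "X": ["Z"]}
direct = {"X": ["u1"], "Y": ["u2"], "Z": ["u3"]}
```

Adding a "but not group Z" operator would break the order-independence this relies on, which is the conflict and precedence problem described above.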

> Definitely. I would heartily encourage you to join the TIER mailing
> lists and participate in the calls. While some content may be very
> higher-ed specific, there is definitely room for contribution there.
> The WG is starting to evaluate use cases and requirements on what a
> typical IDM system might look like (based on the collective experience
> and processes of institutions in the higher-ed space and others) and
> how they could fill the gap in what would be a comprehensive IAM
> package that would include Shibboleth, Grouper, etc. The only viable
> options discussed are a combination of COmanage and Grouper, and it
> would be nice to throw in midPoint and Syncope for further evaluation.
> Options are a very good thing :)

That's good advice. Thanks. I will try to find the time.

Keith Hazelton

unread,
Apr 11, 2016, 8:06:15 AM4/11/16
to Identity Ecosystem
This is a great thread, and I couldn't resist jumping in.  Misagh pointed to the TIER API work, and I'm glad he did.  I'm chairing that group, and I'd be happy to be one point of information and liaison with the Identity Ecosystem group. It would be even better if some members of the Identity Ecosystem could find time to join the TIER Data Structures and APIs Working Group and/or the TIER Entity Registry Working Group. One of our weekly meetings is on Fridays at 10 am Eastern (US) time, 3 pm UTC.  The wiki home page for the WG is https://spaces.internet2.edu/display/DSAWG/TIER-Data+Structures+and+APIs+Working+Group+Home 

I'm also quite interested in comparisons of open source entity registry alternatives.

         Regards,   --Keith Hazelton
___________________

Igor Farinic

unread,
May 3, 2016, 10:45:57 AM5/3/16
to Identity Ecosystem
Hi Keith,

thank you for the invitation. I have joined the tier-api mailing list and plan to join the calls occasionally.

In our spare time we are actively working with Grouper, trying to come up with an integration with midPoint identity management to bring value to the Grouper community.
We plan to write some text about it to get feedback on whether we are moving in the right direction.

If we are able to come up with something in the next few days, we are still considering attending the Internet2 summit or the Open Apereo conference.

regards,
Igor Farinic