Discussion: Network Node Filter Description Data Model


Jim Klo

Apr 30, 2013, 2:55:58 PM
to learnin...@googlegroups.com, <learning-registry-collaborate@googlegroups.com>
Greetings,

I've created a custom enhancement for the publish service, which I call "validated publish"; it's relatively easy to extend.  A code release for this is here:


And documentation about the enhancement is here:


As such I'd like to integrate the functionality as a "filter" as described in the spec: 

A bit of discussion I'd like to stir up is around the interpretation of this sentence:

"The data model describing a node filter; one document per node. Filters are used to restrict the resource data that is held at a node. Once the data model has been instantiated for a filter, the value of an immutable element SHALL NOT change. Other values MAY be changed only by the owner of the node document."

Do people believe that this means: 
a) 1 filter per node
b) 1 filter description document per filter implementation per node; multiple filters/filter description documents allowed per node

I'd like to interpret this as b) (and would like to clarify the language in the spec accordingly).  But can others make the case for why a) or b) would be better than the other?

My initial thought is that a node should be able to apply as many filters as it likes to the content it accepts or distributes.  I can see how, under a), one could build a filter aggregator to support multiple filters, but I think b) is easier for most people to understand.
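
For concreteness, here's a rough sketch of what a single filter description document might look like under interpretation b).  None of the field names below come from the spec; they're purely illustrative:

    # Illustrative only: a guessed shape for one filter description
    # document under interpretation b). No field name here is from the spec.
    filter_description = {
        "doc_type": "filter_description",    # document type marker
        "filter_id": "lrmi-validated",       # one doc per filter implementation
        "mode": "inclusive",                 # accept-on-match vs. reject-on-match
        "immutable": ["filter_id", "mode"],  # elements that SHALL NOT change
        "rules": [
            {"field": "payload_schema", "matches": "LRMI.*"},
        ],
    }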

Your thoughts? Please chime in…

Thanks,

- Jim


Jim Klo
Senior Software Engineer
Center for Software Engineering
SRI International
t. @nsomnac

Marie Bienkowski

Apr 30, 2013, 4:01:53 PM
to <learningreg-dev@googlegroups.com>, <learning-registry-collaborate@googlegroups.com>
Nice job on the Validated publish.

Doesn't your schema service API assume that one node will have multiple filter options? And the Filter Plugin option? It's as if you are tipping the balance toward b).

I don't see any compelling reason for a). With b), can you filter with a boolean AND? (e.g., from the Smithsonian AND Schema.org-compliant?)

If there were an easy, obvious, and documented way to create a compound filter that gets applied to every publish (which would make a) plausible), then I'd still stick with b).

But one advantage to a) is that it makes the ORDER of filters more obvious: are all filters associative?

Those are my thoughts.

Marie


Steve Midgley

Apr 30, 2013, 4:29:11 PM
to <learning-registry-collaborate@googlegroups.com>, learnin...@googlegroups.com
Hey Jim,

I have never understood filters to be limited to one per node. Also, that "shall not change" part confuses me - does this mean once you install a filter you can't change it? 

On your question specifically, it seems unhelpful to force all your filters into one filter document. That said, if you have multiple filter documents, debugging why a filter isn't working could be tricky because you might be letting the data in via another filter document.

On Marie's point, I think if you have two filter docs that's effectively an OR between the two filter definitions. Within a single filter you have the possibility of combining both ANDs and ORs, I think?

Steve

Jim Klo

Apr 30, 2013, 4:33:38 PM
to <learningreg-dev@googlegroups.com>, <learning-registry-collaborate@googlegroups.com>



On Apr 30, 2013, at 1:01 PM, Marie Bienkowski <marie.bi...@sri.com> wrote:

Nice job on the Validated publish.

Doesn't your schema service API assume that one node will have multiple filter options? And the Filter Plugin option? It's as if you are tipping the balance toward b).


Yes, I'm advocating for b.


I don't see any compelling reason for a). With b), can you filter with a boolean AND? (e.g., from the Smithsonian AND Schema.org-compliant?)


Part of this depends upon how you define what a filter is… accepting data based upon characteristics the data has (inclusive) OR rejecting data based upon characteristics that are missing (exclusive)…

i.e., accepting a document because it has a title vs. rejecting a document because it doesn't fully conform to some schema.
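
In rough Python, purely to illustrate the distinction (this is mine, not anything from the spec; conforms_to_schema() below is a stand-in for a real validator):

    # Both functions answer "keep this doc?", assuming the resource data
    # document has been parsed into a dict.

    def conforms_to_schema(doc):
        # Stand-in for a real validator (e.g. a JSON Schema check against
        # an LRMI profile); here we just require a couple of fields.
        payload = doc.get("resource_data", {})
        return "title" in payload and "url" in payload

    def inclusive_filter(doc):
        # Inclusive: default deny; accept because the doc has a title.
        return bool(doc.get("resource_data", {}).get("title"))

    def exclusive_filter(doc):
        # Exclusive: default accept; reject because the doc doesn't
        # fully conform to some schema.
        return conforms_to_schema(doc)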



If there were an easy, obvious, and documented way to create a compound filter that gets applied to every publish (which would make a) plausible), then I'd still stick with b).

But one advantage to a) is that it makes the ORDER of filters more obvious: are all filters associative?


True, but should order matter?  I'm assuming each filter is idempotent and not influenced by the results of other filters.  In that case filters can be executed in any order until some filter accepts or rejects.
The spec really doesn't define, AFAIK, whether they are to be inclusive or exclusive.  I think exclusive is probably more restrictive.

Jim Klo

Apr 30, 2013, 4:51:00 PM
to <learning-registry-collaborate@googlegroups.com>, learnin...@googlegroups.com
Thanks Steve,

comments below:

On Apr 30, 2013, at 1:29 PM, Steve Midgley <steve....@mixrun.com> wrote:

Hey Jim,

I have never understood filters to be limited to one per node.


Hence my "B" interpretation.


Also, that "shall not change" part confuses me - does this mean once you install a filter you can't change it? 


There's language like this all over the spec… technically you can just say any change results in a 'new node'… which is just terrible terminology, IMO.


On your question specifically, it seems unhelpful to force all your filters into one filter document. That said, if you have multiple filter documents, debugging why a filter isn't working could be tricky because you might be letting the data in via another filter document.

Let's take debugging vs production out of the equation. Debugging a filter can always be handled in isolation.

It really comes down to whether filters are intended to be inclusive or exclusive.  Inclusive says "I have to explicitly say yes to keep that doc"… Exclusive says "I have to explicitly deny that doc".

Possibly the filter definition doc needs to describe whether it's inclusive or exclusive?  Then multiple filters can be applied in a predictable manner.
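
Something like this sketch is what I have in mind (the sketch is entirely mine; nothing like it is in the spec): each filter declares its mode, any exclusive match denies the doc, and the inclusive filters, if any exist, must admit it.

    # Apply a set of declared-mode filters predictably. Each filter is a
    # (mode, predicate) pair; predicate(doc) returns True on a match.

    def accept(doc, filters):
        exclusive = [p for mode, p in filters if mode == "exclusive"]
        inclusive = [p for mode, p in filters if mode == "inclusive"]

        if any(p(doc) for p in exclusive):   # any blacklist hit denies
            return False
        if inclusive:                        # whitelists, if present, must admit
            return any(p(doc) for p in inclusive)
        return True                          # no whitelists: default accept

Since each predicate looks only at the document itself, the evaluation order doesn't change the result, which I think also answers Marie's question about ordering.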


On Marie's point, I think if you have two filter docs that's effectively an OR between the two filter definitions. Within a single filter you have the possibility of combining both ANDs and ORs, I think?

Refer to my last statement.



Steve

Marie Bienkowski

Apr 30, 2013, 4:51:03 PM
to <learning-registry-collaborate@googlegroups.com>, learnin...@googlegroups.com
I THINK THAT SHALL NOT CHANGE (oops, had caps lock on) means that once you set a filter you can't change it, because then the node will have a mix of data.

Marie




Jim Klo

Apr 30, 2013, 4:55:55 PM
to <learningreg-dev@googlegroups.com>, <learning-registry-collaborate@googlegroups.com>
Oh… yelling are we? :-)

Good point… changing filters would require all documents to be refiltered - which shouldn't be an unreasonable thing to perform.
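
As a rough sketch of the shape of that pass (datastore here is a hypothetical local document store; accept() is the combined evaluation sketched earlier in the thread):

    # One pass over the local store, dropping docs the current filters
    # reject. datastore is hypothetical; doc_ID is the envelope identifier.

    def refilter(datastore, filters):
        removed = 0
        for doc in datastore.all_documents():
            if not accept(doc, filters):
                datastore.delete(doc["doc_ID"])  # subject to local deletion policy
                removed += 1
        return removed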

Jim Klo
Senior Software Engineer
Center for Software Engineering
SRI International


Grata, Walt

Apr 30, 2013, 4:57:52 PM
to learning-regis...@googlegroups.com, <learningreg-dev@googlegroups.com>
With refiltering, removing documents that are no longer valid is doable, but getting back documents that were filtered out in previous replication would be much more difficult.

Joshua Marks

Apr 30, 2013, 5:06:24 PM
to learning-regis...@googlegroups.com, learnin...@googlegroups.com

Jim,

 

I see value in defining different filter/validation class profiles as documents that can be shared and installed (or updated and removed) on a given node. For example, a state might publish a policy document/filter about how to correctly align to the official state standards, and call that an acceptable and specific form of LRMI record. So you might want to check for and accept multiple LRMI- or DC-based profiles and vocabularies. Debugging will have to consider permutations, as Steve points out. Similarly, different standards for paradata assertions might be propagated, such as correct application of the OER rubric or some common rating scheme. Having an 'and/or' logic option on the aggregate filter seems a good idea.

 

Joshua Marks

CTO

Curriki: The Global Education and Learning Community

jma...@curriki.org

www.curriki.org

 

I welcome you to become a member of the Curriki community, to follow us on Twitter, and to say hello in our blog, Facebook, and LinkedIn communities.


Jim Klo

Apr 30, 2013, 5:10:47 PM
to <learning-registry-collaborate@googlegroups.com>, <learningreg-dev@googlegroups.com>
On Apr 30, 2013, at 1:57 PM, "Grata, Walt" <walt.gr...@adlnet.gov> wrote:

With refiltering, removing documents that are no longer valid is doable, but getting back documents that were filtered out in previous replication would be much more difficult.


Yet another excellent point. For those of you keeping score, we're at +2 for why filters can't change.

However, one could say it doesn't matter… it would just mean the existing data set would have to be invalidated, and moving forward new data could take its place.

Jim Klo

Apr 30, 2013, 5:34:31 PM
to <learning-registry-collaborate@googlegroups.com>, <learningreg-dev@googlegroups.com>
Valid points!  Here's a follow-on.  The key here is that a service provider connecting to a node would need to use a data service or other mechanism to get at only the data they want.

Using NSDL as the poster child: they recently mentioned that their process at some point included slicing the data crudely, then inspecting and validating each document on the receiving end of a harvest.  That would be the way for an integrator/state to get data from this node without setting up their own node.  The other option would be standing up your own node with only the filter policy you want; then your node will only contain that data - no extra filtering required.

It seems to me that if filters are inclusive, then adding additional filters should be okay, since it's letting more data in.  With regard to a deletion policy where "first one wins", I think changing filters on a node would be okay - but only if they are inclusive.  The larger challenge, I think (which is more technical than anything else), is having distribution partners resubmit historical data so you can revalidate against your filters.  I'm wondering if this even makes practical sense.

With exclusive filters the story changes - you might be making a filter that keeps data out; presumably it's the same problem as above.  Does it make sense to try to re-filter historical data, or do you just 'start fresh'?  To me, wiping out the contents of your DBs might be the safest approach - but then again, if you're changing filters to restrict more data… that means the data you already have is inappropriate and subject to your local deletion policies, right?


Steve Midgley

Apr 30, 2013, 6:16:22 PM
to <learning-registry-collaborate@googlegroups.com>, <learningreg-dev@googlegroups.com>
To me, I've always thought of filters as gateways, like iptables filters (not to go all network on a problem again). You set them; while they are in operation, they work against matching traffic. When you change them, they work against the new rules. So changing a filter is no big deal, since you're just expressing what you don't want coming into your network *at that time* -- if you let some matching traffic through in the past, that's okay (or in our case you can eliminate that traffic from the local datastore).

I can't see why it's a problem to have filters that change over time, so long as people understand what they're getting (and not getting). Adjusting a filter means that you are adjusting your rules *for the future*, not for the past. It's not like a SQL query; it's like an iptables definition. If you want to adjust your definitions for the past, you should zero out your database and do a brand-new filtered replication from an LR node which is definitive?

On the inclusive and exclusive question, I think it's pretty traditional in the iptables view of filters to have both kinds: "throw away everything that matches" and "only allow everything that matches." Having both types of filters seems really powerful. So that would suggest that having multiple filters would be really great -- at minimum I could have my blacklist filter (exclusive) and a whitelist filter (inclusive), which together would let me describe the kind of content I want.

Most likely the inclusive would be used to restrict to certain kinds of publishers (e.g., only accept data from a list of trusted providers). The exclusive would be used to filter out certain kinds of traffic (e.g., don't take in paradata payloads).
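
Concretely, reusing the mode/predicate idea from earlier in the thread (a sketch only; the field paths are illustrative guesses, not from the spec):

    # Whitelist trusted providers (inclusive), blacklist paradata
    # payloads (exclusive). Field paths are illustrative.
    TRUSTED = {"Smithsonian", "NSDL"}

    filters = [
        ("inclusive", lambda doc: doc.get("identity", {}).get("submitter") in TRUSTED),
        ("exclusive", lambda doc: "paradata" in doc.get("payload_schema", [])),
    ]

    # keep = accept(doc, filters)  # combined evaluation from earlier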

So I'm advocating for:

1) Nodes can have multiple filters
2) Each filter can be inclusive or exclusive (but a single filter can't be both)
3) Filters can be changed freely, and changes operate only on a go-forward basis
3a) Would be nice if filters could be applied to local datasets to expunge things, when a filter is changed
3a1) N.B. Filter changes only expunge things; they don't re-pull old data from replication partners

Thoughts?
Steve
