sub-filter

6 views
Skip to first unread message

Yves Savourel

unread,
Feb 28, 2012, 11:10:31 AM2/28/12
to okapi...@googlegroups.com
Some aspects we have not discussed yet about sub-filter:

- How do we specify how the enclosing events are generated? For example, in some case we may want the content parsed by the sub-filter to be in a sub-document, in other case in a group. Note that if it's sub-document that eliminate some filters to be sub-filters as they have themselves sub-documents and those cannot be nested.

Is the caller responsible for converting the sub-filter start/end-doc into start/end-subdoc or start/end-group? or the sub-filter itself?

- Can we provide some kind of parameter to indicate what the sub-filtered content is, so its sub-doc or group type can be set?

-ys

Jimbo

unread,
Feb 28, 2012, 11:51:34 AM2/28/12
to okapi...@googlegroups.com, Yves Savourel
On 02/28/2012 09:10 AM, Yves Savourel wrote:
> Some aspects we have not discussed yet about sub-filter:
>
> - How do we specify how the enclosing events are generated? For example, in some case we may want the content parsed by the sub-filter to be in a sub-document, in other case in a group. Note that if it's sub-document that eliminate some filters to be sub-filters as they have themselves sub-documents and those cannot be nested.

I would like to understand this better - I remember our debates left it
unclear the difference between a true subdocument, subfilter start and
group. We will need to revisit this.

I think we made a good case that subdocument was to be used for a true
embedded document - not change of filter. But the reasoning is old and I
don't fully recall the details.

>
> Is the caller responsible for converting the sub-filter start/end-doc into start/end-subdoc or start/end-group? or the sub-filter itself?

The subfilter should do it - it should be seamless to the callers. I'm
thinking on this now and will have html filter as a prototype of how
this can work. I still think AbstractFilter could hold much of this
logic and all filters should implement it - another option is to
delegate this code to a helper class (composition vs inheritance)


>
> - Can we provide some kind of parameter to indicate what the sub-filtered content is, so its sub-doc or group type can be set?

I am putting all these types of parameters in the FilterState class - we
will need to establish sane defaults for the methods that don't take
this parameter.

>
> -ys
>

Yves Savourel

unread,
Feb 28, 2012, 11:58:40 AM2/28/12
to okapi...@googlegroups.com
> I think we made a good case that subdocument was
> to be used for a true embedded document - not change of
> filter. But the reasoning is old and I don't fully recall
> the details.

Yes, I think we did.
So group would be the default.
But I was wondering if some sub-filters are not used for sub-document. Transifex may be such a case.


> The subfilter should do it - it should be seamless to
> the callers. I'm thinking on this now and will have html filter
> as a prototype of how this can work.

Good. At some point we should move our initial Drupal code to the subfilter branch.
Does it compile?


> I am putting all these types of parameters in the
> FilterState class - we will need to establish sane defaults
> for the methods that don't take this parameter.

Good.

-ys


Jimbo

unread,
Feb 28, 2012, 12:02:48 PM2/28/12
to okapi...@googlegroups.com, Yves Savourel
Doesn't compile yet - but little is broken since we went with the
separate ISubFilter interface.

I would wait a little bit longer - once I finish the html filter I think
the interfaces/classes will be more stable.

You can continue to develop the drupal filter using IFIlter only - then
add the ISubfilter interface later - I think the subfilter changes will
not be destructive to your current work - at least that is the plan.

Jim

Jim Hargrave

unread,
Feb 29, 2012, 6:03:47 PM2/29/12
to okapi...@googlegroups.com
If subfiltered formats can be large we will need a better mechanism for delegating subfiltered events to the parent filter's next/hasNext methods. Currently in AbstractMarkupFIlter I buffer all the events in memory then return control to the parent filter - I don't think this is generally the best design.

In the GS system *everything* is buffered so this isn't a problem. 

Let me give this some thought - ideally I would like to solve this once and have all filters use it. But this may be difficult given the range of implementations. 

Jim

Jim Hargrave

unread,
Feb 29, 2012, 6:50:14 PM2/29/12
to okapi...@googlegroups.com
I suggest all filters have a "IFilter currentFilter"  - this is where you pull events. This variable can either be set to a subfilter or main filter (aka "this")

This also means IFilter must implement ISubfilter to keep things simple otherwise you end up with a lot of casting. Those filters that can't be used as a subfilter will just throw NotImplementedException for each method.

This could be done by default in AbstractFilter which will cover ours and Sergeis filters.

Jim

Jimbo

unread,
Mar 1, 2012, 7:51:29 PM3/1/12
to okapi...@googlegroups.com
I have reduced the clutter down to two IFilter methods. Using a separate ISubfilter interface became too much trouble. ISubfilterPostProcessor takes care of any transformations on the events needed by the subfilter - for example removing Start/End document events, updating TextUnit names etc.. There is a DefaultSubfilterPostProcessor class for reference.

FilterState now holds a number of things useful for the subfilter post processsing.

Latest code is checked into the subfilter branch. Comments welcome.

/**
     * Opens the input document as a subfilter.
     *
     * @param input The {@link RawDocument} object to use to open the document.
     * @param generateSkeleton generate skeleton?
     * @param postProcessor {@link Event} post-processor for this filter.
     */
    public void openAsSubfilter(RawDocument input, boolean generateSkeleton, ISubfilterPostProcessor postProcessor);
   
    /**
     * Is this filter currently being used as a subfilter?
     */
    public boolean isSubfilter();

Yves Savourel

unread,
Mar 1, 2012, 9:03:21 PM3/1/12
to okapi...@googlegroups.com

Hi Jim,

 

Looks ok to me.

A few notes:

 

-   It shouldn’t be too bad to implement the code to all filters. As stubs first in the worst case.

 

-   I’m looking forward to see how this will work for merging :) I suppose the start/end group event generated by the converter will need to be detected and converted back to start/end document. In that case I don’t think you can really just convert start-doc to start-group and expect to be able to convert back start-group to start-doc. Maybe the solution is to store the original start/end-doc along with the generated start/end-group event and simply switch them back?

 

-   I noticed something that looks strange to me, but is probably just Git doing its thing: The master branch as now the Drupal filter, I’ve switched to the subfilter branch and got your changes, and somehow Drupal and the other changes I made to the master are also there, but not as changes. They seem to be already merged into subfilter.

 

Cheers,

-ys

Jim Hargrave

unread,
Mar 1, 2012, 9:28:12 PM3/1/12
to okapi...@googlegroups.com, Yves Savourel
On 03/01/2012 07:03 PM, Yves Savourel wrote:

Hi Jim,

 

Looks ok to me.

A few notes:

 

-   It shouldn’t be too bad to implement the code to all filters. As stubs first in the worst case.


two methods is much better than what we originally had and I already put a default implementation in AbstractFilter (throws OkapiNotImplemented exception).

 

-   I’m looking forward to see how this will work for merging :) I suppose the start/end group event generated by the converter will need to be detected and converted back to start/end document. In that case I don’t think you can really just convert start-doc to start-group and expect to be able to convert back start-group to start-doc. Maybe the solution is to store the original start/end-doc along with the generated start/end-group event and simply switch them back?


I thought we had already solved this problem at least - the current implementation (in master) converts startdocument to group and merges back properly - its just the enocoding that we need to fix. But I'm probably missing something.


 

-   I noticed something that looks strange to me, but is probably just Git doing its thing: The master branch as now the Drupal filter, I’ve switched to the subfilter branch and got your changes, and somehow Drupal and the other changes I made to the master are also there, but not as changes. They seem to be already merged into subfilter.


I periodically merge master into subfilter so the branches don't diverge too far - will make the final  subfilter to master merge easier.

 

Cheers,

-ys


Yves Savourel

unread,
Mar 1, 2012, 10:25:10 PM3/1/12
to okapi...@googlegroups.com
> I periodically merge master into subfilter so
> the branches don't diverge too far - will make
> the final subfilter to master merge easier.

Ah, that explains... :)

-ys

Jimbo

unread,
Mar 2, 2012, 1:39:26 PM3/2/12
to okapi...@googlegroups.com
I like the adater approach much better - only *one* new method for
IFilter (isSubfilter). There is AbstractSubFilterAdapter class that
implements IFilter, just pass in a base filter and a FilterState,
override the methods you need and you are off to the races. The abstract
implementation does most of the work for you. Most of the time you just
override the next/hasNext methods to have a fully functionaling subfilter.

I nuked the old subfilter branch and created a fresh one off of master -
let me know of any git problems. Changes are checked in.

We need to decide once and for all what events/resources we want to
send. Group with a subfilter annotation is a bad design IMHO. Why not
have a new events (startsubfilter/endsubfilter)? It would mean more
changes but cleaner and less confusion in the future - we can also
create a new resource that carries any extra info needed by the merger.

Jim


Yves Savourel

unread,
Mar 2, 2012, 2:04:29 PM3/2/12
to okapi...@googlegroups.com
> We need to decide once and for all what events/resources
> we want to send. Group with a subfilter annotation is a
> bad design IMHO. Why not have a new events (startsubfilter/endsubfilter)?
> It would mean more changes but cleaner and less confusion in the future
> - we can also create a new resource that carries any extra info needed
> by the merger.

Would this new events replace start/end-document or just precede them?

-ys

Jim Hargrave

unread,
Mar 2, 2012, 2:12:45 PM3/2/12
to okapi...@googlegroups.com
I would think replace - as its not the start of a true document - only a new format within a document.  

Jim

Yves Savourel

unread,
Mar 2, 2012, 2:45:19 PM3/2/12
to okapi...@googlegroups.com

The downside of using new events would be that then every receiving components (steps, writer, etc.) would have to decide how to handle the sub-filter start/end. If you want to handle it as a group for example, you would have to do this everywhere. But some component may do that while other may not.

 

To me using a specific event is just a way to delegate the problem downstream. And I’m not sure that’s right.

 

Ideally given the same input, the structure that comes out of a filter should always be the same. A start/end-subfilter would introduce a new thing the components wouldn’t have a clear idea how to handle. I suppose one could establish the rule that a start/end-subfilter is to be treated like a start/end-group except for the components that need to do something specific (like the XLIFF writer). But then it means it’s really a group, why not make this some normal part of the group?

 

Using a start/end-subfilter event would also open the door to not treat the events as start/end-group, and that may be dangerous: the filter should decide what the structure of the resources is, not the components after.

 

Just thinking aloud…

-ys

 

 

 

From: okapi...@googlegroups.com [mailto:okapi...@googlegroups.com] On Behalf Of Jim Hargrave


Sent: Friday, March 02, 2012 12:13 PM
To: okapi...@googlegroups.com

Jim Hargrave

unread,
Mar 2, 2012, 2:59:38 PM3/2/12
to okapi...@googlegroups.com
The downside of new events is the extra complexity (default handler methods in BasePipelineStep could make this invisible to most components).  But let's say we stayed with using Group with an annotation as we do now - we would still have to overload the Group resource with all kinds of junk that normal groups don't need. Conflating things usually causes confusion down the road. But good documentation can help.

Using group would mean less refactor - I'm game as long as we come up with a good way to pass down the needed resource info without too many hacks.

Also, is using an annotation the best way to distinguish groups as subfilter events? We could define a sub-class of group (using the same event type) - callers could test the type to see if its a subfilter group. This sublass could also safely hold a new resource type.

type check vs annotation check?  type check seems cleaner to me with the added benefit of a new resource for free.

Jim

Jim

Yves Savourel

unread,
Mar 2, 2012, 3:07:05 PM3/2/12
to okapi...@googlegroups.com
> Also, is using an annotation the best way to distinguish
> groups as subfilter events?

Agreed: Probably not.


> We could define a sub-class of group (using the same event type)
> - callers could test the type to see if its a subfilter group.
> This sublass could also safely hold a new resource type.

Sounds like a good idea. This could be a clean way to extend the group.
Only the components that care about dealing with filter switches (mostly filter-writers) would check the type.

-ys


Reply all
Reply to author
Forward
0 new messages