Writing features efficiently


jericks

Sep 22, 2013, 11:43:09 PM
to geos...@googlegroups.com
Hi all,

A few recent discussions on the GeoTools mailing list [1] and on Stack Overflow [2] got me thinking about how GeoScript handles writing large numbers of Features. For reading large datasets we have geoscript.layer.Cursor, which works very well. For writing we just have Layer.add, which doesn't offer much control over transactions and batch sizes. So I put together a couple of gists, [3] and [4], that try to handle writing large numbers of features.

The two are pretty similar: both try to write features in batches instead of one at a time, and both by default pick a transaction depending on whether the features come from a Shapefile (Transaction.AUTO_COMMIT) or anything else (DefaultTransaction). The major difference is that [4] wraps a GeoTools FeatureWriter while [3] just uses FeatureStore.addFeatures().

Let me know what you think.

Thanks,
Jared




Justin Deoliveira

Sep 30, 2013, 10:41:39 AM
to geos...@googlegroups.com
Hey Jared,

Indeed this is something that I have run up against as well. It would be great to address this.

Trying to wrap my mind around what this might look like API-wise. Do you mind coming up with a few examples of API usage?

Thanks!

-Justin



--
Justin Deoliveira
Vice President, Engineering | Boundless
jdeo...@boundlessgeo.com
@j_deolive

jericks

Sep 30, 2013, 11:03:01 PM
to geos...@googlegroups.com
Hi Justin!

Here is what I have so far:


I am no expert, but I am trying to do the following:

1. Write features in batches.  The batch size is configurable but defaults to 1000.

2. Use a transaction, and by default pick the kind best suited to the layer. For shapefiles this means a null transaction, for Property and Memory layers an auto_commit transaction, and for everything else a DefaultTransaction.

API-wise, here is a basic example:

def writer = new Writer(layer, batch: 500)
try {
    pts.eachWithIndex { Point pt, int i ->
        Feature f = writer.newFeature
        f.geom = pt
        f['id'] = i
        writer.add(f)
    }
} finally {
    writer.close()
}

Basically, you create a geoscript.layer.Writer with the Layer you want to add Features to. Then you start adding Features. If you add more Features than the batch size, the Features are committed in a single Transaction. You must always close the Writer at the end, which commits any remaining Features.
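To make the batching behavior concrete, here is a minimal, self-contained sketch with no GeoScript or GeoTools dependency. BatchingWriter is a hypothetical stand-in, not the class from the gists: it buffers items, hands each full batch to a commit closure, and flushes the remainder on close(). Flushing as soon as the buffer reaches the batch size is an assumption about how the real Writer decides when to commit.

```groovy
// Hypothetical stand-in for the proposed Writer (not the gist implementation).
// It buffers items and passes them to a commit closure one batch at a time.
class BatchingWriter {
    final int batch
    final Closure commit            // called with each full batch and the final partial one
    private List pending = []

    BatchingWriter(Map options = [:], Closure commit) {
        this.batch = options.batch ?: 1000   // default batch size of 1000, as described above
        this.commit = commit
    }

    void add(Object item) {
        pending << item
        // Assumption: commit as soon as the buffer reaches the batch size.
        if (pending.size() >= batch) {
            flush()
        }
    }

    void close() {
        // Commit whatever is left over from the last incomplete batch.
        if (pending) {
            flush()
        }
    }

    private void flush() {
        commit.call(pending)
        pending = []
    }
}

def commits = []                    // records the size of every committed batch
def record = { List items -> commits << items.size() }

def writer = new BatchingWriter([batch: 500], record)
try {
    (1..1200).each { writer.add(it) }
} finally {
    writer.close()
}
println commits                     // prints [500, 500, 200]
```

With 1200 items and a batch size of 500, two full batches are committed during add() and the remaining 200 only when close() runs, which is why skipping close() would lose data.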

I was also thinking about adding getWriter() and withWriter(Closure c) methods to the Layer class. The withWriter method is nice because you don't have to wrap your code in a try/finally block; withWriter does it for you. Here is an example:

layer.withWriter { Writer w ->
    pts.eachWithIndex { Point pt, int i ->
        Feature f = w.newFeature
        f.geom = pt
        f['id'] = i
        w.add(f)
    }
}

and here is the implementation:

void withWriter(Map options = [:], Closure c) {
    Writer w = new Writer(options, this)
    try {
        c.call(w)
    } finally {
        w.close()
    }
}
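As a rough illustration of why the try/finally matters, the sketch below applies the same pattern to a hypothetical RecordingWriter that only tracks what was added and whether close() ran. Neither RecordingWriter nor this standalone withWriter function is part of GeoScript; they are stand-ins so the example runs on its own.

```groovy
// Hypothetical stand-in for the proposed Writer; it only records calls.
class RecordingWriter {
    List added = []
    boolean closed = false
    void add(Object item) { added << item }
    void close() { closed = true }
}

// Same shape as the withWriter method above, written as a script-level
// function that takes the writer as a parameter instead of creating it.
void withWriter(RecordingWriter w, Closure c) {
    try {
        c.call(w)
    } finally {
        w.close()
    }
}

// Normal case: the closure finishes and the writer is closed afterwards.
def ok = new RecordingWriter()
withWriter(ok) { w ->
    w.add('a')
    w.add('b')
}

// Failure case: the closure throws, but close() still runs before the
// exception reaches the caller.
def failing = new RecordingWriter()
String message = null
try {
    withWriter(failing) { w ->
        w.add('a')
        throw new IllegalStateException('bad feature')
    }
} catch (IllegalStateException e) {
    message = e.message
}

println "ok closed: ${ok.closed}, failing closed: ${failing.closed}"
```

The exception is not swallowed, so the caller still sees the failure, but the writer gets a chance to release its resources either way.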

Closures are one of my favorite Groovy features.  I hope this helps.  

Thanks,
Jared