Rhino ETL OutOfMemoryException During Simple Read Operation of 4M Rows

nabils

Jul 10, 2010, 12:50:07 PM
to Rhino Tools Dev
I get the OutOfMemoryException below when performing a simple read of 4M
rows in a ConventionInputCommandOperation. I am not performing any logic
or storing the rows anywhere; the operation just streams the data. I have
4 GB of memory. It seems to be fine with around 2M rows. (See the full
code below.)

Thanks
Nabil

System.OutOfMemoryException occurred
  Message=Exception of type 'System.OutOfMemoryException' was thrown.
  Source=mscorlib
  StackTrace:
    at System.Collections.Hashtable.rehash(Int32 newsize)
    at System.Collections.Hashtable.expand()
    at System.Collections.Hashtable.Insert(Object key, Object nvalue, Boolean add)
    at System.Collections.Hashtable.set_Item(Object key, Object value)
    at Rhino.Etl.Core.QuackingDictionary.set_Item(String key, Object value) in c:\Dev\ayende-rhino-etl-88104d7\Rhino.Etl.Core\QuackingDictionary.cs:line 70
  InnerException:

The exception is thrown in this indexer (QuackingDictionary.cs):

public object this[string key]
{
    get
    {
        if (throwOnMissing && items.Contains(key) == false)
            throw new MissingKeyException(key);
        lastAccess = key;
        return items[key];
    }
    set
    {
        lastAccess = key;
        // Database NULLs are stored as plain null.
        if (value == DBNull.Value)
            items[key] = null;
        else
            items[key] = value;
    }
}


Full code:

using System.Data;
using Rhino.Etl.Core;
using Rhino.Etl.Core.ConventionOperations;
using Rhino.Etl.Core.Pipelines;

public class TestProcess : EtlProcess
{
    protected override void Initialize()
    {
        PipelineExecuter = new SingleThreadedPipelineExecuter();
        Register(new BigTableRead());
    }
}

public class BigTableRead : ConventionInputCommandOperation
{
    // The constructor name must match the class name.
    public BigTableRead()
        : base("Db")
    {
    }

    protected override void PrepareCommand(IDbCommand cmd)
    {
        cmd.CommandText = @"SELECT * FROM VeryBigTable";
    }
}

Nathan Palmer

Jul 12, 2010, 12:44:15 PM
to rhino-t...@googlegroups.com
The problem is with the PipelineExecuter. You are currently using the
SingleThreadedPipelineExecuter:

PipelineExecuter = new SingleThreadedPipelineExecuter();

That executer caches rows internally to speed up other operations
within the pipeline, so the rows it reads are held in memory instead of
being released as they stream through. You can try the
ThreadPoolPipelineExecuter instead, but you may run into a similar
problem, with too many threads holding onto memory at the same time.

I have a very simple non-caching pipeline executor that we use when
working with very large datasets (millions of records). Try it out. If
it works well, we can have it officially added as an option to Rhino
ETL. Here is the code:

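using System.Collections.Generic;
using Rhino.Etl.Core;
using Rhino.Etl.Core.Enumerables;
using Rhino.Etl.Core.Operations;
using Rhino.Etl.Core.Pipelines;

public class SimplePipelineExecutor : AbstractPipelineExecuter
{
    // Pass rows straight through without caching them, so the executer
    // itself keeps no reference to rows that have already been consumed.
    protected override IEnumerable<Row> DecorateEnumerableForExecution(
        IOperation operation, IEnumerable<Row> enumerator)
    {
        // EventRaisingEnumerator wraps the operation's output so the
        // operation's events are still raised as rows flow through.
        foreach (Row row in new EventRaisingEnumerator(operation, enumerator))
        {
            yield return row;
        }
    }
}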
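To illustrate why swapping the executer matters, here is a self-contained sketch that contrasts a wrapper which caches every row it yields with one that only passes rows through. It does not use Rhino ETL at all: `CachingVsStreaming`, `Caching`, `Streaming`, `ReadRows`, `PeakRetained`, and the `Row` alias are hypothetical stand-ins, and the caching wrapper is only an assumed general shape of a caching executer, not its actual implementation.

```csharp
using System;
using System.Collections.Generic;
using Row = System.Collections.Generic.Dictionary<string, object>;

// Hypothetical illustration only: these are not Rhino ETL types.
public static class CachingVsStreaming
{
    // Number of rows the active wrapper is holding onto right now.
    public static int Retained;

    // Caching wrapper (assumed shape): copies every row into a list as it
    // streams past, so the list grows with the total number of rows read.
    public static IEnumerable<Row> Caching(IEnumerable<Row> source)
    {
        var cache = new List<Row>();
        foreach (Row row in source)
        {
            cache.Add(row);
            Retained = cache.Count;
            yield return row;
        }
    }

    // Streaming wrapper: hands each row on and keeps no reference to it.
    public static IEnumerable<Row> Streaming(IEnumerable<Row> source)
    {
        foreach (Row row in source)
        {
            Retained = 1; // only the current row is in flight
            yield return row;
        }
    }

    // Simulated source of n small dictionary-backed rows.
    public static IEnumerable<Row> ReadRows(int n)
    {
        for (int i = 0; i < n; i++)
        {
            yield return new Row { ["id"] = i };
        }
    }

    // Drains the wrapped source and reports the peak number of retained rows.
    public static int PeakRetained(
        Func<IEnumerable<Row>, IEnumerable<Row>> wrap, int n)
    {
        Retained = 0;
        int peak = 0;
        foreach (Row row in wrap(ReadRows(n)))
        {
            peak = Math.Max(peak, Retained);
        }
        return peak;
    }
}
```

Here `PeakRetained(CachingVsStreaming.Caching, 1000)` returns 1000 while `PeakRetained(CachingVsStreaming.Streaming, 1000)` returns 1: memory held by the caching wrapper grows with the row count, which fits a read that succeeds at 2M rows and fails at 4M. To use the executor from this thread, the line in `TestProcess.Initialize` would become `PipelineExecuter = new SimplePipelineExecutor();`.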

Nathan

nabils

Jul 13, 2010, 1:50:34 PM
to Rhino Tools Dev
Thanks, Nathan. That works fine.