Find a faster way to get Id, Last Updated Date and Etag without pulling the whole document from RavenDB


Tim Zhang

Dec 8, 2014, 10:04:10 PM
to rav...@googlegroups.com
My requirements:
1. Get one million documents from RavenDB.
2. Replicate Id, Etag, LastModified, and CollectionName to SQL Server.
3. Complete steps 1 and 2 within ten minutes.

a> Which approach should I use?
b> Can I get only Id, Etag, LastModified, and CollectionName from RavenDB, without returning all fields of each document?

Daniel Häfele

Dec 9, 2014, 2:48:43 AM
to rav...@googlegroups.com
Use a transformer and stream the results to your application.

But nonetheless, I'm wondering why you're doing that, because the SQL replication bundle does exactly that.

Justin A

Dec 9, 2014, 10:09:37 PM
to rav...@googlegroups.com
Regardless of the SQL Replication bundle, this is something I always think about when doing this kind of thing (very poor analogies, so I sincerely apologize to the NoSQL purists):

Load<Foo> == SELECT * FROM TABLE WHERE ID= id
Query<Foo> == SELECT * FROM TABLE WHERE <where clauses>.

So yeah, it's bringing the whole thing back.

So instead, you want to use transformers to do this: SELECT Id, Etag, LastModified, CollectionName FROM TABLE WHERE <where clauses>.

I see transformers as the way to determine the particular fields of the document, thus reducing the network traffic from RavenDB to my app.
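
For illustration, here is a minimal sketch of what a client-side transformer definition for this projection might look like with the RavenDB 3.0 C# client. The class name `AllEntity_Results` is hypothetical (chosen so it maps to a transformer named "AllEntity/Results"), and whether the "@id" and "@etag" keys are populated in the metadata returned by `MetadataFor` in a transformer context is an assumption to verify against your server version.

```csharp
using System.Linq;
using Raven.Client.Indexes;
using Raven.Json.Linq;

// Hypothetical transformer: projects only identity and metadata values,
// so the document body is not sent back to the client.
public class AllEntity_Results : AbstractTransformerCreationTask<object>
{
    public AllEntity_Results()
    {
        TransformResults = docs =>
            from doc in docs
            let metadata = MetadataFor(doc)
            select new
            {
                // Assumption: "@id" and "@etag" are present in the metadata here.
                Id = metadata.Value<string>("@id"),
                Tag = metadata.Value<string>("Raven-Entity-Name"),
                LastModified = metadata.Value<string>("Last-Modified"),
                Etag = metadata.Value<string>("@etag")
            };
    }
}
```

A transformer defined this way would typically be registered once at startup, for example with `new AllEntity_Results().Execute(store)`, where `store` is your existing document store.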

Finally, the question people always forget: will this query return more than 128 docs? If yes, will it return more than 1024 docs? If still yes, then you might need to *stream* the results instead. Streaming is saying: I've got more than 1024 results, so please go get them all.

Again, apologies to the purists. I've just found that when trying to explain some of this NoSQL to SQL people, having a crap analogy sometimes helps, instead of having nothing at all or some overly complex explanation.

(I've also not mentioned anything about sessions, session caching and transformers, etc. Just keeping this bare-bones simple.)

Tim Zhang

Dec 10, 2014, 12:53:55 AM
to rav...@googlegroups.com
As the following code shows, I use a transformer and streaming to get DocInfo {Id, Tag, LastModified, Etag}.
But if there are millions of docs in the database, a timeout exception is thrown from the line "sessionProvider.GetSession().Advanced.Stream(query)".
So I guess evaluating the transformer takes more time when there are many docs.

How can I resolve this issue?

1. Transformer:

AllEntity/Results

from doc in results
select new {
    doc.Id,
    Tag = doc["@metadata"]["Raven-Entity-Name"],
    LastModified = doc["@metadata"]["Last-Modified"],
    Etag = doc["@metadata"]["@etag"]
}

2.
public class DocInfo
{
    public string Id { get; set; }
    public string Tag { get; set; }
    public string LastModified { get; set; }
    public string Etag { get; set; }
}

public void CreateBatchPartialQuery(DateTime startDateTime,
    DateTime? endDateTime,
    string collectionName)
{
    Stopwatch stopwatch = new Stopwatch();
    stopwatch.Start();

    // Query the built-in index, optionally restricted to one collection.
    IDocumentQuery<DocInfo> query;
    if (string.IsNullOrEmpty(collectionName))
    {
        query = sessionProvider.GetSession()
            .Advanced
            .LuceneQuery<DocInfo>("Raven/DocumentsByEntityName")
            .WhereGreaterThanOrEqual("LastModified", startDateTime.ToUniversalTime());
    }
    else
    {
        query = sessionProvider.GetSession()
            .Advanced
            .LuceneQuery<DocInfo>("Raven/DocumentsByEntityName")
            .WhereEquals("Tag", collectionName)
            .AndAlso()
            .WhereGreaterThanOrEqual("LastModified", startDateTime.ToUniversalTime());
    }

    // Optional upper bound on the modification date.
    if (endDateTime != null)
    {
        query = query.AndAlso().WhereLessThanOrEqual("LastModified", endDateTime.Value.ToUniversalTime());
    }

    query = query.SetResultTransformer("AllEntity/Results");

    var totalCount = 0;

    // Stream the results; this is the call that times out.
    using (var enumerator = sessionProvider.GetSession().Advanced.Stream(query))
    {
        while (enumerator.MoveNext())
        {
            totalCount++;
            var dataAsJson = enumerator.Current.Document;
        }
    }
}
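
The streaming loop above leaves the replication step (requirement 2) empty. One possible way to fill it, sketched here under stated assumptions, is to buffer the streamed `DocInfo` rows and push them to SQL Server in batches with `SqlBulkCopy`. The destination table name `dbo.DocInfo`, its column order, and the batch size are hypothetical, and this is not taken from the thread.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

// Same shape as the DocInfo class in the post above, repeated so the sketch is self-contained.
public class DocInfo
{
    public string Id { get; set; }
    public string Tag { get; set; }
    public string LastModified { get; set; }
    public string Etag { get; set; }
}

public static class DocInfoBulkWriter
{
    // Hypothetical destination table and batch size; adjust to the real schema.
    private const string DestinationTable = "dbo.DocInfo";
    private const int BatchSize = 10000;

    // Writes rows in batches and returns how many rows were sent.
    public static int Write(IEnumerable<DocInfo> rows, string connectionString)
    {
        var buffer = CreateBuffer();
        var total = 0;

        using (var bulk = new SqlBulkCopy(connectionString))
        {
            bulk.DestinationTableName = DestinationTable;

            foreach (var row in rows)
            {
                buffer.Rows.Add(row.Id, row.Tag, row.LastModified, row.Etag);
                total++;

                // Flush a full batch so memory use stays bounded.
                if (buffer.Rows.Count >= BatchSize)
                {
                    bulk.WriteToServer(buffer);
                    buffer.Clear();
                }
            }

            // Flush whatever is left over.
            if (buffer.Rows.Count > 0)
            {
                bulk.WriteToServer(buffer);
            }
        }

        return total;
    }

    // Columns are mapped by position, so the order must match the destination table.
    private static DataTable CreateBuffer()
    {
        var table = new DataTable();
        table.Columns.Add("Id", typeof(string));
        table.Columns.Add("Tag", typeof(string));
        table.Columns.Add("LastModified", typeof(string));
        table.Columns.Add("Etag", typeof(string));
        return table;
    }
}
```

To use it with the loop above, the `enumerator.Current.Document` values could be yielded from an iterator method and passed to `DocInfoBulkWriter.Write`, so streaming from RavenDB and writing to SQL Server overlap instead of running one after the other.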

Tim Zhang

Dec 10, 2014, 1:01:43 AM
to rav...@googlegroups.com

Or can I use an API like query.SelectFields("Id", "LastModified", "Etag", "xxx") to get just the fields I want?

Justin A

Dec 10, 2014, 10:35:39 PM
to rav...@googlegroups.com
> 1. Transformer:

AllEntity/Results

from doc in results
select new {
    doc.Id,
    Tag = doc["@metadata"]["Raven-Entity-Name"],
    LastModified = doc["@metadata"]["Last-Modified"],
    Etag = doc["@metadata"]["@etag"]
}

That doesn't look like a transformer to me; it looks like a map. To me, that means those fields will be accessible in your WHERE clause, but the whole doc is still returned.

Have a look at the transformers doc (which Daniel linked to above): http://ravendb.net/docs/article-page/3.0/csharp/transformers/what-are-transformers

So read that doc, then post back here if stuff still doesn't work.

