I'm creating App where I need to make Post Feeds. Right now I'm using App Engine Standard and Cloud SQL instance db-n1-standard-1 which is based on MySQL
But I notice it is very expensive. App is under production mode and In first month price is much higher than my expectation. Most costly thing in Cloud SQL is instance hours rate So, I decide to migrate it on datastore.
I have three tables, 1- users 2- posts 3- comments And Every Table has round about 10 BillionsRows. Posts and Comments table are growing very fast 100 of millions rows are expected.
Users Table
user_id name email phone
--------------------------------------------------
u123 Abc abc@m.com 123456
u124 Cde cde@m.com 789065
u786 Qwe qwe@m.com 754599
. . . .
. . . .
. . . .
Posts Table
post_id user_id type src date
------------------------------------------------------------------------
p098 u123 img path/to/file 13-3-17
p456 u123 vid path/to/file 14-3-17
p239 u124 img path/to/file 15-3-17
. . . . .
. . . . .
. . . . .
Comments Table
cmnt_id post_id user_id comment
--------------------------------------------------------
m392 p098 u123 Some Text
m234 p098 u786 Some Text
m324 p456 u123 Some Text
. . . .
. . . .
. . . .
As you can see User
can post one or more posts
and single post
has one or more comments
I need to get posts data along with post comments I know I need to use JOIN
for this. And I can easily use it in Cloud SQL because it's based on MySQL.
These three tables have relation between them and As I know Cloud SQL is relational database but DataStore is not.
I have idea in my mind to convert these tables in DataStore
1- Create Entity as Users
and save all users records there
2- Create Entity as Posts
and save all posts records there
3 - Create Entity as Comments
and save all comments records there
Post id is always I know for which post I need to get data
For example I need post data about id p098
I'm using Java I decide to create two threads
First thread get post
data from Posts
Entity and second thread get comment
data from Comments
Entity. Then I merge these two data set by using java to generate same result as JOIN
Like MySQL
Because I think DataStore is not allow JOINS
Here I need some Suggestions.
KeyFactory keyFactory = datastore.newKeyFactory()
.addAncestors(PathElement.of("User", "u123"), PathElement.of("Post", "p098"))
.setKind("Comment");
Key commentKey = keyFactory.newKey("m392");
Entity comment = Entity.newBuilder(commentKey)
.set("comment", "Some Text")
.build();
datastore.put(comment);
Query<Entity> comments = Query.newEntityQueryBuilder()
.setKind("Comment")
.setFilter(PropertyFilter.hasAncestor(
datastore.newKeyFactory().addAncestors(PathElement.of("User", "u123"))
.setKind("Post").newKey("p098")))
.build();
QueryResults<Entity> results = datastore.run(comments);
only create them when transactions are absolutely necessary
"Note that entity groups are not required if you simply plan to reference one entity from another."
There is limitation in ancestor relation. You can not write more than one entity per sec.
No more than one operation per sec
Try to work-around the write limit by offsetting/buffering any comment related write-ops, so writing is throttled below the 1s limit.
I understood how I can Migrate from SQL to Datastore. But now Problem is thatNo more than one operation per secThis is creating trouble for me. Suppose there is another kind Like child of Post then it quite possible there is more than one like per sec.I don't want to create Like or comment under user, it always under post. Because in my App it's very rear case when need to get all posts where user comment. Mostly need toget all comments of the post. So, I think it's a best to make a post as a parent of comment, to make query fast. You mention.
Try to work-around the write limit by offsetting/buffering any comment related write-ops, so writing is throttled below the 1s limit.How can I do this I'm using Java any article or tutorial ?
For getting the post-previews:
You could compute the post preview as one bigger entity (post body including the recent comment-snippets as repeated structured property). For each new comment to a post, you write some data of the comment into the post's recent comments property. However, since many comments to one post could create a hotspot on the post entity, where even exponential backoff might not be sufficient, but you can apply the offsetting/buffering described below.
After your reply I search a lot on shared count. What I understand is that Shared Count is a technique where you divide single entity into multiple entities, Right ?Pick one of them randomly and update it, When you need to get Count fetch all these entities and sum up values for total count.
Thing which is confusing me is that if you convert one entity into multiple entities with same parent isn't it inside a single entity group ? If it's a single entity group then youcan't apply more than one write op per sec.
For example Post1 has two shared count A and B, is post1, A and B in same entity group ?OR post1 and A is separate entity group and post1 and B is separate entity group ?
If these are separate entity group then we can also use this technique for comments system, am I right ?
Another thing I don't know I understand it correctly or not.
For getting the post-previews:
You could compute the post preview as one bigger entity (post body including the recent comment-snippets as repeated structured property). For each new comment to a post, you write some data of the comment into the post's recent comments property. However, since many comments to one post could create a hotspot on the post entity, where even exponential backoff might not be sufficient, but you can apply the offsetting/buffering described below.You said that to create a Post entity with a property of comment which hold list of recent 3 comments ids ?
But using this we need to insert entity in comment Kind as well as update comment list id in Post entity. Please correct me.
One more thing which I want to ask, If I access any ancestor entity with id and update it will write op one per sec apply ?
For example user is parent of post and you have also post_id, You update the post entity using post_id will write op 1 per sec apply ?