DeleteRows(RowRange)
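Something close to this does exist at the admin level in newer versions of the Go client: AdminClient.DropRowRange deletes every row whose key starts with a given prefix, entirely server-side. A minimal sketch, with illustrative project/instance/table names:

// Sketch: AdminClient.DropRowRange (newer client versions) removes all
// rows under a key prefix on the server, with no per-row round trips.
ctx := context.Background()
admin, err := bigtable.NewAdminClient(ctx, "my-project", "my-instance")
if err != nil {
	log.Fatal(err)
}
defer admin.Close()
if err := admin.DropRowRange(ctx, "my-table", "prefix#"); err != nil {
	log.Fatal(err)
}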
colFilter := bigtable.ColumnFilter("a^")
Try using bigtable.StripValueFilter.
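With values stripped, the scan still returns each row's key, which is all the delete loop needs:

colFilter := bigtable.StripValueFilter() // cells come back with empty values, so rows (and their keys) still arrive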
Thanks for the suggestion. Are there any plans for implementing better bulk delete functionality? Doing it in the callback like Doug mentioned took about 40 minutes to delete 125k rows.
I don't think it's on our radar at the moment, though I have some ideas about how it could be done behind the scenes.
Doug - if there's any way to get it on the radar, that would be incredibly helpful. I have 10,000 goroutines deleting records and it is still taking hours to delete one small subsection of my data. Any kind of multi-row operation that saved me all this round-trip overhead would make things far more efficient. I spend an incredible amount of CPU time on any write operation.
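For reference, newer versions of the Go client expose Table.ApplyBulk, which packs many single-row mutations into one MutateRows RPC and removes most of this per-row round-trip overhead. A sketch, with the batch size being illustrative:

// Sketch assuming a client version that provides Table.ApplyBulk
// (imports: "context", "log", "cloud.google.com/go/bigtable").
func deleteKeys(ctx context.Context, table *bigtable.Table, keys []string) error {
	mut := bigtable.NewMutation()
	mut.DeleteRow()
	const batchSize = 1000 // illustrative; the service caps mutations per request
	for start := 0; start < len(keys); start += batchSize {
		end := start + batchSize
		if end > len(keys) {
			end = len(keys)
		}
		batch := keys[start:end]
		muts := make([]*bigtable.Mutation, len(batch))
		for i := range muts {
			muts[i] = mut // the same delete mutation works for every key
		}
		errs, err := table.ApplyBulk(ctx, batch, muts)
		if err != nil {
			return err // RPC-level failure
		}
		for i, e := range errs {
			if e != nil {
				log.Printf("failed to delete row %q: %v", batch[i], e)
			}
		}
	}
	return nil
}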
I see, so you're essentially blowing away your materialized views in order to replace them. And while these are likely to be a few hundred thousand rows each, they generally *won't* be a large fraction of the table.
All the options for how we might implement something like this at the service level are, unfortunately, non-trivial behind the scenes. Deletes are understandably dangerous, and multi-row operations don't fit well into the Bigtable paradigm. We'll continue looking into it, but in the meantime I suspect there's still a lot of room to speed up your current approach. What filters, if any, are you applying to the pre-delete scan you're doing right now? If you're sending back most or all of the data in each row just to delete it, then that's a big bottleneck we can remove.
bigtable.ChainFilters(bigtable.ColumnFilter("^clid$"), bigtable.StripValueFilter()) // return only the "clid" column, with its value stripped
// Generate row filter from requested campaigns
prefixKey := fmt.Sprintf("%-15s#%07d#%07d", c.UserToken.WorkspaceID, clientID, campaignID)
rowRange := bigtable.PrefixRange(prefixKey)
// Create delete mutation to use for every row
mut := bigtable.NewMutation()
mut.DeleteRow()
// Read each row key from the scan, then immediately delete that row
// in its own goroutine
rowCount := 0
wg := sync.WaitGroup{}
err = table.ReadRows(c, rowRange, func(r bigtable.Row) bool {
	rowCount++
	wg.Add(1)
	go func(rowKey string) {
		if err := table.Apply(c, rowKey, mut); err != nil {
			logger.Error(c, err, "Error deleting row '%s'", rowKey)
		}
		wg.Done()
	}(r.Key())
	return true
}, bigtable.RowFilter(colFilter))
wg.Wait()
logger.Info(c, "total rows deleted: %d", rowCount)
The former is easily tested; have you tried just running the scan without the deletes to see how long it takes?
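Reusing the names from the snippet above, a bare timed scan might look like this (assuming the standard time package):

// Time the scan alone, with no deletes, to isolate read throughput.
start := time.Now()
scanned := 0
err = table.ReadRows(c, rowRange, func(r bigtable.Row) bool {
	scanned++
	return true
}, bigtable.RowFilter(colFilter))
if err != nil {
	logger.Error(c, err, "scan failed")
}
logger.Info(c, "scanned %d rows in %v", scanned, time.Since(start))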
You may want to batch groups of rows or create a semaphore to limit the number of goroutines that are executing in parallel.
A goroutine per row is probably too many goroutines, though beyond the memory they consume you shouldn't notice much overhead, especially if you're running on Go 1.5.
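A minimal sketch of that semaphore pattern, reusing the names from the earlier snippet (the cap of 100 is illustrative):

// A buffered channel acts as a counting semaphore, capping in-flight deletes.
sem := make(chan struct{}, 100)
wg := sync.WaitGroup{}
err = table.ReadRows(c, rowRange, func(r bigtable.Row) bool {
	sem <- struct{}{} // blocks once 100 deletes are already running
	wg.Add(1)
	go func(rowKey string) {
		defer func() { <-sem; wg.Done() }()
		if err := table.Apply(c, rowKey, mut); err != nil {
			logger.Error(c, err, "Error deleting row '%s'", rowKey)
		}
	}(r.Key())
	return true
}, bigtable.RowFilter(colFilter))
wg.Wait()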
It'll be far more dependent on how much the grpc-go package can stuff down the wire. Last time I measured, there was ~100 microseconds of overhead per RPC, which implies an upper bound of roughly 10 kQPS on a single connection (which corresponds to a single bigtable.Client). Running the deletion over multiple clients (or even multiple machines) may help to max out the server side.
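A minimal sketch of the multiple-client idea; the client count and project/instance/table names are illustrative:

// Each bigtable.Client gets its own gRPC connection, so fanning the
// deletes out over several clients raises the per-RPC ceiling.
const nClients = 4
tables := make([]*bigtable.Table, nClients)
for i := range tables {
	client, err := bigtable.NewClient(ctx, "my-project", "my-instance")
	if err != nil {
		log.Fatal(err)
	}
	tables[i] = client.Open("my-table")
}
// Then shard the work, e.g. tables[rowCount%nClients].Apply(ctx, rowKey, mut).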
Something is definitely odd here. Can you also try with the semaphore/goroutines, but just have them return immediately instead of deleting? I wonder if, between the overhead of the semaphore and forking off the goroutine, you might just not be getting the parallelism you think you are. If this winds up being slow as well, then Ian is correct and batching is indeed the answer. That would definitely be my suspicion at this point.
That is how I ran the first test that took 15 seconds, to make sure I was comparing apples to apples.
That's very strange... do you have timings on the individual delete operations? If they themselves are substantially slower than writes, that's definitely something we need to look into. Also, when you actually populate the table, do you write the rows in sequence like the deletions here, or is there some sort of parallelism?
I think that's the crucial difference. With your writes, you buffer up a bunch of records, then write them one at a time as fast as possible. Even though they're in order and you're hitting just one tablet, there's only one outstanding request. With these deletes, you have potentially 10k outstanding requests for essentially contiguous rows, which is overwhelming the one or two tablets that serve them. Can you try reducing the concurrency to something like 1000 or even 100, and see if it helps?
Something like that should work fine for the most part; just be aware of the potential for clock skew between client and server if you use server-supplied timestamps. If you're willing to explicitly set the timestamps on the materialized views yourself, then this approach could be made completely safe.
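If you do set the timestamps yourself, a minimal sketch might look like this; the "views" family, "clid" column, and per-generation timestamp scheme are illustrative, not from this thread:

// Stamp every cell of the rebuilt view with a client-chosen generation
// timestamp, then drop all older cells once the new generation is complete.
gen := bigtable.Time(time.Now()) // any value you control and always increase

writeMut := bigtable.NewMutation()
writeMut.Set("views", "clid", gen, rowValue) // rowValue is illustrative

cleanupMut := bigtable.NewMutation()
// Deletes cells with timestamps in [0, gen), i.e. every older generation.
cleanupMut.DeleteTimestampRange("views", "clid", 0, gen)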