Best Way To Mass Delete


Derek Perkins

Sep 2, 2015, 6:15:48 PM
to Google Cloud Bigtable Discuss
I have a situation where I need to delete an entire subsection of my table and then recreate it from my source data. I can't just overwrite because I have no guarantee that the source data will still overwrite every datapoint. What I'd really like to do is be able to say 

DeleteRows(RowRange)

Right now however, it seems like I'll have to retrieve all of the rows (potentially millions) and run the delete one by one using Apply. I'm using the Go bigtable package. Is there a better way to go about this?

Thanks,
Derek

Douglas Mcerlean

Sep 3, 2015, 11:18:51 AM
to Google Cloud Bigtable Discuss
What you describe is currently the only way to do this. It shouldn't be too awful in Go, though: you can just have your read callback issue the row deletion.
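Roughly, a sketch of that pattern (the table handle and prefix are placeholders, and the import path and exact signatures depend on which version of the Go client you're using):

package example

import (
    "context"
    "log"

    "cloud.google.com/go/bigtable"
)

// deletePrefix scans a key range and issues a DeleteRow mutation for each row
// from inside the read callback.
func deletePrefix(ctx context.Context, table *bigtable.Table, prefix string) error {
    mut := bigtable.NewMutation()
    mut.DeleteRow()
    return table.ReadRows(ctx, bigtable.PrefixRange(prefix), func(r bigtable.Row) bool {
        if err := table.Apply(ctx, r.Key(), mut); err != nil {
            log.Printf("deleting row %q: %v", r.Key(), err)
        }
        return true // keep scanning
    })
}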


Derek Perkins

Sep 8, 2015, 5:24:35 PM
to Google Cloud Bigtable Discuss
It's not too terrible, just annoying and seemingly very inefficient. I also can't figure out how to return the row keys without returning any column data. I tried using this column filter to match no columns, but then no rows are returned either.

colFilter := bigtable.ColumnFilter("a^")

At this point, I have to have my column filter return 1 field (I picked the smallest integer field I could) or it doesn't return any rows. Is that expected behavior?

David Symonds

Sep 8, 2015, 5:41:10 PM
to Google Cloud Bigtable Discuss
Try using bigtable.StripValueFilter.

Derek Perkins

Sep 8, 2015, 6:33:35 PM
to Google Cloud Bigtable Discuss
Use that on its own without a column filter?


David Symonds

Sep 8, 2015, 8:04:53 PM
to Google Cloud Bigtable Discuss
Yeah. It'll return everything except for the values.
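A rough sketch of a keys-only scan with it (table handle and prefix are placeholders):

package example

import (
    "context"

    "cloud.google.com/go/bigtable"
)

// rowKeys collects row keys only; StripValueFilter drops the cell values so
// the rows still match but almost no data comes over the wire.
func rowKeys(ctx context.Context, table *bigtable.Table, prefix string) ([]string, error) {
    var keys []string
    err := table.ReadRows(ctx, bigtable.PrefixRange(prefix), func(r bigtable.Row) bool {
        keys = append(keys, r.Key())
        return true
    }, bigtable.RowFilter(bigtable.StripValueFilter()))
    return keys, err
}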

Derek Perkins

Sep 9, 2015, 8:59:23 PM
to Google Cloud Bigtable Discuss
Thanks for the suggestion.

Are there any plans for implementing better bulk delete functionality? Doing it in the callback like Doug mentioned took about 40 minutes to delete 125k rows.

Douglas Mcerlean

Sep 10, 2015, 11:32:26 AM
to Google Cloud Bigtable Discuss
I don't think it's on our radar at the moment, though I have some ideas about how it could be done behind the scenes. I imagine the slowness you're seeing is dominated by network round trips though, so if you break up the row space and run multiple goroutines it should parallelize quite well.
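A rough sketch of that split-and-parallelize approach (the boundary keys are placeholders you'd pick based on whatever you know about your key layout):

package example

import (
    "context"
    "sync"

    "cloud.google.com/go/bigtable"
)

// deleteInParallel runs one scan-and-delete worker per sub-range. Error
// handling is omitted to keep the sketch short.
func deleteInParallel(ctx context.Context, table *bigtable.Table, boundaries []string) {
    mut := bigtable.NewMutation()
    mut.DeleteRow()

    var wg sync.WaitGroup
    for i := 0; i+1 < len(boundaries); i++ {
        rr := bigtable.NewRange(boundaries[i], boundaries[i+1]) // [begin, end)
        wg.Add(1)
        go func(rr bigtable.RowRange) {
            defer wg.Done()
            table.ReadRows(ctx, rr, func(r bigtable.Row) bool {
                table.Apply(ctx, r.Key(), mut)
                return true
            }, bigtable.RowFilter(bigtable.StripValueFilter()))
        }(rr)
    }
    wg.Wait()
}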


Derek Perkins

Sep 11, 2015, 1:36:18 PM
to Google Cloud Bigtable Discuss
I don't think it's on our radar at the moment, though I have some ideas about how it could be done behind the scenes.

Doug - if there's any way to get it on the radar, that would be incredibly helpful. I have 10,000 goroutines deleting records and it is still taking hours to delete one small subsection of my data. Any kind of multi-row operation that avoids all this round-trip overhead would make things far more efficient. I spend an incredible amount of CPU time on any write operations.

Douglas Mcerlean

Sep 11, 2015, 1:45:37 PM
to Google Cloud Bigtable Discuss
Can I ask why you frequently need to do this? If a large fraction of your data is meaningfully different enough that you'd want to delete it separately, my initial assumption would be that it'd be better to put it in a separate table altogether. We haven't seen a need for something like this previously, but we can look into it if we've overlooked an interesting use case.


Derek Perkins

Sep 11, 2015, 3:41:32 PM
to Google Cloud Bigtable Discuss
On a more general level, the reason I'm willing to pay $1500 / mo at a minimum is to handle large quantities of data, so it stands to reason that being able to batch operations, whether writes or deletes, would be important.

My particular use case is that Bigtable isn't my persistent datastore; it is just my high-speed store for serving data to the client. My raw data lives in BigQuery. As we discussed before, my table keys look like "tenantID#clientID#campaignID#...". Tenants are able to set custom rules at the campaign level that determine how much of the raw data is filtered into Bigtable for their consumption. When they make those changes, I pull the raw data out of BigQuery, filter it, and store the results in Bigtable. Because I can't assume that the new dataset will overwrite all of the previous data, I am doing a prefix delete on tenantID#clientID#campaignID. There are currently hundreds of thousands of ~1kb rows per campaign.

Having a single table per campaign seems like overkill, when it's the exact same dataset as everything else in that table. I recognize that I'm currently doing significantly more deleting than I will be in the future, as I'm still testing that whole publishing process. Normally filter changes won't happen all that often, but I can publish the new data out in 5-20 minutes, while deleting is taking all day.

Douglas Mcerlean

Sep 14, 2015, 4:05:59 PM
to Google Cloud Bigtable Discuss
I see, so you're essentially blowing away your materialized views in order to replace them. And while these are likely to be a few 100k rows each, they generally *won't* be a large fraction of the table.

All the options for how we might implement something like this at the service level are, unfortunately, non-trivial behind the scenes. Deletes are understandably dangerous, and multi-row operations don't fit well into the Bigtable paradigm. We'll continue looking into it, but in the mean time I suspect there's still a lot of room to speed up your current approach. What, if any, filters are you applying to the pre-delete scan you're doing right now? If you're sending back most or all of the data in each row just to delete it, then that's a big bottleneck that we can remove.


Derek Perkins

Sep 14, 2015, 4:23:50 PM
to Google Cloud Bigtable Discuss
I see, so you're essentially blowing away your materialized views in order to replace them. And while these are likely to be a few 100k rows each, they generally *won't* be a large fraction of the table.

Exactly.
 
All the options for how we might implement something like this at the service level are, unfortunately, non-trivial behind the scenes. Deletes are understandably dangerous, and multi-row operations don't fit well into the Bigtable paradigm. We'll continue looking into it, but in the mean time I suspect there's still a lot of room to speed up your current approach. What, if any, filters are you applying to the pre-delete scan you're doing right now? If you're sending back most or all of the data in each row just to delete it, then that's a big bottleneck that we can remove.

 These are the filters I'm applying on the pre-delete scan. I'm bringing back just 1 field with no data. I tried bringing back 0 fields, but then no rows returned.
bigtable.ChainFilter( bigtable.ColumnFilter("^clid$"), bigtable.StripValueFilter() ) // pseudocode

The place where I could speed things up the most is by segmenting the rows, but I'd have to do that blindly since I don't know exactly what data is there. It wouldn't be too bad if I swapped out my PrefixRange for a RowRange that deleted one day at a time, but it still doesn't feel like it would make a huge difference. Here's the full loop as it stands right now. I have a wrapper around table.Apply that limits the number of concurrent connections to Bigtable to 10k, which is why I'm spinning up a goroutine per row here.

// Generate row filter from requested campaigns
prefixKey := fmt.Sprintf("%-15s#%07d#%07d", c.UserToken.WorkspaceID, clientID, campaignID)
rowRange := bigtable.PrefixRange(prefixKey)

// Bring back a single stripped-down column (the filter described above) so only row keys come over the wire
colFilter := bigtable.ChainFilters(bigtable.ColumnFilter("^clid$"), bigtable.StripValueFilter())

// Create delete mutation to use for every row
mut := bigtable.NewMutation()
mut.DeleteRow()

// Read in every row, then immediately delete that row key
rowCount := 0
wg := sync.WaitGroup{}
err = table.ReadRows(c, rowRange, func(r bigtable.Row) bool {
    rowCount++
    wg.Add(1)
    go func(rowKey string) {
        if err := table.Apply(c, rowKey, mut); err != nil {
            logger.Error(c, err, "Error deleting row '%s'", rowKey)
        }
        wg.Done()
    }(r.Key())
    return true
}, bigtable.RowFilter(colFilter))
wg.Wait()

logger.Info(c, "total rows deleted: %d", rowCount)


Douglas Mcerlean

Sep 14, 2015, 4:44:54 PM
to Google Cloud Bigtable Discuss, Ian Lewis
+Ian for Go commentary

I'd certainly recommend segmenting the rows, but I think you're right that that isn't the main problem. Basically, I'd expect the actual delete calls themselves to be just as fast, if not faster, than the writes you originally used to populate the data. In Bigtable a mutation is a mutation is a mutation.

Assuming that's true, whatever's going wrong has to do with the speed of the scan or some unnecessary serialization in the code. I'm not a Go expert, so Ian will have to comment on the latter. The former is easily tested: have you tried just running the scan without the deletes to see how long it takes?
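For the scan-only test, something like this would do it (rough sketch; the range is whatever you're about to delete):

package example

import (
    "context"
    "log"
    "time"

    "cloud.google.com/go/bigtable"
)

// timeScan walks the range without mutating anything, to isolate read latency.
func timeScan(ctx context.Context, table *bigtable.Table, rr bigtable.RowRange) {
    start := time.Now()
    count := 0
    if err := table.ReadRows(ctx, rr, func(r bigtable.Row) bool {
        count++
        return true
    }, bigtable.RowFilter(bigtable.StripValueFilter())); err != nil {
        log.Printf("scan error: %v", err)
    }
    log.Printf("scanned %d rows in %s", count, time.Since(start))
}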

One final note is that, since you're deleting in sequence, you're actually pounding just one tablet at a time. This will certainly slow things down somewhat, but I wouldn't expect it to bottleneck anywhere near as hard as you're observing.


Derek Perkins

Sep 14, 2015, 10:14:51 PM
to Google Cloud Bigtable Discuss, ianl...@google.com
The former is easily tested: have you tried just running the scan without the deletes to see how long it takes?

I just ran some tests using a PrefixRange over 383,180 records, with no other processes running on the n1-standard-4 instance (4 CPUs, 15 GB RAM).
  1. Loop through all records and do nothing: 15 seconds
  2. Loop through all records and delete each one: 88 minutes
    1. I had to run it a few times because Bigtable started throwing auth errors
    2. It used 35% CPU on the n1-standard-4 for all 88 minutes
  3. Loop through the now-empty range: 18 seconds
    1. I know that GC has to happen on the server, but why did this take longer than the first iteration?

Derek Perkins

Sep 14, 2015, 11:24:10 PM
to Ian Lewis, Google Cloud Bigtable Discuss
You may want to batch groups of rows or create a semaphore to limit the number of goroutines that are executing in parallel.
 
I have an instance level semaphore that currently restricts all Bigtable operations to 10k concurrent connections. If there is a better number to settle on, I'm all ears. :)
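For reference, the wrapper is essentially this (simplified sketch; the 10k limit is just the number I settled on):

package example

import (
    "context"

    "cloud.google.com/go/bigtable"
)

// sem is a counting semaphore implemented as a buffered channel.
var sem = make(chan struct{}, 10000)

// limitedApply blocks while 10k operations are already in flight, then
// forwards to table.Apply and releases its slot when done.
func limitedApply(ctx context.Context, table *bigtable.Table, row string, m *bigtable.Mutation) error {
    sem <- struct{}{}
    defer func() { <-sem }()
    return table.Apply(ctx, row, m)
}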

Derek Perkins

Sep 14, 2015, 11:54:28 PM
to David Symonds, Ian Lewis, Google Cloud Bigtable Discuss
A goroutine per row is probably too many goroutines, though you shouldn't notice too much overhead except for the memory of all that, especially if you're running on Go 1.5.

Yeah, I haven't worried about that too much yet because I'm only using about 5% of the 15 GB of RAM on the box.
 
It'll be far more dependent on how much the grpc-go package can stuff down the wire. Last time I measured there was ~100 microsecond overhead per RPC at the moment, which implies roughly an upper bound of 10 kQPS on a single connection (which correlates to a single bigtable.Client). Running the deletion over multiple clients (or even multiple machines) may help to max out the server side.

I am currently piping everything through a single client. I'm assuming that a simple DeleteRow mutation isn't adding much overhead to each RPC, so it seems like my 10k limit should be just about right for 10k deletes / second. It would be pretty easy for me to instantiate multiple clients per table too. What would you recommend as the max number of clients per instance and max connections per client?
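Concretely, I'm picturing something like this (rough sketch only; NewClient's arguments have varied between client versions, and the hash fan-out is just illustrative):

package example

import (
    "context"
    "hash/fnv"

    "cloud.google.com/go/bigtable"
)

// multiTable spreads Apply traffic across several clients, each with its own
// connection, so a single gRPC connection isn't the ceiling.
type multiTable struct {
    tables []*bigtable.Table
}

func newMultiTable(ctx context.Context, n int, project, instance, tableName string) (*multiTable, error) {
    mt := &multiTable{}
    for i := 0; i < n; i++ {
        client, err := bigtable.NewClient(ctx, project, instance)
        if err != nil {
            return nil, err
        }
        mt.tables = append(mt.tables, client.Open(tableName))
    }
    return mt, nil
}

// Apply hashes the row key to pick a client, spreading load across connections.
func (mt *multiTable) Apply(ctx context.Context, row string, m *bigtable.Mutation) error {
    h := fnv.New32a()
    h.Write([]byte(row))
    return mt.tables[int(h.Sum32())%len(mt.tables)].Apply(ctx, row, m)
}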

Ian Lewis

Sep 15, 2015, 1:06:50 PM
to Derek Perkins, Google Cloud Bigtable Discuss
Hi,

It looks like you are creating a new goroutine for each row. You may want to batch groups of rows or create a semaphore to limit the number of goroutines that are executing in parallel. That may help your throughput.

Ian

Ian Lewis

Sep 15, 2015, 1:06:50 PM
to Derek Perkins, dsym...@google.com, Google Cloud Bigtable Discuss

Yah. I read your note on the wrapper around table.Apply() a little too late ;)

I also thought the 1-goroutine-to-1-row ratio might not give the best performance. Maybe try batching rows so that each goroutine deletes several rows. That way you avoid some of the overhead of creating and destroying goroutines (I know they're cheap, but still...). I'm not sure what the right batch size would be; <100 feels right, but it would need testing.
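Something along these lines, as a rough sketch (the batch size is a guess and would need tuning):

package example

import (
    "context"
    "log"
    "sync"

    "cloud.google.com/go/bigtable"
)

// deleteInBatches collects row keys during the scan and hands each batch to a
// goroutine, so one goroutine deletes many rows instead of one.
func deleteInBatches(ctx context.Context, table *bigtable.Table, rr bigtable.RowRange, batchSize int) error {
    mut := bigtable.NewMutation()
    mut.DeleteRow()

    var wg sync.WaitGroup
    deleteBatch := func(keys []string) {
        defer wg.Done()
        for _, k := range keys {
            if err := table.Apply(ctx, k, mut); err != nil {
                log.Printf("deleting row %q: %v", k, err)
            }
        }
    }

    var batch []string
    err := table.ReadRows(ctx, rr, func(r bigtable.Row) bool {
        batch = append(batch, r.Key())
        if len(batch) == batchSize {
            wg.Add(1)
            go deleteBatch(batch)
            batch = nil
        }
        return true
    }, bigtable.RowFilter(bigtable.StripValueFilter()))
    if len(batch) > 0 {
        wg.Add(1)
        go deleteBatch(batch)
    }
    wg.Wait()
    return err
}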

I added David who wrote the Go client library as well to see if he has ideas.

Douglas Mcerlean

Sep 15, 2015, 4:10:25 PM
to Google Cloud Bigtable Discuss, Ian Lewis
Something is definitely odd here. Can you also try with the semaphore/goroutines, but just have them return immediately instead of deleting? I wonder if, between the overhead of the semaphore and forking off the goroutine, you might just not be getting the parallelism you think you are. If this winds up being slow as well, then Ian is correct and batching is indeed the answer. That would definitely be my suspicion at this point.

As for 15 seconds vs 18, I'd say that's probably within the margin of error on a single run, but it's also the case that immediately after all those deletes the table actually has to sort through *more* data, not less (deletion markers on top of the pre-existing values). After a compaction things will settle back down.


Derek Perkins

Sep 15, 2015, 4:12:45 PM
to Google Cloud Bigtable Discuss, ianl...@google.com
Something is definitely odd here. Can you also try with the semaphore/goroutines, but just have them return immediately instead of deleting? I wonder if, between the overhead of the semaphore and forking off the goroutine, you might just not be getting the parallelism you think you are. If this winds up being slow as well, then Ian is correct and batching is indeed the answer. That would definitely be my suspicion at this point.

That is how I ran the first test that took 15 seconds, to make sure I was comparing apples to apples. 

Douglas Mcerlean

Sep 15, 2015, 4:18:39 PM
to Google Cloud Bigtable Discuss, Ian Lewis
That's very strange...do you have timings on the individual delete operations? If they themselves are substantially slower than writes that's definitely something we need to look into. Also, when you actually populate the table, do you write the rows in sequence like the deletions here, or is there some sort of parallelism?


Derek Perkins

Sep 15, 2015, 4:38:43 PM
to Google Cloud Bigtable Discuss, ianl...@google.com
That's very strange...do you have timings on the individual delete operations? If they themselves are substantially slower than writes that's definitely something we need to look into. Also, when you actually populate the table, do you write the rows in sequence like the deletions here, or is there some sort of parallelism?

I'd guess that it's pretty close to being in sequence, but I'm not positive, since rows from BQ aren't guaranteed to be in any specific order. I run the writes through a task queue: each request writes about 10k rows, then creates a new task with the next page token, which executes immediately in sequence. I actually tried writing all the data to Bigtable in a single request, the same way that I am deleting, but the Bigtable requests seemed to get slower and slower. I didn't spend enough time investigating whether it was a GC issue or something else, as I already had the framework of my task loop in place.

I don't have any exact comparison numbers on deletion vs write time for the same records like I did earlier. It takes me about 10 hours to rebuild my 50 GB table from scratch, but that's obviously much larger than the 300k rows I've been dealing with here.

Douglas Mcerlean

Sep 15, 2015, 5:03:01 PM
to Google Cloud Bigtable Discuss, Ian Lewis

I think that's the crucial difference. With your writes, you buffer up a bunch of records, then write them one at a time as fast as possible. Even though they're in order and you're hitting just one tablet, there's only one outstanding request. With these deletes, you have potentially 10k outstanding requests for essentially contiguous rows, which is overwhelming the one or two tablets that serve them. Can you try reducing the concurrency to something like 1000 or even 100, and see if it helps?


Derek Perkins

Sep 15, 2015, 6:39:23 PM
to Google Cloud Bigtable Discuss, ianl...@google.com

I think that's the crucial difference. With your writes, you buffer up a bunch of records, then write them one at a time as fast as possible. Even though they're in order and you're hitting just one tablet, there's only one outstanding request. With these deletes, you have potentially 10k outstanding requests for essentially contiguous rows, which is overwhelming the one or two tablets that serve them. Can you try reducing the concurrency to something like 1000 or even 100, and see if it helps?


Sorry if I wasn't clear on my write process. I retrieve 10k records from BigQuery, then write those 10k records concurrently as I loop through the raw data, in the same fashion as my deletes and using the same semaphore to restrict concurrency. I can still try reducing the concurrency to see what that does.

For what it's worth, during the write process last night, I sustained a pretty consistent 4 MB / sec write throughput. Anytime I tried increasing the speed past that by adding more instances, it brought down the whole process pretty quickly. I assume that is related to the data all being written to a single tablet.

Douglas Mcerlean

Sep 15, 2015, 6:49:44 PM
to Google Cloud Bigtable Discuss, Ian Lewis
That'd be my expectation, yes. We've put a lot of work into making Bigtable adjust well to temporary hotspots, but if you hit the same spot hard enough you'll always be able to outpace our attempts to spread the load around. I wonder if maybe, since the writes are sending a lot more data, it bottlenecks enough over the wire or in our flow control to keep the write rate manageable, whereas with deletes you're just pounding and pounding with small requests that make it to the server without the same mitigating circumstances.


Derek Perkins

Oct 1, 2015, 3:35:59 PM
to Google Cloud Bigtable Discuss, ianl...@google.com
Doug,

As I'm circling back around to this, I think I've got a much better solution that works with the current setup. I wanted to run it past you and David Symonds to make sure that my assumptions are correct. Each ReadItem has a Timestamp associated with it, and since I write all the cells for my materialized view once, at the same time, all of those timestamps should be equal. I think I can move the long delete process into its own task, and as I'm looping through the rows, only issue a Delete mutation if the ReadItem timestamp is before the time I started the delete. That way I could concurrently be overwriting the view while the deletion takes place in the background, sort of like a GC. That should leave only a small race-condition window, which is fine for my purposes.
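Roughly what I have in mind (sketch only; error handling and my concurrency wrapper are omitted):

package example

import (
    "context"

    "cloud.google.com/go/bigtable"
)

// deleteOlderThanCutoff records a cutoff when the cleanup starts and skips any
// row whose cells carry a timestamp at or after it, so a concurrent republish
// of the view doesn't get clobbered.
func deleteOlderThanCutoff(ctx context.Context, table *bigtable.Table, rr bigtable.RowRange) error {
    cutoff := bigtable.Now()
    mut := bigtable.NewMutation()
    mut.DeleteRow()

    return table.ReadRows(ctx, rr, func(r bigtable.Row) bool {
        for _, items := range r { // bigtable.Row maps family -> []ReadItem
            for _, item := range items {
                if item.Timestamp >= cutoff {
                    return true // row was rewritten after the cutoff; leave it alone
                }
            }
        }
        table.Apply(ctx, r.Key(), mut)
        return true
    }, bigtable.RowFilter(bigtable.StripValueFilter()))
}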

Will that work how I'm expecting?

Thanks,
Derek

Douglas Mcerlean

Oct 2, 2015, 9:53:38 AM
to Google Cloud Bigtable Discuss, Ian Lewis

Something like that should work fine for the most part, just be aware of the potential for clock skew between client and server if you use server-supplied timestamps. If you're willing to explicitly set the timestamps on the materialized views yourself, then this approach could be made completely safe.
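For example, the write side could look roughly like this (sketch; the family and column names are placeholders):

package example

import (
    "time"

    "cloud.google.com/go/bigtable"
)

// viewMutation stamps a cell with an explicit publish time, so a later cleanup
// pass can compare against a timestamp the application controls instead of a
// server-assigned one (no clock-skew surprises).
func viewMutation(publish time.Time, value []byte) *bigtable.Mutation {
    mut := bigtable.NewMutation()
    mut.Set("stats", "clid", bigtable.Time(publish), value)
    return mut
}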
