Optimization

Padarn Wilson

Nov 9, 2020, 9:45:07 AM
to gocql
Hi all, 
I am reading the README of the project and see this under optimization:
  • Use the TokenAware policy
  • Use many goroutines when doing inserts, the driver is asynchronous but provides a synchronous API, it can execute many queries concurrently
  • Tune query page size
  • Reading data from the network to unmarshal will incur a large amount of allocations, this can adversely affect the garbage collector, tune GOGC
  • Close iterators after use to recycle byte buffers
This is useful, but it leaves me with a number of follow-up questions:
  • What about using multiple goroutines when querying? Should I split my query into pages and process them concurrently?
  • What is the suggested way to limit the cost of allocations caused by unmarshalling data from the network?
  • Are there any suggested ways to benchmark changes?
Thank you!

Martin Sucha

Nov 9, 2020, 10:14:40 AM
to Padarn Wilson, gocql
Hi,

On Mon, Nov 9, 2020 at 3:45 PM Padarn Wilson <pad...@gmail.com> wrote:
  • What about using multiple goroutines when querying? Should I split my query into pages and process them concurrently?
Split queries per partition (if you are fetching multiple partitions). To fetch any page other than the first you need the page state,
and the page state is returned with the results of the previous page, so I think pages have to be fetched sequentially. (Adding filtering
conditions to split a query is theoretically possible, but it would add overhead on the server side, as the server would have to process each partition multiple times.)
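
As a rough illustration of splitting work per partition, here is a minimal sketch of a bounded worker pool that runs one query per partition key. It uses only the standard library so it runs on its own: fetchFunc and fetchPartitions are hypothetical names, and in a real program fetchFunc would presumably wrap a single-partition gocql query (run the query, scan the rows, close the iterator).

```go
package main

import (
	"fmt"
	"sync"
)

// fetchFunc is a hypothetical stand-in for a single-partition query.
// It is not part of gocql; a real implementation would wrap a driver call.
type fetchFunc func(partitionKey string) ([]string, error)

// fetchPartitions queries every partition key using at most `workers`
// goroutines and returns the rows grouped by key, plus the first error seen.
func fetchPartitions(keys []string, workers int, fetch fetchFunc) (map[string][]string, error) {
	if workers < 1 {
		workers = 1
	}
	type result struct {
		key  string
		rows []string
		err  error
	}
	jobs := make(chan string)
	results := make(chan result)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for key := range jobs {
				rows, err := fetch(key)
				results <- result{key: key, rows: rows, err: err}
			}
		}()
	}

	// Feed the keys, then close results once every worker has finished.
	go func() {
		for _, k := range keys {
			jobs <- k
		}
		close(jobs)
		wg.Wait()
		close(results)
	}()

	out := make(map[string][]string, len(keys))
	var firstErr error
	for r := range results {
		if r.err != nil {
			if firstErr == nil {
				firstErr = r.err
			}
			continue
		}
		out[r.key] = r.rows
	}
	return out, firstErr
}

func main() {
	// Fake fetcher standing in for a real per-partition query.
	fake := func(key string) ([]string, error) {
		return []string{key + "-row1", key + "-row2"}, nil
	}
	rows, err := fetchPartitions([]string{"a", "b", "c"}, 2, fake)
	fmt.Println(len(rows), err)
}
```

The worker count caps the number of in-flight queries; the right value depends on the cluster and is something to measure rather than assume.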
  • What is the suggested way to limit the cost of allocations caused by unmarshalling data from the network?
If you are unmarshalling slices, the driver already reuses them during scan (https://github.com/gocql/gocql/issues/1348).
If you use user-defined types a lot, I suggest implementing the UnmarshalUDT method to avoid the overhead of reflection
(we use https://github.com/kiwicom/easycql in production to autogenerate these methods for us).
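
To make the UnmarshalUDT suggestion concrete, here is a minimal hand-written sketch. It assumes a hypothetical CQL type `address (street text, zip int)` and, so that it compiles without the driver, declares a local placeholder TypeInfo instead of gocql.TypeInfo. The fields are decoded by hand here; with the real driver one would more likely delegate each field to gocql.Unmarshal(info, data, &field).

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// TypeInfo is a local placeholder for gocql.TypeInfo so this sketch builds
// without the driver. It is NOT the real type.
type TypeInfo interface{}

// Address mirrors a hypothetical UDT: CREATE TYPE address (street text, zip int).
type Address struct {
	Street string
	Zip    int32
}

// UnmarshalUDT has the same shape as the method in gocql's UDTUnmarshaler
// interface: it is called once per UDT field with the field name and raw bytes.
// Switching on the name replaces the reflection-based field lookup.
func (a *Address) UnmarshalUDT(name string, info TypeInfo, data []byte) error {
	switch name {
	case "street":
		a.Street = string(data) // CQL text is UTF-8 bytes
	case "zip":
		if len(data) == 0 { // treat an empty value as null
			a.Zip = 0
			return nil
		}
		if len(data) != 4 {
			return fmt.Errorf("zip: expected 4 bytes, got %d", len(data))
		}
		a.Zip = int32(binary.BigEndian.Uint32(data)) // CQL int is 4 bytes, big-endian
	}
	return nil // unknown fields are ignored in this sketch
}

func main() {
	var a Address
	if err := a.UnmarshalUDT("street", nil, []byte("Main St")); err != nil {
		fmt.Println("error:", err)
	}
	if err := a.UnmarshalUDT("zip", nil, []byte{0x00, 0x01, 0x86, 0xA0}); err != nil {
		fmt.Println("error:", err)
	}
	fmt.Printf("%+v\n", a)
}
```

A generator such as easycql produces methods of this kind so they do not have to be maintained by hand.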

Padarn Wilson

Nov 9, 2020, 9:50:08 PM
to gocql
Thanks for the tips, Martin.

On Monday, November 9, 2020 at 11:14:40 PM UTC+8 Martin Sucha wrote:
  • What about using multiple goroutines when querying? Should I split my query into pages and process them concurrently?
Split queries per partition (if you are fetching multiple partitions). To fetch any page other than the first you need the page state,
and the page state is returned with the results of the previous page, so I think pages have to be fetched sequentially. (Adding filtering
conditions to split a query is theoretically possible, but it would add overhead on the server side, as the server would have to process each partition multiple times.)

Makes sense. I was thinking I could split the work by getting the paging token back and then unmarshalling the results concurrently with executing the next query, but this might have little impact.
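
The overlap described here could be sketched roughly as follows: one goroutine fetches pages strictly in order (each request needs the state returned by the previous page) while the caller processes the page before it. This is a standard-library sketch only; page, fetchPageFunc, and pipeline are hypothetical names, and with gocql the fetch function would presumably be built on Query.PageState and Iter.PageState. The driver's Query.Prefetch setting already requests the next page before the current one is exhausted, so it is worth measuring before adopting something like this.

```go
package main

import "fmt"

// page is one page of results plus the state needed to request the next one.
type page struct {
	rows []string
	next []byte // empty when there are no more pages
}

// fetchPageFunc is a hypothetical stand-in for one paged query.
// A nil or empty state requests the first page.
type fetchPageFunc func(state []byte) (page, error)

// pipeline fetches pages sequentially in a background goroutine while the
// caller's process function handles pages that have already arrived.
func pipeline(fetch fetchPageFunc, process func(rows []string)) error {
	pages := make(chan page, 1) // small buffer so fetching can run ahead of processing
	errc := make(chan error, 1) // receives exactly one value: the fetch outcome

	go func() {
		defer close(pages)
		var state []byte
		for {
			p, err := fetch(state)
			if err != nil {
				errc <- err
				return
			}
			pages <- p
			if len(p.next) == 0 {
				errc <- nil
				return
			}
			state = p.next
		}
	}()

	for p := range pages {
		process(p.rows)
	}
	return <-errc
}

func main() {
	// Two fake pages keyed by the page state that requests them.
	data := map[string]page{
		"":   {rows: []string{"r1", "r2"}, next: []byte("p2")},
		"p2": {rows: []string{"r3"}},
	}
	fetch := func(state []byte) (page, error) { return data[string(state)], nil }

	total := 0
	err := pipeline(fetch, func(rows []string) { total += len(rows) })
	fmt.Println(total, err)
}
```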
 
  • What is the suggested way to limit the cost of allocations caused by unmarshalling data from the network?
If you are unmarshalling slices, the driver already reuses them during scan (https://github.com/gocql/gocql/issues/1348).
If you use user-defined types a lot, I suggest implementing the UnmarshalUDT method to avoid the overhead of reflection
(we use https://github.com/kiwicom/easycql in production to autogenerate these methods for us).

Awesome, thanks for that. Looks great.