sharding

35 views
Skip to first unread message

Artie Pesh-Imam

unread,
Mar 14, 2012, 5:31:17 PM3/14/12
to Squeryl
Hey Max and all,

It's time for my company to look at sharding our database. Im tasked
with making squeryl shardaware. Our general thought process is as
follows:

1) Add a sharded column attribute. This would specify which column a
table is shared on
2) In general when querying:
identify the hash key
bind to the appropriate db connection
execute query
3) We'd also need something to validate if a query is against a
sharded table likely we would throw an exception
4) Also need to define a way of executing a query against all tables

The reason why Im writing is that I would like to get some feedback on
this approach. I've added the sharded column attribute no problem. The
next step for me is to look at getting inserts working correctly. This
would lead me to implement the consistent hashing algorithm as well as
figuring out how to get a session to the appropriate db.

I was looking at adding logic to the table class where data is updated/
inserted to determine the shard to write to. Does this approach make
sense? I guess to put more generally, I was just wondering if there
had been any thought given to the design of incorporating sharding.

Thanks
Artie

Maxime Lévesque

unread,
Mar 14, 2012, 8:05:21 PM3/14/12
to squ...@googlegroups.com


Here's a scheme that would be more strongly typed :

trait ShardingFunction[K] {
  def selectShard(k: K): Session
}

class ShardedTable[A,K](table: Table[A], invokeShardedFieldGetter: A => AnyRef, shardedFieldMetaData: FieldMetaData, f: ShardingFunction[K]) {

  def shardedLookup(k: K)(implicit dsl: QueryDsl) = {

    import dsl._

    using(f.selectShard(k)) {

      val q = from(table)(a => dsl.where {
        FieldReferenceLinker.createEqualityExpressionWithLastAccessedFieldReferenceAndConstant(invokeShardedFieldGetter(a), k)
      } select(a))

      q
    }
  }
}

class ShardedTablePrecursor[A](table: Table[A]) {


   def shardedOn[K,T](f:  A => T)(implicit dsl: QueryDsl, ev: T => TypedExpressionNode[K], sf: ShardingFunction[K]) = {

     import dsl._
     
     val q = from(table)(a=> select(a)) : Query[A]

     val n = 
       Utils.mapSampleObject(q, (a0:A) => ev(f(a0)))
     
     val invoker = f.asInstanceOf[A=>AnyRef]
     
     new ShardedTable[A,K](table, invoker, n._fieldMetaData, sf)
   }
}


object YourSchema extends Schema {

  implicit object yourHashFunction extends ShardingFunction[Long] {
    def selectShard(k: Long): Session = sys.error("implemente me !")
  }

Then you define a sharded table : 

  val shardedTable = table[YourObject].shardedOn(_.theShardingKey)
  
  shardedTable.shardedLookup(k) // here it is enforced at compile time that k is the same type as _.theShardingKey ...

Now the insert and updates are left as an exercise ! ;-)

It probably would be wise to disable the partial update function, i.e. only update with def update(a: A),
or maybe provide a partial update that takes a shard key in a first argument list

Cheers !

2012/3/14 Artie Pesh-Imam <ar...@cx.com>



--
Seuls les poissons morts nagent avec le courant

Maxime Lévesque

unread,
Mar 14, 2012, 8:29:25 PM3/14/12
to squ...@googlegroups.com

The ShardedTable class could also have this method :

  def shardedTransaction[U](shardKey: K)(block: Table[A] => U)(implicit dsl: QueryDsl) = 
    dsl.transaction(f.selectShard(shardKey)) {
      block(table)
    }


shardedTable.shardedTransaction(aValueToShardOn) { table => // the 'raw' table is available here

   // squeryl code here will run against the chosen shard,

   table.insert(...)  // inserts, updates and deletes will magically be ran against the correct shard...
}


2012/3/14 Maxime Lévesque <maxime....@gmail.com>

Artie Pesh-Imam

unread,
Mar 15, 2012, 2:11:51 PM3/15/12
to squ...@googlegroups.com
Max,

Thanks for the suggestion! I do like it. Quick (possibly dumb) question… when defining the table, you're calling sharedOn which is defined in ShardedTablePrecursor. Is there an implicit conversion going on that Im missing. Im not sure how that method can be referenced on table.

Thanks
a



Sent with Sparrow

Maxime Lévesque

unread,
Mar 15, 2012, 2:26:04 PM3/15/12
to squ...@googlegroups.com


Yes... there needs to be this implicit conversion in the the Schema trait :

  protected implicit def tableToSharded[A](t: Table[A]) = new ShardedTablePrecursor(t)


2012/3/15 Artie Pesh-Imam <ar...@cx.com>

Artie Pesh-Imam

unread,
Mar 19, 2012, 1:46:00 PM3/19/12
to squ...@googlegroups.com
Max,

Thanks I've got this working.

A few things: 

  def shardedLookup(k: K)(implicit dsl: QueryDsl) = {

    import dsl._

    using(f.selectShard(k)) {
      val q = from(table)(a => dsl.where {
        FieldReferenceLinker.createEqualityExpressionWithLastAccessedFieldReferenceAndConstant(invokeShardedFieldGetter(a), k)
      } select(a))
      q.single
    }
  }

Here you'll see I've tacked on the single call otherwise, there's no session bound and I can't query.

I was thinking of omitting this call anyway as it doesn't really give anything that isn't available through shardedTransaction. 

My last question is if you'd want to pull this in. It's quite self contained. If yes, I can clean up my test cases and create a pull req.

Thanks
Artie

Sent with Sparrow

Maxime Lévesque

unread,
Mar 19, 2012, 2:09:16 PM3/19/12
to squ...@googlegroups.com

The more self contained and clean, the more I'll want to pull it ;-)
I will change to an sbt multiproject setup pretty soon, sharding could 
go in it's own sub project. Contact me privately, we can discuss it.

BTW, did you read this :  http://www.codefutures.com/database-sharding/ 
?

2012/3/19 Artie Pesh-Imam <ar...@cx.com>
Reply all
Reply to author
Forward
0 new messages