Secor POC at Wayfair

santoshdo...@gmail.com

Apr 1, 2020, 9:33:59 AM
to secor-users

Hi Team,


We (Big Data Engineering @ Wayfair) are evaluating Secor as a potential platform to persist data to GCS from Kafka. The initial test shows positive results.


It is great to see that Secor is widely used software for this problem. Thanks for your open-source contribution.


To speed up our POC efforts, it would be great if we could learn from your experience at Pinterest. It would be a great help if you could answer the following questions.


  • Do you use this in production?
  • It looks like your primary use case is to persist data to AWS. Do you know any user for GCS?
  • At what scale is your deployment?
    • How big is your Secor cluster/clusters?
    • How many messages does your Secor cluster process? (messages/sec & bytes/sec)
    • Are you running them on VMs or Containers?


Thanks,

Santosh

Henry Cai

May 4, 2020, 3:12:22 AM
to secor-users
For some reason, I didn't see this email earlier.

See my replies inline below:


On Wednesday, April 1, 2020 at 6:33:59 AM UTC-7, Santosh Domalapalli (@Wayfair) wrote:


  • Do you use this in production?
We used to. We are currently running the successor of Secor (called Merced) at Pinterest. Secor and Merced share a lot of code (e.g. message parsing and handling, file uploading). The main difference is that Merced manages consumer assignment itself.
  • It looks like your primary use case is to persist data to AWS. Do you know any user for GCS?
Yes, people have been using Secor on GCS. You can check with https://github.com/norrs, who added quite a bit of support for Secor/GCS integration.
  • At what scale is your deployment?
    • How big is your Secor cluster/clusters?
We were running Secor at a scale of about 100 TB/day with about 100 VMs. That was about 3 years ago. Our traffic has grown significantly since then, but we don't disclose the current traffic numbers.
    • How many messages does your Secor cluster process? (messages/sec & bytes/sec)
 
    • Are you running them on VMs or Containers?
We are running on VMs, but I have seen quite a few people using containers. You can check the history of https://github.com/pinterest/secor/commits/master/src/main/scripts/docker-entrypoint.sh; 9 people have contributed to that file.
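
For a rough sense of the bytes/sec figure, the numbers above (about 100 TB/day on about 100 VMs) can be converted with some back-of-the-envelope arithmetic. This is only a sketch: it assumes decimal terabytes, traffic spread evenly over the day, and load spread evenly across VMs, none of which is stated in this thread. The function name is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def throughput_bytes_per_sec(tb_per_day, vm_count):
    """Return (aggregate, per-VM) throughput in bytes/sec.

    Assumes decimal units (1 TB = 1e12 bytes) and an even spread of
    traffic over time and across VMs; both are simplifications.
    """
    aggregate = tb_per_day * 1e12 / SECONDS_PER_DAY
    return aggregate, aggregate / vm_count


aggregate, per_vm = throughput_bytes_per_sec(tb_per_day=100, vm_count=100)
print(f"aggregate: {aggregate / 1e9:.2f} GB/s")  # aggregate: 1.16 GB/s
print(f"per VM:    {per_vm / 1e6:.1f} MB/s")     # per VM:    11.6 MB/s
```

Real traffic is bursty, so peak rates would be higher than these daily averages.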

