You can specify any of the following parameters:
Parameter | Description | Required or Default Value |
---|---|---|
numFeederInstances | Brings up the specified number of feeder instances. Example: | 1 |
numInstances | An integer value that specifies the number of Iago servers concurrently making requests to your service. Example: | 1 |
feederDiskInMb | If your feeder dies by running out of disk space, set this to a large number such as 2000. Example: | 60 |
log | A string value that specifies the complete path to the log you want Iago to replay as input to the processLines function in the load test. Although log is part of the public API, there is an obscure feature: the log can be either on HDFS or on your local file system, in which case it is uploaded to HDFS. | Required |
maxPerHost | Maximum number of parrot_server instances per mesos box. Example: | 1 |
mesosFeederNumCpus | A decimal value that specifies the number of CPUs to allocate to the feeder. Example: | 5.0 |
mesosServerNumCpus | A decimal value that specifies the number of CPUs to allocate to the server. Example: | 4.0 |
role | A string value that specifies the role under which the Iago job shows up on mesos. Example: | null |
serverDiskInMb | If your server dies by running out of disk space, set this to a large number such as 2000. Example: | 60 |
traceLevel | Specifies the output trace level of Iago. Only use levels more verbose than INFO when requestRate is set to something very low, like one request per second. Example: | com.twitter.logging.Level.INFO |
victimClusterType | When victimClusterType is "static", we set victims and port. victims can be a single host name, a host:port pair, or a list of host:port pairs separated by commas or spaces. The port is used for two things: to provide a port if none was specified in victims, and to provide a port for the host header when using a FinagleTransport. Note that ParrotUdpTransport can only handle a single host:port pair. When victimClusterType is "sdzk" (which stands for "service discovery zookeeper"), the victim is considered to be a server set, referenced with victims, victimZk, and victimZkPort. An example victims in this case would be "/twitter/service/devprod/devel/echo" | Default: "static" |
victimZk | The host name of the zookeeper where your serverset is registered. | Default is |
victimZkPort | The port of the zookeeper where your serverset is registered. | Default: |
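For instance, here is a minimal sketch of a launcher config using a static victim, in the usual Iago Scala-config style. The host names and log path are hypothetical:

```scala
import com.twitter.parrot.config.ParrotLauncherConfig

new ParrotLauncherConfig {
  // "static" means victims is a literal host list rather than a zookeeper server set
  victimClusterType = "static"
  // host:port pairs may be separated by commas or spaces (hypothetical hosts)
  victims = "web1.example.com:8080 web2.example.com:8080"
  // log to replay; may be local (it will be uploaded to HDFS) or already on HDFS
  log = "logs/replay.log"
}
```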
Parrot is integrated with Mesos, allowing us to create a crazy amount of load by distributing parrot across multiple hosts in a cluster. The normal arrangement is one parrot server combined with one parrot feeder (numInstances = 1 and numFeederInstances = 1). This provides enough load for most testing. At this writing, depending on the data, this simple configuration can produce up to 20,000 requests per second. If you need more than that, you can distribute parrot over multiple hosts using the following parameters. The limiting speed of the feeder is the network card. The parrot server is slower. For a simple thrift victim sending uncompressed log lines of about 70 characters each, the parrot feeder has been observed to run at 1.96 million requests per second. The fastest the parrot server could run in this scenario was 19,000 requests per second. Since the parrot server is so much slower than the parrot feeder, we needed a lot of parrot servers (numInstances > 100) to get the parrot feeder to go this fast.
The number of queries your service will experience is the Effective Rate which is numInstances*requestRate. Note that this is independent of how many feeders you have (numFeederInstances).
For example, if we had numFeederInstances = 2, numInstances = 5, and requestRate = 10000, then the effective rate would be 50,000 requests per second.
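The worked example above corresponds to launcher-config settings like the following sketch (only the rate-related parameters shown):

```scala
// 5 servers * 10,000 requests/sec each = 50,000 requests/sec effective rate.
// The two feeders only keep those servers supplied with log lines; they do
// not change the effective rate.
numFeederInstances = 2
numInstances = 5
requestRate = 10000
```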
If you need traffic beyond what one feeder can provide, use more feeders. The feeder periodically queries each of the parrot servers asking if it would like more data. This way, we can have arbitrarily many feeders and not worry about coordinating between them.
By default, the parrot feeder sends a thousand messages at a time to each connected parrot server until the parrot server has one minute's worth of data. This is a good strategy when messages are small (less than a kilobyte). When messages are large, the parrot server will run out of memory. Consider an average message size of 100k: the feeder will then be maintaining an output queue of 200 million bytes for each connected parrot server. For the parrot server, consider a request rate of 2000: then 2000 * 60 * 100k = 12 gigabytes (at least). The following parameters help with large messages:
Parameter | Description | Required or Default Value |
---|---|---|
batchSize | How many messages the parrot feeder sends at one time to the parrot server. For large messages, setting this to 1 is recommended. | Default: 1000 |
cachedSeconds | How many seconds' worth of data the parrot server will attempt to cache. For large messages, setting this to 1 is recommended. | Default is |
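For large messages, the advice in the table above amounts to a launcher-config sketch like this (the 100k message size is the example from the text):

```scala
// With ~100 KB messages, keep both the feeder's per-server output queue and
// the server's cache small: one message per batch, one second of cached data.
batchSize = 1
cachedSeconds = 1
```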
The parrot launcher creates the following files:

```
config/target/config.mesos
config/target/mesos-feeder.scala
config/target/mesos-server.scala
scripts/common.sh
scripts/parrot-feeder.sh
scripts/parrot-server.sh
```
These are used to launch the parrot feeder and parrot server on mesos.
You can control what parrot will send over to mesos using the two configurations distDir and archiveCommand. The defaults for these are:

```scala
var distDir = "."
def archiveCommand(name: String) = "jar Mcf %s.zip -C %s .".format(name, distDir)
```
For example,
```scala
override def archiveCommand(name: String) =
  "jar Mcf %s.zip -C %s README config log4j.properties requiredLibs scripts target/parrot-examples.jar".format(name, distDir)
```
The default classpath is

```scala
"*:libs/*"
```

where APP_HOME is the sandbox mesos created for your job. So, in the default configuration, the classpath includes any jar in the directory from which you ran the parrot launcher and any jar in its "libs" subdirectory. For example:

```scala
classPath = ".:target/parrot-examples.jar:requiredLibs/*:project/boot/scala-2.9.2/lib/*"
```
Parameter | Description | Required or Default Value |
---|---|---|
archiveCommand | The method for creating the archive that will become the mesos package. | def archiveCommand(name: String) = "jar Mcf %s.zip -C %s .".format(name, distDir) |
classPath | The classpath used by both the parrot server and the parrot feeder. | "*:libs/*", where APP_HOME is the sandbox mesos created for your job. |
Data center-specific parameters: These Twitter-specific parameters control some aspects of how your job interacts with our Mesos, Hadoop, Zookeeper, and other prod-ish things.
Parameter | Description | Required or Default Value |
---|---|---|
hadoopNS | File system for the source of your transactions. Example: | hdfs://hadoop-company.com |
hadoopConfig | The location of the config file for your cluster on nest1. Example: | /etc/hadoop/hadoop-conf-smf1 |
mesosCluster | The Mesos cluster on which Iago executes. Example: | |
zkHostName | The host associated with Zookeeper. Example: | zookeeper.company.com |
zkPort | An integer value that specifies the Zookeeper port on which to deliver requests. Example: | 2181 |
zkNode | The Zookeeper node on which to deliver requests. Example: | /twitter/service/parrot2/%s |
Change these parameters only at your own risk:
Parameter | Description | Required or Default Value |
---|---|---|
parrotTasks | The tasks that comprise Iago; by default, 1 server and 1 feeder. Example: | List("server", "feeder") |
parrotLogsDir | A string value that specifies the complete path to the Iago log directory on Mesos. Example: | mesos/pkg/parrot/logs |
proxy | Iago proxy server. This is the host name of a machine that will execute mesos commands on behalf of Iago. Setting it to None means: don't use a proxy; execute mesos commands on the current machine. Example: | Some("somehostname") |
proxyShell | Iago proxy shell; by default, no X11 forwarding. Example: | ssh -o ForwardX11=no |
proxyCp | Iago copy proxy; by default, secure copy. Example: | scp |
hadoopFS | A string value that is the current syntax used for the file system part of the Hadoop command. Example: | fs |
proxyMkdir | The command used to create a directory remotely. Example: | mkdir -p |
hadoopCmd | A string value that is the current syntax used for the Hadoop part of the Hadoop command. Example: | hadoop |
By default, we run in the SMF1 mesos cluster. If you would like to run in another cluster, here is the process you should use:
To find out the settings for the cluster you want to run in, ssh to a nest box and look in the /etc/hadoop/ folder for the correct Hadoop settings folder, then check its core-site.xml file for the hadoopNS value.
Sample WOW config:
```scala
mesosCluster = "wow"
hadoopConfig = "/etc/hadoop/hadoop-conf-wow"
hadoopNS = "hdfs://hadoop-scribe-company.com"
zkHostName = Some("zookeeper.company.com")
```
Your First Run
The first time you run Parrot, you should run it very slowly with tracing enabled. Try
```scala
traceLevel = com.twitter.logging.Level.TRACE
requestRate = 1
numInstances = 1
mesosFeederNumCpus = 1
mesosServerNumCpus = 1
```
When you've launched Parrot, inspect the Parrot feeder's parrot-feeder.log and the Parrot server's parrot-server.log. If all looks well,