Automated Static Analysis on Review Request

Steven MacLeod

unread,

Jan 30, 2012, 3:59:13 PM1/30/12

to reviewb...@googlegroups.com

I'm currently writing a Review Board extension (ReviewBot) which aims to automate execution of static analysis tools on proposed code.

Basic Idea:

* A new review request is published.

* ReviewBot takes new code and executes appropriate analysis tools (such as pep8 for python).

* Output of tool is parsed and a review is generated

** Comments are made to relevant lines of code

** Issues possibly opened

** Ship-it if no problems

* Review is posted for each tool (i.e. pep8-Reviewer posts a review)

What I'm interested in is some opinions on the design of ReviewBot.

Currently I'm thinking of two main components to ReviewBot

1. A generic ReviewBot extension to handle posting reviews.

2. A parser specific to each tool.

Initially I was thinking of coding each of these as Review Board extensions, with the tool specific parsers just extending my base class. This is definitely the easiest method, but It has a serious drawback of running the analysis tools and parsers on the webserver. Keeping in mind that I want to make adding additional tools simple (only requiring the writing of a small parser), how do you think I should distribute this?

Mike Conley and I discussed the possibilities of a WebHooks extension to solve this problem. When a review is published, a web request could be fired off to ReviewBot on another server, which could run each tool, and then use the API to post the reviews.

Another option Christian Hammond mentioned to me was using distributed task queues (See python Celery: http://celeryproject.org/). This is something being looked in to already for distributing the diff processing, and could also be applied to ReviewBot.

Any other ideas?

Also, it would be nice to hear what others think about the class design. Do you think ReviewBot should be a central entity that each parser talks to or is called by? Or would a base class that new tool extensions derive from (i.e. The pep8-ReviewBot) make more sense?

- Steve

John J Lee

unread,

Feb 1, 2012, 6:38:52 PM2/1/12

to reviewb...@googlegroups.com

On Mon, 30 Jan 2012, Steven MacLeod wrote:

> I'm currently writing a Review Board extension (ReviewBot) which aims to
> automate execution of static analysis tools on proposed code.

Great.

(RB below == ReviewBoard, not ReviewBot!)

> ** Ship-it if no problems

As I'm sure you know, review workflows differ -- for example, some
workflows allow commit when there's at least one ship-it. In that
workflow you probably prefer that the bot doesn't give a ship-it
(otherwise, it's hard to prioritize your effort as a reviewer because you
can't see quickly which reviews have a ship-it already and which don't).
It would be nice if ReviewBot were to be as flexible as RB itself about
that kind of variation.

While an arbitrary hook that can perform any action is completely
flexible, no matter how simple it is to write a plugin, there's always
some annoying work to implement the hook and adapt it to your PEP8 tool
(or whatever it is the plugin does). Nothing beats just installing
somebody else's plugin and having it just work.

So, another way to preserve that flexibility, which I think is better
because it's more convenient for the various kinds of RB user involved,
would be for plugins to stay out of doing concrete actions like adding
concrete RB comments, adding issues, adding a ship-it, etc. This is in
contrast to your web-hook idea as you first stated it, where the hook gets
to make whatever RB API calls it likes. Having the plugins not be
responsible for those actions does also have the advantage that it would
decouple the plugins from RB. In principle, that would mean users of
other review tools (notably gerrit) could contribute plugins that would
work on RB (as well as on gerrit) -- your web hook idea fits well with
that hope.

This makes life easy for people deploying RB, because they can just
install the PEP8 plugin somebody else wrote, and start off by leaving
ReviewBot set to its default behaviour for what actions to perform.
Maybe they then have five minutes later to change some ReviewBot
configuration to set a few flags like "Don't give a ship-it".

It makes life easy for plugin authors because they get an API to code to
that's much simpler than the full RB API (see below). And maybe a gerrit
user already wrote the plugin they want, so they have nothing to do (maybe
wishful thinking, but seems there's scope for a little bit of cooperation
here).

Probably 90% of workflows could be dealt with using built-in ReviewBot
code plus a couple of configuration options (add a ship it or not, add
issues or not), but there's always the option of adding another hook in
future to customize the actions arbitrarily.

> Mike Conley and I discussed the possibilities of a WebHooks extension to
> solve this problem. When a review is published, a web request could be
> fired off to ReviewBot on another server, which could run each tool, and
> then use the API to post the reviews.

Sounds easy (to use). What arguments would the hook take?

Per my points above re flexibility to workflow and ease of use: how about,
rather than suggesting that plugins use the full RB API, just exposing a
very simple ReviewBot API to the plugins, which allows only for submitting
a set of abstract comments. By "abstract", I just mean not actual RB
comments, but rather along the lines of (suitably REST-encoded) tuples
(file path, start line number, end line number, severity). ReviewBot
itself is then responsible for turning those into concrete RB comments,
issues, ship-its, etc.

(Side issue: With the severity field there I'm thinking it might be useful
to allow for indicating potential problems where the tool can't be sure
that the problem is real.)

Of course there's nothing here to stop the plugins doing RB API calls: I'm
just suggesting that providing a simpler alternative is easier for both
plugin authors and RB deployers.

> Another option Christian Hammond mentioned to me was using distributed task
> queues (See python Celery: http://celeryproject.org/). This is something
> being looked in to already for distributing the diff processing, and could
> also be applied to ReviewBot.

Perhaps that solves some problems for reviewboard (fair enough). It
doesn't sound like it solves a problem for ReviewBot plugin authors or
people deploying RB?

Oops, that was much longer than I intended, sorry!

John

Marcin Stępnicki

unread,

Feb 1, 2012, 6:57:24 PM2/1/12

to reviewb...@googlegroups.com

Dnia 2012-01-30, pon o godzinie 15:59 -0500, Steven MacLeod pisze:

> I'm currently writing a Review Board extension (ReviewBot) which aims to
> automate execution of static analysis tools on proposed code.

Hi!

I am trying to scratch my specific itch - namely, I'd like to have the
report from Yasca (www.yasca.org, sample:
http://embraceunity.com/wp-content/uploads/2010/09/Yasca-Report-20100911054536.html) included in ReviewBoard. Your approach seems to be more generic one and from what I see you've managed to predict more hurdles than I've been aware of.

Please let me know if I can be of any further help. Your idea seems to
be a perfect resolution to my problem :-).

Regards,
Marcin

Mike Conley

unread,

Feb 2, 2012, 6:04:51 PM2/2/12

to reviewboard-dev

Hey Steven,

We already talked about this, but I thought I'd share what we came up
with with the rest of reviewboard-dev.

So, the way I see it, there are two primary pieces to this project:

#1 An extension for Review Board that can be configured to fire POST
requests at some arbitrary server when a particular action happens.

In your case, I think you only care about firing events when a review
request changeset is published. The POST request, in this case,
should probably pass the review request ID#, as well as the diff #.

This extension could be expanded in the future to fire on other
events, but for your purposes, I think the above is sufficient.

#2 A "ReviewBot" service that sits, waiting for a POST request from
that WebHooks extension.

This service will idle on some server, waiting to hear that special
POST request. When it hears it, it goes to the review request, gets
the appropriate changeset, applies it to a local copy of the
repository, and then runs a series of static analysis tools on the
changed files. (For your purposes, pep8 is a fine start).

The output of the analysis tool is turned into review comments that
ReviewBot posts.

We might want ReviewBot to be configurable in the following ways:
-Comments on only the changed lines? Or on the entirety of every
changed file?
-Does it give a Ship-It when the analysis returns clean?
-Does it open an issue for every problem it finds?

Another thing to consider - how do we safeguard ReviewBot from abuse?
How do we prevent someone from spoofing that WebHook and causing
trouble?

And another question - with what technology should ReviewBot be
implemented? Review Board is mostly Python, so Twisted looks like the
logical choice. However, Node.js *is* the new sexy. I don't have too
much experience using either, so I can't really weigh in on the pros
and cons. Are there other alternatives?

Like I said earlier this week - just concentrate on completing that
WebHook extension, and once it does what we need it to do, we'll start
thinking about ReviewBot.

My two cents,

-Mike

Steven MacLeod

unread,

Mar 31, 2012, 1:04:15 AM3/31/12

to reviewb...@googlegroups.com

An initial prototype of Review Bot is now up on github: https://github.com/smacleod/ReviewBot

This is VERY early, and the current state of the code is more of a proof of concept. I will continue to update this repository as Review Bot is improved.

Steven MacLeod

unread,

Apr 4, 2012, 7:41:01 PM4/4/12

to reviewb...@googlegroups.com

Review Bot is getting much closer to the level of an alpha test. I've thrown the code up for review, including some screenshots here: http://reviews.reviewboard.org/r/3042/

On Monday, January 30, 2012 3:59:13 PM UTC-5, Steven MacLeod wrote:

Steven MacLeod

unread,

Apr 12, 2012, 6:03:07 PM4/12/12

to reviewb...@googlegroups.com

While developing Review Bot to it's current state, I've discovered a number of obstacles. In order to make everything production ready, some of these problems need to be tackled, and I'd appreciate some community input on the solutions.

= Current Architecture =

I think I should start with a quick overview of Review Bot's current state, to give the discussion some context. The architecture consists of 3 main components:

1. The 'Extension' - A Review Board extension which detects new review requests and queues them for processing. the Extension also provides the configuration UI in Review Board's admin panel.

2. A message 'Broker' - Review Bot distributes analysis using Celery (http://www.celeryproject.org/), a distributed task queue. The Broker is where each task is queued by the Extension, and consumed by the Workers.

3. 'Workers' - These are processes created by servers running an instance of celeryd. Workers wait for messages from the Broker, and preform the static analysis review. The worker then posts the result of the review to Review Board using the Web API.

== Details ==

The Extension includes information about the server and review requests in its message to the Broker, allowing all configuration to take place in the Review Board admin panel. This also allows the Broker and Worker pool to service multiple instances of Review Board.

Currently workers will execute every analysis tool installed in their server (Installed means provided in the Python entry point reviewbot.tools). Control for which tool is executed has not been implemented.

Each reviewbot.tools entry point is a function which accepts a single argument. This argument will be a 'Review' object. Each patched file from the change has been downloaded and stored in a temporary file, and information about each file is in `Review.files`. The tool is responsible for executing analysis on the files, and using methods provided by the `Review` to craft a review.

= Obstacles =

I think the overall architecture is pretty scalable, and I'd like to keep it that way. There are definitely some problems with how things are currently done though, and I'd appreciate some input on the following:

== Concurrent Review ==

When the Extension sends a message to the Broker, it includes a username and password; the worker uses these credentials when communicating with the Web API. Review Board only allows a single review draft per user, per review request (This is for good reasons). This poses problems when a review request is updated before the initial Review Bot reviews are complete. Review Bot will begin reviewing the new code, and the reviews will combine. This also causes the second worker to fail mid review. Additionally, if we make it so each analysis tool is executed in parallel, we will have the same issue.

What I propose to solve this problem is to have the Extension actually commit completed reviews. A worker should construct a review without repeatedly hitting the Web API, and then POST this to the Extension in a single request. The Extension will then handle actually committing the review to the database.

Can you think of any shortcomings to this solution?

== Concurrent Tool Execution & Tool Selection ==

Right now each review request generates a single message to the Broker, which means a single worker will process the request, and execute each tool. I don't like this method, but it has a pretty large benefit: Each tool is able to share the temporary file store of the modified code, and it is only downloaded from Review Board once. Executing tools like this also means you must stick to a single platform. If all worker pools must contain every tool needed, if you have a windows/unix only tool, you are stuck with a single platform.

Here are a few problems we should try and solve, that are involved in executing each tool concurrently:

- Include the tool to be executed in the message from the Extension to the Broker.

- Ensure only a worker on a server with the tool installed receives the message from the queue.

- What subset of all installed tools should be executed (configuration problem)?

- Ensure only tools which are installed in at least one worker are queued.

- Allow mixing platforms for the workers (Windows / Linux / ...).

- Can we use a single download of the patched files?

I have two proposed solutions, but neither solves all of the problems. One 'manual' solution, and one that I think is more 'elegant'. There is also some aspects common to each solution.

=== manual ===

Make all configuration of the tools manual. Allow the admin to specify manually configured queues, and which tools should be sent to each queue. The workers should then be manually configured to listen on the queue for the tools they have. We could devise a way to require only downloading the patched files once per queue.

=== elegant ===

Each tool to be executed should be a distinct message to the Broker. Queues are actually dynamically created in the Broker when either a message is generated for the queue, or a worker listens on the queue. These queues will correspond to a specific analysis tool. When workers are started, they should be set to listen to all queues which correspond to their installed tools.

=== common ===

Discovering all of the installed tools could be accomplished by adding an 'info' task to the Review Bot workers. Instead of executing a review, this task could just inform the Review Board instance which tools are installed on its server. A message to execute this task could be broadcast to all queues, and the Extension could display these results in the admin panel. I still think configuring which tools are executed automatically should be manually configured, but this could help populate the admin interface with helpful information.

Can you think of a better way to handle this stuff? Could either method be modified to solve more of the problems I mentioned? Have I overlooked something? What method do you like best?

== Analysis Tool API ==

The current API I've set up for each analysis tool to perform it's review is very basic. Right now the tool can set the top body and bottom body of the review manually, and comment using a simple `comment(msg, line)` method. What API would you like to see?

Also, each entry point is just a simple function, which isn't given much help in executing it's analysis. I'd like to convert the entry points over to a base class each tool can inherit from, which could provide some common operations, and helpful methods. Imagining you are writing a plug-in for a new analysis tool, what would you like to see from this base class?

== Configuration ==

Currently all configuration for Review Bot takes place in the Extension. This includes settings the tools should follow when commenting/opening issues. I really like this design as it makes administration very simple, and allows the Broker and workers to be shared between many Review Board instances all with distinct settings.

Something I can't seem to solve is how I can allow tool specific configuration in this environment. E.g. A tool provides a severity with it's warnings, how can a cutoff point be set for when an issue is raised in Review Board? Every tool is slightly different, and some tools may require some very unique configuration, how can I handle this? Is this even worth handling?

== Everything Else ==

Do you think I've missed something here? Am I not considering some important aspect of static analysis execution? Let me know if you have any other ideas or suggestions!

I apologize if this is long-winded, I tried to be as concise as I could.

Thanks for your time,

Steven

On Monday, January 30, 2012 3:59:13 PM UTC-5, Steven MacLeod wrote:

Mike Conley

unread,

Apr 15, 2012, 2:40:57 PM4/15/12

to reviewboard-dev

Hey Steven,

Thanks for putting this outline together.

I can make some suggestions, but as message-queue usage is relatively
new to me, take it with a grain of salt.

I like the sounds of the "elegant" solution, but let me see if I have
it right in my head:

1) The ReviewBot extension broadcasts the "info" message to all
queues, asking what analysis tools are available from the worker pool
2) The list of available tools populates the ReviewBot extension
configuration UI, allowing an RB admin to configure which tools will
be used (perhaps project by project?)
3) When a review request is posted, the queues associated with the
tools configured to run are sent messages indicating that the request
is available for review
4) Workers consume the messages, do their analysis, come up with a
result, and then make a request to a URL that the Review Bot extension
is listening on
5) The Review Bot extension hears the request, and generates a new,
complete review - essentially skipping over the draft step.

Do I have that right?

> The current API I've set up for each analysis tool to perform it's review
> is very basic. Right now the tool can set the top body and bottom body of
> the review manually, and comment using a simple `comment(msg, line)`
> method. What API would you like to see?

Being able to comment on a range of lines would probably be a good
idea.

Anyhow, am I understanding the elegant solution correctly?

-Mike

On Apr 12, 6:03 pm, Steven MacLeod <ste...@smacleod.ca> wrote:
> While developing Review Bot to it's current state, I've discovered a number
> of obstacles. In order to make everything production ready, some of these
> problems need to be tackled, and I'd appreciate some community input on the
> solutions.
>

> *= Current Architecture =*

>
> I think I should start with a quick overview of Review Bot's current state,
> to give the discussion some context. The architecture consists of 3 main
> components:
>
> 1. The 'Extension' - A Review Board extension which detects new review
> requests and queues them for processing. the Extension also provides the
> configuration UI in Review Board's admin panel.
>
> 2. A message 'Broker' - Review Bot distributes analysis using Celery
> (http://www.celeryproject.org/), a distributed task queue. The Broker is
> where each task is queued by the Extension, and consumed by the Workers.
>
> 3. 'Workers' - These are processes created by servers running an instance
> of celeryd. Workers wait for messages from the Broker, and preform the
> static analysis review. The worker then posts the result of the review to
> Review Board using the Web API.
>

> *== Details ==*

>
> The Extension includes information about the server and review requests in
> its message to the Broker, allowing all configuration to take place in the
> Review Board admin panel. This also allows the Broker and Worker pool to
> service multiple instances of Review Board.
>
> Currently workers will execute every analysis tool installed in their
> server (Installed means provided in the Python entry point
> reviewbot.tools). Control for which tool is executed has not been
> implemented.
>
> Each reviewbot.tools entry point is a function which accepts a single
> argument. This argument will be a 'Review' object. Each patched file from
> the change has been downloaded and stored in a temporary file, and
> information about each file is in `Review.files`. The tool is responsible
> for executing analysis on the files, and using methods provided by the
> `Review` to craft a review.
>

> *= Obstacles =*

>
> I think the overall architecture is pretty scalable, and I'd like to keep
> it that way. There are definitely some problems with how things are
> currently done though, and I'd appreciate some input on the following:
>

> *== Concurrent Review ==*

>
> When the Extension sends a message to the Broker, it includes a username
> and password; the worker uses these credentials when communicating with the
> Web API. Review Board only allows a single review draft per user, per
> review request (This is for good reasons). This poses problems when a
> review request is updated before the initial Review Bot reviews are
> complete. Review Bot will begin reviewing the new code, and the reviews
> will combine. This also causes the second worker to fail mid review.
> Additionally, if we make it so each analysis tool is executed in parallel,
> we will have the same issue.
>
> What I propose to solve this problem is to have the Extension actually
> commit completed reviews. A worker should construct a review without
> repeatedly hitting the Web API, and then POST this to the Extension in a
> single request. The Extension will then handle actually committing the
> review to the database.
>
> Can you think of any shortcomings to this solution?
>

> *== Concurrent Tool Execution & Tool Selection ==*

>
> Right now each review request generates a single message to the Broker,
> which means a single worker will process the request, and execute each
> tool. I don't like this method, but it has a pretty large benefit: Each
> tool is able to share the temporary file store of the modified code, and it
> is only downloaded from Review Board once. Executing tools like this also
> means you must stick to a single platform. If all worker pools must contain
> every tool needed, if you have a windows/unix only tool, you are stuck with
> a single platform.
>
> Here are a few problems we should try and solve, that are involved in
> executing each tool concurrently:
>
> - Include the tool to be executed in the message from the Extension to the
> Broker.
> - Ensure only a worker on a server with the tool installed receives the
> message from the queue.
> - What subset of all installed tools should be executed (configuration
> problem)?
> - Ensure only tools which are installed in at least one worker are queued.
> - Allow mixing platforms for the workers (Windows / Linux / ...).
> - Can we use a single download of the patched files?
>
> I have two proposed solutions, but neither solves all of the problems. One
> 'manual' solution, and one that I think is more 'elegant'. There is also
> some aspects common to each solution.
>

> *=== manual ===*

>
> Make all configuration of the tools manual. Allow the admin to specify
> manually configured queues, and which tools should be sent to each queue.
> The workers should then be manually configured to listen on the queue for
> the tools they have. We could devise a way to require only downloading the
> patched files once per queue.
>

> *=== elegant ===*

>
> Each tool to be executed should be a distinct message to the Broker. Queues
> are actually dynamically created in the Broker when either a message is
> generated for the queue, or a worker listens on the queue. These queues
> will correspond to a specific analysis tool. When workers are started, they
> should be set to listen to all queues which correspond to their installed
> tools.
>

> *=== common ===*

>
> Discovering all of the installed tools could be accomplished by adding an
> 'info' task to the Review Bot workers. Instead of executing a review, this
> task could just inform the Review Board instance which tools are installed
> on its server. A message to execute this task could be broadcast to all
> queues, and the Extension could display these results in the admin panel. I
> still think configuring which tools are executed automatically should be
> manually configured, but this could help populate the admin interface with
> helpful information.
>
> Can you think of a better way to handle this stuff? Could either method be
> modified to solve more of the problems I mentioned? Have I overlooked
> something? What method do you like best?
>

> *== Analysis Tool API ==*

>
> The current API I've set up for each analysis tool to perform it's review
> is very basic. Right now the tool can set the top body and bottom body of
> the review manually, and comment using a simple `comment(msg, line)`
> method. What API would you like to see?
>
> Also, each entry point is just a simple function, which isn't given much
> help in executing it's analysis. I'd like to convert the entry points over
> to a base class each tool can inherit from, which could provide some common
> operations, and helpful methods. Imagining you are writing a plug-in for a
> new analysis tool, what would you like to see from this base class?
>

> *== Configuration ==*

>
> Currently all configuration for Review Bot takes place in the Extension.
> This includes settings the tools should follow when commenting/opening
> issues. I really like this design as it makes administration very simple,
> and allows the Broker and workers to be shared between many Review Board
> instances all with distinct settings.
>
> Something I can't seem to solve is how I can allow tool specific
> configuration in this environment. E.g. A tool provides a severity with
> it's warnings, how can a cutoff point be set for when an issue is raised in
> Review Board? Every tool is slightly different, and some tools may require
> some very unique configuration, how can I handle this? Is this even worth
> handling?
>

> *== Everything Else ==*

Steven MacLeod

unread,

Apr 15, 2012, 3:07:05 PM4/15/12

to reviewb...@googlegroups.com

Yeah that's a pretty good description for that solution. There are a couple of details I'd like to clear up though:

I don't want to make the configuration UI for which tools are executed dependent on exactly which tools are returned by the 'info' broadcast. There very well could be cases when a worker cluster is down, or inaccessible, and I'd still like the admin to be able to configure it's tools to run. The way I envisioned it was the info request would provide a list the admin could examine while configuring, and then specify the tools using a more free-form interface such as a csv list or something. It could even perform auto-completion based on the list.

I haven't ironed out all the details of the broadcast to all workers. It's not possible to have a task run on every worker listening to a queue, that's just not how the message queues work. There is a way to define your own 'management commands' in celery, which you can broadcast to run for every worker, which is what I'll probably need to do. The other option is to somehow specify a unique queue for each distinct worker pool, and then send a single task to each of these queues for the info request. I'm not sure how I could automate generating all the unique queues though.

Christian Hammond

unread,

Apr 17, 2012, 1:24:15 AM4/17/12

to reviewb...@googlegroups.com

Hey Steven,

There's a lot to wrap my head around here, and it's clear you've given this a *lot* of thought. More so than I have. I'll see if I can be at all useful though.

On Thu, Apr 12, 2012 at 3:03 PM, Steven MacLeod <ste...@smacleod.ca> wrote:

== Concurrent Review ==

When the Extension sends a message to the Broker, it includes a username and password; the worker uses these credentials when communicating with the Web API. Review Board only allows a single review draft per user, per review request (This is for good reasons). This poses problems when a review request is updated before the initial Review Bot reviews are complete. Review Bot will begin reviewing the new code, and the reviews will combine. This also causes the second worker to fail mid review. Additionally, if we make it so each analysis tool is executed in parallel, we will have the same issue.

What I propose to solve this problem is to have the Extension actually commit completed reviews. A worker should construct a review without repeatedly hitting the Web API, and then POST this to the Extension in a single request. The Extension will then handle actually committing the review to the database.

Can you think of any shortcomings to this solution?

No, I think this is the right solution. A hundred API calls to the traditional API can easily swamp a server, especially when this is happening all the time. There's no reason that the main API has to be the only entrypoint for creating reviews. It's just a decent low-level API.

=== elegant ===

Each tool to be executed should be a distinct message to the Broker. Queues are actually dynamically created in the Broker when either a message is generated for the queue, or a worker listens on the queue. These queues will correspond to a specific analysis tool. When workers are started, they should be set to listen to all queues which correspond to their installed tools.

I like this more. Especially in large cloud deployments, it's nice not to have to manually configure every worker and tool.

I can see the pros of this, but what things specifically does it fail to address?

=== common ===

Discovering all of the installed tools could be accomplished by adding an 'info' task to the Review Bot workers. Instead of executing a review, this task could just inform the Review Board instance which tools are installed on its server. A message to execute this task could be broadcast to all queues, and the Extension could display these results in the admin panel. I still think configuring which tools are executed automatically should be manually configured, but this could help populate the admin interface with helpful information.

Can you think of a better way to handle this stuff? Could either method be modified to solve more of the problems I mentioned? Have I overlooked something? What method do you like best?

== Analysis Tool API ==

The current API I've set up for each analysis tool to perform it's review is very basic. Right now the tool can set the top body and bottom body of the review manually, and comment using a simple `comment(msg, line)` method. What API would you like to see?

I've always regretted that we've used "comment" for diff comments. Seems fine at first until you start supporting other formats. I'd rename to "diff_comment." Who knows, maybe you'll operate on file attachments of certain types later.

A line range would be good, as Mike stated.

Is it worth having a flag for an issue? Maybe some things are issues and some are just suggestions?

Also, each entry point is just a simple function, which isn't given much help in executing it's analysis. I'd like to convert the entry points over to a base class each tool can inherit from, which could provide some common operations, and helpful methods. Imagining you are writing a plug-in for a new analysis tool, what would you like to see from this base class?

So, like "comment" would translate to a "comment" function?

I don't think I have enough of this in my head to suggest much here, beyond the mention about file attachment comments.

You can probably keep this side of it expansive though by not hard-coding methods, and instead mapping a name from the message to something like "handle_<name>" and dynamically invoking it.

== Configuration ==

Currently all configuration for Review Bot takes place in the Extension. This includes settings the tools should follow when commenting/opening issues. I really like this design as it makes administration very simple, and allows the Broker and workers to be shared between many Review Board instances all with distinct settings.

Something I can't seem to solve is how I can allow tool specific configuration in this environment. E.g. A tool provides a severity with it's warnings, how can a cutoff point be set for when an issue is raised in Review Board? Every tool is slightly different, and some tools may require some very unique configuration, how can I handle this? Is this even worth handling?

Does the Review Board side of things know anything about the tools? If so, each tool could provide a Form class we can render as part of the settings page. Kind of like what we do for auth backends (though in that case, we only show one at a time, but they're all in the HTML).

Christian

--
Christian Hammond - chi...@chipx86.com
Review Board - http://www.reviewboard.org
VMware, Inc. - http://www.vmware.com

Steven MacLeod

unread,

Apr 17, 2012, 3:19:06 PM4/17/12

to reviewb...@googlegroups.com

I can see the pros of this, but what things specifically does it fail to address?

The main issue I'm having is figuring out how I can only hit the RB server once for the patched files. I don't know if it's something I should even be worrying about though.

I've always regretted that we've used "comment" for diff comments. Seems fine at first until you start supporting other formats. I'd rename to "diff_comment." Who knows, maybe you'll operate on file attachments of certain types later.

Good point. I've had the idea of Review Bot reviewing parts of the request other than the code in the back of my mind, and this is a distinction I hadn't thought of.

Is it worth having a flag for an issue? Maybe some things are issues and some are just suggestions?

To handle the raising of issues, I've imaged a global configuration for each Review Board instance which looks like this:

Modified Code: [No Comment || Comment || Comment & Issue]
Unmodified Code: [No Comment || Comment || Comment & Issue]
Ship It?: [ No || Yes If no issues on modified code || Yes if no issues]

I think this will be treated as a suggestion though; I would like to give the tool the option to ignore the settings and decide themselves, since this scheme may not make sense for some tools (especially with a wide range of severities). I thought this could be done by providing two interfaces to the tool:

A lower-level commenting interface where they specify if the comment should be an Issue.
A higher-level interface where they just provide the comment/line and it decides the issue state according to the settings, and then calls the lower-level interface.

I know this can't handle every case out there, and per tool configuration is really what is needed, but I think this is a good start.

You can probably keep this side of it expansive though by not hard-coding methods, and instead mapping a name from the message to something like "handle_<name>" and dynamically invoking it.

I'm not sure I fully understand what you are getting at here. Are you proposing a "handle_<name>" method for each piece of the request (file attachment, code file, etc.)? Do you think you could elaborate a little more for me?

Does the Review Board side of things know anything about the tools? If so, each tool could provide a Form class we can render as part of the settings page. Kind of like what we do for auth backends (though in that case, we only show one at a time, but they're all in the HTML).

The Review Board extension knows nothing about the tools at the moment. I've kept the python packages separate so the web server doesn't get forced to install a bunch of analysis tools. Here are a few ideas I have for dealing with the configuration of each tool:

Let the plugin author deal with it. If they want special configuration, they need to allow something on the worker side.
Allow plugin authors to write their own RB extension, which will plugin to the Review Bot extension and provide the configuration. This would give them a lot more control, but would require quite a bit of extra work. Also, if many tools take advantage of this, the RB extension admin panel will become bloated.
Allow plugin authors to write plugin for the RB extension. This would be distinct from writing a RB extension in that they would just provide a special entry point my extension knows about.
Return some sort of configuration schema with the "info" request I've mentioned before. This could be optionally defined in the tool's class, and returned with the other information about the tool.

I definitely like 4 the best for a number of reasons. All information about the tool is kept in a single plugin for Review Bot and everything is dynamic based on what tools are installed in the worker pool. I have no idea what format this schema should take, or how powerful it should be. Any ideas here?

Christian Hammond

unread,

May 31, 2012, 1:00:30 AM5/31/12

to reviewb...@googlegroups.com

--
Christian Hammond - chi...@chipx86.com
Review Board - http://www.reviewboard.org
VMware, Inc. - http://www.vmware.com

On Tue, Apr 17, 2012 at 12:19 PM, Steven MacLeod <ste...@smacleod.ca> wrote:

I can see the pros of this, but what things specifically does it fail to address?

The main issue I'm having is figuring out how I can only hit the RB server once for the patched files. I don't know if it's something I should even be worrying about though.

They should be cached, so it's not terrible. You'll need one per individual worker I'm sure.

I've always regretted that we've used "comment" for diff comments. Seems fine at first until you start supporting other formats. I'd rename to "diff_comment." Who knows, maybe you'll operate on file attachments of certain types later.

Good point. I've had the idea of Review Bot reviewing parts of the request other than the code in the back of my mind, and this is a distinction I hadn't thought of.

Is it worth having a flag for an issue? Maybe some things are issues and some are just suggestions?

To handle the raising of issues, I've imaged a global configuration for each Review Board instance which looks like this:
Modified Code: [No Comment || Comment || Comment & Issue]

Unmodified Code: [No Comment || Comment || Comment & Issue]
Ship It?: [ No || Yes If no issues on modified code || Yes if no issues]
I think this will be treated as a suggestion though; I would like to give the tool the option to ignore the settings and decide themselves, since this scheme may not make sense for some tools (especially with a wide range of severities). I thought this could be done by providing two interfaces to the tool:

A lower-level commenting interface where they specify if the comment should be an Issue.
A higher-level interface where they just provide the comment/line and it decides the issue state according to the settings, and then calls the lower-level interface.

Why not both in one interface? Specify a line range and comment, and optionally whether it's an issue. If not provided, it falls back to the default. Fewer code paths to maintain.

I know this can't handle every case out there, and per tool configuration is really what is needed, but I think this is a good start.

You can probably keep this side of it expansive though by not hard-coding methods, and instead mapping a name from the message to something like "handle_<name>" and dynamically invoking it.

I'm not sure I fully understand what you are getting at here. Are you proposing a "handle_<name>" method for each piece of the request (file attachment, code file, etc.)? Do you think you could elaborate a little more for me?

So, the endpoint on the RB side that will handle the request can be a bit dynamic and do:

    def on_command(self, cmd, parameters):
        func = getattr(self, 'handle_%s' % cmd, None)

        if callable(func):
            func(parameters)

Basically.

I think what I was getting at is that the base tool class would have something like the above for any incoming requests from the tool workers, and then the subclasses can easily just define methods that the workers could invoke by creating handle_* functions. Does that make sense? It's possible I was confused by what you were saying.

Does the Review Board side of things know anything about the tools? If so, each tool could provide a Form class we can render as part of the settings page. Kind of like what we do for auth backends (though in that case, we only show one at a time, but they're all in the HTML).

The Review Board extension knows nothing about the tools at the moment. I've kept the python packages separate so the web server doesn't get forced to install a bunch of analysis tools. Here are a few ideas I have for dealing with the configuration of each tool:

Let the plugin author deal with it. If they want special configuration, they need to allow something on the worker side.
Allow plugin authors to write their own RB extension, which will plugin to the Review Bot extension and provide the configuration. This would give them a lot more control, but would require quite a bit of extra work. Also, if many tools take advantage of this, the RB extension admin panel will become bloated.

Allow plugin authors to write plugin for the RB extension. This would be distinct from writing a RB extension in that they would just provide a special entry point my extension knows about.
Return some sort of configuration schema with the "info" request I've mentioned before. This could be optionally defined in the tool's class, and returned with the other information about the tool.

I definitely like 4 the best for a number of reasons. All information about the tool is kept in a single plugin for Review Bot and everything is dynamic based on what tools are installed in the worker pool. I have no idea what format this schema should take, or how powerful it should be. Any ideas here?

So just to make sure I'm on the same page, the tools would be able to say "Here, these are my configuration options" and then the Review Bot extension's UI could render that somehow?

If so, you could represent it as a list of fields, each with a dictionary describing the name, type, help text, etc.

I think 1-3 aren't ideal, as they're just too much work for developers and users, so something that integrates well into what you have now would be best.

Christian

Daniel Laird

unread,

Sep 13, 2012, 8:54:11 AM9/13/12

to reviewb...@googlegroups.com, chi...@chipx86.com

Hi,

This feature really interested me - I know how busy people are but has this progressed in any way?

Cheers

Dan

Steven MacLeod

unread,

Sep 13, 2012, 11:56:59 AM9/13/12

to reviewb...@googlegroups.com

Hi Daniel,

There has been quite a bit of progress. Most of the features discussed here are completed, but I've hit a roadblock stopping me from getting some sort of beta out.

Currently the review bot extension is passing a username and password out to the workers in plain text, so they can access the API. We're not happy with this approach, and are trying to figure out an alternative. It's looking like I might end up implementing OAuth2 support for Review Board's API.

The Review Bot project currently lives at https://github.com/smacleod/ReviewBot if you'd like to take a look. I'm hoping to get some installation and use documentation up soon, but there's still some work to be done before I'd recommend any sort of production install.

Steve

Nilesh Jaiswal

unread,

Sep 25, 2012, 1:37:04 AM9/25/12

to reviewb...@googlegroups.com

I have been reading this thread and its look very interesting feature if included in the RB. We have similar Static code analysis code implemented and integrated in the rb client which does analysis of the code such as .C, cpp,Cxx, .h, python etc, for this we used third party tools configured to make this work.

Looking forward for this feature.

Daniel Laird

unread,

Oct 1, 2012, 7:56:39 AM10/1/12

to reviewb...@googlegroups.com

Hi Steve,

I have finally got round to playing with this and I am having a few issues - I suspect mainly with my setup but just wanted to see if you might be able to help.

I have a slave machine running rabbitmq (called dmucs)

To this I have added a reviewbot user (and password)

I have then started the reviewbot instance as specified:

reviewbot worker -b amqp://reviewbot:revi...@dmucs.domain.lan:5672//

The Broker URL format I have got from celery docs but this could be my first mistake.

On my test server I have installed the REviewBot extension.

I then try to configure.

Broker URL: amqp://reviewbot:revi...@dmucs.domain.lan:5672//

Review-Server: http://reviewboard-testing.domain.lan
User ID: I have chosen a number that I think = my user ID (based on the ordered list of users).

But is there an easy way to find out user ID? I tried username at it told me that was wrong. (this could be another mistake).

I then Save the extension settings.

I then submitted a python review to the test server and got silence.

What debug can be enabled and what would you like to see enabled?

Hope you can help as I think this will be rather good - and have already got an extension to support use of cpplint waiting which once polished would like to share.

Cheers

Dan Laird

Steven MacLeod

unread,

Oct 1, 2012, 11:05:09 AM10/1/12

to reviewb...@googlegroups.com

Hi Daniel,

All of your configuration looks good. The problems you are having are due to missing documentation. The README file was missing a critical step which I have now added (view at https://github.com/smacleod/ReviewBot/blob/master/README.md).

You will be interested in the final 'Installing and Registering Tasks' section. In a nutshell you need to trigger an 'installed tools refresh' which contacts the workers and discovers installed tools. This can be triggered in the review bot extension's 'DATABASE' admin panel.

As for discovering a user id, you can do this in the Users section of the Review Board admin panel. When on the page to edit a user, the url will show 'http://.../admin/db/auth/user/<user_id>/'. Requiring a numeric user id is not a friendly UI, and I plan to change this soon.

I think it's great that you've written a cpplint tool. I'd like to include a few tools with the base reviewbot installation, and would be happy to work with you to get cpplint included if you're interested.

Let me know if you have any more issues, questions, or comments,

Steve

PS: It should also be noted that the previous password issue has been fixed. Review Bot no longer passes around user passwords.

Daniel Laird

unread,

Oct 2, 2012, 9:16:03 AM10/2/12

to reviewb...@googlegroups.com

Hi Steve,

Thanks for that feedback - however I am still having issues. I suspect that it is a problem with Celery/RabbitMq but I have tried to follow various steps and am still having no luck.

I Have now installed the worker on the same machine (to try and use a broker URL of localhost).

However when I follow the steps now described in the Readme I do not get any updates.

Would it be possible to add some more detail on how you setup RabbitMQ in the readme?

Maybe also adding what sudo rabbitmqctl status produces before and after you have started the worker task (if there is any difference)

Once I can confirm I have got Celery (and/or RabbitMQ) setup then perhaps there is some further debug you can recomend in the extension itself?

I really appreciate the help.

Cheers

Dan

Steven MacLeod

unread,

Oct 2, 2012, 2:27:58 PM10/2/12

to reviewb...@googlegroups.com

Hi Dan,

I'd like you to take a look at the list of review bot tools in the review bot extensions database admin panel (This is where you ran the refresh installed tools). You should have an entry for 'PEP8 Style Checker', and you need to ensure it is set to 'Run automatically'.

If you have an empty list of tools, there was a problem with refreshing the tool list. To ensure the worker is connecting to RabbitMQ correctly run 'sudo rabbitmqctl list_consumers'. If your worker isn't connected, you should get output like this:

Listing consumers ...

...done.

But hopefully you should see something similar to:

Listing consumers ...

pep8.0.1 <rab...@SSMobile03.3.17872.0> 3 true

celery <rab...@SSMobile03.3.17872.0> 2 true

SSMobile03.celery.pidbox <rab...@SSMobile03.3.17876.0> 1 false

...done.

If you are getting the empty consumer list, could you please post the output from 'reviewbot worker -b amqp://<user>:<pass>@<host>:5672// -lINFO'

If you have the worker connected properly, and the PEP8 Tool set to run automatically, but still aren't seeing reviews, run the worker with '-lINFO' and see if you get any extra output when a request is posted. Also, check the Review Board log for any errors.

Let me know how it goes,

Steve

Daniel Laird

unread,

Oct 3, 2012, 3:50:12 AM10/3/12

to reviewb...@googlegroups.com

I have stuck with running the worker on the same machine as the reviewboard instance.

I get the following output when running:

user@cobra:~/ReviewBot/bot$ sudo rabbitmqctl list_consumers

Listing consumers ...

pep8.0.1 <rab...@cobra.2.8883.0> 3 true

pep8.0.1 <rab...@cobra.2.1839.0> 3 true

celery <rab...@cobra.2.8883.0> 2 true

celery <rab...@cobra.2.1839.0> 2 true

cobra.celery.pidbox <rab...@cobra.2.8887.0> 1 false

cobra.celery.pidbox <rab...@cobra.2.1843.0> 1 false

...done.

reviewbot is started by:

reviewbot worker -b amqp://guest:guest@localhost:5672// -lINFO

And I have configure the broker URL in the extension to be the same

amqp://guest:guest@localhost:5672//

In the reviewboard log I am seeing the following when I click 'Refresh Installed tools'

2012-10-03 07:48:29,759 - DEBUG - Start from server, version: 8.0, properties: {u'information': u'Licensed under the MPL. See http://www.rabbitmq.com/', u'product': u'RabbitMQ', u'copyright': u'Copyright (C) 2007-2012 VMware, Inc.', u'capabilities': {}, u'platform': u'Erlang/OTP', u'version': u'2.8.7'}, mechanisms: [u'PLAIN', u'AMQPLAIN'], locales: [u'en_US']

2012-10-03 07:48:29,759 - DEBUG - Open OK! known_hosts []

2012-10-03 07:48:29,759 - DEBUG - using channel_id: 1

2012-10-03 07:48:29,760 - DEBUG - Channel open

But thats it - and I still see no registered tools.

I am starting to suspect a RabbitMQ installation/setup issue but maybe the above makes that more obvious to you.

Thanks for the great support

Dan

Steven MacLeod

unread,

Oct 3, 2012, 11:50:49 AM10/3/12

to reviewb...@googlegroups.com

Hey Dan,

I think your RabbitMQ configuration looks fine, and it appears that your worker is connecting to the queue properly (Although 'list_consumers' makes me think you're running two workers, is this the case?). I suspect it is a problem with either your entry points for the tool classes, or a problem using the rbtools API. I've added some extra logging to the worker to try and diagnose the problem.

Please pull the latest changes from the ReviewBot repository:

cd ReviewBot

git checkout master

git pull

This should give us some extra information about what's happening when the tool refresh is requested. Please run the worker with '-lINFO' and provide the output.

I probably should have mentioned this at an earlier stage, but there is a line in the README that states 'The Review Bot worker requires the api development branch of rbtools'. Can you also ensure you are running this api development branch of rbtools on your worker server. This can be installed using:

git clone git://github.com/reviewboard/rbtools.git

cd rbtools

git checkout -t origin/api

python setup.py install

- Steve

Daniel Laird

unread,

Oct 4, 2012, 4:25:09 AM10/4/12

to reviewb...@googlegroups.com

That extra logging prints out

[2012-10-04 09:07:46,071: ERROR/MainProcess] Could not reach RB server: HTTP 404

We have an annoyance in my setup (which I had forgotten) - which is that with post-review we have to specify --disable-proxy.

This is because our proxy server does not know about how test machine so tries to forward to the outside world.

If I unset http_proxy from the environment then all the detection works great, however do you know how I could add an option to reviewbot to replicate what --disable-proxy does but using the new RBTools api?

This would mean I would not need to unset the proxies before running the reviewbot

Thanks for all the help - I shall now test the posting of a review

Dan

Daniel Laird

unread,

Oct 4, 2012, 6:36:48 AM10/4/12

to reviewb...@googlegroups.com

Result :-)

The posting of a .py file triggers the review - and it posts its responses.

I have noticed the following:

the 'bot' gave a 'Ship it' even when issues were raised (I ticked the Ship It with no issues'

Also I then updated the review to change the reviewer - and reviewbot gave another 'Ship it' but did not rerun the output.

I would expect it to not rerun as the diff was not updated.

Other than that really pleased to have made progress and am now going to try a few other things

Dan

Steven MacLeod

unread,

Oct 9, 2012, 11:31:09 AM10/9/12

to reviewb...@googlegroups.com

Glad to here it's working now, sorry my response is so delayed, things have been hectic lately. I will be adding support for disabling the use of a proxy soon.

I have noticed the following:
the 'bot' gave a 'Ship it' even when issues were raised (I ticked the Ship It with no issues'

Also I then updated the review to change the reviewer - and reviewbot gave another 'Ship it' but did not rerun the output.
I would expect it to not rerun as the diff was not updated.

Those are some strange results I would not expect. I'll take a look into this.

Steve

Daniel Laird

unread,

Oct 17, 2012, 8:58:27 AM10/17/12

to reviewb...@googlegroups.com

Hi Steve,

I have just got back to looking at this and I am suffering from a bit of newbie'ism.

I am trying to get the reviewbot plugin to start as a daemon on startup on my slave machines but the documentation for celery has lots of examples none of which help me very much.

Are you developing by any chance on Linux and if so do you have a /etc/init.d script that works and that I could use?

Many thanks for your help so far

Dan

Steven MacLeod

unread,

Oct 17, 2012, 10:38:34 PM10/17/12

to reviewb...@googlegroups.com

Hey Dan,

I am developing on linux, but haven't put any effort into daemonizing 'reviewbot worker' yet. Until I write the scripts, you can follow the celery docs for daemonizing 'celeryd' http://docs.celeryproject.org/en/latest/tutorials/daemonizing.html#daemonizing

All that the 'reviewbot' command does is execute 'celery' with review bot's configuration. You can run the celery commands, such as 'celeryd' using this configuration by passing the '--app=reviewbot.celery' command line option. So, if you follow the celery guide, and make sure celeryd is provided with the --app option, it should work.

In the future the 'reviewbot' command might do more, but if I decide to break the ability to just use --app, I'll have custom scripts for daemonizing.

Hopefully this helps,

Steve

Daniel Laird

unread,

Nov 28, 2012, 8:45:49 AM11/28/12

to reviewb...@googlegroups.com

Hi Steven,

I have just posted a 'Pull' request for you.

It has a few changes and 2 new checkers (cpplint.py and cppcheck).

Its the first time I have used GitHub - but hopefully its done correctly.

Let me know what you think - and whether you can merge it.

We have been running it and are pleased with the results.

Cheers

Dan

Steven MacLeod

unread,

Nov 28, 2012, 2:52:10 PM11/28/12

to reviewb...@googlegroups.com

Hey Dan,

Thanks for the contribution! Do you think you could use post-review, and put the change on http://reviews.reviewboard.org for review? The repository has a .reviewboardrc file, so post-review should know what to do. You'll need to create an account though, if you don't already have one.

Thanks again,

Steve

daniel....@googlemail.com

unread,

Nov 28, 2012, 3:22:05 PM11/28/12

to reviewb...@googlegroups.com

Yep, sounds like a plan. Will sort that put tomorrow

Dan

Sent from my HTC

Reply all

Reply to author

Forward