Making DXR a rocking OpenSource project

Lionel Dricot

Feb 10, 2012, 11:14:44 AM

to dev-stati...@lists.mozilla.org

Hi,

After talking with Taras at FOSDEM, I've put together some thoughts on how
to make DXR rock and appeal to the community.

Currently, I see some weaknesses that may hurt wide adoption of DXR.
IMHO, these should be fixed before we speak more about DXR.

1) Indexing consumes too much memory, making it impossible to use on
anything less than a high-end server.
2) DXR is too tightly tied to mozilla-central.
3) Indexing is currently too slow (is it really a problem?)
4) DXR is hard to install and to use.
5) I still don't have a clear picture of how to use one DXR instance with multiple repositories.

Regarding 2), we have identified the HG link as one such dependency and
are currently trying to move it into a config file.

1) and 3) look rather critical because, for now, you need a high-end
machine and a lot of time just to test it.

For 4), Carlos thinks that DXR should use autotools (or, I would say,
python-distutils). That would allow easy installation and, even better,
distribution packages. We should also provide some clear shell commands:
index, deploy. Even a small PyGTK GUI could be done quickly. Anyway, there
is a lot of opportunity there.
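
To make the "clear shell commands" idea concrete, here is a minimal sketch of what an "index" / "deploy" front end could look like. It is only an illustration: the program name, the arguments, the default destination and the returned messages are all assumptions, and nothing here reflects how DXR is actually invoked today.

```python
import argparse


def build_parser():
    """Build a parser with two hypothetical subcommands: index and deploy."""
    parser = argparse.ArgumentParser(prog="dxr")
    sub = parser.add_subparsers(dest="command", required=True)

    index = sub.add_parser("index", help="build the index for one tree")
    index.add_argument("tree", help="name of the source tree to index")

    deploy = sub.add_parser("deploy", help="publish an already built index")
    deploy.add_argument("tree", help="name of the source tree to publish")
    # The default destination is a placeholder, not a real DXR convention.
    deploy.add_argument("--dest", default="/var/www/dxr")
    return parser


def main(argv=None):
    """Parse the command line and describe what would be done."""
    args = build_parser().parse_args(argv)
    if args.command == "index":
        return "would index {}".format(args.tree)
    return "would deploy {} to {}".format(args.tree, args.dest)
```

With something like this, "dxr index mozilla-central" followed by "dxr deploy mozilla-central" would be the whole user-facing workflow, and a packaged install would only need to put one script on the PATH.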

So, what are, in your opinion, the priorities of DXR as an OpenSource
project? Should we try to make a roadmap?

Lionel

Joshua Cranmer

Feb 13, 2012, 3:21:28 PM

On 2/10/2012 10:14 AM, Lionel Dricot wrote:
> 1) Indexing consumes too much memory, making it impossible to use on
> anything less than a high-end server.

If a project is going to be large enough that indexing memory is a
serious issue, then I suspect that people already have beefy machines
for building it. I'm not saying it's not a useful goal; I'm just saying
that it probably won't drive off developers.

> 2) DXR is too tightly tied to mozilla-central.

As a rule of thumb, if something is specific to Mozilla, it should live
in a customizable "plugin" of some kind.
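
One way to picture this rule of thumb is a small registry in which the generic code only knows a neutral base class and anything Mozilla-specific is a registered subclass. The sketch below is purely illustrative: the class names, the "file_url" hook and the registry are assumptions, not an existing DXR interface.

```python
PLUGINS = {}


def register(cls):
    """Class decorator that records a plugin under its declared name."""
    PLUGINS[cls.name] = cls
    return cls


class TreePlugin:
    """Project-neutral defaults used when no specific plugin exists."""
    name = "generic"

    def file_url(self, path, rev="tip"):
        # The generic case knows nothing about a web-hosted repository.
        return None


@register
class MozillaCentralPlugin(TreePlugin):
    """Hypothetical home for everything specific to mozilla-central."""
    name = "mozilla-central"
    base = "http://hg.mozilla.org/mozilla-central"

    def file_url(self, path, rev="tip"):
        # Link to the file as served by the repository's web interface.
        return "{}/file/{}/{}".format(self.base, rev, path)


def plugin_for(tree_name):
    """Return the plugin registered for a tree, or the generic fallback."""
    return PLUGINS.get(tree_name, TreePlugin)()
```

The core code would then call plugin_for(tree).file_url(...) and never mention Mozilla by name.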

> 3) Indexing is currently too slow (is it really a problem?)
> 4) DXR is hard to install and to use.
> 5) I still don't have a clear picture of how to use one DXR instance with multiple repositories.

This, IMHO, is the largest issue I would like to see fixed. At the very
least, the ability to have multiple trees in a single $DXR_ROOT install
and have it work nicely is necessary; even better would be the ability
to share some index data (cross-references) between different trees.
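
As a sketch of what "multiple trees in a single $DXR_ROOT" might mean in practice, the example below reads one config file with a section per tree and gives each tree its own index directory under the shared root. The section naming ("tree:NAME"), the keys, the "trees/" subdirectory and the URL scheme are all assumptions made for illustration.

```python
import configparser
import os

# Hypothetical config: one [tree:NAME] section per source tree.
SAMPLE = """
[tree:mozilla-central]
source_dir = /src/mozilla-central
hg_url = http://hg.mozilla.org/mozilla-central

[tree:comm-central]
source_dir = /src/comm-central
hg_url = http://hg.mozilla.org/comm-central
"""


def load_trees(text, dxr_root):
    """Return {tree name: options}, adding a per-tree index directory."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    trees = {}
    for section in parser.sections():
        if not section.startswith("tree:"):
            continue
        name = section[len("tree:"):]
        options = dict(parser[section])
        # Every tree keeps its index in its own directory under the root.
        options["index_dir"] = os.path.join(dxr_root, "trees", name)
        trees[name] = options
    return trees


def split_request(trees, url_path):
    """Map '/comm-central/a/b.cpp' to (options of comm-central, 'a/b.cpp')."""
    name, _, rest = url_path.lstrip("/").partition("/")
    if name not in trees:
        raise KeyError("unknown tree: " + name)
    return trees[name], rest
```

Sharing cross-references between trees would need more than this (for example a common symbol table), which the sketch does not attempt.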

> For 4), Carlos thinks that DXR should use autotools (or, I would say,
> python-distutils). That would allow easy installation and, even better,
> distribution packages. We should also provide some clear shell commands:
> index, deploy. Even a small PyGTK GUI could be done quickly. Anyway, there
> is a lot of opportunity there.
>

The main problem that keeps an indexer from being easily packageable is
the fact that you have to do custom steps to compile the target source
code anyway; this is what would make producing, say, a Debian dxr
package difficult. Another option could be to do something like what
Bugzilla does: provide a web-configurable administration page (which can
be locked down, of course) so that nobody ever has to touch the
command line.

> So, what are, in your opinion, the priorities of DXR as an OpenSource
> project? Should we try to make a roadmap?

Another issue I recall from when I discussed this at the LLVM
developers' meeting was the "#ifdef issue": how do you handle the fact
that, after preprocessing, not all the original source code gets "seen"
by the compiler? The mental model I've had for a long time is to somehow
be able to merge the records of two different compile runs into a single
indexing output.
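
A minimal sketch of that mental model, assuming each compile run produces a flat list of records and is labelled with the configuration it was built in. The record fields and the "DEBUG"/"NDEBUG" labels are hypothetical, and real index data would certainly carry more than this.

```python
from collections import defaultdict


def merge_runs(runs):
    """Merge per-run record lists into one index.

    runs maps a configuration label to the records seen in that run.
    The result maps each distinct record to the sorted list of
    configurations in which the compiler actually saw it.
    """
    merged = defaultdict(set)
    for mode, records in runs.items():
        for rec in records:
            key = (rec["kind"], rec["name"], rec["file"], rec["line"])
            merged[key].add(mode)
    return {key: sorted(modes) for key, modes in merged.items()}


# Hypothetical data: log_state() sits inside an #ifdef DEBUG block,
# so only the DEBUG run ever sees it.
RUNS = {
    "DEBUG": [
        {"kind": "function", "name": "log_state", "file": "util.c", "line": 10},
        {"kind": "function", "name": "main", "file": "main.c", "line": 3},
    ],
    "NDEBUG": [
        {"kind": "function", "name": "main", "file": "main.c", "line": 3},
    ],
}
```

Records seen in every run collapse into one entry, while code hidden behind an #ifdef stays in the index tagged with the run that exposed it.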

Kristian Rietveld

Feb 13, 2012, 3:43:07 PM

to dev-stati...@lists.mozilla.org

I think that is indeed a very interesting and also important issue. I
have been thinking about code bases with different "backends". Some
source files only get compiled on Linux, some files only on OS X (in
particular Objective-C code to glue with Cocoa). Still, you would really
want to see all code recognized in the DXR interface.

Some methodology to merge the records produced by different
compilers/machines compiling the same code revision does indeed seem
necessary. Though I am a bit worried about which problems we would have
to solve when parsing records for the same source file produced by
different compilers. For example, what would happen with macros that
expand differently depending on the machine/platform they were
preprocessed on? (Though in that case it would be quite cool to actually
show both macro expansions, each tagged with the platform it was
produced on.)

regards,

-kris.

Joshua Cranmer

Feb 13, 2012, 4:56:05 PM

Well, given that there are already examples where people define the same
macro with different definitions in a single compilation run (e.g.,
let's-fake-templates-in-C), the use case of macros would need to be
handled in the general case anyway.

But I've kind of imagined something visually like:

foo
+ Defined in widget/unix/foo.c (in Linux mode)
+ Defined in widget/mac/foo.c (in OS X mode)
+ Defined in widget/windows/foo.c (in Windows mode)
[etc]
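
For what it's worth, producing that kind of listing is cheap once every record carries the mode it was seen in. A rough sketch, assuming records are simple (name, file, mode) tuples, which is an invented format:

```python
def group_definitions(records):
    """Group (name, path, mode) records by symbol name, keeping order."""
    groups = {}
    for name, path, mode in records:
        groups.setdefault(name, []).append((path, mode))
    return groups


def render(name, definitions):
    """Render one symbol with one line per definition site."""
    lines = [name]
    for path, mode in definitions:
        lines.append("+ Defined in {} (in {} mode)".format(path, mode))
    return "\n".join(lines)


# Hypothetical records matching the example above.
RECORDS = [
    ("foo", "widget/unix/foo.c", "Linux"),
    ("foo", "widget/mac/foo.c", "OS X"),
    ("foo", "widget/windows/foo.c", "Windows"),
]
```

The same grouping would apply to a macro defined several times in a single run; only the tag would then be a location rather than a platform.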

Martyn Russell

Feb 14, 2012, 5:50:00 AM

to Joshua Cranmer, dev-stati...@lists.mozilla.org

On 13/02/12 20:21, Joshua Cranmer wrote:
> On 2/10/2012 10:14 AM, Lionel Dricot wrote:
>> 1) Indexing consumes too much memory, making it impossible to use on
>> anything less than a high-end server.
>
> If a project is going to be large enough that indexing memory is a
> serious issue, then I suspect that people already have beefy machines
> for building it. I'm not saying it's not a useful goal; I'm just saying
> that it probably won't drive off developers.

Well, with the Tracker project, which Carlos and I currently maintain,
we've received a LOT of bug reports and mailing list comments (over
time) about how awful the resource use was on users' local machines in
terms of CPU and memory.

Clearly the two projects have different focuses, but resource use has
put people off running Tracker on their local machines before.

I would like to avoid that where possible.

--
Regards,
Martyn

Founder and CEO of Lanedo GmbH.

Jean-Marc Desperrier

Feb 23, 2012, 7:29:06 AM

Joshua Cranmer wrote:

>> 5) I still don't have a clear picture of how to use one DXR instance
>> with multiple repositories
>
> This, IMHO, is the largest issue I would like to see fixed. At the very
> least, the ability to have multiple trees in a single $DXR_ROOT install
> and have it work nicely is necessary; even better would be the ability
> to share some index data (cross-references) between different trees.

However, I don't think it's the major issue for people who just want to
test. They will first test on one project.
Also, performance would be nice to have (I have an MXR instance running,
and the fact that indexing takes 4 hours on a quite beefy machine is a
bit annoying), but it is not absolutely required.

I think the number 1 priority is making it easy to install and use.

Nothing will pay off as much as effort spent on that point (except effort
spent making sure users don't hit bugs within the first few minutes of
use, since those make the product look unusable). I've had some
first-hand experience of how the complexity of installing Bugzilla has
hindered its adoption.

The first step to get there is prebuilt binaries for various platforms
(clang earns a big + for this: http://llvm.org/releases/download.html),
with installers for Windows and a .dmg for Mac (Wireshark is also a
reference for the up-and-running-within-minutes aspect).

Of course the hard part for DXR is that you need to integrate it with
the compilation process, and that will never be very easy.
However, good step-by-step tutorials for the most common cases,
frequently verified to make sure they haven't been broken, would still
be useful.
There could be a tutorial for when your project uses autotools, one for
a makefile with the standard compilation macros, one for Maven C/C++,
and one that cheats by replacing the gcc in the path with a script that
instead runs clang/dxr with the proper options, leaving the actual build
script untouched; that build script can then be very non-standard, and
we won't even try to understand it.
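
To give an idea of how small the "cheat" can be, here is a sketch of such a wrapper. It is meant to be installed early in the PATH under the name gcc (or g++, cc, c++) and simply re-executes clang with plugin options prepended. "-Xclang -load" and "-Xclang -add-plugin" are clang's own options for loading and running a plugin; the plugin path, the plugin name and the DXR_PLUGIN variable are placeholders, since the exact options DXR needs are not given in this thread.

```python
import os
import sys

# Which real compiler to run for each name the wrapper may be installed as.
REAL_COMPILERS = {"gcc": "clang", "cc": "clang",
                  "g++": "clang++", "c++": "clang++"}

# Placeholder plugin location and name; not DXR's actual values.
PLUGIN_PATH = os.environ.get("DXR_PLUGIN", "/opt/dxr/libdxr-index.so")
PLUGIN_NAME = "dxr-index"


def build_command(invoked_as, args):
    """Turn the arguments the build gave to 'gcc' into a clang command line."""
    compiler = REAL_COMPILERS[invoked_as]
    plugin_flags = ["-Xclang", "-load", "-Xclang", PLUGIN_PATH,
                    "-Xclang", "-add-plugin", "-Xclang", PLUGIN_NAME]
    return [compiler] + plugin_flags + list(args)


# Only take over when actually invoked under one of the wrapped names.
invoked_as = os.path.basename(sys.argv[0])
if __name__ == "__main__" and invoked_as in REAL_COMPILERS:
    command = build_command(invoked_as, sys.argv[1:])
    os.execvp(command[0], command)
```

The build system keeps calling "gcc" exactly as before, which is the whole point: nothing in the project's own scripts has to change. Builds that rely on gcc-only options would still need attention.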

I think Java support would also be a big plus, even if there's no
way to make it happen in the short term.

Negreanu Marius

Feb 23, 2012, 8:26:06 AM

to Jean-Marc Desperrier, dev-stati...@lists.mozilla.org

> The first step to get there is prebuilt binaries for various platforms (clang
> earns a big + for this: http://llvm.org/releases/download.html), with
> installers for Windows and a .dmg for Mac (Wireshark is also a reference for
> the up-and-running-within-minutes aspect).

For this reason I'm currently working on a doxygen backend for DXR.
Doxygen can dump XML for inheritance, references, macros, #includes
and much other information, without requiring a big dependency such as
llvm/clang. Also, I think if something is needed in the backend, you
have a bigger/faster chance to get it into doxygen than into llvm/clang.

As a plus, you get support for all languages that doxygen can parse, which
includes Java, PHP, C++, Fortran.
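
As a small taste of what consuming that XML could look like, the sketch below pulls inheritance edges out of a class description. The sample is hand-written and heavily trimmed, modelled on what doxygen emits with GENERATE_XML = YES, and the function is only an illustration, not the backend itself.

```python
import xml.etree.ElementTree as ET

# Hand-written, trimmed sample in the style of doxygen's XML output.
SAMPLE_XML = """<doxygen>
  <compounddef id="classDerived" kind="class">
    <compoundname>Derived</compoundname>
    <basecompoundref refid="classBase" prot="public" virt="non-virtual">Base</basecompoundref>
    <location file="src/derived.h" line="12"/>
  </compounddef>
</doxygen>"""


def inheritance_edges(xml_text):
    """Return (derived, base) pairs for every class or struct in the XML."""
    root = ET.fromstring(xml_text)
    edges = []
    for compound in root.iter("compounddef"):
        if compound.get("kind") not in ("class", "struct"):
            continue
        derived = compound.findtext("compoundname")
        for base in compound.findall("basecompoundref"):
            edges.append((derived, base.text))
    return edges
```

The same walk works regardless of the source language, as long as doxygen can parse it.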

--
Regards!
http://groleo.wordpress.com

Taras Glek

Feb 23, 2012, 1:22:43 PM

to dev-stati...@lists.mozilla.org

On 2/23/2012 5:26 AM, Negreanu Marius wrote:
>> The first step to get there is prebuilt binaries for various platforms (clang
>> earns a big + for this: http://llvm.org/releases/download.html), with
>> installers for Windows and a .dmg for Mac (Wireshark is also a reference for
>> the up-and-running-within-minutes aspect).

> For this reason I'm currently working on a doxygen backend for DXR.
> Doxygen can dump XML for inheritance, references, macros, #includes
> and much other information, without requiring a big dependency such as
> llvm/clang. Also, I think if something is needed in the backend, you
> have a bigger/faster chance to get it into doxygen than into llvm/clang.
>
> As a plus, you get support for all languages that doxygen can parse, which
> includes Java, PHP, C++, Fortran.

Having doxygen to fallback on for files we don't have native support for
would be awesome. Looking forward to this.

Taras

Ehsan Akhgari

Feb 23, 2012, 2:56:32 PM

to Taras Glek, dev-stati...@lists.mozilla.org

On Thu, Feb 23, 2012 at 1:22 PM, Taras Glek <tg...@mozilla.com> wrote:

> On 2/23/2012 5:26 AM, Negreanu Marius wrote:
>

>>> The first step to get there is prebuilt binaries for various platforms
>>> (clang earns a big + for this: http://llvm.org/releases/download.html),
>>> with installers for Windows and a .dmg for Mac (Wireshark is also a
>>> reference for the up-and-running-within-minutes aspect).
>>>

>> For this reason I'm currently working on a doxygen backend for DXR.
>> Doxygen can dump XML for inheritance, references, macros, #includes
>> and much other information, without requiring a big dependency such as
>> llvm/clang. Also, I think if something is needed in the backend, you
>> have a bigger/faster chance to get it into doxygen than into llvm/clang.
>>
>> As a plus, you get support for all languages that doxygen can parse, which
>> includes Java, PHP, C++, Fortran.
>>
> Having doxygen to fallback on for files we don't have native support for
> would be awesome. Looking forward to this.
>

Yeah, +1! Please let us know about your progress!

Thanks!
--
Ehsan
<http://ehsanakhgari.org/>