
Statistics on SW development and language use?


Michael Schuerig

Oct 4, 1998

I'm looking for current statistics on what software developers are doing
and what tools they're using.

In particular

- How many developers are working in various areas, shrink-wrap, custom
business apps, embedded systems, etc.?

- How large are these markets in terms of money?

- What programming languages is existing code written in? (#developers,
LOCs)

- What programming languages is new code written in? (#developers, LOCs)

- What methods are used?

- What tools are used in addition to editors, compilers and linkers?

- Education of people in the field.


I don't expect there to be one all-encompassing study, but maybe there are
some more focused ones. Thanks for any pointers.


Michael

--
Michael Schuerig
mailto:schu...@acm.org
http://www.schuerig.de/michael/

Scott P. Duncan

Oct 4, 1998
Michael Schuerig wrote:

> I'm looking for current statistics on what software developers are doing
> and what tools they're using.

> I don't expect there to be one all-encompassing study, but maybe there are
> some more focused ones.

You might try looking at Ed Yourdon's and Howard Rubin's sites as they are
often involved in large-scale (even international) studies which touch
on several of the questions you raise (and many others).

Yourdon's site is http://www.yourdon.com/index.htm and Rubin's is
http://www.hrubin.com/; Rubin, in particular, has worldwide benchmarking
data which Yourdon has participated in collecting. You may also find
related links of interest on the links page (swlinks.html) on my site.

--
Scott P. Duncan
SoftQual Consulting http://www.mindspring.com/~softqual/

Ehud Lamm

Oct 4, 1998
I don't have answers, but I do have some more questions I think are worth
pursuing.

- How many lines of code (LOC) are produced on average by a programmer
over some time period (say, a month)?

- How large is a software product in terms of modules/classes/LOC, etc.?

Etc.

Ehud Lamm msl...@pluto.mscc.huji.ac.il


Ralph Cook

Oct 5, 1998
Ehud Lamm wrote:

> - How many lines of code (loc) are produced on average by a programmer
> over some time period (say month).

Be sure to include how many soft drinks they consume per month per
line of code produced, ranked by sugar and caffeine content.

rc
--
When I do speak for my company, I list my title and use "we".

Michael Schuerig

Oct 5, 1998
Ralph Cook <rc...@pobox.com> wrote:

> Ehud Lamm wrote:
>
> > - How many lines of code (loc) are produced on average by a programmer
> > over some time period (say month).
>
> Be sure to include how many soft drinks they consume per month per
> line of code produced, ranked by sugar and caffeine content.

Yes, of course. At least my initial question was not aimed at programmer
productivity. Rather, it was/is meant to get an idea what others are
doing, from a pretty abstract point of view.

Rommert J. Casimir

Oct 6, 1998
Ehud Lamm wrote:
>
> - How many lines of code (loc) are produced on average by a programmer
> over some time period (say month).

A classic rule of thumb is that one line of code takes one hour. This
includes everything from specification to testing and documentation,
counting only the necessary lines in the final product.
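The rule of thumb above lends itself to a quick back-of-the-envelope calculation. A minimal sketch (the helper name and the ~2,000-hour person-year figure are my own assumptions, not part of the rule):

```python
# Effort estimate implied by the one-line-per-hour rule of thumb.
# HOURS_PER_PERSON_YEAR is an illustrative assumption for full-time work.

HOURS_PER_PERSON_YEAR = 2_000

def estimate_effort(delivered_loc, hours_per_loc=1.0):
    """Total effort (spec through test and docs) implied by the rule,
    returned as (hours, person-years)."""
    hours = delivered_loc * hours_per_loc
    return hours, hours / HOURS_PER_PERSON_YEAR

hours, person_years = estimate_effort(10_000)
print(hours, person_years)  # → 10000.0 5.0
```

So a 10,000-line delivered product would imply roughly five person-years, which is why the rule counts only the lines in the final product, not every line ever typed.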
--
Rommert J. Casimir
Tilburg University, Room B435, tel 31 13 4662016
P.O. Box 90153, 5000LE Tilburg, The Netherlands
http://cwis.kub.nl/~few/few/BIKA/rc_home.htm

Ehud Lamm

Oct 6, 1998
> Yes, of course. At least my initial question was not aimed at programmer
> productivity. Rather, it was/is meant to get an idea what others are
> doing, from a pretty abstract point of view.
>
LOC can measure more than productivity: it can reflect software complexity,
language facilities, etc.

Ehud

Scott P. Duncan

Oct 6, 1998
> LOC can measure more than productivity. They measure software complexity,
> language facilities etc.

I think we should not forget that LOC is just a size measure. In itself,
LOC does not "measure" anything. It is a denominator in some metrics and a
numerator in others.

What makes LOC an undesirable measure for many is that it is indirect and
subject to individual variation (i.e., a person's coding style or even the
prescribed one at an organization is not likely to match the next
person/organization).

That said, you can, of course, use LOC, FPs, defects, hours, headcount,
etc. to represent a lot of things.

Biju Thomas

Oct 6, 1998
Rommert J. Casimir wrote:
>
> A classic rule of thumb is that one line of code takes one hour. This
> includes everything from specification to testing and documentation, and
> only the necessary lines in the final product are included.

Are there any studies/statistics that give this figure?

Regards,
Biju Thomas

Wayne Woodruff

Oct 7, 1998

"Rapid Development" by McConnell (ISBN 1-55615-900-5) has quite a
bit on estimation of size and effort. Besides that, there is a
wealth of other good info in the book.

/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\
\ Wayne Woodruff /
/ home page: http://www.jtan.com/~wayne \
\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/


Ehud Lamm

Oct 7, 1998
On Tue, 6 Oct 1998, Scott P. Duncan wrote:

> > LOC can measure more than productivity. They measure software complexity,
> > language facilities etc.
>
> I think we should not forget that LOC is just a size measure. In itself,
> LOC does not "measure" anything. It is a denominator in some metrics and a
> numerator in others.

Agreed.

>
> What makes LOC an undesirable measure for many is that it is indirect and
> subject to individual variation (i.e., a person's coding style or even the
> prescribed one at an organization is not likely to match the next
> person/organization).
>

This is exactly my point. You can use average LOC in a procedure by each
programmer to see variation in coding style. If you stick to one
programmer, fluent in more than one language, you can measure language
differences etc.

Sure it is not perfect. You still have to THINK about what the data means
etc. But it does offer a useful quantitative measure.
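The measurement described above (average LOC per procedure) is simple to compute. A minimal sketch, using Python's own parser purely for illustration; the function name and sample are mine:

```python
# Compute the average line count of function bodies in a source file,
# the per-procedure LOC figure discussed above. Python-specific parsing
# via the stdlib `ast` module; other languages would need their own parser.
import ast

def avg_loc_per_function(source: str) -> float:
    """Return the mean line span of function definitions, 0.0 if none."""
    tree = ast.parse(source)
    lengths = [
        node.end_lineno - node.lineno + 1
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]
    return sum(lengths) / len(lengths) if lengths else 0.0

sample = """
def f(x):
    return x + 1

def g(x):
    y = x * 2
    return y
"""
print(avg_loc_per_function(sample))  # → 2.5
```

Run over each programmer's files (or the same programmer's code in two languages), the resulting averages give the kind of style or language comparison Ehud suggests.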

Ehud Lamm msl...@pluto.mscc.huji.ac.il

Scott P. Duncan

Oct 7, 1998
Ehud Lamm wrote:

> On Tue, 6 Oct 1998, Scott P. Duncan wrote:
>
> > What makes LOC an undesirable measure for many is that it is indirect and
> > subject to individual variation (i.e., a person's coding style or even the
> > prescribed one at an organization is not likely to match the next
> > person/organization).
>
> This is exactly my point. You can use average LOC in a procedure by each
> programmer to see variation in coding style. If you stick to one
> programmer, fluent in more than one language, you can measure language
> differences etc.

The real issue is whether a measurement at the individual level buys you
anything significant as opposed to investing the effort in some
other measures (even LOC based).

> Sure it is not perfect. You still have to THINK about what the data means
> etc. But it does offer some useful quantative measure.

I agree with this. Again, that effort and thought might be better
expended on something more global.

Most improvement models assume one uses measurement to instigate change
when the data suggest less-than-optimal results from a task/process.
What will be the result of measuring language differences?
What change might they be expected to imply?

There have been some empirical studies done at the level you are talking
about. You might want to examine proceedings from the Empirical Studies
of Programmers Workshops. Unfortunately, not a lot of data has come
from in-the-trenches industry. That low a level of focus has not been too
attractive to most organizations even willing to sponsor productivity and
quality studies. Soloway has done a lot of work with student programmers
on language comprehension, etc.

Ehud Lamm

Oct 14, 1998
On Wed, 7 Oct 1998, Scott P. Duncan wrote:

> Ehud Lamm wrote:
>
> > On Tue, 6 Oct 1998, Scott P. Duncan wrote:
> >
> > This is exactly my point. You can use average LOC in a procedure by each
> > programmer to see variation in coding style. If you stick to one
> > programmer, fluent in more than one language, you can measure language
> > differences etc.
>

> Most improvement models assume one uses measurement to instigate change
> when the data suggest less-than-optimal results from a task/process.
> What will be the result of measuring language differences?
> What change might they be expected to imply?

I can think of two things. 1) Maybe it is smarter to use a different
language. If languages have similar expressive power (say, C/Pascal
or Ada/Modula-2) but very different productivity, you should choose
the better one. 2) Perhaps you can find ways to improve computer
languages and programming environments.

>
> There have been some empirical studies done at the level you are talking
> about. You might want to examine proceedings from the Empirical Studies
> of Programmers Workshops. Unfortunately, not a lot of data has come
> from in-the-trenches industry. That low a level of focus has not been too
> attractive to most organizations even willing to sponsor productivity and
> quality studies. Soloway has done a lot of work with student programmers
> on language comprehension, etc.

Any references? Web pages?

Thanks.

Ehud Lamm msl...@pluto.mscc.huji.ac.il

Scott P. Duncan

Oct 14, 1998
Ehud Lamm wrote:

> I can think of two things. 1) Maybe it is smarter to use a different
> language. If languages have similar expressive power (say, C/Pascal
> or Ada/Modula-2) but very different productivity, you should choose
> the better one. 2) Perhaps you can find ways to improve computer
> languages and programming environments.

Which is why I suggested the Empirical Studies of Programmers work, as
what you are suggesting is not traditional software metrics and
measurement but research into programming languages. The research
work I've mentioned addresses what has been and is being done in this
field. Is it R&D you want to do?

> Any references? Web pages?

Try a web search on "Empirical Studies of Programmers." Also look into
ACM's SIGCHI and other Human-Computer Interaction (HCI)
topics. You can also try http://cse.unl.edu/~susan/esp/ which has
some links from it elsewhere as well. And the Proceedings were
published, at least in the past, by Ablex Publishing.

Insiguru

Oct 14, 1998
From: Ehud Lamm <msl...@mscc.huji.ac.il>

> I can think of two things. 1) Maybe it is smarter to use a different
> language. If languages have similar expressive power (say, C/Pascal
> or Ada/Modula-2) but very different productivity, you should choose

Good Point.

A study by NASA's Software Engineering Lab (SEL) showed that, at least in
their environment, these relationships differ across languages and applications.

In a study of 33 of their projects, the SEL found that productivity varied
from 3 to 5 developed lines of code (DLOC) per hour for FORTRAN projects;
3.5 is recommended for planning.

Productivity on Ada projects showed even less variability; the internally
recommended productivity per technical and management hour for Ada projects
is 5.0 DLOC.

Of their three effort-model parameters (productivity, cost to reuse code,
and growth factor), the study found that productivity and reuse cost vary
by language, and the schedule growth factor varies by the level of reuse.
While the effort model does not depend on the application, the schedule
model does: AGSS (ground support systems) and simulators have different
scheduling models. Even with effort held constant, AGSS takes more
calendar time to develop than simulators. The distribution of life-cycle
phases is also a function of the amount of reuse, as is schedule growth:
projects with moderate to low reuse grow on average by 35%, while
high-reuse projects grow by 5%.
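A hedged sketch of how the planning figures quoted above could be applied. The function, threshold, and framing are my own packaging of the numbers in this post, not the SEL's actual model:

```python
# Rough planning estimate from the SEL figures quoted above:
# 3.5 DLOC/hour (FORTRAN) and 5.0 DLOC/hour (Ada) planning rates,
# schedule growth of 35% (moderate/low reuse) vs. 5% (high reuse).
# This helper is an illustrative assumption, not the SEL's model.

def plan_estimate(dloc, language, high_reuse):
    """Return (effort hours, schedule growth factor) for a project."""
    rates = {"fortran": 3.5, "ada": 5.0}   # DLOC per hour, from the post
    effort_hours = dloc / rates[language.lower()]
    growth = 0.05 if high_reuse else 0.35  # schedule growth by reuse level
    return effort_hours, growth

hours, growth = plan_estimate(70_000, "ada", high_reuse=True)
print(round(hours), growth)  # → 14000 0.05
```

As the post goes on to stress, such numbers are local to the SEL's environment; the value for outsiders is in the shape of the relationships, not the constants.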

To the SEL, local data is invaluable and spells the difference between success
and failure. With it, they can understand their environment. Metrics are the
basis for estimating, planning, tracking, decision-making, and process
improvement. Their local database, called the Experience Factory, is the
basis for their success.

For us outsiders there are benefits as well: knowledge of the relationships
between the various estimating parameters is extremely valuable. As we
approach technical similarity with the SEL, the metrics become even more
valuable. At some point of similarity, some organizations may elect to use
5 lines of Ada code per hour as a benchmark. However, differences in
organizational architecture, in both the human/social and technical
subsystems, preclude the use of the SEL's relationship data in a foreign
environment.


Randy Mathis
Los Angeles

Scott P. Duncan

Oct 14, 1998
Insiguru wrote:

> In a study of 33 of their projects, the SEL found that productivity varied
> from 3 to 5 developed lines of code (DLOC) per hour for FORTRAN projects;
> 3.5 is recommended for planning.
>
> Productivity on Ada projects showed even less variability; the internally
> recommended productivity per technical and management hour for Ada projects
> is 5.0 DLOC.

I've forgotten some of the context in these two studies. Did they define
"productivity" to be DLOC/<unit of time>, i.e., not tied to any comparable
functionality?

I'm asking because I think the SEL had other productivity targets based on
increasingly greater reuse over time which is where they found Ada to be of
growing benefit over FORTRAN. It was not the raw LOC/<unit of time>
that ultimately interested them as I do not think they rationalized DLOC between
languages to come up with a comparison. That is, they did not assume a LOC
in FORTRAN was the same as a LOC in Ada and, using the numbers above,
conclude that Ada, on a line by line basis, was more productive.

> that productivity and reuse cost varies by
> language and the schedule growth factor varies by the level of reuse.

Right...reuse was their bigger concern than DLOC number comparisons
and that simply took time to occur.

> To the SEL, local data is invaluable and spells the difference between success
> and failure. With it, they can understand their environment.

I believe this is their major message over the years, not that what they
learned is necessarily transferable between organizations. (Which is not
what I'm suggesting you are saying, just that Frank McGarry used to
take great pains each year at the SEL Workshop to emphasize the point
that other people should expect different results in their environment.)

> Their local database called the Experience Factory is the basis for their
> success.

Also, Basili, et al developed the Goal-Question-Metric paradigm through
the SEL activity.

> At some point in similarity,

It's figuring this out that could take a while, which is why McGarry
advocated emulating the Experience Factory model more than the
data they developed.

Insiguru

Oct 15, 1998
On the contrary, the purpose of the single study was to search for trends in
language and application differences.

For planning and estimating purposes, they wanted to know the differences
between the FORTRAN and Ada projects as well as the differences between the
AGSS systems and the simulator systems.

As long as I can remember, since 1985, DLOC and effort have been their
standard measures of productivity.

Their major improvement strategy, and here I go into more of an "I believe"
mode, has been the use of OO to facilitate reuse. On some of their projects
they achieved 70 percent reuse.

Randy Mathis
Los Angeles


Ehud Lamm

Oct 15, 1998, to Scott P. Duncan
On Wed, 14 Oct 1998, Scott P. Duncan wrote:

> Which is why I suggested the Empirical Studies of Programmers work, as
> what you are suggesting is not traditional software metrics and
> measurement but research into programming languages. The research
> work I've mentioned addresses what has been and is being done in this
> field. Is it R&D you want to do?

That too. Why not? If it helps anything...

One can argue that R&D efforts are the origin of the major advances in SE.

>
> > Any references? Web pages?
>
> Try a web search on "Empirical Studies of Programmers." Also look into
> ACM's SIGCHI and other Human-Computer Interaction (HCI)
> topics. You can also try http://cse.unl.edu/~susan/esp/ which has
> some links from it elsewhere as well. And the Proceedings were
> published, at least in the past, by Ablex Publishing.
>

Thanks. I will look them up.

Ehud Lamm msl...@pluto.mscc.huji.ac.il

Scott P. Duncan

Oct 15, 1998
> On the contrary, the purpose of the single study was to search for trends in
> language and application differences.

Perhaps the single study, but I believe they have been doing an organized
program (along with the UofMD researchers) for over 25 years.

> As long as I can remember, since 1985, DLOC and effort have been their
> standard measures of productivity.

What I was wondering about was the interpretation put upon the DLOC
measure regarding productivity. I think they have some assumptions
regarding the "appropriate" size of things and the effort they expect to
put into specific kinds of applications. Knowing this, the DLOC becomes
more than a "size/time" number which anyone could compute starting
tomorrow. I think, knowing the research goals of the overall effort, that
they realized such a raw number would have to be put into the context of
a domain where delivered functionality, if not rigorously measured, was
reasonably well understood.

> Their major improvement strategy, and here I go into more of an "I believe"
> mode, has been the use of OO to facilitate reuse. On some of their projects
> they achieved 70 percent reuse.

I believe you are correct here and recall the 70% reuse number from a
SEL Workshop ~5 years ago or so. I'm sure it is documented in
the SEL Workshop Proceedings since Frank McGarry (and/or Vic
Basili) used to give overview talks at the outset of the Workshops
which would summarize such things, and I can remember Frank
talking about the FORTRAN/Ada studies and the high reuse they
had on at least one occasion.

Unfortunately, I have not gone to the last several Workshops to know
what has happened since then. I recall there was also a hint that some
C programming was beginning to be introduced by one or more of the
contractors they used. Since NASA is not a DoD organization, the
use of Ada was not mandatory and one contractor spoke about their
tendency to move to C, rather than Ada, when the latter was not
mandated. I do not believe the SEL had done any C language study
up to the point when I last attended.

For those not familiar with the NASA Software Engineering Lab, it
is a "fictitious" entity. That is, it is not an official NASA entity but a
cooperative effort between Goddard Space Flight Center in Greenbelt,
MD, researchers from the UofMD (grad students and faculty like
Vic Basili and Marvin Zelkowitz), and Goddard's prime contractor
(CSC). It started over 25 years ago and the NASA lead was Frank
McGarry (who retired a few years ago and joined CSC, whose lead
person for the SEL was Jerry Page). The SEL probably has more
data and experimental effort behind it -- at least in forms available
to others -- than any other s/w development organization on earth.
What is important is that the SEL has always stressed that people
should emulate their approach and process and the examples of what
sorts of things can be studied. They have always cautioned against
taking their numbers and results out of context, i.e., they have not
encouraged people to use them as a "benchmark," preferring to be
looked upon as just a contributor to the pool of data available.

Scott P. Duncan

Oct 15, 1998
Ehud Lamm wrote:

> On Wed, 14 Oct 1998, Scott P. Duncan wrote:
>
> > Is it R&D you want to do?
>
> That too. Why not? If it helps anything...

Because a serious research commitment in language studies is not the same
as data collection and application of a "metric" with the belief
that the metric has already been demonstrated to have some validity.
The purpose of the R&D is to discover/determine the validity.

Some organizations want a plug-and-play metric; they do not want
to invest in research. The NASA SEL work has been going on for
20+ years and has a considerable background to it before the point
where they started doing language comparison studies and drawing
any conclusions for their environment.

> One can argue that R&D efforts are the origin of the major advances in SE.

I do not disagree, in general, but R&D into SE is not the same thing as
the applications of the results of that R&D. What does your organization
want? Are they prepared to do R&D (which may not lead to an immediately
applicable result) and take months/years to come up with such an advance?

By the way, I said "in general" above because of an interesting experience
at one of the early Empirical Studies of Programmers Workshops. During
one session, Nicholas Zvegintzov asked the room full of researchers if they
felt there was experimental data (or even case study data) that effectively
proved that structured code was "better" than code that was not structured.
Nobody raised their hand, indeed several folks indicated they did not feel
such data existed, though, intuitively, everyone agreed structured code
was "better." Of course, to be fair, people like Thomas Green had been
doing language syntax/structure research for some years and had results
that suggested a structured approach would be better. However, no one
had done the actual study in specific language instances. In this case, the
"advance" had occurred based on other factors and was "proven" through
general acceptance by programmers. The reasoning for "why" came after
a much wider adoption of the practice, but the "whether/if" was never really
challenged through R&D.

Insiguru

Oct 15, 1998
From: "Scott P. Duncan" <soft...@mindspring.com>

> I believe you are correct here and recall the 70% reuse number from a
> SEL Workshop ~5 years ago or so. I'm sure it is documented in
> the SEL Workshop Proceedings since Frank McGarry (and/or Vic
> Basili) used to give overview talks at the outset of the Workshops
> which would summarize such things, and I can remember Frank
> talking about the FORTRAN/Ada studies and the high reuse they
> had on at least one occasion.

In their own words, the SEL describes five major contributors to their
success in reuse:

1. Management commitment as managers looked for technical and management
solutions to respond to business challenges.

2. Process maturity, which allows the developers to focus more on the
technical solutions and less on the mechanics of how the job should be done.

3. Organization maturity that moved from a dependency on key personnel to a
reliance on well-integrated teams drawing on an extensive and organized
store of knowledge.

4. Judicious use of new technology, where the application of object-oriented
techniques and domain analysis promised and showed tremendous benefits. While
they had the opportunity to buy reusable automated libraries, they chose to
create reusable components rather than automate a process that they did not
understand.

5. Focus on a specific application domain. To ensure that the needs of both
user and reuser were met, multiple missions were considered and developed at
the same time.

They suggested first focusing on developing a mature organization and
capturing and reusing the knowledge and experience of people and projects.
Next, develop and mature a reusable process that is specifically tailored
or adapted to a specific domain. Then concentrate on building and reusing
software engineering products, from code units to requirements and design
and beyond.

While step five is more organization-specific, it must be resolved; for
example, getting different project managers in the same application domain
to cooperate in the establishment of a domain analysis program and the
development of reusable components.

My question is: what happened to the OO, domain analysis, and component
reuse triumvirate in other organizations? Which of the above steps, as far
as you know, did other organizations fail to take?

