Tim Menzies
Sep 30, 2013, 3:22:13 PM
to cs...@googlegroups.com
I said at the start of class that I would be looking to this class to
find 2 research students in data mining, starting Jan 2014.
If you are interested, please reply with a resume. Our mid-term is
next week and once I have those grades, I will have what I need to
know to make offers for GRA + tuition waivers.
Ideally, I want Ph.D. students but will happily take talented
masters-by-research students.
The work is described below. It is about data mining and learning how
to move models learned in one domain to another. This is called
"transfer learning" and it is a hot current topic. While the test
domain is about software engineering, the general principles should
apply to the entire scientific endeavor. So this specific project is
really about, well, everything.
FYI: the funding is for 4 years from mid-2013 onwards, so this would
be a stable income for several years. HOWEVER, initially I would only
fund you for spring 2014 (and perhaps summer), just so we can see if
we like working with each other.
Over to you. I look forward to reading your resume.
t
--------------------------------------------------------------------
NSF:MEDIUM: Collaborative: Transfer Learning in Software Engineering
Tim Menzies (WVU); Forrest Shull & Lucas Layman (Fraunhofer USA, Inc.)
What are the best organizational principles for the trillions of
dollars spent annually on information technology [126]? What can
software project managers and developers learn from past projects?
Should project managers devote scant resources to a “local lessons
team” that pursues best local practices? Or should software project
managers and developers just apply the supposed best practices
listed in software engineering (SE) textbooks? Given the current state
of research, we just do not know.
Clearly, software engineers urgently need better ways to recognize
best practices in past projects, and to understand how to transfer and
adapt those experiences to current projects. No project is exactly
like previous projects; hence, the trick is to find which parts of the
past are most relevant and can be transferred into the current
project. We propose novel automated methods to apply the machine
learning concept of transfer learning to adapt lessons from past
software engineering project data to new conditions.
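The abstract does not say how the "most relevant" parts of past data will be chosen. As a minimal sketch of one simple idea from the cross-project defect prediction literature (a nearest-neighbour relevancy filter, similar in spirit to the so-called "Burak filter"), the code below keeps only those past rows that lie close to at least one row of the current project, then classifies new rows by a k-nearest-neighbour vote over what survives. The names `relevancy_filter` and `predict`, the Euclidean distance, and the toy data are hypothetical illustrations, not the transfer learners this project will deliver.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical row format: (feature vector, label), e.g. label 1 = defective module.
Row = Tuple[Sequence[float], int]


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def relevancy_filter(past: List[Row], current: List[Sequence[float]], k: int = 10) -> List[Row]:
    """Keep each past row that is among the k nearest neighbours of at least
    one current-project row; everything else is treated as no longer relevant."""
    keep = set()
    for x in current:
        # Rank all past rows by closeness to this current row, keep the top k.
        ranked = sorted(range(len(past)), key=lambda i: distance(past[i][0], x))
        keep.update(ranked[:k])
    # Return surviving rows in their original order, without duplicates.
    return [past[i] for i in sorted(keep)]


def predict(train: List[Row], x: Sequence[float], k: int = 3) -> int:
    """Majority vote (ties go to 0) among the k training rows nearest to x."""
    nearest = sorted(train, key=lambda row: distance(row[0], x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > len(nearest) else 0


if __name__ == "__main__":
    # Toy "past project" data; the (50, 50) row is far from anything current.
    past = [((1.0, 1.0), 0), ((1.2, 0.9), 0), ((9.0, 9.0), 1), ((50.0, 50.0), 1)]
    current = [(1.1, 1.0), (8.5, 9.2)]
    filtered = relevancy_filter(past, current, k=1)
    print(filtered)                          # [((1.0, 1.0), 0), ((9.0, 9.0), 1)]
    print(predict(filtered, (8.0, 8.0), k=1))  # 1
```

In this toy run the distant `(50.0, 50.0)` row is dropped before any model is built, which is the intuition behind "intelligently avoiding data that is no longer relevant". Real relevancy filters would normalise features first and tune `k`.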
The goal of the proposed research is to enable software engineers to
find project-specific “best practices” from past empirical data
using transfer learning. Using data from real software projects, our
transfer learners will find better “best practices” for (1) predicting
software development effort; (2) isolating software defects; (3)
conducting effective code inspections; and numerous other issues
identified by our software engineering subject matter experts.
These tools are novel and unique since they will learn the best
lessons from past data, while intelligently avoiding data that is no
longer relevant. This proposal will deliver:
-- New data mining technologies: Novel transfer learners that overcome
current limitations in the state of the art to provide accurate
learning within and across projects;
-- New empirical studies: The quantitative evaluation of existing
“best practices” using transfer learning on empirical data from
software companies;
-- An on-line model analysis service: where anyone can find and refine
their own “best practices”.
INTELLECTUAL MERIT: Science is about generality but, to date, there
are too few examples of SE principles that generalize across multiple
projects. Our work will revolutionize SE research:
-- The results will confirm or dispute the software engineering
principles that currently guide decision making, and build knowledge
about how those principles do or do not transition across projects;
-- The resulting data miners will be packaged and disseminated to
enable software managers and researchers to benefit from a quick and
inexpensive quantitative analysis of their own data sets.
BROADER IMPACTS: By providing a means to test principles about
software development, this work stands to revolutionize SE research
and enable software managers to rely on facts rather than heuristics.
This is important since software is essential to international
financial and transport systems; our energy generation and
distribution systems; and even the pacemakers that control the beat of
our hearts. Quantitative, directed software project improvement is a
crucial step in the evolution of software engineering.
Our technologies will be an open source package that is readily
deployed inside specific organizations. It will also be available as
an on-line workbench where software managers and developers can test,
explore and refine conjectures about what factors most influence
projects.
The on-line model analysis service will become a valuable resource
for undergraduate and graduate teaching of software engineering. Using
that site, students can review and critique theories about software
engineering using real-world project data.
Finally, this proposal also offers career development to groups that
are currently under-represented at universities in America. WVU offers
research education opportunities to economically disadvantaged rural
areas. Also, 40% of the Fraunhofer researchers are women, far above
the latest numbers for women’s representation in U.S. computer science
degree programs.
KEYWORDS: transfer learning; software engineering; data mining; defect
prediction; effort estimation