Stop The Line! (LESS 2011)



Joakim Sundén


Aug 31, 2011, 3:12:48 AM

to Sumpan Lean Coffee

Hi!
Prompted by today's discussion about Stop The Line at Lean Coffee, I'd like to mention that the research report below will hopefully be presented at LESS 2011. Today is the last day for early-bird registration! :) http://less2011.org/

Stop The Line (STL) is a practice used by agile teams and organizations to ensure a high level of quality in their software. The practice comes from the world of Lean manufacturing. In the Toyota Production System (TPS), Toyota defined stop-the-line events, triggered whenever an error was found in production: the whole production line was stopped so that the defect could be fixed immediately, preventing it from flowing downstream, where detecting and fixing it would be more difficult and costly. This is then followed by systematic root cause analysis to prevent the problem from recurring.

In Lean Software Development, and in Agile Software Development in general, this translates into stopping work on new functionality whenever an error is found in the system, whether by continuous integration (CI), and thus test automation (TA), or by manual testing. The focus then shifts to fixing the bug immediately. In the software industry, the definition of a “line” is usually more complex and sophisticated than in manufacturing, so a more fine-grained approach is used in which only certain parts of the system are stopped.

This paper shows how F-Secure handles STL cases and studies the use of a targeted STL practice combined with a threshold that triggers a stop on feature development. An STL event is raised for bugs that break any of the automated tests, or for critical bugs that affect a specific part of the system (a subsystem or software line). Development of new features for that line then stops, and the focus shifts to finding and fixing the bugs that caused the failure.
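To make the STL rule concrete, here is a minimal sketch of the trigger logic as described above: a bug raises STL if it breaks an automated test or is critical, and feature work on the affected line resumes only once no STL bugs remain open there. The class names, fields and the "resume when zero open STL bugs" rule are illustrative assumptions, not F-Secure's actual tooling.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Bug:
    bug_id: str
    line: str                     # software line or subsystem affected (hypothetical field)
    breaks_automated_tests: bool  # e.g. detected by a failing CI test
    critical: bool


class StopTheLine:
    """Hypothetical tracker of which software lines are halted by open STL bugs."""

    def __init__(self) -> None:
        self._open_stl = Counter()  # line -> number of open STL bugs
        self._stl_bugs = {}         # bug_id -> line, for bugs that raised STL

    @staticmethod
    def raises_stl(bug: Bug) -> bool:
        # STL is raised for bugs that break any automated test or are critical.
        return bug.breaks_automated_tests or bug.critical

    def report(self, bug: Bug) -> bool:
        """Register a bug; return True if it stopped its line."""
        if not self.raises_stl(bug):
            return False  # would go to the ordinary defect list instead
        self._open_stl[bug.line] += 1
        self._stl_bugs[bug.bug_id] = bug.line
        return True

    def fix(self, bug_id: str) -> None:
        """Mark an STL bug as fixed; its line resumes when none remain open."""
        line = self._stl_bugs.pop(bug_id, None)
        if line is not None:
            self._open_stl[line] -= 1
            if self._open_stl[line] == 0:
                del self._open_stl[line]

    def feature_work_allowed(self, line: str) -> bool:
        # Counter returns 0 for unknown lines, so untouched lines are never stopped.
        return self._open_stl[line] == 0
```

Usage: after `stl.report(Bug("B1", "client", True, False))`, `stl.feature_work_allowed("client")` is `False` while other lines such as `"server"` keep developing; once `stl.fix("B1")` is called, the client line resumes. Stopping per line rather than globally is what makes the practice "targeted".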

Defects that do not break the automated tests and are not critical (but are still important enough that they cannot be discarded) are not handled immediately; instead, they are added to the defect list. Here, a threshold on the number of defects is set: if the number of defects per team, or the number of bugs globally, reaches a certain limit, a Stop Feature Development (SFD) event is raised, stopping all new feature development. For a team-level SFD event, development of new features in that team stops until the team's bug count is comfortably below the team limit. For a global (project-wide) SFD event, the whole project is handled the same way until the overall bug count is reduced below the project limit.
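The SFD thresholds can be sketched as a small monitor with hysteresis. The report does not give actual limits, so `team_limit`, `global_limit` and `resume_margin` are hypothetical parameters, and "comfortably below" is modelled, as an assumption, as dropping to `limit - resume_margin` or lower.

```python
from collections import Counter


class SfdMonitor:
    """Hypothetical sketch of team-level and global Stop Feature Development events."""

    def __init__(self, team_limit: int, global_limit: int, resume_margin: int) -> None:
        self.team_limit = team_limit
        self.global_limit = global_limit
        # Assumption: work resumes only when the count falls to limit - resume_margin.
        self.resume_margin = resume_margin
        self.bugs_per_team = Counter()
        self.stopped_teams = set()
        self.global_stop = False

    def _update(self) -> None:
        # Team-level SFD: stop when the limit is reached, resume well below it.
        for team, count in self.bugs_per_team.items():
            if count >= self.team_limit:
                self.stopped_teams.add(team)
            elif count <= self.team_limit - self.resume_margin:
                self.stopped_teams.discard(team)
        # Global SFD: same rule applied to the project-wide bug count.
        total = sum(self.bugs_per_team.values())
        if total >= self.global_limit:
            self.global_stop = True
        elif total <= self.global_limit - self.resume_margin:
            self.global_stop = False

    def add_defect(self, team: str) -> None:
        self.bugs_per_team[team] += 1
        self._update()

    def close_defect(self, team: str) -> None:
        if self.bugs_per_team[team] > 0:
            self.bugs_per_team[team] -= 1
        self._update()

    def feature_work_allowed(self, team: str) -> bool:
        return not self.global_stop and team not in self.stopped_teams
```

The gap between the stop threshold and the resume threshold keeps a team from flipping in and out of SFD every time a single bug is opened or closed near the limit.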

This study also compiles a set of statistics on how these practices changed the quality of the software over time. The project under study is a multi-site setup (Finland, Malaysia and Poland) of 10-12 teams (about 100 people in total), observed over 18 months. The metrics analysed include the number and duration of STL events over time, the ageing of defects, the root causes of selected events, the number and size of commits, and the variation in the number of defects over time. The statistics clearly indicate that the situation changed significantly after these practices were introduced: after a period of adjustment and learning, overall software quality increased and the time to develop new features decreased. This report shows quantitatively how F-Secure achieved these results using these two practices, among others.
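Two of the listed metrics, STL event duration and defect ageing, reduce to the same computation over (start, end) timestamps. The sketch below assumes such pairs are available, for example from a bug tracker export; the function names and the median/max summary are illustrative choices, not the report's method.

```python
from datetime import datetime
from statistics import median
from typing import Iterable, List, Optional, Tuple

# (start, end) pair; end is None if the defect or STL event is still open.
Interval = Tuple[datetime, Optional[datetime]]


def durations_in_days(intervals: Iterable[Interval], now: datetime) -> List[float]:
    """Length of each interval in days; open intervals are measured up to `now`."""
    return [((end or now) - start).total_seconds() / 86400 for start, end in intervals]


def summarize(intervals: Iterable[Interval], now: datetime) -> dict:
    """Hypothetical summary: count, median and maximum duration in days."""
    d = durations_in_days(intervals, now)
    return {
        "count": len(d),
        "median_days": median(d) if d else 0.0,
        "max_days": max(d, default=0.0),
    }
```

Computing such a summary per month, and comparing it before and after the practices were introduced, is one way to obtain the kind of over-time trend the report describes.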