Acknowledgements
Professor Barbara Russo has been the ideal supervisor. Her sage advice, insightful criticisms, and patient encouragement aided the writing of this paper in innumerable ways
- Heider Jeffer
Developer Heider Jeffer Fakultät für Informatik Freie Universität Bozen Hjeffer@unibz.it |
|
---|
Document Status: [processed] 2018 November 26th
- HERE you can view or download the project Documentation in PDF format!! and
- HERE you can download the Project in Video in MP4 format!!
A software bug is an error, flaw, failure, or fault in a computer program or system that causes it to produce an incorrect or unexpected result, or to behave in unintended ways. The term "bug" to describe such defects has been part of engineering jargon since the 1870s, long before the advent of electronic computers and software. It likely originated in hardware engineering to describe mechanical malfunctions. For example, Thomas Edison wrote in a letter to an associate in 1878: "It has been just so in all of my inventions. The first step is an intuition, and comes with a burst, then difficulties arise—this thing gives out and then that 'Bugs'."
Software bugs have been linked to significant disasters. Some notable examples include:
- Therac-25 Radiation Therapy Machine (1980s): Bugs in the code controlling this machine led to patient deaths.
- Ariane 5 Rocket (1996): A bug in the on-board guidance computer program caused the European Space Agency's US$1 billion prototype rocket to be destroyed less than a minute after launch.
- RAF Chinook Helicopter Crash (1994): Initially dismissed as pilot error, an investigation suggested that a software bug in the aircraft's engine-control computer may have caused the crash, killing 29 people.
In this research, we propose the SZZ algorithm as a viable solution to address software bugs. We illustrate our problem, solution, and the methodology used through a media player for clarity.
The SZZ algorithm automatically identifies the fix-inducing predecessor lines that are changed in a bug-fixing commit. For this analysis, we need comprehensive data on all changes and fixes in a project, which we obtain from version archives like CVS and bug tracking systems like Bugzilla. This approach helps us answer key research questions: when was the bug introduced, and who was responsible?
Although the SZZ algorithm is not perfect, it is currently the best available method for identifying bug-inducing changes. Our future work will focus on enhancing the algorithm's precision and accuracy.
We have presented our research findings through both written explanations and media graphics. We believe this multimedia approach will make our research more accessible to decision-makers in the software industry and shareholders, clearly explaining our complex implementations in an easy-to-understand format.
-
Model(01): Define Life Cycle of Bug.
-
Model(02): Define SZZ algorithm.
-
Definition(01): Define the Fixed Bug & BUG- Reported and find the link between them.
-
Definition(02): Define BUG-INTRODUCING CHANGE.
-
Definition(03): Define SZZ tools--DIFF command with an example from Prof. Barbara Java source code.
-
Definition(04): Syn: Syntactic Analysis
-
Definition(05): Sem: Semantic Analysis
-
Model Output: ECLIPSE & MOZILLA
-
Output analysis: Similarity and dissimilarities between Eclipse and Mozilla.
Model 1 represent the definition of the life cycle of the bug:
-
Start—Model 1
-
[Bug-Introducing Changes].
-
Bug Report.
-
[Fixed Bug].
-
End—[Model 1]
Figure 1 LIFE CYCLE OF BUG by Heider Jeffer
Model 2 is the definition for the SZZ algorithm. Model 2 will start when the Model 1 is End.
-
Start–[Model 2] from the End—[Model 1]
-
Fixed Bug
-
DIFF Command— [Track & Located the Changes]
-
BUG-Reported
-
Remove False Changes
-
BUG-INTRODUCING CHANGE— [Changes
that occur before the report].
- End — Model 2
Figure 2 SZZ ALGORITHM by Heider Jeffer
“In this project we assume that the changes between Fixed Bug and Bug-Reported is the False Changes, therefore we remove it.”
The FIXED BUG and BUG REPORTED are two information that we know. In this project we want to find the link (relation) between these two information (FIXED BUG & BUG REPORTED) by the following:
-
Start with BUG REPORTED
-
Move Backward.
-
Track change with Diff command
-
Check each revision in term of which revision that includes change.
-
Remove every revision that have no change.
-
Continue until we arrive to the BUG REPORTED.
The link (relation) between FIXED BUG and BUG REPORTED is a sequence of revisions that had been changed.
SZZ can’t start immediately from Bug Reported because the Bug Reported is not define. SZZ must start from FIXED BUG even if the revision changes no matter after Bug reported because the revisions with no changes are highly recommended to define the Start of SZZ algorithm or any algorithm that solve the tracking / locating problem(e.g. Big M algorithm non B.V.).
BUG-INTRODUCING CHANGE is the revision
include change before BUG REPORTED. Since we identify the link (relation) between FIXED BUG and BUG REPORTED, now we can identify (solve) the BUG-INTRODUCING CHANGE problem.
Figure 3 BUG-INTRODUCING CHANGE & REMOVE FALSE
CHANGES
In our project we will start from BUG REPORTED and moving backward. Our algorithm will ignore the revisions that not include changed. The algorithm will keep moving backward to check each revisions one by one if the revision include no change then the algorithm will be ignore it and it will check the next revision, until the algorithm identify the revision that include change, this revision is the BUG- INTRODUCING CHANGE.
False Changes are the changes between Fixed Bug and Bug Reported. We will remove all false changes.
DIFF command is a tool that calculates and displays the differences between two revisions and the output is the colored lines represent the change between these two revisions these changes could be the line that added, line deleted, line moved, or the line changed by the developers, the DIFF will display the differences in the revisions by comparing the revisions line by line before and after the developer added, line deleted, line moved, or changed a line (e.g Heider Jeffer Gitlab – WP project with Prof. Barbara Russo).
In this project we used the DIFF command to:
Find the link (relation) between fixed bug and bug reported. To identify the revision that BUG- INTRODUCING CHANGE.
Figure 5 SZZ with DIFF command by Heider Jeffer
Bug introduced before report because of bug cycle
life; First bug introduced, reported , then fixed (figure 5).
DIFF command used the colors (e.g. Green, Red, Purple, Blue and Yellow) each color represent the type of the lines differences between two revisions as the following:
In our project with Prof. Barbara we make an experiment. We open a source code and we run a diff command on it to determine what changed in the bug- fixes. The SZZ assumes that deleted or modified source code in each (Red or Green - Colors) is the location of a bug.
Figure 6 Prof. Barbara Source Code with DIFF command by Heider Jeffer
The syntactic confidence is an integer number between 0-2. The syntactic confidence Syn analysis for our project have three different type of conditions (A,B,&C). Letter A has a value of zero the rest letters (B & C) they have value of 1 (if they satisfied the conditions criteria that we set for each of them). We initially assign a syntactic confidence syn of zero then we raise the confidence by one:
Syn Level of Confidence:
-
If we have all letters (A,B &C) then A+B+C = 0+1+1 = 2 level of confidence.
-
If we have B or C, then syn = 1 level of
confidence.
DIFF Color | Meaning |
---|---|
Green | Line Added |
Red | Line Deleted |
Purple | Line Changed |
Blue | Change Highlight |
Yellow | Line Changed |
- If we have letter A then syn = 0 level of confidence…etc.
The semantic Sem analysis for our project have five conditions A,B,C,D, and E.The semantic Sem analysis for our project have five different type of conditions (A,B,C,D,&E). Letter A has a value of zero the rest letters (B,C,D,&E) they have value of 1 (if they satisfied the conditions criteria that we set for each one of them). Thus For each (A,B,C, & D) letters we have assigned the link as a semantic level of the confidence based on the following conditions:
Sem Level of Confidence:
-
If we have all letters A+B+C+D+E = 0+1+1+1+1= 4 level of confidence.
-
If we have 3 over 5 letters and one of these letters is A then the sem = 2 level of confidence.
-
If we have 3 over 5 letters and if letter A is not one of them then sem = 3 level of confidence…etc.
After we have the syntactic and semantic confidence level, now, we can distribute the links across these two confidence levels.
Figure 7 Distribution of links accross different classes of syntactic and semantic confidence levels in ECLIPSE from Jacek S´liwersk
By using the following formula on Figure 7, we decided to use only those links whose syntactic and
semantic levels of confidence satisfy the following condition:
sem > 1 ∨ (sem = 1 ∧ syn > 0) … a
And by implement Form. (a) on Figure 8: Distribution of links across different classes of syntactic and semantic confidence levels in ECLIPSE We will have the following output:
Figure 9 sem > 1 ∨ (sem = 1 ∧ syn > 0)
We identified 25,317 links for ECLIPSE, connecting 47% of fixed bugs with 29% of transactions.
We used video editor to display the flow of Form (a).
The SZZ algorithm even if it is good but it is not enough to solve the bug problem because there are many factors interact with bug problem that we have to put them in our consideration one of these factor is the developer as we see the gap between Mozilla and Eclipse in our syn/sem analysis. The more the developer is professional and expert the better to detect, locate, track and fix the bug .
THE ANSWER FOR OUR RESEARCH QUESTION
Here our paper give the answer for our research question When the was Bug Introduced & The Man Who Was Responsible for it. Thus had been determined by SZZ algorithm.
Our solution provide the decision maker in software organization an automatic tracking for bug introduced changed, this mean by using SZZ algorithm you will minimize the cost of time and maximize your software quality which is a paradigm shift toward reliable product in term of eliminating error sooner no later, and this solution give the answer to the major companies and software industries that daily faced the software errors (bug)
-
When was bug introduced? The bug was interduce in the revision before the bug has been reported – [by SZZ]. And,
-
Who was responsible? The developer of the first revision that had been changed before the bug had been reported is responsible for Bug Introducing Changes – [by SZZ].
-
Bug can only appear before the report – [Bug Introducing Changed].
-
Not all changes are fixes – [False Changes].
-
Annotate information is insufficient – [Before the report change is matter].
-
J. S´liwerski, When Do Changes Induce Fixes? https://goo.gl/QZxMMz
-
N. Bettenburg, Automatic Identification of Bug Introducing Changes. https://goo.gl/Vt9f2G
-
T. Zimmermann, Automatic Identification of Bug-Introducing Changes https://goo.gl/pNHR31
Figure 1 from Prof Barbara Russo Slide