CODE MEASURE TOOL
The present invention is concerned with computer systems and in particular with a system for analysing computer software so as to establish potential problems in the software. There is now an immense quantity of software code which has been written in languages such as C or Cobol and the amount of computer code in current use is increasing at a very substantial rate.
It is thus very important that users can have the capability of identifying where maintenance or reliability problems are likely to occur not only in existing code, but in code which has been written to interlink with existing code and also entirely new code. Such information is useful both when a user is concerned with establishing the quality of new code after it has been commissioned or generated in-house, or assessing the demands of maintaining existing code. It can also be used at intervals to determine what has happened during the working life of software and whether the quality of the software has improved or not.
Thus the present invention is particularly concerned with a management system capable of identifying or predicting in computer systems where problems are likely to occur and additionally identifying code for updating/investment and assessing the impact of changing code.
As a result of an appreciation of these problems relating to software code quality a number of metrics have been devised by means of which a measure can be given to the reliability of a particular set of software. These metrics exist at three levels. The lowest of these levels is known as the function level and these metrics deal with functions within a software file. The next level deals with files within a software system and the third level is that which deals with system metrics. There are substantial problems in obtaining sufficient data to carry out software analysis at the files within a system level and even more so at the system level, so that the present specification is mainly concerned with metrics at the function level, although it will be appreciated that a similar approach and a similar system could be devised for operating at the file and system metric levels.
Examples of file level metrics which are already known are shown in the tables marked "Annex A" at the end of this specification.
A known maintainability metric is that called the Hewlett Packard (HP) maintainability index. The HP index
is calculated on a file and system level and is mainly a size measure and has been validated by industrial use. In accordance with the HP index source files scoring less than 65 are considered to have poor maintainability, and files scoring over 85 good maintainability. Thus it is already known to provide systems which use rules which when applied to software parameters give an indication of the maintainability of code. However, as will be described later, these rules are often contradictory.
A concern of the present invention is to provide a system which can reliably assess the maintainability of computer software, and in particular at the file level.
In accordance with one aspect of the present invention there is provided a code measurement system for determining the maintainability of software, comprising first means for deriving from a fault record of known software and a metric database of the known software a set of rules, and second means for comparing a metric database of second software not having a fault record to derive a signal indicating the maintainability of the second software.
In accordance with a second aspect of the present invention there is provided a method of measuring code for determining the maintainability of software, the method comprising a step of deriving from a fault record of known software and a metric database of the known software a set of rules, and a step of comparing a metric database of second software not having a fault record to derive a signal indicating the maintainability of the second software.
In order that the present invention may be more readily understood an embodiment thereof will now be described by way of example and with reference to the accompanying drawings.
Figure 1 shows a general overview of a known system;
Figures 2A and 2B show a general overview of a system in accordance with the present invention;
Figure 3 is a flow diagram showing the operation of the system of Figure 2;
Figures 4 and 5 show typical graphical displays.
In accordance with the present invention a prediction system using rules acquired by data mining techniques builds on knowledge of changes in the past in
order to carry out the necessary prediction of quality or maintainability. In such a system it is possible to define a maintainability detecting index for the code under consideration together with the amount it will have to be changed. These two parameters indicate the impact of potential developments. However the very nature of the data mining techniques and the rules identified by these techniques mean that the parameters predicting possible problems have to be expressed as probabilities and frequently one rule when applied will lead to one set of probabilities whilst another rule will lead to a set of contradicting probabilities. Thus the simple application of a set of rules is insufficient to give an unambiguous prediction.
Referring now to Figure 1 of the drawings, this shows a general purpose computer 1 which acts as a measurement tool with regard to a set of files 2 which together comprise the software of a complex computer system. In the present embodiment the files 2 are coded in C but of course they can exist in any suitable code such as Cobol.
In the arrangement of Figure 1 a set of change records 3 are generated for each of the files in the set of files 2. These can be generated by computer 1 or by another processor (not shown). The results of such an
analysis of the files of C code are shown in Table A, where the file names are shown on the left and the measures taken are shown under the measures heading, together with the number of detected changes and a pass or fail criterion. The criterion threshold can be set arbitrarily but in general will be set so as to catch the worst 25% of the files in the system. In this context worst means most heavily changed. The results of the analysis will be referred to as a metric database and this is indicated in Figure 1 at 20.
TABLE A

FILE NAME   M1    M2     M3    ...  Mn   NO. OF CHANGES   PASS OR FAIL
ABC.C       21    3.4    29.2            5                F
ABD.C       16    0.2    8.3             2                P
ABE.C       21    29.0   72.0            0                P
ABF.C
XXX.C       16    7.0    16.0            32               F
It will be appreciated that this Table A is the result of investigating a known software system and the present invention is concerned with providing a prediction with regard to the software files which constitute an unknown system.
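The pass or fail labelling just described, in which roughly the worst 25% most heavily changed files fail, can be sketched as follows. The file names and change counts are illustrative only, and the threshold logic is an assumption about one plausible implementation.

```python
def label_files(change_counts):
    """Return {file: 'P' or 'F'}, failing roughly the worst 25% of files,
    where worst means most heavily changed."""
    ranked = sorted(change_counts, key=change_counts.get, reverse=True)
    n_fail = max(1, round(len(ranked) * 0.25))
    threshold = change_counts[ranked[n_fail - 1]]
    # Every file changed at least as often as the cut-off file fails.
    return {f: ("F" if change_counts[f] >= threshold else "P")
            for f in change_counts}

# Illustrative change counts, loosely echoing the shape of Table A.
counts = {"ABC.C": 5, "ABD.C": 2, "ABE.C": 0, "XXX.C": 32}
labels = label_files(counts)
```

With four files only the single most heavily changed file falls in the worst 25%, so here only XXX.C is labelled F.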
Figure 2A shows part of a system in accordance with an aspect of the present invention. In Figure 2A data
mining software is used in a computer 1 to analyse the metrics database 20 to produce a set 21 of appropriate rules which will be defined in greater detail hereinafter. In Figure 2B the number 10 indicates a set of new files of C code the maintainability of which is to be ascertained and which are measured by the measurement tools 11 in the same way as the known files were measured in Figure 1 so as to generate a metric database 12 similar to the already described Table A. The set of data mining rules 21 as derived from a known set of files as described with respect to Figure 2A are then applied to the metric database 12 using a general purpose computer 22 to generate predictions with regard to each of the files in the newly generated table. Naturally the various steps in the process of generating the metric database and the set of rules can be carried out by the same computer. The predictions can be displayed on the screen 23 of the computer or stored at 24 for subsequent analysis.
A rule will be generated by taking a combination of two or more of the measures shown in Table A together with the pass or fail criterion so as to generate at least two inequalities, so that the rule applies when the two or more inequalities are satisfied by the unknown file. In a simple example a rule can, using the measures shown in Table A, consist of the two inequalities 2 < M1 < 17 and 16 < M2. As an example, if these inequalities are satisfied the rule will indicate that the file is a fail. The confidence factor associated with the rule will be dependent on the number of records in the database or source data to which the rule applies.
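A rule of the kind just described can be sketched as a conjunction of inequalities over the measures, together with its outcome and confidence factor. The measure names, thresholds and confidence value used here are illustrative, not taken from a real rule database.

```python
def make_rule(low, high, m2_min, outcome, cf):
    """A rule that fires when low < M1 < high and m2_min < M2."""
    def applies(metrics):
        return low < metrics["M1"] < high and m2_min < metrics["M2"]
    return {"applies": applies, "outcome": outcome, "cf": cf}

# Illustrative rule: if both inequalities hold, predict fail with CF 50%.
rule = make_rule(2, 17, 16, "fail", 0.50)

hit = rule["applies"]({"M1": 10, "M2": 20.0})   # both inequalities satisfied
miss = rule["applies"]({"M1": 21, "M2": 20.0})  # first inequality violated
```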
The overall basic procedure for deriving maintainability values for the files of an unknown software system is shown in the flowchart of Figure 3.
Step S10 of this flowchart represents the files of a known software system. At step S11 a record is made, as described with regard to Figure 1, of the file changes in the software system over a period of time. At step S12 a metric database is set up in which selected key parameters of the known code are extracted. At step S13 the file change record and the metric database are combined and at step S14 the data mining is applied to the combined metric database so as to generate a set of rules such as the rule previously discussed. Once the data mining has established the rules, as shown at step S15, the unknown code, that is the code the maintainability or quality of which is to be assessed, has generated from it a metric database which corresponds to the metric database of step S12. Naturally, the code from which this second metric database was generated will not have a fault record. The measurement of the unknown code is shown at step S16 and the generation of the metric database is shown at step S17. At step S18 the rules derived by the data mining step S14 are applied in the prediction system shown in Figure 2 so as to generate and display or otherwise make available a series of predictions from which the maintainability or quality of the unknown code can be assessed. This is shown at step S19. There is, of course, no absolute necessity for the measurement of the known code to be carried out before the measurement of the unknown code with the proviso that it has to be done before step S18 can be carried out.
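The flow of steps S10 to S19 can be compressed into a minimal runnable sketch. The "data mining" step is replaced here by a trivial single-threshold learner on a single metric, purely for illustration; the actual mining technique and metrics are not limited to this.

```python
def measure(files):
    # S12/S17: extract one key metric per file (here, simply its size).
    return {name: len(code) for name, code in files.items()}

def mine_rule(metrics, changes, heavy=3):
    # S13/S14: combine metrics with the change record and learn a
    # threshold separating the heavily changed files.
    failed = [metrics[f] for f in metrics if changes[f] >= heavy]
    cut = min(failed) if failed else float("inf")
    return lambda m: "fail" if m >= cut else "pass"

# S10/S11: known system with a change record (illustrative data).
known = {"a.c": "x" * 100, "b.c": "x" * 20}
changes = {"a.c": 9, "b.c": 0}
rule = mine_rule(measure(known), changes)

# S15-S19: unknown system, no change record; apply the mined rule.
unknown = {"new.c": "y" * 150, "tiny.c": "y" * 10}
prediction = {f: rule(m) for f, m in measure(unknown).items()}
```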
The calculations just described will in total give a reasonable prediction as to the maintainability of a file. However the nature of the techniques employed will tend to provide a bias which should be compensated for. Thus if the rules used have been generated from heavily maintained code, that is code which has already been changed a lot, then the prediction will be biased in favour of failing files.
The result of the examination of a new file on the basis of the rules is shown in Table B.
TABLE B
PASS RULES TRIGGERED        FAIL RULES TRIGGERED
RULE 1    CF 87%            RULE 12   CF 50%
RULE 13   CF 16%            RULE 3    CF 26%
RULE 15   CF 54%            RULE 18   CF 14%
RULE 28   CF 32%            RULE 74   CF 12%
RULE 96   CF 24%            RULE 200  CF 65%
RULE 134  CF 16%
This table shows a selection of rules under which the file passed and a selection of those rules under which the file failed. Naturally many more rules are examined in actual operation of the system. The table also shows a confidence factor (CF), expressed as a percentage, with which the prediction for pass or fail has been given. This confidence factor is an estimate of the probability that an individual rule will be correct in relation to any individual case. However the result as expressed by Table B does not really give a clear indication as to whether the new file is a pass or a fail because under some measurements the file is a pass and under other measurements the file is a fail. Additionally the confidence factors vary for each of the predicted results. It is thus necessary to make a further judgement from Table B trying to take into account these varying factors.
A number of different methods have been proposed to
rationalise the results as expressed in Table B so as to reach a conclusion about the maintainability of the file.
One simple method is to compare the number of pass rules with the number of fail rules so that the file passes if the pass rule number is greater than the fail rule number. However because of the presence of the confidence factors it may well be that the system predicted fails with more confidence than it predicted passes. Thus a second method would be to sum the confidence factors (CF's) of the passes and compare this with the sum of the CF's of the fails. In the given table the sum of the CF's for pass is 229% and fail 167%. On this calculation the file is a pass. Yet another approach is only to take into account those rules where the CF is greater than chance value.
Yet another method is to simply state that the best rule wins. Thus the highest CF is for rule 1 and using this method the file passes. This procedure can be modified by adjusting the best rule to take into account the fact that as a greater percentage (70%) of the files in the source data were pass files it is more likely statistically that the best rule would be for a pass file. On this basis the new file is a fail as 87 - 70 = 17 and 65 - 30 = 35.
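The alternative rationalisation methods just described can be sketched directly from the Table B figures:

```python
# Confidence factors of the triggered rules, taken from Table B.
pass_cfs = [87, 16, 54, 32, 24, 16]   # pass rules triggered
fail_cfs = [50, 26, 14, 12, 65]       # fail rules triggered

# Method 1: compare the number of pass rules with the number of fail rules.
by_count = "pass" if len(pass_cfs) > len(fail_cfs) else "fail"

# Method 2: sum the confidence factors (229% for pass, 167% for fail).
by_sum = "pass" if sum(pass_cfs) > sum(fail_cfs) else "fail"

# Method 3: best rule wins, adjusted for the 70%/30% pass/fail proportion
# of files in the source data (87 - 70 = 17 against 65 - 30 = 35).
adj_pass = max(pass_cfs) - 70
adj_fail = max(fail_cfs) - 30
by_best_adjusted = "pass" if adj_pass > adj_fail else "fail"
```

As the text notes, the methods disagree: counting and summing both give a pass, while the adjusted best rule gives a fail.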
However none of these approaches provides a reliable outcome as to the likelihood of the file passing or failing.
In accordance with another aspect of the present invention the system is adapted to carry out a calculation using the following factors:
CFx = Confidence factor associated with rule x
tp = Number of pass rules triggered
tf = Number of fail rules triggered
p = % of pass files in "KNOWN"
f = % of fail files in "KNOWN"
m = Number of pass rules in the rule database
n = Number of fail rules in the rule database
Using these factors, pass and fail scores are determined as follows:-
PASS SCORE =

(ADJUSTMENT1 + (sum of CFx over triggered pass rules)/tp - (sum of CFx over triggered fail rules)/tf) / m

FAIL SCORE =

(ADJUSTMENT2 + (sum of CFx over triggered fail rules)/tf - (sum of CFx over triggered pass rules)/tp) / n
The adjustment factors are based on the software system from which the rules were derived, on the known history of the software system about which the prediction is to be made, and also on the required sensitivity level of the prediction tool. The two adjustment factors should always sum to 100. The default values used for both adjustment factors are 50 and 50.
These values would be suitable if the system being analysed is newly developed code, and if it was equally important to avoid false fails and false passes.
If the system which is being analysed has already been heavily maintained then the prediction technique will generally produce more false fails and fewer false passes. This can be compensated for by increasing the pass adjustment factor (to 60) and reducing the fail adjustment factor (to 40).
If the intention is to minimize the number of false passes then this effect can be achieved by reducing the pass adjustment factor and increasing the fail adjustment factor.
If the pass score is 10% or more greater than the fail score then PASS, else FAIL.
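One plausible reading of the score calculation and the decision rule just given is sketched below; the exact form of the score expressions, and the illustrative values chosen for m and n, are assumptions.

```python
def scores(pass_cfs, fail_cfs, adj_pass, adj_fail, m, n):
    """Pass/fail scores: adjustment factor plus the difference of the
    mean CFs of the triggered rules, normalised by the rule counts."""
    tp, tf = len(pass_cfs), len(fail_cfs)
    mean_p = sum(pass_cfs) / tp if tp else 0.0
    mean_f = sum(fail_cfs) / tf if tf else 0.0
    return (adj_pass + mean_p - mean_f) / m, (adj_fail + mean_f - mean_p) / n

def verdict(pass_score, fail_score):
    # If the pass score is 10% or more greater than the fail score: PASS.
    return "PASS" if pass_score >= 1.10 * fail_score else "FAIL"

# Table B figures with the default 50/50 adjustments; m = n = 200 is an
# arbitrary illustrative size for the rule database.
p, f = scores([87, 16, 54, 32, 24, 16], [50, 26, 14, 12, 65],
              adj_pass=50, adj_fail=50, m=200, n=200)
result = verdict(p, f)
```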
TEST RESULTS
EFFECTS OF ADJUSTMENTS
The expert system just described can also be used in combination with a neural network to evaluate software. The resulting system is known as a Code Measurement Toolkit (CMT). This provides an integrated environment for the code analysis and maintainability assessment of C and COBOL code. CMT can be used to:
- Predict the likelihood of future change to a file.
- Use two Maintainability Indices to assess the maintainability of code.
- Use a comprehensive set of industry leading software metrics including data flow measures to assess the complexity, testability and readability of code.
- Export a set of metrics for use by the Code Monitor successive release monitoring tool.
CMT consists of three main components:
1. Code measurement.
2. File change prediction capability.
3. Graphical user interface.
The Code Measurement Component utilises X-RAY, a code parsing and metrics tool developed by South Bank University, as the code parsing tool. Further code metrics are calculated from X-RAY's output using the Qualms tool and other appropriate algorithms. In total 17 system level, 58 source file level and 44 function level measures are made (function in this context is taken as being synonymous with perform and procedure). These metrics include measurements of the following aspects of source code:
- size
- processing complexity
- data complexity
- information flow between source code files and between functions
- testability
- maintainability
Some of the file level measures appear in a number of statistically derived forms: max, mean, weighted mean, and average density (the average density of a metric for a portion of source code is calculated by dividing the metric value by the number of non-comment, non-blank lines of code).
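The average density form just defined can be sketched as follows; the comment convention recognised here ("//" lines only) is a simplification of real C comment handling.

```python
def average_density(metric_value, source_lines):
    """Divide a metric value by the number of non-comment, non-blank
    lines of code (NCNB LOC) in the given source."""
    ncnb = sum(1 for line in source_lines
               if line.strip() and not line.strip().startswith("//"))
    return metric_value / ncnb if ncnb else 0.0

# Illustrative fragment: 5 lines, of which 3 are non-comment, non-blank.
src = ["int f(void) {", "  // a comment", "", "  return 0;", "}"]
density = average_density(12.0, src)
```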
The range of measures made is very broad which enables a very good understanding of various attributes of the code to be obtained. Current commercially available tools do not provide the ability to make such a broad range of measures .
The File Change Prediction Capability, which predicts the likelihood of a file changing in the future, is based on the analysis of the measures produced by CMT. Two expert systems, a neural net and a rules based expert system, are used to independently analyse the measurements. The results from the two expert systems are then compared to allow a judgement to be made.
To enable the two expert systems to analyse the code measurements they first have to be trained on a set of
source code which has a known change history. The source code is measured and the measurements and change history are analysed by a data mining engine to extract the rules for the rules based expert system and by a neural network retraining tool. In one embodiment the code and change history of release 32 of four sub-systems was used for training CMT. The sub-systems were of varying size and contained in total approximately 1 million lines of source code. The file change predictions were validated by checking the predictions of change against the amount of change that had already occurred to the code used for training. The predictions were found to be > 90% accurate. The prediction accuracy was also tested with a C source code system, of some 90 source files, for which the source code for the first and each subsequent release was available. The source code for the first release was measured. The predictions of change showed all the files as "Least Likely to Change"; this was compared with the number of times that each file actually had been changed. The source code was of generally good quality and the predictions made by CMT were correct.
Once training is complete CMT can then be used to make change predictions of other source code. The system is, as shown in Figure 2, provided with a graphical user interface.
The file change prediction results for the set of source files measured are presented in a "File Change Predictions" window, as shown in Figure 4. The window is composed of three scrolled lists 25, 26 and 27.
The topmost list 25, which is red, contains all the files which have been given a FAIL rating, i.e. those files which are most likely to change in the future.
The middle list 26, which is amber, contains all of the files which have not been given a rating. This occurs if the two expert systems give conflicting results for a file, e.g. the neural net passes a file which the rules based expert system fails.
The bottommost list 27, which is green, contains all of the files that have been given a PASS rating, i.e. those files which are least likely to change in the future.
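The mapping of the two expert systems' verdicts onto the three lists can be sketched as follows; the file names and verdicts used are illustrative.

```python
def classify(neural, rules):
    """Agreement on fail -> red list; agreement on pass -> green list;
    any conflict between the two expert systems -> amber list."""
    if neural == rules == "fail":
        return "red"     # most likely to change in the future
    if neural == rules == "pass":
        return "green"   # least likely to change in the future
    return "amber"       # the expert systems give conflicting results

verdicts = {"abc.c": ("fail", "fail"),
            "abd.c": ("pass", "fail"),
            "abe.c": ("pass", "pass")}
lists = {f: classify(n, r) for f, (n, r) in verdicts.items()}
```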
Naturally colour coding is only available on a colour display.
Code maintainability is measured using the Hewlett Packard Maintainability Index. The results are presented in a window of similar format to the file change predictions, an example is given in Figure 5.
In Figure 5 the red Poor Maintainability list shown at 30 shows files scoring less than 65. These files are very likely to be difficult to maintain in the future.
The yellow Reasonable Maintainability list 31 shows files scoring between 65 and 85. These files are likely to be quite maintainable, though may have some difficult areas.
The green Good Maintainability list 32 shows files scoring over 85. These files are likely to be easy to maintain.
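The three maintainability bands just described can be expressed as a simple classification; treating a score of exactly 85 as "reasonable" is an assumption, since the specification states "between 65 and 85" and "over 85".

```python
def hp_band(score):
    """Band a Hewlett Packard Maintainability Index score."""
    if score < 65:
        return "poor"        # likely to be difficult to maintain
    if score <= 85:
        return "reasonable"  # quite maintainable, possibly difficult areas
    return "good"            # likely to be easy to maintain
```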
The prediction of likelihood of change can be used together with the maintainability measure to identify code that would be a good candidate for improvement. For example if a particular source file is likely to change a lot in the future and it is unmaintainable then it would be a good investment to improve the quality of the code so that future changes are easier to perform and involve less risk.