AbstractUsing current software engineering technology, the robustness required for safety critical software is not assurable. However, different approaches are possible which can help to assure software robustness to some extent. For achieving high reliability software, methods should be adopted which avoid introducing faults (fault avoidance); then testing should be carried out to identify any faults which persist (error removal). Finally, techniques should be used which allow any undetected faults to be tolerated (fault tolerance).
The verification of correctness in system design specification and performance analysis of the model, are the basic issues in concurrent systems. In this context, modeling distributed concurrent software is one of the most important activities in the software life cycle, and communication analysis is a primary consideration to achieve reliability and safety. By and large fault avoidance requires human analysis which is error prone; by reducing human involvement in the tedious aspect of modelling and analysis of the software it is hoped that fewer faults will persist into its implementation in the real-time environment.
The Occam language supports concurrent programming and is a language where interprocess interaction takes place by communications. This may lead to deadlock due to communication failure. Proper systematic methods must be adopted in the design of concurrent software for distributed computing systems if the communication structure is to be free of pathologies, such as deadlock. The objective of this thesis is to provide a design environment which ensures that processes are free from deadlock.
A software tool was designed and used to facilitate the production of fault-tolerant software for distributed concurrent systems. Where Occam is used as a design language then state space methods, such as Petri-nets, can be used in analysis and simulation to determine the dynamic behaviour of the software, and to identify structures which may be prone to deadlock so that they may be eliminated from the design before the program is ever run. This design software tool consists of two parts. One takes an input program and translates it into a mathematical model (Petri-net), which is used for modeling and analysis of the concurrent software. The second part is the Petri-net simulator that takes the translated program as its input and starts simulation to generate the reachability tree. The tree identifies `deadlock potential' which the user can explore further.
Finally, the software tool has been applied to a number of Occam programs. Two
examples were taken to show how the tool works in the early design phase for fault prevention before the program is ever run.
|Date of Award||May 1992|
|Supervisor||Geoffrey F Carpenter (Supervisor)|
- software reliability
- fault tolerance software
- concurrent software
- fault avoidance