Abstract

This paper reviews the development of error recovery structures that support general fault tolerance, and describes a new object-oriented scheme for error recovery in concurrent systems that generalizes existing schemes based on either conversations or transactions. This new scheme, which is based on what we term a Coordinated Atomic Action, is intended to make it easier to tolerate hardware and software faults, as well as faults that have affected the environment of the computer system, in programs that involve cooperating concurrent processes and the use of shared resources.

From Recovery Blocks to Concurrent Atomic Actions
Randell, B., Romanovsky, A., Rubira-Calsavara, C.M.F., Stroud, R.J., Wu, Z. and Xu, J.
In Predictably Dependable Computing Systems,
Randell, B., Laprie, J-C., Kopetz, H. and Littlewood, B. (eds.), pp 87-101
ESPRIT Basic Research Series,
Springer-Verlag, Brussels, 1995, ISBN 3-540-59334-9