The present invention relates generally to fault tolerant computer
processors, and more particularly, to a voted processing system.
The natural radiation environment on Earth and in space can often
cause short term and long term degradation of semiconductor devices used in computers.
This hazard is a problem for computers where fault-free operation is required.
In addition to these radiation effects, computer chips are subject to random failures
due to undetected defects and weaknesses that evolve over the course of time. Trace
radioactive materials in semiconductor packages may also cause faults.
When computers must operate for long periods in a remote environment,
or where these devices must operate without fault for long periods of time, the
need for systems that are protected from faults or failure becomes critical. Remote
or vulnerable environments include remote oil platforms, submarines, aircraft and
isolated sites such as Antarctica. Systems that operate in Earth orbit and beyond
are especially vulnerable to this radiation hazard.
The presence of cosmic rays and particularly high-energy particles
in space can produce a disturbance called a single event effect (SEE) or a single
event upset (SEU). The magnetic field of the Earth deflects particles. The Earth's
magnetic field also traps charged particles that travel from the Sun and other
stars toward the Earth. Some particles that are not trapped by the Earth's magnetic
field are steered by that field into our atmosphere near the poles. These particles
can penetrate the electronic devices aboard satellites.
When high-energy particles and gamma rays penetrate a semiconductor
device, they deposit charge within the computer circuit and create transients and/or
noise. This phenomenon can "upset" the memory circuits. One type of upset occurs
when a single bit of data stored in the chip's memory changes its value due to
radiation. In this instance, a logical value of "one" can change to a logical value
of "zero" and vice versa. An upset may be generally defined as a misstated output
of a component. This output may comprise one or more signal bits.
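The single-bit upset described above can be modelled as one inverted bit in a stored word. The following is a minimal sketch for illustration only; the function name `flip_bit`, the 8-bit word width, and the sample value are assumptions and do not appear in the original disclosure.

```python
def flip_bit(word: int, bit: int, width: int = 8) -> int:
    """Return `word` with one bit inverted, modelling a single event upset."""
    if not 0 <= bit < width:
        raise ValueError("bit index outside the word")
    mask = (1 << width) - 1
    # XOR with a one-hot mask turns a stored one into a zero and vice versa.
    return (word ^ (1 << bit)) & mask


stored = 0b10100110          # hypothetical memory contents
upset = flip_bit(stored, 2)  # radiation inverts bit 2
changed_bits = bin(stored ^ upset).count("1")
```

Here `changed_bits` counts how many bit positions differ between the stored and upset words, which for a single-bit upset is exactly one.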
The upset rate of a component depends on the construction features
of the chip, including its size, operating voltage, temperature and internal circuit
design. The upset rate for a particular part can vary from ten per day for a commercial
one-megabit random access memory (RAM) chip, to one every 2,800 years for a radiation-hardened
one-megabit RAM. A radiation-hardened component is a device that has been specially
designed and built to resist the hazards of radiation. These devices, however,
tend to be much more expensive and slower than conventional chips. They typically
lag the state-of-the-art by several years.
A solution to this problem is presented in U.S. Patent No. 5,903,717,
which discloses a fault tolerant computer system. This approach describes a fault
tolerant computer system in which four RISC CPUs are directly attached to voting
logic and are operated in lock-step. Such an approach protects the CPUs from the
effects of ionizing radiation in space, but it does not protect the peripheral
logic (e.g., the Memory Interface and the Bus Interface).
The development of a fault tolerant computer based on commercially
available parts for use in military and commercial space vehicles would offer significant
operational and cost advantages. Such an invention would offer higher levels of
performance and would cost less to manufacture than existing approaches based on
radiation hardened chips. The invention could be used for remotely installed computer
systems and other processors that are subjected to random failures or to a radiation
environment which produces single event upsets at unacceptably high rates. Such
radiation upset protection would discover and correct errors. It would be extremely
beneficial if a fault tolerance method could be applied not only at the CPU level,
but also at the peripheral logic level. Such a system would fill a long felt need
in specialized computer and satellite industries.
SUMMARY OF THE INVENTION
It is, therefore, an object of the invention to provide an improved
and reliable voted processing system. Another object of the invention is to harden
the peripheral logic of a processing system against radiation.
In one embodiment of the invention, a voted processing system includes
at least three processor groupings coupled to a voter. Each processor grouping
includes a central processing unit (CPU) and a support logic device. The CPUs operate
synchronously to execute an operating step every clock cycle. Each operating step
of each CPU is accomplished in parallel and substantially simultaneously with each
other. The support logic devices, such as memory controllers or bus interfaces,
are coupled to the CPUs. The voter is coupled to all of the processor groupings.
The voter uses redundant voting to detect errors in any one of the processor groupings.
An error is detected if a minority of the processor groupings disagrees with a
majority of the processor groupings. When an error is detected, the majority of
processor groupings are considered the correct output while the remaining processor
groupings are reset.
The present invention thus achieves an improved voted processing
system. The present invention is advantageous in that it allows commercial,
non-radiation-hardened components to be used within a fault sensitive system.
Additional advantages and features of the present invention will
become apparent from the description that follows, and may be realized by means
of the instrumentalities and combinations particularly pointed out in the appended
claims, taken in conjunction with the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
In order that the invention may be well understood, there will now
be described some embodiments thereof, given by way of example, reference being
made to the accompanying drawings, in which:
- FIGURE 1 depicts a satellite system in which a voted processing circuit in
accordance with the present invention may be utilized; and
- FIGURE 2 schematically illustrates a voted processing circuit in accordance
with one embodiment of the present invention.
BEST MODES FOR CARRYING OUT THE INVENTION
Referring to FIGURE 1, a satellite system 10 in which a voted processing
circuit in accordance with the present invention might be utilized is illustrated.
The satellite system 10 is comprised of one or more satellites 12 in communication
with a ground station 14 located on the Earth 16. Each satellite 12 contains one
or more voted processing circuits 18.
The satellite system 10 is responsible for ensuring correct processor
operation while being subjected to radiation. Integrated circuits used in computers
and other electronic systems aboard space vehicles are susceptible to a phenomenon
known as Single Event Upset, or SEU. Single Event Upset occurs when radiation passing
through an integrated circuit deposits stray charges in the device, causing one
of its registers to be disrupted. Several fault protection techniques can be utilized
to reduce the number of SEUs that occur in the integrated circuits used aboard
space vehicles, but these conventional techniques have several disadvantages.
Referring to FIGURE 2, a schematic of a voted processing circuit
18 in accordance with one embodiment of the present invention is illustrated. Voted
processing circuit 18 includes three or more processor groupings 20 coupled to
a voter 22. Each processor grouping 20 includes a central processing unit 24 and
a support logic device 28 and has a plurality of processor grouping inputs and
outputs. In one preferred embodiment of the present invention, three processor
groupings 20 are used; however, one skilled in the art would recognize that any
number of processor groupings greater than three may be used.
In a voted processing circuit 18 containing three processor groupings
20, three CPUs 24, one corresponding to each processor grouping 20, are required.
Each CPU 24 receives a clock signal 26 and executes an operating step during a
clock cycle of clock signal 26. Each CPU 24 operates synchronously, each operating
step of each CPU 24 being accomplished in parallel and substantially simultaneously
with each other CPU 24 for each clock cycle. Each CPU 24 includes a plurality of
CPU inputs coupled to support logic device 28 through line 30 and a plurality of
CPU outputs coupled to support logic device 28 through line 32.
In a voted processing circuit 18 containing three processor groupings
20, three support logic devices 28, one corresponding to each processor grouping
20, are required. One skilled in the art would recognize that support logic device
28 may include any type of peripheral or bus support logic component. These components
may include a memory system, a memory controller, a system memory, or a bus interface
controller. Each support logic device 28 includes a plurality of support logic
device inputs coupled to CPU 24 through line 32 and a system bus 34 through line
36. Each support logic device also includes a plurality of support logic device
outputs coupled to CPU 24 through line 30 and voter 22 through line 38.
Voter 22 is coupled to each processor grouping 20 through line 38
and is responsible for detecting output errors and resetting each processor grouping
20. Voter 22 uses redundant voting of the processor grouping outputs. An error
is detected if a minority of the processor grouping outputs disagrees with a majority
of the processor grouping outputs. In the present example, when three processor
groupings 20 are used, a single disagreeing grouping is the minority. Each processor
grouping output is compared one with another by voter 22 each clock cycle. Voter 22
includes a plurality
of voter inputs coupled to support logic device 28 through line 38 and a plurality
of voter outputs coupled to system bus 34 through line 44.
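The majority comparison performed by voter 22 can be sketched in software as follows. This is a simplified illustration and not the voter hardware itself; the function name `vote`, the integer representation of grouping outputs, and the return convention are assumptions introduced here.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def vote(outputs: Sequence[int]) -> Tuple[Optional[int], List[int]]:
    """Compare the grouping outputs for one clock cycle.

    Returns (majority_value, indices_of_disagreeing_groupings).
    majority_value is None when no strict majority exists.
    """
    if not outputs:
        raise ValueError("at least one grouping output is required")
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        # No strict majority: every grouping is suspect.
        return None, list(range(len(outputs)))
    # The majority value is taken as correct; the rest form the minority.
    return value, [i for i, out in enumerate(outputs) if out != value]


no_fault = vote([5, 5, 5])     # all three groupings agree
one_fault = vote([5, 7, 5])    # grouping 1 is the minority
no_majority = vote([1, 2, 3])  # no two groupings agree
```

With three groupings, a single disagreeing output is reported as the minority while the value shared by the other two is passed on as the correct output.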
When voted processing circuit 18 is reset through reset input 42,
all of the CPUs 24 and all of the support logic devices 28 are set substantially
to the same state. When voted processing circuit 18 starts running, each individual
processor grouping 20 runs in lock step with all the other processor groupings
20. Typically, all processor groupings 20 will agree on any outputs that they
generate (this is the non-fault state of voted processing circuit 18). In the event
that one processor grouping 20 generates a signal that is in disagreement with
the other signals, voter 22 initiates a recovery process.
In the recovery process, the processor grouping 20 with the error
is immediately reset through line 40. This puts it back into a known state. The
remaining processor groupings are interrupted by voter 22 indicating that a fault
has occurred. When interrupted, the remaining processor groupings 20 start saving
any vital state information into main system memory. Upon completion of this interrupt
process, the fault detection logic resets all processor groupings 20 through line
40, whereupon they start executing code again from a state defined operating step
with minimal disruption.
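The recovery sequence just described can be walked through in a small software model. This is a sketch under stated assumptions: the `Grouping` class, the `recover` function, the dictionary used as main system memory, and the restore step at the end are all hypothetical and serve only to make the order of operations concrete.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Grouping:
    """Hypothetical stand-in for one processor grouping 20."""
    state: Dict[str, int] = field(default_factory=dict)
    resets: int = 0

    def reset(self) -> None:
        # A reset returns the grouping to a known (here: empty) state.
        self.state = {}
        self.resets += 1


def recover(groupings: List[Grouping], faulty: int, memory: Dict[str, int]) -> None:
    # 1. Reset the disagreeing grouping immediately.
    groupings[faulty].reset()
    # 2. Interrupt the remaining groupings; they save vital state to memory.
    #    Assumption: the healthy groupings agree, so one copy is representative.
    healthy = [g for i, g in enumerate(groupings) if i != faulty]
    memory.update(healthy[0].state)
    # 3. Reset every grouping so that all return to the same state.
    for g in groupings:
        g.reset()
    # 4. Resume from the saved state (assumed restore mechanism).
    for g in groupings:
        g.state = dict(memory)


groupings = [Grouping({"pc": 100}), Grouping({"pc": 999}), Grouping({"pc": 100})]
main_memory: Dict[str, int] = {}
recover(groupings, faulty=1, memory=main_memory)
```

In this model the faulty grouping is reset twice (once on detection and once with the others), and all three groupings end in the same saved state.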
Typically, when a processor grouping error occurs, the present invention
will successfully save vital information, resynchronize the processor groupings
20 and resume normal execution of code. However, this is not always the case. It
is possible for upsets to occur while voted processing circuit 18 is attempting
to recover from a previous error. While the recovery interrupt is being processed,
voter 22 continues to monitor the outputs of the remaining processor groupings
20. Should voter 22 detect further disagreements such that a majority of the remaining
processor groupings 20 are not known to be in complete agreement, it will declare
a fatal error and immediately reset all processor groupings 20 through line 40.
Once processor groupings 20 have all been reset, voted processing circuit 18 will
start executing code from a hardware defined operating step.
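The decision made by voter 22 while a recovery interrupt is in progress can be sketched as a check on the outputs of the remaining groupings. The function name `recovery_action`, the returned strings, and the treatment of an empty input are assumptions made for illustration; the strict-majority test is one reading of "a majority of the remaining processor groupings 20 are not known to be in complete agreement".

```python
from collections import Counter
from typing import Sequence


def recovery_action(remaining_outputs: Sequence[int]) -> str:
    """Decide whether recovery may continue or a fatal error is declared."""
    if not remaining_outputs:
        # Assumption: with nothing left to compare, agreement cannot be known.
        return "fatal_reset"
    _, count = Counter(remaining_outputs).most_common(1)[0]
    if count * 2 > len(remaining_outputs):
        # A strict majority of the remaining groupings still agree.
        return "continue_recovery"
    # Agreement is lost: reset all groupings to the hardware-defined step.
    return "fatal_reset"


two_agree = recovery_action([5, 5])        # three groupings, one already reset
two_disagree = recovery_action([5, 7])     # second upset during recovery
three_of_four = recovery_action([5, 5, 5, 7])
split_four = recovery_action([5, 5, 7, 7])
```

With three groupings, any disagreement between the two that remain leads to the fatal reset; with more groupings, recovery can tolerate a further upset as long as a strict majority of the remainder still agree.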
The present invention thus achieves an improved system for preventing
single event upset induced errors by incorporating the processor support logic
before voter 22. While voter logic is susceptible to SEU-induced errors, the number
of transistors involved in the voter logic is far smaller than the number of peripheral
logic transistors. Therefore, system exposure to SEUs is greatly reduced and consequent
system reliability is increased. Without the invention, satellite system 10 will
require ground intervention about once every thirty years. The majority of this
is due to errors in processor support logic. With the present invention, satellite
system 10 will require ground intervention about once every one hundred twenty
years, resulting in a four-fold improvement.
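The four-fold figure follows directly from the two intervals quoted above, as the short calculation below shows. The variable names are illustrative, and attributing the entire reduction to the protected support logic is an assumption made here, not a statement from the disclosure.

```python
from fractions import Fraction

# Ground interventions per year, taken from the intervals quoted in the text.
rate_without = Fraction(1, 30)   # about once every thirty years
rate_with = Fraction(1, 120)     # about once every one hundred twenty years

# Ratio of intervention rates: 120 / 30.
improvement = rate_without / rate_with

# Share of the original intervention rate that is eliminated.
# Assumption: the whole reduction comes from voting the support logic.
removed_share = (rate_without - rate_with) / rate_without
```

Under that assumption, three quarters of the original interventions are eliminated, which is consistent with the statement that the majority of them are due to errors in processor support logic.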
From the foregoing, it can be seen that there has been brought to
the art a new and improved voted processing system. It is to be understood that
the preceding description of the preferred embodiment is merely illustrative of
some of the many specific embodiments that represent applications of the principles
of the present invention. Clearly, numerous other arrangements would be evident
to those skilled in the art without departing from the scope of the invention as
defined by the following claims: