This application pertains to the art of neural networks, and more
particularly to an optically addressed neural network according to the preamble
of claim 1.
The invention is particularly applicable to extraction of information
from visual images and will be described with particular reference thereto. However,
it will be appreciated that the invention has broader applications, such as in
efficient fabrication of any neural network.
A concept similar to the one outlined in the preamble of claim
1 is taught in the article "Optically programmed neural network capable of stand-alone
operation", Applied Optics, 32 (1993), September 10, No. 26, by Richard G. Stearns.
The field of artificial intelligence-based computing is expanding
rapidly. Many of the fundamental concepts that have been developed in the last
few years are already reaching the stage of second or third generation in sophistication.
One such area is the subdivision of artificial intelligence comprising neural networks.
A first generation neural network required functional, digital computing
units which were interconnected by weighting values. Such functional units could
be comprised of independent hardware processors, or alternatively implemented by
software. In either instance, complex and extensive digital computing power was required.
More recently, it has been recognized that neural networks may be
realized by a combination of electronics and optics. In such systems, a combination
of photoconductive elements and lighting applied thereto was implemented to create
a neural network. See, for example, Stearns, Richard G., Trainable Optically
Programmed Neural Network, Applied Optics, Vol. 31, No. 29, October 10, 1992,
This system provided operation of a neural network with optically-addressed
weighting, constructed from a two-dimensional photoconductor array that is masked
by a liquid-crystal display (LCD). A fully trainable three-layer perceptron
network was demonstrated using this architecture, which was capable of operating
in a completely standalone mode, once trained. In the previous work, data was input
to the network electronically, by applying voltages to the photoconductor array.
Thus, it is the object of the present invention to provide a network
architecture that may be extended to allow direct optical input.
According to the present invention, this object is achieved by either of
the optically addressed neural networks having the features set out in claims 1
and 9, and by a neural method according to claim 10.
Once trained, the network is capable of processing in real time an
image projected onto it, in a completely standalone mode.
Artificial neural networks appear to be naturally suited to a number
of image processing problems, including for example pattern recognition. This results
in part from their inherent parallel architecture, as well as their ability to
perform well in the presence of image noise and degradation. It would follow that
a compact hardware architecture that could combine optical image capture and neural
network processing would be of significant interest. The subject application teaches
such an architecture, which combines real-time image capture and neural network
classification within a single processing module.
Preferably, the first subset is comprised of first and second rectangular
sub-portions sharing no common column conductor therebetween. The neural network
preferably further comprises a second rectangular subset of array elements sharing
at least one of no column conductor and no common column conductor with one of
the first and second rectangular sub-portions.
In accordance with a more limited aspect of the subject invention,
the input lines are provided by an image capture mechanism disposed within the
two-dimensional array of photoconductors itself.
Preferably, the mask includes a first window portion defining light
passing therethrough so as to affect a subset $w_{kj}$ of the photoconductors,
which subset is defined by a subset of the columns of the photoconductors in the
Preferably, the mask includes a second window portion defining light
passing therethrough so as to affect a subset $w_{lk}$ of the photoconductors,
which subset is defined by a subset of the columns of the photoconductors not disposed
in the J rows. Preferably, the mask includes a generally opaque portion so as to
prevent light from substantially affecting the photoconductors on all areas of
the two-dimensional array other than $w_{kj}$ and $w_{lk}$. Preferably,
the mask includes a third window portion so as to communicate an image to an area
$w_{ji}$ of the photoconductors, which area shares no column conductor or
row conductor with the conductors of $w_{kj}$ or $w_{lk}$, and wherein column
conductors associated with photoconductors of the area $w_{ji}$ comprise
the J input conductors. Preferably, photoconductors of the areas $w_{kj}$
and $w_{lk}$ share no column conductor and no row conductor. Preferably,
the second subset is formed from amplified column signals associated with one-half
the number of column conductors of $w_{lk}$.
Regarding the neural method according to claim 10, preferably, the
step of selectively controlling includes the step of selectively maintaining subportions
of the mask at a substantially opaque transmissivity.
The method preferably further comprises the steps of: receiving an
optical image into an imaging array of photoconductors; and communicating an output
signal from the imaging array generated as a result of a received optical image
to each of the array nodes of the first defined subset as the input signal. The
method preferably further comprises the step of outputting a portion of the amplified
signals, unique to the array feedback portion of the array nodes being unique to
the first defined subset, as an output signal. Preferably, the step of
impressing includes the step of impressing the amplified signals to
the array feedback portion defined as unique to the first defined subset.
An advantage of the present invention is the provision of a highly
efficient, compact neural network system.
Yet another advantage of the present invention is the provision of
a highly-integrated image recognition system.
Yet a further advantage of the present invention is the provision
of an accurate neural network which may be fabricated inexpensively.
The invention may take physical form in certain parts and arrangements
of parts, a preferred embodiment and method of which will be described in detail
in this specification and illustrated in the accompanying drawings which form a
part hereof, and wherein:
- FIGS. 1(a) and 1(b) illustrate, in schematic form, an architecture of a three-layer
perceptron network of the subject invention;
- FIGS. 2(a)-2(k) illustrate modified forms of the architecture of FIG. 1 wherein
an incident image is projected onto an array of photoconductors to accomplish specified
- FIGS. 3(a)-3(j) illustrate a series of examples in connection with a training
set for handprinted digit recognition;
- FIG. 4 illustrates a network training error for the example handprinted digit
- FIGS. 5(a)-5(j) illustrate, in histogram form, a typical real-time classification
performance of a trained network of the subject invention;
- FIGS. 6(a)-6(h) illustrate several examples of images used to train the network
to recognize a series of human faces;
- FIG. 7 illustrates a suitable liquid crystal display pattern used to filter
an optical image in training a network to recognize faces, such as was provided
in FIG. 6;
- FIG. 8 illustrates a network training error for facial recognition;
- FIGS. 9(a)-9(j) illustrate examples of real-time classification performance
of a trained network; and
- FIG. 10 illustrates a schematic of an alternative embodiment of a suitable system
implementation of the subject invention.
FIG. 1 illustrates an optically-addressed neural network A of the
The key elements of the optically-addressed neural network have been
described in detail elsewhere, and hence will only be briefly discussed here. The
network is based upon the combination of a two-dimensional array of photoconductors,
with an LCD that is aligned above the photoconductor array. The photoconductor
array is then illuminated through the LCD. A schematic 10 of the two-dimensional
photoconductor array 12 is shown in Fig. 1(a). It consists of a grid of 128 horizontal
lines 14 and 128 vertical lines 16, with an interdigitated a-Si:H photoconductive
sensor fabricated at each node 18 of the grid. Thus at each node 18 is a resistive
interconnection, which is optically programmed by varying the light incident upon
it through a generally aligned LCD panel 20. The system is suitably configured
so that voltages are applied along the horizontal lines 14 of the array, with currents
measured through the vertical lines 16, which are advantageously held at virtual
ground potential. The vertical lines 16 are suitably paired to allow bipolar weighting
in the neural network. The 64 pairs of vertical lines 16 are fed into 64 nonlinear
differential transresistance amplifiers 30. The outputs of these 64 amplifiers
are routed back to the bottom 64 horizontal conducting lines 24 of the sensor array
(see Fig. 1b).
The pitch of the photoconductor array in both the horizontal and vertical
directions is suitably 272 µm. An active-matrix LCD 20, whose pixel pitch is 68 µm,
is aligned in the preferred embodiment directly above the sensor array 12, so that
groups of 4x4 LCD pixels mask each photoconductive sensor beneath. The LCD 20 and
sensor array 12 are then illuminated by a relatively collimated light source, such
as a common 35mm projector.
In Fig. 1(b) is shown schematically how the system is driven to implement
a standard perceptron network with one hidden layer. The LCD array 20 is maintained
in a minimum transmission state, except for two rectangular areas which correspond
to the weight fields $w^{I}_{kj}$, between the input layer 24 and
hidden layer 26, and $w^{II}_{lk}$, between the hidden and output layers
of the network, respectively. In the system of Fig. 1(b) the network has J input
neurons, which correspond to J voltages applied to a subset of the top 64 horizontal
lines of the sensor array. There are K hidden-layer neurons, which, because of the
vertical line pairing, make use of 2K vertical lines on the sensor array. Thus
the differential current that flows through each of the K vertical line pairs of
the sensor array 12 corresponds to the input of a hidden-layer neuron. The nonlinear
transresistance amplifiers perform a sigmoidal transformation of the input signals,
and their output voltages correspond to the hidden-layer neuron output values.
By varying the transmission of pixels of the LCD array 20 in the rectangular region
$w^{I}_{kj}$, the interconnection weights between the input-layer and
hidden-layer neurons are adjusted.
A similar rectangular region ($w^{II}_{lk}$) is defined between the
K horizontal lines of the photoconductor array 12 which carry the voltages of the
hidden-layer neuron outputs, and L vertical line pairs (2L vertical lines), which
carry differential currents corresponding to the output-layer neurons. This second
rectangular region $w^{II}_{lk}$ defines the interconnection weights
between the hidden-layer and output-layer neurons of the network. Output lines 24
are shown connected to a nonlinear current-to-voltage conversion unit or amplifier
Because 16 pixels of the LCD array 20 lie above each photoconductive
sensor, and two neighboring photoconductive sensors are used to produce
one bipolar weight in the network, there are 32 LCD pixels used for each interconnection
weight. The LCD is capable of generating approximately 16 levels of gray per pixel,
which should in theory allow a large number of weight levels. In the present embodiment,
129 levels of bipolar weighting are implemented, using a halftoning scheme that
allows one LCD pixel of the 32 to be driven in a gray-scale mode at any given time,
with all others driven in binary fashion. This approach is suitably taken to minimize
the effect of LCD pixel nonuniformity, which is most pronounced when pixels are
driven at intermediate gray values.
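The pixel and level counts in the preceding paragraph can be illustrated with a short sketch. The text gives 16 LCD pixels per sensor, 32 pixels per bipolar weight, 129 bipolar levels, and at most one pixel driven in gray scale at a time. The split of each sensor's range into 16 pixels times 4 gray sub-steps, the rule that only one of the two paired sensors is lit for a given weight sign, and the function names `halftone_pattern` and `bipolar_weight_patterns` are assumptions made here so that the counts agree; the actual drive scheme may differ.

```python
PIXELS_PER_SENSOR = (272 // 68) ** 2          # 4 x 4 = 16 LCD pixels over each sensor
GRAY_STEPS = 4                                # assumed: sub-steps of the one gray pixel
MAX_LEVEL = PIXELS_PER_SENSOR * GRAY_STEPS    # 64 steps above zero per sensor


def halftone_pattern(level):
    """Return 16 pixel transmissions in [0, 1] for a sensor level in 0..64."""
    if not 0 <= level <= MAX_LEVEL:
        raise ValueError("level must lie in 0..64")
    full, part = divmod(level, GRAY_STEPS)
    pattern = [1.0] * full                    # pixels driven fully on (binary)
    if part:
        pattern.append(part / GRAY_STEPS)     # the single gray-scale pixel
    pattern += [0.0] * (PIXELS_PER_SENSOR - len(pattern))
    return pattern


def bipolar_weight_patterns(weight):
    """Quantize a weight in [-1, 1] to one of 129 levels (-64..64).

    Assumption: a positive weight lights only the sensor on the positive
    line, a negative weight only the sensor on the negative line.
    """
    level = max(-MAX_LEVEL, min(MAX_LEVEL, round(weight * MAX_LEVEL)))
    plus = halftone_pattern(max(level, 0))
    minus = halftone_pattern(max(-level, 0))
    return plus, minus


plus, minus = bipolar_weight_patterns(0.3)
# Effective quantized weight: mean transmission difference of the two sensors.
print(sum(plus) / PIXELS_PER_SENSOR - sum(minus) / PIXELS_PER_SENSOR)
```

Under these assumptions every one of the 129 levels uses at most one intermediate gray value among the 32 pixels, which is the property the text relies on to limit the effect of pixel nonuniformity.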
The LCD array 20 is suitably driven by a Sun IPX workstation. In addition,
the top 64 lines of the sensor array are driven by digital-to-analog converters,
which are suitably controlled by the same workstation. The workstation furthermore
is interfaced to analog-to-digital conversion circuitry, which monitors the outputs
of the 64 nonlinear transresistance amplifiers shown at the bottom of the sensor
array 12 in Fig. 1(b).
A suitable range of voltages applied to the top 64 horizontal lines
of the sensor array is ± 10V. The nonlinear transresistance amplifiers have a small
signal gain of $2 \times 10^{8}$ V/A (differential input current), and saturate
in a sigmoidal fashion at output levels of ± 10V. Associating the hardware network
with a theoretical network whose neurons obey a tanh(x) transformation relation,
it follows that a voltage of 10V in the hardware network corresponds to a neuron
output value of 1.0 in the theoretical network. In typical operation, a single
bipolar weight is found to produce a maximum differential current of 0.80 µA, resulting
in a voltage of 7.4V at the transresistance amplifier output, i.e. a neuron output
value of 0.74. This implies that a single weight in the hardware network is limited
to a maximum value corresponding to $\tanh^{-1}(0.74) = 0.95$ in the corresponding
theoretical network.
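The mapping from the measured 7.4 V output to a maximum weight value can be checked numerically. The sketch below assumes that the single weight is driven by an input at full scale (1.0, i.e. 10 V), so that the weight equals the argument of the tanh(x) transfer function; the constant names are illustrative only.

```python
import math

V_SAT = 10.0      # amplifier saturation level in volts (from the text)
V_SINGLE = 7.4    # output produced by one maximal bipolar weight, in volts

# Map the hardware voltage to the theoretical neuron output (10 V -> 1.0).
neuron_output = V_SINGLE / V_SAT

# Invert the tanh transfer function to recover the equivalent weight value.
max_weight = math.atanh(neuron_output)
print(round(max_weight, 2))
```

The result rounds to 0.95, the value quoted for the maximum single weight.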
Training occurs by adjustment of the interconnection weights, using
a standard backpropagation algorithm. As mentioned before, neuron output values
of the hardware network are mapped to those of the standard theoretical network
by equating ± 10V in the hardware network to outputs of ± 1.0 in the theoretical
network. The interconnection weights of the hardware network are conveniently mapped
to values between ± 1.0 in the theoretical network (within measurement error of
the maximum measured values ± 0.95). To train the network, backpropagation of the
hardware network output error is performed by the digital computer. In performing
the backpropagation, the output values of the hidden-layer and output-layer neurons
are measured directly, and an ideal tanh(x) neuron transfer function is assumed.
The weight values used in the backpropagation are the intended values which have
been programmed onto the LCD array 20. Thus any nonuniformity or nonideality in
implementation of the weighting through the LCD, or in the nonlinear transformation
of the transresistance amplifiers, is ignored in the backpropagation. The experimental
success of the network indicates that such nonidealities are compensated for in
the adaptive nature of the training algorithm.
The system of Fig. 1(b) functions well as a three-layer perceptron
network. Training in the hardware network is comparable to that of the ideal (theoretical)
network, and once trained, the hardware network may be operated as a standalone
processor, with a processing time of approximately 100 µs through the three-layer
In the system architecture of Fig. 1(b), input to the network occurs
along the horizontal lines 14 of the photoconductor array 12, as a series of applied
voltages. For many applications this may be appropriate. As discussed previously,
one possible exception may occur in the case of image processing, which represents
an important class of problems in the study of neural networks. For many image
processing tasks it may be more desirable to create a compact network architecture
that is able to sense an incident optical image and process the information directly.
Because the present network is based upon the use of a photosensitive array, it
is natural to investigate the extension of the system to allow optical image capture.
Turning next to Fig. 2(a), shown is an architecture of an optically-addressed
neural network 40 used to directly allow optical imaging to be incorporated into
the hardware neural network. A new region $w^{0}_{ji}$ of transmissivity
in the LCD array 42 is introduced into the system, through which an optical
image projected onto that window region is sensed. Therefore,
in this system, an incident image is confined to the upper left quadrant of the
sensor array, suitably covering a region of 64 x 64 sensor nodes. Both of the weight
fields $w^{I}_{kj}$ and $w^{II}_{lk}$ still exist,
although their location on the LCD 42 is changed, as indicated in the figure. Consideration
of the architecture reveals that the photoconductor array 12 essentially compresses
the incident 64 x 64 pixel optical image into 32 lines of data (the 32 vertical line
pairs illuminated by the incident image). These 32 lines are automatically routed
into the horizontal lines of the weight field $w^{I}_{kj}$. The vertical
lines of the weight field $w^{I}_{kj}$ correspond to the hidden-layer
neuron inputs, and their outputs are routed once again to the horizontal lines of
weight field $w^{II}_{lk}$. The output of the network is as usual
associated with the vertical lines of the weight field $w^{II}_{lk}$
and is provided to amplifier 44. The amplifier 44 provides separate amplification
to each signal line connected thereto.
Fig. 2(a) depicts only the column (vertical) lines being connected
to amplifier (set) 44. It is to be appreciated that amplification may be alternatively
provided, in whole, or in part, to selected horizontal (row) conductors. The dominant
constraint is that when an impressed voltage is made on one row or column conductor,
amplification must be associated with the complementary conductor of such node.
Two important points may be made immediately concerning the system
of Fig. 2(a). First, the system allows the input of an image that is much larger
(in terms of number of sensed pixels, here 64 x 64 = 4096) than the maximum input
size of the neural network itself. This is a useful feature: it is very typical
that two-dimensional images contain many more pixels than is convenient for direct
input into a neural network, and hence some preprocessing transformation of the
direct image is desirable (e.g. feature extraction). In the present system, the
image is integrated along vertical lines, corresponding to the summing of currents
in the photoconductor array. It is appreciated that for some applications, this
particular method of compressing the image information may not be optimal. The
issue will be discussed further below.
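A small sketch may help show what integration along vertical lines preserves and what it discards. It assumes a uniform, fully transmitting mask over the image window and uses a 4 x 4 toy image in place of the 64 x 64 sensor region; the helper names `compress_columns` and `pair_differences` are hypothetical.

```python
def compress_columns(image):
    """Sum a 2-D image (list of rows) down each column, mimicking the
    summed photocurrents on the vertical lines under a uniform mask."""
    return [sum(column) for column in zip(*image)]


def pair_differences(column_sums):
    """Difference of neighbouring column sums: one value per vertical line pair."""
    return [column_sums[2 * j] - column_sums[2 * j + 1]
            for j in range(len(column_sums) // 2)]


# Two toy images that differ only in the row at which a bright pixel appears.
top = [[0, 1, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]]
bottom = [[0, 0, 0, 0],
          [0, 0, 0, 0],
          [0, 0, 0, 0],
          [0, 1, 0, 0]]

same = (pair_differences(compress_columns(top))
        == pair_differences(compress_columns(bottom)))
print(same)
```

With a uniform mask the two images compress to the same vector, so vertical position is lost. A row-dependent mask pattern is what allows the compressed data to retain some information about vertical structure.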
A second important point concerning the system of Fig. 2(a) is that
the LCD array 42 may be used as a programmable filter for the imaging process.
Thus the LCD may be used to specifically tailor the transformation of the input
image, in a manner that is suited for a given processing task. For example, one
may program the LCD over the region $w^{0}_{ji}$ to be sensitive to
different spatial frequencies along different vertical lines of the incident image,
by impressing upon the LCD patterns of appropriate spatial frequency modulation.
More generally, as will be discussed below, the standard backpropagation training
algorithm may be extended to allow adjustment of the weight field $w^{0}_{ji}$.
This allows the three weight fields $w^{0}_{ji}$, $w^{I}_{kj}$,
and $w^{II}_{lk}$ to simultaneously be trained to perform a given
image processing task.
In the system of Fig. 2(a), the weight fields $w^{I}_{kj}$
and $w^{II}_{lk}$ correspond to the standard interconnection weights
of a three-layer perceptron network, and operate in a manner entirely comparable
to the original architecture of Fig. 1(b). In particular, the LCD array 42 is uniformly
illuminated over the areas corresponding to these weight fields, with incident
light intensity $I_0$. As discussed above, for the value of $I_0$
typically used in the system, the range of these weight values may be taken to
correspond to ± 1.0, when mapped to a theoretical network of tanh(x) transfer function.
The image to be processed is projected onto the LCD over the region
corresponding to the weight field $w^{0}_{ji}$. In this area, the
incident light intensity varies spatially, and hence the underlying photoconductive
sensors are affected by the product of the local LCD transmissivity and the incident
illumination. As illustrated in FIG. 2(b), each weight $w^{0}_{ji}$
corresponds to two photoconductive sensors, due to the differential pairing of
vertical lines on the array, when input to nonlinear transresistance amplifiers
44 of the system. A representative weight $w^{0}_{ji}$ is indicated
in Fig. 2(b), with two interdigitated sensors each masked by 16 LCD pixels. The
integrated transmissivities of the 16 LCD pixels above the two sensors are indicated
as $w^{0+}_{ji}$ and $w^{0-}_{ji}$ in Fig. 2(b). The
values $w^{0+}_{ji}$ and $w^{0-}_{ji}$ may be thought
of as the weights that would be programmed if the incident illumination upon the
region in question were equal to $I_0$ (in which case $w^{0}_{ji}
= w^{0+}_{ji} - w^{0-}_{ji}$). The actual incident
illumination is labelled $I^{+}_{ji}$ and $I^{-}_{ji}$, where
these values are normalized relative to $I_0$ (hence for a uniform illumination
of $I_0$, $I^{+}_{ji} = I^{-}_{ji} = 1.0$). With these
definitions, we may write:
$$ w^{0}_{ji} = w^{0+}_{ji}\, I^{+}_{ji} - w^{0-}_{ji}\, I^{-}_{ji} $$
Each partial weight $w^{0+}_{ji}$ and $w^{0-}_{ji}$
is now governed by a 4 x 4 pixel group on the LCD, which is driven in the same
quasi-halftone manner as described above, so that each partial weight may
be programmed to one of 65 levels, corresponding to values between 0 and 1.0 in
a theoretical network of tanh(x) neuron transfer function.
In the modified network of Fig. 2(a) there are still 'inputs' associated
with voltages that drive the top 64 horizontal lines of the photoconductor array.
In theory, these voltage levels may be included in the training process, though
they remain static as different images are projected onto the LCD array 42. Upon
examination of the architecture it is clear that any effect produced upon the system
through variation of these 'input' voltage levels may be achieved alternatively
by modification of the weight field $w^{0}_{ji}$, with all 'input'
levels reset to 1.0 (i.e. 10V on the sensor array) in this region. It is therefore
much simpler to set all of the inputs to the weight field $w^{0}_{ji}$
to a value of 1.0, which corresponds to programming the top 64 horizontal lines
of the sensor array to a value of 10 volts. Note that in the hardware system, this
means that one 10V power supply is able to drive the photoconductor array, greatly
simplifying its operation.
The (idealized) forward propagation through the network of Fig. 2(a)
therefore may be written as:
$$ f_j = \tanh\Big(\sum_i \big[\, w^{0+}_{ji}\, I^{+}_{ji} - w^{0-}_{ji}\, I^{-}_{ji} \,\big]\Big) $$
$$ h_k = \tanh\Big(\sum_j w^{I}_{kj}\, f_j\Big) $$
$$ o_l = \tanh\Big(\sum_k w^{II}_{lk}\, h_k\Big) $$
where $f_j$ corresponds to the compressed image information, and is the
input to the subsequent three-layer perceptron network. The quantities $h_k$
and $o_l$ correspond respectively to the hidden-layer and output-layer
neuron values. The weights $w^{0+}_{ji}$ and $w^{0-}_{ji}$
are constrained to lie within the range zero to one, and weights $w^{I}_{kj}$
and $w^{II}_{lk}$ are constrained to lie within the range ± 1.0.
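The idealized forward propagation can be sketched in a few lines of plain Python. This is an illustration under stated assumptions rather than the hardware behaviour: matrices are nested lists indexed `[j][i]`, `[k][j]` and `[l][k]`, the compressed values $f_j$ are taken as the inputs to the hidden layer as the text describes, the illumination values are random placeholders rather than a real image, and the name `forward` and the helper `rand_matrix` are hypothetical. The layer sizes and initial weight ranges are those quoted elsewhere in the description.

```python
import math
import random


def forward(I_plus, I_minus, w0_plus, w0_minus, wI, wII):
    """Return (f, h, o) for one image.

    I_plus/I_minus: normalized illumination on the two sensors of each weight.
    w0_plus/w0_minus: partial image weights in [0, 1].
    wI, wII: hidden-layer and output-layer weights in [-1, 1].
    """
    f = [math.tanh(sum(wp * ip - wm * im for wp, ip, wm, im
                       in zip(w0_plus[j], I_plus[j], w0_minus[j], I_minus[j])))
         for j in range(len(w0_plus))]
    h = [math.tanh(sum(w * x for w, x in zip(row, f))) for row in wI]
    o = [math.tanh(sum(w * x for w, x in zip(row, h))) for row in wII]
    return f, h, o


def rand_matrix(rows, cols, lo, hi):
    return [[random.uniform(lo, hi) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
I, J, K, L = 64, 32, 10, 5                  # layer sizes quoted in the text
I_plus = rand_matrix(J, I, 0.0, 1.0)        # placeholder illumination
I_minus = rand_matrix(J, I, 0.0, 1.0)
w0_plus = rand_matrix(J, I, 0.0, 0.3)       # initial ranges quoted in the text
w0_minus = rand_matrix(J, I, 0.0, 0.3)
wI = rand_matrix(K, J, -0.2, 0.2)
wII = rand_matrix(L, K, -0.2, 0.2)

f, h, o = forward(I_plus, I_minus, w0_plus, w0_minus, wI, wII)
print(len(f), len(h), len(o))
```

The 64 x 64 image is reduced to 32 compressed values before it reaches the ten hidden units and five output units.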
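The standard backpropagation training algorithm may be extended to
the architecture of Fig. 2(a), by appropriate chain rule differentiation to calculate
$\partial E/\partial w^{0+}_{ji}$ and $\partial E/\partial w^{0-}_{ji}$,
where E is the network error,
$$ E = \tfrac{1}{2} \sum_l (t_l - o_l)^2 $$
and $t_l$ is the target value for output neuron $o_l$. The resulting
equations for updating the weights are:
$$ \delta w^{II}_{lk}(t) = \nu\, \delta w^{II}_{lk}(t-1) + \frac{\eta}{\mathrm{epoch\_no}} \sum_{\mathrm{epoch}} h_k\, (t_l - o_l)(1 - o_l^2), \tag{4a} $$
$$ \delta w^{I}_{kj}(t) = \nu\, \delta w^{I}_{kj}(t-1) + \frac{\eta}{\mathrm{epoch\_no}} \sum_{\mathrm{epoch}} f_j\, (1 - h_k^2) \sum_l w^{II}_{lk}\, (t_l - o_l)(1 - o_l^2), \tag{4b} $$
$$ \delta w^{0\gamma}_{ji}(t) = \nu\, \delta w^{0\gamma}_{ji}(t-1) + \frac{\eta}{\mathrm{epoch\_no}} \sum_{\mathrm{epoch}} \gamma\, I^{\gamma}_{ji}\, (1 - f_j^2) \sum_k w^{I}_{kj}\, (1 - h_k^2) \sum_l w^{II}_{lk}\, (t_l - o_l)(1 - o_l^2). \tag{4c} $$
Here the learning coefficient is denoted by η and the momentum coefficient by
ν. The parameter γ in Eq. 4(c) represents either a + or - symbol, and as a factor
it is read as +1 or -1 respectively, consistent with the signs in the expression
for $f_j$. Herein, one epoch always corresponds to the presentation of ten input
images (i.e. epoch_no = 10).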
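The update rules can be sketched as follows, assuming the idealized tanh(x) forward model and the error $E = \tfrac{1}{2}\sum_l (t_l - o_l)^2$. The per-exemplar terms returned below are the negative error gradients obtained by chain rule differentiation; the function names `descent_terms` and `momentum_step`, and the nested-list layout, are illustrative choices and not part of the described system. The clamping of weights to their allowed ranges is omitted for brevity.

```python
import math


def forward(I_plus, I_minus, w0_plus, w0_minus, wI, wII):
    """Idealized forward pass; matrices are indexed [j][i], [k][j], [l][k]."""
    f = [math.tanh(sum(wp * ip - wm * im for wp, ip, wm, im
                       in zip(w0_plus[j], I_plus[j], w0_minus[j], I_minus[j])))
         for j in range(len(w0_plus))]
    h = [math.tanh(sum(w * x for w, x in zip(row, f))) for row in wI]
    o = [math.tanh(sum(w * x for w, x in zip(row, h))) for row in wII]
    return f, h, o


def network_error(o, t):
    return 0.5 * sum((tl - ol) ** 2 for tl, ol in zip(t, o))


def descent_terms(I_plus, I_minus, w0_plus, w0_minus, wI, wII, t):
    """Per-exemplar terms inside the epoch sums, i.e. -dE/dw for each field."""
    f, h, o = forward(I_plus, I_minus, w0_plus, w0_minus, wI, wII)
    d_o = [(tl - ol) * (1 - ol * ol) for tl, ol in zip(t, o)]
    d_h = [(1 - h[k] ** 2) * sum(wII[l][k] * d_o[l] for l in range(len(o)))
           for k in range(len(h))]
    d_f = [(1 - f[j] ** 2) * sum(wI[k][j] * d_h[k] for k in range(len(h)))
           for j in range(len(f))]
    g_II = [[d_o[l] * h[k] for k in range(len(h))] for l in range(len(o))]
    g_I = [[d_h[k] * f[j] for j in range(len(f))] for k in range(len(h))]
    g_0p = [[d_f[j] * x for x in I_plus[j]] for j in range(len(f))]    # gamma = +
    g_0m = [[-d_f[j] * x for x in I_minus[j]] for j in range(len(f))]  # gamma = -
    return g_II, g_I, g_0p, g_0m


def momentum_step(prev_delta, epoch_terms, eta, nu):
    """delta(t) = nu * delta(t-1) + (eta / epoch_no) * sum over the epoch."""
    n = len(epoch_terms)                      # epoch_no, ten in the text
    return [[nu * prev_delta[r][c] + (eta / n) * sum(g[r][c] for g in epoch_terms)
             for c in range(len(prev_delta[r]))]
            for r in range(len(prev_delta))]


# Toy usage: a 1 x 1 weight field, two exemplars in the "epoch".
print(momentum_step([[1.0]], [[[2.0]], [[4.0]]], eta=0.5, nu=0.1))
```

In use, `descent_terms` would be evaluated once per exemplar, the results collected over an epoch, and `momentum_step` applied separately to each of the four weight fields before adding the resulting increments to the weights.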
The network has been tested on two problems which form suitable examples:
recognition of handprinted digits and recognition of human faces. In running the
network, input images are projected onto the upper left quadrant of the LCD (see
Fig. 2a), using a Sharp #XV-101TU LCD Video projector, whose video input is obtained
from a video camera. The same projector is used to illuminate uniformly the weight
fields $w^{I}_{kj}$ and $w^{II}_{lk}$ on the LCD. For
both recognition tasks, the network used contains ten hidden units and five output
units (thus in Fig. 2a, I = 64, J = 32, K = 10, and L = 5).
In training the network, exemplar images are not projected onto the
LCD using the video projector. Instead, the training images are downloaded onto
the LCD directly over the weight field $w^{0}_{ji}$. Thus, for example,
over a 4 x 4 pixel area of the LCD whose transmissivity would be programmed to
a value $w^{0+}_{ji}$ in the trained network, the transmissivity is
programmed to the product $I^{+}_{ji}\, w^{0+}_{ji}$ during training.
In the training phase the incident illumination over the entire LCD is uniform.
This approach of imprinting the training images onto the LCD is assumed largely
for convenience. Training images are stored digitally, as 64 x 64 pixel grayscale
bitmap images. It is convenient to write these images onto the LCD during training,
rather than feed them into the video projector. It would certainly be possible to
use projected images for training the network, and this may yield better results
in certain circumstances, as that is the mode in which the network is run, once trained.
For the problem of handprinted character recognition, the network
is trained to recognize the digits 0 to 5. Because ultimately the network is expected
to recognize direct, real-time images, it is not reasonable to rigorously normalize
the characters to a bounding box, which is often done in such recognition tasks.
It is clear however that due to the physical size limitation of the present hardware
network (there are only ten hidden units), some constraint must be used in printing
the digits. In creating a training set, and in testing the trained network, a peripheral
box is employed inside of which the digits are printed. The digits are rendered
so that they fill approximately the full height of the box (the width is not constrained).
Fig. 3 shows examples of 10 training images.
The structure exemplified by Fig. 2(a) realizes an optical
neural network for a standard three-layer perceptron. The interconnections are illustrated
in the node map provided in Fig. 2(c).
Figs. 2(d) and 2(e) show the architecture and node map associated
with a further interconnection realizable by a slight variation in mask construction,
i.e., connections from layer K to layer J. It will be appreciated that the system
differs from that in Fig. 2(a) by mask/interconnections selection. Similar pairings
are provided by Figs. 2(f) and 2(g) (recurrent connection within layer K), Figs.
2(h) and 2(i) (recurrent connections within layer J), and Figs. 2(j) and 2(k) (connections
from layer J to layer L). From these illustrations, it will be appreciated that
many different neural interconnections are realizable by merely selecting the mask
The illustrative network is trained on 300 exemplars, with weight
fields $w^{0+}_{ji}$ and $w^{0-}_{ji}$ initialized
to random values in the range 0 to 0.3, and weights $w^{I}_{kj}$ and
$w^{II}_{lk}$ randomly initialized to values in the range ± 0.2.
During training, as well as running of the network, the top 64 horizontal lines
of the photoconductor array are maintained at 10.0 V. One output neuron is assigned
to each of the five digits, and is trained to produce a value of 0.8 (i.e. 8.0 V)
if the output neuron corresponds to that digit, and a value of -0.8 (i.e. -8.0
V) otherwise. In Fig. 4 are shown data of the network error during training. Once
trained, the system is exercised by projecting real-time images captured using
the camera/video projector combination. In Fig. 5 are shown ten test images; the
network output is indicated by histograms, with the five bins corresponding to
the five output neuron values. It should be emphasized that the network response
to the digits of Fig. 5 is performed in real time; the bitmap images shown in Fig.
5 were obtained by electronic scanning of the paper upon which they were written,
after testing of the network.
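The target encoding described above, one output neuron per class trained to 0.8 (8.0 V) for its own class and -0.8 (-8.0 V) otherwise, can be sketched as below. The decision rule in `classify`, which reports the strongest output only if it exceeds a threshold, is an assumption introduced for illustration; the description does not state how a histogram is turned into a classification. All function names are hypothetical.

```python
def target_vector(class_index, n_outputs=5, high=0.8, low=-0.8):
    """Training target: `high` for the matching output neuron, `low` elsewhere."""
    return [high if l == class_index else low for l in range(n_outputs)]


def to_volts(value, v_sat=10.0):
    """Map a theoretical neuron value to the hardware voltage (1.0 -> 10 V)."""
    return value * v_sat


def classify(outputs, threshold=0.0):
    """Assumed rule: index of the strongest output, or None if none is
    above `threshold` (image not classified strongly as any class)."""
    best = max(range(len(outputs)), key=lambda l: outputs[l])
    return best if outputs[best] > threshold else None


print(target_vector(2))
print([to_volts(v) for v in target_vector(2)])
print(classify([-0.7, 0.6, -0.8, -0.5, -0.9]))
```

Returning `None` for the weak-response case mirrors the situation in which the network does not classify an image strongly as any of the trained classes.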
The digits and histograms of Figs. 5(a) through 5(e) indicate the
typical classification performance of the trained network. In Figs. 5(f) through
5(h) are shown sequential images of the creation of the digit four. The first vertical
stroke in rendering the digit is recognized as a 1 (Fig. 5f). After the horizontal
stroke is added (Fig. 5g), the network no longer classifies the image strongly
as any of the five digits. With introduction of the final diagonal stroke (Fig.
5h), the network classifies the image as that of the digit four. Note that this
rendering of a four is different than that shown in Fig. 5(e): the network was
trained to recognize both cases. In Fig. 5(i) an example is shown of correct classification
even when the digit does not touch both the top and bottom of the bounding box.
Finally, Fig. 5(j) represents a case in which the network is unable to classify
the image. It is interesting to note that the trained network performed best when
the bounding box was included during testing. It might have been expected that, since
in training the bounding box was common to all exemplars, the network would learn
to ignore its presence, but this was not found to be the case.
Similarly, the network is trainable to recognize 64 x 64 pixel images
of four different faces. In Fig. 6 are shown examples of eight training images.
The network was trained on 250 exemplars, which included images of the faces under
varying illumination, varying angle of lateral head tilt (within a lateral range
of ± 15 degrees from direct view), varying magnification of the face (±5%), varying
translation of the head within the frame ( ± 10%), and varying facial expression.
To successfully train the network in facial recognition, it is desirable to begin
with a structured weight field $w^{0}_{ji}$, rather than the initial
random pattern that sufficed for training of the network on handprinted digits.
The initial weight field $w^{0}_{ji}$ used for facial recognition
is indicated in Fig. 7. The pattern of the weight field is seen to select different
spatial frequencies of the image, along its vertical columns. Every four vertical
line pairs of the sensor array (i.e. 8 lines) perform a crude spatial-frequency
compression (in the vertical dimension) of the corresponding portion of the image.
With this initial weight field $w^{0}_{ji}$, and random values for
the weights $w^{I}_{kj}$ and $w^{II}_{lk}$, the network
is successfully trainable. A typical training error curve is shown in Fig. 8. In
Fig. 9 are shown ten examples of the response of the trained network to real-time
video images of the four faces.
In Figures 9(a) through 9(d) and 9(f) through 9(i) are shown examples
of correct network classification. The figures represent notable variation in image
capture: there is significant lateral tilting of the head in Figs. 9(f), (g), and
(i). Furthermore, the face of Fig. 9(f) is translated within the frame of the image.
The illumination has been altered in generating the images of Figs. 9(b) and 9(h).
Examples of poor network classification are shown in Figs. 9(e) and 9(j). The illumination
of the head in Fig. 9(j) may be too different from that in the training exemplars
for the network to correctly classify the image. The reason for incorrect classification
of the image in Fig. 9(e) is not clear, but may be associated with an unacceptable
reduction of the head size. These results on recognition of faces may be compared
to recent work by others, using photorefractive holograms.
The disclosed network is shown to train well, considering its size,
in recognition of digit and facial images. It may be noted that the results of
Figs. 4, 5, 8, and 9 are found to be very similar to simulated results of an ideal
network, using Equations (1) - (4) to describe forward propagation and training.
In particular, after 750 epochs, the training error in the simulated network is
only approximately 30% below that of the hardware network, for both the digit and facial recognition
tasks. Larger networks are currently being simulated, to understand the capability
of the system.
The above results indicate that the optically-addressed neural network architecture
allows direct projection of optical images onto the network, with subsequent
neural network processing of the sensed images. This may be a very attractive and
compact architecture for some image recognition tasks. The network, once trained,
responds in real time to images projected onto it. The response time of the trained
network corresponds to the combined response times of the photoconductive sensors
and the nonlinear transresistance amplifier circuitry. The present transresistance
amplifier circuitry has a response time of 40 µs. In previous work, it has been
shown that the response time of the sensors to changes in incident illumination
is of the order of 200 - 300 µs. This is much longer than the response time of the
sensors to a change in applied voltage at constant illumination (several microseconds),
and hence is expected to limit the processing speed of the trained system.
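The response-time figures above can be combined in a short worked example. This is only a sketch: it assumes the sensor and amplifier delays simply add in series, and the function names are illustrative rather than part of the disclosed system.

```python
# Response-time figures quoted in the text, in microseconds.
AMPLIFIER_RESPONSE_US = 40.0
SENSOR_RESPONSE_US = (200.0, 300.0)  # response to a change in illumination


def combined_response_us(sensor_us, amplifier_us=AMPLIFIER_RESPONSE_US):
    """Assumption: the sensor and amplifier delays add in series."""
    return sensor_us + amplifier_us


def max_image_rate_hz(response_us):
    """Rough upper bound on independent images per second."""
    return 1e6 / response_us


# Best case (200 us sensors) and worst case (300 us sensors).
fastest_us, slowest_us = (combined_response_us(s) for s in SENSOR_RESPONSE_US)
rate_fast = max_image_rate_hz(fastest_us)
rate_slow = max_image_rate_hz(slowest_us)

print(f"combined response: {fastest_us:.0f} to {slowest_us:.0f} us")
print(f"image rate bound:  {rate_slow:.0f} to {rate_fast:.0f} images/s")
```

Under this additive assumption the sensors contribute most of the total delay, which is consistent with the statement that they are expected to limit the processing speed of the trained system.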
The training time of the network is limited by two factors. First,
there is the time needed to measure the hardware neuron values, and perform the
subsequent backpropagation of the network error in the digital computer. Second,
there is the time associated with changing the pattern on the LCD array, either
merely updating the interconnection weight fields, or additionally impressing the
training image onto the upper-left quadrant of the LCD. In the present implementation,
training occurs at a rate of 0.3 epoch/sec. (3 exemplars/sec.).
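The two quoted rates imply a training-set size and a total training time, which can be worked out as follows. The sketch assumes the rates stay constant throughout training; the 750-epoch figure is the one quoted earlier for the comparison with simulation, and the function names are illustrative.

```python
# Training rates quoted in the text.
EPOCHS_PER_SECOND = 0.3
EXEMPLARS_PER_SECOND = 3.0


def exemplars_per_epoch():
    """Training-set size implied by the two quoted rates."""
    return EXEMPLARS_PER_SECOND / EPOCHS_PER_SECOND


def training_time_seconds(epochs):
    """Wall-clock time for a given number of epochs at the quoted rate."""
    return epochs / EPOCHS_PER_SECOND


set_size = exemplars_per_epoch()
seconds_750 = training_time_seconds(750)

print(f"exemplars per epoch: {set_size:.0f}")
print(f"750 epochs: {seconds_750:.0f} s ({seconds_750 / 60:.1f} min)")
```

At these rates each epoch covers about ten exemplars, and 750 epochs take roughly 42 minutes.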
It is useful to bring to attention the areas in the network over which
the LCD pixels are maintained in a state of minimum transmittance (the black regions
shown in the schematics of Figs. 1b and 2a). These areas typically correspond to
interconnections not used in the present multilayer feedforward network, such as
recurrent connections within layers, or connections between non-sequential layers
(e.g., between input and output layers). The ability to implement such interconnections
occurs naturally in the architecture of the present system. For standard feedforward
networks, this results in some inefficiency in implementation, as a significant
fraction of the possible network interconnections are left unused. It is possible
to implement the present feedforward architecture more efficiently by driving and
sensing the horizontal and vertical lines from all four sides of the sensor array.
In Fig. 10 is shown an example of such an efficient implementation. Here the conductive
rows 50 and columns 52 of the sensor array 54 are severed (electrically isolated)
along the two dotted lines shown. An optical image is incident upon the upper half
of the sensor array and super-imposed LCD 60. The system indicated in Fig. 10 allows
(64-L) hidden layer units with L outputs. For such a network, only $2L^{2}$
sensors are not utilized. Photoconductive outputs are committed to amplifiers 62
and 64. This represents a very efficient use of available interconnections.
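The efficiency of the four-sided arrangement can be illustrated numerically. The sketch below assumes a 64 x 64 sensor array, inferred from the "(64-L)" expression, and takes the unused-sensor count as twice L squared; the function names are hypothetical.

```python
ARRAY_SIDE = 64  # assumption: 64 x 64 sensor array, inferred from "(64-L)"


def hidden_units(num_outputs, side=ARRAY_SIDE):
    """Hidden-layer units available when L lines are reserved for outputs."""
    return side - num_outputs


def unused_sensors(num_outputs):
    """Sensors left unused in the four-sided layout: 2 * L^2."""
    return 2 * num_outputs ** 2


def unused_fraction(num_outputs, side=ARRAY_SIDE):
    """Unused sensors as a fraction of the whole array."""
    return unused_sensors(num_outputs) / (side * side)


for L in (2, 4, 8):
    print(L, hidden_units(L), unused_sensors(L), f"{unused_fraction(L):.2%}")
```

For L = 4, for example, this gives 60 hidden units and 32 unused sensors, which is under 1% of the 4096 sensors in the assumed array.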
As mentioned earlier, the image-sensing architecture of the present
network may be optimized further. In the disclosed embodiment, the incident optical
image is integrated along vertical lines 52, due to the hardwired interconnection
of the photoconductive sensor array. This sensing architecture is capable of producing
successful pattern recognition. Improved network performance may be expected if
the incident optical image were transformed differently upon its initial capture.
In particular, rather than integrate throughout entire rows or columns of the image,
it may be more suitable to capture and process local regions of the image, passing
this local information on to the following layer of the network. In the region of
the photoconductive sensor array dedicated to image capture, photoconductive structures
(or alternatively photodiode structures) are suitably fabricated, which are each
sensitive to a local area of the incident image. These structures could be made
to allow bipolar weighting of the incident image, if desired. Furthermore, retaining
the LCD as a spatial light modulator over this region would allow the system to
perform adaptive local filtering of the optical image. This approach would be more
consistent with other neural network architectures which employ local feature extraction
as an initial step in image processing.
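The difference between integrating along whole columns and capturing local regions can be seen in a small example. This is a simplified sketch, not the disclosed circuitry: it omits the LCD weighting (uniform transmittance is assumed), models the hypothetical local sensors as non-overlapping square blocks, and assumes the image dimensions are divisible by the block size.

```python
def integrate_columns(image):
    """Sum each column over all rows, as the hardwired vertical lines do."""
    return [sum(column) for column in zip(*image)]


def local_region_sums(image, block):
    """Sum non-overlapping block x block regions (hypothetical local sensors)."""
    rows, cols = len(image), len(image[0])
    return [
        [
            sum(image[r][c]
                for r in range(r0, r0 + block)
                for c in range(c0, c0 + block))
            for c0 in range(0, cols, block)
        ]
        for r0 in range(0, rows, block)
    ]


# Two 4 x 4 images that differ only in the vertical position of a bright bar.
top_bar = [[1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
bottom_bar = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [1, 1, 1, 1]]

# Unweighted column integration cannot tell the two images apart.
print(integrate_columns(top_bar), integrate_columns(bottom_bar))
# Local sums preserve where in the image the bar lies.
print(local_region_sums(top_bar, 2), local_region_sums(bottom_bar, 2))
```

In the disclosed embodiment the LCD weight field is applied before integration, so vertical position is not lost entirely; the sketch only shows why local capture retains spatial information without relying on that weighting.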
Further adaptation includes the use of a lenslet array to project
duplicate images onto the sensor array. In this case, any given region of the optical
image would be processed by multiple sensors, allowing multiple features to be
extracted for each region of the image.
In the preferred embodiment, the nonlinear current-to-voltage converters
used to perform sigmoidal transformation are being integrated onto the glass substrate
of the photoconductive array, using polycrystalline silicon technology. Successful
integration of these amplifiers should allow the entire system to be contained
within the photoconductor array substrate and LCD. In particular, only five external
voltage lines will be needed to drive the entire sensor array and amplification
circuitry. In addition, if after training the network to perform a specific task,
the LCD were replaced by an appropriate static spatial light modulator (e.g., photographic
film), the entire neural network module would require only these few input voltage
lines to capture and process an incident optical image.
A hardware neural network architecture has been taught, which is capable
of capturing and processing an incident optical image, in real time. The system,
based on the combination of a photoconductive array and LCD, operates in a standalone
mode, once trained. This architecture allows the filtering of the optical image
upon capture to be incorporated into the network training process. The network
has the potential to be very compact. Because all of the network components are
based upon large-area thin-film technology, there is great potential for scalability
and integration within the architecture.