Go to page: 1 2 [3]

[#] Sun Feb 11 2024 09:44:34 UTC from rss <>

Subject: Reverse-engineering an analog Bendix air data computer: part 4, the Mach section

In the 1950s, many fighter planes used the Bendix Central Air Data Computer (CADC) to compute airspeed, Mach number, and other "air data". The CADC is an analog computer, using tiny gears and specially-machined cams for its mathematics. In this article, part 4 of my series,1 I reverse engineer the Mach section of the CADC and explain its calculations. (In the photo below, the Mach section is the middle section of the CADC.)

The Bendix MG-1A Central Air Data Computer with the case removed, showing the compact gear mechanisms inside. Click this image (or any other) for a larger version.

Aircraft have determined airspeed from air pressure for over a century. A port in the side of the plane provides the static air pressure,2 the air pressure outside the aircraft. A pitot tube points forward and receives the "total" air pressure, a higher pressure due to the air forced into the tube by the speed of the airplane. The airspeed can be determined from the ratio of these two pressures, while the altitude can be determined from the static pressure.

But as you approach the speed of sound, the fluid dynamics of air change and the calculations become very complicated. With the development of supersonic fighter planes in the 1950s, simple mechanical instruments were no longer sufficient. Instead, an analog computer calculated the "air data" (airspeed, air density, Mach number, and so forth) from the pressure measurements. This computer then transmitted the air data electrically to the systems that needed it: instruments, weapons targeting, engine control, and so forth. Since the computer was centralized, the system was called a Central Air Data Computer or CADC, manufactured by Bendix and other companies.

A closeup of the numerous gears inside the CADC. Three differential gear mechanisms are visible.

Each value in the Bendix CADC is indicated by the rotational position of a shaft. Compact electric motors rotate the shafts, controlled by the pressure inputs. Gears, cams, and differentials perform computations, with the results indicated by more rotations. Devices called synchros converted the rotations to electrical outputs that are connected to other aircraft systems. The CADC is said to contain 46 synchros, 511 gears, 820 ball bearings, and a total of 2,781 major parts (but I haven't counted). These components are crammed into a compact cylinder: just 15 inches long and weighing 28.7 pounds.

The equations computed by the CADC are impressively complicated. For instance, one equation is:

\[~~~\frac{P_t}{P_s} = \frac{166.9215M^7}{( 7M^2-1)^{2.5}}\]

It seems incredible that these functions could be computed mechanically, but three techniques make this possible. The fundamental mechanism is the differential gear, which adds or subtracts values. Second, logarithms are used extensively, so multiplications and divisions are implemented by additions and subtractions performed by a differential, while square roots are calculated by gearing down by a factor of 2. Finally, specially-shaped cams implement functions: logarithm, exponential, and application-specific functions. By combining these mechanisms, complicated functions can be computed mechanically, as I will explain below.

The differential

The differential gear assembly is the mathematical component of the CADC, as it performs addition or subtraction.3 The differential takes two input rotations and produces an output rotation that is the sum or difference of these rotations.4 Since most values in the CADC are expressed logarithmically, the differential computes multiplication and division when it adds or subtracts its inputs.

A closeup of a differential mechanism.

While the differential functions like the differential in a car, it is constructed differently, with a spur-gear design. This compact arrangement of gears is about 1 cm thick and 3 cm in diameter. The differential is mounted on a shaft along with three co-axial gears: two gears provide the inputs to the differential and the third provides the output. In the photo, the gears above and below the differential are the input gears. The entire differential body rotates with the sum, connected to the output gear at the top through a concentric shaft. (In practice, any of the three gears can be used as the output.) The two thick gears inside the differential body are part of the mechanism.

The cams

The CADC uses cams to implement various functions. Most importantly, cams compute logarithms and exponentials. Cams also implement complicated functions of one variable such as ${M}/{\sqrt{1 + .2 M^2}}$. The function is encoded into the cam's shape during manufacturing, so a hard-to-compute nonlinear function isn't a problem for the CADC. The photo below shows a cam with the follower arm in front. As the cam rotates, the follower moves in and out according to the cam's radius.

A cam inside the CADC implements a function.

However, the shape of the cam doesn't provide the function directly, as you might expect. The main problem with the straightforward approach is the discontinuity when the cam wraps around. For example, if the cam implemented an exponential directly, its radius would spiral exponentially and there would be a jump back to the starting value when it wraps around. Instead, the CADC uses a clever patented method: the cam encodes the difference between the desired function and a straight line. For example, an exponential curve is shown below (blue), with a line (red) between the endpoints. The height of the gray segment, the difference, specifies the radius of the cam (added to the cam's fixed minimum radius). The point is that this difference goes to 0 at the extremes, so the cam will no longer have a discontinuity when it wraps around. Moreover, this technique significantly reduces the size of the value (i.e. the height of the gray region is smaller than the height of the blue line), increasing the cam's accuracy.5

An exponential curve (blue), linear curve (red), and the difference (gray).

To make this work, the cam position must be added to the linear value to yield the result. This is implemented by combining each cam with a differential gear; watch for the paired cams and differentials below. As the diagram below shows, the input (23) drives the cam (30) and the differential (25, 37-41). The follower (32) tracks the cam and provides a second input (35) to the differential. The sum from the differential produces the desired function (26).

This diagram, from Patent 2969910, shows how the cam and follower are connected to a differential.

The synchro outputs

A synchro is an interesting device that can transmit a rotational position electrically over three wires. In appearance, a synchro is similar to an electric motor, but its internal construction is different, as shown below. Before digital systems, synchros were very popular for transmitting signals electrically through an aircraft. For instance, a synchro could transmit an altitude reading to a cockpit display or a targeting system. Two synchros at different locations have their stator windings connected together, while the rotor windings are driven with AC. Rotating the shaft of one synchro causes the other to rotate to the same position.6

Cross-section diagram of a synchro showing the rotor and stators.

For the CADC, most of the outputs are synchro signals, using compact synchros that are about 3 cm in length. For improved resolution, many of the CADC outputs use two synchros: a coarse synchro and a fine synchro. The two synchros are typically geared in an 11:1 ratio, so the fine synchro rotates 11 times as fast as the coarse synchro. Over the output range, the coarse synchro may turn 180°, providing the approximate output unambiguously, while the fine synchro spins multiple times to provide more accuracy.

Examining the Mach section of the CADC

Another view of the CADC.

The Bendix CADC is constructed from modular sections. In this blog post, I'm focusing on the middle section, called the "Mach section" and indicated by the arrow above. This section computes log static pressure, impact pressure, pressure ratio, and Mach number and provides these outputs electrically as synchro signals. It also provides the log pressure ratio and log static pressure to the rest of the CADC as shaft rotations. The left section of the CADC computes values related to airspeed, air density, and temperature.7 The right section has the pressure sensors (the black domes), along with the servo mechanisms that control them.

I had feared that any attempt at disassembly would result in tiny gears flying in every direction, but the CADC was designed to be taken apart for maintenance. Thus, I could remove the left section of the CADC for analysis. Unfortunately, we lost the gear alignment between the sections and don't have the calibration instructions, so the CADC no longer produces accurate results.

The diagram below shows the internal components of the Mach section after disassembly. The synchros are in pairs to generate coarse and fine outputs; the coarse synchros can be distinguished because they have spiral anti-backlash springs installed. These springs prevent wobble in the synchro and gear train as the gears change direction. The gears and differentials are not visible from this angle as they are underneath the metal plate. The Pressure Error Correction (PEC) subsystem has a motor to drive the shaft and a control transformer for feedback. The Mach section has two D-sub connectors. The one on the right links the Mach section and pressure section to the front section of the CADC. The Position Error Correction (PEC) servo amplifier board plugs into the left connector. The static pressure and total pressure input lines have fittings so the lines can be disconnected from the lines from the front of the CADC.8

The Mach section with components labeled.

The photo below shows the left section of the CADC. This section meshes with the Mach section shown above. The two sections have parts at various heights, so they join in a complicated way. Two gears receive the pressure signals $ log ~ P_t / P_s $ and $ log ~ P_s $ from the Mach section. The third gear sends the log total temperature to the rest of the CADC. The electrical connector (a standard 37-pin D-sub) supplies 120 V 400 Hz power to the Mach section and pressure transducers and passes synchro signals to the output connectors.

The left part of the CADC that meshes with the Mach section.

The position error correction servo loop

The CADC receives two pressure inputs and two pressure transducers convert the pressures into rotational positions, providing the indicated static pressure $ P_{si} $ and the total pressure $ P_t $ as shaft rotations to the rest of the CADC. (I explained the pressure transducers in detail in the previous article.)

There's one complication though. The static pressure $ P_s $ is the atmospheric pressure outside the aircraft. The problem is that the static pressure measurement is perturbed by the airflow around the aircraft, so the measured pressure (called the indicated static pressure $ P_{si} $) doesn't match the real pressure. This is bad because a "static-pressure error manifests itself as errors in indicated airspeed, altitude, and Mach number to the pilot."9

The solution is a correction factor called the Position Error Correction. This factor gives the ratio between the real pressure $ P_s $ and the measured pressure $ P_{si} $. By applying this correction factor to the indicated (i.e. measured) pressure, the true pressure can be obtained. Since this correction factor depends on the shape of the aircraft, it is generated outside the CADC by a separate cylindrical unit called the Compensator, customized to the aircraft type. The position error computation depends on two parameters: the Mach number provided by the CADC and the angle of attack provided by an aircraft sensor. The compensator determines the correction factor by using a three-dimensional cam. The vintage photo below shows the components inside the compensator.

"Static Pressure and Angle of Attack Compensator Type X1254115-1 (Cover Removed)" from Air Data Computer Mechanization.

The correction factor is transmitted from the compensator to the CADC as a synchro signal over three wires. To use this value, the CADC must convert the synchro signal to a shaft rotation. The CADC uses a motorized servo loop that rotates the shaft until the shaft position matches the angle specified by the synchro input.

The servo loop ensures that the shaft position matches the input angle.

The key to the servo loop is a control transformer. This device looks like a synchro and has five wires like a synchro, but its function is different. Like the synchro motor, the control transformer has three stator wires that provide the angle input. Unlike the synchro, the control transformer also uses the shaft position as an input, while the rotor winding generates an output voltage indicating the error. This output voltage indicates the error between the control transformer's shaft position and the three-wire angle input. The control transformer provides its error signal as a 400 Hz sine wave, with a larger signal indicating more error.10

The amplifier board (below) drives the motor in the appropriate direction to cancel out the error. The power transformer in the upper left is the largest component, powering the amplifier board from the CADC's 115-volt, 400 Hertz aviation power. Below it are two transformer-like components; these are the magnetic amplifiers. The relay in the lower-right corner switches the amplifier into test mode. The rest of the circuitry consists of transistors, resistors, capacitors, and diodes. The construction is completely different from modern printed circuit boards. Instead, the amplifier uses point-to-point wiring between plastic-insulated metal pegs. Both sides of the board have components, with connections between the sides through the metal pegs.

The amplifier board for the position error correction.

The amplifier board is implemented with a transistor amplifier driving two magnetic amplifiers, which control the motor.11 (Magnetic amplifiers are an old technology that can amplify AC signals, allowing the relatively weak transistor output to control a larger AC output.12) The motor is a "Motor / Tachometer Generator" unit that also generates a voltage based on the motor's speed. This speed signal provides negative feedback, limiting the motor speed as the error becomes smaller and ensuring that the feedback loop doesn't overshoot. The photo below shows how the amplifier board is mounted in the middle of the CADC, behind the static pressure tubing.

Side view of the CADC.

The equations

Although the CADC looks like an inscrutable conglomeration of tiny gears, it is possible to trace out the gearing and see exactly how it computes the air data functions. With considerable effort, I have reverse-engineered the mechanisms to create the diagram below, showing how each computation is broken down into mechanical steps. Each line indicates a particular value, specified by a shaft rotation. The ⊕ symbol indicates a differential gear, adding or subtracting its inputs to produce another value. The cam symbol indicates a cam coupled to a differential gear. Each cam computes either a specific function or an exponential, providing the value as a rotation. At the right, the outputs are either shaft rotations to the rest of the CADC or synchro outputs.

This diagram shows how the values are computed. The differential numbers are my own arbitrary numbers. Click for a larger version.

I'll go through each calculation briefly.

log static pressure

The static pressure is calculated by dividing the indicated static pressure by the pressure error correction factor. Since these values are all represented logarithmically, the division turns into a subtraction, performed by a differential gear. The output goes to two synchros, geared to provide coarse and fine outputs.13

\[log ~ P_s = log ~ P_{si} - log ~ P_{si} / P_s \]

Impact pressure

The impact pressure is the pressure due to the aircraft's speed, the difference between the total pressure and the static pressure. To compute the impact pressure, the log pressure values are first converted to linear values by exponentiation, performed by cams. The linear pressure values are then subtracted by a differential gear. Finally, the impact pressure is output through two synchros, coarse and fine in an 11:1 ratio.

\[ P_t - P_s = exp(log ~ P_t) - exp(log ~ P_s) \]

log pressure ratio

The log pressure ratio $ P_t/P_s $ is the ratio of total pressure to static pressure. This value is important because it is used to compute the Mach number, true airspeed, and log free air temperature. The Mach number is computed in the Mach section as described below. The true airspeed and log free air temperature are computed in the left section. The left section receives the log pressure ratio as a rotation. Since the left section and Mach section can be separated for maintenance, a direct shaft connection is not used. Instead, each section has a gear and the gears mesh when the sections are joined.

Computing the log pressure ratio is straightforward. Since the log total pressure and log static pressure are both available, subtracting the logs with a differential yields the desired value. That is,

\[log ~ P_t/P_s = log ~ P_t - log ~ P_s \]

Mach number

The Mach number is defined in terms of $P_t/P_s $, with separate cases for subsonic and supersonic:14

\[M<1:\] \[~~~\frac{P_t}{P_s} = ( 1+.2M^2)^{3.5}\]

\[M > 1:\]

\[~~~\frac{P_t}{P_s} = \frac{166.9215M^7}{( 7M^2-1)^{2.5}}\]

Although these equations are very complicated, the solution is a function of one variable $P_t/P_s$ so M can be computed with a single cam. In other words, the mathematics needed to be done when the CADC was manufactured, but once the cam exists, computing M is easy, using the log pressure ratio computed earlier:

\[ M = f(log ~ P_t / P_s) \]

Conclusions

The CADC performs nonlinear calculations that seem way too complicated to solve with mechanical gearing. But reverse-engineering the mechanism shows how the equations are broken down into steps that can be performed with cams and differentials, using logarithms for multiplication and division. The diagram below shows the complex gearing in the Mach section. Each differential below corresponds to a differential in the earlier equation diagram.

A closeup of the gears and cams in the Mach section. The differential for the pressure ratio is hidden in the middle.

Follow me on Twitter @kenshirriff or RSS for more reverse engineering. I'm also on Mastodon as @oldbytes.space@kenshirriff. Thanks to Joe for providing the CADC. Thanks to Nancy Chen for obtaining a hard-to-find document for me.15 Marc Verdiell and Eric Schlaepfer are working on the CADC with me. CuriousMarc's video shows the CADC in action:

Notes and references

My articles on the CADC are:
There is a lot of overlap between the articles, so skip over parts that seem repetitive :-) ↩
The static air pressure can also be provided by holes in the side of the pitot tube; this is the typical approach in fighter planes. ↩
Multiplying a rotation by a constant factor doesn't require a differential; it can be done simply with the ratio between two gears. (If a large gear rotates a small gear, the small gear rotates faster according to the size ratio.) Adding a constant to a rotation is even easier, just a matter of defining what shaft position indicates 0. For this reason, I will ignore constants in the equations. ↩
Strictly speaking, the output of the differential is the sum of the inputs divided by two. I'm ignoring the factor of 2 because the gear ratios can easily cancel it out. It's also arbitrary whether you think of the differential as adding or subtracting, since it depends on which rotation direction is defined as positive. ↩
The diagram below shows a typical cam function in more detail. The input is $log~ dP/P_s$ and the output is $log~M / \sqrt{1+.2KM^2}$. The small humped curve at the bottom is the cam correction. Although the input and output functions cover a wide range, the difference that is encoded in the cam is much smaller and drops to zero at both ends.

This diagram, from Patent 2969910, shows how a cam implements a complicated function.

↩
Internally, a synchro has a moving rotor winding and three fixed stator windings. When AC is applied to the rotor, voltages are developed on the stator windings depending on the position of the rotor. These voltages produce a torque that rotates the synchros to the same position. In other words, the rotor receives power (26 V, 400 Hz in this case), while the three stator wires transmit the position. The diagram below shows how a synchro is represented schematically, with rotor and stator coils.

The schematic symbol for a synchro.

A control transformer has a similar structure, but the rotor winding provides an output, instead of being powered. ↩
Specifically, the left part of the CADC computes true airspeed, air density, total temperature, log true free air temperature, and air density × speed of sound. I discussed the left section in detail here. ↩
From the outside, the CADC is a boring black cylinder, with no hint of the complex gearing inside. The CADC is wired to the rest of the aircraft through round military connectors. The front panel interfaces these connectors to the D-sub connectors used internally. The two pressure inputs are the black cylinders at the bottom of the photo.

The exterior of the CADC. It is packaged in a rugged metal cylinder. It is sealed by a soldered metal band, so we needed a blowtorch to open it.

↩
The concepts of position error correction are described here. ↩
The phase of the signal is 0° or 180°, depending on the direction of the error. In other words, the error signal is proportional to the driving AC signal in one direction and flipped when the error is in the other direction. This is important since it indicates which direction the motor should turn. When the error is eliminated, the signal is zero. ↩
I reverse-engineered the circuit board to create the schematic below for the amplifier. The idea is that one magnetic amplifier or the other is selected, depending on the phase of the error signal, causing the motor to turn counterclockwise or clockwise as needed. To implement this, the magnetic amplifier control windings are connected to opposite phases of the 400 Hz power. The transistor is connected to both magnetic amplifiers through diodes, so current will flow only if the transistor pulls the winding low during the half-cycle that the winding is powered high. Thus, depending on the phase of the transistor output, one winding or the other will be powered, allowing that magnetic amplifier to pass AC to the motor.

This reverse-engineered schematic probably has a few errors. Click the schematic for a larger version.

The CADC has four servo amplifiers: this one for pressure error correction, one for temperature, and two for pressure. The amplifiers have different types of inputs: the temperature input is the probe resistance, the pressure error correction uses an error voltage from the control transformer, and the pressure inputs are voltages from the inductive pickups in the sensor. The circuitry is roughly the same for each amplifier—a transistor amplifier driving two magnetic amplifiers—but the details are different. The largest difference is that each pressure transducer amplifier drives two motors (coarse and fine) so each has two transistor stages and four magnetic amplifiers. ↩
The basic idea of a magnetic amplifier is a controllable inductor. Normally, the inductor blocks alternating current. But applying a relatively small DC signal to a control winding causes the inductor to saturate, permitting the flow of AC. Since the magnetic amplifier uses a small signal to control a much larger signal, it provides amplification.

In the early 1900s, magnetic amplifiers were used in applications such as dimming lights. Germany improved the technology in World War II, using magnetic amplifiers in ships, rockets, and trains. The magnetic amplifier had a resurgence in the 1950s; the Univac Solid State computer used magnetic amplifiers (rather than vacuum tubes or transistors) as its logic elements. However, improvements in transistors made the magnetic amplifier obsolete except for specialized applications. (See my IEEE Spectrum article on magnetic amplifiers for more history of magnetic amplifiers.) ↩
The CADC specification defines how the parameter values correspond to rotation angles of the synchros. For instance, for the log static pressure synchros, the CADC supports the parameter range 0.8099 to 31.0185 inches of mercury. The spec defines the corresponding synchro outputs as 16,320° rotation of the fine synchro and 175.48° rotation of the coarse synchro over this range. The synchro null point corresponds to 29.92 inches of mercury (i.e. zero altitude). The fine synchro is geared to rotate 93 times as fast as the coarse synchro, so it rotates over 45 times during this range, providing higher resolution than a single synchro would provide. The other synchro pairs use a much smaller 11:1 ratio; presumably high accuracy of the static pressure was important. ↩
Although the CADC's equations may seem ad hoc, they can be derived from fluid dynamics principles. These equations were standardized in the 1950s by various government organizations including the National Bureau of Standards and NACA (the precursor of NASA). ↩
It was very difficult to find information about the CADC. The official military specification is MIL-C-25653C(USAF). After searching everywhere, I was finally able to get a copy from the Technical Reports & Standards unit of the Library of Congress. The other useful document was in an obscure conference proceedings from 1958: "Air Data Computer Mechanization" (Hazen), Symposium on the USAF Flight Control Data Integration Program, Wright Air Dev Center US Air Force, Feb 3-4, 1958, pp 171-194. ↩

[#] Sat Feb 17 2024 10:11:34 UTC from rss <>

Subject: Inside the mechanical Bendix Air Data Computer, part 5: motor/tachometers

The Bendix Central Air Data Computer (CADC) is an electromechanical analog computer that uses gears and cams for its mathematics. It was a key part of military planes such as the F-101 and the F-111 fighters, computing airspeed, Mach number, and other "air data". The rotating gears are powered by six small servomotors, so these motors are in a sense the fundamental component of the CADC. In the photo below, you can see one of the cylindrical motors near the center, about 1/3 of the way down.

The servomotors in the CADC are unlike standard motors. Their name—"Motor-Tachometer Generator" or "Motor and Rate Generator"1—indicates that each unit contains both a motor and a speed sensor. Because the motor and generator use two-phase signals, there are a total of eight colorful wires coming out, many more than a typical motor. Moreover, the direction of the motor can be controlled, unlike typical AC motors. I couldn't find a satisfactory explanation of how these units worked, so I bought one and disassembled it. This article (part 5 of my series on the CADC2) provides a complete teardown of the motor/generator and explain how it works.

The Bendix MG-1A Central Air Data Computer with the case removed, showing the compact gear mechanisms inside. Click this image (or any other) for a larger version.

The image below shows a closeup of two motors powering one of the pressure signal outputs. Note the bundles of colorful wires to each motor, entering in two locations. At the top, the motors drive complex gear trains. The high-speed motors are geared down by the gear trains to provide much slower rotations with sufficient torque to power the rest of the CADC's mechanisms.

Two motor/generators in the pressure section of the CADC. The one at the back is mostly hidden.

The motor/tachometer that we disassembled is shorter than the ones in the CADC (despite having the same part number), but the principles are the same. We started by removing a small C-clip on the end of the motor and and unscrewing the end plate. The unit is pretty simple mechanically. It has bearings at each end for the rotor shaft. There are four wires for the motor and four wires for the tachometer.3

The motor disassembled to show the internal components.

The rotor (below) has two parts on the shaft. the left part is for the motor and the right drum is for the tachometer. The left part is a squirrel-cage rotor4 for the motor. It consists of conducting bars (light-colored) on an iron core. The conductors are all connected at both ends by the conductive rings at either end. The metal drum on the right is used by the tachometer. Note that there are no electrical connections between the rotor components and the rest of the motor: there are no brushes or slip rings. The interaction between the rotor and the windings in the body of the motor is purely magnetic, as will be explained.

The rotor and shaft.

The motor/tachometer contains two cylindrical stators that create the magnetic fields, one for the motor and one for the tachometer. The photo below shows the motor stator inside the unit after removing the tachometer stator. The stators are encased in hard green plastic and tightly pressed inside the unit. In the center, eight metal poles are visible. They direct the magnetic field onto the rotor.

Inside the motor after removing the tachometer winding.

The photo below shows the stator for the tachometer, similar to the stator for the motor. Note the shallow notches that look like black lines in the body on the lower left. These are probably adjustments to the tachometer during manufacturing to compensate for imperfections. The adjustments ensure that the magnetic fields are nulled out so the tachometer returns zero voltage when stationary. The metal plate on top shields the tachometer from the motor's magnetic fields.

The stator for the tachometer.

The poles and the metal case of the stator look solid, but they are not. Instead, they are formed from a stack of thin laminations. The reason to use laminations instead of solid metal is to reduce eddy currents in the metal. Each lamination is varnished, so it is insulated from its neighbors, preventing the flow of eddy currents.

One lamination from the stack of laminations that make up the winding. The lamination suffered some damage during disassembly; it was originally round.

In the photo below, I removed some of the plastic to show the wire windings underneath. The wires look like bare copper, but they have a very thin layer of varnish to insulate them. There are two sets of windings (orange and blue, or red and black) around alternating metal poles. Note that the wires run along the pole, parallel to the rotor, and then wrap around the pole at the top and bottom, forming oblong coils around each pole.5 This generates a magnetic field through each pole.

Removing the plastic reveals the motor windings.

The motor

The motor part of the unit is a two-phase induction motor with a squirrel-cage rotor.6 There are no brushes or electrical connections to the rotor, and there are no magnets, so it isn't obvious what makes the rotor rotate. The trick is the "squirrel-cage" rotor, shown below. It consists of metal bars that are connected at the top and bottom by rings. Assume (for now) that the fixed part of the motor, the stator, creates a rotating magnetic field. The important principle is that a changing magnetic field will produce a current in a wire loop.7 As a result, each loop in the squirrel-cage rotor will have an induced current: current will flow up9 the bars facing the north magnetic field and down the south-facing bars, with the rings on the end closing the circuits.

A squirrel-cage rotor. The numbered parts are (1) shaft, (2) end cap, (3) laminations, and (4) splines to hold the laminations. Image from Robo Blazek.

But how does the stator produce a rotating magnetic field? And how do you control the direction of rotation? The next important principle is that current flowing through a wire produces a magnetic field.8 As a result, the currents in the squirrel cage rotor produce a magnetic field perpendicular to the cage. This magnetic field causes the rotor to turn in the same direction as the stator's magnetic field, driving the motor. Because the rotor is powered by the induced currents, the motor is called an induction motor.

The diagram below shows how the motor is wired, with a control winding and a reference winding. Both windings are powered with AC, but the control voltage either lags the reference winding by 90° or leads the reference winding by 90°, due to the capacitor. Suppose the current through the control winding lags by 90°. First, the reference voltage's sine wave will have a peak, producing the magnetic field's north pole at A. Next (90° later), the control voltage will peak, producing the north pole at B. The reference voltage will go negative, producing a south pole at A and thus a north pole at C. The control voltage will go negative, producing a south pole at B and a north pole at D. This cycle will repeat, with the magnetic field rotating counter-clockwise from A to D. Conversely, if the control voltage leads the reference voltage, the magnetic field will rotate clockwise. This causes the motor to spin in one direction or the other, with the direction controlled by the control voltage. (The motor has four poles for each winding, rather than the one shown below; this increases the torque and reduces the speed.)

Diagram showing the servomotor wiring.

The purpose of the capacitor is to provide the 90° phase shift so the reference voltage and the control voltage can be driven from the same single-phase AC supply (in this case, 26 volts, 400 hertz). Switching the polarity of the control voltage reverses the direction of the motor.

There are a few interesting things about induction motors. You might expect that the motor would spin at the same rate as the rotating magnetic field. However, this is not the case. Remember that a changing magnetic field induces the current in the squirrel-cage rotor. If the rotor is spinning at the same rate as the magnetic field, the rotor will encounter an unchanging magnetic field and there will be no current in the bars of the rotor. As a result, the rotor will not generate a magnetic field and there will be no torque to rotate it. The consequence is that the rotor must spin somewhat slower than the magnetic field. This is called "slippage" and is typically a few percent of the full speed, with more slippage as more torque is required.

Many household appliances use induction motors, but how do they generate a rotating magnetic field from a single-phase AC winding? The problem is that the magnetic field in a single AC winding will just flip back and forth, so the motor will not turn in either direction. One solution is a shaded-pole motor, which puts a copper bar around part of each pole to break the symmetry and produce a weakly rotating magnetic field. More powerful induction motors use a startup winding with a capacitor (analogous to the control winding). This winding can either be switched out of the circuit once the motor starts spinning,10 or used continuously, called a permanent-split capacitor (PSC) motor. The best solution is three-phase power (if available); a three-phase winding automatically produces a rotating magnetic field.

Tachometer/generator

The second part of the unit is the tachometer generator, sometimes called the rate unit.11 The purpose of the generator is to produce a voltage proportional to the speed of the shaft. The unusual thing about this generator is that it produces a 400-hertz output that is either in phase with the input or 180° out of phase. This is important because the phase indicates which direction the shaft is turning. Note that a "normal" generator is different: the output frequency is proportional to the speed.

The diagram below shows the principle behind the generator. It has two stator windings: the reference coil that is powered at 400 Hz, and the output coil that produces the output signal. When the rotor is stationary (A), the magnetic flux is perpendicular to the output coil, so no output voltage is produced. But when the rotor turns (B), eddy currents in the rotor distort the magnetic field. It now couples with the output coil, producing a voltage. As the rotor turns faster, the magnetic field is distorted more, increasing the coupling and thus the output voltage. If the rotor turns in the opposite direction (C), the magnetic field couples with the output coil in the opposite direction, inverting the output phase. (This diagram is more conceptual than realistic, with the coils and flux 90° from their real orientation, so don't take it too seriously. As shown earlier, the coils are perpendicular to the rotor so the real flux lines are completely different.)

Principle of the drag-cup rate generator. From Navy electricity and electronics training series: Principles of synchros, servos, and gyros, Fig 2-16

But why does the rotating drum change the magnetic field? It's easier to understand by considering a tachometer that uses a squirrel-cage rotor instead of a drum. When the rotor rotates, currents will be induced in the squirrel cage, as described earlier with the motor. These currents, in turn, generate a perpendicular magnetic field, as before. This magnetic field, perpendicular to the orginal field, will be aligned with the output coil and will be picked up. The strength of the induced field (and thus the output voltage) is proportional to the speed, while the direction of the field depends on the direction of rotation. Because the primary coil is excited at 400 hertz, the currents in the squirrel cage and the resulting magnetic field also oscillate at 400 hertz. Thus, the output is at 400 hertz, regardless of the input speed.

Using a drum instead of a squirrel cage provides higher accuracy because there are no fluctuations due to the discrete bars. The operation is essentially the same, except that the currents pass through the metal of the drum continuously instead of through individual bars. The result is eddy currents in the drum, producing the second magnetic field. The diagram below shows the eddy currents (red lines) from a metal plate moving through a magnetic field (green), producing a second magnetic field (blue arrows). For the rotating drum, the situation is similar except the metal surface is curved, so both field arrows will have a component pointing to the left. This creates the directed magnetic field that produces the output.

A diagram showing eddy currents in a metal plate moving under a magnet, Image from Chetvorno.

The servo loop

The motor/generator is called a servomotor because it is used in a servo loop, a control system that uses feedback to obtain precise positioning. In particular, the CADC uses the rotational position of shafts to represent various values. The servo loops convert the CADC's inputs (static pressure, dynamic pressure, temperature, and pressure correction) into shaft positions. The rotations of these shafts power the gears, cams, and differentials that perform the computations.

The diagram below shows a typical servo loop in the CADC. The goal is to rotate the output shaft to a position that exactly matches the input voltage. To accomplish this, the output position is converted into a feedback voltage by a potentiometer that rotates as the output shaft rotates.12 The error amplifier compares the input voltage to the feedback voltage and generates an error signal, rotating the servomotor in the appropriate direction. Once the output shaft is in the proper position, the error signal drops to zero and the motor stops. To improve the dynamic response of the servo loop, the tachometer signal is used as a negative feedback voltage. This ensures that the motor slows as the system gets closer to the right position, so the motor doesn't overshoot the position and oscillate. (This is sort of like a PID controller.)

Diagram of a servo loop in the CADC.

The error amplifier and motor drive circuit for a pressure transducer are shown below. Because of the state of electronics at the time, it took three circuit boards to implement a single servo loop. The amplifier was implemented with germanium transistors (since silicon transistors were later). The transistors weren't powerful enough to drive the motors directly. Instead, magnetic amplifiers (the yellow transformer-like modules at the front) powered the servomotors. The large rectangular capacitors on the right provided the phase shift required for the control voltage.

One of the three-board amplifiers for the pressure transducer.

Conclusions

The Bendix CADC used a variety of electromechanical devices including synchros, control transformers, servo motors, and tachometer generators. These were expensive military-grade components driven by complex electronics. Nowadays, you can get a PWM servo motor for a few dollars with the gearing, feedback, and control circuitry inside the motor housing. These motors are widely used for hobbyist robotics, drones, and other applications. It's amazing that servo motors have gone from specialized avionics hardware to an easy-to-use, inexpensive commodity.

A modern DC servo motor. Photo by Adafruit (CC BY-NC-SA 2.0 DEED).

Follow me on Twitter @kenshirriff or RSS for updates. I'm also on Mastodon as @oldbytes.space@kenshirriff. Thanks to Joe for providing the CADC. Thanks to Marc Verdiell for disassembling the motor.

Notes and references

The two types of motors in the CADC are part number "FV-101-19-A1" and part number "FV-101-5-A1" (or FV101-5A1). They are called either a "Tachometer Rate Generator" or "Tachometer Motor Generator", with both names applied to the same part number. The "19" and "5" units look the same, with the "19" used for one pressure servo loop and the "5" used everywhere else.

The motor that I got is similar to the ones in the CADC, but shorter. The difference in size is mysterious since both have the Bendix part number FV-101-5-A1.

For reference, the motor I disassembled is labeled:
Cedar Division Control Data Corp. ST10162 Motor Tachometer F0: 26V C0: 26V TACH: 18V 400 CPS DSA-400-70C-4651 FSN6105-581-5331 US BENDIX FV-101-5-A1

I wondered why the motor listed both Control Data and Bendix. In 1952, the Cedar Engineering Company was spun off from the Minneapolis Honeywell Regulator Company (better known as Honeywell, the name it took in 1964). Cedar Engineering produced motors, servos, and aircraft actuators. In 1957, Control Data bought Cedar Engineering, which became the Cedar Division of CDC. Then, Control Data acquired Bendix's computer division in 1963. Thus, three companies were involved. ↩
My previous articles on the CADC are:
↩
From testing the motor, here is how I believe it is wired:
Motor reference (power): red and black
Motor control: blue and orange
Generator reference (power): green and brown
Generator out: white and yellow ↩
The bars on the squirrel-cage rotor are at a slight angle. Parallel bars would go in and out of alignment with the stator, causing fluctuations in the force, while the angled bars avoid this problem. ↩
This cross-section through the stator shows the windings. On the left, each winding is separated into the parts on either side of the pole. On the right, you can see how the wires loop over from one side of the pole to the other. Note the small circles in the 12 o'clock and 9 o'clock positions: cross sections of the input wires. The individual horizontal wires near the circumference connect alternating windings.

A cross-section of the stator, formed by sanding down the plastic on the end.

↩
It's hard to find explanations of AC servomotors since they are an old technology. One discussion is in Electromechanical components for servomechanisms (1961). This book points out some interesting things about a servomotor. The stall torque is proportional to the control voltage. Servomotors are generally high-speed, but low-torque devices, heavily geared down. Because of their high speed and their need to change direction, rotational inertia is a problem. Thus, servomotors typically have a long, narrow rotor compared with typical motors. (You can see in the teardown photo that the rotor is long and narrow.) Servomotors are typically designed with many poles (to reduce speed) and smaller air gaps to increase inductance. These small airgaps (e.g. 0.001") require careful manufacturing tolerance, making servomotors a precision part. ↩
The principle is Faraday's law of induction: "The electromotive force around a closed path is equal to the negative of the time rate of change of the magnetic flux enclosed by the path." ↩
Ampère's law states that "the integral of the magnetizing field H around any closed loop is equal to the sum of the current flowing through the loop." ↩
The direction of the current flow (up or down) depends on the direction of rotation. I'm not going to worry about the specific direction of current flow, magnetic flux, and so forth in this article. ↩
Once an induction motor is spinning, it can be powered from a single AC phase since the stator is rotating with respect to the magnetic field. This works for the servomotor too. I noticed that once the motor is spinning, it can operate without the control voltage. This isn't the normal way of using the motor, though. ↩
A long discussion of tachometers is in the book Electromechanical Components for Servomechanisms (1961). The AC induction-generator tachometer is described starting on page 193.

For a mathematical analysis of the tachometer generator, see Servomechanisms, Section 2, Measurement and Signal Converters, MCP 706-137, U.S. Army. This source also discusses sources of errors in detail. Inexpensive tachometer generators may have an error of 1-2%, while precision devices can have an error of about 0.1%. Accuracy is worse for small airborne generators, though. Since the Bendix CADC uses the tachometer output for damping, not as a signal output, accuracy is less important. ↩
Different inputs in the CADC use different feedback mechanisms. The temperature servo uses a potentiometer for feedback. The angle of attack correction uses a synchro control transformer, which generates a voltage based on the angle error. The pressure transducers contain inductive pickups that generate a voltage based on the pressure error. For more details, see my article on the CADC's pressure transducer servo circuits. ↩

[#] Sat Feb 24 2024 11:57:50 UTC from rss <>

Subject: The first microcomputer: The transfluxor-powered Arma Micro Computer from 1962

What would you say is the first microcomputer?1 The Apple I from 1976? The Altair 8800 from 1974? Perhaps the lesser-known Micral N (1973) or Q1 (1972)? How about the Arma Micro Computer from way back in 1962. The Arma Micro Computer was a compact 20-pound transistorized computer, designed for applications in space such as inertial or celestial navigation, steering, radar, or engine control.

Obviously, the Arma Micro Computer is not a microcomputer according to modern definitions, since its processor was made from discrete components. But it's an interesting computer in many ways. First, it is an example of the aerospace computers of the 1960s, advanced systems that are now almost entirely forgotten. People think of 1960s computers as room-filling mainframes, but there was a whole separate world of cutting-edge miniaturized aerospace computers. (Taking up just 0.4 cubic feet, the Arma Micro Computer was smaller than an Apple II.) Second, the Arma Micro Computer used strange components such as transfluxors and had an unusual 22-bit serial architecture. Finally, the Arma Micro Computer evolved into a series of computers used on Navy ships and submarines, the E-2C Hawkeye airborne early warning plane, the Concorde, and even Air Force One.

The Arma Micro Computer

The Arma Micro Computer, with a circuit board on top. Click this image (or any other) for a larger version. Photo courtesy of Daniel Plotnick.

The Micro Computer used 22-bit words, which may seem like a strange size from the modern perspective. But there's no inherent need for a word size to be a power of 2. In particular, the Micro Computer was designed for mathematical calculations, not dealing with 8-bit characters. The word size was selected to provide enough accuracy for its navigational tasks.

Another strange aspect of the Micro Computer is that it was a serial machine, sequentially operating on one bit of a word at a time.2 This approach was often used in early machines because it substantially reduced the amount of hardware required: it only needs a 1-bit data bus and a 1-bit ALU. The downside is that a serial machine is much slower because each 22-bit word takes 22 clock cycles (plus 5 cycles of overhead). As a result, the Micro Computer executed just 36000 operations per second, despite its 1 megahertz clock speed.

Ad for the Arma Micro Computer (called the MICRO here). Source: Electronics, July 27, 1962.

The Micro Computer had a small instruction set of 19 instructions.3 It included multiply, divide, and square root, instructions that weren't implemented in early microprocessors. This illustrates how early microprocessors were a significant step backward in functionality. Moreover, the multiply, divide, and square root instructions used a separate arithmetic unit, so they could execute in parallel with other arithmetic instructions. Because the Micro Computer needed to interact with spacecraft systems, it had a focus on I/O, with 120 digital inputs or outputs, configured as needed for a particular mission.

Circuits

The Micro Computer was built from silicon transistors and diodes, using diode-transistor logic. The construction technique was somewhat unusual. The basic circuits were the flip-flop, the complementary buffer (i.e. an inverter), and the diode gate. Each basic circuit was constructed on a small wafer, .77 inches on a side.5 The photo below shows wafers for a two-transistor flip-flop and two diode gates. Each wafer had up to 16 connection tabs on the edges. These wafers are analogous to integrated circuits, but constructed from discrete components.

Three circuit modules from the Arma Micro Computer. Image from "The Arma Micro Computer for Space Applications".

The wafers were mounted on printed circuit boards, with up to 22 wafers on a board. Pairs of boards were mounted back to back with polyurethane foam between the boards to form a "sandwich", which was conformally coated. The result was a module that was protected against the harsh environment of a missile or spacecraft. The computer could handle a shock of 100 g's and temperatures of 0°C to 85°C as well as 100% humidity or a vacuum.

Because the Micro Computer was a serial machine, its bits were constantly moving. For register storage such as the accumulator, it used six magnetostrictive torsional delay lines, storing a sequence of bits as physical twists that formed pulses racing through a long coil of wire.

The photo below shows the Arma Micro Computer with the case removed. If you look closely, you can see the 22 small circuit wafers mounted on each printed circuit board. The memory driver boards and delay lines are towards the back, spaced more widely than the other printed circuit boards. The cable harness underneath the boards provides the connections between boards.4

Circuit boards inside the Arma Micro Computer. Photo courtesy of Daniel Plotnick.

Transfluxors

One of the most unusual parts of the Micro Computer was its storage. Computers at the time typically used magnetic core memory, with each bit stored in a tiny ferrite ring, magnetized either clockwise or counterclockwise to store a 0 or 1. One drawback of standard core memory was that the process of reading a core also cleared the core, requiring data to be written back after a read.

Diagram of Arma's memory system. From patent 3048828.

The Micro Computer used ferrite cores, but these were "two-aperture" cores, with a larger hole and a smaller hole, as shown above. Data is written to the "major aperture" and read from the "minor aperture". Although the minor aperture switches state and is erased during a read, the major aperture retains the bit, allowing the minor aperture to be switched back to its original state. Thus, unlike regular core memory, transfluxors don't lose their data when reading.

The resulting system is called non-destructive readout (NDRO), compared to the destructive readout (DRO) of regular core memory.6 The Micro Computer used non-destructive readout memory to ensure that the program memory remained uncorrupted. In contrast, if a program is stored in regular core memory, each instruction must be written back as it is executed, creating the possibility that a transient could corrupt the software. By using transfluxors, this possibility of error is eliminated. (In either case, core memory has the convenient property that data is preserved when power is removed, since data is stored magnetically. With modern semiconductor memory, you lose data when the power goes off.)

The photo below shows a compact transfluxor-based storage module used in the Micro Computer, holding 512 words. In total, the computer could hold up to 7808 words of program memory and 256 words of data memory. It appears that transfluxors didn't live up to their promise, since most computers used regular core memory until semiconductor memory took over in the early 1970s.

Transfluxor-based core memory module from the Arma Micro Computer. Image from "The Arma Micro Computer for Space Applications".

Arma's history and the path to the Micro Computer

The Arma Engineering Company was founded in 1918 and built advanced military equipment.7 Its first product was a searchlight for the Navy, followed by a gyroscopic compass and analog computers for naval gun targeting. In 1939, Arma produced the Torpedo Data Computer, a remarkable electromechanical analog computer. US submarines used this computer to track target ships and automatically aim torpedos. The Torpedo Data Computer performed complex trigonometric calculations and integration to account for the motion of the target ship and the submarine. While the Torpedo Data Computer performed well, the Navy's Mark 14 torpedo had many problems—running too deep, exploding too soon, or failing to explode—making torpedoes often ineffectual even with a perfect hit.

The Torpedo Data Computer Mark III in the USS Pampanito.

Arma underwent major corporate changes due to World War II. Before the war, the German-owned Bosch Company built vehicle starters and aircraft magnetos in the United States. When the US entered World War II in 1941, the government was concerned that a German-controlled company was manufacturing key military hardware so the Office of Alien Property Custodian took over the Bosch plant. In 1948, the banking group that controlled Arma bought Bosch from the Office of the Alien Property Custodian, merging them into the American Bosch Arma Corporation (AMBAC).8 (Arma had earlier received the rights to gyrocompass technology from the German Anschutz company, seized by the Navy after World War I, so Arma benefitted twice from wartime government seizures.)

In the mid-1950s, Arma moved into digital computers, building an inertial guidance computer for the Atlas nuclear missile program. America's first ICBM was the Atlas missile, which became operational in 1959. The first Atlas missiles used radio guidance from the launch site to direct the missile. Since radio signals could be jammed by the enemy, this wasn't a robust solution.

The solution to missile guidance was an inertial navigation system. By using sensitive gyroscopes and accelerometers, a missile could continuously track its position and velocity without any external input, making it unjammable. A key developer of this system was Arma's Wen Tsing Chow, one of the driving forces behind digital aviation computers. He faced extreme skepticism in the 1950s for the idea of putting a computer in a missile. One general mocked him, asking "Where are you going to put the five Harvard professors you'll need to keep it running?" But computerized navigation was successful and in 1961, the Atlas missile was updated to use the Arma inertial guidance computer. It was said to be the first production airborne digital computer.9 Wen Tsing Chow also invented the programmable read-only memory (PROM), allowing missile targeting information to be programmed into a computer outside the factory.

Wen Tsing Chow, computer engineer, with Arma Micro Computer. From Control Engineering, January 1963, page 19. Courtesy of Daniel Plotnick.

The photo below shows the Atlas ICBM's guidance system. The Arma W-107A computer is at the top and the gyroscopes are in the middle. This computer was an 18-bit serial machine running at 143.36 kHz. It ran a hard-wired program that integrated the accelerometer information and solved equations for the crossrange error function, range error function, and gravity, making these computations every half second.10 The computer weighed 240 pounds and consumed 1000 watts. The computer contained about 36,000 components: discrete transistors, diodes, resistors, and capacitors mounted on 9.5" × 6.5" printed-circuit boards. On the ground, the computer was air-cooled to 55 °F, but there was no cooling after launch as the computer only operated for five minutes of powered flight and wouldn't overheat during that time.

Guidance system for Atlas ICBM. From "Atlas Inertial Guidance System" by John Heiderstadt. Photo unclassified in 1967.

The Atlas wasn't originally designed for a computerized guidance system so there wasn't room inside the missile for the computer. To get around this, a large pod was stuck on the side of the missile to hold the computer and gyroscopes, as indicated in the photo below. This doesn't look aerodynamic, but I guess it worked.

Atlas missile. Arrow indicates the pod containing the Arma guidance computer and inertial navigation system. Original photo by Robert DuHamel, CC BY-SA 3.0.

The Atlas guidance computer (left, below) consisted of three aluminum sections called "decks". The top deck held two replaceable target constant units, each providing 54 navigation constants that specified a target. The constants were stored in a stack of printed circuit boards 16" × 8" × 1.5", covered in over a thousand diodes, Wen Tsing Chow's PROM memory. A target was programmed into the stack by a rack of equipment that would selectively burn out diodes, changing the corresponding bit to a 1. (This is why programming a PROM is referred to as "burning the PROM".11) The diode matrix was later replaced with a transfluxor memory array, which had the advantage that it could be reprogrammed as necessary. The top deck also had connectors for the accelerometer inputs, the outputs, and connections for ground support equipment. The bottom deck had power connectors for 28 volts DC and 115V 400 Hz 3-phase AC. In the bottom deck, quartz delay lines were used for storage, representing bits as acoustic waves. Twelve circuit cards, each with a faceted quartz block four inches in diameter, provided a total of 32 words of storage.

Three generations of Arma Computers: the W-107A Atlas ICBM guidance computer, the Lightweight Airborne Digital Computer, and the Arma Micro Computer (perhaps a prototype). Photo courtesy of Daniel Plotnick.

Arma considered the Micro Computer the third generation of its airborne computers. The first generation was the Atlas guidance computer, constructed from germanium transistors and diodes (in the pre-silicon era). The second-generation computer moved to silicon transistors and diodes. The third-generation computers still used discrete components, but mounted on the small square wafers. The third generation also had a general-purpose architecture and programmable transfluxor memory instead of a hard-wired program.

After the Micro Computer

Arma continued to develop computers, improving the Arma Micro Computer. The Micro C computer (1965) was developed for Navy ships and submarines. Much like the original Micro, the Micro C used transfluxor storage, but increased the clock frequency to 972 kHz. The computer was much larger: 3.87 cubic feet and 150 pounds. This description states that "the machine is an outgrowth of the ARMA product line of micro computers and is logically and electrically similar to micro-computers designed for missile environments."

Module from the Arma Micro-C Computer. Photo courtesy of Daniel Plotnick.

In mid-1966, Arma introduced the Micro D computer, built from TTL integrated circuits. Like the original Micro, this computer was serial, but the Micro D had a word length of 18 bits and ran at 1.5 MHz. It weighed 5.25 pounds and was very compact, just 0.09 ft³. Instead of transfluxors, the Micro D used regular magnetic core memory, 4K to 31K words.

The Arma Micro-D 1801 computer. The 1808 was a slightly larger model. Photo courtesy of Daniel Plotnick.

The widely-used Litton LTN-51 inertial navigation system was built around the Arma Micro-D computer.12 This navigation system was designed for commercial aircraft, but was also used for military applications, ships, and NASA aircraft. Aircraft from early Concordes to Air Force One used the LTN-51 for navigation. The photo below shows a navigation unit with the Arma Micro-D computer in the lower left and the gyroscope unit on the right.

Litton LTN-51 inertial navigation system. Photo courtesy of pascal mz, concordescopia.com.

In early 1968, the Arma Portable Micro D was introduced, a 14-pound battery-powered computer also called the Celestial Data Processor. This handheld computer was designed for navigation in crewed earth orbital flight, determining orbital parameters from stadimeter and sextant measurements performed by astronauts. As far as I can tell, this computer never made it beyond the prototype stage.

The Arma Celestial Data Processor (source).

Conclusions

The Arma Micro Computer is just one of the dozens of compact aerospace computers of the 1960s, a category that is mostly forgotten and ignored. Another example is the Delco MAGIC I (1961), said to be the "first complete airborne computer to have its logic functions mechanized exclusively with integrated circuits". IBM's 4 Pi series started in 1966 and was used in many systems from the F-15 to the Space Shuttle. By 1968, denser MOS/LSI chips were used in general-purpose aerospace computers such as the Rockwell MOS GP and the Texas Instruments Model 2502 LSI Computer. 13

Arma also illustrates that a company can be on the cutting edge of technology for decades and then suddenly go out of business and be forgotten. After some struggles, Arma was acquired by United Technologies in 1978 for $210 million and was then shut down in 1982. (The German Bosch corporation remains, now a large multinational known for products such as dishwashers, auto parts, and power tools.) Looking at a list of aerospace computers shows many innovative but vanished companies: Univac, Burroughs, Sperry (now all Unisys), AC Electronics (now part of Raytheon), Autonetics (acquired by Boeing), RCA (bought by GE), and TRW (acquired by Northrop Grumman).

Finally, the Micro Computer illustrates that terms such as "microcomputer" are not objective categories but are social constructs. At first, it seems obvious that the Arma Micro Computer is not a real microcomputer. If you consider a microcomputer to be a computer built around a microprocessor, that's true. (Although "microprocessor" is also not as clear as you might think.) But a microcomputer can also be defined as "A small computer that includes one or more input/output units and sufficient memory to execute instructions" (according to the IBM Dictionary of Computing, 1994)14 and the Arma Micro Computer meets that definition. The "microcomputer" is a shifting concept, changing from the 1960s to the 1990s to today.

For more, follow me on Twitter @kenshirriff or RSS for updates. I'm also on Mastodon as @kenshirriff@oldbytes.space. Thanks to Daniel Plotnick for providing a great deal of information and photos. Thanks to John Hartman for obtaining an obscure conference proceedings for me.

Notes and references

I should mention the danger of "firsts" from a historical perspective. Historian Michael Williams advised "not to use the word 'first'" and said, "If you add enough adjectives to a description you can always claim your own favorite." (See ENIAC in Action, p7.)

The first usage of "micro-computer" that I could find is from 1956. In Isaac Asimov's short story "The Dying Night", he mentions a "micro-computer" in passing: "In recent years, it [the handheld scanner] had become the hallmark of the scientist, much as the stethoscope was that of the physician and the micro-computer that of the statistician."

Another interesting example of a "micro-computer" is the Texas Instruments Semiconductor Network Computer. This palm-sized computer is often considered the first integrated-circuit computer. It was an 11-bit serial computer running at 100 kHz, built out of RS flip-flops, NOR gates, and logic drivers. The 1961 article below described this computer as a "micro-computer", although this was a one-off use of the term, not the computer's name. This brochure describes the Semiconductor Network Computer in more detail and Semiconductor Networks are described in detail in this article. Unlike modern ICs, these integrated circuits used flying wires for internal connections rather than a deposited metal layer, making their design a dead end.

The Texas Instruments Semiconductor Network Computer. From Computers and Automation, Dec. 1961.

↩
Most of the information on the Arma Micro Computer in this article is from "The Arma Micro Computer for Space Applications", by E. Keonjian and J. Marx, Spaceborne Computing Engineering Conference, 1962, pages 103-116. ↩
The Arma Micro Computer's instruction set consisted of 19 22-bit instructions, shown below.

Instruction set of the Arma Micro Computer. Figure from "The Arma Micro Computer for Space Applications".

↩
This block diagram shows the structure of the Micro Computer. The accumulator register (AC) is used for all data transfers as well as addition and subtraction. The multiply-divide register is used for multiplication, division, and square roots. The product register (PR), quotient register (QR), and square root register (SR) are used by the corresponding instructions. The data buffer register (S) holds data moving in or out of storage; it is shown with two 11-bit parts.

Block diagram of the Arma Micro Computer. Figure from "The Arma Micro Computer for Space Applications".

For control logic, the location counter (L) is the 13-bit program counter. For a subroutine call, the current address can be stored in the recall register (RR), which acts as a link register to hold the return address. (The RR is not shown on the diagram because it is held in memory.) Instruction decoding uses the instruction register (I), with the next instruction in the instruction buffer (B). The operand register (P) contains the 13-bit address from an instruction, while the remaining register (R) is used for I/O addressing. ↩
Arma's original plan was to mount circuits on ceramic wafers. Resistors would be printed onto the wafer and wiring silk-screened. (This is similar to IBM's SLT modules (1964), although IBM mounted diode and transistors as bare dies rather than components.) However, the Micro Computer ended up using epoxy-glass wafers with small, but discrete components: standard TO-46 transistors, "fly-speck" diodes, and 1/10 watt resistors. I don't see much advantage to these wafers over mounting the components directly on the printed-circuit board; maybe standardization is the benefit. ↩
The Micro Computer used an unusual mechanism to select a word to read or write. Most computers used a grid of selection wires; by energizing an X and a Y wire at the same time, the corresponding core was selected. The key idea of this "coincident-current" approach is that each wire has half the current necessary to flip a core, so the core with the energized X and Y wires will have enough current to flip. This puts tight constraints on the current level, since too much current will flip all the cores along the wire, but not enough current will not flip the selected current. What makes this difficult is that the properties of a core change with temperature, so either the cores need to be temperature-stabilized or the current needs to be adjusted based on the temperature.

The Micro Computer instead used a separate wire for each word, so as long as the current is large enough, the cores will flip. This approach avoids the issues with temperature sensitivity, an important concern for a computer that needs to handle the large temperature swings of a spacecraft, not an air-conditioned data center. Unfortunately, it requires much more wiring. Specifically, the large advantage of the coincident-current approach is that an N×N grid of wires lets you select N² words. With the Micro Computer approach, N wires only select N words, so the scalability is much worse.

For more on Arma's memory systems, see patents: Memory Device, 3048828 and Multiaperture Core Memory Matrix, 3289181. ↩
The capitalization of Arma vs. ARMA is inconsistent. It often appears in all-caps, but both forms are used, sometimes in the same article. "Arma" is not an acronym; the name came from the names of its founders: Arthur Davis and David Mahood (source: Between Human and Machine, p54). I suspect a 1960s corporate branding effort was responsible for the use of all-caps. ↩
For more on the corporate history of Arma, see IRE Pulse, March 1958, p9-10. Details of corporate politics and what went wrong are here. More information on the financial ups and downs of Arma is in "Charles Perelle's Spacemanship", Fortune, January 1959, an article that focused on Charles Perelle, the president of American Bosch Arma. ↩
Wikipedia says that Arma's guidance computer was "the first production airborne digital computer". However, the Hughes Digitair (1958) has also been called "the first airborne digital computer in actual production." Another source says the Arma computer was the "first all-solid-state, high-reliability, space-borne digital computer." The TRADIC (Transistorized Airborne Digital Computer) (1954) was earlier, but was a prototype system, not a production system. In turn, the TRADIC is said by some to be the first fully transistorized computer, but that depends on exactly how you interpret "fully".

This is another example of how the "first" depends on the specific adjectives used. ↩
The information on the Arma W-107A computer is from "Atlas Inertial Guidance System: As I Remember It" by Principal Engineer John Heiderstadt. ↩
Chow Wen Tsing's PROM patent discusses the term "burning", explaining that it refers to burning out the diodes electrically. To widen the patent, he clarifies that "The term 'blowing out' or 'burning out' further includes any process which, by means less drastic than actual destruction of the non-linear elements, effects a change of the circuit impedance to a level which makes the particular circuit inoperative." This description prevented someone from trying to get around the patent by stating that nothing was really burning. ↩
Details on the LTN-51 navigation system and its uses are in this document. ↩
For more information on early aerospace computers, see State-of-the-art of Aerospace Digital Computers (1967), updated as Trends in Aerospace Digital Computer Design (1969). Also see the 1970 Survey of Military CPUs. Efficient partitioning for the batch-fabricated fourth generation computer (1968) discusses how "The computer industry is on the verge of an upheaval" from new hardware including LSI and fast ROMs, and describes various LSI aerospace computers. ↩
The "IBM Dictionary of Computing" (1994) has two definitions of "microcomputer": "(1) A digital computer whose processing unit consists of one or more microprocessors, and includes storage and input/output facilities. (2) A small computer that includes one or more input/output units and sufficient memory to execute instructions; for example a personal computer. The essential components of a microcomputer are often contained within a single enclosure." The latter definition was from an ISO/IEC draft standard for terminology so it is somewhat "official". ↩

[#] Sat Mar 23 2024 09:38:41 UTC from rss <>

Subject: The Intel 8088 processor's instruction prefetch circuitry: a look inside

In 1979, Intel introduced the 8088 microprocessor, a variant of the 16-bit 8086 processor. IBM's decision to use the 8088 processor in the IBM PC (1981) was a critical point in computer history, leading to the dominance of the x86 architecture that continues to the present.1 One way that the 8086 and 8088 increased performance was by prefetching: the processor fetches instructions from memory before they are needed, so the processor can execute them without waiting on the relatively slow memory. I've been reverse-engineering the 8088 from die photos and this blog post discusses what I've uncovered about the prefetch circuitry.

The die photo below shows the 8088 microprocessor under a microscope. The metal layer on top of the chip is visible, with the silicon and polysilicon mostly hidden underneath. Around the edges of the die, bond wires connect pads to the chip's 40 external pins. I've labeled the key functional blocks; this article focuses on the prefetch queue components highlighted in red. The components in purple also play a role, and will be discussed below. Architecturally, the chip is partitioned into a Bus Interface Unit (BIU) at the top and an Execution Unit (EU) below. The BIU handles memory accesses, while the Execution Unit (EU) executes instructions. In particular, the BIU fetches instructions, which are transferred from the prefetch queue to the Execution Unit via the queue bus.

The 8088 die under a microscope, with main functional blocks labeled. This photo shows the chip's single metal layer; the polysilicon and silicon are underneath. Click on this image (or any other) for a larger version.

The 8086 and 8088 processors present the same 16-bit architecture to the programmer. The key difference is that the 8088 has an 8-bit data bus for communication with memory and I/O, rather than the 16-bit bus of the 8086. The 8088's narrower bus reduced performance, since the processor only transfers one byte at a time rather than two. However, the 8-bit bus enabled cheaper computer hardware. The 8-bit bus was also a better match for hardware based on the older but popular 8-bit Intel 8080 and 8085 processors, allowing the reuse of 8-bit I/O circuitry for instance. Much of the IBM PC was based on the little-known IBM DataMaster, a computer built around the Intel 8085. Thus, selecting the 8088 processor was a natural choice for the IBM PC.

For the most part, the 8086 and 8088 are very similar internally, apart from trivial but numerous layout changes on the die. The biggest differences are in the Bus Interface Unit, the circuitry that communicates with memory and I/O devices, since this circuitry handles 16 bits in the 8086 versus 8 bits in the 8088. There are a few microcode differences between the two chips. One interesting change is that for performance reasons the 8088 has a smaller prefetch queue than the 8086 (four bytes instead of six). (I wrote about the 8086's prefetch circuity earlier.)

Prefetching and the architecture of the 8086 and 8088

The 8086 and 8088 were introduced at an interesting point in microprocessor history, when memory was becoming slower than the CPU. For the first microprocessors, the speed of the CPU and the speed of memory were comparable.2 However, as processors became faster, the speed of memory failed to keep up. The 8086 was probably the first microprocessor to prefetch instructions to improve performance. While modern microprocessors have megabytes of fast cache3 to act as a buffer between the CPU and much slower main memory, the 8088 has just 4 bytes of prefetch queue. However, this was enough to substantially increase performance.

Prefetching had a major impact on the design of the 8086 and thus the 8088. Earlier processors such as the 6502, 8080, or Z80 were deterministic: the processor fetched an instruction, executed the instruction, and so forth. Memory accesses corresponded directly to instruction fetching and execution and instructions took a predictable number of clock cycles. This all changed with the introduction of the prefetch queue. Memory operations became unlinked from instruction execution since prefetches happen as needed and when the memory bus is available.

To handle memory operations and instruction execution independently, the implementors of the 8086 and 8088 divided the processors into two processing units: the Bus Interface Unit (BIU) that handles memory accesses, and the Execution Unit (EU) that executes instructions. The Bus Interface Unit contains the instruction prefetch queue; it supplies instructions to the Execution Unit via the Q (queue) bus. The BIU also contains an adder (Σ) for address calculation, adding the segment register base to an address offset, among other things. The Execution Unit is what comes to mind when you think of a processor: it has most of the registers, the arithmetic/logic unit (ALU), and the microcode that implements instructions. The segment registers (CS, DS, SS, ES) and the Instruction Pointer (IP) are in the Bus Interface Unit since they are directly involved in memory accesses, while the general-purpose registers are in the Execution Unit.

Block diagram of the 8088 processor. This diagram differs from most 8088 block diagrams because it shows the actual physical implementation, rather than the programmer's view of the processor. The "Internal Communication Registers" consist of the Indirect Register (IND) and the Operand Register (OPR). These hold a memory address and memory data value respectively. From The 8086 Family User's Manual page 243.

It may seem inefficient for the Bus Interface Unit to have its own adder instead of using the ALU, but there are reasons for the separate adder. First, every memory access uses the adder at least once to add the segment base and offset. The adder is also used to increment the PC or index registers. Since these operations are so frequent, they would create a bottleneck if they used the ALU. Second, since the Execution Unit and the Bus Interface Unit run asynchronously with respect to each other, it would be complicated to share the ALU without conflicts.

Prefetching had another major but little-known effect on the 8086 architecture: the designers were considering making the 8086 a two-chip microprocessor. Prefetching, however, required a one-chip design because the number of control signals required to synchronize prefetching across two chips exceeded the package pins available. This became a compelling argument for the one-chip design that was used for the 8086.4 (The unsuccessful Intel iAPX 432, which was under development at the same time, ended up being a two-chip processor: one to fetch and decode instructions, and one to execute them.)

Implementing the queue

The 8088's instruction prefetch queue is implemented with four 8-bit queue registers along with two hardware "pointers" into the queue. One two-bit counter keeps track of the current read position from 0 to 3, i.e. the queue register that will provide the next instruction byte. The second counter keeps track of the current write position, i.e. the queue register that will receive the next instruction from memory.5 As bytes are fetched from the queue, the read pointer advances. As bytes are added to the queue, the write pointer advances.

The diagram below shows an example queue configuration with two prefetched bytes. The middle two queue registers (Q1 and Q2) hold data. The read pointer indicates that the Execution Unit will get its next byte from Q1. The write pointer indicates that the next prefetched byte will go into Q3.

A queue configuration with two bytes in the prefetch queue. Bytes in blue hold prefetched data.

The diagram below shows how the queue pointers can wrap around. In this configuration, two more bytes have been written to the queue (Q3 and Q0), so the queue is full. The write pointer now points to Q1, the same as the read pointer.

A queue configuration with four bytes in the prefetch queue.

There is an important ambiguity, however. Suppose that four bytes are read from the queue, so the read pointer advances four positions, wrapping around back to Q1. The queue is now empty, as shown below, but the pointers have the same position as the full case above. Thus, if the read pointer and the write pointer both point to the same position, the queue may be empty or full. To distinguish these cases, a flip-flop is set if the queue enters the empty state. This flip-flop generates a signal that Intel called MT (empty).

A queue configuration with the queue empty.

To determine how many bytes are in the queue, the queue circuitry uses a two-bit queue length value, along with the MT flip-flop value to distinguish the empty state. Conceptually, the queue length is generated by subtracting the read position from the write position. However, the implementation does not use a standard subtraction circuit, but instead uses hardcoded logic to determine the two bits of the length, as shown below.

The circuitry to determine the queue length.

The low bit of the length is the XOR of the two positions. In NMOS logic (used by the 8088), an AND-NOR gate is easy to implement, while an XOR gate is difficult. Thus, XOR is implemented as shown in the top circuit. (You can verify that if one input is 1 and the other is 0, the output is 1.) The high-order bit of the length is also based on an AND-NOR gate, one with six inputs. Each input is a combination of read and write positions that yields an output bit 1; each input is computed by a NOR gate, which I haven't drawn.6 As a result, the amount of logic circuitry to compute the length is fairly large.

The diagram below zooms in on the queue control circuitry on the die, with the main flip-flops and circuitry labeled. The circuitry in the middle computes the queue length with the 6-input NOR gate stretched across the whole region. The flip-flops for the read and write positions are in the lower region. Despite the relative simplicity of the queue circuits, they take up a substantial part of the die. Compared to modern chips, the density of the 8088 is very low; you can almost see the flip-flops with the naked eye. But this isn't all the circuitry as prefetching also required queue registers and memory cycle control circuitry. Thus, prefetching was a moderately expensive feature for the 8088, as far as die area.

The queue and prefetch circuitry on the die. The metal layer has been removed for the closeup to show the silicon of the underlying transistors.

The loader

To decode and execute an instruction, the Execution Unit must get instruction bytes from the Bus Interface Unit, but this is not entirely straightforward. The main problem is that the queue can be empty, in which case instruction decoding must block until a byte is available from the queue. The second problem is that instruction decoding is relatively slow so it is pipelined. For maximum performance, the decoder needs a new byte before the current instruction is finished. A circuit called the "loader" solves these problems by providing synchronization between the prefetch queue and the instruction decoder. The loader uses a small state machine to efficiently fetch bytes from the queue at the right time and to provide timing signals to the decoder and microcode engine.

In more detail, as the loader requests the first two instruction bytes from the prefetch queue, it generates two timing signals that control the microcode execution. The FC (First Clock) indicates that the first instruction byte is available, while the SC (Second Clock) indicates the second instruction byte. Note that the First Clock and Second Clock are not necessarily consecutive clock cycles because the prefetch queue could be empty or contain just one byte, in which case the First Clock and/or Second Clock would be delayed. The instruction decoding circuitry and the microcode engine are controlled by the First Clock and Second Clock signals, so they remain synchronized with the bytes supplied by the prefetch queue.

At the end of a microcode sequence, the Run Next Instruction (RNI) micro-operation causes the loader to fetch the next machine instruction. However, fetching and decoding the next instruction is a bit slow so microcode execution would be blocked for a cycle. In many cases, this slowdown can be avoided: if the microcode knows that it is one micro-instruction away from finishing, it issues a Next-to-last (NXT) micro-operation so the loader can start loading the next instruction. This achieves a degree of pipelining in most cases; fetching the next instruction is overlapped with finishing the execution of the previous instruction.

The state machine for the 8086/8088 "loader" circuit. The 1BL signal indicates a 1-byte instruction implemented in logic rather than microcode. From patent US4449184.

The diagram above shows the state machine for the loader. I won't explain it in detail, but essentially it keeps track of whether it is waiting for a First Clock byte or a Second Clock byte, and if it is performing a fetch in advance (NXT) or at the end of an instruction (RNI). The state machine is implemented with two flip-flops to support its four states.

Microcode and the prefetch queue

The loader takes care of fetching an instruction that consists of an opcode byte and a Mod R/M (addressing mode) byte. However, many instructions have additional bytes or don't follow this format For example, an opcode such as "ADD AX" can be followed by an 8- or 16-bit immediate value, adding that value to the AX register. Or a "move memory to AX" instruction can be followed by a 16-bit memory address The microcode uses a separate mechanism for fetching these instruction bytes from the queue. Specifically, each micro-instruction contains a source register and a destination register that specify a data move. By specifying "Q" (the queue) as the source, a byte is fetched from the prefetch queue. If the queue is empty, microcode execution blocks until the Bus Interface Unit loads a byte into the prefetch queue. Thus, the complexity of instruction fetching and the prefetch queue is invisible to the microcode.7

A jump, subroutine call, or other control flow change causes the prefetch queue to be flushed since the queue contents are no longer useful. This is accomplished in microcode with the FLUSH micro-instruction, which resets the queue read and write pointers and sets the MT (empty) flip-flop. Note that the queue is flushed even if the target address is in the queue, for example if you jump one byte ahead.

One complication due to the prefetch queue is that the processor's Instruction Pointer points to the next instruction to be fetched, not the next instruction to be executed. This becomes a problem for a subroutine call, which needs to push the return address. It is also a problem for a relative jump, which is computed from the current instruction. The solution is the CORR micro-instruction, which corrects the Instruction Pointer by subtracting the queue length to determine the current execution position. This is implemented by the Bus Interface Unit, which holds correction constants in the Constant ROM, and subtracts them using the address adder (not the ALU).8

The queue registers

The 8086 and 8088 partition the registers into upper registers (in the Bus Interface Unit) and lower registers (in the Execution Unit). The upper registers are the registers associated with memory accesses (e.g. Instruction Pointer, segment registers) while the lower registers are more general purpose (e.g. AX, BX, SI, SP). The upper registers are connected to two 16-bit internal buses: the B bus and the C bus.

The queue registers are physically part of the upper registers, but are wired into the buses slightly differently, as shown below. In particular, the 8088's queue registers are written 8 bits at a time from the C bus. (In contrast, the 8086's queue registers can be written 16 bits at a time to support two-byte prefetches.) When accessing the queue, the queue registers are read 16 bits at a time, but only one byte is transferred to the Q bus for instruction processing.9

The queue registers in the 8088.

The diagram below shows how the queue registers appear on the die, comparing the six-byte prefetch queue in the 8086 (top) to the four-byte 8088 queue (bottom). The 8086 prefetch registers are structured as three rows of 16-bit registers, while the 8088 prefetch registers are structured as four rows of 8-bit registers. In both cases, each bit is stored in a cross-coupled pair of inverters. The bit lines (not present) are vertical, while the control lines to select a register are horizontal. The layout is different between the processors to support 16-bit versus 8-bit writes. Note the empty space at the bottom of the 8088 registers. Because the rest of the chips are mostly the same, the 8088 couldn't be "compacted" to avoid this wasted space.

The prefetch registers in the 8086 (top) and 8088 (bottom). For the 8086, the metal and polysilicon layers were removed, exposing the underlying silicon. For the 8088, the polysilicon and silicon are visible.

Intel used simulations to determine the best queue sizes for the 8086 and 8088, balancing the performance cost of prefetching against the benefit. (The cost is that prefetching makes the bus unavailable for other memory or I/O operations.) The prefetch queue is discarded on a jump instruction or other change of control flow, causing the prefetched bytes to be wasted. Thus, as the queue gets longer, the chance of discarding a prefetched byte becomes larger, so the potential benefit of prefetching becomes smaller. Since the 8088 prefetches one byte at a time, compared to two bytes at a time on the 8086, prefetching on the 8088 costs twice as much as on the 8086 in terms of bus cycles used per byte. This changes the tradeoffs in favor of a shorter queue.

Because of the difference in queue lengths, the queue control circuitry is different between the 8086 and 8088. In particular, the 8086 needs three-bit counters for the read and write positions, while the 8088 uses two-bit counters. Because of this, the length computation circuitry is also different between the processors.

I plan to continue reverse-engineering the 8088 die so follow me on Twitter @kenshirriff or RSS for updates. I've also started experimenting with Mastodon recently as @oldbytes.space@kenshirriff. If you're interested in the 8086, I wrote about the 8086 die, its die shrink process and the 8086 registers earlier.

Notes and references

Whenever I mention x86's domination of the computing market, people bring up ARM, but ARM has a lot more market share in people's minds than in actual numbers. One research firm says that ARM has 15% of the laptop market share in 2023, expected to increase to 25% by 2027. (Surprisingly, Apple only has 90% of the ARM laptop market.) In the server market, just an estimated 8% of CPU shipments in 2023 were ARM. See Arm-based PCs to Nearly Double Market Share by 2027 and Digitimes. (Of course, mobile phones are almost entirely ARM.) ↩
Steve Furber, co-creator of the ARM chip, mentions that "The first integrated CPUs were coincidentally quite well matched to semiconductor memory speeds, and were therefore built without caches. This can now be seen as a temporary aberration." See VLSI Risc Architecture and Organization p77. To make this concrete, the Apple II (1977) used a MOS 6502 processor running at about 1 megahertz while its 4116 DRAM chips could perform an access in 250 nanoseconds (4 times the clock speed). The 8088 processor ran at 5-10 MHz which meant that 250 ns DRAM chips were slower than the clock speed. Nowadays, processors run at 4 GHz but DRAM access speed is about 50 nanoseconds (1/200 the clock speed). ↩
Modern processors use caches to improve memory performance. Accessing data from a cache is faster than accessing it from main memory, but the tradeoff is that caches are much smaller than main memory. The prefetch queues in the 8086 and 8088 are similar to a cache in some ways, but there are some key differences. First, the prefetch queue is strictly sequential. If you jump ahead two bytes, even if the prefetch queue has those instruction bytes, the processor can't use them. Second, the prefetch queue can't reuse bytes. If you have a 6-byte loop, even though all the code fits in the prefetch queue, it will be reloaded every time. Third, the prefetch queue doesn't provide any consistency. If you modify an instruction in memory a couple of bytes ahead of the PC, the 8086 or 8088 will run the old instruction if it's in the queue. ↩
The design decisions for the 8086 prefetch cache (and many other aspects of the chip) are described in: J. McKevitt and J. Bayliss, "New options from big chips," in IEEE Spectrum, vol. 16, no. 3, pp. 28-34, March 1979, doi: 10.1109/MSPEC.1979.6367944. Prefetch provided a 50% performance benefit to the 8086. ↩
The queue read process doesn't use an explicit read operation. Instead, the selected queue register continuously puts its value onto the queue bus. When the Execution Unit uses this byte, it sends an increment signal to the queue to advance the read pointer. If the queue empty (MT) flip-flop is set, the Execution Unit will wait until a byte is ready. ↩
The NOR gates are used as AND gates, following DeMorgan's laws. For example to produce a 1 output for write position 00 and read position 01, the logic is: NOR(write bit 1', write bit 0', read bit 1', read bit 0). Note that the bits into the NOR gate are all inverted from the "desired" values; if they are all 0, the NOR output is 1. Thus, there are also some inverters on the inputs. ↩
Arbitrary memory reads and writes are performed directly on memory, bypassing the prefetch queue. The 8086/8088 do not provide consistency; if you modify an instruction byte in memory and the byte is in the queue, the processor will execute the old byte. (This type of self-modifying code can be used to determine the queue length, distinguishing the 8086 from the 8088 in software.) ↩
The Constant ROM is used for more than just address correction. For example, it is also used to increment the Instruction Pointer after a prefetch. Other constants are used for the 8088's string operations, which act on a block of memory. The index registers are incremented or decremented by 1 for bytes or 2 for words. When popping a value from the stack, the stack pointer is decremented using the Constant ROM. ↩
Are the 8088's queue registers 16 bits wide or 8 bits wide? It's ambiguous, since the registers are written 8 bits at a time, but read 16 bits at a time. This implementation was probably selected to support the 8088's 8-bit bus while reusing as much of the 8086 design as possible. In particular, the 8088 can only prefetch one byte at a time, so writes need to happen a byte at a time. Thus, there are four control lines selecting which queue byte is written. (The 8088 could write to half of a 16-bit register but that would require moving the prefetched byte to the correct half of a 16-bit bus.) On the read side, it would make sense to have four read lines, selecting one byte from the 8088's queue. However, since the 8086 already had a multiplexer to select one byte from two, the 8088 designers probably felt it was easier to keep that circuit. And with the smaller queue on the 8088, there was no need to try to save space by removing the circuit. Thus, the queue has two read-select lines and a multiplexer control line. All these lines are controlled by the write position and read position flip-flops. ↩

[#] Sat Mar 30 2024 10:28:08 UTC from rss <>

Subject: Inside an unusual 7400-series chip implemented with a gate array

When I look inside a chip from the popular 7400 series, I know what to expect: a fairly simple die, implemented in a straightforward, cost-effective way. However, when I looked inside a military-grade chip built by Integrated Device Technology (IDT)4 I found a very unexpected layout: over 1500 transistors in an orderly matrix. Even stranger, most of the die is wasted: less than 20% of these transistors are used, forming scattered circuits connected by thin metal wires.

In this blog post, I look at this chip in detail, describe its gates, and explain how it implements the "1-of-4" decoder function. I also discuss why it sometimes makes sense to build chips with a gate array design such as this, despite the inefficiency.

A photo of the tiny silicon die in its package. This chip is the IDT 54FCT139ALB dual 1-of-4 decoder. Click this image (or any other) for a larger version.

In the photo below, you can see the silicon die in more detail, with the silicon appearing pink. The main circuitry is implemented in the nine rows that form the gate array, a grid of 1584 transistors. The tiny dark rectangles are transistors of two types, NMOS and PMOS, that work together to implement CMOS logic circuits. At this scale, the metal wiring is visible as faint gray lines and smudges, but most of the transistors are unconnected. Surrounding the gate array are 22 input/output (I/O) blocks each with a square bond pad. As with the transistors, many of these I/O blocks are unused. Fourteen of these bond pads have tiny metal bond wires (the thick black lines) that connect the silicon die to the chip's external pins. Finally, the pairs of bond wires at the center left and center right provide ground and power connections for the chip.

Closeup die photo.

The photo below zooms in on three rows of circuitry in the chip. The large dark rectangles are pairs of transistors, with two lines of transistors in each row of circuitry. At the top and bottom of each row, the thick horizontal white lines are metal wiring that provides power and ground. In each row, one line of transistors holds PMOS transistors, next to the power wiring, while the other line holds NMOS transistors, next to the ground wiring. (The orientation flips in each successive row, so it isn't obvious which transistors are which unless you check the power connections at the end of the row.)

A closeup of the die.

The transistors are wired into gates by the metal layers, the white lines. The gates are connected by horizontal and vertical wiring using the wiring channels between the rows. This wiring style is very similar to standard-cell logic. However, unlike standard-cell logic, the underlying transistor grid is fixed, resulting in wasted transistors. In the image above, most of the transistors in the middle row are used, while the top row is unused and the bottom row is mostly unused.

The diagram below shows the structure of one of the transistor blocks, which contains two tall, thin MOS transistors. The vertical metal contacts connect to the sources and drains of the transistors, with the two transistors sharing the middle contact. (On an integrated circuit, the source and drain of a transistor are identical, so it is arbitrary which side is the source and which is the drain.) The short horizontal metal contacts at the top connect to the gates of the two transistors; the gates are made of polysilicon, which is barely visible in the die photo. The gates partition the active silicon (green), forming the transistors. The gate width is approximately 1 µm.

A block of two transistors as they appear on the die, along with a diagram showing the structure. The bar indicates a length of 10 µm.

NAND gate

In this section, I'll explain the construction of one of the NAND gates on the die. The NAND gate below uses four transistors, two NMOS transistors on the top and two PMOS transistors on the bottom. The white lines are the metal wiring, forming two layers. Most of the wiring (including power and ground) is in the lower (M1) layer. The slightly wider and darker vertical segments are the upper (M2) layer. The circles connect the metal layers when they join, or connect the metal layer to the underlying silicon or polysilicon. With two metal layers, it's a bit tricky to see how the wiring is connected. The A and B inputs each connect to two transistor gates. The transistor group at the top is connected to ground on the right, with the output on the left. The transistor group on the bottom is connected to Vcc on the left and right, with the output in the middle. This has the effect of putting the upper transistors in series and the lower transistors in parallel.

A NAND gate on the die.

Below, I've drawn the schematic of the NAND gate. On the left, the layout of the schematic matches the die layout above. On the right, I've redrawn the schematic with a more traditional layout. To understand its operation, note that a PMOS transistor (top on the right schematic) turns on when the input is low, while an NMOS transistor (bottom on the right) turns on when the input is high. When both inputs are high, the two NMOS transistors turn on, connecting ground to the output, pulling it low. When either input is low, one of the PMOS transistors turns on, pulling the output high. Thus, the circuit implements the NAND function. The NMOS and PMOS transistors operate in a complementary fashion, giving CMOS (Complementary MOS) its name.

Schematic of a NAND gate.

NOR gate

In this section, I'll explain the layout of one of the NOR gates on the die, shown below. This gate is twice as large as the previous NAND gate so it can provide twice the output current.1 The NOR gate uses eight transistors, four PMOS transistors in the upper half and four NMOS transistors in the lower half. (Note that Vcc and ground are flipped compared to the previous gate, as are the NMOS and PMOS transistors.) The two transistors in each block are wired in parallel to produce more current for the output. (A out is the same signal as A in, exiting the block at the top to connect to other circuitry.)

A NOR gate on the die.

The schematic below shows the wiring of the eight transistors. The schematic layout corresponds to the physical layout to make it easier to map between the image and the schematic. The upper transistor groups are wired in series, while the lower transistor groups are wired in parallel.

Schematic corresponding to the gate above.

The schematic below has been redrawn to make the functionality clearer, and the parallel transistors have been removed. If either input is high, one of the NMOS transistors on the bottom will turn on and pull the input low. If both inputs are low, the two PMOS transistors will turn on and pull the input high. This provides the desired NOR function.

Simplified NOR gate schematic.

Note that the NAND and NOR gates have similar but opposite schematics. In the NAND gate, the NMOS transistors are in series while the PMOS transistors are in parallel. In the NOR gate, the roles of the transistors are swapped.

The chip's circuit

The chip I examined is a "dual 1-of-4 decoder with enable".2 The decoding function takes a two-bit input and selects one of four output lines depending on the binary value. The enable line must be low to activate this operation; otherwise all four output lines are disabled. The chip contains two of these decoders, which is why it is called a dual decoder. In total, the chip contains 18 logic gates,3 so it is very simple, even by 1990s standards.

I reverse-engineered the chip and created the schematic below, showing one of the dual units. Each NAND gate matches one of the four input possibilities to drive one of the four outputs. The NOR gates support the ENABLE signal, blocking the outputs unless ENABLE is active (i.e. low).

Reverse-engineered schematic of half the chip.

The chip uses a general-purpose I/O block (below) for each pin, that can be used as an input or an output depending on how it is wired. Each block contains two large drive transistors: an NMOS transistor to pull the output low and a PMOS transistor to pull the output high. The I/O block has separate control lines for the two output transistors. (At the bottom of the image below, two thin metal wires drive the high-side and low-side transistors.) This permits tri-state logic: if neither transistor is energized, the output is left floating. The gate array drives the output transistors with high-current inverter, constructed from multiple transistors in parallel. (This is why the schematic shows more inverters than may seem necessary.)

One of the 22 I/O blocks on the die. Each I/O block is associated with a bond pad, where a bond wire can be connected to an external pin.

When used as an input, the pad is wired to the surrounding circuitry slightly differently, connecting to input protection diodes (not shown on the schematic). Thus, the functionality of the I/O blocks can be changed by modifying the metal layers, without changing the underlying silicon.

Some 7400-series history

The earliest logic integrated circuits used resistors and transistors internally, so they were called RTL (Resistor Transistor Logic), but RTL had significant performance problems. RTL was rapidly replaced by Diode Transistor Logic (DTL) and then Transistor Transistor Logic (TTL). In 1964, Texas Instruments created a line of TTL integrated circuits for military applications called the SN5400 series. This was shortly followed by the commercial-grade SN7400 series.

The 7400 series of integrated circuits was inexpensive, fast, and easy to use. The line started with simple logic circuits such as four NAND gates on a chip, and moved into more complex chips such as counters, shift registers, and ALUs. The 7400 series became very popular in the 1970s and 1980s, used by electronics hobbyists and high-performance minicomputers alike. These chips became essential building blocks and "glue" logic for microcomputers, heavily used in the Apple II for instance.

The original 7400 series branched into dozens of families with different performance characteristics but the same functionality. The 74LS (low-power Schottky) family, for instance, became very popular as it both improved speed and reduced power consumption. In the mid-1970s, 7400-series chips were introduced that used CMOS circuitry instead of TTL for dramatically lower power consumption. This CMOS family, the 74C series, was followed by numerous other CMOS families.

That brings us to the chip I examined, a member of IDT's 74FCT (Fast CMOS TTL-compatible) line of chips, introduced in the mid-1980s. (Specifically, it is in the 54FCT family because it handles a wider temperature range.) These chips used advanced CMOS technology to provide high speed, low power consumption, and as a military option, radiation tolerance.

Conclusions

Why would you make a chip in this inefficient way, using a gate array that wastes most of the die area? The motivation is that most of the design cost can be shared across many different part types. Each step of integrated circuit processing requires an expensive mask for photolithography. With a gate array, all chip types use the same underlying silicon and transistors, with custom masks just for the two metal layers. In comparison, a fully custom chip might require eight custom masks, which costs much more. The tradeoff is that gate array chips are larger so the manufacturing cost is higher per device.5 Thus, a gate array design is better when selling chips in relatively small quantities, while a custom design is cheaper when mass-producing chips.6 IDT focused on the high-performance and military market rather than the commodity chip market, so gate arrays were a good fit.

One last thing. The packaging of this chip is very interesting since it is mounted on a multi-chip module. The module also contains two Atmel EEPROMs. Presumably the decoder chip decodes address bits to select one of the EEPROMs.

The multi-chip module containing the decoder chip along with an AT28HC64 EPROM on either side.

Thanks to Don S. for providing the chip. Follow me on Twitter @kenshirriff or RSS for updates. I've also started experimenting with Mastodon recently as @oldbytes.space@kenshirriff.

Notes and references

Properly sizing the transistors in a gate is important for performance. Since the transistors in the gate array are all the same size, multiple transistors are used in parallel to get the desired current. The 1999 book Logical Effort describes a methodology for maximizing the performance of CMOS circuits by correctly sizing the transistors. ↩
The part number is "IDT 54FCT139ALB". "54" indicates the chip operates under an enhanced temperature range of -55°C to +125°C. The "A" indicates the chip is 35% faster than the base series (but not as fast as "C"). "L" indicates the chip is packaged in a leadless chip carrier, the square package shown at the top of the article. Finally, "B" indicates the chip was tested according to military standards: MIL-STD-883, Class B. ↩
The chip contains 18 logic gates according to the functional schematic in the datasheet (below). The implementation actually uses 52 logic gates by my count (2×26) because the implementation doesn't exactly match the schematic. In particular, the datasheet shows three-input NAND gates, but the chip uses a NAND gate and a NOR gate along with inverters. The chip also has additional inverters to drive the output transistors in each I/O block.

Schematic of the chip from the datasheet.

↩
Integrated Device Technology was a spinoff from Hewlett Packard that started in 1980. IDT built advanced CMOS chips including fast static RAM and microprocessors (bit-slice and MIPS). It became part of Renesas in 2018. A very detailed 1986 profile of IDT is here. IDT's logo is pretty cool, combining a chip wafer and calculus.

The logo of Integrated Device Technology.

Here's how the logo looks on the die:

Closeup of the die showing the IDT logo.

The die also has the initials of the designers, along with some mysterious symbols. One looks like the Chinese character "正".

↩
Closeups of two parts of the die.
Integrated circuit manufacturing is partitioned into the "front end of line", where the transistors are created on the silicon wafer, and the "back end of line", where the metal wiring is put on top to connect the transistors. With a gate array construction, the front end of line steps create generic gate array wafers. The back end of line steps then connect the transistors as desired for a particular component. The gate array wafers can be produced in large quantities and stored, and then customized for specific products in smaller quantities as needed. This reduces the time to supply a particular chip type since only the back end of line process needs to take place. ↩
The IDT High-Speed CMOS Logic Design Guide briefly mentions the gate array design. The FCT family was built from two sizes of gate arrays, "4004" for smaller chips and "8000" for larger chips. Later, IDT shrunk the original "Z-step" gate arrays to smaller, higher-performance "Y-step" arrays. They then customized some of the devices to create the "W-step" devices. Looking at the markings on the die, we see that this chip uses the "4004Y" gate array.

The die shows gate slice 4004Y and part 4139Y (indicating 54139 or 74139). The numbers are slightly obscured by a bond wire.

↩

[#] Sun Apr 28 2024 08:35:56 UTC from rss <>

Subject: Talking to memory: Inside the Intel 8088 processor's bus interface state machine

In 1979, Intel introduced the 8088 microprocessor, a variant of the 16-bit 8086 processor. IBM's decision to use the 8088 processor in the IBM PC (1981) was a critical point in computer history, leading to the success of the x86 architecture. The designers of the IBM PC selected the 8088 for multiple reasons, but a key factor was that the 8088 processor's 8-bit bus was similar to the bus of the 8085 processor.1 The designers were familiar with the 8085 since they had selected it for the IBM System/23 Datamaster, a now-forgotten desktop computer, making the more-powerful 8088 processor an easy choice for the IBM PC.

The 8088 processor communicates over the bus with memory and I/O devices through a highly-structured sequence of steps called "T-states." A typical 8088 bus cycle consists of four T-states, with one T-state per clock cycle. Although a four-step bus cycle may sound straightforward, its implementation uses a complicated state machine making it one of the most difficult parts of the 8088 to explain. First, the 8088 has many special cases that complicate the bus cycle. Moreover, the bus cycle is really six steps, with two undocumented "extra" steps to make bus operations more efficient. Finally, the complexity of the bus cycle is largely arbitrary, a consequence of Intel's attempts to make the 8088's bus backward-compatible with the earlier 8080 and 8085 processors. However, investigating the bus cycle circuitry in detail provides insight into the timing of the processor's instructions. In addition, this circuitry illustrates the tradeoffs and implementation decisions that are necessary in a production processor. In this blog post, I look in detail at the circuitry that implements this state machine.

By examining the die of the 8088 microprocessor, I could reverse engineer the bus circuitry. The die photo below shows the 8088 microprocessor's silicon die under a microscope. Most visible in the photo is the metal layer on top of the chip, with the silicon and polysilicon mostly hidden underneath. Around the edges of the die, bond wires connect pads to the chip's 40 external pins. Architecturally, the chip is partitioned into a Bus Interface Unit (BIU) at the top and an Execution Unit (EU) below, with the two units running largely independently. The BIU handles bus communication (memory and I/O accesses), while the Execution Unit (EU) executes instructions. In the diagram, I've labeled the processor's key functional blocks. This article focuses on the bus state machine, highlighted in red, but other parts of the Bus Interface Unit will also play a role.

Although I'm focusing on the 8088 processor in this blog post, the 8086 is mostly the same. The 8086 and 8088 processors present the same 16-bit architecture to the programmer. The key difference is that the 8088 has an 8-bit data bus for communication with memory and I/O, rather than the 16-bit bus of the 8086. For the most part, the 8086 and 8088 are very similar internally, apart from trivial but numerous layout changes on the die. In this article, I'm focusing on the 8088 processor, but most of the description applies to the 8086 as well. Instead of constantly saying "8086/8088", I'll refer to the 8088 and try to point out places where the 8086 is different.

The bus cycle

In this section, I'll describe the basic four-step bus cycles that the 8088 performs.2 To start, the diagram below shows the states for a write cycle (slightly simplified3), when the 8088 writes to memory or an I/O device. The external bus activity is organized as four "T-states", each one clock cycle long and called T1, T2, T3, and T4, with specific actions during each state. During T1, the 8088 outputs the address on the pins. During the T2, T3, and T4 states, the 8088 outputs the data word on the same pins. The external memory or I/O device uses the T states to know when it is receiving address information or data over the bus lines.

A typical write bus cycle consists of four T states. Based on The 8086 Family Users Manual, B-16.

For a read, the bus cycle is slightly different from the write cycle, but uses the same four T-states. During T1, the address is provided on the pins, the same as for a write. After that, however, the processor's data pins are "tri-stated" so they float electrically, allowing the external memory to put data on the bus. The processor reads the data at the end of the T3 state.

A typical read bus cycle consists of four T states. Based on The 8086 Family Users Manual, B-16.

The purpose of the bus state machine is to move through these four T states for a read or a write. This process may seem straightforward, but (as is usually the case with the 8088) many complications make this process anything but easy. In the next sections, I'll discuss these complications. After that, I'll explain the state machine circuitry with a schematic.

Address calculation

One of the notable (if not hated) features of the 8088 processor is segmentation: the processor supports 1 megabyte of memory, but memory is partitioned into segments of 64 KB for compatibility with the earlier 8080 and 8085 processors. The 8088 calculates each 20-bit memory address by adding the value of a segment register to a 16-bit offset. This calculation is done by a dedicated address adder in the Bus Interface Unit, completely separate from the chip's ALU. (This address adder can be spotted in the upper left of the earlier die photo.)

Calculating the memory address complicates the bus cycle. As the timing diagrams above show, the processor issues the memory address during state T1 of the bus cycle. However, it takes time to perform the address calculation addition, so the address calculation must take place before T1. To accomplish this, there are two "invisible" bus states before T1; I call these states "TS" (T-start) and "T0". During these states, the Bus Interface Unit uses the address adder to compute the address, so the address will be available during the T1 state. These states are invisible to the external circuitry because they don't affect the signals from the chip.

Thus, a single memory operation takes six clock cycles: two preparatory cycles to compute the address before the four visible cycles. However, if multiple memory operations are performed, the operations are overlapped to achieve a degree of pipelining that improves performance. Specifically, the address calculation for the next memory operation takes place during the last two clock cycles of the current memory operation, saving two clock cycles. That is, for consecutive bus cycles, T3 and T4 of one bus cycle overlap with TS and T0 of the next cycle. In other words, during T3 and T4 of one bus cycle, the memory address gets computed for the next bus cycle. This pipelining significantly improves the performance of the 8088, compared to taking 6 clock cycles for each bus cycle.

With this timing, the address adder is free during cycles T1 and T2. To improve performance in another way, the 8088 uses the adder during this idle time to increment or decrement memory addresses. For instance, after popping a word from the stack, the stack pointer needs to be incremented by 2.5 Another case is block move operations (string operations), which need to increment or decrement the pointers each step. By using the address adder, the new pointer value is calculated "for free" as part of the memory cycle, without using the processors regular ALU.4

Address corrections

The address adder is used in one more context: correcting the Instruction Pointer value. Conceptually, the Instruction Pointer (or Program Counter) register points to the next instruction to execute. However, since the 8088 prefetches instructions, the Instruction Pointer indicates the next instruction to be fetched. Thus, the Instruction Pointer typically runs ahead of the "real" value. For the most part, this doesn't matter. This discrepancy becomes an issue, though, for a subroutine call, which needs to push the return address. It is also an issue for a relative branch, which jumps to an address relative to the current execution position.

To support instructions that need the next instruction address, the 8088 implements a micro-instruction CORR, which corrects the Instruction Pointer. This micro-instruction subtracts the length of the prefetch queue from the Instruction Pointer to determine the "real" Instruction Pointer. This subtraction is performed by the address adder, using correction constants that are stored in a small Constant ROM.

The tricky part is ensuring that using the address adder for correction doesn't conflict with other uses of the adder. The solution is to run a special shortened memory cycle—just the TS and T0 states—while the CORR micro-instruction is performed.6 These states block a regular memory cycle from starting, preventing a conflict over the address adder.

A closeup of the address adder circuitry in the 8086. From my article on the adder.

Prefetching

The 8088 prefetches instructions before they are needed, loading instructions from memory into a 6-byte prefetch queue. Prefetching usually improves performance, but can result in an instruction's memory access being delayed by a prefetch, hurting overall performance. To minimize this delay, a bus request from an instruction will preempt a prefetch, even if the prefetch has gone through TS and T0. At that point, the prefetch hasn't created any bus activity yet (which first happens in T1), so preempting the prefetch can be done cleanly. To preempt the prefetch, the bus cycle state machine jumps back to TS, skipping over T1 through T4, and starting the desired access.

A prefetch will also be preempted by the micro-instruction that stops prefetching (SUSP) or the micro-instruction that corrects addresses (CORR). In these cases, there is no point in completing the prefetch, so the state machine cycle will end with T0.

Wait states

One problem with memory accesses is that the memory may be slower than the system's clock speed, a characteristic of less-expensive memory chips. The solution in the 1970s was "wait states". If the memory couldn't respond fast enough, it would tell the processor to add idle clock cycles called wait states, until the memory could respond.7 To produce a wait state, the memory (or I/O device) lowers the processor's READY pin until it is ready to proceed. During this time, the Bus Interface Unit waits, although the Execution Unit continues operation if possible. Although Intel's documentation gives the wait cycle a separate name (Tw), internally the wait is implemented by repeating the T3 state as long as the READY pin is not active.

Halts

Another complication is that the 8088 has a HALT instruction that halts program execution until an interrupt comes in. One consequence is that HALT stops bus operations (specifically prefetching, since stopping execution will automatically stop instruction-driven bus operations). A complication is that the 8088 indicates the HALT state to external devices by performing a special T1 bus cycle without any following bus cycles. But wait: there's another complication. External devices can take control of the bus through the HOLD functionality, allowing external devices to perform operations such as DMA (Direct Memory Access). When the device ends the HOLD, the 8088 performs another special T1 bus cycle, indicating that the HALT is still in effect. Thus, the bus state machine must generate these special T1 states based on HALT and HOLD actions. (I discussed the HALT process in detail here.)

Putting it all together: the state diagram

The state diagram below summarizes the different types of bus cycles. Each circle indicates a specific T-state, and the arrows indicate the transitions between states. The green line shows the basic bus cycle or cycles, starting in TS and then going around the cycle. From T3, a new cycle can start with T0 or the cycle will end with T4. Thus, new cycles can start every four clocks, but a full cycle takes six states (counting the "invisible" TS and T0). The brown line shows that the bus cycle will stay in T3 as long as there is a wait state. The red line shows the two cycles for a CORR correction, while the purple line shows the special T1 state for a HALT instruction. The cyan line shows that a prefetch cycle can be preempted after T0; the cycle will either restart at TS or end.

A state diagram showing the basic bus cycle and various complications.

I'm showing states TS and T3 together since they overlap but aren't the same. Likewise, I'm showing T4 and T0 together. T4 is grayed out because it doesn't exist from the state machine's perspective; the circuitry doesn't take any particular action during T4.

The schematic below shows the implementation of the state machine. The four flip-flops represent the four states, with one flip-flop active at a time, generating states T0, T1, T2, and T3 (from top to bottom). Each output feeds into the logic for the next state, with T3 wrapping back to the top, so the circuit moves through the states in sequence. The flip-flops are clocked so the active state will move from one flip-flop to the next according to the system clock. State TS doesn't have its own flip-flop, but is represented by the input to the T0 flip-flop, so it happens one clock cycle earlier.8 State T4 doesn't have a flip-flop since it isn't "real" to the bus state machine. The logic gates handle the special cases: blocking the state transfer if necessary or starting a state.

Schematic of the state machine.

I'll explain the logic for each state in more detail. The circuitry for the TS state has two AND gates to generate new bus cycles starting from TS. The first one (a) causes TS to happen with T3 if there is a pending bus request (and no HOLD). The second AND gate (b) starts a bus cycle if the bus is not currently active and there is a bus request or a CORR micro-instruction. The flip-flop causes T0 to follow T3/TS, one clock cycle later.

The next gates (c) generate the T1 state following T0 if there is pending bus activity and the cycle isn't preempted to T3. The AND gate (d) starts the special T1 for the HALT instruction.9 The T2 state follows T1 unless T1 was generated by a HALT (e).

The T3 logic is more complicated. First, T3 will always follow T2 (f). Next, a wait state will cause T3 to remain in T3 (g). Finally, for a preempt, T3 will follow T0 (h) if there is a prefetch and a microcode bus operation (i.e. an instruction specified the bus operation).

Next, I'll explain BUS-ACTIVE, an important signal that indicates if the bus is active or not. The Bus Interface Unit generates the BUS-ACTIVE signal to help control the state machine. The BUS-ACTIVE signal is also widely used in the Bus Interface Unit, controlling many functions such as transfers to and from the address registers. BUS-ACTIVE is generated by the complex circuit below that determines if the bus will be active, specifically in states T0 through T3. Because of the flip-flop, the computation of BUS-ACTIVE happens in the previous clock cycle.

The circuit to determine if the bus will be active next cycle.

In more detail, the signal BUS-ACTIVE-PRE indicates if the bus cycle will continue or will end on the next clock cycle. Delaying this signal through the flip-flop generates BUS-ACTIVE, which indicates if the bus is currently active in states T0 through T3. The top AND gate (a) is responsible for starting a cycle or keeping a cycle going (a1). It will allow a new cycle if there is a bus request (without HOLD) (a3). It will also allow a new cycle if there is a CORR micro-instruction prior to the T1 state (even if there is a HOLD, since this "fake" cycle won't use the bus) (a2). Finally, it allows a new cycle for a HALT, using T1-pre (a2).10 Next are the special cases that end a bus cycle. The second AND gate (b) ends the bus cycle after T3 unless there is a wait state or another bus request. (But a HOLD will block the next bus request.) The remaining gates end the cycle after T0 to preempt a prefetch if a CORR or SUSP micro-instruction occurs (d), or end after T1 for a HALT (e).

The BUS-ACTIVE circuit above uses a complex gate, a 5-input NOR gate fed by 5 AND gates with two attached OR gates. Surprisingly, this is implemented in the processor as a single gate with 14 inputs. Due to how gates are implemented with NMOS transistors, it is straightforward to implement this as a single gate. The inverter and NOR gate on the left, however, needed to be implemented separately, as they involve inversion; an NMOS gate must have a single inversion.

The bus state machine circuitry on the die.

The diagram above shows the layout of the bus state machine circuitry on the die, zooming in on the top region of the die. The metal layer has been removed to expose the underlying silicon and polysilicon. The layout of each flip-flop is completely different, since the layout of each transistor is optimized to its surroundings. (This is in contrast to later processors such as the 386, which used standard-cell layout.) Even though the state machine consists of just a handful of flip-flops and gates, it takes a noticeable area on the die due to the large 3.2 µm feature size of the 8088. (Modern processors have features measured in nanometers, not micrometers.)

Conclusions

The bus state machine is an example of how the 8088's design consists of complications on top of complications. While the four-state bus cycle seems straightforward at first, it gets more complicated due to prefetching, wait states, the HALT instruction, and the bus hold feature, not to mention the interactions between these features. While there were good motivations behind these features, they made the processor considerably more complicated. Looking at the internals of the 8088 gives me a better understanding of why simple RISC processors became popular.

The bus state machine is a key part of the read and write circuitry, moving the bus operation through the necessary T-states. However, the state machine is not the only component in this process; a higher-level circuit decides when to perform a read, write, or prefetch, as well as breaking a 16-bit operation into two 8-bit operations.11 These circuits work together with the higher-level circuit telling the state machine when to go through the states.

In my next blog post, I'll describe the higher-level memory circuit so follow me on Twitter @kenshirriff or RSS for updates. I'm also on Mastodon as oldbytes.space@kenshirriff. If you're interested in the 8086, I wrote about the 8086 die, its die shrink process, and the 8086 registers earlier.

Notes and references

The 8085 and 8088 processors both use a 4-step bus cycle for instruction fetching. For other reads and writes, the 8085's bus cycle has three steps compared to four for the 8088. Thus, the 8085 and 8088 bus cycles are similar but not an exact match. ↩
The 8088 has separate instructions to read or write an I/O device. From the bus perspective, there's no difference between an I/O operation and a memory operation except that a pin on the chip indicates if the operation is for memory or I/O.

The 8088 supports I/O operations for historical reasons, going back through the 8086, 8080, 8008, and the Datapoint 2200 system. In contrast, many other contemporary processors such as the 6502 used memory-mapped I/O, using standard memory accesses for I/O devices.

The 8086 has a pin M/IO that is high for a memory access and low for an I/O access. External hardware uses this pin to determine how to handle the request. Confusingly, the pin's function is inverted on the 8088, providing IO/M. One motivation behind the 8088's 8-bit bus was to allow reuse of peripherals from the earlier 8-bit 8085 processor. Thus, the pin's function was inverted so it matched the 8085. (The pin is only available when the 8086/8088 is used in "minimum mode"; "maximum mode" remaps some of the pins, making the system more complicated but providing more control.) ↩
I've made the timing diagram somewhat idealized so actions line up with the clock. In the real datasheet, all the signals are skewed by various amounts so the timing is more complicated. See the datasheet for pages of timing constraints on exactly when signals can change. ↩
For more information on the implementation of the address adder, see my previous blog post. ↩
The POP operation is an example of how the address adder updates a memory pointer. In this case, the stack address is moved from the Stack Pointer to the IND register in order to perform the memory read. As part of the read operation, the IND register is incremented by 2. The address is then moved from the IND register to the Stack Pointer. Thus, the address adder not only performs the segment arithmetic, but also computes the new value for the SP register.

Note that the increment/decrement of the IND register happens after the memory operation. For stack operations, the SP must be decremented before a PUSH and incremented after a POP. The adder cannot perform a predecrement, so the PUSH instruction uses the ALU (Arithmetic/Logic Unit) to perform the decrement. ↩
During the CORR micro-instruction, the Bus Interface Unit performs special TS and T0 states. Note that these states don't have any external effect, so they are invisible outside the processor. ↩
The tradeoff with memory boards was that slower RAM chips were cheaper. The better RAM boards advertised "no wait states", but cheaper boards would add one or more wait states to every access, reducing performance. ↩
Only the second half of the TS state has an effect on the Bus Interface Unit, so TS is not a full state like the other states. Specifically, a delayed TS signal is taken from the first half of the T0 flip-flop, and this signal is used to control various actions in the Bus Interface Unit. (Alternatively, you could think of this as an early T0 state.) This is why there isn't a separate flip-flop for the TS state. I suspect this is due to timing issues; by the time the TS state is generated by the logic, there isn't enough time to do anything with the state in that half clock cycle, due to propagation delays. ↩
There is a bit more circuitry for the T1 state for a HALT. Specifically, there is a flip-flop that is set on this signal. On the next cycle, this flip-flop both blocks the generation of another T1 state and blocks the previous T1 state from progressing to T2. In other words, this flip-flop makes sure the special T1 lasts for one cycle. However, a HOLD state resets this flip-flop. That allows another special T1 to be generated when the HOLD ends. ↩
The trickiest part of this circuit is using T1-pre to start a (short) cycle for HALT. The way it works is that the T1-pre signal only makes a difference if there isn't a bus cycle already active. The only way to get an "unexpected" T1-pre signal is if the state machine generates it for the first cycle of a HALT. Thus, the HALT triggers T1-pre and thus the bus-active signal. You might wonder why the bus-active uses this roundabout technique rather than getting triggered directly by HALT. The motivation is that the special T1 state for HALT requires the AND of three signals to ensure that the state is generated once for the HALT rather than continuously, but happens again after a HOLD, and waits until the current bus cycle is done. Instead of duplicating that AND gate, the circuit uses T1-pre which incorporates that logic. (This took me a long time to figure out.) ↩
The 8088 has a 16-bit bus, compared to the 8088's 8-bit bus. Thus, a 16-bit bus operation on the 8088 will always require two 8-bit operations, while the 8086 can usually perform this operation in a single step. However, a 16-bit bus operation on the 8086 will still need to be broken into two 8-bit operations if the address is unaligned (i.e. odd). ↩

[#] Wed May 29 2024 07:16:53 UTC from rss <>

Subject: Inside a vintage aerospace navigation computer of uncertain purpose

I recently obtained an aerospace computer from the early 1970s, apparently part of a navigation system. Aerospace computers are an interesting but mostly neglected area of computer hardware, so I'm always delighted to examine one up close. In an era when most computers were large mainframes, aerospace computers packed dense electronics into a small package, using technologies such as surface-mounted components and multi-layer printed circuit boards, technologies that wouldn't reach the mainstream for another decade. This blog post examines the circuitry and components inside this computer, including an unusual electromechanical display. Although I was unable to determine who manufactured this system or even its exact function, this system illustrates how hundreds of integrated circuits and a core memory stack can be crammed into a compact package.

The navigation computer, showing the front panel with the display and keyboard, with the electronics unit behind it. Click this image (or any other) for a larger version.

The keyboard

The device has a simple numeric keyboard with a few unexpected features. The numeric keypad can also be used for direction entry, as four of the keys have N, S, E, and W on them. The keys are large, roughly the size of the Apollo spacecraft's DSKY buttons. My theory is that these buttons are designed for operation with gloves, perhaps in a fighter plane where the pilot wears a pressure suit. The buttons are hinged at the top, so they don't push straight in, but pivot when pressed.

Numeric keypads typically use one of two layouts: a telephone-style keypad has the digits 123 at the top, while a calculator-style keypad has the digits 789 at the top. Interestingly, this device uses a calculator layout, while most aviation devices have a telephone layout. The Apollo DSKY also used a calculator layout, which could be a hint at a NASA connection for this device.

Above the keyboard are four codes for self-test: N4576, E9384, S9021, and W4830. Entering these codes on the keyboard presumably triggered the appropriate test of the system when the switch is in test mode.

The display

The computer's display is simple, showing a latitude and longitude. Each value has one decimal position, providing 0.1° of accuracy. The latitude and longitude are prefixed with a compass direction: North/South for latitude and East/West for longitude.

The front panel of the navigation computer, with a display and keyboard.

The display is constructed from an unusual type of electromechanical indicator, with an indicator module for each digit. Each digit position has a rotating wheel with 11 positions (ten digits and a blank). When the indicator module for a position is energized, the wheel spins to the specified position, showing the selected digit. The two leftmost indicators are slightly different as they show a compass direction instead of a digit: N, S, E, or W. Moreover, the direction indicators can also show the compass direction with a diagonal slash through it, as seen above. Perhaps the slashed direction indicates a problem with the value.

The diagram below shows how a digit indicator operates. Each digit position has an electromagnet with a wire to energize it. The dial wheel has an attached permanent magnet (indicated by N and S). Energizing one of the electromagnets causes the dial to spin to that position, aligning the permanent magnet on the dial with the electromagnet. This mechanism forms a reliable indicator with just one moving part. The displayed digit is clearer than a seven-segment display since the digit uses a real font rather than being created from segments.

A diagram illustrating the magnetic indicator construction. From Patent 3201785. The patent describes a different indicator but the construction is similar.

Looking at the back of the keyboard/display unit shows the wiring of the display indicators. Each indicator has a common connection and ten wires to energize one of the electromagnets.1 The electromagnets are connected in a matrix, with all the "1" wires connected, the "2" wires connected, and so forth. To rotate an indicator to a particular digit, a common wire and an electromagnet wire are energized. For instance, powering the common wire of the second indicator and the "5" electromagnetic wire causes the second indicator to rotate to the "5" position. The wiring has a three-dimensional structure with ten bare wires running between the boards, one for each digit value. A yellow wire hangs off each bare wire, linking it to the connector on the left. Each indicator has ten diodes on a circuit board to block "sneak" paths that would energize unselected electromagnets.

The back of the keyboard/display unit. The keyboard buttons are at the back of this photo, while the display modules are at the front.

This matrix circuit reduces the amount of wiring required: although there are 100 electromagnets in total, just 20 wires are sufficient to control them. The driver circuitry, however, is a bit more complex as it must scan through the ten digit positions, activating the right pair of driver wires at the right time. Some of the logic circuitry described below must implement this scanning, as well as the driver circuitry to energize the indicators.

The display and keyboard have many similarities to the Delco Carousel Inertial Navigation System (INS) shown below. (The Delco Carousel was used in many military and civilian aircraft, from the C-141 cargo plane to the Boeing 747 passenger plane.) Both devices have two digital displays, one for latitude North/South and one for longitude East/West. Also note the numeric keypads with four keys assigned to the four compass directions. The controls of the Carousel INS system are considerably more complicated, though. The Carousel has a knob position "TK/GS" (track/ground speed), which may correspond to the "T/G" position on my device.

Control unit for the Delco Carousel inertial navigation system. From Smithsonian collection, gift of Delphi Electronics & Safety.

Note that the display on my unit has just four digits of accuracy, with one digit after the decimal point. A tenth of a degree would provide an accuracy of about ±7 miles, which is low for a navigation device. In comparison, the Delco Carousel has six digits of accuracy (± 100 feet perhaps). This suggests that the device does not provide INS navigation, but some other guidance with lower accuracy.

Packaging the electronics

The unit contains 14 circuit boards, crammed with TTL integrated circuits, along with a core memory stack. The photo below shows how circuit boards surround the core memory stack. The mechanical design of the unit is advanced, allowing the boards to be opened up like a book. This provides compact packaging while allowing access to the boards.

The electronics unit can be disassembled and folds open like a book.

The circuit boards are four-layer printed circuit boards, more advanced than the common two-layer boards of the time. The boards use a mixture of surface-mounted and through-hole components. The flat-pack ICs and the tiny round transistors are surface mounted, which was rare at the time. On the other hand, the resistors, capacitors, diodes, and larger transistors use standard through-hole components. At the time, most electronics used through-hole components, although aerospace systems often used surface-mounted components for higher density. It wasn't until the late 1980s that surface-mount technology became commonplace.

The boards are mounted in solid metal frames, providing both structural integrity and heat conduction for cooling. Most of the frames hold two boards, mounted back-to-back for higher density.

The logic boards

Four of the circuit boards are logic boards, packed with flat-pack integrated circuits. The board below holds 55 integrated circuits, showing the high density that is possible with flat packs.

A board filled with flat-pack logic ICs.

The logic ICs are Signetics 400-series chips, an early type of TTL (Transistor-Transistor Logic) chip. Just three types of these ICs are used: SE440J "Dual exclusive OR" (really AND-OR-INVERT but XOR if provided with particular inputs), SE455J "Dual 4-input buffer/driver" (4-input NAND or NOR gates depending on polarity), and SE480J "Quad 2-input NAND/NOR". These integrated circuits cost $15.45 each in 1966 (about $150 each in current dollars).2

The schematic below shows the circuit that implements AND-OR-INVERT (or exclusive or) in the SE440J. The multiple-emitter transistors on the inputs may appear unusual, but this is the standard way to implement TTL gates. It is important to note that this chip only contains 12 transistors, so the density is low. (Since the chip contains two of these gates, this circuit is duplicated.) In the mid-1960s, integrated circuits only contained a few transistors—the Apollo Guidance Computer's ICs had just 6 transistors—but by the time this unit was built in the early 1970s, some chips had thousands of transistors, tracking Moore's Law. Thus, this unit both illustrates how aviation computers could be built from simple integrated circuits and how the dramatic improvements in IC technology rapidly obsoleted these computers.

Schematic of the SE440J integrated circuit. From datasheet.

The Signetics 400-series seems to have been obscure and short-lived, probably killed off by the wild success of 7400-series TTL chips. I was able to find only a few announcements and datasheets for these chips. The only users of these chips that I could find were NASA projects from the late 1960s.3 Signetics 400-series chips were used in the Mariner Mars and Venus probes, in the Data Automation Subsystem (DAS) (link, link). The Voyager Mars probes also used them. The SE455J gates were also used to interface the Apollo Guidance Computer to a core-rope simulator. JPL used the SE455J in a core memory system. NASA used the SE455J, SE480J, and other Signetics chips in its design for the MICROMIN computer. None of these systems appear to be related to the navigation system, but they illustrate that NASA was using these specific Signetics chips at the time in multiple designs.

The chips are labeled "CDC", raising the possibility that these chips were built by Control Data Corporation (CDC) under license from Signetics. The Aerospace Division of CDC was active at the time, building various compact computer systems. For instance, the CDC 480 computer (1976) was a 16-bit computer based on the Am2900 bit-slice chip. Also known as the AN/AYK-14, this system was used on numerous aircraft including the F-18. An earlier CDC aerospace computer is the AN/AWG-9 Airborne Missile Control System (1965), a 24-bit computer in a compact 1.1 cubit foot package. Used on the F-14 fighter plane, this computer guided the Phoenix air-to-air missile. Based on CDC's activity in aerospace computers at the time, the mystery computer could be a CDC system, although this hypothesis is based solely on integrated circuits labeled "CDC".

The CDC AN/AYK-14 computer with circuit boards. This is an example of an aerospace computer built by CDC slightly later than the mystery computer. From a 1983 brochure.

The photo below shows another logic board. This one has numerous red and white wires attached, linking it to the rest of the system. Curiously, this board has a single transistor, with two associated resistors, in the middle of the board.

Another logic board, with a similar grid of flat-pack integrated circuits.

Analog boards

The computer contains not only logic boards but also boards full of analog circuitry to interface with the core memory, keyboard, and display. The board below contains 17 of the logic ICs seen earlier. However, it also uses many resistors, capacitors (red cylinders), transistors (white circles), inductors (white banded cylinders), and glass diodes. The board also has some analog integrated circuits. In particular, it has three TI SN52709 op-amps, the smaller 10-pin packages. The board also contains some integrated circuits that I couldn't identify: UT1000, UT1027, UD4001, and D245F. The SM 60 ICs in white packages have a logo that I don't recognize. The op-amps could function as sense amplifiers for the core memory, or this board could provide other analog interfacing.

A board with some analog integrated circuits.

The board has multiple gray four-pin packages labeled "926D". Based on the + and - markings, these packages are probably bridge rectifiers, maybe providing power for the circuits. Many of the other boards have these rectifiers. The analog boards also contain a few Halex flat-pack devices labeled "HALEX 101205 727". Hanlex manufactured thin-film resistors in flat packs, so these are probably resistor networks. NASA used Halex resistor networks in some devices (link).4

The analog board shown below sits next to the core memory stack. It uses a different set of flat-pack components: Signetics C8930G and PL 98321. Unfortunately, I could not identify these ICs. This board, unlike the previous boards, has a copper ground plane in the second layer of the circuit board; this layer is visible in the photo as the copper-colored background occupying most of the board.

Another analog board in the aviation computer.

Core memory

The unit is built around a core memory stack, as was common in the era before semiconductor memory took over. Magnetic core memory consists of a grid of tiny ferrite cores with wires threaded through them, forming a core plane. Typically, a core memory unit consists of multiple planes, one for each bit in the word, stacked to form a three-dimensional block of memory.

The photo below shows a closeup of the stack. It appears to have 20 planes, suggesting a 20-bit processor. Soldered wires connect the planes together to provide continuous wiring through the stack. The soldering on these wires looks somewhat haphazard, suggesting that this was not a production unit.

A closeup of the core memory stack. Brightly colored wires connect the module to the rest of the system. Small wires connect the layers together.

The photo below shows the other side of the core memory stack, with similar wiring between the planes. At the right are a few layers of a different type, connected with 26 wires. The tape measure shows that the core memory stack is compact, about 6 cm on a side (2¼").

Measurement of the core memory stack.

Some of the boards are drivers for the core memory stack. The board below has 48 small round transistors, colored either blue or red. Note the green, white, and yellow wires in the lower right, mostly hidden under the brown ground ribbon. These wires are connected to the core memory stack.

A circuit board with many small transistors.

The board below also has numerous wires to the core stack, underneath the brown ground ribbon, so it is presumably another driver board. This board has some round driver transistors with yellow dots. Curiously, in the upper left there are a few circuit board pads where transistors could be mounted but are missing. Perhaps with the additional components the board would support a system with more of something: a larger keyboard? more memory?

A board with driver transistors.

Looking at the back of the unit, you can see the display indicator wiring at the top and a circuit board at the bottom. This board contains 20 transistors in metal cans, specifically Motorola 2N3736 NPN transistors. The core memory stack has 20 planes, matching the 20 transistors on this board, so the board probably implements the core memory "inhibit drivers", controlling the bit written to each plane. The board also has numerous tiny surface-mount transistors in white, red, and black packages. Close examination shows a few thin green "bodge" wires on this board, indicating that rework was performed on the board to fix a circuit problem, another piece of evidence that this unit is a prototype.

A view of the computer from the back, showing the display wiring and a circuit board.

The core memory stack is enclosed by two sheet metal boxes, which I removed for the photos. The stack also has two flexible ground planes attached to it. The designers clearly wanted to ensure that the memory was well shielded, to a degree that I haven't seen in other systems.

Conclusions

Despite my research, this aerospace computer remains a mystery. I was unable to identify who manufactured it or even its exact function. One hypothesis is a NASA connection since NASA was extensively using these Signetics chips at the time. Moreover, this computer was obtained in the Houston area. Another hypothesis, based on the "CDC" label on the chips, is that this computer was built by Control Data's Aerospace Division. If you have any leads on this mysterious aviation computer, please contact me.

This system may have been a prototype. It has no part numbers, manufacturer name, or identifying plate.5 Moreover, the soldering on the core memory stack doesn't seem to be flight quality. Finally, the boards don't have conformal coating, which is typically used for spaceflight systems. However, the mechanical design looks advanced for a prototype, with dense boards that fold together like a book.

This unit clearly has a navigation role, but seems to be too inaccurate for an inertial navigation system (INS). It contains many integrated circuits, but not enough to form a full computer. I hypothesize that this unit contains the circuitry to drive the core memory and the display, and handle keyboard input. Looking at the underside of the unit (below), there are three connectors. I suspect these connectors were plugged into a larger box that held the computer itself.

A view of the underside of the electronics unit with the core memory wrapped in sheet metal.

The date codes on the integrated circuits range from 1966 to 1973, so the computer was probably manufactured in 1973. The seven-year range for date codes is a bit surprising, since integrated circuit technology changed a lot during these years. I suspect that the Signetics 400-series ICs had older date codes because this line didn't catch on so there was a lot of old stock rather than newly-manufactured parts. I also suspect that this system was designed around 1969, based on the multiple NASA systems using these chips then, suggesting that the design and manufacturing of this unit was a multi-year project.

Despite the lingering mysteries of this device, it provides an interesting example of aerospace computers at the beginning of the 1970s. Even though integrated circuits were primitive at the time, with just a few transistors per chip, aerospace computers used these chips and high-density packaging to build computers that were compact, reliable, and low power. These miniature computers controlled aircraft, missiles, and spacecraft, worlds away from the room-filling mainframes that attracted most of the attention.

Thanks to Usagi Electric for providing the aerospace computer. Eric Schlaepfer and Marc Verdiell helped with the analysis. Thanks to Don Straney for his research and comments. Various commenters on Reddit and Twitter provided suggestions. Follow me on Twitter @kenshirriff or RSS for updates. I'm also on Mastodon as oldbytes.space@kenshirriff.

Notes and references

The indicators have a blank position, so there are 11 electromagnets. However, only the ten electromagnets associated with digits are used in the device. The N/S/E/W indicators have a square box in one of the positions, which probably is not used. ↩
Signetics had multiple temperature ranges for the 400-series low-power ICs. The RE prefix indicated ultra high reliability aerospace components rated for a temperature range of -55°C to +125°C. The SE prefix on the chips in this unit indicated military airborne chips with the same temperature range. A NE or ST prefix indicated military prototype or industrial chips with a smaller temperature range (0°C to +70°C). A SP prefix indicated the commercial temperature rating, from +15°C to +55°C. A J suffix indicated a flat pack and an A suffix indicated a dual in-line pack (DIP). ↩
NASA computers are the only documented systems that I could find that used these Signetics chips. One possible conclusion is that NASA was the only organization to use these chips. However, it is likely that other companies used these chips but didn't document them as thoroughly as NASA. That is, detailed circuitry for military aerospace computers is unlikely to be on the Internet. ↩
Halex also made hybrid microcircuits, such as flip-flops, so these packages could be more complex than resistor networks. However, I think a resistor network is more likely. ↩
One of the circuit boards had the number "45333000" on it, along with a symbol like "+I-", as shown below.

Closeup of a circuit board showing a number, maybe identifying the board.

One board also had a mysterious symbol that resembles "mw". I couldn't match these symbols to any manufacturers, and it is unclear if they are logos, fiducials, or other symbols.

Closeup of a circuit board showing the "mw" mark.

↩

[#] Sun Jun 23 2024 08:59:00 UTC from rss <>

Subject: Inside the tiny chip that powers Montreal subway tickets

To use the Montreal subway (the Métro), you tap a paper ticket against the turnstile and it opens. The ticket works through a system called NFC, but what's happening internally? How does the ticket work without a battery? How does it communicate with the turnstile? And how can it be so cheap that you can throw the ticket away after one use? To answer these questions, I opened up a ticket and examined the tiny chip inside.

The image below shows the chip inside the ticket, highly magnified. The four golden squares in the corner are the connections to the antenna. The tan-colored lines are the metal wiring layer on top of the chip; the thickest lines wire the antenna to other parts of the chip. The darker region that takes up the majority of the chip is the chip's digital logic. To the left is the analog circuitry that handles the signal from the antenna.

The MIFARE Ultralight die under the microscope. (Click this image (or any other) for a larger view.

The chip uses NFC (Near-Field Communication). The idea behind NFC is that a reader (i.e. the turnstile) and an NFC tag (i.e. the ticket) communicate over a short distance through magnetic fields, allowing them to exchange data. The reader generates a magnetic field that both powers the tag and sends data to the tag. Both the reader and the tag have coil-like antennas so the reader's magnetic field can be picked up by the tag.1 When you tap your ticket on the turnstile, the NFC communication happens in 35 milliseconds, faster than an eyeblink. The data provided by the NFC tag shows that you have a valid ticket and then you can enter the subway.

The photo below shows the subway ticket, made of printed paper.2 At the right, the ticket appears to have golden smart-card contacts, like a credit card with an EMV chip. However, those contacts are completely fake, just printed onto the card with ink, and there is no chip there. Presumably, the makers thought that making the card look like a smart card would help people understand it. The card actually uses an entirely different technology.

A Montreal subway card. This card is for occasional use and is disposable. Regular travel uses a rigid plastic card containing a different chip.

Although the subway card is paper on the outside, its core is a thin plastic sheet, shown below. The sheet has a coiled antenna made from a layer of metal foil. If you look closely, you can see the tiny NFC chip in the lower left, a black speck connected to two sides of the antenna wire.3 The diagonal metal stripe in the upper left makes the antenna into a loop; topologically, a spiral antenna won't work on a 2-D sheet, so the diagonal bridge completes the circuit.

The antenna and chip inside the subway card.

I want to emphasize the absurdly small size of the chip: 570 µm × 485 µm. The photo below shows that it is about the size of a grain of salt. The chip is also extremely thin—75 µm or 120 µm—so you can't even feel the chip inside the ticket.

The chip next to grains of salt. I composited two images, one illuminated from above to show the die and one illuminated from below to show the salt.

Functions of the chip

There are many different types of NFC chips with varying levels of functionality. 4 This one is called the MIFARE Ultralight EV1,5 a low-cost chip designed for one-time ticketing applications. The basic function of the Ultralight chip is simple: providing a block of data to the reader. The chip holds its data in a small EEPROM; this chip has 48 bytes of user memory, while another variant has 108 bytes of user memory.

The Ultralight chip lacks the cryptography support found in more advanced chips. The Ultralight isn't much more secure than a printed ticket with a QR code or barcode, like you'd download for a show. It's up to the reader to validate the data and make sure the same ticket isn't being used multiple times.6

The Ultralight chip has a few features beyond a printed ticket, though. The chips are manufactured with a unique 7-byte identification code (UID). Moreover, the UID is signed, ensuring that fake UIDs cannot be generated.7 The chip also supports password-protected memory access and locking of memory pages to prevent modification. Since the password is transmitted without encryption, the security is weak, but better than nothing.8

Another interesting feature of the chip is the one-way counter. The chip has three 24-bit counters that can be incremented but not decremented. The counters can be used to allow the ticket to be used a particular number of times, for instance.9

Photographing the chip

To photograph the chip, I went through several steps to remove the chip from the ticket and then strip the chip down to the bare silicon. First, to extract the plastic sheet with the chip and the antenna from the paper ticket, I simply soaked the ticket in water. This turned the paper into mush, which could be scraped off to reveal the plastic core. Next, I cut out a small square of plastic that included the chip and put it in boiling sulfuric acid for about 30 seconds. This removed the plastic and adhesive, leaving the silicon die. (I try to avoid boiling acids, but processing a tiny chip like this only required a few drops of sulfuric acid, minimizing the risk.)

The die was covered with a passivation layer to protect its surface, a sandwich of silicon nitride and PSG (phosphosilicate glass) 1.1 µm thick according to the datasheet. The chip's underlying circuitry was visible, but slightly hazy due to this layer. I removed the passivation layer by boiling the chip in phosphoric acid for a few minutes. The image below shows the chip after this step. The top metal layer is much more visible, although some of the metal was dissolved by the acid. The thick metal lines connect the four bond pads to various parts of the analog circuitry, while many thin vertical metal lines provide interconnections of the logic circuitry.

The die after treatment with phosphoric acid to remove the passivation layer. Click for a much larger version.

Next, I treated the die with several cycles of treatment with Armour Etch to dissolve the oxide layer and hydrochloric acid to dissolve the metal. I think the chip had three layers of metal wiring on top of the silicon. Unfortunately, my process doesn't remove the metal layers cleanly, but causes them to come off in chaotic tangles. Since I wasn't interested in tracing the circuitry layer-by-layer, this wasn't a significant problem.

With the metal layers and polysilicon removed, I was left with the bare silicon. At this point, the underlying structure of the chip is visible. The doped silicon regions show the transistors, although they are extremely small at this scale. The white rectangles are capacitors. The chip has capacitors for many reasons: producing the right resonant frequency with the antenna, filtering the power, and boosting the voltage with charge pumps.

The die after stripping it down to the silicon.

My biggest concern while processing this chip was to avoid losing it. With a chip this small, bumping the chip or even breathing on it can send the chip flying perhaps never to be seen again. Even trying to pick up the chip with tweezers is risky, since it can easily pop out and disappear. It's no fun examining the floor, inch by inch, trying to figure out if a speck is the lost chip or a bit of dirt. I found that the best way to move the chip between processing and a microscope slide was to put the chip in a few drops of water and move it with a pipette. Even so, there were a couple of times that I lost track of the chip and had to check some specks under the microscope to determine which was the chip and which were dirt.

Overview of the chip

The block diagram below shows the high-level structure of the chip. At the left, the antenna is connected to the RF interface, the analog circuitry that converts the high-frequency signals into digital data. This circuitry also extracts power from the antenna's signal to power the chip.

Block diagram of the MIFARE Ultralight chip, from the datasheet.

The majority of the chip contains digital logic to process the 18 different commands that it can receive from the reader. Some commands, such as Wake-up or Halt control the chip's state. Other commands, such as Read or Write provide access to the EEPROM storage. The specialized Read_Cnt and Incr_Cnt commands access the chip's counters.

The chip has an "intelligent anticollision function" that allows multiple cards to be read without conflict if they are presented to the reader simultaneously. If a conflict is detected, the reader uses a standard NFC algorithm to select the cards one at a time, based on their identification numbers. The anticollision algorithm uses four of the chip's commands.

Finally, the chip has an EEPROM to store its data. Unlike RAM, the EEPROM holds data even when unpowered; it is designed to hold data for 10 years. To store data in the EEPROM, it must be written with a higher voltage than the rest of the chip uses. The EEPROM interface circuit produces the necessary signals.

The diagram shows the chip with its functional blocks labeled. The majority of the die is occupied with digital logic; I'll explain below how it is implemented with standard-cell logic. At the top is the EEPROM, a square of storage cells. To the right of the EEPROM is a charge pump, a circuit to boost the voltage through switched capacitors. The EEPROM interface circuitry is between the EEPROM and the digital logic.

The die, stripped down to the silicon, with presumed functional blocks labeled.

The remainder of the chip contains analog circuitry that is harder to interpret, so my labels are somewhat speculative. The four bond pads are where the antenna is connected to the chip. There are four pads to support two parallel antennas if desired. The first die photo shows the metal wiring between the bond pads and the structures that I've labeled as RF transistors and RF diodes. The "RF transistors" in the upper left are large, oval-shaped structures. These may be the transistors that send data back to the reader by modifying the load. Alternatively, they could be Zener diodes to regulate the voltage powering the chip, since Zener diodes often have an oval shape. The "RF diodes" at the bottom may rectify the signal from the antenna, producing the power for the chip. The rectified signal is also demodulated and processed by the analog logic to extract the digital data sent from the reader.

Sending data from the tag to the reader: load modulation

You might expect the tag to send data back to the receiver by transmitting a signal through the antenna. However, transmitting a signal takes power and the tag doesn't have much power available, just the power that it extracts from the reader's signal. Instead, the tag uses a clever technique called load modulation to send data to the reader. The idea is that if the tag changes the load across the antenna, it will absorb more or less energy from the reader. The reader can detect this change as a small variation in voltage across its transmitting antenna. Thus, the tag can dynamically change its load to send data back to the reader. Even though the signal produced by load modulation is extremely weak (80 dB less than the transmitted signal), the reader can detect it and extract the data.

In more detail, the reader transmits at a carrier frequency of 13.56 MHz.10 To send data back, the tag switches its load on and off at 848 kHz (1/16 of the carrier frequency), producing a subcarrier on top of the reader's signal. To transmit bits, this load modulation is switched on or off to transmit 106 kilobits per second (1/8 of the modulation frequency). The reader, in turn, extracts the subcarrier with a filter to receive the data bits from the tag.

An NFC tag can apply a load that is either a resistor or a capacitor; a resistor absorbs the signal directly, while a capacitor changes the antenna's resonant frequency and thus the amount of signal transferred to the tag. The die contains many capacitors, but I didn't see any significant resistors, so I suspect that this chip uses a capacitor for the load.

The chip's manufacturing process

The image below shows an extreme closeup of the die. The red box surrounds a region of doped silicon, forming five MOS transistors in series. Each dark vertical line corresponds to the gate of one transistor so the width of this line corresponds to the feature size. I estimate that the chip's feature size is 180 nm. In comparison, the wavelength of visible light is 400-700 nm. Since the features are smaller than the wavelength of light, it's not surprising that image appears blurry.

A closeup of the die, pushing the limits of my microscope.

The 180 nm process was popular in the late 1990s. These features are very large, however, compared to recent chips with features that are a few nanometers across. At the time the MIFARE Ultralight EV1 chip was released (October 2012), the newest semiconductor manufacturing process was 22 nm, so the 180 nm process they used was old even then.

However, it makes sense that the chip would be manufactured with an older process for several reasons. First, much of the chip's area is occupied by analog circuitry and the four bond pads, so shrinking the digital logic won't reduce the overall size much. Moreover, a significantly smaller chip would be impractical to attach to the antenna; I expect even the current chip is a pain to mount. Finally, this chip is designed for the extremely low-cost (i.e. disposable) market, so the chip is manufactured as inexpensively as possible. With a more modern process, more chips would fit on a wafer, dropping the price, but manufacturing each wafer would be more expensive, so there is a tradeoff.

Standard-cell logic

The chip's digital circuitry is implemented with standard-cell logic, a common way of implementing digital logic. The idea behind standard-cell logic is to use automated tools to create the chip layout from a description of the desired logic. The process starts with a library of standard cells. Each cell is a standardized implementation of a simple circuit such as a NAND gate or a flip-flop. The cells are designed so they have a fixed height and can be arranged in rows. The cells are then connected by metal wiring on top of the cells to produce the desired circuitry. Although the resulting circuitry isn't as dense and efficient as a fully customized and optimized layout, standard cell logic is much faster (and thus cheaper) to design than a hand-tuned layout. Thus, standard-cell logic has been heavily used for integrated circuit design since the 1980s.

The photo below shows four rows of gates implemented with standard cell logic, The chip (like most modern chips) uses CMOS logic, with each logic gate built from two types of transistors: NMOS and PMOS. To simplify manufacturing, the NMOS and PMOS transistors are arranged in separate rows. Thus, each row of logic consists of a row of PMOS transistors on top and a row of NMOS transistors below, or vice versa. Due to the physics of semiconductors, the PMOS transistors are larger, which allows the transistor types to be distinguished in the image.

A closeup of the standard cell logic.

Looking at some of the cells and extrapolating, I estimate about 8000 gates in the logic section with about 45,000 transistors. One question is if the chip is implemented as a hardcoded state machine, or if it contains a processor (microcontroller). The transistor count is barely large enough to implement a simple microcontroller such as an 8051, but that wouldn't leave many transistors left over for other necessary circuitry. If a microcontroller were present, it would need software stored somewhere. Given the simplicity of the protocol and the relatively small number of transistors, my guess is that the chip is implemented in hardware (state machines and counters) rather than through a microcontroller.

The diagram below shows how a standard cell implements a 2-input NAND. (This cell is from the Intel 386, not the NFC chip, but the structures are similar.) The cell contains four transistors. The yellow region is the P-type silicon that forms two PMOS transistors; the transistor gates are where the polysilicon (red) crosses the yellow region. (The middle yellow region is the drain for both transistors; there is no discrete boundary between the transistors.) Likewise, the two NMOS transistors are at the bottom, where the polysilicon (red) crosses the active silicon (green). The blue lines indicate the metal wiring for the cell. The black circles are contacts, connections between the metal and the silicon or polysilicon. Finally, the well taps are the opposite type of silicon, connected to the underlying silicon well or substrate to keep it at the proper voltage.

A standard cell for NAND in the Intel 386.

EEPROM

The chip stores its data in an EEPROM, similar to flash memory. The chip provides 640 or 1312 bits of EEPROM, based on the part number; I believe both versions use the same EEPROM implementation, but the cheaper version limits the amount that can be used. I think the EEPROM is the matrix shown below, with row and column drive circuitry to the right and below. (The diagonal lines are accidental scratches while I was processing the chip.)

A closeup of the presumed EEPROM circuitry on the die.

In the photo, the EEPROM appears to be a 64×64 grid, 4K bits of storage rather than the advertised 1312 bits. There are several possible explanations. First, I could be miscounting the capacity (it is easy to be off by a factor of 2, depending on the cell structure). Second, the chip stores data that isn't reflected in the EEPROM memory map; for instance, the one-way counters and the UID signature are not included in the EEPROM storage count. Another possibility is that the extra EEPROM space holds code for a microcontroller (if the chip has one).

An EEPROM requires a relatively high voltage (10-20V) to force electrons into the storage cell for a bit. This voltage is generated by a charge pump circuit that switches capacitors at high frequency to boost the voltage. To the right of the EEPROM is a circuit with several large capacitors, presumably the charge pump.

Conclusions

It's remarkable that these NFC chips can be manufactured so cheaply that they are disposable. To keep the price down, the chips are sold by the wafer and then mounted in the tickets.11 You can buy an eight-inch silicon wafer with the chips for $9000 from Digikey. This may seem expensive until you realize that a single wafer provides an astonishing 100,587 chips, yielding a per-chip price of nine cents. According to the datasheet, a wafer has 103,682 potential good dies per wafer (PGDW). Some dies will be faulty, of course, so the wafer comes with a file telling you which dies are the good ones, 97% of them. (During the manufacturing of a typical chip, the faulty ones are marked with a spot of ink. But that won't work in this case since each die is much smaller than an ink spot.) If you need more chips, you can buy a 12" wafer for $19,000, providing 215,712 chips. A ticket manufacturer mounts each chip on an antenna sheet and then prints the ticket, adding a few cents to the cost of the ticket. The result is an inexpensive ticket that can be used once and discarded.

I'll leave you with one last die photo. In my first attempt at processing the chip, I treated it with Armour Etch. Although this failed to remove the passivation layer, it thinned it slightly, enough to generate some wild colors due to thin-film interference. I call this the "tie die".

The die after treatment with Armour Etch.

Follow me on Twitter @kenshirriff or RSS for more. I'm also on Mastodon as oldbytes.space@kenshirriff. If you're interested in this type of chip, a few years ago, I looked at two RFID race timing chips, the Monza R4 and Monza R6.

Notes and references

Because the card and the reader are positioned close together, the two antennas use "inductive coupling", coupled by magnetic fields rather than radio waves. That is, the two antennas act like transformer windings, transmitting the signal from the reader to the card. ↩
The Montreal subway uses multiple types of cards. In this blog post, I examine the Occasional card (L'Occasionnelle). This is a non-rechargeable card that works for a single trip or up to three days, and then is discarded. For long-term usage, Montreal uses the Opus card, which provides more security and implements the Calypso standard. An Opus card is plastic rather than paper, giving it a longer life. The Calypso standard is much more secure, using cryptography such as AES, DES, and ECC (spec) and provides much larger EEPROM storage. Thus, the transit system uses the Occasional card for cheap, disposable tickets and the Opus card for a long-term ticket, where spending a dollar or two on the physical card isn't an issue.

I haven't examined an Opus card, so I don't know what type of chip it uses or even who manufactures the chip. Many companies produce Calypso cards, for instance, the STMicroelectronics CD21 Calypso chip is based on an Arm core. ↩
If you look closely at the lower right corner of the NFC card, it has three positions that can hold a chip, with the chip in position #3. Presumably, this allows three different NFC chips to be mounted in one card, so one card could have three functions. The NFC protocol is designed to avoid collisions if multiple chips respond, so the three chips won't interfere with each other. ↩
You can easily examine NFC cards like this using your phone, with an app such as NFC Tools or NXP's Taginfo. Tapping a card will display the type of the card and allow the memory to be read (subject to security restrictions). It's entertaining to tap various NFC cards and see what type of chip they use; I found that hotels typically use the MIFARE Classic chip, more advanced than the MIFARE Ultralight chip in the subway ticket.

The NFC Tools app shows that this card is a MIFARE Ultralight EV1.

↩
The part number, as provided by the chip, is MF0UL1101DUx. "MF0UL" indicates the MIFARE Ultralight EV1, a chip in the Ultralight family manufactured by NXP. An "H" if present indicates 50 pF input capacitance, rather than 17 pF in the chip I examined, allowing a different antenna. Next, "1" indicates a chip with 384 bits of user memory, while "2" would indicate 1024 bits. This is followed by "101D", and then a code indicating the specific package: "U" indicates a wafer, while "A" indicates a plastic leadless module carrier (LCC). Other characters specify the wafer diameter and thickness. ↩
It is instructive to think about the security of a printed ticket for a concert with a barcode. You could print out a hundred copies of the ticket, but it will only get you into the concert once. (This assumes that the venue has a centralized database so they can keep track of which tickets have been scanned.) Most of the security is implemented in the backend system, not the ticket itself. The ticket numbers need to be unforgeable, either by generating random numbers or using cryptography. (If the tickets just have QR codes with the numbers 1 to 100, for instance, it would be trivial to make fake tickets.) Moreover, there is nothing to ensure that the person scanning the ticket is legitimate; someone malicious could scan your ticket in line, print out a copy, and get into the concert instead of you. The MIFARE Ultralight chip is similar to a paper ticket in many ways with only slightly more security. ↩
The UID signing is done with an ECC (elliptic-curve cryptography) algorithm. Note that the chip doesn't need any cryptographic support for this; the chip just holds the signature that was programmed during manufacturing. As far as the chip is concerned, it is just providing some stored bytes. ↩
The MIFARE Ultralight has enough security to work as a limited-use ticket, but more advanced applications such as reloadable stored-value cards require a chip that supports encryption such as the DESFire. This allows the market to be partitioned, with the inexpensive Ultralight supporting the low-end market, while the more costly DESFire is required for more advanced applications.

There are many types of MIFARE cards and it's hard to keep them straight, but the diagram below from NXP may help. The different families are arranged left to right: Ultralight, Classic, Plus, DESFire, and SmartMX. The Y dimension indicates the official security certification level. The Z dimension (front to back) shows the evolution within a family over time. I've added a red arrow to indicate the "Ultralight EV1" chip, the focus of this blog post. (Personally, if you need a three-dimensional diagram to explain your product line, the product line may be excessively complicated.)

The various MIFARE NFC types. Diagram from aMIFARE Plus Product Family.

↩
In more detail, a 3-byte counter can be incremented by a specified value until it reaches the all-1's state (0xFFFFFF), at which point it stops. If you wanted to allow, say, 5 uses of a ticket, you could initialize the counter to all-1's minus 5. Then the counter could be incremented 5 times before reaching the limit.

One complication is that the counters have an "anti-tearing" feature for additional security. The problem is that if you tear the card away from the reader in the middle of an update, there is a possibility for counters to be partially updated, yielding a bad result. The anti-tearing feature ensures that a counter will be atomically updated, avoiding a partial update. ↩
There are multiple NFC standards with differences in speed, protocol, and range, including NFC-A, NFC-B, NFC-C, NFC-F, and NFC-V. The MIFARE Ultralight cards use NFC-A, which is defined by the standard "ISO/IEC 14443 Type A". Annoyingly, each part of the standard costs $70. The NFC Forum Analog Technical Specification provides a lot of detail, though. ↩
Instead of a wafer, you can buy the chips on tape but it costs more than twice as much. ↩

[#] Sun Jul 07 2024 10:38:09 UTC from rss <>

Subject: Standard cells: Looking at individual gates in the Pentium processor

Intel released the powerful Pentium processor in 1993, a chip to "separate the really power-hungry folks from ordinary mortals." The original Pentium was followed by the Pentium Pro, the Pentium II, and others, spawning a long-running brand of high-performance processors, Intel's flagship line until the Core processors took over in 2006. The Pentium eventually became virtually synonymous with "PC" and even made it into pop culture.

Even though the Pentium is a complex chip with 3.3 million transistors, its transistors are visible under a microscope, unlike modern chips. By examining the chip, we can see the interesting circuits used for gates, flip-flops, and other circuits, including the use of an unusual technology called BiCMOS. In this article, I take a close look at the original Pentium chip1, showing how much of its circuitry was built out of structured rows of tiny transistors, a technique known as standard-cell design.

The die photo below shows the Pentium's fingernail-sized silicon die under a microscope. I removed the chip's four metal layers to show the underlying silicon, revealing the individual transistors, which are obscured in most die photos by the layers of metal. Standard-cell circuitry, indicated by red boxes, is recognizable because the circuitry is arranged in uniform columns of cells, giving it a characteristic striped appearance. In contrast, the chip's manually-optimized functional blocks are denser and more structured, giving them a darker appearance. Examples are the caches on the left, the datapaths in the middle, and the microcode ROMs on the right.

Die photo of the Intel Pentium processor with standard cells highlighted in red. The edges of the chip suffered some damage when I removed the metal layers. Click this image (or any other) for a larger version.

Standard-cell design

Early processors in the 1970s were usually designed by manually laying out every transistor individually, fitting transistors together like puzzle pieces to optimize their layout. While this was tedious, it resulted in a highly dense layout. Federico Faggin, designer of the popular Z80 processor, was almost done when he ran into a problem. The last few transistors wouldn't fit, so he had to erase three weeks of work and start over. The closeup of the resulting Z80 layout below shows that each transistor has a different, complex shape, optimized to pack the transistors as tightly as possible.2

A closeup of transistors in the Zilog Z80 processor (1976). This chip is NMOS, not CMOS, which provides more layout flexibility. The metal and polysilicon layers have been removed to expose the underlying silicon. The lighter stripes over active silicon indicate where the polysilicon gates were. I think this photo is from the Visual 6502 project but I'm not sure.

Because manual layout is slow, difficult, and error-prone, people developed automated approaches such as standard-cell.3 The idea behind standard-cell is to create a standard library of blocks (cells) to implement each type of gate, flip-flop, and other low-level component. To use a particular circuit, instead of arranging each transistor, you use the standard design from the library. Each cell has a fixed height but the width varies as needed, so the standard cells can be arranged in rows. The Pentium die photo below seven cells in a row. (The rectangular blobs are doped silicon while the long, thin vertical lines are polysilicon.) Compare the orderly arrangement of these transistors with the Z80 transistors above.

Some standard cell circuitry in the Pentium. I removed the metal to show the underlying silicon and polysilicon.

The photo below zooms out to show five rows of standard cells (the dark bands) and the wiring in between. Because CMOS circuitry uses two types of transistors (NMOS and PMOS), each standard-cell row appears as two closely-spaced bands: one of NMOS transistors and one of PMOS transistors. The space between rows is used as a "wiring channel" that holds the wiring between the cells. Power and ground for the circuitry run along the top and bottom of each row.

Some standard cells in the Pentium processor.

The fixed structure of standard cell design makes it suitable for automation, with the layout generated by "automatic place and route" software. The first step, placement, consists of determining an arrangement of cells that minimizes the distance between connected cells. Running long wires between cells wastes space on the die, since you end up with a lot of unnecessary metal wiring. But more importantly, long paths have higher resistance, slowing down the signals. Once the cells are placed in their positions, the "routing" step generates the wiring to connect the calls. Placement and routing are both difficult optimization problems that are NP-complete.

Intel started using automated place and route techniques for the 386 processor, since it was much faster than manual layout and dramatically reduced the number of errors. Placement was done with a program called Timberwolf, developed by a Berkeley grad student. As one member of the 386 team said, "If management had known that we were using a tool by some grad student as a key part of the methodology, they would never have let us use it." Intel developed custom software for routing, using an iterative heuristic approach. Standard-cell design is still used in current processors, but the software is much more advanced.

A brief overview of CMOS

Before looking at the standard cell circuits in detail, I'll give a quick overview of how CMOS circuits are implemented. Modern processors are built from CMOS circuitry, which uses two types of transistors: NMOS and PMOS. The diagram below shows how an NMOS transistor is constructed. The transistor can be considered a switch between the source and drain, controlled by the gate. The source and drain regions (green) consist of silicon doped with impurities to change its semiconductor properties, forming N+ silicon. The gate consists of a layer of polysilicon (red), separated from the silicon by a very thin insulating oxide layer. Whenever polysilicon crosses active silicon, a transistor is formed.

Diagram showing the structure of an NMOS transistor.

The NMOS and PMOS transistors are opposite in their construction and operation. A PMOS transistor swaps the N-type and P-type silicon, so it consists of P+ regions in a substrate of N silicon. In operation, an NMOS transistor turns on when the gate is high, while a PMOS transistor turns on when the gate is low.4 An NMOS transistor is best at pulling its output low, while a PMOS transistor is best at pulling its output high. In a CMOS circuit, the transistors work as a team, pulling the output high or low as needed; the "C" in CMOS indicates this "Complementary" approach. NMOS and PMOS transistors are not entirely symmetrical, however, due to the underlying semiconductor physics. Instead, PMOS transistors need to be larger than NMOS transistors, which helps to distinguish PMOS transistors from NMOS transistors on the die.

The layers of circuitry in the Pentium

The construction of the Pentium is more complicated than the diagram above, with four layers of metal wiring that connect the transistors.5 Starting at the surface of the silicon die, the Pentium's transistors are similar to the diagram, with regions of silicon doped to change their semiconductor properties. Polysilicon wiring is created on top of the silicon. The most important role of the polysilicon is that when it crosses doped silicon, a transistor is formed, with the polysilicon as the gate. However, polysilicon is also used as wiring over short distances.

Above the silicon, four layers of metal connect the components: multiple metal layers allow signals to crisscross the chip without running into each other. The metal layers are numbered M1 through M4, with M1 on the bottom. A few rules control the wiring: a metal layer can connect with the layer above or below through a tungsten plug called a "via". Only the bottom metal, M1, can connect to the silicon or polysilicon, through a "contact". The layers usually alternate between horizontal wiring and vertical wiring (at least locally). Thus, a signal from a transistor may travel through M1, bounce up to M2 and M3 to cross other signals, and then go back down to M1 to connect to another transistor. As you can see, automated place and route software has a complicated task, producing millions of complicated wiring paths as densely as possible.

The diagram below shows how the layers appear on the chip. (This photo shows one of the rare spots on the chip where all the layers are visible.) The M4 metal layer on top of the chip is the thickest, so it is mostly used for power, ground, and clock signals rather than data. An M4 ground wire covers the top of this photo. The next layer down is M3. In this part of the chip, M3 lines run vertically. (Due to optical effects, the vertical M3 lines may look like they are on top of M4, but they are below.) The horizontal M2 metal lines are lower and appear brown rather than golden, due to the oxide layers that cover them. The bottom metal layer is M1. The vertical M1 lines are thick in this part of the chip because they provide power to the circuitry.

The Pentium is constructed with four layers of metal. Because the chip has a three-dimensional structure, I used focus stacking to get a clearer image.

The silicon and polysilicon are mostly obscured in the above photo. By removing all the metal layers, I obtained the image below. This image shows the same region as the image above, but it is hard to see the correlation because the metal layers almost completely obscure the silicon. The orderly columns of transistors reveal the standard-cell design. The irregular dark regions are doped silicon, which forms the chip's transistors. The dark or shiny horizontal bands are polysilicon. I will explain below how these regions form gates and other circuits.

A closeup of the silicon and polysilicon.

Inverter

The fundamental CMOS gate is an inverter, shown in the schematic below. The inverter is built from one PMOS transistor (top) and one NMOS transistor (bottom). If the gate input is a "1", the bottom transistor turns on, pulling the output to ground (0). A "0" input turns on the top transistor, pulling the output high (1). Thus, this two-transistor circuit implements an inverter.10

Schematic diagram of a CMOS inverter.

The diagram below shows two views of how a standard-cell inverter appears on the Pentium die, with and without metal. The inverter consists of two transistors, just like the schematic above. The input is connected to the two polysilicon gates of the transistors. The metal output wire is connected to the two transistors (the left sides, specifically).

A standard-cell CMOS inverter in the Pentium.

In more detail, the image on the left includes the bottom (M1) metal layer, but I removed the other metal layers. Two thick metal lines at the top and bottom provide power and ground to the standard cells. The multiple dark circles are contacts between the M1 metal layer and the metal layer on top (M2), providing a path for power and ground that eventually reaches the top (M4) metal layer and then the chip's pins. (The power and ground wires are thick to provide sufficient current to the circuitry while minimizing voltage drops and noise.) The small, lighter circles are vias that connect the M1 metal layer to the underlying silicon or polysilicon. The input to the gate is provided from the M2 metal, which connects to the M1 layer at the indicated contact. The smaller black dots at the top and bottom of this metal strip are vias, connections to the underlying silicon.

For the image on the right, I removed all four metal layers, revealing the polysilicon and doped silicon. Recall that a transistor is constructed from regions of doped silicon with a stripe of polysilicon between the regions, forming the transistor's gate. The diagram shows the two transistors that form the inverter. When combined with the metal wiring, they form the inverter schematic shown earlier. The final feature is the "well tap". The PMOS transistors are constructed in a "well" of N-doped silicon. The well must be kept at a positive voltage, so periodic "taps" connect the well to the +3.3V supply. As mentioned earlier, the PMOS transistor is larger than the NMOS transistor, which allowed me to figure out the transistor types in the photo.

By the way, the chip is built with a 600 nm process, so the width of the polysilicon lines is approximately 600 nm. For comparison, the wavelength of visible light is 400 to 700 nm, with 600 nm corresponding to orange light. This explains why the microscope photos are somewhat fuzzy; the features are the size of the wavelength of light.6

NAND gate

Another common gate in the Pentium is the NAND gate. The schematic below shows a NAND gate with two PMOS transistors above and two NMOS transistors below. If both inputs are high, the two NMOS transistors turn on, pulling the output low. If either input is low, a PMOS transistor turns on, pulling the output high. (Recall that NMOS and PMOS are opposites: a high voltage turns an NMOS transistor on while a low voltage turns a PMOS transistor on.) Thus, the CMOS circuit below produces the desired output for the NAND function.

Schematic of a CMOS NAND gate.

The implementation of the gate as a standard cell, below, follows the schematic. The left photo shows the circuit with one layer of metal (M1). A thick metal line provides 3.3 volts to the gate; it has two contacts that provide power to the two PMOS transistors. The metal line for ground is similar, except only one NMOS transistor is grounded. The thinner metal in the middle has two contacts to get the transistor outputs and a via to connect the output to the M2 metal layer on top. Finally, two tiny bits of M1 metal connect the inputs from the M2 layer to the underlying polysilicon.

Implementation of a CMOS NAND gate as a standard cell.

The right photo shows the circuit with all metal removed, showing the polysilicon and silicon. Since a transistor is formed where a polysilicon line crosses doped silicon, the two polysilicon lines create four transistors. Polysilicon functions both as local wiring and as the transistor gates. In particular, the inputs can be connected at the top or bottom of the circuit (or both), depending on what works best for wiring the circuitry. Note that the transistors are squashed together so the silicon in the middle is part of two transistors. An important asymmetry is that the output is taken from the middle of the PMOS transistors, wiring them in parallel, while the output is taken from the right side of the NMOS transistors, wiring them in series.

Zooming out a bit, the photo below shows three NAND gates. Although the underlying standard cell is the same for each one, there are differences between the gates. At the top, horizontal wiring links the inputs to M2 through vias. The length of each polysilicon line depends on the position of the metal. Moreover, in the middle of each gate, the metal connection to the output is positioned differently. Finally, note that the power wiring shifts upward in the upper right corner; this is to make room for a larger cell to the right. The point is that the standard cells aren't simply copies of each other, but are adjusted in each case to put the inputs, outputs, and power in the right location. Also note that these standard cells are not isolated, but are squeezed together so the PMOS transistors are touching. This optimization slightly increases the density.

Three NAND gates in the Pentium.

OR-NAND gate

The standard cell library includes some complex gates. For instance, the gate below is a 5-input OR-NAND gate, computing ~((A+B+C+D)⋅E). In the NMOS circuit, transistors A through D are paralleled while E is in series. The PMOS circuit is the opposite, with A through D in series and E in parallel. To provide sufficient current, the PMOS circuit has two sets of transistors for A through D, so the PMOS block is much larger than the NMOS block.

The OR-NAND gate as it appears on the die. The left image shows the M1 metal layer while the right image shows the silicon and polysilicon.

Latch

One of the key building blocks of the Pentium's circuitry is the latch. The idea of the latch is to hold one bit, controlled by the clock signal. A latch is "transparent": the latch's input immediately appears on the output while the clock is high. But when the clock is low, the latch holds its previous value. The latch is implemented with a feedback loop that passes the latch's output back into the latch. The heart of this latch circuit is the multiplexer (mux), which selects either the previous output (when the clock is low) or the new input (when the clock is high). The inverters amplify the feedback signal so it doesn't decay in the loop. An inverter also amplifies the output so it can drive other circuitry.

The circuit for a latch.

The circuit for a multiplexer is interesting since it uses "pass transistors". That is, the transistors simply pass their input through to the output, rather than pulling a signal to power or ground as in a typical logic gate. The schematic shows how this works. First, suppose that the select line is low. This will turn on the two transistors connected to the first input, allowing its level to flow to the output. Meanwhile, both transistors connected to the second input will be turned off, blocking that signal. But if the select line is high, everything switches. Now, the two transistors connected to the second input turn on, passing its level to the output. Thus, the multiplexer selects the first input if the control signal is low, and the second input if the control signal is high.

A multiplexer and its implementation in CMOS.

The diagram below shows a multiplexer, part of a latch. On the left, an inverter feeds into one input of the multiplexer.7 On the right is the other input to the multiplexer. The output is taken from the middle, between the pairs of the transistors.

A multiplexer as it appears on the Pentium die.

Note that the multiplexer's circuit is opposite, in a way, to a logic gate. In a logic gate, you want either the NMOS transistor on or the PMOS transistor on, so the output is pulled low or high respectively. This is accomplished by giving the signals on the transistor gates the same polarity, so the same polysilicon line runs through both transistors. In a multiplexer, however, you want the corresponding PMOS and NMOS transistors to turn on at the same time, so they can pass the signal. This requires the signals on the transistor gates to have opposite polarity. One polysilicon line runs through the right PMOS transistor and the left NMOS transistor. The other polysilicon line runs through the left PMOS transistor and the right NMOS transistor, connected by metal wiring (not shown). The multiplexer includes an inverter to provide the necessary signal, but I cropped it out of the diagram below.

The flip-flop

The Pentium makes extensive use of flip-flops. A flip-flop is similar to a latch, except its clock input is edge-sensitive instead of level-sensitive. That is, the flip-flop "remembers" its input at the moment the clock goes from low to high, and provides that value as its output. This difference may seem unimportant, but it turns out to make the flip-flop more useful in counters, state machines, and other clocked circuits.

In the Pentium, a flip-flop is constructed from two latches: a primary latch and a secondary latch. The primary latch passes its value through while the clock is low and holds its value when the clock is high. The output of the primary latch is fed into the secondary latch, which has the opposite clock behavior. The result is that when the clock switches from low to high, the primary latch stops updating its output at the same time that the secondary starts passing this value through, providing the desired flip-flop behavior.

A standard-cell flip-flop.

The photo above shows a standard-cell flop-flop, with an intricate pattern of metal wiring connecting the various sub-components. There are a few variants; with minor logic changes, the flip-flop can have "set" or "reset" inputs, bypassing the clock to force the output to the desired state. (Set and reset functions are useful for initializing flip-flops to a desired value, for example when the processor starts up.)

The BiCMOS buffer

Although I've been discussing CMOS circuits so far, the Pentium was built with BiCMOS, a process that allows circuits to use bipolar transistors in addition to CMOS. By adding a few extra processing steps to the regular CMOS manufacturing process, bipolar (NPN and PNP) transistors can be created. The Pentium made extensive use of BiCMOS circuits since they reduced signal delays by up to 35%. Intel also used BiCMOS for the Pentium Pro, Pentium II, Pentium III, and Xeon processors (but not the Pentium MMX). However, as chip voltages dropped, the benefit from bipolar transistors dropped too and BiCMOS was eventually abandoned.

The schematic below shows a standard-cell BiCMOS buffer in the Pentium chip.8 This circuit is more complex than a CMOS buffer: it uses two inverters, an NPN pull-up transistor, an NMOS pull-down transistor, and a PMOS pull-up transistor.9

Reverse-engineered schematic of the BiCMOS buffer.

In the die images below, note the circular structure of the NPN transistor, very different from the linear structure of the NMOS and PMOS transistors and considerably larger. A sign of the buffer's high-current drive capacity is the output's thick metal wiring, much thicker than the typical signal wiring.

A BiCMOS buffer in the Pentium.

Conclusions

Standard-cell layout is extensively used in modern chips. Modern processors, with their nanometer-scale transistors, are much too small to study under a microscope. The Pentium, on the other hand, has features large enough that its circuits can be observed and reverse engineered. Of course, with 3.3 million transistors, the Pentium is too much for me to reverse engineer in depth, but I still find it interesting to study small-scale circuits and see how they were implemented. This post presented a small sample of the standard cells in the Pentium. The full standard-cell library is much larger, with dozens, if not hundreds, of different cells: many types of logic gates in a variety of sizes and drive strengths. But the fundamental design and layout principles are the same as the cells described here.

One unusual feature of the Pentium is its use of BiCMOS circuitry, which had a peak of popularity in the 1990s, right around the era of the Pentium. Although changing tradeoffs made BiCMOS impractical for digital circuitry, BiCMOS still has an important role in analog ICs, especially high-frequency applications. The Pentium in a sense is a time capsule with its use of BiCMOS.

I hope that you have enjoyed this look at some of the Pentium's circuits. I find it reassuring to see that even complex processors are made up of simple transistor circuits and you can observe and understand these circuits if you look closely.

For more on standard-cell circuits, I wrote about standard cells in an IBM chip and standard cells in the 386 (the 386 article has a lot of overlap with this one). Follow me on Twitter @kenshirriff or RSS for updates. I'm also on Mastodon occasionally as @kenshirriff@oldbytes.space.

Notes and references

In this blog post, I'm focusing on the "P54C" version of the original Pentium processor. Intel produced many different versions of the Pentium, and it can be hard to keep them straight. Part of the problem is that "Pentium" is a brand name, with multiple microarchitectures, lines, and products. At the high level, the Pentium (1993) was followed by the Pentium Pro (1995) Pentium II (1997), Pentium III (1999), Pentium 4 (2000), and so on. The original Pentium used the P5 microarchitecture, a superscalar microarchitecture that was advanced but still executed instruction in order like traditional microprocessors. The Pentium Pro was a major jump, implementing a microarchitecture called P6 that broke instructions into micro-operations and executed them out of order using dataflow techniques. The next microarchitecture version was NetBurst, first used with the Pentium 4. NetBurst provided a deep pipeline and introduced hyper-threading, but it was disappointingly slow and was replaced by the Core microarchitecture. The Core microarchitecture is based on the P6 and is Intel's current microarchitecture.

I'll focus now on the original Pentium, which went through several substantial revisions. The first Pentium product was the 80501 (codenamed P5), running at 60 or 66 MHz and using 5 volts. These chips were built with an 800 nm process and contained 3.1 million transistors.

The power consumption of these chips was disappointing, so Intel improved the chip, producing the 80502. These chips, codenamed P54C, used 3.3 volts and ran at 75-120 MHz. The chip's architecture remained essentially the same but support was added for multiprocessing, boosting the transistor count to 3.3 million. The P54C had a much more advanced clock circuit, allowing the external bust speed to stay low (50-66 MHz) while the internal clock speed—and thus performance—climbed to 100 MHz. The chips were built with a smaller 600 nm process with four layers of metal, compared to the previous three. Visually, the die of the P54C is almost the same as the P5, with the additional multiprocessing logic at the bottom and the clock circuitry at the top. For this article, I examined the P54C, but the standard cells should be similar in other versions.

Next, Intel moved to the 350 nm process, producing a smaller, faster Pentium chip, codenamed the P54CS; the die looks almost identical to the P54C (but smaller), with subtle changes to the bond pads. Another variant was designed for mobile use: the Pentium processor with "Voltage Reduction Technology" reduced power consumption by using a 2.9- or 3.1-volt supply for the core and a 3.3-volt supply to drive the I/O pins. These were built first with the 600 nm process (75-100 MHz) and then the 350 nm process (100-150 MHz).

The biggest change to the original Pentium was the Pentium MMX, with part number 80503 and codename P55C. This chip extended the x86 instruction set with 57 new instructions for vector processing. It was built on a 350 nm process before moving to 280 nm, and had 4.5 million transistors. More obscure variants of the original Pentium include the P54CQS, P54CS, P54LM, P24T, and Tillamook, but I won't get into them. ↩
Circuits that had a high degree of regularity, such as the arithmetic/logic unit (ALU) or register storage were typically constructed by manually laying out a block to implement the circuitry for one bit and then repeating the block as needed. Because a circuit was repeated 32 times for the 32-bit processor, the additional effort was worthwhile. ↩
An alternative layout technique is the gate array, which doesn't provide as much flexibility as a standard cell approach. In a gate array (sometimes called a master slice), the chip had a fixed array of transistors (and often resistors). The chip could be customized for a particular application by designing the metal layer to connect the transistors as needed. The density of the chip was usually poor, but gate arrays were much faster to design, so they were advantageous for applications that didn't need high density or produced a relatively small volume of chips. Moreover, manufacturing was much faster because the silicon wafers could be constructed in advance with the transistor array and warehoused. Putting the metal layer on top for a particular application could then be quick. Similar gate arrays used a fixed arrangement of logic gates or flip-flops, rather than transistors. Gate arrays date back to 1967. ↩
The behavior of MOS transistors is complicated, so the description above is simplified, just enough to understand digital circuits. In particular, MOS transistors don't simply switch between "on" and "off" but have states in between. This allows MOS transistors to be used in a wide variety of analog circuits. ↩
The earliest Pentiums had three layers of metal wiring, but Intel moved to a four-layer process with the P54C die, the version that I'm examining. ↩
To get this level of magnification with my microscope, I had to use an oil immersion lens. Instead of looking at the chip in air, as with a normal lens, I had to put a drop of special microscope oil on the chip. I carefully lower the lens until it dips into the oil (making sure I don't crash the lens into the chip). The purpose of the oil is that its index of refraction is almost the same as glass, much higher than air. This gives the lens a higher "numerical aperture", allowing the lens to resolve smaller details. ↩
For completeness, I'll mention that the inverter feeding the multiplexer inverter isn't exactly an inverter. Specifically, the inverter's two transistors are not tied together to produce an output. Instead, the inverter's NMOS transistor provides an input to the multiplexer's NMOS transistor and likewise, the PMOS transistor provides an input to the PMOS transistor. The omission of this connection does not affect the circuit's behavior, but it makes calling the circuit an inverter and a multiplexer a bit of an abstraction. ↩
Intel called this gate "BiNMOS" rather than "BiCMOS" because it uses a bipolar transistor and an NMOS transistor to drive the output, rather than two bipolar transistors. The Pentium's BiCMOS circuitry is described in a conference paper, showing a second NPN transistor to protect the first one. I don't see the second transistor on the die so the two transistors may be implemented in one silicon structure. Reference: R. F. Krick et al., “A 150 MHz 0.6 µm BiCMOS superscalar microprocessor,” IEEE Journal of Solid-State Circuits, vol. 29, no. 12, Dec. 1994, doi:10.1109/4.340418. ↩
The Pentium contains multiple types of BiCMOS standard cells, which I'll show in this footnote. The cell below is an inverter. It is similar to the BiCMOS buffer described earlier, except it lacks the first inverter in the circuit. To make room for the NPN transistor on the left, the PMOS transistors are shifted to the right. As a result, they don't line up with the PMOS transistors in other cells. This is a break from the traditional orderliness of standard cells.

A BiCMOS inverter with PMOS on the left and NMOS on the right. The input is at the bottom and the output is in the middle.

The BiCMOS inverter below is similar, except it uses two NPN transistors, providing more output drive. I removed the M1 metal layer to provide a better view of the transistors.

A BiCMOS inverter with two NPN transistors. The PMOS transistors are in the lower left and the NMOS transistors are in the lower right.

Another interesting BiCMOS circuit is the D flip-flop with enable and BiCMOS output, shown below. This is similar to the earlier flip-flop except it has an enable input, allowing it to either load a new value triggered by the clock, or to hold its earlier value. This allows the flip-flop to remember a value for more than one clock cycle. The additional functionality is implemented by another multiplexer, selecting either the old value or the new value. (This multiplexer is, in a way, one level higher than the multiplexer in each latch.) The transistor for the BiCMOS output is in the upper right, poking out from under the metal. (This circuit might be implemented as two independent cells, one for the flip-flop and one for the driver; I'm not sure.)

A D flip-flop in the Pentium.

↩
One puzzling inverter variant is used in a gate I'll call the "slow buffer". This buffer consists of two inverters, so it passes its input through to the output, buffered. The strange part is that the first inverter uses transistors with wide gates, which makes these transistors much weaker than regular transistors. As a result, the first inverter will be slow to switch states. My guess is that this circuit is used to delay signals, for example, to keep a signal aligned with another signal that is delayed by multiple logic gates.

The buffer consists of two inverters. The first inverter uses wide, weak transistors.

You might expect that larger transistors would be stronger, not weaker. The problem is that these transistors are larger in the wrong dimension. If you make the gate wider, the effect is similar to multiple transistors in parallel, providing more current. But if you make the gate longer (as in this case), the effect is similar to multiple transistors in series, so the resistances add and the total current is reduced. In most cases, transistors are constructed with the smallest gate length possible, which is determined by the manufacturing process, so the transistors here are unusual. This chip was manufactured with an 800 nm process, so the smallest gate length is approximately 800 nm. The gate width (the normal direction for variation) varies dramatically depending on the circuit, optimized to provide maximum performance. ↩

[#] Tue Jul 16 2024 09:43:53 UTC from rss <>

Subject: Inside an IBM/Motorola mainframe controller chip from 1981

In this article, I look inside a chip in the IBM 3274 Control Unit.1 But before I discuss the chip, I need to give some background on mainframes. (I didn't completely analyze the chip, so don't expect a nice narrative or solid conclusions.)

Die photo of the Motorola/IBM SC81150 chip. Click this image (or any other) for a larger version.

IBM's vintage mainframes were extremely underpowered compared to modern computers; a System/370 mainframe ran well under 1 million instructions per second, while a modern laptop executes billions of instructions per second. But these mainframes could support rooms full of users, while my 2017 laptop can barely handle one person.2 Mainframes achieved their high capacity by offloading much of the data entry overhead so the mainframe could focus on the "important" work. The mainframe received data directly into memory in bulk over high-speed I/O channels, without needing to handle character-by-character editing. For instance, a typical data entry terminal (a "3270") let the user update fields on the screen without involving the computer. When the user had filled out the screen, pressing the "Enter" key sent the entire data record to the mainframe at once. Thus, the mainframe didn't need to process every keystroke; it only dealt with complete records. (This is also why many modern keyboards have an "Enter" key.)

A room with IBM 3179 Color Display Stations, 1984. Note that these are terminals, not PCs. From 3270 Information Display System Introduction.

But that was just the beginning of the hierarchy of offloaded processing in a mainframe system. Terminals weren't attached directly to the mainframe. You could wire 16 terminals to a terminal multiplexer (such as the 3299). This would in turn be connected to a 3274 Control Unit that merged the terminal data and handled the network protocols. The Control Unit was connected to the mainframe's channel processor which handled I/O by moving data between memory and peripherals without slowing down the CPU. All these layers allowed the mainframe to focus on the important data processing while the layers underneath dealt with the details.3

An overview of the IBM 3270 Information Display System attachment. The yellow highlights indicate the 3274 Control Unit. From 3270 Information Display System: Introduction.

The 3274 Control Unit (highlighted above) is the source of the chip I examined. The purpose of the Control Unit "is to take care of all communication between the host system and your organization's display stations and printers". The diagram above shows how terminals were connected to a mainframe, with the 3274 Control Unit (indicated by arrows) in the middle. The 3274 was an all-purpose box, handling terminals, printers, modems, and encryption (if needed). It could communicate with the mainframe at up to 650,000 characters per second. The control unit below (above) is a boring beige box. The control panel is minimal since people normally didn't interact with the unit. On the back are coaxial connectors for the lines to the terminals, as well as connectors to interface with the computer and other peripherals.

An IBM 3274-41D Control Unit. From bitsavers.

The Keystone II board

In 1983, IBM announced new Control Unit models with twice the speed: these were the Model 41 and Model 61. These units were built around a board called Keystone II, shown below. The board is constructed with IBM's peculiar PCB style. The board is arranged as a grid of squares with the PCB traces too small to see unless you zoom in. Most of the decoupling capacitors are in IBM's thin, rectangular packages, although I see a few capacitors in more standard blue packages. IBM is almost a parallel universe with its unusual packaging for ICs and capacitors as well as the strange circuit board appearance.

The Keystone II board. The box is labeled Keystone II FCS [i.e. First Customer Shipment] July 23, 1982. Photo from bitsavers, originally from Bob Roberts.

Most of the chips on the board are IBM chips packaged in square aluminum cans, known as MST (Monolithic System Technology). The first line on each package is the IBM part number, which is usually undocumented. The empty socket can hold a ROS chip; ROS is Read-Only Store, known as ROM to people outside IBM. The Texas Instruments ICs in the upper right are easier to identify; the 74LS641 chips are octal bus transceivers, presumably connecting this board to the rest of the system. Similarly, the 561 5843 is a 74S240 octal bus driver while the 561 6647 chips are 74LS245 octal bus transceivers.

The memory chips on the left side of this board are interesting: each one consists of two "piggybacked" 16-kilobit DRAM chips. IBM's part number 8279251 corresponds to the Intel 4116 chip, originally made by Mostek. With 18 piggybacked chips, the board holds 64 kilobytes of parity-protected memory.

The photo below shows the Keystone II board mounted in the 3274 Control Unit. The board is in slot E towards the left and the purple Motorola IC is visible.

The Keystone II card in slot E of a 3274-41D Control Unit. Photo from bitsavers.

The Motorola/IBM chip

The board has a Motorola chip in a purple ceramic package; this is the chip that I examined. Popping off the golden lid reveals the silicon die underneath. The package has the part number "SC81150R", indicating a Motorola Special/Custom chip. This part number is also visible on the die, as shown below.

The corner of the die is marked with the SC81150 part number. Bond pads and bond wires are also visible.

While the outside of the IC is labeled "Motorola", there are no signs of Motorola internally. Instead, the die is marked "IBM" with the eight-striped logo. My guess is that IBM designed the chip and Motorola manufactured it.

The IBM logo on the die.

The diagram below shows the chip with some of the functional blocks identified. Around the outside are the bond pads and the bond wires that are connected to the chip's grid of pins. At the right is the 16×16 block of memory, along with its associated control, byte swap, and output circuitry. The yellowish-white lines are the metal layer on top of the chip that provides the chip's wiring. The thick metal lines distribute power and ground throughout the chip. Unlike modern chips, this chip only has a single metal layer, so power and ground distribution tends to get in the way of useful circuitry.

The die with some functional blocks identified.

The chip is centered around a 16-bit bus (yellow line) that connects many part of the chip. To write to the bus, a circuit pulls bus lines low. The bus lines are kept high by default by 16 pull-up transistors. This approach was fairly common in the NMOS era. However, performance is limited by the relatively weak pull-up current, making bus lines slow to go high due to R-C delays. For higher performance, some chips would precharge the bus high during one clock cycle and then pull lines low during the next cycle.

The two groups of I/O pins at the bottom are connected to the input buffer on the left and the output buffer on the right. The input buffer includes XOR circuits to compute the parity of each byte. Curiously, only 6 bits of the inputs are connected to the main bus, although other circuits use all 8 bits. The buffer also has a circuit to test for a zero value, but only using 5 of the bits.

I've put red boxes around the numerous PLAs, which can be identified by their grids of transistors. This chip has an unusually large number of PLAs. Eric Schlaepfer hypothesizes that the chip was designed on a prototype circuit board using commercial PAL chips for flexibility, and then they transferred the prototype to silicon, preserving the PLA structure. I didn't see any obvious structure to the PLAs; they all seemed to have wires going all over.

The miscellaneous logic scattered around the chip includes many latches and bus drivers; the latch circuit is similar to the memory cells. I didn't fully reverse-engineer this circuitry but I didn't see anything that looked particularly interesting, such as an ALU or counter. The circuitry near the PLAs could be latches as part of state machines, but I didn't investigate further.

I was hoping to find a recognizable processor inside the package, maybe a Motorola 6809 or 68000 processor. Instead, I found a complicated chip that doesn't appear to be a processor. It has a 16×16 memory block along with about 20 PLAs (Programmable Logic Arrays), a curiously large number. PLAs are commonly used in processors for decoding instructions, since they can match bit patterns. I couldn't find a datapatch in the chip; I expected to see the ALU and registers organized in a large but regular 8-bit or 16-bit block of circuitry. The chip doesn't have any ROM4 so there's no microcode on the chip. For these reasons, I think the chip is not a processor or microcontroller, but a specialized data-handling chip, maybe using the PLAs to interpret bits of a protocol.

The chip is built with NMOS technology, the same as the 6502 and 8086 for instance, rather than CMOS technology that is used in modern chips. I measured the transistor features and the chip appears to be built with a 3.5 µm process (not nm!), which Motorola also used for the 68000 processor (1979).

The memory buffer

The chip has a 16×16 memory buffer, which could be a register file or a FIFO buffer. One interesting feature is that the buffer is triple-ported, so it can handle two reads and one write at the same time. The buffer is implemented as a grid of cells, each storing one bit. Each row corresponds to a 16-bit word, while each column corresponds to one bit in a word. Horizontal control lines (made of polysilicon) select which word gets written or read, while vertical bit lines of metal transmit each bit of the word as it is written or read.

The microscope photo below shows two memory cells. These cells are repeated to create the entire memory buffer. The white vertical lines are metal wiring. The short segments are connections within a cell. The thicker vertical lines are power and ground. The thinner lines are the read and write bit lines. The silicon die itself is underneath the metal. The pinkish regions are active silicon, doped to make it conductive. The speckled golden lines are regions are polysilicon wires between the silicon and the metal. It has two roles: most importantly, when polysilicon crosses active silicon, it forms the gate of a transistor. But polysilicon is also used as wiring, important since this chip only has one layer of metal. The large, dark circles are contacts, connections between the metal layer and the silicon. Smaller square regions are contacts between silicon and polysilicon.

Two memory cells, side by side, as they appear under the microscope.

It was too difficult to interpret the circuits when they were obscured by the metal layer so I dissolved the metal layer and oxide with hydrochloric acid and Armour Etch respectively. The photo below shows the die with the metal removed; the greenish areas are remnants in areas where the metal was thick, mostly power and ground supplies. The dark regions in this image are regions of doped silicon. These are the active areas of the chip, showing the blocks of circuitry. There are also some thin lines of polysilicon wiring. The memory buffer is the large block on the right, just below the center.

The chip with the metal layer removed. Click to zoom in on the image.

Like most implementations of static RAM, each storage cell of the buffer is implemented with cross-coupled inverters, with the output of one inverter feeding into the input of the other. To write a new value to the cell, the new value simply overpowers the inverter output, forcing the cell to the new state. To support this, one of the inverters is designed to be weak, generating a smaller signal than a regular inverter. Most circuits that I've examined create the inverter by using a weak transistor, one with a longer gate. This chip, however, uses a circuit that I haven't seen before: an additional transistor, configured to limit the current from the inverter.

The schematic below shows one cell. Each cell uses ten transistors, so it is a "10T" cell. To support multiple reads and writes, each row of cells has three horizontal control signals: one to write to the word, and two to read. Each bit position has one vertical bit line to provide the write data and two vertical bit lines for the data that is read. Pass transistors connect the bit lines to the selected cells to perform a read or a write, allowing the data to flow in or out of the cell. The symbol that looks like an op-amp is a two-transistor NMOS buffer to amplify the signal when reading the cell.

Schematic of one memory cell.

With the metal layer removed, it is easier to see the underlying silicon circuitry and reverse-engineer it. The diagram below shows the silicon and polysilicon for one storage cell, corresponding to the schematic above. (Imagine vertical metal lines for power, ground, and the three bitlines.)

One memory cell with the metal layer removed. I etched the die a few seconds too long so some of the polysilicon is very thin or missing.

The output from the memory unit contains a byte swapper. A 16-bit word is generated with the left half from the read 1 output and the second half from the read 2 output, but the bytes can be swapped. This was probably used to read an aligned 16-bit word if it was unaligned in memory.

Parity circuits

In the lower right part of the chip are two parity circuits, each computing the parity of an 8-bit input. The parity of an input is computed by XORing the bits together through a tree of 2-input XOR gates. First, four gates process pairs of input bits. Next, two XOR gates combine the outputs of the first gates. Finally, an XOR gate combines the two previous outputs to generate the final parity.

The arrangement of the 14 XOR gates to compute parity of the two 8-bit values A and B.

The schematic below shows how an XOR gate is built from a NOR gate and an AND-NOR gate. If both inputs are 0, the first NOR gate forces the output to 0. If both inputs are 1, the AND gate forces the output to 0. Thus, the circuit computes XOR. Each labeled block above implements the XOR circuit below.

Schematic of an XOR gate.

Conclusion

My conclusion is that the processor for the Keystone II board is probably one of the other chips, one of the IBM metal-can MST packages, and this chip helps with data movement in some way. It would be possible to trace out the complete circuitry of the chip and determine exactly how it functions, but that is too time-consuming a project for this relatively obscure chip.

Follow me on Twitter @kenshirriff or RSS for more chip posts. I'm also on Mastodon occasionally as @kenshirriff@oldbytes.space. Thanks to Al Kossow for providing the chip and Dag Spicer for providing photos. Thanks to Eric Schlaepfer for discussion.

Notes and references

The 3274 Control Unit was replaced by the 3174 Establishment Controller, introduced in 1986. An "Establishment Controller" managed a cluster of peripherals or PCs connected to a host mainframe, essentially a box that provided a "kitchen-sink" of functionality including terminal support, local disk storage, Ethernet or token-ring networking, ASCII terminal support, encryption/decryption, and modem support. These units ranged from PC-sized boxes to mini-fridge-sized boxes, depending on how much functionality was required. ↩
I'm serious that my laptop can barely handle one person; my 2017 MacBook Air starts dropping characters if it has even a moderate load, and I have to start one-finger typing. You would think that a 1.8 GHz dual-core i5 processor could handle more than 2 characters per second. I don't know if there's something wrong with it, or if modern software just has too much overhead. Don't worry, I upgraded and do most of my work on a faster, more recent laptop. ↩
The IBM hardware model had the CPU focusing on the big picture, while the hierarchy of boxes underneath processed data, performed storage, handled printing, and so forth. In a sense, this paralleled the structure of offices in that era, where executives had assistants and secretaries to do the tedious work for them: typing, filing, and so forth. Nowadays, the computer hierarchy and the office hierarchy are both considerably flatter. Maybe there's a connection? ↩
A ROM and a PLA are similar in many ways. The general distinction is that a ROM activates one word (row) at a time, while a PLA can activate multiple rows at a time and combine the values, giving more flexibility. A ROM generally has a binary decoder to select the row. This decoder can be recognized by its binary structure: transistors alternating by 1's, by 2's, by 4's, and so forth. ↩

Go to page: 1 2 [3]

Log in using a user name and password

New user? Register now

Log in using OpenID

Log in using Google

Log in using Yahoo

Log in using AOL or AIM

Please wait...

Ken Shirriff's Blog

The differential

The cams

The synchro outputs

Examining the Mach section of the CADC

The position error correction servo loop

The equations

log static pressure

Impact pressure

log pressure ratio

Mach number

Conclusions

Notes and references

The motor

Tachometer/generator

The servo loop

Conclusions

Notes and references

The Arma Micro Computer

Circuits

Transfluxors

Arma's history and the path to the Micro Computer

After the Micro Computer

Conclusions

Notes and references

Prefetching and the architecture of the 8086 and 8088

Implementing the queue

The loader

Microcode and the prefetch queue

The queue registers

Notes and references

NAND gate

NOR gate

The chip's circuit

Some 7400-series history

Conclusions

Notes and references

The bus cycle

Address calculation

Address corrections

Prefetching

Wait states

Halts

Putting it all together: the state diagram

Conclusions

Notes and references

The keyboard

The display

Packaging the electronics

The logic boards

Analog boards

Core memory

Conclusions

Notes and references

Functions of the chip

Photographing the chip

Overview of the chip

Sending data from the tag to the reader: load modulation

The chip's manufacturing process

Standard-cell logic

EEPROM

Conclusions

Notes and references

Standard-cell design

A brief overview of CMOS

The layers of circuitry in the Pentium

Inverter

NAND gate

OR-NAND gate

Latch

The flip-flop

The BiCMOS buffer

Conclusions