[#] Sun Feb 11 2024 09:44:34 UTC from rss <>

Subject: Reverse-engineering an analog Bendix air data computer: part 4, the Mach section


In the 1950s, many fighter planes used the Bendix Central Air Data Computer (CADC) to compute airspeed, Mach number, and other "air data". The CADC is an analog computer, using tiny gears and specially-machined cams for its mathematics. In this article, part 4 of my series,1 I reverse engineer the Mach section of the CADC and explain its calculations. (In the photo below, the Mach section is the middle section of the CADC.)

The Bendix MG-1A Central Air Data Computer with the case removed, showing the compact gear mechanisms inside. Click this image (or any other) for a larger version.


Aircraft have determined airspeed from air pressure for over a century. A port in the side of the plane provides the static air pressure,2 the air pressure outside the aircraft. A pitot tube points forward and receives the "total" air pressure, a higher pressure due to the air forced into the tube by the speed of the airplane. The airspeed can be determined from the ratio of these two pressures, while the altitude can be determined from the static pressure.

But as you approach the speed of sound, the fluid dynamics of air change and the calculations become very complicated. With the development of supersonic fighter planes in the 1950s, simple mechanical instruments were no longer sufficient. Instead, an analog computer calculated the "air data" (airspeed, air density, Mach number, and so forth) from the pressure measurements. This computer then transmitted the air data electrically to the systems that needed it: instruments, weapons targeting, engine control, and so forth. Since the computer was centralized, the system was called a Central Air Data Computer or CADC, manufactured by Bendix and other companies.

A closeup of the numerous gears inside the CADC. Three differential gear mechanisms are visible.


Each value in the Bendix CADC is indicated by the rotational position of a shaft. Compact electric motors rotate the shafts, controlled by the pressure inputs. Gears, cams, and differentials perform computations, with the results indicated by more rotations. Devices called synchros convert the rotations to electrical outputs that are connected to other aircraft systems. The CADC is said to contain 46 synchros, 511 gears, 820 ball bearings, and a total of 2,781 major parts (but I haven't counted). These components are crammed into a compact cylinder: just 15 inches long and weighing 28.7 pounds.

The equations computed by the CADC are impressively complicated. For instance, one equation is:

\[~~~\frac{P_t}{P_s} = \frac{166.9215M^7}{( 7M^2-1)^{2.5}}\]

It seems incredible that these functions could be computed mechanically, but three techniques make this possible. The fundamental mechanism is the differential gear, which adds or subtracts values. Second, logarithms are used extensively, so multiplications and divisions are implemented by additions and subtractions performed by a differential, while square roots are calculated by gearing down by a factor of 2. Finally, specially-shaped cams implement functions: logarithm, exponential, and application-specific functions. By combining these mechanisms, complicated functions can be computed mechanically, as I will explain below.
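The log trick can be illustrated with a minimal Python sketch. This is only a model of the idea, not the CADC's actual scaling: the function names (`differential`, `gear`, `log_cam`, `exp_cam`) are hypothetical, natural logarithms are an assumption, and the constant factors that real gear ratios would introduce are ignored.

```python
import math

def differential(a, b):
    """Idealized differential gear: the output rotation is the sum of the inputs."""
    return a + b

def gear(rotation, ratio):
    """Idealized gear pair: scales a rotation by a fixed ratio."""
    return rotation * ratio

def log_cam(x):
    """Stand-in for a logarithm cam (natural log is an assumption)."""
    return math.log(x)

def exp_cam(rotation):
    """Stand-in for an exponential cam, undoing log_cam."""
    return math.exp(rotation)

x, y = 3.0, 4.0

# Multiplication: add the two log rotations, then convert back.
product = exp_cam(differential(log_cam(x), log_cam(y)))

# Division: subtract instead (reverse the direction of one input).
quotient = exp_cam(differential(log_cam(x), -log_cam(y)))

# Square root: gear the log rotation down by a factor of 2.
root = exp_cam(gear(log_cam(x), 0.5))

print(product, quotient, root)
```

In this model, every multiplication or division costs one differential, and a square root costs only a gear pair, which is why representing values logarithmically pays off.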

The differential

The differential gear assembly is the mathematical component of the CADC, as it performs addition or subtraction.3 The differential takes two input rotations and produces an output rotation that is the sum or difference of these rotations.4 Since most values in the CADC are expressed logarithmically, the differential computes multiplication and division when it adds or subtracts its inputs.

A closeup of a differential mechanism.


While the differential functions like the differential in a car, it is constructed differently, with a spur-gear design. This compact arrangement of gears is about 1 cm thick and 3 cm in diameter. The differential is mounted on a shaft along with three co-axial gears: two gears provide the inputs to the differential and the third provides the output. In the photo, the gears above and below the differential are the input gears. The entire differential body rotates with the sum, connected to the output gear at the top through a concentric shaft. (In practice, any of the three gears can be used as the output.) The two thick gears inside the differential body are part of the mechanism.

The cams

The CADC uses cams to implement various functions. Most importantly, cams compute logarithms and exponentials. Cams also implement complicated functions of one variable such as \( M/\sqrt{1 + .2 M^2} \). The function is encoded into the cam's shape during manufacturing, so a hard-to-compute nonlinear function isn't a problem for the CADC. The photo below shows a cam with the follower arm in front. As the cam rotates, the follower moves in and out according to the cam's radius.

A cam inside the CADC implements a function.


However, the shape of the cam doesn't provide the function directly, as you might expect. The main problem with the straightforward approach is the discontinuity when the cam wraps around. For example, if the cam implemented an exponential directly, its radius would spiral exponentially and there would be a jump back to the starting value when it wraps around. Instead, the CADC uses a clever patented method: the cam encodes the difference between the desired function and a straight line. For example, an exponential curve is shown below (blue), with a line (red) between the endpoints. The height of the gray segment, the difference, specifies the radius of the cam (added to the cam's fixed minimum radius). The point is that this difference goes to 0 at the extremes, so the cam will no longer have a discontinuity when it wraps around. Moreover, this technique significantly reduces the size of the value (i.e. the height of the gray region is smaller than the height of the blue line), increasing the cam's accuracy.5

An exponential curve (blue), linear curve (red), and the difference (gray).


To make this work, the cam position must be added to the linear value to yield the result. This is implemented by combining each cam with a differential gear; watch for the paired cams and differentials below. As the diagram below shows, the input (23) drives the cam (30) and the differential (25, 37-41). The follower (32) tracks the cam and provides a second input (35) to the differential. The sum from the differential produces the desired function (26).

This diagram, from Patent 2969910, shows how the cam and follower are connected to a differential.
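The cam-plus-differential arrangement can be modeled in a few lines of Python. This is a sketch under stated assumptions: `make_cam` and `cam_and_differential` are hypothetical names, the exponential over the range 0 to 2 is an arbitrary example, and the cam's fixed minimum radius and any gear scaling are left out.

```python
import math

def make_cam(f, x_min, x_max):
    """Split f into a straight line through its endpoints plus a cam profile.

    Returns (line, cam) where cam(x) = f(x) - line(x), so the cam profile
    is zero at both ends of the input range.
    """
    y_min, y_max = f(x_min), f(x_max)
    slope = (y_max - y_min) / (x_max - x_min)

    def line(x):
        return y_min + slope * (x - x_min)

    def cam(x):
        return f(x) - line(x)

    return line, cam

line, cam = make_cam(math.exp, 0.0, 2.0)

def cam_and_differential(x):
    """The differential adds the linear input rotation to the follower motion."""
    return line(x) + cam(x)

# The profile vanishes at the ends, so the cam has no jump when it wraps around.
print(cam(0.0), cam(2.0))

# The stored difference is smaller than the full swing of the function.
samples = [2.0 * i / 100 for i in range(101)]
print(max(abs(cam(x)) for x in samples), math.exp(2.0) - math.exp(0.0))
```

For a convex function such as the exponential, the difference is negative between the endpoints; a real cam would offset it by the minimum radius so the profile stays positive.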


The synchro outputs

A synchro is an interesting device that can transmit a rotational position electrically over three wires. In appearance, a synchro is similar to an electric motor, but its internal construction is different, as shown below. Before digital systems, synchros were very popular for transmitting signals electrically through an aircraft. For instance, a synchro could transmit an altitude reading to a cockpit display or a targeting system. Two synchros at different locations have their stator windings connected together, while the rotor windings are driven with AC. Rotating the shaft of one synchro causes the other to rotate to the same position.6

Cross-section diagram of a synchro showing the rotor and stators.


For the CADC, most of the outputs are synchro signals, using compact synchros that are about 3 cm in length. For improved resolution, many of the CADC outputs use two synchros: a coarse synchro and a fine synchro. The two synchros are typically geared in an 11:1 ratio, so the fine synchro rotates 11 times as fast as the coarse synchro. Over the output range, the coarse synchro may turn 180°, providing the approximate output unambiguously, while the fine synchro spins multiple times to provide more accuracy.
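To see why the coarse/fine pair helps, here is a small Python sketch of one way a receiving system could combine the two readings. The article does not describe the receiver, so `synchro_angles` and `combine` are hypothetical, and the scheme assumes the coarse reading is off by less than half a fine revolution (about 16° of coarse rotation at 11:1).

```python
def synchro_angles(shaft_deg, ratio=11):
    """Angles reported by a coarse synchro and a fine synchro geared ratio:1."""
    coarse = shaft_deg % 360.0
    fine = (shaft_deg * ratio) % 360.0
    return coarse, fine

def combine(coarse_deg, fine_deg, ratio=11):
    """Recover the shaft angle from a rough coarse reading and a precise fine one."""
    turn = 360.0 / ratio           # coarse-shaft degrees per fine revolution
    fine_part = fine_deg / ratio   # position within the current fine revolution
    # Choose the whole number of fine revolutions that best matches the coarse reading.
    n = round((coarse_deg - fine_part) / turn)
    return n * turn + fine_part

coarse, fine = synchro_angles(123.4)
# Even with a 5-degree error on the coarse reading, the fine synchro pins down the angle.
print(combine(coarse + 5.0, fine))
```

The coarse synchro only has to be good enough to pick the right fine revolution; the precision comes from the fine synchro, whose angular errors are divided by the gear ratio.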

Examining the Mach section of the CADC

Another view of the CADC.


The Bendix CADC is constructed from modular sections. In this blog post, I'm focusing on the middle section, called the "Mach section" and indicated by the arrow above. This section computes log static pressure, impact pressure, pressure ratio, and Mach number and provides these outputs electrically as synchro signals. It also provides the log pressure ratio and log static pressure to the rest of the CADC as shaft rotations. The left section of the CADC computes values related to airspeed, air density, and temperature.7 The right section has the pressure sensors (the black domes), along with the servo mechanisms that control them.

I had feared that any attempt at disassembly would result in tiny gears flying in every direction, but the CADC was designed to be taken apart for maintenance. Thus, I could remove the left section of the CADC for analysis. Unfortunately, we lost the gear alignment between the sections and don't have the calibration instructions, so the CADC no longer produces accurate results.

The diagram below shows the internal components of the Mach section after disassembly. The synchros are in pairs to generate coarse and fine outputs; the coarse synchros can be distinguished because they have spiral anti-backlash springs installed. These springs prevent wobble in the synchro and gear train as the gears change direction. The gears and differentials are not visible from this angle as they are underneath the metal plate. The Position Error Correction (PEC) subsystem has a motor to drive the shaft and a control transformer for feedback. The Mach section has two D-sub connectors. The one on the right links the Mach section and pressure section to the front section of the CADC. The PEC servo amplifier board plugs into the left connector. The static pressure and total pressure input lines have fittings so they can be disconnected from the lines running from the front of the CADC.8

The Mach section with components labeled.


The photo below shows the left section of the CADC. This section meshes with the Mach section shown above. The two sections have parts at various heights, so they join in a complicated way. Two gears receive the pressure signals \( log ~ P_t / P_s \) and \( log ~ P_s \) from the Mach section. The third gear sends the log total temperature to the rest of the CADC. The electrical connector (a standard 37-pin D-sub) supplies 115 V 400 Hz power to the Mach section and pressure transducers and passes synchro signals to the output connectors.

The left part of the CADC that meshes with the Mach section.


The position error correction servo loop

The CADC receives two pressure inputs, and two pressure transducers convert these pressures into rotational positions, providing the indicated static pressure \( P_{si} \) and the total pressure \( P_t \) as shaft rotations to the rest of the CADC. (I explained the pressure transducers in detail in the previous article.)

There's one complication though. The static pressure \( P_s \) is the atmospheric pressure outside the aircraft. The problem is that the static pressure measurement is perturbed by the airflow around the aircraft, so the measured pressure (called the indicated static pressure \( P_{si} \)) doesn't match the real pressure. This is bad because a "static-pressure error manifests itself as errors in indicated airspeed, altitude, and Mach number to the pilot."9

The solution is a correction factor called the Position Error Correction. This factor gives the ratio between the real pressure \( P_s \) and the measured pressure \( P_{si} \). By applying this correction factor to the indicated (i.e. measured) pressure, the true pressure can be obtained. Since this correction factor depends on the shape of the aircraft, it is generated outside the CADC by a separate cylindrical unit called the Compensator, customized to the aircraft type. The position error computation depends on two parameters: the Mach number provided by the CADC and the angle of attack provided by an aircraft sensor. The compensator determines the correction factor by using a three-dimensional cam. The vintage photo below shows the components inside the compensator.

"Static Pressure and Angle of Attack Compensator Type X1254115-1 (Cover Removed)" from Air Data Computer Mechanization.

The correction factor is transmitted from the compensator to the CADC as a synchro signal over three wires. To use this value, the CADC must convert the synchro signal to a shaft rotation. The CADC uses a motorized servo loop that rotates the shaft until the shaft position matches the angle specified by the synchro input.

The servo loop ensures that the shaft position matches the input angle.


The key to the servo loop is a control transformer. This device looks like a synchro and has five wires like a synchro, but its function is different. Like the synchro motor, the control transformer has three stator wires that provide the angle input. Unlike the synchro, the control transformer also uses the shaft position as an input, while the rotor winding generates an output voltage indicating the error between the control transformer's shaft position and the three-wire angle input. This error signal is a 400 Hz sine wave, with a larger signal indicating more error.10

The amplifier board (below) drives the motor in the appropriate direction to cancel out the error. The power transformer in the upper left is the largest component, powering the amplifier board from the CADC's 115-volt, 400 Hertz aviation power. Below it are two transformer-like components; these are the magnetic amplifiers. The relay in the lower-right corner switches the amplifier into test mode. The rest of the circuitry consists of transistors, resistors, capacitors, and diodes. The construction is completely different from modern printed circuit boards. Instead, the amplifier uses point-to-point wiring between plastic-insulated metal pegs. Both sides of the board have components, with connections between the sides through the metal pegs.

The amplifier board for the position error correction.


The amplifier board is implemented with a transistor amplifier driving two magnetic amplifiers, which control the motor.11 (Magnetic amplifiers are an old technology that can amplify AC signals, allowing the relatively weak transistor output to control a larger AC output.12) The motor is a "Motor / Tachometer Generator" unit that also generates a voltage based on the motor's speed. This speed signal provides negative feedback, limiting the motor speed as the error becomes smaller and ensuring that the feedback loop doesn't overshoot. The photo below shows how the amplifier board is mounted in the middle of the CADC, behind the static pressure tubing.

Side view of the CADC.
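The behavior of the servo loop, including the role of the tachometer feedback, can be sketched with a toy simulation in Python. This is an idealized model, not the real amplifier: `simulate_servo` is a hypothetical function, the gains and time step are made-up numbers, and the motor is treated as a simple mass driven by the amplified error.

```python
def simulate_servo(target_deg, tach_gain, gain=400.0, dt=0.001, steps=2000):
    """Return the shaft angle over time for a simple motorized servo loop."""
    angle = 0.0
    speed = 0.0
    history = []
    for _ in range(steps):
        # Control transformer: error between the commanded angle and the shaft.
        error = target_deg - angle
        # Amplifier: drive is proportional to the error, minus the tachometer
        # signal, which opposes motion in proportion to the motor speed.
        drive = gain * error - tach_gain * speed
        speed += drive * dt
        angle += speed * dt
        history.append(angle)
    return history

damped = simulate_servo(90.0, tach_gain=50.0)
undamped = simulate_servo(90.0, tach_gain=0.0)

print(damped[-1])     # settles at the commanded angle
print(max(damped))    # never goes past the target
print(max(undamped))  # without speed feedback, the shaft swings far past the target
```

With these illustrative gains, the speed feedback makes the loop overdamped, so the shaft approaches the commanded angle without overshooting; with the tachometer term removed, the same loop oscillates around the target.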


The equations

Although the CADC looks like an inscrutable conglomeration of tiny gears, it is possible to trace out the gearing and see exactly how it computes the air data functions. With considerable effort, I have reverse-engineered the mechanisms to create the diagram below, showing how each computation is broken down into mechanical steps. Each line indicates a particular value, specified by a shaft rotation. The ⊕ symbol indicates a differential gear, adding or subtracting its inputs to produce another value. The cam symbol indicates a cam coupled to a differential gear. Each cam computes either a specific function or an exponential, providing the value as a rotation. At the right, the outputs are either shaft rotations to the rest of the CADC or synchro outputs.

This diagram shows how the values are computed. The differential numbers are my own arbitrary numbers. Click for a larger version.


I'll go through each calculation briefly.

log static pressure

The static pressure is calculated by dividing the indicated static pressure by the position error correction factor. Since these values are all represented logarithmically, the division turns into a subtraction, performed by a differential gear. The output goes to two synchros, geared to provide coarse and fine outputs.13

\[log ~ P_s = log ~ P_{si} - log ~ (P_{si} / P_s) \]

Impact pressure

The impact pressure is the pressure due to the aircraft's speed, the difference between the total pressure and the static pressure. To compute the impact pressure, the log pressure values are first converted to linear values by exponentiation, performed by cams. The linear pressure values are then subtracted by a differential gear. Finally, the impact pressure is output through two synchros, coarse and fine in an 11:1 ratio.

\[ P_t - P_s = exp(log ~ P_t) - exp(log ~ P_s) \]

log pressure ratio

The log pressure ratio is the logarithm of \( P_t/P_s \), the ratio of total pressure to static pressure. This value is important because it is used to compute the Mach number, true airspeed, and log free air temperature. The Mach number is computed in the Mach section as described below. The true airspeed and log free air temperature are computed in the left section. The left section receives the log pressure ratio as a rotation. Since the left section and Mach section can be separated for maintenance, a direct shaft connection is not used. Instead, each section has a gear and the gears mesh when the sections are joined.

Computing the log pressure ratio is straightforward. Since the log total pressure and log static pressure are both available, subtracting the logs with a differential yields the desired value. That is,

\[log ~ P_t/P_s = log ~ P_t - log ~ P_s \]
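The three pressure calculations above (log static pressure, impact pressure, and log pressure ratio) can be checked numerically with a short Python sketch. The function name `mach_section_pressures` is hypothetical, natural logarithms are an assumption, and the pressures and the 2% correction factor are illustrative values only, not figures from the CADC specification.

```python
import math

def mach_section_pressures(log_p_si, log_correction, log_p_t):
    """Model the three pressure computations, each as one add or subtract.

    log_correction is log(P_si / P_s), the position error correction.
    """
    # Differential: divide by the correction factor by subtracting logs.
    log_p_s = log_p_si - log_correction
    # Two exponential cams convert to linear pressures; a differential subtracts.
    impact = math.exp(log_p_t) - math.exp(log_p_s)
    # Differential: the log of the ratio is the difference of the logs.
    log_ratio = log_p_t - log_p_s
    return log_p_s, impact, log_ratio

# Illustrative inputs: true static pressure 20, indicated pressure 2% high,
# total pressure 30 (any consistent pressure unit).
log_p_s, impact, log_ratio = mach_section_pressures(
    math.log(20.0 * 1.02), math.log(1.02), math.log(30.0)
)
print(math.exp(log_p_s), impact, math.exp(log_ratio))
```

Only the impact pressure needs to leave the log domain, which is why it is the one calculation here that requires exponential cams.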

Mach number

The Mach number is defined in terms of \(P_t/P_s \), with separate cases for subsonic and supersonic:14

\[M<1:\]

\[~~~\frac{P_t}{P_s} = ( 1+.2M^2)^{3.5}\]

\[M > 1:\]

\[~~~\frac{P_t}{P_s} = \frac{166.9215M^7}{( 7M^2-1)^{2.5}}\]

Although these equations are very complicated, the solution is a function of one variable \(P_t/P_s\) so M can be computed with a single cam. In other words, the mathematics needed to be done when the CADC was manufactured, but once the cam exists, computing M is easy, using the log pressure ratio computed earlier:

\[ M = f(log ~ P_t / P_s) \]
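The function f has no simple closed form in the supersonic case, but the kind of calculation needed to lay out such a cam can be sketched in Python by inverting the two equations numerically. This is an assumption about method, not a description of how Bendix did it: `ratio_from_mach` and `mach_from_log_ratio` are hypothetical names, natural logarithms are assumed, and the upper limit of Mach 5 is arbitrary.

```python
import math

def ratio_from_mach(m):
    """Pressure ratio P_t/P_s as a function of Mach number (equations above)."""
    if m <= 1.0:
        return (1.0 + 0.2 * m * m) ** 3.5
    return 166.9215 * m ** 7 / (7.0 * m * m - 1.0) ** 2.5

def mach_from_log_ratio(log_ratio, m_max=5.0):
    """Invert ratio_from_mach by bisection; the ratio increases with Mach number."""
    target = math.exp(log_ratio)
    lo, hi = 0.0, m_max
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if ratio_from_mach(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Round trip for a subsonic and a supersonic case.
for m in (0.5, 2.0):
    print(m, mach_from_log_ratio(math.log(ratio_from_mach(m))))

# The two branches agree at Mach 1, to the precision of the constant 166.9215.
print(ratio_from_mach(1.0), ratio_from_mach(1.0000001))
```

Because the two branches meet at Mach 1, a single continuous function of the log pressure ratio covers both regimes, consistent with the article's point that one cam suffices.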

Conclusions

The CADC performs nonlinear calculations that seem way too complicated to solve with mechanical gearing. But reverse-engineering the mechanism shows how the equations are broken down into steps that can be performed with cams and differentials, using logarithms for multiplication and division. The diagram below shows the complex gearing in the Mach section. Each differential below corresponds to a differential in the earlier equation diagram.

A closeup of the gears and cams in the Mach section. The differential for the pressure ratio is hidden in the middle.


Follow me on Twitter @kenshirriff or RSS for more reverse engineering. I'm also on Mastodon as @oldbytes.space@kenshirriff. Thanks to Joe for providing the CADC. Thanks to Nancy Chen for obtaining a hard-to-find document for me.15 Marc Verdiell and Eric Schlaepfer are working on the CADC with me. CuriousMarc's video shows the CADC in action:

Notes and references

  1. My articles on the CADC are:

    There is a lot of overlap between the articles, so skip over parts that seem repetitive :-) 

  2. The static air pressure can also be provided by holes in the side of the pitot tube; this is the typical approach in fighter planes. 

  3. Multiplying a rotation by a constant factor doesn't require a differential; it can be done simply with the ratio between two gears. (If a large gear rotates a small gear, the small gear rotates faster according to the size ratio.) Adding a constant to a rotation is even easier, just a matter of defining what shaft position indicates 0. For this reason, I will ignore constants in the equations. 

  4. Strictly speaking, the output of the differential is the sum of the inputs divided by two. I'm ignoring the factor of 2 because the gear ratios can easily cancel it out. It's also arbitrary whether you think of the differential as adding or subtracting, since it depends on which rotation direction is defined as positive. 

  5. The diagram below shows a typical cam function in more detail. The input is \(log~ dP/P_s\) and the output is \(log~M / \sqrt{1+.2KM^2}\). The small humped curve at the bottom is the cam correction. Although the input and output functions cover a wide range, the difference that is encoded in the cam is much smaller and drops to zero at both ends.

    This diagram, from Patent 2969910, shows how a cam implements a complicated function.


     

  6. Internally, a synchro has a moving rotor winding and three fixed stator windings. When AC is applied to the rotor, voltages are developed on the stator windings depending on the position of the rotor. These voltages produce a torque that rotates the synchros to the same position. In other words, the rotor receives power (26 V, 400 Hz in this case), while the three stator wires transmit the position. The diagram below shows how a synchro is represented schematically, with rotor and stator coils.

    The schematic symbol for a synchro.


    A control transformer has a similar structure, but the rotor winding provides an output, instead of being powered. 

  7. Specifically, the left part of the CADC computes true airspeed, air density, total temperature, log true free air temperature, and air density × speed of sound. I discussed the left section in detail here

  8. From the outside, the CADC is a boring black cylinder, with no hint of the complex gearing inside. The CADC is wired to the rest of the aircraft through round military connectors. The front panel interfaces these connectors to the D-sub connectors used internally. The two pressure inputs are the black cylinders at the bottom of the photo.

    The exterior of the CADC. It is packaged in a rugged metal cylinder. It is sealed by a soldered metal band, so we needed a blowtorch to open it.


     

  9. The concepts of position error correction are described here

  10. The phase of the signal is 0° or 180°, depending on the direction of the error. In other words, the error signal is proportional to the driving AC signal in one direction and flipped when the error is in the other direction. This is important since it indicates which direction the motor should turn. When the error is eliminated, the signal is zero. 

  11. I reverse-engineered the circuit board to create the schematic below for the amplifier. The idea is that one magnetic amplifier or the other is selected, depending on the phase of the error signal, causing the motor to turn counterclockwise or clockwise as needed. To implement this, the magnetic amplifier control windings are connected to opposite phases of the 400 Hz power. The transistor is connected to both magnetic amplifiers through diodes, so current will flow only if the transistor pulls the winding low during the half-cycle that the winding is powered high. Thus, depending on the phase of the transistor output, one winding or the other will be powered, allowing that magnetic amplifier to pass AC to the motor.

    This reverse-engineered schematic probably has a few errors. Click the schematic for a larger version.


    The CADC has four servo amplifiers: this one for position error correction, one for temperature, and two for pressure. The amplifiers have different types of inputs: the temperature input is the probe resistance, the position error correction uses an error voltage from the control transformer, and the pressure inputs are voltages from the inductive pickups in the sensor. The circuitry is roughly the same for each amplifier (a transistor amplifier driving two magnetic amplifiers), but the details are different. The largest difference is that each pressure transducer amplifier drives two motors (coarse and fine) so each has two transistor stages and four magnetic amplifiers. 

  12. The basic idea of a magnetic amplifier is a controllable inductor. Normally, the inductor blocks alternating current. But applying a relatively small DC signal to a control winding causes the inductor to saturate, permitting the flow of AC. Since the magnetic amplifier uses a small signal to control a much larger signal, it provides amplification.

    In the early 1900s, magnetic amplifiers were used in applications such as dimming lights. Germany improved the technology in World War II, using magnetic amplifiers in ships, rockets, and trains. The magnetic amplifier had a resurgence in the 1950s; the Univac Solid State computer used magnetic amplifiers (rather than vacuum tubes or transistors) as its logic elements. However, improvements in transistors made the magnetic amplifier obsolete except for specialized applications. (See my IEEE Spectrum article on magnetic amplifiers for more history of magnetic amplifiers.) 

  13. The CADC specification defines how the parameter values correspond to rotation angles of the synchros. For instance, for the log static pressure synchros, the CADC supports the parameter range 0.8099 to 31.0185 inches of mercury. The spec defines the corresponding synchro outputs as 16,320° rotation of the fine synchro and 175.48° rotation of the coarse synchro over this range. The synchro null point corresponds to 29.92 inches of mercury (i.e. zero altitude). The fine synchro is geared to rotate 93 times as fast as the coarse synchro, so it rotates over 45 times during this range, providing higher resolution than a single synchro would provide. The other synchro pairs use a much smaller 11:1 ratio; presumably high accuracy of the static pressure was important. 

  14. Although the CADC's equations may seem ad hoc, they can be derived from fluid dynamics principles. These equations were standardized in the 1950s by various government organizations including the National Bureau of Standards and NACA (the precursor of NASA). 

  15. It was very difficult to find information about the CADC. The official military specification is MIL-C-25653C(USAF). After searching everywhere, I was finally able to get a copy from the Technical Reports & Standards unit of the Library of Congress. The other useful document was in an obscure conference proceedings from 1958: "Air Data Computer Mechanization" (Hazen), Symposium on the USAF Flight Control Data Integration Program, Wright Air Dev Center US Air Force, Feb 3-4, 1958, pp 171-194. 





[#] Sat Feb 17 2024 10:11:34 UTC from rss <>

Subject: Inside the mechanical Bendix Air Data Computer, part 5: motor/tachometers


The Bendix Central Air Data Computer (CADC) is an electromechanical analog computer that uses gears and cams for its mathematics. It was a key part of military planes such as the F-101 and the F-111 fighters, computing airspeed, Mach number, and other "air data". The rotating gears are powered by six small servomotors, so these motors are in a sense the fundamental component of the CADC. In the photo below, you can see one of the cylindrical motors near the center, about 1/3 of the way down.

The servomotors in the CADC are unlike standard motors. Their name ("Motor-Tachometer Generator" or "Motor and Rate Generator"1) indicates that each unit contains both a motor and a speed sensor. Because the motor and generator use two-phase signals, there are a total of eight colorful wires coming out, many more than a typical motor. Moreover, the direction of the motor can be controlled, unlike typical AC motors. I couldn't find a satisfactory explanation of how these units worked, so I bought one and disassembled it. This article (part 5 of my series on the CADC2) provides a complete teardown of the motor/generator and explains how it works.

The Bendix MG-1A Central Air Data Computer with the case removed, showing the compact gear mechanisms inside. Click this image (or any other) for a larger version.


The image below shows a closeup of two motors powering one of the pressure signal outputs. Note the bundles of colorful wires to each motor, entering in two locations. At the top, the motors drive complex gear trains. The high-speed motors are geared down by the gear trains to provide much slower rotations with sufficient torque to power the rest of the CADC's mechanisms.

Two motor/generators in the pressure section of the CADC. The one at the back is mostly hidden.


The motor/tachometer that we disassembled is shorter than the ones in the CADC (despite having the same part number), but the principles are the same. We started by removing a small C-clip on the end of the motor and unscrewing the end plate. The unit is pretty simple mechanically. It has bearings at each end for the rotor shaft. There are four wires for the motor and four wires for the tachometer.3

The motor disassembled to show the internal components.


The rotor (below) has two parts on the shaft: the left part is the squirrel-cage rotor4 for the motor, and the metal drum on the right is used by the tachometer. The squirrel-cage rotor consists of conducting bars (light-colored) on an iron core, with the bars all connected by conductive rings at either end. Note that there are no electrical connections between the rotor components and the rest of the motor: there are no brushes or slip rings. The interaction between the rotor and the windings in the body of the motor is purely magnetic, as will be explained.

The rotor and shaft.


The motor/tachometer contains two cylindrical stators that create the magnetic fields, one for the motor and one for the tachometer. The photo below shows the motor stator inside the unit after removing the tachometer stator. The stators are encased in hard green plastic and tightly pressed inside the unit. In the center, eight metal poles are visible. They direct the magnetic field onto the rotor.

Inside the motor after removing the tachometer winding.


The photo below shows the stator for the tachometer, similar to the stator for the motor. Note the shallow notches that look like black lines in the body on the lower left. These are probably adjustments to the tachometer during manufacturing to compensate for imperfections. The adjustments ensure that the magnetic fields are nulled out so the tachometer returns zero voltage when stationary. The metal plate on top shields the tachometer from the motor's magnetic fields.

The stator for the tachometer.


The poles and the metal case of the stator look solid, but they are not. Instead, they are formed from a stack of thin laminations. The reason to use laminations instead of solid metal is to reduce eddy currents in the metal. Each lamination is varnished, so it is insulated from its neighbors, preventing the flow of eddy currents.

One lamination from the stack of laminations that make up the winding. The lamination suffered some damage during disassembly; it was originally round.


In the photo below, I removed some of the plastic to show the wire windings underneath. The wires look like bare copper, but they have a very thin layer of varnish to insulate them. There are two sets of windings (orange and blue, or red and black) around alternating metal poles. Note that the wires run along the pole, parallel to the rotor, and then wrap around the pole at the top and bottom, forming oblong coils around each pole.5 This generates a magnetic field through each pole.

Removing the plastic reveals the motor windings.


The motor

The motor part of the unit is a two-phase induction motor with a squirrel-cage rotor.6 There are no brushes or electrical connections to the rotor, and there are no magnets, so it isn't obvious what makes the rotor rotate. The trick is the "squirrel-cage" rotor, shown below. It consists of metal bars that are connected at the top and bottom by rings. Assume (for now) that the fixed part of the motor, the stator, creates a rotating magnetic field. The important principle is that a changing magnetic field will produce a current in a wire loop.7 As a result, each loop in the squirrel-cage rotor will have an induced current: current will flow up9 the bars facing the north magnetic field and down the south-facing bars, with the rings on the end closing the circuits.

A squirrel-cage rotor. The numbered parts are (1) shaft, (2) end cap, (3) laminations, and (4) splines to hold the laminations. Image from Robo Blazek.


The next important principle is that current flowing through a wire produces a magnetic field.8 As a result, the currents in the squirrel cage rotor produce a magnetic field perpendicular to the cage. This magnetic field causes the rotor to turn in the same direction as the stator's magnetic field, driving the motor. Because the rotor is powered by the induced currents, the motor is called an induction motor. But how does the stator produce a rotating magnetic field? And how do you control the direction of rotation?

The diagram below shows how the motor is wired, with a control winding and a reference winding. Both windings are powered with AC, but the control voltage either lags the reference voltage by 90° or leads the reference voltage by 90°, due to the capacitor. Suppose the current through the control winding lags by 90°. First, the reference voltage's sine wave will have a peak, producing the magnetic field's north pole at A. Next (90° later), the control voltage will peak, producing the north pole at B. The reference voltage will go negative, producing a south pole at A and thus a north pole at C. The control voltage will go negative, producing a south pole at B and a north pole at D. This cycle will repeat, with the magnetic field rotating counter-clockwise from A to D. Conversely, if the control voltage leads the reference voltage, the magnetic field will rotate clockwise. This causes the motor to spin in one direction or the other, with the direction controlled by the control voltage. (The motor has four poles for each winding, rather than the one shown below; this increases the torque and reduces the speed.)

Diagram showing the servomotor wiring.


The purpose of the capacitor is to provide the 90° phase shift so the reference voltage and the control voltage can be driven from the same single-phase AC supply (in this case, 26 volts, 400 hertz). Switching the polarity of the control voltage reverses the direction of the motor.
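As a rough numerical check of the rotating-field description, the sketch below adds the fields from the two windings as perpendicular vectors and tracks the direction of the sum over one AC cycle. The geometry (pole A at 0° and pole B at 90° counter-clockwise from it), the unit amplitudes, and the function names are assumptions made for illustration; they are not taken from the motor's documentation.

```python
import math

def field_angle_deg(cycle_fraction, control_phase_deg):
    """Direction of the combined stator field at a point in the AC cycle.

    The reference winding is assumed to act along the A-C axis (x) and the
    control winding along the B-D axis (y). control_phase_deg is -90 when
    the control voltage lags the reference and +90 when it leads.
    """
    theta = 2 * math.pi * cycle_fraction
    b_reference = math.cos(theta)
    b_control = math.cos(theta + math.radians(control_phase_deg))
    return math.degrees(math.atan2(b_control, b_reference)) % 360

def rotation_direction(control_phase_deg, steps=8):
    """Sum the signed angle changes over one cycle to find the rotation sense."""
    angles = [field_angle_deg(i / steps, control_phase_deg) for i in range(steps + 1)]
    total = 0.0
    for previous, current in zip(angles, angles[1:]):
        # Wrap each step into the range [-180, 180) before accumulating.
        total += (current - previous + 180) % 360 - 180
    return "counter-clockwise" if total > 0 else "clockwise"

if __name__ == "__main__":
    print("control lags: ", rotation_direction(-90))
    print("control leads:", rotation_direction(+90))
```

In this simplified model, flipping the sign of the control phase is the only change needed to reverse the field, which matches the statement that switching the polarity of the control voltage reverses the motor.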

There are a few interesting things about induction motors. You might expect that the motor would spin at the same rate as the rotating magnetic field. However, this is not the case. Remember that a changing magnetic field induces the current in the squirrel-cage rotor. If the rotor is spinning at the same rate as the magnetic field, the rotor will encounter an unchanging magnetic field and there will be no current in the bars of the rotor. As a result, the rotor will not generate a magnetic field and there will be no torque to rotate it. The consequence is that the rotor must spin somewhat slower than the magnetic field. This is called "slippage" and is typically a few percent of the full speed, with more slippage as more torque is required.
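The relationship between field speed and rotor speed can be put into numbers with the standard synchronous-speed formula. This is only a sketch: it assumes the four poles per winding form a four-pole field, and the 5% slip value is a hypothetical example rather than a measurement of this motor.

```python
def synchronous_rpm(supply_hz, poles):
    """Speed of the rotating magnetic field in revolutions per minute."""
    return 120.0 * supply_hz / poles

def slip_fraction(sync_rpm, rotor_rpm):
    """Fraction by which the rotor lags the rotating field."""
    return (sync_rpm - rotor_rpm) / sync_rpm

if __name__ == "__main__":
    sync = synchronous_rpm(400, 4)       # 400 Hz supply, assumed 4-pole field
    rotor = sync * (1 - 0.05)            # hypothetical 5% slip
    print(f"field: {sync:.0f} rpm, rotor: {rotor:.0f} rpm, "
          f"slip: {slip_fraction(sync, rotor):.1%}")
```

Under these assumptions the field turns at thousands of rpm, which is consistent with the need for the gear trains described earlier.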

Many household appliances use induction motors, but how do they generate a rotating magnetic field from a single-phase AC winding? The problem is that the magnetic field in a single AC winding will just flip back and forth, so the motor will not turn in either direction. One solution is a shaded-pole motor, which puts a copper bar around part of each pole to break the symmetry and produce a weakly rotating magnetic field. More powerful induction motors use a startup winding with a capacitor (analogous to the control winding). This winding can either be switched out of the circuit once the motor starts spinning,10 or used continuously, called a permanent-split capacitor (PSC) motor. The best solution is three-phase power (if available); a three-phase winding automatically produces a rotating magnetic field.

Tachometer/generator

The second part of the unit is the tachometer generator, sometimes called the rate unit.11 The purpose of the generator is to produce a voltage proportional to the speed of the shaft. The unusual thing about this generator is that it produces a 400-hertz output that is either in phase with the input or 180° out of phase. This is important because the phase indicates which direction the shaft is turning. Note that a "normal" generator is different: the output frequency is proportional to the speed.

The diagram below shows the principle behind the generator. It has two stator windings: the reference coil that is powered at 400 Hz, and the output coil that produces the output signal. When the rotor is stationary (A), the magnetic flux is perpendicular to the output coil, so no output voltage is produced. But when the rotor turns (B), eddy currents in the rotor distort the magnetic field. It now couples with the output coil, producing a voltage. As the rotor turns faster, the magnetic field is distorted more, increasing the coupling and thus the output voltage. If the rotor turns in the opposite direction (C), the magnetic field couples with the output coil in the opposite direction, inverting the output phase. (This diagram is more conceptual than realistic, with the coils and flux 90° from their real orientation, so don't take it too seriously. As shown earlier, the coils are perpendicular to the rotor so the real flux lines are completely different.)

Principle of the drag-cup rate generator. From Navy electricity and electronics training series: Principles of synchros, servos, and gyros, Fig 2-16

But why does the rotating drum change the magnetic field? It's easier to understand by considering a tachometer that uses a squirrel-cage rotor instead of a drum. When the rotor rotates, currents will be induced in the squirrel cage, as described earlier with the motor. These currents, in turn, generate a perpendicular magnetic field, as before. This magnetic field, perpendicular to the original field, will be aligned with the output coil and will be picked up. The strength of the induced field (and thus the output voltage) is proportional to the speed, while the direction of the field depends on the direction of rotation. Because the primary coil is excited at 400 hertz, the currents in the squirrel cage and the resulting magnetic field also oscillate at 400 hertz. Thus, the output is at 400 hertz, regardless of the input speed.
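The tachometer's behavior can be summarized as a fixed 400-hertz carrier whose amplitude follows the shaft speed and whose phase flips with the direction. The sketch below is an idealized model; the `VOLTS_PER_RPM` scale factor is a hypothetical value chosen for illustration, not a specification of the Bendix unit.

```python
import math

CARRIER_HZ = 400.0        # excitation frequency of the reference coil
VOLTS_PER_RPM = 0.001     # hypothetical scale factor, for illustration only

def tach_output(shaft_rpm, t):
    """Idealized tachometer output voltage at time t (seconds).

    The amplitude is proportional to the signed shaft speed, so reversing
    the shaft inverts the waveform (a 180-degree phase shift). The output
    frequency stays at the carrier frequency regardless of speed.
    """
    amplitude = VOLTS_PER_RPM * shaft_rpm
    return amplitude * math.sin(2 * math.pi * CARRIER_HZ * t)

if __name__ == "__main__":
    quarter_cycle = 1 / (4 * CARRIER_HZ)   # carrier peak
    for rpm in (1000, 0, -1000):
        print(rpm, round(tach_output(rpm, quarter_cycle), 3))
```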

Using a drum instead of a squirrel cage provides higher accuracy because there are no fluctuations due to the discrete bars. The operation is essentially the same, except that the currents pass through the metal of the drum continuously instead of through individual bars. The result is eddy currents in the drum, producing the second magnetic field. The diagram below shows the eddy currents (red lines) from a metal plate moving through a magnetic field (green), producing a second magnetic field (blue arrows). For the rotating drum, the situation is similar except the metal surface is curved, so both field arrows will have a component pointing to the left. This creates the directed magnetic field that produces the output.

A diagram showing eddy currents in a metal plate moving under a magnet. Image from Chetvorno.


The servo loop

The motor/generator is called a servomotor because it is used in a servo loop, a control system that uses feedback to obtain precise positioning. In particular, the CADC uses the rotational position of shafts to represent various values. The servo loops convert the CADC's inputs (static pressure, dynamic pressure, temperature, and pressure correction) into shaft positions. The rotations of these shafts power the gears, cams, and differentials that perform the computations.

The diagram below shows a typical servo loop in the CADC. The goal is to rotate the output shaft to a position that exactly matches the input voltage. To accomplish this, the output position is converted into a feedback voltage by a potentiometer that rotates as the output shaft rotates.12 The error amplifier compares the input voltage to the feedback voltage and generates an error signal, rotating the servomotor in the appropriate direction. Once the output shaft is in the proper position, the error signal drops to zero and the motor stops. To improve the dynamic response of the servo loop, the tachometer signal is used as a negative feedback voltage. This ensures that the motor slows as the system gets closer to the right position, so the motor doesn't overshoot the position and oscillate. (This is sort of like a PID controller.)

Diagram of a servo loop in the CADC.
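To see why the tachometer feedback matters, the loop can be simulated numerically. The sketch below is a deliberately simple unitless model, not the CADC's actual circuit: the gains, friction term, and time step are all hypothetical values picked so that the loop oscillates without rate feedback and settles smoothly with it.

```python
def simulate_servo(target, tach_gain, position_gain=100.0, friction=2.0,
                   dt=0.001, duration=5.0):
    """Simulate a position servo with optional rate (tachometer) feedback.

    Returns (final_position, overshoot). All parameters are hypothetical
    and unitless; the shaft starts at rest at position 0.
    """
    position = 0.0
    velocity = 0.0
    peak = 0.0
    for _ in range(int(duration / dt)):
        error = target - position                      # error amplifier
        drive = position_gain * error - tach_gain * velocity
        acceleration = drive - friction * velocity     # motor plus load
        velocity += acceleration * dt
        position += velocity * dt
        peak = max(peak, position)
    return position, max(0.0, peak - target)

if __name__ == "__main__":
    for gain in (0.0, 28.0):
        final, overshoot = simulate_servo(1.0, tach_gain=gain)
        print(f"tach gain {gain:4.1f}: final {final:.3f}, overshoot {overshoot:.3f}")
```

With the rate feedback set to zero, the simulated shaft swings well past the target before settling; with rate feedback it approaches the target without overshooting, which is the damping effect described above.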


The error amplifier and motor drive circuit for a pressure transducer are shown below. Because of the state of electronics at the time, it took three circuit boards to implement a single servo loop. The amplifier was implemented with germanium transistors (since silicon transistors came later). The transistors weren't powerful enough to drive the motors directly. Instead, magnetic amplifiers (the yellow transformer-like modules at the front) powered the servomotors. The large rectangular capacitors on the right provided the phase shift required for the control voltage.

One of the three-board amplifiers for the pressure transducer.


Conclusions

The Bendix CADC used a variety of electromechanical devices including synchros, control transformers, servo motors, and tachometer generators. These were expensive military-grade components driven by complex electronics. Nowadays, you can get a PWM servo motor for a few dollars with the gearing, feedback, and control circuitry inside the motor housing. These motors are widely used for hobbyist robotics, drones, and other applications. It's amazing that servo motors have gone from specialized avionics hardware to an easy-to-use, inexpensive commodity.

A modern DC servo motor. Photo by Adafruit (CC BY-NC-SA 2.0 DEED).


Follow me on Twitter @kenshirriff or RSS for updates. I'm also on Mastodon as @oldbytes.space@kenshirriff. Thanks to Joe for providing the CADC. Thanks to Marc Verdiell for disassembling the motor.

Notes and references

  1. The two types of motors in the CADC are part number "FV-101-19-A1" and part number "FV-101-5-A1" (or FV101-5A1). They are called either a "Tachometer Rate Generator" or "Tachometer Motor Generator", with both names applied to the same part number. The "19" and "5" units look the same, with the "19" used for one pressure servo loop and the "5" used everywhere else.

    The motor that I got is similar to the ones in the CADC, but shorter. The difference in size is mysterious since both have the Bendix part number FV-101-5-A1.

    For reference, the motor I disassembled is labeled:

    Cedar Division Control Data Corp. ST10162 Motor Tachometer F0: 26V C0: 26V TACH: 18V 400 CPS DSA-400-70C-4651 FSN6105-581-5331 US BENDIX FV-101-5-A1

    I wondered why the motor listed both Control Data and Bendix. In 1952, the Cedar Engineering Company was spun off from the Minneapolis Honeywell Regulator Company (better known as Honeywell, the name it took in 1964). Cedar Engineering produced motors, servos, and aircraft actuators. In 1957, Control Data bought Cedar Engineering, which became the Cedar Division of CDC. Then, Control Data acquired Bendix's computer division in 1963. Thus, three companies were involved. 

  2. My previous articles on the CADC are:

     

  3. From testing the motor, here is how I believe it is wired:
    Motor reference (power): red and black
    Motor control: blue and orange
    Generator reference (power): green and brown
    Generator out: white and yellow 

  4. The bars on the squirrel-cage rotor are at a slight angle. Parallel bars would go in and out of alignment with the stator, causing fluctuations in the force, while the angled bars avoid this problem. 

  5. This cross-section through the stator shows the windings. On the left, each winding is separated into the parts on either side of the pole. On the right, you can see how the wires loop over from one side of the pole to the other. Note the small circles in the 12 o'clock and 9 o'clock positions: cross sections of the input wires. The individual horizontal wires near the circumference connect alternating windings.

    A cross-section of the stator, formed by sanding down the plastic on the end.


     

  6. It's hard to find explanations of AC servomotors since they are an old technology. One discussion is in Electromechanical components for servomechanisms (1961). This book points out some interesting things about a servomotor. The stall torque is proportional to the control voltage. Servomotors are generally high-speed, but low-torque devices, heavily geared down. Because of their high speed and their need to change direction, rotational inertia is a problem. Thus, servomotors typically have a long, narrow rotor compared with typical motors. (You can see in the teardown photo that the rotor is long and narrow.) Servomotors are typically designed with many poles (to reduce speed) and smaller air gaps to increase inductance. These small airgaps (e.g. 0.001") require careful manufacturing tolerance, making servomotors a precision part. 

  7. The principle is Faraday's law of induction: "The electromotive force around a closed path is equal to the negative of the time rate of change of the magnetic flux enclosed by the path." 

  8. Ampère's law states that "the integral of the magnetizing field H around any closed loop is equal to the sum of the current flowing through the loop." 

  9. The direction of the current flow (up or down) depends on the direction of rotation. I'm not going to worry about the specific direction of current flow, magnetic flux, and so forth in this article. 

  10. Once an induction motor is spinning, it can be powered from a single AC phase since the rotor is rotating with respect to the magnetic field. This works for the servomotor too. I noticed that once the motor is spinning, it can operate without the control voltage. This isn't the normal way of using the motor, though. 

  11. A long discussion of tachometers is in the book Electromechanical Components for Servomechanisms (1961). The AC induction-generator tachometer is described starting on page 193.

    For a mathematical analysis of the tachometer generator, see Servomechanisms, Section 2, Measurement and Signal Converters, MCP 706-137, U.S. Army. This source also discusses sources of errors in detail. Inexpensive tachometer generators may have an error of 1-2%, while precision devices can have an error of about 0.1%. Accuracy is worse for small airborne generators, though. Since the Bendix CADC uses the tachometer output for damping, not as a signal output, accuracy is less important. 

  12. Different inputs in the CADC use different feedback mechanisms. The temperature servo uses a potentiometer for feedback. The angle of attack correction uses a synchro control transformer, which generates a voltage based on the angle error. The pressure transducers contain inductive pickups that generate a voltage based on the pressure error. For more details, see my article on the CADC's pressure transducer servo circuits.





[#] Sat Feb 24 2024 11:57:50 UTC from rss <>

Subject: The first microcomputer: The transfluxor-powered Arma Micro Computer from 1962


What would you say is the first microcomputer?1 The Apple I from 1976? The Altair 8800 from 1974? Perhaps the lesser-known Micral N (1973) or Q1 (1972)? How about the Arma Micro Computer from way back in 1962? The Arma Micro Computer was a compact 20-pound transistorized computer, designed for applications in space such as inertial or celestial navigation, steering, radar, or engine control.

Obviously, the Arma Micro Computer is not a microcomputer according to modern definitions, since its processor was made from discrete components. But it's an interesting computer in many ways. First, it is an example of the aerospace computers of the 1960s, advanced systems that are now almost entirely forgotten. People think of 1960s computers as room-filling mainframes, but there was a whole separate world of cutting-edge miniaturized aerospace computers. (Taking up just 0.4 cubic feet, the Arma Micro Computer was smaller than an Apple II.) Second, the Arma Micro Computer used strange components such as transfluxors and had an unusual 22-bit serial architecture. Finally, the Arma Micro Computer evolved into a series of computers used on Navy ships and submarines, the E-2C Hawkeye airborne early warning plane, the Concorde, and even Air Force One.

The Arma Micro Computer

The Arma Micro Computer, with a circuit board on top. Click this image (or any other) for a larger version. Photo courtesy of Daniel Plotnick.


The Micro Computer used 22-bit words, which may seem like a strange size from the modern perspective. But there's no inherent need for a word size to be a power of 2. In particular, the Micro Computer was designed for mathematical calculations, not dealing with 8-bit characters. The word size was selected to provide enough accuracy for its navigational tasks.

Another strange aspect of the Micro Computer is that it was a serial machine, sequentially operating on one bit of a word at a time.2 This approach was often used in early machines because it substantially reduced the amount of hardware required: it only needs a 1-bit data bus and a 1-bit ALU. The downside is that a serial machine is much slower because each 22-bit word takes 22 clock cycles (plus 5 cycles of overhead). As a result, the Micro Computer executed just 36000 operations per second, despite its 1 megahertz clock speed.
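The cost of the serial design can be checked with a little arithmetic. This sketch assumes that one operation occupies exactly one word time (22 data cycles plus 5 overhead cycles); that one-to-one mapping is an assumption for illustration, and the function name is hypothetical.

```python
def serial_ops_per_second(clock_hz, word_bits, overhead_cycles):
    """Word times per second for a bit-serial machine.

    Assumes each operation takes one full word time: one clock cycle per
    bit plus a fixed number of overhead cycles.
    """
    cycles_per_word = word_bits + overhead_cycles
    return clock_hz / cycles_per_word

if __name__ == "__main__":
    rate = serial_ops_per_second(1_000_000, word_bits=22, overhead_cycles=5)
    print(f"{rate:.0f} word times per second")
```

With a nominal 1 MHz clock, this gives roughly 37,000 word times per second, in the same range as the 36000 operations per second quoted above.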

Ad for the Arma Micro Computer (called the MICRO here). Source: Electronics, July 27, 1962.


The Micro Computer had a small instruction set of 19 instructions.3 It included multiply, divide, and square root, instructions that weren't implemented in early microprocessors. This illustrates how early microprocessors were a significant step backward in functionality. Moreover, the multiply, divide, and square root instructions used a separate arithmetic unit, so they could execute in parallel with other arithmetic instructions. Because the Micro Computer needed to interact with spacecraft systems, it had a focus on I/O, with 120 digital inputs or outputs, configured as needed for a particular mission.

Circuits

The Micro Computer was built from silicon transistors and diodes, using diode-transistor logic. The construction technique was somewhat unusual. The basic circuits were the flip-flop, the complementary buffer (i.e. an inverter), and the diode gate. Each basic circuit was constructed on a small wafer, 0.77 inches on a side.5 The photo below shows wafers for a two-transistor flip-flop and two diode gates. Each wafer had up to 16 connection tabs on the edges. These wafers are analogous to integrated circuits, but constructed from discrete components.

Three circuit modules from the Arma Micro Computer. Image from "The Arma Micro Computer for Space Applications".


The wafers were mounted on printed circuit boards, with up to 22 wafers on a board. Pairs of boards were mounted back to back with polyurethane foam between the boards to form a "sandwich", which was conformally coated. The result was a module that was protected against the harsh environment of a missile or spacecraft. The computer could handle a shock of 100 g's and temperatures of 0°C to 85°C as well as 100% humidity or a vacuum.

Because the Micro Computer was a serial machine, its bits were constantly moving. For register storage such as the accumulator, it used six magnetostrictive torsional delay lines, storing a sequence of bits as physical twists that formed pulses racing through a long coil of wire.

The photo below shows the Arma Micro Computer with the case removed. If you look closely, you can see the 22 small circuit wafers mounted on each printed circuit board. The memory driver boards and delay lines are towards the back, spaced more widely than the other printed circuit boards. The cable harness underneath the boards provides the connections between boards.4

Circuit boards inside the Arma Micro Computer. Photo courtesy of Daniel Plotnick.


Transfluxors

One of the most unusual parts of the Micro Computer was its storage. Computers at the time typically used magnetic core memory, with each bit stored in a tiny ferrite ring, magnetized either clockwise or counterclockwise to store a 0 or 1. One drawback of standard core memory was that the process of reading a core also cleared the core, requiring data to be written back after a read.

Diagram of Arma's memory system. From patent 3048828.


The Micro Computer used ferrite cores, but these were "two-aperture" cores, with a larger hole and a smaller hole, as shown above. Data is written to the "major aperture" and read from the "minor aperture". Although the minor aperture switches state and is erased during a read, the major aperture retains the bit, allowing the minor aperture to be switched back to its original state. Thus, unlike regular core memory, transfluxors don't lose their data when reading.

The resulting system is called non-destructive readout (NDRO), compared to the destructive readout (DRO) of regular core memory.6 The Micro Computer used non-destructive readout memory to ensure that the program memory remained uncorrupted. In contrast, if a program is stored in regular core memory, each instruction must be written back as it is executed, creating the possibility that a transient could corrupt the software. By using transfluxors, this possibility of error is eliminated. (In either case, core memory has the convenient property that data is preserved when power is removed, since data is stored magnetically. With modern semiconductor memory, you lose data when the power goes off.)
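The difference between destructive and non-destructive readout can be illustrated with a toy software model. This is only a conceptual sketch: the class names and the `write_back` flag are hypothetical, and the model ignores the actual magnetics and drive currents.

```python
class DestructiveCore:
    """Toy model of a regular core: sensing the bit clears it."""

    def __init__(self, bit):
        self.bit = bit

    def read(self, write_back=True):
        value = self.bit
        self.bit = 0                 # reading erases the stored bit
        if write_back:
            self.bit = value         # normal operation rewrites the bit
        return value


class Transfluxor:
    """Toy model of a two-aperture core with non-destructive readout."""

    def __init__(self, bit):
        self.major = bit             # written through the major aperture
        self.minor = bit             # sensed through the minor aperture

    def read(self):
        value = self.minor
        self.minor = 0               # sensing disturbs the minor aperture...
        self.minor = self.major      # ...which is restored from the major aperture
        return value


if __name__ == "__main__":
    core = DestructiveCore(1)
    core.read(write_back=False)      # simulate a write-back lost to a transient
    print("regular core after failed write-back:", core.read())

    cell = Transfluxor(1)
    cell.read()
    print("transfluxor after read:", cell.read())
```

In this model, a skipped write-back loses the bit in the regular core, while the transfluxor returns the same value on every read because the stored state never depends on a rewrite.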

The photo below shows a compact transfluxor-based storage module used in the Micro Computer, holding 512 words. In total, the computer could hold up to 7808 words of program memory and 256 words of data memory. It appears that transfluxors didn't live up to their promise, since most computers used regular core memory until semiconductor memory took over in the early 1970s.

Transfluxor-based core memory module from the Arma Micro Computer. Image from "The Arma Micro Computer for Space Applications".


Arma's history and the path to the Micro Computer

The Arma Engineering Company was founded in 1918 and built advanced military equipment.7 Its first product was a searchlight for the Navy, followed by a gyroscopic compass and analog computers for naval gun targeting. In 1939, Arma produced the Torpedo Data Computer, a remarkable electromechanical analog computer. US submarines used this computer to track target ships and automatically aim torpedoes. The Torpedo Data Computer performed complex trigonometric calculations and integration to account for the motion of the target ship and the submarine. While the Torpedo Data Computer performed well, the Navy's Mark 14 torpedo had many problems (running too deep, exploding too soon, or failing to explode), making torpedoes often ineffectual even with a perfect hit.

The Torpedo Data Computer Mark III in the USS Pampanito.


Arma underwent major corporate changes due to World War II. Before the war, the German-owned Bosch Company built vehicle starters and aircraft magnetos in the United States. When the US entered World War II in 1941, the government was concerned that a German-controlled company was manufacturing key military hardware, so the Office of Alien Property Custodian took over the Bosch plant. In 1948, the banking group that controlled Arma bought Bosch from the Office of the Alien Property Custodian, merging them into the American Bosch Arma Corporation (AMBAC).8 (Arma had earlier received the rights to gyrocompass technology from the German Anschutz company, seized by the Navy after World War I, so Arma benefitted twice from wartime government seizures.)

In the mid-1950s, Arma moved into digital computers, building an inertial guidance computer for the Atlas nuclear missile program. America's first ICBM was the Atlas missile, which became operational in 1959. The first Atlas missiles used radio guidance from the launch site to direct the missile. Since radio signals could be jammed by the enemy, this wasn't a robust solution.

The solution to missile guidance was an inertial navigation system. By using sensitive gyroscopes and accelerometers, a missile could continuously track its position and velocity without any external input, making it unjammable. A key developer of this system was Arma's Wen Tsing Chow, one of the driving forces behind digital aviation computers. He faced extreme skepticism in the 1950s for the idea of putting a computer in a missile. One general mocked him, asking "Where are you going to put the five Harvard professors you'll need to keep it running?" But computerized navigation was successful and in 1961, the Atlas missile was updated to use the Arma inertial guidance computer. It was said to be the first production airborne digital computer.9 Wen Tsing Chow also invented the programmable read-only memory (PROM), allowing missile targeting information to be programmed into a computer outside the factory.

Wen Tsing Chow, computer engineer, with Arma Micro Computer. From Control Engineering, January 1963, page 19. Courtesy of Daniel Plotnick.


The photo below shows the Atlas ICBM's guidance system. The Arma W-107A computer is at the top and the gyroscopes are in the middle. This computer was an 18-bit serial machine running at 143.36 kHz. It ran a hard-wired program that integrated the accelerometer information and solved equations for the crossrange error function, range error function, and gravity, making these computations every half second.10 The computer weighed 240 pounds and consumed 1000 watts. The computer contained about 36,000 components: discrete transistors, diodes, resistors, and capacitors mounted on 9.5" × 6.5" printed-circuit boards. On the ground, the computer was air-cooled to 55 °F, but there was no cooling after launch as the computer only operated for five minutes of powered flight and wouldn't overheat during that time.

Guidance system for Atlas ICBM. From "Atlas Inertial Guidance System" by John Heiderstadt. Photo unclassified in 1967.


The Atlas wasn't originally designed for a computerized guidance system so there wasn't room inside the missile for the computer. To get around this, a large pod was stuck on the side of the missile to hold the computer and gyroscopes, as indicated in the photo below. This doesn't look aerodynamic, but I guess it worked.

Atlas missile. Arrow indicates the pod containing the Arma guidance computer and inertial navigation system. Original photo by Robert DuHamel, CC BY-SA 3.0.


The Atlas guidance computer (left, below) consisted of three aluminum sections called "decks". The top deck held two replaceable target constant units, each providing 54 navigation constants that specified a target. The constants were stored in a stack of printed circuit boards 16" × 8" × 1.5", covered in over a thousand diodes, Wen Tsing Chow's PROM memory. A target was programmed into the stack by a rack of equipment that would selectively burn out diodes, changing the corresponding bit to a 1. (This is why programming a PROM is referred to as "burning the PROM".11) The diode matrix was later replaced with a transfluxor memory array, which had the advantage that it could be reprogrammed as necessary. The top deck also had connectors for the accelerometer inputs, the outputs, and connections for ground support equipment. The bottom deck had power connectors for 28 volts DC and 115V 400 Hz 3-phase AC. In the bottom deck, quartz delay lines were used for storage, representing bits as acoustic waves. Twelve circuit cards, each with a faceted quartz block four inches in diameter, provided a total of 32 words of storage.

Three generations of Arma Computers: the W-107A Atlas ICBM guidance computer,  the Lightweight Airborne Digital Computer, and the Arma Micro Computer (perhaps a prototype). Photo courtesy of Daniel Plotnick.

Arma considered the Micro Computer the third generation of its airborne computers. The first generation was the Atlas guidance computer, constructed from germanium transistors and diodes (in the pre-silicon era). The second-generation computer moved to silicon transistors and diodes. The third-generation computers still used discrete components, but mounted them on small square wafers. The third generation also had a general-purpose architecture and programmable transfluxor memory instead of a hard-wired program.

After the Micro Computer

Arma continued to develop computers, improving the Arma Micro Computer. The Micro C computer (1965) was developed for Navy ships and submarines. Much like the original Micro, the Micro C used transfluxor storage, but increased the clock frequency to 972 kHz. The computer was much larger: 3.87 cubic feet and 150 pounds. This description states that "the machine is an outgrowth of the ARMA product line of micro computers and is logically and electrically similar to micro-computers designed for missile environments."

Module from the Arma Micro-C Computer. Photo courtesy of Daniel Plotnick.

In mid-1966, Arma introduced the Micro D computer, built from TTL integrated circuits. Like the original Micro, this computer was serial, but the Micro D had a word length of 18 bits and ran at 1.5 MHz. It weighed 5.25 pounds and was very compact, just 0.09 ft³. Instead of transfluxors, the Micro D used regular magnetic core memory, 4K to 31K words.

The Arma Micro-D 1801 computer. The 1808 was a slightly larger model. Photo courtesy of Daniel Plotnick.

The widely-used Litton LTN-51 inertial navigation system was built around the Arma Micro-D computer.12 This navigation system was designed for commercial aircraft, but was also used for military applications, ships, and NASA aircraft. Aircraft from early Concordes to Air Force One used the LTN-51 for navigation. The photo below shows a navigation unit with the Arma Micro-D computer in the lower left and the gyroscope unit on the right.

Litton LTN-51 inertial navigation system.  Photo courtesy of pascal mz, concordescopia.com.

In early 1968, the Arma Portable Micro D was introduced, a 14-pound battery-powered computer also called the Celestial Data Processor. This handheld computer was designed for navigation in crewed earth orbital flight, determining orbital parameters from stadimeter and sextant measurements performed by astronauts. As far as I can tell, this computer never made it beyond the prototype stage.

The Arma Celestial Data Processor (source).

Conclusions

The Arma Micro Computer is just one of the dozens of compact aerospace computers of the 1960s, a category that is mostly forgotten and ignored. Another example is the Delco MAGIC I (1961), said to be the "first complete airborne computer to have its logic functions mechanized exclusively with integrated circuits". IBM's 4 Pi series started in 1966 and was used in many systems from the F-15 to the Space Shuttle. By 1968, denser MOS/LSI chips were used in general-purpose aerospace computers such as the Rockwell MOS GP and the Texas Instruments Model 2502 LSI Computer.13

Arma also illustrates that a company can be on the cutting edge of technology for decades and then suddenly go out of business and be forgotten. After some struggles, Arma was acquired by United Technologies in 1978 for $210 million and was then shut down in 1982. (The German Bosch corporation remains, now a large multinational known for products such as dishwashers, auto parts, and power tools.) Looking at a list of aerospace computers shows many innovative but vanished companies: Univac, Burroughs, Sperry (now all Unisys), AC Electronics (now part of Raytheon), Autonetics (acquired by Boeing), RCA (bought by GE), and TRW (acquired by Northrop Grumman).

Finally, the Micro Computer illustrates that terms such as "microcomputer" are not objective categories but are social constructs. At first, it seems obvious that the Arma Micro Computer is not a real microcomputer. If you consider a microcomputer to be a computer built around a microprocessor, that's true. (Although "microprocessor" is also not as clear as you might think.) But a microcomputer can also be defined as "A small computer that includes one or more input/output units and sufficient memory to execute instructions" (according to the IBM Dictionary of Computing, 1994)14 and the Arma Micro Computer meets that definition. The "microcomputer" is a shifting concept, changing from the 1960s to the 1990s to today.

For more, follow me on Twitter @kenshirriff or RSS for updates. I'm also on Mastodon as @kenshirriff@oldbytes.space. Thanks to Daniel Plotnick for providing a great deal of information and photos. Thanks to John Hartman for obtaining an obscure conference proceedings for me.

Notes and references

  1. I should mention the danger of "firsts" from a historical perspective. Historian Michael Williams advised "not to use the word 'first'" and said, "If you add enough adjectives to a description you can always claim your own favorite." (See ENIAC in Action, p7.)

    The first usage of "micro-computer" that I could find is from 1956. In Isaac Asimov's short story "The Dying Night", he mentions a "micro-computer" in passing: "In recent years, it [the handheld scanner] had become the hallmark of the scientist, much as the stethoscope was that of the physician and the micro-computer that of the statistician."

    Another interesting example of a "micro-computer" is the Texas Instruments Semiconductor Network Computer. This palm-sized computer is often considered the first integrated-circuit computer. It was an 11-bit serial computer running at 100 kHz, built out of RS flip-flops, NOR gates, and logic drivers. The 1961 article below described this computer as a "micro-computer", although this was a one-off use of the term, not the computer's name. This brochure describes the Semiconductor Network Computer in more detail and Semiconductor Networks are described in detail in this article. Unlike modern ICs, these integrated circuits used flying wires for internal connections rather than a deposited metal layer, making their design a dead end.

    The Texas Instruments Semiconductor Network Computer. From Computers and Automation, Dec. 1961.

     

  2. Most of the information on the Arma Micro Computer in this article is from "The Arma Micro Computer for Space Applications", by E. Keonjian and J. Marx, Spaceborne Computing Engineering Conference, 1962, pages 103-116. 

  3. The Arma Micro Computer's instruction set consisted of 19 22-bit instructions, shown below.

    Instruction set of the Arma Micro Computer. Figure from "The Arma Micro Computer for Space Applications".

     

  4. This block diagram shows the structure of the Micro Computer. The accumulator register (AC) is used for all data transfers as well as addition and subtraction. The multiply-divide register is used for multiplication, division, and square roots. The product register (PR), quotient register (QR), and square root register (SR) are used by the corresponding instructions. The data buffer register (S) holds data moving in or out of storage; it is shown with two 11-bit parts.

    Block diagram of the Arma Micro Computer. Figure from "The Arma Micro Computer for Space Applications".

    For control logic, the location counter (L) is the 13-bit program counter. For a subroutine call, the current address can be stored in the recall register (RR), which acts as a link register to hold the return address. (The RR is not shown on the diagram because it is held in memory.) Instruction decoding uses the instruction register (I), with the next instruction in the instruction buffer (B). The operand register (P) contains the 13-bit address from an instruction, while the remaining register (R) is used for I/O addressing. 

  5. Arma's original plan was to mount circuits on ceramic wafers. Resistors would be printed onto the wafer and wiring silk-screened. (This is similar to IBM's SLT modules (1964), although IBM mounted diodes and transistors as bare dies rather than components.) However, the Micro Computer ended up using epoxy-glass wafers with small, but discrete components: standard TO-46 transistors, "fly-speck" diodes, and 1/10 watt resistors. I don't see much advantage to these wafers over mounting the components directly on the printed-circuit board; maybe standardization is the benefit.

  6. The Micro Computer used an unusual mechanism to select a word to read or write. Most computers used a grid of selection wires; by energizing an X and a Y wire at the same time, the corresponding core was selected. The key idea of this "coincident-current" approach is that each wire has half the current necessary to flip a core, so the core with the energized X and Y wires will have enough current to flip. This puts tight constraints on the current level, since too much current will flip all the cores along the wire, but too little current will not flip the selected core. What makes this difficult is that the properties of a core change with temperature, so either the cores need to be temperature-stabilized or the current needs to be adjusted based on the temperature.

    The Micro Computer instead used a separate wire for each word, so as long as the current is large enough, the cores will flip. This approach avoids the issues with temperature sensitivity, an important concern for a computer that needs to handle the large temperature swings of a spacecraft, not an air-conditioned data center. Unfortunately, it requires much more wiring. Specifically, the large advantage of the coincident-current approach is that an N×N grid of wires lets you select N² words. With the Micro Computer approach, N wires only select N words, so the scalability is much worse.

    For more on Arma's memory systems, see patents: Memory Device, 3048828 and Multiaperture Core Memory Matrix, 3289181

  7. The capitalization of Arma vs. ARMA is inconsistent. It often appears in all-caps, but both forms are used, sometimes in the same article. "Arma" is not an acronym; the name came from the names of its founders: Arthur Davis and David Mahood (source: Between Human and Machine, p54). I suspect a 1960s corporate branding effort was responsible for the use of all-caps. 

  8. For more on the corporate history of Arma, see IRE Pulse, March 1958, p9-10. Details of corporate politics and what went wrong are here. More information on the financial ups and downs of Arma is in "Charles Perelle's Spacemanship", Fortune, January 1959, an article that focused on Charles Perelle, the president of American Bosch Arma. 

  9. Wikipedia says that Arma's guidance computer was "the first production airborne digital computer". However, the Hughes Digitair (1958) has also been called "the first airborne digital computer in actual production." Another source says the Arma computer was the "first all-solid-state, high-reliability, space-borne digital computer." The TRADIC (Transistorized Airborne Digital Computer) (1954) was earlier, but was a prototype system, not a production system. In turn, the TRADIC is said by some to be the first fully transistorized computer, but that depends on exactly how you interpret "fully".

    This is another example of how the "first" depends on the specific adjectives used. 

  10. The information on the Arma W-107A computer is from "Atlas Inertial Guidance System: As I Remember It" by Principal Engineer John Heiderstadt. 

  11. Chow Wen Tsing's PROM patent discusses the term "burning", explaining that it refers to burning out the diodes electrically. To widen the patent, he clarifies that "The term 'blowing out' or 'burning out' further includes any process which, by means less drastic than actual destruction of the non-linear elements, effects a change of the circuit impedance to a level which makes the particular circuit inoperative." This description prevented someone from trying to get around the patent by stating that nothing was really burning. 

  12. Details on the LTN-51 navigation system and its uses are in this document.

  13. For more information on early aerospace computers, see State-of-the-art of Aerospace Digital Computers (1967), updated as Trends in Aerospace Digital Computer Design (1969). Also see the 1970 Survey of Military CPUs. Efficient partitioning for the batch-fabricated fourth generation computer (1968) discusses how "The computer industry is on the verge of an upheaval" from new hardware including LSI and fast ROMs, and describes various LSI aerospace computers. 

  14. The "IBM Dictionary of Computing" (1994) has two definitions of "microcomputer": "(1) A digital computer whose processing unit consists of one or more microprocessors, and includes storage and input/output facilities. (2) A small computer that includes one or more input/output units and sufficient memory to execute instructions; for example a personal computer. The essential components of a microcomputer are often contained within a single enclosure." The latter definition was from an ISO/IEC draft standard for terminology so it is somewhat "official". 





[#] Sat Mar 23 2024 09:38:41 UTC from rss <>

Subject: The Intel 8088 processor's instruction prefetch circuitry: a look inside

In 1979, Intel introduced the 8088 microprocessor, a variant of the 16-bit 8086 processor. IBM's decision to use the 8088 processor in the IBM PC (1981) was a critical point in computer history, leading to the dominance of the x86 architecture that continues to the present.1 One way that the 8086 and 8088 increased performance was by prefetching: the processor fetches instructions from memory before they are needed, so the processor can execute them without waiting on the relatively slow memory. I've been reverse-engineering the 8088 from die photos and this blog post discusses what I've uncovered about the prefetch circuitry.

The die photo below shows the 8088 microprocessor under a microscope. The metal layer on top of the chip is visible, with the silicon and polysilicon mostly hidden underneath. Around the edges of the die, bond wires connect pads to the chip's 40 external pins. I've labeled the key functional blocks; this article focuses on the prefetch queue components highlighted in red. The components in purple also play a role, and will be discussed below. Architecturally, the chip is partitioned into a Bus Interface Unit (BIU) at the top and an Execution Unit (EU) below. The BIU handles memory accesses, while the EU executes instructions. In particular, the BIU fetches instructions, which are transferred from the prefetch queue to the Execution Unit via the queue bus.

The 8088 die under a microscope, with main functional blocks labeled. This photo shows the chip's single metal layer; the polysilicon and silicon are underneath. Click on this image (or any other) for a larger version.

The 8086 and 8088 processors present the same 16-bit architecture to the programmer. The key difference is that the 8088 has an 8-bit data bus for communication with memory and I/O, rather than the 16-bit bus of the 8086. The 8088's narrower bus reduced performance, since the processor only transfers one byte at a time rather than two. However, the 8-bit bus enabled cheaper computer hardware. The 8-bit bus was also a better match for hardware based on the older but popular 8-bit Intel 8080 and 8085 processors, allowing the reuse of 8-bit I/O circuitry for instance. Much of the IBM PC was based on the little-known IBM DataMaster, a computer built around the Intel 8085. Thus, selecting the 8088 processor was a natural choice for the IBM PC.

For the most part, the 8086 and 8088 are very similar internally, apart from trivial but numerous layout changes on the die. The biggest differences are in the Bus Interface Unit, the circuitry that communicates with memory and I/O devices, since this circuitry handles 16 bits in the 8086 versus 8 bits in the 8088. There are a few microcode differences between the two chips. One interesting change is that for performance reasons the 8088 has a smaller prefetch queue than the 8086 (four bytes instead of six). (I wrote about the 8086's prefetch circuitry earlier.)

Prefetching and the architecture of the 8086 and 8088

The 8086 and 8088 were introduced at an interesting point in microprocessor history, when memory was becoming slower than the CPU. For the first microprocessors, the speed of the CPU and the speed of memory were comparable.2 However, as processors became faster, the speed of memory failed to keep up. The 8086 was probably the first microprocessor to prefetch instructions to improve performance. While modern microprocessors have megabytes of fast cache3 to act as a buffer between the CPU and much slower main memory, the 8088 has just 4 bytes of prefetch queue. However, this was enough to substantially increase performance.

Prefetching had a major impact on the design of the 8086 and thus the 8088. Earlier processors such as the 6502, 8080, or Z80 were deterministic: the processor fetched an instruction, executed the instruction, and so forth. Memory accesses corresponded directly to instruction fetching and execution and instructions took a predictable number of clock cycles. This all changed with the introduction of the prefetch queue. Memory operations became unlinked from instruction execution since prefetches happen as needed and when the memory bus is available.

To handle memory operations and instruction execution independently, the implementors of the 8086 and 8088 divided the processors into two processing units: the Bus Interface Unit (BIU) that handles memory accesses, and the Execution Unit (EU) that executes instructions. The Bus Interface Unit contains the instruction prefetch queue; it supplies instructions to the Execution Unit via the Q (queue) bus. The BIU also contains an adder (Σ) for address calculation, adding the segment register base to an address offset, among other things. The Execution Unit is what comes to mind when you think of a processor: it has most of the registers, the arithmetic/logic unit (ALU), and the microcode that implements instructions. The segment registers (CS, DS, SS, ES) and the Instruction Pointer (IP) are in the Bus Interface Unit since they are directly involved in memory accesses, while the general-purpose registers are in the Execution Unit.

Block diagram of the 8088 processor.
This diagram differs from most 8088 block diagrams because it shows the actual physical implementation, rather than the programmer's view of the processor.
The "Internal Communication Registers" consist of the Indirect Register (IND) and the Operand Register (OPR). These hold a memory address and memory data value respectively.
From The 8086 Family User's Manual page 243.

It may seem inefficient for the Bus Interface Unit to have its own adder instead of using the ALU, but there are reasons for the separate adder. First, every memory access uses the adder at least once to add the segment base and offset. The adder is also used to increment the PC or index registers. Since these operations are so frequent, they would create a bottleneck if they used the ALU. Second, since the Execution Unit and the Bus Interface Unit run asynchronously with respect to each other, it would be complicated to share the ALU without conflicts.
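
To make the adder's main job concrete, here is a minimal sketch of the segment-plus-offset calculation. The function name is hypothetical, and the detail that the segment value is shifted left by four bits and the sum wraps at 20 bits comes from the standard 8086 architecture rather than from anything shown above; it models the arithmetic only, not the adder circuit.

```python
def physical_address(segment, offset):
    """Combine a 16-bit segment and a 16-bit offset into a 20-bit address."""
    # The segment register is multiplied by 16 (shifted left 4 bits) to form the base.
    base = (segment & 0xFFFF) << 4
    # The 8088 has 20 address lines, so the sum wraps around at 1 MB.
    return (base + (offset & 0xFFFF)) & 0xFFFFF


if __name__ == "__main__":
    # After reset, CS=0xFFFF and IP=0x0000, so the first fetch is from 0xFFFF0.
    print(hex(physical_address(0xFFFF, 0x0000)))
```

Every instruction fetch and data access goes through this calculation, which is why sharing the ALU for it would be a bottleneck.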

Prefetching had another major but little-known effect on the 8086 architecture: the designers were considering making the 8086 a two-chip microprocessor. Prefetching, however, required a one-chip design because the number of control signals required to synchronize prefetching across two chips exceeded the package pins available. This became a compelling argument for the one-chip design that was used for the 8086.4 (The unsuccessful Intel iAPX 432, which was under development at the same time, ended up being a two-chip processor: one to fetch and decode instructions, and one to execute them.)

Implementing the queue

The 8088's instruction prefetch queue is implemented with four 8-bit queue registers along with two hardware "pointers" into the queue. One two-bit counter keeps track of the current read position from 0 to 3, i.e. the queue register that will provide the next instruction byte. The second counter keeps track of the current write position, i.e. the queue register that will receive the next instruction byte from memory.5 As bytes are fetched from the queue, the read pointer advances. As bytes are added to the queue, the write pointer advances.

The diagram below shows an example queue configuration with two prefetched bytes. The middle two queue registers (Q1 and Q2) hold data. The read pointer indicates that the Execution Unit will get its next byte from Q1. The write pointer indicates that the next prefetched byte will go into Q3.

A queue configuration with two bytes in the prefetch queue. Bytes in blue hold prefetched data.

The diagram below shows how the queue pointers can wrap around. In this configuration, two more bytes have been written to the queue (Q3 and Q0), so the queue is full. The write pointer now points to Q1, the same as the read pointer.

A queue configuration with four bytes in the prefetch queue.

There is an important ambiguity, however. Suppose that four bytes are read from the queue, so the read pointer advances four positions, wrapping around back to Q1. The queue is now empty, as shown below, but the pointers have the same position as the full case above. Thus, if the read pointer and the write pointer both point to the same position, the queue may be empty or full. To distinguish these cases, a flip-flop is set if the queue enters the empty state. This flip-flop generates a signal that Intel called MT (empty).

A queue configuration with the queue empty.
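
The three configurations above can be reproduced with a small software model. This is only a sketch of the behavior described here, not of the circuitry: the class and method names are hypothetical, and where the real hardware would stall on an empty or full queue, the model raises an exception.

```python
class PrefetchQueue:
    """Toy model of a 4-byte circular queue with two pointers and an MT flag."""

    SIZE = 4

    def __init__(self):
        self.regs = [0] * self.SIZE  # queue registers Q0..Q3
        self.read_pos = 0            # two-bit read pointer
        self.write_pos = 0           # two-bit write pointer
        self.mt = True               # MT (empty) flip-flop

    def length(self):
        diff = (self.write_pos - self.read_pos) % self.SIZE
        if diff == 0:
            # Equal pointers are ambiguous; MT tells empty apart from full.
            return 0 if self.mt else self.SIZE
        return diff

    def push(self, byte):
        if self.length() == self.SIZE:
            raise OverflowError("queue full")  # hardware would not prefetch
        self.regs[self.write_pos] = byte & 0xFF
        self.write_pos = (self.write_pos + 1) % self.SIZE
        self.mt = False

    def pop(self):
        if self.mt:
            raise IndexError("queue empty")  # hardware would wait for a byte
        byte = self.regs[self.read_pos]
        self.read_pos = (self.read_pos + 1) % self.SIZE
        if self.read_pos == self.write_pos:
            self.mt = True  # the read pointer caught up with the write pointer
        return byte


if __name__ == "__main__":
    q = PrefetchQueue()
    for b in (0x10, 0x11, 0x12):
        q.push(b)
    q.pop()
    print(q.read_pos, q.write_pos, q.length())  # two bytes, in Q1 and Q2
```

Pushing two more bytes wraps the write pointer around to Q1 with a length of 4; popping four bytes then leaves both pointers at Q1 again, but with MT set and a length of 0.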

To determine how many bytes are in the queue, the queue circuitry uses a two-bit queue length value, along with the MT flip-flop value to distinguish the empty state. Conceptually, the queue length is generated by subtracting the read position from the write position. However, the implementation does not use a standard subtraction circuit, but instead uses hardcoded logic to determine the two bits of the length, as shown below.

The circuitry to determine the queue length.

The low bit of the length is the XOR of the low bits of the two positions. In NMOS logic (used by the 8088), an AND-NOR gate is easy to implement, while an XOR gate is difficult. Thus, XOR is implemented as shown in the top circuit. (You can verify that if one input is 1 and the other is 0, the output is 1.) The high-order bit of the length is also based on an AND-NOR gate, one with six inputs. Each input is a combination of read and write positions that yields an output bit 1; each input is computed by a NOR gate, which I haven't drawn.6 As a result, the amount of logic circuitry to compute the length is fairly large.
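
As a sanity check on the idea of hardcoded length logic, the sketch below computes both length bits from the two-bit positions without a subtractor and can be compared against (write - read) mod 4. The six product terms are my own derivation from two-bit subtraction, so they are an assumption and may not match the terms wired on the die; the gate polarity of the real circuit is not modeled, and the function name is hypothetical.

```python
def length_bits(read_pos, write_pos):
    """Return (high, low) bits of the queue length from two 2-bit positions."""
    r1, r0 = (read_pos >> 1) & 1, read_pos & 1
    w1, w0 = (write_pos >> 1) & 1, write_pos & 1

    def inv(x):
        return x ^ 1

    # Low bit: XOR built AND-NOR style, as the NOR of "both 1" and "both 0".
    low = inv((r0 & w0) | (inv(r0) & inv(w0)))

    # High bit: 1 if any of six position combinations matches (assumed terms).
    terms = [
        w1 & inv(r1) & w0,
        w1 & inv(r1) & inv(r0),
        inv(w1) & r1 & w0,
        inv(w1) & r1 & inv(r0),
        w1 & r1 & inv(w0) & r0,
        inv(w1) & inv(r1) & inv(w0) & r0,
    ]
    high = 0
    for term in terms:
        high |= term
    return high, low


if __name__ == "__main__":
    for r in range(4):
        for w in range(4):
            high, low = length_bits(r, w)
            print(r, w, (high << 1) | low, (w - r) % 4)
```

Note that a result of 0 here covers both the empty and the full queue, which is why the MT flip-flop is needed alongside the two length bits.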

The diagram below zooms in on the queue control circuitry on the die, with the main flip-flops and circuitry labeled. The circuitry in the middle computes the queue length with the 6-input NOR gate stretched across the whole region. The flip-flops for the read and write positions are in the lower region. Despite the relative simplicity of the queue circuits, they take up a substantial part of the die. Compared to modern chips, the density of the 8088 is very low; you can almost see the flip-flops with the naked eye. But this isn't all the circuitry as prefetching also required queue registers and memory cycle control circuitry. Thus, prefetching was a moderately expensive feature for the 8088, as far as die area.

The queue and prefetch circuitry on the die. The metal layer has been removed for the closeup to show the silicon of the underlying transistors.

The loader

To decode and execute an instruction, the Execution Unit must get instruction bytes from the Bus Interface Unit, but this is not entirely straightforward. The main problem is that the queue can be empty, in which case instruction decoding must block until a byte is available from the queue. The second problem is that instruction decoding is relatively slow so it is pipelined. For maximum performance, the decoder needs a new byte before the current instruction is finished. A circuit called the "loader" solves these problems by providing synchronization between the prefetch queue and the instruction decoder. The loader uses a small state machine to efficiently fetch bytes from the queue at the right time and to provide timing signals to the decoder and microcode engine.

In more detail, as the loader requests the first two instruction bytes from the prefetch queue, it generates two timing signals that control the microcode execution. The FC (First Clock) indicates that the first instruction byte is available, while the SC (Second Clock) indicates the second instruction byte. Note that the First Clock and Second Clock are not necessarily consecutive clock cycles because the prefetch queue could be empty or contain just one byte, in which case the First Clock and/or Second Clock would be delayed. The instruction decoding circuitry and the microcode engine are controlled by the First Clock and Second Clock signals, so they remain synchronized with the bytes supplied by the prefetch queue.

At the end of a microcode sequence, the Run Next Instruction (RNI) micro-operation causes the loader to fetch the next machine instruction. However, fetching and decoding the next instruction is a bit slow so microcode execution would be blocked for a cycle. In many cases, this slowdown can be avoided: if the microcode knows that it is one micro-instruction away from finishing, it issues a Next-to-last (NXT) micro-operation so the loader can start loading the next instruction. This achieves a degree of pipelining in most cases; fetching the next instruction is overlapped with finishing the execution of the previous instruction.

The state machine for the 8086/8088 "loader" circuit.
The 1BL signal indicates a 1-byte instruction implemented in logic rather than microcode.
From patent US4449184.

The diagram above shows the state machine for the loader. I won't explain it in detail, but essentially it keeps track of whether it is waiting for a First Clock byte or a Second Clock byte, and if it is performing a fetch in advance (NXT) or at the end of an instruction (RNI). The state machine is implemented with two flip-flops to support its four states.

Microcode and the prefetch queue

The loader takes care of fetching an instruction that consists of an opcode byte and a Mod R/M (addressing mode) byte. However, many instructions have additional bytes or don't follow this format. For example, an opcode such as "ADD AX" can be followed by an 8- or 16-bit immediate value, adding that value to the AX register. Or a "move memory to AX" instruction can be followed by a 16-bit memory address. The microcode uses a separate mechanism for fetching these instruction bytes from the queue. Specifically, each micro-instruction contains a source register and a destination register that specify a data move. By specifying "Q" (the queue) as the source, a byte is fetched from the prefetch queue. If the queue is empty, microcode execution blocks until the Bus Interface Unit loads a byte into the prefetch queue. Thus, the complexity of instruction fetching and the prefetch queue is invisible to the microcode.7

A jump, subroutine call, or other control flow change causes the prefetch queue to be flushed since the queue contents are no longer useful. This is accomplished in microcode with the FLUSH micro-instruction, which resets the queue read and write pointers and sets the MT (empty) flip-flop. Note that the queue is flushed even if the target address is in the queue, for example if you jump one byte ahead.

One complication due to the prefetch queue is that the processor's Instruction Pointer points to the next instruction to be fetched, not the next instruction to be executed. This becomes a problem for a subroutine call, which needs to push the return address. It is also a problem for a relative jump, which is computed from the current instruction. The solution is the CORR micro-instruction, which corrects the Instruction Pointer by subtracting the queue length to determine the current execution position. This is implemented by the Bus Interface Unit, which holds correction constants in the Constant ROM, and subtracts them using the address adder (not the ALU).8
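
The effect of the correction can be shown with simple arithmetic. This is a sketch of the result only: the function name is hypothetical, and it uses an ordinary subtraction where the hardware adds a constant from the Constant ROM using the address adder.

```python
def corrected_ip(fetch_ip, queue_length):
    """Address of the next byte to execute, given the prefetch address."""
    # Bytes still sitting in the queue were fetched but not yet consumed,
    # so back them out. The Instruction Pointer is 16 bits, so wrap at 64K.
    return (fetch_ip - queue_length) & 0xFFFF


if __name__ == "__main__":
    # Prefetching has reached 0x0105 with three bytes still queued,
    # so execution is really at 0x0102.
    print(hex(corrected_ip(0x0105, 3)))
```

A subroutine call would push this corrected value as the return address, and a relative jump would add its displacement to it.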

The queue registers

The 8086 and 8088 partition the registers into upper registers (in the Bus Interface Unit) and lower registers (in the Execution Unit). The upper registers are the registers associated with memory accesses (e.g. Instruction Pointer, segment registers) while the lower registers are more general purpose (e.g. AX, BX, SI, SP). The upper registers are connected to two 16-bit internal buses: the B bus and the C bus.

The queue registers are physically part of the upper registers, but are wired into the buses slightly differently, as shown below. In particular, the 8088's queue registers are written 8 bits at a time from the C bus. (In contrast, the 8086's queue registers can be written 16 bits at a time to support two-byte prefetches.) When accessing the queue, the queue registers are read 16 bits at a time, but only one byte is transferred to the Q bus for instruction processing.9

The queue registers in the 8088.

The diagram below shows how the queue registers appear on the die, comparing the six-byte prefetch queue in the 8086 (top) to the four-byte 8088 queue (bottom). The 8086 prefetch registers are structured as three rows of 16-bit registers, while the 8088 prefetch registers are structured as four rows of 8-bit registers. In both cases, each bit is stored in a cross-coupled pair of inverters. The bit lines (not present) are vertical, while the control lines to select a register are horizontal. The layout is different between the processors to support 16-bit versus 8-bit writes. Note the empty space at the bottom of the 8088 registers. Because the rest of the chips are mostly the same, the 8088 couldn't be "compacted" to avoid this wasted space.

The prefetch registers in the 8086 (top) and 8088 (bottom). For the 8086, the metal and polysilicon layers were removed, exposing the underlying silicon. For the 8088, the polysilicon and silicon are visible.

Intel used simulations to determine the best queue sizes for the 8086 and 8088, balancing the performance cost of prefetching against the benefit. (The cost is that prefetching makes the bus unavailable for other memory or I/O operations.) The prefetch queue is discarded on a jump instruction or other change of control flow, causing the prefetched bytes to be wasted. Thus, as the queue gets longer, the chance of discarding a prefetched byte becomes larger, so the potential benefit of prefetching becomes smaller. Since the 8088 prefetches one byte at a time, compared to two bytes at a time on the 8086, prefetching on the 8088 costs twice as much as on the 8086 in terms of bus cycles used per byte. This changes the tradeoffs in favor of a shorter queue.
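
The "twice as much" figure follows from simple arithmetic. The sketch below assumes a four-clock bus cycle with no wait states; the function name is hypothetical.

```python
def clocks_per_prefetched_byte(bus_width_bytes, clocks_per_bus_cycle=4):
    """Bus clocks spent per byte brought into the prefetch queue."""
    # Assumption: every prefetch is a full bus cycle with no wait states.
    return clocks_per_bus_cycle / bus_width_bytes


cost_8088 = clocks_per_prefetched_byte(1)  # one byte per bus cycle
cost_8086 = clocks_per_prefetched_byte(2)  # two bytes per bus cycle
```

Under these assumptions the 8088 spends 4 clocks of bus time per prefetched byte and the 8086 spends 2, so a discarded byte is twice as expensive on the 8088.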

Because of the difference in queue lengths, the queue control circuitry is different between the 8086 and 8088. In particular, the 8086 needs three-bit counters for the read and write positions, while the 8088 uses two-bit counters. Because of this, the length computation circuitry is also different between the processors.
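
To make the counter widths concrete, here is a minimal sketch of a circular queue with read and write position counters. It is a software analogy, not the chip's logic: the class is hypothetical, and the `empty` flag stands in for the queue-empty (MT) flip-flop mentioned in the notes.

```python
class PrefetchQueue:
    """Circular queue tracked by read/write position counters."""

    def __init__(self, size):
        self.size = size        # 6 for the 8086, 4 for the 8088
        self.read_pos = 0
        self.write_pos = 0
        self.empty = True       # stand-in for the MT flip-flop

    def counter_bits(self):
        # Bits needed to count positions 0 .. size-1.
        return (self.size - 1).bit_length()

    def length(self):
        if self.empty:
            return 0
        diff = (self.write_pos - self.read_pos) % self.size
        # Equal positions with the empty flag clear means the queue is full.
        return diff if diff else self.size

    def push(self):
        if self.length() == self.size:
            raise OverflowError("queue full")
        self.write_pos = (self.write_pos + 1) % self.size
        self.empty = False

    def pop(self):
        if self.empty:
            raise IndexError("queue empty")
        self.read_pos = (self.read_pos + 1) % self.size
        self.empty = self.read_pos == self.write_pos
```

A six-entry queue needs three counter bits and a four-entry queue needs two, which is why the length computation cannot be shared unchanged between the two chips.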

I plan to continue reverse-engineering the 8088 die so follow me on Twitter @kenshirriff or RSS for updates. I've also started experimenting with Mastodon recently as @oldbytes.space@kenshirriff. If you're interested in the 8086, I wrote about the 8086 die, its die shrink process and the 8086 registers earlier.

Notes and references

  1. Whenever I mention x86's domination of the computing market, people bring up ARM, but ARM has a lot more market share in people's minds than in actual numbers. One research firm says that ARM has 15% of the laptop market share in 2023, expected to increase to 25% by 2027. (Surprisingly, Apple only has 90% of the ARM laptop market.) In the server market, just an estimated 8% of CPU shipments in 2023 were ARM. See Arm-based PCs to Nearly Double Market Share by 2027 and Digitimes. (Of course, mobile phones are almost entirely ARM.) 

  2. Steve Furber, co-creator of the ARM chip, mentions that "The first integrated CPUs were coincidentally quite well matched to semiconductor memory speeds, and were therefore built without caches. This can now be seen as a temporary aberration." See VLSI Risc Architecture and Organization p77. To make this concrete, the Apple II (1977) used a MOS 6502 processor running at about 1 megahertz while its 4116 DRAM chips could perform an access in 250 nanoseconds (4 times the clock speed). The 8088 processor ran at 5-10 MHz which meant that 250 ns DRAM chips were slower than the clock speed. Nowadays, processors run at 4 GHz but DRAM access speed is about 50 nanoseconds (1/200 the clock speed). 

  3. Modern processors use caches to improve memory performance. Accessing data from a cache is faster than accessing it from main memory, but the tradeoff is that caches are much smaller than main memory. The prefetch queues in the 8086 and 8088 are similar to a cache in some ways, but there are some key differences. First, the prefetch queue is strictly sequential. If you jump ahead two bytes, even if the prefetch queue has those instruction bytes, the processor can't use them. Second, the prefetch queue can't reuse bytes. If you have a 6-byte loop, even though all the code fits in the prefetch queue, it will be reloaded every time. Third, the prefetch queue doesn't provide any consistency. If you modify an instruction in memory a couple of bytes ahead of the PC, the 8086 or 8088 will run the old instruction if it's in the queue. 

  4. The design decisions for the 8086 prefetch cache (and many other aspects of the chip) are described in: J. McKevitt and J. Bayliss, "New options from big chips," in IEEE Spectrum, vol. 16, no. 3, pp. 28-34, March 1979, doi: 10.1109/MSPEC.1979.6367944. Prefetch provided a 50% performance benefit to the 8086. 

  5. The queue read process doesn't use an explicit read operation. Instead, the selected queue register continuously puts its value onto the queue bus. When the Execution Unit uses this byte, it sends an increment signal to the queue to advance the read pointer. If the queue empty (MT) flip-flop is set, the Execution Unit will wait until a byte is ready. 

  6. The NOR gates are used as AND gates, following DeMorgan's laws. For example, to produce a 1 output for write position 00 and read position 01, the desired logic is AND(write bit 1', write bit 0', read bit 1', read bit 0), which is implemented as NOR(write bit 1, write bit 0, read bit 1, read bit 0'). Note that the bits into the NOR gate are all inverted from the "desired" values; if they are all 0, the NOR output is 1. Thus, there are also some inverters on the inputs. 

  7. Arbitrary memory reads and writes are performed directly on memory, bypassing the prefetch queue. The 8086/8088 do not provide consistency; if you modify an instruction byte in memory and the byte is in the queue, the processor will execute the old byte. (This type of self-modifying code can be used to determine the queue length, distinguishing the 8086 from the 8088 in software.) 

  8. The Constant ROM is used for more than just address correction. For example, it is also used to increment the Instruction Pointer after a prefetch. Other constants are used for the 8088's string operations, which act on a block of memory. The index registers are incremented or decremented by 1 for bytes or 2 for words. When popping a value from the stack, the stack pointer is incremented using the Constant ROM. 

  9. Are the 8088's queue registers 16 bits wide or 8 bits wide? It's ambiguous, since the registers are written 8 bits at a time, but read 16 bits at a time. This implementation was probably selected to support the 8088's 8-bit bus while reusing as much of the 8086 design as possible. In particular, the 8088 can only prefetch one byte at a time, so writes need to happen a byte at a time. Thus, there are four control lines selecting which queue byte is written. (The 8088 could write to half of a 16-bit register but that would require moving the prefetched byte to the correct half of a 16-bit bus.) On the read side, it would make sense to have four read lines, selecting one byte from the 8088's queue. However, since the 8086 already had a multiplexer to select one byte from two, the 8088 designers probably felt it was easier to keep that circuit. And with the smaller queue on the 8088, there was no need to try to save space by removing the circuit. Thus, the queue has two read-select lines and a multiplexer control line. All these lines are controlled by the write position and read position flip-flops. 





[#] Sat Mar 30 2024 10:28:08 UTC from rss <>

Subject: Inside an unusual 7400-series chip implemented with a gate array

When I look inside a chip from the popular 7400 series, I know what to expect: a fairly simple die, implemented in a straightforward, cost-effective way. However, when I looked inside a military-grade chip built by Integrated Device Technology (IDT)4 I found a very unexpected layout: over 1500 transistors in an orderly matrix. Even stranger, most of the die is wasted: less than 20% of these transistors are used, forming scattered circuits connected by thin metal wires.

In this blog post, I look at this chip in detail, describe its gates, and explain how it implements the "1-of-4" decoder function. I also discuss why it sometimes makes sense to build chips with a gate array design such as this, despite the inefficiency.

A photo of the tiny silicon die in its package. This chip is the IDT 54FCT139ALB dual 1-of-4 decoder. Click this image (or any other) for a larger version.

In the photo below, you can see the silicon die in more detail, with the silicon appearing pink. The main circuitry is implemented in the nine rows that form the gate array, a grid of 1584 transistors. The tiny dark rectangles are transistors of two types, NMOS and PMOS, that work together to implement CMOS logic circuits. At this scale, the metal wiring is visible as faint gray lines and smudges, but most of the transistors are unconnected. Surrounding the gate array are 22 input/output (I/O) blocks each with a square bond pad. As with the transistors, many of these I/O blocks are unused. Fourteen of these bond pads have tiny metal bond wires (the thick black lines) that connect the silicon die to the chip's external pins. Finally, the pairs of bond wires at the center left and center right provide ground and power connections for the chip.

Closeup die photo.

The photo below zooms in on three rows of circuitry in the chip. The large dark rectangles are pairs of transistors, with two lines of transistors in each row of circuitry. At the top and bottom of each row, the thick horizontal white lines are metal wiring that provides power and ground. In each row, one line of transistors holds PMOS transistors, next to the power wiring, while the other line holds NMOS transistors, next to the ground wiring. (The orientation flips in each successive row, so it isn't obvious which transistors are which unless you check the power connections at the end of the row.)

A closeup of the die.

The transistors are wired into gates by the metal layers, the white lines. The gates are connected by horizontal and vertical wiring using the wiring channels between the rows. This wiring style is very similar to standard-cell logic. However, unlike standard-cell logic, the underlying transistor grid is fixed, resulting in wasted transistors. In the image above, most of the transistors in the middle row are used, while the top row is unused and the bottom row is mostly unused.

The diagram below shows the structure of one of the transistor blocks, which contains two tall, thin MOS transistors. The vertical metal contacts connect to the sources and drains of the transistors, with the two transistors sharing the middle contact. (On an integrated circuit, the source and drain of a transistor are identical, so it is arbitrary which side is the source and which is the drain.) The short horizontal metal contacts at the top connect to the gates of the two transistors; the gates are made of polysilicon, which is barely visible in the die photo. The gates partition the active silicon (green), forming the transistors. The gate width is approximately 1 µm.

A block of two transistors as they appear on the die, along with a diagram showing the structure. The bar indicates a length of 10 µm.

NAND gate

In this section, I'll explain the construction of one of the NAND gates on the die. The NAND gate below uses four transistors, two NMOS transistors on the top and two PMOS transistors on the bottom. The white lines are the metal wiring, forming two layers. Most of the wiring (including power and ground) is in the lower (M1) layer. The slightly wider and darker vertical segments are the upper (M2) layer. The circles connect the metal layers when they join, or connect the metal layer to the underlying silicon or polysilicon. With two metal layers, it's a bit tricky to see how the wiring is connected. The A and B inputs each connect to two transistor gates. The transistor group at the top is connected to ground on the right, with the output on the left. The transistor group on the bottom is connected to Vcc on the left and right, with the output in the middle. This has the effect of putting the upper transistors in series and the lower transistors in parallel.

A NAND gate on the die.

Below, I've drawn the schematic of the NAND gate. On the left, the layout of the schematic matches the die layout above. On the right, I've redrawn the schematic with a more traditional layout. To understand its operation, note that a PMOS transistor (top on the right schematic) turns on when the input is low, while an NMOS transistor (bottom on the right) turns on when the input is high. When both inputs are high, the two NMOS transistors turn on, connecting ground to the output, pulling it low. When either input is low, one of the PMOS transistors turns on, pulling the output high. Thus, the circuit implements the NAND function. The NMOS and PMOS transistors operate in a complementary fashion, giving CMOS (Complementary MOS) its name.

Schematic of a NAND gate.

NOR gate

In this section, I'll explain the layout of one of the NOR gates on the die, shown below. This gate is twice as large as the previous NAND gate so it can provide twice the output current.1 The NOR gate uses eight transistors, four PMOS transistors in the upper half and four NMOS transistors in the lower half. (Note that Vcc and ground are flipped compared to the previous gate, as are the NMOS and PMOS transistors.) The two transistors in each block are wired in parallel to produce more current for the output. (A out is the same signal as A in, exiting the block at the top to connect to other circuitry.)

A NOR gate on the die.

The schematic below shows the wiring of the eight transistors. The schematic layout corresponds to the physical layout to make it easier to map between the image and the schematic. The upper transistor groups are wired in series, while the lower transistor groups are wired in parallel.

Schematic corresponding to the gate above.

The schematic below has been redrawn to make the functionality clearer, and the parallel transistors have been removed. If either input is high, one of the NMOS transistors on the bottom will turn on and pull the output low. If both inputs are low, the two PMOS transistors will turn on and pull the output high. This provides the desired NOR function.

Simplified NOR gate schematic.

Note that the NAND and NOR gates have similar but opposite schematics. In the NAND gate, the NMOS transistors are in series while the PMOS transistors are in parallel. In the NOR gate, the roles of the transistors are swapped.
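
The series/parallel duality can be checked with a switch-level model. This is a minimal sketch with hypothetical helper names: each transistor is treated as an ideal switch, and a gate is a pull-down network of NMOS switches plus a pull-up network of PMOS switches.

```python
def series(*switches):
    """A series chain conducts only if every switch conducts."""
    return all(switches)


def parallel(*switches):
    """A parallel group conducts if any switch conducts."""
    return any(switches)


def cmos_gate(a, b, nmos_network, pmos_network):
    nmos_on = [bool(a), bool(b)]   # NMOS conducts when its input is high
    pmos_on = [not a, not b]       # PMOS conducts when its input is low
    pull_down = nmos_network(*nmos_on)   # path from output to ground
    pull_up = pmos_network(*pmos_on)     # path from output to Vcc
    if pull_up == pull_down:
        raise ValueError("output would float or short Vcc to ground")
    return int(pull_up)


def nand(a, b):
    return cmos_gate(a, b, nmos_network=series, pmos_network=parallel)


def nor(a, b):
    return cmos_gate(a, b, nmos_network=parallel, pmos_network=series)
```

Swapping the two network arguments is the only difference between `nand` and `nor`, and for every input combination exactly one of the two networks conducts, which is the complementary behavior that gives CMOS its name.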

The chip's circuit

The chip I examined is a "dual 1-of-4 decoder with enable".2 The decoding function takes a two-bit input and selects one of four output lines depending on the binary value. The enable line must be low to activate this operation; otherwise all four output lines are disabled. The chip contains two of these decoders, which is why it is called a dual decoder. In total, the chip contains 18 logic gates,3 so it is very simple, even by 1990s standards.

I reverse-engineered the chip and created the schematic below, showing one of the dual units. Each NAND gate matches one of the four input possibilities to drive one of the four outputs. The NOR gates support the ENABLE signal, blocking the outputs unless ENABLE is active (i.e. low).

Reverse-engineered schematic of half the chip.
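
Behaviorally, one half of the chip can be sketched as follows. The function name is hypothetical, and the sketch assumes the outputs are active-low (the selected output goes low and disabled outputs sit high), which is the usual convention for '139-type decoders; it models the function, not the gate-level implementation.

```python
def decode_1_of_4(a1, a0, enable_n):
    """Return outputs (O0, O1, O2, O3) for one decoder half.

    Assumption: outputs are active-low, so the selected line is 0.
    """
    if enable_n:                  # enable is active-low; a high level disables
        return (1, 1, 1, 1)
    selected = ((a1 & 1) << 1) | (a0 & 1)
    return tuple(0 if i == selected else 1 for i in range(4))
```

For example, `decode_1_of_4(1, 0, 0)` selects output 2, while any call with `enable_n=1` leaves all four outputs inactive.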

The chip uses a general-purpose I/O block (below) for each pin that can be used as an input or an output depending on how it is wired. Each block contains two large drive transistors: an NMOS transistor to pull the output low and a PMOS transistor to pull the output high. The I/O block has separate control lines for the two output transistors. (At the bottom of the image below, two thin metal wires drive the high-side and low-side transistors.) This permits tri-state logic: if neither transistor is energized, the output is left floating. The gate array drives the output transistors with high-current inverters, each constructed from multiple transistors in parallel. (This is why the schematic shows more inverters than may seem necessary.)

One of the 22 I/O blocks on the die. Each I/O block is associated with a bond pad, where a bond wire can be connected to an external pin.

When used as an input, the pad is wired to the surrounding circuitry slightly differently, connecting to input protection diodes (not shown on the schematic). Thus, the functionality of the I/O blocks can be changed by modifying the metal layers, without changing the underlying silicon.
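
The effect of having separate control lines for the two drive transistors can be sketched as a three-valued output. The function name and the "Z" notation for a floating pin are illustrative choices, not taken from the datasheet.

```python
def output_pad(pull_up_on, pull_down_on):
    """State of an output pin given which drive transistor is turned on."""
    if pull_up_on and pull_down_on:
        # Both transistors on would connect Vcc directly to ground.
        raise ValueError("both drive transistors on: Vcc shorted to ground")
    if pull_up_on:
        return "1"   # PMOS transistor pulls the pin high
    if pull_down_on:
        return "0"   # NMOS transistor pulls the pin low
    return "Z"       # neither transistor on: the pin floats (tri-state)
```

With a single shared control line, only "1" and "0" would be reachable; the second line is what makes the floating state possible.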

Some 7400-series history

The earliest logic integrated circuits used resistors and transistors internally, so they were called RTL (Resistor Transistor Logic), but RTL had significant performance problems. RTL was rapidly replaced by Diode Transistor Logic (DTL) and then Transistor Transistor Logic (TTL). In 1964, Texas Instruments created a line of TTL integrated circuits for military applications called the SN5400 series. This was shortly followed by the commercial-grade SN7400 series.

The 7400 series of integrated circuits was inexpensive, fast, and easy to use. The line started with simple logic circuits such as four NAND gates on a chip, and moved into more complex chips such as counters, shift registers, and ALUs. The 7400 series became very popular in the 1970s and 1980s, used by electronics hobbyists and high-performance minicomputers alike. These chips became essential building blocks and "glue" logic for microcomputers, heavily used in the Apple II for instance.

The original 7400 series branched into dozens of families with different performance characteristics but the same functionality. The 74LS (low-power Schottky) family, for instance, became very popular as it both improved speed and reduced power consumption. In the mid-1970s, 7400-series chips were introduced that used CMOS circuitry instead of TTL for dramatically lower power consumption. This CMOS family, the 74C series, was followed by numerous other CMOS families.

That brings us to the chip I examined, a member of IDT's 74FCT (Fast CMOS TTL-compatible) line of chips, introduced in the mid-1980s. (Specifically, it is in the 54FCT family because it handles a wider temperature range.) These chips used advanced CMOS technology to provide high speed, low power consumption, and as a military option, radiation tolerance.

Conclusions

Why would you make a chip in this inefficient way, using a gate array that wastes most of the die area? The motivation is that most of the design cost can be shared across many different part types. Each step of integrated circuit processing requires an expensive mask for photolithography. With a gate array, all chip types use the same underlying silicon and transistors, with custom masks just for the two metal layers. In comparison, a fully custom chip might require eight custom masks, which costs much more. The tradeoff is that gate array chips are larger so the manufacturing cost is higher per device.5 Thus, a gate array design is better when selling chips in relatively small quantities, while a custom design is cheaper when mass-producing chips.6 IDT focused on the high-performance and military market rather than the commodity chip market, so gate arrays were a good fit.
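
The tradeoff amounts to a break-even calculation. In the sketch below, the mask counts (two versus eight) come from the paragraph above, but the function is hypothetical and the dollar figures in the example are invented purely for illustration; they are not real IDT costs.

```python
import math


def break_even_quantity(custom_masks, gate_array_masks, mask_cost,
                        custom_unit_cost, gate_array_unit_cost):
    """Quantity at which a full-custom chip becomes as cheap as a gate array."""
    extra_fixed = (custom_masks - gate_array_masks) * mask_cost
    extra_per_unit = gate_array_unit_cost - custom_unit_cost
    if extra_per_unit <= 0:
        return None  # the gate array is never more expensive per unit
    return math.ceil(extra_fixed / extra_per_unit)


# Hypothetical numbers: $10,000 per mask, $1.00 vs $1.50 per die.
quantity = break_even_quantity(8, 2, 10_000, 1.00, 1.50)
```

With these made-up inputs the break-even point is 120,000 units: below that volume the gate array's lower mask cost wins, and above it the smaller custom die wins.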

One last thing. The packaging of this chip is very interesting since it is mounted on a multi-chip module. The module also contains two Atmel EEPROMs. Presumably the decoder chip decodes address bits to select one of the EEPROMs.

The multi-chip module containing the decoder chip along with an AT28HC64 EEPROM on either side.

Thanks to Don S. for providing the chip. Follow me on Twitter @kenshirriff or RSS for updates. I've also started experimenting with Mastodon recently as @oldbytes.space@kenshirriff.

Notes and references

  1. Properly sizing the transistors in a gate is important for performance. Since the transistors in the gate array are all the same size, multiple transistors are used in parallel to get the desired current. The 1999 book Logical Effort describes a methodology for maximizing the performance of CMOS circuits by correctly sizing the transistors. 

  2. The part number is "IDT 54FCT139ALB". "54" indicates the chip operates under an enhanced temperature range of -55°C to +125°C. The "A" indicates the chip is 35% faster than the base series (but not as fast as "C"). "L" indicates the chip is packaged in a leadless chip carrier, the square package shown at the top of the article. Finally, "B" indicates the chip was tested according to military standards: MIL-STD-883, Class B. 

  3. The chip contains 18 logic gates according to the functional schematic in the datasheet (below). The implementation actually uses 52 logic gates by my count (2×26) because the implementation doesn't exactly match the schematic. In particular, the datasheet shows three-input NAND gates, but the chip uses a NAND gate and a NOR gate along with inverters. The chip also has additional inverters to drive the output transistors in each I/O block.

    Schematic of the chip from the datasheet.

     

  4. Integrated Device Technology was a spinoff from Hewlett Packard that started in 1980. IDT built advanced CMOS chips including fast static RAM and microprocessors (bit-slice and MIPS). It became part of Renesas in 2019. A very detailed 1986 profile of IDT is here. IDT's logo is pretty cool, combining a chip wafer and calculus.

    The logo of Integrated Device Technology.

    Here's how the logo looks on the die:

    Closeup of the die showing the IDT logo.

    The die also has the initials of the designers, along with some mysterious symbols. One looks like the Chinese character "正".

    Closeups of two parts of the die.

  5. Integrated circuit manufacturing is partitioned into the "front end of line", where the transistors are created on the silicon wafer, and the "back end of line", where the metal wiring is put on top to connect the transistors. With a gate array construction, the front end of line steps create generic gate array wafers. The back end of line steps then connect the transistors as desired for a particular component. The gate array wafers can be produced in large quantities and stored, and then customized for specific products in smaller quantities as needed. This reduces the time to supply a particular chip type since only the back end of line process needs to take place. 

  6. The IDT High-Speed CMOS Logic Design Guide briefly mentions the gate array design. The FCT family was built from two sizes of gate arrays, "4004" for smaller chips and "8000" for larger chips. Later, IDT shrunk the original "Z-step" gate arrays to smaller, higher-performance "Y-step" arrays. They then customized some of the devices to create the "W-step" devices. Looking at the markings on the die, we see that this chip uses the "4004Y" gate array.

    The die shows gate slice 4004Y and part 4139Y (indicating 54139 or 74139). The numbers are slightly obscured by a bond wire.

     





[#] Sun Apr 28 2024 08:35:56 UTC from rss <>

Subject: Talking to memory: Inside the Intel 8088 processor's bus interface state machine

In 1979, Intel introduced the 8088 microprocessor, a variant of the 16-bit 8086 processor. IBM's decision to use the 8088 processor in the IBM PC (1981) was a critical point in computer history, leading to the success of the x86 architecture. The designers of the IBM PC selected the 8088 for multiple reasons, but a key factor was that the 8088 processor's 8-bit bus was similar to the bus of the 8085 processor.1 The designers were familiar with the 8085 since they had selected it for the IBM System/23 Datamaster, a now-forgotten desktop computer, making the more-powerful 8088 processor an easy choice for the IBM PC.

The 8088 processor communicates over the bus with memory and I/O devices through a highly-structured sequence of steps called "T-states." A typical 8088 bus cycle consists of four T-states, with one T-state per clock cycle. Although a four-step bus cycle may sound straightforward, its implementation uses a complicated state machine making it one of the most difficult parts of the 8088 to explain. First, the 8088 has many special cases that complicate the bus cycle. Moreover, the bus cycle is really six steps, with two undocumented "extra" steps to make bus operations more efficient. Finally, the complexity of the bus cycle is largely arbitrary, a consequence of Intel's attempts to make the 8088's bus backward-compatible with the earlier 8080 and 8085 processors. However, investigating the bus cycle circuitry in detail provides insight into the timing of the processor's instructions. In addition, this circuitry illustrates the tradeoffs and implementation decisions that are necessary in a production processor. In this blog post, I look in detail at the circuitry that implements this state machine.

By examining the die of the 8088 microprocessor, I could reverse engineer the bus circuitry. The die photo below shows the 8088 microprocessor's silicon die under a microscope. Most visible in the photo is the metal layer on top of the chip, with the silicon and polysilicon mostly hidden underneath. Around the edges of the die, bond wires connect pads to the chip's 40 external pins. Architecturally, the chip is partitioned into a Bus Interface Unit (BIU) at the top and an Execution Unit (EU) below, with the two units running largely independently. The BIU handles bus communication (memory and I/O accesses), while the Execution Unit (EU) executes instructions. In the diagram, I've labeled the processor's key functional blocks. This article focuses on the bus state machine, highlighted in red, but other parts of the Bus Interface Unit will also play a role.

The 8088 die under a microscope, with main functional blocks labeled. This photo shows the chip's single metal layer; the polysilicon and silicon are underneath. Click on this image (or any other) for a larger version.

Although I'm focusing on the 8088 processor in this blog post, the 8086 is mostly the same. The two processors present the same 16-bit architecture to the programmer. The key difference is that the 8088 has an 8-bit data bus for communication with memory and I/O, rather than the 16-bit bus of the 8086. Internally, the 8086 and 8088 are very similar, apart from trivial but numerous layout changes on the die, so most of the description applies to the 8086 as well. Instead of constantly saying "8086/8088", I'll refer to the 8088 and try to point out places where the 8086 is different.

The bus cycle

In this section, I'll describe the basic four-step bus cycles that the 8088 performs.2 To start, the diagram below shows the states for a write cycle (slightly simplified3), when the 8088 writes to memory or an I/O device. The external bus activity is organized as four "T-states", each one clock cycle long and called T1, T2, T3, and T4, with specific actions during each state. During T1, the 8088 outputs the address on the pins. During the T2, T3, and T4 states, the 8088 outputs the data word on the same pins. The external memory or I/O device uses the T states to know when it is receiving address information or data over the bus lines.

A typical write bus cycle consists of four T states. Based on The 8086 Family Users Manual, B-16.

For a read, the bus cycle is slightly different from the write cycle, but uses the same four T-states. During T1, the address is provided on the pins, the same as for a write. After that, however, the processor's data pins are "tri-stated" so they float electrically, allowing the external memory to put data on the bus. The processor reads the data at the end of the T3 state.

A typical read bus cycle consists of four T states. Based on The 8086 Family Users Manual, B-16.

The purpose of the bus state machine is to move through these four T states for a read or a write. This process may seem straightforward, but (as is usually the case with the 8088) many complications make this process anything but easy. In the next sections, I'll discuss these complications. After that, I'll explain the state machine circuitry with a schematic.
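
As a reference point before the complications, the basic read and write cycles described above can be written down as a table of what the shared address/data pins carry in each T-state. This is a simplified sketch with hypothetical names; it ignores the control signals and the exact timing within each clock.

```python
def bus_cycle(kind):
    """List (T-state, activity on the address/data pins) for one bus cycle."""
    if kind not in ("read", "write"):
        raise ValueError(f"unknown bus cycle kind: {kind}")
    if kind == "write":
        after_address = "data driven by processor"
    else:
        # Pins float so the memory can drive them; the processor
        # samples the data at the end of T3.
        after_address = "pins tri-stated"
    return [
        ("T1", "address driven by processor"),
        ("T2", after_address),
        ("T3", after_address),
        ("T4", after_address),
    ]
```

Both kinds of cycle share T1; they differ only in who drives the pins afterwards.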

Address calculation

One of the notable (if not hated) features of the 8088 processor is segmentation: the processor supports 1 megabyte of memory, but memory is partitioned into segments of 64 KB for compatibility with the earlier 8080 and 8085 processors. The 8088 calculates each 20-bit memory address by shifting the value of a segment register left by four bits and adding a 16-bit offset. This calculation is done by a dedicated address adder in the Bus Interface Unit, completely separate from the chip's ALU. (This address adder can be spotted in the upper left of the earlier die photo.)
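
In software terms, the address adder's job looks like the sketch below. It uses the standard 8086/8088 rule that the 16-bit segment value is shifted left four bits (multiplied by 16) before the offset is added; the function name is hypothetical.

```python
def physical_address(segment, offset):
    """Combine a 16-bit segment and 16-bit offset into a 20-bit address."""
    segment &= 0xFFFF
    offset &= 0xFFFF
    # Only 20 address bits exist, so a carry out of bit 19 is lost.
    return ((segment << 4) + offset) & 0xFFFFF


address = physical_address(0x1234, 0x5678)  # 0x12340 + 0x5678 = 0x179B8
```

Because the result is truncated to 20 bits, a segment near the top of memory wraps around: `physical_address(0xFFFF, 0x0010)` yields address 0.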

Calculating the memory address complicates the bus cycle. As the timing diagrams above show, the processor issues the memory address during state T1 of the bus cycle. However, it takes time to perform the address calculation addition, so the address calculation must take place before T1. To accomplish this, there are two "invisible" bus states before T1; I call these states "TS" (T-start) and "T0". During these states, the Bus Interface Unit uses the address adder to compute the address, so the address will be available during the T1 state. These states are invisible to the external circuitry because they don't affect the signals from the chip.

Thus, a single memory operation takes six clock cycles: two preparatory cycles to compute the address before the four visible cycles. However, if multiple memory operations are performed, the operations are overlapped to achieve a degree of pipelining that improves performance. Specifically, the address calculation for the next memory operation takes place during the last two clock cycles of the current memory operation, saving two clock cycles. That is, for consecutive bus cycles, T3 and T4 of one bus cycle overlap with TS and T0 of the next cycle. In other words, during T3 and T4 of one bus cycle, the memory address gets computed for the next bus cycle. This pipelining significantly improves the performance of the 8088, compared to taking 6 clock cycles for each bus cycle.
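
The saving from overlapping is easy to quantify for a run of back-to-back bus cycles. The function below is a sketch with a hypothetical name, assuming no wait states and no idle time between cycles.

```python
def clocks_for_bus_cycles(count, overlapped=True):
    """Clock cycles for `count` consecutive bus cycles."""
    if count == 0:
        return 0
    if overlapped:
        # Only the first cycle pays for TS and T0; later cycles compute
        # their address during T3 and T4 of the previous cycle.
        return 2 + 4 * count
    return 6 * count  # TS, T0, T1..T4 for every cycle, with no overlap
```

For three consecutive operations this gives 14 clocks with overlap versus 18 without, and the saving grows by two clocks for each additional operation.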

With this timing, the address adder is free during cycles T1 and T2. To improve performance in another way, the 8088 uses the adder during this idle time to increment or decrement memory addresses. For instance, after popping a word from the stack, the stack pointer needs to be incremented by 2.5 Another case is block move operations (string operations), which need to increment or decrement the pointers each step. By using the address adder, the new pointer value is calculated "for free" as part of the memory cycle, without using the processor's regular ALU.4

Address corrections

The address adder is used in one more context: correcting the Instruction Pointer value. Conceptually, the Instruction Pointer (or Program Counter) register points to the next instruction to execute. However, since the 8088 prefetches instructions, the Instruction Pointer indicates the next instruction to be fetched. Thus, the Instruction Pointer typically runs ahead of the "real" value. For the most part, this doesn't matter. This discrepancy becomes an issue, though, for a subroutine call, which needs to push the return address. It is also an issue for a relative branch, which jumps to an address relative to the current execution position.

To support instructions that need the next instruction address, the 8088 implements a micro-instruction CORR, which corrects the Instruction Pointer. This micro-instruction subtracts the length of the prefetch queue from the Instruction Pointer to determine the "real" Instruction Pointer. This subtraction is performed by the address adder, using correction constants that are stored in a small Constant ROM.
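The correction itself is simple arithmetic. The sketch below, in Python, is a simplified illustration with hypothetical function names; it treats the Instruction Pointer as a 16-bit value and ignores how the Constant ROM encodes the correction.

```python
def corrected_ip(fetch_ip: int, queued_bytes: int) -> int:
    """Subtract the bytes sitting in the prefetch queue (mod 2**16)."""
    return (fetch_ip - queued_bytes) & 0xFFFF


def relative_branch_target(fetch_ip: int, queued_bytes: int, displacement: int) -> int:
    """A relative branch is computed from the corrected ("real") IP."""
    return (corrected_ip(fetch_ip, queued_bytes) + displacement) & 0xFFFF


# With three bytes still in the queue, the fetch pointer is three ahead.
print(hex(corrected_ip(0x0105, 3)))
print(hex(relative_branch_target(0x0105, 3, -2)))
```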

The tricky part is ensuring that using the address adder for correction doesn't conflict with other uses of the adder. The solution is to run a special shortened memory cycle—just the TS and T0 states—while the CORR micro-instruction is performed.6 These states block a regular memory cycle from starting, preventing a conflict over the address adder.

A closeup of the address adder circuitry in the 8086. From my article on the adder.

Prefetching

The 8088 prefetches instructions before they are needed, loading instructions from memory into a 4-byte prefetch queue. Prefetching usually improves performance, but can result in an instruction's memory access being delayed by a prefetch, hurting overall performance. To minimize this delay, a bus request from an instruction will preempt a prefetch, even if the prefetch has gone through TS and T0. At that point, the prefetch hasn't created any bus activity yet (which first happens in T1), so preempting the prefetch can be done cleanly. To preempt the prefetch, the bus cycle state machine jumps back to TS, skipping over T1 through T4, and starting the desired access.

A prefetch will also be preempted by the micro-instruction that stops prefetching (SUSP) or the micro-instruction that corrects addresses (CORR). In these cases, there is no point in completing the prefetch, so the state machine cycle will end with T0.

Wait states

One problem with memory accesses is that the memory may be slower than the system's clock speed, a characteristic of less-expensive memory chips. The solution in the 1970s was "wait states". If the memory couldn't respond fast enough, it would tell the processor to add idle clock cycles called wait states, until the memory could respond.7 To produce a wait state, the memory (or I/O device) lowers the processor's READY pin until it is ready to proceed. During this time, the Bus Interface Unit waits, although the Execution Unit continues operation if possible. Although Intel's documentation gives the wait cycle a separate name (Tw), internally the wait is implemented by repeating the T3 state as long as the READY pin is not active.
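The effect of repeating T3 can be shown with a small Python sketch. It is a simplification: the function name is made up, and READY is modeled as one boolean sample per T3 clock rather than with the real setup and hold timing.

```python
def visible_states(ready_samples):
    """List the visible T-states of one bus cycle.

    ready_samples holds one boolean per clock spent in T3: False means
    the memory is not ready yet, so T3 is repeated (a wait state).
    """
    states = ["T1", "T2"]
    for ready in ready_samples:
        states.append("T3")
        if ready:
            states.append("T4")
            return states
    raise ValueError("READY never went active")


print(visible_states([True]))                 # no wait states
print(visible_states([False, False, True]))   # two wait states
```

In Intel's naming, the repeated T3 entries in the second trace would be labeled Tw.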

Halts

Another complication is that the 8088 has a HALT instruction that halts program execution until an interrupt comes in. One consequence is that HALT stops bus operations (specifically prefetching, since stopping execution will automatically stop instruction-driven bus operations). A complication is that the 8088 indicates the HALT state to external devices by performing a special T1 bus cycle without any following bus cycles. But wait: there's another complication. External devices can take control of the bus through the HOLD functionality, allowing external devices to perform operations such as DMA (Direct Memory Access). When the device ends the HOLD, the 8088 performs another special T1 bus cycle, indicating that the HALT is still in effect. Thus, the bus state machine must generate these special T1 states based on HALT and HOLD actions. (I discussed the HALT process in detail here.)

Putting it all together: the state diagram

The state diagram below summarizes the different types of bus cycles. Each circle indicates a specific T-state, and the arrows indicate the transitions between states. The green line shows the basic bus cycle or cycles, starting in TS and then going around the cycle. From T3, a new cycle can start with T0 or the cycle will end with T4. Thus, new cycles can start every four clocks, but a full cycle takes six states (counting the "invisible" TS and T0). The brown line shows that the bus cycle will stay in T3 as long as there is a wait state. The red line shows the two cycles for a CORR correction, while the purple line shows the special T1 state for a HALT instruction. The cyan line shows that a prefetch cycle can be preempted after T0; the cycle will either restart at TS or end.

A state diagram showing the basic bus cycle and various complications.

I'm showing states TS and T3 together since they overlap but aren't the same. Likewise, I'm showing T4 and T0 together. T4 is grayed out because it doesn't exist from the state machine's perspective; the circuitry doesn't take any particular action during T4.

The schematic below shows the implementation of the state machine. The four flip-flops represent the four states, with one flip-flop active at a time, generating states T0, T1, T2, and T3 (from top to bottom). Each output feeds into the logic for the next state, with T3 wrapping back to the top, so the circuit moves through the states in sequence. The flip-flops are clocked so the active state will move from one flip-flop to the next according to the system clock. State TS doesn't have its own flip-flop, but is represented by the input to the T0 flip-flop, so it happens one clock cycle earlier.8 State T4 doesn't have a flip-flop since it isn't "real" to the bus state machine. The logic gates handle the special cases: blocking the state transfer if necessary or starting a state.

Schematic of the state machine.

I'll explain the logic for each state in more detail. The circuitry for the TS state has two AND gates to generate new bus cycles starting from TS. The first one (a) causes TS to happen with T3 if there is a pending bus request (and no HOLD). The second AND gate (b) starts a bus cycle if the bus is not currently active and there is a bus request or a CORR micro-instruction. The flip-flop causes T0 to follow T3/TS, one clock cycle later.

The next gates (c) generate the T1 state following T0 if there is pending bus activity and the cycle isn't preempted to T3. The AND gate (d) starts the special T1 for the HALT instruction.9 The T2 state follows T1 unless T1 was generated by a HALT (e).

The T3 logic is more complicated. First, T3 will always follow T2 (f). Next, a wait state will cause T3 to remain in T3 (g). Finally, for a preempt, T3 will follow T0 (h) if there is a prefetch and a microcode bus operation (i.e. an instruction specified the bus operation).
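The next-state rules for the basic cycle, wait states, and preemption can be approximated in a few lines of Python. This is a deliberately reduced model and not a transcription of the schematic: HALT, HOLD, and CORR are left out, the function names are invented, and it assumes that a wait state holds off TS so that only one flip-flop is active at a time.

```python
def step(state, request=False, wait=False, preempt=False):
    """Return the next flip-flop state: None (idle), 'T0', 'T1', 'T2', or 'T3'.

    TS has no flip-flop; it is the condition that loads T0 on the next clock.
    """
    ts = request and (state is None or (state == "T3" and not wait))
    if ts:
        return "T0"                           # T0 follows TS one clock later
    if state == "T0":
        return "T3" if preempt else "T1"      # a preempt jumps to T3/TS
    if state == "T1":
        return "T2"
    if state == "T2":
        return "T3"                           # T3 always follows T2
    if state == "T3" and wait:
        return "T3"                           # a wait state repeats T3
    return None                               # cycle ends (T4 has no flip-flop)


def run(requests):
    """Feed one request flag per clock and record the resulting states."""
    state, trace = None, []
    for req in requests:
        state = step(state, request=req)
        trace.append(state or "idle")
    return trace


# Two back-to-back bus cycles: the second request arrives during T3.
print(run([True, False, False, False, True, False, False, False, False]))
```

In the trace, the second T0 appears four clocks after the first, which matches the observation that new cycles can start every four clocks.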

Next, I'll explain BUS-ACTIVE, an important signal that indicates if the bus is active or not. The Bus Interface Unit generates the BUS-ACTIVE signal to help control the state machine. The BUS-ACTIVE signal is also widely used in the Bus Interface Unit, controlling many functions such as transfers to and from the address registers. BUS-ACTIVE is generated by the complex circuit below that determines if the bus will be active, specifically in states T0 through T3. Because of the flip-flop, the computation of BUS-ACTIVE happens in the previous clock cycle.

The circuit to determine if the bus will be active next cycle.

In more detail, the signal BUS-ACTIVE-PRE indicates if the bus cycle will continue or will end on the next clock cycle. Delaying this signal through the flip-flop generates BUS-ACTIVE, which indicates if the bus is currently active in states T0 through T3. The top AND gate (a) is responsible for starting a cycle or keeping a cycle going (a1). It will allow a new cycle if there is a bus request (without HOLD) (a3). It will also allow a new cycle if there is a CORR micro-instruction prior to the T1 state (even if there is a HOLD, since this "fake" cycle won't use the bus) (a2). Finally, it allows a new cycle for a HALT, using T1-pre (a2).10 Next are the special cases that end a bus cycle. The second AND gate (b) ends the bus cycle after T3 unless there is a wait state or another bus request. (But a HOLD will block the next bus request.) The remaining gates end the cycle after T0 to preempt a prefetch if a CORR or SUSP micro-instruction occurs (d), or end after T1 for a HALT (e).

The BUS-ACTIVE circuit above uses a complex gate, a 5-input NOR gate fed by 5 AND gates with two attached OR gates. Surprisingly, this is implemented in the processor as a single gate with 14 inputs. Due to how gates are implemented with NMOS transistors, it is straightforward to implement this as a single gate. The inverter and NOR gate on the left, however, needed to be implemented separately, as they involve inversion; an NMOS gate must have a single inversion.

The bus state machine circuitry on the die.

The diagram above shows the layout of the bus state machine circuitry on the die, zooming in on the top region of the die. The metal layer has been removed to expose the underlying silicon and polysilicon. The layout of each flip-flop is completely different, since the layout of each transistor is optimized to its surroundings. (This is in contrast to later processors such as the 386, which used standard-cell layout.) Even though the state machine consists of just a handful of flip-flops and gates, it takes a noticeable area on the die due to the large 3.2 µm feature size of the 8088. (Modern processors have features measured in nanometers, not micrometers.)

Conclusions

The bus state machine is an example of how the 8088's design consists of complications on top of complications. While the four-state bus cycle seems straightforward at first, it gets more complicated due to prefetching, wait states, the HALT instruction, and the bus hold feature, not to mention the interactions between these features. While there were good motivations behind these features, they made the processor considerably more complicated. Looking at the internals of the 8088 gives me a better understanding of why simple RISC processors became popular.

The bus state machine is a key part of the read and write circuitry, moving the bus operation through the necessary T-states. However, the state machine is not the only component in this process; a higher-level circuit decides when to perform a read, write, or prefetch, as well as breaking a 16-bit operation into two 8-bit operations.11 These circuits work together with the higher-level circuit telling the state machine when to go through the states.

In my next blog post, I'll describe the higher-level memory circuit, so follow me on Twitter @kenshirriff or RSS for updates. I'm also on Mastodon as oldbytes.space@kenshirriff. If you're interested in the 8086, I wrote about the 8086 die, its die shrink process, and the 8086 registers earlier.

Notes and references

  1. The 8085 and 8088 processors both use a 4-step bus cycle for instruction fetching. For other reads and writes, the 8085's bus cycle has three steps compared to four for the 8088. Thus, the 8085 and 8088 bus cycles are similar but not an exact match. 

  2. The 8088 has separate instructions to read or write an I/O device. From the bus perspective, there's no difference between an I/O operation and a memory operation except that a pin on the chip indicates if the operation is for memory or I/O.

    The 8088 supports I/O operations for historical reasons, going back through the 8086, 8080, 8008, and the Datapoint 2200 system. In contrast, many other contemporary processors such as the 6502 used memory-mapped I/O, using standard memory accesses for I/O devices.

    The 8086 has a pin M/IO that is high for a memory access and low for an I/O access. External hardware uses this pin to determine how to handle the request. Confusingly, the pin's function is inverted on the 8088, providing IO/M. One motivation behind the 8088's 8-bit bus was to allow reuse of peripherals from the earlier 8-bit 8085 processor. Thus, the pin's function was inverted so it matched the 8085. (The pin is only available when the 8086/8088 is used in "minimum mode"; "maximum mode" remaps some of the pins, making the system more complicated but providing more control.) 

  3. I've made the timing diagram somewhat idealized so actions line up with the clock. In the real datasheet, all the signals are skewed by various amounts so the timing is more complicated. See the datasheet for pages of timing constraints on exactly when signals can change. 

  4. For more information on the implementation of the address adder, see my previous blog post.

  5. The POP operation is an example of how the address adder updates a memory pointer. In this case, the stack address is moved from the Stack Pointer to the IND register in order to perform the memory read. As part of the read operation, the IND register is incremented by 2. The address is then moved from the IND register to the Stack Pointer. Thus, the address adder not only performs the segment arithmetic, but also computes the new value for the SP register.

    Note that the increment/decrement of the IND register happens after the memory operation. For stack operations, the SP must be decremented before a PUSH and incremented after a POP. The adder cannot perform a predecrement, so the PUSH instruction uses the ALU (Arithmetic/Logic Unit) to perform the decrement. 

  6. During the CORR micro-instruction, the Bus Interface Unit performs special TS and T0 states. Note that these states don't have any external effect, so they are invisible outside the processor. 

  7. The tradeoff with memory boards was that slower RAM chips were cheaper. The better RAM boards advertised "no wait states", but cheaper boards would add one or more wait states to every access, reducing performance. 

  8. Only the second half of the TS state has an effect on the Bus Interface Unit, so TS is not a full state like the other states. Specifically, a delayed TS signal is taken from the first half of the T0 flip-flop, and this signal is used to control various actions in the Bus Interface Unit. (Alternatively, you could think of this as an early T0 state.) This is why there isn't a separate flip-flop for the TS state. I suspect this is due to timing issues; by the time the TS state is generated by the logic, there isn't enough time to do anything with the state in that half clock cycle, due to propagation delays. 

  9. There is a bit more circuitry for the T1 state for a HALT. Specifically, there is a flip-flop that is set on this signal. On the next cycle, this flip-flop both blocks the generation of another T1 state and blocks the previous T1 state from progressing to T2. In other words, this flip-flop makes sure the special T1 lasts for one cycle. However, a HOLD state resets this flip-flop. That allows another special T1 to be generated when the HOLD ends. 

  10. The trickiest part of this circuit is using T1-pre to start a (short) cycle for HALT. The way it works is that the T1-pre signal only makes a difference if there isn't a bus cycle already active. The only way to get an "unexpected" T1-pre signal is if the state machine generates it for the first cycle of a HALT. Thus, the HALT triggers T1-pre and thus the bus-active signal. You might wonder why the bus-active uses this roundabout technique rather than getting triggered directly by HALT. The motivation is that the special T1 state for HALT requires the AND of three signals to ensure that the state is generated once for the HALT rather than continuously, but happens again after a HOLD, and waits until the current bus cycle is done. Instead of duplicating that AND gate, the circuit uses T1-pre which incorporates that logic. (This took me a long time to figure out.) 

  11. The 8086 has a 16-bit bus, compared to the 8088's 8-bit bus. Thus, a 16-bit bus operation on the 8088 will always require two 8-bit operations, while the 8086 can usually perform this operation in a single step. However, a 16-bit bus operation on the 8086 will still need to be broken into two 8-bit operations if the address is unaligned (i.e. odd). 





[#] Wed May 29 2024 07:16:53 UTC from rss <>

Subject: Inside a vintage aerospace navigation computer of uncertain purpose

I recently obtained an aerospace computer from the early 1970s, apparently part of a navigation system. Aerospace computers are an interesting but mostly neglected area of computer hardware, so I'm always delighted to examine one up close. In an era when most computers were large mainframes, aerospace computers packed dense electronics into a small package, using technologies such as surface-mounted components and multi-layer printed circuit boards, technologies that wouldn't reach the mainstream for another decade. This blog post examines the circuitry and components inside this computer, including an unusual electromechanical display. Although I was unable to determine who manufactured this system or even its exact function, this system illustrates how hundreds of integrated circuits and a core memory stack can be crammed into a compact package.

The navigation computer, showing the front panel with the display and keyboard, with the electronics unit behind it. Click this image (or any other) for a larger version.

The keyboard

The device has a simple numeric keyboard with a few unexpected features. The numeric keypad can also be used for direction entry, as four of the keys have N, S, E, and W on them. The keys are large, roughly the size of the Apollo spacecraft's DSKY buttons. My theory is that these buttons are designed for operation with gloves, perhaps in a fighter plane where the pilot wears a pressure suit. The buttons are hinged at the top, so they don't push straight in, but pivot when pressed.

Numeric keypads typically use one of two layouts: a telephone-style keypad has the digits 123 at the top, while a calculator-style keypad has the digits 789 at the top. Interestingly, this device uses a calculator layout, while most aviation devices have a telephone layout. The Apollo DSKY also used a calculator layout, which could be a hint at a NASA connection for this device.

Above the keyboard are four codes for self-test: N4576, E9384, S9021, and W4830. Entering these codes on the keyboard presumably triggered the appropriate test of the system when the switch is in test mode.

The display

The computer's display is simple, showing a latitude and longitude. Each value has one decimal position, providing 0.1° of accuracy. The latitude and longitude are prefixed with a compass direction: North/South for latitude and East/West for longitude.

The front panel of the navigation computer, with a display and keyboard.

The display is constructed from an unusual type of electromechanical indicator, with an indicator module for each digit. Each digit position has a rotating wheel with 11 positions (ten digits and a blank). When the indicator module for a position is energized, the wheel spins to the specified position, showing the selected digit. The two leftmost indicators are slightly different as they show a compass direction instead of a digit: N, S, E, or W. Moreover, the direction indicators can also show the compass direction with a diagonal slash through it, as seen above. Perhaps the slashed direction indicates a problem with the value.

The diagram below shows how a digit indicator operates. Each digit position has an electromagnet with a wire to energize it. The dial wheel has an attached permanent magnet (indicated by N and S). Energizing one of the electromagnets causes the dial to spin to that position, aligning the permanent magnet on the dial with the electromagnet. This mechanism forms a reliable indicator with just one moving part. The displayed digit is clearer than a seven-segment display since the digit uses a real font rather than being created from segments.

A diagram illustrating the magnetic indicator construction. From Patent 3201785. The patent describes a different indicator but the construction is similar.

Looking at the back of the keyboard/display unit shows the wiring of the display indicators. Each indicator has a common connection and ten wires to energize one of the electromagnets.1 The electromagnets are connected in a matrix, with all the "1" wires connected, the "2" wires connected, and so forth. To rotate an indicator to a particular digit, a common wire and an electromagnet wire are energized. For instance, powering the common wire of the second indicator and the "5" electromagnetic wire causes the second indicator to rotate to the "5" position. The wiring has a three-dimensional structure with ten bare wires running between the boards, one for each digit value. A yellow wire hangs off each bare wire, linking it to the connector on the left. Each indicator has ten diodes on a circuit board to block "sneak" paths that would energize unselected electromagnets.

The back of the keyboard/display unit. The keyboard buttons are at the back of this photo, while the display modules are at the front.

This matrix circuit reduces the amount of wiring required: although there are 100 electromagnets in total, just 20 wires are sufficient to control them. The driver circuitry, however, is a bit more complex as it must scan through the ten digit positions, activating the right pair of driver wires at the right time. Some of the logic circuitry described below must implement this scanning, as well as the driver circuitry to energize the indicators.
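The wire count and the scanning idea can be sketched in Python. This is an illustration only: the constant and function names are made up, the order of the scan is an assumption, and the real driver timing is unknown.

```python
NUM_INDICATORS = 10  # wheels on the display, one common wire each
NUM_POSITIONS = 10   # electromagnet wires shared by all the wheels


def wire_counts():
    """Compare one wire per electromagnet against the matrix wiring."""
    direct = NUM_INDICATORS * NUM_POSITIONS
    matrix = NUM_INDICATORS + NUM_POSITIONS
    return direct, matrix


def scan(display):
    """Yield (common_wire, electromagnet_wire) pairs, one wheel at a time."""
    for indicator, position in enumerate(display):
        if not 0 <= position < NUM_POSITIONS:
            raise ValueError(f"no electromagnet wire for position {position}")
        yield indicator, position


print(wire_counts())
print(list(scan([3, 1, 4])))
```

Only one pair of wires is driven at a time, which is why the diodes are needed to keep current from sneaking through unselected electromagnets.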

The display and keyboard have many similarities to the Delco Carousel Inertial Navigation System (INS) shown below. (The Delco Carousel was used in many military and civilian aircraft, from the C-141 cargo plane to the Boeing 747 passenger plane.) Both devices have two digital displays, one for latitude North/South and one for longitude East/West. Also note the numeric keypads with four keys assigned to the four compass directions. The controls of the Carousel INS system are considerably more complicated, though. The Carousel has a knob position "TK/GS" (track/ground speed), which may correspond to the "T/G" position on my device.

Control unit for the Delco Carousel inertial navigation system. From Smithsonian collection, gift of Delphi Electronics & Safety.

Note that the display on my unit has just four digits of accuracy, with one digit after the decimal point. A tenth of a degree would provide an accuracy of about ±7 miles, which is low for a navigation device. In comparison, the Delco Carousel has six digits of accuracy (± 100 feet perhaps). This suggests that the device does not provide INS navigation, but some other guidance with lower accuracy.
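The 7-mile figure can be checked with a few lines of Python. The sketch assumes a spherical Earth with a mean radius of about 3,959 statute miles; the function names are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3959.0  # approximate mean radius, statute miles


def miles_for_latitude_step(step_degrees: float) -> float:
    """North-south distance covered by a change in latitude."""
    return math.radians(step_degrees) * EARTH_RADIUS_MILES


def miles_for_longitude_step(step_degrees: float, latitude_degrees: float) -> float:
    """East-west distance covered by a change in longitude at a given latitude."""
    return miles_for_latitude_step(step_degrees) * math.cos(math.radians(latitude_degrees))


print(round(miles_for_latitude_step(0.1), 1))        # one display step of latitude
print(round(miles_for_longitude_step(0.1, 60.0), 1)) # longitude steps shrink toward the poles
```

One step of the last displayed digit in latitude works out to a little under 7 miles, consistent with the estimate above.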

Packaging the electronics

The unit contains 14 circuit boards, crammed with TTL integrated circuits, along with a core memory stack. The photo below shows how circuit boards surround the core memory stack. The mechanical design of the unit is advanced, allowing the boards to be opened up like a book. This provides compact packaging while allowing access to the boards.

The electronics unit can be disassembled and folds open like a book.

The circuit boards are four-layer printed circuit boards, more advanced than the common two-layer boards of the time. The boards use a mixture of surface-mounted and through-hole components. The flat-pack ICs and the tiny round transistors are surface mounted, which was rare at the time. On the other hand, the resistors, capacitors, diodes, and larger transistors use standard through-hole components. At the time, most electronics used through-hole components, although aerospace systems often used surface-mounted components for higher density. It wasn't until the late 1980s that surface-mount technology became commonplace.

The boards are mounted in solid metal frames, providing both structural integrity and heat conduction for cooling. Most of the frames hold two boards, mounted back-to-back for higher density.

The logic boards

Four of the circuit boards are logic boards, packed with flat-pack integrated circuits. The board below holds 55 integrated circuits, showing the high density that is possible with flat packs.

A board filled with flat-pack logic ICs.

The logic ICs are Signetics 400-series chips, an early type of TTL (Transistor-Transistor Logic) chip. Just three types of these ICs are used: SE440J "Dual exclusive OR" (really AND-OR-INVERT but XOR if provided with particular inputs), SE455J "Dual 4-input buffer/driver" (4-input NAND or NOR gates depending on polarity), and SE480J "Quad 2-input NAND/NOR". These integrated circuits cost $15.45 each in 1966 (about $150 each in current dollars).2
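To see how an AND-OR-INVERT gate can act as an exclusive-OR, here is a small Python check. It assumes the complemented inputs are supplied from elsewhere in the circuit; the function names are made up, and the sketch does not reflect the SE440J's actual pin assignments.

```python
def and_or_invert(a: int, b: int, c: int, d: int) -> int:
    """AND-OR-INVERT: NOT((a AND b) OR (c AND d)), with 0/1 values."""
    return int(not ((a and b) or (c and d)))


def xor_from_aoi(x: int, y: int) -> int:
    """Feed x, y to one AND and their complements to the other AND."""
    return and_or_invert(x, y, 1 - x, 1 - y)


for x in (0, 1):
    for y in (0, 1):
        print(x, y, xor_from_aoi(x, y))
```

The output is 1 exactly when the two inputs differ, since the gate goes low only when both inputs are 1 or both are 0.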

The schematic below shows the circuit that implements AND-OR-INVERT (or exclusive or) in the SE440J. The multiple-emitter transistors on the inputs may appear unusual, but this is the standard way to implement TTL gates. It is important to note that this chip only contains 12 transistors, so the density is low. (Since the chip contains two of these gates, this circuit is duplicated.) In the mid-1960s, integrated circuits only contained a few transistors—the Apollo Guidance Computer's ICs had just 6 transistors—but by the time this unit was built in the early 1970s, some chips had thousands of transistors, tracking Moore's Law. Thus, this unit both illustrates how aviation computers could be built from simple integrated circuits and how the dramatic improvements in IC technology rapidly obsoleted these computers.

Schematic of the SE440J integrated circuit. From datasheet.

The Signetics 400-series seems to have been obscure and short-lived, probably killed off by the wild success of 7400-series TTL chips. I was able to find only a few announcements and datasheets for these chips. The only users of these chips that I could find were NASA projects from the late 1960s.3 Signetics 400-series chips were used in the Mariner Mars and Venus probes, in the Data Automation Subsystem (DAS) (link, link). The Voyager Mars probes also used them. The SE455J gates were also used to interface the Apollo Guidance Computer to a core-rope simulator. JPL used the SE455J in a core memory system. NASA used the SE455J, SE480J, and other Signetics chips in its design for the MICROMIN computer. None of these systems appear to be related to the navigation system, but they illustrate that NASA was using these specific Signetics chips at the time in multiple designs.

The chips are labeled "CDC", raising the possibility that these chips were built by Control Data Corporation (CDC) under license from Signetics. The Aerospace Division of CDC was active at the time, building various compact computer systems. For instance, the CDC 480 computer (1976) was a 16-bit computer based on the Am2900 bit-slice chip. Also known as the AN/AYK-14, this system was used on numerous aircraft including the F-18. An earlier CDC aerospace computer is the AN/AWG-9 Airborne Missile Control System (1965), a 24-bit computer in a compact 1.1 cubic foot package. Used on the F-14 fighter plane, this computer guided the Phoenix air-to-air missile. Based on CDC's activity in aerospace computers at the time, the mystery computer could be a CDC system, although this hypothesis is based solely on integrated circuits labeled "CDC".

The CDC AN/AYK-14 computer with circuit boards. This is an example of an aerospace computer built by CDC slightly later than the mystery computer. From a 1983 brochure.

The photo below shows another logic board. This one has numerous red and white wires attached, linking it to the rest of the system. Curiously, this board has a single transistor, with two associated resistors, in the middle of the board.

Another logic board, with a similar grid of flat-pack integrated circuits.

Analog boards

The computer contains not only logic boards but also boards full of analog circuitry to interface with the core memory, keyboard, and display. The board below contains 17 of the logic ICs seen earlier. However, it also uses many resistors, capacitors (red cylinders), transistors (white circles), inductors (white banded cylinders), and glass diodes. The board also has some analog integrated circuits. In particular, it has three TI SN52709 op-amps, the smaller 10-pin packages. The board also contains some integrated circuits that I couldn't identify: UT1000, UT1027, UD4001, and D245F. The SM 60 ICs in white packages have a logo that I don't recognize. The op-amps could function as sense amplifiers for the core memory, or this board could provide other analog interfacing.

A board with some analog integrated circuits.

The board has multiple gray four-pin packages labeled "926D". Based on the + and - markings, these packages are probably bridge rectifiers, maybe providing power for the circuits. Many of the other boards have these rectifiers. The analog boards also contain a few Halex flat-pack devices labeled "HALEX 101205 727". Halex manufactured thin-film resistors in flat packs, so these are probably resistor networks. NASA used Halex resistor networks in some devices (link).4

The analog board shown below sits next to the core memory stack. It uses a different set of flat-pack components: Signetics C8930G and PL 98321. Unfortunately, I could not identify these ICs. This board, unlike the previous boards, has a copper ground plane in the second layer of the circuit board; this layer is visible in the photo as the copper-colored background occupying most of the board.

Another analog board in the aviation computer.

Core memory

The unit is built around a core memory stack, as was common in the era before semiconductor memory took over. Magnetic core memory consists of a grid of tiny ferrite cores with wires threaded through them, forming a core plane. Typically, a core memory unit consists of multiple planes, one for each bit in the word, stacked to form a three-dimensional block of memory.

The photo below shows a closeup of the stack. It appears to have 20 planes, suggesting a 20-bit processor. Soldered wires connect the planes together to provide continuous wiring through the stack. The soldering on these wires looks somewhat haphazard, suggesting that this was not a production unit.
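The one-plane-per-bit organization can be pictured with a toy Python model. The plane dimensions below are placeholders since the real ones are unknown, the class name is invented, and the model ignores the destructive readout of real core memory.

```python
class CoreStack:
    """Toy model of a core stack: one bit plane per bit of the word."""

    def __init__(self, planes=20, rows=4, cols=4):
        # rows and cols are placeholders, not measurements of this unit.
        self.planes = [[[0] * cols for _ in range(rows)] for _ in range(planes)]

    def write(self, row, col, word):
        """Store bit i of the word in plane i at the same (row, col)."""
        for bit, plane in enumerate(self.planes):
            plane[row][col] = (word >> bit) & 1

    def read(self, row, col):
        """Reassemble the word from the core at (row, col) in every plane."""
        return sum(plane[row][col] << bit for bit, plane in enumerate(self.planes))


stack = CoreStack()
stack.write(1, 2, 0xABCDE)        # a 20-bit value
print(hex(stack.read(1, 2)))
```

With 20 planes, any bits above bit 19 have nowhere to go, so the word size is fixed by the number of planes.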

A closeup of the core memory stack. Brightly colored wires connect the module to the rest of the system. Small wires connect the layers together.

The photo below shows the other side of the core memory stack, with similar wiring between the planes. At the right are a few layers of a different type, connected with 26 wires. The tape measure shows that the core memory stack is compact, about 6 cm on a side (2¼").

Measurement of the core memory stack.

Some of the boards are drivers for the core memory stack. The board below has 48 small round transistors, colored either blue or red. Note the green, white, and yellow wires in the lower right, mostly hidden under the brown ground ribbon. These wires are connected to the core memory stack.

A circuit board with many small transistors.

The board below also has numerous wires to the core stack, underneath the brown ground ribbon, so it is presumably another driver board. This board has some round driver transistors with yellow dots. Curiously, in the upper left there are a few circuit board pads where transistors could be mounted but are missing. Perhaps with the additional components the board would support a system with more of something: a larger keyboard? more memory?

A board with driver transistors.

Looking at the back of the unit, you can see the display indicator wiring at the top and a circuit board at the bottom. This board contains 20 transistors in metal cans, specifically Motorola 2N3736 NPN transistors. The core memory stack has 20 planes, matching the 20 transistors on this board, so the board probably implements the core memory "inhibit drivers", controlling the bit written to each plane. The board also has numerous tiny surface-mount transistors in white, red, and black packages. Close examination shows a few thin green "bodge" wires on this board, indicating that rework was performed on the board to fix a circuit problem, another piece of evidence that this unit is a prototype.

A view of the computer from the back, showing the display wiring and a circuit board.

The core memory stack is enclosed by two sheet metal boxes, which I removed for the photos. The stack also has two flexible ground planes attached to it. The designers clearly wanted to ensure that the memory was well shielded, to a degree that I haven't seen in other systems.

Conclusions

Despite my research, this aerospace computer remains a mystery. I was unable to identify who manufactured it or even its exact function. One hypothesis is a NASA connection since NASA was extensively using these Signetics chips at the time. Moreover, this computer was obtained in the Houston area. Another hypothesis, based on the "CDC" label on the chips, is that this computer was built by Control Data's Aerospace Division. If you have any leads on this mysterious aviation computer, please contact me.

This system may have been a prototype. It has no part numbers, manufacturer name, or identifying plate.5 Moreover, the soldering on the core memory stack doesn't seem to be flight quality. Finally, the boards don't have conformal coating, which is typically used for spaceflight systems. However, the mechanical design looks advanced for a prototype, with dense boards that fold together like a book.

This unit clearly has a navigation role, but seems to be too inaccurate for an inertial navigation system (INS). It contains many integrated circuits, but not enough to form a full computer. I hypothesize that this unit contains the circuitry to drive the core memory and the display, and handle keyboard input. Looking at the underside of the unit (below), there are three connectors. I suspect these connectors were plugged into a larger box that held the computer itself.

A view of the underside of the electronics unit with the core memory wrapped in sheet metal.

The date codes on the integrated circuits range from 1966 to 1973, so the computer was probably manufactured in 1973. The seven-year range for date codes is a bit surprising, since integrated circuit technology changed a lot during these years. I suspect that the Signetics 400-series ICs had older date codes because this line didn't catch on so there was a lot of old stock rather than newly-manufactured parts. I also suspect that this system was designed around 1969, based on the multiple NASA systems using these chips then, suggesting that the design and manufacturing of this unit was a multi-year project.

Despite the lingering mysteries of this device, it provides an interesting example of aerospace computers at the beginning of the 1970s. Even though integrated circuits were primitive at the time, with just a few transistors per chip, aerospace computers used these chips and high-density packaging to build computers that were compact, reliable, and low power. These miniature computers controlled aircraft, missiles, and spacecraft, worlds away from the room-filling mainframes that attracted most of the attention.

Thanks to Usagi Electric for providing the aerospace computer. Eric Schlaepfer and Marc Verdiell helped with the analysis. Thanks to Don Straney for his research and comments. Various commenters on Reddit and Twitter provided suggestions. Follow me on Twitter @kenshirriff or RSS for updates. I'm also on Mastodon as oldbytes.space@kenshirriff.

Notes and references

  1. The indicators have a blank position, so there are 11 electromagnets. However, only the ten electromagnets associated with digits are used in the device. The N/S/E/W indicators have a square box in one of the positions, which probably is not used. 

  2. Signetics had multiple temperature ranges for the 400-series low-power ICs. The RE prefix indicated ultra high reliability aerospace components rated for a temperature range of -55°C to +125°C. The SE prefix on the chips in this unit indicated military airborne chips with the same temperature range. An NE or ST prefix indicated military prototype or industrial chips with a smaller temperature range (0°C to +70°C). An SP prefix indicated the commercial temperature rating, from +15°C to +55°C. A J suffix indicated a flat pack and an A suffix indicated a dual in-line pack (DIP). 

  3. NASA computers are the only documented systems that I could find that used these Signetics chips. One possible conclusion is that NASA was the only organization to use these chips. However, it is likely that other companies used these chips but didn't document them as thoroughly as NASA. That is, detailed circuitry for military aerospace computers is unlikely to be on the Internet. 

  4. Halex also made hybrid microcircuits, such as flip-flops, so these packages could be more complex than resistor networks. However, I think a resistor network is more likely. 

  5. One of the circuit boards had the number "45333000" on it, along with a symbol like "+I-", as shown below.

    Closeup of a circuit board showing a number, maybe identifying the board.

    One board also had a mysterious symbol that resembles "mw". I couldn't match these symbols to any manufacturers, and it is unclear if they are logos, fiducials, or other symbols.

    Closeup of a circuit board showing the "mw" mark.

     





[#] Sun Jun 23 2024 08:59:00 UTC from rss <>

Subject: Inside the tiny chip that powers Montreal subway tickets

To use the Montreal subway (the Métro), you tap a paper ticket against the turnstile and it opens. The ticket works through a system called NFC, but what's happening internally? How does the ticket work without a battery? How does it communicate with the turnstile? And how can it be so cheap that you can throw the ticket away after one use? To answer these questions, I opened up a ticket and examined the tiny chip inside.

The image below shows the chip inside the ticket, highly magnified. The four golden squares in the corners are the connections to the antenna. The tan-colored lines are the metal wiring layer on top of the chip; the thickest lines wire the antenna to other parts of the chip. The darker region that takes up the majority of the chip is the chip's digital logic. To the left is the analog circuitry that handles the signal from the antenna.

The MIFARE Ultralight die under the microscope. (Click this image or any other for a larger view.)

The chip uses NFC (Near-Field Communication). The idea behind NFC is that a reader (i.e. the turnstile) and an NFC tag (i.e. the ticket) communicate over a short distance through magnetic fields, allowing them to exchange data. The reader generates a magnetic field that both powers the tag and sends data to the tag. Both the reader and the tag have coil-like antennas so the reader's magnetic field can be picked up by the tag.1 When you tap your ticket on the turnstile, the NFC communication happens in 35 milliseconds, faster than an eyeblink. The data provided by the NFC tag shows that you have a valid ticket and then you can enter the subway.

The photo below shows the subway ticket, made of printed paper.2 At the right, the ticket appears to have golden smart-card contacts, like a credit card with an EMV chip. However, those contacts are completely fake, just printed onto the card with ink, and there is no chip there. Presumably, the makers thought that making the card look like a smart card would help people understand it. The card actually uses an entirely different technology.

A Montreal subway card. This card is for occasional use and is disposable. Regular travel uses a rigid plastic card containing a different chip.

Although the subway card is paper on the outside, its core is a thin plastic sheet, shown below. The sheet has a coiled antenna made from a layer of metal foil. If you look closely, you can see the tiny NFC chip in the lower left, a black speck connected to two sides of the antenna wire.3 The diagonal metal stripe in the upper left makes the antenna into a loop; topologically, a spiral antenna won't work on a 2-D sheet, so the diagonal bridge completes the circuit.

The antenna and chip inside the subway card.

I want to emphasize the absurdly small size of the chip: 570 µm × 485 µm. The photo below shows that it is about the size of a grain of salt. The chip is also extremely thin—75 µm or 120 µm—so you can't even feel the chip inside the ticket.

The chip next to grains of salt. I composited two images, one illuminated from above to show the die and one illuminated from below to show the salt.

Functions of the chip

There are many different types of NFC chips with varying levels of functionality.4 This one is called the MIFARE Ultralight EV1,5 a low-cost chip designed for one-time ticketing applications. The basic function of the Ultralight chip is simple: providing a block of data to the reader. The chip holds its data in a small EEPROM; this chip has 48 bytes of user memory, while another variant has 128 bytes of user memory.

The Ultralight chip lacks the cryptography support found in more advanced chips. The Ultralight isn't much more secure than a printed ticket with a QR code or barcode, like you'd download for a show. It's up to the reader to validate the data and make sure the same ticket isn't being used multiple times.6

The Ultralight chip has a few features beyond a printed ticket, though. The chips are manufactured with a unique 7-byte identification code (UID). Moreover, the UID is signed, ensuring that fake UIDs cannot be generated.7 The chip also supports password-protected memory access and locking of memory pages to prevent modification. Since the password is transmitted without encryption, the security is weak, but better than nothing.8

Another interesting feature of the chip is the one-way counter. The chip has three 24-bit counters that can be incremented but not decremented. The counters can be used to allow the ticket to be used a particular number of times, for instance.9
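
To make the counter idea concrete, here is a minimal Python sketch of a 24-bit counter that can only go up and that stops at the all-1's value. The class and method names are hypothetical, and raising an error when an increment would pass the limit is an assumption of this model, not a description of how the chip responds.

```python
class OneWayCounter:
    """Toy model of a 24-bit counter that can be incremented but not decremented."""

    MAX = 0xFFFFFF  # all 24 bits set

    def __init__(self, initial=0):
        if not 0 <= initial <= self.MAX:
            raise ValueError("initial value must fit in 24 bits")
        self.value = initial

    def increment(self, amount=1):
        if amount < 0:
            raise ValueError("the counter cannot be decremented")
        if self.value + amount > self.MAX:
            # Assumption: an increment past the limit is simply refused.
            raise OverflowError("counter limit reached")
        self.value += amount

    def uses_left(self):
        return self.MAX - self.value


# Allow five uses by starting five steps below the limit.
ticket = OneWayCounter(OneWayCounter.MAX - 5)
for _ in range(5):
    ticket.increment()
print(ticket.uses_left())
```

After the five increments, the counter sits at the limit and a sixth attempt is refused, which is what makes the counter usable as a "remaining rides" tally.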

Photographing the chip

To photograph the chip, I went through several steps to remove the chip from the ticket and then strip the chip down to the bare silicon. First, to extract the plastic sheet with the chip and the antenna from the paper ticket, I simply soaked the ticket in water. This turned the paper into mush, which could be scraped off to reveal the plastic core. Next, I cut out a small square of plastic that included the chip and put it in boiling sulfuric acid for about 30 seconds. This removed the plastic and adhesive, leaving the silicon die. (I try to avoid boiling acids, but processing a tiny chip like this only required a few drops of sulfuric acid, minimizing the risk.)

The die was covered with a passivation layer to protect its surface, a sandwich of silicon nitride and PSG (phosphosilicate glass) 1.1 µm thick according to the datasheet. The chip's underlying circuitry was visible, but slightly hazy due to this layer. I removed the passivation layer by boiling the chip in phosphoric acid for a few minutes. The image below shows the chip after this step. The top metal layer is much more visible, although some of the metal was dissolved by the acid. The thick metal lines connect the four bond pads to various parts of the analog circuitry, while many thin vertical metal lines provide interconnections of the logic circuitry.

The die after treatment with phosphoric acid to remove the passivation layer. Click for a much larger version.

Next, I put the die through several cycles of Armour Etch to dissolve the oxide layer and hydrochloric acid to dissolve the metal. I think the chip had three layers of metal wiring on top of the silicon. Unfortunately, my process doesn't remove the metal layers cleanly, but causes them to come off in chaotic tangles. Since I wasn't interested in tracing the circuitry layer-by-layer, this wasn't a significant problem.

With the metal layers and polysilicon removed, I was left with the bare silicon. At this point, the underlying structure of the chip is visible. The doped silicon regions show the transistors, although they are extremely small at this scale. The white rectangles are capacitors. The chip has capacitors for many reasons: producing the right resonant frequency with the antenna, filtering the power, and boosting the voltage with charge pumps.

The die after stripping it down to the silicon.

My biggest concern while processing this chip was to avoid losing it. With a chip this small, bumping the chip or even breathing on it can send the chip flying, perhaps never to be seen again. Even trying to pick up the chip with tweezers is risky, since it can easily pop out and disappear. It's no fun examining the floor, inch by inch, trying to figure out if a speck is the lost chip or a bit of dirt. I found that the best way to move the chip between processing and a microscope slide was to put the chip in a few drops of water and move it with a pipette. Even so, there were a couple of times that I lost track of the chip and had to check some specks under the microscope to determine which was the chip and which were dirt.

Overview of the chip

The block diagram below shows the high-level structure of the chip. At the left, the antenna is connected to the RF interface, the analog circuitry that converts the high-frequency signals into digital data. This circuitry also extracts power from the antenna's signal to power the chip.

Block diagram of the MIFARE Ultralight chip, from the datasheet.

The majority of the chip contains digital logic to process the 18 different commands that it can receive from the reader. Some commands, such as Wake-up or Halt, control the chip's state. Other commands, such as Read or Write, provide access to the EEPROM storage. The specialized Read_Cnt and Incr_Cnt commands access the chip's counters.

The chip has an "intelligent anticollision function" that allows multiple cards to be read without conflict if they are presented to the reader simultaneously. If a conflict is detected, the reader uses a standard NFC algorithm to select the cards one at a time, based on their identification numbers. The anticollision algorithm uses four of the chip's commands.
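
The general idea of singling out cards by their identification numbers can be sketched as a binary-tree walk. The Python model below is a simplification for illustration only: the function name, the short 4-bit IDs, and the prefix-based exchange are assumptions, and it does not reproduce the actual NFC frame formats or the chip's commands. The reader announces a known prefix, every card whose ID starts with that prefix answers, and if the answers disagree at some bit, the reader extends the prefix and tries each branch.

```python
def resolve_collisions(uids, bits=8):
    """Return the IDs in the order a reader would single them out.

    Simplified model: IDs are assumed unique and 'bits' wide.
    """
    tags = sorted({format(u, f"0{bits}b") for u in uids})
    selected = []
    pending = [""]                      # prefixes still to be explored
    while pending:
        prefix = pending.pop()
        responders = [t for t in tags if t.startswith(prefix)]
        if not responders:
            continue                    # nobody answered this prefix
        if len(responders) == 1:
            selected.append(int(responders[0], 2))   # one card: select it
            continue
        # Several cards answered: find the first bit where they disagree.
        pos = len(prefix)
        while all(t[pos] == responders[0][pos] for t in responders):
            pos += 1
        common = responders[0][:pos]
        pending.append(common + "1")    # explored second
        pending.append(common + "0")    # explored first
    return selected


print(resolve_collisions([0b1010, 0b0011, 0b1001], bits=4))
```

Each pass either selects exactly one card or narrows the prefix, so every card is eventually reached without two of them being selected at once.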

Finally, the chip has an EEPROM to store its data. Unlike RAM, the EEPROM holds data even when unpowered; it is designed to hold data for 10 years. To store data in the EEPROM, it must be written with a higher voltage than the rest of the chip uses. The EEPROM interface circuit produces the necessary signals.

The diagram below shows the chip with its functional blocks labeled. The majority of the die is occupied with digital logic; I'll explain below how it is implemented with standard-cell logic. At the top is the EEPROM, a square of storage cells. To the right of the EEPROM is a charge pump, a circuit to boost the voltage through switched capacitors. The EEPROM interface circuitry is between the EEPROM and the digital logic.

The die, stripped down to the silicon, with presumed functional blocks labeled.

The remainder of the chip contains analog circuitry that is harder to interpret, so my labels are somewhat speculative. The four bond pads are where the antenna is connected to the chip. There are four pads to support two parallel antennas if desired. The first die photo shows the metal wiring between the bond pads and the structures that I've labeled as RF transistors and RF diodes. The "RF transistors" in the upper left are large, oval-shaped structures. These may be the transistors that send data back to the reader by modifying the load. Alternatively, they could be Zener diodes to regulate the voltage powering the chip, since Zener diodes often have an oval shape. The "RF diodes" at the bottom may rectify the signal from the antenna, producing the power for the chip. The rectified signal is also demodulated and processed by the analog logic to extract the digital data sent from the reader.

Sending data from the tag to the reader: load modulation

You might expect the tag to send data back to the reader by transmitting a signal through the antenna. However, transmitting a signal takes power and the tag doesn't have much power available, just the power that it extracts from the reader's signal. Instead, the tag uses a clever technique called load modulation to send data to the reader. The idea is that if the tag changes the load across the antenna, it will absorb more or less energy from the reader. The reader can detect this change as a small variation in voltage across its transmitting antenna. Thus, the tag can dynamically change its load to send data back to the reader. Even though the signal produced by load modulation is extremely weak (80 dB less than the transmitted signal), the reader can detect it and extract the data.

In more detail, the reader transmits at a carrier frequency of 13.56 MHz.10 To send data back, the tag switches its load on and off at 848 kHz (1/16 of the carrier frequency), producing a subcarrier on top of the reader's signal. To transmit bits, this load modulation is switched on or off to transmit 106 kilobits per second (1/8 of the modulation frequency). The reader, in turn, extracts the subcarrier with a filter to receive the data bits from the tag.
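
The numbers above follow from simple division, which the short Python calculation below works through. It also converts the 80 dB figure into plain ratios, using the standard definitions of the decibel for power and amplitude. The variable names are just for illustration.

```python
CARRIER_HZ = 13.56e6                 # reader's carrier frequency

subcarrier_hz = CARRIER_HZ / 16      # rate at which the tag switches its load
bit_rate = subcarrier_hz / 8         # each bit spans 8 subcarrier cycles
carrier_cycles_per_bit = CARRIER_HZ / bit_rate

# 80 dB below the transmitted signal, expressed as ratios.
power_ratio = 10 ** (-80 / 10)       # one hundred-millionth of the power
amplitude_ratio = 10 ** (-80 / 20)   # one ten-thousandth of the amplitude

print(f"subcarrier: {subcarrier_hz / 1e3:.1f} kHz")
print(f"bit rate: {bit_rate / 1e3:.2f} kbit/s")
print(f"carrier cycles per bit: {carrier_cycles_per_bit:.0f}")
print(f"power ratio: {power_ratio:.0e}, amplitude ratio: {amplitude_ratio:.0e}")
```

The exact results are 847.5 kHz and about 105.9 kbit/s, which are rounded to 848 kHz and 106 kilobits per second in the text; each bit lasts 128 cycles of the carrier.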

An NFC tag can apply a load that is either a resistor or a capacitor; a resistor absorbs the signal directly, while a capacitor changes the antenna's resonant frequency and thus the amount of signal transferred to the tag. The die contains many capacitors, but I didn't see any significant resistors, so I suspect that this chip uses a capacitor for the load.

The chip's manufacturing process

The image below shows an extreme closeup of the die. The red box surrounds a region of doped silicon, forming five MOS transistors in series. Each dark vertical line corresponds to the gate of one transistor so the width of this line corresponds to the feature size. I estimate that the chip's feature size is 180 nm. In comparison, the wavelength of visible light is 400-700 nm. Since the features are smaller than the wavelength of light, it's not surprising that the image appears blurry.

A closeup of the die, pushing the limits of my microscope.

The 180 nm process was popular in the late 1990s. These features are very large, however, compared to recent chips with features that are a few nanometers across. At the time the MIFARE Ultralight EV1 chip was released (October 2012), the newest semiconductor manufacturing process was 22 nm, so the 180 nm process they used was old even then.

However, it makes sense that the chip would be manufactured with an older process for several reasons. First, much of the chip's area is occupied by analog circuitry and the four bond pads, so shrinking the digital logic won't reduce the overall size much. Moreover, a significantly smaller chip would be impractical to attach to the antenna; I expect even the current chip is a pain to mount. Finally, this chip is designed for the extremely low-cost (i.e. disposable) market, so the chip is manufactured as inexpensively as possible. With a more modern process, more chips would fit on a wafer, dropping the price, but manufacturing each wafer would be more expensive, so there is a tradeoff.

Standard-cell logic

The chip's digital circuitry is implemented with standard-cell logic, a common way of implementing digital logic. The idea behind standard-cell logic is to use automated tools to create the chip layout from a description of the desired logic. The process starts with a library of standard cells. Each cell is a standardized implementation of a simple circuit such as a NAND gate or a flip-flop. The cells are designed so they have a fixed height and can be arranged in rows. The cells are then connected by metal wiring on top of the cells to produce the desired circuitry. Although the resulting circuitry isn't as dense and efficient as a fully customized and optimized layout, standard cell logic is much faster (and thus cheaper) to design than a hand-tuned layout. Thus, standard-cell logic has been heavily used for integrated circuit design since the 1980s.

The photo below shows four rows of gates implemented with standard cell logic. The chip (like most modern chips) uses CMOS logic, with each logic gate built from two types of transistors: NMOS and PMOS. To simplify manufacturing, the NMOS and PMOS transistors are arranged in separate rows. Thus, each row of logic consists of a row of PMOS transistors on top and a row of NMOS transistors below, or vice versa. Due to the physics of semiconductors, the PMOS transistors are larger, which allows the transistor types to be distinguished in the image.

A closeup of the standard cell logic.

Looking at some of the cells and extrapolating, I estimate about 8000 gates in the logic section with about 45,000 transistors. One question is whether the chip is implemented as a hardcoded state machine or contains a processor (microcontroller). The transistor count is barely large enough to implement a simple microcontroller such as an 8051, but that wouldn't leave many transistors for other necessary circuitry. If a microcontroller were present, it would need software stored somewhere. Given the simplicity of the protocol and the relatively small number of transistors, my guess is that the chip is implemented in hardware (state machines and counters) rather than through a microcontroller.

The diagram below shows how a standard cell implements a 2-input NAND. (This cell is from the Intel 386, not the NFC chip, but the structures are similar.) The cell contains four transistors. The yellow region is the P-type silicon that forms two PMOS transistors; the transistor gates are where the polysilicon (red) crosses the yellow region. (The middle yellow region is the drain for both transistors; there is no discrete boundary between the transistors.) Likewise, the two NMOS transistors are at the bottom, where the polysilicon (red) crosses the active silicon (green). The blue lines indicate the metal wiring for the cell. The black circles are contacts, connections between the metal and the silicon or polysilicon. Finally, the well taps are the opposite type of silicon, connected to the underlying silicon well or substrate to keep it at the proper voltage.

A standard cell for NAND in the Intel 386.
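
The logic of this four-transistor cell can be modeled in a few lines of Python. This is a behavioral sketch of a generic CMOS NAND gate, not a model of the specific cell in the photo: the two PMOS transistors sit in parallel between power and the output, while the two NMOS transistors sit in series between the output and ground. The function name is made up for illustration.

```python
def cmos_nand(a: bool, b: bool) -> bool:
    """Model a 2-input CMOS NAND gate as two switch networks."""
    # A PMOS transistor conducts when its gate is low. Two in parallel
    # pull the output high if either input is low.
    pull_up = (not a) or (not b)
    # An NMOS transistor conducts when its gate is high. Two in series
    # pull the output low only if both inputs are high.
    pull_down = a and b
    # Exactly one network conducts, so the output is never floating or shorted.
    assert pull_up != pull_down
    return pull_up


for a in (False, True):
    for b in (False, True):
        print(int(a), int(b), int(cmos_nand(a, b)))
```

The output is low only when both inputs are high, which is the NAND truth table.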

EEPROM

The chip stores its data in an EEPROM, similar to flash memory. The chip provides 640 or 1312 bits of EEPROM, based on the part number; I believe both versions use the same EEPROM implementation, but the cheaper version limits the amount that can be used. I think the EEPROM is the matrix shown below, with row and column drive circuitry to the right and below. (The diagonal lines are accidental scratches while I was processing the chip.)

A closeup of the presumed EEPROM circuitry on the die.

In the photo, the EEPROM appears to be a 64×64 grid, 4K bits of storage rather than the advertised 1312 bits. There are several possible explanations. First, I could be miscounting the capacity (it is easy to be off by a factor of 2, depending on the cell structure). Second, the chip stores data that isn't reflected in the EEPROM memory map; for instance, the one-way counters and the UID signature are not included in the EEPROM storage count. Another possibility is that the extra EEPROM space holds code for a microcontroller (if the chip has one).
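
The size of the discrepancy can be checked with a quick Python calculation. It uses only figures given in this article: the 64×64 grid, the advertised 1312 bits, and the three 24-bit counters. The size of the UID signature isn't stated here, so it is left out, and the variable names are just for illustration.

```python
observed_bits = 64 * 64          # apparent grid size in the die photo
advertised_bits = 1312           # larger of the two advertised capacities
counter_bits = 3 * 24            # three 24-bit one-way counters

ratio = observed_bits / advertised_bits
half_count = observed_bits // 2  # if the cell count is off by a factor of 2
unexplained = observed_bits - advertised_bits - counter_bits

print(f"observed: {observed_bits} bits")
print(f"observed / advertised: {ratio:.1f}x")
print(f"half the observed count: {half_count} bits")
print(f"left over after memory map and counters: {unexplained} bits")
```

Even if the count is off by a factor of 2, the 2048 bits that remain still exceed the advertised memory plus the counters, so more than one of the explanations may apply.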

An EEPROM requires a relatively high voltage (10-20V) to force electrons into the storage cell for a bit. This voltage is generated by a charge pump circuit that switches capacitors at high frequency to boost the voltage. To the right of the EEPROM is a circuit with several large capacitors, presumably the charge pump.

Conclusions

It's remarkable that these NFC chips can be manufactured so cheaply that they are disposable. To keep the price down, the chips are sold by the wafer and then mounted in the tickets.11 You can buy an eight-inch silicon wafer with the chips for $9000 from Digikey. This may seem expensive until you realize that a single wafer provides an astonishing 100,587 chips, yielding a per-chip price of nine cents. The datasheet specifies 103,682 potential good dies per wafer (PGDW). Some dies will be faulty, of course, so the wafer comes with a file telling you which dies are the good ones, 97% of them. (During the manufacturing of a typical chip, the faulty ones are marked with a spot of ink. But that won't work in this case since each die is much smaller than an ink spot.) If you need more chips, you can buy a 12" wafer for $19,000, providing 215,712 chips. A ticket manufacturer mounts each chip on an antenna sheet and then prints the ticket, adding a few cents to the cost of the ticket. The result is an inexpensive ticket that can be used once and discarded.
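
The per-chip arithmetic is easy to check in Python using the prices and die counts quoted above. The dictionary layout and variable names are just for illustration.

```python
# (price in dollars, good chips per wafer), as quoted in the text
wafers = {
    "8-inch": (9_000, 100_587),
    "12-inch": (19_000, 215_712),
}

cents_per_chip = {
    name: 100 * price / chips for name, (price, chips) in wafers.items()
}
for name, cents in cents_per_chip.items():
    print(f"{name} wafer: {cents:.1f} cents per chip")

# Fraction of potential dies on the 8-inch wafer that are good
good_fraction = 100_587 / 103_682
print(f"good dies: {100 * good_fraction:.1f}%")
```

Both wafer sizes work out to just under nine cents per chip, so the larger wafer buys volume rather than a noticeably lower unit price.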

I'll leave you with one last die photo. In my first attempt at processing the chip, I treated it with Armour Etch. Although this failed to remove the passivation layer, it thinned it slightly, enough to generate some wild colors due to thin-film interference. I call this the "tie die".

The die after treatment with Armour Etch.

Follow me on Twitter @kenshirriff or RSS for more. I'm also on Mastodon as oldbytes.space@kenshirriff. If you're interested in this type of chip, a few years ago, I looked at two RFID race timing chips, the Monza R4 and Monza R6.

Notes and references

  1. Because the card and the reader are positioned close together, the two antennas use "inductive coupling", coupled by magnetic fields rather than radio waves. That is, the two antennas act like transformer windings, transmitting the signal from the reader to the card. 

  2. The Montreal subway uses multiple types of cards. In this blog post, I examine the Occasional card (L'Occasionnelle). This is a non-rechargeable card that works for a single trip or up to three days, and then is discarded. For long-term usage, Montreal uses the Opus card, which provides more security and implements the Calypso standard. An Opus card is plastic rather than paper, giving it a longer life. The Calypso standard is much more secure, using cryptography such as AES, DES, and ECC (spec) and provides much larger EEPROM storage. Thus, the transit system uses the Occasional card for cheap, disposable tickets and the Opus card for a long-term ticket, where spending a dollar or two on the physical card isn't an issue.

    I haven't examined an Opus card, so I don't know what type of chip it uses or even who manufactures the chip. Many companies produce Calypso cards; for instance, the STMicroelectronics CD21 Calypso chip is based on an Arm core. 

  3. If you look closely at the lower right corner of the NFC card, it has three positions that can hold a chip, with the chip in position #3. Presumably, this allows three different NFC chips to be mounted in one card, so one card could have three functions. The NFC protocol is designed to avoid collisions if multiple chips respond, so the three chips won't interfere with each other. 

  4. You can easily examine NFC cards like this using your phone, with an app such as NFC Tools or NXP's Taginfo. Tapping a card will display the type of the card and allow the memory to be read (subject to security restrictions). It's entertaining to tap various NFC cards and see what type of chip they use; I found that hotels typically use the MIFARE Classic chip, more advanced than the MIFARE Ultralight chip in the subway ticket.

    The NFC Tools app shows that this card is a MIFARE Ultralight EV1.

     

  5. The part number, as provided by the chip, is MF0UL1101DUx. "MF0UL" indicates the MIFARE Ultralight EV1, a chip in the Ultralight family manufactured by NXP. An "H" if present indicates 50 pF input capacitance, rather than 17 pF in the chip I examined, allowing a different antenna. Next, "1" indicates a chip with 384 bits of user memory, while "2" would indicate 1024 bits. This is followed by "101D", and then a code indicating the specific package: "U" indicates a wafer, while "A" indicates a plastic leadless module carrier (LCC). Other characters specify the wafer diameter and thickness. 

  6. It is instructive to think about the security of a printed ticket for a concert with a barcode. You could print out a hundred copies of the ticket, but it will only get you into the concert once. (This assumes that the venue has a centralized database so they can keep track of which tickets have been scanned.) Most of the security is implemented in the backend system, not the ticket itself. The ticket numbers need to be unforgeable, either by generating random numbers or using cryptography. (If the tickets just have QR codes with the numbers 1 to 100, for instance, it would be trivial to make fake tickets.) Moreover, there is nothing to ensure that the person scanning the ticket is legitimate; someone malicious could scan your ticket in line, print out a copy, and get into the concert instead of you. The MIFARE Ultralight chip is similar to a paper ticket in many ways with only slightly more security. 

  7. The UID signing is done with an ECC (elliptic-curve cryptography) algorithm. Note that the chip doesn't need any cryptographic support for this; the chip just holds the signature that was programmed during manufacturing. As far as the chip is concerned, it is just providing some stored bytes. 

  8. The MIFARE Ultralight has enough security to work as a limited-use ticket, but more advanced applications such as reloadable stored-value cards require a chip that supports encryption such as the DESFire. This allows the market to be partitioned, with the inexpensive Ultralight supporting the low-end market, while the more costly DESFire is required for more advanced applications.

    There are many types of MIFARE cards and it's hard to keep them straight, but the diagram below from NXP may help. The different families are arranged left to right: Ultralight, Classic, Plus, DESFire, and SmartMX. The Y dimension indicates the official security certification level. The Z dimension (front to back) shows the evolution within a family over time. I've added a red arrow to indicate the "Ultralight EV1" chip, the focus of this blog post. (Personally, I think that if you need a three-dimensional diagram to explain your product line, the product line may be excessively complicated.)

    The various MIFARE NFC types. Diagram from MIFARE Plus Product Family.

  9. In more detail, a 3-byte counter can be incremented by a specified value until it reaches the all-1's state (0xFFFFFF), at which point it stops. If you wanted to allow, say, 5 uses of a ticket, you could initialize the counter to all-1's minus 5. Then the counter could be incremented 5 times before reaching the limit.

    One complication is that the counters have an "anti-tearing" feature for additional security. The problem is that if you tear the card away from the reader in the middle of an update, there is a possibility for counters to be partially updated, yielding a bad result. The anti-tearing feature ensures that a counter will be atomically updated, avoiding a partial update. 

  10. There are multiple NFC standards with differences in speed, protocol, and range, including NFC-A, NFC-B, NFC-C, NFC-F, and NFC-V. The MIFARE Ultralight cards use NFC-A, which is defined by the standard "ISO/IEC 14443 Type A". Annoyingly, each part of the standard costs $70. The NFC Forum Analog Technical Specification provides a lot of detail, though. 

  11. Instead of a wafer, you can buy the chips on tape but it costs more than twice as much. 
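
As a rough illustration of the counter scheme in note 9, here is a minimal Python sketch. The class name `OneWayCounter` and its methods are hypothetical, not the chip's command set, and I'm assuming that an increment that would pass the limit is simply refused.

```python
MAX_COUNT = 0xFFFFFF  # a 3-byte counter tops out at all 1's


class OneWayCounter:
    """Toy model of a 24-bit counter that can only count up (hypothetical API)."""

    def __init__(self, initial=0):
        if not 0 <= initial <= MAX_COUNT:
            raise ValueError("counter must fit in 3 bytes")
        self.value = initial

    def increment(self, amount=1):
        """Add amount; return False and change nothing if it would pass the limit."""
        if amount < 0 or self.value + amount > MAX_COUNT:
            return False
        self.value += amount
        return True


# A ticket good for 5 uses: start the counter at all-1's minus 5.
ticket = OneWayCounter(MAX_COUNT - 5)
results = [ticket.increment() for _ in range(6)]  # the sixth use is refused
```

The sketch ignores the anti-tearing mechanism entirely; it only shows why starting near the limit caps the number of uses.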





[#] Sun Jul 07 2024 10:38:09 UTC from rss <>

Subject: Standard cells: Looking at individual gates in the Pentium processor


Intel released the powerful Pentium processor in 1993, a chip to "separate the really power-hungry folks from ordinary mortals." The original Pentium was followed by the Pentium Pro, the Pentium II, and others, spawning a long-running brand of high-performance processors, Intel's flagship line until the Core processors took over in 2006. The Pentium eventually became virtually synonymous with "PC" and even made it into pop culture.

Even though the Pentium is a complex chip with 3.3 million transistors, its transistors are visible under a microscope, unlike modern chips. By examining the chip, we can see the interesting circuits used for gates, flip-flops, and other circuits, including the use of an unusual technology called BiCMOS. In this article, I take a close look at the original Pentium chip1, showing how much of its circuitry was built out of structured rows of tiny transistors, a technique known as standard-cell design.

The die photo below shows the Pentium's fingernail-sized silicon die under a microscope. I removed the chip's four metal layers to show the underlying silicon, revealing the individual transistors, which are obscured in most die photos by the layers of metal. Standard-cell circuitry, indicated by red boxes, is recognizable because the circuitry is arranged in uniform columns of cells, giving it a characteristic striped appearance. In contrast, the chip's manually-optimized functional blocks are denser and more structured, giving them a darker appearance. Examples are the caches on the left, the datapaths in the middle, and the microcode ROMs on the right.

Die photo of the Intel Pentium processor with standard cells highlighted in red. The edges of the chip suffered some damage when I removed the metal layers.

Standard-cell design

Early processors in the 1970s were usually designed by manually laying out every transistor individually, fitting transistors together like puzzle pieces to optimize their layout. While this was tedious, it resulted in a highly dense layout. Federico Faggin, designer of the popular Z80 processor, was almost done when he ran into a problem. The last few transistors wouldn't fit, so he had to erase three weeks of work and start over. The closeup of the resulting Z80 layout below shows that each transistor has a different, complex shape, optimized to pack the transistors as tightly as possible.2

A closeup of transistors in the Zilog Z80 processor (1976). This chip is NMOS, not CMOS, which provides more layout flexibility. The metal and polysilicon layers have been removed to expose the underlying silicon. The lighter stripes over active silicon indicate where the polysilicon gates were. I think this photo is from the Visual 6502 project but I'm not sure.

Because manual layout is slow, difficult, and error-prone, people developed automated approaches such as standard-cell.3 The idea behind standard-cell is to create a standard library of blocks (cells) to implement each type of gate, flip-flop, and other low-level component. To use a particular circuit, instead of arranging each transistor, you use the standard design from the library. Each cell has a fixed height but the width varies as needed, so the standard cells can be arranged in rows. The Pentium die photo below shows seven cells in a row. (The rectangular blobs are doped silicon while the long, thin vertical lines are polysilicon.) Compare the orderly arrangement of these transistors with the Z80 transistors above.

Some standard cell circuitry in the Pentium. I removed the metal to show the underlying silicon and polysilicon.

The photo below zooms out to show five rows of standard cells (the dark bands) and the wiring in between. Because CMOS circuitry uses two types of transistors (NMOS and PMOS), each standard-cell row appears as two closely-spaced bands: one of NMOS transistors and one of PMOS transistors. The space between rows is used as a "wiring channel" that holds the wiring between the cells. Power and ground for the circuitry run along the top and bottom of each row.

Some standard cells in the Pentium processor.

The fixed structure of standard cell design makes it suitable for automation, with the layout generated by "automatic place and route" software. The first step, placement, consists of determining an arrangement of cells that minimizes the distance between connected cells. Running long wires between cells wastes space on the die, since you end up with a lot of unnecessary metal wiring. But more importantly, long paths have higher resistance, slowing down the signals. Once the cells are placed in their positions, the "routing" step generates the wiring to connect the cells. Placement and routing are both difficult optimization problems that are NP-complete.
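
To make the placement goal concrete, here is a toy Python sketch. It is not how Intel's tools or Timberwolf worked: I'm assuming a single row of unit-width cells, a made-up netlist, and exhaustive search, which is only feasible for a handful of cells. The names `wire_length` and `best_placement` are mine.

```python
from itertools import permutations


def wire_length(order, nets):
    """Sum, over all nets, of the horizontal span between the net's cells."""
    position = {cell: slot for slot, cell in enumerate(order)}
    return sum(
        max(position[c] for c in net) - min(position[c] for c in net)
        for net in nets
    )


def best_placement(cells, nets):
    """Try every ordering of the cells and keep the one with the shortest wiring."""
    return min(permutations(cells), key=lambda order: wire_length(order, nets))


cells = ["A", "B", "C", "D"]
nets = [("A", "C"), ("C", "D"), ("B", "D")]  # hypothetical connections

naive = wire_length(cells, nets)     # cells left in alphabetical order
best = best_placement(cells, nets)   # connected cells end up adjacent
```

With four cells there are only 24 orderings to try, but the count grows factorially, which is why real placement software relies on heuristics rather than brute force.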

Intel started using automated place and route techniques for the 386 processor, since it was much faster than manual layout and dramatically reduced the number of errors. Placement was done with a program called Timberwolf, developed by a Berkeley grad student. As one member of the 386 team said, "If management had known that we were using a tool by some grad student as a key part of the methodology, they would never have let us use it." Intel developed custom software for routing, using an iterative heuristic approach. Standard-cell design is still used in current processors, but the software is much more advanced.

A brief overview of CMOS

Before looking at the standard cell circuits in detail, I'll give a quick overview of how CMOS circuits are implemented. Modern processors are built from CMOS circuitry, which uses two types of transistors: NMOS and PMOS. The diagram below shows how an NMOS transistor is constructed. The transistor can be considered a switch between the source and drain, controlled by the gate. The source and drain regions (green) consist of silicon doped with impurities to change its semiconductor properties, forming N+ silicon. The gate consists of a layer of polysilicon (red), separated from the silicon by a very thin insulating oxide layer. Whenever polysilicon crosses active silicon, a transistor is formed.

Diagram showing the structure of an NMOS transistor.

The NMOS and PMOS transistors are opposite in their construction and operation. A PMOS transistor swaps the N-type and P-type silicon, so it consists of P+ regions in a substrate of N silicon. In operation, an NMOS transistor turns on when the gate is high, while a PMOS transistor turns on when the gate is low.4 An NMOS transistor is best at pulling its output low, while a PMOS transistor is best at pulling its output high. In a CMOS circuit, the transistors work as a team, pulling the output high or low as needed; the "C" in CMOS indicates this "Complementary" approach. NMOS and PMOS transistors are not entirely symmetrical, however, due to the underlying semiconductor physics. Instead, PMOS transistors need to be larger than NMOS transistors, which helps to distinguish PMOS transistors from NMOS transistors on the die.

The layers of circuitry in the Pentium

The construction of the Pentium is more complicated than the diagram above, with four layers of metal wiring that connect the transistors.5 Starting at the surface of the silicon die, the Pentium's transistors are similar to the diagram, with regions of silicon doped to change their semiconductor properties. Polysilicon wiring is created on top of the silicon. The most important role of the polysilicon is that when it crosses doped silicon, a transistor is formed, with the polysilicon as the gate. However, polysilicon is also used as wiring over short distances.

Above the silicon, four layers of metal connect the components: multiple metal layers allow signals to crisscross the chip without running into each other. The metal layers are numbered M1 through M4, with M1 on the bottom. A few rules control the wiring: a metal layer can connect with the layer above or below through a tungsten plug called a "via". Only the bottom metal, M1, can connect to the silicon or polysilicon, through a "contact". The layers usually alternate between horizontal wiring and vertical wiring (at least locally). Thus, a signal from a transistor may travel through M1, bounce up to M2 and M3 to cross other signals, and then go back down to M1 to connect to another transistor. As you can see, automated place and route software has a complicated task, producing millions of complicated wiring paths as densely as possible.

The diagram below shows how the layers appear on the chip. (This photo shows one of the rare spots on the chip where all the layers are visible.) The M4 metal layer on top of the chip is the thickest, so it is mostly used for power, ground, and clock signals rather than data. An M4 ground wire covers the top of this photo. The next layer down is M3. In this part of the chip, M3 lines run vertically. (Due to optical effects, the vertical M3 lines may look like they are on top of M4, but they are below.) The horizontal M2 metal lines are lower and appear brown rather than golden, due to the oxide layers that cover them. The bottom metal layer is M1. The vertical M1 lines are thick in this part of the chip because they provide power to the circuitry.

The Pentium is constructed with four layers of metal. Because the chip has a three-dimensional structure, I used focus stacking to get a clearer image.

The silicon and polysilicon are mostly obscured in the above photo. By removing all the metal layers, I obtained the image below. This image shows the same region as the image above, but it is hard to see the correlation because the metal layers almost completely obscure the silicon. The orderly columns of transistors reveal the standard-cell design. The irregular dark regions are doped silicon, which forms the chip's transistors. The dark or shiny horizontal bands are polysilicon. I will explain below how these regions form gates and other circuits.

A closeup of the silicon and polysilicon.

Inverter

The fundamental CMOS gate is an inverter, shown in the schematic below. The inverter is built from one PMOS transistor (top) and one NMOS transistor (bottom). If the gate input is a "1", the bottom transistor turns on, pulling the output to ground (0). A "0" input turns on the top transistor, pulling the output high (1). Thus, this two-transistor circuit implements an inverter.10

Schematic diagram of a CMOS inverter.
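
The inverter's behavior can be sketched in Python by treating each transistor as an ideal switch. This is a simplified digital model (real MOS transistors are not ideal switches), and the function names are my own.

```python
def nmos_on(gate):
    """An NMOS transistor conducts when its gate is high."""
    return gate == 1


def pmos_on(gate):
    """A PMOS transistor conducts when its gate is low."""
    return gate == 0


def inverter(a):
    """One PMOS pulls the output up to power; one NMOS pulls it down to ground."""
    pull_up = pmos_on(a)
    pull_down = nmos_on(a)
    # Exactly one transistor conducts, so the output is never floating or shorted.
    assert pull_up != pull_down
    return 1 if pull_up else 0


table = {a: inverter(a) for a in (0, 1)}
```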

The diagram below shows two views of how a standard-cell inverter appears on the Pentium die, with and without metal. The inverter consists of two transistors, just like the schematic above. The input is connected to the two polysilicon gates of the transistors. The metal output wire is connected to the two transistors (the left sides, specifically).

A standard-cell CMOS inverter in the Pentium.

In more detail, the image on the left includes the bottom (M1) metal layer, but I removed the other metal layers. Two thick metal lines at the top and bottom provide power and ground to the standard cells. The multiple dark circles are vias between the M1 metal layer and the metal layer on top (M2), providing a path for power and ground that eventually reaches the top (M4) metal layer and then the chip's pins. (The power and ground wires are thick to provide sufficient current to the circuitry while minimizing voltage drops and noise.) The small, lighter circles are contacts that connect the M1 metal layer to the underlying silicon or polysilicon. The input to the gate is provided from the M2 metal, which connects to the M1 layer at the indicated via. The smaller black dots at the top and bottom of this metal strip are contacts, connections to the underlying silicon.

For the image on the right, I removed all four metal layers, revealing the polysilicon and doped silicon. Recall that a transistor is constructed from regions of doped silicon with a stripe of polysilicon between the regions, forming the transistor's gate. The diagram shows the two transistors that form the inverter. When combined with the metal wiring, they form the inverter schematic shown earlier. The final feature is the "well tap". The PMOS transistors are constructed in a "well" of N-doped silicon. The well must be kept at a positive voltage, so periodic "taps" connect the well to the +3.3V supply. As mentioned earlier, the PMOS transistor is larger than the NMOS transistor, which allowed me to figure out the transistor types in the photo.

By the way, the chip is built with a 600 nm process, so the width of the polysilicon lines is approximately 600 nm. For comparison, the wavelength of visible light is 400 to 700 nm, with 600 nm corresponding to orange light. This explains why the microscope photos are somewhat fuzzy; the features are the size of the wavelength of light.6

NAND gate

Another common gate in the Pentium is the NAND gate. The schematic below shows a NAND gate with two PMOS transistors above and two NMOS transistors below. If both inputs are high, the two NMOS transistors turn on, pulling the output low. If either input is low, a PMOS transistor turns on, pulling the output high. (Recall that NMOS and PMOS are opposites: a high voltage turns an NMOS transistor on while a low voltage turns a PMOS transistor on.) Thus, the CMOS circuit below produces the desired output for the NAND function.

Schematic of a CMOS NAND gate.
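
The same ideal-switch view works for the NAND gate. In this Python sketch (a simplified model, with names of my choosing), the parallel PMOS transistors become an "or" and the series NMOS transistors become an "and".

```python
from itertools import product


def nand(a, b):
    """Switch-level model of a two-input CMOS NAND gate."""
    # Two PMOS transistors in parallel: either one conducting pulls the output high.
    pull_up = (a == 0) or (b == 0)
    # Two NMOS transistors in series: both must conduct to pull the output low.
    pull_down = (a == 1) and (b == 1)
    # The two networks are complementary: exactly one conducts for any input.
    assert pull_up != pull_down
    return 1 if pull_up else 0


truth_table = {(a, b): nand(a, b) for a, b in product((0, 1), repeat=2)}
```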

The implementation of the gate as a standard cell, below, follows the schematic. The left photo shows the circuit with one layer of metal (M1). A thick metal line provides 3.3 volts to the gate; it has two contacts that provide power to the two PMOS transistors. The metal line for ground is similar, except only one NMOS transistor is grounded. The thinner metal in the middle has two contacts to get the transistor outputs and a via to connect the output to the M2 metal layer on top. Finally, two tiny bits of M1 metal connect the inputs from the M2 layer to the underlying polysilicon.

Implementation of a CMOS NAND gate as a standard cell.

The right photo shows the circuit with all metal removed, showing the polysilicon and silicon. Since a transistor is formed where a polysilicon line crosses doped silicon, the two polysilicon lines create four transistors. Polysilicon functions both as local wiring and as the transistor gates. In particular, the inputs can be connected at the top or bottom of the circuit (or both), depending on what works best for wiring the circuitry. Note that the transistors are squashed together so the silicon in the middle is part of two transistors. An important asymmetry is that the output is taken from the middle of the PMOS transistors, wiring them in parallel, while the output is taken from the right side of the NMOS transistors, wiring them in series.

Zooming out a bit, the photo below shows three NAND gates. Although the underlying standard cell is the same for each one, there are differences between the gates. At the top, horizontal wiring links the inputs to M2 through vias. The length of each polysilicon line depends on the position of the metal. Moreover, in the middle of each gate, the metal connection to the output is positioned differently. Finally, note that the power wiring shifts upward in the upper right corner; this is to make room for a larger cell to the right. The point is that the standard cells aren't simply copies of each other, but are adjusted in each case to put the inputs, outputs, and power in the right location. Also note that these standard cells are not isolated, but are squeezed together so the PMOS transistors are touching. This optimization slightly increases the density.

Three NAND gates in the Pentium.

OR-NAND gate

The standard cell library includes some complex gates. For instance, the gate below is a 5-input OR-NAND gate, computing ~((A+B+C+D)⋅E). In the NMOS circuit, transistors A through D are paralleled while E is in series. The PMOS circuit is the opposite, with A through D in series and E in parallel. To provide sufficient current, the PMOS circuit has two sets of transistors for A through D, so the PMOS block is much larger than the NMOS block.

The OR-NAND gate as it appears on the die. The left image shows the M1 metal layer while the right image shows the silicon and polysilicon.
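
As a check on the series/parallel description, this Python sketch models the two transistor networks as ideal switches and compares the result against ~((A+B+C+D)⋅E) for all 32 input combinations. It is a logical model only; it ignores the doubled PMOS transistors, which affect current rather than logic.

```python
from itertools import product


def or_nand(a, b, c, d, e):
    """Switch-level model of the 5-input OR-NAND gate."""
    # NMOS network: A through D in parallel, with E in series.
    pull_down = bool((a or b or c or d) and e)
    # PMOS network: A through D in series, with E in parallel.
    # (A PMOS transistor conducts when its input is low.)
    pull_up = bool((not a and not b and not c and not d) or (not e))
    assert pull_up != pull_down  # exactly one network conducts
    return int(pull_up)


def reference(a, b, c, d, e):
    """The Boolean function the gate is supposed to compute."""
    return int(not ((a or b or c or d) and e))


mismatches = [
    bits for bits in product((0, 1), repeat=5)
    if or_nand(*bits) != reference(*bits)
]
```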

Latch

One of the key building blocks of the Pentium's circuitry is the latch. The idea of the latch is to hold one bit, controlled by the clock signal. A latch is "transparent": the latch's input immediately appears on the output while the clock is high. But when the clock is low, the latch holds its previous value. The latch is implemented with a feedback loop that passes the latch's output back into the latch. The heart of this latch circuit is the multiplexer (mux), which selects either the previous output (when the clock is low) or the new input (when the clock is high). The inverters amplify the feedback signal so it doesn't decay in the loop. An inverter also amplifies the output so it can drive other circuitry.

The circuit for a latch.
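
The latch's behavior can be summarized in a few lines of Python. This is a behavioral sketch only: the `Latch` class is my own naming, and the inverters are left out since they restore signal strength rather than change the stored value in this model.

```python
class Latch:
    """Transparent latch: follows the input while the clock is high, else holds."""

    def __init__(self, initial=0):
        self.q = initial

    def update(self, d, clock):
        # The multiplexer picks the new input (clock high)
        # or the fed-back previous output (clock low).
        self.q = d if clock == 1 else self.q
        return self.q


latch = Latch()
# (input, clock) pairs applied in sequence
trace = [latch.update(d, clk) for d, clk in [(1, 1), (0, 1), (1, 0), (1, 1), (0, 0)]]
```

In the trace, the output follows the input while the clock is high and stays put whenever the clock is low, regardless of the input.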

The circuit for a multiplexer is interesting since it uses "pass transistors". That is, the transistors simply pass their input through to the output, rather than pulling a signal to power or ground as in a typical logic gate. The schematic shows how this works. First, suppose that the select line is low. This will turn on the two transistors connected to the first input, allowing its level to flow to the output. Meanwhile, both transistors connected to the second input will be turned off, blocking that signal. But if the select line is high, everything switches. Now, the two transistors connected to the second input turn on, passing its level to the output. Thus, the multiplexer selects the first input if the control signal is low, and the second input if the control signal is high.

A multiplexer and its implementation in CMOS.
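
Here is a small Python sketch of the pass-transistor multiplexer. Each NMOS/PMOS pair (commonly called a transmission gate, a term I'm adding here) is modeled as an ideal switch that either passes its input or is disconnected, represented by `None`. The function names are hypothetical.

```python
def transmission_gate(signal, nmos_gate, pmos_gate):
    """Pass the signal if the NMOS/PMOS pair conducts; otherwise return None."""
    # The pair conducts when the NMOS gate is high and the PMOS gate is low.
    conducting = nmos_gate == 1 and pmos_gate == 0
    return signal if conducting else None


def mux(in0, in1, select):
    """Select in0 when select is low, in1 when select is high."""
    not_select = 1 - select  # supplied by the multiplexer's inverter
    out0 = transmission_gate(in0, nmos_gate=not_select, pmos_gate=select)
    out1 = transmission_gate(in1, nmos_gate=select, pmos_gate=not_select)
    # Exactly one pair conducts, so exactly one of out0/out1 is connected.
    return out0 if out0 is not None else out1
```

Note that the two pairs receive the select signal with opposite polarity, so one pair is always on and the other always off.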

The diagram below shows a multiplexer, part of a latch. On the left, an inverter feeds into one input of the multiplexer.7 On the right is the other input to the multiplexer. The output is taken from the middle, between the pairs of the transistors.

A multiplexer as it appears on the Pentium die.

Note that the multiplexer's circuit is opposite, in a way, to a logic gate. In a logic gate, you want either the NMOS transistor on or the PMOS transistor on, so the output is pulled low or high respectively. This is accomplished by giving the signals on the transistor gates the same polarity, so the same polysilicon line runs through both transistors. In a multiplexer, however, you want the corresponding PMOS and NMOS transistors to turn on at the same time, so they can pass the signal. This requires the signals on the transistor gates to have opposite polarity. One polysilicon line runs through the right PMOS transistor and the left NMOS transistor. The other polysilicon line runs through the left PMOS transistor and the right NMOS transistor, connected by metal wiring (not shown). The multiplexer includes an inverter to provide the necessary signal, but I cropped it out of the diagram above.

The flip-flop

The Pentium makes extensive use of flip-flops. A flip-flop is similar to a latch, except its clock input is edge-sensitive instead of level-sensitive. That is, the flip-flop "remembers" its input at the moment the clock goes from low to high, and provides that value as its output. This difference may seem unimportant, but it turns out to make the flip-flop more useful in counters, state machines, and other clocked circuits.

In the Pentium, a flip-flop is constructed from two latches: a primary latch and a secondary latch. The primary latch passes its value through while the clock is low and holds its value when the clock is high. The output of the primary latch is fed into the secondary latch, which has the opposite clock behavior. The result is that when the clock switches from low to high, the primary latch stops updating its output at the same time that the secondary starts passing this value through, providing the desired flip-flop behavior.

A standard-cell flip-flop.

The photo above shows a standard-cell flip-flop, with an intricate pattern of metal wiring connecting the various sub-components. There are a few variants; with minor logic changes, the flip-flop can have "set" or "reset" inputs, bypassing the clock to force the output to the desired state. (Set and reset functions are useful for initializing flip-flops to a desired value, for example when the processor starts up.)
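
The two-latch construction of the flip-flop can be sketched behaviorally in Python. The class names are my own, the model has no set or reset inputs, and it assumes both latches see the clock at the same instant.

```python
class Latch:
    """Level-sensitive latch, transparent when the clock equals transparent_when."""

    def __init__(self, transparent_when):
        self.transparent_when = transparent_when
        self.q = 0

    def update(self, d, clock):
        if clock == self.transparent_when:
            self.q = d  # pass the input through
        return self.q   # otherwise hold the previous value


class FlipFlop:
    """Edge-triggered flip-flop built from a primary and a secondary latch."""

    def __init__(self):
        self.primary = Latch(transparent_when=0)    # passes data while clock is low
        self.secondary = Latch(transparent_when=1)  # passes data while clock is high

    def update(self, d, clock):
        return self.secondary.update(self.primary.update(d, clock), clock)


ff = FlipFlop()
# (input, clock) pairs: the input changes while the clock is high in step 3
steps = [(1, 0), (1, 1), (0, 1), (0, 0), (0, 1)]
outputs = [ff.update(d, clk) for d, clk in steps]
```

In the third step the input drops to 0 while the clock is still high, but the output stays at 1; the new value only appears after the next low-to-high clock transition.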

The BiCMOS buffer

Although I've been discussing CMOS circuits so far, the Pentium was built with BiCMOS, a process that allows circuits to use bipolar transistors in addition to CMOS. By adding a few extra processing steps to the regular CMOS manufacturing process, bipolar (NPN and PNP) transistors can be created. The Pentium made extensive use of BiCMOS circuits since they reduced signal delays by up to 35%. Intel also used BiCMOS for the Pentium Pro, Pentium II, Pentium III, and Xeon processors (but not the Pentium MMX). However, as chip voltages dropped, the benefit from bipolar transistors dropped too and BiCMOS was eventually abandoned.

The schematic below shows a standard-cell BiCMOS buffer in the Pentium chip.8 This circuit is more complex than a CMOS buffer: it uses two inverters, an NPN pull-up transistor, an NMOS pull-down transistor, and a PMOS pull-up transistor.9

Reverse-engineered schematic of the BiCMOS buffer.

In the die images below, note the circular structure of the NPN transistor, very different from the linear structure of the NMOS and PMOS transistors and considerably larger. A sign of the buffer's high-current drive capacity is the output's thick metal wiring, much thicker than the typical signal wiring.

A BiCMOS buffer in the Pentium.

Conclusions

Standard-cell layout is extensively used in modern chips. Modern processors, with their nanometer-scale transistors, are much too small to study under a microscope. The Pentium, on the other hand, has features large enough that its circuits can be observed and reverse engineered. Of course, with 3.3 million transistors, the Pentium is too much for me to reverse engineer in depth, but I still find it interesting to study small-scale circuits and see how they were implemented. This post presented a small sample of the standard cells in the Pentium. The full standard-cell library is much larger, with dozens, if not hundreds, of different cells: many types of logic gates in a variety of sizes and drive strengths. But the fundamental design and layout principles are the same as the cells described here.

One unusual feature of the Pentium is its use of BiCMOS circuitry, which had a peak of popularity in the 1990s, right around the era of the Pentium. Although changing tradeoffs made BiCMOS impractical for digital circuitry, BiCMOS still has an important role in analog ICs, especially high-frequency applications. The Pentium in a sense is a time capsule with its use of BiCMOS.

I hope that you have enjoyed this look at some of the Pentium's circuits. I find it reassuring to see that even complex processors are made up of simple transistor circuits and you can observe and understand these circuits if you look closely.

For more on standard-cell circuits, I wrote about standard cells in an IBM chip and standard cells in the 386 (the 386 article has a lot of overlap with this one). Follow me on Twitter @kenshirriff or RSS for updates. I'm also on Mastodon occasionally as @kenshirriff@oldbytes.space.

Notes and references

  1. In this blog post, I'm focusing on the "P54C" version of the original Pentium processor. Intel produced many different versions of the Pentium, and it can be hard to keep them straight. Part of the problem is that "Pentium" is a brand name, with multiple microarchitectures, lines, and products. At the high level, the Pentium (1993) was followed by the Pentium Pro (1995), Pentium II (1997), Pentium III (1999), Pentium 4 (2000), and so on. The original Pentium used the P5 microarchitecture, a superscalar microarchitecture that was advanced but still executed instructions in order like traditional microprocessors. The Pentium Pro was a major jump, implementing a microarchitecture called P6 that broke instructions into micro-operations and executed them out of order using dataflow techniques. The next microarchitecture version was NetBurst, first used with the Pentium 4. NetBurst provided a deep pipeline and introduced hyper-threading, but it was disappointingly slow and was replaced by the Core microarchitecture. The Core microarchitecture is based on the P6 and is Intel's current microarchitecture.

    I'll focus now on the original Pentium, which went through several substantial revisions. The first Pentium product was the 80501 (codenamed P5), running at 60 or 66 MHz and using 5 volts. These chips were built with an 800 nm process and contained 3.1 million transistors.

    The power consumption of these chips was disappointing, so Intel improved the chip, producing the 80502. These chips, codenamed P54C, used 3.3 volts and ran at 75-120 MHz. The chip's architecture remained essentially the same but support was added for multiprocessing, boosting the transistor count to 3.3 million. The P54C had a much more advanced clock circuit, allowing the external bus speed to stay low (50-66 MHz) while the internal clock speed (and thus performance) climbed to 100 MHz. The chips were built with a smaller 600 nm process with four layers of metal, compared to the previous three. Visually, the die of the P54C is almost the same as the P5, with the additional multiprocessing logic at the bottom and the clock circuitry at the top. For this article, I examined the P54C, but the standard cells should be similar in other versions.

    Next, Intel moved to the 350 nm process, producing a smaller, faster Pentium chip, codenamed the P54CS; the die looks almost identical to the P54C (but smaller), with subtle changes to the bond pads. Another variant was designed for mobile use: the Pentium processor with "Voltage Reduction Technology" reduced power consumption by using a 2.9- or 3.1-volt supply for the core and a 3.3-volt supply to drive the I/O pins. These were built first with the 600 nm process (75-100 MHz) and then the 350 nm process (100-150 MHz).

    The biggest change to the original Pentium was the Pentium MMX, with part number 80503 and codename P55C. This chip extended the x86 instruction set with 57 new instructions for vector processing. It was built on a 350 nm process before moving to 280 nm, and had 4.5 million transistors. More obscure variants of the original Pentium include the P54CQS, P54CS, P54LM, P24T, and Tillamook, but I won't get into them. 

  2. Circuits that had a high degree of regularity, such as the arithmetic/logic unit (ALU) or register storage were typically constructed by manually laying out a block to implement the circuitry for one bit and then repeating the block as needed. Because a circuit was repeated 32 times for the 32-bit processor, the additional effort was worthwhile. 

  3. An alternative layout technique is the gate array, which doesn't provide as much flexibility as a standard cell approach. In a gate array (sometimes called a master slice), the chip had a fixed array of transistors (and often resistors). The chip could be customized for a particular application by designing the metal layer to connect the transistors as needed. The density of the chip was usually poor, but gate arrays were much faster to design, so they were advantageous for applications that didn't need high density or produced a relatively small volume of chips. Moreover, manufacturing was much faster because the silicon wafers could be constructed in advance with the transistor array and warehoused. Putting the metal layer on top for a particular application could then be quick. Similar gate arrays used a fixed arrangement of logic gates or flip-flops, rather than transistors. Gate arrays date back to 1967.

  4. The behavior of MOS transistors is complicated, so the description above is simplified, just enough to understand digital circuits. In particular, MOS transistors don't simply switch between "on" and "off" but have states in between. This allows MOS transistors to be used in a wide variety of analog circuits. 

  5. The earliest Pentiums had three layers of metal wiring, but Intel moved to a four-layer process with the P54C die, the version that I'm examining. 

  6. To get this level of magnification with my microscope, I had to use an oil immersion lens. Instead of looking at the chip in air, as with a normal lens, I had to put a drop of special microscope oil on the chip. I carefully lowered the lens until it dipped into the oil (making sure I didn't crash the lens into the chip). The purpose of the oil is that its index of refraction is almost the same as glass, much higher than air. This gives the lens a higher "numerical aperture", allowing the lens to resolve smaller details. 

  7. For completeness, I'll mention that the inverter feeding the multiplexer isn't exactly an inverter. Specifically, the inverter's two transistors are not tied together to produce an output. Instead, the inverter's NMOS transistor provides an input to the multiplexer's NMOS transistor and likewise, the PMOS transistor provides an input to the PMOS transistor. The omission of this connection does not affect the circuit's behavior, but it makes calling the circuit an inverter and a multiplexer a bit of an abstraction. 

  8. Intel called this gate "BiNMOS" rather than "BiCMOS" because it uses a bipolar transistor and an NMOS transistor to drive the output, rather than two bipolar transistors. The Pentium's BiCMOS circuitry is described in a conference paper, showing a second NPN transistor to protect the first one. I don't see the second transistor on the die so the two transistors may be implemented in one silicon structure. Reference: R. F. Krick et al., “A 150 MHz 0.6 µm BiCMOS superscalar microprocessor,” IEEE Journal of Solid-State Circuits, vol. 29, no. 12, Dec. 1994, doi:10.1109/4.340418

  9. The Pentium contains multiple types of BiCMOS standard cells, which I'll show in this footnote. The cell below is an inverter. It is similar to the BiCMOS buffer described earlier, except it lacks the first inverter in the circuit. To make room for the NPN transistor on the left, the PMOS transistors are shifted to the right. As a result, they don't line up with the PMOS transistors in other cells. This is a break from the traditional orderliness of standard cells.

    A BiCMOS inverter with PMOS on the left and NMOS on the right. The input is at the bottom and the output is in the middle.

    The BiCMOS inverter below is similar, except it uses two NPN transistors, providing more output drive. I removed the M1 metal layer to provide a better view of the transistors.

    A BiCMOS inverter with two NPN transistors. The PMOS transistors are in the lower left and the NMOS transistors are in the lower right.

    Another interesting BiCMOS circuit is the D flip-flop with enable and BiCMOS output, shown below. This is similar to the earlier flip-flop except it has an enable input, allowing it to either load a new value triggered by the clock, or to hold its earlier value. This allows the flip-flop to remember a value for more than one clock cycle. The additional functionality is implemented by another multiplexer, selecting either the old value or the new value. (This multiplexer is, in a way, one level higher than the multiplexer in each latch.) The transistor for the BiCMOS output is in the upper right, poking out from under the metal. (This circuit might be implemented as two independent cells, one for the flip-flop and one for the driver; I'm not sure.)

    A D flip-flop in the Pentium.

  10. One puzzling inverter variant is used in a gate I'll call the "slow buffer". This buffer consists of two inverters, so it passes its input through to the output, buffered. The strange part is that the first inverter uses transistors with wide gates, which makes these transistors much weaker than regular transistors. As a result, the first inverter will be slow to switch states. My guess is that this circuit is used to delay signals, for example, to keep a signal aligned with another signal that is delayed by multiple logic gates.

    The buffer consists of two inverters. The first inverter uses wide, weak transistors.

    You might expect that larger transistors would be stronger, not weaker. The problem is that these transistors are larger in the wrong dimension. If you make the gate wider, the effect is similar to multiple transistors in parallel, providing more current. But if you make the gate longer (as in this case), the effect is similar to multiple transistors in series, so the resistances add and the total current is reduced. In most cases, transistors are constructed with the smallest gate length possible, which is determined by the manufacturing process, so the transistors here are unusual. This chip was manufactured with a 600 nm process, so the smallest gate length is approximately 600 nm. The gate width (the normal direction for variation) varies dramatically depending on the circuit, optimized to provide maximum performance. 





[#] Tue Jul 16 2024 09:43:53 UTC from rss <>

Subject: Inside an IBM/Motorola mainframe controller chip from 1981

In this article, I look inside a chip in the IBM 3274 Control Unit.1 But before I discuss the chip, I need to give some background on mainframes. (I didn't completely analyze the chip, so don't expect a nice narrative or solid conclusions.)

Die photo of the Motorola/IBM SC81150 chip. Click this image (or any other) for a larger version.

IBM's vintage mainframes were extremely underpowered compared to modern computers; a System/370 mainframe ran well under 1 million instructions per second, while a modern laptop executes billions of instructions per second. But these mainframes could support rooms full of users, while my 2017 laptop can barely handle one person.2 Mainframes achieved their high capacity by offloading much of the data entry overhead so the mainframe could focus on the "important" work. The mainframe received data directly into memory in bulk over high-speed I/O channels, without needing to handle character-by-character editing. For instance, a typical data entry terminal (a "3270") let the user update fields on the screen without involving the computer. When the user had filled out the screen, pressing the "Enter" key sent the entire data record to the mainframe at once. Thus, the mainframe didn't need to process every keystroke; it only dealt with complete records. (This is also why many modern keyboards have an "Enter" key.)

A room with IBM 3179 Color Display Stations, 1984. Note that these are terminals, not PCs. From 3270 Information Display System Introduction.

But that was just the beginning of the hierarchy of offloaded processing in a mainframe system. Terminals weren't attached directly to the mainframe. You could wire 8 terminals to a terminal multiplexer (such as the 3299). This would in turn be connected to a 3274 Control Unit that merged the terminal data and handled the network protocols. The Control Unit was connected to the mainframe's channel processor which handled I/O by moving data between memory and peripherals without slowing down the CPU. All these layers allowed the mainframe to focus on the important data processing while the layers underneath dealt with the details.3

An overview of the IBM 3270 Information Display System attachment. The yellow highlights indicate the 3274 Control Unit. From 3270 Information Display System: Introduction.

The 3274 Control Unit (highlighted above) is the source of the chip I examined. The purpose of the Control Unit "is to take care of all communication between the host system and your organization's display stations and printers". The diagram above shows how terminals were connected to a mainframe, with the 3274 Control Unit (indicated by arrows) in the middle. The 3274 was an all-purpose box, handling terminals, printers, modems, and encryption (if needed). It could communicate with the mainframe at up to 650,000 characters per second. The control unit (shown below) is a boring beige box. The control panel is minimal since people normally didn't interact with the unit. On the back are coaxial connectors for the lines to the terminals, as well as connectors to interface with the computer and other peripherals.

An IBM 3274-41D Control Unit. From bitsavers.

The Keystone II board

In 1983, IBM announced new Control Unit models with twice the speed: these were the Model 41 and Model 61. These units were built around a board called Keystone II, shown below. The board is constructed with IBM's peculiar PCB style. The board is arranged as a grid of squares with the PCB traces too small to see unless you zoom in. Most of the decoupling capacitors are in IBM's thin, rectangular packages, although I see a few capacitors in more standard blue packages. IBM is almost a parallel universe with its unusual packaging for ICs and capacitors as well as the strange circuit board appearance.

The Keystone II board. The box is labeled Keystone II FCS [i.e. First Customer Shipment] July 23, 1982. Photo from bitsavers, originally from Bob Roberts.

Most of the chips on the board are IBM chips packaged in square aluminum cans, known as MST (Monolithic System Technology). The first line on each package is the IBM part number, which is usually undocumented. The empty socket can hold a ROS chip; ROS is Read-Only Store, known as ROM to people outside IBM. The Texas Instruments ICs in the upper right are easier to identify; the 74LS641 chips are octal bus transceivers, presumably connecting this board to the rest of the system. Similarly, the 561 5843 is a 74S240 octal bus driver while the 561 6647 chips are 74LS245 octal bus transceivers.

The memory chips on the left side of this board are interesting: each one consists of two "piggybacked" 16-kilobit DRAM chips. IBM's part number 8279251 corresponds to the Intel 4116 chip, originally made by Mostek. With 18 piggybacked chips, the board holds 64 kilobytes of parity-protected memory.
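
As a quick check of that arithmetic, here is a minimal sketch. It assumes one parity bit per 8-bit byte (the text above says only that the memory is parity-protected) and takes "16-kilobit" to mean 16,384 bits; the constant names are mine.

```python
PACKAGES = 18              # piggybacked packages on the board
CHIPS_PER_PACKAGE = 2      # two DRAM chips stacked in each package
BITS_PER_CHIP = 16 * 1024  # a 16-kilobit DRAM
BITS_PER_STORED_BYTE = 9   # assumption: 8 data bits plus 1 parity bit

total_bits = PACKAGES * CHIPS_PER_PACKAGE * BITS_PER_CHIP
data_bytes = total_bits // BITS_PER_STORED_BYTE

print(total_bits)          # 589824 raw bits
print(data_bytes // 1024)  # 64 kilobytes of data
```

Under those assumptions, the 36 chips work out to exactly 64 kilobytes of data with the ninth bit of each byte left over for parity.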

The photo below shows the Keystone II board mounted in the 3274 Control Unit. The board is in slot E towards the left and the purple Motorola IC is visible.

The Keystone II card in slot E of a 3274-41D Control Unit. Photo from bitsavers.

The Motorola/IBM chip

The board has a Motorola chip in a purple ceramic package; this is the chip that I examined. Popping off the golden lid reveals the silicon die underneath. The package has the part number "SC81150R", indicating a Motorola Special/Custom chip. This part number is also visible on the die, as shown below.

The corner of the die is marked with the SC81150 part number. Bond pads and bond wires are also visible.

While the outside of the IC is labeled "Motorola", there are no signs of Motorola internally. Instead, the die is marked "IBM" with the eight-striped logo. My guess is that IBM designed the chip and Motorola manufactured it.

The IBM logo on the die.

The diagram below shows the chip with some of the functional blocks identified. Around the outside are the bond pads and the bond wires that are connected to the chip's grid of pins. At the right is the 16×16 block of memory, along with its associated control, byte swap, and output circuitry. The yellowish-white lines are the metal layer on top of the chip that provides the chip's wiring. The thick metal lines distribute power and ground throughout the chip. Unlike modern chips, this chip only has a single metal layer, so power and ground distribution tends to get in the way of useful circuitry.

The die with some functional blocks identified.

The chip is centered around a 16-bit bus (yellow line) that connects many parts of the chip. To write to the bus, a circuit pulls bus lines low. The bus lines are kept high by default by 16 pull-up transistors. This approach was fairly common in the NMOS era. However, performance is limited by the relatively weak pull-up current, making bus lines slow to go high due to R-C delays. For higher performance, some chips would precharge the bus high during one clock cycle and then pull lines low during the next cycle.
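
The pull-up bus behaves like a wired AND: a line stays high unless some driver pulls it low. The sketch below is only a logical model of that idea, not the chip's circuitry; the function name and the convention that a 0 bit means "pull this line low" are my own.

```python
BUS_WIDTH = 16
ALL_HIGH = (1 << BUS_WIDTH) - 1  # every line held high by its pull-up

def bus_value(drivers):
    """Return the bus state given the words driven onto it.

    In each driver word, a 0 bit pulls that line low and a 1 bit
    leaves the line to the pull-up.
    """
    value = ALL_HIGH
    for word in drivers:
        value &= word  # any driver pulling low wins
    return value

print(hex(bus_value([])))                # 0xffff: idle bus floats high
print(hex(bus_value([0xFF00, 0x0FF0])))  # 0xf00: low wherever either pulls low
```

This model ignores timing entirely, so it says nothing about the slow rising edges described above.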

The two groups of I/O pins at the bottom are connected to the input buffer on the left and the output buffer on the right. The input buffer includes XOR circuits to compute the parity of each byte. Curiously, only 6 bits of the inputs are connected to the main bus, although other circuits use all 8 bits. The buffer also has a circuit to test for a zero value, but only using 5 of the bits.

I've put red boxes around the numerous PLAs, which can be identified by their grids of transistors. This chip has an unusually large number of PLAs. Eric Schlaepfer hypothesizes that the chip was designed on a prototype circuit board using commercial PAL chips for flexibility, and then they transferred the prototype to silicon, preserving the PLA structure. I didn't see any obvious structure to the PLAs; they all seemed to have wires going all over.

The miscellaneous logic scattered around the chip includes many latches and bus drivers; the latch circuit is similar to the memory cells. I didn't fully reverse-engineer this circuitry but I didn't see anything that looked particularly interesting, such as an ALU or counter. The circuitry near the PLAs could be latches as part of state machines, but I didn't investigate further.

I was hoping to find a recognizable processor inside the package, maybe a Motorola 6809 or 68000 processor. Instead, I found a complicated chip that doesn't appear to be a processor. It has a 16×16 memory block along with about 20 PLAs (Programmable Logic Arrays), a curiously large number. PLAs are commonly used in processors for decoding instructions, since they can match bit patterns. I couldn't find a datapath in the chip; I expected to see the ALU and registers organized in a large but regular 8-bit or 16-bit block of circuitry. The chip doesn't have any ROM4 so there's no microcode on the chip. For these reasons, I think the chip is not a processor or microcontroller, but a specialized data-handling chip, maybe using the PLAs to interpret bits of a protocol.

The chip is built with NMOS technology, the same as the 6502 and 8086 for instance, rather than CMOS technology that is used in modern chips. I measured the transistor features and the chip appears to be built with a 3.5 µm process (not nm!), which Motorola also used for the 68000 processor (1979).

The memory buffer

The chip has a 16×16 memory buffer, which could be a register file or a FIFO buffer. One interesting feature is that the buffer is triple-ported, so it can handle two reads and one write at the same time. The buffer is implemented as a grid of cells, each storing one bit. Each row corresponds to a 16-bit word, while each column corresponds to one bit in a word. Horizontal control lines (made of polysilicon) select which word gets written or read, while vertical bit lines of metal transmit each bit of the word as it is written or read.

The microscope photo below shows two memory cells. These cells are repeated to create the entire memory buffer. The white vertical lines are metal wiring. The short segments are connections within a cell. The thicker vertical lines are power and ground. The thinner lines are the read and write bit lines. The silicon die itself is underneath the metal. The pinkish regions are active silicon, doped to make it conductive. The speckled golden lines are polysilicon, a layer between the silicon and the metal. Polysilicon has two roles: most importantly, when polysilicon crosses active silicon, it forms the gate of a transistor. But polysilicon is also used as wiring, important since this chip only has one layer of metal. The large, dark circles are contacts, connections between the metal layer and the silicon. Smaller square regions are contacts between silicon and polysilicon.

Two memory cells, side by side, as they appear under the microscope.

It was too difficult to interpret the circuits when they were obscured by the metal layer so I dissolved the metal layer and oxide with hydrochloric acid and Armour Etch respectively. The photo below shows the die with the metal removed; the greenish areas are remnants in areas where the metal was thick, mostly power and ground supplies. The dark regions in this image are regions of doped silicon. These are the active areas of the chip, showing the blocks of circuitry. There are also some thin lines of polysilicon wiring. The memory buffer is the large block on the right, just below the center.

The chip with the metal layer removed. Click to zoom in on the image.

Like most implementations of static RAM, each storage cell of the buffer is implemented with cross-coupled inverters, with the output of one inverter feeding into the input of the other. To write a new value to the cell, the new value simply overpowers the inverter output, forcing the cell to the new state. To support this, one of the inverters is designed to be weak, generating a smaller signal than a regular inverter. Most circuits that I've examined create the inverter by using a weak transistor, one with a longer gate. This chip, however, uses a circuit that I haven't seen before: an additional transistor, configured to limit the current from the inverter.

The schematic below shows one cell. Each cell uses ten transistors, so it is a "10T" cell. To support multiple reads and writes, each row of cells has three horizontal control signals: one to write to the word, and two to read. Each bit position has one vertical bit line to provide the write data and two vertical bit lines for the data that is read. Pass transistors connect the bit lines to the selected cells to perform a read or a write, allowing the data to flow in or out of the cell. The symbol that looks like an op-amp is a two-transistor NMOS buffer to amplify the signal when reading the cell.
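
To make the three ports concrete, here is a small behavioral sketch of a 16×16 buffer with one write port and two read ports. It models only the logical behavior; the class and method names are hypothetical, and I'm assuming that a read of the row being written returns the old value, which I haven't verified on the chip.

```python
class TriplePortBuffer:
    """Behavioral model: 16 words of 16 bits, two read ports, one write port."""

    def __init__(self, rows=16, width=16):
        self.mask = (1 << width) - 1
        self.cells = [0] * rows  # one integer per row (word)

    def cycle(self, read1_row, read2_row, write_row=None, write_data=0):
        # Each row index stands in for one horizontal select line.
        # Assumption: reads see the value stored before this cycle's write.
        out1 = self.cells[read1_row]
        out2 = self.cells[read2_row]
        if write_row is not None:
            self.cells[write_row] = write_data & self.mask
        return out1, out2

buf = TriplePortBuffer()
buf.cycle(0, 1, write_row=3, write_data=0xBEEF)  # write while reading rows 0 and 1
print([hex(v) for v in buf.cycle(3, 3)])         # ['0xbeef', '0xbeef']
```

Both read ports can select the same row or different rows in a single cycle, which is what the two separate read control lines and read bit lines provide.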

Schematic of one memory cell.

With the metal layer removed, it is easier to see the underlying silicon circuitry and reverse-engineer it. The diagram below shows the silicon and polysilicon for one storage cell, corresponding to the schematic above. (Imagine vertical metal lines for power, ground, and the three bitlines.)

One memory cell with the metal layer removed. I etched the die a few seconds too long so some of the polysilicon is very thin or missing.

The output from the memory unit contains a byte swapper. A 16-bit word is generated with the left half from the read 1 output and the right half from the read 2 output, but the bytes can be swapped. This was probably used to read an aligned 16-bit word if it was unaligned in memory.
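
Here is a rough sketch of how such a byte swapper could assemble an unaligned word. It is a guess at the behavior, not something I traced from the die: I'm assuming the left half is the upper byte of the read 1 word, the right half is the lower byte of the read 2 word, and the function name is hypothetical.

```python
def output_word(read1, read2, swap=False):
    """Combine two 16-bit read-port values into one 16-bit output word."""
    left = (read1 >> 8) & 0xFF  # assumption: upper byte of read port 1
    right = read2 & 0xFF        # assumption: lower byte of read port 2
    if swap:
        left, right = right, left
    return (left << 8) | right

# Two consecutive rows hold the bytes 11 22 33 44. The word that starts at
# the second byte (22 33) straddles both rows.
rows = [0x1122, 0x3344]
unaligned = output_word(rows[1], rows[0], swap=True)
print(hex(unaligned))  # 0x2233
```

With the swap enabled, reading two adjacent rows at once yields the straddling word in a single step, which fits the guess about unaligned reads.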

Parity circuits

In the lower right part of the chip are two parity circuits, each computing the parity of an 8-bit input. The parity of an input is computed by XORing the bits together through a tree of 2-input XOR gates. First, four gates process pairs of input bits. Next, two XOR gates combine the outputs of the first gates. Finally, an XOR gate combines the two previous outputs to generate the final parity.
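
The three-level tree can be sketched in a few lines. The function name is mine, and which pairs of bits feed which first-level gate is an assumption (the parity result doesn't depend on the pairing).

```python
def parity8(byte):
    """Parity of an 8-bit value using a tree of seven 2-input XORs."""
    bits = [(byte >> i) & 1 for i in range(8)]
    # Level 1: four gates, each combining a pair of input bits.
    level1 = [bits[i] ^ bits[i + 1] for i in range(0, 8, 2)]
    # Level 2: two gates combining the level-1 outputs.
    level2 = [level1[0] ^ level1[1], level1[2] ^ level1[3]]
    # Level 3: one gate producing the final parity.
    return level2[0] ^ level2[1]

print(parity8(0b10110000))  # 1: three bits set, so odd parity
print(parity8(0b10110001))  # 0: four bits set
```

Each 8-bit circuit uses 4 + 2 + 1 = 7 gates, so two of them account for 14 XOR gates.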

The arrangement of the 14 XOR gates to compute parity of the two 8-bit values A and B.

The schematic below shows how an XOR gate is built from a NOR gate and an AND-NOR gate. If both inputs are 0, the first NOR gate forces the output to 0. If both inputs are 1, the AND gate forces the output to 0. Thus, the circuit computes XOR. Each labeled block above implements the XOR circuit below.
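
The gate-level description can be checked with a short truth-table sketch. The function names are mine; the structure (a NOR gate feeding an AND-NOR gate) follows the description above.

```python
def nor(a, b):
    return int(not (a or b))

def and_nor(a, b, c):
    """AND-NOR gate: NOT((a AND b) OR c)."""
    return int(not ((a and b) or c))

def xor_gate(a, b):
    # The NOR output is 1 only when both inputs are 0, forcing the output low.
    # The AND term is 1 only when both inputs are 1, also forcing it low.
    return and_nor(a, b, nor(a, b))

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_gate(a, b))
```

The output is 1 only for the two mixed input cases, matching XOR.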

Schematic of an XOR gate.

Conclusion

My conclusion is that the processor for the Keystone II board is probably one of the other chips, one of the IBM metal-can MST packages, and this chip helps with data movement in some way. It would be possible to trace out the complete circuitry of the chip and determine exactly how it functions, but that is too time-consuming a project for this relatively obscure chip.

Follow me on Twitter @kenshirriff or RSS for more chip posts. I'm also on Mastodon occasionally as @kenshirriff@oldbytes.space. Thanks to Al Kossow for providing the chip and Dag Spicer for providing photos. Thanks to Eric Schlaepfer for discussion.

Notes and references

  1. The 3274 Control Unit was replaced by the 3174 Establishment Controller, introduced in 1986. An "Establishment Controller" managed a cluster of peripherals or PCs connected to a host mainframe, essentially a box that provided a "kitchen-sink" of functionality including terminal support, local disk storage, Ethernet or token-ring networking, ASCII terminal support, encryption/decryption, and modem support. These units ranged from PC-sized boxes to mini-fridge-sized boxes, depending on how much functionality was required. 

  2. I'm serious that my laptop can barely handle one person; my 2017 MacBook Air starts dropping characters if it has even a moderate load, and I have to start one-finger typing. You would think that a 1.8 GHz dual-core i5 processor could handle more than 2 characters per second. I don't know if there's something wrong with it, or if modern software just has too much overhead. Don't worry, I upgraded and do most of my work on a faster, more recent laptop. 

  3. The IBM hardware model had the CPU focusing on the big picture, while the hierarchy of boxes underneath processed data, performed storage, handled printing, and so forth. In a sense, this paralleled the structure of offices in that era, where executives had assistants and secretaries to do the tedious work for them: typing, filing, and so forth. Nowadays, the computer hierarchy and the office hierarchy are both considerably flatter. Maybe there's a connection? 

  4. A ROM and a PLA are similar in many ways. The general distinction is that a ROM activates one word (row) at a time, while a PLA can activate multiple rows at a time and combine the values, giving more flexibility. A ROM generally has a binary decoder to select the row. This decoder can be recognized by its binary structure: transistors alternating by 1's, by 2's, by 4's, and so forth. 





[#] Sat Aug 03 2024 07:48:07 UTC from rss <>

Subject: Reverse engineering the 59-pound printer onboard the Space Shuttle

The Space Shuttle contained a bulky printer so the astronauts could receive procedures, mission plans, weather reports, crew activity plans, and other documents. Needed for the first Shuttle launch in 1981, this printer was designed in just 7 months, built around an Army communications terminal. Unlike modern printers, the Shuttle's printer contains a spinning metal drum with raised characters, allowing it to rapidly print a line at a time.

The Space Shuttle's Interim Teleprinter. The horizontal rails allowed it to be mounted in a Space Shuttle stowage locker. Click this image (or any other) for a larger version.

This printer is known as the Space Shuttle Interim Teleprinter System.1 As the name "Interim" suggests, this printer was intended as a stop-gap measure, operating for a few flights until a better printer was operational. However, the teleprinter proved to be more reliable than its replacement, so it remained in use as a backup for over 50 flights, often printing thousands of lines per flight. This didn't come cheap: with a Shuttle flight costing $27,000 per pound, putting the 59-pound teleprinter in space cost over $1.5 million per flight.

Pilot Overmyer reading a printout from the teleprinter, STS-5, November 16, 1982. From National Archives. The description says that this output is from the Text and Graphics System, but the yellow paper and the date show that this is the Interim Teleprinter.

We obtained access to a Shuttle teleprinter (probably a development system that remained on the ground) and wanted to put it into operation. I had to reverse engineer three of the boards inside the printer to determine the data format the printer accepted: serial data encoded into audio. But after analyzing the printer and performing a lot of maintenance, we succeeded in getting the printer to print. In this article, I'll describe the Shuttle's Interim Teleprinter, explain its circuitry and drum-based printing mechanism, and show it in operation.

History of the Shuttle's Interim Teleprinter

The motivation for the teleprinter goes back to the Apollo program. During Apollo missions, the only way to send information to the astronauts was by talking to them over the radio and having the astronauts write down the data. NASA decided that the Space Shuttle should include a mechanism to send text and images to the astronauts, a 78-pound, high-tech fax machine called the Uplink Text & Graphics System (TAGS). A high-resolution grayscale image was sent to the Shuttle as a digital data stream. Onboard the Shuttle, a squat CRT displayed the image one line at a time and a fiber-optic faceplate transferred each line to light-sensitive silver emulsion paper. The paper was developed by passing it over a hot roller at 260ºF for 25 seconds, creating a permanent image.

The one flaw in this plan was that sending the digital image to the Shuttle required the Tracking and Data Relay Satellite System (TDRS), which due to delays wouldn't be ready until the sixth Shuttle flight. (The TDRS was a space-based replacement for the worldwide network of ground stations that was used during Apollo.) As a result, NASA decided just seven months before the first Shuttle launch that they needed an interim system "for transmission of real-time, flight-plan changes and other operational data to the crew."2

The Shuttle teleprinter is the result of this rushed effort to create a printer that could work over the existing audio channel rather than the digital TDRS satellite. Due to the time pressure, the Shuttle teleprinter needed to be based on an off-the-shelf printer. Thermal and electrostatic printers were rejected due to toxicity and flammability problems. (The Shuttle teleprinter used a roll of yellowish paper, which required a NASA waiver due to its flammability, a concern ever since the Apollo-1 disaster.)

The AN/UGC-74 military communications terminal. This terminal was developed by the Army but also used by the Navy and Air Force. Image from the Operator's Manual, TM 11-5815-602-10.

The decision was made to use a military communications terminal, the AN/UGC-74 "Tactical Teletype".3 The terminal's interfacing was very flexible, supporting serial data in either ASCII or Baudot format, with multiple configurations and baud rates (up to 1200 baud), using either a current-loop or voltage signals. The military terminal supported two-way communication, so it had a keyboard. Remarkably, the terminal also implemented a word processor, controlled by a Motorola 6800 microprocessor (ancestor of the famous MOS 6502). The word processor allowed messages to be composed offline, minimizing the radio transmission time, which was important in a hostile environment. As will be seen, this 100-pound military system required many large changes to be usable on the Space Shuttle, most visibly removing the keyboard.

The printing mechanism

The teleprinter uses a spinning drum with raised characters, shown below.4 To print a character, the printer fires a hammer, forcing the inked ribbon and paper against the raised character on the drum. The drum is 80 characters wide, matching the line length, and there are 80 corresponding hammers, one for each print position. The drum has 64 printable characters, wrapped around each position of the drum.

The printer's rotating drum has 64 raised characters in each column. The characters spiral around the drum and are in reverse order, minimizing the chance that a line will fire all the hammers near-simultaneously.

The printer prints a line at a time, not instantaneously, but during each revolution of the drum. When the drum makes one complete revolution, each of the 64 characters passes by each print position once. Printing requires precise timing of the hammers to strike the right character on the drum as it whizzes by. The printer control circuitry triggers each hammer at the proper time, when the desired character on the drum is lined up with the hammer, producing the desired text.5
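
The timing idea can be illustrated with a toy simulation. This is not the printer's actual control logic: the character order on the drum, the one-character-per-column spiral offset, and all the names here are assumptions for illustration (the real drum also runs in reverse order, which I ignore).

```python
DRUM_CHARS = 64
COLUMNS = 80
# Hypothetical drum order: ASCII 0x20 through 0x5F.
ALPHABET = "".join(chr(0x20 + i) for i in range(DRUM_CHARS))

def char_under_hammer(step, column, spiral=True):
    """Character facing a column's hammer at a given rotation step."""
    offset = column if spiral else 0  # assumed spiral: shift by one per column
    return ALPHABET[(step + offset) % DRUM_CHARS]

def print_line(text, spiral=True):
    """Simulate one drum revolution; return (printed line, peak hammers fired)."""
    text = text.upper().ljust(COLUMNS)[:COLUMNS]
    paper = [" "] * COLUMNS
    peak = 0
    for step in range(DRUM_CHARS):
        fired = 0
        for col, wanted in enumerate(text):
            # Nothing is printed for a space, so its hammer never fires.
            if wanted != " " and char_under_hammer(step, col, spiral) == wanted:
                paper[col] = wanted
                fired += 1
        peak = max(peak, fired)
    return "".join(paper), peak

print(print_line("HELLO WORLD")[0].rstrip())  # HELLO WORLD
print(print_line("A" * 80, spiral=False)[1])  # 80 hammers fire at once
print(print_line("A" * 80, spiral=True)[1])   # 2
```

In this toy model, a line of identical characters fires all 80 hammers in the same instant on a straight drum, while the spiral spreads them out so that at most two fire together.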

The character set is slightly different between the military printer and the Shuttle printer. The military drum had 64 ASCII characters (upper-case letters only, numbers, and special characters). The drum doesn't contain an explicit space character, since nothing is printed for a space. In its place, the drum has a diamond "◊", used as a special character to indicate a parity error or other error. The drum for the Shuttle teleprinter replaces 10 ASCII special characters with symbols that are more useful to the Shuttle, such as Greek letters for angles. Specifically, the characters ;@[\]^!"#$ are replaced by θ✓‾↑↓~αβΔϕ.
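
As a small illustration, the substitution can be written as a translation table. This sketch assumes the two character lists above correspond position by position (so that, for example, # becomes Δ); the function name is hypothetical.

```python
ASCII_SPECIALS = ';@[\\]^!"#$'
SHUTTLE_SYMBOLS = "θ✓‾↑↓~αβΔϕ"

# Assumption: the nth ASCII character is replaced by the nth Shuttle symbol.
TO_SHUTTLE = str.maketrans(ASCII_SPECIALS, SHUTTLE_SYMBOLS)

def as_printed_on_shuttle(text):
    """Show how text sent as ASCII would appear on the Shuttle's drum."""
    return text.upper().translate(TO_SHUTTLE)

print(as_printed_on_shuttle("#v = 5"))  # ΔV = 5
```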

With the teleprinter disassembled, the 20 hammer cards are visible at the front. Two hammer driver cards are to the right of the hammer cards.

The video below shows a closeup of the hammers as they strike the paper to print text. The text is the teleprinter's built-in test message: "THE LAZY YELLOW DOG WAS CAUGHT BY THE SLOW RED FOX AS HE LAY SLEEPING IN THE SUN". This test message is based on the traditional quick brown fox..., which is a pangram, containing all 26 letters, but the teleprinter's test sentence is missing J, K, M, Q, and V. However, the test message is exactly 80 characters long and replaces spaces with the diamond "◊", so it is effective for verifying that all 80 columns work.
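
These claims about the test message are easy to check with a few lines of code (the variable names are mine):

```python
import string

TEST_MESSAGE = ("THE LAZY YELLOW DOG WAS CAUGHT BY THE SLOW RED FOX "
                "AS HE LAY SLEEPING IN THE SUN")

# Letters of the alphabet that never appear in the message.
missing = sorted(set(string.ascii_uppercase) - set(TEST_MESSAGE))
# The printer substitutes a diamond for each space in the test message.
printed = TEST_MESSAGE.replace(" ", "◊")

print(len(TEST_MESSAGE))  # 80
print("".join(missing))   # JKMQV
print(printed.count("◊"))  # 17
```

With the 17 spaces printed as diamonds, every one of the 80 columns gets a hammer strike.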

The electronics

The photo below shows the circuitry inside the teleprinter, looking down from above. At the left are the three interface boards, custom boards that demodulate the incoming audio signal. In front of the interface boards are large inductors to filter the incoming power. Hidden beneath them, a solid-state relay controls the power to the rest of the printer, implementing the low-power standby mode. In the middle, the blue board is the surprisingly complex switching power supply, mounted on a thick metal plate for cooling. Normally, the large roll of paper is mounted above the power supply board. At the right, four large circuit boards implement the main logic of the printer: a printer driver board, a communications board, a memory board, and the processor board. The rotating drum is protected by the perforated black metal grill at the front.

Inside the Shuttle teleprinter, showing the electronics.


The demodulator boards

The original military teleprinter received data as a serial bitstream. However, on the Space Shuttle, data was encoded as frequencies on the audio link. Three custom boards were constructed to demodulate the audio data so the rest of the printer could handle it. These boards also performed Shuttle-specific tasks such as powering up the printer when a message comes in, and then returning the printer to standby mode. I reverse-engineered these boards to determine how they work and to determine the data encoding. (Schematics are in the footnotes.7) In this section, I'll discuss these three boards, which are on the left side of the printer.

To summarize, the serial bitstream is encoded with Frequency Shift Keying, with a 0 represented by 3600 Hz and a 1 represented by 7200 Hz.6 The serial data is transmitted at 600 baud, even parity, one stop bit. The demodulation process first converts the input audio to a digital signal by thresholding it. (That is, the input sine wave is converted to a square wave.) The digital signal is autocorrelated to distinguish the 3600 Hz and 7200 Hz signals, recovering the underlying serial data. This signal is passed to the printer's logic boards (part of the original military teleprinter), which convert the serial signal to ASCII bytes and print them.
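To make the framing concrete, here is a minimal sketch of checking one received character. The 600 baud rate, even parity, and single stop bit come from the description above; the 7 data bits, the low start bit, and the LSB-first bit order are my assumptions based on conventional asynchronous serial, not something confirmed for this printer.

```python
BAUD = 600
FREQ_ZERO_HZ = 3600
FREQ_ONE_HZ = 7200

# Whole tone cycles per bit time: 6 for a 0, 12 for a 1.
CYCLES_PER_ZERO_BIT = FREQ_ZERO_HZ // BAUD
CYCLES_PER_ONE_BIT = FREQ_ONE_HZ // BAUD


def decode_frame(bits, data_bits=7):
    """Decode one frame: start bit, data bits (LSB first), parity bit, stop bit.

    data_bits=7 and the LSB-first order are assumptions for illustration.
    """
    if len(bits) != data_bits + 3:
        raise ValueError("wrong frame length")
    start, data, parity, stop = bits[0], bits[1:1 + data_bits], bits[1 + data_bits], bits[-1]
    if start != 0 or stop != 1:
        raise ValueError("framing error")
    # Even parity: the data bits plus the parity bit contain an even number of 1s.
    if (sum(data) + parity) % 2 != 0:
        raise ValueError("parity error")
    return sum(bit << position for position, bit in enumerate(data))
```

On the real printer, a parity failure would presumably be what causes the diamond character to be printed in place of the bad character.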

Signal processing starts with the "FSK input" board, shown below. First, it amplifies the input audio signal. (The two large resistors provide a 600 Ω load for the audio input.) Next, a 900 Hz high-pass filter eliminates low-frequency noise. (The filter is implemented by a two-stage Sallen-Key topology.)

The input board.


The signal bounces from board to board, going to the "output FSK demod" board next. This board has a carrier-detect circuit that turns on the rest of the printer if it detects an input signal. This allows the printer to sit idle until it receives a signal from Earth. This board also applies the threshold to the signal to turn it into a digital waveform, which goes to the "control" board.

The output board.


The output board also holds the 5-volt and 12-volt linear regulators that power the three boards; these are the metal-can ICs at the bottom of the board. To reduce the load on the regulators, two large resistors drop the input voltage (28 volts) to a lower level before it is regulated.

The control board holds the FSK decoder, an interesting circuit that converts the two FSK frequencies to binary by implementing a digital auto-correlator. It uses a 64-bit shift register to delay the digital input by 139 µs. The input and the delayed input are XOR'd together, generating a result that depends on the frequency. A 7200 Hz signal repeats every 139 µs, so the input and the delayed input match, yielding 0 from the XOR. However, a 3600 Hz square wave switches state every 139 µs, so the two XOR inputs will always differ, resulting in a 1 output. Thus, the circuit cleanly distinguishes between a 3600 Hz input and a 7200 Hz input. (The XOR output is opposite from the final value since it gets inverted later.)
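The delay-and-XOR idea can be simulated in a few lines. This is a sketch, not a model of the actual circuit: I'm assuming the shift register is clocked at 460.8 kHz (64 times 7200 Hz), since that is the rate at which 64 stages give a 139 µs delay; the real clock frequency isn't stated here. The inputs are ideal square waves.

```python
SAMPLE_RATE_HZ = 460_800   # assumed shift-register clock: 64 * 7200 Hz
DELAY_STAGES = 64          # 64 / 460800 s is about 139 microseconds


def square_wave(freq_hz, num_samples):
    """Ideal thresholded input: a 0/1 square wave sampled at SAMPLE_RATE_HZ."""
    return [1 if (n * 2 * freq_hz // SAMPLE_RATE_HZ) % 2 == 0 else 0
            for n in range(num_samples)]


def autocorrelate(samples):
    """XOR each sample with the sample DELAY_STAGES earlier."""
    return [samples[n] ^ samples[n - DELAY_STAGES]
            for n in range(DELAY_STAGES, len(samples))]
```

With these ideal inputs, the 7200 Hz wave produces a constant 0 and the 3600 Hz wave a constant 1. Inputs that are slightly off frequency would produce brief glitches in the XOR output instead of a perfectly steady level.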

The control board.


The digital demodulator avoids some of the problems of an analog FSK demodulator. It is not sensitive to signal levels, since the signal is converted to digital. The digital demodulator is also not sensitive to harmonics, which can cause problems with analog demodulators. Finally, it doesn't require the carefully-tuned filters of an analog circuit.

The demodulated signal passes from the control board back to the output board. This board applies a 400 Hz low-pass filter and then a threshold to convert the signal back to binary. If the input frequencies are not exact, the demodulator will produce the correct 0 or 1 value over most of the waveform, but there will be glitches at the edges. The low-pass filter removes these glitches. (You might be concerned that a 600-baud signal would be wiped out by a 400 Hz low-pass filter. However, the worst case signal (alternating 0's and 1's) would be 300 Hz because it takes two bits to make one cycle, so the filter has plenty of margin.) Next, the board blocks the signal unless a carrier is detected. This ensures that random noise isn't demodulated and printed. Finally, the serial binary signal leaves the custom Shuttle boards and goes to the teleprinter's communication board, part of the standard teleprinter.

I noticed two unusual things about these boards. First, they have some modifications: "bodge" wires and added components. Second, the boards are not conformal coated, which is unusual for aerospace boards. (The four logic cards, in comparison, are protected with conformal coating.) My hypothesis is that these boards were development boards, early in the design process of the Shuttle teleprinter, so they were modified as the design changed. The teleprinter is also marked "Not for flight", which supports this theory.

Mission Specialist Thagard getting output from the teleprinter. Flight STS-7, June 24, 1983. From NARA. Although the description says this is the Text & Graphics System, it is clearly the Interim Teleprinter.


The logic cards

The military teleprinter contained four logic circuit cards: a CPU card, a memory card, a communications card, and a print control card, mounted at the right rear of the teleprinter. These cards are used unchanged in the Shuttle teleprinter.

The circuitry is more complex than you might expect, with four large cards full of ICs. There are several reasons for this. First, the cards use 1970s microprocessor technology, so it takes a lot of circuitry to do anything. In particular, many simple 7400-series logic chips perform "glue" functions: decoding addresses, buffering data, latching signals, and so forth. Second, a drum printer is inherently complicated, since 80 hammers must be driven at the right time based on the desired characters. Third, the teleprinter is very flexible, supporting multiple signal levels and two character formats (ASCII and Baudot). Most surprisingly, the teleprinter implements a word processor, allowing messages to be composed and edited offline. Of course, since the Shuttle's teleprinter is only used to receive data, and doesn't even have a keyboard, the word processor feature is entirely useless.

The CPU card

The CPU card holds the microprocessor that controls the teleprinter. Its most important function is to convert a line of ASCII characters into print drum codes. These codes are stored in memory for use by the print control card. The CPU also implements configuration and self-test functions.
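The conversion from ASCII to drum codes might look something like the sketch below. Several things here are guesses for illustration: that a drum code is simply a row number, that the rows run in reverse ASCII order, that the diamond occupies the slot where space would be, and that unknown characters fall back to the diamond. The marker NO_PRINT is hypothetical.

```python
NUM_COLUMNS = 80

# 64 drum characters: the diamond takes the place of space (0x20),
# followed by ASCII 0x21 ('!') through 0x5F ('_').
DRUM_CHARS = "◊" + "".join(chr(code) for code in range(0x21, 0x60))

# Assumption: rows are numbered in reverse ASCII order, so '_' is row 0.
ROW_OF = {ch: row for row, ch in enumerate(reversed(DRUM_CHARS))}

NO_PRINT = None  # hypothetical marker: no hammer fires for a space


def line_to_drum_codes(line):
    """Convert one line of text into 80 per-column drum row numbers."""
    padded = line.upper().ljust(NUM_COLUMNS)[:NUM_COLUMNS]
    codes = []
    for ch in padded:
        if ch == " ":
            codes.append(NO_PRINT)
        else:
            # Characters not on the drum fall back to the diamond (an assumption).
            codes.append(ROW_OF.get(ch, ROW_OF["◊"]))
    return codes
```

The result is an 80-entry buffer, one entry per print position, which is the form the print control card needs.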

The diagram below shows some of the main components. The CPU card contains a Motorola 6800 CPU, 4 kilobytes of memory, and a ROM that holds its program code.8 Inconveniently, all the IC part numbers are military numbers so it takes some investigation to determine what a part really is. The MC6821 is a Peripheral Interface Adapter, a Motorola chip that provides two parallel I/O ports. This chip is used on three of the cards to support a variety of I/O tasks. On the CPU card, the I/O ports drive eight status lamps (most of which were removed for the Shuttle teleprinter) as well as internal status signals such as "paper low" or "keyboard present" and the baud rate setting input.

The CPU card is centered around a Motorola 6800 microprocessor.


The print control card

In a sense, the print control card is the heart of the printer, since it causes characters to be printed by firing hammers against the rotating drum. As the drum goes through one revolution, all 64 characters will spin past each of the 80 print positions. By firing each hammer at exactly the right time, the card prints a line of text.9 In more detail, for each row on the drum, the printer card scans through the 80-character memory buffer using Direct Memory Access (DMA). If the value in memory matches the current drum row number, the hammer is fired. Note that the hammers don't fire simultaneously, but in sequence as memory is scanned.
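The scan described above can be sketched as a pair of nested loops. This is an idealized model that prints a whole line in one revolution; the real printer may spread a line over several revolutions, and the NO_PRINT value for blank columns is a hypothetical placeholder.

```python
NUM_ROWS = 64
NUM_COLUMNS = 80
NO_PRINT = 0xFF  # hypothetical buffer value that never matches a row


def scan_revolution(buffer):
    """Yield (row, column) for each hammer firing in one drum revolution.

    buffer holds one target row number per print position.
    """
    for row in range(NUM_ROWS):                    # one pass per drum row pulse
        for column, target in enumerate(buffer):   # stands in for the DMA scan
            if target == row:
                yield row, column                  # fire this column's hammer
```

Note that the firing order follows the drum rows, not the left-to-right order of the text, and that columns sharing a character fire one after another within the same row scan.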

This diagram shows how the print control board interacts with the rest of the system. From the Maintenance manual, TM 11-5815-602-24.


The diagram above shows the interaction between the drum, the print control card, and the 80 hammers. The hammers are implemented on 20 print hammer cards, each with 4 hammers. Electrically, the hammers are arranged in a matrix. One wire out of 20 (S1-S20) selects the hammer board, the group of four. Another wire selects one of four hammers (Col 1-4). This approach simplifies the electronics, since 20 + 4 driver circuits and wires are used, rather than 80 (one for each column). The print control card is synchronized to the drum by two photo-transistor sensors that detect the drum's position. One sensor is triggered on each row, while the other sensor triggers once per revolution.
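The matrix addressing amounts to splitting the column number into a card select and a hammer select. A minimal sketch, assuming that columns are assigned to cards in consecutive groups of four (the physical ordering may well differ):

```python
NUM_CARDS = 20
HAMMERS_PER_CARD = 4


def hammer_select(column):
    """Map a print column (0-79) to (card select 1-20, hammer 1-4).

    Assumes consecutive columns share a card; the real wiring may differ.
    """
    if not 0 <= column < NUM_CARDS * HAMMERS_PER_CARD:
        raise ValueError("column out of range")
    card, hammer = divmod(column, HAMMERS_PER_CARD)
    return card + 1, hammer + 1
```

Every one of the 80 columns gets a unique (card, hammer) pair, yet only 20 + 4 = 24 select lines are needed. The cost is that only one hammer can be addressed at a time, which fits with the hammers firing in sequence.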

The print control card is shown below, with the main functional blocks labeled. The large purple-and-gold chip is the PIA, the same I/O chip that appeared on the CPU card. It handles a variety of signals such as the self-test request, paper out, and the drum stop signal. The mode control logic generates timing signals depending on the printer's mode. The data compare logic increments the row counter on each drum pulse, and compares the row counter to the value read from memory.10 The hammer driver circuitry on the left selects one of the 20 hammer cards, while the hammer driver circuitry on the right selects one of four hammers. The ribbon circuitry raises and lowers the ribbon so the ribbon doesn't block the text when the printer is idle. The line feed circuitry advances the paper for a line feed operation.

The print control card prints data by driving the hammers.


The photo below shows one of the hammer cards, with four hammers. Each hammer has an electromagnet that pulls a lever, rotating the hammer wheel, and causing the hammer to strike the paper. (The hammers themselves are in the upper right of the photo.) A screw adjustment controls the distance between each hammer and the paper, allowing precise adjustment of the timing. (Marc had to carefully adjust all the hammers to make the print quality readable.)

One of the 20 hammer cards. Photo courtesy of Marcel.


The communication card

The communication card handles the teleprinter's serial data input. The key chip is the 8251A, a USART (Universal Synchronous/Asynchronous Receiver/Transmitter). This complex chip performs the conversion between the serial data stream and the bytes that the processor uses. (Note that the military teleprinter both sent and received serial data, while the Shuttle teleprinter only receives data.) The chip has a few support chips, labeled "UART" in the diagram below. The board has another Peripheral Interface Adapter chip, providing two I/O ports. These ports have functions such as reading the serial line settings (ASCII vs. Baudot, odd or even parity, number of stop bits, and current loop levels).

The communication card converts the serial input to parallel byte data.


The board also has circuitry to generate the clock pulses for the selected baud rate. The mode circuitry handles various phases of transmit/receive. The filter/demod circuitry handles different input types, digitally filtering and demodulating as necessary.11

The memory card

The memory card supports the word-processing feature. It provides additional RAM to hold the text buffer as well as the ROM holding the software for editing. The 16 DRAM chips on the left (MK4027) provide 8 KB of RAM while the two ROM chips on the right provide 8K of ROM. The chips in the middle to the right of the resistors split the 12 address bits into row and column addresses as required by the RAM chips. The address signals go through the numerous 24 Ω resistors in the middle; I don't know why. According to the manual, the printer operates fine without this card, except without the word processor. Since the word processor was irrelevant to the Shuttle, I wonder why this card wasn't removed to reduce weight.
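The memory arithmetic works out as follows. The MK4027 is a 4096×1-bit DRAM, so 16 chips hold 8 KB, and each chip's 12-bit address is presented as two 6-bit halves. In this sketch, which half is sent as the row and which as the column is my assumption.

```python
BITS_PER_CHIP = 4096   # MK4027: 4096 x 1 bit
NUM_CHIPS = 16


def total_bytes(num_chips=NUM_CHIPS, bits_per_chip=BITS_PER_CHIP):
    """Total RAM capacity in bytes."""
    return num_chips * bits_per_chip // 8


def split_address(address):
    """Split a 12-bit chip address into 6-bit (row, column) halves.

    Assumption: the low 6 bits are the row and the high 6 bits the column.
    """
    if not 0 <= address < BITS_PER_CHIP:
        raise ValueError("address must fit in 12 bits")
    row = address & 0x3F
    column = (address >> 6) & 0x3F
    return row, column
```

Multiplexing the address this way is why the DRAM needs only six address pins, at the cost of the extra chips that do the splitting.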

The memory card has additional RAM and ROM to support the word processing feature.


The power supply

The power supply board (shown earlier) implements separate power supplies for different parts of the printer.12 The supplies are implemented as switching power supplies, which were not as common at the time as now. The microprocessor supply provides +5V, +12V, and -5V, voltages required by memory chips in the 1970s. A separate switching power supply provides +5V, -8.6V, and +8.6V for the keyboard, dustcover, and interface module, components that were removed for the Shuttle teleprinter. Another supply powers the printer's status lamps.

The drum motor supply is important because its voltage is regulated to control the rotational speed of the drum. A sensor on the drum provides a feedback pulse for each row on the drum. (I think the drum speed is 868 RPM.) These pulses control the drum motor's switching supply. If the drum spins too slowly, the voltage is increased, and similarly if it spins too fast.
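For a sense of scale, here is the timing implied by my 868 RPM estimate, along with a toy version of the feedback rule. The fixed step size and the function names are hypothetical; the real supply is an analog switching regulator, not a software loop.

```python
ROWS_PER_REVOLUTION = 64


def row_period_ms(rpm):
    """Time between row feedback pulses, in milliseconds."""
    return 60_000 / (rpm * ROWS_PER_REVOLUTION)


def adjust_voltage(voltage, measured_period_ms, target_period_ms, step=0.1):
    """Toy control rule: raise the voltage if the drum is slow, lower it if fast.

    The step size is an arbitrary illustrative value.
    """
    if measured_period_ms > target_period_ms:   # pulses too far apart: too slow
        return voltage + step
    if measured_period_ms < target_period_ms:   # pulses too close: too fast
        return voltage - step
    return voltage
```

At 868 RPM, a row pulse would arrive roughly every 1.08 ms, and that is the window in which the print control card has to scan all 80 buffer positions.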

The hammers have an unusual constant-current power supply. When the printer is active, this power supply generates +18 V. However, the power supply is designed to use a constant current of 600 mA regardless of the hammer activity. A capacitor provides a reservoir of power that is filled by the constant current. If the hammers are using less current, the excess current is bled off through a resistor. The purpose of this is "to mask printing intelligence during periods of message traffic." In other words, if you used a teleprinter in the embassy in Moscow, for instance, spies could monitor power transients to see when hammers are firing, and perhaps figure out what is being printed. By keeping the current constant, this source of intelligence is blocked. Of course, this feature is useless on the Space Shuttle and only wastes power.

The military teleprinter accepted multiple input voltages: 22-30 VDC, 115 VAC, or 230 VAC, along with a 12 VDC battery backup. The transformers and diodes to support these voltages were part of the interface module that was removed for the Shuttle teleprinter. Instead, the Shuttle teleprinter is powered by 28 VDC.

Mechanical changes

The military teleprinter underwent significant mechanical changes to make it suitable for the Shuttle. These changes reduced its weight from 100 pounds to 59 pounds. The most visible change to the printer is the removal of the keyboard. The entire front section of the printer was replaced, removing the controls that were not needed in the Shuttle.13 The rugged frame of the original printer was replaced with a lighter-weight (but still substantial) frame. Horizontal rails were added to the frame to support the printer in the Shuttle locker.

The photo below shows the front of the Shuttle teleprinter. While the military teleprinter had numerous lights and switches on the front, the Shuttle teleprinter has just two lights and four switches.

Front view of the Shuttle teleprinter. The bar across the middle holds a paper cutter for removing the output.


NASA was concerned that the temperature of the teleprinter could become hazardous to the astronauts. To mitigate this danger, the teleprinter had a large heat-sensitive warning sticker. The yellow sticker on the left of the teleprinter changes color and displays an image if it heats up: it shows a bandaged hand and the word "HOT". Above it is an "Omegalabel" temperature monitoring sticker that shows the highest temperature the device reached. There are more of these stickers inside the teleprinter on various motors.

The Interim Teleprinter inside the Space Shuttle

The teleprinter was too large to be mounted on the flight deck, so it was mounted in a storage locker on the middeck, one level lower. The photo below shows the location of the locker that held the teleprinter (although the teleprinter was not present in this photo), looking backward (aft) toward the airlock. The locker is denoted MA9F, indicating Mid-deck Aft, position 9F (details), in the back on the right side of the Shuttle.

This photo shows the locker that held the teleprinter. Photo by DMolybdenum, panorama viewed on renderstuff.


The teleprinter was noisy because of its impact printing; even with it in a locker, the sound outside was 69.5 dB. The solution was to soundproof the locker with acoustic insulation. Various insulating materials were tested until one was found that passed the toxicity requirements. Another flammability waiver was required for the insulation.

Putting the teleprinter in an insulated locker without cooling caused another problem: overheating. The military teleprinter used 34 watts even while idle, which would cause the printer to become dangerously hot after just 6 orbits. The printer was redesigned to support a standby mode that used just 1 watt. When a signal from Earth was detected, the printer would power up while in use, and then return to standby mode. A circuit was added to send a tone back to Earth when the printer was activated, reassuring Mission Control that the printer had switched out of standby mode. These circuits were on the three custom Shuttle boards described earlier.

Putting the teleprinter in a locker made cabling difficult. The solution was a panel on the locker door with connectors for power and audio. The panel has a power switch and light as well as a light to indicate that a message has been received.

The panel on the outside of the locker, used for connection to the teleprinter. From distantsuns, NASA Space Flight forum.


The photo below shows the teleprinter locker with the connection panel on the far left. Note the cables attached to the connectors. These cables went across the back of the Shuttle to the left side, where they went up to the flight deck; the cable routing was performed before launch.14 For this flight, the neighboring locker MA16F held 3300 honeybees for a student experiment.

The teleprinter in middeck locker MA9F on flight STS-41C. The hands belong to mission specialist van Hoften. From National Archives; the description says the photo is from 1995 and shows the Thermal Impulse Printer system, but both are wrong. (STS-41C was in April, 1984.)


The teleprinter cables connect to the shuttle at panel A15 on the aft bulkhead of the flight deck on the left side of the Shuttle. In other words, if you sat in the Shuttle Commander's seat in the cockpit and turned around, this is what you would see.

The connections for the teleprinter in the flight deck. This photo shows Atlantis in the Kennedy Space Center visitor complex. In use, the Shuttle was much more cluttered.


The audio cable from the teleprinter went to the Payload Specialist communication connection on panel A15, while the power cable went to the DC power connection right below. During launch, this audio connection was needed for crew communication, so the teleprinter was plugged in after launch and the audio settings were reconfigured on panel L9. A cue card was placed above panel L9 with instructions on the teleprinter.

The teleprinter's replacements

The Shuttle teleprinter was supposed to be used for a short time until the Uplink Text and Graphics System (TAGS) entered service, but things didn't work out that way. TAGS, described earlier, was the fax-like system that could receive grayscale images, but it depended on the TDRS satellites with their support for digital data. The first TDRS satellite was launched by the sixth shuttle flight, STS-6 (1983). This allowed the use of TAGS on STS-7, but the printer promptly jammed.15 TAGS had constant problems with jamming; on STS-35, the printer jammed and then the unjamming tool broke. Due to the unreliability of the TAGS, the Interim Teleprinter was kept in service as a backup device. TAGS was mounted on a dual cold plate in avionics bay 3 of the crew compartment middeck (details), on the other side of the airlock from the teleprinter.

The Uplink Text and Graphics System, serial number 2. Photo from Smithsonian National Air and Space Museum.


After a decade, another printer, the Thermal Impulse Printer System (TIPS), was put into service, probably on flight STS-56 in 1993. Once TIPS proved its reliability, it replaced both the teleprinter and the Text and Graphics System (TAGS). The TIPS printer was installed in mid-deck locker MF28E; the F indicates the locker was on the forward wall, not the aft wall that held the Interim Teleprinter. As a backup for the TIPS, the Shuttle flew with a second TIPS.

The Thermal Impulse Printer System (TIPS) on flight STS-58. From National Archives. The description says that this device is the teleprinter but it is TIPS.


One motivation behind the TIPS thermal printer was NASA's desire to use more commercial-off-the-shelf (COTS) equipment instead of expensive custom equipment. The TIPS printer is the Raytheon TDU-850 printer (below), a commercial product that sold for $4950. A custom communication interface board inside the printer provided the interface between the printer and the Shuttle's S-Band and Ku-Band communications systems. This interface also allowed astronauts to use the TIPS as a printer for an onboard personal computer.

The Raytheon TDU-850 printer (Thermal Display Unit). From EDN, Mar 17, 1988, p.251.


The photo below shows the TIPS printer in use, printing a long stream of output that Eileen Collins is reading. Collins was the first woman to pilot the Space Shuttle; she flew on the Shuttle four times, twice as pilot and twice as commander.

Pilot Collins reading output from the TIPS printer, the gray box on the right. This is flight STS-84, Atlantis. Photo from National Archives.


The teleprinter, operational

We succeeded in making the Shuttle teleprinter operational. The printer had many mechanical problems, mainly because the rubber rollers had turned to liquid and gummed up the mechanism. Marc disassembled the printer, carefully cleaned the mechanism, and realigned everything. I won't discuss the restoration process here since there will be a video on CuriousMarc's channel. We were able to send FSK-modulated data to the printer and it was printed successfully, as shown below.

Conclusions

At first, I thought that the Shuttle's Interim Teleprinter was a terrible design. It's absurdly heavy and was in danger of overheating. Although the design started with an existing product, much of it required redesign: the front section, the new drum, the interface, and even the frame. The design inherited features it couldn't use, such as the built-in word processor. And the constant-current feature was pointless for the Shuttle and just wasted power.

When I learned that the design had to be completed in just seven months, my opinion of the teleprinter improved. Moreover, the design had many constraints, such as toxicity and flammability restrictions, that limited the potential approaches.

In the end, the teleprinter was used on over 50 flights, acting as a reliable backup to the somewhat flaky Text and Graphics System (TAGS).16 Despite its name, the Interim Teleprinter turned out to be a long-lasting solution, not interim at all. So I have to conclude that the teleprinter was a good design, working much better and much longer than intended.17

In any case, the Interim Teleprinter is an interesting piece of hardware and I hope you enjoyed this article. Follow me on Mastodon as @kenshirriff@oldbytes.space or RSS. Thanks to Marcel for providing the printer. Restoration performed with CuriousMarc, Eric Schlaepfer, and Mike Stewart.

Notes and references

  1. References for the teleprinter:
    The Interim Teleprinter and its development are described in detail in: M.D. Schuette, “Space Shuttle Interim Teleprinter System,” in Conference record: NTC ’82, Systems for the Eighties, IEEE. (I'll call this the "teleprinter paper" for short.)
    The Shuttle Crew Operations Manual has extensive information on the shuttle and some information on the teleprinter.
    The teleprinter is briefly discussed here.
    Some teleprinter information is in the "Crew Systems Equipment Workbook" via RR Auction.
    The layouts of the Shuttle panels are in Orbiter OV-102 Display and Control Panel Configuration.
    The lockers are described in Orbiter middeck/payload standard interfaces control document.
    The manuals for the AN/UGC-74 are at RadioNerds.
    An enormous collection of Shuttle documents is at gandalfddi.

  2. The teleprinter paper mentions that Shuttle had one other option for receiving hardcopy data: the Text Uplink to Mass Memory System (TUMMS). This allowed text to be displayed on a CRT and the crew could take a Polaroid photo. This was obviously an impractical solution. I couldn't find any other references to TUMMS, so TUMMS may be a proposal that wasn't implemented. 

  3. Specifically, the Shuttle teleprinter was based on the Honeywell Model AN/UGC-74A(V)3 Communications Terminal. 

  4. The mechanism of a drum printer is similar to a chain printer such as the IBM 1403 line printer: each print position has a hammer that fires when the correct character is in that position. However, chain printers have better print quality than drum printers, due to the effect of timing errors. In a drum printer, a small timing error on a hammer will cause the character to be printed too high or too low. In a chain printer, however, a timing error will cause the character to be shifted to the left or right. Vertical mispositioning is obvious and looks terrible. Horizontal mispositioning is much less noticeable since character spacing is normally slightly variable. 

  5. To be precise, the hammer is fired 1.5 characters early due to its travel time. By the time the hammer hits the drum, the drum has rotated enough to put the desired character in place. Each hammer has a screw to adjust its distance to the drum, necessary to get the timing exact. It's amazing that this system works and doesn't produce a smudged mess. 

  6. After reverse-engineering the boards, I found a paper on the Shuttle teleprinter that specified the FSK frequencies as 1600 Hz for a 0 and 2057 Hz for a 1, different from what we used. Perhaps the frequencies were changed during development. 

  7. I created schematics of the three Shuttle-specific boards. Click an image for a larger (readable) version.

    Schematic of the input board.


    Schematic of the control board.


    Schematic of the output board.


     

  8. The block diagram below shows the main functional blocks of the CPU card.

    CPU block diagram. From Maintenance Manual, TM 11-5815-602-24, p3-6


     

  9. I expected that a line would be printed during one drum revolution but looking at the print pattern, it appears to take multiple revolutions per line. Perhaps the printer is avoiding hammers firing too close together to minimize current spikes. Moreover, the published print speed of 60 characters per second is considerably slower than one revolution. Or perhaps the hammer pattern is randomized so spies can't listen in and determine what is being printed. I'm still investigating. 

  10. Looking at the circuitry, I think the memory buffer holds the drum row number for each position, and the print control card fires the hammer if the value matches the current row number. In contrast, the "obvious" approach would put the character values in the memory buffer and the print control card would match against the current drum character. The implemented solution puts less work on the print control card, which only needs to update the target comparison value once per line, rather than every character. However, it requires the CPU card to transform the input characters into row values. 

  11. The teleprinter accepts two types of inputs: NRZ and D10. NRZ (Non-Return to Zero) is the straightforward encoding of the serial signal as 0's or 1's. The manual doesn't define D10, but I think it is Manchester encoding, using a 01 sequence for a 0 and a 10 sequence for a 1 (or inverted). The D10 signal is self-clocking, since each bit contains a transition. The demodulation circuit converts the D10 signal into a straight bit sequence. An NRZ signal can either use an external clock or an internal clock from the baud rate generator. With the internal clock, the input is sampled four times and digitally filtered since the input may not exactly line up with the internal clock. 

  12. The power supply is explained in the Maintenance Manual. The fold-out power supply schematics in that manual were not scanned for some reason but can be found in the B&C Maintenance Manual.

  13. The military teleprinter contained a large interface module at the back, providing the signal and power connections to the terminal. The serial-line signals could be a 20-milliamp current loop, a 60-milliamp current loop, or MIL-STD-188-114 (similar to RS-422). The interface module converts these signals to the TTL signals used internally. The interface module also contains a power supply for the interface circuitry. Since this interfacing was not required for the Shuttle, the interface module was discarded and replaced with the Shuttle's custom FSK interface cards. The AC power supply and filtering were also removed. 

  14. I was a bit surprised that the teleprinter cables would run for a long distance through the Shuttle. But the Shuttle is full of wires and cables running in all directions, as shown in the photo below. This photo is from the same angle as the earlier diagram showing where the teleprinter is connected. This flight was after the teleprinter was retired, but the teleprinter would have been plugged in behind the exercise equipment.

    The aft flight deck of Discovery during STS-116. From National Archives.


  15. One source says that the inaugural flight of TAGS was STS-29 (March 1989). Another source says that testing of the "new" TAGS system continued on STS-29. Contradicting this, TAGS was used on STS-7 (June 1983), jamming after the first page. TAGS was also used on STS-8 (August 1983) but failed after five pages. The TAGS unit was not flown on STS-41B (Feb 1984, the next Challenger flight after STS-8). (Note that STS-41B was the tenth flight, considerably before STS-29, the 28th flight. The Space Shuttle mission numbers are a mess.) It's hard to reconcile these statements. Probably, TAGS was still in the testing stage as late as STS-29 due to reliability problems. 

  16. The teleprinter had a few problems during use. On flight STS-6, the teleprinter got stuck in high power mode. On flight STS-30, messages were illegible (link). 

  17. The teleprinter shows the risk of building an interim solution that turns out to last much longer than expected. This also happened with the Interim Upper Stage (IUS), a launch system to boost Shuttle payloads to a higher orbit. The Interim Upper Stage was designed as a temporary solution until a space tug became available. Eventually, NASA realized that nothing was replacing the IUS, so it was renamed to "Inertial Upper Stage", preserving the acronym.

    I'll mention that this also happened with the 8086 processor. It was intended as an interim processor until the iAPX 432 "micro-mainframe" processor was ready. The iAPX 432 turned out to be a disaster, while the "stopgap" 8086 is still with us as the x86 architecture. 
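
As a rough illustration of the D10 guess in note 11, the following Python sketch encodes and decodes a Manchester signal. It assumes the polarity mentioned in that note (a 01 sequence for a 0 and a 10 sequence for a 1); the function names are hypothetical, and the teleprinter's actual demodulation circuit may work quite differently.

```python
def manchester_encode(bits):
    """Encode each data bit as two half-bit levels: 0 -> (0, 1), 1 -> (1, 0)."""
    halves = []
    for b in bits:
        halves.extend((1, 0) if b else (0, 1))
    return halves


def manchester_decode(halves):
    """Decode pairs of half-bit levels; every valid pair contains a transition."""
    if len(halves) % 2:
        raise ValueError("odd number of half-bits")
    bits = []
    for first, second in zip(halves[0::2], halves[1::2]):
        if first == second:
            raise ValueError("missing mid-bit transition")
        bits.append(first)  # with this assumed polarity, the first half is the data bit
    return bits


message = [1, 0, 1, 1, 0]
line_signal = manchester_encode(message)
recovered = manchester_decode(line_signal)
print(line_signal, recovered)
```

Because every bit cell contains a transition, the receiver can recover the clock from the signal itself, which is the self-clocking property that note 11 describes.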





[#] Mon Aug 19 2024 11:16:24 UTC from rss <>

Subject: Inside the guidance system and computer of the Minuteman III nuclear missile


The Minuteman missile was introduced in 1962 as a key part of America's nuclear deterrent. The Minuteman III missile is currently the only US land-based intercontinental ballistic missile (ICBM), with 400 missiles ready for launch, spread across five central states.1 The missile contains a precision guidance system, capable of delivering a warhead to a target 13,000 km away (8000 miles) with an accuracy of 200 meters (660 feet).

The diagram below shows the guidance system of the Minuteman III missile (1970). This guidance system contains over 17,000 electronic and mechanical parts, costing $510,000 (about $4.5 million in current dollars). The heart of the guidance system is the gyro stabilized platform, which uses gyroscopes and accelerometers to measure the missile's orientation and acceleration. The computer uses the measurements from the platform to determine the missile's position and guide the missile on its trajectory to the target. Other key components are the missile guidance set controller, which contains electronics to support the gyro stabilized platform, and the amplifier, which interfaces the computer with the rest of the missile. In this blog post, I take a close look at the components of the guidance system that was used until the early 2000s.2

The Minuteman III guidance system (NS-20). Click on this image (or any other) for a larger version. Original image from National Air and Space Museum.


Fundamentally, the guidance computer constantly compares the missile position to the desired trajectory and generates the appropriate steering commands to keep the missile on track.3 The diagram below shows how directing the engine nozzles causes the missile to rotate around its three axes: roll, pitch, and yaw.4 In the silo, the roll angle (the azimuth) is aligned with the direction to the target. The missile takes off vertically and then the missile gradually rotates along the pitch axis to tilt over toward the target. During flight, adjustments along all three axes keep the missile on target. The Minuteman III has four rocket stages so the guidance computer jettisons each rocket stage and ignites the next stage in sequence.

The roll, pitch, and yaw axes for the Minuteman missile. The engine diagrams show how the nozzles are directed to rotate around each axis. Modified from A Simulation of Minuteman Trajectories, with changed axes.


The guidance platform

The idea behind inertial navigation is to keep track of the missile's position by constantly measuring its acceleration. By integrating the acceleration, you get the velocity. And by integrating the velocity, you get the position. Inertial navigation is self-contained, a big advantage for a missile since the enemy can't jam your navigation. The hard part is measuring the acceleration and angles with extreme accuracy, since even tiny errors are multiplied as the missile travels.
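
To make the double integration concrete, here is a minimal one-dimensional sketch in Python. It is not the Minuteman's guidance algorithm: it ignores gravity, the Earth's rotation, and the three-axis geometry, and the thrust level, sample rate, and accelerometer bias are all made-up numbers chosen only to show how a small measurement error accumulates.

```python
def integrate_twice(accels, dt):
    """Euler-integrate acceleration samples (m/s^2) into velocity and position."""
    velocity = 0.0
    position = 0.0
    for a in accels:
        velocity += a * dt          # first integration: acceleration -> velocity
        position += velocity * dt   # second integration: velocity -> position
    return velocity, position


# Hypothetical scenario: 60 seconds of constant 30 m/s^2 thrust, sampled at 100 Hz.
dt = 0.01
true_accels = [30.0] * 6000
# Same flight, but the accelerometer reads 0.001 m/s^2 too high (assumed bias).
biased_accels = [a + 0.001 for a in true_accels]

v_true, x_true = integrate_twice(true_accels, dt)
v_biased, x_biased = integrate_twice(biased_accels, dt)

print(f"velocity after burn: {v_true:.1f} m/s")
print(f"position error from bias: {x_biased - x_true:.2f} m")
```

In this toy case, a bias of one part in 30,000 produces a position error of about 1.8 meters after one minute, and the error keeps growing with the square of the elapsed time.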

In more detail, the Minuteman's inertial guidance is built around a gyroscopically stabilized platform, which is kept in a fixed orientation. The platform is mounted on two beryllium gimbals. Feedback from gyroscopes drives three torque motors to rotate the gimbals to keep the stable platform in exactly the same orientation no matter how the missile twists and turns.

The Minuteman III stable platform. Original image from National Air and Space Museum.


The diagram below shows the components of the stable platform, in approximately the same orientation as the photo above. Three accelerometers are mounted on the stable platform to measure acceleration. The accelerometers are oriented along three perpendicular axes so each one measures acceleration along one axis. (The accelerometer axes are not aligned with the platform axes; this distributes the acceleration (mostly "up") across the accelerometers, increasing accuracy.) The two alignment mirrors allow the stable platform to be aligned with a precise device called an autocollimator, as will be described below. The gyrocompass uses the Earth's rotation to precisely determine North, providing a backup alignment technique. Both the alignment mirrors and the gyrocompass can be rotated to a precise angle, reported by the resolver.

The stable platform for Minuteman II and III. Modified from Minuteman weapon system history and description.


To target a Minuteman I missile, the missile had to be physically rotated in the silo to be aligned with the target, an angle called the launch azimuth. This angle had to be extremely precise, since even a tiny angle error will be greatly magnified over the missile's journey. Aligning the missile was a tedious process that used the North Star to determine North. Since the star was not visible from inside the silo, a complex surveying technique was used, using a surveyor's theodolite to measure the angles between the North Star and three concrete monuments outside the silo. Inside the silo, the closest monument was visible through a sighting tube, allowing the precise angle measurement to be transferred to the silo. After many more measurements inside the silo, a special device called an autocollimator was positioned precisely 90° from the desired launch azimuth. The autocollimator shot a beam of light through a window in the side of the missile, where it bounced off a mirror on the stable platform and returned to the autocollimator. If the returning beam wasn't exactly parallel, the autocollimator sent a signal to the missile, causing the stable platform to rotate as needed. The result of this process was that the stable platform was exactly aligned with the desired angle to the target.5
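
A back-of-the-envelope calculation shows why the azimuth had to be so precise. The Python sketch below uses the 13,000 km range and 200 meter accuracy quoted at the start of this article and a flat, small-angle approximation (miss distance ≈ range × angle error). It assumes, unrealistically, that the entire error budget comes from azimuth; a real trajectory over a rotating, spherical Earth is more complicated.

```python
import math

RANGE_M = 13_000e3     # 13,000 km range, from the article
ACCURACY_M = 200.0     # 200 m accuracy, from the article


def cross_range_miss(range_m, azimuth_error_deg):
    """Small-angle estimate of sideways miss distance for a given azimuth error."""
    return range_m * math.radians(azimuth_error_deg)


# Largest azimuth error that still lands within the accuracy figure,
# assuming azimuth is the only error source (a simplification).
max_error_rad = ACCURACY_M / RANGE_M
max_error_arcsec = math.degrees(max_error_rad) * 3600

print(f"allowed azimuth error: {max_error_arcsec:.1f} arcseconds")
print(f"miss from a 0.01 degree error: {cross_range_miss(RANGE_M, 0.01):.0f} m")
```

Under these assumptions the allowed error is only a few arcseconds, while an error of a hundredth of a degree would put the warhead more than two kilometers off target.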

The guidance platform was completely redesigned for Minuteman II and III, eliminating the time-consuming alignment that Minuteman I required. The new platform had an alignment block with rotating mirrors. Instead of rotating the missile, the autocollimator remained fixed in the East position and the mirror (and thus the stable platform) was rotated to the desired launch azimuth. The new guidance platform also added a gyrocompass under the alignment block, a special compass that could precisely align itself to North by precessing against the Earth's rotation. At first, the gyrocompass was used as a backup check against the autocollimator, but eventually the gyrocompass became the primary alignment. For calibration, the alignment block also includes electrolytic bubble levels to position the stable platform in known orientations with respect to local gravity.6

The alignment block with mirrored surfaces. Image from National Air and Space Museum.


The photo above shows the alignment block on top of the gyrocompass. The front and back of the block are the precision mirrors that reflect the light beam from the autocollimator. The circles on top of the block and at the right are two level detectors, with set screws for exact adjustment. The platform has four level detectors, allowing it to be aligned against gravity in multiple positions. Like the gimbals, the gyrocompass assembly is made of beryllium due to its rigidity and light weight; it has a warning sticker because beryllium is highly toxic.

The diagram below shows how the axes align with the gimbals of the stable platform.7 Note the window at the top of the photo. Light from the autocollimator shines in through the window, reflects off the mirror on the alignment block, and returns through the window to the autocollimator. The autocollimator detects any error in alignment and signals the guidance system to correct its position accordingly.

Coordinate system for the stable platform. Note that these axes don't match the missile axes; the stable platform axes remain constant as the missile turns. Original image from National Air and Space Museum.


The stable platform uses gyroscopes to maintain its fixed orientation as the missile turns. The idea behind a gyroscope is that a spinning disk will tend to maintain its spin axis. The problem is that any friction, even from precision ball bearings, will reduce the accuracy. The solution in the Minuteman is a "gas bearing", where the gyroscope rotor is supported by an extremely thin layer of hydrogen. As shown below, the gyroscope is built around a stationary marble-sized ball (blue), fastened to the gyroscope frame at the top and bottom. The rotor (pink) is clamped around the equator of the ball and spins at high speed, powered by an induction motor (windings green, rotor yellow). If the gyroscope frame is tilted, the rotor will stay in its orientation. The resulting change in angle between the frame and the rotor is detected by sensitive capacitive pickups (purple). The gyroscope is sensitive to tilt in two axes: left-right, and front-back. Since nothing touches the rotor except the thin layer of gas around the ball, the influence of friction is minimal.

A gas-bearing gyroscope. Based on patent 3,025,708.


A gas-bearing gyroscope has the problem that when it starts or stops, the gas layer dissipates, allowing the rotor and the bearing to rub. The Minuteman missile's guidance system was kept continuously running, so starts and stops were infrequent. Moreover, when the gyroscope did need to be started, the electronics gave it a 40-volt jolt to get it up to speed quickly. Because the Minuteman's guidance system was always running—and its solid-fuel engines didn't require fueling—the missile could be launched in under a minute.

To summarize the guidance trajectory, a Minuteman flight is typically about 35 minutes,8 but only the first few minutes are powered by the rockets; the warheads coast most of the way on a ballistic trajectory. The first three rocket stages are active for just 180 seconds; this completed the boost phase for Minuteman I and II. However, the innovation of Minuteman III was that it held three warheads, a system called MIRV (Multiple Independently Targetable Reentry Vehicles). To direct these warheads to their targets, Minuteman III has a fourth stage, called PSRE (Propulsion System Rocket Engine), mounted just below the guidance system. The PSRE was active for 440 seconds, directing each warhead on its specific path. (Meanwhile, a retro-rocket sent the third stage in a random direction. Otherwise, it would tag along with the warheads, acting as a giant radar beacon for enemy anti-ballistic-missile systems.) The warheads travel very high, typically over 800 nautical miles (1500 km), more than three times the altitude of the International Space Station. As for the multiple-warhead MIRV, the Minuteman III missiles were converted back to single warheads as part of the New START arms reduction treaty, with the last MIRV removed in June 2014.

A MIRV configuration with three W78 warheads on the Minuteman III MK-12A reentry vehicle system. The conical reentry vehicles are smaller than you might expect, just under 6 feet tall (181 cm). In comparison, the Titan II had a reentry vehicle that was 14 feet long (4.3 m), holding a massive 9-megaton warhead. Photo from GAO-21-210.


The Minuteman D-17B computer

The guidance computer has a key role in the Minuteman missile, determining the missile's position from the stable platform data, executing a guidance algorithm, and steering the missile on the desired trajectory. Before explaining the D-37 computer used in Minuteman II and III, I'll start by discussing the D-17B computer used in the first Minuteman, since its characteristics strongly influenced the later computers. The Minuteman I computer was very primitive by modern standards. Although it was a 24-bit machine, it was a serial computer, operating on one bit at a time. The big advantage of serial processing is that it dramatically reduces the hardware requirements. Since the computer only processes one bit at a time, it uses a one-bit ALU. Moreover, the buses and datapaths are one bit wide rather than 24 bits. The disadvantage, of course, is that a serial computer is slow; the D-17B took 27 clock cycles (24 bits and three overhead) to perform any operation. At best, the computer could perform 12,800 additions per second.
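
The following Python sketch illustrates the idea of serial arithmetic with a one-bit ALU: a 24-bit addition performed one bit per step, carrying from each bit to the next. It is an illustration only; it treats words as plain unsigned integers and processes the least-significant bit first, both of which are assumptions rather than a description of the D-17B's actual number format. The clock rate at the end is simply what the figures in the text imply if one addition takes one 27-cycle word time.

```python
WORD_BITS = 24
CYCLES_PER_OP = 27          # 24 data bits plus 3 overhead, from the text
ADDS_PER_SECOND = 12_800    # from the text


def serial_add(a, b, bits=WORD_BITS):
    """Add two unsigned words one bit at a time, least-significant bit first."""
    carry = 0
    result = 0
    for i in range(bits):
        bit_a = (a >> i) & 1
        bit_b = (b >> i) & 1
        total = bit_a ^ bit_b ^ carry                          # one-bit sum
        carry = (bit_a & bit_b) | (carry & (bit_a ^ bit_b))    # carry to next bit
        result |= total << i
    return result  # the final carry is dropped, so the sum wraps modulo 2**bits


# Clock rate implied by the text's figures (assuming one add per word time).
implied_clock_hz = CYCLES_PER_OP * ADDS_PER_SECOND

print(serial_add(123456, 654321))
print(implied_clock_hz)
```

The loop body is the entire "ALU": a single full adder and a one-bit carry, reused 24 times. That is the hardware saving that the serial design buys at the cost of speed.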

The computer has an unusual cylindrical structure, 29 inches (74 cm) in diameter, designed to fit the diameter of the Minuteman missile. The computer itself is the bottom half of the cylindrical shell. The top half is the electronic equipment chassis, holding the power supplies for the computer and the stable platform, as well as servo control amplifiers, oscillators, and converters.

The Minuteman I guidance computer. The computer itself is the bottom half of the cylinder, with the disk drive in the 4 o'clock position. The upper half is electronics to drive the IMU and rocket. The IMU itself would be mounted in the center. Photo by Steve Jurvetson, CC BY 2.0.


The computer doesn't have any RAM. Instead, all instructions, data, and registers are stored on a hard disk, but not like a modern hard disk. The disk has separate, fixed heads for each track so it can access tracks without seeking. (This approach is similar to a computer built around drum memory, except the drum is flattened.) In total, the disk held just 2727 24-bit words (approximately 8 Kbytes). The computer's serial processing and its disk-based storage worked well together. The disk provided data one bit at a time, which the computer would process serially. The results were written back to the disk, one bit at a time as calculation proceeded. The write head was positioned just behind the read head so a value could be overwritten as it was computed.
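
The "approximately 8 Kbytes" figure follows directly from the word count and word size given above, as this short Python calculation shows. The byte conversion is only for comparison with modern units; the D-17B itself had no notion of 8-bit bytes.

```python
WORD_BITS = 24       # word size, from the text
D17B_WORDS = 2727    # disk capacity in words, from the text

total_bits = D17B_WORDS * WORD_BITS
total_bytes = total_bits // 8   # modern 8-bit bytes, for comparison only

print(f"{total_bits} bits = {total_bytes} bytes "
      f"({total_bytes / 1024:.2f} KiB)")
```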

The photo below shows the numerous read and write heads for the D-17B's hard disk. Note that the heads are fixed (unlike modern hard drives), and the heads are widely distributed across the surface. (There is no need for different tracks to be aligned.) I believe that the green and white heads in pairs are for the "regular" tracks, while the heads with other spacings implement registers and short-term storage called loops.9

Disk head assembly from the D-17B. Photo by LaserSam, CC BY-SA 4.0.


The D-17B computer was transistorized. The photo below shows one of its circuit boards, crammed with transistors (the black cylinders), resistors, diodes, and other components. (This board is a read amplifier, amplifying the signals from the hard disk.) The computer used diode-resistor logic and diode-transistor logic to minimize the number of transistors; as a result, it used 6282 diodes and 5094 resistors compared to 1521 silicon and germanium transistors (source).

A read amplifier circuit board from the D-17B. Photo from bitsavers.


The computer supported 39 instructions. Many of the instructions are straightforward: add, subtract, multiply (but no divide), complement, magnitude, AND, left shift, and right shift. The computer handled 24-bit words as well as 11-bit split words, so many of these instructions had "split" versions to operate on a shorter value. One unusual instruction was "split compare and limit", which replaced the accumulator value with a limit value from memory, if the accumulator value exceeded the limit.
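
In modern terms, "split compare and limit" is a clamp against an upper bound. The Python sketch below shows the behavior as described in the text; the function name and the example values are hypothetical, and it glosses over details that the description doesn't settle, such as whether the comparison was signed or on magnitudes.

```python
def compare_and_limit(accumulator, limit):
    """Return the limit if the accumulator exceeds it, else the accumulator unchanged."""
    return limit if accumulator > limit else accumulator


# Hypothetical example: keep a series of command values at or below 255.
commands = [3, 120, 500]
limited = [compare_and_limit(c, 255) for c in commands]
print(limited)
```

A single instruction for this operation plausibly saved the compare, branch, and load sequence that would otherwise be needed each time an output value had to be kept within range.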

The focus of the computer was I/O with 48 digital inputs, 26 incremental inputs, 28 digital outputs, 12 analog voltage outputs, and 3 pulse outputs for gyro control. The computer had special instructions to support the various inputs and outputs.10 For example, to integrate pulse signals from the stable platform, the computer had instructions to enter and exit "Fine Countdown" mode, which caused two special registers to operate as digital integrators, in parallel with regular computation (details).

The D-37 computer

For the Minuteman II missile, Autonetics built the D-37 computer, one of the earliest integrated circuit computers. By using integrated circuits, the guidance computer was dramatically shrunk, increasing range, functionality, and accuracy. The photo below compares the size of the older D-17B computer (half-cylinder) with the D-37B (held by the engineer).

The Minuteman D-17B computer (cylinder) and D-37B computer (being held). From Microcomputer comes off the line, Electronics, Nov 1, 1963. Using modern definitions, the computer was a minicomputer, not a microcomputer.


Although the main task of the computer is guidance, with the increased capacity of the D-37, the computer took over many of the tasks formerly performed by ground support equipment. The D-37 managed "ground control and checkout, monitoring, communication coding and decoding, as well as the airborne tasks of navigation, guidance, steering, and control" (link).

The D-37 had several models. The D-37A was the prototype system, while the D-37B was deployed in the first 60 Minuteman II missiles. The Air Force soon realized that nuclear radiation posed a threat to the computer, so they developed the radiation-hardened D-37C.11 The Minuteman III used the D-37D, an improved and slightly larger version. Even with additional disk space, program memory was so tight that software features were dropped to save just 47 words.

In architecture and performance, the D-37 computer is almost the same as the D-17B, but extended. Most importantly, the D-37 kept the serial architecture of the D-17B, so it had the same slow instruction speed. The D-37 kept the instruction set of the D-17B, with additional instructions such as division, logical OR, bit rotates, and more I/O, giving it 58 instructions versus 39 in the older computer. It expanded the hard disk storage with a double-sided disk, providing 7222 words of storage in the D-37C.12 The D-37 included division implemented in hardware (which the D-17B didn't have), along with a faster hardware implementation of multiplication, improving the speed of those instructions.13 The D-37C added more I/O lines, as well as radio input and 32 analog voltage inputs.

The diagram below shows the D-37C computer, used in the Minuteman II. At the left is the hard disk that provides the computer's memory. Most of the computer is occupied by complex circuit boards covered with flat-pack integrated circuits. At the right is the advanced switching power supply, generating numerous voltages for the computer (±3, 6, 9, 12, 18, and 24 volts). The connectors at the top provide the interface between the computer and the rest of the system. Because the computer has so many digital (discrete) and analog signals, it uses multiple 61-pin connectors (details).

The D-37C computer. Image courtesy Martin Miller, www.martin-miller.us.


The D-37C computer was built from 22 different integrated circuits, custom-built by Texas Instruments for the Minuteman project. These chips ranged from digital functions such as NAND gates and a flip-flop to linear amplifiers to specialized functions such as a demodulator/chopper. Texas Instruments sold the Minuteman series integrated circuits on the open market, but the chips were spectacularly expensive ($55 for a flip-flop, over $500 in current dollars) and not as popular as TI's general-purpose integrated circuits.14 The circuit boards were very complex for the time, with 10 interconnected layers. Each board was about 4 × 5½ inches and held about 150 flatpack integrated circuits, with components on both sides.

The growth of the integrated circuit industry owes a lot to the Minuteman computer and the Apollo Guidance Computer, both developed during the early days of the integrated circuit. These projects bought integrated circuits by the hundreds of thousands, helping the IC industry move from low-volume prototypes to mass-produced commodities, both by providing demand and by motivating companies to fix yield problems. Moreover, both computers required high-reliability integrated circuits, forcing the industry to improve its manufacturing processes. Finally, Minuteman and Apollo gave integrated circuits credibility, showing that ICs were a practical design choice.

The Minuteman III used the D-37D computer, which had about twice the disk capacity, 14,137 words. The layout is similar to the D-37C above, with the disk drive on the left and the power supply on the right. Since the computer is mounted "upside down", the boards are not visible inside, blocked by the interconnect board.15 Note the use of flexible PCBs, advanced technology for the time, soldered with low-melting-point indium/tin solder.

The D-37D computer. Image from National Air and Space Museum.


By 1970, the D-37 computer had made the cylindrical D-17B obsolete. The government gave away surplus D-17B computers to universities and other organizations for use as general-purpose microcomputers. Dozens of organizations, from Harvard to the Center for Disease Control to Tektronix, jumped at the chance to obtain a free computer, even if it was slow and difficult to use, forming a large users group to share programming tips.

The P92 amplifier

The amplifier provides the interface between the computer and the rest of the missile. The amplifier sends control signals to the missile's four stages, controlling the engines and steering. (The electronic circuitry from the Minuteman I's nozzle control units was moved to the amplifier, simplifying maintenance.) Moreover, the Minuteman has explosive ordnance in many places, ranging from small squibs that activate valves to explosives that separate the missile stages. The amplifier sends the high-current (30 amp) signals to detonate the ordnance, while monitoring the current to detect faults.16 The amplifier acts as a safety device for the ordnance, blocking signals unless the amplifier has been armed with the proper code. The amplifier sends control signals to the reentry system (i.e. the warheads) as well as the chaff dispenser, which emits clouds of wires to jam enemy radar. The amplifier also sends and receives signals through the umbilical cable from the ground equipment.

The PS 92A amplifier. Image from National Air and Space Museum. Click this (or any other image) for a higher-resolution version.


The photo above shows the amplifier with its cover removed. The amplifier is constructed as two stacks of six circuit boards, on top of a double-width power supply board. At the top and bottom of each board, connectors with thick cables connect the boards to the rest of the system. Each board is a multi-layer printed-circuit board built on a thick magnesium frame for cooling. The amplifier has five power switching boards, a valve driver board, three servo amplifier boards, and an ACTR control board (whatever that is). The system board is visible on the left, with large capacitors and precision 0.01% resistors. To its right is the decoder board, presumably decoding computer commands to select a particular I/O device. Note the extensive use of Texas Instruments flat-pack integrated circuits on this board, the tiny white rectangles.

Missile Guidance Set Control

The Missile Guidance Set Control (MGSC) contains the electronics to power and run the inertial measurement unit (IMU), providing the interface to the computer. The MGSC handles the platform servo loop, accelerometer servo loops, gyroscope torquing, gyrocompass torquing and slew, and accelerometer temperature control.17 One unexpected function of the MGSC is powering the computer's hard disk, supplying 400 Hz, 3-phase power at 27.25 volts (source).

The Missile Guidance Set Control with the modules labeled. Original image from National Air and Space Museum.


The MGSC is constructed from hinged metal modules, each with a particular function, shown above. The modules are constructed around printed circuit boards. Two large connectors at the right of the MGSC provide electrical connectivity with the IMU and computer. At the top and bottom of the MGSC are connections for coolant. The MGSC is roughly equivalent to the top half of the Minuteman I's cylindrical guidance system, opposite the computer half. The MGSC is unchanged between the Minuteman II and Minuteman III. The MGSC is normally covered with a metal cover that provides radiation protection, but the cover is missing in the photo above.

Battery

The battery in the Minuteman Guidance System is very unusual, since it is a "reserve battery", completely inert until activated. It is a silver/zinc battery with the electrolyte stored separately, giving the battery an essentially infinite shelf life. To power up the battery during a launch, a gas generator inside the battery is ignited by a squib. The gas pressure forces the potassium hydroxide electrolyte out of a tank and into the battery, energizing the battery in under a second. The battery can only be used once, of course, and you can't test it. The battery was built by Delco-Remy (a division of General Motors) (details). It provides 28 volts at 14.5 Amp-hours, powering the guidance system and most of the missile; a separate battery powers the first-stage rocket.

The battery inside the Minuteman III. Original image from National Air and Space Museum.


The photo above shows the battery mounted inside the guidance system. Note the two thin wires attached to the posts on the left front of the battery to enable the battery, and the thick power wires bolted to the posts on the right. Above these posts is an "electrolyte vent port"; I'm not sure what prevents caustic electrolyte from spraying out under high pressure.

The photo below shows the construction of a Minuteman I battery, similar but with two independent battery blocks. The two round gas generators on the front of the electrolyte tube force the electrolyte into the battery sections.

Inside the remotely-activated SE12G battery. (source)


Squib-activated switch

Another unusual component is the squib-activated switch. This switch is activated by a small explosive squib; when fired, the squib forces the switch to change positions. This switch may seem excessively dramatic, but it has a few advantages over, say, an electromagnetic relay. The squib-activated switch will switch solidly, while the contacts on a relay may "chatter" or bounce before settling into their new positions. An electromagnetic relay may require more current to switch, especially if it has large contacts or many contacts. However, like the battery, the squib-activated switch can only be used once.

The squib-activated switch, next to a coolant line. The manufacturer of this part is Boeing, as indicated by the Cage Code 94756 on the part. Image from National Air and Space Museum.


The purpose of the switch is to disconnect important signals, known as critical leads, during launch. The Minuteman missile has an umbilical connection that provides power, cooling, and signals while the missile is in the silo. Just before the umbilical cable is disconnected, the switch severs the connections for the master reset signal along with an enable and disable signal. Presumably, these control signals are cleanly disconnected to avoid stray signals or electrical noise that could cause problems when the umbilical connection is pulled off.

The photo below shows the umbilical cable connected to a Minuteman II missile in its silo. Also note the window in the side of the missile to allow the light beam from the autocollimator to reflect off the guidance platform for alignment.

A Minuteman II missile in its silo. Photo by Kelly Michals, CC BY-NC 2.0.

Cooling

The guidance system is water-cooled while in the silo, using a solution of sodium chromate to inhibit corrosion. After launch, the guidance system operated for just a few minutes before releasing the warheads, so it operated without water cooling. (The stable platform has a fan and heat exchanger to keep it cool during flight.) The diagram below highlights the cooling lines. Coolant is provided from the ground support equipment through the umbilical connector in the upper right. It flows through the computer, diode assembly, MGSC, and stable platform. Finally, the coolant exits through the umbilical connector.

Original image from National Air and Space Museum.

Diode assembly

In the middle of the guidance system, the diode assembly consists of seven power diodes. These diodes control the power flow when switching from ground power to battery power. The photo below shows the diode assembly, with coolant connections at the top and bottom. The thick gray wire in the center of the diode assembly receives power from the battery just to the left.

The diode assembly. Image from National Air and Space Museum.

Permutation plug

The Permutation plug (or P-plug) was the key cryptographic element of the guidance system, defining the launch codes for a particular missile. The P-plug looked similar to a hockey puck and plugged into a 55-pin socket attached to the amplifier. The retaining bar held the P-plug in place.

The connector that receives the Permutation plug. Image from National Air and Space Museum.

Because the security of the missile hinged on the P-plug, the P-plug was handled in a highly ritualized way, transported by a two-person team, an airman and an officer, both armed (source). After the guidance system underwent maintenance, the P-plug team would ensure that the plug was properly installed, just before the missile was bolted back together. There was also a lot of ritual around the disk memory, since it held security codes and targeting information.18 Before anyone could work on the computer, a special team would come to the silo and erase the memory. Afterward, another team would load up the computer from a magnetic tape (in the case of Minuteman III) or punched tape (earlier).19

The missile launch codes are said to be split between the hard disk and the permutation plug. In particular, the missile software holds a two-word code for each of the five launch control facilities.22 The launch code in an Execute Launch Command (ELC) must match the combination of the P-plug value and the site-specific value on disk.23 Thus, the launch code is unique to each launch control site and each missile.24 As another security feature, a launch requires messages from two launch control sites, unless only one is available.25
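
The two-site rule and its single-site fallback (described in more detail in note 25) can be sketched as a small state machine. This is only a toy illustration under stated assumptions: the class name, the state names other than "Launch in Process", the site labels, and the 60-second timer value are all hypothetical and are not taken from the Minuteman documentation.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class LaunchVoteSketch:
    """Toy model of the two-vote rule with a one-vote timer (not the real logic)."""
    one_vote_time: float = 60.0           # hypothetical duration, in seconds
    state: str = "IDLE"                   # hypothetical state name
    votes: Set[str] = field(default_factory=set)
    timer_start: Optional[float] = None

    def tick(self, now: float) -> str:
        """Launch if the one-vote timer has run out without an inhibit."""
        if self.state == "ONE_VOTE_TIMER" and now - self.timer_start >= self.one_vote_time:
            self.state = "LAUNCH_IN_PROCESS"
        return self.state

    def receive(self, command: str, site: str, now: float) -> str:
        """Process an ELC or INC message from a launch control site."""
        self.tick(now)
        if self.state == "LAUNCH_IN_PROCESS":
            return self.state
        if command == "ELC":
            self.votes.add(site)
            if len(self.votes) >= 2:
                self.state = "LAUNCH_IN_PROCESS"   # two sites agree
            elif self.state == "IDLE":
                self.state = "ONE_VOTE_TIMER"      # a single vote starts the timer
                self.timer_start = now
        elif command == "INC" and self.state == "ONE_VOTE_TIMER" and site not in self.votes:
            # A different site cancels the single-vote launch.
            self.state = "IDLE"
            self.votes.clear()
            self.timer_start = None
        return self.state
```

In this sketch, a second ELC from a different site moves straight to "Launch in Process", while an INC from another site during the timer returns the model to its idle state.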

Transient current detector

A nuclear blast has many bad effects on semiconductors and can cause transient errors. A rather brute-force approach was used to minimize this risk in the D-37C and D-37D computers: if a nuclear blast is detected, the computer stops writing to disk until the burst of radiation passes by. When the radiation level drops, the computer carries on from where it left off, extrapolating to make up for the lost time26 to minimize the error. Since all data is stored on the hard disk, the system doesn't need to worry about memory corruption as could happen with semiconductor RAM.
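
The recovery scheme can be illustrated with a small simulation. According to note 26, the computer shuts down for a whole number of disk revolutions and then double-counts the accelerometer pulses for the same number of revolutions. The sketch below is a toy model of that idea; the function name and the pulse data are made up for illustration.

```python
def integrate_with_blackout(pulses_per_rev, blackout_start, blackout_len):
    """Sum accelerometer pulses, skipping a blackout and then double-counting.

    pulses_per_rev: pulses arriving during each disk revolution (made-up data)
    blackout_start: index of the first revolution that is lost
    blackout_len:   number of revolutions lost
    """
    blackout_end = blackout_start + blackout_len
    catchup_end = blackout_end + blackout_len
    total = 0
    for rev, pulses in enumerate(pulses_per_rev):
        if blackout_start <= rev < blackout_end:
            continue                 # computer is shut down; pulses are missed
        if blackout_end <= rev < catchup_end:
            total += 2 * pulses      # double-count to make up for the lost time
        else:
            total += pulses
    return total


# Constant acceleration: the double-counting recovers the lost pulses exactly.
steady = [10] * 20
steady_total = integrate_with_blackout(steady, blackout_start=5, blackout_len=3)

# Slowly changing acceleration: a small error remains.
ramp = list(range(10, 30))
ramp_total = integrate_with_blackout(ramp, blackout_start=5, blackout_len=3)
ramp_error = ramp_total - sum(ramp)
```

With constant acceleration the result matches the true pulse count, while a changing acceleration leaves a small residual error, which is why the extrapolation works best when not much changes during the lost time.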

The Minuteman documents euphemistically refer to "operating in a hostile environment" for the ability to handle large pulses of radiation from a nearby nuclear explosion. Another euphemism is "seismic environment", when a nuclear blast near a silo could disturb the missile's targeting alignment. To get an idea of the expected forces, note that the launch officers were strapped into their seats with four-point harnesses to protect against the seismic environment.27

The Transient Current Detector. Image from National Air and Space Museum.

The "transient current detector" above detects dangerous levels of radiation. I couldn't find any details, but I suspect that it contains a semiconductor and detects transient current through the semiconductor induced by radiation. It would make sense to use a semiconductor similar to the ones in the computer so the detector's response matches the response of the computer, perhaps a matching Texas Instruments IC.

The Minuteman III also has two "field detectors" mounted on the outside of the guidance ring. These presumably detect large fluctuations in the electromagnetic field, indicating an electromagnetic pulse (EMP), different from the ionizing radiation picked up by the Transient Current Detector.

Conclusions

The Minuteman guidance system is full of innovative technologies. Among other things, Minuteman I used an early transistorized computer, and Minuteman II used one of the first integrated circuit computers. The Minuteman missile isn't just something from the past, though. There are currently 400 Minuteman missiles in the United States, ready to launch at a moment's notice and create global devastation. Thus, its technical achievements can't be glorified without reflecting on the negativity of its underlying purpose. On the other hand, Minuteman has succeeded (so far) in its purpose of deterrence, so it can also be viewed in a positive, peacekeeping role. In any case, the Minuteman technology is morally ambiguous, compared to, say, the Apollo Guidance Computer.

I plan to write more about the role of Minuteman and Apollo in the IC industry, so follow me on Mastodon as @kenshirriff@oldbytes.space or RSS for updates. Probably the best overview of Minuteman is Minuteman weapon system history and description. The book Minuteman: A technical history has thorough information. For information on the missile targeting and alignment process, see Association of Air Force Missileers Newsletter, December 2006. The Minuteman guidance system is described in detail in The evolution of Minuteman guidance and control. Much of the imagery in this article is from the National Air and Space Museum. Thanks to Martin Miller for providing a detailed D-37C photo. He has taken amazing photos of nuclear equipment, published in his book Weapons of Mass Destruction: Specters of the Nuclear Age, so check it out.

Notes and references

  1. The Minuteman missile was introduced in 1962, followed by the improved Minuteman II in 1965 and the Minuteman III in 1970. From 1966 to 1985, the US had 1000 Minuteman missiles fielded, but the number has been reduced since then due to various arms control agreements. At present, there are 400 active Minuteman III missiles spread among 450 launch sites. The Minuteman guidance system was updated in the early 2000s to a platform called the NS-50, using a computer based on a MIL-STD-1750A microprocessor. I'm not discussing that system in this post for reasons of space.

    Although the Minuteman has undergone modernization projects, it is reaching the end of its life and is scheduled to be replaced by the Sentinel missile. The Sentinel program is encountering delays and is over budget by 80%, raising the risk of cancellation, but the program is proceeding as of July 2024.

  2. Disclaimer: This information is all from published sources. There's nothing secret, and it's mostly obsolete from 60 years ago. I don't have access to a Minuteman system (unlike the Titan), so this post is based on publications and photos, rather than hands-on experience. I've tried to be accurate, but I'm sure there are errors. 

  3. Different guidance algorithms can be used, such as Q-guidance, delta guidance, explicit guidance, and numerical integration; the more advanced algorithms require better computers but provide easier targeting, better accuracy, and more ability to correct for course deviations (see Present and Advanced Guidance Techniques). Q-guidance uses a precomputed "Q matrix" to constantly determine the direction in which velocity needs to be gained, while delta guidance attempts to keep the missile along a precomputed trajectory by using polynomials. In explicit guidance, the equations of motion are solved to determine the steering direction. Minuteman used delta guidance at first, but moved to "hybrid explicit" guidance when the computer became more advanced. See Minuteman: A technical history, page 234 for more on targeting algorithms. 

  4. On Minuteman I, the three stages were steered by changing the direction of the rocket nozzles. Minuteman II, however, used a single fixed nozzle on the second stage but injected fluid into the exhaust to steer the missile, a technique called liquid injection thrust vector control. The Minuteman III used this technique on the third stage as well, injecting a strontium perchlorate solution. (Small nozzles powered by a gas generator are used for roll control, since directing the exhaust won't produce roll motion.) The thrust control liquid was Freon 114B2, which turned out to be harmful to the ozone layer, so it was replaced in the 1990s with perfluorohexane.

  5. Strictly speaking, the launch azimuth wasn't aimed at the target. Because the Earth rotated during the missile's flight, the launch azimuth was aimed at where the target would be when the warhead landed. Another factor was that the Minuteman I had a limited ability to steer off the launch azimuth, about 10°, allowing the missile to switch between two targets at launch time.

  6. The Minuteman guidance system is designed to achieve as much accuracy as possible. One problem is that the gyroscopes and accelerometers aren't perfect, but have small errors due to friction and other factors. Moreover, the construction of the stable platform isn't exact; components that should be parallel or perpendicular will have tiny angle errors. To deal with these problems, the missile performs periodic calibrations ranging from some every 15 minutes to some every few months.

    To assist with calibration, the guidance platform contains electrolytic bubble levels, similar to an ordinary carpentry level, but extremely sensitive. Each bubble level contains wires positioned partially in the bubble and partially in the conductive electrolyte fluid. As the bubble shifts, the amount of wire in the fluid changes, changing the measured resistance. The levels allow the stable platform to be rotated to known positions relative to gravity for calibration.

    The top of the gyrocompass has two mirrors for calibration, allowing the missile platform to rotate exactly 180° relative to the autocollimator. Every 15 minutes, the platform would flip over to measure the gyroscope and accelerometer signals in the opposite orientation. This allowed much better calibration, canceling out errors and improving the missile accuracy. Other calibrations were performed less frequently, such as checking each accelerometer in the up and down positions. Every 90 days, a calibration called PSAT (Perturbation Self-Alignment Technique) pitched the platform by 90° and then slowly rotated the gyrocompass around the vertical to simulate the Earth's rotation (details).

    Another alignment measurement checks the angle between the two mirrors. The two mirrors on the alignment block are supposed to be parallel, but they won't be exactly parallel. The guidance platform periodically rotates the mirror assembly to check one mirror and the other against the autocollimator to compute the angle between the mirrors, called zeta. (See Software Validation Study, page A-94.)

    These calibrations permitted the measurement of small biases and imperfections in the gyroscopes and accelerometers; this data was fed into the guidance calculations to squeeze out as much accuracy as possible. These measurements also provided statistical tracking of the devices so they could be replaced if their performance started to deteriorate. 

  7. Inconveniently, I found contradictory sources about the Minuteman coordinate system. Most sources specify Z as the roll axis, but one detailed paper swaps the X and Z axes, maybe to match simulation software. Examining Figure 5 closely shows that the new axis names were drawn in by hand. 

  8. The flight time of Minuteman depended on the distance and trajectory. The Minuteman's range is said to be 13,000 km. For a closer target, there are two possible trajectories: a high path and a low path. Being direct, the low path could take about 25 minutes, while the high path would reach over 1500 nautical miles (almost 3000 km, seven times the altitude of the ISS) and take 45 minutes. See A simulation of Minuteman Trajectories.

  9. The disk holds a timing track, which provides the timing for the computer, giving it a 345.6 kHz clock speed. Note that all operations in the computer are synchronized to the disk, rather than a clock inside the computer. One consequence of this is that the processor speed depends on the disk speed, so it isn't as precise as most computers, which generate the clock from a quartz crystal. The processor timing is very important for a guidance computer, since its calculations of positions depend on the time step. If the processor is running fast or slow, the position will be correspondingly wrong. The solution is that the computer calculates a parameter "tau", the ratio between processor time and wall clock time. The computer receives an interrupt exactly once per second; by counting the number of instructions executed between interrupts, the computer can compute tau and ensure that the calculations are accurate. 

  10. The computer has 8-bit analog-to-digital converters. The D-37C supports 32 analog inputs with a range of +/- 10 volts (source). It also has four digital-to-analog outputs with 8-bit accuracy, also +/- 10 volts.

    In the D-17B, nine analog outputs control the rocket steering, providing roll, pitch, and yaw to the three stages, while three analog outputs go to the stable platform, probably positioning the gimbals. 

  11. The housing for the stable platform provides radiation shielding; it is one of the few parts of the guidance system that is officially secret, but is said to be tantalum sheeting (see Minuteman: A technical history page 224). Although the computer is also said to have radiation shielding, it is curiously not on the secret list. 

  12. Sources give different memory capacities. The reason is that in addition to the regular memory, part of the disk is used for special purposes including registers and rapid access loops. The problem with the regular memory is that the processor may need to wait for an entire disk revolution to access a particular word. The solution is rapid access loops: by putting the write head just upstream of the read head, the data can be accessed more rapidly. For instance, if the write head is positioned one word length upstream, the word can be read (and rewritten) every cycle, providing immediate access to a single word. Putting the write head further upstream allows storage of longer values, with a corresponding longer wait. The D-37C has ten rapid-access channels of one to 16 words (source). The regular memory in the D-37C consists of 56 channels (i.e. tracks) of 128 words, totaling 7168 words. Counting the loops and registers yields the higher memory capacity of 7222 words. 

  13. The differences between the D-17B and D-37C instruction sets are described here.

  14. The schematic for the Minuteman's flip-flop IC is shown below. This is a complex circuit for the time, with six transistors along with numerous resistors, diodes, and capacitors.

    Flip-flop schematic. From Integrated circuits go operational, Electronics, Feb 15, 1963.

     

  15. The diagram below shows an exploded view of the D-37D computer (rotated 180° from the earlier photo).

    Exploded view of the D-37D computer. Modified and fixed from Minuteman weapon system history and description.

  16. The danger of these explosives is illustrated by a bizarre accident summarized by "The warhead is no longer on top of the missile." At 3:00 pm on December 5, 1964, two airmen were in the missile silo, troubleshooting a fault in the security system. One airman removed a fuse, triggering a loud explosion, and the nuclear warhead fell off the missile, dropping 75 feet to the floor of the silo. Nobody was injured and the warhead was hoisted out a few days later without incident.

    The problem was that the airmen used an "unauthorized tool" (a screwdriver) to remove the fuse, briefly shorting power to ground. This caused a current on a ground line connected to the missile through an umbilical cable. Inside the missile, the retrorocket for the warhead had an igniter, but a short on its connector caused another connection to ground. This ground went out through a second umbilical, closing the circuit. (Apparently, the resistance between the two grounds was high enough that the path through the two shorts had enough current to ignite the igniter.) The force of the retrorocket flung the warhead off the rocket.

    More details are in this report and this report. (This incident is not the 1980 Damascus Titan incident, where a dropped 8-pound wrench socket led to the explosion of the missile, killing one person and injuring 21 others, while flinging the warhead out of the silo. The very interesting book Command and Control discusses the Damascus incident and other mishaps with nuclear weapons.) 

  17. The functional diagram below shows the interactions between the stable platform and the guidance set. Shaded circuits are mounted on the stable platform, while others are in the control set. This diagram is for the later NS-50 platform, but it should be mostly relevant to the NS-20 used in Minuteman III earlier. At the top are the feedback loops for the PIGA accelerometers. The torque motors (TM) in the middle provide feedback through the gimbals for the gyroscopes. Below that, the gyrocompass has a feedback loop with its internal torquer. The torque motor at the bottom rotates the gyrocompass and mirrors with feedback through the optical resolver.

    Platform Control Functional Diagram. From Technical Reference Handbook, SELECT WS133A, D2-27524-5, Fig. 3-12, page 3-68.

  18. The Air Force was especially concerned with keeping the targeting information secret; the people launching the missiles had no idea what the targets were. It occurs to me, though, that since the Minuteman I missile had to be physically rotated in its silo to exactly line up with the target, one presumably could draw an azimuth line on the map and know the target was along the line. 

  19. The Minuteman computer has a conditional fill mode, where the computer can't be loaded with a new program unless the first four words match the first four words in memory channel 12. This ensures that the computer can't be loaded with unauthorized software. This four-word code must be different from the P-plug value for two reasons. First, the P-plug value is not allowed to be stored in memory. Second, the filling code is four words, while the P-plug value is two words.

    The P-plug held two hardwired code words that could be read by the processor.20 For security, the two words were not allowed to be in memory (i.e. the hard drive) at the same time. I assume it is called a Permutation Plug for historical reasons; the Saturn V booster used in Apollo used a security plug that provided a permutation of the 21-character code.21 (That is, it mapped 21 inputs to 21 outputs as a permutation.) 

  20. The processor read the P-plug code words by first triggering the discrete output #25 with the DOB 25 instruction (Discrete Output B) and then reading the value (twice for reliability). The process was repeated with output #6. Finally, the discretes were cleared with DOB 0 (reference). 

  21. The Apollo flights used "code plugs" to protect the Range Safety system from unauthorized access, since this system was capable of blowing up the Saturn V rockets (details). Signals were transmitted in a 21-symbol "alphabet" (encoded by 2 tones out of 7). The code plug permuted the 21 symbols in an arbitrary way. This wasn't a lot of security, just a simple substitution cipher, but it was sufficient for its role. A command consisted of 11 characters (9 for the address and 2 for the command), so the odds were low of hitting a valid message by chance. 

  22. One feature of the Minuteman missile is that the missile sites themselves are uncrewed; the missile officers who launch the missiles work remotely, handling multiple missiles to reduce the personnel required. Specifically, each group of 10 missiles (called a "flight") is controlled by an underground launch control center. A squadron consists of 50 missiles. A "wing" is the largest grouping, handling 150 to 200 missiles, and attached to a particular Air Force base. At its peak, Minuteman had 1000 missiles divided among six wings in Missouri, Montana, North Dakota, South Dakota, and Wyoming, with missiles spilling across the Wyoming border into Colorado and Nebraska. 

  23. Information on the launch code mechanism is from Technical Reference Handbook D2-27524-5, "System Engineering Level Evaluation Correction Team, WS133A", chapter 2. 

  24. The Command Signals Decoder provides another layer of security. It is an electromechanical stepping decoder that blocks the first-stage rocket from igniting unless it receives the proper 27-bit code as part of an Enable command. (The Enable command (ENC) happens before the Execute Launch command (ELC); see the state diagram below.) Its operation is murky; my hypothesis is that the decoder acts much like a combination lock, with the 27 code posts raised or lowered by the input bits. If all the posts are in the proper position, the inner wheel is released, allowing it to rotate to the armed position and close the electrical firing circuit for the motor igniters. Specifically, the 27 posts have a high notch on one side and a low notch on the other, so the device is programmed by rotating each pin so the desired notch faces inward. When the device receives code bits, the wheel rotates one position for each bit and a solenoid raises or lowers the pin, depending on if it is a zero or one. If all pins are in the correct positions, the inner wheel can rotate through the notches, but if any pins are incorrect, the inner wheel will bind on that pin. The 27 bits are the "CSD(M) secure code", probably consisting of 24 code bits and three padding bits. Another Command Signals Decoder on the ground "CSD(G)" provides an interlock for ground ordnance.

    The Command Signals Decoder, from Evolution of ordnance subsystems and components design in Air Force strategic missile systems.

    I think there are two motivations behind this complicated device. First, they want an interlock that is mechanical rather than electronic, since an electronic device can be affected unpredictably by radiation, power surges, component failure, programming errors, etc. Second, they want an interlock that physically disconnects the firing circuit so there is no path that can be triggered by stray current, lightning, EMP, etc.

    The Minuteman's P92 amplifier assembly also blocks ordnance unless armed with a code. It's unclear if this is the same enable code as the Command Signals Decoder or a different code.

    The earlier Titan missile also had a code mechanism to prevent an unauthorized launch by blocking the engine. The Titan had a butterfly valve in the fuel line with a 6-digit code. If you don't enter the right code, the fuel line stays shut and the missile simply can't take off (video). 

  25. A missile launch normally requires an Execute Launch Command (ELC) from two launch control sites, moving the missile to the "Launch in Process" mode. However, that raises the concern that there could be only one surviving site. The solution is that after receiving a single launch command, the missile starts a timer. If the "one-vote launch time" passes uneventfully, the missile is launched. However, another site can cancel a rogue launch during that time by sending an Inhibit Command (INC) message. The sites have a complex system to detect which sites are active and to determine the primary and secondary sites controlling each missile. (This is reminiscent of the Byzantine generals problem.)

    The state machine for Minuteman missile status. From Technical Reference Handbook D2-27524-5, page 2-25.

  26. After detecting a nuclear blast, the Minuteman computer shuts down for an integral number of disk revolutions. When it comes back up, it double-counts the accelerometer pulses for the same number of disk revolutions to make up for the missed time (see Minuteman: A technical history pages 220 and 223). As long as not much changed during the lost time, the accuracy loss is small. Of course, this counter would need to be outside the part of the computer that gets shut down. 

  27. Missiles were aligned to such accuracy that even running a diesel generator nearby could shift the silo enough to cause alignment problems, as happened with a Titan site. (See Association of Air Force Missileers Newsletter, March 2007, page 6.) A "seismic event" could also be an earthquake; the enormous 1964 Alaska earthquake—9.2 on the Richter scale—caused Minuteman guidance systems to lose alignment with the autocollimator (See Minuteman: A technical history page 221). 
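
The Apollo code plug arithmetic in note 21 can be checked with a short sketch. It confirms that choosing 2 tones out of 7 gives a 21-symbol alphabet and models the plug as a permutation of those symbols. The function names and the random wiring are hypothetical; the real plug wiring and message format are not described here beyond what the note states.

```python
import math
import random
from itertools import combinations

# Each symbol is a pair of distinct tones chosen from 7 tones.
SYMBOLS = list(combinations(range(7), 2))
MESSAGE_LENGTH = 11  # 9 address characters + 2 command characters


def make_code_plug(seed):
    """Return a hypothetical plug wiring: a permutation of the symbol indices."""
    rng = random.Random(seed)
    wiring = list(range(len(SYMBOLS)))
    rng.shuffle(wiring)
    return wiring


def invert_plug(wiring):
    """Build the inverse permutation, as a matching receiver plug would need."""
    inverse = [0] * len(wiring)
    for src, dst in enumerate(wiring):
        inverse[dst] = src
    return inverse


def apply_plug(wiring, message):
    """Substitute each symbol index in the message through the plug."""
    return [wiring[symbol] for symbol in message]


# Number of distinct 11-symbol strings an attacker would have to guess among.
possible_messages = len(SYMBOLS) ** MESSAGE_LENGTH
```

Applying a plug and then its inverse returns the original message, which is all a substitution cipher of this kind needs. The count of possible 11-symbol strings (21 to the 11th power) gives a sense of why a random guess is unlikely to form a valid message, although how many of those strings are valid commands is not stated.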





[#] Sun Sep 01 2024 09:10:47 UTC from rss <>

Subject: The Pentium as a Navajo weaving

[Reply] [ReplyQuoted] [Headers] [Print]

Hurrying through the National Gallery of Art five minutes before closing, I passed a Navajo weaving with a complex abstract pattern. Suddenly, I realized the pattern was strangely familiar, so I stopped and looked closely. The design turned out to be an image of Intel's Pentium chip, the start of the long-lived Pentium family.1 The weaver, Marilou Schultz, created the artwork in 1994 using traditional materials and techniques. The rug was commissioned by Intel as a gift to AISES (American Indian Science & Engineering Society) and is currently part of an art exhibition—Woven Histories: Textiles and Modern Abstraction—focusing on the intersection between abstract art and woven textiles.

"Replica of a Chip", created by Marilou Schultz, 1994. Wool. Photo taken at the National Gallery of Art, 2024.

I talked with Marilou Schultz, a Navajo/Diné weaver and math teacher, to learn more about the artwork. Schultz learned weaving as a child—part of four generations of weavers—carding the wool, spinning it into yarn, and then weaving it. For the Intel project, she worked from a photograph of the die, marking it into 64 sections along each side so the die pattern could be accurately transferred to the weaving. Schultz used the "raised outline" technique, which gives a three-dimensional effect along borders. One of the interesting characteristics of the Pentium from the weaving perspective is its lack of symmetry, unlike traditional rugs. The Pentium weaving was colored with traditional plant dyes; the cream regions are the natural color of the wool from the long-horned Navajo-Churro sheep.2 The yarn in the weaving is a bit finer than the yarn typically used for knitting. Weaving was a slow process, with a day's work extending the rug by 1" to 1.5".

The Pentium die photo below shows the patterns and structures on the surface of the fingernail-sized silicon die, over three million tiny transistors. The weaving is a remarkably accurate representation of the die, reproducing the processor's complex designs. However, I noticed that the weaving was a mirror image of the physical Pentium die; I had to flip the rug image below to make them match. I asked Ms. Schultz if this was an artistic decision and she explained that she wove the rug to match the photograph. There is no specific front or back to a Navajo weaving because the design is similar on both sides,3 so the gallery picked an arbitrary side to display. Unfortunately, they picked the wrong side, resulting in a backward die image. This probably bothers nobody but me, but I hope the gallery will correct this in future exhibits. For the remainder of this article, I will mirror the rug to match the physical die.

Comparison of the Pentium weaving (flipped vertically) with a Pentium die photo. Original die photo from Intel.

The rug is accurate enough that each region can be marked with its corresponding function in the real chip, as shown below. Starting in the center, the section labeled "integer execution units" is the heart of the processor, performing arithmetic operations and other functions on integer numbers. The Pentium is a 32-bit processor, so the integer execution unit is a vertical rectangle, 32 bits wide. The horizontal lines correspond to different types of circuitry such as adders, multipliers, shifters, and registers. To the right, the "floating point unit" performs more complex arithmetic operations on floating-point numbers, numbers with a fractional part that are used in applications such as spreadsheets and CAD drawings. Like the integer execution unit, the floating point unit has horizontal stripes corresponding to different functions. Floating-point numbers are represented with more bits, so the stripes are wider.

The Pentium weaving, flipped and marked with the chip floorplan.

At the top, the "instruction fetch" section fetches the machine instructions that make up the software. The "instruction decode" section analyzes each instruction to determine what operations to perform. Simple operations, such as addition, are performed directly by the integer execution unit. Complicated instructions (a hallmark of Intel's processors) are broken down into smaller steps by the "complex instruction support" circuitry, with the steps held in the "microcode ROM". The "branch prediction logic" improves performance when the processor must make a decision for a branch instruction.

The code and data caches provide a substantial performance boost. The problem is that the processor is considerably faster than the computer's RAM memory, so the processor can end up sitting idle until program code or data is provided by memory. The solution is the cache, a small, fast memory that holds bytes that the processor is likely to need. The Pentium processor had a small cache by modern standards, holding 8 kilobytes of code and 8 kilobytes of data. (In comparison, modern processors have multiple caches, with hundreds of kilobytes in the fastest cache and megabytes in a slower cache.) Cache memories are built from an array of memory storage elements in a structured grid, visible in the rug as uniform pink rectangles. The TLB (Translation Lookaside Buffer) assists the cache. Finally, the "bus interface logic" connects the processor to the computer's bus, providing access to memory and peripheral devices. Around the edges of the physical chip, tiny bond pads provide the connections between the silicon chip and the integrated circuit package. In the weaving, these tiny pads have been abstracted into small black rectangles.
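
To make the cache idea concrete, here is a minimal sketch of a cache that only tracks hits and misses. The 8-kilobyte size comes from the text above, but the 32-byte line size and the simple direct-mapped organization are assumptions chosen for brevity, not a description of the Pentium's actual cache design.

```python
class ToyCache:
    """Direct-mapped cache model: counts hits and misses, stores no data."""

    def __init__(self, size_bytes=8 * 1024, line_bytes=32):
        self.line_bytes = line_bytes                 # assumed line size
        self.num_lines = size_bytes // line_bytes
        self.tags = [None] * self.num_lines          # one tag per cache line
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Record an access to a byte address; return True on a hit."""
        line_number = address // self.line_bytes
        index = line_number % self.num_lines         # which cache slot to use
        tag = line_number // self.num_lines          # identifies the memory block
        if self.tags[index] == tag:
            self.hits += 1
            return True
        self.tags[index] = tag                       # fetch the line from "RAM"
        self.misses += 1
        return False


# A loop that reads the same 1 KB of data ten times.
cache = ToyCache()
for _ in range(10):
    for address in range(1024):
        cache.access(address)
```

In this toy run, only the first touch of each 32-byte line misses; every later access is served from the cache, which is the effect that keeps the processor from waiting on slower RAM.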

The weaving is accurate enough to determine that it represents a specific Pentium variant, called P54C. The motivation for the P54C was that the original Pentium chips (called P5) were not as fast as hoped and ran hot. Intel fixed this by using a more advanced manufacturing process, reducing the feature size from 800 to 600 nanometers and running the chip at 3.3 volts instead of 5 volts. Intel also modified the chip so that when parts of the chip were idle, the clock signal could be stopped to save power. (This is the "clock driver" circuitry at the top of the weaving.) Finally, Intel added multiprocessor logic (adding 200,000 more transistors), allowing two processors to work together more easily. The improved Pentium chip was smaller, faster, and used less power. This variant was called the P54C (for reasons I haven't been able to determine). The "multiprocessor logic" is visible in the Pentium rug, showing that it is the P54C Pentium (right) and not the P5 Pentium (left).

The Pentium P5 on the left and the P54C on the right, showing the difference in die and package sizes. If you look closely, the P5 die on the left lacks the "multiprocessor logic" in the weaving, showing that the weaving is the P54C. I clipped the pins on the P5 to fit it under a microscope.

Intel's connection with New Mexico started in 1980 when Intel opened a chip fabrication plant (fab) in Rio Rancho, a suburb north of Albuquerque. At the time, this plant, Fab 7, was Intel's largest and produced 70% of Intel's profits. Intel steadily grew the New Mexico facility, adding Fab 9 and then Fab 11, which opened in September 1995, building Pentium and Pentium Pro chips in a 140-step manufacturing process. Intel's investment in Rio Rancho has continued with a $4 billion project underway for Fab 9 and Fab 11x. Intel has been criticized for environmental issues in New Mexico, detailed in the book Intel inside New Mexico: A case study of environmental and economic injustice. Intel, however, claims a sustainable future in New Mexico, restoring watersheds, using 100% renewable electricity, and recycling construction waste.

Fairchild and Shiprock

Marilou Schultz is currently creating another weaving based on an integrated circuit, shown below. Although this chip, the Fairchild 9040, is much more obscure than the Pentium, it has important historical symbolism, as it was built by Navajo workers at a plant on Navajo land.

Marilou Schultz's current weaving project. Photo provided by the artist.

In 1965, Fairchild started producing semiconductors in Shiprock, New Mexico, about 200 miles northwest of Intel's future facility. Fairchild produced a brochure in 1969 to commemorate the opening of a new plant. Two of the photos in that brochure compared a traditional Navajo weaving to the pattern of a chip, which happened to be the 9040. Although Fairchild's Shiprock project started optimistically, it was suddenly shut down a decade later after an armed takeover. I'll discuss the complicated history of Fairchild in Shiprock and then describe the 9040 chip in more detail.

A Navajo rug and the die of a Fairchild 9040 integrated circuit. Images from Fairchild's commemorative brochure on the opening of a new plant at Shiprock.

The story of Fairchild starts with William Shockley, who invented the junction transistor at Bell Labs, won the Nobel prize, and founded Shockley Semiconductor Laboratory in 1957 to build transistors. Unfortunately, although Shockley was brilliant, he was said to be the worst manager in the history of electronics, not to mention a notorious eugenicist and racist later in life. Eight of his top employees, called the "traitorous eight", left Shockley's company in 1957 to found Fairchild Semiconductor. (The traitorous eight included Gordon Moore and Robert Noyce, who ended up founding Intel in 1968.) Noyce (co-)invented the integrated circuit in 1959 and Fairchild soon became a top semiconductor manufacturer, famous for its foundational role in Silicon Valley.

The Shiprock project was part of an attempt in the 1960s to improve the economic situation of the Navajo through industrial development. The Navajo had suffered a century of oppression including forced deportation from their land through the Long Walk (1864-1866). The Navajo were suffering from 65% unemployment, a per-capita income of $300, and a lack of basics such as roads, electricity, running water, and health care. The Bureau of Indian Affairs was now trying to encourage economic self-sufficiency by funding industrial projects on Indian land.4 Navajo Tribal Chairman Raymond Nakai viewed industrialization as the only answer. Called "the first modern Navajo political leader", Nakai stated, "There are some would-be leaders of the tribe calling for the banishment of industry from the reservation and a return to the life of a century ago! But, it would not solve the problems. There is not sufficient grazing land on the reservation to support the population so industry must be brought in." Finally, Fairchild was trying to escape the high cost of Silicon Valley labor by opening plants in low-cost locations such as Maine, Australia, and Hong Kong.

These factors led Fairchild to open a manufacturing facility on Navajo land in Shiprock, New Mexico. The project started in 1965 with 50 Navajo workers in the Shiprock Community Center manufacturing transistors, rapidly increasing to 366 Navajo workers.

Fairchild's manufacturing plant in Shiprock, NM, named after the Shiprock rock formation in the background. The formation is called Tsé Bitʼaʼí in Navajo.
    From The Industrialization of a 'Sleeping Giant', Commerce Today, January 25, 1971.

By 1967, Robert Noyce, group vice-president of Fairchild, regarded the Shiprock plant as successful. He explained that Fairchild was motivated both by low labor costs and by social benefits, saying, "Probably nobody would ever admit it, but I feel sure the Indians are the most underprivileged ethnic group in the United States." Two years later, Lester Hogan, Fairchild's president, stated, "I thought the Shiprock plant was one of Bob Noyce's philanthropies until I went there," but he was so impressed that he decided to expand the plant. Hogan also directed Fairchild to help build hundreds of houses for workers; since a traditional Navajo dwelling is called a hogan, the houses were dubbed Hogan's hogans.

Workers in Fairchild's Shiprock plant, 1966. Photo by Jack Grimes. Photo courtesy of Computer History Museum, Henry Mahler collection of Fairchild Semiconductor photographs.

In 1969, Fairchild opened its new facility at Shiprock and produced the commemorative brochure mentioned earlier. As well as showing the striking visual similarity between the designs of traditional Navajo weavings and modern integrated circuits, it stated that "Weaving, like all Navajo arts, is done with unique imagination and craftsmanship" and described the "blending of innate Navajo skill and [Fairchild] Semiconductor's precision assembly techniques." Fairchild later said that "rug weaving, for instance, provides an inherent ability to recognize complex patterns, a skill which makes memorizing integrated circuit patterns a minimal problem."7

However, in Indigenous Circuits: Navajo Women and the Racialization of Early Electronic Manufacture, digital media theorist Lisa Nakamura critiques this language as a process by which "electronics assembly work became both gendered and identified with specific racialized qualities".5 Nakamura points out how "Navajo women’s affinity for electronics manufacture [was described] as both reflecting and satisfying an intrinsic gendered and racialized drive toward intricacy, detail, and quality."

Fairchild's Shiprock plant, 1966. From the patterns on the floor, this photo may show the time period when Fairchild set up manufacturing in the school gymnasium. Photo by Jack Grimes. Photo courtesy of Computer History Museum, Henry Mahler collection of Fairchild Semiconductor photographs.

At Shiprock, Fairchild employed 1200 workers,6 and all but 24 were Navajo, making Fairchild the nation's largest non-government employer of American Indians. Of the 33 production supervisors, 30 were Navajo. This project had extensive government involvement from the Bureau of Indian Affairs and the U.S. Public Health Service, while the Economic Development Administration made business loans to Fairchild, the Labor Department had job training programs, and Housing and Urban Development built housing at Shiprock.7

The Shiprock plant was considered a major success story at a meeting of the National Council on Indian Opportunity in 1971.7 US Vice President Agnew called the economic deprivation and 40-80% unemployment on Indian reservations "a problem of staggering magnitude" and encouraged more industrial development. Fairchild President Hogan stated that "Fairchild's program at Shiprock has been one of the most rewarding in the history of our company, from the standpoint of a sound business as well as social responsibility." He said that at first the plant was considered the "Shiprock experiment", but the plant was "now among the most productive and efficient of any Fairchild operation in the world." Peter MacDonald, Chairman of the Navajo Tribal Council and a World War II Navajo code talker, discussed the extreme poverty and unemployment on the Navajo reservation, along with "inadequate housing, inadequate health care and the lack of viable economic activities." He referred to Fairchild as "one of the best arrangements we have ever had" providing not only employment but also supporting housing through a non-profit.

Navajo workers using microscopes in Fairchild's Shiprock plant. From "The Navajo Nation Looks Ahead", National Geographic, December 1972.

In December 1972, National Geographic highlighted the Shiprock plant as "weaving for the Space Age", stating that the Fairchild plant was the tribe's most successful economic project with Shiprock booming due to the 4.5-million-dollar annual payroll. The article states: "Though the plant runs happily today, it was at first a battleground of warring cultures." A new manager, Paul Driscoll, realized that strict "white man's rules" were counterproductive. For instance, many employees couldn't phone in if they would be absent, as they didn't have telephones. Another issue was the language barrier since many workers spoke only Navajo, not English. So when technical words didn't exist in Navajo, substitutes were found: "aluminum" became "shiny metal". Driscoll also realized that Fairchild needed to adapt to traditional nine-day religious ceremonies. Soon the monthly turnover rate dropped from 12% to under 1%, better than Fairchild's other plants.

Unfortunately, the Fairchild-Navajo manufacturing partnership soon met a dramatic end. In 1975, the semiconductor industry was suffering from the ongoing US recession. Fairchild was especially hard hit, losing money on its integrated circuits, and it shed over 8000 employees between 1973 and 1975.8 At the Shiprock plant, Fairchild laid off9 140 Navajo employees in February 1975, angering the community. A group of 20 Indians armed with high-power rifles took over the plant, demanding that Fairchild rehire the employees. Fairchild portrayed the occupiers, part of the AIM (American Indian Movement), as an "outside group—representing neither employees, tribal authorities nor the community." Peter MacDonald, chairman of the Navajo Nation, agreed with the AIM on many points but viewed the AIM occupiers as "foolish" with "little sense of Navajo history" and "no sense of the need for an Indian nation to grow" (source). MacDonald negotiated with the occupiers and the occupation ended peacefully a week later, with unconditional amnesty granted to the occupiers.10 However, concerned about future disruptions, Fairchild permanently closed the Shiprock plant and transferred production to Southeast Asia.

An article entitled "Navajos Occupy Plant". Contrary to the title, MacDonald stated that many of the occupiers were from other tribes and were not acting in the best interest of the Navajo. From Workers' Power, the biweekly newspaper of the International Socialists, March 13-26, 1975.

For the most part, the Fairchild plant was viewed as a success prior to its occupation and closure. Navajo leader MacDonald looked back on the Fairchild plant as "a cooperative effort that was succeeding for everyone" (link). Alice Funston, a Navajo forewoman at Shiprock said, "Fairchild has not only helped women get ahead, it has been good for the entire Indian community in Shiprock."11 On the other hand, Fairchild general manager Charles Sporck had a negative view looking back: "It [Shiprock] never worked out. We were really screwing up the whole societal structure of the Indian tribe. You know, the women were making money and the guys were drinking it up. We had a very major negative impact upon the Navajo tribe."12

Despite the stereotypes in Sporck's comments, he touches on important gender issues, both at Fairchild and in the electronics industry as a whole. Fairchild had long recognized the lack of jobs for men at Shiprock, despite attempts to create roles for men. In 1971, Fairchild President Hogan stated that since "semiconductor assembly operation require a great amount of detail work with tiny components, [it] lends itself to female workers. As a result, there are nearly three times as many Navajo women employed by Fairchild as men."7

The role of women in fabricating and assembling electronics is often not recognized. A 1963 report on electronics manufacturing estimated that women workers made up 41 percent of total employment in electronics manufacturing, largely in gendered roles. The report suggested that microminiaturization of semiconductors gave women an advantage over men in assembly and production-line work; women made up over 70% of semiconductor production-line workers, with 90-99% of inspecting and testing jobs and 90-100% of assembler jobs. Women were largely locked out of non-production jobs; although women held a few technician and drafting roles, the percentage of women engineers was too low to measure.

The defense contractor General Dynamics also had Navajo plants, but with more success than Fairchild. General Dynamics opened a Navajo Nation plant in Fort Defiance, Arizona in 1967 to make missiles for the Navy. At the plant's opening, Navajo Tribal Chairman Raymond Nakai pushed for industrialization, stating that it was in "industrialization and the money and the jobs engendered thereby that the future of the Navajo people will lie." The plant started with 30 employees, growing to 224 by the end of 1969, but then dropping to 99 in 1971 due to a slowdown in the electronics industry. General Dynamics opened another Navajo plant near Farmington, NM in 1988. Due to the end of the Cold War, General Dynamics' missile business was acquired in 1991 by Hughes (itself owned by General Motors since 1985), which was in turn sold to Raytheon in 1997. The Fort Defiance facility was closed in 2002 when its parent company, Delphi Automotive Systems, moved out of the military wiring business. The Farmington plant remains open, now Raytheon Diné, building components for Tomahawk, Javelin, and AMRAAM missiles.

Navajo workers at the General Dynamics plant in Fort Defiance, AZ. From the 1965 General Dynamics film "The Navajo moves into the electronic age". From American Indian Film Gallery.

Inside the Fairchild 9040 integrated circuit

The integrated circuit die image in Fairchild's commemorative brochure has an exceptionally striking design and color scheme. It's clear why this chip brings weaving to mind. Studying the die photo of the 9040 carefully reveals some interesting characteristics of integrated circuit design, so I will go into some detail.

Die photo of the Fairchild 9040 flip-flop. From the commemorative brochure.

The chip was fabricated from a tiny square of silicon, which appears purple in the photograph. Different regions of the silicon die were treated (doped) with impurities to change the properties of the silicon and thus create electronic devices. These doped regions appear as green or blue lines. The white lines are the metal layer on top of the silicon, connecting the components. The 13 metal rectangles around the border are the bond pads. The chip was packaged in an unusual 13-pin flat-pack, as shown below. Each of the 13 bond pads above was connected by a tiny wire to one of the 13 external pins.

The Fairchild 9040 packaged in a 13-pin flatpack integrated circuit. The chip was also available in a 14-pin DIP, a standard way of packaging chips. Photo from the commemorative brochure.

The Fairchild 9040 was introduced in the mid-1960s as part of Fairchild's Micrologic family, a set of high-performance integrated circuits that were designed to work together.13 The 9040 chip was a "flip-flop", a circuit capable of storing a single bit, a 0 or 1. Flip-flops can be combined to form counters, counting the number of pulses, for instance.

The most dramatic patterns on the chip are the intricate serpentine blue lines. Each line forms a resistor, controlling the flow of electricity by impeding its path. The lines must be long to provide the desired resistance, so they wind back and forth to fit into the available space. Each end of a resistor is connected to the metal layer, wiring it to another part of the circuit. Most of the die is occupied by resistors, which is a disadvantage of this type of circuit. Modern integrated circuits use a different type of circuitry (CMOS), which is much more compact, partly because it doesn't need bulky resistors.

Resistors in the 9040 die.

Transistors are the main component of an integrated circuit. These tiny devices act as switches, turning signals on and off. The photo below shows one of the transistors in the 9040. It consists of three layers of silicon, with metal wiring connected to each layer. Note the blue region in the middle, surrounded by a slightly darker purple region; these color changes indicate that the silicon has been doped to change its properties. The green region surrounding the transistor provides isolation between this transistor and the other circuitry, so the transistors don't interfere with each other. The chip also has many diodes, which look similar to transistors except a diode has two connections.

A transistor in the 9040 die. The three contacts are called the base, emitter, and collector.

These transistors with their three layers of silicon are a type known as bipolar. Modern computers use a different type of transistor, metal-oxide-semiconductor (MOS), which is much more compact and efficient. One of Fairchild's major failures was staying with bipolar transistors too long, rather than moving to MOS.14 In a sense, the photo of the 9040 die shows the seeds of Fairchild's failure.

The 9040 chip was constructed on a completely different scale from the Pentium, showing the rapid progress of the IC industry. The 9040 contains just 16 transistors, while the Pentium contains 3.3 million transistors. Thus, individual transistors can be seen in the 9040 image, while only large-scale functional blocks are visible in the Pentium. This increasing transistor count illustrates the exponential growth in integrated circuit capacity between the 9040 in the mid-1960s and the Pentium in 1993. This growth pattern, with the number of transistors doubling about every two years, is known as Moore's law, since it was first observed in 1965 by Gordon Moore (one of Fairchild's "traitorous eight", who later started Intel).
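As a rough check on that doubling rate, the two transistor counts above can be plugged into a short calculation. This is only a back-of-the-envelope sketch: the text says "mid-1960s" for the 9040, so the year 1965 used below is an assumption, and the function names are made up for illustration.

```python
import math

# Transistor counts and the Pentium's year come from the text above.
transistors_9040 = 16
transistors_pentium = 3_300_000
year_9040 = 1965       # assumption: "mid-1960s" taken as 1965
year_pentium = 1993


def doublings(start_count, end_count):
    """Number of times start_count must double to reach end_count."""
    return math.log2(end_count / start_count)


def years_per_doubling(start_count, end_count, start_year, end_year):
    """Average time between doublings over the given span."""
    return (end_year - start_year) / doublings(start_count, end_count)


n = doublings(transistors_9040, transistors_pentium)
period = years_per_doubling(
    transistors_9040, transistors_pentium, year_9040, year_pentium
)
print(f"{n:.1f} doublings, about one every {period:.1f} years")
```

Under these assumptions, the count doubled between 17 and 18 times in 28 years, or a little more often than once every two years. These two chips are very different kinds of parts, so the comparison is only indicative, but it lands in the same range as the commonly quoted Moore's law rate.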

The schematic below shows the circuitry inside the 9040 chip, with its 16 transistors, 16 diodes, and 22 resistors. The symmetry of the 9040 die photo makes it appealing, and that symmetry is reflected in the circuit below, with the left side and the right side mirror images. The idea behind a flip-flop is that it can hold either a 0 or a 1. In the chip, this is implemented by turning on the right side of the chip to hold a 0, or the left side to hold a 1. If one side of the chip is on, it forces the other side off, accomplished by the X-like crossings of signals in the center.15 Thus, the symmetry is not arbitrary, but is critical to the operation of the circuit.
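The cross-coupling idea can be sketched in a few lines of code. The model below is an abstraction, not the 9040's actual diode-transistor circuit: each side is represented as an idealized NOR gate, and the names `settle`, `turn_on_left`, and `turn_on_right` are hypothetical. It only illustrates how two sides that each force the other off end up remembering one bit.

```python
def nor(a, b):
    """Idealized NOR gate: output is on only when both inputs are off."""
    return int(not (a or b))


def settle(left, right, turn_on_left=0, turn_on_right=0, max_steps=10):
    """Re-evaluate the two cross-coupled sides until the state stops changing.

    Each side is forced off when the opposite side is on, or when an
    input asks for the opposite side to be turned on.
    """
    for _ in range(max_steps):
        new_left = nor(right, turn_on_right)
        new_right = nor(new_left, turn_on_left)
        if (new_left, new_right) == (left, right):
            return left, right
        left, right = new_left, new_right
    raise RuntimeError("latch did not settle")


# Start with the right side on (holding a 0), then pulse the "left" input.
state = (0, 1)
state = settle(*state, turn_on_left=1)   # left side turns on: now holding a 1
state = settle(*state)                   # input removed: the value is held
print("stored bit:", state[0])
state = settle(*state, turn_on_right=1)  # right side turns on: back to 0
state = settle(*state)
print("stored bit:", state[0])
```

In this model the stored bit is simply the state of the left side. Once an input is removed, each side keeps the other in place, which is the same mutual-exclusion behavior provided by the crossed signals in the schematic.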

Schematic of the Fairchild 9040 flip-flop chip. From Fairchild 1970 Data Catalog.

Despite the obscurity of the 9040, multiple 9040 chips are currently on the Moon. The chip was used in the Apollo Lunar Surface Experiments Package (ALSEP),16 in particular, the Active Seismic Experiment on Apollo 14 and 16. This experiment detonated small explosives on the Moon and measured the resulting seismic waves. The photo below is a detail from a blueprint17 that shows three of the nineteen 9040 flip-flops (labeled "FF") as well as two 9041 logic gates, a chip in the same family as the 9040.

Detail from Logic Schematic Type B Board No.4 ASE.

Conclusions

The similarities between Navajo weavings and the patterns in integrated circuits have been described since the 1960s. Marilou Schultz's weavings of integrated circuits make these visual metaphors into concrete works of art. Although the Woven Histories exhibit at the National Gallery of Art is no longer on display, the exhibit will be at the National Gallery of Canada (Ottawa) starting November 8, 2024, and the Museum of Modern Art (New York) starting April 20, 2025 (full dates here). If you're in the area, I recommend viewing the exhibit, but don't make my mistake: leave more than five minutes to see it!

Many thanks to Marilou Schultz for discussing her art with me. For more on her art, see A Conversation with Marilou Schultz on YouTube.18 Follow me on Mastodon as @kenshirriff@oldbytes.space or RSS for updates.

Notes and references

  1. The original Pentium was followed by the Pentium Pro, the Pentium II, and others, forming a long-running brand of high-performance processors. Pentium was Intel's flagship line until the Core processors took over in 2006. 

  2. Sheep hold a key role in Navajo culture and economy, which I'll briefly summarize here. Domestic sheep were brought to the Americas during the Spanish colonization, reaching the Navajo in the late 1500s. Since sheep were able to graze on semi-arid land unsuitable for crops, sheep became very important to the Navajo. Although the Navajo had used cotton for weaving in the past, the availability of wool made weaving a fundamental industry; the production and trading of woven Navajo blankets became an important economic factor in New Mexico by the 1750s (details).

    Navajo leader Peter MacDonald described the role of sheep: "Sheep were like money in the bank: the more you had, the better your life, your future, and your family's future." The number of sheep grew exponentially in the early 1900s, resulting in overgrazing of the land. The drought and Dust Bowl of the 1930s led the government to restrict the number of sheep on Navajo land, imposing the Navajo Livestock Reduction. This heavy-handed program purchased and slaughtered over half the livestock, which was catastrophic to the Navajo, both economically and culturally, destroying the Navajo's wealth and self-sufficiency.

    The Navajo-Churro sheep is a breed that the Navajo developed from the Churra sheep brought from Spain during the Spanish colonization of the Americas. These sheep have a long, lustrous fleece that is excellent for weaving. The Navajo-Churro is also called the Navajo Four-Horned Sheep as some rams have four horns, a rare trait. The Navajo-Churro breed was severely depleted when American troops killed livestock during the Navajo Wars (1863) and then brought close to extinction by the Livestock Reduction of the 1930s to 1950s. In the 1970s, the Navajo Sheep Project started efforts to preserve and revitalize the Navajo-Churro. The breed is still rare, but currently numbers in the thousands. Now, climate change and water shortages are putting more pressure on sheep grazing.  

  3. A photo of the rug was published in American Indian Science & Engineering Society 1994 Annual Report. This photo shows the "physically accurate" side of the rug, not the side that is currently on display.

    A photo of the rug from 1994.

    Which side of a die image is the top is mostly arbitrary. Intel usually presents die photos with the tiny text on the die right side up, so I will use that convention. For the Pentium die, this text is in the lower right corner and says "80P54C (m) (c) intel '92,'93". Of course, this text is much too small to be part of the woven rug. 

  4. Strengthening the Indian Economy (Indian Affairs, 1966) discusses various industrial development projects, of which Fairchild was the largest. Other projects included a plant at Rolla, ND to produce sapphire and ruby bearings, a Seminole project with Amphenol to produce electronic connectors, and a Hopi project with BVD to produce garments. Other economic development projects included timber and mining; extractive industries provided over half of Navajo income. 

  5. Racialization is defined by Nakamura as "the understanding of a specific population as possessing traits and behaviors that belong to a race, not an individual." 

  6. Many photos of workers at the Shiprock plant are in Fairchild VIEWS, March 1969. Fairchild deserves credit for referring to the workers by name rather than viewing them as anonymous props for photos. Fairchild followed the same practice in its annual reports.

  7. NCIO (National Council on Indian Opportunity) News, Oct/Nov 1971 described a high-level meeting with industry to discuss "new development on Indian reservations". US Vice President Spiro Agnew ran the meeting, with Attorney General John Mitchell a speaker along with Navajo Tribal Council chairman Peter MacDonald. Bizarrely, all three ended up convicted of felonies for different reasons. Within a few years, Mitchell was imprisoned for Watergate crimes and Agnew pled guilty to federal tax evasion. In 1990, MacDonald was convicted of fraud, riot, extortion, racketeering, and conspiracy by a Navajo tribal judge and then a federal judge, spending eight years in prison until pardoned by Bill Clinton (details). The story of Peter MacDonald is complex and many view his prosecution as politically motivated; MacDonald's memoir provides his perspective. 

  8. Although Fairchild was highly successful at first, it suffered from chaotic management and economic decline. Fairchild steadily lost key employees, many of whom started competing companies. Most important was Intel, started in 1968 by Moore and Noyce, two of the "Traitorous Eight". Eventually, hundreds of companies (called the Fairchildren) could be traced back to Fairchild. Economic factors also battered Fairchild; the semiconductor industry had barely recovered from the 1970-1971 recession when it was hit by the severe 1975 recession. As a result, Fairchild had large layoffs, of which the Shiprock layoffs were a small part. Fairchild's business continued to decline; it was purchased by Schlumberger in 1979 and went through various acquisitions, mergers, and spinoffs until it finally ended in 2016, acquired by ON Semiconductor. 

  9. Were the employees "laid off" or "layed off"? Curiously, the New York Times article said "layed off" but sources uniformly state that "layed off" is grammatically wrong. The New York Times has extensively used "layed off" so this isn't a one-time typo. I hypothesized that usage had changed since the 1970s but Google Ngram Viewer shows laid off as the consistent and overwhelming winner. Maybe "layed off" was a stylistic quirk of the New York Times? 

  10. Looking back, MacDonald questioned his decision to let the occupation of Fairchild's plant continue rather than ordering the tribal police to forcibly remove the occupiers from the plant. In his view, his decision to let the occupation continue led to the closing of the plant and the loss of 1200 jobs. On the other hand, forcibly removing the occupiers risked violence and loss of life: "I would have become the chairman who killed his own people instead of the chairman who allowed Navajo to lose their jobs."

    The risk of bloodshed was not theoretical. In 1989, a riot between MacDonald's supporters and the police resulted in two Navajos being shot and killed by the police. MacDonald pressed for a federal investigation into police brutality, but instead MacDonald and Benally (a council delegate) received long prison sentences for inciting the riot even though they were not present at the time. 

  11. Alice Funston was Forewoman for the Reliability and Quality Assurance Section at Shiprock. In a Fairchild employee newsletter, she said, "Fairchild has not only helped women get ahead, it has been good for the entire Indian community in Shiprock. Before the plant was built here, there weren't many jobs available. You could work for the Bureau of Indian Affairs, the Navajo Tribe or other government agencies, but there just weren't enough jobs to go around. I started in assembly in 1965 and was recently promoted to Production Supervisor in R & Q.A. Since the beginning of the year, a number of women have been promoted into supervisory positions. When I joined Fairchild, most of the members of management were non-Indian. Today, almost all of our supervisors and managers are Indian."

    I quote this at length, since it was the only example I could find of an employee discussing Shiprock in their own words. It must be recognized, of course, that this is a company publication, so the comments may not be completely candid. See "Affirmative Action: A growing consciousness of the needs of the individual" in Fairchild HORIZONS, May-June, 1973. 

  12. See Interview with Charlie Sporck, 2000 February 21, timestamp 0:27. From "Silicon Genesis: oral history interviews of Silicon Valley scientists, 1995-2024," Stanford Digital Repository.

    I view Sporck's comments on the failure of Shiprock as highly questionable. First, Sporck left Fairchild in 1967, so he was not present for most of the Shiprock project. Moreover, he implies that Fairchild's closing of Shiprock was in the best interest of the Navajo, which is a morally convenient justification for Fairchild's decision, but contradicted by most other sources. 

  13. Fairchild's 9040 logic family was called LPDTμL for "low-power diode-transistor Micrologic". Some sources label this family as TTL (Transistor-Transistor Logic), probably confusing it with the 9000-family, which was TTL. 

  14. Fairchild's failure to recognize the importance of MOS transistors and transition from bipolar transistors is described in History of Semiconductor Engineering, page 170. 

  15. I'll provide more details of the 9040 schematic in this footnote. The 9040 is a flexible flip-flop. It can be wired as an R-S (reset-set) flip-flop, set to 1 or reset to 0 as needed. It can also be wired as a J-K flip-flop, a flexible circuit that can store a value, hold a value, or toggle, based on the settings of the J and K inputs.

    The 9040 is a "dual-rank" flip-flop, meaning it holds its value in two latches: a primary latch and a secondary latch. (This type of flip flop was generally called "master-slave", a name that is now controversial). Looking at the schematic, the primary latch at the bottom of the schematic passes its value to the secondary latch at the top under the control of the clock. This structure makes the flip-flop "edge-triggered", changing its value at the moment when the clock signal changes.

    This circuit uses diode-transistor logic. Diodes perform most of the logic operations by combining input signals, while the transistors provide amplification. Diodes play a different role in the "push-pull" output circuit, raising the level of the high-side transistor. Because the output circuit has a transistor, diode, and transistor stacked vertically, it is often called a totem pole output, a name that seems questionable in this context.

    One curious feature of the 9040 is that it contains two pull-up resistors that are not assigned any role. The user of the chip can attach them to unused inputs to keep the input at the desired value.

    Looking at the schematic shows 13 pins, corresponding to the 13 pins of the flat-pack integrated circuit. All but three of these pins are symmetrical; power (Vcc), ground, and the clock (CP) have single connections. The ground pad is in the bottom-center of the die, which maintains symmetry. The clock and power pads are side-by-side in the top-center of the die. If you study the die photograph closely, you will see that they subtly break the chip's symmetry as the clock signal runs down the center of the die while the power connection runs down both sides. There are a few other subtle violations of symmetry when signals cross from one side of the chip to the other, as well as the obviously asymmetrical text.

  16. I haven't been able to prove that the Apollo program used chips from the Shiprock plant rather than a different facility. Fairchild President Hogan stated that workers at Shiprock assembled guidance, communications, and gyro systems that were used on Apollo rockets. 

  17. The ALSEP schematic is from Miller, K. Logic Schematic Type B Board No.4 ASE, A4, technical drawing, January 27, 1967, University of North Texas Libraries, The Portal to Texas History; crediting Lunar Planetary Institute Library. 

  18. Marilou Schultz had another chip weaving on display at the National Gallery of Art. It is labeled "Untitled (Unknown Chip), 2008", but Antoine Bercovici identified it for me as the AMD K6 III processor, released in 1999 and comparable to the Pentium III.

    A weaving created by Marilou Schultz, "Untitled (Unknown Chip)".

    If you're interested in computer-related weaving, the exhibition also had "Copper Tapestry (Riva 128 Graphics Card, Nvidia, 1997)" by Argentinian artist Analia Saban, created on a computer-automated Jacquard loom. This weaving represents a PC graphics card, specifically, the STB Velocity 128, which uses the Nvidia Riva 128 GPU chip. This chip was released in 1997, at a point when Nvidia was in a dire financial position, thirty days from going out of business. The Riva 128 saved Nvidia and now Nvidia is the world's third most valuable company.

    A tapestry created by Analia Saban, "Copper Tapestry (Riva 128 Graphics Card, Nvidia, 1997)".






[#] Mon Sep 23 2024 11:44:32 UTC from rss <>

Subject: Inside a ferroelectric RAM chip

Ferroelectric memory (FRAM) is an interesting storage technique that stores bits in a special "ferroelectric" material. Ferroelectric memory is nonvolatile like flash memory, able to hold its data for decades. But, unlike flash, ferroelectric memory can write data rapidly. Moreover, FRAM is much more durable than flash and can be written trillions of times. With these advantages, you might wonder why FRAM isn't more popular. The problem is that FRAM is much more expensive than flash, so it is only used in niche applications.

Die of the Ramtron FM24C64 FRAM chip. (Click this image (or any other) for a larger version.)

This post takes a look inside an FRAM chip from 1999, designed by a company called Ramtron. The die photo above shows this 64-kilobit chip under a microscope; the four large dark stripes are the memory cells, containing tiny cubes of ferroelectric material. The horizontal greenish bands are the drivers to select a column of memory, while the vertical greenish band at the right holds the sense amplifiers that amplify the tiny signals from the memory cells. The eight whitish squares around the border of the die are the bond pads, which are connected to the chip's eight pins.1 The logic circuitry at the left and right of the die implements the serial (I2C) interface for communication with the chip.2

The history of ferroelectric memory dates back to the early 1950s.3 Many companies worked on FRAM from the 1950s to the 1970s, including Bell Labs, IBM, RCA, and Ford. The 1955 photo below shows a 256-bit ferroelectric memory built by Bell Labs. Unfortunately, ferroelectric memory had many problems,4 limiting it to specialized applications, and development was mostly abandoned by the 1970s.

A 256-bit ferroelectric memory made by Bell Labs. Photo from Scientific American, June, 1955.

Ferroelectric memory had a second chance, though. A major proponent of ferroelectric memory was George Rohrer, who started working on ferroelectric memory in 1968. He formed a memory company, Technovation, which was unsuccessful, and then cofounded Ramtron in 1984.5 Ramtron produced a tiny 256-bit memory chip in 1988, followed by much larger memories in the 1990s.

How FRAM works

Ferroelectric memory uses a special material with the property of ferroelectricity. In a normal capacitor, applying an electric field causes the positive and negative charges to separate in the dielectric material, making it polarized. However, ferroelectric materials are special because they will retain this polarization even when the electric field is removed. By polarizing a ferroelectric material positively or negatively, a bit of data can be stored. (The name "ferroelectric" is in analogy to "ferromagnetic", even though ferroelectric materials are not ferrous.)

This FRAM chip uses a ferroelectric material called lead zirconate titanate or PZT, containing lead, zirconium, titanium, and oxygen. The diagram below shows how an applied electric field causes the titanium or zirconium atom to physically move inside the crystal lattice, causing the ferroelectric effect. (Red atoms are lead, purple are oxygen, and yellow are zirconium or titanium.) Because the atoms physically change position, the polarization is stable for decades; in contrast, the capacitors in a DRAM chip lose their data in milliseconds unless refreshed. FRAM memory will eventually wear out, but it can be written trillions of times, much more than flash or EEPROM memory.

The ferroelectric effect in the PZT crystal. From Ramtron Catalog, cleaned up.

To store data, FRAM uses ferroelectric capacitors, capacitors with a ferroelectric material as the dielectric between the plates. Applying a voltage to the capacitor will create an electric field, polarizing the ferroelectric material. A positive voltage will store a 1, and a negative voltage will store a 0.

Reading a bit from memory is a bit tricky. A positive voltage is applied, forcing the material into the 1 state. If the material was already in the 1 state, minimal current will flow. But if the material was in the 0 state, more current will flow as the capacitor changes state. This allows the 0 and 1 states to be distinguished.

Note that reading the bit destroys the stored value. Thus, after a read, the 0 or 1 value must be written back to the capacitor to restore its previous state. (This is very similar to the magnetic core memory that was used in the 1960s.)6

The FRAM chip that I examined uses two capacitors per bit, storing opposite values. This approach makes it easier to distinguish a 1 from a 0: a sense amplifier compares the two tiny signals and generates a 1 or a 0 depending on which is larger. The downside of this approach is that using two capacitors per bit reduces the memory capacity. Later FRAMs increased the density by using one capacitor per bit, along with reference cells for comparison.7
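
As a rough illustration of the destructive read and write-back described above, here is a toy Python model of a two-capacitor cell. The class names and the charge values are invented for illustration; a real cell produces small analog signals that the sense amplifier has to resolve.

```python
# Toy model of a two-capacitor FRAM cell. All names and numbers are
# hypothetical; they only illustrate the read/restore sequence.

SWITCH_CHARGE = 1.0   # made-up charge when the capacitor flips state
LINEAR_CHARGE = 0.2   # made-up charge when it was already in the 1 state


class FerroCapacitor:
    """Remembers its polarization: +1 stores a 1, -1 stores a 0."""

    def __init__(self):
        self.polarization = -1

    def write(self, bit):
        # A positive voltage stores a 1, a negative voltage stores a 0.
        self.polarization = 1 if bit else -1

    def read_pulse(self):
        """Apply a positive pulse and return the charge that flows."""
        charge = SWITCH_CHARGE if self.polarization == -1 else LINEAR_CHARGE
        self.polarization = 1  # the pulse forces the 1 state: destructive read
        return charge


class TwoCapCell:
    """One bit stored as opposite values in two capacitors."""

    def __init__(self, bit=0):
        self.true_cap = FerroCapacitor()
        self.comp_cap = FerroCapacitor()
        self.write(bit)

    def write(self, bit):
        self.true_cap.write(bit)
        self.comp_cap.write(not bit)

    def read(self):
        q_true = self.true_cap.read_pulse()
        q_comp = self.comp_cap.read_pulse()
        # The capacitor that held a 0 switches and delivers more charge.
        bit = 1 if q_comp > q_true else 0
        self.write(bit)  # write back, since the read erased both capacitors
        return bit


cell = TwoCapCell(1)
print(cell.read(), cell.read())
```

Without the write-back in `read()`, both capacitors would be left in the 1 state and a second read could no longer tell them apart.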

A closer look at the die

The diagram below shows the main functional blocks of the chip.8 The memory itself is partitioned into four blocks. The word line decoders select the appropriate column for the address and the drivers generate the pulses on the word and plate lines. The signals from that column go to the sense amplifiers on the right, where the signals are converted to bits and written back to memory. On the left, the precharge circuitry charges the bit lines to a fixed voltage at the start of the memory cycle, while the decoders select the desired byte from the bit lines.

The die with the main functional blocks labeled.

The diagram below shows a closeup of the memory. I removed the top metal layer and many of the memory cells to reveal the underlying structure. The structure is very three-dimensional compared to regular chips; the gray squares in the image are cubes of PZT, sitting on top of the plate lines. The brown rectangles labeled "top plate connection" are also three-dimensional; they are S-shaped brackets with the low end attached to the silicon and the high end contacting the top of the PZT cube. Thus, each PZT cube forms a capacitor with the plate line forming the bottom plate of the capacitor, the bracket forming the top plate connection, and the PZT cube sandwiched in between, providing the ferroelectric dielectric. (Some cubes have been knocked loose in this photo and are sitting at an angle; the cubes form a regular grid in the original chip.)

Structure of the memory. The image is focus-stacked for clarity.

The physical design of the chip is complicated and quite different from a typical planar integrated circuit. Each capacitor requires a cube of PZT sandwiched between platinum electrodes, with the three-dimensional contact from the top of the capacitor to the silicon. Creating these structures requires numerous steps that aren't used in normal integrated circuit fabrication. (See the footnote9 for details.) Moreover, the metal ions in the PZT material can contaminate the silicon production facility unless great care is taken, such as using a separate facility to apply the ferroelectric layer and all subsequent steps.10 The additional fabrication steps and unusual materials significantly increase the cost of manufacturing FRAM.

Each top plate connection has an associated transistor, gated by a vertical word line.11 The transistors are connected to horizontal bit lines, metal lines that were removed for this photo. A memory cell, containing two capacitors, measures about 4.2 µm × 6.5 µm. The PZT cubes are spaced about 2.1 µm apart. The transistor gate length is roughly 700 nm. The 700 nm node was introduced in 1993, while the die contains a 1999 copyright date, so the chip appears to be a few years behind the cutting edge in terms of process node.

The memory is organized as 256 capacitors horizontally by 512 capacitors vertically, for a total of 64 kilobits (since each bit requires two capacitors). The memory is accessed as 8192 bytes. Curiously, the columns are numbered on the die, as shown below.

With the metal removed, the numbers are visible counting the columns.
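
The capacity figures above can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted in the text; the array-area figure is my own back-of-the-envelope estimate from the cell size, covering the cells alone and ignoring drivers, sense amplifiers, and logic.

```python
# Arithmetic check of the memory organization, using figures from the text.
CAPS_HORIZONTAL = 256
CAPS_VERTICAL = 512
CAPS_PER_BIT = 2

capacitors = CAPS_HORIZONTAL * CAPS_VERTICAL   # total ferroelectric capacitors
bits = capacitors // CAPS_PER_BIT              # two capacitors store one bit
kilobits = bits // 1024
num_bytes = bits // 8
address_bits = (num_bytes - 1).bit_length()    # bits needed to address a byte

# Rough estimate (an assumption, not from the source): cell area times cell count.
CELL_W_UM, CELL_H_UM = 4.2, 6.5
array_area_mm2 = bits * CELL_W_UM * CELL_H_UM / 1e6

print(capacitors, bits, kilobits, num_bytes, address_bits)
print(round(array_area_mm2, 2), "mm^2 of memory cells (estimate)")
```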

The photo below shows the sense amplifiers to the right of the memory, with some large transistors to boost the signal. Each sense amplifier receives two signals from the pair of capacitors holding a bit. The sense amplifier determines which signal is larger, deciding if the bit is a 0 or 1. Because the signals are very small, the sense amplifier must be very sensitive. The amplifier has two cross-connected transistors with each transistor trying to pull the other signal low. The signal that starts off larger will "win", creating a solid 0 or 1 signal. This value is rewritten to memory to restore the value, since reading the value erases the cells. In the photo, a few of the ferroelectric capacitors are visible at the far left. Part of the lower metal layer has come loose, causing the randomly strewn brown rectangles.

The sense amplifiers.
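
To make the "winner takes all" behavior of the cross-connected transistors concrete, here is a deliberately simplified numerical sketch. It is not a circuit simulation: the function name, the `gain` parameter, and the linear pull-down model are all assumptions chosen to show how a tiny initial difference grows until one side reaches zero.

```python
def sense(v_a, v_b, gain=0.1, max_steps=1000):
    """Return 1 if signal A wins the race, 0 if signal B wins (hypothetical model).

    Each side is pulled low in proportion to the level of the other side,
    so the initial difference grows on every step while the sum shrinks.
    """
    for _ in range(max_steps):
        pull_on_a = gain * v_b          # transistor driven by B pulls A low
        pull_on_b = gain * v_a          # transistor driven by A pulls B low
        v_a = max(0.0, v_a - pull_on_a)
        v_b = max(0.0, v_b - pull_on_b)
        if v_a == 0.0 or v_b == 0.0:    # one side has been pulled fully low
            break
    return 1 if v_a > v_b else 0


print(sense(0.110, 0.100))   # A starts slightly larger
print(sense(0.100, 0.110))   # B starts slightly larger
```

In this model the difference between the two signals is multiplied by `1 + gain` on each step, which is why even a very small initial imbalance is enough to decide the result.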

The photo below shows eight of the plate drivers, below the memory cells. This circuit generates the pulse on the selected plate line. The plate lines are the thick white lines at the top of the image; they are platinum so they appear brighter in the photo than the other metal lines. Most of the capacitors are still present on the plate lines, but some capacitors have come loose and are scattered on the rest of the circuitry. Each plate line is connected to a metal line (brown), which connects the plate line to the drive transistors in the middle and bottom of the image. These transistors pull the appropriate plate line high or low as necessary. The columns of small black circles are connections between the metal line and the silicon of the transistor underneath.

The plate driver circuitry.

Finally, here's the part number and Ramtron logo on the die.

Closeup of the logo "FM24C64A Ramtron" on the die.

Conclusions

Ferroelectric RAM is an example of a technology with many advantages that never achieved the hoped-for success. Many companies worked on FRAM from the 1950s to the 1970s but gave up on it. Ramtron tried again and produced products but they were not profitable. Ramtron had hoped that the density and cost of FRAM would be competitive with DRAM, but unfortunately that didn't pan out. Ramtron was acquired by Cypress Semiconductor in 2012 and then Cypress was acquired by Infineon in 2019. Infineon still sells FRAM, but it is a niche product, used, for instance, in satellites that need radiation hardness. Currently, FRAM costs roughly $3/megabit, about 200 times (more than two orders of magnitude) more expensive than flash memory, which is about $15/gigabit. Nonetheless, FRAM is a fascinating technology and the structures inside the chip are very interesting.

For more, follow me on Mastodon as @kenshirriff@oldbytes.space or RSS. (I've given up on Twitter.) Thanks to CuriousMarc for providing the chip, which was used in a digital readout (DRO) for his CNC machine.

Notes and references

  1. The photo below shows the chip's 8-pin package.

    The chip is packaged in an 8-pin DIP. "RIC" stands for Ramtron International Corporation.

  2. The block diagram shows the structure of the chip, which is significantly different from a standard DRAM chip. The chip has logic to handle the I2C protocol, a serial protocol that uses a clock and a data line. (Note that the address lines A0-A2 are the address of the chip, not the memory address.) The WP (Write Protect) pin protects one quarter of the chip from being modified. The chip allows an arbitrary number of bytes to be read or written sequentially in one operation. This is implemented by the counter and address latch.

    Block diagram of the FRAM chip. From the datasheet.
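
As a sketch of how the chip address and the memory address might appear on the I2C bus, the Python below builds the bytes for a sequential write. It assumes the common 24-series convention (a 1010 device-type prefix, then A2-A0, then the read/write bit, followed by a two-byte memory address) and assumes that the internal counter wraps at the end of memory. Both assumptions, and all function names, are mine; the datasheet is the authority.

```python
DEVICE_TYPE = 0b1010      # assumed 24-series device-type prefix
MEMORY_BYTES = 8192       # 64 kilobits, accessed as bytes


def control_byte(chip_address, read):
    """First byte after START: device type, chip address A2-A0, R/W bit."""
    if not 0 <= chip_address <= 7:
        raise ValueError("chip address is three bits (A2-A0)")
    return (DEVICE_TYPE << 4) | (chip_address << 1) | (1 if read else 0)


def write_frame(chip_address, mem_address, data):
    """Bytes sent for a sequential write starting at mem_address."""
    if not 0 <= mem_address < MEMORY_BYTES:
        raise ValueError("memory address out of range")
    header = bytes([control_byte(chip_address, read=False),
                    mem_address >> 8,       # high address byte
                    mem_address & 0xFF])    # low address byte
    return header + bytes(data)


def next_address(mem_address):
    """Model of the address counter advancing after each byte (assumed wrap)."""
    return (mem_address + 1) % MEMORY_BYTES


print(write_frame(0, 0x0010, b"\x01\x02\x03").hex())
```

The point of the sketch is the distinction made in the footnote: the three chip-address bits travel in the first byte and select which chip responds, while the memory address is sent separately and is then advanced by the counter for each additional byte.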

  3. An early description of ferroelectric memory is in the October 1953 Proceedings of the IRE. This issue focused on computers and had an article on computer memory systems by J. P. Eckert of ENIAC fame. In 1953, computer memory systems were primitive: mercury delay lines, electrostatic CRTs (Williams tubes), or rotating drums. The article describes experimental memory technologies including ferroelectric memory, magnetic core memory, neon-capacitor memory, phosphor drums, temperature-sensitive pigments, corona discharge, or electrolytic diodes. Within a couple of years, magnetic core memory became successful, dominating storage until semiconductor memory took over in the 1970s, and most of the other technologies were forgotten. 

  4. A 1969 article in Electronics discussed ferroelectric memories. At the time, ferroelectric memories were used for a few specialized applications. However, ferroelectric memories had many issues: slow write speed, high voltages (75 to 150 volts), and expensive logic to decode addresses. The article stated: "These considerations make the future of ferroelectric memories in computers rather bleak." 

  5. Interestingly, the "Ram" in Ramtron comes from the initials of the cofounders: Rohrer, Araujo, and McMillan. Rohrer originally focused on potassium nitrate as the ferroelectric material, as described in his patent. (I find it surprising that potassium nitrate is ferroelectric since it seems like such a simple, non-exotic chemical.) An extensive history of Ramtron is here. A Popular Science article also provides information. 

  6. Like core memory, ferroelectric memory is based on a hysteresis loop. Because of the hysteresis loop, the material has two stable states, storing a 0 or 1. The difference is that core memory has hysteresis of the magnetization with respect to the applied magnetic field, while ferroelectric memory has hysteresis of the polarization with respect to the applied electric field.

  7. The reference cell approach is described in Ramtron patent 6028783A. The idea is to have a row of reference capacitors, but the reference capacitors are sized to generate a current midway between the 0 current and the 1 current. The reference capacitors provide the second input to the sense amplifiers, allowing the 0 and 1 bits to be distinguished. 

  8. Ramtron's 1987 patent describes the approximate structure of the memory. 

  9. The diagram below shows the complex process that Ramtron used to create an FRAM chip. (These steps are from a 2003 patent, so they may differ from the steps for the chip I examined.)

    Ramtron's process flow to create an FRAM die. From Patent 6613586.

    Abbreviations: BPSG is borophosphosilicate glass. UTEOS is undoped tetraethylorthosilicate, a liquid used to deposit silicon dioxide on the surface. RTA is rapid thermal anneal. PTEOS is phosphorus-doped tetraethylorthosilicate, used to create a phosphorus-doped silicon dioxide layer. CMP is chemical mechanical planarization, polishing the die surface to be flat. TEC is the top electrode contact. ILD is interlevel dielectric, the insulating layer between conducting layers. 

  10. See the detailed article Ferroelectric Memories, Science, 1989, by Scott and Araujo (who is the "A" in "Ramtron"). 

  11. Early FRAM memories used an X-Y grid of wires without transistors. Although much simpler, this approach had the problem that current could flow through unwanted capacitors via "sneak" paths, causing noise in the signals and potentially corrupting data. High-density integrated circuits, however, made it practical to associate a transistor with each cell in modern FRAM chips. 





[#] Sat Sep 28 2024 08:24:48 UTC from rss <>

Subject: Reverse-engineering a three-axis attitude indicator from the F-4 fighter plane

We recently received an attitude indicator for the F-4 fighter plane, an instrument that uses a rotating ball to show the aircraft's orientation and direction. In a normal aircraft, the artificial horizon shows the orientation in two axes (pitch and roll), but the F-4 indicator uses a rotating ball to show the orientation in three axes, adding azimuth (yaw).1 It wasn't obvious to me how the ball could rotate in three axes: how could it turn in every direction and still remain attached to the instrument?

The attitude indicator. The "W" forms a stylized aircraft. In this case, it indicates that the aircraft is climbing slightly. Photo from CuriousMarc.

We disassembled the indicator, reverse-engineered its 1960s-era circuitry, fixed some problems,2 and got it spinning. The video clip below shows the indicator rotating around three axes. In this blog post, I discuss the mechanical and electrical construction of this indicator. (The quick explanation is that the ball is really two hollow half-shells attached to the internal mechanism at the "poles"; the shells rotate while the "equator" remains stationary.)

The F-4 aircraft

The indicator was used in the F-4 Phantom II3 so the pilot could keep track of the aircraft's orientation during high-speed maneuvers. The F-4 was a supersonic fighter manufactured from 1958 to 1981. Over 5000 were produced, making it the most-produced American supersonic aircraft ever. It was the main US fighter jet in the Vietnam War, operating from aircraft carriers. The F-4 was still used in the 1990s during the Gulf War, suppressing air defenses in the "Wild Weasel" role. The F-4 was capable of carrying nuclear bombs.4

An F-4G Phantom II Wild Weasel aircraft. From National Archives.

The F-4 was a two-seat aircraft, with the radar intercept officer controlling radar and weapons from a seat behind the pilot. Both cockpits had a panel crammed with instruments, with additional instruments and controls on the sides. As shown below, the pilot's panel had the three-axis attitude indicator in the central position, just below the reddish radar scope, reflecting its importance.5 (The rear cockpit had a simpler two-axis attitude indicator.)

The cockpit of the F-4C Phantom II, with the attitude indicator in the center of the panel. Click this photo (or any other) for a larger version. Photo from National Museum of the USAF.

The attitude indicator mechanism

The ball inside the indicator shows the aircraft's position in three axes. The roll axis indicates the aircraft's angle if it rolls side-to-side along its axis of flight. The pitch axis indicates the aircraft's angle if it pitches up or down. Finally, the azimuth axis indicates the compass direction that the aircraft is heading, changed by the aircraft's turning left or right (yaw). The indicator also has moving needles and status flags, but in this post I'm focusing on the rotating ball.6

The indicator uses three motors to move the ball. The roll motor (below) is attached to the frame of the indicator, while the pitch and azimuth motors are inside the ball. The ball is held in place by the roll gimbal, which is attached to the ball mechanism at the top and bottom pivot points. The roll motor turns the roll gimbal and thus the ball, providing a clockwise/counterclockwise movement. The roll control transformer provides position feedback. Note the numerous wires on the roll gimbal, connected to the mechanism inside the ball.

The attitude indicator with the cover removed.

The diagram below shows the mechanism inside the ball, after removing the hemispherical shells of the ball. When the roll gimbal is rotated, this mechanism rotates with it. The pitch motor causes the entire mechanism to rotate around the pitch axis (horizontal here), which is attached along the "equator". The azimuth motor and control transformer are behind the pitch components, not visible in this photo. The azimuth motor turns the vertical shaft. The two hollow hemispheres of the ball attach to the top and bottom of the shaft. Thus, the azimuth motor rotates the ball shells around the azimuth axis, while the mechanism itself remains stationary.

The components of the ball mechanism.

Why doesn't the wiring get tangled up as the ball rotates? The solution is two sets of slip rings to implement the electrical connections. The photo below shows the first slip ring assembly, which handles rotation around the roll axis. These slip rings connect the stationary part of the instrument to the rotating roll gimbal. The black base and the vertical wires are attached to the instrument, while the striped shaft in the middle rotates with the ball assembly housing. Inside the shaft, wires go from the circular metal contacts to the roll gimbal.

The first set of slip rings. Yes, there is damage on one of the slip ring contacts.

Inside the ball, a second set of slip rings provides the electrical connection between the wiring on the roll gimbal and the ball mechanism. The photo below shows the connections to these slip rings, handling rotation around the pitch axis (horizontal in this photo). (The slip rings themselves are inside and are not visible.) The shaft sticking out of the assembly rotates around the azimuth (yaw) axis. The ball hemisphere is attached to the metal disk. The azimuth axis does not require slip rings since only the ball shells rotate; the electronics remain stationary.

Connections for the second set of slip rings.

The servo loop

In this section, I'll explain how the motors are controlled by servo loops. The attitude indicator is driven by an external gyroscope, receiving electrical signals indicating the roll, pitch, and azimuth positions. As was common in 1960s avionics, the signals are transmitted from synchros, which use three wires to indicate an angle. The motors inside the attitude indicator rotate until the indicator's angles for the three axes match the input angles.

Each motor is controlled by a servo loop, shown below. The goal is to rotate the output shaft to an angle that exactly matches the input angle, specified by the three synchro wires. The key is a device called a control transformer, which takes the three-wire input angle and a physical shaft rotation, and generates an error signal indicating the difference between the desired angle and the physical angle. The amplifier drives the motor in the appropriate direction until the error signal drops to zero. To improve the dynamic response of the servo loop, the tachometer signal is used as a negative feedback voltage. This ensures that the motor slows as the system gets closer to the right position, so the motor doesn't overshoot the position and oscillate. (This is sort of like a PID controller.)

This diagram shows the structure of the servo loop, with a feedback loop ensuring that the rotation angle of the output shaft matches the input angle.
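
The effect of the tachometer feedback can be seen in a small numerical experiment. The Python sketch below treats one axis as a motor with inertia driven by an amplified error signal, with and without a velocity term subtracted from the drive. The gains, friction, time step, and function name are invented for illustration; they are not measurements of the indicator, and the real loop works with 400 Hz AC signals rather than plain numbers.

```python
def simulate(target_deg, tach_gain, error_gain=40.0, friction=0.5,
             dt=0.001, seconds=5.0):
    """Step response of a hypothetical single-axis servo loop.

    Returns (final_angle, peak_angle) in degrees.
    """
    angle = 0.0
    velocity = 0.0
    peak = 0.0
    for _ in range(int(seconds / dt)):
        error = target_deg - angle                   # control transformer output
        drive = error_gain * error - tach_gain * velocity  # amplifier with tach feedback
        accel = drive - friction * velocity          # motor with unit inertia
        velocity += accel * dt
        angle += velocity * dt
        peak = max(peak, angle)
    return angle, peak


for gain in (0.0, 14.0):
    final, peak = simulate(90.0, tach_gain=gain)
    print(f"tach_gain={gain}: final={final:.2f} deg, peak={peak:.2f} deg")
```

With no tachometer feedback the simulated shaft swings far past the target and keeps oscillating, while a sufficiently large velocity term brings it to the target without overshoot, which is the behavior the paragraph above describes.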

In more detail, the external gyroscope unit contains synchro transmitters, small devices that convert the angular position of a shaft into AC signals on three wires. The photo below shows a typical synchro, with the input shaft on the top and five wires at the bottom: two for power and three for the output.

A synchro transmitter.

Internally, the synchro has a rotating winding called the rotor that is driven with 400 Hz AC. Three fixed stator windings provide the three AC output signals. As the shaft rotates, the phase and voltage of the output signals change, indicating the angle. (Synchros may seem bizarre, but they were extensively used in the 1950s and 1960s to transmit angular information in ships and aircraft.)

The schematic symbol for a synchro transmitter or receiver.

The attitude indicator uses control transformers to process these input signals. A control transformer is similar to a synchro in appearance and construction, but it is wired differently. The three stator windings receive the inputs and the rotor winding provides the error output. If the rotor angle of the synchro transmitter and control transformer are the same, the signals cancel out and there is no error output. But as the difference between the two shaft angles increases, the rotor winding produces an error signal. The phase of the error signal indicates the direction of error.
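
The relationship between the synchro transmitter and the control transformer can be sketched with idealized math. In the Python below, each stator signal is represented by a signed amplitude of the 400 Hz carrier (a negative value standing for a 180° phase flip), and the stator windings are assumed to be spaced 120° apart. The function names, the unit amplitudes, and the sign conventions are assumptions for illustration; real synchros differ in scaling and winding conventions.

```python
import math

# Assumed angular positions of the three stator windings.
STATOR_ANGLES = (0.0, 2 * math.pi / 3, 4 * math.pi / 3)


def synchro_transmitter(shaft_deg):
    """Signed carrier amplitudes on the three stator wires (idealized)."""
    theta = math.radians(shaft_deg)
    return tuple(math.cos(theta - a) for a in STATOR_ANGLES)


def control_transformer(stator_signals, shaft_deg):
    """Signed error amplitude on the rotor winding (idealized).

    Works out to sin(input angle - shaft angle): zero when the angles match,
    with the sign (carrier phase) giving the direction of the error.
    """
    phi = math.radians(shaft_deg)
    return -(2 / 3) * sum(v * math.sin(phi - a)
                          for v, a in zip(stator_signals, STATOR_ANGLES))


signals = synchro_transmitter(40.0)
for shaft in (20.0, 40.0, 60.0):
    print(shaft, round(control_transformer(signals, shaft), 4))
```

In this model the error is zero when the two shaft angles agree and changes sign on either side of that point, matching the description of the error signal's phase indicating the direction of the error.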

The next component is the motor/tachometer, a special motor that was often used in avionics servo loops. This motor is more complicated than a regular electric motor. The motor is powered by 115 volts AC, 400-Hertz, but this isn't sufficient to get the motor spinning. The motor also has two low-voltage AC control windings. Energizing a control winding will cause the motor to spin in one direction or the other.

The motor/tachometer unit also contains a tachometer to measure its rotational speed, for use in a feedback loop. The tachometer is driven by another 115-volt AC winding and generates a low-voltage AC signal proportional to the rotational speed of the motor.

A motor/tachometer similar (but not identical) to the one in the attitude indicator.

The photo above shows a motor/tachometer with the rotor removed. The unit has many wires because of its multiple windings. The rotor has two drums. The drum on the left, with the spiral stripes, is for the motor. This drum is a "squirrel-cage rotor", which spins due to induced currents. (There are no electrical connections to the rotor; the drums interact with the windings through magnetic fields.) The drum on the right is the tachometer rotor; it induces a signal in the output winding proportional to the speed due to eddy currents. The tachometer signal is at 400 Hz like the driving signal, either in phase or 180° out of phase, depending on the direction of rotation. For more information on how a motor/generator works, see my teardown.

The amplifier

The motors are powered by an amplifier assembly that contains three separate error amplifiers, one for each axis. I had to reverse engineer the amplifier assembly in order to get the indicator working. The assembly mounts on the back of the attitude indicator and connects to one of the indicator's round connectors. Note the cutout in the lower left of the amplifier assembly to provide access to the second connector on the back of the indicator. The aircraft connects to the indicator through the second connector and the indicator passes the input signals to the amplifier through the connector shown above.

The amplifier assembly.

The amplifier assembly contains three amplifier boards (for roll, pitch, and azimuth), a DC power supply board, an AC transformer, and a trim potentiometer.7 The photo below shows the amplifier assembly mounted on the back of the instrument. At the left, the AC transformer produces the motor control voltage and powers the power supply board, mounted vertically on the right. The assembly has three identical amplifier boards; the middle board has been unmounted to show the components. The amplifier connects to the instrument through a round connector below the transformer. The round connector at the upper left is on the instrument case (not the amplifier) and provides the connection between the aircraft and the instrument.8

The amplifier assembly mounted on the back of the instrument. We are feeding test signals to the connector in the upper left.

The photo below shows one of the three amplifier boards. The construction is unusual, with some components stacked on top of other components to save space. Some of the component leads are long and protected with clear plastic sleeves. The board is connected to the rest of the amplifier assembly through a bundle of point-to-point wires, visible on the left. The round pulse transformer in the middle has five colorful wires coming out of it. At the right are the two transistors that drive the motor's control windings, with two capacitors between them. The transistors are mounted on a heat sink that is screwed down to the case of the amplifier assembly for cooling. The board is covered with a conformal coating to protect it from moisture or contaminants.

One of the three amplifier boards.

The function of each amplifier board is to generate the two control signals so the motor rotates in the appropriate direction based on the error signal fed into the amplifier. The amplifier also uses the tachometer output from the motor unit to slow the motor as the error signal decreases, preventing overshoot. The inputs to the amplifier are 400 hertz AC signals, with the phase indicating positive or negative error. The outputs drive the two control windings of the motor, determining which direction the motor rotates.
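
To show why the tachometer feedback matters, here is a toy simulation of one axis of the servo loop. It is a sketch under stated assumptions: the 400 Hz carrier is ignored (signals are treated as signed values), and every gain, the friction term, and the time step are hypothetical numbers chosen for illustration, not values from the amplifier.

```python
def simulate_servo(target, tach_gain, error_gain=50.0, motor_gain=20.0,
                   friction=1.0, dt=0.001, steps=4000):
    """Step a toy position servo and return (final_angle, peak_angle)."""
    angle = 0.0
    velocity = 0.0
    peak = 0.0
    for _ in range(steps):
        error = target - angle  # stands in for the control transformer output
        # The amplifier combines the error with the tachometer signal,
        # which opposes the drive in proportion to the motor speed.
        drive = error_gain * error - tach_gain * velocity
        velocity += (motor_gain * drive - friction * velocity) * dt
        angle += velocity * dt
        peak = max(peak, angle)
    return angle, peak

# Same 10-degree step, without and with tachometer feedback.
_, peak_without_tach = simulate_servo(10.0, tach_gain=0.0)
final_with_tach, peak_with_tach = simulate_servo(10.0, tach_gain=4.0)
```

With these made-up gains, the loop without tachometer feedback swings far past the target and rings, while the loop with feedback slows down as the error shrinks and settles on the target without overshooting.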

The schematic for the amplifier board is below. The two transistors on the left amplify the error and tachometer signals, driving the pulse transformer. The outputs of the pulse transformer will have opposite phase, driving the output transistors for opposite halves of the 400 Hz cycle. One of the transistors will be in the right phase to turn on and pull the motor control AC to ground, while the other transistor will be in the wrong phase. Thus, the appropriate control winding will be activated (for half the cycle), causing the motor to spin in the desired direction.

Schematic of one of the three amplifier boards. (Click for a larger version.)

It turns out that there are two versions of the attitude indicator that use incompatible amplifiers. I think that the motors for the newer indicators have a single control winding rather than two. Fortunately, the connectors are keyed differently so you can't attach the wrong amplifier. The second amplifier (below) looks slightly more modern (1980s) with a double-sided circuit board and more components in place of the pulse transformer.

The second type of amplifier board.

The pitch trim circuit

The attitude indicator has a pitch trim knob in the lower right, although the knob was missing from ours. The pitch trim adjustment turns out to be rather complicated. In level flight, an aircraft may have its nose angled up or down slightly to achieve the desired angle of attack. The pilot wants the attitude indicator to show level flight, even though the aircraft is slightly angled, so the indicator can be adjusted with the pitch trim knob. However, the problem is that a fighter plane may, for instance, do a vertical 90° climb. In this case, the attitude indicator should show the actual attitude and ignore the pitch trim adjustment.

I found a 1957 patent that explained how this is implemented. The solution is to "fade out" the trim adjustment when the aircraft moves away from horizontal flight. This is implemented with a special multi-zone potentiometer that is controlled by the pitch angle.

The schematic below shows how the pitch trim signal is generated from the special pitch angle potentiometer and the pilot's pitch trim adjustment. Like most signals in the attitude indicator, the pitch trim is a 400 Hz AC signal, with the phase indicating positive or negative. Ignoring the pitch angle for a moment, the drive signal into the transformer will be AC. The split windings of the transformer will generate a positive phase and a negative phase signal. Adjusting the pitch trim potentiometer lets the pilot vary the trim signal from positive to zero to negative, applying the desired correction to the indicator.

The pitch trim circuit. Based on the patent.

Now, look at the complex pitch angle potentiometer. It has alternating resistive and conducting segments, with AC fed into opposite sides. (Note that +AC and -AC refer to the phase, not the voltage.) Because the resistances are equal, the AC signals will cancel out at the top and the bottom, yielding 0 volts on those segments. If the aircraft is roughly horizontal, the potentiometer wiper will pick up the positive-phase AC and feed it into the transformer, providing the desired trim adjustment as described previously. However, if the aircraft is climbing nearly vertically, the wiper will pick up the 0-volt signal, so there will be no pitch trim adjustment. For an angle range in between, the resistance of the potentiometer will cause the pitch trim signal to smoothly fade out. Likewise, if the aircraft is steeply diving, the wiper will pick up the 0 signal at the bottom, removing the pitch trim. And if the aircraft is inverted, the wiper will pick up the negative AC phase, causing the pitch trim adjustment to be applied in the opposite direction.
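
The fade-out behavior can be sketched as a piecewise-linear function of pitch angle. This is only an illustration of the idea: the function name, the 30° half-width of the conducting segments, and the linear fade across the resistive segments are assumptions, not values from the patent or the instrument. The returned factor multiplies the pilot's trim setting, with the sign standing in for the AC phase.

```python
def trim_factor(pitch_deg, conducting_half_width=30.0):
    """Fraction of the pilot's pitch trim applied at a given pitch angle.

    0 = level, 90 = vertical climb, 180 = inverted, -90 (or 270) = vertical dive.
    Returns +1 near level, 0 near vertical, -1 near inverted, fading in between.
    """
    w = conducting_half_width
    if not 0.0 < w < 45.0:
        raise ValueError("conducting_half_width must be between 0 and 45 degrees")
    a = pitch_deg % 360.0
    if a > 180.0:
        a = 360.0 - a           # climbing and diving sides are mirror images
    fade_span = 90.0 - 2.0 * w  # angular width of each resistive segment
    if a <= w:
        return 1.0                           # conducting segment: full positive phase
    if a < 90.0 - w:
        return (90.0 - w - a) / fade_span    # resistive segment: fade from 1 to 0
    if a <= 90.0 + w:
        return 0.0                           # conducting segment at the null
    if a < 180.0 - w:
        return -(a - (90.0 + w)) / fade_span  # resistive segment: fade from 0 to -1
    return -1.0                              # conducting segment: full negative phase

level = trim_factor(0.0)
fading = trim_factor(45.0)
vertical_climb = trim_factor(90.0)
vertical_dive = trim_factor(-90.0)
inverted = trim_factor(180.0)
```

With the assumed 30° segments, the trim is fully applied up to 30° of pitch, is half applied at 45°, and is gone by 60°.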

Conclusions

The attitude indicator is a key instrument in any aircraft, especially important when flying in low visibility. The F-4's attitude indicator goes beyond the artificial horizon indicator in a typical aircraft, adding a third axis to show the aircraft's heading. Supporting a third axis makes the instrument much more complicated, though. Looking inside the indicator reveals how the ball rotates in three axes while still remaining firmly attached.

Modern fighter planes avoid complex electromechanical instruments. Instead, they provide a "glass cockpit" with most data provided digitally on screens. For instance, the F-35's console replaces all the instruments with a wide panoramic touchscreen displaying the desired information in color. Nonetheless, mechanical instruments have a special charm, despite their impracticality.

For more, follow me on Mastodon as @kenshirriff@oldbytes.space or RSS. (I've given up on Twitter.) I worked on this project with CuriousMarc and Eric Schlaepfer, so expect a video at some point. Thanks to John Pumpkinhead and another collector for supplying the indicators and amplifiers.

Notes and references

Specifications9

  1. This three-axis attitude indicator is similar in many ways to the FDAI (Flight Director Attitude Indicator) that was used in the Apollo space flights, although the FDAI has more indicators and needles. It is more complex than the Soyuz Globus, used for navigation (teardown), which rotates in two axes. Maybe someone will loan us an FDAI to examine...
     

  2. Our indicator has been used as a parts source, as it has cut wires inside and is missing the pitch trim knob, several needles, and internal adjustment potentiometers. We had to replace two failed capacitors in the power supply. There is still a short somewhere that we are tracking down; at one point it caused the bond wire inside a transistor to melt(!). 

  3. The aircraft is the "Phantom II" because the original Phantom was a World War II fighter aircraft, the McDonnell FH Phantom. McDonnell Douglas reused the Phantom name for the F-4. (McDonnell became McDonnell Douglas in 1967 after merging with Douglas Aircraft. McDonnell Douglas merged into Boeing in 1997. Many people blame Boeing's current problems on this merger.) 

  4. The F-4 could carry a variety of nuclear bombs such as the B28EX, B61, B43 and B57, referred to as "special weapons". The photo below shows the nuclear store consent switch, which armed a nuclear bomb for release. (Somehow I expected a more elaborate mechanism for nuclear bombs.) The switch labels are in the shadows, but say "REL/ARM", "SAFE", and "REL". The F-4 Weapons Delivery Manual discusses this switch briefly.

    The nuclear store consent switch, to the right of the Weapons System Officer in the rear cockpit. Photo from National Museum of the USAF.

     

  5. The photo below is a closeup of the attitude indicator in the F-4 cockpit. Note the Primary/Standby toggle switch in the upper-left. Curiously, this switch is just screwed onto the console, with exposed wires. Based on other sources, this appears to be the standard mounting. This switch is the "reference system selector switch" that selects the data source for the indicator. In the primary setting, the gyroscopically-stabilized inertial navigation system (INS) provides the information. The INS normally gets azimuth information from the magnetic compass, but can use a directional gyro if the Earth's magnetic field is distorted, such as in polar regions. See the F-4E Flight Manual for details.

    A closeup of the indicator in the cockpit of the F-4 Phantom II. Photo from National Museum of the USAF.

    The standby switch setting uses the bombing computer (the AN/AJB-7 Attitude-Reference Bombing Computer Set) as the information source; it has two independent gyroscopes. If the main attitude indicator fails entirely, the backup is the "emergency attitude reference system", a self-contained gyroscope and indicator below and to the right of the main attitude indicator; see the earlier cockpit photo. 

  6. The diagram below shows the features of the indicator.

    The features of the Attitude Director Indicator (ADI). From F-4E Flight Manual TO 1F-4E-1.

    The pitch steering bar is used for an instrument (ILS) landing. The bank steering bar provides steering information from the navigation system for the desired course. 

  7. The roll, pitch, and azimuth inputs require different resistances, for instance, to handle the pitch trim input. These resistors are on the power supply board rather than an amplifier board. This allows the three amplifier boards to be identical, rather than having slightly different amplifier boards for each axis. 

  8. The attitude indicator assembly has a round mil-spec connector and the case has a pass-through connector. That is, the aircraft wiring plugs into the outside of the case and the indicator internals plug into the inside of the case. The pin numbers on the outside of the case don't match the pin numbers on the internal connector, which is very annoying when reverse-engineering the system. 

  9. In this footnote, I'll link to some of the relevant military specifications.

    The attitude indicator is specified in military spec MIL-I-27619, which covers three similar indicators, called ARU-11/A, ARU-21/A, and ARU-31/A. The three indicators are almost identical except that the ARU-21/A has the horizontal pointer alarm flag and the ARU-31/A has a bank angle command pointer and a bank scale at the bottom of the indicator, along with a bank angle command pointer adjustment knob in the lower left. The ARU-11/A was used in the F-111A. (The ID-1144/AJB-7 indicator is probably the same as the ARU-11/A.) The ARU-21/A was used in the A-7D Corsair. The ARU-31/A was used in the RF-4C Phantom II, the reconnaissance version of the F-4. The photo below shows the cockpit of the RF-4C; note that the attitude indicator in the center of the panel has two knobs.

    Cockpit panel of the RF-4C. Photo from National Museum of the USAF.

    The indicator was part of the AN/ASN-55 Attitude Heading Reference Set, specified in MIL-A-38329. I think that the indicator originally received its information from an MD-1 gyroscope (MIL-G-25597) and an ML-1 flux valve compass, but I haven't tracked down all the revisions and variants.

    Spec MIL-I-23524 describes an indicator that is almost identical to the ARU-21/A but with white flags. This indicator was also used with the AJB-3A Bomb Release Computing Set, part of the A-4 Skyhawk. This indicator was used with the integrated flight information system MIL-S-23535 which contained the flight director computer MIL-S-23367.

    My indicator has no identifying markings, so I can't be sure of its exact model. Moreover, it has missing components, so it is hard to match up the features. Since my indicator has white flags it might be the ID-1329/A.

     





[#] Wed Oct 09 2024 08:33:13 UTC from rss <>

Subject: Wealth distribution in the United States

Forbes recently published the Forbes 400 List for 2024, listing the 400 richest people in the United States. This inspired me to make a histogram to show the distribution of wealth in the United States. It turns out that if you put Elon Musk on the graph, almost the entire US population is crammed into a vertical bar, one pixel wide. Each pixel is $500 million wide, illustrating that $500 million essentially rounds to zero from the perspective of the wealthiest Americans.

Graph showing the wealth distribution in the United States.

The histogram above shows the wealth distribution in red. Note that the visible red line is one pixel wide at the left and disappears everywhere else—this is the important point: essentially the entire US population is in that first bar. The graph is drawn with the scale of 1 pixel = $500 million in the X axis, and 1 pixel = 1 million people in the Y axis. Away from the origin, the red line is invisible—a tiny fraction of a pixel tall since so few people have more than 500 million dollars.

Since the median US household wealth is about $190,000, half the population would be crammed into a microscopic red line 1/2500 of a pixel wide using the scale above. (The line would be much narrower than the wavelength of light so it would be literally invisible). The very rich are so rich that you could take someone with a thousand times the median amount of money, and they would still have almost nothing compared to the richest Americans. If you increased their money by a factor of a thousand yet again, you'd be at Bezos' level, but still well short of Elon Musk.
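
The scale arithmetic in this paragraph can be checked in a few lines. The constants are the approximate figures quoted above; the variable names are just for illustration.

```python
DOLLARS_PER_PIXEL = 500_000_000     # horizontal scale of the graph
MEDIAN_HOUSEHOLD_WEALTH = 190_000   # approximate median quoted above

# Width of the median household's wealth on the graph, in pixels.
median_in_pixels = MEDIAN_HOUSEHOLD_WEALTH / DOLLARS_PER_PIXEL
pixel_fraction_denominator = 1 / median_in_pixels  # about 2600, i.e. roughly 1/2500

# Multiplying the median by a thousand, then by a thousand again.
thousand_times_median = 1_000 * MEDIAN_HOUSEHOLD_WEALTH  # $190 million
million_times_median = 1_000 * thousand_times_median     # $190 billion
```

Even a thousand times the median ($190 million) still falls inside the first $500 million pixel, which is why nearly everyone lands in the same bar.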

Another way to visualize the extreme distribution of wealth in the US is to imagine everyone in the US standing up while someone counts off millions of dollars, once per second. When your net worth is reached, you sit down. At the first count of $1 million, most people sit down, with 22 million people left standing. As the count continues—$2 million, $3 million, $4 million—more people sit down. After 6 seconds, everyone except the "1%" has taken their seat. As the counting approaches the 17-minute mark, only billionaires are left standing, but there are still days of counting ahead. Bill Gates sits down after a bit over one day, leaving 8 people, but the process is nowhere near the end. After about two days and 20 hours of counting, Elon Musk finally sits down.
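
The counting exercise is easy to reproduce. The helper below is a hypothetical sketch that converts a net worth into counting time at $1 million per second. The last line works backward from the rounded "two days and 20 hours" in the text, so it is an implied figure rather than a number taken from Forbes.

```python
def counting_time(net_worth_dollars, dollars_per_second=1_000_000):
    """Return (days, hours, minutes) of counting before this person sits down."""
    seconds = net_worth_dollars / dollars_per_second
    days, remainder = divmod(seconds, 86_400)
    hours, remainder = divmod(remainder, 3_600)
    return int(days), int(hours), remainder / 60

# The first billionaire threshold: about 16.7 minutes, the "17-minute mark".
billion_days, billion_hours, billion_minutes = counting_time(1_000_000_000)

# Net worth implied by a count of two days and 20 hours.
implied_dollars = (2 * 86_400 + 20 * 3_600) * 1_000_000
```

Two days and 20 hours is 244,800 seconds, which corresponds to roughly $245 billion at this counting rate.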

Sources

The main source of data is the Forbes 400 List for 2024. Forbes claims there are 813 billionaires in the US here. Median wealth data is from the Federal Reserve; note that it is from 2022 and household rather than personal. The current US population estimate is from Worldometer. I estimated wealth above $500 million, extrapolating from 2019 data.

I made a similar graph in 2013; you can see my post here for comparison.

Disclaimers: Wealth data has a lot of sources of error including people vs households, what gets counted, and changing time periods, but I've tried to make this graph as accurate as possible. I'm not making any prescriptive judgements here, just presenting the data. Obviously, if you want to see the details of the curve, a logarithmic scale makes more sense, but I want to show the "true" shape of the curve. I should also mention that wealth and income are very different things; this post looks strictly at wealth.





[#] Sat Nov 23 2024 11:59:15 UTC from rss <>

Subject: Antenna diodes in the Pentium processor

I was studying the silicon die of the Pentium processor and noticed some puzzling structures where signal lines were connected to the silicon substrate for no apparent reason. Two examples are in the photo below, where the metal wiring (orange) connects to small square regions of doped silicon (gray), isolated from the rest of the circuitry. I did some investigation and learned that these structures are "antenna diodes," special diodes that protect the circuitry from damage during manufacturing. In this blog post, I discuss the construction of the Pentium and explain how these antenna diodes work.

Closeup of the Pentium die showing the silicon and bottom metal layer. The arrows indicate connections to two antenna diodes. I removed the top two layers of metal for this photo.

Intel released the Pentium processor in 1993, starting a long-running brand of high-performance processors: the Pentium Pro, Pentium II, and so on. In this post, I'm studying the original Pentium, which has 3.1 million transistors.1 The die photo below shows the Pentium's fingernail-sized silicon die under a microscope. The chip has three layers of metal wiring on top of the silicon so the underlying silicon is almost entirely obscured.

The Pentium die with the main functional blocks labeled. Click this photo (or any other) for a larger version.

Modern processors are built from CMOS circuitry, which uses two types of transistors: NMOS and PMOS. The diagram below shows how an NMOS transistor is constructed. A transistor can be considered a switch between the source and drain, controlled by the gate. The source and drain regions (green) consist of silicon doped with impurities to change its semiconductor properties, forming N+ silicon. The gate consists of a layer of polysilicon (red), separated from the silicon by an absurdly thin insulating oxide layer. Since the oxide layer is just a few hundred atoms thick,2 it is very fragile and easily damaged by excess voltage. (This is why CMOS chips are sensitive to static electricity.) As we will see, the oxide layer can also be damaged by voltage during manufacturing.

Diagram showing the structure of an NMOS transistor.

The Pentium processor is constructed from multiple layers. Starting at the bottom, the Pentium has millions of transistors similar to the diagram above. Polysilicon wiring on top of the silicon not only forms the transistor gates but also provides short-range wiring. Above that, three layers of metal wiring connect the parts of the chip. Roughly speaking, the bottom layer of metal connects to the silicon and polysilicon to construct logic gates from the transistors, while the upper layers of wiring travel longer distances, with one layer for signals traveling horizontally and the other layer for signals traveling vertically. Tiny tungsten plugs called vias provide connections between the different layers of wiring. A key challenge of chip design is routing, directing signals through the multiple layers of wiring while packing the circuitry as densely as possible.

The photo below shows a small region of the Pentium die with the three metal layers visible. The golden vertical lines are the top metal layer, formed from aluminum and copper. Underneath, you can see the horizontal wiring of the middle metal layer. The more complex wiring of the bottom metal layer can be seen, along with the silicon and polysilicon that form transistors. The small black dots are the tungsten vias that connect metal layers, while the larger dark circles are contacts with the underlying silicon or polysilicon. Near the bottom of the photo, the vertical gray bands are polysilicon lines, forming transistor gates. Although the chip appears flat, it has a three-dimensional structure with multiple layers of metal separated by insulating layers of silicon dioxide. This three-dimensional structure will be important in the discussion below. (The metal wiring is much denser over most of the chip; this region is one of the rare spots where all the layers are visible.)

Closeup of the Pentium die showing the metal layers. The L-shaped hook towards the lower left is a connection to an antenna diode. This photo shows a tiny part of the floating point unit. To show all the layers in focus, I combined multiple images with focus stacking.

The manufacturing process for an integrated circuit is extraordinarily complicated but I'll skip over most of the details and focus on how each metal layer is constructed, layer by layer. First, a uniform metal layer is constructed over the silicon wafer. Next, the desired pattern is produced on the surface using a process called photolithography: a light-sensitive chemical called "resist" is applied to the wafer and exposed to light through a patterned mask. The light hardens the resist, creating a protective coating with the pattern of the desired wiring. Finally, the unprotected metal is etched away, leaving the wiring.

In the early days of integrated circuits, the metal was removed with liquid acids, a process called wet etching. Inconveniently, wet etching tended to eat away metal underneath the edges of the mask, which became a problem as integrated circuits became denser and the wires needed to be thinner. The solution was dry etch, using a plasma to remove the metal. By applying a large voltage to plates above and below the chip, a gas such as HCl is ionized into a highly reactive plasma. This plasma attacks the surface (unless it is protected by the resist), removing the unwanted metal. The advantage of dry etching is that it can act vertically (anisotropically), providing more control over the line width.

Although plasma etching improved the etching process, it caused another problem: plasma-induced oxide damage, also called the "antenna effect."3 The problem is that long metal wires on the chip could pick up an electrical charge from the plasma, producing a large voltage. As described earlier, the thin oxide layer under a transistor's gate is sensitive to voltage damage. The voltage induced by the plasma can destroy the transistor by blowing a hole through the gate oxide or it can degrade the transistor's performance by embedding charges inside the oxide layer.4

Several factors affect the risk of damage from the antenna effect. First, only the transistor's gate is sensitive to the induced voltage, due to the oxide layer. If the wire is also connected to a transistor's source or drain, the wire is "safe" since the source and drain provide connections to the chip's substrate, allowing the charge to dissipate harmlessly. Note that when the chip is completed, every transistor gate is connected to another transistor's source or drain (which provides the signal to the gate), so there is no risk of damage. Thus, the problem can only occur during manufacturing, with a metal line that is connected to a gate on one end but isn't connected on the other end. Moreover, the highest layer of metal is "safe" since everything is connected at that point. Another factor is that the induced voltage is proportional to the length of the metal wire, so short wires don't pose a risk. Finally, only the metal layer currently being etched poses a risk; since the lower layers are insulated by the thick oxide between layers, they won't pick up charge.

These factors motivate several ways to prevent antenna problems.5 First, a long wire can be broken into shorter segments, connected by jumpers on a higher layer. Second, moving long wires to the top metal layer eliminates problems.6 Third, diodes can be added to drain the charge from the wire; these are called "antenna diodes". When the chip is in use, the antenna diodes are reverse-biased so they have no electrical effect. But during manufacturing, the antenna diodes let charge flow to the substrate before it causes problems.

The third solution, the antenna diodes, explains the mysterious connections that I saw in the Pentium. In the diagram below, these diodes are visible on the die as square regions of doped silicon. The larger regions of doped silicon form PMOS transistors (upper) and NMOS transistors (lower). The polysilicon lines are faintly visible; they form transistor gates where they cross the doped silicon. (For this photo, I removed all the metal wiring.)

Closeup of the Pentium die showing transistors. The metal and polysilicon layers have been removed to show the silicon.

Confusingly, the antenna diodes look almost identical to "well taps", connections from the substrate to the chip's positive voltage supply, but have a completely different purpose. In the Pentium, the PMOS transistors are constructed in "wells" of N-type silicon. These wells must be raised to the chip's positive voltage, so there are numerous well tap connections from the positive supply to the wells. The well taps consist of squares of N+ doped silicon in the N-type silicon well, providing an electrical connection. On the other hand, the antenna diodes also consist of N+ doped silicon, but embedded in P-type silicon. This forms a P-N junction that creates the diode.

In the Pentium, antenna diodes are used for only a small fraction of the wiring. The diodes require extra area on the die, so they are used only when necessary. Most of the antenna problems on the Pentium were apparently resolved through routing. Although the antenna diodes are relatively rare, they are still frequent enough that they caught my attention.

Antenna effects are still an issue in modern integrated circuits. Integrated circuit fabricators provide rules on the maximum allowable size of antenna wires for a particular manufacturing process.7 Software checks the design to ensure that the antenna rules are not violated, modifying the routing and inserting diodes as necessary. Violating the antenna rules can result in damaged chips and a very low yield, so it's more than just a theoretical issue.
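
As a rough illustration of what such a check does, here is a toy version based only on the risk factors described in this post. It is not how any real tool or PDK works: the `Net` fields, the 500 µm limit, and the fix strings are hypothetical, and real antenna rules are usually expressed differently (for example, as area ratios).

```python
from dataclasses import dataclass

MAX_UNPROTECTED_LENGTH_UM = 500.0  # hypothetical limit, not a real process rule

@dataclass
class Net:
    name: str
    wire_length_um: float     # metal on the layer currently being etched
    drives_gate: bool         # connected to at least one transistor gate
    has_substrate_path: bool  # already tied to a source/drain or an antenna diode
    on_top_metal: bool        # top layer: everything is connected by then

def violates_antenna_rule(net, limit_um=MAX_UNPROTECTED_LENGTH_UM):
    """Flag a long wire that reaches a gate with no path to the substrate."""
    if not net.drives_gate or net.has_substrate_path or net.on_top_metal:
        return False
    return net.wire_length_um > limit_um

def suggest_fix(net, can_reroute):
    """Prefer rerouting; fall back to a diode, which costs die area."""
    if not violates_antenna_rule(net):
        return "ok"
    if can_reroute:
        return "reroute with jumpers or top metal"
    return "add antenna diode"

nets = [
    Net("short_local", 40.0, True, False, False),
    Net("long_floating", 2000.0, True, False, False),
    Net("long_driven", 2000.0, True, True, False),
    Net("long_top", 2000.0, True, False, True),
]
flagged = [net.name for net in nets if violates_antenna_rule(net)]
```

Only the long wire that reaches a gate with no diffusion connection is flagged; the same length of wire is harmless once it has a path to the substrate or sits on the top metal layer.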

Thanks to /r/chipdesign and Discord for discussion. If you're interested in the Pentium, I've written about standard cells in the Pentium, and the Pentium as a Navajo rug. Follow me on Mastodon (@kenshirriff@oldbytes.space) or Bluesky (@righto.com) or RSS for updates.

Notes and references

  1. In this post, I'm looking at the Pentium model 80501 (codenamed P5). This model was soon replaced with a faster, lower-power version called the 80502 (P54C). Both are considered original Pentiums. 

  2. "IC manufacturing drives CPU performance" states that gate oxide thickness was 100 to 300 angstroms in 1993. 

  3. The wires act as antennas only metaphorically: they collect charge rather than picking up radio waves.

    Plasma-induced oxide damage gave rise to research and conferences in the 1990s to address this problem. The International Symposium on Plasma- and Process-Induced Damage started in 1996 and continued until 2003. Numerous researchers from semiconductor companies and academia studied the causes and effects of plasma damage. 

  4. The damage is caused by "Fowler-Nordheim tunneling", where electrons tunnel through the oxide and cause damage. Flash memory uses this tunneling to erase the memory; the cumulative damage is why flash memory can only be written a limited number of times. 

  5. Some relevant papers: "Magnetron etching of polysilicon: Electrical damage" (1991), "Thin-oxide damage from gate charging during plasma processing" (1992), "Antenna protection strategy for ultra-thin gate MOSFETs" (1998), "Fixing antenna problem by dynamic diode dropping and jumper insertion" (2000). The Pentium uses the "dynamic diode dropping" approach, adding antenna diodes only as needed, rather than putting them in every circuit. I noticed that the Pentium uses extension wires to put the diode in a more distant site if there is no room for the diode under the existing wiring. As an aside, the third paper uses the curious length unit of kµm; by calling 1000 µm a kµm, you can think in micrometers, even though this unit is normally called a mm. 

  6. Sources say that routing signals on the top metal prevents antenna violations. However, I see several antenna diodes in the Pentium that are connected directly from the bottom metal (M1) through M2 to long lines on M3. These diodes seem redundant since the source/drain connections are in place by that time. So there are still a few mysteries... 

  7. Foundries have antenna rules provided as part of the Process Design Kit (PDK). Here are the rules for MOSIS and SkyWater. I've focused on antenna effects from the metal wiring, but polysilicon and vias can also cause antenna damage. Thus, there are rules for these layers too. Polysilicon wiring is less likely to cause antenna problems, though, as it is usually restricted to short distances due to its higher resistance. 





[#] Sat Dec 28 2024 10:55:48 UTC from rss <>

Subject: Intel's $475 million error: the silicon behind the Pentium division bug

In 1993, Intel released the high-performance Pentium processor, the start of the long-running Pentium line. The Pentium had many improvements over the previous processor, the Intel 486, including a faster floating-point division algorithm. A year later, Professor Nicely, a number theory professor, was researching reciprocals of twin prime numbers when he noticed a problem: his Pentium sometimes generated the wrong result when performing floating-point division. Intel considered this "an extremely minor technical problem", but much to Intel's surprise, the bug became a large media story. After weeks of criticism, mockery, and bad publicity, Intel agreed to replace everyone's faulty Pentium chips, costing the company $475 million.

In this article, I discuss the Pentium's division algorithm, show exactly where the bug is on the Pentium chip, take a close look at the circuitry, and explain what went wrong. In brief, the division algorithm uses a lookup table. In 1994, Intel stated that the cause of the bug was that five entries were omitted from the table due to an error in a script. However, my analysis shows that 16 entries were omitted due to a mathematical mistake in the definition of the lookup table. Five of the missing entries trigger the bug (also called the FDIV bug after the floating-point division instruction "FDIV"), while 11 of the missing entries have no effect.

This die photo of the Pentium shows the location of the FDIV bug. Click this image (or any other) for a larger version.


Although Professor Nicely brought attention to the FDIV bug, he wasn't the first to find it. In May 1994, Intel's internal testing of the Pentium revealed that very rarely, floating-point division was slightly inaccurate.1 Since only one in 9 billion values caused the problem, Intel's view was that the problem was trivial: "This doesn't even qualify as an errata." Nonetheless, Intel quietly revised the Pentium circuitry to fix the problem.

A few months later, in October, Nicely noticed erroneous results in his prime number computations.2 He soon determined that 1/824633702441 was wrong on three different Pentium computers, but his older computers gave the right answer. He called Intel tech support but was brushed off, so Nicely emailed a dozen computer magazines and individuals about the bug. One of the recipients was Andrew Schulman, author of "Undocumented DOS". He forwarded the email to Richard Smith, cofounder of a DOS software tools company. Smith posted the email on a Compuserve forum, a 1990s version of social media.

A reporter for the journal Electronic Engineering Times spotted the Compuserve post and wrote about the Pentium bug in the November 7 issue: Intel fixes a Pentium FPU glitch. In the article, Intel explained that the bug was in a component of the chip called a PLA (Programmable Logic Array) that acted as a lookup table for the division operation. Intel had fixed the bug in the latest Pentiums and would replace faulty processors for concerned customers.3

The problem might have quietly ended here, except that Intel decided to restrict which customers could get a replacement. If a customer couldn't convince an Intel engineer that they needed the accuracy, they couldn't get a fixed Pentium. Users were irate to be stuck with faulty chips so they took their complaints to online groups such as comp.sys.intel. The controversy spilled over into the offline world on November 22 when CNN reported on the bug. Public awareness of the Pentium bug took off as newspapers wrote about the bug and Intel became a punchline on talk shows.4

The situation became intolerable for Intel on December 12 when IBM announced that it was stopping shipments of Pentium computers.5 On December 19, less than two months after Nicely first reported the bug, Intel gave in and announced that it would replace the flawed chips for all customers.6 This recall cost Intel $475 million (over a billion dollars in current dollars).

Meanwhile, engineers and mathematicians were analyzing the bug, including Tim Coe, an engineer who had designed floating-point units.7 Remarkably, by studying the Pentium's bad divisions, Coe reverse-engineered the Pentium's division algorithm and determined why it went wrong. Coe and others wrote papers describing the mathematics behind the Pentium bug.8 But until now, nobody has shown how the bug is implemented in the physical chip itself.

A quick explanation of floating point numbers

At this point, I'll review a few important things about floating point numbers. A binary number can have a fractional part, similar to a decimal number. For instance, the binary number 11.1001 has four digits after the binary point. (The binary point "." is similar to the decimal point, but for a binary number.) The first digit after the binary point represents 1/2, the second represents 1/4, and so forth. Thus, 11.1001 corresponds to 3 + 1/2 + 1/16 = 3.5625. A "fixed point" number such as this can express a fractional value, but its range is limited.
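
As a quick check of the arithmetic above, here is a minimal Python sketch that converts a binary fixed-point string to its exact value. The function name binary_fixed_to_fraction is made up for this illustration and is not part of any library.

```python
from fractions import Fraction

def binary_fixed_to_fraction(text: str) -> Fraction:
    """Convert an unsigned binary string such as '11.1001' to an exact value."""
    whole, _, frac = text.partition(".")
    value = Fraction(int(whole, 2) if whole else 0)
    # The first bit after the binary point is worth 1/2, the second 1/4, and so on.
    for position, bit in enumerate(frac, start=1):
        if bit == "1":
            value += Fraction(1, 2 ** position)
    return value

print(float(binary_fixed_to_fraction("11.1001")))  # 3.5625
```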

Floating point numbers, on the other hand, include very large numbers such as 6.02×10²³ and very small numbers such as 1.055×10⁻³⁴. In decimal, 6.02×10²³ has a significand (or mantissa) of 6.02, multiplied by a power of 10 with an exponent of 23. In binary, a floating point number is represented similarly, with a significand and exponent, except the significand is multiplied by a power of 2 rather than 10.
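
To make the binary significand and exponent concrete, the following sketch splits a Python float into the two parts using the standard math.frexp function. The helper name split_float is my own, and the rescaling to a significand between 1 and 2 is just one common convention.

```python
import math

def split_float(x):
    """Return (significand, exponent) with x == significand * 2**exponent.

    For nonzero x the significand satisfies 1 <= |significand| < 2.
    """
    mantissa, exponent = math.frexp(x)   # frexp gives 0.5 <= |mantissa| < 1
    return mantissa * 2, exponent - 1

significand, exponent = split_float(6.02e23)
print(significand, exponent)
print(math.ldexp(significand, exponent) == 6.02e23)  # the split is exact
```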

Computers have used floating point since the early days of computing, especially for scientific computing. For many years, different computers used incompatible formats for floating point numbers. Eventually, a standard arose when Intel developed the 8087 floating point coprocessor chip for use with the 8086/8088 processor. The characteristics of this chip became a standard (IEEE 754) in 1985.9 Subsequently, most computers, including the Pentium, implemented floating point numbers according to this standard. The result of a basic arithmetic operation is supposed to be accurate up to the last bit of the significand. Unfortunately, division on the Pentium was occasionally much, much worse.

How SRT division works

How does a computer perform division? The straightforward way is similar to grade-school long division, except in binary. That approach was used in the Intel 486 and earlier processors, but the process is slow, taking one clock cycle for each bit of the quotient. The Pentium uses a different approach called SRT, performing division in base four. Thus, SRT generates two bits of the quotient per step, rather than one, so division is twice as fast. I'll explain SRT in a hand-waving manner with a base-10 example; rigorous explanations are available elsewhere.10

The diagram below shows base-10 long division, with the important parts named. The dividend is divided by the divisor, yielding the quotient. In each step of the long division algorithm, you generate one more digit of the quotient. Then you multiply the divisor (1535) by the quotient digit (2) and subtract this from the dividend, leaving a partial remainder. You multiply the partial remainder by 10 and then repeat the process, generating a quotient digit and partial remainder at each step. The diagram below stops after two quotient digits, but you can keep going to get as much accuracy as desired.

Base-10 division, naming the important parts.


Note that division is more difficult than multiplication since there is no easy way to determine each quotient digit. You have to estimate a quotient digit, multiply it by the divisor, and then check if the quotient digit is correct. For example, you have to check carefully to see if 1535 goes into 4578 two times or three times.

The SRT algorithm makes it easier to select the quotient digit through an unusual approach: it allows negative digits in the quotient. With this change, the quotient digit does not need to be exact. If you pick a quotient digit that is a bit too large, you can use a negative number for the next digit: this will counteract the too-large digit since the next divisor will be added rather than subtracted.

The example below shows how this works. Suppose you picked 3 instead of 2 as the first quotient digit. Since 3 is too big, the partial remainder is negative (-261). In normal division, you'd need to try again with a different quotient digit. But with SRT, you keep going, using a negative digit (-1) for the quotient digit in the next step. At the end, the quotient with positive and negative digits can be converted to the standard form: 3×10 − 1 = 29, the same quotient as before.

Base-10 division, using a negative quotient digit. The result is the same as the previous example.
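
The idea can be replayed in a few lines of Python. This is only a sketch: the dividend 45789 is my assumption (the text gives 1535, 4578, and the partial remainder -261, which this dividend reproduces), and the function names are invented for the example.

```python
def signed_digits_to_int(digits, base=10):
    """Convert quotient digits (possibly negative) to an ordinary integer."""
    value = 0
    for digit in digits:
        value = value * base + digit
    return value

def replay_long_division(dividend, divisor, quotient_digits, base=10):
    """Apply the given quotient digits in order; return the partial remainders."""
    remaining = len(quotient_digits) - 1       # dividend digits still to bring down
    partial = dividend // base ** remaining    # leading part of the dividend
    partials = []
    for q in quotient_digits:
        partial -= q * divisor                 # a negative q adds the divisor back
        if remaining > 0:
            remaining -= 1
            next_digit = (dividend // base ** remaining) % base
            partial = partial * base + next_digit   # bring down the next digit
        partials.append(partial)
    return partials

print(replay_long_division(45789, 1535, [2, 9]))    # [15089, 1274]
print(replay_long_division(45789, 1535, [3, -1]))   # [-261, 1274]
print(signed_digits_to_int([3, -1]))                # 29
```

Both digit sequences end with the same remainder, which is why the too-large first digit does no harm.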


One nice thing about the SRT algorithm is that since the quotient digit only needs to be close, a lookup table can be used to select the quotient digit. Specifically, the partial remainder and divisor can be truncated to a few digits, making the lookup table a practical size. In this example, you could truncate 1535 and 4578 to 15 and 45, the table says that 15 goes into 45 three times, and you can use 3 as your quotient digit.

Instead of base 10, the Pentium uses the SRT algorithm in base 4: groups of two bits. As a result, division on the Pentium is twice as fast as standard binary division. With base-4 SRT, each quotient digit can be -2, -1, 0, 1, or 2. Multiplying by any of these values is very easy in hardware since multiplying by 2 can be done by a bit shift. Base-4 SRT does not require quotient digits of -3 or 3; this is convenient since multiplying by 3 is somewhat difficult. To summarize, base-4 SRT is twice as fast as regular binary division, but it requires more hardware: a lookup table, circuitry to add or subtract multiples of 1 or 2, and circuitry to convert the quotient to the standard form.
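
Here is a minimal sketch of base-4 SRT division using exact fractions. It is not the Pentium's implementation: the quotient digit is chosen by rounding p/d exactly instead of using a truncated lookup table, and the name srt_radix4_divide and the step count are assumptions for illustration.

```python
from fractions import Fraction

def srt_radix4_divide(dividend, divisor, steps=12):
    """Return (digits, quotient) where each digit is in {-2, -1, 0, 1, 2}."""
    p = Fraction(dividend)          # partial remainder
    d = Fraction(divisor)
    assert d > 0 and abs(p) <= Fraction(8, 3) * d
    digits = []
    for _ in range(steps):
        # Pick the nearest digit, clamped to the allowed digit set.
        q = max(-2, min(2, round(p / d)))
        digits.append(q)
        # Subtract q*d (an add when q is negative), then shift left two bits.
        p = 4 * (p - q * d)
    # Digit i has weight 4**-i; negative digits simply subtract.
    quotient = sum(Fraction(q, 4 ** i) for i, q in enumerate(digits))
    return digits, quotient

digits, quotient = srt_radix4_divide(Fraction(3, 2), Fraction(5, 4))
print(digits)
print(float(quotient))   # close to 1.2
```

Each loop iteration yields one base-4 digit, that is, two quotient bits per step.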

Structure of the Pentium's lookup table

The purpose of the SRT lookup table is to provide the quotient digit. That is, the table takes the partial remainder p and the divisor d as inputs and provides an appropriate quotient digit. The Pentium's lookup table is the cause of the division bug, as was explained in 1994. The table was missing five entries; if the SRT algorithm accesses one of these missing entries, it generates an incorrect result. In this section, I'll discuss the structure of the lookup table and explain what went wrong.

The Pentium's lookup table contains 2048 entries, as shown below. The table has five regions corresponding to the quotient digits +2, +1, 0, -1, and -2. Moreover, the upper and lower regions of the table are unused (due to the mathematics of SRT). The unused entries were filled with 0, which turns out to be very important. In particular, the five red entries need to contain +2 but were erroneously filled with 0.

The 2048-entry lookup table used in the Pentium for division. The divisor is along the X-axis, from 1 to 2. The partial remainder is along the Y-axis, from -8 to 8. Click for a larger version.


When the SRT algorithm uses the table, the partial remainder p and the divisor d are inputs. The divisor (scaled to fall between 1 and 2) provides the X coordinate into the table, while the partial remainder (between -8 and 8) provides the Y coordinate. The details of the table coordinates will be important, so I'll go into some detail. To select a cell, the divisor (X-axis) is truncated to a 5-bit binary value 1.dddd. (Since the first digit of the divisor is always 1, it is ignored for the table lookup.) The partial remainder (Y-axis) is truncated to a 7-bit signed binary value pppp.ppp. The 11 bits indexing into the table result in a table with 2¹¹ (2048) entries. The partial remainder is expressed in 2's complement, so values 0000.000 to 0111.111 are non-negative values from 0 to (almost) 8, while values 1000.000 to 1111.111 are negative values from -8 to (almost) 0. (To see the binary coordinates for the table, click on the image and zoom in.)
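
The truncation described above can be sketched as follows. The function name table_index and the choice to put the partial remainder bits above the divisor bits are my assumptions; the hardware simply feeds the 11 bits to the table in parallel. Truncating a 2's complement value drops the low bits, which corresponds to rounding toward negative infinity (floor).

```python
import math
from fractions import Fraction

def table_index(p, d):
    """Combine the truncated partial remainder and divisor into an 11-bit index."""
    assert 1 <= d < 2 and -8 <= p < 8
    d_bits = math.floor((d - 1) * 16)     # the four bits dddd after the leading "1."
    p_bits = math.floor(p * 8) & 0x7F     # pppp.ppp as a 7-bit 2's complement value
    return (p_bits << 4) | d_bits         # assumed ordering: p bits, then d bits

index = table_index(Fraction(-17, 8), Fraction(11, 8))   # p = -2.125, d = 1.375
print(format(index, "011b"))   # 11011110110: p = 1101.111, d = 1.0110
```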

The lookup table is implemented in a Programmable Logic Array (PLA)

In this section, I'll explain how the lookup table is implemented in hardware in the Pentium. The lookup table has 2048 entries so it could be stored in a ROM with 2048 two-bit outputs.11 (The sign is not explicitly stored in the table because the quotient digit sign is the same as the partial remainder sign.) However, because the table is highly structured (and largely empty), the table can be stored more compactly in a structure called a Programmable Logic Array (PLA).12 By using a PLA, the Pentium stored the table in just 112 rows rather than 2048 rows, saving an enormous amount of space. Even so, the PLA is large enough on the chip that it is visible to the naked eye, if you squint a bit.

Zooming in on the PLA and associated circuitry on the Pentium die.


The idea of a PLA is to provide a dense and flexible way of implementing arbitrary logic functions. Any Boolean logic function can be expressed as a "sum-of-products", a collection of AND terms (products) that are OR'd together (summed). A PLA has a block of circuitry called the AND plane that generates the desired product terms. The outputs of the AND plane are fed into a second block, the OR plane, which ORs the terms together. The AND plane and the OR plane are organized as grids. Each gridpoint can either have a transistor or not, defining the logic functions. The point is that by putting the appropriate pattern of transistors in the grids, you can create any function. The division PLA has 22 inputs (the 11 bits from the divisor and partial remainder indices, along with their complements) and two outputs, as shown below.13

A simplified diagram of the division PLA.


A PLA is more compact than a ROM if the structure of the function allows it to be expressed with a small number of terms.14 One difficulty with a PLA is figuring out how to express the function with the minimum number of terms to make the PLA as small as possible. It turns out that this problem is NP-complete in general. Intel used a program called Espresso to generate compact PLAs using heuristics.15

The diagram below shows the division PLA in the Pentium. The PLA has 120 rows, split into two 60-row parts with support circuitry in the middle.16 The 11 table input bits go into the AND plane drivers in the middle, which produce the 22 inputs to the PLA (each table input and its complement). The outputs from the AND plane transistors go through output buffers and are fed into the OR plane. The outputs from the OR plane go through additional buffers and logic in the center, producing two output bits, indicating a ±1 or ±2 quotient. The image below shows the updated PLA that fixes the bug; the faulty PLA looks similar except the transistor pattern is different. In particular, the updated PLA has 46 unused rows at the bottom while the original, faulty PLA has 8 unused rows.

The division PLA with the metal layers removed to show the silicon. This image shows the PLA in the updated Pentium, since that photo came out better.


The image below shows part of the AND plane of the PLA. At each point in the grid, a transistor can be present or absent. The pattern of transistors in a row determines the logic term for that row. The vertical doped silicon lines (green) are connected to ground. The vertical polysilicon lines (red) are driven with the input bit pattern. If a polysilicon line crosses doped silicon, it forms a transistor (orange) that will pull that row to ground when activated.17 A metal line connects all the transistors in a row to produce the output; most of the metal has been removed, but some metal lines are visible at the right.

Part of the AND plane in the fixed Pentium. I colored the first silicon and polysilicon lines green and red respectively.


By carefully examining the PLA under a microscope, I extracted the pattern of transistors in the PLA grid. (This was somewhat tedious.) From the transistor pattern, I could determine the equations for each PLA row, and then generate the contents of the lookup table. Note that the transistors in the PLA don't directly map to the table contents (unlike a ROM). Thus, there is no specific place for transistors corresponding to the 5 missing table entries.

The left-hand side of the PLA implements the OR planes (below). The OR plane determines if the row output produces a quotient of 1 or 2. The OR plane is oriented 90° relative to the AND plane: the inputs are horizontal polysilicon lines (red) while the output lines are vertical. As before, a transistor (orange) is formed where polysilicon crosses doped silicon. Curiously, each OR plane has four outputs, even though the PLA itself has two outputs.18

Part of the OR plane of the division PLA. I removed the metal layers to show the underlying silicon and polysilicon. I drew lines for ground and outputs, showing where the metal lines were.


Next, I'll show exactly how the AND plane produces a term. For the division table, the inputs are the 7 partial remainder bits and 4 divisor bits, as explained earlier. I'll call the partial remainder bits p6p5p4p3.p2p1p0 and the divisor bits 1.d3d2d1d0. These 11 bits and their complements are fed vertically into the PLA as shown at the top of the diagram below. These lines are polysilicon, so they will form transistor gates, turning on the corresponding transistor when activated. The arrows at the bottom point to nine transistors in the first row. (It's tricky to tell if the polysilicon line passes next to doped silicon or over the silicon, so the transistors aren't always obvious.) Looking at the transistors and their inputs shows that the first term in the PLA is generated by p0p1p2p3p4'p5p6d1d2.

The first row of the division PLA in a faulty Pentium.


The diagram below is a closeup of the lookup table, showing how this PLA row assigns the value 1 to four table cells (dark blue). You can think of each term of the PLA as pattern-matching to a binary pattern that can include "don't care" values. The first PLA term (above) matches the pattern P=1101.111, D=x11x, where the "don't care" x values can be either 0 or 1. Since one PLA row can implement multiple table cells, the PLA is more efficient than a ROM; the PLA uses 112 rows, while a ROM would require 2048 rows.

The first entry in the PLA assigns the value 1 to the four dark blue cells.


Geometrically, you can think of each PLA term (row) as covering a rectangle or rectangles in the table. However, the rectangle can't be arbitrary, but must be aligned on a bit boundary. Note that each "bump" in the table boundary (magenta) requires a separate rectangle and thus a separate PLA row. (This will be important later.)

One PLA row can generate a large rectangle, filling in many table cells at once, if the region happens to be aligned nicely. For instance, the third term in the PLA matches d=xxxx, p=11101xx. This single PLA row efficiently fills in 64 table cells as shown below, replacing the 64 rows that would be required in a ROM.

The third entry in the PLA assigns the value 1 to the 64 dark blue cells.
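
The pattern-matching view of a PLA term is easy to sketch in Python. The helper below (cells_matched, a name invented for this example) expands the "don't care" positions and lists the table cells that one term covers. The bit patterns are those of the first and third terms described above, written as 7 partial remainder bits and 4 divisor bits without the binary point.

```python
from itertools import product

def cells_matched(p_pattern, d_pattern):
    """Return the (p, d) bit strings matched by one PLA term; 'x' is don't care."""
    def expand(pattern):
        options = ["01" if ch == "x" else ch for ch in pattern]
        return ["".join(bits) for bits in product(*options)]
    return [(p, d) for p in expand(p_pattern) for d in expand(d_pattern)]

first_term = cells_matched("1101111", "x11x")
third_term = cells_matched("11101xx", "xxxx")
print(len(first_term))   # 4
print(len(third_term))   # 64
```

Each "x" doubles the number of cells covered, which is why a well-aligned region needs so few rows.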


To summarize, the pattern of transistors in the PLA implements a set of equations, which define the contents of the table, setting the quotient to 1 or 2 as appropriate. Although the table has 2048 entries, the PLA represents the contents in just 112 rows. By carefully examining the transistor pattern, I determined the table contents in a faulty Pentium and a fixed Pentium.

The mathematical bounds of the lookup table

As shown earlier, the lookup table has regions corresponding to quotient digits of +2, +1, 0, -1, and -2. These regions have irregular, slanted shapes, defined by mathematical bounds. In this section, I'll explain these mathematical bounds since they are critical to understanding how the Pentium bug occurred.

The essential step of the division algorithm is to divide the partial remainder p by the divisor d to get the quotient digit. The following diagram shows how p/d determines the quotient digit. The ratio p/d will define a point on the line at the top. (The point will be in the range [-8/3, 8/3] for mathematical reasons.) The point will fall into one of the five lines below, defining the quotient digit q. However, the five quotient regions overlap; if p/d is in one of the green segments, there are two possible quotient digits. The next part of the diagram illustrates how subtracting q*d from the partial remainder p shifts p/d into the middle, between -2/3 and 2/3. Finally, the result is multiplied by 4 (shifted left by two bits), expanding19 the interval back to [-8/3, 8/3], which is the same size as the original interval. The 8/3 bound may seem arbitrary, but the motivation is that it ensures that the new interval is the same size as the original interval, so the process can be repeated. (The bounds are all thirds for algebraic reasons; the value 3 comes from base 4 minus 1.20)

The input to a division step is processed, yielding the input to the next step.


Note that the SRT algorithm has some redundancy, but cannot handle q values that are "too wrong". Specifically, if p/d is in a green region, then either of two q values can be selected. However, the algorithm cannot recover from a bad q value in general. The relevant case is that if q is supposed to be 2 but 0 is selected, the next partial remainder will be outside the interval and the algorithm can't recover. This is what causes the FDIV bug.
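
The interval arithmetic of the last few paragraphs can be checked with exact fractions. This is a sketch with invented names (valid_digits, next_ratio): a digit q is acceptable when subtracting it leaves p/d within [-2/3, 2/3], and the next step's ratio is four times what remains.

```python
from fractions import Fraction

def valid_digits(ratio):
    """Quotient digits q for which p/d - q lies in [-2/3, 2/3]."""
    bound = Fraction(2, 3)
    return [q for q in (-2, -1, 0, 1, 2) if abs(Fraction(ratio) - q) <= bound]

def next_ratio(ratio, q):
    """The value of p/d for the next step after choosing digit q."""
    return 4 * (Fraction(ratio) - q)

print(valid_digits(Fraction(3, 2)))      # [1, 2]: inside a green overlap segment
print(valid_digits(1))                   # [1]
print(next_ratio(Fraction(8, 3), 2))     # 8/3: still inside the allowed interval
print(next_ratio(Fraction(8, 3), 0))     # 32/3: far outside, no recovery possible
```

The last line is the failure case: a 0 where a 2 was required pushes the next partial remainder out of range.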

The diagram below shows the structure of the SRT lookup table (also called the P-D table since the axes are p and d). Each bound in the diagram above turns into a line in the table. For instance, the green segment above with p/d between 4/3 and 5/3 turns into a green region in the table below with 4/3 d ≤ p ≤ 5/3 d. These slanted lines show the regions in which a particular quotient digit q can be used.

The P-D table specifies the quotient digit for a partial remainder (Y-axis) and divisor (X-axis).
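
As a rough sketch of how the slanted bounds translate into table cells, the code below reports which quotient digits satisfy (q - 2/3)·d ≤ p ≤ (q + 2/3)·d everywhere in one cell. Because the bounds are straight lines, checking the four corners of the cell is sufficient. The function name and the cell sizes (1/8 for p, 1/16 for d) follow the table layout described earlier; the carry-save adjustment discussed in the next section is deliberately ignored here.

```python
from fractions import Fraction

def digits_valid_for_cell(p_low, d_low,
                          p_step=Fraction(1, 8), d_step=Fraction(1, 16)):
    """Digits q that satisfy the SRT bounds at all four corners of a table cell."""
    margin = Fraction(2, 3)
    corners = [(p_low + dp, d_low + dd)
               for dp in (0, p_step) for dd in (0, d_step)]
    return [q for q in (-2, -1, 0, 1, 2)
            if all((q - margin) * d <= p <= (q + margin) * d for p, d in corners)]

print(digits_valid_for_cell(Fraction(3, 2), Fraction(1)))   # [1, 2]
print(digits_valid_for_cell(Fraction(0), Fraction(1)))      # [0]
```

An empty result means no single digit works for the whole cell under these simplified bounds.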


The lookup table in the Pentium is based on the above table, quantized with a q value in each cell. However, there is one more constraint to discuss.

Carry-save and carry-lookahead adders

The Pentium's division circuitry uses a special circuit to perform addition and subtraction efficiently: the carry-save adder. One consequence of this adder is that each access to the lookup table may go to the cell just below the "right" cell. This is expected and should be fine, but in very rare and complicated circumstances, this behavior causes an access to one of the Pentium's five missing cells, triggering the division bug. In this section, I'll discuss why the division circuitry uses a carry-save adder, how the carry-save adder works, and how the carry-save adder triggers the FDIV bug.

The problem with addition is that carries make addition slow. Consider calculating 99999+1 by hand. You'll start with 9+1=10, then carry the one, generating another carry, which generates another carry, and so forth, until you go through all the digits. Computer addition has the same problem. If you're adding, say, two 64-bit numbers, the low-order bits can generate a carry that then propagates through all 64 bits. The time for the carry signal to go through 64 layers of circuitry is significant and can limit CPU performance. As a result, CPUs use special circuits to make addition faster.

The Pentium's division circuitry uses an unusual adder circuit called a carry-save adder to add (or subtract) the divisor and the partial remainder. A carry-save adder speeds up addition if you are performing a bunch of additions, as happens during division. The idea is that instead of adding a carry to each digit as it happens, you hold onto the carries in a separate word. As a decimal example, 499+222 would be 611 with carries 011; you don't carry the one to the second digit, but hold onto it. The next time you do an addition, you add in the carries you saved previously, and again save any new carries. The advantage of the carry-save adder is that the sum and carry at each digit position can be computed in parallel, which is fast. The disadvantage is that you need to do a slow addition at the end of the sequence of additions to add in the remaining carries to get the final answer. But if you're performing multiple additions (as for division), the carry-save adder is faster overall.
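
The sketch below illustrates the carry-save idea, first with the decimal example from the text and then with a binary three-input version that folds a new value into a running sum word and carry word. The function names and the 16-bit width are arbitrary choices for the example, not details of the Pentium.

```python
def carry_save_add_decimal(a, b, digits=3):
    """Add digit by digit without propagating carries; return (sum, carries)."""
    sum_value = carry_value = 0
    for i in range(digits):
        total = (a // 10 ** i) % 10 + (b // 10 ** i) % 10
        sum_value += (total % 10) * 10 ** i       # digit kept in place
        carry_value += (total // 10) * 10 ** i    # carry held, not yet applied
    return sum_value, carry_value

def carry_save_add(sum_word, carry_word, addend, width=16):
    """Binary carry-save step: every bit position is computed independently."""
    mask = (1 << width) - 1
    new_sum = sum_word ^ carry_word ^ addend
    new_carry = ((sum_word & carry_word) | (sum_word & addend)
                 | (carry_word & addend)) << 1
    return new_sum & mask, new_carry & mask

print(carry_save_add_decimal(499, 222))   # (611, 11), i.e. 611 with carries 011

s, c = 0, 0
for value in [13, 7, 22, 5]:
    s, c = carry_save_add(s, c, value)
print((s + c) & 0xFFFF)                   # 47, after one final slow addition
```

In the decimal case, the held carries belong one position to the left, so the final answer is 611 + 110 = 721.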

The carry-save adder creates a problem for the lookup table. We need to use the partial remainder as an index into the lookup table. But the carry-save adder splits the partial remainder into two parts: the sum bits and the carry bits. To get the table index, we need to add the sum bits and carry bits together. Since this addition needs to happen for every step of the division, it seems like we're back to using a slow adder and the carry-save adder has just made things worse.

The trick is that we only need 7 bits of the partial remainder for the table index, so we can use a different type of adder—a carry-lookahead adder—that calculates each carry in parallel using brute force logic. The logic in a carry-lookahead adder gets more and more complex for each bit so a carry-lookahead adder is impractical for large words, but it is practical for a 7-bit value.
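
A carry-lookahead adder can be modeled in Python as follows. This is a generic textbook-style sketch, not a transcription of the Pentium's circuit: each carry is written as a flat OR of AND terms over the generate and propagate signals, which is why the logic grows with each bit position.

```python
def carry_lookahead_add(a, b, width=7):
    """Add two width-bit values, computing every carry directly from the inputs."""
    generate = [(a >> i) & (b >> i) & 1 for i in range(width)]
    propagate = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]
    carries = [0]                       # no carry into bit 0
    for i in range(width):
        # Carry out of bit i: some bit j <= i generates a carry and
        # every bit from j+1 through i propagates it.
        carry = 0
        for j in range(i + 1):
            term = generate[j]
            for k in range(j + 1, i + 1):
                term &= propagate[k]
            carry |= term
        carries.append(carry)
    result = 0
    for i in range(width):
        result |= (propagate[i] ^ carries[i]) << i   # final XOR applies the carry
    return result

print(carry_lookahead_add(0b0101101, 0b0011011))   # 72
```

In hardware all of these terms are evaluated at once; the nested loops here only enumerate them.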

The photo below shows the carry-lookahead adder used by the divider. Curiously, the adder is an 8-bit adder but only 7 bits are used; perhaps the 8-bit adder was a standard logic block at Intel.21 I'll just give a quick summary of the adder here, and leave the details for another post. At the top, logic gates compute signals in parallel for each of the 8 pairs of inputs: sum, carry generate, and carry propagate. Next, the complex carry-lookahead logic determines in parallel if there will be a carry at each position. Finally, XOR gates apply the carry to each bit. The circuitry in the middle is used for testing; see the footnote.22 At the bottom, the drivers amplify control signals for various parts of the adder and send the PLA output to other parts of the chip.23 By counting the blocks of repeated circuitry, you can see which blocks are 8 bits wide, 11 bits wide, and so forth. The carry-lookahead logic is different for each bit, so there is no repeated structure.

The carry-lookahead adder that feeds the lookup table. This block of circuitry is just above the PLA on the die. I removed the metal layers, so this photo shows the doped silicon (dark) and the polysilicon (faint gray).


The carry-save and carry-lookahead adders may seem like implementation trivia, but they are a critical part of the FDIV bug because they change the constraints on the table. The cause is that the partial remainder is 64 bits,24 but the adder that computes the table index is 7 bits. Since the rest of the bits are truncated before the sum, the partial remainder sum for the table index can be slightly lower than the real partial remainder. Specifically, the table index can be one cell lower than the correct cell, an offset of 1/8. Recall the earlier diagram with diagonal lines separating the regions. Some (but not all) of these lines must be shifted down by 1/8 to account for the carry-save effect, but Intel made the wrong adjustment, which is the root cause of the FDIV error. (This effect was well-known at the time and mentioned in papers on SRT division, so Intel shouldn't have gotten it wrong.)
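
The one-cell offset can be demonstrated with a small model. The sketch assumes a partial remainder held as a sum word and a carry word of some width (16 bits here purely for illustration), with the top 7 bits of each fed to the short adder. The function name is invented for the example.

```python
def truncated_and_exact_index(sum_word, carry_word, width=16, index_bits=7):
    """Return (index from truncated words, index from the full-width sum)."""
    shift = width - index_bits
    index_mask = (1 << index_bits) - 1
    # What the short adder sees: low bits are dropped before adding.
    truncated = ((sum_word >> shift) + (carry_word >> shift)) & index_mask
    # What a full-width addition followed by truncation would give.
    full = (sum_word + carry_word) & ((1 << width) - 1)
    exact = (full >> shift) & index_mask
    return truncated, exact

# The discarded low bits add up to a carry that the short adder never sees.
print(truncated_and_exact_index(0b0001011_111111111, 0b0000000_000000001))  # (11, 12)
```

The truncated index is either equal to the exact index or one lower, never more, because the two discarded low parts can produce at most one carry between them.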

An interesting thing about the FDIV bug is how extremely rare it is. With 5 bad table entries out of 2048, you'd expect erroneous divides to be very common. However, for complicated mathematical reasons involving the carry-save adder, the missing table entries are almost never encountered: only about 1 in 9 billion random divisions will encounter a problem. To hit a missing table entry, you need an "unlucky" result from the carry-save adder multiple times in a row, making the odds similar to winning the lottery, if the lottery prize were a division error.25

What went wrong in the lookup table

I consider the diagram below to be the "smoking gun" that explains how the FDIV bug happens: the top magenta line should be above the sloping black line, but it crosses the black line repeatedly. The magenta line carefully stays above the gray line, but that's the wrong line. In other words, Intel picked the wrong bounds line when defining the +2 region of the table. In this section, I'll explain why that causes the bug.

The top half of the lookup table, explaining the root of the FDIV bug.


The diagram is colored according to the quotient values stored in the Pentium's lookup table: yellow is +2, blue is +1, and white is 0, with magenta lines showing the boundaries between different values. The diagonal black lines are the mathematical constraints on the table, defining the region that must be +2, the region that can be +1 or +2, the region that must be +1, and so forth. For the table to be correct, each cell value in the table must satisfy these constraints. The middle magenta line is valid: it remains between the two black lines (the redundant +1 or +2 region), so all the cells that need to be +1 are +1 and all the cells that need to be +2 are +2, as required. Likewise, the bottom magenta line remains between the black lines. However, the top magenta line is faulty: it must remain above the top black line, but it crosses the black line. The consequence is that some cells that need to be +2 end up holding 0: these are the missing cells that caused the FDIV bug.

Note that the top magenta line stays above the diagonal gray line while following it as closely as possible. If the gray line were the correct line, the table would be perfect. Unfortunately, Intel picked the wrong constraint line for the table's upper bound when the table was generated.26

But why are some diagonal lines lowered by 1/8 while others are not? As explained in the previous section, as a consequence of the carry-save adder truncation, the table lookup may end up one cell lower than the actual p value would indicate, i.e. the p value for the table index is 1/8 lower than the actual value. Thus, both the correct cell and the cell below must satisfy the SRT constraints. Consequently, a line moves down if that makes the constraints stricter but does not move down if that would expand the redundant area. In particular, the top line must not be moved down, but clearly Intel moved the line down and generated the faulty lookup table.

Intel, however, has a different explanation for the bug. The Intel white paper states that the problem was in a script that downloaded the table into a PLA: an error caused the script to omit a few entries from the PLA.27 I don't believe this explanation: the missing terms match a mathematical error, not a copying error. I suspect that Intel's statement is technically true but misleading: they ran a C program (which they called a script) to generate the table but the program had a mathematical error in the bounds.

In his book "The Pentium Chronicles", Robert Colwell, architect of the Pentium Pro, provides a different explanation of the FDIV bug. Colwell claims that the Pentium design originally used the same lookup table as the 486, but shortly before release, the engineers were pressured by management to shrink the circuitry to save die space. The engineers optimized the table to make it smaller and had a proof that the optimization would work. Unfortunately, the proof was faulty, but the testers trusted the engineers and didn't test the modification thoroughly, causing the Pentium to be released with the bug. The problem with this explanation is that the Pentium was designed from the start with a completely different division algorithm from the 486: the Pentium uses radix-4 SRT, while the 486 uses standard binary division. Since the 486 doesn't have a lookup table, the story falls apart. Moreover, the PLA could trivially have been made smaller by removing the 8 unused rows, so the engineers clearly weren't trying to shrink it. My suspicion is that since Colwell developed the Pentium Pro in Oregon but the original Pentium was developed in California, Colwell didn't get firsthand information on the Pentium problems.

How Intel fixed the bug

Intel's fix for the bug was straightforward but also surprising. You'd expect that Intel added the five missing table values to the PLA, and this is what was reported at the time. The New York Times wrote that Intel fixed the flaw by adding several dozen transistors to the chip. EE Times wrote that "The fix entailed adding terms, or additional gate-sequences, to the PLA."

However, the updated PLA (below) shows something entirely different. The updated PLA is exactly the same size as the original PLA, but about 1/3 of the terms were removed, eliminating hundreds of transistors. Only 74 of the PLA's 120 rows are used, and the rest are left empty. (The original PLA had 8 empty rows.) How could removing terms from the PLA fix the problem?

The updated PLA has 46 unused rows.

The explanation is that Intel didn't just fill in the five missing table entries with the correct value of 2. Instead, Intel filled all the unused table entries with 2, as shown below. This has two effects. First, it eliminates any possibility of hitting a mistakenly-empty entry. Second, it makes the PLA equations much simpler. You might think that more entries in the table would make the PLA larger, but the number of PLA terms depends on the structure of the data. By filling the unused cells with 2, the jagged borders between the unused regions (white) and the "2" regions (yellow) disappear. As explained earlier, a large rectangle can be covered by a single PLA term, but a jagged border requires a lot of terms. Thus, the updated PLA is about 1/3 smaller than the original, flawed PLA. One consequence is that the terms in the new PLA are completely different from the terms in the old PLA so one can't point to the specific transistors that fixed the bug.

Comparison of the faulty lookup table (left) and the corrected lookup table (right).

The image below shows the first 14 rows of the faulty PLA and the first 14 rows of the fixed PLA. As you can see, the transistor pattern (and thus the PLA terms) are entirely different. The doped silicon is darkened in the second image due to differences in how I processed the dies to remove the metal layers.

Top of the faulty PLA (left) and the fixed PLA (right). The metal layers were removed to show the silicon of the transistors. (Click for a larger image.)

Impact of the FDIV bug

How important is the Pentium bug? This became a highly controversial topic. A failure of a random division operation is very rare: about one in 9 billion values will trigger the bug. Moreover, an erroneous division is still mostly accurate: the error is usually in the 9th or 10th decimal digit, with rare worst-case error in the 4th significant digit. Intel's whitepaper claimed that a typical user would encounter a problem once every 27,000 years, insignificant compared to other sources of error such as DRAM bit flips. Intel said: "Our overall conclusion is that the flaw in the floating point unit of the Pentium processor is of no concern to the vast majority of users. A few users of applications in the scientific/engineering and financial engineering fields may need to employ either an updated processor without the flaw or a software workaround."

However, IBM performed their own analysis,29 suggesting that the problem could hit customers every few days, and IBM suspended Pentium sales. (Coincidentally, IBM had a competing processor, the PowerPC.) The battle made it to major newspapers; the Los Angeles Times split the difference with Study Finds Both IBM, Intel Off on Error Rate. Intel soon gave in and agreed to replace all the Pentiums, making the issue moot.

I mostly agree with Intel's analysis. It appears that only one person (Professor Nicely) noticed the bug in actual use.28 The IBM analysis seems contrived to hit numbers that trigger the error. Most people would never hit the bug and even if they hit it, a small degradation in floating-point accuracy is unlikely to matter to most people. Looking at society as a whole, replacing the Pentiums was a huge expense for minimal gain. On the other hand, it's reasonable for customers to expect an accurate processor.

Note that the Pentium bug is deterministic: if you use a specific divisor and dividend that trigger the problem, you will get the wrong answer 100% of the time. Pentium engineer Ken Shoemaker suggested that the outcry over the bug was because it was so easy for customers to reproduce. It was hard for Intel to argue that customers would never encounter the bug when customers could trivially see the bug on their own computer, even if the situation was artificial.
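
As a rough illustration of how easy the bug was to reproduce, the sketch below runs the kind of check that circulated in 1994. The operands 4195835 and 3145727 are the widely circulated test pair, not values taken from this article, and the function name is hypothetical. The check divides, multiplies back, and looks at what is left over.

```python
def fdiv_residual(x, y):
    """Return x - (x / y) * y, which should be essentially zero."""
    return x - (x / y) * y


if __name__ == "__main__":
    residual = fdiv_residual(4195835.0, 3145727.0)
    # On a correct FPU the residual is zero or a negligible rounding error.
    print("residual:", residual)
    print("division looks OK" if abs(residual) < 1.0 else "division looks faulty")
```

On an affected Pentium, this calculation was widely reported to give 256 rather than 0. On any modern processor it prints a residual of essentially zero.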

Conclusions

The FDIV bug is one of the most famous processor bugs. By examining the die, it is possible to see exactly where it is on the chip. But Intel has had other important bugs. Some early 386 processors had a 32-bit multiply problem. Unlike the deterministic FDIV bug, the 386 would unpredictably produce the wrong results under particular temperature/voltage/frequency conditions. The underlying issue was a layout problem that didn't provide enough electrical margin to handle the worst-case situation. Intel sold the faulty chips but restricted them to the 16-bit market; bad chips were labeled "16 BIT S/W ONLY", while the good processors were marked with a double sigma. Although Intel had to suffer through embarrassing headlines such as Some 386 Systems Won't Run 32-Bit Software, Intel Says, the bug was soon forgotten.

Bad and good versions of the 386. Note the labels on the bottom line. Photos (L), (R) by Thomas Nguyen, (CC BY-SA 4.0)

Another memorable Pentium issue was the "F00F bug", a problem where a particular instruction sequence starting with F0 0F would cause the processor to lock up until rebooted.30 The bug was found in 1997 and solved with an operating system update. The bug is presumably in the Pentium's voluminous microcode. The microcode is too complex for me to analyze, so don't expect a detailed blog post on this subject. :-)

You might wonder why Intel needed to release a new revision of the Pentium to fix the FDIV bug, rather than just updating the microcode. The problem was that microcode for the Pentium (and earlier processors) was hard-coded into a ROM and couldn't be modified. Intel added patchable microcode to the Pentium Pro (1995), allowing limited modifications to the microcode. Intel originally implemented this feature for chip debugging and testing. But after the FDIV bug, Intel realized that patchable microcode was valuable for bug fixes too.31 The Pentium Pro stores microcode in ROM, but it also has a static RAM that holds up to 60 microinstructions. During boot, the BIOS can load a microcode patch into this RAM. In modern Intel processors, microcode patches have been used for problems ranging from the Spectre vulnerability to voltage problems.

The Pentium PLA with the top metal layer removed, revealing the M2 and M1 layers. The OR and AND planes are at the top and bottom, with drivers and control logic in the middle.

As the number of transistors in a processor increased exponentially, as described by Moore's Law, processors used more complex circuits and algorithms. Division is one example. Early microprocessors such as the Intel 8080 (1974, 6000 transistors) had no hardware support for division or floating point arithmetic. The Intel 8086 (1978, 29,000 transistors) implemented integer division in microcode but required the 8087 coprocessor chip for floating point. The Intel 486 (1989, 1.2 million transistors) added floating-point support on the chip. The Pentium (1993, 3.1 million transistors) moved to the faster but more complicated SRT division algorithm. The Pentium's division PLA alone has roughly 4900 transistor sites, more than a MOS Technology 6502 processor—one component of the Pentium's division circuitry uses more transistors than an entire 1975 processor.

The long-term effect of the FDIV bug on Intel is a subject of debate. On the one hand, competitors such as AMD benefitted from Intel's error. AMD's ads poked fun at the Pentium's problems by listing features of AMD's chips such as "You don't have to double check your math" and "Can actually handle the rigors of complex calculations like division." On the other hand, Robert Colwell, architect of the Pentium Pro, said that the FDIV bug may have been a net benefit to Intel as it created enormous name recognition for the Pentium, along with a demonstration that Intel was willing to back up its brand name. Industry writers agreed; see The Upside of the Pentium Bug. In any case, Intel survived the FDIV bug; time will tell how Intel survives its current problems.

I plan to write more about the implementation of the Pentium's PLA, the adder, and the test circuitry. Until then, you may enjoy reading about the Pentium Navajo rug. (The rug represents the P54C variant of the Pentium, so it is safe from the FDIV bug.) Thanks to Bob Colwell and Ken Shoemaker for helpful discussions.

Footnotes and references

  1. The book Inside Intel says that Vin Dham, the "Pentium czar", found the FDIV problem in May 1994. The book "The Pentium Chronicles" says that Patrice Roussel, the floating-point architect for Intel's upcoming Pentium Pro processor, found the FDIV problem in Summer 1994. I suspect that the bug was kept quiet inside Intel and was discovered more than once. 

  2. The divisor being a prime number has nothing to do with the bug. It's just a coincidence that the problem was found during research with prime numbers. 

  3. See Nicely's FDIV page for more information on the bug and its history. Other sources are the books Creating the Digital Future, The Pentium Chronicles, and Inside Intel. The New York Times wrote about the bug: Flaw Undermines Accuracy of Pentium Chips. Computerworld wrote Intel Policy Incites User Threats on threats of a class-action lawsuit. IBM's response is described in IBM Deals Blow to a Rival as it Suspends Pentium Sales.

  4. Talk show host David Letterman joked about the Pentium on December 15: "You know what goes great with those defective Pentium chips? Defective Pentium salsa!" Although a list of Letterman-style top ten Pentium slogans circulated, the list was a Usenet creation. There's a claim that Jay Leno also joked about the Pentium, but I haven't found verification. 

  5. Processors have many more bugs than you might expect. Intel's 1995 errata list for the Pentium had "21 errata (including the FDIV problem), 4 changes, 16 clarifications, and 2 documentation changes." See Pentium Processor Specification Update and Intel Releases Pentium Errata List.

  6. Intel published full-page newspaper ads apologizing for its handling of the problem, stating: "What Intel continues to believe is an extremely minor technical problem has taken on a life of its own."

    Intel's apology letter, published in Financial Times. Note the UK country code in the phone number.

  7. Tim Coe's reverse engineering of the Pentium divider was described on the Usenet group comp.sys.intel, archived here. To summarize, Andreas Kaiser found 23 failing reciprocals. Tim Coe determined that most of these failing reciprocals were of the form 3*(2^(K+30)) - 1149*(2^(K-(2*J))) - delta*(2^(K-(2*J))). He recognized that the factor of 2 indicated a radix-4 divider. The extremely low probability of error indicated the presence of a carry save adder; the odds of both the sum and carry bits getting long patterns of ones were very low. Coe constructed a simulation of the divider that matched the Pentium's behavior and noted which table entries must be faulty. 

  8. The main papers on the FDIV bug are Computational Aspects of the Pentium Affair, It Takes Six Ones to Reach a Flaw, The Mathematics of the Pentium Division Bug, The Truth Behind the Pentium Bug, Anatomy of the Pentium Bug, and Risk Analysis of the Pentium Bug. Intel's whitepaper is Statistical Analysis of Floating Point Flaw in the Pentium Processor; I archived IBM's study here.

  9. The Pentium uses floating point numbers that follow the IEEE 754 standard. Internally, floating point numbers are represented with 80 bits: 1 bit for the sign, 15 bits for the exponent, and 64 bits for the significand. Externally, floating point numbers are 32-bit single-precision numbers or 64-bit double-precision numbers. Note that the number of significand bits limits the accuracy of a floating-point number. 

  10. The SRT division algorithm is named after the three people who independently created it in 1957-1958: Sweeney at IBM, Robertson at the University of Illinois, and Tocher at Imperial College London. The SRT algorithm was developed further by Atkins in his PhD research (1970).

    The SRT algorithm became more practical in the 1980s as chips became denser. Taylor implemented the SRT algorithm on a board with 150 chips in 1981. The IEEE floating point standard (1985) led to a market for faster floating point circuitry. For instance, the Weitek 4167 floating-point coprocessor chip (1989) was designed for use with the Intel 486 CPU (datasheet) and described in an influential paper. Another important SRT implementation is the MIPS R3010 (1988), the coprocessor for the R3000 RISC processor. The MIPS R3010 uses radix-4 SRT for division with 9 bits from the partial remainder and 9 bits from the divisor, making for a larger lookup table and adder than the Pentium (link).

    To summarize, when Intel wanted to make division faster on the Pentium (1993), the SRT algorithm was a reasonable choice. Competitors had already implemented SRT and multiple papers explained how SRT worked. The implementation should have been straightforward and bug-free. 

  11. The dimensions of the lookup table can't be selected arbitrarily. In particular, if the table is too small, a cell may need to hold two different q values, which isn't possible. Note that constructing the table is only possible due to the redundancy of SRT. For instance, if some values in the cell require q=1 and other values require q=1 or 2, then the value q=1 can be assigned to the cell. 

  12. In the white paper, Intel calls the PLA a Programmable Lookup Array, but that's an error; it's a Programmable Logic Array. 

  13. I'll explain a PLA in a bit more detail in this footnote. An example of a sum-of-products formula with inputs a and b is ab' + a'b + ab. This formula has three product terms, so it requires three rows in the PLA. However, this formula can be reduced to a + b, which uses a smaller two-row PLA. Note that any formula can be trivially expressed with a separate product term for each 1 output in the truth table. The hard part is optimizing the PLA to use fewer terms. 
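
    The reduction in this footnote can be confirmed by brute force over the truth table. The sketch below is just a check of the algebra, with made-up function names; it says nothing about how a real PLA minimizer works.

```python
from itertools import product


def three_term_form(a, b):
    """ab' + a'b + ab: three product terms, so three PLA rows."""
    return (a and not b) or (not a and b) or (a and b)


def reduced_form(a, b):
    """a + b: two product terms, so two PLA rows."""
    return a or b


def forms_agree():
    """Compare the two formulas on every input combination."""
    return all(
        bool(three_term_form(a, b)) == bool(reduced_form(a, b))
        for a, b in product([False, True], repeat=2)
    )


if __name__ == "__main__":
    print("equivalent:", forms_agree())
```

    With only two inputs there are four cases to check; the Pentium's table has many more inputs, which is why minimization needs dedicated software.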

  14. A ROM and a PLA have many similarities. You can implement a ROM with a PLA by using the AND terms to decode addresses and the OR terms to hold the data. Alternatively, you can replace a PLA with a ROM by putting the function's truth table into the ROM. ROMs are better if you want to hold arbitrary data that doesn't have much structure (such as the microcode ROMs). PLAs are better if the functions have a lot of underlying structure. The key theoretical difference between a ROM and a PLA is that a ROM activates exactly one row at a time, corresponding to the address, while a PLA may activate one row, no rows, or multiple rows at a time. Another alternative for representing functions is to use logic gates directly (known as random logic); moving from the 286 to the 386, Intel replaced many small PLAs with logic gates, enabled by improvements in the standard-cell software. Intel's design process is described in Coping with the Complexity of Microprocessor Design

  15. In 1982, Intel developed a program called LOGMIN to automate PLA design. The original LOGMIN used an exhaustive exponential search, limiting its usability. See A Logic Minimizer for VLSI PLA Design. For the 386, Intel used Espresso, a heuristic PLA minimizer that originated at IBM and was developed at UC Berkeley. Intel probably used Espresso for the Pentium, but I can't confirm that. 

  16. The Pentium's PLA is split into a top half and a bottom half, so you might expect the top half would generate a quotient of 1 and the bottom half would generate a quotient of 2. However, the rows for the two quotients are shuffled together with no apparent pattern. I suspect that the PLA minimization software generated the order arbitrarily. 

  17. Conceptually, the PLA consists of AND gates feeding into OR gates. To simplify the implementation, both layers of gates are actually NOR gates. Specifically, if any transistor in a row turns on, the row will be pulled to ground, producing a zero. De Morgan's laws show that the two approaches are the same, if you invert the inputs and outputs. I'm ignoring this inversion in the diagrams.

    Note that each square can form a transistor on the left, the right, or both. The image must be examined closely to distinguish these cases. Specifically, if the polysilicon line produces a transistor, horizontal lines are visible in the polysilicon. If there are no horizontal lines, the polysilicon passes by without creating a transistor. 
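
    The NOR-NOR equivalence mentioned at the start of this footnote can be checked exhaustively. The sketch below uses an arbitrary three-input function, (a AND b) OR (b AND c), chosen only for illustration; it is not a term from the Pentium's PLA, and the function names are hypothetical.

```python
from itertools import product


def nor(*inputs):
    """A row pulled to ground if any transistor turns on."""
    return int(not any(inputs))


def and_or_plane(a, b, c):
    """Conceptual PLA: AND gates feeding an OR gate."""
    return int((a and b) or (b and c))


def nor_nor_plane(a, b, c):
    """Same function built from two NOR layers with inverted inputs and output."""
    term1 = nor(1 - a, 1 - b)      # NOR of inverted inputs equals a AND b
    term2 = nor(1 - b, 1 - c)      # NOR of inverted inputs equals b AND c
    return 1 - nor(term1, term2)   # inverted NOR equals OR of the terms


def planes_agree():
    """Compare both implementations on all eight input combinations."""
    return all(
        and_or_plane(a, b, c) == nor_nor_plane(a, b, c)
        for a, b, c in product([0, 1], repeat=3)
    )


if __name__ == "__main__":
    print("equivalent:", planes_agree())
```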

  18. Each OR plane has four outputs, so there are eight outputs in total. These outputs are combined with logic gates to generate the desired two outputs (quotient of 1 or 2). I'm not sure why the PLA is implemented in this fashion. Each row alternates between an output on the left and an output on the right, but I don't think this makes the layout any denser. As far as I can tell, the extra outputs just waste space. One could imagine combining the outputs in a clever way to reduce the number of terms, but instead the outputs are simply OR'd together. 

  19. The dynamics of the division algorithm are interesting. The computation of a particular division will result in the partial remainder bouncing from table cell to table cell, while remaining in one column of the table. I expect this could be analyzed in terms of chaotic dynamics. Specifically, the partial remainder interval is squished down by the subtraction and then expanded when multiplied by 4. This causes low-order bits to percolate upward so the result is exponentially sensitive to initial conditions. I think that the division behavior satisfies the definition of chaos in Dynamics of Simple Maps, but I haven't investigated this in detail.

    You can see this chaotic behavior with a base-10 division, e.g. compare 1/3.0001 to 1/3.0002:
    1/3.0001=0.33332222259258022387874199947368393726705454969006...
    1/3.0002=0.33331111259249383619151572689224512820860216424246...
    Note that the results start off the same but are completely divergent by 15 digits. (The division result itself isn't chaotic, but the sequence of digits is.)
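
    The two digit strings can be regenerated with exact integer arithmetic. This is a minimal sketch with hypothetical helper names; it rewrites 1/3.0001 as 10000/30001 and truncates rather than rounds.

```python
def fraction_digits(numerator, denominator, digits=50):
    """First `digits` decimal digits of numerator/denominator (< 1), truncated."""
    return str(numerator * 10**digits // denominator).zfill(digits)


def first_difference(x, y):
    """Index (0-based) of the first position where two digit strings differ."""
    return next(i for i, (cx, cy) in enumerate(zip(x, y)) if cx != cy)


A = fraction_digits(10000, 30001)  # digits of 1/3.0001
B = fraction_digits(10000, 30002)  # digits of 1/3.0002

if __name__ == "__main__":
    print("1/3.0001 = 0." + A)
    print("1/3.0002 = 0." + B)
    print("first differing digit position:", first_difference(A, B) + 1)
```

    The strings first differ at the fifth digit after the decimal point, and the differences pile up from there.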

    I tried to make a fractal out of the SRT algorithm and came up with the image below. There are 5 bands for convergence, each made up of 5 sub-bands, each made up of 5 sub-sub bands, and so on, corresponding to the 5 q values.

    A fractal showing convergence or divergence of SRT division as the scale factor (X-axis) ranges from the normal value of 4 to infinity. The Y-axis is the starting partial remainder. The divisor is (arbitrarily) 1.5. Red indicates convergence; gray is darker as the value diverges faster.

  20. The algebra behind the bound of 8/3 is that p (the partial remainder) needs to be in an interval that stays the same size each step. Each step of division computes pnew = (pold - q*d)*4. Thus, at the boundary, with q=2, you have p = (p-2*d)*4, so 3p=8d and thus p/d = 8/3. Similarly, the other boundary, with q=-2, gives you p/d = -8/3. 
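
    The recurrence and the 8/3 bound can be exercised with exact rational arithmetic. The sketch below is a generic radix-4 SRT loop, not the Pentium's implementation: it picks the quotient digit by rounding p/d and clamping to the range -2 to 2, which is an assumption made for simplicity, whereas the Pentium uses a lookup table on truncated values. The function name is hypothetical.

```python
from fractions import Fraction
import math

BOUND = Fraction(8, 3)  # |p| must stay within (8/3) * d


def srt_radix4(dividend, divisor, steps):
    """Run `steps` iterations of pnew = (pold - q*d)*4 with q in -2..2.

    Returns (quotient, final partial remainder) as exact fractions.
    """
    p = Fraction(dividend)
    d = Fraction(divisor)
    quotient = Fraction(0)
    for i in range(steps):
        # The partial remainder never leaves the interval derived in the text.
        assert abs(p) <= BOUND * d
        # Assumed digit selection: nearest integer to p/d, clamped to -2..2.
        q = max(-2, min(2, math.floor(p / d + Fraction(1, 2))))
        quotient += Fraction(q, 4**i)
        p = 4 * (p - q * d)
    return quotient, p


if __name__ == "__main__":
    x, d = Fraction(7, 4), Fraction(3, 2)
    quotient, remainder = srt_radix4(x, d, 20)
    print(float(quotient), float(x / d))
    # At the boundary p = (8/3) * d with q = 2, the remainder maps to itself.
    print(srt_radix4(BOUND * d, d, 1)[1] == BOUND * d)
```

    After n steps, the accumulated quotient differs from the true quotient by at most (8/3)/4^n, since the leftover partial remainder is bounded by (8/3)d.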

  21. I'm not completely happy with the 8-bit carry-lookahead adder. Coe's mathematical analysis in 1994 showed that the carry-lookahead adder operates on 7 bits. The adder in the Pentium has two 8-bit inputs connected to another part of the division circuit. However, the adder's bottom output bit is not connected to anything. That would suggest that the adder is adding 8 bits and then truncating to 7 bits, which would reduce the truncation error compared to a 7-bit adder. However, when I simulate the division algorithm this way, the FDIV bug doesn't occur. Wiring the bottom input bits to 0 would explain the behavior, but that seems pointless. I haven't examined the circuitry that feeds the adder, so I don't have a conclusive answer. 

  22. Half of the circuitry in the adder block is used to test the lookup table. The reason is that a chip such as the Pentium is very difficult to test: if one out of 3.1 million transistors goes bad, how do you detect it? For a simple processor like the 8080, you can run through the instruction set and be fairly confident that any problem would turn up. But with a complex chip, it is almost impossible to come up with an instruction sequence that would test every bit of the microcode ROM, every bit of the cache, and so forth. Starting with the 386, Intel added circuitry to the processor solely to make testing easier; about 2.7% of the transistors in the 386 were for testing.

    To test a ROM inside the processor, Intel added circuitry to scan the entire ROM and checksum its contents. Specifically, a pseudo-random number generator runs through each address, while another circuit computes a checksum of the ROM output, forming a "signature" word. At the end, if the signature word has the right value, the ROM is almost certainly correct. But if there is even a single bit error, the checksum will be wrong and the chip will be rejected. The pseudo-random numbers and the checksum are both implemented with linear feedback shift registers (LFSR), a shift register along with a few XOR gates to feed the output back to the input. For more information on testing circuitry in the 386, see Design and Test of the 80386, written by Pat Gelsinger, who became Intel's CEO years later. Even with the test circuitry, 48% of the transistor sites in the 386 were untested. The instruction-level test suite to test the remaining circuitry took almost 800,000 clock cycles to run. The overhead of the test circuitry was about 10% more transistors in the blocks that were tested.

    In the Pentium, the circuitry to test the lookup table PLA is just below the 7-bit adder. An 11-bit LFSR creates the 11-bit input value to the lookup table. A 13-bit LFSR hashes the two-bit quotient result from the PLA, forming a 13-bit checksum. The checksum is fed serially to test circuitry elsewhere in the chip, where it is merged with other test data and written to a register. If the register is 0 at the end, all the tests pass. In particular, if the checksum is correct, you can be 99.99% sure that the lookup table is operating as expected. The ironic thing is that this test circuit was useless for the FDIV bug: it ensured that the lookup table held the intended values, but the intended values were wrong.

    Why did Intel generate test addresses with a pseudo-random sequence instead of a sequential counter? It turns out that a linear feedback shift register (LFSR) is slightly more compact than a counter. This LFSR trick was also used in a touch-tone chip and the program counter of the Texas Instruments TMS 1000 microcontroller (1974). In the TMS 1000, the program counter steps through the program pseudo-randomly rather than sequentially. The program is shuffled appropriately in the ROM to counteract the sequence, so the program executes as expected and a few transistors are saved. 
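
    To show how an LFSR can stand in for a counter, here is a minimal sketch of an 11-bit LFSR. The feedback taps (polynomial x^11 + x^2 + 1) are an assumption, picked because that polynomial is known to give a maximal-length sequence; I have not determined the taps that the Pentium's test circuit actually uses. The function names are hypothetical.

```python
MASK_11_BITS = 0x7FF


def lfsr11_step(state):
    """Shift left by one, feeding back the XOR of bits 10 and 8 (assumed taps)."""
    feedback = ((state >> 10) ^ (state >> 8)) & 1
    return ((state << 1) | feedback) & MASK_11_BITS


def lfsr11_cycle(seed=1):
    """Return the list of states visited before the sequence returns to the seed."""
    states = [seed & MASK_11_BITS]
    state = lfsr11_step(states[0])
    while state != states[0]:
        states.append(state)
        state = lfsr11_step(state)
    return states


if __name__ == "__main__":
    cycle = lfsr11_cycle()
    print("states visited:", len(cycle))
    print("first few:", cycle[:8])
```

    With these taps, the register visits all 2047 nonzero 11-bit values before repeating, in a scrambled order. The all-zero state is never reached, so a test based on such a sequence would need to handle that one address separately.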

  23. One unusual feature of the Pentium is that it uses BiCMOS technology: both bipolar and CMOS transistors. Note the distinctive square boxes in the driver circuitry; these are bipolar transistors, part of the high-speed drivers.

    Three bipolar transistors. These transistors transmit the quotient to the rest of the division circuitry.

  24. I think the partial remainder is actually 67 bits because there are three extra bits to handle rounding. Different parts of the floating-point datapath have different widths, depending on what width is needed at that point. 

  25. In this long footnote, I'll attempt to explain why the FDIV bug is so rare, using heatmaps. My analysis of Intel's lookup table shows several curious factors that almost cancel out, making failures rare but not impossible. (For a rigorous explanation, see It Takes Six Ones to Reach a Flaw and The Mathematics of the Pentium Division Bug. These papers explain that, among other factors, a bad divisor must have six consecutive ones in positions 5 through 10 and the division process must go through nine specific steps, making a bad result extremely uncommon.)

    The diagram below shows a heatmap of how often each table cell is accessed when simulating a generic SRT algorithm with a carry-save adder. The black lines show the boundaries of the quotient regions in the Pentium's lookup table. The key point is that the top colored cell in each column is above the black line, so some table cells are accessed but are not defined in the Pentium. This shows that the Pentium is missing 16 entries, not just the 5 entries that are usually discussed. (For this simulation, I generated the quotient digit directly from the SRT bounds, rather than the lookup table, selecting the digit randomly in the redundant regions.)

    A heatmap showing the table cells accessed by an SRT simulation.

    The diagram is colored with a logarithmic color scale. The blue cells are accessed approximately uniformly. The green cells at the boundaries are accessed about 2 orders of magnitude less often. The yellow-green cells are accessed about 3 orders of magnitude less often. The point is that it is hard to get to the edge cells since you need to start in the right spot and get the right quotient digit, but it's not extraordinarily hard.

    (The diagram also shows an interesting but ultimately unimportant feature of the Pentium table: at the bottom of the diagram, five white cells are above the black line. This shows that the Pentium assigns values to five table cells that can't be accessed. (This was also mentioned in "The Mathematics of the Pentium Division Bug".) These cells are in the same columns as the 5 missing cells, so it would be interesting if they were related to the missing cells. But as far as I can tell, the extra cells are due to using a bound of "greater or equals" rather than "greater", unrelated to the missing cells. In any case, the extra cells are harmless.)

    The puzzling factor is that if the Pentium table has 16 missing table cells, and the SRT uses these cells fairly often, you'd expect maybe 1 division out of 1000 or so to be wrong. So why are division errors extremely rare?

    It turns out that the structure of the Pentium lookup table makes some table cells inaccessible. Specifically, the table is arbitrarily biased to pick the higher quotient digit rather than the lower quotient digit in the redundant regions. This has the effect of subtracting more from the partial remainder, pulling the partial remainder away from the table edges. The diagram below shows a simulation using the Pentium's lookup table and no carry-save adder. Notice that many cells inside the black lines are white, indicating that they are never accessed. This is by coincidence, due to arbitrary decisions when constructing the lookup table. Importantly, the missing cells just above the black line are never accessed, so the missing cells shouldn't cause a bug.

    A heatmap showing the table cells accessed by an SRT simulation using the Pentium's lookup table but no carry-save adder.

    Thus, Intel almost got away with the missing table entries. Unfortunately, the carry-save adder makes it possible to reach some of the otherwise inaccessible cells. Because the output from the carry-save adder is truncated, the algorithm can access the table cell below the "right" cell. In the redundant regions, this can yield a different (but still valid) quotient digit, causing the next partial remainder to end up in a different cell than usual. The heatmap below shows the results.

    A heatmap showing the probability of ending up in each table cell when using the Pentium's division algorithm.

    In particular, five cells above the black line can be reached: these are instances of the FDIV bug. These cells are orange, indicating that they are about 9 orders of magnitude less likely than the rest of the cells. It's almost impossible to reach these cells, requiring multiple "unlucky" values in a row from the carry-save adder. To summarize, the Pentium lookup table has 16 missing cells. Purely by coincidence, the choices in the lookup table make many cells inaccessible, which almost counteracts the problem. However, the carry-save adder provides a one-in-a-billion path to five of the missing cells, triggering the FDIV bug.

    One irony is that if division errors were more frequent, Intel would have caught the FDIV bug before shipping. But if division errors were substantially less frequent, no customers would have noticed the bug. Inconveniently, the frequency of errors fell into the intermediate zone: errors were too rare for Intel to spot them, but frequent enough for a single user to spot them. (This makes me wonder what other astronomically infrequent errors may be lurking in processors.) 

  26. Anatomy of the Pentium Bug reached a similar conclusion, stating "The [Intel] White Paper attributes the error to a script that incorrectly copied values; one is nevertheless tempted to wonder whether the rule for lowering thresholds was applied to the 8D/3 boundary, which would be an incorrect application because that boundary is serving to bound a threshold from below." (That paper also hypothesizes that the table was compressed to 6 columns, a hypothesis that my examination of the die disproves.) 

  27. The Intel white paper describes the underlying cause of the bug: "After the quantized P-D plot (lookup table) was numerically generated as in Figure 4-1, a script was written to download the entries into a hardware PLA (Programmable Lookup Array). An error was made in this script that resulted in a few lookup entries (belonging to the positive plane of the P-D plot) being omitted from the PLA." The script explanation is repeated in The Truth Behind the Pentium Bug: "An engineer prepared the lookup table on a computer and wrote a script in C to download it into a PLA (programmable logic array) for inclusion in the Pentium's FPU. Unfortunately, due to an error in the script, five of the 1066 table entries were not downloaded. To compound this mistake, nobody checked the PLA to verify the table was copied correctly." My analysis suggests that the table was copied correctly; the problem was that the table was mathematically wrong. 

  28. It's not hard to find claims of people encountering the Pentium division bug, but these seem to be in the "urban legend" category. Either the problem is described second-hand, or the problem is unrelated to division, or the problem happened much too frequently to be the FDIV bug. It has been said that the game Quake would occasionally show the wrong part of a level due to the FDIV bug, but I find that implausible. The "Intel Inside—Don't Divide" Chipwreck describes how the division bug was blamed for everything from database and application server crashes to gibberish text. 

  29. IBM's analysis of the error rate seems contrived, coming up with reasons to use numbers that are likely to cause errors. In particular, IBM focuses on slightly truncated numbers, either numbers with two decimal digits or hardcoded constants. Note that a slightly truncated number is much more likely to hit a problem because its binary representation will have multiple 1's in a row, a necessity to trigger the bug. Another paper, Risk Analysis of the Pentium Bug, claims a risk of one in every 200 divisions. It depends on "bruised integers", such as 4.999999, which are similarly contrived. I'll also point out that if you start with numbers that are "bruised" or otherwise corrupted, you obviously don't care about floating-point accuracy and shouldn't complain if the Pentium adds slightly more inaccuracy.

    The book "Inside Intel" says that "the IBM analysis was quite wrong" and "IBM's intervention in the Pentium affair was not an example of the company on its finest behavior" (page 364). 

  30. The F00F bug happens when an invalid compare-and-exchange instruction leaves the bus locked. The instruction is supposed to exchange with a memory location, but the invalid instruction specifies a register instead, causing unexpected behavior. This is very similar to some undocumented instructions in the 8086 processor where a register is specified when memory is required; see my article Undocumented 8086 instructions, explained by the microcode.

  31. For details on the Pentium Pro's patchable microcode, see P6 Microcode Can Be Patched. But patchable microcode dates back much earlier. The IBM System/360 mainframes (1964) had microcode that could be updated in the field, either to fix bugs or to implement new features. These systems stored microcode on metalized Mylar sheets that could be replaced as necessary. In that era, semiconductor ROMs didn't exist, so Mylar sheets were also a cost-effective way to implement read-only storage. See TROS: How IBM mainframes stored microcode in transformers.





[#] Sun Jan 05 2025 09:29:02 UTC from rss <>

Subject: Pi in the Pentium: reverse-engineering the constants in its floating-point unit

Intel released the powerful Pentium processor in 1993, establishing a long-running brand of high-performance processors.1 The Pentium includes a floating-point unit that can rapidly compute functions such as sines, cosines, logarithms, and exponentials. But how does the Pentium compute these functions? Earlier Intel chips used binary algorithms called CORDIC, but the Pentium switched to polynomials to approximate these transcendental functions much faster. The polynomials have carefully-optimized coefficients that are stored in a special ROM inside the chip's floating-point unit. Even though the Pentium is a complex chip with 3.1 million transistors, it is possible to see these transistors under a microscope and read out these constants. The first part of this post discusses how the floating point constant ROM is implemented in hardware. The second part explains how the Pentium uses these constants to evaluate sin, log, and other functions.

The photo below shows the Pentium's thumbnail-sized silicon die under a microscope. I've labeled the main functional blocks; the floating-point unit is in the lower right. The constant ROM (highlighted) is at the bottom of the floating-point unit. Above the floating-point unit, the microcode ROM holds micro-instructions, the individual steps for complex instructions. To execute an instruction such as sine, the microcode ROM directs the floating-point unit through dozens of steps to compute the approximation polynomial using constants from the constant ROM.

Die photo of the Intel Pentium processor with the floating point constant ROM highlighted in red. Click this image (or any other) for a larger version.

Finding pi in the constant ROM

In binary, pi is 11.00100100001111110... but what does this mean? To interpret this, the value 11 to the left of the binary point is simply 3 in binary. (The "binary point" is the same as a decimal point, except for binary.) The digits to the right of the binary point have the values 1/2, 1/4, 1/8, and so forth. Thus, the binary value 11.001001000011... corresponds to 3 + 1/8 + 1/64 + 1/2048 + 1/4096 + ..., which matches the decimal value of pi. Since pi is irrational, the bit sequence is infinite and non-repeating; the value in the ROM is truncated to 67 bits and stored as a floating point number.
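
The place values can be checked mechanically. The following is a minimal Python sketch (the function names are made up for illustration) that expands a number into binary digits and lists which fractional place values are set. It uses ordinary double-precision floats, which carry 53 bits and so are only good for the first few dozen bits of pi, far fewer than the ROM holds.

```python
import math

def binary_digits(x, frac_bits):
    """Return x >= 0 as a binary string with frac_bits bits after the point."""
    int_part = int(x)
    frac = x - int_part
    bits = []
    for _ in range(frac_bits):
        frac *= 2              # shift the next fractional bit into the units place
        bit = int(frac)
        bits.append(str(bit))
        frac -= bit
    return f"{int_part:b}." + "".join(bits)

def place_values(x, frac_bits):
    """List the denominators 2**k of the fractional bits that are 1."""
    digits = binary_digits(x, frac_bits).split(".")[1]
    return [2 ** (i + 1) for i, d in enumerate(digits) if d == "1"]

print(binary_digits(math.pi, 17))   # 11.00100100001111110
print(place_values(math.pi, 13))    # [8, 64, 2048, 4096, 8192]
```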

A floating point number is represented by two parts: the exponent and the significand. Floating point numbers include very large numbers such as 6.02×10^23 and very small numbers such as 1.055×10^-34. In decimal, 6.02×10^23 has a significand (or mantissa) of 6.02, multiplied by a power of 10 with an exponent of 23. In binary, a floating point number is represented similarly, with a significand and exponent, except the significand is multiplied by a power of 2 rather than 10. For example, pi is represented in floating point as 1.1001001...×2^1.

The diagram below shows how pi is encoded in the Pentium chip. Zooming in shows the constant ROM. Zooming in on a small part of the ROM shows the rows of transistors that store the constants. The arrows point to the transistors representing the bit sequence 11001001, where a 0 bit is represented by a transistor (vertical white line) and a 1 bit is represented by no transistor (solid dark silicon). Each magnified black rectangle at the bottom has two potential transistors, storing two bits. The key point is that by looking at the pattern of stripes, we can determine the pattern of transistors and thus the value of each constant, pi in this case.

A portion of the floating-point ROM, showing the value of pi. Click this image (or any other) for a larger version.

The bits are spread out because each row of the ROM holds eight interleaved constants to improve the layout. Above the ROM bits, multiplexer circuitry selects the desired constant from the eight in the activated row. In other words, by selecting a row and then one of the eight constants in the row, one of the 304 constants in the ROM is accessed. The ROM stores many more digits of pi than shown here; the diagram shows 8 of the 67 significand bits.

Implementation of the constant ROM

The ROM is built from MOS (metal-oxide-semiconductor) transistors, the transistors used in all modern computers. The diagram below shows the structure of an MOS transistor. An integrated circuit is constructed from a silicon substrate. Regions of the silicon are doped with impurities to create "diffusion" regions with desired electrical properties. The transistor can be viewed as a switch, allowing current to flow between two diffusion regions called the source and drain. The transistor is controlled by the gate, made of a special type of silicon called polysilicon. Applying voltage to the gate lets current flow between the source and drain, which is otherwise blocked. Most computers use two types of MOS transistors: NMOS and PMOS. The two types have similar construction but reverse the doping; NMOS uses n-type diffusion regions as shown below, while PMOS uses p-type diffusion regions. Since the two types are complementary (C), circuits built with the two types of transistors are called CMOS.

Structure of a MOSFET in an integrated circuit.

The image below shows how a transistor in the ROM looks under the microscope. The pinkish regions are the doped silicon that forms the transistor's source and drain. The vertical white line is the polysilicon that forms the transistor's gate. For this photo, I removed the chip's three layers of metal, leaving just the underlying silicon and the polysilicon. The circles in the source and drain are tungsten contacts that connect the silicon to the metal layer above.

One transistor in the constant ROM.

The diagram below shows eight bits of storage. Each of the four pink silicon rectangles has two potential transistors. If a polysilicon gate crosses the silicon, a transistor is formed; otherwise there is no transistor. When a select line (horizontal polysilicon) is energized, it will turn on all the transistors in that row. If a transistor is present, the corresponding ROM bit is 0 because the transistor will pull the output line to ground. If a transistor is absent, the ROM bit is 1. Thus, the pattern of transistors determines the data stored in the ROM. The ROM holds 26144 bits (304 words of 86 bits) so it has 26144 potential transistors.

Eight bits of storage in the ROM.

The photo below shows the bottom layer of metal (M1): vertical metal wires that provide the ROM outputs and supply ground to the ROM. (These wires are represented by gray lines in the schematic above.) The polysilicon transistors (or gaps as appropriate) are barely visible between the metal lines. Most of the small circles are tungsten contacts to the silicon or polysilicon; compare with the photo above. Other circles are tungsten vias to the metal layer on top (M2), horizontal wiring that I removed for this photo. The smaller metal "tabs" act as jumpers between the horizontal metal select lines in M2 and the polysilicon select lines. The top metal layer (M3, not visible) has thicker vertical wiring for the chip's primary distribution of power and ground. Thus, the three metal layers alternate between horizontal and vertical wiring, with vias between the layers.

A closeup of the ROM showing the bottom metal layer.

The ROM is implemented as two grids of cells, one to hold exponents and one to hold significands, as shown below. The exponent grid (on the left) has 38 rows and 144 columns of transistors, while the significand grid (on the right) has 38 rows and 544 columns. To make the layout work better, each row holds eight different constants; the bits are interleaved so the ROM holds the first bit of eight constants, then the second bit of eight constants, and so forth. Thus, with 38 rows, the ROM holds 304 constants; each constant has 18 bits in the exponent part and 68 bits in the significand section.

A diagram of the constant ROM and supporting circuitry. Most of the significand ROM has been cut out to make it fit.

The exponent part of each constant consists of 18 bits: a 17-bit exponent and one bit for the sign of the significand and thus the constant. There is no sign bit for the exponent because the exponent is stored with 65535 (0x0ffff) added to it, avoiding negative values. The 68-bit significand entry in the ROM consists of a mysterious flag bit2 followed by the 67-bit significand; the first bit of the significand is the integer part and the remainder is the fractional part.3 The complete contents of the ROM are in the appendix at the bottom of this post.
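
To make the format concrete, here is a small Python sketch that decodes an entry into a number. It follows the description above (bias of 0x0ffff, sign bit, 67-bit significand with one integer bit), takes the significand with the flag bit already stripped, as in the appendix table, and uses exact rational arithmetic so no bits are lost. The function name and argument layout are illustrative choices, not anything from the chip.

```python
from fractions import Fraction

EXPONENT_BIAS = 0x0FFFF   # stored exponent = true exponent + 65535

def decode_constant(exp_hex, sign, significand_hex):
    """Decode one ROM entry into an exact Fraction.

    exp_hex:         17-bit biased exponent, as a hex string
    sign:            0 for positive, 1 for negative
    significand_hex: 67-bit significand (flag bit not included);
                     bit 66 is the integer part, bits 65..0 the fraction
    """
    exponent = int(exp_hex, 16) - EXPONENT_BIAS
    significand = Fraction(int(significand_hex, 16), 1 << 66)
    value = significand * Fraction(2) ** exponent
    return -value if sign else value

# Three entries copied from the appendix table.
print(float(decode_constant("10000", 0, "6487ed5110b4611a6")))  # pi
print(float(decode_constant("0fffe", 0, "58b90bfbe8e7bcd5f")))  # ln(2)
print(float(decode_constant("0ffff", 1, "40000000000000000")))  # -1.0
```

Note that 0x40000000000000000 is 2^66, so a significand with only the integer bit set decodes to exactly 1 before the exponent is applied.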

To select a particular constant, the "row select" circuitry between the two sections activates one of the 38 rows. That row provides 144+544 bits to the selection circuitry above the ROM. This circuitry has 86 multiplexers; each multiplexer selects one bit out of the group of 8, selecting the desired constant. The significand bits flow into the floating-point unit datapath circuitry above the ROM. The exponent circuitry, however, is in the upper-left corner of the floating-point unit, a considerable distance from the ROM, so the exponent bits travel through a bus to the exponent circuitry.
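
The interleaving and selection can be modeled in a few lines of Python. This is only a toy sketch: it uses 4-bit words instead of 86-bit ones, and the order of the eight constants within each group of bits is an assumption made for illustration.

```python
CONSTANTS_PER_ROW = 8
ROWS = 38

def interleave(words):
    """Pack eight equal-length bit lists into one ROM row."""
    assert len(words) == CONSTANTS_PER_ROW
    row = []
    for bits in zip(*words):   # bit j of every constant, then bit j+1, ...
        row.extend(bits)
    return row

def select(row, k):
    """Model the multiplexers: pick constant k out of an interleaved row."""
    return row[k::CONSTANTS_PER_ROW]

# Toy example: the words 0..7, each written as 4 bits, high bit first.
words = [[(w >> b) & 1 for b in (3, 2, 1, 0)] for w in range(8)]
row = interleave(words)

print(len(row))                        # 32 bits in the toy row
print(select(row, 5))                  # [0, 1, 0, 1]
print(ROWS * CONSTANTS_PER_ROW)        # 304 constants
print((144 + 544) // CONSTANTS_PER_ROW)  # 86 bits per constant
```

Each multiplexer in this model corresponds to one stride-8 position in the row, which is why 688 columns yield 86 output bits.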

The row select circuitry consists of gates to decode the row number, along with high-current drivers to energize the selected row in the ROM. The photo below shows a closeup of two row driver circuits, next to some ROM cells. At the left, PMOS and NMOS transistors implement a gate to select the row. Next, larger NMOS and PMOS transistors form part of the driver. The large square structures are bipolar NPN transistors; the Pentium is unusual because it uses both bipolar transistors and CMOS, a technique called BiCMOS.4 Each driver occupies as much height as four rows of the ROM, so there are four drivers arranged horizontally; only one is visible in the photo.

ROM drivers implemented with BiCMOS.

Structure of the floating-point unit

The floating-point unit is structured with data flowing vertically through horizontal functional units, as shown below. The functional units—adders, shifters, registers, and comparators—are arranged in rows. This collection of functional units with data flowing through them is called the datapath.5

The datapath of the floating-point unit. The ROM is at the bottom.

Each functional unit is constructed from cells, one per bit, with the high-order bit on the left and the low-order bit on the right. Each cell has the same width—38.5 µm—so the functional units can be connected like Lego blocks snapping together, minimizing the wiring. The height of a functional unit varies as needed, depending on the complexity of the circuit. Functional units typically have 69 bits, but some are wider, so the edges of the datapath circuitry are ragged.

This cell-based construction explains why the ROM has eight constants per row. A ROM bit requires a single transistor, which is much narrower than, say, an adder. Thus, putting one bit in each 38.5 µm cell would waste most of the space. Compacting the ROM bits into a narrow block would also be inefficient, requiring diagonal wiring to connect each ROM bit to the corresponding datapath bit. By putting eight bits for eight different constants into each cell, the width of a ROM cell matches the rest of the datapath and the alignment of bits is preserved. Thus, the layout of the ROM in silicon is dense, efficient, and matches the width of the rest of the floating-point unit.

Polynomial approximation: don't use a Taylor series

Now I'll move from the hardware to the constants. If you look at the constant ROM contents in the appendix, you may notice that many constants are close to reciprocals or reciprocal factorials, but don't quite match. For instance, one constant is 0.1111111089, which is close to 1/9, but visibly wrong. Another constant is almost 1/13! (factorial) but wrong by 0.1%. What's going on?

The Pentium uses polynomials to approximate transcendental functions (sine, cosine, tangent, arctangent, and base-2 powers and logarithms). Intel's earlier floating-point units, from the 8087 to the 486, used an algorithm called CORDIC that generated results a bit at a time. However, the Pentium takes advantage of its fast multiplier and larger ROM and uses polynomials instead, computing results two to three times faster than the 486 algorithm.

You may recall from calculus that a Taylor series polynomial approximates a function near a point (typically 0). For example, the equation below gives the Taylor series for sine.

sin(x) = x - x^3/3! + x^5/5! - x^7/7! + x^9/9! - ...

Using the five terms shown above generates a function that looks indistinguishable from sine in the graph below. However, it turns out that this approximation has too much error to be useful.

Plot of the sine function and the Taylor series approximation.

The problem is that a Taylor series is very accurate near 0, but the error soars near the edges of the argument range, as shown in the graph on the left below. When implementing a function, we want the function to be accurate everywhere, not just close to 0, so the Taylor series isn't good enough.

The absolute error for a Taylor-series approximation to sine (5 terms), over two different argument ranges.

One improvement is called range reduction: shrinking the argument to a smaller range so you're in the accurate flat part.6 The graph on the right looks at the Taylor series over the smaller range [-1/32, 1/32]. This decreases the error dramatically, by about 22 orders of magnitude (note the scale change). However, the error still shoots up at the edges of the range in exactly the same way. No matter how much you reduce the range, there is almost no error in the middle, but the edges have a lot of error.7
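
The shape of this error curve can be reproduced with a short Python sketch. The errors over [-1/32, 1/32] are far below double precision, so the sketch uses exact fractions and, as a stand-in for the true sine, a much longer Taylor series (20 terms). The function names are illustrative.

```python
from fractions import Fraction
from math import factorial

def sine_taylor(x, terms):
    """Sum the first `terms` terms of x - x^3/3! + x^5/5! - ..."""
    return sum(Fraction((-1) ** k, factorial(2 * k + 1)) * x ** (2 * k + 1)
               for k in range(terms))

def taylor_error(x, terms=5, reference_terms=20):
    """Error of the truncated series, measured against a much longer one."""
    return abs(sine_taylor(x, reference_terms) - sine_taylor(x, terms))

edge = Fraction(1, 32)
for x in (edge / 4, edge / 2, edge):
    print(f"x = {float(x):.6f}  error = {float(taylor_error(x)):.3e}")
```

The first omitted term is x^11/11!, so each doubling of x multiplies the error by roughly 2^11 = 2048. That steep growth is why the error is negligible in the middle of any range and concentrated at its edges.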

How can we get rid of the error near the edges? The trick is to tweak the coefficients of the Taylor series in a special way that will increase the error in the middle, but decrease the error at the edges by much more. Since we want to minimize the maximum error across the range (called minimax), this tradeoff is beneficial. Specifically, the coefficients can be optimized by a process called the Remez algorithm.8 As shown below, changing the coefficients by less than 1% dramatically improves the accuracy. The optimized function (blue) has much lower error over the full range, so it is a much better approximation than the Taylor series (orange).
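
The tradeoff can be seen in a toy example. The Python sketch below is not the Remez algorithm and has nothing to do with the Pentium's actual polynomials: it approximates sine on [0, 1] with the two-term polynomial x + c*x^3 and simply brute-force searches for the value of c that minimizes the maximum error on a grid, starting from the Taylor value c = -1/6.

```python
import math

def max_error(c, samples=200):
    """Largest |sin(x) - (x + c*x^3)| on a grid over [0, 1]."""
    return max(abs(math.sin(x) - (x + c * x ** 3))
               for x in (i / samples for i in range(samples + 1)))

taylor_c = -1 / 6

# Brute-force search near the Taylor value; a crude stand-in for Remez.
candidates = [taylor_c + i * 1e-5 for i in range(2000)]
best_c = min(candidates, key=max_error)

print(f"Taylor c = {taylor_c:.5f}  max error = {max_error(taylor_c):.2e}")
print(f"tuned  c = {best_c:.5f}  max error = {max_error(best_c):.2e}")

# The tuned polynomial is worse near 0 but much better at the edge.
for x in (0.25, 1.0):
    taylor_err = abs(math.sin(x) - (x + taylor_c * x ** 3))
    tuned_err = abs(math.sin(x) - (x + best_c * x ** 3))
    print(f"x = {x}: Taylor {taylor_err:.2e}, tuned {tuned_err:.2e}")
```

With only one adjustable coefficient the change is a few percent rather than the sub-1% tweaks described above, but the pattern is the same: a slightly "wrong" coefficient gives a larger error in the middle and a much smaller worst-case error.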

Comparison of the absolute error from the Taylor series and a Remez-optimized polynomial, both with maximum term x^9. This Remez polynomial is not one from the Pentium.

To summarize, a Taylor series is useful in calculus, but shouldn't be used to approximate a function. You get a much better approximation by modifying the coefficients very slightly with the Remez algorithm. This explains why the coefficients in the ROM almost, but not quite, match a Taylor series.

Arctan

I'll now look at the Pentium's constants for different transcendental functions. The constant ROM contains coefficients for two arctan polynomials, one for single precision and one for double precision. These polynomials almost match the Taylor series, but have been modified for accuracy. The ROM also holds the values for arctan(1/32) through arctan(32/32); the range reduction process uses these constants with a trig identity to reduce the argument range to [-1/64, 1/64].9 You can see the arctan constants in the Appendix.
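
One way such a range reduction can work is with the identity arctan(x) - arctan(c) = arctan((x - c)/(1 + x*c)). The Python sketch below is a guess at the general scheme, not the Pentium's microcode: it handles only 0 <= x <= 1, computes the table with math.atan as a stand-in for the ROM constants (and adds a k = 0 entry for convenience), and uses plain Taylor coefficients where the Pentium uses optimized ones.

```python
import math

# Stand-in for the ROM table: arctan(k/32) for k = 0..32.
ATAN_TABLE = [math.atan(k / 32) for k in range(33)]

def reduce_argument(x):
    """Return (k, t) with arctan(x) = arctan(k/32) + arctan(t), |t| <= 1/64."""
    k = round(x * 32)              # nearest table point c = k/32
    c = k / 32
    t = (x - c) / (1 + x * c)      # from the arctan difference identity
    return k, t

def atan_small(t):
    """Short odd polynomial for small |t|, using plain Taylor coefficients."""
    t2 = t * t
    return t * (1 + t2 * (-1 / 3 + t2 * (1 / 5 + t2 * (-1 / 7))))

def atan_reduced(x):
    """arctan(x) for 0 <= x <= 1 via table lookup plus a small correction."""
    k, t = reduce_argument(x)
    return ATAN_TABLE[k] + atan_small(t)

for x in (0.1, 0.59375, 0.9):
    print(x, atan_reduced(x), math.atan(x))
```

Because |t| never exceeds 1/64, the first omitted term t^9/9 is below 10^-17, so even this four-term polynomial is accurate to double precision.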

The graph below shows the error for the Pentium's arctan polynomial (blue) versus the Taylor series of the same length (orange). The Pentium's polynomial is superior due to the Remez optimization. Although the Taylor series polynomial is much flatter in the middle, the error soars near the boundary. The Pentium's polynomial wiggles more but it maintains a low error across the whole range. The error in the Pentium polynomial blows up outside this range, but that doesn't matter.

Comparison of the Pentium's double-precision arctan polynomial to the Taylor series.

Trig functions

Sine and cosine each have two polynomial implementations, one with 4 terms in the ROM and one with 6 terms in the ROM. (Note that coefficients of 1 are not stored in the ROM.) The constant table also holds 16 constants such as sin(36/64) and cos(18/64) that are used for argument range reduction.10 The Pentium computes tangent by dividing the sine by the cosine. I'm not showing a graph because the Pentium's error came out worse than the Taylor series, so either I have an error in a coefficient or I'm doing something wrong.

Exponential

The Pentium has an instruction to compute a power of two.11 There are two sets of polynomial coefficients for exponential, one with 6 terms in the ROM and one with 11 terms in the ROM. Curiously, the polynomials in the ROM compute e^x, not 2^x. Thus, the Pentium must scale the argument by ln(2), a constant that is in the ROM. The error graph below shows the advantage of the Pentium's polynomial over the Taylor series polynomial.

The Pentium's 6-term exponential polynomial, compared with the Taylor series.

The polynomial handles the narrow argument range [-1/128, 1/128]. Observe that when computing a power of 2 in binary, exponentiating the integer part of the argument is trivial, since it becomes the result's exponent. Thus, the function only needs to handle the range [1, 2]. For range reduction, the constant ROM holds 64 values of the form 2^(n/128)-1. To reduce the range from [1, 2] to [-1/128, 1/128], the closest n/128 is subtracted from the argument and then the result is multiplied by the corresponding constant in the ROM. The constants are spaced irregularly, presumably for accuracy; some are in steps of 4/128 and others are in steps of 2/128.
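
The steps above can be sketched in Python as follows. This is one reading of the scheme, with several simplifying assumptions: the table is evenly spaced in steps of 2/128 rather than irregular, the table entries are computed on the fly instead of read from a ROM, and the polynomial uses plain Taylor coefficients through y^7/7! rather than the Pentium's optimized ones.

```python
import math

LN2 = math.log(2)

def exp_small(y):
    """e^y for tiny |y|, using plain Taylor coefficients through y^7/7!."""
    result = 1.0
    for n in range(7, 0, -1):          # Horner form of 1 + y + y^2/2! + ...
        result = 1.0 + result * y / n
    return result

def pow2(x):
    """2^x via: integer part -> exponent, table lookup, short polynomial."""
    k = math.floor(x)                  # integer part becomes the binary exponent
    f = x - k                          # 0 <= f < 1
    n = 2 * round(f * 64)              # nearest even n, so |f - n/128| <= 1/128
    r = f - n / 128
    table_entry = 2 ** (n / 128) - 1   # stand-in for the ROM constant 2^(n/128)-1
    p = exp_small(r * LN2)             # 2^r = e^(r ln 2)
    return math.ldexp(p + table_entry * p, k)   # p * 2^(n/128) * 2^k

for x in (-3.7, 0.3, 10.25):
    print(x, pow2(x), 2 ** x)
```

Storing 2^(n/128)-1 rather than 2^(n/128) fits this form naturally: the product p * 2^(n/128) becomes p + (2^(n/128)-1) * p.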

Logarithm

The Pentium can compute base-2 logarithms. The constant ROM has 9 coefficients, presumably for the logarithm polynomial (or polynomials), but I can't form a useful polynomial out of them.12 Unlike the other polynomials, this polynomial doesn't resemble the corresponding Taylor series. The ROM also has 64 constants for range reduction: log2(1+n/64) for odd n from 1 to 63. The unusual feature of these constants is that each constant is split into two pieces to increase the bits of accuracy: the top part has 40 bits of accuracy and the bottom part has 67 bits of accuracy, providing a 107-bit constant in total. The extra bits are required because logarithms are hard to compute accurately.
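
The benefit of splitting a constant into two pieces can be demonstrated with exact arithmetic. The Python sketch below computes log2(1 + 1/64) to about 60 digits with the decimal module, then splits it into a 40-bit top piece and a 67-bit bottom piece. The rounding rule (round to nearest) and the helper names are assumptions for illustration; how the Pentium actually generated its two pieces is not known from the ROM alone.

```python
from decimal import Decimal, getcontext
from fractions import Fraction

getcontext().prec = 60                 # about 199 bits, ample for a 107-bit demo

def round_to_bits(x, bits):
    """Round a Fraction to `bits` significant bits (round to nearest)."""
    if x == 0:
        return x
    # Find the exponent e with 2^e <= |x| < 2^(e+1).
    e = abs(x).numerator.bit_length() - abs(x).denominator.bit_length()
    if Fraction(2) ** e > abs(x):
        e -= 1
    scale = Fraction(2) ** (bits - 1 - e)
    return round(x * scale) / scale

def log2_reference(n):
    """log2(1 + n/64) to about 60 decimal digits, as a Fraction."""
    value = (Decimal(1) + Decimal(n) / Decimal(64)).ln() / Decimal(2).ln()
    return Fraction(value)

value = log2_reference(1)
hi = round_to_bits(value, 40)          # top piece
lo = round_to_bits(value - hi, 67)     # bottom piece holds the leftover

print(float(abs(value - hi) / value))         # relative error, top piece only
print(float(abs(value - (hi + lo)) / value))  # relative error, both pieces
```

The top piece alone is good to about one part in 2^40, while the sum of the two pieces is good to about one part in 2^107.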

Other constants

The x87 floating-point instruction set provides direct access to a handful of constants—0, 1, pi, log2(10), log2(e), log10(2), and loge(2)—so these constants are stored in the ROM. (These logs are useful for changing the base for logs and exponentials.) The ROM holds other constants for internal use by the floating-point unit such as -1, 2, 7/8, 9/8, pi/2, pi/4, and 2log2(e). The ROM also holds bitmasks for extracting part of a word, for instance accessing 4-bit BCD digits in a word. Although I can interpret most of the values, there are a few mysteries such as a mask with the inscrutable value 0x5c3bd5191b525a249. The ROM has 34 unused entries at the end; these entries hold words that include the descriptive hex value 0xbad.

How I examined the ROM

To analyze the Pentium, I removed the metal and oxide layers with various chemicals (sulfuric acid, phosphoric acid, Whink). (I later discovered that simply sanding the die works surprisingly well.) Next, I took many photos of the ROM with a microscope. The feature size of this Pentium is 800 nm, just slightly larger than the wavelength of visible light (380-700 nm). Thus, the die can be examined under an optical microscope, but it is getting close to the limits. To determine the ROM contents, I tediously went through the ROM images, examining each of the 26144 bits and marking each transistor. After figuring out the ROM format, I wrote programs to combine simple functions in many different combinations to determine the mathematical expression such as arctan(19/32) or log2(10). Because the polynomial constants are optimized and my ROM data has bit errors, my program needed checks for inexact matches, both numerically and bitwise. Finally, I had to determine how the constants would be used in algorithms.
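
As a rough idea of what such a matching program might look like, here is a minimal Python sketch. It is not the program used for this analysis: the candidate families, the function names, and the tolerances are all illustrative assumptions, and it only does numeric matching, not bitwise matching.

```python
import math

def candidate_expressions():
    """Yield (label, value) pairs for a few simple families of expressions."""
    for n in range(1, 33):
        yield f"arctan({n}/32)", math.atan(n / 32)
    for n in range(-128, 129):
        if n:
            yield f"2^({n}/128)-1", 2 ** (n / 128) - 1
    for n in range(3, 14):
        yield f"1/{n}", 1 / n
        yield f"1/{n}!", 1 / math.factorial(n)

def identify(value, rel_tol=1e-9):
    """Return the labels whose value matches within a relative tolerance."""
    return [label for label, v in candidate_expressions()
            if math.isclose(value, v, rel_tol=rel_tol)]

print(identify(0.5358112380))                # ['arctan(19/32)']
print(identify(0.1111111089))                # [] (no exact match)
print(identify(0.1111111089, rel_tol=1e-7))  # ['1/9']
```

A tight tolerance picks out exact constants such as the arctan table entries, while a looser tolerance is needed to recognize optimized polynomial coefficients that only approximately equal a reciprocal.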

Conclusions

By examining the Pentium's floating-point ROM under a microscope, it is possible to extract the 304 constants stored in the ROM. I was able to determine the meaning of most of these constants and deduce some of the floating-point algorithms used by the Pentium. These constants illustrate how polynomials can efficiently compute transcendental functions. Although Taylor series polynomials are well known, they are surprisingly inaccurate and should be avoided. Minor changes to the coefficients through the Remez algorithm, however, yield much better polynomials.

In a previous article, I examined the floating-point constants stored in the 8087 coprocessor. The Pentium has 304 constants, compared to just 42 in the 8087, supporting more efficient algorithms. Moreover, the 8087 was an external floating-point unit, while the Pentium's floating-point unit is part of the processor. The changes between the 8087 (1980, 65,000 transistors) and the Pentium (1993, 3.1 million transistors) are due to the exponential improvements in transistor count, as described by Moore's Law.

I plan to write more about the Pentium so follow me on Bluesky (@righto.com) or RSS for updates. (I'm no longer on Twitter.) I've also written about the Pentium division bug and the Pentium Navajo rug. Thanks to CuriousMarc for microscope help.

Appendix: The constant ROM

The table below lists the 304 constants in the Pentium's floating-point ROM. The first four columns show the values stored in the ROM: the exponent, the sign bit, the flag bit, and the significand. To avoid negative exponents, exponents are stored with the constant 0x0ffff added. For example, the value 0x0fffe represents an exponent of -1, while 0x10000 represents an exponent of 1. The constant's approximate decimal value is in the "value" column.

Special-purpose values are colored. Specifically, "normal" numbers are in black. Constants with an exponent of all 0's are in blue, constants with an exponent of all 1's are in red, constants with an unusually large or small exponent are in green; these appear to be bitmasks rather than numbers. Unused entries are in gray. Inexact constants (due to Remez optimization) are represented with the approximation symbol "≈".

This information is from my reverse engineering, so there will be a few errors.

# exp S F significand value meaning
0 00000 0 0 07878787878787878 BCD mask by 4's
1 00000 0 0 007f807f807f807f8 BCD mask by 8's
2 00000 0 0 00007fff80007fff8 BCD mask by 16's
3 00000 0 0 000000007fffffff8 BCD mask by 32's
4 00000 0 0 78000000000000000 4-bit mask
5 00000 0 0 18000000000000000 2-bit mask
6 00000 0 0 27000000000000000 ?
7 00000 0 0 363c0000000000000 ?
8 00000 0 0 3e8287c0000000000 ?
9 00000 0 0 470de4df820000000 ?
10 00000 0 0 5c3bd5191b525a249 ?
11 00000 0 0 00000000000000007 3-bit mask
12 1ffff 1 1 7ffffffffffffffff all 1's
13 00000 0 0 0000007ffffffffff 43-bit mask
14 00000 0 0 00000000000003fff 14-bit mask
15 00000 0 0 00000000000000000 all 0's
16 0ffff 0 0 40000000000000000  1 1
17 10000 0 0 6a4d3c25e68dc57f2  3.3219280949 log2(10)
18 0ffff 0 0 5c551d94ae0bf85de  1.4426950409 log2(e)
19 10000 0 0 6487ed5110b4611a6  3.1415926536 pi
20 0ffff 0 0 6487ed5110b4611a6  1.5707963268 pi/2
21 0fffe 0 0 6487ed5110b4611a6  0.7853981634 pi/4
22 0fffd 0 0 4d104d427de7fbcc5  0.3010299957 log10(2)
23 0fffe 0 0 58b90bfbe8e7bcd5f  0.6931471806 ln(2)
24 1ffff 0 0 40000000000000000 ?
25 0bfc0 0 0 40000000000000000 ?
26 1ffff 1 0 60000000000000000 ?
27 0ffff 1 0 40000000000000000 -1 -1
28 10000 0 0 40000000000000000  2 2
29 00000 0 0 00000000000000001 low bit
30 00000 0 0 00000000000000000 all 0's
31 00001 0 0 00000000000000000 single exponent bit
32 0fffe 0 0 58b90bfbe8e7bcd5e  0.6931471806 ln(2)
33 0fffe 0 0 40000000000000000  0.5 1/2! (exp Taylor series)
34 0fffc 0 0 5555555555555584f  0.1666666667 ≈1/3!
35 0fffa 0 0 555555555397fffd4  0.0416666667 ≈1/4!
36 0fff8 0 0 444444444250ced0c  0.0083333333 ≈1/5!
37 0fff5 0 0 5b05c3dd3901cea50  0.0013888934 ≈1/6!
38 0fff2 0 0 6806988938f4f2318  0.0001984134 ≈1/7!
39 0fffe 0 0 40000000000000000  0.5 1/2! (exp Taylor series)
40 0fffc 0 0 5555555555555558e  0.1666666667 ≈1/3!
41 0fffa 0 0 5555555555555558b  0.0416666667 ≈1/4!
42 0fff8 0 0 444444444443db621  0.0083333333 ≈1/5!
43 0fff5 0 0 5b05b05b05afd42f4  0.0013888889 ≈1/6!
44 0fff2 0 0 68068068163b44194  0.0001984127 ≈1/7!
45 0ffef 0 0 6806806815d1b6d8a  0.0000248016 ≈1/8!
46 0ffec 0 0 5c778d8e0384c73ab  2.755731e-06 ≈1/9!
47 0ffe9 0 0 49f93e0ef41d6086b  2.755731e-07 ≈1/10!
48 0ffe5 0 0 6ba8b65b40f9c0ce8  2.506632e-08 ≈1/11!
49 0ffe2 0 0 47c5b695d0d1289a8  2.088849e-09 ≈1/12!
50 0fffd 0 0 6dfb23c651a2ef221  0.4296133384 2^(66/128)-1
51 0fffd 0 0 75feb564267c8bf6f  0.4609177942 2^(70/128)-1
52 0fffd 0 0 7e2f336cf4e62105d  0.4929077283 2^(74/128)-1
53 0fffe 0 0 4346ccda249764072  0.5255981507 2^(78/128)-1
54 0fffe 0 0 478d74c8abb9b15cc  0.5590044002 2^(82/128)-1
55 0fffe 0 0 4bec14fef2727c5cf  0.5931421513 2^(86/128)-1
56 0fffe 0 0 506333daef2b2594d  0.6280274219 2^(90/128)-1
57 0fffe 0 0 54f35aabcfedfa1f6  0.6636765803 2^(94/128)-1
58 0fffe 0 0 599d15c278afd7b60  0.7001063537 2^(98/128)-1
59 0fffe 0 0 5e60f4825e0e9123e  0.7373338353 2^(102/128)-1
60 0fffe 0 0 633f8972be8a5a511  0.7753764925 2^(106/128)-1
61 0fffe 0 0 68396a503c4bdc688  0.8142521755 2^(110/128)-1
62 0fffe 0 0 6d4f301ed9942b846  0.8539791251 2^(114/128)-1
63 0fffe 0 0 7281773c59ffb139f  0.8945759816 2^(118/128)-1
64 0fffe 0 0 77d0df730ad13bb90  0.9360617935 2^(122/128)-1
65 0fffe 0 0 7d3e0c0cf486c1748  0.9784560264 2^(126/128)-1
66 0fffc 0 0 642e1f899b0626a74  0.1956643920 2^(33/128)-1
67 0fffc 0 0 6ad8abf253fe1928c  0.2086843236 2^(35/128)-1
68 0fffc 0 0 7195cda0bb0cb0b54  0.2218460330 2^(37/128)-1
69 0fffc 0 0 7865b862751c90800  0.2351510639 2^(39/128)-1
70 0fffc 0 0 7f48a09590037417f  0.2486009772 2^(41/128)-1
71 0fffd 0 0 431f5d950a896dc70  0.2621973504 2^(43/128)-1
72 0fffd 0 0 46a41ed1d00577251  0.2759417784 2^(45/128)-1
73 0fffd 0 0 4a32af0d7d3de672e  0.2898358734 2^(47/128)-1
74 0fffd 0 0 4dcb299fddd0d63b3  0.3038812652 2^(49/128)-1
75 0fffd 0 0 516daa2cf6641c113  0.3180796013 2^(51/128)-1
76 0fffd 0 0 551a4ca5d920ec52f  0.3324325471 2^(53/128)-1
77 0fffd 0 0 58d12d497c7fd252c  0.3469417862 2^(55/128)-1
78 0fffd 0 0 5c9268a5946b701c5  0.3616090206 2^(57/128)-1
79 0fffd 0 0 605e1b976dc08b077  0.3764359708 2^(59/128)-1
80 0fffd 0 0 6434634ccc31fc770  0.3914243758 2^(61/128)-1
81 0fffd 0 0 68155d44ca973081c  0.4065759938 2^(63/128)-1
82 0fffd 1 0 4cee3bed56eedb76c -0.3005101637 2^(-66/128)-1
83 0fffd 1 0 50c4875296f5bc8b2 -0.3154987885 2^(-70/128)-1
84 0fffd 1 0 5485c64a56c12cc8a -0.3301662380 2^(-74/128)-1
85 0fffd 1 0 58326c4b169aca966 -0.3445193942 2^(-78/128)-1
86 0fffd 1 0 5bcaea51f6197f61f -0.3585649920 2^(-82/128)-1
87 0fffd 1 0 5f4faef0468eb03de -0.3723096215 2^(-86/128)-1
88 0fffd 1 0 62c12658d30048af2 -0.3857597319 2^(-90/128)-1
89 0fffd 1 0 661fba6cdf48059b2 -0.3989216343 2^(-94/128)-1
90 0fffd 1 0 696bd2c8dfe7a5ffb -0.4118015042 2^(-98/128)-1
91 0fffd 1 0 6ca5d4d0ec1916d43 -0.4244053850 2^(-102/128)-1
92 0fffd 1 0 6fce23bceb994e239 -0.4367391907 2^(-106/128)-1
93 0fffd 1 0 72e520a481a4561a5 -0.4488087083 2^(-110/128)-1
94 0fffd 1 0 75eb2a8ab6910265f -0.4606196011 2^(-114/128)-1
95 0fffd 1 0 78e09e696172efefc -0.4721774108 2^(-118/128)-1
96 0fffd 1 0 7bc5d73c5321bfb9e -0.4834875605 2^(-122/128)-1
97 0fffd 1 0 7e9b2e0c43fcf88c8 -0.4945553570 2^(-126/128)-1
98 0fffc 1 0 53c94402c0c863f24 -0.1636449102 2^(-33/128)-1
99 0fffc 1 0 58661eccf4ca790d2 -0.1726541162 2^(-35/128)-1
100 0fffc 1 0 5cf6413b5d2cca73f -0.1815662751 2^(-37/128)-1
101 0fffc 1 0 6179ce61cdcdce7db -0.1903824324 2^(-39/128)-1
102 0fffc 1 0 65f0e8f35f84645cf -0.1991036222 2^(-41/128)-1
103 0fffc 1 0 6a5bb3437adf1164b -0.2077308674 2^(-43/128)-1
104 0fffc 1 0 6eba4f46e003a775a -0.2162651800 2^(-45/128)-1
105 0fffc 1 0 730cde94abb7410d5 -0.2247075612 2^(-47/128)-1
106 0fffc 1 0 775382675996699ad -0.2330590011 2^(-49/128)-1
107 0fffc 1 0 7b8e5b9dc385331ad -0.2413204794 2^(-51/128)-1
108 0fffc 1 0 7fbd8abc1e5ee49f2 -0.2494929652 2^(-53/128)-1
109 0fffd 1 0 41f097f679f66c1db -0.2575774171 2^(-55/128)-1
110 0fffd 1 0 43fcb5810d1604f37 -0.2655747833 2^(-57/128)-1
111 0fffd 1 0 46032dbad3f462152 -0.2734860021 2^(-59/128)-1
112 0fffd 1 0 48041035735be183c -0.2813120013 2^(-61/128)-1
113 0fffd 1 0 49ff6c57a12a08945 -0.2890536989 2^(-63/128)-1
114 0fffd 1 0 555555555555535f0 -0.3333333333 ≈-1/3 (arctan Taylor series)
115 0fffc 0 0 6666666664208b016  0.2 ≈ 1/5
116 0fffc 1 0 492491e0653ac37b8 -0.1428571307 ≈-1/7
117 0fffb 0 0 71b83f4133889b2f0  0.1110544094 ≈ 1/9
118 0fffd 1 0 55555555555555543 -0.3333333333 ≈-1/3 (arctan Taylor series)
119 0fffc 0 0 66666666666616b73  0.2 ≈ 1/5
120 0fffc 1 0 4924924920fca4493 -0.1428571429 ≈-1/7
121 0fffb 0 0 71c71c4be6f662c91  0.1111111089 ≈ 1/9
122 0fffb 1 0 5d16e0bde0b12eee8 -0.0909075848 ≈-1/11
123 0fffb 0 0 4e403be3e3c725aa0  0.0764169081 ≈ 1/13
124 00000 0 0 40000000000000000 single bit mask
125 0fff9 0 0 7ff556eea5d892a14  0.0312398334 arctan(1/32)
126 0fffa 0 0 7fd56edcb3f7a71b6  0.0624188100 arctan(2/32)
127 0fffb 0 0 5fb860980bc43a305  0.0934767812 arctan(3/32)
128 0fffb 0 0 7f56ea6ab0bdb7196  0.1243549945 arctan(4/32)
129 0fffc 0 0 4f5bbba31989b161a  0.1549967419 arctan(5/32)
130 0fffc 0 0 5ee5ed2f396c089a4  0.1853479500 arctan(6/32)
131 0fffc 0 0 6e435d4a498288118  0.2153576997 arctan(7/32)
132 0fffc 0 0 7d6dd7e4b203758ab  0.2449786631 arctan(8/32)
133 0fffd 0 0 462fd68c2fc5e0986  0.2741674511 arctan(9/32)
134 0fffd 0 0 4d89dcdc1faf2f34e  0.3028848684 arctan(10/32)
135 0fffd 0 0 54c2b6654735276d5  0.3310960767 arctan(11/32)
136 0fffd 0 0 5bd86507937bc239c  0.3587706703 arctan(12/32)
137 0fffd 0 0 62c934e5286c95b6d  0.3858826694 arctan(13/32)
138 0fffd 0 0 6993bb0f308ff2db2  0.4124104416 arctan(14/32)
139 0fffd 0 0 7036d3253b27be33e  0.4383365599 arctan(15/32)
140 0fffd 0 0 76b19c1586ed3da2b  0.4636476090 arctan(16/32)
141 0fffd 0 0 7d03742d50505f2e3  0.4883339511 arctan(17/32)
142 0fffe 0 0 4195fa536cc33f152  0.5123894603 arctan(18/32)
143 0fffe 0 0 4495766fef4aa3da8  0.5358112380 arctan(19/32)
144 0fffe 0 0 47802eaf7bfacfcdb  0.5585993153 arctan(20/32)
145 0fffe 0 0 4a563964c238c37b1  0.5807563536 arctan(21/32)
146 0fffe 0 0 4d17c07338deed102  0.6022873461 arctan(22/32)
147 0fffe 0 0 4fc4fee27a5bd0f68  0.6231993299 arctan(23/32)
148 0fffe 0 0 525e3e8c9a7b84921  0.6435011088 arctan(24/32)
149 0fffe 0 0 54e3d5ee24187ae45  0.6632029927 arctan(25/32)
150 0fffe 0 0 5756261c5a6c60401  0.6823165549 arctan(26/32)
151 0fffe 0 0 59b598e48f821b48b  0.7008544079 arctan(27/32)
152 0fffe 0 0 5c029f15e118cf39e  0.7188299996 arctan(28/32)
153 0fffe 0 0 5e3daef574c579407  0.7362574290 arctan(29/32)
154 0fffe 0 0 606742dc562933204  0.7531512810 arctan(30/32)
155 0fffe 0 0 627fd7fd5fc7deaa4  0.7695264804 arctan(31/32)
156 0fffe 0 0 6487ed5110b4611a6  0.7853981634 arctan(32/32)
157 0fffc 1 0 55555555555555555 -0.1666666667 ≈-1/3! (sin Taylor series)
158 0fff8 0 0 44444444444443e35  0.0083333333 ≈ 1/5!
159 0fff2 1 0 6806806806773c774 -0.0001984127 ≈-1/7!
160 0ffec 0 0 5c778e94f50956d70  2.755732e-06 ≈ 1/9!
161 0ffe5 1 0 6b991122efa0532f0 -2.505209e-08 ≈-1/11!
162 0ffde 0 0 58303f02614d5e4d8  1.604139e-10 ≈ 1/13!
163 0fffd 1 0 7fffffffffffffffe -0.5 ≈-1/2! (cos Taylor series)
164 0fffa 0 0 55555555555554277  0.0416666667 ≈ 1/4!
165 0fff5 1 0 5b05b05b05a18a1ba -0.0013888889 ≈-1/6!
166 0ffef 0 0 680680675b559f2cf  0.0000248016 ≈ 1/8!
167 0ffe9 1 0 49f93af61f5349300 -2.755730e-07 ≈-1/10!
168 0ffe2 0 0 47a4f2483514c1af8  2.085124e-09 ≈ 1/12!
169 0fffc 1 0 55555555555555445 -0.1666666667 ≈-1/3! (sin Taylor series)
170 0fff8 0 0 44444444443a3fdb6  0.0083333333 ≈ 1/5!
171 0fff2 1 0 68068060b2044e9ae -0.0001984127 ≈-1/7!
172 0ffec 0 0 5d75716e60f321240  2.785288e-06 ≈ 1/9!
173 0fffd 1 0 7fffffffffffffa28 -0.5 ≈-1/2! (cos Taylor series)
174 0fffa 0 0 555555555539cfae6  0.0416666667 ≈ 1/4!
175 0fff5 1 0 5b05b050f31b2e713 -0.0013888889 ≈-1/6!
176 0ffef 0 0 6803988d56e3bff10  0.0000247989 ≈ 1/8!
177 0fffe 0 0 44434312da70edd92  0.5333026735 sin(36/64)
178 0fffe 0 0 513ace073ce1aac13  0.6346070800 sin(44/64)
179 0fffe 0 0 5cedda037a95df6ee  0.7260086553 sin(52/64)
180 0fffe 0 0 672daa6ef3992b586  0.8060811083 sin(60/64)
181 0fffd 0 0 470df5931ae1d9460  0.2775567516 sin(18/64)
182 0fffd 0 0 5646f27e8bd65cbe4  0.3370200690 sin(22/64)
183 0fffd 0 0 6529afa7d51b12963  0.3951673302 sin(26/64)
184 0fffd 0 0 73a74b8f52947b682  0.4517714715 sin(30/64)
185 0fffe 0 0 6c4741058a93188ef  0.8459244992 cos(36/64)
186 0fffe 0 0 62ec41e9772401864  0.7728350058 cos(44/64)
187 0fffe 0 0 5806149bd58f7d46d  0.6876855622 cos(52/64)
188 0fffe 0 0 4bc044c9908390c72  0.5918050751 cos(60/64)
189 0fffe 0 0 7af8853ddbbe9ffd0  0.9607092430 cos(18/64)
190 0fffe 0 0 7882fd26b35b03d34  0.9414974631 cos(22/64)
191 0fffe 0 0 7594fc1cf900fe89e  0.9186091558 cos(26/64)
192 0fffe 0 0 72316fe3386a10d5a  0.8921336994 cos(30/64)
193 0ffff 0 0 48000000000000000  1.125 9/8
194 0fffe 0 0 70000000000000000  0.875 7/8
195 0ffff 0 0 5c551d94ae0bf85de  1.4426950409 log2(e)
196 10000 0 0 5c551d94ae0bf85de  2.8853900818 2log2(e)
197 0fffb 0 0 7b1c2770e81287c11  0.1202245867 coefficients for log?
198 0fff9 0 0 49ddb14064a5d30bd  0.0180336880
199 0fff6 0 0 698879b87934f12e0  0.0032206148
200 0fffa 0 0 51ff4ffeb20ed1749  0.0400377512
201 0fff6 0 0 5e8cd07eb1827434a  0.0028854387
202 0fff3 0 0 40e54061b26dd6dc2  0.0002475567
203 0ffef 0 0 61008a69627c92fb9  0.0000231271
204 0ffec 0 0 4c41e6ced287a2468  2.272648e-06
205 0ffe8 0 0 7dadd4ea3c3fee620  2.340954e-07
206 0fff9 0 0 5b9e5a170b8000000  0.0223678130 log2(1+1/64) top bits
207 0fffb 0 0 43ace37e8a8000000  0.0660892054 log2(1+3/64) top bits
208 0fffb 0 0 6f210902b68000000  0.1085244568 log2(1+5/64) top bits
209 0fffc 0 0 4caba789e28000000  0.1497471195 log2(1+7/64) top bits
210 0fffc 0 0 6130af40bc0000000  0.1898245589 log2(1+9/64) top bits
211 0fffc 0 0 7527b930c98000000  0.2288186905 log2(1+11/64) top bits
212 0fffd 0 0 444c1f6b4c0000000  0.2667865407 log2(1+13/64) top bits
213 0fffd 0 0 4dc4933a930000000  0.3037807482 log2(1+15/64) top bits
214 0fffd 0 0 570068e7ef8000000  0.3398500029 log2(1+17/64) top bits
215 0fffd 0 0 6002958c588000000  0.3750394313 log2(1+19/64) top bits
216 0fffd 0 0 68cdd829fd8000000  0.4093909361 log2(1+21/64) top bits
217 0fffd 0 0 7164beb4a58000000  0.4429434958 log2(1+23/64) top bits
218 0fffd 0 0 79c9aa879d8000000  0.4757334310 log2(1+25/64) top bits
219 0fffe 0 0 40ff6a2e5e8000000  0.5077946402 log2(1+27/64) top bits
220 0fffe 0 0 450327ea878000000  0.5391588111 log2(1+29/64) top bits
221 0fffe 0 0 48f107509c8000000  0.5698556083 log2(1+31/64) top bits
222 0fffe 0 0 4cc9f1aad28000000  0.5999128422 log2(1+33/64) top bits
223 0fffe 0 0 508ec1fa618000000  0.6293566201 log2(1+35/64) top bits
224 0fffe 0 0 5440461c228000000  0.6582114828 log2(1+37/64) top bits
225 0fffe 0 0 57df3fd0780000000  0.6865005272 log2(1+39/64) top bits
226 0fffe 0 0 5b6c65a9d88000000  0.7142455177 log2(1+41/64) top bits
227 0fffe 0 0 5ee863e4d40000000  0.7414669864 log2(1+43/64) top bits
228 0fffe 0 0 6253dd2c1b8000000  0.7681843248 log2(1+45/64) top bits
229 0fffe 0 0 65af6b4ab30000000  0.7944158664 log2(1+47/64) top bits
230 0fffe 0 0 68fb9fce388000000  0.8201789624 log2(1+49/64) top bits
231 0fffe 0 0 6c39049af30000000  0.8454900509 log2(1+51/64) top bits
232 0fffe 0 0 6f681c731a0000000  0.8703647196 log2(1+53/64) top bits
233 0fffe 0 0 72896372a50000000  0.8948177633 log2(1+55/64) top bits
234 0fffe 0 0 759d4f80cb8000000  0.9188632373 log2(1+57/64) top bits
235 0fffe 0 0 78a450b8380000000  0.9425145053 log2(1+59/64) top bits
236 0fffe 0 0 7b9ed1c6ce8000000  0.9657842847 log2(1+61/64) top bits
237 0fffe 0 0 7e8d3845df0000000  0.9886846868 log2(1+63/64) top bits
238 0ffd0 1 0 6eb3ac8ec0ef73f7b -1.229037e-14 log2(1+1/64) bottom bits
239 0ffcd 1 0 654c308b454666de9 -1.405787e-15 log2(1+3/64) bottom bits
240 0ffd2 0 0 5dd31d962d3728cbd  4.166652e-14 log2(1+5/64) bottom bits
241 0ffd3 0 0 70d0fa8f9603ad3a6  1.002010e-13 log2(1+7/64) bottom bits
242 0ffd1 0 0 765fba4491dcec753  2.628429e-14 log2(1+9/64) bottom bits
243 0ffd2 1 0 690370b4a9afdc5fb -4.663533e-14 log2(1+11/64) bottom bits
244 0ffd4 0 0 5bae584b82d3cad27  1.628582e-13 log2(1+13/64) bottom bits
245 0ffd4 0 0 6f66cc899b64303f7  1.978889e-13 log2(1+15/64) bottom bits
246 0ffd4 1 0 4bc302ffa76fafcba -1.345799e-13 log2(1+17/64) bottom bits
247 0ffd2 1 0 7579aa293ec16410a -5.216949e-14 log2(1+19/64) bottom bits
248 0ffcf 0 0 509d7c40d7979ec5b  4.475041e-15 log2(1+21/64) bottom bits
249 0ffd3 1 0 4a981811ab5110ccf -6.625289e-14 log2(1+23/64) bottom bits
250 0ffd4 1 0 596f9d730f685c776 -1.588702e-13 log2(1+25/64) bottom bits
251 0ffd4 1 0 680cc6bcb9bfa9853 -1.848298e-13 log2(1+27/64) bottom bits
252 0ffd4 0 0 5439e15a52a31604a  1.496156e-13 log2(1+29/64) bottom bits
253 0ffd4 0 0 7c8080ecc61a98814  2.211599e-13 log2(1+31/64) bottom bits
254 0ffd3 1 0 6b26f28dbf40b7bc0 -9.517022e-14 log2(1+33/64) bottom bits
255 0ffd5 0 0 554b383b0e8a55627  3.030245e-13 log2(1+35/64) bottom bits
256 0ffd5 0 0 47c6ef4a49bc59135  2.550034e-13 log2(1+37/64) bottom bits
257 0ffd5 0 0 4d75c658d602e66b0  2.751934e-13 log2(1+39/64) bottom bits
258 0ffd4 1 0 6b626820f81ca95da -1.907530e-13 log2(1+41/64) bottom bits
259 0ffd3 0 0 5c833d56efe4338fe  8.216774e-14 log2(1+43/64) bottom bits
260 0ffd5 0 0 7c5a0375163ec8d56  4.417857e-13 log2(1+45/64) bottom bits
261 0ffd5 1 0 5050809db75675c90 -2.853343e-13 log2(1+47/64) bottom bits
262 0ffd4 1 0 7e12f8672e55de96c -2.239526e-13 log2(1+49/64) bottom bits
263 0ffd5 0 0 435ebd376a70d849b  2.393466e-13 log2(1+51/64) bottom bits
264 0ffd2 1 0 6492ba487dfb264b3 -4.466345e-14 log2(1+53/64) bottom bits
265 0ffd5 1 0 674e5008e379faa7c -3.670163e-13 log2(1+55/64) bottom bits
266 0ffd5 0 0 5077f1f5f0cc82aab  2.858817e-13 log2(1+57/64) bottom bits
267 0ffd2 0 0 5007eeaa99f8ef14d  3.554090e-14 log2(1+59/64) bottom bits
268 0ffd5 0 0 4a83eb6e0f93f7a64  2.647316e-13 log2(1+61/64) bottom bits
269 0ffd3 0 0 466c525173dae9cf5  6.254831e-14 log2(1+63/64) bottom bits
270 0badf 0 1 40badfc0badfc0bad unused
271 0badf 0 1 40badfc0badfc0bad unused
272 0badf 0 1 40badfc0badfc0bad unused
273 0badf 0 1 40badfc0badfc0bad unused
274 0badf 0 1 40badfc0badfc0bad unused
275 0badf 0 1 40badfc0badfc0bad unused
276 0badf 0 1 40badfc0badfc0bad unused
277 0badf 0 1 40badfc0badfc0bad unused
278 0badf 0 1 40badfc0badfc0bad unused
279 0badf 0 1 40badfc0badfc0bad unused
280 0badf 0 1 40badfc0badfc0bad unused
281 0badf 0 1 40badfc0badfc0bad unused
282 0badf 0 1 40badfc0badfc0bad unused
283 0badf 0 1 40badfc0badfc0bad unused
284 0badf 0 1 40badfc0badfc0bad unused
285 0badf 0 1 40badfc0badfc0bad unused
286 0badf 0 1 40badfc0badfc0bad unused
287 0badf 0 1 40badfc0badfc0bad unused
288 0badf 0 1 40badfc0badfc0bad unused
289 0badf 0 1 40badfc0badfc0bad unused
290 0badf 0 1 40badfc0badfc0bad unused
291 0badf 0 1 40badfc0badfc0bad unused
292 0badf 0 1 40badfc0badfc0bad unused
293 0badf 0 1 40badfc0badfc0bad unused
294 0badf 0 1 40badfc0badfc0bad unused
295 0badf 0 1 40badfc0badfc0bad unused
296 0badf 0 1 40badfc0badfc0bad unused
297 0badf 0 1 40badfc0badfc0bad unused
298 0badf 0 1 40badfc0badfc0bad unused
299 0badf 0 1 40badfc0badfc0bad unused
300 0badf 0 1 40badfc0badfc0bad unused
301 0badf 0 1 40badfc0badfc0bad unused
302 0badf 0 1 40badfc0badfc0bad unused
303 0badf 0 1 40badfc0badfc0bad unused

Notes and references

  1. In this blog post, I'm looking at the "P5" version of the original Pentium processor. It can be hard to keep all the Pentiums straight since "Pentium" became a brand name with multiple microarchitectures, lines, and products. The original Pentium (1993) was followed by the Pentium Pro (1995), Pentium II (1997), and so on.

    The original Pentium used the P5 microarchitecture, a superscalar microarchitecture that was advanced but still executed instructions in order like traditional microprocessors. The original Pentium went through several substantial revisions. The first Pentium product was the 80501 (codenamed P5), containing 3.1 million transistors. The power consumption of these chips was disappointing, so Intel improved the chip, producing the 80502, codenamed P54C. The P5 and P54C look almost the same on the die, but the P54C added circuitry for multiprocessing, boosting the transistor count to 3.3 million. The biggest change to the original Pentium was the Pentium MMX, with part number 80503 and codename P55C. The Pentium MMX added 57 vector processing instructions and had 4.5 million transistors. The floating-point unit was rearranged in the MMX, but the constants are probably the same. 

  2. I don't know what the flag bit in the ROM indicates; I'm arbitrarily calling it a flag. My wild guess is that it indicates ROM entries that should be excluded from the checksum when testing the ROM. 

  3. Internally, the significand has one integer bit and the remainder is the fraction, so the binary point (decimal point) is after the first bit. This is not the only way to represent the significand, however. The x87 80-bit floating-point format (double extended-precision) uses the same approach, but the 32-bit (single-precision) and 64-bit (double-precision) formats drop the first bit and use an "implied" one bit. This gives you one more bit of significand "for free" since in normal cases the first significand bit will be 1. 
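As a rough illustration of this format, the rows of the table above can be turned back into ordinary numbers with a few lines of Python. The scaling here is inferred from the listed values rather than taken from Intel documentation: it assumes the hex field holds the significand with 66 bits after the binary point, and that an exponent of 0ffff means 2^0. The name decode_rom_entry is hypothetical.

```python
import math

EXPONENT_BIAS = 0xFFFF   # assumption: inferred from the 9/8 row (exponent 0ffff, value 1.125)
FRACTION_BITS = 66       # assumption: one integer bit, then 66 fraction bits in the hex field

def decode_rom_entry(exponent_hex, sign, significand_hex):
    """Convert one table row (exponent, sign, significand) to a Python float."""
    significand = int(significand_hex, 16) / 2**FRACTION_BITS
    value = math.ldexp(significand, int(exponent_hex, 16) - EXPONENT_BIAS)
    return -value if sign else value

print(decode_rom_entry("0ffff", 0, "48000000000000000"))  # the 9/8 row
print(decode_rom_entry("0fffe", 0, "70000000000000000"))  # the 7/8 row
print(decode_rom_entry("0ffff", 0, "5c551d94ae0bf85de"))  # the log2(e) row
```

A Python float only has 53 bits of significand, so this decoding throws away the extra precision that the ROM stores; it is only meant for checking which constant a row holds.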

  4. An unusual feature of the Pentium is that it uses bipolar NPN transistors along with CMOS circuits, a technology called BiCMOS. By adding a few extra processing steps to the regular CMOS manufacturing process, bipolar transistors could be created. The Pentium uses BiCMOS circuits extensively since they reduced signal delays by up to 35%. Intel also used BiCMOS for the Pentium Pro, Pentium II, Pentium III, and Xeon processors (but not the Pentium MMX). However, as chip voltages dropped, the benefit from bipolar transistors dropped too and BiCMOS was eventually abandoned.

    In the constant ROM, BiCMOS circuits improve the performance of the row selection circuitry. Each row select line is very long and is connected to hundreds of transistors, so the capacitive load is large. Because of the fast and powerful NPN transistor, a BiCMOS driver provides lower delay for higher loads than a regular CMOS driver.

    A typical BiCMOS inverter. From A 3.3V 0.6µm BiCMOS superscalar microprocessor.

    This BiCMOS logic is also called BiNMOS or BinMOS because the output has a bipolar transistor and an NMOS transistor. For more on BiCMOS circuits in the Pentium, see my article Standard cells: Looking at individual gates in the Pentium processor.

  5. The integer processing unit of the Pentium is constructed similarly, with horizontal functional units stacked to form the datapath. Each cell in the integer unit is much wider than a floating-point cell (64 µm vs 38.5 µm). However, the integer unit is just 32 bits wide, compared to 69 (more or less) for the floating-point unit, so the floating-point unit is wider overall. 

  6. I don't like referring to the argument's range since a function's output is the range, while its input is the domain. But the term range reduction is what people use, so I'll go with it. 

  7. There's a reason why the error curve looks similar even if you reduce the range. The error from the Taylor series is approximately the next term in the Taylor series, so in this case the error is roughly -x^11/11! or O(x^11). This shows why range reduction is so powerful: if you reduce the range by a factor of 2, you reduce the error by the enormous factor of 2^11. But this also shows why the error curve keeps its shape: the curve is still x^11, just with different labels on the axes. 
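A quick numerical check of this scaling, as a sketch: the code below truncates the sine Taylor series after the x^9 term and compares the error at x with the error at x/2. The function names and the choice of x = 1 are mine, for illustration only.

```python
import math

def taylor_sin(x, terms=5):
    """Taylor series for sine, truncated after the x^9 term by default."""
    return sum((-1)**k * x**(2*k + 1) / math.factorial(2*k + 1) for k in range(terms))

def taylor_error(x):
    """Difference between the truncated series and the library sine."""
    return taylor_sin(x) - math.sin(x)

x = 1.0
ratio = taylor_error(x) / taylor_error(x / 2)
print(ratio)  # close to 2**11 = 2048, but not exactly, since later terms also contribute
```

The ratio comes out a little below 2048 because the x^13 term and beyond are not quite negligible at x = 1.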

  8. The Pentium coefficients are probably obtained using the Remez algorithm; see Floating-Point Verification. The advantages of the Remez polynomial over the Taylor series are discussed in Better Function Approximations: Taylor vs. Remez. A description of Remez's algorithm is in Elementary Functions: Algorithms and Implementation, which has other relevant information on polynomial approximation and range reduction. For more on polynomial approximations, see Numerically Computing the Exponential Function with Polynomial Approximations and The Eight Useful Polynomial Approximations of Sinf(3).

    The Remez polynomial in the sine graph was generated by lolremez, a useful tool. The specific polynomial is:

    9.9997938808335731e-1 ⋅ x - 1.6662438518867169e-1 ⋅ x^3 + 8.3089850302282266e-3 ⋅ x^5 - 1.9264997445395096e-4 ⋅ x^7 + 2.1478735041839789e-6 ⋅ x^9

    The graph below shows the error for this polynomial. Note that the error oscillates between an upper bound and a lower bound. This is the typical appearance of a Remez polynomial. In contrast, a Taylor series will have almost no error in the middle and shoot up at the edges. This Remez polynomial was optimized for the range [-π,π]; the error explodes outside that range. The key point is that the Remez polynomial distributes the error inside the range. This minimizes the maximum error (minimax).

    Error from a Remez-optimized polynomial for sine.
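To make the comparison concrete, here is a small sketch that evaluates the Remez polynomial quoted above and the Taylor polynomial of the same degree over [-π, π], and reports the worst-case error of each. The sampling density and the helper names are arbitrary choices of mine.

```python
import math

# Coefficients copied from the polynomial above (powers x, x^3, x^5, x^7, x^9).
REMEZ = [9.9997938808335731e-1, -1.6662438518867169e-1, 8.3089850302282266e-3,
         -1.9264997445395096e-4, 2.1478735041839789e-6]
# Taylor coefficients for the same powers: 1, -1/3!, 1/5!, -1/7!, 1/9!.
TAYLOR = [(-1)**k / math.factorial(2*k + 1) for k in range(5)]

def odd_poly(coeffs, x):
    """Evaluate c0*x + c1*x^3 + c2*x^5 + ... with Horner's rule in x^2."""
    x2 = x * x
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * x2 + c
    return acc * x

def max_error(coeffs, lo=-math.pi, hi=math.pi, samples=2001):
    """Largest absolute difference from math.sin over evenly spaced sample points."""
    xs = [lo + (hi - lo) * i / (samples - 1) for i in range(samples)]
    return max(abs(odd_poly(coeffs, x) - math.sin(x)) for x in xs)

print("Remez :", max_error(REMEZ))
print("Taylor:", max_error(TAYLOR))
```

The Taylor polynomial is far more accurate near zero, but its worst case at the ends of the range is much larger than the worst case of the Remez polynomial, which is the minimax trade-off described above.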

  9. I think the arctan argument is range-reduced to the range [-1/64, 1/64]. This can be accomplished with the trig identity arctan(x) = arctan((x-c)/(1+xc)) + arctan(c). The idea is that c is selected to be the value of the form n/32 closest to x. As a result, x-c will be in the desired range and the first arctan can be computed with the polynomial. The other term, arctan(c), is obtained from the lookup table in the ROM. The FPATAN (partial arctangent) instruction takes two arguments, x and y, and returns atan(y/x); this simplifies handling planar coordinates. In this case, the trig identity becomes arctan(y/x) = arctan((y-cx)/(x+cy)) + arctan(c). The division operation can trigger the FDIV bug in some cases; see Computational Aspects of the Pentium Affair.
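The arctan reduction I'm guessing at can be sketched as follows. This is not the Pentium's microcode: the table is filled in with math.atan as a stand-in for the ROM constants, the small-argument polynomial uses plain Taylor coefficients rather than the ROM's values, and the function names are hypothetical.

```python
import math

# Stand-in for the ROM lookup table: arctan(n/32) for n = 0..32.
ATAN_TABLE = [math.atan(n / 32) for n in range(33)]

def atan_small(r):
    """Taylor polynomial r - r^3/3 + r^5/5 - r^7/7 + r^9/9, for small |r| only."""
    r2 = r * r
    return r * (1 - r2 * (1/3 - r2 * (1/5 - r2 * (1/7 - r2 / 9))))

def atan_reduced(x):
    """arctan(x) for 0 <= x <= 1 via arctan(x) = arctan((x-c)/(1+x*c)) + arctan(c)."""
    n = round(x * 32)            # index of the value n/32 nearest to x
    c = n / 32
    r = (x - c) / (1 + x * c)    # reduced argument, at most 1/64 in magnitude
    return atan_small(r) + ATAN_TABLE[n]

for x in (0.1, 0.37, 0.99):
    print(x, atan_reduced(x), math.atan(x))
```

Because the reduced argument is at most 1/64, the first omitted term of the polynomial is far below double precision, which is why so few coefficients are needed.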

  10. The Pentium has several trig instructions: FSIN, FCOS, and FSINCOS return the sine, cosine, or both (which is almost as fast as computing either). FPTAN returns the "partial tangent" consisting of two numbers that must be divided to yield the tangent. (This was due to limitations in the original 8087 coprocessor.) The Pentium returns the tangent as the first number and the constant 1 as the second number, keeping the semantics of FPTAN while being more convenient.

    The range reduction is probably based on the trig identity sin(a+b) = sin(a)cos(b)+cos(a)sin(b). To compute sin(x), select b as the closest constant in the lookup table, n/64, and then generate a=x-b. The value a will be range-reduced, so sin(a) can be computed from the polynomial. The terms sin(b) and cos(b) are available from the lookup table. The desired value sin(x) can then be computed with multiplications and addition by using the trig identity. Cosine can be computed similarly. Note that cos(a+b) = cos(a)cos(b)-sin(a)sin(b); the terms on the right are the same as for sin(a+b), just combined differently. Thus, once the terms on the right have been computed, they can be combined to generate sine, cosine, or both. The Pentium computes the tangent by dividing the sine by the cosine. This can trigger the FDIV division bug; see Computational Aspects of the Pentium Affair.

    Also see Agner Fog's Instruction Timings; the timings for the various operations give clues as to how they are computed. For instance, FPTAN takes longer than FSINCOS because the tangent is generated by dividing the sine by the cosine. 
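The sine and cosine reduction described in this note can be sketched like this. Again, this is an illustration under assumptions: the table uses the eight n/64 points listed in the ROM table but fills them in with math.sin and math.cos, the small-argument polynomials use Taylor coefficients, and the function names are hypothetical.

```python
import math

# Stand-in for the ROM table: the table lists sin and cos at these n/64 points.
TABLE_N = [18, 22, 26, 30, 36, 44, 52, 60]
SIN_TABLE = {n: math.sin(n / 64) for n in TABLE_N}
COS_TABLE = {n: math.cos(n / 64) for n in TABLE_N}

def sin_small(a):
    """a - a^3/3! + a^5/5! - a^7/7!, for small |a| only."""
    a2 = a * a
    return a * (1 - a2 / 6 * (1 - a2 / 20 * (1 - a2 / 42)))

def cos_small(a):
    """1 - a^2/2! + a^4/4! - a^6/6!, for small |a| only."""
    a2 = a * a
    return 1 - a2 / 2 * (1 - a2 / 12 * (1 - a2 / 30))

def sincos_reduced(x):
    """Return (sin x, cos x) for x roughly in [0.25, 1.0] using the angle-sum identities."""
    n = min(TABLE_N, key=lambda k: abs(x - k / 64))   # nearest table point b = n/64
    a = x - n / 64                                     # small residual a = x - b
    sa, ca = sin_small(a), cos_small(a)
    sb, cb = SIN_TABLE[n], COS_TABLE[n]
    return sa * cb + ca * sb, ca * cb - sa * sb

print(sincos_reduced(0.7), (math.sin(0.7), math.cos(0.7)))
```

Note that the same four products give both results, which fits with FSINCOS being almost as fast as FSIN or FCOS alone.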

  11. For exponentials, the F2XM1 instruction computes 2^x-1; subtracting 1 improves accuracy. Specifically, 2^x is close to 1 for the common case when x is close to 0, so subtracting 1 as a separate operation causes you to lose most of the bits of accuracy due to cancellation. On the other hand, if you want 2^x, explicitly adding 1 doesn't harm accuracy. This is an example of how the floating-point instructions are carefully designed to preserve accuracy. For details, see the book The 8087 Primer by the architects of the 8086 processor and the 8087 coprocessor. 
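The cancellation is easy to see in ordinary double-precision arithmetic. The sketch below is an analogy rather than the F2XM1 algorithm: it uses Python's math.expm1, which computes e^t - 1 directly, and compares it with computing 2^x and then subtracting 1. The reference value comes from the first few terms of the exponential series.

```python
import math

def two_x_minus_1_naive(x):
    return 2.0**x - 1.0                     # the subtraction cancels most of the significant bits

def two_x_minus_1(x):
    return math.expm1(x * math.log(2.0))    # 2^x - 1 = e^(x ln 2) - 1, with no separate subtraction

x = 1e-12
t = x * math.log(2.0)
reference = t + t*t/2 + t**3/6              # series for e^t - 1, ample for such a tiny t
for f in (two_x_minus_1_naive, two_x_minus_1):
    print(f.__name__, abs(f(x) - reference) / reference)
```

The naive version loses most of its digits, while the combined operation keeps essentially full precision.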

  12. The Pentium has base-two logarithm instructions FYL2X and FYL2XP1. The FYL2X instruction computes y log2(x) and the FYL2XP1 instruction computes y log2(x+1). The instructions include a multiplication because most logarithm operations will need to multiply to change the base; performing the multiply with internal precision increases the accuracy. The "plus-one" instruction improves accuracy for arguments close to 1, such as interest calculations.

    My hypothesis for range reduction is that the input argument is scaled to fall between 1 and 2. (Taking the log of the exponent part of the argument is trivial since the base-2 log of a base-2 power is simply the exponent.) The argument can then be divided by the largest constant 1+n/64 less than the argument. This will reduce the argument to the range [1, 1+1/32]. The log polynomial can be evaluated on the reduced argument. Finally, the ROM constant for log2(1+n/64) is added to counteract the division. The constant is split into two parts for greater accuracy.

    It took me a long time to figure out the log constants because they were split. The upper-part constants appeared to be pointlessly inaccurate since the bottom 27 bits are zeroed out. The lower-part constants appeared to be minuscule semi-random numbers around ±10^-13. Eventually, I figured out that the trick was to combine the constants.

    I haven't figured out how the coefficients form the logarithm polynomial. The Taylor series for logarithm has coefficients ±1/n, but the coefficients in the ROM are completely different from that. It's not obvious how to make the coefficients into a polynomial since the powers are unknown: does the first term go with x^1, x^2, or something else? And coefficients of 1 aren't stored in the table. It could even be a ratio of polynomials. Moreover, I may have bit errors in the coefficients. In any case, I tried many formulas and couldn't come up with anything reasonable. 
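Two parts of this note can be sketched in code. The first part shows the hypothesized range reduction, with math.log2 standing in for the ROM table and a series of my own choosing standing in for the unknown ROM polynomial. The second part combines the split constants for log2(1+1/64) from the table and compares the result with a high-precision reference; it assumes the same inferred scaling as elsewhere (exponent 0ffff means 2^0, 66 fraction bits). All function names are hypothetical.

```python
import math
from decimal import Decimal, getcontext
from fractions import Fraction

# Part 1: range reduction. Stand-in table of log2(1 + n/64) for odd n.
LOG_TABLE = {n: math.log2(1 + n / 64) for n in range(1, 64, 2)}

def log2_small(r):
    """log2(r) for r in [1, 1+1/32], using ln(r) = 2*(s + s^3/3 + s^5/5 + s^7/7), s = (r-1)/(r+1)."""
    s = (r - 1) / (r + 1)
    s2 = s * s
    return 2 * s * (1 + s2 * (1/3 + s2 * (1/5 + s2 / 7))) / math.log(2)

def log2_reduced(m):
    """log2(m) for 1 <= m < 2: divide by the largest table constant not above m."""
    candidates = [n for n in LOG_TABLE if 1 + n / 64 <= m]
    if not candidates:                       # m is already below 1 + 1/64
        return log2_small(m)
    n = max(candidates)
    return log2_small(m / (1 + n / 64)) + LOG_TABLE[n]

# Part 2: combining the split constants (rows 206 and 238 of the table).
def rom_fraction(exponent_hex, sign, significand_hex):
    """Exact value of a table row under the inferred scaling (an assumption)."""
    value = Fraction(int(significand_hex, 16), 2**66) * Fraction(2) ** (int(exponent_hex, 16) - 0xFFFF)
    return -value if sign else value

def to_decimal(fr):
    return Decimal(fr.numerator) / Decimal(fr.denominator)

getcontext().prec = 50
hi = rom_fraction("0fff9", 0, "5b9e5a170b8000000")   # log2(1+1/64) top bits
lo = rom_fraction("0ffd0", 1, "6eb3ac8ec0ef73f7b")   # log2(1+1/64) bottom bits
reference = (Decimal(65) / Decimal(64)).ln() / Decimal(2).ln()
error_hi_only = abs(to_decimal(hi) - reference)
error_combined = abs(to_decimal(hi + lo) - reference)
print(error_hi_only, error_combined)
```

The top-bits constant on its own is off by about the size of the bottom-bits constant; adding the two brings the value much closer to the true logarithm, which is the combining trick described above.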





[#] Sun Jan 12 2025 08:56:33 UTC from rss <>

Subject: It's time to abandon the cargo cult metaphor


The cargo cult metaphor is commonly used by programmers. This metaphor was popularized by Richard Feynman's "cargo cult science" talk with a vivid description of South Seas cargo cults. However, this metaphor has three major problems. First, the pop-culture depiction of cargo cults is inaccurate and fictionalized, as I'll show. Second, the metaphor is overused and has contradictory meanings making it a lazy insult. Finally, cargo cults are portrayed as an amusing story of native misunderstanding but the background is much darker: cargo cults are a reaction to decades of oppression of Melanesian islanders and the destruction of their culture. For these reasons, the cargo cult metaphor is best avoided.

Members of the John Frum cargo cult, marching with bamboo "rifles". Photo adapted from The Open Encyclopedia of Anthropology, (CC BY-NC 4.0).

In this post, I'll describe some cargo cults from 1919 to the present. These cargo cults are completely different from the description of cargo cults you usually find on the internet, which I'll call the "pop-culture cargo cult." Cargo cults are extremely diverse, to the extent that anthropologists disagree on the cause, definition, or even if the term has value. I'll show that many of the popular views of cargo cults come from a 1962 "shockumentary" called Mondo Cane. Moreover, most online photos of cargo cults are fake.

Feynman and Cargo Cult Science

The cargo cult metaphor in science started with Professor Richard Feynman's well-known 1974 commencement address at Caltech.1 This speech, titled "Cargo Cult Science", was expanded into a chapter in his best-selling 1985 book "Surely You're Joking, Mr. Feynman". He said:

In the South Seas there is a cargo cult of people. During the war they saw airplanes land with lots of good materials, and they want the same thing to happen now. So they’ve arranged to make things like runways, to put fires along the sides of the runways, to make a wooden hut for a man to sit in, with two wooden pieces on his head like headphones and bars of bamboo sticking out like antennas—he’s the controller—and they wait for the airplanes to land. They’re doing everything right. The form is perfect. It looks exactly the way it looked before. But it doesn’t work. No airplanes land. So I call these things cargo cult science, because they follow all the apparent precepts and forms of scientific investigation, but they’re missing something essential, because the planes don’t land.

Richard Feynman giving the 1974 commencement address at Caltech. Photo from Wikimedia Commons.

But the standard anthropological definition of "cargo cult" is entirely different: 2

Cargo cults are strange religious movements in the South Pacific that appeared during the last few decades. In these movements, a prophet announces the imminence of the end of the world in a cataclysm which will destroy everything. Then the ancestors will return, or God, or some other liberating power, will appear, bringing all the goods the people desire, and ushering in a reign of eternal bliss.

An anthropology encyclopedia gives a similar definition:

A southwest Pacific example of messianic or millenarian movements once common throughout the colonial world, the modal cargo cult was an agitation or organised social movement of Melanesian villagers in pursuit of ‘cargo’ by means of renewed or invented ritual action that they hoped would induce ancestral spirits or other powerful beings to provide. Typically, an inspired prophet with messages from those spirits persuaded a community that social harmony and engagement in improvised ritual (dancing, marching, flag-raising) or revived cultural traditions would, for believers, bring them cargo.

As you can see, the pop-culture explanation of a cargo cult and the anthropological definition are completely different, apart from the presence of "cargo" of some sort. Have anthropologists buried cargo cults under layers of theory? Are they even discussing the same thing? My conclusion, after researching many primary sources, is that the anthropological description accurately describes the wide variety of cargo cults. The pop-culture cargo cult description, however, takes features of some cargo cults (the occasional runway) and combines this with movie scenes to yield an inaccurate and fictionalized description. It may be hard to believe that the description of cargo cults that you see on the internet is mostly wrong, but in the remainder of this article, I will explain this in detail.

Background on Melanesia

Cargo cults occur in a specific region of the South Pacific called Melanesia. I'll give a brief (oversimplified) description of Melanesia to provide important background. The Pacific Ocean islands are divided into three cultural areas: Polynesia, Micronesia, and Melanesia. Polynesia is the best known, including Hawaii, New Zealand, and Samoa. Micronesia, in the northwest, consists of thousands of small islands, of which Guam is the largest; the name "Micronesia" is Greek for "small island". Melanesia, the relevant area for this article, is a group of islands between Micronesia and Australia, including Fiji, Vanuatu, Solomon Islands, and New Guinea. (New Guinea is the world's second-largest island; confusingly, the country of Papua New Guinea occupies the eastern half of the island, while the western half is part of Indonesia.)

Major cultural areas of Oceania. Image by https://commons.wikimedia.org/wiki/File:Pacific_Culture_Areas.jpg.

The inhabitants of Melanesia typically lived in small villages of under 200 people, isolated by mountainous geography. They had a simple, subsistence economy, living off cultivated root vegetables, pigs, and hunting. People tended their own garden, without specialization into particular tasks. The people of Melanesia are dark-skinned, which will be important ("Melanesia" and "melanin" have the same root). Technologically, the Melanesians used stone, wood, and shell tools, without knowledge of metallurgy or even weaving. The Melanesian cultures were generally violent3 with ever-present tribal warfare and cannibalism.4

Due to the geographic separation of tribes, Papua New Guinea became the most linguistically diverse country in the world, with over 800 distinct languages. Pidgin English was often the only way for tribes to communicate, and is now one of the official languages of Papua New Guinea. This language, called Tok Pisin (i.e. "talk pidgin"), is now the most common language in Papua New Guinea, spoken by over two-thirds of the population.5

For the Melanesians, religion was a matter of ritual, rather than a moral framework. It is said that "to the Melanesian, a religion is above all a technology: it is the knowledge of how to bring the community into the correct relation, by rites and spells, with the divinities and spirit-beings and cosmic forces that can make or mar man's this-worldly wealth and well-being." This is important since, as will be seen, the Melanesians expected that the correct ritual would result in the arrival of cargo. Catholic and Protestant missionaries converted the inhabitants to Christianity, largely wiping out traditional religious practices and customs; Melanesia is now over 95% Christian. Christianity played a large role in cargo cults, as will be shown below.

European explorers first reached Melanesia in the 1500s, followed by colonization.6 By the end of the 1800s, control of the island of New Guinea was divided among Germany, Britain, and the Netherlands. Britain passed responsibility to Australia in 1906 and Australia gained the German part of New Guinea in World War I. As for the islands of Vanuatu, the British and French colonized them (under the name New Hebrides) in the 18th century.

The influx of Europeans was highly harmful to the Melanesians. "Native society was severely disrupted by war, by catastrophic epidemics of European diseases, by the introduction of alcohol, by the devastation of generations of warfare, and by the depredations of the labour recruiters."8 People were kidnapped and forced to work as laborers in other countries, a practice called blackbirding. Prime agricultural land was taken by planters to raise crops such as coconuts for export, with natives coerced into working for the planters.9 Up until 1919, employers were free to flog the natives for disobedience; afterward, flogging was technically forbidden but still took place. Colonial administrators jailed natives who stepped out of line.7

Cargo cults before World War II

While the pop-culture description of cargo cults explains them as a reaction to World War II, cargo cults started years earlier. One anthropologist stated, "Cargo cults long preceded [World War II], continued to occur during the war, and have continued to the present."

The first writings about cargo cult behavior date back to 1919, when it was called the "Vailala Madness":10

The natives were saying that the spirits of their ancestors had appeared to several in the villages and told them that all flour, rice, tobacco, and other trade belonged to the New Guinea people, and that the white man had no right whatever to these goods; in a short time all the white men were to be driven away, and then everything would be in the hands of the natives; a large ship was also shortly to appear bringing back the spirits of their departed relatives with quantities of cargo, and all the villages were to make ready to receive them.

The 1926 book In Unknown New Guinea also describes the Vailala Madness:11

[The leader proclaimed] that the ancestors were coming back in the persons of the white people in the country and that all the things introduced by the white people and the ships that brought them belonged really to their ancestors and themselves. [He claimed that] he himself was King George and his friend was the Governor. Christ had given him this authority and he was in communication with Christ through a hole near his village.

The Melanesians blamed the Europeans for the failure of cargo to arrive. In the 1930s, one story was that because the natives had converted to Christianity, God was sending the ancestors with cargo that was loaded on ships. However, the Europeans were going through the cargo holds and replacing the names on the crates so the cargo was fraudulently delivered to the Europeans instead of the rightful natives.

The Mambu Movement occurred in 1937. Mambu, the movement's prophet, claimed that "the Whites had deceived the natives. The ancestors lived inside a volcano on Manum Island, where they worked hard making goods for their descendants: loin-cloths, socks, metal axes, bush-knives, flashlights, mirrors, red dye, etc., even plank-houses, but the scoundrelly Whites took the cargoes. Now this was to stop. The ancestors themselves would bring the goods in a large ship." To stop this movement, the Government arrested Mambu, exiled him, and imprisoned him for six months in 1938.

To summarize, these early cargo cults believed that ships would bring cargo that rightfully belonged to the natives but had been stolen by the whites. The return of the cargo would be accompanied by the spirits of the ancestors. Moreover, Christianity often played a large role. A significant racial component was present, with natives driving out the whites or becoming white themselves.

Cargo cults in World War II and beyond

World War II caused tremendous social and economic upheavals in Melanesia. Much of Melanesia was occupied by Japan near the beginning of the war and the Japanese treated the inhabitants harshly. The American entry into the war led to heavy conflict in the area such as the arduous New Guinea campaign (1942-1945) and the Solomon Islands campaign. As the Americans and Japanese battled for control of the islands, the inhabitants were caught in the middle. Papua and New Guinea suffered over 15,000 civilian deaths, a shockingly high number for such a small region.12


The photo shows a long line of F4F Wildcats at Henderson Field, Guadalcanal, Solomon Islands, April 14, 1943.
Solomon Islands was home to several cargo cults, both before and after World War II (see map).
Source: US Navy photo 80-G-41099.

The impact of the Japanese occupation on cargo cults is usually ignored. One example from 1942 is a cargo belief that the Japanese soldiers were spirits of the dead, who were being sent by Jesus to liberate the people from European rule. The Japanese would bring the cargo by airplane since the Europeans were blocking the delivery of cargo by ship. This would be accompanied by storms and earthquakes, and the natives' skin would change from black to white. The natives were to build storehouses for the cargo and fill the storehouses with food for the ancestors. The leader of this movement, named Tagarab, explained that he had an iron rod that gave him messages about the future. Eventually, the Japanese shot Tagarab, bringing an end to this cargo cult.13

The largest and most enduring cargo cult is the John Frum movement, which started on the island of Tanna around 1941 and continues to the present. According to one story, a mythical person known as John Frum, master of the airplanes, would reveal himself and drive off the whites. He would provide houses, clothes, and food for the people of Tanna. The island of Tanna would flatten as the mountains filled up the valleys and everyone would have perfect health. In other areas, the followers of John Frum believed they "would receive a great quantity of goods, brought by a white steamer which would come from America." Families abandoned the Christian villages and moved to primitive shelters in the interior. They wildly spent much of their money and threw the rest into the sea. The government arrested and deported the leaders, but that failed to stop the movement. The identity of John Frum is unclear; he is sometimes said to be a white American while in other cases natives have claimed to be John Frum.14

The cargo cult of Kainantu17 arose around 1945 when a "spirit wind" caused people in the area to shiver and shake. Villages built large "cargo houses" and put stones, wood, and insect-marked leaves inside, representing European goods, rifles, and paper letters respectively. They killed pigs and anointed the objects, the house, and themselves with blood. The cargo house was to receive the visiting European spirit of the dead who would fill the house with goods. This cargo cult continued for about 5 years, diminishing as people became disillusioned by the failure of the goods to arrive.

The name "Cargo Cult" was first used in print in 1945, just after the end of World War II.15 The article blamed the problems on the teachings of missionaries, with the problems "accentuated a hundredfold" by World War II.

Stemming directly from religious teaching of equality, and its resulting sense of injustice, is what is generally known as “Vailala Madness,” or “Cargo Cult.” In all cases the "Madness" takes the same form: A native, infected with the disorder, states that he has been visited by a relative long dead, who stated that a great number of ships loaded with "cargo" had been sent by the ancestor of the native for the benefit of the natives of a particular village or area. But the white man, being very cunning, knows how to intercept these ships and takes the "cargo" for his own use... Livestock has been destroyed, and gardens neglected in the expectation of the magic cargo arriving. The natives infected by the "Madness" sank into indolence and apathy regarding common hygiene.

In a 1946 episode, agents of the Australian government found a group of New Guinea highlanders who believed that the arrival of the whites signaled that the end of the world was at hand. The highlanders butchered all their pigs in the expectation that "Great Pigs" would appear from the sky in three days. At this time, the residents would exchange their black skin for white skin. They created mock radio antennas of bamboo and rope to receive news of the millennium.16

The New York Times described Cargo Cults in 1948 as "the belief that a convoy of cargo ships is on its way, laden with the fruits of the modern world, to outfit the leaf huts of the natives." The occupants of the British Solomon Islands were building warehouses along the beaches to hold these goods. Natives marched into a US Army camp, presented $3000 in US money, and asked the Army to drive out the British.

A 1951 paper described cargo cults: "The insistence that a 'cargo' of European goods is to be sent by the ancestors or deceased spirits; this may or may not be part of a general reaction against Europeans, with an overtly expressed desire to be free from alien domination. Usually the underlying theme is a belief that all trade goods were sent by ancestors or spirits as gifts for their descendants, but have been misappropriated on the way by Europeans."17

In 1959, The New York Times wrote about cargo cults: "Rare Disease and Strange Cult Disturb New Guinea Territory; Fatal Laughing Sickness Is Under Study by Medical Experts—Prophets Stir Delusions of Food Arrivals". The article states that "large native groups had been infected with the idea that they could expect the arrival of spirit ships carrying large supplies of food. In false anticipation of the arrival of the 'cargoes', 5000 to 7000 native have been known to consume their entire food reserve and create a famine." As for "laughing sickness", this is now known to be a prion disease transmitted by eating human brains. In some communities, this disease, also called Kuru, caused 50% of all deaths.

A detailed 1959 article in Scientific American, "Cargo Cults", described many different cargo cults.16 It lists various features of cargo cults, such as the return of the dead, skin color switching from black to white, threats against white rule, and belief in a coming messiah. The article finds a central theme in cargo cults: "The world is about to end in a terrible cataclysm. Thereafter God, the ancestors or some local culture hero will appear and inaugurate a blissful paradise on earth. Death, old age, illness and evil will be unknown. The riches of the white man will accrue to the Melanesians."

In 1960, the celebrated naturalist David Attenborough created a documentary The People of Paradise: Cargo Cult.18 Attenborough travels through the island of Tanna and encounters many artifacts of the John Frum cult, such as symbolic gates and crosses, painted brilliant scarlet and decorated with objects such as a shaving brush, a winged rat, and a small carved airplane. Attenborough interviews a cult leader who claims to have talked with the mythical John Frum, said to be a white American. The leader remains in communication with John Frum through a tall pole said to be a radio mast, and an unseen radio. (The "radio" consisted of an old woman with electrical wire wrapped around her waist, who would speak gibberish in a trance.)

"Symbols of the cargo cult." In the center, a representation of John Frum with "scarlet coat and a white European face" stands behind a brilliantly painted cross. A wooden airplane is on the right, while on the left (outside the photo) a cage contains a winged rat. From Journeys to the Past, which describes Attenborough's visit to the island of Tanna.

In 1963, famed anthropologist Margaret Mead brought cargo cults to the general public, writing Where Americans are Gods: The Strange Story of the Cargo Cults in the mass-market newspaper supplement Family Weekly. In just over a page, this article describes the history of cargo cults before, during, and after World War II.19 One cult sat around a table with vases of colorful flowers on it. Another cult threw away their money. Another cult watched for ships from hilltops, expecting John Frum to bring a fleet of ships bearing cargo from the land of the dead.

One of the strangest cargo cults was a group of 2000 people on New Hanover Island, "collecting money to buy President Johnson of the United States [who] would arrive with other Americans on the liner Queen Mary and helicopters next Tuesday." The islanders raised $2000, expecting American cargo to follow the president. Seeing the name Johnson on outboard motors confirmed their belief that President Johnson was personally sending cargo.20

A 1971 article in Time Magazine22 described how tribesmen brought US Army concrete survey markers down from a mountaintop while reciting the Roman Catholic rosary, dropping the heavy markers outside the Australian government office. They expected that "a fleet of 500 jet transports would disgorge thousands of sympathetic Americans bearing crates of knives, steel axes, rifles, mirrors and other wonders." Time magazine explained the “cargo cult” as "a conviction that if only the dark-skinned people can hit on the magic formula, they can, without working, acquire all the wealth and possessions that seem concentrated in the white world... They believe that everything has a deity who has to be contacted through ritual and who only then will deliver the cargo." Cult leaders tried "to duplicate the white man’s magic. They hacked airstrips in the rain forest, but no planes came. They built structures that look like white men’s banks, but no money materialized."21

National Geographic, in an article Head-hunters in Today's World (1972), mentioned a cargo-cult landing field with a replica of a radio aerial, created by villagers who hoped that it would attract airplanes bearing gifts. It also described a cult leader in South Papua who claimed to obtain airplanes and cans of food from a hole in the ground. If the people believed in him, their skins would turn white and he would lead them to freedom.

These sources and many others23 illustrate that cargo cults do not fit a simple story. Instead, cargo cults are extremely varied, happening across thousands of miles and many decades. The lack of common features between cargo cults leads some anthropologists to reject the idea of cargo cults as a meaningful term.24 In any case, most historical cargo cults have very little in common with the pop-culture description of a cargo cult.

Cargo beliefs were inspired by Christianity

Cargo cult beliefs are closely tied to Christianity, a factor that is ignored in pop-culture descriptions of cargo cults. Beginning in the mid-1800s, Christian missionaries set up churches in New Guinea to convert the inhabitants. As a result, cargo cults incorporated Christian ideas, but in very confusing ways. At first, the natives believed that missionaries had come to reveal the ritual secrets and restore the cargo. By enthusiastically joining the church, singing the hymns, and following the church's rituals, the people would be blessed by God, who would give them the cargo. This belief was common in the 1920s and 1930s, but as the years went on and the people didn't receive the cargo, they theorized that the missionaries had removed the first pages of the Bible to hide the cargo secrets.

A typical belief was that God created Adam and Eve in Paradise, "giving them cargo: tinned meat, steel tools, rice in bags, tobacco in tins, and matches, but not cotton clothing." When Adam and Eve offended God by having sexual intercourse, God threw them out of Paradise and took their cargo. Eventually, God sent the Flood but Noah was saved in a steamship and God gave back the cargo. Noah's son Ham offended God, so God took the cargo away from Ham and sent him to New Guinea, where he became the ancestor of the natives.

Other natives believed that God lived in Heaven, which was in the clouds and reachable by ladder from Sydney, Australia (source). God, along with the ancestors, created cargo in Heaven—"tinned meat, bags of rice, steel tools, cotton cloth, tinned tobacco, and a machine for making electric light"—which would be flown from Sydney and delivered to the natives, who thus needed to clear an airstrip (source).25

Another common belief was that symbolic radios could be used to communicate with Jesus. For instance, a Markham Valley cargo group in 1943 created large radio houses so they could be informed of the imminent Coming of Jesus, at which point the natives would expel the whites (source). The "radio" consisted of bamboo cylinders connected to a rope "aerial" strung between two poles. The houses contained a pole with rungs so the natives could climb to Jesus along with cane "flashlights" to see Jesus.

A tall mast with a flag and cross on top. This was claimed to be a special radio mast that enabled
communication with John Frum. It was decorated with scarlet leaves and flowers.
From Attenborough's Cargo Cult.

Mock radio antennas are also discussed in a 1943 report26 from a wartime patrol that found a bamboo "wireless house", 42 feet in diameter. It had two long poles outside with an "aerial" of rope between them, connected to the "radio" inside, a bamboo cylinder. Villagers explained that the "radio" was to receive messages of the return of Jesus, who would provide weapons for the overthrow of white rule. The villagers constructed ladders outside the house so they could climb up to the Christian God after death. They would shed their skin like a snake, getting a new white skin, and then they would receive the "boats and white men's clothing, goods, etc."

Mondo Cane and the creation of the pop-culture cargo cult

As described above, cargo cults expected the cargo to arrive by ships much more often than airplanes. So why do pop-culture cargo cults have detailed descriptions of runways, airplanes, wooden headphones, and bamboo control towers?27 My hypothesis is that it came from a 1962 movie called Mondo Cane. This film was the first "shockumentary", showing extreme and shocking scenes from around the world. Although the film was highly controversial, it was shown at the Cannes Film Festival and was a box-office success.

The film made extensive use of New Guinea with multiple scandalous segments, such as a group of "love-struck" topless women chasing men,29 a woman breastfeeding a pig, and women in cages being fattened for marriage. The last segment in the movie showed "the cult of the cargo plane": natives forlornly watching planes at the airport, followed by scenes of a bamboo airplane sitting on a mountaintop "runway" along with bamboo control towers. The natives waited all day and then lit torches to illuminate the runway at nightfall. These scenes are very similar to the pop-culture descriptions of cargo cults so I suspect this movie is the source.

A still from the 1962 movie "Mondo Cane", showing a bamboo airplane sitting on a runway, with flaming torches acting as beacons. I have my doubts about its accuracy.

The film claims that all the scenes "are true and taken only from life", but many of the scenes are said to be staged. Since the cargo cult scenes are very different from anthropological reports and much more dramatic, I think they were also staged and exaggerated.28 It is known that the makers of Mondo Cane paid the Melanesian natives generously for the filming (source, source).

Did Feynman get his cargo cult ideas from Mondo Cane? It may seem implausible since the movie was released over a decade earlier. However, the movie became a cult classic, was periodically shown in theaters, and influenced academics.30 In particular, Mondo Cane showed at the famed Cameo theater in downtown Los Angeles on April 3, 1974, two months before Feynman's commencement speech. Mondo Cane seems like the type of offbeat movie that Feynman would see and the theater was just 11 miles from Caltech. While I can't prove that Feynman went to the showing, his description of a cargo cult strongly resembles the movie.31

Fake cargo-cult photos fill the internet

Fakes and hoaxes make researching cargo cults online difficult. There are numerous photos online of cargo cults, but many of these photos are completely made up. For instance, the photo below has illustrated cargo cults for articles such as Cargo Cult, UX personas are useless, A word on cargo cults, The UK Integrated Review and security sector innovation, and Don't be a cargo cult. However, this photo is from a Japanese straw festival and has nothing to do with cargo cults.

An airplane built from straw, one creation at a Japanese straw festival. I've labeled the photo with "Not cargo cult" to ensure it doesn't get reused in cargo cult articles.

Another example is the photo below, supposedly an antenna created by a cargo cult. However, it is actually a replica of the Jodrell Bank radio telescope, built in 2007 by a British farmer from six tons of straw (details). The farmer's replica ended up erroneously illustrating Cargo Cult Politics, The Cargo Cult & Beliefs, The Cargo Cult, Cargo Cults of the South Pacific, and Cargo Cult, among others.32

A British farmer created this replica radio telescope. Photo by Mike Peel, (CC BY-SA 4.0).

Other articles illustrate cargo cults with the aircraft below, suspiciously sleek and well-constructed. However, the photo actually shows a wooden wind tunnel model of the Buran spacecraft, abandoned at a Russian airfield as described in this article. Some uses of the photo are Are you guilty of “cargo cult” thinking without even knowing it? and The Cargo Cult of Wealth.

This is an abandoned Soviet wind tunnel model of the Buran spacecraft. Photo by Aleksandr Markin.

Many cargo cult articles use one of the photos below. I tracked them down to the 1970 movie "Chariots of the Gods" (link), a dubious documentary claiming that aliens have visited Earth throughout history. The segment on cargo cults is similar to Mondo Cane with cultists surrounding a mock plane on a mountaintop, lighting fires along the runway. However, it is clearly faked, probably in Africa: the people don't look like Pacific Islanders and are wearing wigs. One participant wears leopard skin (leopards don't live in the South Pacific). The vegetation is another giveaway: the plants are from Africa, not the South Pacific.33

Two photos of a straw plane from "Chariots of the Gods".

The point is that most of the images that illustrate cargo cults online are fake or wrong. Most internet photos and information about cargo cults have just been copied from page to page. (And now we have AI-generated cargo cult photos.) If a photo doesn't have a clear source (including who, when, and where), don't believe it.

Conclusions

The cargo cult metaphor should be avoided for three reasons. First, the metaphor is essentially meaningless and heavily overused. The influential "Jargon File" defined cargo-cult programming as "A style of (incompetent) programming dominated by ritual inclusion of code or program structures that serve no real purpose."34 Note that the metaphor in cargo-cult programming is the opposite of the metaphor in cargo-cult science: Feynman's cargo-cult science has no chance of working, while cargo-cult programming works but isn't understood. Moreover, both metaphors differ from the cargo-cult metaphor in other contexts, referring to the expectation of receiving valuables without working.35

The popular site Hacker News is an example of how "cargo cult" can be applied to anything: agile programming, artificial intelligence, cleaning your desk, Go, hatred of Perl, key rotation, layoffs, MBA programs, microservices, new drugs, quantum computing, static linking, test-driven development, and updating the copyright year are just a few things that are called "cargo cult".36 At this point, cargo cult is simply a lazy, meaningless attack.

The second problem with "cargo cult" is that the pop-culture description of cargo cults is historically inaccurate. Actual cargo cults are much more complex and include a much wider (and stranger) variety of behaviors. Cargo cults started before World War II and involve ships more often than airplanes. Cargo cults mix aspects of paganism and Christianity, often with apocalyptic ideas of the end of the current era, the overthrow of white rule, and the return of dead ancestors. The pop-culture description discards all this complexity, replacing it with a myth.

Finally, the cargo cult metaphor turns decades of harmful colonialism into a humorous anecdote. Feynman's description of cargo cults strips out the moral complexity: US soldiers show up with their cargo and planes, the indigenous residents amusingly misunderstand the situation, and everyone carries on. However, cargo cults really were a response to decades of colonial mistreatment, exploitation, and cultural destruction. Moreover, cargo cults were often harmful: expecting a bounty of cargo, villagers would throw away their money, kill their pigs, and stop tending their crops, resulting in famine. The pop-culture cargo cult erases the decades of colonial oppression, along with the cultural upheaval and deaths from World War II. Melanesians deserve to be more than the punch line in a cargo cult story.

Thus, it's time to move beyond the cargo cult metaphor.

Notes and references

  1. As an illustration of the popularity of Feynman's "Cargo Cult Science" commencement address, it has been on Hacker News at least 15 times. 

  2. The first cargo cult definition above comes from The Trumpet Shall Sound; A Study of "Cargo" Cults in Melanesia. The second definition is from the Cargo Cult entry in The Open Encyclopedia of Anthropology. Written by Lamont Lindstrom, a professor who studies Melanesia, the entry comprehensively describes the history and variety of cargo cults, as well as current anthropological analysis.

    For an early anthropological theory of cargo cults, see An Empirical Case-Study: The Problem of Cargo Cults in "The Revolution in Anthropology" (Jarvie, 1964). This book categorizes cargo cults as an apocalyptic millenarian religious movement with a central tenet:

    When the millennium comes it will largely consist of the arrival of ships and/or aeroplanes loaded up with cargo; a cargo consisting either of material goods the natives long for (and which are delivered to the whites in this manner), or of the ancestors, or of both.
     

  3. European colonization brought pacification and a reduction in violence. The Cargo Cult: A Melanesian Type-Response to Change describes this pacification and termination of warfare as the Pax Imperii, suggesting that pacification came as a relief to the Melanesians: "They welcomed the cessation of many of the concomitants of warfare: the sneak attack, ambush, raiding, kidnapping of women and children, cannibalism, torture, extreme indignities inflicted on captives, and the continual need to be concerned with defense."

    Warfare among the Enga people of New Guinea is described in From Spears to M-16s: Testing the Imbalance of Power Hypothesis among the Enga. The Enga engaged in tribal warfare for reasons such as "theft of game from traps, quarrels over possessions, or work sharing within the group." The surviving losers were usually driven off the land and forced to settle elsewhere. In the 1930s and 1940s, the Australian administration banned tribal fighting and pacified much of the area. However, after the independence of Papua New Guinea in 1975, warfare increased along with the creation of criminal gangs known as Raskols (rascals). The situation worsened in the late 1980s with the introduction of shotguns and high-powered weapons to warfare. Now, Papua New Guinea has one of the highest crime rates in the world along with one of the lowest police-to-population ratios in the world. 

  4. When you hear tales of cannibalism, some skepticism is warranted. However, cannibalism is proved by the prevalence of kuru, or "laughing sickness", a fatal prion disease (transmissible spongiform encephalopathy) spread by consuming human brains. Also see Headhunters in Today's World, a 1972 National Geographic article that describes the baking of heads and the eating of brains. 

  5. A 1957 dictionary of Pidgin English can be found here. Linguistically, Tok Pisin is a creole, not a pidgin. 

  6. The modern view is that countries such as Great Britain acquired colonies against the will of the colonized, but the situation was more complex in the 19th century. Many Pacific islands desperately wanted to become European colonies, but were turned down for years because the countries were viewed as undesirable burdens.

    For example, Fiji viewed colonization as the solution to the chaos caused by the influx of white settlers in the 1800s. Fijian political leaders attempted to cede the islands to a European power that could end the lawlessness, but were turned down. In 1874, the situation changed when Disraeli was elected British prime minister. His pro-imperial policies, along with the Royal Navy's interest in obtaining a coaling station, concerns about American expansion, and pressure from anti-slavery groups, led to the annexation of Fiji by Britain. The situation in Fiji didn't particularly improve from annexation. (Fiji obtained independence almost a century later, in 1970.)

    As an example of the cost of a colony, Australia was subsidizing Papua New Guinea (with a population of 2.5 million) with over 100 million dollars a year in the early 1970s. (source)

  7. When reading about colonial Melanesia, one notices a constant background of police activity. Even when police patrols were very rare (annual in some parts), they were typically accompanied by arbitrary arrests and imprisonment. The most common cause for arrest was adultery; it may seem strange that the police were so concerned with it, but it turns out that adultery was the most common cause of warfare between tribes, and the authorities were trying to reduce the level of warfare. Cargo cult activity could be punished by six months of imprisonment. Jailing tended to be ineffective in stopping cargo cults, however, as it was viewed as evidence that the Europeans were trying to stop the cult leaders from spreading the cargo secrets that they had uncovered. 

  8. See The Trumpet Shall Sound

  9. The government imposed a head tax, which for the most part could only be paid through employment. A 1924 report states, "The primary object of the head tax was not to collect revenue but to create among the natives a need for money, which would make labour for Europeans desirable and would force the natives to accept employment." 

  10. The Papua Annual Report, 1919-20 includes a report on the "Vailala Madness", starting on page 118. It describes how villages with the "Vailala madness" had "ornamented flag-poles, long tables, and forms or benches, the tables being usually decorated with flowers in bottles of water in imitation of a white man's dining table." Village men would sit motionless with their backs to the tables. Their idleness infuriated the white men, who considered the villagers to be "fit subjects for a lunatic asylum."

  11. The Vailala Madness is also described in The Missionary Review of the World, 1924. The Vailala Madness also involved seizure-like physical aspects, which typically didn't appear in later cargo cult behavior.

    The 1957 book The Trumpet Shall Sound: A Study of "Cargo" Cults in Melanesia is an extensive discussion of cargo cults, as well as earlier activity and movements. Chapter 4 covers the Vailala Madness in detail. 

  12. The battles in the Pacific have been extensively described from the American and Japanese perspectives, but the indigenous residents of these islands are usually left out of the narratives. This review discusses two books that provide the Melanesian perspective.

    I came across the incredible story of Sergeant Major Vouza of the Native Constabulary. While this story is not directly related to cargo cults, I wanted to include it as it illustrates the dedication and suffering of the New Guinea natives during World War II. Vouza volunteered to scout behind enemy lines for the Marines at Guadalcanal but he was captured by the Japanese, tied to a tree, tortured, bayonetted, and left for dead. He chewed through his ropes, made his way through the enemy force, and warned the Marines of an impending enemy attack.

    SgtMaj Vouza, British Solomon Islands Constabulary.
From The Guadalcanal Campaign, 1949.

    Vouza described the event in a letter:

    Letter from SgtMaj Vouza to Hector MacQuarrie, 1984. From The Guadalcanal Campaign.

     

  13. The Japanese occupation and the cargo cult started by Tagareb are described in detail in Road Belong Cargo, pages 98-110. 

  14. See "John Frum Movement in Tanna", Oceania, March 1952. The New York Times described the John Frum movement in detail in a 1970 article: "On a Pacific island, they wait for the G.I. who became a God". A more modern article (2006) on John Frum is In John They Trust in the Smithsonian Magazine.

    As for the identity of John Frum, some claim that his name is short for "John from America". Others claim it is a modification of "John Broom" who would sweep away the whites. These claims lack evidence. 

  15. The quote is from Pacific Islands Monthly, November 1945 (link). The National Library of Australia has an extensive collection of issues of Pacific Islands Monthly online. Searching these magazines for "cargo cult" provides an interesting look at how cargo cults were viewed as they happened. 

  16. Scientific American had a long article titled Cargo Cults in May 1959, written by Peter Worsley, who also wrote the classic book The Trumpet Shall Sound: A Study of 'Cargo' Cults in Melanesia. The article lists the following features of cargo cults:

    • Myth of the return of the dead
    • Revival or modification of paganism
    • Introduction of Christian elements
    • Cargo myth
    • Belief that Negroes will become white men and vice versa
    • Belief in a coming messiah
    • Attempts to restore native political and economic control
    • Threats and violence against white men
    • Union of traditionally separate and unfriendly groups

    Different cargo cults contained different subsets of these features, but no specific feature was common to all of them. The article is reprinted here; the detailed maps show the wide distribution of cargo cults. 

  17. See A Cargo Movement in the Eastern Central Highlands of New Guinea, Oceania, 1952. 

  18. The Attenborough Cargo Cult documentary can be watched on YouTube.

    I'll summarize some highlights with timestamps:
    5:20: A gate, palisade, and a cross all painted brilliant red.
    6:38: A cross decorated with a wooden bird and a shaving brush.
    7:00: A tall pole claimed to be a special radio mast to talk with John Frum.
    8:25: Interview with trader Bob Paul. He describes "troops" marching with wooden guns around the whole island.
    12:00: Preparation and consumption of kava, the intoxicating beverage.
    13:08: Interview with a local about John Frum.
    14:16: John Frum described as a white man and a big fellow.
    16:29: Attenborough asks, "You say John Frum has not come for 19 years. Isn't this a long time for you to wait?" The leader responds, "No, I can wait. It's you waiting for two thousand years for Christ to come and I must wait over 19 years." Attenborough accepts this as a fair point.
    17:23: Another scarlet gate, on the way to the volcano, with a cross, figure, and model airplane.
    22:30: Interview with the leader. There's a discussion of the radio, but Attenborough is not allowed to see it.
    24:21: John Frum is described as a white American.

    The expedition is also described in David Attenborough's 1962 book Quest in Paradise.  

  19. I have to criticize Mead's article for centering Americans as the heroes, almost a parody of American triumphalism. The title sets the article's tone: "Where Americans are Gods..." The article explains, "The Americans were lavish. They gave away Uncle Sam's property with a generosity which appealed mightily... so many kind, generous people, all alike, with such magnificent cargoes! The American servicemen, in turn, enjoyed and indulged the islanders."

    The article views cargo cults as a temporary stage before moving to a prosperous American-style society as islanders realized that "American things could come [...] only by work, education, persistence." A movement leader named Paliau is approvingly quoted: "We would like to have the things Americans have. [...] We think Americans have all these things because they live under law, without endless quarrels. So we must first set up a new society."

    On the other hand, by most reports, the Americans treated the residents of Melanesia much better than the colonial administrators. Americans paid the natives much more (which was viewed as overpaying them by the planters). The Americans treated the natives with much more respect; natives worked with Americans almost as equals. Finally, it appeared to the natives that black soldiers were treated as equals to white soldiers. (Obviously, this wasn't entirely accurate.)

    The Melanesian experience with Americans also strengthened Melanesian demands for independence. Following the war, the reversion to colonial administration produced a lot of discontent in the natives, who realized that their situation could be much better. (See World War II and Melanesian self-determination.) 

  20. The Johnson cult was analyzed in depth by Billings, an anthropologist who wrote about it in Cargo Cult as Theater: Political Performance in the Pacific. See also Australian Daily News, June 12, 1964, and Time Magazine, July 19, 1971. 

  21. In one unusual case, the islanders built an airstrip and airplanes did come. Specifically, the Miyanmin people of New Guinea hacked an airstrip out of the forest in 1966 using hand tools. The airstrip was discovered by a patrol and turned out to be usable, so Baptist missionaries made monthly landings, bringing medicine and goods for a store. The source points out that the only thing preventing this activity from being considered a cargo cult is that, in this case, it was effective. See A Small Footnote to the 'Big Walk', p. 59. 

  22. See "New Guinea: Waiting for That Cargo", Time Magazine, July 19, 1971.  

  23. In this footnote, I'll list some interesting cargo cult stories that didn't fit into the body of the article.

    The 1964 US Bureau of Labor Statistics report on New Guinea describes cargo cults: "A simplified explanation of them is often given namely that contact with Western culture has given the indigene a desire for a better economic standard of living this desire has not been accompanied by the understanding that economic prosperity is achieved by human effort. The term cargo cult derives from the mystical expectation of the imminent arrival by sea or air of the good things of this earth. It is believed sufficient to build warehouses of leaves and prepare air strips to receive these goods. Activity in the food gardens and daily community routine chores is often neglected so that economic distress is engendered."

    Cargo Cult Activity in Tangu (Burridge) is a 1954 anthropological paper discussing stories of three cargo cults in Tangu, a region of New Guinea. The first involved dancing around a man in a trance, which was supposed to result in the appearance of "rice, canned meat, lava-lavas, knives, beads, etc." In the second story, villagers built a shed in a cemetery and then engaged in ritualized sex acts, expecting the shed to be filled with goods. However, the authorities forced the participants to dismantle the shed and throw it into the sea. In the third story, the protagonist is Mambu, who stowed away on a steamship to Australia, where he discovered the secrets of the white man's cargo. On his return, he collected money to help force the Europeans out, until he was jailed. He performed "miracles" by appearing outside jail as well as by producing money out of thin air.

    Reaction to Contact in the Eastern Highlands of New Guinea (Berndt, 1954) has a long story about Berebi, a leader who was promised a rifle, axes, cloth, knives, and valuable cowrie by a white spirit. Berebi convinced his villagers to build storehouses, and they filled the houses with stones that would be replaced by goods. They took part in many pig sacrifices and various rituals, and endured attacks of shivering and paralysis, but they failed to receive any goods and Berebi concluded that the spirit had deceived him. 

  24. Many anthropologists view the idea of cargo cults as controversial. One anthropologist states, "What I want to suggest here is that, similarly, cargo cults do not exist, or at least their symptoms vanish when we start to doubt that we can arbitrarily extract a few features from context and label them an institution." See A Note on Cargo Cults and Cultural Constructions of Change (1988). The 1992 paper The Yali Movement in Retrospect: Rewriting History, Redefining 'Cargo Cult' summarizes the uneasiness that many anthropologists have with the term "cargo cult", viewing it as "tantamount to an invocation of colonial power relationships."

    The book Cargo, Cult, and Culture Critique (2004) states, "Some authors plead quite convincingly for the abolition of the term itself, not only because of its troublesome implications, but also because, in their view, cargo cults do not even exist as an identifiable object of study." One paper states that the phrase is both inaccurate and necessary, proposing that it be written crossed-out (sous rature in Derrida's post-modern language). Another paper states: "Cargo cults defy definition. They are inherently troublesome and problematic," but concludes that the term is useful precisely because of this troublesome nature.

    At first, I considered the idea of abandoning the label "cargo cult" to be absurd, but after reading the anthropological arguments, it makes more sense. In particular, the category "cargo cult" is excessively broad, lumping together unrelated things and forcing them into a Procrustean ideal: John Frum has very little in common with the Vailala Madness, let alone the Johnson Cult. I think that the term "cargo cult" became popular due to its catchy, alliterative name. (Journalists love alliterations such as "Digital Divide" or "Quiet Quitting".) 

  25. It was clear to the natives that the ancestors, and not the Europeans, must have created the cargo because the local Europeans were unable to repair complex mechanical devices locally, but had to ship them off. These ships presumably took the broken devices back to the ancestral spirits to be repaired. Source: The Trumpet Shall Sound, p. 119. 

  26. The report from the 1943 patrol is discussed in Berndt's "A Cargo Movement in the Eastern Central Highlands of New Guinea", Oceania, Mar. 1953 (link), page 227. These radio houses are also discussed in The Trumpet Shall Sound, page 199. 

  27. Wooden airplanes are a staple of the pop-culture cargo cult story, but they are extremely rare in authentic cargo cults. I searched extensively, but could find just a few primary sources that involve airplanes.

    The closest match that I could find is Vanishing Peoples of the Earth, published by National Geographic in 1968, which mentions a New Guinea village that built a "crude wooden airplane", which they thought "offers the key to getting cargo".

    The photo below, from 1950, shows a cargo-house built in the shape of an airplane. (Note how abstract the construction is, compared to the realistic straw airplanes in faked photos.) The photographer mentioned that another cargo house was in the shape of a jeep, while in another village, the villagers gather in a circle at midnight to await the arrival of heavily laden cargo boats.

    The photo is from They Still Believe in Cargo Cult, Pacific Islands Monthly, May 1950.

    David Attenborough's Cargo Cult documentary shows a small wooden airplane, painted scarlet red. This model airplane is very small compared to the mock airplanes described in the pop-culture cargo cult.

    A closeup of the model airplane. From Attenborough's Cargo Cult documentary.

    The photo below shows the scale of the aircraft, directly in front of Attenborough. In the center, a figure of John Frum has a "scarlet coat and a white, European face." On the left, a cage contains a winged rat for some reason.

    David Attenborough visiting a John Frum monument on Tanna, near Sulfur Bay. From Attenborough's Cargo Cult documentary.

     

  28. The photo below shows another scene from the movie Mondo Cane that is very popular online in cargo cult articles. I suspect that the airplane is not authentic but was made for the movie.

    Screenshot from Mondo Cane, showing the cargo cultists posed in front of their airplane.

     

  29. The tale of women pursuing men was described in detail in the 1929 anthropological book The Sexual Life of Savages in North-Western Melanesia, specifically the section "Yausa—Orgiastic Assaults by Women" (pages 231-234). The anthropologist heard stories about these attacks from natives, but didn't observe them firsthand and remained skeptical. He concluded that "The most that can be said with certainty is that the yausa, if it happened at all, happened extremely rarely". Unlike the portrayal in Mondo Cane, these attacks on men were violent and extremely unpleasant (I won't go into details). Thus, it is very likely that this scene in Mondo Cane was staged, based on the stories. 

  30. The movie Mondo Cane directly influenced the pop-culture cargo cult as shown by several books. The book River of Tears: The Rise of the Rio Tinto-Zinc Mining Corporation explains cargo cults and how one tribe built an "aeroplane on a hilltop to attract the white man's aeroplane and its cargo", citing Mondo Cane. Likewise, the book Introducing Social Change states that underdeveloped nations are moving directly from ships to airplanes without building railroads, bizarrely using the cargo cult scene in Mondo Cane as an example. Finally, the religious book Open Letter to God uses the cargo cult in Mondo Cane as an example of the suffering of godless people. 

  31. Another possibility is that Feynman got his cargo cult ideas from the 1974 book Cows, Pigs, Wars and Witches: The Riddle of Culture. It has a chapter "Phantom Cargo", which starts with a description suspiciously similar to the scene in Mondo Cane:

    The scene is a jungle airstrip high in the mountains of New Guinea. Nearby are thatch-roofed hangars, a radio shack, and a beacon tower made of bamboo. On the ground is an airplane made of sticks and leaves. The airstrip is manned twenty-four hours a day by a group of natives wearing nose ornaments and shell armbands. At night they keep a bonfire going to serve as a beacon. They are expecting the arrival of an important flight: cargo planes filled with canned food, clothing, portable radios, wrist watches, and motorcycles. The planes will be piloted by ancestors who have come back to life. Why the delay? A man goes inside the radio shack and gives instructions into the tin-can microphone. The message goes out over an antenna constructed of string and vines: “Do you read me? Roger and out.” From time to time they watch a jet trail crossing the sky; occasionally they hear the sound of distant motors. The ancestors are overhead! They are looking for them. But the whites in the towns below are also sending messages. The ancestors are confused. They land at the wrong airport.
     

  32. Some other uses of the radio telescope photo as a cargo-cult item are Cargo cults, Melanesian cargo cults and the unquenchable thirst of consumerism, Cargo Cult : Correlation vs. Causation, Cargo Cult Agile, Stop looking for silver bullets, and Cargo Cult Investing.

  33. Chariots of the Gods claims to be showing a cargo cult from an isolated island in the South Pacific. However, the large succulent plants in the scene are Euphorbia ingens and tree aloe, which grow in southern Africa, not the South Pacific. The rock formations at the very beginning look a lot like Matobo Hills in Zimbabwe. Note that these "Stone Age" people are astounded by the modern world but ignore the cameraman who is walking among them.

    Many cargo cult articles use photos that can be traced back to this film, such as The Scrum Cargo Cult, Is Your UX Cargo Cult, The Remote South Pacific Island Where They Worship Planes, The Design of Everyday Games, Don’t be Fooled by the Bitcoin Core Cargo Cult, The Dying Art of Design, Retail Apocalypse Not, You Are Not Google, and Cargo Cults. The general theme of these articles is that you shouldn't copy what other people are doing without understanding it, which is somewhat ironic. 

  34. The Jargon File defined "cargo-cult programming" in 1991:

    cargo-cult programming: n. A style of (incompetent) programming dominated by ritual inclusion of code or program structures that serve no real purpose. A cargo-cult programmer will usually explain the extra code as a way of working around some bug encountered in the past, but usually, neither the bug nor the reason the code avoided the bug were ever fully understood.

    The term cargo-cult is a reference to aboriginal religions that grew up in the South Pacific after World War II. The practices of these cults center on building elaborate mockups of airplanes and military style landing strips in the hope of bringing the return of the god-like airplanes that brought such marvelous cargo during the war. Hackish usage probably derives from Richard Feynman's characterization of certain practices as "cargo-cult science" in `Surely You're Joking, Mr. Feynman'.

    This definition of "cargo-cult programming" came from a 1991 Usenet post to alt.folklore.computers, quoting Kent Williams. The definition was added to the much-expanded 1991 Jargon File, which was published as The New Hacker's Dictionary in 1993. 

  35. Overuse of the cargo cult metaphor isn't specific to programming, of course. The book Cargo Cult: Strange Stories of Desire from Melanesia and Beyond describes how "cargo cult" has been applied to everything from advertisements, social welfare policy, and shoplifting to the Mormons, Euro Disney, and the state of New Mexico.

    This book, by Lamont Lindstrom, provides a thorough analysis of writings on cargo cults. It takes a questioning, somewhat trenchant look at these writings, illuminating how their trends developed and where they lack objectivity. I recommend this book to anyone interested in the term "cargo cult" and its history. 

  36. Some more things that have been called "cargo cult" on Hacker News: the American worldview, ChatGPT fiction, copy and pasting code, hiring, HR, priorities, psychiatry, quantitative tests, religion, SSRI medication, the tech industry, Uber, and young-earth creationism.