SESSION IX: LSI SYSTEMS

THAM 9.2: A Numeric Data Processor

Rafi Nave
Intel Israel, Ltd., Haifa, Israel

John Palmer
Intel Corp., Aloha, OR

A SINGLE-CHIP NUMERIC DATA PROCESSOR (NDP)\*, serving as a coprocessor for a family of microprocessors that includes both a 16b and an 8b processor\*\*, will be discussed.

The processor is not a peripheral, but is closely coupled to the microprocessor in its system. It acts to extend the base processor's register resources and instruction set. Both processors interpret the same instruction stream. When one of a particular set of instructions (called ESCAPE) is encountered, the microprocessor calculates the data address (if the instruction refers to memory) and puts it on the memory bus. The NDP latches the address and also reads the datum. After reading its operand from memory, the NDP begins processing the indicated operation (ADD, MULTIPLY, etc.) and releases the bus so that the microprocessor can continue processing non-numeric instructions. This coprocessing interface can increase system throughput over a peripheral interface.
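The split of the shared instruction stream can be illustrated with a minimal sketch. It assumes the ESCAPE encoding of the 8086/8088, in which the first opcode byte has the bit pattern 11011xxx (0xD8 through 0xDF). The function names and the toy list of opcode bytes are hypothetical, and the sketch looks only at first bytes rather than decoding complete instructions.

```python
ESC_MASK = 0xF8   # compare only the top five bits of the first opcode byte
ESC_BITS = 0xD8   # 11011xxx: the ESCAPE opcode range 0xD8..0xDF

def is_escape(opcode: int) -> bool:
    """True if this first opcode byte belongs to the ESCAPE group."""
    return (opcode & ESC_MASK) == ESC_BITS

def dispatch(first_bytes):
    """Partition first-opcode bytes into (cpu_ops, ndp_ops).

    Toy model: both processors watch the same stream, and only ESCAPE
    opcodes start work in the NDP.
    """
    cpu_ops, ndp_ops = [], []
    for op in first_bytes:
        (ndp_ops if is_escape(op) else cpu_ops).append(op)
    return cpu_ops, ndp_ops
```

For example, `dispatch([0x90, 0xD9, 0x8B, 0xDE])` places 0xD9 and 0xDE in the NDP list and leaves the other two bytes with the microprocessor.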

The processor operates on seven data types: 16, 32 and 64b 2's complement integers; 32b (REAL), 64b (LONG REAL) and 80b (TEMPORARY REAL) binary floating-point numbers; and 80b (18 digit) signed packed BCD integers. The operations supported include: add, subtract, multiply, divide, remainder, square root, transcendental functions (logarithmic and trigonometric) and binary-decimal conversion. All of these operations are performed at high speed and in conformance with the proposed IEEE Floating-Point Standard.
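As an illustration of the widest format, the sketch below decodes an 80b Temporary Real bit pattern into an exact rational value. The field layout is an assumption not stated in the text: 1 sign bit, a 15b exponent biased by 16383, and a 64b significand with an explicit integer bit, as documented for the shipped part. The function name is hypothetical, and infinities and NaNs are rejected rather than modelled.

```python
from fractions import Fraction

EXP_BIAS = 16383  # assumed bias of the 15b exponent field

def decode_temp_real(bits: int) -> Fraction:
    """Decode an 80b pattern (held in an int) into an exact Fraction."""
    sign = (bits >> 79) & 1
    exponent = (bits >> 64) & 0x7FFF
    significand = bits & 0xFFFFFFFFFFFFFFFF   # 64b, integer bit is explicit
    if exponent == 0x7FFF:
        raise ValueError("infinity or NaN is not handled in this sketch")
    # An exponent field of 0 (zero or denormal) scales like a field of 1.
    scale = max(exponent, 1) - EXP_BIAS - 63
    value = Fraction(significand) * Fraction(2) ** scale
    return -value if sign else value
```

Under these assumptions, the pattern with exponent field 0x3FFF and significand 0x8000000000000000 decodes to exactly 1.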

This functionality is implemented in HMOS on a single chip larger than 280 mils square containing the equivalent of over 65,000 devices. The internal evaluation stack requires more than 700 bits of RAM and the microcode (including constants) utilizes over 30,000 bits of ROM. This amount of microcode was made possible by a four-state ROM in HMOS.

The processor was designed both to conform to the full proposed IEEE Floating-Point Standard (including Temporary Real precision and other options) and to provide computational performance in the minicomputer range. To achieve this performance goal while maintaining maximum precision, the architecture executes the multiply, divide, square root and remainder functions in a hardware arithmetic module. The hardware module, which is initiated by microcode, is a very fast 68b ALU with sophisticated steering logic on its registers and data lines, yielding maximum utilization of the system. A very wide (68b) data path is used to move operands quickly while supporting the precision and rounding requirements of the standard. The data path is precharged and synchronous access is used. This approach enables easy interfacing of different sources and simplifies appending zeros to the operands. Appending zeros is important because the NDP supports seven data types, but does all computations in the maximum precision (Temporary Real). Performance was also enhanced by a fast shifter (0 to 63 places in one clock), special hardware for counting leading zeros and a hardware module to implement the various rounding modes specified in the proposed standard.
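The role of appended zeros can be seen in a small sketch that widens a 32b REAL to Temporary Real fields. It assumes the usual 32b layout (8b exponent biased by 127, 23 fraction bits with an implicit leading 1) and a 64b significand with a bias of 16383 in the wide format. The function name is hypothetical, and only normal numbers are handled.

```python
import struct

def real_to_temp_real_fields(x: float):
    """Widen a 32b REAL to (sign, biased exponent, 64b significand)."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    sign = bits >> 31
    exp32 = (bits >> 23) & 0xFF
    frac = bits & 0x7FFFFF
    if exp32 in (0, 0xFF):
        raise ValueError("zero, denormal, infinity and NaN are not handled")
    sig24 = (1 << 23) | frac        # restore the implicit leading 1
    sig64 = sig24 << 40             # append 40 zero bits on the right
    exp80 = exp32 - 127 + 16383     # rebias the exponent for the wide format
    return sign, exp80, sig64
```

The widening is exact: no rounding is needed on the way in, so precision is lost only when a result is stored back in a narrower format.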

## Data Support

John Palmer

The NDP supports the seven data types by maintaining an internal evaluation stack of eight Temporary Reals and permitting operands in any of the other six formats to be loaded to or stored from the stack. In addition to loading and storing, the basic operations of add, subtract, multiply, divide and compare can be performed on operands taken either from memory or from the evaluation stack. In addressing memory, all of the addressing modes of the system's microprocessor are supported. Results, except for store operations, are always returned to the stack and, using special hardware, are correctly rounded according to the proposed IEEE standard with four rounding modes and three levels of precision control. The speeds of operations in 5MHz parts range from about $4\mu s$ for compare to $16\mu s$ for multiply and $35\mu s$ for divide and square root. These speeds are all for results of Temporary Real precision (80b).
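The combination of rounding mode and precision control can be sketched on exact rationals. The mode labels (`nearest`, `down`, `up`, `chop`) and the significand widths of 24, 53 and 64 bits are assumptions chosen to match the three floating-point formats, and the function is illustrative only. It does not model the rounding hardware and ignores exponent range.

```python
from fractions import Fraction
import math

# Assumed significand widths for the three precision-control levels
PRECISIONS = {"real": 24, "long_real": 53, "temp_real": 64}

def round_significand(x: Fraction, precision: str, mode: str) -> Fraction:
    """Round x to the given number of significant bits in the given mode."""
    if x == 0:
        return x
    p = PRECISIONS[precision]
    ax = abs(x)
    # Find e with 2**e <= |x| < 2**(e + 1)
    e = ax.numerator.bit_length() - ax.denominator.bit_length()
    if Fraction(2) ** e > ax:
        e -= 1
    ulp = Fraction(2) ** (e - p + 1)   # weight of the last kept bit
    q = x / ulp                        # exact quotient, rounded to an integer
    if mode == "nearest":
        n = round(q)                   # ties go to the even integer
    elif mode == "down":
        n = math.floor(q)              # toward minus infinity
    elif mode == "up":
        n = math.ceil(q)               # toward plus infinity
    elif mode == "chop":
        n = math.trunc(q)              # toward zero
    else:
        raise ValueError(f"unknown rounding mode: {mode}")
    return n * ulp
```

As a check, rounding 1/10 to `long_real` in `nearest` mode reproduces the value a 64b double stores for 0.1.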

The NDP also supports the standard by providing error detection and on-chip recovery procedures. To facilitate recovery from underflow and overflow, the processor provides on-chip gradual underflow, in which underflowed results are gradually denormalized to zero, and two user-selectable modes of infinity arithmetic, since infinity is the fix-up provided for overflow.
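Gradual underflow and the infinity fix-up can be observed on any machine with IEEE-style arithmetic. The sketch below uses Python's 64b `float`, not the NDP's 80b format, so the counts differ from those of Temporary Real. It says nothing about the NDP's two infinity modes, and the names are hypothetical.

```python
import math
import sys

def halvings_until_zero(x: float) -> int:
    """Count how many times x can be halved before it becomes zero."""
    n = 0
    while x != 0.0:
        x /= 2.0
        n += 1
    return n

smallest_normal = sys.float_info.min    # smallest normalized 64b value
below_normal = smallest_normal / 2.0    # denormalized, but still nonzero
overflowed = sys.float_info.max * 2.0   # overflow is fixed up to infinity
```

With gradual underflow, `halvings_until_zero(smallest_normal)` is 53 on a 64b double: 52 steps pass through denormalized values and the last step reaches zero. Without it, the first halving would already flush to zero.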

In addition to the basic operations, the NDP has instructions to support binary-decimal conversion and transcendental function evaluation, including exponentials, logarithms, and trigonometric and inverse trigonometric functions. The transcendental functions are all computed with an error of less than 3 units in the last place of Temporary Real (80b). To achieve high performance at this accuracy, a modified CORDIC algorithm is used, which requires several constant ROMs and a loop counter capable of addressing the ROMs and of establishing the shift count for the fast shifter.
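The interplay of constant table, loop counter and shift count is visible in the textbook rotation-mode CORDIC sketched below. This is the classic algorithm, not the modified version the NDP uses, whose details are not given here. The fixed-point width, iteration count and names are assumptions for illustration.

```python
import math

FRAC_BITS = 40                 # fixed-point fraction width (assumed)
ONE = 1 << FRAC_BITS
N_ITER = 40                    # number of passes through the loop (assumed)

# Stand-in for a constants ROM: atan(2**-i) in fixed point
ATAN_ROM = [round(math.atan(2.0 ** -i) * ONE) for i in range(N_ITER)]
# Gain compensation, so that the final vector has unit length
K = round(ONE * math.prod(1.0 / math.sqrt(1.0 + 4.0 ** -i)
                          for i in range(N_ITER)))

def cordic_sin_cos(angle: float):
    """Return (sin, cos) of angle in radians, for |angle| <= pi/2."""
    x, y, z = K, 0, round(angle * ONE)
    for i in range(N_ITER):
        # i both addresses the table and sets the shift count
        if z >= 0:
            x, y, z = x - (y >> i), y + (x >> i), z - ATAN_ROM[i]
        else:
            x, y, z = x + (y >> i), y - (x >> i), z + ATAN_ROM[i]
    return y / ONE, x / ONE
```

Each pass needs only additions, one table lookup and shifts by i places, which is why a shifter that handles any count in one clock matters for this class of algorithm.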

## Fast Shifter

The fast shifter, capable of shifting right or left 0 to 63 places in one clock, has been useful in almost all of the algorithms. As mentioned, it is indispensable in the CORDIC algorithm, which requires shifts of i places on the i<sup>th</sup> pass through the loop. The shifter also permits fast data formatting for both input and output, and fast normalization and denormalization in add-subtract operations. In many machines the add-subtract time is a function of how far the operands must be denormalized or the result must be normalized. In the NDP, these speeds are constant, independent of the shifting required.
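The difference between data-dependent and constant-time normalization can be sketched as follows. The first function counts leading zeros and shifts once, a software analogue of a leading-zero counter feeding a single-clock shifter. The second shifts one place per step. The 64b significand width and all names are assumptions for illustration.

```python
WIDTH = 64  # assumed significand width in bits

def normalize(significand: int, exponent: int):
    """Left-justify a significand below 2**64 using a single shift."""
    if significand == 0:
        return 0, exponent
    leading_zeros = WIDTH - significand.bit_length()
    return significand << leading_zeros, exponent - leading_zeros

def normalize_bit_serial(significand: int, exponent: int):
    """Reference version: one place per step, so time depends on the data."""
    steps = 0
    while significand and not (significand >> (WIDTH - 1)) & 1:
        significand <<= 1
        exponent -= 1
        steps += 1
    return significand, exponent, steps
```

Both give the same result, but the bit-serial version takes as many steps as there are leading zeros, whereas the single-shift version does the same work regardless of the operand.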

<sup>\*</sup>Intel 8087.

<sup>\*\*</sup>Intel 8086 and 8088.



FIGURE 1-Numeric data processor functional block diagram.



FIGURE 2-Numeric data processor interface to microcomputer system.