Shenyang Ligong University Bachelor's Thesis
Appendix A: English Original Text
A SINGLE-CHIP MULTIPROCESSOR DSP SOLUTION FOR
COMMUNICATION APPLICATIONS
David Regenold Intel Corporation
6505 W. Chandler Blvd., M/SCH11-91
Chandler, Arizona 85226
Abstract
This paper presents an overview of an architecture for a single-chip multiprocessor DSP solution for communications applications. The integrated circuit was designed to handle the protocol and data-pump functions necessary to implement high-speed modem and audio tasks. The chip consists of a 186 microcontroller with two Digital Signal Processing (DSP) coprocessors. It interfaces with a standard 186 bus and has a port for communicating with a custom CODEC, the 80127.
Introduction
Computer connectivity is an ever-expanding and pervasive market. Standard POTS-line MODEMs continue to be pushed to higher frequencies even as the need for handling simultaneous voice emerges. At the same time, wireless capabilities are emerging and the concept of wireless email is becoming a reality with the advent of packet radio networks such as Mobitex, Ardis and Cellular Digital Packet Data (CDPD).
To address these markets, the Communications Products Division of Intel has developed a chip set specifically architected to meet the growing MIPS demands of emerging communications applications. The chip set consists of the 80127 CODEC, which was presented at last year's ASIC conference, and the 80C186CP controller/DSP, which is presented here.
Detailed Architecture
The 80C186CP is a highly integrated microcontroller derived from the 80C186EB and primarily targeted at high-end data communication applications such as high-speed modems (V.34 rates) and audio processing. To a first order it is software compatible with the
80C186EB. However, the 80C186CP enhances the EB with the addition of a high-performance digital-signal-processing coprocessor and a versatile, flexible peripheral set designed to interface effectively with the necessary communication channels, such as analog CODECs and ISA or PCMCIA buses.
Figure 1 shows the block diagram of the 80C186CP architecture. As the diagram indicates, the 80C186CP is built around two internal buses known as the F-bus and the E-bus. The F-bus is the standard internal bus found on all 80C186Ex proliferations and is used for data transfers between the 80C186 core, the external Address/Data bus and the 80C186 peripherals. It is a 20-bit bus which, like the external bus, multiplexes the 16-bit data with 20 bits of address. The majority of the peripherals attached to the F-bus are identical to those found on the 80C186EB. They include the EB's chip-select unit, timer unit, interrupt unit, parallel-port unit, and serial-port unit (the EB had two). The clock power-management unit is similar to that found on the 80C186EB, but it has been substantially modified, as will be discussed later. The EX core and all modules on the F-bus are clocked at 20.736 MHz in the target applications.
New F-bus peripherals not found on the standard 80C186EB include a host-interface unit and an HDLC accelerator. The host-interface unit allows the 80C186CP to be attached to either an ISA or a PCMCIA bus. It contains a 16550 emulator, which makes the 80C186CP appear to the host as merely a 16550 UART. In actuality, the data passing through the 16550 emulator is not serialized as in a real UART, but is either taken from the cores in the 80C186CP or presented to them for data manipulation as required by the communication standard being implemented. When in PCMCIA mode, the unit supports accesses to the Card Information Structure (CIS) memory and contains the other special function registers defined by the PCMCIA standard.
Data being transferred across the host interface must sometimes be HDLC encoded or decoded, depending again on the standard being implemented. For this purpose, an HDLC accelerator is provided to relieve the EX core of the intense bit manipulation that would otherwise be required. Unlike a typical HDLC controller that receives and transmits data serially, this unit receives data from the F-bus in parallel form, since the EX core deals exclusively with parallel data. The HDLC accelerator views the data as a serial bit stream and performs the tasks of CRC generation, zero-bit insertion or deletion, and error detection.
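The zero-bit insertion and deletion step mentioned above can be illustrated with a small software model. This is only a sketch of the general HDLC rule (a 0 is inserted after every run of five consecutive 1s so that payload data never imitates the flag pattern); it is not a description of the accelerator's actual logic. The function names and the LSB-first bit order are assumptions made for the illustration, and CRC generation is left out because the paper does not state which polynomial is used.

```python
def bytes_to_bits(data):
    """Unpack parallel bytes into a bit list, LSB first (assumed bit order)."""
    return [(byte >> i) & 1 for byte in data for i in range(8)]


def stuff_bits(bits):
    """Zero-bit insertion: add a 0 after every run of five consecutive 1s."""
    out = []
    ones = 0
    for b in bits:
        out.append(b)
        if b == 1:
            ones += 1
            if ones == 5:
                out.append(0)  # stuffed bit
                ones = 0
        else:
            ones = 0
    return out


def unstuff_bits(bits):
    """Zero-bit deletion: drop the bit that follows five consecutive 1s.

    The sketch assumes a valid stuffed payload; flag and abort
    detection are not modelled.
    """
    out = []
    ones = 0
    skip_next = False
    for b in bits:
        if skip_next:
            skip_next = False
            ones = 0
            continue
        out.append(b)
        if b == 1:
            ones += 1
            if ones == 5:
                skip_next = True
        else:
            ones = 0
    return out
```

For example, `unstuff_bits(stuff_bits(bytes_to_bits(b"\xff\x7e")))` returns the original bit list, which is the property a receiver relies on.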
Another module on the F-bus not found in other 80C186Ex products is the coprocessor-interface block (or arbiter). This unit is discussed after the DSP coprocessor. Suffice it to say for now that this module plays a pivotal role in the 80C186CP chip and acts
as the central point of communication between the 80C186 side of the controller and the DSP coprocessor. The DSP coprocessor consists primarily of two DSP cores known as the EP+ cores, several memory banks for program and data storage, and some peripherals, such as a CODEC interface and a Viterbi accelerator.
A block diagram of an EP+ is shown in Figure 2. These DSP cores are simple Multiply and Accumulate (MAC) engines capable of executing at a peak rate of one MAC per clock cycle. For the target application, the 80C186CP is clocked at 20.736 MHz, which, with both cores running, results in a combined peak throughput of more than 40 million MACs per second.
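The throughput figure can be checked with a few lines of arithmetic. The sketch below assumes that the quoted figure counts both EP+ cores together, each sustaining one MAC per clock at 20.736 MHz; the constant and function names are illustrative only.

```python
CLOCK_HZ = 20_736_000   # target-application clock from the text
MACS_PER_CYCLE = 1      # peak rate of one EP+ core
NUM_DSP_CORES = 2       # two EP+ cores on the chip


def peak_macs_per_second(cores=NUM_DSP_CORES, clock_hz=CLOCK_HZ):
    """Peak MAC rate if every core issues one MAC on every clock cycle."""
    return cores * MACS_PER_CYCLE * clock_hz


per_core = peak_macs_per_second(cores=1)   # 20,736,000 MACs/s
combined = peak_macs_per_second()          # 41,472,000 MACs/s
```

A single core therefore peaks at about 20.7 million MACs per second, and only the two cores together exceed 40 million.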
The EP+ ALU is fed from two internal 256-word RAMs that contain data and coefficients required by the algorithms. One 16-bit word can be fetched from each SRAM every clock cycle in order to feed the MAC engine. The multiplier produces a 32-bit product that is accumulated in a 32-bit register. Additional hardware is available to assist division.
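As a rough illustration of this datapath, the following sketch models a multiply-accumulate loop over 16-bit data and coefficient words with a 32-bit accumulator. The paper does not say whether the accumulator wraps or saturates on overflow; two's-complement wraparound is assumed here, and the helper names are hypothetical.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value


def mac_dot(data, coeffs):
    """Dot product computed the way a MAC engine would: one product per step.

    Each step models one clock cycle: a 16-bit word is fetched from each
    RAM, multiplied into a 32-bit product and added to the accumulator.
    Overflow wraps around (an assumption, not stated in the paper).
    """
    acc = 0
    for x, c in zip(data, coeffs):
        product = to_signed(x, 16) * to_signed(c, 16)
        acc = to_signed(acc + product, 32)
    return acc
```

With this model, an N-tap FIR output sample costs N cycles at the peak rate, which is why both operand RAMs must deliver a word every clock.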
Access to memory beyond the internal 512 words is achieved through a memory-expansion unit that sits on an 8-word parallel port of the EP+ called the EXT bus. Through this expansion port, the EP+ can access up to 64k words of additional data (although only 3.5k is actually present in this implementation). The additional data may be present on a private bus of the particular EP+ or may reside on the shared E-bus. Data can be accessed through this port at a peak rate of one word per clock cycle and replaces one of the DSP’s internal SRAMs when in use.
The DSPs are supplied with their program code and data through several banks of SRAM. Since they are of a Harvard architecture, they each have their own separate program bus and a separate private data bus. Four 512-word SRAMs are provided for program purposes and four 256-word SRAMs are provided for data purposes. Any of the four program RAMs can be connected to the program bus of either EP+, while any of the four data RAMs can be connected to the private data bus of either EP+.
The E-bus is a single-clock-cycle-transfer bus consisting of 16 address bits and 16 data bits. It may be accessed by either EP+ core or the EX core. The DSPs can transfer data across the bus at a rate of one word per clock cycle. However, the EX core is still limited to one word every four clock cycles due to the limitations of the F-bus. Any of the program and data memory banks can be connected to the E-bus so that any of the three cores can arbitrate for access to them.
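The difference between the two access rates is easy to quantify. The sketch below assumes the E-bus runs at the 20.736 MHz target clock and ignores arbitration delays; the dictionary keys and function names are illustrative.

```python
CLOCK_HZ = 20_736_000                   # assumed E-bus clock (target application)
CYCLES_PER_WORD = {"EP+": 1, "EX": 4}   # peak rates quoted in the text


def ebus_transfer_cycles(master, words):
    """Clock cycles needed to move `words` 16-bit words, ignoring arbitration."""
    return CYCLES_PER_WORD[master] * words


def ebus_transfer_seconds(master, words, clock_hz=CLOCK_HZ):
    """Same transfer expressed in seconds."""
    return ebus_transfer_cycles(master, words) / clock_hz
```

Filling one 512-word program bank would take 512 cycles from an EP+ but 2048 cycles from the EX core.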
Programming of the memory configuration occurs through the IPCB block. This module also acts as a central switching office for interrupts. Interrupt signals from all parts
of the system converge on this unit and are sent to any one of the three processors according to programming of a register within the unit.
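One way to picture this interrupt routing is a register holding a small destination field per interrupt source. The layout below (two bits per source, processor names, class and method names) is entirely hypothetical; the paper only states that a register in the unit selects which of the three processors receives each interrupt.

```python
PROCESSORS = ("EX", "EP+0", "EP+1")   # hypothetical names for the three cores
FIELD_BITS = 2                        # hypothetical: 2-bit destination field per source
FIELD_MASK = (1 << FIELD_BITS) - 1


class InterruptRouter:
    """Toy model of a routing register that steers each source to one core."""

    def __init__(self, num_sources):
        self.num_sources = num_sources
        self.routing_reg = 0          # every source initially routed to "EX"

    def route(self, source, processor):
        """Program the destination field of one interrupt source."""
        if not 0 <= source < self.num_sources:
            raise ValueError("unknown interrupt source")
        shift = source * FIELD_BITS
        self.routing_reg &= ~(FIELD_MASK << shift)
        self.routing_reg |= PROCESSORS.index(processor) << shift

    def target(self, source):
        """Return the core that would receive an interrupt from `source`."""
        if not 0 <= source < self.num_sources:
            raise ValueError("unknown interrupt source")
        shift = source * FIELD_BITS
        return PROCESSORS[(self.routing_reg >> shift) & FIELD_MASK]
```

Re-routing a source at run time is then a single register write, which matches the "switching office" description.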
Attached to one of the DSPs is a Viterbi Accelerator. The Viterbi algorithm is encountered frequently in communications but has little to do with Multiply/Accumulates and thus is not implemented efficiently by the EP+ architecture. The EP+ uses this module by simply writing a series of data pairs to it and then reading the resulting maximum or minimum of the sums of the data pairs and the corresponding index.
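The write-pairs-then-read-result usage can be modelled in software as follows. This is a behavioural sketch only: the class name, method names and tie-breaking rule (the earliest pair wins) are assumptions, and the interpretation of each pair as a path metric plus a branch metric is the usual Viterbi add-compare-select reading rather than something the paper states.

```python
class ViterbiAcceleratorModel:
    """Behavioural model: sum each written pair, track the best sum and its index."""

    def __init__(self, select_max=False):
        self.select_max = select_max
        self.reset()

    def reset(self):
        """Clear the running result before a new series of pairs."""
        self._best = None
        self._best_index = None
        self._count = 0

    def write_pair(self, a, b):
        """Model one write of a data pair (e.g. path metric and branch metric)."""
        total = a + b
        if self._best is None:
            better = True
        elif self.select_max:
            better = total > self._best
        else:
            better = total < self._best
        if better:
            self._best = total
            self._best_index = self._count
        self._count += 1

    def read_result(self):
        """Return (best sum, index of the pair that produced it)."""
        return self._best, self._best_index
```

The EP+ would issue one write per candidate and a single read at the end, instead of spending compare-and-branch instructions on every candidate.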
Data received across a communications medium (either wired or wireless) comes into the 80C186CP from an Analog Front End (the 80127 AFE) via the High-Speed-Synchronous-Serial Port (HSSSP). This unit uses an interface found on many industry-standard DSPs to communicate with the AFE. The HSSSP has access to two more 256-word SRAMs between which it fetches and stores data. This allows the HSSSP to transmit and receive two blocks of 256 words in full-duplex mode without intervention from any of the processors on chip. The more usual case is to have the HSSSP transmit and receive two blocks of 128 words in full-duplex mode from a single SRAM while the EP+ works with the data previously received or soon to be transmitted in the other SRAM.
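The "usual case" described here is a ping-pong (double-buffer) arrangement. The sketch below models only the receive direction: the serial port fills one bank while the previously filled bank is handed to the DSP. The class name and callback are hypothetical, and in hardware the processing runs concurrently with reception rather than as a function call.

```python
BLOCK_WORDS = 128   # receive block size in the usual case described in the text


class PingPongReceiver:
    """Receive-side model of two SRAM banks swapped between port and DSP."""

    def __init__(self, process_block):
        self.banks = [[], []]
        self.port_bank = 0                  # bank currently owned by the serial port
        self.process_block = process_block  # stands in for the EP+ routine

    def receive_word(self, word):
        """Store one 16-bit word; swap banks when a block is complete."""
        bank = self.banks[self.port_bank]
        bank.append(word & 0xFFFF)
        if len(bank) == BLOCK_WORDS:
            full_bank = self.port_bank
            self.port_bank ^= 1                  # the port moves to the other bank
            self.banks[self.port_bank] = []      # which starts out empty
            self.process_block(self.banks[full_bank])
```

Because the port never writes into the bank the DSP is reading, neither side has to wait for the other as long as a block is processed within one block period.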
The last major block on the 80C186CP is the coprocessor-interface module. This block is multifunctional. It allows the EX core to access all of the memory and Special Function Registers residing on the E-bus as a region in the C186's 1-MByte address range. It also acts as the arbitration unit between the three on-chip processors for access to the E-bus, and it contains a status register through which the EX core can monitor activity on the E-bus. But its most important function is that of a fly-by DMA unit to transfer data from the F-bus to the E-bus or vice versa.
The code and data memory requirements of the DSP algorithms running on the two EPs are usually larger than the feasible on-chip memory. Therefore, a fast and low-overhead method of downloading or uploading code and data between the on-chip memories and off-chip program memory is essential. This is accomplished via the fly-by DMA, which resides in the coprocessor-interface unit. Any of the three E-bus masters can utilize the DMA by programming the necessary parameters, such as the starting address, ending address, and word count.
The programming of the fly-by DMA takes advantage of the fact that the firmware running on the DSPs is fairly predictable. That is, the current tasks running on the DSPs should know which software modules are most likely to be needed next. The current task,
through a write to a register, informs the DMA unit that it will be needing another page of code or data long before that code is needed. If the DMA is not busy, it immediately responds with an interrupt to the requesting master. Otherwise it finishes its current transfer before sending the interrupt. The master receiving the interrupt is then locked to the E-bus so that it can program the details of the transfer it wishes to initiate without fear of interference from other bus masters. The DMA unit then proceeds to download or upload the requested code or data. When the transfer is complete, another interrupt is sent to the bus master that requested the transfer to indicate completion. If another bus master notifies the DMA unit of a pending transfer while it is busy, the DMA unit later interrupts that bus master as well to inform it that it can take its turn.
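The request, grant, program and completion sequence described above can be sketched as a small state model. Everything named here (the class, its methods, the log entries, and first-come-first-served ordering of waiting masters) is an assumption for illustration; the paper does not specify how pending requests are ordered.

```python
from collections import deque


class FlyByDmaModel:
    """Toy model of the DMA handshake: request, grant IRQ, program, done IRQ."""

    def __init__(self):
        self.pending = deque()   # masters waiting for their turn (assumed FIFO)
        self.owner = None        # master currently locked to the E-bus
        self.log = []            # interrupts sent, in order

    def request(self, master):
        """A master announces, via a register write, that it will need a transfer."""
        self.pending.append(master)
        self._grant_next()

    def _grant_next(self):
        # Grant only when idle; a busy unit finishes its current transfer first.
        if self.owner is None and self.pending:
            self.owner = self.pending.popleft()
            self.log.append(("grant_irq", self.owner))

    def program_and_run(self, master, source, dest, dest_start):
        """The granted master programs the transfer; the DMA then copies the words."""
        if master != self.owner:
            raise PermissionError("master has not been granted the DMA")
        for offset, word in enumerate(source):
            dest[dest_start + offset] = word
        self.log.append(("done_irq", master))
        self.owner = None
        self._grant_next()       # the next waiting master gets its turn
```

Requesting early costs the task only a register write; the task keeps running until the grant interrupt arrives, which is how the latency of the off-chip fetch is hidden.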
While performing a transfer, the DMA unit normally runs at the lowest priority, allowing the DSPs access to the E-bus for other activities. External bus masters and 80C186 interrupts can also have priority over DMA activities if so programmed.
The 80C186Ex family of products usually operates with a 2x input clock. A Phase Locked Loop (PLL) is provided on the 80C186CP that enables operation from a 1x input clock. This PLL is similar to that used on the 80960 family of products.
When all modules on the 80C186CP are being utilized, the device consumes a maximum of 1.1 Watts at 20.736 MHz and 5 volts. The device was also designed to run entirely at 3.3V or in a split mode with the pad buffers powered from a 5V supply while the internal cores run at 3.3V. Both of the 3.3V options allow running at even lower power with, of course, a corresponding sacrifice in performance.
The clock unit contains more power-management features that can reduce power consumption to less than 1 watt. First, any or all of the three processors can be shut down via software control when they are not being utilized. The 80C186CP also provides power-save modes in which the internal frequency is reduced below the input frequency. A standby mode is available during which all internal activity and clocks are disabled except for the phase locked loop and oscillator, which continue to run. Finally, a full power-down mode is implemented in which all internal activity is stopped. These power-management modes can be sensitized to any of the external interrupt pins or NMI so that the power-down modes can be exited by activity on these pins.
Design Methodology
The 80C186CP was designed in Intel's 1-micron 2-level metal process. Figure 3 shows a die photo of the chip which measures 14.8\