CEVA 4G/5G Demod - Scope
Scope
This document describes the CEVA Streaming Data Movement Accelerator (CEVA-SDMA™) architecture specifications.
Audience
This document is intended for system architects, software engineers, and system engineers who are implementing Hardware Accelerators using the CEVA-SDMA.
Related Documents
The following documents are related to the information in this document:
AMBA AXI Protocol Version: 2.0 Specification, ARM March 2010
AMBA APB Protocol Version: 2.0 Specification, ARM April 2010
AMBA 4 AXI4-Stream Protocol Version: 1.0 Specification, ARM March 2010
Overview
The CEVA Streaming Data Movement Accelerator (CEVA-SDMA) is a device that allows for the complete separation between Data-Movement (i.e. loading data in and storing data out) and Data-Path (i.e. data signal processing) functionalities in Hardware Accelerators (HWAs).
The guiding principle behind the architecture of this block is that a single configurable CEVA-SDMA block can be reused with any Data-Path functionality to form flexible, instruction-controlled HWAs. Time-to-market is shortened by fully reusing the data-movement hardware and by using a common instruction set to program the integrated hardware accelerator.
The CEVA-SDMA allows for distributed data stored in memory mapped buffer(s) to be synchronized and streamed through one or more data streams into a data-path block. The output data streams from the Datapath block can then be acquired and synchronized by the CEVA-SDMA and distributed into memory mapped buffers. The control of the Datapath block can be synchronized with the data packet movement into and out of the Datapath block.
The accompanying figure illustrates the CEVA-SDMA connected to a data-path block with n output data-streams and m input data-streams.
This data-path block could be a DSP function, such as a filter or it could be a Digital Front End (DFE) for an RF Transceiver.
The basic principles of operation for this block are:
Data is loaded from AXI memory and streamed into one or more "output data streams".
Data is received from one or more "input data streams" and stored into AXI memory.
A loopback between a single "output data stream" channel and one (or more) "input data stream" channel(s) provides regular DMA mem-copy functionality.
Operation is instruction based with the instruction set built to support the following:
Issuing control transactions to the Data-path block
Reading distributed data from the "DATA INPUT BUFFER" and streaming it through one or more of the "output data stream" interfaces.
Receiving data from one or more "input data streams" and storing it in the "DATA OUTPUT BUFFER" in a distributed form.
The Tx, Rx and Control paths are entirely independent with separate instruction queues and DMA channels.
The data streams can be driven and acquired using different clocks than the AXI clock, and may have different bit-widths.
The IP is scalable to support one or more input data streams, output data streams, and control streams.
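The load, stream, and store principles above, including the loopback case, can be sketched as a small behavioral model. This is only an illustrative sketch: the function names, the byte-array memory, and the (data, last) beat format are assumptions made for the example, not part of the CEVA-SDMA interface.

```python
def tx_stream(memory, src, length, beat_bytes):
    """Model a Tx channel: read `length` bytes from memory and yield stream beats.

    Each beat is (data, last), where `last` marks the final beat of the packet.
    """
    for off in range(0, length, beat_bytes):
        chunk = bytes(memory[src + off: src + min(off + beat_bytes, length)])
        yield chunk, off + beat_bytes >= length


def rx_store(memory, dst, beats):
    """Model an Rx channel: store received beats to memory, return bytes written."""
    written = 0
    for chunk, last in beats:
        memory[dst + written: dst + written + len(chunk)] = chunk
        written += len(chunk)
        if last:
            break
    return written


def loopback_copy(memory, src, dst, length, beat_bytes=8):
    """Connect the Tx output stream directly to the Rx input stream (mem-copy)."""
    return rx_store(memory, dst, tx_stream(memory, src, length, beat_bytes))
```

For example, calling loopback_copy(memory, 0, 64, 20) on a 128-byte bytearray copies the first 20 bytes to offset 64 in three beats of 8, 8, and 4 bytes.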
Features
Task Subsystem
Instruction Set:
Tasks are a combination of instructions (tasks) and registers (task parameters).
A hardware configurable number of 32bit/16bit/8bit register sets are stored in a dedicated register-file and used by the instructions.
Instruction types: Transmit, Receive, Control, Register write, Block, Jump, Halt, Message write, General purpose output, NOP.
Instruction loading:
Instructions are loaded automatically via the AXI read port (the same one used for data)
Interleaved Weighted Round Robin arbitration between Tx, Rx, Control task queues requesting to load instructions.
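The arbitration between task queues can be illustrated with a generic interleaved weighted round-robin scheduler. The sketch below is one common formulation (credits refilled per frame, one grant per queue per pass); the actual arbiter behavior, the weight values, and the queue names used here are assumptions for illustration only.

```python
def interleaved_wrr(weights, requests, n_grants):
    """Return the grant order for up to n_grants instruction-load requests.

    weights:  dict queue name -> grants allowed per frame (hypothetical values)
    requests: dict queue name -> number of pending load requests
    """
    pending = dict(requests)
    grants = []
    while len(grants) < n_grants:
        credits = dict(weights)       # start of a frame: refill all credits
        frame_start = len(grants)
        progressed = True
        while progressed and len(grants) < n_grants:
            progressed = False
            for name in weights:      # one pass: at most one grant per queue
                if credits[name] > 0 and pending.get(name, 0) > 0:
                    credits[name] -= 1
                    pending[name] -= 1
                    grants.append(name)
                    progressed = True
                    if len(grants) == n_grants:
                        break
        if len(grants) == frame_start:
            break                     # no queue was eligible in this frame
    return grants
```

Interleaving spreads the grants of a heavily weighted queue across the frame instead of serving them back to back, which keeps the lower-weight queues from waiting for a full burst of higher-weight grants.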
Instruction queues:
Separate Tx, Rx and Ctrl task queues. One queue per data/control channel.
Hardware configurable depth of instruction queues (TBD: may be fixed to a depth of 1x if that proves to be sufficient).
Each instruction queue has a register-file with registers that can be loaded via dedicated instructions or by an external APB host.
Instruction decoding:
Reference the internal register-file for parameters based on instruction indirection.
Results in a common request format (DMA Request Interface) to the corresponding Data, Ctrl, Register, Task channels.
Some instructions have write-back values to internal registers within the register-file.
Specialized instructions for modifying the instruction read-pointer (jmp), blocking instructions from being popped (buz), disabling the queue (hlq), and flow control (gpo).
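How the read-pointer instructions interact with ordinary instructions can be shown with a toy queue model. Only jmp and hlq are modeled here; the tuple encoding, the unconditional jump target, and the step limit are assumptions made for the sketch and do not reflect the real instruction format.

```python
def run_queue(program, max_steps=100):
    """Step through a list of (opcode, *args) tuples.

    Returns (trace, read_pointer, enabled), where trace lists the opcodes
    dispatched to the channels.
    """
    pc = 0                # instruction read-pointer
    enabled = True        # queue enable flag, cleared by hlq
    trace = []
    steps = 0
    while enabled and pc < len(program) and steps < max_steps:
        op, *args = program[pc]
        steps += 1
        if op == "jmp":   # modify the read-pointer instead of dispatching
            pc = args[0]
            continue
        if op == "hlq":   # disable the queue; the pointer stays on the halt
            enabled = False
            break
        trace.append(op)  # any other opcode is dispatched in order
        pc += 1
    return trace, pc, enabled
```

A jmp back to an earlier index forms a loop, so the sketch bounds execution with max_steps.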
Data Read/Write Channels
Tx, Rx and Control channels are programmed independently.
AXI4.0 support with up to 256-beat data transactions and up to 256 outstanding read/write requests.
Built-in fragmentation of packets into programmable sized bursts.
Unaligned byte addressing.
Sparse data read patterns: linear, 1D-3D address generation with option to interleave data between different data frames.
Built-in splitting of transactions on 4K address boundaries.
Channel synchronization using (optional) sample level external triggers.
Hardware configurable data bit-width: 64bit, 128bit, 256bit.
Fits all use cases: AXI<>AXI-Stream, AXI task loading, AXI<>REGFILE
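The combination of packet fragmentation into programmable-size bursts and splitting on 4K address boundaries can be sketched as follows. The function name and the byte-based burst size parameter are assumptions for illustration; the hardware may express burst size in beats rather than bytes.

```python
def split_bursts(addr, length, max_burst_bytes, boundary=4096):
    """Fragment a transfer into (address, size) bursts.

    No burst exceeds max_burst_bytes and no burst crosses a `boundary`
    aligned address (4 KB by default).
    """
    if max_burst_bytes <= 0 or boundary <= 0:
        raise ValueError("burst size and boundary must be positive")
    bursts = []
    end = addr + length
    while addr < end:
        to_boundary = boundary - (addr % boundary)  # bytes left before boundary
        size = min(max_burst_bytes, to_boundary, end - addr)
        bursts.append((addr, size))
        addr += size
    return bursts
```

For a 64-byte transfer starting at 0xFF0 with 32-byte bursts, the first burst is shortened to 16 bytes so that it ends exactly at 0x1000, and the remainder is issued as a 32-byte and a 16-byte burst.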
Data Streamers
Support for AXI-Stream v1.0 data streams on transmit (Tx) and receive (Rx) channels (with some limitations vs. the full spec).
Upsizing / Downsizing data stream bus widths to the memory mapped AXI data bit-width.
Hardware configurable data bit-width: 32bit, 64bit, 128bit, 256bit.
Data alignment and packing for unaligned memory mapped AXI addressing.
Loop-back (TBD: external or internal?) support from Tx to one (or more) Rx to support "mem copy" operations like regular AXI DMAs.
Null data transmission (Tx channel only): constant (configurable) "null" data or disabling of the "valid" signal.
Triggers:
Each channel has one input trigger signal and one output trigger signal.
Task fields define the behavior of the triggers.
Input triggers affect the outset of the channel: for Tx, this is after the FIFO and before the output to the AXI stream; for Rx, this is right at the interface, before the FIFO.
Clock domain crossing support to the asynchronous data streaming interface:
Any clock ratio supported.
Each channel can be clocked by a different asynchronous clock.
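The upsizing and downsizing between the stream bus width and the memory mapped AXI data width amounts to repacking a byte stream into beats of a different size. The sketch below models that repacking only; the function name and the use of byte strings as beats are assumptions, and clock domain crossing, triggers, and null data are not modeled.

```python
def repack(beats, out_bytes):
    """Repack an iterable of byte-string beats into beats of out_bytes each.

    Works for upsizing (narrow in, wide out) and downsizing (wide in,
    narrow out). A final partial beat is emitted if data remains.
    """
    if out_bytes <= 0:
        raise ValueError("out_bytes must be positive")
    pending = bytearray()
    for beat in beats:
        pending += beat
        while len(pending) >= out_bytes:
            yield bytes(pending[:out_bytes])
            del pending[:out_bytes]
    if pending:
        yield bytes(pending)
```

For example, list(repack([b"abcd", b"efgh", b"ij"], 8)) packs three 32-bit beats into one full 64-bit beat followed by a partial one.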
Events/Interrupts
All events are maskable and can be enabled or disabled altogether.
Event groups are combined to allow for efficient SW probing.
Flow control / Task completion.
AXI-MM/APB/AXI-STRM bus error exceptions.
Triggering error events: trigger on an empty Tx channel, trigger without input samples on an Rx channel.
AXI read response timeout events.
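Event masking and grouping can be sketched with simple bit operations. The register layout, the group assignments, and the function names below are hypothetical and only show why a combined group summary lets software find the source of an event with fewer reads.

```python
def pending_events(raw, mask, global_enable=True):
    """Return the raw events that pass the per-event mask and the global enable."""
    return (raw & mask) if global_enable else 0


def group_summary(pending, groups):
    """Build a summary word with one bit per event group.

    groups is a list of bit masks; summary bit i is set when any pending
    event belongs to groups[i].
    """
    summary = 0
    for bit, group_mask in enumerate(groups):
        if pending & group_mask:
            summary |= 1 << bit
    return summary
```

Software would first read the summary word and then read only the event registers of the groups whose summary bit is set.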
AXI RD/WR Switches
Arbitrate between multiple AXI requesters (i.e. task loaders, DMA-RD, DMA-WR agents) to access a single AXI master port (1x read and 1x write).
Hardware Configurations
Task subsystem:
Depth of instruction queues
Number of R32/R16/R8 registers in the register file.
AXI (memory mapped) interfaces:
Data and ID bit-widths.
Number of outstanding transactions.
AXI (stream) interfaces:
Data bit-width.
DMA Channels
Number of Tx/Rx DMA channels
Number of CTRL channels
Type of DMA channel (1D, 2D, 3D, Interleaved).
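The 1D, 2D, and 3D channel types differ in how many nested count/stride pairs the address generator walks. A minimal sketch of such a generator is shown below; the parameter names and the innermost-first ordering are assumptions, and the interleaved mode is not modeled.

```python
def gen_addresses(base, counts, strides):
    """Yield element addresses for a 1D, 2D, or 3D access pattern.

    counts and strides are given innermost dimension first, for example
    counts=(elements_per_row, rows) and strides=(element_step, row_step).
    """
    if len(counts) != len(strides) or not 1 <= len(counts) <= 3:
        raise ValueError("expected 1 to 3 matching count/stride pairs")
    dims = list(zip(counts, strides))
    while len(dims) < 3:
        dims.append((1, 0))           # unused outer dimensions run once
    (n0, s0), (n1, s1), (n2, s2) = dims
    for k in range(n2):
        for j in range(n1):
            for i in range(n0):
                yield base + k * s2 + j * s1 + i * s0
```

For example, list(gen_addresses(0x1000, (2, 3), (4, 64))) reads two 4-byte elements from each of three rows spaced 64 bytes apart.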
Debug features
"Single step" support mode for instruction execution.
Access to select "variables" from each sub-block within "single step" mode.
Task subsystem bypass: direct path to the DMA channels from APB.
Block Diagram
The CEVA-SDMA top-level block diagram is illustrated in the accompanying figure.
The different subsystems highlighted in the block diagram are:
Task Subsystem: responsible for loading, queueing and decoding tasks from external memory.
Configuration & Flow Control Subsystem: stores the configuration and status registers for the CEVA-SDMA; handles the flow-control with upstream and downstream blocks.
AXI Read/Write Switches: provide memory mapped AXI read/write access to multiple clients within the CEVA-SDMA.
Reset & Power Control: manage the reset and power management functionality for the CEVA-SDMA.
Data & Control Channels: Tx/Rx DMA channels that generate the address patterns, read data (for Tx) from external AXI memory and stream it to the AXI stream channel, and write (for Rx) the received data streams to external AXI memory.