COFFEE Project
The aim of the Coffee Project is to provide a set of hardware components and software support for developing a complete computer system.



  • Harvard architecture
  • 6 pipeline stages
  • Flexible multiplication of 16-bit and 32-bit operands
  • Full precision 64-bit result in 4 clock cycles
  • Two separate register banks for fast context switching
  • SW-configurable through a memory-mapped register bank
  • Super user mode for OS-like functionality
  • Memory protection mechanism
  • Built-in 12-line interrupt controller
  • Two timers
  • Coprocessor interface
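
To make the "full precision 64-bit result" item concrete, the following is a minimal arithmetic sketch in C of how a 32×32-bit product can be assembled from 16×16-bit partial products. It illustrates the arithmetic only and is not a description of the COFFEE multiplier datapath or its cycle timing; the function names `mul16x16` and `mul32x32_full` are hypothetical and do not come from the project.

```c
#include <stdint.h>

/* Hypothetical helper: unsigned 16x16 -> 32-bit multiply. */
static uint32_t mul16x16(uint16_t a, uint16_t b)
{
    /* Cast first so the multiplication is done in unsigned 32-bit arithmetic. */
    return (uint32_t)a * (uint32_t)b;
}

/* Hypothetical helper: unsigned 32x32 -> 64-bit multiply built from
 * four 16-bit partial products, so that no result bits are lost. */
static uint64_t mul32x32_full(uint32_t a, uint32_t b)
{
    uint16_t a_lo = (uint16_t)(a & 0xFFFFu);
    uint16_t a_hi = (uint16_t)(a >> 16);
    uint16_t b_lo = (uint16_t)(b & 0xFFFFu);
    uint16_t b_hi = (uint16_t)(b >> 16);

    uint64_t ll = mul16x16(a_lo, b_lo);   /* weight 2^0  */
    uint64_t lh = mul16x16(a_lo, b_hi);   /* weight 2^16 */
    uint64_t hl = mul16x16(a_hi, b_lo);   /* weight 2^16 */
    uint64_t hh = mul16x16(a_hi, b_hi);   /* weight 2^32 */

    return ll + ((lh + hl) << 16) + (hh << 32);
}
```

With 16-bit operands only the `ll` term is non-zero, which is one way to picture why a multiplier can serve both operand widths.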

MILK Co-Processor

  • Successfully interfaced with 2 different RISC cores
  • 32-bit bidirectional data bus, also used to write instructions and read the status (flags)
  • 5-bit (3+2) address bus
  • Multi-port Register File
  • Signals for the handling of special situations (exceptions and stall)

Butter Co-Processor

  • NxM array of reconfigurable processing elements (cells)
  • Each cell features integer and floating-point arithmetic operations, shift and LUT-based operations
  • Flexible interconnect schemes between the PEs, providing nearest-neighbour and global communication
  • Dedicated input and output in addition to the system bus (or network!) interface which is mainly used for configuration purposes


NoC-Based Platform

  • Developed using several previously designed components
  • It includes the Proteo NoC, the Coffee Processor, and the Milk co-processor
  • Main design goal: to enable efficient utilization of the communication resources through the bus-oriented standard interface

Bus-Based Platform

  • Platform based on a standard bus architecture
  • Basic version includes the Cappuccino block, instruction and data memories, and a UART communication device
  • 3D Graphics version provides an interface for double frame-buffer management and a related VGA controller

DMA Platform

  • Includes the Cappuccino block and Butter Coarse-grain Accelerator, based on a tightly-coupled model of computation
  • Dedicated DMA provides an interface to access memory and peripherals directly with the accelerator
  • Switch-based interconnection network allows simultaneous access from the two cores towards the system memory and peripherals


  • Multiprocessor SoC based on the COFFEE RISC Core
  • Nine processor cores connected via a hierarchical NoC based on a mesh topology


C Compiler

  • Based on GCC v3.4.4
  • It supports Milk FPU
      • Coprocessor version
      • Integrated version
  • It provides a library of assembly routines
      • Memory operations
      • Integer division and modulo
  • Different levels of optimization
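
As an illustration of what an integer division and modulo routine computes, here is a C sketch of the classic shift-and-subtract (restoring) division. The project's routines are written in assembly and are not shown in the source, so this is only one common way such a routine can be structured; the type `udivmod_t` and the function `udivmod32` are hypothetical names, and returning zeros on division by zero is an assumption made for this sketch.

```c
#include <stdint.h>

/* Hypothetical result type: quotient and remainder together. */
typedef struct {
    uint32_t quot;
    uint32_t rem;
} udivmod_t;

/* Unsigned 32-bit restoring division, one quotient bit per iteration. */
static udivmod_t udivmod32(uint32_t n, uint32_t d)
{
    udivmod_t r = {0u, 0u};

    if (d == 0u) {
        return r;  /* assumption for this sketch: report 0/0 on divide by zero */
    }

    for (int i = 31; i >= 0; --i) {
        /* Shift the next dividend bit into the running remainder.
         * The remainder before the shift is at most n >> (i + 1),
         * so this shift cannot overflow 32 bits. */
        r.rem = (r.rem << 1) | ((n >> i) & 1u);

        /* If the divisor fits, subtract it and set this quotient bit. */
        if (r.rem >= d) {
            r.rem -= d;
            r.quot |= (1u << i);
        }
    }
    return r;
}
```

A signed variant would typically divide the magnitudes with a routine like this and then fix up the signs of the quotient and remainder.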


  • Based on GNU Binutils v2.17
  • Over a dozen different utilities covering all aspects of handling low-level object files
  • Linker (coffee-ld)
  • Assembler (coffee-as)
  • Disassembler (coffee-objdump)

C Newlib v1.15.0

  • Conglomeration of several library parts targeted to embedded systems


3D Graphics Library

  • Written in C to implement 3D graphics on a COFFEE platform
  • It supports polygons with variable number of edges and NURBS Curves and Surfaces
  • It provides drawing lists for transformation and rendering of groups of geometrical objects
  • It implements transformation using matrix stacks and lighting based on Gouraud shading
  • Polygon rendering is based on a scan-line algorithm
  • NURBS rendering implements the de Boor algorithm
  • Hidden surface removal provided using Z-Buffer
  • A simple 3D graphics application has been developed on an FPGA prototype of a COFFEE system
  • Basic kernels (e.g. 4×4 Matrix-Vector multiplication) can be accelerated through Butter Array
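
The 4×4 matrix-vector multiplication named above as a basic kernel can be sketched in plain C as follows. This is a reference formulation under assumptions, not the library's own code: the function name `mat4_mul_vec4`, the row-major storage, and the use of `float` are choices made here for illustration, and the mapping of the kernel onto the Butter array is not shown.

```c
/* Hypothetical reference kernel: out = M * v, where M is a 4x4 matrix
 * stored row-major in 16 floats and v is a 4-component vector
 * (for example a point in homogeneous coordinates). */
static void mat4_mul_vec4(const float m[16], const float v[4], float out[4])
{
    for (int row = 0; row < 4; ++row) {
        float acc = 0.0f;
        for (int col = 0; col < 4; ++col) {
            acc += m[4 * row + col] * v[col];  /* dot product of row and v */
        }
        out[row] = acc;
    }
}
```

Each output component is an independent four-term dot product, which is the kind of regular, parallel structure that suits an array of processing elements.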

GPS Tracking Channel

  • Mapping of a GPS tracking channel on the Butter reconfigurable array
  • Hardware modifications were needed: a configuration decoder, increase of the LUT size, some multiplexers
  • Synthesized in a 1.4 V, 90 nm low-power standard-cell technology, the new cell requires 21680 KGates and reaches a maximum frequency of 515 MHz
  • The complete mapping occupies twenty-eight of the sixty-four processing elements
  • The throughput of the application is six 22-bit words per clock cycle, while the latency is seven clock cycles.

Software Defined Radios

  • Implementation of Software Defined Radio applications on our platforms is under study
  • Target Cell Search in WCDMA mapped on NineSilica
  • Basic kernels (e.g. correlation filters) accelerated through Butter Array
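
A correlation filter of the kind mentioned above can be sketched in C as a sliding multiply-accumulate of received samples against a known code, followed by a search for the strongest peak. This is a generic illustration, not the project's cell-search implementation: the function names `correlate_at` and `find_peak_offset`, the sample and code data types, and the peak-picking rule are all assumptions, and no real WCDMA synchronization code is used.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: correlate `code` (e.g. +1/-1 chips) against
 * `samples` starting at `offset`. The caller guarantees that
 * offset + code_len does not exceed the sample buffer length. */
static int32_t correlate_at(const int16_t *samples, const int8_t *code,
                            size_t code_len, size_t offset)
{
    int32_t acc = 0;
    for (size_t i = 0; i < code_len; ++i) {
        acc += (int32_t)samples[offset + i] * (int32_t)code[i];
    }
    return acc;
}

/* Hypothetical helper: return the offset with the largest correlation
 * magnitude, or 0 if the code does not fit in the sample buffer. */
static size_t find_peak_offset(const int16_t *samples, size_t n_samples,
                               const int8_t *code, size_t code_len)
{
    size_t best_offset = 0;
    int32_t best_mag = -1;

    if (code_len == 0 || n_samples < code_len) {
        return 0;
    }
    for (size_t off = 0; off + code_len <= n_samples; ++off) {
        int32_t c = correlate_at(samples, code, code_len, off);
        int32_t mag = (c < 0) ? -c : c;
        if (mag > best_mag) {   /* strict '>' keeps the earliest peak on ties */
            best_mag = mag;
            best_offset = off;
        }
    }
    return best_offset;
}
```

The inner loop is a pure multiply-accumulate over a fixed-length window, which is why kernels of this shape are natural candidates for acceleration on an array of processing elements.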