This is the latest (and greatest!) in a long series of fully-unrolled
CORDIC processors written by Larry Doolittle, with occasional help
from Ming Choy and Gang Huang. It is written in Verilog, although
part of the Verilog is in turn generated by an Octave/Matlab program.
Good reference material on CORDIC hardware in general is given by
Ray Andraka at
http://www.andraka.com/cordic.php
The module cordicg (in cordicg.v) is ready for instantiation in
application code. Its main parameter (width) sets the bit width of
the X and Y input and output ports. The phase input and output
ports are one bit wider than the X and Y ports. See below
for additional configuration.
Phase ports are in "natural" binary units: wrapping around the
finite-width digital word corresponds to a 2-pi wrap of conventional
angle. As such, a phase value can be interpreted equally well as
signed or unsigned.
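As an illustration of natural binary phase units, here is a small Python sketch that converts between radians and a (width+1)-bit phase word. The helper names are hypothetical and not part of the distribution; the only facts taken from this README are that the phase ports are one bit wider than X and Y, and that a full word wrap equals 2 pi.

```python
import math

def radians_to_phase(theta: float, width: int) -> int:
    """Map an angle in radians to an unsigned (width+1)-bit phase word.

    A full turn (2*pi) is 2**(width+1) counts, so the result wraps
    modulo the word size, just as the finite-width port does.
    """
    nbits = width + 1                  # phase ports are one bit wider than X/Y
    full_turn = 1 << nbits
    return round(theta / (2 * math.pi) * full_turn) % full_turn

def phase_to_signed(word: int, width: int) -> int:
    """Reinterpret the same phase word as a two's-complement value."""
    nbits = width + 1
    return word - (1 << nbits) if word >= (1 << (nbits - 1)) else word

def phase_to_radians(word: int, width: int) -> float:
    """Read a phase word as signed, giving an angle in [-pi, pi)."""
    return phase_to_signed(word, width) * 2 * math.pi / (1 << (width + 1))

w = 18
q = radians_to_phase(-math.pi / 2, w)
print(q, phase_to_signed(q, w))        # prints "393216 -131072"
```

The unsigned word 393216 and the signed value -131072 are the same bit pattern, which is why either interpretation works.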
X and Y ports are always signed, even when X is used as a radius output
in R->P mode. You are expected to know something about CORDIC when
setting up their scaling: a CORDIC engine has an intrinsic gain of
about 1.64676, for a large number of stages. Also be aware that a
full-scale input on both X and Y has a radius sqrt(2) larger than
just full scale in one axis. This module does not detect or saturate
overflows; it just wraps.
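Because the module wraps instead of saturating, it can help to work out the input headroom up front. The sketch below computes the CORDIC gain as the product of sqrt(1 + 2^(-2i)) over the stages, starting at i = 0 (an assumption about the stage structure that matches the 1.64676 figure above), and the largest per-axis input that cannot overflow a signed output when X and Y are both at that level. The function names are hypothetical.

```python
import math

def cordic_gain(stages: int) -> float:
    """Intrinsic CORDIC gain: product of sqrt(1 + 2**(-2*i)) for i < stages."""
    k = 1.0
    for i in range(stages):
        k *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    return k

def max_safe_input(width: int, stages: int) -> int:
    """Largest per-axis input magnitude that cannot overflow the outputs.

    Worst case is full X and Y together (radius sqrt(2) times one axis),
    further amplified by the CORDIC gain.
    """
    full_scale = (1 << (width - 1)) - 1    # largest positive signed value
    return math.floor(full_scale / (cordic_gain(stages) * math.sqrt(2)))

print(cordic_gain(20), max_safe_input(18, 20))
```

For a sketch this is deliberately conservative: if only one axis can be large, the sqrt(2) factor can be dropped.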
The 2-bit op input selects the operation mode as follows:
0 Polar->Rectangular "rotation" (phaseout will be close to zero)
1 Rectangular->Polar "vectoring" (yout will be close to zero)
2 Slave
3 not used
All three data inputs are used in all modes. To get an ordinary P->R
computation with op==0, set yin to zero. It's also possible to use that
mode for general vector rotation of the input (x,y) vector by angle
phasein. To get an ordinary R->P computation with op==1, set phasein
to zero. A non-zero phasein in that mode will simply be added to the
answer. See below for info on slave mode.
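To make modes 0 and 1 concrete, here is a floating-point behavioral sketch in Python. It does not model cordicg.v's fixed-point arithmetic, rounding, or exact stage structure; the names rotate and vector, the default of 20 stages, and the pre-rotation by pi used to cover the full circle are assumptions of this sketch. It does follow the conventions stated here: rotation mode drives the phase to zero, vectoring mode drives y to zero, returns a positive radius and atan2(y,x), and adds a non-zero phasein to the answer.

```python
import math

def _micro_rotations(x, y, z, stages, vectoring):
    """Floating-point CORDIC iterations, stage i using the shift 2**-i.

    Rotation mode steers z toward zero; vectoring mode steers y toward zero.
    Every micro-rotation also scales the vector by sqrt(1 + 2**(-2*i)).
    """
    for i in range(stages):
        if vectoring:
            d = 1 if y < 0 else -1     # turn toward the positive x axis
        else:
            d = 1 if z >= 0 else -1    # consume the remaining angle
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * math.atan(2.0 ** -i)
    return x, y, z

def _wrap(a):
    """Wrap an angle into [-pi, pi), like the natural-binary phase word."""
    return (a + math.pi) % (2 * math.pi) - math.pi

def rotate(xin, yin, phasein, stages=20):
    """op==0 sketch: rotate (xin, yin) by phasein.

    Returns (xout, yout, phaseout); outputs include the CORDIC gain
    and phaseout ends up close to zero.
    """
    x, y, z = xin, yin, _wrap(phasein)
    if abs(z) > math.pi / 2:           # pre-rotate by pi to stay in range
        x, y, z = -x, -y, _wrap(z - math.pi)
    return _micro_rotations(x, y, z, stages, vectoring=False)

def vector(xin, yin, phasein=0.0, stages=20):
    """op==1 sketch: returns (gain*r, yout near 0, phasein + atan2(yin, xin))."""
    x, y, z = xin, yin, phasein
    if x < 0:                          # pre-rotate by pi so r comes out positive
        x, y, z = -x, -y, z + math.pi
    x, y, z = _micro_rotations(x, y, z, stages, vectoring=True)
    return x, y, _wrap(z)
```

For example, rotate(1000.0, 0.0, math.pi / 3) is an ordinary P->R conversion (yin set to zero), and vector(-300.0, 400.0) is an ordinary R->P conversion (phasein left at zero); both results carry the gain of about 1.64676.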
The op input is allowed to vary cycle-by-cycle. Feel free to interleave
R->P with P->R computations. One pipelined CORDIC computation, based on the
three data inputs and the op control input, starts on every (posedge clk).
Version 26 has no changes to the core synthesizable code; it improves the
documentation, fixes the generated Verilog code for 33 < o < 56, and adds
provision to check synthesis on three generations of Xilinx chips.
It is interesting to compare and contrast the maximum clock rate this
CORDIC engine can reach (in its default 18-bit configuration) on the
various architectures. Summary:
part                 clock period  CORDIC LUTs  chip price  chip LUTs  CORDIC price
xc3s1000-ft256-5     7.1 ns        1606         46.60       15360      4.87
xc6slx45t-fgg484-3   5.2 ns        1546         84.39       27288      4.78
xc7a100t-fgg484-2    3.8 ns        1480         141.25      63400      3.31
xc7k70t-fbg484-1     3.1 ns        1567         127.21      41000      4.86
5CSXFC6D6F31C8N      4.4 ns        2320         226.89      110000     4.78
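where chip prices are as of 2014-05-25 at Digi-Key, and the CORDIC
price is the chip price scaled by the fraction of the chip's LUTs that
the CORDIC uses. That figure is an upper limit, since it assumes all
the non-LUT resources on the chip are valueless.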
Semi-incompatible change between versions 24 and 25: the op input
is two bits instead of one. Just pad on the left with zero (as Verilog
does by default), and there will be no change in function.
The new bit enables slave mode, where the rotation phase is the negative
of that of the previous operation. For such cycles, the input phase is
ignored. The hardware required to implement this new mode is small,
about one logic cell per stage, and even that should be stripped away
by the synthesizer if op[1] is hard-wired to zero. The slave mode's
computation can also be performed by two successive passes through the
CORDIC engine; using slave mode saves a factor of two in latency and
reduces round-off error.
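One way to picture slave mode is as replaying the micro-rotation decisions of the previous operation on a new input, so no phase value is needed. The Python sketch below models that idea in floating point after a vectoring pass: replaying the recorded decisions rotates any vector by the negative of the phase that pass reported. The replay mechanism, the function names, and STAGES = 20 are assumptions of this sketch, not a description of cordicg.v internals.

```python
import math

STAGES = 20  # hypothetical; matches the shipped s=20

def vector_with_dirs(x, y):
    """Vectoring pass (phasein = 0) that also records its decisions.

    Returns (gain*r, residual y, phase, decisions), where 'decisions'
    stands in for the state a following slave operation would reuse.
    """
    pre = x < 0                        # pre-rotation by pi keeps r positive
    if pre:
        x, y = -x, -y
    z = math.pi if pre else 0.0
    dirs = []
    for i in range(STAGES):
        d = 1 if y < 0 else -1
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * math.atan(2.0 ** -i)
        dirs.append(d)
    z = (z + math.pi) % (2 * math.pi) - math.pi
    return x, y, z, (pre, dirs)

def slave(x, y, decisions):
    """Rotate (x, y) by minus the previous phase by replaying the same
    micro-rotations; no phase input is consulted."""
    pre, dirs = decisions
    if pre:
        x, y = -x, -y
    for i, d in enumerate(dirs):
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
    return x, y
```

Feeding the same vector through both passes reproduces the vectoring result exactly, while a different vector gets rotated by the same angle (and scaled by the same gain).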
Incompatible change between versions 23 and 24: the phase of the
rectangular to polar conversion has been changed by pi. That means
that when op==1, the angle output is truly atan2(y,x), and the x (R)
output in that mode is now positive.
The other new feature of version 24, besides an additional test
bench mode, is the parameter op_def. Use cases with constant op
input can set this parameter to match, and might save some gates
and/or timing when synthesizing for Xilinx.
The test platform is Icarus Verilog, Xilinx XST, and Octave,
using Debian GNU/Linux as the operating system. The code is
generally standards-based and not version-specific.
The cordicg module includes cordicg.vh, and the alert reader will notice
that this file is not part of the source distribution. On a *nix platform,
cordicg.vh will automatically appear when you type "make" to run the tests.
You are expected to configure the code as shown below, then run the same
process (which invokes Octave) to make a cordicg.vh tuned for your needs:
octave -q cordicgx.m > cordicg.vh
The cordicgx.m program is Matlab-compatible. If you have a working and
licensed Matlab, and for some reason don't want to also install Octave,
you should be able to run that script, and then cut-and-paste its output
into the cordicg.vh file.
Since the number of pipeline stages instantiated in cordicg.vh is
configurable, it is a little tricky to make application code adjust to
the possible configurations. The cordicg.vh file is set up to help
that process. One possible solution:
`include "cordicg.vh" // sets cordic_delay parameter
reg [cordic_delay:0] sync_chain=0;
always @(posedge clk) sync_chain <= {sync_chain[cordic_delay-1:0],in_sync};
wire out_sync = sync_chain[cordic_delay];
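The shift register above can be checked with a cycle-by-cycle Python model. This sketch assumes only what the Verilog shows: on each posedge clk, in_sync enters bit 0, every other bit takes its lower neighbor's value, and out_sync is bit cordic_delay. The function name is hypothetical.

```python
def sync_chain_model(in_sync, cordic_delay):
    """Simulate reg [cordic_delay:0] sync_chain, one list item per clock.

    Returns the value of out_sync after each clock edge.
    """
    chain = [0] * (cordic_delay + 1)   # sync_chain = 0 at start; index k is bit k
    out = []
    for bit in in_sync:
        # {sync_chain[cordic_delay-1:0], in_sync}: bit 0 <- in_sync, bit k <- bit k-1
        chain = [bit] + chain[:-1]
        out.append(chain[cordic_delay])
    return out

print(sync_chain_model([1, 0, 0, 0, 0, 0], 4))
```

A single in_sync pulse appears on out_sync cordic_delay clock edges after the edge that sampled it, which is the alignment needed if cordic_delay is the CORDIC pipeline latency.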
The code has three configuration settings:
In cordicg_tb.v:
parameter width=18; // Configure here!
In cordicg_conf.m:
o=22; % bit width of intermediate computations
s=20; % number of stages
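When choosing s for a given width, one hypothetical rule of thumb (ours, not taken from cordicg_conf.m or cordicgx.m) is to check that the smallest micro-rotation, atan(2^-(s-1)), is below one LSB of the (width+1)-bit phase port. The sketch assumes stages are indexed from i = 0; with the shipped width=18 and s=20, the last stage is about 0.16 phase LSB.

```python
import math

def last_stage_in_phase_lsbs(width: int, stages: int) -> float:
    """Size of the final micro-rotation, atan(2**-(stages-1)), in LSBs
    of the (width+1)-bit phase port (a full turn is 2**(width+1) LSBs)."""
    phase_lsb = 2 * math.pi / 2 ** (width + 1)
    return math.atan(2.0 ** -(stages - 1)) / phase_lsb

print(last_stage_in_phase_lsbs(18, 20))
```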
As shipped, results are:
octave -q cordicgx.m > cordicg.vh
iverilog -Wall -o cordicg_tb cordicg_tb.v cordicg.v
vvp -n cordicg_tb +op=0 > cordic0.dat
Check of x,y,theta->x,y
gawk -f cordic_test.awk cordic0.dat
test covers 15958 points, maximum amplitude is 90325 counts
peak error 1.25 bits, 0.0010 %
rms error 0.36 bits, 0.0003 %
PASS
vvp -n cordicg_tb +op=1 > cordic1.dat
Check of x,y,theta->r,theta
gawk -f cordic_test.awk cordic1.dat
test covers 7979 points, maximum amplitude is 129001 counts
peak error 1.06 bits, 0.0008 %
rms error 0.36 bits, 0.0003 %
PASS
vvp -n cordicg_tb +rmix=1 > cordic2.dat
Check of downconversion bias
gawk -f cordic2_test.awk cordic2.dat
test covers 6102 points
averages 0.027 -0.007
PASS
vvp -n cordicg_tb +op=3 > cordic3.dat
Check of slave mode
gawk -f cordic_test.awk cordic3.dat
test covers 11968 points, maximum amplitude is 129001 counts
peak error 3.27 bits, 0.0025 %
rms error 0.41 bits, 0.0003 %
PASS
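The percentage figures in these results appear to be the error in output LSBs divided by the signed full scale of 2^(width-1) = 131072 counts for width=18. That reading is an inference from the printed numbers, not taken from cordic_test.awk; the hypothetical helper below reproduces them under that assumption.

```python
def error_percent(err_bits: float, width: int = 18) -> float:
    """Express an error in output LSBs as a percentage of 2**(width-1)."""
    return 100.0 * err_bits / 2 ** (width - 1)

for e in (1.25, 0.36, 1.06, 3.27, 0.41):
    print(f"{e:.2f} bits -> {error_percent(e):.4f} %")
```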
Note that the theoretical lower limit for peak error is 0.5 bits, and
for rms error it is 1/sqrt(12) = 0.29 bits, both set by rounding the
outputs to integers. More information about the accuracy
behavior is given in a plot you can create with the gnuplot command
load "perf.gp"
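Both lower limits come from rounding an ideal result to the nearest integer: the rounding error is roughly uniform on [-0.5, 0.5], so its peak is 0.5 and its rms is sqrt(1/12). The short Python check below demonstrates this empirically with uniformly drawn ideal values (the sample range and seed are arbitrary choices for the sketch).

```python
import math
import random

def rounding_error_stats(n: int = 100_000, seed: int = 1):
    """Peak and rms error of rounding uniformly drawn values to integers."""
    rng = random.Random(seed)
    errs = [v - round(v) for v in (rng.uniform(-1000.0, 1000.0) for _ in range(n))]
    peak = max(abs(e) for e in errs)
    rms = math.sqrt(sum(e * e for e in errs) / n)
    return peak, rms

print(rounding_error_stats(), 1 / math.sqrt(12))
```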
Happy computing!
Larry Doolittle June 17, 2014