The SNS Front End LLRF System*

E. O. Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA

* This work is supported by the Director, Office of Science, Office of Basic Energy Sciences, of the U.S. Department of Energy under Contract No. DE-AC03-76SF00098. The SNS project is being carried out as a collaboration of six US Laboratories: Argonne National Laboratory (ANL), Brookhaven National Laboratory (BNL), Thomas Jefferson National Accelerator Facility (TJNAF), Los Alamos National Laboratory (LANL), E. O. Lawrence Berkeley National Laboratory (LBNL), and Oak Ridge National Laboratory (ORNL). SNS is managed by UT-Battelle, LLC, under contract DE-AC05-00OR22725 for the U.S. Department of Energy.

Abstract

LBNL has built the Front End for the Spallation Neutron Source (SNS) project: an H- injector designed to deliver 52 mA of beam current at 2.5 MeV, in 1 ms batches of 650 ns pulses at 60 Hz, for a beam duty factor of 6%. The Front End comprises an ion source, an electrostatic beam transport line (Low Energy Beam Transport, LEBT), an RFQ, and a Medium Energy Beam Transport line (MEBT). The RFQ accelerates the beam from 65 keV to 2.5 MeV. The MEBT includes four bunching cavities, designed to preserve the longitudinal bunch length during transport down its 3.7 m length. An FPGA-based Low Level RF (LLRF) control system has been built and is now used to control cavity amplitude to better than 1% and relative phase to within 1 degree. The signal processing chain of ADC, DSP in an FPGA, and DAC adds a latency of about 250 ns. The fast digital processing is networked to the EPICS global controls system through a commercial miniature embedded Linux computer (nanoEngine).

INTRODUCTION

The LLRF system controls the four 402.5 MHz rebuncher cavities of the SNS MEBT. Phase coherent operation of these cavities was needed during early commissioning activities at Berkeley. To merge successfully with the SNS RF reference infrastructure, the system was designed to make use of a 352.5 MHz LO, a 50 MHz IF source, a 40 MHz clock, and a 10 MHz sync.

Figure 1 shows the basic architecture selected. The 50 MHz input signals (mixed down from 402.5 MHz) from the cavity field probe, forward power, and reflected power are sampled at 40 MS/s by individual 12-bit ADCs. Digital signal processing in an FPGA drives two 12-bit analog output channels, each updated at 20 MS/s, which in turn control a 50 MHz vector modulator. A networked host computer connects the fast signal processing chip to the outside world via Ethernet. Timing and interlocks are routed through the same FPGA.

A system simulator was written to verify that such a system could provide sufficient feedback to keep the cavity fields and phases nearly constant during beam loading transients. Out of a total feedback loop latency of 1100 ns, 680 ns is simply due to cables and the 20 kW amplifier. Of the remaining 420 ns, located in the LLRF chassis, 170 ns comes from analog filters and 250 ns from the ADC/FPGA/DAC processing step. Note that, as usual for cavity control, there are two feedback loops: a fast loop that operates within the 1 ms beam pulse to keep the cavity field vector on its setpoint, and a slow loop that keeps the cavity mechanically on tune.
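The simulator boils down to a discrete-time model of this loop. The sketch below (in C, written for this paper rather than taken from the project) models the cavity as a single-pole low-pass filter, inserts the 1100 ns loop delay, and closes a proportional-plus-integral loop around a beam loading step; the cavity bandwidth, gains, and beam current used here are illustrative assumptions, not SNS values.

    /* Minimal loop-latency sketch: single-pole cavity, 1100 ns delay,      */
    /* proportional + integral feedback.  Bandwidth, gains and beam step    */
    /* are illustrative assumptions only.                                   */
    #include <complex.h>
    #include <stdio.h>

    #define DT     25e-9          /* 40 MS/s processing rate                */
    #define DELAY  44             /* ~1100 ns total loop latency, samples   */
    #define NSTEP  40000          /* one 1 ms beam pulse                    */

    int main(void)
    {
        double bw = 2.0 * 3.141592653589793 * 20e3;  /* assumed cavity bandwidth, rad/s */
        double complex cav = 0.0, setpt = 1.0, integ = 0.0;
        double complex pipe[DELAY] = {0};   /* cable + amplifier + DSP delay line */
        double kp = 4.0, ki = 0.002;        /* illustrative loop gains            */

        for (int n = 0; n < NSTEP; n++) {
            double complex beam = (n > 20000) ? 0.3 : 0.0;   /* beam loading step  */
            double complex err  = setpt - cav;
            integ += ki * err;
            double complex drive = pipe[n % DELAY];          /* delayed output     */
            pipe[n % DELAY] = setpt + kp * err + integ;      /* feedforward + PI   */
            cav += bw * DT * ((drive - beam) - cav);         /* one-pole cavity    */
            if (n % 4000 == 0)
                printf("%5.0f us  |field| = %.4f\n", n * DT * 1e6, cabs(cav));
        }
        return 0;
    }

A model of this kind makes it easy to confirm that the achievable gain, limited by the 1100 ns delay, is still enough to hold the field vector near its setpoint through the beam turn-on transient.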
Hardware

Previous experience with Xilinx 4000 series FPGA chips convinced us that their next-generation, Virtex-derived chips could easily handle the data path at 40 MHz with a moderate amount of pipelining. A pure digital signal processing flow was blocked out (figure 2), implementing automatic offset subtraction, setpoint, complex proportional gain, scalar integral gain, and a feedforward table. I and Q channels are never separated; only a 10 MHz sign flip is required to change the input I/Q/-I/-Q sequence into the output I/Q/I/Q sequence.

Xilinx's patented writable logic cell feature is used to advantage in the three data path multipliers (two to set the complex proportional gain, one to set the scalar integral gain). Such writable cells convert the conventional "KCM" (constant coefficient multiply) used in FPGA-based DSP (to reduce logic element usage and increase speed) into a DKCM (dynamic constant coefficient multiply), whose coefficient can be reloaded, in our case between beam pulses.

The hardware was kept as simple as possible. Very little is included on the digital board besides the four input ADCs (TI/Burr-Brown ADS808), the FPGA (Xilinx XC2S150), and the output dual DAC (Analog Devices AD9765). See figure 3. The embedded computer's expansion bus is tied directly to the FPGA, so a virtual register map (64K x 16) can be implemented under complete control of the FPGA firmware. The four FPGA JTAG TAP pins are wired directly to the embedded computer's digital I/O lines, allowing the FPGA to be programmed without recourse to the usual separate configuration ROM chip. Of course, the embedded computer is programmed to get that FPGA bit file from the network, so firmware updates are trivial to propagate.

Limited FPGA pin count and engineering time kept us from adding external memory to the FPGA. As a result, only 6 kBytes of data from within a 1 ms pulse can be stored. In contrast, during the same pulse 200 kBytes stream in through the high speed ADCs. This limited storage is not all bad; such large quantities of data could easily overwhelm the host computer and/or the network. Our experience showed this memory to be adequate both for operator comfort displays (with about the information content of a digital storage 'scope) and for acquisition of leading-edge waveforms (used to compute the detuning angle).

A single board computer called a nanoEngine, from Bright Star Engineering, was chosen for this project as a known-good base on which to run EPICS on Linux. Relevant features include:
- 200 MHz StrongARM CPU
- 100baseT Ethernet
- 32 MBytes SDRAM
- Serial (RS-232) console
- 4 MBytes Flash
- 10 digital I/O lines
- Bus expansion

The footprint is a single high-density 160-pin connector, over which all power and signals are routed, including Ethernet. The 200 MHz processor is capable of nearly 200 MIPS operation, as long as it runs mostly out of the on-board 8K data/8K program cache. The curve fitting routines in particular had to be carefully constructed to avoid blowing this cache, as well as to avoid heavy floating point work; there is no floating point hardware on a StrongARM chip, so those instructions are emulated (slowly) in software.

The only non-programmable logic in the system controls a PIN diode shutoff for the output 402.5 MHz signal. When the externally provided RF gate drops, no RF output is possible. This signal is also used as the trigger for a timing state machine (programmed in the FPGA).

The feedforward table was used extensively to generate test patterns. When the system was first hooked up to a cavity, with feedback disabled, this let us turn it into a glorified pulse generator and vector oscilloscope.
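The near-IQ sampling scheme mentioned above (the 50 MHz IF sampled at 40 MS/s aliases to 10 MHz, so successive ADC samples follow an I, Q, -I, -Q pattern) can be illustrated with a few lines of code. The sketch below is ours, in C; the FPGA performs the equivalent sign flip in logic, not software, and the amplitude and phase shown are arbitrary test values.

    /* Illustration of the I/Q/-I/-Q pattern and the 10 MHz sign flip.      */
    /* Not project code: the FPGA does this with a two-bit counter and a    */
    /* conditional negate.                                                  */
    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        const double fs = 40e6, f_if = 50e6;      /* sample rate, IF        */
        const double amp = 1000.0, phase = 0.3;   /* arbitrary test signal  */
        short adc[8];

        for (int n = 0; n < 8; n++)               /* simulated ADC samples  */
            adc[n] = (short)(amp * cos(2.0 * M_PI * f_if / fs * n + phase));

        /* sign flip toggling at 10 MHz: keep pairs 0-1, negate pairs 2-3   */
        for (int n = 0; n < 8; n += 2) {
            int flip = (n & 2) ? -1 : 1;
            printf("I = %6d   Q = %6d\n", flip * adc[n], flip * adc[n + 1]);
        }
        return 0;
    }

Every output pair then carries the same (I, Q) value, ready for the offset subtraction and gain stages of figure 2.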
The RF chassis was constructed with connectorized (SMA) components, to permit rearrangement of components and adjustment of signal levels without the schedule delay of revising circuit boards. See figure 4. Because of the tight schedule and risk-averse management, enough phase shifters and additional control and monitoring points were added that an analog control loop could potentially have operated the cavities. Much of that circuitry sat idle during the commissioning process, partly because the digital section worked, and partly because the accelerator beam timing signals were a moving target. That target could be hit far more easily with adjustments to the FPGA configuration than by adjusting four copies of analog hardware.

The chassis is convection cooled (no fans). Total system dissipation is about 35 W; the largest contributor (15 W) is the 352.5 MHz amplifier that drives the LO port of four mixers with +20 dBm each. The ADS808s are powered down when not in use, which avoids 2 W of dissipation on the digital board. These chips make the complete transition from sleep mode to accurate conversions in about 40 microseconds.

Programming

The cavity detune angle must be computed from RF waveforms in order to close the tuner feedback loop. Initial attempts to use the trailing edge of the field probe signal were stymied by the coupling between the cavity and the output cavity of the 20 kW driver amplifier, and by the fact that the tune of this coupled system changed as a function of power level. We therefore switched to a method based on the leading edge of the reflected wave. The host computer analyzes 500 points of raw ADC data and generates a six-parameter fit to the exponential decay waveform, in less than 1 ms. The result is an unambiguous measure of cavity frequency error. A simple combination of averaging, glitch rejection, and deadbanding drove bang-bang control of an external (network connected) stepper motor controller. Once realistic bounds were set on the fit parameters and quality, and the sign of the feedback was selected correctly, this system ran flawlessly.

The EPICS interface was based on xcas. The FPGA delivered an interrupt at the end of each RF pulse, triggering software to exchange data between the EPICS data structures and the FPGA before the next pulse started, 15 ms later. The firmware and software used a handshake flag to monitor this process, as a check that the software was adequately responsive. No such errors were logged, despite the use of a stock, non-real-time Linux kernel.

The development of the FPGA firmware and the support software was done in a coordinated manner, so that each task could use an appropriate mix of hardware and software. Typical race conditions were eliminated at the source. For example, latched error conditions were picked up by the software with a read-and-clear function; the software, working in a single-thread process model, could then increment error counters for its internal use.
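A minimal sketch of that read-and-clear pattern, with hypothetical register names and addresses (the real register map lives in the 64K x 16 space described earlier), might look like this in C:

    /* Hypothetical sketch of the read-and-clear error handling.  The FPGA  */
    /* clears the latch as a side effect of the bus read, so the single-    */
    /* threaded software never races the hardware.                          */
    #include <stdint.h>

    #define NERR            16
    #define REG_ERR_LATCH   0x0012              /* hypothetical offset      */

    extern volatile uint16_t *fpga_regs;        /* mapped FPGA register bank */
    static unsigned long err_count[NERR];       /* software-side counters    */

    void poll_error_latch(void)
    {
        uint16_t latch = fpga_regs[REG_ERR_LATCH];  /* read clears the latch */
        for (int bit = 0; bit < NERR; bit++)
            if (latch & (1u << bit))
                err_count[bit]++;               /* no locking needed         */
    }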
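The slow tuner loop described at the start of this section reduces, once the six-parameter fit has produced a frequency error, to a few lines of logic per pulse. The following sketch is a paraphrase in C; the averaging length, glitch threshold, deadband, and send_tuner_step() interface are illustrative stand-ins, not project values.

    /* Sketch of the averaging / glitch rejection / deadband bang-bang tuner */
    /* loop.  All thresholds are illustrative; send_tuner_step() stands in   */
    /* for the network message to the stepper motor controller.              */
    #define NAVG          16
    #define GLITCH_HZ     5000.0     /* reject obviously bad fits            */
    #define DEADBAND_HZ    200.0     /* do nothing inside +/-200 Hz          */

    extern void send_tuner_step(int direction);  /* +1 or -1                 */

    void tuner_update(double freq_err_hz)        /* called once per RF pulse */
    {
        static double buf[NAVG];
        static int idx, filled;

        if (freq_err_hz > GLITCH_HZ || freq_err_hz < -GLITCH_HZ)
            return;                              /* glitch rejection         */
        buf[idx] = freq_err_hz;
        idx = (idx + 1) % NAVG;
        if (filled < NAVG) { filled++; return; }

        double avg = 0.0;
        for (int i = 0; i < NAVG; i++)
            avg += buf[i];
        avg /= NAVG;

        if (avg >  DEADBAND_HZ) send_tuner_step(-1);  /* sign set empirically */
        if (avg < -DEADBAND_HZ) send_tuner_step(+1);
    }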
The FPGA logic was coded entirely in synthesizable VHDL, using Virtex primitives only for RAM, multipliers, and clock distribution. No special constraints were required to meet the 40 MHz clock rate. All of the signal processing logic in the FPGA was configured in a single 40 MHz clock domain; a small amount of logic was placed in the clock domain of the host bus, 25 MHz. The usual techniques for crossing the two domains were mostly unnecessary, as the fast signal processing takes place within the 1 ms beam pulse and the host activity is limited to an interrupt routine following the pulse. The handshake discussed above was used to verify this assumption. Self-check logic was added to detect missing or erratic clocks, triggers, or synchronization pulses.

In the available commissioning time, we learned how to systematically turn on a cavity, adjusting the various phases and gains. That did not leave any time to codify these techniques into automatic startup and self-tune software. Figure 5 shows an example of a short RF pulse, with a beam loading transient that starts at the 120 microsecond mark. These data were saved from a running LLRF control system and are converted here from real/imaginary form to magnitude for display.

Acknowledgements

The authors would like to acknowledge the support of, and equipment from, the LANL SNS LLRF team. The support of the ORNL project office is also acknowledged.