A specialized processor for track reconstruction at the LHC crossing rate

A. Abba<sup>2</sup>, M. Citterio<sup>2</sup>, F. Caponio<sup>2</sup>, A. Cusimano<sup>2</sup>, A. Geraci<sup>2</sup>, P. Marino<sup>3</sup>, M. Morello<sup>3</sup>, N. Neri<sup>2</sup>, A. Piucci<sup>3</sup>, M. Petruzzo<sup>2</sup>, Giovanni Punzi<sup>3</sup>, L. Ristori<sup>3,4</sup>, F. Spinella<sup>3</sup>, S. Stracka<sup>3</sup>, D. Tonelli<sup>1</sup>

<sup>1</sup>CERN <sup>2</sup>Politecnico/INFN-Milano <sup>3</sup>University/INFN-Pisa <sup>4</sup>Fermilab

INSTR 2014 BINP, Novosibirsk, Russia

# Motivation

- The LHC has opened a new era, in instrumentation as well
- Exploiting high luminosity (HL) will pose even greater challenges
- Data acquisition and reconstruction are among the toughest issues
- A big part of the problem is reconstructing charged-particle trajectories
  - Large combinatorial problem that calls for high parallelization
  - In many cases latency is an issue, because of the need for buffering (e.g. in the CMS tracker)

#### Some past examples of real-time track reconstruction

| Name | Tech. | Exp.     | Year | Event rate | Clock    | Cycles/event | Latency  |
|------|-------|----------|------|------------|----------|--------------|----------|
| XFT  | FPGA  | CDF-L0   | 2000 | 2.5 MHz    | 200 MHz  | 80           | 4 µs     |
| SVT  | AM    | CDF-L2   | 2000 | 0.03 MHz   | 40 MHz   | ~1600        | <20 µs   |
| FTK  | AM    | ATLAS-L2 | 2014 | 0.1 MHz    | ~200 MHz | ~2000        | O(10 µs) |

Compare with the requirements of an L0 at the LHC:

| Name | Tech. | Exp.   | Year  | Event rate | Clock  | Cycles/event | Latency |
|------|-------|--------|-------|------------|--------|--------------|---------|
| ?    | ?     | LHC-L0 | ~2018 | 40 MHz     | ~1 GHz | ~25          | few µs  |

- The task of L0 tracking at LHC appears daunting despite the progress of electronics.
- Any complex tracking calls for O(10<sup>3</sup>) clock cycles/event (both in latency and throughput)
- No known example of a system making a non-trivial pattern reconstruction in O(25) time units
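The cycle budgets quoted in the table follow from dividing the clock frequency by the event rate. A minimal sketch of that arithmetic, using the quoted figures for three of the systems (the function and dictionary names are illustrative only):

```python
# Clock-cycle budget per event = clock frequency / event rate.
# Frequencies in Hz, taken from the figures quoted in the table.
systems = {
    "XFT":    {"clock_hz": 200_000_000,   "event_rate_hz": 2_500_000},
    "FTK":    {"clock_hz": 200_000_000,   "event_rate_hz": 100_000},
    "LHC-L0": {"clock_hz": 1_000_000_000, "event_rate_hz": 40_000_000},
}

def cycles_per_event(clock_hz, event_rate_hz):
    """Number of clock cycles available to process one event."""
    return clock_hz / event_rate_hz

for name, s in systems.items():
    print(name, cycles_per_event(**s))
# XFT 80.0
# FTK 2000.0
# LHC-L0 25.0
```

Even with a ~1 GHz clock, an L0 at the LHC has only about 25 cycles per event, against thousands for the AM-based systems.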

### Well, maybe I can think of ONE example...



Adapted from H. Kirchner, S.J. Thorpe / Vision Research 46 (2006) 1762–1776

- The early visual areas (V1) of the human brain produce a recognizable sketch of the image in ~30 ms
- The maximum neuron firing frequency is ~1 kHz → ~30 time units
- A far-fetched example? See [Del Viva MM, Punzi G, Benedetti D, PLoS ONE (2013), DOI: 10.1371/journal.pone.0069154] for experimental evidence that V1 functionality can be quantitatively modeled as a "trigger".

#### What's special about the "brain algorithm"?

- Parallelism, of course, but SVT and FTK are based on Associative Memories (AM), which are highly parallel devices as well...
- Two important differences, though:
  - Hit processing in the AM still happens serially, while the visual system has no such serialization → a lot of the processing power lies in the connectivity
  - The AM has "rigid templates", while the brain works by interpolating analog responses → this saves a lot of internal storage and makes it easier to deal with "missing layers"

- Can we engineer these general concepts into a viable trigger system?

# A "cellular" tracking algorithm

Inspired by the mechanism of visual receptive fields [D.H. Hubel, T.N. Wiesel, J. Physiol. 148 (1959) 574]





November 17, 1999

INSTR99 - An Artificial Retina for Fast Track Finding - L. Ristori - INFN Pisa

- Not really new: a study presented by one of us at INSTR99 showed that the idea can be implemented, conceptually, in a toy tracker, although it was not considered viable at the time of the CDF SVT [NIM A453 (2000) 425-429]
- Vaguely related to "Hough transform" [P.V.C. Hough, Conf. Proc. C590914 (1959) 554]
- However, it takes *a lot* more to design an actually competitive system

Today I describe a realistic implementation on a realistic pixel detector, with existing electronic components.

#### **Geometry and track parameters**

- An array of pixel detectors
- Each detector plane provides a (x,y) point at fixed z
- Measure straight tracks in 3D (4 parameters)
- e.g.:  $\theta_x$ ,  $\theta_y$ ,  $z_0$ , d (impact parameter)
- In the presence of a magnetic field, one additional parameter p is sufficient
- No need to assume a uniform B field or perfect alignment



## Realistic geometry example



- Planned upgraded VELOPIX detector of LHCb [LHCb-INT-2013-025]
- A 6-layer telescope is picked for this exercise
- The B field is neglected

#### Mapping the detector to a receptor cell array

- An easy and intuitive way is to take two parameters from the intersection of the tracks with an arbitrary plane
- These two parameters can then be mapped to a 2D main grid
- The remaining track parameters are handled in a second step



#### Mapping the detector to a receptor cell array

- The intersections of the "base tracks" with the detectors give a map of "nerve endings" (receptors)
- Every hit on the detector produces a signal on nearby receptors, with a strength that depends on distance
- (I skip over several subtleties; for instance, effective operation requires the receptor distribution to be non-uniform)
- (Not unlike the distribution of photoreceptors in the visual system, but in our case it is all virtual, that is, implemented in the connections of the electronic network)
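The mapping described above can be illustrated with a small toy model. The sketch below makes several assumptions that are not in the talk: tracks come from the origin, so a base track is fixed by its intersection (u, v) with a reference plane; the response to a hit is a Gaussian of its distance to the receptor; and all names and numbers (`Z_PLANES`, `Z_REF`, `SIGMA`, the grid size) are hypothetical.

```python
import math

# Toy assumptions: tracks from the origin, Gaussian response, uniform grid.
Z_PLANES = [100.0, 150.0, 200.0, 250.0, 300.0, 350.0]  # mm, hypothetical 6 layers
Z_REF = 350.0    # mm, reference plane defining (u, v)
SIGMA = 0.5      # mm, hypothetical receptive-field width
N_CELLS = 21     # cells per side of the (u, v) grid
PITCH = 1.0      # mm between neighbouring cells

def cell_centre(i, j):
    """(u, v) of cell (i, j), with the grid centred on the beam axis."""
    half = (N_CELLS - 1) / 2
    return (i - half) * PITCH, (j - half) * PITCH

def receptor(i, j, z):
    """Point where the base track of cell (i, j) crosses the plane at z."""
    u, v = cell_centre(i, j)
    return u * z / Z_REF, v * z / Z_REF

def accumulate(hits):
    """hits: list of (plane_index, x, y). Returns the grid of cell excitations."""
    grid = [[0.0] * N_CELLS for _ in range(N_CELLS)]
    for i in range(N_CELLS):
        for j in range(N_CELLS):
            for k, x, y in hits:
                xr, yr = receptor(i, j, Z_PLANES[k])
                d2 = (x - xr) ** 2 + (y - yr) ** 2
                grid[i][j] += math.exp(-d2 / (2 * SIGMA ** 2))
    return grid

# One track crossing the reference plane at (u, v) = (3.0, -2.0) mm
hits = [(k, 3.0 * z / Z_REF, -2.0 * z / Z_REF) for k, z in enumerate(Z_PLANES)]
grid = accumulate(hits)
best = max(((i, j) for i in range(N_CELLS) for j in range(N_CELLS)),
           key=lambda c: grid[c[0]][c[1]])
print(best, cell_centre(*best))   # (13, 8) (3.0, -2.0)
```

The loop over all cells is sequential only for readability; in the scheme described here each hit would be delivered only to the cells near it, and all cells would accumulate in parallel.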



#### Tracks appear as clusters in the cell array



#### **Parameter extraction**

- The u, v parameters are extracted directly from the cluster centroid
- What about the other 2 or 3 parameters?
  - Add "lateral cells" and interpolate their response
    - Enough for a good estimate, thanks to the limited parameter spread
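As an illustration of the two steps above, the sketch below computes an excitation-weighted centroid over the 3x3 block around a cluster's central cell, and a weighted-mean interpolation across a main cell and two lateral cells. The function names, the 3x3 window and the specific interpolation formula are assumptions made for this example, not details taken from the talk.

```python
def centroid(grid, i0, j0, centre_of):
    """Excitation-weighted mean (u, v) over the 3x3 block around cell (i0, j0).

    grid      : 2D list of cell excitations (the block must not be all zero)
    centre_of : function mapping (i, j) to that cell's (u, v) centre
    """
    total = su = sv = 0.0
    for i in range(max(i0 - 1, 0), min(i0 + 2, len(grid))):
        for j in range(max(j0 - 1, 0), min(j0 + 2, len(grid[0]))):
            w = grid[i][j]
            u, v = centre_of(i, j)
            total += w
            su += w * u
            sv += w * v
    return su / total, sv / total

def interpolate_lateral(r_minus, r_centre, r_plus, delta):
    """Weighted-mean estimate of a further parameter from a main cell and its
    two lateral cells, placed at offsets -delta, 0, +delta in that parameter."""
    return delta * (r_plus - r_minus) / (r_minus + r_centre + r_plus)

# Toy cluster on a unit-pitch grid whose cell centres are simply (i, j)
toy = [[0.0, 1.0, 0.0],
       [1.0, 4.0, 2.0],
       [0.0, 1.0, 0.0]]
u, v = centroid(toy, 1, 1, lambda i, j: (float(i), float(j)))
print(u, v)                                      # u = 1.0, v = 10/9
print(interpolate_lateral(1.0, 4.0, 3.0, 0.5))   # 0.125
```

Because neighbouring cells respond in an analog way, the centroid can land between cell centres, which is what allows a resolution finer than the grid pitch.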



# **Results**



All resolutions are offline-grade!

#### Intermediate conclusions

- We have shown, with a realistic detector arrangement, that it is possible to reconstruct tracks and measure their parameters very well with a "brain-inspired" cell-matrix method

- This algorithm is intrinsically very parallelizable

However:

Can it actually be implemented in hardware of reasonable size and cost, with the timing needed to work at the LHC crossing frequency?

# System Architecture



## Implementation

- Use modern, large FPGA devices.
  - Large I/O capabilities: now O(Tb/s) with optical links!
  - Large internal bandwidth is a must!
  - Fully flexible, easy to program and simulate
  - Steep Moore's slope, and easy to upgrade
  - Highly reliable, easy to maintain and update
  - Industry's method of choice for complex projects with a small number of pieces (CT scanners, high-end radars...)
  - We used Altera's Stratix V
    - Same device used elsewhere in LHCb readout system.



Let's find out whether the advertisements tell the truth...


Altera's <u>28-nm Stratix® V FPGAs</u> deliver the industry's highest bandwidth, highest level of system integration, and ultimate flexibility with reduced cost and the lowest total power for high-end applications.

### Hit delivery by the switching logic





The switch network "knows" where to deliver hits
 All information about the network of connections is embedded in the network via distributed LUTs
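The idea of routing by look-up table can be sketched as follows. This is a deliberately simplified toy with a single table on a 1D plane, whereas the talk describes LUTs distributed across the switch network; the bin width, reach and receptor positions are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy: a 1D detector plane split into coarse bins, and a row of
# cells whose receptors sit at known positions on that plane.
BIN_WIDTH = 1.0   # mm
REACH = 1.0       # mm: margin within which a receptor may respond to a hit
receptors = {0: 0.2, 1: 1.1, 2: 2.0, 3: 2.9, 4: 3.8}   # cell id -> receptor x (mm)

def build_lut(receptors, n_bins):
    """Precompute, for every coarse bin, the cells that may need its hits."""
    lut = defaultdict(list)
    for b in range(n_bins):
        lo, hi = b * BIN_WIDTH, (b + 1) * BIN_WIDTH
        for cell, x in receptors.items():
            if lo - REACH <= x <= hi + REACH:
                lut[b].append(cell)
    return lut

def route(hit_x, lut):
    """Look up the destination cells of a hit; no geometry is computed here."""
    return lut[int(hit_x // BIN_WIDTH)]

lut = build_lut(receptors, n_bins=4)
print(route(2.4, lut))   # [1, 2, 3, 4]
print(route(0.5, lut))   # [0, 1, 2]
```

All geometry is resolved once, when the table is built, so delivering a hit at run time costs a single look-up.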







# **Cellular engine**



- Performs calculation of weights for a hit into a cell
- Deals with surrounding cells as well.
- Handles time-skew between events



- In a second stage, performs local clustering in parallel and queues the results for output
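A minimal sketch of what local clustering could look like: a cell is flagged as a cluster seed if its excitation exceeds a threshold and no neighbour exceeds it. The threshold value, the 8-neighbour window and the function name are assumptions made for illustration.

```python
def local_maxima(grid, threshold):
    """Return (i, j) of cells above threshold that are not exceeded by any of
    their (up to 8) neighbours. Each cell's test uses only its own
    neighbourhood, so all cells could be evaluated at the same time."""
    n_rows, n_cols = len(grid), len(grid[0])
    found = []
    for i in range(n_rows):
        for j in range(n_cols):
            w = grid[i][j]
            if w <= threshold:
                continue
            neighbours = [grid[a][b]
                          for a in range(max(i - 1, 0), min(i + 2, n_rows))
                          for b in range(max(j - 1, 0), min(j + 2, n_cols))
                          if (a, b) != (i, j)]
            if all(w >= x for x in neighbours):
                found.append((i, j))
    return found

# Toy excitation grid with two track candidates
grid = [[0.1, 0.3, 0.2, 0.0],
        [0.4, 5.2, 1.0, 0.2],
        [0.2, 0.9, 0.6, 4.1],
        [0.0, 0.1, 0.8, 1.2]]
print(local_maxima(grid, threshold=3.0))   # [(1, 1), (2, 3)]
```

The loops here are sequential for clarity; the point of the local test is that no global sorting or search is needed.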

### Track parameter estimation by cluster Center-of-Mass



# Fitting within a Stratix-V device

- All main components are implemented in VHDL and placed in the FPGA:
  - Switch
  - Engines
  - CoM
- O(10<sup>3</sup>) engines fit in one chip
  - The exact number depends on details (time ordering of pixel data, etc.)
- This implies that a meaningful tracking system can be built with O(100) chips

FPGA layout, Altera 5SGXEA7H3F35C3 (AMC 40 FPGA):

| Block              | Share of FPGA |
|--------------------|---------------|
| Interfaces, switch | 7.5-13%       |
| Engines            | 65-70%        |
| CoM units          | 12%           |
| Backup             | 5-15%         |

#### Simulation and Timing



#### Further progress: LHCb full-MC at upgrade luminosity



# CONCLUSIONS

- We showed that the "retina algorithm" actually allows real-time track reconstruction in a real HEP detector application.
- We developed a design for a real-time track processor that works at the LHC crossing frequency, with latency ~1 µs
  - Specific R&D for LHCb is already well advanced
- Empowers experiments at high luminosities to work as if reading complete tracks straight out of the detector. Might lead to fruitful future developments.