[IEEE 2014 15th International Radar Symposium (IRS) - Gdansk, Poland (2014.6.16-2014.6.18)] 2014...

Computational Performance Improvements of Multiple Hypothesis Tracking Algorithm

1) Marek Kwiatkowski, 1) Miosław Sankowski 1) PIT-RADWAR S.A.

Gdańsk, Poland [email protected]

1,2) Dawid Łukwiński 2) Dept. of Electronics, Telecommunications and Informatics

Gdańsk University of Technology Gdańsk, Poland

Abstract—Multiple hypothesis tracking (MHT) is considered the preferred solution for track formation and target tracking in complex environments. The main difficulty with MHT is its high computing power requirement. This paper deals with several aspects of the computational performance of this algorithm. Improvements such as space division, multithread processing and fast hypothesis generation techniques are considered.

Index Terms—Target tracking, multiple hypothesis tracking, multithread processing.

I. INTRODUCTION

The objective of target tracking is to automatically analyse the incoming data in order to correlate measurements and establish hypotheses on the real targets in the controlled space, which are represented in the system by entities called tracks. Modern radar systems require complex processing methods to achieve the expected quality of results. For target tracking, one such method is the multiple hypothesis tracker (MHT).

The MHT algorithm is based on a deferred decision logic, in which alternative data association hypotheses are formed whenever conflict situations occur. Then, rather than choosing the best hypothesis or combining them, the hypotheses are propagated into the future, in anticipation that subsequent data will help resolve the uncertainties. Such an approach has one major drawback: very high computational complexity.

In this paper three MHT improvements are discussed:

• limiting association area by dividing observed space into a number of range-azimuth sectors,

• multithread (parallel) processing,

• fast global hypothesis generation algorithm.

The above methods do not exhaust all possible ways of improving the performance of the MHT. Nevertheless, the discussed techniques make a practical implementation of the MHT in a real radar system feasible.

II. PERFORMANCE IMPROVEMENTS

A. Space division

Dividing the observed space into range-azimuth sectors, as illustrated in Fig. 1, is a natural way of limiting the association area in typical radar applications and is often used in practice. Such division provides easily distinguishable and clearly defined periods of time for data acquisition, association and processing. It is an important preparation step for parallel computations, as certain sectors can be processed simultaneously. It also limits the possible associations of plots to the sectors adjacent to the currently processed sector. At the same time, the non-processed sectors can be reorganised by removing old data, truncating the least likely branches, or moving data to the correct segments.

Figure 1. Division of space into sectors and segments

Division of space for the tracking algorithm must take into consideration all relevant classes of radar targets. The sizes of the segments reflect the distance that the considered targets can cover when travelling at their maximum speeds during one antenna rotation. A target shall not be associated with more than two adjacent segments. An additional difficulty is observed for the areas close to the radar, where the sectors can be very narrow.

It is worth noting that for a uniform split of the range the areas covered by subsequent segments get larger with increasing distance from the radar. To ensure uniform segment areas, the following formula is applied to calculate their distance borders:

$$R_i = R_{\max}\sqrt{\frac{i}{\mathrm{MAX\_SEG}}}, \quad i = 1, \ldots, \mathrm{MAX\_SEG} \qquad (1)$$

where i is the segment number, R_i is the upper distance border of the i-th segment, R_max denotes the maximum (instrumented) radar range, and MAX_SEG is the number of segments. In practice another formula is also used for calculating the segment number:

$$i = \left( \mathrm{MAX\_SEG} \cdot \frac{r^2}{R_{\max}^2} \right) \bmod \mathrm{MAX\_SEG} \qquad (2)$$

where r is the distance between the radar and a plot or a track. This approach makes the segments located close to the radar narrow (in azimuth) and long (in distance), while the segments at far distance are wide and short. One must also remember that splitting the area into too many sectors and segments would no longer guarantee that a moving target transits to at most the adjacent segment during a single rotation.
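The two formulas can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the borders are taken as equal-area borders R_i = R_max·sqrt(i/MAX_SEG), the segment number of (2) is truncated to an integer and counted from zero, and the function names `segment_borders` and `segment_index` are hypothetical.

```python
import math

def segment_borders(r_max: float, max_seg: int) -> list[float]:
    """Upper range border of each segment, chosen so that all range rings have equal area."""
    return [r_max * math.sqrt(i / max_seg) for i in range(1, max_seg + 1)]

def segment_index(r: float, r_max: float, max_seg: int) -> int:
    """Zero-based segment number of a plot or track at range r (assumed integer form of (2))."""
    # The modulo wraps r == r_max back to 0, so a caller may prefer to clamp r below r_max.
    return int(max_seg * r * r / (r_max * r_max)) % max_seg

borders = segment_borders(r_max=100.0, max_seg=4)
print([round(b, 1) for b in borders])   # [50.0, 70.7, 86.6, 100.0]
print(segment_index(60.0, 100.0, 4))    # 1 (between the 50.0 and 70.7 borders)
```

The printed borders show the effect described above: the innermost segment is 50 range units long, the outermost only about 13.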

Another important problem is the interaction among tracks and plots near segments’ borders. This is important especially when establishing a hypothesis referring to a track that can be linked to plots located in the adjacent segment. In order to manage such situations, for new hypotheses one has to involve updated tracks from adjacent segments.
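A possible way to collect the cells whose tracks have to be considered for a given cell is sketched below. It assumes that cells are indexed by (sector, segment), that azimuth wraps around while range does not, and that only directly adjacent cells matter; the helper name `neighbour_cells` is hypothetical.

```python
def neighbour_cells(sector, segment, n_sectors, n_segments):
    """Cells (sector, segment) whose tracks may gate with plots in the given cell."""
    cells = []
    for d_seg in (-1, 0, 1):
        seg = segment + d_seg
        if not 0 <= seg < n_segments:            # range does not wrap around
            continue
        for d_sec in (-1, 0, 1):
            sec = (sector + d_sec) % n_sectors   # azimuth wraps around 360 degrees
            if (sec, seg) not in cells:          # avoid duplicates when n_sectors < 3
                cells.append((sec, seg))
    return cells

print(neighbour_cells(0, 0, n_sectors=8, n_segments=8))
# [(7, 0), (0, 0), (1, 0), (7, 1), (0, 1), (1, 1)]
```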

Data processing is sequential and starts after acquiring a given sector data. Data from subsequent segments within the sector can be processed sequentially or in parallel.

B. Multithread processing

The development of modern computer hardware is moving towards multicore solutions. Because of power and cooling constraints, adding more processing units is much more feasible than increasing the processing rate of a single core.

The MHT algorithm can be decomposed for parallel processing in two main areas:

• gating and association (expanding hypotheses tree),

• filtering (state estimation).

Parallel processing schemes require access control to shared data. The main structure common for every computing thread is the hypotheses tree. Several techniques can be used for providing thread-safe access to shared data structures.

A major element that requires controlled thread access is the set of track tree nodes, which are subject to write and change operations. For the set of plots that are accessed only in a read-only manner, such conflict does not take place. Therefore, our gating and filtering functions are node-oriented.

Data are shared among threads using a list mechanism, which eliminates the problem of accessing variables at the same time. Before a thread accesses any list element, in either write or read mode, it locks the element against simultaneous access by other threads. If no earlier lock is found, the current lock succeeds and the thread performs the required operations. Otherwise it waits until the required node is unlocked. This data access model is illustrated in Fig. 2.
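The lock-then-access pattern can be illustrated with a short Python sketch. It assumes one lock per node and uses the standard `threading.Lock`; the class `TrackNode` and its `score` field are hypothetical stand-ins for the track tree nodes.

```python
import threading

class TrackNode:
    """Hypothetical tree node guarded by its own lock."""
    def __init__(self, score):
        self.score = score
        self._lock = threading.Lock()

    def update(self, delta):
        with self._lock:          # waits here if another thread holds the lock
            self.score += delta   # protected read-modify-write

nodes = [TrackNode(0.0) for _ in range(3)]

def worker():
    for _ in range(1000):
        for node in nodes:
            node.update(1.0)

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print([n.score for n in nodes])  # [4000.0, 4000.0, 4000.0]
```

Because every update happens under the node's lock, no increment is lost regardless of how the four threads interleave.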

In order to minimise the total time spent by threads waiting for and accessing shared data, each thread gets an assumed amount of data. Based on some basic experiments, the lengths of the accessed data lists are set to 5 for gating and 50 for state estimation. This difference comes from the specific operations performed at these stages.
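The idea of handing each thread a fixed-length block of the list per lock acquisition can be sketched as follows. The block lengths 5 and 50 come from the text; the function `process_in_chunks` and its structure are assumptions made for illustration. In CPython the global interpreter lock prevents a real speed-up for pure Python work, so the sketch shows only the access pattern.

```python
import threading

GATING_CHUNK = 5    # list length per access for gating (value quoted in the text)
FILTER_CHUNK = 50   # list length per access for state estimation (value quoted in the text)

def process_in_chunks(items, chunk, n_threads, func):
    """Apply func to every item; each thread claims `chunk` items per lock acquisition."""
    lock = threading.Lock()
    next_index = 0
    results = [None] * len(items)

    def worker():
        nonlocal next_index
        while True:
            with lock:                      # short critical section: claim a block only
                start = next_index
                next_index = min(start + chunk, len(items))
                stop = next_index
            if start >= stop:
                return                      # nothing left to claim
            for k in range(start, stop):    # the actual work runs outside the lock
                results[k] = func(items[k])

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

squares = process_in_chunks(list(range(23)), GATING_CHUNK, n_threads=4, func=lambda x: x * x)
print(squares[:6])  # [0, 1, 4, 9, 16, 25]
```

A longer block means fewer lock acquisitions but a coarser load balance, which is one plausible reason for choosing different lengths for the two stages.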

Figure 2. Block diagram of controlled access to shared data

C. Fast global hypothesis generation

An optimum search for the best global hypothesis is extremely expensive in terms of computational burden. An alternative greedy-class algorithm is therefore employed. In general the greedy strategy does not produce an optimal solution, but it yields a locally optimal solution at a reasonable cost.

The greedy algorithm used has the following form:

1) Sort the list in descending rating order.
2) Establish an “empty” hypothesis.
3) For each track from the list:
   a. Check if it is compatible with the hypothesis.
   b. If yes, add the track to the hypothesis.

The method is fast, though it does not guarantee an optimum solution. In order to increase the chance of ending with the best solution, a few hypotheses can be followed from different starting points. By employing multithread processing, the time required for this increased burden can be reduced.
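The steps above can be sketched in Python as follows. The compatibility test is an assumption: two tracks are treated as compatible when they share no plot, which is a common convention in track-oriented MHT but is not stated in the text. The function name `greedy_hypothesis` and the (score, plot set) representation are hypothetical.

```python
def greedy_hypothesis(tracks):
    """Build one global hypothesis from (score, plot_ids) pairs using the greedy rule."""
    hypothesis = []          # step 2: start with an empty hypothesis
    used_plots = set()
    # Step 1: visit tracks in descending rating order.
    for score, plots in sorted(tracks, key=lambda t: t[0], reverse=True):
        # Step 3a: compatible (by assumption) if no plot is already used.
        if used_plots.isdisjoint(plots):
            hypothesis.append((score, plots))   # step 3b
            used_plots |= plots
    return hypothesis

tracks = [(10.0, {1, 2}), (7.0, {1}), (7.0, {2})]
result = greedy_hypothesis(tracks)
print(sum(score for score, _ in result))  # 10.0
```

In this small example the greedy rule keeps only the best-rated track (total 10.0), whereas the two lower-rated tracks together would score 14.0. This is the kind of suboptimal outcome that starting from different points is meant to mitigate.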

III. PERFORMANCE TESTS OF ALGORITHM

The main objective of the study is to investigate the possibility of decomposing the MHT algorithm for parallel processing to increase the overall performance of the tracking process. For this purpose, simulations are conducted using up to four threads and 5 to 100 targets in an environment with various levels of false positives (hereafter also called noise): 0, 0.001/km³ and 0.01/km³, which correspond to 0, 80 and 800 false positives per antenna revolution. The impact of division into azimuthal sectors and range segments is also checked: 1 segment and 1 sector, 1-8, 8-1 and 8-8. Results of 240 simulations covering the possible combinations of parameters are presented. The tests are conducted on Red Hat Enterprise Linux on a computer with a quad-core Intel Xeon X5570 processor clocked at 2.93 GHz.
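The count of 240 simulations is consistent with a full parameter grid, as the short check below shows. The list of target counts (5, 15, 30, 50, 100) is an assumption inferred from the values mentioned in the results; the other lists follow the description above.

```python
from itertools import product

threads = [1, 2, 3, 4]
targets = [5, 15, 30, 50, 100]            # assumed from the values quoted in the results
noise_per_km3 = [0.0, 0.001, 0.01]
divisions = [(1, 1), (1, 8), (8, 1), (8, 8)]

runs = list(product(threads, targets, noise_per_km3, divisions))
print(len(runs))  # 240
```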

The simulated trajectories of objects cover 500 seconds with a 2 s antenna revolution period. They differ in the initial position and course, with a similar shape of the flight path. An illustration is shown in Fig. 3.

Based on the collected data, the simulation execution time and the scaling factor are analysed for different numbers of threads and various space division schemes. The point of reference is the execution time of one thread with no division into azimuth-distance areas.

The scaling factor is determined using

$$W_i = \frac{t_i}{t_1}, \quad \text{for } i = 1, 2, 4 \qquad (3)$$

where t_1 is the reference time and t_i is the computing time using i threads. The results are summarized in the following categories: overall performance, scaling with respect to the division of space, and scaling with respect to the number of threads.
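As a worked example, the scaling factor can be evaluated for the summed gating times without space division that are reported later in Table II. The sketch assumes that the factor is the ratio of the time with i threads to the single-thread reference time, so that values below 1 indicate a speed-up; the function name `scaling_factor` is hypothetical.

```python
def scaling_factor(t_i: float, t_1: float) -> float:
    """Ratio of the time with i threads to the single-thread reference time."""
    return t_i / t_1

# Summed gating times [s] for 100 targets, 1 sector and 1 segment (Table II).
gating_times = {1: 153.21, 2: 82.12, 3: 57.96, 4: 39.79}
for n_threads, t in gating_times.items():
    print(n_threads, round(scaling_factor(t, gating_times[1]), 2))
# 1 1.0
# 2 0.54
# 3 0.38
# 4 0.26
```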

The data are presented in various ways and focus on various aspects. The first section, which describes the speed of operation, evaluates the entire simulation execution time (the lower the score the better). Figures in this section are grouped with respect to the division of space and each graph refers to the case of a specific scenario.

The following sections describe how the speed of operation changes with the division of space and the number of threads. The reference point is the computation time for non-divided space or for a single thread. The graphs are scaled according to formula (3), so that the reference value is 1, values below 1 indicate an improvement and values above 1 a deterioration. It is worth noting that the level of scaling says nothing about the absolute calculation time. A case with perfect scaling may still exceed the available time and, vice versa, a case with unfavourable scaling may fit within the specified limit.

Figure 3. Example of simulated trajectories

A. Overall performance

Figs. 4 and 5 show the simulation execution times for different numbers of objects and different levels of false positives, with no space division and different numbers of threads.

In scenarios without false positives (blue columns in the graphs in Figs. 4 to 6), a single thread is enough to keep the computation time below the simulated time span (500 s). After adding noise at a level of 0.001/km³ (80 echoes per revolution, maroon columns), the algorithm exceeds the allotted time only for 100 objects; however, using multiple threads speeds up the calculations enough to fit within the limit. In the most difficult case (yellow columns), with false detections at a level of 0.01/km³ (800 echoes), the simulation stays within the limit of 500 seconds only for five objects and 3 or 4 threads.

Figure 4. Simulation time for 5 and 30 objects without space division

It is worth noting that the calculation time does not grow linearly with the number of objects and the level of false positives. A tenfold increase in noise lengthens the calculation time by a factor of 14 to 170, depending on the number of targets (the more objects, the smaller the increase). Increasing the number of targets from 50 to 100 causes a 5-fold increase, from 15 to 30 only about a 3-fold increase, and from 5 to 15 about a 6.5-fold increase. This illustrates how sensitive the MHT is to the amount and quality of incoming data. Increasing the amount of data by adding false positives lengthens the computation less than increasing the number of "real" targets.

For comparable numbers of plots computation times are:

• 100 objects, noise 0 – 263.48 s,

• 30 objects, noise 0.001 – 61.18 s,

• 50 objects, noise 0.001 – 110.17 s.

The algorithm is loaded less by unrelated data (noise) than by plots from simulated targets.
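The comparison can be made concrete by estimating the number of plots in each case. The sketch assumes one plot per target per scan (no missed detections) and 250 scans (500 s at a 2 s revolution period); under these assumptions the three cases contain between 25000 and 32500 plots, yet the time spent per plot differs by a factor of three to five.

```python
SCANS = 500 // 2   # 500 s of simulated time, 2 s antenna revolution

def total_plots(targets: int, false_per_scan: int) -> int:
    """Plots over the whole simulation, assuming every target is detected in every scan."""
    return (targets + false_per_scan) * SCANS

cases = [
    ("100 objects, noise 0",    100,  0, 263.48),
    ("30 objects, noise 0.001",  30, 80,  61.18),
    ("50 objects, noise 0.001",  50, 80, 110.17),
]
for label, targets, false_per_scan, seconds in cases:
    plots = total_plots(targets, false_per_scan)
    print(f"{label}: {plots} plots, {1000 * seconds / plots:.2f} ms per plot")
# 100 objects, noise 0: 25000 plots, 10.54 ms per plot
# 30 objects, noise 0.001: 27500 plots, 2.22 ms per plot
# 50 objects, noise 0.001: 32500 plots, 3.39 ms per plot
```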

Space division reduces the number of attempts to associate plots with tracks, which results in a significant increase in computational efficiency. It is worth noting that for a low intensity of incoming data the performance deteriorates (Fig. 6). This is described in the following sections.

The best results are achieved with division into 8 azimuth sectors and 8 distance segments. Compared to the case without division, the improvement is 50-fold for 1 thread, 100 objects and a noise level of 800 false positives (Fig. 6). The more incoming data, the greater the benefit of the division.

Figure 5. Simulation time for 50 and 100 objects without space division

B. Scaling with respect to space division and number of threads

Figs. 7 and 8 show the scaling of the calculation time with respect to space division. Each chart shows the situation for a specific number of threads and level of false positives. Each column group represents a different space division scheme for various numbers of targets. The first group is the reference and relates to the lack of division; its value is set to 1. Values below 1 represent better performance, values above 1 worse performance.

At zero noise (Fig. 7) it is difficult to tell whether division into segments or into sectors produces better results. For a smaller number of targets it is preferable to split into segments. Operating on 8 sectors and 8 segments degrades performance relative to the other divisions. This is especially noticeable for a small number of targets and 4 threads, where the simulation takes longer than without division, for example almost 4.6 times longer for 5 objects and 4 threads. This is not a problem, because the total computational burden is generally very low in simple scenarios.

With higher noise the space division shows its advantages (Fig. 8). For the largest number of targets a fiftyfold improvement is achieved. For fewer targets the improvement ranges from fivefold to twentyfold. For a small number of plots the results degrade when the space is divided. This is due to the constant processing cost of a single space cell: the higher the degree of division, the higher the cost of handling such a structure.

Figure 6. Simulation time for 5 and 100 objects, 8 sectors and 8 segments

The next analysed aspect is how the number of threads affects the processing time. In ideal conditions, the multithread processing time equals the single-thread processing time divided by the number of threads. In the modelled case the thread decomposition was implemented for two stages:

• gating and hypothesis tree expansion,

• Kalman filtering.

The ideal conditions are not feasible mainly because not all computation stages were decomposed. Moreover, multithread processing requires special handling of data structures, which also has negative impact on performance scaling.
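The effect of stages that are not decomposed can be illustrated with Amdahl's law, which is introduced here only as an explanatory model and is not part of the original analysis. The parallel fraction of 0.8 is a hypothetical value chosen for illustration, and overheads such as locking are ignored.

```python
def time_ratio(parallel_fraction: float, n_threads: int) -> float:
    """Amdahl's law: run time relative to one thread when only part of the work is parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return serial_fraction + parallel_fraction / n_threads

for n in (1, 2, 3, 4):
    # Columns: threads, ratio with 80% parallel work (assumed), ideal ratio 1/n.
    print(n, round(time_ratio(0.8, n), 2), round(1 / n, 2))
# 1 1.0 1.0
# 2 0.6 0.5
# 3 0.47 0.33
# 4 0.4 0.25
```

Even with 80% of the work decomposed, four threads reach a ratio of only 0.4 instead of the ideal 0.25, and each additional thread contributes less than the previous one.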

Fig. 9 shows scaling without space division. An improvement can be seen for two and more threads. The best scaling is obtained for scenarios with a medium number of targets. Adding processing threads does not provide a constant improvement in performance, and at a certain point it can even decrease it (e.g. the 4th thread in a scenario with few targets). Nevertheless, without space division the overall processing performance is not satisfactory. Fig. 10 shows the same scenario with space division. In this case scaling is far worse. For a low number of targets multithread processing takes more time than a single thread. For fifty and more targets, adding a second and a third thread improves performance. These gains are not as big as in the previous case, but one should remember that space division by itself significantly shortens the processing time.

Processing times of the individual functions of the algorithm are gathered in Tables I and II.

Figure 7. Scaling according to space division, without clutter.

Figure 8. Scaling according to space division, 800 false detections per scan.

Figure 9. Scaling according to number of threads, without space division.

Figure 10. Scaling according to number of threads, with space division.

Table I shows results for a low number of objects without noise. As previously mentioned, without space division the algorithm scales well. Adding a second gating thread improves its performance almost twofold. A third thread gives a very small gain, while a fourth degrades performance. Kalman filtering improves slightly with every thread added. The results for divided space are the opposite: each additional thread decreases performance. An important point is the overall performance improvement: all time values are shorter by a factor of 10 to 100 compared to the algorithm without space division. Because of this, even though the multithread version performs worse with space division in this case, the differences are insignificant due to the generally very small computational load.

TABLE I. PROCESSING TIMES [S] FOR 5 TARGETS WITHOUT CLUTTER.

| Number of threads | Value | 1 sector, 1 segment: gating | 1 sector, 1 segment: filtering | 8 sectors, 8 segments: gating | 8 sectors, 8 segments: filtering |
|---|---|---|---|---|---|
| One | sum | 0.40084 | 0.35732 | 0.49770 | 0.52572 |
| One | avg | 0.00160 | 0.00143 | 3.11E-05 | 3.29E-05 |
| Two | sum | 0.20660 | 0.28144 | 0.64995 | 0.59642 |
| Two | avg | 0.00083 | 0.00113 | 4.06E-05 | 3.73E-05 |
| Three | sum | 0.18584 | 0.23186 | 0.77269 | 0.70470 |
| Three | avg | 0.00074 | 0.00093 | 4.83E-05 | 4.4E-05 |
| Four | sum | 0.19271 | 0.21986 | 1.31032 | 1.20323 |
| Four | avg | 0.00077 | 0.00088 | 8.19E-05 | 7.52E-05 |

Results for a high number of targets with dense noise are shown in Table II. In such an intense scenario each proposed method gives a noticeable improvement.

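TABLE II. PROCESSING TIMES [S] FOR 100 TARGETS.

| Number of threads | Value | 1 sector, 1 segment, no clutter: gating | 1 sector, 1 segment, no clutter: filtering | 8 sectors, 8 segments, with clutter*: gating | 8 sectors, 8 segments, with clutter*: filtering |
|---|---|---|---|---|---|
| One | sum | 153.21 | 31.77 | 52.79 | 24.38 |
| One | avg | 0.61 | 0.13 | 0.0033 | 0.0015 |
| Two | sum | 82.12 | 19.70 | 32.70 | 18.49 |
| Two | avg | 0.33 | 0.08 | 0.002 | 0.0012 |
| Three | sum | 57.96 | 18.20 | 27.13 | 15.78 |
| Three | avg | 0.23 | 0.073 | 0.0017 | 0.001 |
| Four | sum | 39.79 | 17.73 | 24.87 | 14.92 |
| Four | avg | 0.16 | 0.071 | 0.0016 | 0.001 |

*About 800 false detections per scan.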

Adding more threads is most efficient for the gating function. Four-thread processing shortens the overall gating time by about 74% (no space division) and 53% (with space division). For the filtering function these improvements are 44% and 39% respectively. Enabling space division can yield an additional 10% for gating and 9% for filtering. One can also notice that the performance gain from space division (58%) is similar to that from adding a second and a third thread (59%). Using all the improvements covered in Table II, it is possible to shorten the processing time by almost 80%.
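The percentages can be recomputed from the summed times in Table II. The sketch below is only a cross-check of the arithmetic; the function name `reduction_percent` and the dictionary names are hypothetical, and the combined figures assume that gating and filtering times are simply added.

```python
def reduction_percent(t_before: float, t_after: float) -> float:
    """Relative shortening of the processing time, in percent."""
    return 100.0 * (1.0 - t_after / t_before)

# Summed times [s] from Table II, keyed by number of threads.
gating_1x1    = {1: 153.21, 2: 82.12, 3: 57.96, 4: 39.79}
filtering_1x1 = {1: 31.77, 2: 19.70, 3: 18.20, 4: 17.73}
gating_8x8    = {1: 52.79, 2: 32.70, 3: 27.13, 4: 24.87}
filtering_8x8 = {1: 24.38, 2: 18.49, 3: 15.78, 4: 14.92}

print(round(reduction_percent(gating_1x1[1], gating_1x1[4])))        # 74
print(round(reduction_percent(gating_8x8[1], gating_8x8[4])))        # 53
print(round(reduction_percent(filtering_1x1[1], filtering_1x1[4])))  # 44
print(round(reduction_percent(filtering_8x8[1], filtering_8x8[4])))  # 39

baseline = gating_1x1[1] + filtering_1x1[1]        # one thread, no division
division_only = gating_8x8[1] + filtering_8x8[1]   # one thread, 8 x 8 division
three_threads = gating_1x1[3] + filtering_1x1[3]   # three threads, no division
everything = gating_8x8[4] + filtering_8x8[4]      # four threads, 8 x 8 division

print(round(reduction_percent(baseline, division_only)))  # 58
print(round(reduction_percent(baseline, three_threads)))  # 59
print(round(reduction_percent(baseline, everything)))     # 78
```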

The last proposed improvement is hypothesis formation using a greedy-class algorithm. Fig. 11 shows the global hypothesis formation time for the full (optimal) and the greedy method in a rather simple scenario with 3 targets and moderate clutter.

Figure 11. Global hypothesis formation time

As one can see, the greedy algorithm is fast and stable, taking 10 to 100 microseconds per scan. In the same scenario the full hypothesis search takes up to a few seconds per scan, which is unacceptable in real-time applications. Because greedy algorithms are suboptimal, one can expect the greedy method to perform worse than the full search. Although several synthetic tests run during the research were able to highlight some differences, these were minor (less than 5% of the hypothesis score) and temporary (lasting only one or two scans). In most tests no differences were observed at all.

IV. CONCLUSIONS

The presented research shows that the considered improvements have a positive impact on the performance of the MHT algorithm. Space division is necessary to properly process a complicated scenario with many targets in a cluttered environment. Decomposition of the gating and filtering functions for multithread processing also shortens processing times. Using more threads, up to the four discussed in this paper, significantly increases the performance. The only drawback of the algorithm involving all the proposed methods is decreased performance for a low number of targets in a clean environment. However, in such a situation only a small fraction of the processing power is utilised, so even the worst-case scenario of a twofold decrease is practically insignificant.

Simulation results show that the proposed algorithm is capable of tracking over 100 targets in a cluttered environment using a modern single-board computer.

