6DSLAMwykorzystujacyobliczeniaGPGPU › sites › fp7-icarus.eu › files › publications ›...

NAUKA

6DSLAM wykorzystujacy obliczenia GPGPUJanusz Będkowski∗, Geert De Cubber∗∗, Andrzej Masłowski∗

∗Instytut Automatyki i Robotyki, Politechnika Warszawska, Warsaw, Poland, ∗∗Royal Military Academy,Brussels, Belgium

Streszczenie: Głównym celem jest usprawnienie algorytmu 6DSLAM za pomocą implementacji modułu rejestracji danych wyko-rzystującą obliczenia równoległe. Moduł rejestracji danych jestoparty o algorytm ICP (ang. Iterative Closest Point), który zostałw pełni zaimplementowany w architekturze GPU-NVIDIA FERMI.W naszych badaniach koncentrujemy się na mobilnych systemachrobotycznych inspekyjno-interwencyjnych dedykowanym do pracyw niebezpiecznym środowisku. Celem jest opracowanie komplet-nego systemu, który może być wykorzystany w realnej aplikacji. Wtym artykule przedstawiamy nasze rezultaty w zakresie lokaliza-cji i budowy mapy w trybie on-line. Przedstawiamy eksperymentw rzeczywistym, rozległym środowisku. Zostały porównane dwiestrategie dopasowywania danych, klasyczna oraz wykorzystującątzw. "meta scan".

Słowa kluczowe: 6DSLAM, obliczenia równoległe

1. Introduction6D SLAM (Simultaneous Localization and Mapping) algo-rithm provides robot position (x, y, z, yaw, pitch, roll) foreach time frame. In the same time this algorithm stores 3Dclouds of points being observations for each robot position.Algorithm is composed of data registration module, loopclosing module and map refinement module. Data regis-tration module is responsible for data matching betweentwo different robot observations giving as an output con-sistent clouds. It is possible to use several techniques fordata registration for example point to point [1], point toplane [2] or hybrid approach [3]. Loop closing modulefinds locations in the map visited by robot at least twice.Map refinement module redistributes an loop-closing errorover all scan in the loop. 6D SLAM is very demanding taskbecause of large amount of data to be processed, thereforeGPGPU computation brings a possibility to develop algo-rithm’s improvements resulting on-line computation. Webelieve that in near future such algorithms implementedusing dedicated GPGPU or even FPGA boards will dras-tically improve robotic applications.

2. Related WorkAlignment and merging of two 3D scans, which are ob-tained from different sensor coordinates, with respect to areference coordinate system is called 3D registration [4][5] [6]. Park [7] proposed a real-time approach for 3Dregistration using GPU, where the registration techniqueis based on the Iterative Projection Point (IPP) algorithm.The IPP technique is a combination of point-to-plane and

point-to-projection registration schemes [8]. Processingtime for this approach is about 60ms for aligning 2 3Ddata sets of 76800 points during 30 iterations of the IPPalgorithm. Unfortunately, the IPP algorithm has a prob-lem concerning the scalability of the implementation. Fastsearching algorithms such as the k-d tree algorithm areusually used to improve the performance of the closestpoint search [9] [10]. GPU accelerated nearest neighborsearch for 3D registration is proposed in work of Qiu [11].

A fast variant of the Iterative Closest Points (ICP) al-gorithm that registers the 3D scans in a common coordi-nate system and relocalizes the robot is shown in [12].Consistent 3D maps are generated using closing loop de-tection and a global relaxation. The loop closing algorithmdetects a loop by registering the last acquired 3D scan withearlier acquired scans, e.g., the first scan. If a registrationis possible, the computed error is in a first step dividedby the number of 3D scans in the loop and distributedover all scans. The authors reported that the computationtime of about 1 min per scan is acceptable, but that fur-ther improvement is needed. An algorithm for efficient loopclosing and consistent scan alignment that avoids iterativescan matching over all scans is proposed in [13]. Detect-ing loops in the path is done by using the Euclidean dis-tance between the current and all previous poses (distancethreshold of 5 meters), or using GPS data if available. Athreshold of minimal number of intermediate scans (e.g.,20) is used to circumvent continuous loop closing withinconsecutive scans.

Another scan registration approach using 3D-NDT(Normal Distribution Transform) is shown in [14] andautomatic appearance-based loop detection from 3D laserdata using the Normal Distributions Transform is demon-strated in [15].

Vision has also been used successfully for localizationof a mobile robot [16] [17] [18]. A comparison of loop clos-ing techniques in monocular SLAM is show in [19]. Loopclosure detection in SLAM by combining visual and spatialappearance was shown in [20]. This approach relies uponmatching distinctive signatures of individual local scenesto prompt loop closure. Another advantage is the possibil-ity to enhance robustness of loop closure detection by in-corporating heterogeneous sensory observations. The laserscan is divided into smaller but sizeable segments and thecomplexity of a segment is encoded via entropy. SIFT [21](Scale Invariant Feature Transform) descriptors are alsoused to match images. In [22] a methodology combining

/2012 Pomiary Automatyka Robotyka 1

NAUKA

visual and spatial appearance is shown, whereas in [23],an approach based on multiple map intersection detectionbased on visual features appearance is shown. The authorsin [24] encode the similarity between all possible pairingsof scenes in a similarity matrix and then pose the loopclosing problem as the task of extracting statistically sig-nificant sequences of similar scenes from this matrix. Theanalysis (introspection) and decomposition (remediation)of the similarity matrix allows for the reliable detection ofloops despite the presence of repetitive and visually am-biguous scenes. Relaxing loop-closing errors in 3D mapsbased on planar surface patches were shown in [25].

3. SLAM 6D6D SLAM is composed of following components:– data registration;

– loop closing;

– map refinement;The result is consistent 3D clouds of points representingan environment. In following subsections each componentwill be discussed.

3.1. Point to point ICPThe key concept of the standard ICP algorithm can besummarized in two steps [26]:1) compute correspondences between the two scans (Near-est Neighbor Search).2) compute a transformation which minimizes distance be-tween corresponding points.Iteratively repeating these two steps should result in con-vergence to the desired transformation. Range images(scans) are defined as model set M where

(|M | = Nm) (1)

and data set D where

(|D| = Nd) (2)

The alignment of these two data sets is solved by mini-mization following cost function:

E = (R, t) =Nm∑i=1

Nd∑j=1

wij ‖mi − (Rdj + t)‖2 (3)

wij is assigned 1 if the ith point of M correspond to thejth point in D. Otherwise wij=0. R is the rotation matrix,t is the translation matrix, mi correspond to points fromthe model set M , dj correspond to points from the dataset D.

The main idea of using the GPU is to decompose the3D space into a regular grid of 256 x 256 x 256 buckets.Because we are violating the assumption of full overlap,we are forced to add a maximum matching threshold dmax

related to the dimension of single bucket. This thresholdaccounts for the fact that some points will not have anycorrespondence in the second scan. In most implementa-tions of ICP, the choice of dmax represents a trade off be-tween convergence and accuracy. A low value results inbad convergence, a large value causes incorrect correspon-dences to pull the final alignment away from the correct

value. In our implementation the choice of dmax is done bya normalization point cloud, which has XY Z coordinatesfrom the interval < −1, 1 >. We improved the state of theart algorithm described in [9] by replacing the complexk − d tree data structure to improve the performance ofthe closest point search. We where focused on a soultionthat does not need to download any data structures fromCPU instead of 3D data points, therefore we obtained analgorithm fully implemented on GPGPU. All derivationsof investigated registration method can be found in [27].Classic ICP is listed as algorithm 1.

Algorithm 1 Classic ICPINPUT: Two point clouds A = {ai}, B= {bi}, an initialtransformation T0

OUTPUT: The correct transformation T, which alignsA and BT ← T0

for iter ← 0 to maxIterations dofor i← 0 to N do

mi ← FindClosestPointInA(T · bi)if ‖mi − T · bi‖ ≤ dmax then

wi ← 1else

wi ← 0end if

end forT ← argmin

T

{∑iwi ‖T · bi −mi‖2}

end for

3.2. GPGPU implementation of classic ICPNVIDIA GPGPUs are fully programmable multi core chipsbuilt around an array of processors working in parallel.Details about the GPU architecture can be found in [28]and useful additional programming issues are published in[29]. The GPU is composed of an array of SM multiproces-sors, where each of them can launch up to 1024 co-residentconcurrent threads.

It should be noticed that available graphics units are inthe range from 1 SM up to 30 SMs in high end products.Each single SM contains 8 scalar processors (SP) each with1024 32-bit registers, the total of 64KB of register spaceis available for each SM. Each SM is also equipped with a16KB on-chip memory that is characterized by low accesslatency and high bandwidth.

It is important to realize that all thread manage-ment (creation, scheduling, synchronization) is performedin hardware (SM), and overhead is extremely low. The SMmultiprocessors work in SIMT scheme (Single Instruction,Multiple Thread), where threads are executed in groups of32 called warps. The CUDA programming model definesthe host and the device. The Host executes CPU sequen-tial procedures, whereas the device executes parallel pro-grams - kernels. A kernel works acoording to a SPMDscheme (Single Program, Multiple Data). CUDA gives anadvantage of using massively parallel computation for sev-eral applications.

The ICP algorithm using CUDA parallel programmingis listed as algorithm 2.

2 Pomiary Automatyka Robotyka /2012

NAUKA

Algorithm 2 ICP - parallel computing approachINPUT: Two point clouds M = {mi}, D = {di}, aninitial transformation T0

OUTPUT: The correct transformation T , which alignsM and D

Mdevice ←M

Ddevice ← D

Tdevice ← T0

for iter ← 0 to maxIterations dofor i← 0 to N {in parallel} do

mi ← FindClosestPointInM (Tdevice ·di) {using reg-ular grid decomposition}if foundClosestP ointInNeighboringBuckets

thenwi ← 1

elsewi ← 0

end ifend forTdevice ← argmin

Tdevice

{∑iwi ‖T · di −mi‖2} {calcula-

tion T←R,t with SVD}end forM ←Mdevice

D ← Ddevice

T ← Tdevice

3.3. Loop closingLoop closing occurs when robot is visiting the same placea second time. It is very important to realize that therobot is not able to perform exactly the same path, there-fore a displacement between loop closing robot positionsoccurs. In our opinion, this displacement should by mini-mized to increase the efficiency of the loop closing method.This can be done using the strategy of robot motion, es-pecially when it traverses narrow paths. Even when theloop closing method is guarantying heading and displace-ment invariance, the compromise has to be taken into theconsideration. For this reason we define loop closing as alocation of the robot that it visited twice with the similarregion of observation. The main concept of classing loopclosing approach is to find robot locations neighboring thecurrent robot position in the path-graph obtained by ICPodometry correction. The decision concerning loop closingis performed based on the best matching using also ICPmethod. Because the classic method needs to perform ICPfor all robot locations in the neighborhood, the processingtime is increasing with its radius. The loop closing trans-formation matrix is found from best matching.

3.4. Map refinementMap refinement is done by distributing the loop closingerror over all nodes in the loop. Each node stores an infor-mation concerning local 3D map - robot observation androbot position (x, y, z, yaw, pitch, roll). When loop closingoccurs we obtain the consistent map.

4. ExperimentsIn the experiments commercially available robot PIO-NEER 3AT was used (figure 1). Robot is equipped with

Rys. 1. Robot PIONEER 3AT z laserowym systemempomiarowym 3D.

Fig. 1. Robot PIONEER 3AT equipped with 3D lasermeasurement system.

3D laser measurement system 3DLSN based on rotatedSICK LMS 200. The robot was acquiring observations ina stop-scan fashion with a one meter step. Each scan con-tains 361 (horizontal) x 498 (vertical) data points. An en-vironment where robot acquired a set of 3D scans is shownon figure 2; A map composed of several aligned clouds ofpoints is shown on figure 3. Figures 4 and 5 show animportant comparison between two different strategies ofdata registration. Green line corresponds to robot trajec-tory acquired by odometry. Red line corresponds to robottrajectory corrected using iteratively ICP algorithm. Blueline corresponds to robot trajectory corrected using ICPalgorithm aligning iteratively next scan with previouslybuild so called meta scan. Meta scan stores few scans fromprevious robot locations, therefore an odometry error iscorrected better than using only iteratively ICP. Figure 6shows a result of loop closing and map refinement modules.

5. ConclusionThe robot was acquiring observations in a stop-scan fash-ion with a one meter step. The goal was to align iterativelyall scans, therefore the odometry error was decreased. Weset as a benchmark for obtaining a satisfying result ofodometry correction, performed with different strategiesof data registration, that the loop closing procedure canbe performed. In our approach, all strategies can be usedas a component of a 6D SLAM processing pipeline, but werecomend using meta scan strategy. The main observationis that both strategies did correction of robot path derivedfrom odometry with gyroscopic correction system.

However, we want to emphasize the fact that the accu-racy of ICP strongly depends on various of factors. The ac-curacy of ICP point to point depends more on the amountof points in the bucket (it can be tuned by normalizationof the point cloud) rather than number of iterations. Theaverage amount of points in buckets should be more than10 and less than 100 to obtain accurate alignment with on-


NAUKA

Rys. 2. Środowisko robota podczas eksperymentu (Royal MilitaryAcademy, Bruksela, Belgia). Trajektoria robota została

zaznaczona linią koloru czerwonego.Fig. 2. An environment during an eperiment (Royal Military

Academy, Brussels, Belgium). Robot’s path is shown as red line.

Rys. 3. Mapa składająca się z kilku dopasowanych chmurpunktów.

Fig. 3. Map composed of several aligned clouds of points.

Rys. 4. Porównanie dwóch różnych strategii rejestracji danych.Fig. 4. Comparison between two different strategies of data

registration.

line computation (300ms for 30 iterations). We observedthat 30 iterations guarantee satisfying result.

Rys. 5. Porównanie dwóch różnych strategii rejestracji danych(widok perspektywistyczny).

Fig. 5. Comparison between two different strategies of dataregistration (perspective view).

Rys. 6. Rezultat modułu zamykania pętli oraz modułu poprawymapy.

Fig. 6. A result of loop closing and map refinement modules.

The contribution of this paper is the improvement ofa state of the art 6D SLAM approach by using parallelcomputation. Two strategies of data registration were com-pared. Empirical evaluation performed on a large data setshowed the on-line capability of parallel computation.

AcknowledgmentThis work is performed during postdoctoral scholarship(CAS/19/POKL 15.05.2011-15.11.2011) in Royal MilitaryAcademy, Unmanned Ground Vehicle Center, Brussels,Belgium, funded by Center for Advanced Studies, WarsawUniversity of Technology (project: "priorytet IV ProgramuOperacyjnego Kapitał Ludzki z Europejskiego FunduszuSpołecznego").

Bibliografia1. J. Bedkowski, Data registration module - a component

of semantic simulation engine, in: Proceedings of 5thEuropean Conference on Mobile Robots ECMR 2011,Orebro, Sweden, 2011, pp. 133–138.

2. J. Bedkowski, A. Maslowski, GPGPU implementationof on-line point to plane 3d data registration, in: Pro-ceedings of the 2011 International Conference on Elec-trical Engineering and Informatics Volume 2, 17-19July 2011, Bandung, Indonesia, 2011, pp. 931–936.

3. J. Bedkowski, Parallel implementation of hybridicpdata registration, Elektronika (8) (2011) 114–118.


NAUKA

4. D. Huber, M. Hebert, Fully automatic registration ofmultiple 3d data sets, Image and Vision Computing21 (1) (2003) 637–650.

5. A. W. Fitzgibbon, Robust registration of 2d and 3dpoint sets, in: In British Machine Vision Conference,2001, pp. 411–420.

6. M. Magnusson, T. Duckett, A comparison of 3d regis-tration algorithms for autonomous underground min-ing vehicles, in: In Proc. ECMR, 2005, pp. 86–91.

7. S.-Y. Park, S.-I. Choi, J. Kim, J. Chae, Real-time 3dregistration using gpu, Machine Vision and Applica-tions (2010) 1–1410.1007/s00138-010-0282-z.URL http://dx.doi.org/10.1007/s00138-010-0282-z

8. S.-Y. Park, M. Subbarao, An accurate and fastpoint-to-plane registration technique, Pattern Recogn.Lett. 24 (2003) 2967–2976. doi:10.1016/S0167-8655(03)00157-0.

9. A. Nuchter, K. Lingemann, J. Hertzberg, Cachedk-d tree search for icp algorithms, in: Proceedingsof the Sixth International Conference on 3-D Dig-ital Imaging and Modeling, IEEE Computer So-ciety, Washington, DC, USA, 2007, pp. 419–426.doi:10.1109/3DIM.2007.15.

10. S. Rusinkiewicz, M. Levoy, Efficient variants of theICP algorithm, in: Third International Conference on3D Digital Imaging and Modeling (3DIM), 2001.

11. D. Qiu, S. May, A. Nuchter, Gpu-accelerated near-est neighbor search for 3d registration, in: Proceedingsof the 7th International Conference on Computer Vi-sion Systems: Computer Vision Systems, ICVS ’09,Springer-Verlag, Berlin, Heidelberg, 2009, pp. 194–203.

12. H. Surmann, A. Nuchter, K. Lingemann, J. Hertzberg,6d slam - preliminary report on closing the loop insix dimensions, in: Proceedings of the 5th IFAC Sym-posium on Intelligent Autonomous Vehicles (IAV’04),2004.

13. J. Sprickerhof, A. Nuchter, K. Lingemann,J. Hertzberg, An explicit loop closing techniquefor 6d slam, in: 4th European Conference on MobileRobots, September 23-25, 2009, Mlini/Dubrovnik,Croatia, 229-234, 2009.

14. M. Magnusson, T. Duckett, A. J. Lilienthal, 3d scanregistration for autonomous mining vehicles, Journalof Field Robotics 24 (10) (2007) 803–827.

15. M. Magnusson, H. Andreasson, A. Nüchter, A. J.Lilienthal, Automatic appearance-based loop detec-tion from 3D laser data using the normal distributionstransform, Journal of Field Robotics 26 (11–12) (2009)892–914.

16. P. Newman, D. Cole, K. L. Ho, Outdoor slam usingvisual appearance and laser ranging, in: Proceedingsof the IEEE International Conference on Robotics andAutomation (ICRA), 2006.

17. P. Newman, K. L. Ho, Slam - loop closing with visuallysalient features, in: IEEE International Conference onRobotics and Automation, 18-22 April, 2005.

18. M. Cummins, P. Newman, Fab-map: Probabilistic lo-calization and mapping in the space of appearance,

The International Journal of Robotics Research 27 (6)(2008) 647–665.

19. B. Williams, M. Cummins, J. Neira, P. Newman,I. Reid, J. Tardos, A comparison of loop closing tech-niques in monocular slam, Robotics and AutonomousSystems.

20. K. L. Ho, P. M. Newman, Loop closure detectionin slam by combining visual and spatial appearance,Robotics and Autonomous Systems (2006) 740–749.

21. D. G. Lowe, Distinctive image features from scale in-variant keypoints, Int. J. Comput. Vision 60 (2004)91–110.

22. K. Leong, P. Newman, Combining visual and spatialappearance for loop closure detection in slam, in: 2ndEuropean Conference on Mobile Robots ECMR, 2005.

23. K. L. Ho, P. Newman, Multiple map intersection de-tection using visual appearance, in: 3rd InternationalConference on Computational Intelligence, Roboticsand Autonomous Systems, Singapore, 2005.

24. K. L. Ho, P. Newman, Detecting loop closure withscene sequences, Int. J. Comput. Vision 74 (2007) 261–286.

25. N. V. Kaustubh Pathak, Max Pfingsthorn, A. Birk,Relaxing loop-closing errors in 3d maps based on pla-nar surface patches, in: 14th International Conferenceon Advanced Robotics (ICAR), IEEE Press, 2009.

26. A. Segal, D. Haehnel, S. Thrun, Generalized-icp, in:Proceedings of Robotics: Science and Systems, Seattle,USA, 2009.

27. A. Nüchter, J. Hertzberg, Towards se-mantic maps for mobile robots, Robot.Auton. Syst. 56 (11) (2008) 915–926.doi:http://dx.doi.org/10.1016/j.robot.2008.08.001.

28. NVIDIA CUDA C Programming Guide 3.2,http://www.nvidia.com/cuda (10 2010).

29. CUDA C Best Practices Guide 3.2,http://www.nvidia.com/cuda (8 2010).

6DSLAM with GPGPU computationAbstract: The main goal was to improve a state of the art 6DSLAM algorithm with a new GPGPU-based implementation of dataregistration module. Data registration is based on ICP (IterativeClosest Point) algorithm that is fully implemented in the GPU withNVIDIA FERMI architecture. In our research we focus on mobilerobot inspection intervention systems applicable in hazardous en-vironments. The goal is to deliver a complete system capable ofbeing used in real life. In this paper we demonstrate our achieve-ments in the field of on line robot localization and mapping. Wedemonstrated an experiment in real large environment. We com-pared two strategies of data alignment - simple ICP and ICP usingso called meta scan.

Keywords: 6DSLAM, parallel computation

dr inż. Janusz Będkowski


NAUKA

PhD in Automation and Robotics, Assistant Professor in Instituteof Automatic Control and Robotics - Warsaw University of Tech-nology; adjunct in Industrial Research Institute for Automation andMeasurements as well as Institute of Mathematical Machines. Thescope of research: inspection and intervention robot systems, se-mantic mapping, virtual training with AR techniques.

e-mail: [email protected]

dr Geert De Cubber

Geert De Cubber was born in Halle, Belgium, in 1979. He receivedhis PhD. in engineering from the Vrije Universiteit Brussels (VUB),Belgium and the Royal Military Academy (RMA), Belgium. He iscurrently head of the research activities of the Unmanned VehicleCentre of the Belgian Royal Military Academy. His main research

interest goes out to computer vision algorithms which can be usedby intelligent mobile robots.e-mail: [email protected]

prof. dr hab. inż. Andrzej Masłowski

Full Professor in Automation and Robotics in Institute of Auto-matic Control and Robotics - Warsaw University of Technologyas well as Air Force Academy in D?blin. The scope of research:intelligent mobile systems, multirobot systems for RISE applica-tions, e-Training for advanced mobile systems. Member of IFIP,IFAC, IMACS, IMEKOTC-17Measurement andControl in Robotics.Since 2006 Representative of Poland in International AdvancedRobotics Programme.e-mail: [email protected]


6DSLAMwykorzystujacyobliczeniaGPGPU › sites › fp7-icarus.eu › files › publications ›...

Documents

Transcript of 6DSLAMwykorzystujacyobliczeniaGPGPU › sites › fp7-icarus.eu › files › publications ›...