TY - CHAP
ID - QiaoCreput2016_930
PY - 2016-05-29
T1 - Stereo Matching by using Self-distributed Segmentation and Massively Parallel GPU Computing
AU - Qiao,Wenbao
AU - CREPUT,Jean-Charles
N2 - As an extension of using image segmentation to do stereo matching, firstly, by using self-organizing map (som) and K-means algorithms, this paper provides a self-distributed segmentation method that allocates segments according to image's texture changement where in most cases depth discontinuities appear. Then, for stereo, under the fact that the segmentation of left image is not exactly same with the segmentation of right image, we provide a matching strategy that matches segments of left image to pixels of right image as well as taking advantage of border information from these segments. Also, to help detect occluded regions, an improved aggregation cost that considers neighbor valid segments and their matching characteristics is provided. For post processing, a gradient border based median filter that considers the closest adjacent valid disparity values instead of all pixels' disparity values within a rectangle window is provided. As we focus on real-time execution, these time-consumming works for segmentation and stereo matching are executed on a massively parallel cellular matrix GPU computing model. Finaly, we provide our visual dense disparity maps before post processing and final evaluation of sparse results after post-processing to allow comparison with several ranking methods top listed on Middlebury.
AD - Zakopane, Pologne
JF - International Conference on Artificial Intelligence and Soft Computing. Lecture Notes in Artificial Intelligence LNAI, Volume 9693, Springer International Publishing, 2016: 723-733
SN - ISBN 9783319393841 • 9783319393834
KW - stereo
KW - image segmentation
KW - SOM
KW - self-distributed segments
UR - http://dx.doi.org/10.1007/978-3-319-39384-1_64
UR - http://www.multiagent.fr/Special:FOAF/bib:930
ER -
TY - CHAP
ID - QiaoCreput2017_961
PY - 2017-11-01
T1 - Parallel 2-Opt Local Search on GPU
AU - Qiao,Wenbao
AU - CREPUT,Jean-Charles
N2 - To accelerate the solution for large scale traveling salesman problems (TSP), a parallel 2-opt local search algorithm with simple implementation based on Graphics Processing Unit (GPU) is presented and tested in this paper. The parallel scheme is based on technique of data decomposition by dynamically assigning multiple K processors on the integral tour to treat K edges' 2-opt local optimization simultaneously on independent sub-tours, where K can be user-defined or have a function relationship with input size N. We implement this algorithm with doubly linked list on GPU. The implementation only requires O(N) memory. We compare this parallel 2-opt local optimization against sequential exhaustive 2- opt search along integral tour on TSP instances from TSPLIB with more than 10000 cities.
JF - World Academy of Science, Engineering and Technology, International Journal of Electrical, Computer, Energetic, Electronic and Communication Engineering, 2017, 11(3): 291-295.
KW - Doubly linked list
KW - parallel 2-opt
KW - tour division
KW - GPU.
UR - http://www.multiagent.fr/Special:FOAF/bib:961
ER -
TY - CHAP
ID - QiaoCreput2017_962
PY - 2017-06-15
T1 - Massive Parallel Self-organizing Map and 2-Opt on GPU to Large Scale TSP
AU - Qiao,Wenbao
AU - CREPUT,Jean-Charles
N2 - This paper proposes a platform both for parallelism of self-organizing map (SOM) and the 2-opt algorithm to large scale 2-Dimensional Euclidean traveling salesman problems(TSP). Given a TSP tour with N input point, this platform makes these two algorithms working in a massively parallel way on graphical processing unit (GPU). Advantages of this platform include its flexibly topology preserving network, its fine parallel granularity and it allows maximum (N/3) 2-opt optimization moves to be executed with O(N) time complexity within one tour orientation as well as keep the tour valid. The parallel technique follows data decomposition and decentralized control. We test this optimization method on large TSPLIB instances, experiments show that the acceleration factor we obtained makes the proposed method competitive, and allows for further increasing for very large TSP instances along with the quantity increase of physical cores in GPU systems.
AD - Cádiz, Espagne
JF - International Work-Conference on Artificial Neural Networks. Lecture Notes in Computer Science LNCS, Volume 10305, Springer, Cham, 2017: 471-482.
T3 - LNCS, volume 10305
SP - 471
EP - 482
SN - ISBN 978-3-319-59152-0
KW - irregular topology
KW - massive parallel 2-opt
KW - SOM
KW - doubly linked network
KW - local spiral search
KW - TSP
UR - http://dx.doi.org/10.1007/978-3-319-59153-7_41
UR - http://www.multiagent.fr/Special:FOAF/bib:962
ER -
TY - CHAP
ID - QiaoCreput2018_1043
PY - 2018-12-16
T1 - Massive 2-opt and 3-opt Moves with High Performance GPU Local Search to Large-scale Traveling Salesman Problem
AU - Qiao,Wenbao
AU - CREPUT,Jean-Charles
N2 - 2-opt, 3-opt or k-opt heuristics are classical local search algorithms for traveling salesman problems (TSP) in combinatorial optimization area. This paper introduces a judicious decision making methodology of offloading which part of the k-opt heuristic works in parallel on Graphics Processing Unit (GPU) while which part remains sequential, called ``multiple k-opt evaluation, multiple k-opt moves", in order to simultaneously execute, without interference, massive 2-/3-opt moves that are globally found on the same TSP tour or the same Euclidean space for many edges, as well as keep high performance for GPU massive k-opt evaluation. We prove the methodology is judicious and valuable because of our originally proposed sequential non- interacted 2-/3-exchange set partition algorithm taking linear time complexity and a new TSP tour representation, array of ordered coordinates-index, in order unveil how to use GPU on-chip shared memory to achieve the same goal as using doubly linked list and array of ordered coordinates for parallel k-opt implementation. We test this methodology on 22 national TSP instances with up to 71009 cities and with brute initial tour solution. Average maximum 997 non-interacted 2-opt moves are found and executed on the same tour of ch71009.tsp instance in one iteration of our proposed method. Experimental comparisons show that our proposed methodology gets huge acceleration over both classical sequential and a possible current fastest state-of-the-art GPU parallel 2-opt implementation.
AD - Kalamata, Greece
JF - International Learning and intelligent OptimizationN Conference
PB - LION 2018, LNCS 11353
KW - Massive 2-opt moves
KW - Parallel 2-opt
KW - Optimization
KW - High performance GPU Computing
UR - http://dx.doi.org/10.1007/978-3-030-05348-2_8
UR - http://www.multiagent.fr/Special:FOAF/bib:1043
ER -
TY - THES
ID - Qiao2018_1046
PY - 2018-03-14
T1 - GPU component-based neighborhood search for Euclidean graph minimization problems
AU - Qiao,Wenbao
N2 - In this thesis, we propose parallel solutions based on current graphics processing unit (GPU) system for two Euclidean graph minimization problems, namely the Euclidean minimum spanning forest/tree (EMSF/EMST) and the travelling salesman problem (TSP). The proposed solutions also solve the bichromatic closest pair (BCP) problem, and follow technique of ``decentralized control, data parallelism, GPU shared memories". We propose a Euclidean K-dimensional nearest neighbourhood search (NNS) technique, K-d search, based on classical Elias' NNS approaches that divide the Euclidean space into congruent and non- overlapping cells where size of points in each cell is bounded. We propose a pruning technique to obtain component- based NNS to find a query point set Q's closest outgoing point within sequential linear time complexity when the data is uniformly distributed. These techniques are used together with two proposed GPU tree traversal algorithms, namely the GPU two- direction Breadth- first search and distributed dynamic linked list, to address the BCP. Based on the BCP solution, a divide and conquer parallel algorithm is implemented for building EMSF and EMST totally on GPU side. The TSP is addressed with different parallel 2-opt local search algorithms, in which we propose a ``multiple K-opt evaluation, multiple K-opt moves" methodology in order to simultaneously execute, without interference, massive 2-/3-opt moves that are globally found on the same TSP tour for many edges. This methodology is explained in details to show how we obtain high performance computing both on GPU and CPU side. We test the proposed solutions and report experimental comparison results against the state- of-the-art algorithms.
PB - Univ. Bourgogne Franche-Comté, France
KW - Bentley spiral search
KW - K-d search
KW - GPU parallel Euclidean minimum spanning forest / tree
KW - GPU bichromatic closest points
KW - multiple 2-opt movesm GPU parallel 2-opt
UR - http://www.multiagent.fr/Special:FOAF/bib:1046
ER -
TY - CHAP
ID - QiaoCreput2018_1049
PY - 2018-06-21
T1 - Spiral Search Method to GPU Parallel Euclidean Minimum Spanning Tree Problem
AU - Qiao,Wenbao
AU - CREPUT,Jean-Charles
N2 - We present both sequential and data parallel approaches to build hierarchical minimum spanning forest (MSF) or tree (MST) in Euclidean space (EMSF/EMST) for applications whose input N points are uniformly or boundedly distributed in the Euclidean space. Each iteration of the sequential approach takes O(N) time complexity through combining Borůvka's algorithm with an improved component-based neighborhood search algorithm, namely sliced spiral search, that is a newly proposed improvement of Bentley's spiral search for finding a component graph's closest outgoing point on 2D plane. We also propose a k-d search technique to extend this kind of search into 3D space. The data parallel approach includes a newly proposed two direction breadth-first search (BFS) implementation on graphics processing unit (GPU) platform, which is aimed for selecting a spanning tree's minimum outgoing weight. The GPU parallel approaches assign N threads with one thread associated to one input point, one thread occupies O(1) local memory and the whole algorithm occupies O(N) global memory. Experiments are conducted on point set in the plane of both uniformly distributed data sets and TSPLIB database. We evaluate computation time of the proposed approaches on more than 40 benchmarks with size N growing up to 10^5 points.
AD - Kalamata, Greece
JF - International Learning and intelligent OptimizationN Conference
PB - LION 2018, LNCS 11353
KW - Parallel Euclidean Minimum Spanning Tree
KW - spiral search
KW - Minimum Spanning forest
KW - GPU data clustering
UR - http://dx.doi.org/10.1007/978-3-030-05348-2_2
UR - http://www.multiagent.fr/Special:FOAF/bib:1049
ER -
TY - JOUR
ID - QiaoCreput2019_1089
PY - 2019-03-01
T1 - GPU Implementation of Borůvka's Algorithm to Euclidean Minimum Spanning Tree based on Elias Method
AU - Qiao,Wenbao
AU - CREPUT,Jean-Charles
N2 - We present both sequential and data parallel approaches to build hierarchical minimum spanning forest (MSF) or tree (MST) in Euclidean space (EMSF/EMST) for applications whose input N points are uniformly or boundedly distributed in Euclidean space. Each iteration of the sequential approach takes O(N) time complexity through combining Borůvka's algorithm with an improved component-based neighborhood search algorithm, namely sliced spiral search, which is a newly proposed improvement to Bentley's spiral search for finding a component graph's closest outgoing point on the plane. It works based on the uniqueness property in Euclidean space, and allows O(1) time complexity for one search from a query point to find the component's closest outgoing point at different iterations of Borůvka's algorithm. The data parallel approach includes a newly proposed two-direction breadth- first search (BFS) implementation on graphics processing unit (GPU) platform, which is specialized for selecting a spanning tree's shortest ougoing edge. This GPU two-direction parallel BFS enables a tree traversal operation to start from any one of its vertex acting as root. These GPU parallel implementations work by assigning N threads with one thread associated to one input point, one thread occupies O(1) local memory and the whole algorithm occupies O(N) global memory. Experiments are conducted on both uniformly distributed data sets and TSPLIB database. We evaluate computation time of the proposed approaches on more than 80 benchmarks with size N growing up to 10^6 points on personal laptop.
JA - Applied Soft Computing
VL - 76
SP - 105
EP - 120
KW - Euclidean Minimum Spanning Tree EMST GPU Parallel EMST Breadth first search GPU parallel BFS decentralized control data parallel
UR - http://dx.doi.org/10.1016/j.asoc.2018.10.046
UR - http://www.multiagent.fr/Special:FOAF/bib:1089
ER -
TY - JOUR
ID - QiaoCreput2019_1091
PY - 2019-12-07
T1 - GPU K-dimensional nearest neighbor search for parallel Euclidean minimum spanning tree problem
AU - Qiao,Wenbao
AU - CREPUT,Jean-Charles
N2 - We propose new data parallel approaches working on graphics processing unit (GPU) CUDA platform to build hierarchical minimum spanning forest (MSF) or tree (MST) for applications whose input only contains N points that are uniformly or boundedly distributed in Euclidean K-dimensional space (EMSF/EMST). Characteristic of these proposed algorithms follows ``data parallelism, decentralized control and constant local memory occupied by each GPU thread". Instead of using K-d tree search, we propose a nearest neighbor search algorithm, called K-d search, working based on dividing the Euclidean K-dimensional space into congruent and non- overlapping cells where size of points in each cell is bounded. We further combine the uniqueness property in Euclidean space with this K-d search approach. This combination achieves sequential O(N) time complexity to find bi-chromatic closest pair between two color point sets, and O(N)/N time when working in concurrent read/write parallel model. The data parallel approach also exploits parallelism of each sub-step to build EMST in Borůvka's framework, for example CUDA kernels on a distributed dynamic linked list data structure for GPU parallel tree traversal operation, GPU parallel union-find implementation. Experimental results are conducted on both 2D and 3D benchmarks with up to $10^7$ points on personal laptop, and show that our algorithm runs faster than current fastest dual-tree Mlpack EMST library.
JA - Submitted to journal
KW - Nearest neighbor search in Euclidean K-dimension space
KW - K-d search
KW - GPU parallel breadth-first search
KW - GPU tree traversal
KW - GPU parallel union-find
UR - http://www.multiagent.fr/Special:FOAF/bib:1091
ER -
TY - JOUR
ID - QiaoCreput_1092
PY - 2019-12-07
T1 - Massive 2-opt and 3-opt Moves with High Performance GPU Local Search to Large-scale Traveling Salesman Problem
AU - Qiao,Wenbao
AU - CREPUT,Jean-Charles
N2 - 2-opt, 3-opt or k-opt heuristics are classical local search algorithms for traveling salesman problems (TSP) in combinatorial optimization area. This paper introduces a judicious decision making methodology of offloading which part of the k-opt heuristic works in parallel on Graphics Processing Unit (GPU) while which part remains sequential, called ``multiple k-opt evaluation, multiple k-opt moves", in order to simultaneously execute, without interference, massive 2-/3-opt moves that are globally found on the same TSP tour or the same Euclidean space for many edges, as well as keep high performance for GPU massive k-opt evaluation. We prove the methodology is judicious and valuable because of our originally proposed sequential non- interacted 2-/3-exchange set partition algorithm taking linear time complexity and a new TSP tour representation, array of ordered coordinates-index, in order unveil how to use GPU on-chip shared memory to achieve the same goal as using doubly linked list and array of ordered coordinates for parallel k- opt implementation. We test this methodology on 22 national TSP instances with up to 71009 cities and with brute initial tour solution. Average maximum 997 non-interacted 2-opt moves are found and executed on the same tour of ch71009.tsp instance in one iteration of our proposed method. Experimental comparisons show that our proposed methodology gets huge acceleration over both classical sequential and a possible current fastest state-of-the-art GPU parallel 2-opt implementation.
JA - conference paper recommended to journal of Annals of Mathematics and Artificial Intelligence
UR - http://www.multiagent.fr/Special:FOAF/bib:1092
ER -