TY - CHAP
ID - WangZhangCreput2013_673
PY - 2013-11-24
T1 - A Massive Parallel Cellular GPU implementation of Neural Network to Large Scale Euclidean TSP
AU - Wang,Hongjian
AU - ZHANG,NAIYU
AU - CREPUT,Jean-Charles
N2 - This paper proposes a parallel model of the self- organizing map (SOM) neural network applied to the Euclidean traveling salesman problem (TSP) and intended for implementation on the graphics processing unit (GPU) platform. The plane is partitioned into an appropriate number of cellular units, that are each responsible of a certain part of the data and network. The advantage of the parallel algorithm is that it is decentralized and based on data decomposition, rather than based on data duplication, or mixed sequential/parallel solving, as often with GPU implementation of optimization metaheuristics. The processing units and the required memory are with linear increasing relationship to the problem size, which makes the model able to deal with very large scale problems in a massively parallel way. The approach is applied to Euclidean TSPLIB problems and National TSPs with up to 33708 cities on both GPU and CPU, and these two types of implementation are compared and discussed.
JF - 12th Mexican International Conference on Artificial Intelligence, MICAI 2013, Mexico City, Mexico, November 24-30, 2013, Proceedings, Part II (Best Student Paper Award), Advances in Soft Computing and Its Applications, Lecture Notes in Computer Science, Volume 8266, pp.118-129
KW - Neural network
KW - Self-organizing map
KW - Euclidean traveling salesman problem
KW - Parallel cellular model
KW - Graphics processing unit
UR - http://dx.doi.org/10.1007/978-3-642-45111-9_10
UR - http://www.multiagent.fr/Special:FOAF/bib:673
ER -
TY - CHAP
ID - ZhangWangCreputMoreauRuichek2013_650
PY - 2013-11-17
T1 - A Near Real-Time Color Stereo Matching Method for GPU
AU - ZHANG,NAIYU
AU - Wang,Hongjian
AU - CREPUT,Jean-Charles
AU - MOREAU,Julien
AU - RUICHEK,Yassine
N2 - This paper presents a near real-time stereo matching method with acceptable matching results. This method consists of three important steps: SAD-ALD cost measure, cost aggregation in adaptive window in cross-based support regions and a refinement step. These three steps are well organized to be adopted by the GPU's parallel architecture. The parallelism brought by GPU and CUDA implementations provides significant acceleration in running time. This method is tested on six pairs of images from Middlebury dataset, each possibly declined within different sizes. For each pair of images it can generate acceptable matching results in roughly less than 100 milliseconds. The method is also compared with three GPU-based methods and one CPU-based method on increasing size image pairs.
AD - Lisbon, Portugal
JF - Third International Conference on Advanced Communications and Computation (INFOCOMP'2013)
UR - http://www.thinkmind.org/index.php?view=article&articleid=infocomp_2013_2_10_10094
KW - GPU
KW - Real-Time Stereovision
KW - SAD-ALD
KW - Adaptive Window
KW - CUDA
L1 - /extensions/ICAPManager/pdf/ZhangWangCreputMoreauRuichek2013.pdf
UR - http://www.thinkmind.org/index.php?view=article&articleid=infocomp_2013_2_10_10094
UR - http://www.multiagent.fr/Special:FOAF/bib:650
ER -
TY - CHAP
ID - ZhangWangCreputMoreauRuichek2013_649
PY - 2013-12-09
T1 - Cellular GPU Model for Structured Mesh Generation and its Application to the Stereo-Matching Disparity Map
AU - ZHANG,NAIYU
AU - Wang,Hongjian
AU - CREPUT,Jean-Charles
AU - MOREAU,Julien
AU - RUICHEK,Yassine
N2 - This paper presents a cellular GPU model for structured mesh generation according to an input stereo-matching disparity map. Here, the disparity map stands for a density distribution that reflects the proximity of objects to the camera in 3D space. The meshing process consists in covering such data density distribution with a topological structured hexagonal grid that adapts itself and deforms according to the density values. The goal is to generate a compressed mesh where the nearest objects are provided with more details than objects which are far from the camera. The solution we propose is based on the Kohonen's Self-Organizing Map learning algorithm for the benefit of its ability to generate a topological map according to a probability distribution and its ability to be a natural massive parallel algorithm. We propose a GPU parallel model and its implantation of the SOM standard algorithm, and present experiments on a set of standard stereo- matching disparity map benchmarks.
AD - Anaheim, California, USA
JF - IEEE International Symposium on Multimedia (ISM'2013)
KW - image matching
KW - mesh generation
KW - probability
KW - self
KW - organising feature maps
KW - stereo image processing
UR - http://dx.doi.org/10.1109/ISM.2013.18
UR - http://www.multiagent.fr/Special:FOAF/bib:649
ER -
TY - CHAP
ID - ZhangCreputWangMeurieRuichek2013_658
PY - 2013-11-17
T1 - Partial demosaicing for stereo matching of CFA images on GPU and CPU
AU - ZHANG,NAIYU
AU - CREPUT,Jean-Charles
AU - Wang,Hongjian
AU - MEURIE,Cyril
AU - RUICHEK,Yassine
N2 - This paper presents a GPU implementation of a partial demosaicing scheme that is specially designed for stereo matching of CFA image. This method consists of three main techniques keys: the adapted matching cost for CFA image, the estimated Second color component based on Hamilton’s estimate method and a robust cost aggregation window. Experiments are carried out to explore the performance for this method on GPU both at matching quality and matching efficiency, with comparison with version on CPU. The experiments on different size image pairs from Middlebury dataset show that this method can be substantially accelerated on GPU when the image size is large and has still space for improvements in performance.
AD - Lisbon, Portugal
JF - International Conference on Advanced Communications and Copmutation (INFOCOMP'2013)
UR - http://newtheme.hgpu.org/?p=11164
KW - GPU
KW - Demaosaicing Stereovision
KW - CFA Image
KW - CUDA
L1 - /extensions/ICAPManager/pdf/ZhangCreputWangMeurieRuichek2013.pdf
UR - http://newtheme.hgpu.org/?p=11164
UR - http://www.multiagent.fr/Special:FOAF/bib:658
ER -
TY - JOUR
ID - WangZhangCreputMoreauRuichek2015_794
PY - 2015-03-16
T1 - Parallel Structured Mesh Generation with Disparity Maps by GPU Implementation
AU - Wang,Hongjian
AU - ZHANG,NAIYU
AU - CREPUT,Jean-Charles
AU - MOREAU,Julien
AU - RUICHEK,Yassine
N2 - The goal of structured mesh is to generate a compressed representation of the 3D surface, where near objects are provided with more details than objects far from the camera, according to the disparity map. The solution is based on the Kohonens Self-Organizing Map algorithm for the benefits of its ability to generate a topological map according to a probability distribution and its potential to be a natural massive parallel algorithm. The disparity map, which stands for a density distribution that reflects the proximity of objects to the camera, is partitioned into an appropriate number of cell units, in such a way that each cell is associated to a processing unit and responsible of a certain area of the plane. The advantage of the proposed model is that it is decentralized and based on data decomposition. The required processing units and memory are with linearly increasing relationship to the problem size. Experimental results show that our GPU implementation is able to provide near real- time performance with small size disparity maps and the running time increases in a linear way with a very weak increasing coefficient. The proposed method is suitable to deal with large scale problems in a massively parallel way.
JA - IEEE Transactions on Visualization and Computer Graphics
IS - 9
VL - 21
KW - Disparity Map
KW - GPU Implementation
KW - Mesh Generation
KW - Parallel Cellular Model
KW - Self
KW - Organizing Map
UR - http://dx.doi.org/10.1109/TVCG.2015.2413775
UR - http://www.multiagent.fr/Special:FOAF/bib:794
ER -
TY - CHAP
ID - WangCreput2014_795
PY - 2014-11-28
T1 - Parallel and Distributed Implementation Models for Bio-inspired Optimization Algorithms
AU - Wang,Hongjian
AU - CREPUT,Jean-Charles
N2 - Bio-inspired optimization algorithms have natural parallelism but practical implementations in parallel and distributed computational systems are nontrivial. Gains from different parallelism philosophies and implementation strategies may vary widely. In this paper, we contribute with a new taxonomy for various parallel and distributed implementation models of metaheuristic optimization. This taxonomy is based on three factors that every parallel and distributed metaheuristic implementation needs to consider: control, data, and memory. According to our taxonomy, we categorize different parallel and distributed bio-inspired models as well as local search metaheuristic models. We also introduce a new designed GPU parallel model for the Kohonen’s self-organizing map, as a representative example which belongs to a significant category in our taxonomy.
JF - International Conference on Swarm Intelligence Based Optimization, ICSIBO'2014, Mulhouse, France, May 13-14, 2014, Swarm Intelligence Based Optimization, Lecture Notes in Computer Science Volume 8472, pp 68-79, 28 Nov.
KW - Parallel and distributed computing·Metaheuristic·Genetic algorithm·Ant colony optimization·Self-organizing map
UR - http://dx.doi.org/10.1007/978-3-319-12970-9_8
UR - http://www.multiagent.fr/Special:FOAF/bib:795
ER -
TY - CHAP
ID - WangMansouriCreputRuichek2015_896
PY - 2015-10-25
T1 - Massively parallel cellular matrix model for superpixel adaptive segmentation map
AU - Wang,Hongjian
AU - MANSOURI,Abdelkhalek
AU - CREPUT,Jean-Charles
AU - RUICHEK,Yassine
N2 - We propose the concept of superpixel adaptive segmentation map, to produce a perceptually meaningful representation of rigid pixel image, with higher resolution of more superpixels on interesting regions according to the density distribution of desired attributes. The solution is based on the self-organizing map (SOM) algorithm, for the benefits of SOM's ability to generate a topological map according to a probability distribution and its potential to be a natural massive parallel algorithm. We also propose the concept of parallel cellular matrix which partitions the Euclidean plane defined by input image into an appropriate number of uniform cell units. Each cell is responsible of a certain part of the data and the cluster center network, and carries out massively parallel spiral searches based on the cellular matrix topology. Experimental results from our GPU implementation show that the proposed algorithm can generate adaptive segmentation map where the distribution of superpixels reflects the gradient distribution or the disparity distribution of input image, with respect to scene topology. When the input size augments, the running time increases in a linear way with a very weak increasing coefficient.
JF - 14th Mexican International Conference on Artificial Intelligence (MICAI 2015), Cuernavaca, Morelos, Mexico, Oct. 25-31, 2015, Advances in Artificial Intelligence and Its Applications, Lecture Notes in Computer Science, vol.9414, pp.325-336
KW - Superpixel
KW - Image segmentation
KW - Self-organizing map
KW - Cellular matrix model
KW - Graphics processing unit
UR - http://dx.doi.org/10.1007/978-3-319-27101-9_24
UR - http://www.multiagent.fr/Special:FOAF/bib:896
ER -
TY - CHAP
ID - WangMansouriCreput2015_897
PY - 2015-12-06
T1 - Massively parallel cellular matrix model for self-organizing map applications
AU - Wang,Hongjian
AU - MANSOURI,Abdelkhalek
AU - CREPUT,Jean-Charles
N2 - We propose the concept of parallel cellular matrix which partitions the Euclidean plane defined by input data into an appropriate number of uniform cell units. Each cell is responsible of a certain part of the data and the network of the self- organizing map (SOM), and carries out massive parallel spiral searches based on the cellular matrix topology. The advantage of the proposed model is that it is decentralized and based on data decomposition. The required processing units and memory are with linearly increasing relationship to the problem size. Based on the cellular matrix model, the parallel SOM is implemented to deal with various applications including the traveling salesman problem, structured mesh generation, and superpixel adaptive segmentation map. Experimental results of our GPU implementation show that the running time increases in a linear way with a very weak increasing coefficient according to the input size. The proposed cellular matrix model is suitable to deal with large scale problems in a massively parallel way.
JF - 2015 IEEE International Conference on Electronics, Circuits, and Systems (ICECS 2015), Cairo, Egypt, Dec. 06-09
KW - parallel computation model
KW - self-organizing map
KW - traveling salesman problem
KW - mesh generation
KW - superpixel
KW - GPU implementation
UR - http://dx.doi.org/10.1109/ICECS.2015.7440384
UR - http://www.multiagent.fr/Special:FOAF/bib:897
ER -
TY - THES
ID - Wang2015_902
PY - 2015-12-03
T1 - Cellular Matrix for Parallel K-means and Local Search to Euclidean Grid Matching
AU - Wang,Hongjian
N2 - In this thesis, we propose a parallel computing model, called cellular matrix, to provide answers to problematic issues of parallel computation when applied to Euclidean graph matching problems. These NP-hard optimization problems involve data distributed in the plane and elastic structures represented by graphs that must match the data. They include problems known under various names, such as geometric k- means, elastic net, topographic mapping, and elastic image matching. The Euclidean traveling salesman problem (TSP), the median cycle problem, and the image matching problem are also examples that can be modeled by graph matching. The contribution presented is divided into three parts. In the first part, we present the cellular matrix model that partitions data and defines the level of granularity of parallel computation. We present a generic loop for parallel computations, and this loop models the projection between graphs and their matching. In the second part, we apply the parallel computing model to k-means algorithms in the plane extended with topology. The proposed algorithms are applied to the TSP, structured mesh generation, and image segmentation following the concept of superpixel. The approach is called superpixel adaptive segmentation map (SPASM). In the third part, we propose a parallel local search algorithm, called distributed local search (DLS). The solution results from the many local operations, including local evaluation, neighborhood search, and structured move, performed on the distributed data in the plane. The algorithm is applied to Euclidean graph matching problems including stereo matching and optical flow.
PB - Université de technologie Belfort-Montbéliard
UR - https://tel.archives-ouvertes.fr/tel-01265951
KW - cellular matrix
KW - graph matching
KW - k-means
KW - local search
KW - parallel algorithms
KW - graphics processing unit (GPU)
UR - https://tel.archives-ouvertes.fr/tel-01265951
UR - http://www.multiagent.fr/Special:FOAF/bib:902
ER -
TY - JOUR
ID - WangZhangCreputRuichekMoreau2016_927
PY - 2016-04-06
T1 - Massively parallel gpu computing for fast stereo correspondence algorithms
AU - Wang,Hongjian
AU - ZHANG,NAIYU
AU - CREPUT,Jean-Charles
AU - RUICHEK,Yassine
AU - MOREAU,Julien
N2 - Current accurate stereo matching algorithms employ some key techniques that are not suitable for parallel GPU architecture. It will be tricky and cumbersome to directly take these techniques into GPU applications. Trying to tackle this difficulty, we design two GPU-based stereo matching algorithms, one using a local fixed aggregation window whose size is configurable, and the other using an adaptive aggregation window which only includes necessary pixels. We use the winner-takes-all (WTA) principle for optimization and a plain voting refinement for post-processing; both do not need complex data structures. We aim to implement on GPU platforms fast stereo matching algorithms that produce results with same-level quality as other WTA local dense methods that use window-based cost aggregation. In our GPU- based implementation of the fixed window partially demosaiced CFA stereo matching application, accelerations up to 20 times are obtained for large size images. In our GPU- based implementation of the adaptive window color stereo matching application, experiment results show that it can handle four pairs of standard images from Middlebury database within roughly 100 milliseconds.
JA - Journal of Systems Architecture
VL - 65
KW - GPU computing
KW - Stereo matching
KW - CFA demosaicing
KW - Adaptive aggregation window
UR - http://dx.doi.org/10.1016/j.sysarc.2016.03.002
UR - http://www.multiagent.fr/Special:FOAF/bib:927
ER -
TY - CHAP
ID - WangMansouriCreputRuichek2016_929
PY - 2016-06-13
T1 - Distributed Local Search for Elastic Image Matching
AU - Wang,Hongjian
AU - MANSOURI,Abdelkhalek
AU - CREPUT,Jean-Charles
AU - RUICHEK,Yassine
N2 - We propose a distributed local search (DLS) algorithm, which is a parallel formulation of a local search procedure in an attempt to follow the spirit of standard local search metaheuristics. Applications of different operators for solution diversification are possible in a similar way to variable neighborhood search. We formulate a general energy function to be equivalent to elastic image matching problems. A specific example application is stereo matching. Experimental results show that the GPU implementation of DLS seems to be the only method that provides an increasing acceleration factor as the instance size augments, among eight tested energy minimization algorithms.
JF - International Conference on Swarm Intelligence Based Optimization, ICSIBO'2016, Mulhouse, France, June 13-14. Swarm Intelligence Based Optimization, ICSIBO 2016, Lecture Notes in Computer Science: Volume 10103, pp.65-74, 25 Nov.
KW - Parallel and distributed computing
KW - Variable neighborhood search
KW - Stereo matching
KW - Graphics processing unit
UR - http://dx.doi.org/10.1007/978-3-319-50307-3_5
UR - http://www.multiagent.fr/Special:FOAF/bib:929
ER -
TY - JOUR
ID - WangZhangCreput2017_953
PY - 2017-02-20
T1 - A massively parallel neural network approach to large-scale Euclidean traveling salesman problems
AU - Wang,Hongjian
AU - ZHANG,NAIYU
AU - CREPUT,Jean-Charles
N2 - This paper proposes a parallel computation model for the self-organizing map (SOM) neural network applied to Euclidean traveling salesman problems (TSP). This model is intended for implementation on the graphics processing unit (GPU) platform. The Euclidean plane is partitioned into an appropriate number of cellular units, called cells, and each cell is responsible of a certain part of the data and network. Compared to existing GPU implementations of optimization metaheuristics, which are often based on data duplication or mixed sequential/parallel solving, the advantage of the proposed model is that it is decentralized and based on data decomposition. Designed for handling large-scale problems in a massively parallel way, the required computing resources grow linearly along the problem size. Experiments are conducted on 52 publicly available Euclidean TSP instances with up to 85,900 cities for the largest TSPLIB instance and 71,009 cities for the largest National TSP instance. Experimental results show that our GPU implementations of the proposed model run significantly faster than the currently best- performing neural network approaches, to obtain results of similar quality.
JA - Neurocomputing
VL - 240
SP - 137
EP - 151
KW - Neural network
KW - Self-organizing map
KW - Parallel computing
KW - Graphics processing unit
KW - Euclidean traveling salesman problem
UR - http://dx.doi.org/10.1016/j.neucom.2017.02.041
UR - http://www.multiagent.fr/Special:FOAF/bib:953
ER -
TY - JOUR
ID - WangMansouriCreput2017_974
PY - 2017-08-15
T1 - Cellular matrix model for parallel combinatorial optimization algorithms in Euclidean plane
AU - Wang,Hongjian
AU - MANSOURI,Abdelkhalek
AU - CREPUT,Jean-Charles
N2 - We propose a parallel computation model, called cellular matrix model (CMM), to address large-size Euclidean graph matching problems in the plane. The parallel computation takes place by partitioning the plane into a regular grid of cells, each cell being affected to a single processor. Each processor operates on local data, starting from its cell location and extending its search to the neighborhood cells in a spiral search way. In order to deal with large-size problems, memory size and processor number are fixed as O(N), where N is the problem size. Then one key point is that closest point searching in the plane is performed in O(1) expected time for uniform or bounded distribution, for each processor independently. We define a generic loop that models the parallel projection between graphs and their matching, as executed by the many cells at a given level of computation granularity. To illustrate its efficacy and versatility, we apply the CMM, on GPU platforms, to two problems in image processing: superpixel segmentation and stereo matching energy minimization. Firstly, we propose an extended version of the well-known SLIC superpixel segmentation algorithm, which we call SPASM algorithm, by using a parallel 2D self-organizing map instead of k-means algorithm. Secondly, we investigate the idea of distributed variable neighborhood search, and propose a parallel search heuristic, called distributed local search (DLS), for global energy minimization of stereo matching problem. We evaluate the approach with regards to the state-of-the-art graph cut and belief propagation algorithms. For each problem, we argue that the parallel GPU implementation provides new competitive quality/time trade-offs, with substantial acceleration factors as the problem size increases.
JA - Applied Soft Computing
PB - Elsevier
KW - Cellular matrix
KW - Graph matching
KW - k-means
KW - local search
KW - parallel algorithms
KW - graphics processing unit (GPU)
UR - http://dx.doi.org/https://doi.org/10.1016/j.asoc.2017.08.015
UR - http://www.multiagent.fr/Special:FOAF/bib:974
ER -
TY - CHAP
ID - MansouriCreputLauriWang2018_993
PY - 2018-02-21
T1 - GPU based optical flow estimation using parallel local search algorithm
AU - MANSOURI,Abdelkhalek
AU - CREPUT,Jean-Charles
AU - LAURI,Fabrice
AU - Wang,Hongjian
N2 - optical flow estimation problem of a pair of images using a distributed local search (DLS)
JF - 19ème congrès de la société Française de Recherche Opérationnelle et d'Aide à la Décision, ROADEF 2018
KW - Optical flow
KW - Parallel and distributed computing
KW - Variable neighborhood search
KW - Graphics processing unit.
UR - http://www.multiagent.fr/Special:FOAF/bib:993
ER -
TY - CHAP
ID - MansouriCreputLauriWang2018_994
PY - 2018-02-22
T1 - Massively Parallel Optical Flow using Distributed Local Search
AU - MANSOURI,Abdelkhalek
AU - CREPUT,Jean-Charles
AU - LAURI,Fabrice
AU - Wang,Hongjian
N2 - The design of many tasks in computer vision field requires addressing difficult NP-hard energy optimization problems. An example of application is the visual correspondence problem of optical flow, which can be formulated as an elastic pattern matching optimization problem. Pixels of a first image have to be matched to pixels in a second image while preserving elastic smoothness constraint on the first image deformation. In this paper, we present a parallel approach to address optical flow problem following the concept of distributed local search. Distributed local search consists in the parallel execution of many standard local search processes operating on a partition of the data. Each process performs local search on its own part of the data such that the overall energy is minimized. The approach is implemented on graphics processing unit (GPU) platform and evaluated on standard Middlebury benchmarks to gauge the substantial acceleration factors that can be achieved in the task of energy minimization.
JF - The Tenth International Conference on Pervasive Patterns and Applications PATTERNS 2018. Barcelona, Spain. February 18-22
SN - ISBN 978-1-61208-612-5
SN - ISSN 2308-3557
KW - Optical flow
KW - Parallel and distributed computing
KW - Variable neighborhood search
KW - Graphics processing unit.
UR - http://www.multiagent.fr/Special:FOAF/bib:994
ER -