Available software

Out-of-Core Randomized Singular Value Decomposition on multi-GPU

The following code implements an out-of-core randomized singular value decomposition for multi-GPU and multi-core CPU systems.

Please see the README.md file for more information about this program.

Download Source Code (20200114)

For any comments or questions, please contact me (ino(at)ist.osaka-u.ac.jp).
* Please replace (at) with @.

Please refer to the following license information regarding MAGMA.

License
Copyright © 2020 The University of Tennessee. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: This software is provided by the copyright holders and contributors "as is" and any express or implied warranties, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose are disclaimed. In no event shall the copyright owner or contributors be liable for any direct, indirect, incidental, special, exemplary, or consequential damages (including, but not limited to, procurement of substitute goods or services; loss of use, data, or profits; or business interruption) however caused and on any theory of liability, whether in contract, strict liability, or tort (including negligence or otherwise) arising in any way out of the use of this software, even if advised of the possibility of such damage.

Accelerated Held-Karp implementation for the sTSP

The following code includes CPU and GPU implementations of the Held-Karp algorithm for the symmetric traveling salesman problem (sTSP). The acceleration comes from two techniques: parallelization and meet-in-the-middle (MITM).
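
For readers unfamiliar with the algorithm, the following is a minimal sequential sketch of the textbook Held-Karp dynamic program in Python. It is not the distributed code: it includes neither the parallelization nor the MITM technique, and the function name `held_karp` and the distance-matrix input format are assumptions made for illustration only.

```python
from itertools import combinations


def held_karp(dist):
    """Return the length of a shortest tour that starts and ends at city 0.

    dist is assumed to be a symmetric n x n matrix given as a list of lists.
    """
    n = len(dist)
    if n < 2:
        return 0

    # best[(mask, j)]: length of the shortest path that starts at city 0,
    # visits exactly the cities whose bits are set in mask, and ends at j.
    best = {(1 << j, j): dist[0][j] for j in range(1, n)}

    # Grow the visited subsets one city at a time.
    for size in range(2, n):
        for subset in combinations(range(1, n), size):
            mask = sum(1 << j for j in subset)
            for j in subset:
                prev = mask ^ (1 << j)  # same subset without the end city j
                best[(mask, j)] = min(
                    best[(prev, k)] + dist[k][j] for k in subset if k != j
                )

    # Close the tour by returning to city 0.
    full = (1 << n) - 2  # every city except city 0
    return min(best[(full, j)] + dist[j][0] for j in range(1, n))
```

The table holds on the order of n * 2^n entries, which is why the memory footprint, not only the running time, limits the problem size this approach can handle.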

Please see the README file for more information about this program.

Download Source Code (20190318)

For any comments or questions, please contact me (ino(at)ist.osaka-u.ac.jp).
* Please replace (at) with @.

References:
Kazuro Kimura, Shinya Higa, Masao Okita, and Fumihiko Ino, ``Accelerating the Held-Karp Algorithm for the Symmetric Traveling Salesman Problem,'' IEICE Transactions on Information and Systems, Vol.E102-D, No.12, pp.2329--2340, (2019-12). [DOI]


PACC: pipelined accelerator

PACC is an extension of OpenACC directives.

Please see the README file for more information about this program.

Download Source Code (20170112)

For any comments or questions, please contact me (ino(at)ist.osaka-u.ac.jp).
* Please replace (at) with @.

References:
Nobuhiro Miki, Fumihiko Ino, and Kenichi Hagihara, ``An Extension of OpenACC Directives for Out-of-Core Stencil Computation with Temporal Blocking,'' In Proceedings of the 3rd Workshop on Accelerator Programming Using Directives (WACCPD 2016), pp. 36--45, Salt Lake City, UT, USA, (2016-11). [DOI]


cuShiftOr: exact and approximate string matching

cuShiftOr is an implementation of string matching algorithms accelerated on a CUDA-compatible GPU.
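
As background on the name, the following is a minimal sequential sketch of the classic Shift-Or (bit-parallel) exact matching algorithm in Python. It is not taken from the distributed code and does not cover the GPU parallelization or approximate matching; the function name `shift_or_search` is an assumption made for illustration only.

```python
def shift_or_search(text, pattern):
    """Return the start index of every occurrence of pattern in text."""
    m = len(pattern)
    if m == 0:
        return []
    all_ones = (1 << m) - 1

    # masks[c] has bit i cleared exactly when pattern[i] == c.
    masks = {}
    for i, c in enumerate(pattern):
        masks[c] = masks.get(c, all_ones) & ~(1 << i)

    # Bit i of state is 0 when pattern[:i + 1] matches the text ending here.
    state = all_ones
    hits = []
    for pos, c in enumerate(text):
        state = ((state << 1) | masks.get(c, all_ones)) & all_ones
        if state & (1 << (m - 1)) == 0:  # the whole pattern matched
            hits.append(pos - m + 1)
    return hits
```

Each text character costs one shift and one OR on the state word, so the update is cheap, but each state depends on the previous one.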

Please see the README file for more information about this program.

Download Source Code (20160623)

For any comments or questions, please contact me (ino(at)ist.osaka-u.ac.jp).
* Please replace (at) with @.

References:
Yasuaki Mitani, Fumihiko Ino, and Kenichi Hagihara, ``Parallelizing Exact and Approximate String Matching via Inclusive Scan on a GPU,'' IEEE Transactions on Parallel and Distributed Systems, Vol.28, No.7, pp.1989--2002, (2017-07). [DOI]


All-pairs comparison based on SW#

This tarball includes a sample program and a patch file for SW#, a CUDA-accelerated Smith-Waterman implementation distributed at http://sourceforge.net/projects/swsharp/
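
To illustrate what an all-pairs comparison computes, the following is a minimal CPU sketch in Python that scores every pair of sequences with the Smith-Waterman recurrence. It is not SW# and not the patch: the linear gap penalty, the scoring values, and the function names `smith_waterman_score` and `all_pairs_scores` are assumptions chosen for brevity, and no pruning between pairs is performed.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score of a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def all_pairs_scores(seqs):
    """Score every unordered pair (i, j) with i < j."""
    return {
        (i, j): smith_waterman_score(seqs[i], seqs[j])
        for i in range(len(seqs))
        for j in range(i + 1, len(seqs))
    }
```

With n sequences there are n(n-1)/2 pairs, so the number of alignments grows quadratically with the size of the data set.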

Please see the README file for more information about this program.

Download Source Code (20150801)

For any comments or questions, please contact me (ino(at)ist.osaka-u.ac.jp).
* Please replace (at) with @.

References:
Daiki Okada, Fumihiko Ino, and Kenichi Hagihara, ``Accelerating the Smith-Waterman Algorithm with an Interpair Pruning Method for All-Pairs Comparison of Base Sequences,'' BMC Bioinformatics, Vol.16, No.321, 15 pages, (2015-10). [DOI]


cudaRegistration

cudaRegistration is a program that accelerates nonrigid registration of medical images [1,2].

The following code includes a GPU implementation of nonrigid registration based on Rueckert's registration algorithm. A rigid registration program is NOT included.
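
The references below concern mutual information computation for nonrigid registration. As a rough illustration of that quantity, the following Python sketch estimates the mutual information of two equally sized images from a joint histogram. It is not the distributed GPU code: the function name `mutual_information`, the flat-list image format, the bin count, and the intensity range are all assumptions made for illustration only.

```python
from collections import Counter
from math import log2


def mutual_information(ref, flo, bins=4, max_val=256):
    """Estimate mutual information (in bits) between two images.

    ref and flo are assumed to be equal-length flat lists of intensities
    in the range [0, max_val).
    """
    if len(ref) != len(flo) or not ref:
        raise ValueError("images must be non-empty and of equal size")

    width = max_val / bins
    # Quantize each pair of corresponding voxels into histogram bins.
    pairs = [(int(r // width), int(f // width)) for r, f in zip(ref, flo)]
    n = len(pairs)

    joint = Counter(pairs)                   # joint histogram
    hist_ref = Counter(r for r, _ in pairs)  # marginal histogram of ref
    hist_flo = Counter(f for _, f in pairs)  # marginal histogram of flo

    mi = 0.0
    for (r, f), count in joint.items():
        # p(r, f) * log2(p(r, f) / (p(r) * p(f))), written with raw counts
        mi += (count / n) * log2(count * n / (hist_ref[r] * hist_flo[f]))
    return mi
```

A registration loop would re-evaluate a similarity measure like this many times while the transformation is being optimized.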

Please see the README file for more information about this program.

Download Source Code (20140320)

For any comments or questions, please contact me (i-kei(at)ist.osaka-u.ac.jp).
* Please replace (at) with @.

References:
[1] Kei Ikeda, Fumihiko Ino, and Kenichi Hagihara, ``Efficient Acceleration of Mutual Information Computation for Nonrigid Registration Using CUDA,'' IEEE Journal of Biomedical and Health Informatics, Vol.18, No.3, pp.956--968, (2014-05). [DOI]
[2] Kei Ikeda, Fumihiko Ino, and Kenichi Hagihara, ``Accelerating Mutual Information Computation for Nonrigid Registration on the GPU,'' Poster at the 3rd GPU Technology Conference (GTC 2012), San Jose, CA, USA, (2012-05). [PDF]