************************ H-BLAST v1.0 release [Dec. 30, 2012] ******************
H-BLAST accelerates the protein sequence alignment algorithms of the NCBI-BLAST 
implementation on a GPU+CPU heterogeneous computer while preserving the same 
output as NCBI-BLAST. It can take advantage of multiple CPU cores and GPUs. It 
has been tested on CentOS 5.4 with three NVIDIA Tesla Fermi C2050 GPUs and on 
Ubuntu 11.04 with two NVIDIA GeForce GTX 560 GPUs.

H-BLAST is free for academic and non-commercial use. The current implementation 
is an enhanced variant of GPU-BLAST. For further information about GPU-BLAST, 
please refer to the paper
Panagiotis D. Vouzis and Nikolaos V. Sahinidis, "GPU-BLAST: Using graphics 
processors to accelerate protein sequence alignment," Bioinformatics, 
vol. 27, no. 2, pages 182-188, 2011 (Open Access). 
For information about H-BLAST, please refer to the paper 
Weicai Ye, Yongdong Zhang, Ying Chen and Yuesheng Xu, 
"H-BLASTP: A Fast Scalable Protein Sequence Alignment
Toolkit on Heterogeneous Computers".

Please cite the authors in any work or product based on this material:
Weicai Ye, Yongdong Zhang, Ying Chen and Yuesheng Xu, 
"H-BLASTP: A Fast Scalable Protein Sequence Alignment
Toolkit on Heterogeneous Computers".

For any questions and feedback about H-BLAST, contact cai_rcy@mail.tom.com 
or lnszyd@mail.sysu.edu.cn.

I. Supported features
=====================
H-BLAST v1.0 supports the protein alignment algorithms "blastp" and "blastx" 
(the latter for testing only). It can handle input files with multiple protein 
queries and take advantage of multiple CPU cores and GPUs. H-BLAST v1.0 does not 
support PSI-BLAST. The substitution matrix is fixed to BLOSUM62.

II. Files in this folder
=====================
./queries 	The folder that contains all queries for the benchmark tests. For 
		example, the file named "24Sequence_length_30k.txt" contains 
		24 sequences with a total length of 30kbp.
./h_blast	The folder that contains all source code of H-BLAST.
./install 	The script that installs H-BLAST into NCBI-BLAST.
./README 	This file.
./gpu_db_used_ratioq	The file that holds additional parameters for H-BLAST.
	     

III. Quick start
=====================
The following scenario shows how to build and use the H-BLAST software 
with the "blastp" algorithm (H-BLASTP). There are three steps: installing 
H-BLAST, making the databases, and aligning sequences with H-BLAST. All commands 
were run on CentOS 5.4 with CUDA 4.0, with bash as the login shell. We assume 
that the computer has 12 CPU cores and 2 GPU cards, and that the database in 
FASTA format and the query sequences are located in the "/home/Blast/db" folder. 
The pre-built NCBI-BLAST 2.2.25+ software with source code is located in the 
"/home/Blast/ncbi_blast/blast/ncbi-blast-2.2.25+-src/c++/GCC412-ReleaseMT64/bin" 
folder. The procedure for the "blastx" algorithm (H-BLASTX) is the same, except 
that the blastx interface is used.

Step one: Installation of H-BLAST
$tar -xzf h-blast-1.0_ncbi-blast-2.2.25.tar.gz
$cd h-blast-1.0_ncbi-blast-2.2.25
$./install ncbi-blast-2.2.25
Do you want to install H-BLAST on an existing installation of "blastp" [yes/no]
yes: you will be asked for the installation directory of the "blastp" executable
no: will download and install "ncbi-blast-2.2.25+-src"
yes
Please input the installation directory of "blastp" of "ncbi-blast-2.2.25+-src"
/home/Blast/ncbi_blast/blast/ncbi-blast-2.2.25+-src/c++/GCC412-ReleaseMT64/bin
"blastp" version 2.2.25+ is compatible
Continuing with the installation of H-BLAST...

Modifying NCBI BLAST files

Compiling CUDA code
.gpu_blastp.cu(484): warning: variable "result" was set but never used

gpu_blastp.cu(484): warning: variable "result" was set but never used

.gpu_blastp.cu: In function ‘void GPU_BLASTP(int, const BLAST_SequenceBlk*, const BlastQueryInfo*, const LookupTableWrap*, const Blast_ExtendWord*, const BlastInitialWordParameters*, const BlastGPUOptions*, const Int4*)’:
gpu_blastp.cu:858: warning: format ‘%d’ expects type ‘int’, but argument 3 has type ‘long unsigned int’
gpu_blastp.cu:860: warning: format ‘%d’ expects type ‘int’, but argument 3 has type ‘long unsigned int’
gpu_blastp.cu:862: warning: format ‘%d’ expects type ‘int’, but argument 3 has type ‘long unsigned int’
gpu_blastp.cu:863: warning: format ‘%d’ expects type ‘int’, but argument 3 has type ‘long unsigned int’
gpu_blastp.cu:864: warning: format ‘%d’ expects type ‘int’, but argument 3 has type ‘long unsigned int’
gpu_blastp.cu:865: warning: format ‘%d’ expects type ‘int’, but argument 3 has type ‘long long unsigned int’
gpu_blastp.cu:924: warning: format ‘%d’ expects type ‘int’, but argument 4 has type ‘size_t’
gpu_blastp.cu:932: warning: format ‘%d’ expects type ‘int’, but argument 4 has type ‘size_t’
gpu_blastp.cu: In function ‘Boolean GPU_BLASTP_check_memory(const LookupTableWrap*, const Blast_ExtendWord*, const BlastGPUOptions*, int, Int4, Int4, Int4, Int4, Int4)’:
gpu_blastp.cu:1230: warning: format ‘%d’ expects type ‘int’, but argument 2 has type ‘long long unsigned int’
gpu_blastp.cu:1230: warning: format ‘%d’ expects type ‘int’, but argument 3 has type ‘long long unsigned int’
gpu_blastp.cu: In function ‘void GPU_BLASTP_check_rehit_ratio(const Int4*, Int4)’:
gpu_blastp.cu:1312: warning: unknown conversion type character ‘)’ in format

Compiling CUDA code

Building NCBI-BLAST with H-BLAST
.........................................................................................
.........................................................................................
......................................................................................
$cp /home/Blast/ncbi_blast/blast/ncbi-blast-2.2.25+-src/c++/GCC412-ReleaseMT64/bin/makeblastdb /home/Blast/db
$cp /home/Blast/ncbi_blast/blast/ncbi-blast-2.2.25+-src/c++/GCC412-ReleaseMT64/bin/blastp /home/Blast/db
$cp gpu_db_used_ratioq /home/Blast/db

Step two: Make a BLAST Database and a GPU database
$cd /home/Blast/db
$./makeblastdb -in nr.fasta -out sorted_nr -sort_volumes -max_file_sz 250MB
$./blastp -query  SequenceLength_00001000.txt -db sorted_nr -gpu t -method -1 -gpu_blocks 256 -gpu_threads 64 > create_db_out.txt	

Step three: Sequence alignment with H-BLAST 
$./blastp -db sorted_nr -query 24Sequence_length_30k.txt -out result_30k -num_threads 12 -gpu t -method 2 &> output.txt

IV. FAQ
=====================
Q1. How to use H-BLAST?
Since H-BLAST adds GPU functionality to NCBI-BLAST, the interface of "blastp" 
and "blastx" in H-BLAST is identical to that of NCBI-BLAST, except for some 
additional options. 

Q2. What are the additional options of using H-BLAST?
The additional options are shown as follows:
 -gpu <Boolean>
   Use GPU for "blastp" or "blastx"
   Default = `F'
 -gpu_threads <Integer, 1..1024>
   Number of GPU threads per block
   Default = `64'
 -gpu_blocks <Integer, 1..65536>
   Number of GPU blocks per grid
   Default = `512'
 -method <Integer, -1..6>
   Method to be used
     x >= 1 = for GPU-based sequence alignment with x GPU cards (default x = 1),
     0 = for GPU-based sequence alignment with all GPU cards in the computer,
     -1 = for GPU database creation
   Default = `1'
    * "-method -1" is incompatible with:  num_threads

Typing "./blastp -help" will print the above options towards the end of the 
output.
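
As an illustration of the ranges listed above, the following Python sketch 
checks a set of option values before a run. It is not part of H-BLAST, the 
function name is hypothetical, and it covers only the ranges and the one 
incompatibility stated in this answer; H-BLAST performs its own checks, 
including memory-related ones that are not modelled here.

```python
from typing import Optional


def check_hblast_options(gpu_threads: int = 64, gpu_blocks: int = 512,
                         method: int = 1,
                         num_threads: Optional[int] = None) -> None:
    """Raise ValueError if a value violates the documented ranges."""
    if not 1 <= gpu_threads <= 1024:
        raise ValueError("-gpu_threads must be in 1..1024")
    if not 1 <= gpu_blocks <= 65536:
        raise ValueError("-gpu_blocks must be in 1..65536")
    if not -1 <= method <= 6:
        raise ValueError("-method must be in -1..6")
    # GPU database creation cannot be combined with -num_threads.
    if method == -1 and num_threads is not None:
        raise ValueError("-method -1 is incompatible with -num_threads")
```

For example, check_hblast_options(method=-1, num_threads=12) raises ValueError, 
while check_hblast_options(method=2, num_threads=12) returns without error.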

Q3. Are there any relationships among the options “-max_file_sz”, “-gpu_threads” 
and “-gpu_blocks” in H-BLAST?
Yes, they are closely related. The maximum size of a GPU database volume and the 
number of GPU threads are limited by the memory size of a GPU card. H-BLAST 
rejects any unacceptable option combination. Furthermore, different option 
combinations affect performance. In our tests, two option combinations worked 
well for different GPU cards:
	a. -max_file_sz 250MB -gpu_threads 64 -gpu_blocks 256
	b. -max_file_sz 500MB -gpu_threads 64 -gpu_blocks 512
Option a fits any GPU card with at least 1GB of memory, such as the NVIDIA 
GeForce GTX 560. Option b fits any GPU card with at least 2GB of memory, such as 
the NVIDIA Tesla C2050. Option a is our first choice, even on NVIDIA Tesla 
C2050 GPU cards. 
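
The selection rule described in this answer can be written down as a short 
Python sketch. It is not part of H-BLAST: the function name, the "prefer_small" 
switch and the treatment of the 1GB and 2GB thresholds as plain numbers are 
assumptions made for illustration. Only the two tested combinations are encoded.

```python
from typing import Dict

# The two option combinations reported as tested in this README.
OPTION_A = {"-max_file_sz": "250MB", "-gpu_threads": "64", "-gpu_blocks": "256"}
OPTION_B = {"-max_file_sz": "500MB", "-gpu_threads": "64", "-gpu_blocks": "512"}


def suggest_options(gpu_mem_gb: float, prefer_small: bool = True) -> Dict[str, str]:
    """Return a tested option combination for a card with gpu_mem_gb of memory.

    prefer_small=True mirrors the stated preference for option a, even on
    cards that are large enough for option b.
    """
    if gpu_mem_gb < 1:
        raise ValueError("no tested option combination for cards under 1GB")
    if gpu_mem_gb >= 2 and not prefer_small:
        return dict(OPTION_B)
    return dict(OPTION_A)
```

With the defaults, suggest_options(3) returns option a; option b is returned 
only when prefer_small=False and the card has at least 2GB of memory.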

Q4. Is the option “-sort_volumes” necessary?
Yes, if you want to use GPU cards. The sorted database produces alignments 
identical to those of the unsorted one.

Q5. What is the file “gpu_db_used_ratioq” used for?
The file "gpu_db_used_ratioq" lists some parameters used by H-BLAST: the 
initial load assignment ratio for GPU cards, whether the self-adaptive load 
balancer is used, the computational capacity ratio between a GPU card and a CPU 
core, and the maximum number of successful hits to save per subject sequence. 
Since some parameters depend on empirical experience, we show the parameters 
for two different computers. The initial load assignment ratios are listed 
in the tables. 
	a. A cluster node has 2 six-core Intel Xeon E5650 2.6GHz CPUs, 6 NVIDIA 
		Tesla C2050 GPUs, and 72GB memory with GCC 4.1.2 compiler and 
		CUDA 4.0 on CentOS 5.4.

	///=================== file content in gpu_db_used_ratioq ===========
	# The gpu_db_used_ratio is
	1
	# the gpu_db_used_ratio is fixed or not, '1' for fixed, others for changed by program itself
	0
	# gpu_cpu_capacity_rate is
	4
	# MAX_SUCCESSFUL_HITS_PER_SEQUENCE is
	19

	///============= table of initial load assignment ratios ===========
	CPU threads \ GPU cards |   1   |   2   |   3
	------------------------+-------+-------+-------
	            1           |   1   |   1   |   1
	            2           |   1   |   1   |   1
	            4           |   1   |   1   |   1
	            6           |  0.8  |   1   |   1
	            8           |  0.7  |   1   |   1
	           12           |  0.5  |  0.8  |   1


	b. A desktop PC has an i5-2300 2.8GHz quad-core CPU, 2 NVIDIA GTX 560 
		GPUs, and 8GB memory with GCC 4.4.6 compiler and CUDA 4.0 on 
		Ubuntu 11.04.

	///=================== file content in gpu_db_used_ratioq ===========
	# The gpu_db_used_ratio is
	1
	# the gpu_db_used_ratio is fixed or not, '1' for fixed, others for changed by program itself
	0
	# gpu_cpu_capacity_rate is
	2
	# MAX_SUCCESSFUL_HITS_PER_SEQUENCE is
	19

	///============= table of initial load assignment ratios ===========
	CPU threads \ GPU cards |   1   |   2
	------------------------+-------+-------
	            1           |   1   |   1
	            2           |   1   |   1
	            4           |  0.8  |   1
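
To make the layout of "gpu_db_used_ratioq" concrete, the following Python sketch 
reads the four values from text laid out like the examples above. It is not the 
parser used by H-BLAST. It assumes that lines starting with '#' are comments and 
that the four values always appear in the order shown; the class and function 
names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GpuDbConfig:
    gpu_db_used_ratio: float      # initial load assignment ratio for GPU cards
    ratio_is_fixed: bool          # True only when the flag line is exactly "1"
    gpu_cpu_capacity_rate: float  # capacity of one GPU card relative to one CPU core
    max_hits_per_sequence: int    # MAX_SUCCESSFUL_HITS_PER_SEQUENCE


def parse_gpu_db_used_ratioq(text: str) -> GpuDbConfig:
    """Parse the four positional values, skipping blank and '#' lines."""
    values = [line.strip() for line in text.splitlines()
              if line.strip() and not line.strip().startswith("#")]
    if len(values) < 4:
        raise ValueError("expected four values, found %d" % len(values))
    return GpuDbConfig(float(values[0]), values[1] == "1",
                       float(values[2]), int(values[3]))


# File content of the cluster-node example (a) above.
EXAMPLE = """\
# The gpu_db_used_ratio is
1
# the gpu_db_used_ratio is fixed or not, '1' for fixed, others for changed by program itself
0
# gpu_cpu_capacity_rate is
4
# MAX_SUCCESSFUL_HITS_PER_SEQUENCE is
19
"""

if __name__ == "__main__":
    print(parse_gpu_db_used_ratioq(EXAMPLE))
```

In example (a) the second value is 0, so the ratio is not fixed and the initial 
ratio of 1 is only the starting point for the self-adaptive load balancer.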

Q6. What is the screen output used for?
The screen output shows the details of the H-BLAST execution, including the
execution time for each database volume with CPU cores and GPU cards, the
realignment ratio and any error messages. 

Q7. Which configure options does the installation use to build NCBI-BLAST with H-BLAST?
The installation script runs configure with the following options:
./configure --without-debug --with-mt --without-sybase --without-ftds --without-fastcgi \
--without-fltk --without-wxwin --without-ncbi-c --without-sssdb --without-sss \
--without-geo --without-sp --without-orbacus --without-boost

