Data processing, please wait... |
Here is the step-by-step guide to generate mtable from BS-seq raw reads by BS-Seeker2 or Bismark. EpiMolas.jar can only accept report from these two mapping package. If you prefer to some other else but wish to use TEA for data visualization, please check the mtable section and generate an compatible report file indexed by a proper gtf gene ID (e.g., TAIR10 gene id) and make sure the measurement column arranged properly and the values are ranged between 0 and 1. We do not do further calculation on the upload mtable.
An adequate read preprocessing pipeline, especially if the experiment is done in a special design (e.g., PCR amplification for low amount of input DNA) is critical. However, there are various strategies may be adopted in the experiment. Thus, we can not provide a firm suggestion of read preprocessing for everybody. In general, trim the BS-seq raw reads using BS-Seeker2's argument -a adapter.txt
in the alignment (step2.2), or other tools like Trim Galore!, cutadapt to remove adapter-vector sequence is necessary since the short DNA fragment takes advantage in the NGS library prep and sequencing outputs. If you find low mapping rate for BS Seq data, you may try clean up the reads by removing adapters first.
Here we write some example scripts for executing BS-Seeker2 or Bismark. You have to make sure all the softwares and the accompanies (eg., python for BS-Seeker, perl for Bismark, and Bowtie2 for both of them; besides, java for EpiMolas) installed properly. If you have problem using the example script, please check the menu/user guide of the two mapping tools, looks for system admininstrator's help for environment setting.
-f
is to assign the sequence file (in fasta format) of reference genome. Aligner program need to build reference database before mapping process. BS Seeker2 will build the db in the default directory : (e.g., /path_to_installed_BSseeker2/bs_utils/reference_genomes/)
unless users specify the db path by argument -d
. The genome fasta file name will be used as the indexed db name. Once the index is built, you can refer the genome/index path in the next, mapping, and not need to build index every time.
--aligner=bowtie2
is for specifying bowtie2 as the aligner. If you prefer other aligneer, please check
BS Seeker2's menu for compatible aligners.
-i
is for specifying input file. Each experiment (independent dataset) should run separately.
-g
is for specifying the reference genome indexed db. It should be the same name in building the
genome index db (same wording used in the bs_seeker2-build.py -f
).
--bt-p
is to run the bowtie2 alignment in a parallel mode, e.g., increase the number of threads for bowtie to 4. If the computing environment allowed, user can set to higher core number to accelerate the run.
-m
is applied to set the read mapping flexibility (e.g., up to 2 mismatch bases in the example script). BS-Seeker2's default setting is eqival to -m 4 for up to 4 mismatches.
Besides, -a path_to/adapter.txt
can be added if you are going to use BS-Seeker2's trimming function; you need to provide the adapter sequence file (adapter.txt) in accompany.
-i
is for specifying input bam file.
-o
is for specifying output report file name (i.e., "output.txt" is the file name generated from this example script).
--db=BSseeker2_formatted_reference_genome_db
is for specifying the index db. Here is the default path. User should set to the path specifying the db path if -d
was applied in previous index db building (bs_seeker2-build.py ) process.
Bismark provides many options to send to aligner for optimizing. Please check the UserGuide if you need to specify more parameters for your dataset.
USAGE: bismark_genome_preparation [options] path_to_genome_folderbismark_genome_preparation --path_to_bowtie /usr/local/bin/ --bowtie2 path to/reference_genome.fa
Bismark authors suggest a pre-trimmed process before mapping. The alignment is compatible to bowtie and bowtie2. Please specify --bowtie2
if the alignment will be done by bowtie2. The reference sequence fasta file will be converted to Bismark required the indexed db and use the file name as db's name. The subfolders will be created to hold the indexed db for later mapping processes
--path_to_bowtie
is to specify the full path to Bowtie 1/ Bowtie 2 installation. Be sure that bowtie2 is installed properly. It is often in /usr/local/bin/ if the installation is done by administrators.
USAGE: bismark [options] genome_folder {-1 r1_read.fq -2 r2_read.fq | -se single_read.fq }bismark --se --bowtie2 path to/reference_genome.fa/ my_bs_reads.fastq
The mapping result is bam file named as the input read file name with .bam extention, e.g., my_bs_reads.fastq.bam generated by the sample script.
--se
is specifying the input bisulfite sequencing data is single-end reads. Please check Bismark's menu for pair-end readsets.
--bowtie2
is specifying the alignment use bowtie2. Please be sure the consistancy between refence genome indexed db and aligner.
USAGE: bismark_methylation_extractor [options] filenamesbismark_methylation_extractor --multicore 8 --comprehensive --cytosine_report --CX_context --genome_folder /full_path_to_genome_folder/ my_bs_reads_bismark_bt2.bam
--multicore
the number of cores to be used for the methylation extraction process. --comprehensive
, --cytosine_report
, and --CX_context
are arguments for generating a report summary for all Cs from the genome in the context-dependent (e.g., CG, CHG, CHH) way.--genome_folder
Enter the genome folder (full path only) where the reference genome sequence file is (where the fa file is located).
The mapping result now can be convert to methylation measurement for each gene by EpiMolas.jar.
Each experiment readsets should be run through mapping, extracting the methylation levels, then be converted to mtable separately. For example, two experiment conditions with three replicates each will have six read sets. Finally, six read sets should give six mtables for EpiMolas. Please check mtable section for further details.
Lastest update 2016/07/15 |