Data processing, please wait...
TEA TEA Help About Us

BS Seq Mapping Process

Here is the step-by-step guide to generate mtable from BS-seq raw reads by BS-Seeker2 or Bismark. EpiMolas.jar can only accept report from these two mapping package. If you prefer to some other else but wish to use TEA for data visualization, please check the mtable section and generate an compatible report file indexed by a proper gtf gene ID (e.g., TAIR10 gene id) and make sure the measurement column arranged properly and the values are ranged between 0 and 1. We do not do further calculation on the upload mtable.


STEP 1 Read preprocessing

An adequate read preprocessing pipeline, especially if the experiment is done in a special design (e.g., PCR amplification for low amount of input DNA) is critical. However, there are various strategies may be adopted in the experiment. Thus, we can not provide a firm suggestion of read preprocessing for everybody. In general, trim the BS-seq raw reads using BS-Seeker2's argument -a adapter.txt in the alignment (step2.2), or other tools like Trim Galore!, cutadapt to remove adapter-vector sequence is necessary since the short DNA fragment takes advantage in the NGS library prep and sequencing outputs. If you find low mapping rate for BS Seq data, you may try clean up the reads by removing adapters first.

STEP 2 Mapping cleaned, trimmed reads to the reference genome

Here we write some example scripts for executing BS-Seeker2 or Bismark. You have to make sure all the softwares and the accompanies (eg., python for BS-Seeker, perl for Bismark, and Bowtie2 for both of them; besides, java for EpiMolas) installed properly. If you have problem using the example script, please check the menu/user guide of the two mapping tools, looks for system admininstrator's help for environment setting.

  • For BS-Seeker2

    1. Building genome index for BS-Seeker2
    2. python bs_seeker2-build.py -f path_to/Arabidopsis_thaliana.TAIR10.31.dna.genome.fa --aligner=bowtie2

      -f is to assign the sequence file (in fasta format) of reference genome. Aligner program need to build reference database before mapping process. BS Seeker2 will build the db in the default directory : (e.g., /path_to_installed_BSseeker2/bs_utils/reference_genomes/) unless users specify the db path by argument -d . The genome fasta file name will be used as the indexed db name. Once the index is built, you can refer the genome/index path in the next, mapping, and not need to build index every time.
      --aligner=bowtie2 is for specifying bowtie2 as the aligner. If you prefer other aligneer, please check BS Seeker2's menu for compatible aligners.

    3. Mapping Readsto the indexed reference genome
    4. python bs_seeker2-align.py -i path_to/my_bs_reads.fastq -g Arabidopsis_thaliana.TAIR10.31.dna.genome.fa -m 2 --aligner=bowtie2 -f bam -o path_to/exp1.bam --bt-p 4

      -i is for specifying input file. Each experiment (independent dataset) should run separately.
      -g is for specifying the reference genome indexed db. It should be the same name in building the genome index db (same wording used in the bs_seeker2-build.py -f).
      --bt-p is to run the bowtie2 alignment in a parallel mode, e.g., increase the number of threads for bowtie to 4. If the computing environment allowed, user can set to higher core number to accelerate the run.
      -m is applied to set the read mapping flexibility (e.g., up to 2 mismatch bases in the example script). BS-Seeker2's default setting is eqival to -m 4 for up to 4 mismatches.
      Besides, -a path_to/adapter.txt can be added if you are going to use BS-Seeker2's trimming function; you need to provide the adapter sequence file (adapter.txt) in accompany.

    5. Generating the methylation report from alignment (bam file)
    6. python bs_seeker2-call_methylation.py -i path_to/exp1.bam -o path_to/output --db=/path_to_BSseeker2_path/bs_utils/reference_genomes/Arabidopsis_thaliana.TAIR10.31.dna.genome.fa_bowtie2/

      -i is for specifying input bam file.
      -o is for specifying output report file name (i.e., "output.txt" is the file name generated from this example script).
      --db=BSseeker2_formatted_reference_genome_db is for specifying the index db. Here is the default path. User should set to the path specifying the db path if -d was applied in previous index db building (bs_seeker2-build.py ) process.

  • For Bismark's users
  • Bismark provides many options to send to aligner for optimizing. Please check the UserGuide if you need to specify more parameters for your dataset.


    1. Building the reference genome
    2. USAGE:   bismark_genome_preparation [options] path_to_genome_folder 
      bismark_genome_preparation --path_to_bowtie /usr/local/bin/ --bowtie2 path to/reference_genome.fa

      Bismark authors suggest a pre-trimmed process before mapping. The alignment is compatible to bowtie and bowtie2. Please specify --bowtie2 if the alignment will be done by bowtie2. The reference sequence fasta file will be converted to Bismark required the indexed db and use the file name as db's name. The subfolders will be created to hold the indexed db for later mapping processes
      --path_to_bowtie is to specify the full path to Bowtie 1/ Bowtie 2 installation. Be sure that bowtie2 is installed properly. It is often in /usr/local/bin/ if the installation is done by administrators.

    3. Mapping Reads to the indexed reference genome
    4. USAGE:  bismark [options]  genome_folder {-1 r1_read.fq -2 r2_read.fq | -se single_read.fq } 
      bismark --se --bowtie2 path to/reference_genome.fa/ my_bs_reads.fastq

      The mapping result is bam file named as the input read file name with .bam extention, e.g., my_bs_reads.fastq.bam generated by the sample script.
      --se is specifying the input bisulfite sequencing data is single-end reads. Please check Bismark's menu for pair-end readsets.
      --bowtie2 is specifying the alignment use bowtie2. Please be sure the consistancy between refence genome indexed db and aligner.

    5. Generating the summary report from mapping result (bam file)
    6. USAGE:   bismark_methylation_extractor [options]  filenames  
      bismark_methylation_extractor --multicore 8 --comprehensive --cytosine_report --CX_context --genome_folder /full_path_to_genome_folder/ my_bs_reads_bismark_bt2.bam
      --multicore the number of cores to be used for the methylation extraction process.
      --comprehensive, --cytosine_report, and --CX_context are arguments for generating a report summary for all Cs from the genome in the context-dependent (e.g., CG, CHG, CHH) way.
      --genome_folder Enter the genome folder (full path only) where the reference genome sequence file is (where the fa file is located).

STEP 3 Converting the mapping report to mtable

The mapping result now can be convert to methylation measurement for each gene by EpiMolas.jar.

    1. java -jar EpiMolas.jar mapping_report.txt Arabidopsis_thaliana.TAIR10.31.gtf > result.mtable

Each experiment readsets should be run through mapping, extracting the methylation levels, then be converted to mtable separately. For example, two experiment conditions with three replicates each will have six read sets. Finally, six read sets should give six mtables for EpiMolas. Please check mtable section for further details.



© 2016, Laboratory of Systems Biology and Network Biology,

Institute of Information Science, Academia Sinica, TAIWAN.

Lastest update 2016/07/15