Data processing, please wait...
TEA TEA Help About Us

Methylation Landscapes


5-methylcytosine (5mC) is the best known methylation modificated nucleotides in genome. They are often classified in three sequence context CG, CHG, CHH and the location where the Cis in. To catch the essence of genome methylation status and to meet the efficiency for performing analysis online, we introduce a straightforward method to measure the methylation landscapes regarding to the sequence contexts.

The methylation landscape in TEA is based on the gene, in which DNA methylation levels in the promoter and gene body are estimated from the WGBS data.

The DNA methylation level for individual cytosines is estimated as Equation (1.1), then, the average promoter or gene body methylation levels within the promoter or gene body is calculated as the average within the range by Equation (1.2):

Each C should have mapped depth in a minimum threshold of 4, and at least five C sites of each sequence context type to be reported. The mapping report is converted to a summary which includes six measurements (i.e., pmt-CG, pmt-CHG, pmt-CHH, gene_CG, gene_CHG, gene_CHH ) for each gene. This method is implement to an in-house program EpiMolas.jar to process BS-Seq mapping results into a small, tab-delimited data file, mtable :

gene_id	pmt_CG	gene_CG	pmt_CHG	gene_CHG	pmt_CHH	gene_CHH
AT1G01010	0.011463	0.053009	0.010000	0.011635	0.021765	0.012631
AT1G01020	0.000000	0.081519	0.006957	0.007177	0.003614	0.007521
AT1G01030	0.005385	0.012800	0.002439	0.023452	0.003116	0.016939
AT1G01040	0.011200	0.589821	0.009677	0.015773	0.016944	0.011699
AT1G01046	0.765250	0.385000	0.022500	0.058750	0.014325	0.047727

Note The align of column name and value is not changed to make perfect view on webpage, but they do seperated to the neighbors by tab. You can find seven column names in the first row, and the six data columns, lead by the gene id column.

These measurements are a normalized score from 0 (all observed sites are unmethylated) to 1 (all observed sites are methylated), or "NaN" for genes which do not have sufficient reads/sites to calculate the value. A deviation of 0.1 on the measurement reflects an overall 10% of Cs contributed to the observed feature changes the methylation state.

Executing EpiMolas.jar to generate mtable

  1. Check The Java Environment

    Before you run the EpiMolas.jr, please check the java environment installed properly in your linux environment. For example, simply type a version check :

    java -version

    and you will get a return like:

      openjdk version "1.8.0_45-internal"
      OpenJDK Runtime Environment (build 1.8.0_45-internal-b14)
      OpenJDK 64-Bit Server VM (build 25.45-b02, mixed mode) 

    If not, you need to ask the administrator's help for installing Java.

  2. Download the EpiMolas.jar from github (move to the directory that you want to save the script):


  3. Converting the report to mtable:

    First, we assume that you have completed the mapping process and had the right mapping report, *.CGmap from BS-Seeker2, or CXreport.txt from Bismark

    The Usage: java -jar EpiMolas.jar the_input_mapping_report_file gtf > the_output_file

    mtable from BS-Seeker2 CGmap output file (e.g., the input file: my.CGmap and the output mtable file: result.mtable

    java -jar path_to/EpiMolas.jar path_to/my.CGmap path_to/TAIR10.gtf > result.mtable &

    mtable from Bismark CX_report output file (e.g., the input file: my.CX_report.txt and the output mtable file: result.mtable

    java -jar path_to/EpiMolas.jar path_to/my.CX_report.txt path_to/TAIR10.gtf > result.mtable &

    You need to indicate paths to the required files (EpiMolas.jar, the input mapping report file, gtf) if they are not in the same directory where you execute EpiMolas.jar.

    You may specify -Xmx on the maximum RAM memory in use and -Xms on the initial memory. Emprically, if you have a *.CGmap file in size of X Gb, you may assign 2.3*X GB in the -Xms to ensure the success of run.

    It will look like $java -Xms10G -jar EpiMolas.jar exp1.CGmap TAIR10.gtf > exp1.mtable & if 10G RAM is allocated as the initial memory of run.

Upload the mtable to TEA Server

Now you should have mtable in hand. More precisely, one mtable for each dataset. Make sure the demultiplex step and read pooling had been done before mapping step if you had a complicated design and multiple lanes used in one experiment.

Build Your Own Project : get familiar with TEA's data uploading process

© 2016, Laboratory of Systems Biology and Network Biology,

Institute of Information Science, Academia Sinica, TAIWAN.

Lastest update 2016/08/20