5-Methylcytosine (5mC) is the best-known methylated nucleotide in the genome. Methylated cytosines are often classified by their sequence context (CG, CHG, or CHH, where H is A, C, or T) and by the genomic region in which the C lies. To capture the essence of genome-wide methylation status while keeping analysis efficient enough to run online, we introduce a straightforward method to summarize methylation landscapes by sequence context.
In TEA, the methylation landscape is gene-based: DNA methylation levels in the promoter and in the gene body of each gene are estimated from whole-genome bisulfite sequencing (WGBS) data.
The DNA methylation level of each individual cytosine is estimated by Equation (1.1). The promoter or gene-body methylation level is then calculated by Equation (1.2) as the average of these per-cytosine levels over the cytosines within that region.
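For reference, the usual way to write these two quantities is shown below. This form is an assumption and may differ in detail from what EpiMolas.jar computes. Here M_i and U_i are the numbers of reads showing C (methylated) and T (unmethylated) at cytosine i, and S_{R,c} is the set of cytosines of sequence context c in region R (promoter or gene body) that pass the depth threshold, i.e. M_i + U_i ≥ 4:

$$
m_i = \frac{M_i}{M_i + U_i},
\qquad
\bar{m}_{R,c} = \frac{1}{\lvert S_{R,c} \rvert} \sum_{i \in S_{R,c}} m_i
$$

Under the site-count rule, \bar{m}_{R,c} is reported as NaN when \lvert S_{R,c} \rvert < 5.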
Each C must have a mapped read depth of at least 4, and a value is reported only when at least five such C sites of the given sequence context are present. The mapping report is thereby converted to a summary of six measurements for each gene (pmt_CG, pmt_CHG, pmt_CHH, gene_CG, gene_CHG, gene_CHH), where "pmt" denotes the promoter and "gene" the gene body. This method is implemented in an in-house program, EpiMolas.jar, which processes BS-Seq mapping results into a small, tab-delimited data file called an mtable:
gene_id	pmt_CG	gene_CG	pmt_CHG	gene_CHG	pmt_CHH	gene_CHH
AT1G01010	0.011463	0.053009	0.010000	0.011635	0.021765	0.012631
AT1G01020	0.000000	0.081519	0.006957	0.007177	0.003614	0.007521
AT1G01030	0.005385	0.012800	0.002439	0.023452	0.003116	0.016939
AT1G01040	0.011200	0.589821	0.009677	0.015773	0.016944	0.011699
AT1G01046	0.765250	0.385000	0.022500	0.058750	0.014325	0.047727
........
Note: Column names and values are not aligned for display on the web page, but in the file each field is separated from its neighbors by a tab. The first row holds the seven column names: the gene_id column followed by the six data columns.
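Because the mtable is plain tab-delimited text with a header row, it is easy to load in your own tools. The following minimal Java sketch is hypothetical (the class and method names are not part of EpiMolas.jar). It reads the column names from the header row, so it does not depend on the column order, and it keeps "NaN" entries as Double.NaN.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical reader for the tab-delimited mtable layout (not part of EpiMolas.jar). */
final class MtableReader {

    /** One gene: its ID plus the six measurements keyed by column name. */
    static final class Row {
        final String geneId;
        final Map<String, Double> values;

        Row(String geneId, Map<String, Double> values) {
            this.geneId = geneId;
            this.values = values;
        }
    }

    static List<Row> parse(Reader source) throws IOException {
        BufferedReader in = new BufferedReader(source);
        String headerLine = in.readLine();
        if (headerLine == null) {
            throw new IOException("empty mtable");
        }
        // Column names come from the header row, so the column order is not hard-coded.
        String[] header = headerLine.trim().split("\t");
        List<Row> rows = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) {
                continue;
            }
            String[] fields = line.split("\t");
            if (fields.length != header.length) {
                throw new IOException("expected " + header.length + " fields: " + line);
            }
            Map<String, Double> values = new LinkedHashMap<>();
            for (int i = 1; i < fields.length; i++) {
                // Double.parseDouble also accepts the literal "NaN" used for unreported values.
                values.put(header[i], Double.parseDouble(fields[i]));
            }
            rows.add(new Row(fields[0], values));
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        String sample = "gene_id\tpmt_CG\tgene_CG\tpmt_CHG\tgene_CHG\tpmt_CHH\tgene_CHH\n"
                + "AT1G01040\t0.011200\t0.589821\t0.009677\t0.015773\t0.016944\t0.011699\n";
        for (Row row : parse(new StringReader(sample))) {
            System.out.println(row.geneId + " gene_CG=" + row.values.get("gene_CG"));
        }
    }
}
```

Running main prints AT1G01040 gene_CG=0.589821. For a real file, pass a java.io.FileReader instead of the StringReader.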
Each measurement is a normalized score ranging from 0 (all observed sites are unmethylated) to 1 (all observed sites are methylated), or "NaN" for genes that lack sufficient reads or sites to calculate a value. A difference of 0.1 in a measurement therefore reflects a change in methylation state of, overall, 10% of the Cs contributing to that feature.
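To make the thresholds and the NaN rule concrete, here is a minimal Java sketch of how one mtable value could be computed for one region and one sequence context. It assumes the per-site level is the fraction of reads showing C and that the region value is the plain mean of the per-site levels; the class and method names are hypothetical and are not taken from EpiMolas.jar.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch of one mtable value: per-cytosine levels averaged over a
 * region for one sequence context, using the depth and site-count thresholds.
 */
final class RegionLevel {
    static final int MIN_DEPTH = 4; // minimum mapped read depth for a C to count
    static final int MIN_SITES = 5; // minimum number of qualifying C sites per context

    /** Read counts at one cytosine: C (methylated) and T (unmethylated) calls. */
    static final class Site {
        final int methylated;
        final int unmethylated;

        Site(int methylated, int unmethylated) {
            this.methylated = methylated;
            this.unmethylated = unmethylated;
        }

        int depth() {
            return methylated + unmethylated;
        }
    }

    /** Mean per-site level over qualifying sites, or NaN when too few sites qualify. */
    static double regionLevel(List<Site> sitesOfOneContext) {
        double sum = 0.0;
        int used = 0;
        for (Site s : sitesOfOneContext) {
            if (s.depth() >= MIN_DEPTH) {
                sum += (double) s.methylated / s.depth(); // per-site level in [0, 1]
                used++;
            }
        }
        return used >= MIN_SITES ? sum / used : Double.NaN;
    }

    public static void main(String[] args) {
        List<Site> cgSites = Arrays.asList(
                new Site(4, 0), new Site(0, 4), new Site(2, 2),
                new Site(5, 5), new Site(3, 1),
                new Site(1, 1)); // depth 2: below MIN_DEPTH, ignored
        System.out.println(regionLevel(cgSites)); // prints 0.55
    }
}
```

Here the sixth site is ignored and the value is the mean of the five remaining levels (1.0, 0.0, 0.5, 0.5, 0.75), which is 0.55. Removing any one of those five qualifying sites leaves only four, so the result becomes NaN.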
Check The Java Environment
Before you run EpiMolas.jar, check that Java is properly installed in your Linux environment. For example, type a version check:
java -version
You should get a response like:
openjdk version "1.8.0_45-internal"
OpenJDK Runtime Environment (build 1.8.0_45-internal-b14)
OpenJDK 64-Bit Server VM (build 25.45-b02, mixed mode)
If Java is not available, ask your system administrator to install it.
Download EpiMolas.jar from GitHub (first move to the directory where you want to save it):
Converting the report to mtable:
First, we assume that you have completed the mapping step and have the appropriate mapping report: a *.CGmap file from BS-Seeker2 or a *.CX_report.txt file from Bismark. The general usage is:
java -jar EpiMolas.jar the_input_mapping_report_file gtf > the_output_file
To generate an mtable from a BS-Seeker2 CGmap output file (e.g., input file my.CGmap, output mtable file result.mtable):
java -jar path_to/EpiMolas.jar path_to/my.CGmap path_to/TAIR10.gtf > result.mtable &
To generate an mtable from a Bismark CX_report output file (e.g., input file my.CX_report.txt, output mtable file result.mtable):
java -jar path_to/EpiMolas.jar path_to/my.CX_report.txt path_to/TAIR10.gtf > result.mtable &
If the required files (EpiMolas.jar, the input mapping report, and the GTF file) are not in the directory where you run the command, you need to give their paths.
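EpiMolas.jar accepts either report type. Its internal parsing is not described here, but as a hedged illustration of what each format contributes, the Java sketch below reduces one line of either file to the methylated and unmethylated read counts used for the per-cytosine level. The column layouts follow our reading of the BS-Seeker2 and Bismark documentation (CGmap: chromosome, nucleotide, position, context, dinucleotide context, methylation level, reads showing C, all reads showing C or T; CX_report: chromosome, position, strand, methylated count, unmethylated count, context, trinucleotide context), so please verify them against the versions you use. The class name, the sample line, and the file-name-based dispatch are hypothetical.

```java
/**
 * Hypothetical sketch: normalize one line of a BS-Seeker2 CGmap or Bismark
 * CX_report file into the read counts needed for the per-cytosine level.
 */
final class ReportLine {
    final String chrom;
    final long position;
    final String context;   // CG, CHG or CHH
    final int methylated;   // reads showing C at this position
    final int unmethylated; // reads showing T at this position

    ReportLine(String chrom, long position, String context, int methylated, int unmethylated) {
        this.chrom = chrom;
        this.position = position;
        this.context = context;
        this.methylated = methylated;
        this.unmethylated = unmethylated;
    }

    /** CGmap: chr, nucleotide, position, context, dinucleotide, level, #C reads, #C+T reads. */
    static ReportLine fromCGmap(String line) {
        String[] f = line.trim().split("\t");
        int mC = Integer.parseInt(f[6]);
        int allC = Integer.parseInt(f[7]);
        return new ReportLine(f[0], Long.parseLong(f[2]), f[3], mC, allC - mC);
    }

    /** CX_report: chr, position, strand, #methylated, #unmethylated, context, trinucleotide. */
    static ReportLine fromCxReport(String line) {
        String[] f = line.trim().split("\t");
        return new ReportLine(f[0], Long.parseLong(f[1]), f[5],
                Integer.parseInt(f[3]), Integer.parseInt(f[4]));
    }

    /** Assumption: the format is chosen from the file name, as in the two examples above. */
    static ReportLine parse(String fileName, String line) {
        if (fileName.endsWith(".CGmap")) {
            return fromCGmap(line);
        }
        if (fileName.endsWith("CX_report.txt")) {
            return fromCxReport(line);
        }
        throw new IllegalArgumentException("unrecognized report type: " + fileName);
    }

    public static void main(String[] args) {
        // Made-up CGmap line: 3 of 5 reads show C at this CG site.
        ReportLine r = parse("my.CGmap", "chr1\tC\t3541\tCG\tCG\t0.6\t3\t5");
        System.out.println(r.context + " " + r.methylated + "/" + (r.methylated + r.unmethylated));
    }
}
```

Running main prints CG 3/5. Both parsers yield the same record, so downstream steps such as the depth filter do not need to know which mapper produced the report.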
You may use the JVM option -Xmx to set the maximum heap memory and -Xms to set the initial heap memory. Empirically, for a *.CGmap file of X GB, assigning 2.3 × X GB with -Xms helps ensure that the run succeeds.
For example, if 10 GB of RAM is allocated as the initial memory of the run, the command looks like this:
java -Xms10G -jar EpiMolas.jar exp1.CGmap TAIR10.gtf > exp1.mtable &
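The memory rule of thumb is simple arithmetic. As a hedged sketch, the hypothetical Java helper below turns a report's size on disk into an -Xms value by multiplying by 2.3 and rounding up to whole gigabytes. It assumes 1 GB = 2^30 bytes, which matches how the JVM interprets the G suffix.

```java
import java.io.File;

/**
 * Hypothetical helper for the empirical rule: initial heap (-Xms) of about
 * 2.3 x the size of the CGmap file, rounded up to whole gigabytes.
 */
final class HeapSuggestion {
    static final double FACTOR = 2.3;                        // empirical factor from the text
    static final double BYTES_PER_GB = 1024.0 * 1024 * 1024; // assumption: 1 GB = 2^30 bytes

    /** Builds an -Xms flag for a report of the given size in bytes (at least 1G). */
    static String xmsFlag(long fileSizeBytes) {
        double sizeGb = fileSizeBytes / BYTES_PER_GB;
        long heapGb = Math.max(1L, (long) Math.ceil(FACTOR * sizeGb));
        return "-Xms" + heapGb + "G";
    }

    public static void main(String[] args) {
        if (args.length != 1 || !new File(args[0]).isFile()) {
            System.err.println("usage: java HeapSuggestion <mapping report file>");
            System.exit(1);
        }
        // File.length() returns the file size in bytes
        System.out.println(xmsFlag(new File(args[0]).length()));
    }
}
```

For a CGmap of about 4.3 GB, 2.3 × 4.3 ≈ 9.9, so the helper suggests -Xms10G, the value used in the example command. Rounding up rather than down errs on the side of giving the run slightly more memory.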
You should now have one mtable for each dataset. If your experiment has a complex design or was sequenced across multiple lanes, make sure that demultiplexing and read pooling were done before the mapping step.
Build Your Own Project: get familiar with TEA's data uploading process