Common Java runtime errors: GSEA principles, running the software, and common errors with solutions

Part 1: GSEA principles

Goal: determine whether a predefined gene set S is randomly distributed throughout the ranked gene list L

1. Expression profiles: the samples fall into two classes, labeled 1 and 2

GSEA considers experiments with genomewide expression profiles from samples belonging to two classes, labeled 1 or 2.

2. Genes are ranked by the correlation between their expression and the class distinction

Genes are ranked based on the correlation between their expression and the class distinction by using any suitable metric, producing a ranked gene list L.
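A minimal Python sketch of this ranking step, assuming the expression values sit in a plain dict (gene name to one value per sample) and the two classes are labeled 1 and 2. The names `rank_genes` and `diff_of_means` are hypothetical, and the difference of class means is only a stand-in for the metrics GSEA actually offers (described in the ranking-metric section further down).

```python
from statistics import mean

def rank_genes(expr, labels, metric):
    """Return (gene, score) pairs sorted from the top to the bottom of the list.

    expr   : dict mapping gene name -> list of expression values, one per sample
    labels : list of class labels (1 or 2), in the same sample order
    metric : function(values_in_class_1, values_in_class_2) -> float
    """
    scored = []
    for gene, values in expr.items():
        class_1 = [v for v, lab in zip(values, labels) if lab == 1]
        class_2 = [v for v, lab in zip(values, labels) if lab == 2]
        scored.append((gene, metric(class_1, class_2)))
    # Highest score first: genes most associated with class 1 end up at the top,
    # genes most associated with class 2 at the bottom.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

def diff_of_means(a, b):
    # Placeholder metric for illustration only.
    return mean(a) - mean(b)

expr = {"G1": [5.0, 6.0, 1.0, 1.5], "G2": [2.0, 2.0, 2.0, 2.0], "G3": [1.0, 0.5, 4.0, 5.0]}
labels = [1, 1, 2, 2]
print(rank_genes(expr, labels, diff_of_means))
```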

3. Compute the enrichment score (ES)

Given an a priori defined set of genes S (e.g., genes encoding products in a metabolic pathway, located in the same cytogenetic band, or sharing the same GO category), the goal of GSEA is to determine whether the members of S are randomly distributed throughout L or primarily found at the top or bottom. We expect that sets related to the phenotypic distinction will tend to show the latter distribution.

Step 1: Calculation of an Enrichment Score.

We calculate an enrichment score (ES) that reflects the degree to which a set S is overrepresented at the extremes (top or bottom) of the entire ranked list L.

The score is calculated by walking down the list L, increasing a running-sum statistic when we encounter a gene in S and decreasing it when we encounter genes not in S.

The magnitude of the increment depends on the correlation of the gene with the phenotype. The enrichment score is the maximum deviation from zero encountered in the random walk; it corresponds to a weighted Kolmogorov–Smirnov-like statistic.
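The running-sum walk described in Step 1 can be sketched in a few lines of Python. This follows the weighted scheme as I understand it from the original GSEA paper: a gene in S adds |r|^p divided by the sum of |r|^p over all genes in S, a gene not in S subtracts 1/(N − N_S), and p = 1 is the usual weighted setting. The function name is my own; this is an illustration, not GSEA's code, and it leaves out normalization (NES) and significance testing.

```python
def enrichment_score(ranked, gene_set, p=1.0):
    """Weighted running-sum enrichment score of gene set S in ranked list L.

    ranked   : list of (gene, metric_score) pairs, ordered top to bottom of L
    gene_set : iterable of gene names forming S
    p        : weight exponent; p=0 gives the unweighted KS-style walk
    """
    s = set(gene_set)
    hit_weights = [abs(score) ** p for gene, score in ranked if gene in s]
    n_hit, n_total = len(hit_weights), len(ranked)
    if n_hit == 0 or n_hit == n_total:
        raise ValueError("S must contain some, but not all, genes of L")
    norm = sum(hit_weights)               # hit increments add up to 1
    if norm == 0:
        raise ValueError("all genes in S have a zero score")
    miss_step = 1.0 / (n_total - n_hit)   # miss decrements also add up to 1
    running = es = 0.0
    for gene, score in ranked:
        if gene in s:
            running += abs(score) ** p / norm   # larger |correlation| -> larger step up
        else:
            running -= miss_step                # fixed step down for genes not in S
        if abs(running) > abs(es):
            es = running                        # signed maximum deviation from zero
    return es

L = [("A", 3.0), ("B", 2.0), ("C", 1.0), ("D", -1.0), ("E", -2.0)]
print(enrichment_score(L, {"A", "B"}))   # S at the top of L: positive ES
print(enrichment_score(L, {"D", "E"}))   # S at the bottom of L: negative ES
```

Because both the hit and the miss steps are normalized to sum to 1, the walk starts and ends at zero, and the sign of the ES tells you which end of L the set is concentrated at.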


2. File preparation

2.1. Expression dataset file (res, gct, pcl, or txt): the sample expression data

2.3. Gene sets file (gmx or gmt): predefined gene sets (optional)
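Both the GCT expression format and the GMT gene-set format are plain tab-separated text, so they can be produced with a short script. The sketch below assumes the GCT 1.2 layout (a "#1.2" line, a line with the gene and sample counts, then a NAME/Description header) and the GMT layout (one set per line: set name, description, then the genes). All sample names, values and set names are made up, and "na" is just a placeholder description. Writing to `io.StringIO` keeps the example self-contained; for a real file, pass `open("my_data.gct", "w")` instead. The same GMT writer also covers a user-defined gmt file.

```python
import io

def write_gct(handle, sample_names, rows):
    """Write a GCT (version 1.2) expression table.

    rows: list of (gene_name, description, [one value per sample]) tuples.
    """
    handle.write("#1.2\n")                                   # format version line
    handle.write(f"{len(rows)}\t{len(sample_names)}\n")      # number of genes, samples
    handle.write("\t".join(["NAME", "Description", *sample_names]) + "\n")
    for name, desc, values in rows:
        if len(values) != len(sample_names):
            raise ValueError(f"{name}: expected {len(sample_names)} values")
        handle.write("\t".join([name, desc, *map(str, values)]) + "\n")

def write_gmt(handle, gene_sets):
    """Write a GMT file; gene_sets maps set name -> (description, [genes])."""
    for name, (desc, genes) in gene_sets.items():
        handle.write("\t".join([name, desc, *genes]) + "\n")

def read_gmt(handle):
    """Read a GMT file into {set name: [genes]}, skipping malformed lines."""
    sets = {}
    for line in handle:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3:
            continue                           # blank or incomplete line
        name, _description, *genes = fields
        sets[name] = [g for g in genes if g]   # drop empty trailing fields
    return sets

gct = io.StringIO()
write_gct(gct, ["tumor_1", "tumor_2", "normal_1"],
          [("TP53", "na", [5.2, 4.8, 1.1]), ("MYC", "na", [7.0, 6.5, 3.2])])
print(gct.getvalue())

gmt = io.StringIO()
write_gmt(gmt, {"MY_SET_A": ("na", ["TP53", "MYC"]), "MY_SET_B": ("na", ["EGFR"])})
gmt.seek(0)
print(read_gmt(gmt))
```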

3.2 Choose the parameters

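4) For the significance test (permutation type): if you choose phenotype, an FDR cutoff of 0.25 is acceptable.

If you choose gene_set, the FDR should be below 0.05.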

5) For the metric for ranking genes, log2_Ratio_of_Classes (essentially logFC) is usually a reasonable choice.

You can also choose another metric to suit your needs.

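6) For the gene set database, you can pick one bundled with the software, such as KEGG or GO, including the GO sub-ontologies CC, BP, MF, and so on.

You can also use your own gmt file.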

7) You can also choose where the results are saved.
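To make the two permutation types in item 4) more concrete, here is a toy Python sketch of how each one builds a null distribution for a single gene set's nominal p-value. It is deliberately simplified and assumes an unweighted running sum, a plain difference of class means as the ranking metric, and no NES normalization or FDR step, all of which real GSEA handles differently or additionally; every function name here is hypothetical. Phenotype permutation shuffles the sample labels and re-ranks, so correlated genes move together; gene_set permutation keeps the observed ranking and draws random gene sets, which ignores gene-gene correlation. My reading of the GSEA documentation is that this is why a stricter FDR cutoff is advised for gene_set permutation.

```python
import random
from statistics import mean

def running_es(ranked_genes, gene_set):
    """Unweighted running-sum ES over gene names ordered top to bottom."""
    s = set(gene_set)
    n_hit = sum(1 for g in ranked_genes if g in s)
    hit_step, miss_step = 1.0 / n_hit, 1.0 / (len(ranked_genes) - n_hit)
    run = best = 0.0
    for g in ranked_genes:
        run += hit_step if g in s else -miss_step
        if abs(run) > abs(best):
            best = run
    return best

def rank_by_mean_diff(expr, labels):
    """Order gene names by difference of class means (stand-in metric)."""
    def score(values):
        a = [v for v, lab in zip(values, labels) if lab == 1]
        b = [v for v, lab in zip(values, labels) if lab == 2]
        return mean(a) - mean(b)
    return sorted(expr, key=lambda g: score(expr[g]), reverse=True)

def nominal_p(expr, labels, gene_set, perm_type="phenotype", n_perm=1000, seed=0):
    """Nominal p-value of one gene set under 'phenotype' or 'gene_set' permutation."""
    rng = random.Random(seed)
    ranked = rank_by_mean_diff(expr, labels)
    observed = running_es(ranked, gene_set)
    all_genes = list(expr)
    null = []
    for _ in range(n_perm):
        if perm_type == "phenotype":
            # Shuffle class labels and re-rank; gene-gene correlation is preserved.
            shuffled = list(labels)
            rng.shuffle(shuffled)
            null.append(running_es(rank_by_mean_diff(expr, shuffled), gene_set))
        elif perm_type == "gene_set":
            # Keep the observed ranking; draw a random gene set of the same size.
            null.append(running_es(ranked, rng.sample(all_genes, len(gene_set))))
        else:
            raise ValueError("perm_type must be 'phenotype' or 'gene_set'")
    # Compare only against null scores with the same sign as the observed ES.
    same_sign = [x for x in null if (x >= 0) == (observed >= 0)]
    if not same_sign:
        return float("nan")   # undefined with these permutations
    return sum(1 for x in same_sign if abs(x) >= abs(observed)) / len(same_sign)

expr = {f"G{i}": [10.0 + i] * 4 + [0.0] * 4 for i in range(4)}   # up in class 1
expr.update({f"G{i}": [1.0, 2.0, 1.5, 0.5, 1.2, 1.8, 0.9, 1.1] for i in range(4, 20)})
labels = [1, 1, 1, 1, 2, 2, 2, 2]
top_set = ["G0", "G1", "G2", "G3"]
print(nominal_p(expr, labels, top_set, "phenotype", n_perm=200))
print(nominal_p(expr, labels, top_set, "gene_set", n_perm=200))
```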

4. Click the Run button at the bottom.

Part 3: Common errors and solutions

1. First error: Java heap space, OutOfMemoryError

In the lower-right corner of the screenshot you can see the program's memory: here it is 84M in total, of which 43M is in use.

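So increase the memory Java runs with. My own clumsy workaround was to download Eclipse: https://www.eclipse.org/downloads/

Then change the setting by following the tutorials below, and GSEA will run. When you run it again, you will see that the 84M figure mentioned above has become much larger.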

https://jingyan.baidu.com/article/5d6edee2f5efff99ebdeec63.html

https://blog.csdn.net/tomorrow13210073213/article/details/53031818

You can set it fairly large.
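If your copy of GSEA is started as a jar from the command line, an alternative to the Eclipse route is to pass the JVM's standard -Xmx option (maximum heap size) when launching it. The small Python helper below only builds and prints that command. The jar filename gsea.jar and the helper name are hypothetical, and newer GSEA releases may ship their own launcher with its own memory setting, so check which applies to your version.

```python
def java_command(jar_path, max_heap_gb):
    """Return the argument list that launches a jar with a larger maximum heap.

    -Xmx is the standard JVM option for the maximum heap size.
    jar_path is whichever GSEA jar you downloaded (the name below is hypothetical).
    """
    if not isinstance(max_heap_gb, int) or max_heap_gb < 1:
        raise ValueError("max_heap_gb must be a positive whole number of GB")
    return ["java", f"-Xmx{max_heap_gb}g", "-jar", jar_path]

cmd = java_command("gsea.jar", 4)
print(" ".join(cmd))   # java -Xmx4g -jar gsea.jar
# To actually start the program:
#   import subprocess
#   subprocess.run(cmd, check=True)
```

"Fairly large" should still stay comfortably below the machine's physical RAM; otherwise the system may start swapping heavily.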

Explanation of the metrics for ranking genes

Metrics for Ranking Genes

For categorical phenotypes, GSEA determines a gene’s mean expression value for each phenotype and then uses one of the following metrics to calculate the gene’s differential expression with respect to the two phenotypes. To use median rather than mean expression values, set the Median for class metrics parameter to True.

● Signal2Noise (default) uses the difference of means scaled by the standard deviation. Note: You must have at least three samples for each phenotype to use this metric.

$$\frac{\mu_A - \mu_B}{\sigma_A + \sigma_B}$$

● tTest uses the difference of means scaled by the standard deviation and the number of samples:

$$\frac{\mu_A - \mu_B}{\sqrt{\frac{\sigma_A^2}{n_A} + \frac{\sigma_B^2}{n_B}}}$$

where μ is the mean, n is the number of samples, and σ is the standard deviation; σ has a minimum value of 0.2 * absolute(μ), where μ=0 is adjusted to μ=1. The larger the tTest ratio, the more distinct the gene expression is in each phenotype and the more the gene acts as a “class marker.”

● Ratio_of_Classes (also referred to as fold change) uses the ratio of class means to calculate fold change for natural scale data:

$$\frac{\mu_A}{\mu_B}$$

where μ is the mean. The larger the fold change, the more distinct the gene expression is in each phenotype and the more the gene acts as a “class marker.”

● log2_Ratio_of_Classes uses the log2 ratio of class means to calculate fold change for natural scale data:

$$\log_2\left(\frac{\mu_A}{\mu_B}\right)$$

where μ is the mean. This is the recommended statistic for calculating fold change for natural scale data.
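The four metrics above can be written as small Python functions that take one gene's values in class A and class B. Assumptions I have not checked against GSEA's source: the sample standard deviation (n − 1 denominator) is used, and the σ floor quoted for tTest is applied to Signal2Noise as well. The two ratio metrics expect natural-scale (not log-transformed) positive values, as the descriptions above say. The function names are my own.

```python
import math
from statistics import mean, stdev

def _floored_sd(values):
    """Sample SD, floored at 0.2*|mean| (a mean of 0 is treated as 1)."""
    if len(values) < 3:
        raise ValueError("need at least three samples per phenotype")
    mu = mean(values)
    return max(stdev(values), 0.2 * abs(mu if mu != 0 else 1.0))

def signal2noise(a, b):
    return (mean(a) - mean(b)) / (_floored_sd(a) + _floored_sd(b))

def t_test(a, b):
    return (mean(a) - mean(b)) / math.sqrt(
        _floored_sd(a) ** 2 / len(a) + _floored_sd(b) ** 2 / len(b))

def ratio_of_classes(a, b):
    # Fold change; intended for natural-scale data.
    return mean(a) / mean(b)

def log2_ratio_of_classes(a, b):
    # log2 of the fold change; needs positive natural-scale class means.
    return math.log2(mean(a) / mean(b))

class_a = [8.0, 10.0, 12.0]   # mean 10, SD 2
class_b = [2.0, 4.0, 6.0]     # mean 4, SD 2
for metric in (signal2noise, t_test, ratio_of_classes, log2_ratio_of_classes):
    print(metric.__name__, round(metric(class_a, class_b), 3))
```

Any of these can be plugged in as the `metric` argument of a ranking function; sorting genes by the score from highest to lowest gives the ranked list L.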
