DESeq2: differential expression analysis of high-throughput sequencing data based on a negative binomial model.

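1. Read in the gene expression data

# Install DESeq2 once if it is not already available:
# if (!requireNamespace("BiocManager", quietly = TRUE))
#   install.packages("BiocManager")
#
# BiocManager::install("DESeq2")

# setwd(work_dir)
# `file` is the path to the count table: genes in rows, samples in columns
count_df <- read.csv(file, row.names = 1)
count_df <- round(count_df)  # DESeq2 needs integer counts; round if there are decimals
dim(count_df)
count_df[1:3, 1:4]

2. Create the DESeqDataSet object

library(DESeq2)

# Sample groups (mock labels: the first 50 columns are control, the next 63 are treat)
condition <- factor(c(rep("control", 50), rep("treat", 63)))
colData <- data.frame(row.names = colnames(count_df), condition)
dds <- DESeqDataSetFromMatrix(countData = as.matrix(count_df),
                              colData = colData,
                              design = ~ condition)
head(dds)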

Note: DESeqDataSet(), DESeqDataSetFromMatrix(), and DESeqDataSetFromHTSeqCount() can all create a DESeqDataSet object.

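3. Differential expression analysis

dds <- DESeq(dds)
res <- results(dds)
summary(res)
head(res)

# Sort genes by adjusted p-value and export the table
resOrdered <- res[order(res$padj), ]
deg <- as.data.frame(resOrdered)
# deg <- na.omit(deg)  # optionally drop genes with NA values
dim(deg)
write.csv(deg, file = "diff_deseq2.csv")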

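The DESeq() function runs three steps: estimation of size factors (estimateSizeFactors), estimation of dispersions (estimateDispersions), and negative binomial GLM fitting with Wald statistics (nbinomWaldTest). It returns the updated DESeqDataSet; results() then extracts a DESeqResults object from it.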
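The three steps can also be called one at a time, which makes it easier to see what DESeq() does. The sketch below is a minimal illustration, not the workflow used above: it assumes a simulated dataset from makeExampleDESeqDataSet() (the sizes n = 1000 genes and m = 6 samples are arbitrary choices) in place of the real count table.

```r
library(DESeq2)

# Simulated data standing in for the real count table (an assumption for
# illustration): 1000 genes, 6 samples, two conditions, design ~ condition.
set.seed(1)
dds_demo <- makeExampleDESeqDataSet(n = 1000, m = 6)

# Step 1: per-sample size factors that normalize for sequencing depth
dds_demo <- estimateSizeFactors(dds_demo)
sizeFactors(dds_demo)

# Step 2: gene-wise dispersion estimates, shrunk toward a fitted trend
dds_demo <- estimateDispersions(dds_demo)

# Step 3: negative binomial GLM fit and Wald test for each gene
dds_demo <- nbinomWaldTest(dds_demo)

# Extract the results table; this is the DESeqResults object
res_demo <- results(dds_demo)
class(res_demo)
head(res_demo)
```

Running the steps separately should give the same result as a single DESeq() call with default arguments; the one-call form is simply more convenient.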