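Biological databases
A biological database is a library of life-science information collected from scientific experiments, published literature, high-throughput experimental technologies, and computational analysis. It contains information from fields such as genomics, proteomics, metabolomics, microarray gene expression, and phylogenetics.
Biological databases can be broadly divided into sequence, structure, and functional databases. Sequence databases store nucleic acid and protein sequences; structure databases store structural information on RNA and proteins; functional databases provide information on the physiological roles of gene products (for example, enzyme activities, mutant phenotypes, and biological pathways).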
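Database types
Two concepts are common when discussing biological databases: primary databases and secondary databases. Primary databases store data obtained directly from experiments. Secondary databases use other databases (for example, primary databases) as their source of information and store the results of processing or analyzing that data as needed.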
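As a minimal sketch of this distinction, the Python snippet below treats a small dictionary of made-up sequences as a "primary" store and derives a "secondary" table of GC content from it. The record IDs, the sequences, and the function names `gc_content` and `build_secondary` are hypothetical and chosen only for illustration; real secondary databases apply far richer curation and analysis.

```python
from typing import Dict

# Hypothetical primary records: raw sequences as they might be deposited
# directly from experiments. IDs and sequences are invented for illustration.
primary_db: Dict[str, str] = {
    "seq001": "ATGCGC",
    "seq002": "ATATAT",
    "seq003": "GGCC",
}


def gc_content(sequence: str) -> float:
    """Return the fraction of G and C bases in a nucleotide sequence."""
    if not sequence:
        return 0.0
    upper = sequence.upper()
    return (upper.count("G") + upper.count("C")) / len(upper)


def build_secondary(primary: Dict[str, str]) -> Dict[str, float]:
    """Derive a secondary table (GC content per record) from primary records."""
    return {record_id: gc_content(seq) for record_id, seq in primary.items()}


# The secondary store holds only derived values, not the raw sequences.
secondary_db = build_secondary(primary_db)
```

The point of the sketch is the direction of dependency: `secondary_db` can always be rebuilt from `primary_db`, but not the other way around.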
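Finding databases
An important resource for finding biological databases is the special database issue of the journal Nucleic Acids Research (NAR), which categorizes many publicly available online databases related to biology and bioinformatics. As of 2018, its collection listed 1,737 databases.
NAR divides all databases into 15 categories: nucleotide sequence databases, RNA sequence databases, protein sequence databases, structure databases, genomics databases (non-vertebrate), metabolic and signaling pathway databases, human and other vertebrate genome databases, human gene and disease databases, microarray data and other gene expression databases, proteomics resources, other molecular biology databases, organelle databases, plant databases, immunological databases, and cell biology databases.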
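Online tools
In addition to cataloging biological databases, NAR also publishes each year a set of web resources for analyzing and visualizing molecular biology data.
Table 1. Web resources, 2017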
Web server name | URL | Brief description |
---|---|---|
agriGO v2 | http://systemsbiology.cau.edu.cn/agriGOv2/ | GO analysis for agricultural species |
AMMOS2 | http://drugmod.rpbs.univ-paris-diderot.fr/ammosHome.php | Energy minimization of protein–ligand complexes |
antiSMASH | http://antismash.secondarymetabolites.org/ | Secondary metabolite biosynthetic gene cluster mining in bacterial and fungal genomes |
ARTS | http://arts.ziemertlab.com | Biosynthetic gene cluster mining for novel antibiotics |
BAR 3.0 | http://bar.biocomp.unibo.it/bar3 | Protein structure and function annotation |
BepiPred-2.0 | http://www.cbs.dtu.dk/services/BepiPred-2.0/ | B-cell epitope prediction from a protein sequence |
BioAtlas | http://bioatlas.compbio.sdu.dk | Visualization of microbiome and metagenome locations |
BIS2Analyzer | http://www.lcqb.upmc.fr/BIS2Analyzer/ | Analysis of coevolving amino-acid pairs in protein sequences |
BusyBee | https://ccb-microbe.cs.uni-saarland.de/busybee | Metagenome binning |
CAFE | https://github.com/younglululu/CAFE | Stand-alone program for alignment-free comparison of metagenome data |
Cancer PanorOmics | http://panoromics.irbbarcelona.org | Mapping of cancer mutations to 3D protein–protein interaction sites |
COFACTOR | http://zhanglab.ccmb.med.umich.edu/COFACTOR/ | Structure-based protein function annotation |
compleXView | http://xvis.genzentrum.lmu.de/compleXView | Protein-protein interaction based on affinity purification mass spectrometry |
ConTra v3 | http://bioit2.irc.ugent.be/contra/v3 | Transcription factor binding sites analysis |
CPC2 | http://cpc2.cbi.pku.edu.cn | Protein coding potential of RNA transcripts |
CSPADE | http://cspade.fimm.fi/ | Chemoinformatics bioactivity assay visualization |
CSTEA | http://comp-sysbio.org/cstea/ | Analysis of time-series gene expression data on cell state transitions |
DEOGEN2 | http://deogen2.mutaframe.com/ | Prediction of deleterious mutations in proteins |
DNAproDB | http://dnaprodb.usc.edu | Structural analysis of DNA–protein complexes |
DSSR | http://jmol.x3dna.org | DNA and RNA structure visualization |
DynOmics | http://dyn.life.nthu.edu.tw/oENM/ | Protein molecular dynamics using elastic network models |
EBISearch | http://www.ebi.ac.uk/ebisearch | Web services text search in EMBL-EBI data |
FireProt | http://loschmidt.chemi.muni.cz/fireprot | Design of thermostable proteins |
GalaxyHomomer | http://galaxy.seoklab.org/cgi-bin/submit.cgi?type=HOMOMER | Prediction of protein homo-oligomer structure |
GASS-WEB | http://gass.unifei.edu.br/ | Identification of enzyme active sites |
GeMSTONE | http://gemstone.yulab.org/ | Genetic variant prioritization in human disease |
Gene ORGANizer | http://geneorganizer.huji.ac.il | Linkage of human genes to their affected body organs |
GenProBiS | http://genprobis.insilab.org | Mapping of SNPs to protein binding sites |
GEPIA | http://gepia.cancer-pku.cn/ | Analysis of differential gene expression in cancer |
GeSeq | https://chlorobox.mpimp-golm.mpg.de/geseq.html | Annotation of chloroplast genomes |
GibbsCluster | http://www.cbs.dtu.dk/services/GibbsCluster-2.0 | Detection of protein short linear motifs |
GPCR-SSFE 2.0 | http://www.ssfa-7tmr.de/ssfe2/ | Homology modeling of G-protein coupled receptors |
GWAB | http://www.inetbio.org/gwab/ | Network-based genome wide association analysis |
HDOCK | http://hdock.phys.hust.edu.cn/ | Protein–protein and protein–DNA/RNA docking |
HGVA | http://bioinfodev.hpc.cam.ac.uk/web-apps/hgva | Archive of human genetic variant annotations |
HH-MOTiF | http://chimborazo.biochem.mpg.de/ | Detection of protein short linear motifs |
I-TASSER-MR | http://zhanglab.ccmb.med.umich.edu/I-TASSER-MR/ | Protein structure modeling for X-ray crystallography |
INTAA | http://bioinfo.uochb.cas.cz/INTAA/ | Analysis of amino acid interaction energies |
IntaRNA 2.0 | http://rna.informatik.uni-freiburg.de/IntaRNA/Input.jsp | Prediction of interactions between RNA molecules |
IslandViewer 4.0 | http://www.pathogenomics.sfu.ca/islandviewer4/ | Prediction of bacterial genomic islands (horizontal gene transfer) |
kpLogo | http://kplogo.wi.mit.edu/ | Detection and visualization of short sequence motifs |
LigParGen | http://jorgensenresearch.com/ligpargen | Force field parameters for molecular dynamics |
LimTox | http://limtox.bioinfo.cnio.es | Text mining for compound toxicity |
mCSM-NA | http://structure.bioc.cam.ac.uk/mcsm_na | Prediction of protein mutation effect on nucleic acid binding affinity |
MicrobiomeAnalyst | http://microbiomeanalyst.ca | Analysis of microbiome data |
MinePath | http://www.minepath.org | Differential expression analysis for regulatory network subpaths |
ModFOLD6 | http://www.reading.ac.uk/bioinf/ModFOLD/ | Protein structure quality assessment |
mTCTScan | http://jjwanglab.org/mTCTScan | Mutation prioritization for cancer drug response |
MutaGene | https://www.ncbi.nlm.nih.gov/projects/mutagene/ | Visualization and analysis of mutational profiles in cancer |
NNAlign-2.0 | http://www.cbs.dtu.dk/services/NNAlign-2.0 | Detection of ligand motifs for receptor–ligand interactions |
NOREVA | http://server.idrb.cqu.edu.cn/noreva/ | Evaluation of data normalization methods for mass spectrometry based metabolomics data |
Olelo | http://www.hpi.de/plattner/olelo | Text mining in PubMed |
OmicSeq | http://www.omicseq.org | Search for omics data in major repositories |
P4P | http://sing.ei.uvigo.es/p4p | Bacterial strain classification based on peptide datasets |
Pathview | http://pathview.uncc.edu/ | Visualization and annotation of metabolic pathways |
pepATTRACT | http://bioserv.rpbs.univ-paris-diderot.fr/services/pepATTRACT | Prediction of protein–peptide docking |
PharmMapper | http://lilab.ecust.edu.cn/pharmmapper | Drug target search using pharmacophore mapping |
PhD-SNPg | http://snps.biofold.org/phd-snpg | Deleterious SNP classification |
PIGSPro | http://cassandra.med.uniroma1.it/AbPrediction/web/pigs.php | Modeling of immunoglobulin variable domains |
plantiSMASH | http://plantismash.secondarymetabolites.org | Detection of biosynthetic gene clusters in plants |
PMut | http://mmb.irbbarcelona.org/PMut/ | Prediction of disease potential for protein mutations |
Prism3 | http://prism3.magarveylab.ca/prism | Prediction of natural product structures from biosynthetic gene clusters |
ProteinsAPI | http://www.ebi.ac.uk/proteins/api | Web service for protein data from UniProtKB |
ProteinsPlus | http://proteins.plus | Structure-based modeling of proteins |
ProteoSign | http://bioinformatics.med.uoc.gr/ProteoSign | Protein differential abundance analysis |
ReFOLD | http://www.reading.ac.uk/bioinf/ReFOLD/ | Protein structure refinement |
RegulatorTrail | https://regulatortrail.bioinf.uni-sb.de | Analysis of transcription factors and target genes |
RiPPMiner | http://www.nii.ac.in/rippminer.html | Prediction of chemical structures for ribosomally synthesized and post translationally modified peptides |
RNA workbench | https://github.com/bgruening/galaxy-rna-workbench | Stand-alone collection of tools for analyzing RNAseq and RNA sequence data |
RNA-MoIP | http://rnamoip.cs.mcgill.ca/ | Prediction of RNA 2D and 3D structure |
SBSPKSv2 | http://www.nii.ac.in/sbspks2.html | Analysis of polyketide synthases |
SCENERY | http://mensxmachina.org/en/software/ | Network reconstruction from cytometry data |
SDM | http://structure.bioc.cam.ac.uk/sdm2 | Prediction of stability in protein mutants |
SeMPI | http://www.pharmaceutical-bioinformatics.de/sempi/ | Prediction of polyketide synthase products from biosynthetic gene clusters |
SLiMSearch | http://slim.ucd.ie/slimsearch/ | Detection of protein short linear motifs |
SODA | http://protein.bio.unipd.it/soda/ | Prediction of solubility in protein mutants |
SpartaABC | http://spartaabc.tau.ac.il/webserver | Sequence simulation with indels |
ThreaDomEx | http://zhanglab.ccmb.med.umich.edu/ThreaDomEx | Prediction of protein domains and domain boundaries |
Tools at EMBL-EBI | http://www.ebi.ac.uk/Tools/webservices/ | Web service tools from EMBL-EBI |
TraitRateProp | http://traitrate.tau.ac.il/prop | Test of sequence evolution association with phenotype |
TRAPP | http://trapp.h-its.org | Analysis of protein binding site dynamics |
VCF.Filter | https://biomedical-sequencing.at/VCFFilter/ | Stand-alone program for filtering and annotating genetic variants in vcf files |
Web3DMol | http://web3dmol.duapp.com/ | Protein structure visualization |
WebGestalt | http://www.webgestalt.org | Gene set functional enrichment analysis |
WoPPER | http://WoPPER.ba.itb.cnr.it/ | Detection of bacterial genome regions with coordinated gene expression changes |
XSuLT | http://structure.bioc.cam.ac.uk/xsult | Annotation and visualization of protein multiple sequence alignment |
Table 2. Web resources, 2018
Web server name | URL | Brief description |
---|---|---|
AAI-profiler | http://ekhidna2.biocenter.helsinki.fi/AAI | proteome average amino acid identity comparison |
AlloFinder | http://mdl.shsmu.edu.cn/ALF/ | allosteric modulator identification |
ArDock | http://ardock.ibcp.fr | protein–protein interaction region prediction |
BAGEL4 | http://bagel4.molgenrug.nl | secondary metabolite gene clusters (RIPPs, bacteriocins) |
BaMM | https://bammmotif.mpibpc.mpg.de | nucleotide binding motifs |
BeStSel | http://bestsel.elte.hu | circular dichroism spectroscopy based protein secondary structure analysis |
BRepertoire | http://mabra.biomed.kcl.ac.uk/BRepertoire | antibody repertoire analysis |
BUSCA | http://busca.biocomp.unibo.it | protein subcellular localization prediction |
CABS-flex 2.0 | http://biocomp.chem.uw.edu.pl/CABSflex2 | simulation of protein structure flexibility |
CalFitter | https://loschmidt.chemi.muni.cz/calfitter/ | protein thermal denaturation analysis |
CASTp 3.0 | http://sts.bioe.uic.edu/castp/ | topology of protein pockets, cavities and channels |
CavityPlus | http://www.pkumdl.cn/cavityplus | protein binding site cavities |
CellAtlasSearch | http://www.cellatlassearch.com | single cell gene expression data search |
cgDNAweb | http://cgDNAweb.epfl.ch | double-stranded DNA coarse-grain models |
CircadiOmics | http://circadiomics.ics.uci.edu | circadian rhythm dataset analysis and repository |
COACH-D | http://yanglab.nankai.edu.cn/COACH-D/ | protein–ligand binding site prediction |
Coloc-stats | https://hyperbrowser.uio.no/coloc-stats/ | genomic location enrichment analysis |
ComplexContact | http://raptorx2.uchicago.edu/ComplexContact/ | protein heterodimer complex residue–residue contact prediction |
CoNekT-Plants | http://conekt.plant.tools | comparative analyses of plant gene co-expression |
CRISPOR | http://crispor.org | guide sequences for CRISPR/Cas9 genome editing |
CRISPRCasFinder | https://crisprcas.i2bc.paris-saclay.fr | CRISPR array and Cas gene detection |
CSAR-web | http://genome.cs.nthu.edu.tw/CSAR-web | contig scaffolding |
dbCAN2 | http://cys.bios.niu.edu/dbCAN2 | carbohydrate-active enzyme annotation |
DynaMut | http://biosig.unimelb.edu.au/dynamut/ | point mutation effects on protein stability and dynamics |
easyFRAP-web | https://easyfrap.vmnet.upatras.gr/ | protein mobility analysis with fluorescence recovery after photobleaching data |
EviNet | https://www.evinet.org/ | gene set network enrichment analysis |
ezTag | http://eztag.bioqrator.org | biomedical concept annotation |
FragFit | http://proteinformatics.de/FragFit | protein segment modeling of cryo-EM density maps |
Freiburg RNA tools | http://rna.informatik.uni-freiburg.de | RNA analysis |
GADGET | http://gadget.biosci.gatech.edu | population-based distributions of genetic variants |
Galaxy | https://usegalaxy.org | biomedical data analysis workflows |
Galaxy HiCExplorer | https://hicexplorer.usegalaxy.eu | chromatin 3D conformation analysis |
GDA | http://gda.unimore.it/ | integration of drug response, gene expression profiles and mutations for cancer |
GeneMANIA | http://genemania.org | gene function prediction |
geno2pheno[ngs-freq] | http://ngs.geno2pheno.org | viral drug resistance prediction |
GIANT 2.0 | http://giant-v2.princeton.edu | human tissue-specific gene functional relationships |
GPCRM | http://gpcrm.biomodellab.eu/ | G protein-coupled receptors structure modeling |
gRINN | http://grinn.readthedocs.io | protein molecular dynamics residue interaction energies |
GWAS4D | http://mulinlab.org/gwas4d | prioritization of regulatory variants from GWAS data |
HMMER | http://www.ebi.ac.uk/Tools/hmmer | profile hidden Markov models homology search |
HotSpot Wizard 3.0 | http://loschmidt.chemi.muni.cz/hotspotwizard3 | protein engineering directed mutation |
HPEPDOCK | http://huanglab.phys.hust.edu.cn/hpepdock/ | peptide–protein docking |
HSYMDOCK | http://huanglab.phys.hust.edu.cn/hsymdock/ | symmetric protein complex docking |
InterEvDock2 | http://bioserv.rpbs.univ-paris-diderot.fr/services/InterEvDock2/ | protein–protein docking |
INTERSPIA | http://bioinfo.konkuk.ac.kr/INTERSPIA/ | protein–protein interactions in multiple species |
iPath3.0 | http://pathways.embl.de | metabolic pathway visualization and customization |
IUPred2A | http://iupred2a.elte.hu | intrinsically disordered protein regions |
Kinact | http://biosig.unimelb.edu.au/kinact/ | kinase activating missense mutations prediction |
KnotGenome | http://knotgenom.cent.uw.edu.pl/ | topological analysis of chromosome knots and links |
LitVar | https://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/LitVar | genetic variant information retrieval from PubMed |
LOLAweb | http://lolaweb.databio.org | genomic region enrichment analysis |
MetaboAnalyst 4.0 | http://metaboanalyst.ca | metabolomics data analysis |
MetExplore | https://metexplore.toulouse.inra.fr/metexplore2/ | metabolic network analysis |
MiGA | http://microbial-genomes.org/ | prokaryotic genome and metagenome classification |
MISTIC2 | https://mistic2.leloir.org.ar | residue pair covariation in protein families |
MOLEonline | https://mole.upol.cz | biomolecule channels, tunnels, and pores |
mTM-align | http://yanglab.nankai.edu.cn/mTM-align/ | protein structure multiple alignment and database search |
Mutalisk | http://mutalisk.org | somatic mutations correlation with genomic, transcriptional and epigenomic features |
Ocean Gene Atlas | http://tara-oceans.mio.osupytheas.fr/ocean-gene-atlas/ | marine plankton gene geolocation and abundance |
oli2go | http://oli2go.ait.ac.at/ | PCR primer and hybridization probe design for non-human DNA |
OmicsNet | http://www.omicsnet.ca | molecular interactions networks visualization |
oriTfinder | http://bioinfo-mml.sjtu.edu.cn/oriTfinder | origin of transfer sites in bacterial mobile genetic elements |
PaintOmics 3 | http://bioinfo.cipf.es/paintomics/ | visualization of omics data on KEGG pathways |
PANNZER2 | http://ekhidna2.biocenter.helsinki.fi/sanspanz/ | protein function prediction |
PatScanUI | https://patscan.secondarymetabolites.org/ | DNA and protein sequence pattern search |
PhytoNet | http://www.gene2function.de | phytoplankton gene expression profiles |
pirScan | http://cosbi4.ee.ncku.edu.tw/pirScan/ | piRNA target prediction |
ProTox-II | http://tox.charite.de/protox_II | chemical toxicity prediction |
psRNATarget | http://plantgrn.noble.org/psRNATarget/ | plant small RNA target prediction |
PSSMSearch | http://slim.ucd.ie/pssmsearch/ | protein motifs for binding and post-translational modification |
PUG-REST | https://pubchemdocs.ncbi.nlm.nih.gov/pug-rest | PubChem cheminformatics programmatic access |
RepeatsDB-lite | http://protein.bio.unipd.it/repeatsdb-lite | tandem repeats in proteins |
RNApdbee 2.0 | http://lepus.cs.put.poznan.pl/rnapdbee-2.0/ | RNA secondary structure annotation |
RSAT | http://www.rsat.eu/ | DNA regulatory motifs |
SMARTIV | http://smartiv.technion.ac.il/ | RNA sequence and structure motifs for RNA binding proteins |
SNPnexus | http://www.snp-nexus.org | SNP functional annotation |
SPAR | https://www.lisanwanglab.org/SPAR | analysis of small RNA sequencing data |
SWISS-MODEL | https://swissmodel.expasy.org | structure homology modeling for proteins and protein complexes |
TAM 2.0 | http://www.scse.hebut.edu.cn/tam/ | microRNA set enrichment analysis |
TCRmodel | http://tcrmodel.ibbr.umd.edu/ | T cell receptor structure modeling |
UNRES | http://unres-server.chem.ug.edu.pl | coarse-grained simulation of protein structure |
VarAFT | http://varaft.eu | disease-causing variants annotation |
WEGO 2.0 | http://wego.genomics.org.cn | Gene Ontology visualization |
X2K Web | http://X2K.cloud | kinase enrichment analysis for differentially expressed gene signatures |
xiSPEC | http://spectrumviewer.org | proteomics mass spectrometry data analysis |
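Because the tables above are plain pipe-delimited rows, they are easy to load into records and search by keyword. The following is a minimal Python sketch, not part of the NAR resources: the function names `parse_rows` and `search` are hypothetical, and the sample rows are copied from the 2018 table with the header row left out.

```python
from typing import Dict, List

# A few rows copied from the 2018 table, in the same
# "name | URL | description |" layout (header row omitted).
SAMPLE_ROWS = """\
HMMER | http://www.ebi.ac.uk/Tools/hmmer | profile hidden Markov models homology search |
HPEPDOCK | http://huanglab.phys.hust.edu.cn/hpepdock/ | peptide–protein docking |
SWISS-MODEL | https://swissmodel.expasy.org | structure homology modeling for proteins and protein complexes |
"""


def parse_rows(text: str) -> List[Dict[str, str]]:
    """Split each pipe-delimited row into a name/url/description record."""
    records = []
    for line in text.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        # The trailing pipe yields an empty final cell; drop empty cells.
        cells = [cell for cell in cells if cell]
        # Skip blank or malformed lines and the "---|---|---|" separator row.
        if len(cells) != 3 or all(set(cell) <= {"-"} for cell in cells):
            continue
        name, url, description = cells
        records.append({"name": name, "url": url, "description": description})
    return records


def search(records: List[Dict[str, str]], keyword: str) -> List[str]:
    """Return names of servers whose description contains the keyword,
    compared case-insensitively."""
    needle = keyword.lower()
    return [r["name"] for r in records if needle in r["description"].lower()]


servers = parse_rows(SAMPLE_ROWS)
print(search(servers, "homology"))
```

Matching on the description column is a deliberately simple choice; a fuller tool might also index the server names or group entries by the NAR categories.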
References
https://en.wikipedia.org/wiki/Biological_database
The 2018 Nucleic Acids Research database issue and the online molecular biology database collection
Editorial: The 15th annual Nucleic Acids Research web server issue 2017
Editorial: The 16th annual Nucleic Acids Research web server issue 2018