Biological Databases

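A biological database is a repository of life-science information gathered from scientific experiments, published literature, high-throughput experimental techniques, and computational analysis. It holds information from fields such as genomics, proteomics, metabolomics, microarray gene expression, and phylogenetics.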

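Biological databases fall broadly into sequence, structure, and functional databases. Sequence databases store nucleic acid and protein sequences; structure databases store structural information on RNA and proteins; functional databases describe the physiological roles of gene products (for example, enzyme activities, mutant phenotypes, and biological pathways).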

Database Types

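Two terms are commonly used to classify biological databases: primary and secondary. A primary database stores data obtained directly from experiments. A secondary database takes other databases (for example, primary databases) as its information source and stores the results of processing or analyzing that data as needed.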
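The distinction can be illustrated with a small data model. The sketch below is only an illustration under stated assumptions: the class names, the accession strings, and the choice of GC content as the derived quantity are all hypothetical and do not correspond to any real database schema.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class PrimaryRecord:
    """An entry as a primary database might hold it: raw experimental data."""
    accession: str  # hypothetical identifier
    sequence: str   # nucleotide sequence obtained from an experiment


@dataclass(frozen=True)
class SecondaryRecord:
    """A derived entry: a result computed from primary records."""
    source_accessions: Tuple[str, ...]  # provenance: which primary entries were used
    gc_content: float                   # fraction of G and C bases across the sources


def derive_gc_summary(records: Sequence[PrimaryRecord]) -> SecondaryRecord:
    """Process primary records into one secondary (derived) record."""
    total = sum(len(r.sequence) for r in records)
    gc = sum(r.sequence.upper().count("G") + r.sequence.upper().count("C")
             for r in records)
    fraction = gc / total if total else 0.0
    return SecondaryRecord(tuple(r.accession for r in records), fraction)


# Hypothetical primary entries.
primary = [PrimaryRecord("X0001", "ATGC"), PrimaryRecord("X0002", "GGCC")]
summary = derive_gc_summary(primary)
print(summary)
```

The secondary record keeps the accessions of the entries it was computed from, so a derived value can be traced back to the experimental data behind it.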

Finding Databases

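An important resource for finding biological databases is the annual database issue of the journal Nucleic Acids Research (NAR), which catalogues publicly available online databases related to biology and bioinformatics. As of 2018, its collection listed 1,737 databases.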

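NAR divides these databases into 15 categories: nucleotide sequence databases, RNA sequence databases, protein sequence databases, structure databases, genomics databases (non-vertebrate), metabolic and signaling pathway databases, human and other vertebrate genome databases, human gene and disease databases, microarray data and other gene expression databases, proteomics resources, other molecular biology databases, organelle databases, plant databases, immunological databases, and cell biology databases.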

Online Tools

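Besides cataloguing biological databases, NAR also publishes each year a set of web resources for analyzing and visualizing molecular biology data.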

Table 1. Web resources, 2017

Web server name  URL  Brief description
agriGO v2  http://systemsbiology.cau.edu.cn/agriGOv2/  GO analysis for agricultural species 
AMMOS2  http://drugmod.rpbs.univ-paris-diderot.fr/ammosHome.php  Energy minimization of protein–ligand complexes 
antiSMASH  http://antismash.secondarymetabolites.org/  Secondary metabolite biosynthetic gene cluster mining in bacterial and fungal genomes 
ARTS  http://arts.ziemertlab.com  Biosynthetic gene cluster mining for novel antibiotics 
BAR 3.0  http://bar.biocomp.unibo.it/bar3  Protein structure and function annotation 
BepiPred-2.0  http://www.cbs.dtu.dk/services/BepiPred-2.0/  B-cell epitope prediction from a protein sequence 
BioAtlas  http://bioatlas.compbio.sdu.dk  Visualization of microbiome and metagenome locations 
BIS2Analyzer  http://www.lcqb.upmc.fr/BIS2Analyzer/  Analysis of coevolving amino-acid pairs in protein sequences 
BusyBee  https://ccb-microbe.cs.uni-saarland.de/busybee  Metagenome binning 
CAFE  https://github.com/younglululu/CAFE  Stand-alone program for alignment-free comparison of metagenome data 
Cancer PanorOmics  http://panoromics.irbbarcelona.org  Mapping of cancer mutations to 3D protein–protein interaction sites 
COFACTOR  http://zhanglab.ccmb.med.umich.edu/COFACTOR/  Structure-based protein function annotation 
compleXView  http://xvis.genzentrum.lmu.de/compleXView  Protein-protein interaction based on affinity purification mass spectrometry 
ConTra v3  http://bioit2.irc.ugent.be/contra/v3  Transcription factor binding sites analysis 
CPC2  http://cpc2.cbi.pku.edu.cn  Protein coding potential of RNA transcripts 
CSPADE  http://cspade.fimm.fi/  Chemoinformatics bioactivity assay visualization 
CSTEA  http://comp-sysbio.org/cstea/  Analysis of time-series gene expression data on cell state transitions 
DEOGEN2  http://deogen2.mutaframe.com/  Prediction of deleterious mutations in proteins 
DNAproDB  http://dnaprodb.usc.edu  Structural analysis of DNA–protein complexes 
DSSR  http://jmol.x3dna.org  DNA and RNA structure visualization 
DynOmics  http://dyn.life.nthu.edu.tw/oENM/  Protein molecular dynamics using elastic network models 
EBISearch  http://www.ebi.ac.uk/ebisearch  Web services text search in EMBL-EBI data 
FireProt  http://loschmidt.chemi.muni.cz/fireprot  Design of thermostable proteins 
GalaxyHomomer  http://galaxy.seoklab.org/cgi-bin/submit.cgi?type=HOMOMER  Prediction of protein homo-oligomer structure 
GASS-WEB  http://gass.unifei.edu.br/  Identification of enzyme active sites 
GeMSTONE  http://gemstone.yulab.org/  Genetic variant prioritization in human disease 
Gene ORGANizer  http://geneorganizer.huji.ac.il  Linkage of human genes to their affected body organs 
GenProBiS  http://genprobis.insilab.org  Mapping of SNPs to protein binding sites 
GEPIA  http://gepia.cancer-pku.cn/  Analysis of differential gene expression in cancer 
GeSeq  https://chlorobox.mpimp-golm.mpg.de/geseq.html  Annotation of chloroplast genomes 
GibbsCluster  http://www.cbs.dtu.dk/services/GibbsCluster-2.0  Detection of protein short linear motifs 
GPCR-SSFE 2.0  http://www.ssfa-7tmr.de/ssfe2/  Homology modeling of G-protein coupled receptors 
GWAB  http://www.inetbio.org/gwab/  Network-based genome wide association analysis 
HDOCK  http://hdock.phys.hust.edu.cn/  Protein–protein and protein–DNA/RNA docking 
HGVA  http://bioinfodev.hpc.cam.ac.uk/web-apps/hgva  Archive of human genetic variant annotations 
HH-MOTiF  http://chimborazo.biochem.mpg.de/  Detection of protein short linear motifs 
I-TASSER-MR  http://zhanglab.ccmb.med.umich.edu/I-TASSER-MR/  Protein structure modeling for X-ray crystallography 
INTAA  http://bioinfo.uochb.cas.cz/INTAA/  Analysis of amino acid interaction energies 
IntaRNA 2.0  http://rna.informatik.uni-freiburg.de/IntaRNA/Input.jsp  Prediction of interactions between RNA molecules 
IslandViewer 4.0  http://www.pathogenomics.sfu.ca/islandviewer4/  Prediction of bacterial genomic islands (horizontal gene transfer) 
kpLogo  http://kplogo.wi.mit.edu/  Detection and visualization of short sequence motifs 
LigParGen  http://jorgensenresearch.com/ligpargen  Force field parameters for molecular dynamics 
LimTox  http://limtox.bioinfo.cnio.es  Text mining for compound toxicity 
mCSM-NA  http://structure.bioc.cam.ac.uk/mcsm_na  Prediction of protein mutation effect on nucleic acid binding affinity 
MicrobiomeAnalyst  http://microbiomeanalyst.ca  Analysis of microbiome data 
MinePath  http://www.minepath.org  Differential expression analysis for regulatory network subpaths 
ModFOLD6  http://www.reading.ac.uk/bioinf/ModFOLD/  Protein structure quality assessment 
mTCTScan  http://jjwanglab.org/mTCTScan  Mutation prioritization for cancer drug response 
MutaGene  https://www.ncbi.nlm.nih.gov/projects/mutagene/  Visualization and analysis of mutational profiles in cancer 
NNAlign-2.0  http://www.cbs.dtu.dk/services/NNAlign-2.0  Detection of ligand motifs for receptor–ligand interactions 
NOREVA  http://server.idrb.cqu.edu.cn/noreva/  Evaluation of data normalization methods for mass spectrometry based metabolomics data 
Olelo  http://www.hpi.de/plattner/olelo  Text mining in PubMed 
OmicSeq  http://www.omicseq.org  Search for omics data in major repositories 
P4P  http://sing.ei.uvigo.es/p4p  Bacterial strain classification based on peptide datasets 
Pathview  http://pathview.uncc.edu/  Visualization and annotation of metabolic pathways 
pepATTRACT  http://bioserv.rpbs.univ-paris-diderot.fr/services/pepATTRACT  Prediction of protein–peptide docking 
PharmMapper  http://lilab.ecust.edu.cn/pharmmapper  Drug target search using pharmacophore mapping 
PhD-SNPg  http://snps.biofold.org/phd-snpg  Deleterious SNP classification 
PIGSPro  http://cassandra.med.uniroma1.it/AbPrediction/web/pigs.php  Modeling of immunoglobulin variable domains 
plantiSMASH  http://plantismash.secondarymetabolites.org  Detection of biosynthetic gene clusters in plants 
PMut  http://mmb.irbbarcelona.org/PMut/  Prediction of disease potential for protein mutations 
Prism3  http://prism3.magarveylab.ca/prism  Prediction of natural product structures from biosynthetic gene clusters 
ProteinsAPI  http://www.ebi.ac.uk/proteins/api  Web service for protein data from UniProtKB 
ProteinsPlus  http://proteins.plus  Structure-based modeling of proteins 
ProteoSign  http://bioinformatics.med.uoc.gr/ProteoSign  Protein differential abundance analysis 
ReFOLD  http://www.reading.ac.uk/bioinf/ReFOLD/  Protein structure refinement 
RegulatorTrail  https://regulatortrail.bioinf.uni-sb.de  Analysis of transcription factors and target genes 
RiPPMiner  http://www.nii.ac.in/rippminer.html  Prediction of chemical structures for ribosomally synthesized and post translationally modified peptides 
RNA workbench  https://github.com/bgruening/galaxy-rna-workbench  Stand-alone collection of tools for analyzing RNAseq and RNA sequence data 
RNA-MoIP  http://rnamoip.cs.mcgill.ca/  Prediction of RNA 2D and 3D structure 
SBSPKSv2  http://www.nii.ac.in/sbspks2.html  Analysis of polyketide synthases 
SCENERY  http://mensxmachina.org/en/software/  Network reconstruction from cytometry data 
SDM  http://structure.bioc.cam.ac.uk/sdm2  Prediction of stability in protein mutants 
SeMPI  http://www.pharmaceutical-bioinformatics.de/sempi/  Prediction of polyketide synthase products from biosynthetic gene clusters 
SLiMSearch  http://slim.ucd.ie/slimsearch/  Detection of protein short linear motifs 
SODA  http://protein.bio.unipd.it/soda/  Prediction of solubility in protein mutants 
SpartaABC  http://spartaabc.tau.ac.il/webserver  Sequence simulation with indels 
ThreaDomEx  http://zhanglab.ccmb.med.umich.edu/ThreaDomEx  Prediction of protein domains and domain boundaries 
Tools at EMBL-EBI  http://www.ebi.ac.uk/Tools/webservices/  Web service tools from EMBL-EBI 
TraitRateProp  http://traitrate.tau.ac.il/prop  Test of sequence evolution association with phenotype 
TRAPP  http://trapp.h-its.org  Analysis of protein binding site dynamics 
VCF.Filter  https://biomedical-sequencing.at/VCFFilter/  Stand-alone program for filtering and annotating genetic variants in vcf files 
Web3DMol  http://web3dmol.duapp.com/  Protein structure visualization 
WebGestalt  http://www.webgestalt.org  Gene set functional enrichment analysis 
WoPPER  http://WoPPER.ba.itb.cnr.it/  Detection of bacterial genome regions with coordinated gene expression changes 
XSuLT  http://structure.bioc.cam.ac.uk/xsult  Annotation and visualization of protein multiple sequence alignment 

Table 2. Web resources, 2018

Web server name  URL  Brief description 
AAI-profiler  http://ekhidna2.biocenter.helsinki.fi/AAI  proteome average amino acid identity comparison 
AlloFinder  http://mdl.shsmu.edu.cn/ALF/  allosteric modulator identification 
ArDock  http://ardock.ibcp.fr  protein–protein interaction region prediction 
BAGEL4  http://bagel4.molgenrug.nl  secondary metabolite gene clusters (RIPPs, bacteriocins) 
BaMM  https://bammmotif.mpibpc.mpg.de  nucleotide binding motifs 
BeStSel  http://bestsel.elte.hu  circular dichroism spectroscopy based protein secondary structure analysis 
BRepertoire  http://mabra.biomed.kcl.ac.uk/BRepertoire  antibody repertoire analysis 
BUSCA  http://busca.biocomp.unibo.it  protein subcellular localization prediction 
CABS-flex 2.0  http://biocomp.chem.uw.edu.pl/CABSflex2  simulation of protein structure flexibility 
CalFitter  https://loschmidt.chemi.muni.cz/calfitter/  protein thermal denaturation analysis 
CASTp 3.0  http://sts.bioe.uic.edu/castp/  topology of protein pockets, cavities and channels 
CavityPlus  http://www.pkumdl.cn/cavityplus  protein binding site cavities 
CellAtlasSearch  http://www.cellatlassearch.com  single cell gene expression data search 
cgDNAweb  http://cgDNAweb.epfl.ch  double-stranded DNA coarse-grain models 
CircadiOmics  http://circadiomics.ics.uci.edu  circadian rhythm dataset analysis and repository 
COACH-D  http://yanglab.nankai.edu.cn/COACH-D/  protein–ligand binding site prediction 
Coloc-stats  https://hyperbrowser.uio.no/coloc-stats/  genomic location enrichment analysis 
ComplexContact  http://raptorx2.uchicago.edu/ComplexContact/  protein heterodimer complex residue–residue contact prediction 
CoNekT-Plants  http://conekt.plant.tools  comparative analyses of plant gene co-expression 
CRISPOR  http://crispor.org  guide sequences for CRISPR/Cas9 genome editing 
CRISPRCasFinder  https://crisprcas.i2bc.paris-saclay.fr  CRISPR array and Cas gene detection 
CSAR-web  http://genome.cs.nthu.edu.tw/CSAR-web  contig scaffolding 
dbCAN2  http://cys.bios.niu.edu/dbCAN2  carbohydrate-active enzyme annotation 
DynaMut  http://biosig.unimelb.edu.au/dynamut/  point mutation effects on protein stability and dynamics 
easyFRAP-web  https://easyfrap.vmnet.upatras.gr/  protein mobility analysis with fluorescence recovery after photobleaching data 
EviNet  https://www.evinet.org/  gene set network enrichment analysis 
ezTag  http://eztag.bioqrator.org  biomedical concept annotation 
FragFit  http://proteinformatics.de/FragFit  protein segment modeling of cryo-EM density maps 
Freiburg RNA tools  http://rna.informatik.uni-freiburg.de  RNA analysis 
GADGET  http://gadget.biosci.gatech.edu  population-based distributions of genetic variants 
Galaxy  https://usegalaxy.org  biomedical data analysis workflows 
Galaxy HiCExplorer  https://hicexplorer.usegalaxy.eu  chromatin 3D conformation analysis 
GDA  http://gda.unimore.it/  integration of drug response, gene expression profiles and mutations for cancer 
GeneMANIA  http://genemania.org  gene function prediction 
geno2pheno[ngs-freq]  http://ngs.geno2pheno.org  viral drug resistance prediction 
GIANT 2.0  http://giant-v2.princeton.edu  human tissue-specific gene functional relationships 
GPCRM  http://gpcrm.biomodellab.eu/  G protein-coupled receptors structure modeling 
gRINN  http://grinn.readthedocs.io  protein molecular dynamics residue interaction energies 
GWAS4D  http://mulinlab.org/gwas4d  prioritization of regulatory variants from GWAS data 
HMMER  http://www.ebi.ac.uk/Tools/hmmer  profile hidden Markov models homology search 
HotSpot Wizard 3.0  http://loschmidt.chemi.muni.cz/hotspotwizard3  protein engineering directed mutation 
HPEPDOCK  http://huanglab.phys.hust.edu.cn/hpepdock/  peptide–protein docking 
HSYMDOCK  http://huanglab.phys.hust.edu.cn/hsymdock/  symmetric protein complex docking 
InterEvDock2  http://bioserv.rpbs.univ-paris-diderot.fr/services/InterEvDock2/  protein–protein docking 
INTERSPIA  http://bioinfo.konkuk.ac.kr/INTERSPIA/  protein–protein interactions in multiple species 
iPath3.0  http://pathways.embl.de  metabolic pathway visualization and customization 
IUPred2A  http://iupred2a.elte.hu  intrinsically disordered protein regions 
Kinact  http://biosig.unimelb.edu.au/kinact/  kinase activating missense mutations prediction 
KnotGenome  http://knotgenom.cent.uw.edu.pl/  topological analysis of chromosome knots and links 
LitVar  https://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/LitVar  genetic variant information retrieval from PubMed 
LOLAweb  http://lolaweb.databio.org  genomic region enrichment analysis 
MetaboAnalyst 4.0  http://metaboanalyst.ca  metabolomics data analysis 
MetExplore  https://metexplore.toulouse.inra.fr/metexplore2/  metabolic network analysis 
MiGA  http://microbial-genomes.org/  prokaryotic genome and metagenome classification 
MISTIC2  https://mistic2.leloir.org.ar  residue pair covariation in protein families 
MOLEonline  https://mole.upol.cz  biomolecule channels, tunnels, and pores 
mTM-align  http://yanglab.nankai.edu.cn/mTM-align/  protein structure multiple alignment and database search 
Mutalisk  http://mutalisk.org  somatic mutations correlation with genomic, transcriptional and epigenomic features 
Ocean Gene Atlas  http://tara-oceans.mio.osupytheas.fr/ocean-gene-atlas/  marine plankton gene geolocation and abundance 
oli2go  http://oli2go.ait.ac.at/  PCR primer and hybridization probe design for non-human DNA 
OmicsNet  http://www.omicsnet.ca  molecular interactions networks visualization 
oriTfinder  http://bioinfo-mml.sjtu.edu.cn/oriTfinder  origin of transfer sites in bacterial mobile genetic elements 
PaintOmics 3  http://bioinfo.cipf.es/paintomics/  visualization of omics data on KEGG pathways 
PANNZER2  http://ekhidna2.biocenter.helsinki.fi/sanspanz/  protein function prediction 
PatScanUI  https://patscan.secondarymetabolites.org/  DNA and protein sequence pattern search 
PhytoNet  http://www.gene2function.de  phytoplankton gene expression profiles 
pirScan  http://cosbi4.ee.ncku.edu.tw/pirScan/  piRNA target prediction 
ProTox-II  http://tox.charite.de/protox_II  chemical toxicity prediction 
psRNATarget  http://plantgrn.noble.org/psRNATarget/  plant small RNA target prediction 
PSSMSearch  http://slim.ucd.ie/pssmsearch/  protein motifs for binding and post-translational modification 
PUG-REST  https://pubchemdocs.ncbi.nlm.nih.gov/pug-rest  PubChem cheminformatics programmatic access 
RepeatsDB-lite  http://protein.bio.unipd.it/repeatsdb-lite  tandem repeats in proteins 
RNApdbee 2.0  http://lepus.cs.put.poznan.pl/rnapdbee-2.0/  RNA secondary structure annotation 
RSAT  http://www.rsat.eu/  DNA regulatory motifs 
SMARTIV  http://smartiv.technion.ac.il/  RNA sequence and structure motifs for RNA binding proteins 
SNPnexus  http://www.snp-nexus.org  SNP functional annotation 
SPAR  https://www.lisanwanglab.org/SPAR  analysis of small RNA sequencing data 
SWISS-MODEL  https://swissmodel.expasy.org  structure homology modeling for proteins and protein complexes 
TAM 2.0  http://www.scse.hebut.edu.cn/tam/  microRNA set enrichment analysis 
TCRmodel  http://tcrmodel.ibbr.umd.edu/  T cell receptor structure modeling 
UNRES  http://unres-server.chem.ug.edu.pl  coarse-grained simulation of protein structure 
VarAFT  http://varaft.eu  disease-causing variants annotation 
WEGO 2.0  http://wego.genomics.org.cn  Gene Ontology visualization 
X2K Web  http://X2K.cloud  kinase enrichment analysis for differentially expressed gene signatures 
xiSPEC  http://spectrumviewer.org  proteomics mass spectrometry data analysis 
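
Each row in the two tables above has the same layout: a server name, a URL, and a brief description. To filter or search the lists programmatically, the rows can be parsed into records. The following is a minimal sketch, assuming the URL is the only field that starts with `http://` or `https://` and contains no spaces; the names `WebServer` and `parse_row` are hypothetical helpers introduced for illustration.

```python
import re
from typing import NamedTuple, Optional


class WebServer(NamedTuple):
    name: str
    url: str
    description: str


# Anchor on the URL: text before it is the name, text after it is the description.
# Names may contain single spaces (e.g. "agriGO v2"), so the URL is the safest pivot.
ROW_PATTERN = re.compile(
    r"^(?P<name>.+?)\s+(?P<url>https?://\S+)\s+(?P<description>.+?)\s*$"
)


def parse_row(line: str) -> Optional[WebServer]:
    """Return a WebServer for a table row, or None for header and blank lines."""
    match = ROW_PATTERN.match(line)
    if match is None:
        return None
    return WebServer(match["name"], match["url"], match["description"])


# Sample lines copied from the tables, including a header line that is skipped.
rows = [
    "Web server name  URL  Brief description ",
    "agriGO v2  http://systemsbiology.cau.edu.cn/agriGOv2/  GO analysis for agricultural species ",
    "Galaxy  https://usegalaxy.org  biomedical data analysis workflows ",
]
servers = [s for s in map(parse_row, rows) if s is not None]
for server in servers:
    print(f"{server.name}: {server.url}")
```

Pivoting on the URL rather than on column spacing keeps the parser working even if the whitespace between columns is collapsed when the table is copied.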

References

https://en.wikipedia.org/wiki/Biological_database

The 2018 Nucleic Acids Research database issue and the online molecular biology database collection

Editorial: The 15th annual Nucleic Acids Research web server issue 2017

Editorial: The 16th annual Nucleic Acids Research web server issue 2018