adv bi unit 1


  • 8/9/2019 adv bi unit 1


    contained 10,348,022 records with a total of 11,302,156,934 bases; see the EMBL DB statistics page.

    It can be accessed and searched through the SRS system at EBI, or one can download the entire database as flat files. An example of what an entry looks like is given for the human raf oncogene protein, ID: HSRAFR.

    GenBank www.ncbi.nlm.nih.gov/Genbank/

    The GenBank nucleotide database is maintained by the


    primary ones, or have a different organization of the data to better suit some specific purpose. However, the nucleotide sequences themselves should always be available in the EMBL/GenBank databases. In this sense, the databases below are secondary databases.

    UniGene www.ncbi.nlm.nih.gov/UniGene/

    The UniGene system attempts to process the GenBank sequence data into a non-redundant set of gene-oriented clusters. Each UniGene cluster contains sequences that represent a unique gene, as well as related information such as the tissue types in which the gene has been expressed and map location.

    SGD genome-www.stanford.edu/Saccharomyces/

    The Saccharomyces Genome Database (SGD) is a scientific database of the molecular biology and genetics of the yeast Saccharomyces cerevisiae.

    EBI Genomes www.ebi.ac.uk/genomes/

    This web site provides access and statistics for the completed genomes, and information about ongoing projects.

    Genome Biology www.ncbi.nlm.nih.gov/Genomes/

    The Genome Biology site at


    Ensembl is a joint project between EMBL-EBI and the Sanger Centre to develop a software system which produces and maintains automatic annotation of eukaryotic genomes.

    Protein Sequence

    The two protein sequence databases SWISS-PROT and PIR are different from the nucleotide databases in that they are both curated. This means that groups of designated curators (scientists) prepare the entries from literature and/or contacts with external experts.

    SWISS-PROT, TrEMBL www.expasy.ch/sprot/

    SWISS-PROT is a protein sequence database which strives to provide a high level of annotations (such as the description of the function of a protein, its domain structure, post-translational modifications, variants, etc.), a minimal level of redundancy and a high level of integration with other databases.

    It was started in 1986 by Amos Bairoch in the Department of Medical Biochemistry at the University of Geneva. This database is generally considered one of the best protein sequence databases in terms of the quality of the annotation. Release 39.12 (11 Jan 2001) contained 92,211 entries.

    TrEMBL is a computer-annotated supplement of SWISS-PROT that contains all the translations of EMBL nucleotide sequence entries not yet integrated in SWISS-PROT. The procedure that is used to produce it was developed by Rolf Apweiler. Release 15.1 (5 Jan 2001) contained 348,152 entries. The annotation of an entry in TrEMBL has not (yet) reached the standards required for inclusion into SWISS-PROT proper.

    SWISS-PROT and TrEMBL are developed by the SWISS-PROT groups at the Swiss Institute of Bioinformatics (SIB) and at EBI. The databases can be accessed and searched through the SRS system at ExPASy, or one can download the entire database as one single flat file. An example of what an entry looks like is given for the human raf oncogene protein, ID KRAF_HUMAN.

    The SWISS-PROT database has some legal restrictions: the entries themselves are copyrighted, but freely accessible and usable by academic researchers. Commercial companies must pay a license fee to SIB to use SWISS-PROT.

    PIR pir.georgetown.edu/

    The Protein Information Resource (PIR) is a division of the


    PIR grew out of Margaret Dayhoff's work in the middle of the 1960s. It strives to be comprehensive, well-organized, accurate, and consistently annotated. However, it is generally believed that it does not reach the level of completeness in the entry annotation as does SWISS-PROT. Although SWISS-PROT and PIR overlap extensively, there are still many sequences which can be found in only one of them.

    One can search for entries or do sequence similarity searches at the PIR site. The database can also be downloaded as a set of files. An example of what an entry looks like is given for the human raf-1 oncogene protein, ID TVHUF6.

    PIR also produces the NRL-3D, which is a database of sequences extracted from the three-dimensional structures in the Protein Databank (PDB) (see also the following page in this lecture). The

     


    domain, it contains a multiple alignment of a set of defining sequences (the seeds) and the other sequences in SWISS-PROT and TrEMBL that can be matched to that alignment.

    The database was started in 1996 and is maintained by a consortium of scientists, among them Erik Sonnhammer (CGR, KI, Sweden), Sean Eddy (WashU, St Louis, USA), Richard Durbin, Alan Bateman and Ewan Birney (Sanger Centre, UK). Release 5.5 (Sep 2000) contains 2478 families.

    The alignments can be converted into hidden Markov models (HMM), which can be used to search for domains in a query protein sequence. The software HMMER (by Sean Eddy) is the computational foundation for Pfam. The domain structures of protein sequences in SWISS-PROT and TrEMBL are available directly from the Pfam web sites, and it is also possible to search for domains in other sequences using servers at the web sites.

    The technology behind Pfam/HMMER will be discussed in a lecture later in this course.

    The Pfam database can be searched, or used to identify domains in a sequence, or downloaded from the websites above. An example of a multiple sequence alignment that defines a protein family (domain) is given for the Raf-like Ras-binding domain (Pfam name RBD, accession code PF02196).


    The Pfam database is licensed under the G


    Primary and Secondary databases.

    Primary Databases:

    Databases consisting of experimentally derived data, such as nucleotide sequences and three-dimensional structures, are known as primary databases.

    Primary databases (consisting of data derived experimentally):

    • have grown tremendously over the years

    • contain information on the sequence or structure alone, together with associated annotation information

    Secondary Databases:

    Data derived from the analysis or treatment of primary data, such as secondary structures, hydrophobicity plots, and domains, are stored in secondary databases.

    • A secondary database contains information derived from a primary database, such as conserved sequences, signature sequences and active-site residues of protein families, arrived at by multiple sequence alignment of a set of related proteins.

    • A secondary structure database contains entries of the PDB in an organized way (for instance, by classification of all PDB entries according to structures like alpha-helix or β-sheets) and also information on conserved secondary structure motifs of a particular protein.
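    A hydrophobicity plot is a convenient illustration of derived (secondary) data, because it is computed from a primary sequence alone. The following is a minimal sketch, assuming the Kyte-Doolittle hydropathy scale and a simple sliding-window mean; the function name, the window size and the example peptide are illustrative choices and do not come from the notes.

```python
# Kyte-Doolittle hydropathy values per amino acid (one-letter codes).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=5):
    """Return the mean hydropathy of every full window along the sequence."""
    scores = [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    if window not in range(1, len(scores) + 1):
        return []  # window is empty or longer than the sequence
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


if __name__ == "__main__":
    # Illustrative peptide: a hydrophobic stretch followed by charged residues.
    for position, value in enumerate(hydropathy_profile("LLIVAKKDEE")):
        print(position, round(value, 2))
```

    Plotting the returned values against window position gives the hydrophobicity plot; a secondary database would store such derived profiles rather than the raw sequence.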

     

    Composite databases:

    • A composite database joins a variety of different primary database sources, which obviates the need to search multiple resources.

    Primary databases:

    GenBank:

    • The GenBank sequence database is an open access, annotated collection of all publicly available nucleotide sequences and their protein translations.

    • This database is produced and maintained by

    the 


    • In the more than 30 years since its establishment, GenBank has become the most important and most influential database for research in almost all biological fields, whose data are accessed and cited by millions of researchers around the world.

    • GenBank is built by direct submissions from individual laboratories, as well as from bulk submissions from large-scale sequencing centers.

    • Only original sequences can be submitted to GenBank. Direct submissions are made to GenBank using BankIt, which is a Web-based form, or the stand-alone submission program, Sequin.

    • Upon receipt of a sequence submission, the GenBank staff examines the originality of the data, assigns an accession number to the sequence and performs quality assurance checks.

    • The submissions are then released to the public database, where the entries are retrievable by Entrez or downloadable by FTP.
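    Retrieval by Entrez can also be scripted. The sketch below assumes the NCBI E-utilities `efetch` endpoint and its `db`, `id`, `rettype` and `retmode` parameters; the helper names are hypothetical and the accession passed in the demo is only an example, to be replaced with the record of interest.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def genbank_fetch_url(accession):
    """Build an E-utilities URL that returns one record as a GenBank flat file."""
    query = urlencode({
        "db": "nuccore",     # nucleotide collection
        "id": accession,
        "rettype": "gb",     # GenBank flat-file format
        "retmode": "text",
    })
    return f"{EFETCH_URL}?{query}"


def fetch_genbank_record(accession):
    """Download the record as text (needs network access)."""
    with urlopen(genbank_fetch_url(accession)) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Only the URL is printed here, so the demo runs without a network.
    print(genbank_fetch_url("X03484"))
```

    Building the URL separately from the download keeps the query easy to inspect and test; bulk retrieval of whole releases is better served by the FTP route mentioned above.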

    EMBL:

    • The EMBL database (www.ebi.ac.uk/embl), maintained at the European Bioinformatics Institute (EBI) near Cambridge, UK, is a comprehensive collection of nucleotide sequences and annotation from available public sources.

    • The database is part of an international collaboration with DDBJ (Japan) and GenBank (USA).

    • Data are exchanged daily between the collaborating institutes.

    • Webin is the preferred tool for individual submissions of nucleotide sequences, including Third Party Annotation (TPA) and alignments.

    • Automated procedures are provided for submissions from large-scale sequencing projects and data from the European Patent Office.


    • Other tools are available for sequence similarity searching (e.g. FASTA and BLAST).

    DDBJ:

    • The DNA Data Bank of Japan (DDBJ) is a biological

    database that collects D


    • DDBJ is primarily funded by the Japanese Ministry of Education, Culture, Sports, Science and Technology (MEXT).

    • The principal purpose of DDBJ operations is to improve the quality of I


    • The sequence database consists of sequence entries. Sequence entries are composed of different line types, each with its own format. For standardization purposes the format of SWISS-PROT follows as closely as possible that of the EMBL database.

    • TrEMBL contains high-quality computationally analyzed records, which are enriched with automatic annotation.

    • It was introduced in response to increased dataflow resulting from genome projects, as the time- and labour-consuming manual annotation process of UniProtKB/Swiss-Prot could not be broadened to include all available protein sequences.


    • The translations of annotated coding sequences in the EMBL-Bank/GenBank/DDBJ nucleotide sequence database are automatically processed and entered in UniProtKB/TrEMBL. UniProtKB/TrEMBL also contains sequences from PDB, and from gene prediction, including Ensembl, RefSeq and CCDS.

    • Due to the nature of the source, UniProtKB/TrEMBL is highly redundant and the quality of the annotation is very variable. As well as the original annotations carried over from EMBL-Bank, additional annotations are added based on a series of automated annotation workflows.

    • As the entries in UniProtKB/TrEMBL are manually reviewed by the UniProt curators, they graduate into UniProtKB/Swiss-Prot (the human-curated section of UniProtKB) and may be merged into existing entries which describe the same gene in the same species.

    • The usual Swiss-Prot annotation pipeline involves the manual annotation of TrEMBL entries, their integration into Swiss-Prot with their original accession number, and subsequent deletion from TrEMBL.
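    The line-type layout of a flat-file entry, mentioned earlier in this list, lends itself to simple parsing. This is a minimal sketch, assuming the usual convention of a two-letter line code in the first two columns, data starting in column six, and `//` as the entry terminator; the entry text is abbreviated and illustrative, and `group_by_line_type` is a hypothetical helper name.

```python
from collections import defaultdict

# Abbreviated, illustrative entry; real entries carry many more line types.
EXAMPLE_ENTRY = """\
ID   KRAF_HUMAN     STANDARD;      PRT;   648 AA.
AC   P04049;
DE   RAF PROTO-ONCOGENE SERINE/THREONINE-PROTEIN KINASE.
OS   Homo sapiens (Human).
//
"""


def group_by_line_type(entry_text):
    """Map each two-letter line code to the list of its data fields."""
    fields = defaultdict(list)
    for line in entry_text.splitlines():
        if line.startswith("//"):      # entry terminator
            break
        code, data = line[:2], line[5:].rstrip()
        if code.strip():               # skip lines without a line code
            fields[code].append(data)
    return dict(fields)


if __name__ == "__main__":
    parsed = group_by_line_type(EXAMPLE_ENTRY)
    print(parsed["ID"][0].split()[0])  # entry name
    print(parsed["AC"])
```

    Because EMBL and SWISS-PROT share this line-oriented layout, the same grouping step works as a first pass on entries from either database.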

     

    Secondary databases:

    PROSITE:

    • PROSITE is a protein database. It consists of entries describing the protein families, domains and functional sites as well as amino acid patterns and profiles in them.

    • It is based on the observation that, while there is a huge number of different proteins, most of them can be grouped, on the basis of similarities in their sequences, into a limited number of families.

    • Proteins or protein domains belonging to a particular family generally share functional attributes and are derived from a common ancestor.

    • PROSITE currently contains patterns and profiles specific for more than a thousand protein families or domains.

    • Each of these signatures comes with documentation providing background information on the structure and function of these proteins.

    • The ProRule section of PROSITE consists of manually created rules that can automatically generate annotation in the UniProtKB/Swiss-Prot format based on PROSITE motifs.


    • PROSITE's uses include identifying possible functions of newly discovered proteins and analysis of known proteins for previously undetermined activity.

    • PROSITE offers tools for protein sequence analysis and motif detection (see sequence motif, PROSITE patterns). It is part of the ExPASy proteomics analysis servers.
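    Motif detection with PROSITE patterns can be sketched by translating the pattern notation into a regular expression. The sketch assumes the common subset of the notation (elements joined by `-`, `x` for any residue, `[...]` for allowed residues, `{...}` for forbidden residues, `(n)` or `(n,m)` for repeats) and ignores terminal anchors; the function name is hypothetical, and the demo pattern is written in the style of a C2H2 zinc-finger signature.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # Forbidden residues first, so the braces are free for repeat counts.
        element = element.replace("{", "[^").replace("}", "]")
        element = element.replace("(", "{").replace(")", "}")  # repeat counts
        element = element.replace("x", ".")                    # any residue
        parts.append(element)
    return "".join(parts)


if __name__ == "__main__":
    regex = prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H.")
    # Made-up sequence containing one occurrence of the motif.
    sequence = "MKCPECGKSFSQKSNLQKHQRTHTG"
    match = re.search(regex, sequence)
    print(regex)
    print(match.span() if match else None)
```

    Profiles and ProRule annotations need more than a regular expression; this covers only the pattern part of PROSITE.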

    PRINTS:

    • The PRINTS database is a collection of so-called "fingerprints".

    • It provides both a detailed annotation resource for protein families, and a diagnostic tool for newly determined sequences.

    • A fingerprint is a group of conserved motifs taken from a multiple sequence alignment; together, the motifs form a characteristic signature for the aligned protein family.

    • The motifs themselves are not necessarily contiguous in sequence, but may come together in 3D space to define molecular binding sites or interaction surfaces.


    • The particular diagnostic strength of fingerprints lies in their ability to distinguish sequence differences at the clan, superfamily, family and subfamily levels.

    • This allows fine-grained functional diagnoses of uncharacterised sequences, allowing, for example, discrimination between family members on the basis of the ligands they bind or the proteins with which they interact, and highlighting potential oligomerisation or allosteric sites.

    • A;-


    • View protein domain architectures

    • Examine species distribution

    • Follow links to other databases

    • View known protein structures


     

    The database can be searched by e-mail and World Wide Web (WWW) servers (http://blocks.fhcrc.org/help) to classify protein and nucleotide sequences.

    The description of a protein family by its conserved regions focuses on the family's characteristic and distinctive sequence features, thus reducing noise.

    Databases of conserved features of protein families can be utilized to classify sequences from proteins, cD


    BioGRID:

    • The Biological General Repository for Interaction Datasets (BioGRID) is a curated biological database of protein-protein and genetic interactions created in 2003.

    • It strives to provide a comprehensive resource of protein-protein and genetic interactions for all major model organism species while attempting to remove redundancy to create a single mapping of interactions.

    • The Biological General Repository for Interaction Datasets (BioGRID) database was developed to house and distribute collections of protein and genetic interactions from major model organism species.

    • Users of the BioGRID can search for their protein of interest and retrieve annotation, as well as physical and genetic interaction data as reported by the primary literature and compiled by in-house large-scale curation efforts.


    • Originally separated into organism-specific databases, the newest version now provides a unified front end allowing for searches across several organisms simultaneously.

    • The BioGRID is funded by the BBSRC,


    • Each of the member databases of InterPro contributes towards a different niche, from very high-level, structure-based classifications (SUPERFAMILY and CATH-Gene3D) through to quite specific sub-family classifications (PRI


    • The data, typically obtained by X-ray crystallography or


    • A motivation for this classification is to determine the evolutionary relationship between proteins.

    • Proteins with the same shapes but having little sequence or functional similarity are placed in different "superfamilies", and are assumed to have only a very distant common ancestor.

    • Proteins having the same shape and some similarity of sequence and/or function are placed in "families", and are assumed to have a closer common ancestor.

    The SCOP database is freely accessible on the internet.

    SCOP was created in 1994.

    The source of protein structures is the Protein Data Bank.

    The unit of classification of structure in SCOP is the protein domain.

    The shapes of domains are called "folds" in SCOP. Domains belonging to the same fold have the same major secondary structures in the same arrangement with the same topological connections.


    The levels of SCOP are as follows.

    • Class: Types of folds, e.g., beta sheets.

    • Fold: The different shapes of domains within a class.

    • Superfamily: The domains in a fold are grouped into superfamilies, which have at least a distant common ancestor.

    • Family: The domains in a superfamily are grouped into families, which have a more recent common ancestor.

    • Protein domain: The domains in families are grouped into protein domains, which are essentially the same protein.

    • Species: The domains in "protein domains" are grouped according to species.

    • Domain: part of a protein. For simple proteins, it can be the entire protein.
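    The hierarchy above can be modelled as an ordered record, which makes it easy to ask how closely two domains are related. This is a minimal sketch; the class, field and function names are illustrative, and the labels used in the demo are placeholders rather than real SCOP entries.

```python
from dataclasses import dataclass

# Classification levels from broadest to most specific, as listed above.
SCOP_LEVELS = ("class", "fold", "superfamily", "family", "protein", "species")


@dataclass(frozen=True)
class DomainClassification:
    """Position of one domain in the hierarchy (values are free-text labels)."""
    scop_class: str
    fold: str
    superfamily: str
    family: str
    protein: str
    species: str

    def lineage(self):
        return (self.scop_class, self.fold, self.superfamily,
                self.family, self.protein, self.species)


def deepest_shared_level(a, b):
    """Return the most specific level at which two domains agree, or None."""
    shared = None
    for level, left, right in zip(SCOP_LEVELS, a.lineage(), b.lineage()):
        if left != right:
            break
        shared = level
    return shared


if __name__ == "__main__":
    d1 = DomainClassification("all-beta", "fold-A", "sf-1", "fam-X", "prot-1", "human")
    d2 = DomainClassification("all-beta", "fold-A", "sf-1", "fam-Y", "prot-2", "mouse")
    print(deepest_shared_level(d1, d2))
```

    Two domains that agree down to the superfamily level but not the family level are, in the terms used above, assumed to share only a distant common ancestor.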

    CATH:

    The CATH Protein Structure Classification is a semi-automatic, hierarchical classification of protein domains. CATH shares many broad features with its principal rival, SCOP; however, there are also many areas in which the detailed classification differs greatly.

    Only crystal structures solved to resolution better than 4.0 angstroms are considered, together with


    Homologous superfamily: indicative of a demonstrable evolutionary relationship. Equivalent to the superfamily level of SCOP.

    Class is determined according to the secondary structure composition and packing within the structure. Three major classes are recognised: mainly-alpha, mainly-beta and alpha-beta.

    EuroCarbDB:

    • EuroCarbDB is an EU-funded initiative for the creation of software and standards for the systematic collection of carbohydrate structures and their experimental data.

    • The EUROCarbDB project is a design study for a technical framework, which provides sophisticated, freely accessible, open-source informatics tools and databases to support glycobiology and glycomic research.


    • EUROCarbDB is a relational database containing glycan structures, their biological context and, when available, primary and interpreted analytical data from high-performance liquid chromatography, mass spectrometry and nuclear magnetic resonance experiments.

    • Database content can be accessed via a web-based user interface.

    • The database is complemented by a suite of glycoinformatics tools, specifically designed to assist the elucidation and submission of glycan structure and experimental data when used in conjunction with contemporary carbohydrate research workflows.

    • The project includes a database of known carbohydrate structures and experimental data, specifically mass spectrometry, HPLC and


    • A specific design objective of the architecture of the database was to allow for the extension and incorporation of new modules and tools to support further types of experimental data and workflows.

    PubChem Compound:

    PubChem is a database of chemical molecules and their activities against biological assays. The system is maintained by the


    Substances,

    BioAssay,

    • PubChem Compound is a searchable database of chemical structures with validated chemical depiction information provided to describe substances in PubChem Substance. Structures stored within PubChem Compound are pre-clustered and cross-referenced by identity and similarity groups.

    • PubChem Compound includes over 5M compounds.

    • Molecular chemical properties and descriptors.

    • Simple Elemental Searches (e.g. all compounds containing Gallium) allow searching with specific element restrictions.
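    The idea behind an element-restricted search can be illustrated locally. The sketch below is not PubChem's interface; it assumes plain molecular formulas without brackets or charges, and the function names and the formula list are hypothetical.

```python
import re

# One element symbol (capital letter plus optional lowercase) and its count.
ELEMENT_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def element_counts(formula):
    """Count atoms per element in a simple formula such as 'C6H12O6'."""
    counts = {}
    for symbol, number in ELEMENT_TOKEN.findall(formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def compounds_containing(formulas, element):
    """Keep only the formulas that contain the given element symbol."""
    return [f for f in formulas if element in element_counts(f)]


if __name__ == "__main__":
    formulas = ["GaAs", "C6H12O6", "GaCl3", "NaCl"]
    print(compounds_containing(formulas, "Ga"))
```

    Tokenising the formula, rather than searching for the substring, keeps a query for one element from matching another whose symbol merely starts with the same letter.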

    DrugBank:

    • The database contains more than 30 million unique molecules from hundreds of data sources including: U.S. Food and Drug Administration (FDA),


    weight range, CAS numbers, suppliers, etc. The search can be used to widen or restrict already found results.

    • Structure searching on mobile devices can be done using free apps for iOS (iPhone/iPod/iPad) and for Android.

    Cambridge Structural Database:

    • The Cambridge Structural Database (CSD) is a repository for small molecule crystal structures.

    • Scientists use single-crystal X-ray crystallography to determine the crystal structure of a compound.

    • Once the structure is solved, information about the structure is saved in a file (CIF format) and deposited in the CSD.

    • Other scientists can search and retrieve structures from the database.

    • The information consists of the space group symmetry of the crystalline phase, its cell parameters, and the relative atomic coordinates of all the atoms in the cell in 3D.


    • Scientists can use the CSD to compare existing data with that obtained from crystals grown in their laboratories.

    • The information can also be used to visualize the structure in a variety of software such as ATOMS, PowderCell, etc.

    • It is also possible to calculate what the theoretical powder diffraction pattern of the phase would look like. This option is particularly important for analytical reasons because it facilitates the identification of phases present in a crystalline powder mixture without the need for growing crystals.

    • Many of the small molecules are organic compounds of the sort that could potentially act as medical drugs, and a very important use of the CSD is for structural comparisons among related molecules that can suggest new leads for drug design.

    • The CSD is compiled and maintained by the Cambridge Crystallographic Data Centre.

    • Each crystal structure undergoes extensive validation and cross-checking by expert chemists and crystallographers to ensure that the CSD is maintained to the highest possible standards.

    • Also, each database entry is enriched with bibliographic, chemical and physical property information, adding further value to the raw structural data.

    • These editorial processes are vital for enabling scientists to interpret structures in a chemically meaningful way.

    • The CSD is continually updated with new structures (tens of thousands of new structures each year) and with improvements to existing entries.

    • With regular web-updates and early online access to newly published structures you can keep fully informed of the latest research.
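    The CIF deposition described earlier (space group, cell parameters, atomic coordinates) is a plain-text format, so its simplest items can be read with a few lines of code. This is a minimal sketch that handles only one-line `tag value` items and skips `loop_` tables such as the atom list; the tag names follow the CIF core dictionary, while the fragment and its values are made up for illustration.

```python
import re

# Abbreviated, made-up CIF fragment; the values are illustrative only.
EXAMPLE_CIF = """\
data_example
_symmetry_space_group_name_H-M   'P 21/c'
_cell_length_a    5.000(2)
_cell_length_b    6.000(3)
_cell_length_c    7.000(3)
_cell_angle_alpha 90
_cell_angle_beta  95.50(1)
_cell_angle_gamma 90
"""


def read_cif_items(text):
    """Collect simple one-line 'tag value' pairs from CIF text."""
    items = {}
    for line in text.splitlines():
        match = re.match(r"(_\S+)\s+(.+)", line)
        if match:
            items[match.group(1)] = match.group(2).strip().strip("'\"")
    return items


def cif_number(value):
    """Convert a CIF numeric string to float, dropping the uncertainty in parentheses."""
    return float(value.split("(")[0])


if __name__ == "__main__":
    items = read_cif_items(EXAMPLE_CIF)
    print(items["_symmetry_space_group_name_H-M"])
    print([cif_number(items[f"_cell_length_{axis}"]) for axis in "abc"])
```

    For real depositions a dedicated CIF parser is the safer choice, since the format also allows multi-line values and looped data that this sketch does not cover.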