This list presents all the known SPARQL endpoints available in the bioinformatics community. Feel free to add any you know of.
More about Best Life Science SPARQL endpoint of All Time:
Best Life Science SPARQL endpoint of All Time is a public top list created by Listnerd on rankly.com on November 27th 2012. Items on the Best Life Science SPARQL endpoint of All Time top list are added by the rankly.com community and ranked using our secret ranking sauce. Best Life Science SPARQL endpoint of All Time has gotten 320 views and has gathered 93 votes from 93 voters.
FlyBase is an online bioinformatics database and the primary repository of genetic and molecular data for the insect family Drosophilidae. For the most extensively studied species and model organism, Drosophila melanogaster, a wide range of data are presented in different formats. Information in FlyBase originates from a variety of sources ranging from large-scale genome projects to the primary research literature. These data types include mutant phenotypes, molecular characterization of mutant alleles and other deviations, cytological maps, wild-type expression patterns, anatomical images, transgenic constructs and insertions, sequence-level gene models and molecular classification of gene product functions. Query tools allow navigation of FlyBase through DNA or protein sequence, by gene or mutant name, or through terms from the several ontologies used to capture functional, phenotypic, and anatomical data. The database offers several different query tools in order to provide efficient access to the data available and facilitate the discovery of significant relationships within the database. Links between FlyBase and external databases, such as BDGP or modENCODE, provide
LinkedLifeData is a platform for semantic data integration through RDF warehousing and efficient reasoning that helps to resolve conflicts in the data. One of the major problems that biotechnology and pharmaceutical industries face today is how to combine data from multiple sources and make their research more productive. Data integration takes much time and often leads to errors and redundancies that require more time and resources to resolve. The typical problems in working with biomedical data sources are that information is:
* Supported by different organizations
* Highly distributed and redundant
* Encoded in different syntax and structural formats with special semantics for each data source
* Locked in vast data silos accessible with limited query functionality
LinkedLifeData is a data warehouse that syndicates tons of heterogeneous biomedical knowledge in a common data model. The platform uses an extension of the RDF model that is able to track the provenance of each individual fact in the repository and thus update the information.
SIDER contains information on marketed medicines and their recorded adverse drug reactions. The information is extracted from public documents and package inserts. The available information includes side effect frequency, drug and side effect classifications as well as links to further information, for example drug–target relations.
The Swiss-Prot, TrEMBL, and PIR protein database activities have united to form the Universal Protein Resource (UniProt), which provides a central resource on protein sequences and functional annotation with three database components, each addressing a key need in protein bioinformatics. The UniProt Knowledgebase (UniProtKB), comprising the manually annotated UniProtKB/Swiss-Prot section and the automatically annotated UniProtKB/TrEMBL section, is the preeminent storehouse of protein annotation. The extensive cross-references, functional and feature annotations, and literature-based evidence attribution enable scientists to analyze proteins and query across databases. The UniProt Reference Clusters (UniRef) speed similarity searches via sequence space compression by merging sequences that are 100% (UniRef100), 90% (UniRef90), or 50% (UniRef50) identical. Finally, the UniProt Archive (UniParc) stores all publicly available protein sequences, containing the history of sequence data with links to the source databases. The UniProt databases continue to grow in size and in availability of information. New download availability includes all major releases of UniProtKB, sequence collections by taxonomic division, and complete proteomes. A bibliography mapping service has been added, and an ID mapping service is available.
The BioGateway is an integrated system offering an interface, via SPARQL, to the entire set of the OBO foundry candidate ontologies, the whole set of GOA files, SwissProt, the NCBI taxonomy as well as in-house ontologies. The BioGateway provides a single entry point for exploiting these ontologies and constitutes a step towards a semantic web integration for biological data.
The BioGateway aims to support Systems Biology approaches by combining semantic web technologies which in turn enable data-driven research. The semantic web approach that has been taken enhances data exchange and integration by providing a standardized mechanism for interrogating such systems.
Candidate OBO foundry ontologies
Gene Ontology Annotations (GOA)
The BioGateway system can be explored using the SPARQL browser. With this browser, SPARQL results can be viewed as a network of resources.
Chemical Entities of Biological Interest, also known as ChEBI, is a database and ontology of molecular entities focused on 'small' chemical compounds, that is part of the Open Biomedical Ontologies effort. The term "molecular entity" refers to any "constitutionally or isotopically distinct atom, molecule, ion, ion pair, radical, radical ion, complex, conformer, etc., identifiable as a separately distinguishable entity". The molecular entities in question are either products of nature or synthetic products used to intervene in the processes of living organisms. Molecules directly encoded by the genome, such as nucleic acids, proteins and peptides derived from proteins by proteolytic cleavage, are not as a rule included in ChEBI.
ChEBI uses nomenclature, symbolism and terminology endorsed by the International Union of Pure and Applied Chemistry (IUPAC) and Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (NC-IUBMB).
All data in the database is non-proprietary or is derived from a non-proprietary source. It is thus freely accessible and available to anyone. In addition, each data item is fully traceable and explicitly referenced to the original
myExperiment is a social web site for researchers sharing Research Objects such as Scientific Workflows.
The myExperiment website was launched in November 2007 and contains a significant collection of scientific workflows for a variety of workflow systems, most notably Taverna, but also other tools such as Bioclipse. myExperiment has a REST API and is based on an open source Ruby on Rails codebase. It supports Linked data and has a SPARQL Endpoint, with an interactive tutorial.
The myExperiment project is directed by David De Roure at University of Oxford and is one of the activities of the myGrid consortium led by Carole Goble of The University of Manchester, UK and of the e-Research South UK regional consortium led by the Oxford e-Research Centre. It was originally funded by JISC under the Virtual Research Environment programme and by the Microsoft Technical Computing Initiative. myExperiment is being enhanced by the workflows for ever project (Wf4Ever) which aims to provide new features to support the preservation of Research Objects in conjunction with the dLibra digital library framework.
The NeuroCommons project seeks to make all scientific research materials - research articles, knowledge bases, research data, physical materials - as available and as usable as they can be. We do this by fostering practices that render information in a form that promotes uniform access by computational agents - sometimes called "interoperability". We want knowledge sources to combine easily and meaningfully, enabling semantically precise queries that span multiple information sources.
Our work covers general data and knowledge sources used in computational biology as well as sources specific to neuroscience and neuromedicine. The practices that we develop and promote are designed to play well on the Semantic Web. We view our technical work not as creating a new service or content library, although we do both, but rather as helping to promote semantically linked scientific information and generic practices that lead to such a "commons".
Live Virtuoso instance hosting Linked Open Data (LOD) Cloud
We have reached a beachhead regarding the Virtuoso instance hosting the Linked Open Data (LOD) Cloud; meaning, we are not going to be performing any major updates and deletions short-term, bar incorporation of fresh data sets from the Freebase and Bio2RDF projects (both communities are prepping new RDF data sets).
At the current time we have loaded 100% of all the very large data sets from the LOD Cloud. As a result, we can start the process of exposing Linked Data virtues in a manner that's palatable to users, developers, and database professionals across the Web 1.0, 2.0, and 3.0 spectrums.
What does this mean?
You can use the "Search & Find", "URI Lookup", or SPARQL endpoint associated with the LOD cloud hosting instance to perform the following tasks:
* Find entities associated with full text search patterns -- Google style, but with Entity & Text proximity Rank instead of Page Rank, since we are dealing with entities rather than documents about entities
* Find and look up entities by identifier (URI) -- which is helpful when locating URIs to use to identify entities in your own linked data spaces on the Web
* View entity descriptions via a variety of representation formats (HTML, RDFa, RDF/XML, N3, Turtle etc.)
* Determine uses of entity identifiers across the LOD cloud -- which helps you select preferred URIs based on usage statistics
Affymetrix' GeneChip® technology was invented in the late 1980's by a
team of scientists led by Stephen P.A. Fodor, Ph.D. The theory behind
their work was revolutionary - a notion that semiconductor
manufacturing techniques could be united with advances in combinatorial
chemistry to build vast amounts of biological data on a small glass
chip. This technology became the basis of a new company, Affymetrix,
formed as a division of Affymax, N.V. in 1991. Affymetrix began
operating independently in 1992.
DBpedia is a project aiming to extract structured content from the information created as part of the Wikipedia project. This structured information is then made available on the World Wide Web. DBpedia allows users to query relationships and properties associated with Wikipedia resources, including links to other related datasets. DBpedia has been described by Tim Berners-Lee as one of the more famous parts of the Linked Data project.
The project was started by people at the Free University of Berlin and the University of Leipzig, in collaboration with OpenLink Software, and the first publicly available dataset was published in 2007. It is made available under free licences, allowing others to reuse the dataset.
Wikipedia articles consist mostly of free text, but also include structured information embedded in the articles, such as "infobox" tables, categorisation information, images, geo-coordinates and links to external Web pages. This structured information is extracted and put in a uniform dataset which can be queried.
As of September 2011, the DBpedia dataset describes more than 3.64 million things, out of which 1.83 million are classified in a consistent ontology, including
BioCyc is a collection of 371 Pathway/Genome Databases. Each
Pathway/Genome Database in the BioCyc collection describes the genome
and metabolic pathways of a single organism, with the exception of the MetaCyc database, which is a reference source on metabolic pathways from many organisms.
To learn more about BioCyc, read the Introduction to BioCyc or watch our free online instructional videos.
The BioCyc databases are divided into three tiers, based on their quality.
BioCyc Tier 1: Intensively Curated Databases
* EcoCyc: Escherichia coli K-12
* MetaCyc: Metabolic pathways and enzymes from more than 900 organisms
The BioCyc Open Chemical Database is also an intensively
curated database. It is an open database of chemical compounds from other BioCyc databases.
Because it contains chemical compounds only, it is not a Pathway/Genome Database.
BioCyc Tier 2: Computationally-Derived Databases Subject to Moderate Curation
20 databases are available.
[list of tier 2 DBs]
BioCyc Tier 3: Computationally-Derived Databases Subject to No Curation
349 databases are available and ready for adoption
by interested scientists for curation and updating.
PGDBs in Tier 3 were produced as a collaboration
between the groups of Peter D. Karp at SRI International and
Christos Ouzounis at the European Bioinformatics Institute.
[list of tier 3 DBs]
FlyTED, the Drosophila Testis Gene Expression Database, is a public database currently containing 2,762 mRNA in situ images and ancillary data revealing the extent of expression of 817 individual genes involved in spermatogenesis in the testis of the fruit fly, Drosophila melanogaster, both in normal wild type flies and in seven meiotic arrest mutant strains.
This dataset was generated between 2003 and 2008 by Elizabeth Benson, Elin Gudmannsdottir and Helen White-Cooper in the Department of Zoology at the University of Oxford, with funding from the UK's BBSRC. The FlyTED Database was developed by the Image Bioinformatics Research Group in the Department of Zoology at the University of Oxford (IBRG), with funding from the UK's BBSRC.
CardioSHARE is a unique framework for querying distributed data and performing data analysis using Semantic Web standards. CardioSHARE's two main innovations are an enhancement to a standard SPARQL query engine, which enables the required data to be retrieved dynamically from Web Services; and the ability to use OWL class restrictions to drive the discovery and execution of Web Services capable of generating that class' defining properties, thus allowing naive data to be "lifted" into more complex OWL classifications. Both of these behaviours are accomplished by mapping predicates onto Web Services capable of producing RDF data that satisfy those predicates. Our initial focus has been on integration with the BioMoby project: a set of 1500+ interoperable bioinformatics web services. CardioSHARE effectively brings this established pool of resources into conformance with Semantic Web standards. Given that much of the data from CardioSHARE is generated dynamically based on analysis of incoming query data, the effective size of the "virtual" triplestore is unmeasurable; limited only by the number of conceivable inputs.
Observe how genes interact in dynamic graphical models. Our online
maps depict molecular relationships from areas of active research. In
an "open source" approach, this community-fed forum constantly
integrates emerging proteomic information from the scientific
community. It also catalogs and summarizes important resources
providing information for over 120,000 genes from multiple species.
Find both classical pathways as well as current suggestions for new
Recent advances in high-throughput techniques have generated vast volumes of biological data, which in turn has contributed to a newly emerged discipline, systems biology, that takes a comprehensive approach to studying biological systems. Chemogenomics, as an integral part of systems biology, studies the impact of small molecules on biological systems and describes interactions between chemical entities and protein molecules. The integration of chemical informatics and bioinformatics within the realm of systems biology leads to a new synergetic subject, namely systems chemical biology (ref).
However, the current de facto practice of distributing chemical and biological data impedes the growth of systems chemical biology because of the heterogeneous formats used. This project is dedicated to addressing these challenges using existing semantic web technology, in particular Bio2RDF and Linking Open Drug Data. Beyond the generic scope of these two initiatives, we also plan to incorporate new semantic clauses that capture the core interests of systems chemical biology, for instance chemical structural similarity and biological sequence similarity. Figure 1 shows the overall scope of systems chemical biology.
The goal of this effort is to demonstrate the ability of RDF Gateway to efficiently store and query massive amounts of RDF data in its native RDF repository. The UniProt RDF project provides all UniProt protein sequence and annotation data in RDF format and is an excellent large source of data.
The Bio2RDF project is a tool to convert bioinformatics data and knowledge bases to RDF format. It is a kind of generalized rdfizer for bioinformatics applications, and it is a place for the semantic web life science community to develop and grow.
BioPortal SPARQL is a service to query biomedical ontologies using the SPARQL standard. Ontologies have been transformed into RDF triples from their original formats (OWL, OBO, UMLS/RRF, ...) and asserted into a triple store. This service provides programmatic access to that triple store.
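As a rough illustration of what programmatic access to a SPARQL endpoint can look like, the sketch below builds (but does not send) an HTTP GET request in the style of the SPARQL Protocol, where the query text travels in a `query` parameter. The endpoint URL, the query, and the function name `build_sparql_request` are placeholders chosen for this example, not details of the BioPortal service; a real endpoint may also require an API key or other parameters.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint URL, used only for illustration.
ENDPOINT = "https://example.org/sparql"

# A generic query: list a few resources together with their rdfs:label.
QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?cls ?label
WHERE { ?cls rdfs:label ?label }
LIMIT 5
"""


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a SPARQL Protocol GET request."""
    # The query string is URL-encoded into the "query" parameter.
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for SPARQL results in JSON; the server may offer other formats.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)
print(request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; that step is left out here because the endpoint above is fictitious.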
Chempedia is a free service for uniquely identifying and naming chemical substances. If you or your organization work with chemical substances and would like a convenient way to keep track of them in spreadsheets, wikis, web pages and other databases, Chempedia can help. If you just have a substance name, you can use Chempedia to find what's known about it.
Community Created and Reviewed
Chempedia is created and maintained by volunteers worldwide. We take quality very seriously. That's why all content is subjected to a streamlined form of peer-review that borrows from the best practices of modern social media.
Chempedia is as much about the people using it as the data it contains. Interested in knowing who submitted or named a substance? We make it easy to find other chemists likely to share your interests.
Free to Use and Adapt
We think information created by the community belongs to the community. That's why all information contained in Chempedia is free to download, use, and adapt.
BioPAX is a collaborative effort to create a data exchange format for biological pathway data.
BioPAX Level 3 covers metabolic pathways, molecular interactions, signaling pathways (including molecular states and generics), gene regulation and genetic interactions. BioPAX Level 3 is currently under development and review by pathway databases and is scheduled for release by mid-2008.
The ChEMBL team's research focuses on mapping the interactions and functional effects of small molecules binding to their macromolecular targets.
The group studies the interactions of pharmacologically active molecules and their receptors. In particular the group builds and maintains a series of drug discovery databases that are components of ChEMBL.
This dataset was generated by Venkat Chintapalli, Jing Wang & Julian Dow at the University of Glasgow with funding from the UK's BBSRC.
It gives you a quick answer to the question: where is my gene of interest expressed/enriched in the adult fly?
For each gene & tissue, you're given the mRNA SIGNAL (how abundant the mRNA is), the mRNA ENRICHMENT (compared to whole flies), and the Affymetrix PRESENT CALL (out of 4 arrays, how many times it was detectably expressed).
Entrez Gene (www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene) is NCBI's database for gene-specific information. It does not include all known or predicted genes; instead Entrez Gene focuses on the genomes that have been completely sequenced, that have an active research community to contribute gene-specific information, or that are scheduled for intense sequence analysis. The content of Entrez Gene represents the result of curation and automated integration of data from NCBI's Reference Sequence project (RefSeq), from collaborating model organism databases, and from many other databases available from NCBI. Records are assigned unique, stable and tracked integers as identifiers. The content (nomenclature, map location, gene products and their attributes, markers, phenotypes, and links to citations, sequences, variation details, maps, expression, homologs, protein domains and external databases) is updated as new information becomes available. Entrez Gene is a step forward from NCBI's LocusLink, with both a major increase in taxonomic scope and improved access through the many tools associated with NCBI Entrez.
LinkedCT is a Linked Data source of clinical trials available at ClinicalTrials.gov, a registry of federally and privately supported clinical trials conducted in the United States and around the world.
ClinicalTrials.gov gives you information about a trial's purpose, who may participate, locations, and phone numbers for more details. This information should be used in conjunction with advice from health care professionals.
The LinkedCT data space is published according to the principles of publishing Linked Data. Each entity in LinkedCT is identified by a unique HTTP dereferenceable Uniform Resource Identifier (URI). When the URI is looked up, related RDF statements about the entity are returned in HTML or RDF/XML, depending on the user agent. Moreover, a SPARQL endpoint is provided as the standard access method for RDF data.
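Linked Data servers commonly choose between HTML and RDF representations through HTTP content negotiation, that is, by inspecting the `Accept` header the client sends. The sketch below assumes LinkedCT works this way, which the description above does not state explicitly. The entity URI and the helper name `build_lookup` are hypothetical, and the requests are only constructed, not sent.

```python
from urllib.request import Request

# Hypothetical entity URI; substitute a real resource URI before sending.
ENTITY_URI = "https://example.org/resource/trials/EXAMPLE"


def build_lookup(uri: str, want_rdf: bool) -> Request:
    """Build (but do not send) a GET request for a resource URI.

    The Accept header states which representation the client prefers.
    """
    accept = "application/rdf+xml" if want_rdf else "text/html"
    return Request(uri, headers={"Accept": accept})


# Same URI, two different preferred representations.
rdf_request = build_lookup(ENTITY_URI, want_rdf=True)
html_request = build_lookup(ENTITY_URI, want_rdf=False)
```

A browser would typically send the `text/html` preference, while an RDF-aware client would ask for `application/rdf+xml`.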
The DrugBank database, available at the University of Alberta, is a bioinformatics and cheminformatics resource that combines detailed drug (i.e., chemical, pharmacological and pharmaceutical) data with comprehensive drug target (i.e., sequence, structure, pathway) information. The database contains nearly 4800 drug entries including:
More than 2500 protein (i.e., drug target, non-redundant) sequences are linked to these drug entries.
Each DrugCard entry contains 148 data fields with half of the information being devoted to drug/chemical data and the other half devoted to drug target or protein data.
It is maintained by David Wishart and Craig Knox.
Users may query DrugBank in a number of ways:
The Gene Ontology, or GO, is a major bioinformatics initiative to unify the representation of gene and gene product attributes across all species. More specifically, the project aims to:
The GO is part of a larger classification effort, the Open Biomedical Ontologies (OBO).
There is no universal standard terminology in biology and related domains, and term usages may be specific to a species, research area or even a particular research group. This makes communication and sharing of data more difficult. The Gene Ontology project provides an ontology of defined terms representing gene product properties. The ontology covers three domains:
Each GO term within the ontology has a term name, which may be a word or string of words; a unique alphanumeric identifier; a definition with cited sources; and a namespace indicating the domain to which it belongs. Terms may also have synonyms, which are classed as being exactly equivalent to the term name, broader, narrower, or related; references to equivalent concepts in other databases; and comments on term meaning or usage. The GO ontology is structured as a directed acyclic graph, and each term has defined relationships to one or more other
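The term structure described above can be sketched as a small data structure. This is a simplified illustration, not the GO project's own data model: the class name `GOTerm`, the `ancestors` helper, and the `EX:` identifiers are invented for the example, and all relationship types are collapsed into a single list of parent identifiers.

```python
from dataclasses import dataclass, field


@dataclass
class GOTerm:
    term_id: str       # unique alphanumeric identifier
    name: str          # term name
    namespace: str     # domain the term belongs to
    definition: str = ""
    synonyms: list[str] = field(default_factory=list)
    parents: list[str] = field(default_factory=list)  # ids of related parent terms


def ancestors(term_id: str, terms: dict[str, GOTerm]) -> set[str]:
    """Collect every term reachable by following parent links in the DAG."""
    seen: set[str] = set()
    stack = list(terms[term_id].parents)
    while stack:
        current = stack.pop()
        if current not in seen:  # a term can be reached along several paths
            seen.add(current)
            stack.extend(terms[current].parents)
    return seen


# Toy graph with made-up identifiers: EX:4 has two parents that share a root.
toy_terms = {
    "EX:1": GOTerm("EX:1", "root term", "example_namespace"),
    "EX:2": GOTerm("EX:2", "child A", "example_namespace", parents=["EX:1"]),
    "EX:3": GOTerm("EX:3", "child B", "example_namespace", parents=["EX:1"]),
    "EX:4": GOTerm("EX:4", "grandchild", "example_namespace", parents=["EX:2", "EX:3"]),
}
```

Because a term may have more than one parent, the structure is a graph rather than a tree, which is why the traversal keeps a `seen` set.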
The Berkeley Drosophila Genome Project (BDGP) is a consortium of the Drosophila Genome Center (funded by the National Human Genome Research Institute, National Cancer Institute, and the Department of Energy), and the Howard Hughes Medical Institute (through its support of work in the Gerald Rubin, Allan Spradling, Roger Hoskins, Hugo Bellen, Susan Celniker, and Gary Karpen laboratories). The Drosophila Heterochromatin Genome Project (DHGP) is part of the Drosophila Genome Center (DGC) located at the Lawrence Berkeley National Laboratory in Berkeley, CA. The DHGP is also funded by the National Human Genome Research Institute.
PubChem provides information on the biological activities of small molecules. It is a component of NIH's Molecular Libraries Roadmap Initiative. If you would like to learn more about how to use the PubChem resources, please go to our help page.
DailyMed provides high quality information about marketed drugs. This information includes FDA labels (package inserts). This Web site provides health information providers and the public with a standard, comprehensive, up-to-date, look-up and download resource of medication content and labeling as found in medication package inserts. The National Library of Medicine (NLM) provides this as a public service and does not accept advertisements.
STITCH is a resource to explore known and predicted interactions of chemicals and proteins. Chemicals are linked to other chemicals and proteins by evidence derived from experiments, databases and the literature.
STITCH contains interactions for over 74,000 small molecules and over 2.5 million proteins in 630 organisms.