Data & Tools

SPOKE is populated with several openly available data sources and tools, and efforts to incorporate more information are ongoing.


SPOKE Nodes & Edges

See current list of Nodes & Edges

SPOKE (Scalable Precision Medicine Open Knowledge Engine) is a very large network containing multiple types of biological data. Pooling such diverse data into a single knowledge environment makes it possible to identify new connections, with implications for biomedical applications such as personalized medicine, for example suggesting which drugs may be effective for a specific patient. An earlier version of the network was used to suggest new uses for existing drugs (Himmelstein, 2017).

SPOKE is a heterogeneous network, meaning that different nodes (points) within the network can represent different types of data. The edges between pairs of nodes represent known connections. Paths that follow a series of edges may connect nodes not previously known to be related.
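To make the idea of a heterogeneous network more concrete, here is a minimal Python sketch on a hypothetical toy graph. The node types (Compound, Gene, Disease), edge types (BINDS, INTERACTS_WITH, ASSOCIATES), node names, and helper functions are all illustrative assumptions, not SPOKE's actual schema or query interface. A breadth-first search enumerates paths, showing how a compound and a disease with no direct edge between them can still be linked through a series of edges.

```python
from collections import deque

# Hypothetical toy data: node and edge type names are illustrative only,
# not SPOKE's actual schema. Each node maps to its type.
nodes = {
    "compound_A": "Compound",
    "gene_X": "Gene",
    "gene_Z": "Gene",
    "disease_Y": "Disease",
}

# Each edge is (source, edge_type, target). Note: no direct compound-disease edge.
edges = [
    ("compound_A", "BINDS", "gene_X"),
    ("gene_X", "INTERACTS_WITH", "gene_Z"),
    ("disease_Y", "ASSOCIATES", "gene_Z"),
]


def build_adjacency(edge_list):
    """Map each node to its (edge_type, neighbor) pairs, treating edges as undirected."""
    adjacency = {}
    for source, edge_type, target in edge_list:
        adjacency.setdefault(source, []).append((edge_type, target))
        adjacency.setdefault(target, []).append((edge_type, source))
    return adjacency


def find_paths(adjacency, start, goal, max_edges=3):
    """Breadth-first enumeration of simple paths (no repeated nodes) from start to goal."""
    results = []
    queue = deque([([start], [])])  # (nodes in order, edge types traversed)
    while queue:
        path_nodes, path_edges = queue.popleft()
        current = path_nodes[-1]
        if current == goal:
            results.append((path_nodes, path_edges))
            continue
        if len(path_edges) == max_edges:
            continue  # do not extend paths beyond the length limit
        for edge_type, neighbor in adjacency.get(current, []):
            if neighbor not in path_nodes:
                queue.append((path_nodes + [neighbor], path_edges + [edge_type]))
    return results


def format_path(path_nodes, path_edges):
    """Render a path as 'node (Type) -EDGE- node (Type) ...'."""
    parts = [f"{path_nodes[0]} ({nodes[path_nodes[0]]})"]
    for edge_type, node in zip(path_edges, path_nodes[1:]):
        parts.append(f"-{edge_type}- {node} ({nodes[node]})")
    return " ".join(parts)


if __name__ == "__main__":
    adjacency = build_adjacency(edges)
    for path_nodes, path_edges in find_paths(adjacency, "compound_A", "disease_Y"):
        print(format_path(path_nodes, path_edges))
```

Run as a script, this prints the single three-edge path linking `compound_A` to `disease_Y` through two genes. In-memory enumeration like this is only practical for small graphs; a network the size of SPOKE would typically be queried through a graph database instead.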

Examples of SPOKE resources

See Full List of Resources 

Bgee is a database to retrieve and compare gene expression patterns in multiple animal species, produced from multiple data types (RNA-Seq, Affymetrix, in situ hybridization, and EST data).

BindingDB is a public, web-accessible database of measured binding affinities, focusing chiefly on the interactions of proteins considered to be candidate drug-targets with ligands that are small, drug-like molecules.


ChEMBL is a manually curated chemical database of bioactive molecules with drug-like properties.

Catalog Of Somatic Mutations In Cancer (COSMIC)
COSMIC is a catalog of somatic mutations in human cancer.


CIViC is an open access, open source, community-driven web resource for Clinical Interpretation of Variants in Cancer. 

ClinicalTrials is a database of privately and publicly funded clinical studies conducted around the world.

Disease Ontology

The Disease Ontology semantically integrates disease and medical vocabularies through extensive cross mapping of DO terms to MeSH, ICD, NCI’s thesaurus, SNOMED and OMIM.


DISEASES is a weekly updated web resource that integrates evidence on disease-gene associations from automatic text mining, manually curated literature, cancer mutation data, and genome-wide association studies. 


DisGeNET is a discovery platform containing one of the largest publicly available collections of genes and variants associated with human diseases.

The DistiLD database allows users to query and visualize disease-associated SNPs and genes in their chromosomal context.


Disease Ontology Annotation Framework (DOAF) comprises a collection of disease-gene mappings between Disease Ontology terms and genes.

DrugBank 4.2

The DrugBank database is a unique bioinformatics and cheminformatics resource that combines detailed drug data with comprehensive drug target information.


Drug efficacy targets, indications, and pharmacologic class.

Entrez Gene

The NCBI website hosts databases of molecular data such as nucleotide sequences (GenBank), protein sequences, macromolecular structures, molecular variation, gene expression, and mapping data.

Evolutionary Rate Covariation

ERC measures correlated rates of evolution across a phylogeny, allowing the identification of genes with similar evolutionary histories.

The biggest and most comprehensive database of food constituents, chemistry, and biology.

Gene Ontology

The Gene Ontology (GO) project is a major bioinformatics initiative to develop a computational representation of our evolving knowledge of how genes encode biological functions at the molecular, cellular and tissue system levels.

Genetics Home Reference (GHR)
Genetics Home Reference provides information about the effects of genetic variation.

Genomics of Drug Sensitivity in Cancer
GDSC contains drug response data and genomic markers of sensitivity.

GWAS Catalog

Catalog of published genome-wide association studies.


This repository hosts data for the disease-associated genes project on

Human Interactome Database

A reference of binary protein-protein interactions generated by systematically interrogating all pairwise combinations of predicted gene products in defined search spaces using proteome-scale technologies.

Incomplete Interactome

iRefIndex provides an index of protein interactions available in a number of primary interaction databases including BIND, BioGRID, CORUM, DIP, HPRD, InnateDB, IntAct, MatrixDB, MINT, MPact, MPIDB and MPPI.


The LINCS L1000 dataset is a comprehensive resource for gene expression changes observed in human cell lines perturbed with small molecules and genetic constructs. The L1000 experiments systematically measure the changes in gene expression after small molecule exposure, gene knockdown by RNAi, and gene overexpression.


Medical Subject Headings (MeSH) is the NLM's curated medical vocabulary resource, providing a hierarchically organized terminology for indexing and cataloging biomedical information in MEDLINE/PubMed and other NLM databases.

Online Mendelian Inheritance in Man
OMIM is a catalog of human genes and genetic disorders.

Pathway Commons 
Pathway Commons integrates a number of pathway and molecular interaction databases supporting the BioPAX and PSI-MI formats into one large BioPAX model, which can be queried through the Pathway Commons web API.

Pathway Interaction Database

The Pathway Interaction Database is a highly structured, curated collection of information about known biomolecular interactions and key cellular processes assembled into signaling pathways.

PharmacotherapyDB is a catalog of medical indications between small molecule compounds and complex human diseases.


REACTOME is an open-source, open access, manually curated and peer-reviewed pathway database.


SIDER contains information on marketed medicines and their recorded adverse drug reactions. The information is extracted from public documents and package inserts. The available information includes side effect frequency, drug and side effect classifications, and links to further information, for example drug–target relations.


STAR (Search, Tag, Analyze & Resource) provides a search engine across samples, experiments, and attributes from GEO.

STRING is a database of known and predicted protein-protein interaction networks that also supports functional enrichment analysis.


TISSUES is a weekly updated web resource that integrates evidence on tissue expression from manually curated literature, proteomics and transcriptomics screens, and automatic text mining. 


Uberon is an integrated cross-species ontology covering anatomical structures in animals.


UniProt is a comprehensive, high-quality and freely accessible resource of protein sequence and functional information.


WikiPathways is a database of biological pathways maintained by and for the scientific community.