Geneweaver Data Model
GeneWeaver utilizes a relational normalized data model to store both user data, and external sources data. The database is designed to be flexible and extensible, and to allow for the addition of new data types and analysis tools without requiring changes to the data model.
On a high level the data model uses three schemas to organize the types of data that are stored in the database. The schemas are:
production
: Geneweaver Application Dataodestatic
: Static Dataextsrc
: External Sources Data
This page discusses the concepts and structure of the data model in detail, but is not intended to be used as a reference for the database and data model. For example, this page does not use the actual database table and column names, but instead uses full descriptive name of the entities and their relationships.
Tip
For a complete reference of the Geneweaver data model, see the data model reference page.
Production Schema
The production
schema is the primary schema used to store user data. The schema's
central entity is the geneset
🧬+📂. The schema contains tables & relationships for
user data, but external source and static data relationships utilize tables in the
odestatic
and extsrc
schemas.
erDiagram
GENESET }o--o| PUBLICATION : hasA
GENESET }o--|| USER : ownedBy
GENESET }o--o{ PROJECT: containedIn
ODEStatic Schema
The odestatic
schema contains tables for static data, such as species, gene databases,
and geneset tier. The schema is used to store data that is not expected to change, and
is used to provide a reference for the production
schema.
The following diagram shows how the geneset
🧬+📂 entity is related to the odestatic
schema entities: species
and tier
.
erDiagram
SPECIES }o--|| GENE_DB : usedBy
SPECIES ||--o{ GENESET : usedBy
GENESET }o--|| TIER : isOfA
The odestatic
schema also contains tables that are used for internal tracking
and configuration. Above, the gene_db
entity for the platform
, tool
, and
attribution
entities. These entities are used internally by the system to track
information about enabled analysis tools, microarray expression platforms, and data
sources.
erDiagram
PLATFORM
TOOL
ATTRIBUTION
Extsrc Schema
The extsrc
schema contains tables for external sources data, this is where the
magic 🪄 happens.
Fundamentally, the gene 🧬 to geneset 🧬+📂 association is a many-to-many association.
A geneset can contain many genes, and a gene can be associated with many genesets. To
represent this relationship, the association is stored in an
associative table, which we call
geneset_value
.
The gene
entity is a
polymorphic entity that
can be associated with multiple external sources, which are represented by the gene_db
entity.
The following diagram shows how the geneset
🧬+📂 entity is related to the extsrc
schema entities: geneset_value
, gene
, and gene_db
.
erDiagram
GENESET_VALUE }o--|| GENESET : isOfA
GENESET_VALUE }o--|| GENE : isOfA
GENE ||--o{ GENE_DB : isOfA
Microarray Expression Data
Geneweaver also supports microarray expression data. Due to its complexity, this document does not cover the data model that supports this feature.
For more information on microarray expression data, see the Data Model reference page.