Genomic Platform - iGAP
At Zettagene we believe that genomic data analysis should be straightforward and efficient. Our users should not have to worry about security, performance, data storage or the integrity of their computing environment.
After several research projects in the field of genomic data, we developed a solution that addresses the most important challenges Genomics poses to Computer Science. It uses the latest advances in Big Data technologies and cloud computing, and it can be deployed either on premises or in the cloud.
Our design principles
We use standard, well-known pipelines for the automated processing of genomic data, adapted to a distributed architecture for better performance and easier maintenance.
We optimize query algorithms for the specific characteristics of genomic data to allow efficient data access. iGAP is designed to store a large number of WES/WGS genomic datasets and to support efficient on-demand reprocessing. It can be extended to store proteomics, metabolomics and transcriptomics datasets.
Open Source technologies
iGAP is built on well-established Big Data technologies, combined into an open but consistent architecture. We leverage the power of open source tools to meet enterprise IT requirements.
Common data model
Genomics cannot be only about files. We use a defined data model to integrate data coming from different sources and pipelines, so you can work with all of it in one unified data environment.
Data Security & Governance
All-or-nothing data access is not enough! iGAP provides enterprise-ready data access mechanisms that restrict access at different levels of granularity, down to a single variant. All activities on the data can be tracked for audit purposes to support GDPR compliance.
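To illustrate what variant-level access with an audit trail can look like, here is a minimal sketch. It is not iGAP's implementation: the `Variant` and `AccessPolicy` classes, the role-to-gene rule and the audit record fields are all hypothetical, and the example records are made up. A real deployment would enforce such rules in the storage or query layer rather than in application code.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Variant:
    # Hypothetical minimal variant record.
    sample_id: str
    chrom: str
    pos: int
    ref: str
    alt: str
    gene: str


@dataclass
class AccessPolicy:
    # Hypothetical rule set: the genes whose variants each role may see.
    allowed_genes: dict[str, set[str]]
    audit_log: list[dict] = field(default_factory=list)

    def filter_variants(self, user: str, role: str, variants: list[Variant]) -> list[Variant]:
        # Unknown roles see nothing (deny by default).
        allowed = self.allowed_genes.get(role, set())
        visible = [v for v in variants if v.gene in allowed]
        # Record who asked and how much was returned, for later audit.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "role": role,
            "requested": len(variants),
            "returned": len(visible),
        })
        return visible


# Usage with made-up records.
variants = [
    Variant("S1", "chr17", 43044295, "A", "G", "BRCA1"),
    Variant("S1", "chr7", 117559590, "C", "T", "CFTR"),
]
policy = AccessPolicy(allowed_genes={"oncology": {"BRCA1", "BRCA2"}})
visible = policy.filter_variants("alice", "oncology", variants)
print([v.gene for v in visible])  # ['BRCA1']
```

Denying by default keeps a missing rule from silently exposing data, and logging every request, including those that return nothing, is what makes the access history auditable.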
Flexible Data Access / Analytics ready
Analysis of patient data is the most crucial part of the diagnostic process. It can require ad-hoc access to specific data, comparison of a sample with a larger database, or unsupervised searches that combine sets of variants, sets of genomic intervals and phenotypic information. We provide a JupyterLab interface and the ability to perform your own analysis with the external tools of your choice.
Automated Secondary and Tertiary analysis
iGAP uses standard bioinformatics pipelines for processing NGS data (e.g. bcbio), adapted to run in a highly distributed environment built on top of Apache Spark. This approach leads to significant performance gains, which we have described in one of our research papers. All processing workflows are automated, and task execution can be monitored from a single place. The open architecture of iGAP allows thorough customization of tasks.
Optimized genomic operations
Regular databases and query algorithms were not designed to handle genomic data, especially at scale. iGAP includes our analytics engine, SeQuiLa, which provides an efficient way to query and process genomic intervals. This makes it possible to perform depth-of-coverage analysis and quality control checks on large numbers of samples.
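The two operations named above, overlapping genomic intervals and computing depth of coverage, can be illustrated on a single machine. The sketch below is plain Python, not SeQuiLa's API or algorithm; the function names `overlap_join` and `depth_of_coverage` are hypothetical, and intervals are assumed to be `(contig, start, end)` tuples with 0-based, half-open coordinates.

```python
from bisect import bisect_left
from collections import defaultdict
from itertools import accumulate

Interval = tuple[str, int, int]  # (contig, start, end), 0-based half-open


def overlap_join(reads: list[Interval], targets: list[Interval]) -> list[tuple[Interval, Interval]]:
    """Return (read, target) pairs that overlap on the same contig."""
    # Group targets by contig and sort each group by start position.
    by_contig: dict[str, list[Interval]] = defaultdict(list)
    for t in targets:
        by_contig[t[0]].append(t)
    index = {}
    for contig, ts in by_contig.items():
        ts.sort(key=lambda t: t[1])
        index[contig] = ([t[1] for t in ts], ts)

    pairs = []
    for read in reads:
        contig, start, end = read
        if contig not in index:
            continue
        starts, ts = index[contig]
        # Binary search: targets from position hi onward start at or after
        # the read's end, so they cannot overlap it.
        hi = bisect_left(starts, end)
        for t in ts[:hi]:
            if t[2] > start:  # target ends after the read starts
                pairs.append((read, t))
    return pairs


def depth_of_coverage(reads: list[Interval], contig: str, length: int) -> list[int]:
    """Per-base read depth on one contig, computed with a difference array."""
    diff = [0] * (length + 1)
    for c, start, end in reads:
        if c != contig:
            continue
        start, end = max(start, 0), min(end, length)
        if start < end:
            diff[start] += 1  # a read begins covering here
            diff[end] -= 1    # and stops covering here
    # The running sum of the difference array is the depth at each base.
    return list(accumulate(diff[:length]))


reads = [("chr1", 0, 5), ("chr1", 3, 8), ("chr2", 1, 4)]
targets = [("chr1", 4, 6), ("chr1", 10, 12), ("chr2", 0, 2)]
pairs = overlap_join(reads, targets)
depth = depth_of_coverage(reads, "chr1", 10)
print(depth)  # [1, 1, 1, 2, 2, 1, 1, 1, 0, 0]
```

A naive join compares every read with every target. Sorting targets and using binary search skips targets that start too late, and the difference array turns coverage into one linear pass. At scale, a distributed engine has to apply similar ideas across partitions of the data.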
Secured by design
Genomic data is arguably the most personal information there is. iGAP brings enterprise-level security to research platforms: data can be secured down to a single variant, with complex rules on who can access this information and how. With iGAP you can monitor and manage comprehensive data security across the whole platform. Data Governance features allow you to monitor how and where the data is being used, in order to meet compliance requirements. With advanced metadata management and governance capabilities, you can classify and govern your data and support collaboration around it among data scientists, analysts, clinicians and the data governance team.