January 19, 2021 / by Rafał Małanij
Our team is heavily involved in academia, teaching Big Data and Bioinformatics courses. This year we also took part in the “Omics Data Science - Bioinformatics and Large Scale Medical Data Analysis” course, which teaches non-geneticists how to analyse genomics data. It is organised by the “Institute of Mother and Child” together with “ICM UW”.
Just like a year ago, it was a set of 2-day sessions covering computational methods for analysing omics data. We delivered a workshop on genomic pipelines and data analysis. This time we switched from an on-premise Hadoop cluster and a Jupyter-based analytics workbench to a fully cloud-native environment built on Google Cloud. Our dockerised pipelines ran on Google Kubernetes Engine. Every student had access to a Jupyter-based analyst workbench and could work on files provided through object storage. There was also a chance to try out our SeQuiLa tools.
Redesigning the whole platform to be served in the cloud on managed Kubernetes was a bit of a challenge. It turned out that not all tools widely used in Bioinformatics and Genomics are ready for cloud-native architectures. However, we now have end-to-end pipelines with an analytics layer available in the cloud that we can use for future workshops. If you are organising workshops around omics, let us know. I am sure your students will be extremely interested in learning how to analyse thousands of genomes in a “Data Science” way.