Our research
Check the list of our publications
SeQuiLa: an elastic, fast and scalable SQL-oriented solution for processing and querying genomic intervals
Abstract:
Efficient processing of large-scale genomic datasets has recently become possible due to the application of ‘big data’ technologies in bioinformatics pipelines. We present SeQuiLa, a distributed, ANSI SQL-compliant solution for speedy querying and processing of genomic intervals that is available as an Apache Spark package. The proposed range join strategy is significantly (∼22×) faster than the default Apache Spark implementation and outperforms other state-of-the-art tools for genomic interval processing. The project is available at http://biodatageeks.org/sequila/. Supplementary data are available at Bioinformatics online.
BibTeX:
{%raw%}@article{10.1093/bioinformatics/bty940,
author = {Szmurło, Agnieszka and Wiewiórka, Marek and Gambin, Tomasz and Leśniewska, Anna and Stępień, Kacper and Borowiak, Mateusz and Okoniewski, Michał},
title = {SeQuiLa: an elastic, fast and scalable SQL-oriented solution for processing and querying genomic intervals},
year = {2018},
month = nov,
doi = {10.1093/bioinformatics/bty940},
url = {https://doi.org/10.1093/bioinformatics/bty940},
eprint = {http://oup.prod.sis.lan/bioinformatics/advance-article-pdf/doi/10.1093/bioinformatics/bty940/27037165/bty940.pdf},
public = {yes}
}
{%endraw%}