December 10, 2018 / by Agnieszka Szmurło
Recently we have been working on improving the performance of existing CNV callers through proper selection of the reference sample set. Our aim was to choose the samples most similar to the one under investigation, and we accomplished this with a clustering-based approach.
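The post does not say which features or distance were used, so the following is a sketch of what "most similar samples" could mean, based on nearest-neighbour (kNN) reference selection in Python. The assumptions are as follows. Each sample is represented by its read counts over the same target regions. Profiles are normalized by total count to remove sequencing-depth differences. Similarity is measured by Euclidean distance. The function names and toy counts are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def normalize(profile: Sequence[float]) -> List[float]:
    """Scale a per-region read-count profile to sum to 1 (assumed depth normalization)."""
    total = sum(profile)
    return [x / total for x in profile]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two profiles (assumed similarity measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_reference(target: str, coverage: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Hypothetical helper: return the k samples whose normalized profiles are closest to the target's."""
    t = normalize(coverage[target])
    scored = []
    for name, profile in coverage.items():
        if name == target:
            continue  # a sample never serves as its own reference
        scored.append((euclidean(t, normalize(profile)), name))
    scored.sort()  # ascending distance; ties broken by sample name
    return [name for _, name in scored[:k]]


# Hypothetical read counts over four target regions for five samples.
coverage = {
    "S1": [100, 200, 100, 50],
    "S2": [110, 190, 105, 55],
    "S3": [50, 100, 50, 25],  # same shape as S1 at half the depth
    "S4": [200, 50, 200, 100],
    "S5": [190, 60, 210, 95],
}
print(knn_reference("S1", coverage, k=2))  # → ['S3', 'S2']
```

S3 ranks first even though its raw counts are half of S1's. Normalizing for depth makes the comparison depend on the shape of the coverage profile rather than on how deeply each sample was sequenced.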
We evaluated both the kNN and the k-means methods. The results show that both improve the performance of CNV callers compared with using the whole sample set as the reference. However, k-means has much lower computational complexity, and we suggest it as the preferred way of preprocessing reference data.
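Below is a k-means sketch in Python. The same assumptions apply: depth-normalized coverage profiles, Euclidean distance, and hypothetical names and data. All samples are clustered once, and each sample's reference set is taken to be the other members of its cluster. Lloyd's algorithm is written out in plain Python. The deterministic farthest-first initialization was chosen for reproducibility and does not come from the post.

```python
from typing import Dict, List, Sequence


def normalize(profile: Sequence[float]) -> List[float]:
    """Scale a per-region read-count profile to sum to 1 (assumed depth normalization)."""
    total = sum(profile)
    return [x / total for x in profile]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance, the quantity k-means minimizes."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[List[float]], k: int, iterations: int = 100) -> List[int]:
    """Lloyd's algorithm; returns a cluster label for every point."""
    # Farthest-first seeding: start from the first point, then repeatedly add
    # the point farthest from all centroids chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    labels = [-1] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def kmeans_references(coverage: Dict[str, Sequence[float]], k: int) -> Dict[str, List[str]]:
    """Hypothetical helper: cluster once, then use each sample's cluster-mates as its reference."""
    names = sorted(coverage)
    labels = kmeans([normalize(coverage[n]) for n in names], k)
    return {
        name: [other for other, lab in zip(names, labels) if lab == label and other != name]
        for name, label in zip(names, labels)
    }


# Hypothetical read counts over four target regions for five samples.
coverage = {
    "S1": [100, 200, 100, 50],
    "S2": [110, 190, 105, 55],
    "S3": [50, 100, 50, 25],
    "S4": [200, 50, 200, 100],
    "S5": [190, 60, 210, 95],
}
print(kmeans_references(coverage, k=2))
```

The operation counts suggest one plausible source of the cost difference the post reports. Choosing references for all N samples with kNN compares every sample with every other one, which takes on the order of N² distance computations. k-means needs about N·K distance computations per iteration for K clusters. That is far fewer when K and the number of iterations are small relative to N.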
The method overview is shown below: