In some cases, such as gene annotation, the presence of a gene model is identified simply by the size of the sequence. For other purposes, such as gene family detection or multi-gene analysis, compression may be required. To our knowledge, no tool is available to compress a whole genome sequence as is commonly done at the level of coding sequences. To fill this gap, we wrote compress, which compresses the whole human genome at the level of coding sequences.
Here we report several uses of compress: generating annotations, clustering, and building phylogenetic trees from coding sequences.