KeBABS - An R Package for Kernel-Based Analysis of Biological Sequences

The kebabs package provides functionality for kernel-based analysis of biological sequences via Support Vector Machine (SVM) based methods. Biological sequences include DNA, RNA, and amino acid (AA) sequences. Sequence kernels define similarity measures between sequences. The package provides flexible and efficient implementations of several of the most important kernels for sequence analysis. It also extends the standard position-independent functionality of these kernels so that the position of patterns in the sequences can be taken into account in the similarity measure.
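
To make the idea of a sequence kernel concrete, here is a minimal base-R sketch of a spectrum-style similarity (a dot product of k-mer counts) next to a simplified position-aware variant. The function names kmerCounts, spectrumSim, and positionSim are hypothetical helpers written for this illustration. They are not part of kebabs and do not reflect how the package implements its kernels.

```r
# Illustrative helpers only; these are NOT kebabs functions.

# Count all overlapping k-mers of a single sequence given as a string
kmerCounts <- function(s, k) {
  stopifnot(nchar(s) >= k)
  starts <- seq_len(nchar(s) - k + 1)
  table(substring(s, starts, starts + k - 1))
}

# Position-independent similarity: dot product of k-mer count vectors,
# optionally normalized so that every sequence has similarity 1 with itself
spectrumSim <- function(x, y, k = 2, normalized = TRUE) {
  dot <- function(a, b) {
    common <- intersect(names(a), names(b))
    sum(as.numeric(a[common]) * as.numeric(b[common]))
  }
  cx <- kmerCounts(x, k)
  cy <- kmerCounts(y, k)
  val <- dot(cx, cy)
  if (normalized) val <- val / sqrt(dot(cx, cx) * dot(cy, cy))
  val
}

# Simplified position-aware similarity: number of positions at which
# both sequences carry the same k-mer
positionSim <- function(x, y, k = 2) {
  n <- min(nchar(x), nchar(y))
  stopifnot(n >= k)
  starts <- seq_len(n - k + 1)
  sum(substring(x, starts, starts + k - 1) ==
      substring(y, starts, starts + k - 1))
}

x <- "ACGTAC"
y <- "TACGTA"                    # same content as x, shifted by one position
simSpectrum <- spectrumSim(x, y, k = 2)   # 6/7, about 0.857
simPosition <- positionSim(x, y, k = 2)   # 0
```

The two sequences share almost all of their 2-mers, so the position-independent measure is high, while no 2-mer occurs at the same position in both, so the position-aware measure is zero. This is the distinction between position-independent and position-dependent kernels described above.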

Installation

The R package kebabs is available from Bioconductor. The first version of the package was released as part of Bioconductor 3.0 on October 14, 2014. The current release version is 1.6.0 (released on May 4, 2016, as part of Bioconductor 3.3). To install kebabs, follow the standard procedure for installing Bioconductor packages, i.e. enter the following into your R session:
source("http://www.bioconductor.org/biocLite.R")
biocLite("kebabs")
Please note that Bioconductor 3.3 requires R version 3.3.0.

The current development version of the package is 1.7.0.

Documentation

  1. User Manual: PDF
  2. Reference Manual: PDF

Getting started

  1. To load the package, enter "library(kebabs)" in your R session.
  2. To view the user manual, enter "vignette("kebabs")".
  3. To run a first example, enter "example(kebabs)".
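
As a slightly longer first session, the following sketch follows the pattern of the examples in the package vignette: it loads the TFBS example data shipped with kebabs (objects enhancerFB and yFB), trains an SVM with a spectrum kernel, and predicts the labels of held-out sequences. The choice of k = 2, the 70/30 split, the random seed, and cost = 10 are illustrative assumptions, not recommended settings.

```r
library(kebabs)

# Example data shipped with the package: DNA sequences and class labels
data(TFBS)

# Random 70/30 split into training and test indices (the seed is arbitrary)
set.seed(42)
numSamples <- length(enhancerFB)
train <- sample(1:numSamples, 0.7 * numSamples)
test <- c(1:numSamples)[-train]

# Position-independent spectrum kernel on 2-mers
specK2 <- spectrumKernel(k = 2)

# Train a C-SVM via the LiblineaR backend, then predict the held-out sequences
model <- kbsvm(x = enhancerFB[train], y = yFB[train], kernel = specK2,
               pkg = "LiblineaR", svm = "C-svc", cost = 10)
pred <- predict(model, enhancerFB[test])

# Summarize prediction quality against the true labels
evaluatePrediction(pred, yFB[test], allLabels = unique(yFB))
```

In practice the kernel parameters and the cost value would be chosen by cross validation or grid search rather than fixed by hand; see the user manual for details.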

Citing this package

If you use this package for research that is published later, you are kindly asked to cite it as follows:

J. Palme, S. Hochreiter, and U. Bodenhofer (2015). KeBABS: an R package for kernel-based analysis of biological sequences. Bioinformatics 31(15):2574-2576. DOI: 10.1093/bioinformatics/btv176.

R source code for the example on epitope-to-MHC binding: A0201-Example.zip (3.7 KB; see file README.txt for more information)

Change log

Version 1.6.0:
release as part of Bioconductor 3.3
Version 1.5.4:
  • importing apcluster package for avoiding method clashes
  • improved and completed change history in inst/NEWS and package vignette
Version 1.5.3:
  • correction in prediction via feature weights for very large sparse explicit representation
  • adaption of vignette template
  • vignette engine changed from Sweave to knitr
Version 1.5.2:
  • correction in distance weights for mixed distance weighted spectrum and gappy pair kernel
  • allow featureWeights as numeric vector for method getPredictionProfile
  • correction for plot of single prediction profile without legend
  • change of copyright note
  • namespace fixes
Version 1.5.1:
  • new method to compute prediction profiles from models trained with mixture kernels
  • correction for position specific kernel with offsets
  • corrections for prediction profile of motif kernel
  • additional hint on help page of kbsvm
Version 1.5.0:
devel branch created from version 1.4.0
Version 1.4.0:
release as part of Bioconductor 3.2
Version 1.3.4:
  • correction of Ubuntu problem with realloc for 0 elements in linearKernel generating a sparse empty kernel matrix
  • correction of problem with feature weights and prediction profiles for position specific gappy pair kernel
  • correction of problem with feature weights and prediction profiles for position specific motif kernel
  • corrections for feature weights, prediction via feature weights and prediction profile for distance weighted kernels
  • update of KeBABS citation
Version 1.3.3:
  • new export kebabsCollectInfo for collection of package info
  • update of version dependency to Biostrings, XVector, S4Vectors
  • correction for leading + or - in factor label
  • change of bibtex style sheet in vignette to plainnat.bst
Version 1.3.2:
  • correction of error in kernel lists
  • user defined sequence kernel example SpectrumKernlabKernel moved to separate directory
Version 1.3.1:
  • correction of error in model selection for processing via dense LIBSVM
  • remove problem in check for loading of SparseM
Version 1.3.0:
devel branch created from version 1.2.0
Version 1.2.0:
release as part of Bioconductor 3.1
Version 1.1.9:
  • inclusion of dense LIBSVM 3.20 for dense kernel matrix support to provide a reliable way for training with kernel matrices
  • new accessors folds and performance for CrossValidationResult
  • removed fold performance from show of CV result
  • adaptions for user defined sequence kernel with new export isUserDefined, example in inst/examples/UserDefinedKernel
  • correction of errors with position offset for position specific kernels
  • computation of AUC via trapezoidal rule
  • changes for auto mode in CV, grid search, model selection
  • check for non-negative mixing coefficients in spectrum and gappy pair kernel
  • build warnings on Windows removed
  • added definition of performance parameters for binary and multiclass classification to vignette
  • update of citation file and reference section in help pages
Version 1.1.8:
  • new accessors selGridRow, selGridCol and fullModel for class ModelSelectionResult
  • change of naming of feature weights because of change in LiblineaR 1.94-2
  • GCC warnings in Linux removed
Version 1.1.7:
  • adaptation to LiblineaR upgrade to LIBLINEAR 1.94: the parameter labels of function LiblineaR was renamed to target
  • correction in model selection for performance parameters
  • error correction of vector length overflow in sparse explicit representation for very large number of sequences in spectrum, gappy pair and motif kernel
  • error correction for AUC in cross validation
  • minor changes in help pages
  • minor changes in vignette
Version 1.1.6:
  • error correction for training with position specific kernel and computation of feature weights
  • error correction in coercion of kernel to character for distance weighting
  • error correction in spectrum, gappy pair and motif kernel for kernel matrix - last feature was missing in kernel value in rare situations
  • correction of Windows build problem in linearKernel
  • build warnings on Windows removed
  • minor changes in help pages
  • minor changes in vignette
Version 1.1.5:
  • new method heatmap to display heatmap of prediction profiles
  • extension of function linearKernel to optionally return a sparse kernel matrix
  • correction of computation of feature weights for LiblineaR with more than 3 classes
  • new accessor SVindex for class KBModel
  • correction in subsetting of sparse explicit representation for head / tail
  • error correction in subsetting of prediction profile
  • error correction in mismatch kernel
  • check uniqueness of motifs in motif kernel
  • minor changes in help pages
  • change name of vignette Rnw to lowercase
  • minor changes in vignette
Version 1.1.4:
  • added two help pages
Version 1.1.3:
  • fix to adapt for changed Biostrings/S4Vectors API
Version 1.1.2:
  • minor C code changes for mismatch kernel
  • correction of MCC
  • new class ROCData and new function computeROCandAUC for binary classification added
  • new plot function for ROCData to plot ROC for binary classes
  • AUC as additional performance parameter in cross validation and as performance objective in grid search
Version 1.1.1:
  • correction for cross validation with factor label
  • correction for storing prob model in kebabs model for kernlab
  • removal of clang warnings for unused functions
Version 1.1.0:
devel branch created from version 1.0.0
Version 1.0.0:
first official release as part of Bioconductor 3.0

Contact

For suggestions, bug reports, and other matters regarding the package, please contact kebabs@bioinf.jku.at.