Application of high-throughput sequencing of T and B lymphocyte antigen receptors has great potential for improving the monitoring of lymphoid malignancies, assessing immune reconstitution after hematopoietic stem cell transplantation, and characterizing the composition of lymphocyte repertoires (Warren, E. H. et al. Blood 2013;122:19–22). LymhoSeq is an R package designed to import, analyze, and visualize antigen receptor sequencing from Adaptive Biotechnologies’ ImmunoSEQ assay. The package is also adaptable to the analysis of T and B cell receptor sequencing processed using other platforms such as MiXCR or IMGT/HighV-QUEST. This vignette has been written to highlight some of the features of LymphoSeq and guide the user through a typical workflow.

Importing data

The LymphoSeq function readImmunoSeq imports tab-separated value (.tsv) files exported by Adaptive Biotechnologies ImmunoSEQ analyzer v2 where each row represents a unique sequence and each column is a variable with information about that sequence such as read count, frequency, or variable gene name. Note that the file format for ImmunoSEQ analyzer v3 is not yet supported and users must choose to export the v2 format from the analyzer software. Only files with the extension .tsv are imported while all other are disregarded. It is possible to import files processed using other platforms as long as the files are tab-delimited, are given the extension .tsv and have identical column names as the ImmunoSEQs files (see readImmunoSeq manual for a list of column names used by this file type). Refer to the LymphoSeq manual regarding the required column names used by each function.

To explore the features of LymphoSeq, this package includes 2 example data sets. The first is a data set of T cell receptor beta (TCRB) sequencing from 10 blood samples acquired serially from a single patient who underwent a bone marrow transplant (Kanakry, C.G., et al. JCI Insight 2016;1(5):pii: e86252). The second, is a data set of B cell receptor immunoglobulin heavy (IGH) chain sequencing from Burkitt lymphoma tumor biopsies acquired from 10 different individuals (Lombardo, K.A., et al. Blood Advances 2017 1:535-544). To improve performance, both data sets contain only the top 1,000 most frequent sequences. The complete data sets are publicly available through Adapatives’ immuneACCESS portal. As shown in the example below, you can specify the path to the example data sets using the command system.file("extdata", "TCRB_sequencing", package = "LymphoSeq") for the TCRB files and system.file("extdata", "I