Several functions are available to filter out or extract samples or sequences from a RepSeqExperiment object.

Filtering

Sequence filtering

Based on sequence occurrence

filterCount() filters out sequences, at any level, whose count is at or below a chosen threshold n. It returns a RepSeqExperiment object that can be used for in-depth analyses of the remaining data.

RepSeqData_filtered <- filterCount(x = RepSeqData,
                                   n = 1,
                                   level = "aaClone")
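To make the thresholding concrete, here is a minimal base R sketch on a toy table. It is not the package's implementation: the data frame, its column names, and the row-wise reading of the threshold (n = 1 removes every row with a count of 1) are assumptions made for illustration.

```r
# Hypothetical toy table: one row per (sample, clone) pair
toy <- data.frame(
  sample_id = c("s1", "s1", "s1", "s2", "s2"),
  aaClone   = c("A", "B", "C", "A", "D"),
  count     = c(5, 1, 2, 1, 7)
)

n <- 1

# Keep only the rows whose count is strictly above the threshold
toy_filtered <- toy[toy$count > n, ]
toy_filtered
```

The two rows with a count of 1 are removed, and the three remaining rows are what a downstream analysis would see.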

The group parameter restricts the filtering to a group of samples. These samples must all belong to the same group in the metaData slot. In the following example, aaClones with a count of 1 are filtered out in all amTreg samples.

RepSeqData_filtered_amTreg <- filterCount(x = RepSeqData,
                                          n = 1,
                                          level = "aaClone",
                                          group = c("cell_subset", "amTreg"))
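The effect of a group restriction can be sketched in base R as follows. The two toy tables and their column names are hypothetical stand-ins for the metadata and count data, and the logic is an illustration under those assumptions rather than the package's code.

```r
# Hypothetical metadata: one row per sample
meta <- data.frame(
  sample_id   = c("s1", "s2", "s3"),
  cell_subset = c("amTreg", "amTreg", "Teff")
)

# Hypothetical counts: one row per (sample, clone) pair
counts <- data.frame(
  sample_id = c("s1", "s1", "s2", "s3", "s3"),
  aaClone   = c("A", "B", "A", "B", "C"),
  count     = c(1, 4, 3, 1, 6)
)

group <- c("cell_subset", "amTreg")  # metadata column, then the value to match
n <- 1

# Samples belonging to the requested group
group_samples <- meta$sample_id[meta[[group[1]]] == group[2]]

# Drop low-count rows only when they come from a sample in that group
drop <- counts$sample_id %in% group_samples & counts$count <= n
counts_filtered <- counts[!drop, ]
counts_filtered
```

In this sketch the count of 1 in the Teff sample s3 is kept, because the filter is applied only to amTreg samples.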

Based on sequence name

filterSequence() filters out specific sequences, either in all samples or in a group of samples.

RepSeqData_filtered <- filterSequence(x = RepSeqData,
                                      level = "aaClone",
                                      name = "TRAV11 CVVGDRGSALGRLHF TRAJ18",
                                      group = c("cell_subset", "Teff"))

Sample filtering

dropSamples() filters out one or more repertoires, specified by their sample_id. These can be, for instance, repertoires identified as outliers during the exploratory analysis.

RepSeqData_drop <- dropSamples(x = RepSeqData,
                               sampleNames = c("tripod-30-813", "tripod-30-815"))

Data selection

Based on sequence sharing

getPublic() subsets a RepSeqExperiment object to extract sequences that are shared by either:

  • samples belonging to a specified group
  • samples within the whole dataset if the group parameter is not specified.

The sharing threshold is set to 50%: a sequence is extracted if it is present in at least 50% of the selected samples.

# Get clones present in at least 50% of the samples belonging to the amTreg group
RepSeqData_public <- getPublic(x = RepSeqData,
                               level = "aaClone",
                               group = c("cell_subset", "amTreg"))
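The 50% sharing rule can be illustrated with a small base R sketch. The toy table and its column names are hypothetical, and the computation is only one plausible reading of the rule described above, not the package's implementation.

```r
# Hypothetical counts across four samples
counts <- data.frame(
  sample_id = c("s1", "s1", "s2", "s2", "s3", "s4"),
  aaClone   = c("A", "B", "A", "C", "A", "B"),
  count     = c(3, 1, 2, 5, 1, 4)
)

n_samples <- length(unique(counts$sample_id))

# Number of distinct samples in which each clone is observed
presence <- tapply(counts$sample_id, counts$aaClone,
                   function(s) length(unique(s)))

# A clone is treated as public when it occurs in at least 50% of the samples
public_clones <- names(presence)[presence / n_samples >= 0.5]
counts_public <- counts[counts$aaClone %in% public_clones, ]
public_clones
```

Here clone A occurs in three of four samples and clone B in two of four, so both pass the threshold, while clone C, seen in a single sample, does not.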

Similarly, it is possible to extract private sequences with getPrivate(). If the parameter singletons is set to TRUE, only private sequences with a count of 1 will be returned.

RepSeqData_private <- getPrivate(x = RepSeqData,
                                 level = "ntClone",
                                 singletons = FALSE)
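The difference between private sequences and private singletons can be sketched in base R. This assumes that "private" means observed in exactly one sample, which is the usual convention but is not stated in this document. The toy table and its column names are also hypothetical.

```r
# Hypothetical counts across three samples
counts <- data.frame(
  sample_id = c("s1", "s1", "s2", "s2", "s3"),
  ntClone   = c("A", "B", "A", "C", "D"),
  count     = c(3, 1, 2, 5, 1)
)

# Number of distinct samples in which each clone is observed
presence <- tapply(counts$sample_id, counts$ntClone,
                   function(s) length(unique(s)))

# Assumption: a private clone is one seen in a single sample
private_clones <- names(presence)[presence == 1]
counts_private <- counts[counts$ntClone %in% private_clones, ]

# Restricting to singletons keeps only private clones with a count of 1
private_singletons <- counts_private[counts_private$count == 1, ]
private_singletons
```

Clones B, C and D are private in this toy table, but only B and D remain once the selection is restricted to singletons.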

Based on sequence occurrence

getTopSequences() extracts the most highly expressed sequences. The prop parameter specifies the percentage of top sequences to extract.

RepSeqData_top <- getTopSequences(x = RepSeqData,
                                  level = "aaClone",
                                  group = c("cell_subset", "Teff"), 
                                  prop = 0.1)
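As a rough illustration of a top-sequence selection, the base R sketch below ranks the clones of a single toy sample by count and keeps the leading fraction. It treats prop as a fraction between 0 and 1 and rounds the number of retained clones up. Both choices are assumptions for this sketch, as are the toy data, and the package may behave differently.

```r
# Hypothetical counts for a single sample
counts <- data.frame(
  aaClone = paste0("clone", 1:10),
  count   = c(50, 3, 12, 7, 1, 30, 2, 9, 4, 20)
)

prop <- 0.2  # assumed here to be a fraction of the distinct clones

# Number of clones to keep, rounded up so that at least one is retained
n_top <- ceiling(prop * nrow(counts))

# Rank clones from most to least abundant and keep the first n_top
ranked <- counts[order(counts$count, decreasing = TRUE), ]
top <- ranked[seq_len(n_top), ]
top
```

With ten clones and prop set to 0.2, the two most abundant clones (clone1 and clone6) are retained.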

Based on sequence functionality

getProductive() and getUnproductive() extract productive and unproductive sequences, respectively. They are useful when no filter was applied while building the RepSeqExperiment object.

Note: All of the functions above can be applied at any of the following repertoire levels: clone, clonotype, CDR3aa and CDR3nt.