Filter Words from a Corpus — filterWords • tenet

The function extracts words and their relative position from a corpus object based either on a keyword list or a dictionary.

Usage

filterWords(corpus,
            keywords,
            rem.accent = FALSE,
            rem.punct = TRUE,
            case.insensitive = TRUE,
            lang = "es",
            fast = TRUE)

Arguments

corpus: A quanteda corpus object.
keywords: Keywords or dictionary employed to search for terms in texts.
rem.accent: Remove accents. The default is FALSE.
rem.punct: Remove punctuation. The default is TRUE.
case.insensitive: Search for both upper and lowercase words. The default is TRUE.
lang: The language for removing stopwords. The default is Spanish: "es".
fast: Use a fast algorithm to tokenize texts. The default is TRUE (lower precision).

Details

The function searches for terms using a keyword list or a dictionary and returns a list of words and their relative position in each text. These results are useful to be employed later in a Lexical Dispersion Plot.

Value

A data.frame containing the words retreived, their relative position in each text and the grouping variable, if existing.

Examples

if (FALSE) {
# Retrieve a corpus of text 
tx <- quanteda::data_corpus_inaugural

# find the relative position of keywords
filterWords(corpus=tx, 
            keywords=c("democ","liber","govern"),
            lang="en")
}