
The function extracts words and their relative positions from a corpus object, based on either a keyword list or a dictionary.

Usage

filterWords(corpus,
            keywords,
            rem.accent = FALSE,
            rem.punct = TRUE,
            case.insensitive = TRUE,
            lang = "es",
            fast = TRUE)

Arguments

corpus

A quanteda corpus object.

keywords

A character vector of keywords, or a dictionary, used to search for terms in the texts.

rem.accent

Remove accents. The default is FALSE.

rem.punct

Remove punctuation. The default is TRUE.

case.insensitive

Ignore case when matching, so that both uppercase and lowercase forms of a word are found. The default is TRUE.

lang

The language used for stopword removal. The default is Spanish ("es").

fast

Use a faster tokenization algorithm, at the cost of lower precision. The default is TRUE.

Details

The function searches for terms using a keyword list or a dictionary and returns the matched words and their relative positions in each text. These results can later be used to draw a lexical dispersion plot.
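To make the idea of a relative position concrete, here is a minimal base R sketch. It is not the package's implementation. The helper name `relative_positions`, the prefix matching of keywords (suggested by stems such as "democ" in the example below), the whitespace tokenization and the definition of relative position as token index divided by the number of tokens are all assumptions made for illustration.

```r
# Hypothetical helper, NOT part of this package: find keyword matches in a
# single text and express each match position as a fraction of text length.
relative_positions <- function(text, keywords, case.insensitive = TRUE) {
  # Assumed tokenization: strip punctuation, then split on whitespace
  tokens <- unlist(strsplit(gsub("[[:punct:]]", "", text), "\\s+"))
  tokens <- tokens[tokens != ""]
  cmp <- if (case.insensitive) tolower(tokens) else tokens
  kw  <- if (case.insensitive) tolower(keywords) else keywords
  # Assumed prefix matching: "democ" matches "democracy", "democratic", ...
  hits <- which(vapply(cmp, function(t) any(startsWith(t, kw)),
                       logical(1), USE.NAMES = FALSE))
  # Assumed definition: relative position = token index / number of tokens
  data.frame(word = tokens[hits],
             position = hits / length(tokens),
             stringsAsFactors = FALSE)
}

res <- relative_positions("Liberty and democracy guide our government.",
                          c("democ", "liber", "govern"))
res
```

The sample sentence has six tokens, so the three matches fall at positions 1/6, 0.5 and 1. Plotting `position` on the x-axis, one row per text, gives the kind of lexical dispersion plot mentioned above.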

Value

A data.frame containing the retrieved words, their relative positions in each text and the grouping variable, if any.

Examples

if (FALSE) {
# Load an example corpus (US presidential inaugural addresses)
tx <- quanteda::data_corpus_inaugural

# Find the relative positions of the keywords
filterWords(corpus=tx, 
            keywords=c("democ","liber","govern"),
            lang="en")
}