Filter Words from a Corpus
The function extracts words and their relative positions from a corpus object, based on either a keyword list or a dictionary.
Usage
filterWords(corpus,
keywords,
rem.accent = FALSE,
rem.punct = TRUE,
case.insensitive = TRUE,
lang = "es",
fast = TRUE)
Arguments
- corpus
A quanteda corpus object.
- keywords
A keyword list (character vector) or a dictionary used to search for terms in the texts.
- rem.accent
Remove accents. The default is FALSE.
- rem.punct
Remove punctuation. The default is TRUE.
- case.insensitive
If TRUE, matching ignores case, so both uppercase and lowercase forms of a term are found. The default is TRUE.
- lang
Language used for stopword removal. The default is Spanish ("es").
- fast
Use a faster tokenization algorithm, at the cost of lower precision. The default is TRUE.
Details
The function searches for terms using a keyword list or a dictionary and returns the matched words and their relative positions in each text. These results can then be used to draw a lexical dispersion plot.
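The matching step can be sketched in base R to show what "relative position" means. This is a hypothetical helper, not part of the package and not its actual implementation. It assumes that a keyword matches any word starting with it (as the example keywords "democ" and "liber" suggest) and that a word's relative position is its token index divided by the number of tokens in the text. The tokenizer is deliberately crude, and stopword and accent handling are omitted.

```r
# Hypothetical helper (not part of the package): a base-R sketch of how
# keyword hits and their relative positions could be computed for one text.
relative_positions <- function(text, keywords, case.insensitive = TRUE) {
  # Crude tokenizer: replace punctuation with spaces, split on whitespace
  tokens <- unlist(strsplit(gsub("[[:punct:]]", " ", text), "\\s+"))
  tokens <- tokens[nzchar(tokens)]
  # Optionally compare lowercased tokens against lowercased keywords
  cmp  <- if (case.insensitive) tolower(tokens) else tokens
  keys <- if (case.insensitive) tolower(keywords) else keywords
  # Assumption: a keyword matches any token that starts with it
  hit <- vapply(cmp, function(tok) any(startsWith(tok, keys)),
                logical(1), USE.NAMES = FALSE)
  idx <- which(hit)
  # Relative position = token index / total number of tokens
  data.frame(word = tokens[idx],
             position = idx / length(tokens),
             stringsAsFactors = FALSE)
}

relative_positions("Liberty and democracy: government by the people.",
                   c("democ", "liber", "govern"))
```

For this seven-token sentence the sketch returns "Liberty", "democracy" and "government" at positions 1/7, 3/7 and 4/7, which a lexical dispersion plot would place along the length of the text.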
Value
A data.frame containing the retrieved words, their relative positions in each text and the grouping variable, if one exists.
Examples
if (FALSE) {
# Retrieve a corpus of text
tx <- quanteda::data_corpus_inaugural
# Find the relative positions of the keywords
filterWords(corpus = tx,
            keywords = c("democ", "liber", "govern"),
            lang = "en")
}