Correlation Analysis of Words in a Corpus
Performs a correlation analysis of the frequency of the words contained in a corpus.
Usage
corTerms(corpus,
         min.freq = 20,
         lang = "es",
         method = "spearman",
         r.lim = 0,
         n.terms = 50,
         remove.wordlist = NULL)
Arguments
- corpus
A quanteda corpus containing texts.
- min.freq
The minimum frequency a word must reach to be included in the analysis. The default is 20.
- lang
The language of the stopword list to be removed. The default is Spanish ("es").
- method
The correlation coefficient to compute. The default is "spearman"; the other options are "pearson", "kendall", and "yule" (the last one converts frequencies into binary presence/absence data before calculating the correlation).
- r.lim
The correlation threshold used to filter the returned values. The default is 0.
- n.terms
The number of terms to be returned by the function. The default is 50.
- remove.wordlist
A character vector of words to be removed from the analysis alongside the stopwords. The default is NULL.
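The method options correspond to familiar coefficients. The base R sketch below uses a small made-up document-term matrix (the counts, the term names, and the helper yule_q are all hypothetical): "pearson", "spearman", and "kendall" match the methods of stats::cor(), while the "yule" option is illustrated by binarising the counts and computing Yule's Q from each 2x2 presence/absence table. This page does not say which Yule coefficient corTerms computes, so using Yule's Q here is an assumption.

```r
# Hypothetical document-term counts: rows are documents, columns are terms.
dtm <- matrix(
  c(3, 0, 1, 2,
    1, 2, 0, 0,
    4, 1, 2, 3,
    0, 3, 0, 1,
    2, 0, 1, 2),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("doc", 1:5), c("povo", "governo", "pais", "brasil"))
)

# The three frequency-based methods map directly onto stats::cor().
r_pearson  <- cor(dtm, method = "pearson")
r_spearman <- cor(dtm, method = "spearman")
r_kendall  <- cor(dtm, method = "kendall")

# "yule": first reduce counts to presence (1) / absence (0).
bin <- (dtm > 0) * 1

# Assumption: Yule's Q, Q = (n11 * n00 - n10 * n01) / (n11 * n00 + n10 * n01),
# computed from the 2x2 table of two binary vectors. Returns NaN when the
# denominator is 0 (e.g. a term present in every document).
yule_q <- function(x, y) {
  n11 <- sum(x == 1 & y == 1)
  n10 <- sum(x == 1 & y == 0)
  n01 <- sum(x == 0 & y == 1)
  n00 <- sum(x == 0 & y == 0)
  (n11 * n00 - n10 * n01) / (n11 * n00 + n10 * n01)
}

# Term-by-term matrix of Yule's Q (symmetric, named by term).
terms <- colnames(bin)
r_yule <- sapply(terms, function(t1) {
  sapply(terms, function(t2) yule_q(bin[, t1], bin[, t2]))
})
```

Yule's Q ignores how often a word occurs in a document and only asks whether it occurs, which is why the binarisation step comes first.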
Details
The function corTerms calculates the correlation coefficient between the frequencies of the words contained in a corpus object. It is designed to work with the corNet function, which creates a sociogram of the links among words.
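As a rough illustration of the calculation described above, the following base R sketch works on a plain document-term count matrix instead of a quanteda corpus and skips stopword removal. The helper term_cor_matrix is hypothetical, not the package's implementation, and two behaviours are assumed rather than documented: min.freq is compared with each term's total count, and n.terms keeps the most frequent of the surviving terms.

```r
# Hypothetical document-term counts: rows are documents, columns are terms.
dtm <- matrix(
  c(3, 0, 1, 2, 0,
    1, 2, 0, 0, 1,
    4, 1, 2, 3, 0,
    0, 3, 0, 1, 0,
    2, 0, 1, 2, 0),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("doc", 1:5),
                  c("povo", "governo", "pais", "brasil", "raro"))
)

# Hypothetical helper sketching the described steps (not corTerms itself).
term_cor_matrix <- function(dtm, min.freq = 20, method = "spearman",
                            n.terms = 50) {
  freq <- colSums(dtm)                         # total count of each term
  keep <- names(freq)[freq >= min.freq]        # assumed: filter on total count
  keep <- keep[order(freq[keep], decreasing = TRUE)]
  keep <- head(keep, n.terms)                  # assumed: most frequent terms
  # Correlate the frequency columns of the retained terms across documents.
  cor(dtm[, keep, drop = FALSE], method = method)
}

r <- term_cor_matrix(dtm, min.freq = 5, method = "spearman", n.terms = 3)
round(r, 2)
```

With min.freq = 5, the terms pais (4 occurrences) and raro (1 occurrence) are dropped before any correlation is computed, so the result is a 3 x 3 matrix.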
Value
A list containing two data.frame objects. The first, edges, is an edge list with three variables: term1, term2, and value. The second, vertices, lists each feature, its frequency, and the number of documents in which it appears.
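The shape of the returned list can be mimicked in base R. In this sketch, the helper as_term_network and the vertices column names feature, frequency, and docfreq are hypothetical. The upper triangle of a correlation matrix becomes the edge list, pairs not exceeding r.lim are dropped, and each term's total count and document frequency fill the vertex table. This page does not state whether corTerms filters with a strict or inclusive comparison, or on absolute values; a strict comparison on the raw value is assumed here.

```r
# Hypothetical document-term counts: rows are documents, columns are terms.
dtm <- matrix(
  c(3, 0, 1, 2,
    1, 2, 0, 0,
    4, 1, 2, 3,
    0, 3, 0, 1,
    2, 0, 1, 2),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("doc", 1:5), c("povo", "governo", "pais", "brasil"))
)
r <- cor(dtm, method = "spearman")

# Hypothetical helper: reshape a term-by-term correlation matrix into
# list(edges = ..., vertices = ...). Column names are assumptions.
as_term_network <- function(r, dtm, r.lim = 0) {
  # Upper triangle only, so each unordered pair of terms appears once.
  idx <- which(upper.tri(r), arr.ind = TRUE)
  edges <- data.frame(term1 = rownames(r)[idx[, "row"]],
                      term2 = colnames(r)[idx[, "col"]],
                      value = r[idx],
                      stringsAsFactors = FALSE)
  # Assumed filter: keep pairs strictly above r.lim; drop NA correlations.
  edges <- edges[!is.na(edges$value) & edges$value > r.lim, , drop = FALSE]
  rownames(edges) <- NULL

  # One row per term: total count and number of documents containing it.
  terms <- colnames(r)
  sub <- dtm[, terms, drop = FALSE]
  vertices <- data.frame(feature   = terms,
                         frequency = unname(colSums(sub)),
                         docfreq   = unname(colSums(sub > 0)),
                         stringsAsFactors = FALSE)
  list(edges = edges, vertices = vertices)
}

net <- as_term_network(r, dtm, r.lim = 0)
net$edges
net$vertices
```

Keeping edges and vertices as two separate tables matches the usual input of graph libraries, which build a network from an edge list plus a table of node attributes.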
Examples
# Use the bra.inaugural corpus as example data
cb <- bra.inaugural
# Generate the list of correlations (edges and vertices)
ll <- corTerms(cb,
               lang = "pt",
               min.freq = 50,
               n.terms = 50,
               remove.wordlist = c("é", "ser", "fazer", "cada", "neste"))