Performs a correlation analysis of the frequency of the words contained in a corpus.

Usage

corTerms(corpus, 
        min.freq = 20,
        lang="es",
        method="spearman",
        r.lim=0,
        n.terms=50,
        remove.wordlist=NULL)

Arguments

corpus

A quanteda corpus containing texts.

min.freq

The minimum frequency a word must reach to be included in the analysis. The default is 20.

lang

The language for removing stopwords. The default is Spanish: "es".

method

The correlation method to use. The default is "spearman"; the other options are "pearson", "kendall", and "yule" (the last one converts frequencies into binary data before calculating the correlation).
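The difference between the rank-based methods and the binary "yule" option can be illustrated with base R alone. The sketch below is not the package's implementation: the count matrix is invented, yule_q is a hypothetical helper, and it assumes that "yule" refers to Yule's Q computed on presence/absence data.

```r
# Toy document-term counts (invented): rows are documents, columns are terms
dtm <- matrix(c(3, 0, 2, 5, 0,
                1, 0, 4, 2, 0,
                0, 2, 0, 0, 3),
              nrow = 5,
              dimnames = list(paste0("doc", 1:5),
                              c("povo", "brasil", "crise")))

# Rank-based correlation between term frequencies (base R)
rho <- cor(dtm, method = "spearman")

# Hypothetical helper: Yule's Q on binarized frequencies (assumption, see above)
yule_q <- function(x, y) {
  x <- x > 0                      # presence/absence of the first term
  y <- y > 0                      # presence/absence of the second term
  n11 <- sum(x & y)               # documents with both terms
  n10 <- sum(x & !y)              # documents with the first term only
  n01 <- sum(!x & y)              # documents with the second term only
  n00 <- sum(!x & !y)             # documents with neither term
  (n11 * n00 - n10 * n01) / (n11 * n00 + n10 * n01)
}

yule_q(dtm[, "povo"], dtm[, "brasil"])   # 1: the terms always co-occur
yule_q(dtm[, "povo"], dtm[, "crise"])    # -1: the terms never co-occur
```

Because the counts are reduced to presence or absence, Yule's Q ignores how often a term appears within a document. It is undefined (NaN) when both products in the denominator are zero.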

r.lim

The correlation threshold used to filter the values returned. The default is 0.

n.terms

The number of terms to be returned by the function. The default is 50.

remove.wordlist

List of words to be removed from the analysis alongside stopwords. The default is NULL.

Details

The function corTerms calculates the correlation coefficient for the frequency of words contained in a corpus object. It is designed to work with the corNet function, which creates a sociogram of the links among words.

Value

A list containing two data.frame objects. The first, edges, is an edge list with three variables: term1, term2, and value. The second, vertices, indicates the feature, its frequency, and the number of documents in which it appears.
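To make the shape of the return value concrete, the following is a hand-built mock-up, not actual output of corTerms. All values are invented, and the column names of vertices (feature, frequency, docfreq) are assumptions, since only the columns of edges are named above.

```r
# Hypothetical mock-up of the returned list (all values are invented)
res <- list(
  edges = data.frame(
    term1 = c("povo", "povo", "brasil"),
    term2 = c("brasil", "governo", "governo"),
    value = c(0.62, 0.18, 0.41)
  ),
  vertices = data.frame(
    feature   = c("povo", "brasil", "governo"),  # assumed column name
    frequency = c(120, 95, 80),                  # assumed column name
    docfreq   = c(10, 9, 8)                      # assumed column name
  )
)

# Keep only the pairs whose coefficient exceeds a chosen threshold
strong <- res$edges[res$edges$value > 0.5, ]

# Keep the terms that take part in at least one retained pair
kept <- res$vertices[res$vertices$feature %in% c(strong$term1, strong$term2), ]
```

An edge list paired with a vertex table is the usual input for network plotting, which is consistent with the intended use alongside corNet.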

Examples

# Create a corpus object
cb <- bra.inaugural

# Generate the list of correlations (edges and vertices)
ll <- corTerms( cb, 
                lang = "pt", 
                min.freq = 50, 
                n.terms = 50, 
                remove.wordlist = c("é",
                                "ser",
                                "fazer",
                                "cada",
                                "neste"))