TextAnalysisR


Select columns and click 'Apply' to unite text columns


                        

Configure options and click 'Apply' to tokenize texts

Outdated Results: Based on previous Step 2 settings. Click 'Apply' to update with the latest Step 2 output.


Select stopwords and click 'Apply' to remove common words




Configure settings and click 'Apply' to detect multi-words

Outdated Results: Based on previous Step 3 settings. Click 'Apply' to update with the latest Step 3 output.


Select n-grams and click 'Apply' to compound multi-words

Outdated Results: Based on previous Step 4 settings. Click 'Process' to update with the latest Step 4 output.


Click 'Process' to create document-feature matrix



Click 'Apply' to run spaCy linguistic analysis, or 'Skip' to use standard tokenization.



First, click 'Apply' on the Word Forms tab to run spaCy analysis. Then configure and click 'Apply' here to view POS tags.



Click 'Apply' on the Word Forms tab to run spaCy analysis and extract named entities.



Click 'Apply' on the Word Forms tab to run spaCy analysis and extract dependency parsing data.

Height of the plot

Select terms and a continuous variable, then click 'Plot Terms' to analyze frequency trends





Embedding Generation

Generate embeddings for advanced semantic analyses (Document Similarity, Search, Clustering).


Load data and process documents in the 1. Setup tab first

Process documents in the 1. Setup tab first

Configure settings and click 'Calculate' to begin analysis


                        

Process documents in the 1. Setup tab first

Enter a search query and click 'Search' to see results



Dimensions of the plot

Configure settings and click 'Plot Network' to visualize word co-occurrence

Dimensions of the plot

Configure settings and click 'Plot Network' to visualize word correlation



Click 'Reduce Dimensionality' to generate visualization


Explore Groups

Top Terms
Sample Documents

Click 'Reduce Dimensionality' then optionally 'Apply Clustering' to create document groups


Label Generation

Click 'Reduce Dimensionality' then 'Apply Clustering' to create groups for labeling

Dimensions of the plots




Overall Score = Coherence(z) + Exclusivity(z) - Residual(z) + Heldout(z)

Coherence: How interpretable topics are based on co-occurring words

Exclusivity: How distinctive topics are from each other

Residual: How well the model fits the data (lower is better, so it is subtracted)

Heldout: Model's ability to generalize to new data
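
The Overall Score formula above can be sketched as follows. This is a minimal illustration, not the app's implementation: it assumes "(z)" means each metric is standardized across the candidate K values, it uses the population standard deviation (the app may use the sample version), and the function names and diagnostic values are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values to mean 0 and unit population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All candidates tie on this metric, so it contributes nothing.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def overall_scores(coherence, exclusivity, residual, heldout):
    """Combine standardized metrics; residual is subtracted (lower is better)."""
    zc, ze, zr, zh = (
        z_scores(m) for m in (coherence, exclusivity, residual, heldout)
    )
    return [c + e - r + h for c, e, r, h in zip(zc, ze, zr, zh)]


# Hypothetical diagnostics for three candidate topic numbers.
candidate_k = [5, 10, 15]
scores = overall_scores(
    coherence=[-95.0, -100.0, -110.0],
    exclusivity=[9.2, 9.6, 9.7],
    residual=[1.30, 1.10, 1.15],
    heldout=[-7.4, -7.1, -7.2],
)

# The candidate with the highest combined score is the suggested K.
best_index = max(range(len(scores)), key=scores.__getitem__)
best_k = candidate_k[best_index]
print(list(zip(candidate_k, scores)), "suggested K:", best_k)
```

Standardizing first puts the four metrics on a common scale, so none of them dominates the sum simply because of its units.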




Configure the K range and click 'Search K' to find the optimal number of topics

Configure settings and click 'Run Model' to discover topics




The hybrid model combines STM probabilistic topics with semantic embeddings

STM Metrics: Based on statistical topic modeling

Coherence: How interpretable topics are

Exclusivity: How distinctive topics are

Heldout Likelihood: Generalization performance



Configure the K range and click 'Search K' to find the optimal number of topics

Dimensions of the plot



Run 'Search K', then click 'Display' to view word-topic distributions

Run the model, then click 'Display' to view topic keywords

Run 'Search K', then click 'Display' to view word-topic distributions

Dimensions of the plot


Complete the Word-Topic tab to view document-topic distributions

Complete the Word-Topic tab to view document-topic distributions

Complete the Word-Topic tab to view document-topic distributions enhanced with semantic embeddings

Run the topic model and select a topic to view representative quotes

Click 'Estimate' to generate effect estimates

Dimensions of the plot


Estimate effects, select a categorical covariate, then click 'Display' to visualize topic prevalence by category

Dimensions of the plot


Estimate effects, select a continuous covariate, then click 'Display' to visualize topic prevalence trends