
R_Stem

Stemming with R

lsa (CRAN)

The basic idea of latent semantic analysis (LSA) is that texts have a higher-order (latent semantic) structure which is obscured by word usage, for example through synonymy or polysemy. This variability problem can be overcome by using conceptual indices derived statistically via a truncated singular value decomposition (a two-mode factor analysis) of a given document-term matrix.
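The truncated singular value decomposition mentioned in the description can be written out as follows. This is only a sketch of the standard formulation; the symbols $X$, $U$, $\Sigma$, $V$, $m$, $n$, $r$ and $k$ are notation assumed here for illustration and do not come from the lsa package documentation.

```latex
% X: document-term matrix with m documents (rows) and n terms (columns), rank r
X = U \Sigma V^{\top},
\qquad
\Sigma = \operatorname{diag}(\sigma_1, \dots, \sigma_r),
\quad
\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r > 0

% Truncation: keep only the k largest singular values, with k < r
X_k = U_k \Sigma_k V_k^{\top},
\qquad
U_k \in \mathbb{R}^{m \times k},\;
\Sigma_k \in \mathbb{R}^{k \times k},\;
V_k \in \mathbb{R}^{n \times k}
```

Among all matrices of rank at most $k$, $X_k$ is the closest to $X$ in the Frobenius norm. The rows of $U_k \Sigma_k$ can then be read as documents expressed in $k$ latent dimensions rather than in $n$ raw term counts, which is how words used in similar contexts end up close together.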

RWeka

An R interface to Weka (Version 3.7.1). Weka is a collection of machine learning algorithms for data mining tasks written in Java, containing tools for data pre-processing, classification, regression, clustering, association rules, and visualization. Both the R interface and Weka itself are contained in the RWeka package. For more information on Weka see http://www.cs.waikato.ac.nz/~ml/weka/.

Snowball

Snowball stemmers.

tm

A framework for text mining applications within R.

Stemming with tools other than R

Lingua::Stem (CPAN)

References