Cluster any count data matrix with a fixed number of variables. Implements the branch & bound Classification-Variational Expectation-Maximisation algorithm described in the paper cited at the end of this document (to appear in Computational Statistics).

**MoMPCA** is available on CRAN, and the development version is available on GitHub.

**MoMPCA** needs the following CRAN R packages, so check that they are installed on your computer.

```
# Packages that MoMPCA depends on
required_CRAN <- c("methods",
                   "topicmodels",
                   "tm",
                   "Matrix",
                   "slam",
                   "magrittr",
                   "dplyr",
                   "stats",
                   "doParallel",
                   "foreach",
                   "ggplot2",
                   "reshape2",
                   "tidytext")
# Install only the ones that are missing
not_installed_CRAN <- setdiff(required_CRAN, rownames(installed.packages()))
if (length(not_installed_CRAN) > 0) install.packages(not_installed_CRAN)
```

- For the latest stable version, install from CRAN.

- For the development version, install from GitHub.
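A minimal sketch of the two installation routes. The CRAN call uses the package name as published. The GitHub call assumes the `remotes` package, and `<owner>/MoMPCA` is a placeholder rather than the confirmed repository path, so it is left commented out.

```r
# Stable release from CRAN
install.packages("MoMPCA")

# Development version from GitHub (requires the 'remotes' package).
# "<owner>/MoMPCA" is a placeholder: substitute the actual repository path.
# remotes::install_github("<owner>/MoMPCA")
```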

The package comes with the BBCmsg data set and a `simulate_BBC()` function, which reproduces the simulation of the paper.

```
library(MoMPCA)
simu <- simulate_BBC(N = 400, L = 200, epsilon = 0, lambda = 1)
dtm <- simu$dtm.full
Ytruth <- simu$Ytruth # true clustering
```
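Because the simulation returns the true partition in `Ytruth`, an estimated clustering can be scored against it. The base-R sketch below computes the adjusted Rand index. `adjusted_rand_index()` and the toy label vectors are illustrative names introduced here, not part of **MoMPCA**, and extracting the estimated labels from a fitted object is not covered by this sketch.

```r
# Adjusted Rand index between two label vectors (base R only).
# Illustrative helper, not a MoMPCA function.
adjusted_rand_index <- function(y1, y2) {
  stopifnot(length(y1) == length(y2))
  tab <- table(y1, y2)                      # contingency table of the two partitions
  n <- sum(tab)
  sum_cells <- sum(choose(tab, 2))          # pairs grouped together in both partitions
  sum_rows <- sum(choose(rowSums(tab), 2))  # pairs grouped together in y1
  sum_cols <- sum(choose(colSums(tab), 2))  # pairs grouped together in y2
  expected <- sum_rows * sum_cols / choose(n, 2)
  max_index <- (sum_rows + sum_cols) / 2
  if (max_index == expected) return(1)      # degenerate case: both partitions trivial
  (sum_cells - expected) / (max_index - expected)
}

# Toy example: the same partition with permuted labels scores 1
y_true <- c(1, 1, 2, 2, 3, 3)
y_perm <- c(3, 3, 1, 1, 2, 2)
ari_perm <- adjusted_rand_index(y_true, y_perm)
```

The index is invariant to label permutation, which matters here because cluster labels returned by a mixture model are arbitrary.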

`dtm` is a `tm::DocumentTermMatrix()` object. The main fitting function is `mmpca_clust()`, which allows for a parallel backend via its `mc.cores` argument. A simple wrapper around this function, `mmpca_clust_modelselect()`, performs model selection over `(Q, K)` with an ICL criterion. Please be aware that the greedy nature of the algorithm may induce quite intensive computations.

```
res <- mmpca_clust(simu$dtm.full, Q = 6, K = 4,
                   Yinit = 'random',
                   method = 'BBCVEM',
                   max.epochs = 7,
                   keep = 1,
                   verbose = 2,
                   nruns = 2,
                   mc.cores = 1)
```
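To use the parallel backend, a value for `mc.cores` has to be chosen. One possible way, sketched below, is to query the machine with `parallel::detectCores()` and leave one core free. The variable name `n_cores` and the "all but one" rule are choices made for this illustration, not recommendations from the package.

```r
# Pick a core count for mc.cores: all detected cores but one, at least 1.
n_cores <- parallel::detectCores()
if (is.na(n_cores)) n_cores <- 1L   # detectCores() can return NA on some platforms
n_cores <- max(1L, n_cores - 1L)

# n_cores could then be passed on, e.g. mmpca_clust(..., mc.cores = n_cores)
```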

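The top words of the topic matrix `beta` can then be plotted (if working with text), as can the evolution of the bound throughout the epochs.

Model selection over a grid of `(Q, K)` values can be run with `mmpca_clust_modelselect()`: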

```
res <- mmpca_clust_modelselect(simu$dtm.full, Qs = 5:7, Ks = 3:5,
                               Yinit = 'kmeans_lda',
                               init.beta = 'lda',
                               method = 'BBCVEM',
                               max.epochs = 7,
                               nruns = 3,
                               verbose = 1)
best_model <- res$models
```
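To get a rough sense of the cost of such a grid search, the number of fits can be counted up front. The sketch below assumes that every `(Q, K)` pair is fitted `nruns` times; this is an assumption about how `nruns` is used, not something stated here, and `grid`, `n_pairs` and `n_fits` are illustrative names.

```r
# Enumerate the (Q, K) pairs covered by Qs = 5:7 and Ks = 3:5
grid <- expand.grid(Q = 5:7, K = 3:5)
nruns <- 3

n_pairs <- nrow(grid)       # number of candidate models
n_fits <- n_pairs * nruns   # total fits, assuming nruns fits per pair
```

Under that assumption, even this small grid calls for 27 fits, so it may be worth keeping the grid narrow or using the parallel backend.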

Please cite our work using the following reference:

- N. Jouvin, P. Latouche, C. Bouveyron, A. Livartowski, G. Bataillon, Greedy clustering of count data through a mixture of multinomial PCA (To appear in Computational Statistics)

```
@article{jouvin:hal-02278224,
TITLE = {{Greedy clustering of count data through a mixture of multinomial PCA}},
AUTHOR = {Jouvin, Nicolas and Latouche, Pierre and Bouveyron, Charles and Bataillon, Guillaume and Livartowski, Alain},
URL = {https://hal.archives-ouvertes.fr/hal-02278224},
NOTE = {31 pages, 10 figures},
JOURNAL = {{Computational Statistics}},
PUBLISHER = {{Springer Verlag}},
YEAR = {2020},
KEYWORDS = {Dimension reduction ; Topic modeling ; Count data ; Mixture models ; Clustering ; Variational inference},
HAL_ID = {hal-02278224},
HAL_VERSION = {v1},
}
```

Please also consider citing this package.
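A citation entry for an installed package can usually be obtained from R itself with the standard `citation()` and `toBibtex()` utilities. The sketch below is guarded so that it only runs when **MoMPCA** is installed. The exact entry printed depends on the installed version.

```r
# Print the package citation, in plain and BibTeX form, if MoMPCA is installed
if (requireNamespace("MoMPCA", quietly = TRUE)) {
  pkg_citation <- citation("MoMPCA")
  print(pkg_citation)
  print(toBibtex(pkg_citation))
}
```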

