# Finite mixture model analysis with the R package flexmix


A Practical Introduction To Finite Mixture Models

A practical introduction to finite mixture modeling with flexmix in R

## Introduction

Finite mixture models are very useful for data where observations originate from several groups and the group affiliations are not known. For example, in single-cell RNA-seq data, the transcripts in each cell can be modeled as a mixture of two probabilistic processes: 1) a negative binomial process, for when a transcript is amplified and detected at a level correlating with its abundance, and 2) a low-magnitude Poisson process, for when drop-outs occur. These error models can then serve as a basis for further statistical analysis, including the analyses described in Fan et al.
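One generic way to write this two-process model is as a two-component finite mixture. In the sketch below, $\pi$ is an assumed symbol for the probability that a transcript comes from the amplified/detected process, $\mu$ and $\phi$ are the negative binomial mean and size (dispersion), and $\lambda_0$ is a small Poisson rate for drop-outs. This shows only the general form; the model in Fan et al. may be parameterized differently.

$$
p(x) = \pi \, \mathrm{NB}(x \mid \mu, \phi) + (1 - \pi) \, \mathrm{Pois}(x \mid \lambda_0), \qquad 0 \le \pi \le 1,
$$

$$
\mathrm{NB}(x \mid \mu, \phi) = \frac{\Gamma(x + \phi)}{\Gamma(\phi)\, x!} \left(\frac{\phi}{\phi + \mu}\right)^{\phi} \left(\frac{\mu}{\phi + \mu}\right)^{x},
\qquad
\mathrm{Pois}(x \mid \lambda_0) = \frac{\lambda_0^{x} e^{-\lambda_0}}{x!}.
$$

More generally, a $K$-component finite mixture has density $p(x) = \sum_{j=1}^{K} \pi_j \, f_j(x \mid \theta_j)$ with $\sum_{j=1}^{K} \pi_j = 1$. `flexmix` estimates the weights $\pi_j$ and component parameters $\theta_j$ with the EM algorithm.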

In this tutorial, we will use simulations and sample data to learn about finite mixture models using the `flexmix` package in `R`.

## Simulated data

First, we will simulate some data. Let’s simulate two normal distributions: one with a mean of 0 and another with a mean of 50, both with a standard deviation of 5.

Now let’s ‘mix’ the data together…
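Here is a minimal sketch of the simulation and mixing steps. The group size of 100 draws per group and the seed are illustrative assumptions, not values taken from the original analysis. The true labels are kept aside so the fitted assignments can be checked later.

```r
# Simulate two Gaussian groups (assumed: 100 draws each, arbitrary seed).
set.seed(1)
n_per_group <- 100
a <- rnorm(n_per_group, mean = 0, sd = 5)
b <- rnorm(n_per_group, mean = 50, sd = 5)

# "Mix" the groups into one vector; keep the true labels separately
# so we can compare them with the fitted assignments later.
x <- c(a, b)
true_class <- factor(rep(c("a", "b"), each = n_per_group))
dat <- data.frame(x = x, true_class = true_class)

# A histogram of the pooled data shows two well-separated modes.
hist(dat$x, breaks = 40, xlab = "x",
     main = "Pooled draws from N(0, 5) and N(50, 5)")
```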

…and see whether we can model the new mixture as two Gaussian processes. We expect to be able to fit two Gaussians and recover our initial parameters.
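The fit can be sketched with `flexmix()` and an intercept-only Gaussian GLM driver, `FLXMRglm(family = "gaussian")`. Each component then gets its own mean (the intercept) and standard deviation. The data are re-simulated under the same assumptions as before (100 draws per group, arbitrary seed) so the block runs on its own. Component numbering is arbitrary, so "component 1" may correspond to either simulated group.

```r
library(flexmix)

# Re-simulate the two-group mixture (assumed sizes and seed).
set.seed(1)
n_per_group <- 100
x <- c(rnorm(n_per_group, mean = 0, sd = 5),
       rnorm(n_per_group, mean = 50, sd = 5))
true_class <- factor(rep(c("a", "b"), each = n_per_group))
dat <- data.frame(x = x, true_class = true_class)

# Two intercept-only Gaussian components; the labels are not given to the model.
fit <- flexmix(x ~ 1, data = dat, k = 2,
               model = FLXMRglm(family = "gaussian"))

# Cross-tabulate fitted hard assignments against the true labels.
print(table(fitted = clusters(fit), true = dat$true_class))

# Per-component parameters (intercept = mean, sigma = sd) and mixing weights.
pars <- parameters(fit)
print(pars)
print(prior(fit))
```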

So how did we do? It looks like we recovered the class assignments perfectly.

Let’s visualize the real data and our fitted mixture models.
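One way to draw this is with base R graphics: a density-scaled histogram of the data, each fitted component scaled by its mixing weight (dashed), and their sum (solid). Base graphics are our choice for the sketch and may not match how the original figure was produced. The simulation settings are the same assumptions as above.

```r
library(flexmix)

# Re-simulate and refit (assumed: 100 draws per group, arbitrary seed).
set.seed(1)
n_per_group <- 100
dat <- data.frame(x = c(rnorm(n_per_group, mean = 0, sd = 5),
                        rnorm(n_per_group, mean = 50, sd = 5)))
fit <- flexmix(x ~ 1, data = dat, k = 2,
               model = FLXMRglm(family = "gaussian"))

pars <- parameters(fit)   # row 1: component means; last row: sigma
w <- prior(fit)           # mixing proportions

# Fitted mixture density: weighted sum of the component normal densities.
mix_density <- function(z) {
  Reduce(`+`, lapply(seq_along(w), function(j)
    w[j] * dnorm(z, mean = pars[1, j], sd = pars[nrow(pars), j])))
}

hist(dat$x, breaks = 40, freq = FALSE, xlab = "x",
     main = "Simulated data with fitted mixture")
# Each weighted component as a dashed curve.
for (j in seq_along(w)) {
  curve(w[j] * dnorm(x, mean = pars[1, j], sd = pars[nrow(pars), j]),
        add = TRUE, col = j + 1, lwd = 2, lty = 2)
}
# The full mixture density as a solid curve.
curve(mix_density(x), add = TRUE, lwd = 2)
```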

Looks like we did pretty well!

What if we have a more challenging mixture?

Or an even more challenging mixture?
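To make "more challenging" concrete, here is a sketch that shrinks the distance between the two means while keeping the standard deviation at 5. It then records how many points the fitted mixture assigns to the correct group. `assignment_accuracy` is a hypothetical helper written for this illustration, not part of `flexmix`, and the list of mean differences is an arbitrary choice.

```r
library(flexmix)

# Hypothetical helper: simulate two N(., sd) groups whose means differ by
# `delta`, fit a two-component Gaussian mixture, and return the fraction
# of points whose fitted component matches their true group.
assignment_accuracy <- function(delta, n_per_group = 100, sd = 5) {
  dat <- data.frame(x = c(rnorm(n_per_group, mean = 0, sd = sd),
                          rnorm(n_per_group, mean = delta, sd = sd)))
  truth <- rep(1:2, each = n_per_group)
  fit <- flexmix(x ~ 1, data = dat, k = 2,
                 model = FLXMRglm(family = "gaussian"))
  cl <- clusters(fit)
  # Component labels are arbitrary, so score both possible matchings.
  # If EM drops a component, every point shares one label and this gives 0.5.
  max(mean(cl == truth), mean(cl == 3 - truth))
}

set.seed(2)
deltas <- c(50, 20, 10, 5)   # distance between the two means (sd = 5)
acc <- sapply(deltas, assignment_accuracy)
print(data.frame(mean_difference = deltas, accuracy = acc))
```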

As expected, the more the simulated distributions overlap, the harder it becomes to recover the correct mixture components.

## Iris example

Now, let’s consider a real example: the petal widths of iris flowers. This distribution does look a little like a finite mixture of distributions.
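The `iris` data ship with base R, so the distribution can be inspected directly. The per-species means are printed only for reference. The mixture fits that follow do not use the species labels.

```r
# iris is part of base R's datasets package; no install needed.
data(iris)

# Histogram of petal widths across all 150 flowers.
hist(iris$Petal.Width, breaks = 30, xlab = "Petal width (cm)",
     main = "Iris petal widths")

# Reference only: mean petal width per species (hidden from the mixture fit).
species_means <- tapply(iris$Petal.Width, iris$Species, mean)
print(species_means)
```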

Let’s assume they are three normal distributions and see what happens.
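A sketch of the three-component fit follows. It uses the same intercept-only Gaussian driver as the simulation, applied to `Petal.Width`. The seed is arbitrary, and because EM starts from a random initialization, a different seed can give a different local optimum. The species labels are used only afterwards, to cross-tabulate the assignments.

```r
library(flexmix)

data(iris)
set.seed(3)
# Three intercept-only Gaussian components for petal width;
# the species labels are NOT given to the model.
fit3 <- flexmix(Petal.Width ~ 1, data = iris, k = 3,
                model = FLXMRglm(family = "gaussian"))

print(parameters(fit3))   # per-component mean (intercept) and sigma
print(prior(fit3))        # estimated mixing proportions

# Compare the unsupervised assignments with the species held back.
print(table(fitted = clusters(fit3), species = iris$Species))
```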

Even without knowing the species assignments, we could still conclude that petal widths likely come from three different groups with distinctly different means and variances.

What happens if we try to model petal width as only two normal processes? What if we instead use count-based processes, a Poisson and a negative binomial, to categorize iris flowers into those with few petals (dead? damaged? ugly?) and those with some detectable number of petals?
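For the first question, here is a sketch that fits two and three Gaussian components to petal width and compares them by BIC (lower is better). Using BIC as the comparison criterion is our assumption, not something the original analysis specifies. The count-based Poisson/negative binomial variant is not attempted here, because petal width is a continuous measurement rather than a count.

```r
library(flexmix)

data(iris)
set.seed(4)
# Fit two- and three-component Gaussian mixtures to petal width.
fits <- lapply(c(2, 3), function(k)
  flexmix(Petal.Width ~ 1, data = iris, k = k,
          model = FLXMRglm(family = "gaussian")))

# EM can drop components, so report how many survived alongside BIC.
print(data.frame(k_requested = c(2, 3),
                 k_fitted = sapply(fits, function(f) length(prior(f))),
                 BIC = sapply(fits, BIC)))

# How the two-component fit splits the species.
print(table(fitted = clusters(fits[[1]]), species = iris$Species))
```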