Variational inference derivation

Latent variable models:
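
A common way to write such a model, assuming a continuous latent variable $z$, is to get the distribution over observations by marginalizing $z$ out of the joint:

$$
p(x) = \int p(x, z)\,dz
$$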

Decompose into simpler prior and conditional distributions:
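
Factoring the joint as $p(x, z) = p(x \mid z)\,p(z)$, the log-likelihood of a single data point $x_i$ is

$$
\log p(x_i) = \log \int p(x_i \mid z)\,p(z)\,dz
$$

In general this integral has no closed form. The sketch below uses a hypothetical 1-D linear-Gaussian model, $z \sim \mathcal{N}(0, 1)$ and $x \mid z \sim \mathcal{N}(z, 0.5)$, where the second argument is the variance. This model was chosen only because its marginal is known exactly. The code reads the integral as an expectation under the prior, $p(x_i) = \mathbb{E}_{z \sim p(z)}[p(x_i \mid z)]$, estimates it by sampling, and compares the estimate with the closed form.

```python
import math
import random

def normal_logpdf(v, mean, var):
    """Log density of a 1-D Gaussian N(mean, var) evaluated at v."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (v - mean) ** 2 / var)

# Hypothetical toy model: prior z ~ N(0, 1), conditional x | z ~ N(z, NOISE_VAR).
NOISE_VAR = 0.5

def log_marginal_exact(x):
    # For this linear-Gaussian model the integral is closed form: x ~ N(0, 1 + NOISE_VAR).
    return normal_logpdf(x, 0.0, 1.0 + NOISE_VAR)

def log_marginal_mc(x, n=100_000, seed=0):
    # p(x) = E_{z ~ p(z)}[p(x | z)]: sample z from the prior, average the conditional density.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        total += math.exp(normal_logpdf(x, z, NOISE_VAR))
    return math.log(total / n)

x_obs = 1.3
print(log_marginal_exact(x_obs), log_marginal_mc(x_obs))
```

Sampling from the prior works for this toy case. It becomes very inefficient when only a small region of $z$ explains $x_i$ well, which motivates the steps that follow.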

Take the following as accepted for now:
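
The standard derivation introduces a distribution $q_i(z)$ for each data point. It is meant to approximate the posterior $p(z \mid x_i)$ and must be positive wherever $p(x_i \mid z)\,p(z)$ is. Multiplying and dividing by $q_i(z)$ turns the integral into an expectation under $q_i$:

$$
\log p(x_i) = \log \int p(x_i \mid z)\,p(z)\,\frac{q_i(z)}{q_i(z)}\,dz = \log \mathbb{E}_{z \sim q_i(z)}\left[\frac{p(x_i \mid z)\,p(z)}{q_i(z)}\right]
$$

The sketch below uses this identity as an importance-sampling estimator. It reuses the same hypothetical toy model, with an arbitrarily chosen Gaussian $q_i = \mathcal{N}(0.6, 0.4)$.

```python
import math
import random

def normal_logpdf(v, mean, var):
    """Log density of a 1-D Gaussian N(mean, var) evaluated at v."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (v - mean) ** 2 / var)

# Hypothetical toy model: prior z ~ N(0, 1), conditional x | z ~ N(z, NOISE_VAR).
NOISE_VAR = 0.5

def log_marginal_exact(x):
    # Closed form for this linear-Gaussian model: x ~ N(0, 1 + NOISE_VAR).
    return normal_logpdf(x, 0.0, 1.0 + NOISE_VAR)

def log_marginal_is(x, q_mean, q_var, n=100_000, seed=0):
    # log p(x) = log E_{z ~ q}[p(x | z) p(z) / q(z)] with q = N(q_mean, q_var).
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        z = rng.gauss(q_mean, math.sqrt(q_var))
        log_weight = (normal_logpdf(x, z, NOISE_VAR) + normal_logpdf(z, 0.0, 1.0)
                      - normal_logpdf(z, q_mean, q_var))
        total += math.exp(log_weight)
    return math.log(total / n)

x_obs = 1.3
# Hypothetical choice of q_i; any q_i positive wherever p(x | z) p(z) is keeps the identity valid.
print(log_marginal_exact(x_obs), log_marginal_is(x_obs, q_mean=0.6, q_var=0.4))
```

If $q_i$ is the exact posterior $p(z \mid x_i)$, every weight $p(x_i \mid z)\,p(z)/q_i(z)$ equals $p(x_i)$. In that case the estimate is exact even with a handful of samples.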

Apply Jensen’s inequality:
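
Because $\log$ is concave, Jensen’s inequality gives $\log \mathbb{E}[Y] \ge \mathbb{E}[\log Y]$. Applied to the expectation under $q_i$, this yields a lower bound on $\log p(x_i)$:

$$
\log \mathbb{E}_{z \sim q_i(z)}\left[\frac{p(x_i \mid z)\,p(z)}{q_i(z)}\right] \ge \mathbb{E}_{z \sim q_i(z)}\left[\log \frac{p(x_i \mid z)\,p(z)}{q_i(z)}\right] = \mathbb{E}_{z \sim q_i(z)}\left[\log p(x_i \mid z) + \log p(z)\right] - \mathbb{E}_{z \sim q_i(z)}\left[\log q_i(z)\right]
$$

Here is a minimal numerical check on the same hypothetical toy model. It estimates the right-hand side by sampling from $q_i$ and confirms that it stays below $\log p(x_i)$. The bound becomes tight when $q_i$ is the exact posterior, which for this model is $\mathcal{N}\big(x/(1+0.5),\ 0.5/(1+0.5)\big)$.

```python
import math
import random

def normal_logpdf(v, mean, var):
    """Log density of a 1-D Gaussian N(mean, var) evaluated at v."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (v - mean) ** 2 / var)

# Hypothetical toy model: prior z ~ N(0, 1), conditional x | z ~ N(z, NOISE_VAR).
NOISE_VAR = 0.5

def log_marginal_exact(x):
    # Closed form for this linear-Gaussian model: x ~ N(0, 1 + NOISE_VAR).
    return normal_logpdf(x, 0.0, 1.0 + NOISE_VAR)

def elbo_mc(x, q_mean, q_var, n=100_000, seed=0):
    # E_q[log p(x | z) + log p(z)] - E_q[log q(z)], averaged over samples z ~ q.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        z = rng.gauss(q_mean, math.sqrt(q_var))
        total += (normal_logpdf(x, z, NOISE_VAR) + normal_logpdf(z, 0.0, 1.0)
                  - normal_logpdf(z, q_mean, q_var))
    return total / n

x_obs = 1.3
post_mean = x_obs / (1.0 + NOISE_VAR)  # exact posterior p(z | x) for this toy model
post_var = NOISE_VAR / (1.0 + NOISE_VAR)
print("log p(x)          :", log_marginal_exact(x_obs))
print("bound, arbitrary q:", elbo_mc(x_obs, 0.6, 0.4))
print("bound, posterior q:", elbo_mc(x_obs, post_mean, post_var))
```

When $q_i$ is the posterior, $\log \frac{p(x_i \mid z)\,p(z)}{q_i(z)} = \log p(x_i)$ for every $z$. Every sample then contributes the same value, so the bound is attained exactly.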

Entropy review:
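
The entropy of a distribution $p$ is its expected negative log-density:

$$
\mathcal{H}(p) = -\mathbb{E}_{x \sim p(x)}\left[\log p(x)\right] = -\int p(x)\log p(x)\,dx
$$

So the term $-\mathbb{E}_{z \sim q_i(z)}[\log q_i(z)]$ produced by the Jensen step is exactly $\mathcal{H}(q_i)$. As a small sketch, the code below checks the known closed form for a Gaussian, $\mathcal{H}(\mathcal{N}(\mu, \sigma^2)) = \tfrac{1}{2}\log(2\pi e \sigma^2)$, against a sampling estimate. It also shows that wider distributions have higher entropy.

```python
import math
import random

def normal_logpdf(v, mean, var):
    """Log density of a 1-D Gaussian N(mean, var) evaluated at v."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (v - mean) ** 2 / var)

def gaussian_entropy(var):
    # Closed form for N(mean, var); it does not depend on the mean.
    return 0.5 * math.log(2.0 * math.pi * math.e * var)

def entropy_mc(mean, var, n=100_000, seed=0):
    # H(p) = -E_{x ~ p}[log p(x)], estimated by sampling from p itself.
    rng = random.Random(seed)
    sd = math.sqrt(var)
    return -sum(normal_logpdf(rng.gauss(mean, sd), mean, var) for _ in range(n)) / n

for var in (0.1, 1.0, 10.0):
    print(var, gaussian_entropy(var), entropy_mc(0.0, var))
```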

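Intuition: going back to the expectation, we want a $q_i$ whose samples make the first term (the one involving $p(x_i \mid z)$) as large as possible. At the same time, we want the entropy of $q_i$ to be as high as possible, so that it stays spread out rather than collapsing onto a single point.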
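
Writing the last term of the Jensen bound as an entropy makes this trade-off explicit:

$$
\log p(x_i) \ge \mathbb{E}_{z \sim q_i(z)}\left[\log p(x_i \mid z) + \log p(z)\right] + \mathcal{H}(q_i)
$$

Here is a minimal sketch of the trade-off on the hypothetical toy model ($z \sim \mathcal{N}(0, 1)$, $x \mid z \sim \mathcal{N}(z, 0.5)$) with a Gaussian $q_i = \mathcal{N}(m, v)$. Both terms have closed forms there. The code fixes $m$ at the posterior mean and sweeps the variance $v$. A narrow $q_i$ scores well on the first term but has low entropy, and a wide one has the opposite problem. The bound peaks in between, at the exact posterior variance.

```python
import math

NOISE_VAR = 0.5  # hypothetical toy model: z ~ N(0, 1), x | z ~ N(z, NOISE_VAR)

def expected_log_joint(x, q_mean, q_var):
    # E_q[log p(x | z) + log p(z)] for q = N(q_mean, q_var),
    # using E_q[(z - c)^2] = (q_mean - c)^2 + q_var.
    lik = -0.5 * (math.log(2 * math.pi * NOISE_VAR) + ((x - q_mean) ** 2 + q_var) / NOISE_VAR)
    prior = -0.5 * (math.log(2 * math.pi) + q_mean ** 2 + q_var)
    return lik + prior

def gaussian_entropy(var):
    # H(N(mean, var)) = 0.5 * log(2 * pi * e * var)
    return 0.5 * math.log(2 * math.pi * math.e * var)

def elbo(x, q_mean, q_var):
    # First term of the bound plus the entropy of q.
    return expected_log_joint(x, q_mean, q_var) + gaussian_entropy(q_var)

x_obs = 1.3
post_mean = x_obs / (1 + NOISE_VAR)
post_var = NOISE_VAR / (1 + NOISE_VAR)
for q_var in (0.01, 0.1, post_var, 1.0, 3.0):
    print(f"var={q_var:.3f}  joint term={expected_log_joint(x_obs, post_mean, q_var):.3f}  "
          f"entropy={gaussian_entropy(q_var):.3f}  bound={elbo(x_obs, post_mean, q_var):.3f}")
```

In this toy case, the best bound, reached at the posterior variance, equals $\log p(x_i)$ exactly.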