[Topic-models] [CTM] Constrained Newton Method for the nu (ν) in Correlated Topic Models
Lesaege Clement
Clement.Lesaege at technicolor.com
Fri May 20 08:00:00 EDT 2016
Dear everyone,
I'm trying to understand the process of optimizing the ν_i in the variational step of Variational EM for Correlated Topic Models.
In the long article "A correlated topic model of science", the indications are:
"We use Newton's method for each coordinate with the constraint that ν_i > 0".
It seems to refer to the barrier method. (Explained simply there: https://www.or-exchange.org/questions/1160/using-newtons-algorithm-with-inequality-constraints )
"The basic outline of a barrier method is:
1. Start with a problem min{f(x) | x >= 0}, say.
2. Replace with min f(x) - mu sum_i log x_i
3. Apply a truncated Newton's method to the resulting unconstrained problem. (I.e., don't let x go all the way to zero.)
4. At each iteration, reduce mu.
5. Stop when you reach a satisfactory solution to the unconstrained problem."
However
The code of Blei and Lafferty (available http://www.cs.princeton.edu/~blei/ctm-c/ ) corresponding to the optimization of ν_i is the following:
"
log_nu_i = log(init_nu);
do
{
iter++;
nu_i = exp(log_nu_i);
// assert(!isnan(nu_i));
if (isnan(nu_i))
{
init_nu = init_nu*2;
printf("warning : nu is nan; new init = %5.5f\n", init_nu);
log_nu_i = log(init_nu);
nu_i = init_nu;
}
// f = f_nu_i(nu_i, i, var, mod, d);
// printf("%5.5f %5.5f \n", nu_i, f);
df = df_nu_i(nu_i, i, var, mod, d);
d2f = d2f_nu_i(nu_i, i, var, mod, d);
log_nu_i = log_nu_i - (df*nu_i)/(d2f*nu_i*nu_i+df*nu_i); // Look at this line in particular
}
while (fabs(df) > NEWTON_THRESH);
vset(var->nu, i, exp(log_nu_i));
"
First it seems that the optimization is done in the log domain, but I don't understand how to get these update equations.
log_nu_i = log_nu_i - (df*nu_i)/(d2f*nu_i*nu_i+df*nu_i);
Moreover, it seems that this equation can be simplified by nu_i.
And in these equations, I see nothing which could correspond to the nu_i>0 constraint.
If we look at the equation we want to set to 0 (first derivative), we can see that the nu_i value where this equation is 0 is never negative.
Proof.
By taking (nu_i)^2 as the variable (in the code nu_i correspond in fact to (nu_i)^2 ).
dL/d((nu_i)^2)=-(Sigma_ii)/2 - (N/(2 zeta)) exp(lambda_i+(1/2)*(nu_i)^2 ) + 1 / (2 (nu_i)^2)
Sigma_ii>0, zeta>0, exp(x)>0
So -(Sigma_ii)/2 - (N/(2 zeta)) exp(lambda_i+(1/2)*(nu_i)^2 )<0
If we assume (nu_i)^2<0, then we have 1 / (2 (nu_i)^2)<0, and dL/d((nu_i)^2)<0 so dL/d((nu_i)^2)!=0.
So it seems that there is no need for a positivity constraint.
To sum up my questions:
Do we need to take into account the positivity constraint?
Are the update equations used in Blei and Lafferty code correct, and if so, how do you derivate them?
Best regards,
Clement Lesaege
Technicolor Intern
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.cs.princeton.edu/pipermail/topic-models/attachments/20160520/03c92b9c/attachment.html>
More information about the Topic-models
mailing list