[Topic-models] [CTM] Constrained Newton Method for the nu (ν) in Correlated Topic Models
Lesaege Clement
Clement.Lesaege at technicolor.com
Fri May 20 09:49:45 EDT 2016
Dear everyone,
I found our the derivation the derivations of the optimization of ν_i (less than an hour after posting that, despite having spent a significant amount of time on it before asking for help, it seems that rubber duck debugging (https://en.wikipedia.org/wiki/Rubber_duck_debugging) also works for math problems).
The code of Blei and Lafferty is effectively optimizing the ν_i in the log domain without using any constraints. (In my previous mail I've shown that the solution is always positive so constrained optimization is unnecessary).
So we simply have to define w=log( (ν_i)^2 ) and derive the Newton method in respect with w.
We don't need to compute derivative of the L (log-likelyhood) in respect to Omega, but can get an expression of the update equations directly from the derivatives of L in respect with (ν_i)^2 .
We use the ' symbol as derivative in respect to (ν_i)^2.
We have L((ν_i)^2) = L(exp(w))
dL/dw = L'(exp(w)) exp(w)=L'((ν_i)^2) (ν_i)^2
(d^2L)/(d(w^2))=[L'((ν_i)^2) (ν_i)^2] (ν_i)^2 + L'((ν_i)^2) [exp(w)]' = L''((ν_i)^2)) ((ν_i)^2)^2 + L'((ν_i)^2) (ν_i)^2
So using the Newton method we have the newton update equation:
w_{n+1}=w_n-(dL/dw)/((d^2L)/(d(w^2))) = w_n - (L'((ν_i)^2)((ν_i)^2)/(L''((ν_i)^2)) ((ν_i)^2)^2 + L'((ν_i)^2) (ν_i)^2) (1)
w_{n+1}=w_n - (L'((ν_i)^2))/(L''((ν_i)^2)(ν_i)^2+L'((ν_i)^2)) (2)
The equation (1) is the one used in the Blei and Lafferty code, the equation (2) is equation (1) simplified by (ν_i)^2.
So I answer my own questions:
>>>Do we need to take into account the positivity constraint?
No.
>>>Are the update equations used in Blei and Lafferty code correct, and if so, how do you derivate them?
Yes, but they are doing extra multiplications because we can simplify by nu_i. Derivation is above.
I searched for Newton method in the log-domain in the literature but did not find any results. If someone has a reference, it would interest me.
Best regards,
Clément Lesaege
Technicolor Intern
________________________________
From: topic-models-bounces at lists.cs.princeton.edu <topic-models-bounces at lists.cs.princeton.edu> on behalf of Lesaege Clement <Clement.Lesaege at technicolor.com>
Sent: Friday, May 20, 2016 2:00:00 PM
To: topic-models at lists.cs.princeton.edu
Subject: [Topic-models] [CTM] Constrained Newton Method for the nu (ν) in Correlated Topic Models
Dear everyone,
I'm trying to understand the process of optimizing the ν_i in the variational step of Variational EM for Correlated Topic Models.
In the long article "A correlated topic model of science", the indications are:
"We use Newton’s method for each coordinate with the constraint that ν_i > 0".
It seems to refer to the barrier method. (Explained simply there: https://www.or-exchange.org/questions/1160/using-newtons-algorithm-with-inequality-constraints )
"The basic outline of a barrier method is:
1. Start with a problem min{f(x) | x >= 0}, say.
2. Replace with min f(x) - mu sum_i log x_i
3. Apply a truncated Newton's method to the resulting unconstrained problem. (I.e., don't let x go all the way to zero.)
4. At each iteration, reduce mu.
5. Stop when you reach a satisfactory solution to the unconstrained problem."
However
The code of Blei and Lafferty (available http://www.cs.princeton.edu/~blei/ctm-c/ ) corresponding to the optimization of ν_i is the following:
"
log_nu_i = log(init_nu);
do
{
iter++;
nu_i = exp(log_nu_i);
// assert(!isnan(nu_i));
if (isnan(nu_i))
{
init_nu = init_nu*2;
printf("warning : nu is nan; new init = %5.5f\n", init_nu);
log_nu_i = log(init_nu);
nu_i = init_nu;
}
// f = f_nu_i(nu_i, i, var, mod, d);
// printf("%5.5f %5.5f \n", nu_i, f);
df = df_nu_i(nu_i, i, var, mod, d);
d2f = d2f_nu_i(nu_i, i, var, mod, d);
log_nu_i = log_nu_i - (df*nu_i)/(d2f*nu_i*nu_i+df*nu_i); // Look at this line in particular
}
while (fabs(df) > NEWTON_THRESH);
vset(var->nu, i, exp(log_nu_i));
"
First it seems that the optimization is done in the log domain, but I don't understand how to get these update equations.
log_nu_i = log_nu_i - (df*nu_i)/(d2f*nu_i*nu_i+df*nu_i);
Moreover, it seems that this equation can be simplified by nu_i.
And in these equations, I see nothing which could correspond to the nu_i>0 constraint.
If we look at the equation we want to set to 0 (first derivative), we can see that the nu_i value where this equation is 0 is never negative.
Proof.
By taking (nu_i)^2 as the variable (in the code nu_i correspond in fact to (nu_i)^2 ).
dL/d((nu_i)^2)=-(Sigma_ii)/2 - (N/(2 zeta)) exp(lambda_i+(1/2)*(nu_i)^2 ) + 1 / (2 (nu_i)^2)
Sigma_ii>0, zeta>0, exp(x)>0
So -(Sigma_ii)/2 - (N/(2 zeta)) exp(lambda_i+(1/2)*(nu_i)^2 )<0
If we assume (nu_i)^2<0, then we have 1 / (2 (nu_i)^2)<0, and dL/d((nu_i)^2)<0 so dL/d((nu_i)^2)!=0.
So it seems that there is no need for a positivity constraint.
To sum up my questions:
Do we need to take into account the positivity constraint?
Are the update equations used in Blei and Lafferty code correct, and if so, how do you derivate them?
Best regards,
Clément Lesaege
Technicolor Intern
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.cs.princeton.edu/pipermail/topic-models/attachments/20160520/06d0808f/attachment-0001.html>
More information about the Topic-models
mailing list