ChucK threads, or "shreds", only need to be synchronized once every sample time, so intermediate calculations can be put on several system (POSIX) threads. That is one possibility.

If only it were so simple. Consider a variable and three shreds. Let all three shreds read the variable, apply some manipulation to it and write the result back, then advance time by "1.0/Std.rand2(3,8)::samp". Advancing time by fractions of a samp is perfectly valid; we have a temporal resolution that goes down to a finer granularity than the clock ticks of the host CPU. Of course this resolution is purely imaginary and only used to determine calculation order to an extreme precision. Merely syncing every samp isn't precise enough; no matter how often you sync, I will be able to write a ChucK program that yields a different result on your multi-threaded system from what it currently yields on the single-threaded, strongly-timed VM, at least until you start syncing at arbitrary moments in between CPU instructions. I'm fairly sure you won't be doing that. At the very least, the moments you sync at should take into account the way shreds advance time.
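To make the ordering problem concrete, here is a minimal sketch in Python rather than ChucK, so it is a model of the argument and not of the actual VM. Everything in it is an assumption made for illustration: the three manipulations in OPS, the fixed denominators in DENOMS (standing in for Std.rand2(3,8) so the run is reproducible), the rule that ties are broken by shred index, and the "one batch per shred per samp" model of a system that only syncs on whole samples.

```python
import heapq
from fractions import Fraction

# Hypothetical per-shred manipulations, chosen to be non-commutative so order matters.
OPS = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
# Fixed stand-ins for Std.rand2(3,8): shred i advances time by 1/DENOMS[i] samp per step.
DENOMS = [3, 4, 5]

def strongly_timed(end=Fraction(1)):
    """Run every step in exact virtual-time order (assumed tie-break: shred index)."""
    x = 1
    queue = [(Fraction(0), i) for i in range(3)]
    heapq.heapify(queue)
    trace = []
    while queue:
        t, i = heapq.heappop(queue)
        if t >= end:
            continue  # this shred is done for the window we simulate
        x = OPS[i](x)          # read, manipulate, write back
        trace.append(i)
        heapq.heappush(queue, (t + Fraction(1, DENOMS[i]), i))  # advance time
    return x, trace

def synced_per_samp(order, end=Fraction(1)):
    """Model of syncing only on whole samples: each shred runs all of its
    steps for the samp as one batch, and the batches run in `order`."""
    x = 1
    for i in order:
        t = Fraction(0)
        while t < end:
            x = OPS[i](x)
            t += Fraction(1, DENOMS[i])
    return x

print(strongly_timed())
print(synced_per_samp((0, 1, 2)), synced_per_samp((2, 1, 0)))
```

Within a single samp the exact ordering interleaves the three shreds twelve times, and neither batch order reproduces its result. With real threads the interleaving would additionally vary from run to run, which this sketch does not attempt to model.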

Another thing is that you do not yet have a solution for how to divide n+i shreds over n cores when it can't be determined ahead of time how much time any of them will take.

It's a non-trivial issue, to say the least. The whole syntax of ChucK looks like it is meant to enable concurrency, but behind the scenes it is actually dealing with calculation order instead. Right now it simply attempts to do that to the best of its abilities, and when the CPU runs out, that's it. In a multi-threaded system you will need to determine exactly what "to the best of its abilities" means for the next period, and that's notoriously hard.

So is that not possible in your example?

No, my example deals with the UGen graph. For the 5 seconds it runs, no shreds are doing any work. It's meant to illustrate a UGen graph that, for a given number of SinOscs, won't run on a single core yet can't benefit from multiple cores either (as far as I know), because all the UGens are interconnected. It's meant to show that multiple CPUs aren't always a substitute for clock speed. This is a well-known phenomenon; I wrote the example specifically to cause that sort of problem.

If not, one may need to invent new syntax which is parallelizable. Specifically, probably not much can be done about highly serial "for" loops without such rewriting.

Yes, but not for all kinds of calculation. For some kinds of calculation a dual core is nearly as fast as a single CPU at twice the speed; for others the second core won't help at all. Most ChucK programs (or at least the kind of calculation they describe) will be somewhere in between those two extremes.
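One standard way to put rough numbers on those two extremes is Amdahl's law, which gives the ideal speedup when only part of the work can be spread over several cores. That framing is my addition here, not something measured on ChucK, and the function name and the example fractions below are purely illustrative.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Ideal speedup when only `parallel_fraction` of the work can use `cores` cores."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)

# Illustrative fractions only: fully serial, half parallel, fully parallel.
for p in (0.0, 0.5, 1.0):
    print(p, amdahl_speedup(p, 2))
```

A fully interconnected calculation sits at the first case (the second core buys nothing), a perfectly divisible one at the last (twice as fast), and a program that is half serial gains only about a third on two cores. The model ignores synchronization overhead, so real results would be somewhat worse.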

A second possibility might be to have several ChucK synchronizing threads, each synchronizing a set of shreds, with a less rigid timing requirement between the sets, such as a hundredth of a second. This would be suitable for making different sound generators.

Maybe... I'm not saying we can't ever benefit from multiple-CPU configurations at all, "merely" pointing out that there are very hard questions there.