On 24 Jul 2009, at 19:57, Kassen wrote:
ChucK threads, or "shreds", need only be synchronized at every sample time. So intermediate calculations can be put on several system (POSIX) threads. That is one possibility.
If only it were so simple. Consider a variable and three shreds. Let all three shreds read the variable, apply some manipulation to it and write the result back, then advance time by "1.0/ Std.rand2(3,8)::samp". Advancing time by fractions of a samp is perfectly valid; we have a temporal resolution that goes down to a smaller granularity than the clock ticks of the host CPU. Evidently this is all purely imaginary and only used to determine calculation order to an extreme precision.
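A minimal sketch of that scenario, in Python rather than ChucK, and not how the ChucK VM is actually implemented. The name run_shreds, the fixed step sizes (instead of Std.rand2) and the tie-break by shred id are all assumptions made for illustration. It shows a single-threaded scheduler waking shreds strictly in order of their (fractional) wake-up time, so the result depends on that order even though everything happens within one sample:

```python
import heapq
from fractions import Fraction


def run_shreds(steps, updates_per_shred):
    """Run hypothetical 'shreds' that share one variable, in strict time order.

    steps[i] is how far (in samples) shred i advances time after each update.
    Returns the final shared value and a log of (time, shred_id) wake-ups.
    """
    shared = 1
    log = []
    # Heap entries are (wake_time, shred_id, update_count); ties in time are
    # broken by shred id, which is an assumption of this sketch.
    queue = [(Fraction(0), sid, 1) for sid in range(len(steps))]
    heapq.heapify(queue)
    while queue:
        now, sid, count = heapq.heappop(queue)
        # Read, manipulate, write back. The operation is deliberately
        # order-dependent, so a different wake-up order changes the result.
        shared = shared * 2 + sid
        log.append((now, sid))
        if count < updates_per_shred:
            heapq.heappush(queue, (now + steps[sid], sid, count + 1))
    return shared, log


# Three shreds advancing by 1/3, 1/5 and 1/7 of a sample, two updates each.
final, log = run_shreds([Fraction(1, 3), Fraction(1, 5), Fraction(1, 7)], 2)
print(final, [sid for _, sid in log])
```

With these steps the wake-up order is 0, 1, 2, 2, 1, 0 and the final value is 106; running the same six updates in the order 0, 1, 2, 0, 1, 2 would give 100 instead. All six updates fall inside a single sample, so synchronizing once per sample does not by itself fix the order.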
If time isn't real, and only emulated, it might be parallelized, but that may require special syntax. One concept is "causality": if at some computed time a decision is made as to how to proceed with new calculations, then that is not immediately parallelizable. Only things that flow on their own in such a causality diagram can be parallelized.
Merely syncing every samp isn't precise enough; no matter how often you sync, I will be able to write a ChucK program that yields a different result on your multi-threaded system from what it currently yields on the single-threaded, strongly timed VM, at least until you start syncing at arbitrary moments in between CPU instructions. I'm fairly sure you won't be doing that. At the very least, the moments you sync at should take the way shreds advance time into account.
The idea is to treat time between samples as only computed, emulated, and not real. The program computes ahead, and collects the values for the next sample to be presented.
Another thing is that you do not yet have a solution for how to divide n+i shreds over n cores when it can't be determined ahead of time how much time any of them will take.
It's a non-trivial issue, to say the least. The whole syntax of ChucK looks like it is meant to enable concurrency, but behind the scenes it is actually dealing with calculation order instead. Right now it simply attempts to do that to the best of its abilities, and when the CPU runs out, that's it. In a multi-threaded system you will need to determine exactly what "to the best of its abilities" means for the next period, and that's notoriously hard.
So in this vein, I posted a POSIX-threaded sample illustrating how to reuse threads when searching files in a small 'grep'-like program. One can use any number of threads for any number of files. Each thread opens a file and searches it. When one thread is finished, if there are more files to search, it continues to the next. So if this can be done with sample times as snapshots, all that is needed is sufficient CPU power to complete all computations before the next sample is presented.
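A rough Python sketch of that thread-reuse pattern, for readers who do not have the posted sample at hand. This is not the POSIX sample itself, which is not reproduced here; the function name search_files and its parameters are hypothetical. A fixed number of threads pull file paths from a shared queue, and each thread moves on to the next file as soon as it finishes the current one:

```python
import queue
import threading


def search_files(paths, needle, num_threads=2):
    """Return {path: [1-based line numbers containing needle]}.

    A fixed pool of threads is reused across any number of files.
    """
    work = queue.Queue()
    for path in paths:
        work.put(path)

    results = {}
    lock = threading.Lock()  # protects the shared results dict

    def worker():
        while True:
            try:
                path = work.get_nowait()
            except queue.Empty:
                return  # no files left, so this thread is done
            with open(path, encoding="utf-8", errors="replace") as f:
                hits = [n for n, line in enumerate(f, 1) if needle in line]
            with lock:
                results[path] = hits

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Here the files are independent of each other, so it does not matter which thread handles which file or in what order they finish. That independence is the property the shred case would also need.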
So is that not possible in your example?
No. My example deals with the UGen graph. For the 5 seconds it runs, no shreds are doing any work; it's meant to illustrate a UGen graph that won't run on a single core for a given number of SinOsc's, yet that can't benefit from multiple cores either (as far as I know) because all UGens are interconnected. It's meant to show that multiple CPUs aren't always a substitute for clock speed. This is a well-known phenomenon; I wrote the example specifically to cause that sort of problem.
Is this some kind of resource starvation setup? Like in:
http://en.wikipedia.org/wiki/Dining_philosophers_problem
http://en.wikipedia.org/wiki/Resource_starvation

Then those things cannot be prevented by the design of the computer language, just as one cannot prevent non-termination (infinite loops) from being programmed. So one is back at finding better syntaxes that support what works. It may require new syntax.

Not immediately related to this, I just happened to read a bit about it: Apple has launched "Grand Central Dispatch", which involves syntax extensions to C, C++ and Obj-C that allow one to identify blocks of parallelizable code.
http://en.wikipedia.org/wiki/Grand_Central_Dispatch

Hans