more on the 64-bit problem: hope?
I've been looking a little more into ChucK's issues with 64-bit platforms, and thought I should summarize what I've found. I have managed to get it running to a certain point, but of course many things don't work. But I got basic math and the <<<>>> operator working. No point posting a patch since it would be almost totally useless, so instead I'll describe it in English.

By the way, everything I'll discuss here is flagged in the source, so Ge is of course aware of these things, but I thought it might help to start discussing it a bit.

Basically the 64-bit problems come down to two issues, as far as I can see:

- the stack pointer is divided by 4 instead of divided by 8.
- assumptions of type depending on size=4 or size=8.

How many issues are we dealing with?

    $ grep -e 'ISSUE' * | grep 64 | wc -l
    70

This is not trivial, but you know, 70 seems kind of manageable. Where are they, mostly?

    $ grep -e 'ISSUE' * | grep 64 | cut -d: -f1 | sort -u
    chuck_emit.cpp
    chuck_instr.cpp
    util_opsc.h

Hey, that's not bad, the problems are mostly just in these three files. Let's look at the different kinds of problems. Typically the type assumptions have lines like "size == 4" or "size == 8".

    $ grep -e '== 4' * | wc -l
    38
    $ grep -e '== 4' * | grep ISSUE | cut -d: -f1 | sort -u
    chuck_emit.cpp
    chuck_instr.cpp

Okay, we can assume the type assumptions constitute about half of the issues. What about the divide-by-4 problems, which can simply be replaced by divide-by-8? These are usually something like ">> 2":

    $ grep -e '>> 2)' -e '>> 2 ' * | wc -l
    49
    $ grep -e '>> 2)' -e '>> 2 ' * | cut -d: -f1 | sort -u
    chuck_instr.cpp
    util_opsc.cpp
    util_sndfile.c

We hit on a few more potential areas there, but it's still in the same ballpark. These divide-by-4 errors are pretty simple to fix anyway: just replace >> 2 with >> 3.

So, in summary, the problems are concentrated in something like 70 to 80 lines of only 3 or 4 source files.
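To make the divide-by-4 issue concrete, here is a minimal sketch of converting a byte offset into a count of stack words. It is not taken from the ChucK source: `vm_word`, `words_hardcoded` and `words_portable` are hypothetical names, and `vm_word` is assumed to be pointer-sized. Note that deriving the divisor from `sizeof` is a different fix from the plain `>> 2` to `>> 3` replacement described above; it is shown only because it stays correct on both 32-bit and 64-bit builds.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the VM's stack word (not ChucK's actual type).
// std::intptr_t is 4 bytes on typical 32-bit targets and 8 bytes on 64-bit ones.
typedef std::intptr_t vm_word;

// Hard-coded shift: only correct when a stack word is exactly 4 bytes.
inline std::size_t words_hardcoded(std::size_t bytes) {
    return bytes >> 2;
}

// Divisor derived from the word type: correct for any word size.
inline std::size_t words_portable(std::size_t bytes) {
    return bytes / sizeof(vm_word);
}

// Helper showing the mismatch: returns true when the hard-coded shift
// over-counts the number of words held in `bytes`.
inline bool shift_overcounts(std::size_t bytes) {
    assert(bytes % sizeof(vm_word) == 0);  // expect whole words only
    return words_hardcoded(bytes) > words_portable(bytes);
}
```

On a 64-bit build, four words occupy 32 bytes, so the hard-coded version reports 8 words while the `sizeof`-based version reports 4.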
The division problems can, I think, mostly be fixed with search-and-replace. The erroneous type assumptions will need more attention, for sure. Fixing them basically means refactoring the execution code for all instructions, making sure not to assume that 4-byte values are either integers or string pointers and that 8-byte values are floating point. IMHO it would be better to change this code to get the data type from the actual data type member of the Chuck_Type struct, which is *right there*, called "xid".

There might be other issues hiding away, of course. And I haven't even begun looking at the DSP code, or any of the ugens, since I've stuck to simple test cases so far. However, I think there's a good chance that this could be solved with a little attention. Raking over the VM code didn't turn up any other major issues at first glance.

By the way, this little excursion into the source made me notice that every single chuck instruction is executed as a virtual function. This means that for every instruction the struct must be dereferenced, and then its vtable, and then the function can be called. This would be a great place to look into for optimization, I think. A jump table or big switch statement with inlined functions might be way faster.

cheers,
Steve
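The type-assumption problem can be illustrated with a small sketch. Everything here is hypothetical apart from the member name "xid", which comes from the message above: the `TypeId` values, the `Type` struct and both functions are invented for illustration and are not ChucK's actual definitions. The point is only that once an integer is 8 bytes wide, size can no longer tell it apart from a float, while an explicit type tag still can.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical type tags; ChucK's real identifiers may differ.
enum TypeId { T_INT, T_FLOAT, T_STRING };

// Hypothetical miniature of a type descriptor carrying a tag and a size.
struct Type {
    TypeId xid;        // explicit type tag
    std::size_t size;  // size in bytes on the current platform
};

enum Kind { KIND_INT_OR_PTR, KIND_FLOAT, KIND_UNKNOWN };

// Fragile: infers the kind of value from its size, which only works
// when integers and pointers are 4 bytes and floats are 8 bytes.
inline Kind kind_by_size(const Type& t) {
    if (t.size == 4) return KIND_INT_OR_PTR;
    if (t.size == 8) return KIND_FLOAT;
    return KIND_UNKNOWN;
}

// More robust: asks the type tag directly, independent of size.
inline Kind kind_by_id(const Type& t) {
    switch (t.xid) {
        case T_INT:
        case T_STRING:
            return KIND_INT_OR_PTR;
        case T_FLOAT:
            return KIND_FLOAT;
    }
    return KIND_UNKNOWN;
}
```

With an 8-byte integer such as `Type int64 = { T_INT, 8 };`, `kind_by_size` misclassifies the value as a float, whereas `kind_by_id` still treats it as an integer.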
On Wed, 2008-08-06 at 21:44 -0400, Stephen Sinclair wrote:
> By the way, this little excursion into the source made me notice that every single chuck instruction is executed as a virtual function. This means that for every instruction the struct must be dereferenced, and then its vtable, and then the function can be called. This would be a great place to look into for optimization, I think. A jump table or big switch statement with inlined functions might be way faster.
1. Before you do any optimization, it might be worthwhile to do some profiling. I don't happen to have a 32-bit machine handy (actually, I have one, but it's a laptop and probably way underpowered for ChucK). But I do have the "recipes" for profiling Linux code down to the chip level (cache misses, pipeline stalls, close-enough-for-government-work annotated source and assembler listings, etc.), courtesy of "oprofile".

2. If you're digging down into virtual machines, "pseudo-instruction dispatching", interpreters, etc., you should probably read http://www.complang.tuwien.ac.at/forth/threaded-code.html. There's a little "trick" you can use to speed up threaded code if you're willing to use a non-standard feature of GCC that most compilers don't have, and some other things you can do to speed up other forms of dispatch.

-- 
M. Edward (Ed) Borasky
ruby-perspectives.blogspot.com

"A mathematician is a machine for turning coffee into theorems." -- Alfréd Rényi via Paul Erdős
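For readers unfamiliar with the dispatch styles discussed in this thread, the following is a minimal sketch of the "big switch statement" variant for a toy stack machine. The opcodes, the `Instr` struct and `run` are all hypothetical and unrelated to ChucK's real instruction set, and the sketch assumes a well-formed program (no stack underflow checks). It does not attempt the GCC-specific trick mentioned above, and whether a switch actually beats virtual calls is exactly what profiling would have to show.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical opcodes for a toy stack machine (not ChucK's instructions).
enum Op { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

struct Instr {
    Op op;
    long arg;  // used only by OP_PUSH
};

// Executes a program with one switch per instruction instead of one
// virtual call per instruction. Assumes the program is well formed.
inline long run(const std::vector<Instr>& code) {
    std::vector<long> stack;
    for (std::size_t pc = 0; pc < code.size(); ++pc) {
        const Instr& in = code[pc];
        switch (in.op) {
            case OP_PUSH:
                stack.push_back(in.arg);
                break;
            case OP_ADD: {
                long b = stack.back(); stack.pop_back();
                long a = stack.back(); stack.pop_back();
                stack.push_back(a + b);
                break;
            }
            case OP_MUL: {
                long b = stack.back(); stack.pop_back();
                long a = stack.back(); stack.pop_back();
                stack.push_back(a * b);
                break;
            }
            case OP_HALT:
                return stack.empty() ? 0 : stack.back();
        }
    }
    // Fell off the end without OP_HALT: return the top of stack, if any.
    return stack.empty() ? 0 : stack.back();
}
```

A program that pushes 2 and 3, adds them, pushes 4, multiplies and halts leaves (2 + 3) * 4 on top of the stack.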
participants (2)
- M. Edward (Ed) Borasky
- Stephen Sinclair