MSA-A Matter of Speed-Ruby

__NOTITLE__

Table of Contents = A Matter of Speed (Ruby Version) =

Slowdown by a Factor Two
Dan: Clean and pretty, sure, but is it fast enough? Let's compare the speed of the old code and the new vector code. I wonder whether all this fancy vector stuff is affecting execution speed.

Carol: We can use the  time command, to find out how much time a code spends before finishing. We can redirect the standard output to /dev/null, literally a null device, effectively a waste basket. This is a Unix way of throwing the ordinary output away, the output that appears on the standard output channel. In that way, we are left only with output that appears on the standard error channel, such as timing information provided by the  time command.

Let me run all three forward Euler codes, the original version we wrote, the array version, and the vector version:

|gravity> time ruby euler.rb > /dev/null 0.053u 0.000s 0:00.05 100.0%  0+0k 0+0io 0pf+0w |gravity> time ruby euler_array_each_def.rb > /dev/null 0.077u 0.000s 0:00.07 100.0%  0+0k 0+0io 0pf+0w |gravity> time ruby euler_vector.rb > /dev/null 0.134u 0.001s 0:00.14 92.8%   0+0k 0+0io 0pf+0w

Dan: It seems that using our nifty vector notation, we get a penalty in speed of a factor two. But before jumping to conclusions, can we check once more that we are really doing the same calculations in all three cases?

Carol: Sure, here are the last lines of all three calculations:

|gravity> ruby euler.rb | tail -1 7.6937453936572 -6.27772005661599  0.0  0.812206830641815  -0.574200201239989  0.0 |gravity> ruby euler_array_each_def.rb | tail -1 7.6937453936572 -6.27772005661599  0.0  0.812206830641815  -0.574200201239989  0.0   |gravity> ruby euler_vector.rb | tail -1 7.6937453936572 -6.27772005661599  0.0  0.812206830641815  -0.574200201239989  0.0

Dan: Good! And now I have another question. Presumably even the first version is slower than an equivalent code written in Fortran or C or C++. I would like to know how much slower.

A C Version of Forward Euler
Carol: That's a good idea. Given that our first code, euler.rb, is rather simple, it should be easy to translate it into C. And to make the time measurement a bit more precise, I'll make the time step a hundred times smaller, so that we let the code make a hundred thousand steps.

Finally, we are interested now in execution speed, and for now at least we don't worry about the cost of frequent outputs. After all,when we switch to the real N-body problem, for N>2, the total costs will be dominated by inter-particle force calculations, not by print statements.

Here is the C version, for 100,000 steps, with only one output at the end, in file euler_100000_steps.c:

int main {    double r[3], v[3], a[3]; double dt = 0.0001; int ns, k;    r[0] = 1; r[1] = 0; r[2] = 0; v[0] = 0; v[1] = 0.5; v[2] = 0; for (ns = 0; ns &lt; 100000; ns++){ double r2 = r[0]*r[0] + r[1]*r[1] + r[2]*r[2]; for (k = 0; k &lt; 3; k++) a[k] = - r[k] / (r2 * sqrt(r2)); for (k = 0; k &lt; 3; k++){ r[k] += v[k] * dt; v[k] += a[k] * dt; }    }     printf("%.15g %.15g %.15g %.15g %.15g %.15g\n",            r[0], r[1], r[2], v[0], v[1], v[2]); }
 * 1) include &lt;stdio.h>
 * 2) include &lt;math.h>

Let me compile and run it:

|gravity> gcc -o euler_100000_steps euler_100000_steps.c -lm |gravity> time euler_100000_steps 0.292716737827072 0.382907748579753 0 -1.56551896976935 -0.313957063866527 0 0.070u 0.000s 0:00.08 87.5%   0+0k 0+0io 0pf+0w

Dan: Pretty fast indeed! But you really should compile it with the optimizer flag -O, to bring out the real speed that C is capable off.

Carol: Good point! Here goes:

|gravity> gcc -O -o euler_100000_steps euler_100000_steps.c -lm |gravity> time euler_100000_steps 0.292716737827197 0.382907748579793 0 -1.56551896976913 -0.313957063866306 0 0.035u 0.000s 0:00.03 100.0%  0+0k 0+0io 0pf+0w

Dan: Even faster. One and a half times faster, it seems, but we can't really be sure, given the limited accuracy of the timing output. In any case, the C code really flies!

Erica: Note that the output values differ in the last few significant digits. That must be because optimization causes a different order of arithmetic operations, which means that the roundoff errors are different too. Since we are taking a hundred thousand steps, it is perhaps not so strange that we are losing several digits in accuracy.

Dan: Time to compare these results with the Ruby code. I have this sinking feeling that it will be muuuuuch slower.

A Simple Ruby Version
Carol: You are probably right. Here is the simplest forward Euler version, also with only one output statement at the end, in euler_100000_steps.rb:

include Math x = 1 y = 0 z = 0 vx = 0 vy = 0.5 vz = 0 dt = 0.0001 100000.times{ r2 = x*x + y*y + z*z r3 = r2 * sqrt(r2) ax = - x / r3  ay = - y / r3   az = - z / r3   x += vx*dt y += vy*dt z += vz*dt vx += ax*dt vy += ay*dt vz += az*dt } print(x, " ", y, "  ", z, "  ") print(vx, " ", vy, "  ", vz, "\n")

and here is the timing result:

|gravity> time ruby euler_100000_steps.rb 0.292716737826681 0.38290774857968  0.0  -1.56551896976999  -0.313957063867402  0.0 1.841u 0.003s 0:02.03 90.6%   0+0k 0+0io 0pf+0w

Dan: A dramatic difference. As far as we can see here, Ruby is at least thirty times slower than C!

Erica: That's a bit of a shock.

Carol: Yes, I knew that Ruby would be slow, but I didn't expect it to be quite that slow. Well, at least C and Ruby give the same output results, apart from the last few digits, which change anyway in C when we switch on optimization, as we have seen. So as far as what it is the same between the optimized and unoptimized C results, Ruby produces that part exactly.

Dan: Let's complete the exercise and make similar versions for arrays and vectors.

A Ruby Array Version
Carol: Here is the array version, in euler_array_100000_steps.rb:

include Math def print_pos_vel(r,v) r.each{|x| print(x, " ")} v.each{|x| print(x, " ")} print "\n" end r = [1, 0, 0] v = [0, 0.5, 0] dt = 0.0001 100000.times{ r2 = 0 r.each{|x| r2 += x*x} r3 = r2 * sqrt(r2) a = r.map{|x| -x/r3} r.each_index{|k| r[k] += v[k]*dt} v.each_index{|k| v[k] += a[k]*dt} } print_pos_vel(r,v)

and here is what timing gives:

|gravity> time ruby euler_array_100000_steps.rb 0.292716737826681 0.38290774857968  0.0  -1.56551896976999  -0.313957063867402  0.0 3.449u 0.001s 0:05.46 63.0%   0+0k 0+0io 0pf+0w

Dan: So adding arrays let Ruby slow down by a factor two.

A Ruby Vector Version
Carol: And here is the vector version, in :

require "vector.rb" include Math def print_pos_vel(r,v) r.each{|x| print(x, " ")} v.each{|x| print(x, " ")} print "\n" end r = [1, 0, 0].to_v v = [0, 0.5, 0].to_v dt = 0.0001 100000.times{ r2 = r*r r3 = r2 * sqrt(r2) a = -r/r3 r += v*dt v += a*dt } print_pos_vel(r,v)

and here is how slow it runs:

|gravity> time ruby euler_vector_100000_steps.rb 0.292716737826681 0.38290774857968  0.0  -1.56551896976999  -0.313957063867402  0.0 9.748u 0.005s 0:10.55 92.3%   0+0k 0+0io 0pf+0w

Dan: And now we're losing yet another factor of three. That's pretty terrible!

More Timing Precision
Carol: Let's try to get a bit more timing precision. Instead of taking a hundred thousands steps, we can take a million steps, to make the timing comparison more accurate. You can guess the names of the files I will create: euler_1000000_steps.c</tt>, euler_1000000_steps.rb</tt>, euler_array_1000000_steps.rb</tt> and euler_vector_1000000_steps.rb</tt>.

Here are the results:

|gravity> gcc -o euler_1000000_steps euler_1000000_steps.c -lm |gravity> time euler_1000000_steps 0.519970642634788 -0.376817992041735 0 1.17126787143565 0.114700879740638 0 0.456u 0.000s 0:00.48 93.7%   0+0k 0+0io 0pf+0w

|gravity> gcc -O -o euler_1000000_steps euler_1000000_steps.c -lm |gravity> time euler_1000000_steps 0.519970642633723 -0.376817992041925 0 1.17126787143747 0.114700879739195 0 0.539u 0.000s 0:00.58 91.3%   0+0k 0+0io 0pf+0w

|gravity> time ruby euler_1000000_steps.rb 0.519970642634004 -0.376817992041834  0.0  1.17126787143698  0.114700879739653  0.0 18.212u 0.003s 0:19.81 91.9%  0+0k 0+0io 0pf+0w

|gravity> time ruby euler_array_1000000_steps.rb 0.519970642634004 -0.376817992041834  0.0  1.17126787143698  0.114700879739653  0.0 35.190u 0.015s 0:38.24 92.0%  0+0k 0+0io 0pf+0w

|gravity> time ruby euler_vector_1000000_steps.rb 0.519970642634004 -0.376817992041834  0.0  1.17126787143698  0.114700879739653  0.0 99.017u 0.022s 1:49.38 90.5%  0+0k 0+0io 0pf+0w

Conclusion
Dan: This confirms our earlier conclusions. At least on this particular computer, that we are now using to do some speed tests, the unoptimized C version takes 50% more time than the optimized version, the simplest Ruby version takes about 50 times more time, the Ruby array version about 100 times more, and finally the Ruby vector version takes more than 250 times more time than the optimized C version.

Carol: But even so, for short calculations, who cares if a run takes ten millisecond or a few seconds? I certainly like the power of Ruby in giving us vector classes, and a lot more goodies. We have barely scratched the surface of all the power that Ruby can give us. You should see what we can do when we really start to pass blocks to methods and. ..

Dan:. . . and then we will start drinking a lot of coffee, while waiting for results when we begin to run 100-body experiments! Is there no way to speed up Ruby calculations?

Carol: There is. By the time we use 100 particles, we are talking about 10$2$⋅10$2$=10$4$ force calculations for every time step. This means that the calculation of the mutually accelerations will take up almost all of the computer time. What we can do is write a short C code for computing the accelerations. It is possible to invoke such a C code from within a Ruby code. In that way, we can leave most of the Ruby code unchanged, while gaining most of the C speed.

Erica: I certainly like the flexibility of a high-level language like Ruby, at least for writing a few trial versions of a new code. In order to play around, Ruby is a lot more fun and a lot easier to use than C or C++ or Fortran. After we have constructed a general N-body code that we are really happy with, we can always translate part of it into C, as Carol just suggested.

Or, if really needed to gain speed, we could even translate the whole code into C. Translating a code will always take far less time than developing a code in the first place. And is seems pretty clear to me that development will be faster in Ruby.

Dan: I'm not so sure about all that. In any case, we got started now with Ruby, so let us see how far we get. But if and when we really get bogged down by the lack of speed of Ruby, we should not hesitate to switch to a more efficient language.

Carol: Fair enough! Let's continue our project, and apply our vector formalism to second-order integration schemes.