<?xml version="1.0" encoding="utf8"?>
<?xml-stylesheet type="text/xsl" href=".xsl"?>
<s>
  <c>Better random numbers for javascript, v. 0.9</c>
  <s>
    <c>Update: 2011-11-09</c>
    <s>
    <c>I keep this page since it is currently very often reached from a <c ref="http://news.ycombinator.com/item?id=3211809">Hacker News</c> discussion and there is a thread about the XSLT.</c>
    <c>But actually, it would be better to go to the new <c ref="http://baagoe.org/en/wiki/Better_random_numbers_for_javascript">wiki</c>. This page will soon simply redirect there.</c>
    </s>
    <c>ECMAScript provides a standard function, <c class="code">Math.random</c>, which according to the specifications (<c ref="http://www.ecma-international.org/publications/standards/Ecma-262-arch.htm">ECMA-262, 3rd edition</c>, 15.8.2.14) "returns a number value with positive sign, greater than or equal to 0 but less than 1, chosen randomly or pseudo randomly with approximately uniform distribution over that range, using an implementation-dependent algorithm or strategy. This function takes no arguments."</c>
    <c>This is good enough for its intended use, viz., to provide an easy way for script authors to write cute animations, games, etc. However, if you want to use it for more serious programming, there are several problems.</c>
    <c>Why <c class="code">Math.random</c> should not be used for serious programming</c>
    <s>
      <c>No guarantee of quality</c>
      <s>
        <c>"Using an implementation-dependent algorithm or strategy" means that you don't usually know how the "random" numbers are made, and that you cannot examine the code to get an impression of how "random" they are likely to be, even if you know a lot about <c ref="http://en.wikipedia.org/wiki/Pseudorandom_number_generator">PRNG</c>s. You essentially have to trust the vendor. As a matter of fact, most <c class="code">Math.random</c> implementations are not good. J R Stockton's <c ref="http://www.merlyn.demon.co.uk/">Merlyn</c> site provides some <c ref="http://www.merlyn.demon.co.uk/js-randm.htm#TRFP">proofs</c> of that fact.</c>
      </s>
      <c>No standard way to repeat sequences</c>
      <s>
        <c>In most implementations if not in all, <c class="code">Math.random</c>'s generator is silently seeded at startup by some supposedly random value, usually the present time to the millisecond or whatever other granularity the system provides.</c>
        <c>This means that there is no easy way to repeat a series of pseudo-random numbers in order, e.g., to determine what went wrong in a simulation. It also means that you cannot give your code to a colleague and expect that she will find what you want to show. Anything that uses <c class="code">Math.random</c> is inherently not portable.</c>
      </s>
      <c>Security</c>
      <s>
        <c>On the other hand, the weakness of many implementations mean that a sequence of outputs of <c class="code">Math.random</c> <c class="emph">can</c> often be predicted, even when the intent was that it could not. This may present security risks. <c class="code">Math.random</c> should never be used to generate <c ref="http://www.trusteer.com/sites/default/files/Temporary_User_Tracking_in_Major_Browsers.pdf">separators in POST requests</c>, <c ref="http://en.wikipedia.org/wiki/Universally_Unique_Identifier">Version 4 UUIDs</c>, <c ref="http://en.wikipedia.org/wiki/Session_key">session keys</c>, etc.</c>
        <c>If one wants to do things like that in javascript, <c class="code">Math.random</c> is simply not good enough. Actually, even a <c ref="http://en.wikipedia.org/wiki/Cryptographically_secure_pseudorandom_number_generator">cryptographically secure PRNG</c> would not by itself be enough, you would also need some means of collecting truly random entropy to seed it.</c>
        <c>I shall address that problem separately. For now, let us concentrate on making PRNGs with better statistical properties and longer periods, which can be seeded either by something that is repeatable, or by, e.g., the current time if less predictable output is desired. Cryptological security and true, physical randomness are separate issues.</c>
      </s>
    </s>
    <c>Why it is difficult to write good PRNGs in javascript</c>
    <s>
      <c>Many families of excellent PRNGs have been conceived for other languages. However, javascript's notion of Numbers make most of them less than ideal for our purpose. They <c class="emph">can</c> be ported (javascript, like any other decent programming language, is <c ref="http://en.wikipedia.org/wiki/Turing_completeness">Turing complete</c>), but at a heavy price in speed.</c>
      <c>Only 53 significant bits in multiplications</c>
      <s>
        <c>Many present algorithms assume exact multiplications on 32-bit unsigned integers, yielding a 64-bit result. Some even multiply 64-bit integers, yielding a 128-bit result. If you want to emulate that in javascript, you have to write multi-precision routines. That is certainly possible, but hardly efficient.</c>
        <c>Even worse, most algorithms based on multiplications assume that overflow results in truncation and discard of the high bits, effectively giving modular arithmetic for free. This works fine on unsigned integers, but javascript Numbers are <c class="emph">always</c> "double-precision 64-bit binary format IEEE 754 values". Which means that in the case of overflow, it is the <c class="emph">low</c> order bits who are discarded.</c>
        <c>As an example, let us examine the best-known PRNG of all: the venerable <c ref="http://en.wikipedia.org/wiki/Linear_congruential_generator">LCG</c>, on 32-bit unsigned integers. If you use Marsaglia's "easy to remember" multiplier 69069, everything works nicely: 69069 being less than 2<c class="sup">21</c>, <c class="code">x = 69069 * x + 1 &gt;&gt;&gt; 0</c> will indeed cycle through all the possible values. But if you choose a bigger multiplier modulo 2<c class="sup">32</c> with good figures of merit in the <c ref="http://random.mat.sbg.ac.at/tests/theory/spectral/">spectral test</c>, say, 1103515245 which is used in ANSI C and the <c class="code">glibc</c> library, the result of the multiplication will almost never fit in 53 bits. Low order bits are discarded, which ruins the basis for the algorithm: its operation modulo 2<c class="sup">32</c>.</c>
      </s>
      <c>Limited and inefficient bit operations</c>
      <s>
        <c>Javascript has the usual set of bitwise operations: AND, OR, XOR, shifts, etc. However, they only apply to 32-bit integers, and since javascript Numbers are always floating point, it means that the Numbers must be converted internally to integers before the operation, and back to floats again when it is done. Smart implementations may attempt to optimise, but on all the platforms I have tried, the results remain slow.</c>
      </s>
      <c>Altogether, precious few of the usual tools are well adapted to javascript. Some ingenuity is required if one wants good performance.</c>
    </s>
    <c>A collection of better PRNGs</c>
    <s>
      <c>I present several PRNGs that should have much better statistical properties than the built-in <c class="code">Math.random</c>. They have been taken from, or inspired by, well-known sources on the subject, and they have been extensively tested. They share a common interface, but the actual algorithms differ, so that one can choose the best for one's platform and needs.</c>
      <c>Individual PRNGs in javascript can be viewed and downloaded in their respective entries below. The entire collection of javascript PRNGs can be downloaded <c ref="Alea-0.9.zip">here</c>, and the corresponding C versions <c ref="alea-0.9.tar.gz">here</c>.</c>
      <c/>
      <s>
        <c>The C versions are what is used for the tests. Those consume about 2<c class="sup">38</c> numbers and take more than 24 hours even in C; using javascript is hardly an option.</c>
        <c>They are completely equivalent to the javascript versions, as can be seen from the sources and checked by experiments. The only difference in output is when one uses non-ASCII arguments. (For the  C versions, the interpretation of such arguments depends on the platform.)</c>
        <c/>
      </s>
    </s>
    <c>Common interface</c>
    <s>
      <c>We shal take <c class="code">Alea</c> as example, but the other PRNGs -  <c class="code">MRG32k3a</c>,  <c class="code">KISS07</c>, etc -  share the same interface.</c>
      <c>Simple usage:</c>
      <s>
        <c class="code">var random = [new] Alea([...]);</c>
        <c>Calling Alea (or any of the others) with <c class="code">new</c> is unnecessary, but harmless. The optional arguments may be of any number and nature.</c>
        <c>The call returns a <c class="emph">function</c>, random, that is functionally equivalent to <c class="code">Math.random</c>. That is, successive calls of <c class="code">random()</c> return Number values with positive sign, greater than or equal to 0 but less than 1, chosen pseudo-randomly with approximately <c ref="http://en.wikipedia.org/wiki/Uniform_distribution_(continuous)">uniform distribution</c> over that range. (Hopefully, with a much better approximation of that uniform distribution.)</c>
        <c>Example:</c>
        <s>
          <c class="code">var random = new Alea('my', 3, 'seeds');</c>
          <c><c class="code">random(); //</c> returns 0.30802189325913787</c>
          <c><c class="code">random(); //</c> returns 0.5190450621303171</c>
          <c><c class="code">random(); //</c> returns 0.43635262292809784</c>
          <c>Any implementation of ECMAScript should yield exactly those values.</c>
        </s>
        <c>The internal state of random, and hence the sequence of pseudo-random numbers it returns, is determined by the arguments to Alea. Two functions returned by calls to Alea with the same argument values will return exactly the same sequence of pseudo-random numbers. String and Number arguments should provide repeatable output across platforms. Objects provide repeatable output on the same platform, but not necessarily on others. (The point of using Objects would be to provide thoroughly <c class="emph">unpredictable</c> output. As an extreme example, one could try <c class="code">Alea(document)</c> or <c class="code">Alea(this)</c> in a browser.)</c>
        <c>If you call <c class="code">Alea()</c>, that is, with no arguments, a single argument of <c class="code">+new Date()</c> is silently assumed. This provides an easy means to provide somewhat unpredictable numbers, like <c class="code">Math.random</c> does.</c>
        <c>Example:</c>
        <s>
          <c class="code">var random = Alea();</c>
          <c><c class="code">random(); //</c> returns 0.6198398587293923</c>
          <c><c class="code">random(); //</c> returns 0.8385338634252548</c>
          <c><c class="code">random(); //</c> returns 0.3644848605617881</c>
          <c>Your return values should be completely different. (But see below how you can reproduce mine on your computer.)</c>
        </s>
        <c>The generated numbers should be "more random" than in most implementations of <c class="code">Math.random</c>. Any sequence of any pick of the returned bits should pass any known test, unless where mentioned. (Please <c ref="mailto:baagoe@baagoe.com?subject=[Random Musings] Bad PRNG">send me a mail</c> if you have evidence that this is not true. But please remember that "p's happen": if you repeat any test a million times, some of the results are bound to have very suspicious p-values.)</c>
        <c>The specification of <c class="code">Math.random</c> says nothing about the precision of the returned fraction. Here, it is guaranteed that the precision is at least 32 bits.</c>
        <c/>
        <s>
          <c>Very often, to get the full attainable precision of 53 bits, the main generator has to be called twice, which takes more time. It would be wasteful to insist on doing that systematically, since a 32 bit resolution is enough in most cases. If you really need 53 bits, the function provides a method described below to get them.</c>
        </s>
      </s>
      <c>More advanced usage:</c>
      <s>
        <c>Functions being Objects in javascript, the returned function also has two methods, uint32 and fract53, and two properties, version and args.</c>
        <c><c class="emph">uint32</c> is a function that takes no arguments and returns an unsigned random integer in the range [0, 2<c class="sup">32</c>[.</c>
        <c>Example:</c>
        <s>
          <c class="code">var intRandom = Alea('').uint32;</c>
          <c><c class="code">intRandom(); //</c> returns 715789690</c>
          <c><c class="code">intRandom(); //</c> returns 2091287642</c>
          <c><c class="code">intRandom(); //</c> returns 486307</c>
          <c>Any implementation of ECMAScript should yield exactly those values.</c>
        </s>
        <c> To obtain an integer in [0, n[, one may simply take the remainder modulo n. With some generators, this is faster than using the default function in <c class="code">Math.floor(random() * n)</c> or its clever variants, <c class="code">random() * n | 0</c> and <c class="code">random() * n &gt;&gt;&gt; 0</c>.</c>
        <c><c class="emph">fract53</c> is a function that takes no arguments and returns a 53-bit fraction in [0, 1[. It is usually slower than the main function, though.</c>
        <c>Example:</c>
        <s>
          <c class="code">var rand53 = Alea('').fract53;</c>
          <c><c class="code">rand53(); //</c> returns 0.16665777435687268</c>
          <c><c class="code">rand53(); //</c> returns 0.00011322738143160205</c>
          <c><c class="code">rand53(); //</c> returns 0.17695781631176488</c>
          <c>Any implementation of ECMAScript should yield exactly those values.</c>
        </s>
        <c><c class="emph">version</c> is a String that indicates the nature and the version of the generator, like 'Alea 0.9'. (For now, all the generators have a version number of 0.9; those on first versions of this page had a version number of 0.5. The versions may eventually reach 1.0 when they have been quite thoroughly tested, inspected by experts, etc.)</c>
        <c><c class="emph">args</c> is an Array of the arguments that were given to Alea. Thus, if you called it without an argument and it assumed <c class="code">+new Date()</c>, you may recover the time stamp that was used in order to repeat the sequence.</c>
        <c>Example, assuming <c class="code">random</c> is the function created above without arguments:</c>
        <s>
          <c><c class="code">var seed = random.args; //</c> seed = 1277182878230 - I wrote the example on 22 June 2010 shortly after 7AM in France (summer time), exactly at 2010-06-22T07:01:18.230+02:00, 1277182878230 milliseconds after the Unix epoch.</c>
          <c class="code">random = Alea(1277182878230);</c>
          <c><c class="code">random(); //</c> returns 0.6198398587293923</c>
          <c><c class="code">random(); //</c> returns 0.8385338634252548</c>
          <c><c class="code">random(); //</c> returns 0.3644848605617881</c>
          <c>Any implementation of ECMAScript should yield exactly those values.</c>
        </s>
      </s>
    </s>
    <c>Common implementation details</c>
    <s>
      <c>Regardless of the differences in their algorithms, internal state, etc, all the generators provided here share a common implementation pattern. First, the internal state is declared. Then, it is seeded. Then, the main function for the algorithm is presented. Finally, the function-object to be returned is built and returned.</c>
      <c>Seeding and the <c class="code">Mash</c> function</c>
      <s>
        <c>The arguments to Alea (and the other provided generators) are always converted into a String which is hashed by the means of a suitable <c ref="http://en.wikipedia.org/wiki/Hash_function">hash function</c>. (You may test it <c ref="../hash/avalanche.xhtml">here</c>.)</c>
        <c>That hash function, with its initialised internal state, is provided by another function called <c class="code">Mash</c>, which must therefore be <c ref="Mash.js">included</c> in any application that uses any of the provided functions.</c>
        <c>(Alternatively, if you decide to use only one PRNG, you may simply copy <c class="code" ref="Mash.js">Mash</c> into your copy of the relevant file.)</c>
        <c>Each element of the internal state is altered by a hash of each argument. (The hash function preserves its own internal state, so that each of the alterations is likely to be different.) The exact nature of the alteration depends on the nature of the state element - if it is an integer, the hash is simply added, if it is a single bit, it is XOR-ed, if it is a fraction between 0 and 1, the hash is subtracted and the element is incremented by 1 if it has become negative, etc.</c>
        <c>Finally, whenever the PRNG algorithm places constraints on certain elements of its internal state, the last stage of the seeding process makes sure that these constraints are met.</c>
      </s>
      <c>At least two random functions, not just one</c>
      <s>
        <c>All the generators provide two functions, the main function which returns 32-bit or 53-bit fractions, and its uint32 method which returns 32-bit unsigned integers. They also provide the fract53 method which is guaranteed to return 53-bit fractions; it may or may not be the same as the main function.</c>
        <c>In most algorithms ported from other languages, the algorithm is actually implemented in the uint32 method. (Operating on 53-bit numbers without ever losing a bit to overflow leaves very few possibilities. 32-bit unsigned integers provide handy bit operations in addition to reducing the problems of unintended overflow; most PRNGs operate on integers for that reason.) The "main" container function calls that integer function and multiplies the result by 2<c class="sup">32</c>, the fract53 method calls it twice, once for its 32 high bits and once again for the 21 lowest.</c>
        <c>In the generators of the Lagged Fibonacci family, the PRNG algorithm directly provides the 53-bit fractions. The uint32 method multiplies by 2<c class="sup">32</c> and returns the floored result. The fract53 method is the same as the main container function.</c>
        <c>In Alea and Kybos, the main PRNG algorithm provides 32-bit fractions. Both uint32 and fract53 are derived.</c>
      </s>
    </s>
    <c id="Generators">The generators</c>
    <s>
      <c id="MRG32k3a" ref="MRG32k3a.js">MRG32k3a</c>
      <s>
        <c>One of Pierre L'Ecuyer's <c ref="http://www.iro.umontreal.ca/~lecuyer/myftp/papers/combmrg2.ps">Combined Multiple Recursive</c> PRNGs. It is remarkable for its ingenious use of 53-bit integer arithmetic (originally implemented in C <c class="code">double</c>s) which makes it very suitable for javascript, as well as for the care in the choice of multipliers and moduli. The period is about 2<c class="sup">191</c>, and it passes - of course -  L'Ecuyers own <c ref="http://www.iro.umontreal.ca/~simardr/testu01/tu01.html">TestU01</c> BigCrush battery of tests. (The <c ref="http://en.wikipedia.org/wiki/Mersenne_twister">Mersenne Twister</c> fails Crush.)</c>
        <c>This is surely the most reputable of all the generators presented here. It has been quite extensively tested and it is widely used. It is copyrighted but free for non-commercial uses. Commercial users must request written permission from Professor L'Ecuyer.</c>
      </s>
      <c id="Xorshift03" ref="Xorshift03.js">Xorshift03</c>
      <s>
        <c>George Marsaglia's <c ref="http://groups.google.com/group/comp.lang.c/msg/e3c4ea1169e463ae">2003 version</c> of his xorshift family of PRNGs. They are among the fastest available in integer arithmetic, but they become <c ref="time.xhtml">much slower</c> in javascript. Also, most variants tend to fail L'Ecuyer's tests which are rather more stringent than Marsaglia's own.</c>
        <c>This version, however, passes BigCrush, mainly because it contains a final multiplication of two components. Its period is about 2<c class="sup">160</c>.</c>
      </s>
      <c id="KISS07" ref="KISS07.js">KISS07</c>
      <s>
        <c>George Marsaglia's <c ref="http://groups.google.com/group/comp.lang.fortran/msg/6edb8ad6ec5421a5">2007 version</c> of his KISS family of PRNGs.</c>
        <c>It is deceptively simple. It merely adds the outputs of three small generators which individually would not have the slightest chance of passing any kind of randomness test. Yet, the result passes BigCrush. The period is about 2<c class="sup">121</c>.</c>
        <c>Its xorshift component makes it <c ref="time.xhtml">rather slow</c>, though. It remains a very attractive generator, because its combination of three completely different algorithms with different periods makes it likely that any bias from one of them is blurred by the two others.</c>
      </s>
      <c id="LFIB4" ref="LFIB4.js">LFIB4</c>
      <s>
        <c>Marsaglia's <c ref="http://groups.google.com/group/sci.crypt/msg/eb4ddde782b17051">LFIB4</c>, adapted to subtraction modulo 1 on 53-bit fractions. (One cannot add two 53-bit fractions in [0, 1[ in IEEE double precision without risking the loss of a bit to overflow. One can subtract them, though, and add 1 if the result is negative - the sign bit serves as a temporary 54-th bit.)</c>
        <c> The period is very close to 2<c class="sup">308</c>, and its combination of four "taps" instead of two makes it more robust than the usual <c ref="http://en.wikipedia.org/wiki/Lagged_Fibonacci_generator">lagged Fibonacci</c> generators. It also makes it just a little slower, although much faster in javascript than integer-based algorithms.</c>
      </s>
      <c id="LFib" ref="LFib.js">LFib</c>
      <s>
        <c>This is a L(2<c class="sup">53</c>, 255, 52, -) generator - I have never understood why people insist on 55, 24, especially since one often uses an array of 256 anyway, in order to take advantage of C and similar languages' implicit modulo 256 arithmetic on <c class="code">unsigned char</c> indices. The period is very close to 2<c class="sup">307</c>.</c>
        <c>LFib and LFIB4 are <c ref="time.xhtml">the fastest of the generators presented here to provide full 53-bit resolution</c>, since that is what they provide natively. However, the simple linear dependency between their last bits makes me somewhat reluctant to recommend them wholeheartedly, even though they pass BigCrush. They fail <c ref="http://guru.multimedia.cx/pseudo-random-number-generators/">Michael Niedermayer's tests</c>. All of the other PRNGs presented here pass.</c>
      </s>
      <c id="Alea" ref="Alea.js">Alea</c>
      <s>
        <c>This generator implements my variation on Marsaglia's <c ref="http://en.wikipedia.org/wiki/Multiply-with-carry">Multiply-with-carry</c> theme, adapted to javascript's quaint notion of numbers: the carries are exactly the integer parts of Numbers with exactly 32 bits of fractional part.</c>
        <c>Such generators depend crucially on their multiplier, a. It must be less than 2<c class="sup">21</c>, so that the result of the multiplication fits in 53 bits, and for an array of n 32-bit numbers, a * 2<c class="sup">32n</c> - 1 must be a <c ref="http://en.wikipedia.org/wiki/Safe_prime">safe prime</c>. The period is the corresponding <c ref="http://en.wikipedia.org/wiki/Sophie_Germain_prime">Germain prime</c>, a * 2<c class="sup">32n - 1</c> - 1.</c>
        <c>The one presented here uses n = 3: just 3 32-bit fractions, which means that one may use three rotating variables without walking through an Array. (Table lookup is rather expensive, time-wise.) The period is close to 2<c class="sup">116</c>, it passes BigCrush, and it is <c ref="time.xhtml">the fastest</c> javascript PRNG I know that does so.</c>
        <c>I expected such generators with any n up to 256 (or even beyond, if one wants monstrously long periods and can find the appropriate multipliers) to be <c ref="time.xhtml">faster than those relying on integer arithmetics</c>, which they are. But they also turn out to be faster than the lagged Fibonacci generators if one does not insist on 53 bits, much to my surprise.</c>
        <c>I therefore propose them as the PRNGs of choice in javascript. I must however confess to the bias induced by more personal involvement in those generators than in the others, which are simple ports in the case of MRG32k3a, Xorshift03 and KISS07, and rather straightforward adaptations in the case of the lagged Fibonaccis.</c>
      </s>
      <c id="Kybos" ref="Kybos.js">Kybos</c>
      <s>
        <c>All the generators presented so far have quite well-defined mathematical properties which have been extensively studied. This gives good reasons to consider them satisfactory, but it also raises a suspicion - aren't they too well-behaved? Isn't their elegant mathematical structure by itself a sign that they are not <c class="emph">random</c>? Except possibly for KISS07 which combines three quite different ideas, they could be victims of the Mersenne Twister syndrome - a theoretically wonderful algorithm, with actual <c class="emph">proofs</c> of good distribution in all dimensions up to a large number and a huge period, which turns out to fail tests that other, less sophisticated but more robust generators pass easily.</c>
        <c>Kybos combines Alea with a variant of the <c ref="http://www.cryptographic2.tk/bays-durham-shuffle">Bays-Durham shuffle</c>, using the ultimate non linear tool: table lookup. The original shuffle rarely improves PRNGs in a noticeable way, but the variant used here, in which the "random" numbers in the lookup table are <c class="emph">incremented</c> by those from the original generator instead of being replaced, makes some terrible generators pass very stringent tests.</c>
        <c>The problem is that one cannot say much about the period or other aspects of the resulting numbers. Which may be what one would expect of anything that is indeed random, but it doesn't make mathematicians happy.</c>
        <c>I mainly propose Kybos because it has an extra method, addNoise, which allows one to alter the state of the lookup table using true, physical randomness, without changing anything in the state of the Alea generator that provides the guarantee of a suitably long period and good distribution. The lookup table essentially serves as an entropy pool. Its size of 256 bits makes it suitable for the generation of 128-bit <c ref="http://en.wikipedia.org/wiki/Universally_Unique_Identifier">UUIDs</c>.</c>
      </s>
    </s>
  </s>
</s>


