What's the best way to concatenate Uint8Arrays?

This post is for people familiar with JavaScript’s Uint8Array.

Sometimes, I want to combine multiple Uint8Arrays into one. Something like this:

const a = new Uint8Array([1, 2]);
const b = new Uint8Array([3, 4]);
const c = new Uint8Array([5]);

concatenate([a, b, c]);
// => Uint8Array(5) [1, 2, 3, 4, 5]

What’s the best way to do this?

The short answer

If you’re only using Node, use Buffer.concat.

If you’re not only using Node, I prefer this solution:

/**
 * Combine multiple Uint8Arrays into one.
 *
 * @param {Uint8Array[]} uint8arrays
 * @returns {Uint8Array}
 */
function concatenate(uint8arrays) {
  const totalLength = uint8arrays.reduce(
    (total, uint8array) => total + uint8array.byteLength,
    0
  );

  const result = new Uint8Array(totalLength);

  let offset = 0;
  uint8arrays.forEach((uint8array) => {
    result.set(uint8array, offset);
    offset += uint8array.byteLength;
  });

  return result;
}

There are some other solutions (including some bad ones) listed below. If you want to learn more, read on.

The long answer

For this problem, I wanted to write a function, concatenate, which took an array of Uint8Arrays as input and returned a single, combined Uint8Array as output. I didn’t want to install any libraries.

I found three reasonable ways to do this:

  1. Allocate the result upfront, then update the result in pieces
  2. Put them in a Blob
  3. Buffer.concat() (requires Node)

Overall, I think option 1 is best. If you’re using Node, option 3 is probably best, just because it’s built in.

I also tried a few ideas that didn’t work well. We’ll see some of those failures below.

I tested these with Deno and Firefox. I generated random Uint8Arrays of sizes between 0 bytes and 1 mebibyte, then tried to combine them. I did this with 0, 1, 10, 100, and 1000 inputs. (If you’re curious, check out the simple benchmark script I wrote.)
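
A benchmark along those lines might look roughly like this (an illustrative sketch, not that script; it assumes a concatenate function like the one above is in scope):

// Illustrative benchmark sketch, not the real benchmark script.
const MAX_SIZE = 1024 * 1024; // 1 mebibyte

const randomUint8Array = () => {
  const result = new Uint8Array(Math.floor(Math.random() * (MAX_SIZE + 1)));
  for (let i = 0; i < result.length; i++) {
    result[i] = Math.floor(Math.random() * 256);
  }
  return result;
};

for (const count of [0, 1, 10, 100, 1000]) {
  const inputs = Array.from({ length: count }, randomUint8Array);
  const start = performance.now();
  concatenate(inputs);
  console.log(`${count} inputs: ${(performance.now() - start).toFixed(1)}ms`);
}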

Let’s look at these solutions.

Reasonable Option 1: allocate the result, set each piece

Here’s the first solution I came up with:

function concatenate(uint8arrays) {
  // Determine the length of the result.
  const totalLength = uint8arrays.reduce(
    (total, uint8array) => total + uint8array.byteLength,
    0
  );

  // Allocate the result.
  const result = new Uint8Array(totalLength);

  // Copy each Uint8Array into the result.
  let offset = 0;
  uint8arrays.forEach((uint8array) => {
    result.set(uint8array, offset);
    offset += uint8array.byteLength;
  });

  return result;
}

At a high level, it:

  1. Allocates the result buffer. To do this, you need the length of the result, which you can get by adding up all the inputs’ lengths.
  2. Piece by piece, copy the inputs into the result.

This option is reasonably fast and works for large inputs. I would reach for a solution like this in most cases.

Reasonable Option 2: put them in a Blob

Option 1 is good but is a bit long. Here’s a much shorter solution:

async function concatenate(uint8arrays) {
  // Put the inputs into a Blob.
  const blob = new Blob(uint8arrays);

  // Pull an ArrayBuffer out. (Has to be async.)
  const buffer = await blob.arrayBuffer();

  // Convert that ArrayBuffer to a Uint8Array.
  return new Uint8Array(buffer);
}

You could even shorten this further, though it’s a little harder to read:

const concatenate = async (uint8arrays) =>
  new Uint8Array(await new Blob(uint8arrays).arrayBuffer());

At a high level, this solution takes advantage of the fact that the Blob constructor accepts an array of Uint8Arrays (among other things) and that a Blob can be converted to an ArrayBuffer. Once you have an ArrayBuffer, it’s easy to turn it into a Uint8Array.

I like the brevity of this solution, and also the fact that it takes more advantage of the standard library. However, it is asynchronous, which means you need to await it (or handle the promise). Also, in my informal testing, this version was about 5% slower than Option 1.
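
In practice, that means callers need something like this:

const combined = await concatenate([a, b, c]);
// => Uint8Array(5) [1, 2, 3, 4, 5]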

Overall, I think this solution is worse than Option 1 unless brevity is your primary goal, in which case it’s better.

Reasonable Option 3: Buffer.concat (Node-only)

If you’re using Node, there’s an even briefer solution: Buffer.concat. It’s built in!

const a = new Uint8Array([1, 2]);
const b = new Uint8Array([3, 4]);
const c = new Uint8Array([5]);

Buffer.concat([a, b, c]);
// => <Buffer 01 02 03 04 05>

This returns a Buffer. Buffers are Uint8Arrays with subtle differences that probably don’t affect you, and you can convert them to plain Uint8Arrays easily if you wish.
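
For example, you can either copy the bytes or create a view over the same memory:

const buf = Buffer.concat([a, b, c]);

// Copy the bytes into a new Uint8Array...
const copy = new Uint8Array(buf);

// ...or create a zero-copy view of the same underlying memory.
const view = new Uint8Array(buf.buffer, buf.byteOffset, buf.byteLength);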

If you look at Node’s source code, Buffer.concat looks a lot like Option 1; it allocates a result buffer and copies each Uint8Array inside.

Unlike Option 1, Buffer.concat uses Buffer.allocUnsafe internally. These buffers are pulled from a memory pool that might contain stale data. It doesn’t set every value to 0,¹ unlike what new Uint8Array does:

new Uint8Array(3);
// => Uint8Array(3) [ 0, 0, 0 ]

Buffer.allocUnsafe(3);
// => <Buffer 62 4c 36> (your results may vary)

Buffer.allocUnsafe is sometimes dangerous, as its name suggests, but it can be faster. In this case, it doesn’t matter what the result buffer is initialized to, because Node overwrites whatever was there with the input data, so we can safely enjoy the performance benefit.

I recommend using this option if you’re only in the Node world. It was the fastest option I tested, and doesn’t require writing/importing any code.

The graveyard of bad solutions

I tried a bunch of other ideas and all of them were bad for various reasons. If you want to see several ideas that don’t work…read on.

Bad idea 1: a big array

My simplest (stupidest?) idea was to put everything into a big array, then convert that to a Uint8Array at the end.

// Warning: this solution is bad!
function badConcatenate(uint8arrays) {
  const array = [];
  for (const uint8array of uint8arrays) {
    array.push(...uint8array);
  }
  return new Uint8Array(array);
}

This worked okay for smaller inputs but crashed with larger inputs. These arrays can get quite big.

I evaluated a similar idea where you would pass an array-like object (e.g., { length: 3, "0": 9, "1": 8, "2": 7 }). This “solution” had a similar problem—allocating a huge object—and failed for similar reasons.²
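
Roughly, that idea looks something like this (a sketch of the approach, not the exact code):

// Warning: this sketch is also bad!
function badConcatenate(uint8arrays) {
  const arrayLike = { length: 0 };
  for (const uint8array of uint8arrays) {
    for (const byte of uint8array) {
      arrayLike[arrayLike.length] = byte;
      arrayLike.length++;
    }
  }
  // The Uint8Array constructor also accepts array-like objects.
  return new Uint8Array(arrayLike);
}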

Bad idea 2: a generator

To avoid allocating a giant array myself, I tried using a generator function:

// Warning: this solution is bad!
const badConcatenate = (uint8arrays) =>
  new Uint8Array(
    (function* () {
      for (const uint8array of uint8arrays) {
        yield* uint8array;
      }
    })()
  );

This was about 10× slower for small inputs and ran out of memory for larger ones. I assume this is because the result size is unknown upfront, so JavaScript has to store everything in an internal buffer somewhere, effectively allocating that giant array I was trying to avoid.

Bad idea 3: an iterator (no generators)

In my anecdotal experience, generators can be slower than hand-rolling iterator code yourself.

This code was long, but here’s an abbreviated version³:

// Warning: this solution is bad!

class ByteIterator {
  /* ...code skipped... */
}

class ByteIterable {
  // ...code skipped...
  [Symbol.iterator]() {
    return new ByteIterator(this.uint8arrays);
  }
}

const iterator = (uint8arrays) =>
  new Uint8Array(new ByteIterable(uint8arrays));
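
To give a flavor of the skipped parts, the hand-rolled iterator might look roughly like this (a sketch, not the full code from footnote 3):

// A rough sketch of the ByteIterator class. Still bad: don't use it!
class ByteIterator {
  constructor(uint8arrays) {
    this.uint8arrays = uint8arrays;
    this.arrayIndex = 0;
    this.byteIndex = 0;
  }

  next() {
    while (this.arrayIndex < this.uint8arrays.length) {
      const current = this.uint8arrays[this.arrayIndex];
      if (this.byteIndex < current.byteLength) {
        return { done: false, value: current[this.byteIndex++] };
      }
      // Move on to the next input array.
      this.arrayIndex++;
      this.byteIndex = 0;
    }
    return { done: true, value: undefined };
  }
}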

This was faster than the generator solution but still much slower than the other good solutions, and still ran out of memory for large inputs. I assume it failed for the same reason as the generator one.

Bad idea 4: Uint8Array.from

Uint8Array.from takes an optional second argument, a “map function”, which we can abuse for this purpose.

This code was also very long. Here’s a very abbreviated version:

// Warning: this solution is bad!
Uint8Array.from({ length: totalLength }, () => {
  // This is pseudocode:
  return nextByteFromLatestUint8Array;
});
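
Fleshed out, the idea is to keep a cursor over the inputs and have the map function hand back one byte at a time. Here’s a rough, illustrative sketch (not the full code):

// A sketch of the Uint8Array.from approach. It works, but it's slow.
function fromConcatenate(uint8arrays) {
  const totalLength = uint8arrays.reduce(
    (total, uint8array) => total + uint8array.byteLength,
    0
  );

  let arrayIndex = 0;
  let byteIndex = 0;

  return Uint8Array.from({ length: totalLength }, () => {
    // Skip past any inputs we've already exhausted (including empty ones).
    while (byteIndex >= uint8arrays[arrayIndex].byteLength) {
      arrayIndex++;
      byteIndex = 0;
    }
    return uint8arrays[arrayIndex][byteIndex++];
  });
}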

Unlike all the other bogus ideas I tried, this one does actually work. It’s not a complete failure!

However, when compared to the other good options, I don’t think it has any benefits at all. It’s a lot slower (the worst I saw was 16× slower!), harder to understand, more code, and less compatible with browsers and runtimes.

Summary

I recommend using Buffer.concat if you’re using Node and Option 1 otherwise.

Contact me if I missed something!


  1. Unless you override its default behavior. ↩︎

  2. There might be a way to make this idea workable using proxies, but I didn’t try that. ↩︎

  3. If you want to read some long code that doesn’t work, here’s the full iterator-based “solution”. ↩︎