Understanding Buffer and Binary Data in Node.js

Buffers tripped me up for a while. They're how Node handles raw binary data outside the V8 heap, and once you actually understand what's happening with Buffer.from() and friends, a lot of Node's I/O behavior starts to make sense. Here's what I wish someone had told me earlier.

Buffers confused me for the longest time. I'd see Buffer.from() in code and just... accept it without understanding what was happening. Something about bytes? Something about memory? I nodded along and moved on. If that's you right now, this post is for you.

Once I got it, I felt dumb for not getting it sooner. The idea is actually pretty simple: JavaScript was built for the browser, where you mostly deal with strings, objects, and DOM elements. There was never a reason to read a file byte by byte or poke at raw network packets. But Node.js runs on the server, where you absolutely need to do those things. So the Buffer class exists as a way to hold a fixed chunk of raw binary data. Each slot in a Buffer holds one byte (a number from 0 to 255), and the memory lives outside V8's normal heap. That's it. That's the whole concept.

What Actually Is a Buffer?

Think of a Buffer like an array, but instead of holding JavaScript values, it holds raw bytes. When you read a file without specifying an encoding, you get a Buffer back. When data shows up on a TCP socket, it arrives as a Buffer. When you hash something with the crypto module, the result is a Buffer. They're everywhere in Node, even if you don't usually see them because higher-level APIs hide the details.

Here's a bit of history that helped me understand the design: before ES6 gave us TypedArray and ArrayBuffer, the Buffer class was the only way to work with binary data in Node. These days, Buffer is actually a subclass of Uint8Array, so it plays nicely with the newer typed array APIs while keeping its own set of convenience methods. You get the best of both worlds.

Creating Buffers: alloc, from, and allocUnsafe

The old new Buffer() constructor is deprecated. Don't use it. It had security issues and confusing overloaded behavior. Here's what to use instead:

// Create a zero-filled buffer of 16 bytes
const buf1 = Buffer.alloc(16);
console.log(buf1); // <Buffer 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00>

// Create a buffer from a string
const buf2 = Buffer.from('Hello, Node.js!', 'utf-8');
console.log(buf2); // <Buffer 48 65 6c 6c 6f 2c 20 4e 6f 64 65 2e 6a 73 21>
console.log(buf2.toString()); // Hello, Node.js!

// Create a buffer from an array of bytes
const buf3 = Buffer.from([0x48, 0x65, 0x6c, 0x6c, 0x6f]);
console.log(buf3.toString()); // Hello

// Create a buffer from another buffer (copy)
const buf4 = Buffer.from(buf2);

// Create an uninitialized buffer (faster but may contain old data)
const buf5 = Buffer.allocUnsafe(16);
console.log(buf5); // Contains random leftover memory data

The difference between alloc and allocUnsafe matters. Buffer.alloc(size) zeroes out the memory, so you know every byte is 0x00. Buffer.allocUnsafe(size) skips that step, which is faster but means you might be looking at leftover data from whatever used that memory previously. That leftover data could include passwords, tokens, anything. Only use allocUnsafe when you're going to immediately overwrite every single byte before reading from it.
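So the only safe pattern with allocUnsafe is fill-then-read: every byte gets written before anything looks at it. A minimal sketch:

```javascript
// Safe: fill() overwrites all 8 bytes before any read happens
const scratch = Buffer.allocUnsafe(8);
scratch.fill(0x41);

console.log(scratch.toString()); // AAAAAAAA -- deterministic, no leftover memory
```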

You can also fill a buffer with a specific value right when you create it:

// Fill with a specific byte value
const buf = Buffer.alloc(8, 0xFF);
console.log(buf); // <Buffer ff ff ff ff ff ff ff ff>

// Fill with a string pattern
const padded = Buffer.alloc(16, 'abc');
console.log(padded.toString()); // abcabcabcabcabca

Encoding and Decoding

This is where Buffers become genuinely useful in day-to-day code. You constantly need to convert between binary data and strings, and Node supports a bunch of encodings for this.

const original = 'This is a test string with unicode: ñ, ü, 日本語';

// Encode to different formats
const utf8Buf = Buffer.from(original, 'utf-8');
const base64Str = utf8Buf.toString('base64');
const hexStr = utf8Buf.toString('hex');

console.log('UTF-8 length:', utf8Buf.length, 'bytes');
console.log('Base64:', base64Str);
console.log('Hex:', hexStr);

// Decode back
const fromBase64 = Buffer.from(base64Str, 'base64');
console.log('Decoded:', fromBase64.toString('utf-8'));

// Convert between encodings
function convertEncoding(str, fromEnc, toEnc) {
  return Buffer.from(str, fromEnc).toString(toEnc);
}

// Hex to Base64
const hexData = '48656c6c6f';
const b64Data = convertEncoding(hexData, 'hex', 'base64');
console.log(b64Data); // SGVsbG8=

The supported encodings: utf-8 (the default), ascii, latin1 (with binary as a legacy alias), base64, base64url, hex, and utf16le (with ucs2 as an alias). You'll mostly use utf-8 for text, base64 for embedding binary in JSON, and hex for crypto hashes and debugging.

One thing that bit me early on: with multi-byte encodings like UTF-8, a single character can take multiple bytes. So the byte length of a string isn't always the same as its character count:

const ascii = 'Hello';        // 5 characters, 5 bytes
const emoji = '👋🌍';         // 2 visible characters, 8 bytes

console.log(Buffer.byteLength(ascii, 'utf-8')); // 5
console.log(Buffer.byteLength(emoji, 'utf-8')); // 8

I wasted a good hour debugging a protocol parser because I assumed string length equaled byte length. Don't make that mistake.
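The root of that bug: String.prototype.length counts UTF-16 code units, not bytes, so the two numbers drift apart the moment you leave ASCII:

```javascript
const n = 'ñ';     // 1 visible character
console.log(n.length);                      // 1 (UTF-16 code units)
console.log(Buffer.byteLength(n, 'utf-8')); // 2 (bytes on the wire)

const wave = '👋'; // also 1 visible character
console.log(wave.length);                      // 2 (a surrogate pair)
console.log(Buffer.byteLength(wave, 'utf-8')); // 4
```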

ArrayBuffer and Typed Arrays

This is where things used to confuse me the most. There's Buffer, there's ArrayBuffer, there's Uint8Array, and they all seem to do similar things. Here's the relationship:

Since Node v4, Buffer is a subclass of Uint8Array. That means every Buffer is a Uint8Array, and it's backed by an ArrayBuffer under the hood. You can move between all three without too much friction.

// Buffer is a Uint8Array
const buf = Buffer.from([1, 2, 3, 4]);
console.log(buf instanceof Uint8Array); // true

// Access the underlying ArrayBuffer
console.log(buf.buffer);        // The backing ArrayBuffer (often a larger shared pool)
console.log(buf.byteOffset);    // Offset within the ArrayBuffer
console.log(buf.byteLength);    // 4

// Create a Buffer from an ArrayBuffer
const arrayBuffer = new ArrayBuffer(8);
const view = new Uint8Array(arrayBuffer);
view[0] = 0x48;
view[1] = 0x69;
const bufFromAB = Buffer.from(arrayBuffer);
console.log(bufFromAB.toString()); // Hi (followed by null bytes)

// Using DataView for multi-byte reads
const packet = Buffer.alloc(12);
packet.writeUInt32BE(0x01020304, 0);
packet.writeFloatBE(3.14, 4);
packet.writeInt32BE(-1, 8);

const dataView = new DataView(packet.buffer, packet.byteOffset, packet.byteLength);
console.log(dataView.getUint32(0));   // 16909060
console.log(dataView.getFloat32(4));  // 3.14...
console.log(dataView.getInt32(8));    // -1

Other typed arrays like Int16Array, Float32Array, and BigInt64Array let you interpret the same memory as different data types. This matters when you're working with binary protocols or scientific data where you need to read multi-byte numbers from a byte stream.

// Interpret bytes as 16-bit integers
const bytes = Buffer.from([0x00, 0x0A, 0x00, 0x14, 0x00, 0x1E]);
const int16View = new Int16Array(bytes.buffer, bytes.byteOffset, 3); // byteOffset must be 2-byte aligned here

// Note: typed arrays use platform endianness
console.log(int16View[0], int16View[1], int16View[2]);

Watch out for one thing: Buffer.from(arrayBuffer) does not copy the data. It creates a view over the same memory. Change one, and the other changes too. If you need a separate copy, do Buffer.from(Buffer.from(arrayBuffer)). Yeah, the double wrapping looks weird, but that's how it works.

Working with Binary Protocols

This is where Buffers stop being abstract and start being really practical. Lots of protocols -- DNS, MQTT, custom game servers -- define their messages as exact sequences of bytes. You need to read and write those byte sequences precisely.

Here's a small example: a binary message format with a type byte, a 2-byte length, a payload, and a checksum.

// Format: [type: 1 byte] [length: 2 bytes BE] [payload: N bytes] [checksum: 1 byte]

function encodeMessage(type, payload) {
  const payloadBuf = Buffer.from(payload, 'utf-8');
  const totalLength = 1 + 2 + payloadBuf.length + 1;
  const message = Buffer.alloc(totalLength);

  let offset = 0;

  // Write message type (1 byte)
  message.writeUInt8(type, offset);
  offset += 1;

  // Write payload length (2 bytes, big-endian)
  message.writeUInt16BE(payloadBuf.length, offset);
  offset += 2;

  // Write payload
  payloadBuf.copy(message, offset);
  offset += payloadBuf.length;

  // Calculate and write checksum (XOR of all previous bytes)
  let checksum = 0;
  for (let i = 0; i < offset; i++) {
    checksum ^= message[i];
  }
  message.writeUInt8(checksum, offset);

  return message;
}

function decodeMessage(buffer) {
  let offset = 0;

  const type = buffer.readUInt8(offset);
  offset += 1;

  const payloadLength = buffer.readUInt16BE(offset);
  offset += 2;

  const payload = buffer.slice(offset, offset + payloadLength).toString('utf-8');
  offset += payloadLength;

  const checksum = buffer.readUInt8(offset);

  // Verify checksum
  let computed = 0;
  for (let i = 0; i < offset; i++) {
    computed ^= buffer[i];
  }

  if (computed !== checksum) {
    throw new Error('Checksum mismatch');
  }

  return { type, payload };
}

// Usage
const encoded = encodeMessage(0x01, 'Hello Protocol');
console.log('Encoded:', encoded.toString('hex'));

const decoded = decodeMessage(encoded);
console.log('Decoded:', decoded); // { type: 1, payload: 'Hello Protocol' }

Buffer gives you read and write methods for every common number format: readUInt8, readUInt16BE, readUInt16LE, readUInt32BE, readInt32LE, readFloatBE, readDoubleBE, and all the corresponding write methods. The BE and LE suffixes mean big-endian and little-endian. Network protocols almost always use big-endian; x86 CPUs use little-endian natively. Getting this wrong produces hilarious garbage values, and I say "hilarious" because it took me two hours to figure out why my numbers were wrong.

For more complex messages, Buffer.concat() lets you assemble pieces:

const header = Buffer.alloc(4);
header.writeUInt16BE(0x0001, 0); // version
header.writeUInt16BE(0x0042, 2); // command

const body = Buffer.from('payload data');

const lengthBuf = Buffer.alloc(4);
lengthBuf.writeUInt32BE(body.length, 0);

const packet = Buffer.concat([header, lengthBuf, body]);
console.log(packet.toString('hex'));

File I/O with Buffers

If you've ever called fs.readFileSync('something') without a second argument, you got a Buffer back. That's the default. Node gives you raw bytes unless you tell it otherwise by passing an encoding like 'utf-8'.

const fs = require('fs');
const path = require('path');

// Reading a file as a buffer
const imageBuffer = fs.readFileSync(path.join(__dirname, 'photo.png'));
console.log('File size:', imageBuffer.length, 'bytes');
console.log('PNG signature:', imageBuffer.slice(0, 8).toString('hex'));
// PNG files start with: 89504e470d0a1a0a

// Reading specific bytes using file descriptors
const fd = fs.openSync('largefile.bin', 'r');
const headerBuf = Buffer.alloc(64);
fs.readSync(fd, headerBuf, 0, 64, 0);  // Read 64 bytes from position 0
fs.closeSync(fd);

// Writing binary data to a file
const data = Buffer.alloc(256);
for (let i = 0; i < 256; i++) {
  data[i] = i;  // Write bytes 0x00 through 0xFF
}
fs.writeFileSync('binary-output.bin', data);

// Appending buffer data to a file
const appendData = Buffer.from('\nNew line of data');
fs.appendFileSync('log.txt', appendData);

For big files, don't read the whole thing into memory. Use streams. Each chunk in a stream is a Buffer:

const fs = require('fs');
const crypto = require('crypto');

// Calculate SHA-256 hash of a large file using streams
function hashFile(filePath) {
  return new Promise((resolve, reject) => {
    const hash = crypto.createHash('sha256');
    const stream = fs.createReadStream(filePath);

    stream.on('data', (chunk) => {
      // chunk is a Buffer
      hash.update(chunk);
    });

    stream.on('end', () => {
      resolve(hash.digest('hex'));
    });

    stream.on('error', reject);
  });
}

// Copy a file using buffer chunks
function copyFile(src, dest) {
  return new Promise((resolve, reject) => {
    const readStream = fs.createReadStream(src);
    const writeStream = fs.createWriteStream(dest);

    readStream.pipe(writeStream);
    writeStream.on('finish', resolve);
    writeStream.on('error', reject);
  });
}

If your file is more than a couple hundred megabytes, fs.readFile will eat your memory. I've seen servers crash because someone read a 2GB log file into a Buffer. Streams exist for a reason.

Buffer Manipulation and Comparison

Buffers have a decent set of methods for searching, slicing, copying, and comparing. Most of them work similarly to array methods, with one important catch I'll get to in a second.

const buf = Buffer.from('Hello, World! Hello, Node.js!');

// Searching
console.log(buf.indexOf('Hello'));      // 0
console.log(buf.lastIndexOf('Hello'));  // 14
console.log(buf.includes('Node'));      // true

// Slicing (returns a view, not a copy)
const slice = buf.slice(0, 5);
console.log(slice.toString()); // Hello

// Note: modifying the slice modifies the original
slice[0] = 0x4A; // 'J'
console.log(buf.toString()); // Jello, World! Hello, Node.js!

// To get an independent copy, use subarray with Buffer.from
const copy = Buffer.from(buf.subarray(0, 5));

// Copying between buffers
const source = Buffer.from('ABCDEF');
const target = Buffer.alloc(10, 0x2D); // Fill with '-'
source.copy(target, 2, 0, 4); // Copy 4 bytes to target starting at position 2
console.log(target.toString()); // --ABCD----

// Comparing buffers
const a = Buffer.from('abc');
const b = Buffer.from('abd');
const c = Buffer.from('abc');

console.log(Buffer.compare(a, b)); // -1 (a comes before b)
console.log(Buffer.compare(b, a)); // 1 (b comes after a)
console.log(a.equals(c));          // true

// Filling a buffer
const fillBuf = Buffer.alloc(10);
fillBuf.fill('ab');
console.log(fillBuf.toString()); // ababababab

Here's the catch I mentioned: buf.slice() and buf.subarray() both return views over the same memory, not copies. That slice example above, where changing the slice changed the original, will absolutely burn you if you're not expecting it. (Recent Node versions deprecate buf.slice() in favor of buf.subarray(), precisely because slice looks like it should copy.) Anytime you need a separate, independent copy, wrap it: Buffer.from(buf.subarray(start, end)).

Honestly, most Node developers go their whole careers without thinking much about Buffers. The higher-level APIs -- fs.readFile with an encoding, JSON.parse, string-mode streams -- hide the binary layer pretty well. But when you do hit a situation where you need to parse a binary file format, implement a wire protocol, or figure out why your string has weird characters... that's when this stuff saves you. It's one of those topics where you don't need it until you really, really need it, and then you're glad you spent the time.

Written by Anurag Kumar

Full-stack developer passionate about Node.js and building fast, scalable web applications. Writing about what I learn every day.
