I'm going to tell you about the single biggest performance win I've ever shipped. It wasn't a rewrite. It wasn't a migration to a faster database. It wasn't even a code change. It was adding one index to one table. A query that took 14 seconds dropped to 3 milliseconds. The deploy took less time than making coffee. And yet, I've watched teams spend weeks optimizing application code while their database sits there doing full table scans because nobody bothered to look at the query plan.
Indexes are the most under-appreciated, most impactful tool in your performance toolkit. But they're not magic, they're not free, and applying them wrong can actually make things slower. Here's how they actually work, when to use them, and -- just as importantly -- when not to.
What a Database Does Without an Index (Spoiler: It Suffers)
Without an index, your database has exactly one option when you run a query: read every single row in the table, check each one against your WHERE clause, and return the matches. This is called a full table scan, and for a table with 10 million rows, it's roughly as efficient as reading an entire phone book to find one person's number.
An index is a separate data structure that lets the database jump directly to the rows it needs. Think of it like the index at the back of a textbook. Instead of flipping through 800 pages looking for "B-tree," you check the index, find it says "page 247," and go straight there. The database does the same thing: instead of scanning millions of rows, it traverses a small, organized structure and follows a pointer to the exact row.
The cost? Extra disk space for the index structure, and slightly slower writes because every INSERT, UPDATE, and DELETE has to maintain the index too. That trade-off is almost always worth it for read-heavy workloads, which is most web applications. But you need to understand the trade-off or you'll end up with a table that has 15 indexes and writes that take 200ms.
B-Tree Indexes: The One You'll Use 90% of the Time
The B-tree (balanced tree) is the default index type in PostgreSQL, MySQL, SQLite, and basically every relational database. When someone says "add an index," they mean a B-tree index unless they specify otherwise. There's a reason it's the default -- it handles almost everything well.
A B-tree organizes data into a hierarchy of sorted nodes. The root node sits at the top, pointing to internal nodes, which point to leaf nodes containing the actual indexed values and pointers back to the table rows. When you search for a value, the database starts at the root and works its way down, making comparisons at each level to narrow the path. It's like a binary search, but each node can have many children, so the tree stays very shallow even for huge datasets.
-- Creating a B-tree index in PostgreSQL (this is the default type)
CREATE INDEX idx_users_email ON users (email);
-- You can be explicit about it, but it's the same thing
CREATE INDEX idx_users_email ON users USING btree (email);
-- B-tree indexes support all of these operators:
-- < <= = >= > BETWEEN IN IS NULL IS NOT NULL
The lookup time is O(log n). In practice, that means a table with 10 million rows typically requires only 3-4 node traversals to find your row. Let that sink in: 3-4 I/O operations instead of scanning 10 million rows. This is why a single index can turn a 14-second query into a 3-millisecond one.
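You can sanity-check that claim with a little arithmetic. A B-tree node holds many keys, so its fanout (children per node) is in the hundreds; the depth of the tree is roughly the log of the row count in that base. The fanout of 300 below is an illustrative assumption -- the real number depends on key size and page size:

```python
import math

# Approximate B-tree depth: ceil(log base fanout of n).
# A fanout of ~300 is a plausible figure for small keys in 8 KB
# pages; the exact value varies by database and schema.
def btree_depth(n_rows, fanout=300):
    return math.ceil(math.log(n_rows, fanout))

print(btree_depth(10_000_000))     # 3 -- a 10M-row table is only 3 levels deep
print(btree_depth(1_000_000_000))  # 4 -- even a billion rows stays shallow
```

This is why B-tree lookups stay fast as tables grow: the tree gains a level only when the row count multiplies by the fanout.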
B-trees excel at equality lookups (WHERE email = '[email protected]'), range queries (WHERE age BETWEEN 18 AND 30), and sorting (ORDER BY created_at DESC). Because the data is stored in sorted order within the tree, the database can efficiently serve all of these operations without scanning the whole table. That's a lot of versatility from one data structure, which is exactly why it's the default.
Hash Indexes: Fast but One-Dimensional
Hash indexes use a hash function to map keys to buckets, giving you O(1) constant-time lookups for equality checks. That sounds faster than B-tree's O(log n), and technically it is for pure equality lookups. But hash indexes can't do anything else -- no range queries, no sorting, no partial matches.
-- Creating a hash index in PostgreSQL
CREATE INDEX idx_users_email_hash ON users USING hash (email);
-- Hash indexes ONLY support equality:
-- WHERE email = '[email protected]' (works)
-- WHERE email > 'a' (nope)
-- ORDER BY email (nope)
Historically, PostgreSQL hash indexes had a bad reputation: before version 10 they weren't WAL-logged, which meant they could be corrupted after a crash and weren't replicated to standbys. That's been fixed, but the reputation stuck. In practice, B-trees are so efficient that the marginal speedup from hash indexes rarely justifies using a more limited index type. I've never used a hash index in production PostgreSQL, and I don't think I'm unusual in that.
In MongoDB, hashed indexes serve a different purpose: they're primarily used for hash-based sharding, distributing documents evenly across shards.
// Creating a hashed index in MongoDB
db.users.createIndex({ email: "hashed" });
// Equality queries work
db.users.find({ email: "[email protected]" }); // Uses the hashed index
// Range queries don't
db.users.find({ email: { $gt: "a" } }); // Cannot use the hashed index
Composite Indexes: Column Order Will Make or Break You
A composite index covers two or more columns, and the order you put them in matters enormously. This is the concept that separates people who understand indexing from people who just throw indexes at slow queries and hope for the best.
The reason order matters is the leftmost prefix rule. The database can only use the index efficiently if your query filters on columns starting from the left side of the index definition. Think of it like a phone book sorted by last name, then first name. You can look up "Smith" efficiently. You can look up "Smith, John" efficiently. But you can't look up everyone named "John" efficiently because the book isn't sorted by first name at the top level.
-- Composite index: status first, then created_at
CREATE INDEX idx_orders_status_date ON orders (status, created_at);
-- Uses the index (matches leftmost prefix)
SELECT * FROM orders WHERE status = 'pending';
-- Uses the full index (both columns)
SELECT * FROM orders WHERE status = 'pending' AND created_at > '2026-01-01';
-- CANNOT efficiently use this index (skips the leftmost column)
SELECT * FROM orders WHERE created_at > '2026-01-01';
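The phone-book analogy maps directly onto how a composite B-tree stores its keys: as tuples sorted by the first column, then the second. This sketch (with made-up toy data) shows why a leftmost-prefix search is a cheap contiguous slice while a search on the second column alone has to examine everything:

```python
import bisect

# A composite index on (status, created_at) is conceptually a
# sorted list of (status, created_at) tuples.
entries = sorted([
    ("pending",   "2026-01-05"),
    ("pending",   "2026-02-10"),
    ("shipped",   "2026-01-02"),
    ("shipped",   "2026-03-01"),
    ("cancelled", "2026-01-20"),
])

# Leftmost-prefix search: all 'pending' rows form one contiguous run.
# ("pending",) sorts before any ("pending", date) key, and
# ("pending", "\uffff") sorts after them all.
lo = bisect.bisect_left(entries, ("pending",))
hi = bisect.bisect_right(entries, ("pending", "\uffff"))
pending = entries[lo:hi]  # two binary searches, then a slice

# Search on created_at alone: dates are NOT globally sorted here,
# so the only option is to check every entry -- a full scan.
recent = [e for e in entries if e[1] > "2026-01-31"]
```

The `pending` lookup touches a handful of entries no matter how big the list gets; the `recent` lookup always touches all of them. That's the leftmost prefix rule in miniature.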
The rule of thumb for column order: equality columns first, then sort columns, then range columns. MongoDB's documentation calls this the ESR rule (Equality, Sort, Range), and it's a genuinely useful framework. Put your most selective equality conditions at the front, followed by your ORDER BY columns, and range filters last.
A covering index is a special optimization where the index contains all the columns a query needs. When the database can answer a query entirely from the index without touching the actual table, that's called an index-only scan, and it's blazing fast because it skips the expensive "heap fetch" -- the lookup from the index back to the actual row data.
-- Covering index using INCLUDE (PostgreSQL 11+)
CREATE INDEX idx_orders_covering ON orders (status, created_at)
INCLUDE (total_amount, customer_id);
-- This query is satisfied entirely from the index -- no table access needed
SELECT status, created_at, total_amount, customer_id
FROM orders
WHERE status = 'shipped'
ORDER BY created_at DESC;
In MongoDB, covering indexes work the same way -- include all queried and projected fields in the index.
// MongoDB covering index
db.orders.createIndex({ status: 1, createdAt: -1, totalAmount: 1, customerId: 1 });
// Covered query (all fields in the index, _id excluded from projection)
db.orders.find(
{ status: "shipped" },
{ _id: 0, status: 1, createdAt: 1, totalAmount: 1, customerId: 1 }
).sort({ createdAt: -1 });
Covering indexes are one of those optimizations that can feel like magic when you first see them work. A query that was doing millions of random I/O operations against a huge table suddenly reads everything it needs from a compact, sorted index. Use them for your hottest queries.
Partial Indexes and Why Selectivity Matters More Than You Think
A partial index only indexes rows that match a condition. This is incredibly useful when you frequently query a subset of your data and don't want to waste space indexing the rest. The classic example: if 95% of your orders are completed and you only ever query pending ones, why maintain an index over the entire table?
-- Only index active users -- this index will be dramatically smaller
CREATE INDEX idx_active_users_email ON users (email)
WHERE is_active = true;
-- This query uses the partial index
SELECT * FROM users WHERE email = '[email protected]' AND is_active = true;
-- This query CANNOT use it (doesn't match the partial condition)
SELECT * FROM users WHERE email = '[email protected]' AND is_active = false;
// MongoDB partial index
db.users.createIndex(
{ email: 1 },
{ partialFilterExpression: { isActive: true } }
);
// MongoDB sparse index (only indexes documents where the field exists)
db.users.createIndex(
{ middleName: 1 },
{ sparse: true }
);
Now let's talk about selectivity, which is the concept that determines whether your index is actually useful. Selectivity is the ratio of distinct values to total rows. A column with high selectivity (like email -- almost every value is unique) makes an excellent index because each lookup narrows the results to a tiny number of rows. A column with low selectivity (like a boolean is_active column with only true/false) makes a terrible index on its own because each value matches half the table.
-- Check selectivity in PostgreSQL
SELECT
COUNT(DISTINCT email)::float / COUNT(*)::float AS email_selectivity,
COUNT(DISTINCT status)::float / COUNT(*)::float AS status_selectivity
FROM orders;
-- Result might look like:
-- email_selectivity: 0.98 (high -- excellent index candidate)
-- status_selectivity: 0.00004 (low -- indexing this alone is usually pointless)
Here's the thing that surprises people: if your index has low selectivity, the query planner might ignore it entirely and do a full table scan instead. That's actually the right call -- when an index matches 50% of rows, the random I/O of bouncing between the index and the table is slower than just scanning sequentially. So you end up paying the write overhead of maintaining the index while getting zero read benefit. That's the worst of both worlds.
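A back-of-envelope cost model makes the planner's choice intuitive. The constants below are illustrative assumptions (PostgreSQL's real planner uses tunable parameters like random_page_cost), but the shape of the trade-off is real: each index hit costs a random heap fetch, sequential pages are cheap, so past some match fraction the scan wins:

```python
# Toy cost model: when does an index lookup beat a sequential scan?
# Constants are illustrative, not PostgreSQL's actual defaults.
SEQ_PAGE_COST = 1.0      # reading pages in order is cheap
RANDOM_PAGE_COST = 4.0   # a random heap fetch is ~4x pricier
ROWS_PER_PAGE = 100

def seq_scan_cost(total_rows):
    return (total_rows / ROWS_PER_PAGE) * SEQ_PAGE_COST

def index_scan_cost(matching_rows):
    # Pessimistic: every matching row is a separate random fetch.
    return matching_rows * RANDOM_PAGE_COST

total = 1_000_000
for fraction in (0.0001, 0.01, 0.5):
    matching = int(total * fraction)
    winner = "index" if index_scan_cost(matching) < seq_scan_cost(total) else "seq scan"
    print(f"{fraction:>7.2%} of rows match -> {winner}")
```

Under these assumptions the index wins at 0.01% selectivity and loses by 1% -- which is why a lone index on a boolean column, matching half the table, is dead weight.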
EXPLAIN ANALYZE: Stop Guessing, Start Measuring
This is the single most important section of this post. If you take away one thing, let it be this: never create an index without first running EXPLAIN on the query you're trying to optimize. Guessing at indexes is like prescribing medicine without diagnosing the illness. You might get lucky, or you might make things worse.
-- EXPLAIN shows the plan WITHOUT executing the query
EXPLAIN SELECT * FROM users WHERE email = '[email protected]';
-- Output:
-- Index Scan using idx_users_email on users (cost=0.42..8.44 rows=1 width=120)
-- Index Cond: (email = '[email protected]')
-- EXPLAIN ANALYZE actually executes and shows real timings
EXPLAIN ANALYZE SELECT * FROM orders
WHERE status = 'pending'
ORDER BY created_at DESC
LIMIT 10;
-- Output:
-- Limit (cost=0.43..12.56 rows=10 width=85) (actual time=0.034..0.078 rows=10 loops=1)
-- -> Index Scan Backward using idx_orders_status_date on orders
-- (cost=0.43..4521.23 rows=37240 width=85)
-- (actual time=0.032..0.074 rows=10 loops=1)
-- Index Cond: (status = 'pending')
-- Planning Time: 0.125 ms
-- Execution Time: 0.098 ms
Here's what to look for in the output. Seq Scan means no index is being used -- the database is reading the entire table. Index Scan means it's using an index. Bitmap Index Scan means it's collecting matching row locations into a bitmap before fetching them from the table, which is how the planner handles larger result sets or combines multiple indexes. Compare the estimated rows to the actual rows -- big discrepancies mean your statistics are stale and the planner is making bad decisions (run ANALYZE on the table to fix this). And obviously, look at the execution time.
In MongoDB, the equivalent is explain():
// MongoDB explain with execution stats
db.orders.find({ status: "pending" })
.sort({ createdAt: -1 })
.limit(10)
.explain("executionStats");
// Key fields to check:
// - winningPlan.stage: "IXSCAN" = using an index, "COLLSCAN" = full collection scan
// - executionStats.totalKeysExamined: how many index entries scanned
// - executionStats.totalDocsExamined: how many documents fetched
// - executionStats.executionTimeMillis: total time
A critical ratio to watch in MongoDB: totalDocsExamined vs nReturned. If you're examining 100,000 documents to return 10 results, something is very wrong. You want that ratio as close to 1:1 as possible. A high ratio means your index isn't selective enough, is missing a column, or isn't being used at all.
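If you check this ratio often, it's worth a tiny helper. This sketch takes the executionStats portion of an explain document (here a hardcoded dict; in practice you'd pull it from your driver) and flags an unhealthy examined-to-returned ratio -- the 10x threshold is an arbitrary assumption, tune it to taste:

```python
# Sketch: flag queries whose index isn't selective enough, based
# on the shape of MongoDB's explain("executionStats") output.
# The 10x threshold is an arbitrary illustrative choice.

def examined_ratio(execution_stats):
    returned = execution_stats["nReturned"]
    examined = execution_stats["totalDocsExamined"]
    return examined / max(returned, 1)  # avoid dividing by zero

def looks_unhealthy(execution_stats, threshold=10):
    return examined_ratio(execution_stats) > threshold

# The bad case from the text: 100,000 docs examined for 10 results.
stats = {"nReturned": 10, "totalDocsExamined": 100_000}
print(examined_ratio(stats))   # 10000.0 -- very far from the ideal 1:1
print(looks_unhealthy(stats))  # True
```

A ratio near 1 means the index is doing its job; a ratio in the thousands means the database is filtering in memory what the index should have filtered for it.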
When Indexes Hurt Performance (Yes, This Happens)
I've seen teams that treat indexes like they're free. "Query slow? Add an index. Still slow? Add another one." They end up with tables that have 12 indexes, writes that take forever, and half those indexes never get used by any query. Indexes have real costs, and you need to respect them.
Write overhead is the most direct cost. Every INSERT, UPDATE, and DELETE must also update every index on the affected columns. Five indexes on a table means each write does six operations -- the row plus five index updates. For write-heavy workloads, this can be devastating. I've seen INSERT throughput drop by 60% after someone added four "just in case" indexes.
Storage costs add up fast. A table with 10 million rows and several indexes can easily have its indexes consume more space than the actual data. In cloud environments with metered storage, this costs real money.
Index bloat is a sneaky problem in PostgreSQL. Due to MVCC, dead tuples from frequent updates accumulate in indexes over time, making them larger and slower. You may need to periodically REINDEX or rely on autovacuum to keep things healthy.
-- Check index sizes in PostgreSQL
SELECT
indexrelname AS index_name,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
ORDER BY pg_relation_size(indexrelid) DESC;
-- Find indexes that are never used (these are pure overhead)
SELECT
indexrelname AS index_name,
idx_scan AS times_used,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
AND schemaname = 'public'
ORDER BY pg_relation_size(indexrelid) DESC;
Run that second query on any production database that's been around for a while. I guarantee you'll find unused indexes wasting space and slowing down writes. Drop them. If nobody's using them after six months of traffic, nobody needs them.
PostgreSQL Indexing Strategies for the Real World
PostgreSQL has some genuinely powerful indexing features that go way beyond basic B-trees. Here are the ones I actually use in production.
Expression indexes let you index a transformed value. The classic use case: case-insensitive email lookups.
-- Index on lowercase email for case-insensitive lookups
CREATE INDEX idx_users_lower_email ON users (LOWER(email));
-- Query must use the same expression to hit the index
SELECT * FROM users WHERE LOWER(email) = '[email protected]';
GIN indexes are essential for JSONB columns and full-text search. If you're storing JSON in PostgreSQL (and you probably are), you need to know about these.
-- GIN index for JSONB data
CREATE INDEX idx_products_metadata ON products USING gin (metadata);
-- Query JSONB data efficiently
SELECT * FROM products WHERE metadata @> '{"color": "red"}';
-- GIN index for full-text search
CREATE INDEX idx_articles_search ON articles
USING gin (to_tsvector('english', title || ' ' || body));
BRIN indexes (Block Range Indexes) are a hidden gem for time-series data. They're a fraction of the size of B-tree indexes and work beautifully when your data is inserted in roughly chronological order -- which is exactly what happens with event logs, analytics data, and audit trails.
-- BRIN index for time-series data (dramatically smaller than B-tree)
CREATE INDEX idx_events_timestamp ON events USING brin (created_at);
-- Works well when data is inserted in roughly chronological order
-- A BRIN index might be 1000x smaller than the equivalent B-tree
Conditional unique constraints via partial indexes solve a common business requirement elegantly:
-- Ensure only one active subscription per user
CREATE UNIQUE INDEX idx_unique_active_subscription
ON subscriptions (user_id)
WHERE is_active = true;
That last one is one of my favorites. Try implementing "only one active subscription per user" in application code, with race conditions and concurrent requests. Now look at that one-line database constraint. The database handles it atomically and correctly every time.
MongoDB Indexing Strategies That Actually Work
MongoDB has its own set of indexing tools, and some of them are genuinely excellent for document-oriented workloads.
Compound indexes following the ESR rule are the foundation of MongoDB query optimization. Get this right and most of your performance problems disappear.
// Query: find pending orders for a customer, sorted by date, in a price range
db.orders.find({
customerId: ObjectId("..."), // Equality
status: "pending", // Equality
totalAmount: { $gte: 50, $lte: 200 } // Range
}).sort({ createdAt: -1 }); // Sort
// Optimal index following ESR:
db.orders.createIndex({
customerId: 1, // Equality first
status: 1, // Equality second
createdAt: -1, // Sort third
totalAmount: 1 // Range last
});
Text indexes give you full-text search without an external search engine. They're not as powerful as Elasticsearch, but for many apps, they're enough.
// Create a text index with weights
db.articles.createIndex({
title: "text",
body: "text"
}, {
weights: { title: 10, body: 1 }
});
// Search using the text index
db.articles.find({ $text: { $search: "database indexing" } });
TTL indexes automatically delete old documents. This is perfect for sessions, temporary tokens, and any data with a natural expiration. I love these because they eliminate the need for a separate cleanup cron job.
// Documents automatically deleted 24 hours after createdAt
db.sessions.createIndex(
{ createdAt: 1 },
{ expireAfterSeconds: 86400 }
);
Wildcard indexes handle schemas with unpredictable field names, which is something you'd never need in a relational database but comes up constantly with document stores.
// Index all fields inside the attributes subdocument
db.products.createIndex({ "attributes.$**": 1 });
// Now any query on attributes fields can use this index
db.products.find({ "attributes.color": "red" });
db.products.find({ "attributes.size": "large" });
An Indexing Checklist From Five Years of Mistakes
After years of adding, removing, and agonizing over indexes, here's the checklist I keep in my head for every performance issue:
- EXPLAIN first, index second. Understand the current query plan before changing anything. Measure before and after. If EXPLAIN shows a Seq Scan on a million-row table, that's your smoking gun.
- Index columns in WHERE, JOIN, and ORDER BY clauses. These are your primary candidates. If a column doesn't appear in any of these, it probably doesn't need an index.
- Don't over-index. Each index slows writes. Five well-chosen indexes beat fifteen speculative ones. Aim for the minimum set that covers your critical queries.
- Audit and drop unused indexes. Run that unused index query I showed you earlier. At least once a quarter. You'll always find dead weight.
- Keep indexes narrow. Fewer columns is better unless you're building a covering index for a specific hot query path.
- Update statistics. In PostgreSQL, make sure autovacuum is running and run ANALYZE on tables after bulk data loads. Stale statistics lead to bad query plans, and bad query plans lead to 3 AM pages.
- Match indexes to your workload. OLTP workloads (many small reads and writes) want targeted, narrow indexes. OLAP workloads (few large analytical queries) might benefit more from BRIN indexes or materialized views.
Here's what I've learned the hard way: the developers who are really good at database performance aren't the ones who memorize every index type. They're the ones who run EXPLAIN on every slow query, understand what the output means, and make deliberate, measured decisions. Indexing is less about knowing arcane data structures and more about having the discipline to measure, change one thing, and measure again. Do that consistently and your databases will stay fast as your application grows.