Let’s say you have 1,000,000 records and you want to write them to a CSV file. How would you start?
If we were to run SELECT * FROM Books, your application is going to be unresponsive until it’s pulled all the records from the database and put them in memory.
What happens if you don’t have enough memory to hold all of that? Well, everything will fall over.
This is exactly the kind of situation where buffered queries become super useful!
There are two separate places where these “buffers” come into play - the most important is buffering on the database server itself; the second is buffering inside the database extension on the client.
Let’s say that we have 10GB of book data and a server with 2GB of memory. Our task looks impossible. This is where cursor-based querying comes in.
You continue to be blessed with a beautiful database choice. To do this, you’ll want to use cursors. To use them, you’ll need to:
BEGIN;
DECLARE my_custom_cursor_id_532432 CURSOR FOR SELECT * FROM Books;
FETCH 1000 FROM my_custom_cursor_id_532432;
-- repeat the FETCH until it returns no rows, then CLOSE the cursor and COMMIT
This will allow you to slowly chunk through all of the data, without ever overloading your local server with more data than it can hold at once.
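If you’re driving this from PHP/PDO (which the examples further down use), a rough sketch might look like the following. This assumes a PostgreSQL connection through the pgsql driver with exceptions enabled, and the function name is just made up for illustration:

function streamBooksViaCursor(PDO $pdo): Generator
{
    // Cursors only live inside a transaction
    $pdo->beginTransaction();
    $pdo->exec("DECLARE my_custom_cursor_id_532432 CURSOR FOR SELECT * FROM Books");

    while (true) {
        // Pull the next 1,000 rows from the server-side cursor
        $rows = $pdo->query("FETCH 1000 FROM my_custom_cursor_id_532432")
                    ->fetchAll(PDO::FETCH_ASSOC);
        if ($rows === []) {
            break; // cursor exhausted
        }
        foreach ($rows as $row) {
            yield $row;
        }
    }

    $pdo->exec("CLOSE my_custom_cursor_id_532432");
    $pdo->commit();
}

Only a thousand rows are ever held in PHP at once; the rest stays on the database server until we ask for it.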
Look into MYSQL_ATTR_USE_BUFFERED_QUERY
- I think this handles it transparently, but honestly - I haven’t tested it and it’s been a long time since I’ve had to do this for MySQL. These are also very PHP/PDO-specific instructions, I apologise.
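For what it’s worth, my understanding (again, hedged - I haven’t verified this recently) is that turning that attribute off switches the connection to unbuffered mode, so rows are streamed from the server rather than copied into the extension’s buffer up front:

// Assumption: false = unbuffered, so rows stream instead of being buffered client-side
$pdo->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);

$statement = $pdo->query("SELECT * FROM Books");
while ($row = $statement->fetch(PDO::FETCH_ASSOC)) {
    // process each row here - note the connection is busy until the result set is fully read
}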
This is drawing from my experience with PHP - whether this is relevant to your specific language will need some investigation.
A typical way you might fetch data from a database in PHP could look like this:
$statement = $pdo->prepare("SELECT * FROM Books");
$statement->execute([]);
$results = $statement->fetchAll(PDO::FETCH_ASSOC);
// $results = [["id" => 1, "name" => "MyBookName"],...]
By fetching all the data at once from MySQL, we now have all the data in memory in two places - we have it within the MySQL extension and we have it within the PHP array object.
While (I believe) the memory held by the MySQL extension will be cleared as soon as $statement is garbage collected, if we take it that the data retrieved was 1GB in size, we’d have ballooned our memory usage to ~2GB.
As such, if you’re working with large datasets, consider using something like a generator to stream the rows one at a time instead:
function streamBooks(PDO $pdo): Generator
{
    $statement = $pdo->prepare("SELECT * FROM Books");
    $statement->execute([]);
    // Hand back one row at a time instead of building one huge PHP array
    while ($result = $statement->fetch(PDO::FETCH_ASSOC)) {
        yield $result;
    }
}
Assuming we’re writing each book to the CSV immediately, the max memory usage here is still going to be barely above 1GB - the extension’s buffer still holds the full result set, but we’re no longer duplicating it into a PHP array.
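To close the loop on the original CSV task, consuming the generator might look something like this (the file name is just a placeholder):

$output = fopen('books.csv', 'w');

foreach (streamBooks($pdo) as $book) {
    // Each row is written out and then discarded, so memory stays flat on the PHP side
    fputcsv($output, $book);
}

fclose($output);

The same loop works with the cursor-based streamBooksViaCursor sketch from earlier, if you’re on a database that supports cursors.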