Let’s say you have 1,000,000 records and you want to write them to a CSV file. How would you start?
If we were to run SELECT * FROM Books, your application is going to be unresponsive until it’s pulled all the records from the database and put them in memory.
What happens if you don’t have enough memory to hold all of that? Well, everything will fall over.
This is exactly the kind of situation where buffered queries become super useful!
There are two separate places where these “buffers” come into play - the most important is buffering on the database server itself; the second is buffering inside the database extension on the client.
Let’s say that we have 10GB of book data and a server with 2GB of memory. Our task looks impossible. This is where cursor-based querying comes in.
You continue to be blessed with a beautiful database choice. To do this, you’ll want to use cursors. To use them, you’ll need to:
BEGIN;
DECLARE my_custom_cursor_id_532432 CURSOR FOR SELECT * FROM Books;
FETCH 1000 FROM my_custom_cursor_id_532432;
-- repeat the FETCH until it returns no rows, then CLOSE the cursor and COMMIT
This will allow you to slowly chunk through all of the data, without ever overloading your local server with more data than it can hold at once.
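If you’re driving this from PHP/PDO (which the examples further down use), a rough sketch might look like the following. This assumes a PostgreSQL connection through the pgsql driver with exceptions enabled, and the function name is just made up for illustration:

function streamBooksViaCursor(PDO $pdo): Generator
{
    // Cursors only live inside a transaction
    $pdo->beginTransaction();
    $pdo->exec("DECLARE my_custom_cursor_id_532432 CURSOR FOR SELECT * FROM Books");

    while (true) {
        // Pull the next 1,000 rows from the server-side cursor
        $rows = $pdo->query("FETCH 1000 FROM my_custom_cursor_id_532432")
                    ->fetchAll(PDO::FETCH_ASSOC);
        if ($rows === []) {
            break; // cursor exhausted
        }
        foreach ($rows as $row) {
            yield $row;
        }
    }

    $pdo->exec("CLOSE my_custom_cursor_id_532432");
    $pdo->commit();
}

Only a thousand rows are ever held in PHP at once; the rest stays on the database server until we ask for it.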
Look into MYSQL_ATTR_USE_BUFFERED_QUERY
- I think this handles it transparently, but honestly - I haven’t tested it and it’s been a long time since I’ve had to do this for MySQL. These are also very PHP/PDO-specific instructions, I apologise.
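For what it’s worth, my understanding (again, hedged - I haven’t verified this recently) is that turning that attribute off switches the connection to unbuffered mode, so rows are streamed from the server rather than copied into the extension’s buffer up front:

// Assumption: false = unbuffered, so rows stream instead of being buffered client-side
$pdo->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);

$statement = $pdo->query("SELECT * FROM Books");
while ($row = $statement->fetch(PDO::FETCH_ASSOC)) {
    // process each row here - note the connection is busy until the result set is fully read
}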
This is drawing from my experience with PHP - whether this is relevant to your specific language will need some investigation.
A typical way you might fetch data from a database in PHP could look like this:
$statement = $pdo->prepare("SELECT * FROM Books");
$statement->execute([]);
$results = $statement->fetchAll(PDO::FETCH_ASSOC);
// $results = [["id" => 1, "name" => "MyBookName"],...]
By fetching all the data at once from MySQL, we now have all the data in memory in two places - we have it within the MySQL extension and we have it within the PHP array object.
While (I believe) the memory held by the MySQL extension will be cleared as soon as $statement is garbage collected, if we take it that the data retrieved was 1GB in size, we’d have ballooned our memory usage to ~2GB.
As such, if you’re working with large datasets, consider using something like a generator to stream the rows one at a time instead:
function streamBooks(PDO $pdo): Generator
{
    $statement = $pdo->prepare("SELECT * FROM Books");
    $statement->execute([]);
    // Hand back one row at a time instead of building one huge PHP array
    while ($result = $statement->fetch(PDO::FETCH_ASSOC)) {
        yield $result;
    }
}
Assuming we’re writing each book to the CSV immediately, the max memory usage here is still going to be barely above 1GB - the extension’s buffer still holds the full result set, but we’re no longer duplicating it into a PHP array.
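To close the loop on the original CSV task, consuming the generator might look something like this (the file name is just a placeholder):

$output = fopen('books.csv', 'w');

foreach (streamBooks($pdo) as $book) {
    // Each row is written out and then discarded, so memory stays flat on the PHP side
    fputcsv($output, $book);
}

fclose($output);

The same loop works with the cursor-based streamBooksViaCursor sketch from earlier, if you’re on a database that supports cursors.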