Amazon Aurora MySQL Parallel Query: Beware

I'll start by saying that I'm a big fan of Amazon Aurora. Plain old RDS shares a lot of the benefits by removing the 'undifferentiated heavy lifting' of running your own database cluster on EC2 but Aurora goes further by allowing you access to extra smarts such as:
Removing the need to provision and manage storage due to the shared layer native to Aurora
Running multi-master in a single region
Running a master in one region and having continual, millisecond latency replication to other regions with the ability to initiate swift failover via Global Database
Supporting the concept of blue/green deployments of your database cluster
Vastly and opportunistically improving the performance of certain queries by offloading them from the database engine itself to the shared storage layer beneath
It is the last feature in that list that I want to talk about today.
The general recommendation from AWS before embarking on any compute migration (or greenfield deployment) is to experiment to find the sweet spot for performance vs. cost, i.e. right-sizing your infrastructure. I'm here to tell you the same is absolutely true for the Parallel Query functionality in Amazon Aurora for MySQL.
What is Parallel Query?
AWS pitch it as below:
While some databases can parallelize query processing across CPUs in one or a handful of servers, Parallel Query takes advantage of Aurora’s unique architecture to push down and parallelize query processing across thousands of CPUs in the Aurora storage layer. By offloading analytical query processing to the Aurora storage layer, Parallel Query reduces network, CPU, and buffer pool contention with the transactional workload.
Essentially rather than chew up CPU cores on the Aurora writer or reader nodes, your query will (if chosen) be pushed down to the storage layer thereby freeing up resources to handle more concurrent queries.
Why should I beware?
Whilst Parallel Query is disabled by default but as it is something which appears on the surface to be entirely beneficial (who wouldn't want increased performance and reduced CPU contention!) you may want to toggle it on. It works in conjunction with the Query Optimiser present in native MySQL and variants to decide the most optimal path to take to retrieve your query results most efficiently; the optimiser itself is relatively opaque to the client. In my experience, it can slow down your query response times and cause a significant increase in your monthly AWS spending if your schema and dataset don't fit the profile.
In my testing, this has been especially relevant on large tables with extensive column indexes. I would see queries that would depend almost exclusively on indexes be targeted for pushing down to the storage layer when it would have taken a much shorter period to complete if the query had taken the non-Parallel Query route. In some cases, the query took over 900x longer using Parallel Query rather than having this functionality disabled and resorting to traditional query optimiser behaviour.
What is also not immediately obvious is this has a direct impact on your AWS bill; Aurora charges both for compute time and IOPS to the storage layer, and whilst queries made to indexes that are already loaded in memory can be satisfied without generating read traffic to the storage subsystem, every query pushed down to storage results in read IOPS.
How to experiment
AWS document how to go about kicking the tyres on Parallel Query at https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-mysql-parallel-query.html#aurora-mysql-parallel-query-sql-explain. Different variables need adjusting depending on the major version of Aurora MySQL you are running, but enabling this at a session level allows you to experiment without downtime.
How much could it cost?
This will vary drastically based on your workload, request volume, row count and query and schema complexity.
The cost factor only gets a cursory mention in the AWS documentation:
If your Aurora MySQL cluster uses parallel query, you might see an increase in
VolumeReadIOPSvalues. Parallel queries don't use the buffer pool. Thus, although the queries are fast, this optimized processing can result in an increase in read operations and associated charges.
In my experience, turning off Parallel Query support (after demonstrating it did not provide any performance improvement for a given dataset) reduced a production AWS bill by $4k a month.
Conclusion
There are undoubtedly use cases where Amazon Aurora Parallel Query makes sense. Examples show that in best-case scenarios it can dramatically improve query responsiveness and offload work from your Aurora cluster nodes, but it does not fit every use case, every query and every workload.
The worst-case scenario is what I experienced - slower query response times and a much larger AWS bill for the privilege of using it.
