ActiveRecord query log tags for GraphQL

I recently heard of a Rails 7 config option called config.active_record.query_log_tags_enabled. The feature adds comments to the SQL queries that Rails generates, like this:

/*action='index',application='MyApp',controller='myrecords'*/
SELECT "myrecords".* FROM "myrecords" WHERE "myrecords"."user_id" = 200 AND...
/*application='MyApp',job='myrecordjob'*/
SELECT "myrecords".* FROM "myrecords" WHERE "myrecords"."user_id" = 199 AND...

For readability I’ve split the comments and the queries into separate lines (but they’ll be on a single line in your DB logs) and I’ve turned on ActiveRecord::QueryLogs.prepend_comment = true so that the comments appear before the query. By default the logs are added at the end of the query.

These comments are visible in Rails application logs, but that’s not very helpful on its own because many production Rails deployments will decrease log levels to the point where ActiveRecord queries are invisible.

More importantly, the comments are actually sent with the SQL queries to my database server, so anything monitoring my queries will also see this metadata.

This is tremendously useful for working with database monitoring like Amazon’s RDS Performance Insights, where I can see what SQL queries are causing the biggest problems, but I can’t identify what application code generated those queries.

Improving on the defaults

By default Rails will log the application, controller, controller action, and (under ActiveJob) job name¹. That’s pretty good for most cases, but many Rails apps generate queries that aren’t usefully described by this data. The two I’ve seen so far are Sidekiq jobs running without ActiveJob, and GraphQL with the graphql-ruby gem, which will generate lines like this:

/*action='execute',application='MyApp',controller='graphql'*/
SELECT "myrecords".* FROM "myrecords" WHERE "myrecords"."user_id" = 200 AND "myrecords"."saved" = TRUE

No matter what graphQL operation is being executed, the generated SQL queries all receive the same comment, because GraphQL requests offer infinite variety from the same endpoint.

Logging graphql-ruby’s resolvers

graphql-ruby offers tracing functionality, which can be enabled like this:

class MySchema < GraphQL::Schema
  # normal schema stuff
  mutation(Types::MutationType)
  query(Types::QueryType)
  # tell graphql-ruby what code to use for tracing:
  trace_with(QueryTrace)
end

The module passed to trace_with just needs to implement a graphql-ruby lifecycle hook:

module QueryTrace
  def execute_field(**args)
    if args[:field].resolver
      ActiveSupport::ExecutionContext.set(graphql_resolver: args[:field].resolver)
    end
    super
  end
end

There is some nuance here about which of several GraphQL lifecycle hooks to choose that I’ll discuss later. Now that this data is in ExecutionContext, I can configure query_log_tags to use it:

# in config/application.rb
config.active_record.query_log_tags_enabled = true
config.active_record.query_log_tags << :graphql_resolver

Now my SQL queries generated by graphQL operations look like this:

/*...,graphql_resolver='Queries::AppVersions'*/
SELECT "app_versions".* FROM "app_versions"
/*...,graphql_resolver='Queries::SavedRecords'*/
SELECT "myrecords".* FROM "myrecords" WHERE "myrecords"."user_id" = 200 AND "myrecords"."saved" = TRUE

Some caveats to review

There are some things worth knowing before following this advice.

Most execute_field invocations do not have a resolver

The QueryTrace#execute_field method is invoked once for every field in a query, but most likely only has an explicit resolver object to log about for root nodes:

query MyQuery {
  savedRecords {  # savedRecords has explicit resolver Queries::SavedRecords
    # id and saved are implicitly resolved
    id
    saved
  }
}

execute_field will execute once for each field here, but ActiveSupport::ExecutionContext.set calls are ignored when nil is passed as a value. With this GraphQL query, I’ll only get a tag for Queries::SavedRecords, but that should still be enough detail to narrow down my search a lot.

Why not use other hooks that are executed less often?

For what it’s worth, when I enabled this on a GraphQL API that receives 50 queries per second, logging per field did not increased the response times.

If your traffic is much heavier than that, or you’re letting clients send particularly heavy queries, you might want to investigate a lifecycle hook that is invoked less often than execute_field. execute_query or analyze_query (see the documentation) execute only once per query, but here’s why I don’t use them:

The information available at execute_query time does not include resolvers, so I’d have to log something like graphql_root: query.selected_query.selections.map(&:name), which is a lot of graphql-ruby implementation detail to depend on.
If logging only once for a GraphQL query with multiple roots, all the SQL would be tagged with all of the roots regardless of relevance. The query tag would look like graphql_root='savedRecords,appVersions'.

Why not just use GraphQL’s integration with ActiveSupport notifications?

GraphQL offers GraphQL::Tracing::ActiveSupportNotificationsTrace, which would then allow me to hook into ActiveSupport notifications like graphql.execute_field, rather than writing my own custom QueryTrace module. Unfortunately I found the timing of the ActiveSupport notifications to be unreliable, sometimes running after a query completed (and so the SQL query was generated without the added tags).

Conclusion

If I had my way, active_record.query_log_tags_enabled would be enabled by default in Rails, but I gather that it disables prepared statements, which are important for some applications. For my own applications, adding it is an automatic must, along with the additional configuration necessary for cases like GraphQL or sidekiq.

Recent versions of Rails (7.2 and 8+) also include support for optionally logging :source_location which will output file and line number of the query. I haven’t used this option hence not discussing it here, but I’d view it as complimentary to the logging suggested here rather than a replacement. The linked PR also contains a performance warning about enabling it in production. ↩