Bugs in applications & pinned plans in queries

At my place of work, one of the ways non optimal plans are handled , is by gathering stats. The other method is to use SQL profiles, by ‘pinning’ plans to queries. Oracle also supports SQL baselines another method of doing a similar thing.

This leads to problems as plans which work today, might not work two years down the line. (Think table stats evolution over time).

We recently figured a two year old pinned plan was causing a 12 minute job to run for 40 minutes. No one had figured this out because it is especially hard to debug jobs which run on the grid.

After excruciating monitoring steps, we were able to find that out of ~40 jobs which run on the grid, 1 of them was using this plan which caused this issue.

Ultimately the fix was just disabling this particular SQL profile, thus unpinning the plan.

Solutions to this problem

  1. Use pinning plans as the last resort, hard to do in practice
  2. Monitor all pinned plans periodically , also hard as it would need deep understanding of the schema as well as execution plans
  3. Use Baselines instead of SQL plans, could we have a guaranteed performance benchmark in this case? I don’t think so, might as well discard this solution.