Coalesce Automation Inc.

05/22/2026 | Press release | Archived content

Hands-on with Coalesce’s MCPs, Part 2: Building Data Engineering Agents with Skills

In Part 1 of this series, we looked at how to use MCPs across the Coalesce suite and how they speed up workflows like root cause analysis and tagging owners. In Part 2, we go one level up and look at how skills package those workflows into reproducible recipes that follow the same path every time.

Introduction to skills

A skill is a markdown file (SKILL.md) plus optional supporting files that provide instructions for performing a task. Skills are not unique to Claude, but for the rest of this article we'll focus on Claude's implementation. In short, a skill tells Claude what the task is, which tools to use, what good output looks like, and what to avoid.
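For orientation, here is a minimal sketch of that layout. A SKILL.md typically opens with YAML frontmatter carrying a `name` and a `description` (the fields Claude reads when deciding whether a skill applies), followed by free-form markdown instructions. The `weekly-dq-report` skill and the toy `parse_frontmatter` helper are hypothetical; the helper only handles flat `key: value` lines, so a real loader would use a proper YAML parser.

```python
# Hypothetical SKILL.md contents: frontmatter (name, description) + instructions.
SKILL_MD = """\
---
name: weekly-dq-report
description: Generates a weekly data quality report from Coalesce Quality and posts it to Slack. Use when the user asks for a weekly data quality report or DQ recap.
---
# Weekly data quality report

1. Collect open incidents and issues from the past 7 days.
2. Summarize them using the template below.
3. Post the summary to Slack.
"""


def parse_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a SKILL.md into (frontmatter fields, markdown body).

    Toy parser for flat `key: value` lines only.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    fields: dict[str, str] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            # Everything after the closing delimiter is the instruction body.
            return fields, "\n".join(lines[i + 1:])
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    # No closing delimiter: treat the whole file as body.
    return {}, text


meta, body = parse_frontmatter(SKILL_MD)
print(meta["name"])  # weekly-dq-report
```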

Claude is designed to work out which skills to use on its own. When a conversation starts, Claude reads each skill's description and tries to match it against what the user is asking. You can also invoke a skill explicitly: typing / brings up the available skills and lets you pick one directly.
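To make the routing idea concrete, here is a deliberately crude stand-in that scores each skill by how many words its description shares with the request. This is an illustration only: Claude relies on the model's own judgment over the descriptions, not keyword overlap, and the skill names and descriptions below are hypothetical.

```python
import re
from typing import Optional

# Hypothetical skill catalogue: name -> description.
SKILLS = {
    "weekly-dq-report": "Generate a weekly data quality report and post it to Slack.",
    "dq-triage": "Triage open data quality issues and propose Linear tickets.",
    "add-tests": "Add high-impact data tests following our data quality guide.",
}


def words(text: str) -> set[str]:
    """Lowercased word set, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def pick_skill(request: str, explicit: Optional[str] = None) -> Optional[str]:
    """Return the explicitly chosen skill (like picking it via /), otherwise
    the skill whose description shares the most words with the request.
    Returns None when nothing overlaps. Crude stand-in for model judgment."""
    if explicit is not None:
        return explicit if explicit in SKILLS else None
    request_words = words(request)
    best, best_score = None, 0
    for name, description in SKILLS.items():
        score = len(request_words & words(description))
        if score > best_score:
            best, best_score = name, score
    return best


print(pick_skill("Can you triage these open issues?"))  # dq-triage
```

Keyword overlap breaks down quickly ("data" and "quality" appear in every description here), which is one reason to give each skill distinctive trigger phrases in its description.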

Skills are only as good as the data they can access, so the more context you can pass through MCPs, the better the skills you can build on top of them. Our Coalesce examples rely heavily on this, using metadata about data transformations, governance definitions, and data quality.

Planning a skill suite

For data roles specifically, I like to think of skills as jobs to be done. These can be grouped into areas of responsibility such as data quality, data modelling, analytics, and data governance, each containing specific jobs. This framing is powerful: for example, it lets us encode our testing philosophy as a skill, so whenever data engineers want to add new tests, they can use a skill that follows our standards. (Internally, we've taken this a step further and encoded our data quality guide as a skill with step-by-step instructions for adding high-impact tests.)

As a rule of thumb, the more often you find yourself doing a workflow, and the more time it takes, the more likely it is to benefit from being converted into a skill.
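One way to apply this rule of thumb is to estimate, for each candidate workflow, how often it runs and how long it takes by hand, then rank by the product. The workflows and numbers below are placeholder estimates, not measurements; the `area` field simply mirrors the areas-of-responsibility grouping described above.

```python
from dataclasses import dataclass


@dataclass
class Workflow:
    name: str
    area: str             # area of responsibility, e.g. "data quality"
    runs_per_month: int   # how often someone does it by hand
    minutes_per_run: int  # how long each manual run takes

    @property
    def minutes_per_month(self) -> int:
        return self.runs_per_month * self.minutes_per_run


def rank_candidates(workflows: list[Workflow]) -> list[Workflow]:
    """Order workflows by total manual time, most expensive first."""
    return sorted(workflows, key=lambda w: w.minutes_per_month, reverse=True)


# Placeholder estimates, for illustration only.
candidates = [
    Workflow("weekly DQ report", "data quality", runs_per_month=4, minutes_per_run=60),
    Workflow("add tests to a model", "data quality", runs_per_month=10, minutes_per_run=30),
    Workflow("tag owners on new tables", "data governance", runs_per_month=2, minutes_per_run=20),
]
for w in rank_candidates(candidates):
    print(f"{w.name:<26} {w.area:<16} {w.minutes_per_month:>4} min/month")
```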

Each skill has a description that helps Claude quickly find the right one for the job. So when someone asks to "triage this issue," the model consistently routes them to the triage skill and they get all the right context, instead of the model chaining LLM calls and reinventing the workflow each time. That's the benefit of building a deliberate skill suite, and thinking in terms of jobs to be done is a good way to figure out which skills to build.

Let's put it into practice by building an actual skill.

Deep dive: designing a skill for a weekly data quality report

A common request we hear from customers is a weekly data quality report. We've built one into our product, but customers often want bespoke metrics that fit their use cases.

This may look simple on the surface, but designing it as a skill requires some deliberate choices. A few things to consider before we start writing the skill:

  • What should the structure of the report look like?
  • What counts as a "data issue"? Are all severities included, or only errors?
  • Should it be actionable, for example by tagging owners, linking to data products, and suggesting next steps?
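It can help to pin these decisions down as explicit parameters before writing any instructions. The `ReportSpec` class, its field names, and its defaults are hypothetical; they only show the kind of choices the skill has to make, one group per question above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportSpec:
    # Structure: which sections appear, in order.
    sections: tuple[str, ...] = ("TL;DR", "Open incidents", "Failing monitors", "Next steps")
    # What counts as a "data issue": which severities are in scope.
    severities: frozenset[str] = frozenset({"error", "warning"})
    # Actionability.
    tag_owners: bool = True
    link_data_products: bool = True
    suggest_next_steps: bool = True

    def includes(self, severity: str) -> bool:
        """True if issues of this severity belong in the report."""
        return severity.lower() in self.severities


errors_only = ReportSpec(severities=frozenset({"error"}))
print(errors_only.includes("warning"))  # False
```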

We could start documenting all these decisions manually, but capturing them in a skill is a much better idea.

Using skill-creator to create the skill

Claude's skill-creator is excellent for kickstarting most of the process and then iterating on it with your own feedback. It takes a description of what you want and walks you through creating the skill.

The prompt we used:

Use skill-creator to create a skill that generates a weekly data quality report from Coalesce Quality and posts it to Slack. Use when the user asks for a "weekly data quality report," "Coalesce Quality weekly summary," "data health digest," "weekly DQ recap," or anything that combines Coalesce Quality incidents/issues/test results with a Slack delivery. Covers open incidents, open issues, failing monitors, recent execution failures, and affected entities over the past 7 days.

There are three key parts to a good description: what it does, when to use it, and its key capabilities.
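A rough self-check for those three parts might look like the following. The heuristics are assumptions modelled on the prompt above (a descriptive first sentence, a "Use when" clause, a "Covers" clause), not rules that Claude or skill-creator enforce.

```python
import re


def description_gaps(description: str) -> list[str]:
    """List which of the three parts a skill description seems to lack.

    Assumed heuristics:
    - what it does: a first sentence of at least five words
    - when to use it: a "Use when" / "Use it when" clause
    - key capabilities: a "Covers", "Includes" or "Supports" clause
    """
    text = description.strip()
    gaps = []
    first_sentence = re.split(r"(?<=[.!?])\s+", text, maxsplit=1)[0]
    if len(first_sentence.split()) < 5:
        gaps.append("what it does")
    if not re.search(r"\buse (it )?when\b", text, re.IGNORECASE):
        gaps.append("when to use it")
    if not re.search(r"\b(covers|includes|supports)\b", text, re.IGNORECASE):
        gaps.append("key capabilities")
    return gaps


print(description_gaps("Create a weekly data quality report"))
# ['when to use it', 'key capabilities']
```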

With this information, Claude creates a skill that follows the best-practice structure, drawing on its best understanding of the tools available in the connected MCPs. Importantly for our example, it defaults to prescriptive instructions for each step, naming the specific tool to call. This gives us confidence that each run follows the same path and produces a consistent result, which is exactly what we want.

The better the description, the better the first iteration of the skill. Below are two examples of descriptions that make weaker starting points.

"Create a weekly data quality report". This says nothing about when to trigger or what it contains, and leaves too much up to interpretation about which metrics to use.

"Use list_incidents… to create a weekly data quality report…". Specifying this level of detail is too technical for a first draft. Claude will automatically capture a lot of this from a good natural language description, and you can instead go back and edit the skill with these details later on.

The result

A few real-world things the first draft didn't handle, which we ended up editing in by hand:

  • Whether the report goes for review before being sent, or posts directly
  • Which Slack channel it should go to (and what happens when the default doesn't exist in the workspace)
  • Whether the time window is adjustable (what if we want a monthly report instead of a weekly one?)
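Once decided, those three behaviours become small, explicit rules. This sketch is hypothetical throughout: the `data-quality` default channel, the `review_first` flag and the `window_days` parameter are illustrative names, and nothing here calls Slack or Coalesce. In the skill itself, the same rules would be written as instructions.

```python
from datetime import date, timedelta
from typing import Optional

DEFAULT_CHANNEL = "data-quality"  # hypothetical default channel name


def resolve_channel(requested: Optional[str], existing: set[str]) -> Optional[str]:
    """Pick the requested channel, else the default.

    Returns None if that channel doesn't exist in the workspace, which the
    skill should treat as "ask the user" rather than guessing.
    """
    channel = requested or DEFAULT_CHANNEL
    return channel if channel in existing else None


def report_window(today: date, window_days: int = 7) -> tuple[date, date]:
    """Adjustable window: 7 days for weekly, e.g. 30 for a monthly report."""
    if window_days < 1:
        raise ValueError("window_days must be at least 1")
    return today - timedelta(days=window_days), today


def delivery_step(review_first: bool) -> str:
    """Show a draft for review first, or post straight to Slack."""
    return "show draft for approval" if review_first else "post to Slack"
```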

We also went through several rounds of iteration on how the data is presented. For example, the initial report was too dense, so we adjusted the skill to post a brief summary and then reply with the supporting evidence in the Slack thread.
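The summary-then-thread pattern maps onto Slack's `chat.postMessage` method: post the summary, then post each piece of evidence with `thread_ts` set to the summary's timestamp. This minimal sketch only builds and sends payloads through an injected `post` callable (an assumption, so it runs offline without a Slack token); the message text is placeholder content.

```python
from typing import Callable


def post_report(
    channel: str,
    summary: str,
    evidence: list[str],
    post: Callable[[dict], dict],
) -> list[dict]:
    """Post a brief summary, then each piece of evidence as a thread reply.

    `post` sends one message payload and returns the API response; Slack's
    chat.postMessage response includes the new message's `ts`, which
    replies reference via `thread_ts`.
    """
    parent = {"channel": channel, "text": summary}
    parent_ts = post(parent)["ts"]
    sent = [parent]
    for item in evidence:
        reply = {"channel": channel, "text": item, "thread_ts": parent_ts}
        post(reply)
        sent.append(reply)
    return sent


# Offline stand-in for a Slack client, so the sketch runs without a token.
def fake_post(payload: dict) -> dict:
    return {"ok": True, "ts": "1716400000.000100"}


messages = post_report(
    "data-quality",
    "*TL;DR*: summary of this week's open issues",
    ["issue 1 details", "issue 2 details"],
    fake_post,
)
```

With the `slack_sdk` package, `post` could plausibly be `lambda p: client.chat_postMessage(**p)`, since its response object supports `resp["ts"]`; error handling and rate limiting are left out here.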

[Figure: A simple weekly overview of ongoing data quality issues, with details on specific failures, owners, and recommended next steps.]

The powerful thing about our data quality report is that it will look consistent next week, when someone else is on the data ops rota and runs it. We can also schedule it to run weekly and send out an update automatically. In other words, we've gone from using MCPs ad hoc to a production-grade report that's a key part of our data quality workflow.

From report to action with a triage skill

A weekly report tells you what is broken, but triage often remains a manual process: deciding whether each issue should be prioritized, routing it to its owner, and pasting the details into Linear tickets.

We built another skill called . It pulls the same Coalesce Quality data the report skill uses, but proposes Linear tickets instead of formatting for Slack.

The interesting design problem is how each issue gets triaged. The skill scores issues by severity, downstream impact (does it hit a P1 data product?), and ownership status (is anyone already investigating?), then assigns one of four actions: create a ticket, skip it because someone is already on it or it has already been filed, acknowledge it as known variance, or flag it for monitor tuning if it's a known noisy alert.
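The same rules can be sketched as a small decision function. Everything here is an assumption for illustration: the field names, the severity weights and threshold, the order in which the rules are checked, and the choice to treat low-scoring issues as known variance. The actual skill states its rules in prose for Claude to apply.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    CREATE_TICKET = "create a ticket"
    SKIP = "skip (someone is on it, or already filed)"
    KNOWN_VARIANCE = "acknowledge as known variance"
    TUNE_MONITOR = "flag for monitor tuning"


@dataclass
class Issue:
    severity: str                   # e.g. "error" or "warning"
    hits_p1_product: bool = False   # downstream impact on a P1 data product
    has_investigator: bool = False  # ownership status
    already_filed: bool = False     # an existing ticket covers it
    known_noisy: bool = False       # alert from a known noisy monitor


SEVERITY_WEIGHT = {"error": 2, "warning": 1}  # assumed weights
TICKET_THRESHOLD = 2                           # assumed cut-off


def score(issue: Issue) -> int:
    """Severity weight plus a bonus for P1 downstream impact."""
    return SEVERITY_WEIGHT.get(issue.severity, 0) + (2 if issue.hits_p1_product else 0)


def triage(issue: Issue) -> Action:
    if issue.has_investigator or issue.already_filed:
        return Action.SKIP
    if issue.known_noisy:
        return Action.TUNE_MONITOR
    if score(issue) >= TICKET_THRESHOLD:
        return Action.CREATE_TICKET
    return Action.KNOWN_VARIANCE


print(triage(Issue("warning", hits_p1_product=True)).value)  # create a ticket
```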


This shows how different skills can tie into each other and start automating manual parts of the workflow that would otherwise take a long time.

Evaluating our skill

We've now created a skill that's part of our workflow. Hopefully it saves the team hours every week, and more importantly, distributes a sense of ownership over data quality without anyone having to call it out explicitly.

In many cases, eyeballing the results gives us a good idea of how well a skill works. But sometimes it makes sense to approach evaluation more systematically, usually through these three lenses:

  1. Does it trigger on the right requests?
  2. Does it produce what we expect it to?
  3. How does it handle edge cases?

Which of these matters most depends on the use case. For a weekly report that users can also invoke explicitly, triggering accuracy matters less. For an agentic analytics skill that's expected to fire on any business question, triggering is critical. In our case, we care most about output quality and edge-case handling: the report should be good now, and still be good when the data looks different from how it does this week.

Putting the skill through actual evals

We ran a small benchmark: three test cases, each run twice, once with the skill loaded (Claude reads SKILL.md and follows it) and once without (Claude makes structural decisions from scratch using the same tools). Claude's skill-creator has this built in and can help generate the test cases for you.

Each output was graded against five objective assertions per eval: does it contain a TL;DR, does it deduplicate the noisy monitor, does it call out P1 downstream impact, is it actually Slack-formatted, and so on.
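Assertions like these are easy to express as plain checks on the generated text. This sketch covers four of the checks named above; the specific heuristics (for example, treating "deduplicated" as "the noisy monitor's name appears exactly once") are our assumptions, not the graders skill-creator uses, and the sample report string is placeholder text.

```python
import re


def grade(report: str, noisy_monitor: str) -> dict[str, bool]:
    """Run objective, assertion-style checks on one generated report."""
    return {
        "has_tldr": "TL;DR" in report,
        # Deduplicated: the noisy monitor is mentioned exactly once.
        "dedups_noisy_monitor": report.count(noisy_monitor) == 1,
        "calls_out_p1": re.search(r"\bP1\b", report) is not None,
        # Slack mrkdwn uses *bold* and has no # headings, so markdown-style
        # **bold** or headings suggest the output isn't Slack-ready.
        "slack_formatted": "**" not in report
        and re.search(r"^#{1,6} ", report, re.MULTILINE) is None,
    }


def pass_rate(results: dict[str, bool]) -> float:
    """Fraction of assertions that passed."""
    return sum(results.values()) / len(results)


report = "*TL;DR*: orders_freshness keeps failing (alerts shown once); it feeds a P1 data product."
print(pass_rate(grade(report, "orders_freshness")))  # 1.0
```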

[Figure: Results from the test benchmark]

In short, the skill gives us better output (more of our assertions pass), but on average it also takes longer and uses more tokens to complete the task. The extra tokens are the price of the skill pulling more data (get_issue_impact for each top issue, deeper execution history) and following a stricter template. For a once-a-week report, that's fine.

With a skill, variance also drops significantly, because each run becomes much more predictable. Without one, Claude makes structural decisions from scratch every time: sometimes it writes a great report, sometimes it skips the TL;DR or forgets to deduplicate noise. With a skill, the template is enforced. That's the real case for skills: without one, every run is a roll of the dice. We shouldn't read too much into these numbers given such a small sample, but they still illustrate the point.
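The variance point is easy to quantify once runs are repeated: compare the mean and the standard deviation of per-run pass rates for each configuration, alongside token cost. This is a minimal sketch using Python's `statistics` module; the run format (a dict with `pass_rate` and `tokens`) is an assumption, not an output format of any tool.

```python
from statistics import mean, pstdev


def summarize(runs: list[dict]) -> dict[str, float]:
    """Summarize repeated runs of one configuration (with or without a skill).

    Each run is assumed to be a dict with a "pass_rate" (0 to 1) and a
    "tokens" count.
    """
    rates = [run["pass_rate"] for run in runs]
    return {
        "mean_pass_rate": mean(rates),
        # Spread across runs: lower means more predictable output.
        "pass_rate_stdev": pstdev(rates),
        "mean_tokens": mean(run["tokens"] for run in runs),
    }
```

Comparing `summarize(with_skill_runs)` against `summarize(without_skill_runs)` puts the trade-off side by side: higher and steadier pass rates against a higher token cost. With only a handful of runs per configuration, the spread is indicative at best.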

Summary and what's next

Skills are great when the task is well-defined and bounded. They start to break when the workflow has multiple decision points, needs to integrate with ticketing or PR review, or has to coordinate across teams. That's where playbooks and integrated AI agents can help.

Part 3 picks up there: sharing skills across teams, instrumenting them so you can see which ones actually get used, building a skill overview with evals and usage metrics, and using purpose-built agents integrated directly into the Coalesce suite.

Coalesce Automation Inc. published this content on May 22, 2026, and is solely responsible for the information contained herein. Distributed via Public Technologies (PUBT), unedited and unaltered, on May 26, 2026 at 07:53 UTC. If you believe the information included in the content is inaccurate or outdated and requires editing or removal, please contact us at [email protected]