
Every enterprise security team is being told to adopt AI. Build an agentic SOC. Automate triage. Let LLMs reason about your detection coverage.
Here's what nobody is saying out loud: AI agents can't actually read your detection library.
Your detections live in SPL, KQL, EQL, YAML, and TOML files scattered across repos, plus proprietary detection logic locked inside vendor platforms like CrowdStrike, Microsoft, and Palo Alto that you can't even inspect. They're full of unresolved macros, stale descriptions, wrong MITRE mappings, and runbooks that haven't been touched since someone left the team two years ago. These formats were designed for humans and SIEMs, not for LLM consumption. So when an AI agent tries to assess your detection coverage, it does the only thing it can: it trusts the metadata. It hallucinates investigation steps from outdated runbooks. It tells you you're covered for T1059.001 because three rules are tagged to it, without understanding that all three detect the same narrow behavioral pattern and miss dozens of others.
The problem isn't the models. It's the data layer feeding them. There's a missing piece of infrastructure between your detection rules and AI-powered security operations: an Agent Data Layer that translates what your detections actually do into agent-optimized, quality-assessed metadata that any AI model can reliably consume.
The framework is called Rune. It's open source, Apache 2.0, and launching soon on GitHub.
Rune is SIEM-agnostic, supporting Splunk, Microsoft Sentinel, and Elastic. It's model-agnostic at every level of the stack, whether you run frontier models like Claude, GPT, or Gemini through hosted APIs, deploy through cloud-segmented infrastructure like AWS Bedrock or GCP Vertex, or run local LLMs on your own hardware through Ollama. The framework doesn't care what you use. It bolts onto your existing detection repos without changing a single thing about how you work today.
A multi-agent scanner pipeline reads your detection code, resolves dependencies, analyzes query logic, researches threat context, and produces agent-optimized metadata assessed across five independent quality dimensions.
All of this happens at the behavioral pattern level, not the MITRE technique level. T1059.001 encompasses hundreds of distinct behaviors, and claiming coverage at the technique level is an illusion that gives teams false confidence in their defensive posture.
Detections also degrade for reasons that have nothing to do with the query itself. A vendor updates their log schema and a field your detection depends on stops populating. An infrastructure migration changes forwarding paths. Most teams don't discover these failures until an incident exposes the gap. The framework tracks field reliability and data source health so that degradation shows up in the quality scores before it shows up in a missed detection, without prescribing a specific review cadence, because a team reviewing quarterly has different constraints than a team reviewing monthly.
The scanner doesn't just assess quality in isolation. It can send the same detection to multiple LLM providers independently and score based on cross-model agreement rather than single-model certainty. If Claude, Gemini, and GPT all extract the same behavioral pattern from your query, that agreement significantly increases confidence in the result. If they disagree, it gets flagged. Every piece of scanner output also carries dual confidence scores measuring how certain the AI is in its own analysis, separate from the detection's quality scores. Output above 85% confidence is auto-approved. Output between 60% and 84% gets flagged for human review. Below 60% is rejected outright. The framework is built for teams that don't blindly trust AI output, because in this domain, you shouldn't.
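As a rough illustration, the review-routing logic described above might look something like this in Python. The thresholds mirror the numbers in this post, but the data shape, field names, and function are assumptions for illustration, not Rune's actual implementation.

```python
# Minimal sketch of confidence-gated review routing.
# Thresholds mirror the numbers described above; the data shape is assumed,
# not taken from Rune's actual schema.
from dataclasses import dataclass

@dataclass
class ScannerOutput:
    detection_id: str
    analysis_confidence: float  # scanner's confidence in its own analysis, 0.0-1.0
    models_in_agreement: int    # how many providers extracted the same behavioral pattern
    models_queried: int

def route(output: ScannerOutput) -> str:
    """Decide whether scanner output is auto-approved, human-reviewed, or rejected."""
    if output.models_queried > 1 and output.models_in_agreement < output.models_queried:
        return "human_review"          # cross-model disagreement always gets flagged
    if output.analysis_confidence >= 0.85:
        return "auto_approve"
    if output.analysis_confidence >= 0.60:
        return "human_review"
    return "reject"

print(route(ScannerOutput("win_encoded_powershell", 0.91, 3, 3)))  # auto_approve
print(route(ScannerOutput("linux_cron_persistence", 0.72, 2, 3)))  # human_review
```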

Rune organizes detection knowledge into a token-optimized hierarchy. Each level goes deeper, costs more tokens, and contains more detail. Agents load only what they need for the task at hand. In a world where context window efficiency directly translates to reasoning quality, giving an agent the minimum viable context for each task means more of its capacity goes toward actual analysis instead of parsing overhead.
DETECTION_INDEX.md is the top of the hierarchy. One file, one line per detection, covering the entire library. It includes the behavioral description, quality grade, limiting factor, and five-dimensional profile for every active detection. An agent performing coverage gap analysis loads this single file and sees your full defensive posture in seconds. A library of 500 detections fits comfortably in a single context load on current models, and agents that need to work within smaller windows can filter or chunk the index without losing the coverage picture.
detection-summary.md is the per-detection summary in Markdown-KV format, optimized for LLM parsing accuracy. It contains the behavioral pattern description, full quality profile with sub-factor breakdowns, trigger conditions, uncovered variations, and MITRE mapping. An agent investigating a specific detection loads this file for the detail that the index entry doesn't include.
detection.yaml is the deep structured metadata. Full quality assessment with per-field anchor classifications, vulnerability class breakdowns with analyst notes, behavioral reach with explicit covered and uncovered patterns, signal confidence sub-factors, operational stamina sub-factors, hypothesis, threat model, scheduling, and tuning history. This is the file that powers trending, portfolio-level analysis, and schema validation.
runbook.md is the investigation guidance. Not numbered procedural steps that break the moment your tooling changes. Goals, risk criteria, and decision frameworks that an analyst or an agent can use to dynamically select the right investigative approach based on what they find. The runbook describes what a successful investigation looks like and what escalation criteria apply, not which buttons to click in which console.
Each level references the one below it. An agent starts at the index, drills into a summary for context, and only loads the full structured metadata when it needs the complete picture. This is the token efficiency argument in practice: you don't feed an agent 500 full detection files when a single-line index entry answers the question.
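To make the drill-down concrete, here is a minimal sketch of how an agent-side tool might walk the hierarchy. The file names follow the levels described above, but the directory layout and parsing logic are assumptions for illustration rather than Rune's published interface.

```python
# Hypothetical drill-down through the metadata hierarchy.
# Directory layout and parsing are illustrative assumptions, not Rune's published interface.
from pathlib import Path

import yaml  # PyYAML, assumed available

REPO = Path("detections")

def load_index() -> list[str]:
    """Cheapest view: one line per detection, for portfolio-wide questions."""
    return (REPO / "DETECTION_INDEX.md").read_text().splitlines()

def load_summary(detection_id: str) -> str:
    """Mid-cost view: the per-detection Markdown-KV summary."""
    return (REPO / detection_id / "detection-summary.md").read_text()

def load_full_metadata(detection_id: str) -> dict:
    """Most expensive view: full structured metadata, loaded only when needed."""
    return yaml.safe_load((REPO / detection_id / "detection.yaml").read_text())

# An agent answering "what covers T1059.001?" stops at the index and only pays
# for deeper levels when a specific detection needs inspection.
if (REPO / "DETECTION_INDEX.md").exists():
    candidates = [line for line in load_index() if "T1059.001" in line]
```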
Quality scoring is only half the picture. A perfectly written detection is worthless if the telemetry it needs doesn't reach your SIEM.
Rune profiles your environment by mapping your actual data sources, asset inventory, and log forwarding topology against what each detection requires. It does not build or maintain an asset inventory. Instead, it consumes fleet counts from whatever source your team trusts, whether that's LDAP, your EDR console, a CMDB, Splunk ES Asset and Identity, a cloud provider API, or a manual estimate. The more accurate your fleet numbers, the more accurate the coverage calculations become.
Getting started is practical, not painful. Rune provides copy-paste SIEM discovery commands for each supported platform. Run them, paste the output into the helper script, and you get a draft environment profile with roughly 60% of the fields auto-populated: your data sources, field population rates, observed host counts, ingestion latency. The remaining 40% requires human input because the SIEM cannot know things like how many hosts should have a given log source, which tiers are forwarding which event types, or what your EDR deployment actually covers. That 40% is where the real operational knowledge lives, and no tool can guess it for you.
The scanner then computes effective coverage per detection by comparing observed host counts against your fleet inventory, identifying collection gaps where agents aren't installed, forwarding gaps where telemetry exists on the endpoint but isn't reaching the SIEM, and field reliability gaps where critical fields are truncated or missing. It also checks whether your EDR provides parallel coverage, and it distinguishes between alert forwarding and raw telemetry forwarding because those are fundamentally different capabilities. An EDR that forwards detection alerts to your SIEM gives you a structured event for correlation. An EDR that also forwards raw telemetry gives you full investigative context without pivoting to another console. Most environments forward alerts, but far fewer also forward the underlying telemetry, which means analysts often have to pivot to the EDR console for investigation even when the SIEM received the initial alert. The framework accounts for that distinction when calculating blind spots and suggesting remediation options with effort and impact ratings.
Together, those checks yield three types of gap analysis that no other framework delivers in one place: collection gaps, forwarding gaps, and field reliability gaps.
Quality scores are portable. They travel with the detection and work across any environment. Telemetry realization is computed locally, showing where your detections are blind in your specific infrastructure. A public repo has quality grades. Your private deployment adds an environment profile and discovers that a Grade A detection only covers 12% of the fleet.
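The arithmetic behind effective coverage is simple even when the inputs aren't. Here is a rough sketch, with assumed field names and illustrative numbers that happen to reproduce a Grade A detection covering 12% of the fleet:

```python
# Rough sketch of effective-coverage math under assumed inputs.
# fleet_count comes from whatever inventory source the team trusts;
# observed_hosts comes from the SIEM; field names and numbers are illustrative.

def effective_coverage(fleet_count: int,
                       observed_hosts: int,
                       field_population_rate: float) -> dict:
    """Combine collection and field-reliability factors into one coverage figure."""
    collection_rate = min(observed_hosts / fleet_count, 1.0) if fleet_count else 0.0
    effective = collection_rate * field_population_rate
    return {
        "collection_rate": round(collection_rate, 2),     # hosts actually reporting
        "field_population_rate": field_population_rate,   # critical fields present
        "effective_coverage": round(effective, 2),        # what the detection can really see
    }

# A well-written query that only sees a fraction of the fleet:
print(effective_coverage(fleet_count=10_000, observed_hosts=1_500,
                         field_population_rate=0.80))
# {'collection_rate': 0.15, 'field_population_rate': 0.8, 'effective_coverage': 0.12}
```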
The scanner automates the process, but the real value is the open schema standard itself. Teams can populate Layer 4 metadata by hand, adopt it incrementally, or run the scanner when they want to generate it across an entire repo. However you get there, the output is the same.
Once scores exist, the framework provides a clear prioritization path. Fix first: detections where the limiting factor is Logic Resilience or Signal Confidence, because those are fixable by improving the existing query. Build next: techniques where combined Behavioral Reach across all your detections is still Narrow or Pinpoint, especially where active threat intelligence says adversaries are targeting those gaps. Monitor: detections where Operational Stamina is trending downward, because those are the ones that will silently stop working while your MITRE heatmap still shows green. Individual detection scores also aggregate at the portfolio level, so three Narrow detections covering different behavioral patterns of the same technique can combine to Substantial coverage, and that combined view is what agents use for gap analysis against threat reports.
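A minimal sketch of that triage logic, assuming score fields named after the dimensions in this post (the actual schema fields may differ):

```python
# Sketch of the fix-first / build-next / monitor triage described above.
# Dimension names follow the article; the data shape is an assumption.

def prioritize(detection: dict) -> str:
    limiting = detection["limiting_factor"]
    if limiting in ("Logic Resilience", "Signal Confidence"):
        return "fix_first"       # fixable by improving the existing query
    if detection["combined_behavioral_reach"] in ("Narrow", "Pinpoint"):
        return "build_next"      # technique-level coverage is still thin
    if detection.get("operational_stamina_trend") == "down":
        return "monitor"         # likely to stop working silently
    return "healthy"

print(prioritize({
    "limiting_factor": "Operational Stamina",
    "combined_behavioral_reach": "Substantial",
    "operational_stamina_trend": "down",
}))  # monitor
```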
Detection knowledge belongs in your repository, not behind live API calls. The popular pattern right now is connecting agents to SIEMs via MCP, where tool definitions alone can consume tens of thousands of tokens before the conversation even starts. Rune takes the opposite approach: instead of bringing the agent to the SIEM at query time, you bring the detection knowledge to the agent ahead of time. The Markdown-KV format is designed to fit large detection libraries into a single context window, so agents spend their token budget on actual analysis instead of connection overhead and format parsing.
Detection engineering as a discipline has been shaped by foundational work. David Bianco's Pyramid of Pain. Jared Atkinson's research on capability abstraction. MITRE ATT&CK's threat-informed defense model. SANS research establishing that detection engineering is a lifecycle discipline where detections decay from the moment they hit production, and that detection-as-code alone does not solve detection efficacy. The broader community of practitioners who've spent years writing detections, building coverage frameworks, and pushing the discipline forward. I built my career on these foundations. Rune is the next layer: the quality assessment and environment-aware coverage analysis that tells you which detections are strong, which are brittle, and which are blind, all in a format that both your team and your AI agents can consume.
I built and validated this against a live detection environment with real queries, real dependencies, and real alerts firing. The quality scores and coverage analysis come from working detections, not a design doc.
Once this layer exists, it becomes the foundation for everything that comes next: threat intel gap analysis, pen test coverage validation, autonomous alert triage, crown jewels risk assessment, detection drift monitoring. And the detection metadata stays in your repository, under your control, not locked inside a vendor platform that disappears when the contract ends.
Launch access. Be first to know when Rune drops on GitHub. Early subscribers get notified before anyone else.
Rich Alldrin is a cybersecurity practitioner whose background spans Air Force cyber warfare operations, threat hunting, digital forensics, and detection engineering across enterprise and managed security environments. He recently completed the SANS Incident Response Graduate program and holds multiple GIAC certifications.
Outside of work, he spends an unreasonable amount of time exploring how AI is reshaping security operations: building open-source tools, experimenting with agentic architectures, and trying to close the gap between what AI models can do today and what security teams are actually able to leverage. The detection engineering community has a missing infrastructure layer, and nobody seemed to be building it in the open. So here we are.
runesec.org