From Terraform Diff to Infrastructure Risk Review

A closer look at Terraform Review Agent, an open GitHub Actions workflow that combines Terraform scanners, Infracost, LangGraph, and LLM-generated explanations into one pull request review comment.

Terraform pull requests are deceptively hard to review.

The diff may only be twenty lines, but those twenty lines can change network exposure, IAM permissions, storage encryption, logging, backups, and monthly cloud spend. A reviewer has to answer several questions at once:

Is this secure enough to merge?
Will it cost more than expected?
Does it follow the team’s Terraform conventions?
Are the scanner warnings real problems or just noise?
Should this block the pull request or simply inform the reviewer?

Many CI pipelines answer those questions poorly. They scatter logs across separate jobs, bury important findings inside raw scanner output, or rely on reviewers to remember which tool checks which class of risk.

Terraform Review Agent is our attempt to make that review loop more usable. It is a reusable GitHub Actions workflow that reviews Terraform pull requests and posts one ranked comment with security, cost, and style findings.

The goal is not to make AI responsible for infrastructure approval. The goal is to put better evidence in front of the humans who already approve infrastructure changes.

The Problem: IaC Review Has Too Many Signals

Infrastructure-as-code review usually starts simple.

A team adds terraform fmt. Then it adds tflint. Then a security scanner. Then another security scanner because the first one misses a different class of issue. Then cost estimation becomes important. Eventually, the pull request has five jobs and the useful information is spread across five places.

That creates a workflow problem:

Developers see a failed check but not the business impact.
Reviewers skim logs instead of reviewing a concise risk summary.
Low-value findings sit next to high-risk findings with the same visual weight.
Cost changes are often reviewed separately from security and style.
Teams either block too much or block nothing.

The missing piece is not another scanner. The missing piece is orchestration.

Terraform Review Agent treats the pull request review as one workflow with multiple evidence sources. The scanners still do the detection. The agent collects the evidence, normalizes it, ranks it, and turns it into a review comment that people can actually use.

How the Workflow Runs

A consuming repository adds a small workflow that calls the reusable reviewer:

on:
  pull_request:
    paths:
      - "**/*.tf"

jobs:
  terraform-review:
    uses: infiniumtek/terraform-review-agent/.github/workflows/terraform-review.yml@v1
    permissions:
      contents: read
      pull-requests: write
    with:
      llm-provider: anthropic
      llm-model: claude-sonnet-4-5
      fail-on-severity: high
    secrets:
      anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
      infracost-api-key: ${{ secrets.INFRACOST_API_KEY }}

The workflow is triggered by Terraform file changes on pull requests. It checks out the PR merge ref, runs inside a prebuilt container, invokes the LangGraph agent, and posts or updates a single sticky comment.

That sticky-comment detail matters. A bot that leaves a fresh comment on every push quickly becomes part of the noise. This agent uses a hidden marker, finds its previous comment, and edits it in place.
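Below is a minimal sketch of the find-or-edit decision. It assumes comments arrive as dicts with id and body keys, which is how GitHub's issue comments API returns them. The HTML-comment marker is a placeholder, and the project's real marker string may differ. The network calls are left out.

```python
from typing import Optional

# Placeholder marker; the real project may use a different string.
MARKER = "<!-- terraform-review-agent -->"


def find_sticky_comment(comments: list, marker: str = MARKER) -> Optional[int]:
    """Return the id of the first comment whose body contains the marker, if any."""
    for comment in comments:
        # Bodies can be empty or None, so fall back to "".
        if marker in (comment.get("body") or ""):
            return comment["id"]
    return None


def plan_upsert(comments: list, rendered: str, marker: str = MARKER) -> tuple:
    """Decide whether to edit the existing review comment or post a new one.

    The marker is prepended to the body so the next run can find this comment.
    """
    body = f"{marker}\n{rendered}"
    existing = find_sticky_comment(comments, marker)
    if existing is not None:
        return ("update", existing, body)
    return ("create", None, body)


if __name__ == "__main__":
    history = [
        {"id": 101, "body": "LGTM"},
        {"id": 202, "body": f"{MARKER}\nold review"},
    ]
    print(plan_upsert(history, "new review")[:2])  # ('update', 202)
```

In the live workflow, the two branches would map to two GitHub REST API calls. The update branch would edit that issue comment, and the create branch would post a new one on the pull request.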

The result is a review surface that stays current without cluttering the conversation.

Three Review Lanes, One Comment

The LangGraph workflow fans out into three specialist lanes:

pull request context
  -> security lane: tfsec + checkov
  -> cost lane: infracost diff
  -> style lane: tflint + terraform fmt
lane results -> aggregate, rank, render, comment
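LangGraph performs the fan-out and fan-in itself. To show the shape without relying on its API, here is a standard-library sketch. It runs three toy lane functions in parallel with concurrent.futures and merges their results into separate state fields. The lane bodies and field names are hypothetical stand-ins for the real scanner calls.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Each lane receives the PR context and returns a list of finding dicts.
Lane = Callable[[dict], list]


def security_lane(ctx: dict) -> list:
    # Toy stand-in for tfsec + checkov: flag files that open ingress to the world.
    return [
        {"lane": "security", "severity": "high", "file": name}
        for name, text in ctx["files"].items()
        if "0.0.0.0/0" in text
    ]


def cost_lane(ctx: dict) -> list:
    # Toy stand-in for infracost diff: no cost findings in this example.
    return []


def style_lane(ctx: dict) -> list:
    # Toy stand-in for tflint + terraform fmt: flag lines with trailing whitespace.
    return [
        {"lane": "style", "severity": "low", "file": name}
        for name, text in ctx["files"].items()
        if any(line != line.rstrip() for line in text.splitlines())
    ]


# Each lane owns one state field until aggregation.
LANES = {
    "security_findings": security_lane,
    "cost_findings": cost_lane,
    "style_findings": style_lane,
}


def run_lanes(ctx: dict, lanes: dict = LANES) -> dict:
    """Fan out to all lanes in parallel, then fan back in to one state dict."""
    with ThreadPoolExecutor(max_workers=len(lanes)) as pool:
        futures = {field: pool.submit(fn, ctx) for field, fn in lanes.items()}
        # .result() blocks until that lane finishes and re-raises any lane error.
        return {field: fut.result() for field, fut in futures.items()}
```

Threads suit this shape because the real lanes spend most of their time waiting on scanner subprocesses and API calls rather than computing.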

Each lane owns a different kind of evidence.

The security lane looks for risky cloud configuration, insecure defaults, missing encryption, public exposure, and policy mistakes. It uses tfsec and checkov because each scanner catches different Terraform patterns.
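Here is a sketch of normalizing both scanners into one finding shape. The input field names reflect our understanding of each scanner's JSON output:

- tfsec: results entries with rule_id, severity, and location.
- Checkov: results.failed_checks entries with check_id, file_path, and file_line_range.

Treat these as assumptions to verify against the versions pinned in the container. Checkov's open-source output may leave severity null. The medium default below is a placeholder policy choice, not scanner evidence.

```python
def from_tfsec(report: dict) -> list:
    """Normalize tfsec JSON (assumed shape: {"results": [...]}) into findings."""
    findings = []
    # tfsec may emit "results": null when nothing is found, hence the "or []".
    for r in report.get("results") or []:
        loc = r.get("location") or {}
        findings.append({
            "lane": "security",
            "source": "tfsec",
            "rule_id": r["rule_id"],
            "severity": r["severity"].lower(),
            "file": loc.get("filename", ""),
            "line": loc.get("start_line", 0),
            "message": r.get("description", ""),
            "suggestion": r.get("resolution", ""),
        })
    return findings


def from_checkov(report: dict, default_severity: str = "medium") -> list:
    """Normalize single-framework Checkov JSON (assumed shape) into findings.

    A multi-framework run may produce a list of reports instead; not handled here.
    """
    findings = []
    for c in report.get("results", {}).get("failed_checks", []):
        start_line = (c.get("file_line_range") or [0])[0]
        findings.append({
            "lane": "security",
            "source": "checkov",
            "rule_id": c["check_id"],
            # Severity may be null; the default is a policy choice, not evidence.
            "severity": (c.get("severity") or default_severity).lower(),
            # Strip a leading "/" so paths are relative to the scanned directory.
            "file": c.get("file_path", "").lstrip("/"),
            "line": start_line,
            "message": c.get("check_name", ""),
            "suggestion": c.get("guideline") or "",
        })
    return findings
```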

The cost lane uses infracost diff to compare the PR against the base branch. It can show both the total monthly estimate and the monthly delta introduced by the pull request.
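Here is a sketch of turning Infracost output into the short cost callout. It assumes the JSON from infracost diff with --format json carries top-level currency, totalMonthlyCost, and diffTotalMonthlyCost fields as decimal strings. That matches Infracost's JSON format as we understand it, but check it against the pinned version. The callout wording is ours.

```python
from decimal import Decimal
from typing import Optional


def cost_callout(report: dict) -> Optional[str]:
    """Render a one-line monthly cost summary, or None if no cost data is present."""
    total = report.get("totalMonthlyCost")
    if total is None:
        return None
    currency = report.get("currency", "USD")
    cents = Decimal("0.01")
    total_d = Decimal(total).quantize(cents)
    # A missing delta is treated as no change.
    delta_d = Decimal(report.get("diffTotalMonthlyCost") or "0").quantize(cents)
    sign = "+" if delta_d >= 0 else "-"
    return (
        f"Estimated monthly cost: {total_d} {currency} "
        f"({sign}{abs(delta_d)} {currency} from this PR)"
    )
```

Decimal avoids float rounding surprises when formatting currency amounts.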

The style lane catches Terraform lint and formatting issues with tflint and terraform fmt -check. Those findings are lower drama than security or cost, but they still matter because consistent Terraform is easier to maintain.
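terraform fmt -check lists each file whose formatting differs and exits non-zero when any file needs changes. Below is a sketch of turning that listing into low-severity findings. The finding shape and the terraform_fmt label are our own. tflint's JSON output would be normalized the same way, which is omitted here.

```python
import subprocess


def parse_fmt_check(stdout: str) -> list:
    """Turn `terraform fmt -check` output (one filename per line) into findings."""
    return [
        {
            "lane": "style",
            "severity": "low",
            "rule_id": "terraform_fmt",  # our label; fmt itself has no rule IDs
            "file": name.strip(),
            "message": "File is not in canonical Terraform format.",
            "suggestion": f"Run terraform fmt on {name.strip()}.",
        }
        for name in stdout.splitlines()
        if name.strip()
    ]


def run_fmt_check(workdir: str) -> list:
    """Run the check from the Terraform root and parse its file listing.

    A non-zero exit is expected when files need formatting. A syntax error also
    exits non-zero, and this sketch does not inspect stderr to tell them apart.
    """
    proc = subprocess.run(
        ["terraform", "fmt", "-check", "-recursive"],
        cwd=workdir,
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_fmt_check(proc.stdout)
```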

The aggregator then removes duplicates, sorts by severity, and renders the final comment. Critical, high, and medium findings are visible immediately. Low and informational items are collapsed so they do not crowd out the important issues.
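The aggregation step could look like this sketch. It assumes each finding is a dict with rule_id, file, line, and severity keys (our naming). The dedupe key only collapses identical rule, file, and line triples. tfsec and Checkov use different rule IDs for overlapping checks, so merging those would need an explicit mapping that this sketch does not attempt.

```python
SEVERITY_ORDER = ["critical", "high", "medium", "low", "info"]
RANK = {name: i for i, name in enumerate(SEVERITY_ORDER)}
COLLAPSED = {"low", "info"}


def aggregate(findings: list) -> tuple:
    """Dedupe, sort by severity, and split into visible and collapsed groups."""
    seen = set()
    unique = []
    for f in findings:
        key = (f["rule_id"], f["file"], f["line"])
        if key not in seen:
            seen.add(key)
            unique.append(f)
    # Most severe first; ties broken by file and line for stable output.
    unique.sort(key=lambda f: (RANK[f["severity"]], f["file"], f["line"]))
    visible = [f for f in unique if f["severity"] not in COLLAPSED]
    collapsed = [f for f in unique if f["severity"] in COLLAPSED]
    return visible, collapsed
```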

The Model Explains. It Does Not Decide.

The most important implementation detail is the boundary around the LLM.

The scanners decide:

whether a finding exists
where it appears
which rule produced it
what severity it has

The model is only allowed to rewrite the message and suggestion so the finding is easier to understand. It cannot move the finding to another file, change severity, or invent scanner-backed evidence.

That boundary is what makes the workflow appropriate for infrastructure review. We do not want a model hallucinating a critical finding or quietly downgrading a real one. We want deterministic tools to produce the facts, then use the model to make the facts easier for people to act on.
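One way to enforce that boundary in code is to accept only two text fields from the model and copy everything else from the scanner finding. In this sketch, llm_output stands in for a parsed structured response of a hypothetical shape. Keys that try to change severity, file, line, or rule_id are ignored. Blank rewrites fall back to the scanner's own text.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Finding:
    rule_id: str
    file: str
    line: int
    severity: str
    message: str
    suggestion: str


# The only fields the model is allowed to rewrite.
EDITABLE = ("message", "suggestion")


def apply_llm_rewrite(finding: Finding, llm_output: dict) -> Finding:
    """Return a copy of the finding with only message and suggestion updated."""
    updates = {
        key: llm_output[key].strip()
        for key in EDITABLE
        # Skip missing, non-string, or blank values so scanner text survives.
        if isinstance(llm_output.get(key), str) and llm_output[key].strip()
    }
    return replace(finding, **updates)
```

Because the allowlist lives in code rather than in the prompt, a model that ignores its instructions still cannot move or re-rank a finding.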

This is a pattern we expect to use often in production AI systems:

Let rules, APIs, scanners, and databases provide the source of truth.
Let the model help with synthesis, wording, prioritization, and explanation.
Keep the final action visible to a human when the workflow affects real systems.

That is much more useful than treating the model like an all-purpose judge.

What a Reviewer Sees

The pull request gets one comment titled Terraform Review Agent.

At the top, the comment summarizes the number of findings, affected files, severity counts, and which lane produced each issue. If cost data is available, the Infracost estimate appears as a short callout with total monthly cost and the PR delta.
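The summary line can be built from simple counts. This sketch assumes findings are dicts with severity, lane, and file keys (our naming), and the output format is our own choice.

```python
from collections import Counter

SEVERITIES = ["critical", "high", "medium", "low", "info"]


def summary_line(findings: list) -> str:
    """Build the one-line header: totals, files touched, severity and lane counts."""
    by_sev = Counter(f["severity"] for f in findings)
    by_lane = Counter(f["lane"] for f in findings)
    files = {f["file"] for f in findings}
    # Severity counts in fixed order, skipping levels with no findings.
    sev_part = ", ".join(f"{by_sev[s]} {s}" for s in SEVERITIES if by_sev[s])
    lane_part = ", ".join(f"{lane}: {n}" for lane, n in sorted(by_lane.items()))
    return (
        f"{len(findings)} findings in {len(files)} files "
        f"({sev_part or 'none'}; {lane_part or 'no lanes'})"
    )
```

Consider two high security findings in a.tf and b.tf plus one low style finding. For that input it returns "3 findings in 2 files (2 high, 1 low; security: 2, style: 1)".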

The rest of the comment is organized by severity. A high-risk item might show:

the severity and lane
a short description of the issue
a practical remediation suggestion
the scanner rule ID
a link to the exact file and line
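Here is a sketch of rendering one finding in that shape as Markdown. The link uses GitHub's standard blob/<commit>/<path>#L<line> permalink form. The repository, commit, and field names are illustrative, not taken from the project.

```python
def finding_link(repo: str, sha: str, path: str, line: int) -> str:
    """Permalink to one line of a file at a fixed commit."""
    return f"https://github.com/{repo}/blob/{sha}/{path}#L{line}"


def render_finding(f: dict, repo: str, sha: str) -> str:
    """Render one finding as a short Markdown block for the review comment."""
    link = finding_link(repo, sha, f["file"], f["line"])
    return "\n".join([
        f"**{f['severity'].upper()}** | {f['lane']}",
        f["message"],
        f"Suggested fix: {f['suggestion']}",
        f"Rule: `{f['rule_id']}` | [{f['file']}:{f['line']}]({link})",
    ])
```

Linking to a commit SHA rather than a branch keeps the line reference stable after later pushes.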

This turns scanner output into review language. Instead of asking a reviewer to open a CI log and mentally connect the dots, the comment says what changed, why it matters, and where to look.

The reviewer still makes the call. The agent just reduces the amount of hunting required to make that call.

Configurable Gates Without All-or-Nothing Adoption

Infrastructure teams rarely agree on day one which findings should fail CI.

Some teams want critical security findings to block immediately. Some want high and critical blocked. Others want the first rollout to be advisory until they tune scanner rules and build trust with developers.

Terraform Review Agent supports that through fail-on-severity:

with:
  fail-on-severity: high

Set it to none and the agent only comments. Set it to critical, high, medium, low, or info, and the workflow fails when a finding meets or exceeds that threshold.
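The threshold rule reduces to an ordered comparison. Here is a minimal sketch, assuming severities are the five lowercase strings listed above.

```python
# Ordered from most to least severe.
LEVELS = ["critical", "high", "medium", "low", "info"]


def should_fail(severities: list, threshold: str) -> bool:
    """True if any finding meets or exceeds the threshold; "none" never fails."""
    threshold = threshold.lower()
    if threshold == "none":
        return False
    if threshold not in LEVELS:
        raise ValueError(f"unknown fail-on-severity value: {threshold!r}")
    limit = LEVELS.index(threshold)
    # A lower index means more severe, so "meets or exceeds" is index <= limit.
    return any(LEVELS.index(s.lower()) <= limit for s in severities)
```

The step that runs this check would exit non-zero when it returns True, which is what marks the GitHub Actions job as failed.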

That gives teams a practical adoption path:

Start in comment-only mode.
Review the signal for a few pull requests.
Tune repository scanner configuration where needed.
Begin blocking on critical or high findings.
Tighten the policy as confidence improves.

This is how security automation should roll out. Visibility first, enforcement second.

Why LangGraph Fits This Shape

This project does not need an autonomous agent that freely chooses tools. It needs a controlled workflow with parallel branches, typed state, and a predictable end result.

LangGraph fits because the review has distinct steps:

load pull request context from GitHub
prepare Terraform file payloads
run scanners in separate lanes
optionally ask the LLM to improve wording
merge lane output into one state object
render Markdown
upsert the GitHub comment

That is not a conversation. It is a graph.

The graph also makes the system easier to reason about. Each node has a small responsibility. Security findings, cost findings, and style findings live in separate state fields until aggregation. Tests can exercise each node independently without needing a real pull request or live scanner call.
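As an illustration, here is a sketch of a style node whose scanner runner is injected, so a unit test can pass a fake instead of invoking tflint. The names make_style_node, run_tflint, and ReviewState are hypothetical. The node returns only the state field it owns, which mirrors how LangGraph nodes return partial state updates.

```python
from typing import Callable, TypedDict


class ReviewState(TypedDict, total=False):
    files: list
    style_findings: list


# A runner takes file paths and returns raw scanner issues as dicts.
Runner = Callable[[list], list]


def make_style_node(run_tflint: Runner) -> Callable[[ReviewState], dict]:
    """Build a node with its scanner injected, so tests never touch a real binary."""

    def style_node(state: ReviewState) -> dict:
        raw = run_tflint(state.get("files", []))
        # Defaults come first so scanner-provided keys, such as severity, win.
        findings = [{"lane": "style", "severity": "low", **issue} for issue in raw]
        # Return only the field this node owns; the graph merges it into state.
        return {"style_findings": findings}

    return style_node
```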

For business workflows, that structure matters more than the novelty of the model. The workflow has to be inspectable, testable, and boring enough to trust.

Built for Repeatable Platform Use

The reviewer runs in a container image that already includes pinned versions of Terraform, tfsec, tflint, Infracost, and Checkov.

That solves a common platform-team problem: no repository should have to rediscover scanner installation, version pinning, permissions, comment formatting, and cost baseline behavior on its own.

With a reusable workflow, the platform team can improve the reviewer once and roll that improvement across many repositories. Application teams only need the small caller workflow and the relevant secrets.

The workflow also supports OpenAI, Anthropic, and Google model providers through configuration. That keeps the project adaptable without adding provider-specific logic throughout the review code.
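Here is a sketch of keeping the provider switch in one place. A registry maps the llm-provider input to a factory, and the rest of the review code only ever calls a rewrite(text) function. The factories are stubs and every name here is hypothetical. A real factory would read the provider's API key and wrap that vendor's SDK.

```python
from typing import Callable

Rewriter = Callable[[str], str]


def _stub(provider: str) -> Callable[[str], Rewriter]:
    """Return a factory that would build a client for one provider (stubbed here)."""

    def factory(model: str) -> Rewriter:
        # A real factory would read the API key from the environment and call the SDK.
        return lambda text: f"[{provider}:{model}] {text}"

    return factory


PROVIDERS = {
    "openai": _stub("openai"),
    "anthropic": _stub("anthropic"),
    "google": _stub("google"),
}


def build_rewriter(provider: str, model: str) -> Rewriter:
    """The only place that branches on provider; everything else calls rewrite(text)."""
    try:
        factory = PROVIDERS[provider.lower()]
    except KeyError:
        raise ValueError(f"unsupported llm-provider: {provider!r}") from None
    return factory(model)
```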

The important part is repeatability. A Terraform PR in one repo should receive the same class of review as a Terraform PR in another repo, without every team maintaining its own copy-pasted automation.

Where This Helps Most

Terraform Review Agent is a good fit for teams that already use pull requests as the approval point for infrastructure changes.

It is especially useful when:

multiple repos contain Terraform
senior engineers are overloaded with review requests
cloud cost changes are easy to miss
scanner results exist but are not easy to consume
platform teams want shared standards without owning every application repo
security wants earlier visibility without becoming a manual approval bottleneck

It is not a replacement for deeper architecture review. It will not understand every business exception. It will not tell you whether the infrastructure design is the right design.

But it can catch common risks, summarize concrete evidence, and make the pull request conversation sharper.

That is enough to be useful.

What This Project Shows

Terraform Review Agent is a narrow project by design. It reviews Terraform pull requests and posts one useful comment.

That narrowness is the point.

The strongest AI automation projects usually do not start with a vague mandate to “add AI.” They start with an annoying workflow that already exists, identify where judgment and evidence are getting lost, and then build a controlled system around that decision.

For Terraform review, the decision is simple: should this infrastructure change merge, and what should the reviewer know before approving it?

The agent does not replace that decision. It makes the decision better informed.

That same design pattern applies to many technical workflows:

release readiness reviews
cloud configuration audits
CI/CD policy checks
dependency risk summaries
ticket triage with evidence collection
compliance checks before a human sign-off

Good AI automation is not just model access. It is workflow design.

Check Out the Project

Terraform Review Agent is open here: https://github.com/infiniumtek/terraform-review-agent

If your team is reviewing infrastructure changes manually, or if your CI output has become too noisy to be useful, this project is a concrete example of how to bring scanners, cost estimation, GitHub Actions, and LLM-assisted explanation into one controlled workflow.

Interested in building a focused AI workflow for your engineering or operations team? Schedule a Digital Health Check and we can help identify the review loops where automation would create real leverage without taking control away from your team.

This post was last reviewed and updated in May 2026. Terraform tooling, LLM platforms, and GitHub Actions features continue to evolve, but the operating principle is stable: use deterministic tools for facts, use AI for synthesis, and keep humans responsible for production decisions.