The MDX brace trap

Junaid Siddiqi (Jay), Principal · 13 June 2026 · ~11 min read

Clawdemy is an educational curriculum site that ships hundreds of original lessons across around two dozen tracks. The content is authored in MDX, the format Astro uses when a page mixes Markdown prose with embedded components. The drafting is done by AI coding agents in parallel; the editing and the review are mine.

This is the engineering retrospective on a class of failure I did not see coming when I picked MDX: the way a Markdown file that reads perfectly clean can refuse to parse, the way the failure modes branch into sub-classes that grew faster than I could write linters for them, and the discipline that eventually turned the build from an evening of unwind work into a reliable gate.

This piece is for anyone shipping LLM-authored Markdown or MDX content at any scale. The list at the heart of it is the ten classes I have personally caught. I expect to find an eleventh and I do not expect it to be the last.

The first time the build refused

What hooked me on this problem was the first time a track-level build refused and the error gave no obvious pointer to where the failure actually lived.

The MDX compiler takes a Markdown file with embedded expressions and compiles it to a JavaScript (JSX) component. Anywhere a literal brace appears in the source outside a code span or a fenced block, the compiler tries to read it as a JavaScript expression. Most of the time, in prose, the braces in question come from things the author wrote that look like prose to a human reader and look like a JavaScript expression, usually a malformed one, to a parser. A set of options written in math notation as {A, B, C}. A LaTeX subscript written as theta_{old}. A conditional density written as q(x_t | x_{t-1}). None of these are JavaScript the author meant to write. Some fail to parse outright; others parse and then fail when the page renders, because they reference names that do not exist. The error messages the compiler emits are the ones that brought you to this article if you came here from a search: Could not parse expression with acorn, Could not parse import/exports with acorn, Unexpected character in expression, with a line number that is often correct but unhelpful, because the author did not write JavaScript.

The first failure I hit was a set notation. Three letters in braces, in prose, in the middle of a paragraph about decision options. The lesson read clean. The grep for unbalanced fences was clean. The frontmatter was clean. The build refused. The fix was to wrap the offending characters in backticks so the compiler treated them as inline code instead of an expression. Total fix time: ninety seconds. Total diagnosis time: forty minutes, because I did not yet know what I was looking at.

The lesson at this stage was small but load-bearing: a Markdown file is a tree of escape contexts, and the compiler enforces the contexts whether the author was thinking about them or not.

Precision grep is half the answer

My first defense was a precision grep. I wrote a script that searched lessons for unwrapped braces in prose. It ignored braces inside fenced code blocks, ignored braces inside backticked spans, and ignored braces in frontmatter. It flagged anything else.
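
A minimal sketch of a scanner with that shape, in Python. This is not the script from the pipeline; the function name and the skipping rules are assumptions chosen to match the description above. It also flags braces in intentional JSX expressions, so treat every hit as a prompt to look, not as a verdict.

```python
import re

# Fence markers are built at runtime so this listing stays free of literal fences.
FENCE_MARKERS = ("`" * 3, "~" * 3)
INLINE_CODE = re.compile(r"`[^`\n]*`")


def find_prose_braces(text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) for each prose line that contains a brace."""
    hits = []
    in_frontmatter = False
    in_fence = False
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        # Frontmatter: a '---' on the first line up to the next '---'.
        if number == 1 and stripped == "---":
            in_frontmatter = True
            continue
        if in_frontmatter:
            if stripped == "---":
                in_frontmatter = False
            continue
        # Toggle on every fence line; nested fences are not handled.
        if stripped.startswith(FENCE_MARKERS):
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        # Drop backticked spans, then check what remains.
        visible = INLINE_CODE.sub("", line)
        if "{" in visible or "}" in visible:
            hits.append((number, line))
    return hits
```

Run against a lesson, it reports the set-notation line and leaves the backticked version of the same line alone.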

This caught the next ten or so authoring failures. It made the build pipeline feel safer. It was also, in retrospect, the thing that gave me false confidence.

The script knew about the failure classes I had personally seen. It enumerated them by pattern. The compiler did not enumerate. The compiler either parsed the file or it did not, and any time a new failure class showed up that I had not added a pattern for, the script passed it clean.

The result was drift: I started trusting the script as if it were the build. It was not. It was a heuristic against a finite list of known classes, and the LLM authors I was relying on were creative enough about prose to introduce new classes faster than I could write patterns. The build kept catching things downstream that my pre-commit grep had cleared.

The discipline that fixed this was not a better grep. It was demoting the grep to a heuristic and promoting the actual build to the first gate. The grep still runs. It catches the common classes cheaply, which saves time. But the grep is no longer the decision point. The build is. bun astro build runs at the executor’s stage, before the work even reaches the lead. The build is no longer the last thing before promotion; it is the first thing after drafting.

The general principle, in a form I now write for any LLM-authored-content workflow: static analysis cannot enumerate the parse classes of a strict consumer. The consumer can. Put the consumer at the front of the pipeline.

A taxonomy that grew under me

Once the build was at the front, I started cataloging the classes the build caught that my grep had missed. The list grew from two classes to ten across four months. It is worth writing down, because the LLM authoring patterns that produce each one are predictable, and the fixes are usually obvious once the class is named. The hard part is naming the class.

These are the ten I have personally caught.

One. Set or tuple notation. Prose about decision options or combinatorics often produces literal braces in the middle of a sentence. The compiler reads the brace as the start of an expression.

Broken:  The agent chooses from {explore, exploit, idle}.
Fixed:   The agent chooses from `{explore, exploit, idle}`.

Wrap the offending span in backticks, or rewrite in plain prose.

Two. LaTeX subscripts. A symbol like π with the subscript theta-old is written in LaTeX with braces around the subscript argument. The compiler does not know that.

Broken:  The policy π_{theta_old} is updated in place.
Fixed:   The policy `π_{theta_old}` is updated in place.

If you are not rendering LaTeX in MDX, write the subscript as plain text or backtick it. If you are, the expression has to live inside the renderer’s delimiters and not loose in prose.

Three. Conditional densities. Statistics writing produces expressions like q of x at time t given x at time t minus one, often with the given written as a pipe. Same brace problem.

Broken:  We sample from q(x_t | x_{t-1}) at each step.
Fixed:   We sample from `q(x_t | x_{t-1})` at each step.

Four. Nested LaTeX inside ordinary parentheses. A sentence that mentions a function followed by an indexed argument, written as ordinary math, breaks even when the outer parens are prose.

Broken:  Compute F(x_{i+1}) for each step i.
Fixed:   Compute `F(x_{i+1})` for each step i.

The outer parentheses are ordinary prose, but the compiler still reads the inner braces as a JavaScript expression, and the build fails on the sentence as a whole.

Five. Em-dashes inside quoted attribution. This one looks like a typographic quirk and is not. When a lesson quoted an instructor with an em-dash for the attribution, certain combinations of surrounding punctuation produced an unparseable inline span. The fix is to drop the em-dash. I now strip em-dashes from all authored prose for unrelated reasons, but this is the class that first made me suspicious of them.

Six. YAML colon-space in frontmatter. Frontmatter is YAML, and YAML treats a colon followed by a space as a key-value separator.

Broken:  ---
         title: Chapter 1: Foundations
         ---

Fixed:   ---
         title: "Chapter 1: Foundations"
         ---

Easy, common, silent on grep, fatal on build.
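
This class is one of the cheap ones to pattern-match. A rough sketch in Python, using only the standard library: it looks at single-line frontmatter values and flags any unquoted value that contains a second colon-space. The function name is hypothetical, and this is a heuristic, not a YAML parser; block scalars, flow collections, and multi-line values are deliberately left to the real build.

```python
def find_unquoted_colons(text: str) -> list[tuple[int, str]]:
    """Flag frontmatter lines whose unquoted value contains ': '."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return []  # no frontmatter block at the top of the file
    hits = []
    for number, line in enumerate(lines[1:], start=2):
        if line.strip() == "---":
            break  # end of frontmatter
        if line.lstrip().startswith("#"):
            continue  # YAML comment
        _key, separator, value = line.partition(": ")
        value = value.strip()
        if not separator or not value:
            continue
        # Quoted scalars, flow collections, and block scalars are skipped.
        if value[0] in "\"'[{|>":
            continue
        if ": " in value:
            hits.append((number, line))
    return hits
```

The broken title above is reported on line 2; the quoted version passes.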

Seven. Bare URLs that auto-link. Markdown auto-linking is a parser-level transform. A bare URL in prose gets wrapped as a link automatically. When the URL contains characters that the link parser treats specially, the auto-link fails, and the failure looks like a structural break in the paragraph. The fix is to always use the explicit link syntax [text](url) and never leave a URL bare.

Eight. Inequality signs. A sentence that states n less than 10 with a literal less-than sign breaks the build, because the compiler reads the sign as the start of an HTML or JSX tag.

Broken:  Holds whenever n < 10 across all batches.
Fixed:   Holds whenever `n < 10` across all batches.
         (or: Holds whenever n &lt; 10 across all batches.)
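
The entity fix is mechanical enough to automate for a single line of prose. A small sketch in Python, assuming the caller has already excluded frontmatter and fenced code; the function name and the choice to treat a less-than sign followed by whitespace, a digit, or an equals sign as "bare" are my assumptions, not a rule from the MDX grammar.

```python
import re

INLINE_CODE = re.compile(r"(`[^`\n]*`)")
# A '<' followed by whitespace, a digit, or '=' cannot open a tag name.
BARE_LESS_THAN = re.compile(r"<(?=[\s\d=])")


def escape_bare_less_than(line: str) -> str:
    """Replace bare '<' with '&lt;' outside backticked spans."""
    parts = INLINE_CODE.split(line)
    # With one capturing group, even indexes are prose and odd indexes are code.
    for index in range(0, len(parts), 2):
        parts[index] = BARE_LESS_THAN.sub("&lt;", parts[index])
    return "".join(parts)
```

Real tags such as a component opening tag are left untouched, because the character after the sign is a letter or a slash.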

Nine. Bloom-tag gaps. This one is local to our authoring contract: every lesson has a structured frontmatter field listing the Bloom’s taxonomy levels the lesson exercises. A typo in that field (a level name not in the allowed set) makes the page render but fails downstream validation. Not strictly a parse class, but it lives in the same gate and I count it because it cost me the same kind of time.
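
A check for this class is a set-membership test. The sketch below is in Python and assumes the allowed set is the six levels of the revised Bloom's taxonomy; the real contract's allowed values and field name are not shown in this piece, so both the constant and the function name are illustrative.

```python
# Assumption: the six revised Bloom's taxonomy levels, lowercased.
ALLOWED_BLOOM_LEVELS = frozenset(
    {"remember", "understand", "apply", "analyze", "evaluate", "create"}
)


def invalid_bloom_levels(levels: list[str]) -> list[str]:
    """Return the entries that are not in the allowed set, in input order."""
    return [
        level
        for level in levels
        if level.strip().lower() not in ALLOWED_BLOOM_LEVELS
    ]
```

A spelling variant such as analyse is reported, while case differences are tolerated.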

Ten. Agent tool format leakage. This is the class I did not believe existed until the lead found thirty-six instances of it in a single track. When an LLM author drafts a file using a tool-calling protocol, the protocol’s own serialization markup, the closing tags it uses to delimit its tool calls and parameters, can occasionally leak into the file content.

Broken:  The agent then calls send().</content>

         <next_paragraph>
         The reply arrives in the inbox.

The tags look like XML. They read like XML. They are silent to math greps, silent to URL greps, silent to YAML greps. They break the build because the MDX parser treats them as malformed JSX. The defense is a specific grep against the known leakage shapes, run after every LLM-authored draft and before any commit.
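
A sketch of what such a leakage grep could look like, in Python. The two tag names are taken from the broken example above; the real list of leakage shapes depends on the agent runtime in use, so treat the tuple as a placeholder to extend. Unlike the brace scanner, this one does not skip code blocks, on the assumption that protocol markup is unwanted anywhere in a lesson.

```python
import re

# Hypothetical starting list, taken from the example above.
LEAK_TAG_NAMES = ("content", "next_paragraph")

LEAK_PATTERN = re.compile(
    r"</?(?:%s)>" % "|".join(re.escape(name) for name in LEAK_TAG_NAMES)
)


def find_leakage(text: str) -> list[tuple[int, str]]:
    """Return (line_number, matched_tag) for every leaked protocol tag."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in LEAK_PATTERN.finditer(line):
            hits.append((number, match.group(0)))
    return hits
```

On the broken example it reports the closing tag on the first line and the stray opening tag two lines later.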

That tenth class was the one that taught me the most. The first nine classes are failures a human author could have produced, given the same prose. The tenth is a failure of the authoring protocol, leaking into the content. It does not exist if a human typed the file. It exists because of how the agent’s runtime encodes its tool calls, and a glitch in that encoding can persist into a written file. I would not have caught it by inspection. The lead’s pre-commit grep caught it on a staging build. It is now part of the standing pre-commit gate for any agent-authored file in the project.

The defense that finally worked

I spent too long believing the right defense for this was a better lint. It was not. It was three layers, in order, each one doing exactly what it is good at.

The first layer is the cheap one: precision grep, run by the executor, against the ten known classes. The patterns are tuned to ignore frontmatter, fenced code, and backticked spans, so they fire only on prose-level violations. They catch the common classes in a few milliseconds. They are not the gate. They are the cheap filter.

The second layer is the load-bearing one: the actual production build, bun astro build, run by the executor before any push. If the build fails, the work does not leave the executor’s workspace. This is the gate that decides whether the work is shippable at all. It is slower than the grep. It does not enumerate failure classes. It either parses or it does not.

The third layer is the cross-check: the lead agent runs a staged build at its own gate, against a fresh checkout, with the executor’s work integrated. This catches the case where the executor’s local environment differed from a clean build, which is the case that bit me three or four times before I added it. After the staged build, the review team runs its five-gate review, and only then does the work promote.

The static grep did not get deleted. It got positioned as the heuristic at the front. The build became the gate. The lead’s staged build became the safety net. None of the three is redundant, and removing any of them lets a class of failure through. I tried, more than once, to consolidate them. Each consolidation eventually let something through that one of the other layers had been quietly catching.
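
The ordering of the first two layers can be sketched as a small runner, here in Python. The runner itself is generic; the gate list is an assumption, and the grep script path in it is hypothetical. The third layer is not in the list because it runs somewhere else, at the lead's gate against a fresh checkout.

```python
import subprocess
import sys
from typing import Optional, Sequence

Gate = tuple[str, Sequence[str]]

# Illustrative executor-side gates: cheap filter first, real build second.
EXECUTOR_GATES: list[Gate] = [
    ("grep", [sys.executable, "scripts/prose_grep.py"]),  # hypothetical path
    ("build", ["bun", "astro", "build"]),
]


def run_gates(gates: Sequence[Gate]) -> Optional[str]:
    """Run gates in order; return the name of the first failure, else None."""
    for name, command in gates:
        result = subprocess.run(command)
        if result.returncode != 0:
            return name  # stop here; later gates do not run
    return None
```

Calling run_gates(EXECUTOR_GATES) and refusing to push unless it returns None is the whole contract: the grep can fail fast, but only a passing build lets the work leave the workspace.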

What the parser knows that you do not

The deeper realization, the one that this work taught me beyond MDX specifically, is that any sufficiently strict consumer has a grammar that the producer cannot fully enumerate. The MDX compiler has a grammar that mixes Markdown, JSX, JavaScript expressions, YAML frontmatter, and Astro-specific extensions. Each of those layers has its own escape contexts. The braces from set notation are read as a JavaScript expression. So are the subscript braces in a conditional density; the pipe itself is incidental. The em-dash in a quote breaks because of how the inline parser tokenizes the surrounding punctuation. The agent tool format leakage breaks because of how JSX is recognized in the source.

A static grep can encode my best guess about what the consumer cares about. The consumer is the only thing that knows the full answer. Trying to recreate the grammar in static checks is reinventing the consumer, and you will be wrong about edge cases the original consumer is right about.

The discipline that follows from this generalizes. Whenever you are generating content for a strict consumer, whether that is an MDX compiler or a JSON schema validator or a database constraint checker, put the consumer at the front of your pipeline, not at the end. "Did the work parse?" and "Did the work look right?" are different questions, and a heuristic answer to the first one is worth what you paid for it. Build first, lint later.

What I would tell someone shipping LLM-authored content tomorrow

Three things, in order of how often they would save you.

First, run the production build before the human review, not after. The build is the cheapest authoritative check you have, and it catches a class of failure that no inspection will. Make it the first gate, not the last. Read failure messages calmly when they happen; the line number is usually right, even when the locus is not obvious.

Second, expect new failure classes to appear faster than you can write patterns for them. Maintain a list of the classes you have seen, write grep patterns for the cheap ones, and keep the actual parse check at the front of the pipeline. The grep is a heuristic. The parse is the truth.

Third, and this is the one that I would not have believed if someone had told me at the start: LLM-authored content has failure classes that have no analogue in human-authored content. The agent tool format leakage class is the cleanest example. It exists because of how the model’s runtime encodes its work, not because of any prose the model wrote. You will find more of these as you ship more LLM-authored content. The right posture is to expect them, hunt for the leakage shapes specifically, and add them to the cheap-filter layer. The hunt is faster than you think. The cost of skipping it is real.

The build is now the part of the pipeline I worry about the least, which is what infrastructure that earns trust feels like. It got there by being placed at the front, not by getting smarter.