Why We Chose Structured Generation for Hands-Free Tree Risk Assessments
- Roger Erismann
- Feb 6
- 4 min read
Tree risk assessments don’t happen at a desk. They happen outside. In the rain. On uneven ground. Sometimes wearing gloves. Typing notes into a phone or navigating dropdown-heavy forms isn’t just annoying — it slows the work down and breaks focus. Paper notes aren’t much better when it’s wet or windy.
We wanted something simpler: let arborists talk, and let the form fill itself out.
This post explains how we are building that, and why we ended up using structured generation with schemas instead of free‑form LLM outputs.
The constraint: hands-free and accurate
From the start, we had two requirements that push in opposite directions:
- Hands-free capture (voice notes instead of typing)
- Structured, reliable form output
Voice input is naturally unstructured. Forms are the opposite — strict fields, booleans, numbers, and checkboxes. Bridging those two worlds is the whole problem. A normal “chatty” LLM that summarizes text isn’t enough. Summaries are flexible and interpretive. Forms aren’t.
If the arborist doesn’t say something, we cannot invent it. If they say “minor decay,” we cannot reinterpret it. So we needed something that behaves less like an assistant and more like a parser.
The pipeline we ended up with
The architecture is intentionally simple:
1. Record audio in the field
2. Transcribe once
3. Run section-specific structured extraction
4. Let the user review and correct
5. Merge into the form
6. Keep the full transcript and generate a readable summary
No agents. No loops. No “thinking.” Just deterministic steps.
Keeping it simple made it easier to debug and trust.
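The steps above can be sketched as one linear function. This is an illustration of the shape of the pipeline, not our production code; the helper names (`transcribe`, `extract_section`, `review`) are assumptions:

```python
# Sketch of the pipeline as plain, deterministic steps.
# No agents, no loops, no retries: each stage runs exactly once
# and hands its result to the next.

def run_assessment(audio_path, sections, transcribe, extract_section, review):
    """One linear pass over a recording: transcript in, reviewed form out."""
    transcript = transcribe(audio_path)              # 1. transcribe once
    form = {}
    for section in sections:                         # 2. section-specific extraction
        proposed = extract_section(section, transcript)
        form[section] = review(section, proposed)    # 3. human confirms/corrects
    return transcript, form                          # 4. keep transcript + merged form
```

Because every stage is an ordinary function call, each one can be stubbed and tested in isolation, which is most of what "easier to debug" means in practice.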
Why structured generation instead of free-form prompts
Early on we tried the obvious thing:
“Here’s the transcript, fill out this form.”
It worked… until it didn’t.
We saw:
- fields hallucinated
- values inferred but never stated
- inconsistent shapes
- messy parsing
Classic LLM behavior — fine for prose, not fine for structured compliance-style data. So we switched to schema-constrained generation:
- define the form as Pydantic models
- use those models as the contract
- force the LLM to emit valid JSON matching the schema
Now the model literally can’t invent extra fields or change types. Missing data becomes null. Booleans are only set with evidence. Text is captured verbatim. This removed most of the weirdness immediately.
Modeling the form as code
Each section of the form is represented as a strict schema.
For example, part of our Site Factors model looks like this:
```python
class Topography(StrictBaseModel):
    flat: Optional[bool] = Field(
        None,
        description=(
            "True if transcript indicates level/flat ground (e.g., 'flat', "
            "'level lot', 'no slope'). False if slope/grade is mentioned. "
            "Null if not mentioned."
        ),
    )
    slope: Optional[bool] = Field(
        None,
        description=(
            "True if transcript indicates slope/grade/incline/hillside or "
            "gives a % grade. False if transcript indicates flat/level. "
            "Null if not mentioned."
        ),
    )
    slope_percent: Optional[conint(ge=0, le=100)] = Field(
        None,
        description=(
            "Numeric slope percent if explicitly stated (e.g., '15 percent "
            "grade'). Null if not stated."
        ),
    )
    aspect: Optional[str] = Field(
        None,
        description=(
            "Aspect or facing direction if stated (e.g., north, northeast). "
            "Null if not stated."
        ),
    )
```

This might look simple, but it does a lot:
- limits allowed fields
- enforces types
- makes “not mentioned” explicit (None)
- gives the model precise semantics
Instead of describing structure in a prompt, we encode it directly in types.
That turns extraction into:
transcript → validated object
instead of:
transcript → paragraph → regex → maybe JSON
Much less fragile.
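The transcript → validated object step can be exercised directly. A minimal sketch, assuming Pydantic v2 (`model_validate_json`, `extra="forbid"`); the `StrictBaseModel` definition here is our guess at its intent, and the trimmed `Topography` fields stand in for the full model above:

```python
from typing import Optional
from pydantic import BaseModel, ConfigDict, ValidationError

class StrictBaseModel(BaseModel):
    # Assumed definition: reject any field the schema does not declare.
    model_config = ConfigDict(extra="forbid")

class Topography(StrictBaseModel):
    flat: Optional[bool] = None
    slope: Optional[bool] = None
    slope_percent: Optional[int] = None

# Conforming LLM output: unmentioned fields become null, not guesses.
topo = Topography.model_validate_json('{"slope": true, "slope_percent": 15}')

# A hallucinated field fails validation instead of slipping into the form.
try:
    Topography.model_validate_json('{"slope": true, "soil": "clay"}')
    rejected = False
except ValidationError:
    rejected = True
```

Validation errors surface at the boundary, where they can be retried or flagged, instead of appearing later as a corrupted form.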
Section-level extractors
Another thing that helped was splitting by section.
Feeding an entire walkthrough into one giant prompt caused leakage. A note about soil would end up in topography. Weather would show up in targets.
So each section has its own extractor with:
- its own transcript slice
- its own schema
- its own prompt
All the behavior comes from the schema + prompts. Small scopes made outputs much more predictable.
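One way to picture the per-section split is a small registry plus a transcript slicer. Everything below is illustrative: the section names, prompts, and the `[section]` designator format are assumptions, not our production setup:

```python
# Illustrative registry: each section carries its own schema name and prompt.
SECTION_EXTRACTORS = {
    "topography": {
        "schema": "Topography",
        "prompt": "Extract only topography facts. Leave unmentioned fields null.",
    },
    "targets": {
        "schema": "Targets",
        "prompt": "Extract only target/occupancy facts. Leave unmentioned fields null.",
    },
}

def slice_transcript(transcript: str, section: str) -> str:
    """Keep only the lines tagged with this section's designator,
    e.g. '[topography] flat lot'."""
    tag = f"[{section}]"
    return "\n".join(
        line.removeprefix(tag).strip()
        for line in transcript.splitlines()
        if line.startswith(tag)
    )
```

Each extractor only ever sees its own slice, so a remark about soil simply never reaches the topography prompt.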
Multiple recordings + human review
Assessments aren’t linear.
Arborists might:
- record topography
- walk the site
- add targets later
- refine or correct notes
So we allow multiple recordings per section. Each pass runs extraction again, and the user can review and edit the structured result before it’s merged. The model proposes structure; the human confirms it. If the user has no service, recordings can still be made offline using the section designators in the form; the audio is transcribed later.
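A merge along these lines keeps repeated passes safe. This is a hedged sketch of the described behavior, not our exact merge code: a later pass only fills fields it actually observed, and a null ("not mentioned") never overwrites a value the user has already confirmed:

```python
def merge_passes(existing: dict, new: dict) -> dict:
    """Fold a new extraction pass into the section's current values.
    Null means 'not mentioned this pass', so it never clobbers old data."""
    merged = dict(existing)
    for key, value in new.items():
        if value is not None:
            merged[key] = value
    return merged
```

This asymmetry matters: a second recording about slope shouldn't silently erase a first recording's note about flat ground elsewhere on the site.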
Keeping the transcript
We keep the complete transcript alongside the structured data.
This serves two purposes:
- Auditability — the raw source is always available
- Report generation — we can generate readable summaries that mirror the form
Structured fields are great for machines and validation. Narrative summaries are better for humans. Having both gives us flexibility.
What structured generation bought us
Most of the improvements came from adding constraints rather than intelligence. Strict schemas, smaller contexts, and null-by-default fields made the extractor behave more like a typed parser than a language model. That alone noticeably reduced incorrect or over-interpreted outputs. The structured results generally matched what was actually said, instead of what the model “thought” was implied. In practice, this led to fewer corrections during review and more accurate first-pass extractions.
It also changed the workflow in useful ways. Compared to manually editing a PDF form on a phone or tablet, recording observations verbally is faster and requires less attention. Arborists can stay focused on the site instead of navigating dropdowns or typing in poor conditions. The form fills in afterward rather than during the assessment. Because recordings can be made back-to-back, multiple assessments can be captured in sequence while transcription and extraction run in the background. By the time one site is finished, much of the paperwork for the previous one is already structured.
The net effect is pretty simple:
- less time entering data
- fewer manual edits
- less friction in the field
- completed reports ready sooner
Nothing about the system is especially sophisticated. We mostly removed flexibility and made the output predictable. But that predictability is what made the overall workflow faster and less frustrating.