
Why We Chose Structured Generation for Hands-Free Tree Risk Assessments

  • Roger Erismann
  • Feb 6
  • 4 min read

Tree risk assessments don’t happen at a desk. They happen outside. In the rain. On uneven ground. Sometimes wearing gloves. Typing notes into a phone or navigating dropdown-heavy forms isn’t just annoying — it slows the work down and breaks focus. Paper notes aren’t much better when it’s wet or windy.


We wanted something simpler: let arborists talk, and let the form fill itself out.

This post explains how we are building that, and why we ended up using structured generation with schemas instead of free‑form LLM outputs.


The constraint: hands-free and accurate

From the start, we had two requirements that push in opposite directions:

  1. Hands-free capture (voice notes instead of typing)

  2. Structured, reliable form output


Voice input is naturally unstructured. Forms are the opposite — strict fields, booleans, numbers, and checkboxes. Bridging those two worlds is the whole problem. A normal “chatty” LLM that summarizes text isn’t enough. Summaries are flexible and interpretive. Forms aren’t.


If the arborist doesn’t say something, we cannot invent it. If they say “minor decay,” we cannot reinterpret it. So we needed something that behaves less like an assistant and more like a parser.


The pipeline we ended up with

The architecture is intentionally simple:

  1. Record audio in the field

  2. Transcribe once

  3. Run section-specific structured extraction

  4. Let the user review and correct

  5. Merge into the form

  6. Keep the full transcript and generate a readable summary

No agents. No loops. No “thinking.” Just deterministic steps.

Keeping it simple made it easier to debug and trust.
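The six steps above can be sketched as plain functions. This is a minimal, runnable illustration, not the project's actual code: `transcribe` and `extract_section` are toy stand-ins for the real speech-to-text and LLM extraction services, which the post doesn't show.

```python
# Minimal sketch of the pipeline as deterministic steps.
# transcribe() and extract_section() are toy stand-ins, not real services.

def transcribe(audio: bytes) -> str:
    # stand-in speech-to-text: one transcription per recording
    return "flat lot, no slope, sidewalk within striking distance"

def extract_section(transcript: str, section: str) -> dict:
    # stand-in for section-specific structured extraction
    if section == "topography":
        return {"flat": "flat" in transcript, "slope": None}
    return {}

def run_assessment(audio: bytes, sections: list[str]) -> dict:
    transcript = transcribe(audio)                      # transcribe once
    draft = {s: extract_section(transcript, s) for s in sections}
    # human review/correction happens here, before the merge
    return {"form": draft, "transcript": transcript}

result = run_assessment(b"...", ["topography"])
```

Each step is an ordinary function call: no retries, no tool use, no hidden state between steps.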

Why structured generation instead of free-form prompts


Early on we tried the obvious thing:

“Here’s the transcript, fill out this form.”

It worked… until it didn’t.

We saw:

  • fields hallucinated

  • values inferred but never stated

  • inconsistent shapes

  • messy parsing

Classic LLM behavior — fine for prose, not fine for structured compliance-style data. So we switched to schema-constrained generation:

  • define the form as Pydantic models

  • use those models as the contract

  • force the LLM to emit valid JSON matching the schema

Now the model literally can’t invent extra fields or change types. Missing data becomes null. Booleans are only set with evidence. Text is captured verbatim. This removed most of the weirdness immediately.
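As a hedged sketch of what “schema-constrained” means in practice: the section schema travels with the request, and the response is only accepted if it stays inside that schema. Here `call_llm` is a hypothetical stand-in for whichever provider is used; in the real pipeline the schema comes from the Pydantic models rather than being written by hand.

```python
# Sketch of schema-constrained extraction. call_llm() is a hypothetical
# stand-in; a real provider would constrain decoding to the schema.
import json

TOPOGRAPHY_SCHEMA = {
    "type": "object",
    "additionalProperties": False,   # the model cannot invent fields
    "properties": {
        "flat": {"type": ["boolean", "null"]},
        "slope": {"type": ["boolean", "null"]},
    },
}

def call_llm(prompt: str, schema: dict) -> str:
    # stand-in response; imagine constrained decoding against `schema`
    return '{"flat": true, "slope": null}'

def extract_topography(transcript: str) -> dict:
    raw = call_llm(f"Extract topography from: {transcript}", TOPOGRAPHY_SCHEMA)
    data = json.loads(raw)
    # reject anything outside the schema instead of silently keeping it
    if not set(data) <= set(TOPOGRAPHY_SCHEMA["properties"]):
        raise ValueError("extractor returned fields outside the schema")
    return data
```

The key property is that anything not stated comes back as null, and anything outside the schema is an error rather than data.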


Modeling the form as code

Each section of the form is represented as a strict schema.

For example, part of our Site Factors model looks like this:

from typing import Optional
from pydantic import Field, conint

class Topography(StrictBaseModel):  # StrictBaseModel: project base class that forbids extra fields
    flat: Optional[bool] = Field(None, description="True if transcript indicates level/flat ground (e.g., 'flat', 'level lot', 'no slope'). False if slope/grade is mentioned. Null if not mentioned.")
    slope: Optional[bool] = Field(None, description="True if transcript indicates slope/grade/incline/hillside or gives a % grade. False if transcript indicates flat/level. Null if not mentioned.")
    slope_percent: Optional[conint(ge=0, le=100)] = Field(None, description="Numeric slope percent if explicitly stated (e.g., '15 percent grade'). Null if not stated.")
    aspect: Optional[str] = Field(None, description="Aspect or facing direction if stated (e.g., north, northeast). Null if not stated.")

This might look simple, but it does a lot:

  • limits allowed fields

  • enforces types

  • makes “not mentioned” explicit (None)

  • gives the model precise semantics

Instead of describing structure in a prompt, we encode it directly in types.

That turns extraction into:

transcript → validated object

instead of:

transcript → paragraph → regex → maybe JSON

Much less fragile.
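To make the contrast concrete, here is a stdlib stand-in for the "transcript → validated object" step (in the real pipeline Pydantic plays this role): missing fields become None, wrong types are errors, and unknown fields are rejected rather than guessed at.

```python
# Stdlib stand-in for schema validation (Pydantic does this in practice).
# Missing fields become None; unknown fields or wrong types are errors.
import json

ALLOWED = {"flat": bool, "slope": bool, "slope_percent": int}

def to_validated(raw_json: str) -> dict:
    data = json.loads(raw_json)
    unknown = set(data) - set(ALLOWED)
    if unknown:
        raise ValueError(f"unknown fields: {unknown}")
    out = {}
    for name, typ in ALLOWED.items():
        value = data.get(name)               # not mentioned -> None
        if value is not None and not isinstance(value, typ):
            raise TypeError(f"{name} must be {typ.__name__}")
        out[name] = value
    return out

to_validated('{"flat": true}')
# -> {'flat': True, 'slope': None, 'slope_percent': None}
```

There is no regex step and no "maybe JSON": the output is either a valid object or an exception.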

Section-level extractors

Another thing that helped was splitting by section.

Feeding an entire walkthrough into one giant prompt caused leakage. A note about soil would end up in topography. Weather would show up in targets.

So each section has its own extractor with:

  • its own transcript slice

  • its own schema

  • its own prompt

All the behavior comes from the schema + prompts. Small scopes made outputs much more predictable.
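A rough sketch of how per-section extractors might be organized, assuming one prompt/schema pair per section (the names and prompts here are illustrative, not the project's real ones):

```python
# Illustrative only: one small extractor per form section, each pairing
# its own prompt with its own schema. Names are hypothetical.
from dataclasses import dataclass

@dataclass
class SectionExtractor:
    name: str
    prompt: str      # section-specific instructions for the model
    schema: dict     # allowed fields and their types

def build_extractors() -> dict[str, SectionExtractor]:
    return {
        "topography": SectionExtractor(
            name="topography",
            prompt="Extract only ground/slope/terrain observations.",
            schema={"flat": bool, "slope": bool},
        ),
        "targets": SectionExtractor(
            name="targets",
            prompt="Extract only targets within striking distance.",
            schema={"occupied_structure": bool},
        ),
    }
```

Because each extractor only knows about its own fields, a soil note has nowhere to leak into topography.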

Multiple recordings + human review

Assessments aren’t linear.

Arborists might:

  • record topography

  • walk the site

  • add targets later

  • refine or correct notes


So we allow multiple recordings per section. Each pass runs extraction again, and the user can review and edit the structured result before it’s merged. The model proposes structure; the human confirms it. If the user doesn’t have service, they can record offline using the section designators in the form and transcribe the audio later.
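One plausible merge rule for repeated passes (a sketch, not the actual merge logic) is that a later recording only fills or overwrites fields it actually mentions, so a second pass never erases earlier observations:

```python
# Sketch of merging repeated extraction passes for one section.
# null means "not mentioned in this pass", so it never overwrites data.

def merge_passes(passes: list[dict]) -> dict:
    merged: dict = {}
    for result in passes:
        for field, value in result.items():
            if value is not None:
                merged[field] = value            # stated: fill or overwrite
            else:
                merged.setdefault(field, None)   # unstated: leave alone
    return merged

first = {"flat": True, "slope": None, "slope_percent": None}
second = {"flat": None, "aspect": "north"}       # later pass adds aspect
merge_passes([first, second])
# -> {'flat': True, 'slope': None, 'slope_percent': None, 'aspect': 'north'}
```

Whatever the exact rule, the human review step sits between extraction and merge, so the merged form only ever contains confirmed values.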


Keeping the transcript

We keep the complete transcript alongside the structured data.

This serves two purposes:

  1. Auditability — the raw source is always available

  2. Report generation — we can generate readable summaries that mirror the form


Structured fields are great for machines and validation. Narrative summaries are better for humans. Having both gives us flexibility.
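One way to picture what gets stored per section (field names here are hypothetical): the validated fields sit next to the verbatim transcript they came from, so every value can be audited back to its source, and a readable summary can be derived from the same record.

```python
# Illustrative record pairing validated fields with their raw source.
from dataclasses import dataclass

@dataclass
class SectionRecord:
    section: str
    transcript: str   # raw source, kept verbatim for audit
    fields: dict      # validated structured output

    def summary(self) -> str:
        # readable line for a report; skips fields that were never stated
        stated = {k: v for k, v in self.fields.items() if v is not None}
        return f"{self.section}: " + ", ".join(f"{k}={v}" for k, v in stated.items())

rec = SectionRecord("topography", "flat lot, no slope",
                    {"flat": True, "slope": False, "aspect": None})
rec.summary()   # -> 'topography: flat=True, slope=False'
```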


What structured generation bought us

Most of the improvements came from adding constraints rather than intelligence. Strict schemas, smaller contexts, and null-by-default fields made the extractor behave more like a typed parser than a language model. That alone noticeably reduced incorrect or over-interpreted outputs. The structured results generally matched what was actually said, instead of what the model “thought” was implied. In practice, this led to fewer corrections during review and more accurate first-pass extractions.


It also changed the workflow in useful ways. Compared to manually editing a PDF form on a phone or tablet, recording observations verbally is faster and requires less attention. Arborists can stay focused on the site instead of navigating dropdowns or typing in poor conditions. The form fills in afterward rather than during the assessment. Because recordings can be made back-to-back, multiple assessments can be captured in sequence while transcription and extraction run in the background. By the time one site is finished, much of the paperwork for the previous one is already structured.


The net effect is pretty simple:

  • less time entering data

  • fewer manual edits

  • less friction in the field

  • completed reports ready sooner


Nothing about the system is especially sophisticated. We mostly removed flexibility and made the output predictable. But that predictability is what made the overall workflow faster and less frustrating.



© 2026 hammerdirt 

Contact roger[at]hammerdirt.solutions