Observation method

How the lab reads model answers

Atelier das Entidades treats AI answers as field notes: each run is recorded, compared, and only then turned into a finding. The team does not try to catch the model’s “true opinion” about a company. Most likely, no such opinion exists. What exists is a series of answers under defined conditions — with recurring shifts, random slips, and strange but revealing proximities.

A typical working run starts with a fairly ordinary scene: a researcher asks the model a question about a company and examines exactly what comes back. Not only the name: the category, product function, assumed audience, competitors, and neighbouring topics all matter. Sometimes the model gets the city right, neatly paraphrases one line from the website, and then assigns the company to a nearby but wrong category. For the lab, this small distortion is more valuable than a spectacular hallucination: it shows how the model repairs gaps in knowledge with the words at hand.

A finding comes later. One answer may be noise, a trace of the exact prompt wording, or a quirk of the selected AI system. So Atelier das Entidades collects series of identical and near-identical prompts: about choosing a vendor, comparing solutions, explaining a category, searching for alternatives, clarifying product functions. The team does not pretend to see the whole internet laid out in front of it. It gathers a dense fragment of model behaviour — dense enough to notice a recurring pattern. If several runs again pull the brand toward “marketing agency,” even when the website says almost nothing about marketing, that is no longer a stray phrase in the answer.

Repeatability in this work does not require literal sameness. A language model rarely repeats the same paragraph word for word; the main question is not whether the wording is copied. The lab looks at the semantic trajectory: where the model takes the company, which labels it chooses, where it loses the link between product, audience, and category. Wording may drift, but if the route stays the same, the observation becomes stable. It is as if the brand were placed on a table under different lamps — the shadows change, the outline does not.
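
One way to picture this kind of repeatability is a minimal sketch in Python. It assumes a researcher has already reduced each answer to a few labels; the `Run` record, its field names, and the `min_share` and `min_runs` thresholds are hypothetical illustrations, not the lab's actual tooling. The sample values are invented for the example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One recorded answer, reduced to labels assigned by a researcher."""
    prompt_id: str  # hypothetical identifier of the prompt variant
    category: str   # category the answer placed the company in
    audience: str   # audience the answer assumed


def dominant_label(runs, field):
    """Return the most frequent value of `field` and its share of all runs."""
    if not runs:
        raise ValueError("need at least one run")
    counts = Counter(getattr(run, field) for run in runs)
    label, hits = counts.most_common(1)[0]
    return label, hits / len(runs)


def is_stable(runs, field, min_share=0.6, min_runs=5):
    """Illustrative rule: a label counts as stable if it dominates enough runs.

    Both thresholds are assumptions chosen for this sketch.
    """
    if len(runs) < min_runs:
        return False
    _, share = dominant_label(runs, field)
    return share >= min_share


# Invented sample series: the wording of each answer is not stored at all,
# only the route the answer took.
runs = [
    Run("vendor-choice", "marketing agency", "small businesses"),
    Run("compare-solutions", "marketing agency", "enterprise teams"),
    Run("explain-category", "software vendor", "enterprise teams"),
    Run("find-alternatives", "marketing agency", "small businesses"),
    Run("clarify-functions", "marketing agency", "enterprise teams"),
]
label, share = dominant_label(runs, "category")
print(label, share, is_stable(runs, "category"))
```

The sketch compares labels, not sentences, which mirrors the point above: the phrasing of each answer can differ freely while the category it arrives at stays the same.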

Quiet errors have their own place. A crude hallucination is visible: the model invents an office, a nonexistent service, an extra founder. More often, the problem is subtler. The model may name the right industry and still lose the real audience. It may understand that the product is connected to client communication and almost imperceptibly steer it toward the familiar class of agency services. The reader nods, because the answer sounds plausible. That is the risk: the error does not break the text; it swaps the map of the terrain.

The method has limits, and the lab does not hide them in a footnote. AI answers are unstable: they are affected by the model, operating mode, dialogue context, system updates, and sometimes even the order of follow-up questions. That is why reports stay within a narrower formula: observable behaviour of models under defined conditions. The wording matters here. Rather than claiming what a model “thinks” about a company, it is more accurate to say: “in these runs, the model links the company to a neighbouring category and fails to keep the product’s distinction intact.” Less dramatic, but more honest.

Forecasts are marked separately. If the team assumes that a certain type of confusion may grow as AI search expands, that remains an assessment and is not treated as an established fact. In its working materials, the lab uses its own classification anchor, the four ways a model loses a brand entity: shifting the category, replacing the function, pulling in a neighbour, and leaving an empty space. This frame pulls the conversation away from preferences about wording and toward the break in the link between a company and its linguistic trace.
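
The four-way frame can be written down as a small data structure. The sketch below is an illustration only: the `EntityLoss` enum and the `tally_losses` helper are hypothetical names, and the annotation of each run is assumed to be done by a researcher, not by code.

```python
from collections import Counter
from enum import Enum


class EntityLoss(Enum):
    """The four ways a model loses a brand entity, as named in the text."""
    CATEGORY_SHIFT = "shifting the category"
    FUNCTION_REPLACEMENT = "replacing the function"
    NEIGHBOUR_PULL = "pulling in a neighbour"
    EMPTY_SPACE = "leaving an empty space"


def tally_losses(annotations):
    """Count how often each loss type was annotated across a series of runs.

    `annotations` holds one EntityLoss per run, or None where the researcher
    judged that the answer kept the entity intact.
    """
    counts = Counter(a for a in annotations if a is not None)
    return {loss: counts.get(loss, 0) for loss in EntityLoss}


# Invented annotations for a short series of runs.
series = [
    EntityLoss.CATEGORY_SHIFT,
    None,
    EntityLoss.CATEGORY_SHIFT,
    EntityLoss.EMPTY_SPACE,
]
for loss, count in tally_losses(series).items():
    print(f"{loss.value}: {count}")
```

Keeping the loss types in a closed enum means every run is described in the same four terms, so two researchers comparing series argue about the break itself and not about phrasing.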

Working principles

  1. Answer before conclusion

    First, a specific model answer to a specific prompt is recorded. Interpretation appears only after several observations have been compared.

  2. Meaning matters more than wording

    Repeatability is measured by the persistence of the semantic trajectory, even when the phrases differ. If the brand keeps arriving at the same wrong category, this is treated as a stable signal.

  3. Quiet errors are made visible

    The lab separately analyses plausible shifts: the wrong audience, a neighbouring function, an almost correct category. Such errors are often more dangerous than crude inventions.

  4. Conditions are stated explicitly

    Each run is considered in the context of the model, prompt, and scenario. The team does not present observable AI behaviour as a final truth about the brand.

  5. Forecasts are kept separate

    Assessments of future shifts are marked as assumptions. The lab does not present a likely trend with the tone of an established fact.
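
Principles 4 and 5 above can be sketched as a record that never travels without its conditions or its status. Everything here is a hypothetical illustration: the `Finding` and `Conditions` types, their fields, the bracketed markers, and the sample values are assumptions made for the example, not the lab's reporting format.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    OBSERVATION = "observed"
    FORECAST = "assumption"


@dataclass(frozen=True)
class Conditions:
    """The context a run was made in (all values below are placeholders)."""
    model: str
    prompt_id: str
    scenario: str


@dataclass(frozen=True)
class Finding:
    text: str
    kind: Kind
    conditions: Conditions
    run_count: int


def render(finding):
    """Format a finding so its status and conditions are always stated."""
    c = finding.conditions
    scope = (
        f"model={c.model}; prompt={c.prompt_id}; "
        f"scenario={c.scenario}; runs={finding.run_count}"
    )
    return f"[{finding.kind.value}] {finding.text} ({scope})"


conditions = Conditions(
    model="example-model",
    prompt_id="vendor-choice",
    scenario="choosing a vendor",
)
observed = Finding(
    text="In these runs, the model links the company to a neighbouring category.",
    kind=Kind.OBSERVATION,
    conditions=conditions,
    run_count=5,
)
forecast = Finding(
    text="This confusion may grow as AI search expands.",
    kind=Kind.FORECAST,
    conditions=conditions,
    run_count=5,
)
print(render(observed))
print(render(forecast))
```

Because the status is a required field, a forecast cannot be rendered in the same form as an observation by accident.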

Where a plausible answer is not the same as an accurate one, method matters.

The corpus of analyses shows how these principles work on specific series of observations.

Read the analyses →