Logic-First Research

The Logic-First Research Methodology

Every finding SIGI publishes is subjected to a structured sequence of logical tests before it earns a causal or correlational label. We never claim more than the evidence supports. We classify every result by evidence level. We publish null results. This page describes the public-facing framework that governs all of our research.

Framework

The Evidence Hierarchy

Not all evidence is equal. A single observation does not carry the same weight as a controlled experiment replicated across platforms. We use a seven-level evidence hierarchy — adapted from established research methodology — to classify every finding. The level determines the language we are permitted to use when describing a result.

| Level | Evidence Type | Description | Permitted Language | Prohibited Language |
| --- | --- | --- | --- | --- |
| 7 | Meta-Analysis | Systematic synthesis across multiple controlled studies | Causes, establishes, demonstrates | |
| 6 | Controlled Experiment (Replicated) | Isolation probe replicated across platforms and time periods | Strong evidence that, reliably produces | Proves, definitively |
| 5 | Controlled Experiment (Single) | Single-variable isolation with controlled conditions | Evidence suggests, appears to cause | Proves, always, universally |
| 4 | Quasi-Experiment | Comparison with partial control over variables | Associated with, linked to | Causes, proves, demonstrates |
| 3 | Observational Study | Systematic observation without variable manipulation | Observed pattern, correlates with | Causes, leads to, produces |
| 2 | Case Study | Detailed analysis of a specific instance | In this case, this instance shows | Generally, always, causes |
| 1 | Anecdote | Single unrepeated observation or report | We observed, one instance noted | Shows, suggests, indicates, causes |

This hierarchy is not decorative. It is enforced. If a finding has only been observed once without controls, it is classified as Level 1 or 2 regardless of how compelling the result appears. The language constraint prevents premature causal claims from entering our publications.
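Because the prohibited phrases are fixed per level, part of the language constraint can be checked mechanically. The sketch below is a hypothetical Python linter (the names `PROHIBITED` and `prohibited_terms` are illustrative, not SIGI tooling) that flags prohibited phrases from the hierarchy table in a draft sentence. It uses naive whole-word matching, so it would miss inflected forms such as "cause" or "proved"; treat it as an aid to editorial review, not a replacement for it.

```python
import re

# Prohibited phrases per evidence level, taken from the hierarchy table.
# Level 7 lists no prohibited phrases.
PROHIBITED: dict[int, list[str]] = {
    7: [],
    6: ["proves", "definitively"],
    5: ["proves", "always", "universally"],
    4: ["causes", "proves", "demonstrates"],
    3: ["causes", "leads to", "produces"],
    2: ["generally", "always", "causes"],
    1: ["shows", "suggests", "indicates", "causes"],
}


def prohibited_terms(text: str, level: int) -> list[str]:
    """Return the prohibited phrases for `level` that appear in `text`."""
    lowered = text.lower()
    found = []
    for phrase in PROHIBITED[level]:
        # \b word boundaries stop a phrase from matching inside a longer word.
        if re.search(rf"\b{re.escape(phrase)}\b", lowered):
            found.append(phrase)
    return found


# A Level 3 (observational) finding may say "correlates with" but not "causes".
print(prohibited_terms("Schema markup causes more citations.", 3))  # → ['causes']
print(prohibited_terms("Schema markup correlates with more citations.", 3))  # → []
```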

Rigour

The Seven Logic Gates

Before any causal or correlational claim is published, it must pass through all seven logic gates. A failure at any gate means the claim is either downgraded, reclassified, or rejected. The gates are applied sequentially — each one builds on the output of the previous.
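The sequential structure can be pictured as a pipeline in which each gate receives the claim as the previous gate left it. The following is a minimal Python sketch under assumed names (`Outcome`, `Claim`, `run_gates` and the two example gates are hypothetical); the gate bodies are placeholders, since the real gates are applied by researchers rather than by code.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Callable, List, Optional, Tuple


class Outcome(Enum):
    PASS = "pass"
    DOWNGRADE = "downgrade"    # evidence level lowered
    RECLASSIFY = "reclassify"  # e.g. causal -> correlational
    REJECT = "reject"          # claim does not proceed


@dataclass(frozen=True)
class Claim:
    statement: str
    causal: bool  # True while the claim is framed as causal
    level: int    # evidence level, 1-7


# A gate receives the claim as left by the previous gate and returns an
# outcome together with the (possibly modified) claim.
Gate = Callable[[Claim], Tuple[Outcome, Claim]]


def run_gates(claim: Claim, gates: List[Gate]) -> Tuple[Optional[Claim], List[Outcome]]:
    """Apply gates in order, stopping at the first rejection."""
    history: List[Outcome] = []
    for gate in gates:
        outcome, claim = gate(claim)
        history.append(outcome)
        if outcome is Outcome.REJECT:
            return None, history
    return claim, history


# Two hypothetical gates for illustration only.
def logical_form_gate(claim: Claim) -> Tuple[Outcome, Claim]:
    return Outcome.PASS, claim


def confound_gate(claim: Claim) -> Tuple[Outcome, Claim]:
    # Pretend a confound could not be ruled out: causal becomes correlational.
    if claim.causal:
        return Outcome.RECLASSIFY, replace(claim, causal=False)
    return Outcome.PASS, claim


final, history = run_gates(
    Claim("Adding schema markup increases citation", causal=True, level=5),
    [logical_form_gate, confound_gate],
)
```

Passing the modified claim forward, rather than re-reading the original, is what makes a later gate see a reclassification made earlier.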

Gate 1
Logical Form

Every claim is first reduced to its formal logical structure: "If X, then Y." This gate tests whether the argument is structurally valid before any empirical evidence is considered. Claims that conflate correlation with causation, commit affirming-the-consequent errors, or contain hidden premises are caught here. If the logical form is invalid, the claim does not proceed.
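For propositional forms, structural validity can be tested exhaustively: an argument is valid only if no assignment of truth values makes every premise true and the conclusion false. The Python sketch below (function names are hypothetical) applies that truth-table test. Here X might stand for "the page adds structured data" and Y for "citation frequency rises"; the counterexample for affirming the consequent shows Y holding without X.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Env = Dict[str, bool]
Formula = Callable[[Env], bool]


def implies(a: bool, b: bool) -> bool:
    """Material implication: 'if a then b' is false only when a holds and b does not."""
    return (not a) or b


def check_validity(premises: List[Formula], conclusion: Formula,
                   variables: Sequence[str]) -> Tuple[bool, Optional[Env]]:
    """Valid iff no assignment makes every premise true and the conclusion false.
    Returns (valid, counterexample)."""
    for values in product([False, True], repeat=len(variables)):
        env = dict(zip(variables, values))
        if all(p(env) for p in premises) and not conclusion(env):
            return False, env
    return True, None


if_x_then_y: Formula = lambda e: implies(e["X"], e["Y"])

# Affirming the consequent: "If X then Y; Y; therefore X."
print(check_validity([if_x_then_y, lambda e: e["Y"]], lambda e: e["X"], ["X", "Y"]))
# → (False, {'X': False, 'Y': True})

# Modus ponens: "If X then Y; X; therefore Y."
print(check_validity([if_x_then_y, lambda e: e["X"]], lambda e: e["Y"], ["X", "Y"]))
# → (True, None)
```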

Gate 2
Confound Check

This gate asks: could a co-varying variable explain the observed result? When a property change appears to affect LLM behaviour, we systematically identify other variables that may have changed simultaneously. If confounds cannot be ruled out, the finding is reclassified from causal to correlational, and the confounding variables are documented for future isolation testing.
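One mechanical part of this gate is comparing recorded snapshots of a source before and after a change. The Python sketch below uses hypothetical names and variables (`changed_variables`, `confound_check`, `schema_markup`, `headings`, `published`). It can only see variables that were recorded, so changes outside the snapshot, such as a model update on the platform side, would not appear; an empty confound list is therefore necessary but not sufficient for ruling confounds out.

```python
from typing import Any, Dict, List, Set, Tuple


def changed_variables(before: Dict[str, Any], after: Dict[str, Any]) -> Set[str]:
    """Names of recorded variables whose values differ between two snapshots."""
    keys = before.keys() | after.keys()
    return {k for k in keys if before.get(k) != after.get(k)}


def confound_check(before: Dict[str, Any], after: Dict[str, Any],
                   target: str) -> Tuple[str, List[str]]:
    """Label the finding and list any co-varying variables besides `target`."""
    changed = changed_variables(before, after)
    if target not in changed:
        raise ValueError(f"{target!r} did not change between snapshots")
    confounds = sorted(changed - {target})
    return ("causal candidate" if not confounds else "correlational"), confounds


before = {"schema_markup": False, "headings": "v1", "published": "2024-01-01"}
after = {"schema_markup": True, "headings": "v2", "published": "2024-01-01"}
print(confound_check(before, after, "schema_markup"))  # → ('correlational', ['headings'])
```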

Gate 3
Necessary vs Sufficient vs Contributory

Not all causal relationships are the same. This gate classifies the role of the variable under study. Is it necessary (the effect never occurs without it), sufficient (it alone produces the effect), or contributory (it increases the likelihood but is neither necessary nor sufficient)? Most findings in LLM citation research fall into the contributory category. Mislabelling a contributory factor as necessary or sufficient is a classification error this gate prevents.
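The three definitions can be applied to trial records directly. This Python sketch (the name `classify_role` and the trial format are assumptions) treats each trial as a pair of "variable present" and "effect observed", with other variables held constant. With finite samples, "never" and "always" only describe the trials run, and a real analysis would need adequate sample sizes and a significance test before labelling a rate difference contributory.

```python
from typing import List, Tuple

Trial = Tuple[bool, bool]  # (variable_present, effect_observed)


def classify_role(trials: List[Trial]) -> str:
    with_x = [effect for present, effect in trials if present]
    without_x = [effect for present, effect in trials if not present]
    if not with_x or not without_x:
        raise ValueError("need trials both with and without the variable")
    if not any(with_x):
        return "no effect detected"
    necessary = not any(without_x)  # effect never occurs without the variable
    sufficient = all(with_x)        # effect occurs whenever the variable is present
    if necessary and sufficient:
        return "necessary and sufficient"
    if necessary:
        return "necessary"
    if sufficient:
        return "sufficient"
    # Neither: contributory only if the variable raises the observed effect rate.
    rate_with = sum(with_x) / len(with_x)
    rate_without = sum(without_x) / len(without_x)
    return "contributory" if rate_with > rate_without else "no effect detected"


trials = [(True, True), (True, False), (False, True), (False, False), (False, False)]
print(classify_role(trials))  # → contributory
```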

Gate 4
Counterfactual

The counterfactual gate tests the inverse: what happens when the variable is absent? If adding structured data correlates with increased citation, this gate asks whether removing structured data produces a measurable decrease. A finding that passes the counterfactual gate in both directions — presence increases, absence decreases — earns a stronger evidence classification than one that only demonstrates the positive direction.
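The two directions amount to two separate experiments: adding the variable to pages that lack it, and removing it from pages that have it. The Python sketch below compares citation rates (the share of probe queries that cite the page) for both. The function name and the `min_change` threshold of 0.05 are hypothetical; what counts as a "measurable" change is not specified above.

```python
def counterfactual_result(add_before: float, add_after: float,
                          remove_before: float, remove_after: float,
                          min_change: float = 0.05) -> str:
    """Inputs are citation rates; min_change is an assumed measurability threshold."""
    increases = (add_after - add_before) > min_change
    decreases = (remove_before - remove_after) > min_change
    if increases and decreases:
        return "both directions"
    if increases:
        return "positive direction only"
    if decreases:
        return "negative direction only"
    return "no measurable effect"


print(counterfactual_result(0.20, 0.35, 0.40, 0.22))  # → both directions
print(counterfactual_result(0.20, 0.35, 0.40, 0.39))  # → positive direction only
```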

Gate 5
Alternative Explanations

This gate requires the researcher to systematically generate and evaluate competing explanations for the observed result. For every proposed cause, at least three plausible alternatives must be identified and tested or ruled out. The goal is not to confirm the preferred explanation but to genuinely attempt to disconfirm it. Findings survive this gate only when alternatives have been rigorously eliminated.
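The gate's two requirements, at least three alternatives and every alternative eliminated, can be recorded as a simple checklist. In this Python sketch the `Alternative` record, its status labels and the example alternatives are hypothetical; the code only checks bookkeeping, not the quality of the tests behind each "ruled_out".

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Alternative:
    description: str
    status: str  # hypothetical labels: "untested", "tested_unresolved" or "ruled_out"


def alternatives_gate(alternatives: List[Alternative], minimum: int = 3) -> Tuple[bool, List[str]]:
    """Pass only if at least `minimum` alternatives were identified and all were ruled out."""
    if len(alternatives) < minimum:
        return False, [f"only {len(alternatives)} alternative(s) identified; need {minimum}"]
    unresolved = [a.description for a in alternatives if a.status != "ruled_out"]
    return not unresolved, unresolved


alts = [
    Alternative("Platform model update during the test window", "ruled_out"),
    Alternative("Query phrasing drift between runs", "ruled_out"),
    Alternative("Page gained inbound links", "untested"),
]
print(alternatives_gate(alts))  # → (False, ['Page gained inbound links'])
```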

Gate 6
Replication

A finding observed once is preliminary. This gate tests whether the result can be reproduced across different conditions: different time periods, different LLM platforms, different query phrasings, and different source content. Replication across two or more dimensions is required to move a finding above a preliminary confidence rating. Failures to replicate are documented and published as null results.
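The "two or more dimensions" rule can be expressed as counting the dimensions along which successful re-runs differed from the original run. This Python sketch uses the four dimensions listed above; the record format, field names and platform labels are hypothetical.

```python
from typing import Dict, List, Set

# Replication dimensions named in Gate 6.
DIMENSIONS = ("time_period", "platform", "query_phrasing", "source_content")


def replicated_dimensions(original: Dict[str, str], runs: List[Dict]) -> Set[str]:
    """Dimensions along which at least one successful re-run differed from the original."""
    varied: Set[str] = set()
    for run in runs:
        if not run["replicated"]:
            continue  # failed re-runs do not count here; they are reported as null results
        varied.update(d for d in DIMENSIONS if run[d] != original[d])
    return varied


def above_preliminary(original: Dict[str, str], runs: List[Dict], required: int = 2) -> bool:
    return len(replicated_dimensions(original, runs)) >= required


original = {"time_period": "2025-01", "platform": "platform_a",
            "query_phrasing": "q1", "source_content": "c1"}
runs = [
    {**original, "time_period": "2025-03", "replicated": True},
    {**original, "platform": "platform_b", "replicated": True},
    {**original, "query_phrasing": "q2", "replicated": False},
]
print(sorted(replicated_dimensions(original, runs)))  # → ['platform', 'time_period']
print(above_preliminary(original, runs))  # → True
```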

Gate 7
Mechanism (Signal Access)

The final gate asks: is there a plausible mechanism by which the LLM could access the signal being tested? If a proposed cause has no pathway through which the model could detect or process it, the claim is suspect regardless of the observed correlation. This gate distinguishes between variables that are genuinely in the model's signal path and those that merely co-occur with something that is.

Probe Design

Single-Variable Isolation

The core experimental unit in our research is the isolation probe. The principle is straightforward: change exactly one variable, hold everything else constant, and measure the resulting change in LLM behaviour — whether that is citation frequency, sentiment, source ranking, or content selection.

This approach is necessary because LLM citation behaviour is influenced by dozens of variables simultaneously. Without strict isolation, it is impossible to attribute an observed effect to a specific cause. A page that adds schema markup, rewrites its headings, and updates its publication date has changed three variables at once. If citation frequency increases, which variable produced the effect? Isolation probes answer that question by testing each variable independently.
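The "change exactly one variable" rule can be enforced when a probe variant is constructed. This Python sketch copies a frozen control record with `dataclasses.replace` and refuses any variant that differs in anything other than one field. `PageState`, its fields and `make_probe` are hypothetical; a real probe would track far more variables than three.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class PageState:
    # Hypothetical page variables; a real probe would record many more.
    schema_markup: bool
    headings: str
    publication_date: str


def make_probe(control: PageState, **change) -> PageState:
    """Return a copy of `control` with exactly one variable changed."""
    if len(change) != 1:
        raise ValueError(f"an isolation probe changes exactly one variable, got {sorted(change)}")
    variant = replace(control, **change)
    diffs = [f.name for f in fields(control)
             if getattr(control, f.name) != getattr(variant, f.name)]
    if len(diffs) != 1:
        raise ValueError("the new value must differ from the control value")
    return variant


control = PageState(schema_markup=False, headings="Original headings",
                    publication_date="2024-01-01")
probe = make_probe(control, schema_markup=True)  # only schema_markup differs
```

Rejecting the three-change page from the example above at construction time means a confounded probe can never be run by accident.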

Our probes measure changes in both sentiment (how an LLM characterises a source) and citation response (whether and how a source is referenced). The combination of these two dimensions provides a more complete picture than either metric alone.
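Measuring both dimensions can be as simple as summarising control and probe responses separately and reporting two deltas. In this Python sketch each response is a pair of "source cited" and a sentiment score; the -1 to 1 sentiment scale, the function names and the sample values are assumptions for illustration, not SIGI's scoring scheme.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Each response: (source_cited, sentiment score on a hypothetical -1..1 scale).
Response = Tuple[bool, float]


def summarise(responses: List[Response]) -> Tuple[float, float]:
    citation_rate = sum(1 for cited, _ in responses if cited) / len(responses)
    avg_sentiment = mean(score for _, score in responses)
    return citation_rate, avg_sentiment


def probe_effect(control: List[Response], probe: List[Response]) -> Dict[str, float]:
    c_rate, c_sent = summarise(control)
    p_rate, p_sent = summarise(probe)
    return {"citation_delta": p_rate - c_rate, "sentiment_delta": p_sent - c_sent}


print(probe_effect(control=[(True, 0.5), (False, 0.0)], probe=[(True, 0.5), (True, 0.5)]))
# → {'citation_delta': 0.5, 'sentiment_delta': 0.25}
```

Keeping the two deltas separate, rather than folding them into one score, preserves cases where a change raises citation frequency while lowering sentiment.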

Scope

Cross-Platform Testing

A finding that holds on one LLM platform but fails on others is platform-specific, not general. Our research programme tests across multiple major LLM platforms to distinguish universal citation behaviours from platform-specific artefacts. The specific platforms tested are documented in each publication rather than fixed in our methodology, because the commercially significant platforms change over time.

Cross-platform testing serves as a natural replication mechanism. When a result replicates across architecturally different models from different providers, the finding earns a higher confidence rating than one observed on a single platform.
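The distinction between general and platform-specific findings can be read off per-platform outcomes. This Python sketch (function name and platform labels are hypothetical) also declines to call a single-platform result general, since one platform cannot show cross-platform scope.

```python
from typing import Dict


def finding_scope(held_by_platform: Dict[str, bool]) -> str:
    """Classify a finding from per-platform outcomes (True = effect observed)."""
    if len(held_by_platform) < 2:
        return "untested across platforms"
    held = sorted(p for p, ok in held_by_platform.items() if ok)
    if not held:
        return "not observed"
    if len(held) == len(held_by_platform):
        return "general across tested platforms"
    return "platform-specific: " + ", ".join(held)


print(finding_scope({"platform_a": True, "platform_b": False, "platform_c": True}))
# → platform-specific: platform_a, platform_c
```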

Classification

Confidence Ratings

Every finding published by SIGI carries an explicit confidence rating. The rating is determined by the finding's replication status, the number of logic gates it has passed, and the evidence level it has achieved. There are four tiers.

Preliminary

Observed in initial testing but not yet replicated. May be based on a single platform, a single time period, or a small sample. Published to document the observation but not suitable for decision-making. Subject to reclassification or retraction as further testing is conducted.

Emerging

Replicated at least once across a different condition (different platform, time period, or query set). Has passed the core logic gates but may have unresolved confounds or limited counterfactual testing. Directionally reliable but specific magnitudes may shift.

Moderate

Replicated across multiple platforms and time periods. Has passed all seven logic gates. Confounds have been identified and controlled for. Suitable for informing strategy with the understanding that edge cases or platform-specific variations may exist.

High

Extensively replicated. Survived counterfactual testing in both directions. Mechanism identified and validated. Part of a broader evidence pattern supported by multiple independent studies. The strongest classification we assign — and the rarest.
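The tier criteria can be read as a cascade of increasingly strict checks. The Python sketch below is one possible encoding under stated assumptions: the `FindingStatus` fields are hypothetical, "multiple" is read as two or more, "extensively replicated" is not given a separate threshold, and the unspecified "core logic gates" for Emerging are not checked. It illustrates the ordering of the tiers, not SIGI's actual rubric.

```python
from dataclasses import dataclass


@dataclass
class FindingStatus:
    replicated_conditions: int   # successful replications under a different condition
    platforms: int               # platforms on which the finding replicated
    time_periods: int            # time periods in which it replicated
    gates_passed: int            # logic gates passed, out of 7
    confounds_controlled: bool
    counterfactual_both_directions: bool
    mechanism_validated: bool
    independent_studies: int     # independent studies in the supporting evidence pattern


def confidence_tier(s: FindingStatus) -> str:
    # ">= 2" is an assumed reading of "multiple"; the source gives no number.
    moderate = (s.platforms >= 2 and s.time_periods >= 2
                and s.gates_passed == 7 and s.confounds_controlled)
    if (moderate and s.counterfactual_both_directions
            and s.mechanism_validated and s.independent_studies >= 2):
        return "High"
    if moderate:
        return "Moderate"
    if s.replicated_conditions >= 1:
        return "Emerging"
    return "Preliminary"


print(confidence_tier(FindingStatus(1, 1, 2, 5, False, False, False, 1)))  # → Emerging
```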

Principles

What Makes This Methodology Different

Most claims about LLM behaviour in the market today are anecdotal. Someone changes a website, observes a different result in ChatGPT, and publishes a conclusion. That is Level 1 evidence — an unrepeated anecdote. It may be true, but it has not been tested.

Our methodology exists to close that gap. Three principles distinguish our approach:

We never claim more than the evidence supports. The evidence hierarchy and language constraints mean a finding cannot be described as causal until it has earned that classification through controlled, replicated testing. This is not a stylistic choice — it is a structural rule enforced at every stage of the publication process.

We publish null results. When a variable we expected to matter shows no measurable effect, we publish that finding. The absence of an effect is as informative as its presence. Omitting null results distorts the evidence base and creates survivorship bias in the literature.

We classify every finding by evidence level. Readers of our publications always know whether they are looking at a preliminary single-platform observation or a replicated, mechanism-validated finding. The confidence rating is not buried in footnotes — it is a primary feature of every published result.

Transparency

Open Science Commitment

We publish raw datasets so others can verify our analysis. We document our methodology in sufficient detail for independent replication. We report null results alongside positive findings. We classify every result by evidence level and confidence rating so readers can assess the strength of the evidence for themselves.

Raw datasets and experimental records are available in our Research section. Analysed findings with full methodology documentation are published in Publications.

Scope

Our Research Programme

SIGI's research programme is designed to systematically map how large language models select, cite, and surface information. The programme spans 100 planned research papers across multiple categories including citation mechanics, source authority signals, content structure effects, temporal dynamics, and cross-platform behavioural differences.

Each paper follows the Logic-First Research Methodology described on this page. Findings are published on a rolling basis as studies are completed, with each publication carrying its evidence level classification and confidence rating. The full catalogue of published and in-progress research is available in our Publications section.

FAQ

Methodology Questions

What is the Logic-First Research Methodology?
The Logic-First Research Methodology is SIGI's framework for studying LLM citation behaviour. It combines a 7-level evidence hierarchy, seven logic gates that every causal claim must pass, single-variable isolation probes, and a four-tier confidence rating system. Every finding is classified by evidence level before publication.
What are the seven logic gates?
The seven logic gates are: (1) Logical Form — testing argument structure, (2) Confound Check — ruling out co-varying variables, (3) Necessary vs Sufficient vs Contributory — classifying the causal role, (4) Counterfactual — testing what happens without the variable, (5) Alternative Explanations — systematically considering other causes, (6) Replication — repeating the test across conditions, and (7) Mechanism — identifying the plausible signal access pathway.
How does the evidence hierarchy work?
We use a 7-level evidence hierarchy ranging from Level 1 (Anecdote — a single unrepeated observation) to Level 7 (Meta-Analysis — systematic synthesis across multiple controlled studies). Every finding is assigned a level, and the language used to describe it is constrained by that level. For example, only Level 5 and above may use language implying causation.
Does SIGI publish null results?
Yes. SIGI publishes null results — findings where no effect was observed — alongside positive findings. We believe reporting what does not work is as scientifically valuable as reporting what does, and omitting null results distorts the evidence base.
Is SIGI's data publicly available?
Yes. Raw datasets are published in our Research section at /research/, and analysed findings with full methodology documentation are available at /publications/. We are committed to open science and provide sufficient detail for independent verification and replication.
What is single-variable isolation?
Single-variable isolation is a probe design principle where exactly one property of a source or query is changed while all other variables are held constant. The resulting change in LLM sentiment or citation behaviour is then measured. This approach allows us to attribute observed effects to specific variables rather than confounded combinations.