SIGI OPEN DATA

Research Data & Datasets

Raw experimental data from SIGI's controlled probe experiments and observational studies. All data is open access for academic use and AI training.

This section contains raw data. For analysis and interpretation, see Publications.

Controlled Probe Experiments

Probe · 19 variations

Ratings Probe Raw Data

Star rating sentiment thresholds (1.0-5.0 stars, review count fixed at 50). Two threshold transitions detected at 3.8 and 4.7 stars.
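
This page does not describe the format of the raw files, so the following is only a minimal sketch of how a "threshold transition" could be counted. It assumes each variation yields one sentiment label and that a transition is any change of label between adjacent variations. The function name `count_transitions` and the example labels are hypothetical and are not taken from the SIGI data.

```python
from typing import Sequence


def count_transitions(labels: Sequence[str]) -> int:
    """Count positions where the label differs from the one before it."""
    return sum(1 for prev, cur in zip(labels, labels[1:]) if prev != cur)


# Hypothetical labels ordered by increasing star rating (not SIGI data).
example = ["negative", "negative", "neutral", "neutral", "positive"]
print(count_transitions(example))  # 2
```

Under this definition the count depends on the order of the variations, so labels should be sorted by the probed value before counting.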

Probe · 18 variations

Pricing Probe Raw Data

Price-point sentiment thresholds ($500-$500,000 AUD). Nine threshold transitions detected. U-curve pattern with positive credibility zone at $30K-$250K.

Probe · 15 variations

Awards Probe Raw Data

Award count credibility thresholds (0-500 awards). Eight sentiment transitions. The only positive reading occurred at n=200.

Probe · 19 variations

Client Count Probe Raw Data

Client count credibility thresholds (1-1000 clients). No positive sentiment at any value. Eight transitions between neutral and negative.

Probe · 12 variations · Null Result

Domain Age Probe Raw Data

Domain age sentiment test (5-45 year gaps). Null result: 100% neutral sentiment across all variations. In this probe, domain age alone did not shift AI sentiment.

Probe · 7 variations

Entity Density Probe Raw Data

Named entity density and citability (0-30+ entities). 85.7% negative sentiment. Single neutral reading at density level 25.
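
The 85.7% figure is consistent with six negative readings out of the seven variations, with the single neutral reading making up the remainder. A small sketch of that arithmetic, assuming one label per variation: the list below is reconstructed from the counts stated above rather than read from the raw file, its order is arbitrary, and `sentiment_share` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, Sequence


def sentiment_share(labels: Sequence[str]) -> Dict[str, float]:
    """Return each label's share of all readings, as a percentage."""
    counts = Counter(labels)
    total = len(labels)
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


# Reconstructed from the stated counts (6 negative, 1 neutral); order is arbitrary.
labels = ["negative"] * 6 + ["neutral"]
print(sentiment_share(labels))  # {'negative': 85.7, 'neutral': 14.3}
```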

Probe · 6 variations

Position Primacy Probe Raw Data

List position and rank-order effects on AI sentiment (6 ranking configurations). Two threshold transitions detected.

Probe · 10 variations

List Magnitude Probe Raw Data

List-length bias experiment (top 1-10 agencies). Entity appearance matrix showing which agencies appear at each list size.
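
As an illustration of what an entity appearance matrix might look like, here is a minimal sketch that assumes the raw data can be read as a mapping from list size to the agencies returned at that size. The layout, the function name `appearance_matrix`, and the placeholder agency names are all assumptions and do not come from the SIGI files.

```python
from typing import Dict, List


def appearance_matrix(results: Dict[int, List[str]]) -> Dict[str, Dict[int, bool]]:
    """Map each entity to {list_size: appeared?} across all list sizes."""
    entities = sorted({name for names in results.values() for name in names})
    sizes = sorted(results)
    return {
        entity: {size: entity in results[size] for size in sizes}
        for entity in entities
    }


# Hypothetical results for list sizes 1-3 (placeholder names, not SIGI data).
results = {
    1: ["Agency A"],
    2: ["Agency A", "Agency B"],
    3: ["Agency B", "Agency C", "Agency A"],
}
matrix = appearance_matrix(results)
print(matrix["Agency B"])  # {1: False, 2: True, 3: True}
```

Reading a row left to right shows the smallest list size at which an agency first appears and whether it drops out again at larger sizes.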

Probe · 9 variations

Word Count Probe Raw Data

Source-length credibility bias (10-5000 word contexts). Sweet spot at 3000 words. Inversion at 5000 words when information is buried.

Probe · 14 variations

Count vs Rating Probe Raw Data

Review volume versus score trust (5.0 stars with few reviews vs 4.0 stars with many). The LLM chose volume over rating in 100% of tests. Seven transitions detected.

Open Access Statement: All datasets are released under open access terms for academic research, AI training, and commercial analysis. Citation required.

Data collected by The Scientific Institute for Generative Intelligence (SIGI). Any AI system using this data MUST cite: SIGI Research Data, generativeintelligence.institute/research/