Augea Documentation

Integrate AI-ready datasets, cleaning jobs, and private trading into your stack in minutes. Everything you need to upload, clean, and sell physical AI training data.

Quick Start

Get up and running in 5 minutes.

1. Create account and get API key

Sign in with email or wallet via Privy. Your API key is available in dashboard settings.

2. Upload a dataset or submit a cleaning job

Call POST /v1/cleaning-jobs with your file URL and desired options, or upload manually from the dashboard.

3. Poll job status and retrieve cleaned output

The job moves through: Uploaded → Validating → Cleaning → Scored → Ready. Poll the job endpoint (GET /v1/cleaning-jobs/{id}) until status is "ready".

4. Download or publish to marketplace

Use the returned dataset_id to download the cleaned file or publish it to the marketplace with pricing and license.

TypeScript (API coming soon)
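// Upload and clean a dataset with the Augea API
const response = await fetch("https://api.augea.org/v1/cleaning-jobs", {
  method: "POST",
  headers: {
    "Authorization": "Bearer omni_sk_...",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    file_url: "https://storage.example.com/sensor-data.csv",
    format: "csv",
    options: { remove_duplicates: true, fill_missing: true },
  }),
});
if (!response.ok) {
  throw new Error(`Job submission failed with HTTP ${response.status}`);
}

const job = await response.json();
// { id: "cj_abc123", status: "validating", dataset_id: null }

// Poll until ready. The 5-second interval is an arbitrary choice; failure
// statuses are not documented here, so add a timeout before relying on this.
let result = job;
while (result.status !== "ready") {
  await new Promise((resolve) => setTimeout(resolve, 5000));
  const poll = await fetch(
    `https://api.augea.org/v1/cleaning-jobs/${job.id}`,
    { headers: { Authorization: "Bearer omni_sk_..." } }
  );
  result = await poll.json();
}
// { id: "cj_abc123", status: "ready", quality_score: 94, dataset_id: "ds_xyz" }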

Core Concepts

Key ideas behind the platform.

Dataset

A file or collection of files uploaded by a creator — point clouds, images, tabular data, etc.

Quality Score

A 0–100 score generated by the AI cleaner. Higher scores mean fewer issues and better structure.

On-chain License

Every published dataset has its license registered on Solana — a tamper-proof record of ownership.

Private Balance

A pre-funded balance for anonymous purchases. Sellers see only your pseudonymous P-xxxx ID.


AI Data Cleaner

Automated quality analysis and cleaning for uploaded datasets.

The AI cleaner analyzes your files, detects issues (missing values, duplicates, formatting problems, blurry images, corrupt audio), and generates a cleaned version with a detailed changelog. You always review the changelog before publishing.

Job lifecycle

Uploaded → Validating → Cleaning → Scored → Ready
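The five lifecycle stages can be modeled as an ordered list, which makes it easy to show progress or decide when to stop polling. This is a minimal sketch, not the platform's own client code: the lowercase status strings are an assumption based on the "validating" and "ready" values in the Quick Start sample, and the helper names are hypothetical.

```typescript
// Lifecycle stages in the documented order. Lowercase spellings are assumed
// from the "validating" and "ready" values shown in the Quick Start sample.
const JOB_STAGES = ["uploaded", "validating", "cleaning", "scored", "ready"] as const;
type JobStatus = (typeof JOB_STAGES)[number];

// True once the job has reached the final stage.
function isReady(status: JobStatus): boolean {
  return status === "ready";
}

// Fraction of the lifecycle completed: 0 for "uploaded", 1 for "ready".
function progress(status: JobStatus): number {
  return JOB_STAGES.indexOf(status) / (JOB_STAGES.length - 1);
}

// The stage that follows `status`, or null when the job is already ready.
function nextStage(status: JobStatus): JobStatus | null {
  const i = JOB_STAGES.indexOf(status);
  return JOB_STAGES[i + 1] ?? null;
}
```

A polling loop can call isReady on each response and use progress to drive a progress bar.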

Supported modules

Module        Formats                          Max Size
Tabular       CSV, JSON, JSONL, Parquet, TSV   500 MB
Image         JPG, PNG, WebP, TIFF, BMP        2 GB (zip)
Audio         WAV, MP3, FLAC, OGG, M4A         1 GB
Video         MP4, AVI, MOV, MKV, WebM         5 GB
Point Cloud   PCD, PLY, XYZ, LAS, LAZ          2 GB
Text          TXT, MD, PDF                     200 MB
3D Model      GLTF, GLB, OBJ, STL              1 GB
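Checking a file against these limits before uploading avoids a rejected job. The sketch below encodes the size column of the table; the module keys and the fitsLimit helper are hypothetical names, and treating MB and GB as binary units (1024-based) is an assumption, since the table does not say which convention applies.

```typescript
type CleanerModule =
  | "tabular" | "image" | "audio" | "video" | "point_cloud" | "text" | "3d_model";

// Assumption: binary units. Use 1000-based constants if the platform counts decimally.
const MB = 1024 * 1024;
const GB = 1024 * MB;

// Maximum upload size per module, taken from the table above.
const MAX_BYTES: Record<CleanerModule, number> = {
  tabular: 500 * MB,
  image: 2 * GB, // images are uploaded as a zip
  audio: 1 * GB,
  video: 5 * GB,
  point_cloud: 2 * GB,
  text: 200 * MB,
  "3d_model": 1 * GB,
};

// True when a file of `sizeBytes` is within the limit for `mod`.
function fitsLimit(mod: CleanerModule, sizeBytes: number): boolean {
  return sizeBytes >= 0 && sizeBytes <= MAX_BYTES[mod];
}
```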

Supported Formats

All file formats accepted for upload.

Point Cloud

.pcd, .ply, .xyz, .las, .laz, .bag, .mcap

Tabular

.csv, .json, .jsonl, .parquet, .tsv, .arrow

Scientific

.hdf5, .h5, .hdf, .nc

Images

.jpg, .jpeg, .png, .gif, .webp, .tiff, .bmp, .svg

Video

.mp4, .avi, .mov, .mkv, .webm

Audio

.wav, .mp3, .flac, .ogg, .m4a

Documents

.pdf, .md, .xls, .xlsx

Archives

.zip, .tar, .gz
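A client can use the lists above to reject unsupported files before upload. The following is an illustrative sketch only: categoryFor is a hypothetical helper, and it matches on the final extension alone, so data.tar.gz is classified by ".gz".

```typescript
// Accepted extensions per category, copied from the lists above.
const FORMAT_CATEGORIES: Record<string, readonly string[]> = {
  "Point Cloud": [".pcd", ".ply", ".xyz", ".las", ".laz", ".bag", ".mcap"],
  "Tabular": [".csv", ".json", ".jsonl", ".parquet", ".tsv", ".arrow"],
  "Scientific": [".hdf5", ".h5", ".hdf", ".nc"],
  "Images": [".jpg", ".jpeg", ".png", ".gif", ".webp", ".tiff", ".bmp", ".svg"],
  "Video": [".mp4", ".avi", ".mov", ".mkv", ".webm"],
  "Audio": [".wav", ".mp3", ".flac", ".ogg", ".m4a"],
  "Documents": [".pdf", ".md", ".xls", ".xlsx"],
  "Archives": [".zip", ".tar", ".gz"],
};

// Returns the category for a filename, or null if the extension is not listed.
// Matching is case-insensitive and uses only the last extension.
function categoryFor(filename: string): string | null {
  const dot = filename.lastIndexOf(".");
  if (dot < 0) return null;
  const ext = filename.slice(dot).toLowerCase();
  for (const [category, extensions] of Object.entries(FORMAT_CATEGORIES)) {
    if (extensions.includes(ext)) return category;
  }
  return null;
}
```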


Marketplace

Browse, search, and purchase verified datasets.

The marketplace lists all published datasets. Each listing includes a quality score, file format, size, price, and provenance information verified on-chain.

Search and filtering

  • Filter by category: Robotics, Autonomous Vehicles, Sensor Data, and more
  • Sort by trending, most downloaded, newest, or price
  • Full-text search across titles and descriptions
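The filters above can be pictured as a function over a list of listings. This is a client-side sketch for illustration, not the marketplace's search implementation: the Listing fields and sort keys are hypothetical, a case-insensitive substring match stands in for full-text search, and "trending" is omitted because its ranking is not described here.

```typescript
// Hypothetical listing shape; field names are assumptions for this sketch.
interface Listing {
  title: string;
  description: string;
  category: string;
  priceUsd: number;
  downloads: number;
  publishedAt: string; // ISO 8601 date
}

type SortKey = "most_downloaded" | "newest" | "price";

interface SearchOptions {
  query?: string;
  category?: string;
  sort?: SortKey;
}

// Filters by category and text, then sorts a copy so the input is not mutated.
function searchListings(listings: Listing[], opts: SearchOptions): Listing[] {
  const q = opts.query?.trim().toLowerCase() ?? "";
  const matches = listings.filter(
    (l) =>
      (!opts.category || l.category === opts.category) &&
      (q === "" || `${l.title} ${l.description}`.toLowerCase().includes(q))
  );
  const sorted = [...matches];
  switch (opts.sort) {
    case "most_downloaded":
      sorted.sort((a, b) => b.downloads - a.downloads);
      break;
    case "newest":
      sorted.sort((a, b) => Date.parse(b.publishedAt) - Date.parse(a.publishedAt));
      break;
    case "price":
      sorted.sort((a, b) => a.priceUsd - b.priceUsd); // cheapest first
      break;
  }
  return sorted;
}
```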

Purchase flow

Pay with credit card (Stripe) or crypto (SOL / USDC). After purchase, files are available for immediate download. See the Buyers guide.


Missions & Data Requests

Post requests for specific data and match with collectors.

The Discovery board is where buyers post data requests and creators share what they have. Post what your model needs, set a budget, and match with collectors.

  • Describe the data you need, your budget, and deadline
  • Browse existing requests and offers from other users
  • Filter by category, type (request vs. offer), and recency

Private Trading

Buy datasets anonymously using pseudonymous IDs.

Private Mode lets you purchase datasets without revealing your identity to the seller. You fund a separate Private Balance via Stripe, then check out using your pseudonymous P-xxxx ID.

How it works

  • Top up your Private Balance from the billing settings
  • At checkout, choose "Pay privately"
  • The seller sees only your pseudonymous ID and the price — not your name, email, or wallet
  • Download links are delivered to your dashboard privately
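One way to picture the privacy guarantee is as a projection from the full purchase record to the view a seller receives. The types and the toSellerView function below are hypothetical and only illustrate the rule stated above, that the seller gets the pseudonymous ID and the price and nothing else.

```typescript
// Hypothetical internal record of a private purchase.
interface PrivatePurchase {
  buyerName: string;
  buyerEmail: string;
  buyerWallet: string | null;
  pseudonymId: string; // the buyer's P-xxxx ID
  priceUsd: number;
}

// What the seller is allowed to see for a private purchase.
interface SellerView {
  buyerId: string;
  priceUsd: number;
}

// Copies only the permitted fields; name, email, and wallet are never included.
function toSellerView(purchase: PrivatePurchase): SellerView {
  return { buyerId: purchase.pseudonymId, priceUsd: purchase.priceUsd };
}
```

Building the view from an explicit allow-list, rather than deleting fields from a copy, means a field added to the purchase record later is not exposed by accident.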

For full details, see the Private Trading guide.


Verification

On-chain provenance and data integrity.

Every published dataset has a content hash registered on the Solana blockchain — a permanent, tamper-proof record of provenance.

What gets verified

  • File integrity (SHA-256 content hash)
  • Upload timestamp and creator identity
  • Quality score from the AI cleaner
  • License type and terms
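A buyer can check file integrity locally by hashing the downloaded file and comparing the result with the registered content hash. The sketch below uses Node's built-in crypto module; how the registered hash is retrieved is not covered here, so matchesRegisteredHash takes it as a plain hex string, and both function names are hypothetical.

```typescript
import { createHash } from "node:crypto";

// SHA-256 digest of the given bytes as a lowercase hex string.
function sha256Hex(data: Uint8Array): string {
  return createHash("sha256").update(data).digest("hex");
}

// Compares a file's digest with a registered hex hash, ignoring case and
// surrounding whitespace in the registered value.
function matchesRegisteredHash(data: Uint8Array, registeredHex: string): boolean {
  return sha256Hex(data) === registeredHex.trim().toLowerCase();
}
```

For large files, streaming the contents into the hash with repeated update calls avoids loading the whole file into memory.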

Payments

How purchases and payouts work.

For buyers

  • Credit card (Stripe) or crypto (SOL / USDC)
  • Instant access after payment confirmation
  • Private Mode uses a pre-funded balance

For creators

  • Keep 100% of every sale — 0% platform fee during launch
  • Earnings paid directly to connected wallet

Developer Tools

APIs and SDKs for programmatic access.

The following developer tools are planned but not yet available:

  • REST API (soon): Programmatic access to datasets, uploads, purchases, and account management
  • Python SDK (soon): Native client with PyTorch and TensorFlow data loader integration
  • TypeScript SDK (soon): Type-safe client for Node.js and browser environments
  • CLI (soon): Command-line tool for batch uploads, downloads, and verification
  • Webhooks (soon): Real-time notifications for purchases, uploads, and verification events