Augea Documentation
Integrate AI-ready datasets, cleaning jobs, and private trading into your stack in minutes. Everything you need to upload, clean, and sell physical AI training data.
Quick Start
Get up and running in 5 minutes with a sample dataset upload and your first AI cleaning job.
AI Data Cleaner
Submit cleaning jobs and retrieve verified, scored datasets ready for training.
Marketplace
Search, filter, and purchase datasets. Browse by category, format, or quality score.
Quick Start
Get up and running in 5 minutes.
Create account and get API key
Sign in with email or wallet via Privy. Your API key is available in dashboard settings.
Upload a dataset or submit a cleaning job
Call POST /v1/cleaning-jobs with your file URL and desired options. Or upload manually from the dashboard.
Poll job status and retrieve cleaned output
The job moves through: Uploaded → Validating → Cleaning → Scored → Ready. Poll the job endpoint until status is "ready".
Download or publish to marketplace
Use the returned dataset_id to download the cleaned file or publish it to the marketplace with pricing and license.
```javascript
// Upload and clean a dataset with the Augea API
const response = await fetch("https://api.augea.org/v1/cleaning-jobs", {
  method: "POST",
  headers: {
    "Authorization": "Bearer omni_sk_...",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    file_url: "https://storage.example.com/sensor-data.csv",
    format: "csv",
    options: { remove_duplicates: true, fill_missing: true },
  }),
});
const job = await response.json();
// { id: "cj_abc123", status: "validating", dataset_id: null }

// Poll the job endpoint until the status is "ready"
let result;
do {
  const poll = await fetch(
    `https://api.augea.org/v1/cleaning-jobs/${job.id}`,
    { headers: { Authorization: "Bearer omni_sk_..." } }
  );
  result = await poll.json();
  if (result.status !== "ready") {
    await new Promise((resolve) => setTimeout(resolve, 2000));
  }
} while (result.status !== "ready");
// { id: "cj_abc123", status: "ready", quality_score: 94, dataset_id: "ds_xyz" }
```
Core Concepts
Key ideas behind the platform.
Dataset
A file or collection of files uploaded by a creator — point clouds, images, tabular data, etc.
Quality Score
A 0–100 score generated by the AI cleaner. Higher scores mean fewer issues and better structure.
On-chain License
Every published dataset has its license registered on Solana — a tamper-proof record of ownership.
Private Balance
A pre-funded balance for anonymous purchases. Sellers see only your pseudonymous P-xxxx ID.
AI Data Cleaner
Automated quality analysis and cleaning for uploaded datasets.
The AI cleaner analyzes your files, detects issues (missing values, duplicates, formatting problems, blurry images, corrupt audio), and generates a cleaned version with a detailed changelog. You always review the changelog before publishing.
Job lifecycle
Uploaded → Validating → Cleaning → Scored → Ready
Supported modules
| Module | Formats | Max Size |
|---|---|---|
| Tabular | CSV, JSON, JSONL, Parquet, TSV | 500 MB |
| Image | JPG, PNG, WebP, TIFF, BMP | 2 GB (zip) |
| Audio | WAV, MP3, FLAC, OGG, M4A | 1 GB |
| Video | MP4, AVI, MOV, MKV, WebM | 5 GB |
| Point Cloud | PCD, PLY, XYZ, LAS, LAZ | 2 GB |
| Text | TXT, MD, PDF | 200 MB |
| 3D Model | GLTF, GLB, OBJ, STL | 1 GB |
Supported Formats
All file formats accepted for upload.
Point Cloud
.pcd, .ply, .xyz, .las, .laz, .bag, .mcap
Tabular
.csv, .json, .jsonl, .parquet, .tsv, .arrow
Scientific
.hdf5, .h5, .hdf, .nc
Images
.jpg, .jpeg, .png, .gif, .webp, .tiff, .bmp, .svg
Video
.mp4, .avi, .mov, .mkv, .webm
Audio
.wav, .mp3, .flac, .ogg, .m4a
Documents
.pdf, .md, .xls, .xlsx
Archives
.zip, .tar, .gz
Marketplace
Browse, search, and purchase verified datasets.
The marketplace lists all published datasets. Each listing includes a quality score, file format, size, price, and provenance information verified on-chain.
Search and filtering
- Filter by category: Robotics, Autonomous Vehicles, Sensor Data, and more
- Sort by trending, most downloaded, newest, or price
- Full-text search across titles and descriptions
Purchase flow
Pay with credit card (Stripe) or crypto (SOL / USDC). After purchase, files are available for immediate download. See the Buyers guide.
Missions & Data Requests
Post requests for specific data and match with collectors.
The Discovery board is where buyers post data requests and creators share what they have. Post what your model needs, set a budget, and match with collectors.
- Describe the data you need, your budget, and deadline
- Browse existing requests and offers from other users
- Filter by category, type (request vs. offer), and recency
Private Trading
Buy datasets anonymously using pseudonymous IDs.
Private Mode lets you purchase datasets without revealing your identity to the seller. You fund a separate Private Balance via Stripe, then check out using your pseudonymous P-xxxx ID.
How it works
- Top up your Private Balance from the billing settings
- At checkout, choose "Pay privately"
- The seller sees only your pseudonymous ID and the price — not your name, email, or wallet
- Download links are delivered to your dashboard privately
For full details, see the Private Trading guide.
Verification
On-chain provenance and data integrity.
Every published dataset has a content hash registered on the Solana blockchain — a permanent, tamper-proof record of provenance.
What gets verified
- File integrity (SHA-256 content hash)
- Upload timestamp and creator identity
- Quality score from the AI cleaner
- License type and terms
Payments
How purchases and payouts work.
For buyers
- Credit card (Stripe) or crypto (SOL / USDC)
- Instant access after payment confirmation
- Private Mode uses a pre-funded balance
For creators
- Keep 100% of every sale — 0% platform fee during launch
- Earnings paid directly to connected wallet
Developer Tools
APIs and SDKs for programmatic access.
The following developer tools are planned but not yet available:
- REST API — Programmatic access to datasets, uploads, purchases, and account management
- Python SDK — Native client with PyTorch and TensorFlow data loader integration
- TypeScript SDK — Type-safe client for Node.js and browser environments
- CLI — Command-line tool for batch uploads, downloads, and verification
- Webhooks — Real-time notifications for purchases, uploads, and verification events