Check before you train.
Training data comes from more sources than ever. Sourcemark helps AI teams check content and datasets against declared creator choices before work enters models, with match results, licence routes and audit records attached.
Dataset → Check → Result → Action
- Match type: Visual / partial · 97%
- Declared status: Available by licence
- Licence route: Contact representative
- Next action: Request licence
- Audit: Recorded
Training data needs a clearer permission trail.
Training datasets draw from mixed sources: licensed material, public content, open datasets, suppliers, platforms and internal archives. Before training, teams need to know whether a creator, catalogue or representative has declared a position on AI training.
Signals are fragmented
AI training choices may sit in bios, metadata, platform settings or nowhere at all.
There is no standard check
Dataset teams and suppliers lack a common place to query declared status before training.
Disputes surface late
When questions about permission come up after training, they are slower to resolve, costlier to remediate and harder to document.
Evidence is hard to produce
Even where teams want stronger processes, there is often no standard record of what was checked, matched and acted on.
Sourcemark helps separate what is available by licence, what is unavailable, what needs review and what has no registered signal.
What a Sourcemark check returns
A check should not just return a match. It should return a result your team can act on.
Match result
Whether a registered work was found, including exact, visual or partial match signals.
Declared status
Whether the work is available by licence, unavailable for AI training, not yet declared or in need of review, along with the timestamp and version of the declaration.
Licence route
Where to request permission when work is available by licence: creator, representative, agency, stock library, CMO or publisher.
Next action
Licence, exclude, hold for review or proceed under your internal policy, based on the declared status and match confidence.
Audit and governance record
A record of what was checked, what matched, what declaration was returned and when the check was made, designed to support legal, procurement, transparency and internal AI governance review.
No registered signal
For unmatched files, Sourcemark returns a "no registered signal" result. Absence of a match does not mean permission exists, but an explicit result is clearer for teams to handle under their own policy.
Every returned declaration is linked to a timestamped, cryptographically anchored Sourcemark record, helping AI teams evidence what was checked, what matched and what decision was made.
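As a rough illustration of how a team might turn a returned result into a next action, here is a minimal Python sketch. Sourcemark is in early access and no API schema is published here, so every name below (`CheckResult`, `Status`, `next_action`, the field names, the action strings and the 0.90 review threshold) is an assumption made for this example, not Sourcemark's actual interface. Holding low-confidence and undeclared matches for review is likewise one possible internal policy, not a rule set by Sourcemark.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    # Hypothetical names for the declared statuses described above.
    AVAILABLE_BY_LICENCE = "available_by_licence"
    UNAVAILABLE = "unavailable"
    NOT_DECLARED = "not_declared"
    NEEDS_REVIEW = "needs_review"


@dataclass
class CheckResult:
    # "exact", "visual" or "partial"; None stands for "no registered signal".
    match_type: Optional[str]
    confidence: float = 0.0
    declared_status: Optional[Status] = None
    licence_route: Optional[str] = None   # e.g. "representative"
    declared_at: Optional[str] = None     # declaration timestamp, if any


def next_action(result: CheckResult, review_threshold: float = 0.90) -> str:
    """Map a check result to an action under an assumed internal policy."""
    if result.match_type is None:
        # No registered signal: permission is not implied, so defer to policy.
        return "apply_internal_policy"
    if result.confidence < review_threshold:
        # Uncertain matches are held for a human reviewer.
        return "hold_for_review"
    if result.declared_status is Status.AVAILABLE_BY_LICENCE:
        return "request_licence"
    if result.declared_status is Status.UNAVAILABLE:
        return "exclude"
    # Not yet declared, or flagged as needing review.
    return "hold_for_review"


example = CheckResult("partial", 0.97, Status.AVAILABLE_BY_LICENCE, "representative")
print(next_action(example))  # prints: request_licence
```

The threshold is kept as a parameter because the right cut-off for "uncertain" depends on each team's risk tolerance.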
How it works
Add Sourcemark as a pre-training check across candidate datasets, supplier data or internal review workflows.
Prepare for checking
Register interest in Sourcemark checks for batch audits, workflow integration and dataset review.
Run checks across candidate data
Generate or submit supported fingerprints and query Sourcemark before training or fine-tuning.
Interpret the result
Matched works return a declared status, licence route, declaration timestamp and record details. Unmatched works return a "no registered signal" result.
Act before training
Route licence requests, exclude unavailable works, hold uncertain matches for review and document the checks carried out for legal, procurement, transparency and internal AI governance review.
Prepare → Check → Interpret → Act
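The interpret and act steps above could look something like the following Python sketch for a batch of results. It assumes each result arrives as a plain dictionary with a `file_id` and an optional `declared_status`; that shape, the status strings, the action names and the `triage` function are all hypothetical and chosen only for illustration, since the real response format is not described here.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Assumed mapping from declared status to action; None means no registered signal.
ACTIONS = {
    "available_by_licence": "request_licence",
    "unavailable": "exclude",
    "needs_review": "hold_for_review",
    "not_declared": "hold_for_review",
    None: "apply_internal_policy",
}


def triage(results, checked_at=None):
    """Group file ids by action and build one audit entry per checked file."""
    checked_at = checked_at or datetime.now(timezone.utc).isoformat()
    queues = defaultdict(list)
    audit_log = []
    for result in results:
        status = result.get("declared_status")
        # Unrecognised statuses are held for review rather than guessed at.
        action = ACTIONS.get(status, "hold_for_review")
        queues[action].append(result["file_id"])
        audit_log.append({
            "file_id": result["file_id"],
            "declared_status": status,
            "action": action,
            "checked_at": checked_at,
        })
    return dict(queues), audit_log


batch = [
    {"file_id": "a.jpg", "declared_status": "available_by_licence"},
    {"file_id": "b.jpg", "declared_status": "unavailable"},
    {"file_id": "c.jpg"},  # unmatched: no registered signal
]
queues, audit_log = triage(batch)
print(queues)
```

Keeping the audit entries separate from the action queues means the record of what was checked survives even if a queue is later reprocessed.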
Where Sourcemark fits
Use Sourcemark as a pre-training check across dataset audits, supplier reviews, ingestion workflows and governance processes.
Dataset audit
Check an existing, acquired or proposed dataset against declared AI training choices before training begins.
Licensable content discovery
Find registered works marked available by licence and route enquiries to the right creator, representative, catalogue or stock library.
Creator-led content briefs
For future dataset needs, route specific briefs to creators and catalogues who can produce or supply work with the right permissions, releases and licence route.
Supplier review
Ask data suppliers to run or provide Sourcemark checks as part of dataset procurement and review.
Ingestion pipeline
Add Sourcemark checks into ingestion or review workflows as a standard pre-training step.
Compliance and governance
Use Sourcemark query records to support legal, procurement and internal AI governance review.
Transparency and procurement
Show how declared AI training signals were checked, what matched and what action was taken.
Platform signal
If you host creator content, surface Sourcemark declarations so AI training signals are clearer at the point of use.
Final review stays with your team. Sourcemark helps structure the check before training, while your legal, procurement and licensing teams stay in control of final decisions.
What Sourcemark is not
Not an enforcement tool
Sourcemark records declarations; it does not police use.
Not currently a licensing broker
It provides the licence route, not the negotiation.
Not an ownership verification service
It records who made a declaration, not whether they had the legal authority to do so.
Not a guarantee of coverage
A "no registered signal" result means no Sourcemark declaration was found, not that permission exists.
Check first. Train with a record.
Sourcemark is in early access. Join the API waitlist to help shape dataset checks, licence routing and audit records for AI training workflows.