Check before you train.

Training data comes from more sources than ever. Sourcemark helps AI teams check content and datasets against declared creator choices before work enters models, with match results, licence routes and audit records attached.

Join the API waitlist · See how it works

Dataset → Check → Result → Action

Dataset check result: Match found

Match type: Visual / partial · 97%
Declared status: Available by licence
Licence route: Contact representative
Next action: Request licence
Audit: Recorded

Training data needs a clearer permission trail.

Training datasets draw from mixed sources: licensed material, public content, open datasets, suppliers, platforms and internal archives. Before training, teams need to know whether a creator, catalogue or representative has declared a position on AI training.

Signals are fragmented

AI training choices may sit in bios, metadata, platform settings or nowhere at all.

There is no standard check

Dataset teams and suppliers lack a common place to query declared status before training.

Disputes surface late

When questions about permission come up after training, they are slower to resolve, costlier to remediate and harder to document.

Evidence is hard to produce

Even where teams want stronger processes, there is often no standard record of what was checked, matched and acted on.

Sourcemark helps separate what is available by licence, what is unavailable, what needs review and what has no registered signal.

What a Sourcemark check returns

A check should not just return a match. It should return a result your team can act on.

Match result

Whether a registered work was found, including exact, visual or partial match signals.

Declared status

Whether the work is available by licence, unavailable for AI training, not yet declared or in need of review, with the timestamp and version of the declaration.

Licence route

Where to request permission when work is available by licence: creator, representative, agency, stock library, CMO or publisher.

Next action

Licence, exclude, hold for review or proceed under your internal policy, based on the declared status and match confidence.

Audit and governance record

A record of what was checked, what matched, what declaration was returned and when the check was made, designed to support legal, procurement, transparency and internal AI governance review.

No registered signal

For unmatched files, Sourcemark returns no registered signal. Absence of a match does not mean permission exists, but it gives teams a clearer result to handle under their own policy.

Every returned declaration is linked to a timestamped, cryptographically anchored Sourcemark record, helping AI teams evidence what was checked, what matched and what decision was made.
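To illustrate the kind of record a team might keep on its own side, here is a minimal sketch of a hash-chained audit log. It is not Sourcemark's anchoring mechanism, which is not described here. The function names, field names and chaining scheme are all assumptions made for illustration.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Optional


def _digest(body: dict) -> str:
    """SHA-256 over a canonical (sorted-key, compact) JSON encoding."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def make_audit_entry(
    fingerprint: str,
    signal: str,
    action: str,
    record_id: Optional[str] = None,
    previous_hash: str = "",
    checked_at: Optional[str] = None,
) -> dict:
    """Build one local audit entry (hypothetical schema) linked to the previous one."""
    entry = {
        "fingerprint": fingerprint,      # what was checked
        "signal": signal,                # what declaration came back
        "action": action,                # what the team decided
        "record_id": record_id,          # reference to the returned record, if any
        "checked_at": checked_at or datetime.now(timezone.utc).isoformat(),
        "previous_hash": previous_hash,  # links this entry to the one before it
    }
    entry["entry_hash"] = _digest(entry)
    return entry


def verify_chain(entries: list) -> bool:
    """Return True if every entry hashes correctly and links to its predecessor."""
    previous_hash = ""
    for entry in entries:
        body = {k: v for k, v in entry.items() if k != "entry_hash"}
        if body["previous_hash"] != previous_hash:
            return False
        if _digest(body) != entry["entry_hash"]:
            return False
        previous_hash = entry["entry_hash"]
    return True
```

Because each entry's hash covers the previous entry's hash, editing an earlier entry after the fact makes `verify_chain` fail, which is the property an audit trail needs.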

How it works

Add Sourcemark as a pre-training check across candidate datasets, supplier data or internal review workflows.

Prepare for checking

Run checks across candidate data

Generate or submit supported fingerprints and query Sourcemark before training or fine-tuning.
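As a rough illustration of the fingerprinting step, the sketch below computes a SHA-256 digest for every file under a dataset directory. It assumes a plain content hash is an acceptable exact-match fingerprint. The fingerprint formats Sourcemark actually supports are not specified here, and visual or partial matching would need a perceptual fingerprint instead. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fingerprint_dataset(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root, POSIX style) to its SHA-256 digest."""
    return {
        path.relative_to(root).as_posix(): sha256_file(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }
```

The resulting mapping could then be submitted in batches to whatever query interface the early-access API provides.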

Interpret the result

Matched works return a declared status, licence route, declaration timestamp and record details. Unmatched works return no registered signal.
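One way to picture a result in code is as a small typed record. The sketch below is an assumption about shape only: the class names, field names and status strings are hypothetical and are not taken from a published Sourcemark schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DeclaredStatus(Enum):
    AVAILABLE_BY_LICENCE = "available_by_licence"
    UNAVAILABLE_FOR_AI_TRAINING = "unavailable_for_ai_training"
    NOT_YET_DECLARED = "not_yet_declared"
    NEEDS_REVIEW = "needs_review"


@dataclass(frozen=True)
class CheckResult:
    matched: bool
    match_type: Optional[str] = None         # e.g. "exact", "visual", "partial"
    confidence: Optional[float] = None       # 0.0 to 1.0
    declared_status: Optional[DeclaredStatus] = None
    licence_route: Optional[str] = None      # e.g. "representative"
    declared_at: Optional[str] = None        # declaration timestamp
    record_id: Optional[str] = None          # reference to the returned record

    @property
    def signal(self) -> str:
        """Collapse the result to one label for downstream handling."""
        # Assumption: a match with no declaration attached is handled
        # the same way as an unmatched file.
        if not self.matched or self.declared_status is None:
            return "no_registered_signal"
        return self.declared_status.value


# Mirrors the example result at the top of this page.
matched = CheckResult(
    matched=True,
    match_type="visual/partial",
    confidence=0.97,
    declared_status=DeclaredStatus.AVAILABLE_BY_LICENCE,
    licence_route="representative",
)
unmatched = CheckResult(matched=False)
```

Keeping "no registered signal" as an explicit label, rather than an empty value, makes it harder to mistake an unmatched file for a permitted one.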

Act before training

Route licence requests, exclude unavailable works, hold uncertain matches for review and document the checks carried out for legal, procurement, transparency and internal AI governance review.
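The routing described above can be sketched as a small decision function. The status strings, action names and the 0.9 review threshold are illustrative assumptions. Each team would substitute its own policy.

```python
def next_action(signal: str, confidence: float = 0.0, review_threshold: float = 0.9) -> str:
    """Map a declared status and match confidence to a pre-training action.

    All labels and the default threshold are hypothetical placeholders.
    """
    if signal == "no_registered_signal":
        # No declaration found: this is not permission, so defer to internal policy.
        return "apply_internal_policy"
    if confidence < review_threshold:
        # Uncertain match: a person should look before anything is excluded or licensed.
        return "hold_for_review"
    if signal == "available_by_licence":
        return "request_licence"
    if signal == "unavailable_for_ai_training":
        return "exclude"
    # "needs_review", "not_yet_declared" and any unrecognised status.
    return "hold_for_review"
```

For the example result at the top of this page (available by licence, 97% match), `next_action("available_by_licence", 0.97)` returns `"request_licence"`.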

Prepare → Check → Interpret → Act

Where Sourcemark fits

Use Sourcemark as a pre-training check across dataset audits, supplier reviews, ingestion workflows and governance processes.

Dataset audit

Check an existing, acquired or proposed dataset against declared AI training choices before training begins.

Licensable content discovery

Find registered works marked available by licence and route enquiries to the right creator, representative, catalogue or stock library.

Creator-led content briefs (roadmap)

For future dataset needs, route specific briefs to creators and catalogues who can produce or supply work with the right permissions, releases and licence route.

Supplier review

Ask data suppliers to run or provide Sourcemark checks as part of dataset procurement and review.

Ingestion pipeline

Add Sourcemark checks into ingestion or review workflows as a standard pre-training step.

Compliance and governance

Use Sourcemark query records to support legal, procurement and internal AI governance review.

Transparency and procurement

Show how declared AI training signals were checked, what matched and what action was taken.

Platform signal

If you host creator content, surface Sourcemark declarations so AI training signals are clearer at the point of use.

Final review stays with your team. Sourcemark helps structure the check before training, while your legal, procurement and licensing teams stay in control of final decisions.

What Sourcemark is not

Not an enforcement tool

Sourcemark records declarations; it does not police use.

Not currently a licensing broker

It provides the licence route, not the negotiation.

Not an ownership verification service

It records who made a declaration, not whether they had the legal authority to do so.

Not a guarantee of coverage

A "no registered signal" result means no Sourcemark declaration was found, not that permission exists.

Check first. Train with a record.

Sourcemark is in early access. Join the API waitlist to help shape dataset checks, licence routing and audit records for AI training workflows.

Join the API waitlist · Speak with us