The foundational acoustic dataset platform — voice, instrument, breath, and human gesture.

Rights-cleared, provenance-audited datasets spanning world instruments, extended vocal techniques, emotional and physiological vocality, and the acoustic primitives of human gesture — engineered for music generation, expressive TTS, multimodal AI, robotics, and synthetic humans.

Structured at the articulation level, produced through documented workflows, and secured by the Proteus Standard™, so that as scrutiny of training-data provenance and auditability increases, teams can clearly explain not just what data they use, but where it came from and how its use can be defended.

Browse the Catalog
Enterprise Licensing Inquiries

New here? Start with this.

If this is your first visit, these four pages provide the fastest orientation to what Harmonic Frontier Audio is, how teams use the datasets, and how licensing works.

Most teams begin with a short conversation to confirm fit.

Catalog Series

Structured dataset families spanning instruments, vocality, breath, and human gesture — each engineered for clarity, expressive range, and provenance.

Celtic Constellation

Traditional and modern Celtic instruments and voices—pipes, whistles, fiddles, and more—captured for melodic, modal, and drone-based generation.

World Percussion Nexus

Frame drums, hand percussion, and rhythmic instruments from around the world—focused on gesture-rich, playable patterns.

Extended Vocal Techniques Spectrum (Music)

Overtone singing, multiphonics, throat voice, and experimental vocal colors presented as phrase-level, musical gestures for expressive voice modeling.

World Resonance Gallery

Plucked, bowed, struck, and wind-resonant instruments—kalimba, wooden flutes, ocarina-type voices, and other acoustic bodies—engineered to teach models resonance shaping, harmonic decay, and spectral color transitions.

Novelty Gems Cabinet

Oddities, curiosities, and rare sound-makers—boutique textures for giving models a unique sonic fingerprint.

Human Vocality Primitives

Foundational non-verbal phonation datasets—breath noise, glottal gestures, and expressive primitives engineered for speech, robotics, TTS, and multimodal AI.

Human Gesture & Body Primitives

Body-driven sounds—clothing movement, footsteps, impacts, and subtle human foley—captured as clean primitives for multimodal, robotics, and video-to-audio models.

Featured Datasets

A growing catalog of rights-cleared acoustic datasets—spanning instruments, vocality, breath, and human gesture—optimized for modern generative and multimodal AI.

Highland Bagpipes

Celtic Constellation · Air

A focused preview of Highland pipe gestures—sustains, embellishments, and calibrated dynamic sweeps for melodic and drone-based generation.

Preview available · Full dataset in production

Overtone Singing

Extended Vocal Techniques Spectrum (Music) · Voice

Musical overtone gestures with stable fundamentals and clearly articulated harmonic bands—ideal for spectral learning, timbre modeling, and extended-voice generative systems.

Preview available · Full dataset in production

Irish Tin Whistle in D

Celtic Constellation · Air

Sustains, cuts, rolls, and breath-nuanced passages designed to teach models the agility and ornamentation of traditional whistle playing.

Preview available · Full dataset in production

Subharmonic Phonation / Vocal Fry

Extended Vocal Techniques Spectrum (Music) · Voice

Deep subharmonic and fry-based phonation performed with controlled musical intent—capturing overtone–undertone transitions, expressive gestures, and stylized extended-vocal phrasing for generative modeling.

Preview available · Full dataset in production

Kalimba

World Resonance Gallery · Metal

Close-mic’d plucks, rolls, and articulated patterns across the board—captured at 96 kHz for clean spectral learning and detailed harmonic decay.

Preview available · Full dataset in production

View all datasets

The Proteus Standard™

A three-layer provenance framework for rights-managed, regulation-ready audio datasets—connecting every file to its source, its signature, and its acoustic fingerprint across instrument, vocal, breath, and gesture domains.

Layer I · Source Provenance

Session-level transparency

Each dataset—instrumental, vocal, breath-based, or gesture-derived—links back to its recording sessions: performer, technique, signal chain, room, and capture parameters. This forms the base of the Proteus provenance graph, ensuring every file has a human-readable and legally defensible origin.
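
For orientation, a minimal Python sketch of what a session-level provenance record of this kind could look like. The field names, identifiers, and structure below are illustrative assumptions, not the actual Proteus schema.

    from dataclasses import dataclass, field

    @dataclass
    class SessionProvenance:
        """Hypothetical session-level provenance record (illustrative only)."""
        dataset_id: str                 # catalog identifier for the dataset
        session_id: str                 # unique recording-session identifier
        performer: str                  # performer or ensemble credited for the session
        technique: str                  # playing or phonation technique captured
        signal_chain: list[str] = field(default_factory=list)  # mic -> preamp -> converter
        room: str = ""                  # capture space description
        sample_rate_hz: int = 96_000    # capture parameters
        bit_depth: int = 24

    # One record per session; each delivered file can then reference its session_id,
    # giving every file a human-readable origin.
    example = SessionProvenance(
        dataset_id="world-resonance-gallery/kalimba",
        session_id="2024-03-12-A",
        performer="(credited performer)",
        technique="close-mic'd plucks and rolls",
        signal_chain=["condenser mic", "preamp", "AD converter"],
        room="treated studio live room",
    )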

Layer II · Cryptographic Integrity

Tamper-evident manifests

Per-file hashes and signed manifests allow engineering, compliance, and legal teams to verify that received datasets match the exact materials authored by HFA. This supports secure deployment across generative audio systems, robotics, TTS pipelines, multimodal models, and agentic voice systems.
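
As a concrete illustration of that verification step, here is a short Python sketch that checks per-file SHA-256 hashes against a manifest. The manifest shape assumed here (a JSON map of relative paths to hashes) and the file names are hypothetical; verifying the cryptographic signature on the manifest itself would be a separate step.

    import hashlib
    import json
    from pathlib import Path

    def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
        """Stream a file through SHA-256 so large audio files never load fully into memory."""
        digest = hashlib.sha256()
        with path.open("rb") as fh:
            for chunk in iter(lambda: fh.read(chunk_size), b""):
                digest.update(chunk)
        return digest.hexdigest()

    def verify_manifest(dataset_root: Path, manifest_path: Path) -> list[str]:
        """Return the relative paths whose on-disk hash does not match the manifest."""
        # Assumed manifest shape: {"files": {"audio/take_001.wav": "<sha256 hex>", ...}}
        manifest = json.loads(manifest_path.read_text())
        mismatches = []
        for rel_path, expected in manifest["files"].items():
            if sha256_of(dataset_root / rel_path) != expected:
                mismatches.append(rel_path)
        return mismatches

    # Usage: an empty list means the delivered files match the authored manifest.
    # bad = verify_manifest(Path("hfa_dataset_v1"), Path("hfa_dataset_v1/manifest.json"))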

Layer III · Acoustic Fingerprinting

Detection & provenance signals

Proteus maintains internal acoustic fingerprints and provenance signals to support downstream identification and analysis of HFA-origin material. Methods may include content-based fingerprinting and similarity testing, enabling informed assessment in leakage, misuse, or diligence scenarios without imposing DRM or restricting legitimate use.
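
The fingerprinting method itself is not described here, so the following Python sketch shows only the general idea behind content-based fingerprinting and similarity testing: a coarse spectral fingerprint compared with cosine similarity. It is a toy illustration under those assumptions, not the Proteus implementation.

    import numpy as np

    def spectral_fingerprint(samples: np.ndarray, n_fft: int = 2048,
                             hop: int = 1024, bands: int = 32) -> np.ndarray:
        """Coarse content-based fingerprint: average band energies of the magnitude spectrum."""
        window = np.hanning(n_fft)
        frames = []
        for start in range(0, len(samples) - n_fft, hop):
            spectrum = np.abs(np.fft.rfft(samples[start:start + n_fft] * window))
            # Pool the spectrum into a small number of bands so the fingerprint stays compact.
            frames.append([band.mean() for band in np.array_split(spectrum, bands)])
        fingerprint = np.mean(frames, axis=0)
        return fingerprint / (np.linalg.norm(fingerprint) + 1e-12)

    def similarity(fp_a: np.ndarray, fp_b: np.ndarray) -> float:
        """Cosine similarity of two unit-norm fingerprints (1.0 = near-identical spectral balance)."""
        return float(np.dot(fp_a, fp_b))

    # Usage with two mono audio arrays at the same sample rate:
    # score = similarity(spectral_fingerprint(query), spectral_fingerprint(reference))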

The Proteus Standard is built into every full HFA dataset, providing a verifiable chain of custody from performer to file, and giving AI teams—in music, robotics, multimodal systems, voice technologies, and research—a defensible, auditable foundation for training, deployment, and enterprise compliance.

Explore the HFA catalog

Rights-cleared acoustic datasets spanning instruments, vocality, breath, and human gesture—captured with full provenance for generative and multimodal AI.

Browse the Catalog
Contact about Licensing