Catalog Series

Human Vocality Primitives

Rights-cleared, articulation-level recordings of fundamental human vocal behaviors, designed as modular primitives for both phonetic research and expressive vocal systems. This series focuses on vocal mechanics (airflow, phonation, resonance, and gesture) rather than text-to-speech or linguistic content.

Why this series matters

Most “voice datasets” are optimized for language. Human vocality is broader: breath noise, phonation modes, resonance shaping, non-lexical gestures, and physiological cycles. Human Vocality Primitives (HVP) isolates these behaviors so models can learn controllable, human-aligned vocal mechanics independent of words. For researchers, this makes it possible to isolate and analyze vocal variables that are tightly confounded in natural speech corpora. For applied systems, it provides interpretable, controllable foundations for expressive vocal modeling.

Airflow control

Phonation modes

Resonance shaping

Non-lexical gesture

What makes it hard to capture

These signals are subtle, highly variable, and easy to mislabel without performer-grade domain knowledge. Clean isolation must preserve intent while avoiding fatigue artifacts, and metadata must describe controllable behaviors rather than subjective adjectives. These same constraints make such data difficult to reproduce in laboratory settings and costly to generate at scale for production systems, which is why comparable resources rarely exist.
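To make the point about metadata concrete, here is a minimal sketch of a label record built from controllable behaviors rather than adjectives. Every field name, value set, and range below is a hypothetical chosen for illustration; none of it is taken from the actual HVP or Orpheus metadata.

```python
from dataclasses import dataclass, asdict

# Hypothetical vocabularies, for illustration only. They are NOT the
# series' real schema.
AIRFLOW_DIRECTIONS = {"egressive", "ingressive"}
PHONATION_MODES = {"none", "modal", "breathy", "pressed", "falsetto", "fry"}


@dataclass(frozen=True)
class PrimitiveLabel:
    """Describes one recording by what the performer controlled."""
    airflow_direction: str  # one of AIRFLOW_DIRECTIONS
    phonation_mode: str     # one of PHONATION_MODES
    breathiness: float      # 0.0 (no audible air) to 1.0 (mostly air)
    nasalization: float     # 0.0 (velum closed) to 1.0 (fully nasal)

    def __post_init__(self):
        # Reject free-form adjectives such as "warm" or "sad".
        if self.airflow_direction not in AIRFLOW_DIRECTIONS:
            raise ValueError(f"unknown airflow direction: {self.airflow_direction!r}")
        if self.phonation_mode not in PHONATION_MODES:
            raise ValueError(f"unknown phonation mode: {self.phonation_mode!r}")
        for name in ("breathiness", "nasalization"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be within [0.0, 1.0], got {value}")


label = PrimitiveLabel("egressive", "breathy", breathiness=0.7, nasalization=0.1)
print(asdict(label))
```

A closed vocabulary plus bounded scalars is only one possible design. The idea is that a label such as "breathy, 0.7" can be set, swept, or interpolated by a model, whereas an adjective such as "warm" cannot.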

Subseries

Human Vocality Primitives is organized into four subseries. Each is listed below with its datasets and their current status.

Airflow & Airstream Primitives

Non-phonated vocal output: breath noise, aspiration, whisper continua, and physiological breathing patterns, captured as controllable airstream behaviors.

Breathing Cycles & Physiological Patterns

FULL DATASET

A comprehensive collection of natural inhale–exhale cycles, emotional breaths, and physiological airflow patterns captured with clinical clarity for expressive audio modeling.

Voice and Vocal Techniques

Breath & Noise Airstreams

FULL DATASET

Controlled airflow textures including exhalations, fricative noise, and shaped airstream bursts.

Plosives & Non-Lexical Consonant Bursts

FULL DATASET

Unvoiced consonant transients including plosives, clicks, bursts, and shaped onsets.

Whisper & Aspiration

FULL DATASET

Whisper phonation with varying pressure, turbulence, and vowel shaping.

Phonation Mode Primitives

Voiced output across foundational laryngeal modes: modal, breathy, pressed, falsetto/loft, and fry/subharmonic behaviors, captured with technique boundaries and controllable transitions.

Breathy & Semi-Modal Phonation

IN PRODUCTION

Soft phonation with turbulent onsets and airy harmonic profiles.

Falsetto / Loft Phonation

IN PRODUCTION

Light-register phonation with smooth attacks and harmonic purity.

Modal Phonation

IN PRODUCTION

Steady, natural vocal tone across pitch, amplitude, and sustained vowel sets.

Pressed & Constricted Phonation

IN PRODUCTION

Intense phonation types with high tension, narrowed resonance, and sharp onsets.

Vocal Fry & Subharmonic Phonation (Physiological)

IN PRODUCTION

Low-register fry, creak, and controlled subharmonic behaviors.

Formant & Resonance Primitives

Vowel and tract-shaping behaviors: formant motion, nasalization/velum control, and harmonic–formant interactions, designed for explicit resonance control rather than linguistic labeling.

Nasalization & Velum Control

IN PRODUCTION

Resonance adjustments through velum shaping and nasal airflow variations.

Overtone / Harmonic Formant Interaction

IN PRODUCTION

Harmonic emphasis techniques demonstrating partial isolation and resonance steering.

Vowel Morphing & Formant Shifts

IN PRODUCTION

Continuous vowel transitions with controlled formant movement.

Gestural & Expressive Primitives

Non-lexical vocal actions: mouth clicks, pops, smacks, expressive bursts, vocal gestures, and reverse-phonation behaviors, captured as discrete controllable events and expressive envelopes.

Effort and Exertion Vocal Primitives

IN PRODUCTION

Non-lexical vocal sounds produced under physical load, including strained airflow, exertion exhales, and effort-linked vocal gestures.

Inhales & Reverse Phonation (Physiological)

IN PRODUCTION

Reverse airflow gestures including gasps, inhales, and reversed articulation effects.

Mouth Clicks, Smacks & Pop Articulations

IN PRODUCTION

Close-miked percussive gestures with diverse transient shapes.

Non-Lexical Emotional Vocalizations

IN PRODUCTION

A diverse set of human emotional sounds—including laughter, sobs, sighs, and expressive non-lexical gestures—recorded for nuanced vocal and affective AI synthesis.

Non-Lexical Vocal Gestures

IN PRODUCTION

Expressive vocalizations including sighs, hums, yelps, and affective gestures.

Proteus + Orpheus

All datasets in this series are protected under the Proteus Standard™. Orpheus metadata may be applied at the Enterprise tier to support instruction-tuned control vocabularies for airflow, phonation mode, resonance behaviors, and non-lexical vocal gesture.
