Catalog Series

Human Vocality Primitives

Rights-cleared, articulation-level recordings of fundamental human vocal behaviors—captured as modular primitives for generative audio systems, embodied agents, and expressive vocal modeling. This series focuses on vocal mechanics (airflow, phonation, resonance, and gesture), not text-to-speech or linguistic content.

Why this series matters

Most “voice datasets” are optimized for language. Human vocality is broader: it includes breath noise, phonation modes, resonance shaping, non-lexical gestures, and physiological cycles. Human Vocality Primitives (HVP) isolates these behaviors so models can learn controllable, human-aligned vocal mechanics independent of words.

Airflow control
Phonation modes
Resonance shaping
Non-lexical gesture

What makes it hard to capture

These signals are subtle, highly variable, and easy to mislabel without performer-grade domain knowledge. Clean isolation must preserve intent while avoiding fatigue artifacts, and metadata must describe controllable behaviors rather than subjective adjectives.
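One way to picture metadata that describes controllable behaviors rather than adjectives is the minimal Python sketch below. Everything in it is an assumption made for illustration: the class name `PrimitiveLabel`, its fields, and the allowed vocabularies are hypothetical and are not the series' actual schema. The idea is only that a label such as "warm" is rejected, while a closed set of behavior terms and measurable targets is accepted.

```python
from dataclasses import dataclass

# Hypothetical controlled vocabularies (assumption, not the real HVP schema).
# The mode names echo the phonation modes listed in this catalog.
PHONATION_MODES = {"modal", "breathy", "pressed", "falsetto", "fry"}
AIRFLOW_DIRECTIONS = {"egressive", "ingressive"}


@dataclass(frozen=True)
class PrimitiveLabel:
    """Illustrative label for one recorded take."""
    phonation_mode: str      # must come from PHONATION_MODES
    airflow_direction: str   # must come from AIRFLOW_DIRECTIONS
    target_f0_hz: float      # intended pitch, a measurable target
    duration_s: float        # intended length of the take

    def __post_init__(self):
        # Reject free-form adjectives by allowing only known behavior terms.
        if self.phonation_mode not in PHONATION_MODES:
            raise ValueError(f"unknown phonation mode: {self.phonation_mode!r}")
        if self.airflow_direction not in AIRFLOW_DIRECTIONS:
            raise ValueError(f"unknown airflow direction: {self.airflow_direction!r}")
        if self.target_f0_hz <= 0 or self.duration_s <= 0:
            raise ValueError("target_f0_hz and duration_s must be positive")


# A behavior-based label is accepted; PrimitiveLabel("warm", ...) would raise.
label = PrimitiveLabel("breathy", "egressive", target_f0_hz=220.0, duration_s=3.5)
```

A closed vocabulary of this kind keeps labels comparable across performers and sessions, which is harder to achieve with subjective descriptors.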

Subseries

Human Vocality Primitives is organized into four subseries. Each is listed below with its datasets and their current production status, which is updated from the CMS.

Phonation Mode Primitives

Voiced output across foundational laryngeal modes: modal, breathy, pressed, falsetto/loft, and fry/subharmonic behaviors, captured with technique boundaries and controllable transitions.

Breathy & Semi-Modal Phonation

IN PRODUCTION

Soft phonation with turbulent onsets and airy harmonic profiles.

View dataset →

Human Vocality Primitives

Voice and Vocal Techniques

IN PRODUCTION

Falsetto / Loft Phonation

IN PRODUCTION

Light-register phonation with smooth attacks and harmonic purity.

View dataset →

Modal Phonation

IN PRODUCTION

Steady, natural vocal tone across pitch, amplitude, and sustained vowel sets.

View dataset →

Pressed & Constricted Phonation

IN PRODUCTION

Intense phonation types with high tension, narrowed resonance, and sharp onsets.

View dataset →

Vocal Fry & Subharmonic Phonation (Physiological)

IN PRODUCTION

Low-register fry, creak, and controlled subharmonic behaviors.

View dataset →

Gestural & Expressive Primitives

Non-lexical vocal actions: mouth clicks, pops, smacks, expressive bursts, vocal gestures, and reverse-phonation behaviors, captured as discrete controllable events and expressive envelopes.

Effort and Exertion Vocal Primitives

IN PRODUCTION

Non-lexical vocal sounds produced under physical load, including strained airflow, exertion exhales, and effort-linked vocal gestures.

View dataset →

Inhales & Reverse Phonation (Physiological)

IN PRODUCTION

Reverse airflow gestures including gasps, inhales, and reversed articulation effects.

View dataset →

Mouth Clicks, Smacks & Pop Articulations

IN PRODUCTION

Close-miked percussive gestures with diverse transient shapes.

View dataset →

Non-Lexical Emotional Vocalizations

IN PRODUCTION

A diverse set of human emotional sounds—including laughter, sobs, sighs, and expressive non-lexical gestures—recorded for nuanced vocal and affective AI synthesis.

View dataset →

Non-Lexical Vocal Gestures

IN PRODUCTION

Expressive vocalizations including sighs, hums, yelps, and affective gestures.

View dataset →
