1. Quick start #

Specula main window with a stereo master loaded: waveform, spectrogram, FFT, stereo width, and the live loudness sidebar.
The full window. Waveform, spectrogram, FFT, stereo width, and the live loudness sidebar on one file.
  1. ⌘O - pick an audio file (WAV / AIFF / FLAC / CAF / MP3 / AAC).
  2. Space plays. ⌘. stops. Click the waveform to move the playhead.
  3. -drag on the waveform to select a region. ⌘Return runs full offline analysis on the selection.
  4. Scroll to zoom, -scroll for amplitude (vertical) zoom, -scroll to pan.
  5. ⌃1-⌃5 toggle each analysis section on/off. ⌃⌥1-⌃⌥5 focuses one section (hides the rest).
  6. ⌘1 Edit · ⌘2 Compare · ⌘3 Chapter · ⌘4 Dialogue - the four authoring modes, mutually exclusive. Each carries its own accent colour (amber / sky-blue / green / rose) and its own toolbar in the same chrome slot.
  7. ⌘, opens Preferences (Layout · Spectrogram · FFT / Analysis · Speech · Targets · Report · Updates · License · Acknowledgements tabs).

2. Loading & playback #

Open a file

⌘O opens the standard file picker. Multiple-selection is allowed: the first file becomes the primary and the rest open as compare slots. Specula loads each file into RAM as a Float32 non-interleaved buffer. Pre-computed waveform peaks and clip regions are calculated during load.

Other ways to open files:

Opening files always starts a clean comparison. If two files were already in compare, opening a single new file replaces the whole set; opening three new files replaces the whole set with those three. To append to the current comparison without starting fresh, use File → Add Files to Compare… (⇧⌘O) or the Compare button in the file dock.

Transport

SpacePlay / pause
⌘.Stop and return to start
/ Skip ±5 seconds
⇧← / ⇧→Skip ±1 second
⌘←Jump to start
LToggle loop. Loops the current selection if one exists, otherwise the whole file.

Hold play-start

Next to Loop in the transport bar is a small ↺ Hold play-start toggle. Default: off. Pause stops playback at the current position (classic transport behaviour). Turn it on (the icon fills, accent-tinted) and Pause (Space or the play / pause toggle) returns the playhead to where Play was last started, or where you last manually placed the cursor with a waveform click. Tap Space again and you re-play the same passage from the same starting point. Useful when you're A/B'ing a passage by ear without wanting to re-position the cursor each time.

In compare mode the toggle also makes nudge buttons (±1 sample / ±1 ms) re-seek to that anchor, so the comparison point doesn't drift forward while you click alignment buttons.

Stop is unchanged either way: it always rewinds to the beginning of the file.

Playback rate

Variable rate from 0.25× to 2×, with optional pitch preservation (timestretch).

[Step rate down
]Step rate up
\Reset rate to 1×

The Transport bar exposes the rate menu (0.25× · 0.5× · 0.75× · 1× · 1.25× · 1.5× · 2×) and the "preserve pitch" toggle.

Output device

The Transport bar shows the current output as a compact Output: <Device Name> pill on the right side. Click it (or press ⌥⌘O, or pick Window → Output Panel) to open the Output window. The window holds a device dropdown (with channel count next to each name; the Refresh Devices item lives at the bottom of the menu) above the per-channel routing matrix. Switching device rebuilds the audio engine. Specula is happy with any device, from built-in speakers to multichannel pro audio interfaces.

The Output window is hidden by default and remembers its size and position across launches. On macOS 26 the window itself uses Liquid Glass, so it reads as a floating utility panel over the main app (the same Liquid Glass opt-out toggle covers it). Routing details are in Channel routing & monitor mode.


3. Waveform display #

Renders all channels of the loaded file simultaneously, scaled to fill the section height. Each channel uses its layout colour - mono, stereo, quad, 5.1, 7.1 each have a distinct palette.

Stereo waveform with a selection highlight and a red clip region on a transient, time ruler on top.
Waveform. ⌘-drag-select, click-to-seek, clip detection, and a dBFS grid across all channels.

Interaction

Ruler

A time ruler runs above the waveform. Click the ruler-mode toggle in the waveform control strip to swap between Time (HH:MM:SS.mmm) and Samples (frame count). Useful when you need to align two takes by sample.

Vertical / amplitude zoom

The control strip below the waveform has an amplitude zoom control (−/+ buttons) - useful for inspecting low-level detail without it being lost in the noise floor. Default is 1× (full ±0 dBFS range).

Clip detection

During load, Specula scans every channel for sample values ≥ ±1.0. Each detected clip region is highlighted in red on the waveform and counted in the Channel Info sidebar.

dBFS grid

The waveform draws horizontal grid lines at 0, −3, −6, −9, −12, −18, −24, −30, −36, −48, −60, −72, −84, −96 dBFS. Lines auto-cull when the channel is too short to label them legibly.

Speech regions overlay (SG toggle)

After file load, Silero VAD runs in the background and produces a per-100 ms speech / non-speech timeline. The waveform's control strip has an SG toggle - when enabled, detected speech blocks are shaded in teal. Useful for visually verifying the VAD result before relying on speech-gated loudness numbers.

Channel mute

Click a channel label in the Channels sidebar to mute that channel for playback. Useful for soloing one side of a stereo file or isolating a single channel of a multi-channel master.


4. Selection & selection analysis #

A selection is a [start..end] range on the waveform. Selections drive offline analysis, edit-mode operations, and loop playback.

EscClear selection (waveform + FFT)
⌘ASelect all
⌘ReturnRun offline selection analysis

Load-time analysis (automatic)

Most of the offline numbers populate the moment a file finishes loading, with no key press. A detached background pass over the whole file produces:

These numbers feed the info panel's Offline tab and both the PDF and JSON reports immediately after open. A typical music track finishes the pass in a few hundred milliseconds. Long-form audio takes proportionally longer.

Selection analysis (⌘Return)

⌘Return reruns the full analysis pipeline on the current selection rather than the whole file. Use it when you want metrics scoped to a specific region (a chorus, a problem section, a candidate edit point), or when you want the averaged FFT spectrum view.

Selection-scoped output:

A spinner in the Loudness sidebar indicates the analysis is in flight; results land when it completes (typically ~1-3 s for a multi-minute selection). Live measurement keeps running in the background regardless.

⌘Return is not required to populate true peak or per-channel stats on a freshly loaded file: those land in the Offline tab and the report automatically. Reach for ⌘Return when the question is about a region of the file, or when you want the averaged spectrum on the FFT view.

Info sidebar with per-channel peak, RMS and crest, integrated LUFS, LRA, and true peak.
Selection analysis. A full offline pass fills the sidebar with per-channel peak, RMS, and crest.

Frequency selection (FFT)

The FFT spectrum view has its own horizontal selection on the frequency axis - and it's an active playback filter, not a measurement readout. See FFT spectrum for details. Esc clears the FFT selection too.


5. EBU R128 loudness #

Specula implements full ITU-R BS.1770-4 / EBU R128:

What's shown live (sidebar)

MetricDescription
Momentary (M)400 ms window, updates 10×/sec
Short-term (S)3 s window
Integrated (I)Whole-file integrated LUFS, dual-gated. Resets on file load and on play-from-zero.
LRAThe 10-95 percentile range of the gated short-term distribution
True Peak (dBTP)Highest 4×-oversampled inter-sample peak. Turns red above −1 dBTP.
Sample Peak (dBFS)Highest absolute sample value across the whole file.
Loudness section of the sidebar: momentary, short-term, integrated, LRA, true peak, and sample peak.
Live loudness. Momentary, short-term, integrated, LRA, and 4× oversampled true peak.

Live vs Offline loudness display

The Loudness section in the sidebar has a [Live] [Offline] segmented control at the top. Live shows the real-time measurement updating during playback. Offline shows the whole-file results from the load-time analysis pass, populated immediately after the file finishes loading; running ⌘Return on a selection narrows the same metrics to that selection.

The same rows appear in both modes from the chosen source: Integrated, LRA, Speech-Gated, Speech %, Momentary / Short-term (labelled Max Momentary / Max Short-term in Offline since they're whole-file or whole-selection maxima rather than instantaneous), True Peak, Sample Peak, Loudness Targets. Offline adds Time range, Dynamic Range, Headroom, Noise Floor (Podcast mode), and per-channel Peak / RMS / Crest / DC below. The choice persists across launches. Live measurement keeps running in the background regardless of mode.

Mode (Music / Podcast / VOD / Broadcast)

The info panel has a top-level Mode picker right under the File section. Mode is the per-file analysis context - switch it to reframe the same file as a different deliverable. It drives both the Loudness Targets panel and which metric rows appear in the sidebar:

MetricMusicPodcastVODBroadcast
Integrated, LRA, M/S, True/Sample Peak, Targets, Per-channel
Speech-Gated I, Speech %-
Noise Floor (ACX)---
Dynamic Range, Headroom--

Settings → Targets is a per-mode catalog editor - four sections, one per mode, each with its target toggles. Use it to decide which targets show up next to a loaded file when that mode is active.

Loudness targets

When integrated LUFS is finite, Specula shows the Loudness Targets panel: one row per selected target with the verdict for the active mode. Pick the mode and up to four targets per mode in Settings → Targets.

ModeTreatmentSource
Music / StreamingPenalty: shows the gain the platform will apply (e.g. Spotify −4 dB)Integrated LUFS
Podcast / Spoken WordPenalty for streaming podcast targets; compliance for ACXDialog-gated (streaming) / RMS (ACX)
VODCompliance: green dot or FAIL L / FAIL TP against the dialog-gated bandDialog-gated LUFS
BroadcastCompliance against the integrated band and the true-peak ceilingIntegrated LUFS

Music targets report the gain the platform will apply rather than a hard verdict. A mix at −10 LUFS reads "Spotify −4 dB" (the platform will turn it down 4 dB to hit its −14 LUFS reference). A quiet mix at −18 LUFS reads "+4 dB" on symmetric platforms, "as-is" on asymmetric ones (see below). The dot is green when the gain is within ±1.5 dB of zero, yellow within ±4 dB, orange beyond. A separate triangle warns when the true peak breaches the platform's ceiling.

VOD and broadcast targets are hard pass/fail. The dot is green when the loudness sits within the tolerance band AND the true peak is at or below the ceiling. FAIL L means the loudness is outside the band; FAIL TP means the loudness is in band but the true peak is over the ceiling; FAIL S↑ means the maximum short-term loudness exceeded the spec's short-term ceiling (used by EBU R128 S1); FAIL NF means the noise floor exceeded the spec's RMS ceiling (used by ACX). Loudness fails take precedence; the loudness reason is reported first even when several criteria are off.

Dialog-gated targets (Netflix and peers, streaming podcasts, ATSC A/85) evaluate against speech-gated LUFS. Files with no detected speech show "no speech" rather than a misleading verdict. Target reference lines lead with dialog · whenever a target reads from the speech-gated path, so you can scan the measurement source without parsing LUFS-vs-LKFS unit suffixes.

ACX noise floor. Specula measures the dB RMS across all VAD-classified non-speech samples and shows it in the Speech-Gated section of the Info panel. ACX rejects audiobooks above −60 dB RMS, so the reading turns red when it exceeds that. The ACX preset in the loudness-targets panel automatically picks up the measurement and shows FAIL NF when the noise floor breaches the ceiling. Files with no non-speech samples show −∞. The noise floor is undefined in that case and the criterion passes trivially.

Verdict in JSON / PDF / HTML exports

Each row in the Loudness Targets panel maps to a verdict object on the corresponding target in the JSON export, so downstream scripts can branch on pass / fail without recomputing against the target spec. The PDF and HTML reports render the same verdict as the status badge above each target card.

FieldTypeWhen populated
statuscompliant · penalty · notCompliant · unavailableAlways.
actualLUFSnumberThe integrated or dialog-gated reading the verdict was computed against.
penaltyGainDBnumberStreaming-platform targets (Music + streaming Podcast).
penaltyAppliesbooleanStreaming-platform targets. False for "as-is" on asymmetric platforms.
truePeakCleanbooleanStreaming-platform targets. False when the master's true peak breaches the platform's ceiling.
nonComplianceReasonloudness · truePeak · shortTerm · noiseFloorHard-spec targets (VOD / Broadcast / ACX) when status is notCompliant.
nonComplianceActualnumberThe actual reading on the failing criterion.
nonComplianceLimitnumberThe ceiling or band edge the failing criterion exceeded.

JSON consumers that ignore the field keep working. Useful for batch QC scripts, CI gates on render farms, and Shortcuts workflows that branch on whether a master will pass a given platform without re-running the math.

VOD mode loudness targets with the Netflix row showing the dialog reference at −27 LKFS.
Dialog-gated targets. The 'dialog' prefix marks targets measured over speech only, like Netflix at −27 LKFS.
Settings Targets tab with the Music catalog and the default platforms checked.
Pick your targets. Choose up to four per mode in Settings, Targets.

Dialog-gated vs integrated measurement

Different specs measure loudness differently and Specula computes both in parallel so the right one feeds each target.

The Loudness section shows both values side by side. Each target row's reference line is prefixed with dialog · when the target evaluates against the dialog-gated reading. Files that contain no detected speech show the dialog-gated reading as "-" and dialog-gated target verdicts as "no speech".

Asymmetric platform normalization

Not every streaming platform applies the gain it computes. Apple Music's Sound Check, YouTube, and Tidal only turn loud tracks down; they leave quieter-than-reference tracks at their original level. So if your master sits at −18 LUFS against Apple Music's −16 reference, the platform plays it at −18, not boosted to −16. Specula reflects this:

Settings → Targets surfaces the behaviour in each target's subtitle ("penalty, turns down only" vs "penalty"). The match-button tooltip on each row also states which measurement source the match will use.

A quiet master in Music mode: Apple Music, YouTube, and Tidal read as-is in green while a boosting platform shows a real gain.
Asymmetric platforms. Apple Music, YouTube, and Tidal leave a quiet master as-is; the platforms that boost show the real gain.

Match to target (non-destructive)

Each target row has a small waveform-circle toggle on the right. Press it and Specula matches playback to that target:

Hit play and you'll hear the file the way the platform plays it. Press the toggle again (or pick another target) to release. Nothing in the loaded buffer is modified - match is purely a playback gain stage with a limiter after it.

The match follows the loudness display mode: switching Live → Offline (or back) while a match is active recomputes the gain against the new source. If a matched target is no longer in the active mode's selected targets (you switched mode, or unticked it in Settings), the match auto-disables so you can never apply gain you can no longer see the verdict for.

5 ms playback latency while match is active. Dialog-gated targets can be matched too - the gain is computed against the speech-gated reading. Targets disable with a tooltip explaining which reading is missing (typically "no detected speech" for dialog-gated targets on speech-free files).

A loudness target row with match-to-target engaged, playing at the platform reference through a true-peak limiter.
Match to target. Audition the file at a target's reference and ceiling without altering it.

Shipping catalog

Defaults are marked ✓.

ModeTargetReferenceToleranceTrue-peak ceiling
MusicSpotify ✓−14 LUFSpenalty (symmetric)−1 dBTP
MusicApple Music ✓−16 LUFSpenalty (turns down only)−1 dBTP
MusicYouTube ✓−14 LUFSpenalty (turns down only)−1 dBTP
MusicTidal ✓−14 LUFSpenalty (turns down only)−1 dBTP
MusicAmazon Music−14 LUFSpenalty (symmetric)−2 dBTP
MusicDeezer−15 LUFSpenalty (symmetric)−1 dBTP
MusicSoundCloud−14 LUFSpenalty (symmetric)−1 dBTP
MusicAES TD1008 (streaming)−18 LUFSpenalty (symmetric, vendor-neutral)−1 dBTP
PodcastApple Podcasts ✓−16 LKFS dialog-gatedpenalty−1 dBTP
PodcastSpotify (Podcast) ✓−14 LKFS dialog-gatedpenalty−1 dBTP
PodcastACX (Audiobook)−20.5 dB RMS±2.5 dB−3 dBTP (plus noise floor)
VODNetflix ✓−27 LKFS dialog-gated±2 LU−2 dBTP
VODPrime Video ✓−27 LKFS dialog-gated±2 LU−2 dBTP
VODApple TV+ ✓−27 LKFS dialog-gated±2 LU−2 dBTP
VODDisney+−27 LKFS dialog-gated±2 LU−2 dBTP
VODMax ✓−27 LKFS dialog-gated±2 LU−2 dBTP
BroadcastEBU R128 (EU) ✓−23 LUFS±0.5 LU−1 dBTP
BroadcastEBU R128 (EU, live)−23 LUFS±1 LU−1 dBTP
BroadcastEBU R128 S1 (EU, short-form)−23 LUFS±0.2 LU + Max Short-term ≤ −18 LUFS−1 dBTP
BroadcastATSC A/85 (CALM Act) ✓−24 LKFS dialog-gated±2 LU−2 dBTP
BroadcastARIB TR-B32 (Japan)−24 LUFS±1 LU−1 dBTP
BroadcastOP-59 (Australia)−24 LUFS±1 LU−1 dBTP
Broadcast mode loudness targets including EBU R128 and ATSC A/85.
Broadcast specs. EBU R128, ATSC A/85, and multi-criteria verdicts including max short-term.

Normalize to target (destructive)

Where Match previews a target by ear, Normalize commits to it. Integrated targets (music streaming, broadcast) and dialog-gated targets (VOD, streaming podcasts) carry a second button on the row, the up-to-line icon next to the Match toggle. Press it and Specula opens Edit mode with the Normalize section pre-filled to that target, then click Apply to normalize the loaded buffer.

The Normalize panel states its Basis so you always know which loudness is being targeted, and shows the move before you commit:

This is a normal edit: it lands on the 16-level undo stack, and exporting writes a new file rather than overwriting the source. Integrated targets re-measure and iterate to the reference; dialog-gated targets apply the uniform gain that lands the speech-gated reading on the reference. Both finish with the same two-pass true-peak limiter at the ceiling.

The ACX RMS target is the one exception: it stays compliance-only with no Normalize button. ACX is a hard delivery spec whose noise-floor requirement a gain can't satisfy (raising RMS into the window raises the noise floor with it), so normalizing its level alone could read as "now compliant" when it isn't. The button is also disabled (with a tooltip) until a loudness reading is available, and unavailable in multi-file Compare, since Edit and Compare are mutually exclusive.

Normalize is the destructive counterpart to Match: Match is a live playback gain you can release at any time, Normalize bakes the gain into a new file. Both target the same basis the row is scored against and share the same true-peak limiter math; the difference is whether the loaded buffer is rewritten.

Limit TP to target (destructive)

Where Normalize fixes a loudness miss (FAIL L), Limit TP fixes a true-peak overage (FAIL TP). When the file's true peak sits above a target's ceiling, that row shows a Limit TP button (the waveform-pulse icon) alongside Normalize. Press it and Specula opens Edit mode with the Limit TP section pre-filled to that target's ceiling; click Apply to cap the inter-sample peaks. Loudness is left untouched, so this is the right fix when the level is already where you want it and only the peaks breach the ceiling, the one-click counterpart to the FAIL L → Normalize button.

The button is contextual: it appears only when there's a true peak over the ceiling to fix. That covers a hard FAIL TP (loudness in band, peak over), a loudness-compliant row carrying the orange over-ceiling triangle, and a music target whose peaks still clip its ceiling. It's offered on any target type, ACX included, because every target carries a true-peak ceiling and capping peaks never raises the noise floor (unlike Normalize). FAIL S↑ and FAIL NF get no fix button, since no peak limit corrects a short-term or noise-floor miss. Like Normalize, it lands on the undo stack, saves as a new file, and is unavailable in multi-file Compare.


6. Speech-gated loudness #

Specula uses a two-tier voice activity detection pipeline to compute integrated loudness over speech-only blocks. This matters for podcast / dialog / film mixing where you want to ignore room tone, music beds, and silence.

Tier 1 - Silero neural VAD

Runs the Silero VAD model via FluidAudio. On file load, the entire file is downsampled to 16 kHz mono and classified. The resulting per-100 ms speech / non-speech history feeds the loudness measurement. Selection analysis re-runs Silero on just the selection.

Tier 2 - Spectral fallback

Used when Silero is unavailable (it fails to load or returns an error). Four spectral features:

  1. HF gate - if HF energy > 0.35 × total → non-speech (rejects plosives and cymbals).
  2. Band Energy Ratio (300-3 400 Hz) - threshold 0.45.
  3. Spectral Flatness Measure (dB) - threshold −8 dB.
  4. Harmonicity + Spectral Flux - voiced when the 2nd and 3rd harmonics of the dominant 80-400 Hz bin carry ≥ 20% of the fundamental's energy.

Output

Waveform with the speech-gating toggle on and speech blocks shaded teal.
Speech regions. The neural VAD shades detected speech, fully offline.

Tuning the VAD (Settings → Speech)

Three sliders bias the detector for the content you work with. Edits trigger a re-analysis in roughly half a second, and the new history feeds straight back into the loudness measurement.

  1. Threshold (0.1 - 0.9, default 0.5) - confidence below which a frame is classified as non-speech. Lower picks up quieter speech; higher rejects more non-speech.
  2. Minimum region duration (0.05 - 2.0 s, default 0.10) - shorter detections are dropped. Raise to reject clicks and very short interjections.
  3. Merge gap (0.05 - 2.0 s, default 0.10) - minimum silence between two regions before they're kept separate. Larger values fuse nearby regions through breath pauses.
Settings Speech tab with threshold, minimum-region, and merge-gap sliders.
Tune the VAD. Threshold, minimum region, and merge gap, each with a plain-language note.

Dialogue mode (manual region authoring)

For programme material where the VAD's false positives or misses matter, ⌘4 (or the Dialogue button on the transport bar) enters Dialogue authoring mode - one of the four modal authoring modes (Edit, Compare, Chapter, Dialogue) the app exposes, mutually exclusive with the other three. Every detected region is visible on the waveform while the mode is active (the SG overlay is forced on so you don't need to discover the toggle).

Region painting on the waveform:

Toolbar:

Settings → Speech slider edits always re-run detection. If you have manual edits when you move a slider, they're pushed onto the undo stack first so you can ⌘Z back to them.

While Dialogue mode is active, each region's waveform tint tells you where it came from:

Edited regions are auto-saved to <audio>.dlg.json next to the audio file (debounced 500 ms after the last edit). Sidecar files round-trip cleanly: re-opening the same audio re-applies your edits, and the schema is versioned (currently v1) so future format changes won't silently corrupt old work.

Acting on the regions: Level Dialogue

The regions aren't only a measurement input. Once they're right, switch to Edit mode (⌘1) and use Level Dialogue to act on them. It ducks the room tone in the non-speech regions (a downward expander with a fast attack and slow release, so word onsets and tails stay clean) and can optionally lift the dialogue to a target in the same pass. Gating the silence before applying the makeup gain is what lets dialogue normalize without the noise floor riding up with it, the fix for a common ACX rejection. When you lift to a target you also set a true-peak ceiling, and the makeup boost is capped there by the same limiter Normalize uses. It's a gate, not a denoiser, so it helps a marginal room tone rather than a hissy recording.

Dialogue mode with region tints for auto, edited, and hand-painted regions and the dialogue toolbar.
Dialogue mode. Hand-correct the detected regions; tints show auto, edited, and hand-painted.

7. Loudness curve #

A dedicated section that plots per-100 ms momentary and short-term LUFS over time, with the time axis aligned to the waveform's. Toggle with ⌃2.

After offline selection analysis, the Loudness Curve view fills in for the analysed range; live playback adds new samples as they're measured.

The exported JSON and PDF reports plot three traces: Momentary (yellow), Short-term (blue), and a running Integrated trace (green) that converges to the file's final integrated reading at the tail. The third trace is the easiest way to see how a master "settled" against its target during the run.


8. Loudness violations #

Set programme-loudness thresholds and watch the waveform highlight every block that exceeds them. In the Loudness Curve / control strip:

Any 100 ms block above its threshold is highlighted on the waveform in real time during playback and after offline selection analysis. Useful for spotting the exact moment a master clips a streaming target.

The Violations toggle in Settings → Report controls whether they're included in JSON / PDF exports.


9. FFT spectrum #

Real-time logarithmic-frequency spectrum from 20 Hz - 20 kHz. Toggle with ⌃4.

FFT spectrum with a hover readout showing frequency, magnitude in dB, and the nearest note.
FFT spectrum. Five windows, frequency-range selection, and note-name on hover.

Window function

Five windows, switchable from Settings → FFT / Analysis:

WindowSide-lobe rejectionBest for
Hann−31 dBGeneral purpose. Good balance.
Hamming(slightly higher)Slightly narrower main lobe than Hann; higher far side-lobes.
Blackman−58 dBExcellent side-lobe rejection. Slightly wider main lobe.
Blackman-Harris−92 dBBest side-lobe rejection. Widest main lobe. Closely-spaced harmonics.
Flat Top(very wide main lobe)Most accurate amplitude reading. Calibration & level measurement.

Note names (NOTES toggle)

Hover anywhere on the spectrum and the cursor reads out frequency in Hz, magnitude in dB, and the closest musical note + cents deviation. Toggle on/off via the NOTES button in the FFT control strip.

Frequency selection - live BAND / NOTCH filter

-drag horizontally on the spectrum to capture a frequency range. The selection is an active playback filter, not a measurement readout - playback is routed through it so you can hear what's inside or outside the band.

A BAND / NOTCH toggle in the FFT control strip switches the filter mode:

The filter is live - adjust the selection edges and the audio responds in real time. Drag the body of the selection to slide the band across the spectrum while keeping its width.

Stereo files use a highpass + lowpass pair for BAND and a parametric −40 dB cut for NOTCH; multichannel files apply the same filtering per channel.

Esc clears the selection and bypasses the filter.

Smoothing

The FFT control strip exposes a smoothing slider - exponential averaging across frames. Higher values produce a calmer trace; lower values respond faster.

Channel selection

Pick which channel feeds the FFT (or use a downmix). Useful for inspecting one channel of a multichannel file in isolation.

Selection-analysis spectrum

After ⌘Return offline analysis, the FFT view switches to the averaged spectrum of the selection - a much smoother, more accurate read than the rolling live spectrum. Clearing the selection (Esc) returns to live mode.

Spectrum under the playhead (paused)

While playback is stopped, clicking anywhere on the waveform or spectrogram recomputes the FFT for that position and updates the spectrum view to match. The FFT window ends at the cursor, so the spectrum you see lines up with the spectrogram column under the playhead. Useful for inspecting a specific moment without scrubbing: click, read, click again, read. Works for keyboard skips too (, , ⇧←, ⇧→).


10. Spectrogram #

Rolling time-frequency display. Toggle with ⌃3. All settings live in Settings → Spectrogram.

Spectrogram of a music track with the Inferno colormap on a log frequency scale.
Spectrogram. Four colormaps, log or mel scale, configurable dB range and time resolution.

Colormap (4 options)

Frequency scale

dB range

Two sliders set the colour-mapping window:

Anything quieter than the floor is mapped to the bottom colour; anything louder than the ceiling clips to the top colour. Tighten the window to highlight subtle detail; widen it to see the full dynamic range.

Time resolution (FFT overlap)

Each spectrogram column is one FFT. Overlap controls how often that FFT is recomputed per second, which sets the time-axis density. Higher overlap means columns are closer together: less blocky at high zoom, smoother gradients in time.

OverlapHop (fftSize=4096, 48 kHz)Columns/sec
75% (default)21.3 ms~47
87.5%10.7 ms~94
93.75%5.3 ms~188
96.875%2.7 ms~375
98.4375%1.3 ms~750

96.875% and 98.4375% can substantially lengthen the offline Analyse pass at large FFT sizes (16k or 32k), since the column count scales with the overlap factor. Use them when you need maximum smoothness; 75% is the standard pro-audio default.

Live vs Offline

The spectrogram follows the same [Live] [Offline] toggle the loudness section uses.

The two stores are independent: live playback never overwrites the offline analysis, and Analyse never disturbs the live history.

Auto-compute on load (opt-in)

Settings → Spectrogram → Performance → Auto-compute spectrogram on load. Off by default. When on, the whole-file spectrogram is built during the load-time background pass instead of waiting for ⌘Return on a selection. Sub-second for typical music tracks; the cost scales with file length, so long-form audio takes proportionally longer.

The setting is independent of the FFT spectrum view, which still requires a selection because its averaging is selection-scoped. Turn it on for sessions that mainly work with shorter material (stereo mixes, single tracks) and want the spectrogram immediately on load; leave it off for long-form audio when you'd rather pick a region with ⌘Return.

Channel selection

Pick a single channel or a downmix. The spectrogram persists per-slot in compare mode, so switching slots restores that slot's spectrogram history (both live and offline).

Settings persistence

Colormap, frequency scale, dB Floor, dB Ceiling, and FFT overlap all persist across launches. Tune the view once and Specula remembers.


11. M/S stereo width #

Computes the Pearson correlation between the mid and side signals over rolling windows. Toggle with ⌃5.

Stereo width view with zone colour mode, the curve crossing narrow, stereo, and wide zones.
Stereo width. Mid/side correlation with zone colouring and plain-language hover labels.

Display modes

The control strip has a Correlation / Width % toggle:

Zone colour mode

Optional shading of the correlation curve:

Hover interpretation

Mouse over any point on the curve to see a plain-English label ("narrow stereo image", "natural stereo", "potential mono sum issue", etc.) plus the raw correlation value.


12. Phase scope #

Lissajous correlation meter - plots L on the X axis and R on the Y axis as a 2D scatter. Useful for catching out-of-phase content at a glance.

Hidden by default in the FFT panel area; can be enabled via Layout settings or a panel toggle.


13. RMS level meters #

Per-channel vertical bars in the Channels sidebar. Each bar shows:

Click a channel label to toggle mute on that channel for playback. The meter still shows the muted-source level - useful for inspecting muted channels.


14. Channel routing & monitor mode #

The Routing panel lives in the dedicated Output window (Window → Output Panel, ⌥⌘O, or click the Output pill in the Transport bar). It is an N → M matrix that maps each input/source channel to one or more output channels with per-route gain trim. The Output window also holds the device selector, so the channel count the matrix targets and the device driving it are always co-located.

The window is hidden by default. Open it when you need to reroute, then close it again. Routing settings persist for the loaded file.

The Output window routing matrix for a 5.1 file with per-cell routing and gain.
Routing matrix. Map any source channel to any output, with per-cell gain and polarity.

List view vs Matrix view

The routing section has a List / Matrix toggle. Same data, two presentations.

Your preference persists across launches.

Mixdown preset library

A horizontally scrolling preset bar (visible in both views) applies a complete mapping in one click. Presets are filtered to the ones that apply to the current file and device channel counts.

Applying a preset is destructive: it overwrites the routing for every output channel, including any custom gain you set. Outputs the preset doesn't address are muted, so the preset fully replaces the routing.

Coefficients follow ITU-R BS.775-3. The −3 dB factor on summed channels is 1/sqrt(2), the equal-power gain for correlated channel summation. The Pro Logic II coefficients (−1.2 dB and −6.2 dB on surrounds, with phase inversion on the left total) match the Dolby specification.

Saved (user) presets

Above the built-in row is a Saved row for your own routings.

Polarity inversion

Polarity inversion is a per-source-slot flag, separate from gain. List view has a small ϕ button next to each source's gain knob; Matrix view has the dedicated ϕ click zone in every routed cell (orange glyph when active). Needed for Pro Logic II Lt/Rt encoding and useful for quick A/B phase checks across any single route.

Why it matters

A 5.1 master with the wrong channel order is a routing problem, not an EQ problem. Specula lets you reroute channels for playback without modifying the file, so you can verify L / R / C / LFE / Ls / Rs ordering against your monitor system. The preset library covers the common stereo and surround downmix workflows in one click; the matrix view makes any custom mapping obvious at a glance.

Render to File

Once a downmix, reorder, or custom matrix is right, Render to File… (in the Routing section header) renders it to a new audio file, so the routing is a deliverable, not just an audition. Every per-route gain, polarity, and Pro Logic II ±90° phase shift is applied exactly as it sounds on playback, computed offline.

Only the outputs you've actually routed are written. A 5.1 → stereo downmix on an 8-channel device renders a 2-channel file, not eight channels padded with silence (outputs with no source are dropped). The result writes as a new file (WAV, or CAF when the output channel layout can't be stored in WAV order) and never overwrites the source.

It bakes the routing only: the playback-side loudness match and the channel solo / mute are not baked, so the output is exactly what the matrix defines. To drop a channel from the bake, mute its route in the matrix (set its source to none) rather than relying on the monitor mute.

The Output window with a 5.1 to stereo downmix and the Render to File button in the header.
Render to file. Write the routed downmix to a new file; only routed outputs are written.

PRE / POST monitor mode

A PRE / POST toggle in the Channels sidebar header decides which signal feeds the loudness measurement:

For most workflows, PRE is what you want - the file's actual loudness. POST is useful when you've reduced channel count via the matrix (e.g. a 5.1 → stereo downmix) and need the loudness of the resulting stereo pair.


15. Compare mode #

Load up to 6 files as slots A through F, switch between them instantly, level-match them by integrated LUFS, and compute sample-accurate residuals between any two.

Compare is a toggled mode (⌘2, or the Compare button in the transport bar), one of the four mutually exclusive modes (Edit, Compare, Chapter, Dialogue) Specula exposes. The mode carries its own sky-blue accent colour and its own dedicated toolbar: the compare toolbar sits below the transport with the per-slot offset / gain / polarity controls, the diff toggle, and the metrics popover - the same shape the other three mode toolbars use, so muscle memory carries between modes.

Compare mode with multiple slots loaded, the file dock showing per-slot metrics.
Compare mode. A/B up to six versions, level-matched and auto-aligned.

Entering the mode

Three ways in:

Compare is mutually exclusive with Edit, Chapter, and Dialogue. Entering one mode exits the others; the Compare-mode-only controls (per-slot toolbar, diff toggle, slot-switch keys) disappear when you leave the mode and reappear when you return.

Loading slots

After the first file is open, drag additional files onto the window, drop them on the Compare toggle, or use the + button on the file dock. New files load in detached background tasks; their per-file metrics (integrated LUFS, true peak, sample peak, LRA, DR, speech stats, loudness curve, stereo width curve, speech history) are computed on load.

Switching slots

1 - 6Switch to slot A - F (only slots that exist). Bare digits, no modifier.

Slot switching uses two paths. Fast path: when the new slot's audio format matches the current one, the engine performs an atomic buffer swap. Sub-millisecond. Slow path: when sample rate or channel count differs, the engine is rebuilt. ~50-100 ms.

Reorder slots

Drag any slot chip in the file dock to a different position to reorder. The slot you drop on becomes the new home of the dragged slot, and the chip you targeted (plus everything between) shifts to make room. Dragging a slot to position A (the leftmost) promotes it to the reference file; existing residuals are invalidated at that point because they were computed against the previous A. Use this when your file came in from a Dock drop or Finder drag and Specula didn't pick the file you wanted as the primary. Finder serialises drag selections in display order, not click order, so the file you wanted as the reference may not be the one Specula picked.

Level match (L)

Each slot's chip has an L toggle. When on, Specula applies a gain offset that brings the slot's integrated LUFS to match slot A's. Hear two masters at the same loudness instead of "the louder one always wins".

Polarity invert (ϕ)

Each slot has a ϕ toggle that flips polarity. Useful for catching wiring errors and for diff workflows where flipping one signal makes the residual sit closer to zero.

Editable offset and gain

The compare toolbar's offset (samples + ms) and gain (dB) readouts are editable TextFields, not just display labels. Click into one, type the value you want, press Return to commit (or Esc to cancel). Each field has a small reset button that appears when the value is non-zero - one click returns it to zero. Right-click either field for the same Reset entry.

The ±1 sample / ±1 ms / ±0.1 dB / ±1 dB nudge buttons sit next to the fields for fine adjustments by ear, folded together with level-match into one gain stage. The transport bar's ↺ Hold play-start toggle (full description in Loading & playback → Transport) extends to compare-mode nudges: with it on, every nudge re-seeks playback to where Play was last started, so the comparison point stays fixed while you click ±1 sample / ±1 ms.

Auto-align

Click the auto-align button on any non-A slot. Specula runs decimated cross-correlation between slot A and the active slot to find the best sample offset, then applies it. Works on takes that are within ~10 seconds of each other.

Listen to Diff (per-slot)

Diff is a per-slot listening mode, not a separate slot. The active non-A slot's toolbar carries a prominent Listen to Diff toggle: press it and playback switches to the cached A − slot residual through the same waveform, spectrogram, and export paths the source uses. Press the toggle again (or pick another slot) to return to the source instantly.

Bare D (no modifier) toggles diff view on the active compare slot - a one-key A/B between source and residual that fits the muscle memory of bare-digit slot switching.

The residual:

Compare with Listen to Diff engaged; the waveform shows the low-level residual.
Listen to the difference. The A−X residual isolates exactly what changed.

Waveform A overlay

The Waveform A Overlay toggle (Edit menu → "Waveform A Overlay" or in the Compare panel) draws slot A's waveform as a ghost behind whichever slot is active - direct visual A/B at the sample level.

Compare panel

The Compare metrics button (toolbar) opens a popover with a side-by-side metrics table for every loaded slot - integrated LUFS, true peak, sample peak, LRA, DR, speech stats, etc.


16. Edit mode #

Non-destructive editing. Always saves as a new file - never overwrites the original.

Edit, Chapter, and Dialogue share one editing session: the edited audio, the undo history, and the unsaved-changes state follow you across all three modes, so a workflow that moves between them is continuous and a single Save at the end writes everything. Switching modes never discards edits; loading a different file, or Discard Edits, is what clears the session. Multi-file Compare is separate: entering it from unsaved edits prompts to Save or Discard first.

Edit mode active with the amber EDIT badge and the full operation toolbar.
Edit mode. Trim, Cut, Gain, Fade, Normalize, Limit TP, Level Dialogue, and more, with 16-level undo.

Enter / exit

⌘1 toggles edit mode. Edit is one of the four mutually exclusive modes (Edit, Compare, Chapter, Dialogue); entering it exits any other active mode. The mode carries the amber accent colour - matching the EDIT badge that lights on the waveform and the tint of the edit toolbar.

Operations

Each operation pauses playback, captures an undo snapshot (a 16-level disk-backed stack), applies the change, and resumes.

OperationNotes
TrimCrop to the current selection (keep the selection, discard the rest).
CutRemove the selection and join the audio on either side - the opposite of Trim. Use it to take a too-long pause or a flubbed line out of the middle. An equal-power crossfade at the join (default 10 ms, set it or 0 for a hard cut) hides the splice click; it's clamped to the audio kept on each side. Undoable, saves as a new file, and large files stream the result.
Add SilenceInsert at the playhead (no selection needed - park the cursor where the gap belongs, say 500 ms for a video edit, and Apply), insert before / after or replace a selection, or insert at the file start / end. Units: seconds or samples.
Change LevelLinear gain in dB, applied to the selection or the whole file.
Invert PhaseMultiply by −1.
FadeLinear / Logarithmic / Equal-power × In / Out - 6 fade curves total.
Normalize PeakTarget dBFS. Applies to the selection when one is set, otherwise the whole file.
Normalize LUFSTarget LUFS + ceiling (interpreted as dBTP). Auto-applies a 5 ms lookahead true-peak limiter when normalization increases loudness so inter-sample peaks stay under the ceiling, not just sample peaks. Two-pass offline: pass 1 builds the gain envelope from the input's oversampled peaks; pass 2 catches any residual peaks the release stage might leave fractionally over the ceiling. Scopes to the selection when one is set (measure and gain over that range only), otherwise the whole file.
Limit TPTrue-peak limiter only, ceiling in dBTP. Loudness otherwise unchanged. Use when integrated LUFS is already where you want it but inter-sample peaks need to be capped under a platform ceiling. Same DSP as the limiter Normalize LUFS uses; just skips the loudness measurement and gain stage. Scopes to the selection when one is set, otherwise the whole file.
Level DialogueA region-aware downward expander keyed to the detected speech regions (the ones you can tune in Dialogue mode). It ducks the room tone between phrases with a fast attack and slow release, so word onsets and tails stay clean, no clipped starts, no chopped ends. Tick Lift dialogue to target and it also gains the speech to a loudness target in the same pass, capping inter-sample peaks at a true-peak ceiling. Because the silence is gated before the makeup gain goes on, the floor ends up lower, not higher: that's the move a plain dialog-gated Normalize can't make, since a uniform gain lifts the floor along with the voice. It's a gate, not a denoiser, so it helps a marginal floor (a quiet room tone just under the gate), not a hissy recording. Undoable, saves as a new file.
Remove DC OffsetSubtracts each channel's mean.
Swap ChannelsStereo L↔R swap.
Split to MonoWrites N mono WAV files (one per channel) to a directory you pick. Channel labels in the filename.
The Cut popover with an equal-power crossfade field over a selected range.
Cut. Remove a range and crossfade the join, the opposite of Trim.
The Level Dialogue popover with a duck amount, a lift-to-target toggle, and a true-peak ceiling, over teal speech regions.
Level Dialogue. Duck room tone before the makeup gain, so the floor drops instead of rising.

Keyboard shortcuts

Edit mode adds a set of single-key edits, active only in Edit mode so they don't shadow the transport keys, and listed in the Edit menu so the keys are always visible. The first group acts on the selection; the playhead group acts at the cursor. Cut uses a 10 ms equal-power crossfade by default; the fade keys use the equal-power curve.

KeyActionActs on
Cut (remove and crossfade-join)selection
⌥⌫Silence selection (replace, keep length)selection
⌘TTrim to selection (keep it, discard the rest)selection
⌘F / ⌥⌘FFade In / Fade Outselection
⌃↑ / ⌃↓Gain +1 / −1 dBselection
⌘⌫Cut to start (remove start → playhead)playhead
⇧⌘⌫Cut to end (remove playhead → end)playhead
⌥[ / ⌥]Fade in to / out from the playheadplayhead

Undo

⌘Z steps back through the session's edits, including edits applied from Chapter mode (Level chapters) and the Level Dialogue pass, since the three modes share one undo history. ⇧⌘Z redoes. Undo is disk-backed (each level is a temp file, not a second copy in RAM), so deep history doesn't multiply memory. In Dialogue mode, ⌘Z targets region edits (a separate 50-step history); audio-buffer undo is reached from Edit or Chapter.

Save and Discard

⇧⌘S opens a save panel. Specula writes the current audio to a new file (it never overwrites the source). Format defaults to WAV at the file's native bit depth and sample rate. Saving makes the written file the working document in place: your chapters, dialogue regions, and playhead stay put, and you stay in the mode you were in.

Discard Edits (Edit menu) reverts the buffer to the last saved version in one step, the explicit counterpart to Save. Save As, Discard, and the unsaved-changes dot are reachable from Edit, Chapter, and Dialogue alike, because those three modes share one editing session.

Edit toolbar

While in edit mode, a horizontal toolbar appears below the transport. Operations with parameters (Cut, Silence, Gain, Fade, Normalize, Limit TP, Level Dialogue) open small popovers - set the parameters, click Apply. The Limit TP popover takes a ceiling in dBTP with presets at -2 / -1 / -0.5 / -0.1. The Cut popover takes the crossfade length (default 10 ms, 0 for a hard cut). The Level Dialogue popover sets the duck amount, an optional Lift dialogue to target with a target and true-peak ceiling.


17. Chapter mode #

A dedicated mode for audiobooks and long-form podcasts. Specula segments the file at long silences, scores every chapter against the full ACX measurement set, and surfaces both the per-chapter ACX gates and each chapter's deviation from the book's own median. Mutually exclusive with the other three modes (Edit, Compare, Dialogue) so the workspace stays focused.

Chapter mode with a multi-chapter audiobook, the ribbon showing per-chapter RMS, LUFS, TP and NF, one chapter flagged red.
Chapter mode. Per-chapter RMS, true peak, and noise floor, with ACX outliers flagged.

Entering the mode

A Chapter button sits in the transport bar next to Edit, Compare, and Dialogue. Click it to enter or exit chapter mode; ⌘3 does the same from the keyboard. Chapter mode carries the green accent colour - the chapter toolbar that slots in below the transport, the playing-chapter fill on the ribbon, and the Chapter button itself all match. Mutually exclusive with Edit, Compare, and Dialogue.

Detect

Detect scans the loaded mono mix for silences longer than the minimum duration (default 2 s) below the silence threshold (default −55 dB). Each silence becomes a boundary; the audio between two silences becomes a chapter. Chapters whose total non-silent content runs under 1 s are dropped so a stray cough doesn't fragment the result.

The detector absorbs leading and trailing silence into the first and last chapters. Chapter 1 always starts at 0 s, the last chapter always ends at file duration, and the ribbon covers the full file timeline with no gaps at the edges.

Boundaries follow zoom and scroll

Boundary markers are drawn inside every time-axis section that has one open: waveform, loudness curve, spectrogram, and stereo width. Each section uses its own time-to-x mapping, so the boundaries stay aligned across stacked views at any zoom level. The chapter ribbon mirrors the waveform's zoom and horizontal scroll, and narrow chapters keep their width when neighbours dominate the visible range.

Editing boundaries

Each chapter boundary draws as a vertical teal line with a "#N" badge labelling the chapter that starts at that boundary. Drag a boundary on the waveform to move it. Hit zone is 6 px; the cursor switches to the macOS resize cursor when it lands on a boundary. Drag clamps so neither neighbouring chapter shrinks below 1 s of content. A time + dBFS hover readout tracks the boundary's new position while the drag is in progress.

Add drops a new boundary at the playhead. The chapter ribbon and the per-chapter measurements update as you drag.

Selecting a chapter

Click a slot in the ribbon to select it. The selected slot takes a 2 pt accent border; unselected slots keep a 1 pt white border. Selection and playback are independent: the currently playing chapter is shown by an accent-tinted fill (28 %), regardless of which chapter is selected. Double-click a slot to seek the playhead to its start.

Selecting a boundary for removal

Click a boundary line in the waveform (not a slot in the ribbon) to flag it. The flagged boundary draws thicker in amber with a "✕" marker next to its number, and the chapter that starts at that boundary becomes the selected chapter in the ribbon. The next press of Remove in the chapter toolbar deletes that boundary, merging its two chapters into one. This is the precise way to pick which split to undo; clicking a slot in the ribbon still selects the same chapter without changing which boundary is highlighted.

Name and reorder

Double-click a chapter name in the ribbon to rename. Default names ("Chapter 01", "Chapter 02"…) follow the index after edits; custom names ("Foreword", "Walk through the woods") stay put.

Per-chapter measurements

Analyse Loudness in the toolbar runs an offline BS.1770 pass per chapter and populates the full measurement set. For each chapter:

Chapters shorter than 10 s skip the BS.1770 measurements entirely (the absolute-gate and relative-gate machinery isn't reliable below that). RMS and sample peak still populate, since plain RMS and peak are valid at any length, so the ACX loudness gate and clipping don't slip through on short chapters.

Ribbon slot layout

Each slot is four lines tall:

#NN · Chapter Name                                 m:ss
-20.4 dB RMS    -20.6 LUFS  +0.0 LU
TP -2.1 dBTP    NF -62 dB
LRA 4.2 LU    Dlg -19.8 LUFS    Sp 87%

Missing metrics render as muted - so the row positions stay stable across chapters that haven't been analysed yet, are too short for an integrated reading, or fall outside the VAD's coverage.

ACX gates

Each chapter is independently checked against three ACX delivery limits. Failures render the metric in red on the ribbon slot and on the report's chapter table:

These three gates are independent of the deviation-from-median flag, so each chapter answers two separate questions: will ACX accept this chapter on its own? (the three red/no-red signals) and is this chapter consistent with the rest of the book? (the deviation badge, controlled by the ±2 LU threshold in Settings → Targets → Chapter detection).

Outlier flag

The consistency metric (RMS or integrated LUFS) and the deviation threshold (default ±2.0, in that metric's unit) drive the red-tint flag on outlier slots and the headline banner at the top of the ribbon ("Chapter 07 is 2.3 dB above median"). RMS is the default, since it's what ACX delivery is judged on. Both live in Settings → Targets → Chapter detection (the metric is also set in the Level… popover).

Level chapters

The fix for the outlier flag, and for the consistency deviation across a book. Level… in the toolbar opens a popover with two choices:

It applies a per-chapter uniform gain, then re-measures so the ribbon updates. RMS covers every chapter; LUFS skips chapters under 10 s (no integrated reading), as it does chapters already on target. Run Analyse Loudness first; the button stays disabled until there's a reading.

It's a single undoable edit (⌘Z). Chapter mode shows a dirty dot and a Save As button when the buffer has unsaved edits, and ⇧⌘S writes the leveled file without leaving Chapter mode.

A true-peak ceiling is applied in the same edit, on by default, so a boosted chapter can't run past the delivery ceiling. It defaults to −3 dBTP for RMS/ACX and −1 dBTP for LUFS (tracking the metric), and the Level popover lets you change it or switch it off. One caveat remains: a uniform per-chapter gain moves that chapter's room-tone noise floor with its level, so for a floor-aware pass, gate the room tone first with Level Dialogue, then level.

The Level Chapters popover with an RMS or LUFS metric, a book-median or fixed target, and a true-peak ceiling.
Level chapters. Bring every chapter to a common loudness in one move.

Reset to detected

Restores Specula's silence-detection result, dropping all manual boundary edits, adds, and removes. Renames are kept.

Export / Import

Export… writes the current chapters (name + start + end) to a JSON sidecar. Import… loads one back, replacing the current chapter list. The exported JSON is plain Specula JSON; the importer also accepts a slimmer format with just start and end per entry, so you can hand-craft one.

Export CSV… writes the per-chapter metrics table (RMS, integrated LUFS, deviation from median, true peak, sample peak, LRA, max momentary / short-term, dialog LUFS, noise floor, speech %) as a spreadsheet-ready CSV, for the producers who live in spreadsheets. Run Analyse Loudness first so the metric columns are populated.

Recall

When you load a file whose fingerprint (filename + duration + sample rate + channels) matches a prior chapter setup saved in ~/Library/Application Support/Specula/chapters/, a Recall button appears in the toolbar. Click to restore the saved chapters. Toggle the prompt in Settings → Targets → Chapter detection.

Chapter ribbon (⌃6)

The sixth section at the bottom of the window. Time-aligned to the waveform's zoom and scroll. Click a slot to select; double-click to seek the playhead to its start. Toggle the section's visibility with ⌃6 (independent of whether chapter mode is active).

In the report

When chapters exist on the file, the PDF / JSON report includes a Chapters section. The table carries every per-chapter metric the ribbon shows: RMS, integrated LUFS, deviation from median, true peak, sample peak, LRA, max momentary, max short-term, dialog LUFS, noise floor, and speech percentage. Cells fail-red on the same three ACX gates the ribbon uses (RMS outside [−23, −18] dB, TP > −3 dBTP, NF > −60 dB). The caption above the table documents all three gates and the configured deviation threshold so a producer reading the PDF can interpret it without opening the app.


18. Export #

KeyActionOutput
⇧⌘EExport JSON Report*.json - all metrics, both live and selection analysis, channel info, file metadata
⌥⌘EExport PDF Report*.pdf - formatted report with charts
⌃⌘EExport Diff Audio*.wav - the active slot's cached A − slot residual buffer (compare mode only). Same output as the Export Diff button next to the Listen to Diff toggle.

In edit mode, ⇧⌘S is "Save Edited Audio As", not the JSON export - the menu item swaps based on context.

Report contents

The Settings → Report tab decides what's included:

JSON is a stable, machine-readable schema. PDF is the same content rendered for human reading. For the audiobook and podcast producers who live in spreadsheets, Chapter mode → Export CSV… writes the per-chapter metrics table as a CSV (one row per chapter), and the CLI's specula report <file> --format csv does the same from the command line. See Chapter mode.

The full-file report (no selection) carries integrated LUFS, true peak (4× oversampled dBTP), sample peak, LRA, Max Momentary, Max Short-Term, DR, speech-gated LUFS, speech %, stereo correlation, per-channel stats, and loudness-target verdicts - every metric the info panel shows.

Each entry in loudnessTargets.targets[] in the JSON envelope carries an inline verdict object so a Shortcut, CLI pipeline, or batch QC script can branch on pass / fail without recomputing it against the target spec. Fields: status (compliant / penalty / notCompliant / unavailable), actualLUFS, plus penaltyGainDB / penaltyApplies / truePeakClean for streaming targets and nonComplianceReason / nonComplianceActual / nonComplianceLimit for hard-spec fails. The PDF and HTML reports render the same verdict as a status badge. Full field reference in Loudness targets.


19. Command-line tool (specula) #

For batch QC, shell scripts, and Shortcuts that walk a folder of files, Specula ships a CLI built on the same measurement engine the app uses. Same numbers, no window.

Install

The CLI ships inside the app bundle. Pick Specula → Install Command-Line Tool… and Specula links specula into /usr/local/bin (one admin prompt if that folder needs it). The link points at the copy inside Specula.app, so app updates keep the installed command current.

Manual alternative (scripted setups):

sudo ln -sfh "/Applications/Specula.app/Contents/Helpers/specula" /usr/local/bin/specula

Verify with specula --version.

Licensing

The CLI shares the app's license and trial. It runs through the 7-day trial (a days-left note prints on stderr) and requires the license activated in the app after that (Specula → Manage License…); an expired, unlicensed install exits with code 77 and an explanation on stderr. --help and --version always work. The first licensed run asks once for access to the license in the Keychain - click Always Allow. Do that once in a local Terminal window before using the tool over SSH, where the prompt can't appear.

specula analyze <FILE>

Headline numbers (integrated LUFS, true peak, sample peak, loudness range, speech-gated LUFS, stereo correlation) as JSON.

specula analyze mix-v2.wav
specula analyze mix-v2.wav --no-vad        # skip Silero (faster)
specula analyze mix-v2.wav --no-stereo     # skip the per-block correlation pass

specula compare <A> <B>

Two files side-by-side with B − A deltas for every metric.

specula compare master-v1.wav master-v2.wav
specula compare master-v1.wav master-v2.wav --match-loudness
specula compare master-v1.wav master-v2.wav --no-vad   # skip Silero on both files

--match-loudness subtracts the integrated-LUFS delta from B's peak / speech-gated readings so the deltas surface spectral or shape differences instead of being dominated by level mismatch. Useful when you've rebalanced a mix but want to know whether the tonality really changed.

--out-diff <PATH> additionally writes the A − gainB·B residual to that path as a Float32 WAV (paired with --match-loudness to net out pure level differences before subtracting). The compare JSON gains a residual block carrying the residual's peak / RMS / applied gain so a script can branch on the residual energy without re-loading the file. Sample rate + channel count must match between A and B - the diff refuses structural mismatches rather than silently resampling.

specula edit <FILE> --out <OUT> …

Applies edit operations and writes the result to --out (-o for short). --out is required and must not resolve to the same path as the input - Specula never writes in place, the same contract the app's Save As enforces.

specula edit mix.wav --out mix-norm.wav --normalize-lufs=-14 --lufs-ceiling=-1
specula edit mix.wav --out mix-trimmed.wav --trim 5.0:30.5
specula edit mix.wav --out mix-louder.wav --gain 3

Negative values need the --option=value form (e.g. --gain=-3, --normalize-lufs=-14). The CLI otherwise reads a bare leading - as an option name and rejects the value.

Supported operations (applied in this order if multiple are passed):

  1. --trim START:END - keep only the range, in seconds (e.g. 5.0:30.5).
  2. --dc-remove - subtract per-channel mean.
  3. --invert-phase - flip polarity on every channel.
  4. --gain DB - uniform gain.
  5. --normalize-peak DBFS or --normalize-lufs LUFS [--lufs-ceiling DBTP] - mutually exclusive.
  6. --limit-tp DBTP - apply the true-peak limiter at this ceiling without changing loudness; runs last so it can catch any residual peaks left by an upstream normalize.

LUFS normalization and --limit-tp use the same 5 ms two-pass true-peak limiter (BS.1770 4× polyphase oversampled detection).

specula report <FILE>

Full Specula report through the headless pipeline, in your choice of format.

specula report mix.wav --mode music                                   # JSON to stdout
specula report mix.wav --mode podcast --out report.json               # JSON to file
specula report mix.wav --mode music --format html --out report.html   # the same dark-themed HTML the app's preview window renders
specula report mix.wav --mode music --format pdf  --out report.pdf    # rasterised PDF
specula report book.wav --mode podcast --format csv --out chapters.csv # per-chapter metrics table as CSV
specula report mix.wav --mode broadcast --no-curve                    # drop the loudness curve for smaller batch output

--format selects the output shape: json (default), html, pdf, or csv. PDF requires --out (binary data on stdout isn't useful), and the receipt printed to stdout on success carries the written byte count + mode so a pipeline can confirm the file landed. csv emits the per-chapter metrics table when the file has chapters, otherwise a flat metric,value table of the file-level numbers. --mode selects which loudness-target catalog drives the evaluation: music, podcast, vod, broadcast. --no-curve omits the per-100 ms loudness curve when the receipt only needs the summary numbers. --no-vad skips Silero speech detection (the speech-gated fields read -inf) and --no-stereo skips the correlation pass, both for faster unattended batches.

Output format

Every subcommand emits pretty-printed JSON with sorted keys, so diffs across runs stay deterministic. Non-finite floats (silence → -∞) render as "-inf" strings rather than blowing up the encoder.

Shortcuts (App Intents)

Specula surfaces eight App Intents that the system Shortcuts app and Siri can call directly. Same engine as the CLI; same numbers. No file-handling shell glue required, output is chainable. The four core actions ship in two variants each (file-input and text-path) so they fit whichever Shortcuts workflow shape you have.

macOS Shortcuts editor with the Get Measurements action and a downstream branch on Integrated LUFS.
Shortcuts. Chain measurements into automations without parsing JSON.
ActionFile-inputPath-inputReturns
MeasurementsGet MeasurementsGet Measurements (from Path)Specula Measurements (I, M, S, LRA, TP, SP, DR, Speech-Gated, Speech %)
Numeric compareCompare FilesCompare Files (from Paths)Specula Comparison (both sides + B − A deltas)
Audio compareGet Compare DiffGet Compare Diff (from Paths)residual WAV file (A − gainB·B)
ReportGet ReportGet Report (from Path)report file (JSON / HTML / PDF)

Each action's behaviour:

Binding the file input

Each intent's file parameter shows a Choose button followed by a small menu. The Choose button opens a static file picker (one-off use). The menu is where the magic-variable picker lives - pick Shortcut Input to bind the file that came in from a Finder Quick Action / Share Sheet / prior action's output. Don't pick "Clear" - that wipes the binding.

Path-input variants

For text-path workflows (clipboard text, Ask for Text, paths pasted from a shell), each intent has a sibling (from Path) variant that takes the audio file as a text string instead of a File. The text variants accept:

Quotes are stripped automatically; tilde gets expanded against your home directory. So a typical clipboard-driven shortcut is just Get Clipboard → Get Report (from Path) with the clipboard variable bound to Audio File Path.

Spotlight phrases

Each file-input action also answers to a second phrasing: "Get measurements from Specula", "Compare audio in Specula", "Get Specula report".

Type one into Spotlight (⌘Space). Intents with required file parameters will prompt for the file via a picker; for variable-driven workflows the Shortcuts app is the better surface.

The six measurement, compare, and report intents default to speech detection off. Flip the "Enable Speech Detection" toggle in the Shortcut step when you need speech-gated metrics; the first speech-gated run may take a moment as the model loads. The two Get Compare Diff intents return a pure residual, so they carry no speech toggle.

Finder Services (right-click → Services)

Three services install automatically and appear under Services when you right-click one or more audio files in Finder. They work out of the box, with nothing to build in Shortcuts or configure in Automator.

If the entries don't appear right after first install, log out and back in (or reboot) so Launch Services indexes them.


20. Settings #

Open with ⌘,. Nine tabs: Layout · Spectrogram · FFT / Analysis · Speech · Targets · Report · Updates · License · Acknowledgements.

Settings Spectrogram tab with colormap, frequency scale, and floor and ceiling sliders.
Settings. Colormap, frequency scale, and dB range for the spectrogram.

Layout

Defaults for which sections show on launch:

Each section can still be toggled live with ⌃1-⌃5.

Liquid Glass chrome (macOS 26+ only). On macOS 26 the transport bar, file dock, info panel, seek bar, and section-toggle strip use the system Liquid Glass material so they feel native alongside Finder, Safari, and other macOS 26 apps. Default on. Turn off in Settings → Layout → Appearance for solid dark chrome, useful when recording screenshots or screencasts where the depth-aware refraction would change with whatever's underneath the window, or simply if you prefer the flat look. The toggle is hidden on macOS 14-25 (Liquid Glass isn't available there; chrome stays solid regardless).

With the toggle on, each chrome strip renders as a rounded floating pill (radius 6, with a small dark gutter between strips and the window edges), the macOS 26 native pattern Music, Photos, and Calendar use. The Liquid Glass material's edge highlights become the rim of each pill rather than horizontal lines crossing the window. The seek bar's accent fill also picks up the glass material when the toggle is on (otherwise solid). With the toggle off, chrome is edge-to-edge square strips; the rounding belongs to the Liquid Glass look.

The seek bar's track and fill are 8 px tall regardless of the toggle, a substantial slider lane that reads well whether it's filled with glass or solid colour.

Spectrogram

All six values (colormap, frequency scale, Floor, Ceiling, FFT overlap, auto-compute on load) persist across launches.

FFT / Analysis

Speech

Edits re-run detection in about half a second and the new history feeds back into the live loudness path. If you have manual edits in Dialogue mode when you move a slider, they're pushed onto the Dialogue mode undo stack first so you can ⌘Z back to them. See Speech-gated loudness + Dialogue mode for the full workflow.

Report


21. Complete keyboard reference #

File

⌘OOpen audio file
⌘,Open Preferences

Playback

SpacePlay / pause
⌘.Stop and return to start
/ Skip ±5 seconds
⇧← / ⇧→Skip ±1 second
⌘←Jump to start
LToggle loop

Playback rate

[Step rate down
]Step rate up
\Reset rate to 1×

Waveform navigation (mouse / trackpad)

GestureAction
Click / plain dragMove the playhead (drag scrubs)
-dragCreate a selection region
Drag a selection's edge / bodyResize / move it
Scroll (wheel or two-finger)Horizontal zoom (cursor-anchored)
-scrollPan horizontally
-scrollAmplitude (vertical) zoom

Selection & analysis

EscClear selection (waveform + FFT)
⌘ASelect all
⌘ReturnAnalyse selection (offline)

Modes (mutually exclusive)

⌘1Enter / exit Edit mode (accent: amber)
⌘2Enter / exit Compare mode (accent: sky-blue)
⌘3Enter / exit Chapter mode (accent: green)
⌘4Enter / exit Dialogue mode (accent: rose)

Compare mode

1 - 6Switch to compare slot A - F (bare digits, Compare mode only)
DToggle Listen to Diff on the active slot (bare D, Compare mode only)

Edit mode

These operate only in Edit mode (so they don't shadow the transport keys) and are listed in the Edit menu. The first group needs a selection; the playhead group acts at the cursor. Defaults: Cut uses a 10 ms equal-power crossfade; the fade keys use the equal-power curve.

⌘Z / ⇧⌘ZUndo / redo last edit
⇧⌘SSave edited audio (new file)
Cut (remove and crossfade-join) - selection
⌥⌫Silence selection (replace, keep length) - selection
⌘TTrim to selection (keep it, discard the rest) - selection
⌘F / ⌥⌘FFade In / Fade Out - selection
⌃↑ / ⌃↓Gain +1 / −1 dB - selection
⌘⌫Cut to start (remove start → playhead) - playhead
⇧⌘⌫Cut to end (remove playhead → end) - playhead
⌥[ / ⌥]Fade in to / out from the playhead - playhead

Chapter mode

⌃6Toggle chapter ribbon section visibility

Dialogue mode

IMark In (anchor at playhead, or set start of selected region)
OMark Out (close pending region, or set end of selected region)
SSplit selected region at playhead
DeleteDelete selected region
⌘Z / ⇧⌘ZUndo / redo region edit (50-step history)

Export

⇧⌘EExport JSON report
⌥⌘EExport PDF report
⌃⌘EExport diff audio (WAV)

View - toggle sections

⌃1Toggle Waveform
⌃2Toggle Loudness Curve
⌃3Toggle Spectrogram
⌃4Toggle FFT
⌃5Toggle Stereo Width
⌃6Toggle Chapter Ribbon

View - focus one section (hide others / restore)

⌃⌥1Focus Waveform
⌃⌥2Focus Loudness Curve
⌃⌥3Focus Spectrogram
⌃⌥4Focus FFT
⌃⌥5Focus Stereo Width

22. Standards #

StandardWhere it appears
ITU-R BS.1770-4K-weighting filter, 100 ms gating blocks, dual gating, 4× true-peak oversampling
EBU R128Momentary / Short-term / Integrated LUFS, LRA, True Peak, recommended programme loudness
Silero VAD (MIT)Speech detection model
FluidAudio (Apache 2.0)Swift wrapper for Silero on Apple platforms

Specula's loudness implementation is validated against the EBU R128 loudness test set (EBU Tech 3341 / 3342), to the EBU Tech 3341 ±0.1 LU tolerance. The underlying measurement algorithm is ITU-R BS.1770-4, which EBU R128 builds on.

EBU R 128 loudness logo EBU R 128
loudness compliant
↑ Back to top