Specula · User guide
This page is the canonical reference. Skim the table of contents to jump to a feature, or read top-to-bottom for the full tour.
⌘O opens the standard file picker. Multiple selection is allowed: the first file becomes the primary and the rest open as compare slots. Specula loads each file into RAM as a Float32 non-interleaved buffer. Waveform peaks and clip regions are computed during load.
Other ways to open files:
open path/to/file.wav from Terminal works for the same reason: it routes through Launch Services.
Opening files always starts a clean comparison. If two files were already in compare, opening a single new file replaces the whole set; opening three new files replaces the whole set with those three. To append to the current comparison without starting fresh, use File → Add Files to Compare… (⇧⌘O) or the Compare button in the file dock.
| Space | Play / pause |
| ⌘. | Stop and return to start |
| ← / → | Skip ±5 seconds |
| ⇧← / ⇧→ | Skip ±1 second |
| ⌘← | Jump to start |
| L | Toggle loop. Loops the current selection if one exists, otherwise the whole file. |
Next to Loop in the transport bar is a small ↺ Hold play-start toggle. Default: off. Pause stops playback at the current position (classic transport behaviour). Turn it on (the icon fills, accent-tinted) and Pause (Space or the play / pause toggle) returns the playhead to where Play was last started, or where you last manually placed the cursor with a waveform click. Tap Space again and you re-play the same passage from the same starting point. Useful when you're A/B'ing a passage by ear without wanting to re-position the cursor each time.
In compare mode the toggle also makes nudge buttons (±1 sample / ±1 ms) re-seek to that anchor, so the comparison point doesn't drift forward while you click alignment buttons.
Stop is unchanged either way: it always rewinds to the beginning of the file.
Variable rate from 0.25× to 2×, with optional pitch preservation (timestretch).
| [ | Step rate down |
| ] | Step rate up |
| \ | Reset rate to 1× |
The Transport bar exposes the rate menu (0.25× · 0.5× · 0.75× · 1× · 1.25× · 1.5× · 2×) and the "preserve pitch" toggle.
The Transport bar shows the current output as a compact Output: <Device Name> pill on the right side. Click it (or press ⌥⌘O, or pick Window → Output Panel) to open the Output window. The window holds a device dropdown (with channel count next to each name; the Refresh Devices item lives at the bottom of the menu) above the per-channel routing matrix. Switching device rebuilds the audio engine. Specula is happy with any device, from built-in speakers to multichannel pro audio interfaces.
The Output window is hidden by default and remembers its size and position across launches. On macOS 26 the window itself uses Liquid Glass, so it reads as a floating utility panel over the main app (the same Liquid Glass opt-out toggle covers it). Routing details are in Channel routing & monitor mode.
Renders all channels of the loaded file simultaneously, scaled to fill the section height. Each channel uses its layout colour - mono, stereo, quad, 5.1, 7.1 each have a distinct palette.
A time ruler runs above the waveform. Click the ruler-mode toggle in the waveform control strip to swap between Time (HH:MM:SS.mmm) and Samples (frame count). Useful when you need to align two takes by sample.
The control strip below the waveform has an amplitude zoom control (−/+ buttons) - useful for inspecting low-level detail without it being lost in the noise floor. Default is 1× (full ±0 dBFS range).
During load, Specula scans every channel for samples whose absolute value is ≥ 1.0 (at or beyond full scale). Each detected clip region is highlighted in red on the waveform and counted in the Channel Info sidebar.
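A minimal sketch of a clip scan of this kind, for one channel. The function name is hypothetical, and grouping consecutive clipped samples into a single region is an assumption; the guide does not say how Specula merges neighbouring clips.

```python
from typing import List, Sequence, Tuple


def find_clip_regions(samples: Sequence[float],
                      threshold: float = 1.0) -> List[Tuple[int, int]]:
    """Return (start, end_exclusive) index pairs for runs where |sample| >= threshold.

    Assumption: a "clip region" is a run of consecutive clipped samples.
    """
    regions: List[Tuple[int, int]] = []
    start = None
    for i, s in enumerate(samples):
        clipped = abs(s) >= threshold
        if clipped and start is None:
            start = i                      # a new run begins here
        elif not clipped and start is not None:
            regions.append((start, i))     # the run ended on the previous sample
            start = None
    if start is not None:                  # run extends to the end of the buffer
        regions.append((start, len(samples)))
    return regions
```

The length of the returned list is the count a sidebar could display; the index pairs are what a waveform view would shade.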
The waveform draws horizontal grid lines at 0, −3, −6, −9, −12, −18, −24, −30, −36, −48, −60, −72, −84, −96 dBFS. Lines auto-cull when the channel is too short to label them legibly.
After file load, Silero VAD runs in the background and produces a per-100 ms speech / non-speech timeline. The waveform's control strip has an SG toggle - when enabled, detected speech blocks are shaded in teal. Useful for visually verifying the VAD result before relying on speech-gated loudness numbers.
Click a channel label in the Channels sidebar to mute that channel for playback. Useful for soloing one side of a stereo file or isolating a single channel of a multi-channel master.
A selection is a [start..end] range on the waveform. Selections drive offline analysis, edit-mode operations, and loop playback.
| Esc | Clear selection (waveform + FFT) |
| ⌘A | Select all |
| ⌘Return | Run offline selection analysis |
Most of the offline numbers populate the moment a file finishes loading, with no key press. A detached background pass over the whole file produces:
These numbers feed the info panel's Offline tab and both the PDF and JSON reports immediately after open. A typical music track finishes the pass in a few hundred milliseconds. Long-form audio takes proportionally longer.
⌘Return reruns the full analysis pipeline on the current selection rather than the whole file. Use it when you want metrics scoped to a specific region (a chorus, a problem section, a candidate edit point), or when you want the averaged FFT spectrum view.
Selection-scoped output:
A spinner in the Loudness sidebar indicates the analysis is in flight; results land when it completes (typically ~1-3 s for a multi-minute selection). Live measurement keeps running in the background regardless.
⌘Return is not required to populate true peak or per-channel stats on a freshly loaded file: those land in the Offline tab and the report automatically. Reach for ⌘Return when the question is about a region of the file, or when you want the averaged spectrum on the FFT view.
The FFT spectrum view has its own horizontal selection on the frequency axis - and it's an active playback filter, not a measurement readout. See FFT spectrum for details. Esc clears the FFT selection too.
Specula implements full ITU-R BS.1770-4 / EBU R128:
| Metric | Description |
|---|---|
| Momentary (M) | 400 ms window, updates 10×/sec |
| Short-term (S) | 3 s window |
| Integrated (I) | Whole-file integrated LUFS, dual-gated. Resets on file load and on play-from-zero. |
| LRA | The 10-95 percentile range of the gated short-term distribution |
| True Peak (dBTP) | Highest 4×-oversampled inter-sample peak. Turns red above −1 dBTP. |
| Sample Peak (dBFS) | Highest absolute sample value across the whole file. |
The Loudness section in the sidebar has a [Live] [Offline] segmented control at the top. Live shows the real-time measurement updating during playback. Offline shows the whole-file results from the load-time analysis pass, populated immediately after the file finishes loading; running ⌘Return on a selection narrows the same metrics to that selection.
The same rows appear in both modes from the chosen source: Integrated, LRA, Speech-Gated, Speech %, Momentary / Short-term (labelled Max Momentary / Max Short-term in Offline since they're whole-file or whole-selection maxima rather than instantaneous), True Peak, Sample Peak, Loudness Targets. Offline adds Time range, Dynamic Range, Headroom, Noise Floor (Podcast mode), and per-channel Peak / RMS / Crest / DC below. The choice persists across launches. Live measurement keeps running in the background regardless of mode.
The info panel has a top-level Mode picker right under the File section. Mode is the per-file analysis context - switch it to reframe the same file as a different deliverable. It drives both the Loudness Targets panel and which metric rows appear in the sidebar:
| Metric | Music | Podcast | VOD | Broadcast |
|---|---|---|---|---|
| Integrated, LRA, M/S, True/Sample Peak, Targets, Per-channel | ✓ | ✓ | ✓ | ✓ |
| Speech-Gated I, Speech % | - | ✓ | ✓ | ✓ |
| Noise Floor (ACX) | - | ✓ | - | - |
| Dynamic Range, Headroom | ✓ | - | - | ✓ |
Settings → Targets is a per-mode catalog editor - four sections, one per mode, each with its target toggles. Use it to decide which targets show up next to a loaded file when that mode is active.
When integrated LUFS is finite, Specula shows the Loudness Targets panel: one row per selected target with the verdict for the active mode. Pick the mode and up to four targets per mode in Settings → Targets.
| Mode | Treatment | Source |
|---|---|---|
| Music / Streaming | Penalty: shows the gain the platform will apply (e.g. Spotify −4 dB) | Integrated LUFS |
| Podcast / Spoken Word | Penalty for streaming podcast targets; compliance for ACX | Dialog-gated (streaming) / RMS (ACX) |
| VOD | Compliance: green dot or FAIL L / FAIL TP against the dialog-gated band | Dialog-gated LUFS |
| Broadcast | Compliance against the integrated band and the true-peak ceiling | Integrated LUFS |
Music targets report the gain the platform will apply rather than a hard verdict. A mix at −10 LUFS reads "Spotify −4 dB" (the platform will turn it down 4 dB to hit its −14 LUFS reference). A quiet mix at −18 LUFS reads "+4 dB" on symmetric platforms, "as-is" on asymmetric ones (see below). The dot is green when the gain is within ±1.5 dB of zero, yellow within ±4 dB, orange beyond. A separate triangle warns when the true peak breaches the platform's ceiling.
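The penalty arithmetic above can be sketched in a few lines. The function names are hypothetical, and treating the ±1.5 dB and ±4 dB edges as inclusive is an assumption; the guide only says "within".

```python
def platform_gain_db(integrated_lufs: float, reference_lufs: float,
                     turns_down_only: bool = False) -> float:
    """Gain the platform would apply to reach its reference.

    turns_down_only models the asymmetric platforms that never boost
    quiet material ("as-is").
    """
    gain = reference_lufs - integrated_lufs
    if turns_down_only and gain > 0:
        return 0.0
    return gain


def dot_colour(gain_db: float) -> str:
    """Colour bucket for the penalty dot (inclusive edges are an assumption)."""
    magnitude = abs(gain_db)
    if magnitude <= 1.5:
        return "green"
    if magnitude <= 4.0:
        return "yellow"
    return "orange"
```

With these definitions, a −10 LUFS mix against a −14 LUFS reference gives −4 dB, and a −18 LUFS mix gives +4 dB on a symmetric platform or 0 dB on a turns-down-only one.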
VOD and broadcast targets are hard pass/fail. The dot is green when the loudness sits within the tolerance band AND the true peak is at or below the ceiling. FAIL L means the loudness is outside the band; FAIL TP means the loudness is in band but the true peak is over the ceiling; FAIL S↑ means the maximum short-term loudness exceeded the spec's short-term ceiling (used by EBU R128 S1); FAIL NF means the noise floor exceeded the spec's RMS ceiling (used by ACX). Loudness fails take precedence; the loudness reason is reported first even when several criteria are off.
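A sketch of the precedence described above, for readers who want to reproduce the verdict in a script. The function and parameter names are hypothetical. Only "loudness first, then true peak" comes from the text; the order of the short-term and noise-floor checks after that, and the "PASS" label for the green dot, are assumptions.

```python
from typing import Optional


def hard_spec_verdict(loudness: float, reference: float, tolerance_lu: float,
                      true_peak_dbtp: float, ceiling_dbtp: float,
                      max_short_term: Optional[float] = None,
                      short_term_ceiling: Optional[float] = None,
                      noise_floor_db: Optional[float] = None,
                      noise_floor_ceiling: Optional[float] = None) -> str:
    """Return the first failing criterion, or "PASS" when all are met."""
    # Loudness takes precedence over every other criterion.
    if abs(loudness - reference) > tolerance_lu:
        return "FAIL L"
    # Loudness is in band from here on, so a peak overage is FAIL TP.
    if true_peak_dbtp > ceiling_dbtp:
        return "FAIL TP"
    # Optional criteria: only checked when the spec defines a limit.
    if (short_term_ceiling is not None and max_short_term is not None
            and max_short_term > short_term_ceiling):
        return "FAIL S↑"
    if (noise_floor_ceiling is not None and noise_floor_db is not None
            and noise_floor_db > noise_floor_ceiling):
        return "FAIL NF"
    return "PASS"
```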
Dialog-gated targets (Netflix and peers, streaming podcasts, ATSC A/85) evaluate against speech-gated LUFS. Files with no detected speech show "no speech" rather than a misleading verdict. Target reference lines lead with dialog · whenever a target reads from the speech-gated path, so you can scan the measurement source without parsing LUFS-vs-LKFS unit suffixes.
ACX noise floor. Specula measures the dB RMS across all VAD-classified non-speech samples and shows it in the Speech-Gated section of the Info panel. ACX rejects audiobooks above −60 dB RMS, so the reading turns red when it exceeds that. The ACX preset in the loudness-targets panel automatically picks up the measurement and shows FAIL NF when the noise floor breaches the ceiling. Files with no non-speech samples show −∞. The noise floor is undefined in that case and the criterion passes trivially.
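A rough sketch of the noise-floor measurement described above. The function names are hypothetical, and the per-sample speech flags are a simplification: Specula's VAD history is per 100 ms block, so a real implementation would expand block flags to samples first.

```python
import math
from typing import Sequence


def noise_floor_db_rms(samples: Sequence[float],
                       is_speech: Sequence[bool]) -> float:
    """dB RMS over the samples flagged as non-speech; -inf when there are none."""
    non_speech = [s for s, speech in zip(samples, is_speech) if not speech]
    if not non_speech:
        return float("-inf")          # undefined case, shown as −∞
    mean_square = sum(s * s for s in non_speech) / len(non_speech)
    if mean_square == 0.0:
        return float("-inf")          # digital silence
    return 10.0 * math.log10(mean_square)


def acx_noise_floor_passes(noise_floor_db: float,
                           ceiling_db: float = -60.0) -> bool:
    """True when the noise floor is at or below the ACX ceiling."""
    return noise_floor_db <= ceiling_db
```

Because −∞ compares below any finite ceiling, the "no non-speech samples" case passes without a special branch.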
Each row in the Loudness Targets panel maps to a verdict object on the corresponding target in the JSON export, so downstream scripts can branch on pass / fail without recomputing against the target spec. The PDF and HTML reports render the same verdict as the status badge above each target card.
| Field | Type | When populated |
|---|---|---|
| status | compliant · penalty · notCompliant · unavailable | Always. |
| actualLUFS | number | The integrated or dialog-gated reading the verdict was computed against. |
| penaltyGainDB | number | Streaming-platform targets (Music + streaming Podcast). |
| penaltyApplies | boolean | Streaming-platform targets. False for "as-is" on asymmetric platforms. |
| truePeakClean | boolean | Streaming-platform targets. False when the master's true peak breaches the platform's ceiling. |
| nonComplianceReason | loudness · truePeak · shortTerm · noiseFloor | Hard-spec targets (VOD / Broadcast / ACX) when status is notCompliant. |
| nonComplianceActual | number | The actual reading on the failing criterion. |
| nonComplianceLimit | number | The ceiling or band edge the failing criterion exceeded. |
JSON consumers that ignore the field keep working. Useful for batch QC scripts, CI gates on render farms, and Shortcuts workflows that branch on whether a master will pass a given platform without re-running the math.
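As an illustration of a batch QC gate, the sketch below collects every target whose verdict is notCompliant. The verdict field names (status, nonComplianceReason) come from the table above; the surrounding layout is an assumption made for the example. In particular, the top-level "targets" array and the per-target "name" and "verdict" keys are hypothetical, so check them against a real export before relying on this.

```python
import json
from typing import List, Optional, Tuple


def failing_targets(report_json: str) -> List[Tuple[Optional[str], Optional[str]]]:
    """Return (target name, nonComplianceReason) for each notCompliant verdict.

    Assumed layout (hypothetical): {"targets": [{"name": ..., "verdict": {...}}]}
    """
    report = json.loads(report_json)
    failures = []
    for target in report.get("targets", []):
        verdict = target.get("verdict")
        if not verdict:
            continue  # older exports without the verdict object keep working
        if verdict.get("status") == "notCompliant":
            failures.append((target.get("name"),
                             verdict.get("nonComplianceReason")))
    return failures
```

A CI step could exit non-zero whenever the returned list is non-empty.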
Different specs measure loudness differently and Specula computes both in parallel so the right one feeds each target.
The Loudness section shows both values side by side. Each target row's reference line is prefixed with dialog · when the target evaluates against the dialog-gated reading. Files that contain no detected speech show the dialog-gated reading as "-" and dialog-gated target verdicts as "no speech".
Not every streaming platform applies the gain it computes. Apple Music's Sound Check, YouTube, and Tidal only turn loud tracks down; they leave quieter-than-reference tracks at their original level. So if your master sits at −18 LUFS against Apple Music's −16 reference, the platform plays it at −18, not boosted to −16. Specula reflects this:
Settings → Targets surfaces the behaviour in each target's subtitle ("penalty, turns down only" vs "penalty"). The match-button tooltip on each row also states which measurement source the match will use.
Each target row has a small waveform-circle toggle on the right. Press it and Specula matches playback to that target:
target reference − source LUFS, where source is integrated LUFS for integrated targets (most music + broadcast) or dialog-gated LUFS for dialog-gated targets (Netflix and peers, ATSC A/85, streaming podcasts).
dBTP ceiling. Reads the 4× oversampled inter-sample peak (ITU-R BS.1770 polyphase FIR), so the ceiling is enforced against the analog-reconstructed signal rather than just the raw sample magnitude. The limiter only acts when the gain would push peaks above the ceiling; below it the limiter is transparent.
Hit play and you'll hear the file the way the platform plays it. Press the toggle again (or pick another target) to release. Nothing in the loaded buffer is modified - match is purely a playback gain stage with a limiter after it.
The match follows the loudness display mode: switching Live → Offline (or back) while a match is active recomputes the gain against the new source. If a matched target is no longer in the active mode's selected targets (you switched mode, or unticked it in Settings), the match auto-disables so you can never apply gain you can no longer see the verdict for.
Expect 5 ms of playback latency while match is active. Dialog-gated targets can be matched too - the gain is computed against the speech-gated reading. A target whose reading is unavailable is disabled, with a tooltip explaining which reading is missing (typically "no detected speech" for dialog-gated targets on speech-free files).
Defaults are marked ✓.
| Mode | Target | Reference | Tolerance | True-peak ceiling |
|---|---|---|---|---|
| Music | Spotify ✓ | −14 LUFS | penalty (symmetric) | −1 dBTP |
| Music | Apple Music ✓ | −16 LUFS | penalty (turns down only) | −1 dBTP |
| Music | YouTube ✓ | −14 LUFS | penalty (turns down only) | −1 dBTP |
| Music | Tidal ✓ | −14 LUFS | penalty (turns down only) | −1 dBTP |
| Music | Amazon Music | −14 LUFS | penalty (symmetric) | −2 dBTP |
| Music | Deezer | −15 LUFS | penalty (symmetric) | −1 dBTP |
| Music | SoundCloud | −14 LUFS | penalty (symmetric) | −1 dBTP |
| Music | AES TD1008 (streaming) | −18 LUFS | penalty (symmetric, vendor-neutral) | −1 dBTP |
| Podcast | Apple Podcasts ✓ | −16 LKFS dialog-gated | penalty | −1 dBTP |
| Podcast | Spotify (Podcast) ✓ | −14 LKFS dialog-gated | penalty | −1 dBTP |
| Podcast | ACX (Audiobook) | −20.5 dB RMS | ±2.5 dB | −3 dBTP (plus noise floor) |
| VOD | Netflix ✓ | −27 LKFS dialog-gated | ±2 LU | −2 dBTP |
| VOD | Prime Video ✓ | −27 LKFS dialog-gated | ±2 LU | −2 dBTP |
| VOD | Apple TV+ ✓ | −27 LKFS dialog-gated | ±2 LU | −2 dBTP |
| VOD | Disney+ | −27 LKFS dialog-gated | ±2 LU | −2 dBTP |
| VOD | Max ✓ | −27 LKFS dialog-gated | ±2 LU | −2 dBTP |
| Broadcast | EBU R128 (EU) ✓ | −23 LUFS | ±0.5 LU | −1 dBTP |
| Broadcast | EBU R128 (EU, live) | −23 LUFS | ±1 LU | −1 dBTP |
| Broadcast | EBU R128 S1 (EU, short-form) | −23 LUFS | ±0.2 LU + Max Short-term ≤ −18 LUFS | −1 dBTP |
| Broadcast | ATSC A/85 (CALM Act) ✓ | −24 LKFS dialog-gated | ±2 LU | −2 dBTP |
| Broadcast | ARIB TR-B32 (Japan) | −24 LUFS | ±1 LU | −1 dBTP |
| Broadcast | OP-59 (Australia) | −24 LUFS | ±1 LU | −1 dBTP |
Where Match previews a target by ear, Normalize commits to it. Integrated targets (music streaming, broadcast) and dialog-gated targets (VOD, streaming podcasts) carry a second button on the row, the up-to-line icon next to the Match toggle. Press it and Specula opens Edit mode with the Normalize section pre-filled to that target, then click Apply to normalize the loaded buffer.
The Normalize panel states its Basis so you always know which loudness is being targeted, and shows the move before you commit:
dBTP true-peak limiter ceiling, pre-filled to the target's.
This is a normal edit: it lands on the 16-level undo stack, and exporting writes a new file rather than overwriting the source. Integrated targets re-measure and iterate to the reference; dialog-gated targets apply the uniform gain that lands the speech-gated reading on the reference. Both finish with the same two-pass true-peak limiter at the ceiling.
The ACX RMS target is the one exception: it stays compliance-only with no Normalize button. ACX is a hard delivery spec whose noise-floor requirement a gain can't satisfy (raising RMS into the window raises the noise floor with it), so normalizing its level alone could read as "now compliant" when it isn't. The button is also disabled (with a tooltip) until a loudness reading is available, and unavailable in multi-file Compare, since Edit and Compare are mutually exclusive.
Normalize is the destructive counterpart to Match: Match is a live playback gain you can release at any time, Normalize bakes the gain into a new file. Both target the same basis the row is scored against and share the same true-peak limiter math; the difference is whether the loaded buffer is rewritten.
Where Normalize fixes a loudness miss (FAIL L), Limit TP fixes a true-peak overage (FAIL TP). When the file's true peak sits above a target's ceiling, that row shows a Limit TP button (the waveform-pulse icon) alongside Normalize. Press it and Specula opens Edit mode with the Limit TP section pre-filled to that target's ceiling; click Apply to cap the inter-sample peaks. Loudness is left untouched, so this is the right fix when the level is already where you want it and only the peaks breach the ceiling. It is the one-click counterpart to the FAIL L → Normalize button.
The button is contextual: it appears only when there's a true peak over the ceiling to fix. That covers a hard FAIL TP (loudness in band, peak over), a loudness-compliant row carrying the orange over-ceiling triangle, and a music target whose peaks still clip its ceiling. It's offered on any target type, ACX included, because every target carries a true-peak ceiling and capping peaks never raises the noise floor (unlike Normalize). FAIL S↑ and FAIL NF get no fix button, since no peak limit corrects a short-term or noise-floor miss. Like Normalize, it lands on the undo stack, saves as a new file, and is unavailable in multi-file Compare.
Specula uses a two-tier voice activity detection pipeline to compute integrated loudness over speech-only blocks. This matters for podcast / dialog / film mixing where you want to ignore room tone, music beds, and silence.
Runs the Silero VAD model via FluidAudio. On file load, the entire file is downsampled to 16 kHz mono and classified. The resulting per-100 ms speech / non-speech history feeds the loudness measurement. Selection analysis re-runs Silero on just the selection.
Used when Silero is unavailable (it fails to load or returns an error). Four spectral features:
"-" is shown for content with < 0.5 % speech.
Three sliders bias the detector for the content you work with. Edits trigger a re-analysis in roughly half a second, and the new history feeds straight back into the loudness measurement.
For programme material where the VAD's false positives or misses matter, ⌘4 (or the Dialogue button on the transport bar) enters Dialogue authoring mode - one of the four modal authoring modes (Edit, Compare, Chapter, Dialogue) the app exposes, mutually exclusive with the other three. Every detected region is visible on the waveform while the mode is active (the SG overlay is forced on so you don't need to discover the toggle).
Region painting on the waveform:
Toolbar:
Settings → Speech slider edits always re-run detection. If you have manual edits when you move a slider, they're pushed onto the undo stack first so you can ⌘Z back to them.
While Dialogue mode is active, each region's waveform tint tells you where it came from:
Edited regions are auto-saved to <audio>.dlg.json next to the audio file (debounced 500 ms after the last edit). Sidecar files round-trip cleanly: re-opening the same audio re-applies your edits, and the schema is versioned (currently v1) so future format changes won't silently corrupt old work.
The regions aren't only a measurement input. Once they're right, switch to Edit mode (⌘1) and use Level Dialogue to act on them. It ducks the room tone in the non-speech regions (a downward expander with a fast attack and slow release, so word onsets and tails stay clean) and can optionally lift the dialogue to a target in the same pass. Gating the silence before applying the makeup gain is what lets dialogue normalize without the noise floor riding up with it, the fix for a common ACX rejection. When you lift to a target you also set a true-peak ceiling, and the makeup boost is capped there by the same limiter Normalize uses. It's a gate, not a denoiser, so it helps a marginal room tone rather than a hissy recording.
A dedicated section that plots per-100 ms momentary and short-term LUFS over time, with the time axis aligned to the waveform's. Toggle with ⌃2.
After offline selection analysis, the Loudness Curve view fills in for the analysed range; live playback adds new samples as they're measured.
The exported JSON and PDF reports plot three traces: Momentary (yellow), Short-term (blue), and a running Integrated trace (green) that converges to the file's final integrated reading at the tail. The third trace is the easiest way to see how a master "settled" against its target during the run.
Set programme-loudness thresholds and watch the waveform highlight every block that exceeds them. In the Loudness Curve / control strip:
Any 100 ms block above its threshold is highlighted on the waveform in real time during playback and after offline selection analysis. Useful for spotting the exact moment a master clips a streaming target.
The Violations toggle in Settings → Report controls whether violations are included in JSON / PDF exports.
Real-time logarithmic-frequency spectrum from 20 Hz - 20 kHz. Toggle with ⌃4.
Five windows, switchable from Settings → FFT / Analysis:
| Window | Side-lobe rejection | Best for |
|---|---|---|
| Hann | −31 dB | General purpose. Good balance. |
| Hamming | (slightly higher) | Slightly narrower main lobe than Hann; higher far side-lobes. |
| Blackman | −58 dB | Excellent side-lobe rejection. Slightly wider main lobe. |
| Blackman-Harris | −92 dB | Best side-lobe rejection. Widest main lobe. Closely-spaced harmonics. |
| Flat Top | (very wide main lobe) | Most accurate amplitude reading. Calibration & level measurement. |
Hover anywhere on the spectrum and the cursor reads out frequency in Hz, magnitude in dB, and the closest musical note + cents deviation. Toggle on/off via the NOTES button in the FFT control strip.
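The note readout can be reproduced with the standard equal-temperament conversion. This is a sketch, not Specula's code: the function name is hypothetical, and the A4 = 440 Hz tuning reference and sharp-only note spelling are assumptions.

```python
import math
from typing import Tuple

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def nearest_note(freq_hz: float, a4_hz: float = 440.0) -> Tuple[str, float]:
    """Return (note name with octave, cents deviation) for a frequency.

    Uses MIDI numbering, where A4 is note 69 and each semitone is 100 cents.
    """
    midi_exact = 69 + 12 * math.log2(freq_hz / a4_hz)
    midi = round(midi_exact)                 # closest semitone
    cents = (midi_exact - midi) * 100.0      # positive = sharp, negative = flat
    name = NOTE_NAMES[midi % 12] + str(midi // 12 - 1)
    return name, cents
```

For example, 450 Hz lands on A4 and reads roughly 39 cents sharp.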
⌘-drag horizontally on the spectrum to capture a frequency range. The selection is an active playback filter, not a measurement readout - playback is routed through it so you can hear what's inside or outside the band.
A BAND / NOTCH toggle in the FFT control strip switches the filter mode:
The filter is live - adjust the selection edges and the audio responds in real time. Drag the body of the selection to slide the band across the spectrum while keeping its width.
Stereo files use a highpass + lowpass pair for BAND and a parametric −40 dB cut for NOTCH; multichannel files apply the same filtering per channel.
Esc clears the selection and bypasses the filter.
The FFT control strip exposes a smoothing slider - exponential averaging across frames. Higher values produce a calmer trace; lower values respond faster.
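Exponential averaging across frames can be sketched as a one-pole blend per frequency bin. The function name is hypothetical, and both the mapping from the slider to a 0..1 coefficient and the choice to average the dB values directly are assumptions for illustration.

```python
from typing import List, Sequence


def smooth_spectrum(previous: Sequence[float], current: Sequence[float],
                    smoothing: float) -> List[float]:
    """Blend the previous smoothed frame with the new frame, bin by bin.

    smoothing = 0.0 passes the new frame through unchanged;
    values closer to 1.0 give a calmer, slower trace.
    """
    if not 0.0 <= smoothing < 1.0:
        raise ValueError("smoothing must be in [0, 1)")
    return [smoothing * p + (1.0 - smoothing) * c
            for p, c in zip(previous, current)]
```

Feeding each result back in as the next call's previous frame gives the running average.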
Pick which channel feeds the FFT (or use a downmix). Useful for inspecting one channel of a multichannel file in isolation.
After ⌘Return offline analysis, the FFT view switches to the averaged spectrum of the selection - a much smoother, more accurate read than the rolling live spectrum. Clearing the selection (Esc) returns to live mode.
While playback is stopped, clicking anywhere on the waveform or spectrogram recomputes the FFT for that position and updates the spectrum view to match. The FFT window ends at the cursor, so the spectrum you see lines up with the spectrogram column under the playhead. Useful for inspecting a specific moment without scrubbing: click, read, click again, read. Works for keyboard skips too (←, →, ⇧←, ⇧→).
Rolling time-frequency display. Toggle with ⌃3. All settings live in Settings → Spectrogram.
Two sliders set the colour-mapping window:
Anything quieter than the floor is mapped to the bottom colour; anything louder than the ceiling clips to the top colour. Tighten the window to highlight subtle detail; widen it to see the full dynamic range.
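The floor and ceiling behaviour amounts to a clamp followed by a linear rescale. A minimal sketch, with a hypothetical function name and the assumption that the colormap is indexed by a 0..1 position:

```python
def colour_position(level_db: float, floor_db: float, ceiling_db: float) -> float:
    """Map a dB level to a 0..1 colormap position, clamped at both ends."""
    if ceiling_db <= floor_db:
        raise ValueError("ceiling must be above floor")
    t = (level_db - floor_db) / (ceiling_db - floor_db)
    return min(1.0, max(0.0, t))   # below floor -> 0.0, above ceiling -> 1.0
```

Narrowing the window spreads the same colour range over fewer dB, which is why subtle detail becomes easier to see.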
Each spectrogram column is one FFT. Overlap controls how often that FFT is recomputed per second, which sets the time-axis density. Higher overlap means columns are closer together: less blocky at high zoom, smoother gradients in time.
| Overlap | Hop (fftSize=4096, 48 kHz) | Columns/sec |
|---|---|---|
| 75% (default) | 21.3 ms | ~47 |
| 87.5% | 10.7 ms | ~94 |
| 93.75% | 5.3 ms | ~188 |
| 96.875% | 2.7 ms | ~375 |
| 98.4375% | 1.3 ms | ~750 |
96.875% and 98.4375% can substantially lengthen the offline Analyse pass at large FFT sizes (16k or 32k), since the column count scales with the overlap factor. Use them when you need maximum smoothness; 75% is the standard pro-audio default.
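The table values follow directly from the FFT size, sample rate, and overlap fraction. A small sketch (the function name is hypothetical) that reproduces them and lets you work out the figures for other FFT sizes or sample rates:

```python
from typing import Tuple


def hop_and_columns(fft_size: int, sample_rate: float,
                    overlap: float) -> Tuple[float, float]:
    """Return (hop in milliseconds, columns per second) for an overlap fraction."""
    hop_samples = fft_size * (1.0 - overlap)   # samples between successive FFTs
    hop_ms = hop_samples / sample_rate * 1000.0
    columns_per_sec = sample_rate / hop_samples
    return hop_ms, columns_per_sec
```

At fftSize=4096 and 48 kHz, 75% overlap gives a 1024-sample hop, which is about 21.3 ms and about 47 columns per second. Each halving of the hop doubles the column count, which is where the extra analysis time at high overlap comes from.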
The spectrogram follows the same [Live] [Offline] toggle the loudness section uses.
The two stores are independent: live playback never overwrites the offline analysis, and Analyse never disturbs the live history.
Settings → Spectrogram → Performance → Auto-compute spectrogram on load. Off by default. When on, the whole-file spectrogram is built during the load-time background pass instead of waiting for ⌘Return on a selection. Sub-second for typical music tracks; the cost scales with file length, so long-form audio takes proportionally longer.
The setting is independent of the FFT spectrum view, which still requires a selection because its averaging is selection-scoped. Turn it on for sessions that mainly work with shorter material (stereo mixes, single tracks) and want the spectrogram immediately on load; leave it off for long-form audio when you'd rather pick a region with ⌘Return.
Pick a single channel or a downmix. The spectrogram persists per-slot in compare mode, so switching slots restores that slot's spectrogram history (both live and offline).
Colormap, frequency scale, dB Floor, dB Ceiling, and FFT overlap all persist across launches. Tune the view once and Specula remembers.
Computes the Pearson correlation between the left and right channels over rolling windows. Toggle with ⌃5.
The control strip has a Correlation / Width % toggle:
(1 − corr) / 2 × 100. 0 % = mono, 50 % = decorrelated, 100 % = out of phase.
Optional shading of the correlation curve:
Mouse over any point on the curve to see a plain-English label ("narrow stereo image", "natural stereo", "potential mono sum issue", etc.) plus the raw correlation value.
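The Width % conversion, (1 − corr) / 2 × 100, can be sketched alongside a plain Pearson correlation over one window of two signals. The function names are hypothetical, and returning 0.0 for a silent window (where the correlation is undefined) is an assumption.

```python
import math
from typing import Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length windows; 0.0 if either is constant."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    denom = math.sqrt(var_a * var_b)
    return cov / denom if denom > 0.0 else 0.0


def width_percent(corr: float) -> float:
    """Map correlation in [-1, 1] to width: 0 % mono, 50 % decorrelated, 100 % out of phase."""
    return (1.0 - corr) / 2.0 * 100.0
```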
Lissajous correlation meter - plots L on the X axis and R on the Y axis as a 2D scatter. Useful for catching out-of-phase content at a glance.
Hidden by default in the FFT panel area; can be enabled via Layout settings or a panel toggle.
Per-channel vertical bars in the Channels sidebar. Each bar shows:
Click a channel label to toggle mute on that channel for playback. The meter still shows the muted-source level - useful for inspecting muted channels.
The Routing panel lives in the dedicated Output window (Window → Output Panel, ⌥⌘O, or click the Output pill in the Transport bar). It is an N → M matrix that maps each input/source channel to one or more output channels with per-route gain trim. The Output window also holds the device selector, so the channel count the matrix targets and the device driving it are always co-located.
The window is hidden by default. Open it when you need to reroute, then close it again. Routing settings persist for the loaded file.
The routing section has a List / Matrix toggle. Same data, two presentations.
Your preference persists across launches.
A horizontally scrolling preset bar (visible in both views) applies a complete mapping in one click. Presets are filtered to the ones that apply to the current file and device channel counts.
Applying a preset is destructive: it overwrites the routing for every output channel, including any custom gain you set. Outputs the preset doesn't address are muted, so the preset fully replaces the routing.
Coefficients follow ITU-R BS.775-3. The −3 dB factor on summed channels is 1/sqrt(2), the equal-power gain for summing uncorrelated channels. The Pro Logic II coefficients (−1.2 dB and −6.2 dB on surrounds, with phase inversion on the left total) match the Dolby specification.
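To make the 1/sqrt(2) factor concrete, here is a sketch of a 5.1 → stereo fold-down for a single sample frame using the commonly cited ITU-style coefficients. The function name is hypothetical. Dropping the LFE channel and leaving the sum unnormalised are assumptions; the guide does not spell out what Specula's preset does with either.

```python
import math
from typing import Tuple

MINUS_3_DB = 1.0 / math.sqrt(2.0)   # about 0.7071, i.e. −3.01 dB


def downmix_51_to_stereo(l: float, r: float, c: float, lfe: float,
                         ls: float, rs: float) -> Tuple[float, float]:
    """Fold one 5.1 sample frame down to (Lo, Ro).

    Centre and surrounds are added at −3 dB; LFE is ignored (assumption).
    """
    lo = l + MINUS_3_DB * c + MINUS_3_DB * ls
    ro = r + MINUS_3_DB * c + MINUS_3_DB * rs
    return lo, ro
```

A centre-only signal therefore appears at −3 dB in both outputs, which keeps its total acoustic power roughly constant across the two speakers.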
Above the built-in row is a Saved row for your own routings.
Polarity inversion is a per-source-slot flag, separate from gain. List view has a small ϕ button next to each source's gain knob; Matrix view has the dedicated ϕ click zone in every routed cell (orange glyph when active). Needed for Pro Logic II Lt/Rt encoding and useful for quick A/B phase checks across any single route.
A 5.1 master with the wrong channel order is a routing problem, not an EQ problem. Specula lets you reroute channels for playback without modifying the file, so you can verify L / R / C / LFE / Ls / Rs ordering against your monitor system. The preset library covers the common stereo and surround downmix workflows in one click; the matrix view makes any custom mapping obvious at a glance.
Once a downmix, reorder, or custom matrix is right, Render to File… (in the Routing section header) renders it to a new audio file, so the routing is a deliverable, not just an audition. Every per-route gain, polarity, and Pro Logic II ±90° phase shift is applied exactly as it sounds on playback, computed offline.
Only the outputs you've actually routed are written. A 5.1 → stereo downmix on an 8-channel device renders a 2-channel file, not eight channels padded with silence (outputs with no source are dropped). The result writes as a new file (WAV, or CAF when the output channel layout can't be stored in WAV order) and never overwrites the source.
It bakes the routing only: the playback-side loudness match and the channel solo / mute are not baked, so the output is exactly what the matrix defines. To drop a channel from the bake, mute its route in the matrix (set its source to none) rather than relying on the monitor mute.
A PRE / POST toggle in the Channels sidebar header decides which signal feeds the loudness measurement:
For most workflows, PRE is what you want - the file's actual loudness. POST is useful when you've reduced channel count via the matrix (e.g. a 5.1 → stereo downmix) and need the loudness of the resulting stereo pair.
Load up to 6 files as slots A through F, switch between them instantly, level-match them by integrated LUFS, and compute sample-accurate residuals between any two.
Compare is a toggled mode (⌘2, or the Compare button in the transport bar), one of the four mutually exclusive modes (Edit, Compare, Chapter, Dialogue) Specula exposes. The mode carries its own sky-blue accent colour and its own dedicated toolbar: the compare toolbar sits below the transport with the per-slot offset / gain / polarity controls, the diff toggle, and the metrics popover - the same shape the other three mode toolbars use, so muscle memory carries between modes.
Three ways in:
Compare is mutually exclusive with Edit, Chapter, and Dialogue. Entering one mode exits the others; the Compare-mode-only controls (per-slot toolbar, diff toggle, slot-switch keys) disappear when you leave the mode and reappear when you return.
After the first file is open, drag additional files onto the window, drop them on the Compare toggle, or use the + button on the file dock. New files load in detached background tasks; their per-file metrics (integrated LUFS, true peak, sample peak, LRA, DR, speech stats, loudness curve, stereo width curve, speech history) are computed on load.
| 1 - 6 | Switch to slot A - F (only slots that exist). Bare digits, no modifier. |
Slot switching uses two paths. Fast path: when the new slot's audio format matches the current one, the engine performs an atomic buffer swap. Sub-millisecond. Slow path: when sample rate or channel count differs, the engine is rebuilt. ~50-100 ms.
Drag any slot chip in the file dock to a different position to reorder. The dragged slot takes the position you drop it on, and the chip you targeted (plus everything between) shifts to make room. Dragging a slot to position A (the leftmost) promotes it to the reference file; existing residuals are invalidated at that point because they were computed against the previous A. Use this when files came in from a Dock drop or Finder drag and the wrong one ended up as the primary: Finder serialises drag selections in display order, not click order, so the first file Specula receives may not be the one you meant as the reference.
Each slot's chip has an L toggle. When on, Specula applies a gain offset that brings the slot's integrated LUFS to match slot A's. Hear two masters at the same loudness instead of "the louder one always wins".
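The arithmetic behind the L toggle is a single subtraction. The sketch below assumes the offset is simply the difference between the two integrated readings; the function names are made up for the example.

```python
def level_match_gain_db(reference_lufs: float, slot_lufs: float) -> float:
    """Gain in dB that brings a slot's integrated loudness to the reference's."""
    return reference_lufs - slot_lufs


def db_to_linear(gain_db: float) -> float:
    """Convert a dB gain to the linear factor applied to samples."""
    return 10.0 ** (gain_db / 20.0)


# Slot A measures -14.0 LUFS; slot B is a louder master at -9.5 LUFS.
gain_db = level_match_gain_db(-14.0, -9.5)
print(f"offset: {gain_db:+.1f} dB, linear factor {db_to_linear(gain_db):.3f}")
```

A louder slot gets a negative offset, a quieter one a positive offset, so both play back at slot A's loudness.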
Each slot has a ϕ toggle that flips polarity. Useful for catching wiring errors and for diff workflows where flipping one signal makes the residual sit closer to zero.
The compare toolbar's offset (samples + ms) and gain (dB) readouts are editable TextFields, not just display labels. Click into one, type the value you want, press Return to commit (or Esc to cancel). Each field has a small ↺ reset button that appears when the value is non-zero - one click returns it to zero. Right-click either field for the same Reset entry.
The ±1 sample / ±1 ms / ±0.1 dB / ±1 dB nudge buttons sit next to the fields for fine adjustments by ear; the gain nudges are folded together with level-match into one gain stage. The transport bar's ↺ Hold play-start toggle (full description in Loading & playback → Transport) extends to compare-mode nudges: with it on, every nudge re-seeks playback to where Play was last started, so the comparison point stays fixed while you click ±1 sample / ±1 ms.
Click the auto-align button on any non-A slot. Specula runs decimated cross-correlation between slot A and the active slot to find the best sample offset, then applies it. It works when the two takes are offset by no more than ~10 seconds.
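A coarse-then-fine search is one common way to do decimated cross-correlation, sketched below in plain Python. It is an illustration of the general technique under assumed details (block-average decimation by 4, a brute-force lag search, a refinement window of one decimation step); Specula's actual decimation factor and search strategy are not documented here.

```python
import random


def decimate(x, factor):
    """Average consecutive groups of `factor` samples (crude low-pass + downsample)."""
    n = len(x) // factor
    return [sum(x[i * factor:(i + 1) * factor]) / factor for i in range(n)]


def best_lag(a, b, lo, hi):
    """Lag in [lo, hi] that maximises sum(a[n] * b[n + lag])."""
    best, best_score = lo, float("-inf")
    for lag in range(lo, hi + 1):
        score = sum(a[n] * b[n + lag] for n in range(len(a)) if 0 <= n + lag < len(b))
        if score > best_score:
            best, best_score = lag, score
    return best


def auto_align(a, b, max_lag, factor=4):
    """Samples by which b lags a: coarse search on decimated audio, then refine."""
    coarse_max = max_lag // factor
    coarse = factor * best_lag(decimate(a, factor), decimate(b, factor),
                               -coarse_max, coarse_max)
    # Full-rate search only in a small window around the coarse estimate.
    return best_lag(a, b, coarse - factor, coarse + factor)


rng = random.Random(0)
reference = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
delayed = [0.0] * 37 + reference  # same take, starting 37 samples late
print("estimated offset:", auto_align(reference, delayed, max_lag=64))
```

The decimated pass keeps the expensive search cheap over a wide lag range; the full-rate pass then restores sample accuracy.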
Diff is a per-slot listening mode, not a separate slot. The active non-A slot's toolbar carries a prominent Listen to Diff toggle: press it and playback switches to the cached A − slot residual through the same waveform, spectrogram, and export paths the source uses. Press the toggle again (or pick another slot) to return to the source instantly.
Bare D (no modifier) toggles diff view on the active compare slot - a one-key A/B between source and residual that fits the muscle memory of bare-digit slot switching.
The residual:
The Waveform A Overlay toggle (Edit menu → "Waveform A Overlay" or in the Compare panel) draws slot A's waveform as a ghost behind whichever slot is active - direct visual A/B at the sample level.
The Compare metrics button (toolbar) opens a popover with a side-by-side metrics table for every loaded slot - integrated LUFS, true peak, sample peak, LRA, DR, speech stats, etc.
Non-destructive editing. Always saves as a new file - never overwrites the original.
Edit, Chapter, and Dialogue share one editing session: the edited audio, the undo history, and the unsaved-changes state follow you across all three modes, so a workflow that moves between them is continuous and a single Save at the end writes everything. Switching modes never discards edits; loading a different file, or Discard Edits, is what clears the session. Multi-file Compare is separate: entering it from unsaved edits prompts to Save or Discard first.
⌘1 toggles edit mode. Edit is one of the four mutually exclusive modes (Edit, Compare, Chapter, Dialogue); entering it exits any other active mode. The mode carries the amber accent colour - matching the EDIT badge that lights on the waveform and the tint of the edit toolbar.
Each operation pauses playback, captures an undo snapshot (a 16-level disk-backed stack), applies the change, and resumes.
| Operation | Notes |
|---|---|
| Trim | Crop to the current selection (keep the selection, discard the rest). |
| Cut | Remove the selection and join the audio on either side - the opposite of Trim. Use it to take a too-long pause or a flubbed line out of the middle. An equal-power crossfade at the join (default 10 ms, adjustable; 0 gives a hard cut) hides the splice click; it's clamped to the audio kept on each side. Undoable, saves as a new file, and large files stream the result. |
| Add Silence | Insert at the playhead (no selection needed - park the cursor where the gap belongs, say 500 ms for a video edit, and Apply), insert before / after or replace a selection, or insert at the file start / end. Units: seconds or samples. |
| Change Level | Linear gain in dB, applied to the selection or the whole file. |
| Invert Phase | Multiply by −1. |
| Fade | Linear / Logarithmic / Equal-power × In / Out - 6 fade curves total. |
| Normalize Peak | Target dBFS. Applies to the selection when one is set, otherwise the whole file. |
| Normalize LUFS | Target LUFS + ceiling (interpreted as dBTP). Auto-applies a 5 ms lookahead true-peak limiter when normalization increases loudness so inter-sample peaks stay under the ceiling, not just sample peaks. Two-pass offline: pass 1 builds the gain envelope from the input's oversampled peaks; pass 2 catches any residual peaks the release stage might leave fractionally over the ceiling. Scopes to the selection when one is set (measure and gain over that range only), otherwise the whole file. |
| Limit TP | True-peak limiter only, ceiling in dBTP. Loudness otherwise unchanged. Use when integrated LUFS is already where you want it but inter-sample peaks need to be capped under a platform ceiling. Same DSP as the limiter Normalize LUFS uses; just skips the loudness measurement and gain stage. Scopes to the selection when one is set, otherwise the whole file. |
| Level Dialogue | A region-aware downward expander keyed to the detected speech regions (the ones you can tune in Dialogue mode). It ducks the room tone between phrases with a fast attack and slow release, so word onsets and tails stay clean, no clipped starts, no chopped ends. Tick Lift dialogue to target and it also gains the speech to a loudness target in the same pass, capping inter-sample peaks at a true-peak ceiling. Because the silence is gated before the makeup gain goes on, the floor ends up lower, not higher: that's the move a plain dialog-gated Normalize can't make, since a uniform gain lifts the floor along with the voice. It's a gate, not a denoiser, so it helps a marginal floor (a quiet room tone just under the gate), not a hissy recording. Undoable, saves as a new file. |
| Remove DC Offset | Subtracts each channel's mean. |
| Swap Channels | Stereo L↔R swap. |
| Split to Mono | Writes N mono WAV files (one per channel) to a directory you pick. Channel labels in the filename. |
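As an illustration of the Cut row above, here is a minimal sketch of a cut with an equal-power crossfade. The construction is an assumption: it overlaps the last samples kept before the cut with the first samples kept after it, which is one of several ways to build the join, and `cut_with_crossfade` is a hypothetical helper rather than Specula's code.

```python
import math


def cut_with_crossfade(samples, start, end, fade_len):
    """Remove samples[start:end] and join the two sides with an equal-power crossfade.

    fade_len is in samples (10 ms at 48 kHz is 480) and is clamped to the audio
    kept on each side; 0 gives a hard cut.
    """
    left, right = samples[:start], samples[end:]
    n = max(0, min(fade_len, len(left), len(right)))
    if n == 0:
        return left + right
    out = left[:len(left) - n]
    for i in range(n):
        t = (i + 0.5) / n
        fade_out = math.cos(t * math.pi / 2)  # fade_out**2 + fade_in**2 == 1
        fade_in = math.sin(t * math.pi / 2)
        out.append(left[len(left) - n + i] * fade_out + right[i] * fade_in)
    out.extend(right[n:])
    return out


tone = [math.sin(2 * math.pi * 440 * k / 48000) for k in range(48000)]
joined = cut_with_crossfade(tone, 12000, 24000, fade_len=480)
print(len(tone), "->", len(joined), "samples")
```

In this construction the overlap shortens the result by the fade length in addition to the removed selection.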
Edit mode adds a set of single-key edits, active only in Edit mode so they don't shadow the transport keys, and listed in the Edit menu so the keys are always visible. The first group acts on the selection; the playhead group acts at the cursor. Cut uses a 10 ms equal-power crossfade by default; the fade keys use the equal-power curve.
| Key | Action | Acts on |
|---|---|---|
| ⌫ | Cut (remove and crossfade-join) | selection |
| ⌥⌫ | Silence selection (replace, keep length) | selection |
| ⌘T | Trim to selection (keep it, discard the rest) | selection |
| ⌘F / ⌥⌘F | Fade In / Fade Out | selection |
| ⌃↑ / ⌃↓ | Gain +1 / −1 dB | selection |
| ⌘⌫ | Cut to start (remove start → playhead) | playhead |
| ⇧⌘⌫ | Cut to end (remove playhead → end) | playhead |
| ⌥[ / ⌥] | Fade in to / out from the playhead | playhead |
⌘Z steps back through the session's edits, including edits applied from Chapter mode (Level chapters) and the Level Dialogue pass, since the three modes share one undo history. ⇧⌘Z redoes. Undo is disk-backed (each level is a temp file, not a second copy in RAM), so deep history doesn't multiply memory. In Dialogue mode, ⌘Z targets region edits (a separate 50-step history); audio-buffer undo is reached from Edit or Chapter.
⇧⌘S opens a save panel. Specula writes the current audio to a new file (it never overwrites the source). Format defaults to WAV at the file's native bit depth and sample rate. Saving makes the written file the working document in place: your chapters, dialogue regions, and playhead stay put, and you stay in the mode you were in.
Discard Edits (Edit menu) reverts the buffer to the last saved version in one step, the explicit counterpart to Save. Save As, Discard, and the unsaved-changes dot are reachable from Edit, Chapter, and Dialogue alike, because those three modes share one editing session.
While in edit mode, a horizontal toolbar appears below the transport. Operations with parameters (Cut, Silence, Gain, Fade, Normalize, Limit TP, Level Dialogue) open small popovers - set the parameters, click Apply. The Limit TP popover takes a ceiling in dBTP with presets at -2 / -1 / -0.5 / -0.1. The Cut popover takes the crossfade length (default 10 ms, 0 for a hard cut). The Level Dialogue popover sets the duck amount and, when the optional Lift dialogue to target is ticked, its loudness target and true-peak ceiling.
A dedicated mode for audiobooks and long-form podcasts. Specula segments the file at long silences, scores every chapter against the full ACX measurement set, and surfaces both the per-chapter ACX gates and each chapter's deviation from the book's own median. Mutually exclusive with the other three modes (Edit, Compare, Dialogue) so the workspace stays focused.
A Chapter button sits in the transport bar next to Edit, Compare, and Dialogue. Click it to enter or exit chapter mode; ⌘3 does the same from the keyboard. Chapter mode carries the green accent colour - the chapter toolbar that slots in below the transport, the playing-chapter fill on the ribbon, and the Chapter button itself all match. Mutually exclusive with Edit, Compare, and Dialogue.
Detect scans the loaded mono mix for silences longer than the minimum duration (default 2 s) below the silence threshold (default −55 dB). Each silence becomes a boundary; the audio between two silences becomes a chapter. Chapters whose total non-silent content runs under 1 s are dropped so a stray cough doesn't fragment the result.
The detector absorbs leading and trailing silence into the first and last chapters. Chapter 1 always starts at 0 s, the last chapter always ends at file duration, and the ribbon covers the full file timeline with no gaps at the edges.
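The detection rules above can be sketched over a list of per-block levels. This is a simplified model, not Specula's detector: the 100 ms block size, placing each boundary at the midpoint of its silence, and merging a too-short chapter into the previous one are assumptions chosen for the example.

```python
def detect_chapters(levels_db, block_s=0.1, threshold_db=-55.0,
                    min_silence_s=2.0, min_content_s=1.0):
    """Return (start_s, end_s) chapters from per-block levels in dB."""
    n = len(levels_db)
    min_blocks = round(min_silence_s / block_s)

    # 1. Collect runs of blocks below the threshold that last long enough.
    runs, i = [], 0
    while i < n:
        if levels_db[i] < threshold_db:
            j = i
            while j < n and levels_db[j] < threshold_db:
                j += 1
            if j - i >= min_blocks:
                runs.append((i, j))
            i = j
        else:
            i += 1

    # 2. Interior silences become boundaries; leading and trailing silence is
    #    absorbed, so the first chapter starts at 0 and the last ends at n.
    cuts = [(i + j) // 2 for i, j in runs if i > 0 and j < n]
    edges = [0] + cuts + [n]

    # 3. Chapters with too little non-silent content are folded into the previous one.
    chapters = []
    for start, end in zip(edges, edges[1:]):
        content_blocks = sum(1 for k in range(start, end) if levels_db[k] >= threshold_db)
        if content_blocks * block_s < min_content_s and chapters:
            chapters[-1] = (chapters[-1][0], end)
        else:
            chapters.append((start, end))
    return [(s * block_s, e * block_s) for s, e in chapters]


levels = [-20.0] * 30 + [-70.0] * 30 + [-20.0] * 40  # 3 s speech, 3 s silence, 4 s speech
for start, end in detect_chapters(levels):
    print(f"{start:.1f} s - {end:.1f} s")
```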
Boundary markers are drawn inside every time-axis section that has one open: waveform, loudness curve, spectrogram, and stereo width. Each section uses its own time-to-x mapping, so the boundaries stay aligned across stacked views at any zoom level. The chapter ribbon mirrors the waveform's zoom and horizontal scroll, and narrow chapters keep their width when neighbours dominate the visible range.
Each chapter boundary draws as a vertical teal line with a "#N" badge labelling the chapter that starts at that boundary. Drag a boundary on the waveform to move it. Hit zone is 6 px; the cursor switches to the macOS resize cursor when it lands on a boundary. Drag clamps so neither neighbouring chapter shrinks below 1 s of content. A time + dBFS hover readout tracks the boundary's new position while the drag is in progress.
Add drops a new boundary at the playhead. The chapter ribbon and the per-chapter measurements update as you drag.
Click a slot in the ribbon to select it. The selected slot takes a 2 pt accent border; unselected slots keep a 1 pt white border. Selection and playback are independent: the currently playing chapter is shown by an accent-tinted fill (28 %), regardless of which chapter is selected. Double-click a slot to seek the playhead to its start.
Click a boundary line in the waveform (not a slot in the ribbon) to flag it. The flagged boundary draws thicker in amber with a "✕" marker next to its number, and the chapter that starts at that boundary becomes the selected chapter in the ribbon. The next press of Remove in the chapter toolbar deletes that boundary, merging its two chapters into one. This is the precise way to pick which split to undo; clicking a slot in the ribbon still selects the same chapter without changing which boundary is highlighted.
Double-click a chapter name in the ribbon to rename. Default names ("Chapter 01", "Chapter 02"…) follow the index after edits; custom names ("Foreword", "Walk through the woods") stay put.
Analyse Loudness in the toolbar runs an offline BS.1770 pass per chapter and populates the full measurement set. For each chapter:
Chapters shorter than 10 s skip the BS.1770 measurements entirely (the absolute-gate and relative-gate machinery isn't reliable below that). RMS and sample peak still populate, since plain RMS and peak are valid at any length, so the ACX loudness gate and clipping don't slip through on short chapters.
Each slot is four lines tall:
#NN · Chapter Name m:ss
-20.4 dB RMS -20.6 LUFS +0.0 LU
TP -2.1 dBTP NF -62 dB
LRA 4.2 LU Dlg -19.8 LUFS Sp 87%
Missing metrics render as muted - so the row positions stay stable across chapters that haven't been analysed yet, are too short for an integrated reading, or fall outside the VAD's coverage.
Each chapter is independently checked against three ACX delivery limits. Failures render the metric in red on the ribbon slot and on the report's chapter table:
These three gates are independent of the deviation-from-median flag, so each chapter answers two separate questions: will ACX accept this chapter on its own? (the three red/no-red signals) and is this chapter consistent with the rest of the book? (the deviation badge, controlled by the deviation threshold, default ±2, in Settings → Targets → Chapter detection).
The consistency metric (RMS or integrated LUFS) and the deviation threshold (default ±2.0, in that metric's unit) drive the red-tint flag on outlier slots and the headline banner at the top of the ribbon ("Chapter 07 is 2.3 dB above median"). RMS is the default, since it's what ACX delivery is judged on. Both live in Settings → Targets → Chapter detection (the metric is also set in the Level… popover).
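The outlier flag boils down to comparing each chapter's reading with the book's median. A minimal sketch, assuming a strict "greater than the threshold" comparison and using invented chapter readings:

```python
from statistics import median


def flag_outliers(levels, threshold=2.0):
    """Map chapter name -> deviation from the median, for chapters beyond the threshold.

    `levels` holds one reading per chapter (dB RMS or LUFS, whichever metric is chosen).
    """
    centre = median(levels.values())
    return {name: round(value - centre, 2)
            for name, value in levels.items()
            if abs(value - centre) > threshold}


# Invented readings for illustration only.
book = {"Chapter 01": -20.1, "Chapter 02": -19.8, "Chapter 03": -17.5, "Chapter 04": -20.4}
for name, deviation in flag_outliers(book).items():
    side = "above" if deviation > 0 else "below"
    print(f"{name} is {abs(deviation):.2f} dB {side} median")
```

Using the median rather than the mean keeps a single loud chapter from dragging the reference level toward itself.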
The fix for the outlier flag, and for the consistency deviation across a book. Level… in the toolbar opens a popover with two choices:
It applies a per-chapter uniform gain, then re-measures so the ribbon updates. RMS covers every chapter; LUFS skips chapters under 10 s (no integrated reading), as it does chapters already on target. Run Analyse Loudness first; the button stays disabled until there's a reading.
It's a single undoable edit (⌘Z). Chapter mode shows a dirty dot and a Save As button when the buffer has unsaved edits, and ⇧⌘S writes the leveled file without leaving Chapter mode.
A true-peak ceiling is applied in the same edit, on by default, so a boosted chapter can't run past the delivery ceiling. It defaults to −3 dBTP for RMS/ACX and −1 dBTP for LUFS (tracking the metric), and the Level popover lets you change it or switch it off. One caveat remains: a uniform per-chapter gain moves that chapter's room-tone noise floor with its level, so for a floor-aware pass, gate the room tone first with Level Dialogue, then level.
Restores Specula's silence-detection result, dropping all manual boundary edits, adds, and removes. Renames are kept.
Export… writes the current chapters (name + start + end) to a JSON sidecar. Import… loads one back, replacing the current chapter list. The exported JSON is plain Specula JSON; the importer also accepts a slimmer format with just start and end per entry, so you can hand-craft one.
Export CSV… writes the per-chapter metrics table (RMS, integrated LUFS, deviation from median, true peak, sample peak, LRA, max momentary / short-term, dialog LUFS, noise floor, speech %) as a spreadsheet-ready CSV, for the producers who live in spreadsheets. Run Analyse Loudness first so the metric columns are populated.
When you load a file whose fingerprint (filename + duration + sample rate + channels) matches a prior chapter setup saved in ~/Library/Application Support/Specula/chapters/, a Recall button appears in the toolbar. Click to restore the saved chapters. Toggle the prompt in Settings → Targets → Chapter detection.
The sixth section at the bottom of the window. Time-aligned to the waveform's zoom and scroll. Click a slot to select; double-click to seek the playhead to its start. Toggle the section's visibility with ⌃6 (independent of whether chapter mode is active).
When chapters exist on the file, the PDF / JSON report includes a Chapters section. The table carries every per-chapter metric the ribbon shows: RMS, integrated LUFS, deviation from median, true peak, sample peak, LRA, max momentary, max short-term, dialog LUFS, noise floor, and speech percentage. Cells fail-red on the same three ACX gates the ribbon uses (RMS outside [−23, −18] dB, TP > −3 dBTP, NF > −60 dB). The caption above the table documents all three gates and the configured deviation threshold so a producer reading the PDF can interpret it without opening the app.
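The three gates quoted above translate directly into a check a batch script could run on exported numbers. The function name and argument order are hypothetical; the limits are the ones stated in the paragraph.

```python
def acx_failures(rms_db, true_peak_dbtp, noise_floor_db):
    """Return the names of the ACX gates a chapter fails (empty list means it passes)."""
    failures = []
    if not (-23.0 <= rms_db <= -18.0):
        failures.append("RMS")  # RMS outside [-23, -18] dB
    if true_peak_dbtp > -3.0:
        failures.append("TP")   # true peak above -3 dBTP
    if noise_floor_db > -60.0:
        failures.append("NF")   # noise floor above -60 dB
    return failures


# The sample ribbon slot shown earlier: -20.4 dB RMS, TP -2.1 dBTP, NF -62 dB.
print(acx_failures(-20.4, -2.1, -62.0))
```

That sample slot passes the RMS and noise-floor gates but fails on true peak, since −2.1 dBTP sits above the −3 dBTP limit.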
| Key | Action | Output |
|---|---|---|
| ⇧⌘E | Export JSON Report | *.json - all metrics, both live and selection analysis, channel info, file metadata |
| ⌥⌘E | Export PDF Report | *.pdf - formatted report with charts |
| ⌃⌘E | Export Diff Audio | *.wav - the active slot's cached A − slot residual buffer (compare mode only). Same output as the Export Diff button next to the Listen to Diff toggle. |
In edit mode, ⇧⌘S is "Save Edited Audio As", not the JSON export - the menu item swaps based on context.
The Settings → Report tab decides what's included:
JSON is a stable, machine-readable schema. PDF is the same content rendered for human reading. For the audiobook and podcast producers who live in spreadsheets, Chapter mode → Export CSV… writes the per-chapter metrics table as a CSV (one row per chapter), and the CLI's specula report <file> --format csv does the same from the command line. See Chapter mode.
The full-file report (no selection) carries integrated LUFS, true peak (4× oversampled dBTP), sample peak, LRA, Max Momentary, Max Short-Term, DR, speech-gated LUFS, speech %, stereo correlation, per-channel stats, and loudness-target verdicts - every metric the info panel shows.
Each entry in loudnessTargets.targets[] in the JSON envelope carries an inline verdict object so a Shortcut, CLI pipeline, or batch QC script can branch on pass / fail without recomputing it against the target spec. Fields: status (compliant / penalty / notCompliant / unavailable), actualLUFS, plus penaltyGainDB / penaltyApplies / truePeakClean for streaming targets and nonComplianceReason / nonComplianceActual / nonComplianceLimit for hard-spec fails. The PDF and HTML reports render the same verdict as a status badge. Full field reference in Loudness targets.
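A batch QC script might branch on the verdict like this. The `loudnessTargets.targets[]` path and the `status`, `actualLUFS`, `penaltyGainDB`, and `nonComplianceReason` fields come from the description above; the `name` and `verdict` key names, and every value in the sample document, are assumptions made for the sketch, so check them against a real export.

```python
import json

# Hypothetical sample shaped after the fields described above; values are invented.
SAMPLE = """
{
  "loudnessTargets": {
    "targets": [
      {"name": "Streaming A",
       "verdict": {"status": "penalty", "actualLUFS": -9.8, "penaltyGainDB": -4.2}},
      {"name": "Broadcast B",
       "verdict": {"status": "notCompliant", "actualLUFS": -20.1,
                   "nonComplianceReason": "example reason"}},
      {"name": "Podcast C",
       "verdict": {"status": "compliant", "actualLUFS": -16.0}}
    ]
  }
}
"""


def targets_with_status(report, wanted):
    """Names of targets whose verdict status is in `wanted`."""
    hits = []
    for target in report.get("loudnessTargets", {}).get("targets", []):
        verdict = target.get("verdict") or {}  # assumed key for the inline verdict
        if verdict.get("status") in wanted:
            hits.append(target.get("name"))    # assumed key for the target label
    return hits


report = json.loads(SAMPLE)
hard_fails = targets_with_status(report, {"notCompliant"})
print("hard fails:", hard_fails)
```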
For batch QC, shell scripts, and Shortcuts that walk a folder of files, Specula ships a CLI built on the same measurement engine the app uses. Same numbers, no window.
The CLI ships inside the app bundle. Pick Specula → Install Command-Line Tool… and Specula links specula into /usr/local/bin (one admin prompt if that folder needs it). The link points at the copy inside Specula.app, so app updates keep the installed command current.
Manual alternative (scripted setups):
sudo ln -sfh "/Applications/Specula.app/Contents/Helpers/specula" /usr/local/bin/specula
Verify with specula --version.
The CLI shares the app's license and trial. It runs through the 7-day trial (a days-left note prints on stderr) and requires the license activated in the app after that (Specula → Manage License…); an expired, unlicensed install exits with code 77 and an explanation on stderr. --help and --version always work. The first licensed run asks once for access to the license in the Keychain - click Always Allow. Do that once in a local Terminal window before using the tool over SSH, where the prompt can't appear.
specula analyze <FILE>
Headline numbers (integrated LUFS, true peak, sample peak, loudness range, speech-gated LUFS, stereo correlation) as JSON.
specula analyze mix-v2.wav
specula analyze mix-v2.wav --no-vad # skip Silero (faster)
specula analyze mix-v2.wav --no-stereo # skip the per-block correlation pass
specula compare <A> <B>
Two files side-by-side with B − A deltas for every metric.
specula compare master-v1.wav master-v2.wav
specula compare master-v1.wav master-v2.wav --match-loudness
specula compare master-v1.wav master-v2.wav --no-vad # skip Silero on both files
--match-loudness subtracts the integrated-LUFS delta from B's peak / speech-gated readings so the deltas surface spectral or shape differences instead of being dominated by level mismatch. Useful when you've rebalanced a mix but want to know whether the tonality really changed.
--out-diff <PATH> additionally writes the A − gainB·B residual to that path as a Float32 WAV (paired with --match-loudness to net out pure level differences before subtracting). The compare JSON gains a residual block carrying the residual's peak / RMS / applied gain so a script can branch on the residual energy without re-loading the file. Sample rate + channel count must match between A and B - the diff refuses structural mismatches rather than silently resampling.
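The residual itself is a per-sample subtraction, with B scaled first when levels are matched. The sketch below works on plain sample lists; comparing only the overlapping range when lengths differ is an assumption of the example, not documented behaviour.

```python
import math


def residual(a, b, gain_b=1.0):
    """A - gain_b * B, sample by sample (assumption: only the overlapping range)."""
    return [x - gain_b * y for x, y in zip(a, b)]


def peak_dbfs(samples):
    """Sample peak in dBFS; digital silence reads -inf."""
    peak = max((abs(v) for v in samples), default=0.0)
    return 20.0 * math.log10(peak) if peak > 0.0 else float("-inf")


a = [0.5, -0.25, 0.125, -0.5]
b = [0.25, -0.125, 0.0625, -0.25]  # the same signal at half the level

print("unmatched residual peak:", round(peak_dbfs(residual(a, b)), 2), "dBFS")
print("matched residual peak:  ", peak_dbfs(residual(a, b, gain_b=2.0)), "dBFS")
```

With the level difference netted out, two otherwise identical files null completely, which is why pairing the diff with level matching isolates real content changes.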
specula edit <FILE> --out <OUT> …
Applies edit operations and writes the result to --out (-o for short). --out is required and must not resolve to the same path as the input - Specula never writes in place, the same contract the app's Save As enforces.
specula edit mix.wav --out mix-norm.wav --normalize-lufs=-14 --lufs-ceiling=-1
specula edit mix.wav --out mix-trimmed.wav --trim 5.0:30.5
specula edit mix.wav --out mix-louder.wav --gain 3
Negative values need the --option=value form (e.g. --gain=-3, --normalize-lufs=-14). The CLI otherwise reads a bare leading - as an option name and rejects the value.
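When a script builds the command line, the negative-value rule is easy to encode once. The helpers below are hypothetical and only assemble an argument list (nothing is executed); the option names are the ones listed in this section.

```python
def option_args(name, value):
    """argv tokens for one option; values starting with '-' use the --name=value form."""
    text = str(value)
    if text.startswith("-"):
        return [f"--{name}={text}"]
    return [f"--{name}", text]


def edit_command(src, dst, **options):
    """Build a `specula edit` argument list; keyword names map to option names."""
    argv = ["specula", "edit", src, "--out", dst]
    for name, value in options.items():
        argv += option_args(name.replace("_", "-"), value)
    return argv


print(" ".join(edit_command("mix.wav", "mix-norm.wav", normalize_lufs=-14, lufs_ceiling=-1)))
print(" ".join(edit_command("mix.wav", "mix-louder.wav", gain=3)))
```

The resulting list can be handed to `subprocess.run` as-is, which also sidesteps shell quoting for paths with spaces.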
Supported operations (applied in this order if multiple are passed):
1. --trim START:END - keep only the range, in seconds (e.g. 5.0:30.5).
2. --dc-remove - subtract per-channel mean.
3. --invert-phase - flip polarity on every channel.
4. --gain DB - uniform gain.
5. --normalize-peak DBFS or --normalize-lufs LUFS [--lufs-ceiling DBTP] - mutually exclusive.
6. --limit-tp DBTP - apply the true-peak limiter at this ceiling without changing loudness; runs last so it can catch any residual peaks left by an upstream normalize.
LUFS normalization and --limit-tp use the same 5 ms two-pass true-peak limiter (BS.1770 4× polyphase oversampled detection).
specula report <FILE>
Full Specula report through the headless pipeline, in your choice of format.
specula report mix.wav --mode music # JSON to stdout
specula report mix.wav --mode podcast --out report.json # JSON to file
specula report mix.wav --mode music --format html --out report.html # the same dark-themed HTML the app's preview window renders
specula report mix.wav --mode music --format pdf --out report.pdf # rasterised PDF
specula report book.wav --mode podcast --format csv --out chapters.csv # per-chapter metrics table as CSV
specula report mix.wav --mode broadcast --no-curve # drop the loudness curve for smaller batch output
--format selects the output shape: json (default), html, pdf, or csv. PDF requires --out (binary data on stdout isn't useful), and the receipt printed to stdout on success carries the written byte count + mode so a pipeline can confirm the file landed. csv emits the per-chapter metrics table when the file has chapters, otherwise a flat metric,value table of the file-level numbers. --mode selects which loudness-target catalog drives the evaluation: music, podcast, vod, broadcast. --no-curve omits the per-100 ms loudness curve when the receipt only needs the summary numbers. --no-vad skips Silero speech detection (the speech-gated fields read -inf) and --no-stereo skips the correlation pass, both for faster unattended batches.
Every subcommand emits pretty-printed JSON with sorted keys, so diffs across runs stay deterministic. Non-finite floats (silence → -∞) render as "-inf" strings rather than blowing up the encoder.
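On the consuming side, the "-inf" strings need turning back into floats before any numeric comparison. A minimal sketch: only the "-inf" spelling is confirmed above, so that is the only string converted, and the key names in the sample are invented.

```python
import json


def revive(node):
    """Recursively replace "-inf" strings with float('-inf'); leave everything else alone."""
    if isinstance(node, dict):
        return {key: revive(value) for key, value in node.items()}
    if isinstance(node, list):
        return [revive(value) for value in node]
    if node == "-inf":
        return float("-inf")
    return node


# Invented keys for illustration; a silent file reports -inf loudness.
raw = json.loads('{"integrated": "-inf", "curve": [-23.1, "-inf"], "mode": "music"}')
doc = revive(raw)
print(doc["integrated"] < -70.0)
```

After the conversion, ordinary comparisons work, and a silent file simply compares below any finite threshold.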
Specula surfaces eight App Intents that the system Shortcuts app and Siri can call directly. Same engine as the CLI; same numbers. No file-handling shell glue required, output is chainable. The four core actions ship in two variants each (file-input and text-path) so they fit whichever Shortcuts workflow shape you have.
| Action | File-input | Path-input | Returns |
|---|---|---|---|
| Measurements | Get Measurements | Get Measurements (from Path) | Specula Measurements (I, M, S, LRA, TP, SP, DR, Speech-Gated, Speech %) |
| Numeric compare | Compare Files | Compare Files (from Paths) | Specula Comparison (both sides + B − A deltas) |
| Audio compare | Get Compare Diff | Get Compare Diff (from Paths) | residual WAV file (A − gainB·B) |
| Report | Get Report | Get Report (from Path) | report file (JSON / HTML / PDF) |
Each action's behaviour:
--mode. Pipe the result straight into AirDrop, Save File, Mail, or any Shortcut file step.
Each intent's file parameter shows a Choose button followed by a small … menu. The Choose button opens a static file picker (one-off use). The … menu is where the magic-variable picker lives - pick Shortcut Input to bind the file that came in from a Finder Quick Action / Share Sheet / prior action's output. Don't pick "Clear" - that wipes the binding.
For text-path workflows (clipboard text, Ask for Text, paths pasted from a shell), each intent has a sibling (from Path) variant that takes the audio file as a text string instead of a File. The text variants accept:
- /Users/me/foo.wav
- /Users/me/My Mix.wav (no quoting needed)
- ~/Music/foo.wav
- '/Users/me/My Mix.wav' (terminal realpath output)
- "/Users/me/My Mix.wav"
- '/Users/me/My Mix.wav'
- file:///Users/me/My%20Mix.wav (percent-encoded)
Quotes are stripped automatically; tilde gets expanded against your home directory. So a typical clipboard-driven shortcut is just Get Clipboard → Get Report (from Path) with the clipboard variable bound to Audio File Path.
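If you pre-process paths in a script before handing them to a (from Path) action, the same clean-up can be approximated as follows. This is a sketch of the accepted forms listed above, not the intent's actual parser; `normalise_path` is a hypothetical helper.

```python
import os
from urllib.parse import unquote, urlparse


def normalise_path(text):
    """Strip one pair of surrounding quotes, decode file:// URLs, expand a leading ~."""
    path = text.strip()
    if len(path) >= 2 and path[0] == path[-1] and path[0] in "'\"":
        path = path[1:-1]
    if path.startswith("file://"):
        path = unquote(urlparse(path).path)  # turns %20 back into a space
    return os.path.expanduser(path)


for candidate in ("'/Users/me/My Mix.wav'",
                  '"/Users/me/My Mix.wav"',
                  "file:///Users/me/My%20Mix.wav",
                  "~/Music/foo.wav"):
    print(normalise_path(candidate))
```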
Each file-input action also answers to a second phrasing: "Get measurements from Specula", "Compare audio in Specula", "Get Specula report".
Type one into Spotlight (⌘Space). Intents with required file parameters will prompt for the file via a picker; for variable-driven workflows the Shortcuts app is the better surface.
The six measurement, compare, and report intents default to speech detection off. Flip the "Enable Speech Detection" toggle in the Shortcut step when you need speech-gated metrics; the first speech-gated run may take a moment as the model loads. The two Get Compare Diff intents return a pure residual, so they carry no speech toggle.
Three services install automatically and appear under Services when you right-click one or more audio files in Finder. They work out of the box, with nothing to build in Shortcuts or configure in Automator.
Specula Report - <filename>.pdf next to the source (auto-numbered if that name exists), falling back to ~/Downloads if the source folder isn't writable, then reveals it in Finder. If Specula wasn't running it stays hidden the whole time, so no empty window pops up.
If the entries don't appear right after first install, log out and back in (or reboot) so Launch Services indexes them.
Open with ⌘,. Nine tabs: Layout · Spectrogram · FFT / Analysis · Speech · Targets · Report · Updates · License · Acknowledgements.
Defaults for which sections show on launch:
Each section can still be toggled live with ⌃1-⌃5.
Liquid Glass chrome (macOS 26+ only). On macOS 26 the transport bar, file dock, info panel, seek bar, and section-toggle strip use the system Liquid Glass material so they feel native alongside Finder, Safari, and other macOS 26 apps. Default on. Turn off in Settings → Layout → Appearance for solid dark chrome, useful when recording screenshots or screencasts where the depth-aware refraction would change with whatever's underneath the window, or simply if you prefer the flat look. The toggle is hidden on macOS 14-25 (Liquid Glass isn't available there; chrome stays solid regardless).
With the toggle on, each chrome strip renders as a rounded floating pill (radius 6, with a small dark gutter between strips and the window edges), the macOS 26 native pattern Music, Photos, and Calendar use. The Liquid Glass material's edge highlights become the rim of each pill rather than horizontal lines crossing the window. The seek bar's accent fill also picks up the glass material when the toggle is on (otherwise solid). With the toggle off, chrome is edge-to-edge square strips; the rounding belongs to the Liquid Glass look.
The seek bar's track and fill are 8 px tall regardless of the toggle, a substantial slider lane that reads well whether it's filled with glass or solid colour.
All six values (colormap, frequency scale, Floor, Ceiling, FFT overlap, auto-compute on load) persist across launches.
Edits re-run detection in about half a second and the new history feeds back into the live loudness path. If you have manual edits in Dialogue mode when you move a slider, they're pushed onto the Dialogue mode undo stack first so you can ⌘Z back to them. See Speech-gated loudness + Dialogue mode for the full workflow.
| ⌘O | Open audio file |
| ⌘, | Open Preferences |
| Space | Play / pause |
| ⌘. | Stop and return to start |
| ← / → | Skip ±5 seconds |
| ⇧← / ⇧→ | Skip ±1 second |
| ⌘← | Jump to start |
| L | Toggle loop |
| [ | Step rate down |
| ] | Step rate up |
| \ | Reset rate to 1× |
| Gesture | Action |
|---|---|
| Click / plain drag | Move the playhead (drag scrubs) |
| ⌘-drag | Create a selection region |
| Drag a selection's edge / body | Resize / move it |
| Scroll (wheel or two-finger) | Horizontal zoom (cursor-anchored) |
| ⇧-scroll | Pan horizontally |
| ⌥-scroll | Amplitude (vertical) zoom |
| Esc | Clear selection (waveform + FFT) |
| ⌘A | Select all |
| ⌘Return | Analyse selection (offline) |
| ⌘1 | Enter / exit Edit mode (accent: amber) |
| ⌘2 | Enter / exit Compare mode (accent: sky-blue) |
| ⌘3 | Enter / exit Chapter mode (accent: green) |
| ⌘4 | Enter / exit Dialogue mode (accent: rose) |
| 1 - 6 | Switch to compare slot A - F (bare digits, Compare mode only) |
| D | Toggle Listen to Diff on the active slot (bare D, Compare mode only) |
These operate only in Edit mode (so they don't shadow the transport keys) and are listed in the Edit menu. The first group needs a selection; the playhead group acts at the cursor. Defaults: Cut uses a 10 ms equal-power crossfade; the fade keys use the equal-power curve.
| ⌘Z / ⇧⌘Z | Undo / redo last edit |
| ⇧⌘S | Save edited audio (new file) |
| ⌫ | Cut (remove and crossfade-join) - selection |
| ⌥⌫ | Silence selection (replace, keep length) - selection |
| ⌘T | Trim to selection (keep it, discard the rest) - selection |
| ⌘F / ⌥⌘F | Fade In / Fade Out - selection |
| ⌃↑ / ⌃↓ | Gain +1 / −1 dB - selection |
| ⌘⌫ | Cut to start (remove start → playhead) - playhead |
| ⇧⌘⌫ | Cut to end (remove playhead → end) - playhead |
| ⌥[ / ⌥] | Fade in to / out from the playhead - playhead |
| ⌃6 | Toggle chapter ribbon section visibility |
| I | Mark In (anchor at playhead, or set start of selected region) |
| O | Mark Out (close pending region, or set end of selected region) |
| S | Split selected region at playhead |
| Delete | Delete selected region |
| ⌘Z / ⇧⌘Z | Undo / redo region edit (50-step history) |
| ⇧⌘E | Export JSON report |
| ⌥⌘E | Export PDF report |
| ⌃⌘E | Export diff audio (WAV) |
| ⌃1 | Toggle Waveform |
| ⌃2 | Toggle Loudness Curve |
| ⌃3 | Toggle Spectrogram |
| ⌃4 | Toggle FFT |
| ⌃5 | Toggle Stereo Width |
| ⌃6 | Toggle Chapter Ribbon |
| ⌃⌥1 | Focus Waveform |
| ⌃⌥2 | Focus Loudness Curve |
| ⌃⌥3 | Focus Spectrogram |
| ⌃⌥4 | Focus FFT |
| ⌃⌥5 | Focus Stereo Width |
| Standard | Where it appears |
|---|---|
| ITU-R BS.1770-4 | K-weighting filter, 400 ms gating blocks (75 % overlap, i.e. a 100 ms step), dual gating, 4× true-peak oversampling |
| EBU R128 | Momentary / Short-term / Integrated LUFS, LRA, True Peak, recommended programme loudness |
| Silero VAD (MIT) | Speech detection model |
| FluidAudio (Apache 2.0) | Swift wrapper for Silero on Apple platforms |
Specula's loudness implementation is validated against the EBU R128 loudness test set (EBU Tech 3341 / 3342), to the EBU Tech 3341 ±0.1 LU tolerance. The underlying measurement algorithm is ITU-R BS.1770-4, which EBU R128 builds on.
EBU R 128