How Neural Amp Modelling Works
I recently picked up guitar again after a ten-year break. A lot has changed.
When I stopped playing, amp modelling meant compromise. Line 6 had been shipping digital modellers since 1996, but experienced players could always tell the difference. The “digital fizz” was a running joke. Studios kept their vintage Marshalls and Fenders.
That’s no longer true. Modern amp modellers routinely fool experienced engineers in blind tests.[2] Some studios have switched entirely to digital. Touring musicians carry rack units instead of 50kg valve heads.
As a software engineer, I got curious. What does software engineering look like at companies like Neural DSP or Fractal Audio? How do they build these things? What changed in the last decade to make digital amps sound this good?
This post is a summary of what I learned - from the signal processing fundamentals through to the neural network architectures that power modern amp capture.
What Guitar Effects Do
Before diving into how amp modelling works, it helps to understand what’s being modelled.
Guitar effects change sounds. A distortion pedal makes things crunchier. A delay adds echoes. A wah makes that distinctive “wah-wah” sweep. But what’s happening under the hood?
Signal Transformation: Distortion
The simplest transformation to understand is distortion. It clips the signal, limiting how loud peaks can get.
The visualisation shows this relationship. On the left is the transfer function - a graph mapping input amplitude to output amplitude. A perfectly linear system would be a straight diagonal line (what goes in comes out unchanged). Distortion bends this into an S-curve, compressing loud peaks. On the right, you can see the effect on actual waveforms: the input (blue) has sharp peaks, while the output (warm) has those peaks squashed.
Drag the drive slider to see how increasing the effect amount changes both the transfer curve and the resulting waveform. At low drive, the curve is nearly linear. At high drive, it flattens at the extremes - hard clipping.
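The same idea fits in a few lines of code. Here's a minimal soft clipper sketch using tanh as the S-shaped transfer function - the exact curve differs between circuits, tanh is just one common choice:

```python
import numpy as np

def soft_clip(x, drive=5.0):
    """S-shaped transfer function: near-linear for small inputs,
    flattening toward +/-1 as the signal gets louder."""
    return np.tanh(drive * x)

# Low drive barely bends the curve; high drive squashes the peaks hard.
t = np.arange(480) / 48_000                      # 10 ms at 48 kHz
sine = np.sin(2 * np.pi * 220 * t)
gentle = soft_clip(sine, drive=1.5)
crunchy = soft_clip(sine, drive=20.0)
```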
But Signals Aren’t Simple Waves
The visualisation above shows a simple wave, but a real guitar signal is far more complex. When you pluck a string, you don’t get a single frequency - you get the fundamental note plus a stack of harmonics (overtones) at multiples of that frequency. A low E (82Hz) also contains energy at 164Hz, 246Hz, 328Hz, and so on. The relative strength of these harmonics is what makes a Stratocaster sound different from a Les Paul, or a bridge pickup sound different from a neck pickup.
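A rough sketch of that harmonic stack in code - the 1/n amplitude roll-off here is purely illustrative; the real balance depends on the guitar, the pickup, and where you pluck:

```python
import numpy as np

fs = 48_000
t = np.arange(fs) / fs                 # one second of samples
f0 = 82.41                             # low E fundamental (~82 Hz)

# Fundamental plus the first few harmonics at integer multiples of f0.
low_e = sum(np.sin(2 * np.pi * n * f0 * t) / n for n in range(1, 9))
low_e /= np.max(np.abs(low_e))         # normalise to +/-1
```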
Frequency-Dependent Processing
This is where it gets interesting. Effects can treat different frequencies differently.
The visualisation shows a frequency spectrum - how much energy exists at each frequency in the signal. The blue line is the original signal (more energy in the low-mids, less in the highs - typical of a guitar). The dashed orange line is an EQ curve - how much we’re boosting or cutting at each frequency. The cyan line is the result.
Try adjusting the sliders. Boost the mids and you get a honky, nasal tone. Cut the highs and things get darker. This is what every tone knob, EQ pedal, and cabinet simulation is doing - reshaping the frequency content.
A wah pedal takes this further - it sweeps a resonant peak across the frequency range as you rock the pedal. A cabinet simulation applies a complex EQ curve that mimics how speaker cabinets colour the sound.
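Under the hood, that kind of boost or cut is usually a biquad filter - a handful of multiplies and adds per sample. Here's a sketch of a peaking EQ using the widely used Audio EQ Cookbook formulas (`guitar` below stands in for an input buffer); sweep the centre frequency and you have the skeleton of a wah:

```python
import numpy as np

def peaking_eq(x, fs, f0, gain_db, q=1.0):
    """Boost or cut a band centred on f0 (Audio EQ Cookbook peaking filter)."""
    A = 10 ** (gain_db / 40)
    w0 = 2 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2 * q)

    b0, b1, b2 = 1 + alpha * A, -2 * np.cos(w0), 1 - alpha * A
    a0, a1, a2 = 1 + alpha / A, -2 * np.cos(w0), 1 - alpha / A

    y = np.zeros_like(x)
    x1 = x2 = y1 = y2 = 0.0
    for n, xn in enumerate(x):
        # Direct form I difference equation, one sample at a time.
        yn = (b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2) / a0
        x1, x2, y1, y2 = xn, x1, yn, y1
        y[n] = yn
    return y

# "Boost the mids": +6 dB centred on 800 Hz.
# honky = peaking_eq(guitar, fs=48_000, f0=800, gain_db=6.0)
```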
Other Transformations
Effects do much more than just EQ and distortion:
Delay adds a copy of the signal after a set time. The diagram shows the original (blue), a quieter delayed copy, and the combined result (warm). Reverb is similar but with thousands of decaying reflections instead of one clean repeat.
Tremolo modulates the volume over time. The diagram shows how a low-frequency oscillator (LFO) varies the amplitude - the signal swells and fades rhythmically. Chorus and phaser use similar modulation on pitch and phase.
Compression evens out the dynamics. Loud parts get quieter, quiet parts get louder. The diagram shows a signal with varying volume (faint blue) becoming more consistent after compression (warm).
Distortion clips the peaks. The faint blue shows the original waveform; the warm line shows the squashed result with flattened peaks.
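The delay and tremolo described above are almost as simple as they sound. For offline processing they're a few lines of array arithmetic (real-time versions use a circular buffer and per-sample updates instead):

```python
import numpy as np

def delay(x, fs, time_s=0.35, mix=0.5):
    """Add a single quieter copy of the signal, time_s seconds later."""
    d = int(time_s * fs)
    y = np.copy(x)
    y[d:] += mix * x[:-d]
    return y

def tremolo(x, fs, rate_hz=5.0, depth=0.5):
    """Modulate the volume with a low-frequency oscillator (LFO)."""
    t = np.arange(len(x)) / fs
    lfo = 1.0 - depth * (0.5 + 0.5 * np.sin(2 * np.pi * rate_hz * t))
    return x * lfo
```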
Memory and State
But the transfer function is only part of the story. Real gear has memory - the output depends not just on the current input, but on what came before:
- Thermal state: Tubes change behaviour as they heat up. The first few minutes sound different from an hour in.
- Power supply dynamics: A loud chord depletes the power supply capacitors, affecting how the next notes sound.
- Frequency interactions: Capacitors and inductors throughout the circuit create different responses at different frequencies - and these can interact with the distortion behaviour.
- Component aging: Old tubes, worn pots, and degraded capacitors all affect the sound.
This is what makes analogue gear feel “alive” to play through - it responds to how you’re playing, not just what you’re playing.
Recreating This Digitally
So how do digital effects companies recreate all of this - the frequency shaping, the nonlinear distortion, the memory, the state - in software or dedicated hardware?
There are two fundamental approaches:
White-box modelling: Understand the circuit. Analyse every component - every resistor, capacitor, tube, transformer. Write mathematical equations describing how they behave. Solve those equations in real time to produce the output.
Black-box modelling: Don’t try to understand the circuit at all. Instead, record what goes in and what comes out. Train a system to learn the relationship between input and output. Let the maths figure itself out.
Both approaches can run on different hardware depending on the use case.
Where the Code Runs
If you’re a software engineer, you might assume audio effects are written in C++ or Python and run on normal CPUs. And sometimes they are.
Frameworks like JUCE let developers write audio plugins in C++ that run on x86 or ARM processors. When you load an amp sim plugin in Ableton or Logic Pro, that's regular compiled code running on your Mac or PC. These plugins don't face strict latency constraints - the DAW handles buffering, and a few milliseconds of delay is acceptable when you're mixing a recorded track.
But real-time hardware - the floor units and rack processors that guitarists plug into on stage - is different. The delay between plucking a string and hearing sound needs to stay below about 10ms for playing to feel natural. Professional players often want it under 5ms.[4] At those latencies, you can't afford the unpredictability of a general-purpose operating system scheduler or a garbage collector pause. You need deterministic timing.
This is where DSPs (Digital Signal Processors) come in. A DSP is a specialised chip optimised for signal processing: multiply-accumulate operations in a single cycle, specialised memory architectures for streaming data, predictable timing guarantees. At a 48kHz sample rate, a new audio sample arrives every 20.8 microseconds. The DSP needs to read the input, run it through the amp model (potentially hundreds of calculations), and write the output - all before the next sample arrives. Miss that deadline, and the audio glitches.
FPGAs (Field-Programmable Gate Arrays) take this further. Instead of running software on a chip, you’re configuring the chip itself - defining the actual logic gates and their connections using hardware description languages like Verilog. You’re designing circuits, not writing programs.
Fractal Audio’s Axe-FX, for example, uses high-performance DSPs (TI Keystone chips) rather than FPGAs - their ARES platform is a software architecture optimised for these dedicated signal processors. Other manufacturers do use FPGAs for convolution processing and parallel effects routing.
Neither DSPs nor FPGAs are mass-market chips. They’re niche components - audio, telecommunications, industrial control. For a while, there was excitement about FPGAs for AI inference (particularly automotive applications), but that’s largely been overtaken by GPU hype. The audio world remains one of the few places where these chips are essential.
This is why high-end units cost what they do. The engineering isn’t just in the algorithms - it’s in making those algorithms run on specialised hardware with microsecond-level timing guarantees.
White-Box: Circuit Simulation
The white-box approach is to model the actual circuit mathematically. The process is iterative:
- Model each component - tubes, resistors, capacitors, transformers. Write equations describing how they behave (a toy example of this step follows the list).
- Approximate the unknowns - some behaviours (thermal drift, tube aging, transformer saturation) are too messy to characterise exactly. Use your best mathematical approximations.
- Compare with the real thing - run the same signal through both and A/B the results.
- Refine the maths - adjust equations, add terms, improve approximations until it sounds right.
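To make step 1 concrete, here's a toy white-box model of about the simplest nonlinear guitar circuit there is: a diode clipper - a series resistor feeding a pair of back-to-back diodes. Producing each output sample means solving one nonlinear equation, here with a few Newton-Raphson iterations. Component values are illustrative; real amp models solve systems of these equations coupled across many stages.

```python
import numpy as np

# Kirchhoff's current law at the clipper's output node:
#   (v_in - v_out) / R = 2 * IS * sinh(v_out / VT)
# There's no closed-form solution, so solve it per sample with Newton-Raphson.
IS = 1e-12     # diode saturation current (A) - illustrative
VT = 0.02585   # thermal voltage at room temperature (V)
R = 2200.0     # series resistance (ohms) - illustrative

def diode_clipper(v_in, iterations=8):
    v_out = np.zeros_like(v_in)
    v = 0.0                                   # warm-start each sample from the last solution
    for n, vin in enumerate(v_in):
        for _ in range(iterations):
            f = (vin - v) / R - 2 * IS * np.sinh(v / VT)
            df = -1 / R - (2 * IS / VT) * np.cosh(v / VT)
            v -= np.clip(f / df, -0.1, 0.1)   # limit the step so Newton can't blow up
        v_out[n] = v
    return v_out

# Drive a sine hard enough that the diodes clamp it to roughly +/-0.5 V.
fs = 48_000
t = np.arange(fs // 10) / fs
clipped = diode_clipper(2.0 * np.sin(2 * np.pi * 220 * t))
```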
The payoff is full parametric control. Turn up the virtual gain knob and the model responds like the real circuit would - because it's simulating the circuit behaviour. Designers can even create "impossible" amps by mixing components that never existed together.
The cost is that some behaviours are genuinely hard to model mathematically.[6] Transformer hysteresis, tube microphonics, the interaction between power supply sag and multiple gain stages - these involve complex physics that don’t reduce to clean equations. Each amp requires months of engineering work to get right. And the computation is expensive - solving coupled nonlinear differential equations in real time requires serious hardware.
Black-Box: Neural Capture
The alternative is to skip the circuit modelling entirely. Instead of trying to understand and program each component, use machine learning to learn the input-output relationship directly from recordings.
This is what changed in the last ten years. Companies like Neural DSP, IK Multimedia’s ToneX, and others moved towards neural networks as the foundation for black-box amp modelling. Rather than reverse-engineering circuits, they train models on real gear.
To understand how this works, we’ll look at NAM (Neural Amp Modeler) - an open-source implementation with readable code and well-documented architecture. Commercial products like Neural DSP’s Quad Cortex use proprietary implementations, but the underlying principles are similar.
One clarification: despite the name, NAM runs on regular x86/ARM processors - there’s no DSP implementation. Most neural amp modellers work this way, running as plugins on your computer or as software on ARM-based floor units. That said, some commercial implementations do run parts of their neural networks on dedicated DSP or FPGA hardware for lower latency.
The capture process works like this:
- Get the sweep signal - NAM provides a pre-designed audio file of sine sweeps, noise bursts, and chirps that cover the full frequency and dynamic range
- Play it through the amp and record - route the sweep through your amp and record the output (mic’d speaker or load box)
- Train - the neural network learns to predict the recorded output from the known input
- Check the ESR - Error to Signal Ratio measures capture quality. Below 0.01 is excellent.
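The ESR itself is a one-liner: the energy of the error between the amp's output and the model's prediction, divided by the energy of the amp's output. (Training losses based on ESR often add a pre-emphasis filter so errors in the highs count for more; this sketch skips that.)

```python
import numpy as np

def esr(target, prediction):
    """Error-to-Signal Ratio: residual energy relative to the target's energy.

    0.0 is a perfect capture; below roughly 0.01 the difference is
    generally considered hard to hear.
    """
    target = np.asarray(target, dtype=np.float64)
    prediction = np.asarray(prediction, dtype=np.float64)
    return np.sum((target - prediction) ** 2) / np.sum(target ** 2)
```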
The payoff is universality. Anything with an audio input and output can be captured - vintage amps, rare pedals, specific microphone positions, even rooms. No reverse engineering required. A bedroom hobbyist can capture a $50,000 vintage amp if they can borrow it for an afternoon.
This has spawned community marketplaces where users share captures. ToneHunt hosts thousands of free NAM profiles, while Neural DSP has a built-in cloud library for Quad Cortex users. You can download someone else’s capture of a rare Dumble or a specific pedal chain without ever touching the real gear.
The cost is that captures are snapshots. Each capture represents one specific combination of settings. Change the gain knob and the capture is wrong. The model has no concept of the underlying parameters - it just knows this input produces this output.
Inside NAM: How the Networks Work
A disclaimer: I’m not an expert in neural networks or DSP mathematics. This is my best understanding from reading the research and code. If I’ve got something wrong, let me know and I’ll fix it.
The core challenge for any neural amp model is memory. A loud chord affects how the next quiet note sounds - power supply sag, thermal effects, capacitor charge. The network needs to “see” enough history to capture these dynamics.
At 48kHz, amp dynamics happen over tens of milliseconds - that’s hundreds or thousands of samples the network needs to consider for each output sample.
The visualisation shows how this works. To produce one output sample, the network looks back at input samples spaced exponentially - recent samples densely, older samples more sparsely. This lets it “see” enough history (40+ milliseconds) without needing hundreds of layers.[8]
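That exponential spacing comes from dilated convolutions: each layer looks at samples further apart than the last, so the receptive field roughly doubles per layer instead of growing by one. A quick back-of-the-envelope calculation (the dilation pattern here is illustrative, not necessarily NAM's exact default):

```python
# Receptive field of stacked dilated 1-D convolutions (WaveNet-style).
SAMPLE_RATE = 48_000
KERNEL_SIZE = 3
dilations = [1, 2, 4, 8, 16, 32, 64, 128, 256] * 2   # two stacks of nine layers

receptive_field = 1 + sum((KERNEL_SIZE - 1) * d for d in dilations)
print(f"{receptive_field} samples = {1000 * receptive_field / SAMPLE_RATE:.1f} ms")
# -> 2045 samples = 42.6 ms of history influencing each output sample
```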
NAM offers several architecture options - WaveNet, LSTM, and others - each with different trade-offs.
The Capture Challenge
Not all gear captures equally well. The neural network is learning a mapping from input to output, and some mappings are harder to learn than others.
Fuzz pedals were notoriously difficult. Fuzz circuits have extreme nonlinearity with long memory. The same input note sounds different depending on what was played seconds earlier. Transistor-based fuzzes also have significant temperature dependence - the circuit literally sounds different as it warms up.
Standard captures treat the system as if each output sample depends only on recent input samples. Fuzz behaviour violates this assumption - it has internal state (transistor bias points, thermal effects) that isn’t observable from the audio signal alone. That said, Neural Capture v2 seems to have solved much of this - their updated architecture handles fuzz pedals significantly better.
High-gain amps are challenging for similar reasons. The multiple cascaded gain stages create complex interactions. Noise gates and compression effects add state that’s hard to capture.
Clean amps capture well. Their behaviour is closer to linear, with less long-term state. The nonlinearities are gentle enough that a reasonable receptive field captures the relevant dynamics.
Research is ongoing into “stateful” capture approaches that explicitly model hidden internal state, but these require more complex networks and more training data.[10]
Making It Real-Time
Training a neural network is one thing. Running it fast enough for real-time audio is another.
At 48kHz with a 64-sample buffer, you’ve got about 1.3ms to process each block. Miss that deadline and the audio glitches. Several tricks make this possible:
- Parallel processing - modern CPUs can process 4-8 samples at once using vector instructions (SIMD). This alone can make code 4-8x faster.
- Approximations - functions like tanh appear constantly in neural networks. The mathematically exact version is slow; a fast approximation that's "close enough" saves significant CPU (see the sketch after this list).
- Lower precision - running the network in 16-bit instead of 32-bit can double throughput. For audio, the difference is inaudible.
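As an example of the second trick, here's one common style of tanh shortcut - a small rational (Padé-style) approximation. Real products use their own variants; this is just the flavour of the trade:

```python
import numpy as np

def fast_tanh(x):
    """Rational approximation of tanh: a few multiplies and one divide,
    instead of the exponentials the exact function needs.
    Accurate to within a few percent on [-3, 3], then clamped."""
    x = np.clip(x, -3.0, 3.0)
    x2 = x * x
    return x * (27.0 + x2) / (27.0 + 9.0 * x2)
```

The same idea applies to other expensive functions - sin, exp, log - which all have cheap polynomial stand-ins in real-time audio code.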
It’s Not All-or-Nothing
In practice, most commercial products use both approaches. White-box circuit simulation and neural capture aren’t competing technologies - they’re complementary tools in a signal chain.
Consider a typical signal path: guitar → input buffer → gain stages → tone stack → power amp → cabinet → microphone. Some of these stages are well-understood and cheap to simulate mathematically. Others are idiosyncratic or computationally expensive to model from first principles.
A practical architecture might use traditional DSP for the predictable parts - overdrive, distortion, EQ - and neural networks for the parts that are hard to model mathematically. That vintage cabinet with the specific speaker breakup you love? Capture it. The tone stack? That’s just filters; simulate it directly.
There’s also a third technique: convolution. Reverb often uses impulse responses (IRs) - a recording of how a room responds to a click, which gets mathematically convolved with your signal. It’s pure DSP, no neural network involved, and it’s been the standard for reverb and cab simulation for decades. That’s a topic for another post.
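Convolution really is that direct: every sample of the impulse response becomes a scaled, delayed copy of your signal, all summed together. A naive sketch (`guitar` and `cabinet_ir` are placeholders for your own buffers; real implementations use FFT-based convolution to make long IRs affordable):

```python
import numpy as np

def apply_ir(signal, impulse_response):
    """Convolve the dry signal with a measured impulse response (IR).

    Each IR sample contributes a scaled, delayed copy of the input -
    which is exactly what a room full of reflections does.
    """
    return np.convolve(signal, impulse_response)[: len(signal)]

# cab_sim = apply_ir(guitar, cabinet_ir)   # cabinet_ir: a short mono IR recording
```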
Hardware platforms enable this mixing. Modern FPGAs often include embedded ARM cores, allowing a single unit to run circuit simulation on the FPGA fabric while the ARM processor handles neural inference for specific stages. The signal flows through a pipeline where each section uses whatever approach makes sense.
This is why the white-box vs black-box framing can be misleading. The question isn’t which approach is better - it’s which parts of your signal chain benefit from each.
Where Things Are Heading
The current generation of neural amp modelling captures static snapshots well. The frontier is parametric modelling[11] - learning not just one setting, but how the amp responds across its full control range.
Imagine a single model that accurately reproduces all combinations of gain, tone, and volume. This requires conditioning the network on control parameters, and the training data requirements increase substantially - captures at many different settings instead of just one.
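A minimal sketch of what "conditioning" means in practice, using PyTorch: the knob settings ride along as extra input channels next to the audio, so the network can learn how the output changes as the knobs move. This is purely illustrative - not NAM's or any commercial product's actual architecture.

```python
import torch
import torch.nn as nn

class ConditionedConvBlock(nn.Module):
    """One convolutional block whose input is audio plus knob settings."""

    def __init__(self, channels=16, n_knobs=3):
        super().__init__()
        # 1 audio channel + one channel per knob (gain, bass, treble, ...)
        self.conv = nn.Conv1d(1 + n_knobs, channels, kernel_size=3, padding=1)

    def forward(self, audio, knobs):
        # audio: (batch, 1, samples); knobs: (batch, n_knobs), each in [0, 1]
        cond = knobs.unsqueeze(-1).expand(-1, -1, audio.shape[-1])
        return torch.relu(self.conv(torch.cat([audio, cond], dim=1)))

block = ConditionedConvBlock()
audio = torch.randn(1, 1, 4800)               # 100 ms of audio at 48 kHz
knobs = torch.tensor([[0.7, 0.5, 0.3]])       # gain, bass, treble positions
features = block(audio, knobs)                # shape: (1, 16, 4800)
```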
Some commercial products already do this. The training is more involved, but the result is a single model that covers the full range of an amp’s controls - replacing dozens of individual captures.