Three Prompts: Voice Modulation

Introduction
Voice modulation is a fascinating area where technology meets creativity, allowing us to transform audio in ways that can be both practical and entertaining. This project demonstrates how to build a browser-based voice modulation application that lets users record or upload audio and apply various effects like pitch shifting, speed adjustment, and timbre modification.
What makes this project particularly interesting is that it was created using just three AI prompts, showcasing how complex audio processing functionality can be implemented efficiently with modern web technologies. The application provides real-time visual feedback through waveform displays, making it easy to see how your adjustments affect the audio.
The Three Prompts
Here are the exact prompts that were used to create this project:
Prompt 1
In @projects/voice-modulation make a demo project of voice modulation/voice changing I want a super simple UI that works on desktop and mobile. It would be nice to show the base and adjusted audio waveform. Keep executing tool calls until you finish the demo project
Prompt 2
Now adjust the pitch control to actually do pitch and use another slider for playback speed. Add timbre too
Prompt 3
Now fix the modulated audio call. It's not playing. There are no errors in the console
The first prompt established the foundation of the project, requesting a simple UI that works across devices and displays waveforms for both original and modified audio. The second prompt enhanced the functionality by separating pitch and speed controls, and adding timbre adjustment. The final prompt addressed a technical issue with audio playback, ensuring the modulated audio would play correctly.
Technologies Used
This project leverages several modern web technologies:
- React 19 with Vite 6: Provides the UI framework and development environment
- Web Audio API: Powers the audio processing capabilities, including decoding, playback, and effects
- MediaRecorder API: Enables recording directly from the user's microphone
- Canvas API: Used to visualize audio waveforms in real-time
The application demonstrates how these technologies can work together to create an interactive audio processing tool without requiring any external libraries for the core audio functionality.
Key Features and Functionality
Audio Input Options
Users have two ways to get audio into the application:
- File Upload: Import existing audio files for modification
- Microphone Recording: Record audio directly in the browser
This flexibility makes the tool useful for a variety of scenarios, from modifying pre-recorded content to experimenting with live voice effects.
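Decoding an uploaded file takes only a couple of standard Web Audio calls. Here is a sketch of what the upload path can look like; the handler name and state setter are assumptions, not the project's actual code:
const handleFileUpload = async (event) => {
  const file = event.target.files[0];
  if (!file) return;
  // Decode the file into an AudioBuffer for waveform drawing and playback
  const arrayBuffer = await file.arrayBuffer();
  const ctx = new (window.AudioContext || window.webkitAudioContext)();
  const buffer = await ctx.decodeAudioData(arrayBuffer);
  setAudioBuffer(buffer); // hypothetical state setter
};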
Real-time Waveform Visualization
Both the original and modulated audio are visualized as waveforms, providing immediate visual feedback about how the audio is being transformed. The waveforms update in real time as you adjust the parameters, helping users understand the relationship between the controls and the resulting sound.
Multiple Audio Parameters
The application offers three distinct audio controls:
- Pitch: Adjust in semitones (-12 to +12), allowing for precise musical adjustments
- Speed: Control playback speed (0.5x to 2x) independently from pitch
- Timbre: Modify the tonal quality using a low-pass filter (300 Hz to 10,000 Hz)
These parameters can be combined to create a wide range of voice effects, from deep monster voices to high-pitched chipmunk sounds.
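Those two classics map directly onto the three sliders. The settings below are illustrative guesses, not presets shipped with the demo:
// Illustrative slider settings, not presets from the actual project
const effects = {
  monster:  { pitch: -7, speed: 0.9, timbre: 1200 },  // deep and muffled
  chipmunk: { pitch: 9,  speed: 1.2, timbre: 10000 }, // high and bright
};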
Responsive Design
The interface adapts seamlessly between desktop and mobile devices, with a layout that reorganizes based on screen size. On larger screens, the waveforms appear side-by-side, while on mobile they stack vertically for better visibility.
Implementation Details
Audio Processing
The core audio processing is handled through the Web Audio API, which provides powerful capabilities for manipulating audio in the browser:
- AudioContext is used to decode audio files and create the processing pipeline
- Pitch shifting is implemented using the mathematical formula 2^(semitones/12), which converts semitone values to the corresponding frequency ratio
- BiquadFilterNode with type 'lowpass' provides timbre control by filtering out higher frequencies
- The audio pipeline connects the audio element through the filter to the audio output
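Concretely, that node graph is tiny. Here is a minimal sketch of the wiring (variable names are illustrative, not the project's actual code):
const ctx = new (window.AudioContext || window.webkitAudioContext)();
// Route the <audio> element through a low-pass filter to the speakers
const source = ctx.createMediaElementSource(audioElement);
const filter = ctx.createBiquadFilter();
filter.type = 'lowpass';
filter.frequency.value = 10000; // the timbre slider maps to 300-10,000 Hz
source.connect(filter);
filter.connect(ctx.destination);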
One interesting aspect is how the application separates pitch from speed, allowing independent control of these parameters that are often linked in simpler implementations.
Waveform Visualization
The waveform visualization uses a custom drawing function that efficiently renders audio data to HTML canvas elements:
function drawWaveform(buffer, canvas, rate = 1) {
  if (!canvas || !buffer) return;
  const ctx = canvas.getContext('2d');
  const { width, height } = canvas;
  ctx.clearRect(0, 0, width, height);
  ctx.strokeStyle = '#646cff';
  const data = buffer.getChannelData(0);
  const samplesPerPixel = Math.max(1, Math.floor(data.length / (width * rate)));
  // Drawing logic...
}
This function handles large audio files efficiently by sampling the data based on the canvas width, ensuring smooth performance even with longer recordings.
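In the React component, a hook along these lines would keep the canvas in sync with the decoded audio. The ref names here are assumptions, since the full component isn't shown:
useEffect(() => {
  // Redraw whenever a new buffer is decoded (origCanvasRef is assumed
  // to point at the <canvas> for the original audio)
  if (audioBuffer) drawWaveform(audioBuffer, origCanvasRef.current);
}, [audioBuffer]);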
Audio Recording
The application uses the MediaRecorder API to capture audio from the user's microphone:
const toggleRecording = async () => {
  if (isRecording) {
    mediaRecorderRef.current.stop();
    setIsRecording(false);
  } else {
    const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
    const mediaRecorder = new MediaRecorder(stream);
    // Setup recording handlers...
    mediaRecorder.start();
    mediaRecorderRef.current = mediaRecorder;
    setIsRecording(true);
  }
};
This implementation collects audio chunks as they become available and combines them into a blob when recording stops, which is then processed the same way as uploaded files.
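The elided handler setup might look something like this sketch; the blob MIME type and state setter are assumptions:
const chunks = [];
mediaRecorder.ondataavailable = (e) => chunks.push(e.data);
mediaRecorder.onstop = async () => {
  // Merge the recorded chunks, then decode them like an uploaded file
  const blob = new Blob(chunks, { type: 'audio/webm' });
  const arrayBuffer = await blob.arrayBuffer();
  const ctx = new (window.AudioContext || window.webkitAudioContext)();
  const buffer = await ctx.decodeAudioData(arrayBuffer);
  setAudioBuffer(buffer); // hypothetical state setter
};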
Challenges and Solutions
Challenge 1: Modulated Audio Playback
Initially, the modulated audio would not play properly because the AudioContext and filter were not being initialized correctly. The solution was to split the audio context initialization and filter updates into separate effects, ensuring the audio context is only created after audio is loaded:
useEffect(() => {
  if (!modAudioRef.current || !audioBuffer) return;
  if (!audioCtxRef.current) {
    const ctx = new (window.AudioContext || window.webkitAudioContext)();
    audioCtxRef.current = ctx;
    const source = ctx.createMediaElementSource(modAudioRef.current);
    const filter = ctx.createBiquadFilter();
    // Setup filter...
  }
}, [audioBuffer, timbre]);
This approach ensures that the audio element is properly connected to the Web Audio API processing chain before playback begins.
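The companion effect can then update the filter on every timbre change without recreating the context. A minimal sketch, assuming the filter node is stored in a ref:
useEffect(() => {
  // Only the cutoff changes here; the node graph is created once above
  if (filterRef.current) {
    filterRef.current.frequency.value = timbre;
  }
}, [timbre]);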
Challenge 2: True Pitch Shifting
Implementing pitch shifting that doesn't affect playback speed required separating these parameters and applying them correctly:
const semitoneToRate = (semitones) => Math.pow(2, semitones / 12);

const updatePlaybackRate = useCallback(() => {
  if (modAudioRef.current) {
    modAudioRef.current.playbackRate = semitoneToRate(pitch) * speed;
  }
}, [pitch, speed]);
This mathematical approach allows for precise musical pitch adjustments in semitones while maintaining independent control over playback speed.
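A few sample values make the formula concrete:
semitoneToRate(12);  // 2.0: one octave up doubles the frequency
semitoneToRate(-12); // 0.5: one octave down halves it
semitoneToRate(7);   // ~1.498: a perfect fifth
// Combined with the speed slider: +12 semitones at 0.5x speed
// yields a playbackRate of 2.0 * 0.5 = 1.0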
Challenge 3: Efficient Waveform Visualization
Rendering waveforms for large audio files could potentially cause performance issues. The solution was to implement a sampling algorithm that adapts to the canvas width and audio length:
const samplesPerPixel = Math.max(1, Math.floor(data.length / (width * rate)));
for (let x = 0; x < width; x++) {
  const start = Math.floor(x * samplesPerPixel * rate);
  let min = 1, max = -1;
  // Find the min/max sample values in this pixel's range
  for (let i = start; i < start + samplesPerPixel && i < data.length; i++) {
    if (data[i] < min) min = data[i];
    if (data[i] > max) max = data[i];
  }
  // ...then draw a vertical line from min to max for this pixel column
}
This keeps rendering fast: the sample data is scanned in a single pass, and only one vertical line is drawn per pixel column, no matter how long the recording is.
Conclusion
The Voice Modulation project demonstrates how powerful audio processing capabilities can be implemented in the browser using standard web APIs. With just three prompts, we were able to create a fully functional application that provides intuitive controls for modifying audio in real time.
There are several ways this project could be expanded in the future:
- Additional audio effects like reverb, echo, or distortion
- More advanced visualization options, such as frequency spectrum analysis
- Preset configurations for common voice effects
- The ability to save and share modified audio
Try out the Voice Modulation demo yourself to experiment with different audio effects and see how the visual representation changes as you adjust the parameters.
If you're interested in more projects created with minimal prompts, check out the Three Prompts collection for additional examples of what can be accomplished with AI-assisted development.