Three Prompts: Speech To Slides

Introduction
Have you ever been struck by inspiration and wished you could quickly turn your thoughts into a presentation without the hassle of typing everything out? The Speech-to-Slides app solves this problem by letting you create presentations using just your voice. This innovative web application leverages the Web Speech API to convert spoken words into beautifully formatted Reveal.js slide decks in real-time.
What makes this project particularly interesting is that it was created using only three AI prompts, demonstrating how complex functionality can be implemented efficiently with the right guidance. The app combines speech recognition technology with modern web development practices to create a seamless experience for users who prefer speaking over typing when creating presentations.
The Three Prompts
Here are the exact prompts that were used to create this project:
Prompt 1
@projects/speech-to-slides Speech-to-Slides Hit record, speak bullet points, get an auto-generated Reveal.js deck. Web Speech API → Markdown → Reveal; 2: add voice commands (“next slide”, “theme X”); 3: fix mis-captured punctuation.
Prompt 2
This error is showing in the console App.jsx:73 Uncaught ReferenceError: Cannot access 'processVoiceCommand' before initialization at App (App.jsx:73:20)
Prompt 3
All the text in the inputs and presenation are white on white. Fix this. Also remove the ability to changes themes by saying it. It's broken. 1. fix the input colors 2. Fix the presentation colors, the text in the presentation is white on white... 2. Fix the UI, 1 background colors full width, keep it simple 3. Remove the text 'Say "theme [name]" to change the theme.'
Let's analyze how these prompts guided the development process:
- Initial Concept and Core Functionality: The first prompt outlined the basic concept of the application—converting speech to slides using the Web Speech API, markdown conversion, and Reveal.js for presentation. It established the core functionality and technology stack.
- Bug Fix: The second prompt addressed a specific JavaScript error related to function initialization order, highlighting the importance of proper code organization in React components.
- UI and UX Improvements: The final prompt focused on fixing visual issues with the application, including text color problems, simplifying the UI, and removing a problematic feature (theme voice commands). This demonstrates the iterative refinement process that's essential in software development.
These prompts show a natural progression from concept to implementation to refinement, covering the full development lifecycle of the application.
Technologies Used
The Speech-to-Slides application leverages several modern web technologies:
- React: The UI is built using React (v19.1.0), providing a component-based architecture for the application
- Web Speech API: The core speech recognition functionality uses the browser's native Web Speech API
- Reveal.js: This powerful presentation framework (v4.5.0) renders the final slides with professional transitions and styling
- Marked: The markdown parsing library (v9.1.0) converts the processed speech text into HTML for the slides
- Vite: The project uses Vite as the build tool and development server for fast iteration
Key Features and Functionality
Speech Recognition
The application uses the Web Speech API's continuous recognition mode to capture speech in real-time. The speech recognition engine processes spoken words and converts them into text that's displayed in the transcript area. This happens seamlessly as you speak, allowing you to see your words appear on screen instantly.
Voice Commands
One of the most powerful features is the ability to control the presentation structure using voice commands. Saying "next slide" automatically creates a new slide break in your presentation, allowing you to organize your content without touching the keyboard.
Automatic Formatting
The app doesn't just transcribe your words—it intelligently formats them into proper presentation content:
- Spoken sentences are converted into bullet points
- Punctuation is automatically corrected and standardized
- First letters of sentences are capitalized
- Periods are added where needed
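These formatting rules can be illustrated with a small standalone helper. Note that `formatPoint` is a hypothetical extraction for illustration—in the app itself the equivalent logic lives inline in the transcript-processing code:

```javascript
// Hypothetical helper illustrating the formatting rules above;
// the app applies the same regex-based fixes inline.
function formatPoint(point) {
  let fixed = point
    .replace(/\s+([.,;:!?])/g, '$1')           // remove spaces before punctuation
    .replace(/([.,;:!?])([A-Za-z])/g, '$1 $2') // add a space after punctuation

  // Capitalize the first letter if it is lowercase
  if (fixed && /[a-z]/.test(fixed[0])) {
    fixed = fixed[0].toUpperCase() + fixed.slice(1)
  }

  // Ensure the point ends with punctuation
  if (fixed && !/[.,;:!?]$/.test(fixed)) {
    fixed += '.'
  }
  return fixed
}
```

For example, `formatPoint('hello ,world')` normalizes the stray space and missing capital to produce `'Hello, world.'`.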
Live Preview
As you speak or edit the transcript, you can see a live preview of how your slides will look. Each slide is displayed as a card, giving you immediate feedback on your presentation's structure and content.
Presentation Mode
When you're ready to view your presentation, the app transforms into a full-screen Reveal.js deck with professional styling and navigation controls. You can navigate through slides using arrow keys and return to the editor when needed.
Responsive Design
The application features a responsive design that works well on both desktop and mobile devices, with special attention paid to touch-friendly controls and appropriate sizing for smaller screens.
Implementation Details
Speech Recognition Implementation
The speech recognition functionality is implemented using React's useEffect and useRef hooks. The Web Speech API is initialized when the component mounts, and the recognition process is configured to be continuous, allowing for uninterrupted speech capture:
useEffect(() => {
  // Initialize Web Speech API
  if ('webkitSpeechRecognition' in window || 'SpeechRecognition' in window) {
    const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition
    recognitionRef.current = new SpeechRecognition()
    recognitionRef.current.continuous = true
    recognitionRef.current.interimResults = true

    // Event handlers for speech recognition
    // ...
  }
}, [isRecording, processVoiceCommand])
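The elided event handlers mostly deal with separating final results from interim ones. Here is a sketch of the core `onresult` logic, written as a standalone function (a hypothetical extraction—the real handler would feed these strings into React state rather than return them) so it can be exercised with a mock event:

```javascript
// Sketch of SpeechRecognition onresult handling: walk the results
// list starting at event.resultIndex and split final text from
// interim text. Returned as plain strings here for testability.
function splitResults(event) {
  let finalText = ''
  let interimText = ''
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const transcript = event.results[i][0].transcript
    if (event.results[i].isFinal) {
      finalText += transcript
    } else {
      interimText += transcript
    }
  }
  return { finalText, interimText }
}
```

In the browser, each entry in `event.results` is an array-like `SpeechRecognitionResult` with an `isFinal` flag, which is why the mock results below are arrays carrying that property.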
Voice Command Processing
Voice commands are processed in real-time using a callback function that analyzes the speech transcript for specific phrases:
const processVoiceCommand = useCallback((command) => {
  // Handle "next slide" command
  if (command.includes('next slide')) {
    setTranscript(prev => prev + '\n\nnext slide\n\n')
    return
  }
}, [setTranscript])
Transcript Processing
The raw transcript is processed into formatted slides using a series of string manipulations and regular expressions:
const processTranscript = () => {
  // Split the transcript on "next slide" to create individual slides
  const slideTexts = transcript.split(/next slide/i).map(text => text.trim()).filter(Boolean)

  const processedSlides = slideTexts.map(slideText => {
    // Split by periods or line breaks to create bullet points
    const points = slideText
      .split(/\.\s+|\n/)
      .map(point => point.trim())
      .filter(Boolean)
      .map(point => {
        // Fix common punctuation issues
        let fixedPoint = point
          .replace(/\s+([.,;:!?])/g, '$1') // Remove spaces before punctuation
          .replace(/([.,;:!?])([A-Za-z])/g, '$1 $2') // Add space after punctuation if missing

        // Capitalize the first letter if it isn't already
        if (fixedPoint && /[a-z]/.test(fixedPoint[0])) {
          fixedPoint = fixedPoint[0].toUpperCase() + fixedPoint.slice(1)
        }

        // Add a period at the end if missing
        if (fixedPoint && !fixedPoint.match(/[.,;:!?]$/)) {
          fixedPoint += '.'
        }

        return fixedPoint
      })
      .filter(Boolean)

    // Convert points to markdown
    const markdown = points.map(point => `* ${point}`).join('\n')
    return markdown
  })

  setSlides(processedSlides)
  setStatus('Ready')
}
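The end-to-end transformation is easiest to see with the splitting logic extracted into a standalone function (a hypothetical extraction that omits the React state updates and the punctuation fixes shown above, keeping only the slide and bullet splitting):

```javascript
// Standalone sketch of the transcript-to-slides split:
// "next slide" separates slides, then sentences and line breaks
// become markdown bullet points.
function transcriptToSlides(transcript) {
  return transcript
    .split(/next slide/i)
    .map(text => text.trim())
    .filter(Boolean)
    .map(slideText =>
      slideText
        .split(/\.\s+|\n/)
        .map(point => point.trim())
        .filter(Boolean)
        .map(point => `* ${point}`)
        .join('\n')
    )
}
```

Given the input `'first point. second point next slide third point'`, this produces two slides: one with two bullets and one with a single bullet.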
Reveal.js Integration
The presentation mode uses Reveal.js, which is initialized when the user enters presentation mode and destroyed when they exit:
useEffect(() => {
  if (showPresentation && presentationRef.current) {
    if (revealRef.current) {
      revealRef.current.destroy()
    }
    revealRef.current = new Reveal(presentationRef.current, {
      controls: true,
      progress: true,
      center: true,
      hash: true,
    })
    revealRef.current.initialize()
  }
  return () => {
    if (revealRef.current) {
      revealRef.current.destroy()
      revealRef.current = null
    }
  }
}, [showPresentation, slides])
Challenges and Solutions
Challenge 1: Function Reference Error
Problem: The application initially had a ReferenceError where the processVoiceCommand function was being used before it was defined.
Solution: The function was moved before its usage and wrapped in a useCallback hook to maintain proper dependency management in React:
// Define processVoiceCommand before using it in useEffect
const processVoiceCommand = useCallback((command) => {
  // Function implementation
}, [setTranscript])

useEffect(() => {
  // Now the function can be safely used here
}, [isRecording, processVoiceCommand])
Challenge 2: Text Visibility Issues
Problem: The application had text visibility issues where text in inputs and the presentation was white on a white background, making it unreadable.
Solution: The CSS was updated to explicitly set text colors and ensure proper contrast:
textarea {
  background-color: #ffffff;
  color: var(--text-color);
}

.reveal {
  color: #333333;
}

.reveal h1,
.reveal h2,
.reveal h3,
.reveal h4,
.reveal h5,
.reveal h6 {
  color: #333333;
}

.reveal ul li,
.reveal ol li,
.reveal p {
  color: #333333;
}
Challenge 3: Voice Command Reliability
Problem: The theme voice commands were unreliable and causing confusion.
Solution: The theme voice command functionality was removed, and the UI was simplified to use a fixed theme, focusing on the core functionality of speech-to-slides conversion.
Conclusion
The Speech-to-Slides application demonstrates how modern web technologies can be combined to create a powerful tool that transforms the way we create presentations. By leveraging the Web Speech API, React, and Reveal.js, the app provides a seamless experience for converting spoken words into professional-looking slides.
The development process, guided by just three prompts, showcases the efficiency of iterative development—starting with core functionality, addressing bugs, and refining the user experience. The result is a practical tool that could be valuable for anyone who prefers speaking over typing when creating presentations.
Potential future enhancements could include additional voice commands for text formatting, export options to various formats, image insertion capabilities, and improved punctuation handling. The foundation laid by this project provides a solid base for these extensions.