Audiocraft
Automate and integrate Audiocraft audio generation into your projects
Audiocraft is a community skill for generating music and audio using Meta AudioCraft models, covering text-to-music generation, audio continuation, melody conditioning, sound effect synthesis, and model configuration for AI audio production workflows.
What Is This?
Overview
Audiocraft provides patterns for using Meta AudioCraft models to generate and manipulate audio programmatically. It covers text-to-music generation, which creates musical compositions from natural language descriptions of genre, mood, and instrumentation; audio continuation, which extends existing audio clips with coherent follow-up content; melody conditioning, which generates music matching a provided melody reference while varying arrangement and style; sound effect synthesis, which creates environmental sounds and sound effects from text descriptions; and model configuration, which tunes generation parameters for duration, quality, and inference speed. The skill lets developers build audio generation features into applications without deep expertise in digital signal processing or music theory.
Who Should Use This
This skill serves application developers adding AI music generation to creative tools, game developers generating dynamic background music and sound effects, and content creators building automated audio production pipelines. It is also relevant for researchers and hobbyists exploring generative audio techniques.
Why Use It?
Problems It Solves
Licensing commercial music for applications is expensive and legally complex. Creating original music requires specialized skills and production time. Dynamic audio that adapts to context needs real-time generation capability. Sound effect libraries are limited and require searching through large collections. AudioCraft addresses these constraints by enabling on-demand, royalty-free audio generation from simple text descriptions.
Core Highlights
MusicGen generates music from text prompts with controllable genre and mood. AudioGen produces sound effects from text descriptions. Melody conditioning preserves a reference melody while generating new arrangements. Duration and quality parameters balance generation speed with output fidelity.
How to Use It?
Basic Usage
```python
from audiocraft.models import MusicGen
import torchaudio

# Load the medium checkpoint (downloads weights on first use).
model = MusicGen.get_pretrained('facebook/musicgen-medium')
model.set_generation_params(
    duration=15,      # seconds of audio per clip
    temperature=1.0,  # sampling temperature
    top_k=250,        # top-k sampling
    cfg_coef=3.0,     # classifier-free guidance strength
)

descriptions = [
    'upbeat electronic dance music with synth pads and driving bass',
    'calm acoustic guitar with light percussion and ambient pads',
]

# Returns a tensor of shape [batch, channels, samples].
wav = model.generate(descriptions)

for i, audio in enumerate(wav):
    torchaudio.save(f'output_{i}.wav', audio.cpu(), sample_rate=32000)
```

Real-World Examples
```python
from audiocraft.models import MusicGen
import torchaudio

# The melody checkpoint supports chroma (melody) conditioning.
model = MusicGen.get_pretrained('facebook/musicgen-melody')
model.set_generation_params(duration=20)

# Load the reference melody and resample to the model's rate if needed.
melody, sr = torchaudio.load('reference.wav')
if sr != model.sample_rate:
    melody = torchaudio.functional.resample(melody, sr, model.sample_rate)

wav = model.generate_with_chroma(
    descriptions=['orchestral arrangement with strings and piano'],
    melody_wavs=melody.unsqueeze(0),  # add a batch dimension
    melody_sample_rate=model.sample_rate,
)

torchaudio.save('conditioned.wav', wav[0].cpu(), sample_rate=32000)
```

Advanced Tips
Adjust the cfg_coef parameter to control how closely the generation follows the text description: higher values produce more literal interpretations, while lower values allow more creative variation. Use the small model for rapid prototyping and the large model for production-quality output. Chain audio continuation with text conditioning to create longer compositions that evolve through multiple sections. When crafting prompts, include specific tempo indicators such as slow, mid-tempo, or fast alongside mood descriptors to improve output consistency.
When to Use It?
Use Cases
Generate background music for a video editing application from user text descriptions. Create dynamic game audio that adapts to gameplay context using text-conditioned generation. Build a sound effect generator for a content creation platform.
Related Topics
AI music generation, AudioCraft, MusicGen, audio synthesis, and generative audio.
Important Notes
Requirements
PyTorch with CUDA support for GPU-accelerated generation. AudioCraft library installed from the Meta research repository. Sufficient GPU memory for model loading; the medium model requires approximately 4GB of VRAM.
Usage Recommendations
Do: use descriptive prompts that specify genre, instruments, tempo, and mood for best results. Start with shorter durations to iterate on prompt quality before generating longer pieces. Use melody conditioning when you have a specific musical direction to maintain.
Don't: expect generated music to match the quality of professional studio productions; use the largest model for quick experiments, as it requires significant GPU resources; or generate very long durations in a single pass, since quality degrades after about 30 seconds.
Limitations
Generated audio quality decreases with durations beyond 30 seconds. The model generates only instrumental music, not vocals or lyrics. Real-time generation is not feasible on consumer hardware due to inference time. Model weights require significant disk space, with the large model exceeding 3GB. Output audio is 32kHz mono, which may need resampling for production use. Generated audio varies between runs with the same prompt due to stochastic sampling.