Common audio formats

Description

Audio files are processed to create journal entries with automatic transcription and AI-generated titles. They support coordinate correlation through GPS track matching.

Supported Formats

All audio formats supported by pydub.AudioSegment, including: - WAV (.wav) - MP3 (.mp3) - FLAC (.flac) - M4A (.m4a) - OGG (.ogg) - And other formats supported by FFmpeg

Processing Details

Time Extraction

Filename Parsing (Primary): Extracts numeric sequences from filename (e.g., note_20200101_040820.wav → 2020-01-01 04:08:20)
File Modification Time (Fallback): Uses the file's mtime when filename parsing fails

Coordinate Extraction

Time-Based Correlation Only: Audio files rely entirely on time correlation with GPS tracks. The system matches the audio timestamp with the nearest GPS track point within the configured time window.

Audio Processing

Format Conversion: All audio is converted to MP3 format for web compatibility
Transcription (Optional):
Automatic speech-to-text using AI/Ollama
Creates timestamped segments
Generates searchable text content
Title Generation: AI analyzes transcribed content to create descriptive titles
Web Player: Creates interactive audio player with transcript display

Configuration

features:
  transcription:
    enabled: !auto transcription.enabled    # Enable/disable transcription (auto-detect)
  llms:
    enabled: true                           # Enable/disable LLM features
    text_model: "llama3:8b"                # Model for transcription and titles
  geo_correlation:                         # For coordinate correlation
    enabled: true
    time_offset: !duration 0 seconds       # Audio device time offset
    max_time_diff: !duration 300 seconds   # Max correlation window

llm_prompts:
  generate_title:                          # Title generation settings
    prompt: |
      Create exactly one title that summarizes the following text...
    options:
      temperature: 0.2
      top_p: 0.8

Dependencies

pydub: Audio format conversion
FFmpeg: Backend for audio processing
Ollama: AI transcription and title generation (optional)

Tips for Best Results

Naming Convention: Use timestamp-based filenames for accurate time extraction
Time Synchronization: Keep audio recording device time synced with GPS device
Transcription Quality: Clear audio and supported language improve AI transcription accuracy
GPS Logging: Maintain GPS tracks during audio recording for location data

Output

Audio Player: Interactive web-based player in journal entries
Transcription: Searchable text content with timestamps
AI Titles: Automatically generated descriptive titles based on content
Location Data: Coordinates from GPS correlation (when available)