Architectural Document for Google TTS Python package

This document follows the arc42 template and uses the C4 model for its diagrams.

[TOC]

1. Introduction and Goals

  • Requirements Overview:

    • Provide text-to-speech functionality for various languages.

    • Handle texts of varying lengths, including very long texts.

    • Support real-time audio playback.

      More details can be found in the User Stories.

2. Constraints

  • Technical Constraints:

    • Requires internet connectivity for accessing the Google TTS service.

    • Dependent on external libraries (requests, playsound).

  • Operational Constraints:

    • Audio playback capabilities needed on the host machine.

3. Context and Scope

  • External Interfaces:

    • Google Translate TTS API for fetching speech audio.

  • User Interfaces:

    • Calling the text-to-speech function play_tts(text, language)

Hello World! Example

This example demonstrates how to use the google_text_to_speech package to convert a simple text string, “Hello World!”, into speech.

Prerequisites

Ensure that you have installed the google_text_to_speech package:

pip install google_text_to_speech

from google_text_to_speech import play_tts

# Text to be converted to speech
text = "Hello World!"
language = "en"  # Language code (e.g., "en" for English)

# Calling the text-to-speech function
play_tts(text, language)

4. Solution Strategy

  • Split long texts into sentences to avoid exceeding URL length limits.

  • Use multithreading to handle audio playback and file removal.

  • Organize the project using a src directory for better separation of source code and tests.
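The splitting step can be sketched as follows. This is a minimal illustration, not the package's actual code: the function names split_text and split_long_sentence come from the component diagram in this document, but their signatures, the sentence-boundary regular expression, and the MAX_CHARS value are assumptions.

```python
import re

# Assumed per-request character limit; the real limit is not stated in this document.
MAX_CHARS = 200


def split_long_sentence(sentence: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Break one sentence into chunks of at most max_chars, splitting on spaces."""
    chunks, current = [], ""
    for word in sentence.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = word  # a single word longer than max_chars is kept whole
    if current:
        chunks.append(current)
    return chunks


def split_text(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Split text into sentences, then split any sentence that is still too long."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    parts = []
    for sentence in sentences:
        if sentence:
            parts.extend(split_long_sentence(sentence, max_chars))
    return parts
```

Each returned chunk can then be sent as its own request, which keeps every URL short regardless of the total input length.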

5. Building Block View

@startuml
package "src/play_tts Module" {
    [play_tts] -right-> [generate_url]
    [play_tts] -down-> [play_and_remove_file]
}

package "External Dependencies" {
    (Google TTS API) ..> [Internet]
    (Audio Playback) ..> [System Audio Device]
}

[generate_url] .right.> (Google TTS API)
[play_and_remove_file] ..> (Audio Playback)
@enduml

5.1 Context Diagram:

  • System: Google Translate TTS Python Module

  • Users: Developers using the module

  • External Systems: Google Translate TTS API

@startuml Context Diagram
!include <C4/C4_Context>

Person(developer, "Developer", "Uses the Google Translate TTS Python Module")
System(googleTTSAPI, "Google Translate TTS API", "Provides text-to-speech service")
System_Boundary(sys, "Google Translate TTS Python Module") {
    System(googleTTSModule, "Google Translate TTS Module", "Python module for converting text to speech")
}

Rel(developer, googleTTSModule, "Uses")
Rel(googleTTSModule, googleTTSAPI, "Sends TTS requests to")
@enduml

5.2 Container Diagram:

  • Container: Python Module with different functionalities

@startuml C4 Container Diagram
!include <C4/C4_Container>

Person(user, "User", "Interacts with play_tts Module")
Container(playTTSModule, "play_tts Module", "Python", "Module to play text-to-speech")
System_Ext(googleTTSAPI, "Google TTS API", "External API for text-to-speech conversion")
System_Ext(audioPlayback, "Audio Playback", "System Audio Device")
System_Ext(internet, "Internet", "Facilitates external API communication")

Rel(playTTSModule, googleTTSAPI, "Uses for TTS conversion")
Rel(playTTSModule, audioPlayback, "Sends audio output to")
Rel(googleTTSAPI, internet, "Communicates via")
Rel(user, playTTSModule, "Uses")
@enduml

5.3 Component Diagram (focusing on the play_tts function):

  • Components: URL Generator, Audio Player, and Error Handler

@startuml Component Diagram
!include <C4/C4_Component>

Container_Boundary(googleTTSModule, "Google Translate TTS Module") {
    Component(generateURL, "generate_url()", "Python", "Generates TTS API URL")
    Component(playAndRemoveFile, "play_and_remove_file()", "Python", "Plays and removes audio file")
    Component(splitLongSentence, "split_long_sentence()", "Python", "Splits long sentences")
    Component(splitText, "split_text()", "Python", "Splits text into smaller parts")
    Component(playTTS, "play_tts()", "Python", "Plays text-to-speech for given text and language")
}

Rel(generateURL, playTTS, "Called by")
Rel(playAndRemoveFile, playTTS, "Called by")
Rel(splitLongSentence, splitText, "Called by")
Rel(splitText, playTTS, "Called by")
@enduml
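As an illustration of the URL Generator component, the sketch below builds a request URL with the standard library. The endpoint, the query parameter names, and the function signature are assumptions made for this example; this document does not specify them, and the real generate_url() may differ.

```python
from urllib.parse import urlencode

# Assumed endpoint; this document does not state the actual URL.
TTS_BASE_URL = "https://translate.google.com/translate_tts"


def generate_url(text: str, language: str) -> str:
    """Build a request URL for one chunk of text (hypothetical signature)."""
    # The parameter names below are assumptions, not taken from this document.
    query = urlencode({"ie": "UTF-8", "q": text, "tl": language, "client": "tw-ob"})
    return f"{TTS_BASE_URL}?{query}"
```

Using urlencode percent-encodes spaces and punctuation, so arbitrary text can be placed in the query string safely. The URL grows with the text length, which is why long inputs are split first.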

6. Runtime View

  • Sequence Diagram for play_tts Execution (PlantUML):

@startuml
actor User
participant "play_tts" as PTTS
participant "generate_url" as GURL
participant "Google TTS API" as GTTS
participant "File System" as FS
participant "Audio Playback" as AUDIO

User -> PTTS: Calls play_tts(text, lang)
loop for each sentence
    PTTS -> GURL: generate_url(sentence, lang)
    GURL -> GTTS: Request audio data
    GTTS --> GURL: Return audio data
    GURL --> PTTS: Return audio data
    PTTS -> FS: Save audio file
    PTTS -> AUDIO: Play audio file
    PTTS -> FS: Remove audio file
end
@enduml
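The loop in the sequence diagram can be sketched in Python as shown below. This is a simplified, sequential illustration under stated assumptions: speak_sentences, fetch_audio, and play_file are hypothetical names, and the network call and audio player are passed in as callables so that the flow can be exercised without internet access or an audio device.

```python
import os
import tempfile
from typing import Callable, Iterable


def speak_sentences(
    sentences: Iterable[str],
    language: str,
    fetch_audio: Callable[[str, str], bytes],
    play_file: Callable[[str], None],
) -> int:
    """Fetch, save, play, and remove one audio file per sentence; return the count played."""
    played = 0
    for sentence in sentences:
        audio = fetch_audio(sentence, language)  # e.g. an HTTP GET against the TTS URL
        fd, path = tempfile.mkstemp(suffix=".mp3")
        try:
            with os.fdopen(fd, "wb") as handle:
                handle.write(audio)
            play_file(path)  # assumed to block until playback finishes
            played += 1
        finally:
            os.remove(path)  # always clean up, even if playback fails
    return played
```

In a real run, fetch_audio would wrap the HTTP request and play_file would wrap the playback library. Stub functions are enough to check that every temporary file is removed.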

7. Deployment View (PlantUML Diagram)

  • Deployment of the Module:

@startuml
package "User's Machine" {
    [User Application] -right-> [src/play_tts Module]
    [System Audio Device] <-left- [src/play_tts Module]
}

node "Google Servers" {
    [Google TTS API]
}

[src/play_tts Module] -down-> [Google TTS API]
@enduml

8. Crosscutting Concepts

  • Concurrency: Use of threading for simultaneous audio playback and file operations.

  • Error Handling: Manage errors related to network issues, file operations, and external API limitations.
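The two concepts above might be combined as in the following sketch. The name play_and_remove_file appears in the component diagram, but the signature used here, the start_playback helper, and the choice to log errors rather than raise them are assumptions for illustration only.

```python
import logging
import os
import threading
from typing import Callable

logger = logging.getLogger(__name__)


def play_and_remove_file(path: str, play_file: Callable[[str], None]) -> None:
    """Play the file at path, then delete it, logging failures instead of raising."""
    try:
        play_file(path)
    except Exception:  # playback backends raise different exception types
        logger.exception("Playback failed for %s", path)
    finally:
        try:
            os.remove(path)
        except OSError:
            logger.warning("Could not remove %s", path)


def start_playback(path: str, play_file: Callable[[str], None]) -> threading.Thread:
    """Run play_and_remove_file on a worker thread and return the thread."""
    worker = threading.Thread(target=play_and_remove_file, args=(path, play_file))
    worker.start()
    return worker
```

Returning the thread lets the caller decide whether to join() it before starting the next sentence or to continue preparing the next audio file while the current one plays.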

9. Architecture Decisions

  • Decision to split text into sentences for handling long texts.

  • Use of external TTS service (Google TTS) for speech synthesis.

  • Adoption of a src directory structure for the project to cleanly separate source code from tests and documentation.

10. Quality Requirements

  • Performance: Handle long texts without significant delays.

  • Usability: Accurate and clear audio playback in requested languages.

10.1 Quality Reports

Lint score: Pylint Score

11. Risks and Technical Debt

  • Dependency on the availability and limitations of Google TTS API.

  • Potential issues with audio playback on different operating systems.

12. Glossary

  • TTS: Text-to-Speech

  • API: Application Programming Interface