Exploring the World of Text: A Comprehensive Guide
This article aims to present an abstraction of ideas described in FreezeFlix’s World Of Text article (https://freezflix.gitbook.io/freezflix) and implementation ideas, as well as Technithusiasts implementation of these ideas as outlined in his YouTube videos (here and here).
At a higher abstraction level, the ideas presented in the FreezeFlix article are essentially a Personal Information Management (PIM) system. However, FreezeFlix improves on this by adding automation where possible, and Technithusiast (partially) uses different methods for this automation.
Process
Let’s look at a higher-level schematic of the ideas suggested:
At this higher level, the system follows a classic information processing cycle:
CONVERT: Transform various media types into text (OCR, speech-to-text, etc.)
CAPTURE: Collect and store the converted text and direct text inputs
PROCESS: Enrich and categorize the information
ACTION: Convert to tasks or knowledge base entries
This abstraction shows that "World of Text" is really about transforming unstructured thoughts into structured, actionable information - a digital implementation of the "Getting Things Done" (GTD) methodology or a Personal Knowledge Management (PKM) system.
Automation
Each block offers different automation opportunities:
CONVERT
Automated OCR for images
Speech-to-text for audio files
Automated folder monitoring for new media
CAPTURE
Webhook endpoints for external services
API integration points
Automated file system monitoring
PROCESS
Node-RED flows for routing and transformation
Natural Language Processing for categorization
Pattern matching for content classification
ACTION
Automatic task creation
Storage automation
Automated notifications/alerts
Within each block, different applications (local or cloud-based) can be used to achieve certain tasks while Node-RED can act as the "glue" connecting all these automation points, creating a seamless flow from input to action. Each block can also have its own set of automation rules and triggers while contributing to and maintaining the overall flow.
Options, option, options..
Node-RED appears to be a great choice to act as the overall process manager for this information flow, but there are other options. And while some block actions can also be performed by Node-RED functions directly, external software may be better suited to perform that action. And while it is certainly possible to use a MongoDB database instance to store information (as per FreezeFlix’s outline), there are other options as well (as Technithusiast demonstrated by using Notion).
Remember that with every addition to the software stack used to accomplish this there is the added benefit of system maintenance: a MongoDB server and database must be maintained, as does a Node-RED instance. This takes up the very time you want to save with a PKM.
And sometimes you want a system to be flexible enough to also be able to store source data (i.e. the actual media file).
Take-away here is that there is more than one way to go about achieving this. Key is doing this efficiently, with a minimum of functional overlap to minimize cost.
Implementation
As far as implementing this is concerned, as Node-RED is already running my home i am confident that this the easiest and most viable option to manage the entire process flow.
For the rest of the functions however, things are not as clear-cut. Whisper STT (Speech To Text) is already running as part of Home Assistant, but I’m not sure it’s possible to send an audio file to it and have it generate the text.
Methods to investigate further:
Google NotebookLM
May offer useful functions (once available in Europe). Appears to not be available by API, so probably of limited use.
Pinokio
Local AI\LLM implementation with ability to run multiple applications (free, if your system is capable enough..). First tests suggest this might not be feasible unless running you’re a 4090 GPU on an Intel I7 system 24/7.
Whisper STT (Speech To Text)
Already running as part of Home Assistant, Whisper STT might be used to convert audio to text. If not using the Home Assistant instance maybe a separate instance may work (using Pinokio?).
PLAUD.AI NotePin or Note
Capturing voice data with AI transcoding and enrichment options (paid). Preferable to using a smart phone because dedicated in function and possibly more unobtrusive in use. However, also 1 more device to charge & take along.
Questions: how to integrate in a Node-RED workflow, how to get voice data automatically, what about sending non-plaud-recorded audio
Capabilities
Object-based Note\PKM application with AI option (free + paid). Multi-platform, cloud-based (local with sync being worked on). API option in paid plan. Could function as storage as well. Realy promissing software!
Questions: API not fully developed yet, to what extent can in-app-AI help in automation, integrate other PKM functions
TickTick
ToDo list application (integrates with Capabilities). Also adds calendar etc.
AI (OpenAI, Abacus AI, …)
Possibly there is need for an additional AI connection for tasks that cannot be handled by the applications. API should be available.
OpenAI: DeFacto standard, ChatGPT, select between multiple generations of 1 model.
AbacusAI: ChatLLM, select between multiple models allows use of most appropriate model.
Nice to have
Of course, it would be nice to have an integration with Home Assistant. As Node-RED is to be used as the main flow manager, this shouldn’t be a problem. But good to be mindfull of this anyway.
Subscribe to my newsletter
Read articles from DensTechDen directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by