Self-Hosted AI news aggregator using Cloudflare Workers, Vectorize, and Nostr
2 points by delirehberi
I built this primarily to solve my own reading fatigue from jumping between HN, Lobsters, and Reddit, while keeping data completely within my own infrastructure.
Architecture & Storage Choices: The entire stack is built to live on Cloudflare’s free/low-cost tier to make self-hosting accessible.
Embeddings are generated with @cf/baai/bge-base-en-v1.5.
For Lobsters specifically, the background cron job polls the .json endpoints rather than scraping raw HTML. To bootstrap historical preferences, I added endpoints that ingest past JSON or RSS activity exports, so the cosine similarity calculation has a baseline vector profile to match against.
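To make the "baseline vector profile" idea concrete, here is a minimal sketch of how ranking against it could work. It assumes the profile is a plain average of the embeddings of past activity, which is my guess and not necessarily what the repo does. The names buildProfile, cosineSimilarity, and rankByProfile are hypothetical. In a real deployment a Vectorize index configured with the cosine metric would likely do the comparison server-side; this only shows the math.

```typescript
// Hypothetical helpers; names are illustrative and not taken from the repo.
type Vec = number[];

// Average past-activity embeddings into a single baseline profile vector.
// Assumption: a simple mean is a good enough profile.
function buildProfile(history: Vec[]): Vec {
  if (history.length === 0) throw new Error("need at least one history vector");
  const dim = history[0].length;
  const profile = new Array<number>(dim).fill(0);
  for (const v of history) {
    if (v.length !== dim) throw new Error("dimension mismatch");
    for (let i = 0; i < dim; i++) profile[i] += v[i] / history.length;
  }
  return profile;
}

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 if either vector is all zeros.
function cosineSimilarity(a: Vec, b: Vec): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Return a new array of candidate stories sorted by similarity to the profile, best match first.
function rankByProfile<T extends { embedding: Vec }>(profile: Vec, items: T[]): T[] {
  return [...items].sort(
    (x, y) => cosineSimilarity(profile, y.embedding) - cosineSimilarity(profile, x.embedding),
  );
}
```

Without any imported history, buildProfile has nothing to average, which is why the bootstrap endpoints matter: a new install would otherwise have no profile to rank against.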
I chose bge-base-en-v1.5 because it runs entirely inside Workers AI with no external cold starts, but I’m experimenting with more performant embedding models as they become available natively on the edge.
The repo includes a Makefile that handles full remote provisioning (make db-init, make vectorize-init, make deploy), so you don't have to navigate the Cloudflare UI manually.
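On the embedding model choice, a rough sketch of calling the model through a Workers AI binding might look like the following. The AiBinding interface is declared locally so the snippet stands alone; it reflects my understanding of the response shape for text embedding models (a data array with one vector per input) and should be checked against the Workers AI docs. embedTitles is a hypothetical helper, not a function from the repo.

```typescript
// Minimal local description of the Workers AI binding, so the sketch is self-contained.
// Assumption: embedding models return { shape, data } with one vector per input string.
interface AiBinding {
  run(
    model: string,
    input: { text: string[] },
  ): Promise<{ shape: number[]; data: number[][] }>;
}

const EMBEDDING_MODEL = "@cf/baai/bge-base-en-v1.5";

// Embed a batch of story titles; resolves to one vector per input string.
async function embedTitles(ai: AiBinding, titles: string[]): Promise<number[][]> {
  if (titles.length === 0) return []; // skip the call when there is nothing to embed
  const result = await ai.run(EMBEDDING_MODEL, { text: titles });
  if (result.data.length !== titles.length) {
    throw new Error("embedding count does not match input count");
  }
  return result.data;
}
```

Keeping the model name in a single constant means swapping in a different edge-native embedding model later is a one-line change, though the Vectorize index dimensions would also need to match the new model.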
Would love any feedback on the vector indexing strategy at the edge, or how people are handling semantic search history pruning over time.