How to Turn AI Sources into an Automated Writing Pipeline
2026/03/18

A workflow guide to AI source automation, covering fetching, filtering, deduplication, validation, and topic-card generation for writing systems.

Many AI content workflows start at the wrong end.

They begin with "generate the article automatically" before stabilizing the information flow that feeds the article. The result is predictable:

  • too many duplicated links
  • too many noisy items and too little prioritization
  • mixed-quality sources with weak verification
  • articles that feel assembled instead of selected

A better sequence is the reverse:

automate the source flow first, then automate the article flow.

This article uses a public AI newsletter fetcher as a concrete example. It does several things that matter:

  • fetches 20+ RSS sources
  • fetches Hacker News
  • fetches GitHub Trending
  • calls a HuggingFace paper fetcher
  • filters by AI-related keywords
  • validates GitHub results again with README content

Reference:

  • bozhouDev/bozhou-skills: fetch_ai_news.py

The core judgment: the bottleneck is usually filtering, not writing

If you already write AI-related WeChat content, the pattern becomes obvious quickly:

  • getting a first draft is not the hard part
  • filtering a large source pool into items worth writing about is harder

That means the first part worth standardizing is usually not the prose template. It is the upstream information flow.

A practical pipeline should include at least five layers:

  1. fetching
  2. filtering
  3. deduplication
  4. validation
  5. topic output

If you only solve layer one, you still end up with noise.

1. Fetching: do not depend on one interface type

One good design choice in the reference script is that it does not treat all sources as one category.

It separates:

  • RSS for stable subscriptions
  • Hacker News for community discussion
  • GitHub Trending for project discovery
  • HuggingFace papers for research input

That matters because the input layer is already multi-channel.

The immediate benefit is structural:

  • release news does not get mixed with project momentum
  • research updates are not drowned by discussion heat
  • later ranking can apply different weights by source class
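Keeping source classes separate can be sketched as a small tagging step at fetch time. This is an illustrative sketch, not the reference script's actual API: `tag_items` and the placeholder fetchers are hypothetical names, standing in for real RSS and API calls.

```python
# Sketch: label every item with its source class at fetch time, so
# later ranking can weight RSS, HN, GitHub, and papers differently.
# Fetcher internals are omitted; these placeholders stand in for
# real network calls.
from typing import Callable

def tag_items(fetch: Callable[[], list[dict]], source_class: str) -> list[dict]:
    """Run one fetcher and label each item with its source class."""
    return [{**item, "source_class": source_class} for item in fetch()]

def fetch_rss() -> list[dict]:
    return [{"title": "Release notes", "url": "https://example.com/r"}]

def fetch_hn() -> list[dict]:
    return [{"title": "Show HN: tool", "url": "https://example.com/h"}]

items = tag_items(fetch_rss, "rss") + tag_items(fetch_hn, "hn")
print([i["source_class"] for i in items])  # ['rss', 'hn']
```

Because the label travels with each item, a single downstream ranking function can apply per-class weights without re-inspecting where an item came from.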

2. Filtering: keep irrelevant material out early

The script uses an AI keyword filter that includes terms such as:

  • AI, LLM, GPT, Claude, Agent, RAG
  • DeepSeek, Gemini, Llama, MCP
  • Embedding, Vector DB, LoRA, vLLM, GGUF

That highlights an important rule:

the point of source automation is not to capture everything. It is to reject obviously irrelevant material early.

But keyword filtering alone is not enough.

It still has two weaknesses:

  • some relevant items do not expose the right keywords directly
  • some low-value items happen to match the keywords anyway

So keyword filtering works best as a first-pass screen, not a final relevance decision.
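A first-pass screen in that spirit can be sketched as follows. The keyword list echoes the terms quoted above; the word-boundary matching is my addition, since plain substring checks would let "maintain" match "ai".

```python
# Sketch of a first-pass keyword screen over title + summary.
# Word boundaries avoid false hits like "maintain" matching "ai".
import re

AI_KEYWORDS = ["ai", "llm", "gpt", "claude", "agent", "rag",
               "deepseek", "gemini", "llama", "mcp",
               "embedding", "vector db", "lora", "vllm", "gguf"]

PATTERN = re.compile(
    r"\b(?:" + "|".join(map(re.escape, AI_KEYWORDS)) + r")\b",
    re.IGNORECASE,
)

def first_pass(item: dict) -> bool:
    """Keep an item if its title or summary mentions any AI keyword."""
    text = f"{item.get('title', '')} {item.get('summary', '')}"
    return bool(PATTERN.search(text))

items = [
    {"title": "vLLM 0.6 released", "summary": ""},
    {"title": "Weekend recipes",   "summary": ""},
]
kept = [i["title"] for i in items if first_pass(i)]
print(kept)  # ['vLLM 0.6 released']
```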

3. Deduplication: otherwise you mistake repetition for value

The Hacker News part of the script deduplicates by HN discussion URL. That is a small detail with real editorial value.

Without deduplication, the system creates a false signal:

  • the same event appears in multiple places
  • repeated visibility gets mistaken for topic value
  • the output starts to look like trend chasing instead of topic selection

A steadier method is to split deduplication into two layers.

1. URL-level deduplication

The same link should only appear once.

2. Topic-level deduplication

The same launch, model update, or project release should have one primary entry plus a few supporting discussions, not a pile of near-duplicates.

That is how a source feed starts to look like editorial input rather than log output.
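The two layers can be sketched like this. URL dedup is exact; the topic key here is a deliberately crude normalized-title proxy, standing in for real clustering (embeddings, MinHash, and similar techniques):

```python
# Sketch of two-layer deduplication: exact URL dedup, then a crude
# topic key (lowercased, punctuation stripped) as a stand-in for
# real near-duplicate clustering.
import re

def url_dedup(items: list[dict]) -> list[dict]:
    seen, out = set(), []
    for it in items:
        if it["url"] not in seen:
            seen.add(it["url"])
            out.append(it)
    return out

def topic_key(title: str) -> str:
    # "Llama 4 released!" and "llama 4 released" collapse to one key
    return " ".join(re.findall(r"[a-z0-9]+", title.lower()))

def topic_dedup(items: list[dict]) -> list[dict]:
    groups: dict[str, list[dict]] = {}
    for it in items:
        groups.setdefault(topic_key(it["title"]), []).append(it)
    # keep the first item per topic as the primary entry
    return [group[0] for group in groups.values()]

items = [
    {"title": "Llama 4 released!", "url": "https://a.example/1"},
    {"title": "Llama 4 released",  "url": "https://b.example/2"},
    {"title": "Llama 4 released!", "url": "https://a.example/1"},
]
primary = topic_dedup(url_dedup(items))
print(len(primary))  # 1
```

A real topic layer would also keep the non-primary items as supporting discussion links rather than discarding them.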

4. Validation: do not confuse "looks like AI" with "is worth tracking"

One of the most useful parts of the reference script is the GitHub validation step:

  1. check whether the repo name or short description matches AI terms
  2. fetch the README
  3. check the README content again

That logic is worth copying.

Many automated pipelines fail because they trust title-level signals too much:

  • the name looks relevant
  • the description sounds relevant
  • the actual project still does not matter

README validation is really doing one thing:

do not stop at headline-level relevance. Verify body-level relevance.

The same principle applies beyond GitHub. Blog posts, newsletters, and release notes also benefit from a second-pass validation step.
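The headline-then-body pattern can be sketched as below. The README fetcher is injected as a parameter so the sketch stays offline; a real version would fetch the README via the GitHub API at that point. All names here are illustrative, not the reference script's.

```python
# Sketch of the two-pass relevance check: headline-level first,
# then body-level. fetch_readme is injected so this stays offline.
AI_TERMS = ["llm", "agent", "rag", "inference", "embedding"]

def headline_relevant(repo: dict) -> bool:
    text = f"{repo['name']} {repo.get('description', '')}".lower()
    return any(t in text for t in AI_TERMS)

def body_relevant(readme_text: str) -> bool:
    return any(t in readme_text.lower() for t in AI_TERMS)

def validate(repo: dict, fetch_readme) -> bool:
    if not headline_relevant(repo):
        return False          # cheap check first
    return body_relevant(fetch_readme(repo))

# A name-only match that the README check correctly rejects:
repo = {"name": "agent-game", "description": "a platformer"}
ok = validate(repo, lambda r: "A 2D platformer written in C.")
print(ok)  # False
```

The ordering matters: the cheap headline check gates the expensive README fetch, so most irrelevant repos never cost a network request.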

5. Topic output: do not stop at raw entries

Many pipelines stop once they have clean JSON.

That is not enough for a writing workflow.

A WeChat article does not need raw entries. It needs topic candidates.

The output should answer questions such as:

  • what category the item belongs to
  • who it matters to
  • why it matters now
  • whether it should become a brief, a commentary piece, a tutorial, or a selection article

In other words, the raw entry still needs one more transformation into a topic card.

A stronger output structure for writing systems

If the goal is to feed a writing workflow, useful output fields include:

  • source
  • category
  • title
  • url
  • time
  • summary
  • why_it_matters
  • best_for
  • story_angle

The last three fields are the important ones.

why_it_matters

Why this item is worth attention.

best_for

Which audience or account type it is most relevant to.

story_angle

Whether the item fits better as:

  • a brief
  • a commentary
  • a tool recommendation
  • a tutorial
  • a selection or comparison piece

Once the pipeline produces that layer, it starts serving writing instead of just collection.
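The field list above maps naturally onto a small record type. This is one possible shape, not a prescribed schema; the `story_angle` values mirror the article's five options.

```python
# Sketch of a topic-card record carrying the fields listed above.
from dataclasses import dataclass
from typing import Literal

StoryAngle = Literal["brief", "commentary", "tool_recommendation",
                     "tutorial", "selection"]

@dataclass
class TopicCard:
    source: str
    category: str
    title: str
    url: str
    time: str
    summary: str
    why_it_matters: str       # why this item is worth attention
    best_for: str             # which audience or account type
    story_angle: StoryAngle   # which article format fits best

card = TopicCard(
    source="github_trending",
    category="project",
    title="example/agent-kit",
    url="https://github.com/example/agent-kit",
    time="2026-03-18",
    summary="A toolkit for building agents.",
    why_it_matters="Sustained momentum after README validation.",
    best_for="developer-focused accounts",
    story_angle="tool_recommendation",
)
print(card.story_angle)  # tool_recommendation
```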

If you want this to work in production, add three more rules

1. Source weighting

For example:

  • official blogs should usually outrank second-hand retellings
  • GitHub items that pass README validation should outrank name-only matches
  • topics with sustained discussion should outrank one-off spikes
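Those three rules can be encoded as a simple additive score. The weights below are illustrative placeholders, not tuned values, and the field names (`source_class`, `readme_validated`, `discussion_days`) are assumptions about the item schema.

```python
# Sketch of rule-based source weighting following the three rules
# above. Weights are illustrative, not tuned.
BASE_WEIGHT = {"official_blog": 3.0, "github": 2.0, "retelling": 1.0}

def score(item: dict) -> float:
    s = BASE_WEIGHT.get(item["source_class"], 1.0)
    if item.get("readme_validated"):
        s += 1.0   # validated repos outrank name-only matches
    if item.get("discussion_days", 0) >= 2:
        s += 0.5   # sustained discussion outranks one-off spikes
    return s

items = [
    {"title": "retold news", "source_class": "retelling"},
    {"title": "repo", "source_class": "github", "readme_validated": True},
]
ranked = sorted(items, key=score, reverse=True)
print(ranked[0]["title"])  # repo
```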

2. Time windows

The reference script exposes --hours, which is more important than it first appears.

For WeChat writing, the goal is not always "newest possible."
The goal is often:

  • keep items fresh enough to matter
  • avoid dragging old items back into the daily queue

3. Output limits

The script also exposes --limit. This is not just an engineering detail. It is editorial discipline.

If the daily candidate set is too large, the pipeline produces clutter. A better pattern is:

  • collect broadly
  • shortlist narrowly
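The time-window and limit rules combine naturally into one shortlisting step, in the spirit of `--hours` and `--limit`. The function name and item schema here are illustrative:

```python
# Sketch: collect broadly, then shortlist by freshness window and
# a hard output limit, newest first.
from datetime import datetime, timedelta, timezone

def shortlist(items: list[dict], hours: int = 24, limit: int = 10) -> list[dict]:
    cutoff = datetime.now(timezone.utc) - timedelta(hours=hours)
    fresh = [i for i in items if i["published"] >= cutoff]
    fresh.sort(key=lambda i: i["published"], reverse=True)
    return fresh[:limit]

now = datetime.now(timezone.utc)
items = [
    {"title": "fresh", "published": now - timedelta(hours=2)},
    {"title": "stale", "published": now - timedelta(hours=48)},
]
print([i["title"] for i in shortlist(items)])  # ['fresh']
```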

A practical path inside the current project context

If this logic is applied to the md2wechat Agent API content flow, the order becomes:

  1. fetch AI sources
  2. apply source-type and keyword filtering
  3. validate projects and articles again
  4. convert the result into topic cards
  5. ask the agent to draft the article
  6. then continue into formatting, drafts, and publishing

That order is much steadier than asking the model to write first and justify later.

Closing thought

Turning AI sources into an automated writing pipeline is not mainly a collection problem. It is an editorial problem moved upstream.

The real goal is not only to fetch content. The goal is to make source material:

  • filterable
  • verifiable
  • rankable
  • convertible into topics

Once that part is stable, prompts, drafts, formatting, and publishing become much easier to automate without losing quality.

If you want the adjacent pieces next, continue with:

  • How to Find High-Quality AI Sources for WeChat Writing
  • How to Remove AI Tone from WeChat Official Account Articles
Author

geekjourney

Categories

  • Workflow
