editorial-system

Backend Architecture

Bootstrap and Request Lifecycle

  1. Entry point – Every public script under /api begins by requiring includes/init.php.
  2. Environment guards – init.php enforces PHP 8+, blocks API traffic when the global ONLINE constant is false, and answers CORS preflight requests.
  3. Authentication – HTTP Basic Auth credentials from Config::getAdminLogin() are required for API requests. CLI scripts (e.g., cron, wp-backfill) skip this gate via is_cli_request(). Slack webhook endpoints opt out by defining IS_PUBLIC_API.
  4. Configuration – init.php loads includes/config.php. The project ships includes/config.example.php; copy it to includes/config.php and fill in the database credentials, login credentials, OpenAI token, Firecrawl API key, WordPress origin URL, and Slack-compatible webhook URL.
  5. Common helpers – init.php defines json_response(), json_line(), read_json_body(), and a small resolve_url() utility; together they keep the endpoint scripts concise.
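The guard order in steps 2 and 3 can be sketched as follows. This is a Python illustration, not the PHP in init.php: the Request type, the gate() and check_basic_auth() functions, the hard-coded credentials, the 503/204/401 status codes, and the relative order of the ONLINE and preflight checks are all assumptions. It does show one detail worth keeping in any implementation: comparing Basic Auth credentials in constant time.

```python
import base64
import hmac
from dataclasses import dataclass, field

# Hypothetical stand-ins for values init.php obtains from Config and constants.
ONLINE = True
ADMIN_USER, ADMIN_PASSWORD = "admin", "secret"


@dataclass
class Request:
    method: str
    headers: dict = field(default_factory=dict)
    is_cli: bool = False     # plays the role of is_cli_request()
    is_public: bool = False  # plays the role of a script that defined IS_PUBLIC_API


def check_basic_auth(header: str) -> bool:
    """Validate an 'Authorization: Basic <base64>' header against the admin login."""
    if not header.startswith("Basic "):
        return False
    try:
        decoded = base64.b64decode(header[len("Basic "):], validate=True).decode("utf-8")
    except ValueError:  # covers binascii.Error and UnicodeDecodeError
        return False
    user, sep, password = decoded.partition(":")
    if not sep:
        return False
    # Compare as bytes in constant time; evaluate both before combining the results.
    user_ok = hmac.compare_digest(user.encode(), ADMIN_USER.encode())
    pass_ok = hmac.compare_digest(password.encode(), ADMIN_PASSWORD.encode())
    return user_ok and pass_ok


def gate(req: Request) -> int:
    """Return the status an init.php-style bootstrap would produce (200 = continue)."""
    if req.is_cli:
        return 200  # assumed: cron/wp-backfill runs bypass the HTTP-only guards
    if not ONLINE:
        return 503  # assumed status code while the API is switched off
    if req.method == "OPTIONS":
        return 204  # CORS preflight answered before authentication (assumed order)
    if req.is_public:
        return 200  # Slack webhook scripts opt out of Basic Auth
    if not check_basic_auth(req.headers.get("Authorization", "")):
        return 401
    return 200
```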

Configuration Objects

includes/config-schema.php declares the strongly-typed objects returned by Config:

Database Schema

includes/model.php contains a DB_SCHEMA SQL string that is executed via mysqli::multi_query when the Model is constructed. The most important tables are:

Table | Purpose | Key columns / constraints
users | Editorial staff list | Unique index on name; referenced by sources.author_id.
sources | Manually curated links awaiting processing | state (working/done/aborted), updated_date, content_md; foreign key to users.
keywords | Canonical keyword strings | Unique keyword; reused by both sources and WordPress posts.
sources_keywords_v2 | Many-to-many link between sources and keywords | Unique (source_id, keyword_id) pairs.
wp_authors, wp_posts | Mirror of the external WordPress site | wp_posts.title is stored already HTML-escaped; downstream readers must decode it before rendering.
task_groups, tasks | Orchestration of background fact/editorial checks | task_groups.reference_url, tasks.task_result_json, tasks.running_time_s.
logs | Structured application logs | component, log_json, date_s.
wp_tags, wp_categories, and their join tables | Term metadata from WordPress | Keep tag/category associations.
wp_posts_keywords | Keywords derived for each imported WordPress post | Maintained by replace_wp_post_keywords().
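The keywords / sources_keywords_v2 pairing is a standard many-to-many design: one canonical row per keyword string, plus a link table whose unique (source_id, keyword_id) index makes re-linking harmless. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL; the trimmed column lists and the link_keywords() helper are hypothetical, and the real DB_SCHEMA and Model methods may differ.

```python
import sqlite3

# Simplified SQLite stand-in for the MySQL tables; not the real DB_SCHEMA.
SCHEMA = """
CREATE TABLE IF NOT EXISTS keywords (
    id INTEGER PRIMARY KEY,
    keyword TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS sources_keywords_v2 (
    source_id INTEGER NOT NULL,
    keyword_id INTEGER NOT NULL REFERENCES keywords(id),
    UNIQUE (source_id, keyword_id)
);
"""


def link_keywords(conn, source_id, keywords):
    """Attach keywords to a source, reusing canonical rows and skipping duplicate pairs."""
    for kw in keywords:
        # INSERT OR IGNORE is SQLite syntax; MySQL/MariaDB would use INSERT IGNORE
        # (or ON DUPLICATE KEY UPDATE) against the same unique index.
        conn.execute("INSERT OR IGNORE INTO keywords (keyword) VALUES (?)", (kw,))
        (kw_id,) = conn.execute("SELECT id FROM keywords WHERE keyword = ?", (kw,)).fetchone()
        conn.execute(
            "INSERT OR IGNORE INTO sources_keywords_v2 (source_id, keyword_id) VALUES (?, ?)",
            (source_id, kw_id),
        )


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
link_keywords(conn, 1, ["ai", "policy"])
link_keywords(conn, 1, ["ai"])  # duplicate (source, keyword) pair is ignored
link_keywords(conn, 2, ["ai"])  # reuses the canonical "ai" row
```

After these calls there are two keyword rows and three link rows, no matter how often a link is repeated.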

Because DB_SCHEMA is re-executed every time a Model is instantiated, onboarding a new database only requires that the configured database user has DDL permissions; no separate install step is needed.
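Re-seeding on every instantiation is safe only because the DDL is idempotent, and that same property means later edits to an existing CREATE TABLE statement are skipped without an error on databases that already have the table. The sketch below demonstrates both effects with sqlite3. It assumes DB_SCHEMA relies on IF NOT EXISTS guards (the compatibility note that follows mentions CREATE INDEX IF NOT EXISTS); the logs columns shown are a trimmed, illustrative subset.

```python
import sqlite3

# Illustrative subset of a schema that is replayed on every startup.
SCHEMA_V1 = """
CREATE TABLE IF NOT EXISTS logs (id INTEGER PRIMARY KEY, component TEXT);
CREATE INDEX IF NOT EXISTS logs_component ON logs (component);
"""
# A later edit that adds a column to the same CREATE TABLE statement.
SCHEMA_V2 = """
CREATE TABLE IF NOT EXISTS logs (id INTEGER PRIMARY KEY, component TEXT, date_s INTEGER);
CREATE INDEX IF NOT EXISTS logs_component ON logs (component);
"""


def columns(conn, table):
    """Column names of `table`, read from SQLite's table_info pragma."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA_V1)
conn.executescript(SCHEMA_V1)  # replaying the seed is a no-op, not an error
conn.executescript(SCHEMA_V2)  # the table already exists, so date_s is NOT added
print(columns(conn, "logs"))    # ['id', 'component']
```

This is why schema changes to existing tables need an explicit ALTER TABLE rather than an edit to the CREATE statement alone.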

Compatibility note: the schema uses MySQL/MariaDB features that can be version-sensitive (for example, CREATE INDEX IF NOT EXISTS and defaults on TEXT/BLOB-like columns). This is confirmed to work on MariaDB 10.5.x, but may require edits for older MariaDB/MySQL versions.

Model Responsibilities

The Model class in includes/model.php encapsulates all data work and orchestrates external services:

WordPress Helpers

includes/wordpress.php supplies pure-PHP sanitizers:

WordPress Listings

Background Jobs

API Surface

Every file under /api is intentionally tiny. The table below summarizes each endpoint:

Endpoint | Methods | Summary
users.php | GET, POST, PATCH | Lists, creates, or renames users. Newly created users immediately appear in the SPA author selector.
user-counts.php | GET | Returns per-state source counts for one user (used to enforce WORKING_SOURCES_LIMIT).
sources.php | GET, POST, PATCH | Multi-mode endpoint: fetch by source_id, duplicate check (url + keywords + state), list by author_id, create sources (with Slack webhook notification and keyword array), or update state/content.
search-sources.php | GET | Keyword search against sources in a given workflow state.
keywords.php | GET | Lists all keywords with usage counts.
keyword-counts.php | GET | Returns working/done counts for a single keyword.
search-keywords.php | GET | Free-text keyword search returning canonical keyword strings.
detect-keywords.php | POST | Runs OpenAI keyword extraction on title/comment pairs and returns {keywords, title_ja}.
crawl.php | POST | Scrapes a URL via Firecrawl and responds with {md_content, title, description}.
fact-check.php | POST | Creates a fact/editorial background-check task group from raw text.
task-groups.php | GET | Lists task groups or fetches a single task group (with tasks and results).
logs.php | GET | Returns structured log rows from the logs table.
wp-authors.php | GET | Lists mirrored WordPress authors (optional limit/offset) or fetches a specific author by id; responses include slug, name, description, and job title.
wp-post.php | GET | Fetches a single mirrored WordPress post by id or searches posts by keywords; emits hydrated tags/categories/keywords.
wp-posts.php | GET | Lists mirrored WordPress posts for a specific author_id with optional limit/offset pagination (max 100 rows). The response matches the hydrated structure from wp-post.php.
background-check.php | CLI | Worker invoked by Model::background_check(); processes task groups for a given provider.
wp-backfill.php | CLI | Imports WordPress posts in bulk; not callable over HTTP.
cron.php | CLI, POST | Runs registered cron jobs (currently WordPress backfill plus keyword extraction for new posts).
slack-command.php | POST | Slack slash-command integration that triggers WordPress checks and replies with a results URL. Public endpoint guarded by a verification token.
slack-events.php | POST | Slack event endpoint (stub: currently handles URL verification and the SSL check only). Public endpoint guarded by a verification token.
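The two checks that gate source creation, the per-user WORKING_SOURCES_LIMIT (fed by user-counts.php) and the url + keywords + state duplicate check in sources.php, can be sketched as below. The limit value, the Source record, the normalize() helper, and the trimming/lower-casing of keywords are assumptions for illustration; whether the limit is enforced server-side or in the SPA is not stated here, so the sketch only shows the decision logic.

```python
from dataclasses import dataclass

WORKING_SOURCES_LIMIT = 5  # hypothetical value for illustration


@dataclass(frozen=True)
class Source:
    url: str
    keywords: frozenset  # order-insensitive, so ["a", "b"] equals ["b", "a"]
    state: str


def normalize(url, keywords, state):
    """Assumed normalization: trim the URL, trim and lower-case keywords."""
    return Source(url.strip(), frozenset(k.strip().lower() for k in keywords), state)


def can_create_source(working_count, existing, url, keywords, state="working"):
    """Return (ok, reason) for a proposed source; `existing` is a set of Source records."""
    if state == "working" and working_count >= WORKING_SOURCES_LIMIT:
        return False, "working source limit reached"
    if normalize(url, keywords, state) in existing:
        return False, "duplicate source"
    return True, "ok"
```

Using a frozen dataclass with a frozenset of keywords makes the duplicate check a plain set lookup, and keyword order cannot cause false negatives.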

All endpoints rely on the helper parsers in sources.php and search-sources.php to coerce keyword arrays, integer ids, and enum-style state inputs. New endpoints should follow the same pattern: validate input early, delegate to Model, and wrap the response with json_response().
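A minimal sketch of the validate-early pattern, written in Python rather than PHP. BadRequest, parse_id(), parse_keywords(), and parse_state() are hypothetical names, and accepting a comma-separated keyword string is an assumption; the allowed states match the working/done/aborted values listed for the sources table.

```python
import re

VALID_STATES = {"working", "done", "aborted"}


class BadRequest(ValueError):
    """Invalid client input; an endpoint would turn this into a 400 json_response()."""


def parse_id(value):
    """Accept a positive integer or its decimal string form (e.g. from a query string)."""
    if isinstance(value, bool):  # bool is an int subclass; reject it explicitly
        raise BadRequest("id must be an integer")
    if isinstance(value, int):
        n = value
    elif isinstance(value, str) and re.fullmatch(r"[0-9]+", value.strip()):
        n = int(value.strip())
    else:
        raise BadRequest("id must be an integer")
    if n <= 0:
        raise BadRequest("id must be positive")
    return n


def parse_keywords(value):
    """Accept a list of strings or a comma-separated string; trim, drop blanks, dedupe."""
    if isinstance(value, str):
        items = value.split(",")
    elif isinstance(value, list) and all(isinstance(v, str) for v in value):
        items = value
    else:
        raise BadRequest("keywords must be an array of strings")
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return list(dict.fromkeys(k for k in (item.strip() for item in items) if k))


def parse_state(value):
    """Accept only the enum-style workflow states used by the sources table."""
    if not isinstance(value, str) or value not in VALID_STATES:
        raise BadRequest("state must be one of: " + ", ".join(sorted(VALID_STATES)))
    return value
```

Raising a single exception type keeps each endpoint body linear: parse everything first, then call Model only with values that are already known to be well-formed.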

Error Handling & Responses

Extending the Backend Safely

  1. Make schema changes by editing DB_SCHEMA (mind existing data) and reinitializing the Model. In production, consider running migrations manually, because automated DDL fails when the database user lacks the required permissions.
  2. For new REST features, create a PHP file under /api, require init.php, guard HTTP methods, and call new Model methods.
  3. When persisting WordPress data, continue storing title.rendered as-is and decode only at the presentation layer.
  4. Keep OpenAI/Firecrawl access centralized inside Model/FirecrawlConfig so secrets remain in Config objects.
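For rule 3, decoding at the presentation layer is a single call in most languages. In Python it is html.unescape(); display_title() below is a hypothetical wrapper. Use the decoded string only where it is inserted as text (for example, via a DOM textContent assignment); feeding a decoded title back into raw HTML would undo the escaping WordPress applied.

```python
import html


def display_title(stored_title: str) -> str:
    """Decode a wp_posts.title value (stored HTML-escaped) for plain-text display."""
    return html.unescape(stored_title)


print(display_title("Q&amp;A: What&#8217;s next?"))  # Q&A: What’s next?
```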