Every script under /api begins by requiring includes/init.php.

- init.php enforces PHP 8+, blocks API traffic when the global ONLINE constant is false, and answers CORS preflight requests.
- HTTP Basic credentials matching Config::getAdminLogin() are required for API requests. CLI scripts (e.g., cron, wp-backfill) skip this gate via is_cli_request(). Slack webhook endpoints opt out by defining IS_PUBLIC_API.
- init.php loads includes/config.php. The project ships includes/config.example.php; copy it to config.php and fill in the database credentials, login info, OpenAI token, Firecrawl API key, WordPress origin URL, and the Slack-compatible webhook.
- init.php defines json_response(), json_line(), read_json_body(), and a tiny resolve_url() utility, which keep the endpoint scripts concise.

includes/config-schema.php declares the strongly-typed objects returned by Config:

- MysqlConfig – host/user/password/database information for the primary MySQL instance.
- LoginConfig – HTTP Basic username/password (currently stored in plaintext; consider hashing if expanding auth).
- WordPressConfig – WordPress origin, timezone helpers, and back_fill(), which hits /wp-json/wp/v2/posts with _embed to hydrate WordPressPost instances.
- FirecrawlConfig – wraps the Firecrawl v2 scrape API (Markdown + metadata normalization) used by /api/crawl.php.
- OpenAiConfig – client for the OpenAI Responses API, used for keyword extraction and checks.
- AnthropicConfig – client for the Anthropic Messages API, used for background checks alongside OpenAI.
- ConfigSchema – abstract base that includes/config.php must implement; includes/config.example.php shows the expected overrides.

includes/model.php contains a DB_SCHEMA SQL string that is executed via mysqli::multi_query when the Model is constructed. The most important tables are:

| Table | Purpose | Key Columns / Constraints |
|---|---|---|
| users | Editorial staff list | name unique index; referenced by sources.author_id. |
| sources | Manually curated links awaiting processing | state (working/done/aborted), updated_date, content_md, foreign key to users. |
| keywords | Canonical keyword strings | Unique keyword; reused by both sources and WP posts. |
| sources_keywords_v2 | Many-to-many link between sources and keywords | Unique (source_id, keyword_id) pairs. |
| wp_authors/wp_posts | Mirror of the external WordPress site | wp_posts.title is stored already HTML-escaped; downstream readers must decode before rendering. |
| task_groups/tasks | Background fact/editorial check orchestration | task_groups.reference_url, tasks.task_result_json, tasks.running_time_s. |
| logs | Structured application logs | component, log_json, date_s. |
| wp_tags, wp_categories, join tables | Term metadata from WordPress | Keep tag/category associations. |
| wp_posts_keywords | Keywords derived per imported WP post | Maintained by replace_wp_post_keywords(). |

Because the schema is seeded on every Model instantiation, onboarding a new database only requires granting DDL permissions.
Compatibility note: the schema uses MySQL/MariaDB features that can be version-sensitive (for example, CREATE INDEX IF NOT EXISTS and defaults on TEXT/BLOB-like columns). This is confirmed to work on MariaDB 10.5.x, but may require edits for older MariaDB/MySQL versions.
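
As a rough illustration of the compatibility note, the sketch below checks whether a server version string falls in the MariaDB 10.5+ range that the note describes as confirmed. The function name is hypothetical and not part of the project. The input is assumed to be a version string such as the one exposed by mysqli's server_info property, and the handling of a leading "5.5.5-" prefix reflects a convention used by some MariaDB builds.

```php
<?php
// Hypothetical helper (not part of the project): report whether a server
// version string identifies MariaDB 10.5 or newer, the range the
// compatibility note describes as confirmed.
function is_tested_mariadb_version(string $serverInfo): bool
{
    // Anything that does not identify itself as MariaDB is treated as untested.
    if (stripos($serverInfo, 'mariadb') === false) {
        return false;
    }

    // Some MariaDB builds report versions as "5.5.5-10.x.y-MariaDB";
    // strip that compatibility prefix before parsing.
    $version = preg_replace('/^5\.5\.5-/', '', $serverInfo);

    // Extract the leading numeric version (major.minor[.patch]).
    if (!preg_match('/^(\d+\.\d+(?:\.\d+)?)/', $version, $m)) {
        return false;
    }

    return version_compare($m[1], '10.5', '>=');
}
```

A caller could log a warning before seeding the schema when this returns false. A false result only means the version is outside the confirmed range, not that the schema will necessarily fail.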
The Model class in includes/model.php encapsulates all data work and orchestrates external services:

- Users: get_users(), add_user(), update_user(), get_user_by_id(), get_user_state_counts().
- Sources: CRUD helpers (get_sources(), get_source_by_id(), add_source(), update_source_content(), change_source_state()), dedupe logic (check_sources()), keyword joins, and keyword search (search_sources() / search_keywords()).
- Keyword detection: detect_keywords() calls OpenAI with a structured schema for Japanese and English keywords plus a localized title, then normalizes casing, deduplicates, and lowercases hyphenated forms before persisting.
- Checks: fact_check() and editorial_check() run structured calls against OpenAI or Anthropic. background_check() creates a task group and spawns two background tasks (one per provider) via api/background-check.php.
- WordPress mirror: get_wordpress_author(), list_wordpress_authors(), get_wordpress_post(), search_wp_posts(), list_wp_posts_by_author(), hydrate_wp_post_terms(), and upsert_wordpress_posts() manage authors, posts, and associated terms. Keyword enrichment for WP content relies on the shared keywords table (replace_wp_post_keywords()).
- Crawling: for /api/crawl.php, the API defers to FirecrawlConfig::scrape() and stores Markdown in sources.content_md.

includes/wordpress.php supplies pure-PHP sanitizers:

- html_to_safe_markdown() and node_to_md() convert scraped HTML into allowlisted Markdown (dropping scripts/styles) so editors can copy clean text.
- sanitize_url() rejects non-http(s)/mailto schemes when parsing user content.

Because WordPressConfig::back_fill() receives already-escaped title.rendered strings from WordPress, the backend stores them verbatim in wp_posts.title. All consumers (see resolveWordPressTitle() in main.js) must HTML-decode prior to rendering.

- /api/wp-posts.php exposes an author-focused listing endpoint. Provide author_id (required) plus optional limit and offset query parameters; the endpoint clamps limit to 100 rows and never allows negative offsets.
- Model::list_wp_posts_by_author() powers the endpoint, internally reusing hydrate_wp_post_terms() so every post includes tags, categories, and keyword arrays just like /api/wp-post.php.
- The response echoes author_id, limit, and offset to keep pagination logic simple for SPA callers.

Scheduled and bulk imports:

- includes/cron-tasks.php declares a lightweight Cron registry. Jobs call Cron::register_job() and are executed in order via Cron::run_all_jobs().
- api/cron.php registers a job that first syncs WordPress authors via /wp-json/wp/v2/users (updating wp_authors), then paginates through remote WordPress posts newer than get_yesterday_midnight() (currently 120 days back), upserts posts/terms, and runs keyword detection for new posts. The endpoint must be invoked via CLI or an authenticated POST.
- api/wp-backfill.php is CLI-only and performs a historical import starting from the page index that you pass as $argv[1].

Every file under /api is intentionally tiny. The table below summarizes behavior:

| Endpoint | Methods | Summary |
|---|---|---|
| users.php | GET, POST, PATCH | List, create, or rename users. Newly created users immediately appear in the SPA author selector. |
| user-counts.php | GET | Returns per-state source counts for one user (used to enforce WORKING_SOURCES_LIMIT). |
| sources.php | GET, POST, PATCH | Multi-mode endpoint: fetch by source_id, dedupe check (url+keywords+state), list by author_id, create sources (with Slack webhook + keyword array), or update state/content. |
| search-sources.php | GET | Keyword search against sources for a given workflow state. |
| keywords.php | GET | List all keywords with usage counts. |
| keyword-counts.php | GET | Return working/done counts for a single keyword. |
| search-keywords.php | GET | Free-text keyword search, returning canonical keyword strings. |
| detect-keywords.php | POST | Runs OpenAI keyword extraction on title/comment pairs and returns {keywords, title_ja}. |
| crawl.php | POST | Scrapes a URL via Firecrawl and responds with {md_content, title, description}. |
| fact-check.php | POST | Creates a fact/editorial background check task group from raw text. |
| task-groups.php | GET | Lists task groups or fetches a single task group (with tasks + results). |
| logs.php | GET | Returns structured log rows from the logs table. |
| wp-authors.php | GET | List mirrored WordPress authors (optional limit/offset) or fetch a specific author via id; responses include slug, name, description, and job title. |
| wp-post.php | GET | Fetch a single mirrored WordPress post by id or search posts by keywords; emits hydrated tags/categories/keywords. |
| wp-posts.php | GET | List mirrored WordPress posts for a specific author_id with optional limit/offset pagination (max 100 rows). Response matches the hydrated structure from wp-post.php. |
| background-check.php | CLI | Worker invoked by Model::background_check(); processes task groups for a given provider. |
| wp-backfill.php | CLI | Imports WordPress posts in bulk; not callable over HTTP. |
| cron.php | CLI / POST | Runs registered cron jobs (currently WordPress backfill + keyword extraction for new posts). |
| slack-command.php | POST | Slack slash-command integration that triggers WordPress checks and replies with a results URL. Public endpoint guarded by a verification token. |
| slack-events.php | POST | Slack event endpoint (STUB: currently URL verification + SSL check only). Public endpoint guarded by a verification token. |

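
The limit/offset behavior described for wp-posts.php (a cap of 100 rows and no negative offsets) can be sketched as a small pure function. This is an illustration only: the function name, the constant name, and the default limit of 20 are assumptions, not values taken from the project.

```php
<?php
// Hypothetical sketch of pagination clamping; names and the default
// limit are assumptions, only the 100-row cap comes from the docs.
const MAX_PAGE_LIMIT = 100;

function clamp_pagination(mixed $rawLimit, mixed $rawOffset, int $defaultLimit = 20): array
{
    // filter_var returns false for anything that is not an integer.
    $limit = filter_var($rawLimit, FILTER_VALIDATE_INT);
    $offset = filter_var($rawOffset, FILTER_VALIDATE_INT);

    // Missing or non-positive limits fall back to the default;
    // oversized limits are capped.
    $limit = ($limit === false || $limit < 1)
        ? $defaultLimit
        : min($limit, MAX_PAGE_LIMIT);

    // Missing or negative offsets become zero.
    $offset = ($offset === false) ? 0 : max($offset, 0);

    return ['limit' => $limit, 'offset' => $offset];
}
```

An endpoint might call clamp_pagination($_GET['limit'] ?? null, $_GET['offset'] ?? null) and pass the resulting values to the Model.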
All endpoints rely on the helper parsers in sources.php/search-sources.php for coercing keyword arrays, integer ids, and enum-style state inputs. Any new endpoint should follow that pattern: validate early, delegate to Model, and wrap the response via json_response().
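
A minimal sketch of the "validate early" step follows, covering the three input kinds mentioned above: integer ids, enum-style states, and keyword arrays. The function names here are hypothetical; the actual parsers in sources.php/search-sources.php may be named and behave differently. The state list mirrors the working/done/aborted values given for sources.state.

```php
<?php
// Hypothetical input parsers illustrating early validation; the project's
// real helpers may differ in name and behavior.
const SOURCE_STATES = ['working', 'done', 'aborted'];

// Returns a positive integer id, or null when the input is not one.
function parse_positive_int(mixed $raw): ?int
{
    $value = filter_var($raw, FILTER_VALIDATE_INT, ['options' => ['min_range' => 1]]);
    return $value === false ? null : $value;
}

// Returns the state when it is one of the allowed values, otherwise null.
function parse_state(mixed $raw): ?string
{
    return is_string($raw) && in_array($raw, SOURCE_STATES, true) ? $raw : null;
}

// Returns a de-duplicated list of trimmed, non-empty keyword strings.
function parse_keywords(mixed $raw): array
{
    if (!is_array($raw)) {
        return [];
    }
    $clean = [];
    foreach ($raw as $item) {
        if (!is_string($item)) {
            continue; // skip non-string entries
        }
        $item = trim($item);
        if ($item !== '') {
            $clean[] = $item;
        }
    }
    return array_values(array_unique($clean));
}
```

With helpers like these, an endpoint can reject a request with a 4xx response as soon as a parser returns null, before any Model method is called.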

- Errors are returned as { "error": "message" } with HTTP 4xx/5xx status codes.
- Long-running operations report through json_line() so callers can monitor progress.
- Schema changes are made by editing DB_SCHEMA (mind existing data) and reinitializing; consider running migrations manually in production, as automated DDL may fail without permissions.
- To add an endpoint, create a file under /api, require init.php, guard HTTP methods, and call new Model methods.
- Store title.rendered as-is and decode only at the presentation layer.
- Route external service calls through Model/FirecrawlConfig so secrets remain in Config objects.
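
To illustrate why stored titles should be decoded only at the presentation layer, here is a minimal sketch of a PHP-side consumer. The SPA's own decoding lives in resolveWordPressTitle() in main.js and is not reproduced here; both function names below are hypothetical.

```php
<?php
// Hypothetical consumer-side helpers; not part of the project.

// Turn the stored, already-escaped title back into plain text.
function decode_stored_title(string $stored): string
{
    return html_entity_decode($stored, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

// Decode first, then escape exactly once for HTML output.
function render_title_html(string $stored): string
{
    return htmlspecialchars(decode_stored_title($stored), ENT_QUOTES, 'UTF-8');
}

// Escaping the stored value directly would double-escape it:
// htmlspecialchars('Tom &amp; Jerry') yields 'Tom &amp;amp; Jerry'.
```

Decoding before escaping keeps the output at a single level of escaping, whichever context the title ends up in.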