Operations Guide
Prerequisites
- PHP: Version 8.0 or newer with the mysqli, curl, and json extensions enabled.
- MySQL / MariaDB: The MySQL user configured in MysqlConfig needs privileges to create tables and indexes inside the target schema.
- Web server: Any server capable of executing PHP files (Apache, nginx + php-fpm, Caddy, or simply PHP's built-in server via php -S).
- External tokens and URLs:
  - Firecrawl API key for /api/crawl.php.
  - OpenAI API token for keyword extraction.
  - Anthropic API token for background checks.
  - Slack (or compatible) webhook URL for new-source notifications.
  - Slack verification token for slack-command.php / slack-events.php.
  - Optional webhook URL for fact-check error notifications.
  - WordPress origin URL with REST API access for backfill jobs.
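The PHP version and extension requirements above can be verified before deployment. The following is a minimal sketch; find_missing_prerequisites() is a hypothetical helper, not part of the project, and it only covers the PHP-side items in the list.

```php
<?php
// Preflight sketch: report which PHP prerequisites from the list above are missing.
// The function name and return shape are illustrative, not part of the project.

function find_missing_prerequisites(
    string $phpVersion,
    array $loadedExtensions,
    string $minVersion = '8.0.0',
    array $requiredExtensions = ['mysqli', 'curl', 'json']
): array {
    $problems = [];

    if (version_compare($phpVersion, $minVersion, '<')) {
        $problems[] = "PHP $phpVersion is older than $minVersion";
    }

    // Compare extension names case-insensitively.
    $loaded = array_map('strtolower', $loadedExtensions);
    foreach ($requiredExtensions as $extension) {
        if (!in_array(strtolower($extension), $loaded, true)) {
            $problems[] = "missing extension: $extension";
        }
    }

    return $problems;
}

// Check the interpreter that is running this script.
$problems = find_missing_prerequisites(PHP_VERSION, get_loaded_extensions());
echo $problems === [] ? "OK\n" : implode("\n", $problems) . "\n";
```

The CLI and the web server SAPI can load different php.ini files, so it is worth running the check under both.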
Initial Setup
- Copy configuration: Duplicate includes/config.example.php to includes/config.php and fill in the MySQL, admin login, OpenAI, Firecrawl, webhook, and WordPress settings.
- Toggle availability: The ONLINE constant (defined near the top of config.php) disables every API endpoint when set to false, which is handy for maintenance windows.
- Launch server: From the project root, run php -S 127.0.0.1:8000 (or configure your preferred web server) and open that address in a browser. The SPA prompts for Basic Auth credentials and stores them in localStorage.
- Seed users: The UI is empty until you create at least one user via the “ユーザー” (Users) page; this populates the author dropdown used when logging new sources.
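To illustrate the ONLINE toggle, here is a rough sketch of how an endpoint might gate itself on that constant. maintenance_response() is a hypothetical helper, and the JSON body and Retry-After value are assumptions; only the ONLINE constant and the HTTP 503 behaviour are described in this guide.

```php
<?php
// Sketch of a maintenance gate. maintenance_response() is hypothetical;
// the response body and the Retry-After value are illustrative assumptions.

function maintenance_response(bool $online): ?array
{
    if ($online) {
        return null; // Normal operation: let the endpoint run.
    }

    return [
        'status'  => 503,
        'headers' => ['Content-Type: application/json', 'Retry-After: 300'],
        'body'    => json_encode(['error' => 'maintenance']),
    ];
}

// Usage at the top of an endpoint (ONLINE would come from config.php):
// $response = maintenance_response(ONLINE);
// if ($response !== null) {
//     http_response_code($response['status']);
//     foreach ($response['headers'] as $header) {
//         header($header);
//     }
//     echo $response['body'];
//     exit;
// }
```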
Database Management
- Schema creation happens automatically whenever new Model(...) runs, driven by the DB_SCHEMA constant in includes/model.php. No manual migrations are required during development.
- For production, ensure the configured MySQL account can run CREATE TABLE, ALTER TABLE, and CREATE INDEX in the target schema (the CREATE, ALTER, and INDEX privileges); otherwise the automatic bootstrap will fail.
- sources.updated_date tracks the last mutation time in Unix seconds, which the frontend uses for timestamps.
- Compatibility note: DB_SCHEMA is verified on MariaDB 10.5.x. Other MySQL/MariaDB versions may need adjustments (for example, CREATE INDEX IF NOT EXISTS support or defaults on TEXT/BLOB-like columns).
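In MySQL and MariaDB, the statements used by the automatic bootstrap (CREATE TABLE, ALTER TABLE, CREATE INDEX) are covered by the CREATE, ALTER, and INDEX privileges. A pre-deployment check could compare those against the account's grants. The sketch below assumes the lines come from SHOW GRANTS FOR CURRENT_USER; missing_privileges() is a hypothetical helper and its parsing is deliberately simple.

```php
<?php
// Sketch: find privileges missing for one schema, given SHOW GRANTS output lines.
// missing_privileges() is a hypothetical helper, not part of the project.

function missing_privileges(
    array $grantLines,
    string $schema,
    array $required = ['CREATE', 'ALTER', 'INDEX']
): array {
    $granted = [];

    foreach ($grantLines as $line) {
        // Capture the privilege list and the object the grant applies to.
        if (!preg_match('/^GRANT\s+(.+?)\s+ON\s+(\S+)\s+TO\s/i', $line, $m)) {
            continue;
        }
        $target = str_replace('`', '', $m[2]);
        if ($target !== '*.*' && $target !== $schema . '.*') {
            continue; // Grant applies to another schema or to a single table.
        }
        foreach (explode(',', $m[1]) as $privilege) {
            $granted[strtoupper(trim($privilege))] = true;
        }
    }

    if (isset($granted['ALL PRIVILEGES'])) {
        return [];
    }

    return array_values(array_filter($required, fn ($p) => !isset($granted[$p])));
}
```

This does not handle table-level grants, roles, or schema names that appear with escaped wildcard characters, so treat an unexpected result as a prompt to read the grants by hand.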
Running Background Jobs
Scheduled WordPress Sync (api/cron.php)
- Invoke via CLI (php api/cron.php) or send an authenticated POST request.
- The job loops through pages of WordPress posts (up to 100 posts per API call) newer than the cutoff returned by WordPressConfig::get_yesterday_midnight() (despite the name, currently “120 days ago, midnight, WordPress TZ”).
- Each iteration upserts authors/posts/terms and runs keyword detection for posts that were new to the local database.
- Each new WordPress post also triggers a background fact/editorial check and stores results under the task group list.
- Progress is streamed via json_line() so you can log post_processed_count / posts_new_count as the job advances.
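If the cron output is captured to a file, the progress records can be reduced to a short summary. The sketch below assumes json_line() writes one JSON object per line, which this guide does not state explicitly; summarize_cron_log() is a hypothetical helper, and the field names are the ones mentioned in this guide.

```php
<?php
// Sketch: summarise a captured cron log, assuming one JSON object per line.
// summarize_cron_log() is hypothetical, not part of the project.

function summarize_cron_log(string $log): array
{
    $summary = [
        'post_processed_count' => null,
        'posts_new_count'      => null,
        'end_reason'           => null,
    ];

    foreach (preg_split('/\R/', $log) as $line) {
        $record = json_decode(trim($line), true);
        if (!is_array($record)) {
            continue; // Skip blank lines and non-JSON output such as PHP warnings.
        }
        // Keep the most recent value seen for each field of interest.
        foreach (array_keys($summary) as $field) {
            if (array_key_exists($field, $record)) {
                $summary[$field] = $record[$field];
            }
        }
    }

    return $summary;
}
```

A field that never appears in the log stays null, which makes a run that ended without an end_reason record easy to spot.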
Full Backfill (api/wp-backfill.php)
- CLI-only script meant for initial historical imports.
- Usage: php api/wp-backfill.php [start_page] (start_page defaults to 1).
- Loops for up to one million pages or until the remote API returns no posts.
- Keyword detection runs for each newly imported post (detect_keywords(...)['keywords']).
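The paging behaviour described above can be sketched as a bounded loop that stops at the first empty page. This is not the script's actual code: run_backfill() and both callables are hypothetical stand-ins for the WordPress client and import logic, and the sketch assumes the one-million cap counts pages read.

```php
<?php
// Sketch of the backfill paging loop. $fetchPage and $importPost are stand-ins
// for the real WordPress client and database code, which this guide does not show.

function run_backfill(
    callable $fetchPage,
    callable $importPost,
    int $startPage = 1,
    int $maxPages = 1000000
): array {
    $pagesRead = 0;
    $postsSeen = 0;

    for ($page = $startPage; $pagesRead < $maxPages; $page++) {
        $posts = $fetchPage($page);
        if (count($posts) < 1) {
            break; // An empty page means the remote history is exhausted.
        }
        foreach ($posts as $post) {
            $importPost($post);
            $postsSeen++;
        }
        $pagesRead++;
    }

    // next_page is the page a later run could resume from.
    return ['next_page' => $page, 'pages_read' => $pagesRead, 'posts_seen' => $postsSeen];
}
```

Returning the next page number pairs naturally with the start_page argument if a long import is interrupted and has to be resumed.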
Background Check Worker (api/background-check.php)
- CLI-only worker spawned by Model::background_check() (used by the manual fact-check endpoints, WP checks, and cron).
- Requires proc_open, setsid, and write access to sys_get_temp_dir() for the task JSON payloads and log file.
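Two of the worker's requirements are easy to probe from PHP before relying on it. The sketch below is a hypothetical helper, not project code. It deliberately does not probe setsid, because this guide does not say whether that refers to the system utility or to a PHP function.

```php
<?php
// Sketch: check proc_open availability and temp directory access.
// worker_preflight() is hypothetical; setsid is intentionally not checked here.

function worker_preflight(?string $tempDir = null): array
{
    $tempDir = $tempDir ?? sys_get_temp_dir();
    $problems = [];

    // On PHP 8, functions listed in disable_functions are not defined at all.
    if (!function_exists('proc_open')) {
        $problems[] = 'proc_open is unavailable (check disable_functions)';
    }
    if (!is_dir($tempDir) || !is_writable($tempDir)) {
        $problems[] = "temp directory is not writable: $tempDir";
    }

    return $problems;
}

// An empty array means both checks passed for the current PHP process.
print_r(worker_preflight());
```

Run it as the same OS user and SAPI that spawns the worker, since permissions and disabled functions can differ between the CLI and the web server.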
External Service Notes
- OpenAI – Model::detect_keywords() and fact/editorial checks use the Responses API with JSON schema output. If you change models, ensure response schemas remain compatible.
- Anthropic – Background checks also run via the Anthropic Messages API with JSON schema output.
- Firecrawl – /api/crawl.php simply relays the Markdown/title/description returned by FirecrawlConfig::scrape(). Network failures raise 400 errors, which the UI surfaces to editors.
- Slack/webhook – Source creation triggers a POST with plain text (title, comment, url, author). Fact-check results can also post via a separate webhook when errors are detected. Leave the webhook blank to disable notifications.
- WordPress – Titles retrieved from the WP REST API arrive HTML-escaped and are stored verbatim. Frontend code decodes them before rendering; no operations action is required, but it is good to know when auditing data storage.
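When auditing stored titles, it can help to see the decoded form next to the raw value. The following sketch uses PHP's html_entity_decode(); decode_wp_title() is a hypothetical helper, the sample title is made up, and the frontend's own decoding logic is not shown in this guide.

```php
<?php
// Sketch: decode an HTML-escaped WordPress title for display or auditing.
// decode_wp_title() is hypothetical; the stored value itself stays untouched.

function decode_wp_title(string $storedTitle): string
{
    // ENT_QUOTES decodes both quote styles; ENT_HTML5 selects the HTML5 entity table.
    return html_entity_decode($storedTitle, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

// Made-up sample value containing a named and a numeric entity.
echo decode_wp_title('Q&amp;A: What&#8217;s new'), "\n";
```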
Operational Tips
- Work-in-progress limit: The UI enforces WORKING_SOURCES_LIMIT = 30 per author by checking /api/user-counts.php. If you adjust this constant, notify editors after deployment.
- Monitoring: Errors in cron/backfill scripts are logged with json_line(['end_reason' => ...]) and also emitted via PHP warnings if thrown inside Cron::run_all_jobs().
- Maintenance mode: Set ONLINE to false to return HTTP 503 from every API endpoint while keeping the UI shell available during upgrades. Because the endpoints are disabled, the shell cannot load or save data until ONLINE is restored.
- Security: Basic Auth is the only auth layer; rotate credentials by updating the values returned by Config::getAdminLogin() and restarting PHP-FPM (or clearing opcode caches) so the new values load. The SPA stores credentials in localStorage, so clear it on shared devices.
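Since Basic Auth is the only auth layer, the credential comparison deserves care. Below is a minimal sketch of a constant-time check. The ['user' => ..., 'password' => ...] array shape is an assumption, because this guide does not show what Config::getAdminLogin() returns, and credentials_match() is a hypothetical helper.

```php
<?php
// Sketch: constant-time check of Basic Auth credentials.
// The $expected array shape is an assumption, not the project's actual format.

function credentials_match(array $expected, ?string $user, ?string $password): bool
{
    if ($user === null || $password === null) {
        return false; // No credentials were sent with the request.
    }

    // Evaluate both comparisons so timing does not reveal which field was wrong.
    $userOk = hash_equals($expected['user'], $user);
    $passwordOk = hash_equals($expected['password'], $password);

    return $userOk && $passwordOk;
}

// Usage inside an endpoint, with $admin holding the configured login:
// credentials_match($admin, $_SERVER['PHP_AUTH_USER'] ?? null, $_SERVER['PHP_AUTH_PW'] ?? null);
```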