The week I stopped my schema firing twice

I run this site as a working laboratory. It’s where I try the patterns I’d recommend to a client before I’d recommend them, and it’s where I catch the things a real audit would catch on someone else’s WordPress install. This past week was a renovation week — five days of working on the house, not on anyone else’s. Three things came out of it that are worth writing down, mostly so I can point to them later when a client hits the same wall.

The schema was firing two primary types on the same page

Here’s the bug. On every portfolio entry, every testimonial, every download, every tool, and every event, the JSON-LD payload contained an Article object and the post type’s own primary schema (CreativeWork for portfolio, Review for testimonial, and so on). Two primary types. One page. Search engines pick one and discard the other, but they don’t tell you which one — and the one they pick is rarely the one you’d want.

The cause was lazy emitter logic. The Article branch was running on anything with a post_date, and a custom post type has a post_date. So every CPT was getting Article on top of its own type. Underneath that, a second collision: BreadcrumbList and CollectionPage were both being emitted twice on archive pages, because the plugin emitter and the theme emitter were both hooked into wp_head without knowing about each other.

The fix was a single source of truth for “what is this page primarily?” — a small function that returns one of Article, CreativeWork, Review, SoftwareApplication, Event, WebPage, or CollectionPage based on context. Every emitter calls that function first, and the Article branch suppresses itself when something else owns the page. Breadcrumbs deduplicate against a request-scoped flag.
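
A minimal sketch of that resolver idea, written in Python rather than the site's PHP so it runs on its own. The Request class, the CPT_PRIMARY_TYPE mapping, and every function name here are hypothetical; only the portfolio and testimonial pairings come from the post, and the rest of the mapping is a guess.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical mapping from custom post type slug to its primary schema type.
# portfolio -> CreativeWork and testimonial -> Review are stated in the post;
# the other two pairings are assumptions made for illustration.
CPT_PRIMARY_TYPE = {
    "portfolio": "CreativeWork",
    "testimonial": "Review",
    "tool": "SoftwareApplication",
    "event": "Event",
}


@dataclass
class Request:
    """Minimal stand-in for the page context an emitter would inspect."""
    post_type: Optional[str] = None            # None when no single post is in view
    is_archive: bool = False
    emitted: set = field(default_factory=set)  # request-scoped dedupe flags


def primary_type(req):
    """Return the one schema type that owns this page."""
    if req.is_archive:
        return "CollectionPage"
    if req.post_type in CPT_PRIMARY_TYPE:
        return CPT_PRIMARY_TYPE[req.post_type]
    if req.post_type == "post":
        return "Article"
    return "WebPage"


def emit_once(req, schema_type, payload, out):
    """Append payload unless this type was already emitted during the request."""
    if schema_type in req.emitted:
        return False
    req.emitted.add(schema_type)
    out.append(payload)
    return True


def emit_article(req, out):
    """The Article branch stands down when another type owns the page."""
    if primary_type(req) != "Article":
        return False
    return emit_once(req, "Article", {"@type": "Article"}, out)
```

The point of the design is that no emitter decides ownership for itself: each one asks primary_type() first, and anything that can be hooked twice goes through emit_once().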

If you’re running JSON-LD across a CPT-heavy site, this is the check: view source on any CPT single, search for "@type", and count. If you get more than one primary type per page, you have the same bug I had. The fix is a Tuesday afternoon. The traffic effect won’t be a spike — it’ll be a slow tightening of which page ranks for which intent over the next two index cycles.
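
The view-source count can also be scripted. A rough sketch, assuming Python's standard library and assuming the page's JSON-LD lives in script tags of type application/ld+json. The PRIMARY_TYPES set reuses the seven types the resolver returns; the class and function names are made up for illustration. It looks only at top-level nodes and @graph members, so a nested type such as an author's Person is not counted.

```python
import json
from html.parser import HTMLParser

PRIMARY_TYPES = {
    "Article", "CreativeWork", "Review", "SoftwareApplication",
    "Event", "WebPage", "CollectionPage",
}


class JsonLdExtractor(HTMLParser):
    """Collect the raw text of every application/ld+json script block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.blocks.append("".join(self._buffer))


def top_level_nodes(doc):
    """Yield top-level JSON-LD objects, unpacking lists and @graph arrays."""
    if isinstance(doc, list):
        for item in doc:
            yield from top_level_nodes(item)
    elif isinstance(doc, dict):
        if isinstance(doc.get("@graph"), list):
            yield from top_level_nodes(doc["@graph"])
        else:
            yield doc


def primary_types_in(html):
    """Return the primary schema types found in a page, in document order."""
    parser = JsonLdExtractor()
    parser.feed(html)
    found = []
    for block in parser.blocks:
        try:
            doc = json.loads(block)
        except json.JSONDecodeError:
            continue  # a broken block is its own problem; skip it here
        for node in top_level_nodes(doc):
            declared = node.get("@type")
            names = declared if isinstance(declared, list) else [declared]
            found.extend(n for n in names if n in PRIMARY_TYPES)
    return found
```

A result longer than one entry for a CPT single is the symptom described above.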

While I was in there I also fixed a Person.image with the wrong dimensions, scrubbed the JSON-LD payloads on save and on render so an editor pasting raw HTML into a meta field can’t break the schema, added a contactPoint cross-reference to the LocalBusiness emitter sitewide, and gave the past-speaking template its own schema branch with the FAQ and CTA blocks suppressed (they were inheriting from the live-event template and didn’t belong on archived talks). That last one closes a flag from the April 30 audit run.
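
For the payload scrubbing, one plausible shape is to strip markup from every string value before serializing, then escape the serialized output so it cannot close its own script tag. This is a sketch under those assumptions, not the site's actual filter; strip_tags, scrub, and render_jsonld are hypothetical names.

```python
import json
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Keep text nodes and drop every tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(value):
    """Return the text content of value with markup removed and whitespace collapsed."""
    parser = _TextOnly()
    parser.feed(value)
    parser.close()
    return " ".join("".join(parser.parts).split())


def scrub(node):
    """Recursively strip markup from every string in a JSON-LD structure."""
    if isinstance(node, str):
        return strip_tags(node)
    if isinstance(node, list):
        return [scrub(item) for item in node]
    if isinstance(node, dict):
        return {key: scrub(value) for key, value in node.items()}
    return node


def render_jsonld(payload):
    """Serialize a scrubbed payload for embedding inside a script tag."""
    body = json.dumps(scrub(payload), ensure_ascii=False)
    # Escape "<" so no leftover "</script>" can end the tag early.
    return body.replace("<", "\\u003c")
```

Running scrub() on save and render_jsonld() on output gives two chances to catch pasted HTML, which matches the save-and-render approach described above.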

Wayback Machine as a content-recovery source

I’ve been on this domain since 1996, which means there’s a long tail of posts whose post_content went missing across various migrations, host moves, and one bad export-import in roughly 2014. The Internet Archive has copies of most of them. Phase 1 of a recovery pass landed this week: 693 archived URLs targeted, a mu-plugin that batches the Wayback fetch with rate-limiting, and a deploy script that runs the import in chunks small enough not to time out on shared infrastructure.
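
The batching pattern itself is small. A hedged sketch in Python, assuming the Internet Archive's public availability endpoint and a caller-supplied fetch function; the chunk size of 25 and the one-second pause are illustrative numbers, not the values the mu-plugin uses.

```python
import time
from urllib.parse import urlencode

# Public Wayback availability endpoint; the response shape is left to the caller.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(page_url, timestamp=None):
    """Build the lookup URL for one page, optionally near a given timestamp."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"


def run_chunk(urls, offset, fetch, chunk_size=25, delay_seconds=1.0, sleep=time.sleep):
    """Process one chunk starting at offset.

    Returns (results, next_offset); next_offset is None once the list is done,
    so a deploy script can call this repeatedly without any single run
    approaching a request timeout.
    """
    chunk = urls[offset:offset + chunk_size]
    results = {}
    for url in chunk:
        results[url] = fetch(availability_url(url))
        sleep(delay_seconds)  # crude rate limit: fixed pause after each request
    next_offset = offset + len(chunk)
    return results, (next_offset if next_offset < len(urls) else None)
```

With 693 URLs and chunks of 25, the caller would make 28 runs, persisting the offset between them.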

The pass also produced a Postmedia portfolio rebuild — eleven entries restored from 2014 archive snapshots, with the original feature images pulled from the same crawl, and the surrounding context reconstructed (the “first-in-Canada” anchor on the Sun chain rollout, the child-theme architecture, the WordPress VIP performance work). That’s content I’d lost, not content I’d never written. The distinction matters: I’m not back-dating fiction, I’m restoring authorship that was already on the public record.

Five legacy posts on the dev environment had their post_content recovered cleanly. One of them (post 1851) got mangled in the dev recovery and I had to pull the production copy back over it — a reminder that “dev is for experiments” only works if the experiments are reversible, and content recovery isn’t reversible without a backup of the backup. The full pre-mutation snapshot is now in cold storage, separate from the daily backups.

The remainder of the 41-post historical batch is split between “already present, no recovery needed” and “unresolved, no Wayback capture exists.” The unresolved ones get an editor’s note and a 410 Gone response, not a fabricated body.

A 404 handler that turns dead URLs into a consolidation funnel

Thirty years of URL changes leaves a lot of dead pockets. Old portfolio CPT slugs (the wpshadow_* family from a previous architecture), legacy case-studies archive URLs, post-tag-as-glossary collisions where the canonical glossary taxonomy now wins the route — every one of those is a place where someone with a stale link, a stale bookmark, or a stale Google result hits a wall.

The new 404 handler does one specific thing: before WordPress renders the 404 template, a small router checks the requested path against a registry of known legacy patterns and routes the request to the closest live equivalent. /case-studies/postmedia-something/ goes to the Postmedia portfolio entry. /wpshadow_thing/ goes to the WPShadow plugin page. A glossary post_tag URL that’s been superseded by a glossary_term at the same slug routes to the canonical one.

The pattern I want to write down: a 404 isn’t a failure mode, it’s a signal that you have a routing decision to make. Either the content exists somewhere else (301 it, because the move is permanent and a 302 tells search engines to keep the old URL), the content has been deliberately retired (410 it with a real explanation, not a generic “not found”), or the URL was wrong in the first place and the registry has the right one (301 it). The default 404 page is a fourth option that should be reserved for genuine “this URL never meant anything” cases, which, on a site you’ve owned for thirty years, is a smaller set than you’d think.
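
The registry-then-decide flow can be sketched in a few lines. This is an illustration in Python, not the site's router: the regex patterns are inferred from the two example URLs, and the target paths and the retired URL are hypothetical placeholders. The sketch treats both redirect cases as permanent moves.

```python
import re
from typing import NamedTuple, Optional


class Decision(NamedTuple):
    status: int
    location: Optional[str] = None  # redirect target, if any
    note: Optional[str] = None      # human-readable explanation for a 410


# Hypothetical registry of legacy patterns and their closest live equivalents.
LEGACY_ROUTES = [
    (re.compile(r"^/case-studies/postmedia-[^/]+/?$"), Decision(301, "/portfolio/postmedia/")),
    (re.compile(r"^/wpshadow_[^/]+/?$"), Decision(301, "/plugins/wpshadow/")),
]

# Hypothetical list of deliberately retired URLs, each with a real explanation.
RETIRED = {
    "/2009/lost-post/": "No archived copy of this post exists, so it has been retired.",
}


def route_dead_url(path):
    """Decide what to do with a path WordPress is about to 404."""
    for pattern, decision in LEGACY_ROUTES:
        if pattern.match(path):
            return decision
    if path in RETIRED:
        return Decision(410, note=RETIRED[path])
    # Nothing known about this URL: fall through to the default 404 page.
    return Decision(404)
```

Keeping the registry as data rather than scattered conditionals means every routing decision is visible in one place.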

Also this week

The /blog/ archive got a real rebuild — mobile grid, scroll-reveal, card design, an information architecture that distinguishes evergreen from timely. It was the laziest page on the site for a long time.
Footer “Where I Work” became a disclosure pattern that retains all eleven cities without taking eleven lines of footer real estate. Closes audit issue #84.
The orphan-recovery pass added internal links from the footer locations hub, sub-topic chips, testimonial singles, and the /donate/ and /methodology/ pages — and stopped the link-stripper from quietly killing internal links that pointed at archive and taxonomy URLs (which it shouldn’t have been doing in the first place).
Glossary migrated off legacy post_tag entries onto the canonical glossary_term taxonomy, and the missing helper that was 500ing on a couple of term pages got written.
CSS got tokens-and-base extraction, a fetchpriority hint on the LCP image, and a critical-CSS scaffold. A dead-rule scanner shipped behind it so I can keep the stylesheet honest as it grows.
Every posts_per_page=-1 in the codebase got a sensible upper bound. Unbounded queries are a “works on dev, dies on prod” pattern and I’d let too many of them accumulate.
A /dev-link-check slash command for the local tooling — a sitemap-wide link audit that runs against the dev environment before I push anything to production. Catches 4xx, 5xx, and redirect chains. The intent is that production never sees a broken internal link because dev caught it first.
The thisismyurl-svg-support plugin shipped an on-upload sanitizer that strips <script>, event handlers, javascript: URIs, and the foreignObject HTML escape hatch. The plugin had been claiming “safely enable SVG uploads” while doing none of that — a marketing-copy/code mismatch I’m not willing to keep public. Seven of seven assertions pass on the malicious-payload test.
The GitHub org page went from twenty repos to eleven active. Same decision pattern as the 404 handler — a stub plugin that claimed functionality it didn’t have, an inherited htaccess plugin that could brick a site on activation, a create_function()-fatal-on-PHP-8 plugin marked “adopt me” for years, plus six legacy snippet plugins now maintained on .org alone. Each one archived with a notice that explains why, not a quiet still-here-still-half-broken.
Three co-maintained legacy plugins — wp-title-case, auto-copyright-1, random-page-redirect-for-wordpress — got proper READMEs that tell the actual lineage. I built two of them in 2008, handed them to Phill Coxon in 2016, and they’ve quietly come back home; the third I picked up after its original authors went quiet. Same plugins, accurate story.
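
On the SVG sanitizer item: the four removals it lists map onto a short tree walk. The sketch below is in Python with the standard library's ElementTree, not the plugin's PHP, and sanitize_svg is a hypothetical name. It is an illustration of the checklist, not a vetted sanitizer; a production version would also need a hardened XML parser and a review of other URI schemes.

```python
import re
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
XLINK_NS = "http://www.w3.org/1999/xlink"
BLOCKED_TAGS = {"script", "foreignobject"}


def local_name(qualified):
    """Drop the '{namespace}' prefix ElementTree adds, and lowercase the rest."""
    return qualified.rsplit("}", 1)[-1].lower()


def sanitize_svg(svg_text):
    """Remove script and foreignObject elements, on* handlers, and javascript: hrefs."""
    # Keep the usual prefixes so the output stays readable.
    ET.register_namespace("", SVG_NS)
    ET.register_namespace("xlink", XLINK_NS)
    root = ET.fromstring(svg_text)

    # Pass 1: drop blocked elements together with everything inside them.
    for parent in list(root.iter()):
        for child in list(parent):
            if local_name(child.tag) in BLOCKED_TAGS:
                parent.remove(child)

    # Pass 2: drop event-handler attributes and javascript: links.
    for element in root.iter():
        for attr in list(element.attrib):
            name = local_name(attr)
            # Strip whitespace and control characters before checking the scheme.
            compact = re.sub(r"[\x00-\x20]+", "", element.attrib[attr]).lower()
            if name.startswith("on") or (name == "href" and compact.startswith("javascript:")):
                del element.attrib[attr]

    return ET.tostring(root, encoding="unicode")
```

A malicious-payload test for something like this would feed it one file per attack and assert the offending node is gone while the harmless shapes survive.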

The thread under all of this, if there is one, is that a site you’ve owned for a long time accumulates archaeological layers, and most of the work of running it well is reading those layers honestly. Not every old post deserves a refresh. Not every dead URL deserves a redirect. But every one of them deserves a decision, and the decision should be visible in the code, not implicit in the silence of a 404.

