Holding a Lighthouse 95+ score on a fresh project is easy. Holding it on a production site for two years through redesigns, third-party scripts, A/B tests, and team changes is the actual problem. This post is a real story (anonymized) of how we kept a client site green for 18 months and what we learned.
TL;DR
Lighthouse 95+ in production is achievable but requires three things: a perf budget enforced in CI, third-party script discipline, and a team culture that treats perf regressions as P1 bugs. Without all three, scores drift down 3-5 points per quarter from accumulated entropy. With them, you stay green indefinitely — even through redesigns.
The problem with "build-time 100"
Lighthouse 100 on your local dev branch is meaningless. The numbers that matter are the field metrics (CrUX) at P75 on mobile. We’ve seen sites score 98 in Lighthouse CI and still fail Core Web Vitals in real-world CrUX data, because real users have slow phones, real networks are slower than throttled simulations, real devices are busy with other tabs and browser extensions, and real users hit pages with cold caches.
Optimize against field data via Real User Monitoring (PostHog, Vercel Speed Insights, Cloudflare Analytics). Lighthouse becomes a regression detector, not the source of truth. Field data is what Google ranks you on.
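As a minimal sketch of that RUM wiring, assuming posthog-js is already initialized elsewhere and using the web-vitals library (the `web_vital` event name and its properties are our own convention, not anything PostHog prescribes):

```ts
// report-web-vitals.ts - minimal field-data reporting sketch
import { onCLS, onINP, onLCP, type Metric } from 'web-vitals';
import posthog from 'posthog-js';

function report(metric: Metric) {
  posthog.capture('web_vital', {
    metric: metric.name,   // 'LCP' | 'CLS' | 'INP'
    value: metric.value,   // ms for LCP/INP, unitless for CLS
    rating: metric.rating, // 'good' | 'needs-improvement' | 'poor'
    id: metric.id,         // unique per page load, useful for deduping
  });
}

onLCP(report);
onCLS(report);
onINP(report);
```

Aggregate those events at P75 per metric in a weekly dashboard and you're looking at the number Google actually ranks you on.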
Where production scores actually drop
- Third-party scripts added "just for this campaign" and never removed (chat widgets, analytics, ad pixels)
- Image dimensions set wrong as the design evolved — 4000px served into 800px slot
- Web fonts swapped without re-doing the preload + font-display dance
- A/B testing libraries that block the main thread for 200-400ms
- New JS feature added without checking bundle impact (one big icon library can add 200KB)
- Cache-busting deployments that invalidate everything instead of just changed files
- Third-party iframes (YouTube, TypeForm, ad networks) injected without lazy loading
The five things that consistently work
Perf budget enforced in CI
size-limit on JS bundles per route, with a hard ceiling. Lighthouse CI in the pipeline with thresholds (LCP < 2.5s, CLS < 0.1, TBT < 200ms as the lab stand-in for INP, performance score > 95). PRs that violate the budget are blocked, not "warned about": blocked. The discipline is in the enforcement, not the goal.
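For concreteness, here's roughly what that gate looks like as a Lighthouse CI assert config. This is a sketch; the URLs and run count are placeholders to tune per project.

```js
// lighthouserc.js - illustrative thresholds
module.exports = {
  ci: {
    collect: {
      url: ['https://staging.example.com/', 'https://staging.example.com/blog/'],
      numberOfRuns: 3,
    },
    assert: {
      assertions: {
        'categories:performance': ['error', { minScore: 0.95 }],
        'largest-contentful-paint': ['error', { maxNumericValue: 2500 }],
        'cumulative-layout-shift': ['error', { maxNumericValue: 0.1 }],
        // INP can't be measured in the lab; Total Blocking Time is the usual stand-in
        'total-blocking-time': ['error', { maxNumericValue: 200 }],
      },
    },
    upload: { target: 'temporary-public-storage' },
  },
};
```

An `error`-level assertion fails the `lhci autorun` step, which is what actually makes the PR block stick.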
Third-party script discipline
A documented list of allowed third-party scripts, with a justification per entry. Adding a new script requires a PR that includes a measured perf impact. Most "harmless" scripts add 50-200ms of INP and 10-30KB of payload. Multiply that by the five scripts that sneak in over a year and you have a measurably degraded site.
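The most common mitigation we attach to an approval is lazy mounting: the script loads on first user intent instead of on page load, so it never competes with LCP or the first interaction. A sketch, with the vendor URL as a placeholder:

```ts
// lazy-chat.ts - defer a third-party widget until the user shows intent
function loadChatWidget(): void {
  const s = document.createElement('script');
  s.src = 'https://widget.example-chat.com/loader.js'; // placeholder vendor script
  s.async = true;
  document.head.appendChild(s);
}

let loaded = false;
function onFirstIntent(): void {
  if (loaded) return;
  loaded = true;
  loadChatWidget();
}

// Any of these counts as intent; { once: true } cleans each listener up after it fires
for (const event of ['pointerdown', 'keydown', 'scroll'] as const) {
  window.addEventListener(event, onFirstIntent, { once: true, passive: true });
}
```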
Image discipline
All images go through Next/Image, Astro Image, or equivalent. Width/height attributes always specified. Lazy-load below-the-fold by default. Hero images preloaded with fetchpriority="high". WebP/AVIF served, with appropriate fallbacks. Done right, images contribute almost nothing to LCP issues; done wrong, they’re the #1 cause.
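A sketch of the hero half of that, assuming Next.js (Astro's `<Image />` takes equivalent props); the import path and alt text are placeholders:

```tsx
// Hero.tsx - LCP image with intrinsic dimensions, preload, and high fetch priority
import Image from 'next/image';
import hero from './hero.jpg'; // static import supplies width/height, so no layout shift

export function Hero() {
  return (
    <Image
      src={hero}
      alt="Placeholder hero description"
      priority        // opts out of lazy loading, preloads, sets fetchpriority="high"
      sizes="100vw"   // lets the browser pick the right srcset entry for the slot
      placeholder="blur"
    />
  );
}
```

Everything below the fold stays on the default behavior, which is lazy loading.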
Font discipline
Self-hosted fonts only. font-display: swap. Preload only the font used in LCP text — typically one weight of one face. Variable fonts where they make sense (one file, multiple weights). Subset to needed glyphs (Latin + your audience’s scripts) — saves 50-200KB easily.
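A sketch of that font setup, again assuming Next.js, where next/font/local generates the @font-face, preloads the file, and applies font-display for you; the file name is a placeholder for your subsetted variable font:

```ts
// fonts.ts - one self-hosted, Latin-subset variable font, swapped in immediately
import localFont from 'next/font/local';

export const sans = localFont({
  src: './InterVariable-latin.woff2', // placeholder; subset before shipping
  display: 'swap',
  preload: true,
  variable: '--font-sans',
});
```

Outside a framework, the same thing is a single `<link rel="preload" as="font" type="font/woff2" crossorigin>` for the LCP face plus an @font-face rule with `font-display: swap`.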
Quarterly performance audits
Once per quarter, the team sets aside half a day to run a comprehensive audit: WebPageTest on a mid-tier mobile profile, manual scroll/click testing, third-party script audit, image weight check. Findings become PRs the same week. Without this, drift is silent — Lighthouse CI catches regressions but not the slow accumulation that field data shows.
A real client story
A B2C content site we maintained for 18 months stayed Lighthouse 95+ across 3 redesigns and 200+ deploys. Field data CrUX P75 mobile averaged: LCP 1.8s, CLS 0.04, INP 130ms. Here’s what kept it green:
- Lighthouse CI in the pipeline with hard thresholds — blocked 4 PRs in 18 months that would have regressed perf
- Quarterly audits caught: a chat widget that added 110ms of INP (removed), a font swap that caused an 18% LCP regression (rolled back), and a duplicated analytics tag (deduplicated)
- Image discipline: one designer-engineer pair handled all hero images and ran a weight check before merge; we never had an image regression
- Third-party script gate: 7 requests denied in 18 months, 3 approved with mitigation (defer load, lazy mount, etc.)
- Real User Monitoring via PostHog with weekly dashboard review — surfaced 2 regressions before Lighthouse CI did
Cost: roughly 2-3 engineering hours per week on perf maintenance, plus the half-day quarterly audits. Net business impact: Search Console showed +18% organic traffic over 18 months, a gain the marketing team attributed primarily to consistently green Core Web Vitals.
Tooling we automate at Schedars
- Lighthouse CI: GitHub Action, blocks PR below thresholds
- size-limit: GitHub Action, blocks PR if JS bundle grows >5%
- Web Vitals JS: reports field data to PostHog with weekly P75 alerts
- Pa11y + axe: accessibility regressions also blocked (see the combined sketch after this list)
- Visual regression: Playwright snapshots on top-5 pages
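The accessibility and visual-regression gates from the list above can live in one Playwright spec. A sketch, assuming `@playwright/test` with a configured `baseURL` and the `@axe-core/playwright` package (here the axe check is wired through Playwright rather than Pa11y); the page list is a placeholder:

```ts
// gates.spec.ts - accessibility + visual regression on the top pages
import { test, expect } from '@playwright/test';
import AxeBuilder from '@axe-core/playwright';

const topPages = ['/', '/pricing', '/blog']; // placeholder for the real top-5 list

for (const path of topPages) {
  test(`a11y: ${path} has no detectable violations`, async ({ page }) => {
    await page.goto(path);
    const results = await new AxeBuilder({ page }).analyze();
    expect(results.violations).toEqual([]);
  });

  test(`visual: ${path} matches the approved snapshot`, async ({ page }) => {
    await page.goto(path);
    await expect(page).toHaveScreenshot({ fullPage: true });
  });
}
```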
Bottom line
Lighthouse 95+ in production is sustainable when it’s engineered into the workflow, not negotiated case-by-case. The teams that drift have good intentions and no enforcement. The teams that hold green have CI gates, quarterly audits, and a culture where perf regressions are P1 bugs. The work isn’t glamorous, but it pays in organic traffic for years.
Auditing your site’s perf or setting up enforcement for the first time? Send us the URL — we’ll send back a prioritized fix list within a day.