East Blue Design System
Assessment & Benchmark Methodology
The "why behind the why." Every rating, score, and level in the East Blue assessment system — what it is, how we arrived at it, and why it exists. When someone asks "why does East Blue score components this way?", this page is the answer.
Part 1

DS Health — The 4 Traits

We studied what makes Material, Atlassian, Carbon, and Polaris components successful across products, platforms, and teams. These 4 traits are what they all have in common. We didn't invent them — the industry's best systems proved them.

Summary

Reusable: Material, Atlassian, and Polaris all share components across products and platforms. One-screen components are product-specific, not DS-grade.
Self-contained: Carbon isolates components with no external dependencies. GCash components that relied on parent containers broke when moved.
Consistent: Atlassian enforces uniformity via shared tokens. GCash's Accordion used yes/no booleans while everything else used true/false.
Composable: Carbon's hierarchy and Material's nested architecture proved that systems scale when components snap together as building blocks.
Reusable
What it is: Works across multiple screens and flows, not just one specific screen or feature.
How we arrived at this: Material's Button works identically across Android, Web, and Flutter. Atlassian uses the same components across Jira, Confluence, Trello, and Bitbucket: wildly different products, same building blocks. The common thread is that if it only works on one screen, it's not a component; it's a one-off.
What it means in practice: A GCash button must work in a payment flow, a settings page, and a modal without modification. You design it once and use it everywhere. If it can't do that, it belongs in a product library, not the core design system.

Self-contained
What it is: Carries its own styles, states, and logic without depending on external setup.
How we arrived at this: Carbon packages each component independently, so teams pull only what they need without side effects. We saw GCash components that depended on parent containers to set their colors; when moved between screens, they broke. The pattern was clear: a component that depends on external setup breaks when moved.
What it means in practice: Drop the component into any screen and it works immediately. It doesn't need a parent wrapper. It manages its own pressed, disabled, and loading states internally. For GCash, with 90M+ users and multiple product teams, self-containment means no team can accidentally break another team's UI.

Consistent
What it is: Naming conventions, property types, and state coverage match the rest of the design system.
How we arrived at this: Atlassian's shared design tokens keep color, spacing, and elevation uniform across every product. When we assessed GCash's Accordion, we found it using yes/no boolean values while the rest of the system uses true/false. That inconsistency forced developers to handle special cases, a pattern we saw repeat across multiple components.
What it means in practice: Every component uses the same naming conventions, the same property types, and the same state coverage. If buttons use true/false, checkboxes use true/false. No surprises, no special cases, no guessing.

Composable
What it is: Can be nested inside other components and fits into the existing hierarchy.
How we arrived at this: Carbon's hierarchy (Elements → Components → Patterns → Templates) shows exactly how things nest. Material's NavigationDrawer is built from ListItems, which are built from Icons + Text. The systems that scale are the ones where components snap together like building blocks.
What it means in practice: An icon sits inside a button. A button sits inside a card. A card sits inside a list. If a component can't be nested, designers have to rebuild it from scratch every time they need a new combination. For GCash, composability means a single well-built button powers every screen.
Part 2

Trait Ratings

Each of the 4 traits is rated independently. Why three levels? Because "partially works" needs different action than "completely broken." A component with inconsistent naming needs a 10-minute rename. A component with no states needs a redesign. The rating tells you how much work is ahead.

Summary

Pass ✅: Needed a clear "done" signal so healthy traits stop getting revisited.
Needs work ⚠️: Binary pass/fail treats a 10-minute rename the same as a full rebuild. Most real components land here.
Fail ❌: Separates "fix it" from "rethink it." Fail items block all downstream work.
Pass ✅
What it is: Fully meets the trait. No issues found.
How we arrived at this: You need a clear "done" signal. Without it, every component gets scrutinized endlessly. We created Pass to mean: stop looking for problems here, because this trait is clean.
What it means in practice: The component fully satisfies this trait. No workarounds, no exceptions. Move on to the next trait or the next component.

Needs work ⚠️
What it is: Partially meets the trait. Specific, fixable gaps.
How we arrived at this: Without a middle tier, everything becomes binary: ship or rebuild. That's too coarse. Most real components land here. GCash's Accordion was rated ⚠️ on Consistent because its booleans used yes/no instead of true/false. That's a 10-minute rename, not a redesign, but still a real issue.
What it means in practice: The intent is right, but execution has gaps. Maybe the component is reusable in 3 out of 4 contexts. Maybe naming is consistent except for one property. These are targeted repairs, not fundamental problems.

Fail ❌
What it is: Does not meet the trait. Structural problem requiring a rebuild.
How we arrived at this: We needed to separate "needs a fix" from "needs a rethink." Fail items go first in the priority queue because they block everything downstream: you can't benchmark a component that fundamentally doesn't work.
What it means in practice: The component structurally can't satisfy this trait in its current form. A component that only works on one screen fails Reusable. A component with no states fails Self-contained. Not a tweak, but a rebuild of that aspect.
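To make the three levels and the Fail-first rule concrete, here is a minimal Python sketch. It assumes each trait's rating is recorded in a plain dict keyed by trait name; `Rating`, `fix_queue`, and `all_traits_pass` are hypothetical helpers for illustration, not existing East Blue tooling.

```python
from enum import IntEnum


class Rating(IntEnum):
    """Three-level trait rating; a higher value means more work ahead."""
    PASS = 0        # ✅ clean, stop revisiting
    NEEDS_WORK = 1  # ⚠️ specific, fixable gaps
    FAIL = 2        # ❌ structural problem, rebuild that aspect


TRAITS = ("Reusable", "Self-contained", "Consistent", "Composable")


def _check(ratings: dict) -> None:
    # Every trait is rated independently, so all four must be present.
    missing = [t for t in TRAITS if t not in ratings]
    if missing:
        raise ValueError(f"unrated traits: {missing}")


def fix_queue(ratings: dict) -> list:
    """Traits that still need action, Fail items first.

    sorted() stays stable with reverse=True, so traits with the same rating
    keep the canonical TRAITS order.
    """
    _check(ratings)
    pending = [t for t in TRAITS if ratings[t] != Rating.PASS]
    return sorted(pending, key=lambda t: ratings[t], reverse=True)


def all_traits_pass(ratings: dict) -> bool:
    """True only when every trait is rated Pass."""
    _check(ratings)
    return all(ratings[t] == Rating.PASS for t in TRAITS)


# Hypothetical ratings for an example component.
example = {
    "Reusable": Rating.PASS,
    "Self-contained": Rating.FAIL,
    "Consistent": Rating.NEEDS_WORK,
    "Composable": Rating.PASS,
}
print(fix_queue(example))        # ['Self-contained', 'Consistent']
print(all_traits_pass(example))  # False
```

Keeping the levels in an ordered `IntEnum` makes "most severe first" a plain sort rather than a hand-maintained lookup, and `all_traits_pass` mirrors the entry condition for the Keep verdict in Part 3.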
Part 3

DS Health Verdicts

Why 6 verdicts? Because a component can "belong" in the DS but need very different kinds of work. Fix means tweak properties. Restructure means rebuild architecture. Consolidate means merge. Each verdict tells you the kind of investment required — not just "is it good or bad?"

Summary

Keep: Needed a "stop reviewing" signal for healthy components.
Fix: GCash Accordion had good structure but bad naming. One-session fix, not a rebuild.
Restructure: Separated from Fix because restructuring takes a sprint vs 30 minutes.
Consolidate: GCash had multiple list components doing the same thing. Every duplicate doubles maintenance.
Product Layer: A DS that includes everything becomes unmaintainable. Core stays lean.
Remove: Dead components accumulate forever. Even unused ones confuse designers.
Keep
What it is: All 4 traits pass. Ship as-is.
How we arrived at this: We needed a clear "don't touch this" signal. Without Keep, every component gets endlessly scrutinized. Keep means the DS Health work is done here.
What it means in practice: All 4 traits pass. The component is healthy. Move to native readiness assessment or benchmarking.

Fix
What it is: Sound architecture, needs targeted repairs.
How we arrived at this: GCash's Accordion had good structure but bad layer names and wrong boolean values. The architecture was sound; it just needed targeted repairs. We needed a verdict that says "the concept is right, the details are wrong."
What it means in practice: Quick wins, usually one Figma session. Rename properties, add missing states, fix token bindings. The component stays; the details get corrected.

Restructure
What it is: Right purpose, wrong construction. Needs an architectural rebuild.
How we arrived at this: Some components do the right thing but are built wrong: the variant structure makes extending impossible, or the layer hierarchy doesn't map to native code. Restructure is a bigger investment than Fix, so separating them helps with planning.
What it means in practice: The concept stays; the construction changes. A Fix takes about 30 minutes. A Restructure takes a full sprint. Different budget, different timeline.

Consolidate
What it is: Merge duplicates into one component.
How we arrived at this: GCash had multiple list components doing the same thing. Every duplicate means every fix gets applied twice. We needed a verdict that says "merge these into one."
What it means in practice: Two or more components do the same thing. One absorbs the others, and the surviving component gets the best properties from all of them.

Product Layer
What it is: Too feature-specific for the core DS. Move to a product library.
How we arrived at this: A design system that includes everything becomes unmaintainable. Some components are real but too feature-specific (e.g. a loan calculator widget). They work, but only for one product. The core DS should stay lean.
What it means in practice: Move it to a product-specific library. The team that owns the feature owns the component. Don't invest DS resources here.

Remove
What it is: Redundant, deprecated, or doesn't belong. Delete it.
How we arrived at this: Without Remove, dead components accumulate forever. Every component in the library has a maintenance cost; even unused ones confuse designers who find them. Remove keeps the library honest.
What it means in practice: Delete it. It's a duplicate, an outdated pattern, or something that was never a design system component (OS-level UI, ad infrastructure).
Part 4

Native Mobile Readiness — 7 Criteria (C1–C7)

GCash targets iOS (SwiftUI) and Android (Jetpack Compose). These 7 criteria are the minimum requirements for a Figma component to translate cleanly to native code. Each one exists because its absence causes a specific, known failure mode when developers try to build the component natively.

Summary

C1 Layer structure: Found 8 layers named "Frame" in GCash Accordion. Messy layers produce messy code.
C2 Variant naming: Bad names force a translation layer between Figma and code. That layer is where bugs live.
C3 Token coverage: Hardcoded colors in a 90M-user app are a rebrand time bomb. Tokens update everywhere at once.
C4 Native mappability: Discovering a design is impossible to build after a sprint is the most expensive failure.
C5 Interaction states: Missing states are the #1 cause of "works in Figma, feels wrong in app."
C6 Asset quality: GCash's two-tone icons need programmatic tinting. PNGs can't do that.
C7 Code Connect: C1–C6 prepare the component. C7 completes the bridge. The end goal of native readiness.
C1 Layer structure
What it is: Clean, semantic layer hierarchy with meaningful names.
How we arrived at this: SwiftUI and Compose build UIs as trees of nested views and composables, and Figma's layer tree is the developer's blueprint for that tree. When we assessed GCash's Accordion, 8 layers were named "Frame", and every one needed renaming before a developer could use it. Messy layers = messy code = bugs.
What it means in practice: Layers are named "container," "header-row," "icon-leading," not "Frame 47" or "Group 12." The layer hierarchy should mirror the native view tree.

C2 Variant naming
What it is: Property names map 1:1 to native code parameters.
How we arrived at this: Figma properties become function parameters in code. style=filled maps to .ebStyle(.filled) in SwiftUI. Has Icon=yes doesn't map to anything; it should be hasIcon=true. Bad names force developers to create a translation layer, and that layer is where bugs live.
What it means in practice: Property names should be clean enough to appear directly in SwiftUI/Compose code. No spaces, no special characters, and boolean values use true/false.

C3 Token coverage
What it is: All visual values use design tokens; zero hardcoded values.
How we arrived at this: A button with background #0033B8 breaks if GCash rebrands. A button with color.primary.main updates automatically. For a 90M-user app, hardcoded colors are a ticking time bomb. We found multiple GCash components with raw hex values.
What it means in practice: Every color, spacing value, and font size uses a design token. Zero hardcoded values. Tokens are the single source of truth across all platforms.

C4 Native mappability
What it is: No Figma-only visual tricks that can't translate to native.
How we arrived at this: Figma lets you do things native platforms can't reproduce faithfully or cheaply: complex blend modes, multiple fills with different opacities, elaborate masks. Finding out a design is impossible to build after a sprint of work is the most expensive kind of failure.
What it means in practice: No Figma-only visual tricks. If you can't build it in SwiftUI or Compose, it can't be in the design system. C4 ensures the design is achievable.

C5 Interaction states
What it is: Every interactive state exists as its own variant: default, pressed, disabled, focused, loading.
How we arrived at this: A button without a pressed state looks broken: users tap it and nothing changes. A button without a disabled state gets tapped when it shouldn't be. Missing states are the #1 source of "it works in Figma but feels wrong in the app."
What it means in practice: Every interactive component documents default, pressed, disabled, focused, and loading, each as a separate variant. When a state is missing, each developer invents the behavior independently, and the platforms drift apart.

C6 Asset quality
What it is: Icons and images are vector instances colored by design tokens, never raster embeds.
How we arrived at this: GCash uses a two-tone icon system. If icons are PNG embeds, the opacity treatment can't be applied programmatically, so developers have to manually create different icon versions for light mode, dark mode, and each color context. Vector component instances solve this with one asset.
What it means in practice: Icons are vector component instances colored with tokens, not raster/PNG embeds or CDN URLs. They can be resized, tinted, and themed natively.

C7 Code Connect
What it is: The component maps 1:1 to native code via Figma Code Connect.
How we arrived at this: Code Connect is the end goal of native readiness; C1–C6 prepare the component for it. Without Code Connect, developers still manually translate Figma properties to code, which is the exact problem a design system is supposed to solve.
What it means in practice: When a developer inspects the button in Figma's Dev Mode, they see EBButton(.filled, size: .large) instead of "fill: #0033B8, corner-radius: 8." It eliminates the translation step entirely.
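Several of these checks are mechanical enough to lint. Below is a minimal Python sketch covering C1–C3, assuming the component has already been exported into plain data: a list of layer names, a dict mapping each variant property to its values, and a dict of layer fills. That data shape, the camelCase rule for property names, and the list of Figma default layer names are assumptions for illustration; this is not the Figma plugin API.

```python
import re

# C1: Figma's auto-generated names such as "Frame", "Frame 47", "Group 12" (assumed list).
DEFAULT_LAYER_NAME = re.compile(r"^(Frame|Group|Rectangle|Ellipse|Vector)( \d+)?$")
# C2: assumed convention of camelCase identifiers (no spaces or special characters).
CLEAN_PROPERTY = re.compile(r"^[a-z][a-zA-Z0-9]*$")
# C3: a raw hex color (#RGB, #RRGGBB, or #RRGGBBAA) instead of a token name.
HEX_COLOR = re.compile(r"^#(?:[0-9a-fA-F]{3}|[0-9a-fA-F]{6}|[0-9a-fA-F]{8})$")


def lint_component(layers, properties, fills):
    """Return (criterion, message) findings for C1-C3 on exported component data."""
    findings = []
    for name in layers:
        if DEFAULT_LAYER_NAME.match(name):
            findings.append(("C1", f"layer {name!r} still has a default name"))
    for prop, values in properties.items():
        if not CLEAN_PROPERTY.match(prop):
            findings.append(("C2", f"property {prop!r} can't be a code parameter"))
        if {"yes", "no"} & set(values):
            findings.append(("C2", f"property {prop!r} uses yes/no instead of true/false"))
    for layer, fill in fills.items():
        if HEX_COLOR.match(fill):
            findings.append(("C3", f"layer {layer!r} has hardcoded color {fill}"))
    return findings


findings = lint_component(
    layers=["container", "Frame", "Frame 47", "header-row"],
    properties={"Has Icon": ["yes", "no"], "style": ["filled", "outlined"]},
    fills={"container": "#0033B8", "label": "color.primary.main"},
)
for criterion, message in findings:
    print(criterion, message)
```

C4–C7 depend on platform knowledge or human judgment, so they are deliberately left out of this sketch.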
Part 5

Native Status Levels

Native readiness isn't binary. A component might be 90% ready (just needs a property rename) or 20% ready (needs a full rebuild). We created 4 levels to tell you how much work sits between the component and its native implementation.

Summary

Ready: Needed a clear "go" signal. Developer can build from Figma without questions.
Needs Refinement: Most components land here. Small issues like unnamed layers or wrong boolean types. One-session fix.
Requires Rework: Some Figma structures have no native equivalent. Architecture work, not cleanup.
Not Applicable: No point assessing native readiness on something exiting the library.
Ready
What it is: Linkable to native exactly as-is, with no blockers.
How we arrived at this: We needed a "ship it" signal. Ready means a developer can inspect this component in Figma and implement it natively without asking any questions. All layers are named, all states exist, all tokens are bound.
What it means in practice: Code Connect is either registered or registerable immediately. No blockers.

Needs Refinement
What it is: Close to ready; a few small, fixable issues remain.
How we arrived at this: Most components land here after initial assessment. The structure is close but has small issues: a few layers need renaming, a property uses yes/no instead of true/false, a color isn't tokenized. Fixable in one session.
What it means in practice: Assign to the DS team for targeted fixes, usually one Figma session.

Requires Rework
What it is: Can't be implemented natively in its current form.
How we arrived at this: Some components can't be implemented natively as built: the variant structure doesn't map to any native pattern, or the component uses Figma-only effects. This needs architecture work, not cleanup.
What it means in practice: Block native implementation until the rework is done. This is a sprint-level investment.

Not Applicable
What it is: Native readiness isn't worth assessing.
How we arrived at this: If the DS Health verdict was Remove or Product Layer, there's no reason to assess native readiness. The component either shouldn't exist or doesn't need a DS-owned native version.
What it means in practice: Skip entirely. Don't spend DS time assessing a component that is leaving the core library.
Part 6

Combined Status

DS Health and Native Readiness are independent axes. A component can be healthy (Keep) but not native-ready (Requires Rework). Or perfectly native-ready but should be removed. We created a matrix because a single score hides important nuance — "Keep + Requires Rework" is very different from "Remove + N/A."

Summary

Keep + Ready: Both axes pass. No work needed. Register Code Connect and ship.
Keep/Fix + Refinement: A single score would hide that this is a quick fix, not a rebuild.
Fix/Restructure + Rework: Blocks native implementation. Starting dev work now means rework later.
Product Layer + N/A: Too specific for core DS. Product team maintains it.
Remove/Consolidate + N/A: Component is being deleted or merged. No return on investment.
DS verdict + native status → combined action
Keep + Ready → Ship it. Component is clean and native-ready. Register Code Connect.
Keep / Fix + Needs Refinement → Minor fixes needed. Assign to DS team, usually one session.
Fix / Restructure + Requires Rework → Significant work before engineers can use it. Sprint-level investment.
Product Layer + N/A → Move to product library. Product team maintains it.
Remove / Consolidate + N/A → Skip native assessment. Component is being deleted or merged.
Why two axes? The DS verdict tells you the strategic decision (keep or kill?). The native status tells you the tactical work (how much effort?). Together they give you both.
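The matrix is small enough to encode as a lookup table. This Python sketch assumes the verdict and status names used above; `combined_action` is a hypothetical helper. Combinations the table does not list (for example Keep + Requires Rework, which the introduction to this part mentions as possible) raise an error so a person decides, rather than silently falling back to a default.

```python
from enum import Enum


class Verdict(Enum):
    KEEP = "Keep"
    FIX = "Fix"
    RESTRUCTURE = "Restructure"
    CONSOLIDATE = "Consolidate"
    PRODUCT_LAYER = "Product Layer"
    REMOVE = "Remove"


class NativeStatus(Enum):
    READY = "Ready"
    NEEDS_REFINEMENT = "Needs Refinement"
    REQUIRES_REWORK = "Requires Rework"
    NOT_APPLICABLE = "Not Applicable"


REFINE = "Minor fixes needed. Assign to DS team, usually one session."
REWORK = "Significant work before engineers can use it. Sprint-level investment."

# Rows of the combined-status table for components that stay in the core DS.
ACTIONS = {
    (Verdict.KEEP, NativeStatus.READY): "Ship it. Register Code Connect.",
    (Verdict.KEEP, NativeStatus.NEEDS_REFINEMENT): REFINE,
    (Verdict.FIX, NativeStatus.NEEDS_REFINEMENT): REFINE,
    (Verdict.FIX, NativeStatus.REQUIRES_REWORK): REWORK,
    (Verdict.RESTRUCTURE, NativeStatus.REQUIRES_REWORK): REWORK,
}


def combined_action(verdict: Verdict, status: NativeStatus) -> str:
    # Verdicts that leave the core DS skip native assessment, whatever the status.
    if verdict is Verdict.PRODUCT_LAYER:
        return "Move to product library. Product team maintains it."
    if verdict in (Verdict.REMOVE, Verdict.CONSOLIDATE):
        return "Skip native assessment. Component is being deleted or merged."
    try:
        return ACTIONS[(verdict, status)]
    except KeyError:
        raise ValueError(
            f"no documented action for {verdict.value} + {status.value}"
        ) from None


print(combined_action(Verdict.FIX, NativeStatus.REQUIRES_REWORK))
```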
Part 7

Mobile Documentation Criteria — 9-Point Assessment

C1–C7 check whether the component is built correctly. The 9-point assessment checks whether it's documented correctly: can someone who's never seen this component understand how to use it? We split the criteria into Must-Haves (1–5) and Good-to-Haves (6–9) because shipping without an overview is a blocker, but shipping without a changelog is acceptable for v1.

Summary — Must-Haves

1. Overview: Every DS we studied starts here. Without it, designers pick the wrong component.
2. Visual ref: Atlassian and Spectrum show every combination. "Imagine what disabled looks like" isn't docs.
3. Props/API: First question every developer asks. Without it, devs read source code.
4. Usage: Polaris and Spectrum pair rules with visuals. Without guidelines, teams invent their own rules.
5. A11y: 90M+ users, including those with disabilities. GOV.UK sets the standard.

Summary — Good-to-Haves

6. Tokens: Carbon lists tokens per component. Devs can inspect in Figma, but a reference is faster.
7. Platform: Material documents per platform. Prevents "works on iOS, broken on Android."
8. Composition: Spectrum shows composition examples. Answers "how does this work inside a card?"
9. Changelog: Primer maintains changelogs. Must-have once multiple teams consume the DS.
Criterion and how we arrived at it:
1. Overview & purpose: Every DS we studied (Material, Polaris, Spectrum) starts with a clear "what is this component and when should you use it vs alternatives." Without it, designers pick the wrong component.
2. Visual reference: Developers need to see every variant and state rendered. Atlassian and Spectrum show every combination visually. We made this a must-have because "imagine what disabled looks like" isn't documentation.
3. Props/API table: Every developer's first question is "what properties does this accept?" Primer and Carbon document every prop with type, default, and accepted values. Without this, developers read source code instead of docs.
4. Usage guidelines: Polaris and Spectrum include do's and don'ts with visual examples. This prevents misuse ("don't use a destructive button for cancel actions"). Without guidelines, every team invents its own rules.
5. Accessibility notes: GCash serves 90M+ users, including those with disabilities. Accessibility roles (ARIA on the web, accessibility traits in SwiftUI, semantics in Compose), keyboard navigation, and screen reader behavior must be documented. GOV.UK DS sets the standard here.
6. Token reference: When a developer needs to override a color, they need to know which token to change. Carbon lists every token per component. Good-to-have because developers can inspect tokens in Figma, but a reference is faster.
7. Platform notes: SwiftUI and Compose handle things differently: shadow rendering, gesture handling, animation APIs. Material documents these differences per platform. Good-to-have because it prevents "works on iOS, broken on Android."
8. Composition patterns: How does a button work inside a card? Inside a form? Inside a bottom sheet? Spectrum shows composition examples. Good-to-have because it answers "how do I use this with other components?"
9. Changelog: When a component changes, teams using the old version need to know what changed and how to migrate. Primer maintains detailed changelogs. Good-to-have for v1, must-have once multiple teams consume the DS.
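A minimal Python sketch of the Must-Have / Good-to-Have split, assuming the documentation review produces the set of criterion numbers (1–9) that are present. `docs_gate` and its `multi_team` flag are hypothetical; the flag models the rule that the changelog becomes a must-have once multiple teams consume the DS.

```python
MUST_HAVES = {
    1: "Overview & purpose",
    2: "Visual reference",
    3: "Props/API table",
    4: "Usage guidelines",
    5: "Accessibility notes",
}
GOOD_TO_HAVES = {
    6: "Token reference",
    7: "Platform notes",
    8: "Composition patterns",
    9: "Changelog",
}
ALL_CRITERIA = {**MUST_HAVES, **GOOD_TO_HAVES}


def docs_gate(present, multi_team=False):
    """Return (can_ship, blocking, optional_missing) for a set of present criteria."""
    required = set(MUST_HAVES)
    if multi_team:
        required.add(9)  # the changelog is promoted to a blocker
    blocking = [ALL_CRITERIA[i] for i in sorted(required - set(present))]
    optional_missing = [
        ALL_CRITERIA[i] for i in sorted(set(ALL_CRITERIA) - required - set(present))
    ]
    return not blocking, blocking, optional_missing


print(docs_gate({1, 2, 3, 4, 5, 6}))
print(docs_gate({1, 2, 3, 4, 5, 6}, multi_team=True))
```

Returning the missing good-to-haves separately keeps them visible as backlog items without letting them block a v1 release.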
Part 8

Component Maturity Model (L1–L4)

The assessment tells you what's wrong now. The maturity model tells you how far you are from the best in the industry. We created 4 levels by studying where 30 real design systems cluster — most sit at L2-L3, only Material/Fluent/Apple reach L4. The levels aren't aspirational — they're empirical.

Summary

L1: Every component starts here. GCash Checkbox is L1, at 0.3/3.0 future-proof.
L2: Industry median across 30 systems. Where young design systems naturally sit. Accordion (1.7), Avatar (~1.5).
L3: Where "one product" becomes "the platform." Button is here (2.3). Carbon and Atlassian sit here.
L4: Only Material, Fluent, and Apple reach L4, with teams of 50+ engineers. North star, not requirement.
L1 Functional
How we arrived at this: Every component starts here. L1 answers one question: "does this thing even work?" Before worrying about tokens or accessibility, you confirm the component functions at all. GOV.UK DS Button sits at L2, but many smaller systems never get past L1.
Entry criteria: Requires only C1 and C4. Renders correctly with basic interaction. One size, one style.
East Blue example: Checkbox, at 0.3/3.0 future-proof. No tokens, limited states, no Code Connect.

L2 Reliable
How we arrived at this: L2 separates "barely works" from "reliably works." We found the industry median for Button across 30 systems lands at L2: most systems have proper states and basic tokens but limited styles and sizes. This is where a young design system naturally sits.
Entry criteria: Passes C1–C4 and most of C5. Semantic naming, 2–3 sizes, core states.
East Blue example: Accordion (1.7 future-proof avg), Avatar (~1.5 avg). Both work but have variant and token gaps.

L3 Scalable
How we arrived at this: L3 is where a component goes from "works for one product" to "works for the platform." For GCash, with payments, banking, investments, and insurance surfaces, this is the target level. IBM Carbon, Atlassian, and most Tier 1 systems sit here for most components.
Entry criteria: Passes C1–C6, partial C7. 3+ styles, 4+ sizes, composable slots, multi-theme.
East Blue example: Button, at L3 with a 2.3 future-proof avg. 48 variants, 3 styles, 4 sizes. Missing: Code Connect, motion, edge cases.

L4 Industry-grade
How we arrived at this: L4 is the north star, not the requirement. Only Material, Fluent, and Apple HIG reach L4 for their core components, backed by teams with 50+ engineers dedicated to the design system. Knowing what L4 looks like helps decide which improvements are worth pursuing vs gold-plating.
Entry criteria: Passes all of C1–C7. Full documentation, audited a11y, motion specs, cross-platform Code Connect.
East Blue example: No East Blue component has reached L4 yet. Industry examples: Material 3 Button, Fluent 2 Button, Apple HIG Button.
Part 9

Future-Proof Scoring (6 Pillars)

A rebrand shouldn't require rebuilding components. We identified 6 architectural properties that determine whether a component survives a visual refresh. Each pillar is scored 0–3, and an average of 2.0 or higher means the component is resilient. These pillars come from studying how Material and Carbon handled their own rebrands: Material 2 → 3 and Carbon v10 → v11.

Summary

Token abstraction: In Material's M2→M3 rebrand, deep chains updated automatically and hardcoded values broke.
API stability: .filled breaks when "filled" becomes "elevated." Intent names don't.
Variant extensibility: Carbon uses orthogonal axes. Tested by asking "what happens when you add a ghost style?"
Layout decoupling: Fixed heights break when text changes. Carbon uses min-height + padding.
Composition isolation: Found GCash components sharing token layers, so changing one accidentally changed another.
Migration path: Primer maintains changelogs with codemods. Without them, every team upgrades differently.
Token abstraction
How we arrived at this: When Material rebranded from M2 to M3, components using deep token chains updated automatically. Components with hardcoded values needed manual fixing. The depth of the token chain directly predicts rebrand effort.
Score scale: 0 = Hardcoded, 1 = Component tokens, 2 = Semantic chain, 3 = Full chain + modes

API stability
How we arrived at this: .filled is a visual name. If a rebrand changes "filled" to "elevated," the API name is wrong. Intent-based names like .prominence(.high) survive any visual change. We saw this problem in real rebrands.
Score scale: 0 = No API, 1 = Visual names, 2 = Semantic, 3 = Intent-based

Variant extensibility
How we arrived at this: Carbon uses orthogonal variant dimensions: adding a new style doesn't touch existing sizes or states. Poorly structured components require duplicating every existing variant to add one new option. We tested this by asking "what happens when you add a ghost style?"
Score scale: 0 = Duplication, 1 = Touches all, 2 = Orthogonal, 3 = Variable modes

Layout decoupling
How we arrived at this: Fixed heights (50px, 36px) break when text size changes. Carbon uses constraint-based sizing (min-height + padding) that adapts naturally. We found East Blue's button using fixed pixel heights.
Score scale: 0 = Fixed px, 1 = Size tokens, 2 = Size modes, 3 = Constraint-based

Composition isolation
How we arrived at this: A button sharing token layers with a card means changing the card accidentally changes the button. We found this cross-contamination pattern in multiple GCash components. Fully isolated components have their own variable collections.
Score scale: 0 = Shares layers, 1 = Shared tokens, 2 = Slot-based, 3 = Fully isolated

Migration path
How we arrived at this: Primer maintains detailed changelogs with before/after examples and codemods. When components change without migration guidance, every consuming team figures out the upgrade path differently. We included this pillar because GCash will inevitably version its components.
Score scale: 0 = No versioning, 1 = Changelog, 2 = Changelog + guides, 3 = Changelog + guides + codemods
Threshold: a 2.0+ average survives a rebrand with token changes only. An average from 1.5 up to (but not including) 2.0 survives with a token swap plus manual fixes. Below 1.5, the component needs a structural rebuild before any rebrand. East Blue Button currently scores 2.3, so it passes.
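The scoring reduces to an average and three bands. A minimal Python sketch, assuming pillar scores are whole numbers from 0 to 3 keyed by the snake_case pillar names below (the key names are our own). The band is decided on the unrounded average; rounding to one decimal is only for display, matching how scores such as 2.3 are reported above.

```python
PILLARS = (
    "token_abstraction", "api_stability", "variant_extensibility",
    "layout_decoupling", "composition_isolation", "migration_path",
)


def future_proof(scores):
    """Return (average rounded to 1 decimal, rebrand outcome) for six 0-3 pillar scores."""
    if set(scores) != set(PILLARS):
        raise ValueError("score every pillar exactly once")
    if any(s not in (0, 1, 2, 3) for s in scores.values()):
        raise ValueError("each pillar is scored 0-3")
    avg = sum(scores.values()) / len(PILLARS)
    if avg >= 2.0:
        outcome = "survives rebrand with token changes only"
    elif avg >= 1.5:
        outcome = "survives with token swap + manual fixes"
    else:
        outcome = "needs structural rebuild before any rebrand"
    return round(avg, 1), outcome


# Hypothetical scores, not an actual East Blue component.
example = dict(zip(PILLARS, (3, 2, 2, 2, 2, 1)))
print(future_proof(example))  # (2.0, 'survives rebrand with token changes only')
```

Because six whole-number scores are averaged, the average moves in steps of 1/6 (about 0.17), so a single pillar improving by one point can move a component across a band boundary.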
Part 10

Design Metrics (12 Fields)

These are the raw, countable attributes measured across all 30 reference systems. We chose these 12 because they're the attributes that vary most between design systems and directly impact developer experience. Each one answers a specific question about the component's capability.

Summary

Variants: 48 vs median 14.5. Strong.
Styles: 3 vs median 4.5. Gap: missing tonal/ghost.
Sizes: 4 vs median 3. Strong.
States: 5 vs median 5. On par.
Slots: 2 slots. Standard.
Token depth: 3/3. Top tier.
A11y docs: Documented. Most Tier 1 systems are tested.
Edge cases: 0 vs median 2. Critical gap.
Motion spec: None. Material/Fluent/Porsche have full specs.
Dark mode: Yes. On par.
Responsive: No. Lower priority for mobile-first.
Code Connect: No. C7 goal. Only 3 systems have this.
Variants
How we arrived at this: The total variant count shows how many combinations the component covers without custom work. We counted variants across all 30 systems to find the median.
What good vs bad looks like: East Blue Button: 48 (strong). Industry median: 14.5.

Styles
How we arrived at this: Different contexts need different visual weights: a primary CTA vs a toolbar action vs a destructive confirmation. We tracked how many distinct styles each system offers.
What good vs bad looks like: East Blue: 3. Median: 4.5. Gap: missing tonal/ghost.

Sizes
How we arrived at this: Mobile apps run on screens from 4" to 7"+. Different screens and densities need different component sizes. We counted distinct size options.
What good vs bad looks like: East Blue: 4 (L/M/S/XS). Median: 3. Strong.

States
How we arrived at this: Missing states force developers to guess. We listed which states each system documents and found the median is 5 (default, hover, pressed, focus, disabled).
What good vs bad looks like: East Blue: 5. Median: 5. On par.

Slots
How we arrived at this: Slots allow customization without new variants, for example a button with an icon slot instead of hardcoded icon placement. We tracked whether systems use slot-based composition.
What good vs bad looks like: East Blue: 2 (leadingIcon, trailingIcon). Standard.

Token depth
How we arrived at this: Deeper token chains are more rebrand-resilient. Measured from 0 (hardcoded) to 3 (reference → semantic → component + modes).
What good vs bad looks like: East Blue: 3. Top tier.

A11y docs
How we arrived at this: Three levels observed: none, documented (written rules), and tested (verified with assistive tech). GOV.UK and Atlassian test with screen readers.
What good vs bad looks like: East Blue: documented. Most Tier 1 systems: tested.

Edge cases
How we arrived at this: What happens with a very long Tagalog label? An icon-only button? RTL layout? Systems that document these prevent production bugs. GCash specifically needs long-text handling because of Tagalog.
What good vs bad looks like: East Blue: 0 (critical gap). Median: 2.

Motion spec
How we arrived at this: Without specs, every developer picks different animation timing. Material documents exact durations (ms) and easing per state transition.
What good vs bad looks like: East Blue: none. Material/Fluent/Porsche: full.

Dark mode
How we arrived at this: An expected standard for mobile apps. We tracked whether the component has a dark theme variant that works via tokens.
What good vs bad looks like: East Blue: yes. Most systems: yes.

Responsive
How we arrived at this: Does the component adapt to different screen widths? This is more relevant for web, but some mobile components need density awareness.
What good vs bad looks like: East Blue: no. Less critical for mobile-first.

Code Connect
How we arrived at this: The ultimate bridge between design and development. Only Material, Fluent, and Apple have this for their core components.
What good vs bad looks like: East Blue: no. This is the C7 goal.
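One way to keep the 12 fields comparable across the 30 reference systems is a fixed record per component. This Python sketch assumes that representation; the class and field names are our own. The East Blue Button values and the medians are the ones quoted in the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesignMetrics:
    variants: int
    styles: int
    sizes: int
    states: int
    slots: int
    token_depth: int     # 0 (hardcoded) to 3 (reference -> semantic -> component + modes)
    a11y_docs: str       # "none", "documented", or "tested"
    edge_cases: int
    motion_spec: str     # "none" or "full"
    dark_mode: bool
    responsive: bool
    code_connect: bool


EAST_BLUE_BUTTON = DesignMetrics(
    variants=48, styles=3, sizes=4, states=5, slots=2, token_depth=3,
    a11y_docs="documented", edge_cases=0, motion_spec="none",
    dark_mode=True, responsive=False, code_connect=False,
)

# Industry medians quoted above for the numeric metrics.
MEDIANS = {"variants": 14.5, "styles": 4.5, "sizes": 3, "states": 5, "edge_cases": 2}


def gaps_vs_median(metrics):
    """Positive = ahead of the median, negative = behind it."""
    return {name: getattr(metrics, name) - median for name, median in MEDIANS.items()}


print(gaps_vs_median(EAST_BLUE_BUTTON))
# {'variants': 33.5, 'styles': -1.5, 'sizes': 1, 'states': 0, 'edge_cases': -2}
```

A frozen dataclass makes a missing or misspelled field an immediate error, which matters when the same 12 fields are collected for dozens of systems.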
Part 11

Recommendation Priority System

Not all gaps matter equally. We created 4 priority levels because working on the wrong gap wastes time. "Skip" exists because knowing what NOT to fix is just as valuable — it prevents you from gold-plating things that are already strengths.

Summary

Must: Trails median by 2+ or has zero where median is 2+. Causes visible harm.
Should: Trails median by 1 or 50%+ of compared systems have it.
Could: Fewer than 50% have it. Polish item, not a blocker.
Skip: Without Skip, every benchmark only shows problems. Skip proves progress.
Must
How we arrived at this: Some gaps directly affect users or developers. Zero edge cases for a bilingual app with Tagalog labels (40–60% longer than English) is a real problem, not a nice-to-have. Must items are the ones where the gap causes visible harm.
What triggers it: Trailing the median by 2+ on a core metric, or having zero where the median is 2+ (e.g. zero edge cases).

Should
How we arrived at this: Meaningful but not critical. The component works without this, but it's noticeably weaker than the competition. Users won't see it directly, but developers will feel it every day.
What triggers it: Trailing the median by at least 1 (but less than 2), or lacking a feature that 50%+ of compared systems have.

Could
How we arrived at this: Worth doing eventually but doesn't block anything. These are polish items that elevate the DS from "good" to "great." Only worth investing in after all Must and Should items are resolved.
What triggers it: Fewer than 50% of compared systems have this, or the gap is small (trailing by less than 1).

Skip
How we arrived at this: We included Skip because without it, every benchmark produces only "things to fix," which makes the DS feel perpetually broken. Skip items are proof of progress. "Your variant count is more than 3x the industry median" means stop adding variants and start fixing edge cases.
What triggers it: East Blue exceeds the median by 1.5x+ or leads all compared systems.
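The trigger rules translate almost directly into code. A minimal Python sketch for numeric metrics, assuming every metric passed in counts as "core" unless flagged otherwise (the text doesn't list which metrics are core), and returning None when a metric is on par or modestly ahead, a case none of the four levels covers. `priority` and `feature_priority` are hypothetical helpers.

```python
def priority(value, median, *, leads_all=False, core=True):
    """Classify a numeric metric against the industry median.

    Returns "Skip", "Must", "Should", "Could", or None (on par or modestly ahead).
    """
    if leads_all or (median > 0 and value >= 1.5 * median):
        return "Skip"
    if value == 0 and median >= 2:
        return "Must"
    gap = median - value
    if gap >= 2 and core:
        return "Must"
    if gap >= 1:
        return "Should"
    if gap > 0:
        return "Could"
    return None


def feature_priority(adoption_share):
    """Priority for a missing yes/no feature, given the share of compared systems that have it."""
    return "Should" if adoption_share >= 0.5 else "Could"


# Numbers from the Design Metrics table.
print(priority(0, 2))      # edge cases: Must
print(priority(3, 4.5))    # styles, trailing by 1.5: Should
print(priority(48, 14.5))  # variants, more than 1.5x the median: Skip
print(priority(5, 5))      # states, on par: None
```

Checking Skip first means a metric that already leads is never flagged for work, which is the point of having a Skip level at all.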
Part 12

How They Connect — The 7-Step Loop

Assessment finds what's broken. Benchmark measures how far you are from the industry best. They're connected through a 7-step process. The gate check prevents wasted effort on dead components. The benchmark peek prevents double work by showing you the full picture before you touch Figma.

1. Assess the component
Run DS Health (4 traits → verdict) and Native Readiness (C1–C7 → status). This tells you what's broken and how badly.

2. Gate check: should we invest?
If the verdict is Remove or Product Layer, stop here. Don't benchmark a component that shouldn't exist. Only continue for Keep, Fix, Restructure, or Consolidate.

3. Peek at benchmark: gather intel
Quick benchmark run before fixing anything. See the full picture: assessment issues AND industry gaps. Don't act yet. Just know what you're dealing with so you can fix everything in one pass.

4. Fix everything in one Figma session
Combine assessment issues + benchmark gaps into one fix list. Example: "C5 says add hover. Benchmark says add hover AND selected. → Add both now." Touch the component once, not twice.

5. Re-assess: confirm all criteria pass
Run the assessment again. All C1–C7 should pass now. If something still fails, go back to Step 4. Don't move forward until the assessment is clean.

6. Full benchmark: compare against the industry
Now that the component passes assessment, run the full benchmark: L1–L4 maturity, 6-pillar future-proof score, radar chart, and recommendations. These are the "nice to have" improvements, not the broken stuff.

7. Polish and repeat
Fix the top industry gap. Re-benchmark. The next gap surfaces. Each cycle makes the component measurably stronger. Repeat until satisfied, then move to the next component.
Assessment answers: "Is this component healthy?"
Benchmark answers: "How good is this healthy component compared to the best in the industry?"

The gate check prevents wasted effort. The peek prevents double work.
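The loop's control flow, including the gate check, the merged fix list, and the Step 4–5 retry, can be sketched in Python. This is a sketch under stated assumptions: `run_loop` and every callable it takes (`assess`, `peek_benchmark`, `fix`, `full_benchmark`, `polish`) are hypothetical hooks standing in for the manual Figma and benchmark work, and `max_rounds` is an arbitrary safety cap.

```python
def run_loop(component, assess, peek_benchmark, fix, full_benchmark, polish, max_rounds=5):
    """Drive one component through the 7-step loop; returns (verdict, remaining gaps).

    assess(c) -> (verdict, issues); peek_benchmark(c) and full_benchmark(c) -> gaps;
    fix(c, items) and polish(c, item) change the component in place.
    """
    verdict, issues = assess(component)                 # Step 1: assess
    if verdict in ("Remove", "Product Layer"):          # Step 2: gate check
        return verdict, []
    gaps = peek_benchmark(component)                    # Step 3: look, don't act
    # Step 4: one merged fix list; dict.fromkeys drops duplicates and keeps order.
    todo = list(dict.fromkeys(issues + gaps))
    for _ in range(max_rounds):
        fix(component, todo)
        _, todo = assess(component)                     # Step 5: re-assess
        if not todo:
            break
    else:
        raise RuntimeError("assessment still failing; stay in Steps 4-5")
    remaining = full_benchmark(component)               # Step 6: full benchmark
    for _ in range(max_rounds):                         # Step 7: polish and repeat
        if not remaining:
            break
        polish(component, remaining[0])
        remaining = full_benchmark(component)
    return verdict, remaining


# Toy demo: the "component" is just the set of features it already has.
button = {"default"}
fix_calls = []


def assess(c):
    return "Fix", [s for s in ("pressed", "disabled", "hover") if s not in c]


def peek_benchmark(c):
    return [g for g in ("hover", "selected") if g not in c]


def fix(c, items):
    fix_calls.append(list(items))
    c.update(items)


def full_benchmark(c):
    return [g for g in ("motion spec", "edge cases") if g not in c]


print(run_loop(button, assess, peek_benchmark, fix, full_benchmark, set.add))
print(fix_calls)  # one session: hover appears once, selected is added alongside it
```

The demo mirrors the Step 4 example: the assessment asks for hover, the benchmark asks for hover and selected, and a single fix call receives both without duplicating hover.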