Assessment Methodology

East Blue Design System

Assessment & Benchmark Methodology

The "why behind the why." Every rating, score, and level in the East Blue assessment system — what it is, how we arrived at it, and why it exists. When someone asks "why does East Blue score components this way?", this page is the answer.

Part 1

DS Health — The 4 Traits

We studied what makes Material, Atlassian, Carbon, and Polaris components successful across products, platforms, and teams. These 4 traits are what they all have in common. We didn't invent them — the industry's best systems proved them.

Summary

Reusable	Material, Atlassian, and Polaris all share components across products and platforms. One-screen components are product-specific, not DS-grade.
Self-contained	Carbon isolates components with no external dependencies. GCash components that relied on parent containers broke when moved.
Consistent	Atlassian enforces uniformity via shared tokens. GCash's Accordion used yes/no booleans while everything else used true/false.
Composable	Carbon's hierarchy and Material's nested architecture proved that systems scale when components snap together as building blocks.

Trait	What it is	How we arrived at this	What it means in practice
Reusable	Works across multiple screens and flows, not just one specific screen or feature.	Material's Button works identically across Android, Web, and Flutter. Atlassian uses the same components across Jira, Confluence, Trello, and Bitbucket — wildly different products, same building blocks. The common thread: if it only works on one screen, it's not a component — it's a one-off.	A GCash button must work in a payment flow, a settings page, and a modal without modification. You design it once and use it everywhere. If it can't do that, it belongs in a product library, not the core design system.
Self-contained	Carries its own styles, states, and logic without depending on external setup.	Carbon packages each component independently — teams pull only what they need without side effects. We saw GCash components that depended on parent containers to set their colors. When moved between screens, they broke. The pattern was clear: a component that depends on external setup breaks when moved.	Drop the component into any screen and it works immediately. It doesn't need a parent wrapper. It manages its own pressed, disabled, loading states internally. For GCash with 90M+ users and multiple product teams, self-containment means no team can accidentally break another team's UI.
Consistent	Naming conventions, property types, and state coverage match the rest of the design system.	Atlassian's shared design tokens keep color, spacing, and elevation uniform across every product. When we assessed GCash's Accordion, we found it using `yes/no` boolean values while the rest of the system uses `true/false`. That inconsistency forced developers to handle special cases — a pattern we saw repeat across multiple components.	Every component uses the same naming conventions, the same property types, the same state coverage. If buttons use `true/false`, checkboxes use `true/false`. No surprises, no special cases, no guessing.
Composable	Can be nested inside other components and fits into the existing hierarchy.	Carbon's hierarchy — Elements → Components → Patterns → Templates — shows exactly how things nest. Material's NavigationDrawer is built from ListItems, which are built from Icons + Text. The systems that scale are the ones where components snap together like building blocks.	An icon sits inside a button. A button sits inside a card. A card sits inside a list. If a component can't be nested, designers have to rebuild it from scratch every time they need a new combination. For GCash, composability means a single well-built button powers every screen.

Part 2

Trait Ratings

Each of the 4 traits is rated independently. Why three levels? Because "partially works" needs a different action than "completely broken." A component with inconsistent naming needs a 10-minute rename. A component with no states needs a redesign. The rating tells you how much work is ahead.

Summary

Pass ✅	Needed a clear "done" signal so healthy traits stop getting revisited.
Needs work ⚠️	Binary pass/fail treats a 10-minute rename the same as a full rebuild. Most real components land here.
Fail ❌	Separates "fix it" from "rethink it." Fail items block all downstream work.

Rating	What it is	How we arrived at this	What it means in practice
Pass ✅	Fully meets the trait. No issues found.	You need a clear "done" signal. Without it, every component gets scrutinized endlessly. We created Pass to mean: stop looking for problems here — this trait is clean.	The component fully satisfies this trait. No workarounds, no exceptions. Move on to the next trait or the next component.
Needs work ⚠️	Partially meets the trait. Specific, fixable gaps.	Without a middle tier, everything becomes binary — ship or rebuild. That's too coarse. Most real components land here. GCash's Accordion was rated ⚠️ on Consistent because its booleans used `yes/no` instead of `true/false`. A 10-minute rename, not a redesign — but still a real issue.	The intent is right, but execution has gaps. Maybe reusable in 3 out of 4 contexts. Maybe naming is consistent except for one property. These are targeted repairs, not fundamental problems.
Fail ❌	Does not meet the trait. Structural problem requiring rebuild.	We needed to separate "needs a fix" from "needs a rethink." Fail items go first in the priority queue because they block everything downstream — you can't benchmark a component that fundamentally doesn't work.	The component structurally can't satisfy this trait in its current form. A component that only works on one screen fails Reusable. A component with no states fails Self-contained. Not a tweak — a rebuild of that aspect.
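
The three ratings can be modeled as an ordered scale, which makes the "Fail items go first" rule a simple sort. This is a minimal sketch, not East Blue tooling: the names `Rating`, `TraitAssessment`, and `triage` are hypothetical, and every rating in the sample data other than Accordion's ⚠️ on Consistent is invented for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class Rating(IntEnum):
    # Ordered by severity so that sorting puts the worst rating first.
    FAIL = 0
    NEEDS_WORK = 1
    PASS = 2


TRAITS = ("reusable", "self_contained", "consistent", "composable")


@dataclass
class TraitAssessment:
    component: str
    ratings: dict  # trait name -> Rating, one entry per trait

    def worst(self) -> Rating:
        """The lowest rating across the 4 traits."""
        return min(self.ratings[trait] for trait in TRAITS)

    def all_pass(self) -> bool:
        """True only when every trait is rated Pass."""
        return self.worst() is Rating.PASS


def triage(assessments):
    """Order components: any Fail first, then Needs work, then all-Pass."""
    return sorted(assessments, key=lambda a: a.worst())


# Sample data. Only Accordion's Needs-work rating on Consistent comes from
# the assessment described above; the rest is hypothetical.
accordion = TraitAssessment("Accordion", {
    "reusable": Rating.PASS, "self_contained": Rating.PASS,
    "consistent": Rating.NEEDS_WORK, "composable": Rating.PASS,
})
one_screen_banner = TraitAssessment("OneScreenBanner", {
    "reusable": Rating.FAIL, "self_contained": Rating.PASS,
    "consistent": Rating.PASS, "composable": Rating.PASS,
})
queue = [a.component for a in triage([accordion, one_screen_banner])]
```

Because the traits are rated independently, the sketch keeps all four ratings and only derives the worst one when it needs an ordering.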

Part 3

DS Health Verdicts

Why 6 verdicts? Because a component can "belong" in the DS but need very different kinds of work. Fix means tweak properties. Restructure means rebuild architecture. Consolidate means merge. Each verdict tells you the kind of investment required — not just "is it good or bad?"

Summary

Keep	Needed a "stop reviewing" signal for healthy components.
Fix	GCash Accordion had good structure but bad naming. One session fix, not a rebuild.
Restructure	Separated from Fix because restructuring takes a sprint vs 30 minutes.
Consolidate	GCash had multiple list components doing the same thing. Every duplicate doubles maintenance.
Product Layer	A DS that includes everything becomes unmaintainable. Core stays lean.
Remove	Dead components accumulate forever. Even unused ones confuse designers.

Verdict	What it is	How we arrived at this	What it means in practice
Keep	All 4 traits pass. Ship as-is.	We needed a clear "don't touch this" signal. Without Keep, every component gets endlessly scrutinized. Keep means the DS Health work is done here.	All 4 traits pass. Component is healthy. Move to native readiness assessment or benchmarking.
Fix	Sound architecture, needs targeted repairs.	GCash's Accordion had good structure but bad layer names and wrong boolean values. The architecture was sound — it just needed targeted repairs. We needed a verdict that says "the concept is right, the details are wrong."	Quick wins — usually one Figma session. Rename properties, add missing states, fix token bindings. The component stays, the details get corrected.
Restructure	Right purpose, wrong construction. Needs architectural rebuild.	Some components do the right thing but are built wrong — variant structure makes extending impossible, or the layer hierarchy doesn't map to native code. Restructure is a bigger investment than Fix, so separating them helps with planning.	The concept stays, the construction changes. A Fix takes 30 minutes. A Restructure takes a full sprint. Different budget, different timeline.
Consolidate	Merge duplicates into one component.	GCash had multiple list components doing the same thing. Every duplicate means every fix gets applied twice. We needed a verdict that says "merge these into one."	Two components exist that do the same thing. One absorbs the other. The surviving component gets the best properties from both.
Product Layer	Too feature-specific for core DS. Move to product library.	A design system that includes everything becomes unmaintainable. Some components are real but too feature-specific (e.g. a loan calculator widget). They work, but only for one product. The core DS should stay lean.	Move to a product-specific library. The team that owns the feature owns the component. Don't invest DS resources here.
Remove	Redundant, deprecated, or doesn't belong. Delete it.	Without Remove, dead components accumulate forever. Every component in the library has a maintenance cost — even unused ones confuse designers who find them. Remove keeps the library honest.	Delete it. It's a duplicate, an outdated pattern, or something that was never a design system component (OS-level UI, ad infrastructure).

Part 4

Native Mobile Readiness — 7 Criteria (C1–C7)

GCash targets iOS (SwiftUI) and Android (Jetpack Compose). These 7 criteria are the minimum requirements for a Figma component to translate cleanly to native code. Each one exists because its absence causes a specific, known failure mode when developers try to build the component natively.

Summary

C1 Layer structure	Found 8 layers named "Frame" in GCash Accordion. Messy layers produce messy code.
C2 Variant naming	Bad names force a translation layer between Figma and code. That layer is where bugs live.
C3 Token coverage	Hardcoded colors in a 90M-user app are a rebrand time bomb. Tokens update everywhere at once.
C4 Native mappability	Discovering a design is impossible to build after a sprint is the most expensive failure.
C5 Interaction states	Missing states are the #1 cause of "works in Figma, feels wrong in app."
C6 Asset quality	GCash's two-tone icons need programmatic tinting. PNGs can't do that.
C7 Code Connect	C1–C6 prepare the component. C7 completes the bridge. The end goal of native readiness.

Criterion	What it is	How we arrived at this	What it means in practice
C1 Layer structure	Clean, semantic layer hierarchy with meaningful names.	SwiftUI and Compose build UIs as trees of named views. Figma's layer tree is the developer's blueprint. When we assessed GCash's Accordion, 8 layers were named "Frame" — every one needed renaming before a developer could use it. Messy layers = messy code = bugs.	Layers named "container," "header-row," "icon-leading" — not "Frame 47" or "Group 12." The layer hierarchy should mirror the native view tree.
C2 Variant naming	Property names map 1:1 to native code parameters.	Figma properties become function parameters in code. `style=filled` maps to `.ebStyle(.filled)` in SwiftUI. `Has Icon=yes` doesn't map to anything — it should be `hasIcon=true`. Bad names force developers to create a translation layer where bugs live.	Property names should be clean enough to appear directly in SwiftUI/Compose code. No spaces, no special characters, boolean values use true/false.
C3 Token coverage	All visual values use design tokens, zero hardcoded values.	A button with background `#0033B8` breaks if GCash rebrands. A button with `color.primary.main` updates automatically. For a 90M-user app, hardcoded colors are a ticking time bomb. We found multiple GCash components with raw hex values.	Every color, spacing value, and font size uses a design token. Zero hardcoded values. Tokens are the single source of truth across all platforms.
C4 Native mappability	No Figma-only visual tricks that can't translate to native.	Figma lets you do things native platforms can't — complex blend modes, multiple fills with different opacities, elaborate masks. Finding out a design is impossible to build after a sprint of work is the most expensive kind of failure.	No Figma-only visual tricks. If you can't build it in SwiftUI or Compose, it can't be in the design system. C4 ensures the design is achievable.
C5 Interaction states	Every interactive state exists as its own variant: default, pressed, disabled, focused, loading.	A button without a pressed state looks broken: users tap it and nothing changes. A button without a disabled state gets tapped when it shouldn't be. Missing states are the #1 source of "it works in Figma but feels wrong in the app."	Every interactive component documents: default, pressed, disabled, focused, loading. Each state is a separate variant. When a state is missing, each developer has to invent the behavior, and no two will invent it the same way.
C6 Asset quality	Icons and images are vector instances colored by design tokens, never raster embeds.	GCash uses a two-tone icon system. If icons are PNG embeds, the opacity treatment can't be applied programmatically. Developers have to manually create different icon versions for light mode, dark mode, and each color context. Vector component instances solve this with one asset.	Icons are vector component instances colored with tokens — not raster/PNG embeds or CDN URLs. They can be resized, tinted, and themed natively.
C7 Code Connect	The component maps 1:1 to native code via Figma Code Connect.	Code Connect is the end goal of native readiness. C1–C6 prepare the component for it. Without Code Connect, developers still manually translate Figma properties to code — which is the exact problem a design system is supposed to solve.	When a developer inspects the button in Figma, they see `EBButton(.filled, size: .large)` instead of "fill: #0033B8, corner-radius: 8." It eliminates the translation step entirely.
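
C1, C2, and C3 are mechanical enough to check with a small lint. The sketch below assumes layer names, property names, and fill values have already been exported from Figma as plain strings; the function names and the list of default layer names are illustrative assumptions, not part of any Figma API.

```python
import re

# Default Figma layer names such as "Frame", "Frame 47", or "Group 12".
# The exact list of shape names is an assumption for this sketch.
DEFAULT_LAYER_NAME = re.compile(r"^(Frame|Group|Rectangle|Ellipse|Vector)( \d+)?$")
# Raw hex colors such as #0033B8 (3, 4, 6, or 8 hex digits).
HEX_COLOR = re.compile(r"^#(?:[0-9a-fA-F]{3,4}|[0-9a-fA-F]{6}|[0-9a-fA-F]{8})$")
BOOLEAN_ALIASES = {"yes": "true", "no": "false"}


def has_semantic_layer_name(name: str) -> bool:
    """C1: True when the layer name is not a Figma default."""
    return DEFAULT_LAYER_NAME.match(name.strip()) is None


def to_code_property(name: str, value: str) -> tuple:
    """C2: rewrite a Figma property so it can appear as a code parameter."""
    words = [w for w in re.split(r"[^0-9A-Za-z]+", name) if w]
    if not words:
        raise ValueError("property name has no usable characters")
    head, *tail = words
    camel = head[0].lower() + head[1:] + "".join(w[0].upper() + w[1:] for w in tail)
    cleaned = value.strip()
    return camel, BOOLEAN_ALIASES.get(cleaned.lower(), cleaned)


def is_hardcoded_color(value: str) -> bool:
    """C3: True when a fill is a raw hex value instead of a token name."""
    return HEX_COLOR.match(value.strip()) is not None


print(has_semantic_layer_name("Frame 47"))    # False
print(to_code_property("Has Icon", "yes"))    # ('hasIcon', 'true')
print(is_hardcoded_color("#0033B8"))          # True
```

A check like this only catches the obvious cases. A layer called "box" passes the C1 test here but still tells a developer nothing, so a human review remains part of the criterion.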

Part 5

Native Status Levels

Native readiness isn't binary. A component might be 90% ready (just needs a property rename) or 20% ready (needs a full rebuild). We created 4 levels to tell you how much work sits between the component and its native implementation.

Summary

Ready	Needed a clear "go" signal. Developer can build from Figma without questions.
Needs Refinement	Most components land here. Small issues like unnamed layers or wrong boolean types. One session fix.
Requires Rework	Some Figma structures have no native equivalent. Architecture work, not cleanup.
Not Applicable	No point assessing native readiness on something exiting the library.

Status	What it is	How we arrived at this	What it means in practice
Ready	Linkable to native exactly as-is, with no blockers.	We needed a "ship it" signal. Ready means a developer can inspect this component in Figma and implement it natively without asking any questions. All layers named, all states exist, all tokens bound.	Code Connect is either registered or registerable immediately. No blockers.
Needs Refinement	Close to ready — a few small, fixable issues remain.	Most components land here after initial assessment. The structure is close but has small issues — a few layers need renaming, a property uses `yes/no` instead of `true/false`, a color isn't tokenized. Fixable in one session.	Assign to DS team for targeted fixes. Usually one Figma session.
Requires Rework	Can't be implemented natively in its current form.	Some components can't be implemented natively in their current form — the variant structure doesn't map to any native pattern, or it uses Figma-only effects. This needs architecture work, not cleanup.	Block native implementation until the rework is done. This is a sprint-level investment.
Not Applicable	Native readiness isn't worth assessing.	If the DS Health verdict was Remove or Product Layer, there's no reason to assess native readiness. The component either shouldn't exist or doesn't need a native version.	Skip entirely. Don't waste time assessing a component that's going away.

Part 6

Combined Status

DS Health and Native Readiness are independent axes. A component can be healthy (Keep) but not native-ready (Requires Rework). Or it can be perfectly native-ready but slated for removal. We created a matrix because a single score hides important nuance: "Keep + Requires Rework" is very different from "Remove + N/A."

Summary

Keep + Ready	Both axes pass. No work needed. Register Code Connect and ship.
Keep/Fix + Refinement	A single score would hide that this is a quick fix, not a rebuild.
Fix/Restructure + Rework	Blocks native implementation. Starting dev work now means rework later.
Product Layer + N/A	Too specific for core DS. Product team maintains it.
Remove/Consolidate + N/A	Component is being deleted or merged. No return on investment.

DS Verdict	Native Status	Combined Action
Keep	Ready	Ship it. Component is clean and native-ready. Register Code Connect.
Keep / Fix	Needs Refinement	Minor fixes needed. Assign to DS team, usually one session.
Fix / Restructure	Requires Rework	Significant work before engineers can use it. Sprint-level investment.
Product Layer	N/A	Move to product library. Product team maintains it.
Remove / Consolidate	N/A	Skip native assessment. Component is being deleted or merged.

Why two axes? The DS verdict tells you the strategic decision (keep or kill?). The native status tells you the tactical work (how much effort?). Together they give you both.
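
The matrix reads naturally as a lookup keyed on both axes. The sketch below encodes only the five rows listed above; `MATRIX` and `combined_action` are hypothetical names, and the action strings are shortened.

```python
from typing import Optional

# One entry per matrix row: (DS verdicts, native status, combined action).
MATRIX = [
    ({"Keep"}, "Ready",
     "Ship it. Register Code Connect."),
    ({"Keep", "Fix"}, "Needs Refinement",
     "Minor fixes. Assign to DS team, usually one session."),
    ({"Fix", "Restructure"}, "Requires Rework",
     "Significant work before engineers can use it. Sprint-level investment."),
    ({"Product Layer"}, "N/A",
     "Move to product library. Product team maintains it."),
    ({"Remove", "Consolidate"}, "N/A",
     "Skip native assessment. Component is being deleted or merged."),
]


def combined_action(verdict: str, native_status: str) -> Optional[str]:
    """Return the combined action for a verdict/status pair, or None if unlisted."""
    for verdicts, status, action in MATRIX:
        if verdict in verdicts and native_status == status:
            return action
    return None


print(combined_action("Keep", "Ready"))
print(combined_action("Fix", "Requires Rework"))
```

Pairs the matrix does not list return `None` in this sketch. How to treat those combinations is a team decision that the lookup deliberately leaves open.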

Part 7

Mobile Documentation Criteria — 9-Point Assessment

C1–C7 check whether the component is built correctly. The 9-point assessment checks whether it's documented correctly: can someone who's never seen this component understand how to use it? We split the criteria into Must-Haves (1–5) and Good-to-Haves (6–9) because shipping without an overview is a blocker, but shipping without a changelog is acceptable for v1.

Summary — Must-Haves

1. Overview	Every DS we studied starts here. Without it, designers pick the wrong component.
2. Visual ref	Atlassian and Spectrum show every combination. "Imagine what disabled looks like" isn't docs.
3. Props/API	First question every developer asks. Without it, devs read source code.
4. Usage	Polaris and Spectrum pair rules with visuals. Without guidelines, teams invent their own rules.
5. A11y	90M+ users including those with disabilities. GOV.UK sets the standard.

Summary — Good-to-Haves

6. Tokens	Carbon lists tokens per component. Devs can inspect in Figma, but a reference is faster.
7. Platform	Material documents per platform. Prevents "works on iOS, broken on Android."
8. Composition	Spectrum shows composition examples. Answers "how does this work inside a card?"
9. Changelog	Primer maintains changelogs. Must-have once multiple teams consume the DS.

#	Criterion	How we arrived at this
1	Overview & purpose	Every DS we studied (Material, Polaris, Spectrum) starts with a clear "what is this component and when should you use it vs alternatives." Without it, designers pick the wrong component.
2	Visual reference	Developers need to see every variant and state rendered. Atlassian and Spectrum show every combination visually. We made this a must-have because "imagine what disabled looks like" isn't documentation.
3	Props/API table	Every developer's first question: "what properties does this accept?" Primer and Carbon document every prop with type, default, and accepted values. Without this, developers read source code instead of docs.
4	Usage guidelines	Polaris and Spectrum include do's and don'ts with visual examples. This prevents misuse — "don't use a destructive button for cancel actions." Without guidelines, every team invents their own rules.
5	Accessibility notes	GCash serves 90M+ users including those with disabilities. ARIA roles, keyboard navigation, and screen reader behavior must be documented. GOV.UK DS sets the standard here.
6	Token reference	When a developer needs to override a color, they need to know which token to change. Carbon lists every token per component. Good-to-have because developers can inspect tokens in Figma, but a reference is faster.
7	Platform notes	SwiftUI and Compose handle things differently — shadow rendering, gesture handling, animation APIs. Material documents these differences per platform. Good-to-have because it prevents "works on iOS, broken on Android."
8	Composition patterns	How does a button work inside a card? Inside a form? Inside a bottom sheet? Spectrum shows composition examples. Good-to-have because it answers "how do I use this with other components?"
9	Changelog	When a component changes, teams using the old version need to know what changed and how to migrate. Primer maintains detailed changelogs. Good-to-have for v1, must-have once multiple teams consume the DS.
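
The must-have and good-to-have split turns into a simple ship check: any missing must-have blocks, while missing good-to-haves only lower the score. A minimal sketch, assuming each criterion is tracked by a short key; the key names and `documentation_report` are hypothetical.

```python
MUST_HAVES = (
    "overview", "visual_reference", "props_api",
    "usage_guidelines", "accessibility",
)
GOOD_TO_HAVES = (
    "token_reference", "platform_notes", "composition_patterns", "changelog",
)


def documentation_report(present: set) -> dict:
    """Score a component's docs against the 9 criteria."""
    missing_must = [c for c in MUST_HAVES if c not in present]
    missing_good = [c for c in GOOD_TO_HAVES if c not in present]
    return {
        "score": 9 - len(missing_must) - len(missing_good),  # out of 9
        "blocked": bool(missing_must),  # any missing must-have blocks shipping
        "missing_must_haves": missing_must,
        "missing_good_to_haves": missing_good,
    }


# A v1 page with every must-have but none of the good-to-haves.
v1_docs = documentation_report(set(MUST_HAVES))
print(v1_docs["score"], v1_docs["blocked"])  # 5 False
```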

Part 8

Component Maturity Model (L1–L4)

The assessment tells you what's wrong now. The maturity model tells you how far you are from the best in the industry. We created 4 levels by studying where 30 real design systems cluster: most sit at L2–L3, and only Material, Fluent, and Apple reach L4. The levels aren't aspirational; they're empirical.

Summary

L1	Every component starts here. GCash Checkbox is L1 — 0.3/3.0 future-proof.
L2	Industry median across 30 systems. Where young design systems naturally sit. Accordion (1.7), Avatar (~1.5).
L3	Where "one product" becomes "the platform." Button is here (2.3). Carbon and Atlassian sit here.
L4	Only Material, Fluent, Apple reach L4 — teams of 50+ engineers. North star, not requirement.

Level	How we arrived at this	Entry criteria	East Blue example
L1 Functional	Every component starts here. L1 answers one question: "does this thing even work?" Before worrying about tokens or accessibility, you confirm the component functions at all. GOV.UK DS Button sits at L2, but many smaller systems never get past L1.	Passes C1 and C4 only. Renders correctly with basic interaction. One size, one style.	Checkbox — 0.3/3.0 future-proof. No tokens, limited states, no Code Connect.
L2 Reliable	L2 separates "barely works" from "reliably works." We found the industry median for Button across 30 systems lands at L2 — most systems have proper states and basic tokens but limited styles and sizes. This is where a young design system naturally sits.	Passes C1–C4, most of C5. Semantic naming, 2–3 sizes, core states.	Accordion (1.7 avg), Avatar (~1.5 avg) — both work but have variant and token gaps.
L3 Scalable	L3 is where a component goes from "works for one product" to "works for the platform." For GCash with payments, banking, investments, and insurance surfaces, this is the target level. IBM Carbon, Atlassian, and most Tier 1 systems sit here for most components.	Passes C1–C6, partial C7. 3+ styles, 4+ sizes, composable slots, multi-theme.	Button — L3, 2.3 avg. 48 variants, 3 styles, 4 sizes. Missing: Code Connect, motion, edge cases.
L4 Industry-grade	L4 is the north star, not the requirement. Only Material, Fluent, and Apple HIG reach L4 for their core components — teams with 50+ engineers dedicated to the design system. Knowing what L4 looks like helps decide which improvements are worth pursuing vs gold-plating.	Passes all C1–C7. Full documentation, audited a11y, motion specs, cross-platform Code Connect.	No East Blue component has reached L4 yet. Industry examples: Material 3 Button, Fluent 2 Button, Apple HIG Button.
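
The entry criteria can be read as a ladder over the C1–C7 results. The sketch below is one possible reading, with loud assumptions: each criterion is recorded as "pass", "partial", or "fail"; "most of C5" is treated as at least "partial"; L3 requires C1–C6 to pass and accepts any C7 result short of a full pass (Button sits at L3 without Code Connect); and the style, size, and slot counts in the entry criteria are ignored. `maturity_level` is a hypothetical name.

```python
from typing import Optional

ALL_CRITERIA = ("C1", "C2", "C3", "C4", "C5", "C6", "C7")


def passes(results: dict, criteria) -> bool:
    """True when every listed criterion is recorded as a full pass."""
    return all(results.get(c) == "pass" for c in criteria)


def maturity_level(results: dict) -> Optional[str]:
    """Map C1-C7 results to L1-L4, or None when even L1 is not met."""
    if passes(results, ALL_CRITERIA):
        return "L4"
    if passes(results, ALL_CRITERIA[:6]):  # C1-C6 pass, C7 not fully there
        return "L3"
    if passes(results, ALL_CRITERIA[:4]) and results.get("C5") in ("pass", "partial"):
        return "L2"
    if passes(results, ("C1", "C4")):
        return "L1"
    return None


# Hypothetical result sets, not real East Blue assessments.
barely_works = {"C1": "pass", "C4": "pass"}
states_mostly_done = {"C1": "pass", "C2": "pass", "C3": "pass",
                      "C4": "pass", "C5": "partial"}
print(maturity_level(barely_works))        # L1
print(maturity_level(states_mostly_done))  # L2
```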

Part 9

Future-Proof Scoring (6 Pillars)

A rebrand shouldn't require rebuilding components. We identified 6 architectural properties that determine whether a component survives a visual refresh. Each pillar is scored 0–3, and an average of 2.0 or higher means the component is resilient. These come from studying how Material and Carbon handled their own rebrands: Material 2 → 3 and Carbon v10 → v11.

Summary

Token abstraction	Material M2→M3: deep chains updated automatically, hardcoded values broke.
API stability	`.filled` breaks when "filled" becomes "elevated." Intent names don't.
Variant extensibility	Carbon uses orthogonal axes. Tested by asking "what happens when you add a ghost style?"
Layout decoupling	Fixed heights break when text changes. Carbon uses min-height + padding.
Composition isolation	Found GCash components sharing token layers — changing one accidentally changed another.
Migration path	Primer maintains changelogs with codemods. Without them, every team upgrades differently.

Pillar	How we arrived at this	Score scale
Token abstraction	When Material rebranded from M2 to M3, components using deep token chains updated automatically. Components with hardcoded values needed manual fixing. The depth of the token chain directly predicts rebrand effort.	0 Hardcoded 1 Component tokens 2 Semantic chain 3 Full chain + modes
API stability	`.filled` is a visual name. If a rebrand changes "filled" to "elevated," the API name is wrong. Intent-based names like `.prominence(.high)` survive any visual change. We saw this problem in real rebrands.	0 No API 1 Visual names 2 Semantic 3 Intent-based
Variant extensibility	Carbon uses orthogonal variant dimensions — adding a new style doesn't touch existing sizes or states. Poorly structured components require duplicating every existing variant to add one new option. We tested this by asking "what happens when you add a ghost style?"	0 Duplication 1 Touches all 2 Orthogonal 3 Variable modes
Layout decoupling	Fixed heights (50px, 36px) break when text size changes. Carbon uses constraint-based sizing (min-height + padding) that adapts naturally. We found East Blue's button using fixed pixel heights.	0 Fixed px 1 Size tokens 2 Size modes 3 Constraint-based
Composition isolation	A button sharing token layers with a card means changing the card accidentally changes the button. We found this cross-contamination pattern in multiple GCash components. Fully isolated components have their own variable collections.	0 Shares layers 1 Shared tokens 2 Slot-based 3 Fully isolated
Migration path	Primer maintains detailed changelogs with before/after examples and codemods. When components change without migration guidance, every consuming team figures out the upgrade path differently. We included this pillar because GCash will inevitably version its components.	0 No versioning 1 Changelog 2 + guides 3 + codemods

Threshold: 2.0+ average = survives rebrand with token changes only. 1.5–1.9 = survives with token swap + manual fixes. Below 1.5 = needs structural rebuild before any rebrand. East Blue Button currently scores 2.3 — passes.
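
The scoring is plain arithmetic: average the six pillar scores, then compare the average against the thresholds. A minimal sketch follows. The function and key names are hypothetical, and the sample pillar scores are invented to produce a 2.3 average; they are not the actual per-pillar breakdown of any East Blue component.

```python
PILLARS = (
    "token_abstraction", "api_stability", "variant_extensibility",
    "layout_decoupling", "composition_isolation", "migration_path",
)


def future_proof_score(scores: dict) -> float:
    """Average the six pillar scores, each an integer from 0 to 3."""
    missing = [p for p in PILLARS if p not in scores]
    if missing:
        raise ValueError(f"missing pillar scores: {missing}")
    for pillar in PILLARS:
        if scores[pillar] not in (0, 1, 2, 3):
            raise ValueError(f"{pillar} must be 0, 1, 2, or 3")
    return sum(scores[p] for p in PILLARS) / len(PILLARS)


def rebrand_outlook(average: float) -> str:
    """Apply the thresholds: 2.0+, 1.5 up to 2.0, and below 1.5."""
    if average >= 2.0:
        return "survives rebrand with token changes only"
    if average >= 1.5:
        return "survives with token swap plus manual fixes"
    return "needs structural rebuild before any rebrand"


# Hypothetical scores: 3 + 2 + 2 + 2 + 2 + 3 = 14, and 14 / 6 rounds to 2.3.
sample = dict(zip(PILLARS, (3, 2, 2, 2, 2, 3)))
average = future_proof_score(sample)
print(round(average, 1), rebrand_outlook(average))
```

With six integer pillars the average always lands on a multiple of 1/6, so no score can fall between the 1.5–1.9 band and the 2.0 threshold.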

Part 10

Design Metrics (12 Fields)

These are the raw, countable attributes measured across all 30 reference systems. We chose these 12 because they're the attributes that vary most between design systems and directly impact developer experience. Each one answers a specific question about the component's capability.

Summary

Variants	48 vs median 14.5. Strong.
Styles	3 vs median 4.5. Gap — missing tonal/ghost.
Sizes	4 vs median 3. Strong.
States	5 vs median 5. On par.
Slots	2 slots. Standard.
Token depth	3/3. Top tier.
A11y docs	Documented. Most Tier 1 are tested.
Edge cases	0 vs median 2. Critical gap.
Motion spec	None. Material/Fluent/Porsche have full specs.
Dark mode	Yes. On par.
Responsive	No. Lower priority for mobile-first.
Code Connect	No. C7 goal. Only 3 systems have this.

Metric	How we arrived at this	What good vs bad looks like
Variants	The total variant count shows how many combinations the component covers without custom work. We counted variants across all 30 systems to find the median.	East Blue Button: 48 (strong). Industry median: 14.5
Styles	Different contexts need different visual weights — a primary CTA vs a toolbar action vs a destructive confirmation. We tracked how many distinct styles each system offers.	East Blue: 3. Median: 4.5. Gap: missing tonal/ghost.
Sizes	Mobile apps run on screens from 4" to 7"+. Different screens and densities need different component sizes. We counted distinct size options.	East Blue: 4 (L/M/S/XS). Median: 3. Strong.
States	Missing states force developers to guess. We listed which states each system documents and found the median is 5 (default, hover, pressed, focus, disabled).	East Blue: 5. Median: 5. On par.
Slots	Slots allow customization without new variants. A button with an icon slot vs hardcoded icon placement. We tracked whether systems use slot-based composition.	East Blue: 2 (leadingIcon, trailingIcon). Standard.
Token depth	Deeper token chains = more rebrand-resilient. Measured 0 (hardcoded) to 3 (reference → semantic → component + modes).	East Blue: 3. Top tier.
A11y docs	Three levels observed: none, documented (written rules), tested (verified with assistive tech). GOV.UK and Atlassian test with screen readers.	East Blue: documented. Most Tier 1 systems: tested.
Edge cases	What happens with a super long Tagalog label? An icon-only button? RTL layout? Systems that document these prevent production bugs. GCash specifically needs long text handling due to Tagalog.	East Blue: 0 (critical gap). Median: 2.
Motion spec	Without specs, every developer picks different animation timing. Material documents exact ms/easing per state transition.	East Blue: none. Material/Fluent/Porsche: full.
Dark mode	Expected standard for mobile apps. We tracked whether the component has a dark theme variant that works via tokens.	East Blue: yes. Most systems: yes.
Responsive	Does the component adapt to different screen widths? More relevant for web but some mobile components need density awareness.	East Blue: no. Less critical for mobile-first.
Code Connect	The ultimate bridge between design and development. Only Material, Fluent, and Apple have this for their core components.	East Blue: no. This is the C7 goal.

Part 11

Recommendation Priority System

Not all gaps matter equally. We created 4 priority levels because working on the wrong gap wastes time. "Skip" exists because knowing what NOT to fix is just as valuable — it prevents you from gold-plating things that are already strengths.

Summary

Must	Trails median by 2+ or has zero where median is 2+. Causes visible harm.
Should	Trails median by 1 or 50%+ of compared systems have it.
Could	Fewer than 50% have it. Polish item, not a blocker.
Skip	Without Skip, every benchmark only shows problems. Skip proves progress.

Priority	How we arrived at this	What triggers it
Must	Some gaps directly affect users or developers. Zero edge cases for a bilingual app with Tagalog labels (40-60% longer than English) is a real problem, not a nice-to-have. Must items are the ones where the gap causes visible harm.	Trailing median by 2+ on a core metric, or having zero where median is 2+ (e.g. zero edge cases).
Should	Meaningful but not critical. The component works without this, but it's noticeably weaker than the competition. Users won't see it directly, but developers will feel it every day.	Trailing median by 1, or lacking a feature that 50%+ of compared systems have.
Could	Worth doing eventually but doesn't block anything. These are polish items that elevate the DS from "good" to "great." Only worth investing in after all Must and Should items are resolved.	Fewer than 50% of compared systems have this, or the gap is small (trailing by less than 1).
Skip	We included Skip because without it, every benchmark produces only "things to fix," which makes the DS feel perpetually broken. Skip items are proof of progress. "Your variant count is 3x the industry median" means stop adding variants and start fixing edge cases.	East Blue exceeds the median by 1.5x+ or leads all compared systems.
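
The triggers can be sketched as two small functions, one for countable metrics and one for present-or-absent features. Several details are assumptions made to close gaps in the wording: "trailing by 1" is read as trailing by at least 1 but less than 2, a metric that neither trails nor reaches the Skip threshold gets no recommendation, and the names `metric_priority` and `feature_priority` are hypothetical.

```python
from typing import Optional


def metric_priority(value: float, median: float,
                    leads_all: bool = False) -> Optional[str]:
    """Priority for a countable metric, compared against the industry median."""
    gap = median - value
    if leads_all or (median > 0 and value >= 1.5 * median):
        return "Skip"    # already a strength, stop investing here
    if gap >= 2 or (value == 0 and median >= 2):
        return "Must"
    if gap >= 1:         # assumption: covers gaps from 1 up to 2
        return "Should"
    if gap > 0:
        return "Could"
    return None          # on par or slightly ahead: no recommendation


def feature_priority(has_feature: bool, adoption_share: float) -> Optional[str]:
    """Priority for a missing feature, based on how many systems have it."""
    if has_feature:
        return None
    return "Should" if adoption_share >= 0.5 else "Could"


# Button figures from the Design Metrics table.
print(metric_priority(48, 14.5))  # variants   -> Skip
print(metric_priority(0, 2))      # edge cases -> Must
print(metric_priority(3, 4.5))    # styles     -> Should
```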

Part 12

How They Connect — The 7-Step Loop

Assessment finds what's broken. Benchmark measures how far you are from the industry best. They're connected through a 7-step process. The gate check prevents wasted effort on dead components. The benchmark peek prevents double work by showing you the full picture before you touch Figma.

Step 1. Assess the component

Run DS Health (4 traits → verdict) and Native Readiness (C1–C7 → status). This tells you what's broken and how badly.

Step 2. Gate check: should we invest?

If the verdict is Remove or Product Layer, stop here. Don't benchmark a component that shouldn't exist. Only continue for Keep, Fix, Restructure, or Consolidate.

Step 3. Peek at benchmark: gather intel

Quick benchmark run before fixing anything. See the full picture: assessment issues AND industry gaps. Don't act yet. Just know what you're dealing with so you can fix everything in one pass.

Step 4. Fix everything in one Figma session

Combine assessment issues + benchmark gaps into one fix list. Example: "C5 says add hover. Benchmark says add hover AND selected. → Add both now." Touch the component once, not twice.

Step 5. Re-assess: confirm all criteria pass

Run the assessment again. All C1–C7 should pass now. If something still fails, go back to Step 4. Don't move forward until the assessment is clean.

Step 6. Full benchmark: compare against the industry

Now that the component passes assessment, run the full benchmark: L1–L4 maturity, 6-pillar future-proof score, radar chart, and recommendations. These are the "nice to have" improvements, not the broken stuff.

Step 7. Polish and repeat

Fix the top industry gap. Re-benchmark. The next gap surfaces. Each cycle makes the component measurably stronger. Repeat until satisfied, then move to the next component.
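
The control flow of the loop comes down to three pieces: the gate check, the merged fix list, and the re-assess cycle. The sketch below models them with plain functions. All names are hypothetical, `assess` and `fix` stand in for manual Figma work, and the `max_rounds` cap is an added safety assumption rather than part of the process.

```python
CONTINUE_VERDICTS = {"Keep", "Fix", "Restructure", "Consolidate"}


def gate_check(verdict: str) -> bool:
    """Only invest in components that are staying in the core DS."""
    return verdict in CONTINUE_VERDICTS


def merge_fix_list(assessment_issues, benchmark_gaps):
    """One de-duplicated fix list, assessment issues first."""
    merged = []
    for item in [*assessment_issues, *benchmark_gaps]:
        if item not in merged:
            merged.append(item)
    return merged


def fix_until_clean(assess, fix, max_rounds: int = 5) -> int:
    """Fix and re-assess until nothing fails. Returns the fix sessions used."""
    failing = assess()
    rounds = 0
    while failing:
        if rounds == max_rounds:
            raise RuntimeError("assessment still failing after max_rounds sessions")
        fix(failing)
        rounds += 1
        failing = assess()
    return rounds


# The hover/selected example: both sources ask for hover, so it appears once.
fix_list = merge_fix_list(
    ["add hover state"],
    ["add hover state", "add selected state"],
)
print(fix_list)  # ['add hover state', 'add selected state']
```

Merging before fixing is what makes the peek worthwhile: the duplicate hover request collapses into one item, so the component is touched once.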

Assessment answers: "Is this component healthy?"
Benchmark answers: "How good is this healthy component compared to the best in the industry?"

The gate check prevents wasted effort. The peek prevents double work.