WoW | Apr 21, 2015 8:00 pm CT

Scaling: On feedback processes and language

As we jump headfirst into the Patch 6.2 PTR and the plethora of class and balance changes that accompany it, I notice that a familiar type of thread has begun to rear its head across various discussion forums once again. The exact setting might change (I guarantee you’ll find one for every class in the game), but every one of these threads essentially says the same thing: “I don’t scaaaaaale good!”

This phrase (and variations thereof) is indicative of a mindset that, while genuine in its desire to provide feedback, does so in an extremely unhelpful fashion. Simply invoking “scaling” as a class or spec’s primary issue provides very little meaningful feedback, and is usually a disservice to yourself and your community. Let’s analyze why.

What is scaling?

The obvious place to begin is by attempting to define what people mean when they refer to “scaling”. The one point that I consistently note is that posters seem to think that their audience (players and developers alike) is able to read their minds or instinctively understand what they mean when they use this term.

For the purposes of the article, I’m going to define scaling as the relative gain in performance by a spec based on increase in stats (both primary and secondary), either as a function of item level increase or quantity shifting.

This definition hopefully conveys two important aspects of the term: First, that it is the rate of gain experienced by a spec, and second, that it is usually driven by gear changes: either an increase in item level (i.e. direct upgrades) or swapping in pieces of gear that are better itemized for the spec according to its stat weights (i.e. sidegrades). It’s also worth noting that in the latter situation, some pieces of gear can actually be a performance gain despite being at a lower item level than the piece they are replacing.
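
To make the definition a little more concrete, here is a minimal Python sketch of "relative gain in performance". The function name and every DPS figure in it are hypothetical and chosen only for illustration; none of them come from logs or sims.

```python
def relative_gain(dps_before: float, dps_after: float) -> float:
    """Return the fractional change in DPS caused by a gear change."""
    if dps_before <= 0:
        raise ValueError("dps_before must be positive")
    return (dps_after - dps_before) / dps_before


# Hypothetical numbers, purely for illustration.
baseline_dps = 30000.0
upgrade_gain = relative_gain(baseline_dps, 31500.0)    # higher item level piece
sidegrade_gain = relative_gain(baseline_dps, 30600.0)  # lower item level, better stats

print(f"Upgrade:   {upgrade_gain:.1%}")
print(f"Sidegrade: {sidegrade_gain:.1%}")
```

Both cases count as "scaling" under this definition, since each expresses a gain relative to where the spec started rather than an absolute DPS number.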

Scaling as a generalized complaint

How is it that a term such as scaling, which refers to legitimate in-game mechanics, has become lumped into the same category of taboo words such as “clunky” or “broken” in the context of feedback? Quite simply, through sheer misuse. Let’s conduct a thought experiment using two different situations here.

In Situation A, let’s say I make the claim that the reason Frost Death Knights are currently performing so poorly is that they “…don’t scale well.” In doing so, I’ve provided literally nothing for developers to work with. What is “good” scaling? What am I comparing myself to, and on what basis? Am I going off of Top 10 parses in Warcraft Logs as evidence? The 90th percentile? Moreover, am I accounting for every combat situation a spec can be put into, including differing strategies, flexible raid sizes and player skill? Do I therefore hinge my definition of “scaling” on very precise scenarios that I expect developers will innately understand? Perhaps most importantly, when it comes to underrepresented specs: Am I accounting for sampling bias, whereby small sample sizes can unreasonably skew a spec’s apparent performance?
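
The small-sample concern can be sketched with a little arithmetic. The Python snippet below uses the standard error of a mean to show how uncertain an average becomes when only a handful of parses exist. It assumes parses are independent and share the same spread, which real logs won't necessarily satisfy, and the spread value is a made-up placeholder.

```python
import math


def standard_error(spread: float, sample_size: int) -> float:
    """Standard error of a sample mean, assuming independent samples."""
    if sample_size < 1:
        raise ValueError("sample_size must be at least 1")
    return spread / math.sqrt(sample_size)


# Hypothetical spread (standard deviation) in DPS between parses.
PARSE_SPREAD = 3000.0

for parses in (10, 100, 1000):
    error = standard_error(PARSE_SPREAD, parses)
    print(f"{parses:>5} parses: average uncertain by roughly +/- {error:.0f} DPS")
```

Under these assumptions, a spec with ten logged parses carries about ten times the uncertainty of one with a thousand, so an apparent gap between the two may be nothing more than noise.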

All of this might seem obvious on the player side (“Well of course, only my metrics suffice!”), but consider for a moment that the data and variables we have access to are barely a fraction of what developers receive every day. An even more heinous practice I’ve seen is using simulation rankings to justify “poor scaling” by comparing one’s spec to the top performers in sims. I cannot overstate what a terrible idea this is: It assumes that every class/spec Action Priority List module in Simcraft is accurate and optimized (hint: They’re not), that the modules simulate realistic raid conditions beyond very simplistic, standstill fights (hint: They overwhelmingly don’t) and that variations in the data are clear indications of “bad scaling” (hint: They aren’t). While Blizzard has stated that they attempt to balance various DPS specs to deal similar damage on Patchwerk-type fights, players tend to take this declaration as a gospel promise that anything short of near-identical DPS is evidence of imbalance.

Stat scaling within a spec

In Situation B, let’s say I make the claim that Frost Death Knights “…scale poorly with Critical Strike”, and that this is one of the reasons for their poor performance. Because I’ve localized the issue to one spec, I feel more confident when examining various scale factors for stats within that spec (since it’s a module that I know is being updated and worked upon), while acknowledging, of course, that these weights were generated under Patchwerk-type conditions. It’s also worth mentioning that even within this framework, there are still outstanding issues with how the sim models Killing Machine procs or correct Blood Tap usage. On the whole, however, let’s take in good faith that these issues wouldn’t greatly skew our results.

At Mythic Blackrock Foundry gear levels, some stat breakdowns we see for Two-Handed Frost DKs include:

Stat	Average Weight per point
Haste	3.9
Multistrike	3.49
Versatility	3.26
Mastery	2.78
Critical Strike	2.76

This table obviously shows that Critical Strike is one of Frost’s weakest stats, and clearly doesn’t contribute nearly the same amount of DPS as Haste or Multistrike do. But can I definitively use this as a definition for one stat “scaling poorly”?

When I say “poorly,” am I comparing this level of stat valuation to something such as the value of Crit for Fury Warriors or Fire Mages? That alone will prove to be an exercise in frustration for reasons that should be obvious. Moreover, what about stats that provide even lower DPS for other specs, a good example being Mastery rating for Windwalker Monks? If there are clearly stats that provide less of a gain for their specs than Critical Strike does for Frost, is it truly justified to call it “poor scaling” when our entire premise centers on such a narrow field of vision?
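
Within a single spec, the one comparison the table does support is each stat against the others. As a rough Python sketch, the weights from the table above can be normalized against the strongest stat. The `normalize` helper is a hypothetical name introduced here, and the result says nothing about how Frost compares to any other spec.

```python
# Average weight per point, copied from the table above.
STAT_WEIGHTS = {
    "Haste": 3.9,
    "Multistrike": 3.49,
    "Versatility": 3.26,
    "Mastery": 2.78,
    "Critical Strike": 2.76,
}


def normalize(weights: dict[str, float]) -> dict[str, float]:
    """Express each weight as a fraction of the largest weight."""
    top = max(weights.values())
    return {stat: round(value / top, 2) for stat, value in weights.items()}


relative = normalize(STAT_WEIGHTS)
for stat, ratio in relative.items():
    print(f"{stat:<16} {ratio:.2f}")
```

By this measure a point of Critical Strike is worth roughly 71% of a point of Haste. Whether that counts as "poor" is exactly the judgment the numbers alone can't make.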

Framing the issue correctly

Despite starting from a flawed premise, there are still nuggets of truth to my assertion about Critical Strike. For instance, it’s correct to note that Critical Strike for Frost doesn’t scale as well as Haste or Multistrike in a majority of simmable combat situations. It’s also correct to state that a large reason for this devaluation is the Killing Machine mechanic, which supplants Crit rather than interacting with it, though developers have already confirmed that this is intended design. The issue comes in when I assert that this is “poor scaling,” because I’m assuming that developers: a) interpret the data in the same way as I do, and use the same metrics, and b) would still agree with my assessment, given that Crit for Frost DKs probably isn’t the biggest outlier as far as low stat valuations across specs go.

What I should have said from the get-go was something that I had alluded to and hoped would come across as obvious: My spec feels undertuned. It demonstrably performs on a fairly low end of the DPS spectrum across a wide variety of fights this tier, and I believe that it deserves to be buffed. Back during Mists of Pandaria, Ghostcrawler theorized that the reason players often tried to frame their complaints about numbers by using “scaling” as a buzzword was because the vagueness associated with the term made their argument seem more clever. In reality, as he noted, such statements tend to confuse the actual issue at hand.

I do want to be clear: I am not saying that spec or stat scaling can’t have problematic aspects. Up until Patch 5.2, for example, the Unholy DK Gargoyle (a.k.a. Gary!) didn’t scale with our Mastery at all. This, coupled with its hefty Runic Power cost at the time, meant that the spec would eventually have reached a point where summoning the minion was a damage loss. Moreover, buffing its damage via its Attack Power modifier to compensate would have simply put the issue off for another tier at best. The idea of a cooldown being a DPS loss, or even DPS neutral, ran against the core fundamentals of what defines the function of cooldowns for DPS specs. Thus, it made sense at the time to complain about Gary’s lack of Mastery scaling as a fundamental issue that deserved to be examined and corrected.

As a whole though, community use of such buzzwords tends to either lead to feedback being ignored or incorrectly characterized. In the past, Celestalon has noted that — barring something obvious such as the Gargoyle — Blizzard usually takes complaints about the phenomenon seriously when there’s been a considerable amount of math and effort on the player end. Does this mean that in order to provide effective feedback players need to approach the issue with a Theck-level PhD analysis? Of course not! What it does mean is that when players make claims about concepts that usually require mathematical proof, they should be prepared to provide that proof.

To quote Ghostcrawler, “TLDR: many players worry far more about scaling than they need to. ;)”
