The following was written to a colleague regarding User Story breakout. We are dealing with situations in which “tech” stories are also held to be untestable and not subject to the INVEST model in general. It is a coding-for-hire situation with multi-year contracts in play. It’s messy. The discussion covered that, along with questions about how this situation resembles the big-batch risks of Waterfall. I’ve written articles about “atomic” user stories, but this situation involves “sub-atomic” work and tasks captured in User Stories.
The situation also involves SAFe being applied to all work, large or small, interdependent or stand-alone. I can apply elements of SAFe in almost any environment, but I don’t advocate using wholesale SAFe everywhere.
~~~
Overall, this situation appears to be a set of problems created in Acquisition by applying rigid, waterfall-learned acquisition stringency. Unfortunately, that means the vendor team is partly defending themselves from a situation that was handed to them in the contract. One result is that they are blocked from integrating working product early and often, so they’ve gotten very good at looking busy and churning out a backlog of things that will be integrated later.
REGARDING THE WATERFALL QUESTION
The primary risks in Waterfall are:
Rework and breakage as components are developed out of sync with one another
Wasted money and time as overly specialized roles work out of sync with one another (hence the "generalizing specialist" approach promoted by agilists).
Potential benefits of Waterfall are:
Clear scope and details of work up front. (Assuming nothing changes. Something always changes.)
Clear budgets and staffing needs up front, allowing Rayleigh curve staffing and budgets far into the future. (Assuming the price of components doesn't change. Assuming physical and digital warehousing costs are reasonable and stable. Assuming people stay available exactly when you predict they would be. Assuming people are interchangeable parts; if not, assuming no one gets sick, divorced, disabled, quits, or dies. Assuming the purchasing power of your budget doesn't change.)
This assumes the projected benefits and operating needs of the system remain static. (They never do.)
This assumes that, in succession, the following are complete, perfect, sufficient, and clear: Concept of Operation (CONOPS); high-level requirements; detailed requirements; high-level designs; detailed designs. (They almost always need refinement once development work starts.)
Assuming that development tooling needs, security risks, regulated technical constraints, and other technical parameters remain static. (The more complex the system, the less likely this is.)
Less disruption of key subject matter experts, as they are only needed at specification time, and never again until the final system demo 2 - 3 years later. (This typically turns into a massive drawback.)
In response to a question from a team member, the Drawbacks of Waterfall are:
Batch review and inspection assume the up-front requirements were fully known, stable, complete, clear, and correct. (They almost never are. Vendors count on that and make lots of money via "change orders.")
Batch review and acceptance at the end of development assumes that inspection and testing can catch any defects, and that there won't be significant defects anyway. (Batch inspection is inadequate and usually fails terribly.)
Batch review and acceptance at the end of development assumes there will be enough budget and time to correct critical defects. (There rarely is.)
Batch review and acceptance at the end of development assumes there will be no political or financial pressure to overlook critical flaws, design compromises, or operations defects or limits as a result of "sunk cost" mentality. (As in the space shuttle Challenger.)
Batch review and acceptance at the end of development assumes that scope creep/mutations will not have crept in. (This is patently untrue, as in the case of the Bradley personnel carrier. The longer the span prior to validation, the more likely things will creep in and not get weeded out. Frequent integration and demonstration is a defense against this.)
Architecture is overhead. UX is overhead. UI is overhead, to an extent. None of their work makes complete sense until the Service or functionality that will use them invokes them and they function together. The farther these are apart in time, the greater the risk of having to backtrack and correct. That is disruptive and ultimately wasteful. This situation reflects some of the great drawbacks of Waterfall, which included:
Work and knowledge growing stale before integration, resulting in wasteful effort to integrate and retrofit. Those of us working under MIL-STD-2167 compensated by clustering work close together and minimizing changes before integrating and building to ensure "shelf damage" had not occurred. We also used strict configuration management (code and data structure control) to minimize these risks.
Overspecialization meant that the creator of one component often was not available or able to support integration later. It also created delays, with people waiting on each other. It is mandated inefficiency. In aerospace, we address such situations by having the design engineers, fabricators, and other roles co-locate and collaborate daily. Some specialization may be necessary, but overspecialization is wasteful.
Lengthy time frames before integrating and testing "thin vertical slices" made work seem like a dense fog to the people with the money -- the sponsors. They would naturally get fidgety and start imposing their will, which often made things worse. Delivering in short batches is in part a defense against this situation.
The "assembly line" analogy makes people feel highly productive, stocking up on work to be assembled later. That mentality is largely where Waterfall came from. But this is not an assembly line, it is a design-intensive activity. Even parity projects can't do pure waterfall well because of the discovery/redesign that inevitably comes into play. And even in manufacturing (where I have put in some time), we know that hoarding parts/raw materials too long is risky because batches of product can be perishable, and specific knowledge about that batch of work is perishable, unless you want to have a sophisticated inventory information system.
Your average software team doesn't, and shouldn't. The group in question is having enough trouble using Rally for the basics, which illustrates how unlikely it is that sufficient records about stockpiled work would ever be kept.
There will inevitably be rework and tweaking when it's assembled. Better to minimize the disruption by assembling the pieces and parts as soon as possible.
There are some reasons to have highly skilled UI teams working in advance of the services development. On many of our teams, however, this has become a problem that clutters up the backlog further. UI is a design layout, not a ‘deliverable’ of its own.
OVERALL RESPONSE
I believe the root causes of the lengthy exchanges prompting this article are:
The team has been lax in its Rally use. That is a shared responsibility between all of them, but particularly the Scrum Master and POs. Hence, a lot of mopping up.
Team concerns about having to justify changes when "we're agile" betray either a lack of understanding or poor communication with product sponsors. Teams should feel free to identify goals (here they are obligated to call them “Release Goals”), and then change them when interruptions such as mandatory new data security requirements crop up. If they're getting hammered for making rational, necessary changes in priority, then there is a dysfunction from a different direction -- Product Owners, Task Order Managers, Contracts Representatives, Project Managers, and others. Additionally, some of the push-back by the team is against trying to fit all larger parent stories (called “Features” in their backlog) into 12-week increments. (These increments are called “Releases”, even though small stand-alone teams should be releasing much more frequently than every 12 weeks.) This team is not part of a system-of-systems effort, there are no realistic Release Trains, and they are not integrating components into a new system. If a larger story (“Feature”) needs to span an arbitrary 12-week barrier, and no coordinated functionality is being integrated on that cadence, then it doesn’t matter if a ‘Feature’ spans a ‘Program Increment.’ In that case, the team is pushing back against having SAFe applied poorly to their stand-alone project. I can see their point of view.
Wanting to put prep work such as Architectural Runway into the backlog betrays a lingering pressure (possibly from Project Managers) to treat Rally like a time accounting system.
The recent addition to the Foundational Practices of a statement that all stories must have at least one task under them with hours makes this worse.
Things such as wireframes, UX work, design work, etc. are overhead. They are necessary background work that reduces the team's capacity to work on brand new functionality, and managers hate them. Managers want functional stuff, stuff, stuff cranked out, and managers are often partially at fault for technical debt.
The conventional approach in agile is to keep such stuff out of the backlog, and only reflect it in stories when it gets "plugged into" functionality.
At many clients, there is a lot of pressure to track such things in the backlog, which results in Administrative Epics. I'm still not sure how I feel about Admin Epics, but where use of large tools such as Agile Central is mandated, people naturally will try to avoid working in two places and dump peripheral information in there. If people were allowed to use a simple tool like Trello instead of a system like Agile Central, it would be easier to keep the backlogs uncluttered. But they would need other ways to keep tabs on the overhead work versus the functional development work.
MISCELLANEOUS
One team member’s response to INVEST was, "Those who’ve been on the team for a while know this is not achievable because of the way we’ve agreed to write stories. Our stories have become more like tasks in many cases. We have a story for UX, DB, Architecture, Services, and UI, so each of those are not really independent and not really testable. We can usually test UI stories, but not the other ones. If we have to meet all these criteria, the story would get too big. So I don’t think we can follow these guidelines."
Frankly, I don't see how database work developed separately isn't testable. The criteria may be that the database changes reflect the functionality that they will support. There may be data normalization standards that should be met. There may be data field harmonization efforts that must be accounted for (in many enterprises with systems of systems, we have fields scattered everywhere that hold the same information but in different formats/lengths/architectures). There is no harmonization, and there seems to be a Data Dictionary problem. Incrementally fixing that could be an Acceptance Criterion (AC) for all data work enterprise-wide.
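To make that concrete, here is a minimal sketch in Python of what an acceptance test for a database-only story can look like. The customer table, the phone_std column, and the harmonized format are hypothetical stand-ins for illustration, not anyone's actual schema.
~~~
# A minimal sketch, assuming a hypothetical "customer" table and a made-up
# harmonized phone format; none of this is the team's actual schema.
import re
import sqlite3

# Assumed enterprise-standard format, e.g. "+1-555-123-4567".
HARMONIZED_PHONE = re.compile(r"^\+\d{1,3}-\d{3}-\d{3}-\d{4}$")

def migrate(conn):
    """The database-only change under test: add a harmonized phone column."""
    conn.execute("ALTER TABLE customer ADD COLUMN phone_std TEXT")

def test_migration_adds_harmonized_field():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, phone_raw TEXT)")
    conn.execute("INSERT INTO customer (phone_raw) VALUES ('(555) 123-4567')")

    migrate(conn)

    # AC 1: the new column exists.
    columns = [row[1] for row in conn.execute("PRAGMA table_info(customer)")]
    assert "phone_std" in columns

    # AC 2: once populated, values match the harmonized format.
    conn.execute("UPDATE customer SET phone_std = '+1-555-123-4567'")
    for (value,) in conn.execute("SELECT phone_std FROM customer"):
        assert HARMONIZED_PHONE.match(value), value
~~~
The point is not the particular checks; it is that a database-only story can state pass/fail criteria that a reviewer or a test runner can verify.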
All so-called TECH user stories are testable. There may be architectural guidelines, performance needs, integration needs, tech stack constraints, etc. All of those must be tested against some criteria, and probably are. Agile teams typically don't demand tests per se for each architecture or design decision, but such decisions should be made in a visible, collaborative way that de facto tests them. And most such decisions should be reflected in the non-functional aspects of User Stories. The disconnect between background work and the constraints on User Stories remains a Waterfall hold-over behavior. They were at least integrated in the requirements discipline under Waterfall, even if they were unfortunately segregated in development. Under agile approaches, they must be more integrated.
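As one illustration, a performance constraint from a TECH story can be written as an executable check. This is a hedged sketch; the 200 ms budget and the lookup_account() stand-in are assumptions for illustration only.
~~~
# A hedged sketch: a non-functional criterion (a latency budget) expressed
# as an executable test. The budget and function are illustrative assumptions.
import time

def lookup_account(account_id):
    """Stand-in for the service call the TECH story constrains."""
    time.sleep(0.01)  # simulate work
    return {"id": account_id, "status": "active"}

def test_lookup_meets_latency_budget():
    start = time.perf_counter()
    lookup_account(42)
    elapsed_ms = (time.perf_counter() - start) * 1000
    assert elapsed_ms < 200, f"lookup took {elapsed_ms:.1f} ms against a 200 ms budget"
~~~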
More statements from a dust-kicking development team:
"Those who’ve been on the team for a while know this is not achievable because of the way we’ve agreed to write stories." This is a negotiation ploy, legitimizing poor behavior and failure to improve with some ancient group decision. They essentially have agreed to write stories poorly. To be fair, it may be partly because of how this team has been managed by CIO, but not entirely.
"If we have to meet all these criteria, the story would get too big..." That's because they have written stories, as he admits, for dust-level tasks.
"So each of those are not really independent and not really testable." Then they should use the Predecessor/Successor function in Agile Central to track what makes up a cohesive, testable unit. Other teams are doing so very successfully. This looks like the writer was blowing smoke. If they are not independent, then they should be testable. The reason to put such things in "thin vertical slices" is to make it clearer how they will be demonstrated and tested. Otherwise, it appears the team is just good at looking busy.
INVEST still applies, even to itty-bitty TECH stories. If something is too small to be tested, then quit doing it. If it is useful to the system, then associate it with a User Story. If it is preparatory work that spreads across the entire system, then identify what regression tests should be run to make sure your data/architecture/design work didn't break a lot of existing functionality.
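One way to make that last point operational is to tag the tests that guard existing behavior, so a cross-cutting story can name exactly which suite must pass. A minimal sketch using pytest markers follows; the marker name and the test body are my own illustrative choices, not this team's suite.
~~~
# A hedged sketch using a custom pytest marker; the marker name and the
# guarded behavior are illustrative.
import pytest

@pytest.mark.regression
def test_existing_report_totals_unchanged():
    # An existing behavior that schema/architecture work must not break.
    assert sum([100, 250, 650]) == 1000
~~~
Running pytest -m regression then executes only those guard tests; registering the "regression" marker in pytest.ini keeps pytest from warning about an unknown marker.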