Tony Franzese is a software engineer on Woolpert's product engineering team. In this article he shares some goals, issues, approaches, and outcomes from a large-scale modernization effort on an existing Web application codebase.
Background
Woolpert's product engineering team recently took on the maintenance and future development of a complex Web application. The app is used by many customers and is an integral part of a broader application ecosystem.
While the app has been successful for those customers and end users, it has accumulated a significant amount of technical debt over the nearly ten years that it has been under development.
The application is also complex by necessity: it is integrated into many internal and external systems, and each of those integrations has its own flexible but fragile configurations and strategies.
The Goal
The Product Engineering team has a clear short-term goal: take this legacy Web application and make it sustainable and easy to extend with new features, while keeping it profitable and a good fit for our customers. Over the long term, improving the areas described below will increase the quality and velocity of the changes we can make to this application. The approach we decided to take is one of incremental modernization, not 'The Grand Rewrite'.
Discovery
Our initial effort went into discovery. Without appreciable documentation either next to the code or elsewhere, it took extra effort to identify the application stack, app dependencies, build steps, and how to make it run.
Along the way, we produced the documentation that we wished we'd had: README files, automation scripts, structural and logical diagrams, comments, etc.
Running the app and tracing how the code flowed was also very helpful. We identified the primary flows, followed them carefully, and diagrammed them so that they are better understood and that understanding can be shared.
Up to this point, the application was not modified: we were just investigating. Two specific areas became clear: the ecosystem had evolved since the app was first written, and there was significant technical debt to address.
An ecosystem that has evolved
A common challenge for any software is the moving ecosystem of tools, languages, frameworks, and platforms that it runs on. In the case of this application, the JavaScript landscape has changed almost beyond recognition. For example, whereas jQuery was the de facto standard for most web applications in the mid-to-late twenty-teens, native ECMAScript and CSS support has since improved dramatically across most browser platforms.
And the integration story has changed too: JSON everywhere, JSON Schema, JavaScript on the server (Node.js), and radically different security expectations.
Finally, the continuous integration (CI) and continuous deployment (CD) story has become a lot clearer as well. Everything from the code, to the graphics, to how the app is shipped is up for a refresh.
Technical Debt
Technical debt comes in many forms: some of it reflects deliberate tradeoffs, and some results from unstructured application programming and architecture. Here are a few of the key items we came across that make an application like this difficult to sustain:
- Lack of documentation. We started to address this during initial discovery.
- Large, monolithic global scripts. Lots of global state, and multi-thousand-line JavaScript files that are hard to reason about.
- Tightly coupled architecture. For example, configuration for different integrations is tightly interwoven into general-purpose application code.
- Preponderance of global declarations of functions and state. Shared state should be used very sparingly, not as a general practice.
- High cognitive barriers based on coding style. For example, non-strictness, callback hell, shared responsibility vs. single responsibility, and code duplication, to name a few.
- Loosely defined boundaries and interfaces. This comes up as we try to add tests at the 'seams' of the codebase.
Finally, there are issues around the build and delivery part of the application:
- No dependency management
- No automated build
- No automated tests
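Several of the items above share a root cause: state and behavior declared globally. A hypothetical illustration (not from the actual codebase) of the kind of incremental fix involved, moving shared global state behind a small module boundary:

```javascript
// Before (sketch of the problem): implicit globals, mutated from anywhere.
//   var currentUser = null;
//   function setUser(u) { currentUser = u; }

// After: the state is private, and callers go through a narrow interface.
function createUserSession() {
  let currentUser = null; // no longer visible to the rest of the app
  return {
    setUser(user) {
      currentUser = user;
    },
    getUserName() {
      return currentUser ? currentUser.name : 'anonymous';
    },
  };
}
```

The narrowed interface is also easier to unit test and to reason about than a multi-thousand-line global script.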
Wrap it in Tests
In his classic 2004 work, Working Effectively with Legacy Code, Michael Feathers suggests that you add tests first and then make changes. Or put another way:
The challenge with changing existing code is to preserve the existing behavior. When code is not tested, how do you know you didn’t break anything?
You need feedback. Automated feedback is the best. Thus, this is the first thing you need to do: write the tests. Only then you’ll be safe to change the code and refactor. Your goal is to get there.
The point of the book is to show you how you can get there when you have to deal with an impossibly convoluted codebase.
We put tests in place as soon as possible to help us characterize existing behavior. From the primary flows identified during discovery, we devised end-to-end and visual regression tests. These required us to add supporting interfaces to create seams that a test harness could sense from: we were starting to define the contracts within the codebase.
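One lightweight form such a supporting interface can take is a deliberately small, read-only surface that an end-to-end harness can observe, instead of reaching into private implementation details. A hypothetical sketch (the names are illustrative, not from the actual codebase):

```javascript
// Internal application state, kept private to the component.
const appState = { ordersLoaded: 0 };

function recordOrdersLoaded(count) {
  appState.ordersLoaded = count;
}

// The seam the test harness senses from: a narrow, read-only surface.
globalThis.__testHooks = {
  getOrdersLoaded: () => appState.ordersLoaded,
};
```

A browser-driving test can then sense this state with something like `page.evaluate(() => window.__testHooks.getOrdersLoaded())` without coupling to internals.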
For tests, we use TypeScript and configure these compiler options to work with our JavaScript components:
"allowJs": true,
"checkJs": false,
"esModuleInterop": true,
"allowSyntheticDefaultImports": true,
We disabled checkJs by default and enabled it on individual scripts when possible.
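Pulled together, those options might sit in a tsconfig.json along these lines (the include paths and the noEmit setting are illustrative, not from the actual project):

```json
{
  "compilerOptions": {
    "allowJs": true,
    "checkJs": false,
    "esModuleInterop": true,
    "allowSyntheticDefaultImports": true,
    "noEmit": true
  },
  "include": ["src/**/*", "tests/**/*"]
}
```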
With new components, we introduced a unit test framework, modules, and typing.
The end-to-end and visual regression tests provided high-level testing frameworks to capture existing behavior and changes to it. In our case, they were used to great effect around the parts of the code that we needed to change but that were harder to test at a more granular level.
End-to-end tests
Our end-to-end tests used Puppeteer to control a Chrome/Chromium browser, the Jasmine test framework, and Istanbul for code coverage reports. This was a simple stack to get up and running that we were familiar with. The downside has been poor integration with our IDE, so more recent all-in-one solutions (e.g., Cypress) are worth a look.
Visual regression tests
The visual regression tests were created with BackstopJS. For these to be most useful, we configured the tests with zero tolerance for differences. The tests rely on scripts that use Puppeteer and Istanbul, much like our end-to-end stack. The scripts allow us to instruct the browser to collect code coverage information and to set up the application for test scenarios. We use puppeteer-to-istanbul to convert the data collected from Puppeteer into the output format Istanbul uses to generate code coverage reports.
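A backstop.json along these lines captures the zero-tolerance setup (scenario labels and URLs are illustrative; the `<APP_URL>` placeholder is filled in at container start, as shown in the automation section):

```json
{
  "id": "app_visual_regression",
  "viewports": [{ "label": "desktop", "width": 1280, "height": 800 }],
  "scenarios": [
    {
      "label": "Primary flow - dashboard",
      "url": "<APP_URL>/dashboard",
      "misMatchThreshold": 0,
      "requireSameDimensions": true
    }
  ],
  "engine": "puppeteer",
  "report": ["CI"]
}
```

Setting `misMatchThreshold` to 0 means any pixel difference fails the scenario, which is what lets these tests act as a strict characterization of existing appearance.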
Automation and Reporting
We extended the official BackstopJS Docker image to incorporate environment-specific templatization of the test scenarios and code coverage report generation. The image adds the additional dependencies and changes the entry point to use our own script.
The Dockerfile extends the official BackstopJS image:
FROM backstopjs/backstopjs
ENV NODE_PATH=/app/node_modules
WORKDIR /app
COPY backstop.template.json .
RUN sudo chown node /app/
RUN npm install puppeteer-to-istanbul --silent && npm install nyc -g
WORKDIR /src
COPY start-backstop.sh /usr/bin
ENTRYPOINT [ "/usr/bin/start-backstop.sh", "--config=/app/backstop.json" ]
start-backstop.sh runs when the container is started:
#!/bin/sh
# Render the template
sed "s|<APP_URL>|$APP_URL|g" /app/backstop.template.json > /app/backstop.json
# Run backstop
backstop "$@"
backstop_return_code=$?
# Produce coverage report
npx nyc report --reporter=text --reporter=html --reporter=cobertura
# Preserve the backstop return code if it wasn't successful
exit $(($backstop_return_code == 0 ? $? : $backstop_return_code))
Evolving the Code
At this point we wanted to start making fixes to the code and adding new features. We used the introduction of unit tests as an opportunity to move the code closer to modules, separating each class out into its own file.
Component Separation
By separating components into their own files, we improved the organization of the code and were able to make incremental improvements in greater isolation. From here, each separated component takes two forms: a script module for tests and a global script for the application (concatenated in dependency order by a build task).
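A hypothetical sketch of this dual-form pattern: the same file registers the component globally for the concatenated application bundle and exports it as a module for tests (SomeThing is an illustrative name):

```javascript
(function (global) {
  class SomeThing {
    constructor(name) {
      this.name = name;
    }
    greet() {
      return `Hello, ${this.name}`;
    }
  }

  // Global form: used by the app after build-time concatenation.
  global.SomeThing = SomeThing;

  // Module form: used when the file is imported by unit tests.
  // (This branch is skipped when loaded as a plain <script> tag.)
  if (typeof module !== 'undefined' && module.exports) {
    module.exports = SomeThing;
  }
})(typeof globalThis !== 'undefined' ? globalThis : window);
```

The global form keeps the existing app working unchanged, while the module form lets new tests import the component in isolation.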
Compile time assurances via static analysis
As a dynamic language, JavaScript gives the unwary developer plenty of opportunities to introduce subtle runtime bugs. Coupled with an unfamiliar codebase, we wanted to improve static analysis and our ability to get help from our code editors.
ESLint was added, but it produced far too much noise to be helpful with the existing code, due to the multitude of cross-file references (even before component separation) and non-explicit declarations.
To aid suggestions and checks in the IDE (VS Code, in our case), we added JSDoc comments with types and descriptions. For example, here's a hint that the types for the some-model module or class were defined:
/// <reference path="models/some-model.d.ts" />
We imported components and their JSDoc typedefs from JavaScript (app):
/** @typedef {import('./components/some-thing')} SomeThing */
We imported components and their JSDoc typedefs from TypeScript (tests):
import SomeThing from './some-thing';
Further, we enabled type checks for specific files and started to fill in the gaps, e.g., explicit declarations, incorrect typing, and so on:
// @ts-check
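Put together, a checked file might look like this sketch (the function and its types are hypothetical):

```javascript
// @ts-check

/**
 * @param {{ id: number, name: string }} user
 * @returns {string} a display label for the user
 */
function userLabel(user) {
  return `#${user.id} ${user.name}`;
}

// With @ts-check on, the editor flags calls like these without running code:
//   userLabel({ id: 1 });              // missing 'name'
//   userLabel({ id: 1, nmae: 'Ada' }); // typo: 'nmae'
```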
Ready to accelerate
The work up to this point lays the foundation for gradually accelerating change, and for making changes with more confidence.
We:
- Created space to automate broad test strategies: end-to-end and visual regression.
- Established an incremental path for isolating and automating tests for both old and new components. These test frameworks improved our confidence in how the code behaves before and after we make changes, and they provide a starting point for capturing behavior we did not expect.
- Added type definitions that have reduced our cognitive load and enabled us to more easily identify mistakes, often the result of typos or ambiguous spaghetti objects.
What is next
We plan to continue with more incremental change on the path to a modernized web application. That includes an introduction of frameworks (application, state, and CSS) to enable more changes to the application’s architecture and appearance.