Why the Walking Skeleton Approach?
Building distributed software systems is inherently complex. Unlike monolithic applications where components communicate through in-memory calls, distributed systems introduce network communication, partial failures, eventual consistency, and coordination challenges that can catch even experienced developers off guard.
The Walking Skeleton approach serves as a practical foundation - a set of battle-tested practices that help teams avoid common pitfalls and build systems that are reliable, maintainable, and scalable from day one. Rather than learning these lessons through painful production incidents, teams can proactively adopt these patterns and processes.
The goal isn’t perfection from the start, but establishing a foundation that supports iterative improvement and reduces the likelihood of architectural debt that becomes expensive to fix later.
What is a Walking Skeleton?
A Walking Skeleton, as described by Alistair Cockburn, is a tiny implementation of the system that performs a small end-to-end function. It need not use the final architecture, but it should link together the main architectural components. The Walking Skeleton grows to become the whole system - hence the metaphor of a skeleton that “walks” through the entire application lifecycle.
The concept, also discussed in “The Pragmatic Programmer,” emphasizes starting with the smallest possible implementation that touches all major components of your system. This approach allows you to:
- Validate your architectural assumptions early
- Establish your development workflow and toolchain
- Create a foundation that the entire team can build upon
- Identify integration points and potential issues before they become expensive problems
- Demonstrate progress to stakeholders with working software
In the context of distributed systems, a Walking Skeleton becomes even more critical because it forces you to address cross-cutting concerns like service communication, deployment pipelines, monitoring, and configuration management from day one. Rather than building these capabilities as an afterthought, you integrate them into your development process from the very beginning.
For this phase, I’ll use a Java-based stack with Spring Boot and Gradle as our example, though the principles apply to any technology stack.
Target System Types
This approach focuses on distributed software systems - applications composed of multiple services that communicate over a network to deliver business value. This includes:
- Microservices architectures: where business capabilities are decomposed into independent services
- Event-driven systems: that react to and produce events across service boundaries
- Cloud-native applications: designed to leverage cloud platform capabilities
- API-first systems: where services expose well-defined interfaces
These systems share common characteristics: they’re deployed across multiple processes/machines, they communicate over unreliable networks, they need to handle partial failures gracefully, and they require coordination between autonomous components.
The practices in this guide apply whether you’re building a greenfield system or evolving an existing monolith towards a more distributed architecture.
Key Architectural Decisions
Monolith First Pattern
When building distributed systems, resist the temptation to immediately decompose into microservices. As Martin Fowler describes in his MonolithFirst approach, start with a well-structured monolith that can be decomposed later when you better understand the domain boundaries.
Why Monolith First?
- Domain understanding: You don’t know the right service boundaries until you understand the problem domain
- Reduced complexity: Avoid distributed system complexity whilst learning the business domain
- Faster iteration: Changes across service boundaries are expensive in distributed systems
- Easier refactoring: Moving code within a monolith is simpler than redefining service contracts
Implementation approach:
- Design your monolith with clear module boundaries
- Use packages or modules that could become services later
- Avoid shared databases between modules
- Define clear interfaces between modules
- Test module boundaries with architecture tests (ArchUnit); a sketch follows this list
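To make the last point concrete, here is a minimal ArchUnit sketch that treats each direct subpackage as a module and keeps the modules free of dependency cycles; the `com.example.app` package layout is an assumption carried over from the project structure shown later in this guide:

```java
import com.tngtech.archunit.junit.AnalyzeClasses;
import com.tngtech.archunit.junit.ArchTest;
import com.tngtech.archunit.lang.ArchRule;

import static com.tngtech.archunit.library.dependencies.SlicesRuleDefinition.slices;

// Guards module boundaries inside the monolith: each direct subpackage of
// com.example.app is treated as a module, and cycles between modules fail the build.
@AnalyzeClasses(packages = "com.example.app")
class ModuleBoundaryTest {

    @ArchTest
    static final ArchRule modules_should_be_free_of_cycles =
        slices().matching("com.example.app.(*)..").should().beFreeOfCycles();
}
```

Cycle-free modules are exactly what makes a later extraction into services feasible: a module with no inbound cycles can be lifted out behind a service contract without dragging the rest of the codebase with it.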
Critical Anti-Pattern: Distributed Monolith
Never build a distributed monolith - this combines the worst aspects of both architectures:
- High coupling: Services that must be deployed together
- Shared databases: Multiple services accessing the same database tables
- Synchronous communication: Services blocking on each other for every operation
- Distributed transactions: Coordinating transactions across service boundaries
Warning signs of a distributed monolith:
- Services that always deploy together
- Cascading failures when one service is down
- Database tables accessed by multiple services
- Long chains of synchronous service calls
- Shared libraries containing business logic
Git Repository with Cloud Hosting
Setup
Create a GitHub repository for our project. Initialize it with a basic Spring Boot project structure using Spring Initializr (or some template project that you might already have).
```
# Example project structure
my-distributed-app/
├── .github/
│   └── workflows/
├── src/
│   └── main/java/com/example/app/
├── build.gradle
├── gradle.properties
└── README.md
```
Key practices
- Set up a branching model (trunk-based development works wonders; others can be considered as well)
- Use conventional commit messages (e.g., "feat:", "fix:", "docs:"); examples follow this list
- Set up branch protection rules requiring PR reviews
- No direct commits or pushes to `main`; only merge commits to `main`
- Configure repository settings for security (dependency alerts, secret scanning)
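For illustration, conventional commit messages might look like the following (the changes described are invented):

```text
feat: add user registration endpoint
fix: handle missing email address during login
docs: describe local development setup
```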
CI/CD Pipeline with GitHub Actions
Setup
Create `.github/workflows/ci.yml` that runs on every push and pull request.
```yaml
# Example GitHub Actions workflow
name: CI
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-java@v4
        with:
          java-version: '21'
          distribution: 'temurin'
      - run: ./gradlew test
      - run: ./gradlew build
```
Key features
- Automated testing on every commit
- Build artifact creation
- Docker image building and pushing (see the sketch after this list)
- Deployment to staging environments
- Use of the latest long-term-support (LTS) Java version (21 at the time of writing)
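As an example of the Docker build step, the job below could be appended under `jobs:` in the workflow above. It is a sketch only: the use of GitHub Container Registry, the image tag derived from the repository name, and the presence of a Dockerfile at the repository root are all assumptions.

```yaml
# Hypothetical additional job for .github/workflows/ci.yml
  docker:
    needs: test
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    steps:
      - uses: actions/checkout@v4
      # Authenticate against GitHub Container Registry using the built-in token
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      # Build the image from the repository root and push it tagged with the commit SHA
      - uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: ghcr.io/${{ github.repository }}:${{ github.sha }}
```

Gating the Docker job on the `test` job (`needs: test`) ensures images are only published for commits that pass the test suite.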
Local Development Environment
Ensure consistent development environments across the team.
Setup
Tools and practices:
- Java SDK: Use SDKMAN! for managing Java versions: `sdk install java 21.0.1-tem`
- IDE: IntelliJ IDEA Ultimate or VS Code with Java extensions
- Docker: For running dependencies locally (databases, message queues)
- Docker Compose: Define the development stack in `docker-compose.yml`
```yaml
# Example docker-compose.yml for local development
services:
  postgres:
    image: postgres:15
    environment:
      POSTGRES_DB: myapp
      POSTGRES_USER: dev
      POSTGRES_PASSWORD: password
    ports:
      - "5432:5432"
```
Code Quality and Coverage Monitoring
Use SonarCloud for cloud-based analysis, or SonarQube for a self-hosted setup.
Setup with SonarCloud
- Connect your GitHub repository to SonarCloud
- Add the Gradle plugins to `build.gradle`:
```groovy
plugins {
    id 'org.sonarqube' version '4.4.1.3373'
    id 'jacoco'
}

jacocoTestReport {
    reports {
        // SonarCloud consumes the XML coverage report
        xml.required = true
    }
}
```
Integration
Add SonarCloud analysis to your GitHub Actions workflow.
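One way to wire this in is an extra step in the `test` job. This sketch assumes a `SONAR_TOKEN` repository secret has been configured and that the project key and organisation are set up for the SonarQube plugin; the `sonar` task is provided by the plugin added above.

```yaml
# Hypothetical step appended to the test job in .github/workflows/ci.yml
      - run: ./gradlew jacocoTestReport sonar
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}
```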
Linting and Code Style
Tools
- Checkstyle
- SpotBugs
- Google Java Format
Setup in Gradle:
```groovy
plugins {
    id 'checkstyle'
    id 'com.github.spotbugs' version '5.2.5'
    id 'com.diffplug.spotless' version '6.23.3'
}

spotless {
    java {
        googleJavaFormat()
        removeUnusedImports()
        trimTrailingWhitespace()
        endWithNewline()
    }
}
```
Integration
- Run checks in CI pipeline
- Configure IDE to use the same formatting rules
- Set up pre-commit hooks with tools like pre-commit or Husky (a plain Git hook sketch follows)
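As a minimal sketch of the hook idea, a plain Git hook works without any extra tooling; the `spotlessCheck` and `checkstyleMain` task names come from the plugins configured above.

```sh
#!/bin/sh
# Hypothetical .git/hooks/pre-commit: run format and style checks before each commit.
# A non-zero Gradle exit code aborts the commit.
./gradlew spotlessCheck checkstyleMain
```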
AI Coding Assistant Adoption
Tools
- GitHub Copilot
- Claude Code, etc.
Best practices
- Establish team guidelines for AI assistant usage
- Review AI-generated code carefully, especially for security implications
- Use AI assistants to generate boilerplate code, tests, and documentation
- Train team members on effective prompt engineering
Example usage scenarios
- Generating Spring Boot controller boilerplate
- Creating unit test templates
- Writing API documentation
- Generating Gradle build configurations
- Reviewing pull requests
Integration considerations
- Ensure AI-generated code follows your established patterns and style
- Review AI suggestions for potential security vulnerabilities
- Document any specific AI tools and configurations used by the team
Testing Strategy
A comprehensive testing strategy is crucial for the Walking Skeleton phase. It establishes the foundation for confidence in your code and enables rapid development without sacrificing quality.
Unit Tests
JUnit 5 is the modern standard for Java unit testing, providing powerful features for test organisation and execution.
Setup in Gradle:
```groovy
dependencies {
    testImplementation 'org.junit.jupiter:junit-jupiter:5.10.1'
    testImplementation 'org.mockito:mockito-core:5.8.0'
    testImplementation 'org.mockito:mockito-junit-jupiter:5.8.0'
    testImplementation 'org.assertj:assertj-core:3.24.2'
}

test {
    useJUnitPlatform()
    testLogging {
        events "passed", "skipped", "failed"
    }
}
```
Example Unit Test:
```java
@ExtendWith(MockitoExtension.class)
class UserServiceTest {

    @Mock
    private UserRepository userRepository;

    @InjectMocks
    private UserService userService;

    @Test
    @DisplayName("Should create user with valid data")
    void shouldCreateUserWithValidData() {
        // Arrange
        User user = new User("john.doe", "john@example.com");
        when(userRepository.save(any(User.class))).thenReturn(user);

        // Act
        User result = userService.createUser("john.doe", "john@example.com");

        // Assert
        assertThat(result.getUsername()).isEqualTo("john.doe");
        assertThat(result.getEmail()).isEqualTo("john@example.com");
        verify(userRepository).save(any(User.class));
    }
}
```
Integration Tests
TestContainers enables testing with real databases and external services without requiring complex setup.
Setup Dependencies:
```groovy
dependencies {
    testImplementation 'org.testcontainers:junit-jupiter:1.19.3'
    testImplementation 'org.testcontainers:postgresql:1.19.3'
    testImplementation 'org.springframework.boot:spring-boot-starter-test'
}
```
Example Integration Test:
```java
@SpringBootTest
@Testcontainers
class UserRepositoryIntegrationTest {

    @Container
    static PostgreSQLContainer<?> postgres = new PostgreSQLContainer<>("postgres:15")
        .withDatabaseName("testdb")
        .withUsername("test")
        .withPassword("test");

    @Autowired
    private UserRepository userRepository;

    @DynamicPropertySource
    static void configureProperties(DynamicPropertyRegistry registry) {
        registry.add("spring.datasource.url", postgres::getJdbcUrl);
        registry.add("spring.datasource.username", postgres::getUsername);
        registry.add("spring.datasource.password", postgres::getPassword);
    }

    @Test
    void shouldPersistAndRetrieveUser() {
        // Given
        User user = new User("jane.doe", "jane@example.com");

        // When
        User saved = userRepository.save(user);
        Optional<User> retrieved = userRepository.findById(saved.getId());

        // Then
        assertThat(retrieved).isPresent();
        assertThat(retrieved.get().getUsername()).isEqualTo("jane.doe");
    }
}
```
LocalStack for AWS Services
LocalStack provides local emulation of AWS services for testing cloud integrations.
Docker Compose Setup
```yaml
# Add to docker-compose.yml
services:
  localstack:
    image: localstack/localstack:3.0
    ports:
      - "4566:4566"
    environment:
      - SERVICES=s3,sqs,sns,dynamodb
      - DEBUG=1
    volumes:
      - "/var/run/docker.sock:/var/run/docker.sock"
```
Integration with TestContainers
```java
@SpringBootTest
@Testcontainers
class S3ServiceIntegrationTest {

    @Container
    static LocalStackContainer localstack =
        new LocalStackContainer(DockerImageName.parse("localstack/localstack:3.0"))
            .withServices(LocalStackContainer.Service.S3);

    @Test
    void shouldUploadAndDownloadFile() {
        // Configure the S3 client to use LocalStack
        S3Client s3Client = S3Client.builder()
            .endpointOverride(localstack.getEndpoint())
            .credentialsProvider(StaticCredentialsProvider.create(
                AwsBasicCredentials.create("test", "test")))
            .region(Region.US_EAST_1)
            .build();

        // Test S3 operations
        s3Client.createBucket(CreateBucketRequest.builder()
            .bucket("test-bucket")
            .build());

        // Verify bucket creation and file operations
        assertThat(s3Client.listBuckets().buckets()).hasSize(1);
    }
}
```
Testing Best Practices
Test Organisation
- Keep unit tests fast (< 1 second each)
- Use descriptive test names with `@DisplayName`
- Follow the Arrange-Act-Assert pattern
- Use TestContainers for integration tests requiring external dependencies
CI/CD Integration
- Run unit tests on every commit
- Run integration tests in pull request pipelines
- Generate test coverage reports with JaCoCo
- Fail builds on test failures or coverage drops (a coverage gate sketch follows this list)
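The coverage gate can be enforced in Gradle itself. A minimal sketch, using an illustrative 80% minimum rather than a recommended threshold:

```groovy
// Fails the build when overall coverage drops below the configured minimum
jacocoTestCoverageVerification {
    violationRules {
        rule {
            limit {
                minimum = 0.80
            }
        }
    }
}

// Run the verification as part of the standard check lifecycle
check.dependsOn jacocoTestCoverageVerification
```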
Architecture Testing with ArchUnit
ArchUnit enables testing of architectural rules and constraints as part of our automated test suite. This ensures that our codebase adheres to architectural principles and prevents architectural drift over time.
Setup Dependencies
```groovy
dependencies {
    testImplementation 'com.tngtech.archunit:archunit-junit5:1.2.1'
}
```
Example Architecture Tests
```java
@AnalyzeClasses(packages = "com.example.app")
class ArchitectureTest {

    @ArchTest
    static final ArchRule services_should_only_be_accessed_by_controllers =
        classes().that().resideInAPackage("..service..")
            .should().onlyBeAccessed().byAnyPackage("..controller..", "..service..");

    @ArchTest
    static final ArchRule repositories_should_only_be_accessed_by_services =
        classes().that().resideInAPackage("..repository..")
            .should().onlyBeAccessed().byAnyPackage("..service..");

    @ArchTest
    static final ArchRule controllers_should_not_access_repositories_directly =
        noClasses().that().resideInAPackage("..controller..")
            .should().accessClassesThat().resideInAPackage("..repository..");

    @ArchTest
    static final ArchRule entities_should_not_depend_on_services =
        noClasses().that().resideInAPackage("..entity..")
            .should().dependOnClassesThat().resideInAPackage("..service..");

    @ArchTest
    static final ArchRule services_should_be_named_correctly =
        classes().that().resideInAPackage("..service..")
            .and().areNotInterfaces()
            .should().haveSimpleNameEndingWith("Service");

    @ArchTest
    static final ArchRule controllers_should_be_annotated_with_rest_controller =
        classes().that().resideInAPackage("..controller..")
            .should().beAnnotatedWith(RestController.class);
}
```
Custom Architecture Rules
```java
@AnalyzeClasses(packages = "com.example.app")
class CustomArchitectureTest {

    @ArchTest
    static final ArchRule no_spring_framework_in_domain =
        noClasses().that().resideInAPackage("..domain..")
            .should().dependOnClassesThat().resideInAnyPackage("org.springframework..");

    @ArchTest
    static final ArchRule use_slf4j_for_logging =
        noClasses().should().accessClassesThat()
            .resideInAnyPackage("java.util.logging..", "org.apache.log4j..")
            .because("Use SLF4J for logging instead");

    @ArchTest
    static final ArchRule configuration_classes_should_be_in_config_package =
        classes().that().areAnnotatedWith(Configuration.class)
            .should().resideInAPackage("..config..");
}
```
Integration with CI/CD
Architecture tests run as part of our regular test suite and will fail the build if architectural rules are violated, ensuring consistent adherence to our design principles.
This testing strategy ensures that our Walking Skeleton is robust from the beginning and provides confidence for rapid iteration.
Development Practices
Establishing solid development practices from the beginning ensures team alignment and code quality throughout the project lifecycle.
Trunk-Based Development
Adopt trunk-based development as your branching strategy:
- Main branch: All developers commit to a single main branch frequently
- Short-lived branches: Feature branches should live for less than a day
- Feature flags: Use feature toggles for incomplete features rather than long branches (see the sketch after this list)
- Continuous integration: Every commit triggers automated testing
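To make the feature-flag idea concrete, below is a minimal sketch of a configuration-backed toggle; the service, property name, and discount logic are invented for illustration:

```java
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Service;

// Hypothetical service guarded by a feature toggle: incomplete work ships to
// main behind the flag and is switched on per environment via configuration.
@Service
public class PricingService {

    private final boolean newPricingEnabled;

    public PricingService(
            @Value("${features.new-pricing.enabled:false}") boolean newPricingEnabled) {
        this.newPricingEnabled = newPricingEnabled;
    }

    public double priceFor(int quantity, double unitPrice) {
        return newPricingEnabled
                ? priceWithVolumeDiscount(quantity, unitPrice)
                : quantity * unitPrice;
    }

    private double priceWithVolumeDiscount(int quantity, double unitPrice) {
        double discount = quantity >= 10 ? 0.9 : 1.0; // 10% off for bulk orders
        return quantity * unitPrice * discount;
    }
}
```

Because the flag defaults to off, the new code path can merge to main long before it is complete, which is what keeps feature branches short-lived.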
Benefits
- Reduces merge conflicts and integration problems
- Enables rapid deployment and rollback
- Forces good testing practices
- Improves team collaboration
Environment Strategy
Establish three core environments with clear purposes:
- Development:
- Individual developer environments (local + shared dev)
- Latest code changes
- Mock external dependencies
- Staging:
- Production-like environment
- Integration testing with real dependencies
- Pre-production validation
- Production:
- Live system serving users
- Monitoring and alerting fully operational
- Automated deployment and rollback capabilities
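In a Spring Boot stack, one common way to keep these environments distinct is per-environment profile files; the file below is a hypothetical staging profile with placeholder values, selected at runtime via `SPRING_PROFILES_ACTIVE=staging`:

```yaml
# Hypothetical application-staging.yml: staging-specific overrides
spring:
  datasource:
    url: jdbc:postgresql://staging-db.internal:5432/myapp
logging:
  level:
    root: INFO
```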
Performance Testing Environment
Create a dedicated environment that mimics production:
- Infrastructure: Same instance types and network configuration
- Data volume: Production-like data sets for realistic testing
- Load patterns: Representative user traffic simulation
- Monitoring: Same observability stack as production
Code Review Practices
Implement comprehensive code review processes:
Human Reviews:
- All code must be reviewed before merging
- Focus on logic, maintainability, and architectural alignment
- Review both production and test code with equal rigour
AI-Assisted Reviews:
- Use tools like GitHub Copilot for initial code suggestions
- Leverage AI for identifying potential bugs and security issues
- Maintain human oversight for architectural decisions
Test Code Quality
Treat test code as production code:
- Apply the same quality standards to test code
- Refactor test code to eliminate duplication
- Use descriptive test names and clear assertions
- Maintain test code with the same rigour as application code
Testing principles:
- Tests should be independent and repeatable
- Use clear arrange-act-assert structure
- Mock external dependencies appropriately
- Maintain good test coverage without obsessing over percentages
Commit and Collaboration Practices
Establish clear guidelines for code commits and team collaboration:
Commit Templates
- Ensure work item numbers are referenced in every commit (Jira task, GitHub issue etc.)
- Use consistent commit message formats for better traceability
- Include context about why changes were made, not just what changed (see the example below)
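An illustrative commit message that follows these guidelines (the feature and issue number are invented):

```text
feat: add order cancellation endpoint

Customers need to cancel orders before dispatch as part of the
refund workflow; this adds the endpoint and its validation.

Refs: #142
```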
Pair Programming and AI Assistance
- Use `Co-authored-by` trailers when pair programming or using an AI coding assistant (example trailers below)
- Give proper attribution to all contributors in commit messages
- Document AI tool usage for transparency and learning
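For reference, `Co-authored-by` trailers belong on their own lines at the end of the commit message; the names and addresses below are placeholders:

```text
Co-authored-by: Jane Doe <jane.doe@example.com>
Co-authored-by: AI Assistant <ai-assistant@example.com>
```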
Semantic Versioning
- Use semantic versioning as described in SemVer
- Follow MAJOR.MINOR.PATCH versioning convention
- Increment versions based on the nature of changes (breaking, feature, fix)
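In this stack the version usually lives in `build.gradle`; a minimal sketch with an illustrative version number:

```groovy
// MAJOR.MINOR.PATCH: bump MAJOR for breaking changes, MINOR for new
// backwards-compatible features, PATCH for fixes (value illustrative)
version = '1.4.2'
```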
This foundation ensures that our Walking Skeleton not only works but can evolve sustainably as the system grows.