1 - Unveiling Architectures: Static Benchmarking in Practice (Part 1 - Hexagonal Architecture)

First step of our systematic technology evaluation for EventStream AI Monitor. We dive into static benchmarking of the Hexagonal Architecture, examining code quality, structure, and revealing real issues found by type checking before runtime.

EVENTSTREAM BUILD LOG

Fernando Magalhães

3/17/2026 · 4 min read


Introduction

How do you compare software architectures objectively? It's a challenge many teams face. In this series, we're taking a hands-on approach with the EventStream AI Monitor, a platform designed for intelligent event processing in distributed systems. Rather than relying on abstract concepts or gut feelings, we're grounding our decisions in hard data. Our methodology? A phased, systematic evaluation where we fix most variables and isolate the technology under test (architecture, database, or framework) to truly understand its characteristics.

Today, we're sharing the results from the very first phase: the static benchmarking of our Hexagonal Architecture.

EventStream AI Monitor is an intelligent layer for monitoring distributed systems. It receives events, applies AI for classification and summarization, and triggers automated actions based on rules. Given the critical nature of such a system, choosing the right foundational technologies is paramount.

Our approach is systematic and structured in multiple phases:

  • Phase 1: Compare architectural styles (Hexagonal, Clean, Onion) using fixed technologies: FastAPI and PostgreSQL.

  • Phase 2: Compare databases (PostgreSQL, MongoDB, SQLite) using the selected architecture, with FastAPI and Kafka fixed.

  • Phase 3: Compare messaging systems (Kafka, RabbitMQ, Local Queue) using the selected architecture and database, with the framework fixed.

  • Phase 4: Compare frameworks (FastAPI, Django, Flask) using the selected architecture, database, and messaging system.

  • Phase 5: Compare AI integration methods (Hugging Face API, Local Models) using the fully defined stack.

By fixing variables like the framework and database in Phase 1, we ensure that any observed differences are caused by the architectural structures themselves, rather than side effects from other components. This isolation is essential for making fair and reliable comparisons.

Phase 1 - Architecture Comparison: Hexagonal Architecture

We began by implementing the core functionality (receiving an event, validating it, processing it, and storing it) using the Hexagonal Architecture. This involved creating domain entities, application use cases, ports (interfaces), and adapters for input (API) and output (database). Comprehensive unit tests were written for each layer to ensure correctness.
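In code, the port-and-adapter split looks roughly like this. The sketch below is illustrative (the actual field names, validation rules, and method signatures in the repository may differ), but it shows the core idea: the use case depends only on an abstract port, and any concrete adapter is injected from outside.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Domain entity: a monitored event, independent of any framework."""
    event_id: str
    source: str
    payload: str


class EventRepositoryPort(ABC):
    """Output port: the use case depends on this interface, not on PostgreSQL."""

    @abstractmethod
    def save(self, event: Event) -> None: ...


class ProcessEventUseCase:
    """Application layer: orchestrates validation, processing, and storage."""

    def __init__(self, repository: EventRepositoryPort) -> None:
        self.repository = repository

    def execute(self, event: Event) -> Event:
        # Domain-level validation happens before anything touches infrastructure.
        if not event.event_id:
            raise ValueError("event_id must not be empty")
        self.repository.save(event)
        return event
```

Because the use case only sees `EventRepositoryPort`, unit tests can substitute an in-memory fake, which is exactly what makes the domain and application layers easy to cover.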

Static Benchmarking: What Is It and Why Does It Matter?

Static benchmarking analyzes the source code itself, without running the application. It's distinct from dynamic benchmarking, which measures performance metrics like response time or throughput under load. Static analysis is invaluable early in the development cycle because it provides insights into:

  • Code Quality: Is the code well-structured and readable?

  • Maintainability: How easy will it be to modify or extend the codebase?

  • Testability: Does the architecture facilitate testing?

  • Potential Issues: Can we spot bugs or inconsistencies before they cause runtime failures?

We used several tools to perform this static analysis on our Hexagonal Architecture implementation:

  • pytest-cov (Code Coverage): Measures the percentage of code executed by tests.

  • radon (Complexity & Maintainability): Calculates cyclomatic complexity (number of independent paths) and maintainability index.

  • ruff (Linting): Checks for stylistic issues and potential bugs based on coding standards (like PEP 8).

  • mypy (Type Checking): Validates type annotations and identifies type-related errors.

  • pipdeptree (Dependency Tree): Shows the hierarchy of external Python package dependencies.

  • pydeps (Dependency Graph): Visualizes the internal dependency structure of our modules.
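For reference, a typical invocation of this suite looks like the following. The `src` path and the output file name are assumptions for illustration, not the project's actual layout:

```shell
# Coverage report for the unit-test suite
pytest --cov=src --cov-report=term-missing

# Cyclomatic complexity (-a: show average) and maintainability index
radon cc src -a
radon mi src

# Linting against style and bug-pattern rules
ruff check src

# Static type checking
mypy src

# External dependency hierarchy and internal module graph
pipdeptree
pydeps src --noshow -o dependency_graph.svg
```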

Results from the Hexagonal Architecture (Version 1 - Unit Tests)

Here's a summary of the initial findings from our static benchmarking suite:

  • Code Coverage: Overall coverage was 67%. While the core domain logic (event.py, process_event_use_case.py) showed good coverage (87% and 100% respectively), the main application entry point (main.py, 0%) and the database repository (postgresql_event_repository.py, 46%) had room for improvement. This is expected initially, as main.py often contains setup code not covered by unit tests, and repository coverage can be trickier without integration tests.

  • Complexity: The average cyclomatic complexity across all analyzed blocks (functions, methods, classes) was very low: 1.52. This indicates a well-structured codebase with relatively simple functions and methods. The most complex function was Event.__post_init__ with a complexity of 5, but the rest were 3 or lower.

  • Maintainability: Generally high maintainability scores were observed. Files like main.py, event.py, and postgresql_event_repository.py scored lower (still solid, in the 62-80 range) than the others, which scored near 100, reflecting areas with slightly more complexity or lower test coverage.

  • Linting: The ruff tool reported all checks passed, confirming that the code adheres well to style guidelines.

  • Type Checking (mypy): This was perhaps the most revealing part. mypy identified 10 specific type-related errors in 3 files:

    • event_model.py: Errors related to how SQLAlchemy's Base class is interpreted by mypy.

    • postgresql_event_repository.py: Multiple errors showing a mismatch between SQLAlchemy Column types (e.g., Column[str]) being passed where the domain Event entity expects primitive types (e.g., str). This highlights a common issue in mapping between ORM representations and domain entities.

    • main.py: An error regarding an incompatibility between async_sessionmaker and sessionmaker types used for dependency injection into the repository.

    • These errors are critical because they represent potential runtime issues that static analysis caught before the code was even executed.

  • Dependencies: The pipdeptree output showed the expected external library dependencies (FastAPI, SQLAlchemy, Pydantic). The pydeps graph visually confirmed the layered structure of the Hexagonal Architecture, with clear dependencies flowing from adapters towards core and domain.

Insights and Lessons Learned (from Version 1)

  • The unit tests provide a solid foundation, though coverage in infrastructure layers needs attention.

  • The code exhibits low complexity and high maintainability overall, which is excellent for long-term health.

  • The mypy errors are a significant finding. They demonstrate that static type checking is a powerful tool for catching real bugs early in the development process, especially concerning data mapping and dependency injection.

  • The dependency graph confirms that the architectural structure is being correctly implemented.

Next Steps

  1. Fix mypy Errors: Address the type-related issues identified, particularly the ORM-to-domain mapping problems.

  2. Implement Integration Tests: Expand our test suite to cover interactions between layers (e.g., API to Use Case to Repository).

  3. Execute Dynamic Benchmarking: Once the core logic is stable, we'll move to measuring performance metrics like response time and throughput under simulated load.

  4. Repeat the Process: Apply the same static and dynamic benchmarking process to the Clean and Onion Architectures.

  5. Compare and Decide: Analyze the results from all three architectures to make an informed decision for Phase 2.

Explore the Code & Results

If you'd like to see the detailed benchmark results, graphs, and diagrams referenced in this article, you can explore the project repository: EventStream AI Monitor repository

Conclusion

This initial foray into static benchmarking for the Hexagonal Architecture has been highly informative. It provided concrete metrics on code quality and structure and, importantly, uncovered real type-related issues before any runtime execution. This process exemplifies our commitment to making technology choices based on evidence, not just intuition. Stay tuned as we continue this journey, implementing and evaluating the other architectural candidates.