Musings of an Eon...

Understanding True Microservices

Published 2021 Oct 18 @ 17:08

When people say “microservices” I feel like they imagine a fleet of VMs or container pods running a bunch of small, “compartmentalized” applications that communicate with each other in vast and complex ways. But honestly, that’s not what microservices are supposed to be about. Companies like Netflix certainly popularized the idea of creating the smallest possible version of an application so that you have hundreds of unique services all talking to each other. But that’s not the core concept behind microservices. A microservices architecture, or what I’ll call MSA for brevity, can be present even in a monolithic application. Its antithesis, which I’ll reference as well, is the “single service architecture”, or SSA for brevity. In the context of this post, MSA good and SSA bad.

MSA can be summarized into the following high-level concepts:

  • Separation of responsibility
  • Encapsulation into APIs
  • Ownership of data

I’ll talk about each one below, but first let’s set up a theoretical application so that we can use concrete examples as we discuss these topics. Imagine a relatively simple chat application. You have users, which can have friends, and can belong to groups. There are two types of messages: direct messages, which are chats between exactly two users, and group messages, which are chats sent to a group rather than any specific user. And that’s it, at a high level. Now let’s talk about MSA concepts.

Separation of responsibility

This is the core concept of MSA. You might have heard of this phrase, especially in conjunction with Unix philosophy. It’s also tightly coupled with the concept of do one thing and do it well. In essence, you want to separate your code so that each “service” handles the responsibility of one thing and focuses solely on that.

In a “traditional” SSA, you typically have one code repo representing one application with one database and one database user that has “full access” to all the data. If you were building this in, say, Ruby on Rails then you might create a Rails app for the entire chat application. You’d have one database, with one set of database user credentials, and all of your routes, models, helpers, etc. would be contained in that one application. Although you might have a separation of layers, such as controllers and models, you don’t have separation of logical responsibilities.

One of the ways this lack of separation can be harmful is when you start repeating common scenarios in multiple locations within your code. You might, for example, have two places where you create and publish messages because you’ve created separate functions depending on whether it’s a direct message or a group message. There isn’t anything inherently wrong with this, but it does make future maintenance of the application significantly harder: you now have two places in your code that need to stay in parity whenever you make changes, otherwise one flow might work while the other breaks.

MSA is about identifying unique areas of responsibility and separating them within your code. In our example, you’d be best served by separating code into responsibilities for users, friends, groups, and messages. Then you’d have one microservice for each of those. All of the source code remains inside your monolith, but you would at least have implicit separation of their functionality. For example, neither the user nor group source code should create and publish messages: they should both call into code that exists within the source code for managing messages. This separation helps reduce code duplication, improves testability, and will make it easier to grow and evolve the application.

One way to separate this responsibility is by using modules or namespaces. Some languages and frameworks may not immediately make it easy to create one folder for the user service, one for the friends service, and so on. Rails, for example, prefers separation by application layer (using MVC) and so in these cases you can instead opt for creating namespaces that help organize which service a particular piece of code belongs to. E.g., your user service code might be under the MyApp::Users::* namespace. So even if you have files spread across multiple directories shared by all of your services, it would probably end up looking something like the following:

- myapp
  - controllers
    - friends
    - groups
    ...
  - models
    - friends
    - groups
    ...
  ...

Separating code in this way helps show where the boundaries in your code are and makes creating and maintaining these “microservices” easier.
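The namespace idea can be sketched in plain Ruby. This is a minimal, hypothetical illustration (the `MyApp::Messages` and `MyApp::Groups` modules and their methods are made up for this post, not a real API):

```ruby
# Minimal sketch of namespace-based separation within a monolith.
module MyApp
  module Messages
    # Only the message service knows how to build and publish a message.
    def self.publish(sender_id:, body:, group_id: nil, recipient_id: nil)
      { sender_id: sender_id, body: body,
        group_id: group_id, recipient_id: recipient_id }
    end
  end

  module Groups
    # The group service delegates to the message service instead of
    # duplicating the publishing logic.
    def self.broadcast(group_id:, sender_id:, body:)
      Messages.publish(sender_id: sender_id, body: body, group_id: group_id)
    end
  end
end
```

Note that `Groups.broadcast` never constructs a message itself; it calls into the message service's namespace, which is exactly the boundary described above.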

Encapsulation into APIs

If you’ve taken any kind of programming class, you’ve probably heard the word encapsulation. It’s especially prevalent in OOP (Object-Oriented Programming) languages, where the concept is generally conveyed as “group attributes and functions into a single class to encapsulate its behavior”. So a dog might have a breed, age, the ability to bark, and so on. All of these things are “unique” to a dog and so you encapsulate all of that into a Dog class (ignore class inheritance for this example). The same idea is applied to MSA.

But once you encapsulate all those traits and functions you need a way for other parts of your application to interact with that encapsulated service. You do that through an API. Basically everything you write in your application falls under the broad generalization of an API, an acronym for application programming interface.

In our case though, we’ll take this general concept into a more specific definition: an interface that adheres to a contract. In other words, you will provide a set of functions that accept certain inputs, perform specific actions, and generate certain outputs. As long as the inputs remain the same, the output should theoretically remain unchanged. There might be a few edge cases, such as when you are creating something using the same inputs but only the first attempt succeeds and any subsequent attempts fail. But overall, an API in the MSA context is a contract.

So you encapsulate all functionality for users, friends, groups, and messages into separate folders. Each of these folders represents a separate service: the user service, the friends service, the group service, and the message service. For each service you would create a file that contains your API for the other services/parts of your application to use. The user service, for example, might contain functions such as Users.verify that takes a JWT and returns a user identity on successful verification. And as long as the input remains constant, the outputs should reasonably remain constant.
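Here’s a hedged sketch of what such a contract might look like. The token format (`"user_id:name"`) is deliberately fake so the example stays self-contained; a real `Users.verify` would validate a signed JWT:

```ruby
# Sketch of a service API as a contract: given a token string, return an
# Identity on success or nil on failure -- same input, same output.
module Users
  Identity = Struct.new(:user_id, :name)

  def self.verify(token)
    # Hypothetical token format "user_id:name"; stands in for JWT checks.
    user_id, name = token.to_s.split(":", 2)
    return nil if user_id.nil? || name.nil? || name.empty?
    Identity.new(Integer(user_id), name)
  rescue ArgumentError
    nil  # non-numeric user_id: verification fails
  end
end
```

Callers only need to know the contract: a valid token yields an `Identity`, anything else yields `nil`. How verification happens internally is the user service’s business.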

This further increases testability. You could write all of your tests in such a way that a single flag can determine whether you stub those API modules to create an independent test suite of unit tests or whether you actually run through the other services and perform integration tests. Because the API modules adhere to contracts, you can reasonably assume that any changes to a single service will have no impact on the other services as long as the API remains the same. So instead of always running integration tests whenever changes are committed to the source code, you could intelligently make a determination on whether you need that extra level of testing or whether unit tests are sufficient.
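A rough sketch of that flag-based switch, with hypothetical `RealUsers`/`StubUsers` modules standing in for a real implementation and its stub:

```ruby
# One flag decides whether the rest of the suite sees stubs (unit tests)
# or the real service implementations (integration tests).
module RealUsers
  def self.verify(token)
    token == "valid-token" ? { user_id: 1 } : nil  # pretend JWT check
  end
end

module StubUsers
  def self.verify(_token)
    { user_id: 1 }  # unit tests assume the contract holds
  end
end

UNIT_TESTS = ENV.fetch("UNIT_TESTS", "1") == "1"
UsersAPI = UNIT_TESTS ? StubUsers : RealUsers
```

Because both modules honor the same contract, test code written against `UsersAPI` doesn’t care which one is behind it.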

In a monolithic application, where the source code resides in the same location, integration tests aren’t significantly more “expensive” to run than unit tests. However, it’s best to at least start thinking about the separation between integration tests and unit tests early on. If you ever did introduce external services to your solution, you might then need to deploy multiple services and ensure proper network configuration in order to run integration tests. This makes CI pipelines more complex and almost certainly starts adding noticeable delays to test suite resolution. If you want to avoid constantly running integration tests, especially for builds that aren’t going to a production environment, this use of MSA should improve your CI pipeline immensely.

Ownership of data

Part of the separation of responsibility should include being responsible for the data associated with the service. This means ownership. When you think of the user service, it should own all user data. No other service should be allowed to modify user data because no other service should realistically be responsible for user data. Other services might need to view user data to complete an operation, but no other service should ever need to modify user state through creation, updates, or deletions.

When I first got into the software development industry over a decade ago, I was exposed to a multi-services platform that talked to a single database but with different user credentials. So even though you might have users and groups tables in the same database, the user service would have credentials that gave it permission solely to the users table and the group service would have credentials that gave it permission solely to the groups table. Much like MSA for code can be applied as a set of principles within a monolithic application deployment, it’s possible to have a single database that follows MSA principles.

If you want to get really advanced, start looking into stored procedures for your SQL databases. Back when I first started, stored procedures were a big thing, but so was the practice of hiring dedicated database administrators (DBAs). These engineers would help us create stored procedures that could execute complex queries against tables that a set of credentials couldn’t directly access. Basically, instead of giving db_service_messages direct access to the users table, you would give it access to the proc_get_some_user_data stored procedure. This allowed us to audit the stored procedures for security, compliance, performance, etc. It was the maximum level of MSA in a monolithic database.

However, if it simplifies your development process, give all of your database credentials read-only access to the entire database. There’s nothing inherently wrong with this, and you don’t want to overly constrain yourself to complex solutions that might be unnecessary for your situation. And some databases, like MongoDB, don’t even have the concept of a stored procedure; they might have similar features, but they might not carry the same pros and cons, so starting out simpler and working toward more complexity is preferred here. Don’t rush into stored procedures unless you’re super familiar with that concept and managing your database.

Something that is not considered frequently enough during initial design is that there’s a very real possibility that some data should live in a completely different kind of data storage solution than the bulk of your data. For example, storing data about users and groups in PostgreSQL is great, but what if your messages are ephemeral? It doesn’t make sense to add them to a traditional RDBMS like PostgreSQL if you’re just going to delete them after they’ve been delivered/read by the recipient(s). You’re better off using a message queue like RabbitMQ or Kafka or NATS.

And even if you did archive every message for a long period of time, you might still want to use a message queue as your primary data storage/delivery solution. Then you would add on a process that dumps all messages into a more durable data storage solution. Or maybe you allow your chat to send pictures and you host a copy of those pictures yourself; you definitely shouldn’t dump those into PostgreSQL — use something like an S3 or GCS bucket. In that case you’d want an image service to handle that type of thing.
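The queue-plus-archiver idea can be shown with Ruby’s in-process `Queue` as a toy stand-in for a real broker (RabbitMQ/Kafka/NATS) and an array as a stand-in for durable storage:

```ruby
# Toy illustration of ephemeral delivery with optional archiving.
inbox   = Queue.new   # stands in for the message broker
archive = []          # stands in for the durable data store

inbox << { from: 1, to: 2, body: "hi" }

delivered = inbox.pop   # the message leaves the queue on delivery
archive << delivered    # optional: dump a copy for long-term storage
```

The point is that the primary path is the queue; durability, if you want it at all, is a separate process bolted on behind the message service.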

Ultimately, different services might need to store data in drastically different manners in order to optimize for both efficiency and cost. Ensuring that other services can’t/don’t attempt to directly read or write the data owned by other services allows developers to customize the data storage solution per problem domain rather than per application. It also allows an easier evolution of data requirements and their solutions.

Start simple, “upgrade” as desired

1. Monolith: Implicit Separation

When trying to understand MSA concepts, it’s usually better to start simple. Create one “umbrella application” that contains multiple application services. The services can call each other directly, which greatly simplifies how you write your entire application stack. Configuration is also simpler at this stage. Just focus on grouping code into separate services within the monolith and exposing each service through a single API module. If you want to go the full distance, also set up multiple database user profiles so you can separate which services have access to which parts of your database.

Basically, create your monolithic application like you normally would. But try to encapsulate your different “services” and expose each through a single API. It doesn’t matter where the source files are or how the code is structured; just try to do that in the beginning. If you want to start off a little harder than that, maybe because you’re creating a new project from scratch rather than applying MSA to an existing project, then you can try going straight to the next step.

2. Monolith: Internal Libraries

Once you’ve got a monolith working, you can see about creating sub-directories within the monolith to create code libraries. Most package management tools will allow you to specify locally-hosted packages within the root application directory. This step allows you to start physically separating your code and gets you thinking about things like versioning without actually worrying about versioning.
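In a Rails/Bundler project, for instance, this might look like the following Gemfile fragment (the gem names and paths are hypothetical; Bundler’s `path:` option loads a gem straight from a local directory):

```ruby
# Gemfile -- "internal libraries": each service lives in its own
# sub-directory of the monolith but is pulled in like any dependency.
source "https://rubygems.org"

gem "users_service",    path: "services/users"
gem "messages_service", path: "services/messages"
```

Because the packages resolve locally, there’s no hosting or versioning overhead yet, but the directory boundary is now enforced by the package manager rather than by convention.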

The big benefit here, however, is that you have a more solid boundary between your services. New and experienced developers alike will be able to instantly recognize the separation between responsibilities within the application. When changes need to be made, they can more quickly find where the changes need to go. When testing needs to be done, they can constrain themselves to locally testing only the services that had changes. And you’re getting closer to pulling those sub-directories into completely separate projects.

3. Monolith: External Libraries

Once you have “internal libraries”, try pulling one of those services into a separate project entirely. This will be your first external library. Most package management tools will let you specify private repositories so don’t worry about where you will host the package, just put it into source control somewhere.

At this point, you can still consider things to be a monolithic application. But instead of having all code reside in one repository, you’re pulling in other services as external packages. This is what I would call the first “hard boundary” between multiple services. Ultimately, all the code still gets pulled into the umbrella application, so it’s almost like having a monolith, just with the different folders now hosted in different code repos and downloaded into your project by a package management system. You’ll also start getting a chance to really work with and understand best practices for versioning.
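Continuing the Bundler example, an external library is just a gem sourced from a repository instead of a local path (the repo URL and tag here are made up):

```ruby
# Gemfile -- "external library": the users service now lives in its own
# repo, and pinning to a tag is where versioning starts to matter.
source "https://rubygems.org"

gem "users_service", git: "https://example.com/myorg/users_service.git",
                     tag: "v1.2.0"
```

Bumping that tag deliberately, rather than always tracking the latest commit, is the versioning discipline this step is meant to teach.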

Bonus Tip: To make things simultaneously easier and harder, you can use git submodules if you’re using git to host your source code. It’s basically a way to create a symlink within your application directory, but instead of linking to a local folder it links to a git-hosted repo. This is easier because you aren’t using package management tools to “import” these libraries into your application. This is also harder because you have to deal with git submodule, which most people agree is its own headache.

I would argue against git submodules personally, unless you are using a programming language that doesn’t have a decent package management system and so hosting code in separate repositories would make it absurdly difficult to pull them in for testing and deployment. In that case, sure, use git submodules, it’s not the worst thing ever. But it’s not pleasant, either.

4. Monolith + 1: Almost There

Once you’ve separated all (or most) services into external libraries, you can try turning one of those libraries into a standalone application. Start with just one, the simplest one if possible, and get that working 100% before moving onto the other services. You will learn lots during this process and getting as close to perfection with your deployment pipelines and network configuration at this early stage will save you a ton of headache later.

I won’t list everything here, but at a high-level these are the kinds of problems you’ll need to start gathering solutions for if you haven’t already:

  • Message protocol. REST? GraphQL? gRPC? How will your services talk to each other?
  • Client code generation. Do not manually create and maintain client libraries. Find tools that will maintain those libraries based on specifications that are stored alongside each service.
  • Configuration management. Do you need common configuration sync’d across your services? How will you accomplish that? Redis? etcd? How will you allow deployments to modify key configuration components?
  • Monitoring. You will need deep insight into how data flows between services. This helps with debugging, auditing, performance tuning, etc.

Once you’ve figured out all of that and more, and you’ve got your second application deployed successfully, then you’re “done”. Obviously you aren’t completely done; if you have more than two services in your architecture, you can keep going with the other services now. And you can, and should, continue to fine-tune and improve aspects of the system as a whole. But effectively you’re done with the hardest part.

5. Beyond

You might have already done some of this in the steps leading up to here.

Containerization is a big thing I would suggest if you aren’t doing that already. Even if you’re deploying straight to VMs and not using any container orchestration like Kubernetes, containerization is still the next step, hands down.

If you have many services talking to each other, like more than 2 or 3, start thinking about service meshes. Especially if the idea of this new architecture is to allow developers to quickly and easily create entirely new services. A proper service mesh should help manage communication and monitoring of your system while potentially unlocking some common MSA behaviors, such as circuit breaking and blue-green deployments.

Start thinking about major version upgrades, the kind that break stuff. Remember how we said that our API is a contract? Well, you can’t just change the terms of the contract whenever you want. But at some point you’ll have some function or set of functions that is just not appropriate for continued use for whatever reason. This means putting out a major version upgrade and hosting two versions of that service, making sure all other services switch over to the new version, ensuring nothing is broken, and then finally decommissioning the old version. It seems like something that would be easy, but it typically is not.
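As a tiny sketch of what “hosting two versions” means at the code level, imagine the user service’s read API across a hypothetical breaking change (the module names and field split are invented for illustration):

```ruby
# Both major versions are served side by side during the migration.
module UsersApiV1
  def self.get_user(id)
    { id: id, name: "Ada Lovelace" }  # old contract: one name field
  end
end

module UsersApiV2
  # Breaking change: the name is split into two fields, so every v1
  # caller must migrate before v1 can be decommissioned.
  def self.get_user(id)
    { id: id, first_name: "Ada", last_name: "Lovelace" }
  end
end
```

Only once monitoring shows zero traffic to the v1 module (or endpoint) is it safe to delete it.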

Hope that helps

And that’s about it. There’s so much more that could be said on this topic, but I hope this helps people understand the underlying principles behind a microservices architecture. It’s not all about deploying hundreds of tiny applications. In fact, I’d argue there are plenty of businesses that never need to move beyond step 2 or 3, and that’s totally alright.