Notes
-
Heya everyone, my name is Azriel, and today I'll be showing you my automation side project, called Peace.
-
Peace is a framework to create user friendly automation.
-
It provides a set of constraints and common functionality: when you write code according to these constraints, and plug it into the common functionality, you get a nice tool.
-
As with every side project, there is an origin story.
-
In my first job, a team of us were given the task of fully automating our solution deployment.
-
So this was the deployment process, it's all manual.
-
and, obviously this is the solution. Genius.
-
We wanted this process: click, wait, done.
-
And through engineering eyes, we aimed for: (linking: Dimensions)
-
End-to-end automation.
-
Repeatable correctness.
-
and Performance.
-
and we delivered!
-
If you measure success using these metrics, it was undeniable.
-
We reduced the deployment duration down from 2 weeks of manual steps, to 30 minutes.
-
However, our users said, "We hate this! Azriel, this doesn't enhance our lives."
-
What we really did when we introduced automation, was this.
-
When switching from manual steps to automation, the work changes from doing each step at your own pace, to setting up the parameters for all of the steps, pressing go, and waiting.
-
When it's done, you check if your parameters were correct.
-
If they're correct, that's fine.
-
If they weren't, then you had to understand the error, figure out which step it came from, and which parameters feed into that step.
-
Then once the parameter was fixed, which might take 30 seconds, they still had to wait 30 minutes to confirm whether their fix worked,
-
and this delayed feedback loop was frustrating.
-
For new users, it was especially painful:
-
We're telling them to fill in parameters that they don't understand,
-
to feed into a process that they cannot see,
-
to create an environment, that they cannot visualize.
-
So they may not have understood what they were doing, but it was certainly our fault.
-
We created pain.
-
We had engineering eyes, but not human eyes:
-
We took away understandability,
-
We took away control.
-
And when you take away understanding and control, you inadvertently also take away morale. What little they had.
-
Ideally we should have built something that provides the benefits of automation,
-
while retaining the benefits of manual execution.
-
This is what the Peace framework aims to do.
-
And today I'd like to show you how it does this, through a tool built using the Peace framework.
-
It's called envman, short for environment manager.
-
envman automates the download of a web application from GitHub, creates some resources in Amazon, and uploads the web application.
-
Notably there's a missing step to launch a server that runs the web application, but I'm all out of AWS credits. So if you have spare, I'd gladly take them.
-
The first thing we took away was understandability, so let's put that back.
-
There are two ways we tend to write automation:
-
Either we produce too little information, and we can't tell what's going on,
-
or, we produce too much information, and we still can't tell what's going on.
-
For understandability, we need to have something in between.
-
Let's take a look.
-
This is what it looks like when you have too little information:
-
clear, ./envman deploy --format none, clean.
-
Something's going on, I promise!
-
This is what it looks like when you have too much information:
-
./envman deploy --format json, clean.
-
And finally, something in between.
-
See if you can see how many steps there are in this process, and whether they complete successfully:
-
clear, ./envman deploy.
-
How many steps are there? gesture
-
Did every step complete successfully?
-
"Green means good", so I believe so.
-
What resources were created? gesture
-
And if we clean up the environment, you'll see a similar interface, so you can tell that each resource is deleted:
./envman clean
-
That's all good when things go well, but what happens in a failure? Can we understand it?
-
First we'll limit the connection speed of the tool to 40 kilobits per second:
New-NetQosPolicy -Name "envman" -AppPathNameMatchCondition "envman.exe" -PolicyStore ActiveStore -ThrottleRateActionBitsPerSecond 40KB
-
and run the deployment again:
clear, ./envman deploy
-
You can see that our download from GitHub has slowed,
-
and in a little while we should see an error happen.
-
Here we go.
-
With fresh eyes, can you see which step went wrong?
-
Red means bad, so it should be apparent.
-
In detail, what went wrong, why it went wrong, and how to recover, are all shown.
-
We failed to upload the object. Why? The upload timed out. How to recover? Make sure you are connected to the internet, and try again.
-
We're also shown which resources exist and which don't, so we don't have to guess.
-
If we fix our connection, and re-run the automation:
Remove-NetQosPolicy -Name "envman" -PolicyStore ActiveStore -Confirm:$false
-
You'll see that it picks up where it left off, and completes the process.
-
That is, what you think it should do, it does. No surprises.
-
So in summary, with information, the goldilocks principle applies:
-
Too much information is overwhelming, too little is not useful, and there's some middle ground which is just right.
-
The Peace framework generally tries to fit the most relevant information on one screen.
-
The second thing we took away, was control.
-
Most automation tools give you one button -- start -- and that's it.
-
Start the creation, or update, and start the deletion.
-
While pressing start is not difficult, knowing whether the automation will do what we think it will, before we press start, is difficult.
-
What we should understand before starting anything, is:
-
Where we are -- our current state,
-
Where we want to go -- our goal state, and
-
The distance between the two.
-
Because if we start with nothing, and end up with something, the distance is something.
-
And if we start with something, and our goal state is something, the distance is nothing.
-
And if we start with something, and our goal state is something else, the distance is that else.
-
When we understand these three things, then we can make an informed decision if we should press go.
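-
Here is a minimal Rust sketch of those three cases. The names `Diff` and `distance` are made up for this note and are not Peace's API. State is modelled as an `Option`, where `None` means the thing does not exist.

```rust
/// The distance between a current state and a goal state.
/// Hypothetical type for illustration; not part of Peace's API.
#[derive(Debug, PartialEq)]
pub enum Diff<S> {
    /// Current and goal are the same: the distance is nothing.
    Nothing,
    /// We start with nothing and want something: the distance is that something.
    Create(S),
    /// We start with something and want something else.
    Change { from: S, to: S },
    /// Not one of the three cases above; included so every combination is
    /// handled. This is the "clean up" direction.
    Remove(S),
}

/// Computes the distance between where we are and where we want to go.
pub fn distance<S: PartialEq + Clone>(current: Option<&S>, goal: Option<&S>) -> Diff<S> {
    match (current, goal) {
        (None, None) => Diff::Nothing,
        (None, Some(goal)) => Diff::Create(goal.clone()),
        (Some(current), None) => Diff::Remove(current.clone()),
        (Some(current), Some(goal)) if current == goal => Diff::Nothing,
        (Some(current), Some(goal)) => Diff::Change {
            from: current.clone(),
            to: goal.clone(),
        },
    }
}
```

For example, `distance(Some(&"0.1.1"), Some(&"0.1.2"))` is a `Change` from one version to the other, and that is the information you want in front of you before pressing go.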
-
Now, if we press start, and change our mind, can we stop the process?
-
Without automation, we can.
-
Like, if someone said, "Azriel! Stop work."
-
I'd say, "Gladly." I can stop where I am.
-
With automation, you need to intentionally build interruptibility into the process.
-
And while pressing Ctrl C on a command line tool is one form of interruption,
-
what we really care about, is safe interruption.
-
i.e. Stop what you're doing when it is safe to do so.
-
Maybe we're at step 5 of a 10 step process, and we want to adjust the parameter for step 7.
-
If we can interrupt the process, adjust the parameters, press go, and have the automation pick up where it left off, that would be great.
-
As in, don't undo all of the work you've already done to get to this point.
-
I just want to fix the parameter for the later step, and continue.
-
Let's see all of this control, in action.
-
Before we run our deployment, what is our environment's current state?
-
Just like we can run git status, we can also run ./envman status.
-
What state will the automation bring our environment to, when we run it?
./envman goal
-
What's the difference?
./envman diff
-
The commands are intentionally similar to git commands, so we make use of familiar names.
-
And for interruptibility, when we deploy, we'll stop the process halfway.
-
./envman deploy, Ctrl C.
-
Here you can see steps 1 through 3, and step 5 were complete,
-
and steps 4 and 6 were not started due to the interruption.
-
If we look at the diff:
./envman diff
-
you can see that steps 1, 2, 3, and 5 are done, steps 4 and 6 haven't been executed.
-
If we change our parameters, to using version 0.1.2 instead of 0.1.1 of our web application,
-
the diff will now show that step 1 will change.
-
And if we run deploy again, that is exactly what happens.
-
When cleaning up, we can also interrupt the process.
-
Steps 1, 4, 5, and 6 were cleaned, and 2 and 3 were not.
-
And we can choose to either deploy the environment again, or clean up fully.
-
Let's deploy it to completion.
deploy, clean.
-
What's the use of this?
-
Well, once we were told,
-
"hey this customer doesn't need their environment anymore, you can delete it."
-
"You sure?"
-
"Yes."
-
So we started the deletion process, and we got this "Hey stop. Stop what you're doing."
-
"We can't. It's all just going to go."
-
And that was the beginning of a very exciting day.
-
So, build a stop button into your automation, people.
-
If you use Peace, it is built in for you.
-
We've given back to the user some control, but there are other things still to be implemented like running a subset of the process.
-
Not too hard to implement, just needs time.
-
Morale.
-
Not everyone who uses automation tools has a software background, and not everyone uses the command line all the time.
-
So why not create something that caters for these situations as well?
-
Back to understandability, normally when explaining what automation does,
-
we tend to draw a diagram on the whiteboard,
-
or create a diagram in an internal documentation site.
-
However, it's never really accurate, and it's usually a tangle of overlapping boxes and lines,
-
so it is hard to understand, because the information isn't clear.
-
So here's a web interface.
./envman web
-
Based on the code written for your automation, two diagrams are generated:
-
The one on the left is called the Progress diagram, which shows the steps in your process,
-
and the one on the right is the Outcome diagram, which shows what the deployed environment looks like, before you deploy it.
-
By clicking on these steps on the right, we get to see what is happening in that step.
-
For example, the first step is to download a file from GitHub: it shows you the request to GitHub, and where it saves the file on the file system.
-
Then it creates the IAM policy, role, and instance profile, and S3 bucket,
-
then uploads the web application to that bucket.
-
All of this is generated from your automation code. Magic.
-
This is what you can use to teach someone, or self learn, what the automation process is, and what the environment looks like.
-
And you don't have to keep erasing and redrawing lines on the whiteboard.
-
Which step was unclear? This one? Let's go through that again.
-
Now, this is great, but I like this one.
-
The diagram you saw is the example environment, but what does the actual environment look like?
-
We can discover it.
-
The diagram on the right has faded boxes for each resource, indicating that it doesn't exist.
-
When I click deploy, you can watch the progress diagram on the left, which will show you which steps are being executed,
-
or you can watch the outcome diagram on the right, which will show you the interactions between hosts, that are happening in real time.
-
All of the steps completed successfully, that's why they're green,
-
and the resources have been created, so they are now visible.
-
We can do the same for clean up, and it will delete all of the resources from Amazon, as well as on disk.
-
And if we were to have an error, as we did before, we should see it clearly.
-
slow down internet, click deploy
-
Let's take a moment to admire this diagram.
-
Ooh look it's gone red.
-
So very quickly, from the user interface, you can tell which step the error came from,
-
as well as which resources it involves.
-
And we can surface the timeout message on the web interface, I just haven't coded that part yet.
-
Cool.
-
So for morale, a lot of effort has been put into aesthetics.
-
For seeing the state of the system, showing one line for each resource, with a link to the full detail, is deliberate.
-
If you've ever been on-call and gotten a call out in the middle of the night, it's very annoying to have to go and find each resource that is part of the system you are investigating.
-
If I can think it, take me there.
-
For progress, we present the information at a level of detail that is digestible,
-
and for errors, instead of panicking, which is the visual equivalent of printing a stack trace,
-
we take that error, refine it, and make it beautiful.
-
Always include what went wrong, the reason, and how to recover,
-
because when you help people recover from a bad situation,
-
you recover their morale.
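-
As a rough sketch of that rule in Rust: an error type whose display always carries the three parts. `DeployError` and its fields are invented for this note, they are not Peace's error types, and the real rendering in envman is richer than three lines of text.

```rust
use std::fmt;

/// Hypothetical error type for illustration; not Peace's actual error type.
#[derive(Debug)]
pub enum DeployError {
    UploadTimedOut { object: String, timeout_secs: u64 },
}

impl DeployError {
    /// What went wrong.
    pub fn what(&self) -> String {
        match self {
            Self::UploadTimedOut { object, .. } => format!("Failed to upload `{}`.", object),
        }
    }

    /// The reason it went wrong.
    pub fn why(&self) -> String {
        match self {
            Self::UploadTimedOut { timeout_secs, .. } => {
                format!("The upload timed out after {} seconds.", timeout_secs)
            }
        }
    }

    /// How to recover.
    pub fn recovery(&self) -> String {
        match self {
            Self::UploadTimedOut { .. } => {
                "Make sure you are connected to the internet, and try again.".to_string()
            }
        }
    }
}

impl fmt::Display for DeployError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Every error is shown with all three parts, in the same order.
        writeln!(f, "What went wrong: {}", self.what())?;
        writeln!(f, "Why: {}", self.why())?;
        write!(f, "How to recover: {}", self.recovery())
    }
}

impl std::error::Error for DeployError {}
```

Because each variant has to answer all three questions to compile, it is hard to add a new error and forget the recovery hint.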
-
With all of these aesthetic refinements, that box, is no longer opaque.
-
It is completely, clear.
-
You can see inside it, you can understand it, and you can control it.
-
How does all of this work?
-
Magic.
-
Architecture, how does it fit together?
-
The Peace framework is categorised into two main parts.
-
The item definition, which is the common shape of logic and data, for anything that is managed by automation, and
-
Common functionality, which works with those shapes to provide command execution and a user interface.
-
Item crates contain the logic and data to automate one thing, and
-
the tool crate connects different items together, and passes them to the common functionality from the Peace framework, to provide automation.
-
These groupings are deliberate, so that you can share and reuse common item logic from the standard package registry,
-
while keeping proprietary values and workflows within your tool.
-
Let's go deeper.
-
Starting with Item.
-
If you think of one step in a process, normally we would write code to do the step.
-
But instead of only writing code that does the work of that step,
-
an Item is a collection of functions that interact with the thing that is being automated.
-
What is the current state of the thing I'm managing?
-
What will it be, after the automation logic is executed?
-
What's the difference between these states?
-
What does it look like if it's not there?
-
The actual work logic, and
-
interactions -- what are the hosts, and paths that are involved in this automation.
-
Is it a request to fetch data back in, or is it a push to push data out.
-
This information is used to generate the diagram you saw earlier.
-
An example implementation of this, the File Download.
-
The current state function returns the state of the file on disk -- whether or not it exists.
-
And if it does exist, it also returns the MD5 hash.
-
The goal state function returns the state of the file from the server, because the state of the file on the server, will become the state of the file on disk, when the download is executed.
-
So this would fetch the content-length and etag from the server, as a way to compare with what is on disk locally.
-
Many servers use the MD5 hash of a file as its etag.
-
state_diff returns whether the local file has the same hash as the remote file.
-
If it's got a different hash, then we assume we need to download it.
-
state_clean returns "the file does not exist".
-
apply downloads the file.
-
and interactions says I'm pulling data from this host, and writing to this path on localhost.
-
A collection of functions is called an Item.
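-
To make that shape concrete, here is a simplified, synchronous sketch of the file download item. It is not Peace's real `Item` trait, which has more parts. `FakeEnv` is an in-memory stand-in for the server and the disk, invented so the sketch runs without network access, and every type name here is an assumption.

```rust
/// In-memory stand-in for the server and the local disk (hypothetical).
pub struct FakeEnv {
    /// Hash the server reports for the file, e.g. through its ETag.
    pub remote_hash: String,
    /// Hash of the file on disk, or `None` if the file does not exist.
    pub local_hash: Option<String>,
}

/// Which host data is pulled from, and which local path it is written to.
#[derive(Debug, PartialEq)]
pub struct Pull {
    pub from_host: String,
    pub to_path: String,
}

/// Simplified version of the collection of functions described above.
pub trait Item {
    type State: PartialEq;
    type Diff;

    fn state_current(&self, env: &FakeEnv) -> Self::State;
    fn state_goal(&self, env: &FakeEnv) -> Self::State;
    fn state_diff(&self, current: &Self::State, goal: &Self::State) -> Self::Diff;
    fn state_clean(&self) -> Self::State;
    fn apply(&self, env: &mut FakeEnv, target: &Self::State);
    fn interactions(&self) -> Vec<Pull>;
}

#[derive(Clone, Debug, PartialEq)]
pub enum FileState {
    Absent,
    Present { hash: String },
}

#[derive(Debug, PartialEq)]
pub enum FileDiff {
    InSync,
    OutOfSync,
}

pub struct FileDownload {
    pub src_host: String,
    pub dest_path: String,
}

impl Item for FileDownload {
    type State = FileState;
    type Diff = FileDiff;

    fn state_current(&self, env: &FakeEnv) -> FileState {
        match &env.local_hash {
            Some(hash) => FileState::Present { hash: hash.clone() },
            None => FileState::Absent,
        }
    }

    fn state_goal(&self, env: &FakeEnv) -> FileState {
        // After a download, the file on disk matches the file on the server.
        FileState::Present { hash: env.remote_hash.clone() }
    }

    fn state_diff(&self, current: &FileState, goal: &FileState) -> FileDiff {
        if current == goal { FileDiff::InSync } else { FileDiff::OutOfSync }
    }

    fn state_clean(&self) -> FileState {
        FileState::Absent
    }

    fn apply(&self, env: &mut FakeEnv, target: &FileState) {
        env.local_hash = match target {
            FileState::Present { hash } => Some(hash.clone()), // stands in for the download
            FileState::Absent => None,                         // stands in for deleting the file
        };
    }

    fn interactions(&self) -> Vec<Pull> {
        vec![Pull { from_host: self.src_host.clone(), to_path: self.dest_path.clone() }]
    }
}
```

Notice that `apply` takes a target state, so the same function serves both deploying (target is the goal state) and cleaning (target is the clean state).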
-
And a collection of items, is called a Flow.
-
And a flow also contains the dependency ordering between items.
-
And in Rust, since we cannot store different concrete types in a collection, we have to put them on the heap and store their addresses.
-
Then this flow is what is passed to Peace's common functionality to use in execution or display.
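-
A small sketch of that idea, using only the standard library. The `Step` trait, the item names, and the `Flow` methods are all hypothetical, and the ordering here is a plain topological sort (Kahn's algorithm) chosen for the sketch, not a claim about how Peace stores its graph.

```rust
use std::collections::VecDeque;

/// Object-safe trait, so different concrete item types fit in one collection.
pub trait Step {
    fn name(&self) -> &str;
}

pub struct FileDownload;
pub struct S3Bucket;
pub struct S3Object;

impl Step for FileDownload {
    fn name(&self) -> &str { "file_download" }
}
impl Step for S3Bucket {
    fn name(&self) -> &str { "s3_bucket" }
}
impl Step for S3Object {
    fn name(&self) -> &str { "s3_object" }
}

pub struct Flow {
    /// Each item lives on the heap; the `Vec` stores pointers to them.
    items: Vec<Box<dyn Step>>,
    /// `(before, after)` index pairs: `after` depends on `before`.
    edges: Vec<(usize, usize)>,
}

impl Flow {
    pub fn new() -> Self {
        Flow { items: Vec::new(), edges: Vec::new() }
    }

    /// Adds an item and returns its index within the flow.
    pub fn add(&mut self, item: Box<dyn Step>) -> usize {
        self.items.push(item);
        self.items.len() - 1
    }

    /// Records that `after` must run once `before` has completed.
    pub fn depends_on(&mut self, after: usize, before: usize) {
        self.edges.push((before, after));
    }

    /// Item names in an order that respects dependencies,
    /// or `None` if the dependencies contain a cycle.
    pub fn execution_order(&self) -> Option<Vec<&str>> {
        let mut in_degree = vec![0usize; self.items.len()];
        for &(_, after) in &self.edges {
            in_degree[after] += 1;
        }
        let mut ready: VecDeque<usize> =
            (0..self.items.len()).filter(|&i| in_degree[i] == 0).collect();
        let mut order = Vec::new();
        while let Some(i) = ready.pop_front() {
            order.push(self.items[i].name());
            for &(before, after) in &self.edges {
                if before == i {
                    in_degree[after] -= 1;
                    if in_degree[after] == 0 {
                        ready.push_back(after);
                    }
                }
            }
        }
        if order.len() == self.items.len() { Some(order) } else { None }
    }
}
```

The upload depends on both the download and the bucket, so whichever order the items are added in, the upload comes out last.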
-
Commands. Commands are one piece of the common functionality that Peace provides.
-
Given a flow and parameters, it invokes different functions within each item.
-
For example, the Discover command.
-
What is the current state of each item? What is the goal state of each item?
-
The discover command will run these functions, store the state, and display it to the user.
-
The Diff command will compute and show the difference between the current and goal states of each item.
-
The Ensure command will turn the current state of each item, into its goal state, through the apply function.
-
The Clean command is similar, where it turns the current state into the clean state, also through the apply function.
-
So Peace provides common logic to iterate through the items, and call the appropriate functions.
-
and it will also pass the appropriate values between each item.
-
That, is magic.
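-
A little less magic: here is a sketch of what those command loops could look like. `ItemErased` and the four functions are hypothetical, states are plain strings to keep it short, and passing values between items is left out entirely. Cleaning in reverse order is an assumption made for this sketch.

```rust
/// Object-safe view of an item, with state flattened to `Option<String>`.
/// `None` means "does not exist", which doubles as the clean state.
pub trait ItemErased {
    fn name(&self) -> &str;
    fn state_current(&self) -> Option<String>;
    fn state_goal(&self) -> Option<String>;
    fn apply(&mut self, target: Option<String>);
}

/// In-memory item, used only to exercise the commands below.
pub struct InMemoryItem {
    pub name: String,
    pub current: Option<String>,
    pub goal: String,
}

impl ItemErased for InMemoryItem {
    fn name(&self) -> &str { &self.name }
    fn state_current(&self) -> Option<String> { self.current.clone() }
    fn state_goal(&self) -> Option<String> { Some(self.goal.clone()) }
    fn apply(&mut self, target: Option<String>) { self.current = target; }
}

/// Discover: the name, current state, and goal state of each item.
pub fn discover(items: &[Box<dyn ItemErased>]) -> Vec<(String, Option<String>, Option<String>)> {
    items
        .iter()
        .map(|item| (item.name().to_string(), item.state_current(), item.state_goal()))
        .collect()
}

/// Diff: names of the items whose current state differs from their goal state.
pub fn diff(items: &[Box<dyn ItemErased>]) -> Vec<String> {
    items
        .iter()
        .filter(|item| item.state_current() != item.state_goal())
        .map(|item| item.name().to_string())
        .collect()
}

/// Ensure: apply the goal state to each item that is not already there.
/// Returns the names of the items that were changed.
pub fn ensure(items: &mut [Box<dyn ItemErased>]) -> Vec<String> {
    let mut changed = Vec::new();
    for item in items.iter_mut() {
        let goal = item.state_goal();
        if item.state_current() != goal {
            item.apply(goal);
            changed.push(item.name().to_string());
        }
    }
    changed
}

/// Clean: apply the clean state to each item that exists.
/// Reverse order is an assumption: dependents go before their dependencies.
pub fn clean(items: &mut [Box<dyn ItemErased>]) -> Vec<String> {
    let mut cleaned = Vec::new();
    for item in items.iter_mut().rev() {
        if item.state_current().is_some() {
            item.apply(None);
            cleaned.push(item.name().to_string());
        }
    }
    cleaned
}
```

Running `ensure` a second time changes nothing, because every item is already at its goal state. That is the same property that lets a re-run pick up where an earlier run left off.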
-
Going back to the Item definition, besides the functions to read from or write to the item, implementors also have to specify these data types.
-
Input, which we call parameters, and
-
Output, which we call State.
-
The parameters tell the item where to fetch data from, and where to write to, as well as any other information needed to access the item.
-
The state indicates whether or not the item exists, where it lives, and a summary of its contents.
-
This is the type that is returned from the current, goal, and clean state functions.
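-
For the file download, those two data types might look something like this. The field names are guesses made for this note, not the definitions from the real item crate.

```rust
use std::fmt;
use std::path::PathBuf;

/// Input: where to fetch data from, and where to write it to.
/// Hypothetical fields, for illustration only.
#[derive(Clone, Debug, PartialEq)]
pub struct FileDownloadParams {
    pub src_url: String,
    pub dest: PathBuf,
}

/// Output: whether the file exists, where it lives, and a summary of its contents.
#[derive(Clone, Debug, PartialEq)]
pub enum FileDownloadState {
    Absent { path: PathBuf },
    Present { path: PathBuf, byte_len: u64, md5_hex: String },
}

impl fmt::Display for FileDownloadState {
    /// One line per resource, so the whole environment fits on one screen.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::Absent { path } => write!(f, "{} does not exist", path.display()),
            Self::Present { path, byte_len, md5_hex } => write!(
                f,
                "{} exists ({} bytes, MD5 {})",
                path.display(),
                byte_len,
                md5_hex
            ),
        }
    }
}

impl FileDownloadParams {
    /// The clean state for these parameters: the destination file does not exist.
    pub fn state_clean(&self) -> FileDownloadState {
        FileDownloadState::Absent { path: self.dest.clone() }
    }
}
```

Keeping the state small and comparable is what makes the diff cheap: two `FileDownloadState` values can be compared with `==` without reading the file again.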
-
Putting it all together:
-
We combine the items into a flow,
-
We specify the parameters for each item,
-
Pick an output -- the command line, or web, or both,
-
and these three things together are called a command context.
-
Essentially "all the things you need to run a command".
-
Surface the commands to the user with appropriate names,
-
and this is your tool.
-
So Peace is a side project, and
-
there are side-side projects that were built in the making of Peace.
-
The first noticeable one is Interruptible, which adds the ability to interrupt a stream.
-
If you think about playing music, we are streaming bytes to a speaker, and out comes some audio.
-
When we pause the music, the bytes that were buffered still play, but any bytes that were not in the audio buffer will not be played.
-
In automation, instead of streaming bytes to a speaker, we are streaming logic, to an executor.
-
When we pause, any logic that was already queued and is executing, will continue to run to completion.
-
Any logic that hasn't been queued, will not be started.
-
So we fully execute what is in progress to completion, and safely stop between steps.
-
And that's how we get safe interruptibility.
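-
The principle fits in a few lines. This sketch is deliberately not how Interruptible is implemented: it swaps the stream for a plain loop and uses an `AtomicBool` as the stop signal, and `Outcome` and `run_interruptibly` are names invented for this note.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

#[derive(Debug, PartialEq)]
pub enum Outcome {
    /// Every step ran.
    Completed,
    /// The stop signal was seen before the step at this index was started.
    InterruptedBefore(usize),
}

/// Runs the steps in order, checking the stop signal only between steps.
pub fn run_interruptibly(
    steps: &mut [Box<dyn FnMut() + '_>],
    interrupted: &AtomicBool,
) -> Outcome {
    for (index, step) in steps.iter_mut().enumerate() {
        // Safe point: nothing is in progress, so stopping here leaves no
        // half-finished work behind.
        if interrupted.load(Ordering::SeqCst) {
            return Outcome::InterruptedBefore(index);
        }
        // Once started, a step runs to completion, even if the signal
        // arrives while it is running.
        step();
    }
    Outcome::Completed
}
```

In a command line tool, a Ctrl C handler would set the flag. The index in `InterruptedBefore` tells you which steps never started, which is what the diff shows after an interrupted deploy.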
-
The second noticeable project is Dot Interactive.
-
This generates diagrams from structured input.
-
So it takes a data model with the nodes and edges, generates a diagram using GraphViz dot, and adds styles using Tailwind.
-
And that's what I've used for most of the diagrams you've seen today.
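-
The first half of that pipeline, turning nodes and edges into GraphViz dot source, can be sketched like this. `Graph` and `to_dot` are hypothetical names rather than Dot Interactive's API, and the Tailwind styling step is not shown.

```rust
/// A data model with nodes and edges (hypothetical, for illustration).
pub struct Graph {
    /// `(id, label)` pairs.
    pub nodes: Vec<(String, String)>,
    /// `(from_id, to_id)` pairs.
    pub edges: Vec<(String, String)>,
}

/// Wraps a string in double quotes, escaping backslashes and quotes,
/// so any id or label is a valid quoted dot string.
fn quote(s: &str) -> String {
    format!("\"{}\"", s.replace('\\', "\\\\").replace('"', "\\\""))
}

/// Renders the graph as GraphViz dot source text.
pub fn to_dot(graph: &Graph) -> String {
    let mut dot = String::from("digraph flow {\n");
    for (id, label) in &graph.nodes {
        dot.push_str(&format!("    {} [label={}];\n", quote(id), quote(label)));
    }
    for (from, to) in &graph.edges {
        dot.push_str(&format!("    {} -> {};\n", quote(from), quote(to)));
    }
    dot.push_str("}\n");
    dot
}
```

The resulting text can be handed to the `dot` program to lay out the diagram.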
-
Now rounding off, what's the status of Peace? Is it ready to be used?
-
For development workflows, or short lived environments, where the environment does not live longer than one version of a tool,
-
I'd say it is ready.
-
But for production workflows, or environments that need to be stable, then Peace is not ready.
-
Don't use it, you will not have Peace.
-
In the table below, you can see the command execution and CLI functionality is stable,
-
The web interface is definitely not stable -- it was hacked together last month for this demo, and
-
the most important one for readiness is API and data stability, which may take me a year to complete.
-
Links to the project:
-
peace.mk for the project website
-
Slides are on peace.mk/book.
-
github.com/azriel91/peace for the repository.
-
To wrap up, I'd like to end with this note:
-
To engineer with empathy,
-
whether it is verbal, visual, or vocal,
-
refine your voice, connect,
-
and communicate with clarity.
-
Thank you for listening, and I'm happy to take questions.