SSH is great. And so is tmux. But sometimes they can interact in ways which are not intuitive.
Consider this scenario:
- You ssh to some remote server (a bastion server) and start a tmux session. Perhaps you need to do something long-running, or you are worried about losing your connection and don't want to risk having to start all over again. Perhaps you are using your coffee shop's Wi-Fi network, and expect to go elsewhere once you have finished your coffee, and don't fancy having to interrupt your work too much.
- Inside the tmux session you end up running ssh to connect to some other server. Or multiple servers. Or accessing a git repository using the SSH protocol. Not a problem, your SSH key allows you through. Everything works as you expect, and you get on with your work.
- At some point you lose your connection to the bastion server. Perhaps you left the coffee shop. Not a problem - you know your tmux session will not die, and you can always reconnect later. After all: that's the whole point of tmux.
- You arrive home to your trusty sofa, wake up the laptop, watch it connect to your Wi-Fi network, ssh into the bastion server and resume the tmux session. All is good - things are exactly as you left them! (You reluctantly avoid high-fiving yourself at this point, as it would just look weird).
- You get on with whatever you are doing. Soon the coffee shop is all but a distant memory.
- At some point, you need to ssh to yet another server from inside your tmux session. And you are greeted with:
# inside tmux
you@bastionserver $ ssh anotherserver
Permission denied (publickey).
What happened there!? But it worked before!??
TLDR: The Fix
The fix is easy:
- Detach from the tmux session (Ctrl-b d; no need to close stuff), returning you to the shell on the bastion server.
- In your shell on the bastion server where tmux is running, get the contents of the SSH_AUTH_SOCK environment variable:
    echo "$SSH_AUTH_SOCK"
- Reconnect to your tmux session ("tmux attach" as usual), and update the SSH_AUTH_SOCK environment variable for the shell inside to have the same contents:
    export SSH_AUTH_SOCK=Whatever-Echo-Gave-as-Output-Above
- If you have multiple windows or panes open inside tmux, you should repeat the above for each one you expect to use ssh in.
- Update SSH_AUTH_SOCK for tmux itself with Ctrl-b : (Ctrl-b followed by a colon) and enter:
    set-environment SSH_AUTH_SOCK Whatever-Echo-Gave-as-Output-Above
  at the prompt. This sets the value in the environment of the current tmux session (which takes precedence over tmux's global environment), and ensures that newly created panes/windows inherit the correct value.
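The per-pane part of the fix can be shortened a little. What follows is a minimal sketch, not a definitive recipe. It assumes that tmux's update-environment option still lists SSH_AUTH_SOCK (as far as I know that is the default), in which case "tmux attach" copies the fresh value into the session environment and each pane only has to read it back. The function name refresh_ssh_auth_sock is made up for this example.

```shell
# Hypothetical helper (the name is invented for this sketch).
# Re-reads SSH_AUTH_SOCK from the tmux session environment into this shell.
refresh_ssh_auth_sock() {
    # tmux sets TMUX inside its panes; refuse to run anywhere else.
    if [ -z "${TMUX:-}" ]; then
        echo "refresh_ssh_auth_sock: not inside tmux" >&2
        return 1
    fi
    # "show-environment -s" prints the variable as shell commands, roughly:
    #   SSH_AUTH_SOCK="/some/path/agent.1234"; export SSH_AUTH_SOCK;
    # Assumption: update-environment includes SSH_AUTH_SOCK, so the session
    # environment was refreshed when you re-attached.
    eval "$(tmux show-environment -s SSH_AUTH_SOCK)"
}
```

Put the function in your shell startup file on the bastion server and run refresh_ssh_auth_sock in each pane after re-attaching. If update-environment has been changed in your tmux configuration, the session environment may still hold the old value, and you are back to copying it by hand.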
The Cause
The underlying cause: Agent forwarding breakdown.
Let me explain that. (I would normally have written "Let me break that down", but the resulting pun would just make it confusing...)
You have a private SSH key. This key must never leave your laptop. You use that key to ssh to the bastion server - and that works because your ssh client can prove your identity: your key is on the laptop where your ssh client is running.
But what happened when you then further "jumped" to another server from the bastion server? The bastion server did not have your private key, and thus should not be able to prove your identity to the other server. But it obviously did!
This is where "SSH Agent Forwarding" comes into play. Basically, it is a mechanism for SSH clients on the bastion to proxy the "prove your identity" request back to your laptop. The SSH private key never leaves your laptop. But the laptop gets involved when ssh clients on the bastion need to prove your identity to other servers.
So how does the ssh command on the bastion host know to talk to your laptop? When you log into a server with agent forwarding enabled (ssh -A, or ForwardAgent yes in your ssh configuration), a Unix domain socket is created on the server which allows communication with the ssh client on your laptop. And they talk over that - through the SSH connection you initiated to the bastion server. Yes: SSH communication is really a two-way thing. It's not just you connecting to the server, but the server may connect to you - under some circumstances. Bend your brain around it and live with it.
Obviously, we can't just allow any process on the bastion host to talk back to your laptop like this: It would give those processes a way of impersonating you. So the permissions on the socket are restricted to your user only. To avoid "cross-talk" with other SSH sessions, care is taken to give it a guaranteed unique name.
As you may have guessed: The SSH_AUTH_SOCK environment variable contains the path to that socket.
Once your SSH session is finished, the socket is removed again.
This works well.
At least until you ssh back into the bastion server, run "tmux attach" and resume from where you left off.
When you ssh'ed into the bastion server after arriving on the sofa, agent forwarding was set up again. With a different path name for the socket.
The processes running inside the tmux sessions were completely shielded from your disconnection/reconnection antics. They still have the same environment variables as before. Including the (now obsolete) value of SSH_AUTH_SOCK. They can no longer prove your identity when you try to authenticate to other servers.
Hence the error message: Permission denied - which really came from the server you were trying to ssh to: You got treated like an anonymous alien, who has no rights.
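To see whether a given pane is suffering from this, you can check whether the path in SSH_AUTH_SOCK still exists. The following is a small sketch; the function name agent_sock_status is invented for illustration. It only checks that the path is a socket, not that a live agent is answering on the other end.

```shell
# Hypothetical helper: classify the current value of SSH_AUTH_SOCK.
agent_sock_status() {
    if [ -z "${SSH_AUTH_SOCK:-}" ]; then
        echo "unset"      # this shell knows of no agent socket at all
    elif [ -S "$SSH_AUTH_SOCK" ]; then
        echo "present"    # the path exists and is a socket
    else
        echo "stale"      # the path is gone, e.g. after a reconnect
    fi
}
```

Running agent_sock_status in a pane right after a surprising "Permission denied" should print "stale" if the explanation above applies to you.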
Avoiding The Problem ?
At this point you may be thinking of ways to avoid the problem. After all, the underlying cause is not trivial unless you are aware of the workings of SSH. And the error message does nothing to point unfortunate users in the right direction, let alone suggest a solution. It feels like a problem worth solving.
I have some ideas. I wouldn't recommend implementing any of these; they break some existing use cases, and they all decrease security...
Use a Fixed Socket Name ?
If the path to the socket was always the same (e.g. "$HOME/.ssh-agent"), then this problem would not occur. That part is easy.
OK: The $HOME directory may not be writable in some environments. But usually there is somewhere which is writable, so perhaps /tmp? But /tmp is shared, and we want to allow concurrent logins to different user accounts, so we have to at least include the username in the file name.
We still have the problem of allowing multiple simultaneous ssh connections into the server on the same user account; we need to avoid "entangling" those connections:
- It could be a shared account. While usually discouraged, they do exist, and we can't just break that use case.
- The connections may be using different SSH keys. Allowing one user to prove their identity by using somebody else's SSH key would definitely be a security flaw. That "other" SSH key may be valid in many places. We cannot tell. The key owner would not be pleased.
- The 2nd and subsequent connections would not be able to use SSH Agent Forwarding - at least not with the same socket name.
- Only one process can (effectively) listen on the socket. Since the name is now fixed (or at least: predictable), they cannot assume that the existence of the socket means that some other process is using it. That process may have died and not cleaned up. So they would need some way of coordinating with each other. Which means finding each other. Reliably. Without security headaches. Which just makes the problem worse.
So although using a fixed socket name may alleviate the initial problem, it would create more complexity and cause worse problems than it solves. So that "solution" isn't a solution.
Add a Level of Indirection ?
So how about this instead? What if tmux was "ssh-aware"?
tmux sessions have names. Or at least: Some unique identifier. What if we used that identifier to make a predictably named path for a symbolic link - and let that symbolic link point to the actual socket file?
This could be implemented by adding code in the startup of tmux: remove any existing symlink and create a new one pointing to the current value of SSH_AUTH_SOCK, and then set tmux's own environment variable SSH_AUTH_SOCK to point to the symlink. The code change would be small, and only needs to be surrounded by an if statement to check whether the environment variable is set in the first place.
This allows the value of SSH_AUTH_SOCK to stay constant for the duration of a tmux session (but different across tmux sessions), and quietly resolve to the currently-valid socket via the symbolic link.
This seems like a good idea until we consider the use case of multiple people using a shared tmux session. tmux is often used (or at least: was used) in educational contexts, as it is a nifty way of sharing the same screen across a vast number of users. The symlink would then suddenly allow those people to impersonate the person whose ssh agent it points to!
It would also require the symlink to be placed somewhere where everybody can create and delete files. $HOME wouldn't work for that, and neither would /tmp as only the file owner can delete files there...
But at least: for the use cases which do not need to cater for multiple users this can be implemented with a somewhat simple wrapper script around tmux or in $HOME/.profile.
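For what it's worth, such a wrapper could look roughly like the sketch below. It is a sketch under stated assumptions, not a recommendation: the function names link_agent_sock and tmux_linked, the default session name "main" and the link location under $HOME/.ssh are all made up for this example, and the security caveats above still apply.

```shell
# Hypothetical helper: point a stable symlink at the current agent socket
# and make SSH_AUTH_SOCK refer to the symlink instead of the real socket.
link_agent_sock() {
    link=$1
    # Nothing to do without a forwarded agent ...
    [ -n "${SSH_AUTH_SOCK:-}" ] || return 0
    # ... or if this shell already uses the symlink.
    [ "$SSH_AUTH_SOCK" != "$link" ] || return 0
    rm -f "$link"                    # remove any existing symlink
    ln -s "$SSH_AUTH_SOCK" "$link"   # symlink -> socket of this login
    SSH_AUTH_SOCK=$link
    export SSH_AUTH_SOCK
}

# Hypothetical wrapper: one symlink per tmux session name.
# Assumes $HOME/.ssh exists and is writable by you only.
tmux_linked() {
    session=${1:-main}
    link_agent_sock "$HOME/.ssh/agent-link-$session" || return 1
    # -A attaches to the session if it already exists, else creates it.
    tmux new-session -A -s "$session"
}
```

You would then run tmux_linked instead of "tmux" or "tmux attach" every time you log into the bastion server. Each login re-points the symlink at its own socket, while the processes inside the session keep seeing the same, unchanging value of SSH_AUTH_SOCK.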
If you read this far: Congratulations! - I hope the above made sense.
Enjoy!