UPDATE: 2017-01-24: Add https proxy to the command arguments. Also, updating style and grammar.
Problem
I'm having fun with w3m on Emacs, but by switching to it I gave up a feature I took for granted in GUI browsers: private browsing. This is useful if prying eyes are desperate enough to check the terminal browser's history. Making it ignore several sites is simple enough, though that might warrant a post of its own.
Even with Firefox or Chrome, private browsing may not be enough, since servers can still track your IP address and infer which sites you visited. It is not truly private, but good enough against nosy users. Enter tor, the anonymizing proxy that routes requests through several relays, adding layers of encryption between them to mask where the request comes from. Despite slowing down browsing, it keeps those pesky trackers and advertisers from scarily guessing what books or movies you secretly want.
Sadly, tor is a SOCKS proxy and w3m uses only HTTP proxies. This means joining the two isn't quick and straightforward, so we need a middleman. Enter polipo, a caching HTTP proxy that is lightweight enough for the experiment. To summarize, w3m will use polipo as its HTTP proxy, which in turn uses tor as its SOCKS proxy to send the request.
With that idea, here is the snippet that glues them. The first time you run this, tor might take some time setting up, so be patient.
As hackers, we want elaboration.
Configuration Generation
Once you have installed and tried both tor and polipo, we want Emacs to start these proxies for us when we start w3m. They could be configured as external services or daemons; instead, we want a portable setup that minimizes external dependencies and conflicts.
To start an external process, we use start-process. In practice, we could call both proxies with their respective command-line options; however, generating their respective configuration files via Elisp is more interesting. As a minor bonus, the configuration can be tested and examined externally.
Both configurations can be abstracted as a list of key-value pairs, or cons cells, with different line formatting. A basic formatter for this with tor:
(with-temp-file config-file ;; Config file path
  (insert
   (string-join
    (mapcar
     (lambda (pair)
       (pcase-let ((`(,key . ,value) pair))
         (format "%s %s" ;; Line formatting
                 key ;; Key formatting
                 (typecase value ;; Value formatting
                   (symbolp (symbol-name value))
                   (numberp (number-to-string value))
                   (stringp value)))))
     pairs) ;; The cons list
    "\n")))
Aside from the helpful with-temp-file macro and the destructuring pcase-let, the only nuance here is the value formatting via typecase. How you do the formatting determines what kinds of values you can place in the list.
For example, with polipo, our list of cons cells, with some stylistic backquoting, looks something like this:
`(("proxyAddress" . "0.0.0.0")
  ("allowedClients" . "127.0.0.1")
  ("diskCacheRoot" . ,fn/w3m-polipo-cache-dir)
  ("proxyPort" . ,fn/w3m-polipo-port)
  ("cacheIsShared" . false)
  ("socksParentProxy" . ,(format "%s:%s" "localhost" (number-to-string fn/w3m-tor-port)))
  ("socksProxyType" . socks5))
Although the key types are the same, notice that the value types differ. Strings are quoted, while numbers and symbols are stringified; if we used just a list of line strings, we'd have to do the formatting ourselves. With this setup, it looks configurable and proper. A caveat is that when a value is changed, the files and processes need to be updated or refreshed. This is natural since they are external to Emacs. With the generation wrapped as a function, fn/w3m-polipo-tor-update-conf, it is not a big issue to invoke it each time.
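As a rough sketch of how such a generator could be factored, the formatter can take the line format as a parameter so that one function serves both proxies. The names my/conf-value-string, my/conf-lines and my/write-conf are made up for illustration and are not the author's fn/w3m-polipo-tor-update-conf; the "%s = %s" format for polipo is an assumption based on its key = value syntax, and quoting of string values is not handled.

```elisp
;; Sketch only: all `my/' names are hypothetical helpers for illustration.

(defun my/conf-value-string (value)
  "Return VALUE (a string, number, or symbol) as configuration text."
  (cond ((stringp value) value)
        ((numberp value) (number-to-string value))
        ((symbolp value) (symbol-name value))
        (t (error "Unsupported config value: %S" value))))

(defun my/conf-lines (pairs line-format)
  "Format PAIRS, an alist of (KEY . VALUE), one per line.
LINE-FORMAT is a `format' string taking the key and the value text."
  (mapconcat (lambda (pair)
               (format line-format
                       (car pair)
                       (my/conf-value-string (cdr pair))))
             pairs
             "\n"))

(defun my/write-conf (file pairs line-format)
  "Write PAIRS to FILE, formatting each line with LINE-FORMAT."
  (with-temp-file file
    (insert (my/conf-lines pairs line-format) "\n")))
```

With this, tor could be written with (my/write-conf tor-file tor-pairs "%s %s") and polipo with (my/write-conf polipo-file polipo-pairs "%s = %s"), where the file and pair variables are whatever your setup defines.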
As an aside, customize-set-variable offers some data binding through a variable's :set and :get properties, but it isn't necessary for a small shiv. Also, it is fascinating how generating the files with Elisp binds the data and code.
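For the curious, here is a minimal sketch of what that :set binding could look like. Everything named my/ is hypothetical, the regeneration function is only a placeholder that counts calls, and 9050 is just an assumed default port rather than a value from this setup.

```elisp
;; Sketch only: `my/w3m-tor-port' and `my/w3m-regenerate-conf' are
;; hypothetical stand-ins, not the variables used elsewhere in this post.

(defvar my/w3m-conf-generation 0
  "Number of times the configuration has been regenerated.")

(defun my/w3m-regenerate-conf ()
  "Placeholder for rewriting the tor and polipo configuration files."
  (setq my/w3m-conf-generation (1+ my/w3m-conf-generation)))

(defcustom my/w3m-tor-port 9050
  "Port tor listens on for SOCKS connections (assumed default)."
  :type 'integer
  ;; The :set function runs whenever the option is changed through
  ;; Customize or `customize-set-variable', so the files stay in sync.
  :set (lambda (symbol value)
         (set-default symbol value)
         (my/w3m-regenerate-conf)))
```

Calling (customize-set-variable 'my/w3m-tor-port 9150) would then update the value and trigger the regeneration in one step, whereas a plain setq would bypass it.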
We now talk about the options we are interested in since I am assuming you didn't really read the manuals.
tor Options
The configuration values for tor:
`(("SocksPort" . ,fn/w3m-tor-port)
  ("DataDirectory" . ,fn/w3m-tor-cache-dir)
  ("ControlPort" . ,(1+ fn/w3m-tor-port))
  ("DisableDebuggerAttachment" . 0))
- SocksPort: The port it listens on. This port is what polipo points to.
- DataDirectory: The directory it uses. Optional, but it should be changed since it defaults to a system directory.
- ControlPort: Optional debugging port if you want to monitor it with tools such as arm.
- DisableDebuggerAttachment: Set this flag to 0, as here, if you want to monitor it.
All we strictly need is SocksPort, the interface port; everything else is for portability.
polipo Options
The configuration values for polipo:
`(("proxyAddress" . "0.0.0.0")
  ("allowedClients" . "127.0.0.1")
  ("diskCacheRoot" . ,fn/w3m-polipo-cache-dir)
  ("proxyPort" . ,fn/w3m-polipo-port)
  ("cacheIsShared" . false)
  ("socksParentProxy" . ,(format "%s:%s" "localhost" (number-to-string fn/w3m-tor-port)))
  ("socksProxyType" . socks5))
- proxyAddress: The address it listens on. The snippet binds to 0.0.0.0 (all interfaces); use 127.0.0.1 to listen on localhost only.
- allowedClients: The client IP addresses allowed to connect. For portability again, localhost is the value.
- diskCacheRoot: The cache directory, like DataDirectory with tor.
- proxyPort: The port it listens on. This port is what w3m uses.
- cacheIsShared: Set to false, as here, if you are the only user of the proxy.
- socksParentProxy: The host and port of the SOCKS proxy it forwards to. This is where tor and polipo meet.
- socksProxyType: The SOCKS proxy type. The default socks5 is what tor is.
This one is a little more nuanced since it is the middleman. What is strictly needed here is proxyPort and socksParentProxy, which are just the interface ports.
Now that the options are clear, we move to our browser options.
w3m Options
We now look at the main browser and the only external option it needs, http_proxy. This simply means adding the value http_proxy=http://127.0.0.1:<polipo-port> to w3m-command-arguments after the -o option.
As an aside, a nuance is duplicating it for https_proxy as such:
(setq w3m-command-arguments
      (append w3m-command-arguments
              (list "-o" (format "http_proxy=http://127.0.0.1:%s/" fn/w3m-polipo-port))
              (list "-o" (format "https_proxy=https://127.0.0.1:%s/" fn/w3m-polipo-port))))
Note that w3m has its own configuration file, so this option can be set there, but that ruins the data and code binding. It is now just a matter of starting the appropriate processes when w3m loads:
(setq fn/w3m-tor-process
      (start-process "w3m-tor" "*w3m-tor*" "tor" "-f" fn/w3m-tor-conf-file)
      fn/w3m-polipo-process
      (start-process "w3m-polipo" "*w3m-polipo*" "polipo" "-c" fn/w3m-polipo-conf-file))
Aside from managing the processes, all the pieces should work together properly.
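On the process management side, one small thing that could help is not starting a second copy when the proxy is already running. A minimal sketch, where my/start-process-once is a made-up helper and not part of the original setup:

```elisp
;; Sketch only: `my/start-process-once' is a hypothetical helper.

(defun my/start-process-once (name buffer program &rest args)
  "Start PROGRAM with ARGS as process NAME unless NAME is already live.
BUFFER is passed to `start-process'.  Return the live process."
  (let ((existing (get-process name)))
    (if (and existing (process-live-p existing))
        ;; Reuse the running process instead of spawning a duplicate.
        existing
      (apply #'start-process name buffer program args))))
```

It takes the same arguments as start-process, so the tor call could become (my/start-process-once "w3m-tor" "*w3m-tor*" "tor" "-f" fn/w3m-tor-conf-file), and likewise for polipo.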
If you don't need anonymity for particular hosts or domains, add them to w3m-no-proxy-domains:
(add-to-list 'w3m-no-proxy-domains "127.0.0.1")
(add-to-list 'w3m-no-proxy-domains "localhost")
As for me, if I don't add this, I can't test my blog since I will get a proxy error. Configure to your actual and specific setup. Refactoring that, here is a command to add the site as a proxy exception:
(defun fn/w3m-add-current-host-to-no-proxy-domains ()
  "Add current host to `w3m-no-proxy-domains'"
  (interactive)
  (when (eq major-mode 'w3m-mode)
    (lexical-let* ((parts (w3m-parse-http-url w3m-current-url))
                   (host (elt parts 1)))
      (add-to-list 'w3m-no-proxy-domains host t)
      (w3m-reload-this-page))))
Auto Start And Kill
When w3m loads, it might be convenient to start the proxies as well. However, these processes are not as lightweight as flyspell or the like, so asking for confirmation as a reminder might be wise. A simple snippet does the job:
(when (yes-or-no-p "Start polipo and tor for w3m? ") (fn/w3m-polipo-tor-start-process))
Now the flip side: killing them should be handled when Emacs terminates; however, for some reason tor doesn't stop cleanly, which leaves a leaked process that is not visible through list-processes. Strangely, polipo closes properly but tor does not. A stronger guarantee is to kill it as part of Emacs shutdown:
(add-hook 'kill-emacs-hook #'fn/w3m-polipo-tor-kill-process)
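What the kill function does is up to you; one possible shape is sketched below. It leans on tor's documented behavior of cleaning up and exiting when it receives SIGINT, and my/stop-process is a hypothetical helper, not the body of fn/w3m-polipo-tor-kill-process.

```elisp
;; Sketch only: `my/stop-process' is a hypothetical helper.

(defun my/stop-process (process &optional gently)
  "Stop PROCESS if it is still live.
With GENTLY non-nil, send SIGINT so the program can clean up and exit
on its own; otherwise delete the process outright."
  (when (and (processp process) (process-live-p process))
    (if gently
        (interrupt-process process)
      (delete-process process))))
```

A kill function built on this might call (my/stop-process fn/w3m-tor-process t) for tor and (my/stop-process fn/w3m-polipo-process) for polipo. This only covers the case where the hook actually runs.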
This hook is not a true guarantee: Emacs can be killed externally, in which case the hook will not run and tor is still running in the next session. You can check with proced whether a tor process is still running and kill it accordingly. This is apparently standard behavior for daemons.
When Emacs is killed, it sends a SIGHUP signal to its child processes, or more importantly, to the processes made with start-process. As stated by the tor manual:
SIGHUP The signal instructs Tor to reload its configuration (including closing and reopening logs), and kill and restart its helper processes if applicable.
This means that when Emacs is killed externally, it restarts tor instead of killing it. Sadly, changing the signal tor receives, or intercepting it, is not easy. For now, this issue is out of scope for a simple process glue. You can create a custom script to start tor, but this is a reminder that we can't control everything without lower-level work.
Conclusion
With this, we joined w3m, polipo, and tor to browse more privately, all within Emacs. It is not perfect, but it gets the job done. There are features or aspects that can be improved:
- Data and argument binding, managing w3m-command-arguments as well
- Error handling
- Process management on open and close of w3m, maybe prodigy?
- Proxy fine-tuning and configuration; check out the custom polipo conf from tor
As a disclaimer, this does not make you fully anonymous, especially in cases where you need to log in, but it is an improvement over private browsing.