UPDATE: 2017-01-24: Added the https proxy to the command arguments. Also updated style and grammar.

Problem

I'm having fun with w3m in Emacs, but by switching to it I gave up a feature I took for granted in a GUI browser: private browsing. This matters if prying eyes are desperate enough to dig through the terminal browser's history. Making w3m ignore certain sites is simple enough that it might warrant a post of its own.

Even with Firefox or Chrome, private browsing may not be enough, since servers can track your IP address and infer which sites you visited. It is not truly private, but it is good enough against nosy users. Enter tor, the anonymizing proxy that routes requests through several relays, adding a layer of encryption for each, to mask where a request comes from. Although it slows browsing down, it keeps those pesky trackers and advertisers from eerily guessing which books or movies you secretly want.

Sadly, tor is a SOCKS proxy while w3m only uses HTTP proxies. Joining the two is therefore not quick or straightforward, so we need a middleman. Enter polipo, a lightweight caching HTTP proxy that is good enough for the experiment. To complete the chain, w3m uses polipo as its HTTP proxy, which in turn uses tor as its SOCKS proxy to send the request.

With that idea, here is the snippet that glues them together. The first time you run it, tor may take a while to set up, so be patient. As hackers, though, we want the details.

Configuration Generation

Once you have installed and tried both tor and polipo, we want Emacs to start these proxies for us when we start w3m. They could be configured as system services or daemons; instead, we want a portable setup that minimizes external dependencies and conflicts.

To start an external process, we use start-process. In practice, we only need it to launch both proxies with their respective command-line options; however, generating their configuration files from Elisp is more interesting. As a minor bonus, the generated configuration can be examined and tested outside Emacs.

Both configurations can be modeled as a list of key-value pairs (cons cells), differing only in how each line is formatted. A basic formatter for tor:

(require 'cl-lib) ;; cl-typecase
(require 'subr-x) ;; string-join

(with-temp-file config-file ;; Config file path
  (insert
   (string-join
    (mapcar
     (lambda (pair)
       (pcase-let ((`(,key . ,value) pair))
         (format
          "%s %s" ;; Line formatting
          key     ;; Key formatting
          (cl-typecase value ;; Value formatting; cases name types, not predicates
            (symbol (symbol-name value))
            (number (number-to-string value))
            (string value)))))
     pairs) ;; The cons list
    "\n")))

Aside from the helpful with-temp-file macro and the destructuring pcase-let, the only nuance here is formatting each value by its type with typecase. How you format values determines which values you can put in the list. For example, with polipo, our list of cons cells, with some stylistic backquoting, looks something like this:

`(("proxyAddress" . "0.0.0.0")
  ("allowedClients" . "127.0.0.1")
  ("diskCacheRoot" . ,fn/w3m-polipo-cache-dir)
  ("proxyPort" . ,fn/w3m-polipo-port)
  ("cacheIsShared" . false)
  ("socksParentProxy" .
   ,(format "%s:%s" "localhost" (number-to-string fn/w3m-tor-port)))
  ("socksProxyType" . socks5))

Although the keys are all strings, notice that the values have different types. For polipo, strings are quoted while numbers and symbols are stringified; if we used a plain list of line strings, we would have to do that formatting ourselves. With this setup, the configuration looks configurable and proper. One caveat is that when a value changes, the files need to be regenerated and the processes restarted. This is natural since they are external to Emacs. With the generation wrapped in a function, fn/w3m-polipo-tor-update-conf, invoking it after each change is not a big issue.
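The post only shows the tor formatter inline. As a minimal sketch, the line format can be passed in as a function so one writer serves both tools. The names fn/w3m-format-value, fn/w3m-tor-line, fn/w3m-polipo-line, fn/w3m-conf-string and fn/w3m-write-conf are hypothetical, and the polipo formatter assumes polipo's `name = value` syntax with double-quoted strings.

```elisp
(require 'cl-lib)   ; cl-typecase
(require 'subr-x)   ; string-join

(defun fn/w3m-format-value (value)
  "Stringify VALUE: symbols by name, numbers as digits, strings as-is."
  (cl-typecase value
    (symbol (symbol-name value))
    (number (number-to-string value))
    (string value)))

(defun fn/w3m-tor-line (key value)
  "Format a torrc line as KEY VALUE."
  (format "%s %s" key (fn/w3m-format-value value)))

(defun fn/w3m-polipo-line (key value)
  "Format a polipo line as KEY = VALUE, double-quoting string values."
  (format "%s = %s" key
          (if (stringp value)
              (prin1-to-string value)   ; adds the surrounding quotes
            (fn/w3m-format-value value))))

(defun fn/w3m-conf-string (pairs line-fn)
  "Render the (KEY . VALUE) PAIRS with LINE-FN, one line each."
  (string-join
   (mapcar (lambda (pair) (funcall line-fn (car pair) (cdr pair))) pairs)
   "\n"))

(defun fn/w3m-write-conf (file pairs line-fn)
  "Write PAIRS to FILE, formatting each line with LINE-FN."
  (with-temp-file file
    (insert (fn/w3m-conf-string pairs line-fn) "\n")))
```

Under this sketch, an update function like fn/w3m-polipo-tor-update-conf would just call fn/w3m-write-conf once per tool, passing #'fn/w3m-tor-line or #'fn/w3m-polipo-line.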

As an aside, defcustom variables support a kind of data binding through their :set and :get functions, which customize-set-variable honors, but that isn't necessary for a small shim. Also, it is fascinating how generating the files with Elisp binds the data and the code.
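For the curious, here is a sketch of that binding, assuming the port is a defcustom (the post does not say how fn/w3m-tor-port is defined, and fn/w3m-set-and-regenerate is a hypothetical name). Setting the variable through customize-set-variable then regenerates the files automatically.

```elisp
(defun fn/w3m-set-and-regenerate (symbol value)
  "Set SYMBOL to VALUE, then regenerate the proxy configuration files."
  (set-default symbol value)
  ;; fn/w3m-polipo-tor-update-conf is the post's generator; skip it if undefined
  (when (fboundp 'fn/w3m-polipo-tor-update-conf)
    (fn/w3m-polipo-tor-update-conf)))

(defcustom fn/w3m-tor-port 9050
  "Port tor listens on for SOCKS connections."
  :type 'integer
  :group 'w3m
  ;; Don't run the :set function (and write files) when this form is loaded
  :initialize #'custom-initialize-default
  :set #'fn/w3m-set-and-regenerate)
```

A plain setq bypasses :set, so only changes made through Customize or customize-set-variable trigger the regeneration.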

Next are the options we care about, since I am assuming you didn't really read the manuals.

tor Options

The configuration values for tor:

`(("SocksPort" . ,fn/w3m-tor-port)
  ("DataDirectory" . ,fn/w3m-tor-cache-dir)
  ("ControlPort" . ,(1+ fn/w3m-tor-port))
  ("DisableDebuggerAttachment" . 0))
SocksPort
The port tor listens on for SOCKS connections. This is the port polipo points to.
DataDirectory
The directory where tor keeps its data. Optional, but it should be set since the default is a system directory.
ControlPort
Optional control port for monitoring tor with tools such as arm.
DisableDebuggerAttachment
Set to 0 to allow tools such as arm to inspect the tor process; by default tor tries to block debugger attachment.

All we strictly need is SocksPort, the interface port; everything else is for portability or monitoring.
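Since the configuration lives in a plain file, tor itself can check it before we start anything. A minimal sketch using tor's --verify-config flag; fn/w3m-tor-verify-conf and the *w3m-tor-verify* buffer name are hypothetical.

```elisp
(defun fn/w3m-tor-verify-conf (conf-file)
  "Run `tor --verify-config -f CONF-FILE'; return non-nil if it passes.
Output goes to the *w3m-tor-verify* buffer.  Return nil if tor is not installed."
  (when (executable-find "tor")
    (let ((buffer (get-buffer-create "*w3m-tor-verify*")))
      (with-current-buffer buffer (erase-buffer))
      ;; call-process returns the exit status; 0 means the config is valid
      (eq 0 (call-process "tor" nil buffer nil
                          "--verify-config" "-f" conf-file)))))
```

For example, (fn/w3m-tor-verify-conf fn/w3m-tor-conf-file) right after regenerating the file.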

polipo Options

The configuration values for polipo:

`(("proxyAddress" . "0.0.0.0")
  ("allowedClients" . "127.0.0.1")
  ("diskCacheRoot" . ,fn/w3m-polipo-cache-dir)
  ("proxyPort" . ,fn/w3m-polipo-port)
  ("cacheIsShared" . false)
  ("socksParentProxy" .
   ,(format "%s:%s" "localhost" (number-to-string fn/w3m-tor-port)))
  ("socksProxyType" . socks5))
proxyAddress
The address polipo listens on. "0.0.0.0" means all interfaces; use "127.0.0.1" to listen on localhost only.
allowedClients
The client addresses allowed to connect. Again for portability, only localhost is allowed.
diskCacheRoot
The cache directory, like DataDirectory for tor.
proxyPort
The port polipo listens on. This is the port w3m uses.
cacheIsShared
Set to false when the cache is private to a single user, as it is here.
socksParentProxy
The SOCKS proxy, as host:port, that polipo forwards requests to. It must match tor's SocksPort; this is where tor and polipo meet.
socksProxyType
The SOCKS protocol version. The default, socks5, is what tor speaks.

polipo is a little more nuanced since it is the middleman. What is strictly needed here are proxyPort and socksParentProxy, which are just the interface ports.
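Because polipo and tor only meet at these ports, a quick sanity check is whether each port accepts connections. A minimal sketch; fn/w3m-port-open-p is a hypothetical helper, not part of w3m.

```elisp
(defun fn/w3m-port-open-p (port &optional host)
  "Return non-nil if something accepts TCP connections on HOST:PORT.
HOST defaults to 127.0.0.1."
  (condition-case nil
      (let ((probe (open-network-stream "w3m-port-probe" nil
                                        (or host "127.0.0.1") port)))
        ;; The connection succeeded; close it right away
        (delete-process probe)
        t)
    ;; Connection refused or unreachable
    (error nil)))
```

For example, (and (fn/w3m-port-open-p fn/w3m-tor-port) (fn/w3m-port-open-p fn/w3m-polipo-port)) tells you whether both links of the chain are up.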

Now that the options are clear, we move to our browser options.

w3m Options

We now look at the main browser and the only external option it needs, http_proxy. This simply means adding -o followed by http_proxy=http://127.0.0.1:<polipo-port> to w3m-command-arguments. One nuance is duplicating it for https_proxy as well:

(setq w3m-command-arguments
      (append w3m-command-arguments
              ;; polipo speaks plain HTTP (HTTPS is tunneled through it),
              ;; so both proxy URLs use http://
              (list "-o"
                    (format "http_proxy=http://127.0.0.1:%s/"
                            fn/w3m-polipo-port))
              (list "-o"
                    (format "https_proxy=http://127.0.0.1:%s/"
                            fn/w3m-polipo-port))))
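One wrinkle: append runs every time the form is evaluated, so reloading the configuration stacks duplicate -o options. A minimal sketch that replaces any previous proxy options instead; both function names are hypothetical.

```elisp
(defun fn/w3m-proxy-arguments (port)
  "Return w3m -o options pointing both proxy settings at polipo on PORT."
  (list "-o" (format "http_proxy=http://127.0.0.1:%s/" port)
        "-o" (format "https_proxy=http://127.0.0.1:%s/" port)))

(defun fn/w3m-with-proxy-arguments (args port)
  "Return ARGS with any old proxy -o options replaced by ones for PORT."
  (let (kept)
    (while args
      (if (and (equal (car args) "-o")
               (stringp (cadr args))
               (string-match-p "\\`https?_proxy=" (cadr args)))
          (setq args (cddr args))   ; drop "-o" and its proxy value
        (push (car args) kept)
        (setq args (cdr args))))
    (append (nreverse kept) (fn/w3m-proxy-arguments port))))
```

Usage: (setq w3m-command-arguments (fn/w3m-with-proxy-arguments w3m-command-arguments fn/w3m-polipo-port)), which can be evaluated any number of times.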

Note that w3m has its own configuration file, so these options could be set there instead, but that breaks the data and code binding. It is now just a matter of starting the appropriate processes when w3m loads:

(setq fn/w3m-tor-process
      (start-process "w3m-tor" "*w3m-tor*" "tor" "-f" fn/w3m-tor-conf-file)
      fn/w3m-polipo-process
      (start-process "w3m-polipo" "*w3m-polipo*" "polipo" "-c" fn/w3m-polipo-conf-file))

Aside from managing the processes, all the pieces should work together properly.
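The post later calls fn/w3m-polipo-tor-start-process without showing it. Here is one possible shape, a sketch that only starts a process if it is not already running. The default file paths and the program-name variables are hypothetical.

```elisp
(defvar fn/w3m-tor-process nil "Process object for the w3m tor instance.")
(defvar fn/w3m-polipo-process nil "Process object for the w3m polipo instance.")
;; Hypothetical defaults; the post generates these files itself
(defvar fn/w3m-tor-conf-file (expand-file-name "w3m-torrc" user-emacs-directory))
(defvar fn/w3m-polipo-conf-file (expand-file-name "w3m-polipo.conf" user-emacs-directory))
(defvar fn/w3m-tor-program "tor")
(defvar fn/w3m-polipo-program "polipo")

(defun fn/w3m-polipo-tor-start-process ()
  "Start tor and polipo for w3m unless they are already running."
  (interactive)
  (unless (and (processp fn/w3m-tor-process)
               (process-live-p fn/w3m-tor-process))
    (setq fn/w3m-tor-process
          (start-process "w3m-tor" "*w3m-tor*"
                         fn/w3m-tor-program "-f" fn/w3m-tor-conf-file)))
  (unless (and (processp fn/w3m-polipo-process)
               (process-live-p fn/w3m-polipo-process))
    (setq fn/w3m-polipo-process
          (start-process "w3m-polipo" "*w3m-polipo*"
                         fn/w3m-polipo-program "-c" fn/w3m-polipo-conf-file))))
```

The liveness check makes the command safe to run repeatedly, for example every time w3m loads.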

If you don't need anonymity for specific hosts or domains, exclude them via w3m-no-proxy-domains:

(add-to-list 'w3m-no-proxy-domains "127.0.0.1")
(add-to-list 'w3m-no-proxy-domains "localhost")

Personally, without this I can't test my blog locally since I get a proxy error. Adjust it to your own setup. Generalizing that, here is a command to add the current site as a proxy exception:

(defun fn/w3m-add-current-host-to-no-proxy-domains ()
  "Add the current host to `w3m-no-proxy-domains' and reload the page."
  (interactive)
  (when (eq major-mode 'w3m-mode)
    ;; A plain let* suffices; lexical-let* from the old cl package is obsolete
    (let* ((parts (w3m-parse-http-url w3m-current-url))
           (host (elt parts 1)))
      (add-to-list 'w3m-no-proxy-domains host t)
      (w3m-reload-this-page))))
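If you would rather not depend on the layout of w3m-parse-http-url's return value, Emacs's bundled url library can extract the host. A sketch with a hypothetical helper, fn/w3m-url-host:

```elisp
(require 'url-parse)

(defun fn/w3m-url-host (url)
  "Return the host part of URL, or nil if it has none."
  (let ((host (url-host (url-generic-parse-url url))))
    (unless (or (null host) (equal host ""))
      host)))
```

Inside the command, (add-to-list 'w3m-no-proxy-domains (fn/w3m-url-host w3m-current-url) t) would then replace the let* binding.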

Auto Start And Kill

When w3m loads, it might be convenient to start the proxies automatically. However, these processes are not as lightweight as flyspell or the like, so asking for confirmation as a reminder might be wise. A simple snippet does the job:

(when (yes-or-no-p "Start polipo and tor for w3m? ")
  (fn/w3m-polipo-tor-start-process))

The flip side, stopping the processes, should happen when Emacs terminates; however, for some reason tor doesn't stop cleanly, which leaves a stray process that list-processes does not show. Strangely, polipo closes properly but tor does not. A stronger guarantee is to kill the processes as part of Emacs's shutdown:

(add-hook 'kill-emacs-hook #'fn/w3m-polipo-tor-kill-process)
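The body of fn/w3m-polipo-tor-kill-process isn't shown in the post. One possible sketch sends SIGTERM, which the tor manual describes as a clean shutdown request, unlike SIGHUP, which only makes tor reload.

```elisp
(defvar fn/w3m-tor-process nil "Process object for the w3m tor instance.")
(defvar fn/w3m-polipo-process nil "Process object for the w3m polipo instance.")

(defun fn/w3m-polipo-tor-kill-process ()
  "Send SIGTERM to the tor and polipo processes started for w3m."
  (interactive)
  (dolist (proc (list fn/w3m-polipo-process fn/w3m-tor-process))
    (when (and (processp proc) (process-live-p proc))
      ;; SIGTERM asks the process to clean up and exit
      (signal-process proc 'TERM))))
```

Using signal-process rather than kill-process (which sends SIGKILL) gives tor a chance to sync its data directory before exiting.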

This hook is not a true guarantee: if Emacs is killed externally, the hook does not run and tor keeps running into the next session. You can check with proced whether tor is still running and kill it accordingly. This is apparently standard behavior for daemons.

When Emacs is killed, its child processes, most importantly those created with start-process, receive a SIGHUP signal. As the tor manual states:

SIGHUP
The signal instructs Tor to reload its configuration (including closing and reopening
logs), and kill and restart its helper processes if applicable.

This means that when Emacs is killed externally, tor reloads its configuration instead of exiting. Sadly, changing or intercepting the signal tor receives is not easy. For now, this issue is out of scope for a simple process glue. You could write a custom script to start tor, but this is a reminder that we can't control everything without lower-level work.

Conclusion

With this, we joined w3m, polipo, and tor to browse more privately, all within Emacs. It's not perfect, but it gets the job done. Some features or aspects could still be improved:

  • Data and argument binding, managing w3m-command-arguments as well
  • Error handling
  • Process management on open and close of w3m, maybe prodigy?
  • Proxy fine-tuning and configuration; check out the custom polipo conf from tor

As a disclaimer, this does not make you fully anonymous, especially when you need to log in, but it is an improvement over private browsing.