Problem

Good news: I was able to make my gnus work, so now I can read emails easily, productively and in Emacs. Now I am trigger-happy starring and watching GitHub repos. I am the observer, the watcher.

Bad news: I have 300 emails coming in every day, especially from popular repos such as react-native or spacemacs, which get a lot of issue traffic. Nothing new or unexpected, but having ideas for improving the information processing is a good thing.

Thankfully, the email reader allows me to sift through it rather quickly, but I do lament that I might be missing something due to the speed at which I am glancing at each issue. So I remembered something.

Is there a text summarization tool?

Of course there is. If a tool can help me understand a long discussion or issue quicker, then profit. So here is my experience wiring up a text summary tool, sumy, and binding it to gnus. For the impatient, here is my working snippet for this task. Not perfect yet, but I might come back to it once I learn more. You can find this in my .gnus configuration, but that is somewhat private, so here it is.

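(require 'cl)       ; lexical-let* (obsolete; plain `let*' is enough under lexical-binding)
(require 'subr-x)   ; string-join
(require 'gnus-art) ; gnus-with-article-buffer
(require 'message)  ; message-goto-body and friends
(require 'deferred)
(require 's)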

(defconst fn/cache-dir "~"
  "Where you keep your moving files")

(defconst fn/summary-text-file (expand-file-name ".text" fn/cache-dir)
  "Summary temporary file")

(defun fn/article-body ()
  "Get current article buffer message"
  (gnus-with-article-buffer
    (let ((start (prog2 (message-goto-body) (point)))
          (end (prog2 (message-goto-signature) (point))))
      (buffer-substring-no-properties start end))))

(defun fn/summarize-text (text)
  "Summarize text for easier comprehension"
  (with-temp-file fn/summary-text-file
    (insert text))
  (deferred:nextc
    (deferred:process-shell "sumy" "lex-rank" (concat "--file=" fn/summary-text-file))
    (lambda (summary-output)
      (s-split "\n" (s-trim-right summary-output)))))

(defvar fn/prev-summarizing-request nil
  "The previous request made; kept to prevent extra requests being made.
This can be made into a function if so desired")

(defun fn/gnus-article-summary ()
  "Show a summary for each article I visit"
  (when fn/prev-summarizing-request
    (deferred:cancel fn/prev-summarizing-request))

  (lexical-let* ((current-message (fn/article-body))
                 (summary (fn/summarize-text current-message)))
    (setq fn/prev-summarizing-request summary)

    (deferred:$
      summary
      (deferred:nextc it
        (lambda (summarizies)
          (let ((summary-text (string-join
                               (mapcar (lambda (summarization)
                                         (concat "* " summarization))
                                       summarizies)
                               "\n")))
            (gnus-with-article-buffer
              (message "Summarizing article")
              (message-goto-eoh)

              (read-only-mode +1)

              (insert "\n")
              (insert "Summary:")
              (insert "\n")
              (insert summary-text)
              (insert "\n")
              (insert "-------")
              (insert "\n")

              (read-only-mode 0)))))
      (deferred:nextc it
        (lambda ()
          (setq fn/prev-summarizing-request nil))))))

(add-hook 'gnus-article-prepare-hook #'fn/gnus-article-summary)

So what this does is, whenever I open an article, call sumy on the message and get a list of sentences that are important to read. I then hack a little summary on top of the message, before the content, to show me.

I could have used other tools or other features of gnus such as adaptive scoring, or done it in a better way, or what have you, but this is good enough.

The Tool And The Glue

I'm more of a Python guy and I've heard of NLTK, so, DuckDuckGo: what Python text summary tool is nice to use? After a few searches and experiments, sumy looked like a good enough command line tool. So what is good enough?

I don't know the algorithms in depth, but after some experimentation and reading I settled on the lex-rank algorithm, which favors the sentences most central to the text. Easy enough to understand, I suppose.
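For a feel of what "central sentences" means, here is a toy sketch in Emacs Lisp. It is not sumy's code and not LexRank proper: LexRank builds a graph from TF-IDF cosine similarity and ranks it PageRank-style, while this simplification just scores each sentence by its summed word overlap with the rest. All the fn/toy-* names are made up for illustration.

```elisp
(require 'cl-lib)
(require 'seq)

(defun fn/toy-words (sentence)
  "Return the distinct lowercase words of SENTENCE."
  (delete-dups (split-string (downcase sentence) "[^[:alnum:]]+" t)))

(defun fn/toy-similarity (a b)
  "Word-overlap (Jaccard) similarity between distinct word lists A and B."
  (let* ((shared (length (seq-intersection a b)))
         (total (- (+ (length a) (length b)) shared)))
    (if (zerop total) 0.0 (/ (float shared) total))))

(defun fn/toy-central-sentences (sentences count)
  "Return the COUNT most central SENTENCES, kept in their original order."
  (let* ((bags (mapcar #'fn/toy-words sentences))
         ;; Score = summed similarity to every sentence.  Self-similarity
         ;; is included; it adds the same 1.0 to each non-empty sentence.
         (scores (mapcar (lambda (bag)
                           (apply #'+ (mapcar (lambda (other)
                                                (fn/toy-similarity bag other))
                                              bags)))
                         bags))
         (indices (number-sequence 0 (1- (length sentences))))
         (rows (cl-mapcar #'list indices sentences scores))
         ;; Highest score first, then keep only the top COUNT rows.
         (best (seq-take (sort rows (lambda (x y) (> (nth 2 x) (nth 2 y))))
                         count)))
    ;; Put the winners back in reading order.
    (mapcar #'cadr (sort best (lambda (x y) (< (car x) (car y)))))))
```

Sentences that share vocabulary with many others float to the top, and an off-topic aside sinks, which is roughly the intuition behind picking "central" sentences.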

So how do we tie it up? First is getting the message from the article buffer. I was hoping for a gnus-article-message function or something, but apparently even the gnus code doesn't clearly have one. Here is what I came up with.

(defun fn/article-body ()
  "Get current article buffer message"
  (gnus-with-article-buffer
    (let ((start (prog2 (message-goto-body) (point)))
          (end (prog2 (message-goto-signature) (point))))
      (buffer-substring-no-properties start end))))

As you can see, it uses buffer magic and some tinkering. Now how do we plug it into the tool? I hoped for easy piping, but the tool reads from files instead, a little tangle. I filed an issue, but in the meantime it can be tied together with this simple snippet.

(defconst fn/summary-text-file (expand-file-name ".text" fn/cache-dir)
  "Summary temporary file")

(defun fn/summarize-text (text)
  "Summarize text for easier comprehension"
  (with-temp-file fn/summary-text-file
    (insert text))
  (deferred:nextc
    (deferred:process-shell "sumy" "lex-rank" (concat "--file=" fn/summary-text-file))
    (lambda (summary-output)
      (s-split "\n" (s-trim-right summary-output)))))

Here I use the nice deferred library to run a shell command and return a deferred as well. Why deferred? Asynchronous, non-blocking operation. You can use shell-command-to-string and make it synchronous, but when you're reading a lot of email the wait time compounds. deferred:process-shell is non-blocking and easier to use than make-process or start-process. Like a promise, the continuation processes the raw string output into a list of sentences. Easy enough.
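For comparison, the blocking shell-command-to-string route could look roughly like the sketch below. The function names are hypothetical, the sumy invocation is the same one used above, and unlike the deferred version it writes to a throwaway temp file and drops empty output lines. It needs sumy on the PATH to actually run.

```elisp
(require 'subr-x) ; string-trim-right

(defun fn/sumy-command (file)
  "Build the sumy shell command that summarizes FILE with lex-rank."
  (concat "sumy lex-rank --file=" (shell-quote-argument file)))

(defun fn/parse-summary (output)
  "Split raw sumy OUTPUT into a list of sentences, one per non-empty line."
  (split-string (string-trim-right output) "\n" t))

(defun fn/summarize-text-sync (text)
  "Blocking variant: write TEXT to a temp file, run sumy, return sentences."
  (let ((file (make-temp-file "summary-" nil ".text")))
    (unwind-protect
        (progn
          (with-temp-file file
            (insert text))
          ;; Emacs waits here until sumy exits.
          (fn/parse-summary (shell-command-to-string (fn/sumy-command file))))
      ;; Clean up the temp file even if sumy fails.
      (delete-file file))))
```

Every article you open would freeze Emacs for as long as sumy takes to start and finish, which is exactly the wait the deferred version avoids.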

So how do we tie this in with gnus?

Gnus Article

This is the sad part for me: I hacked the buffer content manually. I first tried font-lock, which sort of works but not consistently… yet. I did try manipulating gnus-emphasis-alist but no dice. After so many hacks, this is what I ended up with. This snippet should explain it all.

(defvar fn/prev-summarizing-request nil
  "The previous request made; kept to prevent extra requests being made.
This can be made into a function if so desired")

(defun fn/gnus-article-summary ()
  "Show a summary for each article I visit"
  (when fn/prev-summarizing-request
    (deferred:cancel fn/prev-summarizing-request))

  (lexical-let* ((current-message (fn/article-body))
                 (summary (fn/summarize-text current-message)))
    (setq fn/prev-summarizing-request summary)

    (deferred:$
      summary
      (deferred:nextc it
        (lambda (summarizies)
          (let ((summary-text (string-join
                               (mapcar (lambda (summarization)
                                         (concat "* " summarization))
                                       summarizies)
                               "\n")))
            (gnus-with-article-buffer
              (message "Summarizing article")
              (message-goto-eoh)

              (read-only-mode +1) ;; I feel this is evil

              (insert "\n")
              (insert "Summary:")
              (insert "\n")
              (insert summary-text)
              (insert "\n")
              (insert "-------")
              (insert "\n")

              (read-only-mode 0)))))
      (deferred:nextc it
        (lambda ()
          (setq fn/prev-summarizing-request nil))))))

Aside from the wrapper that keeps tabs on the current summary request, the core of the function is the gnus-with-article-buffer form. It just adds a small summary section right before the message begins, and since the whole operation is asynchronous, the summary appears a blink after the article does. Neither building the section nor the deferred (promise-style) continuation and housekeeping is hard. It is pretty straightforward code.

I do lament toggling read-only-mode, which breaks the contract that the article buffer is immutable. There is a proper mode for editing, gnus-article-edit-mode, but that modifies the backing article. What we need is simply a display aid. I do pray I find a more legit way of doing this, but for now, this does show a summary.
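One candidate for such a display aid is an overlay with a before-string property, which draws text on screen without touching the buffer contents. The sketch below is an assumption on my part, not something I have run against Gnus's own redisplay; fn/show-summary-overlay and the fn/summary overlay property are made-up names.

```elisp
(defun fn/show-summary-overlay (buffer position summary-text)
  "Display SUMMARY-TEXT at POSITION in BUFFER without changing its text.
Return the overlay that carries the summary."
  (with-current-buffer buffer
    ;; Drop any summary overlay left over from a previous call.
    (remove-overlays (point-min) (point-max) 'fn/summary t)
    (let ((overlay (make-overlay position position)))
      ;; Tag the overlay so the cleanup above can find it later.
      (overlay-put overlay 'fn/summary t)
      ;; `before-string' is drawn on screen only; the buffer stays untouched.
      (overlay-put overlay 'before-string
                   (concat "Summary:\n" summary-text "\n-------\n"))
      overlay)))

;; Hypothetical use inside the summary callback:
;; (fn/show-summary-overlay
;;  gnus-article-buffer
;;  (with-current-buffer gnus-article-buffer
;;    (save-excursion (message-goto-eoh) (point)))
;;  summary-text)
```

Since nothing is inserted, there is no need to fiddle with read-only-mode at all. A stale overlay would still have to be cleared when an article gets no summary.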

This is harder than it looks and I mulled this over for hours.

Limitation

As the astute reader may notice, sometimes the summary is unhelpful or redundant.

Summary Snapshot

This is a common thing: you can't expect a machine to understand what you want. The screenshot above shows that the summary might be the same as the email you are reading, and in that case the summary is redundant and sadly useless. Obviously, it gets worse when the message contains code, which a text processor cannot understand.

But I did come across a long issue email where the summary really did show the points I am interested in. I am not looking for the perfect tool, just something that is good enough without sacrificing too much. I still skim through the text myself; if the summary helps, then it is a bonus.

Again, it is not perfect but it is okay.

Conclusion

So with this shiv of an email text analysis, maybe we can do better? Perhaps, once I learn more and update this code. But text analysis is pretty interesting. Emails, buffers, or maybe diary journals? I may be scratching the surface here and not showing its true strength, but the idea is there: text analysis for email reading. Maybe someone can do a better job?

As for now, time to check my mail.

2016-08-13 Update

I got obsessive about the code so here is a revised edition of the core code that does it appropriately.

(require 'dash)

(defface fn/article-aid-face
  '((t (:weight bold :height 1.1
        :box (:line-width 2 :color "grey75" :style released-button))))
  "Article aid face")

(defvar fn/article-aid-face 'fn/article-aid-face
  "Article aid var")


(defvar fn/prev-summarizing-request nil
  "The previous request made; kept to prevent extra requests being made.
This can be made into a function if so desired")

(defun fn/gnus-article-summary ()
  "Show a summary for each article I visit"
  (when fn/prev-summarizing-request
    (deferred:cancel fn/prev-summarizing-request))

  (lexical-let* ((current-message (fn/article-body))
                 (summary (fn/summarize-text current-message)))
    (setq fn/prev-summarizing-request summary)

    (deferred:$
      summary
      (deferred:nextc it
        (lambda (summarizies)
          (with-current-buffer gnus-summary-buffer
            (let ((summarizies summarizies))
              (setq gnus-article-emphasis-alist
                    (-concat
                     (mapcar (lambda (summary-text)
                               (list
                                (regexp-quote summary-text)
                                0
                                0
                                'fn/article-aid-face))
                             summarizies)
                     gnus-article-emphasis-alist))))
          (with-current-buffer gnus-article-buffer
            (message "Emphasizing article aids")
            (article-emphasize))
          (with-current-buffer gnus-summary-buffer
            (setq gnus-article-emphasis-alist
                  (-filter (lambda (emphasis)
                             (not (eq (nth 3 emphasis) fn/article-aid-face)))
                           gnus-article-emphasis-alist)))))
      (deferred:nextc it
        (lambda ()
          (setq fn/prev-summarizing-request nil))))))

(add-hook 'gnus-article-prepare-hook #'fn/gnus-article-summary)

Here is a screenshot of this new snippet

Emphasized Article

Not the best screenshot, but what this snippet does is emphasize and highlight key sentences. The correct way of emphasizing an article in gnus is article-emphasize, so I homed in on that.

Ideally, you just add the phrases you want emphasized to the list gnus-article-emphasis-alist. Obvious enough, but the first problem I came across was that it wasn't working even with the simplest configuration. After two hours of useless mutations, I checked the code for article-emphasize, which is quite deceptive.

Long story short, you have to mutate it within the gnus-summary-buffer and call article-emphasize in the gnus-article-buffer, which is weird. Or have I not read the manual enough? Probably the latter, but with that out of the way, the code followed.

I do admit the hacked code was a little annoying in that it pushed down the text I was reading from time to time, so this correct code is much better. Sadly there is an extra requirement, dash, plus the weird with-current-buffer juggling and state management of gnus-article-emphasis-alist so that entries don't stack up during prolonged use.

Well, state. Maybe a macro can do this, but I don't know which one yet. Whatever. And now I can definitely get back to reading my mail.
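I am not aware of a built-in macro for this exact juggling, but a small hand-rolled one could look like the sketch below. fn/with-extra-entries is a made-up name, and note that it restores the old value afterwards instead of filtering by face as the code above does.

```elisp
(defmacro fn/with-extra-entries (buffer variable entries &rest body)
  "In BUFFER, prepend ENTRIES to VARIABLE while BODY runs, then restore it."
  (declare (indent 3))
  (let ((buf (make-symbol "buf"))
        (old (make-symbol "old")))
    `(let* ((,buf ,buffer)
            ;; Remember the value VARIABLE has in BUFFER right now.
            (,old (buffer-local-value ',variable ,buf)))
       (unwind-protect
           (progn
             (with-current-buffer ,buf
               (setq ,variable (append ,entries ,old)))
             ,@body)
         ;; Runs even if BODY signals, so entries never stack up.
         (with-current-buffer ,buf
           (setq ,variable ,old))))))

;; Hypothetical use in the callback above:
;; (fn/with-extra-entries gnus-summary-buffer gnus-article-emphasis-alist
;;     (mapcar (lambda (text)
;;               (list (regexp-quote text) 0 0 'fn/article-aid-face))
;;             summarizies)
;;   (with-current-buffer gnus-article-buffer
;;     (article-emphasize)))
```

The unwind-protect is the point of the design: the alist goes back to its previous value even when the emphasizing step fails halfway.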

Minor Reflection

By the way, there has been research on using machine learning and automatic summarization with emails, which is pretty cool. I wonder how that could apply to reading source code instead of plain text? I'm not hoping for too much, but it is still pretty nice to think about all the things you can analyze with a tool.

Currently, I'm using conkeror as my browser, which is extensible and can run shell commands. This means I can run sumy and plug in the current URL, which gives me a summary of the current page. Basically, an article summary in one command. Here is my rookie snippet.

require("io");
require("spawn-process");
require("interactive");

interactive(
    'gist',
    'What is this webpage all about?', function(I) {
        var url = I.buffer.current_uri.spec;
        var sumy_cmd = '/home/fnmurillo/.local/bin/sumy';
        var cmd_str = sumy_cmd + " lex-rank --length 5 --url \"" + url + "\"";

        I.window.minibuffer.message('Shell Command: ' + cmd_str);

        var data = '';
        var error = '';

        var output = yield shell_command(
            cmd_str,
            $fds = [
                {
                    output: async_binary_string_writer("")
                },
                {
                    input: async_binary_reader(function(s) {
                        data += s || "";
                    })
                },
                {
                    input: async_binary_reader(function(s) {
                        error += s || "";
                    })
                }
            ]
        );


        I.window.alert (
            "Here's The Gist\n" +
                "--------\n" +
                data
                .trim()
                .split('.\n')
                .map(function (summary) { return '* ' + summary })
                .join ('.\n'));
    });

It looks dirty, but the key points are shell_command and alert, which pretty much just call the tool and display the result in the poor man's dialog box. I'm still learning and will probably come back to it once I learn more; this is for your enjoyment.

So with that, let me see what the tool says about this article. Kinda meta, which is pretty much a reflection.

Article Gist

Huh… looks very hopeful. What do you think? Does this represent the article you read? And was it helpful? There is definitely some use in having an information processing tool at your fingertips.