Problem

After writing some projectile glue along with .projectile and .gitignore files, I wondered whether I could write code to generate those files: a DSL, so to speak. The reasons are three-fold:

  • Write a simple ignore file structure as Elisp code
  • Allow Elisp variables for dynamic and reusable sections
  • Have a hand in writing a generic DSL

Aside from my hacking sensibilities, the key concerns that motivate this are:

  • Maintaining two ignore files when it is really just one with a different syntax
  • Blocks of repeating sections cannot be reused across different files

The former is syntactic separation while the latter is flexibility and reusability. For my use case with Elisp, I have two projects: my Emacs configuration and my org-jekyll-blogger.el blog project. Using git and projectile, I more or less have to copy my ignore files and tweak them a bit. To address this, I created my own DSL, magin, to handle .gitignore files; .projectile is similar in intent, but magin was the library I had in mind as a proof of concept.

As an example, this is my .emacs.d .gitignore:

# Base files
config.el
personal.el

# Base directory
.cache
extra

# Project specific
working-config.org
my-macros.el

# Library
lib/sandbox
lib/packages
!lib/modules
!lib/scripts

# Block for gtags
GPATH
GTAGS
GRTAGS

# Block for emacs
*.elc
.#*
*Org Src*

It is generated by this DSL:

(magin-write-to-project
 `(delimited ;; Context keyword that separates contexts by newlines

   (context ;; Represents a semantic group
    (comment "Base files") ;; A comment keyword
    (file ,(format "%s.el" (file-name-base fn/config-file))) ;; A file keyword
    (file ,(format "%s.el" (file-name-base fn/personal-file)))) ;; Notice I use an Elisp variable

   (context
    (comment "Base directory") ;; Generates "# Base directory"
    (dir ".cache") ;; A dir keyword
    (dir "extra")) ;; An alias of file that signals the entry is a directory

   (context
    (comment "Project specific")
    (file "working-config.org")
    (file ,(format "%s.el" (file-name-base fmk/macro-file))))


   (context
    (comment "Library")
    (path ;; Represents a path prefix
     "lib" ;; Everything underneath is generated with a "lib" prefix
     (context
      (dir "sandbox")
      (dir "packages"))
     (include
      (dir "modules")
      (dir "scripts"))))

   (block gtags) ;; Reusable blocks
   (block emacs))
 user-emacs-directory) ;; Rewrites .gitignore of `user-emacs-directory'

I do hope the DSL is easy to digest. Here are the two libraries for the interested: magin and projin. If you stay around, I'd like to discuss the following concepts in crafting it:

  • Backquote
  • Dispatching
  • Implicit Environment

Backquoting

The reason I was confident in making this DSL is backquoting: it mixes quoting and evaluated expressions in a Lispy way. I encourage you to understand quoting before proceeding, but as a quick brush-up, quoting treats a list of symbols as data instead of evaluating it.

As an example of quoting:

(+ 1 2 3) ;; 6

'(+ 1 2 3)         ;; (+ 1 2 3)
(quote (+ 1 2 3))  ;; Same as above, written out


(subject linking-verb adjective)  ;; Error, the symbols are undefined
'(subject linking-verb adjective) ;; As is

Quoting also allows for lazy evaluation. The problem arises when we want to bring Elisp variables into the equation:

(setq left 1
      right 2)

(+ left right) ;; 3

(+ 1 right)    ;; 3
'(+ 1 right)   ;; (+ 1 right)

;; Using backquotes
`(+ 1 ,right)           ;; (+ 1 2)
`(+ 1 ,(+ left right))  ;; (+ 1 3)

;; Evaluating the actual value
(eval `(+ 1 ,(+ left right)))  ;; 4

The complicated approach would ask us to create a tokenizer, parser and compiler while managing text matching and all that jazz; Lispy data structures allow us to bypass this through quoting. Without needing anything more, we can create quoted structures that integrate with Elisp without any hassle.
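As a small sketch of how this carries over to an ignore-file form, the snippet below builds one from ordinary Elisp values. The names `my-config-file', `my-cache-dirs' and `my-ignore-form' are made up for illustration and are not part of magin. The comma inserts one evaluated value, while `,@' splices a whole list of forms in place.

```elisp
;;; -*- lexical-binding: t; -*-
;; Hypothetical values standing in for real configuration variables.
(defvar my-config-file "~/.emacs.d/config.org")
(defvar my-cache-dirs '(".cache" "extra"))

(defvar my-ignore-form
  `(context
    (comment "Generated")
    ;; `,' evaluates one expression and inserts its value.
    (file ,(format "%s.el" (file-name-base my-config-file)))
    ;; `,@' evaluates to a list and splices its elements in place.
    ,@(mapcar (lambda (dir) (list 'dir dir)) my-cache-dirs)))

;; `my-ignore-form' is now plain data:
;; (context (comment "Generated") (file "config.el") (dir ".cache") (dir "extra"))
```

Nothing has been compiled at this point; the result is still a list that a compiler can walk.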

Domain Syntax

So in thinking of the DSL, let us understand what the domain is. Our domain of abstraction is gitignore. After some thinking and reading, these are the abstractions I want to address:

  • Files and directories, with no distinction between them
  • Comments
  • Path context
  • Separators
  • Inclusion
  • Reusable blocks or groups

There are other abstractions I do not wish to cover:

  • Quoting or escaping
  • Glob keywords, as in rx.el, for single or double asterisk

A DSL that accomplishes the above is good enough without being too abstract. To avoid stretching too far, this DSL has no intermediate output; it translates directly to text. This tradeoff loses flexibility in output, but it is easier to implement. If you wanted one, the intermediate form could mechanically be a list of plists, each with a main text manipulated by several properties, as such:

(magin--compiler
 '(path "glob"
        (include
         (file "foo.bar")
         (path "gib"
               (dir "foo-bar")))))

;; This hypothetically might yield
'((:line "foo.bar" :include t :parent "glob")
  (:line "foo-bar" :include t :parent "glob/gib"))

Setting that extra complexity aside, we can start hacking with the simple syntax.

Keywords

At last, we define our data structure:

;; The leaf keywords
(file ,line)
(dir ,line) ;; Alias for file

;; Non functional nodes
(comment ,text)
(newline)


;; Syntactic grouping, the equivalent of progn
(context &rest ,sublines)

;; Contextual grouping, like `context' but affects everything inside it
;; Sublines have the property `:include' as `t'
;; Which tells the leaf node to add `!' as a prefix
(include &rest ,sublines)

;; Sublines have the property `:parent' set to `path'
;; Nesting of paths is handled
(path ,path &rest ,sublines)


;; Block keywords
;; Create a variable
(defblock ,block-name &rest ,block-lines)

;; Call the variable
(block ,block-name)


;; Aesthetic grouping
;; Every subline is interleaved with a newline
(delimited &rest ,sublines)

Those are the keywords that we must handle in our DSL. With this in mind, we can start by creating a function that handles the compilation.

(defun magin--compiler (dsl &optional env)
  "Compiles DSL with the environment ENV."
  nil)

This is our DSL handler: every node that wants to compile its subnodes has to go through this dispatching function. This setup allows us to add more keywords independently of each other. However, we are not going to use a long switch or cond statement; we will use the implicit Elisp environment to find handlers. When the compiler comes to a keyword, it looks for a function in the Elisp environment whose name is the magin--dsl- prefix followed by the keyword, and invokes that as the handler.

(defun magin--dsl-context (dsl env)
  "Context keyword handler.")

(defun magin--dsl-file (dsl env)
  "File keyword handler.")

(defun magin--dsl-block (dsl env)
  "Block keyword handler.")

(magin--compiler
 `(context
   (file "a")
   (block emacs)
   (unknown) ;; Error, no `magin--dsl-unknown'
   ))

In the example above, the compiler finds the three handlers and calls them, but fails on the last keyword. So if you want to extend the DSL without touching the source, you can simply define a function with the prefix and the compiler will detect it. This is similar to how use-package handles its keywords: a global dependency injection, if you will. This magic is done through intern-soft:

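;; `rule' is the head symbol of the form, e.g. `file' in (file "a")
(let* ((rule-name (symbol-name rule))
       (rule-handler
        (intern-soft
         (format "%s%s" magin-dsl-prefix rule-name))))
  ;; `intern-soft' only tells us the symbol exists;
  ;; `fboundp' checks that it actually names a function
  (if (not (and rule-handler (fboundp rule-handler)))
      (error "No rule to handle %s at dsl: %s" rule-name dsl)
    (funcall rule-handler dsl env)))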

As a side note, you can use a hash table or create your own obarray or symbol environment if you really want a private space. With our approach, you can inspect the handlers with the helpful describe-function (or describe-variable for the block definitions later on) without extra work.

If you want to evaluate the raw DSL directly, for example with eval-last-sexp, you have to create your own eval and env. You can remap eval to magin--compiler, but you need to know whether it is evaluating DSL or plain Lisp. We're not creating a new interpreter or environment, so this is good enough.

With this simple mechanism, we can define the keywords incrementally.
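To make the dispatching idea concrete, here is a minimal, self-contained sketch. Every name in it uses a made-up `my-dsl-' prefix so it is not mistaken for magin's own code; the handlers shown are my guess at the simplest possible behavior, not the library's implementation.

```elisp
;;; -*- lexical-binding: t; -*-
;; Hypothetical prefix, chosen only for this sketch.
(defvar my-dsl-prefix "my-dsl--rule-")

(defun my-dsl-compile (dsl &optional env)
  "Compile DSL, a list whose head names a rule, using the alist ENV."
  (let* ((rule (car dsl))
         (handler (intern-soft (concat my-dsl-prefix (symbol-name rule)))))
    ;; The symbol must exist and must name a function.
    (if (and handler (fboundp handler))
        (funcall handler dsl env)
      (error "No rule to handle %s at dsl: %S" rule dsl))))

;; Adding a keyword is just defining a function with the prefix.
(defun my-dsl--rule-file (dsl _env)
  "Return the line named by a (file LINE) form."
  (cadr dsl))

(defun my-dsl--rule-comment (dsl _env)
  "Return a (comment TEXT) form as a `#' comment line."
  (concat "# " (cadr dsl)))

(my-dsl-compile '(file "config.el"))      ;; "config.el"
(my-dsl-compile '(comment "Base files"))  ;; "# Base files"
```

Calling `(my-dsl-compile '(unknown))' signals an error, since no function named `my-dsl--rule-unknown' exists.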

Context Keyword

So our dispatcher is a function that takes a DSL form and an environment and returns text. Without an intermediate data structure, the context handler applies the dispatcher to each subline and then joins the results with a newline delimiter, as such:

(string-join
 (mapcar (lambda (subdsl) (magin--compiler subdsl env)) subdsls)
 "\n")

This simply lines up the entries; nothing complicated. The more interesting keywords are the ones that manipulate the environment.
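Put together as a handler, the lining-up step might look like the sketch below. The `my-dsl-' names are hypothetical, and the tiny dispatcher is repeated only so the snippet runs on its own; `mapconcat' is used here in place of string-join plus mapcar.

```elisp
;;; -*- lexical-binding: t; -*-
(defun my-dsl-compile (dsl &optional env)
  "Dispatch DSL to a function named my-dsl--rule-KEYWORD (hypothetical)."
  (let ((handler (intern-soft (format "my-dsl--rule-%s" (car dsl)))))
    (if (and handler (fboundp handler))
        (funcall handler dsl env)
      (error "No rule to handle %S" dsl))))

(defun my-dsl--rule-file (dsl _env)
  "Return the line of a (file LINE) form unchanged."
  (cadr dsl))

(defun my-dsl--rule-context (dsl env)
  "Compile every subform of (context &rest SUBLINES), one per line."
  (mapconcat (lambda (sub) (my-dsl-compile sub env))
             (cdr dsl)
             "\n"))

;; Nested contexts flatten naturally because each level returns text.
(my-dsl-compile '(context (file "a") (context (file "b") (file "c"))))
;; "a\nb\nc"
```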

Environmental Keyword

For a quick way to get environment-like scoping, one can use an alist and it will take care of itself. For example, with the include keyword:

(let ((new-env (cons (cons :include t) env)))
  (magin--dsl-context dsl new-env))

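You could use a plist, but you would have to implement an extend function for it. The alist, in contrast, simulates an environment with lexical scoping without much trouble: assoc returns the first match, so a newer entry shadows an older one while the outer list stays untouched.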
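The scoping behavior of an alist can be seen in isolation with a small sketch. The helpers `my-env-extend' and `my-env-get' are hypothetical names for this example and do not come from magin.

```elisp
;;; -*- lexical-binding: t; -*-
;; Hypothetical helpers; they are not part of magin.
(defun my-env-extend (env key value)
  "Return a new environment with KEY bound to VALUE in front of ENV."
  (cons (cons key value) env))

(defun my-env-get (env key)
  "Return the innermost value of KEY in ENV, or nil."
  (cdr (assoc key env)))

(defvar my-outer-env (my-env-extend nil :parent "lib/"))
(defvar my-inner-env (my-env-extend my-outer-env :include t))
(defvar my-shadow-env (my-env-extend my-inner-env :parent "lib/vendor/"))

(my-env-get my-outer-env :include)   ;; nil, the outer scope is untouched
(my-env-get my-inner-env :include)   ;; t
(my-env-get my-inner-env :parent)    ;; "lib/", inherited from the outer scope
(my-env-get my-shadow-env :parent)   ;; "lib/vendor/", the newest entry wins
```

Because extending never mutates the old list, a handler can pass a bigger environment down to its children and simply drop it on return.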

It is as simple as prepending the cons entry and it is done. The others are implemented the same way. Now how about implementing the leaf keywords?

Leaf Keyword

The implementation above pushes all the formatting down to the leaf nodes, of which only one really matters: file. The file handler has to take the text value and format it based on the environment.

(let* ((parent (cdr (assoc :parent env)))
       (include (if (cdr (assoc :include env)) "!" nil)))
  ;; `concat' treats nil as an empty string
  ;; Assumes PARENT, when set, already ends with a slash
  (concat include parent file))

Our friends are assoc and cdr when manipulating an alist environment. Again, it is simply a matter of formatting. So writing a DSL with this setup is actually easy.
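Here is how the environmental and leaf keywords might fit together in one runnable sketch. All `my-dsl-' names are hypothetical, and storing the parent with a trailing slash is my own assumption about how the prefix gets joined; magin itself may do this differently.

```elisp
;;; -*- lexical-binding: t; -*-
(defun my-dsl-compile (dsl &optional env)
  "Dispatch DSL to a function named my-dsl--rule-KEYWORD (hypothetical)."
  (let ((handler (intern-soft (format "my-dsl--rule-%s" (car dsl)))))
    (if (and handler (fboundp handler))
        (funcall handler dsl env)
      (error "No rule to handle %S" dsl))))

(defun my-dsl--rule-context (dsl env)
  "Compile the subforms of DSL and join them with newlines."
  (mapconcat (lambda (sub) (my-dsl-compile sub env)) (cdr dsl) "\n"))

(defun my-dsl--rule-include (dsl env)
  "Compile the subforms of DSL with :include switched on."
  (my-dsl--rule-context dsl (cons '(:include . t) env)))

(defun my-dsl--rule-path (dsl env)
  "Extend :parent with PATH from (path PATH &rest SUBLINES)."
  (let* ((parent (cdr (assoc :parent env)))
         ;; Assumption: the stored parent always ends with a slash.
         (new-parent (concat parent (cadr dsl) "/")))
    ;; (cdr dsl) is (PATH . SUBLINES); the context handler skips its head.
    (my-dsl--rule-context (cdr dsl) (cons (cons :parent new-parent) env))))

(defun my-dsl--rule-file (dsl env)
  "Format the line of (file LINE) according to ENV."
  (concat (and (cdr (assoc :include env)) "!")
          (cdr (assoc :parent env))
          (cadr dsl)))

;; `dir' is only an alias that documents intent.
(defalias 'my-dsl--rule-dir #'my-dsl--rule-file)

(my-dsl-compile
 '(path "lib"
        (context (dir "sandbox"))
        (include (dir "modules")
                 (path "x" (file "y")))))
;; "lib/sandbox\n!lib/modules\n!lib/x/y"
```

The leaf never needs to know how deep it is nested; it only reads whatever the environment holds when it is reached.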

Block Keyword

Lastly, let's talk about defining a block variable. Like with the dispatcher, we can use the implicit Elisp environment as the variable space. No need to define a hash of symbols and lookups; we already have one and we take advantage of it. The keyword defblock simply saves the definition into a symbol with a magin--block- prefix, and the other keyword, block, looks it up. Aside from intern-soft, the complementary friend is makunbound: defvar leaves an already bound variable alone, so the symbol is unbound first and the value is updated properly.

(let* ((block-name (symbol-name block-symbol))
       (block-def-name (intern
                        (format "%s%s"
                                magin-block-prefix
                                block-name))))
  ;; `defvar' keeps an existing value, so unbind the symbol first
  (makunbound block-def-name)
  (eval `(defvar ,block-def-name '(context ,@block-def)
           ,(format "Block definition for %s" block-name)))
  block-def-name)

Aside from some direct eval magic, it is as straightforward as it gets, and so is the block handler. With this, we can have our reusable blocks of code.

(magin--dsl-defblock
 `(defblock emacs
    (comment "Block for emacs")
    (file "*.elc")
    (file ".#*")
    (file "*Org Src*") ;; Org Src buffers
    )
 (list))
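For comparison, the same store-and-look-up idea can be sketched without eval by using `set' and `symbol-value' directly. This is a swapped-in technique, not what the code above does, and the `my-' names and prefix are hypothetical.

```elisp
;;; -*- lexical-binding: t; -*-
;; Hypothetical prefix, chosen only for this sketch.
(defvar my-block-prefix "my-dsl--block-")

(defun my-defblock (name &rest lines)
  "Store LINES as a (context ...) form under NAME; return the storage symbol."
  (let ((sym (intern (concat my-block-prefix (symbol-name name)))))
    ;; `set' overwrites any earlier value, so no `makunbound' is needed.
    (set sym `(context ,@lines))
    sym))

(defun my-block (name)
  "Return the form stored for NAME, or signal an error if undefined."
  (let ((sym (intern-soft (concat my-block-prefix (symbol-name name)))))
    (if (and sym (boundp sym))
        (symbol-value sym)
      (error "No block named %s" name))))

;; Defining the same block twice keeps only the latest definition.
(my-defblock 'gtags '(comment "Block for gtags") '(file "GPATH"))
(my-defblock 'gtags '(comment "Block for gtags") '(file "GPATH") '(file "GTAGS"))

(my-block 'gtags)
;; (context (comment "Block for gtags") (file "GPATH") (file "GTAGS"))
```

The tradeoff is that a value stored with `set' carries no docstring, which the defvar version gets for free.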

Non-functional Keyword

The least important are the non-functional keywords, such as comment and newline, which are a prefixed text and an empty text respectively. However, the delimited keyword is a bit more complex in that it interleaves each line with a newline keyword.

(let ((delimited-dsls
       (cdr
        (apply #'append
               (mapcar
                (lambda (dsl)
                  (list '(newline) dsl))
                subdsls)))))
  (magin--compiler
   `(context
     ,@delimited-dsls)
   env))

Take your pick in implementing interleave; the point here is that it wraps the old context in a newly modified one. It may not be a big thing, but this interferes with debugging and tracing when you don't know the original source. This is one weakness of this approach: it rewrites the DSL itself. Nothing terrible, but it is the price of having the return value as text instead of an intermediate value.
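The rewriting step can be looked at on its own, before any text is produced. In this sketch `my-interleave' and `my-delimit' are hypothetical helper names; the second one only shows the rewritten form that would then be handed to the compiler.

```elisp
;;; -*- lexical-binding: t; -*-
(defun my-interleave (separator items)
  "Return ITEMS with SEPARATOR placed between consecutive elements."
  ;; Pair every item with a leading separator, flatten the pairs,
  ;; then drop the very first separator.
  (cdr (apply #'append
              (mapcar (lambda (item) (list separator item)) items))))

(defun my-delimit (dsl)
  "Rewrite (delimited &rest SUBLINES) into a context with newline forms."
  `(context ,@(my-interleave '(newline) (cdr dsl))))

(my-interleave '(newline) '((file "a") (file "b") (file "c")))
;; ((file "a") (newline) (file "b") (newline) (file "c"))

(my-delimit '(delimited (block gtags) (block emacs)))
;; (context (block gtags) (newline) (block emacs))
```

Seeing the rewritten form makes the tracing problem visible: an error raised while compiling it would point at a context the user never wrote.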

With all those ideas, the DSL is easily implemented.

Wrap Up

So with this design and implementation we have a crude but simple and dynamic DSL. You can look at the main definition and notice how the use of backquoting addresses the problems mentioned. Now I can manage my ignore file with Emacs.

Now for some minor things we might have forgotten:

  • Unit tests
  • Logging
  • Debugging

The DSL is not complex enough to warrant any of those, but they are nice to note and implement. As one final point, we could have used a defmacro instead of a dispatch compiler, but with the compiler we have a base in case we want to update our code in order to join other contexts.

(setq left-code `(context
                  (file "a")
                  (file "b"))
      right-code `(context
                   (file "c")
                   (file "d")))


(magin-join left-code right-code) ;; ???

If we are dealing with text, we can't have this yet. In particular, if you have an initial definition and want to update it according to some condition, then the code has to change to handle plists or some other richer representation of the data.
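While both sides are still quoted forms, that is before anything is compiled to text, one possible reading of the join is plain list surgery. The function `my-join-contexts' below is a hypothetical stand-in for illustration; it is not a definition of magin-join, which the post leaves open.

```elisp
;;; -*- lexical-binding: t; -*-
(defun my-join-contexts (&rest contexts)
  "Merge several (context ...) forms into a single (context ...) form."
  (cons 'context
        (apply #'append
               (mapcar (lambda (form)
                         ;; Only uncompiled context forms can be merged.
                         (unless (eq (car-safe form) 'context)
                           (error "Not a context form: %S" form))
                         (cdr form))
                       contexts))))

(my-join-contexts '(context (file "a") (file "b"))
                  '(context (file "c") (file "d")))
;; (context (file "a") (file "b") (file "c") (file "d"))
```

This stops working as soon as either side has already been turned into a string, which is where an intermediate representation would start to pay off.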

To reiterate, Elisp allows us to create a DSL with ease and little knowledge.

Conclusion

So I copied this code to create projin for the projectile ignore file; unless I need more functionality, I don't think I need to do more for now. If another ignore file comes my way, I might refactor and do the same.

The one thing I really want to write is a DSL for writing SQL, not SQL itself. My dream is to write SQL as blocks of data.

(setq display-fields '(fields first-name last-name middle-name))

(select
 display-fields
 (from table))
;; "SELECT firstName, lastName, middleName FROM table"

The idea in that snippet is that the fields are data themselves. It also extends to queries being data, but the abstraction of aliasing and joining is sketchy for now. That is my dream though: to write a DSL for SQL writing. For now, ignore files are good practice.