Problem
After writing some projectile glue and .projectile and .gitignore files, I wondered whether I could write code to generate those files. A DSL, so to speak. The reason is threefold:
- Write a simple ignore file structure as Elisp code
- Allow Elisp variables for dynamic and reusable sections
- Have a hand in writing a generic DSL
Aside from my hacking sensibilities, the key concerns of doing so may be put as:
- Maintaining two ignore files when it is just really one with a different syntax
- You cannot reuse blocks of repeating sections across different files
The former is syntactic separation, while the latter is flexibility and reusability. For my use case with Elisp, I have two projects: my Emacs configuration and my org-jekyll-blogger.el blog project. Using git and projectile, I more or less have to copy my ignore files and tweak them a bit. To address this, I created my own DSL, magin, to handle .gitignore files; .projectile is similar in intent, but magin was the library I had in mind as the proof of concept.
As an example, this is my .emacs.d .gitignore:
# Base files
config.el
personal.el

# Base directory
.cache
extra

# Project specific
working-config.org
my-macros.el

# Library
lib/sandbox
lib/packages
!lib/modules
!lib/scripts

# Block for gtags
GPATH
GTAGS
GRTAGS

# Block for emacs
*.elc
.#*
*Org Src*
It is generated by this DSL:
(magin-write-to-project
 `(delimited ;; Context keyword that separates contexts by newlines
   (context ;; Represents a semantic group
    (comment "Base files") ;; A comment keyword
    (file ,(format "%s.el" (file-name-base fn/config-file))) ;; A file keyword
    (file ,(format "%s.el" (file-name-base fn/personal-file)))) ;; Notice I use an Elisp variable
   (context
    (comment "Base directory") ;; Generates "# Base directory"
    (dir ".cache") ;; A dir keyword
    (dir "extra")) ;; An alias of file that signals the entry is a directory
   (context
    (comment "Project specific")
    (file "working-config.org")
    (file ,(format "%s.el" (file-name-base fmk/macro-file))))
   (context
    (comment "Library")
    (path ;; Represents a path prefix
     "lib" ;; Everything underneath is generated with a "lib" prefix
     (context
      (dir "sandbox")
      (dir "packages"))
     (include
      (dir "modules")
      (dir "scripts"))))
   (block gtags) ;; Reusable blocks
   (block emacs))
 user-emacs-directory) ;; Rewrites .gitignore of `user-emacs-directory'
I do hope the DSL is easy to digest. Here are the two libraries for the interested: magin and projin. If you stay around, I'd like to discuss the following concepts in crafting it:
- Backquote
- Dispatching
- Implicit Environment
Backquoting
I was confident in making this DSL thanks to backquoting: it mixes quoting and evaluated expressions in a Lispy way. I recommend understanding quoting before proceeding, but as a quick brush-up, quoting means using a list of symbols as data.
As an example of quoting:
(+ 1 2 3) ;; 6
'(+ 1 2 3) ;; (+ 1 2 3)
(quote (+ 1 2 3)) ;; Same as above
(subject linking-verb adjective) ;; Error, the symbols are not defined
'(subject linking-verb adjective) ;; As is
It also allows for delayed evaluation. The problem arises when we want to integrate Elisp variables into the equation:
(setq left 1 right 2)
(+ left right) ;; 3
(+ 1 right) ;; 3
'(+ 1 right) ;; (+ 1 right)

;; Using backquotes
`(+ 1 ,right) ;; (+ 1 2)
`(+ 1 ,(+ left right)) ;; (+ 1 3)

;; Evaluating the actual value
(eval `(+ 1 ,(+ left right))) ;; 4
The complicated approach would ask us to create a tokenizer, parser and compiler while managing text matching and all that jazz; Lispy data structures allow us to bypass this through quoting. Without needing anything more, we can create quoted structures that integrate with Elisp without any hassle.
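As a small illustration of why backquoting fits this job, here is a minimal sketch that builds a DSL form from ordinary Elisp values. The names demo-config-file, demo-shared-entries and demo-base-context are hypothetical and exist only for this example; they are not part of magin.

```elisp
;; Hypothetical variables, named only for this illustration.
(defvar demo-config-file "~/.emacs.d/config.org")
(defvar demo-shared-entries '((file "*.elc") (file ".#*")))

(defun demo-base-context ()
  "Build a context form from the hypothetical variables above."
  ;; `,' inserts one evaluated value; `,@' splices a list in place.
  `(context
    (comment "Base files")
    (file ,(format "%s.el" (file-name-base demo-config-file)))
    ,@demo-shared-entries))

(demo-base-context)
;; => (context (comment "Base files") (file "config.el") (file "*.elc") (file ".#*"))
```

The result is still plain list data, so it can be inspected, stored in a variable, or handed to a compiler function later.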
Domain Syntax
So in thinking of the DSL, let us understand what the domain is. Our domain of abstraction is gitignore. After some thinking and reading, these are the abstractions I want to address:
- Files and directories have no distinction
- Comments
- Path context
- Separators
- Inclusion
- Reusable blocks or groups
There are other abstractions I do not wish to cover:
- Quoting or escaping
- Glob keywords, as in rx.el, for single or double asterisk
A DSL that accomplishes the above is good enough without being too abstract. To keep things simple, this DSL has no intermediate output; it translates directly to text. This tradeoff loses flexibility in output, but it is easier to implement. If you wanted an intermediate form, the DSL could compile to a list of plists, each holding a main text that is manipulated by several properties, like so:
(magin--compiler
 '(path "glob"
        (include
         (file "foo.bar")
         (path "gib"
               (dir "foo-bar")))))

;; This hypothetically might yield
'((:line "foo.bar" :include t :parent "glob")
  (:line "foo-bar" :include t :parent "glob/gib"))
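For the curious, the hypothetical plist output could be produced by something like the sketch below. This is not how magin works (magin goes straight to text); the functions demo-flatten and demo-flatten-all are made-up names, the sketch covers only file, dir, include, path and context, and it uses a plain cond for brevity.

```elisp
;; Hypothetical sketch: flatten a small subset of the DSL into plists.
(defun demo-flatten (dsl &optional env)
  "Return a list of plists for DSL under the alist ENV."
  (let ((keyword (car dsl))
        (args (cdr dsl)))
    (cond
     ;; Leaves record their text plus whatever the environment says.
     ((memq keyword '(file dir))
      (list (list :line (car args)
                  :include (cdr (assq :include env))
                  :parent (cdr (assq :parent env)))))
     ((eq keyword 'include)
      (demo-flatten-all args (cons (cons :include t) env)))
     ;; Nested paths are joined with "/" (an assumption of this sketch).
     ((eq keyword 'path)
      (let* ((parent (cdr (assq :parent env)))
             (new-parent (if parent
                             (concat parent "/" (car args))
                           (car args))))
        (demo-flatten-all (cdr args) (cons (cons :parent new-parent) env))))
     ((eq keyword 'context)
      (demo-flatten-all args env))
     (t (error "Unknown keyword: %s" keyword)))))

(defun demo-flatten-all (dsls env)
  "Flatten each form in DSLS under ENV and concatenate the results."
  (apply #'append (mapcar (lambda (sub) (demo-flatten sub env)) dsls)))

(demo-flatten
 '(path "glob"
        (include
         (file "foo.bar")
         (path "gib" (dir "foo-bar")))))
;; => ((:line "foo.bar" :include t :parent "glob")
;;     (:line "foo-bar" :include t :parent "glob/gib"))
```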
Setting that complexity aside, we can start hacking with the simple syntax.
Keywords
At last, we define our data structure:
;; The leaf keywords
(file ,line)
(dir ,line) ;; Alias for file

;; Non functional nodes
(comment ,text)
(newline)

;; Syntactic grouping, the equivalent of progn
(context &rest ,sublines)

;; Contextual grouping, like `context' but affects everything inside it
;; Sublines have the property `:include' as `t'
;; Which tells the leaf node to add `!' as a prefix
(include &rest ,sublines)

;; Sublines have the property `:path' set to `path'
;; Nesting of paths are handled
(path ,path &rest sublines)

;; Block keywords
;; Create a variable
(defblock ,block-name &rest block-lines)
;; Call the variable
(block ,block-name)

;; Aesthetic grouping
;; Every subline is interleaved with a newline
(delimited &rest ,sublines)
Those are the keywords that we must handle in our DSL. With this in mind, we can start by creating a function that handles the compilation.
(defun magin--compiler (dsl &optional env)
  "Compiles DSL with the environment ENV."
  nil)
This is our DSL handler: every node that wants to compile subnodes has to go through this dispatching function. This setup allows us to add more keywords independently of each other. However, we are not going to use a long switch or cond statement; we will use the implicit Elisp environment to find handlers. When the compiler comes to a keyword, it looks in the Elisp environment for a function whose name is magin--dsl- followed by the keyword and invokes that as the handler.
(defun magin--dsl-context (dsl env)
  "Context keyword handler.")
(defun magin--dsl-file (dsl env)
  "File keyword handler.")
(defun magin--dsl-block (dsl env)
  "Block keyword handler.")

(magin--compiler
 `(context
   (file "a")
   (block emacs)
   (unknown))) ;; Error, no `magin--dsl-unknown'
In the example above, the compiler finds the three handler functions and calls them but fails on the last keyword. So if you want to extend the DSL without touching the source, you can simply define a function with the prefix and the compiler will detect it. This is similar to how use-package handles its keywords: a global dependency injection, if you will. This magic is done through intern-soft:
(lexical-let* ((rule-name (symbol-name rule))
               (rule-handler (intern-soft (format "%s%s" magin-dsl-prefix rule-name))))
  (if (null rule-handler)
      (error "No rule to handle %s at dsl: %s" rule-name dsl)
    (funcall rule-handler dsl env)))
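To see the dispatch idea end to end, here is a self-contained sketch with its own made-up prefix and two toy handlers. The names demo-dsl-prefix, demo-compile, demo--dsl-file and demo--dsl-comment are hypothetical; the extra fboundp check is my own addition, since intern-soft only tells us the symbol exists, not that it names a function.

```elisp
(defvar demo-dsl-prefix "demo--dsl-"
  "Hypothetical prefix used to look up keyword handlers.")

(defun demo-compile (dsl &optional env)
  "Compile DSL by dispatching on its first symbol, passing ENV along."
  (let* ((rule-name (symbol-name (car dsl)))
         ;; `intern-soft' returns nil when no such symbol is interned.
         (handler (intern-soft (concat demo-dsl-prefix rule-name))))
    (if (and handler (fboundp handler))
        (funcall handler dsl env)
      (error "No rule to handle %s at dsl: %S" rule-name dsl))))

(defun demo--dsl-file (dsl _env)
  "Handler for (file NAME): return NAME unchanged."
  (cadr dsl))

(defun demo--dsl-comment (dsl _env)
  "Handler for (comment TEXT): return TEXT as a gitignore comment."
  (concat "# " (cadr dsl)))

(demo-compile '(file "config.el"))     ;; => "config.el"
(demo-compile '(comment "Base files")) ;; => "# Base files"

;; An unknown keyword signals an error; here we just capture the message.
(condition-case err
    (demo-compile '(unknown))
  (error (error-message-string err)))
;; => "No rule to handle unknown at dsl: (unknown)"
```

Adding a keyword then means defining one more function with the prefix; the dispatcher itself never changes.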
As a side note, you can use a hash table or create your own obarray or symbol environment if you really want a private space. In our approach, you can see the symbols with the helpful describe-variable without extra work.
If you want to use eval-sexp with the raw DSL, you have to create your own eval and env. You can remap eval to magin--compiler, but you need to know whether it is evaluating DSL or Lisp. We're not creating a new interpreter or environment, so this is good enough.
With this simple mechanism, we can define the keywords incrementally.
Context Keyword
So our dispatcher is a function that takes a DSL form and an environment and returns text. Without an intermediate data structure, the context handler applies the dispatcher to each line and then joins the results with a newline delimiter, as such:
(string-join
 (mapcar (lambda (subdsl) (funcall #'magin--compiler subdsl env))
         subdsls)
 "\n")
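A runnable version of the same idea might look like the sketch below. The functions demo-ctx-compile and demo-ctx-context are hypothetical stand-ins that know only about file and context, and mapconcat is used in place of string-join so that nothing needs to be required.

```elisp
(defun demo-ctx-compile (dsl env)
  "Tiny stand-in compiler: handles only `file' and `context'."
  (cond
   ((eq (car dsl) 'file) (cadr dsl))
   ((eq (car dsl) 'context) (demo-ctx-context dsl env))
   (t (error "Unknown keyword: %s" (car dsl)))))

(defun demo-ctx-context (dsl env)
  "Compile each subform of (context ...) and join the lines."
  (mapconcat (lambda (subdsl) (demo-ctx-compile subdsl env))
             (cdr dsl)
             "\n"))

(demo-ctx-context '(context (file "a")
                            (context (file "b") (file "c")))
                  nil)
;; => "a\nb\nc"
```

Nested contexts fall out for free because the handler calls back into the compiler for every subform.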
This is simply lining up the entries. Nothing complicated, but how about the keywords that manipulate the environment?
Environmental Keyword
A quick way to create an environment with scoping is to use an alist, which takes care of shadowing by itself. For example, with the include keyword:
(lexical-let ((new-env (append (list (cons :include t)) env)))
  (magin--dsl-context dsl new-env))
You could use a plist, but you would have to implement an extend function for it. Interestingly, an alist is enough to simulate an environment with lexical scoping without much trouble.
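To make the scoping behaviour concrete, here is a minimal sketch with two hypothetical helpers, demo-env-extend and demo-env-get. New entries go on the front of the alist, lookups find the first match, and the outer alist is never modified.

```elisp
(defun demo-env-extend (env key value)
  "Return a new alist with KEY bound to VALUE in front of ENV."
  (cons (cons key value) env))

(defun demo-env-get (env key)
  "Return the innermost value of KEY in ENV, or nil if it is unbound."
  (cdr (assq key env)))

(let* ((outer (demo-env-extend nil :parent "lib"))
       (inner (demo-env-extend outer :include t))
       (nested (demo-env-extend inner :parent "lib/modules")))
  (list (demo-env-get outer :include)    ; nil, the outer scope is untouched
        (demo-env-get inner :include)    ; t
        (demo-env-get inner :parent)     ; "lib", inherited from outer
        (demo-env-get nested :parent)))  ; "lib/modules", shadows "lib"
;; => (nil t "lib" "lib/modules")
```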
It is as simple as adding the cons entry to the front of the alist and it is done. The others are implemented the same way. Now, how about implementing the leaf keywords?
Leaf Keyword
The implementation above pushes all the real work down to the leaf nodes, of which only one really matters: file. The file handler has to take the text value and format it based on the environment.
(lexical-let* ((parent (cdr (assoc :parent env)))
               (include (if (cdr (assoc :include env)) "!" nil)))
  (concat include parent file))
Our friends are assoc and cdr when manipulating an alist environment. Again, it is simply a matter of formatting. So writing a DSL with this setup is actually easy.
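Here is a self-contained sketch of such a leaf handler. The name demo-leaf-file is hypothetical, and the sketch assumes that :parent is stored without a trailing slash, so it adds the "/" itself; the real library may store the prefix differently.

```elisp
(defun demo-leaf-file (dsl env)
  "Format (file NAME) using :parent and :include from the alist ENV."
  (let* ((name (cadr dsl))
         (parent (cdr (assq :parent env)))
         (prefix (if (cdr (assq :include env)) "!" "")))
    (concat prefix
            ;; Assumption: the parent has no trailing slash.
            (if parent (concat parent "/") "")
            name)))

(demo-leaf-file '(file "config.el") nil)
;; => "config.el"
(demo-leaf-file '(file "sandbox") '((:parent . "lib")))
;; => "lib/sandbox"
(demo-leaf-file '(file "modules") '((:include . t) (:parent . "lib")))
;; => "!lib/modules"
```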
Block Keyword
Lastly, let's talk about defining a block variable. As with the dispatcher, we can use the implicit Elisp environment as the variable space. No need to define a hash of symbols and lookups; we already have one and we take advantage of it. The keyword defblock simply saves the definition into a symbol with a magin--block- prefix and the other one, block, looks for it. Aside from intern-soft, the complementary friend is makunbound, which unbinds the symbol so that the following defvar, which only sets a variable that is still void, updates the value properly.
(lexical-let* ((block-name (symbol-name block-symbol))
               (block-def-name (intern (format "%s%s" magin-block-prefix block-name))))
  (makunbound block-def-name)
  (eval `(defvar ,block-def-name '(context ,@block-def)
           ,(format "Block definition for %s" block-name)))
  block-def-name)
Aside from some direct eval magic, it is as straightforward as it gets, and so is the block handler. With this, we can have our reusable blocks of code.
(magin--dsl-defblock
 `(defblock emacs
    (comment "Block for emacs")
    (file "*.elc")
    (file ".#*")
    (file "*Org Src*")) ;; Org Src buffers
 (list))
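The lookup side is not shown above, so here is a rough sketch of how a define and lookup pair could work together. The names demo-block-prefix, demo-defblock and demo-block are hypothetical, and this sketch swaps the makunbound plus defvar combination for a plain set, which always overwrites the stored value.

```elisp
(defvar demo-block-prefix "demo--block-"
  "Hypothetical prefix for symbols that store block definitions.")

(defun demo-defblock (name &rest lines)
  "Store LINES as a context under a prefixed symbol derived from NAME."
  (let ((sym (intern (concat demo-block-prefix (symbol-name name)))))
    ;; `set' replaces any earlier definition, so redefining just works.
    (set sym `(context ,@lines))
    sym))

(defun demo-block (name)
  "Return the definition stored for block NAME, or signal an error."
  (let ((sym (intern-soft (concat demo-block-prefix (symbol-name name)))))
    (if (and sym (boundp sym))
        (symbol-value sym)
      (error "No block named %s" name))))

(demo-defblock 'gtags
               '(comment "Block for gtags")
               '(file "GPATH")
               '(file "GTAGS"))
(demo-block 'gtags)
;; => (context (comment "Block for gtags") (file "GPATH") (file "GTAGS"))
```

Because the stored value is an ordinary context form, a block handler only needs to fetch it and pass it back to the compiler.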
Non-functional Keyword
The least important are the non-functional keywords, such as comment and newline, which produce prefixed text and empty text respectively. However, the delimited keyword is a bit more complex in that it interleaves each line with a newline keyword.
(lexical-let ((delimited-dsls
               (cdr (apply #'append
                           (mapcar (lambda (dsl) (list '(newline) dsl))
                                   subdsls)))))
  (magin--compiler `(context ,@delimited-dsls) env))
Take your pick in implementing interleave, but the point here is that it wraps the old context in a newly modified one. It may not be a big thing, but this interferes with debugging and tracing when you don't know the original source. This is one weakness of this approach: it rewrites the DSL itself. Nothing terrible, but it is the price of having the return value as text instead of an intermediate value.
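For reference, the interleaving trick from the snippet can be pulled out into a small helper. The name demo-interleave is hypothetical; the body is the same pair-then-drop-the-first-separator approach.

```elisp
(defun demo-interleave (separator items)
  "Return ITEMS with SEPARATOR placed between consecutive elements."
  ;; Pair every item with a leading separator, flatten the pairs,
  ;; then drop the separator that ended up in front of the first item.
  (cdr (apply #'append
              (mapcar (lambda (item) (list separator item)) items))))

(demo-interleave '(newline) '((file "a") (file "b") (file "c")))
;; => ((file "a") (newline) (file "b") (newline) (file "c"))
```

The rewritten list is what the compiler actually sees, which is why a trace shows (newline) forms that never appeared in the source.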
With all those ideas, the DSL is easily implemented.
Wrap Up
So with this design and implementation we have a crude but simple and dynamic DSL. You can look at the main definition and notice how the use of backquoting addresses the problems mentioned. Now I can manage my ignore files with Emacs.
Now for some minor things we might have forgotten:
- Unit tests
- Logging
- Debugging
The DSL is not complex enough to warrant any of those, but they are nice to note and implement. As one final point, we could have used a defmacro instead of a dispatch compiler, but in case we want to update our code in order to join other contexts, we have a base.
(setq left-code `(context (file "a") (file "b"))
      right-code `(context (file "c") (file "d")))

(magin-join left-code right-code) ;; ???
If we are dealing with text, we can't have this yet. In particular, if you have an initial definition and want to update it according to some condition, then it requires changing the code to handle plists or some other richer representation of the data.
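One possible reading of the join idea, and only a guess at what such a function could do, is to merge the quoted forms before they are ever compiled to text. The function demo-join below is hypothetical and is not part of magin; it only handles forms whose first element is context.

```elisp
(defun demo-join (&rest contexts)
  "Merge several (context ...) forms into a single context form."
  ;; Drop the leading `context' symbol of each form and splice the rest.
  `(context ,@(apply #'append (mapcar #'cdr contexts))))

(demo-join '(context (file "a") (file "b"))
           '(context (file "c") (file "d")))
;; => (context (file "a") (file "b") (file "c") (file "d"))
```

This works only while the code is still list data; once a context has been compiled to a string, the structure needed for merging is gone.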
To reiterate, Elisp allows us to create a DSL with ease and little knowledge.
Conclusion
So I copied this code to create projin for the projectile ignore file, but unless I need more functionality, I don't think I need it for now. If another ignore file comes my way, I might refactor and do the same.
The one thing I want to really write is a DSL for SQL writing, not SQL itself. My dream is to write SQL as blocks of data.
(setq display-fields '(fields first-name last-name middle-name))

(select display-fields
        (from table))
;; "SELECT firstName, lastName, middleName FROM table"
The idea in that snippet is the fields are data itself. It also extends to queries being data but the abstraction of aliasing and joining is sketchy for now. That is my dream though: to write a DSL for SQL writing. For now, ignore files are good practice.