Removing Paired Delimiters in Emacs
The other day someone filed an issue against my neocaml package, reporting surprising behavior with delete-pair. My first reaction was – wait, delete-pair? I’ve been using Emacs for over 20 years and I wasn’t sure I had ever used this command. Time for some investigation!

What is delete-pair?

delete-pair is a built-in Emacs command (defined in lisp/emacs-lisp/lisp.el) that deletes a pair of matching characters – typically parentheses, brackets, braces, or quotes. You place point on an opening delimiter, invoke delete-pair, and it removes both the opening and closing delimiter.

Given that it lives in lisp.el, it was clearly designed with Lisp editing in mind originally. And it was probably quite handy back in the day – before paredit came along and made delete-pair largely redundant for Lisp hackers.

Here’s a simple example. Given the following code (with point on the opening parenthesis):

```ocaml
(print_endline "hello")
```

Running M-x delete-pair gives you:

```ocaml
print_endline "hello"
```

Simple and useful! Yet delete-pair has no default keybinding, which probably explains why so few people know about it. If you want to use it regularly, you’ll need to bind it yourself:

```elisp
(global-set-key (kbd "M-s-d") #'delete-pair)
```

Pick whatever keybinding works for you, of course. There’s no universally agreed upon binding for this one.
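If you’d like to poke at delete-pair from Lisp before settling on a keybinding, here’s a small sketch. It assumes Emacs 28 or later, where delete-pair briefly blinks the matching closer for delete-pair-blink-delay seconds before deleting it (you can also setq that option to 0 in your config to skip the pause). The helper name my/delete-pair-demo is hypothetical, just for experimenting.

```elisp
(defun my/delete-pair-demo (text)
  "Return TEXT after running `delete-pair' at its beginning.
A hypothetical helper for experimenting; not part of Emacs."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))              ; point on the opening delimiter
    (let ((delete-pair-blink-delay 0))   ; Emacs 28+: skip the blink pause
      (delete-pair))
    (buffer-string)))

(my/delete-pair-demo "(print_endline \"hello\")")
;; => "print_endline \"hello\""
```

Evaluate the last form with C-x C-e and you get the same result as the interactive example above.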
The Gotcha
The issue that was reported boiled down to delete-pair not always finding the correct matching delimiter. It uses forward-sexp under the hood to find the matching closer, which means its accuracy depends entirely on the buffer’s syntax table and the major mode’s parsing capabilities. For languages with complex or unusual syntax, this can sometimes lead to the wrong delimiter being removed – not great when you’re trying to be surgical about your edits.

Alternatives for Pair Management
If you work with paired delimiters frequently, delete-pair is just one tool in a rich ecosystem. Here’s a quick overview of the alternatives:

paredit

paredit is the gold standard for structured editing of Lisp code. I’ve been a heavy paredit user for as long as I can remember – if you write any Lisp-family language, it’s indispensable.

paredit gives you paredit-splice-sexp (bound to M-s by default), which removes the surrounding delimiters while keeping the contents intact. There’s also paredit-raise-sexp (M-r), which replaces the enclosing sexp with the sexp at point – another way to get rid of delimiters. And of course, paredit prevents you from creating unbalanced expressions in the first place, which is a huge win.

Once you’ve got paredit in your muscle memory, you’ll never think about delete-pair again (as I clearly haven’t).

Let’s see these commands in action. In the examples below, | marks the position of point.

paredit-splice-sexp (M-s) – removes the surrounding delimiters:

```elisp
;; Before (point anywhere inside the inner parens):
(foo (bar| baz) quux)
;; After M-s:
(foo bar| baz quux)
```

paredit-raise-sexp (M-r) – replaces the enclosing sexp with the sexp at point. Put point at the start of the sexp you want to keep; when point sits between two sexps, paredit raises the one that follows it:

```elisp
;; Before:
(foo (|bar baz) quux)
;; After M-r:
(foo |bar quux)
```

Notice the difference: splice keeps all the siblings, raise keeps only the sexp at point and discards everything else inside the enclosing delimiters.

smartparens
smartparens is the most feature-rich option and works across all languages, not just Lisps. For unwrapping pairs, it offers a whole family of commands:
- sp-unwrap-sexp – removes the delimiters of the following expression, keeping its content
- sp-backward-unwrap-sexp – same, but for the preceding expression
- sp-splice-sexp (M-D) – removes the delimiters of the enclosing expression and integrates its content into the parent expression
- sp-splice-sexp-killing-backward / sp-splice-sexp-killing-forward – splice while killing content in one direction
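One thing worth knowing before you try these: as far as I know, smartparens doesn’t install its editing keybindings by default, so M-D only works once you opt into its own binding set. A minimal setup sketch follows; the C-c u binding for sp-unwrap-sexp is an arbitrary choice for illustration, not a smartparens default.

```elisp
;; Load smartparens together with its default pair definitions.
(require 'smartparens-config)
(smartparens-global-mode 1)
;; Opt into smartparens' own keybinding set (where M-D comes from).
(sp-use-smartparens-bindings)
;; Hypothetical extra binding for unwrapping the next expression.
(define-key smartparens-mode-map (kbd "C-c u") #'sp-unwrap-sexp)
```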
Here’s how the key ones look in practice:
sp-unwrap-sexp – removes the next pair’s delimiters:

```python
# Before (point on the opening bracket):
result = calculate(|[x, y, z])
# After sp-unwrap-sexp:
result = calculate(|x, y, z)
```

sp-splice-sexp (M-D) – works like paredit’s splice, removes the innermost enclosing pair:

```python
# Before (point anywhere inside the parens):
result = calculate(x + |y)
# After M-D:
result = calculate x + |y
```

sp-splice-sexp-killing-backward – splices, but also kills everything before point:

```python
# Before:
result = [first, second, |third, fourth]
# After sp-splice-sexp-killing-backward:
result = |third, fourth
```

I used smartparens for a while for non-Lisp languages, but eventually found it a bit heavy for my needs.

electric-pair-mode
electric-pair-mode is the built-in option (since Emacs 24.1) that automatically inserts matching delimiters when you type an opening one. It’s lightweight, requires zero configuration, and works surprisingly well for most use cases. I’ve been using it as my go-to solution for non-Lisp languages for a while now.

The one thing electric-pair-mode doesn’t offer is any way to unwrap/remove paired delimiters. The closest it gets is deleting both delimiters when you backspace between an adjacent empty pair (e.g., (|) – pressing backspace removes both parens). But that’s it – there’s no unwrap command. That’s where delete-pair comes in handy as a complement.

A Note on Vim’s surround.vim
Having played with Vim and its various surround.vim-like plugins over the years, I have to admit – I kind of miss that experience in Emacs, at least for removing paired delimiters.
surround.vim makes it dead simple: ds( deletes surrounding parens, ds" deletes surrounding quotes. It works uniformly across all file types and feels very natural.

In Emacs, the story is more fragmented – paredit handles it beautifully for Lisps, smartparens does it for everything but is a heavyweight dependency, and electric-pair-mode just… doesn’t do it at all. delete-pair is the closest thing to a universal built-in solution, but its lack of a default keybinding and its reliance on forward-sexp make it a bit rough around the edges.

If you’re using electric-pair-mode and want a simple surround.vim-style “delete surrounding pair” command without pulling in a big package, here’s a little hack that does the trick:

```elisp
(defun delete-surrounding-pair (char)
  "Delete the nearest surrounding pair of CHAR.
CHAR should be an opening delimiter like (, [, {, or \".
Works by searching backward for the opener and forward for the closer."
  (interactive "cDelete surrounding pair: ")
  (let* ((pairs '((?\( . ?\)) (?\[ . ?\]) (?\{ . ?\})
                  (?\" . ?\") (?\' . ?\') (?\` . ?\`)))
         (closer (or (alist-get char pairs)
                     (error "Unknown pair character: %c" char))))
    (save-excursion
      (let ((orig (point)))
        ;; Find and delete the opener.
        (when (search-backward (char-to-string char) nil t)
          (delete-char 1)
          ;; Find and delete the closer; ORIG moved left by one
          ;; because the opener before it is gone.
          (goto-char (1- orig))
          (when (search-forward (char-to-string closer) nil t)
            (delete-char -1)))))))

;; This reuses the key suggested for delete-pair earlier; pick one.
(global-set-key (kbd "M-s-d") #'delete-surrounding-pair)
```

Now you can hit M-s-d ( to delete surrounding parens, M-s-d " for quotes, etc. It’s deliberately naive – no syntax awareness, no nesting support – so it won’t play well with delimiters inside strings or comments (it’ll happily match a paren in a comment if that’s what it finds first). It also expects point to be inside the pair: with point on the opening delimiter itself, the backward search skips over it. But for quick, straightforward edits it gets the job done.

When to Use What
My current setup is:
- Lisp languages (Emacs Lisp, Clojure, Common Lisp, etc.): paredit, no contest.
- Everything else: electric-pair-mode for auto-pairing, plus delete-pair when I need to unwrap something.
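delete-pair wants point on the opening delimiter. When point is somewhere inside the pair instead (the paredit-splice-sexp situation), a rough built-in approximation is to jump to the enclosing opener with backward-up-list first. Here’s a sketch; my/splice-sexp is a hypothetical name, and like delete-pair itself it relies on the syntax table to find the pair.

```elisp
(defun my/splice-sexp ()
  "Remove the delimiters of the list around point, keeping its contents.
A rough built-in approximation of `paredit-splice-sexp'."
  (interactive)
  (save-excursion
    ;; Move to the opening delimiter of the enclosing list...
    (backward-up-list)
    ;; ...and let `delete-pair' remove it together with its closer.
    (let ((delete-pair-blink-delay 0))   ; Emacs 28+: no blink pause
      (delete-pair))))
```

With point after bar in (foo (bar baz) quux), M-x my/splice-sexp leaves (foo bar baz quux) with point still after bar.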
If you want a more powerful structural editing experience across all languages, smartparens is hard to beat. It’s just more than I personally need outside of Lisp.

Wrapping Up
One of the greatest aspects of Emacs is that we get to learn (or relearn) something about it every other day. Even after decades of daily use, there are always more commands lurking in the corners, patiently waiting to be discovered.
Now, if you’ll excuse me, I’m going to immediately forget about delete-pair again. Keep hacking!
Code Formatting in Emacs
I got inspired to look into this topic after receiving the following obscure bug report for neocaml:
I had your package installed. First impressions were good, but then I had to uninstall it. The code formatting on save stopped working for some reason, and the quickest solution was to revert to the previous setup.
I found this somewhat entertaining – neocaml is a major mode and has nothing to do with code formatting. But I still started to wonder what kind of setup that user might have. So here we are!
Code formatting is one of those things that shouldn’t require much thought – you pick a formatter, you run it, and your code looks consistent. In practice, Emacs gives you a surprising number of ways to get there, from built-in indentation commands to external formatters to LSP-powered solutions. This article covers the landscape and helps you pick the right approach.
One thing to note upfront: most formatting solutions hook into saving the buffer, but there are two distinct patterns. The more common one is format-then-save (via before-save-hook) – the buffer is formatted before it’s written to disk, so the file is always in a formatted state. The alternative is save-then-format (via after-save-hook) – the file is saved first, then formatted and saved again. The second approach can be done asynchronously (the editor doesn’t block), but it means the file is briefly unformatted on disk. Keep this distinction in mind as we go through the options.

Built-in: Indentation, Not Formatting
Let’s get the terminology straight. Emacs has excellent built-in indentation support, but indentation is not the same as formatting.
indent-region (C-M-\) adjusts leading whitespace according to the major mode’s rules. It won’t reformat long lines, reorganize imports, add or remove blank lines, or apply any of the opinionated style choices that modern formatters handle.

That said, for many languages (especially Lisps), indent-region on the whole buffer is all the formatting you’ll ever need:

```elisp
;; A simple indent-buffer command (Emacs doesn't ship one)
(defun indent-buffer ()
  "Indent the entire buffer."
  (interactive)
  (indent-region (point-min) (point-max)))
```

Tip: whitespace-cleanup is a nice complement – it handles trailing whitespace, mixed tabs/spaces, and empty lines at the beginning and end of the buffer. Adding it to before-save-hook keeps things tidy:

```elisp
(add-hook 'before-save-hook #'whitespace-cleanup)
```

Shelling Out: The DIY Approach
The simplest way to run an external formatter is shell-command-on-region (M-|). With a prefix argument (C-u M-|), it replaces the region with the command’s output, so select the whole buffer first (C-x h):

```
C-x h C-u M-| prettier --stdin-filepath foo.js RET
```

(The --stdin-filepath flag doesn’t read from a file – it just tells Prettier which parser to use based on the filename extension.)

You can wrap this in a command for repeated use:

```elisp
(defun format-with-prettier ()
  "Format the current buffer with Prettier."
  (interactive)
  (let ((pos (point)))
    ;; Replace the whole buffer with Prettier's output.
    (shell-command-on-region (point-min) (point-max)
                             "prettier --stdin-filepath foo.js"
                             (current-buffer) t)
    (goto-char pos)))
```

This works, but it’s fragile – no error handling, no automatic file type detection, and cursor position is only approximately preserved. For anything beyond a quick one-off, you’ll want a proper package.
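If you want to stay DIY a little longer, two of those problems can be softened with built-ins: call-process-region lets you check the formatter’s exit status before touching the buffer, and replace-buffer-contents (Emacs 26.1+) applies only the differences, which tends to keep point where it was. A sketch, assuming the formatter reads the buffer on stdin and writes the result to stdout; my/format-buffer-with is a hypothetical name, and it still does no file type detection.

```elisp
(defun my/format-buffer-with (program &rest args)
  "Pipe the buffer through PROGRAM with ARGS, replacing it only on success.
Signals an error, leaving the buffer untouched, if PROGRAM exits non-zero."
  (let ((out (generate-new-buffer " *my-format-output*")))
    (unwind-protect
        (let ((status (apply #'call-process-region
                             (point-min) (point-max) program
                             nil             ; keep the original text for now
                             (list out nil)  ; stdout into OUT, discard stderr
                             nil             ; no redisplay while running
                             args)))
          (if (eql status 0)
              ;; Apply only the differences, so point mostly stays put.
              (replace-buffer-contents out)
            (error "%s exited with status %s" program status)))
      (kill-buffer out))))

;; The Prettier command from above, now with error checking:
;; (my/format-buffer-with "prettier" "--stdin-filepath" "foo.js")
```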
reformatter.el: Define Your Own (Re)Formatters
reformatter.el is a small library that generates formatter commands from a simple declaration. You define the formatter once, and it creates everything you need:
```elisp
(reformatter-define black-format
  :program "black"
  :args '("-q" "-")
  :lighter " Black")
```

This single form generates three things:

- black-format-buffer – format the entire buffer
- black-format-region – format the selected region
- black-format-on-save-mode – a minor mode that formats on save
Enabling format-on-save is then just:
```elisp
(add-hook 'python-mode-hook #'black-format-on-save-mode)
```

reformatter.el handles temp files, error reporting, and stdin/stdout piping. It also supports formatters that work on files instead of stdin (via :stdin nil and :input-file), and you can use buffer-local variables in :program and :args for per-project configuration via .dir-locals.el.

I love the approach taken by this package! It’s explicit, you see exactly what’s being called, and the generated on-save mode plays nicely with the rest of your config.
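To make the per-project idea concrete, here’s a sketch that lets each project choose Black’s line length through .dir-locals.el, as a variation on the black-format definition above. It relies on reformatter evaluating :args in the buffer being formatted, so the form can read a buffer-local value. my-black-line-length is a hypothetical variable; 88 is Black’s default line length.

```elisp
;; Hypothetical per-project setting; 88 matches Black's default.
(defvar-local my-black-line-length 88
  "Line length passed to Black by `black-format'.")
;; Let .dir-locals.el set it without a prompt, as long as it's an integer.
(put 'my-black-line-length 'safe-local-variable #'integerp)

(reformatter-define black-format
  :program "black"
  ;; :args is a form, evaluated in the buffer being formatted.
  :args (list "-q" "--line-length" (number-to-string my-black-line-length) "-")
  :lighter " Black")

;; Then, in a project's .dir-locals.el:
;; ((python-mode . ((my-black-line-length . 100))))
```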
format-all: Zero Configuration
format-all takes a different approach – it auto-detects the right formatter for 70+ languages based on the major mode. You don’t define anything; it just works:
```elisp
(add-hook 'prog-mode-hook #'format-all-mode)
(add-hook 'prog-mode-hook #'format-all-ensure-formatter)
```

The main command is format-all-region-or-buffer. The format-all-mode minor mode handles format-on-save. If you need to override the auto-detected formatter, set format-all-formatters (works well in .dir-locals.el):

```elisp
;; In .dir-locals.el -- use black instead of autopep8 for Python
((python-mode . ((format-all-formatters . (("Python" black))))))
```

```elisp
;; Or in your init file
(setq-default format-all-formatters '(("Python" black)))
```

The trade-off is less control – you’re trusting the package’s formatter database, and debugging issues is harder when you don’t see the underlying command.
apheleia: Async and Cursor-Aware
apheleia is the most sophisticated option. It solves two problems the other packages don’t:
- Asynchronous formatting – it runs the formatter after save, so the editor never blocks. If you modify the buffer before formatting completes, the result is discarded.
- Cursor preservation – instead of replacing the entire buffer, it applies changes as RCS patches (a classic diff format from one of the earliest version control systems), so your cursor position and scroll state are maintained.
```elisp
;; Enable globally
(apheleia-global-mode +1)
```

apheleia auto-detects formatters like format-all, but you can configure things explicitly:

```elisp
;; Chain multiple formatters (e.g., sort imports, then format)
(setf (alist-get 'python-mode apheleia-mode-alist)
      '(isort black))
```

Formatter chaining is a killer feature – isort then black, eslint then prettier, etc. No other package handles this as cleanly.

Caveat: Because apheleia formats after save, the file on disk is briefly in an unformatted state. This is usually fine, but it can confuse tools that watch files for changes. It also doesn’t support TRAMP/remote files.
LSP: eglot and lsp-mode
If you’re already using a language server, formatting is built in. The language server handles the formatting logic, and Emacs just sends the request.
eglot (built-in since Emacs 29)
The main commands are eglot-format (formats the active region, or the entire buffer if no region is active) and eglot-format-buffer (always formats the entire buffer).

Format-on-save requires a hook – eglot doesn’t provide a toggle for it:

```elisp
(add-hook 'eglot-managed-mode-hook
          (lambda ()
            (add-hook 'before-save-hook #'eglot-format-buffer nil t)))
```

The final t (the LOCAL argument of add-hook) makes the hook buffer-local, so it only fires in eglot-managed buffers.

lsp-mode
The equivalents here are lsp-format-buffer and lsp-format-region.

lsp-mode has a built-in option for format-on-save:

```elisp
(setq lsp-format-buffer-on-save t)
```

It also supports on-type formatting (formatting as you type closing braces, semicolons, etc.) via lsp-enable-on-type-formatting, which is enabled by default.

LSP Caveats
- Formatting capabilities depend entirely on the language server. Some servers (like gopls or rust-analyzer) have excellent formatters; others may not support formatting at all.
- The formatter’s configuration lives outside Emacs – in .clang-format, pyproject.toml, .prettierrc, etc. This is actually a feature if you’re working on a team, since the config is shared.
- LSP formatting can be slow for large files since it’s a round-trip to the server process.
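One practical wrinkle with the eglot snippet above: eglot-managed-mode-hook runs both when Eglot starts managing a buffer and when it stops (for example after M-x eglot-shutdown), so the buffer-local before-save-hook entry can outlive the server and make saving fail. A slightly more careful sketch checks eglot-managed-p and removes the hook again; my/eglot-format-on-save is a hypothetical name.

```elisp
(require 'eglot)

(defun my/eglot-format-on-save ()
  "Format on save only while Eglot manages the current buffer."
  (if (eglot-managed-p)
      (add-hook 'before-save-hook #'eglot-format-buffer nil t)
    (remove-hook 'before-save-hook #'eglot-format-buffer t)))

;; Runs both when Eglot starts and when it stops managing a buffer.
(add-hook 'eglot-managed-mode-hook #'my/eglot-format-on-save)
```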
Which Approach Should You Use?
There’s no single right answer, but here’s a rough guide:
- Lisps and simple indentation needs: Built-in indent-region is probably all you need.
- Specific formatter, full control: reformatter.el – explicit, simple, and predictable.
- Many languages, minimal config: format-all or apheleia. Pick apheleia if you want async formatting and cursor stability.
- Already using LSP: Just use eglot-format / lsp-format-buffer. One less package to maintain.
- Mixed setup: Nothing stops you from using LSP formatting for some languages and reformatter.el for others. Just be careful not to have two things fighting over format-on-save for the same mode.
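A quick way to spot that kind of conflict is to list everything that will actually run around a save in the current buffer. Buffer-local hook values contain t as a placeholder for the global value, so the sketch below expands it. Both function names are hypothetical helpers, not built-ins.

```elisp
(defun my/effective-hook-functions (hook)
  "Return the functions HOOK will run in the current buffer, local ones first.
A t entry in the buffer-local value stands for the global value."
  (let* ((value (symbol-value hook))
         ;; Old-style hooks may hold a single function instead of a list.
         (entries (if (and (listp value) (not (functionp value)))
                      value
                    (list value)))
         fns)
    (dolist (fn entries)
      (if (eq fn t)
          (dolist (global (default-value hook))
            (push global fns))
        (push fn fns)))
    (nreverse fns)))

(defun my/save-hook-functions ()
  "Show what runs before and after saving the current buffer."
  (interactive)
  (message "before-save: %S\nafter-save: %S"
           (my/effective-hook-functions 'before-save-hook)
           (my/effective-hook-functions 'after-save-hook)))
```

If M-x my/save-hook-functions shows two formatters for the same buffer, that’s your fight.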
Tip: Whichever approach you choose, consider enabling format-on-save per project via .dir-locals.el rather than globally. Not every project uses the same formatter (or any formatter at all), and formatting someone else’s unformatted codebase on save is a recipe for noisy diffs.

```elisp
;; .dir-locals.el
((python-mode . ((eval . (black-format-on-save-mode)))))
```

Since eval forms in directory-local variables are considered risky, Emacs will ask you to confirm this one before applying it.

Epilogue
So many options, right? That’s so Emacs!
I’ll admit that I don’t actually use any of the packages mentioned in this article – I learned about all of them while doing a bit of research for alternatives to the DIY and LSP approaches. That said, I have a very high opinion of everything done by Steve Purcell (author of reformatter.el, many other Emacs packages, a popular Emacs Prelude-like config, and co-maintainer of MELPA) and Radon Rosborough (author of apheleia, straight.el, and the Radian Emacs config), so I have no issue endorsing packages created by them.

I’m in camp LSP most of the time these days, and I’d guess most people are too. But if I weren’t, I’d probably take apheleia for a spin. Either way, it’s never bad to have options, right?

There are languages where LSP isn’t as prevalent – all sorts of Lisp dialects, for instance – where something like apheleia or reformatter.el might come in handy. But then again, in Lisps indent-region works so well that you rarely need anything else. I’m a huge fan of indent-region myself – for any good Emacs mode, it’s all the formatting you need.
Taming Font-Lock with font-lock-ignore
I recently wrote about customizing font-lock in the age of Tree-sitter. After publishing that article, a reader pointed out that I’d overlooked font-lock-ignore – a handy option for selectively disabling font-lock rules that was introduced in Emacs 29. I’ll admit I had no idea it existed, and I figured if I missed it, I’m probably not the only one.[1]

It’s a bit amusing that something this useful only landed in Emacs 29 – the very release that kicked off the transition to Tree-sitter. Better late than never, right?
The Problem
Traditional font-lock gives you two ways to control highlighting: the coarse font-lock-maximum-decoration (pick a level from 1 to 3) and the surgical font-lock-remove-keywords (manually specify which keyword rules to drop). The first is too blunt – you can’t say “I want level 3 but without operator highlighting.” The second is fragile – you need to know the exact internal structure of the mode’s font-lock-keywords and call it from a mode hook.

What was missing was a declarative way to say “in this mode, don’t highlight these things” without getting your hands dirty with the internals. That’s exactly what font-lock-ignore provides.

How It Works
font-lock-ignore is a single user option (a defcustom) whose value is an alist. Each entry maps a mode symbol to a list of conditions that describe which font-lock rules to suppress:

```elisp
(setq font-lock-ignore
      '((MODE CONDITION ...)
        (MODE CONDITION ...)
        ...))
```

MODE is a major or minor mode symbol. For major modes, derived-mode-p is used, so a rule for prog-mode applies to all programming modes. For minor modes, the rule applies when the mode is active.

CONDITION can be:

- A face symbol – suppresses any font-lock rule that applies that face. Supports glob-style wildcards: font-lock-*-face matches all standard font-lock faces.
- A string – suppresses any rule whose regexp would match that string. This lets you disable highlighting of a specific keyword like "TODO" or "defun".
- (pred FUNCTION) – suppresses rules for which FUNCTION returns non-nil.
- (not CONDITION), (and CONDITION ...), (or CONDITION ...) – the usual logical combinators.
- (except CONDITION) – carves out exceptions from broader rules.
Note: The Emacs manual covers font-lock-ignore in the Customizing Keywords section of the Elisp reference.

When to Use It

font-lock-ignore is most useful when you’re generally happy with a mode’s highlighting but want to tone down specific aspects. Maybe you find type annotations too noisy, or you don’t want preprocessor directives highlighted, or a minor mode is adding highlighting you don’t care for.

For Tree-sitter modes, the feature/level system described in my previous article is the right tool for the job. But for traditional modes – and there are still plenty of those – font-lock-ignore fills a gap that existed for decades.

Discovering Which Faces to Suppress
To use font-lock-ignore effectively, you need to know which faces are being applied to the text you want to change. A few built-in commands make this easy:

- C-u C-x = (what-cursor-position with a prefix argument) – the quickest way. It runs describe-char, which shows the face at point along with other text properties in a *Help* buffer.
- M-x describe-face – prompts for a face name (defaulting to the face at point) and shows its full definition, inheritance chain, and current appearance.
- M-x list-faces-display – opens a buffer listing all defined faces with visual samples. Handy for browsing the font-lock-*-face family and the newer Emacs 29 faces like font-lock-bracket-face and font-lock-operator-face.
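If you’d rather get the face names back as Lisp data (say, to paste straight into font-lock-ignore), a small helper can read them with get-char-property, which consults both overlays and text properties. A sketch; my/faces-at-point is a hypothetical name, and it drops anonymous face specs such as (:foreground "red"), since font-lock-ignore’s face conditions are face symbols.

```elisp
(defun my/faces-at-point (&optional pos)
  "Return the named faces at POS (default: point), as a list of symbols."
  (let ((face (get-char-property (or pos (point)) 'face))
        names)
    ;; The `face' property can be a single face or a list of faces,
    ;; possibly mixed with anonymous plists like (:foreground "red").
    (dolist (f (if (and (listp face) (not (keywordp (car-safe face))))
                   face
                 (list face)))
      (when (and f (symbolp f) (not (keywordp f)))
        (push f names)))
    (nreverse names)))
```

M-: (my/faces-at-point) RET on a highlighted keyword returns something you can drop into a font-lock-ignore entry as-is.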
Once you’ve identified the face, just drop it into font-lock-ignore.

Practical Examples
Here’s the example from the Emacs manual, which shows off the full range of conditions:
```elisp
(setq font-lock-ignore
      '((prog-mode font-lock-*-face
                   (except help-echo))
        (emacs-lisp-mode (except ";;;###autoload"))
        (whitespace-mode whitespace-empty-at-bob-regexp)
        (makefile-mode (except *))))
```

Let’s break it down:

- In all prog-mode derivatives, suppress all standard font-lock-*-face highlighting (syntactic fontification for comments and strings is unaffected, since that uses the syntax table, not keyword rules).
- But keep any rules that add a help-echo text property.
- In emacs-lisp-mode, also keep the ;;;###autoload cookie highlighting (which the first rule would have suppressed).
- When whitespace-mode is active, additionally suppress the whitespace-empty-at-bob-regexp highlight.
- In makefile-mode, (except *) undoes all previous conditions, effectively exempting Makefiles from any filtering.
Here are some simpler, more focused examples:
```elisp
;; Disable type highlighting in all programming modes
(setq font-lock-ignore '((prog-mode font-lock-type-face)))

;; Disable bracket and operator faces specifically
(setq font-lock-ignore '((prog-mode font-lock-bracket-face font-lock-operator-face)))

;; Disable keyword highlighting in python-mode only
(setq font-lock-ignore '((python-mode font-lock-keyword-face)))
```

Pretty sweet, right?
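The three setq forms above are alternatives: each replaces the whole alist, so if you kept all of them in your init file only the last would take effect. To accumulate rules from different parts of your config, you can append entries with add-to-list (the final t argument appends instead of prepending). Since except is documented as undoing earlier conditions, order appears to matter across entries too, so an exception for a derived mode should come after the broader rule it carves out of. A sketch for Emacs 29+, where font-lock-ignore exists:

```elisp
;; Build font-lock-ignore incrementally; the final t appends, preserving order.
(add-to-list 'font-lock-ignore '(prog-mode font-lock-type-face) t)
(add-to-list 'font-lock-ignore '(python-mode font-lock-keyword-face) t)
;; Keep type highlighting in Emacs Lisp; this must follow the prog-mode entry.
(add-to-list 'font-lock-ignore '(emacs-lisp-mode (except font-lock-type-face)) t)
```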
Important Caveats
A few things to keep in mind:
- font-lock-ignore only affects keyword fontification (the regexp-based rules in font-lock-keywords). It does not touch syntactic fontification – comments and strings highlighted via the syntax table are not affected.
- It’s a global option, not buffer-local. You scope rules to specific modes via the alist keys.
- Since it filters rules at compile time (during font-lock-compile-keywords), changes take effect the next time font-lock is initialized in a buffer. If you’re experimenting, run M-x font-lock-mode twice (off then on) to see your changes.
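Because the filtering happens when the keywords are set up, you can also check a rule from Lisp without touching your config: bind font-lock-ignore, turn on the major mode inside that binding, and force fontification with font-lock-ensure. A sketch for Emacs 29+, using emacs-lisp-mode, where defun is highlighted with font-lock-keyword-face; my/face-with-font-lock-ignore is a hypothetical name.

```elisp
(defun my/face-with-font-lock-ignore (ignore text pos)
  "Fontify TEXT in `emacs-lisp-mode' with `font-lock-ignore' bound to IGNORE.
Return the value of the `face' property at POS."
  (let ((font-lock-ignore ignore))
    (with-temp-buffer
      (insert text)
      ;; Set up the mode and fontify inside the `let', so the keywords
      ;; are filtered against our temporary value.
      (emacs-lisp-mode)
      (font-lock-ensure)
      (get-text-property pos 'face))))

;; Position 2 is the "d" of "defun".
(my/face-with-font-lock-ignore nil "(defun foo () nil)" 2)
(my/face-with-font-lock-ignore '((emacs-lisp-mode font-lock-keyword-face))
                               "(defun foo () nil)" 2)
```

The first call should report font-lock-keyword-face; with the ignore rule in place, the second should not.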
The End
I don’t know about you, but I really wish that font-lock-ignore had been added to Emacs a long time ago. Still, the transition to Tree-sitter modes is bound to take years, so many of us will still get to leverage font-lock-ignore and benefit from it.

That’s all I have for you today. Keep hacking!
[1] That’s one of the reasons I love writing about Emacs features – I often learn something new while doing the research for an article, and as a bonus I get to learn from my readers as well.
Customizing Font-Lock in the Age of Tree-sitter
I recently wrote about building major modes with Tree-sitter over on batsov.com, covering the mode author’s perspective. But what about the user’s perspective? If you’re using a Tree-sitter-powered major mode, how do you actually customize the highlighting?
This is another article in a recent streak inspired by my work on neocaml, clojure-ts-mode, and asciidoc-mode. Building three Tree-sitter modes across very different languages has given me a good feel for both sides of the font-lock equation – and I keep running into users who are puzzled by how different the new system is from the old regex-based world.
This post covers what changed, what you can control, and how to make Tree-sitter font-lock work exactly the way you want.
The Old World: Regex Font-Lock
Traditional font-lock in Emacs actually has two phases. First, syntactic fontification handles comments and strings using the buffer’s syntax table and parse-partial-sexp (implemented in C) – this isn’t regexp-based at all. Second, keyword fontification runs the regexps in font-lock-keywords against the buffer text to highlight everything else: language keywords, types, function names, and so on. When people talk about “regex font-lock,” they usually mean this second phase, which is where most of the mode-specific highlighting lives and where most of the customization happens.

If you wanted to customize it, you’d manipulate font-lock-keywords directly:

```elisp
;; Add a custom highlighting rule in the old world
(font-lock-add-keywords
 'emacs-lisp-mode
 '(("\\<\\(FIXME\\|TODO\\)\\>" 1 'font-lock-warning-face prepend)))
```

The downsides are well-known: regexps can’t understand nesting, they break on multi-line constructs, and getting them right for a real programming language is a never-ending battle of edge cases.
The New World: Tree-sitter Font-Lock
Tree-sitter font-lock is fundamentally different. Instead of matching text with regexps, it queries the syntax tree. A major mode defines treesit-font-lock-settings – a list of Tree-sitter queries paired with faces. Each query pattern matches node types in the parse tree, not text patterns.

This means highlighting is structurally correct by definition. A string is highlighted as a string because the parser identified it as a string node, not because a regexp happened to match quote characters. If the code has a syntax error, the parser still produces a (partial) tree, and highlighting degrades gracefully instead of going haywire.
There’s also a significant performance difference. With regex font-lock, every regexp in font-lock-keywords runs against every line in the visible region on each update – more rules means linearly more work, and a complex major mode can easily have dozens of regexps. Poorly written patterns with nested quantifiers can trigger catastrophic backtracking, causing visible hangs on certain inputs. Multi-line font-lock (via font-lock-multiline or jit-lock-contextually) makes things worse, requiring re-scanning of larger regions that’s both expensive and fragile. Tree-sitter sidesteps all of this: after the initial parse, edits only re-parse the changed portion of the syntax tree, and font-lock queries run against the already-built tree rather than scanning raw text. The result is highlighting that scales much better with buffer size and rule complexity.

The trade-off is that customization works differently. You can’t just add a regexp to a list anymore. But the new system offers its own kind of flexibility, and in many ways it’s more powerful.
Note: The Emacs manual covers Tree-sitter font-lock in the Parser-based Font Lock section. For the full picture of Tree-sitter integration in Emacs, see Parsing Program Source.
Feature Levels: The Coarse Knob
Every Tree-sitter major mode organizes its font-lock rules into features – named groups of related highlighting rules. Features are then arranged into 4 levels, from minimal to maximal. The Emacs manual recommends the following conventions for what goes into each level:
- Level 1: The absolute minimum – typically comment and definition
- Level 2: Key language constructs – keyword, string, type
- Level 3: Everything that can be reasonably fontified (this is the default level)
- Level 4: Marginally useful highlighting – things like bracket, delimiter, operator
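The level lists tell you what a mode offers; to see which features are actually switched on in the current buffer (handy after changing levels or features), you can read them back from treesit-font-lock-settings. This sketch assumes each setting is a list of the form (QUERY ENABLE FEATURE OVERRIDE ...), which is how treesit.el documents it in Emacs 29; my/treesit-enabled-features is a hypothetical name.

```elisp
(require 'treesit)

(defun my/treesit-enabled-features ()
  "Return the font-lock features enabled in the current buffer.
Assumes each entry of `treesit-font-lock-settings' has the form
\(QUERY ENABLE FEATURE OVERRIDE ...)."
  (let (features)
    (dolist (setting treesit-font-lock-settings)
      (let ((enabled (nth 1 setting))
            (feature (nth 2 setting)))
        (when (and enabled (not (memq feature features)))
          (push feature features))))
    (nreverse features)))
```

Run M-: (my/treesit-enabled-features) RET in a Tree-sitter buffer to get the list, in the order the mode defines its rules.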
In practice, many modes don’t follow these conventions precisely. Some put number at level 2, others at level 3. Some include variable at level 1, others at level 4. The inconsistency across modes means that setting treesit-font-lock-level to the same number in different modes can give you quite different results – which is one more reason you might want the fine-grained control described in the next section.[1]

It’s also worth noting that the feature names themselves are not standardized. There are many common ones you’ll see across modes – comment, string, keyword, type, number, bracket, operator, definition, function, variable, constant, builtin – but individual modes often define features specific to their language. Clojure has quote, deref, and tagged-literals; OCaml might have attribute; a markup language mode might have heading or link. Different modes also vary in how granular they get: some expose a rich set of features that let you fine-tune almost every aspect of highlighting, while others are more spartan and stick to the basics.

The bottom line is that you’ll always have to check what your particular mode offers. The easiest way is M-x describe-variable RET treesit-font-lock-feature-list in a buffer using that mode – it shows all features organized by level. You can also inspect the mode’s source directly by looking at how it populates treesit-font-lock-settings (try M-x find-library to jump to the mode’s source).

For example, clojure-ts-mode defines:

| Level | Features |
|-------|----------|
| 1 | comment, definition, variable |
| 2 | keyword, string, char, symbol, builtin, type |
| 3 | constant, number, quote, metadata, doc, regex |
| 4 | bracket, deref, function, tagged-literals |

And neocaml:

| Level | Features |
|-------|----------|
| 1 | comment, definition |
| 2 | keyword, string, number |
| 3 | attribute, builtin, constant, type |
| 4 | operator, bracket, delimiter, variable, function |

The default level is 3, which is a reasonable middle ground for most people. You can change it globally:
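```elisp
(setq treesit-font-lock-level 4) ;; maximum highlighting
```

Or per-mode via a hook:

```elisp
(defun my-clojure-ts-font-lock ()
  (setq-local treesit-font-lock-level 2) ;; levels 1-2 only
  ;; Features were already computed when the mode started; redo it.
  (treesit-font-lock-recompute-features))

(add-hook 'clojure-ts-mode-hook #'my-clojure-ts-font-lock)
```

The recompute call matters: the mode hook runs after the major mode has already enabled its features based on the old level, so just setting the variable there wouldn’t change anything.

This is the equivalent of the old font-lock-maximum-decoration variable, but more principled – features at each level are explicitly chosen by the mode author rather than being an arbitrary “how much highlighting do you want?” dial.

Note: The Emacs manual describes this system in detail under Font Lock and Syntax.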
Cherry-Picking Features: The Fine Knob
Levels are a blunt instrument. What if you want operators and variables (level 4) but not brackets and delimiters (also level 4)? You can’t express that with a single number.
Enter treesit-font-lock-recompute-features. This function lets you explicitly enable or disable individual features, regardless of level:

```elisp
(defun my-neocaml-font-lock ()
  (treesit-font-lock-recompute-features
   ;; Features to enable:
   '(comment definition keyword string number
     attribute builtin constant type operator variable)
   ;; Features to disable:
   '(bracket delimiter function)))

(add-hook 'neocaml-base-mode-hook #'my-neocaml-font-lock)
```

You can also call it interactively with M-x treesit-font-lock-recompute-features to experiment in the current buffer before committing to a configuration.

This used to be hard in the old regex world – you’d have to dig into font-lock-keywords, figure out which entries corresponded to which syntactic elements, and surgically remove them. Emacs 29 improved the situation with font-lock-ignore, which lets you declaratively suppress specific font-lock rules by mode, face, or regexp. Still, the Tree-sitter approach is arguably cleaner: features are named groups designed for exactly this kind of cherry-picking, rather than an escape hatch bolted on after the fact.

Customizing Faces
This part works the same as before – faces are faces. Tree-sitter modes use the standard font-lock-*-face family, so your theme applies automatically. If you want to tweak a specific face:

```elisp
(custom-set-faces
 '(font-lock-type-face ((t (:foreground "DarkSeaGreen4"))))
 '(font-lock-property-use-face ((t (:foreground "DarkOrange3")))))
```

One thing to note: Tree-sitter modes use some of the newer faces introduced in Emacs 29, like font-lock-operator-face, font-lock-bracket-face, font-lock-number-face, font-lock-property-use-face, and font-lock-escape-face. These didn’t exist in the old world (there was no concept of “operator highlighting” in traditional font-lock), so older themes may not define them. If your theme makes operators and variables look the same, that’s why – the theme predates these faces.

Adding Custom Rules
This is where Tree-sitter font-lock really shines compared to the old system. Instead of writing regexps, you write Tree-sitter queries that match on the actual syntax tree.
Say you want to distinguish block-delimiting keywords (begin/end, struct/sig) from control-flow keywords (if/then/else) in OCaml:

```elisp
(defface my-block-keyword-face
  '((t :inherit font-lock-keyword-face :weight bold))
  "Face for block-delimiting keywords.")

(defun my-neocaml-block-keywords ()
  (setq treesit-font-lock-settings
        (append treesit-font-lock-settings
                (treesit-font-lock-rules
                 :language (treesit-parser-language (car (treesit-parser-list)))
                 :override t
                 :feature 'keyword
                 '(["begin" "end" "struct" "sig" "object"] @my-block-keyword-face))))
  (treesit-font-lock-recompute-features))

(add-hook 'neocaml-base-mode-hook #'my-neocaml-block-keywords)
```

The :override t is important – without it, the new rule won’t overwrite faces already applied by the mode’s built-in rules. And the :feature keyword assigns the rule to a feature group, so it respects the level/feature system.

Note: The full query syntax is documented in the “Pattern Matching Tree-sitter Nodes” section of the Emacs Lisp Reference Manual – it covers node types, field names, predicates, wildcards, and more.
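Before you turn a query into a font-lock rule, it helps to see what the query actually captures. Here is a minimal sketch to evaluate with M-: in an OCaml buffer that already has a Tree-sitter parser. It assumes the grammar exposes the anonymous "begin"/"end"/"struct"/"sig" keyword nodes used in the rule above. The capture name @kw is arbitrary.

```elisp
;; Return the text of every node the query captures in the current buffer.
;; treesit-query-capture returns a list of (CAPTURE-NAME . NODE) pairs.
(mapcar (lambda (capture)
          (treesit-node-text (cdr capture) t))
        (treesit-query-capture (treesit-buffer-root-node)
                               '(["begin" "end" "struct" "sig"] @kw)))
```

An empty result, or a query error, tells you the pattern needs adjusting before it goes anywhere near treesit-font-lock-settings.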
For comparison, here’s what you’d need in the old regex world to highlight a specific set of keywords with a different face:
```elisp
;; Old world: fragile, doesn't understand syntax
(font-lock-add-keywords
 'some-mode
 '(("\\<\\(begin\\|end\\|struct\\|sig\\)\\>" . 'my-block-keyword-face)))
```

The regex version looks simpler, but it’ll match begin inside strings, comments, and anywhere else the text appears. The Tree-sitter version only matches actual keyword nodes in the syntax tree.

Exploring the Syntax Tree
The killer feature for customization is M-x treesit-explore-mode. It opens a live view of the syntax tree for the current buffer. As you move point, the explorer highlights the corresponding node and shows its type, field name, and position.

This is indispensable when writing custom font-lock rules. Want to know what node type OCaml labels are? Put point on one, check the explorer: it’s label_name. Want to highlight it? Write a query for (label_name). No more guessing what regexp might work.

Another useful tool is M-x treesit-inspect-node-at-point, which shows information about the node at point in the echo area without opening a separate window.

The Cheat Sheet
Here’s a quick reference for the key differences:
| Aspect | Regex font-lock | Tree-sitter font-lock |
|---|---|---|
| Rules defined by | font-lock-keywords | treesit-font-lock-settings |
| Matching mechanism | Regular expressions on text | Queries on syntax tree nodes |
| Granularity control | font-lock-maximum-decoration | treesit-font-lock-level + features |
| Adding rules | font-lock-add-keywords | Append to treesit-font-lock-settings |
| Removing rules | font-lock-remove-keywords | treesit-font-lock-recompute-features |
| Suppressing rules | font-lock-ignore (Emacs 29+) | Disable features via level or cherry-pick |
| Debugging | re-builder | treesit-explore-mode |
| Handles nesting | Poorly | Correctly (by definition) |
| Multi-line constructs | Fragile | Works naturally |
| Performance | O(n) per regexp per line | Incremental, only re-parses changes |

Closing Thoughts
The shift from regex to Tree-sitter font-lock is one of the bigger under-the-hood changes in modern Emacs. The customization model is different – you’re working with structured queries instead of text patterns – but once you internalize it, it’s arguably more intuitive. You say “highlight this kind of syntax node” instead of “highlight text that matches this pattern and hope it doesn’t match inside a string.”
The feature system with its levels, cherry-picking, and custom rules gives you more control than the old font-lock-maximum-decoration ever did. And treesit-explore-mode makes it easy to discover what’s available.

If you haven’t looked at your Tree-sitter mode’s font-lock features yet, try M-x describe-variable RET treesit-font-lock-feature-list in a Tree-sitter buffer. You might be surprised by how much you can tweak.
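A small helper can print the same information grouped by level, which makes it easier to scan. This is a minimal sketch. The name my-treesit-feature-levels is hypothetical, and the helper only reads treesit-font-lock-feature-list, which holds one list of feature symbols per decoration level.

```elisp
(defun my-treesit-feature-levels ()
  "Return one string per font-lock level listing that level's features.
Reads `treesit-font-lock-feature-list' in the current buffer."
  (let ((level 0))
    (mapcar (lambda (features)
              (setq level (1+ level))
              (format "Level %d: %s"
                      level
                      (mapconcat #'symbol-name features " ")))
            treesit-font-lock-feature-list)))
```

Call it with M-: in a Tree-sitter buffer, for example wrapped in message, to see which level each feature belongs to.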
Writing this article has been more helpful than I expected – halfway through, I realized my own neocaml had function banished to level 4 and number promoted to level 2. Physician, heal thyself.
-
Mastering Compilation Mode
I’ve been using Emacs for over 20 years. I’ve always used M-x compile and next-error without thinking much about them – you run a build, you jump to errors, life is good. But recently, while working on neocaml (a Tree-sitter-based OCaml major mode), I had to write a custom compilation error regexp and learned that compile.el is far more sophisticated and extensible than I ever appreciated.

This post is a deep dive into compilation mode – how it works, how to customize it, and how to build on top of it.
The Basics
If you’re not already using M-x compile, start today. It runs a shell command, captures the output in a *compilation* buffer, and parses error messages so you can jump directly to the offending source locations.

The essential keybindings in a compilation buffer:

| Keybinding | Command | What it does |
|---|---|---|
| g | recompile | Re-run the last compilation command |
| M-n | compilation-next-error | Move to the next error message |
| M-p | compilation-previous-error | Move to the previous error message |
| RET | compile-goto-error | Jump to the source location of the error at point |
| C-c C-f | next-error-follow-minor-mode | Auto-display source as you move through errors |

But the real power move is using next-error and previous-error (M-g n and M-g p) from any buffer. You don’t need to be in the compilation buffer – Emacs tracks the last buffer that produced errors and jumps you there. This works across compile, grep, occur, and any other mode that produces error-like output.

Pro tip: M-g M-n and M-g M-p do the same thing as M-g n/M-g p but are easier to type since you can hold Meta throughout.

How Error Parsing Actually Works
Here’s the part that surprised me. Compilation mode doesn’t have a single regexp that it tries to match against output. Instead, it has a list of regexp entries, and it tries all of them against every line. The list lives in two variables:

- compilation-error-regexp-alist – a list of symbols naming active entries
- compilation-error-regexp-alist-alist – an alist mapping those symbols to their actual regexp definitions

Emacs ships with dozens of entries out of the box – for GCC, Java, Ruby, Python, Perl, Gradle, Maven, and many more. You can see all of them with:

```elisp
(mapcar #'car compilation-error-regexp-alist-alist)
```

Each entry in the alist has this shape:

```elisp
(SYMBOL REGEXP FILE LINE COLUMN TYPE HYPERLINK HIGHLIGHT...)
```

Where:
- REGEXP – the regular expression to match
- FILE – group number (or function) for the filename
- LINE – group number (or cons of start/end groups) for the line
- COLUMN – group number (or cons of start/end groups) for the column
- TYPE – severity: 2 = error, 1 = warning, 0 = info (can also be a cons for conditional severity)
- HYPERLINK – group number for the clickable portion
- HIGHLIGHT – additional faces to apply
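You can see how a particular entry lines up with these fields by pulling them out by position. This is a hypothetical helper: my-compilation-entry-fields is not part of compile.el, and it relies only on the positional layout described above.

```elisp
(require 'compile)

(defun my-compilation-entry-fields (symbol)
  "Return a plist naming the fields of SYMBOL's compilation entry.
Looks SYMBOL up in `compilation-error-regexp-alist-alist' and returns
nil if there is no such entry.  Fields missing from a short entry
come back as nil."
  (let ((entry (assq symbol compilation-error-regexp-alist-alist)))
    (when entry
      (list :regexp    (nth 1 entry)
            :file      (nth 2 entry)
            :line      (nth 3 entry)
            :column    (nth 4 entry)
            :type      (nth 5 entry)
            :hyperlink (nth 6 entry)
            :highlight (nthcdr 7 entry)))))
```

For example, (my-compilation-entry-fields 'gnu) shows how the stock GNU-style entry assigns its groups.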
The TYPE field is particularly interesting. It can be a cons cell (WARNING-GROUP . INFO-GROUP), meaning “if WARNING-GROUP matched, it’s a warning; if INFO-GROUP matched, it’s info; otherwise it’s an error.” This is how a single regexp can handle errors, warnings, and informational messages.

A Real-World Example: OCaml Errors
Let me show you what I built for neocaml. OCaml compiler output looks like this:
```
File "foo.ml", line 10, characters 5-12:
10 | let x = bad_value
             ^^^^^^^
Error: Unbound value bad_value
```

Warnings:

```
File "foo.ml", line 3, characters 6-7:
3 | let _ x = ()
          ^
Warning 27 [unused-var-strict]: unused variable x.
```

And ancillary locations (indented 7 spaces):

```
File "foo.ml", line 5, characters 0-20:
5 | let f (x : int) = x
    ^^^^^^^^^^^^^^^^^^^^
File "foo.ml", line 10, characters 6-7:
10 | f "hello"
           ^
Error: This expression has type string but ...
```

One regexp needs to handle all of this. Here’s the (slightly simplified) entry:
```elisp
(push `(ocaml
        ,neocaml--compilation-error-regexp
        3                                     ; FILE = group 3
        (4 . 5)                               ; LINE = groups 4-5
        (6 . neocaml--compilation-end-column) ; COLUMN = group 6, end via function
        (8 . 9)                               ; TYPE = warning if group 8, info if group 9
        1                                     ; HYPERLINK = group 1
        (8 font-lock-function-name-face))     ; HIGHLIGHT group 8
      compilation-error-regexp-alist-alist)
```

A few things worth noting:

- The COLUMN end position uses a function instead of a group number. OCaml’s end column is exclusive, but Emacs expects inclusive, so neocaml--compilation-end-column subtracts 1.
- The TYPE cons (8 . 9) means: if group 8 matched (Warning/Alert text), it’s a warning; if group 9 matched (7-space indent), it’s info; otherwise it’s an error. Three severity levels from one regexp.
- The entry is registered globally in compilation-error-regexp-alist-alist because *compilation* buffers aren’t in any language-specific mode. Every active entry is tried against every line.
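The end-column adjustment is easy to try in isolation. This is a minimal sketch that assumes a simplified single-line pattern for OCaml's `File ..., line N, characters A-B:` header. The regexp, the group numbers, and the names my-ocaml-location-regexp and my-ocaml-end-column are all hypothetical, not neocaml's actual definitions. Because OCaml columns are 0-based, you would also want compilation-first-column set to 0 in the output buffer.

```elisp
(require 'compile)

(defconst my-ocaml-location-regexp
  (concat "^ *File \"\\([^\"\n]+\\)\", line \\([0-9]+\\)"
          ", characters \\([0-9]+\\)-\\([0-9]+\\):")
  "Hypothetical regexp for OCaml location headers.
Group 1 is the file, 2 the line, 3 the start column and 4 the
exclusive end column.  Leading spaces cover indented locations.")

(defun my-ocaml-end-column ()
  "Return the inclusive end column of the last location match.
OCaml reports an exclusive end column, so subtract 1.  Like any
function used in a compilation entry, this reads the match data
left by the regexp search in the output buffer."
  (1- (string-to-number (match-string 4))))

;; COLUMN is (START-GROUP . END-FUNCTION), the same shape neocaml uses.
(push `(my-ocaml ,my-ocaml-location-regexp 1 2 (3 . my-ocaml-end-column))
      compilation-error-regexp-alist-alist)
;; Registering the symbol here is what makes the entry active.
(push 'my-ocaml compilation-error-regexp-alist)
```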
Adding Your Own Error Regexp
You don’t need to be writing a major mode to add your own entry. Say you’re working with a custom linter that outputs:
```
[ERROR] src/app.js:42:10 - Unused import 'foo'
[WARN] src/app.js:15:3 - Missing return type
```

You can teach compilation mode about it:

```elisp
(with-eval-after-load 'compile
  (push '(my-linter
          "^\\[\\(?:ERROR\\|\\(WARN\\)\\)\\] \\([^:]+\\):\\([0-9]+\\):\\([0-9]+\\)"
          2 3 4 (1))
        compilation-error-regexp-alist-alist)
  (push 'my-linter compilation-error-regexp-alist))
```

The interesting part is the severity. TYPE is (1), which is simply the cons cell (1 . nil): “if group 1 matched, it’s a warning; otherwise it’s an error.” For that to work, group 1 must match only on warning lines. Here it does, because it captures just the WARN branch of a non-capturing group. A more obvious pattern like \\[\\(ERROR\\|WARN\\)\\] would not work, since its group 1 matches on every line and every message would be classified as a warning.
M-x compile with your linter command will highlight errors and warnings differently, and next-error will jump right to them.

Useful Variables You Might Not Know
A few compilation variables that are worth knowing:
```elisp
;; OCaml (and some other languages) use 0-indexed columns
;; (set this buffer-locally, e.g. from a mode hook)
(setq-local compilation-first-column 0)

;; Scroll the compilation buffer to follow output
(setq compilation-scroll-output t)
;; ... or scroll until the first error appears
(setq compilation-scroll-output 'first-error)

;; Skip warnings and info when navigating with next-error
(setq compilation-skip-threshold 2)

;; Auto-close the compilation window on success.  add-hook keeps any
;; other finish functions, and the mode check spares grep and other
;; compilation-derived buffers, which run this hook too.
(add-hook 'compilation-finish-functions
          (lambda (buf status)
            (when (and (string-match-p "finished" status)
                       (eq (buffer-local-value 'major-mode buf)
                           'compilation-mode))
              (run-at-time 1 nil #'delete-windows-on buf))))
```

The compilation-skip-threshold is particularly useful. Set it to 2 and next-error will only stop at actual errors, skipping warnings and info messages. Set it to 1 to also stop at warnings but skip info. Set it to 0 to stop at everything.

The Compilation Mode Family
Compilation mode isn’t just for compilers. Several modes build on it or plug into the same infrastructure:

- grep-mode – M-x grep, M-x rgrep, M-x lgrep all produce output in a compilation-derived buffer. Same next-error navigation, same keybindings.
- occur-mode – M-x occur isn’t technically derived from compilation mode, but it participates in the same next-error infrastructure.
- flymake/flycheck – uses compilation-style error navigation under the hood.
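You can see from Lisp the difference between deriving from compilation mode and merely joining the next-error infrastructure. Here is a quick check using the standard provided-mode-derived-p, plus the buffer-local next-error-function that occur-mode installs:

```elisp
(require 'grep)

;; grep-mode is defined with define-compilation-mode, so it derives
;; from compilation-mode; occur-mode derives from special-mode instead.
(list :grep  (provided-mode-derived-p 'grep-mode 'compilation-mode)
      :occur (provided-mode-derived-p 'occur-mode 'compilation-mode))

;; occur-mode still plugs into next-error by setting the buffer-local
;; `next-error-function', which is all next-error needs to find it.
(with-temp-buffer
  (occur-mode)
  next-error-function)
```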
The grep family deserves special mention. M-x rgrep is recursive grep with file-type filtering, and it’s surprisingly powerful for a built-in tool. The results buffer supports all the same navigation, and you can even edit results and write changes back to the original files. M-x occur has had this built-in for a long time via occur-edit-mode (just press e in the *Occur* buffer). For grep, the wgrep package has been the go-to solution, but starting with Emacs 31 there will be a built-in grep-edit-mode as well. That’s a multi-file search-and-replace workflow that rivals any modern IDE, no external tools required.

Building a Derived Mode
The real fun begins when you create your own compilation-derived mode. Let’s build one for running RuboCop (a Ruby linter and formatter). RuboCop’s
emacs output format looks like this:

```
app/models/user.rb:10:5: C: Style/StringLiterals: Prefer single-quoted strings
app/models/user.rb:25:3: W: Lint/UselessAssignment: Useless assignment to variable - x
app/models/user.rb:42:1: E: Naming/MethodName: Use snake_case for method names
```

The format is FILE:LINE:COLUMN: SEVERITY: CopName: Message, where severity is I (info), R (refactor), C (convention), W (warning), E (error), or F (fatal).

Here’s a complete derived mode:

```elisp
(require 'compile)
(require 'project)

(defvar rubocop-error-regexp-alist
  '((rubocop-offense
     ;; file:line:col: S: Cop/Name: message
     ;; Group 4 only captures non-error severity letters.
     "^\\([^:]+\\):\\([0-9]+\\):\\([0-9]+\\): \\(?:[EF]\\|\\([WCRI]\\)\\): "
     1 2 3 (4) nil (4 compilation-warning-face)))
  "Error regexp alist for RuboCop output.
E/F lines leave group 4 unmatched and count as errors;
W/C/R/I lines match group 4 and count as warnings.")

(define-compilation-mode rubocop-mode "RuboCop"
  "Major mode for RuboCop output."
  (setq-local compilation-error-regexp-alist
              (mapcar #'car rubocop-error-regexp-alist))
  (setq-local compilation-error-regexp-alist-alist
              rubocop-error-regexp-alist))

(defun rubocop-run (&optional directory)
  "Run RuboCop on DIRECTORY (defaults to project root)."
  (interactive)
  (let ((default-directory (or directory
                               (project-root (project-current t)))))
    (compilation-start "rubocop --format emacs" #'rubocop-mode)))
```

A few things to note:

- define-compilation-mode creates a major mode derived from compilation-mode. It inherits all the navigation, font-locking, and next-error integration for free.
- We set compilation-error-regexp-alist and compilation-error-regexp-alist-alist as buffer-local. This means our mode only uses its own regexps, not the global ones. No interference with other tools.
- compilation-start is the workhorse – it runs the command and displays output in a buffer using our mode.
- The TYPE field (4) is shorthand for the cons (4 . nil): if group 4 matched (a W, C, R, or I severity letter), the line is a warning; otherwise (E or F) it’s an error. The capturing group sits in only one branch of the alternation, so error lines leave it unmatched. This is the same trick as in the custom linter example.
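Telling errors apart from warnings comes down to which lines leave a capture group unmatched, and you can check that against sample lines without running RuboCop at all. This is a minimal sketch. The names my-rubocop-regexp and my-rubocop-severity are hypothetical. The pattern assumes E and F are errors and every other severity letter is a warning. The function mimics how compilation mode reads a TYPE of (4): group 4 matched means warning (1), otherwise error (2).

```elisp
(defconst my-rubocop-regexp
  (concat "^\\([^:\n]+\\):\\([0-9]+\\):\\([0-9]+\\): "
          "\\(?:[EF]\\|\\([WCRI]\\)\\): ")
  "Hypothetical RuboCop emacs-format regexp.
Groups: 1 file, 2 line, 3 column, 4 non-error severity letter.")

(defun my-rubocop-severity (line)
  "Return 2 (error) or 1 (warning) for LINE, or nil if it doesn't match.
Mirrors a compilation TYPE of (4): warning if group 4 matched."
  (let ((case-fold-search nil))
    (when (string-match my-rubocop-regexp line)
      (if (match-beginning 4) 1 2))))
```

Lines that aren't offenses, such as RuboCop's progress output, simply don't match and return nil.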
You could extend this with auto-fix support (rubocop -A), or a finish function that reports when the run is done:

```elisp
(require 'subr-x)  ; for string-trim on older Emacsen

(defun rubocop-run (&optional directory)
  "Run RuboCop on DIRECTORY (defaults to project root)."
  (interactive)
  (let* ((default-directory (or directory
                                (project-root (project-current t))))
         (buf (compilation-start "rubocop --format emacs" #'rubocop-mode)))
    ;; Register the hook buffer-locally in the output buffer, where it
    ;; runs when the asynchronous process finishes.
    (with-current-buffer buf
      (add-hook 'compilation-finish-functions
                (lambda (_buf status)
                  (message "RuboCop %s" (string-trim status)))
                nil t))))
```

Because compilation-start runs the command asynchronously, a let binding around the call would only be in effect while compilation-start itself runs, not when the process finishes. That is why the hook goes into the output buffer.

Side note: RuboCop actually ships with a built-in emacs output formatter (that’s what --format emacs uses above), so its output already matches Emacs’s default compilation regexps out of the box – no custom mode needed. I used it here purely to illustrate how define-compilation-mode works. In practice you’d just M-x compile RET rubocop --format emacs and everything would Just Work.[1]

If you want a real, battle-tested rubocop-mode rather than rolling your own, check out rubocop-emacs. It provides commands for running RuboCop on the current file, project, or directory, with proper compilation mode integration. Beyond compilation mode, RuboCop is also supported out of the box by both Flymake (via ruby-flymake-rubocop in Emacs 29+) and Flycheck (via the ruby-rubocop checker), giving you real-time feedback as you edit without needing to run a manual compilation at all.

In practice, most popular development tools already have excellent Emacs integration, so you’re unlikely to need to write your own compilation-derived mode any time soon. The last ones I incorporated into my workflow were ag.el and deadgrep.el – both compilation-derived modes for search tools – and even those have been around for years. Still, understanding how compilation mode works under the hood is valuable for the occasional edge case and for appreciating just how much the ecosystem gives you for free.
next-error is not really an error
There is no spoon.
– The Matrix
The most powerful insight about compilation mode is that it’s not really about compilation. It’s about structured output with source locations. Any tool that produces file/line references can plug into this infrastructure, and once it does, you get next-error navigation for free. The name compilation-mode is a bit of a misnomer – something like structured-output-mode would be more accurate. But then again, naming is hard, and this one has 30+ years of momentum behind it.

This is one of Emacs’s great architectural wins. Whether you’re navigating compiler errors, grep results, test failures, or linter output, the workflow is the same: M-g n to jump to the next problem. Once your fingers learn that pattern, it works everywhere.

I used M-x compile for two decades before I really understood the machinery underneath. Sometimes the tools you use every day are the ones most worth revisiting.

That’s all I have for you today. In Emacs we trust!
[1] Full disclosure: I may know a thing or two about RuboCop’s Emacs formatter.