Posts

  • Taming Font-Lock with font-lock-ignore

    I recently wrote about customizing font-lock in the age of Tree-sitter. After publishing that article, a reader pointed out that I’d overlooked font-lock-ignore – a handy option for selectively disabling font-lock rules that was introduced in Emacs 29. I’ll admit I had no idea it existed, and I figured if I missed it, I’m probably not the only one. [1]

    It’s a bit amusing that something this useful only landed in Emacs 29 – the very release that kicked off the transition to Tree-sitter. Better late than never, right?

    The Problem

    Traditional font-lock gives you two ways to control highlighting: the coarse font-lock-maximum-decoration (pick a level from 1 to 3) and the surgical font-lock-remove-keywords (manually specify which keyword rules to drop). The first is too blunt – you can’t say “I want level 3 but without operator highlighting.” The second is fragile – you need to know the exact internal structure of the mode’s font-lock-keywords and call it from a mode hook.

    What was missing was a declarative way to say “in this mode, don’t highlight these things” without getting your hands dirty with the internals. That’s exactly what font-lock-ignore provides.
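
    To make the fragility concrete, here’s a minimal sketch of the font-lock-remove-keywords route. The my-todo-keywords variable and the rule inside it are made up for illustration; the point is that removal only works if you hand it a rule equal to the one that was installed.

    ;; Hypothetical rule, defined once so the exact same form can be reused.
    (defvar my-todo-keywords
      '(("\\<\\(FIXME\\|TODO\\)\\>" 1 'font-lock-warning-face prepend))
      "Illustrative font-lock rule for FIXME/TODO markers.")
    
    ;; Install the rule for emacs-lisp-mode ...
    (font-lock-add-keywords 'emacs-lisp-mode my-todo-keywords)
    
    ;; ... and remove it again. If the regexp or the highlight spec
    ;; differs in any way, nothing is removed and no error is raised.
    (font-lock-remove-keywords 'emacs-lisp-mode my-todo-keywords)
    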

    How It Works

    font-lock-ignore is a single user option (a defcustom) whose value is an alist. Each entry maps a mode symbol to a list of conditions that describe which font-lock rules to suppress:

    (setq font-lock-ignore
          '((MODE CONDITION ...)
            (MODE CONDITION ...)
            ...))
    

    MODE is a major or minor mode symbol. For major modes, derived-mode-p is used, so a rule for prog-mode applies to all programming modes. For minor modes, the rule applies when the mode is active.

    CONDITION can be:

    • A symbol, typically a face name – suppresses any font-lock rule that references that symbol. Supports glob-style wildcards: font-lock-*-face matches all standard font-lock faces, and a bare * matches everything.
    • A string – suppresses any rule whose matcher is a regexp that matches that string. This lets you disable highlighting of a specific keyword like "TODO" or "defun".
    • (pred FUNCTION) – suppresses rules for which FUNCTION returns non-nil when called with the rule as its only argument.
    • (not CONDITION), (and CONDITION ...), (or CONDITION ...) – the usual logical combinators.
    • (except CONDITION) – carves out exceptions from broader rules by undoing the effect of an earlier matching condition. It’s only valid at top level or inside an or clause.

    Note: The Emacs manual covers font-lock-ignore in the Customizing Keywords section of the Elisp reference.

    When to Use It

    font-lock-ignore is most useful when you’re generally happy with a mode’s highlighting but want to tone down specific aspects. Maybe you find type annotations too noisy, or you don’t want preprocessor directives highlighted, or a minor mode is adding highlighting you don’t care for.

    For Tree-sitter modes, the feature/level system described in my previous article is the right tool for the job. But for traditional modes – and there are still plenty of those – font-lock-ignore fills a gap that existed for decades.

    Discovering Which Faces to Suppress

    To use font-lock-ignore effectively, you need to know which faces are being applied to the text you want to change. A few built-in commands make this easy:

    • C-u C-x = (what-cursor-position with a prefix argument) – the quickest way. It pops up a *Help* buffer describing the character at point, including its face and other text properties.
    • M-x describe-face – prompts for a face name (defaulting to the face at point) and shows its full definition, inheritance chain, and current appearance.
    • M-x list-faces-display – opens a buffer listing all defined faces with visual samples. Handy for browsing the font-lock-*-face family and the newer Emacs 29 faces like font-lock-bracket-face and font-lock-operator-face.

    Once you’ve identified the face, just drop it into font-lock-ignore.
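
    If you’d rather get just the face name in the echo area, a tiny helper along these lines should do. The command name my-face-at-point is my own invention, not something Emacs ships with.

    (defun my-face-at-point ()
      "Show the face (or list of faces) at point in the echo area."
      (interactive)
      ;; `get-char-property' also takes overlays into account.
      (message "Face at point: %S"
               (or (get-char-property (point) 'face) 'default)))
    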

    Practical Examples

    Here’s the example from the Emacs manual, which shows off the full range of conditions:

    (setq font-lock-ignore
          '((prog-mode font-lock-*-face
                       (except help-echo))
            (emacs-lisp-mode (except ";;;###autoload"))
            (whitespace-mode whitespace-empty-at-bob-regexp)
            (makefile-mode (except *))))
    

    Let’s break it down:

    1. In all prog-mode derivatives, suppress all standard font-lock-*-face highlighting (syntactic fontification for comments and strings is unaffected, since that uses the syntax table, not keyword rules).
    2. But keep any rules that add a help-echo text property.
    3. In emacs-lisp-mode, also keep the ;;;###autoload cookie highlighting (which rule 1 would have suppressed).
    4. When whitespace-mode is active, additionally suppress the whitespace-empty-at-bob-regexp highlight.
    5. In makefile-mode, (except *) undoes all previous conditions, effectively exempting Makefiles from any filtering.

    Here are some simpler, more focused examples:

    ;; Disable type highlighting in all programming modes
    (setq font-lock-ignore
          '((prog-mode font-lock-type-face)))
    
    ;; Disable bracket and operator faces specifically
    (setq font-lock-ignore
          '((prog-mode font-lock-bracket-face
                       font-lock-operator-face)))
    
    ;; Disable keyword highlighting in python-mode only
    (setq font-lock-ignore
          '((python-mode font-lock-keyword-face)))
    

    Pretty sweet, right?
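
    One thing to keep in mind: each setq above replaces the whole value of font-lock-ignore, so the snippets are alternatives rather than something to paste one after another. If you want all three behaviors at once, merge them into a single alist, roughly like this:

    ;; The three examples above, combined into one value
    (setq font-lock-ignore
          '((prog-mode font-lock-type-face
                       font-lock-bracket-face
                       font-lock-operator-face)
            (python-mode font-lock-keyword-face)))
    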

    Important Caveats

    A few things to keep in mind:

    • font-lock-ignore only affects keyword fontification (the regexp-based rules in font-lock-keywords). It does not touch syntactic fontification – comments and strings highlighted via the syntax table are not affected.
    • It’s a global option, not buffer-local. You scope rules to specific modes via the alist keys.
    • Since it filters rules at compile time (during font-lock-compile-keywords), changes take effect the next time font-lock is initialized in a buffer. If you’re experimenting, run M-x font-lock-mode twice (off then on) to see your changes.
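
    If you find yourself toggling font-lock a lot while experimenting, you can wrap the off-then-on dance in a small command. This is just a convenience sketch; my-font-lock-reinit is a hypothetical name.

    (defun my-font-lock-reinit ()
      "Turn font-lock off and back on in the current buffer."
      (interactive)
      (font-lock-mode -1)
      (font-lock-mode 1))
    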

    The End

    I don’t know about you, but I really wish font-lock-ignore had been added to Emacs a long time ago. Still, the transition to Tree-sitter modes is bound to take years, so many of us will get plenty of use out of it.

    That’s all I have for you today. Keep hacking!

    1. That’s one of the reasons I love writing about Emacs features – I often learn something new while doing the research for an article, and as a bonus I get to learn from my readers as well.

  • Customizing Font-Lock in the Age of Tree-sitter

    I recently wrote about building major modes with Tree-sitter over on batsov.com, covering the mode author’s perspective. But what about the user’s perspective? If you’re using a Tree-sitter-powered major mode, how do you actually customize the highlighting?

    This is another article in a recent streak inspired by my work on neocaml, clojure-ts-mode, and asciidoc-mode. Building three Tree-sitter modes across very different languages has given me a good feel for both sides of the font-lock equation – and I keep running into users who are puzzled by how different the new system is from the old regex-based world.

    This post covers what changed, what you can control, and how to make Tree-sitter font-lock work exactly the way you want.

    The Old World: Regex Font-Lock

    Traditional font-lock in Emacs actually has two phases. First, syntactic fontification handles comments and strings using the buffer’s syntax table and parse-partial-sexp (implemented in C) – this isn’t regexp-based at all. Second, keyword fontification runs the regexps in font-lock-keywords against the buffer text to highlight everything else: language keywords, types, function names, and so on. When people talk about “regex font-lock,” they usually mean this second phase, which is where most of the mode-specific highlighting lives and where most of the customization happens.

    If you wanted to customize it, you’d manipulate font-lock-keywords directly:

    ;; Add a custom highlighting rule in the old world
    (font-lock-add-keywords 'emacs-lisp-mode
      '(("\\<\\(FIXME\\|TODO\\)\\>" 1 'font-lock-warning-face prepend)))
    

    The downsides are well-known: regexps can’t understand nesting, they break on multi-line constructs, and getting them right for a real programming language is a never-ending battle of edge cases.

    The New World: Tree-sitter Font-Lock

    Tree-sitter font-lock is fundamentally different. Instead of matching text with regexps, it queries the syntax tree. A major mode defines treesit-font-lock-settings – a list of Tree-sitter queries paired with faces. Each query pattern matches node types in the parse tree, not text patterns.

    This means highlighting is structurally correct by definition. A string is highlighted as a string because the parser identified it as a string node, not because a regexp happened to match quote characters. If the code has a syntax error, the parser still produces a (partial) tree, and highlighting degrades gracefully instead of going haywire.

    There’s also a significant performance difference. With regex font-lock, every regexp in font-lock-keywords runs against every line in the visible region on each update – more rules means linearly more work, and a complex major mode can easily have dozens of regexps. Poorly written patterns with nested quantifiers can trigger catastrophic backtracking, causing visible hangs on certain inputs. Multi-line font-lock (via font-lock-multiline or jit-lock-contextually) makes things worse, requiring re-scanning of larger regions that’s both expensive and fragile. Tree-sitter sidesteps all of this: after the initial parse, edits only re-parse the changed portion of the syntax tree, and font-lock queries run against the already-built tree rather than scanning raw text. The result is highlighting that scales much better with buffer size and rule complexity.

    The trade-off is that customization works differently. You can’t just add a regexp to a list anymore. But the new system offers its own kind of flexibility, and in many ways it’s more powerful.

    Note: The Emacs manual covers Tree-sitter font-lock in the Parser-based Font Lock section. For the full picture of Tree-sitter integration in Emacs, see Parsing Program Source.

    Feature Levels: The Coarse Knob

    Every Tree-sitter major mode organizes its font-lock rules into features – named groups of related highlighting rules. Features are then arranged into 4 levels, from minimal to maximal. The Emacs manual recommends the following conventions for what goes into each level:

    • Level 1: The absolute minimum – typically comment and definition
    • Level 2: Key language constructs – keyword, string, type
    • Level 3: Everything that can be reasonably fontified (this is the default level)
    • Level 4: Marginally useful highlighting – things like bracket, delimiter, operator

    In practice, many modes don’t follow these conventions precisely. Some put number at level 2, others at level 3. Some include variable at level 1, others at level 4. The inconsistency across modes means that setting treesit-font-lock-level to the same number in different modes can give you quite different results – which is one more reason you might want the fine-grained control described in the next section. [1]

    It’s also worth noting that the feature names themselves are not standardized. There are many common ones you’ll see across modes – comment, string, keyword, type, number, bracket, operator, definition, function, variable, constant, builtin – but individual modes often define features specific to their language. Clojure has quote, deref, and tagged-literals; OCaml might have attribute; a markup language mode might have heading or link. Different modes also vary in how granular they get: some expose a rich set of features that let you fine-tune almost every aspect of highlighting, while others are more spartan and stick to the basics.

    The bottom line is that you’ll always have to check what your particular mode offers. The easiest way is M-x describe-variable RET treesit-font-lock-feature-list in a buffer using that mode – it shows all features organized by level. You can also inspect the mode’s source directly by looking at how it populates treesit-font-lock-settings (try M-x find-library to jump to the mode’s source).
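
    If you check this often, a small command can print the same information in the echo area. This is only a sketch: my-treesit-show-features is a made-up name, and all it does is format the value of treesit-font-lock-feature-list.

    (require 'seq)
    
    (defun my-treesit-show-features ()
      "Show the current buffer's Tree-sitter font-lock features by level."
      (interactive)
      (if (bound-and-true-p treesit-font-lock-feature-list)
          (message "%s"
                   (mapconcat
                    #'identity
                    ;; One "Level N: a, b, c" line per sublist.
                    (seq-map-indexed
                     (lambda (features index)
                       (format "Level %d: %s"
                               (1+ index)
                               (mapconcat #'symbol-name features ", ")))
                     treesit-font-lock-feature-list)
                    "\n"))
        (message "No Tree-sitter font-lock features in this buffer")))
    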

    For example, clojure-ts-mode defines:

    Level  Features
    1      comment, definition, variable
    2      keyword, string, char, symbol, builtin, type
    3      constant, number, quote, metadata, doc, regex
    4      bracket, deref, function, tagged-literals

    And neocaml:

    Level  Features
    1      comment, definition
    2      keyword, string, number
    3      attribute, builtin, constant, type
    4      operator, bracket, delimiter, variable, function

    The default level is 3, which is a reasonable middle ground for most people. You can change it globally:

    (setq treesit-font-lock-level 4)  ;; maximum highlighting
    

    Or per-mode via a hook:

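    (defun my-clojure-ts-font-lock ()
      (setq-local treesit-font-lock-level 2)  ;; levels 1 and 2 only
      ;; The mode has already computed its features by the time the hook
      ;; runs, so recompute them for the new level to take effect.
      (treesit-font-lock-recompute-features))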
    
    (add-hook 'clojure-ts-mode-hook #'my-clojure-ts-font-lock)
    

    This is the equivalent of the old font-lock-maximum-decoration variable, but more principled – features at each level are explicitly chosen by the mode author rather than being an arbitrary “how much highlighting do you want?” dial.

    Note: The Emacs manual describes this system in detail under Font Lock and Syntax.

    Cherry-Picking Features: The Fine Knob

    Levels are a blunt instrument. What if you want operators and variables (level 4) but not brackets and delimiters (also level 4)? You can’t express that with a single number.

    Enter treesit-font-lock-recompute-features. This function lets you explicitly enable or disable individual features, regardless of level:

    (defun my-neocaml-font-lock ()
      (treesit-font-lock-recompute-features
       '(comment definition keyword string number
         attribute builtin constant type operator variable)  ;; enable
       '(bracket delimiter function)))                       ;; disable
    
    (add-hook 'neocaml-base-mode-hook #'my-neocaml-font-lock)
    

    You can also evaluate it with M-: (eval-expression), e.g. (treesit-font-lock-recompute-features '(operator) '(bracket)), to experiment in the current buffer before committing to a configuration.

    This used to be hard in the old regex world – you’d have to dig into font-lock-keywords, figure out which entries corresponded to which syntactic elements, and surgically remove them. Emacs 29 improved the situation with font-lock-ignore, which lets you declaratively suppress specific font-lock rules by mode, face, or regexp. Still, the Tree-sitter approach is arguably cleaner: features are named groups designed for exactly this kind of cherry-picking, rather than an escape hatch bolted on after the fact.

    Customizing Faces

    This part works the same as before – faces are faces. Tree-sitter modes use the standard font-lock-*-face family, so your theme applies automatically. If you want to tweak a specific face:

    (custom-set-faces
     '(font-lock-type-face ((t (:foreground "DarkSeaGreen4"))))
     '(font-lock-property-use-face ((t (:foreground "DarkOrange3")))))
    

    One thing to note: Tree-sitter modes use some of the newer faces introduced in Emacs 29, like font-lock-operator-face, font-lock-bracket-face, font-lock-number-face, font-lock-property-use-face, and font-lock-escape-face. These didn’t exist in the old world (there was no concept of “operator highlighting” in traditional font-lock), so older themes may not define them. If your theme makes operators and variables look the same, that’s why – the theme predates these faces.
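
    If your theme hasn’t caught up yet, one stopgap is to point the new faces at older ones that the theme does define. Which face inherits from which is purely my own choice here, so adjust to taste. A theme loaded afterwards may still override this.

    ;; The faces only exist in Emacs 29 or later, hence the guard.
    (when (facep 'font-lock-operator-face)
      (set-face-attribute 'font-lock-operator-face nil
                          :inherit 'font-lock-keyword-face)
      (set-face-attribute 'font-lock-number-face nil
                          :inherit 'font-lock-constant-face))
    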

    Adding Custom Rules

    This is where Tree-sitter font-lock really shines compared to the old system. Instead of writing regexps, you write Tree-sitter queries that match on the actual syntax tree.

    Say you want to distinguish block-delimiting keywords (begin/end, struct/sig) from control-flow keywords (if/then/else) in OCaml:

    (defface my-block-keyword-face
      '((t :inherit font-lock-keyword-face :weight bold))
      "Face for block-delimiting keywords.")
    
    (defun my-neocaml-block-keywords ()
      (setq treesit-font-lock-settings
            (append treesit-font-lock-settings
                    (treesit-font-lock-rules
                     :language (treesit-parser-language
                                (car (treesit-parser-list)))
                     :override t
                     :feature 'keyword
                     '(["begin" "end" "struct" "sig" "object"]
                       @my-block-keyword-face))))
      (treesit-font-lock-recompute-features))
    
    (add-hook 'neocaml-base-mode-hook #'my-neocaml-block-keywords)
    

    The :override t is important – without it, the new rule won’t overwrite faces already applied by the mode’s built-in rules. And the :feature keyword assigns the rule to a feature group, so it respects the level/feature system.

    Note: The full query syntax is documented in the Pattern Matching section of the Emacs manual – it covers node types, field names, predicates, wildcards, and more.

    For comparison, here’s what you’d need in the old regex world to highlight a specific set of keywords with a different face:

    ;; Old world: fragile, doesn't understand syntax
    (font-lock-add-keywords 'some-mode
      '(("\\<\\(begin\\|end\\|struct\\|sig\\)\\>" . 'my-block-keyword-face)))
    

    The regex version looks simpler, but it’s purely textual: it fires wherever begin appears as a word and relies on earlier fontification to keep it out of strings and comments. The Tree-sitter version only matches actual keyword nodes in the syntax tree.

    Exploring the Syntax Tree

    The killer feature for customization is M-x treesit-explore-mode. It opens a live view of the syntax tree for the current buffer. As you move point, the explorer highlights the corresponding node and shows its type, field name, and position.

    This is indispensable when writing custom font-lock rules. Want to know what node type OCaml labels are? Put point on one, check the explorer: it’s label_name. Want to highlight it? Write a query for (label_name). No more guessing what regexp might work.

    Another useful tool is M-x treesit-inspect-node-at-point, which shows information about the node at point in the echo area without opening a separate window.
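
    To close the loop on the label_name example, here’s roughly what such a rule could look like, following the same pattern as the block-keyword rule earlier. Treat it as a sketch: the function name is made up, the choice of font-lock-property-use-face is arbitrary, and I’m assuming the rule is filed under the variable feature, which sits at level 4 in neocaml and therefore has to be enabled explicitly (that also turns on the mode’s own variable highlighting).

    (defun my-neocaml-labels ()
      (setq treesit-font-lock-settings
            (append treesit-font-lock-settings
                    (treesit-font-lock-rules
                     :language (treesit-parser-language
                                (car (treesit-parser-list)))
                     :override t
                     :feature 'variable
                     '((label_name) @font-lock-property-use-face))))
      ;; variable is above the default level 3, so enable it by name.
      (treesit-font-lock-recompute-features '(variable)))
    
    (add-hook 'neocaml-base-mode-hook #'my-neocaml-labels)
    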

    The Cheat Sheet

    Here’s a quick reference for the key differences:

    Aspect                 Regex font-lock               Tree-sitter font-lock
    Rules defined by       font-lock-keywords            treesit-font-lock-settings
    Matching mechanism     Regular expressions on text   Queries on syntax tree nodes
    Granularity control    font-lock-maximum-decoration  treesit-font-lock-level + features
    Adding rules           font-lock-add-keywords        Append to treesit-font-lock-settings
    Removing rules         font-lock-remove-keywords     treesit-font-lock-recompute-features
    Suppressing rules      font-lock-ignore (Emacs 29+)  Disable features via level or cherry-pick
    Debugging              re-builder                    treesit-explore-mode
    Handles nesting        Poorly                        Correctly (by definition)
    Multi-line constructs  Fragile                       Works naturally
    Performance            O(n) per regexp per line      Incremental, only re-parses changes

    Closing Thoughts

    The shift from regex to Tree-sitter font-lock is one of the bigger under-the-hood changes in modern Emacs. The customization model is different – you’re working with structured queries instead of text patterns – but once you internalize it, it’s arguably more intuitive. You say “highlight this kind of syntax node” instead of “highlight text that matches this pattern and hope it doesn’t match inside a string.”

    The feature system with its levels, cherry-picking, and custom rules gives you more control than the old font-lock-maximum-decoration ever did. And treesit-explore-mode makes it easy to discover what’s available.

    If you haven’t looked at your Tree-sitter mode’s font-lock features yet, try M-x describe-variable RET treesit-font-lock-feature-list in a Tree-sitter buffer. You might be surprised by how much you can tweak.

    1. Writing this article has been more helpful than I expected – halfway through, I realized my own neocaml had function banished to level 4 and number promoted to level 2. Physician, heal thyself. 

  • Mastering Compilation Mode

    I’ve been using Emacs for over 20 years. I’ve always used M-x compile and next-error without thinking much about them – you run a build, you jump to errors, life is good. But recently, while working on neocaml (a Tree-sitter-based OCaml major mode), I had to write a custom compilation error regexp and learned that compile.el is far more sophisticated and extensible than I ever appreciated.

    This post is a deep dive into compilation mode – how it works, how to customize it, and how to build on top of it.

    The Basics

    If you’re not already using M-x compile, start today. It runs a shell command, captures the output in a *compilation* buffer, and parses error messages so you can jump directly to the offending source locations.

    The essential keybindings in a compilation buffer:

    Keybinding  Command                       What it does
    g           recompile                     Re-run the last compilation command
    M-n         compilation-next-error        Move to the next error message
    M-p         compilation-previous-error    Move to the previous error message
    RET         compile-goto-error            Jump to the source location of the error at point
    C-c C-f     next-error-follow-minor-mode  Auto-display source as you move through errors

    But the real power move is using next-error and previous-error (M-g n and M-g p) from any buffer. You don’t need to be in the compilation buffer – Emacs tracks the last buffer that produced errors and jumps you there. This works across compile, grep, occur, and any other mode that produces error-like output.

    Pro tip: M-g M-n and M-g M-p do the same thing as M-g n / M-g p but are easier to type since you can hold Meta throughout.

    How Error Parsing Actually Works

    Here’s the part that surprised me. Compilation mode doesn’t have a single regexp that it tries to match against output. Instead, it has a list of regexp entries, and it tries all of them against every line. The list lives in two variables:

    • compilation-error-regexp-alist – a list of symbols naming active entries
    • compilation-error-regexp-alist-alist – an alist mapping those symbols to their actual regexp definitions

    Emacs ships with dozens of entries out of the box – for GCC, Java, Ruby, Python, Perl, Gradle, Maven, and many more. You can see all of them with:

    (mapcar #'car compilation-error-regexp-alist-alist)
    

    Each entry in the alist has this shape:

    (SYMBOL REGEXP FILE LINE COLUMN TYPE HYPERLINK HIGHLIGHT...)
    

    Where:

    • REGEXP – the regular expression to match
    • FILE – group number (or function) for the filename
    • LINE – group number (or cons of start/end groups) for the line
    • COLUMN – group number (or cons of start/end groups) for the column
    • TYPE – severity: 2 = error, 1 = warning, 0 = info (can also be a cons for conditional severity)
    • HYPERLINK – group number for the clickable portion
    • HIGHLIGHT – additional faces to apply

    The TYPE field is particularly interesting. It can be a cons cell (WARNING-GROUP . INFO-GROUP), meaning “if group WARNING-GROUP matched, it’s a warning; otherwise, if group INFO-GROUP matched, it’s info; otherwise it’s an error.” This is how a single regexp can handle errors, warnings, and informational messages.

    A Real-World Example: OCaml Errors

    Let me show you what I built for neocaml. OCaml compiler output looks like this:

    File "foo.ml", line 10, characters 5-12:
    10 |   let x = bad_value
                  ^^^^^^^
    Error: Unbound value bad_value
    

    Warnings:

    File "foo.ml", line 3, characters 6-7:
    3 | let _ x = ()
              ^
    Warning 27 [unused-var-strict]: unused variable x.
    

    And ancillary locations (indented 7 spaces):

    File "foo.ml", line 5, characters 0-20:
    5 | let f (x : int) = x
        ^^^^^^^^^^^^^^^^^^^^
           File "foo.ml", line 10, characters 6-7:
    10 |   f "hello"
              ^
    Error: This expression has type string but ...
    

    One regexp needs to handle all of this. Here’s the (slightly simplified) entry:

    (push `(ocaml
            ,neocaml--compilation-error-regexp
            3                                    ; FILE = group 3
            (4 . 5)                              ; LINE = groups 4-5
            (6 . neocaml--compilation-end-column) ; COLUMN = group 6, end via function
            (8 . 9)                              ; TYPE = warning if group 8, info if group 9
            1                                    ; HYPERLINK = group 1
            (8 font-lock-function-name-face))    ; HIGHLIGHT group 8
          compilation-error-regexp-alist-alist)
    

    A few things worth noting:

    • The COLUMN end position uses a function instead of a group number. OCaml’s end column is exclusive, but Emacs expects inclusive, so neocaml--compilation-end-column subtracts 1.
    • The TYPE cons (8 . 9) means: if group 8 matched (Warning/Alert text), it’s a warning; if group 9 matched (7-space indent), it’s info; otherwise it’s an error. Three severity levels from one regexp.
    • The entry is registered globally in compilation-error-regexp-alist-alist because *compilation* buffers aren’t in any language-specific mode. Every active entry is tried against every line.

    Adding Your Own Error Regexp

    You don’t need to be writing a major mode to add your own entry. Say you’re working with a custom linter that outputs:

    [ERROR] src/app.js:42:10 - Unused import 'foo'
    [WARN] src/app.js:15:3 - Missing return type
    

    You can teach compilation mode about it:

    (with-eval-after-load 'compile
      (push '(my-linter
              "^\\[\\(ERROR\\|WARN\\)\\] \\([^:]+\\):\\([0-9]+\\):\\([0-9]+\\)"
              2 3 4 (1 . nil))
            compilation-error-regexp-alist-alist)
      (push 'my-linter compilation-error-regexp-alist))
    

    There’s a catch, though. The TYPE field (1 . nil) means “if group 1 matched, it’s a warning”, and in this regexp group 1 always matches, so [ERROR] lines would be classified as warnings too. Compilation mode only checks whether the group matched, not what it matched. The fix is to restructure the regexp so the group participates only for warnings:

    (with-eval-after-load 'compile
      (push '(my-linter
              "^\\[\\(?:ERROR\\|\\(WARN\\)\\)\\] \\([^:]+\\):\\([0-9]+\\):\\([0-9]+\\)"
              2 3 4 (1))
            compilation-error-regexp-alist-alist)
      (push 'my-linter compilation-error-regexp-alist))
    

    Here group 1 only matches for WARN lines (it’s the only capturing alternative inside the non-capturing group). TYPE is (1), which is the same list as (1 . nil), meaning “if group 1 matched, it’s a warning; otherwise it’s an error.”

    Now M-x compile with your linter command will highlight errors and warnings differently, and next-error will jump right to them.
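
    Before wiring a regexp into compilation mode, it helps to sanity-check the groups against a couple of sample lines. Here’s a small sketch that does that with plain string-match. The names my-linter-regexp and my-linter-parse-line are hypothetical helpers for this check only; compilation mode doesn’t use them.

    (defconst my-linter-regexp
      "^\\[\\(?:ERROR\\|\\(WARN\\)\\)\\] \\([^:]+\\):\\([0-9]+\\):\\([0-9]+\\)"
      "Same regexp as in the my-linter entry above.")
    
    (defun my-linter-parse-line (line)
      "Return (SEVERITY FILE LINE COLUMN) for LINE, or nil if it doesn't match."
      (when (string-match my-linter-regexp line)
        ;; Group 1 participates only for WARN lines.
        (list (if (match-beginning 1) 'warning 'error)
              (match-string 2 line)
              (string-to-number (match-string 3 line))
              (string-to-number (match-string 4 line)))))
    
    (my-linter-parse-line "[ERROR] src/app.js:42:10 - Unused import 'foo'")
    ;; => (error "src/app.js" 42 10)
    (my-linter-parse-line "[WARN] src/app.js:15:3 - Missing return type")
    ;; => (warning "src/app.js" 15 3)
    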

    Useful Variables You Might Not Know

    A few compilation variables that are worth knowing:

    ;; OCaml (and some other languages) use 0-indexed columns
    (setq-local compilation-first-column 0)
    
    ;; Scroll the compilation buffer to follow output
    (setq compilation-scroll-output t)
    
    ;; ... or scroll until the first error appears
    (setq compilation-scroll-output 'first-error)
    
    ;; Skip warnings and info when navigating with next-error
    (setq compilation-skip-threshold 2)
    
    ;; Auto-close the compilation window on success
    (setq compilation-finish-functions
          (list (lambda (buf status)
                  (when (string-match-p "finished" status)
                    (run-at-time 1 nil #'delete-windows-on buf)))))
    

    The compilation-skip-threshold is particularly useful. Set it to 2 and next-error will only stop at actual errors, skipping warnings and info messages. Set it to 1 to also stop at warnings but skip info. Set it to 0 to stop at everything.
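
    Since the right threshold tends to depend on what you’re doing at the moment, a toggle command can be handy. A minimal sketch, with a made-up command name:

    (require 'compile)
    
    (defun my-compilation-toggle-skip-warnings ()
      "Toggle whether `next-error' stops at warnings or only at errors."
      (interactive)
      (setq compilation-skip-threshold
            (if (eq compilation-skip-threshold 2) 1 2))
      (message "next-error now stops at %s"
               (if (eq compilation-skip-threshold 2)
                   "errors only"
                 "warnings and errors")))
    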

    The Compilation Mode Family

    Compilation mode isn’t just for compilers. Several modes derive from it or plug into the same next-error infrastructure:

    • grep-mode – M-x grep, M-x rgrep, M-x lgrep all produce output in a compilation-derived buffer. Same next-error navigation, same keybindings.
    • occur-mode – M-x occur isn’t technically derived from compilation mode, but it participates in the same next-error infrastructure.
    • flymake/flycheck – not derived from compilation mode either, but they offer the same style of jump-to-error navigation.

    The grep family deserves special mention. M-x rgrep is recursive grep with file-type filtering, and it’s surprisingly powerful for a built-in tool. The results buffer supports all the same navigation, and you can even edit results and write changes back to the original files. M-x occur has had this built-in for a long time via occur-edit-mode (just press e in the *Occur* buffer). For grep, the wgrep package has been the go-to solution, but starting with Emacs 31 there will be a built-in grep-edit-mode as well. That’s a multi-file search-and-replace workflow that rivals any modern IDE, no external tools required.

    Building a Derived Mode

    The real fun begins when you create your own compilation-derived mode. Let’s build one for running RuboCop (a Ruby linter and formatter). RuboCop’s emacs output format looks like this:

    app/models/user.rb:10:5: C: Style/StringLiterals: Prefer single-quoted strings
    app/models/user.rb:25:3: W: Lint/UselessAssignment: Useless assignment to variable - x
    app/models/user.rb:42:1: E: Naming/MethodName: Use snake_case for method names
    

    The format is FILE:LINE:COLUMN: SEVERITY: CopName: Message where severity is C (convention), W (warning), E (error), or F (fatal).

    Here’s a complete derived mode:

    (require 'compile)
    
    (defvar rubocop-error-regexp-alist
      '((rubocop-offense
         ;; file:line:col: S: Cop/Name: message
         ;; Group 4 participates only for W and C, so TYPE (4) marks
         ;; those lines as warnings; E and F lines count as errors.
         "^\\([^:\n]+\\):\\([0-9]+\\):\\([0-9]+\\): \\(?:[EF]\\|\\([WC]\\)\\): "
         1 2 3 (4)))
      "Error regexp alist for RuboCop output.
    Group 4 matches the severity letter for W and C only (warnings).
    E and F leave it unmatched and are treated as errors.")
    
    (define-compilation-mode rubocop-mode "RuboCop"
      "Major mode for RuboCop output."
      (setq-local compilation-error-regexp-alist
                  (mapcar #'car rubocop-error-regexp-alist))
      (setq-local compilation-error-regexp-alist-alist
                  rubocop-error-regexp-alist))
    
    (defun rubocop-run (&optional directory)
      "Run RuboCop on DIRECTORY (defaults to project root)."
      (interactive)
      (let ((default-directory (or directory (project-root (project-current t)))))
        (compilation-start "rubocop --format emacs" #'rubocop-mode)))
    

    A few things to note:

    • define-compilation-mode creates a major mode derived from compilation-mode. It inherits all the navigation, font-locking, and next-error integration for free.
    • We set compilation-error-regexp-alist and compilation-error-regexp-alist-alist as buffer-local. This means our mode only uses its own regexps, not the global ones. No interference with other tools.
    • compilation-start is the workhorse – it runs the command and displays output in a buffer using our mode.
    • The TYPE field (4) is the same list as (4 . nil): if group 4 matched, the line is a warning; otherwise it’s an error. Compilation mode only checks whether the group matched, never what it matched, so the regexp is written so that group 4 participates only for W and C. That’s the same trick as in the linter example above.

    You could extend this with auto-fix support (rubocop -A), or a sentinel function that sends a notification when the run finishes:

    (defun rubocop-run (&optional directory)
      "Run RuboCop on DIRECTORY (defaults to project root)."
      (interactive)
      (let ((default-directory (or directory (project-root (project-current t)))))
        ;; The process runs asynchronously, so a let-binding would be gone
        ;; by the time it finishes. Add the function buffer-locally in the
        ;; compilation buffer instead.
        (with-current-buffer
            (compilation-start "rubocop --format emacs" #'rubocop-mode)
          (add-hook 'compilation-finish-functions
                    (lambda (_buf status)
                      (message "RuboCop %s" (string-trim status)))
                    nil t))))
    

    Side note: RuboCop actually ships with a built-in emacs output formatter (that’s what --format emacs uses above), so its output already matches Emacs’s default compilation regexps out of the box – no custom mode needed. I used it here purely to illustrate how define-compilation-mode works. In practice you’d just M-x compile RET rubocop --format emacs and everything would Just Work.1

    If you want a real, battle-tested rubocop-mode rather than rolling your own, check out rubocop-emacs. It provides commands for running RuboCop on the current file, project, or directory, with proper compilation mode integration. Beyond compilation mode, RuboCop is also supported out of the box by both Flymake (via ruby-flymake-rubocop in Emacs 26+) and Flycheck (via the ruby-rubocop checker), giving you real-time feedback as you edit without needing to run a manual compilation at all.

    In practice, most popular development tools already have excellent Emacs integration, so you’re unlikely to need to write your own compilation-derived mode any time soon. The last ones I incorporated into my workflow were ag.el and deadgrep.el – both compilation-derived modes for search tools – and even those have been around for years. Still, understanding how compilation mode works under the hood is valuable for the occasional edge case and for appreciating just how much the ecosystem gives you for free.

    next-error is not really an error

    There is no spoon.

    – The Matrix

    The most powerful insight about compilation mode is that it’s not really about compilation. It’s about structured output with source locations. Any tool that produces file/line references can plug into this infrastructure, and once it does, you get next-error navigation for free. The name compilation-mode is a bit of a misnomer – something like structured-output-mode would be more accurate. But then again, naming is hard, and this one has 30+ years of momentum behind it.

    This is one of Emacs’s great architectural wins. Whether you’re navigating compiler errors, grep results, test failures, or linter output, the workflow is the same: M-g n to jump to the next problem. Once your fingers learn that pattern, it works everywhere.
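
    To make that concrete, here’s a small sketch that has nothing to do with compiling. It runs git grep and displays the results in grep-mode (itself a compilation-derived mode), so M-g n and M-g p walk through the matches. The command name my-project-todos is hypothetical, and I’m assuming you’re inside a Git-backed project:

    ```elisp
    (require 'grep)
    (require 'project)

    (defun my-project-todos ()
      "List TODO markers in the current project using `git grep'."
      (interactive)
      (let ((default-directory (project-root (project-current t))))
        ;; -n prints file:line:text, which `grep-mode' knows how to parse.
        (compilation-start "git grep -n --no-color TODO" #'grep-mode)))
    ```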

    I used M-x compile for two decades before I really understood the machinery underneath. Sometimes the tools you use every day are the ones most worth revisiting.

    That’s all I have for you today. In Emacs we trust!

    1. Full disclosure: I may know a thing or two about RuboCop’s Emacs formatter. 

  • Transpose All The Things

    Most Emacs users know C-t to swap two characters and M-t to swap two words. Some know C-x C-t for swapping lines. But the transpose family goes deeper than that, and with tree-sitter in the picture, things get really interesting.

    Let’s take a tour.

    The Classics

    The three transpose commands everyone knows (or should know):

    Keybinding   Command           What it does
    C-t          transpose-chars   Swap the character before point with the one after
    M-t          transpose-words   Swap the word before point with the one after
    C-x C-t      transpose-lines   Swap the current line with the one above

    These are purely textual – they don’t care about syntax, language, or structure. They work the same in an OCaml buffer, an email draft, or a shell script. Simple and reliable.

    One thing worth noting: transpose-lines is often used not for literal transposition but as a building block for moving lines up and down.
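
    A minimal sketch of that idea follows. The command names are my own, and this simple version doesn’t preserve the column or handle the first line of the buffer gracefully:

    ```elisp
    (defun my-move-line-up ()
      "Move the current line up by one."
      (interactive)
      ;; Swap this line with the previous one; point ends up after both.
      (transpose-lines 1)
      ;; Go back to the line we just moved.
      (forward-line -2))

    (defun my-move-line-down ()
      "Move the current line down by one."
      (interactive)
      ;; Step onto the next line so it gets swapped with the original one.
      (forward-line 1)
      (transpose-lines 1)
      ;; Go back to the line we just moved.
      (forward-line -1))
    ```

    Bind them to whatever keys you like and you have basic line dragging without any extra packages.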

    One caveat: transpose-lines doesn’t play well with visual-line-mode. Since visual-line-mode wraps long lines visually without inserting actual newlines, what looks like several lines on screen may be a single buffer line. transpose-lines operates on real (logical) lines, so it can end up swapping far more text than you expected. This is one of the reasons I’m not a fan of visual-line-mode – it subtly breaks commands that operate on lines. If you must use visual-line-mode, your best workaround is to fall back to transpose-sentences or transpose-paragraphs (which rely on punctuation and blank lines rather than newlines), or temporarily disable visual-line-mode with M-x visual-line-mode before transposing.

    The Overlooked Ones

    Here’s where it gets more interesting. Emacs has two more transpose commands that most people never discover:

    transpose-sentences (no default keybinding)

    This swaps two sentences around point. In text modes, a “sentence” is determined by sentence-end (typically a period followed by whitespace). In programming modes… well, it depends. More on this below.

    transpose-paragraphs (no default keybinding)

    Swaps two paragraphs. A paragraph is separated by blank lines by default. Less useful in code, but handy when editing prose or documentation.

    Neither command has a default keybinding, which probably explains why they’re so obscure. If you write a lot of prose in Emacs, binding transpose-sentences to something convenient is worth considering.
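
    For example, something along these lines would do. The C-c t prefix is just my pick for illustration (C-c followed by a letter is reserved for user bindings), so adjust to taste:

    ```elisp
    ;; Hypothetical bindings under a personal C-c t prefix.
    (global-set-key (kbd "C-c t s") #'transpose-sentences)
    (global-set-key (kbd "C-c t p") #'transpose-paragraphs)
    ```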

    The MVP: transpose-sexps

    C-M-t (transpose-sexps) is the most powerful of the bunch. It swaps two “balanced expressions” around point. What counts as a balanced expression depends on the mode:

    In Lisp modes, a sexp is what you’d expect – an atom, a string, or a parenthesized form:

    ;; Before: point after `bar`
    (foo bar| baz)
    ;; C-M-t →
    (foo baz bar)
    

    In other programming modes, “sexp” maps to whatever the mode considers a balanced expression – identifiers, string literals, parenthesized groups, function arguments:

    # Before: point after `arg1`
    def foo(arg1|, arg2):
    # C-M-t →
    def foo(arg2, arg1):
    
    (* Before: point after `two` *)
    foobar two| three
    (* C-M-t → *)
    foobar three two
    

    This is incredibly useful for reordering function arguments, swapping let bindings, or rearranging list elements. The catch is that “sexp” is a Lisp-centric concept, and in non-Lisp languages the results can sometimes be surprising – the mode has to define what constitutes a balanced expression, and that definition doesn’t always match your intuition.

    How Tree-sitter Changes Things

    Tree-sitter gives Emacs a full syntax tree for any buffer whose major mode runs a tree-sitter parser (loosely called an AST here), and this fundamentally changes how structural commands work.

    Sexp Navigation and Transposition

    On Emacs 30+, tree-sitter major modes can define a sexp “thing” in treesit-thing-settings. This tells Emacs which AST nodes count as balanced expressions. When this is configured, transpose-sexps (C-M-t) uses treesit-transpose-sexps under the hood, walking the parse tree to find siblings to swap instead of relying on syntax tables.

    The result is more reliable transposition in languages where syntax-table-based sexp detection struggles. OCaml’s nested match arms, Python’s indentation-based blocks, Go’s composite literals – tree-sitter understands them all.

    That said, the Emacs 30 implementation of treesit-transpose-sexps has some rough edges (it sometimes picks the wrong level of the AST). Emacs 31 rewrites the function to work more reliably.1
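
    If you’re curious which implementation C-M-t ends up using in a given buffer, a quick diagnostic sketch like this one can help. The command name is made up, and it assumes transpose-sexps-function is how your Emacs version dispatches (it is guarded with boundp for older releases). In a tree-sitter mode that defines sexp things you’d expect it to report treesit-transpose-sexps:

    ```elisp
    (defun my-describe-transpose-sexps-backend ()
      "Show which function `transpose-sexps' delegates to in this buffer."
      (interactive)
      (message "transpose-sexps-function: %S"
               (if (boundp 'transpose-sexps-function)
                   transpose-sexps-function
                 'unavailable)))
    ```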

    Sentence Navigation and Transposition

    This is where things get quietly powerful. On Emacs 30+, tree-sitter modes can also define a sentence thing in treesit-thing-settings. In a programming context, “sentence” typically maps to top-level or block-level statements – let bindings, type definitions, function definitions, imports, etc.

    Once a mode defines this, M-a and M-e navigate between these constructs, and transpose-sentences swaps them:

    (* Before *)
    let x = 42
    let y = 17
    
    (* M-x transpose-sentences → *)
    let y = 17
    let x = 42
    
    # Before
    import os
    import sys
    
    # M-x transpose-sentences →
    import sys
    import os
    

    This is essentially “transpose definitions” or “transpose statements” for free, with no custom code needed beyond the sentence definition.
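
    For mode authors, the sentence definition might look roughly like the sketch below. Everything named my-lang here is hypothetical, including the grammar and the node type names, which you’d replace with the ones from your language’s grammar. It assumes Emacs 30+ and an installed grammar:

    ```elisp
    (require 'treesit)

    (define-derived-mode my-lang-ts-mode prog-mode "MyLang"
      "Hypothetical tree-sitter major mode, for illustration only."
      (when (treesit-ready-p 'my-lang)
        (treesit-parser-create 'my-lang)
        ;; Each entry is (THING PRED); here PRED is a regexp matched
        ;; against node types.  The node names are placeholders.
        (setq-local treesit-thing-settings
                    `((my-lang
                       (sentence ,(rx bos
                                      (or "import_statement"
                                          "let_binding"
                                          "type_definition")
                                      eos)))))
        ;; Let treesit wire up navigation based on the settings above.
        (treesit-major-mode-setup)))
    ```

    The important part is that the thing settings are in place before treesit-major-mode-setup runs, since that’s when the mode’s navigation functions get configured.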

    Beyond the Built-ins

    If the built-in transpose commands aren’t enough, several packages extend the concept:

    combobulate is the most comprehensive tree-sitter structural editing package. Its combobulate-drag-up (M-P) and combobulate-drag-down (M-N) commands swap the current AST node with its previous or next sibling. This is like transpose-sexps but more predictable – it uses tree-sitter’s sibling relationships directly, so it works consistently for function parameters, list items, dictionary entries, HTML attributes, and more.

    For simpler needs, packages like drag-stuff and move-text provide line and region dragging without tree-sitter awareness. They’re less precise but work everywhere.
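
    As a rough example, enabling move-text takes a couple of lines. This assumes the package is already installed and uses its default M-<up>/M-<down> bindings:

    ```elisp
    ;; Assumes the third-party move-text package is installed.
    (when (require 'move-text nil t)
      ;; Bind M-<up> and M-<down> to move the current line or region.
      (move-text-default-bindings))
    ```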

    Wrapping Up

    Here’s the complete transpose family at a glance:

    Keybinding   Command                Tree-sitter aware?
    C-t          transpose-chars        No
    M-t          transpose-words        No
    C-x C-t      transpose-lines        No
    C-M-t        transpose-sexps        Yes (Emacs 30+)
    (unbound)    transpose-sentences    Indirectly (Emacs 30+)
    (unbound)    transpose-paragraphs   No

    The first three are textual workhorses that haven’t changed much in decades. transpose-sexps has been quietly upgraded by tree-sitter into something much more capable. And transpose-sentences is the sleeper hit – once your tree-sitter mode defines what a “sentence” is in your language, you get structural statement transposition for free.

    That’s all I have for you today. Keep hacking!

    1. See bug#60655 for the gory details. 

  • expreg: Expand Region, Reborn

    expand-region is one of my all-time favorite Emacs packages. I’ve been using it since forever – press a key, the selection grows to the next semantic unit, press again, it grows further. Simple, useful, and satisfying. I’ve mentioned it quite a few times over the years, and it’s been a permanent fixture in my config for as long as I can remember.

    But lately I’ve been wondering if there’s a better way. I’ve been playing with Neovim and Helix from time to time (heresy, I know), and both have structural selection baked in via tree-sitter – select a node, expand to its parent, and so on. Meanwhile, I’ve been building and using more tree-sitter major modes in Emacs (e.g. clojure-ts-mode and neocaml), and the contrast started to bother me. We have this rich AST sitting right there in the buffer, but expand-region doesn’t know about it.

    The fundamental limitation is that expand-region relies on hand-written, mode-specific expansion functions for each language. Someone has to write and maintain er/mark-ruby-block, er/mark-python-statement, er/mark-html-tag, and so on. Some languages have great support, others get generic fallbacks. And when a new language comes along, you’re on your own until someone writes the expansion functions for it.
