Posts

  • Mastering Compilation Mode

    I’ve been using Emacs for over 20 years. I’ve always used M-x compile and next-error without thinking much about them – you run a build, you jump to errors, life is good. But recently, while working on neocaml (a Tree-sitter-based OCaml major mode), I had to write a custom compilation error regexp and learned that compile.el is far more sophisticated and extensible than I ever appreciated.

    This post is a deep dive into compilation mode – how it works, how to customize it, and how to build on top of it.

    The Basics

    If you’re not already using M-x compile, start today. It runs a shell command, captures the output in a *compilation* buffer, and parses error messages so you can jump directly to the offending source locations.

    The essential keybindings in a compilation buffer:

    Keybinding  Command                       What it does
    g           recompile                     Re-run the last compilation command
    M-n         compilation-next-error        Move to the next error message
    M-p         compilation-previous-error    Move to the previous error message
    RET         compile-goto-error            Jump to the source location of the error at point
    C-c C-f     next-error-follow-minor-mode  Auto-display source as you move through errors

    But the real power move is using next-error and previous-error (M-g n and M-g p) from any buffer. You don’t need to be in the compilation buffer – Emacs tracks the last buffer that produced errors and jumps you there. This works across compile, grep, occur, and any other mode that produces error-like output.

    Pro tip: M-g M-n and M-g M-p do the same thing as M-g n / M-g p but are easier to type since you can hold Meta throughout.

    How Error Parsing Actually Works

    Here’s the part that surprised me. Compilation mode doesn’t have a single regexp that it tries to match against output. Instead, it has a list of regexp entries, and it tries all of them against every line. The list lives in two variables:

    • compilation-error-regexp-alist – a list of symbols naming active entries
    • compilation-error-regexp-alist-alist – an alist mapping those symbols to their actual regexp definitions

    Emacs ships with dozens of entries out of the box – for GCC, Java, Ruby, Python, Perl, Gradle, Maven, and many more. You can see all of them with:

    (mapcar #'car compilation-error-regexp-alist-alist)
    

    Each entry in the alist has this shape:

    (SYMBOL REGEXP FILE LINE COLUMN TYPE HYPERLINK HIGHLIGHT...)
    

    Where:

    • REGEXP – the regular expression to match
    • FILE – group number (or function) for the filename
    • LINE – group number (or cons of start/end groups) for the line
    • COLUMN – group number (or cons of start/end groups) for the column
    • TYPE – severity: 2 = error, 1 = warning, 0 = info (can also be a cons for conditional severity)
    • HYPERLINK – group number for the clickable portion
    • HIGHLIGHT – additional faces to apply

    The TYPE field is particularly interesting. It can be a cons cell (WARNING-GROUP . INFO-GROUP), meaning “if group N matched, it’s a warning; if group M matched, it’s info; otherwise it’s an error.” This is how a single regexp can handle errors, warnings, and informational messages.
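
    To make that rule concrete, here is a minimal sketch that mimics how a TYPE value is resolved once a regexp has matched. The helper my/compilation-type-for and the sample regexp are hypothetical and exist only for illustration; compile.el does this internally while parsing and may differ in detail.

    ```elisp
    (defun my/compilation-type-for (regexp type line)
      "Return the severity (2, 1 or 0) a TYPE field would yield for LINE.
    Return nil when REGEXP does not match LINE at all.
    Hypothetical helper for illustration, not part of compile.el."
      (when (string-match regexp line)
        (cond
         ((null type) 2)                  ; nil means a plain error
         ((numberp type) type)            ; fixed severity
         ;; (WARNING-GROUP . INFO-GROUP): only *whether* a group matched counts.
         ((and (car type) (match-beginning (car type))) 1)
         ((and (cdr type) (match-beginning (cdr type))) 0)
         (t 2))))

    ;; Group 1 matches only "warning", group 2 only "note".
    (let ((re "^\\(?:\\(warning\\)\\|\\(note\\)\\|error\\): "))
      (mapcar (lambda (line) (my/compilation-type-for re '(1 . 2) line))
              '("warning: unused" "note: declared here" "error: boom")))
    ;; ⇒ (1 0 2)
    ```

    The point to take away is that the cons form never inspects the matched text. The regexp has to be arranged so that each group matches only for the severity it stands for.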

    A Real-World Example: OCaml Errors

    Let me show you what I built for neocaml. OCaml compiler output looks like this:

    File "foo.ml", line 10, characters 5-12:
    10 |   let x = bad_value
                  ^^^^^^^
    Error: Unbound value bad_value
    

    Warnings:

    File "foo.ml", line 3, characters 6-7:
    3 | let _ x = ()
              ^
    Warning 27 [unused-var-strict]: unused variable x.
    

    And ancillary locations (indented 7 spaces):

    File "foo.ml", line 5, characters 0-20:
    5 | let f (x : int) = x
        ^^^^^^^^^^^^^^^^^^^^
           File "foo.ml", line 10, characters 6-7:
    10 |   f "hello"
              ^
    Error: This expression has type string but ...
    

    One regexp needs to handle all of this. Here’s the (slightly simplified) entry:

    (push `(ocaml
            ,neocaml--compilation-error-regexp
            3                                    ; FILE = group 3
            (4 . 5)                              ; LINE = groups 4-5
            (6 . neocaml--compilation-end-column) ; COLUMN = group 6, end via function
            (8 . 9)                              ; TYPE = warning if group 8, info if group 9
            1                                    ; HYPERLINK = group 1
            (8 font-lock-function-name-face))    ; HIGHLIGHT group 8
          compilation-error-regexp-alist-alist)
    

    A few things worth noting:

    • The COLUMN end position uses a function instead of a group number. OCaml’s end column is exclusive, but Emacs expects inclusive, so neocaml--compilation-end-column subtracts 1.
    • The TYPE cons (8 . 9) means: if group 8 matched (Warning/Alert text), it’s a warning; if group 9 matched (7-space indent), it’s info; otherwise it’s an error. Three severity levels from one regexp.
    • The entry is registered globally in compilation-error-regexp-alist-alist because *compilation* buffers aren’t in any language-specific mode. Every active entry is tried against every line.

    Adding Your Own Error Regexp

    You don’t need to be writing a major mode to add your own entry. Say you’re working with a custom linter that outputs:

    [ERROR] src/app.js:42:10 - Unused import 'foo'
    [WARN] src/app.js:15:3 - Missing return type
    

    You can teach compilation mode about it:

    (with-eval-after-load 'compile
      (push '(my-linter
              "^\\[\\(ERROR\\|WARN\\)\\] \\([^:]+\\):\\([0-9]+\\):\\([0-9]+\\)"
              2 3 4 (1 . nil))
            compilation-error-regexp-alist-alist)
      (push 'my-linter compilation-error-regexp-alist))
    

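    This first attempt has a bug, though. The TYPE field (1 . nil) means “if group 1 matched, it’s a warning”, and here group 1 matches on every line, so [ERROR] lines would be classified as warnings too. Compilation mode only checks whether the group matched, not what text it matched. The TYPE can stay the same ((1) is just another way to write (1 . nil)); what has to change is the regexp, so that the group matches only for warnings: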

    (with-eval-after-load 'compile
      (push '(my-linter
              "^\\[\\(?:ERROR\\|\\(WARN\\)\\)\\] \\([^:]+\\):\\([0-9]+\\):\\([0-9]+\\)"
              2 3 4 (1))
            compilation-error-regexp-alist-alist)
      (push 'my-linter compilation-error-regexp-alist))
    

    Here group 1 only matches for WARN lines (it’s inside a non-capturing group with an alternative). TYPE is (1) meaning “if group 1 matched, it’s a warning; otherwise it’s an error.”

    Now M-x compile with your linter command will highlight errors and warnings differently, and next-error will jump right to them.
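
    Before wiring a regexp into compilation mode, it helps to check the groups in isolation. The sketch below runs the corrected regexp against the two sample lines; my/linter-parse is a hypothetical helper used only for this check.

    ```elisp
    (defun my/linter-parse (line)
      "Return (WARN-GROUP FILE LINE COLUMN) for LINE, or nil if it doesn't match.
    Hypothetical helper for testing the my-linter regexp by hand."
      (let ((re "^\\[\\(?:ERROR\\|\\(WARN\\)\\)\\] \\([^:]+\\):\\([0-9]+\\):\\([0-9]+\\)"))
        (when (string-match re line)
          (list (match-string 1 line)                      ; nil unless [WARN]
                (match-string 2 line)                      ; file
                (string-to-number (match-string 3 line))   ; line
                (string-to-number (match-string 4 line)))))) ; column

    (my/linter-parse "[ERROR] src/app.js:42:10 - Unused import 'foo'")
    ;; ⇒ (nil "src/app.js" 42 10)
    (my/linter-parse "[WARN] src/app.js:15:3 - Missing return type")
    ;; ⇒ ("WARN" "src/app.js" 15 3)
    ```

    Group 1 is nil for the [ERROR] line and non-nil for the [WARN] line, which is exactly what the (1) TYPE field relies on.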

    Useful Variables You Might Not Know

    A few compilation variables that are worth knowing:

    ;; OCaml (and some other languages) use 0-indexed columns
    (setq-local compilation-first-column 0)
    
    ;; Scroll the compilation buffer to follow output
    (setq compilation-scroll-output t)
    
    ;; ... or scroll until the first error appears
    (setq compilation-scroll-output 'first-error)
    
    ;; Skip warnings and info when navigating with next-error
    (setq compilation-skip-threshold 2)
    
    ;; Auto-close the compilation window on success
    (setq compilation-finish-functions
          (list (lambda (buf status)
                  (when (string-match-p "finished" status)
                    (run-at-time 1 nil #'delete-windows-on buf)))))
    

    The compilation-skip-threshold is particularly useful. Set it to 2 and next-error will only stop at actual errors, skipping warnings and info messages. Set it to 1 to also stop at warnings but skip info. Set it to 0 to stop at everything.
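
    If you find yourself changing the threshold often, a small command can cycle through the three values. This is only a sketch; my/compilation-cycle-skip-threshold is a made-up name, and it changes the global value of the variable.

    ```elisp
    (require 'compile)

    (defun my/compilation-cycle-skip-threshold ()
      "Cycle `compilation-skip-threshold' through 0, 1 and 2."
      (interactive)
      (setq compilation-skip-threshold
            (mod (1+ compilation-skip-threshold) 3))
      (message "next-error now stops at: %s"
               (pcase compilation-skip-threshold
                 (0 "everything")
                 (1 "warnings and errors")
                 (2 "errors only"))))
    ```

    Bind it to a key of your choice in compilation-mode-map if it turns out to be useful.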

    The Compilation Mode Family

    Compilation mode isn’t just for compilers. Several modes derive from it or plug into the same next-error machinery:

    • grep-mode – M-x grep, M-x rgrep, M-x lgrep all produce output in a compilation-derived buffer. Same next-error navigation, same keybindings.
    • occur-mode – M-x occur isn’t technically derived from compilation mode, but it participates in the same next-error infrastructure.
    • flymake/flycheck – uses compilation-style error navigation under the hood.

    The grep family deserves special mention. M-x rgrep is recursive grep with file-type filtering, and it’s surprisingly powerful for a built-in tool. The results buffer supports all the same navigation, and the wgrep package lets you edit grep results in place and write the changes back to the original files. That’s a workflow that rivals any modern IDE.

    Building a Derived Mode

    The real fun begins when you create your own compilation-derived mode. Let’s build one for running RuboCop (a Ruby linter and formatter). RuboCop’s emacs output format looks like this:

    app/models/user.rb:10:5: C: Style/StringLiterals: Prefer single-quoted strings
    app/models/user.rb:25:3: W: Lint/UselessAssignment: Useless assignment to variable - x
    app/models/user.rb:42:1: E: Naming/MethodName: Use snake_case for method names
    

    The format is FILE:LINE:COLUMN: SEVERITY: CopName: Message where severity is C (convention), W (warning), E (error), or F (fatal).

    Here’s a complete derived mode:

    (require 'compile)
    
    (defvar rubocop-error-regexp-alist
      `((rubocop-offense
         ;; file:line:col: S: Cop/Name: message
         "^\\([^:]+\\):\\([0-9]+\\):\\([0-9]+\\): \\(\\([EWFC]\\)\\): "
         1 2 3 (5 . nil)
         nil (4 compilation-warning-face)))
      "Error regexp alist for RuboCop output.
    Group 5 captures the severity letter (E, W, F or C).  It matches on
    every line, so TYPE (5 . nil) marks every offense as a warning.")
    
    (define-compilation-mode rubocop-mode "RuboCop"
      "Major mode for RuboCop output."
      (setq-local compilation-error-regexp-alist
                  (mapcar #'car rubocop-error-regexp-alist))
      (setq-local compilation-error-regexp-alist-alist
                  rubocop-error-regexp-alist))
    
    (defun rubocop-run (&optional directory)
      "Run RuboCop on DIRECTORY (defaults to project root)."
      (interactive)
      (let ((default-directory (or directory (project-root (project-current t)))))
        (compilation-start "rubocop --format emacs" #'rubocop-mode)))
    

    A few things to note:

    • define-compilation-mode creates a major mode derived from compilation-mode. It inherits all the navigation, font-locking, and next-error integration for free.
    • We set compilation-error-regexp-alist and compilation-error-regexp-alist-alist as buffer-local. This means our mode only uses its own regexps, not the global ones. No interference with other tools.
    • compilation-start is the workhorse – it runs the command and displays output in a buffer using our mode.
    • The TYPE field (5 . nil) deserves a closer look. Group 5 matches on every line, and compilation mode only checks whether the TYPE group matched, not what it matched, so this version classifies every offense as a warning. To distinguish E/F from W/C, you’d need a group that matches only for some severities (the alternation trick from the linter example above) or separate regexp entries. Treating every offense the same keeps the example short, and that is usually fine for a linter.
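
    For reference, here is one way the severity split could look, using the same alternation trick as the linter example earlier. It is a sketch, not what any RuboCop package ships: the my/ names are made up, and mapping C (convention) to info is just one possible choice.

    ```elisp
    (defvar my/rubocop-severity-regexp
      ;; Group 4 matches only W, group 5 only C; E and F match neither.
      "^\\([^:\n]+\\):\\([0-9]+\\):\\([0-9]+\\): \\(?:[EF]\\|\\(W\\)\\|\\(C\\)\\): "
      "Hypothetical regexp for RuboCop's emacs formatter output.")

    (defvar my/rubocop-error-regexp-alist
      `((rubocop-offense ,my/rubocop-severity-regexp
                         1 2 3      ; FILE, LINE, COLUMN
                         (4 . 5)))  ; TYPE: warning if 4, info if 5, else error
      "Hypothetical alist usable as `compilation-error-regexp-alist-alist'.")

    ;; Check which severity each sample line would get.
    (mapcar (lambda (line)
              (and (string-match my/rubocop-severity-regexp line)
                   (cond ((match-beginning 4) 'warning)
                         ((match-beginning 5) 'info)
                         (t 'error))))
            '("app/models/user.rb:42:1: E: Naming/MethodName: Use snake_case"
              "app/models/user.rb:25:3: W: Lint/UselessAssignment: Useless assignment"
              "app/models/user.rb:10:5: C: Style/StringLiterals: Prefer single quotes"))
    ;; ⇒ (error warning info)
    ```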

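    You could extend this with auto-fix support (rubocop -A), or a finish function that sends a notification when the run finishes. One catch: compilation-start returns as soon as the process is launched, so let-binding compilation-finish-functions around it has no effect, because the binding is gone by the time the process exits. Adding the function buffer-locally in the compilation buffer avoids that:

    (defun rubocop-run (&optional directory)
      "Run RuboCop on DIRECTORY (defaults to project root)."
      (interactive)
      (let ((default-directory (or directory (project-root (project-current t)))))
        ;; `compilation-start' returns the compilation buffer right away;
        ;; the finish functions run later, with that buffer current.
        (with-current-buffer
            (compilation-start "rubocop --format emacs" #'rubocop-mode)
          (add-hook 'compilation-finish-functions
                    (lambda (_buf status)
                      (message "RuboCop %s" (string-trim status)))
                    nil t))))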
    

    Side note: RuboCop actually ships with a built-in emacs output formatter (that’s what --format emacs uses above), so its output already matches Emacs’s default compilation regexps out of the box – no custom mode needed. I used it here purely to illustrate how define-compilation-mode works. In practice you’d just M-x compile RET rubocop --format emacs and everything would Just Work.1

    next-error is not really an error

    There is no spoon.

    – The Matrix

    The most powerful insight about compilation mode is that it’s not really about compilation. It’s about structured output with source locations. Any tool that produces file/line references can plug into this infrastructure, and once it does, you get next-error navigation for free. The name compilation-mode is a bit of a misnomer – something like structured-output-mode would be more accurate. But then again, naming is hard, and this one has 30+ years of momentum behind it.

    This is one of Emacs’s great architectural wins. Whether you’re navigating compiler errors, grep results, test failures, or linter output, the workflow is the same: M-g n to jump to the next problem. Once your fingers learn that pattern, it works everywhere.
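
    As a small illustration of how little it takes to plug a tool in, the sketch below runs git grep through compilation-start and reuses grep-mode to parse the file:line: output. The command name my/git-grep is hypothetical, and it assumes default-directory is inside a Git repository.

    ```elisp
    (require 'grep)

    (defun my/git-grep (pattern)
      "Run `git grep' for PATTERN and show the hits in a `grep-mode' buffer.
    Hypothetical example command."
      (interactive "sGit grep for: ")
      (compilation-start
       ;; -n prints line numbers, which is what grep-mode needs to build links.
       (format "git --no-pager grep -n -e %s" (shell-quote-argument pattern))
       #'grep-mode))
    ```

    After running it, M-g n and M-g p walk through the hits exactly as they do for compiler errors.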

    I used M-x compile for two decades before I really understood the machinery underneath. Sometimes the tools you use every day are the ones most worth revisiting.

    That’s all I have for you today. In Emacs we trust!

    1. Full disclosure: I may know a thing or two about RuboCop’s Emacs formatter. 

  • Transpose All The Things

    Most Emacs users know C-t to swap two characters and M-t to swap two words. Some know C-x C-t for swapping lines. But the transpose family goes deeper than that, and with tree-sitter in the picture, things get really interesting.

    Let’s take a tour.

    The Classics

    The three transpose commands everyone knows (or should know):

    Keybinding  Command          What it does
    C-t         transpose-chars  Swap the character before point with the one after
    M-t         transpose-words  Swap the word before point with the one after
    C-x C-t     transpose-lines  Swap the current line with the one above

    These are purely textual – they don’t care about syntax, language, or structure. They work the same in an OCaml buffer, an email draft, or a shell script. Simple and reliable.

    One thing worth noting: transpose-lines is often used not for literal transposition but as a building block for moving lines up and down.
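
    A minimal sketch of that building-block use follows. The command names are made up, the column is not preserved, and the edge cases at the first and last line of the buffer are not handled.

    ```elisp
    (defun my/move-line-up ()
      "Move the current line up by one, keeping point on it."
      (interactive)
      ;; Swap with the line above; point ends up after both lines.
      (transpose-lines 1)
      (forward-line -2))

    (defun my/move-line-down ()
      "Move the current line down by one, keeping point on it."
      (interactive)
      ;; Step onto the next line, swap it with ours, then step back.
      (forward-line 1)
      (transpose-lines 1)
      (forward-line -1))
    ```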

    The Overlooked Ones

    Here’s where it gets more interesting. Emacs has two more transpose commands that most people never discover:

    transpose-sentences (no default keybinding)

    This swaps two sentences around point. In text modes, a “sentence” is determined by sentence-end (typically a period followed by whitespace). In programming modes… well, it depends. More on this below.

    transpose-paragraphs (no default keybinding)

    Swaps two paragraphs. A paragraph is separated by blank lines by default. Less useful in code, but handy when editing prose or documentation.

    Neither command has a default keybinding, which probably explains why they’re so obscure. If you write a lot of prose in Emacs, binding transpose-sentences to something convenient is worth considering.
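
    Something along these lines would do. The key choices are arbitrary and assume nothing else in your config is already bound to C-c t.

    ```elisp
    ;; C-c followed by a letter is reserved for users, so these are safe
    ;; from clashes with Emacs itself (but check your own config).
    (global-set-key (kbd "C-c t s") #'transpose-sentences)
    (global-set-key (kbd "C-c t p") #'transpose-paragraphs)
    ```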

    The MVP: transpose-sexps

    C-M-t (transpose-sexps) is the most powerful of the bunch. It swaps two “balanced expressions” around point. What counts as a balanced expression depends on the mode:

    In Lisp modes, a sexp is what you’d expect – an atom, a string, or a parenthesized form:

    ;; Before: point after `bar`
    (foo bar| baz)
    ;; C-M-t →
    (foo baz bar)
    

    In other programming modes, “sexp” maps to whatever the mode considers a balanced expression – identifiers, string literals, parenthesized groups, function arguments:

    # Before: point after `arg1`
    def foo(arg1|, arg2):
    # C-M-t →
    def foo(arg2, arg1):
    
    (* Before: point after `two` *)
    foobar two| three
    (* C-M-t → *)
    foobar three two
    

    This is incredibly useful for reordering function arguments, swapping let bindings, or rearranging list elements. The catch is that “sexp” is a Lisp-centric concept, and in non-Lisp languages the results can sometimes be surprising – the mode has to define what constitutes a balanced expression, and that definition doesn’t always match your intuition.

    How Tree-sitter Changes Things

    Tree-sitter gives Emacs a full abstract syntax tree (AST) for every buffer, and this fundamentally changes how structural commands work.

    Sexp Navigation and Transposition

    On Emacs 30+, tree-sitter major modes can define a sexp “thing” in treesit-thing-settings. This tells Emacs which AST nodes count as balanced expressions. When this is configured, transpose-sexps (C-M-t) uses treesit-transpose-sexps under the hood, walking the parse tree to find siblings to swap instead of relying on syntax tables.

    The result is more reliable transposition in languages where syntax-table-based sexp detection struggles. OCaml’s nested match arms, Python’s indentation-based blocks, Go’s composite literals – tree-sitter understands them all.

    That said, the Emacs 30 implementation of treesit-transpose-sexps has some rough edges (it sometimes picks the wrong level of the AST). Emacs 31 rewrites the function to work more reliably.1

    Sentence Navigation and Transposition

    This is where things get quietly powerful. On Emacs 30+, tree-sitter modes can also define a sentence thing in treesit-thing-settings. In a programming context, “sentence” typically maps to top-level or block-level statements – let bindings, type definitions, function definitions, imports, etc.

    Once a mode defines this, M-a and M-e navigate between these constructs, and transpose-sentences swaps them:

    (* Before *)
    let x = 42
    let y = 17
    
    (* M-x transpose-sentences → *)
    let y = 17
    let x = 42
    
    # Before
    import os
    import sys
    
    # M-x transpose-sentences →
    import sys
    import os
    

    This is essentially “transpose definitions” or “transpose statements” for free, with no custom code needed beyond the sentence definition.
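
    For mode authors, the definitions themselves are short. The sketch below shows the general shape of a treesit-thing-settings value for an imaginary grammar. The language symbol toylang and all node type names are invented for illustration; a real mode would use the node types of its own grammar.

    ```elisp
    (defun my/toy-ts-setup-things ()
      "Sketch: declare which nodes count as sexps and sentences (Emacs 30+).
    `toylang' and the node type names are hypothetical."
      (setq-local treesit-thing-settings
                  `((toylang
                     ;; Each predicate here is a regexp matched against the node type.
                     (sexp ,(rx bos (or "identifier" "string" "number"
                                        "call" "list")
                                eos))
                     (sentence ,(rx bos (or "let_binding" "type_definition"
                                            "import")
                                    eos))))))
    ```

    A tree-sitter major mode would call something like this from its mode body, before the tree-sitter setup runs.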

    Beyond the Built-ins

    If the built-in transpose commands aren’t enough, several packages extend the concept:

    combobulate is the most comprehensive tree-sitter structural editing package. Its combobulate-drag-up (M-P) and combobulate-drag-down (M-N) commands swap the current AST node with its previous or next sibling. This is like transpose-sexps but more predictable – it uses tree-sitter’s sibling relationships directly, so it works consistently for function parameters, list items, dictionary entries, HTML attributes, and more.

    For simpler needs, packages like drag-stuff and move-text provide line and region dragging without tree-sitter awareness. They’re less precise but work everywhere.

    Wrapping Up

    Here’s the complete transpose family at a glance:

    Keybinding  Command               Tree-sitter aware?
    C-t         transpose-chars       No
    M-t         transpose-words       No
    C-x C-t     transpose-lines       No
    C-M-t       transpose-sexps       Yes (Emacs 30+)
    (unbound)   transpose-sentences   Indirectly (Emacs 30+)
    (unbound)   transpose-paragraphs  No

    The first three are textual workhorses that haven’t changed much in decades. transpose-sexps has been quietly upgraded by tree-sitter into something much more capable. And transpose-sentences is the sleeper hit – once your tree-sitter mode defines what a “sentence” is in your language, you get structural statement transposition for free.

    That’s all I have for you today. Keep hacking!

    1. See bug#60655 for the gory details. 

  • expreg: Expand Region, Reborn

    expand-region is one of my all-time favorite Emacs packages. I’ve been using it since forever – press a key, the selection grows to the next semantic unit, press again, it grows further. Simple, useful, and satisfying. I’ve mentioned it quite a few times over the years, and it’s been a permanent fixture in my config for as long as I can remember.

    But lately I’ve been wondering if there’s a better way. I’ve been playing with Neovim and Helix from time to time (heresy, I know), and both have structural selection baked in via tree-sitter – select a node, expand to its parent, and so on. Meanwhile, I’ve been building and using more tree-sitter major modes in Emacs (e.g. clojure-ts-mode and neocaml), and the contrast started to bother me. We have this rich AST sitting right there in the buffer, but expand-region doesn’t know about it.

    The fundamental limitation is that expand-region relies on hand-written, mode-specific expansion functions for each language. Someone has to write and maintain er/mark-ruby-block, er/mark-python-statement, er/mark-html-tag, and so on. Some languages have great support, others get generic fallbacks. And when a new language comes along, you’re on your own until someone writes the expansion functions for it.

  • Soft Wrapping Done Right with visual-wrap-prefix-mode

    Emacs has always offered two camps when it comes to long lines: hard wrapping (inserting actual newlines at fill-column with M-q or auto-fill-mode) and soft wrapping (displaying long lines across multiple visual lines with visual-line-mode).1

    Hard wrapping modifies the buffer text, which isn’t always desirable. Soft wrapping preserves the text but has always had one glaring problem: continuation lines start at column 0, completely ignoring the indentation context. This makes wrapped code and structured text look terrible.

    Emacs 30 finally solves this with visual-wrap-prefix-mode.

    What it does

    When enabled alongside visual-line-mode, visual-wrap-prefix-mode automatically computes a wrap-prefix for each line based on its surrounding context. Continuation lines are displayed with proper indentation — as if the text had been filled with M-q — but without modifying the buffer at all.

    The effect is purely visual. Your file stays untouched.

    Basic setup

    As usual, you can enable the mode for a single buffer:

    (visual-wrap-prefix-mode 1)
    

    Or globally:

    (global-visual-wrap-prefix-mode 1)
    

    You’ll likely want to pair it with visual-line-mode:

    (global-visual-line-mode 1)
    (global-visual-wrap-prefix-mode 1)
    

    Note that with visual-line-mode soft wrapping happens at the window edge. If you’d like to control the extra indentation applied to continuation lines, you can tweak visual-wrap-extra-indent (default 0):

    ;; Add 2 extra spaces of indentation to wrapped lines
    (setq visual-wrap-extra-indent 2)
    

    Before and after

    Without visual-wrap-prefix-mode (standard visual-line-mode):

        Some deeply indented text that is quite long and
    wraps to the next line without any indentation, which
    looks terrible and breaks the visual structure.
    

    With visual-wrap-prefix-mode:

        Some deeply indented text that is quite long and
        wraps to the next line with proper indentation,
        preserving the visual structure nicely.
    

    A bit of history

    If this sounds familiar, that’s because it’s essentially the adaptive-wrap package from ELPA — renamed and integrated into core Emacs. If you’ve been using adaptive-wrap-prefix-mode, you can now switch to the built-in version and drop the external dependency.
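
    If your config has to work on both old and new Emacs versions, a setup function can pick whichever mode is available. This is a sketch; my/soft-wrap-setup is a made-up name and text-mode-hook is just one reasonable place to enable it.

    ```elisp
    (defun my/soft-wrap-setup ()
      "Enable soft wrapping with indented continuation lines when available."
      (visual-line-mode 1)
      (cond
       ;; Built in since Emacs 30.
       ((fboundp 'visual-wrap-prefix-mode)
        (visual-wrap-prefix-mode 1))
       ;; Older Emacs with the adaptive-wrap package installed from ELPA.
       ((fboundp 'adaptive-wrap-prefix-mode)
        (adaptive-wrap-prefix-mode 1))))

    (add-hook 'text-mode-hook #'my/soft-wrap-setup)
    ```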

    Closing Thoughts

    As mentioned earlier, I’m not into soft wrapping myself - I hate long lines and I prefer code to look exactly the same in every editor. Still, sometimes you’ll be dealing with some code you can’t change, and I guess many people don’t feel as strongly about cross-editor consistency as me. In those cases visual-wrap-prefix-mode will be quite handy!

    I have to admit I had forgotten about auto-fill-mode before doing the research for this article - now I’m wondering why I’m not using it, as pressing M-q all the time can get annoying.

    That’s all I have for you today. Keep hacking!

    1. I’ve always been in the M-q camp. 

  • Preview Regex Replacements as Diffs

    If you’ve ever hesitated before running query-replace-regexp across a large file (or worse, across many files), you’re not alone. Even experienced Emacs users get a bit nervous about large-scale regex replacements. What if the regex matches something unexpected? What if the replacement is subtly wrong?

    Emacs 30 has a brilliant answer to this anxiety: replace-regexp-as-diff.

    How it works

    Run M-x replace-regexp-as-diff, enter your search regexp and replacement string, and instead of immediately applying changes, Emacs shows you a diff buffer with all proposed replacements. You can review every single change in familiar unified diff format before committing to anything.

    If you’re happy with the changes, you can apply them as a patch. If something looks off, just close the diff buffer and nothing has changed.

    Multi-file support

    It gets even better. There are two companion commands for working across files:

    • multi-file-replace-regexp-as-diff — prompts you for a list of files and shows all replacements across them as a single diff.
    • dired-do-replace-regexp-as-diff — works on marked files in Dired. Mark the files you want to transform, run the command, and review the combined diff.

    The Dired integration is particularly nice — mark files with m, run the command from the Dired buffer, and you get a comprehensive preview of all changes.

    Note to self - explore how to hook this into Projectile.
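
    Until then, here is a rough sketch of a project-wide variant built on the built-in project.el rather than Projectile. It assumes that multi-file-replace-regexp-as-diff takes its arguments in the order (FILES REGEXP TO-STRING); check C-h f on your Emacs 30 before relying on it. The command name is hypothetical.

    ```elisp
    (require 'project)

    (defun my/project-replace-regexp-as-diff (regexp to-string)
      "Preview replacing REGEXP with TO-STRING in all project files as a diff.
    Sketch only: assumes the (FILES REGEXP TO-STRING) argument order."
      (interactive
       (list (read-regexp "Replace regexp in project: ")
             (read-string "Replace with: ")))
      (multi-file-replace-regexp-as-diff
       (project-files (project-current t))
       regexp to-string))
    ```

    On a large project you would probably want to filter the file list first, since every file has to be searched to build the diff.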

    A practical example

    Say you want to rename a function across your project. In Dired:

    1. Mark all relevant files with m (or % m to mark by regexp)
    2. Run dired-do-replace-regexp-as-diff
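    3. Enter the search pattern: \_<old_function_name\_> (symbol boundaries are safer than \bold_function_name\b, which would also match inside old_function_name_2 in modes where _ is not a word character)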
    4. Enter the replacement: new_function_name
    5. Review the diff, apply if it looks good
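
    The choice of boundary in step 3 matters more than it looks. The snippet below compares word boundaries (\b) with symbol boundaries (\_< and \_>) under the standard syntax table, where _ is a symbol character but not a word character. The function names in the strings are just sample input.

    ```elisp
    (with-temp-buffer                     ; fundamental-mode, standard syntax table
      (list
       ;; \b sees a word boundary between "name" and "_2": false positive.
       (string-match-p "\\bold_function_name\\b" "old_function_name_2")
       ;; Symbol boundaries reject it, because the symbol continues with "_2".
       (string-match-p "\\_<old_function_name\\_>" "old_function_name_2")
       ;; A genuine call site still matches (at index 1, after the paren).
       (string-match-p "\\_<old_function_name\\_>" "(old_function_name x)")))
    ;; ⇒ (0 nil 1)
    ```

    The doubled backslashes are only needed in Lisp strings; at the minibuffer prompt you type single ones.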

    No more sweaty palms during large refactorings.1

    Closing Thoughts

    I have a feeling that in the age of LLMs probably few people will get excited about doing changes via patches, but it’s a pretty cool workflow overall. I love reviewing changes as diffs and I’ll try to incorporate some of the commands mentioned in this article in my Emacs workflow.

    That’s all I have for you today. Keep hacking!

    1. Assuming you’re still doing any large-scale refactorings “old-school”, that is. And that you actually read the diffs carefully! 
