Djot PHP: A Modern Markup Parser

If you’ve ever wished Markdown was a bit more consistent and feature-rich, you’ll want to hear about Djot – and now there’s a complete PHP implementation available.

What is Djot?

Djot is a lightweight markup language by John MacFarlane, the author of CommonMark and Pandoc. It takes the best ideas from Markdown while addressing many of its ambiguities and limitations. The syntax is familiar yet more predictable, making it an excellent choice for content-heavy applications. You could call it a possible successor to Markdown.

The php-collective/djot composer package brings full Djot support to PHP 8.2+, with 100% compatibility with the official djot test suite.

Use Cases

Let’s talk about common cases where such a markup language would be beneficial:

  • Blog engines and CMS platforms
  • Documentation systems
  • Technical writing applications
  • User-generated content (comments, forums) with profile-based restrictions
  • Any project requiring lightweight markup with advanced formatting
  • Projects that need custom, business-specific markup constructs
  • Applications where content has to be secure by design

Let’s see if Djot fits these needs.

Feature Highlights

Rich Text Formatting

Djot supports the familiar emphasis and strong formatting, plus several extras:

| Syntax          | Result      | Description      |
|-----------------|-------------|------------------|
| *Strong*        | Strong      | Bold text        |
| _Emphasized_    | Emphasized  | Italic text      |
| {=Highlighted=} | Highlighted | Highlighted text |
| {+Inserted+}    | Inserted    | Inserted text    |
| {-Deleted-}     | Deleted     | Deleted text     |
| `code`          | code        | Inline code      |
| E=mc^2^         | E=mc²       | Superscript      |
| H~2~O           | H₂O         | Subscript        |

Smart Typography

Smart quotes, em-dashes, en-dashes, and ellipsis are handled automatically:

  • "Hello" becomes “Hello” with curved quotes
  • --- becomes an em-dash (—)
  • -- becomes an en-dash (–)
  • ... becomes an ellipsis (…)
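
To make the four substitutions concrete, here is a rough standalone sketch in plain PHP. It is not how djot-php implements smart typography (the real parser works contextually); `smartTypography()` is a hypothetical helper that only mirrors the rules listed above.

```php
<?php

/**
 * Naive illustration of the four smart-typography rules.
 * Hypothetical helper, not part of djot-php.
 */
function smartTypography(string $text): string
{
    // strtr() with an array tries the longest key first,
    // so "---" wins over "--" at the same position.
    $text = strtr($text, [
        '---' => '—',
        '--'  => '–',
        '...' => '…',
    ]);

    // Pair up straight double quotes into curly quotes.
    return preg_replace('/"([^"]*)"/u', '“$1”', $text);
}

echo smartTypography('"Hello" -- wait---what...'), PHP_EOL;
// “Hello” – wait—what…
```

The ordering matters: replacing `--` before `---` would turn an em-dash into an en-dash followed by a stray hyphen, which is why the sketch leans on `strtr()` and its longest-match behavior.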

Tables with Alignment

Full table support with column alignment:

| Feature     | Status |   Notes |
|:------------|:------:|--------:|
| Left-align  | Center | Right   |

Task Lists

Native checkbox support for task lists:

- [x] Create parser
- [x] Create renderer
- [ ] World domination

Since this post is written in Djot, here’s the actual rendered output:

  ☑ Create parser
  ☑ Create renderer
  ☐ World domination

Divs with Classes

Create styled containers with the triple-colon syntax:

::: warning
This is a warning message.
:::

Renders as:

<div class="warning">
<p>This is a warning message.</p>
</div>

Live demo:

Note: This is a note block. Use it for tips, hints, or additional information that complements the main content.

Warning: This is a warning block. Use it to highlight important cautions or potential issues that readers should be aware of.

Spans with Attributes

Add classes, IDs, or custom attributes to inline content:

This is [important]{.highlight #key-point}

Code Blocks

Fenced code blocks with syntax highlighting hints:

```php
$converter = new DjotConverter();
echo $converter->convert($text);
```

Captions (Images, Blockquotes & Tables)

The ^ prefix adds a caption to the block immediately above it:

| Block Type | HTML Output               |
|------------|---------------------------|
| Image      | <figure> + <figcaption>   |
| Table      | <caption> inside <table>  |
| Blockquote | <figure> + <figcaption>   |

> To be or not to be,
> that is the question.

^ William Shakespeare

Renders as:

To be or not to be, that is the question.

William Shakespeare

The Markdown Elephant in the Room

Let’s be honest: Markdown has quirks. Ever spent 20 minutes debugging why your nested list won’t render correctly? Or wondered why _this_works_ but _this_doesn't_ in some parsers?

Djot was designed by someone who knows these pain points intimately – John MacFarlane literally wrote the CommonMark spec. With Djot, he started fresh with lessons learned from years of Markdown edge cases.

The result? A syntax that feels familiar but actually behaves predictably. Your users write content, not workarounds.

Why Djot Over Markdown?

  • More consistent syntax – Fewer edge cases and ambiguities
  • Better nesting – Clear rules for nested emphasis and containers
  • Built-in features – Highlights, insertions, deletions, and spans without extensions
  • Smart typography – Automatic without additional plugins
  • Cleaner specification – Easier to implement correctly
  • Easier to extend – AST makes adding new features straightforward
  • Secure by design – Raw HTML such as <b>...</b> in the input is treated as plain text unless it is explicitly marked as raw

Djot vs Markdown: Quick Comparison

| Feature          | Markdown             | Djot                  |
|------------------|----------------------|-----------------------|
| Strong           | **text** or __text__ | *text*                |
| Emphasis         | *text* or _text_     | _text_                |
| Highlight        | ❌ (needs extension) | {=text=}              |
| Insert/Delete    | ❌ (needs extension) | {+text+} / {-text-}   |
| Attributes       | ❌ (non-standard)    | [text]{.class #id}    |
| Divs             |                      | ::: classname         |
| Smart quotes     | Depends on parser    | Always on             |
| Nested emphasis  | Inconsistent         | Predictable           |
| Hard line breaks | Two trailing spaces  | Visible \ (backslash) |

Trailing spaces are problematic since most IDEs and editors auto-trim whitespace. Using a visible \ character is much cleaner.

Auto-HTML is also problematic for user-generated content. Djot treats everything as text by default – you must explicitly enable raw HTML (see below).

Basic Usage

Converting Djot to HTML is straightforward:

use Djot\DjotConverter;

$converter = new DjotConverter();
$html = $converter->convert($djotText);

Need XHTML output? Just pass a flag:

$converter = new DjotConverter(xhtml: true);

Advanced Usage

For more control, you can work with the AST directly:

$converter = new DjotConverter();

// Parse to AST
$document = $converter->parse($djotText);

// Manipulate the AST if needed...

// Render to HTML
$html = $converter->render($document);

Markdown compatibility modes

Note: These modes are specific to this library and not yet part of the official spec. Using them in your apps means your users get the best of both concepts, but it also means you need to document the behavior yourself and cannot “just” link to the Djot spec.

Soft break mode

Configure soft breaks as per context and user needs:

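| Mode    | HTML Output | Browser Display                         |
|---------|-------------|-----------------------------------------|
| Newline | \n          | No visible break (whitespace collapsed) |
| Space   | (space)     | No visible break (whitespace collapsed) |
| Break   | <br>        | Visible line break                      |

$renderer = $converter->getRenderer(); // HtmlRenderer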

// Default - newline in source, invisible in browser
$renderer->setSoftBreakMode(SoftBreakMode::Newline);

// Space - same visual result, slightly smaller HTML
$renderer->setSoftBreakMode(SoftBreakMode::Space);

// Break - every source line break becomes visible <br>
$renderer->setSoftBreakMode(SoftBreakMode::Break);

This offers some compatibility for users who are used to Markdown-style inputs where a line break typed within normal text stays visible. That makes it useful for chats or simple text inputs.

Because this only affects rendering, not parsing, the parser itself stays fully spec-compliant.
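
As a rough illustration of what the three modes mean for a single paragraph, here is a standalone sketch in plain PHP. `renderSoftBreaks()` and its string mode names are hypothetical; djot-php itself uses the `SoftBreakMode` enum on its renderer, and its exact output formatting may differ.

```php
<?php

/**
 * Shows what each soft break mode emits for one paragraph.
 * Hypothetical helper, not the djot-php renderer.
 */
function renderSoftBreaks(string $paragraph, string $mode): string
{
    $separator = match ($mode) {
        'newline' => "\n",   // kept in the HTML source, collapsed by the browser
        'space'   => ' ',    // same visual result
        'break'   => '<br>', // visible line break
        default   => throw new InvalidArgumentException("Unknown mode: {$mode}"),
    };

    // Escape each line first, so the only markup in the result is added here.
    $lines = array_map(
        static fn (string $line): string => htmlspecialchars($line, ENT_QUOTES, 'UTF-8'),
        explode("\n", $paragraph),
    );

    return '<p>' . implode($separator, $lines) . '</p>';
}

$source = "Line one\nLine two";
echo renderSoftBreaks($source, 'space'), PHP_EOL; // <p>Line one Line two</p>
echo renderSoftBreaks($source, 'break'), PHP_EOL; // <p>Line one<br>Line two</p>
```

The parsed structure is identical in all three cases; only the separator written between the lines changes.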

Significant Newlines Mode (Markdown-Like)

This mode is for users accustomed to Markdown’s “human” behavior where newlines intuitively interrupt blocks.

The Djot specification states: “Paragraphs can never be interrupted by other block-level elements.”

In standard Djot, this means lists and other elements require blank lines before them – more “spaced” than what Markdown users expect.

There’s an easy solution to get the best of both worlds:

$converter = new DjotConverter(significantNewlines: true);

$result = $converter->convert("Here's a list:
- Item one
- Item two");
// Output: <p>Here's a list:</p>\n<ul><li>Item one</li><li>Item two</li></ul>

If you need a marker character (-, *, +, >) at the start of a line without triggering a block, use escaping:

// Without escaping - creates a list
$result = $converter->convert("Price:
- 10 dollars");
// Output: <p>Price:</p><ul><li>10 dollars</li></ul>

// With escaping - literal text
$result = $converter->convert("Price:
\\- 10 dollars");
// Output: <p>Price:<br>- 10 dollars</p>

This returns you to standard Djot behavior for that line.

This mode is useful when migrating existing systems where users expect Markdown-like behavior – most content works without changes, and the rare edge cases can be escaped. For offline docs and anything that needs to stay portable across Djot implementations, stick with the default, spec-compliant behavior.

Customization

Custom Rendering with Events

Want to customize how specific elements render? Use the event system:

use Djot\Renderer\Event\RenderEvent;

$renderer = $converter->getRenderer();

// Convert :emoji: symbols to actual emoji
$renderer->addEventListener('render.symbol', function (RenderEvent $event) {
    $node = $event->getNode();
    $emoji = match ($node->getName()) {
        'smile' => '😊',
        'heart' => '❤️',
        'rocket' => '🚀',
        default => ':' . $node->getName() . ':',
    };
    $event->setHtml($emoji);
});

// Add target="_blank" to external links
$renderer->addEventListener('render.link', function (RenderEvent $event) {
    $link = $event->getNode();
    $url = $link->getDestination();
    if (str_starts_with($url, 'http')) {
        $link->setAttribute('target', '_blank');
        $link->setAttribute('rel', 'noopener noreferrer');
    }
});

Extensibility: Custom Patterns

Need @mentions, #hashtags, or wiki-style links? The parser supports custom inline patterns:

use Djot\Node\Inline\Link;
use Djot\Node\Inline\Text;

$parser = $converter->getParser()->getInlineParser();

// @mentions → profile links
$parser->addInlinePattern('/@([a-zA-Z0-9_]+)/', function ($match, $groups, $p) {
    $link = new Link('/users/' . $groups[1]);
    $link->appendChild(new Text('@' . $groups[1]));
    return $link;
});

// #hashtags → tag pages
$parser->addInlinePattern('/#([a-zA-Z][a-zA-Z0-9_]*)/', function ($match, $groups, $p) {
    $link = new Link('/tags/' . strtolower($groups[1]));
    $link->appendChild(new Text('#' . $groups[1]));
    return $link;
});

echo $converter->convert('Hey @john, check out #PHP!');
// <p>Hey <a href="/users/john">@john</a>, check out <a href="/tags/php">#PHP</a>!</p>
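
One thing worth checking before shipping such a pattern is what else it matches. The sketch below uses only plain `preg_match_all()` (no djot-php involved) to show that the simple mention pattern also fires inside e-mail addresses, and how a lookbehind guards against that. Whether `addInlinePattern()` passes a lookbehind through to PCRE unchanged is an assumption; `captures()` is a hypothetical helper.

```php
<?php

/** Returns every capture of group 1 for $pattern in $text. */
function captures(string $pattern, string $text): array
{
    preg_match_all($pattern, $text, $matches);

    return $matches[1];
}

$text = 'Hey @john, mail me@example.com';

// Pattern from the example above: also matches inside the e-mail address.
print_r(captures('/@([a-zA-Z0-9_]+)/', $text));                   // john, example

// Guarded variant: "@" must not follow a word character or a dot.
print_r(captures('/(?<![A-Za-z0-9_.])@([a-zA-Z0-9_]+)/', $text)); // john
```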

Custom block patterns are also supported for admonitions, tab containers, and more. See the Cookbook for recipes including ToC generation, math rendering, and image processing.

Feature Restriction: Profiles

SafeMode prevents XSS attacks, but what about controlling which markup features users can access? A comment section probably shouldn’t allow headings, tables, or raw HTML – not because they’re dangerous, but because they’re inappropriate for that context.

That’s where Profiles come in. They complement SafeMode by restricting available features based on context:

use Djot\Profile;

// Comment sections: basic formatting only
$converter = new DjotConverter(profile: Profile::comment());

// Blog posts: rich formatting, but no raw HTML
$converter = new DjotConverter(profile: Profile::article());

// Chat messages: text, bold, italic - that's it
$converter = new DjotConverter(profile: Profile::minimal());

SafeMode vs Profile

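| Concern | SafeMode                        | Profile                    |
|---------|---------------------------------|----------------------------|
| Purpose | Security (XSS prevention)       | Feature restriction        |
| Blocks  | javascript: URLs, event handlers | Headings, tables, raw HTML |
| Target  | Malicious input                 | Inappropriate formatting   |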

Use both together for user-generated content:

$converter = new DjotConverter(
    safeMode: true,
    profile: Profile::comment()
);

Built-in Profiles

Each profile is designed for specific use cases:

  • Profile::full() – Everything enabled (admin/trusted content)
  • Profile::article() – Blog posts: no raw HTML, allows headings/tables
  • Profile::comment() – User comments: no headings/tables, adds rel="nofollow ugc" to links
  • Profile::minimal() – Chat: text, bold, italic only
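
In an application with several input contexts, the choice of profile can be centralized in one place. The following is a minimal sketch: the context names (`admin`, `post`, `comment`) and both helper functions are made up for illustration, and turning SafeMode off for trusted admin content is a judgment call that assumes `safeMode` accepts `false`. Only the `Profile::*()` factories and the `safeMode`/`profile` constructor arguments come from the examples above.

```php
<?php

/**
 * Maps an application context to a profile name and a SafeMode flag.
 * The context names are hypothetical.
 */
function policyFor(string $context): array
{
    return match ($context) {
        'admin'   => ['profile' => 'full',    'safeMode' => false],
        'post'    => ['profile' => 'article', 'safeMode' => true],
        'comment' => ['profile' => 'comment', 'safeMode' => true],
        // Unknown contexts fall back to the most restrictive profile.
        default   => ['profile' => 'minimal', 'safeMode' => true],
    };
}

/** Builds a converter for the context (requires php-collective/djot). */
function converterFor(string $context): \Djot\DjotConverter
{
    $policy = policyFor($context);
    $profile = match ($policy['profile']) {
        'full'    => \Djot\Profile::full(),
        'article' => \Djot\Profile::article(),
        'comment' => \Djot\Profile::comment(),
        default   => \Djot\Profile::minimal(),
    };

    return new \Djot\DjotConverter(safeMode: $policy['safeMode'], profile: $profile);
}
```

Defaulting to the minimal profile means a forgotten or misspelled context degrades to the safest option rather than the most permissive one.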

Understanding Restrictions

Profiles can explain why features are restricted:

$profile = Profile::comment();
echo $profile->getReasonDisallowed('heading');
// "Headings would disrupt page hierarchy in user comments"

echo $profile->getReasonDisallowed('raw_block');
// "Raw HTML could bypass template styling and security measures"

Graceful Degradation

When users try restricted features, content converts to plain text by default – nothing is lost:

$converter = new DjotConverter(profile: Profile::minimal());
$html = $converter->convert('# Heading attempt');
// Renders: <p>Heading attempt</p> (text preserved, heading stripped)

For stricter handling, you can strip content entirely or throw exceptions:

$profile = Profile::minimal()->setDefaultAction(Profile::ACTION_STRIP);
// Or for APIs:
$profile = Profile::minimal()->setDefaultAction(Profile::ACTION_ERROR);

Architecture

The package uses a clean separation of concerns:

  • BlockParser – Parses block-level elements (headings, lists, tables, code blocks, etc.)
  • InlineParser – Processes inline elements within blocks (emphasis, links, code spans)
  • HtmlRenderer – Converts the AST to HTML output

This AST-based approach makes the codebase maintainable and opens possibilities for alternative output formats.

There are also other compatibility renderers available, as well as converters to convert existing markup to Djot.

WordPress Plugin: Djot Markup for WP

Want to use Djot in your WordPress site? There’s now a dedicated plugin that brings full Djot support to WordPress.

Features

  • Full Content Processing – Write entire posts in Djot syntax
  • Shortcode Support – Use [djot]...[/djot] for mixed content
  • Syntax Highlighting – Built-in highlight.js with 12+ themes
  • Profiles – Limit functionality per post/page/comment type, disable raw HTML
  • Admin Settings – Easy configuration via Settings → WP Djot
  • Markdown compatibility mode and soft-break settings if coming from MD

Fun fact: I just migrated this blog from custom markdown-hacks to Djot (and wrote this post with it). For that I used the built in migrator of that WP plugin as well as a bit of custom migration tooling.

I needed to migrate posts, articles and comments – all in all quite straightforward though. The new interface with quick markdown-paste and other useful gimmicks helps to speed up technical blogging actually. It is both safe (comments use the right profile) and reliable.

The plugin also comes with useful semantic customization right away:

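| Djot Syntax                          | HTML Output                  | Rendered | Use Case        |
|--------------------------------------|------------------------------|----------|-----------------|
| [CSS]{abbr="Cascading Style Sheets"} | <abbr title="...">CSS</abbr> | CSS      | Abbreviations   |
| [Ctrl+C]{kbd=""}                     | <kbd>Ctrl+C</kbd>            | Ctrl+C   | Keyboard input  |
| [term]{dfn=""}                       | <dfn>term</dfn>              | term     | Definition term |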

On top of that, it ships a few extras as extensions:

  • ![Alt text](https://www.youtube.com/watch?v=aVx-zJPEF2c){video} renders videos from all WP-supported sources out of the box. Attributes can be customized as usual: {video width=300 height=200}
  • Import from HTML or Markdown

You can also add your own customizations.

IDE Support: IntelliJ Plugin

For developers using PhpStorm, IntelliJ IDEA, or other JetBrains IDEs, there’s now an official Djot plugin available.

Features

  • Syntax Highlighting – Full TextMate grammar support for .djot files
  • Live Preview – Split-view editor with real-time rendered output
  • Theme Sync – Preview follows your IDE’s dark/light mode
  • Code Block Highlighting – Syntax highlighting within fenced code blocks
  • HTML Export – Save documents as rendered HTML files
  • Live Templates – Code snippets for common Djot patterns

The plugin requires JetBrains IDE 2024.1+ and Java 17+.

Performance

How fast is it? We benchmarked djot-php against Djot implementations in other languages:

| Implementation       | ~56 KB Doc | Throughput | vs PHP      |
|----------------------|------------|------------|-------------|
| Rust (jotdown)       | ~1-2 ms    | ~30+ MB/s  | ~10x faster |
| Go (godjot)          | ~2-4 ms    | ~15+ MB/s  | ~5x faster  |
| JS (@djot/djot)      | ~8 ms      | ~7 MB/s    | ~2x faster  |
| PHP (djot-php)       | ~18 ms     | ~3 MB/s    | baseline    |
| Python (markdown-it) | ~37 ms     | ~1.5 MB/s  | ~2x slower* |

*Python comparison uses Markdown parsers since no Djot implementation exists for Python.

Key observations:

  • PHP processes ~2-3 MB/s of Djot content consistently
  • Performance scales linearly, O(n), with document size
  • Safe mode and Profiles have negligible performance impact
  • Comparable to Python, ~2x slower than the JavaScript reference implementation

For typical blog posts and comments (1-10 KB), parsing takes under 5 ms. A 1 MB document converts in ~530 ms using ~44 MB RAM.
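
The throughput figures follow directly from document size divided by conversion time. Below is a small sketch for reproducing that arithmetic and for timing a conversion yourself. Both helpers are hypothetical and not part of djot-php, and it assumes 1 KB = 1,000 bytes, which may not match how the benchmark counted.

```php
<?php

/** Throughput in MB/s (1 MB = 1,000,000 bytes) for a given size and time. */
function throughputMBps(int $bytes, float $milliseconds): float
{
    return ($bytes / 1_000_000) / ($milliseconds / 1000);
}

/** Median wall-clock time in milliseconds of $fn($input) over $runs runs. */
function medianMs(callable $fn, string $input, int $runs = 5): float
{
    $times = [];
    for ($i = 0; $i < $runs; $i++) {
        $start = hrtime(true);                    // nanoseconds
        $fn($input);
        $times[] = (hrtime(true) - $start) / 1e6; // to milliseconds
    }
    sort($times);

    return $times[intdiv(count($times), 2)];
}

printf("%.2f MB/s\n", throughputMBps(56_000, 18.0));     // 3.11 MB/s
printf("%.2f MB/s\n", throughputMBps(1_000_000, 530.0)); // 1.89 MB/s

// Stand-in workload; swap in fn ($s) => $converter->convert($s) to measure djot-php.
$ms = medianMs('strtoupper', str_repeat("Some *text* here.\n", 1000));
printf("median: %.3f ms\n", $ms);
```

Using the median rather than the mean keeps a single slow run (cold caches, garbage collection) from skewing the result.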

The performance documentation includes detailed benchmarks, memory profiling, and stress test results.

Enhancements & Extensions

The library and the WP plugin already have some useful and powerful extensions, notably:

  • Full attribute support
  • Boolean Attribute Shorthand
  • Fenced Comment Blocks
  • Multiple Definition Terms and Definition Descriptions
  • Captions for Images, Tables, and Block Quotes
  • Markdown compatibility mode (Significant Newlines)

These extend beyond the current spec but are documented as such. Keep this in mind if you need cross-application compatibility.

There is also a highlight.js extension available that adds syntax highlighting for Djot content itself.

Importing and Migration

Often a boolean flag is enough to keep supporting your existing markup while new content is written in Djot. For those who want to migrate, there is built-in tooling with converters:

  • HtmlToDjot
  • MarkdownToDjot
  • BbcodeToDjot

Fun fact: They also serve as a nice round-trip validation to check whether a transformation is lossless. Convert a document, convert it back, and the content should still “match” without losing any supported structures.
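
Such a round-trip check can be sketched generically. The converters are passed in as callables here because their exact method names are not shown in this post; `roundTrips()` and `normalizeMarkup()` are hypothetical helpers, and the stand-in callables only exist so the sketch runs on its own.

```php
<?php

/** Collapses whitespace so purely cosmetic differences do not count. */
function normalizeMarkup(string $markup): string
{
    return trim(preg_replace('/\s+/', ' ', $markup));
}

/**
 * Runs $source through $forward and $backward and reports whether the
 * normalized result matches the normalized input.
 */
function roundTrips(string $source, callable $forward, callable $backward): bool
{
    $result = $backward($forward($source));

    return normalizeMarkup($result) === normalizeMarkup($source);
}

// Stand-in converters. In practice $forward would wrap one of the
// converters (e.g. HtmlToDjot) and $backward the Djot-to-HTML rendering.
$lossless = roundTrips("<p>Hello</p>\n", 'strrev', 'strrev');
$lossy    = roundTrips('<p>Hello</p>', 'strip_tags', 'strval');

var_dump($lossless); // bool(true)
var_dump($lossy);    // bool(false)
```

Real markup usually needs a smarter normalization step than whitespace collapsing (attribute order, entity encoding), so treat the comparison as a starting point.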

What’s Next?

The library is actively maintained with plans for:

  • Additional renderers (convert Djot back for interoperability)
  • More converters
  • More markup supported (not contradicting the specs)
  • Maybe some framework specific plugins or integrations

Contributions welcome!

Some personal notes

I would have liked URLs and images to have a bit more friendly syntax as well, e.g. [link: url "text"] style for links and [image: src "alt"] style for images. The ![](url) style still feels a bit too much like code syntax to me.

If I were ever to invent a new markup language, I would probably take a similar approach, but try to keep it even simpler by default. The {} braces seem a bit heavy for these common use cases, and for non-technical users.

One of the quirks I had to get used to was the automatic text flow (line breaks are ignored) and the need for a visible (hard) line break where one is really wanted. In the end, though, it usually helps to keep paragraphs clear. And I added compatibility options as opt-in to ease upgrading and usability.

Overall, Djot strikes a great balance between familiarity and consistency. And at least topics like URL/image can be easily added as extension if desired.

The djot-php library is the most complete implementation of the standard available, and it is well suited for web-based usage. Make sure to check out the live sandbox and play around with the complex examples!
