HTML, Markdown or BBCode?

Or something different? Or just plain text maybe?

"User input is hard"

There are pretty prejudiced blog posts in favor or against the usage of BBCode for blog posts, comments etc.
Here for instance – continuing on this second post.

Personally, I am in favor of some abstraction for the frontend user, usually BBCode or a similar syntax. But it always depends on the usage in the app.
For forums, blog post comments, etc they are sure quite useful and avoids exposing HTML to users.

Let’s dig into some options more deeply.

Plain text

We all probably do or did it: Using .txt files to store some pseudomarkup enriched text including lists and headings.
While its simple and easy, the output on a a browser for such textual representations is pretty poor.
It starts with missing <p> paragraphs or newlines. All those would have to be manually modified.
In CakePHP there is a helper method that does exactly that: TextHelper::autoParagraph().
Adding autoLink() you even get your URLs and emails enriched as HTML.
It works perfectly for all varchar or text DB fields that are exposed as textareas in the forms and only needs to accept plain text.
This is all nice for very basic use cases, but is quite limited in the presentation.

HTML and WYSIWYG Editors

This widely used in backends as this is easy to implement. Just add a JS based WYSIWYG editor on top of your textareas and you got yourself
editing in the way it will be presented upon output. This live preview while typing sure is a nice bonus here.
One drawback often is the amount of overhead added via tags, classes and alike. But that depends on the editor being used.

Often times there is a "plain HTML" button to even allow custom HTML modifications in the source code.
While this is probably the most flexible approach, it is also the most dangerous. You can easily screw up the HTML – invalid HTML is hard to spot and might break
your whole frontend layout.

Tools such as htmlpurifier exist to support those editors in removing unwanted content, fixing broken HTML and cleaning up the source code (mess). This also makes it possible to allow a subset of HTML in non-backend textareas untrusted users can have access to.
In CakePHP, for example, you can attach a PurifiableBehavior to your model that cleans the content upon saving.

The most commonly used editors are probably listed on this comparison site.

BBCode

That was the very first abstraction level available. A lot of forum software still uses it quite thoroughly.

Everybody who opposes BBCode completely (like the above posts), does not know much about the user perspective.
Users don’t care about semantics and just want their link to be added. They want it simple and straight forward.
So [img]url[/img] in BBCode would be more intuitive as the resulting <img src="url" /> in HTML. Now what is easier to understand for a newbie? What is easier to read? Of course we have a little bit more processing with BBCode. But with caching that is minimized to nothing.
For admin backends it is usually easier to use HTML. This way they have more tags, attributes to chose from. We can also assume that they don’t want to harm the site and that they know what they’re doing.

The second important point is abstraction.
[code] can be <code>... or even <pre>... or a combination of both. Everybody understands [code] whereas [pre] etc is not so understandable. So we use [code] and afterwards transform it into the more complex HTML tags we need for markup. But the user text stays clean and straight-forward. He doesn’t need to know about the mapping of [code=php] to <pre id="xyz" class="php"><code>...

jbbcode looks like a solid implementation for this.

There is even a WSYIWYG editor for this now. Even though that one is based on JS – meaning you would have to keep
your custom rules redundant, once in PHP and once in JS for this editor. Also, there might be slight edge case differences between preview and actual result.

Markdown

This is becoming more and more popular these days. Not only for developers who use GitHub a lot, but also by bloggers using WordPress plugins for this or other websites that want to avoid the HTML overhead when displaying lightweight markup text.
The benefit here is that the text can be written almost as normal text. And even non-developers would easily understand lists such as:

- one
- two
- three

So it combines text with leightweight markup that is easily understandable by everyone – and probably even used intuitively without knowing it.
Translation into HTML is straightforward.

Currently most people prefer the GitHub flavored addons as the original markdown implementation hasn’t had any progress anymore the last years.
A nice demo and comparison shows the difference.

A slower but probably more powerful library is Ciconia. It is intended to be more flexible and extendable.

There are nice WSYIWYG editor implementations for Markdown: sofish.github.io/pen or markitup. But as with BBCode: since that one is based on JS – meaning you would have to keep
your custom rules redundant again, once in PHP and once in JS for this editor. Also, there might be slight edge case differences between preview and actual result.

Imagine you can write all your HTML in such a DRY and non-HTML-polluted way and still get nice HTML from it. And you can also use it for textual representation right way (e.g. text emails). Awesome.

Speaking of – there are nice tools that can actually take your already written HTML and revert it back into Markdown.
See http://blog.oddbit.com/2012/11/06/convert-html-to-markdown/ and to-markdown.

So if you have already existing records or blog posts in the old format, you might be able to convert them and then use markdown only from there on.

Summary

From what we read so far, the complexity would probably be best described as:

Text < Markdown < BBcode < HTML

And the further left, the better – not only for interoperability.
You should use the easiest format that suits your need.

HTML

  • No additional parsing needed (once it is validated and saved)
  • Cannot work without a sanitizing process for non-admins (stripping off any unwanted attributes or unsafe elements)

BBCode

  • Simpler sanitizing
  • Abstraction possible ([ video ] or [spoiler] tag)
  • Does not interfere with HTML Markup (e.g. for code snippet posts in dev blogs)

Markdown

  • A compromise between writing plain text and using minimal additional markup to enhance it
  • Intuitive
  • Less error-prone than BBCode regarding simple tags/markup
  • Maybe more error-prone than BBCode for more complex tags/markup
  • Nice for inline references/images/hyperlinks (links can be grouped at the bottom)

Combining them

Sometimes, the lightweight Markdown might not cover everything.
Writing a custom wrapper you can easily combine them enhancing your toolset.

You use Markdown as primary parser and parse the remaining BBCodes afterwords.
HTML could be allowed using a custom markdown rule, e.g.:

```html
<myhtml />
```

You could also use BBCode then, of course:

[html]
<myhtml />
[/html]

Adding plain HTML in between the Markdown and BBCode markup would work, as well (I do that^^). But this can easily break or have unseen side effects when trying to escape the source code. It is more difficult to distinguish between an HTML tag <b>I am bold</b> and just plain text containing those chars by accident: I like the brand <FooBar>. In that case all those non-HTML-intended characters need to be properly escaped, which really is annoying. So please don’t do that 😉

In the example above the whole page would be escaped using h() (the htmlspecialchars() in CakePHP). This way it is secure by default. And the tags from above would automatically undo h() to display the raw HTML again.

I think combining them in a logical order can in some cases make all the difference and solve all your problems at once.
You have the simple and lightweight markup as basis, you are able to apply custom codes via BBCode similar syntaxes and on top
you can always use real HTML for more complex scenarios (tables and alike) where necessary.

Side notes

This blog also uses Markdown for all posts and some BBCode for the comments, of course.

I did not mention Textile as even though introduced shortly before Markdown it never really became that popular. Those are similar, though, in their ideas.
A full list of further lightweight markup languages can be found at wikipedia.org.

Further addons

I use anchorjs to automatically add anchors on the fly using JS. This is especially useful if the parsed markdown itself
produces headings without any attributes. This way they are added without having to dig deeper into the post-processing of the markdown parser or modifying the resulting HTML. Another alternative would be anchorific.

You can use my MediaEmbed lib as addon for BBCode or Markdown (or even plain text) to auto embed Video snippets. See the examples/bbcode.php there for a live example.

Further Links and Resources

This BBCode parser once looked quite promising. But it now seems abandoned.

The MarkupParsers Plugin combines several markup syntaxes into a plugin.
There are even MarkdownView classes which would render a complete markdown-flavored layout into HTML. For me a helper or lib wrapper usually suffices as I usually only output parts of the layout as such a markup-flavored text.

See dillinger.io/ or stackedit.io for editing Markdown in real time.
And there is a cheat sheet to go along with it.

Outview

I will probably add some real life examples and comparisons soon on my sandbox site. Stay tuned.

5.00 avg. rating (95% score) - 2 votes

1 Comment

  1. For me it depends on the situation and needs and the experience level of the customer. Is he able to understand and maintain plain html? Is he used to bbcode because of former experience on forums and such. Or dos the customer expects something familiar like the office tools he daily uses.
    One thing that lets me gravitate to the use of HTML and WYSIWYG Editors is the way they can handle media, pictures and links.
    Also there is no way that what you enter in a WYSIWYG Editor will appear exactly in the final product or as intendet by the website designer. Too many variables are not within your controll. Therefore you might as well just use a tinyMce to input text, then make sure the code is valid HTML and sanitized and then format it acording to your css

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.