Or something different? Or just plain text maybe?
"User input is hard"
Personally, I am in favor of some abstraction for the frontend user, usually BBCode or a similar syntax. But it always depends on the usage in the app.
For forums, blog post comments, etc., such syntaxes are certainly useful and avoid exposing HTML to users.
Let’s dig into some options more deeply.
We all probably do or did it: Using .txt files to store some pseudomarkup enriched text including lists and headings.
While it’s simple and easy, the output in a browser for such textual representations is pretty poor.
It starts with missing <p> paragraphs or newlines. All those would have to be added manually.
In CakePHP there is a helper method that does exactly that: TextHelper::autoParagraph().
With autoLink() you even get your URLs and emails enriched as HTML.
It works perfectly for all varchar or text DB fields that are exposed as textareas in the forms and only need to accept plain text.
This is all nice for very basic use cases, but is quite limited in the presentation.
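To illustrate the idea behind such helpers, here is a minimal sketch in Python. The function names auto_paragraph() and auto_link() are made up for this example and only mimic the concept; this is not how CakePHP implements TextHelper, and the URL pattern is deliberately naive (it would, for instance, swallow trailing punctuation).

```python
import html
import re

# Naive pattern for illustration only: matches http(s) URLs up to the next
# whitespace or "<" character.
URL_RE = re.compile(r"https?://[^\s<]+")


def auto_paragraph(text):
    """Escape plain text, then wrap blank-line separated blocks in <p>."""
    normalized = text.replace("\r\n", "\n").strip()
    escaped = html.escape(normalized)  # escape first, so user input stays inert
    paragraphs = []
    for block in re.split(r"\n\s*\n", escaped):
        lines = [line.strip() for line in block.split("\n")]
        # Single newlines inside a block become <br /> line breaks.
        paragraphs.append("<p>" + "<br />\n".join(lines) + "</p>")
    return "\n".join(paragraphs)


def auto_link(escaped_html):
    """Turn bare http(s) URLs in already escaped text into links."""
    return URL_RE.sub(
        lambda m: '<a href="{0}">{0}</a>'.format(m.group(0)), escaped_html
    )


if __name__ == "__main__":
    print(auto_link(auto_paragraph("Hello <b>\nsee http://example.com/a\n\nSecond")))
```

The order matters here: the text is escaped before any markup is added, so whatever the user typed can never end up as live HTML.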
HTML and WYSIWYG Editors
This is widely used in backends as it is easy to implement. Just add a JS-based WYSIWYG editor on top of your textareas and you get editing in the way the content will be presented upon output. This live preview while typing is a nice bonus.
One drawback often is the amount of overhead added via tags, classes and the like. But that depends on the editor being used.
Often times there is a "plain HTML" button to even allow custom HTML modifications in the source code.
While this is probably the most flexible approach, it is also the most dangerous. You can easily screw up the HTML – invalid HTML is hard to spot and might break
your whole frontend layout.
Tools such as htmlpurifier exist to support those editors in removing unwanted content, fixing broken HTML and cleaning up the source code (mess). This also makes it possible to allow a subset of HTML in non-backend textareas untrusted users can have access to.
In CakePHP, for example, you can attach a PurifiableBehavior to your model that cleans the content upon saving.
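To give a rough idea of what "allowing a subset of HTML" means, here is a small allowlist sketch in Python using only the standard library. The tag list, the allowed URL schemes and the names AllowlistSanitizer and sanitize() are assumptions made up for this example. It is far less thorough than a dedicated tool such as htmlpurifier and should not be treated as a secure replacement for one.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlparse

# Assumed allowlist for illustration; adjust to your own needs.
ALLOWED_TAGS = {"p", "br", "b", "i", "em", "strong", "ul", "ol", "li", "a", "code", "pre"}
VOID_TAGS = {"br"}
ALLOWED_SCHEMES = {"http", "https", "mailto"}


class AllowlistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.open_tags = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # unknown tags are dropped, their text content is kept
        attr_text = ""
        if tag == "a":
            # Only href survives, and only with an allowed absolute scheme.
            href = (dict(attrs).get("href") or "").strip()
            if urlparse(href).scheme.lower() in ALLOWED_SCHEMES:
                attr_text = ' href="%s"' % escape(href, quote=True)
        if tag in VOID_TAGS:
            self.out.append("<%s />" % tag)
        else:
            self.out.append("<%s%s>" % (tag, attr_text))
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag not in self.open_tags:
            return
        # Close everything up to and including the matching open tag.
        while self.open_tags:
            current = self.open_tags.pop()
            self.out.append("</%s>" % current)
            if current == tag:
                break

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))

    def result(self):
        self.close()
        while self.open_tags:  # fix broken HTML: close what is still open
            self.out.append("</%s>" % self.open_tags.pop())
        return "".join(self.out)


def sanitize(markup):
    parser = AllowlistSanitizer()
    parser.feed(markup)
    return parser.result()
```

Everything that is not explicitly allowed gets dropped, and tags that were left open are closed at the end, which covers the "fixing broken HTML" part in a very basic way.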
The most commonly used editors are probably listed on this comparison site.
BBCode was the very first abstraction level available. A lot of forum software still uses it extensively.
Everybody who opposes BBCode completely (like the above posts) does not know much about the user perspective.
Users don’t care about semantics and just want their link to be added. They want it simple and straightforward.
[img]url[/img] in BBCode is more intuitive than the resulting <img src="url" /> in HTML. Now what is easier to understand for a newbie? What is easier to read? Of course we have a little bit more processing with BBCode. But with caching that is reduced to almost nothing.
For admin backends it is usually easier to use HTML. This way they have more tags and attributes to choose from. We can also assume that they don’t want to harm the site and that they know what they’re doing.
The second important point is abstraction.
[code] can be <code>... or even <pre>... or a combination of both. Everybody understands [code], whereas [pre] etc. is not so understandable. So we use [code] and afterwards transform it into the more complex HTML tags we need for markup. But the user text stays clean and straightforward. The user doesn’t need to know about the mapping of [code=php] to <pre id="xyz" class="php"><code>...
jbbcode looks like a solid implementation for this.
There is even a WYSIWYG editor for this now. That one is based on JS, though, meaning you would have to maintain your custom rules twice, once in PHP and once in JS for this editor. Also, there might be slight edge case differences between preview and actual result.
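The mapping from simple tags to more complex HTML can be sketched in a few lines of Python. The rule set and the function bbcode_to_html() below are invented for this example; a real parser such as jbbcode handles nesting and validation much more rigorously.

```python
import html
import re

# Hypothetical rules for illustration only.
CODE_RE = re.compile(r"\[code(?:=([a-z0-9]+))?\](.*?)\[/code\]", re.S)
INLINE_RULES = [
    (re.compile(r"\[b\](.*?)\[/b\]", re.S), r"<strong>\1</strong>"),
    (re.compile(r"\[i\](.*?)\[/i\]", re.S), r"<em>\1</em>"),
    (re.compile(r"\[img\](https?://[^\s\[\]]+)\[/img\]"), r'<img src="\1" alt="" />'),
]


def _inline(escaped):
    """Apply the simple inline rules to already escaped text."""
    for pattern, replacement in INLINE_RULES:
        escaped = pattern.sub(replacement, escaped)
    return escaped


def _code_block(match):
    """Map [code] or [code=lang] to the more complex <pre><code> markup."""
    lang = match.group(1) or ""
    body = match.group(2).strip("\n")
    class_attr = ' class="%s"' % lang if lang else ""
    return "<pre%s><code>%s</code></pre>" % (class_attr, body)


def bbcode_to_html(text):
    escaped = html.escape(text)  # user input never reaches the page as raw HTML
    parts = []
    pos = 0
    # Code blocks are handled separately so BBCode inside them stays literal.
    for match in CODE_RE.finditer(escaped):
        parts.append(_inline(escaped[pos:match.start()]))
        parts.append(_code_block(match))
        pos = match.end()
    parts.append(_inline(escaped[pos:]))
    return "".join(parts)


if __name__ == "__main__":
    print(bbcode_to_html('[b]Hi[/b] [code=php]echo "[b]";[/code]'))
```

The user only ever writes [code=php]; which HTML tags and classes come out of it is decided in one place and can be changed later without touching the stored text.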
Markdown is becoming more and more popular these days, not only among developers who use GitHub a lot, but also among bloggers using WordPress plugins for it and on other websites that want to avoid the HTML overhead when displaying lightweight markup text.
The benefit here is that the text can be written almost as normal text. And even non-developers would easily understand lists such as:
- one
- two
- three
So it combines text with lightweight markup that is easily understandable by everyone – and probably even used intuitively without knowing it.
Translation into HTML is straightforward.
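As a toy example of that translation, the following Python sketch converts only dash lists and plain paragraphs. The function name simple_markdown_lists() is made up here, and the sketch covers just a tiny fraction of what a real Markdown parser does.

```python
import html


def simple_markdown_lists(text):
    """Convert '- item' lines to <ul> lists and other lines to paragraphs."""
    out = []
    in_list = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("- "):
            if not in_list:
                out.append("<ul>")
                in_list = True
            out.append("<li>%s</li>" % html.escape(line[2:].strip()))
            continue
        if in_list:  # a non-list line ends the current list
            out.append("</ul>")
            in_list = False
        if line:
            out.append("<p>%s</p>" % html.escape(line))
    if in_list:
        out.append("</ul>")
    return "\n".join(out)


if __name__ == "__main__":
    print(simple_markdown_lists("Numbers:\n- one\n- two\n- three"))
```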
Currently most people prefer the GitHub-flavored add-ons, as the original Markdown implementation hasn’t seen any progress in recent years.
A nice demo and comparison shows the difference.
A slower but probably more powerful library is Ciconia. It is intended to be more flexible and extendable.
There are nice WYSIWYG editor implementations for Markdown: sofish.github.io/pen or markitup. But as with BBCode, these are based on JS, meaning you would again have to maintain your custom rules twice, once in PHP and once in JS for the editor. Also, there might be slight edge case differences between preview and actual result.
Imagine you can write all your HTML in such a DRY and non-HTML-polluted way and still get nice HTML from it. And you can also use it for textual representation right away (e.g. text emails). Awesome.
Speaking of – there are nice tools that can actually take your already written HTML and revert it back into Markdown.
See http://blog.oddbit.com/2012/11/06/convert-html-to-markdown/ and to-markdown.
So if you have already existing records or blog posts in the old format, you might be able to convert them and then use markdown only from there on.
From what we read so far, the complexity would probably be best described as:
Text < Markdown < BBCode < HTML
And the further left, the better – not only for interoperability.
You should use the easiest format that suits your need.
- No additional parsing needed (once it is validated and saved)
- Cannot work without a sanitizing process for non-admins (stripping off any unwanted attributes or unsafe elements)
- Simpler sanitizing
- Abstraction possible (e.g. [video])
- Does not interfere with HTML Markup (e.g. for code snippet posts in dev blogs)
- A compromise between writing plain text and using minimal additional markup to enhance it
- Less error-prone than BBCode regarding simple tags/markup
- Maybe more error-prone than BBCode for more complex tags/markup
- Nice for inline references/images/hyperlinks (links can be grouped at the bottom)
Sometimes, the lightweight Markdown might not cover everything.
By writing a custom wrapper you can easily combine them and enhance your toolset.
You use Markdown as the primary parser and parse the remaining BBCodes afterwards.
HTML could be allowed using a custom markdown rule, e.g.:
```html
<myhtml />
```
You could also use BBCode then, of course:
[html] <myhtml /> [/html]
Adding plain HTML in between the Markdown and BBCode markup would work as well (I do that^^). But this can easily break or have unseen side effects when trying to escape the source code. It is more difficult to distinguish between an HTML tag such as <b>I am bold</b> and plain text containing those chars by accident, such as: I like the brand <FooBar>. In that case all those non-HTML-intended characters need to be properly escaped, which really is annoying. So please don’t do that 😉
In the example above the whole page would be escaped using h() (CakePHP’s shortcut for htmlspecialchars()). This way it is secure by default. The wrapper tags from above would then automatically undo h() for their content to display the raw HTML again.
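The "escape everything, then undo it for explicitly marked blocks" idea could look roughly like this in Python. The render() function and the regular expression are assumptions for this sketch, with html.escape() standing in for h(). It only makes sense for content written by trusted authors, since whatever sits inside [html]...[/html] is output as raw HTML.

```python
import html
import re

# Hypothetical wrapper syntax from the text above: [html] ... [/html]
HTML_BLOCK_RE = re.compile(r"\[html\](.*?)\[/html\]", re.S)


def render(text):
    """Escape the whole text, then restore raw HTML inside [html] blocks."""
    escaped = html.escape(text)  # secure by default
    # Undo the escaping only for the content of explicit [html] blocks.
    return HTML_BLOCK_RE.sub(
        lambda m: html.unescape(m.group(1)).strip(), escaped
    )


if __name__ == "__main__":
    print(render("I like the brand <FooBar>. [html]<b>I am bold</b>[/html]"))
```

Accidental angle brackets in normal text stay harmless, while the author still has a clearly visible escape hatch for real HTML.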
I think combining them in a logical order can in some cases make all the difference and solve all your problems at once.
You have the simple and lightweight markup as a basis, you are able to apply custom codes via BBCode-like syntaxes, and on top you can always use real HTML for more complex scenarios (tables and the like) where necessary.
This blog also uses Markdown for all posts and some BBCode for the comments, of course.
I did not mention Textile: even though it was introduced shortly before Markdown, it never really became that popular. The two are similar in their ideas, though.
A full list of further lightweight markup languages can be found at wikipedia.org.
I use anchorjs to automatically add anchors on the fly using JS. This is especially useful if the parsed markdown itself
produces headings without any attributes. This way they are added without having to dig deeper into the post-processing of the markdown parser or modifying the resulting HTML. Another alternative would be anchorific.
You can use my MediaEmbed lib as an add-on for BBCode or Markdown (or even plain text) to auto-embed video snippets. See the examples/bbcode.php there for a live example.
Further Links and Resources
This BBCode parser once looked quite promising. But it now seems abandoned.
The MarkupParsers Plugin combines several markup syntaxes into a plugin.
There are even MarkdownView classes which would render a complete markdown-flavored layout into HTML. For me a helper or lib wrapper usually suffices as I usually only output parts of the layout as such a markup-flavored text.
I will probably add some real life examples and comparisons soon on my sandbox site. Stay tuned.