Writing A Jekyll Converter for Literate Programming
An important caveat here is that I am not a Ruby developer. But I wanted to use Jekyll to generate this blog, and I want to blog about code, which for me means literate programming. As I couldn't find an existing Jekyll converter that did everything I wanted, I figured "How hard can it be?" and wrote my own...
Jekyll makes writing a converter straightforward: you simply need to extend Jekyll::Converter and implement the matches, output_ext, and convert methods.
Only one of those methods is non-trivial: convert.
All the heavy lifting is done by existing gems: I use commonmarker for Markdown parsing and rouge to syntax-color code.
The conversion process is then:
- Parse the input file into an AST
- Render to HTML using a custom renderer that
  - derives from the default HTML renderer
  - overloads code_block to
    - syntax-color the code
    - store and concatenate labelled blocks of code
  - overloads link to add functionality to internal links to code
  - adds a method to generate data islands for the labelled code, and a method to allow a user to download the code (from an internal link in the page)
So, working from the inside out:
Embedding Literate Programming in Markdown
First, the scheme for handling code blocks. The convention for code fences is that a word after the opening fence describes the language of the fenced block, for example:
``` ruby
puts "Hello, World!"
```
For literate programming I am implementing a scheme where the text after the fence is extended to name a block and, optionally, to specify an action. There are three possibilities:
Starting a new named block
Named blocks are identified using the syntax <<name>>, where name is any sequence of characters other than >>.
A new named block is started by appending : <<name>>= after the language specifier on the code fence line. For example:
``` ruby : <<hello>>=
puts "Hello, World!"
```
This will start a block named hello whose contents are puts "Hello, World!"\n
Appending to an existing named block
An appended block is created by appending : <<name>>=+ to the code fence line:
``` ruby : <<hello>>=+
exit
```
This will append exit\n to the block named hello.
Declaring a top-level (file) named block
To declare that a named block will generate a file, append : <<name.*>>= filename to the code fence line, for example:
``` python : <<hello.*>>= hello.py
print('Hello, World')
```
print('Hello, World')
This will cause the code associated with that name to be embedded into the output HTML document, and a link to #hello.py will allow it to be downloaded. You can try it here.
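The three forms can be made concrete with a small stand-alone parser. This is a sketch under the same assumptions as code_block further down (the fence line is split on whitespace, and the name is whatever sits between the angle brackets); parse_fence_info and FenceInfo are hypothetical names that do not appear in the converter, and name canonicalization is skipped.

```ruby
# Hypothetical helper, not part of the converter: classify a fence info string.
FenceInfo = Struct.new(:language, :name, :action, :filename)

def parse_fence_info(info)
  parts = info.split(/\s+/)
  language = parts[0]
  # Fences without the ":" separator carry only a language.
  return FenceInfo.new(language, nil, nil, nil) unless parts.length > 2 && parts[1] == ":"

  # Same pattern as code_block: capture the name and an optional "+".
  m = /[<]<(.*)>>=(\+?)/.match(parts[2])
  return FenceInfo.new(language, nil, nil, nil) if m.nil?

  name = m[1]
  action = m[2] == "+" ? :append : :define
  # A name ending in ".*" is a top-level block; the next word is its filename.
  filename = name.end_with?(".*") ? parts[3] : nil
  FenceInfo.new(language, name, action, filename)
end

p parse_fence_info("ruby")
p parse_fence_info("ruby : <<hello>>=")
p parse_fence_info("ruby : <<hello>>=+")
p parse_fence_info("python : <<hello.*>>= hello.py")
```

Plain fences come back with no name, which matches the renderer simply syntax-coloring them.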
Referencing named blocks
Within a block, another named block can be referenced by using <<name>> inline in the code. This causes the complete content of the named block to be inserted, and recursively expanded, at that location. This expansion occurs after the entire document has been read, allowing forward references. No check is made for circular references; they'll just blow the stack.
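Forward references work because expansion runs over a table that has already been filled from the whole document. Here is a minimal sketch of that idea, assuming the blocks are already collected into a hash keyed by name; expand and REF are hypothetical names. Unlike the converter described here, it also tracks the chain of names being expanded and raises an error on a cycle instead of overflowing the stack, and it skips name canonicalization.

```ruby
# Hypothetical sketch: expand named blocks from a hash that was filled in
# before expansion starts, so a block may reference one defined later.
REF = /[<]<([^"<>]*?)>>/ # same reference pattern as expand_source

def expand(sources, name, active = [])
  # `active` holds the names currently being expanded; seeing one again
  # means the references form a cycle.
  if active.include?(name)
    raise ArgumentError, "circular reference: #{(active + [name]).join(' -> ')}"
  end
  raw = sources.fetch(name) { raise KeyError, "undefined block #{name}" }
  # Replace each reference with the recursively expanded referenced block.
  raw.gsub(REF) { expand(sources, Regexp.last_match(1), active + [name]) }
end

sources = {
  "main.*" => "<<greeting>>\nputs \"done\"\n", # refers forward to greeting
  "greeting" => "puts \"hello\"\n",
}
puts expand(sources, "main.*")
```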
Note: because of this syntax for named blocks, if your code includes something that a regex will match as <<name>>, then things will go poorly. The solution that I have adopted, as you may notice below, is to split such text when it is in a string, so "<<foo>>" becomes "<<" + "foo>>", and to use the square bracket trick in regexes, so /<<.*?>>/ becomes /[<]<.*?>>/.
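A quick way to convince yourself that both tricks work is to run the patterns against their own source text. This sketch uses only core Ruby regexes; the last pattern is the reference pattern that expand_source uses below, whose exclusion of quote characters is what defeats the split-string form.

```ruby
# Patterns as they would appear inside a literate code block.
naive = /<<.*?>>/
bracketed = /[<]<.*?>>/
# Reference pattern used by expand_source: names may not contain " < or >.
reference = /[<]<([^"<>]*?)>>/

# The naive regex's own source text looks like a block reference...
puts naive.match?("/<<.*?>>/")          # true
# ...while the bracketed version's source text does not.
puts bracketed.match?("/[<]<.*?>>/")    # false
# Real references are still found.
puts reference.match?("x <<foo>> y")    # true
# A split string literal is not a reference: quotes cannot be part of a name.
puts reference.match?('"<<" + "foo>>"') # false
```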
Implementation
The implementation consists of a converter and a renderer. The entry point class is a Jekyll converter, which is simply a shell that calls the converter. In traditional literate programming terminology, this is the weave process.
This structure also allows a CLI frontend to call the same converter to extract sources from literate files. In traditional literate programming terminology, this is the tangle process.
Renderer
The renderer extends the CommonMarker::HtmlRenderer class, and uses rouge to implement syntax coloring.
require "commonmarker"
require "rouge"
$download_code_fn = <<JAVASCRIPT
<<download_code>>
JAVASCRIPT
class LiterateHtmlRenderer < CommonMarker::HtmlRenderer
initialize
The initializer establishes two hashes:
- sources – stores the contents of the named blocks. The key is the name, the value is the concatenated content of that block
- external_names – maps the filenames of top-level named blocks to their internal names.
  def initialize
    super
    @sources = {}
    @external_names = {}
  end
  def sources
    @sources
  end
  def external_names
    @external_names
  end
make_canonical
A helper used to make block names canonical: it lowercases the name and replaces each run of whitespace with a single underscore.
  def make_canonical(value)
    value.downcase().gsub(/\s+/, "_")
  end
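A few example inputs show what the helper does, including the lowercasing. The helper is repeated here (with the redundant parentheses dropped) so that the snippet runs on its own.

```ruby
# Same logic as make_canonical above.
def make_canonical(value)
  value.downcase.gsub(/\s+/, "_")
end

puts make_canonical("Hello")            # hello
puts make_canonical("My Helper  Block") # my_helper_block
puts make_canonical("Hello.*")          # hello.*
```

Since expand_source passes every referenced name through this helper, <<Hello>> and <<hello>> inside code refer to the same block.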
link
Overloads the regular handling of links. If the link is to a local fragment, i.e. it starts with #, then it is assumed to be a link to download a top-level named block, where the URL is treated as the filename for the block. Clicking it calls a JavaScript function, download_code, to fetch the code from the data island within the output document.
All other links are handled as normal Markdown links.
  def link(node)
    out('<a href="', node.url.nil? ? "" : escape_href(node.url), '"')
    if node.title && !node.title.empty?
      out(' title="', escape_html(node.title), '"')
    end
    if node.url != nil && node.url.start_with?("#")
      out(' onclick="', "download_code('", node.url, "')", '"')
    end
    out(">", :children, "</a>")
  end
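Because the override only adds an onclick attribute, its effect is easy to see in isolation. The sketch below is a hypothetical stand-alone render_link function, not part of the renderer; it takes plain strings instead of a node, and CGI.escapeHTML stands in for CommonMarker's escape_href and escape_html, which may escape some characters differently.

```ruby
require "cgi"

# Hypothetical stand-alone version of the link override.
def render_link(url, text, title = nil)
  html = '<a href="' + CGI.escapeHTML(url.to_s) + '"'
  html += ' title="' + CGI.escapeHTML(title) + '"' if title && !title.empty?
  # Fragment links get a click handler that downloads the named file.
  html += ' onclick="download_code(\'' + url + '\')"' if url&.start_with?("#")
  html + ">" + CGI.escapeHTML(text) + "</a>"
end

puts render_link("#hello.py", "download")
puts render_link("https://example.com", "a normal link")
```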
code_block
This is where the named source blocks are rendered, identified, and stored.
  def code_block(node)
    block do
      out('<pre class="highlight"><code')
The CommonMark parser stores the text following the opening code fence in fence_info. If this is present, split it on whitespace and check the format to identify the language, the name, the operation, and whether this is a top-level block.
First, the language is identified
      if node.fence_info && !node.fence_info.empty?
        fence_parts = node.fence_info.split(/\s+/)
        language = fence_parts[0]
        out(" class=\"highlight language-#{fence_parts[0]}\">")
Then check for the : separator.
        if fence_parts.length > 2 && fence_parts[1] == ":"
Check that the name matches the expected format and, if it does, check whether this is a top-level named block (is_source) and whether it is a continuation of a previous block (is_concat).
          m = /[<]<(.*)>>=(\+?)/.match(fence_parts[2])
          if m != nil
            name = make_canonical m[1]
            is_source = name.end_with? ".*"
            is_concat = m[2] == "+"
If this is a continuation, check that the name already exists, and concatenate to that block.
            if is_concat
              if @sources[name] != nil
                @sources[name] << node.string_content
              else
                @warnings.add("WARNING: Adding to undefined literate block <<" + "#{name}>>=+")
              end
            else
This isn't a continuation. If this is a top-level block, check for a filename and create the association in external_names.
              if is_source
                if fence_parts.length > 3
                  external_name = fence_parts[3]
                  if @external_names[external_name] == nil
                    @external_names[external_name] = name
                  else
                    @warnings.add("WARNING: Duplicate source name #{external_name} for literate block <<" + "#{name}>>=")
                  end
                else
                  @warnings.add("WARNING: Missing source name for literate block <<" + "#{name}>>=")
                end
              end
Check that this isn't a duplicate declaration and create the hash entry in sources for this name.
              if @sources[name] == nil
                @sources[name] = node.string_content
              else
                @warnings.add("WARNING: Duplicate literate block <<" + "#{name}>>=")
              end
            end
          end
        end
      else
        out(" class=\"highlight\">")
      end
If a language was specified, then format the string content, otherwise simply escape it.
      if language != nil
        formatted = Rouge.highlight node.string_content, language, "html"
        out(formatted)
      else
        out(escape_html(node.string_content))
      end
      out("</code></pre>")
    end
  end
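Stripped of the HTML output, the bookkeeping in code_block is a small state machine over two hashes. The following is a hypothetical stand-alone model of it (BlockRegistry and add are made-up names, and the warning texts are shortened), useful for checking how definitions, continuations, and file declarations interact without running CommonMarker.

```ruby
require "set"

# Hypothetical model of code_block's bookkeeping, without any rendering.
class BlockRegistry
  attr_reader :sources, :external_names, :warnings

  def initialize
    @sources = {}        # canonical name => concatenated content
    @external_names = {} # filename => canonical name of a top-level block
    @warnings = Set.new
  end

  # append: true for "=+" fences; filename: the word after a "name.*" marker.
  def add(name, content, append: false, filename: nil)
    if append
      if @sources[name]
        @sources[name] << content
      else
        @warnings.add("Adding to undefined block #{name}")
      end
      return
    end

    if name.end_with?(".*")
      if filename.nil?
        @warnings.add("Missing source name for block #{name}")
      elsif @external_names.key?(filename)
        @warnings.add("Duplicate source name #{filename}")
      else
        @external_names[filename] = name
      end
    end

    if @sources.key?(name)
      @warnings.add("Duplicate block #{name}")
    else
      @sources[name] = content.dup # dup so later appends never mutate the caller's string
    end
  end
end

r = BlockRegistry.new
r.add("hello", "puts 1\n")
r.add("hello", "exit\n", append: true)
r.add("hello.*", "<<hello>>\n", filename: "hello.rb")
r.add("missing", "x\n", append: true)
p r.sources, r.external_names, r.warnings.to_a
```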
append_literate_blocks
This is called after the body of the document has been rendered. It appends a data island for each top-level named block in the document, and adds a JavaScript function to allow the content of those data islands to be downloaded.
  def append_literate_blocks
    output = ""
    @external_names.each_pair do |key, value|
      source = expand_source(value)
      output << '<script type="text/x-literate-src" id="' << key <<
        '">' << escape_html(source) << "</script>\n"
    end
    if external_names.length != 0
      output << "<script>\n" << $download_code_fn << "</script>"
    end
    output.force_encoding("utf-8")
  end
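Escaping matters here because the code sits inside a script element, and the HTML parser ends a script element at the first "</script>" regardless of its type. A sketch of the markup for one block, using CGI.escapeHTML as a stand-in for the renderer's escape_html (an assumption; the exact entities may differ). data_island is a hypothetical name, and unlike the original it also escapes the id.

```ruby
require "cgi"

# Hypothetical stand-alone version of the loop body in append_literate_blocks.
def data_island(filename, source)
  '<script type="text/x-literate-src" id="' + CGI.escapeHTML(filename) + '">' +
    CGI.escapeHTML(source) + "</script>\n"
end

island = data_island("hello.py", "print('</script>')\n")
puts island
# Only the closing tag added by data_island remains; the one in the code is escaped.
puts island.scan("</script>").length # 1
```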
download_code
This is the JavaScript fragment that is included to allow downloading a data island.
First, find the element whose id matches the argument (ignoring the leading #) and verify that it is a script tag with type text/x-literate-src. If it doesn't exist or doesn't match, then do nothing.
function download_code(code_id) {
  const el = document.getElementById(code_id.substring(1));
  if (!el || el.tagName != 'SCRIPT' || el.getAttribute('type') != 'text/x-literate-src') {
    return;
  }
If the data island does exist, then download the text in the data island:
- parse the text to undo any HTML encoding,
- put the text into a Blob,
- create an anchor tag
  - with the filename as the download attribute, and
  - with the blob's object URL as the href,
- trigger the click action on the anchor tag.
This will cause a file download.
  const parsed = new DOMParser().parseFromString(el.textContent, 'text/html');
  const src = new Blob([parsed.documentElement.textContent], { type: 'text/plain' });
  const dl = document.createElement('a');
  dl.setAttribute('download', code_id.substring(1));
  dl.href = URL.createObjectURL(src);
  dl.setAttribute('target', '_blank');
  dl.click();
}
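The DOMParser step is needed because the browser leaves entity references inside a script element undecoded, so el.textContent still holds the escaped text. The round trip can be checked in Ruby, with CGI.escapeHTML and CGI.unescapeHTML standing in for the renderer's escaping and the browser's decoding (an assumption for illustration only).

```ruby
require "cgi"

# Source text containing everything that needs escaping, including a closing script tag.
original = "if a < b && b > c\n  puts \"</script>\"\nend\n"

embedded = CGI.escapeHTML(original)    # what sits inside the data island
recovered = CGI.unescapeHTML(embedded) # what ends up in the downloaded file

puts embedded
puts recovered == original # true
```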
expand_source
This recursively expands sources for named literate blocks. It splits the text using a regex that matches the literate reference, and recursively inserts any referenced text.
  def expand_source(source_name)
    raw = @sources[source_name]
    if raw == nil
      @warnings.add("WARNING: Cannot find literate block labelled <<" + "#{source_name}>>=")
      return "Cannot Find <<" + "#{source_name}>> @sources: #{@sources.keys}"
    end
    output = ""
    raw.split(/[<]<([^"<>]*?)>>/).each_with_index do |val, index|
      if index.even?
        output << val
      else
        output << expand_source(make_canonical(val))
      end
    end
    output
  end
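The even/odd test in expand_source relies on a property of String#split that is easy to miss: when the pattern contains a capture group, the captured text is included in the result. A short demonstration using only core Ruby and the same pattern:

```ruby
# The reference pattern from expand_source, with one capture group.
reference = /[<]<([^"<>]*?)>>/

parts = "a\n<<first>>\nb <<second>>".split(reference)
p parts # ["a\n", "first", "\nb ", "second"]

# Even indices are literal text, odd indices are block names.
texts, names = parts.partition.with_index { |_, i| i.even? }
p names # ["first", "second"]

# A leading reference still leaves an empty text piece at index 0,
# so the even/odd alternation is preserved.
p "<<x>>y".split(reference) # ["", "x", "y"]
```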
expand_external_source
Helper method that expands the source for a top-level named block.
  def expand_external_source(external_name)
    expand_source @external_names[external_name]
  end
end
Converter
This is simply a convenience wrapper around CommonMarker and the renderer. The work is all done in convert; the accessors allow writing a CLI frontend for the tangle functionality.
require "commonmarker"
require_relative "./renderer"
class LiterateConverter
  def initialize
    @renderer = LiterateHtmlRenderer.new
    @converted = false
  end
  def sources
    if not @converted
      raise RuntimeError, "Nothing has been converted."
    end
    @renderer.sources
  end
  def external_names
    if not @converted
      raise RuntimeError, "Nothing has been converted."
    end
    @renderer.external_names
  end
  def external_source(name)
    if not @converted
      raise RuntimeError, "Nothing has been converted."
    end
    @renderer.expand_external_source name
  end
convert
This contains the primary functionality of this class:
- Use CommonMarker to convert the input content to a document model
- Render the doc with the renderer
- Append the literate blocks
- Return the generated HTML
  def convert(content)
    if @converted
      raise RuntimeError, "Cannot convert twice, use a new instance."
    end
    # render_doc takes parse options first and extensions second,
    # so the table extension goes in its own array.
    doc = CommonMarker.render_doc(content, :DEFAULT, [:table])
    rendered = @renderer.render(doc)
    rendered << @renderer.append_literate_blocks()
    @converted = true
    rendered
  end
end
Jekyll Converter
A simple implementation of Jekyll::Converter whose one significant method, convert, calls into the converter.
require "jekyll"
require_relative "jekyll-literate/converter"
module Jekyll
  class JekyllLiterateConverter < Jekyll::Converter
    safe true
    priority :low
    DEFAULT_CONFIGURATION = {
      "literate_ext" => "literate",
    }
    def initialize(config = {})
      @config = Jekyll::Utils.deep_merge_hashes(DEFAULT_CONFIGURATION, config)
      @converter = LiterateConverter.new
    end
    def sources
      @converter.sources
    end
    def external_names
      @converter.external_names
    end
    def extname_list
      @extname_list ||= @config["literate_ext"].split(",").map { |e| ".#{e}" }
    end
    def matches(ext)
      extname_list.include? ext.downcase
    end
    def output_ext(ext)
      ".html"
    end
    def convert(content)
      @converter = LiterateConverter.new
      @converter.convert(content)
    end
  end
end
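The extension matching is small enough to exercise outside Jekyll. This sketch copies the expressions from extname_list and matches; the config hash and the "lmd" extension are made up for illustration, and the lambda stands in for the method. Jekyll passes the extension including its leading dot.

```ruby
# Hypothetical config: "literate" is the converter's default, "lmd" is made up.
config = { "literate_ext" => "literate,lmd" }

# Same expression as extname_list: split on commas and prefix each with a dot.
extname_list = config["literate_ext"].split(",").map { |e| ".#{e}" }

# Same test as matches, case-insensitive on the incoming extension.
matches = ->(ext) { extname_list.include?(ext.downcase) }

p extname_list         # [".literate", ".lmd"]
p matches.call(".LMD") # true
p matches.call(".md")  # false
```

Note that the split does not strip whitespace, so a value written as "literate, lmd" would produce ". lmd", which never matches.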
Tangle
A simple CLI wrapper around the converter class
#!/usr/bin/env ruby
require "fileutils"
require "optimist"
require_relative "../lib/jekyll-literate/converter"
class TangleError < RuntimeError
end
Options
Uses optimist for basic command line parsing to give the user a little flexibility.
opts = Optimist::options do
  banner <<-BANNER
Generates source files from literate inputs
Usage:
jl-tangle [options] <filenames>+
  BANNER
  opt :output_dir, "Output directory", :short => "-o", :type => String, :default => "."
  opt :stop_on_first, "Stop on the first error processing a file", :short => "-s"
  opt :dry_run, "Dry run, only print the full paths and sizes of the files that would be generated", :short => "-n"
  opt :verbose, "Enable verbose output", :short => "-v"
end
Optimist::die "At least one filename must be specified" if ARGV.length == 0
Capture some simple booleans from the command line.
verbose = opts[:verbose]
dry_run = opts[:dry_run]
dry_verbose = verbose || dry_run
Check if we need to create an output directory, or if it already exists and cannot be used (for example it is a file)
fulldir = File.expand_path opts[:output_dir]
Optimist::die "Output directory #{fulldir} is a file" if (File.exist?(fulldir) && !Dir.exist?(fulldir))
puts "Output directory: #{fulldir}" if verbose
if !Dir.exist?(fulldir)
  puts "Creating Output directory" if dry_verbose
  FileUtils.mkdir_p fulldir unless dry_run
end
Process each input file; any remaining arguments are input filenames:
- check if the file exists
- read the file
- convert it
- process each external name
stop_on_first = opts[:stop_on_first]
errors = 0
ARGV.each do |file|
  begin
    filepath = File.expand_path file
    raise TangleError, "Input file #{filepath} does not exist" unless File.exist?(filepath)
    puts "Tangling #{filepath}" if verbose
    content = File.open(filepath, "r:utf-8", &:read)
    converter = LiterateConverter.new
    converter.convert content
For each external name (top-level block) in the input file
- get the output file path
- create the directory if necessary
- write the file from the expanded source
    converter.external_names.each_key do |key|
      output_file = File.expand_path key, fulldir
      # Compare against the directory plus a trailing separator, so a sibling
      # directory that merely shares the prefix is not accepted.
      raise TangleError, "Output file #{output_file} is outside of the output directory #{fulldir}" unless output_file.start_with?(File.join(fulldir, ""))
      puts "Generating #{output_file}" if verbose
      dir = File.dirname output_file
      if !Dir.exist?(dir)
        puts "Creating file directory #{dir}" if dry_verbose
        FileUtils.mkdir_p dir unless dry_run
      end
      source = converter.external_source key
      puts "Writing #{source.length} characters to #{output_file}" if dry_verbose
      if not dry_run
        File.open(output_file, "w:UTF-8") do |f|
          f.write source
        end
      end
    end
  rescue RuntimeError => error
    Optimist::die error.message if stop_on_first
    STDERR.puts "Error: #{error.message}"
    errors += 1
  end
end
exit(errors)
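The output-path check in the loop above deserves care: a plain string-prefix comparison also accepts a sibling directory that merely shares the prefix, such as /tmp/out-other when the output directory is /tmp/out. A sketch of a separator-aware check, assuming POSIX paths; inside_dir? is a hypothetical helper, not part of jl-tangle.

```ruby
# Hypothetical helper: does the file `name`, resolved relative to `dir`, land inside `dir`?
def inside_dir?(dir, name)
  full_dir = File.expand_path(dir)
  path = File.expand_path(name, full_dir)
  # File.join(full_dir, "") appends a trailing separator, so "/tmp/out"
  # becomes "/tmp/out/" and "/tmp/out-other/..." no longer matches.
  path.start_with?(File.join(full_dir, ""))
end

p inside_dir?("/tmp/out", "hello.py")          # true
p inside_dir?("/tmp/out", "src/hello.py")      # true
p inside_dir?("/tmp/out", "../out-other/x.py") # false
p inside_dir?("/tmp/out", "../../etc/passwd")  # false
# A bare prefix test accepts the sibling directory:
p File.expand_path("../out-other/x.py", "/tmp/out").start_with?("/tmp/out") # true
```

File.expand_path only normalizes the path text; it does not resolve symbolic links, so a link inside the output directory that points elsewhere would still pass this check.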
Files
This has the following file structure:
- bin/
- lib/
  - jekyll-literate/
  - jekyll-literate.rb