Jekyll: Import Disqus comments for Staticman
Due to privacy concerns with Disqus, I have exported old Disqus comments and integrated them directly in Jekyll.
For some years now, I have tried to make this website rely on fewer external resources and to avoid ad-powered services, to improve privacy for my dear readers.
Recently, I removed the Disqus comments from this blog because Disqus introduced too much data sharing with third parties. This year, Norway fined Disqus 2.5 million euros for tracking without a legal basis.
Below are some tips on how to export comments from Disqus and display them in a privacy-friendly way in your Jekyll blog.
Export Disqus Comments to JSON and YAML
- Disqus documents the export and export format at https://docs.disqus.com/developers/export/
- Navigate to http://disqus.com/admin/discussions/export/ to export your comments to XML format.
  The XML has three main parts: metadata, a list of webpages, and a list of comments. Each comment is linked to a webpage (via a Disqus identifier) and, if it is a reply, to a parent comment.
  For use within Jekyll, I need to restructure the data into a list of comments per webpage, keyed by my own identifier (e.g. the post slug), and convert everything to a format that Jekyll can handle: YAML, JSON, CSV, or TSV. I chose YAML.
- Install the Linux tool xq to manipulate XML files and export them to JSON, along with the tool jq. xq is basically a wrapper around jq and is provided by the Python package yq.
  pip install yq
  Download binaries of jq here: https://stedolan.github.io/jq/download/
- I then convert the Disqus XML export into a JSON file with the code in export-disqus-xml2json.sh.
- Then, I pipe the output through import-json-yaml.rb to split the list of comments into individual files for easy consumption by Jekyll.
# file: 'export-disqus-xml2json.sh'
#!/usr/bin/env sh
xq '
  .disqus
  | .thread as $threads
  | .post
  | map(select(.isDeleted == "false"))
  | map(
      .thread."@dsq:id" as $id
      | ($threads[] | select(."@dsq:id" == $id)) as $thread
      | {
          id: ("disqus-" + ."@dsq:id"),
          date: .createdAt,
          slug: ($thread.id | tostring | gsub("/$"; "") | split("/") | last),
          name: (if .author.name == "Robert" then "Robert Riemann" else .author.name end),
          avatar: .author | (if has("username") and .username != "rriemann" then "https://disqus.com/api/users/avatars/" + .username + ".jpg" else null end),
          email: .author | (if has("username") and .username == "rriemann" then "my@mail.com" else null end),
          message,
          origin: ($thread.link | tostring | gsub("^https://blog.riemann.cc"; "")),
          replying_to: (if has("parent") then ("disqus-" + .parent."@dsq:id") else null end)
        }
    )
' "$@"
Example comment from the JSON list:
{
"id": "disqus-4145062197",
"date": "2018-10-14T22:14:58Z",
"slug": "versioning-of-openoffice-libreoffice-documents-using-git",
"name": "Robert Riemann",
"avatar": null,
"email": "my@mail.com",
"message": "<p>I agree, it is not perfect. I have no solution how to keep the noise out of git.</p>",
"origin": "/2013/04/23/versioning-of-openoffice-libreoffice-documents-using-git/",
"replying_to": "disqus-4136593561"
}
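To make the slug and origin fields of this example easier to follow, here is a minimal Ruby sketch of the two string operations the jq filter applies. It assumes the thread's id and link are URL-like strings, as the filter implies; the helper names slug_from and origin_from are made up for illustration.

```ruby
# Sketch only: mirrors the slug and origin expressions of the jq filter.

# jq: gsub("/$";"") | split("/") | last
def slug_from(thread_id)
  thread_id.to_s.sub(%r{/\z}, '').split('/').last
end

# jq: gsub("^https://blog.riemann.cc";"")
def origin_from(thread_link, base = 'https://blog.riemann.cc')
  thread_link.to_s.sub(/\A#{Regexp.escape(base)}/, '')
end

link = 'https://blog.riemann.cc/2013/04/23/versioning-of-openoffice-libreoffice-documents-using-git/'
puts slug_from(link)   # last path segment, without the trailing slash
puts origin_from(link) # path relative to the blog's base URL
```

The slug becomes the folder name under which the comment is stored, so it should match whatever page.slug returns for the post.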
The script import-json-yaml.rb takes each comment and writes it in YAML format, under a unique filename, to a folder named after the slug.
# file: 'import-json-yaml.rb'
#!/usr/bin/env ruby
require 'json'
require 'yaml'
require 'fileutils'
require 'date'
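# Read the JSON list from the file given as first argument, or from stdin
data = if ARGV.length > 0 then
  JSON.load_file(ARGV[0])
else
  JSON.parse(ARGF.read)
end
# Write one YAML file per comment into a folder named after the post slug
data.each do |comment|
  FileUtils.mkdir_p comment['slug']
  File.write "#{comment['slug']}/#{comment['id']}-#{Date.parse(comment['date']).strftime('%s')}.yml", comment.to_yaml
end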
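A small detail of the filename: Date.parse drops the time of day, so the timestamp in the filename refers to midnight of the comment's date. The sketch below contrasts this with a variant that keeps the full time; time_stamp is a hypothetical alternative and not part of the script above.

```ruby
require 'date'
require 'time'

# Timestamp as the import script computes it: midnight of the comment's day
def date_stamp(iso8601)
  Date.parse(iso8601).strftime('%s')
end

# Hypothetical alternative that keeps hours, minutes and seconds
def time_stamp(iso8601)
  Time.iso8601(iso8601).to_i.to_s
end

created = '2018-10-14T22:14:58Z'
puts date_stamp(created)
puts time_stamp(created)
```

Since the filename also contains the Disqus id, it stays unique either way; the full timestamp only matters if the number should reflect the exact time of the comment.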
The output, as shown by tree, looks like:
_data
├── comments
│ ├── announcing-kubeplayer
│ │ ├── disqus-113988522-1292630400.yml
│ │ └── disqus-1858985256-1424044800.yml
│ ├── requires-owncloud-serverside-backend
│ │ ├── disqus-41270666-1269302400.yml
│ │ ├── disqus-41273219-1269302400.yml
...
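Because the jq filter drops deleted comments, a reply may point to a parent that is no longer part of the export. The following sketch, which assumes the JSON list produced by export-disqus-xml2json.sh, lists such comments; the helper name dangling_replies and the sample data are made up for illustration.

```ruby
require 'json'

# Return the comments whose replying_to refers to an id missing from the list
def dangling_replies(comments)
  ids = comments.map { |c| c['id'] }
  comments.reject { |c| c['replying_to'].nil? || ids.include?(c['replying_to']) }
end

# Tiny made-up sample in the shape of the exported JSON list
sample = JSON.parse(<<~JSON)
  [
    {"id": "disqus-1", "slug": "post-a", "replying_to": null},
    {"id": "disqus-2", "slug": "post-a", "replying_to": "disqus-1"},
    {"id": "disqus-3", "slug": "post-b", "replying_to": "disqus-99"}
  ]
JSON

dangling_replies(sample).each do |c|
  puts "#{c['id']} replies to missing #{c['replying_to']}"
end
```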
Display Comments in Jekyll
These comments are accessible in Jekyll posts and pages via site.data.comments[page.slug].
The post https://mademistakes.com/mastering-jekyll/static-comments-improved/ was the most helpful resource for integrating comments into Jekyll.
<!-- file: 'my-comments.html' -->
{% assign comments = site.data.comments[page.slug] | sort %}
{% for comment in comments %}
{% assign index = forloop.index %}
{% assign replying_to = comment[1].replying_to | to_integer %}
{% assign avatar = comment[1].avatar %}
{% assign email = comment[1].email %}
{% assign name = comment[1].name %}
{% assign url = comment[1].url %}
{% assign date = comment[1].date %}
{% assign message = comment[1].message %}
{% include comment index=index replying_to=replying_to avatar=avatar email=email name=name url=url date=date message=message %}
{% endfor %}
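The comment[1] access in the loop may look odd at first. The following Ruby sketch approximates what happens, under the assumption that Jekyll exposes a data folder as a hash keyed by file name and that sorting this hash yields [key, value] pairs; load_comments is a made-up helper and not Jekyll's actual reader.

```ruby
require 'yaml'
require 'tmpdir'

# Approximation: a folder of YAML files becomes a hash keyed by file name
def load_comments(dir)
  Dir.glob(File.join(dir, '*.yml')).to_h do |path|
    [File.basename(path, '.yml'), YAML.safe_load(File.read(path))]
  end
end

Dir.mktmpdir do |dir|
  File.write(File.join(dir, 'disqus-2-1424044800.yml'), { 'name' => 'B' }.to_yaml)
  File.write(File.join(dir, 'disqus-1-1292630400.yml'), { 'name' => 'A' }.to_yaml)

  # Sorting a hash yields [key, value] pairs, hence comment[1] in the template
  load_comments(dir).sort.each do |comment|
    puts "#{comment[0]}: #{comment[1]['name']}"
  end
end
```

Note that under this assumption the order follows the file name as a string, not the date field of the comment.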
<!-- file: 'comment' -->
<article id="comment{% unless include.r %}{{ index | prepend: '-' }}{% else %}{{ include.index | prepend: '-' }}{% endunless %}" class="js-comment comment {% if include.name == site.author.name %}admin{% endif %} {% unless include.replying_to == 0 %}child{% endunless %}">
<div class="comment__avatar">
{% if include.avatar %}
<img src="{{ include.avatar }}" alt="{{ include.name | escape }}">
{% elsif include.email %}
<img src="https://www.gravatar.com/avatar/{{ include.email | md5 }}?d=mm&s=60" srcset="https://www.gravatar.com/avatar/{{ include.email | md5 }}?d=mm&s=120 2x" alt="{{ include.name | escape }}">
{% else %}
<img src="/assets/img/avatar-60.jpg" srcset="/assets/img/avatar-120.jpg 2x" alt="{{ include.name | escape }}">
{% endif %}
</div>
<div class="comment__inner">
<header>
<p>
<span class="comment__author-name">
{% unless include.url == blank %}
<a rel="external nofollow" href="{{ include.url }}">
{{ include.name }}
</a>
{% else %}
{{ include.name }}
{% endunless %}
</span>
wrote on
<span class="comment__timestamp">
{% if include.date %}
{% if include.index %}<a href="#comment{% if r %}{{ index | prepend: '-' }}{% else %}{{ include.index | prepend: '-' }}{% endif %}" title="link to this comment">{% endif %}
<time datetime="{{ include.date | date_to_xmlschema }}">{{ include.date | date: '%B %d, %Y' }}</time>
{% if include.index %}</a>{% endif %}
{% endif %}
</span>
</p>
</header>
<div class="comment__content">
{{ include.message | markdownify }}
</div>
</div>
</article>
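The template relies on an md5 filter to build the Gravatar URL. Assuming this filter returns the hexadecimal MD5 digest of its input (it may have to be provided by a plugin), the resulting URL can be sketched in Ruby as follows; gravatar_url is a made-up helper name.

```ruby
require 'digest/md5'

# Build the avatar URL the template produces for commenters with an email
def gravatar_url(email, size = 60)
  # Gravatar expects the hash of the trimmed, lowercased address
  hash = Digest::MD5.hexdigest(email.to_s.strip.downcase)
  "https://www.gravatar.com/avatar/#{hash}?d=mm&s=#{size}"
end

puts gravatar_url('my@mail.com')
puts gravatar_url(' My@Mail.com ', 120)
```

Both calls yield the same hash, which is why it helps to store the email address already trimmed and lowercased in the comment file.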
Receiving New Comments
As explained in https://mademistakes.com/mastering-jekyll/static-comments/, the software https://staticman.net/ turns HTTP POST requests into GitHub and GitLab pull requests, so that comments can be added automatically. Of course, the website needs a rebuild after each new comment.
I had a lot of trouble setting up Staticman. Eventually, I decided to use a Ruby CGI program that emails me the new comment as an attachment. I like Ruby very much. Once I have figured out how to use the GitLab API wrapper, I may also use pull requests instead of email attachments.
# file: 'index.rb'
#!/usr/bin/env ruby
Gem.paths = { 'GEM_PATH' => '/var/www/virtual/rriemann/gem' }
require 'cgi'
require 'uri' # provides URI::MailTo::EMAIL_REGEXP
require 'yaml'
require 'date'
require 'mail'
cgi = CGI.new
# rudimentary validation
unless ENV['HTTP_ORIGIN'] == 'https://blog.riemann.cc' and
ENV['CONTENT_TYPE'] == 'application/x-www-form-urlencoded' and
ENV['REQUEST_METHOD'] == 'POST' and
cgi.params['email']&.first&.strip =~ URI::MailTo::EMAIL_REGEXP and
cgi.params['age']&.first == '' then # age is a bot honeypot
print cgi.http_header("status" => "FORBIDDEN")
print "<p>Error: 403 Forbidden</p>"
exit
end
output = Hash.new
date = DateTime.now
output['id'] = ENV['UNIQUE_ID']
output['date'] = date.iso8601
output['updated'] = date.iso8601
output['origin'] = cgi.params['origin']&.first
output['slug'] = cgi.params['slug']&.first&.gsub(/[^\w-]/, '') # some sanitizing
output['name'] = cgi.params['name']&.first
output['email'] = cgi.params['email']&.first&.downcase&.strip
output['url'] = cgi.params['url']&.first
output['message'] = cgi.params['message']&.join("\n").encode(universal_newline: true)
output['replying_to'] = cgi.params['replying_to']&.first
#Mail.defaults do
# delivery_method :sendmail
#end
Mail.defaults do
delivery_method :smtp, address: "smtp.domain", port: 587, user_name: "smtp_user", password: "smtp_password", enable_starttls_auto: true
end
mail = Mail.new do
from 'no-reply@domain' # 'rriemann'
to 'comments-recipient@domain' # ENV['SERVER_ADMIN']
reply_to output['email']
header['X-Blog-Comment'] = output['slug']
subject "New Comment from #{output['name']} for #{cgi.params['title']&.first}"
body <<~BODY
Hi blog author,
a new comment from #{output['name']} for https://blog.riemann.cc#{output['origin']}:
#{output['message']}
BODY
add_file(filename: "#{output['id']}-#{date.strftime('%s')}.yml", content: output.to_yaml)
end
mail.deliver
if mail.error_status then
print cgi.http_header("status" => "SERVER_ERROR")
cgi.print <<~RESPONSE
<p><b>Error: </b> #{mail.error_status}</p>
<p>An error occurred. Please try again later.</p>
<p><a href="javascript:history.back()">Go back</a></p>
RESPONSE
else
print cgi.http_header
cgi.print <<~RESPONSE
<p><b>Thank you</b> for your feedback! Your comment will be published after review.</p>
<p><a href="#{CGI.escapeHTML(output['origin'].to_s)}">Back to the previous page</a></p>
RESPONSE
end
To make it work with Apache, you may need to add these lines to the Apache configuration (this could be a .htaccess file):
DirectoryIndex index.html index.rb
Options +ExecCGI
SetHandler cgi-script
AddHandler cgi-script .rb
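Once the script is reachable, it can be handy to try it without a browser form. The following sketch builds, but does not send, a form POST that should satisfy the script's checks. The endpoint path /comments/ and the field values are placeholders, and build_comment_request is a made-up helper name.

```ruby
require 'net/http'
require 'uri'

# Build (not send) a form POST carrying the fields the CGI script reads
def build_comment_request(endpoint, fields)
  request = Net::HTTP::Post.new(URI(endpoint))
  request['Origin'] = 'https://blog.riemann.cc'
  request.set_form_data(fields) # sets application/x-www-form-urlencoded
  request
end

fields = {
  'name' => 'Jane', 'email' => 'jane@example.org', 'age' => '',
  'slug' => 'announcing-kubeplayer', 'origin' => '/some/post/',
  'title' => 'Some post', 'message' => 'Hello', 'replying_to' => ''
}
request = build_comment_request('https://blog.riemann.cc/comments/', fields)
puts request.content_type
puts request.body

# To actually send it:
# Net::HTTP.start(request.uri.host, request.uri.port, use_ssl: true) do |http|
#   puts http.request(request).code
# end
```

The empty age field is included on purpose, because the script rejects requests in which the honeypot is missing or filled in.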