Enable more advanced math rendering by default #55

Closed
dhimmel opened this issue Aug 14, 2017 · 28 comments

@dhimmel
Member

dhimmel commented Aug 14, 2017

The current default math used in our pandoc build command is severely limited: see the "TeX math in HTML" section of the pandoc demos. Pandoc has support for several more advanced methods for math rendering in HTML.

The question is which one to choose. I've seen MathJax used before in scholarly publishing. However, KaTeX is faster to render. There are also several more options.
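
One way to compare the candidates is to render the same snippet once per option. The sketch below is only an illustration: the file names are made up, and the four flags shown (`--mathjax`, `--katex`, `--mathml`, `--webtex`) are a subset of the HTML math options pandoc offers. The conversion is skipped when pandoc is not installed.

```shell
#!/bin/sh
# Write a tiny test document (hypothetical file name) with inline and display math.
cat > math-demo.md <<'EOF'
Inline math: $e^{i\pi} + 1 = 0$.

$$\int_0^1 x^2 \, dx = \frac{1}{3}$$
EOF

# Render the same source once per math option so the outputs can be compared.
for method in mathjax katex mathml webtex; do
  if command -v pandoc >/dev/null 2>&1; then
    pandoc --standalone "--$method" --output "math-demo-$method.html" math-demo.md \
      || echo "pandoc failed for --$method"
  else
    echo "pandoc not found; skipping --$method"
  fi
done
```

Opening the resulting `math-demo-*.html` files side by side in a browser gives a quick impression of rendering quality and load time for each method.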

@slochower did you look into the math options at all for b03e1c3?

@dhimmel
Member Author

dhimmel commented Aug 14, 2017

--mathjax worked for @zietzm in zietzm/Vagelos2017@8a3a633. See https://zietzm.github.io/Vagelos2017/ (versioned). The conversion to PDF via wkhtmltopdf was okay, not great: manuscript.pdf.

@slochower
Collaborator

slochower commented Aug 14, 2017

No, I didn't really look at options, but I'm using the --katex option myself and it's okay but not great.

HTML:
image

This is on Safari 10.1.2 and I've changed to HTML5 output because it looks better on my phone. I can't tell a difference between the HTML and HTML5 output on a desktop browser.

PDF:
image

This is using the built-in Safari PDF renderer.

My impression was that I'd probably just produce a PDF through LaTeX when the time came, because I often rely on LaTeX macros that aren't available as javascript libraries, for example, using mhchem to render chemistry.

@slochower
Collaborator

(Digression from the main topic, but I see a table rendering bug in the manuscript PDF you linked. It looks like the header gets reproduced across page breaks -- nice -- but overwrites a row in the process. We may want to report upstream.)

image

@dhimmel
Member Author

dhimmel commented Aug 14, 2017

One benefit of MathJax is that you can right-click the equation to show the MathML or TeX Commands. Since Manubot manuscripts are entirely open source (you can always get back to the source TeX), this is not essential (but is still nice). What really is annoying is when you can't get the TeX out of an online equation and hence have no way to copy it.

I've changed to HTML5 output because it looks better on my phone.

Is Manubot Rootstock not using the HTML5 output?!?! Want to fix this with a PR?

Nice to see your in-progress manuscripts

We may want to report upstream

See wkhtmltopdf/wkhtmltopdf#3557. I wonder if we could implement the td, tr { page-break-inside: avoid; } fix.
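
A minimal sketch of how that CSS rule might be injected, assuming it is acceptable to add a small HTML fragment through pandoc's `--include-in-header` option. The fragment file name is hypothetical, and whether the rule actually prevents the overwritten row in wkhtmltopdf is untested here.

```shell
#!/bin/sh
# Hypothetical fragment file; it only carries the rule quoted above.
fragment="table-page-breaks.html"

# The quoted 'EOF' keeps the shell from touching the CSS.
cat > "$fragment" <<'EOF'
<style>
/* Ask the renderer not to split a table row or cell across pages. */
td, tr { page-break-inside: avoid; }
</style>
EOF

cat "$fragment"
```

The fragment could then be passed to the build as `--include-in-header=table-page-breaks.html`, or the rule could simply be appended to the existing stylesheet.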

@slochower
Collaborator

slochower commented Aug 14, 2017

right-click the equation to show the MathML or TeX Commands

Fair point. I've always enjoyed that on Wikipedia, for example.

Is Manubot Rootstock not using the HTML5 output?!?! Want to fix this with a PR?

By default, build.sh uses html as the format.
https://github.com/greenelab/manubot-rootstock/blob/03de82178edfc62f43feb57add42739c39b3b7e0/build/build.sh#L31
I'm actually not sure what the difference is between --html and --html5 (I couldn't find clear guidance in the pandoc documentation) and how it plays out through the CSS that's applied, but the --html5 output zooms to an easily readable font size on my phone and the --html does not. I'll submit the single line change in a PR and hope there aren't any unintended consequences.

I wonder if we could implement the td, tr { page-break-inside: avoid; } fix.

Not sure about that, but worth a shot.

@dhimmel
Member Author

dhimmel commented Aug 14, 2017

This github code embed feature you triggered is rad!

As a screenshot:

embed

As quoted text:

By default, build.sh uses html as the format.
https://github.com/greenelab/manubot-rootstock/blob/03de82178edfc62f43feb57add42739c39b3b7e0/build/build.sh#L31
I'm actually not sure what the difference is between --html and --html5 (I couldn't find clear guidance in

Never knew this existed.

dhimmel pushed a commit that referenced this issue Aug 15, 2017
This build is based on
9f4deeb.

This commit was created by the following Travis CI build and job:
https://travis-ci.org/greenelab/manubot-rootstock/builds/264565127
https://travis-ci.org/greenelab/manubot-rootstock/jobs/264565128

[ci skip]

The full commit message that triggered this build is copied below:

Pandoc: use --html5 not --html for output (#56)

Refs #55 (comment)
@slochower
Collaborator

This github code embed feature you triggered is rad!

Just navigate to the file, hit y for current commit URL (not sure if necessary), and click three dots to get permalink.
screen shot 2017-08-14 at 5 16 25 pm

@dhimmel
Member Author

dhimmel commented Aug 15, 2017

I often rely on LaTeX macros that aren't available as javascript libraries, for example, using mhchem to render chemistry.

It looks like you may be able to extend MathJax to support mhchem. However, KaTeX does not support mhchem... yet.

Are there other macros that you have in mind specifically? If the overhead for enabling these extensions is low, we may want to consider enabling some by default. I imagine they could provide compelling incentives to use the Manubot. My only worry is that they will increase the HTML lock-in, when the long-term best option may be to create JATS XML (see #51), as that may become the standard for scholarly content and the best route to a nice frontend.

@slochower
Collaborator

It looks like you may be able to extend MathJax to support mhchem. However, KaTeX does not support mhchem... yet.

Exactly. I saw that thread but I haven't had time to play around with it yet.

Are there other macros that you have in mind specifically?

In my latest stuff, I think I used mhchem, siunitx, and some packages for long tables in the SI (I think booktabs and threeparttable). I can double check later on (see below).

the long-term best option may be to create JATS XML (see #51)

Neat! I didn't know about that. One pain point right now (for me) is not being able to see the figures while writing. One of my use cases at the moment is more of a notebook than a manuscript (with the goal that I could turn the notebook into an article without much trouble). It's much easier to write about images when they are rendered; I've been using vscode to write. Does Texture render URL images in Markdown by any chance?

Edit: the full preamble for a scientific manuscript in LaTeX with all the macros I used, for reference.

\documentclass[12pt]{extarticle}
\usepackage[english]{babel}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{csquotes}
\usepackage{siunitx}           
\usepackage{helvet}            % to make the document Helvetica
\renewcommand{\familydefault}{\sfdefault}
\usepackage[EULERGREEK]{sansmath} % to get Greek units to behave
\sansmath
\usepackage{amsmath, amsfonts} % math
\usepackage{bm}
\usepackage[version=4]{mhchem} % for formatting magnesium and ATP
\usepackage[varioref=false]{chemstyle}         % symbols
\usepackage{graphicx}          % images
\usepackage{hyperref}          % for TOC
\usepackage[usenames, dvipsnames]{xcolor}
\definecolor{purple}{RGB}{112,48,160}
\usepackage{booktabs, multirow, threeparttable} % For the tables
\usepackage{etoolbox}
\robustify\tnote               % for tnote with siunitx
\usepackage{esdiff}            % derivatives
\usepackage{comment}           % to hide stuff in drafts
\usepackage{setspace}
% \usepackage{lineno}
% \setstretch{1.2}
\setstretch{2}
% Comment below for easier editing on small screens (makes document narrower)
\usepackage[left=0.4in, right=0.4in, top=1in, bottom=1in, headheight=13.6pt]{geometry}
\usepackage[backend=biber, style=nature, doi=false, isbn=false, url=false]{biblatex}

@dhimmel
Member Author

dhimmel commented Aug 15, 2017

It's a difficult decision between MathJax and KaTeX. I'm thinking we should go with MathJax because it:

  • supports more of LaTeX math (KaTeX has had more limited support in the past, and I'm not sure whether it has fully caught up)
  • enables copying the raw TeX
  • due to its maturity, has nice extensions such as mhchem

See also https://www.bersling.com/2016/05/10/displaying-math-on-the-web/. If MathJax becomes too slow, users can always swap it out for KaTeX.

Neat! I didn't know about that. One pain point right now (for me) is not being able to see the figures while writing.

Can you preview the markdown in your text editor? vscode claims to support side-by-side markdown preview.

Does Texture render URL images in Markdown by any chance?

Texture only reads JATS XML (not markdown). We'd be converting markdown to JATS in pandoc and then feeding that to Texture... but we'd still be writing in markdown.

@agitter
Member

agitter commented Aug 15, 2017

One pain point right now (for me) is not being able to see the figures while writing.

The Atom editor also shows local images in the side-by-side Markdown preview. It doesn't resize them according to the {height="13px"} annotation though so the icons in the author list are huge.

dhimmel added a commit to dhimmel/manubot-rootstock that referenced this issue Aug 15, 2017
Closes manubot#55

Also reorder some pandoc CLI arguments.
dhimmel added a commit that referenced this issue Aug 15, 2017
Closes #55

Also reorder some pandoc CLI arguments.
@slochower
Collaborator

@dhimmel Oh, vscode is previewing images via https (but not http) URLs through cdn.rawgit... for me. This must have snuck in with one of the recent updates and I didn't notice. The sizing is off, though, exactly as @agitter suggested. @agitter can you set up Atom to automatically wrap one sentence per line for Markdown? I've been trying, unsuccessfully, to retrain myself to do that manually for better git integration, and none of the built-in wrapping modes seems to fit the bill.

@agitter
Member

agitter commented Aug 16, 2017

@slochower I believe that all I did in Atom was View -> Toggle Soft Wrap. It wraps the text on screen without creating line breaks:

image
This is all line 36.

@slochower
Collaborator

@agitter Ah. And then you just press enter at the end of the sentence instead of space? Hard habit to break for me.

@agitter
Member

agitter commented Aug 16, 2017 via email

@slochower
Collaborator

@dhimmel There's a non-ideality with the interaction of this and wkhtmltopdf. The rendered PDF now shows a javascript (?) popup saying the math has been rendered. See screenshot.
image

I couldn't find an issue directly addressing this. Maybe we need a delay before rendering? wkhtmltopdf/wkhtmltopdf#3122

@dhimmel
Member Author

dhimmel commented Aug 17, 2017

@slochower does increasing the default --javascript-delay from 200 to 2000 milliseconds fix the issue? Perhaps you're getting the issue because you have many formulas. Feel free to submit a PR.
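
For reference, a hedged sketch of what the longer delay might look like on the command line. The input and output file names are placeholders rather than the paths the build script uses, and the step is skipped when wkhtmltopdf or the input file is missing.

```shell
#!/bin/sh
# Placeholder file names; substitute whatever the build actually produces.
input_html="manuscript.html"
output_pdf="manuscript.pdf"
js_delay_ms=2000   # raised from wkhtmltopdf's default of 200 ms

if command -v wkhtmltopdf >/dev/null 2>&1 && [ -f "$input_html" ]; then
  # Give the page's JavaScript (here, the math renderer) more time before printing.
  wkhtmltopdf --javascript-delay "$js_delay_ms" "$input_html" "$output_pdf" \
    || echo "wkhtmltopdf exited with an error"
else
  echo "skipping PDF step: wkhtmltopdf or $input_html not found"
fi
```

A fixed delay is a blunt instrument: documents with many formulas may need more time, so the value is kept in a variable that is easy to adjust.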

@slochower
Collaborator

slochower commented Aug 17, 2017

@dhimmel that did work! Good intuiting.

I found another issue, however. Inline math seems to be creating new lines in the PDF rendering only. (Below, D is $D$ and there is an unnecessary line break after "That is,"...)
image
Unclear if this is a wkhtmltopdf bug or mathjax, but I guess the former since the HTML looks fine.
Any thoughts? In one issue, the author of wkhtmltopdf suggests changing mathjax options (for a different issue); I'm not even sure if we can set any mathjax options for pandoc, do you know?

Edit: I can confirm that the HTML shows "math inline" as the span.

<p><span class="math inline">\(D\)</span> was chosen to replicate free diffusion in the absence of any energy barriers. That is, <span class="math inline">\(D\)</span> controls how rapidly the landscape is sampled.

@dhimmel
Member Author

dhimmel commented Aug 17, 2017

I'm not even sure if we can set any mathjax options for pandoc, do you know?

I don't know how per se, but we should be able to configure it, as we can always insert javascript.

@slochower
Collaborator

Good point. Not getting anything relevant by Googling "wkhtmltopdf math line break" or variants thereof.

@slochower
Collaborator

A workaround is to use --katex. This fixes the spurious line break issue, but introduces a new one. The baseline for math in the HTML using --katex looks right, but the baseline for subscripts in the PDF is too high. This probably requires more investigation, but neither --mathjax nor --katex seems to work out of the box for PDF generation.

@slochower
Collaborator

slochower commented Aug 18, 2017

My current suspicion for the problem with --mathjax is something to do with the redirect to this page: https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_CHTML-full. You can see this happening if you run wkhtmltopdf --javascript-delay 2000 --debug-javascript tmp.html tmp.pdf for example.

@slochower
Collaborator

Reported wkhtmltopdf/wkhtmltopdf#3609

@slochower
Collaborator

slochower commented Jun 20, 2018

Although this issue is closed, in case others are wondering about MathJax integration with mhchem: as mentioned here, placing the following JavaScript at the top of the HTML file, before loading the main MathJax JavaScript, seems to do the trick.

  <script type="text/x-mathjax-config">
  MathJax.Ajax.config.path["mhchem"] =
    "https://cdnjs.cloudflare.com/ajax/libs/mathjax-mhchem/3.3.0";
  MathJax.Hub.Config({
    TeX: {
      extensions: ["[mhchem]/mhchem.js"]
    }
  });
  </script>

For interested users, this could be incorporated into the build script around here:
https://github.com/greenelab/manubot-rootstock/blob/908848162ef68f0606d12880d381f563faf01947/build/build.sh#L41
to automatically insert the Javascript.

Edit: It might also work by putting the script directly into the Markdown and using a pandoc filter, although that seems a little more messy: https://stackoverflow.com/a/42371554

@dhimmel
Member Author

dhimmel commented Jun 20, 2018

@slochower nice! Glad you've got mhchem working. It may not be a bad idea to add a file build/assets/mathjax-config.html with your snippet above. Then users could add:

 --include-after-body=build/assets/mathjax-config.html \

to build.sh and enable this "fully loaded" MathJax config. Perhaps we can let you play around a bit more to see if there are any other extensions that'd fit well.
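
A minimal sketch of creating that file from the shell, reusing the mhchem snippet posted earlier in this thread. One caveat is an assumption rather than something verified here: the earlier comment placed the config before the main MathJax script, so it is worth checking that the config is still picked up when it is included after the body.

```shell
#!/bin/sh
# Path suggested in the comment above; the directory is created if missing.
config_file="build/assets/mathjax-config.html"
mkdir -p "$(dirname "$config_file")"

# The quoted 'EOF' keeps the shell from expanding anything inside the snippet.
cat > "$config_file" <<'EOF'
<script type="text/x-mathjax-config">
MathJax.Ajax.config.path["mhchem"] =
  "https://cdnjs.cloudflare.com/ajax/libs/mathjax-mhchem/3.3.0";
MathJax.Hub.Config({
  TeX: {
    extensions: ["[mhchem]/mhchem.js"]
  }
});
</script>
EOF

echo "wrote $config_file"
```

Keeping the configuration in its own file means further extensions can be added in one place without touching the pandoc command again.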

@slochower
Collaborator

It may not be a bad idea to add a file build/assets/mathjax-config.html with your snippet above.

Good idea, actually.

I made another small tweak to the CSS, to center non-full-width images. I think this looks a bit nicer than left-aligned images. Change img{max-width:100%} to img{display:block;margin-left:auto;margin-right:auto;max-width:100%} on line 464 of (my) github-pandoc.css. I say "my" because it seems the latest CSS is different from my local copy.
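
Because the line number may differ between copies of the stylesheet, matching on the rule text is one alternative. The sketch below runs on a throwaway demo file (a hypothetical stand-in for github-pandoc.css) and assumes the rule appears exactly as quoted above.

```shell
#!/bin/sh
# Demo stylesheet standing in for github-pandoc.css (illustration only).
printf 'img{max-width:100%%}\n' > demo.css

# Rewrite the img rule so images narrower than the page are centered.
sed 's/img{max-width:100%}/img{display:block;margin-left:auto;margin-right:auto;max-width:100%}/' \
  demo.css > demo-centered.css

cat demo-centered.css
```

Writing to a second file instead of editing in place keeps the original around for comparison and avoids differences between sed implementations.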

I also began to work on a version for GitLab, using their alternative CI build process, but haven't gotten it working yet and it's pretty low on my priority list. Briefly: (a) change the TRAVIS environment variables to GitLab ones in deploy.sh, (b) get rid of the separate pulls, because GitLab CI runs with full access to the repository (i.e., it is not a separate service), (c) use GitLab Pages instead of GitHub domains, which involves creating a directory in the repository called public (so we should probably push the webpage to a separate branch [as we do] and then clone into public), and (d) add .gitlab-ci.yml with something like the file below (although this won't quite do the trick; I haven't fully debugged it and am not sure when I'll be able to get back to it).

`.gitlab-ci.yml`
image: python:latest

variables:
  PIP_CACHE_DIR: "$CI_PROJECT_DIR/.cache"

cache:
  paths:
    - ci/cache
    
before_script:
  - python -V               # Print out python version for debugging
  - wget https://repo.continuum.io/miniconda/Miniconda3-4.3.31-Linux-x86_64.sh
    --output-document miniconda.sh
  - bash miniconda.sh -b -p $HOME/miniconda
  - export PATH="$HOME/miniconda/bin:$PATH"
  - hash -r
  - conda config --set always_yes yes --set changeps1 no
  - conda info --all
  - apt-get update
  - apt-get install -y locales >/dev/null
  - echo "en_US UTF-8" > /etc/locale.gen
  - locale-gen en_US.UTF-8
  - export LANG=en_US.UTF-8
  - export LANGUAGE=en_US:en
  - export LC_ALL=en_US.UTF-8
  - apt-get install -y gettext-base

run:
  script:
  - conda env create --quiet --file build/environment.yml
  - source activate manubot
  - sh build/build.sh
 
after_script:
  - sh ci/deploy.sh
  
pages:
  stage: deploy
  script:
  - mkdir .public
  - cp -r webpage .public
  - mv .public public
  artifacts:
    paths:
    - public
  only:
  - master

@dhimmel
Member Author

dhimmel commented Jun 20, 2018

I also began to work on a version for GitLab, using their alternative CI build process

Can you move this part to a new comment in #88? It's very helpful, but doesn't belong in this issue.

I made another small tweak, to center non-full-width images in the CSS.

That could be of interest to us. If you want to open a PR, we could evaluate whether it's something we generally want.

@agitter
Member

agitter commented Jun 20, 2018

👍 on centering non-full-width images. That's my personal preference at least.

ploegieku added a commit to ploegieku/2023-functional-homology-paper that referenced this issue Aug 6, 2024
Closes manubot/rootstock#55

Also reorder some pandoc CLI arguments.