JM

Wordup

November 5, 2018

I built this tool called Wordup to convert content from Word documents into HTML or Markdown.

It uses the built-in paste tools of CKEditor 4 and sprinkles in some extra vanilla JS for whitespace reduction and string replacement to spit out clean HTML. Check a box and Turndown.js converts that HTML to Markdown.
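For illustration, here is a minimal sketch of what a whitespace-reduction and string-replacement pass over the editor's HTML can look like. In CKEditor 4 that HTML would come from `editor.getData()`. Here it is just a string. The `tidyHtml` name and every rule in the list are hypothetical, not Wordup's actual rules (the repository linked at the end has those).

```javascript
// Hypothetical cleanup rules for illustration; not Wordup's actual rule set.
// Each rule is a [pattern, replacement] pair, applied in order.
const TIDY_RULES = [
  // Turn non-breaking space entities into ordinary spaces.
  [/&nbsp;/g, " "],
  // Collapse runs of spaces and tabs into a single space.
  [/[ \t]+/g, " "],
  // Remove whitespace (including newlines) on either side of block-level tags.
  [/\s*(<\/?(?:p|h[1-6]|ul|ol|li|table|tr|td|th)\b[^>]*>)\s*/gi, "$1"],
  // Drop paragraphs left empty by the steps above.
  [/<p\b[^>]*>\s*<\/p>/gi, ""],
  // Put a line break after closing block tags and after opening list tags.
  [/(<\/(?:p|h[1-6]|ul|ol|li)>|<(?:ul|ol)\b[^>]*>)/gi, "$1\n"],
];

function tidyHtml(html) {
  let out = html;
  for (const [pattern, replacement] of TIDY_RULES) {
    out = out.replace(pattern, replacement);
  }
  return out.trim();
}
```

Keeping the rules in an ordered list makes each one easy to test on its own. Order matters, though: empty paragraphs can only be detected after `&nbsp;` has been turned into whitespace and trimmed away.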

Wordup screenshot

But converting Word documents to HTML is a solved problem, right?

  1. Search for “Word to HTML conversion”
  2. Search for “Word to clean HTML conversion”
  3. Consider pasting Word document contents into mystery text boxes on several online conversion tools
  4. Wonder how these tools actually work
  5. Wonder where your content will be sent off to
  6. Close browser, open Word document
  7. Copy content, paste into text editor
  8. Begin wrapping text in HTML tags
  9. Give up
  10. Open Word document, save as HTML
  11. Open HTML in text editor
  12. Cry a little
  13. Start using find and replace to remove extra markup
  14. Graduate to regex searches
  15. Eventually arrive at relatively clean HTML
  16. Realize that you’ve been sent an updated version of the Word document while working through steps 1-15
  17. Cry a little
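Steps 13 and 14 are where the hours go. As a rough illustration of that find-and-replace pass, here is a sketch that strips a few things Word's HTML export is known for: `<o:p>` tags, `class=MsoNormal`-style classes and inline styles, and conditional comments. `stripWordMarkup` and its rule list are hypothetical, and the rules are deliberately crude. For example, they strip every class, style and lang attribute, including ones you might want to keep.

```javascript
// Illustrative find-and-replace rules for markup that Word's "Save as HTML"
// export commonly contains. The list is an assumption and far from complete;
// regular expressions are a blunt tool for HTML.
const WORD_RULES = [
  // Conditional comments such as <!--[if gte mso 9]>...<![endif]-->,
  // matched without ever crossing a "-->" so real content is not swallowed.
  [/<!--\[if(?:(?!-->)[\s\S])*?<!\[endif\]-->/gi, ""],
  // Office-namespace paragraph markers: <o:p> and </o:p>.
  [/<\/?o:p>/gi, ""],
  // class, style, and lang attributes, quoted or unquoted (e.g. class=MsoNormal).
  [/\s(?:class|style|lang)=(?:"[^"]*"|'[^']*'|[^\s>]+)/gi, ""],
  // Unwrap spans that no longer carry attributes (non-nested spans only).
  [/<span>([\s\S]*?)<\/span>/gi, "$1"],
];

function stripWordMarkup(html) {
  return WORD_RULES.reduce(
    (out, [pattern, replacement]) => out.replace(pattern, replacement),
    html
  );
}
```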

WYSIWYG editors in most CMS platforms deal with pasting Word documents, right?

  1. Search for JavaScript-based WYSIWYG editors
  2. Pick one
  3. Create an HTML page with two <textarea> fields
  4. Hook WYSIWYG editor into first <textarea>
  5. Read documentation
  6. Figure out how to get converted text out of WYSIWYG <textarea> and into second <textarea> as HTML
  7. Notice converted HTML still needs some love
  8. Write additional whitespace and string replacement rules to send converted text through
  9. End up with really clean HTML in the second <textarea>
  10. Cry a little
  11. Wonder what else you can do
  12. Add Markdown conversion and link helpers
  13. Tell people about it
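In Wordup the Markdown step is handled by Turndown.js, whose simplest use is `new TurndownService().turndown(html)`. To show roughly what that conversion does to cleaned HTML, here is a toy, dependency-free stand-in. `toMarkdown` and its rules are hypothetical and regex-based, unlike Turndown. One visible difference is that this sketch writes `#`-style headings, while Turndown defaults to setext (underlined) headings unless it is configured with `headingStyle: 'atx'`.

```javascript
// A toy HTML-to-Markdown converter for a handful of tags. This is NOT how
// Turndown works (Turndown walks a parsed DOM and applies per-node rules);
// it will mishandle nesting, ordered lists, tables, and unexpected attributes.
const MD_RULES = [
  // Strip whitespace around list tags so items end up on adjacent lines.
  [/\s*(<\/?(?:ul|li)\b[^>]*>)\s*/gi, "$1"],
  // Headings become ATX-style "#" headings followed by a blank line.
  [/<h([1-6])[^>]*>([\s\S]*?)<\/h\1>/gi,
    (_, level, text) => `${"#".repeat(Number(level))} ${text}\n\n`],
  // Bold and italic.
  [/<(?:strong|b)>([\s\S]*?)<\/(?:strong|b)>/gi, "**$1**"],
  [/<(?:em|i)>([\s\S]*?)<\/(?:em|i)>/gi, "_$1_"],
  // Links with a double-quoted href.
  [/<a\s[^>]*href="([^"]*)"[^>]*>([\s\S]*?)<\/a>/gi, "[$2]($1)"],
  // Unordered-list items become "- " bullets, one per line.
  [/<li\b[^>]*>([\s\S]*?)<\/li>/gi, "- $1\n"],
  [/<ul\b[^>]*>/gi, "\n"],
  [/<\/ul>/gi, "\n\n"],
  // Paragraphs are separated by a blank line.
  [/<p\b[^>]*>([\s\S]*?)<\/p>/gi, "$1\n\n"],
];

function toMarkdown(html) {
  let md = html;
  for (const [pattern, replacement] of MD_RULES) {
    md = md.replace(pattern, replacement);
  }
  // Collapse runs of blank lines left behind by the rules above.
  return md.replace(/\n{3,}/g, "\n\n").trim();
}
```

Each rule emits generous newlines and a single collapse at the end tidies them. That is simpler than having every rule track what came before it.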

See the code: https://github.com/communicatehealth/wordup