Recently, while working on Formwork, I came across one of those bugs that a software engineer lives for. What do I mean by that? It was complicated, quite frustrating at times, and felt great to finally solve. I tried a bunch of different things and followed some approaches that turned out to be dead ends. I almost gave up partway through, thinking maybe we could just live with it, but somehow couldn’t quite let go. I had to force myself to take breaks when I found myself frustratingly deep into it, and even on breaks it was at the back of my mind. It wasn’t the impact of the bug that kept me going but the puzzle of it.
First, the problem: adding text after an opening angle bracket and not closing that bracket on the same line crashed the tab. After reproducing and tinkering with the problem, I found that having a space between the opening angle bracket and the following text didn’t crash the browser. Interesting. This suggested a bug in the editor’s HTML parsing logic.
So the solution I proposed was to automatically insert a space between the unclosed angle bracket and the next non-space character to prevent the crash. In my mind, the implementation would look like our other markdown-related fixes, so I got down to coding it. The editor fires an event when WYSIWYG content is converted to markdown, and we would rewrite the content with the regex I had proudly come up with. Side note: the event is called ‘beforeConvertWysiwygToMarkdown’ and the regex is (<)(\w)(?!\s)(?!.*>). Not the most complicated regex ever, but not a trivial one either. The JavaScript code for the replacement is easy and makes good use of the capture groups in the regex:
content.replace(regex_here, "$1 $2")
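To make that concrete, here is a minimal sketch of the replacement as a standalone function. The helper name `spaceOutUnclosedBrackets` and the global `g` flag are assumptions for illustration; the regex itself is the one quoted above.

```javascript
// Regex from the post: a "<" followed by a word character that is not itself
// followed by whitespace, with no ">" later on the same line.
// ASSUMPTION: the "g" flag, so every offending bracket is fixed, not just the first.
const UNCLOSED_BRACKET = /(<)(\w)(?!\s)(?!.*>)/g;

// Hypothetical helper: inserts a space between an unclosed "<" and the text after it.
function spaceOutUnclosedBrackets(content) {
  // $1 is the bracket, $2 is the first word character after it.
  return content.replace(UNCLOSED_BRACKET, "$1 $2");
}

console.log(spaceOutUnclosedBrackets("<Lorem ipsum")); // "< Lorem ipsum"
console.log(spaceOutUnclosedBrackets("<b>bold</b>"));  // "<b>bold</b>" (closed tags are left alone)
```

Because `.` does not match newlines, a bracket that is only closed on a later line still counts as unclosed, which matches the "same line" condition of the bug.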
Except there was one big problem when I got around to implementing it: the crash happens before the event is fired, and once the tab has crashed, no code after it gets to run.
I had to find a way to get in ahead of the crash. There is another event called ‘change’, which is fired after (you guessed it) any kind of change in the editor. So I moved this code into a ‘change’ event handler.
Uh-oh, another issue. Replacing the content in the ‘change’ event handler also fires the ‘change’ event, regardless of whether the content actually changed. Because of this, I rewrote the entire (existing) markdown-fixing utility JS file to also return whether a fix was needed. This made our other fixes more performant, as we were now more conservative in calling the heavyweight function that sets the content of the editor.
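As an illustration of that pattern, here is a sketch of a fixer that reports whether it changed anything, and a handler that only writes back when it did. The names `fixMarkdown`, `onChange`, `getContent` and `setContent` are hypothetical stand-ins, not the editor's real API or the actual Formwork utility.

```javascript
// Same regex as in the post; the "g" flag is an assumption.
const UNCLOSED_BRACKET = /(<)(\w)(?!\s)(?!.*>)/g;

// Hypothetical fixer: returns the fixed text and whether a fix was needed.
function fixMarkdown(content) {
  const fixed = content.replace(UNCLOSED_BRACKET, "$1 $2");
  return { content: fixed, changed: fixed !== content };
}

// Hypothetical 'change' handler. `editor` is any object exposing
// getContent()/setContent(); both names are assumptions for this sketch.
function onChange(editor) {
  const { content, changed } = fixMarkdown(editor.getContent());
  if (changed) {
    // Only touch the editor when something was fixed, since setting the
    // content fires 'change' again.
    editor.setContent(content);
  }
  return changed;
}
```

Once the space has been inserted, the regex no longer matches, so the second 'change' event caused by writing the content back falls straight through without another write.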
Well, there were two problems with this approach. First, the smaller one: the ‘change’ event is fired too frequently. The editor fires it willy-nilly: when the editor first loads, when you make a mouse selection, even when you use the arrow keys, and so on. This means the regex search runs umpteen times, and given that our clients write and edit big documents, I was not exactly happy with this. Second, and this is what breaks the solution, the approach didn’t work when pasting broken content. While a ‘change’ event is fired after a paste, the crash happens before it. The solution worked nicely when the user was typing (although performance would take a hit). However, given that a lot of our users paste large parts of their documents, or even entire documents, this made it almost a no-go.
After this, I was out of ideas. What followed was a period of rumination.
I downloaded the whole source code of the editor and started going through it to see what I could do. I even had the crazy idea of hijacking the paste event, and I followed this road further than I should have. I wrote an event handler for the paste event, modified the data in the clipboard, and then re-fired the paste event. The issue was that Toast UI Editor (or more specifically ProseMirror, which Toast UI Editor depends on) has its own event handler for the paste event. So I looked into how to disable that and then re-fire it afterwards. All of this was verging on crazy, which I eventually realized, so I abandoned this path.
While going through the source code, I found that the crash happened on these lines:
const state = new ToMdConvertorState(this.toMdConvertors);
let markdownText = state.convertNode(wwNode, this.getInfoForPosSync());
I had previously seen an option to provide a customMarkdownRenderer and had made an attempt at using it, but abandoned it because of its lack of documentation. Seriously, there is absolutely no documentation for it. The most I could find was a GitHub issue saying let’s add docs for this, which was closed due to inactivity. It’s not trivial to figure out the function either.
I started looking into the source code to figure it out, and things looked sort of promising. I still didn’t fully understand what was going on in Toast UI’s code, because it’s pretty darn complicated, but I understood enough to give it a try.
I had to attempt it three or four times before I got the syntax right, but finally I had a working solution. The ‘customMarkdownRenderer’ is indeed called before the crash, and it is able to fix the improper structure before it gets a chance to make things go haywire! And it is not called willy-nilly like the ‘change’ event! Double win.
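For a rough idea of what such a renderer could look like, here is a sketch. The shape of the renderer map is an assumption, since the option is undocumented and the post does not show the real implementation. The sketch assumes keys named after node types, handlers that receive an object with a `node`, and a returned object with a `text` field. Hooking the `text` node is also an assumption.

```javascript
// Same regex as in the post; the "g" flag is an assumption.
const UNCLOSED_BRACKET = /(<)(\w)(?!\s)(?!.*>)/g;

// ASSUMPTION: the renderer is a map from node type to a function that gets
// info about the node being converted and returns the pieces to override.
// The actual shape expected by the editor may differ.
const customMarkdownRenderer = {
  text(nodeInfo) {
    const original = nodeInfo.node.text || "";
    // Insert the space before the converter sees the unclosed bracket.
    return { text: original.replace(UNCLOSED_BRACKET, "$1 $2") };
  },
};

// ASSUMPTION: the map is passed as the customMarkdownRenderer option when
// the editor instance is created.
```

The appeal of this hook is that it runs as part of the conversion itself, so it covers typed and pasted content alike without reacting to every cursor movement.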
During this whole process, I rewrote our other markdown-fixing code to make it more efficient, because I wanted (and needed) the code in the ‘change’ event handler to be optimised, even though I later discarded that event handler entirely. I also upgraded the ProseMirror dependencies, because there was a small chance that might fix the issue. So I guess the ordeal had some positive side effects.
And now we can say our text editor is actually more robust than the one on the demo site. You can go to the demo site, paste the following line in WYSIWYG mode, then switch to markdown mode, and the tab will crash. But the one in Formwork won’t.
<Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged.