Number Duck began its life as a specialized C++ library for reading and writing Excel files. While the library has been useful to developers, being trapped in one language limited the number of people I could reach, especially when the majority of business-focused developers work in languages like C# or Java. So with that simple problem, I set out on a far too long journey to transform Number Duck from a traditional C++ codebase into one that could support multiple languages, without requiring runtime dependencies, wrappers, or some such.
My first attempt at cross-language compatibility leveraged Clang, an open source C++ compiler. The concept was straightforward: parse the C++ code into an Abstract Syntax Tree (AST), then write a custom program to convert that AST back into C++ and also C# source code.
Being able to quickly get the AST was useful, as it let me focus on the main issue with porting: low-level direct memory accesses, which were used extensively in the C++ code but are problematic in memory-safe languages like C#. The older Excel BIFF8 XLS file format often required intricate binary manipulation, and Number Duck relied heavily on pointer arithmetic and direct memory operations. Such code cannot be directly translated to C# due to its memory safety guarantees, so I needed a more sophisticated approach.
So I created an abstraction layer for memory operations, with matching versions in C++ and C#, then over several months I set about updating all the existing C++ to use it instead of accessing memory directly. Now, when the code is transpiled to C#, the matching memory operation class can be used and the code runs safely.
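As a rough illustration of the idea, here is a minimal sketch of what the C++ side of such a layer might look like: a bounds-checked buffer that reads and writes little-endian integers (the byte order BIFF8 uses) by offset rather than by raw pointer. The `ByteBuffer` name and its methods are hypothetical, chosen for this example, and are not Number Duck's actual API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical memory abstraction: callers use offsets, never raw pointers,
// so an equivalent class can be written in a memory-safe language.
class ByteBuffer {
public:
    explicit ByteBuffer(std::size_t size) : data_(size, 0) {}

    std::size_t Size() const { return data_.size(); }

    // Writes a 16-bit value in little-endian order at the given offset.
    void WriteUint16(std::size_t offset, std::uint16_t value) {
        CheckRange(offset, 2);
        data_[offset] = static_cast<std::uint8_t>(value & 0xFF);
        data_[offset + 1] = static_cast<std::uint8_t>((value >> 8) & 0xFF);
    }

    // Reads a 16-bit little-endian value from the given offset.
    std::uint16_t ReadUint16(std::size_t offset) const {
        CheckRange(offset, 2);
        return static_cast<std::uint16_t>(data_[offset] | (data_[offset + 1] << 8));
    }

    // Writes a 32-bit value in little-endian order at the given offset.
    void WriteUint32(std::size_t offset, std::uint32_t value) {
        CheckRange(offset, 4);
        for (std::size_t i = 0; i < 4; ++i) {
            data_[offset + i] = static_cast<std::uint8_t>((value >> (8 * i)) & 0xFF);
        }
    }

    // Reads a 32-bit little-endian value from the given offset.
    std::uint32_t ReadUint32(std::size_t offset) const {
        CheckRange(offset, 4);
        std::uint32_t value = 0;
        for (std::size_t i = 0; i < 4; ++i) {
            value |= static_cast<std::uint32_t>(data_[offset + i]) << (8 * i);
        }
        return value;
    }

private:
    // Throws instead of reading or writing past the end of the buffer.
    void CheckRange(std::size_t offset, std::size_t count) const {
        if (offset > data_.size() || count > data_.size() - offset) {
            throw std::out_of_range("ByteBuffer access out of range");
        }
    }

    std::vector<std::uint8_t> data_;
};

// Usage example: values written at an offset read back unchanged.
inline void ByteBufferExample() {
    ByteBuffer buffer(8);
    buffer.WriteUint16(0, 0x1234);
    buffer.WriteUint32(2, 0xDEADBEEFu);
    assert(buffer.ReadUint16(0) == 0x1234);
    assert(buffer.ReadUint32(2) == 0xDEADBEEFu);
}
```

Because every access goes through an offset and a range check, the same interface could be backed by a `byte[]` in C# without any unsafe code.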
There were a few other abstraction layers needed, but those were mostly about having unified access to third-party XML and ZIP libraries, which differ on each platform.
While this worked, I fairly quickly came to the conclusion that relying on Clang's AST would be a longer-term limitation. Pulling the right parts out of the tree was a bit hacky, and I also wanted to introduce some language constructs that weren't expressible in standard C++, particularly around the concept of "owned" pointers.
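To make the "owned" pointer idea a little more concrete, here is a minimal C++ sketch. It assumes an owned pointer in the source language could map to `std::unique_ptr` in generated C++, and to a plain reference in a garbage-collected target. That mapping, and the `Row` and `Cell` names, are assumptions for illustration only, not the transpiler's actual output.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types for illustration.
struct Cell {
    std::string value;
};

class Row {
public:
    // Takes ownership of the cell: the caller must hand it over with std::move.
    // Returns a non-owning (borrowed) pointer that stays valid while the Row lives.
    Cell* AddCell(std::unique_ptr<Cell> cell) {
        assert(cell != nullptr);
        cells_.push_back(std::move(cell));
        return cells_.back().get();
    }

    std::size_t CellCount() const { return cells_.size(); }

    // Borrowed access: the Row keeps ownership. Returns nullptr if out of range.
    const Cell* GetCell(std::size_t index) const {
        return index < cells_.size() ? cells_[index].get() : nullptr;
    }

private:
    // The Row owns its cells; they are destroyed along with it.
    std::vector<std::unique_ptr<Cell>> cells_;
};
```

The point of marking ownership in the source language is that the transpiler knows who is responsible for freeing an object in C++, while a target like C# can simply ignore the distinction.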
I investigated existing transpilation solutions, but each had drawbacks, such as adding runtime dependencies to the generated code or not being able to target the languages I wanted. Overall they were too advanced, when I just wanted something very simple, but with a dusting of magic around memory management.
So then, armed with a copy of Crafting Interpreters, I started to create a custom language designed specifically for this purpose: a language that basically looks like a very limited C#, with classes and generic vectors.
The initial transpiler was written in C++, parsing my custom language and generating both C++ and C# output. This worked well enough to validate the approach, processing simple test source files. But the true test would come next: I started rewriting the transpiler in its own language, making it fully self-hosting.
This self-hosting approach had several advantages:
It was mostly a straightforward process. I just made sure to keep a stable and an unstable set of sources, so if I broke something in unstable, stable was still available to fix it and recover.
Once I had a mostly functioning transpiler, I moved on to porting Number Duck to the new language. Because I had already addressed the memory management issues for the initial Clang-based approach, this translation was mostly straightforward, but it still took many months. Rather than attempting a "big bang" conversion, I opted for an incremental file-by-file approach: converting one file at a time to the new language, then transpiling it back to C++ to replace the original. I could then rebuild the library and run the tests with the new code, confirming it matched the original's functionality, before moving to the next file.
This way the library was always fully functional, so whenever something broke, the cause was always in a handful of files and easy to correct. Compare that to converting everything, only testing at the end, and wondering why it's crashing...
Having a large existing test suite for Number Duck proved invaluable during this process. Each transpiled file had to pass the same tests as the original C++ code, ensuring functional equivalence across the transition.
Once all files were converted to the custom language, I focused on optimizing the transpilation output for the first v3 release of Number Duck. This was mainly about amalgamating the 200 or so input files into one or two output files, so the library is easy to drop into an existing project. The amalgamated build also injects the cross-platform code, such as the abstraction classes mentioned earlier, as well as any third-party code, such as TinyXML2 for C++.
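For readers unfamiliar with amalgamation, the basic idea can be sketched in a few lines of C++: concatenate the sources in dependency order and drop the project-local `#include "..."` lines, since the included code is already part of the merged file. This is a simplified sketch under those assumptions, with hypothetical names (`SourceFile`, `Amalgamate`), and is not the actual Number Duck build step.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical in-memory representation of one input file.
struct SourceFile {
    std::string name;
    std::string contents;
};

// Merges the files, in the order given, into a single source text.
// Assumes the caller has already sorted them so dependencies come first.
std::string Amalgamate(const std::vector<SourceFile>& files) {
    std::ostringstream out;
    for (const SourceFile& file : files) {
        // Banner comment so the origin of each section stays traceable.
        out << "// ---- " << file.name << " ----\n";
        std::istringstream in(file.contents);
        std::string line;
        while (std::getline(in, line)) {
            // Skip project-local includes; system includes (<...>) are kept.
            if (line.rfind("#include \"", 0) == 0) {
                continue;
            }
            out << line << '\n';
        }
    }
    return out.str();
}

// Usage example: the local include of a.hpp disappears from the merged text.
inline void AmalgamateExample() {
    const std::vector<SourceFile> files = {
        {"a.hpp", "struct A {};\n"},
        {"b.cpp", "#include \"a.hpp\"\nA a;\n"},
    };
    const std::string merged = Amalgamate(files);
    assert(merged.find("#include \"a.hpp\"") == std::string::npos);
    assert(merged.find("struct A {};") != std::string::npos);
}
```

A real amalgamation step also has to deal with include guards, duplicate system includes, and injecting third-party sources, which this sketch leaves out.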
Overall it's still a bit of a mess, but it all works, and what remains is mostly aesthetic issues I can resolve in the future.
Like most programming tasks, I don't think this was really a technically difficult process. The idea is straightforward; it all comes down to how much one is willing to grind and suffer to see it through. Just make sure you have a lot of safety rails, like tests and build automation, to make sure nothing is broken, and you can do anything.
Initially I felt that this was not rewriting from scratch since I kept the source working the whole time, so Joel Spolsky wouldn't be able to mock me. But in hindsight, during the time spent focusing on this porting, no new releases were made, so the outcome was the same and Joel is free to mock me (plz do, I could use the publicity). To be fair, I did also have a day job, so my development speed was limited.
Most importantly, Number Duck is now multilingual, and has a somewhat straightforward path to extending to more languages in the future. And from here, each new feature will be implemented once in one language and automatically transpiled to all target platforms, keeping things consistent and saving me from maintaining parallel implementations.