I'm trying to think of a reason this couldn't be done more directly with a pretty run-of-the-mill transpiler. Like I understand if this is a technical demo and there is a LOT of Fortran code, but...?
I've actually had to do this with a couple of different Fortran projects when I was in college, I translated them to C for various reasons.
Maybe it's because it was specifically code written by scientists (i.e. somewhat brute force and very straightforward), but there really weren't many features I can recall that didn't have a direct C counterpart, other than column-major ordering and arrays starting at 1.
Was I just blissfully unaware?
What you want isn't really "output C++ code that is pedantically equivalent to this Fortran code but with the array indexing fixed up"; it's usually more like "translate this Fortran module into something that I can offload to a GPU using CUDA/ROCm/etc. with the same high-level semantics, but GPU-friendly low-level optimizations", and the exact composition of those low-level bits probably doesn't look like a loop-by-loop translation.
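To make that concrete, here's a toy sketch of my own (an invented example, nothing from the article): a loop-by-loop translation versus the flat, independent-iteration shape you'd actually hand to a GPU.

    // Loop-by-loop translation of a Fortran nest: keeps the column-major
    // walk and the nesting exactly as written.
    //   DO J = 1, M
    //     DO I = 1, N
    //       A(I,J) = A(I,J) * S
    void scale_direct(double* a, int n, int m, double s) {
        for (int j = 0; j < m; ++j)
            for (int i = 0; i < n; ++i)
                a[j * n + i] *= s;
    }

    // GPU-friendly restructuring: one flat index space with no loop-carried
    // state. On a GPU this body becomes the kernel, with k computed from
    // blockIdx/threadIdx instead of a loop counter.
    void scale_flat(double* a, long total, double s) {
        for (long k = 0; k < total; ++k)
            a[k] *= s;
    }

Same high-level semantics, but only the second form maps cleanly onto a kernel launch, and it doesn't fall out of a mechanical loop-for-loop rewrite.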
CUDA supports Fortran; that's one of the reasons OpenCL lost.
Doesn't CUDA handle FORTRAN as such?
Yeah, I've used FORTRAN-to-C transpilers before too and they worked fine. There were some downsides, though, like having to add and subtract 1 everywhere to deal with FORTRAN's 1-based indexing.
In theory AI could do a more idiomatic translation, but I think I would still prefer the janky but correct translation over the looks nice but probably subtly buggy AI one.
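To be clear about what "janky but correct" looks like, here's an invented example of the mechanical style:

    // Fortran:
    //   DO 10 I = 1, N
    //  10   B(I) = A(I, J)
    //
    // f2c-style C: the 1-based loop is kept, and every subscript is
    // rewritten with explicit minus-ones and column-major arithmetic
    // on a flat 0-based buffer.
    void copy_column(const double* a, double* b, int n, int j, int lda) {
        for (int i = 1; i <= n; ++i)
            b[i - 1] = a[(j - 1) * lda + (i - 1)];
    }

Correct, but every array access drags that index bookkeeping around with it.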
I don’t know what the transpiled code would look like vs. that rendered by an LLM, but maybe the hope is that the latter will be more readable?
Yeah, but there's also an N% chance that it's silently wrong.
Microsoft demoed a version of their GraphRAG that translated C code to (I believe) mostly idiomatic Rust, and it ran without errors.
I tried to find reference to how they did it, does anyone know?
It sounds like this approach of translating old code could help speed up teams that are looking at rewrites. I also have some old code that's in Kotlin that I'd like to move to something else. I had a bad NullPointerException take down my application after missing a breaking change in a Kotlin update.
From the looks of [1] they have a graph DB storing the code structure and acting as the RAG for an LLM.
[1] https://microsoft.github.io/graphrag/
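If that's right, the retrieval side might look, very roughly, like this toy sketch (all names invented, not Microsoft's code): index the code as a call graph, then hand a function's neighborhood to the LLM as context alongside the translation prompt.

    #include <iostream>
    #include <map>
    #include <set>
    #include <string>
    #include <vector>

    // Toy call graph: caller -> callees.
    using Graph = std::map<std::string, std::vector<std::string>>;

    // Collect every function reachable within `depth` hops of `root`;
    // their definitions would become the retrieval context for the LLM.
    std::set<std::string> neighborhood(const Graph& g,
                                       const std::string& root, int depth) {
        std::set<std::string> seen{root};
        std::vector<std::string> frontier{root};
        for (int d = 0; d < depth; ++d) {
            std::vector<std::string> next;
            for (const auto& f : frontier) {
                auto it = g.find(f);
                if (it == g.end()) continue;
                for (const auto& callee : it->second)
                    if (seen.insert(callee).second)
                        next.push_back(callee);
            }
            frontier = std::move(next);
        }
        return seen;
    }

    int main() {
        Graph g{{"parse_input", {"read_line", "tokenize"}},
                {"tokenize", {"is_delim"}}};
        for (const auto& f : neighborhood(g, "parse_input", 2))
            std::cout << f << '\n';
    }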
In my experience such methods work really well for small/simple code snippets. They break down on large ones.
I do not believe it ran without errors in all cases.
[flagged]
I mentioned neither of the things you're shouting at me about.
> I also have some old code that's in Kotlin that I'd like to move to something else.
I left "something else" unspecified because I'm not even talking about Rust. My point was that I've seen this approach before, where a RAG is used to aid in translating C code, and it's an interesting technique which, with more examples, might become accessible to non-experts like me.
Translating languages is of great interest to various communities. I have friends stuck with a Scala codebase written by geniuses who are no longer around, and they want to move it to something else that the team is comfortable with.
My bad. Sorry. Didn't read the full comment and you didn't deserve the accusations from me.
Or he’s just mentioning the other major transpiler he’s heard of recently that happens to be C to Rust and wondering how it works and if it could be adapted to other language pairs. You’re the one that’s taken the conversation in a super weird direction all by your lonesome.
Good idea. I'd much rather write the Fortran and then let a compiler translate it into C++ if C++ is required for some reason.
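For instance (my own illustration, not from the thread): a single whole-array statement in Fortran versus the element-wise C++ a translator has to spell out.

    // Fortran: B = A + 1.0   (one whole-array statement)
    // The C++ a translator would have to produce, element by element:
    #include <cstddef>
    #include <vector>

    std::vector<double> add_one(const std::vector<double>& a) {
        std::vector<double> b(a.size());
        for (std::size_t i = 0; i < a.size(); ++i)
            b[i] = a[i] + 1.0;
        return b;
    }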
As a slight tangent, a re-write in another language is also an opportunity for the human engineer to re-design parts of the software that were clunky before, or so that the new target language's idioms can be used.
Using automatic tools - whether AI-based or transpilers - leaves that opportunity unused, and both approaches are likely to create some additional technical debt (errors in translation; odd, non-idiomatic ways of doing things introduced by the automation; etc.).
There is no place for AI or C++ in this game. Just use a Fortran-to-C transpiler. But I get it: anything AI sounds modern, and C++ because of reasons.
A member of our community accidentally discovered that the D compiler could translate C code to D code.
D has a module system, where you import a module and it pulls the global declarations out of it. To speed up this process, there's a compiler switch to output just the global declarations to another file, a .di file, that functions much like a .h file in C.
Then there came along ImportC, where a C lexer/parser was welded onto the D compiler logic.
aaaand it wasn't long before the switch was thrown to generate a .di file, and voila, the C code got translated to D!
This resulted in making it dramatically easier to use existing C headers and source files with D.
This is a great effort; I wonder how it compares to Fortran2Cpp:
https://github.com/HPC-Fortran2CPP/Fortran2Cpp
LLMs as translators for COBOL code to Java or Go should be attempted. And shut down the IBM mainframe rent-seeking business for good.
The soon-to-be-released GCC 15 will contain a COBOL frontend. Other non-mainframe COBOL compilers have also existed for a long time, both proprietary and FOSS.
Thus, availability of a compiler is but a small piece of the puzzle. The real problem is the spider web of dependencies on the mainframe environment, as the enterprise's business processes have become intertwined with the mainframe system over decades.
Which is why I think cross-compiling against other dependencies and porting to other languages is a better solution. Many of these dependencies could be hardware-specific. As long as the core business logic could be ported, it would be a win for everyone stuck in decades of vendor lock-in.
I think the point was you could do that in COBOL; the vendor lock-in won't go away just because you change language. It goes away when you decide to refactor the code toward vendor-agnostic solutions.
You're absolutely right that switching languages alone doesn't solve the problem. The real issue isn't COBOL itself but the deep entanglement of business logic with the mainframe ecosystem, things like CICS, IMS, and even the way data is stored and processed. But I still think there's a path forward, and I’ll share a thought experiment based on my experience working alongside colleagues who’ve spent years maintaining these systems.
I’ve seen firsthand how much frustration COBOL can cause. Many of my colleagues didn’t enjoy writing it, they stuck with it because it paid well, not because they loved the work. The language itself isn’t the hard part; it’s the decades of accumulated technical debt and the sheer complexity of the environment. Over time, these systems become so intertwined with business processes that untangling them feels impossible. But what if we approached it incrementally?
Imagine taking an existing COBOL codebase, say for a large insurance system, and identifying the core business logic buried within it. These are the rules and conditions that power critical operations, like calculating premiums or processing claims. Now, instead of trying to rewrite everything at once, you build a parallel backend in a modern language like Java or Go. You don’t aim for a literal translation of the COBOL code; you focus on replicating the functionality in a way that makes sense in a modern context. For example, replace hardcoded file operations with database calls, or screen-based interactions with REST APIs.
Most mainframe customers already use middleware like MuleSoft or IBM z/OS Connect to route requests to both systems simultaneously. For every write operation, you update both the mainframe’s DB2 database and a modern relational database like Postgres. For every read operation, you compare the results from both systems. If there’s a discrepancy, you flag it for investigation. Over time, as you handle more and more business scenarios, you’d start covering all the edge cases. This dual-system approach lets you validate the new backend without risking critical operations.
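The core of that dual-run pattern is small. Here's a rough sketch with invented interfaces; in practice the middleware layer does this, not hand-written code:

    #include <functional>
    #include <iostream>
    #include <string>
    #include <utility>

    // Stand-in for either backend: the mainframe or the new system.
    struct Backend {
        std::function<void(const std::string&, const std::string&)> write;
        std::function<std::string(const std::string&)> read;
    };

    class DualRunner {
        Backend legacy_, modern_;
    public:
        DualRunner(Backend legacy, Backend modern)
            : legacy_(std::move(legacy)), modern_(std::move(modern)) {}

        void write(const std::string& key, const std::string& value) {
            legacy_.write(key, value);   // mainframe stays the system of record
            modern_.write(key, value);   // shadow write to the new backend
        }

        std::string read(const std::string& key) {
            auto expected = legacy_.read(key);
            if (auto actual = modern_.read(key); actual != expected)
                std::cerr << "discrepancy on " << key << ": '" << expected
                          << "' vs '" << actual << "'\n"; // flag for review
            return expected;             // callers always see legacy's answer
        }
    };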
Of course, this process isn’t without its struggles. Testing is a huge challenge because mainframe systems often rely on implicit behaviors that aren’t documented anywhere. My colleagues used to joke that the only way to understand some parts of the system was to run it and see what happened. That’s why rigorous testing and monitoring are essential: you need to catch discrepancies early, before they cause problems. There’s also the cultural side of things. People get attached to their mainframes, especially when they’ve been running reliably for decades. Convincing stakeholders to invest in a multi-year migration effort requires strong leadership and a clear case for ROI.
But I think the effort is worth it. Moving off the mainframe isn’t just about saving money, though that’s a big part of it. It’s about future-proofing your organization. Mainframes are great at what they do, but they’re also a bottleneck when it comes to innovation. Want to integrate with a third-party service? Good luck. Need to hire new developers? Most of them have never touched COBOL. By transitioning to a modern platform, you open up opportunities to innovate faster, integrate with other systems more easily, and attract talent who can actually work on your codebase.
In the end, this isn’t a quick fix; it’s a long-term strategy. But I believe it’s achievable if you take it step by step. Start small, validate constantly, and gradually build up to a full replacement. What do others think? Are there better ways to tackle this problem, or am I missing something obvious?
I don't think you're missing anything fundamental, but I've worked with systems written in Fortran, C, C++ and Python that have the same problems. I suspect the systems I'm working on in Python & Rust will have the same issues if they last 10+ years.
No, not for the foreseeable future. In fact, this is the absolute hardest possible code translation task you can give an LLM.
COBOL varies greatly; the dialect depends on the mainframe. Chatbots will get quite confused about this. AI training data doesn't have much true COBOL; the internet is polluted with GnuCOBOL, which is a mishmash of a bunch of different dialects, minus all the things that make a mainframe a mainframe. So it will assume the COBOL code is more modern than it is. In terms of generating COBOL (e.g. for adding some debugging code to an existing system to analyze its behavior), it won't be able to stay within the 80-column limit due to tokenization; it will just be riddled with syntax errors.
Data matters, and mainframes have a rather specific way they store and retrieve data. Just operating the mainframe to get the data out of an old system and into a new database in a workable & well-architected format will be its own chore.
Finally, the reason these systems haven't been ported is that the requirements for how the system needs to work are tight. The COBOL holdouts are exclusively financial, government, and healthcare -- no one is stuck on old mainframes for any other reason. The new system needs to exactly match the behavior of the old one, and the developer has to know how to figure out the exact confines of the laws and regulations, or they are not qualified to do the porting. All an LLM will do is hallucinate a new set of requirements and ignore the old ones. And aside from knowing the requirements on paper, you'd need to spend a good chunk of time just checking what the existing system is even doing, because there will be plenty of surprises in such an old system.
There have been COBOL compilers that target the JVM and .NET for as long as those technologies have existed.
There are also modern compilers targeting IBM mainframes, including Go, C++, Java, PHP, ...
Also, outside the DevOps and CNCF application space, very few people bother with Go, especially not the kind of customers that buy IBM mainframes.
Apart from that, COBOL is only part of the reason for running on a mainframe. The other part is the orchestration and "resilience" of the mainframe platform.
You can run COBOL on x86; there are at least two compilers.
Resilience == redundancy, and it has been successfully replicated by almost every organisation without mainframes. M-MANGA (Meta, Microsoft, Apple, Netflix, Google, Amazon) infrastructure is quite resilient.
> it has been successfully replicated by almost every organisation without mainframes
yeah but how much did it cost?
At a large financial news company I worked for, the AWS opex was £4m, all to put text on a page. You know, a solved problem. We spent years fucking about with Fleet, and then early, shitty k8s, to make something "resilient".
The opex for FAANG is astronomical. Facebook spends something like 40 billion a year on infra, and that's without staffing costs. (Manifold, the S3 clone, is a bastard to use.)
My point, which I should make more obvious, is: if you buy the right mainframe, you can literally blow one of them up and not lose any uptime. Yes, it's expensive.
But hiring a team to fuck about with k8s on AWS is going to cost more, and never be finished, because true redundant systems are hard to design and build, even for distributed systems experts.
> Translate Fortran to C++ with AI and RAG
f2c? But yeah, one level of abstraction sucks. We need around 10 to be satisfied.
f2c produces pretty sketchy C code. It's very easy for reasonable thread-safe Fortran code to go through f2c and end up as C code with globals and other thread-unsafe constructs. You have to be prepared to completely rewrite the generated C code to make it usable, possibly more unpleasantly than just doing the port by hand.
What is the point of this? Fortran is both faster than C++ and easier to write than C++. It's also by no means a dead or dying language. Smells like literally "your scientists were so busy they forgot to ask why".
Seems pretty obvious to me, and I’ve written my fair share of both Fortran and C++. I think it is mostly that very few people know Fortran anymore and even fewer people want to maintain it. A vast number of people in 2025 will happily work in C++ and are skilled at it.
Fortran also hasn’t been faster than C++ for a very long time. This was demonstrable even back when I worked in HPC, and Fortran can be quite a bit worse for some useful types of performance engineering. The only reason we continued to use it was that a lot of legacy numerics code was written in it. Almost all new code was written in C++ because it was easier to maintain. I actually worked in Fortran before I worked in HPC; it was already dying in HPC by the time I got there. Nothing has changed in the interim. If anything, C++ is a much stronger language today than it was back then.
Fortran is still quite modern despite its age, and relevant enough that not only is the availability of Fortran in the CUDA SDK one of CUDA's success factors, the LLVM project has also started a Fortran frontend.
It also seems to me that people who enjoy Fortran in HPC are more likely to change to Chapel than to C++.
> Almost all new code was written in C++ because it was easier to maintain.
What makes you say so? See musicale's comment above. I have a hard time seeing C++ as easier to maintain, if we are just talking about the language. The ecosystem is a different story.
For pure number crunching, Fortran is nicer. Unfortunately, performance for most codes became about optimizing memory bandwidth utilization at some point in the 2000s, and systems languages are much more effective at expressing that in a sane way. It was easier to make C/C++ do numerics code than to make Fortran do systems code. Some popular HPC workloads were also quite light on numerics code generally, being more about parallelization of data processing.
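Concretely, the kind of layout control I mean (an illustrative sketch, not from any real code): the same particle update over an array-of-structs versus a struct-of-arrays, where the SoA form streams each field contiguously, which is what saturating memory bandwidth wants.

    #include <cstddef>
    #include <vector>

    struct ParticleAoS { double x, y, z, vx; };

    void push_aos(std::vector<ParticleAoS>& ps, double dt) {
        for (auto& p : ps)
            p.x += dt * p.vx;            // 32-byte stride between the fields we touch
    }

    struct ParticlesSoA {
        std::vector<double> x, y, z, vx; // each field contiguous in memory
    };

    void push_soa(ParticlesSoA& ps, double dt) {
        for (std::size_t i = 0; i < ps.x.size(); ++i)
            ps.x[i] += dt * ps.vx[i];    // unit stride, vectorizes cleanly
    }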
This was before modern C++ existed, so most of the code was “C with classes” style C++. If you can read C then you can read that code. I don’t consider that to be particularly maintainable by modern standards but neither is Fortran.
Modern C++ dialects, on the other hand, are much more maintainable than either. Better results with a fraction of the code. The article doesn’t say but I would expect at least idiomatic C++11, if not later dialects.
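To illustrate the dialect gap with a toy example of my own:

    #include <numeric>
    #include <vector>

    // "C with classes" era: manual loop, raw indexing.
    double ssq_old(const std::vector<double>& v, double mean) {
        double s = 0.0;
        for (std::size_t i = 0; i < v.size(); i++) {
            double d = v[i] - mean;
            s = s + d * d;
        }
        return s;
    }

    // C++11 and later: the same sum of squared residuals as one expression.
    double ssq_new(const std::vector<double>& v, double mean) {
        return std::accumulate(v.begin(), v.end(), 0.0,
            [mean](double s, double x) { double d = x - mean; return s + d * d; });
    }

The second version says what is computed rather than how to loop over it, and that difference compounds across a large codebase.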
Some people at LANL seem to be on a holy crusade to replace Fortran with C++. They occasionally produce stuff like papers saying Fortran is dying and whatever. Perhaps it makes sense for their in-house applications and libraries, but one shouldn't read too much into it outside their own bubble.
I wonder if they feel that the toolchains are just rotting.
Unlike C++, they even got a standard package manager, fpm [1].
[1] https://fpm.fortran-lang.org/
It is as standard as vcpkg and conan.
ISO Fortran does not acknowledge the existence of fpm; just as with any programming language's ISO standard, the ecosystem is not part of the standard.
But they're not:
https://github.com/flang-compiler/flang
https://flang.llvm.org/docs/
This. If someone can't correctly articulate the advantages of Fortran they shouldn't be migrating away from it. This is not to say that migrations should never happen.
Chesterton's Fence