How I Ported a Python Astronomy Library to Go with AI
AI-assisted port of Skyfield (Python) to a Go ephemeris library. Validation-first approach, golden tests, and bug hunts.
TL;DR
- AI-assisted port of Skyfield (Python) to a Go ephemeris library, validated against Skyfield across long time ranges with strict tolerances.
- First working version in about a day, then shipped and integrated into my pipeline shortly after.
A messy pipeline that worked, but kept irritating me
For a while, my astronomy/ephemeris workflow worked, but it was also clunky and took more effort than it deserved:
- I used Skyfield (Python) to generate astronomical outputs.
- I dumped huge CSVs (multiple GB) because they were easy to debug.
- Then I read those files in Go and ran my heavier downstream computations locally.
Every ~3 months, I had to rerun the whole thing (30–60 minutes on my machine). If I forgot, my outputs went stale, and I’d end up hacking around, caching, and “wait, is this the latest file?” confusion.
I should have used Parquet earlier, but I didn’t. CSV won because it was simple and easy to inspect.
This was a personal use case, so I didn’t put much effort into formalising it as a production-grade system.
The real goal: eliminate the Python middleman
What I wanted was simple: do the ephemeris computations end-to-end in Go.
I considered wrapping Skyfield behind an HTTP service, but that just moves the complexity around. I also looked at existing Go libraries, but I wasn’t confident in them. And Skyfield’s outputs already worked extremely well for my use case, so I didn’t feel pressure to switch unless I could keep the same correctness while simplifying the workflow.
So the idea that kept coming back was the hard one: port the parts of Skyfield I need into Go.
The problem: I’m not a hobbyist astronomer. I do have some foundational astronomy knowledge, but nowhere near “I can write an ephemeris library from scratch.”
Why AI changed the math of effort
I’d had the “port Skyfield to Go” thought earlier, too, and I’d even asked ChatGPT for a detailed plan. But it felt like a long project, so it kept getting deprioritised.
Then I experienced Claude Opus’s ability to debug a deep determinism issue in another project. It wasn’t just autocomplete. It could actually reason, stay coherent, and land fixes that would normally take me hours.
That brought the thought back in a slightly more reckless form:
If AI can help me debug something deep, can it help me port something I don’t fully understand?
So I treated this as a weekend experiment.
Step 1: a proof of concept (POC) with ruthless validation
I started small: I didn’t try to “port Skyfield.” I tried to port my Skyfield usage.
I had a huge advantage: I already had Python scripts and known-good outputs. That meant I could set up a basic rule:
- Every Go output must match Skyfield output within strict tolerances.
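That rule is small enough to sketch directly in Go. The reference value and tolerance below are made up for illustration; the real harness compared many quantities across many timestamps:

```go
package main

import (
	"fmt"
	"math"
)

// withinTol reports whether a computed value matches a reference
// value within an absolute tolerance.
func withinTol(got, want, tol float64) bool {
	return math.Abs(got-want) <= tol
}

func main() {
	// Hypothetical example: a distance in AU from the Go port,
	// checked against what Skyfield produced for the same instant.
	goResult := 1.0167421    // from the Go port
	skyfieldRef := 1.0167424 // dumped from the Python pipeline
	tol := 1e-6              // strict: never loosened to make tests pass

	if withinTol(goResult, skyfieldRef, tol) {
		fmt.Println("PASS")
	} else {
		fmt.Printf("FAIL: |%g - %g| > %g\n", goResult, skyfieldRef, tol)
	}
}
```

The important property is that the tolerance is fixed up front, before any Go code runs, so a failing comparison always means a bug hunt rather than a tolerance tweak.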
The first POC surprised me. Within a few hours, I had Go code that matched my Python pipeline’s numbers within tolerance.
And it was fast, just as I’d expected a Go port to be. That speed had nothing to do with AI and everything to do with Go.
At that point, I could’ve stopped. But I didn’t.
Step 2: turn the POC into a real library (tier-by-tier)
On day two, I decided to go from “my script works” to “this is a library I can keep using.”
I used a simple pattern to keep the project from collapsing under complexity:
- Use multiple AI tools to break the full library port down into a plan.
- Ask Claude to consolidate that into a single, coherent plan so I could spot gaps.
- Hand Claude small chunks to implement, nudging it as needed.
- Keep refining the plan, moving from simplest → hardest.
Claude helped structure the features into tiers, and I kept pushing one rule:
Don’t make tests pass by loosening tolerances.
The key technique: Validation-driven development
Unit tests matter, but I use them in a very practical way. I often ask AI to write unit tests so the assumptions are baked into the tests. The test suites can get so large that I sometimes skip reviewing parts of them, but they still help in a specific way.
When the codebase grows, tests make it easier to see whether old assumptions still hold when new code lands. They also make code review easier because they serve as a checklist for behaviour that used to be true.
One more caveat: AI can also write unit tests that neatly fit its own code, even when the underlying behaviour is wrong. That’s why I don’t treat AI-written unit tests as the final proof.
So I leaned hard on validation scripts. The AI suggested calling them “golden tests,” and I kept the name because it fit:
- Generate outputs from Skyfield.
- Generate outputs from Go.
- Compare across long spans of time (hundreds of years) and multiple bodies.
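A hedged sketch of what such a harness can look like in Go. Everything here is illustrative: the CSV layout, the `computeDistance` stand-in, and the tolerance are assumptions for the example, not goeph’s actual API:

```go
package main

import (
	"encoding/csv"
	"fmt"
	"math"
	"strconv"
	"strings"
)

// computeDistance stands in for the Go port's ephemeris call.
// A toy cosine model keeps the example self-contained and runnable.
func computeDistance(jd float64) float64 {
	return 1.0 + 0.017*math.Cos(2*math.Pi*(jd-2451545.0)/365.25)
}

// checkGolden compares Go outputs against Skyfield reference rows
// ("jd,distance" CSV) and reports the worst absolute error plus
// how many rows exceeded the tolerance.
func checkGolden(refCSV string, tol float64) (worst float64, failures int, err error) {
	rows, err := csv.NewReader(strings.NewReader(refCSV)).ReadAll()
	if err != nil {
		return 0, 0, err
	}
	for _, row := range rows[1:] { // skip header
		jd, _ := strconv.ParseFloat(row[0], 64)
		want, _ := strconv.ParseFloat(row[1], 64)
		diff := math.Abs(computeDistance(jd) - want)
		if diff > worst {
			worst = diff
		}
		if diff > tol {
			failures++
		}
	}
	return worst, failures, nil
}

func main() {
	// Reference rows as the Python side might have dumped them.
	ref := "jd,distance\n2451545.0,1.017\n2451727.625,0.983\n"
	worst, failures, err := checkGolden(ref, 1e-6)
	if err != nil {
		panic(err)
	}
	fmt.Printf("worst error %.3g, %d failures\n", worst, failures)
}
```

The real runs did the same thing at much larger scale: hundreds of years of timestamps, multiple bodies, multiple quantities per row.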
This is also how I avoided pretending I understood every equation. I didn’t.
I focused on:
- Architecture that stayed readable.
- Test harnesses that caught drift.
- Strict tolerances.
- Repeatable validation runs.
The hardest moment: where astronomy fought back
At one point, velocities/distances were way off. The easiest “AI move” was to raise tolerance until the test passed.
That’s exactly what I refused to allow.
Instead, I kept forcing the workflow back to:
- isolate the discrepancy
- find the math bug
- re-run validations
Claude eventually dug into it and found and fixed a matrix multiplication issue in the astronomy math path, one of those bugs you don’t see unless you have a reference implementation.
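The post doesn’t show the actual bug, but the class of mistake is easy to reproduce: 3×3 rotation matrices don’t commute, so multiplying them in the wrong order (or using a matrix where its transpose was needed) silently skews every coordinate. A toy Go illustration, not goeph’s code:

```go
package main

import (
	"fmt"
	"math"
)

type mat3 [3][3]float64

// mul returns the matrix product a*b.
func mul(a, b mat3) mat3 {
	var c mat3
	for i := 0; i < 3; i++ {
		for j := 0; j < 3; j++ {
			for k := 0; k < 3; k++ {
				c[i][j] += a[i][k] * b[k][j]
			}
		}
	}
	return c
}

// rotZ builds a rotation about the z-axis by a radians.
func rotZ(a float64) mat3 {
	return mat3{
		{math.Cos(a), -math.Sin(a), 0},
		{math.Sin(a), math.Cos(a), 0},
		{0, 0, 1},
	}
}

// rotX builds a rotation about the x-axis by a radians.
func rotX(a float64) mat3 {
	return mat3{
		{1, 0, 0},
		{0, math.Cos(a), -math.Sin(a)},
		{0, math.Sin(a), math.Cos(a)},
	}
}

func main() {
	a, b := rotZ(0.3), rotX(0.5)
	ab, ba := mul(a, b), mul(b, a)
	// The two orderings disagree; only a reference implementation
	// tells you which one your coordinate convention requires.
	fmt.Printf("ab[0][1]=%.6f  ba[0][1]=%.6f\n", ab[0][1], ba[0][1])
}
```

Both orderings produce plausible-looking numbers, which is exactly why this bug survives casual review and only dies against golden tests.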
Right around the same phase, the other big rabbit hole was nutation. Claude tracked down the issue and fixed a sign-and-convention mismatch. It took hours, and right when it seemed stuck and unable to proceed, it found the path through. After that, the remaining error fell within the tolerance I cared about.
I didn’t magically become an astronomer during this. But trying to answer the AI’s questions did push me to expand my understanding, enough to keep the project anchored.
The outcome: goeph
The result is goeph, a Go ephemeris toolkit validated against Skyfield.
Practically, it let me:
- stop generating giant CSV dumps
- keep the whole pipeline in Go
- rerun computations on demand without the “3-month batch job” feeling
And yes, I did ship it fast. The first working version came together within roughly 24–28 hours of having that reckless thought. This is the kind of project I would have delayed (and possibly never done) because it always loses to more urgent logistical priorities. The “full” tiered structure rolled out over about 3 days, including docs, cleanup, and release tagging. Within a few more hours, it was fully integrated into my pipeline.
All of that happened with a limited understanding of astronomy on my side. Will it come back to bite me later? Time will tell.
The uncomfortable truth: I didn’t line-by-line review the math code
This is the part some people will (fairly) criticise. As in my other AI-assisted projects, I didn’t sit and manually review every line of advanced astronomical computation.
And honestly, I couldn’t. I couldn’t confidently say which computations were correct and which weren’t, line by line. Verifying every corner against the Skyfield library would also have been painfully slow for me.
My prior experience with Opus 4.6 also mattered here. I repeatedly felt like I was becoming the blocker. Reviews are important, but in this case, I had strong validation checks that would surface any issues, and I had already seen the model’s deep debugging capabilities in action in another project.
But here’s what I did do:
- compare outputs against a trusted implementation
- keep tolerances tight
- validate across long time ranges
- treat tests as the real contract
In other words, I judged the system by its outputs, just like a consumer would.
What I learned (and what I’d do again)
1) Start with a small, measurable POC
Don’t port a library. Port your use case first.
2) Validation beats vibes
For complex code, “it looks right” is meaningless. Build a harness that proves correctness against a reference.
3) Planning matters more than you’d expect
Another project taught me something important: AI can get you moving fast, but it won’t automatically architect the system the way you would. Planning is how you keep the code from drifting.
AI can help with planning, too, but for implementation, it helps to keep nudging it and give it small parts to work with.
4) AI needs constraints more than it needs prompts
The moment you allow tolerance creep, you’re not validating, you’re negotiating.
5) Tier your plan
Simple to hard is not just project management advice. It’s how you keep the model from thrashing and keep your own head clear.
Links/references
- GoEph (Go): https://github.com/anupshinde/goeph
- Skyfield (Python): https://github.com/skyfielders/python-skyfield
If this reads a bit AI-assisted, you’re right. My workflow is to "speak" the post, let AI draft it, then I revise it.
I explain the process here: Write Faster with AI While Keeping It Yours.