LLM benchmarks like SWE-bench are not trustworthy
4 minute read
If you believe OpenAI’s marketing, their LLM products are automating an increasingly large fraction of software engineering jobs. They substantiate this, in part, by citing how their products perform against various LLM benchmarks.
Predictions for 2025
4 minute read
Following up on my predictions for 2024, here’s a bunch of predictions for this year. As before, all dollar values are inflation-adjusted for the start of 2025.
Following up on my 2024 predictions
5 minute read
Here’s how I did with my 2024 predictions.
Below are some notes I took during a town hall organized by Kristen Gonzalez, on the topic of a proposal for a casino in Midtown East (38th to 41st St, between First Ave and FDR).
Predictions for 2024
2 minute read
Here’s a bunch of predictions for next year, along with rough probabilities. All dollar values are inflation-adjusted for the start of 2024.
Per a KFF report, the US government spent less than 1.3% of Medicare expenses on overhead in 2021 – the remaining 98.
Debugging stories: the inconsistent database
3 minute read
Recently at work I ran across a bug which I thought was kind of interesting, so I figured I’d write it up.
Setting this blog up on GitHub Pages
6 minute read
As a birthday present to myself this year, I finally decided to bite the bullet and quit my Twitter addiction. I signed up for Mastodon, which seems nice so far (you can find me here).
Catching Us Up to 2018
4 minute read
Well, it’s been a year since my last blog post, which caught us up to 2017. Let’s bring us up to speed again, shall we?
Catching Us Up to 2017
5 minute read
It’s been a long time since my last real update in July 2014. What’s happened since then? Here are the highlights, month by month.