writing

my thoughts and blabberings.

Deep RL Diary: Implementing Policy Gradients on CartPole-v1 from Scratch

In reinforcement learning, policy gradient methods directly parameterize the policy and optimize it to maximize the expected cumulative reward. While mathematically elegant, policy gradient methods are notoriously sensitive to hyperparameters, prone to high-variance gradient estimates, and vulnera...

2026-06-26

CS234 Assignment 1, Part 1: Horizon, Discounting, and Reward Hacking

A few weeks ago I started working through Stanford's CS234 course on Reinforcement Learning. I have been meaning to get into RL properly for a while and decided that working through a real course with real problem sets was the best way to do it. This post covers my solutions to Problems 1 and 2 from...

2026-03-08

CS234 Lecture Notes: Foundations of RL and MDPs (Lectures 1 & 2)

I have been working through Stanford's CS234 course on Reinforcement Learning, taught by Professor Emma Brunskill, as part of building a solid theoretical foundation in RL. These are my notes from the first two lectures. Lecture 1 covers the framing of RL and builds up to Markov Reward Processes. Le...

2026-03-08

Translating RL Math into JAX: What I Learned Working Through 10 Problems

I've been trying to get more serious about reinforcement learning, not just the conceptual side but actually being able to implement things from papers. One thing I kept running into is the gap between reading an equation in a paper and knowing what to do with it in code. You see a summation with so...

2026-03-08

Building a Unix Shell Part 2: Background Processes and the C++ Migration

In Part 1 (https://boringblog.vercel.app/posts/building-a-unix-shell-a-deep-dive-into-process-management-part-1), we built a basic shell capable of executing commands using fork and exec. It worked, but it had a major limitation: it was strictly synchronous. You ran a command, the shell froze, and you ...

2025-12-30
