We didn’t start the fire #50
From AI playing Pokémon to Python's copy shenanigans, multi-agent healthcare systems to Kubernetes CPU drama - it's just another wild day in tech where even DeepMind is crafting virtual playgrounds!
Agents playing Pokémon Red
Fireman P came across this article that looks at how three large language models—Claude 3.7, Gemini 2.5 Pro, and o3—perform when playing the classic game Pokémon Red. The study highlights major challenges faced by the models, including poor visual interpretation, lack of long-term memory, navigation difficulties, and frequent reasoning errors or "hallucinations." Despite scaffolding techniques like labeled maps and prompts, the models struggled to progress reliably, often getting lost or making incorrect assumptions. Overall, the experiment suggests that current LLMs are not yet capable of handling tasks that demand integrated visual processing, memory, and complex reasoning.
Read more here.
Agent values in the real world
This is really interesting! Anthropic's research paper presents a large-scale empirical study of the values expressed by their AI assistant, Claude, during real-world user interactions. Analyzing over 308,000 anonymized conversations from February 2025, the researchers identified 3,307 distinct AI values, organizing them into five primary categories: Practical, Epistemic, Social, Protective, and Personal. The study found that Claude frequently embodies prosocial values such as professionalism, clarity, and transparency, aligning with its training objectives to be helpful, honest, and harmless. However, in rare instances—particularly during attempted jailbreaks—Claude exhibited values contrary to its intended design, like dominance and amorality. The research also revealed that Claude's value expressions are context-dependent, often mirroring the values presented by users, which can enhance empathy but may also lead to unintended sycophancy. This work offers a novel framework for assessing AI alignment in real-world settings and provides an open dataset for further exploration of AI value expression.
Read more here.
Copy in Python
Fireman P came across an article that explains how copying works in Python. He assumed it would be a pretty straightforward topic, but apparently not!
When you assign a variable to another in Python, you’re not actually copying the object — you’re just creating another reference to the same object. This usually isn’t a big deal if you’re working with immutable types like integers or strings, because you can’t change them anyway. But when you’re dealing with mutable objects, like lists or dictionaries, this can cause a lot of problems you might not expect.
For example, if you have a list and you assign it to another variable, changing one will affect the other — because they are both pointing to the same underlying object.
Key point: Assignment doesn’t copy — it shares.
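A two-line example makes the sharing visible:

# Assignment binds a second name to the same list object.
original = [1, 2, 3]
alias = original

alias.append(4)

print(original)           # [1, 2, 3, 4]: both names point at one object
print(alias is original)  # True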
Now, if you do want an actual copy, Python gives you two main options through the copy module:

copy.copy(): Shallow copy
copy.deepcopy(): Deep copy
What’s the difference?
Shallow Copy (copy.copy()): Think of this like making a copy of the container itself (like the outer list or dict), but not the items inside. The new container points to the same inner objects as the original.
➔ So if you modify a nested item inside the copy, the original will still be affected, because they're sharing inner objects.
Deep Copy (copy.deepcopy()): This goes all the way down. It copies not just the outer container, but every single nested object inside — recursively.
➔ So if you change anything in the deep copy, the original stays completely untouched.
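To see the difference in action, here's a minimal example with a nested list:

import copy

original = [[1, 2], [3, 4]]
shallow = copy.copy(original)    # new outer list, same inner lists
deep = copy.deepcopy(original)   # new outer list and new inner lists

original[0].append(99)

print(shallow[0])  # [1, 2, 99]  (shares the inner lists)
print(deep[0])     # [1, 2]      (fully independent)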
The article also talks about ways you can copy objects without the copy module, like:

Using .copy() methods if a built-in type offers it (like list.copy(), dict.copy()).
Using slicing (e.g., new_list = old_list[:]) for sequences.
Using a constructor like list(old_list) or dict(old_dict) to build a fresh copy.
However, these methods usually only create shallow copies unless you specifically implement something deeper yourself.
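For instance, all three of those techniques produce equivalent shallow copies:

old_list = [[1], [2]]

via_method = old_list.copy()
via_slice = old_list[:]
via_constructor = list(old_list)

old_list[0].append(99)

# Every copy sees the mutation, because only the outer list was duplicated.
print(via_method[0])       # [1, 99]
print(via_slice[0])        # [1, 99]
print(via_constructor[0])  # [1, 99]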
Control how copying works:
If you’re creating your own custom classes, you might want to control how copying works.
You can define two special methods:
__copy__(self): To define shallow copy behavior.
__deepcopy__(self, memo): To define deep copy behavior.

The memo argument is a dictionary Python uses internally to keep track of objects it's already copied, to avoid infinite recursion.
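As a quick sketch (the Node class here is an invented example, not from the article), a class might implement both hooks like this:

import copy

class Node:
    def __init__(self, value, children=None):
        self.value = value
        self.children = children or []

    def __copy__(self):
        # Shallow: a new Node and a new children list, but the child objects are shared.
        return Node(self.value, list(self.children))

    def __deepcopy__(self, memo):
        # memo maps id(original) -> copy, so shared or cyclic objects are copied once.
        new = Node(copy.deepcopy(self.value, memo))
        memo[id(self)] = new
        new.children = [copy.deepcopy(child, memo) for child in self.children]
        return new

root = Node("root", [Node("child")])
clone = copy.deepcopy(root)
print(clone.children[0] is root.children[0])  # False: fully independent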
When you should care:
If your objects are simple (say, just numbers or small flat lists), you rarely need deepcopy. But as soon as you work with nested structures (like a dictionary of lists of dictionaries...), knowing the difference between shallow and deep copy becomes critical to avoid really sneaky bugs.
Read more here.
AI can now make interactable worlds
DeepMind's Genie 2 is a large-scale foundation world model designed to generate an endless variety of action-controllable, playable 3D environments for training and evaluating embodied agents. It builds upon the capabilities of its predecessor, Genie 1, which was limited to 2D environments.
Key features:
Prompt-Based World Generation: Genie 2 can create interactive 3D environments from a single prompt image, allowing both human and AI agents to engage using keyboard and mouse inputs.
Emergent Capabilities: The model exhibits advanced features such as object interactions, complex character animations, physics simulations, and the ability to model and predict the behavior of other agents.
Counterfactual Simulations: Genie 2 can generate diverse trajectories from the same starting frame, enabling the simulation of different outcomes based on varying actions.
Long-Term Consistency: The model maintains coherent environments for up to a minute, with most examples lasting between 10 and 20 seconds.
Genie 2 was trained on a large-scale video dataset, allowing it to simulate virtual worlds and the consequences of various actions (e.g., jumping, swimming).
Read more here.
Kubernetes CPU limits
Zepto’s engineering team shared how they migrated one of their critical Python services to Go to improve speed and scalability — and initially, everything seemed great. But as user traffic grew, they suddenly hit performance problems, even though CPU usage looked fine. The real culprit? Kubernetes was silently throttling their CPU because the Go runtime didn’t realize it was running inside a container with limits. As a result, the app kept trying to use more CPU than allowed, causing heavy context switching and lag. To fix it, they adjusted the CPU requests and limits properly, set the Go environment variable GOMAXPROCS to match the container’s CPU, and updated their monitoring to track CPU throttling directly. Their takeaway was clear: when running Go apps in Kubernetes, you have to be very deliberate about how your container resources are set and monitored — otherwise, invisible throttling can kill your app’s performance.
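As a rough illustration of that fix (the names and values below are invented, not Zepto's actual config), you can derive GOMAXPROCS from the container's CPU limit with the Kubernetes Downward API:

# Hypothetical container spec fragment for a Go service.
containers:
- name: app
  image: example/app:latest     # placeholder image
  resources:
    requests:
      cpu: "2"
    limits:
      cpu: "2"
  env:
  - name: GOMAXPROCS            # read by the Go runtime at startup
    valueFrom:
      resourceFieldRef:
        resource: limits.cpu    # exposed as a whole number of cores

An alternative is the go.uber.org/automaxprocs package, which sets GOMAXPROCS from the container's CPU quota at process startup.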
Read more here.
Multi-Agent System Lifecycle
Fireman A came across an article by Raga AI that talks about how building and scaling a multi-agent system (MAS) — basically, a system where different AI agents each handle specialized tasks — is much more complicated than it sounds. They walk through a five-stage process: building, evaluating, deploying, monitoring, and managing the lifecycle of these systems.
They use a healthcare example: imagine an AI system where one agent gathers patient symptoms, another schedules doctor appointments, another checks billing and insurance, and another orchestrates the conversation. Each agent has a very specific role, and it's important to define these roles clearly so that they don't step on each other’s toes.
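To make that role separation concrete, here's a toy sketch (the agent names and keyword routing are invented for illustration; the article doesn't prescribe an implementation):

# Toy multi-agent routing sketch: each agent owns one narrow responsibility,
# and an orchestrator decides which one handles each incoming message.

def symptom_agent(message: str) -> str:
    return f"Recording symptoms: {message}"

def scheduling_agent(message: str) -> str:
    return f"Booking an appointment for: {message}"

def billing_agent(message: str) -> str:
    return f"Checking billing/insurance for: {message}"

ROUTES = {
    "symptom": symptom_agent,
    "appointment": scheduling_agent,
    "billing": billing_agent,
}

def orchestrator(message: str) -> str:
    # Naive keyword routing; a real system would use an LLM or a classifier.
    for keyword, agent in ROUTES.items():
        if keyword in message.lower():
            return agent(message)
    return "Sorry, no agent is assigned to that request."

print(orchestrator("I need an appointment for my headache"))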
Once the agents are built, you can't just assume they'll work fine together — you have to test them interacting as a group. Because when you combine agents, unexpected or "emergent" behaviors can happen that you didn't predict just by looking at each agent alone.
Deployment means getting the system running in the real world, where it has to connect securely with existing databases (like hospital records) and scale up if lots of users come in at once.
Monitoring is about watching the system in real-time to make sure no weird behavior crops up and to catch errors early.
Finally, lifecycle management is about keeping the system healthy long-term: updating agents without breaking things, tracking versions, and preserving their "memory" or internal state across upgrades.
The big takeaway is that multi-agent systems sound powerful, but they need very thoughtful design, constant testing, strong monitoring, and careful maintenance if you want them to succeed outside of a lab setting.
Read more here.
Cheers to a great half a ton!