Refactor Legacy Code: Reduce DataBags

What is a DataBag? I consider any object that primarily acts as a transport for data to be a databag.
In C++, this would be a struct. Setting aside any associated functions, the data is exposed and passed around. In Java, a databag is a class made up primarily of get/set methods wrapped around its data. In C#, it's a class that is almost entirely properties.
C# is upping its game to make databags even easier to create, with records in C# 9. This is a horrible idea, and it looks lifted straight from Kotlin's data classes.
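For illustration, here is a minimal sketch of the two shapes a C# databag usually takes: a class that is nothing but auto-properties, and a positional record (C# 9 or later). `UserData`, `UserRecord`, and `Greeting` are hypothetical names. Note where the behavior ends up: in a separate class, reaching in through the getters.

```csharp
// A classic C# databag: nothing but public get/set properties.
public class UserData
{
    public string Name { get; set; } = "";
    public string PhoneNumber { get; set; } = "";
}

// The same databag as a positional record (C# 9+): one line, public getters included.
public record UserRecord(string Name, string PhoneNumber);

public static class Greeting
{
    // Behavior lives out here, away from the data it reads through the getters.
    public static string For(UserData user) => $"Hello, {user.Name}";
    public static string For(UserRecord user) => $"Hello, {user.Name}";
}
```

The record is the one-liner that makes the pattern so tempting.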

Making it easier to create databags makes it harder to encapsulate behavior.

DataBags are one of the things that kill a system.

I want to emphasize this point: if you use databags in your code, you are killing the product.

DataBags kill the product.

"But all code has databags." Yeah, and we have to slog through the crap they produce every time we make a change.

The best thing you can do for the code you write: never use getters. I talk about this point a lot, because it's the biggest and most impactful thing you can do for your code. No getters.
Since that's been said plenty, let's move on. Don't write them.

What do we do with systems that already have databags? As the title says, this post is about legacy code. We have to work in this code and try to make it better, and we can do that by removing databags.

I'm currently on a legacy codebase (only two years old), and it's legacy and hard to work in because it uses databags. They tangle all sorts of areas of the code together.

Databags are threads that get tied into every area of the code they touch. There's no clean refactor of a databag. It's a giant ball of ugly spaghetti.

There are ways out, though. They tend to be slow and methodical. I'll use C# terms here, since that's what I'm currently involved in.

Step 1: Get rid of the Getter.

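Make the getter private so that the code breaks. Every compile error marks a place outside the class that reads the data, and each one is a piece of behavior waiting to be moved.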

class User{
    public string PhoneNumber { private get; set;}
}
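A minimal sketch of what Step 1 does in practice, using a hypothetical `Signup` caller: writes through the public setter still compile, while any read of `PhoneNumber` from outside `User` becomes a compile error. Those errors are the to-do list for the next steps.

```csharp
public class User
{
    // Step 1: the getter is private; the setter stays public for now.
    public string PhoneNumber { private get; set; } = "";
}

public static class Signup
{
    public static User Create(string phone)
    {
        // Still compiles: writing through the public setter is allowed.
        return new User { PhoneNumber = phone };
    }

    // No longer compiles once the getter is private. Each error like this
    // marks a piece of behavior to isolate and move (Steps 2 and 3):
    // public static string Phone(User user) => user.PhoneNumber;
}
```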

Step 2: Find the Behavior

Look at the code that uses the getter and isolate what's actually interacting with the data by extracting it into its own method.

public void DoStuff(User user){
    string number = user.PhoneNumber;
    //bunch of code
    number = /* bunch of code to format a phone number */;
    //bunch of code    
}

into

public void DoStuff(User user){
    //bunch of code
    string number = FormatPhoneNumber(user);
    //bunch of code    
}
private string FormatPhoneNumber(User user){
    string number = user.PhoneNumber;
    number = /* bunch of code to format a phone number */;
    return number;
}

Step 3: Move the behavior

We don't want our isolated behavior to live far away from our data; that's how the same logic ends up repeated in a whole bunch of places. Move the behavior to the data.

class User{
    public string PhoneNumber { private get; set;}
    
    public string FormatPhoneNumber(){
        string number = PhoneNumber;
        number = /* bunch of code to format a phone number */;
        return number;
    }
}

public void DoStuff(User user){
    //bunch of code
    string number = user.FormatPhoneNumber();
    //bunch of code    
}
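Here is a compilable sketch of the Step 3 end state. The formatting itself is a placeholder above; this version assumes, purely for illustration, that numbers are stored as 10 digits and formatted as `(555) 123-4567`. `ReportWriter` is a hypothetical stand-in for whatever class owns `DoStuff`, and `DoStuff` returns the number here only so the result can be checked.

```csharp
public class User
{
    public string PhoneNumber { private get; set; } = "";

    // The behavior now lives next to the data it uses.
    // Assumption for illustration: a 10-digit number becomes "(555) 123-4567";
    // anything else is returned unchanged.
    public string FormatPhoneNumber()
    {
        string number = PhoneNumber;
        if (number.Length != 10)
        {
            return number;
        }
        return $"({number.Substring(0, 3)}) {number.Substring(3, 3)}-{number.Substring(6)}";
    }
}

public class ReportWriter
{
    public string DoStuff(User user)
    {
        // ...bunch of code...
        string number = user.FormatPhoneNumber();
        // ...bunch of code...
        return number;
    }
}
```

The caller no longer knows how a phone number is stored; it only asks the `User` for a formatted one.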

Step 4: Repeat.

Move every use of the getter into the class. It doesn't matter how many methods you end up with; they're all co-located. Similarities become easier to see, and refactoring and de-duplication can start to happen.
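A sketch of where repeating the steps tends to lead, assuming a second hypothetical caller that checked the area code. Once both usages sit inside `User`, the digit cleanup they each did on their own is easy to spot and pull into one private helper. `IsInAreaCode` and `Digits` are illustrative names, and the 10-digit format is the same assumption as before.

```csharp
using System;
using System.Linq;

public class User
{
    public string PhoneNumber { private get; set; } = "";

    // Moved in from the formatting caller (Step 3).
    public string FormatPhoneNumber()
    {
        string d = Digits();
        return d.Length == 10
            ? $"({d.Substring(0, 3)}) {d.Substring(3, 3)}-{d.Substring(6)}"
            : PhoneNumber;
    }

    // Moved in from a second, hypothetical caller (Step 4, repeated).
    public bool IsInAreaCode(string areaCode) =>
        Digits().StartsWith(areaCode, StringComparison.Ordinal);

    // De-duplicated: both behaviors used to strip punctuation separately.
    private string Digits() => new string(PhoneNumber.Where(char.IsDigit).ToArray());
}
```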

End

Get rid of Getters and your code will be simpler.

That's what I spend a good bit of my time doing: pulling all data usages into the object so it exposes ONLY behavior.

These are a few super simple steps, and they keep all the tests passing. That's why the setter gets to stick around for now. Removing setters involves a little more complexity, but the same idea applies.
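One way the same idea might apply to setters, sketched under assumptions: the number is supplied through a constructor, and later changes go through a behavior-named method where rules about the data can live. `ChangePhoneNumber` and its "must not be blank" rule are hypothetical, and the formatting reuses the illustrative 10-digit assumption.

```csharp
using System;

public class User
{
    // No public setter: the data is supplied once, up front.
    private string _phoneNumber;

    public User(string phoneNumber)
    {
        _phoneNumber = phoneNumber;
    }

    // Hypothetical behavior-named replacement for the setter;
    // the rule about valid input lives with the data.
    public void ChangePhoneNumber(string newNumber)
    {
        if (string.IsNullOrWhiteSpace(newNumber))
        {
            throw new ArgumentException("A phone number is required.", nameof(newNumber));
        }
        _phoneNumber = newNumber;
    }

    // Behavior moved in earlier (Step 3).
    public string FormatPhoneNumber() =>
        _phoneNumber.Length == 10
            ? $"({_phoneNumber.Substring(0, 3)}) {_phoneNumber.Substring(3, 3)}-{_phoneNumber.Substring(6)}"
            : _phoneNumber;
}
```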

Using these simple steps to ditch the getter removes complexity from the system.