I'm not a DBA, and I don't play one on TV. However, it always seems like I'm the person on the team who ends up knowing the most about the database structure and the data stored therein. This is definitely one of those curse/blessing situations.
For almost a week, I've been slogging through a series of steps to clean up data, thousands of records in total. I'm finally almost done. I've used scripts that fix hundreds of records at a time and some that fix a single record at a time, and I wrote a nifty VB.NET DBUpdater console app to do some of the other cleanup.
It's the kind of work that deadens the neurons after a while, so yesterday I took a moment to sit outside, watch the red-winged blackbirds defend their territory against crows, and let my brain uncramp. While enjoying the spectacle of nature and relaxing a bit, this brilliant and disturbing thought occurred to me: applications dirty up databases.
OK, maybe that's not their sole purpose for existence, but looking at it from a data-centric point of view, it seems to be what they are best at. Databases are designed and initially populated with the most utopian of aspirations: pure, unadulterated data, tables living in harmony with one another, existing in a fantasy bubble where nothing bad ever happens and the data remains true to its initial design forever.
Bah! We know that's not how it stays. Pretty much from the moment we hook an application or data feed into our database, we're inviting junk to come live with us. In every project I've ever been involved in, data cleanup took up a noticeable chunk of time and attention. Why does this always happen? Are all applications so poorly designed? I don't think that's necessarily the case, though I'm sure it is sometimes. What really happens is that until we get our application and database out of the lab environment and into the wild, we don't really know what users are going to do. We try to anticipate every twist, but for very complex applications involving users of various skill levels, we are woefully underprepared for what is really going to end up in our data tables. So we do massive cleanup projects and look for ways to shore up our defenses against junk entering our databases, so we don't have to do those cleanup processes again.
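One common way to shore up those defenses is to let the database itself refuse junk, instead of trusting every application to behave. Here's a minimal sketch of that idea, not what I actually ran: it uses Python and an in-memory SQLite database purely for illustration, and the `customer` table, its columns, and the `try_insert` helper are all made up.

```python
import sqlite3

# Hypothetical schema for illustration only; none of these names come from a real project.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        id    INTEGER PRIMARY KEY,
        email TEXT NOT NULL UNIQUE CHECK (email LIKE '%_@_%'),
        age   INTEGER CHECK (age BETWEEN 0 AND 150)
    )
""")

def try_insert(email, age):
    """Attempt an insert; return True if the database accepted the row."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO customer (email, age) VALUES (?, ?)",
                (email, age),
            )
        return True
    except sqlite3.IntegrityError:
        # Raised for NOT NULL, UNIQUE, and CHECK violations alike.
        return False

results = [
    try_insert("ann@example.com", 34),  # clean row
    try_insert("ann@example.com", 35),  # duplicate email, violates UNIQUE
    try_insert("not-an-email", 40),     # no '@', violates CHECK
    try_insert(None, 20),               # missing email, violates NOT NULL
    try_insert("bob@example.com", -5),  # impossible age, violates CHECK
]
```

Only the first row gets in; the other four are turned away at the door. Constraints like these only catch the junk we thought to rule out ahead of time, of course.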
I'm not sure what the ultimate solution is for this problem. We can set up our little territory, like those red-winged blackbirds, and believe it is safe and secure, but crows from the outside world are going to intrude eventually.
Posted by buggy at May 6, 2008 01:15 PM