26 Mar 2008

Yet Another Idea for GSoC - New Datatypes

Well, as most of you may know, Google Summer of Code is nearing and I have seen many
MySQL blogs urging people to contribute ideas or mentor projects.

My Idea
So I'm going to suggest an idea and as the title already mentions, I think it would be a good idea to add some new Datatypes. Specifically, a new Email datatype.

Now I already hear you saying "Email is a string, why would you need a datatype for it?", but hear me out.

So basically, I personally, have been dealing a lot with trying to clean email addresses and customer data so that my company can help their biggest client use this data in their new CRM system.

I thought it would have been a good idea to implement an email datatype inside MySQL that already has the standard regex for emails and has a defined varchar for the standard of emails.
Later, I became aware that there can be different types of emails addresses.
For example, you can have the standard latin a-z 0-9 with . and _ in them.
You can have an email with the european a u o with the 2 dots above them.
You can also have emails with characters like / in them for example 24/7@domain.com (someone that posted on my blog mentioned that one).

To solve this, you could have an Email datatype like this:

The Benefits
So what would be the benefits of using this kind of datatype?

  • Its out of the box solution for a common problem.
    • MySQL is used by many many websites and most of them record email addresses, so its relevant to them.
    • You wouldn't need to develop something yourself, go look at MySQL forge for snippets, download and use any libarys, even bother to read about it. Its just there provided for you in the database.
  • It helps improve data quality. After you have many tens of thousands of emails, it would have been nice to have more data quality. Thats not to say that you can develop some solution on the front end with Javascript for example, but it would still be nice to just have it in the database.
  • You wouldn't need to write webinars telling people how to optimize or shrink their email field in the database. Maybe even you can store it better on the harddisk for queries if you know it will be used for email address, who knows.

The downside would be, that you will get a database error if you didn't implement any checks when the user is entering their email address on their website. For this you might want to make your email field a string and worry about it later. But I assure you, later will be for more annoying.

More Options
Now, you can also say "Well, you can also have a datatype for phone". Well, you can.
Maybe you can have a phone datatype like this:

Depends what you want, but I'd like to start with email addresses. Maybe you can throw in there ip(raw), ip(full)...
I saw a webiner from Jay Pipes writing about that to improve query lookup speed specifically for ip addresses.

I think it will be useful for many MySQL users to have an out-of-the-box solution for a very generic problem in the form of new datatypes that come with their own validation rules.

Please write your thoughts in the comments. Your opinion is valued.

Thank you for reading my blog.