22 Sept 2009

MySQL2GoogleSpreadsheets

I've managed to find a way to connect MySQL directly to Google Spreadsheets and although that its not yet perfect, it does show a lot of potential.

You will have a MySQL table directly fed into Google Spreadsheets. From there, you could do some charts, highlights some trends or simply share the data as is with our people in our out of your organization is a secure way.

The end result should look something like this:


What you will need:
A Linux Server with Apache and MySQL
A Google Apps Premier account
Google Secure Data Connector installed on your Linux server


SDC
Installing the SDC is a bit tricky but not too difficult and there is a lot of documentation as to how to do it. A lot of it is giving things the right permissions and configuring 2-3 XML files.

You can read more about how SDC works, but from my impressions, it runs on your Linux server and gets data from apache. It then sends that data in a secure way into Google Spreadsheets. So when you query your data, the URL you use is "localhost" which is as if you are on the server.

Apache
Again, its important to note that everything that SDC gets has to go through Apache, which gives you the option of static data (in the form of CSV or XML files) or dynamic data (from PHP for example). It also means that you should secure Apache to not give the whole world this data. You can do this with OAuth (and there are a lot of instructions as to how to secure Apache with SDC in the SDC documentation) or you could try restricting Apache to know that it will only server 127.0.0.1 since SDC runs on your server. Either way, you need to factor that in.

MySQL
The question you must be asking now is, since SDC connects to Apache, how to I get data out of MySQL directly? (hopefully in the future SDC might be able to query MySQL directly)

In order to achieve that, I have used MySQL CSV Storage Engine, which basically means that MySQL will store all the data from the table into a CSV file on the hard disk (I hope that you are already imagining the possibilities here).
All you would need to do is update that table with an INSERT or UPDATE statement and the CSV file gets updated.

Symlink
Now we need to connect that CSV file to Apache. For that, we can use a Symbolic Link in Linux and connect the CSV into Apache by placing the link into the directory Apache reads files from (where ever your index.html file is, usually /var/www/html).
Now another thing to remember is that you need to give Apache rights to see the MySQL directory and be careful about doing that. An option could be to create a new database which would create a new directory on the hard disk and to let Apache see that directory. Do try not to let someone that might possible hack Apache to also hack MySQL, but I will leave that kind of thinking to the security analysts.

SDC XML configuration file
You need to add this symlink as a new rule in the SDC configuration file. You can see some examples there inside the file. When adding a new rule, you need to remember that SDC accesses Apache from the same machine, meaning your URLs should start with http://localhost/

Google Spreadsheets
Go into your Google Spreadsheets and go the to first cell (A1) and type:
=importData('http://localhost/[Your File Here]')
Hopefully, you should see the data from the MySQL CSV table.
If not, you might need to debug it some more.


Problems with SDC
The installation process wasn't as smooth as we would have liked but we did try something new at the time with MySQL. But apart from that, there are 2 problems we have noticed with SDC:
  1. Refresh rates
  2. High number of rows
SDC caches extremely aggressively and when we wanted to make changes to our CSV file, we still saw the old data which confused us through out the process.
A trick to help solve this is, if you goto Google Spreadsheets and in the importData function you change "http://localhost/sample.csv" to "http://localhost/sample.csv?a"
This will force Google to get the new data and you can change the "a" to whatever you like.

A high number of rows (I tried 31k) will get Google Spreadsheets stuck. And when I mean stuck, I don't mean that their server crashes or anything, just that you would wait a long long time to get any results. So you should just show the summarized data you wanted to show.


Conclusion

I am under the impression that this is a relatively new technology for Google and its not 100% tweaked and finalized (it was launched around the end of April 2009). I would assume that it would be good to invest time in structuring it now (meaning, you would be an early adopter), while Google catches up.

No comments:

Post a Comment