15 Jan 2012

Query-Digest on Windows

The best way to run pt-query-digest is by making absolutely sure it is not running on any server that is even slightly important to production.

I recently had a case where pt-query-digest took up all the memory and swap on a (linux) development server. I think the reason was that the slow log had blob data in it.
Even though it wasn't a production server, it managed to upset some people.

So I:

  1. Installed strawberry perl on my windows computer
  2. Downloaded the gzip'd slow log file (using WinSCP)
  3. Unzip'd it
  4. Saved percon.com/get/pt-query-digest to a .pl file on my laptop
  5. Went to start menu
  6. In the run part I typed cmd
  7. In the new command window, I typed pt-query-digest.pl slow.log > digest.txt


And walla!
My laptop was a bit hot for a few hours, but nothing crashed and no one was upset.

27 Dec 2011

Explaining Indexes with a Library Metaphor - Reloaded

I wanted to build on the metaphor I used to explain indexes and continue a bit further into disk and memory usage.


Sorting without index cards

Let us say that we would like to get a list of all books written by J.R. Hartley and we would like this list ordered by the most recently published books.
What we would do is:

  • Enter the library
  • Speak to our trusty librarian about the list that we need
  • The librarian would consult his or her index cards and would then give us a list of where all those books are on the shelves in the library. 
  • We then head over to find those books in the different positions in the library.
  • Once located, we can do one of two things in order to sort the books:
    1. We can get the information we need from the books and simply remember the list and sort it in our heads (in memory)
    2. We can take all the books over to an available desk (temporary table on disk) then take a pad and pencil and write down that information on it. After that is done, we can sort that list on the pad and produce the list we want.

As you can imagine, doing calculations in memory would be much faster than sitting down next to a table and writing it down. The reasons why we would prefer to write it down would be:

  1. The list of items would be too long to keep in our memory.
  2. We may need to keep a text that is or may be too long  to hold in our memory (such as text or blob datatype).




Sorting with index cards 

Let us suppose that the piece of information we need is kept within the librarian's index cards.
What we would then do is:

  • Enter the library
  • Ask for the index cards from the librarian
  • Place it on his or her counter and change their positions until they are in the order that we want. 
  • Leave the library with the ordered list that we wanted

As you can imagine, using indexes to sort the data can be much faster than the alternative.


How can we use indexes to speed up our sorting?

One example of where you can use indexes to speed up your sorting is by using a multi-column index in your table structure. The index you set should first cover the search condition and then cover the sorting requirement.
For example:
SELECT * FROM table1 WHERE a = 100 ORDER BY b

Your index should be idx_sort(a,b)


Try it out on your own system and see if you notice any improvements!

27 Nov 2011

Get the 95% for Your Index Prefix

I was playing around with an idea recently...
I wanted to find out, what is the lowest number of characters needed to satisfy 95% of the values in a column? 95% is to rule out outliers.

I plan on using this when I want to get a bit agressive with the indexes on a table that gets inserted to very often. But until now, I haven't had a good query to find it quickly.

So, I thought a bit and came up with the following query:

mysql> show create table filenames\G
*************************** 1. row ***************************
       Table: show_filename
Create Table: CREATE TABLE `filenames` (
  `id` int(11) NOT NULL,
  `filename` varchar(255) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `filename` (`filename`),
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

mysql> select min(cc.l) as ninty_five_percent (
select length(filename) as l, count(distinct filename) as c, @ccount :=  @ccount + count(distinct filename) as t from filenames, (select @ccount := 0) as foo group by length(filename)) as cc where cc.t >= (@ccount*0.95) order by cc.t;
+---------------------+
|  ninty_five_percent    |
+---------------------+
|                  48         |
+---------------------+
1 row in set (34.67 sec)

I now can run:

ALTER TABLE filenames DROP INDEX `filename`, ADD INDEX `filename`(`filename`(48));

95% may not be the ideal percentage to use, but it probably will reduce the size of outlier values.
It can take me a while to get a good percentage that reduces the index size, reduce the INSERT/UPDATE overhead, but still give good response time to SELECT queries.
I'm going to try this query to help me get there.


Thanks to Shlomi Noach for his talk in Percona London that helped inspire this query.