Welcome to HBH! If you have tried to register and didn't get a verification email, please using the following link to resend the verification email.

MySQL Subquery Question


ghost's Avatar
0 0

Alright, it's pretty slow right now, so I'll ask one of those questions I've had stored away for a while. It's a MySQL question involving multiple subqueries. The situation involves a series of point counts that are tracked for specific, individual competitions. The reasons why I would need multiple subqueries are as follows:

  1. Only the top 10 scores for each competitor count.
  2. The grand totals are created from the top 10 scores above.
  3. Only the top 5 competitors actually place in the overall grand standing.

So, at my first set of attempts doing this, it looked something like this (in pseudo-SQL):

SELECT name, id as target, totals as (SELECT SUM(points) FROM (SELECT points FROM pointtable WHERE id = target ORDER BY points DESC LIMIT 10)) ORDER BY totals DESC LIMIT 5

I know it's confusing, but that's about the best as I can explain it. If you need any more specific detail on any part of it, just post and I will clarify as best as I can. Basically, the problem boils down to this: I have two subqueries inside the original query and, in order to sum the points accurately, I need to pass the id field from the main query – down two levels – to the second subquery (in the WHERE statement).


bl4ckc4t's Avatar
Banned
0 0

Sounds like an If Then statement would be used here. Maybe one that tests one query, then if that one matches, checks the next field, etc?


ghost's Avatar
0 0

I already have it working with PHP doing most of the heavy work. Unfortunately, this proves to be quite the time-consuming script when doing it that way. The difference of looping through 4000+ names, adding their top 10 point totals, storing it in an array, arsorting the array, then doing it 4 more times for different categories… and just grabbing the top 5 scorers with the subqueries doing the hard work is IMMENSE. It literally cut the execution time to easily less than a quarter of what it was.

The only issue I'm having is getting the id from the main query to be used to match the second subquery's WHERE clause (where it pulls the top 10 point results for the names). Thanks for the input.


bl4ckc4t's Avatar
Banned
0 0

Zephyr_Pure wrote: I already have it working with PHP doing most of the heavy work. Unfortunately, this proves to be quite the time-consuming script when doing it that way. The difference of looping through 4000+ names, adding their top 10 point totals, storing it in an array, arsorting the array, then doing it 4 more times for different categories… and just grabbing the top 5 scorers with the subqueries doing the hard work is IMMENSE. It literally cut the execution time to easily less than a quarter of what it was.

The only issue I'm having is getting the id from the main query to be used to match the second subquery's WHERE clause (where it pulls the top 10 point results for the names). Thanks for the input.

Select name WHERE points > 1000 ORDER BY desc Could cut your time by another quarter probably, if my calculations are correct. Just cut out all of the lowest scores immediately by using a certain point value. This will speed up your checks, because any check that is less than the point value will immediately be skipped, unless of course this is already what you had in mind.

-Bl4ckC4t

[edit] I am not the greatest with SQL, so if it isnt possible, sorry for this post. lol[/edit]


ghost's Avatar
0 0

No, that's a damn good point, and one I didn't consider. That will speed it up some. Thanks for the advice. :)

Still, ideally, I'd be looking to put as much of the weight on MySQL as possible, and subqueries utilizing values from the main query (or other subqueries) would be at the core of this.


bl4ckc4t's Avatar
Banned
0 0

After reviewing http://www.w3schools.com/sql/sql_top.asp I know you want the top 5 points shown:

SELECT TOP 5 * FROM pointtable

It will only select the top 5 from your points table.

Is this what you want it to do? Or do you want it to be multiple queries? For some reason I can't seem to think exactly what you need it to do, but I am one who just tries to shorten anything and everything I can.

-Bl4ckC4t


ghost's Avatar
0 0

It's cool… thanks for trying. The MySQL LIMIT is the equivalent of the SQL TOP command. Let me describe the db structure more:

  1. The competition names and ids are stored in one table.
  2. The competitor names and ids are stored in another table.
  3. The points for each competitor, for each competition, are stored in the third table (as a single entry each competitor + competition pair), with ids referencing the above two tables.

So, basically, I'd need to query that last table and pull in the top 10 point results for each competitor, then sum them for each competitor and reverse sort the competitors to get the top point totals. Ideally, this would be done with only one subquery… with the subquery being:

SELECT SUM(points) FROM pointtable WHERE competitorId = id ORDER BY points DESC LIMIT 10

… Unfortunately, SUM doesn't work right when used with a LIMIT. It's a documented MySQL glitch. When using SUM with a LIMIT, it sums all of the values for that field in the table, not just the 10 you're limiting it to. So, the solution to that was to force the LIMIT to return the top ten totals, then SUM() them in an outer query. Thus, the main query and two subqueries.

Not many people are trying to rationalize / figure out this one. Figured it might intrigue more than just the two people that have attempted to come up with an efficient solution thus far. :)


bl4ckc4t's Avatar
Banned
0 0

Zephyr_Pure wrote: It's cool… thanks for trying. The MySQL LIMIT is the equivalent of the SQL TOP command. Let me describe the db structure more:

  1. The competition names and ids are stored in one table.
  2. The competitor names and ids are stored in another table.
  3. The points for each competitor, for each competition, are stored in the third table (as a single entry each competitor + competition pair), with ids referencing the above two tables.

So, basically, I'd need to query that last table and pull in the top 10 point results for each competitor, then sum them for each competitor and reverse sort the competitors to get the top point totals. Ideally, this would be done with only one subquery… with the subquery being:

SELECT SUM(points) FROM pointtable WHERE competitorId = id ORDER BY points DESC LIMIT 10

… Unfortunately, SUM doesn't work right when used with a LIMIT. It's a documented MySQL glitch. When using SUM with a LIMIT, it sums all of the values for that field in the table, not just the 10 you're limiting it to. So, the solution to that was to force the LIMIT to return the top ten totals, then SUM() them in an outer query. Thus, the main query and two subqueries.

Not many people are trying to rationalize / figure out this one. Figured it might intrigue more than just the two people that have attempted to come up with an efficient solution thus far. :)

Up for posting your code? I can look at it and see if I can get an idea of what you have, and maybe edit it to do what you would like for it to do.

It might also inspire more people to look and post, lol.


ghost's Avatar
0 0

No offense, but posting my code would not make sense for my particular question. Yes, I can probably optimize the PHP more from here, but putting the majority of the weight on the MySQL query would speed up the script more than that ever could. If I helps to clarify the methodology more, this pseudocode can be referenced:

foreach (competitor) {
   query the id for the competitor;
   query the top 10 point results for that competitor id;

   pointtotal = 0;
   foreach (point result) {
      pointtotal += point result;
   }

   add competitor name and point total to array;
}

sort the array in reverse (highest total at top);

I know I can save memory by adding a routine, right after the "add to array" bit, to reverse sort the current array for count = ideal total + 1, then dropping the last array element. That is irrelevant, though, since the main ideal is this: Instead of looping through 4000+ competitors to check the totals, it would make more sense to let MySQL sort the point sums and limit the results to the top few. A benchmark of the two methods, side by side, would easily suffice as reason.


bl4ckc4t's Avatar
Banned
0 0

Zephyr_Pure wrote: No offense, but posting my code would not make sense for my particular question.

None taken. Thanks for this code, I will toy with some queries on my home serv to see what I can come up with. However, is your data on tables in the same database? If this is the case, I may be able to optimize it.

-Bl4ckC4t


ghost's Avatar
0 0

bl4ckc4t wrote: However, is your data on tables in the same database? If this is the case, I may be able to optimize it.

-Bl4ckC4t Yes, the tables are all in the same database. Thanks for going to so much effort to help… it's much appreciated. I know it's not going to be an easy solution (as I found after much research and tinkering), but I know it is one that will reap a great deal of knowledge for those that indulge it. Again, the only issue left is getting the retrieved id from the main query to be passed down into the second subquery. If that can happen, I have the query for the rest. I'll post that as soon as I can locate it so that this will be closer to a solved inquiry.

Just for the record, this particular solution is already done and gone… I've been saving this question for about a month now. I like to liven up the forums when they slow down with questions that are… mind benders. :)


ghost's Avatar
0 0

im a sql noob, but do you need a second WHERE that is not in the sub query so you have the SELECT FROM and WHERE in the main query?

just a thought


ghost's Avatar
0 0

mambo wrote: im a sql noob, but do you need a second WHERE that is not in the sub query so you have the SELECT FROM and WHERE in the main query?

just a thought Nope. The subquery would pull the points for each competitor by the id, then the main query would merely order them descending and pull the top 10. The point totals would be associated with the competitors automatically through the iterations of the subquery. If I put a WHERE clause in the main query, it would only return the point total for a specific competitor or group of specified competitors. Nice try, though. :)


SySTeM's Avatar
-=[TheOutlaw]=-
20 0

Could you not just use JOINs? =/ I'm a bit unclear on what it is exactly you're attempting, but JOINs seem to do most things that require cross-table information :p


ghost's Avatar
0 0

I had a look at using a JOIN, but all I came up with was that the most basic MySQL JOIN would not work (tried it). Basically, what I'm trying to do is this, in a nutshell:

  1. Grab the top 10 point results for a single competitor (in any event).
  2. SUM() those.
  3. Tie them to each competitor and ORDER them by point totals (the SUM in #2) DESC LIMIT 5.

Guess I'll take another look at the INNER/OUTER/LEFT/RIGHT JOINs. Thanks for the suggestion. :)


ghost's Avatar
0 0

do you need to declare the tables outside of the subroutine?

thinking outload


ghost's Avatar
0 0

mambo wrote: do you need to declare the tables outside of the subroutine?

thinking outload Umm… what? :P

If you're referring to naming them using an inner join (i.e., tablename as name), I already tried that. The 2nd subquery (the one running the points limit 10 query) doesn't recognize the named table in the main query.