Announcement

#1 2023-10-05 22:12:09

redtails
Member
2022-10-18
1

Large gallery, slow mysql

Hello, hope that you're doing well!

I have a large private gallery (NSFW, won't link), with around 1 million images in it spread over ~500 albums. It's essentially my web-portal to my gallery-dl folder.
It runs surprisingly smooth, but certain queries are extremely slow, and adding tags to every image basically make the site unusable.

Example of particularly slow queries:

"display a calendar by creation date"

Code:

SELECT DISTINCT id  FROM piwigo_images
INNER JOIN piwigo_image_category ON id = image_id
    WHERE 1 = 1
   AND date_creation IS NOT NULL
  ORDER BY date_creation DESC, date_creation ASC, file ASC, id ASC
(this query time : 52.066 s)
(total SQL time  : 72.627 s)
(total time      : 72.632 s)
(num rows        : 1087767 )

???? why distinct(id) ???? it's the primary key, it's already distinct
--> There's edge-cases where images can be in multiple galleries (categories), so it'd yield multiple entries for the same image. I personally don't use this function. And I feel like have a string/array field in the database containing one or multiple category IDs would be much faster to search through

Code:

SELECT * FROM posts WHERE string_to_array(category_string, ' ') @> '{category1}';

???? why sort 'date_creation' first DESC, then ASC ????? crazy
????? why have the piwigo_image_category in a separate table. Is it not possible to put it in the piwigo_images to avoid INNER JOIN
Sorting by multiple factors is wasteful
I suspect that counting the images by year (even though it's purely aesthetic) is taking a really long time

Code:

SELECT DISTINCT(YEAR(date_creation)) as period,
  COUNT(DISTINCT id) as nb_images FROM piwigo_images
INNER JOIN piwigo_image_category ON id = image_id
    WHERE 1 = 1 AND date_creation IS NOT NULL
  GROUP BY period;
(this query time : 20.071 s)
(total SQL time  : 20.074 s)
(total time      : 20.078 s)
(num rows        : 15 )

For information that's only being displayed as a number if you hoover over the year, this is a very wasteful query...

"searching by tag"

Code:

SELECT id
  FROM piwigo_images i 
    INNER JOIN piwigo_image_category ic ON id=ic.image_id
    INNER JOIN piwigo_image_tag it ON id=it.image_id
    WHERE tag_id IN (6)
  GROUP BY id
ORDER BY date_creation ASC, file ASC, id ASC
(this query time : 0.557 s)
(total SQL time  : 13.882 s)
(total time      : 13.901 s)
(num rows        : 15467 )

really slow to do multiple inner join, and to ORDER BY multiple fields. Very wasteful
group by id...? as an alternative to distinct(id)??

"opening the admin panel"

Code:

SELECT
  COUNT(*) AS ext_counter,
   SUBSTRING_INDEX(path,".",-1) AS ext,
   SUM(filesize) AS filesize
  FROM `piwigo_images`
  GROUP BY ext
;
(this query time : 4.254 s)
(total SQL time  : 9.753 s)
(total time      : 9.794 s)
(num rows        : 5 )

? Can't we just add an 'ext' database field? Seems wasteful to compute this again and again. I added an index on filesize and path because this query used to take 60+ seconds.

"looking at an image" (though not always slow)

Code:

UPDATE
  piwigo_images
  SET hit = hit+1, lastmodified = lastmodified
  WHERE id = 957357
;
(this query time : 2.773 s)
(total SQL time  : 3.373 s)
(total time      : 3.383 s)
(affected rows   : 1 )

"most visited"

Code:

SELECT DISTINCT(id)
  FROM piwigo_images
    INNER JOIN piwigo_image_category AS ic ON id = ic.image_id
  WHERE hit > 0
    
     ORDER BY hit DESC, id DESC
  LIMIT 15
;
(this query time : 39.774 s)
(total SQL time  : 39.778 s)
(total time      : 39.784 s)
(num rows        : 15 )

?? why distinct(id), it's the primary key...
I don't actually understand why it's doing the inner join here. The "most visited" page isn't specific to albums, it's the overall toplist. And I assume that it's limited to 15 because doing it this way is so slow. Are the categories also being used for access control? Would be much faster to integrate this into the piwigo_images table. There is no reason why listing the top 15 should take half a minute. Sorting 'hit' with an index from the piwigo_images table would take milliseconds

"searching regular"

Code:

SELECT id from piwigo_images i WHERE
(CONVERT(file, CHAR) LIKE '%search%'
 OR MATCH(name, comment) AGAINST( 'search searches' IN BOOLEAN MODE))
(this query time : 3.219 s)
(total SQL time  : 3.241 s)
(total time      : 3.247 s)
(num rows        : 0 )

Crazy query. Just put a FULLTEXT index on 'file' and 'name'


Piwigo is running bare-metal on a dual-xeon system (8+8 cores) with 196gb ram. The database is on ssd.
I understand that piwigo wasn't intended for such large galleries (or maybe it was? ;)). But could you advice me if there's anything that I can change to make this faster? I'm almost at the point where I'll just dig into the php myself to change these queries, but I don't want to fork piwigo because the development seems to be progressing quite nicely.

Thanks for your time. Hope to hear from you


Code:

Piwigo 13.8.0 Check for upgrade
Installed on 9 July 2022, 1 year 2 months 3 weeks 6 days ago
Operating system: Linux
PHP: 7.4.30 (Show info) [2023-10-05 21:03:17]
MySQL: 5.5.5-10.6.12-MariaDB-0ubuntu0.22.04.1 [2023-10-05 21:03:17]
Graphics Library: External ImageMagick 6.9.11-60
Cache size 65983.40 MB   calculated right now Refresh

Admin Tools
AStat.2
Automatic Size
BatchCustomDerivatives
DefaultCreationDateToToday
Grum Plugins Classes.3
GThumb+
LocalFiles Editor
Properties Mass Update
SortOrders
VideoJS

Last edited by redtails (2023-10-07 23:23:12)

Offline

 

Board footer

Powered by FluxBB

github twitter newsletter Donate Piwigo.org © 2002-2024 · Contact