Mass deletion of entries with groovy script


Christoph Damm

Sep 24, 2024, 1:01:49 AM
to Magnolia User Mailing List
Hello, 
I currently have a workspace with nearly 500k flat entries (each entry is small, only four fields; I know that's bad and it will be changed in the future, but I need to clean up first).
Now I need to delete a lot of them, and I tried that with a Groovy script.
However, so far it always takes a long time, especially the session.save operation.
My latest attempt was to fetch a smaller result set by adding a limit to the query and then saving in smaller batches, e.g. after every 50 or 100 removed nodes.
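Roughly, one round of that limit-and-batch idea can be sketched like this (a minimal sketch, assuming Magnolia's Groovy console where the ctx binding exposes the JCR session; the workspace name, node type, and date are illustrative, and it needs a live repository to actually run):

```groovy
// One round of the batched delete: page the query, remove the nodes, save once.
// Assumes Magnolia's Groovy console ("ctx" binding); workspace/query are illustrative.
def session = ctx.getJCRSession("data")
def qm = session.workspace.queryManager

def q = qm.createQuery(
    "SELECT p.* FROM [mgnl:data] AS p WHERE p.[date] < CAST('2023-03-01T00:00:01.000Z' AS DATE)",
    "JCR-SQL2")
q.setLimit(100)            // fetch only one batch worth of nodes

def nodes = q.execute().nodes
while (nodes.hasNext()) {
    nodes.nextNode().remove()
}
session.save()             // one save per batch, not per node
```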

I will provide the script later on as well.


Regards,
Christoph

Christoph Damm

Sep 24, 2024, 5:46:53 AM
to Magnolia User Mailing List
Hello,

my script looks something like this.
Even if my initial query (driven by the date parameter) returns only 5-6k nodes, a batch size of e.g. 50 still results in a session.save duration of 40-50 seconds each.


import java.text.SimpleDateFormat

def formattedDate = "2023-03-01T00:00:01.000Z"
def batchSize = 50
def counter = 0
def dateFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")

session = ctx.getJCRSession("data")
println "Script started"

try {
    qm = session.getWorkspace().getQueryManager()

    while (true) {
        println("Start query data")
        // The limit set on the query below pages the results
        query = "SELECT p.* FROM [mgnl:data] AS p WHERE p.[date] <> '' AND p.[date] < CAST('" + formattedDate + "' AS DATE)"

        q = qm.createQuery(query, "JCR-SQL2")
        q.setLimit(batchSize)
        results = q.execute()

        def nodeIterator = results.getNodes()
        if (!nodeIterator.hasNext()) {
            println("No more data found, stopping.")
            break
        }

        while (nodeIterator.hasNext()) {
            def data = nodeIterator.nextNode()          
            data.remove()
            counter++
           
            if (counter % batchSize == 0) {
                def currentTimestamp = dateFormat.format(new Date())
                println "[$currentTimestamp] Saving batch of ${batchSize} nodes"
                session.save()
                def currentTimestampEnd = dateFormat.format(new Date())
                println "[$currentTimestampEnd] Saved batch of ${batchSize} nodes"
            }
        }
    }

 
    if (counter % batchSize != 0) {
        session.save()
        println "Saved remaining nodes"
    }

} finally {
    session.logout()
}

println "Script finished"

I have tried various batch sizes, but no big improvement yet.

Any idea what I could improve?

Thanks

Chris