PDI - Connection Pool's Context


Anderson F.

Sep 11, 2018, 8:19:30 AM
to Pentaho Community
Hi guys.


I have a question about the context of PDI's connection pool.

What is its scope? Is it per step? Per transformation? Per job?

Example:
Assume I've enabled connection pooling (initial: 5, maximum: 50), and that I am using a real ETL.

The ETL has a main job with several sub-jobs; each sub-job has several transformations, and each transformation has a step (with n copies) that accesses the database.

Does the connection pool cover the whole main job?
Or each transformation?
Or each step?

Best regards

Matt Casters (i-Bridge)

Sep 11, 2018, 10:07:36 AM
to pentaho-...@googlegroups.com
It's usually in the context of a transformation which gets executed repeatedly in a job or another transformation (loops). So it helps most where you have short-lived transformations connecting and disconnecting all the time.
Remember that connection pooling is only used to reduce connect/disconnect delays. It can't magically reduce the number of connections made to the database.
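[Editor's note: the two points above can be made concrete with a minimal sketch. This is illustrative only, not PDI's actual implementation (PDI delegates pooling to Apache Commons DBCP); the class and names below are made up. It shows that a pool reuses connections instead of re-opening them, while still enforcing a hard maximum.]

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Minimal connection-pool sketch. The Supplier stands in for a real
// (expensive) JDBC connect; the pool hands back idle connections
// instead of creating new ones, but never exceeds `max` in total.
class SimplePool<T> {
    private final Deque<T> idle = new ArrayDeque<>();
    private final Supplier<T> factory;
    private final int max;
    private int created = 0;

    SimplePool(Supplier<T> factory, int initial, int max) {
        this.factory = factory;
        this.max = max;
        for (int i = 0; i < initial; i++) {   // pre-open the initial connections
            idle.push(factory.get());
            created++;
        }
    }

    synchronized T borrow() {
        if (!idle.isEmpty()) return idle.pop();   // reuse: no connect cost
        if (created >= max)                       // the cap still applies
            throw new IllegalStateException("pool exhausted (max " + max + ")");
        created++;
        return factory.get();                     // connect only on demand
    }

    synchronized void release(T conn) { idle.push(conn); }

    synchronized int connectionsCreated() { return created; }
}

public class PoolDemo {
    public static void main(String[] args) {
        int[] opened = {0};
        SimplePool<String> pool = new SimplePool<>(() -> "conn-" + opened[0]++, 5, 50);

        // 1000 short-lived "transformations", each borrowing and returning
        // one connection: no new connections are opened after the initial 5.
        for (int i = 0; i < 1000; i++) {
            String c = pool.borrow();
            pool.release(c);
        }
        System.out.println(pool.connectionsCreated()); // prints 5
    }
}
```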
 
---
Matt Casters <matt.c...@ibridge.be>
Zaakvoerder i-Bridge bvba
Fonteinstraat 70, 9400 OKEGEM - Belgium - Cell : +32 486 97 29 37 -  skype: mattcasters



On Tue, Sep 11, 2018 at 14:19 Anderson F. <anderson...@gmail.com> wrote:

Anderson F.

Sep 11, 2018, 10:18:44 AM
to pentaho-...@googlegroups.com
Hi Matt

Thanks for the answer. 🙂

I do not expect a magic solution.😁 

I thought I would start using connection pooling because of the following problem: the database is configured to allow at most 200 connections, and since I am using parallelism in the transformation steps, the database started to show this error: "sorry, too many clients already" 😡
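[Editor's note: the arithmetic behind this error is worth spelling out. Every copy of a database step holds its own connection, so the demand multiplies fast. The numbers below are hypothetical, purely for illustration.]

```java
// Back-of-the-envelope connection budget: each copy of each database
// step in each concurrently running transformation opens a connection.
public class ConnectionBudget {
    public static void main(String[] args) {
        int concurrentTransformations = 5;   // assumed
        int dbStepsPerTransformation  = 2;   // assumed
        int copiesPerStep             = 25;  // assumed
        int demanded = concurrentTransformations * dbStepsPerTransformation * copiesPerStep;
        int maxConnections = 200;            // the limit mentioned in the thread

        System.out.println(demanded + " connections demanded vs " + maxConnections + " allowed");
        System.out.println(demanded > maxConnections); // true -> "too many clients already"
    }
}
```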


Matt Casters (i-Bridge)

Sep 11, 2018, 10:26:31 AM
to pentaho-...@googlegroups.com
So you did expect some kind of magic to happen :-)



On Tue, Sep 11, 2018 at 16:18 Anderson F. <anderson...@gmail.com> wrote:

Anderson F.

Sep 11, 2018, 10:41:28 AM
to pentaho-...@googlegroups.com
Hmm, is there any solution other than increasing the number of connections? 🤔

Nelson Sousa

Sep 11, 2018, 10:42:17 AM
to pentaho-...@googlegroups.com

Decrease the number of clients. ;)

(Sorry, couldn’t resist)




Anderson F.

Sep 11, 2018, 11:02:12 AM
to pentaho-...@googlegroups.com
"Decrease the number of clients. ;)"
"(Sorry, couldn't resist)"

No problem😂

Me: Sorry boss, but if you want to increase performance by increasing the number of step copies, we will have to change the database settings.

Boss: Hmm, didn't I give you responsibility for the database server too? (I'm an analyst, not a DBA.) So it's easy for you to sort this out. For a moment I thought we had a problem. 😅


Matt Casters (i-Bridge)

Sep 11, 2018, 11:50:56 AM
to pentaho-...@googlegroups.com
You're not doing the database or the ETL server any favors by loading them with 200 step copies in parallel.
I bet you're doing database lookups in a massive number of copies, or something silly like that.
Most of the time this is done much faster by either loading some data into memory or doing a merge join.


On Tue, Sep 11, 2018 at 17:02 Anderson F. <anderson...@gmail.com> wrote:

Anderson F.

Sep 11, 2018, 3:56:33 PM
to pentaho-...@googlegroups.com
It's true. 😅😅😅

I chose the approach of putting the heaviest part of the load on the database side. Reason: we were experiencing memory overflow problems (the Java server has 16 GB of RAM).

Matt Casters (i-Bridge)

Sep 12, 2018, 3:22:40 AM
to pentaho-...@googlegroups.com
With a merge join you can join large data sets very efficiently. By sorting the data on the database, the join can stream through both inputs quickly without holding them in memory.
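[Editor's note: the merge-join idea can be sketched as below. This is illustrative Java with made-up field names and data, not PDI's actual Merge Join step (which is more general). It assumes both inputs arrive sorted on the join key, e.g. via an ORDER BY pushed to the database, and that keys are unique on each side (a 1:1 inner join).]

```java
import java.util.ArrayList;
import java.util.List;

// Merge join over two inputs pre-sorted on `key`: walk both lists with
// one pointer each, advancing the side with the smaller key. No lookup
// table is built, so memory use stays constant regardless of input size.
public class MergeJoinDemo {
    record Row(int key, String value) {}

    static List<String> mergeJoin(List<Row> left, List<Row> right) {
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < left.size() && j < right.size()) {
            int l = left.get(i).key(), r = right.get(j).key();
            if (l < r)      i++;          // no match, advance left
            else if (l > r) j++;          // no match, advance right
            else {                        // keys match: emit the joined row
                out.add(left.get(i).value() + "|" + right.get(j).value());
                i++;
                j++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Both inputs sorted on key, as an ORDER BY would deliver them.
        List<Row> customers = List.of(new Row(1, "alice"), new Row(2, "bob"), new Row(4, "dan"));
        List<Row> orders    = List.of(new Row(2, "order-7"), new Row(3, "order-8"), new Row(4, "order-9"));
        System.out.println(mergeJoin(customers, orders)); // [bob|order-7, dan|order-9]
    }
}
```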


On Tue, Sep 11, 2018 at 21:56 Anderson F. <anderson...@gmail.com> wrote:

Anderson F.

Sep 12, 2018, 9:33:57 AM
to pentaho-...@googlegroups.com
Guys, thank you for all the answers 🙂
