Postgresql trim-function

2,188 views
Skip to first unread message

Zwieberl

unread,
Dec 13, 2012, 5:03:42 AM12/13/12
to sqlal...@googlegroups.com
Hello,

I am very happy with using sqlalchemy 0.8 so far.
But now I stumpled upon a problem I could not solve:

I have strings like 'POINT(-141.1643 32.42112)' and want to extract the numbers between the brackets.
Note1: I can not use python to manipulate the string, since the string gets created from a Postgresql-function-call within the select-statement, and the substring has to be available in the same select.
Note2: the length of the numbers is not constant, therefore a simple substring()-call is not possible, since I dont know the end-value.

Now, since "POINT(" is always the same I could use
trim(leading "POINT(" from x), as well as
trim(trailing ")" from x)

to get rid of everything outside the brackets.

But how can I use the trim-function in sql-Alchemy?

I tried sqlalchemy.func.trim('leading', '"POINT("' ,x) 
-> error: function pg_catalog.btrim(unknown, unknown, text) does not exist

and sqlalchemy.func.trim('leading "POINT(" from ' + x) 
-> returns just the whole string 'leading "POINT(" from POINT(32.233 4.42)'

Does anyone know how to use trim in sqlalchemy correctly?

Any help would be appreciated! (Also if you have a different approach towards extracting the substring from the string)

Thank you all!

Audrius Kažukauskas

unread,
Dec 13, 2012, 6:30:52 AM12/13/12
to sqlal...@googlegroups.com
On Thu, 2012-12-13 at 02:03:42 -0800, Zwieberl wrote:
> I have strings like 'POINT(-141.1643 32.42112)' and want to extract the
> numbers between the brackets.
> Note1: I can not use python to manipulate the string, since the string gets
> created from a Postgresql-function-call within the select-statement, and
> the substring has to be available in the same select.
> Note2: the length of the numbers is not constant, therefore a simple
> substring()-call is not possible, since I dont know the end-value.

How about this:

from sqlalchemy import *
from sqlalchemy.orm import *
from sqlalchemy.ext.declarative import declarative_base

db_engine = create_engine('postgresql://localhost')
Session = sessionmaker(bind=db_engine)
db_session = Session()

Base = declarative_base()

class Point(Base):
__tablename__ = 'points'
id = Column(Integer, primary_key=True)
data = Column(Text)

Base.metadata.create_all(bind=db_engine)
p1 = Point(data=u'POINT(-141.1643 32.42112)')
p2 = Point(data=u'POINT(-42.1337 99.99999)')
db_session.add_all([p1, p2])
db_session.commit()

q = db_session.query(func.regexp_split_to_array(
func.substring(Point.data, r'^POINT\((.*)\)$'), r'\s'))
for row in q:
print row

q = db_session.query(cast(func.regexp_split_to_table(
func.substring(Point.data, r'^POINT\((.*)\)$'), r'\s'), Float))
for row in q:
print row

> Any help would be appreciated! (Also if you have a different approach
> towards extracting the substring from the string)

Although I'm not familiar with this stuff, but perhaps this would be
better suited for PostGIS and GeoAlchemy?

--
Audrius Kažukauskas
http://neutrino.lt/

Zwieberl

unread,
Dec 13, 2012, 6:58:22 AM12/13/12
to sqlal...@googlegroups.com

Hi Audrius,

How about this:

  q = db_session.query(func.regexp_split_to_array(
      func.substring(Point.data, r'^POINT\((.*)\)$'), r'\s'))
  for row in q:
      print row

  q = db_session.query(cast(func.regexp_split_to_table(
      func.substring(Point.data, r'^POINT\((.*)\)$'), r'\s'), Float))
  for row in q:
      print row

That works! Thank you very much!
But somehow I am a little bit worried about the performance. Wouldn't trim() be much faster than regex? (I need to do this quite a lot of times)
 
Although I'm not familiar with this stuff, but perhaps this would be
better suited for PostGIS and GeoAlchemy? 

Actually, I am using PostGIS+GeoAlchemy2, but due to my data-format there is no function available which can extract the coordinates from the object. Only the String-representation, which is of the format shown above (Yes, this is VERY annoying.)

Cheers

Audrius Kažukauskas

unread,
Dec 13, 2012, 7:16:29 AM12/13/12
to sqlal...@googlegroups.com
On Thu, 2012-12-13 at 03:58:22 -0800, Zwieberl wrote:
> But somehow I am a little bit worried about the performance. Wouldn't
> trim() be much faster than regex? (I need to do this quite a lot of times)

Here's version which uses trim (actually, ltrim and rtrim), but the best
way to tell what's faster is to try both versions with your data:

func.rtrim(func.ltrim(Point.data, 'POINT('), ')')

Zwieberl

unread,
Dec 13, 2012, 7:29:00 AM12/13/12
to sqlal...@googlegroups.com

Here's version which uses trim (actually, ltrim and rtrim), but the best
way to tell what's faster is to try both versions with your data:

  func.rtrim(func.ltrim(Point.data, 'POINT('), ')')
 
That is awesome! Thank you very much!
Now I understand why he tried to call btrim() in my faulty version above. (I wondered about the 'b')

Stuff like this should be part of the documentation.

Zwieberl

unread,
Dec 13, 2012, 8:00:54 AM12/13/12
to sqlal...@googlegroups.com


but the best way to tell what's faster is to try both versions with your data:

Just as additional information:

# explain analyze select trim(trailing ')' from trim(leading 'POINT(' from ST_AsText(geo))) from profiles;
------------------------------------------------------------------------------------------------------------------
 (rows=17420)
 Total runtime: 150.843 ms

# explain analyze select substring(ST_AsText(geo) from '^POINT\((.*)\)$') from profiles;
------------------------------------------------------------------------------------------------------------------
 (rows=17420)
 Total runtime: 381.579 ms

So trim is, at least in pure Postgresql, indeed more than twice as fast. I doubt that this tendency changes when used from within sqlalchemy.

Thanks again!
Reply all
Reply to author
Forward
0 new messages