Flax search engine & Pypes

Charlie Hull

Sep 3, 2010, 8:56:11 AM
to pypes
Hi all,

Just a quick note to introduce myself - I'm Charlie Hull from Flax
(www.flax.co.uk) - we work with open source search engines and I heard
about Pypes via a LinkedIn group.

Pypes is a great idea and we'd like to try integrating it with Flax -
there are lots of components we've written that would fit (see
http://code.google.com/p/flaxcode/source/browse/) including a web
crawler, file format filters and XML indexer to the Xapian search
engine.

I have a few questions: I've downloaded and run Pypes as the Wiki
suggested:

1. The 'Submit' button doesn't pop up a dialog as the Help suggests -
is this expected?
2. The FASTXML component creates an output folder, but creates no data
- I had a very simple project where the data was split to a Debug
component, which was showing output to the console, so I know there
was data moving through the system.

Hope to meet some of you at Lucene Revolution!

Cheers

Charlie

Matt Weber

Sep 3, 2010, 6:30:58 PM
to py...@googlegroups.com
Hi Charlie,

Thanks for your interest in Pypes! I am able to reproduce the submit
button bug; however, the FastXML Publisher works without any issue.
Can you answer a few questions for us:

What platform are you using?
What browser?
What version of Pypes?
What version of Stackless?

I just set up a fresh installation of Pypes 1.1, MacOS 10.6.4,
Stackless Python 2.6.4, using Google Chrome:

1. Download pypes from http://bitbucket.org/diji/pypes/get/v1.1.tar.gz
2. tar -xvzf pypes-v1.1.tar.gz
3. cd pypes
4. python bootstrap.py # make sure you are using your stackless interpreter here
5. ./bin/buildout
6. ./bin/paster make-config pypesvds production.ini
7. ./bin/paster setup-app production.ini
8. ./bin/paster serve production.ini

Once Pypes was running I created a simple Pipeline using the CSV
Adapter and the FAST Xml Publisher and clicked Save.

Then I opened a new terminal window (assume $PYPES is the location you
extracted and installed pypes)

1. cd $PYPES
2. mkdir tmp
3. echo "1,2,3,4,5,6" > tmp/test.csv # just create a basic csv file here if not using unix
4. ./bin/FileCrawler.py -e csv tmp/
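
For anyone not on Unix, steps 2 and 3 above can be done from Python instead. This is only a sketch: `make_test_csv` is a hypothetical helper name, not something shipped with Pypes, and it assumes you pass in the directory where Pypes was extracted.

```python
import os


def make_test_csv(base_dir, values=(1, 2, 3, 4, 5, 6)):
    """Create base_dir/tmp/test.csv holding a single comma-separated row."""
    tmp_dir = os.path.join(base_dir, "tmp")
    # Equivalent of `mkdir tmp`, but harmless if the folder already exists.
    if not os.path.isdir(tmp_dir):
        os.makedirs(tmp_dir)
    path = os.path.join(tmp_dir, "test.csv")
    # Equivalent of `echo "1,2,3,4,5,6" > tmp/test.csv`.
    with open(path, "w") as handle:
        handle.write(",".join(str(v) for v in values) + "\n")
    return path
```

Calling `make_test_csv(".")` from the Pypes directory should leave a `tmp/test.csv` ready for step 4.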

That's it. Assuming you used the default output directory, there should
be a folder called $PYPES/fastxml and a single file in it that looks
similar to:

<?xml version="1.0" encoding="utf-8"?>
<documents>
<document>
<element name="column1">
<value><![CDATA[1]]></value>
</element>
<element name="column2">
<value><![CDATA[2]]></value>
</element>
<element name="column3">
<value><![CDATA[3]]></value>
</element>
<element name="column4">
<value><![CDATA[4]]></value>
</element>
<element name="column5">
<value><![CDATA[5]]></value>
</element>
<element name="column6">
<value><![CDATA[6]]></value>
</element>
</document>
</documents>
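
To check a generated file programmatically rather than by eye, something like the following may help. It is a sketch that only assumes the `documents` / `document` / `element` / `value` layout shown above; `read_fastxml` is a hypothetical name and is not part of Pypes.

```python
import xml.etree.ElementTree as ET


def read_fastxml(xml_data):
    """Return one dict per <document>, mapping each element name to its value text."""
    root = ET.fromstring(xml_data)
    documents = []
    for document in root.findall("document"):
        fields = {}
        for element in document.findall("element"):
            value = element.find("value")
            # CDATA sections come back as ordinary text from the parser.
            fields[element.get("name")] = value.text if value is not None else None
        documents.append(fields)
    return documents
```

Reading the file in binary mode and passing the bytes to `read_fastxml` lets the parser honour the encoding declared in the XML header.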


I will open a bug report and look into the Submit button issue.

Thanks,
Matt Weber

Eric Gaumer

Sep 5, 2010, 10:20:48 AM
to py...@googlegroups.com
On Fri, Sep 3, 2010 at 8:56 AM, Charlie Hull <cha...@flax.co.uk> wrote:

> I have a few questions: I've downloaded and run Pypes as the Wiki
> suggested:
>
> 1. The 'Submit' button doesn't pop up a dialog as the Help suggests -
> is this expected?
> 2. The FASTXML component creates an output folder, but creates no data
> - I had a very simple project where the data was split to a Debug
> component, which was showing output to the console, so I know there
> was data moving through the system.


I'm not able to reproduce either of these. 

In terms of the submit button, I've tried this on several browsers (Firefox, Camino, Chrome, Safari) all running on OS X 10.6 (latest version of each). The submit button is a menu button so when you select it, you should see a menu appear. Are you seeing this menu?

The skinning of the button sort of hides the fact that this is a menu button. Be sure to click on the far right side of the button. The idea with the menu was to allow users to submit files and URLs, although URLs aren't currently supported.

Also note that the submit button is mainly for testing your workflow. In reality, you push content in using some external feeding mechanism. Pypes provides a RESTful interface so you can use curl to feed documents.

$ curl -F document=@/path/to/some/doc/test.xml localhost:5000/docs

Since we're dealing with "documents", pypes supports multipart/form-data and you can send compressed files (set Content-Encoding to gzip). Pypes also supports batching for certain content types like XML.
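
The same kind of request can be built from Python with only the standard library. This is a rough sketch, not a description of what Pypes requires: the field name `document` and the URL come from the curl line above, while `build_feed_request`, the `application/octet-stream` part type, and the choice to gzip the whole request body when `compress=True` are assumptions made for illustration.

```python
import gzip
import urllib.request
import uuid


def build_feed_request(filename, payload, url="http://localhost:5000/docs", compress=False):
    """Build a multipart/form-data POST carrying one file field named 'document'."""
    boundary = uuid.uuid4().hex
    disposition = 'Content-Disposition: form-data; name="document"; filename="%s"' % filename
    body = b"\r\n".join([
        ("--" + boundary).encode("ascii"),
        disposition.encode("utf-8"),
        b"Content-Type: application/octet-stream",  # assumed part type
        b"",                                         # blank line ends the part headers
        payload,
        ("--" + boundary + "--").encode("ascii"),   # closing boundary
        b"",
    ])
    headers = {"Content-Type": "multipart/form-data; boundary=" + boundary}
    if compress:
        # Assumption: the server expects the entire body to be gzip-compressed.
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

With Pypes running, `urllib.request.urlopen(build_feed_request("test.xml", data))` would send it, where `data` holds the file's bytes.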

In terms of the FASTXML component, you can find the source for the built in components here:

$PYPES/ui/pypesvds/plugins

You can modify components by adding more verbose logging etc. Just restart pypes and your changes should take effect. By default, logging will be written to stdout.
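
As an illustration of the sort of extra logging that could be dropped into a component, here is a minimal sketch using the standard `logging` module. `make_logger` and `describe_document` are hypothetical helpers, not part of Pypes, and how a component gets hold of its document is not shown in this thread, so a plain dict stands in for it.

```python
import logging
import sys


def make_logger(name="fastxml-debug", stream=None):
    """Return a DEBUG-level logger that writes to stdout unless a stream is given."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    # Avoid stacking duplicate handlers if this is called more than once.
    if not logger.handlers:
        handler = logging.StreamHandler(stream or sys.stdout)
        handler.setFormatter(logging.Formatter("%(name)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger


def describe_document(doc):
    """Render a dict-like document as 'key=value' pairs in a stable order."""
    return ", ".join("%s=%r" % (key, doc[key]) for key in sorted(doc))
```

A call such as `make_logger().debug("writing %s", describe_document(doc))` just before the file is written would show whether documents reach that point at all.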

Regards,
-Eric

Matt Weber

Sep 5, 2010, 1:33:26 PM
to py...@googlegroups.com
Ahh, clicking the right side works as expected. I was clicking the
middle as there is no indication this is a drop down menu. Let's add a
little down arrow to the right or make it so that clicking anywhere on
the button brings up the menu.

--
Thanks,
Matt Weber


Charlie Hull

Sep 6, 2010, 6:58:48 AM
to py...@googlegroups.com
On 03/09/2010 23:30, Matt Weber wrote:
> Hi Charlie,

Hi Matt,


>
> Thanks for your interest in Pypes! I am able to reproduce the submit
> button bug, however, the FastXML Publisher works without any issue.
> Can you answer a few questions for us:
>
> What platform are you using?

Windows XP
> What browser?
Firefox 3.6.8
> What version of Pypes?
v1.1
> What version of Stackless?
2.7, the .msi from the Stackless website. I've just realised Pypes'
INSTALL.txt specifies 2.6.2 .... I can retry with that if you like.

>
> I just setup a fresh installation of Pypes 1.1, MacOS 10.6.4,
> Stackless Python 2.6.4, using Google Chrome:
>
> 1. Download pypes from http://bitbucket.org/diji/pypes/get/v1.1.tar.gz
> 2. tar -xvzf pypes-v1.1.tar.gz
> 3. cd pypes
> 4. python bootstrap.py # make sure you are using your stackless
> interpreter here
> 5. ./bin/buildout
> 6. ./bin/paster make-config pypesvds production.ini
> 7. ./bin/paster setup-app production.ini
> 8. ./bin/paster serve production.ini
>
> Once Pypes was running I created a simple Pipeline using the CSV
> Adapter and the FAST Xml Publisher and clicked Save.

I did something very similar on Windows. I then used FileCrawler.py to
submit some files - splitting the output to the Debug component, which
showed some files making it through the process. The fastxml folder was
created, but no files appeared.

Cheers

Charlie


--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile: +44 (0)7767 825828
web: www.flax.co.uk

Matt Weber

Sep 7, 2010, 12:27:13 AM
to py...@googlegroups.com
Charlie,

I have created a bug report for this issue:
http://bitbucket.org/diji/pypes/issue/2/fastxml-publisher-does-not-work-on-windows.
I will get Pypes set up on Windows to see if I can figure out what is
going on. Python 2.7 vs. 2.6 should not be an issue, but I will try
with both versions and get back to you.


--
Thanks,
Matt Weber


Eric Gaumer

Sep 9, 2010, 11:19:54 PM
to py...@googlegroups.com
Matt just pushed a fix for this in tip.

http://bitbucket.org/diji/pypes/changeset/7c24ef1a60b7

The change is minor so you might want to just patch manually.

-Eric

Charlie Hull

Sep 10, 2010, 5:27:42 AM
to py...@googlegroups.com
On 10/09/2010 04:19, Eric Gaumer wrote:
> Matt just pushed a fix for this in tip.
>
> http://bitbucket.org/diji/pypes/changeset/7c24ef1a60b7
>
> The change is minor so you might want to just patch manually.

Verified here - great! I'll have a think about how we might get further
involved.

Best

Charlie


>
> -Eric
>
>
> On Tue, Sep 7, 2010 at 12:27 AM, Matt Weber <ma...@mattweber.org
> <mailto:ma...@mattweber.org>> wrote:
>
> Charlie,
>
> I have created a bug report for this issue:
> http://bitbucket.org/diji/pypes/issue/2/fastxml-publisher-does-not-work-on-windows.
> I will get Pypes setup on Windows to see if I can figure out what is
> going on. Python 2.7 vs. 2.6 should not be an issue, but I will try
> with both versions and get back to you.
>
>
> --
> Thanks,
> Matt Weber
>

Eric Gaumer

Sep 10, 2010, 7:54:42 AM
to py...@googlegroups.com
On Fri, Sep 10, 2010 at 5:27 AM, Charlie Hull <cha...@flax.co.uk> wrote:
> On 10/09/2010 04:19, Eric Gaumer wrote:
>> Matt just pushed a fix for this in tip.
>>
>> http://bitbucket.org/diji/pypes/changeset/7c24ef1a60b7
>>
>> The change is minor so you might want to just patch manually.
>
> Verified here - great! I'll have a think about how we might get further involved.


Great. 

I looked at Flax a few years ago when I was researching Xapian and even then, it seemed like an exciting project. I'd like to catch up at the conference in Boston and talk more about Flax.

We have a number of clients but we don't typically promote any specific search technology. Instead, we try to match the client's needs to the best solution so we can reduce the friction and achieve our goals.

It certainly would be interesting to add Flax to that list of choices. Aside from pypes, which seems like it could fit very well into Flax, we also do a lot of work involving NLP via the NLTK (now under an Apache 2.0 license). Marrying some of these features together would be a worthwhile effort.

I'm already looking into writing a Flax publishing component for pypes by leveraging Xappy. This would allow us to, for instance, map existing FastXML into Flax or submit other formats for indexing. As Xappy is Python, it almost seems like a marriage made in heaven.

Very exciting stuff and I look forward to meeting in Boston.

-Eric

Charlie Hull

Sep 10, 2010, 9:17:30 AM
to py...@googlegroups.com
On 10/09/2010 12:54, Eric Gaumer wrote:
> On Fri, Sep 10, 2010 at 5:27 AM, Charlie Hull <cha...@flax.co.uk
> <mailto:cha...@flax.co.uk>> wrote:
>
> On 10/09/2010 04:19, Eric Gaumer wrote:
>
> Matt just pushed a fix for this in tip.
>
> http://bitbucket.org/diji/pypes/changeset/7c24ef1a60b7
>
> The change is minor so you might want to just patch manually.
>
>
> Verified here - great! I'll have a think about how we might get
> further involved.
>
>
> Great.
>
Hi,

> I looked at Flax a few years ago when I was researching Xapian and even
> then, it seemed like an exciting project. I'd like to catch up at the
> conference in Boston and talk more about Flax.

Yes, looking forward to that myself.


>
> We have a number of clients but we don't typically promote any specific
> search technology. Instead, we try to match the clients needs to the
> best solution so we can reduce the friction and achieve our goals.

We're actually of the same mind - although we've developed the Flax
layer on top of Xapian, we've also worked with Lucene, Ultraseek and
many others over the years. Whatever solves the client's problem.


>
> It certainly would be interesting to add Flax to that list of choices.
> Aside from pypes, which seems like it could fit very well into Flax, we
> also do a lot of work involving NLP via the NLTK (now under an Apache
> 2.0 license). Marrying some of these features together would a
> worthwhile effort.
>

Yes indeed!

> I'm already looking into writing a Flax publishing component for pypes
> by leveraging Xappy. This would allow us to, for instance, map existing
> FastXML into Flax or submit other formats for indexing. As Xappy is
> Python, it almost seems like a marriage made in heaven.

Actually, I'm going to suggest avoiding Xappy and looking at flax.core :
it's a simpler layer for a start. We've blogged about this recently at
http://www.flax.co.uk/blog/ and also about flax.crawler, which might be
something else we can look at.

Let me know what you think!

Cheers

Charlie

>
> Very exciting stuff and I look forward to meeting in Boston.
>
> -Eric
>

Eric Gaumer

Sep 10, 2010, 10:03:15 AM
to py...@googlegroups.com
On Fri, Sep 10, 2010 at 9:17 AM, Charlie Hull <cha...@flax.co.uk> wrote:

>> I'm already looking into writing a Flax publishing component for pypes
>> by leveraging Xappy. This would allow us to, for instance, map existing
>> FastXML into Flax or submit other formats for indexing. As Xappy is
>> Python, it almost seems like a marriage made in heaven.
>
> Actually, I'm going to suggest avoiding Xappy and looking at flax.core: it's a simpler layer for a start. We've blogged about this recently at
> http://www.flax.co.uk/blog/ and also about flax.crawler, which might be something else we can look at.
>
> Let me know what you think!


Awesome. I'm all over this.

I also like this post:


Reading this post, it's very evident that we share the same mindset. As the folks who actually have to go into an organization, regardless of the chosen search technology, and integrate all the pieces, we understand the importance of choosing a dynamic language like Python.

I didn't want to get deeply into that discussion on the LinkedIn thread but creating a pipeline in Java just doesn't make a lot of sense to me (again, as an integrator). It would inflate project times and therefore cost, and most likely lead to overly complicated solutions that were far less flexible and much more error prone.

This was one aspect where I have to credit FAST (the old FAST). They really understood this and their document processing pipeline was a mixture of C and Python that allowed you to rapidly build robust solutions. I've never been in a situation where it couldn't handle even the most elaborate enterprise task and that included integration with applications like Teragram, etc.

But today people seem enamored with Java and so they try solving everything in that language whether it makes sense or not. 

It's refreshing to meet folks like yourself who actually understand the value of choosing the right tool for the job.

Pypes wasn't designed to be glamorous because data conditioning isn't a glamorous task but it's something that almost every enterprise has to deal with. It's the dirty trench work that leads to a lot of "one off" solutions that aren't typically reusable or manageable in their current form.

Pypes isn't about providing a bunch of pre-fabricated components that will "solve all your needs". It's about providing a flexible and scalable execution environment where you can plug in your own code and not have to worry about how it might scale or interoperate. 

We were careful to choose open standards. Documents are submitted over HTTP and we support both batching and compression. The internal document model is JSON, which maps quite nicely to the Python dict. The entire configuration (down to the positioning of components on the canvas) is represented as JSON and can be exported and dropped onto another node.
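
To make the JSON-to-dict mapping concrete, here is a small sketch using the standard `json` module. The field names are borrowed from the CSV example earlier in the thread purely for illustration; the actual layout of a pypes document is not shown here, and both function names are hypothetical.

```python
import json


def document_to_json(document):
    """Serialise a dict-based document to JSON text with keys in sorted order."""
    return json.dumps(document, sort_keys=True)


def document_from_json(text):
    """Parse JSON text back into a plain Python dict."""
    return json.loads(text)


# Illustrative document only; real pypes documents may carry other fields.
example = {"column1": "1", "column2": "2"}
assert document_from_json(document_to_json(example)) == example
```

Because the round trip is lossless for strings, numbers, lists and nested dicts, a component can treat the document as an ordinary dict and leave serialisation to the edges.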

Pypes components are Python eggs so we're leveraging the latest advances in distributing Python code and dynamic discovery of plug-ins. We use buildout to be able to sandbox pypes and its dependencies from the system version of Python. The entire application is RESTful. It's undocumented, but you could actually do HTTP requests that would instantiate components and even wire them together. The UI is nothing more than a JavaScript application that does exactly this.
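
For readers unfamiliar with entry-point based plug-in discovery, the general idea looks roughly like this. It is a sketch of the technique, not of Pypes' own loader: it uses `importlib.metadata` from current Python rather than the `pkg_resources` API that was usual for eggs at the time, and the group name in the usage note is made up.

```python
from importlib.metadata import entry_points


def discover_plugins(group):
    """Map entry-point names to entry points registered under the given group."""
    found = entry_points()
    if hasattr(found, "select"):
        # Newer Python versions return an object with a select() method.
        selected = found.select(group=group)
    else:
        # Older versions return a plain dict keyed by group name.
        selected = found.get(group, [])
    return {entry.name: entry for entry in selected}
```

Calling `discover_plugins("example.components")` (a hypothetical group) returns an empty dict unless some installed package registers entry points under that name; calling `.load()` on a returned entry imports the object it points to.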

But I think most importantly, pypes wasn't designed to be some commercial product. It's a result of years of experience in actually having to build robust solutions. We designed it to target our own needs and make us more efficient at what we do. Given that we're our number one user, pypes is very good at doing what it was designed to do rather than promising the world and failing to deliver.

I'm very glad our paths crossed. It's refreshing given the world we live in.

-Eric


Charlie Hull

Oct 1, 2010, 5:32:45 AM
to py...@googlegroups.com
On 10/09/2010 12:54, Eric Gaumer wrote:

> Great.
>
> I looked at Flax a few years ago when I was researching Xapian and even
> then, it seemed like an exciting project. I'd like to catch up at the
> conference in Boston and talk more about Flax.

Hi Eric,

I'm arriving on Tuesday night and will be around for Wednesday-Friday.
Currently I'm booked for 1-5pm on Wednesday and 10-11 on Friday, and
will be attending conference sessions - although I doubt I'll go to
everything! Is it worth arranging a time to meet or shall we just swap
contact details?
