NEW: Document similarity server (testing needed)


Radim

Aug 23, 2011, 8:12:40 PM
to gensim, moiz....@gmail.com
Hi,

I added a document similarity service to gensim. It's a layer above
models/indexes that shields you from the details, so you only add/remove
documents from an index based on their id and text, and query
for similar documents based on id/text.
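
To make the add/remove/query-by-id-or-text idea concrete, here is a minimal sketch in plain Python. It is not the simserver API: the class `ToySimilarityService` and its method names are made up for illustration, and it compares raw word counts with cosine similarity instead of using a trained model.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToySimilarityService:
    """Hypothetical stand-in: documents are addressed by id and text only."""

    def __init__(self):
        self.vectors = {}

    def add(self, doc_id, text):
        self.vectors[doc_id] = Counter(tokenize(text))

    def remove(self, doc_id):
        self.vectors.pop(doc_id, None)

    def find_similar(self, query, max_results=10):
        """`query` is either the id of an indexed document or free text."""
        if query in self.vectors:
            vec = self.vectors[query]
        else:
            vec = Counter(tokenize(query))
        scored = [(doc_id, cosine(vec, other)) for doc_id, other in self.vectors.items()]
        scored.sort(key=lambda item: item[1], reverse=True)
        return scored[:max_results]


service = ToySimilarityService()
service.add("doc_0", "Human machine interface for lab computer applications")
service.add("doc_1", "A survey of user opinion of computer system response time")
service.add("doc_2", "Graph minors: a survey")

by_id = service.find_similar("doc_2")                             # query by document id
by_text = service.find_similar("computer survey", max_results=1)  # query by free text
```

The caller never touches dictionaries, corpora or index objects directly; everything goes through ids and plain text, which is the point of the layer described above.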

It can also be accessed remotely, that's why I call it a "server".

More info here: http://nlp.fi.muni.cz/projekty/gensim/simserver.html

I will be at EuroScipy the next couple of days, presenting gensim. I
would very much appreciate your testing the service:

1) by trying examples in the tutorial: http://nlp.fi.muni.cz/projekty/gensim/simserver.html
2) by running `python -m gensim.test.run_simserver` followed by
`python setup.py test`, which will run unittests from
gensim/test/test_simserver.py.

Report any errors.

The code is in the `simserver` branch on github (there will still be
some refactoring, that branch is work-in-progress):
https://github.com/piskvorky/gensim/tree/simserver

Cheers,
Radim

Dieter Plaetinck

Aug 24, 2011, 4:22:55 AM
to gen...@googlegroups.com


On Wed, Aug 24, 2011 at 2:12 AM, Radim <radimr...@seznam.cz> wrote:
> I will be at EuroScipy the next couple of days, presenting gensim.

cool, hope to see video footage and/or slides of that.
good luck!

Dieter

Otto Federico Wald

Aug 25, 2011, 7:20:17 AM
to gen...@googlegroups.com
Excellent! Thanks!
How does the service choose the best numtopic value in lsi?
Regards


http://www.patentnapsis.com

Otto Federico Wald

Aug 25, 2011, 7:52:12 AM
to gen...@googlegroups.com
I'll answer myself, from the tutorial...
"The method=’lsi’ parameter meant that we trained a model for Latent
Semantic Indexing, using default preprocessing (lowercase+alphabetic
tokenizer) and default dimensionality (400) over a tf-idf
representation of our little corpus."
Sorry and thanks!
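
For readers unfamiliar with the preprocessing named in that quote, here is a small sketch of a lowercase+alphabetic tokenizer followed by tf-idf weighting, in plain Python. The function names are made up, and the formula (term frequency times log2(N/df), then length-normalized) is one common tf-idf variant chosen as an assumption; gensim's exact weighting may differ. The LSI step itself (projecting to 400 dimensions) needs an SVD and is left out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase + alphabetic tokenizer: keep runs of letters only."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(texts):
    """Return one {term: weight} dict per text, each scaled to unit length."""
    docs = [Counter(tokenize(text)) for text in texts]
    num_docs = len(docs)

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(doc.keys())

    vectors = []
    for doc in docs:
        weights = {
            term: tf * math.log2(num_docs / doc_freq[term])
            for term, tf in doc.items()
        }
        norm = math.sqrt(sum(w * w for w in weights.values()))
        if norm > 0.0:
            weights = {term: w / norm for term, w in weights.items()}
        vectors.append(weights)
    return vectors


corpus = ["The cat sat.", "The dog sat!", "The cat ran"]
vectors = tfidf_vectors(corpus)
```

A term that occurs in every document (here "the") ends up with weight zero, while rarer terms dominate the vector.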

http://www.patentnapsis.com

Radim

Aug 25, 2011, 9:05:31 AM
to gensim
Yes. Some users talked about implementing automatic selection of the
optimal LSI dimensionality: https://github.com/piskvorky/gensim/issues/28

but I haven't heard from them since.

Best,
Radim



Otto Federico Wald

Aug 25, 2011, 9:35:26 AM
to gen...@googlegroups.com
That would be a great feature.
Thanks for your answers and for gensim.

http://www.patentnapsis.com

Radim

Aug 30, 2011, 7:42:45 AM
to gensim
On Aug 24, 10:22 am, Dieter Plaetinck <die...@plaetinck.be> wrote:
> On Wed, Aug 24, 2011 at 2:12 AM, Radim <radimrehu...@seznam.cz> wrote:
> > I will be at EuroScipy the next couple of days, presenting gensim.
>
> cool, hope to see video footage and/or slides of that.
> good luck!

It was just a poster, I learned about the conference way too late to
apply for a talk.

Here is a picture of Olivier Grisel presenting my poster, while I went
to grab a drink :-) http://nlp.fi.muni.cz/projekty/gensim/DSC07177.JPG

But apart from people who already knew about gensim and/or were using
it, there was zero interest in what it does and how it does it. In
fact, I checked gensim web stats and the number of visitors *dropped*
on that day, so the interest was actually negative (lol!). It was just
a poor match between the audience (physicists) and the tool (NLP&IR),
I guess.

Radim

Shreyas Karnik

Aug 30, 2011, 4:20:12 PM
to gen...@googlegroups.com
Hi Radim,

I liked the server idea.
I will be testing it with some data of my own and let you know about the results.

-Shreyas


Radim

Sep 7, 2011, 4:10:05 PM
to gensim
C'mon guys! I want to make a new release 0.8.1 this week. Having another
pair of eyes do a sanity check (or at least run the automated
unit tests... doesn't take more than 5 minutes!) would be nice.

Cheers,
Radim

Stephan Gabler

Sep 7, 2011, 5:43:10 PM
to gen...@googlegroups.com

Hello Radim,

sorry I don't have much time. But I did at least run the tests and they all pass.

best, stephan

Kefa Lu

Sep 7, 2011, 6:07:02 PM
to gen...@googlegroups.com
Hi, Radim,

Thanks for the great job. Just a quick question: given a document, how accurately can gensim find similar documents? Does LSI work well for finding similar documents? I'm currently learning LSI and LDA, and I'm really interested in doing some research on document similarity calculations. Thanks a lot!

Kevin
--
Kefa(Kevin) Lu
Department of EECS,
University of Tennessee at Knoxville


Radim

Sep 8, 2011, 6:27:11 AM
to gensim
Thx Stephan! That's what I wanted to hear :)

Kevin: you're asking two questions there:

1. Q: how accurate is gensim in determining similarities? A: within
single precision ("float") accuracy; it does not use approximations or
heuristics, so it's exact.
2. Q: do LSI/LDA work well? A: depends on what you want to do -- these
statistical models of "semantics" are quite primitive. sometimes
that's enough, sometimes not.
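
One way to read point 1: the query is compared against every indexed vector with no shortcuts, and the only loss is rounding to single precision. The sketch below illustrates that reading in plain Python; it is not gensim code, the vectors are made-up numbers, and `to_float32` is a helper introduced here that rounds a Python double to the nearest 32-bit float.

```python
import math
import struct


def to_float32(value):
    """Round a double-precision Python float to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", value))[0]


def cosine(u, v):
    """Exact (brute-force) cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


query = [0.2, 0.7, 0.1]
docs = {
    "doc_a": [0.1, 0.8, 0.1],
    "doc_b": [0.9, 0.05, 0.05],
}

# Compare the query against every document: no approximation, no pruning.
exact = {doc_id: cosine(query, vec) for doc_id, vec in docs.items()}

# The same scores stored as single-precision floats.
single = {doc_id: to_float32(score) for doc_id, score in exact.items()}

# Worst-case difference introduced by the float32 rounding alone.
max_error = max(abs(exact[doc_id] - single[doc_id]) for doc_id in exact)
```

For scores in [-1, 1] the rounding error stays below about 1e-7, which is far smaller than any difference that matters when ranking documents.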

Evaluation of "usefulness" of statistical semantics is a big deal,
even though the methods have been around for decades. That's why I nag
users to send me experiment reports -- real experience counts :)

Radim

Rense Lange

Sep 11, 2011, 1:45:37 AM
to gensim
Hi,

I was playing with the document similarity service. This is my program
thus far:
----------------------------------------------------------
from gensim import similarities
import Pyro4
import sqlitedict
import logging

# no problem up to here ... i.e., no messages without the following line:

server = similarities.SessionServer('my_server')

-------------------------------------------------------
But, my python now complains that: "'module' object has no attribute
'SessionServer'"

Does anyone see what I did wrong here?

Radim

Sep 11, 2011, 4:49:50 AM
to gensim
Hello Rense,

my guess: you're not using the `simserver` branch from GitHub. Code
for the similarity server has not been merged into the main
development branch yet, as I'm still working on it :)

-rr

Kefa Lu

Sep 13, 2011, 11:40:27 AM
to gen...@googlegroups.com
Radim,

I really appreciate the reply. I will definitely try gensim and do some experiments. Thanks a lot for your effort in implementing the algorithms.

Kevin

Dieter Plaetinck

Sep 14, 2011, 6:01:02 AM
to gen...@googlegroups.com
things look mostly okay, although the first command complained, and the last one did mention an exception (but didn't specify)
full output:

11:57:02 dieter@Gi gensim python2 -m gensim.test.run_simserver
2011-09-14 11:57:08,152 : INFO : run_simserver:29 : <module>(MainThread) : running /home/dieter/code/gensim/gensim/test/run_simserver.py

USAGE: run_simserver.py DATA_DIRECTORY

    Start a sample similarity server, register it with Pyro and leave it running as a daemon.

Example:
    python -m gensim.test.run_simserver /tmp/server

11:57:08 dieter@Gi gensim python2 setup.py test
running test
running egg_info
writing requirements to gensim.egg-info/requires.txt
writing gensim.egg-info/PKG-INFO
writing top-level names to gensim.egg-info/top_level.txt
writing dependency_links to gensim.egg-info/dependency_links.txt
reading manifest file 'gensim.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching '*.sh' under directory '.'
writing manifest file 'gensim.egg-info/SOURCES.txt'
running build_ext
testSplitAlphanum (gensim.test.test_parsing.TestPreprocessing) ... ok
testStemText (gensim.test.test_parsing.TestPreprocessing) ... ok
testStripMultipleWhitespaces (gensim.test.test_parsing.TestPreprocessing) ... ok
testStripNonAlphanum (gensim.test.test_parsing.TestPreprocessing) ... ok
testStripNumeric (gensim.test.test_parsing.TestPreprocessing) ... ok
testStripShort (gensim.test.test_parsing.TestPreprocessing) ... ok
testStripStopwords (gensim.test.test_parsing.TestPreprocessing) ... ok
testStripTags (gensim.test.test_parsing.TestPreprocessing) ... ok
testPersistence (gensim.test.test_models.TestLdaModel) ... ok
testTransform (gensim.test.test_models.TestLdaModel) ... ok
testPersistence (gensim.test.test_models.TestLogEntropyModel) ... ok
testTransform (gensim.test.test_models.TestLogEntropyModel) ... ok
testCorpusTransform (gensim.test.test_models.TestLsiModel)
Test lsi[corpus] transformation. ... ok
testOnlineTransform (gensim.test.test_models.TestLsiModel) ... ok
testPersistence (gensim.test.test_models.TestLsiModel) ... /usr/lib/python2.7/site-packages/scipy/sparse/compressed.py:122: UserWarning: indices array has non-integer dtype (float64)
  % self.indices.dtype.name )
ok
testTransform (gensim.test.test_models.TestLsiModel)
Test lsi[vector] transformation. ... ok
testPersistence (gensim.test.test_models.TestRpModel) ... ok
testTransform (gensim.test.test_models.TestRpModel) ... ok
testInit (gensim.test.test_models.TestTfidfModel) ... ok
testPersistence (gensim.test.test_models.TestTfidfModel) ... ok
testTransform (gensim.test.test_models.TestTfidfModel) ... ok
testChunking (gensim.test.test_similarities.TestMatrixSimilarity) ... ok
testFull (gensim.test.test_similarities.TestMatrixSimilarity) ... ok
testIter (gensim.test.test_similarities.TestMatrixSimilarity) ... ok
testNumBest (gensim.test.test_similarities.TestMatrixSimilarity) ... ok
testPersistency (gensim.test.test_similarities.TestMatrixSimilarity) ... ok
testChunking (gensim.test.test_similarities.TestSimilarity) ... ok
testFull (gensim.test.test_similarities.TestSimilarity) ... ok
testIter (gensim.test.test_similarities.TestSimilarity) ... ok
testNumBest (gensim.test.test_similarities.TestSimilarity) ... ok
testPersistency (gensim.test.test_similarities.TestSimilarity) ... ok
testChunking (gensim.test.test_similarities.TestSparseMatrixSimilarity) ... ok
testFull (gensim.test.test_similarities.TestSparseMatrixSimilarity) ... ok
testIter (gensim.test.test_similarities.TestSparseMatrixSimilarity) ... ok
testNumBest (gensim.test.test_similarities.TestSparseMatrixSimilarity) ... ok
testPersistency (gensim.test.test_similarities.TestSparseMatrixSimilarity) ... ok
testBuild (gensim.test.test_corpora_dictionary.TestDictionary) ... ok
testDocFreqAndToken2IdForSeveralDocsWithOneWord (gensim.test.test_corpora_dictionary.TestDictionary) ... ok
testDocFreqForOneDocWithSeveralWord (gensim.test.test_corpora_dictionary.TestDictionary) ... ok
testDocFreqOneDoc (gensim.test.test_corpora_dictionary.TestDictionary) ... ok
testFilter (gensim.test.test_corpora_dictionary.TestDictionary) ... ok
test_saveAsText_and_loadFromText (gensim.test.test_corpora_dictionary.TestDictionary)
`Dictionary` can be saved as textfile and loaded again from textfile. ... ok
test_miislita_high_level (gensim.test.test_miislita.TestMiislita) ... ok
test_save_load_ability (gensim.test.test_miislita.TestMiislita) ... ok
test_textcorpus (gensim.test.test_miislita.TestMiislita)
Make sure TextCorpus can be serialized to disk. ... ok
test_None (gensim.test.test_utils.TestIsCorpus) ... ok
test_int_tuples (gensim.test.test_utils.TestIsCorpus) ... ok
test_invalid_formats (gensim.test.test_utils.TestIsCorpus) ... ok
test_simple_lists_of_tuples (gensim.test.test_utils.TestIsCorpus) ... ok
test_corpus (gensim.test.test_lee.TestLeeTest)
availability and integrity of corpus ... ok
test_lee (gensim.test.test_lee.TestLeeTest)
correlation with human data > 0.6 ... ok
test_index (gensim.test.test_simserver.SessionServerTester)
test remote server incremental indexing ... No handlers could be found for logger "gensim_server"
ok
test_model (gensim.test.test_simserver.SessionServerTester)
test remote server model creation ... ok
test_optimize (gensim.test.test_simserver.SessionServerTester) ... ok
test_payload (gensim.test.test_simserver.SessionServerTester)
test storing/retrieving document payload ... ok
test_query_document (gensim.test.test_simserver.SessionServerTester) ... ok
test_query_id (gensim.test.test_simserver.SessionServerTester) ... ok
test_sessions (gensim.test.test_simserver.SessionServerTester)
check similarity server transactions (autosession off) ... ok
test_load (gensim.test.test_corpora.TestBleiCorpus) ... ok
test_save (gensim.test.test_corpora.TestBleiCorpus) ... ok
test_serialize (gensim.test.test_corpora.TestBleiCorpus) ... ok
test_load (gensim.test.test_corpora.TestMmCorpus) ... ok
test_save (gensim.test.test_corpora.TestMmCorpus) ... ok
test_serialize (gensim.test.test_corpora.TestMmCorpus) ... ok
test_load (gensim.test.test_corpora.TestSvmLightCorpus) ... ok
test_save (gensim.test.test_corpora.TestSvmLightCorpus) ... ok
test_serialize (gensim.test.test_corpora.TestSvmLightCorpus) ... ok

----------------------------------------------------------------------
Ran 67 tests in 30.186s

OK
Exception in thread Thread-78 (most likely raised during interpreter shutdown):
 Exception in thread Thread-63 (most likely raised during interpreter shutdown):Traceback (most recent call last):


Radim

Sep 15, 2011, 6:53:23 AM
to gensim
Thx Dieter. I suspect the final exception comes from the SqliteDict
module, which in its older versions tried to clean up on shut-down.
That sometimes failed when the Python interpreter had already torn down
some of its required imports on shutdown. The new, fixed version of
sqlitedict (1.0.5) shouldn't do this ... but in any case, it has no
relevance to the simserver tests.

Cheers,
Radim

Christian Winkelmann

Oct 26, 2011, 12:55:14 PM
to gensim
Hi, I am evaluating the gensim simserver (non-daemon mode) and have used
it so far behind a Flask server REST interface, but with too many
concurrent requests (about 3 per second each for indexing and querying)
I am getting more and more database errors and disk I/O
problems. Therefore I wanted to test the simserver in daemon mode, but
it isn't working yet.
As stated in gensim/test/run_simserver.py
$ python -m gensim.test.run_simserver /tmp/testserver
should be enough to start it but I get an error message.

INFO : run_simserver:29 : <module>(MainThread) : running /usr/local/lib/python2.7/dist-packages/gensim/test/run_simserver.py /tmp/testserver
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/local/lib/python2.7/dist-packages/gensim/test/run_simserver.py", line 39, in <module>
    server = gensim.similarities.SessionServer(basename)
AttributeError: 'module' object has no attribute 'SessionServer'

I had much the same error earlier, after upgrading gensim from 0.8.0 to 0.8.1,
and needed to change all my imports to
`from gensim.similarities.simserver import SessionServer` and then
create the server object with `service = SessionServer(self.rootlocation,
autosession=True)`.

I installed gensim via pip install gensim --upgrade.

Before changing your code myself and then running into conflicts with
new versions, I'd like to know whether the problem exists only with my
installation.

Thanks
Christian

Radim

Oct 26, 2011, 2:35:07 PM
to gensim
Hello Christian,

the SessionServer import in run_simserver.py is indeed just an oversight
(see https://github.com/piskvorky/gensim/issues/55 ). The correct way
to import this class is via `from gensim.similarities.simserver import
SessionServer`, just like you did.
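
For scripts that have to run against more than one gensim version, a defensive import is one option. The sketch below tries the two module paths mentioned in this thread and returns `None` when neither provides the class; the helper name `load_session_server` is made up for illustration.

```python
import importlib


def load_session_server():
    """Return the SessionServer class, or None if no known location provides it."""
    candidates = [
        ("gensim.similarities.simserver", "SessionServer"),  # path confirmed above
        ("gensim.similarities", "SessionServer"),            # path used by run_simserver.py
    ]
    for module_name, attr in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # module (or gensim itself) is not installed
        cls = getattr(module, attr, None)
        if cls is not None:
            return cls
    return None


SessionServer = load_session_server()
if SessionServer is None:
    print("SessionServer not found; is the simserver code installed?")
```

Checking for `None` up front gives a clearer message than the `AttributeError` shown in the traceback above.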


On Oct 26, 6:55 pm, Christian Winkelmann <kar...@gmail.com> wrote:
> Hi, I am evaluating the gensim simserver ( non daemon mode ) and used
> it so far with a Flask Server Rest Interface but with to many
> concurrent requests ( about 3 per second for indexing and querying
> each ) I am getting more and more database error and disk I/O
> problems. Therefore I wanted to test the simserver in daemon mode, but
> isn't right there.

Well if the local mode doesn't work, remote mode won't help you :-)

Let's work this out together. How are you using the server? 3 queries
per second is nothing, that should be super fast if the index is
optimized. 3 calls to `index` can take a long time, it depends on how
many documents you index in each call. Also if you start a new
transaction for each index call, that's also very costly (it
clones=copies the entire server). Can you send me a server log (at
least at "info" level, better "debug") to radimr...@seznam.cz?
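
To illustrate why one `index` call per document is so costly, here is a toy comparison in plain Python. `CountingServer` is a made-up stand-in that only counts how often a session would be opened; it does not model the real server, and the batch size of 250 is arbitrary.

```python
class CountingServer:
    """Hypothetical stand-in: counts session openings instead of doing real work."""

    def __init__(self):
        self.sessions_opened = 0
        self.indexed = []

    def index(self, documents):
        # In the real server, this is where the costly clone/copy would happen.
        self.sessions_opened += 1
        self.indexed.extend(documents)


def chunks(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


documents = [{"id": "doc_%i" % n, "text": "text %i" % n} for n in range(1000)]

# One call (and hence one session) per document.
one_by_one = CountingServer()
for doc in documents:
    one_by_one.index([doc])

# The same documents, grouped into a few larger calls.
batched = CountingServer()
for batch in chunks(documents, 250):
    batched.index(batch)

print(one_by_one.sessions_opened, batched.sessions_opened)
```

Both variants index the same 1000 documents, but the batched one pays the per-session cost 4 times instead of 1000 times.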

Btw I am also interested in offering the remote server through a
RESTful interface, instead of only Pyro. This should be part of the
next release, where I want to improve the similarity server and also
change the gensim license (LGPL doesn't cover remote services at all).
Let me know if you're interested in cooperating on this
functionality ... for example via github, where gensim source code is
hosted.

Cheers,
Radim

Christian Winkelmann

Oct 28, 2011, 11:06:00 AM
to gensim
Hi Radim,

the simserver in daemon mode runs now like a charm.

Sometimes there are errors like:

2011-10-28 16:39:50,663 : INFO : simserver:744 : open_session(Thread-5) : opening a new session
2011-10-28 16:39:50,663 : INFO : simserver:745 : open_session(Thread-5) : removing gensimTraining35/a
2011-10-28 16:39:50,667 : INFO : simserver:751 : open_session(Thread-5) : cloning server from gensimTraining35/b to gensimTraining35/a
2011-10-28 16:39:50,681 : INFO : utils:140 : load(Thread-5) : loading SaveLoad object from gensimTraining35/a/index_fresh
2011-10-28 16:39:50,688 : INFO : simserver:107 : check_moved(Thread-5) : index seems to have moved from gensimTraining35/b/index_fresh to gensimTraining35/a/index_fresh; updating locations
2011-10-28 16:39:50,689 : INFO : sqlitedict:87 : __init__(Thread-5) : opening Sqlite table 'unnamed' in gensimTraining35/a/index_fresh.id2sims
2011-10-28 16:39:50,690 : INFO : utils:140 : load(Thread-5) : loading SaveLoad object from gensimTraining35/a/index_opt
2011-10-28 16:39:50,690 : INFO : utils:140 : load(Thread-5) : loading SimModel object from gensimTraining35/a/model
2011-10-28 16:39:50,692 : INFO : sqlitedict:87 : __init__(Thread-5) : opening Sqlite table 'unnamed' in gensimTraining35/a/payload
2011-10-28 16:39:50,692 : INFO : utils:147 : save(Thread-5) : saving SimIndex object to gensimTraining35/a/index_fresh
2011-10-28 16:39:50,701 : INFO : utils:147 : save(Thread-5) : saving SimModel object to gensimTraining35/a/model
2011-10-28 16:39:50,702 : INFO : sqlitedict:87 : __init__(Thread-5) : opening Sqlite table 'unnamed' in /tmp/sqldict579e3a
2011-10-28 16:39:50,703 : INFO : simserver:402 : __init__(Thread-5) : loaded SimServer(loc='gensimTraining35/a', fresh=SimIndex(4676 docs, 8998 real size), opt=None, model=SimModel(method=lsi, dict=Dictionary(448 unique tokens)), buffer=SqliteDict(/tmp/sqldict579e3a))
2011-10-28 16:39:50,704 : INFO : sqlitedict:200 : terminate(Thread-5) : deleting /tmp/sqldict579e3a
2011-10-28 16:39:50,704 : INFO : sqlitedict:87 : __init__(Thread-5) : opening Sqlite table 'unnamed' in /tmp/sqldict81847a
Exception in thread Thread-1435:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 552, in __bootstrap_inner
    self.run()
  File "/usr/local/lib/python2.7/dist-packages/sqlitedict-1.0.7-py2.7.egg/sqlitedict.py", line 253, in run
    cursor.execute(req, arg)
OperationalError: disk I/O error

That happens every 3 to 4 indexing runs, which currently happen only
every 30 seconds, but it doesn't seem to break anything. New
documents have been getting indexed for hours. So no harm done...
Besides this, there are hundreds of sqldict***** files in the /tmp folder,
which should normally get deleted after committing a session. Maybe
over the weekend I will try moving the server's sqldict directory
onto a ramdisk. Also, do you see a possibility to move the
models into either SQLite memory tables or even Redis? Just pickling a
model and saving the serialized object into Redis should be easy, I
guess.
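
As a rough sketch of the "pickle the model into a store" idea, the snippet below saves a pickled object into an in-memory SQLite table and loads it back. The `model` dict is a tiny made-up placeholder rather than a real gensim model, and the table and function names are invented; with Redis the same bytes would simply become the value stored under a key.

```python
import pickle
import sqlite3

# Placeholder object standing in for a trained model.
model = {"method": "lsi", "num_topics": 400, "dictionary": {"human": 0, "computer": 1}}

conn = sqlite3.connect(":memory:")  # in-memory database, gone when the process exits
conn.execute("CREATE TABLE blobs (key TEXT PRIMARY KEY, value BLOB)")


def save(conn, key, obj):
    """Pickle `obj` and store the bytes under `key`, overwriting any old value."""
    payload = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    conn.execute(
        "REPLACE INTO blobs (key, value) VALUES (?, ?)",
        (key, sqlite3.Binary(payload)),
    )
    conn.commit()


def load(conn, key):
    """Fetch the bytes stored under `key` and unpickle them."""
    row = conn.execute("SELECT value FROM blobs WHERE key = ?", (key,)).fetchone()
    if row is None:
        raise KeyError(key)
    return pickle.loads(row[0])


save(conn, "model", model)
restored = load(conn, "model")
```

Whether this pays off for a real model depends on its size, since the whole object is serialized and copied on every save and load.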


On Oct 26, 8:35 pm, Radim <radimrehu...@seznam.cz> wrote:
> Hello Christian,
>
> the SessionServer import in run_server.py is indeed just a neglect bug
> (see https://github.com/piskvorky/gensim/issues/55). The correct way
> to import this class is via `from gensim.similarities.simserver import
> SessionServer`, just like you did.
>
> On Oct 26, 6:55 pm, Christian Winkelmann <kar...@gmail.com> wrote:
>
> > Hi, I am evaluating the gensim simserver ( non daemon mode ) and used
> > it so far with a Flask Server Rest Interface but with to many
> > concurrent requests ( about 3 per second for indexing and querying
> > each ) I am getting more and more database error and disk I/O
> > problems. Therefore I wanted to test the simserver in daemon mode, but
> > isn't right there.
>
> Well if the local mode doesn't work, remote mode won't help you :-)
>
> Let's work this out together. How are you using the server? 3 queries
> per second is nothing, that should be super fast if the index is
> optimized. 3 calls to `index` can take a long time, it depends on how
> many documents you index in each call. Also if you start a new
> transaction for each index call, that's also very costly (it
> clones=copies the entire server). Can you send me a server log (at
> least at "info" level, better "debug") to radimrehu...@seznam.cz?

That was exactly the case. We sometimes had three
concurrent indexing processes, each with only 1 very small document; now
we try to collect unindexed documents and then do a batch process.

>
> Btw I am also interested in offering the remote server through a
> RESTful interface, instead of only Pyro. This should be part of the
> next release, where I want to improve the similarity server and also
> change the gensim license (LGPL doesn't cover remote services at all).
> Let me know if you're interested in cooperating on this
> functionality ... for example via github, where gensim source code is
> hosted.

Our REST interface is quite simple so far. We use the Flask Python
web server to offer the REST service itself, and provide Training,
Indexing and Query classes. To allow batch processing, like training
on 1000 documents at once, an SQLite table caches the training and index
sets before anything is actually processed.
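
A caching table of that kind could look roughly like the sketch below. The table name `pending`, the column names and the helper functions are assumptions for illustration, not the code described in this message. In real use the rows should be deleted only after the batch has been indexed successfully.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file path instead to survive restarts
conn.execute("CREATE TABLE pending (doc_id TEXT PRIMARY KEY, text TEXT NOT NULL)")


def enqueue(conn, doc_id, text):
    """Remember a document for later; re-adding an id overwrites the old text."""
    conn.execute("REPLACE INTO pending (doc_id, text) VALUES (?, ?)", (doc_id, text))
    conn.commit()


def drain(conn, batch_size=1000):
    """Remove up to `batch_size` pending documents and return them as a list."""
    rows = conn.execute(
        "SELECT doc_id, text FROM pending ORDER BY rowid LIMIT ?", (batch_size,)
    ).fetchall()
    conn.executemany(
        "DELETE FROM pending WHERE doc_id = ?", [(doc_id,) for doc_id, _ in rows]
    )
    conn.commit()
    return [{"id": doc_id, "text": text} for doc_id, text in rows]


for n in range(5):
    enqueue(conn, "doc_%i" % n, "text of document %i" % n)

batch = drain(conn, batch_size=3)  # would be handed to a single train/index call
remaining = conn.execute("SELECT COUNT(*) FROM pending").fetchone()[0]
```

Incoming requests only ever touch the small table, and the expensive train/index call happens once per batch.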

Regards
Christian

Lars Heuer

Oct 29, 2011, 10:18:03 AM
to Radim
Hi Radim,

[...]


> next release, where I want to improve the similarity server and also
> change the gensim license (LGPL doesn't cover remote services at
> all).

I hope you're not talking about a viral license like (A)GPL? That
would imply that gensim cannot be used by libs/apps which use a more
liberal license.

Best regards,
Lars
--
Semagia
<http://www.semagia.com/>

Radim

Oct 29, 2011, 7:01:35 PM
to gensim
Hi all,

I see a lot of activity on the gensim website that does not translate
into any community feedback and help (hello New York and California!).
But as long as people find gensim useful, it's up to them and their
conscience (LGPL allows both commercial and personal use, as long as
people open-source any modifications and extensions of gensim that
they distribute to other people).

So I'm happy about that activity and I will keep gensim as LGPL.

However, I decided to branch off the similarity server stuff, because
a) its remote nature doesn't make sense under LGPL anyway and b) it is
too high-level to be called a library, it's more a product. In fact, I
already sell that product in my own projects, but I want other people
to be able to experiment and use the server too, for free. What I
don't want is for them to rip it off and resell, abusing my generosity
and the hundreds of hours I spent on gensim.

Therefore, I will be maintaining the similarity server part in a
separate project, **UNDER A DIFFERENT LICENSE**. That license will
most likely be Affero GPL (AGPLv3), so that individuals (students,
teachers etc.) can continue using it with no problems, while
proprietary use is severely limited (you'd have to open source your
entire app that uses it).

So technically, gensim will continue to be developed as it always was.
The new server features that were introduced in 0.8.1 will disappear
though, to re-appear as a separate Python project (called
gensim.simserver, in all likelihood).

If you have particular licensing needs, commercial or not, you can of
course still contact me directly. I am the sole copyright owner of the
similarity server, as well as the vast majority of gensim, so I can
offer you a different license or consulting ... if you give me a good
reason :)


Happy coding,
Radim

Romain Loth

Oct 29, 2011, 8:14:32 PM
to gen...@googlegroups.com
Thanks for doing all this in such a balanced way.

Roman


Dieter Plaetinck

Oct 30, 2011, 3:53:27 PM
to gen...@googlegroups.com
fair enough, good luck Radim!

On Sun, Oct 30, 2011 at 2:14 AM, Romain Loth <rl...@u-paris10.fr> wrote:
> Thanks for doing all this in such a balanced way.
>
> Roman

Trent

Jul 21, 2015, 2:10:30 PM
to gen...@googlegroups.com
Do you still have the more restrictive licensing in place? It looks like this server hasn't had much GitHub activity in the last few years.

Radim Řehůřek

Jul 22, 2015, 5:25:44 PM
to gensim, trent.n...@gmail.com
Yeah, the "public" simserver is dead / has not been maintained for years.

Related requests now go through my consultancy directly, as commercial projects.

Best regards,
Radim

-- 
Radim Řehůřek, Ph.D.
RaRe Consulting Ltd.
R&D in machine learning, natural language processing, data mining