[boost] [gsoc] Suggestions on Proposal for Boost Document Library Project

Anurag Ghosh

Mar 18, 2015, 10:42:59 AM

to bo...@lists.boost.org

Hello everyone,

I'm Anurag Ghosh, a second-year undergraduate student at IIIT-Hyderabad,
India, and I'm interested in developing the Boost Document Library as part
of the Google Summer of Code 2015 program.

My proposal document can be viewed at
https://github.com/anuragxel/boost-generic-document-library/wiki/Google-Summer-of-Code-2015-Proposal-for-Boost-Document-Library-Development

I would appreciate comments on the project proposal, as I may well have
missed something. I hope a discussion will help me improve it.

I have also developed a working prototype for this project as the
competency test (currently only for the OpenOffice/LibreOffice API). The
code is hosted at
https://github.com/anuragxel/boost-generic-document-library/
Please suggest any changes to the code that you see fit, and I will make
them.

Thanks

Anurag Ghosh
(anuragxel)

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Stefan Seefeld

Mar 18, 2015, 11:07:49 AM

to bo...@lists.boost.org

Anurag,

On 18/03/15 09:59 AM, Anurag Ghosh wrote:
> Hello everyone,
>
> I'm Anurag Ghosh, a second-year undergraduate student at IIIT-Hyderabad,
> India, and I'm interested in developing the Boost Document Library as part
> of the Google Summer of Code 2015 program.
>
> My proposal document can be viewed at
> https://github.com/anuragxel/boost-generic-document-library/wiki/Google-Summer-of-Code-2015-Proposal-for-Boost-Document-Library-Development
>
> I would appreciate comments on the project proposal, as I may well have
> missed something. I hope a discussion will help me improve it.

Having worked a fair bit with documents, and in particular with the
programmatic processing of documents, I believe I can relate to the appeal
of a high-level API to facilitate the manipulation of such documents.

However, I think this requires a bit more thought. For one, I find your
proposal hugely ambitious: I doubt that you can achieve everything you
propose in such a short period.
Second, I don't think an interface to existing office suites is the
right approach to the problem. Rather, I would suggest something based
on existing standard technologies such as XML (and DocBook in
particular), to support the manipulation of structured documents.

Note that last year we had a GSoC project, which I mentored, to advance
the state of a (proposed) Boost.XML library. I believe it's
straightforward to build higher-level APIs on top of that to manipulate
documents on a more "semantic" level, and then leave it to the various
office suites to handle import and export of the chosen format
(LibreOffice and OpenOffice already support DocBook). See
https://github.com/stefanseefeld/boost.xml.
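
To make the idea of a "semantic" layer a little more concrete, here is a
minimal sketch in C++. It does not use Boost.XML or any office suite API;
the names `article`, `section`, `xml_escape` and `to_docbook` are
hypothetical and exist only for this illustration. It assumes a document
is just a title plus sections of paragraphs, and serializes that model
with the DocBook 5 elements `article`, `section`, `title` and `para`.

```cpp
#include <string>
#include <vector>

// Escape the characters that are special in XML text content.
inline std::string xml_escape(const std::string& text) {
    std::string out;
    for (char c : text) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;"; break;
            case '>': out += "&gt;"; break;
            default:  out += c;
        }
    }
    return out;
}

// Hypothetical semantic model: callers think in sections and paragraphs,
// not in XML nodes or office-suite objects.
struct section {
    std::string title;
    std::vector<std::string> paragraphs;
};

struct article {
    std::string title;
    std::vector<section> sections;
};

// Serialize the model as a DocBook 5 article (no pretty-printing).
inline std::string to_docbook(const article& doc) {
    std::string xml =
        "<article xmlns=\"http://docbook.org/ns/docbook\" version=\"5.0\">";
    xml += "<title>" + xml_escape(doc.title) + "</title>";
    for (const section& s : doc.sections) {
        xml += "<section><title>" + xml_escape(s.title) + "</title>";
        for (const std::string& p : s.paragraphs) {
            xml += "<para>" + xml_escape(p) + "</para>";
        }
        xml += "</section>";
    }
    xml += "</article>";
    return xml;
}
```

The resulting string could then be handed to whichever office suite is
available for import, which is the division of labour suggested above.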

Regards,

Stefan

--

...ich hab' noch einen Koffer in Berlin...

Antony Polukhin

Mar 18, 2015, 12:18:20 PM

to boost@lists.boost.org List

2015-03-18 19:07 GMT+04:00 Stefan Seefeld <ste...@seefeld.name>:
<...>

> Second, I don't think an interface to existing office suites is the
> right approach to the problem. Rather, I would suggest something based
> on existing standard technologies such as XML (and DocBook in
> particular), to support the manipulation of structured documents.
>

Without using an existing office suite API, the student would be forced to
rewrite the functionality of OpenOffice from scratch. Working with
spreadsheets is not just parsing documents, but also evaluating functions,
plotting charts and so on. That would be a nightmare, and it would take an
insane amount of time (about 2,190 years of effort
<https://www.openhub.net/p/libreoffice/estimated_cost>).

So the approach of unifying the APIs seems right to me.

--
Best regards,
Antony Polukhin

Stefan Seefeld

Mar 18, 2015, 12:32:32 PM

to bo...@lists.boost.org

On 18/03/15 12:18 PM, Antony Polukhin wrote:
> 2015-03-18 19:07 GMT+04:00 Stefan Seefeld <ste...@seefeld.name>:
> <...>
>
>> Second, I don't think an interface to existing office suites is the
>> right approach to the problem. Rather, I would suggest something based
>> on existing standard technologies such as XML (and DocBook in
>> particular), to support the manipulation of structured documents.
>>
> Without using an existing office suite API, the student would be forced to
> rewrite the functionality of OpenOffice from scratch. Working with
> spreadsheets is not just parsing documents, but also evaluating functions,
> plotting charts and so on. That would be a nightmare, and it would take an
> insane amount of time (about 2,190 years of effort
> <https://www.openhub.net/p/libreoffice/estimated_cost>).
>
> So the approach of unifying the APIs seems right to me.

Well, yes, that's what I meant by "manipulating documents on a
semantic level". I agree, writing this from scratch is wrong. But just
providing a programmatic interface to an office suite seems ill-designed
to me, at least for a project other than LibreOffice itself.

For Boost I think one should at least attempt to build the functionality
on top of a standard document model such as XML/DocBook.

Stefan

--

...ich hab' noch einen Koffer in Berlin...

Anurag Ghosh

Mar 18, 2015, 4:10:04 PM

to bo...@lists.boost.org

Sir

Could you point to the specific parts that you think make the project
hugely ambitious? Is it that I'm proposing too many features (i.e. the
scope is too broad), or that covering the different APIs across different
platforms is too difficult to achieve?

Thanks

Anurag Ghosh

On Wed, Mar 18, 2015 at 10:02 PM, Stefan Seefeld <ste...@seefeld.name>
wrote:

Stefan Seefeld

Mar 18, 2015, 4:32:13 PM

to bo...@lists.boost.org

On 18/03/15 04:03 PM, Anurag Ghosh wrote:
> Sir
>
> Could you point to the specific parts that you think make the project
> hugely ambitious? Is it that I'm proposing too many features (i.e. the
> scope is too broad), or that covering the different APIs across different
> platforms is too difficult to achieve?

Both. From a project planning perspective I would suggest starting with
a single API to bind to, covering small use cases one at a time (creating
or modifying small text documents, creating or editing small
spreadsheets, etc.), rather than a big and generic "expose OOo
functionality as a C++ API".
Once that works you may consider adding support for other backends
(other word processors?).
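
As a rough illustration of that staging, the sketch below covers exactly
one small use case (reading and writing spreadsheet cells) behind one
narrow interface. Every name in it (`spreadsheet_backend`,
`in_memory_backend`, `spreadsheet`, `make_in_memory_spreadsheet`) is
hypothetical and is not taken from the proposal or the prototype. The
in-memory backend is only a stand-in; a real first backend would forward
these calls to a single office suite API.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical backend interface, kept deliberately small: one use case.
class spreadsheet_backend {
public:
    virtual ~spreadsheet_backend() = default;
    virtual void set_cell(const std::string& ref, const std::string& value) = 0;
    virtual std::string get_cell(const std::string& ref) const = 0;
};

// Stand-in backend that stores cells in memory instead of calling an
// office suite. Unset cells read back as an empty string.
class in_memory_backend : public spreadsheet_backend {
public:
    void set_cell(const std::string& ref, const std::string& value) override {
        cells_[ref] = value;
    }
    std::string get_cell(const std::string& ref) const override {
        auto it = cells_.find(ref);
        return it == cells_.end() ? std::string() : it->second;
    }
private:
    std::map<std::string, std::string> cells_;
};

// User-facing type: it depends only on the interface, so a second backend
// can be added later without changing calling code.
class spreadsheet {
public:
    explicit spreadsheet(std::unique_ptr<spreadsheet_backend> backend)
        : backend_(std::move(backend)) {
        assert(backend_ && "a backend is required");
    }
    void write(const std::string& ref, const std::string& value) {
        backend_->set_cell(ref, value);
    }
    std::string read(const std::string& ref) const {
        return backend_->get_cell(ref);
    }
private:
    std::unique_ptr<spreadsheet_backend> backend_;
};

// Convenience factory for the stand-in backend.
inline spreadsheet make_in_memory_spreadsheet() {
    return spreadsheet(
        std::unique_ptr<spreadsheet_backend>(new in_memory_backend()));
}
```

Text documents or further backends would then be added as separate small
steps once this one works end to end.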

But again, I'm not even convinced that this is in scope for Boost.org.
As I said, I would rather see such functionality covered in
application-agnostic terms based on a structured document model. While
still not necessarily in scope for Boost.org, at least it's more driven
by Open Architecture considerations.
