epub from the posts

epub from the posts Attila Lendvai 04/12/13 22:16
hi,

i've downloaded an epub of the lesswrong posts, and was reading it with
great pleasure.

then i saw that there are some broken external links, so i looked at how
it was generated. then i saw that the whole lesswrong site is
opensource, so here i am looking for feedback on the following plans of
mine:

i'm considering implementing a feature that generates ebooks in epub
format from the posts, just like the numerous (?!) crawlers that are
available, but as an integral part of the lesswrong site working from
the database.

http://dato.github.io/lesswrong-bundle/
https://github.com/jb55/lesswrong-print
http://hg.ciphergoth.org/scrape-sequences/
https://github.com/OneWhoFrogs/lw2ebook.git

would this be desirable?

is the database dump available for download? if not, then is there any
way i could work on this feature?

any hints on what to use as an organizational structure? i've just
started to consume the content, so that part is beyond me for a
while... i.e. how many standalone, non-interlinked books can be
generated? maybe just one big epub? what ordering and/or categorization
to use?
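
a minimal sketch of the "one big epub" option, assuming the posts are already available as (title, XHTML fragment) pairs in reading order. `build_epub`, the `postN.xhtml` file names and the `urn:example:...` identifier are hypothetical names for illustration, not anything from the lesswrong codebase. it only shows the container layout (mimetype entry first and uncompressed, `META-INF/container.xml`, an OPF package file); it leaves out the table of contents (NCX or nav document) that a validated epub needs.

```python
import zipfile
from html import escape

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

CHAPTER = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>{title}</title></head>
<body><h1>{title}</h1>
{body}
</body>
</html>
"""

OPF = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>{title}</dc:title>
    <dc:language>en</dc:language>
    <dc:identifier id="bookid">{identifier}</dc:identifier>
  </metadata>
  <manifest>
    {manifest}
  </manifest>
  <spine>
    {spine}
  </spine>
</package>
"""


def build_epub(path, book_title, posts, identifier="urn:example:lesswrong-posts"):
    """Write posts (a list of (title, xhtml_body) pairs) into one epub file."""
    manifest, spine = [], []
    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as z:
        # The mimetype entry has to be the first one and stored uncompressed.
        z.writestr("mimetype", "application/epub+zip",
                   compress_type=zipfile.ZIP_STORED)
        z.writestr("META-INF/container.xml", CONTAINER_XML)
        for i, (title, body) in enumerate(posts):
            name = f"post{i}.xhtml"
            # body is assumed to be well-formed XHTML already; only the title is escaped.
            z.writestr(name, CHAPTER.format(title=escape(title), body=body))
            manifest.append(
                f'<item id="post{i}" href="{name}" media-type="application/xhtml+xml"/>')
            spine.append(f'<itemref idref="post{i}"/>')
        z.writestr("content.opf", OPF.format(
            title=escape(book_title),
            identifier=escape(identifier),
            manifest="\n    ".join(manifest),
            spine="\n    ".join(spine)))
    return path
```

the spine order is simply the order of the `posts` list, so any ordering or categorization scheme reduces to how that list (or several lists, for several books) is built.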

and another idea would be to write a piece of code that checks for
broken links and creates a report, so that admins can deal with the
situation either by linking to archive.org or by looking up the new home
of the content if available. maybe a way to store and present link
alternatives?
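
a rough sketch of such a checker, assuming the post bodies are available as HTML strings keyed by some post id. `LinkExtractor`, `http_status` and `broken_link_report` are made-up names; the `archive_lookup` field is only the Wayback Machine's calendar URL for the dead link, which a human still has to open to see whether a snapshot exists.

```python
from html.parser import HTMLParser
import urllib.error
import urllib.request


class LinkExtractor(HTMLParser):
    """Collect absolute http(s) links from <a href="..."> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.startswith(("http://", "https://")):
                self.links.append(value)


def http_status(url, timeout=10):
    """Return the HTTP status code, or None if the request failed outright."""
    # Assumption: a HEAD request is enough; some servers reject HEAD and need GET.
    req = urllib.request.Request(url, method="HEAD",
                                 headers={"User-Agent": "link-report-sketch"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as e:
        return e.code
    except (urllib.error.URLError, OSError, ValueError):
        return None


def broken_link_report(posts, status_of=http_status):
    """posts maps post id -> HTML body; returns one dict per broken link."""
    report = []
    for post_id, html in posts.items():
        parser = LinkExtractor()
        parser.feed(html)
        for url in parser.links:
            status = status_of(url)
            if status is None or status >= 400:
                report.append({
                    "post": post_id,
                    "url": url,
                    "status": status,
                    # Calendar page listing any archived snapshots of the URL.
                    "archive_lookup": "https://web.archive.org/web/*/" + url,
                })
    return report
```

`status_of` is a parameter so the report logic can be exercised without touching the network, e.g. `broken_link_report(posts, status_of=some_dict.get)`.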

--
• attila lendvai
• PGP: 963F 5D5F 45C7 DFCD 0A39
--
Your conscience never stops you from doing anything. It just stops you from enjoying it.
Re: epub from the posts Attila Lendvai 10/12/13 09:05
is anyone listening who has the authority to answer my inquiry?

i'm asking because i'm trying to decide whether to move on to a
different forum with my questions, or just wait until people catch up
with life.

--
• attila lendvai
• PGP: 963F 5D5F 45C7 DFCD 0A39
--
“A private central bank issuing the public currency is a greater
menace to the liberties of the people than a standing army. [...] We
must not let our rulers load us with perpetual debt.”
        — Thomas Jefferson (1743–1826)
Re: epub from the posts Jeff Schwaber 10/12/13 09:46

You probably want to email the tricycle folks directly. They do respond here sometimes, but not as readily.

Jeff

Re: epub from the posts Matthew Fallshaw 11/12/13 17:40
On Thursday, December 5, 2013 5:16:09 PM UTC+11, Attila Lendvai wrote:

> i'm considering implementing a feature that generates ebooks in epub
> format from the posts…
> would this be desirable?

Implementing this in the code doesn't seem to be significantly better than implementing an independent scraper, and it increases the amount of code we have to maintain. I think this is not a desirable feature.
 

> is the database dump available for download?

No it is not. The db dump is full of secrets like private messages and information that could link public user accounts with personal info. We do not share the dump easily.

> … and another idea would be to write a piece of code that checks for
> broken links and creates a report, so that admins can deal with the
> situation either by linking to archive.org or by looking up the new home
> of the content if available. maybe a way to store and present link
> alternatives?

There are a lot of broken links, and updating them all manually is lots of work.
Rather than implementing this feature in the codebase I again suggest scraping. If you or any of the editors has the appetite for this work I suggest using an existing link checker (eg. http://wummel.github.io/linkchecker/).

… but if you're keen to contribute programming skill to the project, there is lots we'd love help with. Have a look at https://code.google.com/p/lesswrong/issues/list and see if there's anything there you'd like to have a go at, then see https://github.com/tricycle/lesswrong/wiki for an intro to developing the codebase.

(Sorry about the delay responding.)
Re: epub from the posts Attila Lendvai 13/12/13 00:53
>> i'm considering implementing a feature that generates ebooks in epub
>> format from the posts…
>> would this be desirable?
>
> Implementing this in the code doesn't seem to be significantly better than implementing an independent scraper, and it increases the amount of code we have to maintain. I think this is not a desirable feature.


if you exclude the implementer's efforts from the weighting then you're right, it's not significantly better.

i understand your perspective: it would be more LoC in something you maintain, and if you don't see enough value in the feature itself, then it's just extra burden. my perspective is different because i think there's value in a publicly available and automatically generated epub snapshot (i am often at places where there's no internet connection and i have time to read).

  

>> is the database dump available for download?
>
> No it is not. The db dump is full of secrets like private messages and information that could link public user accounts with personal info. We do not share the dump easily.


well, of course. sorry for not being clear enough: what i meant is whether anything exists already that slices out the public part of the database (and potentially offers it as a public backup). i've seen too much useful stuff disappear from the internet not to be slightly worried about this in general. it would also help contributors test their changes.


>> … and another idea would be to write a piece of code that checks for
>> broken links and creates a report, so that admins can deal with the
>> situation either by linking to archive.org or by looking up the new home
>> of the content if available. maybe a way to store and present link
>> alternatives?
>
> There are a lot of broken links, and updating them all manually is lots of work.
> Rather than implementing this feature in the codebase I again suggest scraping. If you or any of the editors has the appetite for this work I suggest using an existing link checker (eg. http://wummel.github.io/linkchecker/).


the same applies again regarding the weighting. i don't care about broken links on the site in general, i only care about broken links in the more valuable content - the posts.

writing a remote scraper that picks the useful content out of the rendered pages is wasted effort compared to going through the database and setting up a cron job, as part of the site, that drops a mail to interested parties every once in a while with the results.
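
to illustrate the database-driven variant, here is a sketch under loud assumptions: a sqlite3 connection and a `posts(id, title, body_html)` table stand in for the real schema (which is not described in this thread), and `compose_link_report` and `find_broken` are hypothetical names. the function only builds the mail; the link checking itself is passed in as a callable.

```python
import sqlite3
from email.message import EmailMessage


def compose_link_report(conn, find_broken, sender, recipients):
    """Build a report mail from the posts table, or return None if all is well.

    conn: sqlite3 connection with a hypothetical posts(id, title, body_html) table.
    find_broken: callable taking a post's HTML and returning a list of broken URLs.
    """
    lines = []
    query = "SELECT id, title, body_html FROM posts ORDER BY id"
    for post_id, title, body_html in conn.execute(query):
        for url in find_broken(body_html):
            lines.append(f"post {post_id} ({title}): {url}")
    if not lines:
        return None  # nothing broken, so the cron run sends no mail
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = f"{len(lines)} broken link(s) in posts"
    msg.set_content("\n".join(lines))
    return msg
```

a periodic job would call this and, when the result is not None, hand it to something like `smtplib.SMTP(...).send_message(msg)`; the schedule and the mail server are deployment details left out here.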

 
> … but if you're keen to contribute programming skill to the project, there is lots we'd love help with. Have a look at https://code.google.com/p/lesswrong/issues/list and see if there's anything there you'd like to have a go at, then see https://github.com/tricycle/lesswrong/wiki for an intro to developing the codebase.


i had gone through those links already, but motivation is a bitch... ;)

i see enough value in a nicely organized and packaged ebook of the posts to motivate me to consider working on it - hence this mail and my research on the past, arguably wasted and duplicated, efforts.

but having to process derived (and for my purposes obfuscated) information, that can change and render all my code obsolete... is weight on the scale, next to the fact that there are already epubs of varying quality available.


> (Sorry about the delay responding.)


no worries, thank you for your time, and for lesswrong in general!

-- 
• attila lendvai
• PGP: 963F 5D5F 45C7 DFCD 0A39
--
“If pigs could vote, the man with the slop bucket would be elected swineherd every time, no matter how much slaughtering he did on the side.”
— Orson Scott Card (1951–)