Downloading the textual content of Facebook URLs


Eduardo Ochs

Aug 15, 2014, 12:48:11 AM
to fb...@googlegroups.com
Hi Fbcmd people,

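is there a way to use fbcmd to download the (textual) content of Facebook URLs like these?

https://www.facebook.com/sergio.martins.984991/posts/10152616093738086
https://www.facebook.com/jornalanovademocracia/photos/a.288492381220437.66632.187051701364506/679809862088685/
https://www.facebook.com/permalink.php?story_fbid=921476867869306&id=347772661906399
https://www.facebook.com/photo.php?fbid=10201336092313990&set=a.1569106
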
Something that would output what it got from FB in a raw-ish form - JSON? - would be ideal. Any hints are welcome, as I am trying to write a set of scripts for caching texts posted to FB that I may want to access quickly later... the code that converts URLs to local file names is ready - the URLs above are associated with files with these names,

posts_sergio.martins.984991_10152616093738086
photos_jornalanovademocracia_a.288492381220437.66632.187051701364506_679809862088685
pesfi_921476867869306_347772661906399
photofs_10201336092313990_a.1569106477271.73917.1523735650

but right now the only way I have to put contents into these files - for playing with a prototype - is cut-and-paste between a browser and Emacs... which is not fun.
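
A minimal sketch of such a URL-to-file-name mapping, in Python. The rules are only inferred from the four file names listed above and from the URL forms that appear in this thread; the function name `url_to_filename` and the handling of any other URL shape are assumptions, not the author's actual code.

```python
from urllib.parse import urlparse, parse_qs


def url_to_filename(url):
    """Map a Facebook URL to a flat local file name (rules guessed from examples)."""
    parsed = urlparse(url)
    parts = [p for p in parsed.path.split("/") if p]  # drop empty path segments
    query = parse_qs(parsed.query)

    if parts == ["permalink.php"]:
        # permalink.php?story_fbid=S&id=I  ->  pesfi_S_I
        return "pesfi_{}_{}".format(query["story_fbid"][0], query["id"][0])
    if parts == ["photo.php"]:
        # photo.php?fbid=F&set=S  ->  photofs_F_S
        return "photofs_{}_{}".format(query["fbid"][0], query["set"][0])
    if len(parts) >= 3 and parts[1] in ("posts", "photos"):
        # /USER/posts/ID or /USER/photos/ALBUM/ID  ->  KIND_USER_REST
        return "_".join([parts[1], parts[0]] + parts[2:])
    raise ValueError("unrecognized Facebook URL: " + url)


if __name__ == "__main__":
    print(url_to_filename(
        "https://www.facebook.com/sergio.martins.984991/posts/10152616093738086"))
    # prints: posts_sergio.martins.984991_10152616093738086
```

Raising on unknown URL shapes, rather than guessing a name, keeps the cache from silently filling with files that cannot be mapped back to a URL.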

  Thanks in advance, cheers =),
    Eduardo Ochs



P.S.: here is a similar project that I am working on, which is keeping local copies of videos: http://angg.twu.net/youtube-db/README.html

P.P.S.: I wrote "downloading" but what I really meant was "reading from Facebook and outputting to stdout"...

B. Henry

Aug 21, 2014, 3:58:27 PM
to fb...@googlegroups.com
I have not tried this, but will experiment when I have a chance.
Why not just use wget or something similar to download this content? I think you can filter out images; if not,
just delete the files you do not want from the created folder(s).
--
B.H.
> --
> ---
> You received this message because you are subscribed to the Google
> Groups "fbcmd" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to [1]fbcmd+un...@googlegroups.com.
> For more options, visit [2]https://groups.google.com/d/optout.
>
> References
>
> 1. mailto:fbcmd+un...@googlegroups.com
> 2. https://groups.google.com/d/optout

Eduardo Ochs

Aug 21, 2014, 6:27:45 PM
to fb...@googlegroups.com
A call to wget with my username, password, and a "-U
$MOZILLA_USER_AGENT" could work in theory, but even if the page did
not need any JavaScript to render its initial contents, it would be
hard to parse, some comments would be truncated, etc...

I did try to parse the html for a while, months ago, and also the
output of marking with ctrl-A the whole text of the page displayed in
the browser, and then copying-and-pasting that to an Emacs buffer...
both things were very frustrating, and I was stumbling all the time on
corner cases and on rules that I had to guess. Having the contents of
posts as JSON will probably make things much easier.
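
To illustrate why JSON would be easier to handle than scraped HTML, here is a small Python sketch that turns one post into plain text. The field names ("from", "name", "message", "comments", "data") are an assumption modelled on the general shape of Facebook Graph API post objects; the thread does not show what fbcmd or Facebook would actually return, and the sample data is invented.

```python
import json


def post_to_text(raw_json):
    """Render a post (assumed Graph-API-like JSON) as plain text lines."""
    post = json.loads(raw_json)
    author = post.get("from", {}).get("name", "?")
    lines = ["{}: {}".format(author, post.get("message", ""))]
    # Comments are assumed to live under comments.data, one dict per comment.
    for comment in post.get("comments", {}).get("data", []):
        who = comment.get("from", {}).get("name", "?")
        lines.append("  {}: {}".format(who, comment.get("message", "")))
    return "\n".join(lines)


if __name__ == "__main__":
    # Invented sample, only to show the expected input shape.
    sample = json.dumps({
        "id": "1_2",
        "from": {"name": "Alice"},
        "message": "Hello",
        "comments": {"data": [{"from": {"name": "Bob"}, "message": "Hi"}]},
    })
    print(post_to_text(sample))
```

With structured input there are no truncated comments or layout rules to guess; missing fields simply fall back to defaults.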

By the way: is it possible to load fbcmd's functions from "php5 -a"
and call its functions directly? I am using Debian stable, with php5.5
and php5-readline from http://packages.dotdeb.org/ , and I normally
run interactive programs from Emacs, using this trick here - the demo
starts at 0:16 -


but the interactive mode of "php5 -a" is a bit limited - for example,
autoloads don't work... it would certainly be much easier to just
run these things,

$p_id = '754664537913868';
echo get_json_of_page_from_id($p_id);
echo get_json_of_page_from_url($p_url);

than to also have to implement command-line options for calling these
functions...
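
For comparison, a hedged Python sketch of what a "read an object by id and print its JSON to stdout" helper could look like when talking to the Facebook Graph API directly rather than through fbcmd. The function names `graph_url` and `get_json_of_object` are hypothetical (they only echo the wished-for functions above), the unversioned `https://graph.facebook.com` endpoint is an assumption, and a valid access token with permission to read the object is required; without one the request will fail.

```python
import sys
import urllib.parse
import urllib.request

GRAPH_ROOT = "https://graph.facebook.com"  # assumption: unversioned endpoint


def graph_url(object_id, access_token):
    """Build the request URL for one object (hypothetical helper)."""
    query = urllib.parse.urlencode({"access_token": access_token})
    return "{}/{}?{}".format(GRAPH_ROOT, urllib.parse.quote(object_id), query)


def get_json_of_object(object_id, access_token):
    """Fetch the raw JSON text for one object; raises on HTTP errors."""
    with urllib.request.urlopen(graph_url(object_id, access_token)) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # usage: python fetch.py OBJECT_ID ACCESS_TOKEN
    if len(sys.argv) == 3:
        sys.stdout.write(get_json_of_object(sys.argv[1], sys.argv[2]))
```

Writing the raw response to stdout keeps the helper composable: the output can be redirected into the cache file whose name is derived from the URL.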

  Cheers, TIA &c =),
    Eduardo Ochs

B. Henry

Aug 22, 2014, 3:03:03 PM
to fb...@googlegroups.com
Of course, that makes sense... I was just replying off the top of my head, but I do not think fbcmd will be helpful for this.
> > https://www.facebook.com/sergio.martins.984991/posts/10152616093738086
> > https://www.facebook.com/jornalanovademocracia/photos/a.288492381220437.66632.187051701364506/679809862088685/
> > https://www.facebook.com/permalink.php?story_fbid=921476867869306&id=347772661906399
> > https://www.facebook.com/photo.php?fbid=10201336092313990&set=a.1569106
> > eduar...@gmail.com
> > https://www.facebook.com/eduardo.ochs
> > http://angg.twu.net/