Parsing search results from rottentomatoes

Miljan Milovanovic

Sep 9, 2014, 6:19:59 AM
to comp...@googlegroups.com
I've been working on a small Compojure project for a while, and I'm having trouble displaying search results from a particular site, in my case rottentomatoes.
The project works as a movie helper with a simple search and an SQL database that serves as a watchlist.

I've tried numerous ways to make this work without success, so I'm hoping someone here can help me out.
This is what the homepage looks like:

(defn view-input []
  (view-layout
    [:h2 "Find your Movie"]
    [:body {:style "font: 14pt/16pt helvetica; background-color: #F2FB78; padding-top:100px; text-align: center"}
     (form-to [:post "/"]
       [:br] [:br]
       (text-field {:placeholder "Enter movie name"} :a) [:br]
       (submit-button "Search"))]))

Here I create the URL, using "a" as the variable that holds the value typed into the "Enter movie name" text field:

(defn create-flick-url [a]  

And this should be the search results page, but it only renders the h2 title and no results:
(defn view-output2 [a]
  (view-layout
    [:h2 "Search results"]
    [:body {:style "font: 14pt/16pt helvetica; background-color: #F2FB78; padding-top:100px; text-align: center"}
     [:form {:method "post" :action "/"}
      (interleave
        (for [flick (flick-vec a)]
          (label :flick_name (:name flick)))
        (for [flick-name (flick-vec a)]
          [:br])
        (for [flick-image (flick-vec a)]
          [:img {:id "img_movie" :src (:image flick-image)}])
        (for [flick-read-link (flick-vec a)]
          (link-to (:read-link flick-read-link) "Read it here!"))
        (for [flick (flick-vec a)]
          [:br]))]]))
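
A note on how interleave behaves in a view like this, as a minimal sketch with hypothetical sample data in place of real flick-vec output. interleave stops as soon as its shortest argument is exhausted, so the form body is empty whenever flick-vec returns no entries. The posted version also calls (flick-vec a) five times, and each call re-runs the page fetches. One possible alternative, assuming plain Hiccup-style vectors instead of the label and link-to helpers, takes the flick maps as an argument and builds each movie's markup in a single for:

```clojure
;; Hypothetical sample data standing in for the result of (flick-vec a).
(def sample-flicks
  [{:name "Alien" :image "http://example.com/alien.jpg" :read-link "/m/alien"}
   {:name "Up"    :image "http://example.com/up.jpg"    :read-link "/m/up"}])

;; interleave yields nothing as soon as one of its arguments is empty.
(def interleaved-with-empty
  (interleave [:a :b] [] [:c :d]))

;; Single pass: one group of elements per movie, in display order.
;; flick-rows is a hypothetical name, not part of the posted project.
(defn flick-rows [flicks]
  (for [{:keys [name image read-link]} flicks]
    (list [:label name]
          [:br]
          [:img {:src image}]
          [:a {:href read-link} "Read it here!"]
          [:br])))

(def rows (flick-rows sample-flicks))
```

Because flick-rows only takes data, it can be checked at the REPL without any network access.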

flick-vec is a function that builds a vector of maps from the parsed segments of the rottentomatoes page:
(defn flick-vec [a]
  (vec (let [flick-url (create-flick-url a)
             flick-names (print-flick-name-content flick-url)]
         (mapper-gen3 flick-names
                      (get-image-content flick-url)
                      (get-all-reading-links flick-url)))))
mapper-gen3 builds a sequence of maps from its three arguments, sorted by name:
(defn mapper-gen3
  [names images read-link]
  (sort-by :name
           (map #(hash-map :name %1 :image %2 :read-link %3)
                names images read-link)))
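
For reference, a minimal sketch of what mapper-gen3 yields. The sample vectors are hypothetical, not real site output. Since map with several collections stops at the shortest one, a single empty input is enough to produce an empty result, which would leave the results page with nothing to render:

```clojure
;; mapper-gen3 as posted, only reformatted.
(defn mapper-gen3
  [names images read-link]
  (sort-by :name
           (map #(hash-map :name %1 :image %2 :read-link %3)
                names images read-link)))

;; Hypothetical sample inputs for illustration.
(def sample-names  ["Up" "Alien"])
(def sample-images ["http://example.com/up.jpg" "http://example.com/alien.jpg"])
(def sample-links  ["/m/up" "/m/alien"])

;; Two maps, sorted by :name, so "Alien" comes first.
(def full-result (mapper-gen3 sample-names sample-images sample-links))

;; One empty input: map stops immediately, so the result is empty.
(def empty-result (mapper-gen3 sample-names [] sample-links))
```

Checking each of the three inputs separately at the REPL shows which one, if any, comes back empty.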

And this is the part of PageParser.clj where I define the functions used in flick-vec:

(defn get-page
  "Gets the html page from passed url"
  [url]
  (html/html-resource (java.net.URL. url)) [:h3 :span :a])

(defn name+search
  "This function returns a sequence of h2 tags,where a.main is parsed from movieweb"
  [url]
  (html/select (get-page url)
               [:h3 (html/attr= :class "nomargin") :a]))

(defn image+search
  "Function that returns a sequence of tags, where img is image parsed from movieweb"
  [url]
  (html/select (get-page url)
               [:span (html/attr= :class "movieposter") :a]))


(defn print-flick-name-content
  [url]
  (vec (flatten (map :content (name+search url)))))

(defn get-image-content
  [url]
  (vec (flatten (map #(re-find #"http.*jpg" %) (map :style (map :attrs (image+search url)))))))
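
A side note on get-image-content, offered as a hedged sketch. re-find throws a NullPointerException when it is given nil instead of a string, which would happen for any selected node that has no :style attribute. Whether the real page produces such nodes cannot be told from the thread. The hypothetical helpers below return nil in that case and keep one entry per node, so positions stay aligned with the names passed to mapper-gen3. The sample nodes are invented for illustration:

```clojure
(defn style->image-url
  "Returns the http...jpg substring of a style string, or nil when the
  style is nil or contains no match. Hypothetical helper."
  [style]
  (some->> style (re-find #"http.*jpg")))

(defn image-urls
  "Maps enlive-style nodes to image URLs, one entry (possibly nil) per node."
  [nodes]
  (mapv #(style->image-url (get-in % [:attrs :style])) nodes))

;; Invented nodes: the second one has no :style attribute.
(def sample-image-nodes
  [{:tag :a :attrs {:style "background-image: url(http://example.com/up.jpg)"}}
   {:tag :a :attrs {:href "/m/alien"}}])

(def urls (image-urls sample-image-nodes))
```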

Anyway, I'll attach the whole files here, but I've listed the functions that I use the most and that are giving me the most trouble. I'm just trying to get the search results from rottentomatoes, after typing into the search field, to look like this:

["movie name"] ["movie image"]
["link to movie review"]
...
...
...and so on
Or anything like that. Thanks everyone in advance!

middleware.clj
PageParser.clj
pages.clj
reorder.clj

James Reeves

Sep 9, 2014, 1:35:25 PM
to Compojure
It might be more useful to dump these files, and the project.clj file, into a repository. That would make it easier to test and access.

What debugging have you done already? Have you tested the function that downloads the data from rottentomatoes? Have you tested the function that parses the data?

When debugging Clojure applications, start by making sure the most low level functions work correctly, then work slowly up to the top layer.

- James
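
Applying that advice to the lowest-level function in the post, get-page, points at one thing worth checking. A Clojure function returns the value of its last body expression, and as posted the last expression in get-page is the literal vector [:h3 :span :a], not the html-resource call. Whether this is the actual cause of the empty page is an assumption. The sketch below uses a hypothetical fetch-stub in place of html/html-resource so that it runs without enlive:

```clojure
;; Hypothetical stand-in for html/html-resource; no network, no enlive.
(defn fetch-stub [url]
  {:fetched-from url})

;; Same shape as the posted get-page: two body expressions.
;; The fetched value is discarded and the vector is returned.
(defn get-page-as-posted [url]
  (fetch-stub url) [:h3 :span :a])

;; Single body expression: the fetched value is returned.
(defn get-page-single-expr [url]
  (fetch-stub url))

(def as-posted   (get-page-as-posted "http://example.com"))
(def single-expr (get-page-single-expr "http://example.com"))
```

Evaluating (get-page some-url) at the REPL and inspecting what comes back is enough to tell the two cases apart.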

Miljan Milovanovic

Sep 9, 2014, 6:27:58 PM
to comp...@googlegroups.com, ja...@booleanknot.com
I tried parsing the data and saw that the variable "a" takes its value from the text field, as it should, but the search results page is blank apart from its title.
I'm still a newbie at this language, but I need to finish this since it's the last exam before my diploma.

Anyway, I already posted a repository on GitHub a few days ago; here's the link: https://github.com/vombat89/movie-search-compojure
I've already written a couple of functions that parse the latest news from other sites and display it properly, so let's say those low-level functions work fine :)

I suspect there's an error in the name+search or image+search functions, because create-flick-url properly creates the URL from the variable "a".

Thanks for the reply, James :)

James Reeves

Sep 9, 2014, 11:33:40 PM
to Compojure
I'm not sure quite how to say this, but the code you have written is extremely difficult to read, to put it mildly.

The first barrier to understanding your code is that the indentation is very inconsistent. Consistent indentation in any programming language is important, but it's vital to get right in Lisps like Clojure.

You also seem to have a lot of code that does nothing, or is ignored, or is commented out. This makes it very difficult to discover which parts of your application are intended to work, and which are discarded works-in-progress.

There's very little separation of concerns in your code. Everything is mixed together. You need to be constructing functions that perform a single task, to isolate them into simple, independent tools.

So, take your application one step at a time. Start by building a function that can query the website you want to pull data from, and return a string of HTML. Test this in a REPL to ensure it works. When it does, write a function that takes the HTML and parses it into a map of the data you want. Again, test this in the REPL to ensure it works. Only when this part is operational should you worry about connecting it up to a website.

- James
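
The parsing step described above can be exercised at the REPL without touching the network. The sketch below assumes enlive-style nodes (maps with :tag, :attrs, and :content) and uses hand-written sample nodes, not real rottentomatoes markup. The function names are hypothetical, and taking the review link from each anchor's :href is an assumption, since get-all-reading-links is not shown in the thread:

```clojure
;; Hand-written sample nodes in the shape enlive returns; not real site data.
(def sample-anchor-nodes
  [{:tag :a :attrs {:href "/m/up"}    :content ["Up"]}
   {:tag :a :attrs {:href "/m/alien"} :content ["Alien"]}])

(defn node->flick
  "Turns one anchor node into a map with :name and :read-link.
  Assumes :content holds only strings."
  [node]
  {:name      (apply str (:content node))
   :read-link (get-in node [:attrs :href])})

(defn nodes->flicks
  "Parses a sequence of anchor nodes into a vector of flick maps sorted by name."
  [nodes]
  (vec (sort-by :name (map node->flick nodes))))

(def parsed (nodes->flicks sample-anchor-nodes))
```

Keeping the parser a pure function of its input means the fetch can be tested on its own first, and the web layer only needs to be connected once both halves return what is expected.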