Mbd Publication Books


Bonny Battaglino

Aug 4, 2024, 7:53:47 PM8/4/24
to deersmandechar
Cambridge University Press publishes a wide range of research monographs, academic reference works, textbooks, books for professionals, and large numbers of books aimed at graduate students. We publish more than 40,000 ebooks for the global market, with 1,500 new titles added on average each year alongside our print publications.

The first volume of two in an updated history of role-playing games, a comprehensive critical analysis of ancient DNA research, a study of architecture in Belgium, and more. Explore these books and a selection of our other new and soon-to-be-released titles for July.


The Food, Health, and the Environment series presents the theories, evidence, and strategies that enable scholars, practitioners, and activists to identify and advance just and resilient food, health, and environmental systems. Titles in the series offer critical analyses of food production, distribution, and consumption, from the global to the local, unmasking the political, economic, cultural, and technological dimensions of existing food systems and illustrating pathways for transformation.


We publish thousands of books, e-book collections, journal articles and key online products each year. Our work as a leading publisher champions the knowledge-maker: serving, connecting and sustaining communities of scholars, instructors, and professionals. Our goal is to ensure their knowledge and expertise makes the fullest possible impact. We are part of Taylor & Francis Group where together we foster human progress through knowledge.


Buy or rent textbooks, learn new subjects and skills on your own, or get the materials to prep for tests and certifications. Our books cover a wide range of topics, from accounting to world languages.


Copyright © 2000-2024 by John Wiley & Sons, Inc., or related companies. All rights reserved, including rights for text and data mining and training of artificial intelligence technologies or similar technologies.


The driving forces shaping the success of books have been studied by various researchers over the years, explaining the role of writing styles [2], critics [3], book reviews [4], awards [5], advertisements [6], social networks [7], and the word-of-mouth effect [8]. However, predicting book success from multiple factors has received much less attention. The only published study in this area focused on book sales in the German market; it applied a linear model [9] and reported limited accuracy.


Similar studies have focused on other cultural products, from music to movies: using online reviews to forecast motion picture sales [10], predicting the success of music and movie products by analyzing blogs [11], and predicting success within the fashion industry using social media such as Instagram [12]. Early prediction of success is of particular importance for cultural products. It has been studied in various papers to address market needs when introducing new products [13], to predict movie box-office success using Wikipedia [14], or to detect promoted social media campaigns [15]. Yet predicting which cultural product will succeed before its release, and understanding the mechanisms behind its success or failure, remains a difficult task.


Readers tend to choose books by authors they have read before or books written by celebrities; they often have a strong preference for specific genres and are more likely to notice well-marketed books. Our features were designed and consolidated with domain experts to capture each of these aspects of book selection.


Clustering of genres under (A) fiction and (B) nonfiction. The results are generated with the K-means algorithm with \(k = 5\) clusters for both fiction and nonfiction. The algorithm uses the number of books and the top median sales for each genre, where the top median sales is the median sales of the top 100 best-selling books in that genre.
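A minimal sketch of this genre clustering, assuming scikit-learn's K-means; the genre names and statistics below are illustrative placeholders, not values from the paper (which clusters the real Bookscan genres with \(k = 5\)):

```python
# Sketch: cluster genres by (number of books, top median sales).
# Data is hypothetical; the paper uses Bookscan genres and k = 5.
import numpy as np
from sklearn.cluster import KMeans

# Each row: (number of books in genre, median sales of its top 100 sellers).
genre_stats = np.array([
    [1200, 50000],   # e.g. "Thriller"
    [900,  48000],   # e.g. "Romance"
    [300,  15000],   # e.g. "Poetry"
    [150,  12000],   # e.g. "Drama"
    [2000, 90000],   # e.g. "General Fiction"
], dtype=float)

# Log-scale both axes so heavy-tailed sales do not dominate the distance.
X = np.log10(genre_stats)

km = KMeans(n_clusters=2, n_init=10, random_state=0)  # paper uses k = 5
labels = km.fit_predict(X)  # cluster index for each genre
```

Log-scaling before clustering is a design choice made here because both axes span orders of magnitude; the paper does not specify its preprocessing.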


We use various distribution descriptors (the mean, median, standard deviation, and the 10th, 25th, 75th, and 90th percentiles; the same hereafter) of book sales within each genre cluster, forming a genre cluster feature group. We form this set of features to quantify the properties of each explored distribution.
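A sketch of the descriptor feature group, assuming `sales` holds the one-year sales of books in a single genre cluster (the function name and sample values are illustrative):

```python
# Compute the distribution descriptors used as a feature group:
# mean, median, std, and the 10th/25th/75th/90th percentiles.
import numpy as np

def distribution_descriptors(sales):
    sales = np.asarray(sales, dtype=float)
    return np.array([
        sales.mean(),
        np.median(sales),
        sales.std(),
        np.percentile(sales, 10),
        np.percentile(sales, 25),
        np.percentile(sales, 75),
        np.percentile(sales, 90),
    ])

feats = distribution_descriptors([100, 200, 300, 400, 1000])  # 7 features
```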


Seasonal fluctuations in book sales. The median one-year sales of the top-selling books published in the same month, from 2008 to 2015. For fiction, sales increase in the summer, and October and November have the highest sales. For nonfiction, the increase in sales over the summer months is not very significant; instead, October has the highest sales.


In Bookscan data, each book is assigned to a publisher and an imprint. In the publishing industry, a publishing house usually has multiple imprints with different missions. Some imprints may be dedicated to a single genre: for example, Portfolio under Penguin Random House publishes only business books. Each imprint independently decides which books to publish and takes responsibility for its editorial process and marketing. Some imprints are more attractive to authors because they offer higher advances and have more marketing resources. Additionally, more prominent imprints tend to be more selective, and books published by those imprints have higher sales.


Book sales follow a heavy-tailed distribution (see Fig. 5), and in general the prediction and regression of such heavy-tailed distributions are challenging [26, 27]. Indeed, the higher-order moments and the variance of heavy-tailed distributions are not well defined, and statistical methods based on assumptions of bounded variance lead to biased estimates. The literature on heavy-tailed regression has developed methods based on prior correction or on weighting data points [28, 29]. However, most regression methods show limited performance in learning nonlinear decision boundaries and underpredict high-selling books. These high-selling books, however, are the most important to publishers, so for them accuracy is the most desirable.


To address the imbalance and heavy-tailed outcome prediction problems, we employed the Learning to Place algorithm [30], which addresses the following problem: given a sequence of previously published books ranked by their sales, where would we place a new book in this sequence, and what sales does this placement imply?


Learning to Place has two stages: (1) learn a pairwise preference classifier that predicts whether a new book will sell more or less than each book in the training set; (2) given the information from stage 1, place the new book in the ordered list of previously published books sorted by their sales. Note that going from pairwise preferences to even a partial ordering, let alone a full ranking, is not trivial: the pairwise preferences may conflict. For example, the classifier might predict that A outsells B, B outsells C, and C outsells A. The majority-vote technique in the second stage is designed to resolve such conflicts by estimating the maximum likelihood of the data. We briefly describe the two main stages of the Learning to Place algorithm below; they are explained graphically in Fig. 6.
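A compact sketch of the two stages, assuming scikit-learn's `RandomForestClassifier` as the pairwise preference model; the data, feature dimensions, and the `place` helper are illustrative, not the paper's implementation:

```python
# Stage 1: pairwise preference classifier; Stage 2: majority-vote placement.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Training books: feature vectors and one-year sales, sorted by sales.
X_train = rng.normal(size=(50, 4))
sales = np.sort(rng.pareto(1.5, size=50) * 1000)  # heavy-tailed, ascending

# Stage 1: train on concatenated pairs (a, b), label = "a sells more than b".
pairs, labels = [], []
for i in range(len(X_train)):
    for j in range(len(X_train)):
        if i != j:
            pairs.append(np.concatenate([X_train[i], X_train[j]]))
            labels.append(int(sales[i] > sales[j]))
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(pairs, labels)

# Stage 2: each training book votes on which side of it the new book falls;
# the new book is placed in the interval with the most votes.
def place(x_new):
    votes = np.zeros(len(X_train) + 1)  # intervals between ranked books
    for j in range(len(X_train)):
        sells_more = clf.predict([np.concatenate([x_new, X_train[j]])])[0]
        if sells_more:
            votes[j + 1:] += 1  # vote for every interval above book j
        else:
            votes[: j + 1] += 1  # vote for every interval at or below book j
    k = int(votes.argmax())
    # Predicted sales: midpoint of the winning interval's sales range.
    lo = sales[k - 1] if k > 0 else sales[0]
    hi = sales[k] if k < len(sales) else sales[-1]
    return (lo + hi) / 2

pred = place(rng.normal(size=4))
```

The conflicting-preference example in the text (A > B > C > A) is exactly what the vote tally absorbs: each book contributes one vote regardless of cycles, so a single placement always emerges.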


Learning to Place flowchart explanation. Training phase: create pairwise feature concatenations for all book pairs in the training set and train the Random Forest classifier on the pairwise preferences. Testing phase: (a) predict pairwise preferences between the new book and all books in the training set using the trained Random Forest classifier; (b) place the new book in the given sequence of training-set books ranked by sales. To obtain the predicted sales for the new book, we simply take the interval with the most votes and use the average of this interval as the predicted sales.


Linear Regression We compare the Learning to Place method with linear regression. We observe that most features we explored are heavy-tail distributed, as are the one-year sales. Therefore, we take the logarithm of our dependent and independent variables, obtaining the model:
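The model equation itself did not survive extraction. A standard log-log specification consistent with the surrounding description (a reconstruction, not necessarily the paper's exact notation) would be:

```latex
\log(\text{sales}) = \beta_0 + \sum_{i=1}^{n} \beta_i \log(x_i) + \epsilon
```

where \(x_i\) are the features, \(\beta_i\) the fitted coefficients, and \(\epsilon\) the error term.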


K-Nearest Neighbors (KNN) We employ regression based on k-nearest neighbors as an additional baseline model. The target variable is predicted by local interpolation of the targets of the nearest neighbors in the training set. We use the same feature transformation as in the linear regression model, a Euclidean distance metric between instances, and five nearest neighbors (\(k=5\)).
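A minimal sketch of this baseline, assuming scikit-learn's `KNeighborsRegressor`; the synthetic heavy-tailed data stands in for the real features and sales:

```python
# KNN regression baseline on log-transformed features and targets.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = np.log1p(rng.pareto(2.0, size=(100, 3)) * 100)  # heavy-tailed features
y = np.log1p(rng.pareto(1.5, size=100) * 1000)      # log one-year sales

knn = KNeighborsRegressor(n_neighbors=5, metric="euclidean")
knn.fit(X, y)
pred_log = knn.predict(X[:2])    # prediction in log space
pred_sales = np.expm1(pred_log)  # back-transform to raw sales
```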


Neural Network The above two baselines do not capture nonlinear relationships between the features, so we use a simple multilayer perceptron with one hidden layer of 100 neurons as another baseline. The features are preprocessed in the same fashion as in Linear Regression.
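A sketch of this baseline, assuming scikit-learn's `MLPRegressor` with a single hidden layer of 100 neurons; the data is again synthetic:

```python
# MLP baseline: one hidden layer of 100 neurons, log-transformed data.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = np.log1p(rng.pareto(2.0, size=(200, 3)) * 100)
y = np.log1p(rng.pareto(1.5, size=200) * 1000)

mlp = MLPRegressor(hidden_layer_sizes=(100,), max_iter=2000, random_state=0)
mlp.fit(X, y)
pred = mlp.predict(X[:3])
```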


To test the model, we use k-fold cross-validation [32, 33] with \(k = 5\), applying the evaluation methods to the test sample of each fold. For evaluation, we choose not to use the classic \(R^2\) score: book sales are heavy-tail distributed, and we are more interested in the error in log space. \(R^2\) is not well defined in log space because the error does not follow a Gaussian distribution, the basic assumption behind \(R^2\). The performance measures are as follows:
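The cross-validation loop can be sketched as follows, assuming scikit-learn's `KFold`; the model and per-fold error metric here (mean absolute error) are illustrative stand-ins:

```python
# 5-fold cross-validation with a per-fold evaluation.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

fold_errors = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True,
                                 random_state=0).split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    resid = model.predict(X[test_idx]) - y[test_idx]
    fold_errors.append(np.mean(np.abs(resid)))  # per-fold mean absolute error
```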


AUC and ROC: evaluate the ranking obtained through the algorithm directly against the true ranking. We take the true value of each training instance as a threshold and binarize every predicted and target value against it. From these two binarized lists, we compute the true positive rate (TPR) and the false positive rate (FPR) for that threshold. Sweeping the threshold across high- and low-sale books yields the points of the ROC (Receiver Operating Characteristic) curve, from which we calculate the AUC (Area Under the Curve) score (see Additional file 1).
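The threshold sweep above can be sketched as follows; the `roc_points` helper and the sample values are illustrative, and the AUC would then be the area under the curve these points trace:

```python
# ROC points from threshold-based binarization of predictions and targets.
import numpy as np

def roc_points(y_true, y_pred):
    points = []
    for t in sorted(y_true):           # each true value is a threshold
        true_bin = y_true >= t
        pred_bin = y_pred >= t
        tp = np.sum(pred_bin & true_bin)
        fp = np.sum(pred_bin & ~true_bin)
        fn = np.sum(~pred_bin & true_bin)
        tn = np.sum(~pred_bin & ~true_bin)
        tpr = tp / (tp + fn) if (tp + fn) else 0.0
        fpr = fp / (fp + tn) if (fp + tn) else 0.0
        points.append((fpr, tpr))
    return points

y_true = np.array([10.0, 50.0, 200.0, 1000.0, 5000.0])
y_pred = np.array([20.0, 40.0, 300.0, 900.0, 4000.0])
pts = roc_points(y_true, y_pred)  # one (FPR, TPR) point per threshold
```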
