2005 Best Hits

Dimple Belousson

Aug 3, 2024, 4:35:50 PM
to joystourphohof

Forget BLAST. Use MMseqs2's easy-rbh module. The command is a simple one-liner, and you get the results in a BLAST-style output table. Here's a link to the relevant documentation: -best-hit-using-mmseqs-rbh.

Thank you for your comment. However, I have to start from two FASTA files: the first contains the reference genome of one species, and the second (the subject) contains a small fraction of the genome of another species (this second file is a multi-FASTA). I would like to identify the RBHs for the subject genes. How can I start the pipeline from these data?

i) Amino acid sequences are better conserved, so in theory, yes. But this really depends on how sensitive you want the search to be and what you're looking for. I think for the generic use-case of trying to identify orthologs between two genomes, protein sequences work really well.

ii) There is no such thing as an RBH value, at least not that I know of. It's just two searches, with the queries and targets swapping places for the second search. You just get an e-value like you get with a regular MMseqs2/BLAST search. I assume you ran the search without specifying an output format (via --format-output), so it should have defaulted to the BLAST tabular format. The output columns in this case correspond to query, target, fident, alnlen, mismatch, gapopen, qstart, qend, tstart, tend, evalue, bits. In this example here, the e-value appears to be 0.000E+00.
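To make that column order concrete, here is a minimal sketch of reading one line of the tabular output into named fields. The `Hit` type, the `parse_hit` helper, and the sample line are hypothetical names and data made up for illustration; only the column order comes from the list above, and the sketch assumes the fields are tab-separated.

```python
from typing import NamedTuple

# Column order as listed above for the default BLAST-style tabular output.
COLUMNS = ("query", "target", "fident", "alnlen", "mismatch", "gapopen",
           "qstart", "qend", "tstart", "tend", "evalue", "bits")


class Hit(NamedTuple):
    """One alignment row (hypothetical container, not part of MMseqs2)."""
    query: str
    target: str
    fident: float
    alnlen: int
    mismatch: int
    gapopen: int
    qstart: int
    qend: int
    tstart: int
    tend: int
    evalue: float
    bits: float


def parse_hit(line: str) -> Hit:
    """Split one tab-separated result line into typed, named fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} columns, got {len(fields)}")
    return Hit(
        fields[0],
        fields[1],
        float(fields[2]),
        *(int(f) for f in fields[3:10]),  # alnlen through tend
        float(fields[10]),                # e-value, e.g. "0.000E+00"
        float(fields[11]),                # bit score
    )


# Made-up example row, not real search output.
example = parse_hit("geneA\tgeneB\t0.985\t200\t3\t0\t1\t200\t5\t204\t0.000E+00\t410")
```

With fields named like this, `example.evalue` and `example.bits` can be compared directly instead of counting columns by hand.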

And well, with BLAST you'd have to manually do everything easy-rbh has automated for you, so yes it would be more complicated. And it'll also be significantly slower. Tools like MMseqs2 have been developed specifically with the objective of superseding BLAST in mind.
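For a rough idea of what "manually" means here, the sketch below pairs up the results of two searches (A against B, then B against A) and keeps only the pairs that are each other's top hit. This is an illustration under assumptions, not how easy-rbh is implemented: ranking hits by bit score alone is my assumption, and the function names and the toy hit lists are hypothetical.

```python
def best_hits(hits):
    """Map each query to the target with its highest bit score.

    `hits` is an iterable of (query, target, bits) tuples.
    Ties keep the first hit seen (an arbitrary choice for this sketch).
    """
    best = {}
    for query, target, bits in hits:
        if query not in best or bits > best[query][1]:
            best[query] = (target, bits)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Toy data: (query, target, bit score). Not real search results.
a_vs_b = [("a1", "b1", 200), ("a1", "b2", 150), ("a2", "b2", 90)]
b_vs_a = [("b1", "a1", 198), ("b2", "a1", 140)]
pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
```

In the toy data, a2's best hit is b2, but b2's best hit is a1, so that pair is not reciprocal and gets dropped.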

A greatest hits album or best-of album is a type of compilation album that collects popular and commercially successful songs by a particular artist or band.[1] While greatest hits albums are typically supported by the artist, they can also be created by record companies without express approval from the original artist as a means to generate sales.[2] They are typically regarded as a good starting point for new fans of an artist, but are sometimes criticized by longtime fans as not inclusive enough or necessary at all.[3]

It is also common for greatest hits albums to include new recordings, remixes or unreleased alternate takes of the hit songs, plus other new material as bonus tracks to increase appeal for longtime fans (who might otherwise already own the recordings included). At times, a greatest hits compilation marks the first album appearance of a successful single that was never attached to a previous studio album. Greatest hits albums are usually released after an artist's or band's contract with a major label ends, or after the artist has been dropped or has died, with subsequent releases following on new labels.

The first greatest hits album was Johnny Mathis's Johnny's Greatest Hits, released in 1958.[4] The album collected eight of Mathis's charting singles, as well as three non-charting B-sides and an altogether new track. The album spent three weeks at the number one spot on Billboard's Best Selling Pop LP's chart. The greatest hits album format then gained popularity in the 1960s and 1970s among American and British rock and pop artists. One notable example was the Beach Boys' 1974 album Endless Summer, which was certified 3× platinum by the Recording Industry Association of America. It propelled them from an opening act for Crosby, Stills, Nash & Young to headlining their own tour in just a matter of weeks. Some artists were even popular enough to release multiple greatest hits albums during and after their career.

Greatest hits compilations were sometimes also released as 4-track 7" vinyl EPs. In the late 1960s, EMI Sweden released a series of greatest hits-EPs featuring artists such as The Supremes, Ray Charles and Louis Armstrong.[5]

By the 1990s, greatest hits albums were common for popular artists, with some artists even releasing the greatest hits album as a music video collection concurrently with the album. It also became a commercially viable option to boost popularity for artists with dwindling careers. Some bands refuse to release a greatest hits album, such as rock groups AC/DC, Tool, and Metallica. Garth Brooks had initially refused releasing one, but he eventually agreed to it in 1994 for a limited release[6] (the resulting record, The Hits, sold over ten million copies).

In 2000, Sony Music Entertainment launched their The Essential series, which collects singles and other career-defining tracks of artists licensed to Sony. The Essential Bob Dylan was the first in the series, and the company has since released dozens of albums in the series with other artists under their label. In addition to artist-specific collections, the series has also released genre-specific and themed albums, such as The Essential Christmas (collecting pop and rock covers of Christmas songs) or The Essential Australian Rock (collecting a specific regional output). In 2005, Universal Music Group launched a similar line, Gold, which collects artists' greatest hits onto two discs.

In the late 2000s and 2010s, digital downloads and music streaming services increased in popularity, which allow users to listen to their favorite tracks without the need of a greatest hits package. In 2016, Pitchfork said that "in the digital era, once a catalog enters a streaming service or an MP3 store, there's no need for a reissue and, therefore, there's no reason for a label to mine the vaults, searching for old music to make new again. Users can assemble their own personalized greatest hits playlists or just scan through an act's most accessed songs", which has led to greatest hits collections becoming redundant.[7]

Despite the popularity of streaming in the 2010s and early 2020s, some artists continued to issue physical greatest hits albums, including the White Stripes, Spoon, and the Weeknd.[8][9][2] Spoon lead singer Britt Daniel said he chose to compile 2019's Everything Hits at Once: The Best of Spoon out of an affinity for compilations such as Standing on a Beach by the Cure and Substance 1987 by New Order, which had introduced him to those artists in his youth, and to provide an official introduction to Spoon's catalog for new listeners.[2] Alex Kapranos of Franz Ferdinand echoed those sentiments when describing the decision to release the band's 2022 Hits to the Head compilation, stating that "I have friends who believe you're somehow not a 'real' fan if you own a best of rather than a discography. I disagree. I think of my parents' record collection as a kid. I loved their compilation LPs. I am so grateful that they had Changes or Rolled Gold. Those LPs were my entrance point. My introduction."[10]

The concept of greatest hits compilations has been adapted to other media as well. In television, some shows have released compilations of their critically successful and highest-rated episodes to drive new viewers to watch a program, such as Family Guy's Freakin' Sweet Collection and South Park: The Hits. Several video game companies have re-released popular games for continued sales, sometimes with discounted prices: Sony's PlayStation has released games under their Greatest Hits series; Nintendo has re-released games under the Nintendo Selects label (formerly called "Player's Choice"); and Microsoft has re-released games under the Platinum Hits label. Some video game franchises have released greatest hits collections of their own content, such as Super Mario All-Stars, Sonic Mega Collection, and Guitar Hero Smash Hits.

Now, I have two questions: given my command, is every BLAST hit a best hit, or should I also look at other parameters, like identity and alignment length? Sharing the factors you use to select the best hit would be highly appreciated.

Sort by 1. query name, 2. bitscore, 3. evalue, 4. nucleotide identity, and extract the best line for each query (bitscore more important than evalue, evalue more important than nucleotide identity). I've been using en_US.UTF-8 locale, but I think this should also work with C (and be somewhat faster).
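The same ordering can be sketched in Python. This assumes tab-separated standard outfmt 6 columns (query name in column 1, percent identity in column 3, e-value in column 11, bit score in column 12), with any extra fields after those ignored. The function name and the sample rows are hypothetical.

```python
def best_hit_per_query(lines):
    """Keep the top-ranked row for each query name.

    Rows are ordered by query name, then bit score (high first),
    then e-value (low first), then percent identity (high first).
    """
    rows = [line.rstrip("\n").split("\t") for line in lines if line.strip()]
    rows.sort(key=lambda r: (r[0], -float(r[11]), float(r[10]), -float(r[2])))
    best = {}
    for row in rows:
        # After sorting, the first row seen for a query is its best hit.
        best.setdefault(row[0], row)
    return list(best.values())


# Made-up rows in outfmt 6 column order, not real BLAST output.
sample = [
    "q1\ts1\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-30\t150",
    "q1\ts2\t95.0\t100\t5\t0\t1\t100\t1\t100\t1e-30\t150",
    "q1\ts3\t99.0\t50\t0\t0\t1\t50\t1\t50\t1e-10\t80",
    "q2\ts4\t80.0\t60\t12\t0\t1\t60\t1\t60\t1e-5\t60",
]
top = best_hit_per_query(sample)
```

Here s1 and s2 tie on bit score and e-value for q1, so the higher identity of s2 decides it. s3 has the highest identity but a much lower bit score, so it loses.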

Standard outfmt 6 plus some extra fields after that. I'm guessing it's not really working if you used blastx (instead of, IMO, the far superior protein prediction + blastp approach), meaning that all your query names from the same contigs (or whatever) are identical. Tabular blastx output is pretty much unsuitable for automated sorting, unless the input is short reads and you're not really expecting more than 1 protein per query sequence.

Just for clarification: as I used -max_target_seqs 20 in my BLAST command, I expected that all hits were best hits, but using the suggested command I got about 27,000 best hits out of 32,000 hits. Could you please explain this difference? Sorry for this question; I'm new to this field and it may be a stupid question from your professional point of view! Thanks

This was a chance to correct years of neglect on my part, and take a best-hits tour of this 526-acre park. It would also be the perfect place to hide the fact that I would be walking around like a tourist, stopping every few feet to study my guidebook.

A few times, I got a little lost, or unsure of whether I was headed in the right direction. But it was hard to get too frustrated, considering the payoffs at each turn, like an in, out and around trip through the Ravine, which passes the serene Ambergill Falls. (You mean there are waterfalls in Brooklyn?!)

Another fascinating sight was the Quaker Cemetery, established in 1849, 24 years before Prospect Park. It is gated off and open only to Quakers and the relatives of Quakers buried there, like Montgomery Clift (which this history of Prospect Park also reveals). Provided you are still a card-carrying member of this faith, you can be buried here.
