Hi All - I notice Markdown isn't recognised as a format in PRONOM yet. Difficult one, because what signature could be used? And should there be another .md extension? What's the general consensus on these concerns?
--
You received this message because you are subscribed to the Google Groups "droid-list" group.
To unsubscribe from this group and stop receiving emails from it, send an email to droid-list+unsubscribe@googlegroups.com.
To post to this group, send email to droid...@googlegroups.com.
Visit this group at https://groups.google.com/group/droid-list.
For more options, visit https://groups.google.com/d/optout.
The only way to do it efficiently, as far as I can see, is to create a new kind of signature. It's possible to recognise languages by their keywords, and structured and semi-structured text with other heuristics. You also have to recognise the character encoding.
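As a rough illustration of the text-versus-binary and encoding step, here is a minimal Python sketch. It is not DROID code and not the code mentioned in this thread; the function name `sniff_text`, the list of allowed control bytes, and the Latin-1 fallback are all assumptions made for the example.

```python
import codecs

# Byte-order marks, longest first so UTF-32 LE is not mistaken for UTF-16 LE.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# Assumption: control bytes other than tab, LF, FF and CR suggest binary data.
_SUSPECT = frozenset(b for b in range(0x20) if b not in (0x09, 0x0A, 0x0C, 0x0D))


def sniff_text(sample: bytes):
    """Return a guessed encoding name, or None if the sample looks binary.

    `sample` is assumed to be a whole small file or a prefix of a larger one.
    A prefix cut in the middle of a multi-byte UTF-8 character would be
    misreported as Latin-1 by this simple version.
    """
    if not sample:
        return None  # nothing to judge
    for bom, name in _BOMS:
        if sample.startswith(bom):
            return name
    if any(b in _SUSPECT for b in sample):
        return None
    try:
        sample.decode("utf-8")
    except UnicodeDecodeError:
        return "latin-1"  # fallback guess for other 8-bit text, not a detection
    return "ascii" if all(b < 0x80 for b in sample) else "utf-8"
```

A real implementation would need statistical checks for legacy 8-bit and BOM-less UTF-16 encodings; this only shows that text detection and encoding detection can share one pass over the bytes.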
I do have code to recognise text files as opposed to binary, extracting their encoding at the same time. We would also need efficient keyword recognisers, such as the Aho-Corasick search algorithm, to avoid grinding DROID to a halt.
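For anyone unfamiliar with Aho-Corasick, the point is that it matches a whole set of keywords in a single pass over the text, rather than one scan per keyword. A minimal Python sketch follows; the class name, the `count` method and the keyword list in the usage note are illustrative assumptions, not part of DROID.

```python
from collections import deque


class AhoCorasick:
    """Multi-keyword matcher: a trie plus failure links."""

    def __init__(self, keywords):
        self.goto = [{}]   # goto[state][char] -> next state
        self.fail = [0]    # failure link for each state
        self.out = [[]]    # keywords recognised on reaching each state
        for word in keywords:
            self._add(word)
        self._build_failure_links()

    def _add(self, word):
        state = 0
        for ch in word:
            nxt = self.goto[state].get(ch)
            if nxt is None:
                nxt = len(self.goto)
                self.goto[state][ch] = nxt
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
            state = nxt
        self.out[state].append(word)

    def _build_failure_links(self):
        # Breadth-first, so shallower states are complete before deeper ones.
        queue = deque(self.goto[0].values())  # depth-1 states fail to the root
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                # Inherit keywords that end at the failure target (suffixes).
                self.out[nxt] = self.out[nxt] + self.out[self.fail[nxt]]

    def count(self, text):
        """Return {keyword: occurrences}, reading each character once."""
        counts = {}
        state = 0
        for ch in text:
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            for word in self.out[state]:
                counts[word] = counts.get(word, 0) + 1
        return counts
```

Usage would be something like `AhoCorasick(["def ", "import ", "#include"]).count(text)`, with the per-keyword counts feeding whatever scoring a text signature ends up using.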
Definitely something which would be generally useful given how many text file formats there are. Would also be good for the existing HTML and XML binary signatures which are fairly inefficient and inaccurate at the moment.
Cheers
Matt
On a side note, we ran an internal Machine Learning hackathon at The National Archives last week, and one of the teams looked at using ML approaches to attempt to distinguish between different types of source code (just using 'random' content from GitHub as both training and validation data). There were some promising early results, especially given the 2-day time-frame, and we'd like to look at it further....
Markdown was created by John Gruber and Aaron Swartz circa 2004. The purpose of Markdown is to let users write clean text-based documents that do not suffer from the legibility issues of other 'markup' formats.
Markdown uses combinations of characters, for example, hashes (pound-sign), asterisks, and combination of square- and rounded- brackets to prefix or suffix parts of text. The symbols provide instructions to an interpreter. A single hash '#' for example, that prefixes a line of text is an instruction to make that line a top-level header in a formatted document. Two hashes '##' is an instruction to render, or output, a secondary header. And so on.
Ultimately the result of writing markdown is a document that can be parsed into a well-formed version of other presentation languages such as HTML or XHTML.
There is no single specification for Markdown, nor is there a single canonical output. That is, Markdown syntax could be converted into many other file types.
A good description of the background of Markdown and a list of Markdown 'flavors' can be found on the Archiveteam Just Solve It File Formats Wiki:
Wikipedia also provides a thorough description of Markdown and its syntax:
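To show how the syntax described above might feed a heuristic, here is a small Python sketch that counts two of the markers mentioned: lines prefixed with hashes, and the square-bracket plus round-bracket link form. The function name `markdown_hints` and both patterns are assumptions for illustration; since there is no single Markdown specification, they cover only the common cases.

```python
import re

# One to six hashes, a space, then text: the common heading form.
_HEADING = re.compile(r"^#{1,6} \S")
# Inline link: [text](target), with no whitespace inside the target.
_LINK = re.compile(r"\[[^\]\n]+\]\([^)\s]+\)")


def markdown_hints(text):
    """Count Markdown-looking constructs in already-decoded text."""
    lines = text.splitlines()
    headings = sum(1 for line in lines if _HEADING.match(line))
    links = sum(1 for _ in _LINK.finditer(text))
    return {"headings": headings, "links": links}
```

Note that `# like this` is also a comment in shell scripts, Python and many configuration files, so counts like these could only ever be weak evidence, which is part of why a conventional signature is hard to write for this format.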