Version History
===============

cognition/0.1-alpha1 (2008-02-15) :- Initial release.
  * initial release
  * metadata: , , , @role, eRDF
  * eRDF does not support rdf:type syntax
  * RFC 2731 is supported for namespaces
  * microformats: hcard, hcalendar, adr, geo
    - hcalendar support assumes page is one giant calendar
    - no support for rel-tag, so no support for categories in hcard or hcalendar
    - geo support includes body, altitude and reference-frame extensions
    - microformats patterns: include-pattern, abbr-pattern, extensions
      + include-pattern supports my alternative syntax
      + abbr-pattern supports Andy Mabbett's alternative
  * RDF output of namespaced metadata

cognition/0.1-alpha2 (2008-02-20) :- Stop using XML::XPath; support for @xmlns; support hCalendar, rel=tag, rel=license, figure, XOXO; parse document structure from headings.
  * drop usage of XML::XPath module, using XML::DOM instead
    - might use XML::DOM::XPath in future if XPath support is needed
  * support XML namespaces used as metadata namespaces.
  * microformats: hcalendar (complete), rel-tag, rel-license, figure, xoxo
    - rel-license extended to support searches for 'license' in CC or DCTERMS namespaces; or 'rights.license' in DC or DCTERMS namespaces
    - experimental figure microformat based on current brainstorming
  * parse document structure (headings + semantic tables + semantic images/figures microformat? + xoxo lists)

cognition/0.1-alpha2.1 (2008-02-21) :- Bugfixes.
  * Fix handling for entities.
  * Fix delay on LWP::RobotUA.

cognition/0.1-alpha3 (2008-03-01) :- Use GNOME XML library; support for CURIEs; use RDF triples to internally represent data; RDFa support!
  * Switch from XML::DOM to XML::LibXML. Should be my last big parser change!
  * Restructure object to be more tuple-like.
  * URLs:
    - Support for CURIEs.
    - support for geo: and tag: URIs
    - use XPointer to provide URLs for document fragments without identifiers
  * RDF:
    - use <rdf:Bag> to wrap multiple tuples with the same subject and property
    - Remove duplicate values within bags
    - add support for microformats to RDF output
    - RDF subjects may have multiple URIs defined to help match up properties that actually belong to the same subject (e.g. some properties might be attached to a fragment identifier, and others to an hcard, but if we know that the hcard root element has an id attribute which matches the fragment identifier, then we can equate the subjects)
    - support "vocabularies" for RDF
    - convert document structure to RDF <http://purl.org/dc/terms/hasPart>, <http://purl.org/dc/terms/isPartOf>.
  * Improve STRINGIFY to prevent all these leading and trailing spaces
  * Recognise (X)HTML predefined link types and put them in XHTML namespace.
  * More reliable support for namespaces.
  * Microformats:
    - Properly parse DateTimes found in microformats.
    - support table cell header pattern
    - support hcalendar 1.1 draft
  * Complete support for RDFa
  * Much improved support for eRDF, support rdf:type. Any bugs?
  * Improved support for XHTML role attribute

cognition/0.1-alpha4 (2008-03-07) :- Rudimentary GRDDL; better charset handling; better support for tag soup.
  * Support rel=meta: retrieve additional document metadata, parse as RDF
  * GRDDL:
    - Beginnings of GRDDL support.
    - Support for rel=transformation linking to XSLT to transform doc to RDF
    - Support for grddl:transformation="" style transformations.
    - No support for <head profile> yet.
  * Microformats:
    - Table cell header pattern has been changed on wiki. Implement changes.
    - Better microformat nesting handling.
  * Improvements in charset handling and support for tag-soup HTML.
  * Comment out pre-RDFa <link rel>, <a rel> support. It's not really useful.
  * Disable eRDF by default as it seems to generate too many false positives.
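The alpha3 RDF changes listed above (wrapping values that share a subject and property in an <rdf:Bag>, and removing duplicates within a bag) can be illustrated with a short sketch. Cognition itself is Perl, and this is not its output code: it is a minimal Python illustration assuming triples are plain (subject, predicate, object) tuples with literal string objects and predicates already written as QNames such as "foaf:nick". The names group_values and property_xml are hypothetical.

```python
from xml.sax.saxutils import escape


def group_values(triples):
    """Group (subject, predicate, object) triples by (subject, predicate).

    Duplicate objects are dropped, keeping the order in which values were
    first seen (dicts preserve insertion order in Python 3.7+).
    """
    groups = {}
    for subject, predicate, obj in triples:
        values = groups.setdefault((subject, predicate), [])
        if obj not in values:
            values.append(obj)
    return groups


def property_xml(qname, values):
    """Render one RDF/XML property element for a list of literal values.

    A single value is written directly; several values are wrapped in an
    rdf:Bag with one rdf:li element per value.
    """
    if len(values) == 1:
        return f"<{qname}>{escape(values[0])}</{qname}>"
    items = "".join(f"<rdf:li>{escape(v)}</rdf:li>" for v in values)
    return f"<{qname}><rdf:Bag>{items}</rdf:Bag></{qname}>"


if __name__ == "__main__":
    triples = [
        ("#alice", "foaf:nick", "ally"),
        ("#alice", "foaf:nick", "al"),
        ("#alice", "foaf:nick", "ally"),  # duplicate value, dropped
        ("#alice", "foaf:name", "Alice Example"),
    ]
    for (subject, predicate), values in group_values(triples).items():
        print(subject, property_xml(predicate, values))
```

The enclosing rdf:Description, namespace declarations and non-literal objects are left out to keep the sketch short.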
cognition/0.1-alpha5 (2008-03-16) :- vCard export; KML export; improved command-line client; support commented-out RDF in (X)HTML.
  * Various minor improvements to hCard and hCalendar parsing.
  * Export framework:
    - Add vCard export option.
      + Parses data: URIs and outputs as base64 embedded data.
      + Pulls in data from full gamut of supported semantics, so that, say, RDFa FOAF data may end up as part of the vCard output.
      + Test input: <http://examples.tobyinkster.co.uk/hcard>.
    - Add KML export option.
      + Data can come from hCard, (e)RDF(a) vCard, (e)RDF(a) GeoRSS, etc.
  * Re-enabled eRDF by default, but eRDF parsing is now stricter. It *requires* a profile of <http://purl.org/NET/erdf/profile> to be found on the <head> element.
  * Improved command-line client. Use Getopt::Long, Pod::Usage.
  * Support RDF embedded in HTML <!-- comments -->. (Trackback uses this.)

cognition/0.1-alpha6 (2008-03-29) :- Profile URIs; Support for hAtom; Improved GRDDL; Atom and iCalendar output; Improved stringification.
  * Microformats:
    - Add option (disabled by default) to require <head profile> for microformat support. Microformat profiles are treated as OPAQUE STRINGS! Supports the following profiles:
      + http://purl.org/uF/2008/03/
      + http://www.w3.org/2006/03/hcard or http://purl.org/uF/hCard/1.0/
      + http://dannyayers.com/microformats/hcalendar-profile or http://purl.org/uF/hCalendar/1.0/
      + http://purl.org/uF/hAtom/0.1/
      + http://purl.org/uF/rel-tag/1.0/
      + http://purl.org/uF/rel-license/1.0/
      + No profiles required for rel-enclosure, adr or geo (yet).
    - Support for hAtom, WebSlices.
      + In addition to hAtom 0.1, rel-enclosure is supported within hEntries.
    - Improve include-pattern support to prevent some infinite loops.
  * GRDDL:
    - Add option (disabled by default) to require <head profile> for GRDDL.
    - Add option to check profile URLs for profileTransformation links.
  * Export:
    - Atom output. (Supports RDF/RSS and hAtom as input.)
    - iCalendar export option.
      + hCalendar 1.1 events.
      + hCalendar 1.1 todo items
      + hCalendar 1.1 freebusy info.
      + hCalendar 1.1 alarms.
      + hAtom entries (as VJOURNAL).
      + W3C's iCal RDF vocab (but see note in Cognition/Export/Calendar.pm)
      + RSS Event Module <http://web.resource.org/rss/1.0/modules/event/>
  * Added a "--nofollow" option to prevent secondary fetching from particular hosts. (Secondary fetching = requesting <head profile>, <link rel="meta">, <link rel="transformation">.)
  * Support <rdf:RDF> elements found directly in (X)HTML.
  * Much improved HTML->Text conversion. Namely: word wrapping, line breaks added after block elements, quote marks around <q> elements, bullet points and numbers before <li> elements in unordered and ordered lists, brackets around superscript text, parentheses around subscripts, tab characters between table cells, Usenet-style quoting for <blockquote>, alt text from <img> and <input type="img">, values from other <input> tags. Should be able to handle nested elements like //ul/li/ol/li/dl/dd/blockquote/img[@alt]. Won't be completely foolproof, but should be an improvement over what was there before!
  * Fix so that the entire page is not given an rdf:type of ical:vcalendar unless it contains some bona fide vevent/vtodo/valarm/vfreebusy nodes.

cognition/0.1-alpha7 (2008-04-21) :- hCard extensions using vCard 4.0; XFN support; jCard export; RDF/XML output is refactored; RDF/JSON export; improved @lang handling; BNodes.
  * Set '_xmllang' attribute on all elements, a la '_xpath'.
  * Microformats:
    - hCard:
      + Rename date-of-death "dday", and implement other properties from vCard 4.0 draft <http://www.ietf.org/internet-drafts/draft-resnick-vcarddav-vcardrev-01.txt>.
      + Empty TEL, EMAIL and IMPP no longer parsed. (e.g. telephone numbers with usages but no actual number.)
      + Automatically detect the representative hCard and contact hCard. <http://microformats.org/wiki/representative-hcard>
    - hCalendar:
      + support rel="vcalendar-(parent|sibling|child)" and class="related-to".
      + support implicit relationships gleaned from nesting.
      + Explicitly set RDF datatype for integers.
      + Better support for vfreebusys.
      + @title on root element parsed as dc:title.
      + Support x-wr-calname/x-wr-caldesc/calscale/prodid/method.
    - XFN: <http://microformats.org/wiki/xfn-to-foaf>.
  * Exports:
    - Cognition::Export::findSubject - I won't go into an explanation of why this is important, but it is.
    - jCard export.
    - vCard improvements:
      + Set TYPE parameter when ENCODING=b.
      + Output vCard 4.0 properties. Detect instant messaging protocols which have been forced into the URLs and output them as IMPP properties.
    - iCalendar improvements:
      + Set TYPE parameter when ENCODING=b.
      + Add RELATED-TO properties.
      + Support X-WR-CALDESC/CALSCALE/PRODID/METHOD/VERSION.
      + Big improvements for ATTENDEE/CONTACT/ORGANIZER.
    - RDF output no longer handled by HTMLParser -- it is in an Export module:
      + Output RDF datatypes (e.g. <http://www.w3.org/2001/XMLSchema#date>).
      + Output xml:lang where we can.
      + s/rdf:Description/FOO/ where FOO is the rdf:type.
      + Improved output for rdf:XMLLiterals.
      + Instead of <foo:bar rdf:nodeID="X">, nest the RDF description for X.
    - RDF JSON <http://n2.talis.com/wiki/RDF_JSON_Specification> export.
  * RDFa:
    - RDFa DTD has s/instanceof/typeof/. Cognition supports both (for now), but prefers @typeof. Fixed this attribute to allow whitespace-delimited list of (CURIE|URI)s.
    - In accordance with RDFa rules, drop resolution of absolute URIs from relative URIs specified in @xmlns. This actually makes parsing dumber, but it's in the recommended algorithm.
    - Improved parsing of rdf:XMLLiterals.
    - Extension to RDFa: @title parsed as rdfs:label.
  * When parsing and outputting dates, retain "resolution".
  * Create a data type, Cognition::MagicString, used in place of strings in many places, which retains the language and XML representation of a string. MagicString-aware code can then pick up this data and use it if required.
    Non-MagicString-aware code should usually be able to treat the MagicString as if it were a string, and not notice any difference, as MagicString overloads the stringify function.
  * More improvements to STRINGIFY:
    - Better algorithm for inserting whitespace between CDATA and inline element nodes. Should prevent words from accidentally running together.
    - Implement @start and @type for lists. For unordered lists, disc markers are implemented as asterisks, circle markers as hyphens, and square markers as plus signs. (Much like the markers used in this ChangeLog.) For ordered lists, Roman numeral markers work up to 3999, and alphabetical markers up to 26 -- after that, the list will revert to numeric markers.
    - Better support for microformats "value excerpting".
    - Stringify now takes care of value excerpting and the ABBR pattern.
  * Better HTML->XHTML conversion routine.
  * Better framework for namespaces. Old system didn't handle scoped namespaces (e.g. xmlns attribute on a non-root element).
  * Introduce a BNode concept into the Cognition RDF model. Stored in the RDF triple store with dummy URIs like <bnode:///string>. This pretty much eliminates those ugly XPointers which littered the RDF output previously. As a deliberate change, <div class="vcard vcalendar"> will now result in two different RDF subjects; however, they can be united into one subject by giving that node an ID attribute (because then they have proper URIs, not node IDs).
    - Adjust "->uri" methods for microformats.
    - Adjust RDFa parser to create BNodes instead of #fakeid URIs.
    - Adjust RDF export to use rdf:nodeID instead of rdf:resource/rdf:about.
  * Document structure parsing was disabled in alpha4 as it made the RDF output ugly. Because of improvements in RDF output, and ability to use BNodes, it is now re-enabled by default without uglying everything up. It can still be disabled via options.

cognition/0.1-alpha8 (2008-05-04) :- xFolk support; ICBM; OpenURL COinS.
  * Microformats:
    - XFN:
      + Fix XFN rel values to match case-insensitively.
      + Smarter support for "mailto:", "urn:sha1:" and pictorial link targets.
    - hCalendar:
      + Fix Cognition::uF::hFreebusy::fb::uri to issue BNodes instead of XPointers.
      + Modify rdf:type URIs s/^([a-z])/uc($1)/ which is more best-practicey.
      + Fix bug with documents being given an rdf:type of ical:Vcalendar, even if they do not use hCalendar.
    - hCard:
      + Modify rdf:type URIs s/^([a-z])/uc($1)/ which is more best-practicey.
      + Implement Andy Mabbett's suggestion allowing the "fn" class to be attached to address sub-properties, thus allowing hCards to easily represent places rather than organisations or people.
    - xFolk: introduce support for this microformat. It uses a similar internal representation to the model used by Digg's new RDFa -- i.e. dc:source, dc:title and dc:abstract. Perhaps should extend xFolk to allow for dc:date and dc:creator?
    - Rel-Tag: restructured RDF output to mostly use Dublin Core.
    - figure:
      + Improvements to title/legend minimisation.
      + Restructured RDF output to use Dublin Core and FOAF.
    - geo: parse <meta name="ICBM"> as if it were an instance of geo.
  * Exports:
    - Corrections to support for both of the W3C RDF vocabs, and also the W3C iCalendar vocab.
  * Fix white space trimming bug in STRINGIFY.
  * Fix contact exporters to use foaf:name when no better name is available.
  * Support for COinS <http://ocoins.info/>, including obsolete rel="Z3988".

cognition/0.1-alpha9 (2008-06-01) :- Switch to client/server model. Add support for hReview.
  * Introduce (optional) client/server model for Cognition. cognitiond.pl runs in the background; cognition.pl attempts to connect to it, asks the daemon to parse the URL, consumes the result and returns it. In many cases this significantly speeds up results. By default cognition.pl looks for a server using TCP on localhost:26464, but --host, --port and --proto parameters may be used to configure a different daemon to connect to.
    cognitiond.pl will look at /etc/cognition/cognitiond.conf to read its options. See the sample config file.
  * Parsing improvements:
    - Improvements to white space handling.
    - Improvements to oddball ISO date formats such as 2-digit years, missing years, dates specified by week number or by ordinal day number.
  * Exports:
    - vCard:
      + Multiple vCard output now returns hCard contacts in the same order as encountered on the page.
      + Cope better with more structured names.
    - jCard:
      + Multiple jCard output now returns hCard contacts in the same order as encountered on the page.
      + Cope better with more structured names.
    - iCalendar:
      + Add VCARDURL parameter support for CONTACT, ORGANIZER and ATTENDEE properties, as described in this draft spec: <http://xml.coverpages.org/draft-royer-ical-vcard-01.txt>
      + Datetime fixes: convert to UTC and format correctly.
  * Microformats:
    - Implement support for hReview.
    - Rewrote support for N (structured names) in hCard parser to create vcard:N objects to wrap vcard:given-name, etc.
    - Allow explicit plus signs in geo microformat.

cognition/0.1-alpha10 (2008-06-27) :- Document structure parsing overhaul; improvements to rel=tag; better support for some RDF nuances like rdf:value and rdfs:subPropertyOf.
  * Completely rewritten document structure parsing, using the HTML 5 outlines algorithm <http://www.whatwg.org/specs/web-apps/current-work/#outlines> as a guide. Thanks to Ryan King and Geoffrey Sneddon for pointing me towards this algorithm. I also used Geoffrey's Python implementation as a crib sheet to help me figure out what was supposed to happen when the HTML 5 spec was ambiguous. <http://hg.gsnedders.com/spec-gen/file/tip/specGen/processes/outliner.py>
  * Microformats:
    - rel-tag:
      + Support for class="tag".
      + Internal representation now uses Richard Newman's RDF Tag ontology. <http://www.holygoat.co.uk/owl/redwood/0.1/tags/>
    - XFN:
      + Explicit XFN 1.0 support.
        If you give an explicit profile URI pointing to the XFN 1.0 profile, but not to the XFN 1.1 profile, then newer XFN terms such as 'me', 'kin' and 'contact' are ignored. (But rel="me" is still used for determining the representative hCard of a page.)
    - hCard:
      + Support for fax: and modem: URIs.
      + Support "type"/"value" subproperties for "label" properties.
    - hCalendar:
      + Support for XOXO vtodo-list optimisation. Very nifty.
    - Experimental support for data-X classes. <http://purl.org/uF/pattern-data-class/1>
    - xFolk:
      + Merged support for xFolk into hReview. xFolk.pm is gone now. <http://buzzword.org.uk/cognition/uf-plus.html#xfolk-hreview>
    - hReview:
      + Support "xfolkentry" as an alias for "hreview".
      + Support "taggedlink" as an alias for "item".
      + Allow multiple instances of class "description".
  * Exports:
    - Special support for rdf:value, such that if an export module is looking for a literal value, but finds a resource which itself has an rdf:value literal, it will use that literal. Indeed, it is capable of drilling down through rdf:value properties several layers deep. e.g. the following RDFa can be successfully exported as vCard:
        <div typeof="foaf:Person">
          <div rel="foaf:name">
            <p rel="rdf:value">
              <b property="rdf:value">Toby Inkster</b>
            </p>
          </div>
        </div>
    - vCard: add support for vCard 4.0 "RELATED" property. XFN, foaf:knows and the RDF relationship vocab <http://vocab.org/relationship/> can all be used to supply the data.
  * Cognition understands rdfs:subPropertyOf, and will make use of a list of any rdfs:subPropertyOf relationships found in "~/.cognition/subPropertyOf.rdf". (It will also take heed of any such relationships found while parsing the page, but won't go looking for them specially.) That is, if Cognition is outputting a vCard, so is looking for a foaf:name for a person, and you have stated that custom:moniker is a sub-property (rdfs:subPropertyOf) of foaf:name, and this person has a custom:moniker property defined, then the custom:moniker property is used.
    (Note: this was a lot more work than it should have been. I'm on the lookout for a third-party triple store that can take the headache out of this sort of thing for me.)

cognition/0.1-alpha11 (2008-07-24) :- Improved microformats parsing across the board. Add support for hAudio, hResume, hMeasure, species and XEN. Datetime parsing improvements.
  * Microformats:
    - Improved and more consistent parsing. A lot of parsing code that was repeated between the different microformat modules has been moved to Cognition::uF::simple_parse(). It includes better support for embedded microformats like:
        <div class="vcard">
          <div class="agent">
            <p class="vcard"></p>
          </div>
        </div>
      and proper support for ISO 8601 durations (not just treated as strings).
    - hResume:
      + Add support for this draft <http://microformats.org/wiki/hResume>.
      + Mostly uses DOAC <http://ramonantonio.net/doac/0.1/doac.rdfs> to map to RDF.
      + LanguageSkills can be specified as ".hresume .contact.vcard .lang".
      + "affiliation" translated to vCard 4.0 draft "MEMBER" property.
    - hAudio:
      + Add support for this draft <http://microformats.org/wiki/hAudio>.
    - hMeasure / hMoney:
      + Add support for this draft <http://microformats.org/wiki/measure>.
      + Units currently treated as an opaque string, though I do have some experimental unit-conversion code that I may include in a future release of Cognition.
      + Nest within an hCard or hCalendar event to associate the measurement with that contact/event.
    - species:
      + Add experimental support for this proposed microformat.
      + Use the "biota" class to mark up a binomial/trinomial, plus (optionally) other taxonomic data.
      + Nest within an hCard to mark up the species of the hCard's owner.
      + Include class="attendee biota" within an hCalendar event to mark up a sighting of a member of the species.
    - XFN:
      + Refinements to implied foaf:knows. e.g. if Alice is Bob's parent, it is not necessarily implied that Alice and Bob know each other. For just a handful of relationships (e.g. friend, spouse, etc.), foaf:knows is still implied.
      + Implements the XHTML Enemies Network (XEN). It's a spoof, but some people may find it useful. XEN relationships are only processed on pages that include the profile URI <http://xen.adactio.com/>.
    - figure:
      + Support rel-tag and rel-license nested inside figures.
    - hCard:
      + Make "lang" plural.
      + Support vCard 4.0 "member" property - either contains a nested hCard or a URI.
  * Exports:
    - vCard: keep up with improvements to hCard.
    - jCard: keep up with improvements to hCard.
  * DateTime parsing:
    - General datetime parsing improvements.
    - I've bundled the Perl DateTime::Format::ISO8601 module within the Cognition distribution, renaming it to Cognition::DTParse. It includes several modifications to make it more tolerant, especially in the case of timezone handling and dealing with whitespace.
    - Support HTML 5 <time> element.
    - In conjunction with the smarter microformat parsing mentioned above, the STRINGIFY function now knows when the property it's reading is supposed to be a datetime and can tailor its behaviour accordingly. In particular it will attempt to read values from the "datetime" attribute if it exists. This allows, in hCalendar:
        <time class="dtstart" datetime="2008-07-24">Thursday</time>
      and also:
        <span class="dtstart">
          <time class="value" datetime="2008-07-24">Thursday</time> at
          <time class="value" datetime="21:00:00">9pm</time>
          <time class="value" datetime="+0100">(UK)</time>
        </span>
      Note that <time> is not the only HTML element that supports a "datetime" attribute. The following might be useful in hCard:
        <ins class="tel rev" datetime="2008-07-24T21:00:00">
          My new <span class="type">home</span> phone number is
          <span class="value">01632 960 123</span>
        </ins>

cognition/0.1-alpha12 (2008-08-20) :- Tonnes and tonnes of bugfixes, little improvements, and refactoring, particularly in RDFa parsing and handling nested microformats. Turtle output; M3U output; intelligent parsing and output of durations and intervals.
  * Bugfix work...
    - Fix XEN namespace.
    - In document structure, if <header> is found and contains a heading element (e.g. <h2>) then let <header>'s rank be the same as the contained heading.
    - Species, figure MFO.
    - Eliminate unneeded 'use' lines.
    - Ability to export individual calendar components in iCalendar format. This is some old functionality that disappeared a few versions ago, but is now back.
    - KML export bugfixes.
    - HTML (Detect) export bugfixes.
    - Last version broke rel=me => representative hCard detection. Fixed.
    - RDF/XML output sometimes tried to define xmlns:rdf twice. Fixed.
    - Lots of RDFa bug fixes. Cognition nearly passes all the tests in the W3C test suite. The ones it fails are:
      + 0032: Weakness in test suite. Cognition performs URI canonicalisation, but test suite fails to check for this.
      + 0033: See 0032.
      + 0093: Cognition's text/html to text/plain conversion is different from the one specified by RDFa. I'm not changing this - it would be a regression IMHO.
      + 0094
      + 0099: See 0093.
      + 0100
      + 0101
      + 0108: See 0093.
      + 0112: See 0093.
    - Resolved conflict over hCard 'member' property. When hCard is parsed as an attendee/contact/organizer within hCalendar, then 'member' is treated as per RFC 2445. Otherwise, treated as vCard 4.0.
  * Exports:
    - RDF/Turtle added.
    - KML: when a geo microformat is nested within an adr microformat, only output one placemark for them both.
    - HTML (Detect) improved, includes <pre> elements containing Turtle.
    - M3U output from audio:Recording and audio:Album, including media:position support for ordering and media:duration support. Some very basic support for the Music Ontology <http://musicontology.com/>.
    - jCard 'rev' should be an array.
  * cognitiond has a new SHA1 command. Given a URI it will return the SHA1 of the URI. Given a URI structured like <foo#subject(bar)> it will return the SHA1 of "bar". This is used by the Cognition web service to provide SHA1-based filenames.
  * Microformats:
    - hReview:
      + Set "type" to "place" if item hCard appears to be for a place, unless "type" is explicitly set.
      + Ditto to "product" if item is an hAudio.
      + Find the "reviewer" if it is outside the root hReview element.
      + Support for "inside-out ratings" where the rel=tag is wrapped *around* the rating.
    - Improvements dealing with triply-embedded microformats. e.g. in the following, Jane Doe is no longer considered an agent of Joe Bloggs.
        <div class="vcard">
          <span class="fn">Joe Bloggs</span>,
          <span class="birth vcard">
            Born at <span class="fn org">Kingdom Hospital</span>
            <span class="agent vcard">
              (<span class="role">Midwife</span>: <span class="fn">Jane Doe</span>)
            </span>
          </span>
        </div>
    - Added a few more profile URIs.
    - No longer use "uid" property as RDF URI. It simply doesn't work well with most examples in the wild. As somebody once said: "The creator of me is my mother. The creator of my web page is me. If you get me mixed up with my web page, then you would conclude that I am my own mother."
    - Better efficiency when parsing microformats. Previously an element with classes "agent vcard" would be parsed twice - once in its own right, and again as the agent for its parent vcard. Now it should be parsed just once, resulting in faster parsing and reduced memory consumption.
    - hAtom entries will now take the page's title as their own title if their own title is blank, they are the sole hAtom entry on the page, and there are no interleaving hfeeds.
    - Support for three new hCard properties:
      + Support vCard 4.0 draft "fburl" property. This may be either a link, or an embedded hCalendar. (Note: not an embedded hCalendar event, or hCalendar freebusy. The embedded hCalendar must have class name "vcalendar".) This is a plural property.
      + Support vCard 4.0 draft "caluri" property with the same parsing rules as "fburl". This is a plural property.
      + Support vCard 4.0 draft "caladruri" property. This should be a link. This is a plural property.
  * Refactoring:
    - Removed a few dependencies.
      + s/URI::Escape::uri_escape/CGI::Util::escape/g.
    - Moved RDFa implementation to Cognition::HTMLParser::RDFa.
    - Moved eRDF implementation to Cognition::HTMLParser::eRDF.
    - Moved RDF/GRDDL implementation to Cognition::HTMLParser::RDF::*.
    - Moved some metadata stuff to Cognition::HTMLParser::Metadata.
    - Moved @role support to Cognition::HTMLParser::RoleAttr.
    - Rearranged much of Cognition::HTMLParser.
  * Use <http://www.w3.org/2006/link#uri> instead of dcterms:identifier to internally represent (alternative) RDF subject URIs.
  * Durations are now first-class citizens in Cognition. That is, much like datetime values have been handled for a while, durations are now parsed and represented as their own data type (not simply a string). This will allow for more intelligent handling of durations in the future.
    - Microformat durations now support not just ISO 8601 duration strings but also:
      + A simple duration measured in seconds (SI-style, using seconds only):
          <span class="duration">123 s</span>
      + Using ISO 31-1-style class names (classes are: d, h, min, s):
          <span class="duration">
            <span class="h">1</span> hour,
            <span class="min">23</span> minutes and
            <span class="s">45.6</span> seconds
          </span>
      + Embedded hMeasure. The hMeasure must have "type" equal to "duration" or null, and item set to null. Units can be "seconds"/"s", "minutes"/"min", "hours"/"h" or "days"/"d". The numeric component does not need to be an integer.
    - This introduces a new dependency on DateTime::Duration, but as that's bundled with DateTime, it shouldn't be a problem. (Cognition already had a dependency on DateTime.)
    - Non-ISO-8601 durations should be seen as EXPERIMENTAL for now.
  * Intervals are also now first-class citizens. As it happens, the only microformat that *uses* intervals is hCalendar's freebusy objects.
    - Intervals may be specified using:
      + ISO 8601 format.
      + An hMeasure duration (see above) plus one of a 'start', 'end', 'before' or 'after' class, which contains an ISO 8601 datetime. 'start'/'end' are inclusive.
      + An ISO 31-1-style duration as above, with one of 'start', 'end', 'before' or 'after'.
      + Both 'start'/'after' and 'before'/'end', with no duration.
  * Understands <meta http-equiv="Content-Language"> and the equivalent HTTP header.

cognition/0.1-alpha14 (2008-12-14) :- Ability to parse HTML from STDIN; integrate validation; refactored and improved namespace and CURIE handling; improved rel=meta support; approximate datetimes; better HTML 5 support; hRecipe support and RecipeBook XML export; integrated Google SocialGraph Node Mapper; less namespace squatting; HTTP in RDF vocab; Notation3 output and specialised JSON output for Microformats.
  * Microformats:
    - Cognition has had hCard validation functions built in for a while, but no interface to access them. I've started adding this information to the RDF output now. Also, simple_parse is able to log some validation errors.
    - hAudio:
      + remove rel=license support
      + title of work is now the "fn" property
    - figure:
      + "legend" plural
      + remove rel=license support
      + @longdesc support
      + profile URI
    - hRecipe: experimental support for this draft microformat.
    - hAtom:
      + entries now support class="hfeed replies" and class="in-reply-to" allowing Atom threading support. This feature is EXPERIMENTAL.
      + Use <http://bblfish.net/work/atom-owl/2006-06-06/#> namespace instead of squatting on <urn:ietf:rfc:4287#>.
      + Improve the "author" hunt.
    - rel-enclosure: Use <http://www.iana.org/assignments/relation/enclosure>.
    - rel-tag:
      + Support for class="tag" is now contingent upon finding a profile URI of <http://purl.org/uF/rel-tag/class>.
      + Use <http://bblfish.net/work/atom-owl/2006-06-06/#scheme> to represent tagspaces instead of squatting on <http://microformats.org/wiki/rel-tag#tagSpace>.
    - XFN: Switch to using Sindice's XFN vocabulary instead of squatting on the XFN profile document as a namespace.
    - hReview: support profile <http://www.purl.org/stuff/rev#>.
    - hCard now uses <http://www.w3.org/2006/vcard/ns#> as its namespace instead of squatting on <urn:ietf:rfc:2426#>.
    - hCalendar now uses <http://www.w3.org/2002/12/cal/ical#> as a namespace instead of squatting on <urn:ietf:rfc:2445#>.
    - geo no longer uses <http://microformats.org/wiki/geo-extension-nonWGS84#> as its namespace for non-WGS84 coördinates.
  * RDFa:
    - Add support for @prefix as an alternative method of defining RDFa prefixes. (See <http://rdfa.info/wiki/RDFainHTML4>.)
    - Spec-compliant whitespace handling if "rdfa_strings" option is set to "1".
    - Correctly ignore @id for subjects.
    - See also CURIE handling improvements.
  * RDF Input:
    - Improved support for rel="meta".
    - Can handle links to RDF/XML, N3, Turtle and (X)HTML. The last of those is triplified by calling another instance of Cognition on it.
    - Sends a better HTTP "Accept" header with requests.
  * HTTP: Parse HTTP headers. (Yes, some older versions of Cognition did parse HTTP headers, but not very well.)
    - Support for the latest draft of Mark Nottingham's HTTP Header Linking standard. (And yes, HTTP Link headers with rel=profile are a supported mechanism for linking to metadata profiles.)
    - Uses HTTP Vocabulary in RDF <http://www.w3.org/TR/HTTP-in-RDF/>.
    - Most datetime-related headers are properly parsed. This introduces a dependency on HTTP::Date, but that's a standard part of LWP, which is already widely used.
    - <meta http-equiv> parsed similarly.
  * HTML5: added support for new elements such as <section> to html2xhtml. Previously, this function (based on HTML::TreeBuilder) would have stripped those elements as they would not have been recognised. Input code is only passed through html2xhtml if it cannot be parsed as well-formed XML, so strict XHTML5 would have worked already.
  * Doc structure:
    - Changed result of parsing <h2 id="foo"> so that instead of this being interpreted as "#foo is a section", it is interpreted as meaning "there is a section, which has a heading #foo". This seems to be semantically the most sensible interpretation, and works better in practice too.
  * Doc metadata:
    - Special support for HTML5 metadata terms.
  * Output:
    - iCalendar:
      + Use the new firstOfLiteral/allOfLiteral functions as appropriate.
      + Pay better attention to date resolution.
    - RecipeBook XML <http://www.happy-monkey.net/recipebook/> export.
    - RDF/TriX export.
    - N3 export: exactly the same as Turtle, but nests some of the BNodes, with " = _:Node".
    - RDF: support for collections.
    - Microformats JSON output.
  * Dates and times:
    - Support for approximate dates in microformats. The class "approx" must be included on the datetime property element, or any descendant element. Two easy syntaxes:
      + <span class="approx bday">1665</span>
      + <span class="bday">
          <span class="approx">ca.</span>
          <span class="value">1665</span>
        </span>
    - Better support for ISO 8601 "end of day" notation, e.g. 2008-08-26T24:00.
    - Improved support for datetimes outside the range AD 1 to 9999.
    - Cleverer support for value excerpting when parsing datetimes.
  * NetNewsWire plugins:
    - "extras" directory includes some plugins for NNW.
  * Documentation:
    - Installation help for the Cognition daemon on Mac OS X and Linux.
  * Refactoring and bug fixes for URI and CURIE handling:
    - Cognition::HTMLParser::abs_url is now Cognition::HTMLParser::uri and no longer handles CURIEs or BNodes.
    - Cognition::HTMLParser::uri can detect absolute URIs and avoids canonicalising them.
    - Dropped function: Cognition::HTMLParser::_fq2pfx
    - New function: Cognition::Namespace::to_curie, slightly smarter than above.
    - Dropped function: Cognition::HTMLParser::_pfx2fq
    - New function: Cognition::HTMLParser::eRDF::curie
    - New function: Cognition::HTMLParser::RDFa::curie
    - New function: Cognition::HTMLParser::RDFa::uriOrSafeCurie
    - New function: Cognition::HTMLParser::RDFa::reservedWordOrCurie
    - New function: Cognition::HTMLParser::RoleAttr::reservedWordOrCurie
    - New function: Cognition::HTMLParser::Metadata::reservedWordOrCurie
    - Change the prefix used by default for undefined prefixes in RDF (an error condition) from <http://undefined-namespace-prefix.invalid/> to <http://invalid.invalid/ns#>.
    - Support question-mark and equals-sign namespaces in addition to the usual hash and slash types.
  * STRINGIFY:
    - Support param@value.
    - Improved <pre> handling (where the <pre> is *outside* the property).
  * Infrastructure:
    - Allow daemon to parse code passed as STDIN.
      + Syntax is "COGNIFY STDIN AS http://example.com/".
      + In that example, <http://example.com/> is taken to be the base URI.
      + Indicate end of input using a line containing a lone full stop.
    - Command-line client has similar capabilities.
      + In place of the URL parameter on the command line, pass '-'.
      + Then pass a second URL parameter, which represents the base URI for the document.
    - Moved a bunch of stuff that's shared by daemon and client into a new Cognition::Misc module.
  * Now uses thing-described-by.org a lot for generating URIs for people, places, events, etc.
  * Integrate Google's SocialGraph::NodeMapper Perl module.
    - This is a little tricky to install as it relies on JavaScript::SpiderMonkey, which in turn relies on Mozilla's libjs. Therefore, I've made this module optional. If SocialGraph::NodeMapper is installed, it will be used. Otherwise, those bits of code which call it will be skipped.
    - hCard uri, uid and impp properties are passed through it.
    - XFN links are passed through it.
    - See <http://code.google.com/p/google-sgnodemapper/>.
  * Cognition is now on Google Code:
    - See <http://code.google.com/p/cognition-parser/>.
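The RDFa item above adds @prefix as an alternative way of declaring prefixes. This changelog does not spell out the attribute's syntax, so the Python sketch below assumes the whitespace-separated "prefix: URI" pairs that RDFa 1.1 later standardised; the RDFainHTML4 wiki proposal may differ in detail. The function name parse_prefix_attribute is hypothetical and is not part of Cognition (which is written in Perl).

```python
from typing import Dict


def parse_prefix_attribute(value: str) -> Dict[str, str]:
    """Parse an attribute such as
    'foaf: http://xmlns.com/foaf/0.1/ dc: http://purl.org/dc/terms/'.

    Assumption: RDFa 1.1-style 'prefix: URI' pairs separated by whitespace.
    """
    tokens = value.split()
    mapping: Dict[str, str] = {}
    i = 0
    while i + 1 < len(tokens):
        prefix, uri = tokens[i], tokens[i + 1]
        if len(prefix) > 1 and prefix.endswith(":") and not uri.endswith(":"):
            # Well-formed pair: record it and move on to the next pair.
            mapping[prefix[:-1]] = uri
            i += 2
        else:
            # Malformed token: skip it and try to resynchronise on the next one.
            i += 1
    return mapping
```

Skipping malformed tokens rather than failing outright mirrors the forgiving attitude a parser of real-world HTML usually needs.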
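The HTTP item above mentions support for HTTP Link headers, with rel=profile as one way of linking to metadata profiles. As a rough illustration only, the following Python sketch splits a Link header into (URI, parameters) pairs, assuming the "<URI>; name=value" syntax of the Link header drafts (later published as RFC 5988), and picks out profile links resolved against the document's base URI. parse_link_header and profile_links are hypothetical names, not Cognition functions, and the regular-expression approach is a simplification rather than a full grammar.

```python
import re
from typing import Dict, List, Tuple
from urllib.parse import urljoin

# <URI> followed by any number of ;name=value parameters (values may be quoted).
_LINK_VALUE = re.compile(
    r'\s*<([^>]*)>((?:\s*;\s*[^;,=\s]+\s*(?:=\s*(?:"[^"]*"|[^;,\s]*))?)*)'
)
_PARAM = re.compile(r';\s*([^;,=\s]+)\s*(?:=\s*("[^"]*"|[^;,\s]*))?')
_COMMA = re.compile(r"\s*,")


def parse_link_header(value: str) -> List[Tuple[str, Dict[str, str]]]:
    """Split a Link header value into (uri, params) pairs."""
    links = []
    pos = 0
    while True:
        m = _LINK_VALUE.match(value, pos)
        if not m:
            break
        params = {}
        for p in _PARAM.finditer(m.group(2)):
            val = p.group(2) or ""
            if len(val) >= 2 and val[0] == val[-1] == '"':
                val = val[1:-1]  # strip the surrounding quotes
            params[p.group(1).lower()] = val
        links.append((m.group(1), params))
        # Link values are separated by commas; stop when there are no more.
        comma = _COMMA.match(value, m.end())
        if not comma:
            break
        pos = comma.end()
    return links


def profile_links(value: str, base: str) -> List[str]:
    """Absolute URIs of links whose space-separated rel list includes 'profile'."""
    return [
        urljoin(base, uri)
        for uri, params in parse_link_header(value)
        if "profile" in params.get("rel", "").lower().split()
    ]
```

For example, profile_links('<p.html>; rel="profile meta"', "http://example.com/dir/") yields ["http://example.com/dir/p.html"].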
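The "Dates and times" item above allows the "approx" class either on the datetime property element or on any descendant. The Python sketch below shows that rule applied to both of the listed syntaxes, together with a deliberately simplified form of value excerpting (concatenating any class="value" descendants). It assumes the markup has already been parsed as well-formed XML; read_datetime_property is a hypothetical name and Cognition's own excerpting rules may be more involved.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def _classes(el: ET.Element) -> List[str]:
    return el.get("class", "").split()


def read_datetime_property(prop: ET.Element) -> Tuple[str, bool]:
    """Return (datetime text, is_approximate) for a datetime property element.

    Element.iter() yields the element itself first, then its descendants, so
    'approx' is detected in either position.
    """
    approx = any("approx" in _classes(el) for el in prop.iter())
    # Simplified value excerpting: use class="value" descendants if present,
    # otherwise the whole text content of the property element.
    values = [el for el in prop.iter() if "value" in _classes(el)]
    source = values if values else [prop]
    text = "".join("".join(el.itertext()) for el in source).strip()
    return text, approx
```

With the second syntax, the "ca." text is ignored because the class="value" excerpt takes precedence.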
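ISO 8601 "end of day" notation writes midnight at the end of a day as 24:00, so 2008-08-26T24:00 denotes the same instant as 2008-08-27T00:00. Python's datetime.fromisoformat rejects hour 24, so the sketch below normalises it by hand before falling back to the standard parser. It is an illustration of the notation, not Cognition's Perl implementation; parse_iso8601 is a hypothetical name and timezone suffixes on 24:00 values are not handled.

```python
import re
from datetime import datetime, timedelta

# A calendar date followed by 24:00, 24:00:00 or 24:00:00.0...; no timezone suffix.
_END_OF_DAY = re.compile(r"^(\d{4}-\d{2}-\d{2})T24:00(?::00(?:\.0+)?)?$")


def parse_iso8601(value: str) -> datetime:
    """Parse an extended-format ISO 8601 datetime, accepting T24:00 as
    midnight at the end of the given day (the start of the next day)."""
    m = _END_OF_DAY.match(value)
    if m:
        day = datetime.strptime(m.group(1), "%Y-%m-%d")
        return day + timedelta(days=1)
    # Ordinary times are left to the standard library parser.
    return datetime.fromisoformat(value)
```

Note that Python's datetime only covers years 1 to 9999, so this sketch says nothing about the out-of-range datetimes also mentioned in that item.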
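The URI and CURIE refactoring above introduces Cognition::Namespace::to_curie and adds question-mark and equals-sign namespaces alongside the usual hash and slash ones. The changelog does not describe how to_curie works, so the following is only a hypothetical Python sketch of the general idea: treat everything up to the last "#", "/", "?" or "=" as the namespace and compact the URI if that namespace has a known prefix.

```python
from typing import Mapping

# Assumption: a namespace URI ends at the last of these characters.
_NS_TERMINATORS = "#/?="


def to_curie(uri: str, prefixes: Mapping[str, str]) -> str:
    """Compact a full URI to prefix:local if its namespace has a known prefix;
    otherwise return the URI unchanged."""
    cut = max(uri.rfind(ch) for ch in _NS_TERMINATORS)
    if cut == -1:
        return uri
    namespace, local = uri[: cut + 1], uri[cut + 1 :]
    if not local:
        return uri  # nothing left to use as the local part
    for prefix, ns in prefixes.items():
        if ns == namespace:
            return f"{prefix}:{local}"
    return uri
```

For example, with the hypothetical mapping {"ex": "http://example.com/lookup?id="}, the URI http://example.com/lookup?id=42 compacts to ex:42, which a hash-and-slash-only splitter would miss.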
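The Infrastructure item above describes a simple text framing for passing a document to the daemon on STDIN: a "COGNIFY STDIN AS <base URI>" line, the document, then a line containing a lone full stop. The Python sketch below builds and reads that framing. How the daemon is reached (socket, port, line endings) is not described here, so the sketch stops at the text; build_request and read_request are hypothetical names, and "\n" line endings are an assumption.

```python
from typing import Iterable, Tuple

_COMMAND = "COGNIFY STDIN AS "


def build_request(base_uri: str, document: str) -> str:
    """Build the text a client would send to have the daemon parse `document`."""
    lines = document.splitlines()
    # No escape for a lone "." inside the document is described, so reject
    # such input rather than guess at a dot-stuffing convention.
    if "." in lines:
        raise ValueError("document contains a line consisting of a lone full stop")
    return "\n".join([_COMMAND + base_uri, *lines, "."]) + "\n"


def read_request(lines: Iterable[str]) -> Tuple[str, str]:
    """Read the same framing on the daemon side; returns (base_uri, document)."""
    it = iter(lines)
    header = next(it, "").rstrip("\r\n")
    if not header.startswith(_COMMAND):
        raise ValueError("not a COGNIFY STDIN request")
    body = []
    for line in it:
        line = line.rstrip("\r\n")
        if line == ".":
            return header[len(_COMMAND):], "\n".join(body)
        body.append(line)
    raise ValueError("input ended before the terminating full stop")
```

Rejecting a lone "." rather than escaping it keeps the sketch honest about what the changelog actually specifies.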

swignition/0.1-alpha15 (2009-01-25) :- Renamed Cognition to Swignition; allow Swignition to be pointed directly at non-HTML files (including various RDF serialisations, RSS feeds and JSON); various GRDDL improvements, including support for RDF-EASE; improved recursive parse.

  * Previously, Swignition always expected to be pointed at an HTML file. It can now be pointed at:
    - RDF/XML, Notation 3, TriG, Turtle and N-Triples.
    - RDF/JSON (but only recognised if the JSON schema is included).
    - JSON (via jsonGRDDL, requires JavaScript::SpiderMonkey).
    - Tag soup RSS / Atom.
      + Includes support for RDFa in item <description>.
      + Includes support for Microformats in item <description>.
    - TriX, including XSLT transformations.
  * Outputs:
    - Include a JSON schema in RDF/JSON output.
    - Atom bugfixes.
  * Improvements in GRDDL:
    - Support XML namespace GRDDL.
    - Support XML attribute GRDDL.
    - Move GRDDL code out of the RDF/XML module and clean up.
    - Support RDF-EASE as a transformation language.
  * User-Agent:
    - Better 'Accept' header sent.
    - More descriptive 'User-Agent' header sent.
  * Plain old metadata:
    - Allow <title> and <meta> parsing to be turned off.
  * Recursive parsing:
    - rel="meta" can now point at any file format understood by Swignition, including feeds, JSON, etc.
    - Moved to Swignition::GenericParser::Recursive.
    - The NoFollow feature actually works now.
  * Microformats:
    - hAtom: closer conformance to AtomOwl.

swignition/0.1-alpha16 (2009-02-28) :-

  * Replace alpha3's RDFModel with a new DataModel for internal storage of data.
    - Includes support for multiple graphs.
    - Includes support for some graphs which are outside the standard RDF model (e.g. literal subjects).
    - Easy rdfs:Container and rdf:List support.
    - SPARQL support (by wrapping around Redland).
  * Microformats:
    - XFN:
      + Add support for the Relationship Vocabulary as HTML link types. See <http://purl.org/vocab/relationship/>.
    - hCalendar: allow 'rdate' and 'exdate' to be intervals.

$Id$
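The alpha16 item allowing hCalendar 'rdate' and 'exdate' to be intervals does not say which interval forms are accepted. The Python sketch below assumes the two forms of iCalendar's PERIOD value from RFC 2445 (start/end and start/duration), written in the ISO 8601 extended format that hCalendar normally uses and accepted by datetime.fromisoformat. parse_period and parse_duration are hypothetical names, and year and month durations are deliberately rejected because their length depends on the calendar.

```python
import re
from datetime import datetime, timedelta
from typing import Tuple

# Weeks, days, hours, minutes and seconds only; years and months are rejected.
_DURATION = re.compile(
    r"^P(?:(?P<weeks>\d+)W)?(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def parse_duration(text: str) -> timedelta:
    m = _DURATION.match(text)
    # Also reject a bare "P" and a trailing "T" with no time components.
    if not m or text == "P" or text.endswith("T"):
        raise ValueError(f"unsupported duration: {text!r}")
    return timedelta(**{k: int(v) for k, v in m.groupdict().items() if v})


def parse_period(text: str) -> Tuple[datetime, datetime]:
    """Parse 'start/end' or 'start/duration' into a (start, end) pair."""
    first, sep, second = text.partition("/")
    if not sep:
        raise ValueError(f"not an interval: {text!r}")
    start = datetime.fromisoformat(first)
    if second.startswith("P"):
        return start, start + parse_duration(second)
    return start, datetime.fromisoformat(second)
```

Normalising both forms to a (start, end) pair means later processing, such as excluding exdate periods from a recurrence, only has to deal with one shape.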