This is a list of steps (as of May 23, 2007) to transfer current IS&T web pages into Alfresco:
- wget http://web.mit.edu.ezproxyberklee.flo.org/ist/
- Copy only the ist directory to a new ist project directory for processing, but keep the other directories wget found for link management research (to see which MIT pages outside IS&T link into current IS&T pages)
- Find and delete files with ? or & in their names (Alfresco chokes on these characters)
- Delete "dontindex" folders (probably not going to migrate these)
- Search and replace internal links, substituting "/" for "http://web.mit.edu.ezproxyberklee.flo.org/ist/" (the Alfresco repository requires links to be relative to ROOT for virtualization)
- Likewise substitute "/" for "http://web.mit.edu.ezproxyberklee.flo.org/is/" (an alias folder for web.mit.edu/ist/, kept for historical reasons)
- Create empty src/templates/ branch template directories and add them to the ist folder (take the current ist directories and empty them, then copy them into the src/templates/whatever_template dir)
- Run the script (istSiteXmler.cgi) over the site to find html, htm, and shtml IS&T pages, create xml for each template, and place it in the appropriate src branch
- Run further scripts (checkNoGoFurther.cgi and chkforReDirbluebox.cgi) to find blue-box, redirect, and archived files among the pages that wouldn't easily translate in the previous step (these scripts will be folded into istSiteXmler shortly, which will eliminate this step)
- Delete the html associated with each xml file (this step may not be necessary in Alfresco 2.0.1+)
- Make sure appropriate templates are in the Alfresco Data Dictionary
- Package as a WAR file and bulk upload to Alfresco (if creating a new Alfresco web project), or transfer via CIFS to the Content Manager Sandbox (remember to submit to staging later)
- Use Steve's app to attach aspects to xml
- Hand transfer the remaining recalcitrant pages (those not translated to xml by the script) to the appropriate template xml (easiest to use the Alfresco XForms interface for this, as it will result in properly escaped characters)
- Use the R button to regenerate html from xml (by directory in the src branch)
- Submit to Staging (if regenerated in Content Manager Sandbox)
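The cleanup and link-rewriting steps above (deleting files with ? or & in their names, deleting "dontindex" folders, and making internal links relative to ROOT) could be scripted roughly as follows. This is only a sketch, not the script actually used for the migration: the function names are hypothetical, it assumes folders are named exactly "dontindex", and it assumes only html, htm, and shtml files need their links rewritten.

```python
import os
import shutil

# Absolute prefixes that the steps above replace with "/".
ABSOLUTE_PREFIXES = (
    "http://web.mit.edu.ezproxyberklee.flo.org/ist/",
    "http://web.mit.edu.ezproxyberklee.flo.org/is/",
)
# Assumption: only these page types contain links worth rewriting.
PAGE_EXTENSIONS = (".html", ".htm", ".shtml")


def clean_mirror(root):
    """Delete 'dontindex' folders and files whose names contain ? or &.

    Returns the list of paths that were removed.
    """
    removed = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in list(dirnames):
            if name == "dontindex":  # assumption: exact folder name
                target = os.path.join(dirpath, name)
                shutil.rmtree(target)
                dirnames.remove(name)  # do not descend into the deleted folder
                removed.append(target)
        for name in filenames:
            if "?" in name or "&" in name:
                target = os.path.join(dirpath, name)
                os.remove(target)
                removed.append(target)
    return removed


def rewrite_links(text):
    """Replace each absolute IS&T prefix with a ROOT-relative '/'."""
    for prefix in ABSOLUTE_PREFIXES:
        text = text.replace(prefix, "/")
    return text


def rewrite_tree(root):
    """Rewrite links in every page under root; return how many files changed."""
    changed = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(PAGE_EXTENSIONS):
                continue
            path = os.path.join(dirpath, name)
            # latin-1 maps every byte to one character, so pages in an
            # unknown encoding survive the read/write round trip unchanged.
            with open(path, encoding="latin-1", newline="") as fh:
                original = fh.read()
            updated = rewrite_links(original)
            if updated != original:
                with open(path, "w", encoding="latin-1", newline="") as fh:
                    fh.write(updated)
                changed += 1
    return changed
```

Running `clean_mirror("ist")` followed by `rewrite_tree("ist")` on a copy of the mirrored directory would cover these steps in one pass; working on a copy keeps the original wget output available for the link management research.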
istSiteXMLer.cgi can be found in SVN under the aphickey user folder
A diagram describing this process can be found as an attachment to this document, or directly here: https://wikis-mit-edu.ezproxyberklee.flo.org/confluence/download/attachments/31137/MapOlists.pdf
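For the step in the list above that creates empty template directories, one possible approach is to copy the ist directory tree while skipping every file. The sketch below assumes Python's standard `shutil.copytree`; the function names and the `example_template` directory name are hypothetical stand-ins, not names used in the actual migration.

```python
import os
import shutil


def _ignore_files(directory, names):
    """Tell copytree to skip every entry that is not a directory."""
    return [n for n in names if not os.path.isdir(os.path.join(directory, n))]


def copy_directory_skeleton(source_root, template_root):
    """Recreate the folder structure of source_root under template_root.

    No files are copied, so the result is an empty directory tree.
    template_root must not exist yet; copytree creates it along with
    any missing parent directories.
    """
    shutil.copytree(source_root, template_root, ignore=_ignore_files)
```

For example, `copy_directory_skeleton("ist", "ist/src/templates/example_template")` would be the natural call, but because that destination sits inside the source tree it is probably safer to build the skeleton in a scratch location first and then move it into place.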