Collex Deployment

From ARC Wiki
Revision as of 17:15, 13 November 2009 by Paul

Summary of Current NINES Deployments

NINES exists on the following servers:

  • nines.org: The live site that is public facing.
  • staging.nines.org: The indexing site, where documents are indexed and tested before going live. Contributors are pointed at this site to approve the index.
  • nines.performantsoftware.com: The site where the latest version of the code is put for testing and approval before deploying.

Each of these sites runs its own local instances of Solr and MySQL. On every site, Solr listens on port 8983 and Rails runs in its production configuration.

Updating the various servers

For normal code updates on nines.org, go to the web folder and type "rake collex:deploy_to_production". That will do the following:

  • Make a backup of the database in case a rollback is needed.
  • Tag the current trunk with the version number that is in web/app/models/branding.rb.
  • Deploy the trunk, and do various housekeeping.
  • Restart the Rails app.

If Solr changes, update it from SVN by running "svn up" in the solr_1.4 folder, then restart Solr.

If the index changes, copy the new index onto nines.org, unzip it into solr_1.4/solr/data/resources/index, then restart Solr.

For code updates on the staging servers, use this rake task instead: "rake collex:update_staging". It does not back up the database or create a tag in SVN.

Install Capistrano

NOTE: EVERYTHING BELOW THIS POINT REFERS TO CAPISTRANO, WHICH WAS THE OLD WAY TO DEPLOY. CURRENTLY THE DEPLOYMENT IS HANDLED AS DESCRIBED ABOVE.

You must first install the Ruby gem Capistrano (and its dependencies) on your development computer. We're currently using Capistrano version 1.4.1.

sudo gem install capistrano -v 1.4.1

You must also install Capistrano on your server (where the code will be deployed).

Customizing the deploy.rb file

If you wish to deploy Collex to a new environment, you will need to change the settings in the deploy.rb file (in the web/config directory). Most of these settings refer to the production computer to which you are deploying, rather than the development computer.

:sudo: the full path to sudo on the production computer

:repository: the full URL to the svn repository that holds the code you wish to have deployed

:deploy_to: the full destination file path to where you want the code stored

:user: your user id on the production computer

:web: the URL of the computer that will host the web server

:app: the URL of the computer that will host the RoR business code

:db: the URL of the computer that will host the MySQL database

:rake: the full file path to rake on the production computer

:svn: the full file path to the svn command on the production computer

:checkout: the command for checking out the code branch from svn

Note that your credentials need adequate permissions on the production computer for the script to execute successfully.

You will probably also need to customize the ownership command: sudo("chown username:group #{release_path}/config/database.yml")
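
As a rough aid, the settings listed above can be gathered in one place and checked for gaps before deploying. The sketch below is plain Ruby, not Capistrano syntax, and is not part of Collex. Every value in it is a hypothetical placeholder, and the names DEPLOY_SETTINGS, REQUIRED_SETTINGS, and missing_settings are made up for illustration.

```ruby
# Hypothetical placeholder values for the settings described above.
# Replace each one with the value for your own production computer.
DEPLOY_SETTINGS = {
  :sudo       => "/usr/bin/sudo",
  :repository => "http://svn.example.org/collex/trunk",
  :deploy_to  => "/usr/local/example/staging-web",
  :user       => "deployer",
  :web        => "web.example.org",
  :app        => "app.example.org",
  :db         => "db.example.org",
  :rake       => "/usr/bin/rake",
  :svn        => "/usr/bin/svn",
  :checkout   => "export"
}

# The setting names documented in this section.
REQUIRED_SETTINGS = [:sudo, :repository, :deploy_to, :user,
                     :web, :app, :db, :rake, :svn, :checkout]

# Return the names of required settings that are absent or blank.
def missing_settings(settings)
  REQUIRED_SETTINGS.select { |name| settings[name].to_s.strip.empty? }
end

missing = missing_settings(DEPLOY_SETTINGS)
puts(missing.empty? ? "All settings present" : "Missing: #{missing.join(', ')}")
```

In the real deploy.rb these values are assigned with Capistrano's own configuration calls; the hash form here only makes the list easy to review.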

Deploying Collex Web to Staging and Production

By default, all Capistrano commands will apply to STAGING. PRODUCTION must be specified with DEPLOY=production preceding your cap command. To update the code and restart the application, go to web/ on your development machine.
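
The staging-by-default rule can be pictured as a small lookup on the DEPLOY environment variable. This is only a sketch of how such a selection might work, not the code in web/config/deploy.rb. The helper names are hypothetical, and the base path is taken from the crontab listing later on this page.

```ruby
# Pick the target environment: production only when DEPLOY=production
# is set, staging in every other case.
def deploy_environment(env = ENV)
  env["DEPLOY"] == "production" ? "production" : "staging"
end

# Build the deployment directory for an environment, following the
# production-web / staging-web naming used on the server.
def deploy_path(environment, base = "/usr/local/patacriticism")
  File.join(base, "#{environment}-web")
end

target = deploy_environment
puts "Deploying to #{target} at #{deploy_path(target)}"
```

Because anything other than the exact value "production" falls back to staging, a typo in the variable cannot accidentally touch the live site.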

Initial setup

Before you can deploy for the first time, run the following from the development computer (not the deployment destination):

cap setup_rails

The deployment process depends on symbolic links to shared directories, so you may also need to create the log directory yourself (look for errors during deployment): run mkdir log under ../production-web/shared/ and ../staging-web/shared/.

You may need to do the same for the tmp symlink.
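
If you prefer to create these directories in one pass, something like the following would do it. It is a minimal sketch that assumes the staging-web and production-web layout mentioned above; the helper name ensure_shared_dirs is hypothetical, and the demonstration runs against a temporary directory rather than a real server path.

```ruby
require "fileutils"
require "tmpdir"

# Create shared/log and shared/tmp under each environment's directory.
# mkdir_p is a no-op for directories that already exist.
def ensure_shared_dirs(base, environments = %w[staging production], names = %w[log tmp])
  environments.flat_map do |env|
    names.map do |name|
      path = File.join(base, "#{env}-web", "shared", name)
      FileUtils.mkdir_p(path)
      path
    end
  end
end

# Demonstration against a throwaway directory.
Dir.mktmpdir do |base|
  ensure_shared_dirs(base).each { |path| puts path.sub(base, "") }
end
```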

Staging: update code and restart

cap deploy 

Production: update code and restart

DEPLOY=production cap deploy

What if I need to run rake db:migrate?

cap deploy_with_migrations

What if I want to update the code, but not restart it yet?

cap update

And restart? (which needs to be done, for instance, when the sites table is updated)

cap restart

What if I want to see the tasks Capistrano has available?

cap show_tasks

Our custom tasks are located inside web/config/deploy.rb.

I messed up and need to rollback!

cap rollback

We have a lot of releases on the server and I want to get rid of the oldest ones to save space.

cap cleanup #keeps the last 5 releases on the server

Deploying Solr to Staging and Production

We also use Capistrano for our Solr updating, compiling, starting, and stopping. You will need to change the parameters in the solr/config/deploy.rb file, similar to the Rails file above.

You need to be in the solr/ root directory to issue your Capistrano commands. The commands are similar to those for the Web application. The default environment is staging, and you specify production by preceding your cap command with DEPLOY=production. Our custom tasks are located in solr/config/deploy.rb.

Update Solr, Compile and Restart

cap deploy #exports code from svn, compiles java, creates distribution, restarts Solr for staging environment
----
DEPLOY=production cap deploy #same thing for production environment

I just want to update the code, but not restart Solr

cap update

I want to stop, start, or restart Solr

cap stop_solr
cap start_solr
cap restart_solr 

I messed up and need to rollback!

You read the part about deploying the Web application, right?

cap rollback

A Capistrano Interlude

What is this Capistrano tool? Capistrano is a Ruby library for setting up remote servers and deploying code to them, whether you have one server or 100. You write your Capistrano tasks in one place and run them from one place, and they execute on all your servers at the same time. You don't have to log in to each machine, go to the needed directory, and issue commands. Instead, you run your tasks from your own development machine, and Capistrano uses SSH to connect to each of the specified remote servers and run them. Capistrano is aimed at easing setup and deployment of code to remote machines, but is not limited to those types of tasks.

See http://www.capify.org/ for newer information and the Old Capistrano Manual for older information. Note that the older manual describes rake commands that have since been deprecated in favor of cap. Don't use rake for Capistrano tasks.

What happens when I run cap deploy?

  1. Capistrano asks you for the password to log into the remote machine for whichever user account it is using.
  2. Cap checks out (or exports in Collex's case) the code from the Subversion repository.
  3. Cap rolls back automatically if there are errors.
  4. Cap symlinks the new checkout/export to current.
  5. Cap restarts the application.

Capistrano may do more than that, but those are the basic tasks that execute on deploy.
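
The symlink step is what makes rollback cheap: each deploy lands in its own release directory, and "current" is just a pointer. The sketch below imitates that layout locally in plain Ruby. It is an illustration of the idea, not Capistrano's own code; the helper names and release names are hypothetical, and it works inside a temporary directory.

```ruby
require "fileutils"
require "tmpdir"

# Re-point the "current" symlink at the named release.
def switch_current(deploy_to, name)
  current = File.join(deploy_to, "current")
  File.unlink(current) if File.symlink?(current)
  File.symlink(File.join(deploy_to, "releases", name), current)
end

# Create a new release directory and make it current.
def add_release(deploy_to, name)
  FileUtils.mkdir_p(File.join(deploy_to, "releases", name))
  switch_current(deploy_to, name)
end

# List release names, oldest first (names are sortable timestamps).
def release_names(deploy_to)
  Dir.entries(File.join(deploy_to, "releases")).reject { |e| e.start_with?(".") }.sort
end

# Point "current" back at the previous release and delete the newest one.
def rollback(deploy_to)
  names = release_names(deploy_to)
  raise "no earlier release to roll back to" if names.size < 2
  latest, previous = names[-1], names[-2]
  switch_current(deploy_to, previous)
  FileUtils.rm_rf(File.join(deploy_to, "releases", latest))
  previous
end

# Demonstration in a throwaway directory.
Dir.mktmpdir do |dir|
  add_release(dir, "20091101120000")
  add_release(dir, "20091113171500")
  puts "current -> #{File.basename(File.readlink(File.join(dir, 'current')))}"
  rollback(dir)
  puts "after rollback -> #{File.basename(File.readlink(File.join(dir, 'current')))}"
end
```

Since older releases stay on disk until removed, this is also why cap cleanup is needed from time to time.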

Deploying the Admin App

The tasks are pretty much the same as for Collex Web, except there is only one environment, not two, and you run the cap commands from the adminapp/ directory. Don't forget that you can always show the tasks: cap show_tasks.

Currently our server uses the indexer code (RdfFileIndexer) from the Solr staging environment. In order to update the indexer code, you must redeploy (see above) the Solr staging server. This issue is noted here: http://faustroll.clas.virginia.edu:8080/browse/CLX-88

Loading RDF into Collex

See CollexAdminApp

Removing RDF from Collex

Use the correct Solr port, depending on whether you are removing an archive of data from production or test.

curl 'http://nines.org:[solr port]/solr/update' --data-binary '<delete><query>archive:archive-name</query></delete>'
curl 'http://nines.org:[solr port]/solr/update' --data-binary '<commit/>'
curl 'http://nines.org:[solr port]/solr/select?qt=cache_refresh'

This deletes the objects from Solr only; any tags or annotations that we've stored in the database remain. Further work is needed on database/Solr synchronization. It's possible with script/console hackery, but could be streamlined and documented.
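
The same three requests can be issued from Ruby, which makes it easier to escape the archive name and to script the removal. This is a minimal sketch using only the standard library. The helper names are hypothetical, and the Content-Type header is an assumption (the curl commands above do not set one). Nothing is sent until remove_archive is called.

```ruby
require "cgi"
require "net/http"
require "uri"

# Build a URI on the Solr instance for the given path.
def solr_uri(host, port, path)
  URI("http://#{host}:#{port}#{path}")
end

# XML body that deletes every document belonging to one archive.
def delete_archive_xml(archive)
  "<delete><query>archive:#{CGI.escapeHTML(archive)}</query></delete>"
end

# POST an XML message to Solr's update handler.
def post_xml(uri, xml)
  request = Net::HTTP::Post.new(uri.request_uri)
  request["Content-Type"] = "text/xml; charset=utf-8" # assumption, see above
  request.body = xml
  Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
end

# Delete the archive, commit, then refresh the cache, in that order.
def remove_archive(host, port, archive)
  update = solr_uri(host, port, "/solr/update")
  responses = [delete_archive_xml(archive), "<commit/>"].map { |xml| post_xml(update, xml) }
  responses << Net::HTTP.get_response(solr_uri(host, port, "/solr/select?qt=cache_refresh"))
end

puts delete_archive_xml("archive-name")
```

A call such as remove_archive("nines.org", 8983, "archive-name") would then perform the removal against the Solr port in use.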

Collex Administrative Interface

Users in Collex currently have three basic roles: regular user, admin, and editor. We do not use the editor role at present, as it has been subsumed by the CollexAdminApp functionality. Admin users can go into the /admin area of Collex to maintain the sites table and the users table (ignore the licenses table until ExhibitBuilder comes online). To update an entry, edit and save it (setting a user's role to admin, for example). Collex caches the sites table lookups, so it must be restarted to pick up changes to that table (cap restart for the appropriate environment of Collex web).

You must be logged in and have administrative privileges to use the /admin page.

sites

Each project stores its project name, site url and thumbnail url in the sites database table. To edit this information, go to the /admin/site page. For example, on production NINES:

http://www.nines.org/admin/site

A similar url exists for staging.nines.org. From here you can page through all the current sites and edit them or create a new site. Any changes will require a restart of the Apache server, accomplished via the "cap restart" command.

Nightly Scripts

The Collex system is serviced by several scripts run periodically via cron. Below is a listing of the cron jobs viewable with "crontab -l" as of August 2007:

30 18 * * * /usr/local/patacriticism/collex/trunk/optimize.sh nines.org 8983 1>/tmp/cron.out 2>/tmp/cron.err
30 19 * * * /usr/local/patacriticism/production-web/shared/scripts/backup_index.sh 
27 19 * * * /usr/local/patacriticism/production-web/shared/scripts/backup_mysql.sh nines_production

20 16 * * * RAILS_ENV=staging PATH=/opt/csw/bin:/usr/jdk1.5.0_06/bin:/usr/bin:/usr/local/bin /usr/local/patacriticism/staging-web/current/script/runner 'puts ActiveRecord::Base.connection.delete("DELETE FROM sessions WHERE updated_at < \"#{3.weeks.ago.to_s(:db)}\"")'
20 04 * * * RAILS_ENV=production PATH=/opt/csw/bin:/usr/jdk1.5.0_06/bin:/usr/bin:/usr/local/bin /usr/local/patacriticism/production-web/current/script/runner 'puts ActiveRecord::Base.connection.delete("DELETE FROM sessions WHERE updated_at < \"#{3.weeks.ago.to_s(:db)}\"")'
20 04 * * * RAILS_ENV=production PATH=/opt/csw/bin:/usr/jdk1.5.0_06/bin:/usr/bin:/usr/local/bin /usr/local/patacriticism/adminapp/current/script/runner 'puts ActiveRecord::Base.connection.delete("DELETE FROM sessions WHERE updated_at < \"#{3.weeks.ago.to_s(:db)}\"")'

* * * * * RAILS_ENV=production PATH=/opt/csw/bin:/usr/jdk1.5.0_06/bin:/usr/bin:/usr/local/bin /usr/local/patacriticism/adminapp/current/script/indexer staging
* * * * * RAILS_ENV=production PATH=/opt/csw/bin:/usr/jdk1.5.0_06/bin:/usr/bin:/usr/local/bin /usr/local/patacriticism/adminapp/current/script/indexer production

A brief summary of these scripts:

  • optimize.sh tells solr to repack the index such that deleted documents are dropped
  • backup_index.sh backs up the index nightly
  • backup_mysql.sh backs up the production database nightly
  • script/indexer runs every minute, once in staging mode and once in production mode, to pick up any new content supplied by the adminapp
  • ping.sh reports if solr or apache aren't responding (see note below)
  • DELETE FROM sessions (via script/runner) removes sessions older than 3 weeks from both Collex staging and production, as well as the adminapp
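
To see what the session cleanup entries actually send to the database, the statement can be reproduced without Rails. The sketch below is plain Ruby that approximates the 3.weeks.ago.to_s(:db) expression from the crontab; the helper names are hypothetical, and time zone handling is simplified compared with what Rails does.

```ruby
# Time three weeks (21 days) before the given moment.
def session_cutoff(now = Time.now, weeks = 3)
  now - weeks * 7 * 24 * 60 * 60
end

# The DELETE statement the cron job runs, with the cutoff filled in
# using the same "YYYY-MM-DD HH:MM:SS" layout as Rails' :db format.
def session_cleanup_sql(now = Time.now)
  cutoff = session_cutoff(now).strftime("%Y-%m-%d %H:%M:%S")
  "DELETE FROM sessions WHERE updated_at < \"#{cutoff}\""
end

puts session_cleanup_sql
```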

The following task used to be in the crontab listing, but it was causing "curl: not found" errors, so it has been removed:

5,15,25,35,45,55 * * * * /usr/local/patacriticism/production-web/shared/scripts/ping.sh