Difference between revisions of "Collex Deployment"

From ARC Wiki
== Summary of Current NINES Deployments ==
+
== Summary of Current ARC Deployments ==
  
 
NINES exists on the following servers:
 
  
 
* nines.org: The live site that is public facing.
 
* staging.nines.org: The indexing site, where documents are indexed and tested before going live. Contributors are pointed at this site to approve the index.
+
* edge.nines.org: The indexing site, where documents are indexed and tested before going live. Contributors are pointed at this site to approve the index.
* nines.performantsoftware.com: The site where the latest version of the code is put for testing and approval before deploying.
 
  
All of these sites contain local instances of solr and mysql.
+
18thConnect exists on the following servers:
  
All of these sites use solr at port 8983, and are running the production configuration of rails.
+
* 18thconnect.org
 +
* edge.18thconnect.org
  
== Updating the various servers ==
+
MESA exists on the following servers:
For normal code updates on nines.org, go to the web folder and type "rake collex:deploy_to_production". That will do the following:
 
* Make a backup of the database in case a rollback is needed.
 
* Tag the current trunk with the version number that is in web/app/models/branding.rb.
 
* Deploy the trunk, and do various housekeeping.
 
* Restart the Rails app.
 
  
If solr changes, update it from SVN by running "svn up" in the solr_1.4 folder, then restart solr.
+
* mesa.performantsoftware.com
  
If the index changes, copy the new index onto nines.org, unzip it into solr_1.4/solr/data/resources/index, then restart solr.
+
All of these sites contain local instances of solr and mysql.
  
For code updates on the staging servers, use the following rake task instead: "rake collex:update_staging". This doesn't back up the database or create a tag in SVN.
+
All of these sites use solr at port 8983, and are running the production configuration of rails.
 
 
== Install Capistrano ==
 
NOTE: EVERYTHING BELOW THIS POINT REFERS TO CAPISTRANO, WHICH WAS THE OLD WAY TO DEPLOY. CURRENTLY THE DEPLOYMENT IS HANDLED AS DESCRIBED ABOVE.
 
 
 
You must first install the Ruby gem Capistrano (and its dependencies) on your development computer.  We're currently using Capistrano version 1.4.1.
 
 
 
<code>sudo gem install capistrano -v 1.4.1</code>
 
  
You must also install Capistrano on your server (where the code will be deployed).
 
  
== Customizing Deploy.rb file ==
 
If you wish to deploy Collex to a new environment, you will need to change the settings in the <code>deploy.rb</code> file (in the <code>web/config</code> directory).  Most of these settings refer to the production computer to which you are deploying, rather than the development computer.
 
  
<code>:sudo</code>: the full path to sudo on the production computer
 
 
<code>:repository</code>: the full URL to the svn repository that holds the code you wish to have deployed
 
 
<code>:deploy_to</code>: the full destination file path to where you want the code stored
 
 
<code>:user</code>: your user id on the production computer
 
 
<code>:web</code>: the URL of the computer that will host the web server
 
 
<code>:app</code>: the URL of the computer that will host the RoR business code
 
 
<code>:db</code>: the URL of the computer that will host the MySQL database
 
 
<code>:rake</code>: the full file path to rake on the production computer
 
 
<code>:svn</code>: the full file path to the svn command on the production computer
 
 
<code>:checkout</code>: the command for checking out the code branch from svn
 
 
Note that your credentials need adequate permissions on the production computer for the script to execute successfully.
 
 
You will probably also need to customize the ownership command: <code>sudo("chown username:group #{release_path}/config/database.yml")</code>
 
 
== Deploying Collex Web to Staging and Production ==
 
By default, all Capistrano commands will apply to STAGING. PRODUCTION must be specified with <code>DEPLOY=production</code> preceding your <code>cap</code> command. To ''update'' the code and ''restart'' the application, go to web/ on your development machine.
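The staging-by-default rule can be sketched in plain Ruby. This is only an illustration of the selection logic, not the contents of the actual <code>deploy.rb</code>: the helper names <code>deploy_target</code> and <code>deploy_path</code> are hypothetical, the directory layout is inferred from the <code>staging-web</code> and <code>production-web</code> paths that appear elsewhere on this page, and treating every value other than <code>production</code> as staging is an assumption.

```ruby
# Hypothetical helper: the real deploy.rb may read DEPLOY differently.
# Anything other than "production" falls back to staging (assumption).
def deploy_target(env = ENV)
  value = env.fetch("DEPLOY", "").strip.downcase
  value == "production" ? "production" : "staging"
end

# Hypothetical path layout, inferred from the staging-web and
# production-web directories mentioned elsewhere on this page.
def deploy_path(target, base = "/usr/local/patacriticism")
  File.join(base, "#{target}-web")
end

target = deploy_target
puts "Deploying to #{target}: #{deploy_path(target)}"
```

Run without <code>DEPLOY</code> set, the sketch reports staging; run with <code>DEPLOY=production</code> in front of the command, it reports production.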
 
 
=== Initial setup ===
 
Before you can deploy the first time, you need to run the following from the development computer (not the deployment destination):
 
<pre>
 
cap setup_rails
 
</pre>
 
 
The deployment process also depends on a log directory that is reached through a symbolic link. If deployment reports errors about it, you may need to create the directory yourself with <code>mkdir log</code> under <code>../production-web/shared/</code> and <code>../staging-web/shared/</code>.
 
 
You may need to do the same for the <code>tmp</code> symlink.
 
 
=== Staging: update code and restart ===
 
<pre>
 
cap deploy
 
</pre>
 
 
=== Production: update code and restart ===
 
<pre>
 
DEPLOY=production cap deploy
 
</pre>
 
 
What if I need to run rake db:migrate?
 
<pre>
 
cap deploy_with_migrations
 
</pre>
 
 
What if I want to update the code, but not restart it yet?
 
<pre>
 
cap update
 
</pre>
 
 
And restart? (which needs to be done, for instance, when the sites table is updated)
 
<pre>
 
cap restart
 
</pre>
 
 
What if I want to see the tasks Capistrano has available?
 
<pre>
 
cap show_tasks
 
</pre>
 
 
Our custom tasks are located inside web/config/deploy.rb
 
 
=== I messed up and need to rollback! ===
 
<pre>
 
cap rollback
 
</pre>
 
 
We have a lot of releases on the server and I want to get rid of the oldest ones to save space.
 
<pre>
 
cap cleanup #keeps the last 5 releases on the server
 
</pre>
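To preview which releases a "keep the last 5" policy would remove, the selection can be sketched as below. This is not Capistrano's own code: the function name <code>releases_to_remove</code> and the sample release names are made up for illustration, and the sketch assumes release directories are named by timestamp so that sorting the names puts the oldest first.

```ruby
# Returns the release names that a "keep the newest N" policy would delete.
# Assumes timestamp-style names, so a plain sort is chronological.
def releases_to_remove(release_names, keep = 5)
  sorted = release_names.sort
  sorted.length > keep ? sorted[0...(sorted.length - keep)] : []
end

# Made-up release names for illustration only.
releases = %w[
  20070801120000 20070805093000 20070810101500 20070812080000
  20070815170000 20070820110000 20070822143000
]
puts releases_to_remove(releases).inspect
```

With seven releases and the default of five kept, only the two oldest names are returned.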
 
 
== Deploying Solr to Staging and Production ==
 
We also use Capistrano for our Solr updating, compiling, starting, and stopping. You will need to change the parameters in the <code>solr/config/deploy.rb</code> file, similar to the Rails file above.
 
 
You need to be in the <code>solr/</code> root directory to issue your Capistrano commands. The commands are similar to those for the Web application. The default environment is ''staging'', and you specify ''production'' by preceding your <code>cap</code> command with <code>DEPLOY=production</code>. Our custom tasks are located in <code>solr/config/deploy.rb</code>
 
 
=== Update Solr, Compile and Restart ===
 
<pre>
 
cap deploy #exports code from svn, compiles java, creates distribution, restarts Solr for staging environment
 
 
----
DEPLOY=production cap deploy #same thing for production environment
</pre>

Collex is an open source software package. Please refer to the documentation on the [https://github.com/collex Collex GitHub] for more information on Collex deployment. Specifically, see the documentation for [https://github.com/collex/solr SOLR] and [https://github.com/collex/collex Collex] (the web-facing application).
 
 
 
=== I just want to update the code, but not restart Solr ===
 
<pre>
 
cap update
 
</pre>
 
 
 
=== I want to stop, start, or restart Solr ===
 
<pre>
 
cap stop_solr
 
</pre>
 
<pre>
 
cap start_solr
 
</pre>
 
<pre>
 
cap restart_solr
 
</pre>
 
 
 
=== I messed up and need to rollback! ===
 
You read the part about deploying the Web application, right?
 
<pre>
 
cap rollback
 
</pre>
 
 
 
== A Capistrano Interlude ==
 
What is this [http://www.capify.org/ Capistrano] tool? Capistrano is a Ruby library that allows you to control the setup and deployment of code to remote servers. You could have one or 100 remote servers. The great thing is that you write your Capistrano tasks in one place, and run them from one place. The execution happens on all your servers ''at the same time''. You don't have to log in to each machine, go to the needed directory and issue commands. Instead, you run your Capistrano task(s) from your own development machine and Capistrano uses SSH to connect to each of the specified remote servers and run the task(s). Capistrano is aimed at easing ''setup'' and ''deployment'' of code to remote machines, but is not limited to those types of tasks. See http://www.capify.org/ for newer information and [http://manuals.rubyonrails.com/read/book/17 the Old Capistrano Manual] for older information. Note that the older manual uses <code>rake</code>, which has since been deprecated in favor of <code>cap</code>. Don't use <code>rake</code>.
 
 
 
=== What happens when I run <code>cap deploy</code>? ===
 
# Capistrano asks you for the password to log into the remote machine for whichever user account it is using.
 
# Cap checks out (or exports in Collex's case) the code from the Subversion repository.
 
# Cap rolls back automatically if there are errors.
 
# Cap symlinks the new checkout/export to <code>current</code>
 
# Cap restarts the application.
 
Capistrano may do more than that, but those are the basic tasks that execute on <code>deploy</code>.
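The release-and-symlink part of this sequence can be simulated locally to see why a failed deploy leaves the running code untouched. The sketch below is not Capistrano code: <code>simulate_deploy</code> is a hypothetical function, the Subversion export is replaced by writing a single file, there is no SSH or restart step, and it assumes a POSIX filesystem with symlink support.

```ruby
require "fileutils"
require "tmpdir"

# Simulates the release/symlink layout locally; no SSH, Subversion, or restart.
def simulate_deploy(deploy_to, release_name)
  release_path = File.join(deploy_to, "releases", release_name)
  current = File.join(deploy_to, "current")
  begin
    FileUtils.mkdir_p(release_path)
    # Stand-in for the Subversion export step.
    File.write(File.join(release_path, "REVISION"), release_name)
    # Optional hook where a later step could fail.
    yield release_path if block_given?
    # Repoint "current" at the new release only after everything succeeded.
    FileUtils.rm_f(current)
    FileUtils.ln_s(release_path, current)
  rescue StandardError
    # Roll back: discard the partial release and leave "current" as it was.
    FileUtils.rm_rf(release_path)
    raise
  end
  current
end

Dir.mktmpdir do |dir|
  simulate_deploy(dir, "20070801120000")
  simulate_deploy(dir, "20070805093000")
  puts File.readlink(File.join(dir, "current"))
end
```

Because <code>current</code> is repointed only as the last step, an error raised earlier removes the half-built release and the previous release stays live.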
 
 
 
== Deploying the Admin App ==
 
The tasks are pretty much the same as for Collex Web, except there is only one environment, not two, and you run the cap commands from the adminapp/ directory. Don't forget that you can always show the tasks: <code>cap show_tasks</code>.
 
 
 
Currently our server uses the indexer code (RdfFileIndexer) from the Solr staging environment.  In order to update the indexer code, you must redeploy (see above) the Solr staging server.  This issue is noted here:  http://faustroll.clas.virginia.edu:8080/browse/CLX-88
 
 
 
== Loading RDF into Collex ==
 
 
 
See [[CollexAdminApp]]
 
 
 
== Removing RDF from Collex ==
 
Use the correct solr port, depending on whether you are removing an archive of data from production or test.
 
<pre>
 
curl http://nines.org:[solr port]/solr/update --data-binary '<delete><query>archive:archive-name</query></delete>'
 
curl http://nines.org:[solr port]/solr/update --data-binary '<commit/>'
 
curl http://nines.org:[solr port]/solr/select?qt=cache_refresh 
 
</pre>
 
 
 
This only deletes the objects from Solr, but not any tags/annotations that we've stored in the database.  Further work is needed on database/Solr synchronization.  It's possible with script/console hackery, but could be streamlined and documented.
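If the removal is scripted rather than typed as curl commands, the request bodies could be built as in the sketch below. The function names are hypothetical, the <code>text/xml</code> content type is an assumption (the curl commands above do not set one), and the archive name is XML-escaped but not escaped for Solr query syntax. <code>post_update</code> is defined but deliberately not called, so running the sketch sends nothing.

```ruby
require "net/http"
require "uri"
require "cgi"

# Builds the delete-by-query message for one archive (XML-escaped only).
def delete_archive_body(archive_name)
  "<delete><query>archive:#{CGI.escapeHTML(archive_name)}</query></delete>"
end

COMMIT_BODY = "<commit/>"

# Sends one update message to solr. Not called in this sketch.
# The Content-Type header is an assumption, not taken from this page.
def post_update(host, port, body)
  uri = URI("http://#{host}:#{port}/solr/update")
  Net::HTTP.post(uri, body, "Content-Type" => "text/xml")
end

puts delete_archive_body("archive-name")
puts COMMIT_BODY
```

A real run would call <code>post_update</code> once with the delete body and once with the commit body, then request the cache refresh URL shown above.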
 
 
 
== Collex Administrative Interface ==
 
Users in Collex have (currently) three basic roles: regular user, admin, and editor.  We do not leverage the editor role currently, as that has been subsumed by the [[CollexAdminApp]] functionality.  Admin users can go into the /admin area of Collex to maintain the sites table and the users table (ignore the licenses table until ExhibitBuilder comes online).  To update a site or user record, edit and save it (setting a user to the admin role, for example).  Collex caches the sites table lookups, so it does need to be restarted to pick up these changes (''cap restart'' for the appropriate environment of Collex web).
 
 
 
You must be logged in and have administrative privileges to use the /admin page.
 
 
 
=== sites ===
 
Each project stores its project name, site URL and thumbnail URL in the sites database table.  To edit this information, go to the /admin/site page.  For example, on production NINES:
 
 
 
<pre>
 
http://www.nines.org/admin/site
 
</pre>
 
 
 
A similar url exists for staging.nines.org.  From here you can page through all the current sites and edit them or create a new site.  Any changes will require a restart of the apache server, accomplished via the "cap restart" command.
 
 
 
== Nightly Scripts ==
 
The Collex system is serviced by several scripts run periodically via cron.  Below is a listing of the cron jobs viewable with "crontab -l" as of August 2007:
 
 
 
<pre>
 
30 18 * * * /usr/local/patacriticism/collex/trunk/optimize.sh nines.org 8983 1>/tmp/cron.out 2>/tmp/cron.err
 
30 19 * * * /usr/local/patacriticism/production-web/shared/scripts/backup_index.sh
 
27 19 * * * /usr/local/patacriticism/production-web/shared/scripts/backup_mysql.sh nines_production
 
 
 
20 16 * * * RAILS_ENV=staging PATH=/opt/csw/bin:/usr/jdk1.5.0_06/bin:/usr/bin:/usr/local/bin /usr/local/patacriticism/staging-web/current/script/runner 'puts ActiveRecord::Base.connection.delete("DELETE FROM sessions WHERE updated_at < \"#{3.weeks.ago.to_s(:db)}\"")'
 
20 04 * * * RAILS_ENV=production PATH=/opt/csw/bin:/usr/jdk1.5.0_06/bin:/usr/bin:/usr/local/bin /usr/local/patacriticism/production-web/current/script/runner 'puts ActiveRecord::Base.connection.delete("DELETE FROM sessions WHERE updated_at < \"#{3.weeks.ago.to_s(:db)}\"")'
 
20 04 * * * RAILS_ENV=production PATH=/opt/csw/bin:/usr/jdk1.5.0_06/bin:/usr/bin:/usr/local/bin /usr/local/patacriticism/adminapp/current/script/runner 'puts ActiveRecord::Base.connection.delete("DELETE FROM sessions WHERE updated_at < \"#{3.weeks.ago.to_s(:db)}\"")'
 
 
 
* * * * * RAILS_ENV=production PATH=/opt/csw/bin:/usr/jdk1.5.0_06/bin:/usr/bin:/usr/local/bin /usr/local/patacriticism/adminapp/current/script/indexer staging
 
* * * * * RAILS_ENV=production PATH=/opt/csw/bin:/usr/jdk1.5.0_06/bin:/usr/bin:/usr/local/bin /usr/local/patacriticism/adminapp/current/script/indexer production
 
</pre>
 
 
 
A brief summary of these scripts:
 
 
 
* optimize.sh tells solr to repack the index such that deleted documents are dropped
 
* backup_index.sh backs up the index nightly
 
* backup_mysql.sh backs up the production database nightly
 
* script/indexer is run twice for staging and production modes to pick up any new content supplied by the adminapp
 
* ping.sh reports if solr or apache aren't responding (see note below)
 
* DELETE FROM sessions (via script/runner) removes sessions older than 3 weeks from both Collex staging and production, as well as the adminapp
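The session cleanup relies on Rails' <code>3.weeks.ago.to_s(:db)</code> to compute the cutoff. For readers without a Rails console at hand, the same arithmetic can be approximated in plain Ruby as below. This is a sketch, not the cron job itself: <code>session_cutoff</code> and <code>purge_sessions_sql</code> are hypothetical names, and time zone handling may differ from what Rails does.

```ruby
THREE_WEEKS = 3 * 7 * 24 * 60 * 60 # three weeks, in seconds

# Formats the cutoff as "YYYY-MM-DD HH:MM:SS", the usual SQL timestamp layout.
def session_cutoff(now = Time.now)
  (now - THREE_WEEKS).strftime("%Y-%m-%d %H:%M:%S")
end

# Builds the same kind of statement the cron one-liners pass to the database.
def purge_sessions_sql(now = Time.now)
  %(DELETE FROM sessions WHERE updated_at < "#{session_cutoff(now)}")
end

puts purge_sessions_sql
```

Printing the statement before running it is a cheap way to confirm that the cutoff lands where expected.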
 
 
 
The following task used to be in the crontab listing, but it was causing "curl: not found" errors, so it has been removed:
 
<pre>
 
5,15,25,35,45,55 * * * * /usr/local/patacriticism/production-web/shared/scripts/ping.sh
 
</pre>
 

Revision as of 21:08, 23 April 2013

Summary of Current ARC Deployments

NINES exists on the following servers:

  • nines.org: The live site that is public facing.
  • edge.nines.org: The indexing site, where documents are indexed and tested before going live. Contributors are pointed at this site to approve the index.

18thConnect exists on the following servers:

  • 18thconnect.org
  • edge.18thconnect.org

MESA exists on the following servers:

  • mesa.performantsoftware.com

All of these sites contain local instances of solr and mysql.

All of these sites use solr at port 8983, and are running the production configuration of rails.



Collex is an open source software package. Please refer to the documentation on the Collex GitHub for more information on Collex deployment. Specifically, see the documentation for SOLR and Collex (the web-facing application).