Class: Run

Inherits:
ActiveRecord::Base
  • Object
Defined in:
app/models/run.rb

Overview

A process consisting of the following steps, repeated over the current version of each FindingAid:

  1. Execute schematron checker against finding aid

  2. Record ConcreteIssues against FindingAidVersions

  3. Apply relevant Fixes to finding aids, producing amended XML

  4. Record ProcessingEvents (this step happens during fix application)

  5. Save final XML result to file

Steps 2-4 may repeat if fixable issues remain, up to MAX_PASSES additional passes.

Constant Summary

INPUT_DIR =

Directory where input files are written as ingested

File.join(Rails.root, 'public', 'input')
OUTPUT_DIR =

Directory to output processed files

File.join(Rails.root, 'public', 'output')
MAX_PASSES =

Maximum number of additional passes of checker/fixes to run after preflights and initial pass

ENV.fetch('MAX_PASSES', 5).to_i
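
Values read from ENV are always Strings, while MAX_PASSES is later used with Integer#times, so the value needs converting before use. The following is a minimal sketch of one way to do that; `max_passes_from` is a hypothetical helper for illustration, not part of Run.

```ruby
# Hypothetical helper (not part of Run): read a pass limit from an
# environment-like hash and always return an Integer.
def max_passes_from(env = ENV, default = 5)
  # ENV values are Strings; Integer() converts them and raises
  # ArgumentError on non-numeric input instead of silently yielding 0.
  Integer(env.fetch('MAX_PASSES', default))
end

max_passes_from({ 'MAX_PASSES' => '3' })  # Integer 3
max_passes_from({})                        # Integer 5 (the default)
```

Using `Integer()` rather than `to_i` is a design choice: a misconfigured value fails loudly at load time instead of quietly becoming zero passes.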

Instance Attribute Details

- (DateTime) completed_at

Returns:

  • (DateTime)


# File 'db/schema.rb', line 100

t.datetime "completed_at"

- (DateTime) created_at

Returns:

  • (DateTime)


# File 'db/schema.rb', line 103

t.datetime "created_at"

- (Integer) eads_processed

Returns:

  • (Integer)


# File 'db/schema.rb', line 101

t.integer  "eads_processed",                 default: 0,     null: false

- (String) name

Returns:

  • (String)


# File 'db/schema.rb', line 105

t.string   "name",               limit: 255,                 null: false

- (Boolean) run_for_processing

Returns:

  • (Boolean)


# File 'db/schema.rb', line 102

t.boolean  "run_for_processing",             default: false, null: false

- (DateTime) updated_at

Returns:

  • (DateTime)


# File 'db/schema.rb', line 104

t.datetime "updated_at"

Instance Method Details

- (Object) add_to_zip(zout, eadid, file)

Convenience method that writes a file into a zip as "#{eadid}.xml", then closes the file

Parameters:

  • zout (Java::Util::Zip::ZipOutputStream)

    the zip being written to

  • eadid (String)

    the eadid, used to construct filename in zip

  • file (File)

    an open file whose contents are added to the zip



# File 'app/models/run.rb', line 152

def add_to_zip(zout, eadid, file)
  zout.put_next_entry(java.util.zip.ZipEntry.new("#{eadid}.xml"))
  file.binmode
  file.each_line do |line|
    bytes = line.to_java_bytes
    zout.write(bytes, 0, bytes.length)
  end
  file.close
end

- (Object) apply_fix(xml, fix, pe = nil)

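Helper method that performs one step of a reduction over fixes: applies a single fix to the XML and returns the result. If the fix raises, the pre-fix XML is returned instead and the ProcessingEvent (if given) is marked as failed.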



# File 'app/models/run.rb', line 46

def apply_fix(xml, fix, pe = nil)
  begin
    pre_fix_xml = xml.dup
  # HAX: Swallow mysterious namespace failure, come ON Noko
  rescue Java::OrgW3cDom::DOMException => e
    pre_fix_xml = Nokogiri::XML(xml.serialize, nil, 'UTF-8') {|config| config.nonet}
  end

  begin # In case of failure, catch the XML
    fix.(xml)
  rescue Fixes::Failure, StandardError => e
    pe.update(failed: true) if pe
    pre_fix_xml
  end
end
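
The snapshot-and-fall-back pattern in apply_fix can be sketched without Nokogiri or the database. This is a simplified illustration, assuming plain strings in place of XML documents and lambdas in place of Fixes; `apply_step` is a hypothetical stand-in, not the method above.

```ruby
# Hypothetical stand-in for apply_fix: strings instead of Nokogiri documents.
def apply_step(value, fix)
  before = value.dup          # snapshot taken before the fix runs
  fix.call(value)
rescue StandardError
  before                      # the fix failed: fall back to the snapshot
end

fixes = [
  ->(s) { s.upcase },
  ->(_) { raise 'broken fix' },   # simulates a fix that fails
  ->(s) { s + '!' }
]

# Each fix receives the output of the previous one; a failing fix is skipped.
result = fixes.reduce('ead') { |acc, fix| apply_step(acc, fix) }
# result == "EAD!"
```

Because a failed step returns the snapshot, one broken fix does not abort the chain; later fixes still run against the last good value.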

- (Object) close_zipfiles(*zouts)

Convenience method for closing zipfiles

Parameters:

  • zouts (Array<Java::Util::Zip::ZipOutputStream>)

    zipfiles that need closing



# File 'app/models/run.rb', line 164

def close_zipfiles(*zouts)
  zouts.each do |zout|
    begin
      zout.close
    rescue java.io.IOException => e
      # already closed, nothing to do here
    end
  end
end

- (ActiveRecord::Relation<ConcreteIssue>) concrete_issues

# File 'app/models/run.rb', line 26

has_many :concrete_issues, dependent: :destroy

- (ActiveRecord::Relation<FindingAidVersion>) finding_aid_versions

# File 'app/models/run.rb', line 25

has_and_belongs_to_many :finding_aid_versions

- (Object) perform_analysis(faids)

Run checker over a set of provided faids, storing information on found errors in the database



# File 'app/models/run.rb', line 31

def perform_analysis(faids)
  @checker = Checker.new(schematron, self)
  faids.each do |faid|
    faid = faid.current if faid.is_a? FindingAid
    ActiveRecord::Base.transaction do
      @checker.check(faid).each do |h|
        ConcreteIssue.create!(h)
      end
      self.finding_aid_versions << faid
      self.increment! :eads_processed
    end
  end
end

- (Object) perform_processing!

Take an analyzed run, and process the finding aids through all relevant fixes. Record events in ProcessingEvents table.



# File 'app/models/run.rb', line 64

def perform_processing!
  raise "This run is already processed!" if run_for_processing
  update(run_for_processing: true)
  outdir = File.join(OUTPUT_DIR, "#{id}").shellescape
  indir =  File.join(INPUT_DIR,  "#{id}").shellescape
  Dir.mkdir(outdir, 0755) unless File.directory?(outdir)
  Dir.mkdir(indir, 0755) unless File.directory?(indir)

  # Stream input files to zip
  zout_in = java.util.zip.ZipOutputStream.new(File.open(File.join(indir, 'input.zip'), 'wb', 0644).to_outputstream)
  zout_out = java.util.zip.ZipOutputStream.new(File.open(File.join(outdir, 'out.zip'), 'wb', 0644).to_outputstream)

  finding_aid_versions
    .joins(:finding_aid, :concrete_issues => :issue)
    .select('finding_aid_versions.*,
             finding_aids.eadid,
             ARRAY_AGG(DISTINCT issues.identifier) AS identifiers')
    .group('finding_aids.eadid,finding_aid_versions.id')
    .each do |fa|
      add_to_zip(zout_in, fa.eadid, fa.file)

      # Preflight XML
      fa_xml = Fixes.preflights.values.reduce(fa.xml) do |xml, fix|
        apply_fix(xml, fix)
      end

      # Apply all relevant fixes to Finding Aid
      repaired = Fixes
                 .to_h
                 .select {|identifier, _| fa.identifiers.include? identifier}
                 .reduce(fa_xml) do|xml, (identifier, fix)|
        pe = processing_events.create(issue_id: schematron.issues.find_by(identifier: identifier).id,
                                      finding_aid_version_id: fa.id)
        apply_fix(xml, fix, pe)

      end # end of .reduce

      # Any problems which have fixes that exist now should theoretically
      # be things that were shadowed by the first pass, so take additional passes
      # until either no known issues remain or MAX_PASSES is reached
      MAX_PASSES.times do
        remaining_problems = schematron.issues.where(id: @checker.check_str(repaired.serialize(encoding: 'UTF-8')).map {|el| el[:issue_id]}.uniq).pluck(:identifier) & Fixes.to_h.keys

        # Run a second round of fixing if there are remaining problems
        break if remaining_problems.blank?
        repaired = Fixes
                   .to_h
                   .select {|identifier, _| remaining_problems.include? identifier}
                   .reduce(repaired) do |xml, (identifier, fix)|
          pe = processing_events.create(issue_id: schematron.issues.find_by(identifier: identifier).id,
                                        finding_aid_version_id: fa.id)
          apply_fix(xml, fix, pe)
        end
      end


      # Add notice of processing to revisiondesc
      today = DateTime.now.in_time_zone
      rd = repaired.at_xpath('/ead/eadheader/revisiondesc') || repaired.at_xpath('/ead/eadheader').add_child('<revisiondesc />').first
      rd.prepend_child(Nokogiri::XML::DocumentFragment.new(repaired, "\n" + <<-FRAGMENT.strip_heredoc + "\n"))
        <change>
          <date calendar="gregorian" era="ce" normal="#{today.strftime('%Y%m%d')}">#{today.strftime('%m/%d/%Y')}</date>
          <item>This resource was modified by the ArchivesSpace Preprocessor developed by the Harvard Library (https://github.com/harvard-library/archivesspace-preprocessor)</item>
        </change>
      FRAGMENT

      File.open(File.join(outdir, "#{fa.eadid}.xml"), 'w', 0644) do |f|
        repaired.write_xml_to(f, encoding: 'UTF-8')
      end

      add_to_zip(zout_out, fa.eadid, File.open(File.join(outdir, "#{fa.eadid}.xml"), 'r'))
  end

  update(completed_at: DateTime.now)
ensure
  close_zipfiles(zout_in, zout_out)
end
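
The bounded re-check loop in the method above can be illustrated in isolation. The sketch below is an assumption-laden simplification: `repair`, the lambda checker, and the two string fixes are hypothetical stand-ins for the schematron checker and Fixes, chosen so that one issue is only detectable after another has been fixed.

```ruby
# Hypothetical sketch: re-check and re-fix until clean or out of passes.
def repair(doc, checker, fixes, max_passes: 5)
  max_passes.times do
    # Only issues that have a registered fix are worth another pass.
    remaining = checker.call(doc) & fixes.keys
    break if remaining.empty?

    doc = fixes
          .select { |id, _| remaining.include?(id) }
          .reduce(doc) { |d, (_id, fix)| fix.call(d) }
  end
  doc
end

fixes = {
  'tabs'         => ->(s) { s.gsub("\t", ' ') },
  'double-space' => ->(s) { s.gsub('  ', ' ') }
}

# Reports the identifiers of the issues present in the string.
checker = lambda do |s|
  ids = []
  ids << 'tabs' if s.include?("\t")
  ids << 'double-space' if s.include?('  ')
  ids
end

repair("a\t\tb", checker, fixes)                 # "a b" after two passes
repair("a\t\tb", checker, fixes, max_passes: 1)  # "a  b", limit reached early
```

Here the double space is "shadowed" on the first check because it only appears once the tabs are replaced, which is the situation the additional passes are meant to catch.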

- (Object) perform_processing_run(faids)

Convenience method for doing analysis and processing in one go.



# File 'app/models/run.rb', line 143

def perform_processing_run(faids)
  perform_analysis(faids)
  perform_processing!
end

- (ActiveRecord::Relation<ProcessingEvent>) processing_events

# File 'app/models/run.rb', line 27

has_many :processing_events, dependent: :destroy

- (Schematron) schematron

# File 'app/models/run.rb', line 24

belongs_to :schematron