Class: Run
- Inherits: ActiveRecord::Base
- Inheritance chain: Object, ActiveRecord::Base, Run
- Defined in: app/models/run.rb
Overview
A process consisting of the following steps, repeated over the current version of each FindingAid:
1. Execute schematron checker against finding aid
2. Record ConcreteIssues against FindingAidVersions
3. Apply relevant Fixes to finding aids, producing amended XML
4. Record ProcessingEvents (this step happens during fix application)
5. Save final XML result to file
Steps 2-4 may happen repeatedly if necessary.
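The check/fix/re-check cycle described above can be sketched in plain Ruby. This is a simplified illustration and not the model's actual code: the document is a String rather than Nokogiri XML, and SKETCH_CHECKER, SKETCH_FIXES, and process_sketch are hypothetical names. The initial pass and the additional passes are also folded into one loop here.

```ruby
# Hypothetical stand-ins: the document is a String, the checker returns the
# identifiers of the problems it finds, and each fix is a callable.
SKETCH_CHECKER = lambda do |doc|
  found = []
  found << 'tabs' if doc.include?("\t")
  found << 'double-space' if doc.include?('  ')
  found
end

SKETCH_FIXES = {
  # Replacing a tab with two spaces creates a new 'double-space' problem,
  # which only a later pass can see.
  'tabs' => ->(doc) { doc.gsub("\t", '  ') },
  'double-space' => ->(doc) { doc.gsub(/ {2,}/, ' ') }
}.freeze

# One initial pass plus up to max_passes additional passes.
def process_sketch(doc, checker, fixes, max_passes: 5)
  events = []
  (1 + max_passes).times do
    # Only problems that have a known fix are worth another pass.
    problems = checker.call(doc) & fixes.keys
    break if problems.empty?

    doc = problems.reduce(doc) do |current, identifier|
      events << identifier                  # record the event (step 4)
      fixes.fetch(identifier).call(current) # apply the fix (step 3)
    end
  end
  [doc, events]
end

result, events = process_sketch("a\tb", SKETCH_CHECKER, SKETCH_FIXES)
# result is "a b"; events is ["tabs", "double-space"]
```

The second problem only appears after the first fix has run, which is the situation the repeated passes are meant to handle.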
Constant Summary
- INPUT_DIR =
Directory to output files as ingested
File.join(Rails.root, 'public', 'input')
- OUTPUT_DIR =
Directory to output processed files
File.join(Rails.root, 'public', 'output')
- MAX_PASSES =
Maximum number of additional passes of checker/fixes to run after preflights and initial pass
Integer(ENV.fetch('MAX_PASSES', 5))
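Values read from ENV are always Strings, while MAX_PASSES is later used with Integer#times, so a value supplied through the environment needs converting. A minimal sketch of that conversion follows; max_passes_from and its env parameter are hypothetical and exist only so the example runs without touching the real environment.

```ruby
# Hypothetical helper: `env` is anything that responds to #fetch like ENV does.
def max_passes_from(env, default = 5)
  # Integer() accepts both the Integer default and a numeric String,
  # and raises ArgumentError on junk instead of silently returning 0.
  Integer(env.fetch('MAX_PASSES', default))
end

max_passes_from({})                      # falls back to the default, 5
max_passes_from({ 'MAX_PASSES' => '3' }) # String "3" becomes Integer 3
```

Integer() is used here in preference to String#to_i so that a malformed setting fails loudly at load time.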
Instance Attribute Summary
- - (DateTime) completed_at
- - (DateTime) created_at
- - (Integer) eads_processed
- - (String) name
- - (Boolean) run_for_processing
- - (DateTime) updated_at
Belongs to
Has and belongs to many
- - (ActiveRecord::Relation<FindingAidVersion>) finding_aid_versions
Has many
- - (ActiveRecord::Relation<ConcreteIssue>) concrete_issues
- - (ActiveRecord::Relation<ProcessingEvent>) processing_events
Instance Method Summary
-
- (Object) add_to_zip(zout, eadid, file)
Convenience method for adding to zip.
-
- (Object) apply_fix(xml, fix, pe = nil)
Helper method that performs one step of reduction.
-
- (Object) close_zipfiles(*zouts)
Convenience method for closing zipfiles.
-
- (Object) perform_analysis(faids)
Run checker over a set of provided faids, storing information on found errors in the database.
-
- (Object) perform_processing!
Take an analyzed run, and process the finding aids through all relevant fixes.
-
- (Object) perform_processing_run(faids)
Convenience method for doing analysis and processing in one go.
Instance Attribute Details
- (DateTime) completed_at
# File 'db/schema.rb', line 100
t.datetime "completed_at"
- (DateTime) created_at
# File 'db/schema.rb', line 103
t.datetime "created_at"
- (Integer) eads_processed
# File 'db/schema.rb', line 101
t.integer "eads_processed", default: 0, null: false
- (String) name
# File 'db/schema.rb', line 105
t.string "name", limit: 255, null: false
- (Boolean) run_for_processing
# File 'db/schema.rb', line 102
t.boolean "run_for_processing", default: false, null: false
- (DateTime) updated_at
# File 'db/schema.rb', line 104
t.datetime "updated_at"
Instance Method Details
- (Object) add_to_zip(zout, eadid, file)
Convenience method that writes file to the zip stream zout as an entry named "<eadid>.xml", then closes file.
# File 'app/models/run.rb', line 152
def add_to_zip(zout, eadid, file)
  zout.put_next_entry(java.util.zip.ZipEntry.new("#{eadid}.xml"))
  file.binmode
  file.each_line do |line|
    bytes = line.to_java_bytes
    zout.write(bytes, 0, bytes.length)
  end
  file.close
end
- (Object) apply_fix(xml, fix, pe = nil)
Helper method that performs one step of the reduction over fixes: it applies fix to xml and returns the result. If the fix raises, it marks the ProcessingEvent pe (when given) as failed and returns a copy of the XML taken before the fix ran.
# File 'app/models/run.rb', line 46
def apply_fix(xml, fix, pe = nil)
  begin
    pre_fix_xml = xml.dup
  # HAX: Swallow mysterious namespace failure, come ON Noko
  rescue Java::OrgW3cDom::DOMException => e
    pre_fix_xml = Nokogiri::XML(xml.serialize, nil, 'UTF-8') {|config| config.nonet}
  end

  begin
    # In case of failure, catch the XML
    fix.(xml)
  rescue Fixes::Failure, StandardError => e
    pe.update(failed: true) if pe
    pre_fix_xml
  end
end
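The snapshot-and-fall-back pattern in apply_fix can be illustrated without Nokogiri or ActiveRecord. In this sketch the document is a String, the event is a plain Hash, and SketchFailure, apply_fix_sketch, and SKETCH_FIX_CHAIN are all hypothetical names; it shows the shape of the technique under those assumptions, not the model's implementation.

```ruby
# Hypothetical error class standing in for a fix-specific failure.
class SketchFailure < StandardError; end

# Apply `fix` to `doc`; on any StandardError, flag the event (if given)
# and return the copy taken before the fix had a chance to mutate `doc`.
def apply_fix_sketch(doc, fix, event = nil)
  pre_fix = doc.dup
  fix.call(doc)
rescue StandardError
  event[:failed] = true if event
  pre_fix
end

SKETCH_FIX_CHAIN = [
  ->(d) { d.upcase },
  ->(d) { d << '!'; raise SketchFailure, 'boom' }, # mutates, then fails
  ->(d) { d + '?' }
].freeze

result = SKETCH_FIX_CHAIN.reduce('abc') { |doc, fix| apply_fix_sketch(doc, fix) }
# result is "ABC?": the failed fix's partial "!" edit is discarded
```

Taking the copy before calling the fix is what lets a half-applied, in-place modification be thrown away, so one failing fix does not poison the rest of the chain.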
- (Object) close_zipfiles(*zouts)
Convenience method that closes each given zip stream, ignoring the IOException raised when a stream is already closed.
# File 'app/models/run.rb', line 164
def close_zipfiles(*zouts)
  zouts.each do |zout|
    begin
      zout.close
    rescue java.io.IOException => e
      # already closed, nothing to do here
    end
  end
end
- (ActiveRecord::Relation<ConcreteIssue>) concrete_issues
# File 'app/models/run.rb', line 26
has_many :concrete_issues, dependent: :destroy
- (ActiveRecord::Relation<FindingAidVersion>) finding_aid_versions
# File 'app/models/run.rb', line 25
has_and_belongs_to_many :finding_aid_versions
- (Object) perform_analysis(faids)
Run the checker over the provided finding aids (faids), storing each issue found in the database. A FindingAid is replaced by its current version before checking, and each finding aid is handled in its own transaction.
# File 'app/models/run.rb', line 31
def perform_analysis(faids)
  @checker = Checker.new(schematron, self)
  faids.each do |faid|
    faid = faid.current if faid.is_a? FindingAid
    ActiveRecord::Base.transaction do
      @checker.check(faid).each do |h|
        ConcreteIssue.create!(h)
      end
      self.finding_aid_versions << faid
      self.increment! :eads_processed
    end
  end
end
- (Object) perform_processing!
Take an analyzed run and process its finding aids through all relevant fixes, recording each fix application in the ProcessingEvents table. Raises if the run has already been processed.
# File 'app/models/run.rb', line 64
def perform_processing!
  raise "This run is already processed!" if run_for_processing
  update(run_for_processing: true)

  outdir = File.join(OUTPUT_DIR, "#{id}").shellescape
  indir = File.join(INPUT_DIR, "#{id}").shellescape

  Dir.mkdir(outdir, 0755) unless File.directory?(outdir)
  Dir.mkdir(indir, 0755) unless File.directory?(indir)

  # Stream input files to zip
  zout_in = java.util.zip.ZipOutputStream.new(File.open(File.join(indir, 'input.zip'), 'wb', 0644).to_outputstream)
  zout_out = java.util.zip.ZipOutputStream.new(File.open(File.join(outdir, 'out.zip'), 'wb', 0644).to_outputstream)

  finding_aid_versions
    .joins(:finding_aid, :concrete_issues => :issue)
    .select('finding_aid_versions.*, finding_aids.eadid, ARRAY_AGG(DISTINCT issues.identifier) AS identifiers')
    .group('finding_aids.eadid,finding_aid_versions.id')
    .each do |fa|
    add_to_zip(zout_in, fa.eadid, fa.file)

    # Preflight XML
    fa_xml = Fixes.preflights.values.reduce(fa.xml) do |xml, fix|
      apply_fix(xml, fix)
    end

    # Apply all relevant fixes to Finding Aid
    repaired = Fixes
      .to_h
      .select {|identifier, _| fa.identifiers.include? identifier}
      .reduce(fa_xml) do |xml, (identifier, fix)|
      pe = processing_events.create(issue_id: schematron.issues.find_by(identifier: identifier).id,
                                    finding_aid_version_id: fa.id)
      apply_fix(xml, fix, pe)
    end # end of .reduce

    # Any problems which have fixes that exist now should theoretically
    # be things that were shadowed by the first pass, so take additional passes
    # until either no known issues or MAX_PASSES
    MAX_PASSES.times do
      remaining_problems = schematron.issues.where(id: @checker.check_str(repaired.serialize(encoding: 'UTF-8')).map {|el| el[:issue_id]}.uniq).pluck(:identifier) & Fixes.to_h.keys

      # Run a second round of fixing if there are remaining problems
      break if remaining_problems.blank?

      repaired = Fixes
        .to_h
        .select {|identifier, _| remaining_problems.include? identifier}
        .reduce(repaired) do |xml, (identifier, fix)|
        pe = processing_events.create(issue_id: schematron.issues.find_by(identifier: identifier).id,
                                      finding_aid_version_id: fa.id)
        apply_fix(xml, fix, pe)
      end
    end

    # Add notice of processing to revisiondesc
    today = DateTime.now.in_time_zone
    rd = repaired.at_xpath('/ead/eadheader/revisiondesc') || repaired.at_xpath('/ead/eadheader').add_child('<revisiondesc />').first
    rd.prepend_child(Nokogiri::XML::DocumentFragment.new(repaired, "\n" + <<-FRAGMENT.strip_heredoc + "\n"))
      <change>
        <date calendar="gregorian" era="ce" normal="#{today.strftime('%Y%m%d')}">#{today.strftime('%m/%d/%Y')}</date>
        <item>This resource was modified by the ArchivesSpace Preprocessor developed by the Harvard Library (https://github.com/harvard-library/archivesspace-preprocessor)</item>
      </change>
    FRAGMENT

    File.open(File.join(outdir, "#{fa.eadid}.xml"), 'w', 0644) do |f|
      repaired.write_xml_to(f, encoding: 'UTF-8')
    end

    add_to_zip(zout_out, fa.eadid, File.open(File.join(outdir, "#{fa.eadid}.xml"), 'r'))
  end

  update(completed_at: DateTime.now)
ensure
  close_zipfiles(zout_in, zout_out)
end
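The revision notice added to revisiondesc carries the processing date in two formats: a compact YYYYMMDD value in the normal attribute and a US-style MM/DD/YYYY display value. The sketch below builds a comparable fragment as a plain String. change_fragment is a hypothetical helper, the item text is shortened, and Ruby's squiggly heredoc stands in for ActiveSupport's strip_heredoc.

```ruby
# Hypothetical helper; the model builds its fragment inline and hands it
# to Nokogiri, whereas this version only returns the String.
def change_fragment(today)
  <<~FRAGMENT
    <change>
      <date calendar="gregorian" era="ce" normal="#{today.strftime('%Y%m%d')}">#{today.strftime('%m/%d/%Y')}</date>
      <item>This resource was modified by the ArchivesSpace Preprocessor</item>
    </change>
  FRAGMENT
end

puts change_fragment(Time.utc(2016, 3, 5))
# The date element carries normal="20160305" and the text 03/05/2016
```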
- (Object) perform_processing_run(faids)
Convenience method for doing analysis and processing in one go.
# File 'app/models/run.rb', line 143
def perform_processing_run(faids)
  perform_analysis(faids)
  perform_processing!
end
- (ActiveRecord::Relation<ProcessingEvent>) processing_events
# File 'app/models/run.rb', line 27
has_many :processing_events, dependent: :destroy