Chemistry Toolkit Rosetta Wiki

Detect and report SMILES and SDF parsing errors

Parse the strings "Q" and "C1C" as if they were SMILES and store any error or warning messages in a file. Neither should be accepted as a valid SMILES string: "Q" is not an element symbol, and "C1C" opens a ring that is never closed.

There needs to be a similar test for SD files.

Hmm, even better would be to parse a SMILES file and SD file containing errors, to see if the respective readers can skip the errors.

The idea here is that an application (including a web app) may want to report that an input structure was incorrect, and give some information about what was wrong.
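
The skip-and-report behaviour described above can be sketched independently of any toolkit. The Python below is a minimal illustration, not any toolkit's API: `read_smiles_lines` and `toy_parse` are hypothetical names, and `toy_parse` is a stand-in that knows nothing about SMILES and rejects only the two test strings from this task. A real application would pass its toolkit's SMILES parser instead.

```python
def read_smiles_lines(lines, parse):
    """Parse one SMILES per line and return (molecules, errors).

    `parse` is any callable that returns a molecule object or raises an
    exception on bad input. A bad record is logged and skipped, not fatal.
    """
    molecules, errors = [], []
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:          # ignore blank lines
            continue
        smiles = fields[0]      # first column; anything after it is a title
        try:
            molecules.append(parse(smiles))
        except Exception as exc:
            errors.append((lineno, smiles, str(exc)))
    return molecules, errors


def toy_parse(smiles):
    # Hypothetical stand-in for a real parser: it simply rejects the two
    # test strings from this task and passes everything else through.
    if smiles in ("Q", "C1C"):
        raise ValueError("cannot parse SMILES %r" % smiles)
    return smiles


mols, errs = read_smiles_lines(
    ["CCO ethanol", "Q", "C1C", "c1ccccc1 benzene"], toy_parse)
for lineno, smiles, message in errs:
    print("line %d: %s" % (lineno, message))
```

Keeping the line number and the offending string next to the message is what lets an application tell the user which input was wrong, rather than only that something failed.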

OpenBabel/Rubabel

require 'rubabel'
File.open("log.txt", 'w') do |out|
     %w(Q C1C).each do |smile|
          Rubabel[smile] rescue out.puts "bad smiles #{smile}"
     end
end

Cactvs/Tcl

In Tcl

set fh [open log.txt w]
foreach smiles [list "Q" "C1C"] {
   if {[catch {ens create $smiles} msg]} {
      puts $fh $msg
   }
}
close $fh

The message is "Error: ens create failed: Failed to decode structure data specification"

For file input, you can do something like

set fh [molfile open "dubious.smi"]
while 1 {
  if {[catch {molfile read $fh} eh]} {
     if {[molfile get $fh eof]} break
     puts $eh
     continue
  }
  ens delete $eh
}
molfile close $fh

The logged messages about corrupted records are typically something like "Data file syntax error in line 99 record 6".

All I/O modules in the toolkit have the capability to re-sync the file (trivial for SMILES, not so simple for SDF). In the read loop, this happens automatically.
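
To illustrate why re-syncing is harder for SDF than for SMILES: a SMILES file holds one record per line, so a reader just moves to the next line, whereas an SD record spans many lines and the reader has to scan forward to the `$$$$` delimiter line before it can try again. The Python below is a toolkit-independent sketch of that idea and says nothing about how Cactvs implements it. The names `split_sdf_records`, `check_counts_line` and `report_bad_records` are hypothetical, and the check is deliberately shallow: it reads only the V2000 counts line (atom and bond counts in the first two three-character fields) and compares them with the record length.

```python
def split_sdf_records(text):
    """Yield (record_number, first_line_number, lines) for each SD record.

    A record ends at a line consisting of '$$$$'. Scanning for that line
    is the re-sync step: whatever a broken record contains, the next
    record starts right after the delimiter.
    """
    record, start, number = [], 1, 1
    for lineno, line in enumerate(text.splitlines(), start=1):
        if line.rstrip() == "$$$$":
            yield number, start, record
            record, start, number = [], lineno + 1, number + 1
        else:
            record.append(line)
    if any(l.strip() for l in record):   # trailing record without delimiter
        yield number, start, record


def check_counts_line(lines):
    """Raise ValueError unless the record is long enough for the atom and
    bond counts declared on its fourth line (V2000 layout assumed)."""
    if len(lines) < 4:
        raise ValueError("record too short for a header and counts line")
    counts = lines[3]
    try:
        natoms, nbonds = int(counts[0:3]), int(counts[3:6])
    except ValueError:
        raise ValueError("unreadable counts line: %r" % counts) from None
    if len(lines) < 4 + natoms + nbonds:
        raise ValueError("expected %d atom and %d bond lines" % (natoms, nbonds))


def report_bad_records(text):
    """Return one message per bad record; good records are passed over."""
    errors = []
    for number, start, lines in split_sdf_records(text):
        try:
            check_counts_line(lines)
        except ValueError as exc:
            errors.append("record %d (line %d): %s" % (number, start, exc))
    return errors


good = ["methane", "", "",
        "  1  0  0  0  0  0  0  0  0  0999 V2000",
        "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
        "M  END", "$$$$"]
bad = ["broken", "", "", " xx  0", "$$$$"]
sample = "\n".join(good + bad + good) + "\n"
errors = report_bad_records(sample)
for message in errors:
    print(message)
```

In this sample the second record has a damaged counts line. The reader reports it and still reaches the third record, because the split is driven by the delimiter and not by the contents of the record.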

Cactvs/Python

Here is essentially the same code in Python:

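# the log file is closed automatically when the with block ends
with open('log.txt', 'w') as f:
    for smiles in ['Q', 'C1C']:
        try:
            e = Ens(smiles)
        except Exception as x:
            # log the toolkit's error message for the rejected SMILES
            f.write(x.args[0] + "\n")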

and

f = Molfile('dubious.smi')
while True:
    try:
        e = f.read()
    except Exception as x:
        # corrupted record: report it and keep reading
        print(x.args[0])
    else:
        if e is None:  # end of file
            break
        e.delete()
f.close()

Note that there is a subtle difference between the Tcl and Python implementations of the structure file input command: in Tcl, hitting EOF raises an error, while in Python, None is returned.
