OIM Data Miner

With Great Data Comes Great Responsibility

Matt Nowell, EBSD Product Manager, EDAX

First, I have to acknowledge that I stole the title above from a tweet by Dr. Ben Britton (@BMatB), but I think it applies perfectly to the topic at hand. This blog post was inspired by a few recent events around the lab. First, our data server suffered multiple simultaneous hard drive failures. Nothing makes you appreciate your data more than no longer having access to it. Second, my colleague and friend Rene de Kloe wrote the preceding article in this blog, and if you haven't had the opportunity to read it, I highly recommend it.

Having been involved with EBSD sample analysis for over 20 years, I have drawers and drawers full of samples. Some are very clearly labeled. Some are not labeled, or the label has worn off or fallen off. One of them, we believe, is one of Rene's missing samples, although both of us have spent time trying to find it. Some I can recognize just by looking; others need a sheet of paper with descriptions and details. Some are simply sitting on my desk, either waiting for analysis or serving as visual props during talks. Here is a picture of some of these desk samples, including a golf club with a sample extracted from the face, a piece of a Gibeon meteorite that has been shaped into a guitar pick, a wafer I fabricated myself in school, a rod of tin I can bend and work harden and then hand to someone else to try, and a sample of a friction stir weld that I've used as a fine-grained aluminum standard.

[Figure 1: Assorted desk samples described above]
Each sample leads to data. With high-speed cameras, it's easier to collect more data in a shorter period of time. With simultaneous EDS collection, it's more data still. With features like NPAR™, PRIAS™, HR-EBSD, and the OIM Analysis™ v8 reindexing functionality, there is also a driving force to save the EBSD patterns for each scan. With 3D EBSD and in-situ heating and deformation experiments, there are multiple scans per sample. Over the years, we have archived data on Zip drives, CDs, DVDs, and portable hard drives. Fortunately, the cost of storage has decreased dramatically over the last 20+ years. I remember buying my first USB storage stick in 2003, with 256 MB of capacity. Now I routinely carry around multiple TB of data full of different examples for whatever questions might pop up.

[Figure: Storage cost per gigabyte over time]
How do we organize this plethora of data?
Personally, I sometimes struggle with this problem. My desk and office are often a messy conglomerate of different samples, golf training aids (they help me think), papers to read, brochures to edit, and other work to do. I'm often asked if I have an example of one material or another, so there is a strong incentive to be able to find things quickly. Previously I used a database we wrote internally, which was nice but required all of us to enter accurate data into it. I also used photo management software and the batch processor in OIM Analysis™ to create a visual database of microstructures that I could quickly review to recognize examples. Often, however, I ended up needing multiple pictures to convey all the information required to make the collection useful.

[Figure: OIM Data Miner screenshot]

To help with this problem, the OIM Data Miner function was implemented in OIM Analysis™. This tool indexes the data on any given hard drive and provides a list of all the OIM scan files present; a screenshot of the Data Miner running on one of my drives is shown above. The Data Miner is accessed through its icon on the OIM Analysis™ toolbar. For each scan, I can see the name, the location, the date associated with the file, the phases used, the number of points, the step size, the average confidence index, and the elements associated with any simultaneous EDS collection. From this tool, I can open a file of interest or delete a file I no longer need. I can search by name, by phase, or by element, and I can display duplicate files. I have found this extremely useful for locating datasets, and I wanted to write a little bit about it in case you might also have a use for this functionality.
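For readers who like to see the idea in code, here is a minimal sketch of that kind of drive indexing in Python, built around the plain-text `#` header convention of .ang scan files. The function names and the two header fields parsed here (`MaterialName` for phases, `XSTEP` for step size) are my own illustrative choices for this sketch; this is not the actual OIM Data Miner implementation.

```python
# Hypothetical sketch: walk a drive, catalog .ang scan files, and pull a
# few metadata fields from each file's '#' header lines. Not the real
# OIM Data Miner; field names follow the common .ang header convention.
import os

def parse_ang_header(path):
    """Extract a few metadata fields from an .ang file's header."""
    info = {"file": path, "phases": [], "step": None}
    with open(path, "r", errors="ignore") as f:
        for line in f:
            if not line.startswith("#"):
                break  # header ends where the data rows begin
            if "MaterialName" in line:
                info["phases"].append(line.split()[-1])
            elif "XSTEP" in line:
                info["step"] = float(line.split()[-1])
    return info

def find_scan_files(root):
    """Recursively index every .ang file under root."""
    catalog = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if name.lower().endswith(".ang"):
                catalog.append(parse_ang_header(os.path.join(dirpath, name)))
    return catalog
```

With a catalog like this in hand, searching by phase or spotting duplicate scans becomes a simple filter over the list, which is essentially the convenience the Data Miner provides through its interface.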