Sherlock: Advanced Automated Scoring of Crystallization Experiments

Significance of Crystallization Experiment Scoring

Scoring is a critical step in any crystallization experiment as it directly impacts the success of determining macromolecular structures. Accurate scoring distinguishes between promising crystals and non-productive conditions, saving valuable time and resources. By systematically evaluating crystal quality and morphology, researchers can prioritize the most promising conditions for further optimization, leading to more efficient structure determination.

Technological Advancements and Manual Drop-Scoring Challenges

Recent advances in laboratory automation and imaging technology have significantly increased experimental throughput and accelerated research in structural biology. Laboratory Information Management Systems such as Rock Maker® by Formulatrix have further streamlined protein crystallization workflows. However, manual drop scoring remains a bottleneck: it is slow and labor-intensive, prone to bias, and varies from person to person.

Variability in Manual Drop Scoring of Crystallization Experiments

In 2021, we conducted a study to assess the variability in manual drop scoring of crystallization experiments. We gave 1,200 drop images to seven crystallographers and asked them to label each image manually using MARCO's CPOX classification system (Clear, Precipitate, Others, and Crystal). All seven crystallographers agreed on only about 50% of the images, highlighting significant variability in manual crystallization scoring.

A) Agreement among crystallographers on general image scoring

B) Agreement among crystallographers on image scoring with crystals

We also analyzed agreement among the crystallographers specifically on crystal identification. The dataset included 205 images containing crystals, and the crystallographers agreed unanimously on only 41% of them. Because this is lower than the roughly 50% agreement across all images, identifying crystals appears to be particularly challenging compared with other image classes. The study underscored the need for an AI-based auto-scoring model to take over this time-consuming task, allowing researchers to focus on more critical aspects of their work.
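
The unanimous-agreement figures quoted in this section can be computed along the following lines. This is a minimal sketch, not the analysis code used in the study: the function names and the toy labels are hypothetical, and selecting the crystal subset by majority vote is an assumption, since the study does not state how the 205 crystal images were identified.

```python
from collections import Counter


def unanimous_agreement(labels_per_image):
    """Fraction of images on which every scorer chose the same label."""
    if not labels_per_image:
        raise ValueError("no images to evaluate")
    unanimous = sum(1 for labels in labels_per_image if len(set(labels)) == 1)
    return unanimous / len(labels_per_image)


def majority_label(labels):
    """Most common label for one image (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]


# Toy data (invented for illustration): 4 images, each scored by 3 people.
toy = [
    ["Clear", "Clear", "Clear"],
    ["Crystal", "Precipitate", "Crystal"],
    ["Precipitate", "Precipitate", "Precipitate"],
    ["Others", "Crystal", "Clear"],
]

overall = unanimous_agreement(toy)

# Assumption: an image counts as a "crystal image" if most scorers said Crystal.
crystal_subset = [labels for labels in toy if majority_label(labels) == "Crystal"]
crystal_only = unanimous_agreement(crystal_subset)

print(f"overall: {overall:.2f}, crystal images: {crystal_only:.2f}")
```

Unanimous agreement is a strict measure: a single dissenting scorer marks the whole image as a disagreement, so it tends to fall as the number of scorers grows.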

AI-Based Automation as a Solution

Automating the scoring process with Artificial Intelligence (AI) circumvents these stumbling blocks in crystallization experiments. Not only does it offer higher throughput, but it is also unaffected by fatigue, haste, or distraction, so it performs consistently around the clock. Moreover, when trained on diverse datasets, AI reduces the influence of individual human bias, resulting in more reliable and accurate image analysis.

Introduction to MARCO

One such AI-based scoring model is Machine Recognition of Crystallization Outcomes, or MARCO, a convolutional neural network-based algorithm developed by Google. Although MARCO's integration with Rock Maker helped reduce the time and effort spent on scoring, it has low crystal detection accuracy due to limitations in its training datasets and class definitions.

Sherlock by Formulatrix

To address these limitations, Formulatrix has developed an AI-based auto-scoring model, Sherlock. Compared to MARCO, Sherlock is trained on a larger and more diverse practical dataset of 800,000 images from 28 collaborating laboratories, enabling accurate and reliable crystal identification. Furthermore, it classifies drops into five distinct classes: Crystal, Crystal-Else, Phase Separation, Precipitate, and Clear. Introducing the "Crystal-Else" class has greatly improved crystal identification in challenging cases, such as when crystals are embedded in precipitates, which are often misclassified by MARCO.

Sherlock image classification system

Another key factor contributing to Sherlock's improved crystal detection accuracy is local feature detection or tiling. Before images are sent to Sherlock for scoring, they are divided into smaller sections or tiles, allowing for a more detailed analysis to detect small crystals that MARCO might miss.

Image tiling in Sherlock
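
The tiling idea can be sketched as follows. This is an illustration only: the source does not describe Sherlock's tile size, overlap, or edge handling, so the `tile_boxes` function, its parameters, and the choice to shift the last tile back inside the image are all assumptions.

```python
def tile_boxes(width, height, tile, overlap=0):
    """Return (left, top, right, bottom) boxes that together cover the image.

    Hypothetical helper: tile size, overlap, and edge handling are assumptions,
    not Sherlock's actual settings.
    """
    if tile <= 0 or not 0 <= overlap < tile:
        raise ValueError("tile must be positive and overlap smaller than tile")
    step = tile - overlap

    def starts(size):
        # An image smaller than one tile gets a single, clipped tile.
        if size <= tile:
            return [0]
        last = size - tile
        positions = list(range(0, last, step))
        # Shift the final tile back so it ends exactly at the image edge.
        positions.append(last)
        return positions

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]


# A 1000 x 800 image cut into 400-pixel tiles gives a 3 x 2 grid.
boxes = tile_boxes(1000, 800, 400)
for box in boxes:
    print(box)
```

Each box could then be cropped from the drop image and scored separately, so that a small crystal occupies a larger share of the pixels the model sees. How the per-tile results are combined into one score for the drop is not covered by this sketch.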

MARCO-Sherlock Performance Comparison

A comparison between MARCO and Sherlock, using a diverse dataset of 5,000 manually labeled images from 10 customers, showed that Sherlock outperforms MARCO in overall accuracy and in crystal recall (the share of actual crystals that the model detects). However, Sherlock's crystal precision (the share of predicted crystals that are real, which drops as false positives increase) was slightly lower. Even so, user surveys reveal that crystallographers prefer false positives over missed crystals, so Sherlock's higher sensitivity is the more useful trade-off in real-world applications.

Performance comparison of Sherlock with MARCO
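
The three metrics in this comparison can be computed as in the sketch below. The labels are invented toy data, not results from the 5,000-image comparison, and the two example models are generic stand-ins (a "conservative" scorer and a "sensitive" one) rather than MARCO and Sherlock; the `crystal_metrics` function name is hypothetical.

```python
def crystal_metrics(truth, predicted, positive="Crystal"):
    """Overall accuracy plus recall and precision for the positive class."""
    if len(truth) != len(predicted) or not truth:
        raise ValueError("truth and predicted must be non-empty and equal length")
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs),
        # Recall: of the real crystals, how many were found.
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        # Precision: of the predicted crystals, how many were real.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }


# Toy labels (invented): three of the six images really contain crystals.
truth = ["Crystal", "Crystal", "Clear", "Precipitate", "Crystal", "Clear"]
conservative = ["Crystal", "Clear", "Clear", "Precipitate", "Clear", "Clear"]
sensitive = ["Crystal", "Crystal", "Crystal", "Precipitate", "Crystal", "Clear"]

print("conservative:", crystal_metrics(truth, conservative))
print("sensitive:   ", crystal_metrics(truth, sensitive))
```

In this toy case the sensitive scorer finds every crystal at the cost of one false positive, while the conservative scorer raises no false alarms but misses two of the three crystals. That mirrors the trade-off described above: a false positive costs a quick visual check, whereas a missed crystal may mean a lost hit.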