Key Words: Artificial intelligence, machine learning, predictive modeling, ovarian stimulation
The field of assisted reproductive technologies (ARTs) has seen significant advancements in the last 40 years, with a growing focus on improving patient outcomes through the application of new technologies (1). In recent years, the healthcare industry has increased its adoption of machine learning (ML) algorithms, leveraging the benefits of computer science advancements and the management of large datasets to improve patient care and increase operational efficiency (2). Applications have ranged from predicting disease outcomes and identifying optimal treatment plans to automating administrative tasks and enhancing patient engagement (2). Although still in its infancy, the field of artificial intelligence (AI) in reproductive medicine has benefited from increased research and investment from both the scientific and technology communities (3).
The potential for AI-assisted in vitro fertilization (IVF) to improve both the outcomes and efficiency of IVF through controlled ovarian stimulation is a rapidly growing area of research (4, 5, 6). Artificial intelligence can aid in optimizing the dosage and timing of medications, reducing the likelihood of overstimulation or understimulation, and ultimately streamlining the IVF process. This could result in improved outcomes for patients and enable healthcare providers to treat more patients with greater efficiency and accuracy, ultimately driving increased clinical and financial value while increasing access to care. In this article, we aim to shed light on the latest breakthroughs of AI in IVF stimulation and examine the potential of these technologies to transform the field. We will briefly review the technology, extensively review existing literature, and discuss how an AI-enabled future of controlled ovarian stimulation may come to fruition. We will also discuss the role of validation in these technologies and their potential limitations. By integrating AI into the field of reproductive medicine, we expect increased access to fertility services through higher-value clinical care, leading to more successful and efficient fertility treatments.
Artificial intelligence in healthcare
Artificial intelligence is a term that broadly describes technologies built to mimic human intelligence for tasks such as image recognition or decision-making. It has been integrated into many fields, including speech recognition and clinical decision-making support in medicine. Machine learning is a subset of AI that uses statistical modeling to analyze data patterns and generate predictive outputs without explicit coding or instructions. When paired with big data, it can lead to powerful models that draw inferences from seemingly random or inconclusive data.
Machine learning models are increasingly being developed to aid clinical decision-making in healthcare. These narrow AI systems use clinical data to train algorithms for specific tasks and can range in complexity from simple linear models, such as linear or logistic regression, to more complicated nonlinear algorithms, such as neural networks or gradient boosting. These algorithms rely on a complex statistical analysis of clinical inputs to generate outcome predictions. Different types of ML algorithms perform different fundamental tasks. Regression algorithms are trained to predict continuous values, such as eggs retrieved after stimulation, whereas classification algorithms are trained to predict discrete outcomes, such as whether blastocyst conversion or clinical pregnancy occurs. Segmentation models are used to label or divide specific parts of an image to identify a region of interest, such as measuring the size of a follicle from an ultrasound image. Lastly, statistical techniques such as causal inference can be used to infer causal relationships between treatments and outcomes from observational data, such as understanding how the choice of stimulation protocol affects outcomes. The potential of AI in clinical decision-making has been explored across many different fields of medicine (7, 8).
AI has emerged as a transformative force in the field of medicine, offering healthcare professionals capabilities to efficiently organize, analyze, and extract insights from the vast amounts of complex and diverse data generated daily. By harnessing the power of AI algorithms and advanced analytics, clinicians have access to more precise and timely information, enabling them to make better-informed clinical decisions and ultimately deliver better patient care. Use cases within AI are numerous, ranging from improved diagnostic capabilities in radiology to more efficient drug discovery methods that aid pharmaceutical companies in developing novel therapeutics. When it comes to diagnostic assistance, predictive analytics, or clinical decision-making support systems, AI can analyze large datasets to identify patterns and trends to predict patient outcomes or disease progression, which can help healthcare providers to individualize treatment planning and manage resources more effectively. Additionally, AI can assist physicians in making more informed decisions by providing real-time, evidence-based recommendations that can help reduce errors, improve patient outcomes, and optimize treatment plans.
Artificial intelligence tools have been developed and used in various fields, including improving cancer detection in diagnostic radiology and pathology. In mammography and digital breast tomosynthesis, the use of convolutional neural networks has improved diagnostic accuracy, with sensitivity rates approaching those of radiologists (9). Convolutional neural network imaging algorithms trained to integrate clinical information from electronic health records (EHR) have also shown promising results in breast cancer detection (10). Similarly, AI systems have demonstrated high concordance rates with radiologists in the detection of lung cancer (11). In pathology, AI-based deep neural networks have also been trained to read needle core biopsies of prostate cancer, achieving a high area under the curve (AUC) and concordance rates with expert pathologists, indicating the potential for AI to improve diagnostic capabilities and workloads (12).
Artificial intelligence has proven to be valuable in improving patient outcomes and reducing mortality in early sepsis detection by analyzing large datasets from various sources, such as EHR, patient monitors, and laboratory results. Artificial intelligence-based platforms such as CLEW Medical’s virtual intensive care units and technology electronic dashboard-intensive care units provide real-time risk assessments and alerts, whereas Sepsis DART, developed by researchers at the University of Michigan, monitors and analyzes vital signs, laboratory results, and other clinical data to identify early signs of sepsis in hospitalized patients (13, 14, 15). In drug discovery, AI accelerates the process by analyzing vast amounts of data from various sources to identify potential drug candidates and predict their effectiveness. Atomwise’s AtomNet system identifies potential cancer drug candidates by analyzing protein structures, whereas BenevolentAI mines vast amounts of biomedical data to discover new drug targets and potential treatments or novel uses of currently available drugs (16). Artificial intelligence is poised to play a crucial role in optimizing personalized healthcare and improving clinician workflow by harnessing big data analysis.
Artificial intelligence in ovarian stimulation
Computer-Driven Support Systems (CDSS)
Artificial intelligence-based CDSS are advanced software tools that integrate AI techniques to assist decision-makers in various domains. Computer-driven support systems are designed to analyze complex data, generate insights, and provide recommendations, thereby helping professionals make more informed, accurate, and timely decisions. Letterie and Mac. (5) described a system that employs ML algorithms to analyze patient data, including age, weight, hormone levels, and ovarian reserve, to tailor the stimulation protocols to each patient’s unique needs. This study consisted of 1,853 autologous and 750 donor cycles that incorporated a total of 7,376 visits for training. An additional 556 unique cycles were used to challenge the platform and calculate the accuracy of the study’s algorithm. The investigators showed that the first iteration of their algorithm provided highly accurate decisions that were in concordance with evidence-based decisions by expert teams regarding whether to continue or stop treatment, trigger and schedule oocyte retrieval or cancel a cycle, adjust medication, and how many days to wait before patient follow-up (5). Although this study did not aim to improve on or optimize decisions, it served as a proof of concept for how such platforms can aid reproductive medicine by giving clinicians tools to improve treatment delivery through a triad of expertise, evidence, and algorithmic data analytics (17). Because it showed that a decision-making platform can be developed that agrees with expert decisions, such platforms can be improved on and used to decrease physician workload and improve clinic efficiency for better patient care delivery. Clinics with staffing shortages or long wait times can benefit from such tools.
In the development of predictive models using AI, outcome prediction is a critical component. This is because accurate prognostication before, during, or after a treatment cycle is vital for patient counseling and treatment planning. Patients face significant psychological, financial, and psychosocial challenges, in addition to dealing with a large amount of medical information, when diagnosed with infertility. The counseling tools currently available are based on limited studies that have poor generalizability. However, AI has the flexibility to be trained on large patient populations, allowing it to draw conclusions and identify trends that may be hidden in traditional statistical modeling.
In the realm of ART, AI algorithms have been developed to predict the likelihood of clinical pregnancy and live birth using patient demographics and clinical cycle outcomes. McLernon et al. (18) developed logistic regression models to estimate the chances of live birth for patients going through their first cycle and for patients with an unsuccessful first cycle who are attempting a second. These models were developed from data reported to the Society for Assisted Reproductive Technology Clinic Outcome Reporting System (SART CORS), which collects fertility treatment data and outcomes from over 90% of reported IVF cycles in the United States. The models can be accessed freely through a calculator on the SART CORS website. Wang et al. (19) used a random forest algorithm to predict clinical pregnancy with an AUC of 0.72, outperforming traditional logistic regression with an AUC of 0.67. Although the increase in AUC was marginal, the random forest algorithm they developed was able to rank predictors of clinical pregnancy, such as ovarian stimulation protocol, to assess variable importance and the propensity of impact on positive pregnancy outcomes (19). Additionally, Nelson et al. (20) developed three validated prediction models using boosted tree modeling with anti-Müllerian hormone (AMH) or antral follicle count (AFC) to link cycle characteristics to live-birth probabilities. These models used clinical cycle characteristics with either AMH, AFC, or both AMH and AFC to create three distinct predictor models to assess the prognostic impact of AMH and/or AFC on clinical pregnancy. In this study, Nelson et al. (20) were able to show that the AMH alone model had the highest predictive power, significantly improving model predictive performance over age alone by 76.2%.
Although current models are not able to predict the likelihood of clinical pregnancy on the basis of the clinical parameters alone outside of traditional logistic regression, the models described by Wang et al. (19) and Nelson et al. (20) are important stepping stones toward objectively identifying key clinical factors that may contribute to clinical pregnancy, refining variable selection for downstream algorithm development.
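The AUC values compared above summarize how well a model ranks patients who achieved the outcome above those who did not. The following is a minimal sketch of that calculation using the rank (Mann-Whitney) formulation. The toy outcomes and the scores for `model_a` and `model_b` are hypothetical and are not taken from any of the cited studies.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise ranking.

    Equals the probability that a randomly chosen positive case is scored
    higher than a randomly chosen negative case; ties count as 0.5.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = clinical pregnancy, 0 = no clinical pregnancy.
outcomes = [1, 0, 1, 1, 0, 0, 1, 0]
model_a = [0.9, 0.2, 0.7, 0.6, 0.4, 0.3, 0.8, 0.5]  # ranks every positive above every negative
model_b = [0.6, 0.5, 0.4, 0.7, 0.3, 0.6, 0.5, 0.2]  # weaker ranking, with ties

auc_a = auc(outcomes, model_a)
auc_b = auc(outcomes, model_b)
print(f"model_a AUC = {auc_a:.2f}, model_b AUC = {auc_b:.2f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why a change from 0.67 to 0.72 is read as a modest gain in discrimination.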
These models can be used to personalize treatment decisions and outcomes counseling for patients and can help providers manage patient expectations and quantify financial risk in the context of multiple-cycle planning. Personalized outcome analysis can help providers better understand suboptimal outcomes using AI-based analytic tools (21). One of the few commercially available algorithms combines these concepts by training and validating a boosted tree model for IVF outcome prediction. Originally trained and validated on 5,035 IVF cycles with 52 variables (22), this algorithm was further validated using an international retrospective cohort of 13,076 cycles to show an improvement of live birth probability prediction by 35.7% compared with age alone (21). Additionally, the 30 significant variables included in the final analysis were assessed using a sequential multiple additive regression tree and classification and regression tree analysis to determine their contributory impact on the performance of the model. Interestingly, the investigators found that four variables (the total number of embryos, rate of cleavage arrest, number of 8-cell embryos, and day 3 follicle-stimulating hormone [FSH] level) accurately predicted approximately 70% of the IVF cycle outcomes, providing more information than age, clinical diagnosis, or other clinical factors (23). These predictive tools offer personalized risk stratification for predicting IVF cycle outcomes, a useful counseling aid for managing patient expectations and treatment planning. Furthermore, predictive tools that evaluate the odds of success of various treatments, such as intrauterine insemination, IVF, or third-party reproduction, may help providers and patients select the optimal treatment on the basis of their priorities. Overall, AI can play a critical role in predicting ART outcomes to ultimately enhance patient care.
Dose and Protocol Selection
Given the lack of universally adopted guidelines, ovarian stimulation treatment decisions can vary significantly depending on the doctor or clinic (24). However, the aggregation of data containing this heterogeneity has enabled the training of AI models to analyze and predict how an individual will respond to different treatments, such as gonadotropin dosing and the choice of stimulation protocol. The CONSORT (CONsistency in r-FSH Starting dOses for individualized tReatmenT) dosing algorithm was one of the first approaches for using ML for gonadotropin dosing, in which a nonlinear regression model was trained on patient age, body mass index (BMI), basal FSH, and AFC to predict the dose of FSH required to retrieve 11 oocytes (25). However, this model has been shown in a randomized controlled trial to result in a lower number of oocytes retrieved compared with clinician dosing (26).
More recently, Fanton et al. (27) developed a nearest-neighbors machine learning model to relate the starting dose of FSH to predicted mature egg outcomes. This model creates an individualized dose-response curve using a patient’s age, BMI, AMH, and AFC to identify similar patients from a database of over 18,000 cycles from three US clinics. For “dose-responsive” patients who had a clear optimal region on their dose-response curve, it was estimated that selecting the predicted optimal dose could result in an average of 1.5 more MIIs. For “dose nonresponsive” patients who had a flat curve without a clear optimal region, it was estimated that selecting a low dose could result in FSH dose savings of approximately 1,375 IUs without harming outcomes. However, this approach for optimizing starting dose did not take into account other factors that can affect outcomes, such as dose adjustments midcycle or increased risk of hyperstimulation for certain populations. Other models for selecting the starting dose of gonadotropins have assumed a linear relationship between starting dose and egg outcomes. Correa et al. (28) used 2,713 patients from five clinics to predict a linear relationship between starting dose of FSH and eggs retrieved and estimated that the model could more accurately prescribe a dose resulting in 10–15 eggs compared with clinicians. Before this study, simpler linear nomograms, built from small patient populations (N < 1,000) within a single clinic, had been proposed that used patient age, AMH, AFC, and basal FSH levels to recommend FSH dosing (29, 30, 31).
However, these simple linear models and nomograms are not able to capture nonlinear dose-response relationships and make assumptions about the number of eggs retrieved considered to be “optimal.” Although AI models for gonadotropin dosing have been focused primarily on the initial dose of FSH, similar models will likely soon be applied to assist with other dosing decisions, such as dosing of luteinizing hormone or predicting how gonadotropin dosing adjustments mid-stimulation will impact outcomes.
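The nearest-neighbors idea described above can be sketched in a few lines: find prior cycles from patients similar in age, BMI, AMH, and AFC, then average their mature oocyte (MII) yield within starting-dose bins to obtain an individualized dose-response curve. This is a simplified illustration and not the published implementation; the range-based feature scaling, the unweighted averaging, the dose bins, and all data values are assumptions made for the example.

```python
import math

FEATURES = ("age", "bmi", "amh", "afc")


def nearest_neighbors(patient, cohort, k):
    """Return the k prior cycles most similar to `patient`."""
    # Scale each feature by its cohort range so no single unit dominates (assumed scaling).
    spans = {
        f: (max(c[f] for c in cohort) - min(c[f] for c in cohort)) or 1.0
        for f in FEATURES
    }

    def distance(cycle):
        return math.sqrt(
            sum(((cycle[f] - patient[f]) / spans[f]) ** 2 for f in FEATURES)
        )

    return sorted(cohort, key=distance)[:k]


def dose_response_curve(neighbors, dose_bins):
    """Mean MII yield among neighbors within each starting-dose bin (IU)."""
    curve = {}
    for low, high in dose_bins:
        yields = [c["mii"] for c in neighbors if low <= c["dose"] < high]
        if yields:
            curve[(low, high)] = sum(yields) / len(yields)
    return curve


# Hypothetical prior cycles; values are illustrative only.
cohort = [
    {"age": 32, "bmi": 23, "amh": 2.5, "afc": 14, "dose": 150, "mii": 8},
    {"age": 33, "bmi": 24, "amh": 2.4, "afc": 13, "dose": 225, "mii": 11},
    {"age": 31, "bmi": 22, "amh": 2.6, "afc": 15, "dose": 300, "mii": 12},
    {"age": 32, "bmi": 23, "amh": 2.3, "afc": 14, "dose": 450, "mii": 10},
    {"age": 41, "bmi": 30, "amh": 0.4, "afc": 4, "dose": 450, "mii": 2},
    {"age": 42, "bmi": 31, "amh": 0.3, "afc": 3, "dose": 300, "mii": 2},
]
patient = {"age": 32, "bmi": 23, "amh": 2.5, "afc": 14}
dose_bins = [(100, 200), (200, 275), (275, 375), (375, 500)]

neighbors = nearest_neighbors(patient, cohort, k=4)
curve = dose_response_curve(neighbors, dose_bins)
best_bin = max(curve, key=curve.get)
print("dose-response curve:", curve)
print("dose bin with highest mean MII yield:", best_bin)
```

Unlike a linear nomogram, the curve here is free to rise and then fall with dose, or to stay flat, which is the distinction between dose-responsive and dose-nonresponsive patients drawn above.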
Artificial intelligence techniques have also been used to better understand the choice of the stimulation protocol. A recently published article used causal inference on approximately 20,000 cycles reported to the SART CORS database between 2014 and 2020 to show that, for poor responders, the antagonist protocol resulted in similar outcomes to the flare protocol (32). This is consistent with a prior study by Wald et al. (24) showing that the choice of protocol has minimal impact on outcomes. Future AI models to help choose stimulation protocols will likely be developed by either directly predicting outcomes for a patient on the basis of the protocol (i.e., gonadotropin-releasing hormone [GnRH] antagonist, long GnRH agonist, or GnRH flare, and others) or by incorporating the choice of protocol into other models (e.g., FSH dosing model) as an input parameter.
Clinic workflow and electronic medical record efficiency are other relevant targets for AI-based systems. In a study aimed at examining AI-based workflow algorithms, Letterie et al. (33) developed and evaluated an AI platform designed to optimize clinic workflow during ovarian stimulation and IVF treatments. The goal of the platform was to improve the overall efficiency and effectiveness of the IVF process by providing personalized treatment recommendations, outcome-based predictions, and decision support for clinicians and embryologists. Data were collected from 1,591 unique autologous cycles, and cycle characteristics including age, FSH, estradiol, and AMH concentrations, follicle counts, and BMI were analyzed to determine the single best day for monitoring a patient and to predict trigger day options and the total number of oocytes. Using this method, Letterie et al. (33) noted that the platform had an accuracy rate of 0.80 for its ability to predict the single best day for monitoring within 1.36 days, also providing a 3-day window for trigger shot administration with minimal effect on oocyte quantity. Such a platform is able to analyze aspects of clinical workflow, patient visits, and embryology laboratory tasks to find the most efficient integration of all three touch points to improve clinic efficiency and reduce overall workload (33). Improvements in patient monitoring technology through these ultrasound predictors can aid in streamlining a patient’s clinic experience and improve scheduling, thereby streamlining clinic processes and decreasing some of the administrative burdens. Therefore, this study demonstrates the potential of an AI platform that optimizes workflow during ovarian stimulation and IVF treatments, allowing for level-loading, also known as process balancing, that accounts for clinical, laboratory, and patient limitations and improves clinic efficiency throughout.
Follicular ultrasound monitoring
The process of follicular monitoring is time-consuming and costly, which creates difficulties for patients, especially working professionals. Because of increasing clinical volumes, clinics are pressured to accommodate patient needs with additional ultrasound and laboratory personnel within the constraints of physical space and scheduling. Patients often have to travel to the clinic early in the morning to accommodate both work and clinic needs. Artificial intelligence could help address these issues by automating follicular monitoring and opening the door to the possibility of at-home monitoring.
Initial attempts to address follicular monitoring involved the segmentation of two-dimensional images focused on reliably identifying maximal follicular contours using ultrasound tracking devices trained in follicle counting. Recently, the field has shifted toward focusing on using either segmentation-based AI or recurrent neural networks to analyze three-dimensional (3D) ultrasonographic imaging. Among varying follicle sizes, deep learning models can track follicle measurements with up to 98% correlation (34, 35, 36). A randomized controlled trial evaluating an AI-based automated volume count from 3D sonography not only showed noninferiority to two-dimensional manual follicular tracking but also showed decreased time per ultrasound (35).
Integrating clinical AI applications into follicular tracking will revolutionize both clinical practice and outcome optimization. Follicular monitoring is the mainstay of managing stimulation cycles, and critical clinical decisions are made on the basis of the accuracy and reliability of ultrasound measurements. Improving efficiency, accuracy, cost, and accessibility is essential to improving clinical outcomes as volumes increase nationwide. Importantly, AI can serve as a safety net for quality assurance and reinforce clinical protocols and decision-making. Segmentation models have been used to assess optimal follicular volume to predict oocyte maturity (37), a powerful tool that can help define clinical protocols and optimize clinical outcomes. From a clinic-wide standpoint, ML models have been used to streamline ultrasound timing, minimizing the number of ultrasounds needed along with the ability to accurately predict trigger timing and risk of over-response (38). Portable ultrasound systems have been able to reliably produce clinical quality ultrasound imaging 98% of the time from at-home patient monitoring by remote ultrasonographer-guided prompts alone (39). Ultimately, transitioning toward at-home monitoring systems by integrating existing systems with AI follicular tracking can significantly reduce barriers to care, increasing access to care and each clinic’s catchment area.
Although these various algorithms offer tools to improve monitoring, oocyte retrieval numbers, or follicular count, the question remains how these improvements at various stages of the IVF process can lead to changes in outcomes that patients ultimately care about, such as clinical pregnancy rate and live birth rate (LBR). When specifically analyzing changes in MII oocyte numbers, one can see a correlation between improvements at this level and LBR. One study that examined patients undergoing autologous IVF in over 400,000 cycles found that the LBR increased significantly with increasing oocytes retrieved until a threshold of 16–20 MIIs was achieved, after which the LBR continued to increase but at a diminished rate (40). The study’s results suggested that ovarian stimulation should be optimized for maximum oocyte retrieval while minimizing the risks of ovarian hyperstimulation syndrome. Therefore, AI algorithms that can help increase the overall number of MIIs for a patient will have a direct, albeit downstream, effect on patients’ LBR and provide some justification for their utility in the various processes leading up to a live birth.
The decision of when to administer the trigger shot to induce oocyte maturation is subjective, with many different criteria that can vary significantly between practices. Clinicians will generally track metrics such as lead follicle size and estradiol levels to decide when to trigger a patient. However, AI has the potential to synthesize much more information to help optimize the timing of this decision. A study by Hariton et al. (4) was the first to present a framework for directly predicting whether a patient should be triggered or wait an additional day, with the aim of maximizing two-pronuclei (2PN) yield. This study included 7,866 cycles from an academic medical center and used a causal inference T-learner framework based on bagged decision trees, with a patient’s baseline parameters, follicle sizes, estradiol level, and stimulation protocol as inputs for the model. Retrospective analyses predicted that an algorithm-assisted trigger decision could yield 1.4 more 2PNs and 0.6 more blastocysts, on average, compared with the physician’s decision.
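A T-learner fits one outcome model per decision arm and compares their predictions for the same patient. The sketch below illustrates only that structure. It swaps the bagged decision trees of the cited study for a one-feature least-squares line to stay short, and the field names (`action`, `lead_follicle_mm`, `two_pn`) and all data values are hypothetical. Like any analysis of retrospective data, it assumes that confounders of the trigger decision are captured in the model inputs.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (single feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_t_learner(cycles):
    """Fit a separate outcome model for each arm: trigger today or wait a day."""
    models = {}
    for arm in ("trigger", "wait"):
        rows = [c for c in cycles if c["action"] == arm]
        models[arm] = fit_line(
            [c["lead_follicle_mm"] for c in rows],
            [c["two_pn"] for c in rows],
        )
    return models


def recommend(models, lead_follicle_mm):
    """Return the arm with the higher predicted 2PN yield and the estimated effect."""
    predicted = {
        arm: intercept + slope * lead_follicle_mm
        for arm, (intercept, slope) in models.items()
    }
    effect = predicted["trigger"] - predicted["wait"]
    return ("trigger" if effect >= 0 else "wait"), effect


# Hypothetical retrospective cycles; values are illustrative only.
cycles = [
    {"action": "trigger", "lead_follicle_mm": 16, "two_pn": 8},
    {"action": "trigger", "lead_follicle_mm": 18, "two_pn": 10},
    {"action": "trigger", "lead_follicle_mm": 20, "two_pn": 12},
    {"action": "wait", "lead_follicle_mm": 16, "two_pn": 12},
    {"action": "wait", "lead_follicle_mm": 18, "two_pn": 10},
    {"action": "wait", "lead_follicle_mm": 20, "two_pn": 8},
]

models = fit_t_learner(cycles)
decision_small, effect_small = recommend(models, 16)
decision_large, effect_large = recommend(models, 20)
print(decision_small, effect_small)
print(decision_large, effect_large)
```

The difference between the two arm-specific predictions is the estimated individual treatment effect, which is what allows a per-patient recommendation rather than a single population-level rule.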
Other AI models have investigated the direct association between follicle sizes and egg yields using linearly interpretable models and a larger dataset. Fanton et al. (6) presented linear regression models to predict the number of MIIs when triggering today vs. tomorrow on the basis of patient follicle sizes and estradiol levels. The study included over 30,000 cycles from three IVF clinics and found that up to 2.7 more MIIs, 2.0 more 2PNs, and 0.7 more blastocysts could be achieved when triggering in accordance with the day of the highest predicted MIIs. Abbara et al. (41) also used a linear regression model to investigate the association between follicle sizes and oocytes retrieved on a smaller dataset of 499 cycles, finding that follicle sizes of 12–19 mm on the day of trigger contributed most to the number of oocytes and MIIs retrieved.
Deep-learning models have also been employed to predict oocyte maturity for trigger timing using 3D ultrasound images. Liang et al. (42) used a previously developed image segmentation algorithm to measure follicle volumes from ultrasound images of 181 patients at a single site. This study found that AI-trained follicle counting using a volume threshold of 0.5 cm³ or greater resulted in the highest association with MIIs. Further, they found that triggering a patient when the leading follicle volume was at least 3.0 cm³ resulted in a significantly higher number of MIIs.
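To relate these volume thresholds to the diameter-based criteria used elsewhere in this section, a rough conversion can be made by assuming a perfectly spherical follicle, for which volume equals π/6 times the diameter cubed. This is a simplifying assumption introduced here for illustration, as real follicles are irregular. Under it, 0.5 cm³ corresponds to a diameter of roughly 9.8 mm and 3.0 cm³ to roughly 17.9 mm.

```python
import math


def sphere_volume_cm3(diameter_mm):
    """Volume of a sphere in cm^3 from its diameter in mm (spherical assumption)."""
    diameter_cm = diameter_mm / 10.0
    return math.pi / 6.0 * diameter_cm ** 3


def equivalent_diameter_mm(volume_cm3):
    """Diameter in mm of the sphere that has the given volume in cm^3."""
    return 10.0 * (6.0 * volume_cm3 / math.pi) ** (1.0 / 3.0)


counting_threshold_mm = equivalent_diameter_mm(0.5)  # volume threshold for follicle counting
trigger_threshold_mm = equivalent_diameter_mm(3.0)   # leading follicle volume at trigger
print(f"0.5 cm^3 ~ {counting_threshold_mm:.1f} mm, 3.0 cm^3 ~ {trigger_threshold_mm:.1f} mm")
```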
Although triggering algorithms thus far have focused primarily on the timing of the injection, future algorithms will likely aim to incorporate additional information, such as the type and dose of the trigger shot.
Aside from the many applications discussed above, there are multiple other problems that lend themselves to ML applications. For example, aside from optimizing gonadotropin dosing, ML algorithms could optimize the timing of antagonist start, minimizing the number of doses needed to successfully block ovulation as well as the number of monitoring visits (27). Furthermore, although the early models are focused on optimizing decisions for a given cycle, a second generation of models could incorporate the outcomes of the first cycle to address the role of biological variability between individuals. Although this is something that providers learn to do in training, it has yet to be implemented in published models.
Before being integrated into clinical practice, AI algorithms need to undergo a thorough validation process to ensure their effectiveness and safety. Model validation is the process of evaluating the performance of the algorithm in predicting its desired outcome. The term “validation” can refer to both the internal validation that is done when first training and testing the algorithms and the external validation done after deployment (43), both of which are key to developing a clinically useful algorithm that can be integrated into daily practice.
During initial model training, internal validation is performed using common strategies such as separating data into a training set and a blind, held-out testing set or using cross-validation. It is important to avoid information leakage when separating data to prevent overoptimistic performance results. Good performance during internal validation, however, does not guarantee that the model will perform with similar accuracy or predictive power when using a new dataset, a property referred to as “generalizability.” This can be assessed through external validation, where the model is tested on an independent dataset that was not used in training. Creating AI models that can generalize to blinded data is a significant challenge in creating clinically useful models. Clinics may have striking heterogeneity in equipment or clinical practices, which can impact algorithm performance. External validation can also uncover biases in the training data. For instance, when an ML model is trained on data from a clinic that primarily treats patients of a certain age group or diagnosis, the narrowed scope of the training data may lead to poorer performance when applied to a more diverse dataset. To improve generalizability, models should initially be trained and internally validated on datasets from a diverse patient population from multiple geographic sites, with subsequent external validation on blinded data before deployment.
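One common source of information leakage in cycle-level data is a patient with several cycles appearing in both the training and testing sets. The sketch below shows a held-out split and a k-fold cross-validation scheme that keep each patient's cycles together. Treating the patient as the grouping unit, the `patient_id` field, and the toy data are assumptions made for illustration. The same approach could group by clinic to approximate external validation.

```python
import random


def split_by_patient(cycles, test_fraction=0.2, seed=0):
    """Hold out whole patients so no patient has cycles in both sets."""
    patient_ids = sorted({c["patient_id"] for c in cycles})
    random.Random(seed).shuffle(patient_ids)
    n_test = max(1, round(len(patient_ids) * test_fraction))
    test_ids = set(patient_ids[:n_test])
    train = [c for c in cycles if c["patient_id"] not in test_ids]
    test = [c for c in cycles if c["patient_id"] in test_ids]
    return train, test


def patient_folds(cycles, k=5, seed=0):
    """Yield (train, validation) pairs for k-fold cross-validation grouped by patient."""
    patient_ids = sorted({c["patient_id"] for c in cycles})
    random.Random(seed).shuffle(patient_ids)
    for i in range(k):
        held_out = set(patient_ids[i::k])
        train = [c for c in cycles if c["patient_id"] not in held_out]
        validation = [c for c in cycles if c["patient_id"] in held_out]
        yield train, validation


# Hypothetical dataset: 10 patients with 2 cycles each.
cycles = [{"patient_id": p, "cycle": n} for p in range(10) for n in (1, 2)]

train, test = split_by_patient(cycles)
folds = list(patient_folds(cycles, k=5))
print(len(train), "training cycles,", len(test), "held-out cycles")
```

The held-out set should be touched only once, after model selection on the cross-validation folds is complete. Otherwise the reported performance becomes optimistic in the same way leakage would make it.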
Beyond the validation of model performance, a number of other factors need to be considered when using AI models in clinical practice. For some AI applications, a fail-safe system may need to be designed to protect against dangerous recommendations that could cause unintended harm. Factors such as these can be evaluated in prospective clinical trials, which ultimately serve as the gold standard for assessing clinical performance. Currently, very few studies have addressed the safety or efficacy of developed algorithms outside of their own clinical setting, a challenge that many AI groups are actively addressing through ongoing recruitment for prospective studies and randomized controlled trials. Importantly, educational efforts need to focus on teaching patients, clinicians, and clinical staff the basics of the technology, uniform troubleshooting, and standardized interpretation of the algorithmic output. A major fear for the integration of AI into any field is the replacement or downsizing of essential staff members, such as embryologists. Although automation has revolutionized the manufacturing of most consumer goods, the current AI technology in reproductive medicine serves as an aid for more efficient decision-making and as a safeguard for objective analysis of clinical outcomes. To overcome user hesitation among both patients and clinicians, specialized training for clinical users such as embryologists will be essential to the quick adoption and utilization of the algorithms. A proper user interface design is needed to ensure that the models are being used as intended and that predictions are being interpreted correctly by the user. Additionally, clinical workflow and staff training need to be standardized for both technical and clinical staff members to seamlessly integrate these platforms into clinical practice.
Further, as algorithms become largely commercialized and deployed throughout clinics, companies will need to develop technological platforms that are HIPAA-compliant, compatible with laboratory equipment, and cost-effective. Focusing on thoughtful, ethical, and responsible integration will be key to allowing AI algorithms to flourish, instead of becoming another expensive add-on with limited utility.
The approval of AI technologies is controlled by government agencies such as the US Food and Drug Administration in the US and the European Commission (CE Mark) in the EU. Regulatory pathways can therefore vary significantly between different regions. For example, from 2015–2020, the majority of AI-based medical devices were first approved in the EU, potentially because of fewer regulatory hurdles (44). Regulation can further vary depending on the purpose of the device. In the US, for example, AI software intended to diagnose, treat, or prevent health problems would be likely considered a medical device, which could be approved through three different pathways depending on the risk and whether or not it is similar to a predicate. Artificial intelligence tools can be also considered nondevice clinical decision software when conditions are met regarding their intention and structure (45), making them exempt from regulation as a medical devices. These different classifications, which vary between different government agencies, can significantly affect how an AI algorithm is regulated.
Existing medical device regulations have not been designed for the dynamic nature of AI. Artificial intelligence models can change over time as more data is collected or as the model continues to “learn,” meaning that the safety and effectiveness of these tools also continue to evolve. Frameworks have been proposed to account for this adaptive nature. For example, the US Food and Drug Administration has proposed a new regulatory framework for AI algorithms (46), which involves an initial premarket review followed by ongoing evaluation and monitoring of real-world performance. This is an evolving legal landscape, which is likely to become more complex in the near future.
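To make the idea of ongoing real-world performance monitoring concrete, the following is a minimal sketch in Python and not a description of any regulatory requirement or of any published tool. It assumes a model that predicts a numeric outcome (for example, the number of MIIs retrieved) and compares a rolling mean absolute error against a baseline error measured at validation. The class name `PerformanceMonitor`, the `tolerance` margin, and the `window` size are all hypothetical choices made for illustration.

```python
from collections import deque
from statistics import mean


class PerformanceMonitor:
    """Track rolling prediction error and flag drift from a validation baseline.

    All names and thresholds here are illustrative assumptions.
    """

    def __init__(self, baseline_mae: float, tolerance: float = 0.25, window: int = 50):
        self.baseline_mae = baseline_mae  # error measured before deployment
        self.tolerance = tolerance        # allowed relative degradation (25% by default)
        self.window = window              # number of recent cycles to average over
        self.errors = deque(maxlen=window)

    def record(self, predicted: float, observed: float) -> None:
        """Store the absolute error for one completed cycle."""
        self.errors.append(abs(predicted - observed))

    def rolling_mae(self):
        """Mean absolute error over the most recent cycles, or None if empty."""
        return mean(self.errors) if self.errors else None

    def needs_review(self) -> bool:
        """True once a full window of cycles exceeds the tolerated error."""
        if len(self.errors) < self.window:
            return False  # not enough real-world data yet
        return self.rolling_mae() > self.baseline_mae * (1 + self.tolerance)
```

In this sketch a flag only prompts human review; what should happen after a flag (retraining, rollback, or reporting) is left open, since that would depend on the applicable framework.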
It is crucial to acknowledge that not everyone has equal access to these technologies, and we must work to bridge the digital divide in healthcare to prevent further disparities in healthcare access and clinical outcomes (47, 48, 49). It is also important to be mindful of the bias that we may introduce and perpetuate, and ensure the use of heterogeneous, balanced datasets that represent people of all races, ethnicities, genders, and socioeconomic backgrounds (50). When designing these algorithms, there are both known and unknown biases that are introduced into the system. Known biases can originate from variable selection, image quality, imaging platform or microscope specifications, or limiting training data to cycles with a known outcome. One major challenge of many IVF-based AI platforms is that training data needs to have a known output, which often excludes cycles or data with no known outcomes, such as nontransferred blastocysts, when predicting implantation potential. Unknown biases may be related to the homogeneity of the dataset and the selection of variables that may or may not contribute to the outcome. These biases can be either unintentionally introduced (such as by training a convolutional neural network on only Embryoscope images) or unintentionally learned. A striking example of learned bias came from recruiting in big tech. Amazon’s tool was initially aimed at optimizing resume reviews to recruit top talent to the company; however, performance fell short because the tool learned sexist behaviors from the male dominance within the technology industry (51).
To prevent mistrust within this rapidly evolving field, care needs to be taken to identify potential sources of bias internally and externally, with swift identification and correction to avoid an AI moratorium. Every layer of decision-making throughout the cycle, particularly those that are responsive to a patient’s biological variability, such as dose adjustment, can lead to bias within a training system because of imprecise data. With the use of technology in a thoughtful and equitable way, we can help ensure that all patients receive the care they need and deserve, regardless of their background or circumstances. Ultimately, the art of medicine will always be grounded in human connection and empathy, and it is up to us to use technology in a way that supports and enhances this essential aspect of healthcare. By doing so, we can help create a more equitable, compassionate, and effective healthcare system for all.
As AI continues to advance the field of ovarian stimulation, it is important to carefully consider the future studies needed to unlock its full potential. First, future work should evaluate AI tools using live birth outcomes. Thus far, AI algorithms in ovarian stimulation have typically been evaluated on their ability to improve intermediate outcomes, such as follicle measurements, MIIs, or 2PNs. Although recent studies have shown that an increased number of eggs retrieved is associated with higher cumulative live birth rates (40), AI models designed to increase ovarian stimulation efficacy should eventually be evaluated directly on their ability to improve cumulative live birth rates. Furthermore, because multiple automated tools are starting to be developed to accomplish the same tasks, a universal framework for comparing the efficacy and generalizability of these tools should be adopted.
The growing necessity for data management in healthcare raises a crucial concern for patients: the ownership, safeguarding, and use of their personal information. As the volume of patient data, data points, and input vectors expands, the ways in which this information is stored, protected, employed, and integrated become increasingly vital. Patients are entitled to know whether their data is being responsibly harnessed to generate algorithms that are beneficial and nondiscriminatory as well as thoroughly protected from the increasingly sophisticated data theft, hacking, and ransomware to which AI will be subject. As large private corporations and commercial interests steadily permeate the healthcare sector, particularly in reproductive medicine, patient data gains value as a prized commodity. These companies invest heavily in the development of medical technologies, which underscores the pressing need to protect patient privacy, agency, and data anonymity.
Any breach of patient data in healthcare may result in the diminishing of patient trust and a major setback for AI (52). Consequently, corporations face mounting scrutiny over patient trust, data privacy, and transparency in data ownership, control, and protection, as well as the implications of data analytics for personal privacy (53). As patient data becomes increasingly traceable, wearable, and analyzable, it is imperative to establish safeguards for all AI data storage. These risks, however, are no different from those in current patient databases and electronic medical records. Therefore, some of the same safeguards employed today, such as data hashing for deidentification, can be used for cloud-stored AI data. Moreover, novel blockchain technologies are being developed that can significantly increase the anonymity and protection of patient data (54).
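As an illustration of the data hashing mentioned above, the following Python sketch replaces a patient identifier with a keyed hash (HMAC-SHA256) and drops direct identifiers before a record is stored. It is a sketch under stated assumptions, not a complete deidentification procedure: the field names (`patient_id`, `name`, `date_of_birth`) and both function names are hypothetical, and a keyed hash is used here instead of a plain hash so that identifiers cannot be recovered simply by hashing guessed values without the secret key.

```python
import hashlib
import hmac


def pseudonymize_id(patient_id: str, secret_key: bytes) -> str:
    """Return a keyed SHA-256 digest (hex) of a normalized patient identifier."""
    normalized = patient_id.strip().upper().encode("utf-8")
    return hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()


def deidentify_record(
    record: dict,
    secret_key: bytes,
    id_field: str = "patient_id",                # hypothetical field name
    drop_fields=("name", "date_of_birth"),       # hypothetical direct identifiers
) -> dict:
    """Copy a record, dropping direct identifiers and hashing the ID field."""
    cleaned = {k: v for k, v in record.items() if k not in drop_fields}
    cleaned[id_field] = pseudonymize_id(str(record[id_field]), secret_key)
    return cleaned
```

Because the same key and identifier always yield the same digest, records from one patient can still be linked across cycles for model training. The key itself would need to be stored separately from the data, and remaining fields may still allow reidentification in combination, so hashing is only one layer of protection.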
The Future of AI in Ovarian Stimulation
The practice of medicine is a deeply human undertaking that requires a compassionate and empathetic approach to caring for patients. With the increasing use of digital technologies and automation in the field of medicine, it is important to ensure that these advancements do not create distance between patients and their healthcare providers. Instead, we should strive to use these technologies to enhance the human connection at the heart of medicine by streamlining care and allowing providers to spend more time with patients. As we think about what the future of AI may look like, we recognize that this field is still in its infancy with boundless growth on the horizon. Back in the 1990s, few could see how the internet would revolutionize the way we live today. As we think of ARTs in the future, we can only guess what that future will look like.
It is widely accepted that, given the rate of growth of ART, there is a crucial shortage of specialized providers when accounting for the number of patients in need of care (55). This need is more acute in rural areas and internationally, where large populations do not have a fertility provider. Artificial intelligence has the potential to democratize access to care by expanding capacity. This can be done by making existing reproductive endocrinology and infertility physicians more efficient or by assisting less specialized physicians or advanced practice providers to manage or monitor bread-and-butter IVF cycles under the supervision of reproductive endocrinology and infertility physicians, reserving complex clinical cases for highly trained reproductive endocrinologists. This could particularly improve outcomes and access to care in rural communities or developing nations that have a shortage of specialized fertility providers (56). Paired with advances in at-home monitoring, AI-based stimulation management will allow the expansion of catchment areas for IVF clinics. Future prospective studies should evaluate how AI tools can be implemented to allow for less specialized clinicians to expand their scope.
It is important to acknowledge that because of the inherent and unavoidable uncertainty of an AI-driven future, there will always be some fear regarding when and how these algorithms will be used in ways that benefit our patients. Appropriate validation, minimizing biases, and well-thought-out integration of these technologies must be continuously questioned and reevaluated to ensure that these algorithms are not causing statistical errors that can negatively impact patients. The “black box” nature of AI can lead to understandable fear about the validity of these platforms, but there are methods, as detailed previously, that can be employed to lessen these risks. Moreover, with various AI algorithms being developed in all aspects of clinical care, from physician treatment plans to the grading of embryos in the laboratory, it will be critical that these systems not operate independently in silos but instead be integrated as a functioning system within the whole clinic. This may require further validation as these systems are combined to ensure minimal bias.
We will end by postulating a patient journey that incorporates AI at every stage: When a patient meets the criteria for infertility, AI suggests seeking help from a reproductive specialist. This could occur through an EHR, a menstrual tracker app, or any other software with which a patient trying to conceive interacts. Existing medical records and intake forms are synthesized, and suggestions on recommended diagnostic tests are made. Once these are done, a complete outlook is presented to a reproductive specialist, along with a recommendation for the most appropriate treatment on the basis of success rates, time to pregnancy, and/or value. The treatment plan can be optimized further by taking into account the preferences of the doctor or patient when desired. From there, an AI-driven protocol selection and dosing tool can aid in selecting the most effective dosage of medication and monitoring the timeline for the patient’s progress, with oversight from a provider as appropriate. Follicular monitoring, which is a time-consuming process, can be streamlined using AI tools and potentially done at home when desired by the patient. The integrated dose adjustment and appointment system can remind patients of their upcoming appointments, adjust their medication dosage as needed, and even order more medications when a patient’s inventory is running low. Nearing the end of stimulation, a trigger tool can aid doctors in selecting the optimal day for the trigger, and patients can gain visibility into the growth of their follicles and likely trigger dates for planning. Artificial intelligence can also level the load in the laboratory, smoothing out the volume for retrievals and blastocyst biopsies when needed with minimal impact on outcomes and helping laboratory directors ensure optimal laboratory staffing.
Although this technology is far from ready for immediate daily use, we may not be waiting for a distant future before accessing these advancements. Bill Gates wrote in 1996 that “we always overestimate the change that will occur in the next two years and underestimate the change that will occur in the next ten” (57). We hope somewhere in that range we can harness the potential of AI to revolutionize the provision of IVF, making it more efficient, effective, personalized, and accessible to all.
E.H. is a medical advisor and holds stock options in Alife Health and Cercle AI. Z.P. has nothing to disclose. M.F. is an employee of Alife Health and holds stock options. V.S.J. has nothing to disclose.