Data Science, Machine Learning, Natural Language Processing, Text Analysis, Recommendation Engine, R, Python
Wednesday, 31 July 2019
Kotlin vs. Groovy: Which Language to Choose
The colossus of Java, for example, has a number of offspring; some of them have proved to be a success. One of them, Kotlin, was backed by Google as the official language for Android development in 2017, was reported to be the second most loved and wanted programming language in the 2018 Stack Overflow survey, and remains in the Top 5 in this year’s survey. Another successful member of the Java family is Groovy, which is gaining popularity among developers. At the same time, the 2018 Stack Overflow survey listed Groovy among the most dreaded languages. In this setting, it seems unfair to compare the languages, but let’s see whether Groovy is really so dreadful compared to Kotlin and, generally, which of them to choose as another addition to your bag of skills.
Overview
Kotlin
Kotlin is developed by Jetbrains — a company well-known in the Java world for its IDE named IntelliJ IDEA — and open sourced in 2012. It is ...
Read More on Datafloq
Why Cybersecurity Is Essential to Protecting the Health Industry
As these cyberattacks continue to jeopardize the personal information of patients, the subject of cybersecurity has received greater attention. Professionals in health information management have taken on more responsibilities, their roles shifting as they work in coordination with their organization's IT department.
Unfortunately, we have few simple solutions to the problem. Cybercriminals grow more sophisticated with each passing year, refining their former techniques as they adapt and improve. They've developed new methods to bypass the security of modern organizations and collect the sensitive data of vulnerable people.
As context, we've seen more than one breach per day every year since 2016, and experts predict this trend will continue through 2019. So how can healthcare organizations protect their patients from the threat of malicious hackers? How have we made progress in cybersecurity, and where are we going next?
Progress in Cybersecurity
As mentioned earlier, professionals in health information management now coordinate with the IT department. They've adopted a range ...
Read More on Datafloq
Tuesday, 30 July 2019
Zombie-Car Taxes Are Arising For AI Autonomous Cars
By Lance Eliot, the AI Trends Insider
Taxes seem to be the bane of our existence.
Whenever a new technology or innovation appears, one of the questions that inevitably gets raised is whether or not to tax it (the answer usually is yes) and how to do so.
Consider for example the advent of online retailing, which has increasingly become a favored way for people to shop and seems to inexorably be undercutting storefront retailers.
On the taxes side of things, a now famous U.S. Supreme Court case seemed to settle a thorny question about taxes as it relates to online retailers (a case pitting South Dakota versus Wayfair Inc.).
The United States Supreme Court ruled that online retailers would henceforth be required to collect sales taxes in states for which those online firms have no actual physical presence, assuming that the respective state wants those retailers to collect such taxes (yes, most states if not all will – it is essentially “free” revenue for the state).
Brick-and-mortar firms cheered feverishly and popped champagne when they heard the ruling.
Online retailers cursed out loud and were dreading this possible decision.
Significance Of The Online Retailer Taxes Case
Let’s unpack the case and then recast it toward another newly emerging innovation, namely the advent of AI self-driving driverless autonomous cars.
At the start of the 1990s, the Supreme Court had ruled in the Quill Corporation versus North Dakota case that online retailers could not be forced to collect sales taxes in states with which they did not have a substantial connection. Pretty much, if the online retailer had no physical offices or warehouses in a state, it was nearly impossible for the state to try and hit them with sales taxes.
For online retailers, this was a godsend.
First, it meant that their goods could be sold without having to include the added cost of sales tax in the state to which an item was shipped. Physical stores were maddened by this decision, since it meant that they were inherently more expensive than buying online. You might counter-argue that the online retailer has to charge you shipping while the local store does not, but the cost of shipping has come down dramatically, plus many consumers don’t think about the true cost of ordering online (they often neglect to add the shipping cost when they compare in-store prices versus online prices).
Second, online retailers, while laughing all the way to the bank, claimed that it was just too byzantine to try and collect the sales tax for every state. They argued that the effort to calculate the sales tax for this state and for that state would be an undue burden. It would jack up their costs unfairly. A physical store in state X only has to collect sales tax for items sold based on the sales tax of state X. Meanwhile, the online retailer would have to figure out the sales tax for states A, B, C, D, and so on. It would be a nightmare.
Online retailers have had a free ride, one might say, since the early 1990s.
The ride is over. Now, they will need to collect the sales tax for any item they sell that is purchased by a customer in whatever state the customer is in. Will this be horribly complex?
Most would say that with the advent of modern-day computers, trying to calculate the sales tax is no longer a manual or laborious burden, and it doesn’t really matter how many states you are selling into. It’s a few lines of extra code in a program. So, that prior argument about added complexity is now out the window.
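To make the "few lines of extra code" point concrete, here is a minimal sketch of per-state sales tax collection. The rates and state entries are hypothetical placeholders for illustration, not actual state tax rates.

```python
# Minimal sketch of per-state sales tax collection for an online order.
# The rates below are hypothetical placeholders, not actual state tax rates.
STATE_SALES_TAX = {
    "SD": 0.045,
    "ND": 0.050,
    "CA": 0.0725,
}

def total_with_tax(subtotal: float, ship_to_state: str) -> float:
    """Add the destination state's sales tax to an order subtotal."""
    rate = STATE_SALES_TAX.get(ship_to_state, 0.0)  # no entry means no tax collected
    return round(subtotal * (1 + rate), 2)

print(total_with_tax(100.00, "SD"))  # 104.5
```

The lookup table is the only per-state piece, which is why scaling from one state to fifty is a data problem rather than a programming burden.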
Will states decide they want to hit online retailers with a sales tax? Of course, since it is “free” money. Imagine how much additional revenue a state can take in. It’s a windfall. Plus, the brick-and-mortar firms in a state will argue, as they have all along, that the states “must” charge the online retailers a sales tax as a matter of fairness and logic. The logic part: why would a physical store want to be in a given state if the state is going to let the online retailers get away without collecting sales tax? It puts the physical store at a disadvantage. Meanwhile, the physical store is likely helping the state economy in many other ways, including hiring people in that state, perhaps making use of local vendors in that state, and otherwise helping that state to be vibrant.
Justice Anthony Kennedy indicated in the written opinion by the majority in the 5-to-4 ruling that states have been losing out on about $33 billion in annual sales tax revenue. Ouch! States are now already starting to figure out how they will divvy up all that sales tax revenue that will be pouring in. A lot of special interests in each state will want to grab a piece of that pie.
One important aspect that with hindsight we can perhaps concede is that the lack of a state-by-state sales tax might have aided the advent of the online retailers. Perhaps, if a sales tax had been imposed, there might not be as many online retailers today, or the big ones might not be as big. As a society, by allowing online retailers to be unburdened by the sales tax, we might have allowed the innovation of online retailing to take hold. It is like a new plant that we carefully protected from the surrounding flora so that it could take root. Now, the new plant is presumably established enough that it can exist without special exemptions. That’s the hope or thought about why we did not force the online retailers into collecting state sales taxes for their online sales.
Do you think that online retailers will now be curbed or substantially harmed because of the Supreme Court ruling?
It’s admittedly hard to see that they would be.
The trend towards online purchasing has hammered the brick-and-mortar world. Physical stores and malls have been devastated. It would appear to be an unstoppable trend. Seems unlikely that just because the online retailers now will be charging you sales tax that you’ll say to yourself, darn it, I’ll go over to my local mall and buy that item there instead.
AI Autonomous Cars And Emerging New Tax
What does this have to do with AI self-driving driverless autonomous cars?
At the Cybernetic AI Self-Driving Car Institute, we are developing AI systems for self-driving cars, and also identifying key trends of where self-driving cars and AI are headed. It seems likely that we’ll soon enough see states opting to impose so-called Zombie-car usage taxes.
Zombie-cars?
Yes, some like to refer to self-driving cars as zombie-cars.
Kind of cute, I suppose.
For those of you AI developers pouring your hearts into making true Level 5 self-driving cars, which are self-driving cars that can drive without any human intervention and are supposed to be able to drive as a human could, it would seem somewhat disheartening to think that some would call your creation a zombie. Zombies are brain dead. At least call the self-driving car a Frankenstein, which, though considered a monster, did have something of a brain.
See my article about Frankenstein and AI self-driving cars: https://aitrends.com/selfdrivingcars/frankenstein-and-ai-self-driving-cars/
See my article about why these cars should be referred to as self-driving cars: https://aitrends.com/selfdrivingcars/ai-reasons-call-self-driving-cars/
Zombie-cars usage taxes?
Yes, there will likely be usage taxes imposed on zombie-cars. Well, another way to phrase things is that there are going to be “self-driving car” usage taxes, which, if OK with you, is how I will refer to them henceforth herein. I just don’t like the connotations of those zombies.
Let’s consider what will happen once AI self-driving cars become prevalent.
Most are predicting that self-driving cars will be working non-stop.
You have a built-in chauffeur that never sleeps and is always available to drive the car. When you are at work, no sense in having your AI self-driving car sitting out in the parking lot. Put it to use! Especially if you can make money by doing so. The odds are that most people that own a self-driving car are going to turn it into a ridesharing revenue source. Running it essentially 24×7, you can recoup the costs of the self-driving car (hopefully) and maybe make a profit besides.
See my article about non-stop use of AI self-driving cars: https://aitrends.com/selfdrivingcars/non-stop-ai-self-driving-cars-truths-and-consequences/
Where is your non-stop AI self-driving car going to go?
If you live in city Y, you’ll most likely want to keep your self-driving car relatively close in case you need it, so you’ll rideshare it for people in city Y that need to get a lift. Suppose that there are hundreds, maybe thousands of these AI self-driving cars, all cruising around your city. It would be akin to an Uber driver that is cruising around, waiting for a request for a pick-up. You probably won’t have the self-driving car be parked, and instead keep it moving. You’ll likely even be using some kind of Big Data analytics system that will help predict where your self-driving car should be. If it’s nearing 2 a.m. and the bars are closing, seems like that’s the part of town where your ridesharing AI self-driving car ought to be.
Anyway, imagine these self-driving cars crisscrossing city Y.
Seems like it will impact the roads there in city Y. Today, during late hours, those roads are hardly used. In the future, there might be roads getting non-stop usage. Non-stop usage of the roads means they’ll wear out faster. Roads that wear out faster need a greater amount of upkeep and repairs. Greater upkeep and repairs cost money. Where will the money come from for those rising costs? Answer: AI self-driving cars.
You can expect that local cities, the states, and the federal government will all wake up and realize that the popularity of AI self-driving cars has caused a tremendous strain on the roadway infrastructure.
If the cause of the problem is due to AI autonomous cars, perhaps one could suggest that the “cause” also should be the solution. Impose a usage fee on AI self-driving cars. The money raised would presumably go towards the infrastructure that supports those roving beasts.
Of course, there are some governmental entities that might opt to use the money for some other purpose entirely.
They might decide that spending the money on homelessness or other local societal aspects is preferred over spending it on infrastructure. The infrastructure angle is usually the way to get society to buy into the usage fee, since it suggests a tit-for-tat. If the AI self-driving cars caused added costs for infrastructure, they should pay for it. That would seem fair to most, though maybe not necessarily to the owners of the AI self-driving cars. On the other hand, once a government finds a spigot that pours out money, they’ll likely spend it as they wish.
This brings up an important point that ties back to the online retailers and the sales taxes by state.
Remember that it was viewed as crucial breathing space for online retailers to grow and prosper by not having a sales tax burden to meet. Would the imposition of a usage fee possibly curb or disrupt the advent of AI self-driving cars? If so, do we want an innovation that appears to have great societal payoffs to be “harmed” by having to deal with usage fees? This is likely going to require a significant amount of open public debate.
For more about regulations and AI self-driving cars, see my article: https://aitrends.com/selfdrivingcars/assessing-federal-regulations-self-driving-cars-house-bill-passed/
You might wonder how the owners of AI self-driving cars could possibly complain about a usage fee, since if indeed their vehicles are impacting the roadways it seems befitting that they pay for it, somehow.
Their counter-argument would be that the AI self-driving car is providing other societal good. For example, by providing a ready mode of transportation, these cruising non-stop self-driving cars are offering a form of “public transportation,” and perhaps a municipality is able to cut back on its costs for buses and other forms of mass transit. It would be unfair to hit the AI self-driving cars with usage fees and neglect the fact that these vehicles are already doing so much for the city or locale.
How To Determine The Taxes Owed On Zombie-Cars
The debate about the appropriateness of a usage fee is likely going to bounce back and forth. Meanwhile, let’s consider how usage fees would even be ascertained. The usage fee should presumably be based on one or more usage factors.
Here are some of the most likely usage factors:
- Per occupant
- Per delivery
- Per miles driven
- Per time driven
- Per trip
One means would be to require the AI self-driving car to self-report the number of occupants in the car, for any trip that takes place. There could be a usage fee charged per occupant. The owner of the AI self-driving car would need to decide whether to pass along the cost to the occupants, or absorb the usage fee otherwise into the cost of doing business.
This approach, though, seems somewhat intrusive in that it requires counting people.
Furthermore, if one person goes one hundred miles while five people go two miles, it doesn’t seem sensible that, with a fixed fee per occupant, the shorter trip carries the higher usage fee.
There’s another drawback of using occupants as a factor: if the AI self-driving car is used to deliver pizza, and no human occupant is needed, the AI self-driving car would travel around the city without having to pay any usage fee.
Thus, another factor would be to impose a per-delivery fee. This would be eagerly done by many cities that want to capture revenue from Amazon deliveries, GrubHub food deliveries, and the like.
We can combine these factors.
Perhaps there is one level of usage fee for the occupant model and a different level of usage fee for the delivery model.
Another approach says that rather than caring about what the AI self-driving car is doing, charge a usage fee for the number of miles driven.
This would be handy too, since the occupant and delivery models assume that the self-driving car is being used for a particular purpose. If the AI self-driving car is cruising around with no occupants and no delivery underway, perhaps because it is staying in motion so as to be near wherever it will be needed, the miles-driven approach would make sure that the usage fee still applies.
You can argue that the miles-driven metric has its own advantages and disadvantages, and so you could possibly end up suggesting that maybe the better approach is based on time.
Time takes distance out of the equation. If the AI self-driving car is on the roadways moving at one mile per hour for five hours, maybe that’s more important than if it was driven at 80 miles per hour for thirty minutes. You could also try to argue in favor of using the number of trips as the usage metric, but that’s squishy, since who’s to say exactly what constitutes a trip.
We can mix and match these metrics. We can also add various conditions. Maybe the usage fee is based on a time metric during the morning hours and then shifts to a miles-driven metric for the afternoons and evenings. You might at first complain that having varied limitations and metrics is going to be onerous to calculate, but that’s much the same argument used in the early days of the online retailers. With today’s vastly superior computing power, I don’t think you can make much of a case about things being labor-intensive to calculate.
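The mix-and-match idea can be sketched in a few lines. The rates and the noon cutoff below are hypothetical assumptions for illustration; a real fee schedule would be set by each jurisdiction.

```python
from datetime import datetime

# Hypothetical rates for illustration; real fee schedules would be set
# by each jurisdiction, and might vary by road type, zone, and so on.
MORNING_RATE_PER_MINUTE = 0.02   # time-based metric, applied before noon
AFTERNOON_RATE_PER_MILE = 0.05   # distance-based metric, applied from noon onward

def trip_usage_fee(start: datetime, minutes: float, miles: float) -> float:
    """Mix-and-match fee: charge by time in the morning, by distance afterward."""
    if start.hour < 12:
        return round(minutes * MORNING_RATE_PER_MINUTE, 2)
    return round(miles * AFTERNOON_RATE_PER_MILE, 2)

print(trip_usage_fee(datetime(2019, 7, 30, 8, 15), minutes=45, miles=12))  # 0.9
print(trip_usage_fee(datetime(2019, 7, 30, 17, 0), minutes=30, miles=20))  # 1.0
```

As with the sales tax case, the conditional logic is trivial for a computer; the hard part is the policy debate over which metrics and rates to use.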
Zombie Self-Reporting To Aid The Calculating Of The Taxes
Notice that I mentioned that the AI self-driving car would self-report its usage.
How would it do this?
It could keep track of the miles driven, the time underway, the number of occupants, and so on. This tracking could then be communicated to a municipality via the OTA (Over-The-Air) updating capability that AI self-driving cars are going to have. It would seem relatively straightforward for the AI self-driving car to be required to report its numbers, say, each day or each week, and the government would then receive those numbers and charge the usage fee.
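As a sketch of what such a self-report might look like, here is a hypothetical daily payload. The field names and reporting shape are illustrative assumptions, not part of any actual standard.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical shape of a daily self-report sent over the air to a municipality;
# the field names are illustrative assumptions, not any actual standard.
@dataclass
class DailyUsageReport:
    vehicle_id: str
    date: str
    miles_driven: float
    minutes_underway: int
    occupant_trips: int
    delivery_trips: int

report = DailyUsageReport("AV-0042", "2019-07-30", 312.4, 1080, 27, 9)
payload = json.dumps(asdict(report))  # what the OTA channel would transmit
print(payload)
```

The government's side would then simply apply whatever fee schedule it has adopted to the received numbers.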
You could even consider using blockchain to have the AI self-driving cars record their usage, and then have the government tap into the blockchain to impose the usage fees.
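To illustrate the chaining idea behind that, here is a toy hash-chained log. It is far simpler than a real blockchain (no distribution, consensus, or signatures), and the record fields are hypothetical, but it shows how chained hashes make earlier records tamper-evident.

```python
import hashlib
import json

# Toy hash-chained log of usage records. A real blockchain adds distribution,
# consensus, and signatures, but the tamper-evident chaining idea is the same.
def add_record(chain: list, record: dict) -> None:
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    body = json.dumps(record, sort_keys=True)
    entry_hash = hashlib.sha256((prev_hash + body).encode()).hexdigest()
    chain.append({"record": record, "prev": prev_hash, "hash": entry_hash})

chain = []
add_record(chain, {"vehicle": "AV-0042", "date": "2019-07-29", "miles": 312.4})
add_record(chain, {"vehicle": "AV-0042", "date": "2019-07-30", "miles": 280.1})

# Each entry commits to its predecessor, so altering an earlier record
# invalidates every later hash when an auditor rechecks the chain.
print(chain[1]["prev"] == chain[0]["hash"])  # True
```

Because each entry's hash covers the previous entry's hash, an under-reporting hack that rewrites an old record would be detectable by anyone re-verifying the chain.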
See my article about OTA for self-driving cars: https://aitrends.com/selfdrivingcars/air-ota-updating-ai-self-driving-cars/
See my article about blockchain for AI self-driving cars: https://aitrends.com/ai-insider/blockchain-self-driving-cars-using-p2p-distributed-ledgers-bitcoin/
I’m sure there will be cheaters.
There are bound to be AI self-driving car owners who will try to find a means to avoid the usage fees. Since the numbers are presumably self-reported by the AI self-driving car, there could be a method to hack your AI self-driving car so it under-reports the numbers. The government would likely need to audit the AI self-driving cars and determine whether there is anything funny going on with the reporting.
The government could also potentially use other means to try and verify the self-reported numbers. With the advent of V2I (vehicle to infrastructure) communications, the roadway infrastructure is going to be able to track our cars in ways not as feasible today. It is conceivable that the infrastructure will know exactly where your self-driving car has been the entire day and can calculate the usage fee without even asking your AI self-driving car for the numbers.
All this potential collecting of data about the AI self-driving car raises potential privacy issues. We’ll have to see how that plays out in the usage fee debates.
See my article about privacy and AI self-driving cars: https://aitrends.com/selfdrivingcars/privacy-ai-self-driving-cars/
Conclusion
One final point to consider.
We had the tension between the brick-and-mortar stores and online retailers.
The emergence of AI self-driving cars as ridesharing vehicles is certainly going to create tension with human-based ridesharing services.
This includes taxis and shuttles, but also includes those everyday folks that are trying to make some money by being a rideshare driver. What will happen to them?
Maybe the usage fee of AI self-driving cars will help save them, since it might make the costs of using an AI self-driving car higher than choosing the human-driven approach. Or, maybe to be “fair” we’ll have usage fees on any ridesharing service, regardless of AI based or human-based.
As you can see, there are lots of public-policy considerations that go into all of this.
The revenue from Zombie-cars, though, is just going to be so tempting that no one will be able to overlook it. You can bet there will be many start-and-stop attempts to come up with usage fees on those Zombies. The zombie apocalypse will need to pay for itself.
Copyright 2019 Dr. Lance Eliot
This content is originally posted on AI Trends.
Your Mobile Marketing Data Is Dirty - But These Mobile App Attribution Techniques Can Help
In order to figure out which specific marketing campaigns are delivering the best results in terms of lead generation and conversions, an attribution model is often used to track links and other marketing tools to record interactions. However, when it comes to mobile marketing, the traditional attribution approach is not always perfectly accurate.
Mobile attribution is a totally different ballgame from traditional marketing attribution because of its complexity. It is far easier to track a customer’s journey on a desktop version of a website by using cookies, image tags, and customized URL parameters. This is far more complicated on mobile devices, particularly if a customer is switching between the mobile website and an installed app.
Unfortunately, many marketing teams struggle with proper attribution when it comes to mobile interactions because they do not understand how to track their campaigns properly. The good news is there are some specific mobile ...
Read More on Datafloq
Monday, 29 July 2019
AI and the Future of Business Intelligence
Today, traditional analytics as a component of business intelligence is becoming a thing of the past. The way forward seems to be rooted in AI applications. Here, we will explore how AI is changing the nature of business intelligence processes, and what these changes mean for us as humans.
AI and Business Intelligence
Data analysis (namely prescriptive analysis) and predictive modeling are central to business intelligence. Both are essential in decision-making processes and in developing future business plans. Today, professionals can combine these methods with AI and machine learning to gain more detailed insights. In fact, Gartner’s Top 10 Data and Analytics Technology ...
Read More on Datafloq
Building Teams for Data Science, Analytics, and AI
Sponsored by:
For senior leaders, managing data science is crucial to gaining the most benefit from investments in AI and data analytics. Effective management involves people, culture, technology, and process. In this episode of CXOTalk, industry analyst Michael Krigsman interviews a top data scientist and business leader, who shares advice on how to create and manage a data science team.
Dr. Bülent Kiziltan is an AI executive and an accomplished scientist who uses artificial intelligence to create value in many business verticals, tackling diverse problems in disciplines ranging from finance, healthcare, astrophysics, operations research, marketing, biology, engineering, hardware design, and digital platforms to art. He has worked at Harvard, NASA, and MIT in close collaboration with pioneers of their respective fields. For the past 15+ years he has led data-driven efforts in R&D and built multifaceted strategies for industry. He has been a data science leader at Harvard and the Head of Deep Learning at Aetna, leading and mentoring more than 200 scientists. In his current role, his data-driven strategies with machine learning, analytics, engineering, marketing, and behavioral psychology components have had a disruptive impact on a multi-billion-dollar industry sector. Bülent’s previous appearance on CXOTalk was popular and engaging.
Topics discussed
- Unique aspects of data science
- From a management standpoint, what makes data science unique?
- Is there a distinction between managing a data science vs. an AI organization?
- What are the unique challenges in managing data science?
- ROI and organizational expectations
- Who should own data science or AI in the organization?
- Where should data science report?
- Under Engineering, as a standalone organization, or someplace else?
- To what extent is data science an ROI or R&D basic research function?
- Is it imperative for corporate data science teams to maintain ongoing relationships with academics and external researchers?
- What are reasonable ROI expectations from the data science group?
- Talent profile
- What characteristics make a great data scientist?
- Do you need a technician, scientist, or business person?
- When hiring, what skills should you prioritize?
- Talent management
- Are there unique points on managing a team of data scientists?
- What about mentoring and ongoing training?
- What drives employee retention for data scientists?
- How can you make your workplace attractive to data scientists?
- How can a company entice data scientists when they are so much in demand?
TRANSCRIPT
Michael Krigsman: Today we’re talking about how to manage a team of data scientists. It’s a crucial topic. We’re speaking with Bülent Kiziltan, who is one of the most articulate and outspoken data scientists that I know.
Bülent Kiziltan: Thank you. Great to be on the show again, Michael.
Michael Krigsman: Give us a flavor of your background to set some context for us.
Bülent Kiziltan: I’m trained as a physicist and an astronomer. I have spent my career searching for neutron stars and black holes. During that pursuit, I used different aspects of applied math and machine learning for more than 20 years. Now I’m applying those skills in the industry.
Michael Krigsman: All right, so I’m hoping that the topic we talk about today is going to be a little bit less complex than looking for neutron stars and black holes but, hey, we’re talking about people, so maybe it’s more complex.
Bülent Kiziltan: That’s exactly what I was going to say.
Michael Krigsman: Okay. Bülent, what is unique about data science, unique about AI when it comes to building teams and managing teams?
Bülent Kiziltan: Right. First and foremost, it’s a new domain for the business and the industry. There is a lot of experimentation going on. The strategy, when it comes to managing, building data science teams and creating value with those teams is still in its infancy.
There is a lot of experimentation. Companies change strategies all the time. There is no single answer. There is no single right answer. Bigger companies versus smaller companies, industry domains matter a lot, and the culture is very, very important. I think we’re going to talk about that as well.
Michael Krigsman: Is there a distinction between managing data science and managing AI? I think there’s a lot of confusion between those two things.
Bülent Kiziltan: Right. Just a few days ago, I went online to look at what people say on blogs about AI, data science, and analytics, and what the differences are. Once you go through the blogs and the information in them, it’s really confusing. All of those posts have a certain level of truth and correctness in them but, mainly, it’s an area that’s very new to the industry.
AI is the more generic term, considered an umbrella term that sits above data science, and data science typically would sit above analytics in terms of its comprehensiveness, if you will. But there is a lot of confusion. When it comes to etymology and how the context of the word changes, I think culture and industry will play their part. So, it’s currently reshaping itself. There are things that overlap across all three areas. There are things that are very different about all three.
Michael Krigsman: From a management standpoint, do we need to think about them in very unique ways?
Bülent Kiziltan: I think, not based on whether it’s called AI, data science, or analytics, but I think what is more important in managing data science teams, if you will, is in what type of company and what type of domain you are operating in. Depending on the business objectives, I think one can come up with a more appropriate designation for the team. For instance, data science versus analytics.
Michael Krigsman: All right. I think that that sets the stage. Fundamentally, the issue is one of this being a new domain, and so we’re still trying to figure it out. Is that the correct sort of ground level beginning that we need to start from?
Bülent Kiziltan: That’s one important aspect. Another aspect of this is that data science is science. It is still a hybrid of an academic culture and a business culture. Companies have a hard time hitting the right balance that aligns well with their business objectives. This is one area we are all struggling with and experimenting with: hitting the right cultural balance within an operation.
I would say one can come up with the most interesting strategy, but culture eats strategy for breakfast. If your culture is not set right, you cannot execute on the strategy that you are thinking of.
Michael Krigsman: Let’s talk about the ROI and the organizational expectations for a team that’s being managed. Organizationally, where should a data science team or a data science department fit inside a company?
Bülent Kiziltan: If I were to think only like an academic, I would say they should be independent. Obviously, people who have been in the industry for a very long time have a different mindset. Depending on the company size and its business objectives, data science could be considered a supporting department; many companies have data science operating under engineering, but their business objectives are, I would imagine, more short-term.
AI has delivered on its promises. It’s creating a lot of ROI. Executives and companies are well aware of what the ROI can be if analytics or data science operations can set their strategy independently of the engineering strategies, because engineers have different priorities as opposed to analytics folks or data science folks. In my opinion, we are now at the stage where AI, data science, or analytics operations should report independently to the board or at least be represented at the C-level, regardless of the size of the company.
Michael Krigsman: I want to remind everybody that we’re speaking with Bülent Kiziltan, and we’re talking about how to manage AI and data science organizations; what are the unique aspects of this?
Bülent, the kind of expectations that companies have of their AI departments or their data science departments, you made a distinction between the business expectations and the technology or the engineering expectations. Can you elaborate on that?
Bülent Kiziltan: Everybody knows there is a certain level of hype that comes with AI or data science. Early on, when this hype started, we saw some justified skepticism from high-level business executives because the teams that were formed were not, essentially, executing or delivering the ROI. I would say there were some unrealistic expectations at the beginning but, right now, we are at the level where we can utilize more mature tools. The talent pool has diversified itself. Still, there is not enough talent out there, but we hire data scientists from very different profiles.
Data science, I would say, is more of a creative process than an engineering process. Engineering processes require creativity as well, but I think that creative aspect is very dominant in data science. An engineering mindset typically has inputs and outputs and is, generically speaking, trying to optimize that whole process, whereas data science, if managed properly and aligned with the company's vision, is about discovery, about extracting new information. That is very different from a sole engineering operations perspective.
This is why I think companies who are trying to make an impact, who are trying to come up with disruptive innovation on a large scale or a small scale, are moving in the direction where data science operations are led by domain experts who have really done data science, have written code, and are leading the data science operations by example. This is why we have all sorts of titles floating around. That leader type is also very scarce right now, but leaders who are domain experts in data science or AI and who also have business acumen and experience are the ones who are highly sought after today.
Michael Krigsman: You mentioned that data science needs to be aligned with the goals of the organization.
Bülent Kiziltan: Right.
Michael Krigsman: Please, talk about that. That seems like a very crucial dimension here.
Bülent Kiziltan: There are, I think, two extremes when it comes to building a data science strategy. One extreme is the academic mindset where you do research for long-term impact. The other extreme is the short-term pragmatism that comes with short-term deliverables in the business setting.
I think, in data science operations, the balance has to be set right to align with the business objectives of any company. There are larger companies such as Google and Facebook which have the resources to make mid-term and long-term investments only. They have research teams that really operate like academic institutions. Most of the companies don’t have those resources. They don’t have those objectives.
Any leader who is coming in, I think, has to first and foremost identify the business objectives and what sort of short-term, midterm, and long-term deliverables they can deliver to the board in order to justify the operation's existence. This is very important.
On the other side, and we see this more often than not, AI or analytics operations are managed by non-domain experts. I would call them managers. There are some pragmatic reasons why you would want a person who doesn't have a data science background but really has the business experience, especially in larger companies, mainly because the value one can create is most often hindered by the internal dynamics and the stakeholders. A leader who goes into a bigger company has to really consider the balance of different stakeholders and convince them. That comes with business experience.
I think the right balance is somewhere in between, where you really deliver in the short term. There is a lot of low-hanging fruit in all sorts of business settings, especially in larger companies. A smart leader would focus on delivering in the short term and invest in the long term.
Michael Krigsman: Given the importance of aligning the data science efforts, and you could say the same thing for AI, can you share with us any examples where the data science efforts were not aligned with the business goals? What's the kind of outcome that then happens?
Bülent Kiziltan: I will talk about what we generally see in larger companies today: data science and analytics efforts led by non-domain experts. They don't know much about data science itself, but they really know how to manage groups, how to build groups, and how to talk to different stakeholders internally.
Through those relationships, they can justify their existence and large budgets by delivering in the short term.
When you have an AI manager, if you will (I wouldn't call them leaders), a manager with solely a business objective and a business background, they will have the shortsightedness that comes with business pragmatism. They will go after the low-hanging fruit only, because this is how they have been operating and delivering to the business executives.
Data science operations, especially when it comes to sustaining the value that AI promises, require long-term investment. One such investment is attracting and retaining talent. If you have a data science operation that focuses on short-term goals only, without giving the data scientists creative space, those folks will be very difficult to retain in any type of operation. In that relationship, an employer has to ask: what value do I bring to the table, and do I invest in the continual mentorship and training of my data scientists?
In different domains, the domain knowledge that the company holds adds to the employee. But one of the nice things about data science is that it's somewhat domain agnostic. So, when data scientists come onboard, they will build the required domain knowledge, whether you're in healthcare or in finance, and the value they create is not directly tied to that domain expertise. They know their worth, and they can switch from one domain to another.
As an employer, companies have to really consider how to retain that talent, and there are ways to do that. There is a clear ROI case for building a culture in which data scientists collaborate rather than compete, which is more human-centric, and which has an academic aspect to its operation. If you look just at the skillset of data scientists and work them at 150%, that skillset will become obsolete in six months. Continual training and an environment in which they are intellectually enriched are essential in any type of data science operation, in my opinion.
Michael Krigsman: Let’s shift gears here and talk about data science talent. First off, why is this such an important issue?
Bülent Kiziltan: It’s a new domain and there are a lot of people who are trying to come into that domain from very diverse perspectives, very diverse trainings. I think all of them bring an important aspect from their own domain into the creative process of doing data science. Having folks with a psychology background, having a data scientist coming from a math background or an astrophysics background, they all bring in interesting ideas.
As I said, data science is more about the creative process and problem solving than just using certain tools. To know how to use certain tools is important, but what is more important is problem-solving skills. Each diverse background brings an interesting aspect and perspective to the problem-solving process. So, that is very important.
One of the negative sides of being in a new domain is that every data scientist is above average in only one of three aspects: statistics, code writing, or the domain expertise needed to contribute to problem-solving in a certain domain. There has to be continual training for data scientists, and I value diverse backgrounds very much. That is one of my approaches to building data science teams. Diversity is very important.
Michael Krigsman: You’ve mentioned culture several times now as being really important. We have a question from Twitter. Arsalan Khan asks, “How do you create an AI-focused culture when employees are fearful that their jobs will be taken away by that AI?”
Bülent Kiziltan: By that AI? Well, the fear that AI will come and take jobs is largely unjustified, I think. Some folks who are popularizing AI, they come up with figures and claim that AI will take away jobs. I ask them, “What’s the data? How do you justify that claim?”
I don't know whether AI will take away jobs, but what I know for sure is that the demographics of the labor market in five years will look very different from today. Looking back at innovations similar to AI, like the invention of electricity or engines, they changed the labor market dramatically but didn't diminish it in numbers. I don't think AI will either.
I don't think that AI, in general, will have a negative impact on job security in the future. I think that fear is not justified, but it still has to be addressed.
Setting the culture, I think that fear is not the most important issue. The most important issue is to create an environment that is collaborative and human-centric, and intellectually very rich, mainly because the problems we're facing today are very complex and we need creative ideas to solve them. Data science, AI, and machine learning are providing very powerful tools to solve those problems.
People coming with all sorts of different backgrounds are bringing in something that’s valuable. As a leader, we have to make sure that each individual’s opinion is valued. The merit of the opinion wins over the title. This is my approach to things.
Ideas are very important and data science is a new domain. None of us are formally trained in data science. Everybody brings in something different which has value.
Michael Krigsman: What about the relationship between academia and data science inside companies? I'm thinking especially of large companies, but it's probably equally true for smaller venture-funded startups.
Bülent Kiziltan: Yes. The bleeding-edge know-how is still created and produced within academic settings or in larger companies that operate like academic institutions. We have to consider this if we want to continue and sustain the value that data science creates for companies. I'm an advocate of continuing the relationship with my colleagues from academic institutions and building really productive relationships both ways.
There is a lot that we can learn from folks operating in academia. I still read the arXiv every day to see what's coming out in both of my domains, astrophysics and data science or AI-related fields. But we should also create relationships with academics that give them incentives to contribute to solving problems in the industry.
Some bigger companies find different solutions for that, where they fund academics full time or give them the resources so that they don't have to think about bringing in grants and can focus on problem-solving, which is their main job. But in any setting, in smaller companies or bigger companies, I think there has to be some sort of ongoing, productive relationship with the academic world in order to sustain the value that AI is bringing today.
Michael Krigsman: What about smaller companies? How do smaller companies manage? It’s clear with a large company. They have the resources to do that, but what about a smaller company?
Bülent Kiziltan: Smaller companies actually hire folks directly with very strong academic backgrounds. Larger companies, on the other hand, because they have internal dynamics and silos, they typically hire people who are not domain experts in data science but are business managers. I don’t think startups have a problem in building that relationship very informally and casually with academic institutions because this is their main source where they hire that talent. Whereas, at bigger companies, they have their own problems bringing academics into their operations.
Michael Krigsman: It sounds like you're a big fan of the kind of work, the innovation, put it that way, that's done inside smaller companies.
Bülent Kiziltan: That's right. I mean, I've operated and worked in both worlds, if you will, in the startup space and in bigger companies. Both have pros and cons. But, when it comes to innovation, I think moving quickly is very important, and startups don't have the stigma of different silos. They are very fluid in terms of hierarchy, so they can make that impact really quickly.
Time is an important asset today, especially when it comes to AI. Things are changing on a weekly basis, so we have to be quick. For larger companies, having a lot of people to manage and internal dynamics to overcome means a lot of wasted time and resources.
This is why I've seen a trend where bigger companies are acquiring smaller companies for their AI operations, or are merging more organically with other companies. Smaller companies definitely have their advantages when it comes to innovation.
Michael Krigsman: The challenge that small companies face is they have speed, they can move quickly, but they often don’t have the resources that large companies do. And so, what are you seeing inside smaller companies when it comes to AI and data science to let them overcome the lack of resources?
Bülent Kiziltan: That’s correct. The lack of resources is one of the disadvantages smaller companies have. One important resource that bigger companies have is access to data. Data is currency when it comes to doing data science.
Bigger companies cannot move fast, whereas smaller companies can move really quickly. Depending on the problem they're working on, sometimes working with smaller teams is more advantageous than working with bigger teams, to be honest. As long as smaller companies have access to data, the day-to-day operations and hardware requirements are not that enormous, and you don't need an army of data scientists to address a certain problem or come up with an interesting solution. The asset bigger companies have is the data they own, and smaller companies have speed, so I encourage companies to build synergistic relationships in that regard.
Michael Krigsman: That's a really interesting point; smaller companies have speed and larger companies have data. That's a uniquely AI-focused proposition today.
Bülent Kiziltan: I think, yes, it’s very different from what we’ve seen until the emergence of data science, how that relationship worked. I think startups today have a lot of leverage doing data science and AI as long as they have access to that data. Sometimes the data is being produced in academic settings, and this is why we see a lot of academics and professors who are building companies, startups, and moving quickly and producing value that way.
Michael Krigsman: Let’s talk about finding data scientists. When I speak with executives, the lack of resources seems to be a constant complaint.
Bülent Kiziltan: There’s definitely a lack of talent. That’s for sure. I think an important part of the searching process is how the recruitment process works.
What I've seen in companies, especially bigger companies, is that their recruitment teams still use the old-fashioned way of going after data scientists and, most of the time, they're missing the real talent. They're searching for certain keywords. They're looking for a certain type of experience that is not out there and is not essentially required in data science operations.
I think reforming, restructuring, and retraining HR and the recruiters is an important part of proactively going after data science talent. They are looking in the wrong places most of the time.
Michael Krigsman: What are the wrong places and what are the right places?
Bülent Kiziltan: In how they scan CVs, for instance, they're looking for certain keywords, like Python, the kind of thing anybody can put on their CV. It's very difficult to gauge the creative aspect of an individual, though there are metrics one can use to gauge a person's creativity. But I'm a face-to-face person, and a five-minute face-to-face meeting is worth more than any type of recruitment strategy. Meeting people face-to-face and talking to them gives a lot of insight into what type of person they are and how they go after a certain type of problem. Many companies are filtering based on certain skill sets, and I think that's the wrong way to go about searching for talented data scientists.
Michael Krigsman: The problem is many recruiters are searching for specific languages, as an example, none of which can point to creativity, and creativity is so essential.
Bülent Kiziltan: That's right. In data science, in my opinion, in most use cases the language you use is not relevant. I advise candidates to use the language they are most comfortable with.
When it comes to certain tools, recruiters are looking for tools, but tools are being produced on a weekly basis. They are dynamically changing. What you put on your CV today becomes irrelevant in a couple of months when there is a new tool. So, what I look for is creativity and the willingness to learn, rather than a certain keyword on a CV. I think most recruiters are going about this the wrong way.
Michael Krigsman: Is that any different from hiring software developers?
Bülent Kiziltan: Yes, it is somewhat different. In software development, things are more mature, experience is very important, and the process is really well defined, whereas data science, as I said, is still in its experimental phase and there is no single answer. You can take two very similar companies and, based on their leadership and management styles, you have to hire different people who align better with the culture of each particular company. It's a very dynamic domain, and one cannot go about this in a deterministic manner. You really have to meet the candidates and talk to them.
Obviously, screening is very important, and we really appreciate the efforts of recruiters and how they help us in the hiring process. But we have to go about data science hiring in a different way.
Michael Krigsman: What makes a company attractive to a data scientist, if you want to set up the right kind of environment to attract folks?
Bülent Kiziltan: Yeah. Everybody has a different background. They have different experiences that they bring to the table. For me, right now, what I focus on is the culture. The culture has to be right for the data scientists. By right, I mean a place where they can continually learn, where they can bring interesting ideas to the table, and where those ideas are valued.
The merit of the idea is more important than the title of the person, and the environment has to be intellectually rich. Then, obviously, the problems they're working on, and how interesting those problems are, matter when people choose one company over another.
If a company doesn't set the culture right, as I said, the turnover will be very high, which is a very high cost for companies. High turnover will also make a data scientist question whether the company is the right choice. If a company has a high rate of turnover, it has to really rethink its strategy and approach to data science in general, in my opinion.
Michael Krigsman: Arsalan Khan, on Twitter, makes a really interesting point. He says, “Very often HR is not equipped to hire data scientists because they don’t know enough to evaluate who is good and who is not.” I think this gets back to the point that you raised earlier that searching, scanning for keywords on a CV or a resume does not help you evaluate the creativity or the potential of that person at all.
Bülent Kiziltan: Right and wrong. Certainly, he is right when it comes to domain expertise. They don't have it. But recruiters and HR are not required to be domain experts when they screen candidates. What I'm saying is that keyword-based screening, which worked in the past for mature domains like software development or certain types of engineering, should be very different from searching for a particular type and profile of data scientist. That doesn't require domain expertise.
Michael Krigsman: What you’re essentially saying is you need to be looking for innovation potential.
Bülent Kiziltan: Right. For instance, has a person operated in different domains and still remained productive? I think that's an important metric to gauge a candidate's willingness to learn and their creativity.
If a person comes from an academic background, whether they have produced academic work on different topics, for instance, is an important metric. If they have been in the industry, have they been in different domains? If they have remained in the same domain for 15 years, that makes them a really deep domain expert, but will it make them the right candidate for data science? I don't think so.
Michael Krigsman: Gus Bekdash, on Twitter, makes a really interesting point. He says one principle that's helped him in the past is looking at the kinds of problems that a person has solved. What do you think about that?
Bülent Kiziltan: Yes, but not everybody has the luxury of choosing the problems they want to work on, in academic and industry settings alike. I evaluate this more dynamically. I look into the problem; it carries a certain weight, and certainly candidates have some leverage in choosing it. But once they're in an operation, they are typically assigned certain problems, and I look at what they have brought to the solution.
Michael Krigsman: That’s the key. It’s not just the problem they’re working on but the nature of the solution that they’re applying.
Bülent Kiziltan: Absolutely. This also will give insight into how you can build your CV and make it more transparent to the hiring manager. You should probably include some of the relevant solutions that you brought to a certain type of problem. I think that is very relevant.
Michael Krigsman: Yeah, that’s a very, very interesting point. What you’re really trying to do is help the hiring manager gain insight into the way that you think.
Bülent Kiziltan: Yes, that's a very tough one. It's a challenge for everyone, including recruiters and hiring managers. It is an area where we also keep experimenting. Some companies have online challenges, coding challenges, which work for some and not for others. I don't like online challenges; I like to sit down with candidates, stand in front of a whiteboard, have a conversation, focus on some problem, and see how they think.
This is a very subjective process. I agree with that. This is why data science candidates should not be discouraged if you’re turned down by certain companies because everybody has a different approach. It doesn’t reflect their worth, so it’s a very subjective process.
Michael Krigsman: One of the issues, I think, that comes up, especially in larger companies, and you alluded to this earlier, is that the aspect of data science can get lost relative to the importance of process, corporate flow, desire to look good, and so forth.
Bülent Kiziltan: That's right. This is why choosing the right leader is very important. My approach to leadership is to lead by example. A leader who has really written code, built models, and knows the process itself is an important asset when it comes to striking the right balance between focusing on short-term business objectives and deliverables and investing in the long-term future.
If the manager of a data science operation focuses only on one aspect, whether it be R&D for the long term or just short-term deliverables, I think they will face some hardship in the midterm. In big companies, as I said, the domain experts, because they have operated mainly in academic settings, have somewhat limited business experience compared to managers, consultants, or executives who have been in the industry for decades. You cannot compare their business experience, for sure. I think what is more important is to bring in a person who has reputable domain expertise as well as business acumen and leadership skills, and to invest in that person if you want to break ground using AI.
Michael Krigsman: The ultimate issue is, as you say, how are we breaking ground? Right? That’s very different from, say, developing traditional enterprise software business applications.
Bülent Kiziltan: Yes, breaking ground. Everybody has a different approach to breaking ground or doing disruptive innovation. But what I've seen in many companies is that, when it comes to AI, when you look at the nuts and bolts of what they actually do, sometimes what they promise is not there. This has to do with not having the right talent or the right level of investment in AI, but they want to move in the right direction. That's an understandable position to take: they advertise a vision and try to build a team that can align with that vision, which is very important.
It’s, I think, critically important to have the right leader that is setting the tone for the culture. A leader versus a manager is very important. Go after the leaders that have some domain expertise.
Michael Krigsman: What about at the senior level inside a company, at the board level, the senior executive level, folks who are setting the goals for the business? How does that translate down into AI efforts and data science efforts?
Bülent Kiziltan: Again, I think the leader who is managing data science operations is critically important here. The person has to be an educator, coming from a background in which they have operated at a level where they can break down technical topics into simple terms. This is how a leader operating within data science has to talk with the board. You cannot just go on about a deep learning network and talk about backpropagation in a board meeting.
With startup companies, this may not be a big problem. But with bigger companies, I think communication skills play a critical role when the data science leader comes to the meeting and talks about their objectives and how those align with the business objectives. On the other side of the table, it requires an executive board and a CEO that are willing to learn.
We had this conversation before. We live at a time in which every business leader has to become conversant in some aspects of machine learning and AI in order to make the right decisions. If it's a person who is not open to learning, who doesn't value the best ideas but is looking for a certain type of presentation, who looks at the fonts and colors of a PowerPoint rather than its content, I think that's where the relationship can break down or fail.
Michael Krigsman: Isn't this really just like IT folks? How is this different from the historical problem that technical folks have had communicating with businesspeople? It seems like the same issue.
Bülent Kiziltan: When it comes to communicating, I think, yeah, the problems are similar, but the business objectives, the investment types, and the strategies are very different. There will be a certain structure of communication and a certain expectation from the board that they are used to.
The data science leader can conform to this expectation, which I think is the wrong way to go, or be bold while still aligning the business objectives of the data science operation with the board's. What the data science leader brings to the table will be very different from what an IT leader brings.
Michael Krigsman: As we finish up, can you describe those differences between managing an IT function like the CIO, for example, and managing an AI or data science operation?
Bülent Kiziltan: I don't want to talk too much about the IT folks because it's not my area of expertise, per se. But I would imagine that the IT infrastructure is much more mature. The day-to-day expectations and how certain goals are set are more fixed, as opposed to a domain such as data science, in which the leader has to reinvent himself or herself on a weekly basis.
You basically have to encourage your team to fail, but fail quickly, whereas in IT the margin for failure might be very, very small. In data science, we are experimenting quite a bit, and there is a certain overhead that comes with that experimentation.
We face different types of challenges, but by no means do I want to say that IT leadership is less challenging. We all face different challenges.
Michael Krigsman: That is certainly the truth. As we finish up, Bülent, any final thoughts on this topic?
Bülent Kiziltan: Data science, AI, machine learning, and analytics are a great place to be. I see a lot of younger people moving into the field, and I want to encourage them. Many of the people coming in suffer from a syndrome where they think that, because they are not formally trained as data scientists, they are worth less than more formally trained computer scientists. I don't think this is the case. A diverse background has a lot of business value.
From a leader's perspective, creating that creative space for data scientists, regardless of their level, produces a lot of ROI. It has business value that comes with it. I encourage leaders to create that space for incoming data scientists and to invest in their continual training if they want to sustain the value produced in the short term.
Michael Krigsman: Okay. Bülent Kiziltan, thank you so much for joining us today. It’s been a very fascinating discussion.
Bülent Kiziltan: It’s been a pleasure, Michael.
Michael Krigsman: You have been watching this interesting discussion on how we manage data science and artificial intelligence operations. Thanks for watching. Please subscribe to our YouTube channel and go to CXOTalk.com to see more videos. Be sure to subscribe to our newsletter. Thanks, everybody. Have a great day. Check out our videos, and we'll see you next time. Bye-bye.
AI in Medicine: Life Sciences and Drug Discovery
Sponsored by:
Artificial intelligence offers the promise of better health and faster drug discovery and testing, creating improved medical outcomes for patients. We talk with a world expert on using AI in life sciences to discover and develop drugs faster and less expensively.
Dr. Alex Zhavoronkov is the founder and CEO of Insilico Medicine, a leader in next-generation artificial intelligence for drug discovery, biomarker development, and aging research. Prior to Insilico, he worked in senior roles at ATI Technologies, NeuroG Neuroinformatics, the Biogerontology Research Foundation, and YLabs.AI. Since 2012, he has published over 130 peer-reviewed research papers and two books. For six years in a row, he has organized the annual Aging Research for Drug Discovery and Artificial Intelligence for Healthcare forums at Basel Life/EMBO in Basel. Alex is an adjunct professor at the Buck Institute for Research on Aging.
TRANSCRIPT
Michael Krigsman: Artificial intelligence in drug discovery is a relatively new field. It’s a very important field. Today, we’re speaking with one of the most prominent voices in AI and drug discovery.
I’m Michael Krigsman. I’m an industry analyst. Thank you so much for watching. Before we begin, please subscribe on YouTube and subscribe to our newsletter. You can do that right now.
Alex Zhavoronkov is the CEO of Insilico Medicine. Tell us briefly about Insilico Medicine and the things that you're working on.
Alex Zhavoronkov: We are focused primarily on applying next-gen AI techniques to drug discovery, biomarker development, and also aging research. We focus specifically on two machine learning techniques: generative adversarial networks and reinforcement learning. Those are the techniques in which we are most expert in our field.
We use those techniques for two purposes. One is identifying biological targets and constructing biomarkers from multiple data types; the other is generating new molecules, new molecular structures with a specific set of properties. We were one of the first companies, possibly the first, to generate new molecules using this new technique called generative adversarial networks (it's a kind of AI imagination) and to validate those molecules experimentally.
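Since generative adversarial networks come up here, a minimal toy sketch of the adversarial training loop may help readers unfamiliar with the idea: a generator learns to produce samples that a discriminator cannot tell apart from real data. This is only an illustration on 1-D numbers, not Insilico's actual molecule model; real molecular generators operate on representations such as SMILES strings or molecular graphs, and all hyperparameters below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "real" data distribution standing in for real examples: N(4, 1).
REAL_MEAN, REAL_STD = 4.0, 1.0

# Generator G(z) = g_w * z + g_b; discriminator D(x) = sigmoid(d_w * x + d_b).
g_w, g_b = 1.0, 0.0
d_w, d_b = 0.1, 0.0

def sigmoid(x):
    # Clip the input so exp() stays well behaved.
    return 1.0 / (1.0 + np.exp(-np.clip(x, -30.0, 30.0)))

lr, batch = 0.05, 64
for _ in range(2000):
    real = rng.normal(REAL_MEAN, REAL_STD, size=batch)
    z = rng.normal(size=batch)
    fake = g_w * z + g_b

    # Discriminator step: gradient ascent on log D(real) + log(1 - D(fake)).
    d_real = sigmoid(d_w * real + d_b)
    d_fake = sigmoid(d_w * fake + d_b)
    d_w += lr * (np.mean((1.0 - d_real) * real) - np.mean(d_fake * fake))
    d_b += lr * (np.mean(1.0 - d_real) - np.mean(d_fake))

    # Generator step: gradient ascent on log D(G(z)), the non-saturating loss.
    d_fake = sigmoid(d_w * fake + d_b)
    g_w += lr * np.mean((1.0 - d_fake) * d_w * z)
    g_b += lr * np.mean((1.0 - d_fake) * d_w)

# Draw samples from the trained generator.
samples = g_w * rng.normal(size=1000) + g_b
fake_mean = float(np.mean(samples))
```

The adversarial pressure is the key point: the discriminator's only job is to separate real from generated samples, and the generator improves by fooling it. In molecule generation, the same game is played over chemical structures, with additional objectives (the "specific set of properties" mentioned above) folded into the generator's reward.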
What is the Drug Development Pipeline?
Michael Krigsman: Give us some context. What is the drug development pipeline? Why is it so hard? Let’s talk about that. Then we can shift to how AI makes that better, makes it easier.
Alex Zhavoronkov: Drug discovery and drug development is a very lengthy process. It's also one of those processes where you've got more failures than successes. Actually, many more failures than successes.
It takes more than $2.6 billion to develop a drug and bring it to the market to address a specific disease. That’s after the molecule has been tested in animals. Also, there is a 92% failure rate after the molecule has been tested in animals. When it goes into humans, it fails 92% of the time. So, the process is not only lengthy, but also risky.
Usually, the time it takes to discover and develop a molecule is around a decade. People who initiate the process are not always there when the molecule launches. The process is comprised of several steps.
The first one is hypothesis generation. You come up with a hypothesis, a theory of a certain disease and identify relevant targets. You theorize about what kind of proteins are implicated in a disease condition and what proteins are causal.
Afterward, you go and develop either an antibody or a small molecule for this protein target. If you are developing a small molecule, you usually start with screening large libraries of compounds that might hit this particular target and do all kinds of experiments to see how well those small molecules bind to this target.
Afterward, you select several hits. You identify what kind of molecules fit best for this protein target and start doing all kinds of experiments on those molecules to see if they work very well in the biological system, in the disease-relevant assay, in a mouse, in a dog, or other animals, and then you file for IND with the FDA to get the molecule into clinical trials.
After that process is complete, we are getting into drug development and starting clinical trials. It starts with phase I, which is safety; phase II, where you test for efficacy; and phase III, where you test for both in a larger clinical setting, in a larger population. Then you might want to go for a phase IV or start launching the product.
Michael Krigsman: Mm-hmm.
Drug Discovery and Post-Marketing Research
Alex Zhavoronkov: And then, post-marketing research. That process takes more than ten years, usually, and fails 92% of the time.
With AI, you can really play in pretty much every segment: early-stage drug discovery, where AI can assist you with a hypothesis model and, essentially, with pulling the needles out of the haystack via target ID, small molecule identification, virtual screening, and generation of novel molecules with specific properties; planning your clinical trial design and enrollment; and then, also, predicting the outcomes of clinical trials.
Michael Krigsman: Where does AI begin to shorten that process, make that process better?
Alex Zhavoronkov: If you look at the very early steps of the pipeline and start working on hypothesis generation and target identification, you usually have multiple paths to pursue. One path is to look at the literature and identify promising areas that had been uncovered by scientists in the past and were published in peer-reviewed literature. Ideally, these targets, those hypotheses, were not already linked to the disease you are looking at by somebody else.
AI can help you mine massive amounts of literature and also other associated data types to identify signals that a certain target might be implicated in a disease. We, at Insilico, usually start with grants data. We look at biomedical grants; we monitor about $1.7 trillion worth of grant money over the past 25 years. Then we look at how those grants progress into publications, into patents, into clinical trials, and then into products on the market.

We follow this path from idea to market, so from grant money to money on the market. We also look at how money becomes data. Usually, when the government is supporting a certain study, the data needs to be deposited in a public repository for other people to replicate it and also for the common good.
We try to follow the money into data. If the data is not there, we try to contact the scientist and get the data from the scientist and/or to encourage the scientist to put the data into the public repository.
We start with text databases, but we also link this data to omics data. Basically, everything that ends with "omics" is called omics data: transcriptomics, genomics, metabolomics, metagenomics, you name it.

We work primarily with gene expression data, so we look at how the level of expression of certain genes, or of entire networks, changes from, let's say, a healthy state to disease. We deconvolute those changes, those signatures of disease, into individual targets, build causality models, and identify what kind of proteins could be targeted with a small molecule.
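To make that idea concrete, here is a toy sketch of this kind of expression-signature analysis: compare healthy and disease samples gene by gene and rank candidate targets by effect size. Everything in it (the matrices, the five "planted" disease genes, the thresholds) is synthetic and purely illustrative, not Insilico's actual method.

```python
import numpy as np

rng = np.random.default_rng(0)

# toy log-expression matrices: rows = samples, columns = genes
n_genes = 100
healthy = rng.normal(5.0, 1.0, size=(20, n_genes))
disease = rng.normal(5.0, 1.0, size=(20, n_genes))
disease[:, :5] += 2.0  # genes 0-4 are genuinely up-regulated in disease

# signature of disease: per-gene change plus a simple effect size
log_fc = disease.mean(axis=0) - healthy.mean(axis=0)
pooled_sd = np.sqrt((disease.var(axis=0) + healthy.var(axis=0)) / 2)
effect = log_fc / pooled_sd  # Cohen's d

# rank candidate targets: strongest expression changes first
ranked = np.argsort(-np.abs(effect))
print(sorted(ranked[:5].tolist()))
```

The planted genes should dominate the top of the ranking; real pipelines add multiple-testing correction, pathway context, and causal modeling on top of this.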
Then we go back into the prior art in the text and see if anybody has published anything that strengthens our hypothesis. It doesn’t necessarily mean that our hypothesis is wrong if the signal is not there in text because sometimes the humans just couldn’t really associate a certain target with a disease using older methods, but it gives us a little bit more confidence to see that somebody already touched on this challenge and on this target before.
Michael Krigsman: Alex, is the key then at this point that the various AI techniques that you’re using enable you to discern patterns in the data that those signals, as you said, that otherwise you could not pick out? Is that the key issue here?
Alex Zhavoronkov: Yes, but, really, we are aggregating enormous amounts of data that are just not possible to process using human intelligence. We are also aggregating and grooming those data types together. Sometimes, those data types are completely incompatible and it's impossible to just suture them together using standard tools. You really need to train deep neural networks on several data types at the same time in order for them to generalize and in order for us to be able to extract relevant features that are present in several data types at the same time.
Some of the data types that we work with are completely incomprehensible to the human mind, to human intelligence. Like, for example, gene expression or movement or cardiovascular activity scanning or ultrasound, for example. We manage to bring those data types together using AI and then identify relevant targets that basically trigger a certain condition.
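As a hedged illustration of the fusion problem described here, the sketch below puts two very different data types (a wide gene-expression matrix and a few clinical measurements) on a common scale before extracting a shared low-dimensional representation. A linear SVD stands in for the joint deep networks mentioned in the conversation; all data and dimensions are made up.

```python
import numpy as np

rng = np.random.default_rng(1)

# toy stand-ins for two "incompatible" modalities measured on the same samples:
# gene expression (wide, log-scale) and a handful of clinical measurements
n_samples = 50
expression = rng.lognormal(mean=2.0, sigma=1.0, size=(n_samples, 200))
clinical = rng.normal(loc=[120, 80, 36.6], scale=[15, 10, 0.4], size=(n_samples, 3))

def standardize(x):
    # put every feature on a comparable scale so neither modality dominates
    return (x - x.mean(axis=0)) / x.std(axis=0)

fused = np.hstack([standardize(np.log1p(expression)), standardize(clinical)])

# shared low-dimensional representation via thin SVD -- a linear stand-in
# for a jointly trained multimodal deep network
u, s, vt = np.linalg.svd(fused, full_matrices=False)
embedding = u[:, :10] * s[:10]
print(embedding.shape)  # (50, 10)
```

Without the per-modality normalization, the 200 expression columns would swamp the 3 clinical columns, which is one simple version of the "incompatible data types" problem.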
Core Competency: Biology vs. AI
Michael Krigsman: At Insilico, is your core competence in biology and medicine or in developing the AI techniques? Is it possible to even split those two?
Alex Zhavoronkov: In our case, we are good at both, and we hire competitively, internationally. We actually hire through competitions where we put very challenging tests out for people to try to solve very, very quickly. Those challenges are usually a combination of developing an AI method and solving a complex biological or chemical problem.
However, when you’re looking at really great AI scientists, they are usually not great in biology or great in chemistry. They are good at math. That is why some percentage of our company are just great mathematicians who are developing novel methods for bridging chemistry and biology using deep learning, for example.
Part of the company is specifically focused on applications of already existing techniques like GANs and reinforcement learning to existing problems in chemistry and biology. Those people are usually on the applied side and they know both chemistry and biology. They can talk to the mathematicians and they can do some basic research in AI as well.
Of course, we just have pure play biologists and chemists who are also necessary in order to validate some of the results of our AI. That’s why we have such a large, diverse, and international team because you really need to have those three areas covered: the methods, the applications, and the validation.
Michael Krigsman: We have an interesting question from Chris Peterson on Twitter who says this; he says, “Grid-based parallel Fortran programs are still being used for some pharmacokinetic and pharmacodynamic studies. Do you see AI replacing the old school code, enhancing it, or advancing in parallel?”
Alex Zhavoronkov: I think, currently, we need to advance in parallel. Of course, some of the old techniques and some of the very primitive molecular dynamics methods are still being used by really top experts in drug discovery today. But most of those methods are being significantly accelerated by high-performance computing and AI. Take typical software that's been around for a very long time, like Schrödinger, for example; the company has been around since '92.

That company has made major breakthroughs in multiple areas and managed to advance older algorithms to solve very complex problems. At Insilico, we try to reinvent everything from scratch and we write our own software. But, of course, many of our collaborators would just like to take small pieces of the big salami that we're developing and play around with them today. They might be using some more classical tools that we cannot get around today.
Ideally, you need to have a seamless pipeline, which identifies the targets, generates the molecules, and runs those molecules through a large number of simulations in one seamless pipeline. That’s what we are building and that’s our holy grail. But, of course, many companies, many groups are trying to do the Lego game and try to use multiple tools with varying outputs to solve the same problem.
Developing AI Tools In-House
Michael Krigsman: Why do you develop your own tools?
Alex Zhavoronkov: Yes, just because many of the methods that we are using are so new that they are incompatible with the older tools. There are many groups that claim to do AI but, essentially, what they are doing are mechanics' jobs: taking off-the-shelf software and trying to bridge some gaps in pharma R&D with those tools. We don't do that. We develop everything from scratch, from target ID to small molecule generation.
Michael Krigsman: Now, we have spoken about using your techniques to uncover potential candidates. The next step is evaluating. First, we have to uncover possibilities, and you do that by aggregating all of this data and then mining that data using the various techniques. Now you’ve done that. How do you evaluate the candidates that you’ve uncovered initially?
Alex Zhavoronkov: Usually, when you are left with a list of protein targets for a specific disease and you are trying to prioritize, you try to annotate those proteins with as many scores as possible. You are looking at whether this protein target has ever been implicated in toxicity. How is it connected with everything else? Which tissue does it play in more? How does it interact with other proteins? Is it druggable? Is it druggable with a small molecule or with an antibody? Did anybody else touch it? What is the patent space around the molecule? Has anybody tried taking it into the clinic with a small molecule or an antibody for a specific disease?
There are many, many, many, many scoring functions that you need to consider. At the end, when you basically are left with a very small set of targets, then you also test them in a variety of biological systems to see which one is more relevant for your disease of interest.
I'll give you an example case study. We are very interested in fibrosis. Fibrosis is not a very simple process to describe, and there are multiple types of fibrosis. There is IPF, so idiopathic pulmonary fibrosis. There is smoking-induced fibrosis in the lung. There is aging-induced fibrosis in the lung. We've identified more than 120 types of fibrosis by comparing normal tissue to tissue afflicted by a certain condition that is associated with fibrosis.

We just recently did a case study where we looked at IPF, identified the list of targets for this condition, and our list was 50 targets. We looked at when those targets are most active and most disease-relevant, at what stage of the disease, because if you catch it later, when there are so many symptoms, you are going to be treating the symptoms, not the cause.

In our case, we've identified a large list of targets that are likely to be very relevant early in disease progression. Then we looked at which targets are novel, so we looked for novelty, for targets people did not focus on as much. We don't want to focus on old targets. Then we looked at which targets are druggable, where we could actually come up with a small molecule from within a library or generate a molecule from scratch. Then we looked at which targets could be validated in a specific set of assays for fibrosis.
Michael Krigsman: Where is the impact of the AI techniques that you’re using in this?
Alex Zhavoronkov: Usually, it's for scoring. You identify multiple scores for those targets. In our case, each target is annotated with more than 50 scores: whether it has been implicated in a certain condition before, whether it interacts with other proteins in a specific way, whether it is likely to lead to toxicity. Those predictors that give you a score, a probability that this target is the most relevant one, are deep learning models. We developed them using machine learning.
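A minimal sketch of that multi-score prioritization idea, with made-up targets, scores, and weights. TGFB1 is a real fibrosis-associated gene name used only as an illustration; the other names and all numbers are hypothetical, and in practice each score would come from a trained model rather than a hand-set constant.

```python
# toy composite scoring of candidate protein targets; every value is invented
targets = {
    "TGFB1":  {"disease_assoc": 0.9, "novelty": 0.1, "druggable": 0.8, "tox_risk": 0.6},
    "PROT_X": {"disease_assoc": 0.7, "novelty": 0.9, "druggable": 0.7, "tox_risk": 0.2},
    "PROT_Y": {"disease_assoc": 0.4, "novelty": 0.8, "druggable": 0.3, "tox_risk": 0.1},
}

# weights encode priorities: reward novelty and druggability, penalize toxicity
WEIGHTS = {"disease_assoc": 0.4, "novelty": 0.3, "druggable": 0.3, "tox_risk": -0.5}

def composite(scores):
    # weighted sum over all annotations for one target
    return sum(WEIGHTS[k] * v for k, v in scores.items())

ranked = sorted(targets, key=lambda t: composite(targets[t]), reverse=True)
print(ranked)  # ['PROT_X', 'PROT_Y', 'TGFB1']
```

Note how the well-known, toxicity-burdened target drops below the novel, cleaner one, which mirrors the novelty-seeking strategy described in the fibrosis case study.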
Academia vs. Industry
Michael Krigsman: We have another interesting question from Twitter. This is from Shreya Amin. She says, “How does this type of research that you’ve been describing using AI and the process compare between academia and industry?”
Alex Zhavoronkov: Sure. It’s a very, very good question. In the industry, in big pharma, people are a little bit less adventurous. They are trying to develop the various techniques to really solve a problem and make incremental changes. It’s not for publication purposes.
In academia, people are much more innovative and adventurous. Of course, they try to publish. That’s where the innovation comes from primarily.
We, at Insilico, sit in between academia and industry, so we publish at a rate of about two research papers a month. That is a lot even for some academic groups, and we do it also to prove the concept and explain where we're going.

Academics, I think, are much more productive nowadays when it comes to developing new methods and showing new directions. However, there is a disconnect: really good computer scientists who are developing novel techniques that might be relevant for drug discovery are very often so far away from biology and chemistry that they put papers out that are strong from the machine learning perspective but really, really poor for real-world applications. Very often, they realize that they overfitted somewhere, or that the output (or input) they are getting is completely irrelevant, only after somebody tries it in biology and chemistry.

Very often, and nowadays it's actually more prevalent, a lot of people put papers on arXiv, a preprint repository, with a catchy title so it goes viral and gets picked up by search engines, by Google, or by some news outlets. They get recognition and PR for this work, but then you try to replicate what they did, or even just read the paper carefully, and you realize that it's not going to work in the real world. I think those kinds of early efforts by academic groups specifically, put out without going through peer review, also create a lot of skepticism in big pharma. People just don't think that many techniques are relevant, applicable, or transformative for their business.
Building a Team for AI and Biotech
Michael Krigsman: Let’s talk about the team construction aspect because one of the things that you’ve mentioned a couple of times is the importance of both the machine learning capabilities as well as the biology capabilities. These are very specialized skills, and so how do you construct teams that enable both sides to work together and create something that one or the other could not do alone?
Alex Zhavoronkov: That's another very good question. In our case, that's one of the reasons why we are growing so slowly. We've been in business for 5 years now, but we are still 66 people. One of the reasons for this slow, organic growth is that it takes time to really integrate AI scientists with biologists and chemists. It's very difficult to find people who are good at both at the same time. Usually, you are good at math, or you are good at chemistry, or you have the good programming skills needed to build an API and properly combine your technology with somebody else's.

We try to work in teams of three or four on specific therapeutic projects, where one person is very good at chemistry or biology, one person is good at AI, and another person is good at just basic IT. On top of them, there is an organizational infrastructure that helps manage those teams. We also separated the pure-play AI team from everybody else, so they could work on the methods without being pulled into the applied domain.

Getting talent willing to really contribute to methods development and develop novel algorithms is very, very difficult. Getting people who are good at applying already developed methods is rather easy. Getting the two to work together is very hard. To do this, we, again, pursue organic growth and work on projects in small teams.
Insilico Business Model
Michael Krigsman: In fact, we have a question from Twitter on this subject of your business model. Chris Peterson is asking great questions. Thanks so much, Chris. He’s asking, “Are you contracted to look for specific therapies or are you developing molecules from scratch and hoping to license them for clinical trials through distribution?”
Alex Zhavoronkov: We’ve been in business for five years and we have explored multiple business models. As an AI company, you have to explore because otherwise it’s very, very difficult to scale on one business model and it’s also quite risky.
We started as a service company, and we started partnering with pharmaceutical companies, with biotechnology companies and, also, venture funds where we provided a service or provided a system to them. We learned the applications that people are looking for and started developing our own small molecules, discovering our own small molecules and then licensing them.
Our current business model is actually very simple, and it actually allows us to scale. We work with venture capital firms that really know the business of biotechnology and are pursuing drug development and drug discovery. They guide us on where we need to identify targets and generate small molecules. Then they form teams around those small molecules and targets and let those teams do a little bit more validation and development of those target-molecule associations.

What we get is a small upfront payment initially, then milestone payments as the molecules progress through the various steps of validation, and then some royalties. Usually, if you consider the biobucks, the future revenues that might come from the molecule, those deals are very, very substantial, but the initial payment is rather small.

That is why we have another business, a software licensing business, where we license some of our software tools to others to generate some revenue, ensure that we are sustainable and consistent, and also get feedback on how well the software works and whether we need to add more features.
Michael Krigsman: Okay.
Alex Zhavoronkov: Another business model is that we do have some joint ventures. For example, there is a joint venture with a company called Juvenescence. They are developing the molecules that we provide to them.
Michael Krigsman: Okay, so you have a diverse range of things that you’re working on and trying that support your business model efforts, essentially.
Alex Zhavoronkov: Correct. But what we are mostly interested in is not the immediate revenue. In most of those licensing arrangements and engagements, we get some data back. We pretty much became one of the largest data factories in the world, getting data back from preclinical experiments.
Michael Krigsman: That's interesting. We have another question from Twitter. This is from @TrovatoChristian. He is a biomedical engineer and a Ph.D. student in computational biology in the Department of Computer Science at Oxford. By the way, I find it very interesting that computational biology falls under the Department of Computer Science rather than the Department of Biology. His question is, "Are there any examples of drugs developed by AI only?"
Alex Zhavoronkov: At this point in time, there is no such example. You always have a human in between. I hope that in the very near future, we'll be able to show that a pipeline where no human was involved from target identification to small molecule generation can churn out some of those promising molecules. But at this point in time, the experiment is king. Unless you can validate your techniques experimentally, it won't really go forward. I have never seen an example of a molecule, even in mice at this point in time, that was completely generated using AI.
Michael Krigsman: What’s the obstacle preventing using AI to go from beginning to end?
Alex Zhavoronkov: Well, because of the failure rates in pharma, in general. There are very, very few success stories to train on. Those success stories are very, very diverse. In some areas, it’s easy to validate whether your algorithm is producing some meaningful output. But, in many cases, you really need to go and validate at every step of the way. That is why, when you are building this salami that is allowing you to go end-to-end, you need to ensure that you validate every slice of the salami and validate it internally, but also validate it with external partners. That’s what we are trying to do as well.
Michael Krigsman: Eventually, that data may be there, but it sounds like it’s just far too early at this stage.
Alex Zhavoronkov: At this stage, nobody has tried to virtualize drug discovery completely using AI and do it seamlessly, without human intervention. In many areas, it's actually not possible just because biology is so diverse and medicine is so diverse that it's very, very difficult to have a solution that would fit all. That's why people are going primarily after cancer, just because it's a little bit easier to validate, and after specific types of cancer, like, for example, solid tumors, where you can do a xenograft and see if the tumor shrinks in a mouse if you give it a specific molecule. There needs to be validation at every step of the way and, at this point in time, those end-to-end pipelines will work only in certain therapeutic modalities.
Michael Krigsman: Let me ask you another question from Twitter. This is from Shreya Amin again, a great question, an interesting one. She says this; she says, “Using existing AI techniques, which areas from the perspective of types of drugs, diseases, conditions, and so forth are closest to breakthroughs or have made the most progress and what’s most difficult?”
Alex Zhavoronkov: I'll give you an example that I am very, very familiar with. We've got some JAK inhibitors, so Janus kinase inhibitors, that were developed completely using generative adversarial networks and reinforcement learning. I think those are the most promising techniques for de novo molecular design, period.

We're currently in mice with those, so we went all the way from enzymatic assays to mice, and showed that we can achieve selectivity and specificity with those molecules, and those molecules have many other desirable properties. The GAN and the reinforcement learning technique that we used are pretty common nowadays. It's not something super new, so we actually switched our R&D in a slightly different direction.
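The reinforcement learning side of this can be illustrated with a toy REINFORCE loop: a per-position categorical policy learns to emit sequences that maximize a reward. The three-letter "alphabet" and the reward (fraction of 'C' tokens) are stand-ins for real SMILES tokens and a property oracle; this sketches the shape of the technique, not Insilico's pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)
ALPHABET = list("CNO")  # toy token vocabulary; real systems use SMILES tokens
LENGTH = 10

def reward(seq):
    # hypothetical stand-in for a molecular property oracle
    return seq.count("C") / len(seq)

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# policy: one categorical distribution (row of logits) per sequence position
logits = np.zeros((LENGTH, len(ALPHABET)))
baseline = 0.0  # running mean reward, reduces gradient variance

for _ in range(2000):
    probs = softmax(logits)
    idx = np.array([rng.choice(len(ALPHABET), p=p) for p in probs])
    seq = "".join(ALPHABET[i] for i in idx)
    r = reward(seq)
    # REINFORCE: push logits toward sampled tokens, scaled by the advantage
    grad = -probs
    grad[np.arange(LENGTH), idx] += 1.0
    logits += 0.3 * (r - baseline) * grad
    baseline += 0.05 * (r - baseline)

best = "".join(ALPHABET[i] for i in softmax(logits).argmax(axis=1))
print(best, reward(best))
```

After training, the policy should heavily favor 'C'-rich sequences; real de novo design replaces the reward with predicted binding, selectivity, and ADMET scores, and the per-position policy with a generative model over valid molecules.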
Michael Krigsman: Where is all of this going over the next–I don’t know–three, four years, two to four years? Let’s not go out ten years. Over the next few years, where is this going to be?
Alex Zhavoronkov: I think that companies like ours are going to put much more emphasis on their internal R&D instead of collaborating with big pharma, because collaborating with big pharma is usually a path to nowhere: it's either death by pilot, or they just ingest the expertise internally and catch up. At the same time, they are so bureaucratic that it's very difficult to change and, at the CEO level, big pharma companies are more focused on increasing sales, buying other companies to increase sales, or getting late-stage clinical assets, so phase two, phase three assets. Internal R&D is actually not viewed as a huge priority and, regardless of what they say, that's a fact. Usually, it's the 15% to 20% on the income statement that needs to be there because, otherwise, investors are not going to invest in the company. But the productivity of this internal R&D is usually very low.

I think that smaller biotechnology companies that embrace AI and embrace virtualization of drug discovery are going to be very successful. There are several cases that I admire in the industry, like, for example, Nimbus Therapeutics. That company has managed to virtualize the entire drug discovery and development process, get some phase two assets, and license them.

As AI improves and starts solving more problems across the pharmaceutical R&D pipeline, from hypothesis generation, target ID, and small molecule generation to prediction of the various properties of molecules in clinical trials and better stratification techniques, I think that people who really understand the process and can virtualize it will be the winners. So far, I know several companies that are doing this; some are working with us, and some are in stealth mode. I think they are going to be the winners going forward.
When you talk about drug discovery in two to three years, it’s actually a very, very short time. In many other areas of human development, if you ask me to plan five years ahead, I won’t be able to because things are changing very quickly. In pharma, that’s not the case. We really need to do the experiments and get things right.
Research on Longevity and Smoking
Michael Krigsman: Do you want to just very briefly tell us about the last research you did on either longevity or smoking? I know we’re out of time, but just very briefly.
Alex Zhavoronkov: [Laughter] Sure. We just published a very fun paper showing that smoking accelerates aging. One of the areas that we are focusing on is age prediction using multiple data types, so from pictures, blood tests, transcriptomic data, proteomic data, microbiomic data. We use this data to predict the person’s age reasonably accurately and we then look at what kind of interventions or behavioral modifications, what kind of lifestyles contribute to that person looking younger or older.
We did this exercise in Canada. We worked with the University of Lethbridge and the government of Alberta to process a large data set of smokers and nonsmokers of varying ages, looking only at anonymized blood tests, just very, very few parameters from a recent blood test. First of all, we built a predictor of smoking status, so now I can, with reasonable confidence, say whether you're smoking or not by looking at a blood test. We also showed that people who smoke look older to a deep neural net trained on blood tests than nonsmokers do.
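To show the shape of such a predictor, here is a minimal logistic regression on synthetic "blood test" features, where a couple of parameters are shifted in smokers. The feature shifts and comments are invented for illustration; the actual study used a deep neural network on real anonymized blood tests.

```python
import numpy as np

rng = np.random.default_rng(42)

# synthetic "blood test" panel: a few parameters, some shifted in smokers
n = 400
smoker = rng.integers(0, 2, size=n)
features = rng.normal(size=(n, 5))
features[:, 0] += 1.5 * smoker  # hypothetical marker elevated in smokers
features[:, 1] -= 1.0 * smoker  # hypothetical marker reduced in smokers

# plain logistic regression trained by gradient descent
X = np.hstack([features, np.ones((n, 1))])  # add intercept column
w = np.zeros(X.shape[1])
for _ in range(2000):
    p = 1 / (1 + np.exp(-X @ w))
    w -= 0.1 * X.T @ (p - smoker) / n  # gradient of the log-loss

pred = (1 / (1 + np.exp(-X @ w)) > 0.5).astype(int)
accuracy = (pred == smoker).mean()
print(round(accuracy, 2))
```

Even this linear toy separates the classes well above chance once a few markers shift with smoking status, which is the core intuition behind predicting smoking (or accelerated aging) from routine blood parameters.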
Once we published, it actually went rather viral and we got very positive feedback. For example, my daughter is considering quitting smoking just because she doesn’t want to look old. People don’t really care about their health, but they really care about how they look. If you don’t want to look old, just quit smoking.
Michael Krigsman: [Laughter] Okay. Great advice. Alex, thank you so much for taking time. Everybody, please subscribe on YouTube. Check out CXOTalk.com for lots of videos and subscribe to our newsletter. Have a great day, everybody. Take care. Bye-bye.
