Customers were flocking to Etsy for different reasons once the pandemic started. | Photo: Jackie Molloy/Bloomberg via Getty Images
CTO Mike Fisher on scaling up, fixing search and how Etsy's prepping for the holiday season.
When the CDC started recommending that people wear face coverings, the masses took to Etsy to search for face masks. There was just one problem.
"If you searched on Etsy or any other public ecommerce shop, it would come back with face-cleansing masks," Etsy CTO Mike Fisher told Protocol, "because that's what people used that term for." But that isn't what people wanted this spring. So Etsy had to retrain its algorithms. "We had humans identify what were the real masks they're looking for, and we fed that into the algorithms," Fisher said. "They were then able to identify what people really meant."
Fixing search was just one of many issues Etsy had to deal with as a flood of buyers and sellers started using the platform. And the stress is far from over: Fisher now has to prepare for the holiday season. "That is very different this year. Normally it's very predictable," he said. "This year, it's all up in the air."
In a conversation with Protocol, Fisher described how Etsy's preparing for that, the benefits of being on the cloud and how COVID changed the company's roadmap.
This interview has been edited for clarity and length.
At the start of the pandemic, all ecommerce sites saw a huge spike in demand. Did Etsy experience technical challenges as a result?
When I first arrived [at Etsy in 2017], we were still running in data centers. What brought me back into the company was a question [from] our CEO Josh Silverman: "Are we spending too much on infrastructure and not enough on product development?" When I looked at that, I said "Yes. By my analysis, we're spending too much because we're maintaining our own servers and our own observability stacks, and everything."
The reason was, when Etsy did some of this stuff in the early days, there [weren't] cloud services provided. There wasn't the ability to outsource a bunch of your stuff. Fast forward 12, 13 years, there were companies providing these services. And so I said, one of the things we should do is get out of the data centers and into the cloud. I call it moving up the stack: We can outsource the infrastructure and the services, and our engineers can move closer to the customers. That started us on the journey. We did a full RFP, we looked at all the cloud providers, and we ultimately chose Google as our provider.
We finished the migration in February of this year. You couldn't have asked for any better timing. Thirty days later, we're in this pandemic. Traffic spikes massive amounts, more than we see at holiday seasons. And we're on the cloud, and we can autoscale. We can scale up, we have the infrastructure.
Honestly, had we not been on the cloud, we couldn't have done that. We couldn't have gotten the hardware. Normally, pre-cloud days, we would start in July ordering hardware, and we would order millions of dollars of hardware, because it would take that long to get it ordered, into the data centers, racked up into the racks and set up all for the holiday weeks, the Cyber Week. This happened overnight, and we would have never been able to do that had we not been on the cloud. We sold more than 29 million masks in Q2. There were 11 searches for face masks a second, at one point in time. And we can handle that with our search and our infrastructure, because we're on the cloud.
Was there anything that did come up during the pandemic, from a technical standpoint, that you had to do or think about?
When face masks initially came out, if you searched on Etsy or any other public ecommerce shop, it would come back with face-cleansing masks and things like that, because that's what people used that term for. Fortunately, we have been invested for years on improving what we call "semantic search," and understanding what people mean when they type.
One example pre-COVID: We would look at things like "dress" and "gown." And if you put in "dress" you obviously wanted listings that had "gown," or vice versa. So, we have been working for years really on using state of the art, what's called graph technology and machine learning AI, to be able to power that and be able to close that semantic gap.
So when face masks first appeared, you couldn't find anything except for cleansing masks. But our algorithms could be retrained very, very quickly. So when this came about, we let the algorithms get retrained — it happens in hours or days. Then all of a sudden, what they were really looking for was showing up.
To help get that data, we had humans identify what were the real masks they're looking for, and we fed that into the algorithms, and they were then able to identify what people really meant by these masks. We were scrambling at the beginning to get that going, but because we had all this amazing infrastructure, we were able to do it.
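For readers curious how human judgments can steer search results, here is a toy sketch. It is not Etsy's system, which Fisher describes as built on graph technology and machine learning. It assumes reviewers tag listings returned for a query such as "face mask" as relevant or not, and that each listing carries a category. The function names, the category labels and the default weight are all hypothetical.

```python
from collections import defaultdict


def learn_category_weights(labels):
    """Turn human judgments into a per-category relevance rate.

    labels: iterable of (category, is_relevant) pairs from reviewers.
    Returns {category: fraction of reviewed listings judged relevant}.
    """
    seen = defaultdict(int)
    relevant = defaultdict(int)
    for category, is_relevant in labels:
        seen[category] += 1
        if is_relevant:
            relevant[category] += 1
    return {category: relevant[category] / seen[category] for category in seen}


def rerank(listings, weights, default=0.5):
    """Sort listings so categories that reviewers judged relevant come first.

    Categories with no human labels get the (assumed) default weight.
    """
    return sorted(
        listings,
        key=lambda listing: weights.get(listing["category"], default),
        reverse=True,
    )


# Hypothetical reviewer labels for the query "face mask".
labels = [
    ("skincare", False),
    ("skincare", False),
    ("face covering", True),
    ("face covering", True),
    ("face covering", False),
]
weights = learn_category_weights(labels)

listings = [
    {"title": "clay cleansing mask", "category": "skincare"},
    {"title": "cotton face mask", "category": "face covering"},
]
ranked = rerank(listings, weights)
```

With these labels, the cotton face mask moves ahead of the cleansing mask. A production system would learn from far richer signals than a single category field.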
Did you have to think about safety features, to prevent people buying things that claimed to be medically certified?
Face masks and sanitizers that are sold on Etsy are not medical grade, so our listings are not allowed to include any of the medical or health claims. We actively communicated and educated our sellers to make sure their listings comply with our policies. We talked about some of the technical stuff that we did, but there's a big part of this that was our member services and teams like that, that helped sellers.
Another big issue was we put out a call to sellers, we sent out emails … that said there's a big demand, we've got to get on this. We wanted to do this because at the time, the PPE shortage for the surgical masks and the N95s was so great that we wanted sellers to be able to do those for normal people walking around every day. We wanted [the medical masks] for the health care workers. We put out a call for them, and got them organized, or motivated, to do it.
But then the problem was we could overwhelm a seller with sales. As I mentioned, 11 searches a second. And a lot of these are individual sellers, so they're sewing at their dining room table. And so we had to figure out, how do we help them scale their businesses, how do we make sure we don't overwhelm them with sales.
This was a combination. On the technical side, we can help rotate them through search results, so that no one seller got overwhelmed in 10 minutes. On the member services side, we were reaching out to sellers and helping them figure out how could they scale their business, how could they predict how many masks they could really make, so that they don't overpromise and underdeliver. How could they rotate their shop in and out of service, when they've got enough orders to handle for a day or two. And also educating them about what you can say and what you can't say about the medical or health claims. It was this combination of technical and humans, connecting all the pieces together to make it work.
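Fisher does not say how the rotation works. One simple way to picture it is a round-robin queue with a cap on open orders, sketched below. The class name, the cap and the seller IDs are hypothetical, and a real ranking system would weigh many more factors.

```python
from collections import deque


class SellerRotation:
    """Round-robin exposure with a per-seller cap on open orders (illustrative only)."""

    def __init__(self, seller_ids, max_open_orders):
        self.queue = deque(seller_ids)
        self.max_open_orders = max_open_orders
        self.open_orders = {seller: 0 for seller in seller_ids}

    def next_page(self, page_size):
        """Return up to page_size sellers that still have capacity.

        Every seller inspected is moved to the back of the queue, so the
        next call starts with sellers who have waited longest.
        """
        page = []
        for _ in range(len(self.queue)):
            if len(page) == page_size:
                break
            seller = self.queue.popleft()
            self.queue.append(seller)
            if self.open_orders[seller] < self.max_open_orders:
                page.append(seller)
        return page

    def record_order(self, seller):
        """A buyer placed an order with this seller."""
        self.open_orders[seller] += 1

    def record_shipped(self, seller):
        """The seller fulfilled an order, freeing capacity."""
        self.open_orders[seller] = max(0, self.open_orders[seller] - 1)


rotation = SellerRotation(["a", "b", "c"], max_open_orders=1)
first = rotation.next_page(2)   # sellers "a" and "b"
second = rotation.next_page(2)  # rotation continues with "c", then "a"
rotation.record_order("b")      # "b" is now at capacity
third = rotation.next_page(2)   # "b" is skipped until an order ships
```

The cap plays the role of a seller temporarily taking a shop out of service: once enough orders are open, the seller stops appearing until some have shipped.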
Were there any features that either new or existing sellers said that they wanted?
Josh made the point internally that this was a time to relook at our product roadmap. We did so, and I think his sentiment was, "I would be surprised if most of what we were working on wasn't still the right thing, but I'd also have to be surprised if there wasn't something that we wanted to change based on this." I think that that sentiment is correct. Part of the reason our marketplace did so well and helped so many people during this was we have been working for years, building the right features.
I mentioned search is one: We have been working for years on making search better and better and better. By doing that, by making sure that the checkout flow and the listings and all of these things were better and better over the years, we were able to handle this. When people came, they loved the experience. So search and discovery was one that we have totally been working on for years and continued.
Some of the newer things: We realized that people want more lists. They want to be able to create more lists and favorites. So that's something that we've been working on more recently. Another cool feature that has come out recently, in part by new sellers coming in with ideas and requests, was listing videos. Before you could add your images for your listings, but now you can take videos of the products that you make. This could be someone wearing a shirt, it could be a video spinning around them. The other one … [that] I think is pretty cool: We have the ability to do augmented reality. So you can take that picture and hold it on your mobile device, and hold up to the wall and see what the picture would look like, based on the size and stuff.
Some of that was on the roadmap already, some of it was accelerated to help with the new demand for these products.
What are your priorities for the next six to 12 months?
Right now it's holiday season. The products that are in the pipe, we're finishing off to make sure we get done in time for the holiday season. And then of course prepping, given that our demand was already doubled.
Normally it's very predictable — we can actually take the traffic in July and August and say we know almost exactly what it's going to be during the holiday season, because we've done this for many years. This year, it's all up in the air. This could range by many, many percentage [points] between what it could be or what's the upper bound of it. And so we've been in lots of planning sessions with Google about this. That's the big thing on my mind right now: How do we continue to scale, so that all the sellers and the buyers have a wonderful holiday.
How are you thinking about that? Do you just plan for every scenario or try to improve your models?
Part of it is just modeling, so we up the stats. We say, like, what if stores cannot have Black Friday sales, because they don't want people in, or they only allow a limited number of people in — what does that push over to ecommerce? I think that's a very real possibility in this environment, so that's one thing that we're thinking about that pushes us to an upper-bound percentage, of hundreds of percentage [points] of increase. So that's one way.
The other thing that we're thinking about is what can go wrong. We're famous for postmortems: After an incident or problem happens, we're very good at figuring out what was the root cause and how do we prevent that again. We actually use a technique called a premortem, [too]. We do brainstorming sessions and say, what could fail? What are the possible failure scenarios, and how do we mitigate that today, now that we've still got weeks and months before this.
We've been going through sessions like that, where we talk with our teams about what if this particular part of our infrastructure broke, or had a problem — what can we do to keep the site running? The premortem is an interesting idea that we've been working with for the last couple years, [and] it's been heavily used this season as we enter into the holidays.
What are some of the things that could go wrong?
Our system is massive. It's so large to support, not only the volume of dollars and buyers, but it's global. So things like translations: When you put a listing in, we want people in other countries [to] see that in other languages. Translations are something that we care about, and we've got to make sure that those are processing quickly enough, and we've got the infrastructure and the services to do that.
Our payments infrastructure, we've been really bolstering over the last couple of years. We made a pretty big change this year. We have what engineers call a state machine that helps process payments, and we knew when we went through last year's holiday season that this was something that's got to be on our roadmap to fix over the next couple of years, because we could tell by the demand on the payments processing through that. When we saw the spike for masks in March and April, we accelerated that work, and we did it this spring: We made improvements to that state machine so that we can process.
Those are the things that we're watching for: [For] all of the infrastructure that we built, what is the next scale point. That is what we think about scale, that's kind of the step function stuff, that looks at all of these listings, that translation processing, or the reindexing — we get listings added thousands and thousands a day, they have to be reindexed — all of that takes time, but how do we ensure that it doesn't fall over when it hits a certain level?
The system is so complex because we do everything: We have our own advertising network, we have our own payment processing, we have our own search. On the seller side, we have to have tools for them to not only see what their sales were, but to message — we have an entire infrastructure around messaging. And then our shipping infrastructure, to allow for fulfillment. So there's all these little mini pieces that we look at.
Another exercise that we do is called a game day, in which we will put artificial demand on parts of our system to see how far it can take before it falls over. And [we're] doing that type of work on all these pieces, just to make sure that come Cyber Week, everything is ready for that.
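The game day idea, raising artificial load until something gives, can be sketched in a few lines. This is an illustration only, not Etsy's tooling: the `fake_service` stand-in, its capacity of 500 requests per second and the 1% error threshold are all assumptions made up for the example.

```python
def find_breaking_point(service, start_rps, step_rps, max_rps, max_error_rate=0.01):
    """Raise load in steps until the error rate crosses the threshold.

    service: callable taking requests-per-second and returning an error rate (0..1).
    Returns (last_healthy_rps, first_failing_rps); either may be None.
    """
    last_healthy = None
    rps = start_rps
    while rps <= max_rps:
        error_rate = service(rps)
        if error_rate > max_error_rate:
            return last_healthy, rps
        last_healthy = rps
        rps += step_rps
    return last_healthy, None


def fake_service(rps, capacity=500):
    """Hypothetical stand-in for a real system: requests beyond capacity fail."""
    if rps <= capacity:
        return 0.0
    return (rps - capacity) / rps


result = find_breaking_point(fake_service, start_rps=100, step_rps=100, max_rps=1000)
# result is (500, 600): healthy through 500 rps, failing at 600 rps
```

In a real exercise the callable would drive synthetic traffic at one component, such as translation or reindexing, and read the error rate from monitoring rather than a formula.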