The challenges of running a leading eCommerce site

Michał Grela

Relationship Manager at Future Processing

Craig Bruce

Director Of Technology at Focus Group (UK)

In a competitive market, customers expect access 24×7 to the eCommerce site from any device, and if they can’t buy what they want when they want, they will simply go elsewhere.  

John Lewis is one of the UK’s leading eCommerce platforms, with a demand profile that is massively influenced by promotional activity such as Black Friday. To illustrate: February is a low point in retail, as customers’ finances are recovering from Christmas. Customer orders might be as low as 200 per hour then, yet peak at 20,000 per hour during Black Friday, 100 times higher.

Together with Craig, we discussed the challenges he faced operating such a site and the approach he took to tackle them.

Michał Grela (MG): Hello, and welcome to yet another episode of IT Insights by Future Processing. The topic of this conversation is the challenges of running a leading e-commerce site. The background of our conversation is that John Lewis is one of the UK’s leading e-commerce platforms, with a demand profile that is massively influenced by promotional activities such as Black Friday. In a competitive market, customers expect access 24 hours a day, seven days a week, from any device, and if they can’t buy what they want when they want, they will simply go elsewhere. So we’re going to take a look at that challenge from a tech perspective, as my guest today is Craig Bruce, who is the man behind building this e-commerce site.

Hi, Craig. Great to have you. Thanks for joining me. Can you introduce yourself?

Craig Bruce (CB): Hi there. Yes, I’m Craig Bruce. I should be clear that I was responsible for John Lewis. I’ve moved on from that challenge. But I absolutely I’ve been through, I think, three different Black Friday events with John Lewis, and very much I’ve seen the challenge of the e-commerce platform at scale.

MG: Wonderful. So I’m really looking forward to understanding these challenges. Let’s start with the first question right away. So actually to what extent is the demand truly influenced by promotions?

CB: Massively is the short answer, but the longer answer is that if you look at a typical customer profile, in February everyone’s had a good Christmas. They’ve been out to the sales. They’ve spent quite a lot of money and are not really coming into stores, or going online, and buying a lot. So in the middle of February, as an e-commerce platform, you might get, say, 200 orders per hour coming through. So not an insignificant amount, but equally that’s the quietest hour you might get, and that will vary throughout the day.

But when you get to an event like Black Friday, particularly when the offers have just gone live and everyone’s striving to get stock before it runs out, you could see 20,000 orders per hour. So a hundred times higher than you might get at that low hour. So low hour 200, high hour 20,000 per hour. So setting up a platform that’s going to deal with both extremes can be quite a challenge.

MG: Wow. That’s really a difference, I guess. In the intro we also briefly touched on customer loyalty. Before diving into the tech aspect of building such a platform, how true is the statement that customers are loyal to John Lewis and won’t go elsewhere?

CB: Well, I think to some extent it may vary with the specific product they’re after. So if you’re looking for a custom set of curtains or fabrics for a kitchen, you’re probably more likely to be loyal, because there’s a lot more engagement involved there and you get value from that relationship. But if you’re a customer who’s buying a standard tech product, you’re buying some Apple kit, for example, you largely rely on price and availability. An example I always use is, if you’re running the website for HMRC and taking taxes, and that website’s down, HMRC don’t particularly care, because they’re going to get your tax money at the end of the day. It’s not that it isn’t their problem if you can’t access the site, it is their problem, but they are less concerned by it. Whereas if you’re an e-commerce provider, a big retailer, and your website’s down while a customer’s sitting on their sofa making a speculative purchase, they’ll quite simply go to another retailer, so you’ve lost that opportunity.

MG: So what are actually the main challenges related to operating such a site?

CB: Well, I think if you look at the retail landscape, it’s inconsistent, it’s changing, the future’s uncertain, particularly for bricks and mortar on the High Street. So it can obviously be risky to invest in systems and ways of working that are tailored to that peak demand, because whilst that peak demand is ever growing year-on-year, if you build a system which is perfect for that peak demand you may well find it’s underutilized throughout the rest of the year. We talk about the technology landscape, but I think it is important to also note that the investment in any business has to be across the entire value chain. So from the point where a customer is placing that order all the way through to receiving that order, there needs to be capacity built into that business: in logistics, in customer care, etcetera. So it’s a big question as to what customer experience the retailer wants at the end of the day.

MG: So bearing in mind these are the core challenges, what has John Lewis done to deal with this?

CB: Well, by the time I got there, John Lewis were already going through a technology evolution. That left them with quite a large, somewhat monolithic Oracle estate. And most people who’ve run Oracle will say that it comes with some hefty running costs that go along with the benefits of being with such a well-known provider.

But in terms of moving forward, what John Lewis looked at was: where were the high-churn areas of the system? Really, those are the parts of the platform which you want to be more agile on, need to be more agile on. We sought to move them into a more microservices-based architecture in the Cloud. What that gave was a more rapid deployment capability, as well as an ability to scale in the Cloud very much minute by minute, based upon demand. That was a fundamental change for the business: keeping the capacity adjustable and keeping the costs manageable, but at the same time blending that with some of the benefits they already had from the legacy platform.
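To make the idea of scaling with demand concrete, here is a minimal sketch of how a target instance count might be derived from the order rate. It is not a description of John Lewis’s actual setup: the function name `instances_needed`, the per-instance capacity of 500 orders per hour, the 25% headroom and the floor of two instances are all assumptions chosen for illustration. Only the 200 and 20,000 orders-per-hour figures come from the conversation.

```python
import math


def instances_needed(orders_per_hour, orders_per_instance_hour=500,
                     headroom=0.25, min_instances=2):
    """Estimate how many service instances a given order rate requires.

    All default values are hypothetical and exist only for illustration.
    """
    # Add headroom so a sudden surge does not immediately saturate capacity.
    target_load = orders_per_hour * (1 + headroom)
    # Round up to whole instances, but never drop below the safety floor.
    return max(min_instances, math.ceil(target_load / orders_per_instance_hour))


if __name__ == "__main__":
    # Quiet February hour versus Black Friday peak, as quoted in the interview.
    for rate in (200, 20_000):
        print(rate, "orders/hour ->", instances_needed(rate), "instances")
```

Under these assumed numbers the quiet hour sits on the minimum footprint while the peak needs roughly 25 times as many instances, which is the cost argument for adjustable capacity rather than a system sized permanently for Black Friday.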

MG: Wow. That sounds like a really huge project. And the bits you mentioned regarding the challenges of moving to Agile, CI/CD and Cloud, that’s definitely something I’m keen on touching on. But before we move there, what was your role in the whole project?

CB: So in my role I was accountable to the trading board as well as to the IT leadership team for the performance of all IT platforms, so that was branch IT, the website, payment platforms. Ultimately, as a company, if we missed some of our sales targets then IT was often asked to account, and I’d often be the one on a Monday morning explaining how exactly the systems had played. So, I was seeking to stabilize and improve the performance of the systems throughout my time there.

MG: So what sort of approaches did you take in order to reach that end goal?

CB: Sure. So one of the key things I had to do was to work with my colleagues within the development organization, and really get them to see performance and security of that core platform as an everyday part of the job. It was previously quite normal, as an organization, that we would start tuning the website round about autumn time, ready for Black Friday. That could mean that you would find some issues late in the day and have to rush to fix them. But what we moved towards was a per-release focus on performance assessment and tuning, and that way we were able to maintain confidence in the performance of the site throughout the year. So we would run with synthetic agents as well as some real synthetic transactions going through that website, to make sure we had absolute confidence that it could cope with peaks.

I guess one of the other things that we did was look at the reality of recovering such a system during a major incident, because you’ve got to recognize that, do everything you like on performance or security, there’s always going to be something that will go wrong, and it might be completely outside of your control. But in most cases it was clear that the recovery actions, when you’re in the middle of a major incident, are usually pretty limited and pretty predictable. You’re not wanting to make the situation any worse, and you’re wanting to take known actions that will deliver known benefits at the end of it. So we were equipped with some really excellent monitoring and analytics tools, which could quickly alert us to any deviation from the norm, particularly in order volumes. It’s quite interesting to see that if you look year-on-year, minute-by-minute, there’s an almost identical profile to how people shop. So that made it quite useful and possible to see any deviation.
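The year-on-year comparison described above can be sketched roughly as follows. This is an illustrative assumption, not the tooling John Lewis used: the function names, the `growth` scaling factor, the 30% tolerance and the sample figures are all hypothetical.

```python
def deviates(observed, baseline, growth=1.0, tolerance=0.3):
    """Return True if observed orders differ from the scaled baseline by more
    than the tolerance (a fraction of the expected value)."""
    expected = baseline * growth
    if expected == 0:
        # No orders expected: any observed order counts as a deviation.
        return observed != 0
    return abs(observed - expected) / expected > tolerance


def find_deviations(observed_by_minute, baseline_by_minute,
                    growth=1.0, tolerance=0.3):
    """List the minutes whose order count strays from last year's profile."""
    return [
        minute
        for minute, observed in sorted(observed_by_minute.items())
        if minute in baseline_by_minute
        and deviates(observed, baseline_by_minute[minute], growth, tolerance)
    ]


if __name__ == "__main__":
    # Made-up sample data: orders per minute last year and this year.
    last_year = {"09:00": 100, "09:01": 110, "09:02": 120}
    this_year = {"09:00": 108, "09:01": 115, "09:02": 40}
    # Assume roughly 10% growth in demand since last year.
    print(find_deviations(this_year, last_year, growth=1.1))
```

In this made-up sample only the third minute falls well below the scaled baseline, so it is the one that would be flagged for attention. A real system would likely smooth over several minutes before alerting, to avoid reacting to noise.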

What I did was work with the team to make sure that we developed playbooks and took ownership of the incident management process, so that we could very quickly repeat known steps to get us to a known state, almost to the extent, if you imagine, of an American football quarterback calling the plays. That would be pretty much on the basis of “play 24” or “maneuver five”, whatever. The team would swing into action with confidence, with experience, and that was normally sufficient to get us back to a known state, which we could then build upon to get back to full availability.

Obviously it’s not just about managing incidents well, it’s about learning from your mistakes. We also had a focus on problem management. That was, again, working closely with the teams. Whilst we were moving towards DevOps, we still had a bit of a distinction between operations and development. So we brought the people who were developing the code into the room when we discussed what went wrong, so that there was a greater understanding of what happened in reality. We then focused, as a management team, on monitoring problems and monitoring the resolution of those, so that we could get fewer and fewer incidents at the end of the day.

MG: It really occurs to me that you took a very holistic approach. Congrats on reaching such maturity in the approach. You mentioned moving to DevOps and, before that, the challenges of moving to Cloud and this CI/CD approach. Can you say something more about that?

CB: Sure. Well, anyone that knows John Lewis knows that it’s been around for a long time, and there were a number of different ways that systems were developed over the time that the business has been running. So developing a platform, and having a team set up to develop platforms, based upon mainframe or upon large monolithic packages can drive a more Waterfall way of working, a more Waterfall way of thinking.

The challenge that was faced was to look at how we could get into a more Agile way of working, and when it made sense to work in a more Agile way. There was a recognition that there were still some projects, some technologies, which were more likely to be Waterfall. But when we looked at the technology associated with the web page itself, effectively the presentation layer of that site, there was a lot there that was changing pretty dynamically: a lot of competitive changes to the website in response to what our competitors were doing, a lot of insight being gained from shopping behaviors on the site. So there was really a necessity there, and that meant quite a close working relationship between the business functions and IT to work collaboratively, and at the same time leveraging some external expertise which brought in lessons learned around bringing in CI/CD and moving to Cloud in a large established organization. So there were some good partners that we worked with there.

MG: Can you say something more about this partner model you took?

CB: So the setup in John Lewis at the time was quite mixed anyway. They had been working with outsource partners for some time, so there was expertise offshore in helping to run the systems and, to some extent, develop them. That developed quite well, so that it was more of a true partnership rather than being based too much on SLAs and on head count and costs. We were working with people who actually understood the business and felt connected to it. We would involve them in some of the business updates at a high level so that they could feel part of it, feel part of the success. And absolutely, when a Black Friday goes well and you’ve made hundreds of millions of pounds’ worth of sales, that’s a good feeling to be part of. So that really helped us in achieving this success: working with good partners who were close enough to the business, but equally brought in expertise from elsewhere that they could leverage in an appropriate way to move us forward.

MG: Yeah, that’s definitely valuable. And this feeling of, as I call it, being in this together is really something you should look for when it comes to partnerships. I’m happy to hear that you were working in that sort of environment. But I’m also wondering whether you thought about introducing any sort of KPIs or methodologies on top of that process. Was that a thing as well?

CB: Yes. The Cloud provider which we’d chosen was Google. We started picking up some of their language in terms of some of the KPIs that were targeted. There became a lot more of a focus on data in general as well, as to whether we were actually getting the right outcomes from all the technical work going on. So focusing on the customer experience was something that John Lewis had always done pretty well, but we were tying that a lot more back to the technical outcomes as well.

But at the end of the day, what mattered to the company was that if you made a customer commitment, a customer promise, about a delivery, a delivery date, a delivery slot, then everything would be done that was possible to meet that commitment. And as I indicated previously, that’s from technology all the way through the business processes. So we would, at peak times, be working very collaboratively between IT and the rest of the business to adjust demand profiles, to adjust the customer commitment on the fly if needed, so we could be certain that if someone was placing an order, say for Christmas, it would arrive when they expected it to arrive. Or if it wasn’t going to, if it was going to be late, again, we’d manage that commitment by giving them updates to reset expectations.

MG: That really sounds like a huge, interconnected environment of variables that’s really hard to stay on top of, so it’s been really interesting to learn your lessons. Thank you, Craig, for sharing those insights. Congrats on being a part of such a massive success story. This interview and this case study were really valuable to understand. Thank you, Craig, for being a part of this podcast.

CB: Thanks a lot.

MG: Thank you. Take care.