
Better, faster and more scalable: Learn how HPC Managed Services enable supercomputing


Abstract

Leading enterprises understand the value of optimizing their high performance computing (HPC) resources, but they struggle with bringing meaningful change to their processes, assets, and infrastructure. For organizations looking to scale their HPC efforts and maximize resources, it is vital that they understand what is driving the industry today.

Samsung SDS invites you to watch this on-demand webinar where we will review the implementation of optimal HPC configurations and discuss the benefits of creating a cost-effective, high core count, high performance-based HPC environment.

Transcript

David Gewirtz: Hello everyone. I'm David Gewirtz with CBS Interactive, and we're proud to present "Better, Faster, and More Scalable: Learn How HPC Managed Services Enable Supercomputing," a live and interactive webcast sponsored by Samsung. Today we're going to discuss the implementation of optimal HPC configurations and talk about the benefits of creating a cost-effective, high core count HPC environment. To do that, I'm happy to introduce Brian Kucic, co-founder and principal at R Systems, and George Milner, HPC solution architect at Samsung SDS America.

David Gewirtz: At R Systems, Brian is responsible for the management of all business activities related to sales, customer relations, and project contracts. Prior to R Systems, Brian served as the private sector program director at the National Center for Supercomputing Applications at the University of Illinois, and he currently serves on Microsoft's partner advisory board and on the Alliance for High Performance Digital Manufacturing. At Samsung, George has built deep domain expertise as a systems engineer in the semiconductor manufacturing industry, spending 15 years designing and building HPC clusters for EDA applications. Hey, Brian. George. Welcome.

Brian Kucic: Hey Dave, it's good to be here.

David Gewirtz: Hi, Brian. Before we kick off the discussion, I want to remind our audience that this presentation is interactive. You can use the Ask a Question button on the left side of your console to send in questions at any point during the webcast, and we'll do our best to answer them during the Q&A session set aside at the end of the presentation. If for some reason we don't get to your question, or if you're watching this on demand, we will definitely reply by email. If you'd like more information about Samsung SDS, please refer to the related resources widget to the right of your viewing console. And with that, I'm going to turn things over to Brian. Kick it off, Brian.

Brian Kucic: Hey, thanks, David. Appreciate it. Hello out there, and thanks for taking time out of your day to join us. As David mentioned, today we're going to talk about HPC and HPC managed services. What we're especially excited to talk about, on this slide here, is the rapid growth and adoption rate that we see in HPC. As you can tell, since 2017 it's been growing dramatically, basically doubling in size by 2026. We see that as well in our own indicators and with the clients that we deal with.
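Brian's "basically doubling" figure implies an annual growth rate. As a minimal sketch, assuming the doubling runs from a 2017 baseline to 2026 (nine years) and using relative values rather than the slide's actual market figures:

```python
def implied_cagr(start_value, end_value, years):
    """Compound annual growth rate that turns start_value into end_value over `years`."""
    return (end_value / start_value) ** (1 / years) - 1

# Relative values only (1.0 -> 2.0 means "doubled"); these are not market data.
rate = implied_cagr(1.0, 2.0, 2026 - 2017)
print(f"Implied annual growth: {rate:.1%}")
```

Doubling over nine years works out to roughly 8% compound growth per year.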

Brian Kucic: There we go, next slide. We'll be using some terms that you may or may not be familiar with, such as hyperscale: that's AWS, Google, Azure, the large-scale resources of the world. We're talking about high performance computing, basically HPC, and we'll also talk about supercomputing [inaudible]. So whether we're talking about clusters, a single high performance server, or an array of architectures, even a hybrid cloud model would fit the bill. Next slide, please.

Brian Kucic: So what's trending? We see a lot of really exciting things going on in the marketplace right now. Companies are taking all the sources out there, the devices that are gathering data, and bringing that data in house. Then they decide what they want to extract from the amounts of data they're getting, and they apply AI, machine learning, and deep learning applications to that data to find the gold nuggets they're looking for.

Brian Kucic: We also see a rise in GPU use as more applications become available that take advantage of GPU performance. And what we're talking about here today is the rise of cloud, and how cloud HPC is really gaining momentum in the marketplace. HPC has become very accessible, and not just through the hyperscalers; we'll talk about that today. Next slide, please. We also see industry challenges and opportunities. On the challenges side, we obviously see a lot of folks with outdated infrastructure, and with today's high-density racks, the power and cooling needed to host these racks are reaching levels we haven't seen before.

Brian Kucic: We also see that HPC skill sets are limited and hard to hire for, and that's a real challenge not only on the commercial side of the house but in academia as well. That's something else we'd like to talk about: how we address those issues, as well as how to run scalable, fault-tolerant operations. On the opportunities side, the good news is that you can access HPC bare metal resources very easily, and you can do that through a variety of approaches. We'll talk about that in more detail, but one of them is a hybrid approach.

Brian Kucic: You can leverage HPC and AI applications on the same resource, or carve out specific resources that are better suited to certain applications. And when you combine the two, you really increase your operational excellence, both on prem and off prem. I can't emphasize enough how important good architecture is when you kick off your HPC environment, because we've seen companies that invested and designed for peak when they're actually not at peak that often, and therefore they pay for a lot of white space during the year, throughout the life span of their clusters. So George, do you have any comment on that?
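Brian's point about designing for peak can be made concrete with a little arithmetic. The sketch below uses made-up monthly demand figures (not data from the webinar) to compare how much provisioned capacity sits idle when a cluster is sized for its peak month versus sized near its typical load, with the overflow bursting off prem. The function names are hypothetical.

```python
# Hypothetical monthly peak core demand for one year (illustrative only).
monthly_demand = [400, 420, 380, 500, 1200, 450, 400, 410, 1000, 430, 390, 420]

def idle_fraction(capacity, demand):
    """Share of provisioned core-months left unused ("white space") at a given capacity."""
    provisioned = capacity * len(demand)
    used = sum(min(d, capacity) for d in demand)
    return 1 - used / provisioned

def burst_core_months(capacity, demand):
    """Core-months above on-prem capacity that would have to run off prem."""
    return sum(max(d - capacity, 0) for d in demand)

# Compare sizing for the peak month with sizing near typical load.
for capacity in (max(monthly_demand), 500):
    print(f"{capacity:>5} cores: idle {idle_fraction(capacity, monthly_demand):.0%}, "
          f"burst {burst_core_months(capacity, monthly_demand)} core-months")
```

With these hypothetical numbers, sizing for the 1,200-core peak leaves a little over half of the provisioned core-months idle, while sizing at 500 cores cuts idle capacity to about 13% and pushes 1,200 core-months off prem. Whether that trade pays off depends on off-prem pricing, which the sketch does not model.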

George Milner: Yeah, the life span of the clusters is what we deal with the most, and I think it's an important aspect.

Brian Kucic: Right. Great. So let's move on.

David Gewirtz: So this is really interesting stuff. When you're talking about this, you're talking about the pinnacle of computing power, right? You're talking about just how high you can go in terms of capability, especially when you look at hyperscale. AI is certainly one of those workloads that benefits from as much computing power, storage, and so forth as it can get. So given that AI has been with us in one form or another for a long time, why is it only now really being adopted in mainstream HPC circles? George, are you comfortable talking about this for us?

George Milner: Yeah, sure, David. AI is one of the first things I saw when I started working with computing many years ago, and it was usually expressed as something like a neural net. Neural nets were trained much like we see with machine learning and deep learning today, but this was all very early on, back in the late '80s. AI has to be fed data, and a lot of it, because it's processing and sorting large amounts of data, as we were talking about earlier. It's also a dynamic process: it's learning more, effectively rewriting itself and adding that knowledge to itself.

George Milner: So it's in a continuous, self-improving learning mode, and that takes a lot of data. We're seeing a lot of AI applications. One area I have been working with is autonomous driving and driver-assisted applications, where data is gathered by test vehicles driving through different scenarios, and they're literally processing hundreds of terabytes of data per day that has to be offloaded into a storage system. All that data is then parsed through an HPC cluster, and the results are fed into the AI part of the system that's putting together the data and the programs for autonomous driving.

George Milner: That's one example of how AI is being used in the mainstream today. Another application we see is the automated call center: you call Walmart, and it asks you to speak to it. That's another level of AI. So AI is becoming more common in everyday life, and it's going to become more and more mainstream as more data is processed through AI structures that take in and learn from that data on their own.

David Gewirtz: One of the reasons you need this computing power is that a lot of these interactions need to be pretty much real time, right? It's not something that can just go off, think about it, and spit out a report in a couple of hours. You're dealing with consumers on the phone, or vehicle navigation and stuff like that, right?

George Milner: That is correct. It becomes a real-time application after it's been running for a while and has enough data that it can start interacting in an intelligent, seamless way. But all that data has to be filtered through an HPC cluster, and because the incoming data is so concentrated and so massive, you actually have to use several types of compute processing. We'll talk about this in a little bit, but GPU processing is part of this whole mix: the data is processed in an HPC cluster, but through GPUs, because it's more vector-type data, taking visual representations and turning them into data for the AI to use.

David Gewirtz: So we are in fact living in the future. Let's switch gears a little bit. We were talking about AI a moment ago, but now we're going to look at some of the implementation questions: where do we put all of these computing resources, and how do we manage them? In IT we're always talking about cloud versus on premises, moving to the cloud, and so forth. But with HPC, you're talking about really serious computing power. What goes into the decision between a hybrid approach, an on-demand service, or on prem? George, is this one you want to take, or is this a Brian question?

George Milner: I can take it, and then we'll hand it over to Brian. What's driving cloud adoption is availability, and the fact that you don't have to work with the actual hardware itself. You're using a virtualized version of the hardware that's provided by the cloud service provider, so essentially you're letting them run everything in the background while you work in a virtualized environment. I'm sure the audience already knows all this, but on using the cloud for HPC, I've said this in the past: if it's HPC in the cloud, is it really HPC? These are diverse, geographically separate resources.

George Milner: I know some of the cloud providers are starting to consolidate resources in zones, but HPC in the cloud is probably a different type of HPC that you would use for different applications. On-premises HPC resources are obviously what's been used for many years now: you have the bare metal hardware on premises, and you're running your batch applications, your scheduler, and everything else on top of that. It has a lot less overhead than HPC in the cloud.

George Milner: So there are various advantages to having a bare metal HPC cluster on premises. Some companies hesitate to put their information in the cloud because of security, availability, or those sorts of concerns, and prefer to keep it on premises. The cloud, on the other hand, has a very high level of security, and you get the flexibility of all the services and the ability to dynamically scale up your resources. You can also do things that you can't do on bare metal, or that are very complicated to do there, such as dynamically distributing workloads in a more seamless fashion.

David Gewirtz: Brian, I was actually just about to ask you to come in and shed some light on this as well. I know you've thought about it quite a bit.

Brian Kucic: Sure. Yeah. We've seen several instances where folks running on their own on-prem resources are getting constrained internally. They're trying to get a project done, they're fighting over resources, and they don't have time to wait. So they take an off-prem approach, which lets them get there quickly, scale up and scale down, as George mentioned, and get their project done on time. And now, with the emergence of cloud, you have a lot of different options for doing that.

Brian Kucic: George mentioned bare metal, which is really exciting for us because that's what we offer, and you can basically make the boundary between your internal resources and your remote resources almost transparent. We're seeing, project by project, companies that don't even want to implement a project internally; they want to do it externally, and that's where we get a lot of our requests. The other thing is that internal configurations might be a little different: we might have either the quantity they're looking for or a newer generation, basically the latest and greatest hardware, that they can run their codes on, get performance characteristics, and take it from there. So it's pretty exciting for companies to have that option instead of depending only on their internal resources.

David Gewirtz: We talk a lot about virtualization, but when you're trying to get the very last bit of performance out of things, can you really do this kind of HPC work on top of a virtual layer?

Brian Kucic: Well, go ahead, George, if you'd like to. I jumped in, but go ahead.

George Milner: No, go ahead Brian.

Brian Kucic: Well, basically what we've seen is a lot of different solutions on BMS, or bare metal. Bare metal comes into play when, like you mentioned, David, the work is performance driven. We feel we can take a holistic approach and look not just at peak performance but at other factors such as interconnects, file systems, and storage, combine those together, and then look at the overall performance picture. That's where I think we have a really unique offering.

David Gewirtz: That makes sense. All right, let's move on. Going back in time, we've gone through so many changes over the years, and I'm going to ask this of each of you. When you got started in the industry, what were some of the major shifts you saw take place? My first gig was actually punched cards, so we're talking about a long time ago. George, what about you? What shifts have you seen in the industry since you got started, and how did we get here?

George Milner: Well, David, as you mentioned in the introduction, most of my experience has been in semiconductor manufacturing. One thing I can say right off the bat is that things have gotten bigger and faster, and probably cheaper to operate too. Early on, there wasn't nearly as much modularity as we see today with blade servers and modular, scalable storage systems. What we typically saw back in the early 2000s was lots and lots of rack servers, and gigabit Ethernet was the hot, fast setup back then.

George Milner: Networking was a little more cumbersome and a little harder to manage; there wasn't the maturity in network management that we see today. And in the early days, the rack servers took up so much space and power, and there were a lot more moving parts than in the modular systems we see today. So it was a lot more maintenance intensive, I would say: you had far more power supplies, fans, memory modules, spinning disks, all those things, and it was basically a constant break-fix nightmare. These days, with the scalable modularity of blade servers and higher-density form factors, there are a lot fewer moving parts and a lot more shared hardware resources.

George Milner: Power usage hasn't gone down, but it's a lot more predictable and manageable, because you know what you're getting with each chunk of modular scalability. So those are some of the things. We're also in a hundred-gigabit network architecture now, so compared with 16 or 17 years ago, that's a hundredfold increase in network throughput. Right now what we see in the industry is 40-gig distribution with a hundred-gig backbone, and I think it's also starting to move to higher-speed Ethernet from the servers themselves. So things are a lot faster, better, and more affordable than they were back in the earlier days.
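George's hundredfold jump, from gigabit Ethernet in the early 2000s to today's 100 Gb backbones, is easiest to appreciate as transfer time. This minimal sketch assumes decimal terabytes, a single link, and an optional efficiency factor for protocol overhead; the 100 TB data set and the function name are hypothetical.

```python
def transfer_hours(terabytes, link_gbps, efficiency=1.0):
    """Hours to move `terabytes` (decimal TB) over a link of `link_gbps`.

    `efficiency` is an assumed fraction of line rate actually achieved
    (protocol overhead, contention); 1.0 means ideal line rate.
    """
    bits = terabytes * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600

# Illustrative 100 TB data set: early-2000s gigabit Ethernet vs. 10 Gb and 100 Gb links.
for gbps in (1, 10, 100):
    print(f"{gbps:>3} Gb/s: {transfer_hours(100, gbps):8.1f} h")
```

At ideal line rate a 100 Gb/s link moves about 45 TB per hour, roughly 1,080 TB per day, while a 10 Gb/s link moves about 108 TB per day. That is one reason workloads that ingest hundreds of terabytes per day, like the autonomous driving example earlier, push toward faster backbones.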

David Gewirtz: Numbers like that make me all tingly. Brian, how about you? What changes have you seen since you got started?

Brian Kucic: I'll tell you, we got started back in 2005, and I think the biggest example that comes to mind is the adoption rate of HPC and of the cloud. We were offering resources to a variety of verticals and got started heavily with oil and gas, who were the big-time users of HPC back in the day; seismic applications took a lot of processing power, and still do today. So coming out of the gate with bare metal solutions back in 2005 was great for certain verticals we were looking at. But for everyone else, I think the paranoia about taking their data outside their four walls was just a hard nut to crack.

Brian Kucic: I think the security teams were saying, "Hey, it's really difficult for us to take our data out of here; we're going to have to do everything on prem." For a while those were pretty lean years. But as cloud computing has grown in popularity, data concerns and, as George mentioned earlier, security in the cloud have not been as big an issue as when we first came out of the gate. And that's great. We're excited about that, because we've been handling proprietary data for a long time, and we look forward to other companies that are looking to outsource their resources feeling comfortable with the security policies and procedures we have.

David Gewirtz: Yeah. Now, George, you've been thinking about some things like that too. Do you want to jump in before we move on to our next question?

George Milner: Yeah. I just had another thought on this, going back to the early days like Brian was saying. I was working with the entertainment industry back in the early '90s, when rendering for some of the early full-length computer-animated features was just starting to come out of Industrial Light & Magic, basically enabling computer generated animation and interactive-type applications. One of the big things was Jurassic Park, and Silicon Graphics said back then that they were building a better dinosaur. Essentially, all of that rendering was done on high performance compute render farms.

George Milner: So that was an early application of HPC, and it's still done the same way today. As we all know, there's a huge amount of computer generated animation now, and computer generated art in movies is almost seamless. Just compare that with the early to mid '90s, when it was a big deal. Now it's almost a seamless part of what we see every day, and that's all because there's an HPC back end. So I just thought I would throw that out there.

David Gewirtz: Yeah. I think about this a lot, and we were talking about AI a few minutes ago. Those are areas that didn't seem possible back in the day, and now we're applying them to so much of our technology. Are you seeing AI in graphics? What drivers are you seeing pushing HPC?

George Milner: Generally, HPC simply being more available these days is one of the bigger drivers. But there are also more applications taking advantage of it: we're looking at bioinformatics processing, hospital data, and anywhere large amounts of data always need bigger, faster hardware to get the results you're looking for, or a solution to the problem you're trying to solve. We also see it becoming more available to smaller companies that are doing the same things the big companies are, but on a smaller scale. So those are some of the things that we're seeing.

David Gewirtz: Yeah, that makes sense. It's also interesting because one of the benefits of cloud versions of this is for smaller companies that are trying to do much bigger projects. I want to move on to a different topic, and I'm going to ask about where fault tolerance fits in. But first, Brian, if you've got anything you want to say about markets, or thoughts about opportunities and problems that can now be solved that couldn't be solved before this technology.

Brian Kucic: Again, as George says, I think the size of the HPC environments available now, which weren't available before, along with the costs and the ability to scale up and scale down, is really driving the resurgence. For applications such as AI, it's again the improvement in the number and quality of applications for AI, marketing, and big data, plus open source infrastructure, fabrics, and things of that nature, that I believe is really contributing to the resurgence.

David Gewirtz: Yeah, and where does fault tolerance fit into all of this?

Brian Kucic: Well, when you have all that data that needs to be analyzed quickly, with results given back, the time to solution has to be very short. That requires a lot of infrastructure with fault tolerance and redundancy, and right now, as I mentioned earlier, a lot of that infrastructure just isn't available. So we're looking at extreme power and extreme redundancy, not only on the compute side but on the infrastructure side as well. So that's pretty interesting.

David Gewirtz: Now, George, you're coming from a semiconductor background. When we talk about fault tolerance at HPC scale, we're really talking about a different kind of problem than, for example, failing over from one web server to another, right? What kinds of problems is fault tolerance solving, or does it need to solve, at the HPC level?

George Milner: The first thing that comes to mind when we talk about fault tolerance is how we build the HPC stack, and when I say stack, I'm talking about starting from the hardware. We try to build in as much resiliency and redundancy as possible, so there's at least two of everything, beginning with the power supply to the racks; the servers or blade server chassis will have multiple power supplies and multiple fans.

George Milner: There's almost two of everything, especially in the networking, all the way from the blade server interfaces to the core network architecture. All of this resiliency and redundancy is built in so that at the hardware level, if something does fail, something else automatically takes over, and the application side doesn't even notice that something has failed on the hardware side. So the main thing is keeping the applications going. In semiconductor, we have several different HPC use cases. One of the main ones is the different levels of design, testing, and verification, which run as jobs in the batch system on the cluster grid itself.
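Why "two of everything" helps can be shown with the standard series/parallel availability formulas. This sketch assumes independent failures, instant failover, and an illustrative 99% availability per component; real hardware figures will differ, and the three-stage chain (power, server, network) is a simplification.

```python
from math import prod

def parallel(availability, copies=2):
    """Availability of `copies` redundant, independent components: up if any one is up."""
    return 1 - (1 - availability) ** copies

def series(*stages):
    """Availability of a chain where every stage must be up (power -> server -> network)."""
    return prod(stages)

a = 0.99  # assumed per-component availability, illustrative only
single_path = series(a, a, a)
redundant = series(parallel(a), parallel(a), parallel(a))
print(f"single path: {single_path:.6f}  redundant pairs: {redundant:.6f}")
```

Under these assumptions, a single chain of three 99% components is up about 97% of the time, while duplicating each stage raises that to about 99.97%.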

George Milner: Jobs are queued up, planned, and then put into the cluster to run, because in semiconductor the cluster runs multiple jobs at once. If you don't have resiliency or fault tolerance built in, a job can fail when there's an interruption, and a failed job can cost dozens if not hundreds of engineering hours. Say a job has been running for two or three days and engineers are waiting for the cluster to crunch the data and bring back the results; losing it is very costly. It could cost hundreds of thousands up to millions of dollars in lost engineering time from the failure of just one job, especially on a really tight deadline. So it's very critical to have fault tolerance built in, and that's all done in real time.
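George's cost argument is simple multiplication, and the sketch below makes the inputs explicit. The job length, failure time, team size, and hourly rate are all hypothetical, and the checkpoint variant is an added assumption (not something George describes) showing how periodic restart points would shrink the rerun.

```python
def hours_lost(fail_at_hour, checkpoint_interval=None):
    """Hours of work that must be redone after a failure at `fail_at_hour`.

    Without checkpoints the job restarts from zero; with periodic checkpoints
    (an assumed capability, not something every EDA tool or scheduler offers)
    only the work since the last checkpoint is lost.
    """
    if checkpoint_interval is None:
        return fail_at_hour
    return fail_at_hour % checkpoint_interval

def delay_cost(lost_hours, engineers_waiting, hourly_rate_usd):
    """Engineering cost of the delay, assuming every blocked engineer waits the full time."""
    return lost_hours * engineers_waiting * hourly_rate_usd

# Hypothetical: a 72-hour verification job fails at hour 60; 20 engineers at $150/hour are waiting.
print(delay_cost(hours_lost(60), 20, 150))                         # no checkpoints
print(delay_cost(hours_lost(60, checkpoint_interval=8), 20, 150))  # checkpoint every 8 h
```

With these made-up inputs, one failure costs $180,000 of waiting time without checkpoints and $12,000 with an 8-hour checkpoint interval, which is in the range George describes for a single lost job.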

David Gewirtz: So what kind of technology enables that level of fault tolerance? What work is being done to make it actually work?

George Milner: Well, I think the work being done is that you can fail over to a different piece of hardware. Also, at the application layer or in the job scheduler, the scheduler can hand a job over to a different process or thread, if it's written for that. So those are some of the things we're seeing there.
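The scheduler-level failover George mentions can be sketched as a toy loop: try a job on a node, and if the node fails, mark it unhealthy and hand the job to the next one. Everything here (`Node`, `Job`, `run_with_failover`, the simulated fault, the job name) is hypothetical and only stands in for what a production batch scheduler does with its own requeue and node-draining mechanisms.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    healthy: bool = True

@dataclass
class Job:
    job_id: str
    attempts: list = field(default_factory=list)  # node names tried, in order

def run_with_failover(job, nodes, run_on):
    """Run `job` on the first healthy node; on a node failure, mark it and move on."""
    for node in nodes:
        if not node.healthy:
            continue
        job.attempts.append(node.name)
        try:
            run_on(node, job)
            return node.name
        except RuntimeError:
            node.healthy = False  # drain the failed node so later jobs skip it
    raise RuntimeError(f"job {job.job_id} failed on every available node")

# Usage: simulate a hardware fault on node "n1".
def fake_run(node, job):
    if node.name == "n1":
        raise RuntimeError("simulated node failure")

nodes = [Node("n1"), Node("n2"), Node("n3")]
job = Job("verify-042")
print(run_with_failover(job, nodes, fake_run), job.attempts)
```

Marking the failed node unhealthy means later jobs skip it, which mirrors draining a bad node rather than retrying on it indefinitely.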

David Gewirtz: Brian, do you have any final thoughts on this before we move on to the next slide?

Brian Kucic: No, I think George covered most of it, if not all of it, very well. The fault tolerance we see, especially in data centers, is just incredible these days. And with applications being real time and time to solution extremely short, this is very important to a lot of our clients.

David Gewirtz: You just said "these days," so I'm going to take that and extend it a bit. Brian, take us five years and then 10 years into the future and show us what we're going to be dealing with in this field.

Brian Kucic: Well, as you can imagine, the amount of data is just going to continue to grow and grow and grow. That requires companies to come up with algorithms and other applications to mine that data and get the results out as quickly as possible. So there's a lot of talk about moving compute closer to where the data resides. Basically, I see it following the trend toward edge computing, placing more compute close to where the data sources are, and continuing down that road. At least that's what we see going on right now, and that's why we've put our resources not in just one location but in several. Network latency plays a role in that as well. George, anything to add?
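Brian's point about moving compute closer to the data, and about network latency, comes down partly to physics: light in optical fiber covers roughly 200 km per millisecond. This sketch computes only that propagation floor; switching, queuing, and routing detours add more, and the distances are hypothetical.

```python
FIBER_KM_PER_MS = 200.0  # light in optical fiber travels roughly 200 km per millisecond

def round_trip_ms(distance_km):
    """Minimum round-trip propagation delay over fiber, ignoring switching and queuing."""
    return 2 * distance_km / FIBER_KM_PER_MS

# Hypothetical distances: same campus, regional colo, cross-country data center.
for km in (5, 100, 1500):
    print(f"{km:>5} km: {round_trip_ms(km):6.2f} ms")
```

No amount of bandwidth removes this floor, which is why placing compute near the data source matters for tightly coupled or interactive workloads.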

George Milner: Yeah, Brian, I totally agree with your comment there. What seems very interesting to me is the hybrid compute model. I've been looking at Microsoft Azure Stack very closely because it's got some really interesting features, and it seems to be where hybrid computing, or hybrid HPC, is heading: you actually have the hardware on premises. And this isn't just Microsoft; it's offered through all the major OEMs. You keep your hardware on premises, but you take advantage of all the services and the things that make the cloud so attractive, such as being able to dynamically move workloads onto different parts of the cluster without having to go back in and reallocate and reconfigure for a specific group that may be using the cluster.

George Milner: Another thing I'm noticing is having your hardware on premises but also having a route to the cloud, which lets you put out there the things you're comfortable with and gives you the burst-out capability Brian was talking about earlier on the bare metal side. It gives you a path for on-demand scaling when you need it. The thing about hybrid computing is that you basically have to run in some sort of virtualized environment. I think what we're going to see in the future is that the power, processing, and capability of hybrid architectures running a virtualization layer will shrink that overhead to the point where it's negligible compared to what you would get with the same stack on bare metal.

George Milner: And I think that's where we're going to get the real flexibility. I'm excited to see where it ends up in the next few years, because I think the virtualization layer gives you a lot of advantages you just can't have on bare metal; with bare metal, you just throw more hardware at it. One of the things we're doing with our managed services, and this is something we've seen across the board, is that with bare metal you can actually have on-demand services or on-demand burst-out capability, furnished either on premises or off premises but in close proximity. If you're computing in one cage, you can connect over to another cage in the colocation facility and borrow resources there to get you through a tight deadline.

David Gewirtz: That actually jumps into the next question I was going to ask. Brian, go ahead, take it.

Brian Kucic: No, it was just a little thing, building on what George said. One of the things you get into is the data management side of the house, and workflow orchestration is vitally important. We're seeing more and more companies asking for solutions there, and we definitely have some tools in our toolkit to help address those issues. So data and data management are going to be a bigger challenge, I think, in the near term.

David Gewirtz: I'm going to switch off this just slightly, because I'm curious about it. Looking at the question of on prem versus off prem, you talked about hundred-gigabit backbone speeds on prem, but when you start to build hybrids, you're going off prem to, say, a cloud environment. Are we going to see better data transfer performance between on prem and off prem over the next five or 10 years, and where do things like managed services fit into that? Are you able to add expertise or technology that wouldn't otherwise be available to get that hybrid environment working at speed? George?

George Milner: Yeah. What we're seeing, and this is with some of the colocation providers we're working with, is that yes, the speeds we see in bare metal environments are certainly achievable in hybrid situations. A lot of facilities today have interconnects between colos, so if you're running, say, a private or hybrid cloud with another work group in another location, they certainly have backbone speeds that can reach the hundred-gigabit range. Most of the time it's 10 gigabit, or something in between. We're also seeing that some of these facilities are so interconnected with the cloud providers that there's an on-ramp into the cloud environment: pick any cloud provider, and the facility can provide a direct connection into that cloud from the colocation facility.

George Milner: So there are some things here that are really interesting and exciting, because as a managed services group, we can position the resources we provide and maintain for our customers close to these locations, where the customer can have connectivity into the cloud and into a high-speed architecture. One of the providers we're using here in Texas has interconnected facilities throughout the whole United States. Their strategy was to use dark fiber that was put in place years ago to strategically locate their facilities, so they can scale up to a hundred gigabit and beyond very easily. That gives you an external backbone and, like I said, your on-ramp into the cloud.
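To put the bandwidth figures above in perspective, here is a back-of-the-envelope sketch of idealized transfer times. It is a hypothetical illustration rather than anything presented in the webinar: the 50 TB dataset and the `transfer_hours` helper are assumptions, and the math ignores protocol overhead, contention, and storage limits, so real transfers will be slower.

```python
# Hypothetical sizing sketch: idealized time to move a dataset over a link.
# Assumes the link is fully saturated with no protocol or storage overhead.

def transfer_hours(dataset_terabytes: float, link_gigabits_per_s: float) -> float:
    """Return idealized hours to move `dataset_terabytes` (decimal TB)."""
    bits = dataset_terabytes * 1e12 * 8            # TB -> bytes -> bits
    seconds = bits / (link_gigabits_per_s * 1e9)   # Gb/s -> bits per second
    return seconds / 3600

# Compare a typical 10-gigabit link, an in-between link, and a 100-gigabit backbone.
for gbps in (10, 40, 100):
    print(f"{gbps:>3} Gb/s: {transfer_hours(50, gbps):.1f} h for 50 TB")
```

Under these idealized assumptions, 50 TB takes roughly 11 hours at 10 Gb/s and just over an hour at 100 Gb/s, which is why the jump from typical 10-gigabit links to hundred-gigabit backbones matters for hybrid workflows.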

Dave Gerwitz: So Brian, thinking about this, what we're hearing is that on one hand there's the technology: just how fast these processors can perform and how much traffic the network can manage. But there's also a whole logistics component. Once you start talking about dark fiber and mixing it with on-ramps from network providers and so forth, you get a lot of logistical complication. Is that where the managed services part of the puzzle comes in, where the expertise of a firm like yours or George's can manage those logistics? How does that fit into the puzzle?

Brian Kucic: Yes, absolutely. What we do is a little different from other companies. Companies are mainly focused on on-prem HPC, or on bursting out to the cloud. We're always looking at the latest and greatest technology coming down the road, in the near term or a few years out, and we get pre-release access to that technology so we can benchmark it, test it, and see exactly what it can deliver to clients. R Systems has a lot of tools in its tool belt from different ISVs and vendors, so to speak. So if a client is having data management problems, or wants a better way to access the cloud through a single pane of glass while still accessing their internal resources through that same pane of glass, we can do that. We have a lot of tools we can apply once we get to know the client's pain points, and make it a much more efficient organization.

Dave Gerwitz: That's a perfect segue to my next question, which is about examples. Brian, since you've been on that topic, why don't you continue for a bit and give us some use cases, and then George, I'll ask you to share some use cases as well, so we can really get a feel for what this is. And to our audience: we'll be answering questions in about nine minutes, so get them in, and we're good to go. Go ahead, Brian.

Brian Kucic: Yeah, thanks. I can think of a couple right out of the gate. In one, we had an oil and gas client that wanted us to move their resources into one of our data centers. So we took everything down over a weekend, moved it, and brought it back up the following week. What we noticed was that some configurations weren't correct. To make a long story short, when they accessed the resources at the new location, they were wondering why they were suddenly getting 15 to 20% more performance. That was because the system had been misconfigured before we moved it, and we corrected that. They were very happy with that. Again, I think it comes down to having the skill sets to identify what's wrong with these resources right out of the gate, before the customer even knew anything was wrong.

Brian Kucic: The other one is where we had basically identical resources: the customer's internal resources and our off-prem resources were absolutely identical. After the 60-day project, we had a wrap-up call with their team, and it was noted that even though the resources were identical, we were still providing the customer somewhere in the neighborhood of a 22% performance increase. That's the experience you get when you have people who are focused on HPC running HPC resources. Basically, if you had a Ferrari, you wouldn't want a mechanic without Ferrari experience working on it. So we take a lot of pride in our tech team. They're very performance driven, and that shows in the results we've received.

Dave Gerwitz: And George, do you have any examples you want to share with us?

George Milner: Sure, David. Like Brian was saying, a lot of it has to do with the preparation before resources are delivered to the customer. What we've seen is that our customers want us to bring the technology to them, rather than having to tell us what they want. We always listen when they say, "We want this," or "We're very interested in this approach to building something out or bringing something into the mix." But what we try to do is be proactive: we do integration, testing, burn-in, and preparation of the server stack and storage before it's even delivered. We make sure all the firmware revisions are consistent with what may already exist in the cluster itself, and we can set up any preparation that helps the customer with provisioning. Basically, we're trying to make the hand-off from what we do to what they do easy for the customer. And then, like Brian was talking about, R Systems takes it to a whole other level, where they can work on the application side and the operating system side to assist with integrating the systems into the cluster itself.

Dave Gerwitz: Awesome. I love this quote, and I want to spend just a second on it before we go into Q and A. The idea is that, of course, everything is designed, in the sense that everything we do has some purpose, but implementing it properly is a bit of a challenge. I think I'm going to throw this one to George first. Talk to me about designing well when it comes to HPC.

George Milner: Well, David, the design has to do with the specific type of HPC. I don't think we've discussed that a lot, but there are different kinds of HPC for different applications, and I don't think there's one design that fits everything. So a lot of times you have to look at what kind of data the application is going to be processing on the HPC cluster. Oil and gas is a lot different from semiconductor, semiconductor is different from AI processing, and AI processing is vastly different from CGI or financial services.

George Milner: First of all, you have to take a top-down approach. What is the data actually going to be doing? What results are expected from processing it? Is the workload compute intensive or throughput intensive? That largely determines where you start with the basic design and which parts you're going to put together. And this is more on the hardware side; once you get into the application side, there are a lot more choices, with very specific applications for each of the verticals I just mentioned. But from the hardware perspective, take a high-throughput, high-data-volume type of usage, particularly in oil and gas: they deal with such high volumes of data that it's very throughput intensive.

George Milner: So you have to have interconnected processors, with a back-end mesh using OPA (Omni-Path) or InfiniBand technology to tie all the parts of the cluster together so it looks like one big compute image. In semiconductor, by contrast, it's more of a bursty use case: you don't continually saturate the network interfaces, but it's very CPU intensive, because you're processing a very large amount of data and pushing the results back to storage, where the engineer can see them. So your design comes down to your specific application. What does your data look like? How are you trying to solve problems with the data you're crunching on an HPC cluster? To summarize, it all comes down to a top-down design approach.

Dave Gerwitz: All right. So Brian, you have the opportunity for a last thought before we go to Q and A.

Brian Kucic: Well, I agree with everything George was saying; he's the architect there, so I really don't have a say either way on that. All I can say is I really appreciate everybody's time out there. This has been very exciting for us, I hope it has been very beneficial for you all, and I'm looking forward to some good questions.

Dave Gerwitz: Yeah, we've got quite a few questions queued up, in fact. I don't know how much time we're going to have for them, but let's start with the first one, from Matthew. Matthew asks: short of a large increase in headcount, how can organizations with limited in-house HPC expertise address workload and staffing issues? Brian, should I throw that one to you first?

Brian Kucic: Yeah, and I'm not trying to make a sales pitch here, but we provide that service to a lot of our clients. We saw this start out in universities, where they got tired of chasing grad students around to manage their resources. So we can do remote management as well as service, and that's part of our HPC managed services. As I mentioned at the start of the webinar, one of the challenges we've seen is that hiring skilled folks in HPC is becoming very difficult. If there are opportunities for us to help you out, please get ahold of us.

Dave Gerwitz: And I see George out there virtually raising his hand as well, wanting to answer this. So George, how do organizations with limited expertise handle this workload and staffing issue?

George Milner: Well, they call a managed services company like us or R Systems. We can put people on site, or, like Brian said, we can work remotely, since most clusters are accessed remotely. It depends on the customer's level of comfort with allowing us to come in and help, whether that's troubleshooting issues or monitoring hardware and performance so we can be proactive and, when a hardware issue comes up, come on site and fix the problem. But the main thing is that we provide expertise, either as on-site staff or through remote access.

George Milner: That's what a managed service is. We can do everything, from white-glove service, which means swapping out parts when they're broken, all the way to data center infrastructure monitoring and reporting, tracking processor and server warranties, and monitoring for anything that might be out of spec. That's what we do for customers, so they can concentrate on what they do best, which is running their business. And we don't want to run their business for them.

George Milner: We want to be a supplemental part of the business that frees them from the worry of having to deal with all the underpinnings of what makes an HPC cluster work, all the way from the power coming into the facility up to "here's what your processors are doing, and here's the reporting." We can also monitor security, so customers can be at ease about who's going to be around their equipment. Those are a lot of things we can do to supplement what you would normally need headcount for. We become that for you: staff augmentation that a company can use in order to focus on what it does best.

Dave Gerwitz: So George and Brian, you're from separate companies, but you work together as a team on some of these problems?

Brian Kucic: Oh, absolutely. We've done that very successfully, and we even have some resources out in Las Vegas; that process is moving along better than planned. We're looking at growing the services offering throughout the country, and possibly even globally.

Dave Gerwitz: Great. We've got a couple more questions I want to try to get in. Emma asks: how do hyperscale approaches to HPC differ from those of small to medium-sized companies? Brian, why don't you take that, and George, follow up when he's done.

Brian Kucic: Yeah, we've seen companies that have used hyperscale, but then they come to us and say, "You know what? I don't really need thousands and thousands of cores. This application only scales to 256, and the most I run is maybe two jobs at a time." So there's a need for the hyperscalers when scalability comes into play, and on the other side, some companies are just looking for performance at a smaller scale. That's what comes to mind when I first think of that. George, you got anything to add to that?

George Milner: Sure, Brian. Hyperscaling means there's a dynamic feature involved, where your compute resources and your storage can actually adapt to the workload they're being asked to process. The perfect example is the cloud. You can dial a certain amount of dynamism into your account, so that once you start scaling up and seeing a certain load on the resources you have, the provider can start adding CPUs, storage, and compute power. Then you can start shifting your workload out.

George Milner: Then once your workload draws down, your resources start scaling down with it to save you money. So you have this elasticity; there are cloud providers that say they have an elastic cloud or elastic compute. Hyperscaling gives you that kind of functionality in a pure public cloud environment. In a bare metal environment or a hybrid situation, and going back to what we talked about earlier, you can actually scale out to the cloud and use those resources as burst-out capability if your workloads become a lot more than your on-premises or bare metal resources can handle.

George Milner: Then you can pull the scaling back in again to save money and resources. In a pure bare metal situation, the scaling can't be done as dynamically as in the cloud, but you can have resources that aren't being used in one part of the cluster, or in a different location, and have the connectivity in place to add the resources you need when you have a tight deadline: you're trying to make a tape-out in the semiconductor industry, or you have a product to deliver. So you can actually reach out, grab those resources, and use them. That's how I see scale-out and scale-up: you can stand up resources, and stand them down when you don't need them.
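The burst-out behavior described above can be sketched as a simple threshold policy. This is a minimal, hypothetical illustration, not how any particular cloud provider, R Systems, or Samsung SDS implements scaling: the names `ClusterState`, `target_cloud_nodes`, and `scaling_action`, and the 64-cores-per-node default, are assumptions made for the example.

```python
import math
from dataclasses import dataclass

# Hypothetical model of a hybrid cluster: fixed on-prem capacity plus
# cloud nodes that can be rented (burst out) or released (stand down).
@dataclass
class ClusterState:
    demand_cores: int    # cores requested by running + queued jobs
    on_prem_cores: int   # fixed bare-metal capacity
    cloud_nodes: int     # cloud nodes currently rented

def target_cloud_nodes(state: ClusterState, cores_per_node: int = 64) -> int:
    """Cloud nodes needed to absorb demand that on-prem capacity cannot."""
    overflow = state.demand_cores - state.on_prem_cores
    if overflow <= 0:
        return 0  # on-prem can handle the load: stand down all cloud nodes
    return math.ceil(overflow / cores_per_node)

def scaling_action(state: ClusterState, cores_per_node: int = 64) -> int:
    """Positive: nodes to add (burst out); negative: nodes to release."""
    return target_cloud_nodes(state, cores_per_node) - state.cloud_nodes

# Deadline crunch: demand exceeds the on-prem cluster by 176 cores.
busy = ClusterState(demand_cores=1200, on_prem_cores=1024, cloud_nodes=0)
print(scaling_action(busy))   # -> 3 (176 / 64 rounded up)

# Workload has drawn down: release the rented nodes to save money.
quiet = ClusterState(demand_cores=600, on_prem_cores=1024, cloud_nodes=3)
print(scaling_action(quiet))  # -> -3
```

A production autoscaler would also need to account for node spin-up time, cooldown periods to avoid flapping, and the cost of moving data off prem; the sketch only captures the basic "burst out under load, stand down when idle" decision.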

Dave Gerwitz: We've got another question I really want to get in. We don't have a lot of time, but Anna asks: what are the biggest areas of opportunity in the HPC landscape? Can you each give me two minutes, starting with George?

George Milner: One of the biggest areas of opportunity we see is managed services. Obviously, we think companies want to take advantage of managed services. One, because we can be the augmented staffing, and two, because we can take away the pain and hassle of working with the facilities side all the way up to getting infrastructure installed. We can also scale up, just like we were talking about a minute ago. And with partners like R Systems, we can go all the way up the stack to do troubleshooting, application development, application administration, and cluster design. I think the opportunity is that there are many service companies like us ready to come out and work with you at any level, and that's one of the biggest opportunities customers have.

Brian Kucic: No, I totally agree with what George was saying. As I mentioned earlier, one of the areas where we're seeing good growth is the hybrid approach, where you still have a smaller amount of HPC resources on prem and a larger amount off prem, so you can adjust to your workflows accordingly. Data management and workflow management are also getting a lot of attention right now. As mentioned before, when you have access to multiple providers and you're moving data around between those providers and internally, that becomes a logistics issue. Managing that correctly is what we have the tool set to offer, along with scale-up and scale-down capability. So we're pretty excited about that.

Dave Gerwitz: Well, that looks like all the time we have. If we haven't gotten to your question, we will answer it via email. And if you're watching this on demand, feel free to send in your questions and we will answer them via email. We hope you've enjoyed today's webcast presentation. Feel free to head over to the related resources widget to check out more information about our sponsor, Samsung, and huge thanks to Brian Kucic and George Milner for sharing their insights with us today. I don't know about you, but I was fascinated, and I'm sure most of you out there were as well. Thank you for taking the time to join us today. I'm David Gewirtz for CBS Interactive. Have a great day.

John Bertoli

John Bertoli currently serves as Head of Marketing & Partner Services at Samsung SDS America where he is responsible for brand awareness and driving demand through outbound campaigns and optimizing inbound marketing channels to generate meaningful opportunities for the various business units and solutions, namely retail technology, digital out of home (DOOH), HPC Managed Services, blockchain, and retail analytics software.