Friday, 31 August 2018

Jenkins: Shifting Gears

Kohsuke here. This is a message for my fellow Jenkins developers.

Jenkins has been on an amazing run, but I believe we are trapped in a local optimum, and losing appeal to people who fall outside of our traditional sweet spot. I say it’s time we tackle these problems head on. I’ve been talking to various folks, and I think we need to take on two initiatives. One is “Cloud Native Jenkins,” a flavor of Jenkins that runs well on Kubernetes. The other is a “jolt in Jenkins,” which continues the evolutionary line of Jenkins 2 but accepts breaking changes in order to gain higher development speed.

Some of you have already seen the presentation I posted on the Jenkins YouTube channel. In this post, I’ll expand on that with some additional details.

Jenkins: Shifting Gears Presentation (Slides)

Come hear more in Kohsuke’s keynote at Jenkins World on September 16-19th; register with the code JWFOSS for a 30% discount off your pass.

Our Amazing Success

Our project has been an amazing success over the past 10+ years, thanks to you all. What started as my hobby project became a huge community that boasts thousands of contributors and millions of users. When I think about what enabled this amazing journey, I can think of several magic sauces:

  • Extensible: the ability to take the system, or a portion of the system, and build on top of it to achieve what you need, without anyone else’s permission. Here, I’m not talking about specific technical mechanisms like Guice, extension points, etc., but more broadly about the governance, culture, distribution mechanism, and so on.

  • General purpose: At the base level, Jenkins can be used for any kind of automation around the area of software development. This matched the reality of the software engineering world well. Combined with extensibility, this general purpose system that is Jenkins can specialize into any domain, much like Linux and JetBrains IDEs.

  • Community: Together we created a community where different people push envelopes in different directions and share the fruits with others. This meant everyone can benefit from somebody else’s work, and great ideas and best practices spread more quickly.


Our Challenges

The way we set up our community meant that collectively we were able to work toward solving certain kinds of problems locally and organically, such as Android application development, new UX, and a more expressive pipeline description language.

But at the same time, the incremental, autonomous nature of our community made us demonstrably unable to solve certain kinds of problems. And after 10+ years, these unsolved problems are getting more pronounced, and they are taking a toll — segments of users correctly feel that the community doesn’t get them, because we have shown an inability to address some of their greatest difficulties in using Jenkins. And I know some of those problems, such as service instability, matter to all of us.

In a way, we are stuck in a local optimum, and that is a dangerous place to be when there is growing competition from all sides. So we must solve these problems to ensure our continued relevance and popularity in the space.

Solving those problems starts with correctly understanding them, so let’s look at those.

Service Instability

CI/CD service was once a novelty and a nice-to-have. Today, it is very much a mission critical service, in no small part because of us! Increasingly, people are running bigger and bigger workloads, loading up more and more plugins, and expect higher and higher availability.

Admins today cannot easily meet those heightened expectations using Jenkins. A Jenkins instance, especially a large one, requires too much overhead just to keep it running. It’s not unheard of that somebody restarts Jenkins every day.

Admins expect errors to be contained so they do not impact the entire service. They expect Jenkins to defend itself better from issues such as pipeline execution problems, run-away processes, and excessive resource consumption, so that they don’t have to constantly babysit the service.

Every restart implies degraded service for the software delivery teams where they have to wait longer for their builds to start or complete.

Brittle Configuration

Every Jenkins admin has been burned at least once by making a change that caused unintended side effects. By “changes,” I’m talking about installing/upgrading plugins, tweaking job settings, etc.

As a result, too many admins today aren’t confident that they can make changes safely. They fear that their changes might cause issues for their software delivery teams, that those teams will notice regressions before they do, and that they may not be able to back out some changes easily. For them it feels like touching a Jenga tower, even when a change is small.

Upgrading Jenkins and plugins is an important sub-case of this, where admins often have little understanding of the impact. This decreases the willingness to upgrade, which in turn makes it difficult for the project to move forward more rapidly, and instead we get trapped with the long tail of compatibility burden.

Assembly Required

I’ve often described Jenkins as a bucket full of LEGO blocks — you can build any car you want, but everyone first has to assemble their own car in order to drive one.

As CI/CD has gone mainstream, this is no longer OK. People want something that works out of the box, something that gets them to productivity within 5 clicks in 5 minutes. Too many choices confuse users, and we are not guiding them toward “the lit path.” Everyone feels uncertain whether they are doing the right thing, contributors are spread thin, and the whole thing feels a bit like Frankenstein’s monster.

This is yet another problem we can’t solve by “writing more plugins.”

Reduced Development Velocity

This one is a little different in that our users don’t face it directly, but it is nonetheless very important, because it impacts our ability to expand and sustain the developer community, and influences how fast we can solve the challenges that our users do face.

Some of these problems are not structural and are rather just a matter of doing the work (for example, the Java 11 upgrade), but some of them are structural.

I think the following are the key ones:

  • As a contributor, making a change that spans multiple plugins is difficult. Tooling gets in the way, users might not upgrade a group of changed plugins together, and reviewing such changes is hard.

  • As a contributor, the tests that we have do not give me enough confidence to ship code. Not enough of them run automatically, coverage is shallow, and there just isn’t anything like the production workload of real users/customers.

These core problems create other downstream problems, for example:

  • As a non-regular contributor, what I think of as a small and reasonable change takes forever and a hundred comments of back and forth to get in. I get discouraged from ever doing it again.

  • As a regular contributor, I feel people are throwing crap over the wall, and if they cause problems after a release, I’m on the hook to clean up that mess.

  • As a user, I get a half-baked change that wreaks havoc, which results in a loss of confidence in Jenkins, an even slower pace of change, etc. This is a vicious cycle, as it makes us even more conservative and slows down development velocity.

Path Forward

In the past, my frustration and regret was that we couldn’t take on an effort of this magnitude. But that is NO MORE! As CTO of CloudBees, I’m excited that these challenges are important enough to CloudBees that we now want to solve them within the Jenkins project.

I’ve been talking to many of you, and there are a number of existing efforts already going on that touch this space. From those conversations, a vision emerged: we organize around two key efforts:

  • Cloud Native Jenkins: a general purpose CI/CD engine that runs on Kubernetes, and embraces a fundamentally different architecture and extensibility mechanism.

  • Jolt in Jenkins: continue the incremental trajectory of Jenkins 2 today, but with a renegotiated “contract” with users to gain what we really need, such as a faster pace of development and better stability.

Cloud Native Jenkins

In order to solve the problems that we can’t solve incrementally, I’m proposing the “Cloud Native Jenkins” sub-project in the context of the Cloud Native SIG, together with Carlos, the leader of that SIG.

We don’t have all the answers; that’s something we’ll discuss and figure out collectively. But based on numerous conversations with various folks, I think many pieces of the puzzle are already clear.

Kubernetes as the Runtime

Just like Java was the winning server application platform in the early 2000s, today, Kubernetes is the dominant, winning platform. Cloud Native Jenkins should embrace the paradigm this new platform encourages. For example,

  • Serverless / function-as-a-service build execution (à la Jenkinsfile Runner) that is isolated.

  • Various pieces of functionality deployed as separate microservices.

  • Services interacting through Kubernetes CRDs in order to promote better reuse and composability.

These are the design principles that enable highly desirable properties like infinite scalability, pay-as-you-go cost model, immutability, zero down time operability, etc.

New Extensibility Mechanism

We need to introduce a new mechanism of extensibility in order to retain the magic sauces, and continue our incredible ecosystem.

For example, microservice or container-based extensibility avoids the service instability problem (à la the Knative builder and the userspace-scm work). Pipeline shared libraries are another example that concretely shows how an extensibility mechanism can go beyond plugins, though they haven’t fully flourished as one just yet.

Data on Cloud Managed Data Services

The long-term data storage must be moved from the file system to data services backed by cloud managed services, in order to achieve high availability and horizontal scalability, without burdening admins with additional operational responsibilities.

Configuration as Code

Jenkins Configuration as Code has been incredibly well received, in part because it helps to solve some of the brittle configuration problems. In Cloud Native Jenkins, JCasC must play a more central role, which in turn also helps us reduce the surface area for Blue Ocean to cover by eliminating many configuration screens.

Evergreen

Jenkins Evergreen is another well received effort that’s already underway, which aims to solve the brittleness problem and developer velocity problem. This is a key piece of the puzzle that allows us to move faster without throwing users under the bus.

Secure by Default Design

Over the past years, we’ve learned that several areas of the Jenkins codebase, such as Remoting, are inherently prone to security vulnerabilities because of their design. Cloud Native Jenkins must address those problems by flipping them to “secure by design.”

Following Footsteps of Jenkins X

Jenkins X has been pioneering the use of Jenkins on Kubernetes for a while now, and it has been very well received, too. So naturally, part of the aim of Cloud Native Jenkins is to grow and morph Jenkins into a shape that really works well for Jenkins X. Cloud Native Jenkins will be the general purpose CI/CD engine that runs on Kubernetes, which Jenkins X uses to create an opinionated CD experience for developing cloud native apps.

All The Same Good Things, with New Foundation

And then on top of these foundations, we need to rebuild or transplant all the good things that people love about Jenkins today, and all the good things people expect, such as:

  • Great “batteries included” onboarding experience for new users, where we are present in all the marketplaces, it takes 5 clicks to get going, and key services integrate easily.

  • Modern lovable UX in the direction of front-end web apps that Blue Ocean pioneered.

  • General purpose software that is useful for all sorts of software development.

Cloud Native Jenkins MVP

As I wrote, a number of good efforts are already ongoing today. Thus, in order to get this effort off the ground, I believe the first MVP we should aim for is pretty clear: a function-as-a-service style Jenkins build engine that can be used underneath Jenkins X.

The Cloud Native Jenkins MVP combines the spirit of Jenkins Pipeline, Jenkins Evergreen, Jenkinsfile Runner, and Jenkins Configuration as Code. It consists of:

  • Webhook receiver: a service that receives webhooks from GitHub and triggers a build engine.

  • Build Engine: take Jenkinsfile Runner and evolve it so that it can run as a “function” that carries out a pipeline execution, with some JCasC sprinkled in to control the Jenkins configuration and the plugins used. This way, Jenkinsfiles work as-is for the most part.

  • Continuously delivered through Evergreen: this allows us to solve the combinatorial version-explosion problem, develop changes that span multiple plugins faster, and ship changes more confidently. Of all the projects out there, ours should be the community that believes in the value of continuous delivery, and Evergreen is how we bring continuous delivery to the development of Cloud Native Jenkins itself.
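To make the shape of this MVP concrete, here is a rough sketch of a single function-style execution. The jenkins/jenkinsfile-runner image name, its flags, and the CASC_JENKINS_CONFIG variable are assumptions based on today’s prototypes, not a settled interface:

```shell
# A pipeline definition, exactly as users write it today.
mkdir -p mvp-demo
cat > mvp-demo/Jenkinsfile <<'EOF'
pipeline {
  agent any
  stages {
    stage('Build') {
      steps { sh 'echo building' }
    }
  }
}
EOF

# A JCasC file controlling the (ephemeral) Jenkins configuration.
cat > mvp-demo/casc.yaml <<'EOF'
jenkins:
  systemMessage: "One-shot build engine configured as code"
EOF

# One-shot "function" execution: the container starts, runs the pipeline,
# and exits; nothing persists except what the pipeline itself pushed out.
# docker run --rm \
#   -v "$PWD/mvp-demo":/workspace \
#   -e CASC_JENKINS_CONFIG=/workspace/casc.yaml \
#   jenkins/jenkinsfile-runner -f /workspace/Jenkinsfile
```

The webhook receiver would simply trigger one such invocation per push event.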

This solves some of the key challenges listed above that are really hard to achieve today, so it’s already incredibly useful.

The catch is that this MVP has no GUI. There’s no Blue Ocean UI to look at, no parsing of test reports, no build history. It uses no persistent volumes and keeps no record of builds. The only thing permanent at the end of a build is whatever data is pushed out from Jenkins Pipeline, such as images pushed to a Docker registry, email notifications, and GitHub commit status updates. Loads of other Jenkins features will not be available here.

This is not that far from how some sophisticated users deploy Jenkins today. All in all, I think this is the right trade-off for the first MVP. As you can see, we have most of the pieces already.

From here, the build engine will get continuously more polished and more cloud native, other services will get added to regain features that were lost, new extensibility will get introduced to reduce the role of current in-VM plugins, and so on.

Jolt in Jenkins

Cloud Native Jenkins is a major effort, and it won’t initially be usable for everyone; it only targets a subset of Jenkins functionality, and it requires a platform whose adoption is still limited today. So in parallel, we need to continue the incremental evolution of Jenkins 2, but at an accelerated speed. Said differently, we need to continue to serve the majority of production workloads on Jenkins 2 today, but we are willing to break some stuff to gain what we really need, such as a faster pace of development and better stability, in ways that were previously not possible. This requires injecting a jolt into Jenkins.

Release Model Change

The kind of jolts that we need will almost certainly mean we need to renegotiate the expectations around new releases with our users. My source of inspiration is what happened to the development of Java SE: it changed its release model and started moving faster, shedding pieces faster, in ways it hadn’t before. Again, Jenkins Evergreen is the key piece that achieves this without throwing users under the bus, for the reasons I described in the Cloud Native MVP above.

Compatibility

This jolt is meant to put us on a different footing, one where our current “forever compatibility” expectation does not hold. If that requires a new major version number, such as Jenkins 3, or a new major version number every N months, I’m open to that.

Of course, whatever move we do has to make sense to users. The accelerated pace of value delivery needs to justify any inconvenience we put on users, such as migration, breaking changes, and so on.

In practice, what that means is that we need to remain largely compatible. We have to protect users’ investment in their existing job definitions as much as possible. We continue to run freestyle jobs, and so on.

Ingredients

Other proposals CloudBees is putting forward with the intent to staff the effort are:

  • Configuration as Code: accelerate it and make it a more central part of Jenkins.

  • Developer experience: improvements through buildpack-style auto-detection of project types.

  • Continued evolution of Jenkins Pipeline: there’s an effort going on to remove CPS execution of Pipeline and isolate any failures during pipeline execution. Continue to evolve Jenkins Pipeline toward the sweet spot that works well with the Cloud Native Jenkins effort, plus continued tactical bug-by-bug improvements of Pipeline.

  • Evergreen: I already talked about this above.

  • Plugin spring cleaning: let’s actively guide users more toward the sweet spot of Jenkins and reduce our feature surface area, so that we can focus our contributors’ effort on the important parts of Jenkins. I expect this to be a combination of governance and technical efforts.

  • Table-stakes service integrations: let’s look at what table-stakes tool/service integrations today’s users need, and see if we are meeting/exceeding the competition. Where we fall short, let’s add/reimplement what is needed.

UI Effort

The Web UI will likely be done differently in Cloud Native Jenkins: as its own app, not as a plugin inside Jenkins. JCasC will also play a bigger role in Cloud Native Jenkins, reducing the UI surface area of Jenkins.

Given that, CloudBees will reconsider where to spend its effort in Blue Ocean. The current work where parts of Blue Ocean are made reusable as NPM modules is one example that aligns well with this new vision.

Conclusion

This document lays out the key directions and approaches in broad strokes, which I have discussed with a number of you in the past. Hopefully, this gives you the big picture of where I envision moving Jenkins forward, not just as the creator of Jenkins but as the CTO of CloudBees, which employs a number of key contributors to the Jenkins project.

Come meet Kohsuke and chat with him about the direction of Jenkins at Jenkins World on September 16-19th; register with the code JWFOSS for a 30% discount off your pass.


Thursday, 30 August 2018

Effectively using Kubernetes plugin with Jenkins

This is a guest blog by Niklas Tanskanen, consultant at Eficode.

Kubernetes, the container orchestration platform, is rapidly becoming popular. There are more and more workloads that you can run on top of Kubernetes; it’s becoming an enabling layer of your hyper-converged infrastructure.

If you set up Kubernetes as a cloud provider in Jenkins, you get a very powerful combination for running your workloads. To do that, you can simply install the Kubernetes plugin. Kubernetes can run your Jenkins workloads as long as they run in containers. And containers are a great fit when your workload is a build, because you can pack all of your application and OS dependencies into a container and then run it anywhere!

Let’s imagine that you have been running a Kubernetes cluster in your organisation for a while now. At first it was all about proof of concept, but now it’s becoming popular among your developers, and you have to think about scaling and orchestration. Resource quotas are part of that, and every responsible operator should set them up in both development and production clusters. Otherwise people will be lazy and just reserve all the resources of your cluster without actually using them for anything. By introducing quotas into your cluster, you can control how many resources each namespace gets.

Quotas are already a mature feature of Kubernetes. You can create very fine-grained quotas for different hardware resources, whether it’s fast disk, GPUs, or CPU time. You can also specify multiple quota scopes per namespace. For example, you can have one quota for workloads that run indefinitely, like web servers or databases, and another for short-lived workloads, like builds or test automation runs.

Table 1. Different scopes of Kubernetes quota

  Scope           Description
  Terminating     Match pods where .spec.activeDeadlineSeconds >= 0
  NotTerminating  Match pods where .spec.activeDeadlineSeconds is nil
  BestEffort      Match pods that have best-effort quality of service
  NotBestEffort   Match pods that do not have best-effort quality of service

Since Jenkins is all about running short-lived workloads, you should aim for the Terminating quota scope. But how do you specify workloads in Jenkins so that the correct scope is used?

If you were to do this directly in Kubernetes, you would specify .spec.activeDeadlineSeconds. The same field can also be specified in the Kubernetes plugin when you define a Pod Template.
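As a sketch of the quota side, here is a ResourceQuota restricted to the Terminating scope. The namespace name and the limits are illustrative; adjust them to wherever your Jenkins agents are scheduled:

```shell
# ResourceQuota that only counts pods with .spec.activeDeadlineSeconds set,
# i.e. the Terminating scope: build pods, not long-running services.
cat > build-quota.yaml <<'EOF'
apiVersion: v1
kind: ResourceQuota
metadata:
  name: build-quota
  namespace: jenkins-builds
spec:
  hard:
    pods: "20"
    limits.cpu: "16"
    limits.memory: 32Gi
  scopes:
    - Terminating
EOF
# kubectl apply -f build-quota.yaml
```

With this in place, a runaway flood of builds exhausts only this quota, leaving the rest of the namespace’s workloads untouched.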

Figure 1. Specifying .spec.activeDeadlineSeconds in the Kubernetes plugin

The same configuration is also available in the Jenkinsfile, if you don’t like static configuration.

podTemplate(label: 'maven', activeDeadlineSeconds: 180, containers: [
    containerTemplate(name: 'maven', image: 'maven:3.5.4-jdk-10-slim', ttyEnabled: true, command: 'cat')
  ]) {
  node('maven') {
    container('maven') {
      // maven magic
      sh 'mvn -version'
    }
  }
}

This was just a small sample of the features of the Kubernetes plugin for Jenkins. For more, be sure to check out our talk, where we share more about how you can utilise Kubernetes with Jenkins!

Come see Niklas Tanskanen and many other Jenkins experts and contributors at Jenkins World on September 16-19th; register with the code JWFOSS for a 30% discount off your pass.


Introducing Cluster-scoped init scripts

Introduction

This summer, I worked at Databricks as a software engineering intern on the Clusters team. As part of my internship project, I designed and implemented Cluster-scoped init scripts, improving scalability and ease of use.

In this blog, I will discuss various benefits of Cluster-scoped init scripts, followed by my internship experience at Databricks, and the impact it had on my personal and professional growth.

Cluster-scoped init scripts

Init scripts are shell scripts that run during the startup of each cluster node before the Spark driver or worker JVM starts. Databricks customers use init scripts for various purposes such as installing custom libraries, launching background processes, or applying enterprise security policies. These new scripts offer several improvements over previous ones, which are now deprecated.

Init Scripts are now part of the cluster configuration

One of the biggest pain points for customers used to be that init scripts for a cluster were not part of the cluster configuration and did not show up in the User Interface. Because of this, applying init scripts to a cluster was unintuitive, and editing or cloning a cluster would not preserve the init script configuration. Cluster-scoped init scripts addressed this issue by including an ‘Init Scripts’ panel in the UI of the cluster configuration page, and adding an ‘init_scripts’ field to the public API. This also allows init scripts to take advantage of cluster access control.

curl -n -X POST -H 'Content-Type: application/json' -d '{
  "cluster_id": "1202-211320-brick1",
  "cluster_log_conf": {
    "dbfs" : {
      "destination": "dbfs:/cluster-logs"
    }
  },
  "init_scripts": [ {
    "dbfs": {
      "destination": "dbfs:/databricks/<directory>/postgresql-install.sh"
    }
  } ]
}' https://<databricks-instance>/api/2.0/clusters/edit

Init Scripts now work for jobs clusters

Previous init scripts depended on storing the scripts in a folder named after the cluster. This prevented them from being used on jobs clusters, where cluster names are generated on the fly. Since Cluster-scoped init scripts are part of the cluster configuration, they can be applied to jobs clusters as well, with an identical interface via both the UI and API.

Environment variables for init scripts

Init scripts now provide access to certain environment variables that are listed here. This reduces the complexity of many init scripts that require access to information such as whether the node is a driver or executor and the cluster id.
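As a sketch of what this enables, an init script can branch on node role without any extra lookups. DB_IS_DRIVER and DB_CLUSTER_ID are among the documented variables, but treat the exact names and values as assumptions against your release:

```shell
#!/bin/bash
# Cluster-scoped init script that does driver-only setup, using the
# environment variables the platform injects before the JVM starts.
if [ "$DB_IS_DRIVER" = "TRUE" ]; then
  echo "Driver of cluster $DB_CLUSTER_ID: installing driver-only tooling"
  # apt-get install -y --no-install-recommends graphviz
else
  echo "Worker of cluster $DB_CLUSTER_ID: nothing driver-specific to do"
fi
```

Previously, scripts had to infer this kind of information themselves, which was a common source of init-script complexity.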

Access Control for init scripts

Users can now provide a DBFS or S3 path for their init scripts, which can be stored at arbitrary locations. When using S3, IAM roles can be used to provide access control for init scripts, protecting against malicious or mistaken access/alteration to the init scripts. Read more details on how to set this up here.

Simplified logging

Logs for Cluster-scoped init scripts are now more consistent with Cluster Log Delivery and can be found in the same root folder as driver and executor logs for the cluster.

Additional cluster events

Init Scripts now expose two new cluster events: INIT_SCRIPTS_STARTED and INIT_SCRIPTS_FINISHED. These help users determine the duration of init scripts execution and provide additional clarity as to the state of the cluster at a given moment.
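These events can be retrieved through the cluster events API. The sketch below follows the same style as the earlier /clusters/edit call; treat the exact endpoint path and payload fields as assumptions:

```shell
# Request body asking only for the new init-script events.
cat > events-request.json <<'EOF'
{
  "cluster_id": "1202-211320-brick1",
  "event_types": ["INIT_SCRIPTS_STARTED", "INIT_SCRIPTS_FINISHED"]
}
EOF
# curl -n -X POST -H 'Content-Type: application/json' \
#   -d @events-request.json \
#   https://<databricks-instance>/api/2.0/clusters/events
```

Subtracting the two event timestamps gives the init-script execution time for a given cluster start.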

Conclusion

Working on this project exposed me to the process of designing, implementing and testing a customer-facing feature. I learned how to write robust, maintainable code and evaluate execution semantics. I remember my distributed systems professor claiming that a good design can simplify engineering effort by orders of magnitude, resulting in shorter, cleaner code that is less prone to bugs. However, I never imagined that this point would be driven home just a few months later in an industry setting.

I found Databricks engineers to be extremely helpful, with a constant desire to learn and improve, as well as the patience to teach. The leadership is extremely open with the employees and is constantly looking for feedback, even from the interns. The internship program also had a host of fun activities, as well as educational events that allowed us to learn about other areas of the company (e.g., sales, field engineering, customer success).

Finally, I’d like to thank the clusters team for their encouragement and support throughout my project. A special shout out to my manager Ihor Leshko for always being there when needed, Mike Lin for completely changing the way I approach front-end engineering, and my mentor Haogang Chen for teaching me valuable technical skills that enabled me to graduate from writing simple, working code to building robust, production-ready systems.


The post Introducing Cluster-scoped init scripts appeared first on Databricks.


Day of Jenkins, and other chances to meet JCasC

The Jenkins Configuration as Code plugin is reaching a stage where it is almost ready to be used in a production environment. As a matter of fact, I know some living-on-the-edge users are already doing just that. The first release candidates are out and the official 1.0 is just around the corner.

I’d like to use this chance to invite you to meet us and contribute to the plugin. There will be plenty of opportunities this autumn.

Jenkins Configuration as Code (also called "JCasC") is a Jenkins plugin that allows you to store and maintain all your Jenkins configuration in a YAML file. It’s like Pipeline or Job DSL, but for managing Jenkins itself.
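As a small illustrative sketch of what that looks like (the exact field names depend on your plugin version; treat this as an assumed example and check the JCasC demos for the real schema), a jenkins.yaml might contain:

```yaml
jenkins:
  systemMessage: "Jenkins configured automatically by the Configuration as Code plugin"
  numExecutors: 2
  securityRealm:
    local:
      allowsSignup: false
      users:
        - id: admin
          password: ${ADMIN_PASSWORD}
```

A file like this is applied when Jenkins boots, describing the controller's configuration the same way a Jenkinsfile describes a Pipeline.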

In one of my blog posts, Jenkins Configuration as Code - Automating an automation server, I give a longer explanation of the plugin and answer questions like “why did we decide to develop it?” and “why might you want to use it?”. I recommend reading it if you’re not familiar with the project yet.

The plugin has been presented at a number of meetups - by me, but also by other contributors. This is the first open source project I’ve actively participated in, and I’m quite shocked - positively - by how many people have decided to join the effort and actively develop the plugin with us. Now it’s time to take it to a bigger stage and a broader audience. So together with Nicolas de Loof, I’m going to present the plugin at DevOps World | Jenkins World in San Francisco (19th of September) and in Nice (24th of October) - yes, Jenkins World is coming to Europe.

But that’s not all! Praqma - the company I work for - has organised a number of “Day of Jenkins” events around Scandinavia in past years. This October they are bringing the events back with a theme: the 2018 Day of Jenkins is Day of Jenkins [as code]. It’s a one-day, two-track event - presentations and hands-on sessions for users, and a hackathon for contributors - in this case, Configuration as Code plugin contributors.

A detailed agenda is available on the event page - Jenkins X, Jenkins Evergreen, Jenkins Configuration as Code and more are waiting for you!

I really can’t wait to hear what Kohsuke has to say, and to introduce you to the plugin during the hands-on session I’ll run.

Hope to see you at least at one of those events!

Come meet the Configuration as Code contributors, Nicolas de Loof and Ewelina Wilkosz at Jenkins World on September 16-19th, register with the code JWFOSS for a 30% discount off your pass.

How to Find and Replace Text in Microsoft Word

Have you ever finished typing a letter, report, or presentation only to discover that you misspelled a person’s name or listed the wrong company multiple times throughout your document? No worries—it’s an easy fix. Using Word’s Find and Replace feature, you can quickly locate and replace text. Let’s see how it works.

The Best Wi-Fi Routers For Gamers

Will an awesome premium router make you a better gamer? To be honest, no.

Click Here to Continue Reading

Wednesday, 29 August 2018

Painless Disaster Recovery using Hortonworks Data Lifecycle Manager

In the age of big data, information is power. Enterprises are analysing tremendous amounts of data in order to achieve significant competitive advantage, increase revenue or reduce risks. For example, the success of the Human Genome Project has demonstrated how a global community of scientists can collectively produce and use data to benefit scientific progress […]

The post Painless Disaster Recovery using Hortonworks Data Lifecycle Manager appeared first on Hortonworks.

Geek Trivia: The Most Expensive Rare Coin Ever Sold Was?

Think you know the answer? Click through to see if you're right!

Acer Ripped This Gaming Rig Right Out of the Cockpit of a Space Ship

Acer has lost its absolute mind.

Click Here to Continue Reading

For Sale: 1976 Apple 1. Still Works, Asking $300,000 OBO

An original Apple 1, hand-built by Steve Wozniak in 1976, is up for auction in September. It’s expected to sell for $300,000 or more.

The Best Budgeting Tools for Phones

Budgeting and money management is a not-very-thrilling necessity.

Click Here to Continue Reading

How to Enable Dark Mode for YouTube

YouTube’s dark mode provides an easier-on-the-eyes viewing experience. It’s particularly nice when watching videos in the dark. YouTube’s dark theme is available on the YouTube website and in YouTube’s mobile apps for iPhone, iPad, and Android.

Windows’ Sticky Notes Will Finally Sync Between Computers

Love Sticky Notes in Windows, but wish notes synced between computers? Good news: that feature is coming.

How to Fix Your Mac’s Dock When It Gets Stuck

Sometimes, your Mac’s Dock might freeze up and stop working. It might also become glitchy, with app badges never going away or apps still showing after you close them. Here’s how to fix those problems.

How to Disable Your PC’s Touchpad When You Connect an External Mouse

While laptop touchpads can be useful—especially those that support gestures—they can also be annoying. They’re just too easy to accidentally hit when you’re typing. If you use an external mouse, it’s even more annoying because you don’t need the touchpad at all. Here’s how to disable the touchpad when you use an external mouse.

Xbox All Access Is the Perfect Deal For People Who Hate Deal Hunting

Microsoft has officially announced a financing plan that lets you pay monthly for a console, Xbox Live Gold, and Game Pass.

Click Here to Continue Reading

The Best Free Video Converters

If you watch videos on a variety of devices, it’s likely that you’ve run into compatibility issues. Your iPhone might record 4K video, but can your PlayStation or your smart TV play that video seamlessly? Luckily, there are many free video converters available that will help you convert and watch your favorite videos on the device of your choice. Here are our top picks.

What Is a Checksum (and Why Should You Care)?

A checksum is a sequence of numbers and letters used to check data for errors. If you know the checksum of an original file, you can use a checksum utility to confirm your copy is identical.
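The idea is easy to sketch in a few lines of Python with the standard library’s hashlib (SHA-256 here; checksum tools also commonly use MD5 or SHA-1):

```python
import hashlib

def sha256_of(data: bytes) -> str:
    """Hex SHA-256 checksum of a byte string."""
    return hashlib.sha256(data).hexdigest()

original = b"hello world"
good_copy = b"hello world"
tampered = b"hello w0rld"

print(sha256_of(original) == sha256_of(good_copy))  # True: identical data, identical checksum
print(sha256_of(original) == sha256_of(tampered))   # False: one changed byte alters the checksum
```

In practice you would hash a downloaded file (reading it in chunks) and compare the result against the checksum published alongside the download.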

Yogi Bear Influence

The animated character Yogi Bear’s style of speaking and mannerisms were modeled after The Honeymooners character Ed Norton, played by Art Carney.

Google bets on AI to tackle floods, diseases

Earlier this year, Google had partnered with India's Ministry of Water Resources on a pilot for flood warning based on AI and ML.

Uber Elevate may launch aerial taxi service in India

Uber Elevate, the US ride-sharing major’s aerial taxi arm, is considering India as one of the five countries where it wants to launch the service. An Uber team conveyed this to Union minister of state for aviation Jayant Sinha during a meeting in Delhi on Tuesday.

Do You Need Fast Internet Speeds for Your Smarthome Devices?

A lot of smarthome devices require an internet connection for advanced functionalities to work correctly, but does that internet connection need to be super fast? Here’s what you need to know.

5 Great Bread Makers For Enjoying A Delicious Loaf Every Time

Nothing quite beats the aroma or taste of freshly baked bread.

Click Here to Continue Reading

Tuesday, 28 August 2018

Geek Trivia: As Part Of Its Mosquito Fighting Arsenal, Disney World Uses Which Of These Domestic Animals?

Think you know the answer? Click through to see if you're right!

Be First. Done Right. AIM for Success with Hortonworks Professional Services

Data is an asset. Data is the digital currency. Data is the core of your business, many say the foundation, some say both. The list goes on and the market doesn’t seem to run out of terms to describe how invaluable data is to modern organizations. However, nothing is worthwhile for discussion if not put […]

The post Be First. Done Right. AIM for Success with Hortonworks Professional Services appeared first on Hortonworks.

There’s a New Smaller Version of That Blue Yeti Microphone Every Podcaster and YouTuber Uses

The Blue Yeti microphone has been a standard for high-quality home audio.

Click Here to Continue Reading

Vintage Nickelodeon Offered on Yet Another $10 a Month Streaming Service

Classic Nickelodeon cartoons are now offered on VRV, a $10 a month streaming service from AT&T.

The Best Inexpensive Monitors

So you’d like to get a monitor for your laptop, or perhaps expand your desktop to two.

Click Here to Continue Reading

How to Compress PDFs and Make Them Smaller

PDFs can get pretty big, especially if you’re adding lots of images and objects. If you’ve created a PDF that’s too big—maybe you’re trying to email it or maybe it just takes too long to load—here’s how you compress your PDF to a smaller size.

Free Download: Write is a Word Processor for Handwriting

Windows/Mac/Linux/Android: Love the feel of writing by hand, but wish you could use features like copy/paste and undo? Write is a free tool that lets you do just that.

How (and Why) to Use Hidden Text in a Word Document

Word lets you hide text so you can read or print your document as if the text isn’t there. This might seem pointless—why not just remove the text if you don’t want someone to read it—but hidden text does have some interesting uses. Let’s take a look at what hidden text is (and what it isn’t), why you might want to hide text, and how to do it.

What Is “rpcsvchost” and Why Is It Running on My Mac?

You find something called rpcsvchost while using Activity Monitor to see what’s running on your Mac. What is this process, and should you be worried? In a word, no: rpcsvchost is a core part of macOS.

The Best Everyday Pen For The Office And Beyond

More and more of everyday life is going digital but it’s still not 100% there yet.

Click Here to Continue Reading

Should You Buy Crop Sensor Specific Camera Lenses?

Digital cameras have two primary sensor formats: full frame (or 35mm) cameras where the sensor is roughly the same size as a 35mm film frame and crop sensor (or APS-C) cameras where the sensor is just under 2/3 the size. Lenses designed for full frame cameras work on crop sensor cameras, but using crop sensor lenses on full frame cameras is either impossible (Canon) or comes with some serious compromises (Nikon and Sony). If you’ve got a crop sensor camera, it can be tempting just to buy crop lenses, but it’s not always the best idea.
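The arithmetic behind the crop is simple; a minimal sketch, assuming the common crop factors of roughly 1.5x (Nikon/Sony APS-C) and 1.6x (Canon APS-C):

```python
def equivalent_focal_length(focal_mm: float, crop_factor: float) -> float:
    """Full-frame-equivalent field of view for a lens on a crop sensor body."""
    return focal_mm * crop_factor

print(equivalent_focal_length(50, 1.5))  # 75.0: a 50mm lens frames like a 75mm on Nikon/Sony APS-C
print(equivalent_focal_length(50, 1.6))  # 80.0: and like an 80mm on Canon APS-C
```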

How to Protect Yourself From Public USB Charging Ports

USB charging ports in public places are convenient but possibly risky. Data is transferrable over a USB connection, so plugging your phone into an unknown charging port puts it at risk.

Gumby Origin

The iconic 1950s era claymation character Gumby was the inspired result of a 1953 three-minute student film called Gumbasia created by Art Clokey while he attended the University of Southern California. Later, in 1955, the pilot episode of Gumby was seen by an NBC executive, which quickly led to it being picked up and turned into a popular children’s television show.

By Customer Demand: Databricks and Snowflake Integration

Written By: Bill Chambers and Harsha Kapre

Today, we are proud to announce a partnership between Snowflake and Databricks that will help our customers further unify Big Data and AI by providing an optimized, production-grade integration between Snowflake’s built-for-the-cloud data warehouse and Databricks’ Unified Analytics Platform.

Over the course of the last year, our joint customers such as Rue Gilt Groupe, Celtra, and Overstock.com asked for a tighter integration and partnership between our two companies.

These and many other customers that already use our products together have shared their use cases and experiences and provided amazing feedback. While both products are best-in-class and built as cloud-first technologies, our customers asked for improvements to the connector’s performance and usability. In response to our joint customers’ feedback, we’re happy to introduce the optimized, production-ready Databricks-Snowflake Connector, built right into the Databricks Runtime.

What’s the solution?

It’s simple: this connector brings together best-of-breed technologies, so you can have industry-leading ETL, data warehousing, and machine learning without worrying about initial or ongoing configuration and setup. Concretely, Databricks and Snowflake now provide an optimized, built-in connector that allows customers to seamlessly read data from and write data to Snowflake using Databricks.

This integration greatly improves the experience for our customers, who can get started faster with less setup and stay up to date with improvements to both products automatically. Additionally, Snowflake’s automatic query pushdown can push certain queries down into Snowflake, removing the complexity and guesswork of deciding what processing should happen where. With the optimized connector, complex workloads are processed by Spark while Snowflake processes the workloads that can be translated to SQL. This can provide benefits in performance and cost without any manual work or ongoing configuration.

For Stephen Harrison, architect at flash online retailer Rue Gilt Groupe, this means that “since we use Snowflake as our primary data source for accessing all information about our members and products, [with the Databricks-Snowflake connector] it is seamless to directly connect to our data warehouse, directly import to Spark without any time-consuming ETL processes, and write back to Snowflake directly.”

Now let’s see the connector in action!

Using Databricks and Snowflake

First, you’ll need a Snowflake account and a Databricks account. Once you’ve logged into Databricks, ensure you’ve created a cluster in Databricks, using Databricks Runtime 4.2 or later, and a virtual warehouse (compute cluster) in Snowflake. It’s also worth double-checking that your Snowflake and Databricks accounts are in the same region to get the best performance and lowest cost. You’ll want to set your Snowflake virtual warehouse to auto-suspend and auto-resume so you only pay for what you use.

That’s it! No library to load and no configurations to manage.

We’ve abbreviated some of the code in this blog, but you can follow along with the code snippets in a Databricks Notebook here. There’s a lot more detail about different pieces of functionality, so it’s worth checking out!

Configuring the Connection

First, we need to configure the connection. To do this, we’ll leverage the Databricks Secrets API to securely store and encrypt the credentials we use to access Snowflake.


val user = dbutils.secrets.get("data-warehouse", "snowflake-user")
val password = dbutils.secrets.get("data-warehouse", "snowflake-password")

Once we do that, we can set our options for reading and writing this data.

val options = Map(
  "sfUrl" -> ".snowflakecomputing.com/", // replace this with your own connection information
  "sfUser" -> user,
  "sfPassword" -> password,
  "sfDatabase" -> "demo",
  "sfSchema" -> "databricks_demo",
  "sfWarehouse" -> "DEMO_CLUSTER"
)

In Python, it would look something like the following.


# use the secrets API to set these values
user = dbutils.secrets.get("data-warehouse", "snowflake-user")
password = dbutils.secrets.get("data-warehouse", "snowflake-password")

# replace this with your own connection information
options = {
  "sfUrl": ".snowflakecomputing.com/",
  "sfUser": user,
  "sfPassword": password,
  "sfDatabase": "demo",
  "sfSchema": "databricks_demo",
  "sfWarehouse": "DEMO_CLUSTER",
}

ETL’ing Data into Snowflake

Loading data into Snowflake is as simple as loading any other data source. There’s no library to load or Spark (or Snowflake Connector) version to worry about – the connector is built-in!


// In this example, we’re reading CSV data
val df = spark.read.format("csv")
  .option("header", "true")
  .load("/databricks-datasets/adult/adult.data")

// and writing the data into Snowflake
df.write.format("snowflake")
  .options(options)
  .option("dbtable", "adult")
  .mode("append")
  .save()

In the above example, we’ve only done a simple write. However, many customers leverage Databricks to perform complex transformations on structured and semi-structured data to load into Snowflake for downstream analytics or BI applications. With Snowflake, you get the added benefit of native JSON support, which means no transformations are required on your JSON data. Now that we’ve loaded the data, let’s query it in Snowflake.

Querying data in Snowflake

Upon loading the data, it’s simple to query in Snowflake. After enabling a Snowflake virtual warehouse, simply open up a Snowflake worksheet and immediately query the data. Here’s a simple query you can run to manipulate the data:


SELECT * FROM adult LIMIT 5;

Upon hitting Run, you’ll see the first five rows of the adult table.

With the data now loaded into Snowflake, business analysts can leverage tools such as SnowSQL to query the data and run a number of business intelligence applications against the data. Users can also leverage Snowflake Data Sharing to share this data in real time and in a secure manner with other parts of their organization or with any of their partners that also use Snowflake. It’s also easy to connect BI tools such as Tableau or Looker to your Snowflake warehouse, allowing analysts to query large amounts of data stored in Snowflake.

Snowflake is an excellent repository for important business information, and Databricks provides all the capabilities you need to train machine learning models on that data: the connector reads input data from Snowflake into Databricks for model training.

Training a Machine Learning Model

To train a machine learning model, we leverage the Snowflake connector to pull the data stored in Snowflake. You’ll notice that it follows the same structure as other Spark Data Sources. Since it’s integrated with the Databricks Runtime, it’s zero-configuration and production ready.


dataset = spark.read.format("snowflake") \
  .options(**options) \
  .option("dbtable", "adult") \
  .load()

However, there are times when you might want to limit the data pulled from the table for performance or efficiency’s sake. To do so, run arbitrary queries using the Snowflake connector. For instance, filter down to the relevant rows on which you want to train your ML algorithm.


spark.read.format("snowflake") \
  .options(**options) \
  .option("query", "SELECT workclass, marital_status FROM adult WHERE EDUCATION = ' Bachelors'") \
  .load()

This is a simple example of how the Databricks-Snowflake Connector automatically pushes down into Snowflake any predicates, and even expressions, that it can, meaning you get optimized performance right out of the box.

Preprocessing and Feature Generation

Now that we’ve loaded the data, we can go about defining our machine learning model transformations inside Databricks. For instance, here we’ll define a pipeline that converts categorical variables into Indexed and One Hot Encoded variables for input into our machine learning algorithm.


from pyspark.ml import Pipeline
from pyspark.ml.feature import OneHotEncoderEstimator, StringIndexer, VectorAssembler

categoricalColumns = ["WORKCLASS", "EDUCATION", "MARITAL_STATUS", "RELATIONSHIP", "RACE"]
numericCols = ["AGE", "FNLWGT", "EDUCATION_NUM", "CAPITAL_GAIN", "CAPITAL_LOSS", "HOURS_PER_WEEK"]
assemblerInputs = [c + "classVec" for c in categoricalColumns] + numericCols
all_columns = categoricalColumns + numericCols

stages = []
for categoricalCol in categoricalColumns:
    stringIndexer = StringIndexer(inputCol=categoricalCol, outputCol=categoricalCol + "Index")
    encoder = OneHotEncoderEstimator(inputCols=[stringIndexer.getOutputCol()],
                                     outputCols=[categoricalCol + "classVec"])
    stages += [stringIndexer, encoder]

label_stringIdx = StringIndexer(inputCol="INCOME", outputCol="LABEL")
assembler = VectorAssembler(inputCols=assemblerInputs, outputCol="features")
stages += [label_stringIdx, assembler]

We now have a preprocessed dataset to train our machine learning algorithms.


pipeline = Pipeline(stages=stages)
pipelineModel = pipeline.fit(dataset)
result = pipelineModel.transform(dataset)

selectedcols = ["label", "features"] + all_columns
dataset = result.select(selectedcols)

(trainingData, testData) = dataset.randomSplit([0.7, 0.3], seed=100)
print(trainingData.count())
print(testData.count())

Once we’ve done our train-test split, we can now train and evaluate our model using cross validation and a Random Forest Classifier.


from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.tuning import ParamGridBuilder, CrossValidator

# the classifier and evaluator used in the cross-validation below
rf = RandomForestClassifier(labelCol="label", featuresCol="features")
evaluator = BinaryClassificationEvaluator()

paramGrid = (ParamGridBuilder()
             .addGrid(rf.maxDepth, [2, 4, 6])
             .addGrid(rf.maxBins, [20, 60])
             .addGrid(rf.numTrees, [5, 20])
             .build())

cv = CrossValidator(estimator=rf, estimatorParamMaps=paramGrid,
                    evaluator=evaluator, numFolds=5)
cvModel = cv.fit(trainingData)

predictions = cvModel.transform(testData)
evaluator.evaluate(predictions)

Lastly, we can keep our best model and make predictions with it.


bestModel = cvModel.bestModel
finalPredictions = bestModel.transform(dataset)
evaluator.evaluate(finalPredictions)

Now that we’ve trained and evaluated this model, we can save the results back into Snowflake for analysis. Doing so is as simple as using the connector again, as shown in the notebook.

Seeing the Results in Snowflake

In this case, we can easily query our table called adult_results, and users can even access the raw probabilities for each output class.

Conclusion

Databricks and Snowflake provide a best-in-class solution for bringing together Big Data and AI by removing all the complexity associated with integration and automating price performance through automatic query pushdown. In this post, we outlined how to use the Databricks-Snowflake Connector to read data from Snowflake and train a machine learning model without any setup or configuration. We then wrote both the unprocessed data as well as the machine learning model’s results back into Snowflake, making it available for immediate analysis.

We’ve already had dozens of customers succeed with these two products, building end-to-end pipelines to derive value from data. We look forward to seeing more customers succeed and we’ll be doing a lot more together in the near future!

Read More

Get all the latest information at www.databricks.com/snowflake.

Learn more about Snowflake’s cloud-built data warehouse at Snowflake.com

Read more in depth about the connector in our documentation.

Follow this tutorial in a Databricks Notebook.

--

Try Databricks for free. Get started today.

The post By Customer Demand: Databricks and Snowflake Integration appeared first on Databricks.

India's drone policy rules to delay e-commerce companies' liftoff plans

The machines can be deployed for commercial purposes but only within the line of sight of remote pilot, says aviation ministry.

India allows commercial operation of drones from December 2018

The drones can be used for a height up to 400 feet but within the line of sight of the operator.

Toyota to invest $500M in Uber to jointly work on self-driving cars

Uber will combine its autonomous driving system with Toyota's Guardian technology, which offers automated safety features such as lane-keeping but does not enable a vehicle to drive completely autonomously

Cryptocurrency tycoons will soon find out how rich they really are

The secretive bosses of Bitmain Technologies, Canaan and Ebang are all facing the prospect of public-market scrutiny for the first time as they pursue stock listings in Hong Kong.

Nokia secures 500 million euro EU loan for 5G development

5G mobile networks, which are still at an early stage, will offer data speeds up to 50 or 100 times faster than current 4G networks and serve as critical infrastructure for a range of industries, such as driverless cars.

5 Reasons to Attend Spark + AI Summit Europe 2018

The world’s largest event for the Apache Spark Community

By Singh Garewal Posted in COMPANY BLOG August 27, 2018

Spark + AI Summit Europe will be held in London on October 2-4, 2018. Check out the full agenda and get your ticket before it sells out! Register today with the discount code 5Reasons and get 25% off.

Knowledge Central

Those interested in Apache Spark and AI, novices and experts alike, can attest to the breadth and depth of knowledge available at these summits. Attendees find these an invaluable mechanism for sharing and gaining knowledge. This autumn the event will be in London and will offer tracks for personalized learning to meet the needs of developers, data scientists, practitioners and executives.

For this 13th summit, here are my five reasons why you should join us.

1. Keynotes from Distinguished Engineers, Academics and Industry Leaders

Distinguished engineers and academics (Matei Zaharia, Reynold Xin, Soumith Chintala, Ion Stoica,  Michael Armbrust) and visionary industry leaders (Ali Ghodsi, Rohan Kumar, and Srini Varadarajan) in the big data and AI industries will share their vision of where Apache Spark and AI are heading in 2018 and beyond.

2. Superb Sessions = Excellent Learning

To support your learning the summit will comprise nine tracks: AI, Data Science, Deep Learning Techniques, Productionizing ML, Developer, Enterprise, Research, Technical Deep Dives and Apache Spark Use Cases. You are sure to find many sessions of interest regardless of your expertise level or project focus.

3. Apache Spark Training

Update your skills and get the best training from Databricks’ best trainers, who have trained over 4,200 summit attendees. On a day dedicated to training, you can choose from five courses and stay abreast of the latest in Spark 2.3 and Deep Learning: Data Science with Apache Spark; Understand and Apply Deep Learning with Keras, TensorFlow, and Apache Spark; Apache Spark Tuning and Best Practices; Apache Spark Essentials; and Databricks Delta. Depending on your preference, you can choose to register for each class on either AWS or Azure cloud. Plus, we will offer a half-day Databricks Developer Certification for Apache Spark prep course, after which you can sit for the exam on the same day. Get Databricks Certified!

4. Apache Spark Meetup

Apache Spark Meetups are known for their tech talks. At the summit meetup, you can learn what other Spark developers from all over are up to, mingle and enjoy the beverages and camaraderie in an informal setting, and ask your burning questions. Be sure to check out the Apache Spark+AI Summit meetup at 6 PM on Tuesday, October 2.

5. London: the City known as The City!

London is a vibrant, cosmopolitan city regarded by some as the best city in the world. Take time out to enjoy theatre, shopping, dining, historical attractions and much more…

We hope to see you in London!

What’s Next

With only five weeks left, tickets are selling fast. If you haven’t yet, register today with the discount code 5Reasons and get 25% off.

--

Try Databricks for free. Get started today.

The post 5 Reasons to Attend Spark + AI Summit Europe 2018 appeared first on Databricks.

Amazon’s Fights With Apple and Google Annoy Pretty Much Everyone

Apple wants a 30 percent cut of all in-app purchases. Amazon doesn’t want to give Apple a 30 percent cut of Kindle book sales.

Announcing Databricks Runtime 4.3

I’m pleased to announce the release of Databricks Runtime 4.3, powered by Apache Spark. We’ve packed this release with an assortment of new features, performance improvements, and quality improvements to the platform. We recommend moving to Databricks Runtime 4.3 to take advantage of these improvements.

In our obsession to continually improve our platform’s performance, Databricks Runtime 4.3 benefits from substantial performance gains over previous versions of the Databricks Runtime. When running performance benchmarks using TPC-DS at 1 Terabyte scale, we’re seeing:

  • 16% performance improvement on AWS: This is a result of improvements to data skipping and optimal shuffle placement.  
  • 55% performance improvement on Azure:  This is largely due to enabling caching and internal performance optimizations.  

In addition to the performance improvements, we’ve also added new functionality to Databricks Delta:

  • Truncate Table: with Delta you can delete all rows in a table using TRUNCATE. It’s important to note that we do not support deleting specific partitions. Refer to the documentation for more information: Truncate Table
  • Alter Table Replace Columns: replace columns in a Databricks Delta table, including changing the comment of a column, and we support reordering of multiple columns. Refer to the documentation for more information: Alter Table
  • FSCK Repair Table: this command allows you to remove from the transaction log of a Databricks Delta table the file entries that can no longer be found in the underlying file system. This can happen when those files have been manually deleted. Refer to the documentation for more information: Repair Table
  • Scaling “Merge” Operations: this release comes with experimental support for larger source tables with “Merge” operations. Please contact support if you would like to try out this feature.
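As a quick sketch of what the new Delta SQL surface looks like (the table and column names below are hypothetical; refer to the linked documentation for exact syntax):

```sql
-- Delete all rows in a Delta table (deleting specific partitions is not supported)
TRUNCATE TABLE events;

-- Replace the column list, e.g. changing a comment and reordering columns
ALTER TABLE events REPLACE COLUMNS (eventTime TIMESTAMP COMMENT 'time of event', eventId STRING);

-- Remove transaction-log entries for files no longer present in storage
FSCK REPAIR TABLE events;
```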

We’ve added some improvements to Structured Streaming that I’d also like to highlight:

To read more about the above new features and to see the full list of improvements included in Databricks Runtime 4.3, please refer to the release notes in the following locations:

Azure: Databricks Runtime 4.3 release notes

--

Try Databricks for free. Get started today.

The post Announcing Databricks Runtime 4.3 appeared first on Databricks.

Security Cameras Are Useless If They Can’t Identify Anyone

It’s important to consider the location of your security cameras carefully. While you might think you have every area covered, you’ll also want to make sure your cameras are close enough to the potential action to capture shots that let you identify people and vehicles.

The Best Instant Cameras For Fast Retro Snaps

Remember instant cameras and film? Owning a Polaroid camera was the coolest thing possible back in the day before camera phones.

Click Here to Continue Reading

Monday, 27 August 2018

Geek Trivia: Which Of These Cartoon Characters Started Life In An Educational Comic?

Think you know the answer? Click through to see if you're right!

Disney’s Streaming Service Will Be Cheaper Than Netflix, Possibly Named “Disney Play”

Disney’s top priority for 2019 is launching a streaming service. It will be cheaper than Netflix and possibly named “Disney Play.”

The Third-Generation August Smart Lock Is Cheaper Than Ever at $100 Today

The third-generation August Smart Lock caught our eye in our roundup of sm…

Click Here to Continue Reading

We Love Philips Hue Bulbs, But We’re Not Sold On Their New Light Fixtures

We like Philips Hue lights.

Click Here to Continue Reading

A Gmail-Style Side Panel Is Coming to Google Docs and Google Calendar

Google plans to add a sidebar to Google Calendar, Docs, Sheets, and Slides. Users will be able to create notes, tasks, and calendar appointments while editing documents.

Use the Navigation Pane to Easily Reorganize Microsoft Word Documents

Microsoft Word is packed with features that improve usability and workflow efficiency. Navigation Pane is a great example, and you can use it to navigate headings, search your document for text or objects, and even easily reorganize your documents.

How to Disable or Customize Autocorrect on Mac

Sometimes, autocorrect gets it wrong, replacing a word you meant to type with something completely different. You can customize it to fix these issues or disable it altogether.

You Should Pay Attention to These Android Manufacturers if You Care About Updates

The Android update landscape is a disaster that has plagued the OS for years. “Fragmentation” is a common complaint against Android, but some manufacturers are starting to take the necessary steps to correct this years-long problem.

How to Make Your Smart Home Tech Guest Friendly

Whether you’re having family over or renting out your place on AirBnb, making your smart home tech easy for your guests to use is a good idea.

Click Here to Continue Reading

How to Access HBO Now from the EU

HBO is famous for making great TV shows. Game of Thrones, The Sopranos, Silicon Valley, Entourage. Honestly, if not for Breaking Bad, as far as people in Europe are concerned, HBO is pretty much the only US channel worth watching. Let’s look at how to—somewhat—legally access it, no matter where you are.

What is Code Injection on Windows?

Code injection is common on Windows. Applications “inject” pieces of their own code into another running process to modify its behavior. This technique can be used for good or evil, but either way it can cause problems.

No 007 Movie In 2007

Ironically, despite how famous the James Bond franchise made the numbers “007”, there was no Bond movie released in 2007.

Inside Elon Musk's reversal on taking Tesla Private

The reversal — announced late Friday in a blog post, a day after he discussed it with directors — capped a tumultuous series of moves that drew in Wall Street’s biggest investment banks, prompted an investigation by regulators and raised questions on Musk’s leadership.

Kit-Kat Filler

The chocolate between the wafers of a Kit-Kat bar isn’t the same chocolate used on the exterior of the bars—it’s made from mashed up Kit-Kat bars that failed their quality checks because of exterior air bubbles, off-center wafers, other imperfections, or simply not being shiny enough.

Typewriter QWERTY Top Row

There are several long 10 letter words that you can spell using only the top row of a QWERTY format keyboard such as proprietor, repertoire, and—fittingly enough—typewriter.

Nintendo Name Translation

The name of Japanese game company Nintendo can be translated as “leave luck to heaven” and reflects the company’s origins in the 1880s as a playing card company.

Sunday, 26 August 2018

Geek Trivia: Long Before Virtual Game Achievements Were A Thing, Activision Would Mail Gamers?

Think you know the answer? Click through to see if you're right!

The Best Aftermarket Android Auto and Carplay Head Units for Your Car

If you love the idea of having Android Auto or Carplay in your ride, you don’t have to wait until it’s time to get a new vehicle t…

Click Here to Continue Reading

Space race 2.0: A low-down on the great flight

Even as the Trump Administration plans a contentious Space Force and Nasa looks to intensify its efforts to explore deep space, PM Modi has announced a human space mission by 2022.

How ISRO is gearing up for the human space flight mission

ISRO chairman K Sivan talks about the ambitious human spaceflight mission and what India will gain from it.

K VijayRaghavan on India's bet on mega science projects, challenges & more

"The only route open to our trained students/researchers is to become a professor. There could be exciting opportunities in the industry and startups." said K VijayRaghavan, principal scientific advisor to the Government of India

Reaching for stars: India bets big on mega, multi-country science projects

"The big shift is that India is moving from the periphery to the core of such mega global science projects", says RA Mashelkar former director general, CSIR.

Dark Patterns: When Companies Use Design to Manipulate You

Ever feel like you’re being prompted into going along with something you don’t want because better options aren’t clearly being presented? You probably just found a dark pattern.

Saturday, 25 August 2018

Geek Trivia: What Causes Wint-O-Green Candies To “Spark” In Your Mouth?

Think you know the answer? Click through to see if you're right!

The Best Console Controller Charging Docks

Modern wireless controllers are great… until they run out of juice.

Click Here to Continue Reading

It’s Finally Safe (And Affordable) To Buy Graphics Cards Again

It’s been a bit of a drag to be a PC gamer over the last year or so. Cryptocurrency miners gobbled up all of the graphics cards in an already somewhat niche market, sending prices for high-end and even mid-range GPUs skyrocketing. That has changed.

Tesla to remain as a public company after board meets with Elon Musk

"Given the feedback I've received, it's apparent that most of Tesla's existing shareholders believe we are better off as a public company," Musk wrote in a blogpost

How to Install a Hue Dimmer Switch Over an Existing Light Switch

When you have Philips Hue lights all across your house, physical light switches become less useful. If you want, you can hide them and use a Hue Dimmer Switch instead.

Friday, 24 August 2018

Picking Oddjob in Goldeneye 007 Is Cheating, So Stop Picking Him, Dustin

T-Mobile Hacked Again: Over 2 Million Account Numbers and Addresses Potentially Leaked

Attackers may have compromised three percent of T-Mobile’s 77 million customers on Monday, revealing personal information like addresses, phone numbers, and account numbers.

The Best Robotics Kits for Kids

Over Half of New Netflix Content Next Month is Original

Netflix has long planned for a library that’s half original content, and two years later the production engine is up and running. This is the second month in a row where more than half of new content on the site is Original.

How to Reduce the Size of a Microsoft Word Document

Word documents can get huge: unusually long, complex documents with loads of embedded images, fonts, and other objects. But documents can also seem to grow out of hand for no reason at all. If you’re dealing with a huge document, here are some things you can try to reduce its file size.

How to Enable Dark Mode for Google Chrome

Google Chrome doesn’t have a built-in dark theme like Mozilla Firefox and Microsoft Edge do, but you can get a dark Chrome browser in a few clicks. You can even apply a dark theme to every web page you visit.

How to Stop Android’s Keyboard from Censoring Your Messages

Sometimes you need to say what’s on your mind using colorful language. But right out of the box, Android can hamper that, making your swear words duller (or just plain wrong). Here’s how to put a stop to that nonsense.

The Best Phone Docks For Your Car

If you’re in the habit of using your phone as a makeshift GPS navigation unit or music manager in your car, you want a reliable place to…

Click Here to Continue Reading

How to Watch US TV In Europe

Everyone in Europe knows that the US has the best TV. You guys have ESPN, Comedy Central, and HBO. We have…other stuff. So, let’s look at how we poor Europeans can once again plunder your continent; but this time for decent evening entertainment.

How to Enable Microsoft’s New OneDrive Folder Protection in Windows

Microsoft OneDrive now offers to “protect” the contents of your Desktop, Documents, and Pictures folders. You can use your standard file storage folders, and OneDrive will synchronize them as if they were saved in the regular OneDrive folder.

Jenkins Configuration-as-Code: Look ma, no hands

This blog post is part 1 of a Configuration-as-Code series

Jenkins is highly flexible and is today the de facto standard for implementing CI/CD, with an active community to maintain plugins for almost any combination of tools and use-cases. But flexibility has a cost: in addition to Jenkins core, many plugins require some system-level configuration to be set so they can do their job.

In some circumstances, "Jenkins Administrator" is a full-time position. One person is responsible for both maintaining the infrastructure and pampering a huge Jenkins master with hundreds of installed plugins and thousands of hosted jobs. Keeping plugin versions up to date is a challenge and failover is a nightmare.

This is reminiscent of years past, when system administrators had to manage dedicated machines per service. In 2018, everything is managed as code using infrastructure automation tools and virtualization. Need a fresh application server as a staging environment for your application? Just deploy a Docker container. Infrastructure is missing resources? Apply a Terraform recipe to allocate more on your favourite cloud.

What about the Jenkins administrator role in this context? Should they still spend hours in the web UI, clicking checkboxes on web forms? Maybe they already adopted some automation, relying on Groovy script voodoo, or some home-made XML templating?

Early this year we announced the first alpha release of “Jenkins Configuration-as-Code” (JCasC), a fresh new approach to Jenkins configuration management, based on YAML configuration files and automatic model discovery. “JCasC” has been promoted as a top-level Jenkins project, and the corresponding Jenkins Enhancement Proposal has been accepted.

What can JCasC do for our Jenkins Administrator?

JCasC allows us to apply a set of YAML files to our Jenkins master at startup, or on demand via the web UI. These configuration files are very concise and human-readable compared to the verbose XML files Jenkins uses to actually store its configuration. The files also follow user-friendly naming conventions, making it easy for administrators to configure all Jenkins components.

Here’s an example:

jenkins:
 systemMessage: "Jenkins managed by Configuration as Code"

 securityRealm:
   ldap:
     configurations:
       - server: ldap.acme.com
         rootDN: dc=acme,dc=fr
         managerPasswordSecret: ${LDAP_PASSWORD}
     cache:
       size: 100
       ttl: 10
     userIdStrategy: CaseInsensitive
     groupIdStrategy: CaseSensitive

As you can see, you don’t need a long explanation to understand how this YAML file will set up your Jenkins master.
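As a minimal sketch of how such a file gets applied (the file name `jenkins.yaml` and the mount paths are illustrative), an administrator could point the plugin at the file through the `CASC_JENKINS_CONFIG` environment variable when starting a Dockerized master:

```shell
# Start a Jenkins master with the configuration file mounted, and tell the
# Configuration-as-Code plugin where to find it. The ${LDAP_PASSWORD}
# placeholder in the YAML is resolved from the environment.
docker run --rm -p 8080:8080 \
  -v "$(pwd)/jenkins.yaml:/var/jenkins_home/jenkins.yaml" \
  -e CASC_JENKINS_CONFIG=/var/jenkins_home/jenkins.yaml \
  -e LDAP_PASSWORD="$LDAP_PASSWORD" \
  jenkins/jenkins:lts
```

`CASC_JENKINS_CONFIG` can also point to a folder of YAML files or to a URL, which makes it easy to keep the configuration in source control and serve it to several masters.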

Benefits

The most immediate benefit of JCasC is reproducibility. An administrator can now bootstrap a new Jenkins master with the exact same configuration with a trivial setup. This allows them to create a test instance and check the impact of plugin upgrades in a sandboxed environment. This also lets them be more confident with failover and disaster recovery scenarios.

Further benefits come when administrators start managing their Jenkins YAML configuration files in source control, as they do with Terraform configuration. Doing so gives them auditing and reversibility of their Jenkins master configuration. They can establish a sane configuration change workflow that runs a test Jenkins instance and ensures the configuration is healthy before actually applying any change to their production Jenkins master.

Last but not least, with the ability to quickly set up Jenkins masters and control them from a set of shared YAML configuration files, administrators can now offer per-team Jenkins instances, with more flexibility on installed plugins. A master becomes a more or less transient piece of infrastructure for your team, as long as they also manage build definitions with Jenkinsfiles.

With Configuration-as-Code we can stop treating our Jenkins master like a pet we need to pamper, and start managing Jenkins masters as cattle we can replace without effort or impact. Welcome to the “as-code” world.

Cattle not pets
Figure 1. They are still cute though, right?

Ok, so what’s next?

You can read more about the Jenkins Configuration-as-Code plugin on the project’s GitHub repository. To chat with the community and contributors, join our Gitter channel, or come see us in person at Jenkins World to discuss the JCasC project and its future!

Also, don’t miss the next post in the Configuration-as-Code series, where we’ll look at how JCasC handles sensitive data like passwords and other credentials.

Come meet the Configuration as Code contributors, Nicolas de Loof and Ewelina Wilkosz at Jenkins World on September 16-19th, register with the code JWFOSS for a 30% discount off your pass.