As I’m currently thinking of creating a new app that requires a cloud service (or at least an online service available 24/7), I thought about the long-term constraints I want to keep in mind when signing up with a cloud provider. These are the topics to consider, IMHO.

First things first

There are many cloud providers, and they all come with different services, different capabilities, different prices.

The most obvious difference between standard hosting and the cloud is that all providers have different data centers in different locations. The different locations are great for serving your customers with lower latency, but that may not be a priority for you at the beginning. Being hosted at home may be an option in the short term. In the long term, if your company is a success and you have thousands of employees, you don’t want to rely on an external provider; you want to have your own data centers.

All the cloud providers also offer a free tier, meaning they give you some free stuff, like any good drug dealer would. I’m actually using this metaphor on purpose; you will see why later. The free tier is great for trying and testing things. Using a small micro instance is great to see if a service works as expected in your ecosystem. It won’t be enough to performance-test it, of course, but at least you can see whether all the connections are working and whether all the APIs are sound.

Keeping services small

It’s very tempting to have one service doing everything for a backend. It’s sometimes even difficult to split it into pieces; functional testing may still require several services talking together (to test authentication, you need to create users, which is another service of the backend, and to test the latter, you need the authentication layer as well).

Let’s say you need to save files on a server, and you use one of MongoDB’s facilities, GridFS. It’s a good idea to store these files in a different database than your user data, with a different user. At the beginning, you will probably use the same instance to store both of them, but having them in different databases will allow you to split them and scale them differently, and of course, if you want to change one of them, you no longer have to impact both.
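As a rough sketch of what I mean, assuming the mongocxx driver (the database names, users and connection strings below are just placeholders):

#include <mongocxx/client.hpp>
#include <mongocxx/gridfs/bucket.hpp>
#include <mongocxx/instance.hpp>
#include <mongocxx/uri.hpp>
 
int main() {
    mongocxx::instance instance{};  // must be created once per process
 
    // Hypothetical connection strings: the same MongoDB instance today, but
    // each logical store gets its own database and its own user, so they can
    // be moved to separate instances later without touching the other one.
    mongocxx::client user_client{
        mongocxx::uri{"mongodb://app_user:secret@localhost:27017/app_data"}};
    mongocxx::client file_client{
        mongocxx::uri{"mongodb://files_user:secret@localhost:27017/file_storage"}};
 
    auto users = user_client["app_data"]["users"];               // regular user documents
    auto files = file_client["file_storage"].gridfs_bucket();    // GridFS bucket for files
}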

And if your service needs to be changed because one of its dependencies is not maintained anymore, it’s less work to do if the service is small.

Don’t get locked in with one cloud provider

This is actually trickier than it sounds. They all provide some facilities that are only available with them. It may be their ML services, but also their serverless tools.

What happens if they get dropped? Google is known to kill very popular products ALL THE TIME. What happens if you find a better deal for your services with another provider?

Only use services that you can easily port from one cloud provider to another. Select cloud solutions that allow you to use different workflows (for instance, there are cloud services that give you the option of running TensorFlow, scikit-learn… instead of locking you into their own small solution).

This is why I talked about drugs before. These services are easy to use, but they lock you in for all eternity. Beware of them! Choose carefully with a long term plan.

To be continued

My adventures in the cloud are just at their beginning, stay tuned for more episodes!


This question started for me when I had to handle files that could be either compressed or uncompressed, and I needed to do so transparently.

If you look online, there may be only one answer to that, and it is the one I posted on StackOverflow. Here is some more context on what the answer does and what the problem with Boost::Iostreams is.

Context

The reason why there is no online answer is not very obvious. First, zlib itself can handle compressed and uncompressed streams in C on the fly, so there should be no reason why the Iostreams decompressor has any problem.

The reason stems from the fact that the decompressor doesn’t delegate the header parsing to zlib, but does it manually. And there is no option for handling a missing header: it just breaks and stops in that case.

So while lots of GNU tools can handle text files or gz-compressed files without a specific option, Boost::Iostreams throws an exception at you, telling you to change your stream stack.
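Here is a minimal sketch of the usual stack (the file name is just an example): it works on a gzip file, but the first read throws a gzip_error if the file turns out to be plain text:

#include <fstream>
#include <iostream>
#include <boost/iostreams/filter/gzip.hpp>
#include <boost/iostreams/filtering_stream.hpp>
 
int main() {
    std::ifstream file("input.txt", std::ios_base::in | std::ios_base::binary);
 
    boost::iostreams::filtering_istream in;
    in.push(boost::iostreams::gzip_decompressor());
    in.push(file);
 
    // Fine for a real gzip file, but throws boost::iostreams::gzip_error
    // if the file is plain, uncompressed text.
    std::string line;
    while (std::getline(in, line)) {
        std::cout << line << '\n';
    }
}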

This is not very maintainable. For instance, if you have to open your file first to check whether it is compressed, then build your stack and open the file again, it feels like a lot of work for nothing. And it is. Now that we have cloud storage that charges for each access, doubling these requests is simply not sustainable.

The solution

My solution comes from stealing code from the decompressor itself. First, I wanted to just read the first two characters and then wrap them in a fixed array that I would read again, either with the decompressor or simply by calling read on the parent stream. Unfortunately, the only suitable object in Boost::Iostreams, basic_array_source, doesn’t provide a read interface, and it would have been tough to switch to the main stream afterwards.

I also tried implementing the seekable interface, which was a huge pain. Parent filters and sources cannot be told to seek back (even if they have the capability, like a simple ifstream), and you have to declare your full stack as seekable. This means that your own filter also has to implement the seekable API (which is impossible if you don’t have random access, like in a compressed file!). The problem is that even if it works for files, it will not work for other kinds of streams, like the Google Storage Client API. That one will silently skip the current buffer and then throw an exception in a parallel thread, aborting your program. Just horrible.

So instead, I reused the peekable_source private class from the decompressor. The latter already sometimes has to read data and put it back into the main stream. It could have sought back, but instead it keeps a small string buffer that it uses when data is requested. And this works so well that I wondered why it’s not part of the main API.

using namespace boost::iostreams;
 
template <typename Source>
struct PeekableSource {
    typedef char char_type;
    struct category : source_tag, peekable_tag { };
    explicit PeekableSource(Source& src, const std::string& putback = "")
            : src_(src), putback_(putback), offset_(0)
    { }
    std::streamsize read(char* s, std::streamsize n)
    {
        std::streamsize result = 0;
 
        // Copy characters from putback buffer
        std::streamsize pbsize =
                static_cast<std::streamsize>(putback_.size());
        if (offset_ < pbsize) {
            result = (std::min)(n, pbsize - offset_);
            BOOST_IOSTREAMS_CHAR_TRAITS(char)::copy(
                    s, putback_.data() + offset_, result);
            offset_ += result;
            if (result == n)
                return result;
        }
 
        // Read characters from src_
        std::streamsize amt =
                boost::iostreams::read(src_, s + result, n - result);
        return amt != -1 ?
               result + amt :
               result ? result : -1;
    }
    void putback(const std::string& s)
    {
        putback_.replace(0, offset_, s);
        offset_ = 0;
    }
 
    Source&          src_;
    std::string      putback_;
    std::streamsize  offset_;
};

And now we can simply use this to peek at the first two characters of our input stream to see if they are a gz file or not, and then delegate the actual read either to the decompressor or the parent source:

struct GzDecompressor {
    typedef char              char_type;
    typedef multichar_input_filter_tag  category;
 
    gzip_decompressor m_decompressor;
    bool m_initialized{false};
    bool m_is_compressed{false};
    std::string m_putback;
 
    template <typename Source>
    void init(Source& src) {
        std::string data;
        data.push_back(get(src));
        data.push_back(get(src));
        m_is_compressed = data[0] == static_cast<char>(0x1f) && data[1] == static_cast<char>(0x8b);  // gzip magic bytes
        src.putback(data);
        m_initialized = true;
    }
 
    template <typename Source>
    std::streamsize read(Source& src, char* s, std::streamsize n) {
        PeekableSource<Source> peek(src, m_putback);
        if (!m_initialized) {
            init(peek);
        }
 
        if (m_is_compressed) {
            return m_decompressor.read(peek, s, n);
        }
 
        return boost::iostreams::read(peek, s, n);
    }
};

As we still go through the main read calls, this filter is almost transparent to the user and should have almost no impact on performance.
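As a minimal usage sketch (the file name is just an example), the filter is pushed into a filtering_istream like any other input filter, and the same stack then reads both plain and gzip-compressed files:

#include <fstream>
#include <iostream>
#include <boost/iostreams/filtering_stream.hpp>
 
int main() {
    std::ifstream file("input.txt", std::ios_base::in | std::ios_base::binary);
 
    boost::iostreams::filtering_istream in;
    in.push(GzDecompressor());  // the filter defined above
    in.push(file);
 
    // Whether to decompress or pass data through is decided on the first read.
    std::string line;
    while (std::getline(in, line)) {
        std::cout << line << '\n';
    }
}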

What I deeply regret is that the Iostreams decompressor doesn’t have an option to do this natively.


I played with JavaScript before Web 2.0, so there are many things I didn’t know about the language. We see lots of performance tests of browsers and server-side JS, but if you don’t know the language, you are limited in your understanding. What if you just want to know the language, and nothing about client- or server-side applications?

Read More

What is the common point between the debates on cryptography (in the US and Australia), vaccines (and their supposed link to diseases), vitamin C (as a cure for cancer), and spending thousands on power cables for your sound system? Some people use their lack of knowledge to bully experts. And I think this book answers the question of why this happens.

Read More

On my quest for a good Flask book, I saw this book from Tarek Ziade. We are more or less of the same generation, both from France, and he wrote a far better introductory book to Python in French than mine. He also founded the French Python community (AFPY), so I have always had huge respect for the guy. And the book looked appetizing.

Read More

I’m thinking of writing a Web service for a project of mine. For this purpose, I wanted to learn Flask (and a bunch of other technologies), as Flask seems well established and well documented. This is a book from Packt that agglomerates 3 previously released books. One of the main questions is their continued relevance as the Flask API evolves.

Read More

ATK is updated to 3.1.0 with heavy code refactoring. Old C++ standards are now dropped, and it now requires a fully C++17-compliant compiler.

The main difference for filter support is that explicit SIMD filters using libsimdpp have been dropped, while tr2::simd becomes standard and is supported by gcc, clang and Visual Studio.

Read More