Best Audio Format for Storage?

stevecrox@kbin.social · 1 year ago

stevecrox@kbin.social · 2 years ago

Basic rule: if someone claims X magically solves a problem, they don’t follow X and are a huge generator of the problem.

For example, people who claim they don’t need to write comments because they write self-documenting code are the people who use variable names like x1, x2, y, etc…

Similarly, anyone you meet claiming Test Driven Development means they have better tests will write code with appalling code coverage and epically bad tests.

stevecrox@kbin.social · 2 years ago

Python’s public API changes subtly, so minor changes in Python version can lead to massive changes in the version of dependencies you use.

A few years ago we developed a script to update Cassandra on Python 2.7.Y. The production environment used Python 2.7.X (it was 5 patch releases earlier).

This completely changed the Cassandra library version. We had to go back 15 patch releases, which annoyingly resulted in a breaking change in the Cassandra library’s API, so the script wouldn’t work on the dev environment’s Python.

Python 3 hasn’t solved this. 2 years ago I was asked to look at a number of machine learning projects running in Docker. Upgrading Python from 3.4 to 3.8 had a huge effect on dependencies and figuring out the right combination was a huge pain.

This is a solved problem in Java. Node.js has the same weakness, but its changes to the language spec are additive so old code runs on new releases (just not the inverse). Ruby has exactly the same issues as Python.

stevecrox@kbin.social · edit-2 2 years ago

This advice isn’t grounded in reality.

Management normally defines ways to track and judge itself, these are typically called Key Performance Indicators.

KPIs are normally things like contract value growth, new contracts signed, profit margin, etc…

So if the project manager is meeting or exceeding their KPIs and you walk up to their boss telling them the PM is failing at basic job functions, the boss won’t care.

This is because the boss might have set the KPIs or the boss might also be judged on them. In either situation it’s to the boss’s advantage to ignore you.

The boss will only care if there is a KPI you can demonstrate the PM failing to meet.

Every person/group will have various incentives and motivations. To effect change you have to understand what they are.

stevecrox@kbin.social · 2 years ago

A project manager has responsibility for delivery of a project but they typically lack domain-specific knowledge. As a result they can’t directly deliver something, merely ask subject matter experts for advice and facilitate a team to deliver.

Most PMs cope with the stress of this position poorly.

This cartoon is an example of micromanagement (a common coping mechanism): the manager has involved themselves in the low-level decisions because that gives a sense of control. If a technical team then tells them it’s a bad decision, the team are effectively attacking their coping mechanism.

The solution isn’t to tell them their technical idea is terrible. When you’ve fallen down this rabbit hole you have to treat the PM as a stakeholder. They are someone you have to manage, so a common solution is to give them confidence there is a path to delivery and a way to track and understand it.

stevecrox@kbin.social · 2 years ago

During the pandemic I had some unoccupied Python graduates I wanted to teach data engineering to.

Initially I had them implement REST wrappers around Apache OpenNLP and spaCy and then compare the results on random data sets (Project Gutenberg, SharePoint, etc…).

I ended up stealing a grad data scientist because we couldn’t find a difference (while there was a difference in confidence, the actual matches were identical).

spaCy required 1 vCPU and 12 GiB of RAM to produce the same result as OpenNLP, which was running on 0.5 vCPU and 4.5 GiB of RAM.

2 grads were assigned a Spring Boot/Camel/OpenNLP stack and 2 a spaCy/Flask application. It took both groups 4 weeks to get a working result.

The team slowly acquired lockdown staff so I introduced MinIO/RabbitMQ/NiFi/Hadoop/Express/React and then different file types (not raw UTF-8, but what about doc, pdf, etc…) for NLP pipelines. They built a fairly complex NLP processing system with a data exploration UI.

I figured I had a group to help me figure out the best Python approach in the space, but Python’s limitations just led to stuff like needing a Kubernetes volume to host data.

Conversely none of the data scientists we acquired were willing to code in anything but Python.

I tried arguing at my company at the time that there was a huge unsolved bit of the market there (e.g. MLOps).

Alas, unless you can show profit on the first customer, no business would invest. Which is why I am trying to start a business.

stevecrox@kbin.social · 2 years ago

This is why Java rocks with ETL: the language is built to access files via input/output streams.

It means you don’t need to download a local copy of a file, you can drop it into a data lake (S3, HDFS, etc…) and pass around a URI reference.

Considering the size of Large Language Models I really am surprised at how poorly streaming is handled within Python.
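To make the "pass around a URI reference" idea concrete, here is a minimal sketch. It is written in Python with only the standard library rather than Java, and it is an illustration, not anything from the projects described. The function name sha256_of_uri is made up, and a file:// URI stands in for a data lake object; an s3:// or hdfs:// URI would need a client library, which is not shown.

```python
import hashlib
import urllib.request


def sha256_of_uri(uri: str, chunk_size: int = 64 * 1024) -> str:
    """Hash whatever the URI points at, reading it one chunk at a time.

    Hypothetical helper for illustration: the caller only hands over a
    URI, and the content is consumed as a stream instead of being loaded
    into memory in one go.
    """
    digest = hashlib.sha256()
    # urlopen returns a file-like response object; each read() call
    # returns at most chunk_size bytes, and b"" once the stream is done.
    with urllib.request.urlopen(uri) as stream:
        while chunk := stream.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()
```

Any processing step that accepts a file-like object can be plugged in where the hash update sits, so memory use is bounded by chunk_size rather than by the size of the file.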

stevecrox@kbin.social · edit-2 2 years ago

@ergoplato I didn’t suggest that.

Personally I don’t think it’s ego. I think you have two issues.

The first is that people go through stages learning DevOps. In stage 1 people deploy a CI because it’s cool; they build a few basic pipelines and then 90% of people get bored. In the 2nd stage people start extending those pipelines, which results in really complex pipelines requiring lots of unique changes based on the opinion of the writer. You move to the 3rd stage when you’re asked to recreate/extend them for a new project and realise how specific your solutions are.

Learning how to make minor tweaks and hook in a few key points to get what you want takes years. Without that most packagers will want to make big changes upstream which won’t go down well.

The second issue: I have met quite a few developers who become highly stressed when the build system is doing something they haven’t needed to do or understand.

A really simple example: I have a Jenkins function which I tend to slip into release pipelines. It captures the release version and creates a version in Jira.

I normally deploy it first as a test before a few other functions to automate various service management requirements.

It’s surprising how many devs will suddenly decide every problem (test failed, code failed review, SharePoint breaks, bad OS update, etc…) is due to that function.

For me this little function is a test. If the team don’t care I will work to integrate various bits. If they freak out, I’ll revert it, then decide whether it is worth walking them through the process or whether to walk away.

stevecrox@kbin.social · edit-2 2 years ago

One of the reasons for the #DevOps movement is developers see building and packaging as #notmyjob.

The task would historically fall on the most junior member of the team, who would make a pig’s ear of it due to a complete lack of experience.

This is compounded by the issue that most C/C++ build systems don’t really include dependency management.

Linux distributions have all tried to work out those dependency trees but they came up with slightly different solutions. This is why there are a few “root” distributions everything branches from.

That means developers have to learn about a few root distributions to design the deb/rpm/AUR packaging to base their release around.

That is a considerable amount of learning in a subject most aren’t interested in.

The real question is why don’t package maintainers upstream a packaging solution?

stevecrox@kbin.social · 2 years ago

I am currently teaching Python and JavaScript devs TypeScript. Every time they hit a problem they switch to any.

Sigh

stevecrox@kbin.social · 2 years ago

Engineering is tradeoffs.

A command shell is focused on file operations and starting/stopping applications. So it makes it easy to do those things.

You can use scripting languages (e.g. Node.js/Python) to do everything bash does but they are for general purpose computing and so what and how you perform a task becomes more complicated.

This is why it’s important to know multiple languages, since each one will make specific tasks easier and a community forms around them as a result.

If I want to mess with the file system/configuration I will use Bash, if I want to build a website I will use TypeScript, if I want to train a machine learning model I will use Python, if I am data engineering I will use Java, etc.

stevecrox@kbin.social · 2 years ago

You’ve just moved the packaging problem from distributions to app developers.

The reason you have issues is that historically app developers weren’t interested in packaging their application, so distributions would figure it out.

If app developers wanted to build deb, rpm, etc… packages themselves, it would also solve the problem.

stevecrox@kbin.social · edit-2 2 years ago

Nice out of date dependencies with those lovely security vulnerabilities!

stevecrox@kbin.social · 2 years ago

I will die on this hill along with David Mitchell and could NOT care less what people think.

stevecrox@kbin.social · 2 years ago

Just to add.

Look at any hobby in your life and break out the money spent vs the enjoyment you got out of it.

For example, the cinema costs me £10 and a film is 2 hours long, meaning my fun time costs £5 per hour.

A £100 console would have to provide me 20 hours of entertainment for it to be comparable to going to the cinema.

These days any PS4 game will have 10-40 hours of content, but buying them costs money. Popping on the CEX website, the most expensive PS4 games are £12. Assuming you only get 10 hours of fun from a game…

The question you should ask yourself is are there 3 games on the PS4 you are interested in playing?
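A rough sketch of the arithmetic behind that question, in Python. It uses only the figures from the post (£10 for a 2 hour film, a £100 console, £12 per game, 10 hours of fun per game); the function names are made up for illustration.

```python
import math


def cost_per_hour(total_cost: float, hours: float) -> float:
    """Money spent divided by hours of enjoyment."""
    return total_cost / hours


def console_cost_per_hour(console: float, game_price: float,
                          hours_per_game: float, games: int) -> float:
    """Hourly cost of a console plus a number of equally priced games."""
    total_cost = console + game_price * games
    return cost_per_hour(total_cost, hours_per_game * games)


def games_to_match(console: float, game_price: float,
                   hours_per_game: float, target_rate: float) -> int:
    """Smallest number of games at which the console is no dearer per hour.

    Solves (console + game_price * n) / (hours_per_game * n) <= target_rate
    for n, which rearranges to n >= console / (target_rate * hours - price).
    """
    saving_per_game = target_rate * hours_per_game - game_price
    if saving_per_game <= 0:
        raise ValueError("each game already costs more per hour than the target")
    return math.ceil(console / saving_per_game)


cinema_rate = cost_per_hour(10, 2)
break_even = games_to_match(100, 12, 10, cinema_rate)
print(cinema_rate, break_even, console_cost_per_hour(100, 12, 10, break_even))
```

With those numbers the break-even is 3 games: £136 spread over 30 hours is about £4.53 per hour, just under the cinema’s £5, while 2 games would still come out at £6.20 per hour.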

stevecrox@kbin.social · 2 years ago

So reading twitter…

It seems much of the “ammunition shortage” Prigozhin was loudly complaining about was stockpiling. Similarly, much of Wagner was pulled out of Ukraine to rebuild.

There have been suggestions Prigozhin was planning to launch an attack on Sunday but the Russian MoD attacked a Wagner site forcing him to launch a day early.

One tweet suggested Wagner soldiers had been calling family all day (e.g. before a big operation).

Seizing Rostov-on-Don was a clever initial play, since it’s a major logistics hub. This allowed him to arm his troops and provides a base if the coup fails.

It seems the Southern Military District gave up without a fight, with soldiers surrendering.

Prigozhin has sent a shock force to Moscow. It’s bypassing major cities and trying to get there ASAP. There is a belief senior Kremlin officials will abandon ship.

Various helicopters are attacking the shock force but it seems Wagner are using air defence. Various Mi-8s, Ka-52s and an Il-22 have been shown destroyed.

The TikTok brigade are trying to attack Rostov. Considering they aren’t “true Russians” and were used as barrier troops, this doesn’t seem to be going down well. They are also stripping Donetsk of defenders to do this.

stevecrox@kbin.social · edit-2 2 years ago

It’s a really immature and naive response from Kev. Information is power, and he’s chosen to operate without knowledge for internet points.

Meta think there is potential to enlarge their market and make money. Kev’s response won’t impact their business decisions.

Kev should have gone to the meeting to understand what Meta are planning. That would help him figure out how to deal with Meta entering the space.

I don’t expect he could shape their approach, but knowing they want to do X, Y or Z might make certain features/fixes a priority so it doesn’t impact everyone else.

stevecrox@kbin.social · edit-2 2 years ago

I think there is a focus on C/C++ to justify Python’s performance.

There have been times when the performance of Node.js/Python was part of the reasoning for choosing Java/Scala.

Each time you are regaled with how Python is C/C++ underneath and so faster. Each time you have to ask if they will write C/C++ libraries to ensure the application performance meets our needs.

Similarly, any time you go near lambdas you get a comment on how Python lambdas are faster because it’s C running, whereas Java has to start a virtual machine, etc…

3 different companies, same thing.

stevecrox@kbin.social · edit-2 2 years ago

Interview with a Postdoc, Junior Python Developer

stevecrox@kbin.social · 2 years ago

I think every article is missing a key issue and no one has asked Spez yet…

If 3rd Party Apps and AI services are making millions (as he asserts) why isn’t Reddit competing in those areas?

3rd Party Apps aren’t in a war of new features, putting a 5-10 person development team together to analyse the competitor apps and match the features would kill off the unique selling point of the 3rd Party Apps. Why hasn’t Reddit done this?

LLMs aren’t new, the first appeared in 2018. Why hasn’t Reddit assembled a team to exploit their own data? In my experience 1 data scientist backed by 2 software engineers can do a lot. It isn’t a huge number of people needed.

Even if you buy his argument that the companies are profiting from Reddit, Reddit is a platform those companies are building value from. Reddit isn’t providing those services and so those companies’ profits aren’t “stolen” from Reddit.

It’s like a company that sells art supplies. They sell them to a painter for £100, then the painter sells their artwork for £1000. The art supply company then gets upset it didn’t get £1000 for its supplies.
